Back to the future: Where we’re going, we don’t need phonemes.

   Implications of a gradient lexicon.



            Bob McMurray
           University of Iowa
           Dept. of Psychology
   Peter Ladefoged
    (1925 – 2006)

He taught us all phonetics
                          Collaborators


Richard Aslin
Michael Tanenhaus
David Gow




Joe Toscano
Cheyenne Munson
Meghan Clayards
Dana Subik
The students of the MACLab
In language, information arrives sequentially.

   • Partial syntactic and semantic representations are
     formed as words arrive.

   The cowboys chased the linguists away…


   • Words are identified over
     sequential phonemes.
Spoken Word Recognition is an ideal arena in which
to study these issues because:

   • Speech production gives us a lot of rich temporal
     information to use in this way.

   • We have a clear understanding of the input (from
     phonetics).

   • The output is easy to measure online
Online Comprehension

  • Listeners form hypotheses as the input unfolds.

  • Need measurements of how listeners interpret
    speech, moment-by-moment.

  • May reveal how information is integrated:

     - Discreteness vs. Gradiency
     - Combinatorial Units
Mechanisms of Temporal Integration

 Stimuli do not change arbitrarily.

 Perceptual cues reveal something about the
  change itself.

 Active integration:
   • Anticipate future events.
   • Retain partial present representations.
   • Resolve prior ambiguity.
Overview

1) Speech perception and Spoken Word Recognition.

2) Lexical activation is sensitive to fine-grained detail in speech.

3) Where we’re going, we don’t need phonemes: evidence for continuous
   information integration.

4) Back in time: staying off the garden-path.

5) Forward to the future: coping with (and benefiting from)
   phonological modification.

[Figures: lexical candidates (beach, butter, bump, putter, dog) over time as the input "b... u… tt… e… r" unfolds; Cue 1 and Cue 2 arriving over time at the Lexicon.]
Online Word Recognition

• Information arrives sequentially
• At early points in time, signal is temporarily ambiguous.


[Figure: as the input "ba… kery" unfolds, the candidates basic, barrier, barricade, bait, and baby are crossed out (X); bakery remains.]
• Later arriving information disambiguates the word.
Current models of spoken word recognition

• Immediacy: Hypotheses formed from the earliest
  moments of input.

• Activation Based: Lexical candidates (words) receive
  activation to the degree they match the input.

• Parallel Processing: Multiple items are active in
  parallel.

• Competition: Items compete with each other for
  recognition.
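The four properties above can be sketched as a toy simulation. This is only an illustration under stated assumptions: letters stand in for speech segments, and the `match` function, the `decay` and `inhibition` parameters, and the update rule are all hypothetical. It is not the implementation of any published model.

```python
# Illustrative sketch only: a toy activation-and-competition loop, not the
# equations of any published model. The letter-based "segments" and all
# parameter values are assumptions made for this example.

def match(word, heard):
    """Fraction of the segments heard so far that the word matches, position by position."""
    hits = sum(1 for w, h in zip(word, heard) if w == h)
    return hits / len(heard)


def recognize(lexicon, segments, decay=0.5, inhibition=0.3):
    """Return one activation snapshot (dict: word -> activation) per input segment."""
    activation = {word: 0.0 for word in lexicon}
    history = []
    for t in range(1, len(segments) + 1):
        heard = segments[:t]  # immediacy: update after every segment
        updated = {}
        for word in lexicon:  # parallel processing: every word is evaluated
            support = match(word, heard)  # activation tracks match to the input
            rivals = sum(activation[other] for other in lexicon if other != word)
            value = decay * activation[word] + support - inhibition * rivals  # competition
            updated[word] = max(0.0, value)
        activation = updated
        history.append(dict(activation))
    return history


lexicon = ["beach", "butter", "bump", "putter", "dog"]
history = recognize(lexicon, "butter")
for t, snapshot in enumerate(history, start=1):
    print("butter"[:t], {w: round(a, 2) for w, a in snapshot.items()})
```

After the first segment, the three b-initial words are equally active; as later segments arrive, "butter" pulls ahead while the rhyme competitor "putter" stays more active than the unrelated "dog".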
[Figure: activation over time of beach, butter, bump, putter, and dog as the input "b... u… tt… e… r" unfolds.]
These processes have been well defined for a phonemic
representation of the input.

But there is considerably less ambiguity if we consider
subphonemic information.

Example: subphonemic effects of motor processes.
                                                Coarticulation

Any action reflects future actions as it unfolds.

Example: Coarticulation
   Articulation (lips, tongue…) reflects current, future and
   past events.

   Subtle subphonemic variation in speech reflects temporal
   organization.

   net vs. neck

   Sensitivity to these perceptual details might yield earlier
   disambiguation.
These processes have largely been ignored
because of a history of evidence that perceptual
variability gets discarded.


      Example: Categorical Perception
Categorical Perception

[Figure: identification (% /pa/) and discrimination plotted against VOT, from B to P, both on a 0-100 scale.]

 • Sharp identification of tokens on a continuum.
 • Discrimination poor within a phonetic category.

Subphonemic variation in VOT is discarded in favor of a
discrete symbol (phoneme).
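The strong form of this claim can be made concrete with a small sketch. It assumes a logistic identification function and the classic labeling-based prediction for ABX discrimination, P(correct) = 0.5 + 0.5 (p1 - p2)^2, where p1 and p2 are the probabilities of labeling each token /p/. The boundary value is borrowed from the identification results reported later in these slides. The slope is made up for illustration.

```python
import math

# Sketch under stated assumptions: the logistic shape and the slope value are
# hypothetical; the boundary (about 17 ms VOT) is taken from the
# identification results reported later in these slides.

def p_voiceless(vot_ms, boundary_ms=17.25, slope=0.8):
    """Probability of labeling a token /p/ as a function of VOT."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))


def predicted_abx(vot_a, vot_b):
    """Discrimination predicted if listeners can only compare category labels."""
    pa, pb = p_voiceless(vot_a), p_voiceless(vot_b)
    return 0.5 + 0.5 * (pa - pb) ** 2


within = predicted_abx(0, 10)    # both tokens labeled /b/
across = predicted_abx(15, 25)   # tokens straddle the boundary
print(round(within, 3), round(across, 3))
```

Under these assumptions a 10 ms VOT difference is predicted to be near chance within a category and well above chance across the boundary, which is the pattern the two bullets describe.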
Categorical Perception (CP)

Defined fundamental computational problems.

CP is output of
   • Speech perception
Input to
   • Phonology
   • Word recognition.

[Diagram, bottom to top: Sound, Phonemes, Words, Sense, with Phonology alongside.]
Evidence against the strong form of Categorical
Perception from psychophysical-type tasks:

        Discrimination Tasks
         Pisoni and Tash (1974)
         Pisoni & Lazarus (1974)
         Carney, Widin & Viemeister (1977)
        Training
         Samuel (1977)
         Pisoni, Aslin, Perey & Hennessy (1982)
        Goodness Ratings
         Miller (1997)
         Massaro & Cohen (1983)

Classic explanation:
   Auditory tasks: non-categorical
   Phonological tasks: categorical

Paradigmatic CP: within-category variation is noise.
Not important to higher language.
Categorical Perception (CP)

Enables a divide-and-conquer approach.

But, assumes that

1) Speech tasks tap phonemes (or something like them)

2) Phonemes (or something like them) are legitimate
   processing units.

[Diagram, bottom to top: Sound, Phonemes, Words, Sense, with Phonology alongside; continuous cues (non-CP) shown to the side.]
Minimal computational problem:

Computing meaning.

CP tasks don’t necessarily tap a stage of this problem.

Lexical representation: clearly a component.

     Goal: Reassess continuous sensitivity (non-CP) w.r.t. words

[Diagram, bottom to top: Sound, Phonemes, Words, with Phonology alongside; CP marked "?" to the side of Phonemes.]
                                     Experiment 1




  Does within-category acoustic detail systematically affect
  higher level language?

  Is there a gradient effect of continuous acoustic detail
  on lexical activation?
                                                     Experiment 1

    A gradient relationship would yield systematic effects of
    subphonemic information on lexical activation.


    If this gradiency is useful for temporal integration, it must be
    preserved over time.


    Need a design sensitive to both acoustic detail and detailed
    temporal dynamics of lexical activation.




                                 McMurray, Aslin & Tanenhaus (2002)
                                                         Acoustic Detail

Use a speech continuum: more steps yield a better
picture of the acoustic mapping.

KlattWorks: generate synthetic continua from natural
speech.

    9-step VOT continua (0-40 ms)

    6 pairs of words.
        beach/peach      bale/pale       bear/pear
        bump/pump        bomb/palm       butter/putter

    6 fillers.
        lamp     leg     lock   ladder   lip      leaf
        shark    shell   shoe   ship     sheep    shirt
                                        Temporal Dynamics


          How do we tap on-line recognition?
          With an on-line task: Eye-movements

Subjects hear spoken language and manipulate objects in
a visual world.

Visual world includes set of objects with interesting
linguistic properties.

   a beach, a peach and some unrelated items.

Eye-movements to each object are monitored throughout
the task.

            Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995
Why use eye-movements and visual world paradigm?

   • Relatively natural task.

   • Eye-movements generated very fast (within 200ms of
     first bit of information).

   • Eye movements time-locked to speech.

   • Subjects aren’t aware of eye-movements.

   • Fixation probability maps onto lexical activation.
Task

   • A moment to view the items
   • Spoken word (e.g., "Bear")
   • Repeat 1080 times
Identification Results

[Figure: proportion /p/ responses as a function of VOT (0-40 ms), from B to P.]

High agreement across subjects and items for category
boundary.

   By subject:   17.25 +/- 1.33 ms
   By item:      17.24 +/- 1.24 ms
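The slides do not say how the category boundary was computed. One simple possibility, shown here purely as a sketch, is to take the VOT at which the identification function crosses 50% /p/ responses, using linear interpolation between adjacent continuum steps. The response proportions below are invented for illustration and are not the reported data.

```python
# Sketch only: one possible way to estimate a category boundary. The slides do
# not state the method actually used, and the proportions below are made up.

def boundary_by_interpolation(vots, prop_p):
    """VOT (ms) at which proportion /p/ first crosses 0.5, by linear interpolation."""
    for i in range(len(vots) - 1):
        y0, y1 = prop_p[i], prop_p[i + 1]
        if y0 < 0.5 <= y1:
            x0, x1 = vots[i], vots[i + 1]
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    return None  # no crossing found


vots = [0, 5, 10, 15, 20, 25, 30, 35, 40]                         # 9-step continuum
prop_p = [0.02, 0.03, 0.10, 0.30, 0.75, 0.95, 0.98, 0.99, 1.00]   # hypothetical listener
print(round(boundary_by_interpolation(vots, prop_p), 2))
```

Running this per subject (or per item) and then averaging would give summary values of the kind shown above; fitting a logistic function to each identification curve is a common alternative.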
Task

[Figure: fixations on individual trials (1-5) plotted over time (200 ms marked), then combined into % fixations over time.]

Target = Bear
Competitor = Pear
Unrelated = Lamp, Ship
Task

[Figure: fixation proportion over time (0-2000 ms) in two panels, VOT=0 and VOT=40, each for the matching response.]

             More looks to competitor than unrelated items.
Task

Given that
   • the subject heard bear
   • clicked on “bear”…

How often was the subject looking at the “pear”?

[Figure: two hypothetical outcomes, "Categorical Results" and "Gradient Effect", each showing fixation proportion over time to the target and to the competitor.]
Results

[Figure: competitor fixations over time since word onset (0-2000 ms), one curve per VOT step; one panel per response, with VOTs of 0, 5, 10, and 15 ms in the left panel and 20, 25, 30, 35, and 40 ms in the right panel.]

Long-lasting gradient effect: seen throughout the
timecourse of processing.
[Figure: competitor fixations as a function of VOT (0-40 ms), plotted separately for each response on either side of the category boundary.]

Area under the curve:
   Clear effects of VOT   B: p=.017*   P: p<.001***
   Linear Trend           B: p=.023*   P: p=.002***
Unambiguous Stimuli Only
   Clear effects of VOT   B: p=.014*    P: p=.001***
   Linear Trend           B: p=.009**   P: p=.007**
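The two summary measures above can be illustrated with a short sketch. It assumes that "area under the curve" means the integral of the competitor fixation curve over time, computed here with the trapezoid rule. It also substitutes an ordinary least-squares slope across VOT steps for the linear trend test; the slides do not give the exact procedure. All numbers are invented.

```python
# Sketch under stated assumptions: trapezoid-rule area and an OLS slope are
# stand-ins for the analyses summarized on the slide; the data are made up.

def area_under_curve(times_ms, fixations):
    """Trapezoid-rule integral of a fixation curve over time."""
    return sum((t1 - t0) * (f0 + f1) / 2.0
               for t0, t1, f0, f1 in zip(times_ms, times_ms[1:],
                                         fixations, fixations[1:]))


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# One hypothetical competitor-fixation curve
auc = area_under_curve([0, 100, 200], [0.0, 0.1, 0.1])

# Hypothetical per-VOT summaries for trials with a B response
trend = ols_slope([0, 5, 10, 15], [0.02, 0.03, 0.04, 0.05])
print(round(auc, 6), round(trend, 6))
```

A positive slope here corresponds to more competitor looks as VOT approaches the category boundary, which is the gradient pattern the slide reports.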
                                                  Summary


Subphonemic acoustic differences in VOT have gradient
effect on lexical activation.
   • Gradient effect of VOT on looks to the competitor.
   • Effect holds even for unambiguous stimuli.
   • Seems to be long-lasting.

Consistent with growing body of work using priming
(Andruski, Blumstein & Burton, 1994; Utman, Blumstein &
Burton, 2000; Gow, 2001, 2002).
              An alternative framework
1) Word recognition is systematically sensitive to
   subphonemic acoustic detail.

2) Continuous acoustic detail is represented as gradations in
   activation across the lexicon.

3) This can do the work of sublexical units like phonemes.

4) Gradient sensitivity coupled to normal word recognition
   processes enables the system to take advantage of
   subphonemic regularities for temporal integration.
                                        Lexical Sensitivity

1) Word recognition is systematically sensitive to
   subphonemic acoustic detail.

     Voicing
     Laterality, Manner, Place
     Natural Speech
     Vowel Quality
     Infant voicing categories
                                                     Extensions

1) Word recognition is systematically sensitive to
   subphonemic acoustic detail.

     Voicing
     Laterality, Manner, Place
     Natural Speech
     Vowel Quality
     Infant voicing categories
   Metalinguistic Tasks

[Figure: response options B, P, L, and Sh.]
                                                                                    Extensions

1) Word recognition is systematically sensitive to
   subphonemic acoustic detail.

     Voicing
     Laterality, Manner, Place
     Natural Speech
     Vowel Quality
     Infant voicing categories

   Metalinguistic Tasks

[Figure: competitor fixations as a function of VOT (0-40 ms), plotted separately for Response=B and Response=P on either side of the category boundary.]
                                         Lexical Sensitivity

1) Word recognition is systematically sensitive to
   subphonemic acoustic detail.

     Voicing
     Laterality, Manner, Place
     Natural Speech
     Vowel Quality
     Infant voicing categories

   Metalinguistic Tasks

  ? Non minimal pairs        (Exp 3-4)
  ? Duration of effect
2) Continuous acoustic detail is represented as gradations in
   activation across the lexicon.


[Figure: activation over time of bump, pump, dump, bun, bumper, and bomb as the input "b... u… m… p…" unfolds.]
3) This can do the work of sublexical units like phonemes.



    If lexical processes can represent speech detail, do
    we need sublexical processes?

    Perhaps:
    How are multiple cues (to the same phoneme)
    integrated? (Exp 2)
                                       Temporal Integration

4) Gradient sensitivity coupled to normal word recognition
   processes enables the system to take advantage of
   subphonemic regularities for temporal integration.

    Regressive ambiguity resolution (exp 3-5):
       • Ambiguity retained until more information arrives.

    Progressive expectation building (exp 5-6):
       • Phonetic distinctions are spread over time
       • Anticipate upcoming material.
Overview

    1) Speech perception and Spoken Word Recognition.

    2) Lexical activation is sensitive to fine-grained detail in speech.

    3) Where we’re going, we don’t need phonemes: evidence for continuous
       information integration.

    4) Back in time: staying off the garden-path.

    5) Forward in time: coping with (and benefiting from)
       phonological modification.

[Figure: Cue 1 and Cue 2 arriving over time at the Lexicon.]
Substitute your favorite sublexical unit here
(syllables, diphones, etc)…

Traditional speech chain:
             signal -> phonemes -> words
Measuring phonemes

What do phonemes do?
 1) Categorize continuous acoustic detail?
 2) Integrate multiple cues?
 3) Generalize phonological information during development?
 4) Learning new words?
 5) Speech production?
 6) Reading?
Measuring phonemes

What do you do with phonemes?
   1) Categorize continuous acoustic detail? (categorizer)
   2) Integrate multiple cues? (buffer)

We have: an extremely sensitive measure of:
       - lexical activation
       - temporal dynamics
Can we use it to assess sublexical processes?
       - Categorization
       - Integration

Occam’s razor: if this stuff doesn’t happen until the
information meets the lexicon then phonemes are
       - not adding anything computationally.
       - not theoretically necessary
                                        Measuring phonemes


The psychological reality of the phoneme?

No.

Computational Necessity of the phoneme.

Logic:
   Take the phoneme seriously.
   What does it do (computationally) (in a specific task)?
   Does that actually get done (during comprehension)?


May still “exist”…
May still have computational necessity for other tasks…
1) Categorizing continuous detail

1) Categorize continuous acoustic detail. (phoneme as
    “categorizer”)

  Categorical perception: continuous detail discarded.
     - only in metalinguistic tasks (what are these
       tapping?)

  Gradient lexical activation: continuous detail
     systematically affects lexical activation.

  No phonemes required
                               2) Integrating multiple cues


2) Integrating multiple cues. (phoneme as “buffer”)

Phonetic cues for a given phoneme are spread out over
time.

   •   Combined at phoneme level prior to accessing
       lexicon?

        or

   •   Direct access to lexicon. Lexical integration?
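Integrating multiple cues

Phoneme Story
• Integration before lexical access
• “Buffer”

The Alternative
• Integration at the Lexicon

[Diagram: in the phoneme story, Cue 1 and Cue 2 (separated in time) are combined at a Phoneme before reaching the Lexicon; in the alternative, each cue reaches the Lexicon directly.]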
                                               The Logic


The Logic:

1) Assess Temporal Integration.

   - Use two asynchronous cues to single phoneme.
   - Assess lexical activation over time.

      Simultaneous effects: phonemic integration.
      Asynchronous effects: lexical integration.
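The decision rule in this logic can be written out as a small sketch. It assumes that each cue's effect on lexical activation has already been summarized as an effect size per time slice. The threshold criterion for "onset" and the function names are hypothetical, and the numbers are invented.

```python
# Sketch only: the onset criterion (first slice where the effect size reaches a
# threshold) and all values below are assumptions made for illustration.

def effect_onset(times_ms, effect_sizes, threshold):
    """First time at which the absolute effect size reaches the threshold."""
    for t, effect in zip(times_ms, effect_sizes):
        if abs(effect) >= threshold:
            return t
    return None


def integration_pattern(onset_cue1, onset_cue2, tolerance_ms=0):
    """Classify the timing of two cue effects following the logic above."""
    if onset_cue1 is None or onset_cue2 is None:
        return "undetermined"
    if abs(onset_cue1 - onset_cue2) <= tolerance_ms:
        return "simultaneous: phonemic integration"
    return "asynchronous: lexical integration"


times = [0, 20, 40, 60, 80, 100]
early_cue = [0.0, 0.0, 0.3, 0.5, 0.6, 0.6]   # hypothetical effect of the earlier cue
late_cue = [0.0, 0.0, 0.0, 0.0, 0.3, 0.5]    # hypothetical effect of the later cue

onset_early = effect_onset(times, early_cue, threshold=0.25)
onset_late = effect_onset(times, late_cue, threshold=0.25)
print(onset_early, onset_late, integration_pattern(onset_early, onset_late))
```

With these made-up values the earlier cue affects activation 40 ms before the later one, so the sketch reports the asynchronous pattern.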
                      Which cues?
Asynchronous cues

Asynchronous cues to voicing:
   VOT, Vowel Length

Both covary with speaking rate: rate normalization

[Figure: VOT and vowel length marked on a speech token.]
                                       Phonetic Context

Manner of Articulation

Formant Transition Slope (FTSlope):
Temporal cue like VOT: covaries with vowel length.


[Figure: belt vs. welt.]
9-step VOT continua (0-40 ms)
    beach/peach   beak/peak   bees/peas

9-step formant transition slope
    bench/wench   belt/welt   bell/well

x 2 Vowel Lengths

The usual task
1080 Trials
                                 Experiment 2



Results

Step 1:
   Assess gradiency

Step 2:
   Assess temporal integration
Manner Continua

9-step b/w continua
VOT varied.

   bench/wench
   belt/welt
   bell/well

[Figure: %W responses by continuum step (1-9) for Long and Short vowels; N=36.]
Manner Continua

Looks to competitor

[Figure: looks to W and looks to B over time (0-2000 ms), one curve per step from the category boundary (-5 to -1 and 1 to 5), for each clicked-on response.]

          All      Cons
   B      .0001    .003
   W      .0001    .01
Gradiency Results

                Exp 2          Replication
               All   Cons      All   Cons
Manner    B     √     √         √     √
          W     √     √         √     √

Voicing   B     √     √         √     √
          P     √     √         √     √

• Replication of gradiency with Manner + Voicing.

• Phoneme not required as “categorizer”
  (categorization isn’t happening)
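Integrating multiple cues

Phoneme Story
• Integration before the Lexicon
• “Buffer”

The Alternative
• Integration at the Lexicon

[Diagram: in the phoneme story, Cue 1 and Cue 2 (separated in time) are combined at a Phoneme before reaching the Lexicon; in the alternative, each cue reaches the Lexicon directly.]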
                              Results: Temporal Dynamics



When do effects on lexical activation occur?

   VOT / FTStep effects co-occur with vowel length effects.
     (Phonemic Integration)


   VOT / FTStep effects precede vowel length effects.
     (Lexical integration)
Compute 2 effect sizes at each 20 ms time slice.

  • VOT / FTStep: Regression slope of competitor
    fixations as a function of VOT.

[Figure, left: Looks to P (0 to 0.2) vs. Time (ms), 0 to 2000 ms; legend -5 to -1; time slice t marked.]
[Figure, right: Competitor Fixations (0 to 0.14) vs. Distance from Boundary (VOT), -30 to 0, at Time = 320 ms; fitted line Y = M320x + B, with M320 = 0.]
Compute 2 effect sizes at each 20 ms time slice.

  • VOT / FTStep: Regression slope of competitor
    fixations as a function of VOT.

[Figure, left: Looks to P (0 to 0.2) vs. Time (ms), 0 to 2000 ms; legend -5 to -1; time slice t marked.]
[Figure, right: Competitor Fixations (0 to 0.14) vs. Distance from Boundary (VOT), -30 to 0, at Time = 720 ms; fitted line Y = M720x + B.]
Compute 2 effect sizes at each 20 ms time slice.

  • Vowel Length: Difference (D) between fixations after
    hearing long vs. short vowel.
  • Repeat for each time slice, subject.

[Figure: Competitor Fixations (0 to 0.25) vs. Time (ms), 0 to 2500 ms, at VOT = 30; Long vs. Short vowel curves; L-S = VL.]
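
A minimal sketch of the two per-slice effect sizes described above, for one subject at one time slice. The function names, the dictionary layout and the numbers are assumptions made for illustration; the slides do not say how the analysis was implemented.

```python
from statistics import mean


def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def vot_effect(fixations_by_vot):
    """Effect size M: slope of competitor fixations as a function of VOT.

    fixations_by_vot maps a VOT value (ms) to the competitor fixation
    proportion at one time slice (hypothetical layout).
    """
    vots = sorted(fixations_by_vot)
    return regression_slope(vots, [fixations_by_vot[v] for v in vots])


def vowel_effect(long_fixations, short_fixations):
    """Effect size D: fixations after a long vowel minus after a short one."""
    return long_fixations - short_fixations


# Made-up numbers for a single subject and a single time slice.
m = vot_effect({-30: 0.02, -20: 0.04, -10: 0.06, 0: 0.08})
d = vowel_effect(0.15, 0.10)
```

Repeating both calls for every 20 ms slice and every subject would give one M and one D per row, in the shape of a Subject / Time / VOT (M) / Vowel (D) table.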
Resulting dataset…

 Subject   Time      VOT (M)     Vowel (D)
 1         20        -0.0023     0.0094
           40        -0.0016     0.0095
           60        -0.0008     0.0108
           …
           2000      0.06021     0.123
 2         20        0.0014      0.0091
           40        0.0018      0.0088
           60        0.0029      0.0104
           …
           2000      0.0604      0.1223
 …

Compute average effect size at each time slice.

When does it (statistically) depart from 0?
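
One way to ask when the average effect size departs from 0 is a one-sample t statistic at each time slice, taking the first slice that exceeds a criterion. This is only a sketch: the slides do not name the test that was used, and `first_departure`, `t_crit` and the toy values below are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def first_departure(effects_by_time, t_crit):
    """Earliest time slice whose mean effect size reliably differs from 0.

    effects_by_time maps time (ms) to a list of per-subject effect sizes.
    t_crit is the critical |t| chosen by the caller (an assumed criterion).
    Returns None if no slice qualifies.
    """
    for time_ms in sorted(effects_by_time):
        values = effects_by_time[time_ms]
        sd = stdev(values)
        if sd == 0:
            continue  # t is undefined without variance; skip this slice
        t_stat = mean(values) / (sd / sqrt(len(values)))
        if abs(t_stat) > t_crit:
            return time_ms
    return None


# Made-up effect sizes for three subjects at three time slices.
toy = {
    20: [0.010, -0.012, 0.003],
    40: [0.020, -0.001, 0.004],
    60: [0.210, 0.190, 0.230],
}
onset = first_departure(toy, t_crit=4.303)  # 4.303: two-tailed .05, df = 2
```

Running the same function separately on the VOT (M) and Vowel (D) columns gives one onset time per cue, which is the comparison the temporal-dynamics question calls for.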
                                 Voicing Continua: Temporal Dynamics

[Figure: Effect Size (-0.1 to 1) vs. Time (ms), 0 to 2000 ms, for VOT and Vowel; time slice t marked.]

Voiced Sounds Only           VOT: 660 ms      Vowel: 820 ms
                                 Voicing Continua: Temporal Dynamics

[Figure: Effect Size (-0.1 to 1) vs. Time (ms), 0 to 2000 ms, for VOT and Vowel.]

Voiceless Sounds Only        VOT: 640 ms      Vowel: 780 ms
                                 Voicing Continua: Temporal Dynamics

[Figure: Effect Size (-0.1 to 1) vs. Time (ms), 0 to 2000 ms, for VOT and Vowel.]

Combined                     VOT: 560 ms      Vowel: 800 ms
                      Integration Results

                VOT    Vowel
Voicing   B     660    820           ✓
          P     640    780           ✓
          All   560    800           ✓




Manner    B
          W
          All
                                 Manner Continua: Temporal Dynamics

[Figure: Effect Size (-0.3 to 1) vs. Time (ms), 0 to 2000 ms, for FTStep and Vowel.]

Stops only                   FTStep: 840 ms   Vowel: 1340 ms
                                 Manner Continua: Temporal Dynamics

[Figure: Effect Size (-0.1 to 1) vs. Time (ms), 0 to 2000 ms, for FTStep and Vowel.]

Approximants only            FTStep: 280 ms   Vowel: 860 ms
                                 Manner Continua: Temporal Dynamics

[Figure: Effect Size (-0.1 to 1) vs. Time (ms), 0 to 2000 ms, for FTStep and Vowel.]

Combined                     FTStep: 620 ms   Vowel: 880 ms
                         Integration Results

                VOT/FT    Vowel
Voicing   B     660       820           ✓
          P     640       780           ✓
          All   560       800           ✓

Manner    B     840       1340          ✓
          W     280       860           ✓
          All   620       820           ✓
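
As a worked check of the ordering argument, the lag between the two cues can be read straight off this table. The sketch below assumes each entry is the time (ms) at which that effect first departs from 0, as given on the temporal-dynamics slides; the variable and function names are illustrative only.

```python
# Onset times (ms) copied from the table above: (VOT/FT, Vowel).
onsets = {
    ("Voicing", "B"): (660, 820),
    ("Voicing", "P"): (640, 780),
    ("Voicing", "All"): (560, 800),
    ("Manner", "B"): (840, 1340),
    ("Manner", "W"): (280, 860),
    ("Manner", "All"): (620, 820),
}


def cue_lags(table):
    """Vowel onset minus VOT/FT onset (ms) for each row of the table."""
    return {row: vowel - first_cue for row, (first_cue, vowel) in table.items()}


lag_ms = cue_lags(onsets)

# True when the VOT/FT effect comes before the vowel effect in every row.
first_cue_always_earlier = all(lag > 0 for lag in lag_ms.values())
```

A positive lag in every row is the pattern the lexical-integration account predicts (VOT / FTStep precedes vowel length); lags near zero would have favoured the phonemic "buffer" account.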
                         Replication Underway

                   VOT/FT    Vowel
Voicing      B     660       820          ✓
             P     640       780          ✓
             All   560       800          ✓
Voicing II   All   560       800          ✓
             B     560       800          ✓
             P     660       860          ✓

Manner       B     840       1340         ✓
             W     280       860          ✓
             All   620       820          ✓
Manner II    All   620       740          ✓
             B     440       680          ✓
             W     700       880          ✓
                                           Exp 2: Summary

Phonemes not computationally necessary for:
  • Phonetic categorization.
  • Asynchronous cue integration.

Between signal and words there doesn’t appear to be any
   •   Complex computation
   •   Nonlinearities

What role do intermediate units play?

Still may play a role in
   • Development
   • Reading
   • Production…
                                    Exp 2: Summary

Phonemes not computationally necessary for:
  • Phonetic categorization.
  • Asynchronous cue integration.


What is doing the work?

Lexical activation processes.
                                                  Overview

1) Speech perception and Spoken Word
   Recognition.

            2) Lexical activation is sensitive to fine-
               grained detail in speech.

3) Where we’re going, we don’t need
   phonemes: evidence for continuous
   information integration.
            4) Back in time: staying off the garden-
               path.
5) Forward in time: coping with (and
   benefiting from) phonological
   modification.
Argument thus far
• Continuous cues systematically affect lexical activation
  (contrary to CP & standard paradigm)
   - No need for sublexical units as “categorizers”
• Continuous acoustic cues are integrated lexically.
   - No need for sublexical units as “buffers”.
• Word recognition is about fundamentally continuous
  mappings from sound to meaning.

    What does that buy you? Temporal integration
                                              Misperception


What if initial portion of a stimulus was misperceived?

       Competitor still active
         - easy to activate it rest of the way.

       Competitor completely inactive
         - system will “garden-path”.

P(misperception) ∝ distance from boundary.

Gradient activation allows the system to hedge its bets.
barricade vs. parakeet        / beIrəkeId / vs. / peIrəkit /

  Input:     p/b   eI     r   ə   k      i     t…
     time

   Categorical Lexicon
  parakeet
 barricade


   Gradient Sensitivity
  parakeet
 barricade
                                             Misperception


Experiment 3

Can gradient maintenance of lexical alternatives prevent
system from wandering down “garden-path”?

By avoiding commitment to a discrete phoneme, can
the system elegantly recover from early
misperception?
                                                     come on,
                                                      gz.

                                                   lt’s go!
                                    Experiment 3 Methods

10 pairs of voiced/voiceless (b/p, d/t) items.

     Voiced          Voiceless      Overlap
     Bumpercar       Pumpernickel   6
     Barricade       Parakeet       5
     Bassinet        Passenger      5
     Blanket         Plankton       5
     Beachball       Peachpit       4
     Billboard       Pillbox        4
     Drain Pipes     Train Tracks   4
     Dreadlocks      Treadmill      4
     Delaware        Telephone      4
     Delicatessen    Television     4
                                                                   Experiment 3: Results

[Figure: Barricade -> Parricade. Fixations to Target (0 to 1) vs. Time (ms), 300 to 900 ms; one curve per VOT (0, 5, 10, 15, 20, 25, 30, 35).]

Faster activation of target as VOTs near lexical endpoint.
   Even within the non-word range.
                                                                   Experiment 3: Results

[Figure, two panels: Barricade -> Parricade (300 to 900 ms) and Parakeet -> Barakeet (300 to 1200 ms). Fixations to Target (0 to 1) vs. Time (ms); one curve per VOT (0, 5, 10, 15, 20, 25, 30, 35).]

Faster activation of target as VOTs near lexical endpoint.
   Even within the non-word range.
               Experiment 3: Garden path

b/parricade

Garden-path analysis:

Identify trials in which

    Time          Fixation
    Pre POD       Competitor
    Post POD      Target

Is the latency to switch to the target related to VOT?
    • Accelerated latency = more (residual) target activation
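
The trial selection and latency measure above can be sketched as follows. The trial format, the labels and both function names are hypothetical, and POD is read here as the point in the word after which the target and competitor diverge; the slides do not describe the actual analysis code.

```python
from statistics import mean


def switch_latency(fixations, pod_ms):
    """Latency (ms after the POD) of the first target fixation on a garden-path trial.

    fixations: list of (onset_ms, label) sorted by onset, with label one of
    "target", "competitor" or "other" (hypothetical trial format).
    Returns None unless the listener was on the competitor at the POD
    and moved to the target afterwards.
    """
    before = [label for onset, label in fixations if onset <= pod_ms]
    if not before or before[-1] != "competitor":
        return None
    for onset, label in fixations:
        if onset > pod_ms and label == "target":
            return onset - pod_ms
    return None


def latency_by_distance(trials):
    """Mean switch latency for each distance from the prototype (ms of VOT).

    trials: iterable of (distance, fixations, pod_ms) tuples.
    """
    grouped = {}
    for distance, fixations, pod_ms in trials:
        latency = switch_latency(fixations, pod_ms)
        if latency is not None:
            grouped.setdefault(distance, []).append(latency)
    return {distance: mean(values) for distance, values in grouped.items()}


# Made-up trials: the last one is not a garden-path trial and is dropped.
toy_trials = [
    (0, [(100, "competitor"), (400, "target")], 240),
    (0, [(100, "competitor"), (440, "target")], 240),
    (20, [(100, "competitor"), (500, "target")], 240),
    (20, [(100, "target")], 240),
]
mean_latency = latency_by_distance(toy_trials)
```

Plotting `mean_latency` against distance would give a time-to-target curve of the kind the question about VOT asks for.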


b/parricade

Is the latency to switch to the target related to VOT?
    • Accelerated latency = more (residual) target activation

Sparse data, but:     B: p=.002     P: p=.007

[Figure: Time to Target (100 to 260) vs. Distance from Prototype (VOT, ms), 0 to 30; Voiced and Voiceless curves.]
b/parricade

Is the latency to switch to the target related to VOT?
    • Accelerated latency = more (residual) target activation

More data…     VOT: p=.0001     Targ: p=.001     V x T: p>.1

[Figure: Time to Target (120 to 280) vs. Distance from Prototype (VOT, ms), 0 to 35; Voiced and Voiceless curves.]
                                   Experiment 3: Summary


Same gradiency seen in McMurray et al. (2002).

Facilitates ambiguity resolution (time-to-target)
   • 240 ms after VOT

Gradiency lasts…
             … and is useful


Idiosyncrasies: ?????
    • Attenuated gradiency for B
    • Effect of target.
    • Interactions with target.

    Effect of reduced VOT range?
                                                                                    Extensions

    Replication: longer continua (0-45 ms).

[Figure: Fixations to Target (0 to 0.9) vs. Time (ms), 300 to 1300 ms.]

    Exp 4
    Does the presence of the visual competitor (parakeet)
       artificially heighten competitor activation (and
       cause the effect)?
                                                                                    Experiment 4

    Exp 4
    Does the presence of the visual competitor (parakeet)
       artificially heighten competitor activation (and
       cause the effect)?

[Figure: Fixations to Target (0 to 0.9) vs. Time (ms), 300 to 1300 ms; legend: 0, 5, 10, 15, 20, 25, 30, 35, 40, 45.]
                                                              Experiment 4: Garden Path

b/parricade

Is the latency to switch to the target related to VOT?
    • Accelerated latency = more (residual) target activation

No Competitor        p=.0001

[Figure: Time to Target (100 to 250) vs. Distance from Prototype (VOT, ms), 0 to 40; Voiced and Voiceless curves.]
                                   Regressive Ambiguity Resolution

Gradiency:
   Gradient effect of within-category variation
   without minimal-pairs.
   Gradient effect long-lasting:
   mean POD = 240 ms.


Regressive ambiguity resolution:
   • Subphonemic gradations maintained until more
     information arrives.
   • Improves (or hinders) recovery from garden path.
   • Gradient lexical sensitivity prevents over-
     committing to a garden-path interpretation.
   • Lexical processes (parallel activation, gradiency)
     are pivotal: maintain interpretation.
                                                  Overview

1) Speech perception and Spoken Word
   Recognition.

            2) Lexical activation is sensitive to fine-
               grained detail in speech.

3) Where we’re going, we don’t need
   phonemes: evidence for continuous
   information integration.
            4) Back in time: staying off the garden-
               path.
5) Forward in time: coping with (and
   benefiting from) phonological
   modification.
                    Progressive Expectation Formation


Can within-category detail be used to predict
future acoustic/phonetic events?


Yes: Phonological regularities create systematic
within-category variation.

   • Predicts future events.
                                Experiment 5: Anticipation

Word-final coronal consonants (n, t, d) assimilate to the place
of articulation of the following segment.

          Maroong Goose        Maroon Duck
Place assimilation -> ambiguous segments
                         —anticipate upcoming material.

Input:      m… a… rr… oo… ng… g… oo…             s…
   time

 maroon
  goose
   goat
   duck
                                                                         Experiment 5: Anticipation

    Assimilation is subphonemic, continuous, not discrete.

[Figure, two panels: F2 Transitions in /æC/ Contexts (Frequency, 1550 to 1850 Hz) and F3 Transitions in /æC/ Contexts (Frequency, 2550 to 2800 Hz), each plotted by Pitch Period for coronal, assimilated, and labial contexts.]
Subject hears
   “select the maroon    duck”
   “select the maroon    goose”
   “select the maroong   goose”
   “select the maroong   duck” *


                                   We should see
                                   faster eye-
                                   movements to
                                   “goose” after
                                   assimilated
                                   consonants.
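
A minimal sketch of how such a prediction could be checked: compute the proportion of fixations on “goose” in each time bin, separately for assimilated and non-assimilated items. The sample format, the condition labels and `fixation_proportions` are assumptions for illustration, and the data are made up.

```python
from collections import defaultdict


def fixation_proportions(samples, target):
    """Proportion of samples on `target`, per (condition, time bin).

    samples: iterable of (condition, time_ms, fixated_object) tuples,
    one per trial and time bin (hypothetical format).
    """
    on_target = defaultdict(int)
    total = defaultdict(int)
    for condition, time_ms, fixated in samples:
        total[(condition, time_ms)] += 1
        if fixated == target:
            on_target[(condition, time_ms)] += 1
    return {key: on_target[key] / total[key] for key in total}


# Made-up samples: two trials per condition at a single 200 ms bin.
toy_samples = [
    ("assimilated", 200, "goose"),
    ("assimilated", 200, "goose"),
    ("non-assimilated", 200, "goose"),
    ("non-assimilated", 200, "duck"),
]
props = fixation_proportions(toy_samples, target="goose")
```

An earlier rise of the assimilated curve relative to the non-assimilated one would be the anticipatory pattern predicted here.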
                                                                                                Results

[Figure: Looks to “goose” as a function of time. Fixation Proportion (0 to 0.9) vs. Time (ms), 0 to 600 ms; Assimilated vs. Non Assimilated; marker at onset of “goose” + oculomotor delay.]

Anticipatory effect on looks to non-coronal.
[Figure: Looks to “duck” as a function of time. Fixation Proportion (0 to 0.3) vs. Time (ms), 0 to 600 ms; Assimilated vs. Non Assimilated; marker at onset of “goose” + oculomotor delay.]

Inhibitory effect on looks to coronal (duck, p=.024)
         a quick runm picks you up.

         a quick runm takes you down          ***


When /p/ is heard, the bilabial feature can be assumed to
come from assimilation (not an underlying /m/).



When /t/ is heard, the bilabial feature is likely to be
from an underlying /m/.
Subject hears
   “select the mud    drinker”
   “select the mudg   gear”      Critical Pair
   “select the mudg   drinker”
[Figure: Fixation Proportion (0 to 0.45) vs. Time (0 to 2000 ms); series: Initial Coronal: Mud Gear, Initial Non-Coronal: Mug Gear; markers: onset of “gear”, avg. offset of “gear” (402 ms).]



Mudg Gear is initially ambiguous with a late bias
towards “Mud”.
[Figure: Fixation Proportion (0 to 0.6) vs. Time (0 to 2000 ms); series: Initial Coronal: Mud Drinker, Initial Non-Coronal: Mug Drinker; markers: onset of “drinker”, avg. offset of “drinker” (408 ms).]


Mudg Drinker is also ambiguous with a late bias towards
“Mug” (the /g/ has to come from somewhere).
[Figure: Fixation Proportion (0 to 0.8) vs. Time (0 to 600 ms); series: Assimilated, Non Assimilated; marker: onset of “gear”.]



Looks to non-coronal (gear) following assimilated or
non-assimilated consonant.

In the same stimuli/experiment there is also a
progressive effect!
Phonological modification
has a benefit and a cost:
 • Creates predictive regularities.
 • Creates ambiguity.

Gradiency:
 • Enables anticipation.
 • Retain both items until ambiguity resolution.

[Diagram: Assimilated (ambiguous) Segment predicts Subsequent Consonant; Subsequent Consonant resolves ambiguity.]
                           What is the mechanism?

[Diagram: Assimilated (ambiguous) Segment predicts Subsequent Consonant; Subsequent Consonant resolves ambiguity.]

Lexical activation
   • Activation/competition processes retain partial
      representations.
   • Lexical processes are predictive by nature.

Prediction:
   • Lexical competition ought to inhibit these
       processes.
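
One way to picture how activation and competition could retain partial representations is a toy two-word network. This is a rough sketch under stated assumptions: the update rule, the inhibition and rate parameters, and the input values are all hypothetical, and it is not the model used in the talk.

```python
def step(acts, inputs, inhibition=0.5, rate=0.2):
    """One synchronous update: each word is driven by its input and inhibited by its rivals."""
    total = sum(acts)
    new_acts = []
    for a, inp in zip(acts, inputs):
        rivals = total - a
        a = a + rate * (inp - inhibition * rivals - a)
        new_acts.append(min(1.0, max(0.0, a)))  # keep activation in [0, 1]
    return new_acts

def run(acts, inputs, n_steps=50):
    """Apply the update repeatedly with a fixed bottom-up input."""
    for _ in range(n_steps):
        acts = step(acts, inputs)
    return acts

# Two hypothetical candidates, e.g. "mud" and "mug" after an ambiguous final consonant.
ambiguous = run([0.0, 0.0], inputs=[0.5, 0.5])   # equal support: both stay partly active
resolved = run(ambiguous, inputs=[0.9, 0.1])     # later input favors the first candidate

print([round(a, 2) for a in ambiguous], [round(a, 2) for a in resolved])
```

With equal input neither candidate wins, so both remain available; once the input tips toward one word, inhibition suppresses the other.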
                                                          Experiment 3: Extensions

Compare progressive effect as a function of competition.

[Figure: Green/m Boat. Looks to Labial (0 to 0.8) vs. Time (200 to 800 ms); series: Assim-Labials, Labials.]

Competition:
100 ms delay in effect.
Reduction in magnitude.

[Figure: Eight/Ape Babies. Looks to Labial (0 to 0.8) vs. Time (200 to 800 ms); series: Assimilated, Neutral.]
Sensitivity to subphonemic detail allows the system to
simultaneously cope with and harness graded phonological
modification.

   • Increase priors on likely upcoming events.
   • Decrease priors on unlikely upcoming events.
   • Retain ambiguity until resolution occurs.

Lexical processes may play a pivotal role.
                                                           Conclusions

    Lexical activation is exquisitely sensitive
    to within-category detail.

    This sensitivity enables integration over time at multiple
    levels and time-scales using normal word recognition
    mechanisms
      1) Phonetic (e.g. VOT / Rate): No
         need for “buffer”.

      2) Lexical (e.g. barricade/parakeet):
         regressive ambiguity resolution.

      3) Phonological (maroong goose):
         progressive facilitation, regressive
         ambiguity resolution.

[Diagram labels: Lexicon; Cue 1, Cue 2; time.]
                                       Summary

The standard paradigm

• Does not capture data.
• Limits our thinking in
  terms of how continuous
  detail might be used.

[Diagram, bottom to top: Sound, Phonemes, Words, Sense.]
                                        Summary

The standard paradigm

• Does not capture data.
• Limits our thinking in
  terms of how continuous
  detail might be used.

The Alternative

• Continuous phonetic cues.
• Integrated directly by
  normal lexical processes.
• Rich temporal integration.
• Computationally simple.

[Diagram, bottom to top: Sound, Words, Sense.]
                                                                 Conclusions

Integration over time at multiple levels.
    1) Phonetic (e.g. VOT / Rate)
    2) Lexical (e.g. barricade/parakeet)
    3) Phonological (maroong goose)

These are rich, general processes:

My lab
   • Sentential garden-paths.
   • V-to-V coarticulation and harmony.
   • Anticipatory R-coloration
   • Vowel Nasalization

Other folks
   • V-to-V coarticulation (Beddor, Fowler, Cole).
   • Assimilation (Byrd, Mitterer, Gow)
   • Nasalization (Dahan)
   • R-Coloration (Hillenbrand)
   • Vowel length/embedded words (Crosswhite, Salverda)
   • Prosodic domain (Keating)
   • Reduction (Manuel)
   • Misproduction (Goldrick)
   • Bilingual categories (Ju)
   • Allophonic variation (Samuel)
                                 Take home message


    Spoken language is defined by change.

     But the information to cope with it is
       in the signal—if we look online.

Within-category acoustic variation is signal, not
                    noise.
[Diagram: eye-tracking setup. Labels: IR Head-Tracker Emitters; Monitor; Head-Tracker Cam; Head; 2 Eye cameras; Subject Computer; Eyetracker Computer; computers connected via Ethernet.]
Misperception: Additional Results
10 Pairs of b/p items.
   • 0 – 35 ms VOT continua.


20 Filler items (lemonade, restaurant, saxophone…)

Option to click “X” (Mispronounced).

26 Subjects

1240 Trials over two days.
                                                                              Identification Results

[Figure: Response Rate (0.00 to 1.00) vs. VOT (0 to 35 ms), Barricade to Parricade continuum; series: Voiced, Voiceless, NW.]

[Figure: Response Rate (0.00 to 1.00) vs. VOT (0 to 35 ms), Barakeet to Parakeet continuum; series: Voiced, Voiceless, NW.]

Significant target responses even at extreme.

Graded effects of VOT on correct response rate.
                                                                  Phonetic “Garden-Path”


                        “Garden-path” effect:
                                Difference between looks to each target (b
                        vs. p) at same VOT.
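
The measure defined above is a per-VOT subtraction, which can be sketched as follows. The function name is an assumption, and the fixation values are made up purely to show the arithmetic; they are not results from the experiment.

```python
def garden_path_effect(looks_b_target, looks_p_target):
    """Per-VOT difference in target fixations: /b/-initial item minus /p/-initial item."""
    shared_vots = sorted(set(looks_b_target) & set(looks_p_target))
    return {vot: looks_b_target[vot] - looks_p_target[vot] for vot in shared_vots}

# Made-up fixation proportions keyed by VOT in ms (illustration only).
looks_barricade = {0: 0.60, 35: 0.40}
looks_parakeet = {0: 0.50, 35: 0.55}

effect = garden_path_effect(looks_barricade, looks_parakeet)
for vot, diff in effect.items():
    print(vot, round(diff, 2))
```

A positive value means more looks to the /b/-initial target at that VOT, a negative value more looks to the /p/-initial target.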

[Figure: Fixations to Target (0 to 1) vs. Time (ms), two panels: VOT = 0 (/b/) and VOT = 35 (/p/); series: Barricade, Parakeet.]
[Figure: Target. Garden-Path Effect (Barricade - Parakeet) vs. VOT (0 to 35 ms).]

[Figure: Competitor. Garden-Path Effect (Barricade - Parakeet) vs. VOT (0 to 35 ms).]

GP Effect:
Gradient effect of VOT.
Target: p<.0001
Competitor: p<.0001
Assimilation: Additional Results
                                  Exp 3 & 4: Conclusions


Within-category detail used in recovering from
assimilation: temporal integration.

   • Anticipate upcoming material
   • Bias activations based on context
      - Like barricade/parakeet: within-category
        detail retained to resolve ambiguity.

Phonological variation is a source of information.
                             Non-parametric approach?

[Diagram labels: Categories; VOT.]

• Competitive Hebbian Learning
  (Rumelhart & Zipser, 1986).
• Not constrained by a particular
  equation; can fill space better.
• Similar properties in terms of
  starting  and sparseness.
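
To make the idea of competitive learning concrete, here is a simplified winner-take-all sketch on a single VOT dimension. It is an assumption-laden illustration, not Rumelhart and Zipser's exact formulation and not the model from the talk: the function name, learning rate, starting centers, and the bimodal input distribution are all hypothetical.

```python
import random

def train_competitive(samples, centers, lr=0.05):
    """Winner-take-all learning: only the unit closest to each input moves toward it."""
    centers = list(centers)
    for x in samples:
        winner = min(range(len(centers)), key=lambda i: abs(centers[i] - x))
        centers[winner] += lr * (x - centers[winner])
    return centers

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical bimodal VOT input (ms): one mode near 0, one near 40.
samples = [rng.gauss(rng.choice([0.0, 40.0]), 5.0) for _ in range(2000)]

centers = train_competitive(samples, centers=[15.0, 25.0])
print([round(c, 1) for c in centers])
```

No category equation is fitted here; each unit simply drifts toward the inputs it wins, so the units settle near the two modes of the assumed distribution.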
                                                   Voicing Continua

9-step b/p continua
VOT varied.

   beach/peach
   bees/peas
   beak/peak

[Figure: %P (0 to 1) vs. VOT (0 to 40); series: Long, Short; N=29.]
                                                                                                                                             Voicing Continua

Looks to competitor

[Figure, left panel: Looks to P (0 to 0.2) vs. Time (0 to 2000 ms); panel title: Clicked on; series: -5, -4, -3, -2, -1.]

[Figure, right panel: Looks to B (0 to 0.2) vs. Time (0 to 2000 ms); panel title: Clicked on; series: +1, +2, +3, +4.]

          All       Proto
   B      .0001     .0001
   P      .008      >.1

				