Continuous acoustic detail affects spoken word recognition

Implications for cognition, development and language disorders.

Bob McMurray
University of Iowa
Dept. of Psychology
                    Collaborators

Richard Aslin
Michael Tanenhaus
David Gow
J. Bruce Tomblin




 Joe Toscano
 Cheyenne Munson
 Dana Subik
 Julie Markant
                         Why Speech and Word Recognition

1) Interface between perception and cognition.
   - Basic Categories            - Meaning
   - Continuous Input -> Discrete representations.

2) Meaningful stimuli are almost always temporal.
   - Music         - Visual Scenes (across saccades)
   - Language

3) We understand the:
   - Cognitive processes (word recognition)
   - Perceptual processes (speech perception)
   - Ecology of the input (phonetics)

4) Speech is important: disordered language.
                              Divisions, Divisions…

                          Perception (& Action)   Cognition
  Psychology              Speech Perception       Word Recognition, Sentence Processing
  Linguistics             Phonetics               Phonology, The Lexicon
  Speech/Language         Speech, Hearing         Language
    Pathology
                                  Divisions, Divisions…


Divisions are useful for framing research and focusing
questions.

But:

       Divisions between domains of study

                   can become…

       Implicit models of cognitive processing.
                 Divisions in Spoken Language Understanding

Acoustic input → Speech Perception
   • Categorization of acoustic input into sublexical units.

Sublexical Units:   /a/   /la/   /ip/   /b/   /l/   /p/

Sublexical units → Word Recognition → Lexicon
   • Identification of target word from active sublexical units.
                                                Divisions yield processes

Acoustic input → Speech Perception
   • Pattern Recognition
   • Normalization Processes
   • Stream Segregation

Sublexical Units:   /a/   /la/   /ip/   /b/   /l/   /p/

Sublexical units → Word Recognition → Lexicon
   • Competition
   • Activation
   • Constraint Satisfaction
                                     Processes yield models

Acoustic input → Speech Perception   (reduce continuous variance)
   • Extract invariant phonemes and features.
   • Discard continuous variation.

Sublexical Units:   /a/   /la/   /ip/   /b/   /l/   /p/

Sublexical units → Word Recognition → Lexicon   (reduce variance)
   • Identify single referent.
   • Ignore competitors.
                             The Variance Reduction Model

[Diagram: input → remove variance → Phonemes (etc) → remove variance → Words]

Variance Reduction Model (VRM)
Understanding speech is a process of progressively extracting invariant,
discrete representations from variable, continuous input.

Continuous speech cues play a minimal role in word recognition (and probably
wouldn't be helpful anyways).
                                     Temporal Integration

The VRM might apply if speech were static.

                                  "Goon"

         Goal:   Identify /u/
         Signal: Low F1, F2; High F3
         Noise:  Initially, F2 decreasing
                 Later, F2 increasing
                 Presence of anti-formant
                 → Variance Reduction Mechanisms
                                      Temporal Integration

But the dynamic properties make it more difficult.

  Gone. Maybe in STM?          "Goon"          Hasn't happened yet.

         Goal:   Identify /u/
         Signal: Low F1, F2; High F3
         Noise:  Initially, F2 decreasing
                 Later, F2 increasing
                 Presence of anti-formant
                                       Temporal Integration

But the dynamic properties make it more difficult.

  Gone. Maybe in STM?          "Goon"          Hasn't happened yet.

         Goal:    Identify /u/
         Signal:  Low F1, F2; High F3
         Signal': Initially, F2 decreasing   (prior /g/)
                  Later, F2 increasing       (upcoming /n/)
                  Presence of anti-formant   (upcoming /n/)
                  → Variance Utilization Mechanisms
                                                       Goals

1) Replace the Variance Reduction Model
   (input → remove variance → Phonemes (etc) → remove variance → Words)
   with the Variance Utilization Model.

2) Normal lexical activation processes can serve as
   variance utilization mechanisms.

3) Speculatively (and not so speculatively) examine
   the consequences for:
    •   Temporal Integration / Short Term Memory.
    •   Development
    •   Non-normal Development
                                 Outline

1) Review
    • Origins of the VRM.
    • Spoken Word Recognition.
2) Empirical Test
3) The VUM
    •   Lexical Locus
    • Temporal Integration
    • SLI proposal
4) Developmental Consequences
    • Empirical Tests
    • Computational Model
    • CI proposal
                                             Word Recognition

Online Spoken Word Recognition

• Information arrives sequentially
• Fundamental Problem: At early points in time, signal is
  temporarily ambiguous.

                        "ba… kery"

   Competitors eliminated over time (✗): basic, barrier, barricade, bait, baby
   Surviving candidate: bakery

• Later arriving information disambiguates the word.
                                           Word Recognition

Current models of spoken word recognition

• Immediacy: Hypotheses formed from the earliest
  moments of input.

• Activation Based: Lexical candidates (words) receive
  activation to the degree they match the input.

• Parallel Processing: Multiple items are active in
  parallel.

• Competition: Items compete with each other for
  recognition.
                                     Word Recognition


Input:    b...   u…   tt…   e…   r
  time

beach
butter
 bump
 putter
   dog
                                        Word Recognition


These processes have been well defined for a phonemic
representation of the input.

              [Phonemic transcription of the input]

Considerably less ambiguity if we consider subphonemic
information.
    • Bonus: processing dynamics may solve problems in
      speech perception.

Example: subphonemic effects of motor processes.
                                                Coarticulation

Any action reflects future actions as it unfolds.

Example: Coarticulation
   Articulation (lips, tongue…) reflects current, future and
   past events.

   Subtle subphonemic variation in speech reflects temporal
   organization.
   Example: "net" vs. "neck"

   Sensitivity to these perceptual details might yield earlier disambiguation.

   Lexical activation could retain these perceptual details.
                                               Review:




These processes have largely been ignored
because of a history of evidence that perceptual
variability gets discarded.


      Example: Categorical Perception
                                            Categorical Perception


[Figure: identification (% /pa/) and discrimination functions along a B–P VOT
 continuum]

 • Sharp identification of tokens on a continuum.
 • Discrimination poor within a phonetic category.

Subphonemic variation in VOT is discarded in favor of a
discrete symbol (phoneme).
                                  Categorical Perception


Evidence against the strong form of Categorical
Perception from psychophysical-type tasks:

      Discrimination Tasks
       Pisoni and Tash (1974)
       Pisoni & Lazarus (1974)
       Carney, Widin & Viemeister (1977)
      Training
       Samuel (1977)
       Pisoni, Aslin, Perey & Hennessy (1982)
      Goodness Ratings
       Miller (1997)
       Massaro & Cohen (1983)
                                Variance Reduction Model

CP enabled a fundamental independence of speech perception & spoken word
recognition.

[Diagram: input → remove variance → Phonemes (etc) → remove variance → Words]




Evidence against CP seen as supporting VRM
(auditory vs. phonological processing mode).

Critical Prediction: continuous variation in the
signal should not affect word recognition.
                                    Experiment 1




  Does within-category acoustic detail
   systematically affect higher level
              language?


Is there a gradient effect of subphonemic
       detail on lexical activation?
                        McMurray, Aslin & Tanenhaus (2002)

A gradient relationship would yield systematic effects of
subphonemic information on lexical activation.


If this gradiency is useful for temporal integration, it must be
preserved over time.


Need a design sensitive to both acoustic detail and detailed
temporal dynamics of lexical activation.
                                                         Acoustic Detail

Use a speech continuum: more steps yields a better picture of the acoustic
mapping.

KlattWorks: generate synthetic continua from natural
speech.

    9-step VOT continua (0-40 ms)

    6 pairs of words.
        beach/peach      bale/pale       bear/pear
        bump/pump        bomb/palm       butter/putter

    6 fillers.
        lamp     leg     lock   ladder   lip      leaf
        shark    shell   shoe   ship     sheep    shirt
Acoustic Detail
                                         Temporal Dynamics


          How do we tap on-line recognition?
          With an on-line task: Eye-movements

Subjects hear spoken language and manipulate objects in
a visual world.

Visual world includes set of objects with interesting
linguistic properties.

   a beach, a peach, and some unrelated items.

Eye-movements to each object are monitored throughout
the task.

            Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995
                                          Temporal Dynamics

Why use eye-movements and visual world paradigm?

   • Relatively natural task.

   • Eye-movements generated very fast (within 200ms of
     first bit of information).

   • Eye movements time-locked to speech.

   • Subjects aren’t aware of eye-movements.

   • Fixation probability maps onto lexical activation.
              Task

A moment to view the items.

"Bear"

Repeat 1080 times.
                                                                    Identification Results


[Figure: proportion /p/ responses as a function of VOT (0–40 ms, B to P)]

High agreement across subjects and items for category boundary.

   Category boundary, by subject:   17.25 +/- 1.33 ms
   Category boundary, by item:      17.24 +/- 1.24 ms
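A minimal sketch of how such a boundary can be estimated (not the study's analysis code): fit a logistic function to identification proportions and read off its midpoint. The VOT steps match the continuum; the response proportions below are made-up values for illustration.

# Hypothetical identification data; boundary = midpoint of the fitted logistic.
import numpy as np
from scipy.optimize import curve_fit

vot = np.array([0, 5, 10, 15, 20, 25, 30, 35, 40], dtype=float)          # ms
p_resp = np.array([0.02, 0.03, 0.06, 0.35, 0.70, 0.92, 0.97, 0.98, 0.99])  # made-up %/p/

def logistic(x, boundary, slope):
    """Proportion of /p/ responses as a function of VOT."""
    return 1.0 / (1.0 + np.exp(-slope * (x - boundary)))

(boundary, slope), _ = curve_fit(logistic, vot, p_resp, p0=[20.0, 0.5])
print(f"Estimated category boundary: {boundary:.2f} ms (slope {slope:.2f})")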
                                           Eye-Movement Analysis

[Figure: individual trials' fixation records (offset by ~200 ms oculomotor delay)
 are averaged into % fixations over time]

   Target = Bear
   Competitor = Pear
   Unrelated = Lamp, Ship
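A minimal sketch of the averaging idea in this figure, using made-up trial data: each trial is a time series coding which object is fixated, and trials are averaged into fixation-proportion curves. The object codes and sampling step are illustrative assumptions.

import numpy as np

OBJECTS = ["target", "competitor", "unrelated"]

def fixation_proportions(trials):
    """trials: list of 1-D arrays; each entry codes the fixated object
    (0=target, 1=competitor, 2=unrelated, -1=none) at one time sample."""
    data = np.stack(trials)                           # trials x time samples
    props = {}
    for code, name in enumerate(OBJECTS):
        props[name] = (data == code).mean(axis=0)     # proportion of trials fixating
    return props

# Example: 5 fake trials, 500 samples each (e.g. 4 ms steps = 2000 ms)
rng = np.random.default_rng(0)
fake_trials = [rng.integers(-1, 3, size=500) for _ in range(5)]
curves = fixation_proportions(fake_trials)
print(curves["target"][:10])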
                                                                  Eye-Movement Results



[Figure: fixation proportions over time (0–2000 ms) for VOT=0 ms and VOT=40 ms
 trials: target, competitor, and unrelated items]

             More looks to competitor than unrelated items.
                                                                                 Eye-Movement Results


Given that
   • the subject heard "bear"
   • and clicked on "bear"…
How often was the subject looking at the "pear"?

[Schematics: "Categorical Results" vs. "Gradient Effect" predictions for target
 and competitor fixation curves over time]
                                                                                Eye-Movement Results


[Figure: competitor fixations over time since word onset (ms), by VOT step
 (0–40 ms in 5 ms steps), for "b" and "p" responses]


                       Long-lasting gradient effect: seen throughout the
                       timecourse of processing.
                                                                        Eye-Movement Results


[Figure: competitor fixations as a function of VOT (ms): looks to the competitor
 for "b" and "p" responses, with the category boundary marked]

Area under the curve:
   Clear effects of VOT:   B: p=.017*    P: p<.001***
   Linear trend:           B: p=.023*    P: p=.002***
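A rough sketch of this kind of analysis on simulated curves (none of the numbers are the experimental values): compute the area under each VOT step's competitor-fixation curve, then test for a linear trend over the within-category steps on the /b/ side.

import numpy as np
from scipy import stats

time = np.arange(0, 2000, 4)                 # ms
vots = np.arange(0, 45, 5)                   # 0-40 ms VOT steps

# Simulated competitor-fixation curves: more looks as VOT nears the boundary
rng = np.random.default_rng(1)
curves = {v: 0.02 + 0.002 * (20 - abs(v - 20)) * np.exp(-time / 800)
             + rng.normal(0, 0.002, time.size)
          for v in vots}

auc = {v: np.trapz(curves[v], time) for v in vots}      # area under each curve

# Linear trend over the /b/ side (VOT 0-15 ms), analogous to the slide's test
b_steps = [0, 5, 10, 15]
slope, intercept, r, p, se = stats.linregress(b_steps, [auc[v] for v in b_steps])
print(f"Linear trend over /b/ steps: slope={slope:.3f}, p={p:.3f}")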
                                                                        Eye-Movement Results


[Figure: competitor fixations as a function of VOT (ms), unambiguous stimuli only]

Unambiguous Stimuli Only
   Clear effects of VOT:   B: p=.014*    P: p=.001***
   Linear trend:           B: p=.009**   P: p=.007**
                                                  Summary


Subphonemic acoustic differences in VOT have gradient
effect on lexical activation.
   • Gradient effect of VOT on looks to the competitor.
   • Effect holds even for unambiguous stimuli.
   • Seems to be long-lasting.

Consistent with growing body of work using priming
(Andruski, Blumstein & Burton, 1994; Utman, Blumstein &
Burton, 2000; Gow, 2001, 2002).
                                                       Extensions

Basic effect has been extended to other phonetic cues
   - a general property of word recognition…

       Voicing (b/p)¹
       Laterality (l/r), Manner (b/w), Place (d/g)¹
       Vowels (i/I, /)²
       Natural Speech (VOT)³
    X  Metalinguistic Tasks³

[Display: four-picture screen with target "Bear" (B, P, L, Sh items)]

¹ McMurray, Clayards, Tanenhaus & Aslin (2004)
² McMurray & Toscano (in prep)
³ McMurray, Aslin, Tanenhaus, Spivey and Subik (submitted)
                                                               Lexical Sensitivity

Basic effect has been extended to other phonetic cues
   - a general property of word recognition…

       Voicing (b/p)¹
       Laterality (l/r), Manner (b/w), Place (d/g)¹
       Vowels (i/I, /)²
       Natural Speech (VOT)³
    X  Metalinguistic Tasks³

[Figure: competitor fixations by VOT (ms) for Response=B and Response=P, with
 the category boundary marked]

¹ McMurray, Clayards, Tanenhaus & Aslin (2004)
² McMurray & Toscano (in prep)
³ McMurray, Aslin, Tanenhaus, Spivey and Subik (submitted)
                              The Variance Utilization Model

1) Word recognition is systematically sensitive to
   subphonemic acoustic detail.


2) Acoustic detail is represented as gradations in activation
   across the lexicon.


3) Normal word recognition processes do the work of:
   • Maintaining detail
   • Sharpening categories
   • Anticipating upcoming material
   • Resolving prior ambiguity.
                              The Variance Utilization Model


      Input:    b...    u…       m…        p…
        time

        bump
b/p
        pump
        dump
          bun
      bumper
        bomb

       Gradations in phonetic cues are preserved as relative
       lexical activation.
                              The Variance Utilization Model


      Input:    b...    u…       m…        p…
        time

        bump
b/d     pump
        dump
          bun
      bumper
        bomb

       Gradations in phonetic cues are preserved as relative
       lexical activation.
                                The Variance Utilization Model


   Input:     b...       u…         m…         p…
      time

      bump
      pump
Vowel
length dump
       bun
    bumper
      bomb

     Non-phonemic distinctions preserved.
              (e.g. vowel length: Gow & Gordon, 1995;
              Salverda, Dahan & McQueen 2003)
                               The Variance Utilization Model


      Input:    b...     u…        m…        p…
        time

        bump
        pump
n/m
        dump
          bun
                                         n/m info lost

      bumper
        bomb

       Material only retained until it is no longer needed.
       Words are a conveniently sized unit.
                        The Variance Utilization Model


Input:    b...    u…       m…        p…
  time

  bump
  pump
  dump
    bun
bumper
  bomb

 No need for explicit short-term memory: lexical
 activation persists over time.
                       The Variance Utilization Model


Input:    b...    u…       m…       p…
  time

  bump
  pump
  dump
    bun
bumper
  bomb

 Lexical competition: Perceptual warping (à la CP)
 results from natural competition processes.
                             The Variance Utilization Model

Current models of spoken word recognition

• Immediacy:           Phonetic cues not simultaneous,
                       Activation retains early cues.

• Activation Based:    Graded response to graded input.

• Parallel Processing: Preserves alternative
                       interpretations until confident.
                       Anticipatory activation for
                       future possibilities.

• Competition:         Non-linear transformation of
                       perceptual space.
                             The Variance Utilization Model

Current models of spoken word recognition

• Immediacy:           Phonetic cues not simultaneous,
                       Activation retains early cues.
• Parallel Processing: Preserves alternative
                       interpretations until confident.
                       Anticipatory activation for
                       future possibilities.

Can lexical activation help integrate continuous acoustic
cues over time?
   • Regressive ambiguity resolution.
   • Anticipation of upcoming material.
         Experiment 2: Regressive Ambiguity Resolution




How long are gradient effects of within-category
detail maintained?

Can subphonemic variation play a role in ambiguity
resolution?

How is information at multiple levels integrated?
                                                  Misperception


What if the initial portion of a stimulus was misperceived?

       Competitor still active
         - easy to activate it the rest of the way.

       Competitor completely inactive
         - system will "garden-path".

P(misperception) varies with distance from boundary.

Gradient activation allows the system to hedge its bets.
                                               Misperception

barricade vs. parakeet        / beIrəkeId / vs. / peIrəkit /

  Input:     p/b   eI     r   ə   k      i     t…
     time

   Categorical Lexicon
  parakeet
 barricade


   Gradient Sensitivity
  parakeet
 barricade
         Methods (McMurray, Tanenhaus & Aslin, in prep)

10 Pairs of b/p items.

     Voiced          Voiceless      Overlap
     Bumpercar       Pumpernickel   6
     Barricade       Parakeet       5
     Bassinet        Passenger      5
     Blanket         Plankton       5
     Beachball       Peachpit       4
     Billboard       Pillbox        4
     Drain Pipes     Train Tracks   4
     Dreadlocks      Treadmill      4
     Delaware        Telephone      4
     Delicatessen    Television     4
    Methods




                                                                   Eye Movement Results


[Figure: fixations to target ("barricade" → "parricade") over time (ms),
 by VOT step (0–35 ms)]




                      Faster activation of target as VOTs near lexical endpoint.

                                 --Even within the non-word range.
                                                                        Eye Movement Results


[Figure: fixations to target over time (ms) for "barricade" → "parricade" and
 "parakeet" → "barakeet", by VOT step (0–35 ms)]




                      Faster activation of target as VOTs near lexical endpoint.

                                 • Even within the non-word range.
                                                                          Eye Movement Results

[Figure: effect size of VOT vs. lexical status over time (0–1600 ms)]


Effect of VOT reduced as lexical information takes over.
                                              Experiment 2b


Are results driven by the presence of the visual competitor?
        or
Is this a natural process of lexical activation?



                       [Display with the competitor removed: "Look, Ma, no parakeet!"]
                                                                                                                Experiment 2b: Results

[Figure: looks to "barricade" (barricade → parricade) and looks to "parakeet"
 (parakeet → barakeet) over time (ms), by VOT step (0–45 ms)]



                           • Effect found even without visual competitor.
                           • Regressive ambiguity resolution is a general property
                             of lexical processes.
                                 Experiment 2 Conclusions

Gradient effect of within-category variation without
minimal pairs.

Gradient effect long-lasting: mean POD (point of disambiguation) = 240 ms.

Effect is not driven by visual context.

Regressive ambiguity resolution:
   • Subphonemic gradations maintained until more
     information arrives.
   • Subphonemic gradation not maintained after
     POD.
   • Subphonemic gradation can improve (or hinder)
     recovery from garden path.
                             The Variance Utilization Model

Current models of spoken word recognition

• Immediacy:           Phonetic cues not simultaneous,
                       Activation retains early cues.
• Parallel Processing: Preserves alternative
                       interpretations until confident.
                       Anticipatory activation for
                       future possibilities.

Can lexical activation help integrate continuous acoustic
cues over time?
   • Regressive ambiguity resolution. ✓
   • Anticipation of upcoming material. ?
                     Progressive Expectation Formation


Can within-category detail be used to predict
future acoustic/phonetic events?


Yes: Phonological regularities create systematic
within-category variation.

   • Predicts future events.




                           (Gow & McMurray, in press)
                                 Experiment 3: Anticipation

Word-final coronal consonants (n, t, d) assimilate the place
of the following segment.

          Maroong Goose        Maroon Duck
Place assimilation -> ambiguous segments that anticipate upcoming material.

Input:      m… a… rr… oo… ng… g… oo…             s…
   time

 maroon
  goose
   goat
   duck
                                         Methods

Subject hears:
   "select the maroon    duck"
   "select the maroon    goose"
   "select the maroong   goose"
   "select the maroong   duck" *

We should see faster eye-movements to "goose" after assimilated consonants.
                                                                                               Results

[Figure: looks to "goose" as a function of time (ms) from the onset of "goose"
 plus oculomotor delay, for assimilated vs. non-assimilated consonants]



Anticipatory effect on looks to non-coronal.
                                                                                                  Results


[Figure: looks to "duck" as a function of time (ms) from the onset of "goose"
 plus oculomotor delay, for assimilated vs. non-assimilated consonants]



Inhibitory effect on looks to coronal (duck, p=.024)
                                               Summary


Sensitivity to subphonemic detail:
   • Increase priors on likely upcoming events.
   • Decrease priors on unlikely upcoming events.
   • Active Temporal Integration Process.


Occasionally assimilation creates ambiguity
  • Resolves prior ambiguity: mudg drinker
  • Similar to experiment 2…

   • Progressive effect delayed 200ms by lexical
     competition—supports lexical locus.
                                               Adult Summary


Lexical activation is exquisitely sensitive to within-
category detail.


This sensitivity is useful to integrate material over time.

   • Regressive Ambiguity resolution.
   • Progressive Facilitation

Underpins a potentially lexical role in speech perception.
                          Consequences for Language Disorders

Word Recognition: not separable from speech perception.

Specific Language Impairment => Deficits in:

   • Speech Perception: Less categorical perception
     (some debate: Thibodeaux & Sussman, 1979; Coady, Kluender &
     Evans, in press; Manis et al, 1997; Serniclaes et al, 2004; Van Alphen
     et al, 2004)


   • Word Recognition: Slower recognition.
     (Montgomery, 2002; Dollaghan, 1998)

Could word recognition deficits account for apparent
perceptual deficits?
                        The Variance Utilization Model


Input:    b...    u…       m…        p…
  time

  bump
  pump
  dump
    bun
bumper
  bomb


 Lexical competition: Perceptual warping (à la
 CP) results from natural competition processes.
                               The Variance Utilization Model


Categorical perception:
   • Stimuli in the same category become closer in
      perceptual space (e.g. Goldstone, 2001)

Lexical competition:
   • Most active lexical candidate inhibits
      alternatives.
   • Becomes more active.
   • More similar to prototype…
   •   Feeds back to alter phoneme representations
       (Magnuson, McMurray, Tanenhaus & Aslin, 2003)

   •   Two versions of same word (category) become
       more similar
                            The Variance Utilization Model

                             b      p
   Input:                   80     20

                          beach  peach
   Activates (words):       80     20

   Competes (words):        90     10

                             b      p
   Feedback (phonemes):     90     10      ← Critical step: input warped.

   [90 10] is more similar to the prototype, [100 0].
   Perceptual space is warped.

If competition is suppressed (e.g. by a low-familiarity word)…
   …we should see less CP:
   greater sensitivity to within-category detail.
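A toy sketch of the warping idea above, using the slide's [80 20] example. The competition rule, its strength, and the resulting numbers are illustrative assumptions, not the actual model.

import numpy as np

def competition(acts, strength=0.2, steps=1):
    """Each word inhibits the other in proportion to its own activation."""
    a = np.array(acts, dtype=float)
    for _ in range(steps):
        inhibition = strength * a[::-1]           # input from the competitor
        a = np.clip(a - inhibition + strength * a, 0, None)
        a = 100 * a / a.sum()                     # renormalize to the slide's scale
    return a

phoneme_input = np.array([80.0, 20.0])            # graded /b/-/p/ evidence
lexical = competition(phoneme_input)              # ≈ [92, 8], sharpened toward [100, 0]
feedback = lexical                                # phoneme layer warped toward the prototype
print("input:", phoneme_input, "-> after competition/feedback:", feedback.round(1))

# Suppressing competition (strength=0) leaves the graded [80 20] pattern intact,
# i.e. more within-category sensitivity and less CP-like warping.
print("no competition:", competition(phoneme_input, strength=0.0).round(1))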
                     Consequences for Language Disorders


Visual World Paradigm: ideal test

   • Simple task: useable with many populations.

   • No meta-linguistic knowledge required.

   • Used to examine:

      - Lexical Activation (Allopenna et al, 1998)
      - Lexical Competition (Dahan et al, 2001)
      - Within-category sensitivity (McMurray et al, 2002)
                    Consequences for Language Disorders

                                      (with J. Bruce Tomblin, V.
Proposed Research Program                 Samelson, and S. Lee)

   Population: SLI & Normal Adolescents
               16-17 y.o.
               Iowa Longitudinal Study (Tomblin et al)

   Step 1:     Word Familiarity (~200 words)
   Step 2:     Basic Word Recognition
               Stimuli: Beaker, Beetle, Speaker, etc.
   Step 3:     Frequency effects
               Familiar words more active than unfamiliar.
   Step 4:     Gradiency (sensitivity to VOT) suppressed
               for familiar words (high competition).
   Step 5:     How do we buttress lexical activation?
                                      Consequences of VUM

Word recognition sensitive to perceptual detail.
  • Temporal integration.

Word recognition supports perceptual processes.
  • Hypothesis: related to SLI



Continuous variability NOT discarded during recognition.

   Does this change how we think about development?
                                             Development

Historically, work in speech perception has been linked
to development.

Sensitivity to subphonemic detail forces us to revise our view
of development.


Use: Infants face additional problems:

   No lexicon available to clean up noisy input: rely on
   acoustic regularities.

   Extracting a phonology from the series of utterances.
                                           Development

Sensitivity to subphonemic detail:

For 30 years, virtually all attempts to address this
question have yielded categorical discrimination (e.g.
Eimas, Siqueland, Jusczyk & Vigorito, 1971).


   Exception: Miller & Eimas (1996).
     • Only at extreme VOTs.
     • Only when habituated to non-
       prototypical token.
                                                     Use?


Nonetheless, infants possess abilities that would
require within-category sensitivity.


  • Infants can use allophonic differences at word
    boundaries for segmentation (Jusczyk, Hohne
     & Bauman, 1999; Hohne & Jusczyk, 1994)

  • Infants can learn phonetic categories from
    distributional statistics (Maye, Werker &
    Gerken, 2002; Maye & Weiss, 2004).
                                Statistical Category Learning


Speech production causes clustering along contrastive
phonetic dimensions.

E.g. Voicing / Voice Onset Time
       B:     VOT ~ 0
       P:     VOT ~ 40


 Within a category, VOT forms a Gaussian distribution.


 Result: Bimodal distribution of VOT, with modes near 0 ms and 40 ms.
                                    Statistical Category Learning

To statistically learn speech categories, infants must:

     • Record frequencies of tokens at each value along a
       stimulus dimension.

     • Extract categories from the distribution.

[Figure: frequency distribution of tokens along VOT (0–50 ms), with +voice
 and -voice modes]

    • This requires the ability to track specific VOTs.
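A small sketch of the input the learner faces, assuming two Gaussian production categories with the means above; the standard deviations and sample size are made up. The learner's job is to track token frequencies along the VOT dimension.

import numpy as np

rng = np.random.default_rng(42)
vot_tokens = np.concatenate([
    rng.normal(0, 6, 500),     # /b/-like productions
    rng.normal(40, 8, 500),    # /p/-like productions
])

# "Record frequencies of tokens at each value along the dimension"
counts, edges = np.histogram(vot_tokens, bins=np.arange(-20, 65, 5))
for lo, n in zip(edges[:-1], counts):
    print(f"{lo:4.0f} to {lo + 5:3.0f} ms: {'#' * (n // 10)}")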
                               Statistical Category Learning

Known statistical learning abilities (Maye et al) predict:

   •   Within category sensitivity.

   •   Graded structure to category.

Why no demonstrations?
                               Statistical Category Learning

Why no demonstrations of sensitivity?

   • Habituation
          Discrimination not ID.
          Possible selective adaptation.
          Possible attenuation of sensitivity.

   • Synthetic speech
          Not ideal for infants.

   • Single exemplar/continuum
          Not necessarily a category representation

Experiment 4: Reassess issue with improved methods.
                                                        HTPP


Head-Turn Preference Procedure
                                   (Jusczyk & Aslin, 1995)

Infants exposed to a chunk of language:
      • Words in running speech.
      • Stream of continuous speech (à la the statistical learning
        paradigm).
      • Word list.

Memory for exposed items (or abstractions) assessed:
    • Compare listening time between consistent and
      inconsistent items.
                                         HTPP




Test trials start with all lights off.
                       HTPP




Center Light blinks.
                                       HTPP




Brings infant’s attention to center.
                                 HTPP




One of the side-lights blinks.
                                       HTPP



                                   Beach…
                                   Beach…
                                   Beach…




When infant looks at side-light…
      …he hears a word
                                HTPP




…as long as he keeps looking.
                                                            Methods


 7.5-month-old infants exposed to either 4 b-words or 4 p-words.

 80 repetitions total.

                                        b-exposed   p-exposed
 Exposure words:                        Bomb        Palm
                                        Bear        Pear
                                        Bail        Pail
                                        Beach       Peach

 Form a category of the exposed class of words.

 Measure listening time on…
    Original words:                     Bear        Pear
    Competitors:                        Pear        Bear
    VOT closer to boundary:             Bear*       Pear*

 McMurray & Aslin, 2005
                                                  Methods

Stimuli constructed by cross-splicing naturally
produced tokens of each end point.

B:     M= 3.6 ms VOT
P:     M= 40.7 ms VOT

B*:    M=11.9 ms VOT
P*:    M=30.2 ms VOT

B* and P* were judged /b/ or /p/ at least 90%
consistently by adult listeners.

B*: 97%
P*: 96%
                                     Novelty or Familiarity?

Novelty/Familiarity preference varies across infants and
experiments.

We’re only interested in the middle stimuli (b*, p*).

Infants were classified as novelty or familiarity preferring
by performance on the endpoints.

                 Novelty   Familiarity
   B-exposed       36          16
   P-exposed       21          12

Within each group, will we see evidence for gradiency?
                                         Novelty or Familiarity?

After being exposed to
       bear… beach… bail… bomb…

Infants who show a novelty effect…
       …will look longer for pear than bear.


What about in between?

[Schematic: predicted listening times for Bear, Bear*, Pear under categorical
 vs. gradient accounts]
                                                                    Results

Novelty infants (B: 36   P: 21)

[Figure: listening time (ms) for Target, Target*, and Competitor, by exposure
 group (B or P)]

   Target vs. Target*:       p<.001
   Competitor vs. Target*:   p=.017
                                                                     Results

Familiarity infants (B: 16   P: 12)

[Figure: listening time (ms) for Target, Target*, and Competitor, by exposure
 group (B or P)]

   Target vs. Target*:       p=.003
   Competitor vs. Target*:   p=.012
                                                                                                             Results

Infants exposed to /p/

[Figure: listening time (ms) for P, P*, and B. Novelty group (N=21): p=.024*,
 p=.009**. Familiarity group (N=12): p=.028*, p=.018*.]
                                                                                                           Results

Infants exposed to /b/

[Figure: listening time (ms) for B, B*, and P. Novelty group (N=36): p<.001**,
 p>.1, p>.2. Familiarity group (N=16): p=.06, p=.15.]
                                 Experiment 4 Conclusions

Contrary to all previous work:

7.5 month old infants show gradient sensitivity to
subphonemic detail.
   • Clear effect for /p/
   • Effect attenuated for /b/.
Reduced effect for /b/… But:
[Schematics: possible listening-time patterns over Bear, Bear*, Pear: a null
 effect, the expected gradient result, and the actual result]

• Bear* ≈ Pear

• Category boundary lies between Bear & Bear*
   - between 3 ms and 11 ms (?)

• Within-category sensitivity in a different range?
                                          Experiment 5

Same design as Experiment 4.
VOTs shifted away from the hypothesized boundary.

Train:
        Bomb    Bear    Beach    Bale        (-9.7 ms)

Test:
        Bomb    Bear    Beach    Bale        (-9.7 ms)
        Bomb*   Bear*   Beach*   Bale*       (3.6 ms)
        Palm    Pear    Peach    Pail        (40.7 ms)
                                                                Results

Familiarity infants (34 infants)

[Figure: listening time (ms) for B-, B, and P (p=.01**, p=.05*)]
                                                               Results

Novelty infants (25 infants)

[Figure: listening time (ms) for B-, B, and P (p=.02*, p=.002**)]
                                   Experiment 5 Conclusions


• Within-category sensitivity in /b/ as well as /p/.


• Shifted category boundary in /b/: not consistent with
  adult boundary (or prior infant work)….


• Graded structure supports statistical learning.
       Will an implementation of this model allow us to
       understand developmental mechanism?
                                       Computational Model

Distributional learning model

1) Model the distribution of tokens as a mixture of Gaussian distributions
   over a phonetic dimension (e.g. VOT).

2) After receiving an input, the Gaussian with the highest posterior
   probability is the "category".

3) Each Gaussian has three parameters: a mean, a standard deviation, and a
   mixing weight.

[Figure: Gaussian categories over VOT]
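A minimal sketch of the model as described here (not the published implementation): Gaussians over VOT, each with a mean, standard deviation, and mixing weight; an input is assigned to the Gaussian with the highest posterior. Parameter values are illustrative.

import numpy as np
from scipy.stats import norm

# (mean mu, sd sigma, mixing weight phi) for two hypothetical categories
categories = [(0.0, 8.0, 0.5),    # /b/
              (40.0, 10.0, 0.5)]  # /p/

def categorize(vot):
    """Return index of the Gaussian with the highest posterior for this token."""
    likelihoods = np.array([phi * norm.pdf(vot, mu, sigma)
                            for mu, sigma, phi in categories])
    posteriors = likelihoods / likelihoods.sum()
    return int(np.argmax(posteriors)), posteriors

for vot in (5, 18, 30):
    cat, post = categorize(vot)
    print(f"VOT={vot:2d} ms -> category {cat}, posteriors {post.round(3)}")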
Statistical Category Learning

1) Start with a set of randomly selected Gaussians.

2) After each input, adjust each parameter to find best
   description of the input.

3) Start with more Gaussians than necessary: the model doesn't
   innately know how many categories there are.
       Mixing weight → 0 for unneeded categories.

[Figure: Gaussians over VOT, before and after learning]
Training: Lisker & Abramson (1964) distribution of VOTs




• Not successful with large K.
• [Successful with K=2…
       …but what if we were learning Hindi?]
   Solution: Competition (winner-take-all)



                    Competition         No Competition
1 Category               5%                  0%
2 Categories             95%                 0%
>4 Categories            0%                  100%

% in right place         95%                 66%


Mechanism #1:       Competition Required.
                    Validated with neural network.
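A toy sketch of the learning scheme with winner-take-all competition: start with surplus Gaussians, let only the winning Gaussian update after each token, and let the losers' mixing weights decay toward 0. The learning rates, K, and the input distribution are my assumptions, not the published model's settings.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
K = 6
mus = rng.uniform(-10, 60, K)          # random starting Gaussians
sigmas = np.full(K, 10.0)
phis = np.full(K, 1.0 / K)
lr = 0.05

def sample_token():
    """English-like bimodal VOT input (cf. Lisker & Abramson's distributions)."""
    return rng.normal(0, 6) if rng.random() < 0.5 else rng.normal(40, 8)

for _ in range(5000):
    x = sample_token()
    post = phis * norm.pdf(x, mus, sigmas)
    winner = int(np.argmax(post))                  # competition: winner takes all
    mus[winner] += lr * (x - mus[winner])          # move winner toward the token
    sigmas[winner] += lr * (abs(x - mus[winner]) - sigmas[winner])
    phis *= (1 - lr * 0.1)                         # losers decay...
    phis[winner] += lr * 0.1                       # ...winner is reinforced
    phis /= phis.sum()

surviving = phis > 0.05                            # unneeded categories' weights shrink
print("category means:", np.round(mus[surviving], 1),
      "weights:", np.round(phis[surviving], 2))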
What about the nature of the initial state?

Classic view (e.g. Werker & Tees, 1984):

   • Infants start with many small (nonnative)
     categories.
   • Lose distinctions that are not used in native
     language.
Small (nonnative) categories =>
                         Large native categories.



                                     Combining small
                                     categories: easy.

                                     What about
                                     reverse
                                     (large => small)?
Large (overgeneralized) categories =>
                         Smaller native categories.



                                        Dividing large
                                        categories: hard.




Mechanism #2:       Combining small categories easier
                    than dividing large.

Related to adult non-native speech perception findings?
Question: Reduced auditory acuity in cochlear implant users.

Answer:   Larger region in which stimuli are not discriminable.
          Larger initial categories. Problem for learning?

Assess non-native discrimination in CI users.
  • Small categories: auditory acuity not that bad.
  • Large categories: suggests different learning mechanisms.

                        (with J. Bruce Tomblin & B. Barker)
                                            Infant Summary

Infants show graded sensitivity to subphonemic detail.
    • Supports variance utilization model.
    • Variance used for statistical learning.

Model suggests aspects of developmental mechanism:
  • Competition.
  • Starting state (large vs. small)

Remaining questions
  • Unexpected VOT boundary: may require 2AFC task
    (anticipatory eye-movement methods)

   • Role of initial category size and learning (possible CI
     application).
                                                Conclusions

Infants and adults are sensitive to subphonemic detail.

Continuous detail is not discarded by perception / word
recognition.

  ✗ Variance Reduction          ✓ Variance Utilization
Normal SWR mechanisms yield:
     1) Temporal Integration
     2) Perceptual warping
                                                Conclusions

Infants and adults are sensitive to subphonemic detail.

Infant sensitivity allows long-term phonology learning.
   • Potentially reveals the developmental mechanism.


Competition processes:
     1) Potentially responsible for CP – locus of SLI?
     2) Essential for learning.
                                           Conclusions


     Spoken language is defined by change.

       But the information to cope with it is
in the signal—if lexical processes don’t discard it.

   Within-category acoustic variation is signal,
                   not noise.
                             IR Head-Tracker

[Diagram: head-mounted eyetracking setup: IR head-tracker emitters and a head-tracker camera near the monitor, two eye cameras on the head, and the eyetracker and subject computers connected via Ethernet.]
Misperception: Additional Results
                                                     Identification Results

[Figure: response rate (voiced, voiceless, nonword) as a function of VOT (0-35 ms) for the barricade-parricade and barakeet-parakeet continua.]

Significant target responses even at the extreme.

Graded effects of VOT on correct response rate.
                                                  Phonetic "Garden-Path"

"Garden-path" effect:
        Difference between looks to each target
        (b vs. p) at the same VOT.

[Figure: fixations to target over time (ms) for "barricade" and "parakeet" at VOT = 0 (/b/) and at VOT = 35 (/p/).]
[Figure: garden-path effect (looks to barricade minus looks to parakeet) as a function of VOT, plotted separately for target and competitor fixations.]

GP Effect:
Gradient effect of VOT.
     Target: p<.0001
     Competitor: p<.0001
Assimilation: Additional Results
       runm picks

       runm takes     ***


When /p/ is heard, the bilabial feature can be
assumed to come from assimilation (not an
underlying /m/).

When /t/ is heard, the bilabial feature is likely to be
from an underlying /m/.
                                  Exp 3 & 4: Conclusions


Within-category detail used in recovering from
assimilation: temporal integration.

   • Anticipate upcoming material
   • Bias activations based on context
      - Like Exp 2: within-category detail retained to
        resolve ambiguity.

Phonological variation is a source of information.
Subject hears
   "select the mud drinker"
   "select the mudg gear"       Critical Pair
   "select the mudg drinker"
[Figure: fixation proportions over 0-2000 ms from the onset of "gear" (average offset 402 ms), for the initial coronal ("mud gear") and initial non-coronal ("mug gear") interpretations.]

Mudg Gear is initially ambiguous with a late bias
towards "Mud".
[Figure: fixation proportions over 0-2000 ms from the onset of "drinker" (average offset 408 ms), for the initial coronal ("mud drinker") and initial non-coronal ("mug drinker") interpretations.]

Mudg Drinker is also ambiguous with a late bias towards
"Mug" (the /g/ has to come from somewhere).
[Figure: fixation proportions over the first 600 ms from the onset of "gear", in assimilated vs. non-assimilated contexts.]



Looks to non-coronal (gear) following assimilated or
non-assimilated consonant.

In the same stimuli/experiment there is also a
progressive effect!
Feedback
Ganong (1980): Lexical information biases
perception of ambiguous phonemes.



[Figure: % /t/ responses along d-t continua (doot-toot vs. duke-tuke); the category boundary shifts with lexical status.]

Phoneme Restoration (Warren, 1970; Samuel, 1997).

    Lexical Feedback: McClelland & Elman (1988);
    Magnuson, McMurray, Tanenhaus & Aslin (2003)
[Diagram: lexical feedback architecture; activation flows from words back down to phonemes.]
Scales of temporal integration in word recognition

   • A Word: ordered series of articulations.
      - Build abstract representations.
      - Form expectations about future events.
      - Fast (online) processing.

   • A phonology:
      - Abstract across utterances.
      - Expectations about possible future events.
      - Slow (developmental) processing
Sparseness
Overgeneralization
  • large σ
  • costly: lose distinctiveness.
Undergeneralization
  • small σ
  • not as costly: maintain distinctiveness.
To increase likelihood of successful learning:
   • err on the side of caution.
   • start with small σ.

[Figure: P(Success) as a function of starting σ (0-60) for the 2-category and 3-category models; 39,900 models run.]
                                                                       Small σ

Sparseness coefficient: % of space not strongly mapped
to any category.

[Figure: Gaussian categories over VOT with unmapped space between them; average sparseness coefficient over 12,000 training epochs for starting σ in the .5-1 range.]
Start with large σ

[Figure: average sparseness coefficient over 12,000 training epochs for starting σ ranges .5-1 and 20-40.]
Intermediate starting σ

[Figure: average sparseness coefficient over 12,000 training epochs for starting σ ranges .5-1, 3-11, 12-17, and 20-40.]
                                         Model Conclusions

To avoid overgeneralization…
      …better to start with small estimates for σ.

Small or even medium starting σ's lead to a sparse
category structure during infancy—much of the phonetic
space is unmapped.


Sparse categories:
      Similar temporal integration to exp 2

       Retain ambiguity (and partial
       representations) until more input is available.
                                           AEM Paradigm


Examination of sparseness/completeness of categories
needs a two-alternative task.

Anticipatory Eye Movements
(McMurray & Aslin, 2005)

Infants are trained to make anticipatory eye movements
in response to an auditory or visual stimulus.

Post-training, generalization can be
assessed with respect to both targets.

Also useful with:
   • Color
   • Shape
   • Spatial Frequency
   • Faces

[Images: bear, pail]

       Quicktime Demo
                                   Experiment 6


Anticipatory Eye Movements

Train:      Bear0: Left
            Pail35: Right

Test:       Bear0    Pear40
            Bear5    Pear35
            Bear10   Pear30
            Bear15   Pear25

[Images: palm, beach]

Same naturally-produced tokens
from Exps 4 & 5.
Expected results

[Figure: predicted performance (Bear vs. Pail) across VOT under sparse categories: a region of unmapped space around the adult boundary.]
                                                           Results

% Correct: 67%
9 / 16 better than chance.

[Figure: % correct as a function of VOT (0-40 ms) for "beach" and "palm" responses, with the training tokens marked.]

				