Within-Category Variation is Used in Spoken Word Recognition

Temporal Integration at Two Time Scales


            Bob McMurray
           University of Iowa
           Dept. of Psychology
                    Collaborators


Richard Aslin
Michael Tanenhaus
David Gow




    Joe Toscano
    Dana Subik
    Julie Markant
Perception & Cognition

A detailed understanding of perceptual processing
is critical to understanding higher level cognition.


Specifically:

Sensitivity to fine-grained perceptual detail can
help integrate information over time.
                                    Temporal Integration


Temporal integration: a critical problem for cognition.
     - information never arrives synchronously.

  • Vision: integration across head-movements,
    saccades and attention-shifts.

  • Music perception: long-term dependencies and
    short term expectancies.
In language, information arrives sequentially.

   • Partial syntactic and semantic representations are
     formed as words arrive.

       The Hawkeyes beat the Boilermakers (once)


   • Words are identified over sequential phonemes.

[Figure: an example word shown as a sequence of phonemes]
Spoken Word Recognition is an ideal arena in which
to study these issues because:

   • Research divides word recognition into perceptual
     and cognitive mechanisms.

   • Perceptual information is available for temporal
     integration.
Scales of temporal integration in word recognition

   • A Word: ordered series of articulations.
      - Build abstract representations.
      - Form expectations about future events.
      - Fast (online) processing.

   • A phonology:
      - Abstract across utterances.
      - Expectations about possible future events.
      - Slow (developmental) processing
Mechanisms of Temporal Integration

 Stimuli do not change arbitrarily.

 Perceptual cues reveal something about the
  change itself.

 Active integration:
   • Anticipate future events.
   • Retain partial representations of the present.
   • Resolve prior ambiguity.
                                               Overview


1) Speech perception and Spoken Word Recognition.

2) Lexical activation is sensitive to fine-grained
   detail in speech.

3) Fast temporal integration: taking advantage of
   regularity in the signal for temporal integration.

4) Slow temporal integration:
   Developmental consequences
Online Word Recognition

• Information arrives sequentially
• At early points in time, signal is temporarily ambiguous.


Input: ba… kery

Candidates: bakery (still viable); basic, barrier, barricade, bait, baby (ruled out, marked X)

• Later arriving information disambiguates the word.
Current models of spoken word recognition

• Immediacy: Hypotheses formed from the earliest
  moments of input.

• Activation Based: Lexical candidates (words) receive
  activation to the degree they match the input.

• Parallel Processing: Multiple items are active in
  parallel.

• Competition: Items compete with each other for
  recognition.
[Figure: activation over time of the candidates beach, butter, bump, putter and dog as the input b… u… tt… e… r unfolds]
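
The four properties listed for current models (immediacy, activation, parallel processing, competition) can be illustrated with a toy simulation. This is a minimal sketch and not any published model: letters stand in for phonemes, and the function names and the `gain` and `decay` values are assumptions made purely for illustration.

```python
# Toy illustration of immediacy, activation, parallelism and competition.
# Letters stand in for phonemes; all names and parameters are hypothetical.

LEXICON = ["beach", "butter", "bump", "putter", "dog"]


def step_activation(acts, heard, position, gain=1.0, decay=0.5):
    """Update every candidate in parallel for one input segment."""
    new = {}
    for word, act in acts.items():
        matches = position < len(word) and word[position] == heard
        # Bottom-up support for a match, decay of activation on a mismatch.
        new[word] = act + gain if matches else act * decay
    # Competition: normalise so candidates share a fixed pool of activation.
    total = sum(new.values())
    return {w: a / total for w, a in new.items()}


def recognise(segments, lexicon=LEXICON):
    """Return the activation of each word after every input segment."""
    acts = {w: 1.0 / len(lexicon) for w in lexicon}
    history = []
    for position, heard in enumerate(segments):
        # Immediacy: activations are updated as soon as a segment arrives.
        acts = step_activation(acts, heard, position)
        history.append(acts)
    return history


history = recognise("butter")
for segment, acts in zip("butter", history):
    ranked = sorted(acts, key=acts.get, reverse=True)
    print(segment, [(w, round(acts[w], 2)) for w in ranked])
```

After the first segment, beach, butter and bump are equally active; later segments separate them, and butter ends up ahead of putter because it matched from the onset.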
These processes have been well defined for a phonemic
representation of the input.

[Figure: an example input shown as a string of phonemic symbols]

But there is considerably less ambiguity if we consider
subphonemic information.

Example: subphonemic effects of motor processes.
                                                Coarticulation

Any action reflects future actions as it unfolds.

Example: Coarticulation
   Articulation (lips, tongue…) reflects current, future and
   past events.

   Subtle subphonemic variation in speech reflects temporal
   organization.

[Figure: the words "net" and "neck" side by side]

   Sensitivity to these perceptual details might yield earlier
   disambiguation.
These processes have largely been ignored
because of a history of evidence that perceptual
variability gets discarded.


      Example: Categorical Perception
                                            Categorical Perception


[Figure: identification (% /pa/) and discrimination as a function of VOT, from B to P, each on a 0-100 scale]

 • Sharp identification of tokens on a continuum.
 • Discrimination poor within a phonetic category.

Subphonemic variation in VOT is discarded in favor of a
discrete symbol (phoneme).
Evidence against the strong form of Categorical
Perception from psychophysical-type tasks:

      Discrimination Tasks
       Pisoni and Tash (1974)
       Pisoni & Lazarus (1974)
       Carney, Widin & Viemeister (1977)
      Training
       Samuel (1977)
       Pisoni, Aslin, Perey & Hennessy (1982)
      Goodness Ratings
       Miller (1997)
       Massaro & Cohen (1983)
                                    Experiment 1




  Does within-category acoustic detail
   systematically affect higher level
              language?


Is there a gradient effect of subphonemic
       detail on lexical activation?
                      McMurray, Aslin & Tanenhaus (2002)

A gradient relationship would yield systematic effects of
subphonemic information on lexical activation.


If this gradiency is useful for temporal integration, it must be
preserved over time.


Need a design sensitive to both acoustic detail and detailed
temporal dynamics of lexical activation.
                                                         Acoustic Detail

Use a speech continuum: more steps yield a better
picture of the acoustic mapping.

KlattWorks: generate synthetic continua from natural
speech.

    9-step VOT continua (0-40 ms)

    6 pairs of words.
        beach/peach      bale/pale       bear/pear
        bump/pump        bomb/palm       butter/putter

    6 fillers.
        lamp     leg     lock   ladder   lip      leaf
        shark    shell   shoe   ship     sheep    shirt
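
As a quick check on the design, a 9-step continuum spanning 0-40 ms implies 5 ms steps if the steps are evenly spaced. Even spacing is an assumption here, although it matches the 0, 5, …, 40 ms VOT axes used in the results. The helper name `vot_continuum` is hypothetical.

```python
def vot_continuum(start_ms=0.0, end_ms=40.0, n_steps=9):
    """Evenly spaced VOT values in ms, endpoints included (assumed spacing)."""
    step = (end_ms - start_ms) / (n_steps - 1)
    return [start_ms + i * step for i in range(n_steps)]


# The six b/p word pairs listed on the slide.
WORD_PAIRS = [
    ("beach", "peach"), ("bale", "pale"), ("bear", "pear"),
    ("bump", "pump"), ("bomb", "palm"), ("butter", "putter"),
]

steps = vot_continuum()
print(steps)

# One synthetic token per word pair and continuum step.
stimuli = [(b, p, vot) for (b, p) in WORD_PAIRS for vot in steps]
print(len(stimuli))  # 6 pairs x 9 steps = 54 experimental tokens
```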
                                        Temporal Dynamics


          How do we tap on-line recognition?
          With an on-line task: Eye-movements

Subjects hear spoken language and manipulate objects in
a visual world.

Visual world includes set of objects with interesting
linguistic properties.

   a beach, a peach and some unrelated items.

Eye-movements to each object are monitored throughout
the task.

            Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995
Why use eye-movements and visual world paradigm?

   • Relatively natural task.

   • Eye-movements generated very fast (within 200ms of
     first bit of information).

   • Eye movements time-locked to speech.

   • Subjects aren’t aware of eye-movements.

   • Fixation probability maps onto lexical activation.
Task

A moment to view the items.

Task

"Bear"

Repeat 1080 times.
Identification Results

[Figure: proportion /p/ responses (0-1) as a function of VOT (0-40 ms), from B to P]

High agreement across subjects and items for category boundary.

By subject: 17.25 +/- 1.33 ms
By item:    17.24 +/- 1.24 ms
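
One simple way to locate a category boundary from identification data is to find the VOT at which the proportion of /p/ responses crosses 0.5. The sketch below uses linear interpolation between adjacent continuum steps. The response proportions are invented for illustration, and the slide does not state how the reported boundaries were estimated (a logistic fit is a common alternative).

```python
def category_boundary(vots, prop_p, criterion=0.5):
    """VOT at which proportion /p/ first crosses `criterion` (linear interpolation)."""
    points = list(zip(vots, prop_p))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < criterion <= y1:
            # Interpolate between the two steps that straddle the criterion.
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("identification function never crosses the criterion")


# Hypothetical identification data for one subject (not from the study).
vots = [0, 5, 10, 15, 20, 25, 30, 35, 40]
prop_p = [0.02, 0.03, 0.05, 0.30, 0.80, 0.95, 0.97, 0.98, 0.99]

boundary = category_boundary(vots, prop_p)
print(round(boundary, 2))
```

Running this per subject (or per item) and averaging would give boundary estimates of the kind summarised above.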
Task

[Figure: fixations on individual trials (1-5) plotted against time, with a 200 ms scale marker, and the resulting % fixations over time]

Target = Bear
Competitor = Pear
Unrelated = Lamp, Ship
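
The step from individual trials to a fixation curve can be sketched as follows: at each sampled time point, compute the share of trials on which the eyes are on each object. The trial format (lists of `(start_ms, end_ms, object)` tuples), the function name and the five toy trials are assumptions for illustration only.

```python
def fixation_proportions(trials, objects, t_max=2000, step=200):
    """Proportion of trials fixating each object at each sampled time."""
    times = list(range(0, t_max + 1, step))
    curves = {obj: [] for obj in objects}
    for t in times:
        counts = {obj: 0 for obj in objects}
        for trial in trials:
            for start, end, obj in trial:
                if start <= t < end and obj in counts:
                    counts[obj] += 1
                    break  # at most one fixation per trial at time t
        for obj in objects:
            curves[obj].append(counts[obj] / len(trials))
    return times, curves


# Five invented trials: target = bear, competitor = pear, unrelated = lamp.
trials = [
    [(200, 600, "pear"), (600, 2000, "bear")],
    [(300, 2000, "bear")],
    [(250, 500, "lamp"), (500, 2000, "bear")],
    [(400, 800, "pear"), (800, 2000, "bear")],
    [(200, 2000, "bear")],
]

times, curves = fixation_proportions(trials, ["bear", "pear", "lamp"], t_max=1000)
for obj, curve in curves.items():
    print(obj, curve)
```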
Task

[Figure: fixation proportion (0-0.9) as a function of time (0-2000 ms), in two panels: VOT=0 and VOT=40]

             More looks to competitor than unrelated items.
Task

Given that
   • the subject heard bear
   • clicked on "bear"…

How often was the subject looking at the "pear"?

[Figure: two possible patterns of fixation proportion over time for target and competitor: Categorical Results vs. Gradient Effect]
Results

[Figure: competitor fixations (0-0.16) as a function of time since word onset (0-2000 ms), one curve per VOT; one panel for VOTs of 0-15 ms, one for VOTs of 20-40 ms]

Long-lasting gradient effect: seen throughout the
timecourse of processing.
[Figure: competitor fixations (0.02-0.08) as a function of VOT (0-40 ms), with separate curves on either side of the category boundary]

Area under the curve:
  Clear effects of VOT    B: p=.017*    P: p<.001***
  Linear Trend            B: p=.023*    P: p=.002***
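
The two descriptive measures behind these statistics can be sketched as follows: the area under each competitor-fixation curve (trapezoidal rule), then a least-squares slope of area against VOT as a simple index of linear trend. The curves are invented, and the slide does not say which test produced the reported p-values, so this shows only the descriptive step.

```python
def area_under_curve(times, values):
    """Trapezoidal area under a fixation curve."""
    total = 0.0
    for i in range(1, len(times)):
        total += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2
    return total


def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


times = [0, 400, 800, 1200, 1600]
# Hypothetical competitor-fixation curves for /b/ responses, keyed by VOT (ms).
curves = {
    0: [0.0, 0.04, 0.02, 0.01, 0.0],
    5: [0.0, 0.05, 0.03, 0.01, 0.0],
    10: [0.0, 0.06, 0.04, 0.02, 0.0],
    15: [0.0, 0.08, 0.05, 0.02, 0.0],
}

areas = {vot: area_under_curve(times, curve) for vot, curve in curves.items()}
slope = linear_slope(list(areas), list(areas.values()))
print(areas)
print(slope)
```

A positive slope here corresponds to more looks to the competitor as VOT approaches the category boundary.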
[Figure: competitor fixations (0.02-0.08) as a function of VOT (0-40 ms), with separate curves on either side of the category boundary]

Unambiguous Stimuli Only
  Clear effects of VOT    B: p=.014*    P: p=.001***
  Linear Trend            B: p=.009**   P: p=.007**
                                                  Summary


Subphonemic acoustic differences in VOT have a gradient
effect on lexical activation.
   • Gradient effect of VOT on looks to the competitor.
   • Effect holds even for unambiguous stimuli.
   • Seems to be long-lasting.

Consistent with growing body of work using priming
(Andruski, Blumstein & Burton, 1994; Utman, Blumstein &
Burton, 2000; Gow, 2001, 2002).
                                 The Proposed Framework

                   Sensitivity & Use
1) Word recognition is systematically sensitive to
   subphonemic acoustic detail.

2) Acoustic detail is represented as gradations in activation
   across the lexicon.

3) This sensitivity enables the system to take advantage of
   subphonemic regularities for temporal integration.

4) This has fundamental consequences for development:
   learning phonological organization.
                                        Lexical Sensitivity

1) Word recognition is systematically sensitive to
   subphonemic acoustic detail.

   Voicing
   Laterality, Manner, Place
   Natural Speech

  X Metalinguistic Tasks

[Figure: display with four items labelled P, L, B and Sh; spoken word: Bear]
                                                                  Lexical Sensitivity

1) Word recognition is systematically sensitive to
   subphonemic acoustic detail.

   Voicing
   Laterality, Manner, Place
   Natural Speech

  X Metalinguistic Tasks

[Figure: competitor fixations (0.02-0.1) as a function of VOT (0-40 ms) around the category boundary; curves labelled "Response=B, Looks to B" and "Response=P, Looks to B"]
                                        Lexical Sensitivity

1) Word recognition is systematically sensitive to
   subphonemic acoustic detail.

   Voicing
   Laterality, Manner, Place
   Natural Speech

  X Metalinguistic Tasks

  ? Non minimal pairs
  ? Duration of effect
      (experiment 1)
2) Acoustic detail is represented as gradations in activation
   across the lexicon.


[Figure: activation over time of bump, pump, dump, bun, bumper and bomb as the input b… u… m… p… unfolds]
                                       Temporal Integration

3) This sensitivity enables the system to take advantage of
   subphonemic regularities for temporal integration.

    Regressive ambiguity resolution (exp 1):
       • Ambiguity retained until more information arrives.

    Progressive expectation building (exp 2):
       • Phonetic distinctions are spread over time
       • Anticipate upcoming material.
                                                Development

4) Consequences for development: learning phonological
   organization.


 Learning a language:
    • Integrating input across many utterances to build
      long-term representation.

 Sensitivity to subphonemic detail (exp 4 & 5).
    • Allows statistical learning of categories (model).
                                         Experiment 2




How long are gradient effects of within-category
detail maintained?

Can subphonemic variation play a role in ambiguity
resolution?

How is information at multiple levels integrated?
                                              Misperception


What if the initial portion of a stimulus was misperceived?

       Competitor still active
         - easy to activate it rest of the way.

       Competitor completely inactive
         - system will "garden-path".

P(misperception) depends on distance from boundary.

Gradient activation allows the system to hedge its bets.
barricade vs. parakeet        / beIrəkeId / vs. / peIrəkit /

[Figure: activation of parakeet and barricade over time as the input p/b eI r ə k i t… unfolds, shown for a Categorical Lexicon and for Gradient Sensitivity]
                                              Methods

10 pairs of voiced/voiceless items (b/p and d/t).

     Voiced          Voiceless      Overlap
     Bumpercar       Pumpernickel   6
     Barricade       Parakeet       5
     Bassinet        Passenger      5
     Blanket         Plankton       5
     Beachball       Peachpit       4
     Billboard       Pillbox        4
     Drain Pipes     Train Tracks   4
     Dreadlocks      Treadmill      4
     Delaware        Telephone      4
     Delicatessen    Television     4
Eye Movement Results

[Figure: fixations to target (0-1) as a function of time (300-900 ms), one curve per VOT (0-35 ms); panel: Barricade -> Parricade]

Faster activation of target as VOTs near lexical endpoint.

   --Even within the non-word range.
Eye Movement Results

[Figure: fixations to target (0-1) as a function of time, one curve per VOT (0-35 ms); panels: Barricade -> Parricade (300-900 ms) and Parakeet -> Barakeet (300-1200 ms)]

Faster activation of target as VOTs near lexical endpoint.

   --Even within the non-word range.
                               Experiment 2 Conclusions

Gradient effect of within-category variation without
minimal-pairs.

Gradient effect long-lasting: mean POD = 240 ms.

Regressive ambiguity resolution:

   • Subphonemic gradations maintained until more
     information arrives.
   • Subphonemic gradation can improve (or hinder)
     recovery from garden path.
                    Progressive Expectation Formation


Can within-category detail be used to predict
future acoustic/phonetic events?


Yes: Phonological regularities create systematic
within-category variation.

   • Predicts future events.
                                Experiment 3: Anticipation

Word-final coronal consonants (n, t, d) assimilate to the place
of the following segment.

          Maroong Goose        Maroon Duck

Place assimilation -> ambiguous segments
                         that anticipate upcoming material.

[Figure: activation over time of maroon, goose, goat and duck as the input m… a… rr… oo… ng… g… oo… s… unfolds]
Subject hears
   "select the maroon    duck"
   "select the maroon    goose"
   "select the maroong   goose"
   "select the maroong   duck" *

We should see faster eye-movements to "goose" after
assimilated consonants.
Results

[Figure: looks to “goose” as a function of time; fixation proportion (0-0.9) over 0-600 ms from the onset of “goose” + oculomotor delay, for Assimilated vs. Non Assimilated items]

Anticipatory effect on looks to non-coronal.
[Figure: looks to “duck” as a function of time; fixation proportion (0-0.3) over 0-600 ms from the onset of “goose” + oculomotor delay, for Assimilated vs. Non Assimilated items]

Inhibitory effect on looks to coronal (duck, p=.024)
Sensitivity to subphonemic detail:
   • Increase priors on likely upcoming events.
   • Decrease priors on unlikely upcoming events.
   • Active Temporal Integration Process.
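
The idea of raising and lowering priors can be made concrete with Bayes' rule. This is an illustrative sketch only: the slides do not commit to a Bayesian model, and the prior and likelihood values below are invented. It asks how hearing an assimilated final nasal ("maroong") should shift expectations about the place of the next consonant.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of hypotheses."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(joint.values())
    return {h: p / total for h, p in joint.items()}


# Hypothetical prior: the next word starts with a velar (goose) or a coronal (duck).
prior = {"velar": 0.5, "coronal": 0.5}

# Assumed probability of hearing a velar-coloured nasal ("maroong") before each.
p_assimilated = {"velar": 0.6, "coronal": 0.05}

after = posterior(prior, p_assimilated)
print({h: round(p, 3) for h, p in after.items()})
```

With these made-up numbers the velar continuation becomes more expected and the coronal one less expected, which is the pattern of facilitation and inhibition described above.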


Occasionally assimilation creates ambiguity
  • Resolves prior ambiguity: mudg drinker
  • Similar to experiment 2…
                                              Adult Summary


Lexical activation is exquisitely sensitive to within-
category detail.


This sensitivity is useful to integrate material over time.

   • Regressive Ambiguity resolution.
   • Progressive Facilitation

Taking advantage of phonological and lexical regularities.
                                             Development

Historically, work in speech perception has been linked
to development.

Sensitivity to subphonemic detail must revise our view
of development.


Use: Infants face additional temporal integration problems

   No lexicon available to clean up noisy input: rely on
   acoustic regularities.

   Extracting a phonology from the series of utterances.
Sensitivity to subphonemic detail:

For 30 years, virtually all attempts to address this
question have yielded categorical discrimination (e.g.
Eimas, Siqueland, Jusczyk & Vigorito, 1971).


   Exception: Miller & Eimas (1996).
     • Only at extreme VOTs.
     • Only when habituated to non-
       prototypical token.
                                                        Use?


Nonetheless, infants possess abilities that would
require within-category sensitivity.


  • Infants can use allophonic differences at word
    boundaries for segmentation (Jusczyk, Hohne &
    Bauman, 1999; Hohne & Jusczyk, 1994)

  • Infants can learn phonetic categories from
    distributional statistics (Maye, Werker & Gerken,
    2002; Maye & Weiss, 2004).
                                Statistical Category Learning


Speech production causes clustering along contrastive
phonetic dimensions.

E.g. Voicing / Voice Onset Time
       B:     VOT ~ 0
       P:     VOT ~ 40


 Within a category, VOT forms a Gaussian distribution.


 Result: Bimodal distribution

[Figure: bimodal distribution over VOT, with modes near 0 ms and 40 ms]
To statistically learn speech categories, infants must:

     • Record frequencies of tokens at each value along a
       stimulus dimension.

     • Extract categories from the distribution.

[Figure: frequency as a function of VOT, with a +voice mode near 0 ms and a -voice mode near 50 ms]


    • This requires the ability to track specific VOTs.
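
One way to make "extract categories from the distribution" concrete is a two-component Gaussian mixture fitted by expectation-maximisation (EM). This is a sketch under stated assumptions, not the model referred to in the talk: the category means (0 and 40 ms), the standard deviation (8 ms), the sample sizes and all function names are chosen for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def fit_two_gaussians(data, n_iter=100):
    """EM for a 1-D mixture of two Gaussians; returns [(weight, mean, sd), ...]."""
    lo, hi = min(data), max(data)
    # Start the components one quarter and three quarters of the way across the range.
    params = [(0.5, lo + (hi - lo) * 0.25, (hi - lo) / 4),
              (0.5, lo + (hi - lo) * 0.75, (hi - lo) / 4)]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each token.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in params]
            resp.append(p[0] / (p[0] + p[1]))
        # M-step: re-estimate weight, mean and sd of each component.
        new_params = []
        for k in (0, 1):
            r = [rk if k == 0 else 1 - rk for rk in resp]
            n_k = sum(r)
            mean = sum(ri * x for ri, x in zip(r, data)) / n_k
            var = sum(ri * (x - mean) ** 2 for ri, x in zip(r, data)) / n_k
            new_params.append((n_k / len(data), mean, max(math.sqrt(var), 1e-3)))
        params = new_params
    return params


# Simulated VOT tokens (ms) from two assumed categories; seeded for repeatability.
random.seed(0)
tokens = ([random.gauss(0, 8) for _ in range(500)] +
          [random.gauss(40, 8) for _ in range(500)])

for weight, mean, sd in fit_two_gaussians(tokens):
    print(round(weight, 2), round(mean, 1), round(sd, 1))
```

The learner never sees category labels; it only needs the individual VOT values, which is why tracking specific VOTs is a prerequisite for this kind of learning.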
                                                 Experiment 4

Why no demonstrations of sensitivity?

   • Habituation
          Discrimination not ID.
          Possible selective adaptation.
          Possible attenuation of sensitivity.

   • Synthetic speech
          Not ideal for infants.

   • Single exemplar/continuum
          Not necessarily a category representation

Experiment 4: Reassess issue with improved methods.
                                                        HTPP


Head-Turn Preference Procedure
                                   (Jusczyk & Aslin, 1995)

Infants exposed to a chunk of language:
      • Words in running speech.
      • Stream of continuous speech (à la statistical learning
        paradigm).
      • Word list.

Memory for exposed items (or abstractions) assessed:
    • Compare listening time between consistent and
      inconsistent items.
Test trials start with all lights off.
Center Light blinks.
Brings infant’s attention to center.
One of the side-lights blinks.
When infant looks at side-light…
      …he hears a word ("Beach… Beach… Beach…")
…as long as he keeps looking.
                                                        Methods


7.5-month-old infants were exposed to either 4 b- or 4 p-words.

80 repetitions total.

      Bomb      Palm
      Bear      Pear
      Bail      Pail
      Beach     Peach

Form a category of the exposed class of words.

Measure listening time on…
                          Original words Bear       Pear
                            Competitors Pear        Bear
                  VOT closer to boundary Bear*      Pear*
Stimuli constructed by cross-splicing naturally
produced tokens of each end point.

B:     M= 3.6 ms VOT
P:     M= 40.7 ms VOT

B*:    M=11.9 ms VOT
P*:    M=30.2 ms VOT

B* and P* were judged /b/ or /p/ at least 90%
consistently by adult listeners.

B*: 97%
P*: 96%
                                     Novelty or Familiarity?

Novelty/Familiarity preference varies across infants and
experiments.

We’re only interested in the middle stimuli (b*, p*).

Infants were classified as novelty or familiarity preferring
by performance on the endpoints.

       Novelty   Familiarity
 B     36          16
 P     21          12

Within each group will we see evidence for gradiency?
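
The classification step can be sketched as a comparison of each infant's listening times on the two endpoint items. The rule below (longer listening to the competitor endpoint means novelty-preferring) is an assumption; the slides do not give the exact criterion, and the listening times are invented.

```python
def classify_infant(target_ms, competitor_ms):
    """Label an infant by which endpoint item they listened to longer (assumed rule)."""
    return "novelty" if competitor_ms > target_ms else "familiarity"


# Hypothetical mean listening times (ms): (exposed endpoint, other endpoint).
infants = {"s01": (6200, 8900), "s02": (9100, 5400), "s03": (5000, 7300)}

groups = {}
for infant_id, (target_ms, competitor_ms) in infants.items():
    label = classify_infant(target_ms, competitor_ms)
    groups.setdefault(label, []).append(infant_id)

print(groups)
```

Only the endpoint items feed the classification, so the intermediate items (b*, p*) remain free to test for gradiency within each group.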
After being exposed to
       bear… beach… bail… bomb…

Infants who show a novelty effect…
       …will look longer for pear than bear.


What about in between?

[Figure: predicted listening time for Bear, Bear* and Pear under Categorical and Gradient accounts]
Results

                        Novelty infants (B: 36       P: 21)

[Figure: listening time (4000-10000 ms) for Target, Target* and Competitor items, plotted separately for infants exposed to B and to P]

                          Target vs. Target*:      p<.001
                      Competitor vs. Target*:      p=.017
                        Familiarity infants (B: 16     P: 12)

[Figure: listening time (4000-10000 ms) for Target, Target* and Competitor items, plotted separately for infants exposed to B and to P]

                          Target vs. Target*:       p=.003
                      Competitor vs. Target*:       p=.012
Infants exposed to /p/

[Figure: listening time (4000-10000 ms) for P, P* and B items. Novelty panel (N=21): p-values .024* and .009**. Familiarity panel (N=12): p-values .028* and .018*]
Infants exposed to /b/

[Figure: two panels of listening time (ms) to B, B*, and P: Novelty (N=36) and Familiarity (N=16). p-values marked on the plots: <.001**, >.1, >.2, .06, .15]
                                 Experiment 4 Conclusions

Contrary to all previous work:

7.5-month-old infants show gradient sensitivity to
subphonemic detail.
   • Clear effect for /p/
   • Effect attenuated for /b/.
Reduced effect for /b/… But:

[Figure: three schematic listening-time patterns for Bear, Bear*, and Pear, labeled "Null Effect?", "Expected Result?", and "Actual result."]

• Bear* ≈ Pear

• Category boundary lies between Bear & Bear*
   - Between 3 ms and 11 ms [??]

• Within-category sensitivity in a different range?
                                         Experiment 5

Same design as experiment 3.
VOTs shifted away from hypothesized boundary
Train:
        Bomb     Bear     Beach    Bale         VOT: -9.7 ms

Test:
        Bomb     Bear     Beach    Bale         VOT: -9.7 ms
        Bomb*    Bear*    Beach*   Bale*        VOT:  3.6 ms
        Palm     Pear     Peach    Pail         VOT: 40.7 ms
Familiarity infants (34 infants)

[Figure: listening time (ms) to B-, B, and P; p-values marked on the plot: =.01**, =.05*]
Novelty infants (25 infants)

[Figure: listening time (ms) to B-, B, and P; p-values marked on the plot: =.02*, =.002**]
                                  Experiment 5 Conclusions


• Within-category sensitivity in /b/ as well as /p/.


• Shifted category boundary in /b/: not consistent with
  adult boundary (or prior infant work). Why?
/b/ results consistent with (at least) two mappings.

1) Shifted boundary

[Figure: category mapping strength over VOT for /b/ and /p/, with a shifted boundary]

    • Inconsistent with prior literature.

    • Why would infants have this boundary?
HTPP is a one-alternative task.
     Asks: "B or not-B", not "B or P".

2) Sparse Categories

[Figure: category mapping strength over VOT; /b/ and /p/ are separated by unmapped space around the adult boundary]

Hypothesis: Sparse categories: by-product of efficient
learning.
                                      Computational Model

Distributional learning model

1) Model distribution of tokens as a mixture of Gaussian
   distributions over a phonetic dimension (e.g. VOT).

2) After receiving an input, the Gaussian with the
   highest posterior probability is the "category".

3) Each Gaussian has three parameters:
   mixing weight (φ), mean (μ), and standard deviation (σ).

[Figure: category mapping strength over VOT; /b/ and /p/ Gaussians separated by unmapped space around the adult boundary]
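
A minimal sketch of the categorization rule on this slide, assuming each category is a one-dimensional Gaussian over VOT with a mixing weight. The names `phi`, `mu`, `sigma`, `posteriors`, and `categorize`, and the example category values, are illustrative assumptions and are not taken from the talk.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution (mean mu, s.d. sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posteriors(x, categories):
    """Posterior probability of each category given input x.

    categories: list of (phi, mu, sigma) tuples, phi = mixing weight.
    """
    joint = [phi * gaussian_pdf(x, mu, sigma) for phi, mu, sigma in categories]
    total = sum(joint)  # assumed > 0, i.e. x is not far from every Gaussian
    return [j / total for j in joint]

def categorize(x, categories):
    """Index of the Gaussian with the highest posterior for input x."""
    post = posteriors(x, categories)
    return max(range(len(post)), key=post.__getitem__)

# Hypothetical /b/ and /p/ categories over VOT (ms); values are made up.
VOICING = [(0.5, 0.0, 8.0), (0.5, 40.0, 8.0)]

print(categorize(5.0, VOICING))   # index of the /b/-like Gaussian
print(posteriors(20.0, VOICING))  # midpoint between the two means
```

The posterior vector, not just the winning index, is what carries graded within-category information in a scheme like this.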
Statistical Category Learning

1) Start with a set of randomly selected Gaussians.

2) After each input, adjust each parameter to find best
   description of the input.

3) Start with more Gaussians than necessary; the model doesn’t
   innately know how many categories there are.
       Mixing weight (φ) -> 0 for unneeded categories.

[Figure: two panels of Gaussians over VOT]
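
One way to picture the learning steps above is a single incremental update per input. The sketch below is an assumption, not the model's actual update rule: it takes a gradient step on the log-likelihood for each mean and standard deviation, and it boosts the mixing weight of the winning Gaussian before renormalizing, so that rarely winning Gaussians shrink toward zero. The function name, learning rates, and `min_sigma` floor are all hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution (mean mu, s.d. sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def update_step(categories, x, lr_mu=1.0, lr_sigma=1.0, lr_phi=0.01,
                min_sigma=0.1):
    """Return updated (phi, mu, sigma) tuples after seeing one input x."""
    joint = [phi * gaussian_pdf(x, mu, sigma) for phi, mu, sigma in categories]
    total = sum(joint)
    if total == 0.0:  # input too far from every Gaussian to be informative
        return list(categories)
    resp = [j / total for j in joint]  # posterior of each Gaussian
    winner = max(range(len(resp)), key=resp.__getitem__)

    updated = []
    for k, (phi, mu, sigma) in enumerate(categories):
        d = x - mu
        # Gradient of log-likelihood with respect to mu and sigma.
        new_mu = mu + lr_mu * resp[k] * d / sigma ** 2
        grad_sigma = resp[k] * (d * d / sigma ** 3 - 1.0 / sigma)
        new_sigma = max(min_sigma, sigma + lr_sigma * grad_sigma)
        # Assumed winner-take-all rule for the mixing weight.
        new_phi = phi + (lr_phi if k == winner else 0.0)
        updated.append((new_phi, new_mu, new_sigma))

    norm = sum(phi for phi, _, _ in updated)
    return [(phi / norm, mu, sigma) for phi, mu, sigma in updated]

# Hypothetical starting state and a single input at VOT = 3 ms.
start = [(0.5, 0.0, 5.0), (0.5, 40.0, 5.0)]
after = update_step(start, 3.0)
print(after)
```

Calling `update_step` repeatedly over a stream of VOT values gives the incremental flavor described on the slide; the starting σ values passed in are what the later simulations vary.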
Overgeneralization
  • large σ
  • costly: lose phonetic distinctions…
Undergeneralization
  • small σ
  • not as costly: maintain distinctiveness.
To increase likelihood of successful learning:
   • err on the side of caution.
   • start with small σ.
[Figure: P(Success) as a function of starting σ (0 to 60) for the 2 Category Model and the 3 Category Model; 39,900 models run]
Small σ

Sparseness coefficient: % of space not strongly mapped
to any category.

[Figure: average sparseness coefficient over training epochs (0 to 12,000); starting σ = .5-1. Inset: Gaussians over VOT with unmapped space between them]
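
The sparseness coefficient can be sketched as a proportion over a grid of VOT values. The slide does not say what counts as "strongly mapped", so the criterion here is an assumption: a point is mapped if it lies within `n_sd` standard deviations of some Gaussian whose mixing weight is at least `min_phi`. All names and example values are hypothetical.

```python
def sparseness(categories, vot_grid, n_sd=2.0, min_phi=0.01):
    """Fraction of grid points not strongly mapped to any category.

    categories: list of (phi, mu, sigma) tuples.
    A point counts as mapped if it is within n_sd standard deviations
    of a Gaussian with mixing weight >= min_phi (assumed criterion).
    """
    def mapped(x):
        return any(phi >= min_phi and abs(x - mu) <= n_sd * sigma
                   for phi, mu, sigma in categories)

    unmapped = sum(1 for x in vot_grid if not mapped(x))
    return unmapped / len(vot_grid)

# Hypothetical categories: narrow Gaussians leave a gap, wide ones do not.
GRID = [float(v) for v in range(0, 41)]           # VOT 0..40 ms, 1 ms steps
NARROW = [(0.5, 0.0, 5.0), (0.5, 40.0, 5.0)]
WIDE = [(0.5, 0.0, 20.0), (0.5, 40.0, 20.0)]

print(sparseness(NARROW, GRID))  # some of the grid is unmapped
print(sparseness(WIDE, GRID))    # the whole grid is covered
```

Under this criterion, smaller σ values leave more of the VOT range uncovered, which is the qualitative pattern the slide describes.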
Start with large σ

[Figure: average sparseness coefficient over training epochs (0 to 12,000); starting σ = .5-1 and 20-40]
Intermediate starting σ

[Figure: average sparseness coefficient over training epochs (0 to 12,000); starting σ = .5-1, 3-11, 12-17, and 20-40]
                                                Limitations

1) Occasionally model leaves sparse regions at the end of
   learning.
     • Competition/Choice framework:
       Additional competition or selection mechanisms
       during processing: categorization despite incomplete
       information.

2) Multi-dimensional categories
           1-D: 3      parameters / category
           2-D: 6         "           "
           3-D: 10        "           "
           4-D: 15        "           "
    • Cue/model-reliability may reduce dimensionality.
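
The growth in parameters can be checked with a short count, assuming each category is a Gaussian with a full covariance matrix plus one mixing weight: d means, d(d+1)/2 variance and covariance terms, and 1 weight. That assumption reproduces the 1-D, 2-D, and 4-D counts listed above and gives 10 for three dimensions. The function name is hypothetical.

```python
def params_per_category(d):
    """Free parameters of one d-dimensional Gaussian category.

    Assumes a full covariance matrix: d means, d*(d+1)/2 distinct
    variance/covariance terms, and 1 mixing weight.
    """
    return d + d * (d + 1) // 2 + 1

for d in range(1, 5):
    print(f"{d}-D: {params_per_category(d)} parameters / category")
```

The count grows quadratically with the number of cue dimensions, which is why reducing dimensionality matters for this class of model.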
                             Non-parametric approach?

• Competitive Hebbian Learning
  (Rumelhart & Zipser, 1986).
• Not constrained by a particular
  equation; can fill space better.
• Similar properties in terms of
  starting σ and sparseness.

[Figure: categories over VOT]
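
For intuition only, here is a simplified winner-take-all update in the spirit of competitive learning. It is not the Rumelhart & Zipser formulation: it assumes each unit is a single preferred VOT value, and only the unit closest to the input moves toward it. The function names, learning rate, and example values are hypothetical.

```python
def competitive_step(prototypes, x, lr=0.1):
    """One winner-take-all update: only the closest unit moves toward x."""
    winner = min(range(len(prototypes)), key=lambda k: abs(x - prototypes[k]))
    updated = list(prototypes)
    updated[winner] += lr * (x - updated[winner])
    return updated

def train(prototypes, inputs, lr=0.1):
    """Apply competitive_step to each input in order."""
    for x in inputs:
        prototypes = competitive_step(prototypes, x, lr)
    return prototypes

# Two hypothetical units and two VOT inputs (ms).
print(competitive_step([0.0, 40.0], 10.0, lr=0.1))
print(train([0.0, 40.0], [10.0, 30.0], lr=0.5))
```

Because no unit is tied to a Gaussian shape, a population of such units can cover an irregular region of phonetic space, which is the property the slide points to.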
                                         Model Conclusions

To avoid overgeneralization…
      …better to start with small estimates for σ.

Small or even medium starting σ's lead to sparse
category structure during infancy—much of phonetic
space is unmapped.


Sparse categories:
      Similar temporal integration to exp 2

       Retain ambiguity (and partial
       representations) until more input is available.
                                          AEM Paradigm

Examination of sparseness/completeness of categories
needs a two-alternative task.

Anticipatory Eye Movements
(McMurray & Aslin, 2005)

Infants are trained to make anticipatory eye movements in
response to an auditory or visual stimulus.

Also useful with
   • Color
   • Shape
   • Spatial Frequency
   • Faces

Post-training, generalization can be
assessed with respect to both targets.

[Figure: display with pictures labeled "bear" and "pail"; Quicktime demo]
                                   Experiment 6

Anticipatory Eye Movements

Train:      Bear0: Left
            Pail35: Right

Test:       Bear0    Pear40
            Bear5    Pear35
            Bear10   Pear30
            Bear15   Pear25

Same naturally-produced tokens
from Exps 4 & 5.

[Figure: pictures labeled "palm" and "beach"]
Expected results

Sparse categories

[Figure: predicted performance as a function of VOT; high near the trained Bear and Pail tokens, with unmapped space around the adult boundary]
                                                           Results

Training tokens:
    % Correct: 67%
    9 / 16 better than chance.

[Figure: % correct (0 to 1) as a function of VOT (0 to 40 ms) for Beach and Palm]
                                             Infant Summary

Infants show graded sensitivity to subphonemic detail.

/b/-results: regions of unmapped phonetic space.

Statistical approach provides support for sparseness.
    • Given current learning theories, sparseness results
      from optimal starting parameters.

Empirical test will require a two-alternative task.
  • AEM: train infants to make eye-movements in
    response to stimulus identity.
                                                 Conclusions

Infants and adults are sensitive to subphonemic detail.

Sensitivity is important to adult and developing word
recognition systems.

       1) Short term cue integration.
       2) Long term phonology learning.

In both cases…
       Partially ambiguous material is retained until
       more data arrives.

       Partially active representations anticipate
       likelihood of future material.
                                        Conclusions


    Spoken language is defined by change.

     But the information to cope with it is
       in the signal—if we look online.

Within-category acoustic variation is signal, not
                    noise.
[Diagram: eye-tracking setup. IR head-tracker emitters and monitor; head-tracker camera and 2 eye cameras mounted on the head; subject computer and eyetracker computer connected via Ethernet]
Misperception: Additional Results
10 Pairs of b/p items.
   • 0 – 35 ms VOT continua.


20 Filler items (lemonade, restaurant, saxophone…)

Option to click "X" (Mispronounced).

26 Subjects

1240 Trials over two days.
                                                                              Identification Results

[Figure: response rate (Voiced, Voiceless, NW) as a function of VOT (0 to 35 ms) for two continua: Barricade to Parricade and Barakeet to Parakeet]

Significant target responses even at extreme.

Graded effects of VOT on correct response rate.
                                                                  Phonetic “Garden-Path”

“Garden-path” effect:
        Difference between looks to each target (b
vs. p) at same VOT.

[Figure: fixations to target over time (ms) for Barricade and Parakeet; panels for VOT = 0 (/b/) and VOT = 35 (/p/)]
[Figure: garden-path effect (Barricade - Parakeet) as a function of VOT (0 to 35 ms); separate panels for Target and Competitor]

GP Effect:
Gradient effect of VOT.
    Target: p<.0001
    Competitor: p<.0001
Assimilation: Additional Results
       runm picks

       runm takes     ***


When /p/ is heard, the bilabial feature can be
assumed to come from assimilation (not an
underlying /m/).

When /t/ is heard, the bilabial feature is likely to be
from an underlying /m/.
                                  Exp 3 & 4: Conclusions


Within-category detail used in recovering from
assimilation: temporal integration.

   • Anticipate upcoming material
   • Bias activations based on context
      - Like Exp 2: within-category detail retained to
        resolve ambiguity.

Phonological variation is a source of information.
Subject hears
   “select the mud    drinker”
   “select the mudg   gear”      Critical Pair
   “select the mudg   drinker”
[Figure: fixation proportion over time (0 to 2000 ms) from the onset of “gear” (avg. offset of “gear”: 402 ms); Initial Coronal: Mud Gear vs. Initial Non-Coronal: Mug Gear]

Mudg Gear is initially ambiguous with a late bias
towards “Mud”.
[Figure: fixation proportion over time (0 to 2000 ms) from the onset of “drinker” (avg. offset of “drinker”: 408 ms); Initial Coronal: Mud Drinker vs. Initial Non-Coronal: Mug Drinker]

Mudg Drinker is also ambiguous with a late bias towards
“Mug” (the /g/ has to come from somewhere).
[Figure: fixation proportion over time (0 to 600 ms) from the onset of “gear”; Assimilated vs. Non Assimilated]

Looks to non-coronal (gear) following assimilated or
non-assimilated consonant.

In the same stimuli/experiment there is also a
progressive effect!

								