					Subphonemic detail is used in
  spoken word recognition:

   Temporal Integration at
      Two Time Scales


        Bob McMurray
            Grateful Thanks to:

Advisors                Collaborators
Dick Aslin              Meghan Clayards
Mike Tanenhaus          David Gow

Committee               Saviors in the Lab
Joyce McDonough         Julie Markant
David Knill             Dana Subik
Christopher Brown

People who put up with me
Kate Pirog         Kathy Corser      Bette McCormick
Andrea Lathrop     Jennifer Gillis
Meaningful stimuli are almost always temporal.

   Scene Perception: build stable representation
      across multiple eye-movements, attention shifts.

   Music: series of notes. Temporal properties (order
     and rhythm) are fundamental.
                   Language as Temporal Integration


Temporal Integration fundamental to language, as it
appears in the world.

•Word: Ordered series of articulations.

•Sentence: Sequence of words.

•A Language: Series of utterances.

   Phonology, syntax extracted from this series of
   utterances.
How are abstract representations formed?

Stimuli do not change arbitrarily.

At any point in time, subtle, perceptual cues tell the
system something about the change itself.

Enable an active integration process.
     Anticipating future events
     Retain partial present representations.
     Resolve prior ambiguity.
Word recognition is an ideal arena:
  • Substantial perceptual information available.
  • Multiple timescales for integration.



But:
       Early evidence suggested that this
       perceptual information is not maintained.
                                           Overview


1) Continuous perceptual variation affects word
   recognition.

2) A new framework for word recognition.

3) Integrating speech cues in online recognition.

4) Long-term temporal integration: development.

5) The use of continuous detail during development.

6) Conclusions
                            Speech and Word Recognition


     Acoustic                 Speech Perception
                              • Categorization of acoustic
                                input into sublexical units.


                       Sublexical Units
                       /a/ /la/ /ip/
                          /b/ /l/ /p/


Word Recognition                           Lexicon
• Identification of target word
  from active sublexical units.
Word Recognition as temporal ambiguity resolution

• Information arrives sequentially
• At early points in time, signal is temporarily
  ambiguous.

[Figure: as the input "ba…kery" unfolds, the candidates basic, barrier, barricade, bait and baby are ruled out (X), leaving bakery.]
• Later arriving information disambiguates the word.
Current models of spoken word recognition

• Immediacy: Hypotheses formed from the earliest
  moments of input.

• Activation Based: Lexical candidates (words)
  receive activation to the degree they match the
  input.

• Parallel Processing: Multiple items are active in
  parallel.

• Competition: Items compete with each other for
  recognition.
Input:    b...   u…   tt…   e…   r
  time

 beach
 butter
 bump
putter
   dog
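
The four properties listed above can be illustrated with a toy simulation. This is only a minimal sketch, not any published model: the matching rule (position-by-position segment overlap), the normalization step standing in for competition, the segment labels, and the function names are all assumptions made for illustration.

```python
def match_score(word, heard):
    """Fraction of the segments heard so far that match the word, position by position."""
    hits = sum(1 for w, h in zip(word, heard) if w == h)
    return hits / len(heard)


def activations_over_time(lexicon, input_segments):
    """Return one dict of normalized activations after each input segment."""
    history = []
    for t in range(1, len(input_segments) + 1):
        heard = input_segments[:t]
        # Immediacy + parallel processing: every word is scored from the first segment on.
        raw = {w: match_score(segs, heard) for w, segs in lexicon.items()}
        # Competition (assumed form): activations are normalized to sum to 1.
        total = sum(raw.values()) or 1.0
        history.append({w: a / total for w, a in raw.items()})
    return history


# Hypothetical segmentations, chosen only to mirror the words on the slide.
lexicon = {
    "butter": ["b", "uh", "t", "er"],
    "bump": ["b", "uh", "m", "p"],
    "beach": ["b", "ee", "ch"],
    "putter": ["p", "uh", "t", "er"],
    "dog": ["d", "aw", "g"],
}
history = activations_over_time(lexicon, ["b", "uh", "t", "er"])
for step, acts in enumerate(history, start=1):
    ranked = sorted(acts.items(), key=lambda kv: -kv[1])
    print(step, [(w, round(a, 2)) for w, a in ranked])
```

In this sketch all three b-initial words are tied after the first segment, and "butter" only pulls ahead of "bump" and "putter" as later segments arrive.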
These processes have been well defined for a
phonemic representation of the input.

[Figure: phonemic transcription of an example input.]

But there may be considerably less ambiguity in the
signal if we consider subphonemic information.

Example: subphonemic effects of motor processes.
                                              Coarticulation

Any action reflects future actions as it unfolds.

Example: Coarticulation
   Movements of articulators (lips, tongue…) during
   speech reflect current, future and past events.

   Yields subtle subphonemic variation in speech that
   reflects temporal organization.

   Example: net vs. neck

   Sensitivity to these perceptual details might
   yield earlier disambiguation.
These processes have largely been ignored
because of a history of evidence that perceptual
variability gets discarded.

      Example: Categorical Perception
                                     Categorical Perception


[Figure: identification (% /pa/) and discrimination plotted against VOT, from B to P.]

 • Sharp identification of tokens on a continuum.
 • Discrimination poor within a phonetic category.

Subphonemic variation in VOT is discarded in favor
of a discrete symbol (phoneme).
Evidence against the strong form of Categorical
Perception comes from a variety of
psychophysical-type tasks:

      Discrimination Tasks
       Pisoni and Tash (1974)
       Pisoni & Lazarus (1974)
       Carney, Widin & Viemeister (1977)
      Training
       Samuel (1977)
       Pisoni, Aslin, Perey & Hennessy (1982)
      Goodness Ratings
       Miller (1997)
       Massaro & Cohen (1983)
Does within-category acoustic detail
 systematically affect higher level
            language?


    Is there a gradient effect of
   subphonemic detail on lexical
             activation?
                  McMurray, Tanenhaus & Aslin (2002)


A gradient relationship would yield systematic effects
of subphonemic information on lexical activation.


If this gradiency is useful for temporal integration, it
must be preserved over time.


Need a design sensitive to both acoustic detail and
detailed temporal dynamics of lexical activation.
                                                Acoustic Detail

Use a speech continuum: more steps yield a
better picture of the acoustic mapping.

KlattWorks: generate synthetic continua from
natural speech.

    9-step VOT continua (0-40 ms)

    6 pairs of words.
        beach/peach      bale/pale    bear/pear
        bump/pump        bomb/palm    butter/putter

    6 fillers.
        lamp     leg     lock   ladder lip     leaf
        shark    shell   shoe   ship   sheep   shirt
                                     Temporal Dynamics

       How do we tap on-line recognition?
       With an on-line task: Eye-movements

Subjects hear spoken language and manipulate
objects in a visual world.

Visual world includes set of objects with interesting
linguistic properties.

   a beach, a peach and some unrelated items.

Eye-movements to each object are monitored
throughout the task.

          Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995
Why use eye-movements and visual world paradigm?

   •Relatively natural task.

   •Eye-movements generated very fast (within 200ms
    of first bit of information).

   •Eye movements time-locked to speech.

   •Subjects aren’t aware of eye-movements.

   •Fixation probability maps onto lexical activation.
              Task

A moment to view the items.

Bear

Repeat 1080 times.
                                               Identification Results

[Figure: proportion /p/ responses as a function of VOT (0-40 ms), from B to P.]

High agreement across subjects and items for category boundary.

                  By subject:                      17.25 +/- 1.33ms
                  By item:                         17.24 +/- 1.24ms
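
One simple way to estimate a category boundary like those reported here is to find where the identification function crosses 50% /p/ responses. The sketch below uses linear interpolation between adjacent continuum steps. That method, the function name, and the example proportions are assumptions for illustration, not the procedure or data behind the values above.

```python
def boundary_50(vots, prop_p):
    """Linearly interpolate the VOT at which proportion /p/ first crosses 0.5."""
    points = list(zip(vots, prop_p))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < 0.5 <= y1:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("identification function never crosses 0.5")


vots = [0, 5, 10, 15, 20, 25, 30, 35, 40]  # 9-step continuum, in ms
# Hypothetical identification proportions for one subject (not the study's data).
prop_p = [0.02, 0.03, 0.10, 0.35, 0.75, 0.93, 0.97, 0.98, 0.99]
print(boundary_50(vots, prop_p))
```

Running this per subject (or per item) and averaging would give a mean boundary with a spread, which is the form in which the boundaries above are reported.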
                                               Task

[Figure: fixation records for trials 1-5 plotted against time, with a 200 ms interval marked.]

Target = Bear
Competitor = Pear
Unrelated = Lamp, Ship
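
Per-trial fixation records of this kind are typically turned into fixation-proportion curves by counting, at each time slice, the share of trials on which each object was being fixated. The following is a minimal sketch under that assumption. The data layout, the function name, and the example records are hypothetical.

```python
def fixation_proportions(trials, objects):
    """Proportion of trials fixating each object in each time slice.

    trials: list of equal-length lists; trials[i][t] names the object
    fixated on trial i during time slice t.
    """
    n_slices = len(trials[0])
    props = {}
    for obj in objects:
        counts = [sum(1 for trial in trials if trial[t] == obj)
                  for t in range(n_slices)]
        props[obj] = [c / len(trials) for c in counts]
    return props


# Hypothetical fixation records (5 trials x 4 time slices), not real data.
trials = [
    ["none", "target", "target", "target"],
    ["none", "competitor", "target", "target"],
    ["none", "none", "competitor", "target"],
    ["none", "unrelated", "target", "target"],
    ["none", "target", "target", "target"],
]
props = fixation_proportions(trials, ["target", "competitor", "unrelated"])
print(props["target"])
print(props["competitor"])
```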
                                               Task

[Figure: fixation proportion over time (0-2000 ms), with panels for VOT=0 and VOT=40.]

              More looks to competitor than unrelated items.
                                               Task

Given that
   • the subject heard bear
   • clicked on "bear"…
how often was the subject looking at the "pear"?

[Figure: schematic fixation proportions over time for target and competitor under two possible outcomes: Categorical Results and Gradient Effect.]
                                               Results

[Figure: competitor fixations (0-0.16) by time since word onset (0-2000 ms), in two response panels; one curve per VOT step: 0, 5, 10 and 15 ms in the first panel, 20, 25, 30, 35 and 40 ms in the second.]

                       Long-lasting gradient effect: seen throughout
                       the timecourse of processing.
[Figure: competitor fixations (0.02-0.08) as a function of VOT (0-40 ms), shown separately for each response on either side of the category boundary.]

Area under the curve:
 Clear effects of VOT   B: p=.017*     P: p<.001***
        Linear Trend    B: p=.023*     P: p=.002***
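
The area-under-the-curve measure named above summarizes a whole competitor-fixation time course as a single number per VOT step. A minimal sketch using the trapezoidal rule follows. The integration rule, the function name, and the example values are assumptions, since the slide does not say how the area was computed.

```python
def area_under_curve(times_ms, fixations):
    """Trapezoidal area under a fixation-proportion curve (proportion x ms)."""
    area = 0.0
    for i in range(1, len(times_ms)):
        width = times_ms[i] - times_ms[i - 1]
        area += width * (fixations[i] + fixations[i - 1]) / 2
    return area


# Hypothetical competitor-fixation curve for one VOT step (not real data).
times_ms = [0, 400, 800, 1200]
fixations = [0.0, 0.10, 0.05, 0.0]
print(area_under_curve(times_ms, fixations))
```

Computing one such area per VOT step gives the values that could then be tested for an overall effect of VOT and for a linear trend.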
[Figure: competitor fixations (0.02-0.08) as a function of VOT (0-40 ms), shown separately for each response on either side of the category boundary.]

Unambiguous Stimuli Only
 Clear effects of VOT   B: p=.014*     P: p=.001***
        Linear Trend    B: p=.009**    P: p=.007**
                                                  Summary


Subphonemic acoustic differences in VOT have gradient
effect on lexical activation.
   • Gradient effect of VOT on looks to the competitor.
   • Effect holds even for unambiguous stimuli.
   • Seems to be long-lasting.

Consistent with growing body of work using priming
(Andruski, Blumstein & Burton, 1994; Utman, Blumstein &
Burton, 2000; Gow, 2001, 2002).
                                 The Proposed Framework

                   Sensitivity & Use
1) Word recognition is systematically sensitive to
   subphonemic acoustic detail.

2) Acoustic detail is represented as gradations in activation
   across the lexicon.

3) This sensitivity enables the system to take advantage of
   subphonemic regularities for temporal integration.

4) This has fundamental consequences for development:
   learning phonological organization.
                                        Lexical Sensitivity

1) Word recognition is systematically sensitive to
   subphonemic acoustic detail.


       McMurray, Tanenhaus and Aslin (2002)

       Other phonetic contrasts (exp. 1)
       Non minimal-pairs (exp. 2)
       During development (exps. 3 & 4)
                                               Lexical Basis

2) Acoustic detail is represented as gradations in activation
   across the lexicon.


     Lexicon forms a high dimensional basis
     vector for acoustic/phonetic space.

     No unneeded dimensions (features)
     coded—represents only possible
     alternatives.
2) Acoustic detail is represented as gradations in activation
   across the lexicon.


    Input:    b...     u…        m…        p…
      time

      bump
      pump
      dump
        bun
    bumper
      bomb
                                     Temporal Integration

3) This sensitivity enables the system to take advantage of
   subphonemic regularities for temporal integration.

    Short term cue integration (exp 1):
       •Cues to phonetic distinctions are spread out
        over time.
       •Lexical activation retains probabilistic
        representation of input as information
        accumulates.

    Longer term ambiguity resolution (exp 2):
      •Early, ambiguous material retained until
        more information arrives.
                                           Development

4) Consequences for development: learning phonological
   organization.


 Learning a language:
    •Integrating input across many utterances to
     build long-term representation.

 Sensitivity to subphonemic detail (exp 3 & 4).
    •Allows statistical learning of categories (exp 5).
                                Experiment 1




1) Do lexical representations serve as
   a locus for short-term temporal
   integration of acoustic cues?


2) Can we see sensitivity to
   subphonemic detail in additional
   phonetic contexts?
                                   Phonetic Context

Asynchronous cues to voicing:
                VOT  Vowel Length

Both covary with speaking rate: rate normalization




       VOT   Vowel Length
Manner of Articulation

Formant Transition Slope (FTSlope):
A temporal cue that, like VOT, covaries with vowel length.


                belt




                welt
                                     Alternative Models

VOT precedes Vowel Length.
Online processing: how are these cues integrated?

Model 1: Sublexical integration
   time

          VOT       Vowel Length



          Sublexical Rep. (phonemes)
          Sublex.



                                    The Lexicon
VOT precedes Vowel Length.
Online processing: how are these cues integrated?

Model 2: Lexical Integration (proposed framework)
   time

          VOT       Vowel Length




             Partial representation     More complete
             retained...                representation…
                       The Lexicon
Eye-movements reveal lexical activation…



   Will the temporal pattern of fixations
    to lexical competitors reveal when
    acoustic information contacts the
                  lexicon?
2 Vowel Lengths x 4 continua:

    9-step VOT continua (0-40 ms)
        beach/peach    beak/peak    bees/peas
    9-step formant transition slope
        bench/wench    belt/welt    bell/well

Fillers:
    9-step F3 onset (place)
        dune/goon      dew/goo      deuce/goose
    9-step F3 onset (laterality)
        lake/rake      lei/rai      lace/race

    •No effect of vowel length
    •Extend gradiency to new continua
                                           Task

Same task as McMurray et al (2002)




                                     40 Subjects
                                     1080 Trials
                                          Analysis

1) Validate methods with identification (mouse
   click) data.


2) Extend gradient effects of subphonemic detail to
      • Multiple dimensions
      • New phonetic contrasts


3) Disambiguate integration models by examining
   when effects are seen.
                             Results: Stimulus Validation

1) Identification: Expected Results (from literature)

                         Long          Short
            B/P         More /b/      More /p/

            B/W         More /b/      More /w/

            R/L              No difference

            D/G              No difference
[Figure: identification functions for long and short vowels in four panels: B/P (% /p/ response by VOT, 0-40 ms), B/W (% /w/ response by FTStep, 1-9), L/R (% /r/ response by step, 1-9) and D/G (% /g/ response by step, 1-9).]
                            Stimulus Validation

        Long       Short
 B/P   More /b/   More /p/       ✓
 B/W   More /b/   More /w/       ✓
 R/L      No difference          ✓
 D/G      No difference          ✓
                                      Results: Gradiency

2) Eye-movements: Predicted Results


         Continuum   Vowel   Finding
  B/P        ✓         ✓     Replicate prior work; 2D gradiency
  B/W        ✓         ✓     Extend gradiency to manner; 2D gradiency
  R/L        ✓         ✓     Extend gradiency to laterality; validate methods
  D/G        ✓         ✓     Extend gradiency to place; validate methods
                                               B/P

[Figure: fixations to competitor (0-0.18) by distance from category boundary (-25 to 25), for long and short vowels.]

   F3 onset       B: p<.001    P: p=.002
   Vowel          B: p=.006    P: p=.061
   Interaction    B: p>.1      P: p=.027
                                      Summary: Gradiency

Across continua, looks to competitors validated gradient
hypothesis.

         Continuum   Vowel   Finding
  B/P    p=.0015     .006    Replicate prior work; 2D gradiency
  B/W      .001       .05    Extend gradiency to FT Slope; 2D gradiency
  R/L      .001       >.1    Extend gradiency to F3; validate methods
  D/G      .017       >.1    Extend gradiency to place; validate methods
                         Results: Temporal Dynamics



When do effects occur?

   VOT / FTStep effects co-occur with vowel length.
     (Sublexical Integration)


   VOT / FTStep effects precede vowel length.
     (Lexical locus)
   Compute 3 effect sizes at each 20 ms time slice.

           •VOT / FTStep: Regression slope of competitor
            fixations as a function of VOT.

[Figure: left, competitor fixations over time (0-2000 ms), one curve per VOT distance from the boundary (-25 to -5); right, competitor fixations against distance from boundary (VOT) at a single time slice, with a fitted line. Shown for Time = 720 ms (Y = M720x + B) and Time = 740 ms (Y = M740x + B).]
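
The slope measure on this slide amounts to fitting a separate regression line at every time slice and keeping its slope (M720, M740, and so on). A minimal sketch using ordinary least squares follows. The data layout, the function names, and the example curves are hypothetical, and the example has 3 slices rather than one per 20 ms.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def slopes_by_time(curves, distances):
    """curves[d] is the competitor-fixation time series for boundary distance d.

    Returns one slope per time slice: fixations regressed on distance.
    """
    n_slices = len(curves[distances[0]])
    return [
        ols_slope(distances, [curves[d][t] for d in distances])
        for t in range(n_slices)
    ]


distances = [-25, -20, -15, -10, -5]  # VOT distance from category boundary
# Hypothetical curves, built so that the VOT effect grows over the three slices.
curves = {d: [0.02, 0.10 + 0.001 * d, 0.12 + 0.002 * d] for d in distances}
slopes = slopes_by_time(curves, distances)
print([round(m, 4) for m in slopes])
```

Repeating this for every subject yields one slope per subject per time slice, which can then be compared with the other effect sizes over time.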
Compute 3 effect sizes at each 20 ms time slice.


  •Vowel Length: Difference (D) between fixations
   after hearing long vs. short vowel.

  •Repeat for each time slice, subject.

[Figure: competitor fixations after long vs. short vowels at Time = 340 ms; L-S = D.]
Compute 3 effect sizes at each 20 ms time slice.

•Unrelated: Difference between looks to target
 after an experimental vs. filler stimulus.

  Information available from the earliest moments
  of processing: subjects should show early effect.

  Does analysis have sufficient power?
Resulting dataset…

 Subject   Time      Unrelated   VOT (M)   Vowel (D)
 1         20        0.02076     -0.0023   0.0094
           40        0.02446     -0.0016   0.0095
           60        0.02916     -0.0008   0.0108
           …
           2000      0.99871     0.06021   0.123
 2         20        0.05642     0.0014    0.0091
           40        0.07126     0.0018    0.0088
           60        0.08926     0.0029    0.0104
           …
           2000      0.99261     0.0604    0.1223
 …
                                             Results: Temporal Dynamics


 Model 1: Sublexical integration

          Effect of VOT / FTStep appears at same time as
          Vowel Length

 Model 2: Lexical Locus

          Effect of VOT / FTStep precedes Vowel Length

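[Figure: the two models side by side over time. Model 1: VOT and Vowel Length feed a sublexical representation (phonemes), which feeds the lexicon. Model 2: VOT and Vowel Length feed the lexicon directly; a partial representation is retained until a more complete representation is available.]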
                                         B/P: Effects on looks to Competitor

[Figure: normalized effect size over time (0-1200 ms) for Vowel, VOT and UR, looks to competitor, combined (b/p).]

  Little sequentiality: vowel length and VOT
  effects appear at same time.
                                   Looks to competitor (b/p)
[Figure: two panels (B, P). Effect size (normalized) vs. time (ms), 0 to 1200 ms; series: Vowel, VOT, UR.]

Some sequentiality on voiced side.

None on voiceless.
                                       B/P Summary


Limited sequentiality of effects supports some kind
of sublexical integration.

   •Voiced: ~sequential effects.
   •Voiceless: effect of VOT simultaneous with
    vowel length.

VOT requires at least some portion of the vowel for
lexical interpretation.
   •Voiceless sounds need “more”.
   •Consistent with prior measurement and
     perceptual work.
                                        B/W: Effects on looks to Competitor

[Figure: looks to competitor, combined (b/w). Effect size (normalized) vs. time (ms), 0 to 1200 ms; series: Vowel, Step, UR.]


  Clearly sequential—FTStep effects appear
  before vowel length.
                                        Looks to competitor (b/w)
[Figure: two panels (B, W). Effect size (normalized) vs. time (ms), 0 to 1200 ms; series: UR, Step, Vowel.]

Clear sequentiality on both sides.
                                         B/W Summary



Manner of Articulation
   •Clear sequential effects on competitor.
   •Support lexical locus of temporal integration.


Formant transition slope may not work similarly to VOT.

   •Is VOT the right cue for voicing?

   •What was actually manipulated?
     FTSlope vs. Transition Duration
                             Experiment 1 Conclusions


Gradient effect on lexical activation extended to

   •Multi-dimensional categories
        VOT & Vowel Length
        FTStep & Vowel Length

   •Additional phonetic dimensions
       B/W: Manner of articulation
       R/L: Laterality
       D/G: Place of Articulation
Temporal Integration:

  •VOT effect precedes vowel length only for voiced
   sounds:
       Some vowel required to interpret VOT.

  •FTStep effect precedes vowel length.
       Supports lexical integration.
                                         Experiment 2



Lexical activation can play a role in integrating
multiple phonemic cues.


How long is the information available?

How is information at multiple levels integrated?
                                         Misperception


What if a stimulus was misperceived?

      Competitor still active
          -- easy to activate it rest of the way.

      Competitor completely inactive
          -- system will “garden-path”.

P(misperception) depends on distance from boundary.

Gradient activation allows the system to hedge its bets.
barricade vs. parakeet    /beIkeId/ vs.
                          /peIkit/
  Input:     p/b   eI             k
     time
             i  t…
   Categorical Lexicon
  parakeet
 barricade


   Gradient Sensitivity
  parakeet
 barricade
                                            Methods

10 Pairs of b/p items.

    Voiced         Voiceless      Overlap
    Bumpercar      Pumpernickel   6
    Barricade      Parakeet       5
    Bassinet       Passenger      5
    Blanket        Plankton       5
    Beachball      Peachpit       4
    Billboard      Pillbox        4
    Drain Pipes    Train Tracks   4
    Dreadlocks     Treadmill      4
    Delaware       Telephone      4
    Delicatessen   Television     4
Each pair: 0 – 35 ms VOT continuum.


20 Filler items (lemonade, restaurant, saxophone…)

Option to click “X” (Mispronounced).

26 Subjects

1240 Trials over two days.
                                                                             Identification Results

[Figure: response rate (0 to 1.00) vs. VOT (0 to 35 ms), two panels: Barricade to Parricade and Barakeet to Parakeet; series: Voiced, Voiceless, NW.]

Significant target responses even at extreme.

Graded effects of VOT on correct response rate.
                                                                     Eye Movement Results


[Figure: fixations to target (0 to 1) vs. time (ms), two panels: Barricade -> Parricade and Parakeet -> Barakeet; one curve per VOT step (0, 5, 10, 15, 20, 25, 30, 35 ms).]


                              Faster activation of target as VOTs approach
                              lexical endpoint.

                                 • Even within the non-word range.
                                                                Phonetic “Garden-Path”


                        “Garden-path” effect:
                             Difference between looks to each target
                             (b vs. p) at same VOT.

[Figure: fixations to target (0 to 1) vs. time (ms) for Barricade and Parakeet, two panels: VOT = 0 (/b/) and VOT = 35 (/p/).]

[Figure: garden-path effect (Barricade - Parakeet) vs. VOT (0 to 35 ms), plotted separately for Target and Competitor.]

GP Effect:
Gradient effect of VOT.
   Target: p<.0001
   Competitor: p<.0001
                           Experiment 2 Conclusions

Gradient effect of within-category variation
without minimal-pairs.

Gradient effect long-lasting: mean POD = 240 ms.

Regressive ambiguity resolution:

   •Subphonemic gradations maintained until
    more information arrives.
   •Subphonemic gradation can improve (or
    hinder) recovery from garden path.
                                       Adult Summary




Lexical activation is exquisitely sensitive to
within-category detail.


This sensitivity is useful to integrate material
over time.
                                       Development

Historically, work in speech perception has been
linked to development.

Sensitivity to subphonemic detail must revise our
view of development.


Use: Infants face an additional problem of
temporal integration:

      Extracting a phonology from the series of
      utterances they hear.
Sensitivity to subphonemic detail:

For 30 years, virtually all attempts to address this
question have yielded categorical discrimination.


   Exception: Miller & Eimas (1996).
     •Only at extreme VOTs.
     •Only when habituated to non-
      prototypical token.
                                                    Use?


Nonetheless, infants possess abilities that would
require within-category sensitivity.


  •Infants can use allophonic differences at word
   boundaries for segmentation (Jusczyk, Hohne
   & Bauman, 1999; Hohne & Jusczyk, 1994)

  •Infants can learn phonetic categories from
   distributional statistics (Maye, Werker &
   Gerken, 2002; Maye & Weiss, 2004).
                         Statistical Category Learning


Speech production causes clustering along contrastive
phonetic dimensions.

E.g. Voicing / Voice Onset Time
      B:     VOT ~ 0
      P:     VOT ~ 40


•Within a category, VOT forms Gaussian
 distribution.

•Result: Bimodal distribution

[Figure: bimodal distribution over VOT with modes near 0 ms and 40 ms.]
To statistically learn speech categories, infants must:

    •Record frequencies of tokens at each value
     along a stimulus dimension.

    •Extract categories from the distribution.

[Figure: frequency vs. VOT; +voice mode near 0 ms, -voice mode near 50 ms.]

    •This requires ability to track specific VOTs.
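
A minimal Python sketch of the two steps above, under assumed values. Tokens are drawn from two equally frequent Gaussians centered at 0 and 40 ms VOT with a 5 ms standard deviation, tallied in 5 ms bins, and the local peaks of the tally are read off as candidate categories. The function names, bin width, and 5% minimum share are illustrative choices, not part of the study.

```python
import random
from collections import Counter


def sample_vots(n, seed=0):
    """Draw n VOT tokens (ms) from an assumed two-category mixture."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(n):
        # Assumption: /b/ near 0 ms, /p/ near 40 ms, equal frequency, SD 5 ms.
        mean = 0.0 if rng.random() < 0.5 else 40.0
        tokens.append(rng.gauss(mean, 5.0))
    return tokens


def tally(tokens, bin_width=5.0):
    """Record how many tokens fall in each VOT bin, keyed by bin center."""
    counts = Counter()
    for vot in tokens:
        center = round(vot / bin_width) * bin_width
        counts[center] += 1
    return counts


def find_modes(counts, bin_width=5.0, min_share=0.05):
    """Return bin centers that are well-populated local maxima of the tally."""
    total = sum(counts.values())
    modes = []
    for center, count in sorted(counts.items()):
        left = counts.get(center - bin_width, 0)
        right = counts.get(center + bin_width, 0)
        if count >= min_share * total and count > left and count > right:
            modes.append(center)
    return modes


if __name__ == "__main__":
    counts = tally(sample_vots(2000))
    print(find_modes(counts))
```

The tally only works if individual VOT values are kept apart: collapsing every token to /b/ or /p/ before counting would leave nothing to extract categories from.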
                                         Experiment 3

Why no demonstrations of sensitivity?

   • Habituation
        Discrimination not ID.
        Possible selective adaptation.
        Possible attenuation of sensitivity.

   • Synthetic speech
         Not ideal for infants.

   • Single exemplar/continuum
         Not necessarily a category representation

Experiment 3: Reassess issue with improved methods.
                                                   HTPP


Head-Turn Preference Procedure
                               (Jusczyk & Aslin, 1995)

Infants exposed to a chunk of language:
      •Words in running speech.
      •Stream of continuous speech (à la statistical
       learning paradigm).
      •Word list.

After exposure, memory for exposed items (or
abstractions) is assessed by comparing listening time to
consistent items with inconsistent items.
Test trials start with all lights off.
Center Light blinks.
Brings infant’s attention to center.
One of the side-lights blinks.
                                   Beach…
                                   Beach…
                                   Beach…




When infant looks at side-light…
      …he hears a word
…as long as he keeps looking.
                                                  Methods


7.5 month old infants exposed to either 4 b-, or 4 p-words.

80 repetitions total.

Form a category of the exposed class of words.

    b-words   p-words
    Bomb      Palm
    Bear      Pear
    Bail      Pail
    Beach     Peach

Measure listening time on…

                             Exposed to b   Exposed to p
    Original words           Bear           Pear
    Competitors              Pear           Bear
    VOT closer to boundary   Bear*          Pear*

Stimuli constructed by cross-splicing
naturally produced tokens of each end point.

B:    M= 3.6 ms VOT
P:    M= 40.7 ms VOT

B*:   M=11.9 ms VOT
P*:   M=30.2 ms VOT

B* and P* were judged /b/ or /p/ at
least 90% consistently by adult listeners.

B*: 97%
P*: 96%
                               Novelty or Familiarity?

Novelty/Familiarity preference varies across infants
and experiments.

We’re only interested in the middle stimuli (b*, p*).

Infants were classified as novelty or familiarity
preferring by performance on the endpoints.

       Novelty   Familiarity
 B     36        16
 P     21        12

Within each group will we see evidence for gradiency?
After being exposed to
      bear… beach… bail… bomb…

Infants who show a novelty effect…
      …will look longer for pear than bear.


What about in between?

[Figure: schematic listening time for Bear, Bear*, Pear under two predictions: Categorical and Gradient.]
                                                                 Results

                        Novelty infants (B: 36        P: 21)
[Figure: listening time (ms), 4000 to 10000, for Target, Target*, Competitor; series: exposed to B, exposed to P.]



          Target vs. Target*:                    p<.001
      Competitor vs. Target*:                    p=.017
                        Familiarity infants (B: 16        P: 12)

[Figure: listening time (ms), 4000 to 10000, for Target, Target*, Competitor; series: exposed to B, exposed to P.]



                          Target vs. Target*:       p=.003
                      Competitor vs. Target*:       p=.012
                             Infants exposed to /p/
[Figure: listening time (ms) for P, P*, B, two panels. Novelty (N=21): annotated p-values .024*, .009**. Familiarity (N=12): annotated p-values .028*, .018*.]
                              Infants exposed to /b/
[Figure: listening time (ms) for B, B*, P, two panels. Novelty (N=36): annotated p-values <.001**, >.1, >.2. Familiarity (N=16): annotated p-values .06, .15.]
                                 Experiment 3 Conclusions

Contrary to all previous work:

7.5 month old infants show gradient sensitivity to
subphonemic detail.
   • Clear effect for /p/
   • Effect attenuated for /b/.
Reduced effect for /b/… But:
[Figure: three schematic listening-time patterns across Bear, Bear*, Pear, labeled "Null Effect?", "Expected Result?", and "Actual result."]


•Bear* ≈ Pear

•Category boundary lies between Bear & Bear*
  • Between 3 ms and 11 ms.


•Will we see evidence for within-category
 sensitivity with a different range?
                                    Experiment 4

Same design as experiment 3.
VOTs shifted away from hypothesized boundary
Train:
        Bomb, Bear, Beach, Bale      (-9.7 ms VOT)

Test:
        Bomb, Bear, Beach, Bale      (-9.7 ms VOT)
        Bomb*, Bear*, Beach*, Bale*  (3.6 ms VOT)
        Palm, Pear, Peach, Pail      (40.7 ms VOT)
                             Familiarity infants (34 Infants)

[Figure: listening time (ms), 4000 to 9000, for B-, B, P; annotated p-values: =.01**, =.05*.]
                             Novelty infants (25 Infants)

[Figure: listening time (ms), 4000 to 9000, for B-, B, P; annotated p-values: =.02*, =.002**.]
                           Experiment 4 Conclusions


•Within-category sensitivity in /b/ as well as /p/.


•Shifted category boundary in /b/: not consistent
 with adult boundary (or prior infant work). Why?
/b/ results consistent with (at least) two mappings.
1) Shifted boundary

[Figure: category mapping strength vs. VOT for /b/ and /p/, with a shifted boundary.]


                    •Inconsistent with prior literature.

                    •Why would infants have this boundary?
HTPP is a one-alternative task.
    Asks:      B or not-B       not:         B or P


2) Sparse Categories

[Figure: category mapping strength vs. VOT; /b/ and /p/ separated by unmapped space around the adult boundary.]


Hypothesis:
Sparse categories: by-product of efficient learning.
                                  Computational Model

Distributional learning model

[Figure: category mapping strength vs. VOT; /b/ and /p/ with unmapped space around the adult boundary.]

1) Model distribution of tokens as a mixture of Gaussian
   distributions over phonetic dimension (e.g. VOT).

2) After receiving an input, the Gaussian with the
   highest posterior probability is the “category”.

3) Each Gaussian has three parameters:
       φ (weight), μ (mean), σ (standard deviation)
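
A short Python sketch of step 2, under stated assumptions. Each category is taken to be a Gaussian with a weight, mean, and standard deviation (here named `phi`, `mu`, `sigma`), and the category for a VOT is the one with the highest posterior probability. The example values in `EXAMPLE_CATEGORIES` are hypothetical, chosen only to make the sketch runnable.

```python
import math

# Hypothetical parameters for illustration only (VOT in ms).
EXAMPLE_CATEGORIES = [
    {"name": "b", "phi": 0.5, "mu": 0.0, "sigma": 8.0},
    {"name": "p", "phi": 0.5, "mu": 40.0, "sigma": 8.0},
]


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and SD sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posteriors(vot, categories):
    """Posterior probability of each category given one VOT value."""
    weighted = [c["phi"] * gaussian_pdf(vot, c["mu"], c["sigma"])
                for c in categories]
    total = sum(weighted)
    return [w / total for w in weighted]


def categorize(vot, categories):
    """Return (name, posterior) of the most probable category."""
    post = posteriors(vot, categories)
    best = max(range(len(post)), key=post.__getitem__)
    return categories[best]["name"], post[best]


if __name__ == "__main__":
    for vot in (0.0, 15.0, 20.0, 25.0, 40.0):
        print(vot, categorize(vot, EXAMPLE_CATEGORIES))
```

The posterior itself is graded: it is the final "highest posterior" step that makes the response categorical.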
Statistical Category Learning

1) Start with a set of randomly selected Gaussians.

2) After each input, adjust each parameter to find
   best description of the input.

3) Start with more Gaussians than necessary: the model
   doesn’t innately know how many categories.
       φ (weight) -> 0 for unneeded categories.

[Figure: Gaussians over VOT before and after learning.]
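
The slides do not state the update rule, so the following Python sketch swaps in one common choice: a responsibility-weighted incremental update, in which each input nudges every Gaussian's weight, mean, and variance in proportion to that Gaussian's posterior for the input. The learning rate, starting values, and input distribution (two Gaussians at 0 and 40 ms, SD 5 ms) are assumptions, and the pruning of unneeded categories in step 3 is not modeled here.

```python
import math
import random


def make_tokens(n, seed=1):
    """Assumed input: equal mix of N(0, 5) and N(40, 5) VOTs in ms."""
    rng = random.Random(seed)
    return [rng.gauss(0.0 if rng.random() < 0.5 else 40.0, 5.0)
            for _ in range(n)]


def pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def learn(tokens, mus, sigma0, rate=0.01):
    """Adjust (phi, mu, sigma) of each Gaussian after every input."""
    k = len(mus)
    phi = [1.0 / k] * k          # start with equal weights
    mu = list(mus)               # starting means
    var = [sigma0 ** 2] * k      # starting variances
    for x in tokens:
        like = [phi[i] * pdf(x, mu[i], math.sqrt(var[i])) for i in range(k)]
        total = sum(like)
        if total == 0.0:
            continue  # input too far from every Gaussian to assign
        for i in range(k):
            r = like[i] / total              # posterior for Gaussian i
            d = x - mu[i]
            phi[i] += rate * (r - phi[i])    # weights still sum to 1
            mu[i] += rate * r * d
            var[i] += rate * r * (d * d - var[i])
    return [(phi[i], mu[i], math.sqrt(var[i])) for i in range(k)]


if __name__ == "__main__":
    for params in learn(make_tokens(5000), mus=[10.0, 30.0], sigma0=3.0):
        print(params)
```

Starting with a small `sigma0` means each Gaussian initially claims only inputs close to its mean, which is the cautious starting point the following slides argue for.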
Overgeneralization
  • large σ
  • costly: lose phonetic distinctions…
Undergeneralization
  • small σ
  • not as costly: maintain distinctiveness.
To increase likelihood of successful learning:
   • err on the side of caution.
   • start with small σ

39,900 models run.

[Figure: P(Success), 0 to 1, vs. starting σ, 0 to 60; series: 2 Category Model, 3 Category Model.]
Small σ

Sparseness coefficient: % of space not strongly mapped
to any category.

[Figure: category mapping over VOT with unmapped space between categories.]

[Figure: average sparseness coefficient (0 to 0.4) vs. training epochs (0 to 12000); starting σ: .5-1.]
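
One way to compute such a coefficient, as a hedged Python sketch. The slides do not define "strongly mapped", so this version assumes a point of VOT space is mapped when at least one category's unnormalized Gaussian activation exceeds a threshold of 0.1, and evaluates that on a grid from -20 to 60 ms. The threshold, grid, and function name are all assumptions.

```python
import math


def sparseness(categories, lo=-20.0, hi=60.0, step=0.1, threshold=0.1):
    """Fraction of grid points whose strongest category activation
    falls below threshold. categories is a list of (mu, sigma) pairs."""
    n_points = int(round((hi - lo) / step)) + 1
    unmapped = 0
    for i in range(n_points):
        vot = lo + i * step
        # Activation peaks at 1.0 at the category mean for any sigma.
        strength = max(math.exp(-0.5 * ((vot - mu) / sigma) ** 2)
                       for mu, sigma in categories)
        if strength < threshold:
            unmapped += 1
    return unmapped / n_points


if __name__ == "__main__":
    print(sparseness([(0.0, 1.0), (40.0, 1.0)]))    # narrow categories
    print(sparseness([(0.0, 15.0), (40.0, 15.0)]))  # wide categories
```

Under these assumptions narrow categories leave most of the range unmapped, while wide ones cover it.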
Start with large σ


[Figure: category mapping over VOT.]

[Figure: average sparsity coefficient (0 to 0.4) vs. training epochs (0 to 12000); starting σ: .5-1, 20-40.]
Intermediate starting σ


[Figure: category mapping over VOT.]

[Figure: average sparsity coefficient (0 to 0.4) vs. training epochs (0 to 12000); starting σ: .5-1, 3-11, 12-17, 20-40.]
                                          Limitations

1) Occasionally model leaves sparse regions at the end
   of learning.
    • Competition/Choice framework:
        Additional competition or selection mechanisms
        during processing: categorization despite
        incomplete information.

2) Multi-dimensional categories
          1-D: 3  parameters / category
          2-D: 5  parameters / category
          3-D: 21 parameters / category
    • Incorporating cue/model-reliability may
       reduce dimensionality.
                        Non-parametric approach?

•Competitive Hebbian Learning
 (Rumelhart & Zipser, 1986).
•Not constrained by a particular
 equation: can fill space better.
•Similar properties in terms of
 starting σ and sparseness.

[Figure: categories over VOT.]
                                       Model Conclusions

To avoid overgeneralization…
      …better to start with small estimates for σ.

Small or even medium starting σ’s lead to sparse
category structure during infancy: much of
phonetic space is unmapped.


Sparse categories:
      Similar temporal integration to exp 2

      Retain ambiguity (and partial
      representations) until more input is available.
                                       Infant Summary

Infants show graded sensitivity to subphonemic detail.

/b/-results: regions of unmapped phonetic space.

Statistical approach provides support for sparseness.
   • Given current learning theories, sparseness
      results from optimal starting parameters.

Empirical test will require a two-alternative task.
  • AEM: train infants to make eye-movements in
    response to stimulus identity.
                                          Conclusions


Infant and adult word learning are sensitive to
subphonemic detail.


Sensitivity is important to adult and developing
word recognition systems.

      1) Short term cue integration.
      2) Long term phonology learning.

In both cases, partially ambiguous material is
retained until more data arrives.
                                                The Future?


Change is the law of life. And those who look only to
the past or present are certain to miss the future.
       -- John F. Kennedy
                                                 The Future?


Change is the law of life. And those [Word
Recognition Systems] who look only to the
past or present are certain to miss the future
[Acoustic Material].
       -- John F. Kennedy-[McMurray]



Subphonemic cues signal upcoming events.

Can the system use the information to prepare
itself for future material?
                               The Last Word


Spoken language is defined by change.

 But the information to cope with it is
             in the signal.

 Within-category acoustic variation is
          signal, not noise.
• Infants make anticipatory eye-movements along
  predicted trajectory, in response to stimulus identity.

• Two alternatives allows us to distinguish between
  category boundary and unmapped space.

				