Subphonemic detail is used in

spoken word recognition:



Temporal Integration at

Two Time Scales





Bob McMurray

Grateful Thanks to:

Advisors: Dick Aslin, Mike Tanenhaus

Collaborators: Meghan Clayards, David Gow

Committee: Joyce McDonough, David Knill, Christopher Brown

Saviors in the Lab: Julie Markant, Dana Subik

People who put up with me: Kate Pirog, Kathy Corser, Bette McCormick, Andrea Lathrop, Jennifer Gillis

Meaningful stimuli are almost always temporal.



Scene Perception: build stable representation

across multiple eye-movements, attention shifts.



Music: series of notes. Temporal properties (order

and rhythm) are fundamental.

Language as Temporal Integration





Temporal Integration fundamental to language, as it

appears in the world.



•Word: Ordered series of articulations.



•Sentence: Sequence of words.



•A Language: Series of utterances.



Phonology, syntax extracted from this series of

utterances.

How are abstract representations formed?



Stimuli do not change arbitrarily.



At any point in time, subtle, perceptual cues tell the

system something about the change itself.



Enable an active integration process.

Anticipating future events

Retain partial present representations.

Resolve prior ambiguity.

Word recognition is an ideal arena:

• Substantial perceptual information available.

• Multiple timescales for integration.







But:

Early evidence suggested that this

perceptual information is not maintained.

Overview





1) Continuous perceptual variation affects word

recognition.



2) A new framework for word recognition.



3) Integrating speech cues in online recognition.



4) Long-term temporal integration: development.



5) The use of continuous detail during development.



6) Conclusions

Speech and Word Recognition





Acoustic Speech Perception

• Categorization of acoustic

input into sublexical units.





Sublexical Units

/a/ /la/ /ip/

/b/ /l/ /p/





Word Recognition Lexicon

• Identification of target word

from active sublexical units.

Word Recognition as temporal ambiguity resolution



• Information arrives sequentially

• At early points in time, signal is temporarily

ambiguous.



Input: ba…kery

Candidates ruled out (X): basic, barrier, barricade, bait, baby

Remaining candidate: bakery

• Later arriving information disambiguates the word.

Current models of spoken word recognition



• Immediacy: Hypotheses formed from the earliest

moments of input.



• Activation Based: Lexical candidates (words)

receive activation to the degree they match the

input.



• Parallel Processing: Multiple items are active in

parallel.



• Competition: Items compete with each other for

recognition.

Input (unfolding over time): b… u… tt… e… r

Candidates: beach, butter, bump, putter, dog
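The four properties listed above can be illustrated with a toy sketch. It is not any particular published model: the scoring rule (share of position-by-position matches), the normalization used to stand in for competition, and the use of letters in place of phonemes are all assumptions made for illustration.

```python
def activations(heard, lexicon):
    """Toy activation for each word given the segments heard so far."""
    raw = {}
    for word in lexicon:
        # Activation Based: credit each position where the word matches the input.
        matches = sum(1 for a, b in zip(heard, word) if a == b)
        raw[word] = matches / len(heard) if heard else 0.0
    total = sum(raw.values())
    # Competition (assumed form): candidates share a fixed pool of activation.
    return {w: (v / total if total else 0.0) for w, v in raw.items()}


lexicon = ["beach", "butter", "bump", "putter", "dog"]
target = "butter"

# Immediacy + Parallel Processing: re-score every candidate after each segment.
history = [activations(target[:n], lexicon) for n in range(1, len(target) + 1)]

for n, acts in enumerate(history, start=1):
    ranked = sorted(acts.items(), key=lambda kv: -kv[1])
    print(target[:n].ljust(6), ", ".join(f"{w}={a:.2f}" for w, a in ranked))
```

After the first segment the three b-initial words tie; by the end of the input the target has pulled ahead while partially matching competitors such as "putter" keep some activation.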

These processes have been well defined for a

phonemic representation of the input.






But there may be considerably less ambiguity in the

signal if we consider subphonemic information.



Example: subphonemic effects of motor processes.

Coarticulation



Any action reflects future actions as it unfolds.



Example: Coarticulation

Movements of articulators (lips, tongue…) during

speech reflect current, future and past events.



Yields subtle subphonemic variation in speech that

reflects temporal organization.



Sensitivity to these perceptual details might yield earlier disambiguation.

These processes have largely been ignored

because of a history of evidence that perceptual

variability gets discarded.



Example: Categorical Perception

Categorical Perception





[Figure: identification (% /pa/) and discrimination plotted against VOT, from B to P; both vertical axes run 0-100.]



• Sharp identification of tokens on a continuum.

• Discrimination poor within a phonetic category.



Subphonemic variation in VOT is discarded in favor

of a discrete symbol (phoneme).

Evidence against the strong form of Categorical

Perception comes from a variety of

psychophysical-type tasks:



Discrimination Tasks

Pisoni and Tash (1974)

Pisoni & Lazarus (1974)

Carney, Widin & Viemeister (1977)

Training

Samuel (1977)

Pisoni, Aslin, Perey & Hennessy (1982)

Goodness Ratings

Miller (1997)

Massaro & Cohen (1983)

Does within-category acoustic detail

systematically affect higher level

language?





Is there a gradient effect of

subphonemic detail on lexical

activation?

McMurray, Aslin & Tanenhaus (2002)





A gradient relationship would yield systematic effects

of subphonemic information on lexical activation.





If this gradiency is useful for temporal integration, it

must be preserved over time.





Need a design sensitive to both acoustic detail and

detailed temporal dynamics of lexical activation.

Acoustic Detail



Use a speech continuum: more steps yield a better picture of the acoustic mapping.



KlattWorks: generate synthetic continua from

natural speech.



9-step VOT continua (0-40 ms)



6 pairs of words.

beach/peach bale/pale bear/pear

bump/pump bomb/palm butter/putter



6 fillers.

lamp leg lock ladder lip leaf

shark shell shoe ship sheep shirt

Temporal Dynamics



How do we tap on-line recognition?

With an on-line task: Eye-movements



Subjects hear spoken language and manipulate

objects in a visual world.



Visual world includes set of objects with interesting

linguistic properties.



a beach, a peach and some unrelated items.



Eye-movements to each object are monitored

throughout the task.



Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995

Why use eye-movements and visual world paradigm?



•Relatively natural task.



•Eye-movements generated very fast (within 200ms

of first bit of information).



•Eye movements time-locked to speech.



•Subjects aren’t aware of eye-movements.



•Fixation probability maps onto lexical activation.

Task

A moment to view the items.

Bear

Repeat 1080 times.

Identification Results

[Figure: proportion /p/ responses (0-1) as a function of VOT (0-40 ms), from B to P.]

High agreement across subjects and items for category boundary.

By subject: 17.25 +/- 1.33 ms
By item: 17.24 +/- 1.24 ms
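One common way to read a category boundary off an identification function is to find the 50% crossover. The sketch below does this by linear interpolation between continuum steps; the slides do not say how the boundary was actually estimated (a curve fit is equally plausible), and the response proportions are hypothetical.

```python
def boundary_50(vots, prop_p):
    """VOT at which the identification function first crosses 50% /p/."""
    points = list(zip(vots, prop_p))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < 0.5 <= y1:
            # Linear interpolation between the two steps that straddle 50%.
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    return None  # no crossover within the sampled range


vots = [0, 5, 10, 15, 20, 25, 30, 35, 40]  # 9-step continuum, ms
prop_p = [0.02, 0.03, 0.05, 0.20, 0.80, 0.95, 0.97, 0.98, 0.99]  # hypothetical data
boundary = boundary_50(vots, prop_p)
print(f"boundary = {boundary:.2f} ms")
```

Running this per subject or per item and then averaging would give summary values of the kind reported on the slide.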

Task



[Figure: fixations plotted trial by trial (trials 1-5) over time, with a 200 ms scale marker.]

Target = Bear
Competitor = Pear
Unrelated = Lamp, Ship

Task







[Figure: fixation proportion (0-0.9) over time (0-2000 ms), in two panels: VOT=0 and VOT=40, each labeled with the response given.]

More looks to competitor than unrelated items.

Task





Given that
• the subject heard bear
• clicked on "bear"…
How often was the subject looking at the "pear"?

[Figure: schematic fixation proportions over time for target and competitor, in two panels: Categorical Results and Gradient Effect.]

Results





[Figure: competitor fixations (0-0.16) over time since word onset (0-2000 ms), in two panels by response, one line per VOT step (0, 5, 10, 15, 20, 25, 30, 35, 40 ms).]

Long-lasting gradient effect: seen throughout the timecourse of processing.

[Figure: competitor fixations (0.02-0.08) as a function of VOT (0-40 ms), in two panels by response, with the category boundary marked.]



Area under the curve:

Clear effects of VOT B: p=.017* P: p.1

Distance from Category Boundary P: p=.027

Summary: Gradiency



Across continua, looks to competitors validated gradient

hypothesis.



Pair   Continuum   Vowel   Finding
B/P    P=.0015     .006    Replicate prior work; 2D gradiency
B/W    .001        .05     Extend gradiency to FT Slope; 2D gradiency
R/L    .001        >.1     Extend gradiency to F3; validate methods
D/G    .017        >.1     Extend gradiency to place; validate methods

Results: Temporal Dynamics







When do effects occur?



VOT / FTStep effect co-occurs with vowel length.

(Sublexical Integration)





VOT / FTStep precedes vowel length.

(Lexical locus)

Compute 3 effect sizes at each 20 ms time slice.



•VOT / FTStep: Regression slope of competitor

fixations as a function of VOT.



Time = 720 ms…

[Figure: left, competitor fixations over time (axis 0-2000), one line per VOT distance from the boundary (-25 to -5); right, competitor fixations at this time slice against distance from boundary (VOT), with the fitted line.]

Y = M_720 · x + B

Time = 740 ms…

Y = M_740 · x + B
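The per-slice slope M can be sketched as an ordinary least-squares fit of competitor fixations on distance from the boundary, repeated at each time slice. The function names and the fixation values below are hypothetical; only the distances (-25 to -5) and the two time slices come from the slides.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_by_time(fixations, distances):
    """fixations maps time (ms) to competitor fixation proportions, one per distance."""
    return {t: ols_slope(distances, props) for t, props in fixations.items()}


distances = [-25, -20, -15, -10, -5]  # ms of VOT from the category boundary
fixations = {  # hypothetical proportions for one subject
    720: [0.02, 0.04, 0.06, 0.08, 0.10],
    740: [0.03, 0.04, 0.05, 0.06, 0.07],
}
slopes = slopes_by_time(fixations, distances)
print(slopes)
```

Each slope is one entry in the VOT (M) column of the resulting dataset: one value per subject per 20 ms slice.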

Compute 3 effect sizes at each 20 ms time slice.





•Vowel Length: Difference (D) between fixations

after hearing long vs. short vowel.



Time = 340 ms…

[Figure: competitor fixations (0.064-0.084) after long vs. short vowels; L - S = D.]

•Repeat for each time slice, subject.
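The vowel-length effect D can be sketched as a difference of cell means, computed separately for every subject and time slice. The record layout and the numbers are hypothetical and serve only to show the bookkeeping.

```python
from collections import defaultdict


def vowel_effect(trials):
    """trials: (subject, time_ms, vowel, fixation) tuples, vowel in {"long", "short"}.

    Returns {(subject, time_ms): D} with D = mean(long) - mean(short).
    """
    # Each cell keeps a running [sum, count] per vowel condition.
    sums = defaultdict(lambda: {"long": [0.0, 0], "short": [0.0, 0]})
    for subject, time_ms, vowel, fix in trials:
        cell = sums[(subject, time_ms)][vowel]
        cell[0] += fix
        cell[1] += 1
    effects = {}
    for key, cells in sums.items():
        (long_sum, long_n), (short_sum, short_n) = cells["long"], cells["short"]
        if long_n and short_n:  # skip cells missing either condition
            effects[key] = long_sum / long_n - short_sum / short_n
    return effects


trials = [  # hypothetical competitor fixation proportions
    (1, 340, "long", 0.084), (1, 340, "long", 0.080),
    (1, 340, "short", 0.068), (1, 340, "short", 0.064),
]
effect = vowel_effect(trials)
print(effect)
```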

Compute 3 effect sizes at each 20 ms time slice.



•Unrelated: Difference between looks to target after an experimental vs. filler stimulus.



Information available from the earliest moments

of processing: subjects should show early effect.



Does analysis have sufficient power?

Resulting dataset…



Subject   Time   Unrelated   VOT (M)   Vowel (D)
1         20     0.02076     -0.0023   0.0094
          40     0.02446     -0.0016   0.0095
          60     0.02916     -0.0008   0.0108
          …
          2000   0.99871     0.06021   0.123
2         20     0.05642     0.0014    0.0091
          40     0.07126     0.0018    0.0088
          60     0.08926     0.0029    0.0104
          …
          2000   0.99261     0.0604    0.1223



Results: Temporal Dynamics





Model 1: Sublexical integration



Effect of VOT / FTStep appears at same time as

Vowel Length



Model 2: Lexical Locus



Effect of VOT / FTStep precedes Vowel Length



[Diagram: two timelines, each with VOT followed by Vowel Length feeding The Lexicon; one path passes through a Sublexical Rep. (phonemes). Panel labels: "Partial representation retained..." and "More complete representation…".]

B/P: Effects on looks to Competitor



Looks to competitor: combined (b/p).

[Figure: normalized effect size (-0.2 to 1.2) over time (0-1200 ms) for Vowel, VOT and UR.]

Little sequentiality: vowel length and VOT effects appear at same time.

Looks to competitor (b/p)

[Figure: normalized effect size (-0.2 to 1.2) over time (0-1200 ms) for Vowel, VOT and UR, in two panels: B and P.]

B: Some sequentiality on voiced side.
P: None on voiceless.

B/P Summary





Limited sequentiality of effects supports some kind

of sublexical integration.



•Voiced: ~sequential effects.

•Voiceless: effect of VOT simultaneous with

vowel length.



VOT requires at least some portion of the vowel for

lexical interpretation.

•Voiceless sounds need "more".

•Consistent with prior measurement and

perceptual work.

B/W: Effects on looks to Competitor



Looks to competitor: combined (b/w).

[Figure: normalized effect size (-0.4 to 1.2) over time (0-1200 ms) for Vowel, Step and UR.]

Clearly sequential: FTStep effects appear before vowel length.

Looks to competitor (b/w)

[Figure: normalized effect size (-0.4 to 1.2) over time (0-1200 ms) for UR, Step and Vowel, in two panels: B and W.]

Clear sequentiality on both sides.

B/W Summary







Manner of Articulation

•Clear sequential effects on competitor.

•Support lexical locus of temporal integration.





Formant transition slope may not work similarly to VOT.



•Is VOT the right cue for voicing?



•What was actually manipulated?

FTSlope vs. Transition Duration

Experiment 1 Conclusions





Gradient effect on lexical activation extended to



•Multi-dimensional categories

VOT & Vowel Length

FTStep & Vowel Length



•Additional phonetic dimensions

B/W: Manner of articulation

R/L: Laterality

D/G: Place of Articulation

Temporal Integration:



•VOT effect precedes vowel length only for voiced

sounds:

Some vowel required to interpret VOT.



•FTStep effect precedes vowel length.

Supports lexical integration.

Experiment 2







Lexical activation can play a role in integrating

multiple phonemic cues.





How long is the information available?



How is information at multiple levels integrated?

Misperception





What if a stimulus was misperceived?



Competitor still active

-- easy to activate it rest of the way.



Competitor completely inactive

-- system will “garden-path”.



P(misperception) depends on distance from the category boundary.



Gradient activation allows the system to hedge its bets.

barricade vs. parakeet: /beIkeId/ vs. /peIkit/

Input (over time): p/b eI k i t…

Categorical Lexicon

parakeet

barricade





Gradient Sensitivity

parakeet

barricade

Methods



10 Pairs of b/p items.



Voiced Voiceless Overlap

Bumpercar Pumpernickel 6

Barricade Parakeet 5

Bassinet Passenger 5

Blanket Plankton 5

Beachball Peachpit 4

Billboard Pillbox 4

Drain Pipes Train Tracks 4

Dreadlocks Treadmill 4

Delaware Telephone 4

Delicatessen Television 4

10 Pairs of b/p items.

• 0 – 35 ms VOT continua.





20 Filler items (lemonade, restaurant, saxophone…)



Option to click "X" (Mispronounced).



26 Subjects



1240 Trials over two days.

Identification Results

[Figure: response rate (0-1) as a function of VOT (0-35 ms) for Voiced, Voiceless and NW responses, in two panels: Barricade to Parricade, and Barakeet to Parakeet.]

Significant target responses even at extreme.

Graded effects of VOT on correct response rate.

Eye Movement Results





[Figure: fixations to target (0-1) over time (300-1200 ms), one line per VOT step (0-35 ms), in two panels: Barricade -> Parricade and Parakeet -> Barakeet.]





Faster activation of target as VOTs approach

lexical endpoint.



• Even within the non-word range.

Phonetic “Garden-Path”





"Garden-path" effect:

Difference between looks to each target

(b vs. p) at same VOT.



[Figure: fixations to target (0-1) over time (0-1500 ms) for Barricade and Parakeet, in two panels: VOT = 0 (/b/) and VOT = 35 (/p/).]

[Figure: garden-path effect (Barricade - Parakeet, fixations to target; -0.1 to 0.15) as a function of VOT (0-35 ms).]

GP Effect: Gradient effect of VOT.
Target: p.2 >.1

0 for unneeded categories.





VOT VOT

Overgeneralization
• large σ
• costly: lose phonetic distinctions…
Undergeneralization
• small σ
• not as costly: maintain distinctiveness.
To increase likelihood of successful learning:
• err on the side of caution.
• start with small σ



[Figure: P(Success) (0-1) as a function of starting σ (0-60) for the 2 Category Model and the 3 Category Model; 39,900 models run.]

Small σ



Sparseness coefficient: % of space not strongly mapped to any category (unmapped space).
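A sparseness coefficient of this kind can be sketched for one-dimensional Gaussian categories, each described by three parameters (weight, mean, σ). The threshold for "strongly mapped", the grid over VOT, and the example parameter values are assumptions; the slides do not give the exact definition used in the model.

```python
import math


def gaussian(x, mu, sigma):
    """Gaussian probability density at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def sparseness(categories, lo=0.0, hi=60.0, step=0.5, threshold=0.01):
    """Fraction of grid points that no category maps above `threshold`.

    categories: list of (weight, mu, sigma) tuples.
    threshold: assumed cutoff on the weighted likelihood for "strongly mapped".
    """
    n_points = int(round((hi - lo) / step)) + 1
    unmapped = 0
    for i in range(n_points):
        x = lo + i * step
        best = max(w * gaussian(x, mu, sigma) for w, mu, sigma in categories)
        if best < threshold:
            unmapped += 1
    return unmapped / n_points


# Two categories on a VOT dimension (ms), with small vs. large sigma (hypothetical).
small_sigma = sparseness([(0.5, 0.0, 1.0), (0.5, 40.0, 1.0)])
large_sigma = sparseness([(0.5, 0.0, 15.0), (0.5, 40.0, 15.0)])
print(f"small sigma: {small_sigma:.2f}, large sigma: {large_sigma:.2f}")
```

With narrow categories most of the VOT range falls below the threshold for every category, so the coefficient is high; widening the categories lowers it.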

[Figure: VOT category distributions, and average sparseness coefficient (0-0.4) over training epochs (0-12000) for starting σ of .5-1.]

Start with large σ





[Figure: VOT category distributions, and average sparseness coefficient (0-0.4) over training epochs (0-12000) for starting σ of .5-1 and 20-40.]

Intermediate starting σ





[Figure: VOT category distributions, and average sparseness coefficient (0-0.4) over training epochs (0-12000) for starting σ of .5-1, 3-11, 12-17 and 20-40.]

Limitations



1) Occasionally model leaves sparse regions at the end

of learning.

• Competition/Choice framework:

Additional competition or selection mechanisms

during processing: categorization despite

incomplete information.



2) Multi-dimensional categories

1-D: 3 parameters / category

2-D: 5 “ “

3-D: 21 “ “

• Incorporating cue/model-reliability may

reduce dimensionality.

Non-parametric approach?



Categories

•Competitive Hebbian Learning

(Rumelhart & Zipser, 1986).

•Not constrained by a particular

equation—can fill space better.

•Similar properties in terms of starting σ and sparseness.
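The non-parametric idea can be sketched with a simple winner-take-all competitive learner on a single VOT dimension. This is a deliberately reduced illustration, not the Rumelhart & Zipser formulation and not the model behind the slides: the unit count, learning rate, number of epochs, and the synthetic two-cluster input are all assumptions.

```python
import random


def competitive_learning(samples, n_units=4, rate=0.05, epochs=20, seed=0):
    """Move the closest unit toward each input; the other units stay put."""
    rng = random.Random(seed)
    lo, hi = min(samples), max(samples)
    # Units start at random positions inside the range of the data.
    units = [rng.uniform(lo, hi) for _ in range(n_units)]
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x in order:
            winner = min(range(n_units), key=lambda i: abs(units[i] - x))
            units[winner] += rate * (x - units[winner])
    return sorted(units)


# Synthetic VOT tokens (ms) from two clusters, a stand-in for voiced and voiceless.
data_rng = random.Random(1)
samples = [data_rng.gauss(0, 5) for _ in range(200)]
samples += [data_rng.gauss(40, 5) for _ in range(200)]

units = competitive_learning(samples)
print([round(u, 1) for u in units])
```

Because no distributional form is imposed, units settle wherever tokens are dense, and regions that attract no unit remain unmapped.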

Model Conclusions



To avoid overgeneralization…
…better to start with small estimates for σ.

Small or even medium starting σ's lead to sparse category structure during infancy: much of phonetic space is unmapped.





Sparse categories:

Similar temporal integration to exp 2



Retain ambiguity (and partial

representations) until more input is available.

Infant Summary



Infants show graded sensitivity to subphonemic detail.



/b/-results: regions of unmapped phonetic space.



Statistical approach provides support for sparseness.

• Given current learning theories, sparseness

results from optimal starting parameters.



Empirical test will require a two-alternative task.

• AEM: train infants to make eye-movements in

response to stimulus identity.

Conclusions





Infant and adult word learning are sensitive to

subphonemic detail.





Sensitivity is important to adult and developing

word recognition systems.



1) Short term cue integration.

2) Long term phonology learning.



In both cases, partially ambiguous material is

retained until more data arrives.

The Future?





Change is the law of life. And those who look only to

the past or present are certain to miss the future.

-- John F. Kennedy

The Future?





Change is the law of life. And those [Word

Recognition Systems] who look only to the

past or present are certain to miss the future

[Acoustic Material].

-- John F. Kennedy-[McMurray]







Subphonemic cues signal upcoming events.



Can the system use the information to prepare

itself for future material?

The Last Word





Spoken language is defined by change.



But the information to cope with it is

in the signal.



Within-category acoustic variation is

signal, not noise.


• Infants make anticipatory eye-movements along

predicted trajectory, in response to stimulus identity.



• Two alternatives allow us to distinguish between category boundary and unmapped space.


