					   Puzzles and Patterns
in 50 years of Research on
    Speech Perception

        Sarah Hawkins
    University of Cambridge
      sh110@cam.ac.uk
                Three periods
1. 1950-1965 Broad-based exploration
2. 1965-1990s Narrowed to focus on the
   search for invariance in the relationship
   between speech signal and its percept: THEORY
3. 1995…. This focus is broadening again
  –    to include ‘discrepant’ data & new understanding
  –    which requires changes in conceptualization of
      • task goals
      • processes involved
          The Main Message
• Speech perception is at an exciting stage:
       we are beginning to
  integrate areas of old research
  with the mainstream theoretical
  work of the last 30 years or so
• A paradigm shift?
 Early work

Glorious Discovery
               Early work
• often looked at effects on the whole signal
• but as puzzles arose, and we looked more
  closely, then attention became focused on
  small domains in an effort both to simplify
  and to clarify
   Early work: source separation
Cocktail party effect / multi-talker perception
Cherry (1953)
• continuous natural speech, with different
  types of content, presented in different ways
• a huge wealth of observations relevant to
   – memory
   – attention
   – transitional probabilities
   – speaker vs message
                             Cherry (1953) JASA 25, 975-979
      Early work: source separation
Cocktail party effect / multi-talker perception
Broadbent & Ladefoged (1957)
• separate synthetic formants fuse to sound like
  a single vowel when presented to the same or
  different ears, only if they have the same f0
• compared ‘natural’ and ‘sustained’ formants
• extensions to theories of hearing (e.g. Licklider)
                            Broadbent & Ladefoged (1957) JASA 29, 708-710
                                          Darwin (1981) QJEP 33, 185-207
                                   Bregman (1990) Auditory Scene Analysis
ASA special session, 2004      Cooke & Ellis (2001) Sp. Comm. 35, 141–177
   Early work: source integration
Sumby & Pollack (1954)
Especially in high levels of noise:
• audiovisual presentation increases intelligibility
  (visual contribution is relative to the available
  auditory contribution)
                        Sumby & Pollack (1954) JASA 26: 212-215
                          Massaro (1998) Perceiving Talking Faces
                           Widespread AV groups and applications

• in auditory-only presentations, polysyllables are
  more intelligible than monosyllables
      (overall shape... neighborhoods…cohorts…)
                        Richard Warren, Paul Luce, Marslen-Wilson
       Early work: brain function
Kimura (1961)
• speech is processed more efficiently by the ear that is
  contralateral to the language-dominant hemisphere
• independent of handedness and right/left focus of
  damage due to epilepsy
→ complexities of auditory pathways, cerebral
  dominance, and speech processing



                       Kimura (1961) Canadian J. Psychol., 15, 166-171
                         The new ‘cognitive neuroscience/psychology’…
         Early work: memory
Miller (1956)
• short term memory span for unrelated items
   – The Magical Number Seven ± Two
• can increase this span by:
  – making relative rather than absolute judgments
  – increasing the number of dimensions
  – chunking into larger items
• recoding is a crucial process
                     Miller (1956) Psychological Review 63, 81-97
                      Serial learning and recall (e.g. Underwood)
                           Lashley (1951) Serial order in behavior
                                           Pisoni (1973) and later
        Early work: intelligibility
      Context of Possible Responses
Miller, Heise & Lichten (1951)
• monosyllables
• size of test vocabulary affects identification
   • 2…256…all monosyllables
• though presumably there are limits:
   – two vs six
   – five vs nine !
                   Miller, Heise & Lichten, (1951) J.Exp.Psych. 41, 329-335
     Early work: intelligibility
         Phonetic Context
Pickett & Pollack (1963)
• excerpts from connected speech must be
  ≥ 800 ms long to be fully intelligible
• regardless of rate:
  – faster rates need more syllables to be understood
    (slowing the speech down does not help)
→ crucial role of coarticulation & style
         (‘connected speech processes’)
                Pickett & Pollack (1963) Language & Speech 6, 165-171
    Early work: preceding context
affects the interpretation of the current sound

Ladefoged and Broadbent (1957)
• Carrier phrase: "Please say what this word is:"
  followed by a test word: bit, bet, bat, but

     F1 of CARRIER      Response
     200-380 Hz         bet
     380-660 Hz         bit


                     Ladefoged and Broadbent (1957) JASA 29, 98-104
       Early work: immediate context
     determines the interpretation of the current stimulus


Synthesizing bursts and transitionless vowels




                 Cooper, Delattre, Liberman, Borst & Gerstman (1952) JASA 24, 597-606
       Early work: immediate context
    determines the interpretation of the current stimulus


Identification of bursts and transitionless vowels:
the CV is identified as the minimal acoustic unit



                 Cooper, Delattre, Liberman, Borst & Gerstman (1952) JASA 24, 597-606
       Early work: immediate context
    determines the interpretation of the current stimulus


Identification of burstless stops with different vowels:
transitions are all you need!
                        Delattre, Liberman, & Cooper (1955) JASA 27, 769-773
       Categorical Perception
       of obstruent consonants
Equal acoustic changes → unequal auditory percepts
   place of articulation of stops: /b/ vs /d/ vs /g/




   [Figure axis labels: b, d, g]
                     Liberman, Harris, Hoffman, and Griffith (1957)
                    Journal of Experimental Psychology 54, 358-368
        Categorical Perception
        of obstruent consonants
• together with a theoretical bias in favor of
  binary oppositions
• encouraged a focused search for simple
  transformations from the encoded signal to an
  unambiguous, formal linguistic mental
  representation
            This narrower focus
• required clear conceptualisation of
   – identity of the important unit(s) of perception
   – process of abstraction

• On the whole, the units and levels of linguistic
  description were rather uncritically adopted
 …units of linguistic description
 were rather uncritically adopted
“we … had undertaken to find the ‘invariants’
of speech, a term which implies, at least in its
simplest interpretation, a one-to-one
correspondence between something
half-hidden in the spectrogram and the successive
phonemes of the message.”

            Cooper, Delattre, Liberman, Borst & Gerstman,
                     Perception of synthetic speech sounds
                                    JASA (1952) 24, 604-5
    …though not without some
          misgivings
“…one should not expect always to be able to
 find acoustic invariants for the individual
 phonemes…we are trying to [compile] the
 code book, one in which there is one
 column for acoustic entries and another
 column for message units, whether these be
 phonemes, syllables, words, or whatever.”

          Cooper, Delattre, Liberman, Borst & Gerstman,
                   Perception of synthetic speech sounds
                                  JASA (1952) 24, 604-5
 Middle period

The search for essence:
     ‘invariance’
            Middle period:
        the search for essence
• Impose order on the chaos!
• Focus: non-linearity between variation in
  acoustic signal and perceptual response
            Categorical Perception
                 (of consonants)
• Context becomes seen as variability, so we
  control for it ever more stringently
• to discover the crucial—invariant—properties
  requires a view of what is fundamental

• The basic syllable! ba
  –   CV
  –   in isolation
  –   stressed
  –   possibly with only one V if we’re looking at Cs,
      and only one C if we’re looking at Vs
        Imposing order on chaos
• The basic syllable: ba (context: silence)
• What was lost?
  –   polysyllables
  –   unstressed syllables
  –   prosody
  –   accounting for rate changes
  –   connected speech
  –   informativeness of variation esp. in connected speech
  –   meaning
  –   communication
  –   (most things really)
 Development of theory and the
      search for essence
• Two main approaches

    The Motor Theory

    Quantal Theory leading to
     Acoustic/Auditory Invariance
      The Motor Theory of Speech
             Perception
Liberman, Cooper, Shankweiler &
  Studdert-Kennedy (1967) Psychological Review 74, 431–461
Liberman & Mattingly (1985) Cognition 21, 1-36
• Listeners interpret speech sounds in terms of
   – motoric gestures they would make them with (1967)
   – intended gestures of the speaker (1985)

• Gestural unit: ‘phonetic category’
        Quantal Theory of Speech
       Perception (and production)
Stevens (1972, 1989)
• Regions of stability in
  the acoustic signal, or
  auditory response,
  provide a basis for
  forming categories of
  sounds
• Unit: distinctive feature
  (Chomsky & Halle 1968)
                               Stevens (1989) Journal of Phonetics 17, 3-45
            Stevens (1972) In David & Denes Human Communication. 51-66
     Quantal Theory becomes
 Acoustic/Auditory invariance theory
Stevens & Blumstein (1978)
  ……. Stevens (2002)
• For each distinctive feature (DF) there is a binary
  response to an invariant
  acoustic or auditory property
• e.g. particular changes in
  spectral shape over short time
  periods at crucial parts of the
  signal
   – segment boundaries
   – vowel steady states
[Figure panels: +consonantal (change) vs -consonantal (little change)]
                                        Stevens (2002) JASA 111, 1872-1891
                             Stevens & Blumstein (1978) JASA 64, 1358-1368
Acoustic/Auditory invariance theory
Stevens (2002)
• landmarks:
  – islands of reliability
  – built-in local context
• connected speech…
[Figure panels: +strident vs -strident]



                             Stevens (2002) JASA 111, 1872-1891
         Common properties
• Motor and Acoustic Invariance theories
  have much in common
   – dynamic
   – early abstraction
   – discrete units
   – phonological
• These shared properties allowed psycholinguistic
  theories to assume an input that is abstract and
  discrete, and so to ignore phonetic information
       Psycholinguistic theories
• Focus on word segmentation & identification
• Top-down knowledge compensates for
  impoverished (phonemic) input
   – metrical stress, possible words, phonotactics….
• Statistical, probabilistic
• Some names:
   – McClelland & Elman (TRACE)
   – Cutler, Norris, McQueen (Race, Shortlist, Merge)
   – Marslen-Wilson, Gaskell… (Cohort)
          extensions, questions:
     is simplicity the best answer?
Kewley-Port (1983)
• better identification
  with overall pattern
  (more detail?)
Klatt (1979)
• Lexical Access From
  Spectra (LAFS)
• whole-word patterns?
                                 Kewley-Port (1983) JASA 73, 322-335
                          Klatt (1979) Journal of Phonetics 7, 279-312
             extensions, questions:
               wider influences
Ganong (1980)
• identification expt
• VOT continuum
• word at one end, nonword at the other
• perception is more forgiving when the
  sound means something!
[Figure: % /d/ responses (0-100) along a VOT continuum from
 short VOT (d) to long VOT (t), for two continua:
 nonword-word (dask-task) and word-nonword (dash-tash)]



                           Ganong (1980) J. Exp. Psych: HPP 6, 110-125
                Summary:
           ‘context’ and ‘signal’

• ‘Units’ functionally inseparable from ‘context’
• The context and the signal together determine
  whether the signal is coherent
  – and hence what each unit ‘is’
 Recent developments
   (since early-to-mid 90s)
  systematic subtle variation as
    linguistically informative:
 classify the contexts in a more
linguistically-sophisticated way
  Combining old and new themes
• re-examination and extension of information
  provided by systematic phonetic variation
• new areas, e.g.
  – cross-linguistic work (Best, Beddor, Bradlow...)
  – memory & learning (Goldinger, Pisoni...)
  – functional brain imaging (Sophie Scott)
Listeners use fine phonetic detail
Allen & Miller (2004)
• speaker identity: listeners generalize talker-
  specific VOT information to a novel word
Smith (2004)
• lexical identity: slightly inappropriate
  allophones in a sentence disrupt word-spotting
  only when speaker is familiar to listener
• familiarization to speakers is fast
                      Allen & Miller (2004) JASA 116, 3171-3183
             Smith (2004) PhD Dissertation, Cambridge University
• Spoken word recognition test, which
  is used to establish cerebral dominance
• large groups of native speakers
  of Chinese/English/Spanish
• coronal MRI slices, data for 3 Ss,
  >200 ms post-stimulus onset
  [Figure panels: Chinese, English, Spanish]
• Lateralisation (% Ss):
   Spanish 100% left
   English 80% left
   Chinese 79% bilateral
       (tone lang.)

                                  Valaki et al. (2004) Neuropsychologia 42, 967–979
         What sort of model?
• biologically plausible
• roles of attention, memory & learning
• focus on meaning (‘sound to sense’)
• multiple potential ‘units of perception’
  no obligatory units?
• structure from incomplete information
     Adaptive Resonance Theory (ART) ?
        Grossberg 1986…

                Grossberg (2003) Journal of Phonetics 31, 423-445
               A key issue
• what is a phonetic category?
  (Carol Fowler, May 2004: ‘never been sure’)

• mental representations of phonetic
  categories are dynamic, relational, & plastic
  – Repp, Lindblom, Studdert-Kennedy
  – Bradlow, Pisoni, Hawkins…..




                 Hawkins (2003) Journal of Phonetics 31, 373-405
        bottom-up vs top-down?
• phonetic variation that systematically indicates
  linguistic structure makes many ‘top-down’
  processes unnecessary
  – e.g. allophonic detail vs Possible Word Constraint


• and blurs the traditional distinction between
  signal & knowledge
                 A Challenge
• to define and refine new questions in testable
  ways – i.e. to refocus, but to do it in ways that:
  – are rigorous yet focus on meaning and
    communication
  – avoid the ‘new understanding’ becoming
    doctrinaire
  – build on past contributions
      Some topics I haven’t mentioned
            but should have…
and could have, if I’d told the same story in a different way
  • infants’ & animals’ perception (periods 2 & 3)
  • vowel perception (dynamics; center of gravity)
  • sine wave speech
  • more theories (direct perception, auditory
    enhancement, FLMP)
  • more on memory (incl. associations) & learning
  • connections with psychoacoustics
  • production-perception connections
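              Categorical Perception
   Run a discrimination experiment
   Run an identification experiment

   [Figure: % /b/ identification and % "different" discrimination
    responses (0-100) across continuum steps 1 to 7; discrimination
    pairs are two steps apart (1 versus 3, and so on), with the
    discrimination peak marked]
                                         Courtesy Chris Darwin’s web site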
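
How the two experiments relate can be sketched in code. This is a minimal illustration, not the analysis behind the figure: it assumes a logistic identification function with a made-up boundary and slope, and it assumes one common simplification for predicting ABX discrimination from labelling alone, P(correct) = 0.5 + 0.5 (pA - pB)^2. All names and parameter values are hypothetical.

```python
import math

def identification(step, boundary=4.0, slope=2.0):
    """Hypothetical logistic probability of labelling a stimulus /b/."""
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))

def predicted_abx(p_a, p_b):
    """Predicted proportion correct if listeners discriminate only by label
    (assumed two-category simplification)."""
    return 0.5 + 0.5 * (p_a - p_b) ** 2

# Two-step comparisons along a 7-step continuum: 1 vs 3, 2 vs 4, ...
pairs = [(s, s + 2) for s in range(1, 6)]
scores = {
    pair: predicted_abx(identification(pair[0]), identification(pair[1]))
    for pair in pairs
}

# The pair that straddles the category boundary gets the highest score.
peak = max(scores, key=scores.get)
for pair, score in scores.items():
    print(pair, round(score, 3))
print("peak pair:", peak)
```

Under these assumptions, pairs that fall within one category are predicted to be near chance, while the pair straddling the boundary is discriminated best, which is the pattern the slide labels as the discrimination peak.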
   Valaki et al. (2004) Neuropsychologia 42, 967–979

• Monolingual/near monolingual native speakers,
  all right handed:
  – 30 Mandarin-Chinese
  – 20 Spanish
  – 42 American English
• Whole-head MEG, auditory word recognition test,
  used clinically to establish hemispheric dominance
  for receptive language: 63 abstract words/language
  – 33 target words, each in 3 lists, with 10 novel
    non-target words in each list
  – lift finger when you recognize a target word
    Patterns of dominance (%)

                 LH     RH     bilateral
    Spanish      100
    English      80            20
    Mandarin     14     7      79

Laterality Index: (LH – RH) / (LH + RH)
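
The laterality index formula on this slide can be computed directly. The sketch below assumes LH and RH are non-negative activation measures for the left and right hemispheres; the slide does not say which measure was used, and the function name and example values are hypothetical, not data from the study.

```python
def laterality_index(lh: float, rh: float) -> float:
    """Return (LH - RH) / (LH + RH).

    +1 means all activity is in the left hemisphere,
    -1 means all activity is in the right hemisphere,
     0 means the two hemispheres are balanced.
    """
    total = lh + rh
    if total == 0:
        raise ValueError("LH + RH must be non-zero")
    return (lh - rh) / total

# Hypothetical activation values, for illustration only.
print(laterality_index(30, 10))
print(laterality_index(10, 10))
```

Any cut-off used to label a participant as left dominant, right dominant or bilateral would be a separate choice that this slide does not state.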
Vowel-to-vowel coarticulation
    /ibbi/ vs /AbbA/

Naturally spoken

Schwas exchanged

    /ibbi/     /AbbA/

				