CNBH, PDN, University of Cambridge
Part II: Lent Term 2008: (1 of 4)
Central Auditory Processing
Roy Patterson
Centre for the Neural Basis of Hearing, Department of Physiology, Development and Neuroscience, University of Cambridge
Email: rdp1@cam.ac.uk
www.pdn.cam.ac.uk/cnbh
People in the CNBH for supervisions:
Dr Tim Ives, dti20@cam.ac.uk
Dr Martin Vestergaard, mdv23@cam.ac.uk
Dr Alexis Hervais-Adelmann, agh33@cam.ac.uk
Mr Etienne Gaudrain
The Overture
Act I: The form of animal communication sounds including speech and musical notes
Interlude: Anatomy of the auditory pathway
Act II: How the auditory system processes communication sounds [signal processing] [Tune7nCPHtone.mov]
Interlude: Anatomy of the auditory pathway
Act III: Where the auditory system processes communication sounds in the brain [anatomy, physiology]
Denouement: Extending the range of natural sounding notes using computers [a bit of musical fun]
Act I: The form of animal communication sounds including speech and musical notes
Sounds used to communicate at a distance, to declare territories and attract mates, are typically Pulse-Resonance Sounds
[Figure: waveform of a pulse-resonance sound with the pulse marked; axes: Amplitude against Time]
The pulse marks the start of the communication. The resonance provides distinctive information about the shape and size of resonators in the sender’s body.
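A pulse-resonance sound can be sketched numerically as a train of pulses, each of which excites a decaying oscillation. The following Python sketch is only an illustration of that idea, not the synthesis used for the lecture figures: the function name `pulse_resonance`, the single damped sinusoid standing in for the resonators, and all parameter values are assumptions introduced here.

```python
import numpy as np

def pulse_resonance(pulse_rate_hz, resonance_hz, decay_s, duration_s=0.1, fs=16000):
    """Hypothetical helper: a pulse train in which every pulse excites
    one exponentially damped sinusoid (a stand-in for a body resonance)."""
    n = int(round(duration_s * fs))
    pulses = np.zeros(n)
    period = int(round(fs / pulse_rate_hz))      # samples between pulses
    pulses[::period] = 1.0                       # the pulses mark each cycle start
    # Damped sinusoid, truncated after five time constants (an assumption).
    t = np.arange(int(round(5 * decay_s * fs))) / fs
    resonance = np.exp(-t / decay_s) * np.sin(2 * np.pi * resonance_hz * t)
    # Convolution places one copy of the resonance after every pulse.
    return np.convolve(pulses, resonance)[:n]

# Illustrative values only: 100 pulses per second, 1 kHz resonance, 2 ms decay.
wave = pulse_resonance(pulse_rate_hz=100, resonance_hz=1000, decay_s=0.002)
```

In this sketch, changing `pulse_rate_hz` moves the pulses closer together or further apart without altering the resonance, while changing `resonance_hz` and `decay_s` alters the resonance without moving the pulses.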
Communication ‘syllables’ of four different animals
Patterson, Smith, van Dinther and Walters (2008).
[Figure: waveforms of one syllable each from a fish, a frog, a macaque and a human; time scale bar: 400 ms]
The information in speech sounds:
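• VT (vocal tract) length determines resonance rate
• VT shape determines resonance shape (vowel type)
• VC (vocal cord) mass determines GPR (glottal pulse rate, heard as voice pitch)
[Figure: waveform of /a/ spoken by a man; long vocal tract ~ man]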
The information in speech sounds:
VT length determines resonance rate
[Figure: waveforms of /a/ spoken by a man (long vocal tract) and by a woman (shorter vocal tract); annotation: 2/3]
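One way to see why a shorter vocal tract resonates faster is the textbook model of a uniform tube closed at one end, whose resonances fall at odd multiples of c / (4L). The sketch below uses that model purely as an illustration; the tube model, the function name `tube_resonances`, the speed of sound of 350 m/s and the two lengths (chosen to be in a 2/3 ratio) are assumptions made here, not values taken from the lecture.

```python
def tube_resonances(vtl_m, n_resonances=3, speed_of_sound=350.0):
    """Hypothetical helper: resonance frequencies (Hz) of a uniform tube
    closed at one end, F_k = (2k - 1) * c / (4 * L)."""
    return [(2 * k - 1) * speed_of_sound / (4.0 * vtl_m)
            for k in range(1, n_resonances + 1)]

# Illustrative lengths only: the shorter tract is 2/3 the length of the longer.
long_vt = tube_resonances(0.18)
short_vt = tube_resonances(0.12)

# Every resonance of the shorter tube is higher by the same factor (here 3/2),
# so the resonance pattern keeps its shape and only its rate changes.
ratios = [s / l for s, l in zip(short_vt, long_vt)]
```

Under these assumptions a change in length rescales all resonances together, which is the sense in which length sets the resonance rate while shape sets the vowel type.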
Patterson, Smith, van Dinther and Walters (2008).
[Figure: waveforms plotted against Time, varying in Pitch (low to high) and in VTL (long to short)]
In natural communication sounds, at the syllable level, there are three important kinds of information:
• resonance shape: the message
• glottal pulse rate: pitch
• resonance scale: resonator size, or body size
Musical Instruments come in Families
Instruments of different sizes, but the same shape and construction, sound similar.
[Figure: violin, viola and cello arranged by pitch]
The ‘family’ sound is the message.
Waveforms for trumpet and trombone
van Dinther and Patterson (2004)
[Figure: trumpet and trombone waveforms against Time, with the pulse and the resonance marked]
Size Perception in Musical Instruments
[Figure: French horn notes varying in Resonance size and Pulse Rate]
van Dinther and Patterson (2004). The perception of size in musical instruments.
The perceptions produced by natural communication sounds have a pitch, a size and a message:
• Human speech
• Animal calls: mammals, birds, frogs and fish
• Most musical instruments
• Most engines and some motors
But not the sounds of inanimate turbulence: wind in the trees, rain on the roof, air conditioning noise, a tap or shower running.
Contents
I: Size information in animal communication sounds [including speech]
II: The robustness of auditory perception to changes in source size
III: How the auditory system normalizes communication sounds for source size
The effect of GPR and VTL on the perception of speaker size
[Figure: grid of stimuli; one axis: Decreasing VTL, other axis: Increasing GPR]
Kawahara and Irino (2004). Principles of speech manipulation system STRAIGHT. In Speech separation by humans and machines, P. Divenyi (Ed.), Kluwer Academic, 167-179.
Rana catesbeiana
Kawahara and Irino (2004). Principles of speech manipulation system STRAIGHT. In Speech separation by humans and machines, P. Divenyi (Ed.), Kluwer Academic, 167-179.
[Figure: grid of stimuli; one axis: Decreasing VTL, other axis: Increasing GPR]
Spectra on a linear frequency axis
[Figure: spectra varying in Pitch (low to high) and in VTL (long to short)]
Recognition of Scaled Vowels
Smith, Patterson, Turner, Kawahara and Irino (2005) JASA
[Figure: recognition results for the vowels /a/, /e/, /i/, /o/ and /u/, with the domain of normal speakers marked]
Speaker Size estimates for vowels varying in GPR and VTL
Smith and Patterson (2005) JASA
[Figure: speaker-size estimates; axis label: Size]
Smith and Patterson (2005) JASA
Syllable database
             CVs               VCs               Vowels
Sonorants    ma me mi mo mu    am em im om um    aa
             na ne ni no nu    an en in on un    ee
             la le li lo lu    al el il ol ul    ii
             ra re ri ro ru    ar er ir or ur    oo
             wa we wi wo wu    aw ew iw ow uw    uu
             ya ye yi yo yu    ay ey iy oy uy
Stops        ba be bi bo bu    ab eb ib ob ub
             da de di do du    ad ed id od ud
             ga ge gi go gu    ag eg ig og ug
             pa pe pi po pu    ap ep ip op up
             ta te ti to tu    at et it ot ut
             ka ke ki ko ku    ak ek ik ok uk
Fricatives   sa se si so su    as es is os us
             fa fe fi fo fu    af ef if of uf
             va ve vi vo vu    av ev iv ov uv
             za ze zi zo zu    az ez iz oz uz
             xa xe xi xo xu    ax ex ix ox ux
             ha he hi ho hu    ah eh ih oh uh
Examples: mi, en, ka, it, so, us; large (voiced), small (voiced)
Kawahara and Irino (2004). The vocoder STRAIGHT. Kluwer Academic
Ives, Smith and Patterson (2005) JASA
Speaker-size discrimination task (Syllables)
Present two intervals of syllables and ask: “Which is the smaller speaker?”
• Rove level between intervals
• Different pitch contours between intervals
• Only consistent cue is the change in VTL
[Figure: pitch contours of the syllables in the two intervals]
Interval 1: /se/ /ma/ /et/ /ku/ /am/, VTL = x
Interval 2: /wa/ /om/ /te/, VTL = x + Δx
Ives, Smith and Patterson (2005) JASA
There are different synthesised VTLs in each interval
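The logic of the two-interval task can be sketched as a trial generator in which level and pitch contour are randomised, so that VTL is the only cue that differs consistently between the intervals. This is a rough illustration rather than the procedure of Ives, Smith and Patterson: the function `make_trial`, the rove range, the number of syllables per interval and the way a pitch contour is represented are all hypothetical choices made here.

```python
import random

# Syllables taken from the example intervals shown on the slide.
SYLLABLES = ["se", "ma", "et", "ku", "am", "wa", "om", "te"]

def make_trial(vtl_ref, delta, n_syllables=4, rove_db=6.0, rng=None):
    """Hypothetical helper: build one two-interval trial.
    One interval has VTL = vtl_ref, the other VTL = vtl_ref + delta."""
    rng = rng or random.Random()
    # Randomise which interval carries the changed VTL.
    if rng.random() < 0.5:
        vtls = (vtl_ref + delta, vtl_ref)
    else:
        vtls = (vtl_ref, vtl_ref + delta)
    intervals = []
    for vtl in vtls:
        intervals.append({
            "syllables": rng.sample(SYLLABLES, n_syllables),
            "vtl": vtl,
            # Level rove (assumed range): a fresh random level per interval.
            "level_db": rng.uniform(-rove_db / 2, rove_db / 2),
            # Pitch contour (assumed coding): one step per syllable, down/flat/up.
            "pitch_contour": [rng.choice([-1, 0, 1]) for _ in range(n_syllables)],
        })
    # The shorter vocal tract corresponds to the smaller speaker.
    smaller = min(range(2), key=lambda i: intervals[i]["vtl"])
    return intervals, smaller

intervals, correct = make_trial(vtl_ref=14.0, delta=1.0, rng=random.Random(0))
```

Because level and pitch contour are drawn independently for each interval, a listener in this sketch cannot answer correctly by comparing loudness or pitch and has to rely on the VTL difference.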
Experiment
Measure size discrimination thresholds for different sized people
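Ives, Smith and Patterson (2005) JASA
[Figure: the five reference speakers (DWARF, SMALL CHILD, SMALL MALE, LARGE MALE, CASTRATO) plotted by glottal pulse rate (axis: Glottal pulse rate / Hz; ticks 80, 160, 320) and vocal-tract length (axis: ≈VTL/cm with SER/%; paired ticks 10 and 1.65, 14 and 1.22, 19 and 0.92); inset axis label: Trials test as smaller]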
Results: all subjects, all stimuli
Ives, Smith and Patterson (2005) JASA
[Figure: panels for CASTRATO, LARGE MALE, SMALL MALE and SMALL CHILD; axis label in each panel: Trials test as smaller]
Results: all subjects, all stimuli (Syllables)
[Figure: JNDs for the speaker types DWARF, SMALL CHILD, SMALL MALE, LARGE MALE and CASTRATO. Legend: average JND across syllable category for a specific speaker type; grand average JND for the experiment]
Speaker-size discrimination results (vowels)
Smith, Patterson, Turner, Kawahara and Irino (2005) JASA
[Figure: axes: Glottal Pulse Rate / Hz and Vocal-tract length / cm; tick labels 7 and 24]
Interim summary
The information in natural communication sounds, at the syllable level: pulse rate, resonance shape, and resonance scale.
The auditory system normalizes communication sounds at an early point in the processing to segregate the three forms of information and produce:
• a carrier-invariant representation of the message,
• a pitch value, and
• an estimate of the speaker’s size.
[Both VTL and GPR contribute to the perception of speaker size.]
Contents
I: Size information in animal communication sounds [including speech]
II: The robustness of auditory perception to changes in source size
III: How the auditory system normalizes communication sounds for source size