frank

					                Shhhhh………
Why do people whisper?
People whisper when they’re telling others a scandalous secret…
People whisper when they’re asked to speak softly so as not to disturb
others (remember your elementary school librarian?)…
People whisper when they’re too weak to speak normally…
People whisper when…(can you think of other reasons?)




Whispering seems to be the most effective and efficient form of vocal
communication when only people within a very short range of the
speaker should hear what is said.
So what exactly is whispering, or whisper speech?

• Is it just a softer, less intense version of regular speech?
• Why is whisper speech harder to understand, even when it is
  spoken right next to your ear?
• Would it be easier or more difficult to build a speech recognizer for
  whisper speech?
• Can different voices be recognized in whisper speech?
• How is a word stressed, or emphasized, in whisper speech?


In an attempt to answer these questions, we have…
              The Experiment
Four subjects were asked to:

• Speak 10 medium-length sentences (5 to 12 words) as naturally
  and as clearly as possible.
• Repeat the 10 sentences, this time in whisper speech.

The first 9 sentences cover each of the phonemes of American
English at least once. The 10th sentence is repeated three
times (in both regular and whisper speech), each time with
a different word stressed.
                    The Subjects
This experiment was made possible by four dear volunteers. They are:


• A young female native English speaker from the Midwest
• A young male native English speaker from eastern Canada
• A young male native English speaker from the Southeast
• A young male native English speaker from Texas


These four subjects should give us a good idea of the differences
between regular and whisper speech in North American English.
Without further delay, let us look at the results…
General Appearance of the Spectrograms

[Figure: spectrogram of the phrase “stole my house” in regular speech]
[Figure: spectrogram of the phrase “stole my house” in whisper speech]
At first glance, the spectrogram of whisper speech looks like
a string of fricative noises.
It is also clearly much less intense than regular speech, which
would explain why it takes less energy to whisper than to
speak.
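
The intensity difference can be put into numbers with an RMS level comparison. The sketch below is only an illustration: it uses synthetic stand-in signals (a tone for regular speech, weak noise for whisper), not the recordings from the experiment, and the names `rms_db` and `level_difference_db` are hypothetical helpers.

```python
import numpy as np

def rms_db(signal):
    """RMS level in dB relative to an amplitude of 1.0."""
    rms = np.sqrt(np.mean(np.square(signal)))
    return 20.0 * np.log10(rms)

def level_difference_db(regular, whisper):
    """How many dB the regular recording sits above the whispered one."""
    return rms_db(regular) - rms_db(whisper)

# Synthetic stand-ins (assumptions, not the experiment's recordings).
rate = 16000
t = np.arange(rate) / rate
rng = np.random.default_rng(0)
regular = 0.5 * np.sin(2 * np.pi * 120 * t)   # periodic, voiced-like tone
whisper = 0.05 * rng.standard_normal(rate)    # weak, noise-like hiss

print(f"regular is {level_difference_db(regular, whisper):.1f} dB above whisper")
```

With real data, the two arrays would be the same sentence recorded by the same speaker at the same microphone distance.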


Now let’s take a closer look at what happens to each type of
phoneme when we whisper, starting with vowels…
Vowels – What’s all that hissing noise?

[Figure: “I lied a lot on Saturday” in whisper speech]
[Figure: “Chang is not a China man” in whisper speech]
[Figure: regular “stole my house” again, but this time notice the HH]
[Figure: whisper “stole my house” again; can you tell where HH starts and stops?]
A closer look at the vowels shows us something interesting: they
all look like HH’s!
We all know that HH is a very “transparent” phoneme: it does not
warp the vowels around it. In fact, vowels seem to “pass through”
HH, because we can still make out their formants.
So it seems that all the whisper vowels are just HH’s with different
vowels passing through. Can you guess what the word “is” would
sound like in whisper speech?
Did you notice something peculiar about the formants?
[Figure: “The boy will eat oat, pit, or soot…”]
[Figure: “…but only in small doses.”]
A second look shows us that a low f1 on vowels seems to disappear
entirely, which is also an attribute of HH’s.
Fortunately, we can infer a low f1 on a whisper spectrogram from
its absence, and f2 and f3 are good enough indicators of labial,
velar, and dental phonemes.


But how about voicing? Isn’t f1 going down usually an indicator of
voicing? Let’s look at the voicing for…
Fricatives and Stops – Why we don’t say “bzzzd…”

[Figure: “The fish thief stole my house”]
[Figure: “Can I pay tickets with tacos and pork?”]
The whisper fricatives and stops seem to be relatively easy to
spot in the spectrogram, just as in regular speech.
Now let’s take a look at the voiced fricatives and stops…
[Figure: “The very vexed zebra” in regular speech]
[Figure: “The very vexed zebra” in whisper speech]
[Figure: “Beat the good dog, boy!” in whisper speech]
What happened?
The voiced fricatives and stops look just like their unvoiced
counterparts! It seems that they’ve lost their voicing!
So how do we hear things like “dog” and “zebra”? We rely on
high-level knowledge.
If we play a whispered voiced consonant in isolation, we can hear
that the unvoiced version is what is actually pronounced!
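
One way to picture the “high-level knowledge” at work is a dictionary lookup that ignores voicing. The sketch below is an illustration only: the phoneme labels follow the ARPAbet-style symbols used on these slides (HH, NG), and the tiny `LEXICON`, its pronunciations, and the helper names are hypothetical.

```python
# Voiced obstruents mapped to their unvoiced counterparts (ARPAbet-style labels).
DEVOICE = {"B": "P", "D": "T", "G": "K", "V": "F",
           "Z": "S", "DH": "TH", "ZH": "SH", "JH": "CH"}

def devoice(phonemes):
    """Collapse voicing, as whispering appears to do on the spectrogram."""
    return tuple(DEVOICE.get(p, p) for p in phonemes)

# Hypothetical mini-lexicon; a real recognizer would use a full dictionary.
LEXICON = {
    "dog":   ("D", "AO", "G"),
    "talk":  ("T", "AO", "K"),
    "zebra": ("Z", "IY", "B", "R", "AH"),
    "pay":   ("P", "EY"),
    "bay":   ("B", "EY"),
}

def candidates(heard, lexicon=LEXICON):
    """All words whose devoiced pronunciation matches what was heard."""
    key = devoice(heard)
    return sorted(w for w, pron in lexicon.items() if devoice(pron) == key)

print(candidates(("T", "AO", "K")))            # more than one word fits
print(candidates(("S", "IY", "P", "R", "AH")))  # only one word fits
```

When several words match, only the context (the rest of the sentence) can choose between them, which is where language modeling would come in.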
Fricatives and stops are relatively easy to spot in a whisper
spectrogram, but they can be hard to tell apart, which is exactly
the opposite of…
Nasals – Barely there

[Figure: “Chang is not a China man”]
It seems nasals follow suit with the other phonemes: no voice bars
and no low f1 formants. Additionally, nasals seem so faint that they
almost look like pauses.
However, we can see from the spectrogram that it isn’t difficult to
identify which nasal it is; we can see the formants going up for N,
going down for M, and a velar pinch for NG.




What about liquids and glides? They actually behave pretty well in
whisper speech; identifying them is usually easier.
Liquids and Glides

[Figure: “Look, you wet your red leather boots!”]
                Try this at home!
Now that we have gone through the different types of phonemes, we
can compile our results:


• Vowels resemble HH’s
• Voiced fricatives and stops lose their voicing
• Nasals become faint but can be differentiated
• Liquids and glides do not change much
• Considerable high-level knowledge is required to recognize whisper speech


We can do a little test to demonstrate this…
                    F0 and Pitch
What sort of f0 and pitch does whisper speech have?
(Can you guess?)


First, we can try using the Emu Labeler to do the pitch analysis for
us…
[Figure: pitch analysis for “Somebody set up us the bomb!” (stress on “us”)]
It seems that the Emu Labeler has failed us (not too surprisingly).
But that’s alright; we can still do it ourselves. Let’s turn the
broadband spectrograms into narrowband spectrograms…
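
Narrowing the analysis bandwidth amounts to lengthening the analysis window. The sketch below is a rough illustration with NumPy, not the Emu Labeler's own method: it assumes the bandwidth is given in Hz and uses the rule of thumb that window duration is about 1 / bandwidth; `window_length` and `spectrogram` are hypothetical helpers, and the input is synthetic noise.

```python
import numpy as np

def window_length(rate, bandwidth_hz):
    """Analysis window in samples, assuming duration is about 1 / bandwidth."""
    return int(round(rate / bandwidth_hz))

def spectrogram(signal, rate, bandwidth_hz, hop_s=0.005):
    """Magnitude spectrogram: one row per frame, one column per frequency bin."""
    n = window_length(rate, bandwidth_hz)
    hop = int(round(rate * hop_s))
    window = np.hanning(n)
    starts = range(0, len(signal) - n + 1, hop)
    frames = np.array([signal[s:s + n] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

rate = 16000
rng = np.random.default_rng(0)
noise = rng.standard_normal(rate)   # one second of whisper-like hiss (stand-in)

for bw in (70, 40, 20):
    spec = spectrogram(noise, rate, bw)
    print(bw, "Hz ->", window_length(rate, bw), "samples per window, shape", spec.shape)
```

A longer window gives finer frequency resolution (more columns) at the cost of time resolution, which is what makes individual harmonics of f0 visible in voiced speech.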
[Figure: “Somebody set up us the bomb!” (stress on “us”), Bandwidth=70]
[Figure: “Somebody set up us the bomb!” (stress on “us”), Bandwidth=40]
[Figure: “Somebody set up us the bomb!” (stress on “us”), Bandwidth=20]
As we make the bandwidth smaller and smaller, we realize
that we cannot make out the f0.
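
The missing f0 can also be checked numerically. The sketch below swaps in a different technique from the spectrogram reading used here: it measures periodicity with a normalized autocorrelation peak. The signals are synthetic stand-ins (a harmonic tone for voiced speech, white noise for whisper), and `periodicity` is a hypothetical helper.

```python
import numpy as np

def periodicity(frame, rate, fmin=60.0, fmax=400.0):
    """Return (f0 estimate in Hz, strength) from the normalized autocorrelation peak."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / ac[0]                                # lag 0 becomes 1.0
    lo, hi = int(rate / fmax), int(rate / fmin)    # lag range for plausible f0
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return rate / lag, float(ac[lag])

rate = 16000
n = np.arange(1024)                                # one 64 ms frame
# Voiced stand-in: ten harmonics of a 125 Hz fundamental.
voiced = sum(np.sin(2 * np.pi * 125 * k * n / rate) / k for k in range(1, 11))
# Whisper stand-in: white noise with no periodic source.
hiss = np.random.default_rng(0).standard_normal(1024)

for name, frame in (("voiced", voiced), ("whisper-like", hiss)):
    f0, strength = periodicity(frame, rate)
    print(f"{name}: best lag gives {f0:.1f} Hz, strength {strength:.2f}")
```

A strength near 1 means the frame repeats itself at that lag; for the noise frame the “best” lag is arbitrary and its strength stays low, so no f0 should be reported.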
But since pitch is so important for stressing and emphasizing
parts of speech, how are stress and emphasis conveyed in
whisper speech?
[Figure: “Somebody set up us the bomb!” (stress on “somebody”)]
[Figure: “Somebody set up us the bomb!” (stress on “us”)]
[Figure: “Somebody set up us the bomb!” (stress on “bomb”)]
As you may have expected, because they cannot change the
pitch, speakers use the two remaining cues (more energy and
longer duration) to emphasize something they want to stress
in whisper speech.
Try singing in a whisper… can you do it?
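
The energy and duration cues described above can be measured per word once word boundaries are labeled. The sketch below is an illustration under loud assumptions: the word timings, the gains, and the noise signal are all invented, and the scoring rule in `most_stressed` (duration times mean power) is a crude stand-in, not a method from the experiment.

```python
import numpy as np

def word_cues(signal, rate, segments):
    """Map each word to (duration in s, RMS level in dB) over its labeled span."""
    cues = {}
    for word, start_s, end_s in segments:
        chunk = signal[int(round(start_s * rate)):int(round(end_s * rate))]
        rms = np.sqrt(np.mean(np.square(chunk)))
        cues[word] = (end_s - start_s, 20.0 * np.log10(rms))
    return cues

def most_stressed(cues):
    """Crude score: duration times mean power, so both cues count."""
    return max(cues, key=lambda w: cues[w][0] * 10.0 ** (cues[w][1] / 10.0))

# Hypothetical labels (word, start, end) and gains for a whispered sentence.
rate = 16000
segments = [("somebody", 0.0, 0.4), ("set", 0.4, 0.6), ("up", 0.6, 0.75),
            ("us", 0.75, 1.15), ("the", 1.15, 1.25), ("bomb", 1.25, 1.6)]
gains = {"somebody": 0.05, "set": 0.05, "up": 0.05,
         "us": 0.10, "the": 0.05, "bomb": 0.05}

rng = np.random.default_rng(0)
signal = np.concatenate([
    gains[w] * rng.standard_normal(int(round((e - s) * rate)))
    for w, s, e in segments
])

cues = word_cues(signal, rate, segments)
print("most stressed word:", most_stressed(cues))
```

Because long words collect more energy simply by being long, a fairer comparison would be against the same word in the unstressed repetitions of the sentence.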
One Last Thought – Variability in Whisper Speech

One thing we notice throughout the experiment is that many
characteristics of regular speech are lost in whisper speech.
On the other hand, some variability factors such as age, regional
accent, and emotion may also be reduced to some extent in whisper
speech.
Which speaker whispered the sentence at the bottom?

[Figure: Speaker A and Speaker B, “Chang is not a China man.” in whisper]
[Figure: “They treasured the very vexed zebra.” in whisper]
Now can you tell?

[Figure: Speaker A and Speaker B, “Chang is not a China man.” in regular speech]
[Figure: “They treasured the very vexed zebra.” in regular speech]
It seems that whisper speech forces the speech to lose some of its
variability.
Can you guess anything about the speaker from this speech?
(sex, age, nationality, region, the person?)

[Figure: “The fish thief stole my house.” in whisper speech]
[Figure: “The fish thief stole my house.” in regular speech]
             Conclusion
Whisper speech introduces more ambiguity into
speech; therefore, recognizing whisper speech
requires considerable high-level knowledge.

There are no detectable pitch dynamics in
whisper speech.

Whisper speech seems to reduce some variability
in speech.
Would we ever need automatic speech recognition
for whisper speech?

• For use in quiet places (a library)
• For people with speech difficulties (throat cancer)
• Can you think of others? (a secret agent watch?)

Would it be more difficult than automatic speech
recognition for regular speech?

• More ambiguity
• A need for more high-level language modeling
• Less variability?

				