Speech & non-speech audio
Exercise: Non-speech auditory cues
  what are some non-speech auditory cues you
   encounter on a daily basis?
  what information do these convey to you?
Forms of non-speech audio

  alarms and warning systems
  status and monitoring indicators
  encoded messages and data

  “Just like any other language, this is a learned vocabulary”
                                                  - Bill Buxton

  Use of non-speech audio to convey information or
   perceptualize data, e.g., Geiger counter
  Techniques include:
      Earcons - [Blattner, 1991]
          auditory equivalent of graphical icons
          abstract, structured musical tones
          convey information about object or event
      Auditory icon - [Gaver, 1991]
          short audio recording, typically from real world
      Parameter-mapping and model-based sonification
Heart Rate Variability Sonification
[Ballora et al. 2000]

  sonifies interval between successive heartbeats
   to assist in cardiopulmonary diagnosis
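
  A minimal sketch of parameter-mapping sonification applied to heartbeat
  intervals. It is not the mapping used by Ballora et al.; the function names,
  the 0.4-1.2 s interval range, and the 220-880 Hz pitch range are assumptions
  chosen only for illustration. Each interval between successive beats becomes
  one tone, with shorter intervals (a faster heart) mapped to higher pitch.

```python
def rr_intervals(beat_times):
    """Return the intervals (seconds) between successive heartbeat timestamps."""
    return [later - earlier for earlier, later in zip(beat_times, beat_times[1:])]


def interval_to_pitch(interval, lo=0.4, hi=1.2, f_lo=220.0, f_hi=880.0):
    """Map an interval in [lo, hi] seconds to a frequency in Hz.

    Assumed mapping: linear, inverted (short interval -> high pitch),
    and clamped so out-of-range intervals stay within [f_lo, f_hi].
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    frac = (hi - interval) / (hi - lo)
    frac = min(1.0, max(0.0, frac))
    return f_lo + frac * (f_hi - f_lo)


def sonify(beat_times):
    """Turn heartbeat timestamps into a list of tone frequencies (Hz)."""
    return [interval_to_pitch(iv) for iv in rr_intervals(beat_times)]


if __name__ == "__main__":
    # Hypothetical beat timestamps in seconds.
    for freq in sonify([0.0, 0.8, 1.6, 2.0, 3.2]):
        print(f"{freq:.1f} Hz")
```

  The frequencies would still need to be rendered by a synthesizer; the sketch
  stops at the data-to-sound-parameter step, which is the part that defines
  parameter mapping.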
Live Wire
[Jeremijenko 1995]

  piece of plastic cord that hangs from a small electric
   motor mounted on the ceiling
Audio Debugging
[DiGiano 1997]
[Krueger & Gilden 1997]

  presents geographic information to blind people
  user's hand rests on surface covered with a tactile grid
   and is viewed by a ceiling-mounted video camera
  invisible virtual map is defined on the desk surface and
   the feature that the user is currently “touching” is
   signalled by a sound
“Blindminton”: Pong for visually impaired
[Hermann, Höner and Ritter ’05]
Speech Technologies

    speech recognition (dictation, command)
    speaker recognition (who is this?)
    speaker verification (authentication)
    speech synthesis (TTS)
Why speak to your computer?

  HCI tasks
      command-and-control, dictation
      for users with disabilities
  computer-mediated human-human
      voice-mail
      voice annotation of documents
When to use audio vs. video for conveying a message to a user?
Condition   use audio   use video
Speech Recognition Application Issues
[based primarily on slides from Nuance]

  Dictating a Patient
  Customer Service
  Calling for Directory Assistance
  Interacting with Navigation Systems
  Using a Cellular Phone
The value of Automation

  [Figure: scale running from Impersonal to Human, marking live agents and the target area for user experience]
Speech Interface Goals

    personal, “human” caller experiences
    more efficient calls
    flexible applications
    higher application accuracy
    fewer operator opt-outs and hang-ups
UI Space

  Graphical interfaces
      immediately visible
      can be viewed in parallel
      persistent: user can refer back to screen contents
  Speech interfaces
      listening is slow and serial
      retention is fleeting: must rely on short-term memory;
       otherwise, need to repeat what’s been said
Digits and Speech

  TouchTone
      Vocabulary is 0-9, *, and #
      99.9+% accurate
      100% in vocabulary
  Speech
      Vocabulary is designed for application
      Wider range of automation
      Vocabulary is ‘unlimited’
          95+% in vocabulary
      Language is imprecise
          must establish new
The Science Behind Guiding the Caller
  Prompt                Which airport are you travelling to?
  Grammar and           Atlanta
  Synonyms              Boston [International]
                        [Boston] Logan [International] [Airport]
                        LaGuardia
                        New York
  Pronunciations        Boston
                        b ah s t ih n
                        b aw s t ih n
  Language              34% Dallas
  Model                 8% Dulles
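
  The grammar rows above can be read as rules in which bracketed words are
  optional. The sketch below expands such rules into every phrase they accept
  and builds a phrase lookup. It also shows one possible reading of the
  language-model row, treating the percentages as prior weights that break ties
  between acoustically similar words. The slot values, the grouping of
  LaGuardia with New York, the acoustic scores, and all function names are
  assumptions for illustration and do not reflect any particular recognizer's
  API.

```python
from itertools import product


def expand(rule):
    """Expand a rule with [optional] words into every phrase it accepts."""
    choices = []
    for token in rule.split():
        if token.startswith("[") and token.endswith("]"):
            choices.append(("", token[1:-1]))  # the word may be omitted
        else:
            choices.append((token,))           # the word is required
    return [" ".join(w for w in combo if w) for combo in product(*choices)]


# Hypothetical slot values mapped to the rules shown on the slide.
GRAMMAR = {
    "atlanta": ["Atlanta"],
    "boston": ["Boston [International]",
               "[Boston] Logan [International] [Airport]"],
    "new_york": ["LaGuardia", "New York"],
}


def build_lookup(grammar):
    """Map every accepted phrase (lowercased) to its slot value."""
    lookup = {}
    for value, rules in grammar.items():
        for rule in rules:
            for phrase in expand(rule):
                lookup[phrase.lower()] = value
    return lookup


# Prior weights taken from the slide's language-model row.
PRIORS = {"Dallas": 0.34, "Dulles": 0.08}


def pick(acoustic_scores, priors):
    """Choose the candidate with the highest acoustic score times prior."""
    return max(acoustic_scores,
               key=lambda word: acoustic_scores[word] * priors.get(word, 0.0))


if __name__ == "__main__":
    lookup = build_lookup(GRAMMAR)
    print(lookup.get("logan airport"))
    print(pick({"Dallas": 0.5, "Dulles": 0.6}, PRIORS))
```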

Tuning to listen to caller’s stumbling blocks

  “Please say or enter the phone number associated with your account
   including the area code. [2 second pause] You can also say or enter
   your account number.”

  Most people spoke a 10 digit phone number
  Many people were using local area codes: 410, 227, 301, 443 or 667
  Colloquial “four ten” area code is Out of Vocabulary (OOV)
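
  One way to bring colloquial forms such as “four ten” into vocabulary is to
  accept teens and tens words alongside single digits when normalizing a spoken
  number. The sketch below is an assumption about how such a normalizer might
  look, not the tuning actually applied in the deployment described above. The
  function name and word tables are hypothetical.

```python
DIGITS = {"zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
          "four": "4", "five": "5", "six": "6", "seven": "7",
          "eight": "8", "nine": "9"}
TEENS = {"ten": "10", "eleven": "11", "twelve": "12", "thirteen": "13",
         "fourteen": "14", "fifteen": "15", "sixteen": "16",
         "seventeen": "17", "eighteen": "18", "nineteen": "19"}
TENS = {"twenty": "2", "thirty": "3", "forty": "4", "fifty": "5",
        "sixty": "6", "seventy": "7", "eighty": "8", "ninety": "9"}


def spoken_to_digits(utterance):
    """Convert a spoken number to a digit string, or None if out of vocabulary."""
    words = utterance.lower().replace("-", " ").split()
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        if word in DIGITS:
            out.append(DIGITS[word])
        elif word in TEENS:
            out.append(TEENS[word])
        elif word in TENS:
            nxt = words[i + 1] if i + 1 < len(words) else None
            if nxt in DIGITS and nxt not in ("zero", "oh"):
                # "forty three" is read as 43, not as 40 followed by 3.
                out.append(TENS[word] + DIGITS[nxt])
                i += 1
            else:
                out.append(TENS[word] + "0")
        else:
            return None  # unknown word: treat the whole utterance as OOV
        i += 1
    return "".join(out)


if __name__ == "__main__":
    for spoken in ("four ten", "four forty three", "three oh one"):
        print(spoken, "->", spoken_to_digits(spoken))
```

  Reading “forty three” as 43 is a design choice: it matches how the local
  area codes listed above are usually spoken, at the cost of never producing
  “403” from those two words.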
The Media Equation
[Reeves & Nass 1996]

 “When [people] are asked to evaluate a computer's
  performance, they tend to assess the one they are
   using more positively than others -- just as people
      tend to praise other people more to their faces
                            than behind their backs.”
Implications from Social Psychology
  Speech-based applications succeed if they:
    Establish close relationships: Team Work
    Convince users to try harder: Reciprocity
    Create “believability”: Expert Opinion
Prompt Issues

  Prompts are heard differently by different people
  “What kind of food do you want?”
  “Ham sandwich.”
  “I didn’t understand. Please repeat your choice.”
  “Ham sandwich, ham sandwich, ham sandwich.”

  Prompts are not directive enough
  “At which airport did you file your claim?”

  People don’t listen to the words!
     Main Menu: You can say “buy”, “sell”, or “Help” for more options
     Heard as: You can say “buy”, “sell” or, “help for more options”

  Options are not addressing callers’ needs
   “Would you like departure or arrival information?”
   “I have a question about reserved seating”

  Provide feedback through percolation or music
Prompts and Feedback

  provide tapering of repeated prompts
      possible options
      help command
      repeat command
  lead callers to the intended response
  the words are only half the battle
      personality
      prosody, intonation
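
  Tapering of repeated prompts can be sketched as a list of progressively
  shorter wordings, selected by how many times the caller has already heard the
  prompt. The wordings below are hypothetical and reuse the “buy” / “sell” menu
  from the earlier example. The first version states the possible options and
  the help and repeat commands, and later versions drop detail.

```python
# Hypothetical prompt wordings, longest first.
MAIN_MENU_PROMPTS = [
    "Main menu. You can say 'buy' or 'sell'. "
    "You can also say 'help' or 'repeat' at any time.",
    "Main menu. Say 'buy', 'sell', or 'help'.",
    "Main menu.",
]


def tapered_prompt(times_heard, prompts=MAIN_MENU_PROMPTS):
    """Return a shorter prompt the more often the caller has heard it."""
    if times_heard < 0:
        raise ValueError("times_heard must be non-negative")
    index = min(times_heard, len(prompts) - 1)
    return prompts[index]


if __name__ == "__main__":
    for visit in range(4):
        print(visit, tapered_prompt(visit))
```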
Apologize Generously
  take the blame
       even when it isn’t the system’s fault
       especially when it is
  if you don’t…
       users will perceive the system as uncooperative
       users will blame themselves
       they won’t use it
Dealing with agent requests

  analysis of several million calls from the Nuance Deployment Databank
  “What happens to my automation level if I tell callers they can ask for
   an agent?”
  [Chart: % of Callers Remaining in Automation*, with conditions: Not offered
   explicitly; 2 second pause; 1 second pause; Provide immediate agent option]
Dialogue prompts should:
    direct the caller how to speak within the grammar
    provide confirmation and error-recovery
    provide help
    adapt to caller behavior
    adapt to the environment…noise, caller’s voice, etc.

  Telephone Repair System Example:
    “There is a test you can run to try to isolate the trouble. … It’ll be
     a small box marked … “NID.” Do you think you can get to that box?”
Caller-Driven Design

  Developing Caller
  Securing Caller
  Tuning to listen to real callers’ stumbling blocks
  Usability Studies for direct feedback from real callers
  Designing Adaptive Calls
Developing Caller Archetypes

  a biographical sketch of a typical caller
  archetypes assist in the UI design process
Sample Caller Archetype

 Jimmy is a troubleshooting engineer and gets to fly all over the
 continent. He took the job about a year ago after rotting away in a
 software slave factory for what seemed like forever. He loves the
 travel and visiting cities that he only saw on TV – even though he
 rarely gets to see more than the view between the airport and the
 factory. He also enjoys life’s little luxuries: trendy restaurants and
 cafes, etc.

 He has collected all the points offered to him, and has a wallet full of
 loyalty cards. However he has no idea of the number of points on
 any given one. His buddy Joel – a sales engineer - recently told him
 about the cool holiday he booked with his points, so Jimmy has
 decided to look into how far he can get with his points.
Impact of the Persona(lity)

  a company has its own brand to reinforce
      rolled up in the application persona
  speech has personality
      intrinsic feature to conversational systems
      requires conscious design effort
  people evaluate voices rapidly
      gender within 0.5 seconds
      age within 1 second
      region, intelligence, trust, class in 3 seconds
Adapting to Proactive Input

   People provide proactive input every time we talk
       It’s quicker, more helpful & more pleasant…

   Agent: What day are you
   Caller: On Wednesday and I’m returning Thursday
   Agent: OK, out on Wednesday…back on
Failing on Proactive Input

   Speech apps typically fare poorly on proactive input
       even minor deviations from model result in retries
        and confirmations

   System: What day are you
   Caller: On Wednesday, and I’m returning Thursday
   System: I’m sorry, I didn’t understand. Please tell me what day you’re
    leaving – for example, Monday or
Examples of Adaptive Calls
  1-800-USA-RAIL
  System        What city are you departing from?
  Version 1     Boston                                    0:25
  Version 2     Boston, South Station                     0:08

  Callers want to be proactive, and use real
   world knowledge to finish the task quickly
      Up to 20% of retries & confirmations are due to
       callers entering unsupported proactive input*
      Up to 10% of hang ups and opt outs are due to
       callers who unsuccessfully enter proactive input*

            *Source: Nuance Deployment Databank, 2005
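
  A minimal sketch of accepting proactive input: a single caller utterance may
  fill more than one slot, and the system then asks only for what is still
  missing. The keyword matching below stands in for a real grammar. The slot
  names, cue words, and prompt wordings are assumptions for illustration only.

```python
import re

DAYS = ("monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday")
DAY_PATTERN = re.compile(r"\b(" + "|".join(DAYS) + r")\b")
RETURN_CUES = ("return", "back")  # hypothetical cue words for the return leg


def fill_slots(utterance):
    """Extract departure and return days from one caller utterance."""
    text = utterance.lower()
    slots = {}
    last_end = 0
    for match in DAY_PATTERN.finditer(text):
        # Only the words since the previous day mention decide the slot.
        context = text[last_end:match.start()]
        if any(cue in context for cue in RETURN_CUES):
            slot = "return_day"
        else:
            slot = "depart_day"
        slots.setdefault(slot, match.group(1))
        last_end = match.end()
    return slots


def next_prompt(slots):
    """Ask only for the information the caller has not already given."""
    if "depart_day" not in slots:
        return "What day are you leaving?"
    if "return_day" not in slots:
        return "And what day are you returning?"
    return (f"OK, out on {slots['depart_day'].capitalize()}, "
            f"back on {slots['return_day'].capitalize()}.")


if __name__ == "__main__":
    slots = fill_slots("On Wednesday, and I'm returning Thursday")
    print(slots)
    print(next_prompt(slots))
```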
Benefits of Adaptive Calls

  1-800-USA-RAIL
  Tell me how many passengers will be traveling
  Three                             0:32
  One adult, two children           0:09
Benefits of Personalization

  different levels of familiarity with company, application
  different interests, goals
      last statement balance or minimum amount due?
  different contact histories
      third call in 3 hours
      first-time-ever caller may want a web promotion
Explicit Personalization

  “Welcome ELIZABETH! Main menu. What can I….”
  “Here’s the info you always want”
  “Let me guess – you’re calling about…”

  benefits of identifying a repeat caller
      you’re not back to “square 1”
      respects callers’ efforts
      can modify promotional information
Testing the Interface

  Wizard Of Oz / role play
      one person for the system, one for the user
  usability tests
      6 or more subjects individually
      as close as possible to target user base
  pilot tests
      did we get the caller goals right?
Typical Corrections
  add / correct pronunciations
     “Atlanta” / “Alanna”
  add obvious (in retrospect) synonyms
     “yes, ma’am”, “Dulles International”
  adjust prompting or dialog to reduce errors and confusions
     “…or say ‘messages’ at the main menu”
     “…say your account number one digit at a time”
Typical Adjustments from Live Calls
  UI
      make prompts more understandable
      adjust call flow
        add options for newly discovered goals
        tune shortcuts and hints for common usage
  grammar
      more pronunciations, synonyms
      adjust recognition thresholds
        confirmation and rejection
  add examples:
      “Please say your 10-digit loan number.”
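
  The confirmation and rejection thresholds mentioned above can be pictured as
  two cut-offs on the recognizer's confidence score: below the first the result
  is rejected and the caller is re-prompted, between the two the system
  confirms, and above the second it accepts silently. The 0.3 and 0.7 values
  and the function name are hypothetical; in practice they would be tuned from
  live-call data.

```python
def decide(confidence, reject_below=0.3, confirm_below=0.7):
    """Map a recognition confidence in [0, 1] to a dialog action."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if reject_below > confirm_below:
        raise ValueError("reject_below must not exceed confirm_below")
    if confidence < reject_below:
        return "reject"    # re-prompt the caller
    if confidence < confirm_below:
        return "confirm"   # e.g. ask a yes/no confirmation question
    return "accept"        # continue without confirming


if __name__ == "__main__":
    for score in (0.1, 0.5, 0.9):
        print(score, decide(score))
```

  Raising `confirm_below` trades more confirmation turns for fewer silent
  misrecognitions; lowering `reject_below` trades fewer retries for more
  confirmations of wrong guesses.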
Transaction Completion Rate
Building a Speech Application

–  Speech Strategy
–  Requirements Analysis
–  Engine Tuning
–  Application Tuning
–  User Interface
–  Sample Calls
–  Grammar & Application
–  Voice Recording
–  Usability Testing
–  Functional Testing
