   Research Challenges for Spoken
   Language Dialog Systems

              Julie Baca, Ph.D.
   Center for Advanced Vehicular Systems
        Mississippi State University

    Computer Science Graduate Seminar
           November 27, 2002
   Define dialog systems
   Describe research issues
   Present current work
   Give conclusions and discuss
    future work
What is a Dialog System?
   Current commercial voice products
    require adherence to “command
    and control” language, e.g.,
       User: “Plan Route”
   Such interfaces are not robust to
    variations from the fixed words and
What is a Dialog System?
   Dialog systems seek to provide a
    natural conversational interaction
    between the user and the
    computer system, e.g.,
       User: “Is there a way I can get to
        Canal Street from here?”
Domains for Dialog Systems

     Travel reservation
     Weather forecasting
     In-vehicle driver assistance
     On-line learning environments
Dialog Systems:
Information Flow
   Must model two-way flow of information
   User-to-system
   System-to-user
Dialog System
Research Issues
Many fundamental problems must be
solved for these systems to mature.
Three general areas include:
 Automatic Speech Recognition

 Natural Language Processing

 Human-computer Interaction (HCI)
NLP Issue for Dialog
Systems: Semantics
   Must assess meaning, not just
    syntactic correctness.
   Therefore, must handle
    ungrammatical inputs, e.g.,
       “The ……nearest .....station is…
        …is there a gas station nearby?”
NLP Issue: Semantic
Representation 1
   For NLP, use semantic grammars
   Semantic frame with slots and
   <destination> -> <prep> <place>
    <prep>->      “nearest”
    <place>->      “gas station”
NLP Issue: Semantic
Representation 2
 Must also represent:
 “How do I get from Canal Street to Royal
<directions> ->     <start> <destination>
<destination> ->    <prep><place>
<place> ->          <street_name> |
<street_name>->     “Canal St”| “Royal St”
<prep> ->           <to_prep><near_prep>
<near_prep> ->      “nearest”|“closest”
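
A minimal sketch of how rules like these might fill a semantic frame. It assumes a hypothetical regex-based matcher; the names parse, DIRECTIONS and DEST_NEAR are illustrative and do not come from the talk. Only the terminal words are taken from the rules above.

```python
import re

# Terminal vocabularies taken from the grammar rules; the matcher is hypothetical.
STREET_NAMES = ["Canal St", "Royal St"]
NEAR_PREPS = ["nearest", "closest"]
OTHER_PLACES = ["gas station"]


def _alternation(options):
    """Build a regex alternation, trying longer options first."""
    return "|".join(re.escape(o) for o in sorted(options, key=len, reverse=True))


PLACE = _alternation(STREET_NAMES + OTHER_PLACES)
# <destination> -> <prep> <place>, restricted here to the "near" prepositions
DEST_NEAR = re.compile(
    rf"\b(?P<prep>{_alternation(NEAR_PREPS)})\s+(?P<place>{PLACE})\b", re.I
)
# <directions> -> <start> <destination>, assumed to surface as "from X to Y"
DIRECTIONS = re.compile(
    rf"\bfrom\s+(?P<start>{PLACE})\s+to\s+(?P<dest>{PLACE})\b", re.I
)


def parse(utterance):
    """Return a semantic frame (dict of slots), or None if nothing matches.

    Words outside the patterns are skipped, so input that is not a
    well-formed sentence can still yield a frame.
    """
    text = re.sub(r"\bStreet\b", "St", utterance, flags=re.I)  # "Street" -> "St"
    m = DIRECTIONS.search(text)
    if m:
        return {"frame": "directions",
                "start": m.group("start"),
                "destination": m.group("dest")}
    m = DEST_NEAR.search(text)
    if m:
        return {"frame": "destination",
                "prep": m.group("prep").lower(),
                "place": m.group("place").lower()}
    return None
```

Because the matcher searches for slot fillers instead of validating the whole sentence, it illustrates the "meaning, not syntax" point made earlier, although a real robust parser would be far more general.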

NLP Issue: Semantic
Representation 3
   Two Approaches:
   Hand-craft the grammar for the
    application, using robust parsing to
    understand meaning [1,2].
       Problem: time, expense
   Use statistical approach, generating
    initial rules and using annotated tree-
    banked data to discover the full rule set
       Problem: annotated training data
ASR/NLP Issue:
Reducing Errors
   Most systems use a loose coupling
    of ASR and NLP.
   Try earlier integration of semantics
    with recognizer.
   Incorporate dialog “state” into
    underlying statistical model.
   Problems:
     Increases search space
     Training Data
NLP Issue: Resolving
Meaning Using Context
   Must maintain knowledge of the
    conversational context.
   After request for nearest gas station,
    user says, “What is it close to?”
       Resolving “it” - anaphora
   Another follow-up by the user,
    “How about …restaurant?”
       Resolving “…” with “nearest”- ellipsis
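
A minimal sketch of context tracking for the two follow-ups above, assuming utterances have already been parsed into slot dictionaries. The class DialogContext, the slot names and the frame type "landmark" are hypothetical; this is a toy illustration, not the discourse model used in any cited system.

```python
class DialogContext:
    """Keeps the latest frame of each type and the entity currently in focus."""

    def __init__(self):
        self.frames = {}   # most recent resolved frame, keyed by frame type
        self.focus = None  # entity that a later "it" is assumed to refer to

    def update(self, frame):
        """Resolve a new, possibly incomplete frame against the stored context."""
        resolved = dict(frame)
        # Anaphora: replace the pronoun with the entity in focus.
        for slot, value in list(resolved.items()):
            if value == "it" and self.focus is not None:
                resolved[slot] = self.focus
        # Ellipsis: inherit omitted slots from the previous frame of the same type.
        previous = self.frames.get(resolved.get("frame"), {})
        for slot, value in previous.items():
            resolved.setdefault(slot, value)
        self.frames[resolved.get("frame")] = resolved
        # An explicitly mentioned place becomes the new focus.
        if "place" in frame:
            self.focus = frame["place"]
        return resolved


ctx = DialogContext()
ctx.update({"frame": "destination", "prep": "nearest", "place": "gas station"})
close_to = ctx.update({"frame": "landmark", "near": "it"})           # "What is it close to?"
follow_up = ctx.update({"frame": "destination", "place": "restaurant"})  # "How about ... restaurant?"
```

Here close_to["near"] resolves to "gas station", and follow_up inherits prep "nearest" from the earlier request. A single focus slot cannot disambiguate utterances that contain several pronouns.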
Resolving Meaning:
Discourse Analysis
   To resolve such requests, system
    must track context of the conversation.
   This is typically handled by a
    discourse analysis component in
    the Dialog Manager.
Dialog Manager:
Discourse Analysis
   Anaphora resolution approach: Use
    focus mechanism, assuming
    conversation has focus [5].
   For our example, “gas station” is current focus.
   But how about:
       “I’m at Food Max. How do I get to a gas
        station close to it and a video store close
        to it?”
   Problem: Resolving the two “its”.
Dialog System
Dialog Manager:
   Often cannot satisfy request in one
   The previous example may require
    clarification from the user,
       “Do you want to go to the gas
        station first?”
HCI Issue:
System vs. User Initiative
   What level of control do you provide user in
    the conversation?
Mixed Initiative
   Total system initiative provides low usability.
   Total user initiative introduces
    higher error rate.
   Thus, mixed initiative approach,
    balancing usability and error rate,
    is taken most often.
   Allowing user to adapt the level
    explicitly has also shown merit [6].
ASR/HCI Issue:
Error Handling
   How to handle possible errors?
   Assign confidence score to result
    of recognizer.
   For results with lower confidence
    score, request clarification or revert
    to system-oriented initiative.
   Can incorporate dialog state in
    computing confidence score [7].
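
A minimal sketch of the confidence-based policy described above. The function name choose_action and both threshold values are assumptions made for illustration; in practice such thresholds would be tuned on data, and the score itself could also take dialog state into account as in [7].

```python
def choose_action(confidence, accept_at=0.8, clarify_at=0.5):
    """Map a recognizer confidence score in [0, 1] to a dialog move.

    The thresholds are illustrative defaults, not values from the talk.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= accept_at:
        return "accept"             # act on the recognized request
    if confidence >= clarify_at:
        return "clarify"            # ask the user to confirm or rephrase
    return "system_initiative"      # fall back to system-directed prompts
```

For example, choose_action(0.65) selects a clarification request, while choose_action(0.2) hands control of the conversation back to the system.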
HCI Issue:
Response Generation
   How to present response to user in a
    way that minimizes cognitive load?
   Varies depending on whether output is
    speech-only or speech/visual.
       Speech-only output must respect user
        short-term memory limitations, e.g., lists
        must be short, timed appropriately, and
        allow repetition.
       Speech/visual output must be
        complementary, e.g., importance of
        redundancy and timing.
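
A minimal sketch of the "short lists, allow repetition" guideline for speech-only output. The chunk size of three and the prompt wording are assumptions, and the helper names spoken_chunks and render_chunk are hypothetical.

```python
def spoken_chunks(items, max_items=3):
    """Split a result list into short groups suitable for speech-only output."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    return [items[i:i + max_items] for i in range(0, len(items), max_items)]


def render_chunk(chunk, has_more):
    """Turn one group into a prompt, offering repetition and continuation."""
    text = ", ".join(chunk) + "."
    if has_more:
        text += " Say 'more' for other options or 'repeat' to hear these again."
    else:
        text += " Say 'repeat' to hear these again."
    return text


stations = ["Shell", "Exxon", "Chevron", "Texaco"]
chunks = spoken_chunks(stations)
prompts = [render_chunk(c, has_more=(i < len(chunks) - 1)) for i, c in enumerate(chunks)]
```

Timing between prompts, which the slide also mentions, is left out of this sketch.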
HCI Issue:
Evaluating Dialog Systems
   How to compare and evaluate
    dialog systems?
    PARADISE (Paradigm for Dialog Systems
    Evaluation) provides a standard
    framework [8].
Evaluating Dialog Systems
   Task success
       Was the necessary information
   Efficiency/Cost
       Number of dialog turns, task completion
   Qualitative
       ASR rejections, timeouts, helps
   Usability
       User satisfaction with ASR, task ease,
        interaction pace, system response
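
A minimal sketch of how measures like these can be combined into one score. It assumes the performance function usually attributed to PARADISE [8], performance = alpha * N(task success) - sum of w_i * N(cost_i), where N is z-score normalization over the set of dialogs. The function names, weights and data below are invented for illustration; PARADISE itself estimates the weights by regressing user satisfaction on the measures.

```python
from statistics import mean, stdev


def z_scores(values):
    """Normalize values to zero mean and unit (sample) standard deviation."""
    mu, sigma = mean(values), stdev(values)  # needs at least two distinct values
    return [(v - mu) / sigma for v in values]


def performance(task_success, costs, alpha, weights):
    """Score each dialog: weighted normalized success minus weighted normalized costs.

    task_success: one success value per dialog
    costs:        dict mapping a cost name to one value per dialog
    alpha:        weight on task success (assumed here, not estimated)
    weights:      dict mapping each cost name to its weight (assumed here)
    """
    success_n = z_scores(task_success)
    costs_n = {name: z_scores(vals) for name, vals in costs.items()}
    return [
        alpha * success_n[i] - sum(weights[name] * costs_n[name][i] for name in costs)
        for i in range(len(task_success))
    ]


scores = performance(
    task_success=[0.9, 0.5, 0.7],
    costs={"turns": [10, 20, 15]},
    alpha=0.5,
    weights={"turns": 0.5},
)
```

In this toy data the first dialog has the highest task success and the fewest turns, so it receives the highest score.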
Current Work
   Sponsored by CAVS
   Examining:
       In-vehicle Environment
       Manufacturing Environment

   Multidisciplinary Team:
       CS, ECE, IE
            Baca, Picone, Duffy
       ECE graduate students
            Hualin Gao, Zheng Feng
Current Work:
In-vehicle Dialog System

     Specific ASR Issues for In-vehicle
       Real-time performance
       Noise cancellation
Current Work:
In-vehicle Dialog System
    Other Significant Issues:
      Reducing error rate
      Graceful error handling and mixed
       initiative strategy
      Response generation to reduce
       user cognitive load
      Evaluation
Current Work:
In-vehicle Dialog System
   Approach
       Develop prototype in-vehicle system
       Initial focus on ASR and NLP issues
            Integrate real-time recognizer [9]
            Employ noise-cancellation techniques [10]
            Use semantic grammar for NLP
            Examine tighter integration of ASR and NLP
            Incorporate dialog state in underlying
             statistical models for ASR
Current Work:
In-vehicle Dialog System
   Second phase, focus on:
     Response generation
     Mixed initiative strategies

     Evaluation
Current Work:
Workforce Training
Dialog System
   Significant issues in manufacturing
       Recognition issues:
            Real-time performance
            Noisy environments
       Understanding issues:
            Multimodal interface for reducing error
             rate, e.g., voice and pen [11].
       HCI/Human Factors Issues:
            Response generation to integrate speech
             and visual output
Research Significance
   Advance the development of dialog
    systems technology through
    addressing fundamental issues as
    they arise in the automotive
   Potential areas: ASR, NLP, HCI
[1] S.J. Young and C.E. Proctor, “The design and implementation of dialogue control in voice
    operated database inquiry systems,” Computer Speech and Language, vol. 3, no. 4, pp.
    329-353, 1992.
[2] W. Ward, “Understanding spontaneous speech,” in Proceedings of International
    Conference on Acoustics, Speech and Signal Processing, Toronto, Canada, 1991, pp.
[3] R. Pieraccini and E. Levin, “Stochastic representation of semantic structure for speech
    understanding,” Speech Communication, vol. 11, no. 2, pp. 283-288, 1992.
[4] Y. Wang and A. Acero, “Evaluation of spoken grammar learning in the ATIS domain,” in
    Proceedings International Conference on Acoustics, Speech, and Signal Processing,
    Orlando, Florida, 2002.
[5] C. Sidner, “Focusing in the comprehension of definite anaphora,” in Computational Models
    of Discourse, M. Brady and R. Berwick, eds., Cambridge, MA: The MIT Press, 1983,
    pp. 267-330.
[6] D. Litman and S. Pan, “Empirically evaluating an adaptable spoken language dialog
    system,” in The Proceedings of International Conference on User Modeling, UM ’99,
    Banff, Canada, 1999.
[7] S. Pradhan and W. Ward, “Estimating Semantic Confidence for Spoken Dialogue
    Systems,” Proceedings of the IEEE International Conference on Acoustics, Speech, and
    Signal Processing (ICASSP-2002), Orlando, Florida, USA, May 2002.
[8] M. Walker, et al., “PARADISE: A Framework for Evaluating Spoken Dialogue Agents,”
     Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics
     (ACL-97), pp. 271-289, 1997.
[9] F. Zheng, J. Hamaker, F. Goodman, B. George, N. Parihar, and J. Picone,
    “The ISIP 2001 NRL Evaluation for Recognition of Speech in Noisy Environments,”
     presented at the Speech In Noisy Environments (SPINE) Workshop, Orlando, Florida,
     USA, November 2001.
[10] F. Zheng and J. Picone, “Robust Low Perplexity Voice Interfaces,” MITRE Corporation,
     December 31, 2001.
[11] S. Oviatt, “Taming Speech Recognition Errors within a Multimodal Interface,”
     Communications of the ACM, Sept. 2000, 43 (9), 45-51 (special issue on
     “Conversational Interfaces”).
