

   Research Challenges for Spoken
   Language Dialog Systems

              Julie Baca, Ph.D.
   Center for Advanced Vehicular Systems
        Mississippi State University

    Computer Science Graduate Seminar
           November 27, 2002
   Define dialog systems
   Describe research issues
   Present current work
   Give conclusions and discuss
    future work
What is a Dialog System?
   Current commercial voice products
    require adherence to “command
    and control” language, e.g.,
       User: “Plan Route”
   Such interfaces are not robust to
    variations from the fixed words and
What is a Dialog System?
   Dialog systems seek to provide a
    natural conversational interaction
    between the user and the
    computer system, e.g.,
       User: “Is there a way I can get to
        Canal Street from here?”
Domains for Dialog Systems

     Travel reservation
     Weather forecasting
     In-vehicle driver assistance
     On-line learning environments
Dialog Systems:
Information Flow
   Must model two-way flow of information
   User-to-system
   System-to-user
Dialog System
Research Issues
Many fundamental problems must be
solved for these systems to mature.
Three general areas include:
 Automatic Speech Recognition

 Natural Language Processing

 Human-computer Interaction (HCI)
NLP Issue for Dialog
Systems: Semantics
   Must assess meaning, not just
    syntactic correctness.
   Therefore, must handle
    ungrammatical inputs, e.g.,
       “The ……nearest .....station is…
        …is there a gas station nearby?”
NLP Issue: Semantic
Representation 1
   For NLP, use semantic grammars
   Semantic frame with slots and
   <destination> -> <prep> <place>
    <prep>->      “nearest”
    <place>->      “gas station”
NLP Issue: Semantic
Representation 2
 Must also represent:
 “How do I get from Canal Street to Royal Street?”
<directions> ->     <start> <destination>
<destination> ->    <prep><place>
<place> ->          <street_name> |
<street_name>->     “Canal St”| “Royal St”
<prep> ->           <to_prep> | <near_prep>
<near_prep> ->      “nearest” | “closest”
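
As a rough illustration of how a semantic grammar like the one above might fill frame slots, the following sketch scans an utterance for preposition and place phrases and skips everything else, so disfluent input can still produce a frame. The vocabulary sets, the `parse` function, and the "to" and "from" entries are assumptions made for this example; they are not taken from the system described in the talk.

```python
import re

# Terminal vocabularies, loosely following the grammar on the slides.
NEAR_PREP = {"nearest", "closest"}
TO_PREP = {"to"}      # assumption: <to_prep> terminals are not listed on the slides
FROM_PREP = {"from"}  # assumption: marks the <start> slot
ALL_PREPS = NEAR_PREP | TO_PREP | FROM_PREP
PLACES = ["gas station", "canal st", "royal st"]


def tokenize(utterance):
    """Lowercase, replace non-letters with spaces, and shorten 'street' to 'st'."""
    text = re.sub(r"[^a-z\s]", " ", utterance.lower())
    return ["st" if word == "street" else word for word in text.split()]


def match_place(tokens, i):
    """Return (place, word_count) if a known place starts at tokens[i]."""
    for place in PLACES:
        words = place.split()
        if tokens[i:i + len(words)] == words:
            return place, len(words)
    return None, 0


def parse(utterance):
    """Fill <start> and <destination> slots, ignoring words that match no rule."""
    tokens = tokenize(utterance)
    frame = {}
    prep = None  # most recent preposition not yet attached to a place
    i = 0
    while i < len(tokens):
        place, length = match_place(tokens, i)
        if place:
            if prep in FROM_PREP:
                frame["start"] = place
            else:
                frame["destination"] = {"prep": prep, "place": place}
            prep = None
            i += length
            continue
        if tokens[i] in ALL_PREPS:
            prep = tokens[i]
        i += 1
    return frame
```

With these assumptions, `parse("The ... nearest ... station is ... is there a gas station nearby?")` yields a destination frame with the preposition "nearest" and the place "gas station", even though the input is not grammatical.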

NLP Issue: Semantic
Representation 3
   Two Approaches:
   Hand-craft the grammar for the
    application, using robust parsing to
    understand meaning [1,2].
       Problem: time, expense
   Use statistical approach, generating
    initial rules and using annotated tree-
    banked data to discover the full rule set
       Problem: annotated training data
ASR/NLP Issue:
Reducing Errors
   Most systems use a loose coupling
    of ASR and NLP.
   Try earlier integration of semantics
    with recognizer.
   Incorporate dialog “state” into
    underlying statistical model.
   Problems:
     Increases search space
     Training Data
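
One simple way to picture the idea of using dialog state in the statistical model is to re-rank the recognizer's N-best list with a state-conditioned word model. The sketch below is only an illustration under that assumption: the state names, probabilities, floor value, and weight are all hypothetical, and a tighter integration would apply the state inside the recognizer search rather than after it.

```python
import math

# Hypothetical state-conditioned word probabilities P(word | dialog state).
STATE_LM = {
    "awaiting_destination": {"canal": 0.2, "royal": 0.2, "street": 0.3},
    "awaiting_confirmation": {"yes": 0.4, "no": 0.4},
}
FLOOR = 1e-3  # probability assigned to words not listed for a state


def state_log_prob(words, state):
    """Sum of log P(word | state) under a unigram state model."""
    lm = STATE_LM.get(state, {})
    return sum(math.log(lm.get(word, FLOOR)) for word in words)


def rescore(nbest, state, weight=0.5):
    """Re-rank (text, recognizer_log_score) pairs using the dialog state."""
    scored = [
        (text, score + weight * state_log_prob(text.split(), state))
        for text, score in nbest
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, given the hypotheses `[("no", -4.0), ("know", -3.5)]`, the state `"awaiting_confirmation"` moves "no" to the top of the list even though the recognizer alone preferred "know".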
NLP Issue: Resolving
Meaning Using Context
   Must maintain knowledge of the
    conversational context.
   After request for nearest gas station,
    user says, “What is it close to?”
       Resolving “it” - anaphora
   Another follow-up by the user,
    “How about …restaurant?”
       Resolving “…” with “nearest”- ellipsis
Resolving Meaning:
Discourse Analysis
   To resolve such requests, system
    must track context of the
   This is typically handled by a
    discourse analysis component in
    the Dialog Manager.
Dialog Manager:
Discourse Analysis
   Anaphora resolution approach: Use
    focus mechanism, assuming
    conversation has focus [5].
   For our example, “gas station” is current focus.
   But how about:
       “I’m at Food Max. How do I get to a gas
        station close to it and a video store close
        to it?”
   Problem: Resolving the two “its”.
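
The focus-based approach to anaphora and ellipsis can be sketched as a small context tracker. This is a minimal illustration, assuming requests are already parsed into slot dictionaries; the `DiscourseContext` class and its slot names are hypothetical.

```python
class DiscourseContext:
    """Tracks the current focus and the last fully specified request."""

    def __init__(self):
        self.focus = None       # entity most recently talked about
        self.last_frame = None  # last complete request frame

    def update(self, frame):
        """Record a complete request and make its place the focus."""
        self.last_frame = dict(frame)
        self.focus = frame["place"]

    def resolve_anaphor(self, frame):
        """Replace the pronoun 'it' in any slot with the current focus."""
        return {
            slot: (self.focus if value == "it" else value)
            for slot, value in frame.items()
        }

    def resolve_ellipsis(self, partial):
        """Fill slots missing from a follow-up using the last request."""
        merged = dict(self.last_frame or {})
        merged.update({s: v for s, v in partial.items() if v is not None})
        return merged
```

After `update({"prep": "nearest", "place": "gas station"})`, the follow-up "What is it close to?" resolves "it" to "gas station", and "How about ... restaurant?" inherits the "nearest" slot. A single focus slot like this one cannot decide which entity each "it" refers to in the Food Max example, which is exactly the open problem noted above.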
Dialog System
Dialog Manager:
   Often cannot satisfy request in one
   The previous example may require
    clarification from the user,
       “Do you want to go to the gas
        station first?”
      HCI Issue:
      System vs. User Initiative
   What level of control do you provide user in
    the conversation?
Mixed Initiative
   Total system initiative provides low usability.
   Total user initiative introduces
    higher error rate.
   Thus, mixed initiative approach,
    balancing usability and error rate,
    is taken most often.
   Allowing user to adapt the level
    explicitly has also shown merit [6].
ASR/HCI Issue:
Error Handling
   How to handle possible errors?
   Assign confidence score to result
    of recognizer.
   For results with lower confidence
    score, request clarification or revert
    to system-oriented initiative.
   Can incorporate dialog state in
    computing confidence score [7].
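
The confidence-based strategy above can be sketched as a simple decision rule. The function name, action labels, and threshold values are assumptions chosen for illustration; a real system would tune the thresholds on data.

```python
def choose_action(confidence, accept_threshold=0.8, clarify_threshold=0.5):
    """Map a recognizer confidence score in [0, 1] to a dialog action."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= accept_threshold:
        return "accept"             # act on the recognized request
    if confidence >= clarify_threshold:
        return "clarify"            # ask the user to confirm or rephrase
    return "system_initiative"      # fall back to system-directed prompts
```

A score that also reflects dialog state, as in [7], could be passed in as `confidence` without changing the rule itself.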
HCI Issue:
Response Generation
   How to present response to user in a
    way that minimizes cognitive load?
   Varies depending on whether output is
    speech-only or speech/visual.
       Speech-only output must respect user
        short-term memory limitations, e.g., lists
        must be short, timed appropriately, and
        allow repetition.
       Speech/visual output must be
        complementary, e.g., importance of
        redundancy and timing.
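
For speech-only output, keeping lists short might be handled by reading results in small groups and pausing for the user between them. The sketch below only shows the grouping step; the function name and the default of three items per group are assumptions, not figures from the talk.

```python
def chunk_for_speech(items, max_items=3):
    """Split a result list into short groups suitable for spoken output."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    return [items[i:i + max_items] for i in range(0, len(items), max_items)]
```

The dialog manager could then speak one group at a time and offer "repeat" or "more" after each, which covers the repetition requirement as well.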
HCI Issue:
Evaluating Dialog Systems
   How to compare and evaluate
    dialog systems?
    PARADISE (Paradigm for Dialog Systems
    Evaluation) provides a standard
    framework [8].
Evaluating Dialog Systems
   Task success
       Was the necessary information
   Efficiency/Cost
       Number of dialog turns, task completion
   Qualitative
       ASR rejections, timeouts, helps
   Usability
       User satisfaction with ASR, task ease,
        interaction pace, system response
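
The PARADISE framework [8] combines measures like these into one performance value: a weighted, normalized task-success term minus a weighted sum of normalized cost terms. The sketch below computes that combination per dialog. In the published framework the weights are estimated by regression against user satisfaction; here they are passed in as hypothetical constants, and the use of the population standard deviation for normalization is also an assumption.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Normalize values to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def performance(task_success, costs, alpha, weights):
    """Per-dialog score: alpha * N(success) - sum(w_i * N(cost_i)).

    task_success: one success value per dialog.
    costs: dict mapping a cost name to one value per dialog.
    weights: dict mapping each cost name to its weight.
    """
    n_success = z_scores(task_success)
    n_costs = {name: z_scores(values) for name, values in costs.items()}
    return [
        alpha * n_success[d]
        - sum(weights[name] * n_costs[name][d] for name in costs)
        for d in range(len(task_success))
    ]
```

Normalizing each term first lets measures on different scales, such as success rates and turn counts, be combined in one expression.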
Current Work
   Sponsored by CAVS
   Examining:
       In-vehicle Environment
       Manufacturing Environment

   Multidisciplinary Team:
       CS, ECE, IE
            Baca, Picone, Duffy
       ECE graduate students
            Hualin Gao, Zheng Feng
Current Work:
In-vehicle Dialog System

     Specific ASR Issues for In-vehicle
       Real-time performance
       Noise cancellation
Current Work:
In-vehicle Dialog System
    Other Significant Issues:
      Reducing error rate
      Graceful error handling and mixed
       initiative strategy
      Response generation to reduce
       user cognitive load
      Evaluation
Current Work:
In-vehicle Dialog System
   Approach
       Develop prototype in-vehicle system
       Initial focus on ASR and NLP issues
            Integrate real-time recognizer [9]
            Employ noise-cancellation techniques [10]
            Use semantic grammar for NLP
            Examine tighter integration of ASR and NLP
            Incorporate dialog state in underlying
             statistical models for ASR
Current Work:
In-vehicle Dialog System
   Second phase, focus on:
     Response generation
     Mixed initiative strategies

     Evaluation
Current Work:
Workforce Training
Dialog System
   Significant issues in manufacturing
       Recognition issues:
            Real-time performance
            Noisy environments
       Understanding issues:
            Multimodal interface for reducing error
             rate, e.g., voice and pen [11].
       HCI/Human Factors Issues:
            Response generation to integrate speech
             and visual output
Research Significance
   Advance the development of dialog
    systems technology through
    addressing fundamental issues as
    they arise in the automotive
   Potential areas: ASR, NLP, HCI
[1] S.J. Young and C.E. Proctor, “The design and implementation of dialogue control in voice
    operated database inquiry systems,” Computer Speech and Language, vol. 3, no. 4, pp.
    329-353, 1992.
[2] W. Ward, “Understanding spontaneous speech,” in Proceedings of International
    Conference on Acoustics, Speech and Signal Processing, Toronto, Canada, 1991, pp.
[3] R. Pieraccini and E. Levin, “Stochastic representation of semantic structure for speech
    understanding,” Speech Communication, vol. 11, no. 2, pp. 283-288, 1992.
[4] Y. Wang and A. Acero, “Evaluation of spoken grammar learning in the ATIS domain,” in
    Proceedings International Conference on Acoustics, Speech, and Signal Processing,
    Orlando, Florida, 2002.
[5] C. Sidner, “Focusing in the comprehension of definite anaphora,” in Computational Models
    of Discourse, M. Brady and R. Berwick, eds., 1983, Cambridge, MA, pp. 267-330, The MIT
[6] D. Litman and S. Pan, “Empirically evaluating an adaptable spoken language dialog
    system,” in The Proceedings of International Conference on User Modeling, UM ’99,
    Banff, Canada, 1999.
[7] S. Pradhan and W. Ward, “Estimating Semantic Confidence for Spoken Dialogue
    Systems,” Proceedings of the IEEE International Conference on Acoustics, Speech, and
    Signal Processing (ICASSP-2002), Orlando, Florida, USA, May 2002.
[8] M. Walker, et al., “PARADISE: A Framework for Evaluating Spoken Dialogue Agents,”
     Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics
     (ACL-97), pp. 271-289, 1997.
[9] F. Zheng, J. Hamaker, F. Goodman, B. George, N. Parihar, and J. Picone,
    “The ISIP 2001 NRL Evaluation for Recognition of Speech in Noisy Environments,”
     presented at the Speech In Noisy Environments (SPINE) Workshop, Orlando, Florida,
     USA, November 2001.
[10] F. Zheng and J. Picone, “Robust Low Perplexity Voice Interfaces,” MITRE Corporation,
     December 31, 2001.
[11] S. Oviatt, “Taming Speech Recognition Errors within a Multimodal Interface,”
     Communications of the ACM, Sept. 2000, 43 (9), 45-51 (special issue on
     “Conversational Interfaces”).
