Spoken Dialogue Technology
Overview
The Dialogue Manager is the central component of a dialogue
system.
Accepts spoken input from the user, produces messages to be
communicated to the user, interacts with external knowledge
sources, and generally controls the dialogue flow.
The task may involve a fairly simple interaction in which users
retrieve information and perform routine transactions.
Other tasks are more complex, involving negotiation and problem solving.
Dialogue initiative
System directed, user directed, mixed initiative
Dialogue control methods
Finite state based, frame based, agent based
Grounding
Verification methods
Dealing with external knowledge sources
System-directed dialogue
The system asks one or more questions to elicit some
information from the user so that it can submit an
appropriate query to the external knowledge source.
required vocabulary and grammar for each response can
be specified in advance (+)
speech recognition and language understanding are
constrained and are likely to be more accurate (+)
restricts the user’s input to predetermined words and
phrases (-)
difficult to correct misrecognised items (-)
does not allow user to take the initiative and ask questions
or introduce new topics (-)
System-directed dialogue: example
S: How can I help you? For account balances and
activities, say account information.
To transfer funds between accounts, say ...
U: Account information
S: For which account would you like information? Please
say: savings, checking, or reserve account.
U: Savings
S: Would you like your savings account balance, last
deposit, last withdrawal, ...
U: Balance
S: Your savings account balance is seven thousand two
hundred fifty dollars
User initiative
User asks one or more questions that the system
interprets and answers.
System needs to have comprehensive speech and language
processing capabilities in order to process and interpret a
potentially wide range of input.
User needs to be aware of the words and phrases that the
system can interpret.
S: How can I help you?
U: What's the balance in my checking account?
S: Your checking account balance is one thousand five
hundred dollars. How can I help you?
U: What's the balance in my savings account?
S: Your savings account balance is seven thousand two
hundred fifty dollars
Mixed initiative
Either participant can take the initiative to ask questions,
initiate topics, and request clarifications
permits more complex interaction between user, system,
and underlying application
requires comprehensive natural language processing
and complex dialogue management
focus on modelling dialogue as collaboration between
agents to solve a task
different approaches depending on what is modelled
Planning and plan recognition
Beliefs, intentions and goals
Rational agency
Dialogue Control Methods
Finite state based (state transition networks, directed
graphs)
Frame-based control
Agent-based approaches
State-based dialogue control
The nodes represent the system’s questions
Transitions between the nodes represent all
the possible paths through the network.
The graph specifies all legal dialogues
Each state represents a stage in the
dialogue in which some information is elicited
from or confirmed with the user, or some
action is performed by the system.
System: What is your destination?
User: London
System: Was that London?
User: Yes
System: What day do you want to travel?
User: Friday
System: Was that Sunday?
User: No
System: What day do you want to travel?
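The travel example above can be sketched as a small state graph. This is a minimal illustration, not a reference implementation: the state names, the `STATES` table, and the `recognise` and `confirm` callbacks are all assumptions made for the example, standing in for a real recogniser and a yes/no confirmation step.

```python
# Hypothetical state graph for the travel example: each node holds the
# system's question, the slot it fills, and the state that follows.
STATES = {
    "ask_destination": {
        "prompt": "What is your destination?",
        "slot": "destination",
        "next": "ask_day",
    },
    "ask_day": {
        "prompt": "What day do you want to travel?",
        "slot": "day",
        "next": "done",
    },
}


def run_dialogue(recognise, confirm, start="ask_destination"):
    """Walk the graph until the final state and return the filled slots.

    recognise(prompt) stands in for the speech recogniser (returns a string).
    confirm(question) stands in for the yes/no confirmation (returns a bool).
    """
    slots = {}
    state = start
    while state != "done":
        node = STATES[state]
        value = recognise(node["prompt"])
        if confirm(f"Was that {value}?"):
            slots[node["slot"]] = value
            state = node["next"]  # confirmed: follow the transition
        # not confirmed: stay in the same state and ask again
    return slots


# Scripted stand-ins that replay the example: "Friday" is first misheard as "Sunday".
heard = iter(["London", "Sunday", "Friday"])
answers = iter([True, False, True])
result = run_dialogue(lambda prompt: next(heard), lambda question: next(answers))
```

The only way back from a misrecognition here is to repeat the same question, which is the inflexibility that state-based control is known for.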
Advantages of State-based Dialogue control
suitable for system-directed dialogues
suitable for well-structured tasks with pre-determined
sequence of questions
dialogue can be modelled graphically
can include sub-dialogues for sub-tasks e.g. getting a
date
used widely in commercial applications
some empirical evidence that users prefer a predictable
control flow
Disadvantages of State-based Dialogue control
Inflexible
problem with dialogues that deviate from the
predetermined path
difficult for user to make corrections
difficult for user to introduce information not predicted at
design time
not suitable for more complex tasks such as
negotiation, since the course of the dialogue has to be
specified in advance
e.g. planning a journey may require discussion of
constraints that may be unknown at the outset
planning for multiple paths and situations leads to
combinatorial explosion of states and transitions
Frame-Based Control
declarative approach
uses a template (or frame) containing slots to be filled
during the dialogue
destination: London
date: unknown
time of departure: 9
system decides the next question to be asked based on
what information has been elicited and what remains to
be elicited
provides for a more flexible dialogue
requires more elaborate dialogue control algorithm
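A minimal sketch of frame-based control, assuming a frame with three slots and one hypothetical prompt per slot. The names `SLOT_PROMPTS`, `update_frame`, and `next_prompt` are invented for illustration. The control algorithm simply asks about the first slot that is still empty.

```python
# Hypothetical prompts, one per slot in the frame.
SLOT_PROMPTS = {
    "destination": "Where are you travelling to?",
    "date": "What day do you want to travel?",
    "time": "What time do you want to depart?",
}


def update_frame(frame, parsed_values):
    """Return a copy of the frame with every recognised slot value filled in."""
    updated = dict(frame)
    for slot, value in parsed_values.items():
        if slot in SLOT_PROMPTS and value is not None:
            updated[slot] = value
    return updated


def next_prompt(frame):
    """Return the prompt for the first unfilled slot, or None when the frame is complete."""
    for slot, prompt in SLOT_PROMPTS.items():
        if frame.get(slot) is None:
            return prompt
    return None


# The frame shown above: destination and time are known, date is not.
frame = {"destination": "London", "date": None, "time": "9"}
question = next_prompt(frame)
```

With the frame above, `question` is the date prompt, because the date is the only slot still unfilled.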
Characteristics of Frame-based Dialogue Control
user can provide more information than was asked for
in the system prompt e.g.
System: where are you travelling to?
User: London on Friday
(System does not ask: When do you want to go to London)
requires more extended natural language grammar as
user’s answer could include various permutations of the
required information e.g.
Destination
Destination + Date
Destination + Time
Destination + Date + Time
Destination + Time + Date
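One way to accept these permutations without writing a grammar rule for each ordering is simple concept spotting. The sketch below is an illustration under strong assumptions: the city list, the day list, and the "at N" time pattern are made up for the example, and a real system would use a proper grammar or language understanding component.

```python
import re

# Hypothetical vocabularies for the example domain.
CITIES = {"london", "luton", "belfast"}
DAYS = {"monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"}
TIME_PATTERN = re.compile(r"\bat (\d{1,2})\b")


def spot_concepts(utterance):
    """Pick out destination, date and time wherever they occur in the utterance."""
    text = utterance.lower()
    found = {}
    for word in re.findall(r"[a-z]+", text):
        if word in CITIES and "destination" not in found:
            found["destination"] = word.capitalize()
        elif word in DAYS and "date" not in found:
            found["date"] = word.capitalize()
    match = TIME_PATTERN.search(text)
    if match:
        found["time"] = int(match.group(1))
    return found


over_answer = spot_concepts("London on Friday")
reordered = spot_concepts("Friday at 9 to London")
```

Because the result is keyed by slot name, it can be merged into the frame regardless of the order in which the user supplied the values.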
Problems: Complex Tasks
users with wide range of different levels of knowledge
would require wide range of system responses
the state of the world may change dynamically during
the course of the dialogue - not possible to specify all
possible configurations in advance;
dialogues involving negotiation of some task to be
achieved, planning and other types of collaborative
interaction
Agent-based Control
draws on methods from Artificial Intelligence
focus on modelling dialogue as collaboration between
agents to solve a task
permits more complex interaction between user,
system, and underlying application
different approaches depending on what is modelled
Planning and plan recognition
Beliefs, intentions and goals
Rational agency
mixed initiative dialogue
requires sophisticated natural language processing
Example of an agent-based system
User: I’m looking for a job in the Calais area. Are there
any servers?
System: No, there aren’t any employment servers for
Calais. However, there is an employment server for
Pas-de-Calais and an employment server for Lille. Are
you interested in one of these?
system attempts to provide a more co-operative
response that might address the user’s needs.
Grounding
Some potential problems for a dialogue manager:
1. The speech recogniser may have detected silence even though
the user had spoken. (No words returned – noinput event).
2. Only a part of the user’s utterance has been recognised and
returned.
beginning of the user’s input may have been cut off
end of the user’s input could have been lost because the
engine stopped listening too early
3. All of the input has been captured but some or all of the words
were incorrectly recognised.
4. Even though all the words were correctly recognised, the language
understanding component was either unable to assign the correct
meaning or there were a number of possible meanings due to
ambiguity.
Clarification sub-dialogues
1. Simple approach – ask user to repeat
Does not address the problem (no input, incomplete, etc)
Relies on user to know how best to repeat or reformulate
2. More complex approach – attempt to detect and
address the problem
If silence detected, use specific prompt for silence
If unable to assign meaning to input, use specific prompt
addressing problem with understanding
Built-in event handlers in VoiceXML
More sophisticated methods in research systems
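The more complex approach can be sketched as a lookup from the detected problem to a specific prompt. The event names `noinput` and `nomatch` follow VoiceXML; the prompt wording, the escalation on repeated failures, and the function name are assumptions made for this example.

```python
# Hypothetical prompts per event; later entries are used on repeated failures.
CLARIFICATION_PROMPTS = {
    "noinput": [
        "Sorry, I did not hear anything. Please say your answer now.",
        "I still cannot hear you. Please speak a little louder.",
    ],
    "nomatch": [
        "Sorry, I did not understand that. Please say it another way.",
        "I still did not understand. Please answer with a single word.",
    ],
}
FALLBACK_PROMPT = "Please repeat that."


def clarification_prompt(event, attempt):
    """Choose a prompt for the detected problem; attempt counts from 1."""
    prompts = CLARIFICATION_PROMPTS.get(event)
    if prompts is None:
        return FALLBACK_PROMPT  # the simple approach: just ask the user to repeat
    index = min(attempt, len(prompts)) - 1
    return prompts[index]
```

An unrecognised event falls back to the simple "please repeat" approach, which does not address the underlying problem.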
Verification (Confirmation)
Current speech recognition technology cannot
guarantee that the system heard exactly what the user
said
The system should confirm what the user wants,
especially if the next action could result in
unrecoverable consequences
There are two types of verification
explicit verification
implicit verification
It is also important to decide
if confirmation should be made as each value is acquired
if more than one value can be confirmed at a time
Explicit Verification
Explicit verification plays back the user’s response to
check that it has been understood correctly
useful when the next action could cause effects that
cannot be easily undone
problem that too many unnecessary confirmations can
make the interface too verbose and lengthen the
interaction
problem with confirming several values at a later stage
System: Where are you travelling to?
User: London.
System: What day?
User: Friday.
System: So you want to travel to London on Sunday?
Implicit Verification
Incorporates the user’s response in the next prompt
Saves time by not asking the user for explicit
confirmation
problem of how to handle the wide range of possible
corrections if the system has misunderstood
User: I want to fly from Belfast to London.
System: At what time do you want to fly from Belfast to London?
User: Seven in the evening.
Possible user corrections:
No, not London, Luton. | Luton, not London. | I said Luton.
Belfast to Luton. | Belfast Luton at seven in the evening …
Mixed strategy for verification
Shift to more explicit method if problems arise
User: I want to fly from Belfast to Luton.
System: At what time do you want to fly from Belfast to London? [Implicit]
User: No, not London, Luton.
System: Do you want to fly to London? Please answer yes or no. [Explicit: leading prompt]
User: No.
System: Please spell your destination. [Explicit: ask for spelling]
User: L-U-T-O-N.
System: So you want to fly to Luton? [Explicit]
User: Yes.
System: At what time do you want to fly from Belfast to Luton? [Implicit]
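The shift from implicit to more explicit verification can be sketched as a function of how many times verification of a value has already failed. The thresholds, the strategy names, and the prompt wording below are assumptions for illustration only.

```python
def verification_strategy(failed_attempts):
    """Pick a verification style that becomes more explicit as problems accumulate."""
    if failed_attempts == 0:
        return "implicit"
    if failed_attempts == 1:
        return "explicit_yes_no"
    return "explicit_spelling"


def verification_prompt(destination, failed_attempts):
    """Build the next system prompt for the destination slot (hypothetical wording)."""
    strategy = verification_strategy(failed_attempts)
    if strategy == "implicit":
        # Fold the value into the next question instead of asking about it.
        return f"At what time do you want to fly to {destination}?"
    if strategy == "explicit_yes_no":
        return f"Do you want to fly to {destination}? Please answer yes or no."
    return "Please spell your destination."
```

Once the value is confirmed, the counter would be reset so that the dialogue returns to implicit verification, as in the final turn of the example.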
Accessing external knowledge
Problems that arise when there are discrepancies
between the information that the user requests and
what is available in the external knowledge source.
The vocabulary of the dialogue does not map directly
on to the vocabulary of the application.
The data that is retrieved is ambiguous or
indeterminate.
Problems with vocabulary
Misspellings, different spellings of items such as
names, abbreviations, or different ways of referring to
the same item.
May be handled in an “ad hoc” way by providing
alternative representations of the items.
More general approach
Enhance the Dialogue Manager with an Information
Manager that deals with complex information processing
involving the application knowledge source
Could involve the use of an ontology to model vocabulary
and concepts that are related
Ambiguous and indeterminate data
Handling under-specified or ambiguous values
Examples from Philips Train Timetable system
disambiguate train stations with the same name (such as
“Frankfurt am Main” and “Frankfurt an der Oder”, which
might both be referred to in a dialogue using the shorter
name “Frankfurt”).
combining values, for example, if a user calls in the
afternoon with the utterance “today at 8”
the two values can be combined into the single value
20.00 hours given that the value 08.00 hours is no longer
valid.
Mechanisms such as these are generally developed in
a fairly “ad hoc” way to handle ambiguity and
indeterminacy that may arise in a particular domain.
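The "today at 8" case can be sketched as follows. This is not the Philips system's method, only an illustration of the idea: an hour spoken without am/pm has two readings, and any reading already in the past is discarded. The function name and the return convention are assumptions.

```python
from datetime import datetime


def resolve_hour_today(spoken_hour, now):
    """Resolve an hour given without am/pm to a 24-hour value for today.

    Returns the single reading that is still in the future, or None when
    both readings or neither reading remain, so the system must ask the user.
    """
    morning = spoken_hour % 12
    candidates = [morning, morning + 12]
    still_valid = [h for h in candidates if (h, 0) > (now.hour, now.minute)]
    if len(still_valid) == 1:
        return still_valid[0]
    return None


# A caller at 15:30 who says "today at 8" can only mean 20.00 hours.
afternoon_call = resolve_hour_today(8, datetime(2024, 5, 1, 15, 30))
```

A caller at 07:00 would get `None`, since both 08.00 and 20.00 are still possible and a clarification question is needed.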
Relaxing parameters to resolve a query
Example - a query concerning a flight to London at 8 might be
unsuccessful, although there may be flights to London just before
or just after this time.
Approach - relax some of the parameters of the query until a
suitable result can be found in the database.
It may not be clear which item should be relaxed.
Is there a train from Birmingham to London arriving around 10 in the morning?
Relaxing the time parameter might return trains arriving at 9 and
11.
However, relaxing the transport parameter might return a flight or a
bus that arrives around 10.
In other cases the user might even be happy with a change in the
departure or destination cities, as would be the case with
alternative airports in the same city.
Making judgements about which parameters to relax requires
detailed analysis of the domain - there may not be any general
solutions to this problem.
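A minimal sketch of relaxation, assuming that only the arrival time is relaxed and that it is widened one hour at a time. The timetable entries, field names, and the tolerance limit are invented for the example. Choosing the time parameter is itself a domain judgement of the kind discussed above.

```python
# Hypothetical timetable for the Birmingham to London query.
TIMETABLE = [
    {"mode": "train", "destination": "London", "arrival_hour": 9},
    {"mode": "train", "destination": "London", "arrival_hour": 11},
    {"mode": "bus", "destination": "London", "arrival_hour": 10},
]


def search(mode, destination, arrival_hour, tolerance):
    """Return entries matching mode and destination within the time tolerance."""
    return [
        entry
        for entry in TIMETABLE
        if entry["mode"] == mode
        and entry["destination"] == destination
        and abs(entry["arrival_hour"] - arrival_hour) <= tolerance
    ]


def search_with_time_relaxation(mode, destination, arrival_hour, max_tolerance=2):
    """Widen the arrival window hour by hour until something is found."""
    for tolerance in range(max_tolerance + 1):
        matches = search(mode, destination, arrival_hour, tolerance)
        if matches:
            return tolerance, matches
    return None, []


tolerance_used, trains = search_with_time_relaxation("train", "London", 10)
```

Returning the tolerance that was needed lets the system tell the user that the answer is approximate, for example by reporting trains arriving at 9 and 11 rather than at 10.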
Comparison of dialogue control strategies by feature
Input
State-based: Single words or phrases
Frame-based: Natural language with concept spotting
Agent-based: Unrestricted natural language
Verification
State-based: Explicit confirmation, either of each input or at
end of transaction
Frame-based: Explicit and implicit confirmation
Agent-based: Grounding
Dialogue model
State-based: Information state represented implicitly in
dialogue states. Dialogue control represented explicitly with
state diagram
Frame-based: Explicit representation of information states.
Dialogue control represented with control algorithm
Agent-based: Dialogue history. Context. Model of system’s
intentions, goals, beliefs
User model
State-based: Simple model of user characteristics or preferences
Frame-based: Simple model of user characteristics or preferences
Agent-based: Model of user’s intentions, goals, beliefs
Dialogue Engineering
Overview
The dialogue engineering process
Requirements analysis, functional specification, design,
implementation, testing, evaluation
Speech interface issues
Methods used for developing speech interfaces
Spoken language requirements
Wizard of Oz simulations
Design issues for speech interfaces
Testing speech interfaces
Evaluating speech interfaces
Requirements analysis: Use case analysis
Whether the system is to replace or complement an
existing system
Whether speech is appropriate
The type of service to be provided by the system
The types of user who will make use of the system
The general deployment environment for the system
Voice Interface Considerations
It is important to identify the range of user speech behaviours so as
to constrain the user interface as much as possible
In a speech user interface human behaviour is less predictable
than with other technologies because humans may assume
speech with a machine can be the same as natural speech with
other humans
With a speech application, the machine recognition is not going to
be 100% accurate and the recognition will not be consistent across
user populations and environments (e.g., background noise levels).
Speech is sequential – it is not possible to present more than one
piece of information simultaneously. This may slow down the
interaction, as users must carefully listen to various lists, dialogue
flow cues, and help prompts before they can proceed
Presenting users with too much information taxes short-term
memory. Listening to long lists of choices is unreasonable, and
purely hierarchical, menu driven applications are exhausting.
Without visual cues and a well-established mental model for VUIs,
users have fewer ways to understand what choices are available to
them. Without careful attention to design, these limitations can
severely diminish system flexibility and user control.
Service Environment (Usage) Issues
In what type of environment will the users use the
system (quiet office, outdoors, noisy shopping mall)?
What type of phone connection will most of the users
have (land-line, cordless, cellular)?
How many speech to system interactions are there?
The more times a user must interact with the application,
the greater the chance that the user or the recognition
engine will make an error.
Error recovery is the toughest part of good user interface
design.
Users do not like it when they are not understood.
When there are more interactions, the application takes
longer to navigate, the task takes longer to complete, and
the risk of errors increases.
Spoken Language Requirements
Description of the vocabulary, grammar and interaction
patterns that are likely to be deployed in the system
Helps to determine the technologies that are to be used
- for example, isolated versus continuous speech
recognition, keyword spotting versus natural language
understanding, and directed versus mixed-initiative
dialogue
Analysis of human-human dialogues
Simulations – the Wizard of Oz method (WOZ)
Low-Level Design
use of barge-in
prompts
grammars: speech, DTMF, combination
Interaction style: directed dialogue or mixed initiative
navigation
system help
consistency
confirmation
error handling
Unit testing
Testing the recognition of the user’s input
Prompt: Please enter your username
Input       Expected result   Actual result   Pass/fail
Liz         Liz               Liz             Pass
Margaret    Margaret          Margaret        Pass
Mike        Mike              Margaret        Fail
Testing the execution of paths in the dialogue
Prompt: Please enter your username
Input       Expected result            Actual result              Pass/fail
Liz         Continue to next prompt    Continue to next prompt    Pass
Margaret    Continue to next prompt    Continue to next prompt    Pass
William     System help                System help                Pass
Sarah       System help                Continue to next prompt    Fail
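Tables like these can be produced by a small table-driven harness. The sketch below assumes the component under test can be called as a function from input to result; `fake_recogniser` is a made-up stand-in that reproduces the Mike/Margaret confusion from the first table.

```python
def run_test_cases(system_under_test, cases):
    """Run each (input, expected) pair and record the actual result and verdict."""
    rows = []
    for user_input, expected in cases:
        actual = system_under_test(user_input)
        verdict = "Pass" if actual == expected else "Fail"
        rows.append((user_input, expected, actual, verdict))
    return rows


def fake_recogniser(spoken_name):
    """Hypothetical recogniser that mishears 'Mike' as 'Margaret'."""
    return {"Mike": "Margaret"}.get(spoken_name, spoken_name)


rows = run_test_cases(
    fake_recogniser,
    [("Liz", "Liz"), ("Margaret", "Margaret"), ("Mike", "Mike")],
)
```

The same harness covers path testing if the expected and actual values are the names of the next dialogue step rather than recognised words.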
Integration Testing
Test 1 (Pass)
System prompt: Say ‘view’ or ‘add’
User input: View
Expected result: Continue to next prompt
Actual result: Continue to next prompt
System prompt: Enter the student id
User input: 96050918
Expected result: System retrieves student details: firstname:
john, lastname: scott, coursecode: dk003, stage: 1
Actual result: System retrieves student details: firstname:
john, lastname: scott, coursecode: dk003, stage: 1
Test 2 (Fail)
System prompt: Say ‘view’ or ‘add’
User input: View
Expected result: Continue to next prompt
Actual result: Continue to next prompt
System prompt: Enter the student id
User input: 96069783
Expected result: System retrieves student details: firstname:
david, lastname: wilson, coursecode: dk005, stage: award
Actual result: System retrieves student details: firstname:
john, lastname: scott, coursecode: dk003, stage: 1
Evaluation: Qualitative
Analysis of user acceptance
Each statement is rated on a scale from 1 to 5:
It was easy to complete a task using the system.
It was easy to navigate around the system.
The system understood what you said.
The system’s speech was easy to understand.
The system responded in a timely manner.
The system responded in ways that you would expect.
The system was able to cope with errors.
You would prefer to use this system rather than a Web-based
system.
Evaluation: System performance
Individual components
word accuracy
sentence accuracy (SA)
percentage of utterances completely and correctly
recognised, matched exactly with words in reference
answer
concept accuracy (CA)
percentage of concepts that have been correctly
understood
e.g. will it rain tomorrow in Boston
TOPIC: rain; DATE: tomorrow; CITY: boston
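The two accuracy measures can be sketched as follows, using the simple definitions given above. Published definitions of concept accuracy sometimes also count inserted or deleted concepts. That refinement is left out here, and the function names and the concept dictionaries are assumptions for the example.

```python
def sentence_accuracy(references, hypotheses):
    """Percentage of utterances whose recognised words match the reference exactly."""
    exact = sum(
        1 for ref, hyp in zip(references, hypotheses) if ref.split() == hyp.split()
    )
    return 100.0 * exact / len(references)


def concept_accuracy(reference_concepts, understood_concepts):
    """Percentage of reference concepts that were understood with the correct value."""
    correct = sum(
        1
        for name, value in reference_concepts.items()
        if understood_concepts.get(name) == value
    )
    return 100.0 * correct / len(reference_concepts)


reference = {"TOPIC": "rain", "DATE": "tomorrow", "CITY": "boston"}
understood = {"TOPIC": "rain", "DATE": "tomorrow", "CITY": "austin"}
ca = concept_accuracy(reference, understood)
```

Here two of the three concepts are correct, so `ca` is roughly 66.7 even though the sentence as a whole would count as wrong for sentence accuracy.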
Dialogue Metrics
transaction success
how successful the system has been in providing the user with
the requested information
S (succeed), SC (succeed with constraint relaxation), SN
(succeed with no answer), F (fail).
number of turns / transaction time
correction rate
proportion of turns in a dialogue that are concerned with
correcting either the system’s or the user’s utterances, which
may have been the result of speech recognition errors, errors in
language understanding, or misconceptions
contextual appropriateness
dialogue strategy
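Two of these metrics are simple tallies over an annotated dialogue corpus. The sketch assumes each turn has been hand-labelled as "correction" or something else, and each transaction carries one of the outcome codes S, SC, SN, F. The label strings and function names are assumptions.

```python
from collections import Counter


def correction_rate(turn_labels):
    """Proportion of turns in a dialogue that are labelled as corrections."""
    corrections = sum(1 for label in turn_labels if label == "correction")
    return corrections / len(turn_labels)


def transaction_success_summary(outcomes):
    """Count how many transactions ended with each outcome code (S, SC, SN, F)."""
    return Counter(outcomes)


rate = correction_rate(["normal", "correction", "normal", "normal"])
summary = transaction_success_summary(["S", "S", "SC", "F"])
```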
PARADISE (PARAdigm for DIalogue System Evaluation)
maximising user satisfaction
maximising task success
minimising costs
efficiency measures
qualitative measures.
attribute value matrix (transaction success)
represents information to be exchanged between the system
and the user in terms of a set of ordered pairs of attributes and
their possible values
confusion matrix: shows correct and incorrect values
Confusion Matrix
Username
Data        Liz   Margaret   Mike   Guest
Liz          25          1      0       5
Margaret      1         29      0       1
Mike          1          0     18       0
Guest         3          0      2      14
Total        30         30     20      20
Kappa coefficient: $\kappa = \frac{P(A) - P(E)}{1 - P(E)}$
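As a worked example, kappa can be computed from the Username confusion matrix above. The sketch assumes that P(A) is the proportion of counts on the diagonal and that P(E) is estimated from the column totals as the sum of squared column proportions. The slides do not define P(A) or P(E), so that estimate is an assumption.

```python
# Rows and columns in the order Liz, Margaret, Mike, Guest.
CONFUSION = [
    [25, 1, 0, 5],
    [1, 29, 0, 1],
    [1, 0, 18, 0],
    [3, 0, 2, 14],
]


def kappa(matrix):
    """Kappa = (P(A) - P(E)) / (1 - P(E)) for a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    # P(A): proportion of cases where the data agree with the key (diagonal).
    p_agree = sum(matrix[i][i] for i in range(len(matrix))) / total
    # P(E): chance agreement, estimated here from the column totals (assumption).
    column_totals = [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]
    p_chance = sum((t / total) ** 2 for t in column_totals)
    return (p_agree - p_chance) / (1 - p_chance)


username_kappa = kappa(CONFUSION)
```

With these figures P(A) = 0.86 and P(E) = 0.26, so kappa = 0.60 / 0.74, which is about 0.81.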