Shared by: cuiliqing
Posted: 11/12/2011
Language: English
Pages: 40
Spoken Dialogue Technology

Overview

• The Dialogue Manager is the central component of a dialogue system.
• It accepts spoken input from the user, produces messages to be communicated to the user, interacts with external knowledge sources, and generally controls the dialogue flow.
• The task may involve a fairly simple interaction in which users retrieve information and perform routine transactions.
• The task may also be complex, involving negotiation and problem solving.
• Dialogue initiative
    • System directed, user directed, mixed initiative
• Dialogue control methods
    • Finite state based, frame based, agent based
• Grounding
    • Verification methods
• Dealing with external knowledge sources

System-directed dialogue

The system asks one or more questions to elicit information from the user so that it can submit an appropriate query to the external knowledge source.

• The required vocabulary and grammar for each response can be specified in advance (+)
• Speech recognition and language understanding are constrained and so are likely to be more accurate (+)
• The user’s input is restricted to predetermined words and phrases (-)
• It is difficult to correct misrecognised items (-)
• The user cannot take the initiative to ask questions or introduce new topics (-)

System-directed dialogue: example

S: How can I help you? For account balances and activities, say account information. To transfer funds between accounts, say ...
U: Account information
S: For which account would you like information? Please say: savings, checking, or reserve account.
U: Savings
S: Would you like your savings account balance, last deposit, last withdrawal, ...
U: Balance
S: Your savings account balance is seven thousand two hundred fifty dollars

User initiative

The user asks one or more questions that the system interprets and answers.

• The system needs to have comprehensive speech and language processing capabilities in order to process and interpret a potentially wide range of input.
• The user needs to be aware of the words and phrases that the system can interpret.

S: How can I help you?
U: What's the balance in my checking account?
S: Your checking account balance is one thousand five hundred dollars. How can I help you?
U: What's the balance in my savings account?
S: Your savings account balance is seven thousand two hundred fifty dollars

Mixed initiative

Either participant can take the initiative to ask questions, initiate topics, and request clarifications.

• permits more complex interaction between user, system, and underlying application
• requires comprehensive natural language processing and complex dialogue management
• focus on modelling dialogue as collaboration between agents to solve a task
• different approaches depending on what is modelled
    • Planning and plan recognition
    • Beliefs, intentions and goals
    • Rational agency

Dialogue Control Methods

• Finite state based (state transition networks, directed graphs)
• Frame-based control
• Agent-based approaches

State-based dialogue control

• The nodes represent the system’s questions
• Transitions between the nodes represent all the possible paths through the network.
• The graph specifies all legal dialogues
• Each state represents a stage in the dialogue in which some information is elicited from or confirmed with the user, or some action is performed by the system.

System: What is your destination?
User: London
System: Was that London?
User: Yes
System: What day do you want to travel?
User: Friday
System: Was that Sunday?
User: No
System: What day do you want to travel?
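
The network behind the dialogue above can be sketched as a small state machine. This is a minimal illustration, not the implementation of any particular system: the state names, the `next_state` function and the `run` driver are hypothetical, and the replies stand in for whatever the recogniser returned.

```python
# Hypothetical finite-state dialogue: each state has one prompt, and the
# reply to that prompt determines the next state.
STATES = {
    "ask_destination": "What is your destination?",
    "confirm_destination": "Was that {destination}?",
    "ask_day": "What day do you want to travel?",
    "confirm_day": "Was that {day}?",
    "done": "Thank you.",
}


def next_state(state, reply, slots):
    """Return the next state, storing elicited values in slots."""
    if state == "ask_destination":
        slots["destination"] = reply
        return "confirm_destination"
    if state == "confirm_destination":
        return "ask_day" if reply.lower() == "yes" else "ask_destination"
    if state == "ask_day":
        slots["day"] = reply
        return "confirm_day"
    if state == "confirm_day":
        return "done" if reply.lower() == "yes" else "ask_day"
    return "done"


def run(replies):
    """Drive the network with a scripted list of recognised replies."""
    state, slots, prompts = "ask_destination", {}, []
    for reply in replies:
        prompts.append(STATES[state].format(**slots))
        state = next_state(state, reply, slots)
    prompts.append(STATES[state].format(**slots))
    return state, slots, prompts
```

Feeding it `["London", "yes", "Sunday", "no", "Friday", "yes"]` reproduces the misrecognised day followed by a return to the day question. Every legal path has to be written into `next_state` in advance, which is the source of both the predictability and the inflexibility of this approach.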

Advantages of State-based Dialogue control

• suitable for system-directed dialogues
• suitable for well-structured tasks with pre-determined sequence of questions
• dialogue can be modelled graphically
• can include sub-dialogues for sub-tasks e.g. getting a date
• used widely in commercial applications
• some empirical evidence that users prefer a predictable control flow

Disadvantages of State-based Dialogue control

• Inflexible
    • problem with dialogues that deviate from the predetermined path
    • difficult for user to make corrections
    • difficult for user to introduce information not predicted at design time
• not suitable for more complex tasks such as negotiation, since the course of the dialogue has to be specified in advance
    • e.g. planning a journey may require discussion of constraints that may be unknown at the outset
• planning for multiple paths and situations leads to combinatorial explosion of states and transitions

Frame-Based Control

• declarative approach
• uses a template (or frame) containing slots to be filled during the dialogue
    • destination: London
    • date: unknown
    • time of departure: 9
• system decides the next question to be asked based on what information has been elicited and what remains to be elicited
• provides for a more flexible dialogue
• requires more elaborate dialogue control algorithm
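
A minimal sketch of how the next question might be chosen from the frame, assuming a fixed slot order and one prompt per slot. The slot names, the prompt wording and the `next_prompt` function are illustrative and not taken from any particular system.

```python
# Hypothetical prompts, one per slot, in the order the system prefers to ask.
PROMPTS = {
    "destination": "Where are you travelling to?",
    "date": "What day do you want to travel?",
    "departure_time": "What time do you want to leave?",
}


def next_prompt(frame):
    """Return the question for the first unfilled slot, or None if complete."""
    for slot, prompt in PROMPTS.items():
        if frame.get(slot) is None:
            return prompt
    return None


# The frame from the slide: destination and time known, date still unknown.
frame = {"destination": "London", "date": None, "departure_time": "9"}
```

With this frame `next_prompt(frame)` asks for the day of travel, and once `frame["date"]` is set it returns `None`. The control logic lives in the data rather than in a hand-drawn graph, so slots can be filled in any order.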

Characteristics of Frame-based Dialogue Control

• user can provide more information than was asked for in the system prompt, e.g.

System: Where are you travelling to?
User: London on Friday
(System does not ask: When do you want to go to London?)

• requires more extended natural language grammar as user’s answer could include various permutations of the required information, e.g.

Destination
Destination + Date
Destination + Time
Destination + Date + Time
Destination + Time + Date
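
One way to cope with these permutations is concept spotting, where the system looks for known values anywhere in the utterance instead of parsing a fixed word order. The sketch below assumes toy vocabularies and a simple "at N" time pattern; the names `CITIES`, `DAYS`, `TIME_PATTERN` and `spot_concepts` are hypothetical.

```python
import re

# Toy vocabularies, assumed for illustration only.
CITIES = {"london", "belfast", "luton"}
DAYS = {"monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"}
TIME_PATTERN = re.compile(r"\bat (\d{1,2})\b")


def spot_concepts(utterance):
    """Extract destination, date and time values regardless of their order."""
    text = utterance.lower()
    found = {}
    for word in re.findall(r"[a-z]+", text):
        if word in CITIES and "destination" not in found:
            found["destination"] = word.capitalize()
        elif word in DAYS and "date" not in found:
            found["date"] = word.capitalize()
    match = TIME_PATTERN.search(text)
    if match:
        found["time"] = match.group(1)
    return found
```

"London on Friday" fills two slots at once, and "On Friday at 9 to London" yields the same three values as "London at 9 on Friday".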

Problems: Complex Tasks

• users with wide range of different levels of knowledge would require wide range of system responses
• the state of the world may change dynamically during the course of the dialogue - not possible to specify all possible configurations in advance
• dialogues involving negotiation of some task to be achieved, planning and other types of collaborative interaction

Agent-based Control

• draws on methods from Artificial Intelligence
• focus on modelling dialogue as collaboration between agents to solve a task
• permits more complex interaction between user, system, and underlying application
• different approaches depending on what is modelled
    • Planning and plan recognition
    • Beliefs, intentions and goals
    • Rational agency
• mixed initiative dialogue
• requires sophisticated natural language processing

Example of an agent-based system

User: I’m looking for a job in the Calais area. Are there any servers?
System: No, there aren’t any employment servers for Calais. However, there is an employment server for Pas-de-Calais and an employment server for Lille. Are you interested in one of these?

• system attempts to provide a more co-operative response that might address the user’s needs.

Grounding

Some potential problems for a dialogue manager:

1. The speech recogniser may have detected silence even though the user had spoken. (No words returned – noinput event).
2. Only a part of the user’s utterance has been recognised and returned.
    • beginning of the user’s input may have been cut off
    • end of the user’s input could have been lost because the engine stopped listening too early
3. All of the input has been captured but some or all of the words were incorrectly recognised.
4. Even though all the words were correctly recognised, the language understanding component was either unable to assign the correct meaning or there were a number of possible meanings due to ambiguity.

Clarification sub-dialogues

1. Simple approach – ask user to repeat
    • Does not address the problem (no input, incomplete, etc)
    • Relies on user to know how best to repeat or reformulate
2. More complex approach – attempt to detect and address the problem
    • If silence detected, use specific prompt for silence
    • If unable to assign meaning to input, use specific prompt addressing problem with understanding
    • Built-in event handlers in VoiceXML
    • More sophisticated methods in research systems
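
The more complex approach can be sketched as event classification followed by a prompt chosen for that event. The event names mirror the noinput/nomatch distinction used by VoiceXML handlers; the prompt texts, the escalation on repeated failures and the function names are assumptions made for this illustration.

```python
# Hypothetical prompts: one list per event, escalating with each failed attempt.
PROMPTS = {
    "noinput": [
        "Sorry, I didn't hear anything. Where are you travelling to?",
        "I still can't hear you. Please say the name of your destination.",
    ],
    "nomatch": [
        "Sorry, I didn't understand. Where are you travelling to?",
        "Please say just the city name, for example London.",
    ],
}


def classify(recognised_text, meaning):
    """Map a recognition result to 'noinput', 'nomatch' or 'ok'."""
    if not recognised_text:
        return "noinput"   # silence detected, no words returned
    if meaning is None:
        return "nomatch"   # words returned but no meaning assigned
    return "ok"


def clarification_prompt(event, attempt):
    """Pick the prompt for this event; reuse the last one when attempts run out."""
    prompts = PROMPTS[event]
    return prompts[min(attempt, len(prompts) - 1)]
```

Unlike a generic "please repeat", each prompt tells the user what went wrong and, on later attempts, how to reformulate.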

Verification (Confirmation)

• Current speech recognition technology cannot guarantee that the system heard exactly what the user said
• The system should confirm what the user wants, especially if the next action could result in unrecoverable consequences
• There are two types of verification
    • explicit verification
    • implicit verification
• It is also important to decide
    • if confirmation should be made as each value is acquired
    • if more than one value can be confirmed at a time

Explicit Verification

• Explicit verification plays back the user’s response to check that it has been understood correctly
• useful when the next action could cause effects that cannot be easily undone
• problem that too many unnecessary confirmations can make the interface too verbose and lengthen the interaction
• problem with confirming several values at a later stage

System: Where are you travelling to?
User: London.
System: What day?
User: Friday.
System: So you want to travel to London on Sunday?

Implicit Verification

• Incorporates the user’s response in the next prompt
• Saves time by not asking the user for explicit confirmation
• problem of how to handle the wide range of possible corrections if the system has misunderstood

User: I want to fly from Belfast to London.
System: At what time do you want to fly from Belfast to London?
User: Seven in the evening.

Possible user corrections if the system has misunderstood Luton as London:
No, not London, Luton. | Luton, not London. | I said Luton.
Belfast to Luton. | Belfast Luton at seven in the evening …

Mixed strategy for verification

Shift to a more explicit method if problems arise.

User: I want to fly from Belfast to Luton.
System: At what time do you want to fly from Belfast to London? [Implicit]
User: No, not London, Luton.
System: Do you want to fly to London? Please answer yes or no. [Explicit – leading prompt]
User: No.
System: Please spell your destination. [Explicit – ask for spelling]
User: L-U-T-O-N.
System: So you want to fly to Luton? [Explicit]
User: Yes.
System: At what time do you want to fly from Belfast to Luton? [Implicit]
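
The escalation in this dialogue can be sketched as a policy that picks a verification method from the number of failed confirmations so far. The thresholds, strategy names and prompt wording are illustrative assumptions; a deployed system might also take recognition confidence into account.

```python
def verification_strategy(failed_confirmations):
    """Move from implicit to increasingly explicit methods as problems arise."""
    if failed_confirmations == 0:
        return "implicit"
    if failed_confirmations == 1:
        return "explicit_yes_no"
    return "explicit_spelling"


def verification_prompt(strategy, destination):
    """Build the system prompt for the chosen strategy (hypothetical wording)."""
    if strategy == "implicit":
        # The value is embedded in the next question, not queried directly.
        return f"At what time do you want to fly to {destination}?"
    if strategy == "explicit_yes_no":
        return f"Do you want to fly to {destination}? Please answer yes or no."
    return "Please spell your destination."
```

After the spelled value has been confirmed, resetting the failure count to zero returns the dialogue to implicit verification.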

Accessing external knowledge

Problems arise when there are discrepancies between the information that the user requests and what is available in the external knowledge source.

• The vocabulary of the dialogue does not map directly on to the vocabulary of the application.
• The data that is retrieved is ambiguous or indeterminate.

Problems with vocabulary

• Misspellings, different spellings of items such as names, abbreviations, or different ways of referring to the same item.
• May be handled in an “ad hoc” way by providing alternative representations of the items.
• More general approach
    • Enhance the Dialogue Manager with an Information Manager that deals with complex information processing involving the application knowledge source
    • Could involve the use of an ontology to model vocabulary and concepts that are related

Ambiguous and indeterminate data

• Handling under-specified or ambiguous values
• Examples from Philips Train Timetable system
    • disambiguate train stations with the same name (such as “Frankfurt am Main” and “Frankfurt an der Oder”, which might both be referred to in a dialogue using the shorter name “Frankfurt”).
    • combining values, for example, if a user calls in the afternoon with the utterance “today at 8”
        • the two values can be combined into the single value 20.00 hours given that the value 08.00 hours is no longer valid.
• Mechanisms such as these are generally developed in a fairly “ad hoc” way to handle ambiguity and indeterminacy that may arise in a particular domain.
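
The “today at 8” case can be sketched as follows, assuming the spoken hour is on a 12-hour clock and that only times later than the moment of the call are valid. The function name `resolve_hour` is hypothetical, and the Philips system's actual mechanism is not described in the source.

```python
from datetime import datetime, time


def resolve_hour(spoken_hour, now):
    """Resolve a 12-hour-clock hour to a time later today, if one remains."""
    # "8" may mean 08:00 or 20:00; try the morning reading first.
    candidates = [spoken_hour % 12, spoken_hour % 12 + 12]
    for hour in candidates:
        if time(hour, 0) > now.time():
            return time(hour, 0)
    return None  # neither reading is still in the future today
```

For a call at 14:30, `resolve_hour(8, ...)` discards 08:00 and returns 20:00. For a call at 07:00 it returns 08:00, which is the first reading still valid but not necessarily the one the user meant, so a confirmation would still be advisable.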

Relaxing parameters to resolve a query

• Example - a query concerning a flight to London at 8 might be unsuccessful, although there may be flights to London just before or just after this time.
• Approach - relax some of the parameters of the query until a suitable result can be found in the database.
• It may not be clear which item should be relaxed.

Is there a train from Birmingham to London arriving around 10 in the morning?

• Relaxing the time parameter might return trains arriving at 9 and 11.
• However, relaxing the transport parameter might return a flight or a bus that arrives around 10.
• In other cases the user might even be happy with a change in the departure or destination cities, as would be the case with alternative airports in the same city.
• Making judgements about which parameters to relax requires detailed analysis of the domain - there may not be any general solutions to this problem.
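
A minimal sketch of relaxing the time parameter only, using an invented in-memory timetable. The data, the `find_trains` function and the choice to widen the window one hour at a time are all assumptions; deciding which parameter to relax is the domain-specific judgement described above and is not addressed here.

```python
# Hypothetical timetable rows: (origin, destination, arrival hour).
TIMETABLE = [
    ("Birmingham", "London", 9),
    ("Birmingham", "London", 11),
    ("Birmingham", "Manchester", 10),
]


def find_trains(origin, destination, arrival_hour, max_relaxation=2):
    """Widen the arrival-time window hour by hour until something matches."""
    for window in range(max_relaxation + 1):
        matches = [
            row for row in TIMETABLE
            if row[0] == origin
            and row[1] == destination
            and abs(row[2] - arrival_hour) <= window
        ]
        if matches:
            return window, matches
    return None, []
```

A query for Birmingham to London arriving at 10 fails with an exact match and succeeds with a window of one hour, returning the 9 and 11 o'clock arrivals. Returning the window lets the system tell the user that the answer is approximate.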

Feature / Dialogue Control Strategy | State-based | Frame-based | Agent-based
Input | Single words or phrases | Natural language with concept spotting | Unrestricted natural language
Verification | Explicit confirmation – either of each input or at end of transaction | Explicit and implicit confirmation | Grounding
Dialogue model | Information state represented implicitly in dialogue states; dialogue control represented explicitly with state diagram | Explicit representation of information states; dialogue control represented with control algorithm | Dialogue history; context; model of system’s intentions, goals, beliefs
User model | Simple model of user characteristics or preferences | Simple model of user characteristics or preferences | Model of user’s intentions, goals, beliefs

Dialogue Engineering

Overview

• The dialogue engineering process
    • Requirements analysis, functional specification, design, implementation, testing, evaluation
• Speech interface issues
    • Methods used for developing speech interfaces
    • Spoken language requirements
    • Wizard of Oz simulations
    • Design issues for speech interfaces
    • Testing speech interfaces
    • Evaluating speech interfaces

Requirements analysis: Use case analysis

• Whether the system is to replace or complement an existing system
• Whether speech is appropriate
• The type of service to be provided by the system
• The types of user who will make use of the system
• The general deployment environment for the system

Voice Interface Considerations

• It is important to identify the range of user speech behaviours so as to constrain the user interface as much as possible
    • In a speech user interface human behaviour is less predictable than with other technologies because humans may assume speech with a machine can be the same as natural speech with other humans
• With a speech application, the machine recognition is not going to be 100% accurate and the recognition will not be consistent across user populations and environments (e.g., background noise levels).
• Speech is sequential – it is not possible to present more than one piece of information simultaneously. This may slow down the interaction, as users must carefully listen to various lists, dialogue flow cues, and help prompts before they can proceed
• Presenting users with too much information taxes short-term memory. Listening to long lists of choices is unreasonable, and purely hierarchical, menu driven applications are exhausting.
• Without visual cues and a well-established mental model for VUIs, users have fewer ways to understand what choices are available to them. Without careful attention to design, these limitations can severely diminish system flexibility and user control.

Service Environment (Usage) Issues

• In what type of environment will the users use the system (quiet office, outdoors, noisy shopping mall)?
• What type of phone connection will most of the users have (land-line, cordless, cellular)?
• How many spoken interactions with the system are there?
    • The more times a user must interact with the application, the greater the chance that the user or the recognition engine will make an error.
    • Error recovery is the toughest part of good user interface design.
    • Users do not like it when they are not understood.
    • When there are more interactions, the application takes longer to navigate, the task takes longer to complete, and the risk of errors increases.

Spoken Language Requirements

• Description of the vocabulary, grammar and interaction patterns that are likely to be deployed in the system
• Helps to determine the technologies that are to be used - for example, isolated versus continuous speech recognition, keyword spotting versus natural language understanding, and directed versus mixed-initiative dialogue
• Analysis of human-human dialogues
• Simulations – the Wizard of Oz method (WOZ)

Low-Level Design

• use of barge-in
• prompts
• grammars: speech, DTMF, combination
• interaction style: directed dialogue or mixed initiative
• navigation
• system help
• consistency
• confirmation
• error handling

Unit testing

Testing the recognition of the user’s input

Prompt: Please enter your username

Input      Expected result   Actual result   Pass/fail
Liz        Liz               Liz             Pass
Margaret   Margaret          Margaret        Pass
Mike       Mike              Margaret        Fail

Testing the execution of paths in the dialogue

Prompt: Please enter your username

Input      Expected result           Actual result             Pass/fail
Liz        Continue to next prompt   Continue to next prompt   Pass
Margaret   Continue to next prompt   Continue to next prompt   Pass
William    System help               System help               Pass
Sarah      System help               Continue to next prompt   Fail

Integration Testing

Test 1 (Pass)

System prompt | User input | Expected result | Actual result
Say ‘view’ or ‘add’ | View | Continue to next prompt | Continue to next prompt
Enter the student id | 96050918 | System retrieves student details: firstname: john, lastname: scott, coursecode: dk003, stage: 1 | System retrieves student details: firstname: john, lastname: scott, coursecode: dk003, stage: 1

Test 2 (Fail)

System prompt | User input | Expected result | Actual result
Say ‘view’ or ‘add’ | View | Continue to next prompt | Continue to next prompt
Enter the student id | 96069783 | System retrieves student details: firstname: david, lastname: wilson, coursecode: dk005, stage: award | System retrieves student details: firstname: john, lastname: scott, coursecode: dk003, stage: 1

Evaluation: Qualitative

Analysis of user acceptance

Rating scale: 1 2 3 4 5

It was easy to complete a task using the system.
It was easy to navigate around the system.
The system understood what you said.
The system’s speech was easy to understand.
The system responded in a timely manner.
The system responded in ways that you would expect.
The system was able to cope with errors.
You would prefer to use this system rather than a Web-based system.

Evaluation: System performance

• Individual components
    • word accuracy
    • sentence accuracy (SA)
        • percentage of utterances completely and correctly recognised, matched exactly with words in reference answer
    • concept accuracy (CA)
        • percentage of concepts that have been correctly understood
          e.g. will it rain tomorrow in Boston
          TOPIC: rain; DATE: tomorrow; CITY: boston
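
Under the definitions above, both measures reduce to simple percentages. The sketch below takes a simplified reading in which a concept counts as correct only when its value equals the reference value, and it ignores inserted concepts. The function names are hypothetical.

```python
def sentence_accuracy(references, hypotheses):
    """Percentage of utterances whose words match the reference exactly."""
    correct = sum(
        ref.split() == hyp.split() for ref, hyp in zip(references, hypotheses)
    )
    return 100.0 * correct / len(references)


def concept_accuracy(reference, hypothesis):
    """Percentage of reference concepts whose value was understood correctly."""
    correct = sum(
        hypothesis.get(key) == value for key, value in reference.items()
    )
    return 100.0 * correct / len(reference)
```

If "boston" is misrecognised as "austin" in the example, sentence accuracy for that utterance is zero, yet two of the three concepts (TOPIC and DATE) are still correct. This is why concept accuracy is often the more informative measure for a dialogue system.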

Dialogue Metrics

• transaction success
    • how successful the system has been in providing the user with the requested information
    • S (succeed), SC (succeed with constraint relaxation), SN (succeed with no answer), F (fail).
• number of turns / transaction time
• correction rate
    • proportion of turns in a dialogue that are concerned with correcting either the system’s or the user’s utterances, which may have been the result of speech recognition errors, errors in language understanding, or misconceptions
• contextual appropriateness
• dialogue strategy

PARADISE (PARAdigm for Dialogue System Evaluation)

• maximising user satisfaction
    • maximising task success
    • minimising costs
        • efficiency measures
        • qualitative measures
• attribute value matrix (transaction success)
    • represents information to be exchanged between the system and the user in terms of a set of ordered pairs of attributes and their possible values
• confusion matrix: shows correct and incorrect values

Confusion Matrix

Username

Data       Liz   Margaret   Mike   Guest
Liz         25          1      0       5
Margaret     1         29      0       1
Mike         1          0     18       0
Guest        3          0      2      14
Total       30         30     20      20

Kappa coefficient:

$$\kappa = \frac{P(A) - P(E)}{1 - P(E)}$$
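
As a worked example, the Kappa coefficient can be computed for the confusion matrix above. The slide does not define P(A) and P(E), so this sketch assumes that P(A) is the proportion of counts on the diagonal and that P(E) is the sum of the squared column proportions, which is one common way of estimating chance agreement. The function name `kappa` is hypothetical.

```python
# Confusion matrix from the slide; rows are the data and columns the
# usernames, so the column sums match the Total row.
CONFUSION = [
    [25, 1, 0, 5],    # Liz
    [1, 29, 0, 1],    # Margaret
    [1, 0, 18, 0],    # Mike
    [3, 0, 2, 14],    # Guest
]


def kappa(matrix):
    """Kappa = (P(A) - P(E)) / (1 - P(E)) under the assumptions stated above."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # P(A): proportion of cases on the diagonal (actual agreement).
    p_agree = sum(matrix[i][i] for i in range(size)) / total
    # P(E): chance agreement estimated from the column totals (assumed).
    column_totals = [sum(row[j] for row in matrix) for j in range(size)]
    p_chance = sum((t / total) ** 2 for t in column_totals)
    return (p_agree - p_chance) / (1 - p_chance)
```

Here P(A) = 86/100 = 0.86 and P(E) = 0.3² + 0.3² + 0.2² + 0.2² = 0.26, giving a Kappa of 0.60/0.74, or about 0.81.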


