A Question of Questions: Prosodic
Cues to Question Form and Function
Julia Hirschberg
(Joint work with Jennifer Venditti and Jackson Liscombe)
Questioning in Dialogue
• A fundamental activity in conversation
• Elicit information
• Elicit action
• But
• How to define a question?
• Bolinger ’57: “fundamentally an attitude…an utterance
that ‘craves’ a verbal or other semiotic … response”
• Ginzburg & Sag ’00: “the semantic object associated with
the attitude of wondering and the speech act of
questioning”
• How to identify a question as such?
• How to represent its semantics? The intention of the
questioner?
Distinguishing Question Form and
Function
• Questions may take many syntactic forms
• Is it a question? What is a question? It’s a question, isn’t it?
Is it a question or an answer? Right? It’s a question?
• Questions may serve many pragmatic functions
• Clarification-seeking? Information-seeking?
Confirmation-seeking?
• Possible Indicators
• Syntactic cues
• Context
• Intonation
Questions in Spoken Dialogue Systems
• Goals
• Examine question form and function
• How are they related?
• What features characterize them?
• Identify form and function automatically in an
Intelligent Tutoring domain
Previous Studies
• Integration of a prosodic tree model with a word-based
language model yields the best accuracy
in detecting questions/question form
(Shriberg et al. ’98: English)
• Some corpus-based (MapTask) studies have
examined tune/accent types with respect to question function
(Kowtko ’96: Glaswegian English; Grice et al. ’95:
German, Italian, Bulgarian)
• Studies of different types (functions) of clarification
questions (Rodríguez & Schlangen ’94: German;
Edlund et al. ’95: Swedish)
• Our goal: a comprehensive quantitative analysis of
question form and function in English which will
permit question form/function identification
Domain: Intelligent Tutoring Systems
• ITSs must be able to recognize both the form
and function of student questions
• Students ask human tutors many questions
• More questions → better learning
• Different question FORMs seek different
information
• e.g., polar questions seek a yes-no answer
• wh-questions seek different information
• Different question FUNCTIONs also often
require different types of answers
• Wh-questions, e.g.
• Information-seeking:
(S has just submitted an essay to the tutor)
S: Ok, what do you think about that?
T: Uh, well that uh you have uh there are too many
parameters here which uh need definition ...
• Clarification-seeking:
• T: So if there is if the only force on an object in
earth’s gravity then what is its motion called?
• S: What was the motion called?
• T: Yes, what’s the name for this motion?
• Yes-no questions, e.g.
• Information-seeking → tutor provides additional
information
• Clarification-seeking → clarification subdialogue
• Successful ITSs must be able to recognize
the presence of a question in a student turn
and its form and function
Question Corpus
• Human-human tutoring dialogs collected by Litman et
al. ’04 for development of ITSpoke, a speech-enabled
ITS designed to teach physics
• Why2-Atlas (Kurt VanLehn (U. Pitt), Art Graesser (U.
Memphis))
• Corpus includes 1030 student questions
• ‘Question’ defined à la Bolinger ’57 as “an utterance
that craves a response”
• 25.2 Qs/hour
• 13.3% of total student speaking time
• This study: a subset of 643 tokens
[pr01_sess00_prob58]
Question Detection
what symbol are you talking about
do i have to rewrite this again
am i ok with that
so it’d be one meter per second squared
Coding question type
• Form coding based on surface syntax
• Declarative question (dQ): It’s a vector? A vector?
• Yes-no question (ynQ): Is it a vector?
• Wh-question (whQ): What is a vector?
• Tag question (ynTAG): It’s a vector, isn’t it?
• Alternative question (altQ): Is it a vector or a scalar?
• Particle (part): Huh?
• Function coding derived from Stenström ’84
• Confirmation-seeking check question (chk)
• Clarification-seeking question (clar)
• Information-seeking question (info)
• Other (oth)
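The surface-syntax form labels above can be roughly approximated with a few string rules. The sketch below is only an illustration: the word lists, the tag pattern, and the function name `code_form` are hypothetical, and the coding reported in these slides is not described as rule-based. Prosody is ignored entirely, so anything that matches no other rule falls into dQ.

```python
import re

# Hypothetical word lists for illustration; not taken from the study.
WH_WORDS = {"what", "who", "whom", "whose", "which", "when", "where", "why", "how"}
AUXILIARIES = {
    "is", "are", "am", "was", "were", "do", "does", "did", "can", "could",
    "will", "would", "should", "shall", "may", "might", "must", "have", "has", "had",
}
PARTICLES = {"huh", "hm", "hmm", "eh"}
# Matches a final ", right" or a final ", <aux>n't <pronoun>" tag.
TAG_PATTERN = re.compile(
    r",\s*(right|(is|are|was|were|do|does|did|ca|wo|would|has|have)n't \w+)$"
)

def code_form(utterance):
    """Return a form label (dQ, ynQ, whQ, ynTAG, altQ, part) from surface cues."""
    text = utterance.lower().replace("\u2019", "'").strip().rstrip("?.! ")
    words = re.findall(r"[a-z']+", text)
    if not words:
        return "unknown"
    if len(words) == 1 and words[0] in PARTICLES:
        return "part"
    if TAG_PATTERN.search(text):
        return "ynTAG"
    if words[0] in WH_WORDS:
        return "whQ"
    if "or" in words:
        return "altQ"
    if words[0] in AUXILIARIES:
        return "ynQ"
    return "dQ"  # no syntactic marking: treated as a declarative question

for example in ["It's a vector?", "Is it a vector?", "What is a vector?",
                "It's a vector, isn't it?", "Is it a vector or a scalar?", "Huh?"]:
    print(example, "->", code_form(example))
```

Because dQ is the default, a rule set like this cannot tell a declarative question from a plain statement; that distinction needs context or intonation.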
Form/Function Distribution
         chk     clar    info    oth     N     (%)
dQ       257     81      2       4       344   (53.5)
ynQ      53      80      27      5       165   (25.7)
whQ      -       47      21      -       68    (10.6)
ynTAG    41      5       -       -       46    (7.2)
altQ     6       5       1       -       12    (1.9)
part     -       8       -       -       8     (1.2)
N        357     226     51      9       643
(%)      (55.5)  (35.1)  (7.9)   (1.4)   (100)
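The row and column totals in this table can be recomputed from the cell counts. The sketch below copies the counts from the table and treats each "-" as zero, which is an assumption; the names `COUNTS`, `marginals`, and `percent` are illustrative only.

```python
# Cell counts copied from the table; "-" cells are assumed to mean zero.
FUNCTIONS = ["chk", "clar", "info", "oth"]
COUNTS = {
    "dQ":    [257, 81, 2, 4],
    "ynQ":   [53, 80, 27, 5],
    "whQ":   [0, 47, 21, 0],
    "ynTAG": [41, 5, 0, 0],
    "altQ":  [6, 5, 1, 0],
    "part":  [0, 8, 0, 0],
}

def marginals(counts):
    """Return (grand total, totals per form, totals per function)."""
    total = sum(sum(row) for row in counts.values())
    form_totals = {form: sum(row) for form, row in counts.items()}
    function_totals = {
        fn: sum(row[i] for row in counts.values())
        for i, fn in enumerate(FUNCTIONS)
    }
    return total, form_totals, function_totals

def percent(part, whole):
    """Percentage rounded to one decimal place, as in the table."""
    return round(100 * part / whole, 1)

total, form_totals, function_totals = marginals(COUNTS)
for form, n in form_totals.items():
    print(f"{form:6s} {n:4d} ({percent(n, total)}%)")
for fn, n in function_totals.items():
    print(f"{fn:6s} {n:4d} ({percent(n, total)}%)")
```

The same counts also give conditional shares that the table does not print, for example `percent(COUNTS["dQ"][0], form_totals["dQ"])` for the proportion of declarative questions coded as checks.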
Falling (L-L%) F0 contours
         chk     clar    info    oth     N     (%)
dQ       3       4       -       -       7     (2.0)
ynQ      -       4       5       2       11    (6.7)
whQ      -       12      17      -       29    (42.6)
ynTAG    1       1       -       -       2     (4.3)
altQ     2       5       1       -       8     (66.7)
part     -       -       -       -       -
N        6       26      23      2       57
(%)      (1.7)   (11.5)  (45.1)  (22.2)  (100)
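The percentages in the falling-contour table appear to be relative to each category's total in the Form/Function Distribution table (for example, 29 of 68 wh-questions), not to the 57 falling tokens. The sketch below reproduces the per-form column under that reading; the overall falling share it prints is derived here and is not a figure stated on the slide.

```python
# Totals per form from the distribution table, and falling (L-L%) counts per form.
FORM_TOTALS = {"dQ": 344, "ynQ": 165, "whQ": 68, "ynTAG": 46, "altQ": 12, "part": 8}
FALLING = {"dQ": 7, "ynQ": 11, "whQ": 29, "ynTAG": 2, "altQ": 8, "part": 0}

def falling_rate(form):
    """Percent of questions of this form that end in a falling contour."""
    return round(100 * FALLING[form] / FORM_TOTALS[form], 1)

for form in FORM_TOTALS:
    print(f"{form:6s} {falling_rate(form)}%")

# Derived value (an inference from the two tables, not reported in the source).
overall_falling = round(100 * sum(FALLING.values()) / sum(FORM_TOTALS.values()), 1)
print("all forms", f"{overall_falling}%")
```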
F0 measures of non-falling questions
• Quantitative analysis of F0 height in the 573
non-falling tokens with sufficient data for
analysis
• Examined question nucleus (nucF0) and tail
(btF0) only
• Speaker-normalized (z-score) F0 of:
• 1. nuclear accent (nucF0)
• 2. rightmost edge of question (btF0)
• 3. difference between 1 & 2 (riserange)
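The three measures can be sketched as follows. This is a minimal illustration, assuming the z-score uses the mean and population standard deviation of all F0 samples from one speaker, and assuming riserange is computed as boundary minus nucleus so that a rise is positive. The slides specify neither detail, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def speaker_params(speaker_f0_hz):
    """Mean and standard deviation of one speaker's F0 samples (Hz)."""
    mu = mean(speaker_f0_hz)
    sigma = pstdev(speaker_f0_hz)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("F0 samples have no variance; cannot z-score")
    return mu, sigma

def question_f0_measures(nuc_hz, boundary_hz, mu, sigma):
    """Speaker-normalized nucF0, btF0, and their difference (riserange)."""
    nuc_f0 = (nuc_hz - mu) / sigma
    bt_f0 = (boundary_hz - mu) / sigma
    # Assumption: riserange = btF0 - nucF0, so rising questions are positive.
    return {"nucF0": nuc_f0, "btF0": bt_f0, "riserange": bt_f0 - nuc_f0}

# Hypothetical speaker data, for illustration only.
mu, sigma = speaker_params([190.0, 210.0])
print(question_f0_measures(nuc_hz=200.0, boundary_hz=220.0, mu=mu, sigma=sigma))
```

Normalizing per speaker lets F0 heights be pooled across speakers with different pitch ranges before running the ANOVAs.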
Question Form and F0
• DeclQs and YNQs both thought to rise (H* H-H%
vs. L* H-H%?): Are there F0 height
differences between them?
• 2-way ANOVA on form x function:
FORM: nucF0: F(5)=19.34, p=0
btF0: F(5)=10.71, p=0
riserange: F(5)=3.6, p<.01
• Planned comparisons (Tukey, alpha=.01) show no
difference between declarative Qs and yes-no Qs
• Main effect of form caused by yes-no tags (low
F0) and particles (high F0)
Normalized means at nucF0 and btF0
[Figure: speaker-normalized F0 (y-axis, about -1 to 2.5) at the nuclear accent and at the boundary, shown by question function (chk, clar, info) and by question form (ynQ, dQ, ynTAG, whQ, part)]
Question Function and F0
• Question dialog acts thought to correlate with
F0: Does question FUNCTION affect F0?
• 2-way ANOVA on form x function:
FUNCTION: nucF0: F(3)=16.6, p=0
btF0: F(3)=8.56, p<.001
riserange: F(3)=3.94, p<.01
• Main effect; planned comparisons show:
• clarQ > chkQ (nucF0 & btF0)
• infoQ > clarQ/chkQ (nucF0)
• No interactions for any measure
Clarification types and F0
Clark ’96 levels of coordination: sources of communication problems
1 Channel: Problem hearing if the tutor actually said something or not
(Huh?, Hm?)
2 Perception: Problem hearing what the tutor said (‘G’ as in God?, Did you
say a word or a letter?), including reprise/echo questions (A what?)
3 Understanding: Problem with reference resolution (This up here?, What
did I imply or what does the statement imply?), or with general
understanding (Is that the same thing or is that different?, What do you
mean?)
4 Intention: Problem determining what the tutor intended by his utterance
(You want an exact number?, Uh are you asking me another characteristic
of freefall?)
+ Non-interlocutor-related (NIR): Problem understanding the task (Am I
supposed to speak this or type it?), or clarification of the examination
question (Should I assume both vehicles are going at the same speed?)
Effects of Clarification Type
• One-way ANOVA combining levels 1&2 into
single acoustic/perceptual category:
nucF0: F(3)=5.41, p=.001
btF0: F(3)=6.6, p<.001
riserange: F(3)=2.59, p=.05
• Main effect for clarification type
• Ranking for each measure:
acoust/percept > understanding > NIR > intention
(ordered from higher F0 to lower F0)
• Planned comparisons (Tukey, alpha=.01)
show the only significant comparison was
acoust/percept > intention
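For readers unfamiliar with the test, the F statistic of a one-way ANOVA over the four clarification categories can be computed as below. This is a sketch only: the z-scored values are invented for illustration and are not data from the study, and the Tukey comparisons are not reproduced.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical normalized F0 values per clarification type (not real data).
toy_nuc_f0 = {
    "acoust/percept": [1.2, 0.9, 1.5],
    "understanding":  [0.8, 0.6, 1.0],
    "NIR":            [0.5, 0.7, 0.3],
    "intention":      [0.1, 0.4, 0.2],
}
f_stat, df_b, df_w = one_way_anova_f(list(toy_nuc_f0.values()))
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

With four categories the between-groups degrees of freedom are 3, which matches the F(3) values reported on the slide.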
Can Prosody Distinguish Question Form?
Question Function?
• Only a few question forms prosodically
distinct in our study – lexico/syntactic
information can help
• Question function more successfully
differentiated prosodically – where there is
less reliable lexico/syntactic information
• Can we use prosodic information with
lexico-syntactic information to help identify question
form and function automatically?
Detecting Student Questions
• Syntax
• Wh-words, subject/auxiliary inversion
• Prosody
• Phrase-final rising intonation (Pierrehumbert &
Hirschberg ’90)
• Duration and pausing (Shriberg et al. ’98)
• Lexico-pragmatics
• personal pronouns, utterance-initial pronouns
(Geluykens 1987; Beun 1990)
Corpus
• 141 ITSpoke dialogues
• 5 hours of student speech
• Student turns average 2.5 seconds
• 1,030 questions
• 25 questions per hour
• 70% of turns consist entirely of the question
• 89% of questions are turn-final
Question Form Distribution in ITSpoke
Form          Example                          Distr.
yes/no        Is that right?                   24%
wh-           What do you mean?                10%
yes/no tag    It will stay the same, right?    7%
alternative   Force or something?              3%
particle      Huh?                             2%
declarative   The weight?                      54%
Question-Bearing Turns
• Contain one or more questions
• N = 918
Features Extracted
• Prosodic
• pitch
• loudness
• pausing
• speaking rate
• calculated over entire turn and last 200 ms
• Syntactic
• unigram and bigram part-of-speech tags
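One of the prosodic features, pitch slope over the whole turn or over its last 200 ms, might be computed as a least-squares line through the voiced F0 samples. The sketch below is an assumption about how such a feature could be derived, not the extraction procedure used in the study. The frame format (millisecond timestamps, F0 in Hz, 0 or None for unvoiced frames) and the function name are hypothetical.

```python
def pitch_slope(times_ms, f0_hz, window_ms=None):
    """Least-squares F0 slope in Hz per second.

    If window_ms is given, only frames in the final window_ms of the turn
    are used. Unvoiced frames (F0 of 0 or None) are skipped.
    Returns None when fewer than two voiced frames remain.
    """
    start = times_ms[-1] - window_ms if window_ms is not None else times_ms[0]
    points = [
        (t, f) for t, f in zip(times_ms, f0_hz)
        if t >= start and f is not None and f > 0
    ]
    if len(points) < 2:
        return None
    mean_t = sum(t for t, _ in points) / len(points)
    mean_f = sum(f for _, f in points) / len(points)
    numerator = sum((t - mean_t) * (f - mean_f) for t, f in points)
    denominator = sum((t - mean_t) ** 2 for t, _ in points)
    return 1000.0 * numerator / denominator  # Hz/ms converted to Hz/s

# Hypothetical turn: flat pitch followed by a final rise.
times = [0, 100, 200, 300, 400, 500]
f0 = [200.0, 200.0, 200.0, 200.0, 220.0, 240.0]
print(pitch_slope(times, f0))                 # slope over the entire turn
print(pitch_slope(times, f0, window_ms=200))  # slope over the last 200 ms
```

In this toy turn the final-window slope is steeper than the whole-turn slope, which is the kind of contrast a phrase-final rise would produce.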
Feature Extraction
• Lexical
• unigrams and bigrams from hand-labeled transcriptions
• Student and task dependent
• pre-test score
• gender
• correctness
• previous tutor dialogue act
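Unigram and bigram features, whether over words or over part-of-speech tags, can be counted the same way. The sketch below assumes whitespace-tokenized input and adds sentence-boundary markers so that utterance-initial items are visible to the learner. The feature naming scheme and the boundary padding are illustrative choices, not details given in the slides.

```python
from collections import Counter

def ngram_features(tokens):
    """Count unigram and bigram features for one turn.

    tokens may be words or part-of-speech tags. "<s>" and "</s>" are
    hypothetical boundary markers added for the bigrams only.
    """
    padded = ["<s>"] + list(tokens) + ["</s>"]
    features = Counter(f"uni={tok}" for tok in tokens)
    features.update(f"bi={a}_{b}" for a, b in zip(padded, padded[1:]))
    return features

features = ngram_features("what do you mean".split())
for name, count in sorted(features.items()):
    print(name, count)
```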
Machine Learning Experiments
• Question-bearing vs. non-question-bearing
• Down-sampled to 50/50 distribution
• Experimented by feature type
• Adaboosted C4.5 decision trees
• 5-fold cross validation
• Best results with all features
• Accuracy = 79.7%
• Precision = Recall = F-measure = 0.8
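Two steps of this setup, down-sampling to a 50/50 class distribution and scoring predictions, are sketched below in plain Python. The boosted decision-tree learner and the cross-validation loop are deliberately left out, and the helper names and the fixed random seed are assumptions for illustration.

```python
import random

def downsample_balanced(examples, labels, seed=0):
    """Randomly drop majority-class items until both classes are equal in size."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y]
    negatives = [i for i, y in enumerate(labels) if not y]
    n = min(len(positives), len(negatives))
    keep = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(keep)
    return [examples[i] for i in keep], [labels[i] for i in keep]

def binary_metrics(gold, predicted):
    """Accuracy, precision, recall, and F-measure for the positive class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(gold),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Hypothetical turns: True marks a question-bearing turn.
turns = list(range(13))
is_question = [True] * 3 + [False] * 10
balanced_turns, balanced_labels = downsample_balanced(turns, is_question)
print(len(balanced_turns), sum(balanced_labels))
print(binary_metrics([True, True, False, False], [True, False, True, False]))
```

On a balanced set the chance baseline is 50%, which is the reference point for the accuracies reported by feature type.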
Accuracy by Feature Type
prosody: pausing and speaking rate 52.6%
student and task dependent 56.1%
prosody: loudness 61.8%
syntactic 65.3%
lexical 67.2%
prosody: last 200 ms 70.3%
prosody: pitch 72.6%
prosody: all 74.5%
Feature Type Discussion
• Which features most informative?
• pitch slope of last 200 ms and entire turn
• maximum and mean pitch of turn
• Which features most often used in learning?
• pre-test score
• slope of last 200 ms
• maximum pitch of entire turn
• cumulative pause duration
Other Observations
• Syntactic features were informative
• personal pronoun + verb, wh-pronoun, interjection
• Lexical features were informative
• yes, right, what, I, you
Conclusions
• Most questions in our tutoring corpus are
declarative in form
• More than syntax is needed to identify these as
questions
• Prosodic features are very important
• Detecting question-bearing turns is possible
• Detecting question function is needed
Question Forms in ITSpoke
Form          Distr.   Example
declarative   54%      The weight?
yes/no        24%      Is that right?
wh-           10%      What do you mean?
yes/no tag    7%       It will stay the same, right?
alternative   3%       Force or something?
particle      2%       Huh?