A Question of Questions: Prosodic Cues to Question Form and Function
Julia Hirschberg (joint work with Jennifer Venditti and Jackson Liscombe)

Questioning in Dialogue
• A fundamental activity in conversation
  • Elicit information
  • Elicit action
• But
  • How to define a question?
    • Bolinger '57: "fundamentally an attitude…an utterance that 'craves' a verbal or other semiotic … response"
    • Ginzburg & Sag '00: "the semantic object associated with the attitude of wondering and the speech act of questioning"
  • How to identify a question as such?
  • How to represent its semantics? The intention of the questioner?

Distinguishing Question Form and Function
• Questions may take many syntactic forms
  • Is it a question? What is a question? It's a question, isn't it? Is it a question or an answer? Right? It's a question?
• Questions may serve many pragmatic functions
  • Clarification-seeking? Information-seeking? Confirmation-seeking?
• Possible indicators
  • Syntactic cues
  • Context
  • Intonation

Questions in Spoken Dialogue Systems
• Goals
  • Examine question form and function
    • How are they related?
    • What features characterize them?
  • Identify form and function automatically in an Intelligent Tutoring domain

Previous Studies
• Integrating a prosodic tree model with a word-based language model yields the best accuracy in detecting questions/question form (Shriberg et al. '98: English)
• Some corpus-based (MapTask) studies have examined tune/accent types with respect to question function (Kowtko '96: Glaswegian English; Grice et al. '95: German, Italian, Bulgarian)
• Studies of different types (functions) of clarification questions (Rodríguez & Schlangen '04: German; Edlund et al. '05: Swedish)
• Our goal: a comprehensive quantitative analysis of question form and function in English that will permit question form/function identification

Domain: Intelligent Tutoring Systems
• ITSs must be able to recognize both the form and function of student questions
  • Students ask human tutors many questions
  • More questions → better learning
• Different question FORMs seek different information
  • e.g. polar questions seek a yes-no answer
  • wh-questions seek different information
• Different question FUNCTIONs also often require different types of answers
• Wh-questions, e.g.
  • Information-seeking (S has just submitted an essay to the tutor):
    S: Ok, what do you think about that?
    T: Uh, well that uh you have uh there are too many parameters here which uh need definition ...
  • Clarification-seeking:
    T: So if there is if the only force on an object in earth's gravity then what is its motion called?
    S: What was the motion called?
    T: Yes, what's the name for this motion?
• Yes-no questions, e.g.
  • Information-seeking → tutor provides additional information
  • Clarification-seeking → clarification subdialogue
• Successful ITSs must be able to recognize the presence of a question in a student turn, and its form and function

Question Corpus
• Human-human tutoring dialogs collected by Litman et al. '04 for development of ITSpoke, a speech-enabled ITS designed to teach physics
  • Why2-Atlas (Kurt VanLehn (U. Pitt), Art Graesser (U. Memphis))
• Corpus includes 1030 student questions
  • 'Question' defined a la Bolinger '57 as "an utterance that craves a response"
  • 25.2 Qs/hour
  • 13.3% of total student speaking time
• This study: a subset of 643 tokens

Question Detection [pr01_sess00_prob58]
• what symbol are you talking about
• do i have to rewrite this again
• am i ok with that
• so it'd be one meter per second squared

Coding question type
• Form coding based on surface syntax
  • Declarative question (dQ): It's a vector? A vector?
  • Yes-no question (ynQ): Is it a vector?
  • Wh-question (whQ): What is a vector?
  • Tag question (ynTAG): It's a vector, isn't it?
  • Alternative question (altQ): Is it a vector or a scalar?
  • Particle (part): Huh?
• Function coding derived from Stenström '84
  • Confirmation-seeking check question (chk)
  • Clarification-seeking question (clar)
  • Information-seeking question (info)
  • Other (oth)

Form/Function Distribution
         chk      clar     info     oth      N      (%)
dQ       257      81       2        4        344    (53.5)
ynQ      53       80       27       5        165    (25.7)
whQ      -        47       21       -        68     (10.6)
ynTAG    41       5        -        -        46     (7.2)
altQ     6        5        1        -        12     (1.9)
part     -        8        -        -        8      (1.2)
N        357      226      51       9        643
(%)      (55.5)   (35.1)   (7.9)    (1.4)    (100)

Falling (L-L%) F0 contours
(Percentages give the share of each form or function category, as counted in the Form/Function Distribution, that has a falling contour.)
         chk      clar     info     oth      N      (%)
dQ       3        4        -        -        7      (2.0)
ynQ      -        4        5        2        11     (6.7)
whQ      -        12       17       -        29     (42.6)
ynTAG    1        1        -        -        2      (4.3)
altQ     2        5        1        -        8      (66.7)
part     -        -        -        -        -
N        6        26       23       2        57
(%)      (1.7)    (11.5)   (45.1)   (22.2)   (8.9)

F0 measures of non-falling questions
• Quantitative analysis of F0 height in the 573 non-falling tokens with sufficient data for analysis
• Examined question nucleus (nucF0) and tail (btF0) only
• Speaker-normalized (z-score) F0 of:
  1. nuclear accent (nucF0)
  2. rightmost edge of question (btF0)
  3. difference between 1 & 2 (riserange)

Question Form and F0
• DeclQs and YNQs are both thought to rise (H* H-H% vs. L* H-H%?): are there F0 height differences between them?
• 2-way ANOVA on form x function, FORM:
  • nucF0: F(5)=19.34, p<.001
  • btF0: F(5)=10.71, p<.001
  • riserange: F(5)=3.6, p<.01
• Planned comparisons (Tukey, alpha=.01) show no difference between declarative Qs and yes-no Qs
• Main effect of form caused by yes-no tags (low F0) and particles (high F0)

Normalized means at nucF0 and btF0
[Figure not recoverable from the extracted text. Labels that survive: panels "nuclear accent" and "boundary"; y-axis "normalized F0"; categories chk, clar, info; legend ynQ, dQ, ynTAG, whQ, part.]

Question Function and F0
• Question dialog acts are thought to correlate with F0: does question FUNCTION affect F0?
• 2-way ANOVA on form x function, FUNCTION:
  • nucF0: F(3)=16.6, p<.001
  • btF0: F(3)=8.56, p<.001
  • riserange: F(3)=3.94, p<.01
• Main effect; planned comparisons show:
  • clarQ > chkQ (nucF0 & btF0)
  • infoQ > clarQ/chkQ (nucF0)
• No interactions for any measure

Clarification types and F0
Clark '96 levels of coordination: sources of communication problems
1 Channel: Problem hearing whether the tutor actually said something or not ("Huh?", "Hm?")
2 Perception: Problem hearing what the tutor said ("'G' as in God?", "Did you say a word or a letter?"), including reprise/echo questions ("A what?")
3 Understanding: Problem with reference resolution ("This up here?", "What did I imply or what does the statement imply?"), or with general understanding ("Is that the same thing or is that different?", "What do you mean?")
4 Intention: Problem determining what the tutor intended by his utterance ("You want an exact number?", "Uh are you asking me another characteristic of freefall?")
+ Non-interlocutor-related (NIR): Problem understanding the task ("Am I supposed to speak this or type it?"), or clarification of the examination question ("Should I assume both vehicles are going at the same speed?")
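
The three F0 measures used in these analyses (nucF0, btF0, riserange) are speaker-normalized z-scores. The following is a minimal sketch of that normalization, not the authors' implementation. It assumes each speaker's mean and standard deviation are estimated from a pool of that speaker's F0 values in Hz, that the population standard deviation is used, and that riserange is the boundary value minus the nuclear-accent value; the slides do not specify any of these details, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def speaker_stats(f0_by_speaker):
    """Map each speaker to (mean, sd) of that speaker's pooled F0 values in Hz.

    Assumption: population standard deviation; the slides do not say which
    estimator was used.
    """
    return {spk: (mean(vals), pstdev(vals)) for spk, vals in f0_by_speaker.items()}


def normalize(f0_hz, speaker, stats):
    """Return the z-score of one F0 value relative to its speaker."""
    mu, sd = stats[speaker]
    if sd == 0:
        raise ValueError(f"speaker {speaker!r} has no F0 variation")
    return (f0_hz - mu) / sd


def f0_measures(nuc_hz, bt_hz, speaker, stats):
    """Compute the three per-question measures named on the slides.

    Assumption: riserange = btF0 - nucF0, so a positive value is a rise
    from the nuclear accent to the right edge of the question.
    """
    nuc = normalize(nuc_hz, speaker, stats)
    bt = normalize(bt_hz, speaker, stats)
    return {"nucF0": nuc, "btF0": bt, "riserange": bt - nuc}


if __name__ == "__main__":
    # Made-up values for illustration only.
    stats = speaker_stats({"s1": [100.0, 200.0, 300.0]})
    print(f0_measures(200.0, 300.0, "s1", stats))
```

Normalizing per speaker keeps a naturally high-pitched speaker from dominating comparisons across question forms and functions, which is why the ANOVAs are run on z-scores rather than raw Hz.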
Effects of Clarification Type
• One-way ANOVA, combining levels 1 & 2 into a single acoustic/perceptual category:
  • nucF0: F(3)=5.41, p=.001
  • btF0: F(3)=6.6, p<.001
  • riserange: F(3)=2.59, p=.05
• Main effect for clarification type
• Ranking for each measure, from higher F0 to lower F0:
  acoust/percept > understanding > NIR > intention
• Planned comparisons (Tukey, alpha=.01) show the only significant comparison was acoust/percept > intention

Can Prosody Distinguish Question Form? Question Function?
• Only a few question forms are prosodically distinct in our study; lexico/syntactic information can help
• Question function is more successfully differentiated prosodically, where there is less reliable lexico/syntactic information
• Can we use prosodic information with lexico-syntactic information to help identify question form and function automatically?

Detecting Student Questions
• Syntax
  • Wh-words, subject/auxiliary inversion
• Prosody
  • Phrase-final rising intonation (Pierrehumbert & Hirschberg '90)
  • Duration and pausing (Shriberg et al. '98)
• Lexico-pragmatics
  • personal pronouns, utterance-initial pronouns (Geluykens 1987; Beun 1990)

Corpus
• 141 ITSpoke dialogues
• 5 hours of student speech
• Student turns average 2.5 seconds
• 1,030 questions
• 25 questions per hour
• 70% of turns consist entirely of the question
• 89% of questions are turn-final

Question Form Distribution in ITSpoke
Form          Example                         Distr.
yes/no        Is that right?                  24%
wh-           What do you mean?               10%
yes/no tag    It will stay the same, right?   7%
alternative   Force or something?             3%
particle      Huh?                            2%
declarative   The weight?                     54%

Question-Bearing Turns
• Contain one or more questions
• N = 918

Features Extracted
• Prosodic
  • pitch
  • loudness
  • pausing
  • speaking rate
  • calculated over the entire turn and over the last 200 ms
• Syntactic
  • unigram and bigram part-of-speech tags

Feature Extraction
• Lexical
  • word unigrams and bigrams from hand-labeled transcriptions
• Student and task dependent
  • pre-test score
  • gender
  • correctness
  • previous tutor dialogue act

Machine Learning Experiments
• Question-bearing vs. non-question-bearing turns
  • Down-sampled to a 50/50 distribution
• Experimented by feature type
• Adaboosted C4.5 decision trees
• 5-fold cross validation
• Best results with all features
  • Accuracy = 79.7%
  • Precision = Recall = F-measure = 0.8

Accuracy by Feature Type
prosody: pausing and speaking rate   52.6%
student and task dependent           56.1%
prosody: loudness                    61.8%
syntactic                            65.3%
lexical                              67.2%
prosody: last 200 ms                 70.3%
prosody: pitch                       72.6%
prosody: all                         74.5%

Feature Type Discussion
• Which features are most informative?
  • pitch slope of last 200 ms and of entire turn
  • maximum and mean pitch of turn
• Which features are most often used in learning?
  • pre-test score
  • slope of last 200 ms
  • maximum pitch of entire turn
  • cumulative pause duration

Other Observations
• Syntactic features were informative
  • personal pronoun + verb, wh-pronoun, interjection
• Lexical features were informative
  • yes, right, what, I, you

Conclusions
• Most questions in our tutoring corpus are declarative in form
• More than syntax is needed to identify these as questions
• Prosodic features are very important
• Detecting question-bearing turns is possible
• Detecting question function is needed

Question Forms in ITSpoke
Form          Distr.   Example
declarative   54%      The weight?
yes/no        24%      Is that right?
wh-           10%      What do you mean?
yes/no tag    7%       It will stay the same, right?
alternative   3%       Force or something?
particle      2%       Huh?
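
The pitch features reported as most informative (pitch slope over the last 200 ms and over the entire turn, plus maximum and mean pitch of the turn) can be illustrated with a short sketch. This is not the feature extractor used in the study. It assumes a pitch track given as time-ordered (time in seconds, F0 in Hz) frames, with unvoiced frames marked as 0 Hz, and it takes "slope" to mean the slope of a least-squares line; the slides do not say how slope was computed. All function names are hypothetical.

```python
def pitch_slope(points):
    """Least-squares slope in Hz per second of (time_s, f0_hz) points.

    Returns None when there are fewer than two points or no spread in time,
    since a line cannot be fitted in those cases.
    """
    n = len(points)
    if n < 2:
        return None
    t_mean = sum(t for t, _ in points) / n
    f_mean = sum(f for _, f in points) / n
    sxx = sum((t - t_mean) ** 2 for t, _ in points)
    if sxx == 0:
        return None
    sxy = sum((t - t_mean) * (f - f_mean) for t, f in points)
    return sxy / sxx


def turn_pitch_features(track, window_s=0.2):
    """Compute turn-level and turn-final pitch features.

    track: time-ordered list of (time_s, f0_hz) frames for one turn.
    Assumption: frames with f0_hz <= 0 are unvoiced and are skipped.
    """
    if not track:
        raise ValueError("empty pitch track")
    voiced = [(t, f) for t, f in track if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in turn")
    turn_end = track[-1][0]  # end of the turn, voiced or not
    final = [(t, f) for t, f in voiced if t >= turn_end - window_s]
    values = [f for _, f in voiced]
    return {
        "slope_turn": pitch_slope(voiced),
        "slope_final": pitch_slope(final),
        "max_pitch": max(values),
        "mean_pitch": sum(values) / len(values),
    }


if __name__ == "__main__":
    # Made-up track: flat at 150 Hz, then rising 100 Hz/s at the end.
    demo = [(0.0, 150.0), (0.25, 150.0), (0.5, 150.0), (0.75, 150.0),
            (0.85, 160.0), (0.9, 0.0), (0.95, 170.0), (1.0, 175.0)]
    print(turn_pitch_features(demo))
```

A positive final-window slope corresponds to the phrase-final rise that marks many declarative questions, which is the cue syntax alone cannot supply.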