A Question of Questions: Prosodic Cues to Question Form and Function

            Julia Hirschberg
            (Joint work with Jennifer Venditti and Jackson Liscombe)
             Questioning in Dialogue
• A fundamental activity in conversation
   • Elicit information
   • Elicit action
• But
   • How to define a question?
      • Bolinger '57: “fundamentally an attitude…an utterance
        that 'craves' a verbal or other semiotic … response”
      • Ginzburg & Sag '00: “the semantic object associated with
        the attitude of wondering and the speech act of
        questioning”
   • How to identify a question as such
   • How to represent its semantics? The intention of the
     questioner?
     Distinguishing Question Form and Function
• Questions may take many syntactic forms
   • Is it a question? What is a question? It's a question, isn't it?
     Is it a question or an answer? Right? It's a question?
• Questions may serve many pragmatic functions
   • Clarification-seeking? Information-seeking? Confirmation-
     seeking?
• Possible Indicators
   • Syntactic cues
   • Context
   • Intonation
 Questions in Spoken Dialogue Systems

• Goals
  • Examine question form and function
      • How are they related?
      • What features characterize them?
  • Identify form and function automatically in an
    Intelligent Tutoring domain
                 Previous Studies
• Integrating a prosodic tree model with a word-based
  language model yields the best accuracy in detecting
  questions/question form
  (Shriberg et al. '98: English)
• Some corpus-based (MapTask) studies have
  examined tune/accent types with respect to question function
  (Kowtko '96: Glaswegian English; Grice et al. '95:
  German, Italian, Bulgarian)
• Studies of different types (functions) of clarification
  questions (Rodríguez & Schlangen '04: German;
  Edlund et al. '05: Swedish)
• Our goal: a comprehensive quantitative analysis of
  question form and function in English which will
  permit question form/function identification
  Domain: Intelligent Tutoring Systems

• ITSs must be able to recognize both the form
  and function of student questions
  • Students ask human tutors many questions
  • More questions → better learning
• Different question FORMs seek different
  information
  • e.g., polar questions seek a yes-no answer
  • wh-questions seek different information
• Different question FUNCTIONs also often
  require different types of answers
• Wh-questions, e.g.
  • Information-seeking:
    (S has just submitted an essay to the tutor)
    S: Ok, what do you think about that?
    T: Uh, well that uh you have uh there are too many
    parameters here which uh need definition ...
  • Clarification-seeking:
      • T: So if there is if the only force on an object in
        earth's gravity then what is its motion called?
      • S: What was the motion called?
      • T: Yes, what's the name for this motion?
•   Yes-no questions, e.g.
    •   Information-seeking → tutor provides additional
        information
    •   Clarification → clarification subdialogue
•   Successful ITSs must be able to recognize
    the presence of a question in a student turn
    and its form and function
                  Question Corpus

• Human-human tutoring dialogs collected by Litman et
  al. '04 for development of ITSpoke, a speech-enabled
  ITS designed to teach physics
   • Why2-Atlas (Kurt VanLehn (U. Pitt), Art Graesser (U.
     Memphis))
• Corpus includes 1030 student questions
• 'Question' defined à la Bolinger '57 as “an utterance
  that craves a response”
   • 25.2 Qs/hour
   • 13.3% of total student speaking time
• This study: a subset of 643 tokens
[pr01_sess00_prob58]
           Question Detection

what symbol are you talking about

do i have to rewrite this again

am i ok with that

so it'd be one meter per second squared
                Coding question type
• Form coding based on surface syntax
  •   Declarative question (dQ): It's a vector? A vector?
  •   Yes-no question (ynQ): Is it a vector?
  •   Wh-question (whQ): What is a vector?
  •   Tag question (ynTAG): It's a vector, isn't it?
  •   Alternative question (altQ): Is it a vector or a scalar?
  •   Particle (part): Huh?
• Function coding derived from Stenström '84
  •   Confirmation-seeking check question (chk)
  •   Clarification-seeking question (clar)
  •   Information-seeking question (info)
  •   Other (oth)
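
The form categories above are defined by surface syntax, so a rough rule-based guesser can be sketched. This is only an illustration and not the coding procedure used in the study. The word lists, the tag pattern, and the function name `guess_question_form` are all assumptions, and real transcripts would need a proper parser.

```python
import re

# Hypothetical word lists; they are illustrative and far from complete.
WH_WORDS = {"what", "who", "whom", "whose", "which", "when", "where", "why", "how"}
AUXILIARIES = {"is", "are", "am", "was", "were", "do", "does", "did", "can",
               "could", "will", "would", "should", "may", "might", "must",
               "have", "has", "had"}
PARTICLES = {"huh", "hm", "hmm", "eh"}

# A tag is assumed to be a comma followed by "right"/"ok" or by
# auxiliary (+ negation) + pronoun at the very end of the utterance.
TAG_PATTERN = re.compile(
    r",\s*(?:right|ok|okay|"
    r"(?:is|are|was|were|do|does|did|can|will|would|has|have)"
    r"(?:n't| not)? (?:it|he|she|they|you|we|i))$"
)


def guess_question_form(utterance):
    """Return one of: part, ynTAG, whQ, altQ, ynQ, dQ (surface heuristics only)."""
    text = utterance.lower().replace("\u2019", "'").strip().rstrip("?.! ")
    words = re.findall(r"[a-z']+", text)
    if not words:
        return "dQ"
    if len(words) == 1 and words[0] in PARTICLES:
        return "part"
    if TAG_PATTERN.search(text):
        return "ynTAG"
    if words[0] in WH_WORDS:
        return "whQ"
    if " or " in f" {text} ":
        return "altQ"
    if words[0] in AUXILIARIES:  # crude stand-in for subject/auxiliary inversion
        return "ynQ"
    return "dQ"  # no syntactic question marking: treat as declarative


for example in ["It's a vector?", "Is it a vector?", "What is a vector?",
                "It's a vector, isn't it?", "Is it a vector or a scalar?", "Huh?"]:
    print(f"{guess_question_form(example):6s} {example}")
```

Declarative questions fall through to the default branch, which is exactly why syntax alone cannot separate them from statements.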
        Form/Function Distribution

Form       chk     clar     info      oth    N (%)
dQ         257       81        2        4    344 (53.5)
ynQ         53       80       27        5    165 (25.7)
whQ          -       47       21        -     68 (10.6)
ynTAG       41        5        -        -     46 (7.2)
altQ         6        5        1        -     12 (1.9)
part         -        8        -        -      8 (1.2)
N          357      226       51        9    643
(%)     (55.5)   (35.1)    (7.9)    (1.4)    (100)
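
The marginal totals and percentages in the distribution table can be recomputed from the cell counts. The sketch below copies the counts from the table. The dictionary layout and the helper names `marginals` and `percent` are assumptions made for illustration.

```python
# Cell counts copied from the form/function table (a "-" cell is omitted).
COUNTS = {
    "dQ":    {"chk": 257, "clar": 81, "info": 2, "oth": 4},
    "ynQ":   {"chk": 53, "clar": 80, "info": 27, "oth": 5},
    "whQ":   {"clar": 47, "info": 21},
    "ynTAG": {"chk": 41, "clar": 5},
    "altQ":  {"chk": 6, "clar": 5, "info": 1},
    "part":  {"clar": 8},
}


def marginals(counts):
    """Return (totals per form, totals per function, grand total)."""
    by_form = {form: sum(cells.values()) for form, cells in counts.items()}
    by_function = {}
    for cells in counts.values():
        for function, n in cells.items():
            by_function[function] = by_function.get(function, 0) + n
    return by_form, by_function, sum(by_form.values())


def percent(part, whole):
    """Percentage rounded to one decimal place, as in the table."""
    return round(100.0 * part / whole, 1)


by_form, by_function, total = marginals(COUNTS)
for form, n in by_form.items():
    print(f"{form:6s} {n:4d} ({percent(n, total)})")
for function, n in by_function.items():
    print(f"{function:6s} {n:4d} ({percent(n, total)})")
```

The same helpers also give conditional shares, for example `percent(COUNTS["dQ"]["chk"], by_form["dQ"])` for the proportion of declarative questions that are checks.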
        Falling (L-L%) F0 contours

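Form       chk     clar     info      oth    N (% of form)
dQ           3        4        -        -      7 (2.0)
ynQ          -        4        5        2     11 (6.7)
whQ          -       12       17        -     29 (42.6)
ynTAG        1        1        -        -      2 (4.3)
altQ         2        5        1        -      8 (66.7)
part         -        -        -        -      -
N            6       26       23        2     57
(%)      (1.7)   (11.5)   (45.1)   (22.2)    (8.9)

Percentages give the share of each form or function category that
is falling, e.g. 29 of the 68 whQs (42.6%); overall, 57 of the 643
tokens (8.9%) fall.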
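
The percentages in the falling-contour table are rates within each category: falling tokens divided by all tokens of that form or function. A minimal sketch of that calculation follows, using the totals from the two tables. The variable and function names are assumptions.

```python
# Totals over all 643 tokens and over the 57 falling tokens, from the two tables.
ALL_BY_FORM = {"dQ": 344, "ynQ": 165, "whQ": 68, "ynTAG": 46, "altQ": 12, "part": 8}
FALLING_BY_FORM = {"dQ": 7, "ynQ": 11, "whQ": 29, "ynTAG": 2, "altQ": 8, "part": 0}
ALL_BY_FUNCTION = {"chk": 357, "clar": 226, "info": 51, "oth": 9}
FALLING_BY_FUNCTION = {"chk": 6, "clar": 26, "info": 23, "oth": 2}


def falling_rates(falling, totals):
    """Percentage of each category realized with a falling (L-L%) contour."""
    return {key: round(100.0 * falling[key] / totals[key], 1) for key in totals}


form_rates = falling_rates(FALLING_BY_FORM, ALL_BY_FORM)
function_rates = falling_rates(FALLING_BY_FUNCTION, ALL_BY_FUNCTION)
print(form_rates)
print(function_rates)
```

Read this way, falling contours are concentrated in wh- and alternative questions by form and in information-seeking questions by function.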
   F0 measures of non-falling questions

• Quantitative analysis of F0 height in the 573
  non-falling tokens with sufficient data for
  analysis
• Examined question nucleus (nucF0) and tail
  (btF0) only
• Speaker-normalized (z-score) F0 of:
  • 1. nuclear accent (nucF0)
  • 2. rightmost edge of question (btF0)
  • 3. difference between 1 & 2 (riserange)
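
The three measures above can be sketched as follows. The slides do not say which standard deviation was used or the sign convention for riserange, so this sketch assumes a population standard deviation over the speaker's F0 values and riserange = btF0 minus nucF0 (positive for a rise). The function names are hypothetical.

```python
from statistics import mean, pstdev


def make_speaker_normalizer(speaker_f0_hz):
    """Build a z-score function from all F0 values observed for one speaker."""
    mu = mean(speaker_f0_hz)
    sigma = pstdev(speaker_f0_hz)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("speaker F0 values have zero variance")
    return lambda f0_hz: (f0_hz - mu) / sigma


def question_f0_measures(nuclear_hz, boundary_hz, normalize):
    """Return the normalized nuclear F0, boundary F0, and their difference."""
    nuc = normalize(nuclear_hz)
    bt = normalize(boundary_hz)
    return {"nucF0": nuc, "btF0": bt, "riserange": bt - nuc}  # assumed sign


# Toy values in Hz, invented for illustration only.
normalize = make_speaker_normalizer([100.0, 200.0])
print(question_f0_measures(nuclear_hz=150.0, boundary_hz=250.0, normalize=normalize))
```

Normalizing per speaker lets F0 heights be pooled across speakers with different pitch ranges before running the comparisons.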
          Question Form and F0

• DeclQs and YNQs both thought to rise (H* H-H%
  vs. L* H-H%?): Are there F0 height
  differences between them?
• 2-way ANOVA on form x function:
    FORM: nucF0: F(5)=19.34, p<.001
            btF0: F(5)=10.71, p<.001
            riserange: F(5)=3.6, p<.01
  • Planned comparisons (Tukey, alpha=.01) show no
    difference between declarative Qs and yes-no Qs
  • Main effect of form caused by yes-no tags (low
    F0) and particles (high F0)
                       Normalized means at nucF0 and btF0

[Figure: two panels (nuclear accent, boundary); y-axis: normalized F0;
x-axis: question function (chk, clar, info); one series per question
form (ynQ, dQ, ynTAG, whQ, part)]
        Question Function and F0

• Question dialog acts thought to correlate with
  F0: Does question FUNCTION affect F0?
• 2-way ANOVA on form x function:
  FUNCTION:       nucF0: F(3)=16.6, p<.001
                  btF0: F(3)=8.56, p<.001
                  riserange: F(3)=3.94, p<.01
• Main effect; planned comparisons show:
     • clarQ > chkQ             (nucF0 & btF0)
     • infoQ > clarQ/chkQ       (nucF0)
     • No interactions for any measure
                 Clarification types and F0
Clark '96 levels of coordination: sources of communication problems
1   Channel: Problem hearing whether the tutor actually said something or not
    (Huh?, Hm?)
2   Perception: Problem hearing what the tutor said ('G' as in God?, Did you
    say a word or a letter?), including reprise/echo questions (A what?)
3   Understanding: Problem with reference resolution (This up here?, What
    did I imply or what does the statement imply?), or with general
    understanding (Is that the same thing or is that different?, What do you
    mean?)
4   Intention: Problem determining what the tutor intended by his utterance
    (You want an exact number?, Uh are you asking me another characteristic
    of freefall?)
+   Non-interlocutor-related (NIR): Problem understanding the task (Am I
    supposed to speak this or type it?), or clarification of the examination
    question (Should I assume both vehicles are going at the same speed?)
        Effects of Clarification Type
• One-way ANOVA combining levels 1&2 into
  a single acoustic/perceptual category:
     nucF0: F(3)=5.41, p=.001
     btF0: F(3)=6.6, p<.001
     riserange: F(3)=2.59, p=.05
• Main effect for clarification type
• Ranking for each measure (higher F0 to lower F0):
   acoust/percept > understanding > NIR > intention
   • Planned comparisons (Tukey, alpha=.01)
     show the only significant comparison was
     acoust/percept > intention
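
For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed as sketched below. The data are toy numbers, not values from the corpus, and the function name is an assumption. With four clarification categories the between-groups degrees of freedom are 3, matching the F(3) reported above.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy normalized-F0 values for three invented groups.
f, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({df_b}, {df_w}) = {f:.2f}")
```

A large F means the category means differ by more than the within-category spread would predict; the pairwise comparisons then identify which categories differ.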
    Can Prosody Distinguish Question Form?
             Question Function?
• Only a few question forms were prosodically
  distinct in our study – lexico-syntactic
  information can help
• Question function was more successfully
  differentiated prosodically – where there is
  less reliable lexico-syntactic information
• Can we use prosodic information with lexico-
  syntactic information to help identify question
  form and function automatically?
       Detecting Student Questions

• Syntax
  • Wh-words, subject/auxiliary inversion
• Prosody
  • Phrase-final rising intonation (Pierrehumbert &
    Hirschberg '90)
  • Duration and pausing (Shriberg et al. '98)
• Lexico-pragmatics
  • personal pronouns, utterance-initial pronouns
    (Geluykens 1987; Beun 1990)
                      Corpus

•   141 ITSpoke dialogues
•   5 hours of student speech
•   Student turns average 2.5 seconds
•   1,030 questions
•   25 questions per hour
•   70% of turns consist entirely of the question
•   89% of questions are turn-final
  Question Form Distribution in ITSpoke


      Form               Example              Distr.
yes/no        Is that right?                  24%
wh-           What do you mean?               10%
yes/no tag    It will stay the same, right?    7%
alternative   Force or something?              3%
particle      Huh?                             2%
declarative   The weight?                     54%
            Question-Bearing Turns

• Contain one or more questions
• N = 918
                Features Extracted

• Prosodic
  •   pitch
  •   loudness
  •   pausing
  •   speaking rate
  •   calculated over entire turn and last 200 ms
• Syntactic
  • unigram and bigram part-of-speech tags
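
As an illustration of computing pitch features over the whole turn and over the final 200 ms, the sketch below fits a least-squares line to an F0 track. The track format (time in ms, F0 in Hz, voiced frames only), the feature names, and the function names are assumptions and not the feature extraction code used in the study.

```python
def linear_slope(times, values):
    """Least-squares slope of values against times."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    numerator = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    denominator = sum((t - t_mean) ** 2 for t in times)
    return numerator / denominator


def pitch_features(track, window_ms=200):
    """track: list of (time_ms, f0_hz) pairs in time order, voiced frames only."""
    times, f0 = zip(*track)
    end = times[-1]
    tail = [(t, f) for t, f in track if t >= end - window_ms]
    tail_times, tail_f0 = zip(*tail)
    return {
        "max_f0": max(f0),
        "mean_f0": sum(f0) / len(f0),
        "slope_turn": linear_slope(times, f0),            # Hz per ms
        "slope_final": linear_slope(tail_times, tail_f0),  # Hz per ms
    }


# Toy track: falling, then rising over the last 200 ms.
toy_track = [(0, 200), (100, 190), (200, 180), (300, 170), (400, 200), (500, 230)]
print(pitch_features(toy_track))
```

A positive `slope_final` corresponds to a phrase-final rise, the cue most often associated with questions.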
                Feature Extraction

• Lexical
  • unigrams and bigrams from hand-labeled transcriptions
• Student and task dependent
  •   pre-test score
  •   gender
  •   correctness
  •   previous tutor dialogue act
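
Unigram and bigram features can be counted from a token sequence as sketched below. The same helper works for words from a transcription or for a sequence of part-of-speech tags produced elsewhere. Whitespace tokenization and the function names are assumptions.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def unigram_bigram_features(tokens):
    """Merge unigram and bigram counts into one feature dictionary."""
    features = Counter()
    for n in (1, 2):
        for gram, count in ngram_counts(tokens, n).items():
            features[" ".join(gram)] += count
    return features


# Example turn taken from the Question Detection slide.
tokens = "what symbol are you talking about".split()
print(unigram_bigram_features(tokens))
```

Each distinct n-gram becomes one feature, so a six-word turn with no repeated words yields six unigram and five bigram features.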
        Machine Learning Experiments

•   Question-bearing vs. non-question-bearing
•   Down-sampled to 50/50 distribution
•   Experimented by feature type
•   Adaboosted C4.5 decision trees
    • 5-fold cross validation
• Best results with all features
    • Accuracy = 79.7%
    • Precision = Recall = F-measure = 0.8
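
The experimental setup (balancing the classes, 5-fold cross-validation, and the reported metrics) can be sketched with the standard library alone. The boosted decision-tree learner itself is omitted. The function names, random seeds, and fold assignment are assumptions.

```python
import random


def downsample_to_balance(examples, labels, seed=0):
    """Randomly drop majority-class items until both classes are equal in size."""
    rng = random.Random(seed)
    positive = [i for i, y in enumerate(labels) if y]
    negative = [i for i, y in enumerate(labels) if not y]
    n = min(len(positive), len(negative))
    keep = sorted(rng.sample(positive, n) + rng.sample(negative, n))
    return [examples[i] for i in keep], [labels[i] for i in keep]


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for held_out in range(k):
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, folds[held_out]


def precision_recall_f(gold, predicted):
    """Precision, recall, and F-measure for the positive class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy labels: 4 hits, 1 false alarm, 1 miss.
print(precision_recall_f([1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
                         [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]))
```

Because the F-measure is the harmonic mean of precision and recall, it equals both whenever the two coincide, which is consistent with the single value reported for all three.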
          Accuracy by Feature Type


prosody: pausing and speaking rate   52.6%
student and task dependent           56.1%
prosody: loudness                    61.8%
syntactic                            65.3%
lexical                              67.2%
prosody: last 200 ms                 70.3%
prosody: pitch                       72.6%
prosody: all                         74.5%
           Feature Type Discussion

• Which features most informative?
  • pitch slope of last 200 ms and entire turn
  • maximum and mean pitch of turn
• Which features most often used in learning?
  •   pre-test score
  •   slope of last 200 ms
  •   maximum pitch of entire turn
  •   cumulative pause duration
              Other Observations

• Syntactic features were informative
  • personal pronoun + verb, wh-pronoun, interjection
• Lexical features were informative
  • yes, right, what, I, you
                  Conclusions

• Most questions in our tutoring corpus are
  declarative in form
  • More than syntax is needed to identify these as
    questions
  • Prosodic features are very important
• Detecting question-bearing turns is possible
• Detecting question function is needed
           Question Forms in ITSpoke


      Form      Distr.              Example
declarative     54%      The weight?
yes/no          24%      Is that right?
wh-             10%      What do you mean?
yes/no tag       7%      It will stay the same, right?
alternative      3%      Force or something?
particle         2%      Huh?

								