            Question Generation: Proposed
            Challenge Tasks and Their Evaluation

                       Rodney D. Nielsen
           Boulder Language Technologies, Boulder, CO
Center for Computational Language and Education Research, CU, Boulder
            The Nature of Automatic QG
                Application Dependent
                     Educational Assessment
                           Evaluate
                     Socratic Tutoring
                           Guide
                     Etc.
                           Gather information




QG Challenge, Sept 25, 2008, Rodney D. Nielsen   2
            Defining the QG Tasks
                QG can be viewed as a 3-step process
                     Concept Selection
                     Question Type Determination
                     Question Construction




            Key Concept Identification
                Givens:
                     The full text document
                     The application track
                Objective:
                     Identify key spans of text for which
                      questions are likely to be generated.




            Question Type Determination
                Givens:
                     Source text snippets
                     The full text
                     The application track
                Objective:
                     Identify the most likely types of questions
                      to be generated



            Question Construction
                Application independent
                Givens:
                     Source text snippets
                     A question type
                     The full text
                Objective:
                     Construct a natural language question
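
The givens and objectives of the three tasks can be read as function signatures. The sketch below is only an illustration of that data flow: the names `Snippet`, `select_concepts`, `determine_question_type`, and `construct_question` are hypothetical, and the function bodies are trivial placeholders, not proposed methods. Note that only the first two steps receive the application track, since Question Construction is application independent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snippet:
    """A span of the source text (character offsets are an assumed encoding)."""
    start: int
    end: int
    text: str


def select_concepts(full_text: str, track: str) -> List[Snippet]:
    """Step 1. Givens: full text, application track. Placeholder: one snippet per sentence."""
    snippets, pos = [], 0
    for sentence in full_text.split(". "):
        if not sentence:
            continue
        start = full_text.index(sentence, pos)
        snippets.append(Snippet(start, start + len(sentence), sentence))
        pos = start + len(sentence)
    return snippets


def determine_question_type(snippets: List[Snippet], full_text: str, track: str) -> str:
    """Step 2. Givens: snippets, full text, track. Placeholder rule that ignores the track."""
    return "why" if any("because" in s.text for s in snippets) else "what"


def construct_question(snippets: List[Snippet], question_type: str, full_text: str) -> str:
    """Step 3. Givens: snippets, question type, full text (no track). Placeholder template."""
    return f"{question_type.capitalize()}: {snippets[0].text.rstrip('.')}?"


text = ("The brass ring would not stick to the nail because the ring is not iron. "
        "The nail is steel.")
concepts = select_concepts(text, track="tutoring")
q_type = determine_question_type(concepts[:1], text, track="tutoring")
question = construct_question(concepts[:1], q_type, text)
```

Keeping the three steps as separate functions mirrors the idea that each could be evaluated as its own challenge task.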


            Evaluating Key Concept
            Identification
                K experts annotate a set of documents
                     Tag spans of text regarding key concepts
                     Adjudicate and tag as vital or optional
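                Instance Recall for each vital snippet $V_i$, over system snippets $S_1..S_n$
                    $IR_i = \max_{j=1..n} \frac{|V_i \cap S_j|}{|V_i|}$
                Instance Precision for each system snippet $S_j$, based on all annotated snippets $A_1..A_m$
                    $IP_j = \max_{i=1..m} \frac{|S_j \cap A_i|}{|S_j|}$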

              F-measure
                Fully Automatic
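
A minimal sketch of how the instance recall (IR) and instance precision (IP) scores could be computed automatically. It makes several assumptions that the slide does not state: snippets are half-open token-offset spans, overlap is counted in tokens, the per-instance scores are averaged, and the F-measure is the balanced F1. All function names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open (start, end) token offsets; an assumed representation


def overlap(a: Span, b: Span) -> int:
    """Number of tokens shared by two spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def instance_recall(vital: Span, system: List[Span]) -> float:
    """IR_i: best coverage of one vital snippet by any single system snippet."""
    length = vital[1] - vital[0]
    return max((overlap(vital, s) / length for s in system), default=0.0)


def instance_precision(sys_span: Span, annotated: List[Span]) -> float:
    """IP_j: best coverage of one system snippet by any annotated snippet."""
    length = sys_span[1] - sys_span[0]
    return max((overlap(sys_span, a) / length for a in annotated), default=0.0)


def evaluate(vital: List[Span], optional: List[Span], system: List[Span]):
    """Average IR over vital snippets and IP over system snippets, then F1 (assumed)."""
    annotated = vital + optional  # precision credits vital and optional snippets
    recall = sum(instance_recall(v, system) for v in vital) / len(vital) if vital else 0.0
    precision = (sum(instance_precision(s, annotated) for s in system) / len(system)
                 if system else 0.0)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


precision, recall, f1 = evaluate(
    vital=[(0, 4), (10, 14)],
    optional=[(20, 24)],
    system=[(0, 2), (10, 14), (22, 26)],
)
```

Because recall is computed only over vital snippets while precision credits any annotated snippet, a system is not penalized for proposing an optional concept.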
      
               Evaluating Question Construction
              Compare system question to K expert
               questions, as in machine translation (MT) and
               automatic summarization (AS) evaluation
              Average question F-measure based on facet
               entailment
                   Use most similar expert question
                   Recall: proportion of facets in the expert question
                    entailed by the system question
                   Precision: proportion of facets in the system
                    question entailed by the expert question
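
The facet-based precision, recall, and F-measure above can be sketched as follows. This is an illustration under stated assumptions: a facet is encoded as a hypothetical (relation, governor, modifier) tuple, entailment is approximated by exact facet match (a real system would use an entailment classifier), and "most similar" is taken to mean the expert question yielding the highest F-measure.

```python
from typing import FrozenSet, List, Tuple

Facet = Tuple[str, str, str]  # (relation, governor, modifier); an assumed encoding


def entails(question_facets: FrozenSet[Facet], facet: Facet) -> bool:
    """Stand-in entailment test: the facet appears verbatim in the question."""
    return facet in question_facets


def facet_prf(system: FrozenSet[Facet], expert: FrozenSet[Facet]):
    """Precision, recall, and balanced F-measure of a system question vs. one expert question."""
    recall = sum(entails(system, f) for f in expert) / len(expert) if expert else 0.0
    precision = sum(entails(expert, f) for f in system) / len(system) if system else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def question_score(system: FrozenSet[Facet], experts: List[FrozenSet[Facet]]):
    """Score against the most similar expert question (highest F-measure, by assumption)."""
    return max((facet_prf(system, e) for e in experts),
               key=lambda prf: prf[2], default=(0.0, 0.0, 0.0))


# Hypothetical facets, for illustration only.
system_q = frozenset({("theme", "stick", "ring"), ("destination", "stick", "nail"),
                      ("nmod", "ring", "brass")})
expert_qs = [
    frozenset({("theme", "stick", "ring"), ("destination", "stick", "nail"),
               ("cause", "stick", "iron"), ("nmod", "ring", "metal")}),
    frozenset({("theme", "float", "cork")}),
]
p, r, f = question_score(system_q, expert_qs)
```

Averaging `question_score` over all system questions would give the average question F-measure named on the slide.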


            Facet Representation
                Original Dependency Parse
                [Figure: dependency arcs over the sentence below, labeled det, nmod, sub, vc, vmod, pmod, sbar, prd]
                The brass ring would not stick to the nail because the ring is not iron.

                Final Semantic Representation
                [Figure: facet arcs over the sentence below, labeled nmod, theme_not, destination_to_not, cause_because, be_prd_not]
                The brass ring would not stick to the nail because the ring is not iron.




               Evaluating Question Construction
              Prior work
                   Analysis of n-gram size effects (Soricut and Brill, 2004)
                   Dependency-based evaluation metrics (Owczarzak et al., 2007)
                   F-measure in similar evaluations (Turian et al., 2003)
                   N-gram inadequacy in entailment (Perez & Alfonseca, 2005)
                   Macro-average over nuggets (Lin & Demner-Fushman, 2005)
                   Facet entailment results (Nielsen et al., 2008)




            Summary
                QG can be viewed as a 3-step process
                     Concept Selection
                     Question Type Determination
                     Question Construction
                Ultimate goal should be highly context-specific
                 Question Generation
                     E.g., incorporating a learner model with
                      their goals and a history of interactions

            Thanks!
                Thanks to Wayne Ward, Steve Bethard,
                 James Martin, Martha Palmer, Philipp Wetzler,
                 the CU Computational Semantics Group and
                 the anonymous reviewers for helpful
                 feedback.
                This work was partially funded by Award
                 Numbers:
                     NSF 0551723,
                     IES R305B070434, and
                     NSF DRL-0733323.
            Evaluating Question Construction
                A Unified Framework for Automatic Evaluation
                 using N-gram Co-Occurrence Statistics
                 (Soricut and Brill, 2004)
                     MT: 4-grams to ensure fluency
                     AS: unigrams; little syntactic construction
                     QG: bigram-level; uses question stems and
                      extraction of key phrases, but more syntactic
                      composition than typical AS
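
For comparison with the facet-based measure, an n-gram co-occurrence score at the bigram level can be sketched as below. This is a simplified illustration, not the Soricut and Brill (2004) framework itself: it assumes whitespace tokenization, lowercasing, clipped n-gram counts, and a balanced F-measure, and the function names are hypothetical.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of n-grams in a token sequence (empty if the sequence is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_prf(system: str, reference: str, n: int = 2):
    """Precision, recall, and F-measure of n-gram overlap between two questions."""
    sys_counts = ngrams(system.lower().split(), n)
    ref_counts = ngrams(reference.lower().split(), n)
    matched = sum((sys_counts & ref_counts).values())  # clipped co-occurrence count
    precision = matched / sum(sys_counts.values()) if sys_counts else 0.0
    recall = matched / sum(ref_counts.values()) if ref_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


system_q = "Why did the ring not stick to the nail"
expert_q = "Why would the ring not stick to the nail"
bigram_scores = ngram_prf(system_q, expert_q, n=2)
unigram_scores = ngram_prf(system_q, expert_q, n=1)
```

Changing `n` shows the trade-off the slide describes: larger n-grams reward fluent word order, while unigrams reward only content overlap.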





				