                   Annotating Students’ Understanding
                        of Science Concepts

                  Rodney D. Nielsen, Wayne Ward,
                  James Martin, and Martha Palmer
       Center for Computational Language and Education Research
                  University of Colorado, Boulder
            Annotating Fine-Grained Entailments
               Question: Kate said: “An object has to move
                to produce sound.” Do you agree with her?
                Why or why not?
               Reference answer: Agree. Vibrations are
                movements and vibrations produce sound.
               Learner answer: I do not agree because a
                radio does not move to make sound.
                     The student agrees             Contradicted
                     Vibrations are movement        Unaddressed
                     Vibrations produce something   Different Argument
                     Something produces sound       Expressed


LREC May 28, 2008, Rodney D. Nielsen                                  2
            Recognizing Textual Entailment


               Hypothesis: Agree. Vibrations are movements
                and vibrations produce sound.
               Text: I do not agree because a radio does not
                move to make sound.
                     The student agrees             False
                     Vibrations are movement        Unknown
                     Vibrations produce something   Unknown
                     Something produces sound       True
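
For concreteness, the correspondence between the finer-grained labels of the
previous slide and these RTE-style values can be written as a simple lookup
(a sketch; the grouping follows the examples shown on the two slides):

# Facet label -> RTE-style entailment value, per the two slides above.
RTE_VALUE = {
    "Expressed": "True",
    "Contradicted": "False",
    "Unaddressed": "Unknown",
    "Different Argument": "Unknown",
}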


LREC May 28, 2008, Rodney D. Nielsen                           3
            Prior Work
               Automated Tutors
                     Aleven et al., 2001; Graesser et al., 2001; Jordan et al.,
                      2004; Koedinger et al., 1997; Makatchev et al., 2004; Peters
                      et al., 2004; Pon-Barry et al., 2004; Roll et al., 2005; Rosé
                      et al., 2003; VanLehn et al., 2005
               Constructed Response Scoring
                     Callear et al., 2001; Leacock and Chodorow, 2003; Mitchell
                      et al., 2002 & 2003; Pulman, 2005; Sukkarieh, 2003 & 2005
               PASCAL RTE (Dagan, Glickman and Magnini, 2005)
               Differences / Weaknesses
                     Coarse-grained entailment – yes/no or grade: 0-2 points
                     Question-specific systems
                          Hand-crafted dialog control, parsers, knowledge-based
                           ontologies, logic representations, and/or rules
                          Require 100-500 responses per question
LREC May 28, 2008, Rodney D. Nielsen                                                 4
           Necessity of Finer-Grained Analysis
               Imagine a tutor only knowing that there is some
                unspecified part of the reference answer that we are
                not sure the student understands
                   Reference Answer: A long string produces a low pitch.
               Break the reference answer down into low-level facets
                derived from a dependency parse and thematic roles
                   NMod(string, long)            The string is long.
                   Agent(produces, string)       A string is producing something.
                   Product(produces, pitch)      A pitch is being produced.
                   NMod(pitch, low)              The pitch is low.
               Assess whether an understanding of each facet is
                implicated by the student’s response (a sketch of this
                facet extraction follows below)
                  [Dependency parse of “A long string produces a low
                   pitch.” with labels: det, nmod, subject, object,
                   det, nmod]
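
A minimal Python sketch of this facet extraction, assuming the dependency
triples are already available (they are hand-written here for the example
sentence; a real system would take them from a parser). The
RELATION_TO_FACET mapping is a toy stand-in for the thematic role labeling
described in the talk.

# Dependency triples for "A long string produces a low pitch."
triples = [
    ("string", "det", "a"),
    ("string", "nmod", "long"),
    ("produces", "subject", "string"),
    ("produces", "object", "pitch"),
    ("pitch", "det", "a"),
    ("pitch", "nmod", "low"),
]

# Map syntactic relations to facet labels; determiners are dropped as
# unimportant. Agent/Product are the thematic roles assumed above.
RELATION_TO_FACET = {
    "nmod": "NMod",
    "subject": "Agent",
    "object": "Product",
}

def facets(triples):
    for head, rel, dep in triples:
        label = RELATION_TO_FACET.get(rel)
        if label:  # skip relations with no facet label (e.g., det)
            yield f"{label}({head}, {dep})"

print(list(facets(triples)))
# ['NMod(string, long)', 'Agent(produces, string)',
#  'Product(produces, pitch)', 'NMod(pitch, low)']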
LREC May 28, 2008, Rodney D. Nielsen                                                 5
           Representing Fine-Grained Semantics
              Assess the relationship between the student’s
               answer and the reference answer facets at a
               finer grain
                  Reference Ans: A long string produces a low pitch.
                 NMod(string, long)          Yes: Expressed (or Assumed)
                 Agent(produces, string)     Yes: Expressed
                 Product(produces, pitch)    Yes: Expressed
                 NMod(pitch, low)            No – which kind of mismatch?
                      Unaddressed:        “A long string produces a pitch.”
                      Different Argument: “It produces a loud pitch.”
                      Contradiction:      “It produces a high pitch.”
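
The sketch below (an illustration, not the authors’ algorithm) shows how
such a finer-grained relationship for one reference facet could be decided
by comparing extracted facets; the ANTONYMS table is a toy stand-in for real
lexical-semantic resources.

# Decide how a learner answer relates to one reference-answer facet.
ANTONYMS = {"low": "high", "high": "low"}

def relate(ref_facet, learner_facets):
    """Facets are (label, head, dependent) tuples."""
    label, head, dep = ref_facet
    for l_label, l_head, l_dep in learner_facets:
        if (l_label, l_head) != (label, head):
            continue                      # different core relation
        if l_dep == dep:
            return "Expressed"            # same relation, same argument
        if l_dep == ANTONYMS.get(dep):
            return "Contradiction"        # antonymous argument
        return "Different Argument"       # same relation, other argument
    return "Unaddressed"                  # relation never mentioned

ref = ("NMod", "pitch", "low")
print(relate(ref, [("NMod", "pitch", "high")]))  # Contradiction
print(relate(ref, [("NMod", "pitch", "loud")]))  # Different Argument
print(relate(ref, []))                           # Unaddressed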


LREC May 28, 2008, Rodney D. Nielsen                                    6
            The Focus of This Effort
               Low-level facets of the reference answer
               Finer-grained relationships to the facets




LREC May 28, 2008, Rodney D. Nielsen                       7
             The Corpus
 Grade | Life Science      | Physical Science        | Earth and       | Scientific Reasoning
       |                   | and Technology          | Space Science   | and Technology
 ------+-------------------+-------------------------+-----------------+---------------------
  3-4  | Human Body        | Magnetism & Electricity | Water           | Ideas & Inventions
       | Structure of Life | Physics of Sound        | Earth Materials | Measurement
  5-6  | Food & Nutrition  | Levers & Pulleys        | Solar Energy    | Models & Designs
       | Environments      | Mixtures & Solutions    | Landforms       | Variables

    Assessing Science Knowledge (ASK): Full Option Science System
          Lawrence Hall of Science (UC Berkeley) national assessment project (NSF)
          16 science teaching and learning modules, Grades 3-6
          287 constructed response questions
          15,400 total student responses
          146,000 facet entailment annotations

LREC May 28, 2008, Rodney D. Nielsen                                                8
            Annotation Process
               Step 1: FOSS/ASK reference answers are manually
                decomposed into constituent facets
                     Ref Answer: The string is tighter, so the pitch is higher.
                     Be(string, tighter) The string is tighter.
                     Be(pitch, higher)   The pitch is higher.
                     Cause(X, Y)         X is caused by Y
               Step 2: Learner answers are annotated to indicate
                whether and how each facet was addressed (a data-model
                sketch follows this slide)
                     Learner Answer: The string is tighter, so there is less tension
                      so the pitch gets higher.
                     Be(string, tighter)   The string is tighter.   Self-Contra
                     Be(pitch, higher)     The pitch is higher.     Expressed
                     Cause(X, Y)           X is caused by Y         Expressed
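
A minimal data model for the product of these two steps, as a sketch; the
class and field names are illustrative, not the released corpus format.

from dataclasses import dataclass

@dataclass
class Facet:
    relation: str    # e.g., "Be", "Cause"
    head: str
    dependent: str
    gloss: str       # human-readable paraphrase

@dataclass
class FacetAnnotation:
    facet: Facet
    label: str       # one of the eight labels defined on later slides

# Step 1 output for the reference answer above:
ref_facets = [
    Facet("Be", "string", "tighter", "The string is tighter."),
    Facet("Be", "pitch", "higher", "The pitch is higher."),
    Facet("Cause", "X", "Y", "X is caused by Y"),
]

# Step 2 output for the learner answer above:
annotations = [
    FacetAnnotation(ref_facets[0], "Self-Contra"),
    FacetAnnotation(ref_facets[1], "Expressed"),
    FacetAnnotation(ref_facets[2], "Expressed"),
]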

LREC May 28, 2008, Rodney D. Nielsen                                                9
            Reference Answer Decomposition
               Begin with a manual dependency parse of the reference answer
                  [Dependency parse of “The brass ring would not stick
                   to the nail because the ring is not iron.” with
                   labels: nmod, sub, vc, vmod, pmod, sbar, prd]

               Then raise main verbs, remove unimportant dependencies,
                incorporate copulas, prepositions, and negation into
                dependency labels, and utilize thematic role labels
                (sketched in code below)
                  [Modified graph over the same sentence with labels:
                   nmod, theme_not, destination_to_not, cause_because,
                   be_not]
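
A sketch of this label-rewriting step under simplifying assumptions; the
rewrite helper and its parameters are hypothetical, showing only how
negation, prepositions, and thematic roles fold into the labels.

def rewrite(label, head, dep, negated=False, prep=None, role=None):
    new = role or label          # prefer a thematic role label
    if prep:
        new += f"_{prep}"        # fold the preposition into the label
    if negated:
        new += "_not"            # fold negation into the label
    return f"{new}({head}, {dep})"

print(rewrite("object", "stick", "ring", negated=True, role="theme"))
# theme_not(stick, ring)
print(rewrite("pmod", "stick", "nail", negated=True, prep="to",
              role="destination"))
# destination_to_not(stick, nail)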



LREC May 28, 2008, Rodney D. Nielsen                                                   10
            Reference Answer Markup
                Final facets for Ref Answer: The brass ring
                 would not stick to the nail because the ring is
                 not iron.
                     NMod(ring, brass)               The ring is brass.
                     Theme_not(stick, ring)          The ring does not stick.
                     Destination_to_not(stick, nail) Something does not stick to the nail.
                     Be_not(ring, iron)              The ring is not iron.
                     Cause_because(stick, is)        X is caused by Y

                  [The modified dependency graph from the previous
                   slide, repeated]


LREC May 28, 2008, Rodney D. Nielsen                                                     11
            Answer Annotation Labels
               Assumed: Facets that are assumed to be understood a priori
                based on the question
               Expressed: Any facet directly expressed or inferred by simple
                reasoning
               Inferred: Facets inferred by pragmatics or nontrivial logical
                reasoning
               Contra-Expr: Facets directly contradicted by negation,
                antonymous expressions and their paraphrases
               Contra-Infr: Facets contradicted by pragmatics or complex
                reasoning
               Self-Contra: Facets that are both contradicted and implied
                (self contradictions)
               Diff-Arg: The core relation is expressed, but it has a different
                modifier or argument
               Unaddressed: Facets that are not addressed at all by the
                student’s answer
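
For concreteness, the eight labels above can be written as a small
enumeration; the implies_understanding grouping below is a sketch that
anticipates the Tutor-Label collapsing used later, not part of the
annotation scheme itself.

from enum import Enum

class FacetLabel(Enum):
    ASSUMED = "Assumed"
    EXPRESSED = "Expressed"
    INFERRED = "Inferred"
    CONTRA_EXPR = "Contra-Expr"
    CONTRA_INFR = "Contra-Infr"
    SELF_CONTRA = "Self-Contra"
    DIFF_ARG = "Diff-Arg"
    UNADDRESSED = "Unaddressed"

    @property
    def implies_understanding(self):
        # Assumed, Expressed, and Inferred all imply understanding.
        return self in {FacetLabel.ASSUMED, FacetLabel.EXPRESSED,
                        FacetLabel.INFERRED}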

LREC May 28, 2008, Rodney D. Nielsen                                               12
            Annotation – Expressed & Inferred
               Question: Kate said: “An object has to move to
                produce sound.” Do you agree with her? Why or why
                not?
               Reference Answer: Agree. Vibrations are movements
                and vibrations produce sound.
               Root(root, agree)            student agrees                 Expressed
               Be(vibration, movement)      vibration is movement          Inferred
               Agent(produce, vibrations)   vibrations produce something   Expressed
               Patient(produce, sound)      something produces sound       Expressed

               Student Answer: Yes because it has to vibrate to
                make sounds.

LREC May 28, 2008, Rodney D. Nielsen                                                13
            Annotation – Contradictions
               Question: Darla tied one end of a string around a
                doorknob and held the other end in her hand. When
                she plucked the string (pulled and let go quickly) she
                heard a sound. How would the pitch change if Darla
                pulled the string tighter?
               Reference Answer: When the string is tighter, the
                pitch will be higher.
               Be(string, tighter) The string is tighter.   Assumed
               Be(pitch, higher) The pitch is higher.       Contra-Expr
               Cause(X, Y)         X is caused by Y         Assumed
               Student Answer: it will be low the pitch change


LREC May 28, 2008, Rodney D. Nielsen                                  14
            Annotation – Unaddressed
               Question: … Write a note to David to tell him why the
                pitch gets higher rather than lower
               Ref Ans: The string is tighter, so the pitch is higher.
                The string between the cup and table is not longer.
               …
               Be_not(string, longer) The string is not longer Unaddressed
               Student Answer: David pitch is not happening
                tension is happening okay so calm down.




LREC May 28, 2008, Rodney D. Nielsen                                          15
            Labels
               Assumed: Facets that are assumed to be understood a priori
                based on the question
               Expressed: Any facet directly expressed or inferred by simple
                reasoning
               Inferred: Facets inferred by pragmatics or nontrivial logical
                reasoning
               Contra-Expr: Facets directly contradicted by negation,
                antonymous expressions and their paraphrases
               Contra-Infr: Facets contradicted by pragmatics or complex
                reasoning
               Self-Contra: Facets that are both contradicted and implied
                (self contradictions)
               Diff-Arg: The core relation is expressed, but it has a different
                modifier or argument
               Unaddressed: Facets that are not addressed at all by the
                student’s answer

LREC May 28, 2008, Rodney D. Nielsen                                               16
            Inter-annotator Agreement
                               Fine-Grn     Tutor      Y/N
               ITA              78.4%       86.2%     88.0%
               Kappa            0.704       0.728     0.752

               Fine-Grn: all labels kept separate
               Tutor: combine {Expressed, Inferred & Assumed}
                and {Contra-Expr & Contra-Infr}, others separate
               Y/N: combine {Expressed, Inferred & Assumed} vs.
                {everything else}

    In most disagreements (57%), one annotator chose Unaddressed
         49% were between Unaddressed and Understood
    35% of disagreements were between the labels implying
     understanding
    Only 2.3% of disagreements were between Understood and
     Contradicted
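
For concreteness, a standard Cohen’s kappa computation over two annotators’
facet labels; the toy label sequences below are invented, not corpus data.

from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[l] * cb[l] for l in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

ann1 = ["Expressed", "Expressed", "Unaddressed", "Inferred",
        "Contra-Expr", "Expressed"]
ann2 = ["Expressed", "Inferred", "Unaddressed", "Inferred",
        "Contra-Expr", "Unaddressed"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.571 on this toy data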

LREC May 28, 2008, Rodney D. Nielsen                                                  17
            Assessment Technology Overview
               Start with hand-generated reference answer facets
               Automatically parse reference & learner answer and
                automatically extract representation
               Generate machine learning feature vectors indicative
                of the student’s understanding of each facet
                     From answers, their parses, the relations between these,
                      and corpus co-occurrence statistics
               Train a machine learning classifier on the training set
                feature vectors
               Use classifier to assess the test set answers,
                assigning one of five Tutor-Labels for each RA facet
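
A sketch of the train-and-classify step with scikit-learn; the original work
used C4.5, for which the CART-based DecisionTreeClassifier is only a rough
stand-in, and the feature names below are invented placeholders.

from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# One feature dict per (learner answer, reference facet) pair.
train_features = [
    {"lexical_overlap": 0.8, "dep_path_match": 1, "cooc_pmi": 2.1},
    {"lexical_overlap": 0.1, "dep_path_match": 0, "cooc_pmi": 0.3},
]
train_labels = ["Expressed", "Unaddressed"]  # Tutor-Labels

vec = DictVectorizer()
X = vec.fit_transform(train_features)
clf = DecisionTreeClassifier().fit(X, train_labels)

test = vec.transform([{"lexical_overlap": 0.7, "dep_path_match": 1,
                       "cooc_pmi": 1.8}])
print(clf.predict(test))  # e.g., ['Expressed']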


LREC May 28, 2008, Rodney D. Nielsen                                             18
            Results (C4.5 decision tree)
                              # non-Asmd  Majority  Lexical    All     Reduced
                                Facets     Class   Baseline Features  Training
     Training Set (10x CV)      54,967     54.6      59.7     77.1       –
     Unseen Answers             30,514     51.1      56.1     75.5       –
     Unseen Questions            6,699     58.4      63.4     61.7      66.5
     Unseen Modules              3,159     53.4      62.9     61.4      68.8

               Results on Tutor-Labels (unseen answers, questions, and
                modules, respectively) are:
                    24.4, 8.1, and 15.4 points over the most frequent class baseline
                    19.4, 3.1, and 5.9 points over the lexical baseline
            (All Unseen Modules facets were adjudicated; about half of the
             facets in the other modules were adjudicated)




LREC May 28, 2008, Rodney D. Nielsen                                                           19
            Conclusions
               New assessment paradigm to enable
                more effective tutoring dialog
                management
                      Facet breakdown: enables the tutor to
                      provide feedback relevant specifically to
                      the appropriate part of the reference
                      answer
                     Additional labels: facilitate understanding
                      the type of mismatch between the
                      reference answer/hypothesis and the
                      student’s answer/text
LREC May 28, 2008, Rodney D. Nielsen                                20
            Conclusions
               Corpus of annotated answers
                     Substantial agreement: 86.2% on Tutor-
                      Labels, 0.728 Kappa
                     About 146K facet annotations
                      The only corpus of fine-grained inference
                      information
                          Freely available
                     Will support alternative approaches to the
                      Recognizing Textual Entailment task


LREC May 28, 2008, Rodney D. Nielsen                               21
            Conclusions
               Answer Assessment System
                     Evaluation according to new paradigm
                     Within domain performance:
                          24% over majority class baseline
                     Out-of-domain performance:
                          15% over majority class baseline
                     First system to address out-of-domain
                      assessment
                     First successful assessment of Grade 3-6
                      constructed responses
LREC May 28, 2008, Rodney D. Nielsen                             22
            Thanks!
               This work was partially funded by
                Award Numbers:
                     NSF 0551723,
                     IES R305B070434, and
                     NSF DRL-0733323.




LREC May 28, 2008, Rodney D. Nielsen                23

				