					Draft 5.2                                 Erin Fitzgerald




            Speech Reconstruction
              Annotation Guide
                         for

   Conversational Telephone Speech Conversations


                      Draft 5.2
                   October 31, 2007

                    Erin Fitzgerald




9/14/2012                                           1/19


                                                    Table of Contents
Gold standard Speech Reconstruction Annotation Guidelines ............................................................ 3
    Goal: ................................................................................................................................................... 3

   1. Using the Tool................................................................................................................................... 3
     Keyboard shortcuts ........................................................................................................................... 4
     Overview of Edit Procedures .............................................................................................................. 4
     Suggested annotation procedure ......................................................................................................... 6

   2. Sentence Types ................................................................................................................................. 6
     1) Backchannel (Ctrl+1) .................................................................................................................. 7
     2) Well-formed sentence (Ctrl+2).................................................................................................... 7
     3) Well-formed fragment with content (Ctrl+3) .............................................................................. 7
     4) Fragment, no content (Ctrl+4) .................................................................................................... 7
      Differences between Fragments, “Backchannel” s-types.............................................................. 8
     5) Cannot repair sentence (Ctrl+5) ................................................................................................. 8
     6) Unannotated: (default annotation, Ctrl+0) ................................................................................. 8

   3. Reconstruction Actions ..................................................................................................................... 9
     Sentence Boundary Actions ................................................................................................................ 9
       1) Remove sentence boundary/ Join SUs: ................................................................................... 9
       2) Add sentence boundary: .......................................................................................................... 9
     Deletion Actions ................................................................................................................................. 9
       1) Delete co-reference: ........................................................................................................ 9
       2) Delete fillers (filled pauses and discourse markers): ............................................................. 9
       3) Delete leading coordinations: ............................................................................................... 10
       4) Delete extraneous phrase:..................................................................................................... 10
       5) Delete repeat/repair and delete restart ................................................................................. 10
     Insertion Actions ............................................................................................................................... 11
       1) Insert function word .............................................................................................................. 12
       2) Insert the neutral noun _NOUN_ .......................................................................................... 12
       3) Insert neutral verb................................................................................................................. 12
     Substitution Actions .......................................................................................................................... 12
       1) Substitute Tense/ Number Change ........................................................................................ 12
       2) Substitute: Transcriber Error ............................................................................................... 13
     Phrase Movement Actions ................................................................................................................ 13
       1) Phrase Movement: Adjunct ................................................................................................... 14
       2) Phrase Movement: Argument ............................................................................................... 14
       3) Phrase Movement: Fix Grammar ......................................................................................... 14

   4. Verb-Argument Labeling ................................................................................................................ 15

   5. Reconstruction Examples................................................................................................................ 17

Troubleshooting the annotation tool ..................................................................................................... 19




    Gold standard Speech Reconstruction Annotation Guidelines
Goal:
Transform a “W-layer”[1] sentential unit[2] (denoted SU) of verbatim speech text into a simply structured
grammatical sentence, or as close as you can come to it, via as few and as simple changes as possible.

    1. Using the Tool




                       Figure 1: A view of the annotation tool, (a) before and (b) after annotation

Every speech conversation is associated with four files – two for the first speaker and two for the second
speaker. For each speaker in each conversation, there exist files in the form [convNum_spkr].w (the
w-layer, or original, text) and [convNum_spkr].m (the m-layer, or reconstructed, text). Each *.m file
contains a set of consecutively spoken SUs from that speaker and that conversation. To annotate a
specific conversation file, run
         % [tool_installation_path]/test.prl [file].m

from a command line prompt (Start → Run → cmd if using Windows OS). If you do not know the tool
installation path, ask the annotation moderator.

Understanding the display: A window similar to that shown in Figure 1a above will appear. The tool
displays one SU at a time. On the lower “W-layer” part of the screen is a series of fixed word nodes
representing the original text. On the upper “M-layer” half of the screen is a series of editable word
nodes representing reconstructed text. Connecting the two layers is a set of arcs. A complete sentence
annotation requires at least one arc attachment to every node on both the m-layer and w-layer.
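
The completeness rule above can be checked mechanically. The following is a minimal sketch, not the tool's own code: the names Arc, unattached_nodes, and is_complete are hypothetical, and nodes are assumed to be identified by their left-to-right position in each layer.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Arc:
    m_index: int          # position of the m-node (top end of the arc)
    w_index: int          # position of the w-node (bottom end of the arc)
    label: str = "Basic"  # arcs are labeled "Basic" by default

def unattached_nodes(num_m, num_w, arcs):
    """Return (m_indices, w_indices) of nodes with no arc attached."""
    m_seen = {a.m_index for a in arcs}
    w_seen = {a.w_index for a in arcs}
    missing_m = [i for i in range(num_m) if i not in m_seen]
    missing_w = [j for j in range(num_w) if j not in w_seen]
    return missing_m, missing_w

def is_complete(num_m, num_w, arcs):
    """True when every m-node and every w-node has at least one arc."""
    missing_m, missing_w = unattached_nodes(num_m, num_w, arcs)
    return not missing_m and not missing_w

# w-layer: "uh we went home"; m-layer: "we went home" (filler "uh" deleted)
arcs = [Arc(0, 1), Arc(1, 2), Arc(2, 3)]
print(unattached_nodes(3, 4, arcs))      # ([], [0]): w-node "uh" has no arc yet
arcs.append(Arc(0, 0, "Delete filler"))  # link "uh" to the following word "we"
print(is_complete(3, 4, arcs))           # True
```

In this sketch a deleted word still counts toward completeness: its w-node needs an arc to some surviving m-node, carrying the appropriate deletion label.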

Viewing other SUs: The annotator can move to the next or previous sentence – or directly to a specific
sentence number – through the menu options or corresponding keyboard shortcuts (Table 1). He or she
can view a different sentence before completing the annotation for the current sentence, and can always
locate the next sentence with incomplete annotations through the “Go to next unannotated” command.

[1] W-layer here stands for the word layer, or the original text, as used in the Prague Dependency Treebank (PDT). This is the
original text in the bottom row of the tool. M-layer stands for the morphological layer of PDT, which should theoretically include
both base form and tense information, etc., for the words giving meaning to the sentence. Here we will not concern ourselves
with morphological information, but will simply generate a grammatical and contentful reconstruction of the w-layer text.
The M-layer is the text, revised by you, on the top row of the tool.
[2] A sentential unit, or SU, is not necessarily a well-defined sentence but merely a sentence-like string of words.



    Keyboard shortcuts
    Ctrl+O                  Open                                Arrows, Home, End   Move focus
    Ctrl+S                  Save                                Shift+arrow         Move m-node/ move top arc end
    Ctrl+A                  Save As                             Ctrl+arrow          Move bottom arc end
    Ctrl+Q                  Quit                                Space               Edit active item
    Ctrl+Z                  Undo                                Delete              Delete active item
    F1                      Help                                Insert              Clone active item
    Ctrl+M                  Play sentence audio                 Ctrl+V              Label verb and annotate arguments
    Right-click and drag    Play audio for range of w-nodes
                                                                Ctrl+Shift+A        Show all sentences
    Sentence type labels:                                       Ctrl+P              Go to previous sentence
     Ctrl+1                 Backchannel                         Ctrl+N              Go to next sentence
     Ctrl+2                 Well-formed sentence                Ctrl+F              Go to first sentence
     Ctrl+3                 Well-formed fragment with content   Ctrl+T              Go to last sentence
     Ctrl+4                 Fragment, no content                Ctrl+U              Go to next unannotated sentence
     Ctrl+5                 Cannot fix sentence                 Ctrl
     Ctrl+0                 Unannotated                         Ctrl+R/ Ctrl+L      Attach right/left SU context

    Ctrl+Shift+F            Delete filler word                  Ctrl+Shift+R        Delete repeat/ repair word
    Ctrl+Shift+E            Delete extraneous phrase word       Alt+Shift+R         Delete restart fragment word
    Ctrl+Shift+L            Delete leading coordinator          Ctrl+R              Revert sentence to original form
                                                Table 1: Keyboard shortcuts



                                             Overview of Edit Procedures

All m-nodes and arcs can be altered by clicking/dragging or keyboard commands, as listed in the table
above and described below.

Altering M-Nodes
Double-clicking word nodes on the M-layer allows you to alter the word form (ex. change the tense of
a verb), give a label to the node (ex. mark that the new word form is a present-tense, third-person-
singular verb), or choose the lemma (not a valid option for this annotation task). You can delete and
insert m-nodes (after selecting a node by mouse or arrow key) through the menu or by pressing the
Delete or Insert keys, respectively. Selected m-nodes can be shifted to the left or to the right by pressing
SHIFT+{left or right arrow}, or simply clicking and dragging. Any arcs attached at the top to the m-
node will move along with the node.

Altering W-Nodes
W-nodes cannot be moved, added, or deleted.

Altering Arcs
Double-clicking arcs between the m-layer and the w-layer allows you to choose an arc label. All arcs
are labeled “Basic” by default. Any changes made to the m-layer nodes should be reflected in the label
of the arc originally connected to it. All deletions, for example, will result in a corresponding arc label
such as “Delete filler”. You can delete and insert arcs (after selecting an arc by mouse or arrow
key) through the menu or by pressing the Delete or Insert keys, respectively. Selected arc roots (top) can
be shifted to the left or to the right one end at a time by pressing SHIFT+{left or right arrow}, or simply
clicking and dragging. Selected arc ends (bottom) are shifted by pressing CTRL+{left or right arrow} or
clicking and dragging. Shifting arcs has no effect on the placement of corresponding m-nodes.





Which w-node should I connect my m-node to?
After an m-node is deleted, it is not always clear which other m-node the corresponding w-node should
have an arc from. Likewise, when nodes are inserted on the m-node side, the appropriately
corresponding w-node to connect by arc isn’t necessarily obvious. Below are some basic rules of thumb;
the rules are demonstrated in the Reconstruction Examples section beginning on page 17.

After deletions: Which word on the m-side was the speaker probably thinking of when he or she
produced the item you deleted? For repeats/repairs/restarts and coreference, this is generally obvious
(the replacement text on the right and the co-referred word(s), respectively). For filler words, this is
typically the word following the filler.
After insertions: Which word “generated” the inserted word, or tipped you off to the fact that the
inserted word was missing? For example, for missing determiners (the, a) this is typically the noun it
modifies. Inserted null nouns are generated by their governing verb, and an inserted verb can be linked
to its most dominant argument.
After substitutions: Arc anchors do not change; only the arc label is altered.
After phrase movement: Arcs should connect moved nodes to their original w-node positions. Additional
arcs should link m-nodes to their new head (for moved adjuncts or arguments, typically the referring
verb).
After joining sentences: This is treated like a phrase movement (across sentence boundaries), and so
should include an additional arc linking the main m-node of the less dominant sentence to the verb or
main w-node of the main sentence. See the discussion in the following section for more details.


Splitting and joining SUs
Often it is appropriate to split or join consecutive sentences in order to repair poor sentence
segmentation or to make the speaker’s thoughts clearer. The “Segment” commands in the “Sentence”
menu will allow the annotator to make these types of changes. See the Sentence Boundary Actions
section on page 9 for more details on when splitting or joining SUs is appropriate.

   Join SU Commands: Select the initial word node and select “Sentence → Segment → Attach
   right/left content” to attach the following or previous SU to the current SU, respectively (Ctrl+R/
   Ctrl+L).

   Running the command while some mid-SU word node N is selected will cut all words from N until
   the end of the SU, and attach them to the following SU.

   If an arc is selected when “Attach right/left content” is chosen, no action will be taken.


   Split SU Commands: Select the word node which should begin the next SU and choose
   “Sentence → Segment → Create new right”.

   Alternately, select the word that should end this SU and choose “Create new left”.

   Again, if either command is chosen while an arc is selected, no action will be taken.
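
The effect of these commands on the word sequence can be pictured with plain lists. This is only a sketch of the behavior described above, not the tool's implementation; the function names split_su and attach_right are hypothetical, and arcs are ignored for simplicity.

```python
def split_su(su, start_of_next):
    """Model of "Create new right": the word at start_of_next begins a new SU."""
    return [su[:start_of_next], su[start_of_next:]]

def attach_right(current, following, selected=0):
    """Model of "Attach right".

    With the initial word selected (index 0), the following SU is joined onto
    the current one. With a mid-SU word selected, the words from that word to
    the end of the SU are cut and attached to the front of the following SU.
    """
    if selected == 0:
        return [current + following]
    return [current[:selected], current[selected:] + following]

su = "there used to be this group I can't remember the name".split()
left, right = split_su(su, 6)
print(left)                                # ['there', 'used', 'to', 'be', 'this', 'group']
print(attach_right(left, right))           # one joined SU again
print(attach_right(left, right, 4))        # "this group" moves to the following SU
```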



                                      Suggested annotation procedure
          (See step-by-step example annotations in the Reconstruction Examples section, pg.17)

1) Read original sentence.
2) If meaning is unclear, or for further clarification, play the corresponding audio[3] (Ctrl+M).
3) Identify and delete fillers, repeated and repaired words, and leading coordinations. Label the
    corresponding arc (see pg. 9 for Deletion Actions)
4) Mark and delete restart fragments and complex repairs. Label the corresponding arc. (see pg. 9)
5) Make necessary Phrase Movement Actions (see pg. 13) and delete any pronoun co-references no
    longer needed. Label the corresponding arc(s).
6) Insert additional nodes if needed. Label the corresponding arcs. (see “Insertion Actions” section on
    pg. 11)
7) Substitute word forms on the M-layer if needed. Label the corresponding arc. (see “Substitution
    Actions” section on pg. 12)
8) If an arc has multiple categories (ex. deleted node is both filler and part of a fragment), mark the arc
    with the lowest-order label as listed above (in this case, filler).
9) Once optimal reconstructive clean-up has been accomplished, give a Sentence Types label (see pg.
    6) to the SU to indicate the quality of the final reconstruction.
10) Verb-argument labeling: For each verb in the cleaned sentence, label its arguments as defined by
    the Unified Verb Index at http://www.cs.rochester.edu/~gildea/Verbs/.
                                        Figure 2: Suggested annotation procedure
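
Step 8 amounts to a fixed precedence among arc labels. A minimal sketch follows, assuming the precedence is the order in which steps 3 through 7 mention each action. Only “Delete filler” is a label string confirmed in this guide; the other strings, and the names LABEL_ORDER and choose_label, are hypothetical stand-ins.

```python
# Earlier entries win. The order mirrors steps 3-7 of the procedure;
# all label strings except "Delete filler" are illustrative assumptions.
LABEL_ORDER = [
    "Delete filler",                 # step 3
    "Delete repeat/repair",          # step 3
    "Delete leading coordination",   # step 3
    "Delete restart",                # step 4
    "Phrase movement",               # step 5
    "Delete co-reference",           # step 5
    "Insert",                        # step 6
    "Substitute",                    # step 7
]
RANK = {label: position for position, label in enumerate(LABEL_ORDER)}

def choose_label(candidates):
    """Pick the lowest-order label among those that apply to one arc."""
    return min(candidates, key=RANK.__getitem__)

# A deleted node that is both a filler and part of a restart fragment:
print(choose_label(["Delete restart", "Delete filler"]))  # Delete filler
```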

    2. Sentence Types
Annotators will assign sentence type (s-type) labels at the end of each sentence’s annotation process, as
an indicator of the completeness of the final reconstruction and its contribution to the content of the
conversation. Though a final step, these labels are important to understand early on. There are six
sentence types:

         1)   Backchannel
         2)   Well-formed sentence
         3)   Well-formed fragment with content
         4)   Fragment without content
         5)   Cannot fix sentence
         6)   Unannotated (default)

Each label has a keyboard shortcut as defined below, and can also be accessed by choosing the
“Sentence → Change sentence type” menu. Sentence type annotations for each sentence are illustrated
by a background color code and are listed on the menu bar of the annotation tool, as shown in Figure 3.




                                             Figure 3: Sentence type displayed



[3] Playing audio: As described above, the “Sentence → Play Sentence” command, or Ctrl+M, will play the displayed SU in its
entirety. Right-clicking and dragging across a set of words in the W-layer allows the annotator to listen to a segment of the SU.


1)    Backchannel (Ctrl+1)
A backchannel segment gives positive feedback and a response to the speaker without interrupting the
speaker or influencing the direction of the conversation. Typically, large portions of spoken responses in
a dialog are backchannels. A backchannel SU does not contribute content to the conversation, and thus
can be discarded without consequence or need for further editing. Note the difference between a
backchannel and a fragment.

     Examples:
      - Uh, um, mhm, and all other stand-alone filler words
      - Yeah, yes, right, correct, totally, true – simple confirmations or prompts for the other speaker to
        continue
      - “Oh my god” and other contentless interjections without a verb
      - “I know”, “I agree”

     Non-backchannel examples:
      - “No”, “I disagree” – These are not backchannels because they provide contradiction and
        contrast, and often impact the direction of the conversation. Mark instead as “Well-formed
        fragment with content”.
      - “That’s true” – Since a verb is included and the SU is grammatical, mark instead as
        “Well-formed sentence”.

2)    Well-formed sentence (Ctrl+2)
The final sentence is fluent and grammatical, as it might be if written in a newspaper.

3)    Well-formed fragment with content (Ctrl+3)
This label indicates that the final reconstruction contains content words (in other words, non-neutral
verbs or non-pronoun nouns), and it could be a substring in a grammatical sentence. However, some
element (perhaps a verb or an argument) is missing and complex analysis would be required to make the
necessary repairs. Note the difference between a well-formed fragment with content and a fragment
without content, discussed on page 8.

     Examples:
      - Sentences with heavy ellipsis (ex. “I remember”)
      - Noun phrases (ex. “Bob”, “The house around the corner”)
      - Any other set of content words that could be appended on either end to form a complete sentence
        without changing the set (ex. “so that it’ll get people’s attention”)
      - Sentences with an unfilled argument
        o He looks like he’s just looking for (fsh_117936B-43)
        o I’ve been watching so (fsh_117936B-53)
        o I think so because come on (fsh_117936B-67)
     NOT A FRAGMENT: “I wonder if there’s a separation between those that do things that are
     barbaric and those that don’t.” The verb phrase ellipsis at the end is okay here.

4)    Fragment, no content (Ctrl+4)
The SU does not contribute unique content to the conversation; discard.



     Examples:
      - You could
      - Which was that was that no (fsh_117936B-50)
      - I mean it is not (fsh_117936B-75)


 Differences between Fragments, “Backchannel” s-types
Both fragment types indicate partially expressed thoughts. A “Fragment without content” sentence type
is made up of function words and possibly a pronoun subject (ex "I would well the") but is incomplete
and does not provide new content to the dialogue, while a “Well-Formed Fragment with content” does
include new information, either through a non-neutral verb or a non-pronoun content word (ex. “But a
beautiful woman”).

A backchannel is a complete but contentless response to help the flow of conversation (ex "yeah", "I
see", "Mhm"). While backchannels are also incomplete sentences, they are constructed this way
intentionally by the speaker, and with the intention of contributing to the flow of conversation rather
than the content of the conversation.

5)       Cannot repair sentence (Ctrl+5)
The annotator made the best simple improvements possible to the original SU, but the final SU could not
be a clean substring in a grammatical sentence. This s-label should also be used if the annotator simply
doesn’t understand what was intended to have been expressed and therefore has low confidence in the
final reconstruction.

      That’d make that that group that that’s all that well I don’t know
    → That’d make that group that’s all well I don’t know

The reconstruction here deleted some repeats and a filler word and so is arguably an improvement from
the original text. However, the annotator judged the final SU as an unfixable ill-formed sentence.

          There used to be this I can’t remember the name of the group
        → There used to be this group I can’t remember the name of the group

Here the node “group” was duplicated to fill the argument of the first segment. The reconstructed SU
would ideally be split into two SUs, but doing so would mean losing the source w-node for the duplicate
“group”. Thus the sentence cannot be split further, and the annotation must end without making the SU
grammatical or clean.

6)       Unannotated: (default annotation, Ctrl+0)
Manual reconstruction is incomplete. Leave sentences you’d like to come back to as “Unannotated”
(Ctrl+U will automatically move you to the next unannotated sentence), but all sentences must
be assigned one of the labels 1-5 by the end of the annotation process.
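
The six sentence types and the “no unannotated sentences at the end” requirement can be summarized in a short sketch. The numeric values mirror the Ctrl+digit shortcuts. The names SentenceType, next_unannotated, and ready_to_submit are hypothetical, and the search is assumed not to wrap around to the start of the file, which may differ from the tool's behavior.

```python
from enum import IntEnum

class SentenceType(IntEnum):
    UNANNOTATED = 0                        # Ctrl+0 (default)
    BACKCHANNEL = 1                        # Ctrl+1
    WELL_FORMED_SENTENCE = 2               # Ctrl+2
    WELL_FORMED_FRAGMENT_WITH_CONTENT = 3  # Ctrl+3
    FRAGMENT_NO_CONTENT = 4                # Ctrl+4
    CANNOT_REPAIR = 5                      # Ctrl+5

def next_unannotated(labels, current):
    """Index of the first unannotated SU after `current`, or None if none remain."""
    for i in range(current + 1, len(labels)):
        if labels[i] == SentenceType.UNANNOTATED:
            return i
    return None

def ready_to_submit(labels):
    """True once every SU carries one of the labels 1-5."""
    return all(label != SentenceType.UNANNOTATED for label in labels)

labels = [
    SentenceType.WELL_FORMED_SENTENCE,
    SentenceType.UNANNOTATED,
    SentenceType.BACKCHANNEL,
    SentenceType.UNANNOTATED,
]
print(next_unannotated(labels, 1))  # 3
print(ready_to_submit(labels))      # False
```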







   3. Reconstruction Actions
All changes, or actions, made by the annotator during the reconstruction process must be documented
via labels on the arcs connecting the original sentence (w-layer) word nodes to the reconstructed
sentence (m-layer) word nodes. Reconstruction options include removing/ adding sentence boundaries,
deletions, inserting neutral elements, phrase movement, and tense/number substitutions. Each of these
types of changes has various subtypes, as described below. Example ID numbers such as (fsh_115051)
refer to examples in the Reconstruction Examples section on pg. 17.




                                        Figure 4: Arc label window

                                     Sentence Boundary Actions

The sentence segmentation for the given set of SUs can be altered through the “Sentence → Segment”
menu. During the course of annotation, consecutively spoken sentences from the original conversation
are listed in order, so the annotator can make educated judgments as to whether a sentence boundary was
improperly placed.

1) Remove sentence boundary/ Join SUs:

   This reconstruction action type is not relevant in this task, except as a means of undoing an
   erroneously inserted sentence boundary.

2) Add sentence boundary:
   New SU boundaries should be added if the SU expresses multiple distinct thoughts, some of which
   are sentences in their own right. Avoid adding sentence boundaries when the original SU can be
   cleaned into a well-constructed sentence without the boundary.

   -   See (fsh_118378A-8), (fsh_118378A-24) below on page 17 for SU splitting examples.


                                           Deletion Actions
1) Delete co-reference:
   When redundant references to the same entity exist in an SU, the less descriptive of the two co-
   references should be deleted, even if it forces phrase movement of the second referring phrase. Arcs
   from the original and deleted word should be inserted to connect the co-reference with its co-
   referent(s), all with the appropriate label. If the non-deleted coreferent is longer than five words, the
   deleted co-reference should be connected only to the head or main word of the phrase.

   See examples of this action in “Reconstruction Examples” (fsh_115051), (fsh_117936B-86), (fsh_117936B-
   93) on pg. 17.
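
   The five-word rule above can be stated compactly. This is a sketch under assumptions: the function name coreference_arc_targets is hypothetical, m-nodes are identified by position, and the annotator supplies the head of the phrase.

```python
def coreference_arc_targets(coreferent_span, head_index, max_words=5):
    """Return the m-node positions a deleted co-reference should be linked to.

    coreferent_span: positions of the surviving (more descriptive) phrase.
    head_index: position of that phrase's head or main word.
    """
    span = list(coreferent_span)
    if head_index not in span:
        raise ValueError("head_index must lie inside the coreferent span")
    if len(span) > max_words:   # "longer than five words": link to the head only
        return [head_index]
    return span                 # otherwise link to every word of the coreferent

print(coreference_arc_targets(range(2, 5), head_index=4))  # [2, 3, 4]
print(coreference_arc_targets(range(0, 7), head_index=6))  # [6]
```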

2) Delete fillers (filled pauses and discourse markers):
   -   Filled Pauses: uh, um, mhm, etc
   -   Discourse Markers: you know, you know what, so (as filler), oh, see, like, I mean
   -   Short interjections: (embedded question to self like “what was her name” or parts thereof).
       See (fsh_117936B-46) on page 15 for an interjection example (some tone lost)

   Arcs should be connected to the next word, except in the case of an SU-final filler (which should
   connect to the left).

   Not a filler:
   - so (with the meaning “thus” or “because”): “So that it’ll get people’s attentions” – a more
      grammatical sentence would result after deleting these terms, but the speaker’s intended meaning
      would be lost. Leave the terms and label with s-type “Well-formed fragment w/ content” instead.
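
   The arc rule for deleted fillers (connect to the next word, or to the left for an SU-final filler) can be sketched as follows. The function name filler_anchor is hypothetical, and “next word” is assumed here to mean the next word that survives in the reconstruction, skipping other deleted words.

```python
def filler_anchor(num_words, deleted, filler_index):
    """Return the position of the surviving word a deleted filler should link to.

    num_words: number of w-nodes in the SU.
    deleted: set of w-node positions removed from the m-layer.
    """
    # Preferred anchor: the next surviving word to the right.
    for j in range(filler_index + 1, num_words):
        if j not in deleted:
            return j
    # SU-final filler: fall back to the nearest surviving word on the left.
    for j in range(filler_index - 1, -1, -1):
        if j not in deleted:
            return j
    return None  # every word in the SU was deleted

words = ["um", "I", "saw", "it", "you", "know"]
deleted = {0, 4, 5}                            # "um" and SU-final "you know"
print(filler_anchor(len(words), deleted, 0))   # 1 ("I")
print(filler_anchor(len(words), deleted, 4))   # 3 ("it")
```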

3) Delete leading coordinations:
   -   and: almost always drop, unless somehow important to meaning of sentence
   -   but: leave to preserve contrast unless unimportant to sentence meaning
   -   because: delete unless needed to show explicit rationale
   -   or: delete unless needed to show explicit contrast (ex. “or or I’m not a vegan”: the “or” here is a
       fragment and is not used to show contrast; audio assistance is useful)

4) Delete extraneous phrase:
   Only delete if it seems that sentence meaning will not be affected.
   -   It was with that guy Raul Compos or something like that (fsh_117936B-51)
   -   They noticed he did not know how to hold utensils right or anything (fsh_117936A-20)
   -   Especially now they can do things with stem cells and all that (fsh_118378B-5)

5) Delete repeat/repair and delete restart
   Repeats, repairs, and restarts are all incomplete thoughts (also known as reparanda) interrupted by a
   new thought. However, many differences exist, as described here.
   - Both repeats and repairs are part of a “rough copy”, where contents have a direct generation
       point (antecedent) in the surviving portion of the sentence with identical or highly similar
       wording.
           o Repeat example: “If you if you get a T. V. show like C. S. I.”
           o Repair example: “On Tuesday I mean on Wednesday I am going to visit my uncle”
           o See other examples in the “Reconstruction Examples” section below.

   -   Repeats can also be words repeated for emphasis; we sacrifice the emphasis for a cleaner SU.
          o “you’re talking about big big time producers” (fsh_117936B-68)

             o “I really really hope so”

         Arcs from each deleted word should connect to the corresponding preserved word in the phrase.

    -    A restart is an abandoned thought (or part of an abandoned thought) whose preservation would
         not increase the content of the reconstructed sentence.
             o “I don’t I have cousins that are vegetarian.”
             o “Well I just think because on the computer there’s just so much stuff” (fsh_117716B-10)

                 However I think the technology works needs to be gone a step further
               → However I think the technology needs to go a step further
                  Here we delete “works” as a restart fragment (replaced with “needs”). We change the
                  tense of “gone” to “go”, which leaves us with the extra function word “be”, which
                  should be linked to “gone” with the label “Substitute Tense” since “be” originated as a
                  modifier of the base-form verb “go”.
         Arcs from all deleted words in the restart should connect to the first word in the new phrase.

         If content would be lost by deleting this restart region, the SU should instead be split, with
         the fragment portion labeled as “Well-formed fragment w/ content”.
             o “And the doctor basically” + “I said these medications are a pain in the ass.”

    -    Repair or Restart? Open questions:
            o “so_1 that they_1 could so_2 they_2’ll sell”
                Only “so_1” and “they_1” have rough copies elsewhere in the sentence. Regardless, label all
                four initial words as repeat/repair.
            o “that I don’t I’m I don’t really know anybody that’s vegetarian”
                “I don’t” looks like a reparandum, but “I’m” and “that” seem to be part of a fragment. Label
                all words as restarts to be deleted.
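
The simplest case in this section, an exact adjacent repeat, can be found automatically, and the arc rule (“each deleted word connects to the corresponding preserved word”) falls out of the match. The sketch below is an illustration only: find_exact_repeats is a hypothetical name, matching is case-insensitive by assumption, and repairs or restarts such as “On Tuesday I mean on Wednesday” still require annotator judgment.

```python
def find_exact_repeats(words, max_len=4):
    """Map each word of an immediately repeated phrase to its preserved copy.

    Returns {deleted_position: preserved_position}. Only verbatim repeats of
    up to max_len words that directly follow one another are detected.
    """
    lowered = [w.lower() for w in words]
    total = len(lowered)
    arcs = {}
    i = 0
    while i < total:
        step = 1
        # Try the longest candidate phrase first.
        for n in range(min(max_len, (total - i) // 2), 0, -1):
            if lowered[i:i + n] == lowered[i + n:i + 2 * n]:
                for k in range(n):
                    arcs[i + k] = i + n + k
                step = n
                break
        i += step
    # For runs such as "that that that", point every deleted copy at the
    # final surviving copy rather than at another deleted word.
    for deleted in list(arcs):
        target = arcs[deleted]
        while target in arcs:
            target = arcs[target]
        arcs[deleted] = target
    return arcs

print(find_exact_repeats("If you if you get a T. V. show".split()))  # {0: 2, 1: 3}
print(find_exact_repeats("I really really hope so".split()))         # {1: 2}
```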


                                                    Insertion Actions

Reconstruction Annotation actions are meant to be limited to the type a computer might learn to repair,
without the advantage of world knowledge, or even of topics mentioned previously in the conversation.
Thus, we severely restrict the types of allowable word insertions during annotation.

Legal insertions include inserting function words (such as prepositions, conjunctions, articles, and
relative pronouns4), inserting “to be” or “to have” neutral and auxiliary verbs5, and inserting the neutral
noun phrase node _NOUN_.
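
The restriction above can be read as a simple membership test. The following is only a rough sketch: the word lists are the examples given in footnotes 4 and 5 (not a complete inventory), and the names FUNCTION_WORDS, NEUTRAL_VERB_FORMS and insertion_type are hypothetical, not part of the annotation tool.

```python
# Illustrative word lists copied from the examples in footnotes 4 and 5.
# They are NOT exhaustive; the category names are assumptions for this sketch.
FUNCTION_WORDS = {
    "preposition": {"in", "of", "on", "from"},
    "conjunction": {"and", "but", "or"},
    "article": {"a", "an", "the"},
    "relative_pronoun": {"which", "that"},
}

# Inflected forms of the neutral verbs "to be" and "to have".
NEUTRAL_VERB_FORMS = {
    "be", "am", "is", "are", "was", "were", "been", "being",
    "have", "has", "had", "having",
}

NEUTRAL_NOUN = "_NOUN_"


def insertion_type(word):
    """Return the insertion category for `word`, or None if it is not a legal insertion."""
    if word == NEUTRAL_NOUN:
        return "neutral_noun"
    token = word.lower()
    if token in NEUTRAL_VERB_FORMS:
        return "neutral_verb"
    for category, words in FUNCTION_WORDS.items():
        if token in words:
            return category
    return None
```

Under these assumptions a content word such as “vegetarian” returns None, meaning an annotator should not insert it.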




4
  A preposition is a word like “in”, “of”, “on”, “from” which provides relationships between nouns and verbs. A conjunction
is a word connecting phrases or entities like “and”, “but”, “or”. An article is a word like “a”, “an”, “the” which gives the
degree of specificity of a noun phrase. Relative pronouns include “which” and “that”.
5
  Neutral verbs include “to be” and “to have”. Auxiliary verbs include “is” in “is eating”, “have” in “have eaten”, “do”, “will”,
“can”, etc.


1) Insert function word
   Again, legal function word insertions include prepositions, conjunctions, articles, and relative
   pronouns. See footnote 4.

      But it’s not like the movies have a standard rating system
    → But it’s not like the movies which have a standard rating system

   Assisted here by audio, the function word “which” was inserted by an annotator to better represent
   the intentions of the speaker.

   Arcs should connect to the word(s) that required the inserted function word. For example, an inserted
   determiner like “the” should connect to the main noun of its noun phrase, and a preposition should do
   the same. A phrase-connecting word like “which” or “and” should have multiple arcs, connecting to
   the main word of each phrase being connected. Above, “which have a standard rating system”
   describes the noun “movies” and has the main verb “have”, so “which” would link to “movies” and
   “have”.

2) Insert the neutral argument _ARG_
   While the noun “it” is debatably neutral, inserting “it” to fill missing arguments can still influence
   meaning and lead to content loss so we use a more neutral insertion instead. Use this insertion type
   to fill simple missing arguments.
   -   “Still wants to party” → “_ARG_ still wants to party”
   An arc should link to the verb or adjective requiring the additional argument.

3) Insert neutral verb
   The term “neutral verb” is defined in Footnote 5.
      I actually working in Jersey
    → I am actually working in Jersey

   Typically an inserted verb is auxiliary to some main verb, and it should have a labeled arc
   connecting it to that verb (ex. above, “am” would link to the verb “working”).

                                           Substitution Actions

Legal substitution moves include changing number or tense of terms while keeping the same baseform
and correcting transcription errors such as substituted homophones (ex. “there” instead of “their”).

Don’t forget to change the arc label as appropriate after all substitution actions!

1) Substitute Tense/ Number Change
      I haven’t saw the old one but I saw the new one
    → I haven’t seen the old one but I saw the new one (fsh_117936A-12)

   A substitution action not only requires an explanatory label on the arc connecting the new word to its
   original form, but the word node itself should have a substitution tag on it. You can get to a list of



   word tag options via the “ArcEdit” menu, double-clicking the word m-node, or pressing the
   SPACE bar when the word node is selected. The menu shown in Figure 5 will then appear.




                             Figure 5: M-node annotation for substituted node

   For the English version of the tool, lemmas (root words) cannot be changed, but an appropriate tag
   for the replacement word should be chosen and the form field should display the newly substituted
   word.

   Tag options include:
      1) Verb: Infinitive (“[to] take”, “be”)
      2) Verb: Past tense (“took”, “was”)
      3) Verb: Present tense, non-3rd person singular (“take”, “are”)
      4) Verb: Present tense, 3rd person singular (“takes”, “is”)
      5) Verb: Gerund/ present participle ([is] “taking”, “being”)
      6) Verb: Past Participle ([have] “taken”, “been”)
      7) Verb: Modal or auxiliary verb (“should”, “might”, “could”, etc)
      8) Noun: Singular
      9) Noun: Plural

2) Substitute: Transcriber Error
   -   inpinged → impinged
   -   to → too and other homophones (i.e. words with different spellings but the same sounds)


                                     Phrase Movement Actions

   Just as is done following “Join SU” moves, every phrase movement action should be followed by
   adding a new set of arcs between the words of the phrase to the main word they modify (often a
   verb).






1) Phrase Movement: Adjunct
   An adjunct phrase is an optional set of words that modifies the meaning of the sentence by
   providing additional context, like “on The Bachelor” or “last Friday”.
   Examples:
      In the house my mother would put us there
    → My mother would put us there in the house (arcs from “in the house” to “there”)
      On the Bachelor do you remember when she had those twenty five guys
    → Do you remember when she had those twenty five guys on the Bachelor

   Arcs should connect the phrase words to their original positions, and the main word of the
   moved phrase (main preposition > verb > noun > adjective/adverb > function word) should also
   connect to the main verb being modified. For the first example above, “in” would link to “put” and in the
   second example, “on” would link to “had”.

2) Phrase Movement: Argument
   See example in Figure 6 and (fsh_117936- 93) on page 15.




                    Figure 6: An example of phrase movement annotation (fsh_117936B-93)

   An arc should also be included from the main word of the moved phrase to the verb for which the
   phrase is an argument.

3) Phrase Movement: Fix Grammar
   At times, other parts of a statement are spoken out of the standard grammatical order without
   correction. For any phrase movement done for reasons of grammar rather than cleaner placement of
   adjuncts or verb arguments, the “Phrase Movement: Fix Grammar” arc label should be used.
   Example:
      It didn’t happen until like the next afternoon where we had both forgotten about it sort of
    → It didn’t happen until the next afternoon where we had both sort of forgotten about it
      (117716a_8)






      4. Verb-Argument Labeling6 -- To be done in the future
Often an ill-formed SU is considered to be poorly constructed because it is missing vital information, or
otherwise includes too much useless information. We hope that learning to identify the core elements of
a sentence will assist in finding and correcting these problems. Therefore, once we have completed the
reconstruction process for each well-formed sentence, we will label the verbs and arguments in the
sentence to assist in future work for poor construction identification and correction.

All well-formed sentences, and some sentence fragments, include one or more verbs each paired with a
set of arguments. Some verbs require only one argument (ex. “[I] worked.”), and some several (ex. “[He]
gave [Mary] [a present].”). These arguments often fill the commonly described “Subject”
and “Object” roles, but in fact a complex set of “thematic roles” has been defined by the linguistic
community. Fortunately, it is not a goal of this annotation effort to define each of these argument types.
We will label arguments according to the PropBank model, using the argument definitions given for
each verb via “Frameset” webpages at http://www.cs.rochester.edu/~gildea/Verbs/.




                               Figure 7: Unified Verb Index pages for argument reference

      For any given verb, there may be up to five mandatory or directly influencing arguments, and any
      number of optional modifying arguments which give verb-independent context to the sentence (ex. a
      location or a time).




6
    Only necessary for Well-Formed Sentences!




                                     Figure 8: Argument label selection

   Given the infinitive form of any verb (ex. “(to) know” or “be” but not “knew”, “knowing”, or “is”),
   the Propbank model has defined an argument label for all standard arguments. Loosely, the labels
   are as follows:

      Arg0 ≈ agent causing a change of state (ex. the knower, the giver, the hitter)
      Arg1 ≈ direct object / theme / “patient” (undergoes change of state) (ex. thing given, thing hit)
      Arg2 ≈ indirect object / benefactive / instrument / attribute / end state (ex. recipient of gift)
      Arg3 ≈ start point / benefactive / instrument / attribute
      Arg4 ≈ end point
      Additional modifying arguments (ArgM) include ArgM-TMP for time-oriented information and
       ArgM-LOC for location-oriented information pertaining to the given verb (ex. yesterday, in the
       park)

When a verb has multiple senses, it is important to label its arguments for the appropriate sense, as seen
in the two Framesets for “leave” in Figure 9.

            Frameset leave.01 "move away from":      Frameset leave.02 "give":
            Arg0: entity leaving                     Arg0: giver
            Arg1: place left                         Arg1: thing given
            Ex. [John]Arg0 left [the store]Arg1.     Arg2: beneficiary
                                                     Ex. [John]Arg0 left [Anna]Arg2 [a note]Arg1.
                                Figure 9: Dual framesets for the verb "leave"
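
As a rough illustration of why the sense matters, the two framesets in Figure 9 can be written as a lookup table and used to check a set of labels. This is only a sketch: the FRAMESETS dictionary copies the roles listed in Figure 9, while the layout and the check_labels helper are hypothetical and are not part of PropBank or the annotation tool.

```python
# Roles copied from Figure 9; the dictionary layout is an assumption of this sketch.
FRAMESETS = {
    "leave.01": {
        "gloss": "move away from",
        "roles": {"Arg0": "entity leaving", "Arg1": "place left"},
    },
    "leave.02": {
        "gloss": "give",
        "roles": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "beneficiary"},
    },
}


def check_labels(frameset_id, labeled_args):
    """Return the labels in `labeled_args` that the chosen frameset does not define.

    ArgM-* modifiers are verb-independent, so they are always accepted.
    """
    roles = FRAMESETS[frameset_id]["roles"]
    return sorted(
        label
        for label in labeled_args
        if label not in roles and not label.startswith("ArgM")
    )


# "[John]Arg0 left [Anna]Arg2 [a note]Arg1." fits leave.02 but not leave.01.
note_example = {"Arg0": "John", "Arg2": "Anna", "Arg1": "a note"}
print(check_labels("leave.02", note_example))
print(check_labels("leave.01", note_example))
```

An empty result means every label is defined for that sense; a non-empty result suggests the wrong frameset was chosen.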

Phrasal verbs: Often in English, verbs take the form of “phrasal verbs”, where a typical verb word is
followed by a preposition or particle, such as in “leave out” or “go on”, and hence the sentence verb is
really a collection of words instead of only one. Watch for this carefully to be certain that the correct
frameset (typically on the same page under the heading “verb_particle” (eg. “leave_out”)) is used. Any
particles to be included in the phrasal verb should be labeled as “Verb particle”.

What if the verb isn’t listed in the Propbank Framesets? Speak with the moderator – the group of
annotators will come to a collective decision on any verbs not immediately included in the Propbank
framesets.




   5. Reconstruction Examples
For each example below, the sentence ID is given to the left. Here,
(fsh_117936B-24) corresponds to speaker B’s 24th utterance in Fisher conversation id #117936. In addition,
we’ve listed
       The original verbatim (W-layer) transcript of the SU
       An example reconstruction (M-layer) of the SU
         o A list of actions taken to transform the W-layer text to the M-layer text
         o Links: Which reconstruction (M-layer) node should link to the original (W-layer) node?
          Verb and Argument Identification
The list of edit actions taken is followed by some explanatory text on considerations taken when
determining reconstruction decisions.
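
The sentence ID convention described above can be unpacked mechanically. The sketch below assumes the full pattern fsh_<conversation><speaker>-<utterance>; the function name parse_su_id and the returned field names are hypothetical. IDs in this guide that omit the speaker or utterance number (for example fsh_115051) do not match the pattern, and the function returns None for them.

```python
import re

# Assumed pattern: "fsh_" + conversation digits + speaker letter (A or B) + "-" + utterance number.
SU_ID = re.compile(r"^fsh_(?P<conversation>\d+)(?P<speaker>[AB])-(?P<utterance>\d+)$")


def parse_su_id(su_id):
    """Split a sentence ID into its parts, or return None if it does not fit the pattern."""
    match = SU_ID.match(su_id.strip("()"))
    if match is None:
        return None
    return {
        "conversation": match["conversation"],
        "speaker": match["speaker"],
        "utterance": int(match["utterance"]),
    }


print(parse_su_id("(fsh_117936B-24)"))
```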

(fsh_117936B-24)      He1 and2 he3 almost4 gave5 it6 away7 that8 one9 time10 with11 with12 what13 was14
                       it15 Zora16 I17 think18
                    I think hem3 almost gave it away that one time withm12 Zora.
                       o   Delete reparandum: he1, with11; link to corresponding repair (hem3, withm12)
                       o   Delete leading coordination: and; link to the right (he3)
                       o   Delete filler: what was it; link to the left (with12)
                       o   Phrase movement: Arg: I think; link both words to “gave”
                          Verb and Argument Identification
                            o think: Arg0 (thinker)=I, Arg1(thought)= “I think hem3 almost gave it...”
                            o give_away: Arg0 (giver): He, Arg1 (thing given): it

                   Here “I think” must be preserved for tone and as a measure of uncertainty. Its usage
                   maintains the uncertainty from “what was it”, which can be deleted.
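
The deletions and the phrase movement in this example can be replayed on the numbered W-layer tokens to check that they produce the M-layer string. This is only a sketch of the bookkeeping, not of the annotation tool: the reconstruct function is hypothetical, positions are 1-based to match the subscripts above, and arcs and labels are ignored.

```python
# W-layer tokens of (fsh_117936B-24), in spoken order.
W_LAYER = (
    "He and he almost gave it away that one time "
    "with with what was it Zora I think"
).split()


def reconstruct(tokens, deleted, moved_to_front):
    """Drop deleted positions and move the given positions to the front.

    Positions are 1-based, like the word subscripts in the example.
    """
    front = [tokens[i - 1] for i in moved_to_front]
    skip = set(deleted) | set(moved_to_front)
    rest = [tok for i, tok in enumerate(tokens, start=1) if i not in skip]
    return " ".join(front + rest)


# he1, and2, with11 and the filler "what was it" (13-15) are deleted;
# "I think" (17-18) is moved to the start of the sentence.
DELETED = {1, 2, 11, 13, 14, 15}
MOVED_TO_FRONT = [17, 18]

print(reconstruct(W_LAYER, DELETED, MOVED_TO_FRONT))
# I think he almost gave it away that one time with Zora
```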

    (fsh_115051)      But1 when2 I3 watched4 the5 news6 like7 the8 evening9 news10 or11 the12 late13
                       news14 and15 a16 lot17 of18 the19 people20 there21 the22 citizens23 you24 know25 the26
                       people27 there28 they29 were30 really31 against32 it33
                    But when I watched them8 eveningm9 newsm10 or them12 latem13 newsm14 a lot of
                       them22 citizens andM them26 peoplem27 therem28 were really against it

                       o Delete reparandum: the news{5-6}, the people there{19-21};
                         link to the corresponding repairs (the newsm{8,10}, the people therem{26-28})
                       o Delete filler: like7, you know{24-25}; link to the left (newsm6, citizensm23)
                       o Delete unnecessary function word: and15; link to the left (news14)
                       o Insert function word: andM; link to left (citizensw23)
                       o Delete co-reference: they29; link to head of “the citizens andM the people
                         there”, which is andM.
                        Verb and Argument Identification:
                         o watch: Arg0=I, Arg1(thing looked at)= “the evening news or the late
                            news”
                         o be: Arg0= “a lot of the citizens and the people there”, Arg1= “really against it”



                    This example includes the cleanup of several reparanda and fillers. There is also
                    the less trivial deletion of and15 (since phrase 1-14 is an adjunctal phrase for the
                    main sentence and not a sentence in its own right), insertion of connecting
                    conjunction andM between noun phrases “the citizens” and “the people”, and the
                    identification of they29 as a redundant co-reference.

(fsh_117936B-46)       You1 know2 what3 there4 was5 this6 other7 show8 where9 where10 was11 it12 like13 a14
                        it15 was16 it17 the18 Joe19 Millionaire20
                     There was this other show Joe Millionaire
                        o Delete Filler: you know what{1-3}, it was{15-16}, it17; link to the right
                        o Delete Reparandum: where1; link to the corresponding repair (where2)
                        o Delete Fragment: where2, was it like a{11-14}; link to the right (Joe19) and move
                          wherew1 wherem2 arc to wherew1Joem19
                        o Delete extra function words: the18; attach right
                         Verb and Argument Identification
                          o be (existential): Arg1= “this other show Joe Millionaire”. * NOTE: “There”
                            is a marker of existence, and not an argument.

(fsh_117936B-86)     The1 the2 Joe Millionaires we know that1 that2 ’s1 that3 ’s2 not going anywhere
                     We know that the Joe Millionaires are not going anywhere
                        o Delete Reparandum: the1, that2, ’s1
                        o Phrase Movement - Argument: the Joe Millionaires is subject of the verb “to
                          be” (original text: ’s2) and moves rightward
                          (add arc label for movement head?)
                        o Delete Co-reference: that3 attaches to head of NP, here Millionaire
                        o Substitution - Tense Change: ’s2 changes to are
                         Verb and Argument Identification
                          o know: Arg0= We, Arg1= “the Joe Millionaires are not going anywhere”
                          o go: Arg1 (go-er)=“the Joe Millionaires”, Arg4 (end point): anywhere

 (fsh_117936- 93)      For the1 the2 For Love or Money people I1 I2 I3 think there’s a fifty1 fifty2 chance it
                        might work out with them
                       I think there’s a fifty fifty chance it might work out with the For Love or Money
                        people
                        o Delete Reparandum: the1, I1, I2
                        o Phrase Movement (Argument): the For Love or Money people is an argument
                          of them
                          (add arc label for movement head?)
                        o Co-reference: them attaches to NP head people
                         Verb and Argument Identification
                          o think: Arg0= I, Arg1= “there’s a fifty fifty chance it might...people”
                          o be (existential): Arg1= “a fifty fifty chance it might... people”
                          o work_out: Arg1 (scheme)= it, Arg2= “the For Love or Money people”

 (fsh_118378A-8)       All I know about as far as this stuff goes are two things kosher because I’m Jewish



                    All I know about as far as this stuff goes are two things
                    Kosher because I’m Jewish

                       o Split Sentence
                        Verb and Argument Identification
                         o know: Arg0= I, Arg2= “two things”
                         o be: Arg0= All I know about, Arg1= “two things”

                   Here we have the start of a listing of items. Making this SU grammatical as one
                   sentence will be very difficult. It is better to separate the SU into one complete
                   sentence and one contentful fragment.

(fsh_118378A-24)      I think it also has to do with people just feeling the pain for you1 know1 I1 mean1
                       you can’t look into a puppy’s eyes and kick it
                      I think it also has to do with people just feeling the pain for _NOUN_.
                      You can’t look into a puppy’s eyes and kick it.
                       o Insert neutral noun _NOUN_.
                       o SU break insertion
                        Verb and Argument Identification
                         o think: Arg0= I, Arg2= “two things”
                         o has_to_do_with: Arg0= I, Arg2= “two things”
                         o feel: Arg0= I, Arg2= “two things”

                          Troubleshooting the annotation tool
   “Ctrl-n, Ctrl-p, other shortcuts are not responding correctly”
    Answer: Make sure that Caps Lock has not been turned on
   “I’m trying to change the current SU boundaries via one of the “SentenceSegment” commands,
    but nothing is happening”
    Answer: Make sure that an appropriate node with a connecting arc is selected; if an arc is selected,
    or a node with no arc is selected, then no action will occur.



