Tutorial on Standoff Markup

Shared by: techmaster
Posted: 10/29/2008
Tutorial on Standoff Markup as used in:
• HCRC Map Task Corpus
• MATE/NITE Workbench

Amy Isard
HCRC Language Technology Group
University of Edinburgh

Standoff Annotation
• Don’t keep all your data in one big document
• One document for each annotation level (with its own DTD)
• Links between documents

LTG link syntax (1)
• an element can point to one or more contiguous elements in the same or a different document
• each element is identified by a unique ID
• a link is shown as an attribute on an element
• default attributes in the DTD tell a program that this is a link

LTG link syntax (2)
• attributes to describe a link which will be embedded in the original element output document
  href      CDATA  #IMPLIED
  xml:link  CDATA  #FIXED "simple"
  show      CDATA  #FIXED "embed"
  actuate   CDATA  #FIXED "auto"

Standoff Example (1): Words XML
turn right for three centimetres okay

Standoff Example (2): Moves XML

Standoff Example (3): Moves and Words XML
Surviving markup fragments:
  "words.dtd">
  speaker="spk1" id="m1" href="words.xml#id(w1)..id(w5)"/>
  href="words.xml#id(w6)"/>
Words: turn right for three centimetres okay

Advantages of Standoff Annotation
• It is possible to have levels of annotation which have crossing branches (not normally possible in XML)
• New levels of annotation can be added without disturbing existing ones
• Editing one level of annotation has minimal knock-on effects on others
• People can work on different levels at the same time without worrying about creating different versions

Example Map Task Annotation Structure
[Figure: annotation layers over a dialogue excerpt: Dialogue Games, Dialogue Moves (instruct, align, ack), Words for speakers S1 and S2, and Disfluencies (reparandum, repair)]

HCRC Map Task XML Corpus Architecture
[Figure: corpus annotation levels: Gaze, Timed Units, Disfluencies, Landmark References, Moves, Transactions, Other Speaker’s Words, Tokens, Tagged Words, Automatic Syntax, Games]

Tools and Software
• LTXML tools www.ltg.ed.ac.uk/software
• MATE workbench (NITE)
  mate.nis.sdu.dk (nite.nis.sdu.dk)
• Map Task XML www.hcrc.ed.ac.uk/maptask

knit
• Part of the LTXML toolkit
• Allows you to “expand” links according to how they have been defined in the DTD (e.g. replace or embed)
• Command line program, can be used in pipelines

Standoff Example (4): Moves XML with embed links
turn right for three centimetres okay

Standoff Example (4): Moves XML with replace links
turn right for three centimetres okay

Working with knit
• Use knit on one XML document to work with one hierarchical view of the data
• To work across hierarchies, knit several views and navigate using the structures plus the unique ids of elements

Stylesheets
• style sheet: template rules
  – pattern which specifies which tree it applies to
  – pattern which specifies which tree it should output
• stylesheet processor
  – reads XML document and stylesheet
  – carries out the instructions in the stylesheet
  – outputs a new XML document or

Template Matching
• XPath is a language for addressing parts of an XML document, and is used by XSLT in the match attribute of a template e.g.
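
To make the idea of addressing parts of an XML document concrete, here is a minimal Python sketch using the limited XPath subset in the standard library's xml.etree.ElementTree. It is not the example from the original slides. The element names (moves, move, word), the second move's id and speaker, and the knitted layout are assumptions modelled on the embed example above. It shows path selection only, not XSLT template matching.

```python
import xml.etree.ElementTree as ET

# Hypothetical result of knitting moves with embedded words. The tag names
# <moves>, <move>, <word> and the values m2/spk2 are assumed for illustration;
# they are not taken from the actual Map Task DTDs.
KNITTED_MOVES = """
<moves>
  <move id="m1" speaker="spk1">
    <word id="w1">turn</word>
    <word id="w2">right</word>
    <word id="w3">for</word>
    <word id="w4">three</word>
    <word id="w5">centimetres</word>
  </move>
  <move id="m2" speaker="spk2">
    <word id="w6">okay</word>
  </move>
</moves>
"""

root = ET.fromstring(KNITTED_MOVES)


def select_text(path):
    """Return the text of every element matched by an ElementTree path."""
    return [element.text for element in root.findall(path)]


# All words anywhere below the root element.
all_words = select_text(".//word")

# Words inside moves by one speaker, selected with an attribute predicate.
spk1_words = select_text("move[@speaker='spk1']/word")

# A single element addressed by its unique id.
w6 = root.find(".//word[@id='w6']")

print(all_words)
print(spk1_words)
print(w6.text)
```

A full XSLT processor accepts richer patterns than ElementTree does, but the principle is the same: a path expression picks out the subtree that a rule should act on.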