               Transforming XML
Canadian Symposium on Text Analysis
         University of Alberta
                2005-10-07

               David J. Birnbaum
            University of Pittsburgh
                djbpitt+@pitt.edu
       http://clover.slavic.pitt.edu/~djb/
                  Outline
• What can we do with XML?
  – Standards-based archival preservation
  – Data mining and analysis
  – Rendering
• How do we get rid of the angle brackets?
  – Transformation: eXtensible Stylesheet
    Language for Transformations (XSLT)
  – Decoration: Cascading Stylesheets (CSS)
                      XSLT
• Programming language for manipulating XML
• Tree transformation
  – Introduce decorative (formatting) information
  – Rearrange elements (unlike CSS)
  – Extract information from XML documents for further
    processing
  – Transform XML to XML, XHTML, HTML, plain text,
    PDF (via XSL-FO), etc.
  – Multipurposing
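
To illustrate extraction to a non-XML format, here is a minimal sketch of a stylesheet that writes only a document's title as plain text. It assumes, purely for illustration, an input whose root element is text with a title child; other inputs would need a different select path.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <!-- method="text" asks the processor to emit plain text with no markup -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Start at the document node and pull out just the title
       (the path text/title is an assumption about the input) -->
  <xsl:template match="/">
    <xsl:value-of select="text/title"/>
    <!-- Character reference 10 is a newline -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Changing the method attribute to "xml" or "html" switches the output format without touching the input document, which is the basis of multipurposing.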
                       CSS
• Inline, internal, or external
• Formatting (“decorate the tree”)
• Override default XML or HTML rendering
  – OL OL {LIST-STYLE-TYPE: lower-alpha}
  – IMG {BORDER-TOP-STYLE: none; BORDER-RIGHT-STYLE: none;
    BORDER-LEFT-STYLE: none; BORDER-BOTTOM-STYLE: none}
  – .important {color: red}
• The class attribute and the <span> element
  – <p class="important"> ... </p>
  – <span class="important"> ... </span>
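
The three placements of CSS, and the class and span usage above, can be sketched in one small XHTML page. The file name style.css and all of the text content are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>CSS placement</title>
    <!-- External: rules live in a separate file (style.css is a hypothetical name) -->
    <link rel="stylesheet" type="text/css" href="style.css"/>
    <!-- Internal: rules embedded in the document head -->
    <style type="text/css">
      .important {color: red}
    </style>
  </head>
  <body>
    <!-- The class selector applies to a whole paragraph -->
    <p class="important">This whole paragraph is red.</p>
    <!-- span marks a run of text inside a paragraph -->
    <p>Only <span class="important">these words</span> are red.</p>
    <!-- Inline: a rule attached directly to one element -->
    <p style="color: red">This paragraph is styled inline.</p>
  </body>
</html>
```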
       Stylesheet Examples
• Igor tale (two versions)
• Russian fairy tales
• Metrical analysis of Russian verse
• Plectogrammatic representation of
  medieval Slavic manuscripts
• Russian incantations
                zagovory.dtd
<!ELEMENT text (title, stanza+)>
<!ELEMENT title (#PCDATA | variant | notegroup)*>
<!ELEMENT notegroup (item, variant?, note)>
<!ELEMENT item (#PCDATA | variant)*>
<!ELEMENT note (p | stanza)*>
<!ELEMENT p (#PCDATA)>
<!ELEMENT stanza (line+)>
<!ELEMENT line (#PCDATA | variant | notegroup)*>
<!ELEMENT variant (rdg)+>
<!ELEMENT rdg (#PCDATA)>
<!ATTLIST rdg wit (likh | p | k) "likh">
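
For orientation, a small instance document that should validate against the declarations above might look like this. All of the text content is placeholder material, not taken from the actual corpus.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE text SYSTEM "zagovory.dtd">
<!-- Placeholder content only; the structure follows zagovory.dtd -->
<text>
  <title>Placeholder title</title>
  <stanza>
    <line>A line with no variants</line>
    <!-- variant holds one or more readings; the first rdg omits wit,
         so the DTD default "likh" applies to it -->
    <line>A line with <variant><rdg>one reading</rdg><rdg wit="p">another reading</rdg></variant></line>
    <!-- notegroup requires an item and a note, in that order -->
    <line>A line with a <notegroup><item>glossed word</item><note><p>An editorial note.</p></note></notegroup></line>
  </stanza>
</text>
```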
        XSLT is Declarative
• Procedural
  – Most traditional programming languages
  – First do step 1, then do step 2
• Declarative
  – XSLT
  – I’ll throw a document at you, and each time
    you meet an element, look up what to do with
    it
  – Order of templates (element rules) is free
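
A minimal sketch of what "order is free" means, using element names from zagovory.dtd. The choice of HTML output elements (h1, p, br) is an assumption for illustration. The rule for the innermost element is listed first and the rule for the root element last, yet the processor still starts from the top of the input document and looks up the matching rule for each element it meets.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <!-- Innermost element, listed first -->
  <xsl:template match="line">
    <xsl:apply-templates/>
    <br/>
  </xsl:template>

  <xsl:template match="stanza">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

  <xsl:template match="title">
    <h1>
      <xsl:apply-templates/>
    </h1>
  </xsl:template>

  <!-- Root element, listed last; it is still processed first -->
  <xsl:template match="text">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Elements without a rule of their own here (variant, rdg, notegroup, and so on) fall through to XSLT's built-in rules, which output their text content.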
             XSLT is XML
• Should begin with the XML declaration (the “magic
  incantation”; optional in XML 1.0, but recommended)
 <?xml version="1.0" encoding="UTF-8"?>


• Must be well-formed
  – Document must have a single root
  – All elements must nest properly
  – Output elements, not tags (see below)
            <xsl:stylesheet>
Root element is <xsl:stylesheet>

<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
      (contents go here)
</xsl:stylesheet>
                  Templates
• Stylesheet contains template rules
• Template rules process elements
• <xsl:apply-templates/> means “process the
  contents”

<xsl:template match="ordList">
  <ol>
    <xsl:apply-templates/>
  </ol>
</xsl:template>
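
Placed inside a complete stylesheet, the rule might look like the sketch below. The second rule, which maps listItem to li, is an assumption added for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <!-- Wrap the processed contents of each ordList in an HTML ol -->
  <xsl:template match="ordList">
    <ol>
      <xsl:apply-templates/>
    </ol>
  </xsl:template>

  <!-- Assumed rule: each listItem becomes an li -->
  <xsl:template match="listItem">
    <li>
      <xsl:apply-templates/>
    </li>
  </xsl:template>
</xsl:stylesheet>
```

Applied to an input such as an ordList containing two listItem children, a conforming processor should produce an ol element containing two li elements, each holding the text of the corresponding item.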
   Output Elements, not Tags
• OK
  <xsl:template match="ordList">
    <ol>
      <xsl:apply-templates/>
    </ol>
  </xsl:template>
• Not OK (<ol> is never closed, so the stylesheet
  is not well-formed)
  <xsl:template match="ordList">
    <ol>
      <xsl:apply-templates/>
  </xsl:template>
                   XPath
• An XPath pattern identifies which elements a template matches

<xsl:template match="ordList/listItem/ordList">
  <ol style="LIST-STYLE-TYPE: lower-alpha">
    <xsl:apply-templates/>
  </ol>
</xsl:template>
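
A sketch of how a general rule and a more specific rule can coexist in one stylesheet. The listItem rule is an assumption added so that the example is complete. Under XSLT 1.0's default priorities, a single-name pattern such as ordList has priority 0, while a multi-step pattern such as ordList/listItem/ordList has priority 0.5, so the specific rule is chosen for nested lists.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <!-- General rule: any ordList (default priority 0) -->
  <xsl:template match="ordList">
    <ol>
      <xsl:apply-templates/>
    </ol>
  </xsl:template>

  <!-- Specific rule: an ordList inside a listItem of another ordList
       (default priority 0.5, so it wins for nested lists) -->
  <xsl:template match="ordList/listItem/ordList">
    <ol style="list-style-type: lower-alpha">
      <xsl:apply-templates/>
    </ol>
  </xsl:template>

  <!-- Assumed rule: each listItem becomes an li -->
  <xsl:template match="listItem">
    <li>
      <xsl:apply-templates/>
    </li>
  </xsl:template>
</xsl:stylesheet>
```

Top-level lists are numbered in the browser's default style, and lists nested one level down are lettered a, b, c.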
              Conclusions
• What can we do with XML?
  – Standards-based archival preservation
  – Data mining and analysis
  – Rendering
• How do we get rid of the angle brackets?
  – Transformation: eXtensible Stylesheet
    Language for Transformations (XSLT)
  – Decoration: Cascading Stylesheets (CSS)

								