									The Semantic Blessings of XSLT



Diederik Gerth van Wijk
dg@doxatrix.nl

XML Holland 2008
Planetarium Gaasperplas, Amsterdam, 20 November




                                      DOXATRIX
Intended audience
   • Understands English
   • Knows what XML is about
   • Cares about meaning, processing and validation
   • Does not need to know about XSLT
   • Does not need to be a programmer
   • But might be aware that computers need to be programmed




Diederik Gerth van Wijk          Semantic Blessings of XSLT   2
Semantic? Blessings? XSLT?
   • XML is about the structure of a document
   • Semantics are about “meaning”
   • A schema can say that a document should have a title (structure)
   • The documentation might add that a title is used for identification (unique within a set of documents) and that it gives a clue about what the document is about (semantics)
   • The words used in the title are really semantics
   • Blessings are good and helpful: you want them
   • What is XSLT?
   • How can XSLT help you in adding, verifying and using semantic markup?




Why bother marking up explicitly?




NLP is good, Explicit Markup is better
   • “Plein 26 Den Haag” = <street>Plein</street><nr>26</nr><city>Den Haag</city>
   • “Plein 1813 Den Haag” = <street>Plein 1813</street><city>Den Haag</city>
   • XML is about tagging structure
   • A schema adds semantics
   • <name>Quattro Stagioni</name>: pizza by Mario or piece by Vivaldi?
   • I don’t care (in this presentation)
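A small Python sketch (not from the talk; Python is used only as a neutral illustration) makes the address contrast concrete. `naive_parse` is a hypothetical NLP-style heuristic that assumes the first number is the house number; `parse_markup` just reads the explicitly tagged fields with the standard-library ElementTree parser.

```python
import re
import xml.etree.ElementTree as ET


def naive_parse(address: str) -> dict:
    """Hypothetical heuristic: the first number in the string is the house number."""
    m = re.match(r"(\D+?)\s+(\d+)\s+(.+)", address)
    if m is None:
        return {"street": address}
    street, nr, city = m.groups()
    return {"street": street, "nr": nr, "city": city}


def parse_markup(fragment: str) -> dict:
    """Explicit markup: each field is read from the element that names it."""
    root = ET.fromstring("<address>" + fragment + "</address>")
    return {child.tag: child.text for child in root}


print(naive_parse("Plein 26 Den Haag"))    # correct: nr is 26
print(naive_parse("Plein 1813 Den Haag"))  # wrong: 1813 is part of the street name
print(parse_markup("<street>Plein 1813</street><city>Den Haag</city>"))
```

No heuristic of this kind can know that "Plein 1813" is a street name; the markup simply states it.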




eXtensible Stylesheet Language - Transformations
   • XSL: the eXtensible Stylesheet Language
   • Family of three W3C recommendations for transformation and presentation
       - XML Path Language (XPath)
       - XSL Transformations (XSLT)
       - XSL Formatting Objects (XSL-FO)

[Figure: XML source document(s) → XSLT processor. With XSLT stylesheet 1 the processor produces an XSL-FO document, which an XSL-FO processor renders as PDF; with XSLT stylesheet 2 it produces HTML pages.]

XSLT characteristics
   • An XSLT stylesheet is an XML document
   • Input is one or more XML documents
   • Output is one or more XML (including XSLT!), HTML, XSL-FO or plain-text (e.g. CSS!) documents
   • A stylesheet can look like a template of the result document (data pull)
   • Or it can be event-driven (data push)
   • Elements and attributes are “events”
   • Functional programming language
   • Rule-based
   • Declarative
   • No side effects
   • Statements can be executed in any order
   • Embeds XPath
   • XSLT 2.0 and XPath 2.0 know XML Schema types
   • XSLT 2.0 can compute structure that is only implicit in the input (e.g. grouping with xsl:for-each-group)
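To picture the push style, here is a rough Python analogue (a sketch, not how an XSLT processor is actually implemented). The `RULES` table, the `template` decorator and the `apply_templates`/`process` functions are hypothetical names chosen to echo XSLT: each element "event" fires the rule registered for its name, and elements without a rule fall back to recursing into their content, loosely like XSLT's built-in template.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: element name -> function that produces output text.
RULES = {}


def template(tag):
    """Register a rule for one element name (loosely like <xsl:template match="tag">)."""
    def register(fn):
        RULES[tag] = fn
        return fn
    return register


def apply_templates(elem) -> str:
    """Process the content: the element's text, each child via its rule, each child's tail."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(process(child))
        parts.append(child.tail or "")
    return "".join(parts)


def process(elem) -> str:
    """Fire the rule matching the element name; without one, just recurse into the content."""
    rule = RULES.get(elem.tag, apply_templates)
    return rule(elem)


@template("title")
def title_rule(elem) -> str:
    return "<h1>" + apply_templates(elem) + "</h1>"


@template("para")
def para_rule(elem) -> str:
    return "<p>" + apply_templates(elem) + "</p>"


doc = ET.fromstring("<doc><title>XSLT</title><para>Rules fire <em>per</em> element.</para></doc>")
print(process(doc))
```

The rules never mention where in the document an element occurs; the input drives which rule fires next.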

XSLT engines
   • Stand-alone:
       - Saxon (open source, Michael Kay)
       - Altova (free, XMLSpy)
       - MSXML
   • On a server:
       - Saxon + .NET
       - Altova + .NET
       - MSXML + ASP
   • Built into browsers:
       - IE6 and higher
       - Firefox 1 and higher
       - Opera 9 and higher



What’s the competition?
   • CSS (Cascading Style Sheets)
       - Easier, simpler
       - Does not transform
   • Perl, Python, Java, JavaScript, C(++), (V)Basic
       - Generic programming or scripting languages
       - No built-in knowledge of XML, but lots of libraries for DOM or SAX
   • JSP, ASP, PHP
       - Server-side processing
       - Not really XML-aware
       - Little or no transformation
   • ISO/IEC 10179 DSSSL: Document Style Semantics and Specification Language
       - SGML-based
       - Rarely used

XSLT and semantics...
   • XML elements describe what the content is (semantics)
   • XSLT stylesheets describe what to do with that content (processing)
   • How can a processing stylesheet be a semantic blessing?




Blessing 3: XSLT 2.0 may be schema-aware
   • A schema defines the semantics of a document type
   • XSLT 2.0 is based on XPath 2.0
   • XSLT 2.0 may use schemas
   • Then XPath 2.0 can use the types of elements and attributes
   • So it can know whether to treat an attribute as a string or as an integer
     ("12" < "3" if the type is string, "12" > "3" if the type is integer)
   • But will it sort correctly:
     <song title="50 ways to leave your lover" performer="Paul Simon" />
     <song title="1919 rag" performer="Kid Ory" />
     or
     <king name="Henry VIII" born="1491-06-28" died="1547-01-28" />
     <king name="Henry IX" born="1725-03-11" died="1807-07-13" />
     (yes for the kings, if the Roman numerals were coded as &#x2167; and &#x2168;)
   • With the "instance of" operator you can use information that is not in the document, but is in the schema
   • Therefore, XSLT 2.0 discourages stand-alone (schema-less) processing
   • From a semantic point of view, that’s a blessing
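The comparison and sorting questions can be tried out with a few lines of Python (an illustration, not XSLT). Python compares strings by Unicode code point; an XSLT processor sorts with a collation whose default may differ, so this shows the principle rather than any particular processor's output.

```python
# Same two values, two types: the type decides the comparison.
print("12" < "3")            # True: string comparison, character by character
print(int("12") < int("3"))  # False: integer comparison

# Plain string sorting (Unicode code point order) of the examples above.
songs = ["50 ways to leave your lover", "1919 rag"]
kings = ["Henry VIII", "Henry IX"]
kings_numerals = ["Henry \u2167", "Henry \u2168"]  # U+2167 ROMAN NUMERAL EIGHT, U+2168 ROMAN NUMERAL NINE

print(sorted(songs))           # '1919 rag' first, because '1' < '5'
print(sorted(kings))           # 'Henry IX' first, because 'I' < 'V'
print(sorted(kings_numerals))  # Henry VIII first, because U+2167 < U+2168
```

Typing the title attribute as a string does not help the songs; coding the numerals as single characters does help the kings.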
Blessing 4: Schema-independent processing (1)
   • In this sequence group, the order carries no information:
     (title, abbreviated-title?)            (1)
     is equivalent to
     (abbreviated-title?, title)            (2)
   • Suppose you want to print the abbreviated title if one is coded, and otherwise the full title
   • In stream processing, the quick-and-dirty solution might be as simple as:
     temp=getNextElement;
     if existsNextElement then write(getNextElement)
     else write(temp);                             (1)
     or
     write(getNextElement);                        (2)
   • But what if you decide to change from order (1) to (2)?
   • Or add an optional element toc-title?
     (title, abbreviated-title?, toc-title?)      (1)
     (toc-title?, abbreviated-title?, title)      (2)
   • The simple program breaks
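The breakage is easy to reproduce. This Python sketch (not from the talk) mimics pseudocode (1) with ElementTree: `qd_title` is a hypothetical position-based function that silently assumes the order (title, abbreviated-title?), and the sample titles are made up.

```python
import xml.etree.ElementTree as ET


def qd_title(head) -> str:
    """Quick-and-dirty, position-based: silently assumes (title, abbreviated-title?)."""
    children = list(head)
    if len(children) > 1:        # 'existsNextElement'
        return children[1].text  # assumed to be the abbreviated title
    return children[0].text      # assumed to be the full title


order1 = "<head><title>Semantic Blessings of XSLT</title><abbreviated-title>Blessings</abbreviated-title></head>"
order2 = "<head><abbreviated-title>Blessings</abbreviated-title><title>Semantic Blessings of XSLT</title></head>"
with_toc = "<head><title>Semantic Blessings of XSLT</title><toc-title>XSLT</toc-title></head>"

for source in (order1, order2, with_toc):
    print(qd_title(ET.fromstring(source)))
# Blessings                   (correct)
# Semantic Blessings of XSLT  (wrong: should be Blessings)
# XSLT                        (wrong: toc-title taken for an abbreviated title)
```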


Blessing 4: Schema-independent processing (2)
   • In XSLT, you have access to the elements by name, in arbitrary order
   • The stylesheet fragment looks like this:
    <xsl:choose>
     <xsl:when test="./abbreviated-title">
          <xsl:value-of select="abbreviated-title"/>
     </xsl:when>
     <xsl:otherwise>
          <xsl:value-of select="title"/>
     </xsl:otherwise>
    </xsl:choose>

   • If the schema (and documents) change order, the stylesheet remains the same
   • If an optional toc-title is added, the stylesheet remains the same
   • Verbosity turns out to be simpler in the long run
   • By the way, if sequence matters in the document, it shouldn’t in the schema
   • Reasons to prescribe sequence:
       - to ease input
       - to enforce cardinality
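For comparison, the same name-based logic as the xsl:choose, sketched in Python with ElementTree (not from the talk; `display_title` and the sample documents are hypothetical). `find` looks a child up by name, so neither the order of the children nor an extra toc-title changes the result.

```python
import xml.etree.ElementTree as ET


def display_title(head) -> str:
    """Name-based, like the xsl:choose: use abbreviated-title if present, else title."""
    abbreviated = head.find("abbreviated-title")
    if abbreviated is not None:
        return abbreviated.text
    return head.find("title").text


sources = [
    "<head><title>Semantic Blessings of XSLT</title><abbreviated-title>Blessings</abbreviated-title></head>",
    "<head><abbreviated-title>Blessings</abbreviated-title><title>Semantic Blessings of XSLT</title></head>",
    "<head><toc-title>XSLT</toc-title><abbreviated-title>Blessings</abbreviated-title><title>Semantic Blessings of XSLT</title></head>",
    "<head><title>Semantic Blessings of XSLT</title><toc-title>XSLT</toc-title></head>",
]
for source in sources:
    print(display_title(ET.fromstring(source)))
```

The first three documents print "Blessings" and the last prints the full title, whatever the order of the children.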



Blessing 5: functional programming
   • No assignable variables: once bound, a variable's value cannot be changed
   • Suppose you want to sort items alphabetically and act on each new initial letter (e.g. output a heading)
   • First idea:
    <xsl:variable name="PrevLetter" select="' '" />
    <xsl:for-each select="book">
            <xsl:sort select="title" data-type="text" order="ascending"/>

            <xsl:variable name="ThisLetter" select="substring(title[1],1,1)" />

            <xsl:if test="$PrevLetter!=$ThisLetter">

                  <H2><xsl:value-of select="$ThisLetter"/></H2>
            </xsl:if>

            <xsl:variable name="PrevLetter" select="$ThisLetter" />

            <H3><xsl:value-of select="title"/></H3>

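    </xsl:for-each>

   • No good: the inner xsl:variable does not update the outer PrevLetter; it declares a new variable that is only visible within the current iteration. Every test therefore compares against ' ', and every book gets an H2 heading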
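What the first idea was after can be written without any updated state. In this Python sketch (not the talk's method; `headings` and the "H2 …"/"H3 …" output strings are hypothetical), there is no PrevLetter that changes as the loop runs: each title's predecessor is taken from the sorted sequence itself. Titles are assumed to be non-empty.

```python
def headings(titles: list[str]) -> list[str]:
    """H2 for each new initial letter, H3 for each title, in sorted order.

    Nothing plays the role of an updated PrevLetter: the predecessor of
    each title is derived from the sorted sequence itself.
    """
    ordered = sorted(titles)
    previous = [None] + ordered[:-1]  # predecessor of each title in SORTED order
    lines = []
    for prev, this in zip(previous, ordered):
        if prev is None or prev[0] != this[0]:
            lines.append("H2 " + this[0])
        lines.append("H3 " + this)
    return lines


print(headings(["XSLT", "Schemas", "XPath", "Semantics"]))
```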




Would this work?
<xsl:for-each select="book">
    <xsl:sort select="title" data-type="text" order="ascending"/>

    <xsl:variable name="PrevLetter" select="substring(preceding-sibling::book[1]/title[1],1,1)"/>

    <xsl:variable name="ThisLetter" select="substring(title[1],1,1)" />

    <xsl:if test="$PrevLetter!=$ThisLetter">

            <H2><xsl:value-of select="$ThisLetter"/></H2>
    </xsl:if>

    <H3><xsl:value-of select="title"/></H3>

</xsl:for-each>

   • Better, but the preceding-sibling axis follows the original document order, not the sorted order...
   • Is that a bug or a feature?
   • It’s a blessing!
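The effect of preceding-sibling working on document order can be simulated in Python (a sketch, not XSLT; the function name and sample titles are hypothetical, and titles are assumed unique so `list.index` finds the right position). Output follows the sorted order, but "previous" is the neighbour in the source document, so S and X each get two headings.

```python
def headings_preceding_sibling(titles: list[str]) -> list[str]:
    """Mimics the second attempt: output follows SORTED order, but 'previous'
    is the preceding sibling in DOCUMENT order (titles assumed unique)."""
    lines = []
    for this in sorted(titles):
        i = titles.index(this)                   # position in the source document
        prev = titles[i - 1] if i > 0 else None  # like preceding-sibling::book[1]
        if prev is None or prev[0] != this[0]:
            lines.append("H2 " + this[0])
        lines.append("H3 " + this)
    return lines


print(headings_preceding_sibling(["XSLT", "Schemas", "XPath", "Semantics"]))
```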




The solution
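<xsl:for-each-group select="book" group-by="substring(title[1],1,1)">
    <!-- without this sort, groups are processed in order of first appearance -->
    <xsl:sort select="current-grouping-key()"/>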
    <H2><xsl:value-of select="current-grouping-key()"/></H2>

    <xsl:for-each select="current-group()">

            <xsl:sort select="title" data-type="text" order="ascending"/>

            <H3><xsl:value-of select="title"/></H3>

    </xsl:for-each>

</xsl:for-each-group>

   • Think XML
   • Think in terms of creating hierarchies: groups of titles starting with the same letter
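A rough Python analogue of the grouping approach (a sketch, not how xsl:for-each-group is implemented; names and data are hypothetical). Titles are first grouped by initial letter; a dict keeps the groups in order of first appearance, which is also what xsl:for-each-group does in the absence of xsl:sort, so the groups are then sorted by key and each group's titles are sorted.

```python
def headings_grouped(titles: list[str]) -> list[str]:
    """Group first, then sort: a rough analogue of xsl:for-each-group group-by."""
    groups: dict[str, list[str]] = {}
    for t in titles:                            # group-by: the initial letter
        groups.setdefault(t[0], []).append(t)   # dicts keep first-appearance order
    lines = []
    for key in sorted(groups):                  # sort the groups by their grouping key
        lines.append("H2 " + key)               # current-grouping-key()
        lines.extend("H3 " + t for t in sorted(groups[key]))  # sorted current-group()
    return lines


print(headings_grouped(["XSLT", "Schemas", "XPath", "Semantics"]))
```

Without `sorted(groups)`, the X group would come out first here, because "XSLT" is the first title in the input.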




The ultimate semantic normalisation
   • “PCDATA considered harmful” (Han Nonnekes, Shell Oil)
   • Text is the surface structure, in a specific language, of a deeper meaning
   • You should encode a text as that deeper tree
   • With references to abstract words (concepts)
   • For each language (“English, upper class, around 1850”), give a dictionary and transformation rules
   • Then generate the text
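A toy Python sketch of that idea, under heavy assumptions: the concept names, the two lexicons and the word-order rules below are all invented for illustration, and the toy handles only one kind of subordinate clause. Real generation would also need inflection, agreement and much more. The tree contains no words at all, and each language supplies its own dictionary and ordering rule.

```python
import xml.etree.ElementTree as ET

# Hypothetical concept tree for one subordinate clause: no words, only concepts.
CLAUSE = ET.fromstring(
    "<clause>"
    '<agent concept="DOG"/><action concept="SEE"/><patient concept="CAT"/>'
    "</clause>"
)

# Hypothetical dictionaries: concept -> word(s), per language.
LEXICON = {
    "en": {"SUB": "that", "DOG": "the dog", "SEE": "sees", "CAT": "the cat"},
    "nl": {"SUB": "dat", "DOG": "de hond", "SEE": "ziet", "CAT": "de kat"},
}

# Hypothetical transformation rules: word order of a subordinate clause.
ORDER = {
    "en": ("agent", "action", "patient"),  # that the dog sees the cat
    "nl": ("agent", "patient", "action"),  # dat de hond de kat ziet (verb last)
}


def generate(clause, lang: str) -> str:
    """Generate surface text in one language from the language-neutral tree."""
    concept_of = {child.tag: child.get("concept") for child in clause}
    words = [LEXICON[lang]["SUB"]]
    words += [LEXICON[lang][concept_of[role]] for role in ORDER[lang]]
    return " ".join(words)


for lang in ("en", "nl"):
    print(generate(CLAUSE, lang))
```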




Questions?
   • Ask me now
   • Ask me during the lunch or tea break
   • Ask me during the buffet
   • Mail dg@doxatrix.nl
   • The presentation can be downloaded from
       - www.xmlholland2008.nl
       - www.doxatrix.nl/dg



