    Internet-based Application Architectures
           for the 21st Century:
            The Role of XML

            Henry S. Thompson
      HCRC Language Technology Group
          Division of Informatics
          University of Edinburgh
    Introduction
       It's the Web, stupid!
         E-commerce, E-business, E-finance
         Web servers in your microwave
         ADSL by Easter
       Seriously, in the Academy
         Virtual communities of effort multiply our
          effectiveness
            – Shared resources
            – Shared tools
       Enfranchisement is the key
         Everybody can play
            – Whether Bill lets them or not
Language Technology Group        Dagstuhl, 2000-03-20   Henry S. Thompson
    XML is ASCII for the 21st century
       ASCII (ISO 646) solved a fundamental
        interchange problem for flat text documents
         What bits encode what characters
            – (For a pretty parochial definition of 'character')
     UNICODE/ISO 10646 extends that solution
      to the whole world
     XML thought it was doing the same for simple
      tree-structured documents
         The emphasis in the XML design was on
          simplifying SGML to move it to the Web
         XML didn't touch SGML's architectural vision
            – flexible linearisation/transfer syntax
            – for tree-structured documents with internal links
    Digression: Just what is XML?
     It's a markup language used for annotating text
     It is concerned with logical structure
         to identify sections, titles, section headers, chapters,
          paragraphs,…
       It is not concerned with appearance
         you say 'this is a subtitle'
          not 'this is in bold, 14pt, centered'
         you say 'this is an example'
          not 'this is in verbatim, indented by 5pts, ragged
          right'

    Why is XML a big deal?
     It is an official W3C Recommendation
     It is vendor-independent, platform-independent,
      application-independent,…
         unlike Word documents, RTF documents, PDF
          documents, PostScript documents,…
       It is human readable
         ditto (for most values of 'human')




    Unformatted text
    Internet-based Application Architectures for the
    21st Century:
    The Role of XML
    Let's skip straight to an example of XML syntax for
    a simple bit of structure:
    <tip><emph>Never</emph> stand up in a canoe!</tip>




    Formatted text
     Internet-based Application Architectures for
                  the 21st Century:
                               The Role of XML
    Let's skip straight to an example of XML syntax for a simple bit of
    structure:
       <tip><emph>Never</emph> stand up in a
       canoe!</tip>




    XML marked up text
    <article>
     <title> Internet-based Application Architectures
    for the 21st Century: </title>
     <subtitle>The Role of XML</subtitle>
     <section>
      <para> Let's skip <emph>straight</emph> to an
    example of XML syntax for a simple bit of
    structure:</para>
      <example> &lt;tip>&lt;emph>Never&lt;/emph> stand
    up in a canoe!&lt;/tip></example>
     </section>
    </article>
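
A minimal sketch of what "logical structure" means to a program, assuming Python's standard xml.etree.ElementTree parser. The ARTICLE string is a well-formed transcription of the slide's markup, and the outline helper is a hypothetical name introduced only for illustration. It prints the element tree with the text stripped away, and shows that the escaped markup inside the example element comes back as ordinary characters.

```python
import xml.etree.ElementTree as ET

# Well-formed transcription of the slide's article (whitespace tidied).
ARTICLE = """<article>
 <title>Internet-based Application Architectures for the 21st Century:</title>
 <subtitle>The Role of XML</subtitle>
 <section>
  <para>Let's skip <emph>straight</emph> to an example of XML syntax
  for a simple bit of structure:</para>
  <example>&lt;tip>&lt;emph>Never&lt;/emph> stand up in a canoe!&lt;/tip></example>
 </section>
</article>"""

root = ET.fromstring(ARTICLE)


def outline(elem, depth=0):
    """Return the element names as an indented outline, ignoring all text."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


structure = outline(root)
print("\n".join(structure))

# The escaped markup inside <example> is returned as plain character data.
example_text = root.find("section/example").text
print(example_text)
```

Nothing in the outline says how a title or an emphasised word should look; that decision is left to whatever consumes the tree.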




    Connecting structure and form
     There is a stylesheet language called XSLT
      which will allow us to write simple style rules
      which will produce the formatted presentation
      from the structured version
     For example
        <xsl:template match='emph'>
          <I><xsl:apply-templates/></I>
        </xsl:template>
        will do part of the transformation job
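
To make the effect of that one rule concrete, here is a rough Python analogue. It is not XSLT: the standard library has no XSLT processor, so this hand-rolled sketch only imitates the single emph rule and copies everything else unchanged, which a real stylesheet would not do by default. RULES and transform are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: source element name -> output element name.
RULES = {"emph": "I"}


def transform(elem):
    """Copy elem, renaming any element that has a rule; keep text and children."""
    out = ET.Element(RULES.get(elem.tag, elem.tag), dict(elem.attrib))
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(transform(child))
    return out


source = ET.fromstring("<tip><emph>Never</emph> stand up in a canoe!</tip>")
result = ET.tostring(transform(source), encoding="unicode")
print(result)  # <tip><I>Never</I> stand up in a canoe!</tip>
```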


    XML vs HTML
       Isn’t XML just like HTML?
         As with XML, you can sometimes use HTML to mark
          up things for content rather than appearance,
            – e.g. <H2>This is a subtitle</H2>
              appearance of <H2> is defined elsewhere, usually by
              someone else
         but
            – a lot of HTML markup is for appearance
              e.g. <I>this is italics</I>, <B>this is in
              bold</B>
            – you couldn’t mark up <RECIPIENT>Some
              names</RECIPIENT>
         If you know HTML, it is easy to understand basic XML
          syntax
    XML vs SGML
       SGML is more complicated than it need be
         because it was designed in the old days
          (12 years ago!)
       XML is a simplified subset of SGML
         much less minimisation
         makes processing easier
         qua complexity: sits somewhere between HTML
          and SGML




    Who is in charge of XML?
     XML is a W3C Recommendation
     The W3C is the World Wide Web Consortium, a
      voluntary association of companies and non-profit
      organisations. Membership costs serious money and
      confers voting rights. Procedures are complex, with
      the Director (Tim Berners-Lee) having ultimate
      authority, guided by a committee of the whole
      called the Advisory Committee.
     The XML recommendation was written by the
      W3C’s XML Working Group.

    Digression, v.2: Just what is XML?
     It's a markup language used for transferring
      data
     It is concerned with data models
         to convert between application-appropriate and
          transfer-appropriate forms
       It is not concerned with human beings
         It's produced and consumed by programs




    XML as UI
     A slogan of Adam Bosworth
     I interpret it in two ways:
         At the client end
            – Use XML plus XSL as the basis for what the user sees
              on his/her screen
            – Use XLinks from a master document to pull together
              disparate sources of information
         At the server end
            – Use XML as a uniform interface for any data source onto
              the web
            – Not just documents, but e.g. databases, process control
              information, stock quotes
    Application data




    Structured markup
    <POORDERHDR>
      <DATETIME qualifier="DOCUMENT">
        <YEAR>1996</YEAR>
        <MONTH>06</MONTH>
        <DAY>30</DAY>
        <HOUR>23</HOUR>
        <MINUTE>59</MINUTE>
        <SECOND>59</SECOND>
        <SUBSECOND>0000</SUBSECOND>
        <TIMEZONE>+0100</TIMEZONE>
      </DATETIME>
      <OPERAMT qualifier="EXTENDED" type="T">
        <VALUE>670000</VALUE>
        <NUMOFDEC>2</NUMOFDEC>
        <SIGN>+</SIGN>
        <CURRENCY>USD</CURRENCY>
      . . .
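
A hedged sketch of the consuming side, where a program turns this transfer form back into application values. It assumes Python's standard library only. The closing tags in FRAGMENT were added purely so the text parses, since the slide elides the rest of the document. The reading of NUMOFDEC as "number of decimal places in VALUE" is an assumption, SUBSECOND is ignored, and to_datetime and to_amount are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from decimal import Decimal

# Slide fragment, closed off here only so that it parses.
FRAGMENT = """<POORDERHDR>
  <DATETIME qualifier="DOCUMENT">
    <YEAR>1996</YEAR><MONTH>06</MONTH><DAY>30</DAY>
    <HOUR>23</HOUR><MINUTE>59</MINUTE><SECOND>59</SECOND>
    <SUBSECOND>0000</SUBSECOND><TIMEZONE>+0100</TIMEZONE>
  </DATETIME>
  <OPERAMT qualifier="EXTENDED" type="T">
    <VALUE>670000</VALUE><NUMOFDEC>2</NUMOFDEC>
    <SIGN>+</SIGN><CURRENCY>USD</CURRENCY>
  </OPERAMT>
</POORDERHDR>"""


def to_datetime(elem):
    """Build a timezone-aware datetime from a DATETIME element."""
    def number(name):
        return int(elem.findtext(name))

    tz = elem.findtext("TIMEZONE")  # e.g. "+0100"
    sign = -1 if tz[0] == "-" else 1
    offset = sign * timedelta(hours=int(tz[1:3]), minutes=int(tz[3:5]))
    return datetime(number("YEAR"), number("MONTH"), number("DAY"),
                    number("HOUR"), number("MINUTE"), number("SECOND"),
                    tzinfo=timezone(offset))


def to_amount(elem):
    """Assumed reading: VALUE is an integer shifted right by NUMOFDEC places."""
    value = Decimal(elem.findtext("VALUE")).scaleb(-int(elem.findtext("NUMOFDEC")))
    if elem.findtext("SIGN") == "-":
        value = -value
    return value, elem.findtext("CURRENCY")


root = ET.fromstring(FRAGMENT)
when = to_datetime(root.find("DATETIME"))
amount, currency = to_amount(root.find("OPERAMT"))
print(when.isoformat(), amount, currency)
```

Every leaf here is a typed value hiding in a string, which is why the later slides care about attaching types to markup.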

    What just happened!?
     The whole transfer syntax story just went meta,
      that's what happened!
     XML has been a runaway success, on a much
      greater scale than its designers anticipated
         Not for the reason they had hoped
            – Because separation of form from content is right
         But for a reason they barely thought about
            – Data must travel the web
       Tree structured documents are a useable
        transfer syntax for just about anything
         So data-oriented web users think of XML as a
          transfer mechanism for their data

    The Cambridge Communiqué
     A W3C Note resulting from a meeting in
      August 1999 (http://www.w3.org/TR/schema-arch)
     Signalled a widespread acceptance of layering:
        "XML has defined a transfer syntax for tree-
         structured documents;
        "Many data-oriented applications are being defined
         which build their own data structures on top of an
         XML document layer, effectively using XML
         documents as a transfer mechanism for structured
         data;"



    The Communiqué, cont'd
     Called for support in XML Schema for
      specifying mapping between the XML
      document data model (or XML Infoset) and
      application-specific data models
     XML Schema is a W3C recommendation-in-progress
      for defining the structure of
      document families
     A grammar for markup structure
     E.g.
         article -> title, subtitle?, section+
    or
         POORDERHDR -> DATETIME, OPERAMT
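
One way to see that such a rule really is a grammar: rewrite its right-hand side as a regular expression over the names of an element's children and test for a full match. This is an illustrative sketch in Python, not how a schema processor is implemented. CONTENT_MODELS and children_match are hypothetical names, and only the article rule is encoded.

```python
import re
import xml.etree.ElementTree as ET

# The article rule as a regex over space-terminated child names:
# title, then an optional subtitle, then one or more sections.
CONTENT_MODELS = {
    "article": r"(title )(subtitle )?(section )+",
}


def children_match(elem):
    """True if elem's child element names, in order, satisfy its content model."""
    names = "".join(child.tag + " " for child in elem)
    return re.fullmatch(CONTENT_MODELS[elem.tag], names) is not None


good = ET.fromstring("<article><title/><section/><section/></article>")
bad = ET.fromstring("<article><subtitle/><title/><section/></article>")
print(children_match(good), children_match(bad))  # True False
```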

    XML Schema: some details
     Fortunately, XML Schema is actually notated in
      XML itself
     So there are elements defined for use in
      schemas to define. . .
         Elements :-)
         Attributes
         Types
     A type is a collection of constraints on element
      content and attribute values
     A type may be either
         simple, for constraining string values
         complex, for constraining elements which contain
          other elements
    A simple example
    <!ELEMENT text (#PCDATA|emph|name)*>
    <!ATTLIST text
            timestamp NMTOKEN #REQUIRED>

    <xs:element name="text">
     <xs:complexType content="mixed">
      <xs:element ref="emph"/>
      <xs:element ref="name"/>
      <xs:attribute name="timestamp"
                    type="date"
                    minOccurs="1"/>
     </xs:complexType>
    </xs:element>
    Richer type definition example
    <xs:complexType name='personName'>
     <xs:element name='title'
                 minOccurs='0'/>
     <xs:element name='forename'
                 minOccurs='0'
                 maxOccurs='unbounded'/>
     <xs:element name='surname'/>
     <xs:attribute name='id'
                   type='integer'/>
    </xs:complexType>

    <xs:element name='owner'
                type='personName'/>
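
The constraints in personName can be spelled out as a small hand-written checker. This is a sketch in plain Python rather than a schema validator: it assumes that an omitted minOccurs or maxOccurs means exactly one, that the id attribute is optional, and that 'integer' means an optionally signed run of digits. PERSON_NAME and check_person_name are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# (child name, minOccurs, maxOccurs); None stands for 'unbounded'.
PERSON_NAME = [("title", 0, 1), ("forename", 0, None), ("surname", 1, 1)]


def check_person_name(elem):
    """Return a list of problems found when reading elem as a personName."""
    problems = []
    names = [child.tag for child in elem]
    pos = 0
    for name, min_occurs, max_occurs in PERSON_NAME:
        count = 0
        # Consume as many consecutive children with this name as allowed.
        while (pos < len(names) and names[pos] == name
               and (max_occurs is None or count < max_occurs)):
            pos += 1
            count += 1
        if count < min_occurs:
            problems.append(f"expected at least {min_occurs} <{name}>, found {count}")
    if pos < len(names):
        problems.append(f"unexpected <{names[pos]}> at position {pos}")
    ident = elem.get("id")
    if ident is not None and re.fullmatch(r"[+-]?\d+", ident) is None:
        problems.append(f"id {ident!r} is not an integer")
    return problems


owner = ET.fromstring(
    "<owner id='42'><title>Dr</title><forename>Henry</forename>"
    "<forename>S.</forename><surname>Thompson</surname></owner>"
)
print(check_person_name(owner))  # []
```

Because the type is named, the same check applies to any other element declared with type='personName', not just owner.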

    Mapping between layers
       We can think of this in two ways
         In terms of an abstract data modelling language
            – Entity-Relation
            – UML
            – RDF
         In concrete implementation terms
            – Tables and rows
            – Class instances and instance variables
     The first is more portable
     The second more immediately useful




    Mapping between layers 2
       Regardless of what approach we take, we need
         A vocabulary of data model components
         An attachment of that vocabulary to schema
          components
       Sample vocabularies
         entity, relationship, collection
         table, row, column
         instance, variable, list, dictionary
       Where should attachment be specified?
         In the schema
            – convenient
         Outside it
            – modular
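
As a sketch of the "outside the schema" option, using the table/row/column vocabulary: the attachment lives in a separate Python dictionary keyed by element name, and a small function uses it to turn an element into a row. The ATTACHMENT table, the table and column names, and to_row are all hypothetical. The element names are borrowed from the personName example.

```python
import xml.etree.ElementTree as ET

# Hypothetical attachment kept outside the schema: element name -> data-model role.
ATTACHMENT = {
    "owner": ("table", "owners"),
    "title": ("column", "title"),
    "forename": ("column", "forenames"),
    "surname": ("column", "surname"),
}


def to_row(elem):
    """Turn one element that is mapped to a table into (table name, row dict)."""
    kind, table = ATTACHMENT[elem.tag]
    if kind != "table":
        raise ValueError(f"<{elem.tag}> is not mapped to a table")
    row = {}
    for child in elem:
        _, column = ATTACHMENT[child.tag]
        # Repeated children are joined with a space so the row stays flat.
        row[column] = " ".join(filter(None, [row.get(column), child.text]))
    return table, row


doc = ET.fromstring(
    "<owner><forename>Henry</forename><forename>S.</forename>"
    "<surname>Thompson</surname></owner>"
)
table, row = to_row(doc)
print(table, row)
```

The cost of this modularity is visible in the dictionary keys: the element names are repeated outside the schema and have to be kept in step with it by hand.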
    Specifying mapping in the schema
     Probably reasonable if done in high-level (e.g.
      RDF, UML, ER) terms
     See example infoset-xmpl.xml, infoset-uml.xsd




    Specifying mapping outside
     Requires some duplication of structural
      information
     Encourages cross-language working
     XSLT is the obvious candidate
     See example infoset-xmpl.xsl




    Compile the Mapping
       Perhaps we can get the benefits of both
        approaches
         Annotate the schema
         Compile the annotations into XSLT
            – With bindings for separate implementations
       Semi-structured data has a role to play here
         Particularly when the data model antecedes the
          document model
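
A very small sketch of what compiling annotations into XSLT could look like, under heavy assumptions: the annotations are reduced to a plain Python dictionary from source element name to output element name (a real annotated schema would carry far more), and the generated stylesheet contains one template per entry. compile_to_xslt and the sample names are hypothetical. The namespace URI is the standard XSLT 1.0 one.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"


def compile_to_xslt(annotations):
    """Emit one XSLT template per annotated element name -> output element name."""
    ET.register_namespace("xsl", XSL)
    sheet = ET.Element(f"{{{XSL}}}stylesheet", {"version": "1.0"})
    for source_name, target_name in annotations.items():
        template = ET.SubElement(sheet, f"{{{XSL}}}template", {"match": source_name})
        target = ET.SubElement(template, target_name)
        ET.SubElement(target, f"{{{XSL}}}apply-templates")
    return ET.tostring(sheet, encoding="unicode")


# Hypothetical annotations: map owner to a row and surname to a column.
stylesheet = compile_to_xslt({"owner": "row", "surname": "col"})
print(stylesheet)
```

The generated text is ordinary XSLT, so it could be handed to any XSLT processor; bindings for a particular implementation would then differ only in what the templates emit.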




    Take-home message
     The point at which idiosyncratic scripting takes
      over can be moved one layer up
     Using public consensual declarative standards is
      a Good Thing
     Interoperability makes things better for
      everyone



