XML and XSLT
eXtensible Markup Language
      Simon M. Lucas
•   Brief History
•   Why XML?
•   What is XML?
•   Elements and attributes
•   DTDs
•   XML processing
    – With Java
    – With JavaScript E4X
                 What is XML?
• XML is a metadata language - a language for
  providing data about data
• W3C Recommendation since February 1998
• It looks a bit like HTML, but with XML the tags are
  user-defined and therefore extensible
• HTML marks up logical presentation
• CSS specifies presentation style
• XML marks up meaning (semantics)
                      Why XML?
• Separates content from presentation
• General - can be applied to anything
• Adds value to semi-structured data
   – E.g. Product Catalogue
• Enables an enterprise to mark up all its data
• Using XML greatly simplifies encoding of data
   – (c.f. ad hoc text representations)
• Ubiquitous - everybody is using it!
            Where does XML fit?
• Why not put everything in a relational or OO
  database?
• XML is a global standard:
   – offers better information transfer between different
     applications and enterprises than proprietary databases
• XML is flexible and easily applied
   – (which also presents dangers - data does NOT become
     more valuable just because it is marked up in XML - the
     XML structures have to be well designed).
                  Data Centric or
                 Document Centric?
• Data centric
   – Used in web services
   – Communication between applications
   – Data export from databases
• Document centric
   – To add meaning to semi-structured documents
   – E.g. content for web pages, lecture notes, product
     catalogues
• Emerging XML databases such as Xindice
  http://xml.apache.org/xindice/ store XML directly
  (don’t have to map to relational DB)
             XML Basic Syntax
• An XML document consists of a number of
  declarations followed by a tree of elements.
• Each element is delimited by begin and end tags
• Each element may contain attributes
• Elements may contain text or other elements (or a
  mixture of the two)
• Attributes may only contain text
               XML Element
•   Has a name
•   Has a begin tag <elementName>
•   Then text and/or child elements
•   Has an end tag </elementName>
•   E.g. <name> Simon </name>
•   Elements can also be empty
•   E.g. <person name="Simon" />
         Well-Formed and Valid
• Element tags must be properly nested
  – E.g. <a> <b> text </b> </a> is ok
  – But <a> <b> text </a> </b> is NOT
• Attribute values must be enclosed in quotes
• A document where all the tags are properly
  nested is well-formed
• If a document is well-formed, and obeys the
  syntax rules of a specified DTD, then it is also
  valid
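
The nesting rule can be checked mechanically. A minimal sketch, assuming the XML parser bundled with the JDK (javax.xml.parsers); the class and method names are hypothetical. A non-validating parser rejects anything that is not well-formed, so a successful parse is taken as the test:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original slides.
class WellFormedCheck {

    // Returns true if the text parses as XML, i.e. it is well-formed.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;                 // the parser found a syntax error
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a> <b> text </b> </a>")); // properly nested
        System.out.println(isWellFormed("<a> <b> text </a> </b>")); // crossed tags
    }
}
```

The parser may also print its own error message to standard error for the second document. Checking validity as well requires a DTD and a validating parser.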
          Elements or Attributes
• Information can either be stored in elements or
  attributes
• Structured information is stored in elements
• Primitive information (i.e. a single atomic value or list
  of values) can either be stored in an element or an
  attribute
• Perhaps better to store primitives in attributes
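
The choice affects how the value is read back. A minimal sketch, assuming the JDK's DOM API; the class name, method names and the two sample documents are hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original slides.
class ElementOrAttribute {

    // Parses the text and returns its root element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Email stored as an attribute: <person email="..."/>
    static String emailFromAttribute(String xml) {
        return root(xml).getAttribute("email");
    }

    // Email stored as a child element: <person><email>...</email></person>
    static String emailFromElement(String xml) {
        return root(xml).getElementsByTagName("email").item(0).getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(emailFromAttribute("<person email=\"sml@essex.ac.uk\"/>"));
        System.out.println(emailFromElement(
            "<person><email>sml@essex.ac.uk</email></person>"));
    }
}
```

The attribute form is a single lookup; the element form needs a search and a text extraction, but leaves room to add structure inside the value later.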
                XML Attributes
• Element start tags may also contain attributes
• An attribute consists of an attribute name followed
  by an equals sign and a quoted attribute value
• Attributes are only allowed in the start tags
• E.g.:
   <person email="sml@essex.ac.uk">
  Document Type Definition (DTD)
• Provides a concise way to specify the syntax of
  a given document type
• Declares how the elements can include other
  elements
• And the attributes allowed for each element
• Special operators specify the order and
  cardinality of each item (see below)
        DTD Symbols: Elements

Operator        Meaning
+               One or more times
*               Zero or more times
?               Zero or once
|               Or: (a | b)? Either a or b or nothing
(no operator)   Exactly once: (a , b) Exactly one a
                followed by exactly one b
           CDATA and PCDATA
• CDATA – Character Data
• Attributes declared with CDATA may contain
  any text characters
• PCDATA – Parsed Character Data
• Elements declared as (#PCDATA) do not contain
  other elements
  – i.e. no other mark-up within them
• In tree-terms, these are LEAF-nodes
    DTD for Address Book Example
<!-- DTD for simple address book -->
<!ELEMENT AddressBook (Title, Person*)>

• Tip: Enter the Address Book DTD and XML as files in IntelliJ, then
  use the Tools -> Validate command to perform validation on the
  XML document.
• Try to modify the DTD and/or XML document to make it invalid.
        Address Book – XML
<AddressBook>
  <Title>Simon's address book</Title>
  <Person name="Simon"
          email="sml@essex.ac.uk" />
  <Person name="Anna" />
</AddressBook>
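
Validation can also be run from code rather than from the IDE. A minimal sketch, assuming the JDK's validating parser. Only the AddressBook declaration comes from the slides; the Title and Person declarations are assumptions chosen to fit the sample document, and the class name is hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original slides.
class AddressBookValidator {

    // DTD as an internal subset. The Title and Person lines are assumed.
    static final String DOCTYPE =
        "<!DOCTYPE AddressBook [\n"
        + "  <!ELEMENT AddressBook (Title, Person*)>\n"
        + "  <!ELEMENT Title (#PCDATA)>\n"
        + "  <!ELEMENT Person EMPTY>\n"
        + "  <!ATTLIST Person name CDATA #REQUIRED email CDATA #IMPLIED>\n"
        + "]>\n";

    // Returns true if the document body is valid against DOCTYPE.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);          // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are only reported to the handler, so rethrow them.
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (SAXException e) {
            return false;                         // not well-formed or not valid
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String book = "<AddressBook>\n"
            + "  <Title>Simon's address book</Title>\n"
            + "  <Person name=\"Simon\" email=\"sml@essex.ac.uk\" />\n"
            + "  <Person name=\"Anna\" />\n"
            + "</AddressBook>";
        System.out.println(isValid(book));
        // Dropping the Title breaks the (Title, Person*) content model.
        System.out.println(isValid("<AddressBook><Person name=\"Anna\" /></AddressBook>"));
    }
}
```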
       Alternative Address Book
• What about this version:
  <Simon email="sml@essex.ac.uk" />
  <Anna email="thewife@gmail.com" />
• Is it well formed?
• Is it valid (with respect to the previous DTD)?
• Is it well designed?
         Creating XML with JDOM
                (JDOM – Java API for XML)
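// JDOM 1.x package names; JDOM 2 uses org.jdom2 instead
import org.jdom.Element;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

public class MakeAddressBook {   // class name is illustrative
    public static void main(String[] args) throws Exception {
        Element root = new Element("AddressBook");
        Element title = new Element("Title");
        title.setText("Simon's address book");
        Element e1 = new Element("Person");
        Element e2 = new Element("Person");
        e1.setAttribute("name", "Simon");
        e1.setAttribute("email", "sml@essex.ac.uk");
        e2.setAttribute("name", "Anna");
        // attach the children to the root in document order
        root.addContent(title);
        root.addContent(e1);
        root.addContent(e2);
        XMLOutputter out = new XMLOutputter(Format.getPrettyFormat());
        out.output(root, System.out);
    }
}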
       Produces the following:
<AddressBook>
  <Title>Simon's address book</Title>
  <Person name="Simon"
    email="sml@essex.ac.uk" />
  <Person name="Anna" />
</AddressBook>
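
JDOM is a separate library. For comparison, a sketch of the same document built with only the DOM and transformer APIs that ship with the JDK; this is a swapped-in technique, not the JDOM approach shown above, and the class and helper names are hypothetical:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of the original slides.
class AddressBookDom {

    // Builds the address book tree in memory.
    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement("AddressBook");
            doc.appendChild(root);
            Element title = doc.createElement("Title");
            title.setTextContent("Simon's address book");
            root.appendChild(title);
            root.appendChild(person(doc, "Simon", "sml@essex.ac.uk"));
            root.appendChild(person(doc, "Anna", null));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates one Person element; the email attribute is optional.
    static Element person(Document doc, String name, String email) {
        Element p = doc.createElement("Person");
        p.setAttribute("name", name);
        if (email != null) {
            p.setAttribute("email", email);
        }
        return p;
    }

    // Serializes the tree to indented text without an XML declaration.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(build()));
    }
}
```

The exact whitespace and attribute order of the output depend on the serializer, so they may differ from the JDOM output.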
     Reading and Processing XML
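import java.io.*;
import java.util.List;
import org.jdom.*;
import org.jdom.input.SAXBuilder;

public class ReadAddressBook {   // class name is illustrative
    public static void main(String[] args) throws Exception {
        String infile = args[0];
        SAXBuilder builder = new SAXBuilder();
        InputStream is = new FileInputStream(infile);
        Document doc = builder.build(is);
        Element root = doc.getRootElement();
        // now print the name attribute of each Person in plain text
        for (Element el : (List<Element>) root.getChildren("Person")) {
            System.out.println(el.getAttribute("name"));
        }
    }
}
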
      ----------- Produces --------------------------
[Attribute: name="Simon"]
[Attribute: name="Anna"]
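
JDOM's SAXBuilder builds its tree from SAX parser events. A sketch of consuming those events directly with the SAX API bundled with the JDK, collecting the Person names without building a tree; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original slides.
class PersonNames {

    // Returns the name attribute of every Person element, in document order.
    static List<String> read(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        // Called once per start tag as the parser streams through.
                        if (qName.equals("Person")) {
                            names.add(atts.getValue("name"));
                        }
                    }
                });
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String book = "<AddressBook>"
            + "<Title>Simon's address book</Title>"
            + "<Person name=\"Simon\" email=\"sml@essex.ac.uk\" />"
            + "<Person name=\"Anna\" />"
            + "</AddressBook>";
        System.out.println(read(book));   // [Simon, Anna]
    }
}
```

Streaming suits large documents; a tree model such as JDOM is more convenient when the data has to be navigated or modified.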
                     XHTML
• XHTML is a stricter version of HTML
• Tags must have begin/end pairs
    – E.g. <p> … </p> and not just <p>
• Tags must be properly nested
• Attribute values must be enclosed in quotes
• Document must have a single root element
• MS FrontPage can apply XML formatting rules to
  make pages comply with this
• Well-formed pages are then very easy to manipulate
        Example Benefit of XHTML
•   Web site construction
•   If all pages are in XHTML
•   Can be edited with WYSIWYG editor
•   And manipulated with JSP / JDOM / E4X
•   I use this method for web site construction:
    – Example: http://cigames.org
• This allows a single master page
    – To include selected parts of content pages
• Less effort than adding <jsp:include> tags to each
  content page
• Simple instance of MVC architecture
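
The master-page idea can be sketched with the JDK's DOM and XPath APIs. This is an illustration under stated assumptions, not the code behind the site mentioned above: the id "content" marking the slot in the master page, the choice to copy the whole body of the content page, and the class name are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original slides.
class MasterPage {

    static Document parse(String xhtml)
            throws SAXException, IOException, ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));
    }

    // Copies the children of the content page's <body> into the master
    // page element whose id is "content" (an assumed convention).
    static String merge(String masterXhtml, String contentXhtml) {
        try {
            Document master = parse(masterXhtml);
            Document content = parse(contentXhtml);
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node slot = (Node) xpath.evaluate(
                "//*[@id='content']", master, XPathConstants.NODE);
            Node body = (Node) xpath.evaluate(
                "/html/body", content, XPathConstants.NODE);
            for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
                slot.appendChild(master.importNode(n, true)); // deep copy
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(master), new StreamResult(out));
            return out.toString();
        } catch (SAXException | IOException | ParserConfigurationException
                 | XPathExpressionException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String master =
            "<html><body><h1>Site</h1><div id=\"content\"/></body></html>";
        String page = "<html><head><title>Page</title></head>"
            + "<body><p>Hello</p></body></html>";
        System.out.println(merge(master, page));
    }
}
```

This only works because both pages are well-formed; ordinary HTML with unclosed tags would fail at the parse step. The sample pages carry no namespace declaration, which keeps the XPath expressions simple.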
             XML and JavaScript
• Can use XML within JavaScript
   – JavaScript is also known as ECMAScript
• Easiest to use E4X
   – ECMAScript for XML (ECMA-357)
• Can treat XML fragments as native parts of the
  language
• Supported in Rhino (JavaScript implemented in Java)
   – Hence can be executed on the server / stand-alone
   – But on server, does not have access to Browser DOM
• And in Firefox (e.g. 2.1)
   – Enables concise generation of HTML
                E4X: Native XML
• Fantastic!
   – Write XML mark-up
   – Then directly instantiate object models of the XML
   – And navigate using ‘dot’ notation etc.
• Following examples: adapted from the
  e4x_example.js file that comes with the Rhino
  distribution
• Can be executed on Server using Rhino
• Or in a compatible web browser (e.g. Firefox)
• Note that ‘print’ is a utility method defined in the Shell
  program that comes with Rhino
         Making an XML Structure
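// customer and item values other than the description are illustrative
var order = <order>
      <customer><firstname>John</firstname><lastname>Doe</lastname></customer>
      <item>
      <description>Big Screen Television</description>
      <price>1299.99</price><quantity>1</quantity>
      </item>
    </order>;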
      Accessing with ‘.’ Notation
// Construct the full customer name
var name = order.customer.firstname + " " +
           order.customer.lastname;

// Calculate the total price
var total = order.item.price * order.item.quantity;
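
For comparison, the same two look-ups can be written in Java with XPath (javax.xml.xpath). A sketch only: the element names follow the E4X code above, the values other than the description are illustrative assumptions, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original slides.
class OrderQueries {

    // Sample order; values other than the description are assumed.
    static final String ORDER =
        "<order>"
        + "<customer><firstname>John</firstname><lastname>Doe</lastname></customer>"
        + "<item><description>Big Screen Television</description>"
        + "<price>1299.99</price><quantity>1</quantity></item>"
        + "</order>";

    // Counterpart of order.customer.firstname + " " + order.customer.lastname
    static String customerName(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(
                "concat(/order/customer/firstname, ' ', /order/customer/lastname)",
                new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counterpart of order.item.price * order.item.quantity
    static double total(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate(
                "/order/item/price * /order/item/quantity",
                new InputSource(new StringReader(xml)),
                XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(customerName(ORDER));
        System.out.println(total(ORDER));
    }
}
```

The path expressions sit inside strings, so mistakes in them only show up at run time, whereas E4X parses them as part of the language.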
      Construction with Expressions
Contents of curly braces are evaluated as expressions e.g.
var   tagname = "name";
var   attributename = "id";
var   attributevalue = 5;
var   content = "Fred";

var x = <{tagname} {attributename}={attributevalue}>{content}</{tagname}>;

Exercise: write the XML that this produces (i.e. that
  ‘x’ is bound to after executing the above).
                    Data Selection
var e = <employees>
   <employee id="1"><name>Joe</name><age>20</age></employee>
   <employee id="2"><name>Sue</name><age>30</age></employee>
   <employee id="3"><name>Simon</name><age>25</age></employee>
</employees>;

// get all the names in e
print("All the employee names are:\n" + e..name);

// employees with name Joe
print("The employee named Joe is:\n" + e.employee.(name ==
   "Joe"));

// employees with id's 1 & 2
print("Employees with ids 1 & 2:\n" + e.employee.(@id == 1 ||
   @id == 2));

// name of employee with id 1
print("Name of the employee with ID=1: " + e.employee.(@id ==
   1).name);
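All the employee names are:
<name>Joe</name>
<name>Sue</name>
<name>Simon</name>
The employee named Joe is:
<employee id="1">
  <name>Joe</name>
  <age>20</age>
</employee>
Employees with ids 1 & 2:
<employee id="1">
  <name>Joe</name>
  <age>20</age>
</employee>
<employee id="2">
  <name>Sue</name>
  <age>30</age>
</employee>
Name of the employee with ID=1: Joe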
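
The E4X selections above map closely onto XPath. A sketch of equivalent queries from Java, using the same employee data; the class and method names are hypothetical, and the method returns the text of each matched node rather than its markup:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original slides.
class EmployeeQueries {

    static final String EMPLOYEES =
        "<employees>"
        + "<employee id=\"1\"><name>Joe</name><age>20</age></employee>"
        + "<employee id=\"2\"><name>Sue</name><age>30</age></employee>"
        + "<employee id=\"3\"><name>Simon</name><age>25</age></employee>"
        + "</employees>";

    // Evaluates an XPath expression against EMPLOYEES and returns the
    // text content of each selected node, in document order.
    static List<String> select(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(EMPLOYEES)),
                          XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // e..name
        System.out.println(select("//name"));                      // [Joe, Sue, Simon]
        // e.employee.(name == "Joe"), here reduced to the id attribute
        System.out.println(select("/employees/employee[name='Joe']/@id"));
        // e.employee.(@id == 1 || @id == 2), here reduced to the names
        System.out.println(select("/employees/employee[@id=1 or @id=2]/name"));
        // e.employee.(@id == 1).name
        System.out.println(select("/employees/employee[@id=1]/name")); // [Joe]
    }
}
```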
// calculate the average age of all employees
// based on previous employee data
var totalAge = 0.0;
var nEmps = 0.0;
for each (var i in e.employee) {
    totalAge += 1.0 * i.age;
    nEmps += 1;
}
print("Average age of all employees: " +
       totalAge / nEmps);

Average age of all employees: 25
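
The same average can be computed without an explicit loop by letting XPath do the arithmetic. A sketch from Java, assuming the employee data shown earlier; the class and method names are hypothetical:

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original slides.
class AverageAge {

    static final String EMPLOYEES =
        "<employees>"
        + "<employee id=\"1\"><name>Joe</name><age>20</age></employee>"
        + "<employee id=\"2\"><name>Sue</name><age>30</age></employee>"
        + "<employee id=\"3\"><name>Simon</name><age>25</age></employee>"
        + "</employees>";

    // sum() adds the numeric value of every age element and count()
    // gives the number of employees; "div" is XPath's division operator.
    static double averageAge(String xml) {
        try {
            return (Double) XPathFactory.newInstance().newXPath().evaluate(
                "sum(/employees/employee/age) div count(/employees/employee)",
                new InputSource(new StringReader(xml)),
                XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Average age of all employees: " + averageAge(EMPLOYEES));
    }
}
```

For an empty employees element the division is 0 div 0, which XPath defines as NaN rather than an error.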
       Simplified HTML Generation
        (Tested in Firefox Browser)
<script type="text/javascript">
    function myFunc() {
        var el = document.getElementById('test');
        // header row reconstructed; the cell labels are assumed
        var table = <table><tr>
            <th>Celsius</th> <th>Fahrenheit</th> </tr></table>;
        var from = 10;
        var to = 12;
        for (var i=from; i<=to; i++) {
          table.tr += <tr> <td> {i} </td>
            <td> { i * 9.0/5 + 32 } </td> </tr>;
        }
        el.innerHTML = table;
    }
</script>
    Some Exciting XML Applications
• Word processing
  – (e.g. Syntext Serna)
• Web Application Programming
  – XForms
• News feeds
  – RSS (Really Simple Syndication)
• Mathematics
  – MathML
                    Summary
• XML
   –   Simple yet powerful
   –   Will become more and more widespread
   –   Java APIs such as JDOM allow easy XML processing in Java
   –   E4X – even easier!
   –   Also see AJAX
• Many challenges ahead
   – In order to get greatest benefit
   – Common standards are required
   – Is there a common XML standard for a delivery address yet? UK?
                    Exercise
•   Design an XML markup for a simple product catalogue
•   Each product has a name, price and manufacturer
•   Each manufacturer has a name and homepage-URL
•   Include sample XML + DTD in your solution
