
XML, DOM and Visitor Design Pattern by sus16053


Markup languages
•   A markup language is used to tell a printer (a person!) how to lay out text on the page.
•   SGML: from about 1980
    •   usual complaint: “too heavyweight” means “hard”
•   HTML: much looser, therefore many users
•   XML: structure allows description of data
    •   need description of “tags”

An HTML example
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
<html xmlns="" xml:lang="en" lang="en">
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
             CSC207H: E5
    </head>
        <h1>
             CSC207H: E5
             <strong>Due date: 10:00 a.m., Thursday, March 11, 2010.</strong>

From HTML to XML
•   Factors leading to the creation of XML
•   Problems with HTML
    •   primarily presentation
    •   hard to derive meaning from the markup
    •   fixed tag set
•   Web browsers were being viewed as potential application platforms
Basic Format
•   Element: <tag>content</tag>
    •   basic unit
    •   tag name defines what the content is
    •   opening and closing tags enclose content
•   Attribute: Information about the data
    •   Attribute names are usually adjectives
    •   Stored as attribute="value" pairs:
        •   <tag colour="red">
        •   </tag>

Rules for well-formed XML
•   Elements that contain data must have start and end tags
•   Empty tags must be closed
    •   <br /> or <br></br>
•   Elements must not overlap
    •   Bad Nesting: <trunk> <branch> </trunk> </branch>
•   All attribute values must be wrapped in quotes
    •   <a href="newpage.html">
•   XML is case sensitive (unlike HTML): <TAG> and <Tag> are treated differently.
    •   Convention: use lower case.
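
The well-formedness rules above can be tried out with a parser. The following is a minimal sketch, assuming Python's standard xml.dom.minidom; the helper name is_well_formed and the sample strings are made up for illustration and are not from the slides.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        parseString(text)
        return True
    except ExpatError:
        return False


# Hypothetical samples, one per rule on the slide.
samples = {
    "closed empty tag": "<p>line<br /></p>",
    "unclosed empty tag": "<p>line<br></p>",
    "bad nesting": "<trunk><branch></trunk></branch>",
    "unquoted attribute": "<a href=newpage.html>link</a>",
    "case mismatch": "<TAG>text</Tag>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first sample should be accepted; each of the others breaks exactly one of the rules listed.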

More Rules
•   A document begins with:
    •   an XML Declaration
        •   <?xml version="1.0" encoding="UTF-8"?>
    •   and perhaps a DocType Declaration:
        •   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        •   "">
•   Root element immediately follows; encloses entire content of the document.
        •   <book>
        •     everything that’s part of the book
        •   </book>

Document Object Model (DOM)
•   Cross-language API for representing XML documents as trees
    •   Easier to manipulate than strings or streams
    •   But may require a lot of memory
•   Several implementations in Java
    •   E.g., org.jdom
•   In Python, xml.dom is standard
    •   xml.dom.minidom doesn’t have everything, but is easy to use and fast.
Tree Structure
Let’s look at this document:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  <body>
      <p>A <em>word</em></p>

DOM Rules
•   Every document becomes an object of type Document
•   This has a single child of type Element
    •   The root element of the document
•   Its children may be:
    •   Other elements
    •   Text objects
    •   Other things that we won't worry about
•   Note: white space is preserved
    •   For example, the newlines in the previous slide
•   But comments are not
Using JDom
import java.util.Iterator;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;

public static void main(String[] args) {
    try {
        String filename = args[0];

        // Build the DOM tree
        SAXBuilder builder = new SAXBuilder();
        // Assumed call: the builder is given the file name
        Document doc = builder.build(filename);

        // Show top-level elements (next slide)

    } catch (Exception e) {
        System.err.println(e);
    }
}

Iterate over children
// Show top-level elements
// Get root
Element root = doc.getRootElement();
// Get all children (excluding text)
// (jdom isn’t 1.5-happy.)
Iterator ic = root.getChildren().iterator();
while (ic.hasNext()) {
    Element elt = (Element) ic.next();
    System.out.println(elt.getName());
}
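
For comparison, the same "list the top-level elements" step can be sketched with Python's xml.dom.minidom. The function name top_level_names and the sample input are assumptions for illustration. Unlike JDOM's getChildren(), minidom's childNodes also includes Text nodes, so they are filtered out explicitly.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def top_level_names(xml_text):
    """Return the tag names of the root's element children, skipping text."""
    root = parseString(xml_text).documentElement
    return [child.tagName
            for child in root.childNodes
            if child.nodeType == Node.ELEMENT_NODE]


# Hypothetical input, not taken from the slides.
sample = "<book><h1>Title</h1><p>One</p><p>Two</p></book>"
for name in top_level_names(sample):
    print(name)
```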
Input and output
<?xml version="1.0" ?>
<h1>First heading</h1>
paragraph.</em></p>

[Tree diagram: Document at the top, then book, then h1, p, p, with an em under each p]

book
  h1
       em
  p

The Visitor Pattern
•   Often want to operate on a tree recursively
    •   Count elements, search for text that matches a pattern, etc.
•   Mechanics of traversing are the same every time
•   So build a generic visitor that knows how to traverse the tree
    •   Give it do-nothing methods that are invoked at specific times during traversal
    •   Users derive from this class and override the methods they're interested in
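
The visitor idea can be sketched in a few lines of Python over an xml.dom.minidom tree. This is one possible shape, not the course's implementation: the class names Visitor and ElementCounter, the hook names enter_element, leave_element and text, and the sample input are all assumptions made for illustration.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


class Visitor:
    """Generic traversal; the hook methods do nothing by default."""

    def visit(self, node):
        if node.nodeType == Node.ELEMENT_NODE:
            self.enter_element(node)
            for child in node.childNodes:
                self.visit(child)
            self.leave_element(node)
        elif node.nodeType == Node.TEXT_NODE:
            self.text(node)

    def enter_element(self, element):
        pass

    def leave_element(self, element):
        pass

    def text(self, node):
        pass


class ElementCounter(Visitor):
    """Counts elements by overriding only the hook it cares about."""

    def __init__(self):
        self.count = 0

    def enter_element(self, element):
        self.count += 1


# Hypothetical input, not taken from the slides.
sample = "<book><h1>First</h1><p>A <em>word</em></p></book>"
counter = ElementCounter()
counter.visit(parseString(sample).documentElement)
print(counter.count)
```

The traversal logic lives only in Visitor.visit; a subclass that searches text for a pattern would override text instead and leave the element hooks alone.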
