DOM by niusheng11


3.2 Document Object Model (DOM)

   How to access structured documents uniformly in
    parsers, browsers, editors, databases, ...?
   Overview of the W3C DOM Spec
            » Level 1, W3C Rec, Oct. 1998
            » Level 2, W3C Rec, Nov. 2000
            » Level 3 in progress (as 21 modules!);
              Validation, Core, and Load and Save
              Recommendations (Spring 2004)




SDPL 2007                 3.2: Document Object Model   1
DOM: What is it?
   An object-based, language-neutral API for
    XML and HTML documents
    – Allows programs and scripts to build, access, and
      modify documents
    – Supports the development of
               querying, filtering,
               transformation, formatting etc.
      applications on top of DOM implementations
   In contrast to “Serial Access XML”, one could think
    of DOM as “Directly Obtainable in Memory”
DOM structure model
   Based on O-O concepts:
    – methods (to access or change object’s state)
    – interfaces (declaration of a set of methods)
    – objects (encapsulation of data and methods)
   Roughly similar to the XSLT/XPath data model (to be
    discussed later): a syntax tree
    – Tree structure implied by abstract relationships defined
      by the API; Data structures of an implementation may
      differ (but hardly do(?))




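DOM structure model

<invoice form="00"
         type="estimated">
  <addressdata>
    <name>John Doe</name>
    <address>
      <streetaddress>Pyynpolku 1
      </streetaddress>
      <postoffice>70460 KUOPIO
      </postoffice>
    </address>
  </addressdata>
  ...

[Figure: DOM tree of the invoice document. The Document node has the
Element invoice as its child. invoice carries a NamedNodeMap with the
attributes form="00" and type="estimated", and has the child Element
addressdata (followed by ...). addressdata has the children name
(Text "John Doe") and address; address has the children streetaddress
(Text "Pyynpolku 1") and postoffice (Text "70460 KUOPIO").
Legend: Document, Element, Text, NamedNodeMap.]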
Structure of DOM Level 1

I: DOM Core Interfaces
    – Fundamental interfaces
            » basic interfaces: Document, Element, Attr, Text, ...
    – "Extended" (XML specific) interfaces
            » CDATASection, DocumentType, Notation, Entity,
              EntityReference, ProcessingInstruction
II: DOM HTML Interfaces
    – more convenient access to HTML documents
    – (we'll ignore these)


DOM Level 2
    – Level 1: basic representation and manipulation of
      document structure and content
      (No access to the contents of a DTD)
   DOM Level 2 adds
    – support for namespaces
    – accessing elements by ID attribute values
    – optional features (we’ll skip these)
        » interfaces to document views and style sheets
        » an event model (for, say, user actions on elements)
        » methods for traversing the document tree and manipulating
          regions of document (e.g., selected by the user of an editor)
    – Load/Save of documents not specified (until Level 3)
DOM Language Bindings

   Language-independence:
    – DOM interfaces are defined using the OMG Interface
      Definition Language (IDL; defined in the CORBA
      specification)
   Language bindings (implementations of
    interfaces) defined in the Recommendation for
    – Java (See the Java API doc) and
    – ECMAScript (standardised JavaScript)


Core Interfaces: Node & its variants

Node
    – Document
    – DocumentFragment
    – Element
    – Attr
    – CharacterData
            » Comment
            » Text
                 CDATASection (an “Extended interface”)
    – “Extended interfaces”: DocumentType, Notation, Entity,
      EntityReference, ProcessingInstruction
DOM interfaces: Node

Node
    getNodeType, getNodeName, getNodeValue
    getOwnerDocument
    getParentNode
    hasChildNodes, getChildNodes
    getFirstChild, getLastChild
    getPreviousSibling, getNextSibling
    hasAttributes, getAttributes
    appendChild(newChild)
    insertBefore(newChild,refChild)
    replaceChild(newChild,oldChild)
    removeChild(oldChild)

[Figure: DOM tree of the invoice example (Document, Element, Text and
NamedNodeMap nodes), illustrating the Node interface.]
    Type and Name of a Node

    node.getNodeType():
     short int constants 1, 2, …, 12 for
        Node.ELEMENT_NODE,
        Node.ATTRIBUTE_NODE,
        Node.TEXT_NODE, …
    node.getNodeName()
      – for an Element: the tag name (same as Element.getTagName())
      – for an Attr: the name of the attribute
      – for anonymous nodes:
         "#text", "#document", "#comment" etc
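A minimal sketch of how getNodeType() and getNodeName() behave for different kinds of nodes, assuming the JAXP parser bundled with the JDK. The class NodeTypeDemo and its helpers parse and describe are illustrative names, not part of the DOM API:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTypeDemo {
    // Hypothetical helper: parse an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: render a node as "nodeName/nodeType".
    static String describe(Node n) {
        return n.getNodeName() + "/" + n.getNodeType();
    }

    public static void main(String[] args) {
        Document doc = parse("<invoice form=\"00\">John Doe<!-- note --></invoice>");
        Element root = doc.getDocumentElement();
        System.out.println(describe(doc));                           // #document/9
        System.out.println(describe(root));                          // invoice/1
        System.out.println(describe(root.getAttributeNode("form"))); // form/2
        System.out.println(describe(root.getFirstChild()));          // #text/3
        System.out.println(describe(root.getLastChild()));           // #comment/8
    }
}
```

The numbers are the values of Node.DOCUMENT_NODE, Node.ELEMENT_NODE, Node.ATTRIBUTE_NODE, Node.TEXT_NODE and Node.COMMENT_NODE.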
    The Value of a Node

    node.getNodeValue()
      – content of a text node,
        value of attribute, …;
        null for an Element (!!)
      – (in XSLT/XPath the value of a node is its full textual
        content)
      – DOM 3 gives access to full textual content with the
        method
          node.getTextContent()
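The difference between getNodeValue() and getTextContent() can be sketched as follows, assuming a DOM Level 3 implementation such as the one in the JDK. NodeValueDemo and parse are illustrative names:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeValueDemo {
    // Hypothetical helper: parse an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<name><given>John</given> <family>Doe</family></name>");
        Element name = doc.getDocumentElement();
        Node given = name.getFirstChild();          // the <given> element

        System.out.println(name.getNodeValue());                  // null
        System.out.println(given.getFirstChild().getNodeValue()); // John
        System.out.println(name.getTextContent());                // John Doe
    }
}
```

An Element has no value of its own; its text lives in Text children, which getTextContent() concatenates in document order.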



Object Creation in DOM
   Each DOM Node n belongs to a Document:
    n.getOwnerDocument()
   Objects implementing interface X are created
    by factory methods
              doc.createX(…) ,
    where doc is a Document object. E.g:
      doc.createElement("A"),
      doc.createAttribute("href"),
      doc.createTextNode("Hello!")
   Loading & saving specified in DOM3 (or via
    implementation-specific methods, or JAXP)
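A small sketch of the factory-method pattern, assuming an empty Document obtained from a JAXP DocumentBuilder. FactoryDemo, newDocument and buildLink are illustrative names, and the URL is a placeholder:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class FactoryDemo {
    // Hypothetical helper: create an empty Document via JAXP.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Build <A href="...">text</A> using only factory methods of doc.
    static Element buildLink(Document doc, String href, String text) {
        Element a = doc.createElement("A");
        Attr hrefAttr = doc.createAttribute("href");
        hrefAttr.setValue(href);
        a.setAttributeNode(hrefAttr);
        a.appendChild(doc.createTextNode(text));
        return a;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element a = buildLink(doc, "http://example.org/", "Hello!");
        // A created node already knows its owner, but is not in the tree yet.
        System.out.println(a.getOwnerDocument() == doc);   // true
        System.out.println(a.getParentNode() == null);     // true
        doc.appendChild(a);
        System.out.println(doc.getDocumentElement() == a); // true
    }
}
```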

DOM interfaces: Document

Node
  Document
    getDocumentElement
    getElementById(IdVal)
    getElementsByTagName(tagName)
    createElement(tagName)
    createTextNode(data)

[Figure: DOM tree of the invoice example (Document, Element, Text and
NamedNodeMap nodes), illustrating the Document interface.]
DOM interfaces: Element

Node
  Element
    getTagName()
    hasAttribute(name)
    getAttribute(name)
    setAttribute(attrName, value)
    removeAttribute(name)
    getElementsByTagName(name)

[Figure: DOM tree of the invoice example, here showing the elements
invoice, invoicepage, addressee, addressdata, name, address,
streetaddress and postoffice, with the attributes form="00" and
type="estimatedbill" in a NamedNodeMap.]
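The attribute methods of Element can be tried out with a sketch like the following. AttributeDemo and its helper invoice are illustrative names, and the invoice element mirrors the running example:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeDemo {
    // Hypothetical helper: parse a one-element invoice document.
    static Element invoice() {
        try {
            String xml = "<invoice form=\"00\" type=\"estimated\"/>";
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element inv = invoice();
        System.out.println(inv.getTagName());                         // invoice
        System.out.println(inv.hasAttribute("form"));                 // true
        System.out.println(inv.getAttribute("type"));                 // estimated
        // A missing attribute yields the empty string, not null.
        System.out.println(inv.getAttribute("missing").isEmpty());    // true

        inv.setAttribute("type", "final");   // overwrite an existing attribute
        inv.removeAttribute("form");         // drop an attribute
        System.out.println(inv.getAttribute("type"));                 // final
        System.out.println(inv.getAttributes().getLength());          // 1
    }
}
```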
Text Content Manipulation in DOM

   for an object c that implements the
    CharacterData interface
    (Text, Comments, CDATASections):
    –   c.substringData(offset, count)
    –   c.appendData(string)
    –   c.insertData(offset, string)
    –   c.deleteData(offset, count)
    –   c.replaceData(offset, count, string)
        ( = c.deleteData(offset, count);
            c.insertData(offset, string) )
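A short sketch of these methods applied to a Text node; CharacterDataDemo and newText are illustrative names. Offsets and counts are measured in 16-bit units, like Java string indexes:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Text;

class CharacterDataDemo {
    // Hypothetical helper: create a detached Text node with the given data.
    static Text newText(String data) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .newDocument().createTextNode(data);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Text t = newText("Hello world");
        System.out.println(t.substringData(0, 5)); // Hello (t is unchanged)

        t.appendData("!");
        System.out.println(t.getData());           // Hello world!

        t.insertData(5, ",");
        System.out.println(t.getData());           // Hello, world!

        t.deleteData(0, 7);
        System.out.println(t.getData());           // world!

        t.replaceData(0, 5, "DOM");
        System.out.println(t.getData());           // DOM!
    }
}
```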


Additional Core Interfaces (1)

   NodeList for ordered lists of nodes
    – e.g. from Node.getChildNodes() or
      Element.getElementsByTagName("name")
            » all descendant elements of type "name" in document
              order ("*" matches any element type)
   Accessing a specific node, or iterating over all
    nodes of a NodeList:
     – E.g., to process all children of node:
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++)
          process(children.item(i));
Additional Core Interfaces (2)

   NamedNodeMap for unordered sets of nodes
    accessed by their name:
    – e.g. from Node.getAttributes()
   NodeLists and NamedNodeMaps are "live":
    – updates of the document structure are reflected to
      their contents
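    – e.g., this would delete only every other child of node n:
        NodeList cList = n.getChildNodes();
        for (int i=0; i<cList.getLength(); i++)
              n.removeChild(cList.item(i));

            » That’s strange! (What happens?) After each removal the
              remaining children move down one position in the live
              list, so the next index skips a node.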
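The effect of a live NodeList can be reproduced with a sketch like this one, which contrasts the index-based loop with a loop that always removes the first child. LiveNodeListDemo and its helper methods are illustrative names:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LiveNodeListDemo {
    // Hypothetical helper: parse an XML string and return its root element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // The loop from the slide: the list shrinks while the index grows.
    static void removeEveryOther(Node n) {
        NodeList cList = n.getChildNodes();
        for (int i = 0; i < cList.getLength(); i++)
            n.removeChild(cList.item(i));
    }

    // One way to really remove all children: always take the first one.
    static void removeAll(Node n) {
        while (n.hasChildNodes())
            n.removeChild(n.getFirstChild());
    }

    // Names of the children of n, separated by spaces.
    static String childNames(Node n) {
        StringBuilder sb = new StringBuilder();
        NodeList cList = n.getChildNodes();
        for (int i = 0; i < cList.getLength(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(cList.item(i).getNodeName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Element n = root("<n><a/><b/><c/><d/></n>");
        removeEveryOther(n);
        System.out.println(childNames(n));      // b d

        Element m = root("<n><a/><b/><c/><d/></n>");
        removeAll(m);
        System.out.println(m.hasChildNodes());  // false
    }
}
```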

DOM: XML Implementations
   Java-based parsers
    e.g. Apache Xerces, Apache Crimson, …
   In MS IE browser: COM programming interfaces for
    C/C++ and Visual Basic; ActiveX object
    programming interfaces for script languages
   Perl: XML::DOM (Implements DOM Level 1)
   Others? APIs for other applications than parsers?
    – Vendors of different kinds of systems have participated in
      the W3C DOM WG




 A Java-DOM Example
    Command-line tool RegListMgr for
     maintaining a course registration list
     – with single-letter commands for listing, adding,
       updating and deleting student records
    Example:
$ java RegListMgr reglist.xml
Document loaded successfully
> l                           list the contents
…
40: Tero Ulvinen, TKM1, tero@fake.addr.fi, 2
41: heli viinikainen, tkt5, heli@fake.addr.fi, 1


Registration list: the XML file

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE reglist SYSTEM "reglist.dtd">
<reglist lastID="41">
  <student id="RDK1">
    <name><given>Juho</given>
       <family>Ahopelto</family></name>
    <branchAndYear>TKT4</branchAndYear>
    <email>juho@fake.addr.fi</email>
    <group>2</group>
  </student>
  <!-- … and the other students … -->
</reglist>
Registration List: the DTD
<!ELEMENT reglist (student*)>
<!ATTLIST reglist
          lastID CDATA #REQUIRED >
<!ELEMENT student
         (name, branchAndYear, email, group)>
<!ATTLIST student
          id ID #REQUIRED >
<!ELEMENT name (given, family)>
<!ELEMENT given (#PCDATA)>
<!-- … and the same for family,
  branchAndYear, email, and group -->

Loading and Saving the RegList

   Loading of the registration list into DOM
    Document doc implemented with a
    JAXP DocumentBuilder
    – assume this has been done: doc is a
      handle to the Document
   Saving implemented with a
    JAXP Transformer
   … to be discussed later
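As a rough preview, loading and saving with JAXP might look like the sketch below. It is not the RegListMgr code: LoadSaveDemo, load and save are illustrative names, and the document is read from and written to strings, without a DOCTYPE, to keep the example self-contained:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LoadSaveDemo {
    // Load: a JAXP DocumentBuilder parses XML text into a DOM Document.
    static Document load(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Save: an identity Transformer copies the DOM tree to a text result.
    static String save(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = load("<reglist lastID=\"41\"/>");
        doc.getDocumentElement().setAttribute("lastID", "42");
        System.out.println(save(doc));
    }
}
```

With files, the same calls accept a java.io.File: DocumentBuilder.parse(File) for loading and new StreamResult(File) for saving.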
Listing student records (1)
NodeList students =
  doc.getElementsByTagName("student");
for (int i=0; i<students.getLength(); i++)
  showStudent((Element) students.item(i));
private void showStudent(Element student) {
  // Collect relevant sub-elements (getNextSibling() assumes that the
  // parser drops whitespace-only text between the elements):
  Node given =
    student.getElementsByTagName("given").item(0);
  Node family = given.getNextSibling();
  Node bAndY = student.
    getElementsByTagName("branchAndYear").item(0);
  Node email = bAndY.getNextSibling();
  Node group = email.getNextSibling();
Listing student records (2)

  // Method showStudent continues:
  System.out.print(
     student.getAttribute("id").substring(3));
  System.out.print(": " +
     given.getFirstChild().getNodeValue() );
     // or given.getTextContent() with DOM3
  // .. similarly access and display the
  // value of family, bAndY, email, and group
  // …

} // showStudent
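Note that showStudent relies on getNextSibling() returning the next element. If the parser keeps whitespace between elements (the default for a non-validating JAXP parser), the next sibling is a Text node instead. A more defensive variant could skip non-element siblings, as in this sketch; SiblingDemo and nextElementSibling are illustrative names, not part of the DOM API:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SiblingDemo {
    // Hypothetical helper: parse an XML string and return its root element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Return the next sibling that is an Element, or null if there is none.
    static Element nextElementSibling(Node n) {
        for (Node s = n.getNextSibling(); s != null; s = s.getNextSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE)
                return (Element) s;
        }
        return null;
    }

    public static void main(String[] args) {
        Element name = root(
            "<name><given>Juho</given>\n  <family>Ahopelto</family></name>");
        Node given = name.getFirstChild();
        System.out.println(given.getNextSibling().getNodeName());    // #text
        System.out.println(nextElementSibling(given).getTagName());  // family
    }
}
```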

 Adding New Records

    Example:                                   add students
> a
First name (or <return> to finish): Antti
Last name: Ahkera
Branch&year: tkt3
email: antti@fake.addr.fi
group: 2
First name (or <return> to finish):
Finished adding records
> l
 …
41: heli viinikainen, tkt5, heli@fake.addr.fi, 1
42: Antti Ahkera, tkt3, antti@fake.addr.fi, 2
Implementing addition of records (1)

Element rootElem = doc.getDocumentElement();
String lastID = rootElem.getAttribute("lastID");
int lastIDnum = java.lang.Integer.parseInt(lastID);
System.out.print(
     "First name (or <return> to finish): ");
String firstName =
           terminalReader.readLine().trim();
while (firstName.length() > 0) {
  // Get the next unused ID:
  String ID = "RDK" + Integer.toString(++lastIDnum);
  // … Read values lastName, bAndY, email,
  // and group from the terminal, and then ...

Implementing addition of records (2)

  Element newStudent =
    newStudent(doc, ID, firstName, lastName,
               bAndY, email, group);
  rootElem.appendChild(newStudent);
  System.out.print(
      "First name (or <return> to finish): ");
  firstName = terminalReader.readLine().trim();
} // while firstName.length() > 0
// Update the last ID used:
String newLastID =
      java.lang.Integer.toString(lastIDnum);
rootElem.setAttribute("lastID", newLastID);
System.out.println("Finished adding records");
Creating new student records (1)

  private Element
      newStudent(Document doc, String ID,
      String fName, String lName, String bAndY,
      String email, String grp) {
   Element stu = doc.createElement("student");
   stu.setAttribute("id", ID);
   Element newName = doc.createElement("name");
   Element newGiven = doc.createElement("given");
   newGiven.appendChild(doc.createTextNode(fName));
   Element newFamily = doc.createElement("family");
   newFamily.appendChild(doc.createTextNode(lName));
   newName.appendChild(newGiven);
   newName.appendChild(newFamily);
   stu.appendChild(newName);
Creating new student records (2)

  // method newStudent(…) continues:
  Element newBr =
      doc.createElement("branchAndYear");
  newBr.appendChild(doc.createTextNode(bAndY));
  stu.appendChild(newBr);
  Element newEmail = doc.createElement("email");
  newEmail.appendChild(doc.createTextNode(email));
  stu.appendChild(newEmail);
  Element newGrp = doc.createElement("group");
  newGrp.appendChild(doc.createTextNode(grp));
  stu.appendChild(newGrp);
  return stu;
} // newStudent
Updates and Deletions

 Updates and deletions implemented
  similarly, by manipulating the DOM
  structures
 To be treated in the exercises




Summary of XML APIs so far
   Give applications access to the structure and
    contents of XML documents
   Event-based APIs (e.g. SAX)
    – notify application through parsing events
    – efficient
   Object-model (or tree) based APIs (e.g. DOM)
    – provide a full parse tree
    – more convenient, but require a lot of resources for
      large documents
   Major parsers support both SAX and DOM
    – used through proprietary methods
    – used through JAXP          (-> next)

								