									                          Tutorial:
               Introduction to XML and Java:
                   XML, dom4j and XPath

                          Eran Toch
              Methodologies in the Development
                   of Information Systems
                       December 2003
XML and Java: XML, dom4j and Xpath – Eran Toch
Methodologies in Information System Development
                                            Sources

• Major Sources:
      – http://www.cis.upenn.edu/~cis550/slides/xml.ppt
        CIS550 Course Notes, U. Penn, source for many
        slides
      – http://www.cs.technion.ac.il/~oshmu/
        236804 - Seminar in Computer Science 4: XML -
        Technology, Systems and Theory
      – http://dom4j.org




                                            Agenda
• Short Introduction to XML
      – What is XML
      – Structure and Terminology
      – JAVA APIs for XML: an Overview
• dom4j
      – Parsing an XML document
      – Writing to an XML document
• Xpath
      – Xpath Queries
      – Xpath in dom4j
• References
                          The Structure of XML

• XML consists of tags and text
• Tags come in pairs <date> ...</date>
• They must be properly nested
  <date> <day> ... </day> ... </date> --- good
  <date> <day> ... </date>... </day> --- bad
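Every XML parser checks this rule, so a document with badly nested tags is rejected outright. Here is a minimal sketch using the JAXP DOM parser that ships with the JDK (not dom4j). The helper name `NestingCheck.isWellFormed` is hypothetical. The two `date` examples above are used with a concrete day value in place of "...".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class NestingCheck {
    // Returns true if xml is well-formed (tags paired and properly nested).
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // e.g. </date> closing before </day>
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

`isWellFormed("<date><day>3</day></date>")` is true, while `isWellFormed("<date><day>3</date></day>")` is false.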




                                           XML text

• XML has only one “basic” type -- text. It is
  bounded by tags, e.g.
  <title> The Big Sleep </title>
  <year> 1935 </year> --- 1935 is still text
• XML text is called PCDATA (for parsed
  character data). It is Unicode text (usually
  stored as UTF-8 or UTF-16), and any character can
  be written as a character reference, e.g. &#x05DE;
  for the Hebrew letter Mem.
  Later we shall see how new types can be
  specified (with XML Schema).
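To see that element content really is just text, the sketch below returns the text of a document's root element. It uses the JDK's JAXP DOM parser, and the class name `TextContent` is hypothetical. Turning that text into a number is left to the application. A character reference such as `&#x05DE;` arrives as the single character it stands for.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class TextContent {
    // Parses xml and returns all text inside the root element, as a String.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `rootText("<year> 1935 </year>")` returns `" 1935 "` with the spaces kept, so `Integer.parseInt(...trim())` is still needed to get the number 1935. Similarly, `rootText("<letter>&#x05DE;</letter>")` returns the one-character string `"\u05DE"`.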
                                    XML structure

• Nested tags can express various
  structures, e.g. a tuple (record):
         <person>
               <name> Jeff Cohen</name>
               <tel> 04-828-1345 </tel>
               <tel> 054-470-778 </tel>
               <email> jeffc@cs.technion.ac.il </email>
         </person>




                           XML structure (cont.)

• We can represent a list by using the same
  tag repeatedly:

                               <addresses>
                                 <person> ... </person>
                                 <person> ... </person>
                                 <person> ... </person>
                                 ...
                               </addresses>



                           XML structure (cont.)

• Nested tags can be part of a list too:
              <addresses>
                      <person>
                              <name> Yossi Orr</name>
                              <tel> 04-828-1345 </tel>
                              <email> yossio@cs.technion.ac.il </email>
                      </person>
                      <person>
                              <name> Irma Levy</name>
                              <tel> 03-426-1142 </tel>
                              <email>irmal@yourmail.com</email>
                      </person>
              </addresses>



                                      Terminology
• The segment of an XML document between an opening and a
  corresponding closing tag is called an element.
• Metadata about an element can appear in an attribute.

            <person type="Friend">
                 <name>Ortal Derech</name>
                 <tel>04-8732122</tel>
                 <tel>054-646888</tel>
                 <email>oderech@tx.technion.ac.il</email>
            </person>

      – person is an element; type="Friend" is its attribute
      – name, tel and email are elements, each a sub-element of person
      – Ortal Derech is text
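Attributes and repeated sub-elements are read in different ways. Below is a minimal sketch using the JDK's DOM API and the person record above. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PersonReader {
    static final String PERSON = "<person type=\"Friend\">"
        + "<name>Ortal Derech</name>"
        + "<tel>04-8732122</tel>"
        + "<tel>054-646888</tel>"
        + "<email>oderech@tx.technion.ac.il</email>"
        + "</person>";

    // Parses xml and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Metadata is read from an attribute...
    static String type(Element person) {
        return person.getAttribute("type");
    }

    // ...while repeated sub-elements form a list.
    static List<String> phones(Element person) {
        NodeList tels = person.getElementsByTagName("tel");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tels.getLength(); i++) {
            result.add(tels.item(i).getTextContent().trim());
        }
        return result;
    }
}
```

Note that `getElementsByTagName` finds `tel` elements at any depth below the given element. That is fine for this flat record.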
                                  XML is tree-like

                  person
                    |-- name  : Malcolm Atchison
                    |-- tel   : (215) 898 4321
                    |-- tel   : (215) 898 4321
                    +-- email : mp@dcs.gla.ac.sc




                  A Complete XML Document
   <?xml version="1.0" encoding="UTF-8" standalone="no"?>
   <!DOCTYPE addresses SYSTEM
   "http://www.technion.ac.il/~erant/addresses.dtd">
   <addresses>
          <person>
                 <name> Jeff Cohen</name>
                 <tel> 04-828-1345 </tel>
                 <tel> 054-470-778 </tel>
                 <email> jeffc@cs.technion.ac.il </email>
          </person>
   </addresses>

• standalone="no" tells the parser that this document may depend on
  external markup declarations, here the DTD referenced by the
  DOCTYPE declaration.
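When it parses this document, a parser normally fetches the DTD named in the DOCTYPE. If the DTD is unavailable, or should not be fetched, one option is to answer that request with an empty DTD. The sketch below does this through the standard SAX `EntityResolver` hook of the JDK's DOM parser. The class name is hypothetical. Skipping the DTD also skips any default attributes or entities it would declare.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class OfflineParse {
    // Parses xml, answering every external entity request (such as the
    // DOCTYPE's SYSTEM DTD) with an empty document instead of fetching it.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```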



                    XML Structure Definitions

• DTD
      – Document Type Definition – defines structure
        constraints for XML documents
• XML Schema
      – Serves the same purpose as a DTD, but is more powerful: it
        can specify the data type of elements, and it is itself
        written in XML.
• Namespaces
      – Namespaces are a way of preventing name clashes
        among elements from more than one source within
        the same XML document.
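Here is an illustration of what "data type" means. A schema can require that `year` contain an integer, which a DTD cannot express. The sketch below uses the JDK's `javax.xml.validation` API. The one-element schema and the class name are hypothetical, not taken from the tutorial.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class YearSchema {
    // Hypothetical schema: the document element <year> must hold an integer.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"year\" type=\"xs:integer\"/>"
        + "</xs:schema>";

    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // not valid against the schema (or not well-formed)
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

`<year> 1935 </year>` is valid, because xs:integer collapses surrounding whitespace. `<year>nineteen thirty-five</year>` is rejected.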
                                  More Standards

• Xpath
      – XML Path Language, a language for locating parts of
        an XML document.
• XQuery
      – A query language for XML documents (like SQL…).
• XSLT
      – XSL Transformations, a language for transforming
        XML documents into other XML documents.
• RDF
      – Resource Description Framework. A W3C framework
        for describing resources on the Web as a formal
        knowledge model.
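XSLT can be tried directly from Java through the JAXP transformation API in the JDK. In this sketch, a hypothetical stylesheet turns an `addresses` document into a new `names` document. The stylesheet and class name are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class NamesTransform {
    // Hypothetical stylesheet: keep only each person's name.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/addresses\">"
        + "<names><xsl:for-each select=\"person\">"
        + "<name><xsl:value-of select=\"name\"/></name>"
        + "</xsl:for-each></names>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to xml and returns the resulting document as text.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```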
                        Why Is XML Important?

• Because it exists, and everybody uses it.
• Plain Text - you can create and edit files with
  any text editor.
• Data Identification - XML tells you what kind
  of data you have, not how to display it.
• Separation from style.
• Hierarchical, and easily processed.



                      An Overview of the APIs

• JAXP: Java API for XML Processing
      – It provides a common interface for creating and using
        the standard SAX, DOM, and XSLT APIs.
• JAXB: Java Architecture for XML Binding
      – maps XML to Java objects and back: reads (unmarshals)
        XML into Java objects and writes (marshals) them as XML.
• JDOM
      – Represents an XML file as a tree of Java objects
        (a more Java-friendly alternative to DOM)
• dom4j
      – A lightweight alternative to JDOM.
                                            Agenda
• Introduction to XML
     – What is XML
     – Structure and Terminology
     – JAVA APIs for XML: an Overview
• dom4j
     – Parsing an XML document
     – Writing to an XML document
• Xpath
     – Xpath Queries
     – Xpath in dom4j
• References
                                              dom4j

• An Open Source XML framework for Java.
• Allows you to read, write, navigate, create
  and modify XML documents.
• Integrates with DOM and SAX.
• Full XPath support.
• XSLT Support.




                              Download and Use

• Web site: http://dom4j.org.
• Download the latest release (current = 1.4) from
  http://dom4j.org/download.html, and unzip it.
• Add the dom4j jar to the classpath. When working in
  an IDE, don’t forget to add the log4j.jar library as well.
• Javadoc: http://dom4j.org/apidocs/index.html.
• Quick start guide: http://dom4j.org/guide.html.
                  Opening an XML Document

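           import java.io.*;                 // FileWriter, used on later slides
           import java.util.*;               // Iterator and List, used on later slides
           import org.dom4j.*;
           import org.dom4j.io.SAXReader;    // SAXReader lives in org.dom4j.io

           public class Foo {
                  // id is a file name or a URL
                  public Document parse(String id)
                    throws DocumentException {
                          SAXReader reader = new SAXReader();
                          Document document = reader.read(id);
                          return document;
                  }
           }

 SAXReader.read() can read a file (a File or a file name), a URL,
 an InputStream or a Reader. A String argument is treated as a
 file name or URL, not as XML text; to parse XML held in a String,
 use DocumentHelper.parseText().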


                               Example XML File

   <?xml version="1.0" encoding="UTF-8" ?>
   <salesdata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:noNamespaceSchemaLocation="C:\Documents and Settings\eran\My Documents\Academic\Courses\XML\xpath_ass_schema.xsd">
      <year>
         <theyear>1997</theyear>
         <region><name>central</name><sales unit="millions">34</sales></region>
         <region><name>east</name><sales unit="millions">34</sales></region>
         <region><name>west</name><sales unit="millions">32</sales></region>
      </year>
      <year>
         <theyear>1998</theyear>
         <region><name>east</name><sales unit="millions">35</sales></region>
         <region><name>west</name><sales unit="millions">42</sales></region>
      </year>
   </salesdata>




                     Accessing XML Elements

public void dump(Document document) {
       // Accessing the root element
       Element root = document.getRootElement();
       // Retrieving the child elements of the root
       for (Iterator i = root.elementIterator(); i.hasNext(); ) {
               Element element = (Element) i.next();
               // Retrieving the element name
               System.out.println(element.getQualifiedName());
               // Retrieving the element's own text (trimmed)
               System.out.println(element.getTextTrim());
               // Retrieving the text of the child element "theyear"
               System.out.println(element.elementText("theyear"));
       }
}

         Accessing XML Elements – cont’d

• What will be the output of dump()?

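                       year

                       1997
                       year

                       1998

• Why? A year element contains only child elements (and
  whitespace), so its getTextTrim() is empty and prints as a
  blank line; elementText("theyear") returns the text of the
  child element theyear.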


      Accessing XML Elements Recursively


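public void go(Element element, int depth){
       // Indent three spaces per level of depth
       for (int d=0; d<depth; d++){
               System.out.print("   ");
       }
       // Print the element's name followed by its own (trimmed) text
       System.out.print(element.getQualifiedName());
       System.out.println(" "+ element.getTextTrim());
       // Recurse into each child element, one level deeper
       for (Iterator i = element.elementIterator(); i.hasNext(); ) {
               Element son = (Element)i.next();
               go(son, depth+1);
       }
}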


                                                   What will be the
                                                   output?

             Accessing Recursively – cont’d
                             salesdata
                                year
                                   theyear 1997
                                   region
                                      name central
                                      sales 34
                                   region
                                      name east
                                      sales 34
                                   region
                                      name west
                                      sales 32
                                year
                                   theyear 1998
                                   region
                                      name east
                                      sales 35
                                   region
                                      name west
                                      sales 42

                             The whole XML tree: element names + values

                   Creating an XML document
   public Document createDocument() {
          Document document = DocumentHelper.createDocument();
          // Creating the root element
          Element root = document.addElement("phonebook");

          // Adding elements: addElement() returns the new child, and
          // addAttribute()/addText() return that element, so calls chain
          Element address1 = root.addElement("address")
               .addAttribute("name", "Yuval")
               .addAttribute("category", "family")
               .addText("Ehud 3, Jerusalem");

          Element address2 = root.addElement("address")
               .addAttribute("name", "Ortal")
               .addAttribute("category", "friends")
               .addText("Kibbutz Givaat Haim");
          return document;
   }

   What will we get when running go()?
       Creating an XML document – cont’d

                   phonebook
                      address Ehud 3, Jerusalem
                      address Kibbutz Givaat Haim

                   (XML tree structure of the new document)

    // Writing the XML document to a file
    FileWriter out = new FileWriter("C:\\addresses.xml");
    document.write(out);
    out.close();   // flushes and closes the file; without it the file may stay empty

    // Retrieving the XML itself as a string
    String xml = document.asXML();
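For comparison, here is a sketch of the same phonebook built with the W3C DOM API bundled in the JDK instead of dom4j. `PhonebookDom` and its methods are hypothetical names. DOM has no call chaining, and serialization goes through a JAXP `Transformer`.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class PhonebookDom {
    static Document create() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement("phonebook");
            doc.appendChild(root);
            addAddress(doc, root, "Yuval", "family", "Ehud 3, Jerusalem");
            addAddress(doc, root, "Ortal", "friends", "Kibbutz Givaat Haim");
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // One DOM call per step: create, set attributes, add text, attach.
    private static void addAddress(Document doc, Element parent,
                                   String name, String category, String text) {
        Element address = doc.createElement("address");
        address.setAttribute("name", name);
        address.setAttribute("category", category);
        address.appendChild(doc.createTextNode(text));
        parent.appendChild(address);
    }

    // Serializes the document to a string, without the XML declaration.
    static String asXml(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

To write a file instead of a string, pass `new StreamResult(new java.io.File("addresses.xml"))` to `transform`.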

                                    Client Program
    public static void main(String[] args) {
             Foo foo = new Foo();
             try {
                      // Opening the file
                      Document doc = foo.parse("C:\\Documents and Settings\\eran\\My Documents\\Academic\\Courses\\XML\\sales.xml");
                      // Dumping, and printing recursively
                      foo.dump(doc);
                      foo.go(doc.getRootElement(), 0);
                      foo.xpath(doc);
                      // Creating a new document, printing it and writing it to a file
                      Document newDoc = foo.createDocument();
                      foo.go(newDoc.getRootElement(), 0);
                      FileWriter out = new FileWriter("C:\\addresses.xml");
                      newDoc.write(out);
                      out.close();
             }
             catch (Exception e) {
                      System.out.println(e);
             }
    }




                                            Agenda
• Introduction to XML
      – What is XML
      – Structure and Terminology
      – JAVA APIs for XML: an Overview
• dom4j
      – Parsing an XML document
      – Writing to an XML document
• Xpath
      – Xpath Queries
      – Xpath in dom4j
• References
                            Xpath - Introduction

• XML Path Language. XPath is a language for
  addressing parts of an XML document.
• Enables node locating and retrieving, very
  much like directory accessing in file systems.
• Limited (but not bad) filtering and querying
  abilities.
• Retrieves either the text itself (PCDATA) or sets of nodes.



               Xpath – Simple Path Selection

Xpath Expression: /salesdata/year/theyear
("/" signifies child-of)

<theyear>1997</theyear>
<theyear>1998</theyear>

/salesdata/year[2]/theyear
(filtering the level: getting only the second year element)

<theyear>1998</theyear>
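These path expressions can be tried without dom4j, because the JDK's `javax.xml.xpath` API evaluates XPath 1.0 directly. Below is a minimal sketch. The sales data is inlined as a string with the schema attributes dropped, and a hypothetical `select` helper returns the string value of each selected node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathPaths {
    // The sales data from the example file, inlined (schema attributes dropped).
    static final String SALES = "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region>"
        + "</year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region>"
        + "</year></salesdata>";

    // Evaluates an XPath 1.0 expression and returns the string value of each selected node.
    static List<String> select(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(SALES)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

`select("/salesdata/year/theyear")` gives `[1997, 1998]`. `select("/salesdata/year[2]/theyear")` gives `[1998]`, because XPath positions count from 1.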



                             Xpath – Conditions

/salesdata/year/region[sales > 34]
(going down to region, and filtering according to the sales element)

<region>
  <name>east</name>
  <sales unit="millions">35</sales>
</region>
<region>
  <name>west</name>
  <sales unit="millions">42</sales>
</region>

/salesdata/year/region[sales > 34]/name
What will this return?
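The question above can be checked with the JDK's `javax.xml.xpath` API. Appending `/name` keeps only the names of the matching regions. The sketch also uses `XPathConstants.NUMBER` with `count()`, which returns a number instead of nodes. The class name is hypothetical, and the data is inlined.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathConditions {
    // The sales data from the example file, inlined (schema attributes dropped).
    static final String SALES = "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region>"
        + "</year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region>"
        + "</year></salesdata>";

    // Returns the string value of each node selected by the expression.
    static List<String> select(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(SALES)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Functions such as count() yield numbers rather than node sets.
    static double number(String expression) {
        try {
            return (Double) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(SALES)), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

`select("/salesdata/year/region[sales > 34]/name")` gives `[east, west]`, and `number("count(/salesdata/year/region[sales > 34])")` gives 2.0.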
               Xpath – Traveling Up the Tree
/salesdata/year/region[sales > 34]/parent::year/theyear

<theyear>1998</theyear>

               Going up the XML tree (and then down again).
               Both matching regions belong to the 1998 year,
               so theyear appears only once.
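Two regions match `[sales > 34]`, but both belong to the same 1998 `year`. The parent step therefore yields a single node, because node sets contain no duplicates. The sketch below uses the JDK's XPath API and also shows that `..` abbreviates `parent::node()`. Calling `evaluate` without a return type gives the string value of the first selected node. The class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathParent {
    // The sales data from the example file, inlined (schema attributes dropped).
    static final String SALES = "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region>"
        + "</year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region>"
        + "</year></salesdata>";

    // Returns the string value of each node selected by the expression.
    static List<String> select(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(SALES)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Without a return type, evaluate() returns the string value of the first selected node.
    static String firstValue(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(SALES)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```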




                 Xpath – Traveling Down Fast

  /descendant::sales
  (going all the way down, until the sales element)

  <sales unit="millions">34</sales>
  <sales unit="millions">34</sales>
  <sales unit="millions">32</sales>
  <sales unit="millions">35</sales>
  <sales unit="millions">42</sales>

  //sales

  The same result: // abbreviates /descendant-or-self::node()/
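Both forms can be compared with the JDK's XPath API. They return the same five `sales` elements in document order. The class name is hypothetical, and the data is inlined.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathDescendant {
    // The sales data from the example file, inlined (schema attributes dropped).
    static final String SALES = "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region>"
        + "</year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region>"
        + "</year></salesdata>";

    // Returns the string value of each node selected by the expression, in document order.
    static List<String> select(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(SALES)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```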




                   Xpath – Advanced Queries

• The years in which the west region's sales exceeded 32 (millions):

//region[name="west" and sales > 32]/sales[@unit='millions']/ancestor::year/theyear

<theyear>1998</theyear>

      – and is a logical operator (or is also available)
      – @unit accesses the unit attribute
      – ancestor:: is like parent::, but covers all ancestors
        (parent, grandparent, ...); here it climbs up to year
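The query above runs unchanged with the JDK's XPath API. The sketch also selects the `unit` attribute nodes themselves with `//sales/@unit`, whose string values are the attribute values. The class name is hypothetical, and the data is inlined.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathAdvanced {
    // The sales data from the example file, inlined (schema attributes dropped).
    static final String SALES = "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region>"
        + "</year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region>"
        + "</year></salesdata>";

    // Returns the string value of each selected node (elements or attributes).
    static List<String> select(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(SALES)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With `or`, `select("//region[name='central' or sales > 40]/name")` gives `[central, west]`.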



        Xpath – Advanced Queries (cont’d)

•      The years (text nodes) in which the west region
       sales were higher than the east region sales; sales
       may be expressed in thousands or in millions:

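/salesdata/year[
    region[name='west']/sales * (1 + 999 * (region[name='west']/sales/@unit = 'millions'))
  > region[name='east']/sales * (1 + 999 * (region[name='east']/sales/@unit = 'millions'))
]/theyear/text()

(Both sides are converted to thousands: in arithmetic, the comparison
 @unit = 'millions' counts as 1 or 0, so each amount is multiplied
 by 1000 or by 1.)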
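In XPath 1.0, `*` binds more tightly than `=`. A predicate such as `[@unit='millions' *1000 or @unit='thousands']` therefore compares `@unit` with `'millions' * 1000` (which is NaN) instead of scaling the value. One way to compare amounts given in mixed units is to scale each value arithmetically. A boolean used in arithmetic becomes 1 or 0, so `sales * (1 + 999 * (sales/@unit = 'millions'))` is the amount in thousands. Here is a sketch with the JDK's XPath API. The class name and the modified test data are hypothetical, and the expression assumes at most one west and one east region per year.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathUnits {
    // The sales data from the example file, inlined (schema attributes dropped).
    static final String SALES = "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region>"
        + "</year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region>"
        + "</year></salesdata>";

    // Years in which west sales exceed east sales. Each amount is multiplied by
    // (1 + 999 * (unit = 'millions')): 1000 for millions, 1 for any other unit.
    static final String WEST_BEATS_EAST = "/salesdata/year["
        + "region[name='west']/sales * (1 + 999 * (region[name='west']/sales/@unit = 'millions'))"
        + " > region[name='east']/sales * (1 + 999 * (region[name='east']/sales/@unit = 'millions'))"
        + "]/theyear/text()";

    // Evaluates expression against xml and returns each selected node's string value.
    static List<String> select(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

On the original data only 1998 qualifies (42 million against 35 million). If the 1997 east figure were reported as 34 thousand instead, 1997 would qualify as well.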




                                   Xpath in dom4j

• Xpath queries can be used in dom4j:

public void xpath(Document document) {
         // The XPath expression is fed to the xpathSelector
         XPath xpathSelector =
             DocumentHelper.createXPath("/salesdata/year/theyear");
         // The nodes are selected from the document, according to the XPath query
         List results = xpathSelector.selectNodes(document);
         for (Iterator iter = results.iterator(); iter.hasNext(); ) {
                  Element element = (Element) iter.next();
                  System.out.println(element.asXML());
         }
}
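dom4j's `createXPath` prepares the expression once, so it can be applied to many documents. The JDK's `javax.xml.xpath` API offers the same split between `compile` and `evaluate`. Here is a sketch with hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CompiledXPath {
    // Compiles the expression once; the result can be evaluated many times.
    static XPathExpression compile(String expression) {
        try {
            return XPathFactory.newInstance().newXPath().compile(expression);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Parses xml into a DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Applies a compiled expression to a document and returns the selected texts.
    static List<String> selectTexts(XPathExpression expr, Document doc) {
        try {
            NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```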

                                            Agenda
• Introduction to XML
      – What is XML
      – Structure and Terminology
      – JAVA APIs for XML: an Overview
• dom4j
      – Parsing an XML document
      – Writing to an XML document
• Xpath
      – Xpath Queries
      – Xpath in dom4j
• References
                               References - XML

• XML tutorial:
      – http://www.w3schools.com/xml/default.asp
• XML Specification from w3c:
      – http://www.w3.org/XML/
• The Java/XML Tutorial:
      – http://java.sun.com/xml/tutorial_intro.html
• DTD Tutorial:
      – http://www.xmlfiles.com/dtd/
• XML Schema Tutorial:
      – http://www.w3schools.com/schema/default.asp
• XML Schema Resource Page:
      – http://www.w3.org/XML/Schema

                                              dom4j

• Web site:
      – http://dom4j.org/
• Javadocs:
      – http://dom4j.org/apidocs/index.html
• Quick Start:
      – http://dom4j.org/guide.html
• Cookbook (main functionality):
      – http://dom4j.org/cookbook.html



                                                  Xpath

• Xpath specification:
      – http://www.w3.org/TR/xpath
• Xpath tutorial:
      – http://www.w3schools.com/xpath/default.asp
• Xpath tutorial (extended):
      – http://www.zvon.org/xxl/XPathTutorial/General/examples.html
• Xpath reference:
      – http://www.vbxml.com/xsl/XPathRef.asp
