Programming with XML
Written by:
Adam Carmi
Zvika Gutterman
Agenda
• About XML
• Review of XML syntax
• Document Object Model (DOM)
• JAXP
• W3C XML Schema
• Validating Parsers
XML 2
About XML
• XML – EXtensible Markup Language
• Designed to describe data
– Provides semantic and structural information
– Extensible
• Human readable and computer-manipulable
• Software and Hardware independent
• Open and standardized by the W3C¹
• Ideal for data exchange
1) World Wide Web Consortium (founded in 1994 by Tim Berners-Lee)
offenders.xml
<offenders>
    <!-- Lists all traffic offenders -->
    <offender id="024378449 ">
        <firstName> David </firstName>
        <middleName>Reuven</middleName>
        <lastName>Harel</lastName>
        <violation id='12'>
            <code num="232" category="traffic"/>
            <issueDate>2001-11-02</issueDate>
            <issueTime>10:32:00</issueTime>
            Ran a red light at Arik &amp; Bentz st.
        </violation>
    </offender>
</offenders>
• Information is marked up with structural and semantic information.
• Callouts on the slide: a comment (<!-- ... -->), a tag (e.g. <issueDate>) and character data (e.g. Reuven, and the free text inside <violation>).
• The characters &, <, >, ' and " are reserved and can't be used in character data. Use &amp;, &lt;, &gt;, &apos; and &quot; instead.
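The replacement rule for reserved characters can be sketched in a few lines of Java. This is only an illustration: the class XmlEscape and its escape method are hypothetical names, not part of the slides or of any library, and real applications would normally let an XML API do the escaping.

```java
// Hypothetical helper (not from the slides): replaces the five reserved
// characters with their predefined entity references.
final class XmlEscape {
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The free text of the violation element, escaped for use as character data.
        System.out.println(escape("Ran a red light at Arik & Bentz st."));
    }
}
```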
offenders.xml: Tags
<offenders>
    <!-- Lists all traffic offenders -->
    <offender id="024378449 ">
        <firstName> David </firstName>
        <middleName>Reuven</middleName>
        <lastName>Harel</lastName>
        <violation id='12'>
            <code num="232" category="traffic"/>
            <issueDate>2001-11-02</issueDate>
            <issueTime>10:32:00</issueTime>
            Ran a red light at Arik &amp; Bentz st.
        </violation>
    </offender>
</offenders>
• XML tags are not predefined and are case sensitive.
• An XML document may have only one root tag (here <offenders>).
• Callouts on the slide: root tag (<offenders>), start tag (e.g. <violation id='12'>), end tag (e.g. </offender>).
• <code num="232" category="traffic"/> is shorthand for: <code num=...></code>
offenders.xml: Elements
<offenders>
    <!-- Lists all traffic offenders -->
    <offender id="024378449 ">
        <firstName> David </firstName>
        <middleName>Reuven</middleName>
        <lastName>Harel</lastName>
        <violation id='12'>
            <code num="232" category="traffic"/>
            <issueDate>2001-11-02</issueDate>
            <issueTime>10:32:00</issueTime>
            Ran a red light at Arik &amp; Bentz st.
        </violation>
    </offender>
</offenders>
• Elements mark up information.
• Element x begins with a start-tag <x> and ends with an end-tag </x>.
• XML elements must be properly nested: <x>...<y>...</y>...</x>
• XML documents must contain exactly one root element (here offenders).
offenders.xml: Content
<offenders>•
•<!--•Lists•all•traffic•offenders•-->•
•<offender id="024378449•">•
•<firstName>•David•</firstName>•
•<middleName>Reuven</middleName>•
•<lastName>Harel</lastName>•
•<violation id='12'>•
•<code num="232" category="traffic"/>•
•<issueDate>2001-11-02</issueDate>•
•<issueTime>10:32:00</issueTime>•
•Ran•a•red•light•at•Arik•&amp;•Benz st.•
•</violation>•
•</offender>•
</offenders>
(• marks whitespace)
• The content of an element is all the text that lies between its start and end tags.
• An XML parser is required to pass all characters in a document, including whitespace characters.
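A small experiment makes the whitespace rule visible. The sketch below uses the JAXP DOM builder that is introduced later in these slides; the class name WhitespaceDemo and its helper method are made up for the illustration. It counts how many children of the root element are text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: shows that whitespace between tags reaches the application.
final class WhitespaceDemo {
    /** Returns how many direct children of the root element are text nodes. */
    static int countTextChildren(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                if (children.item(i).getNodeType() == Node.TEXT_NODE) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            // Kept simple for the sketch: any setup or parse failure is fatal.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String pretty = "<offender>\n  <firstName> David </firstName>\n</offender>";
        String compact = "<offender><firstName> David </firstName></offender>";
        System.out.println(countTextChildren(pretty));   // line breaks and indentation become text nodes
        System.out.println(countTextChildren(compact));  // no whitespace between the tags
    }
}
```

With a non-validating builder, the line break and indentation before and after firstName each arrive as a separate text node, which is why the pretty-printed variant reports text children and the compact one does not.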
offenders.xml: Attributes
<offenders>
    <!-- Lists all traffic offenders -->
    <offender id="024378449 ">
        <firstName> David </firstName>
        <middleName>Reuven</middleName>
        <lastName>Harel</lastName>
        <violation id='12'>
            <code num="232" category="traffic"/>
            <issueDate>2001-11-02</issueDate>
            <issueTime>10:32:00</issueTime>
            Ran a red light at Arik &amp; Benz st.
        </violation>
    </offender>
</offenders>
• Attributes are used to provide additional information about elements.
• Attribute values must always be enclosed in quotes (" or ').
DOM
• DOM™ – Document Object Model
• A standard hierarchy of objects, recommended by
the W3C, that corresponds to XML documents.
• Each element, attribute, comment, etc., in an XML
document is represented by a Node in the DOM
tree.
• The DOM API¹ allows data in an XML document
to be accessed and modified by manipulating the
nodes in a DOM tree.
1) Application Programming Interface
DOM Class Hierarchy¹
(all types shown are interfaces; indentation denotes inheritance)
NodeList
NamedNodeMap
Node
    Document
    CharacterData
        Text
        Comment
    Element
    Attr
1) A partial class hierarchy is presented in this slide.
offenders.xml: DOM tree
(• marks whitespace)
:Document
    :Element offenders
        :Text "•"
        :Comment "•Lists•all•traffic•offenders•"
        :Text "•"
        :Element offender
            :Attribute id
                :Text "024378449•"
            :Text "•"
            :Element firstName
                :Text "•David•"
            :Text "•"
Example: offenders DOM
(continues under offenders / offender; the element "middleName" was skipped)
:Element lastName
    :Text "Harel"
:Text "•"
:Element violation
    :Attribute id
        :Text "12"
    :Text "•"
    :Element code
        :Attribute num
            :Text "232"
        :Attribute category
            :Text "traffic"
    :Text "•"
    :Element issueDate
        :Text "2001-11-02"
Example: offenders DOM
(continues under offenders / offender / violation)
        :Text "•"
        :Element issueTime
            :Text "10:32:00"
        :Text "•Ran•a•red•light•at•Arik•&•Benz•st.•"
    :Text "•"        (child of offender, after violation)
:Text "•"            (child of offenders, after offender)
JAXP
• JAXP – Java™ API for XML Processing
• JAXP enables applications to parse and transform
XML documents using an API that is independent
of a particular XML processor implementation.
• JAXP provides two parser types:
– SAX¹ parser: event driven
– DOM document builder: constructs DOM trees
by parsing XML documents.
1) Simple API for XML
The Simple API for XML (SAX) APIs
The Document Object Model (DOM) APIs
Creating a DOM Builder
1. Create a DocumentBuilderFactory object:
    DocumentBuilderFactory dbf =
        DocumentBuilderFactory.newInstance();
2. Configure the factory object:
    dbf.setIgnoringComments(true);
3. Create a builder instance using the factory:
    DocumentBuilder docBuilder =
        dbf.newDocumentBuilder();
A ParserConfigurationException is thrown if a DocumentBuilder
that satisfies the requested configuration cannot be created.
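The three steps can be put together in one runnable sketch. The class BuilderSetup and its method are hypothetical names chosen for this illustration, and the parse call on an in-memory string is an assumption made to keep the example self-contained. It shows the effect of setIgnoringComments(true): the comment never appears in the resulting tree.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: factory -> configuration -> builder, as in the steps above.
final class BuilderSetup {
    /** Parses xml with a comment-ignoring builder and returns the root's child count. */
    static int rootChildCount(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); // step 1
            dbf.setIgnoringComments(true);                                     // step 2
            DocumentBuilder docBuilder = dbf.newDocumentBuilder();             // step 3
            Document doc = docBuilder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Kept simple for the sketch: configuration and parse failures are fatal.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<offenders><!-- Lists all traffic offenders --><offender/></offenders>";
        // Only the offender element is counted; the comment was dropped by the builder.
        System.out.println(rootChildCount(xml));
    }
}
```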
Building a DOM Document
• A DOM document can be built manually from
within the application:
Document doc = docBuilder.newDocument();
Element offenders = doc.createElement("offenders");
doc.appendChild(offenders);
Element offender = doc.createElement("offender");
offender.setAttribute("id", "024378449 ");
offenders.appendChild(offender);
Element firstName = doc.createElement("firstName");
Text text = doc.createTextNode(" David ");
firstName.appendChild(text);
...
A DOMException is raised if an illegal character appears in a name, an
illegal child is appended to a node, etc.
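A self-contained version of the manual construction might look as follows. The class BuildDemo and the helper isLegalName are hypothetical; attaching firstName to offender is an assumption about what the slide's "..." continues with. The helper demonstrates one of the DOMException cases mentioned above, a name containing an illegal character.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: builds part of the offenders document by hand.
final class BuildDemo {
    static Document buildOffenders() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element offenders = doc.createElement("offenders");
        doc.appendChild(offenders);
        Element offender = doc.createElement("offender");
        offender.setAttribute("id", "024378449 ");
        offenders.appendChild(offender);
        Element firstName = doc.createElement("firstName");
        firstName.appendChild(doc.createTextNode(" David "));
        offender.appendChild(firstName); // assumption: not shown on the slide
        return doc;
    }

    /** Returns false if the DOM implementation rejects the name with a DOMException. */
    static boolean isLegalName(Document doc, String name) {
        try {
            doc.createElement(name);
            return true;
        } catch (DOMException e) {
            return false; // e.g. a space is not allowed inside an element name
        }
    }

    public static void main(String[] args) {
        Document doc = buildOffenders();
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(isLegalName(doc, "first name"));
    }
}
```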
Building a DOM Document
• A DOM Tree representation of an XML document
can be built automatically by parsing the XML
document:
Document doc = docBuilder.parse(new File(xmlFile));
A SAXParseException or SAXException is raised to report parse
errors.
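The exception behaviour can be tried out on a document that is not well formed. In this sketch the class ParseCheck and its method are hypothetical names, and the input is parsed from a string rather than a file purely to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: reports the first parse error of an XML string.
final class ParseCheck {
    /** Returns null if xml parses cleanly, otherwise a short error description. */
    static String firstParseError(String xml) {
        try {
            DocumentBuilder docBuilder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            docBuilder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException spe) {
            // SAXParseException carries the position of the error.
            return "line " + spe.getLineNumber() + ": " + spe.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstParseError("<a><b/></a>")); // well formed
        System.out.println(firstParseError("<a><b></a>"));  // <b> is never closed
    }
}
```

SAXParseException is caught before SAXException because it is the more specific of the two.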
DumpDom.java (1 of 5)
Creating and traversing a DOM document
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.ParserConfigurationException;
import java.io.File;
import java.io.IOException;
DumpDom.java (2 of 5)
public class DumpDom {
    private int indent = 0; // text indentation level

    public DumpDom(String xmlFile)
    {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = dbf.newDocumentBuilder();
            Document doc = docBuilder.parse(new File(xmlFile));
            recursiveDump(doc);
        } catch (ParserConfigurationException pce) {
            System.err.println("Failed to create document builder");
        } catch (SAXParseException spe) {
            System.err.println("Error: Line=" + spe.getLineNumber() + ": " +
                               spe.getMessage());
        } catch (SAXException se) {
            System.err.println("Parse error found: " + se);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
DumpDom.java (3 of 5)
    private void recursiveDump(Node node)
    {
        switch (node.getNodeType()) {
        case Node.DOCUMENT_NODE:
            dumpNode("document", node);
            break;
        case Node.COMMENT_NODE:
            dumpNode("comment", node);
            break;
        case Node.ATTRIBUTE_NODE:
            dumpNode("attribute", node);
            break;
        case Node.TEXT_NODE:
            dumpNode("text", node);
            break;
DumpDom.java (4 of 5)
        case Node.ELEMENT_NODE:
            dumpNode("element", node);
            // attributes are not children of the element, so dump them explicitly
            indent += 2;
            NamedNodeMap atts = node.getAttributes();
            for (int i = 0 ; i < atts.getLength() ; ++i)
                recursiveDump(atts.item(i));
            indent -= 2;
            break;
        default:
            System.err.println("Unknown node: " + node);
            System.exit(1);
        } // end of switch
        // print children of the input node (if there are any)
        indent += 2;
        for (Node child = node.getFirstChild() ; child != null ;
             child = child.getNextSibling()) {
            recursiveDump(child);
        }
        indent -= 2;
    } // end of recursiveDump
DumpDom.java (5 of 5)
    private void dumpNode(String type, Node node)
    {
        for (int i = 0 ; i < indent ; ++i)
            System.out.print(" ");
        System.out.print("[" + type + "]: ");
        System.out.print(node.getNodeName());
        if (node.getNodeValue() != null)
            System.out.print("=\"" + node.getNodeValue() + "\"");
        System.out.print("\n");
    }

    public final static void main(String[] args)
    {
        // guard against a missing file argument
        if (args.length != 1) {
            System.err.println("Usage: java DumpDom <xml-file>");
            System.exit(1);
        }
        new DumpDom(args[0]);
    }
}
DTD - Document Type Definition
• A specification for ensuring the validity of
XML documents
• The original mechanism, defined as part of
the XML specification
• Various Schema proposals - newer
mechanisms for describing validation
criteria
XML Schema
• The purpose of an XML Schema is to
define a class of XML documents.
• An XML document that is syntactically
correct is considered well formed. If it also
conforms to an XML schema, it is considered
valid.
• An XML document is not required to have a
corresponding Schema.
XML Schema (cont.)
• XML Schema documents are themselves XML
documents.
– Can be manipulated as such
– XML Schema is a language with an XML syntax.
• An XML document may explicitly reference the
schema document that validates it.
• Several schema models exist. In this course we
will use the W3C XML Schema¹.
1) W3C recommendation since 2001
W3C XML Schema
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    ...
</xsd:schema>
• A W3C XML Schema consists of a schema
element and a variety of sub-elements which
determine the appearance of elements and their
content in instance documents
• Each of the elements (and predefined simple
types) in the schema has (by convention) the prefix
xsd:, which is associated with the W3C XML
Schema namespace.
Elements & Attribute Declarations
• Elements are declared using the element
element:
    <xsd:element name="firstName" type="xsd:string"/>
    <xsd:element name="offenders" type="Offenders"/>
• Attributes are declared using the attribute
element:
    <xsd:attribute name="id" type="xsd:positiveInteger"/>
• The xsd:-prefixed types used here (xsd:string, xsd:positiveInteger) are pre-defined (simple) types.
Element & Attribute Types
• Elements that contain sub-elements or carry
attributes are said to have complex types.
• Elements that contain only text (e.g. numbers,
strings, dates etc.) and have neither sub-elements
nor attributes are said to have simple types.
• Attributes always have simple types.
• Many simple types (e.g. string, date, integer etc.)
are pre-defined.
A Few Built-in Simple Types
Simple Type    Examples
string         any textual value (white space preserved)
NMTOKEN¹       student, 342, $$
ID¹            s1, :myId, _4
integer        -126789, -1, 0, 1, 126789, 03485
float          -INF, -1E4, -0, 0, 12.78, 12.78E-2, NaN
time           13:24:12, 02:15:34.879
date           2002-11-23
boolean        true, false, 0, 1
1) Should only be used as attribute types
Derived Simple Types
• New simple types may be defined by
deriving them from existing simple types
(built-in and derived)
• New simple types are derived by restricting
the range of permitted values for an existing
simple type.
• A new simple type is defined using the
simpleType element.
Derived Simple Types (cont.)
• Example: Numeric Restriction
<xsd:simpleType name="ViolationID">
<xsd:restriction base="xsd:integer">
<xsd:minInclusive value="100"/>
</xsd:restriction>
</xsd:simpleType>
• Example: Enumeration
<xsd:simpleType name="ViolationCategory">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="traffic"/>
<xsd:enumeration value="criminal"/>
<xsd:enumeration value="civil"/>
</xsd:restriction>
</xsd:simpleType>
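To see a restriction in action, the ViolationID type can be checked against sample values. The sketch below is an assumption-laden illustration: it uses the javax.xml.validation package (a later JAXP addition, not the mechanism shown in these slides), and the class SimpleTypeDemo and the wrapper element num are invented for the test.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class: validates <num> elements against the ViolationID type.
final class SimpleTypeDemo {
    // The slide's ViolationID type plus an illustrative element that uses it.
    private static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='num' type='ViolationID'/>"
        + "<xsd:simpleType name='ViolationID'>"
        + "<xsd:restriction base='xsd:integer'>"
        + "<xsd:minInclusive value='100'/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>"
        + "</xsd:schema>";

    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well formed, or not valid against the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<num>232</num>")); // at least 100
        System.out.println(isValid("<num>50</num>"));  // below minInclusive
    }
}
```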
Complex Types
• Complex types are defined using the complexType
element.
• Elements with complex types may carry attributes.
• The content of elements with complex types is
categorized as follows:
– Empty: no content is allowed.
– Simple: content must be of simple type.
– Element: content must include only child elements.
– Mixed: both element and character content is allowed.
Complex Types: Attributes
• Attributes may be declared, using the use
attribute, as required or optional (default).
• Default values for attributes are declared
using the default attribute
– Allowed only for optional attributes
• The fixed attribute is used to ensure that an
attribute is set to a particular value.
– Appearance of the attribute is optional.
– fixed and default are mutually exclusive.
Complex Types: Attributes (cont.)
• Example: use, fixed
<xsd:complexType name="Code">
    <xsd:attribute name="num" type="ViolationID" use="required"/>
    <xsd:attribute name="category" type="ViolationCategory" fixed="traffic"/>
</xsd:complexType>
• Example: use, default
<xsd:complexType name="IssueTime">
...
<xsd:attribute name="accuracy" type="Accuracy" use="optional"
default="accurate"/>
...
</xsd:complexType>
Complex Types: Empty Content
• Example: schema
<xsd:complexType name="Code">
    <xsd:attribute name="num" type="ViolationID" use="required"/>
    <xsd:attribute name="category" type="ViolationCategory"
                   fixed="traffic"/>
</xsd:complexType>
• Example: instance document
<code num="232" category="traffic"/>
<code num="232" category="traffic"></code>
<code num="232"/>
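The three instance forms can be checked mechanically, together with a few that the Code type should reject. This is a sketch under assumptions: it relies on javax.xml.validation rather than the parser configuration used elsewhere in these slides, and the class EmptyContentDemo and the global element declaration for code are added only for the test.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class: validates <code> elements against the Code type.
final class EmptyContentDemo {
    private static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='code' type='Code'/>" // illustrative global element
        + "<xsd:complexType name='Code'>"
        + "<xsd:attribute name='num' type='ViolationID' use='required'/>"
        + "<xsd:attribute name='category' type='ViolationCategory' fixed='traffic'/>"
        + "</xsd:complexType>"
        + "<xsd:simpleType name='ViolationID'>"
        + "<xsd:restriction base='xsd:integer'><xsd:minInclusive value='100'/></xsd:restriction>"
        + "</xsd:simpleType>"
        + "<xsd:simpleType name='ViolationCategory'>"
        + "<xsd:restriction base='xsd:string'>"
        + "<xsd:enumeration value='traffic'/>"
        + "<xsd:enumeration value='criminal'/>"
        + "<xsd:enumeration value='civil'/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>"
        + "</xsd:schema>";

    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well formed, or not valid against the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<code num='232'/>"));                     // category may be omitted
        System.out.println(isValid("<code num='232' category='criminal'/>")); // conflicts with fixed
        System.out.println(isValid("<code num='232'>text</code>"));           // content is not empty
    }
}
```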
Complex Types: Simple Content
• Example: element with no attributes (xsd:string is a simple type)
<xsd:element name="firstName" type="xsd:string"/>
• Example: element with attributes
<xsd:complexType name="IssueTime">
<xsd:simpleContent>
<xsd:extension base="xsd:time">
<xsd:attribute name="accuracy" type="Accuracy" use="optional"
default="accurate"/>
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
Complex Types: Element Content
• Element Occurrence Constraints
– The minimum number of times an element may appear
is specified by the value of the optional attribute
minOccurs.
– The maximum number of times an element may appear
is specified by the value of the optional attribute
maxOccurs.
• The value unbounded indicates that the maximum number of
occurrences is unbounded.
– The default value of minOccurs and maxOccurs is 1.
Complex Types: Element Content (cont.)
• The element sequence is used to specify a sequence of sub-elements.
– Elements must appear in the same order that they are declared.
<xsd:complexType>
<xsd:sequence>
<xsd:element name="firstName" type="xsd:string"/>
<xsd:element name="middleName" type="xsd:string"
minOccurs="0"/>
<xsd:element name="lastName" type="xsd:string"/>
<xsd:element name="violation" type="Violation"
minOccurs="0" maxOccurs="unbounded"/>
...
</xsd:sequence>
...
</xsd:complexType>
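The ordering and occurrence rules can be exercised with a trimmed-down version of this type. The sketch is illustrative only: it uses javax.xml.validation (not shown in these slides), the class SequenceDemo is a made-up name, and the violation element is left out so that the schema stays self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class: checks sequence order and minOccurs on a reduced offender type.
final class SequenceDemo {
    private static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='offender'>"
        + "<xsd:complexType><xsd:sequence>"
        + "<xsd:element name='firstName' type='xsd:string'/>"
        + "<xsd:element name='middleName' type='xsd:string' minOccurs='0'/>"
        + "<xsd:element name='lastName' type='xsd:string'/>"
        + "</xsd:sequence></xsd:complexType>"
        + "</xsd:element>"
        + "</xsd:schema>";

    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well formed, or not valid against the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String first = "<firstName>David</firstName>";
        String last = "<lastName>Harel</lastName>";
        System.out.println(isValid("<offender>" + first + last + "</offender>")); // middleName is optional
        System.out.println(isValid("<offender>" + last + first + "</offender>")); // wrong order
    }
}
```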
Complex Types: Mixed Content
• The optional Boolean attribute mixed is
used to specify mixed content:
<xsd:complexType name="Violation" mixed="true">
<xsd:sequence>
<xsd:element name="code" type="Code"/>
<xsd:element name="issueDate" type="xsd:date"/>
<xsd:element name="issueTime" type="IssueTime"/>
</xsd:sequence>
...
</xsd:complexType>
Global Elements/Attributes
• Global elements and global attributes are created
by declarations that appear as the children of the
schema element.
• A global element is allowed to appear as the root
element of an instance document.
• The attribute ref of element/attribute elements
may be used (instead of the name attribute) to
reference a global element/attribute.
• Cardinality constraints cannot be placed on global
declarations, although they can be placed on local
declarations that reference global declarations.
Global Elements/Attributes (cont.)
• Example: global declarations
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="offenders" type="Offenders"/>
<xsd:element name="comment" type="xsd:string"/>
<xsd:attribute name="id" type="xsd:positiveInteger"/>
...
• Example: ref attribute
<xsd:element ref="comment" minOccurs="0"/>
<xsd:attribute ref="id" use="required"/>
Anonymous Type Definitions
• When a type is referenced only once, or
contains very few constraints, it can be
more succinctly defined as an anonymous
type.
• Saves the overhead of naming the type and
explicitly referencing it.
Anonymous Type Definitions (cont.)
<xsd:element name="offender" maxOccurs="unbounded">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element name="firstName" type="xsd:string"/>
            <xsd:element name="middleName" type="xsd:string"
                         minOccurs="0"/>
            <xsd:element name="lastName" type="xsd:string"/>
            <xsd:element name="violation" type="Violation"
                         minOccurs="0" maxOccurs="unbounded"/>
            <xsd:element ref="comment" minOccurs="0"/>
        </xsd:sequence>
        <xsd:attribute ref="id" use="required"/>
    </xsd:complexType>
</xsd:element>
• The <xsd:complexType> here carries no name attribute: it is anonymous.
• Is this a global declaration?
offenders.xsd (1 of 4)
Schema for offenders XML documents
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="offenders" type="Offenders"/>
    <xsd:element name="comment" type="xsd:string"/>
    <xsd:attribute name="id" type="xsd:positiveInteger"/>
    <xsd:complexType name="IssueTime">
        <xsd:simpleContent>
            <xsd:extension base="xsd:time">
                <xsd:attribute name="accuracy" type="Accuracy" use="optional"
                               default="accurate"/>
            </xsd:extension>
        </xsd:simpleContent>
    </xsd:complexType>
    <xsd:complexType name="Code">
        <xsd:attribute name="num" type="ViolationID" use="required"/>
        <xsd:attribute name="category" type="ViolationCategory"
                       fixed="traffic"/>
    </xsd:complexType>
offenders.xsd (2 of 4)
    <xsd:complexType name="Offenders">
        <xsd:sequence>
            <xsd:element name="offender" maxOccurs="unbounded">
                <xsd:complexType>
                    <xsd:sequence>
                        <xsd:element name="firstName" type="xsd:string"/>
                        <xsd:element name="middleName" type="xsd:string"
                                     minOccurs="0"/>
                        <xsd:element name="lastName" type="xsd:string"/>
                        <xsd:element name="violation" type="Violation"
                                     minOccurs="0" maxOccurs="unbounded"/>
                        <xsd:element ref="comment" minOccurs="0"/>
                    </xsd:sequence>
                    <xsd:attribute ref="id" use="required"/>
                </xsd:complexType>
            </xsd:element>
        </xsd:sequence>
    </xsd:complexType>
offenders.xsd (3 of 4)
    <xsd:complexType name="Violation" mixed="true">
        <xsd:sequence>
            <xsd:element name="code" type="Code"/>
            <xsd:element name="issueDate" type="xsd:date"/>
            <xsd:element name="issueTime" type="IssueTime"/>
        </xsd:sequence>
        <xsd:attribute ref="id" use="required"/>
    </xsd:complexType>
    <xsd:simpleType name="ViolationID">
        <xsd:restriction base="xsd:integer">
            <xsd:minInclusive value="100"/>
        </xsd:restriction>
    </xsd:simpleType>
offenders.xsd (4 of 4)
    <xsd:simpleType name="ViolationCategory">
        <xsd:restriction base="xsd:string">
            <xsd:enumeration value="traffic"/>
            <xsd:enumeration value="criminal"/>
            <xsd:enumeration value="civil"/>
        </xsd:restriction>
    </xsd:simpleType>
    <xsd:simpleType name="Accuracy">
        <xsd:restriction base="xsd:string">
            <xsd:enumeration value="accurate"/>
            <xsd:enumeration value="approx"/>
        </xsd:restriction>
    </xsd:simpleType>
</xsd:schema>
Validating Parsers
• A validating parser is capable of reading a
Schema specification or DTD and
determining whether or not XML documents
conform to it.
• A non-validating parser is capable of
reading a Schema / DTD but cannot check
XML documents for conformity.
– Limited to syntax checking
Creating a Validating DOM Parser
1. Create a DocumentBuilderFactory object:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
2. Configure the factory object to produce a validating parser:
dbf.setAttribute("http://java.sun.com/xml/jaxp/properties"
+ "/schemaLanguage", "http://www.w3.org/2001/XMLSchema");
dbf.setAttribute("http://java.sun.com/xml/jaxp/properties"
+ "/schemaSource", new File(xmlSchema));
dbf.setValidating(true);
3. Create a builder instance and set its error-handler:
DocumentBuilder docBuilder = dbf.newDocumentBuilder();
docBuilder.setErrorHandler(new MyErrorHandler());
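The configuration steps can be combined into a runnable sketch. Several things here are assumptions rather than slide content: the class ValidatingParseDemo, the small inline schema, passing the schema as an InputStream instead of a File, and the call to setNamespaceAware(true), which schema validation generally needs. The anonymous error handler simply counts validation errors.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: a validating DOM parser configured as in the steps above.
final class ValidatingParseDemo {
    private static final String PROPS = "http://java.sun.com/xml/jaxp/properties";
    // Illustrative schema: a single element using the slides' ViolationID type.
    private static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='num' type='ViolationID'/>"
        + "<xsd:simpleType name='ViolationID'>"
        + "<xsd:restriction base='xsd:integer'><xsd:minInclusive value='100'/></xsd:restriction>"
        + "</xsd:simpleType>"
        + "</xsd:schema>";

    /** Parses xml with validation switched on and returns the number of reported errors. */
    static int countValidationErrors(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // assumption: not shown on the slide
            dbf.setAttribute(PROPS + "/schemaLanguage", "http://www.w3.org/2001/XMLSchema");
            dbf.setAttribute(PROPS + "/schemaSource",
                    new ByteArrayInputStream(XSD.getBytes(StandardCharsets.UTF_8)));
            dbf.setValidating(true);
            DocumentBuilder docBuilder = dbf.newDocumentBuilder();
            final int[] errors = {0};
            docBuilder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
                @Override public void error(SAXParseException e) { errors[0]++; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            docBuilder.parse(new InputSource(new StringReader(xml)));
            return errors[0];
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Kept simple for the sketch: malformed input and setup failures are fatal.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countValidationErrors("<num>232</num>")); // valid document
        System.out.println(countValidationErrors("<num>50</num>"));  // violates minInclusive
    }
}
```

Because validity problems are delivered to error() rather than thrown, parsing continues after the first one; a single invalid value may therefore be reported more than once.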
Handling Parsing Errors
• By default, JAXP parsers do not throw
exceptions when documents are found to be
invalid.
• JAXP provides the interface ErrorHandler
so that users will be able to implement their
own error-handling semantics.
BoundedErrorPrinter.java (1 of 3)
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/**
 * An error handler that prints to the standard error stream a specified
 * number of errors. Once the specified number of errors is detected,
 * parsing is aborted.
 */
public class BoundedErrorPrinter implements ErrorHandler {
    private int errorCount = 0;
    private int errorsToPrint;

    public BoundedErrorPrinter(int errorsToPrint)
    {
        this.errorsToPrint = errorsToPrint;
    }
BoundedErrorPrinter.java (2 of 3)
    public void warning(SAXParseException spe) throws SAXException
    {
        System.err.println("Warning: " + getParseExceptionInfo(spe));
    }

    public void error(SAXParseException spe) throws SAXException
    {
        if (errorCount < errorsToPrint) {
            System.err.println("Error: " + getParseExceptionInfo(spe));
            ++errorCount;
        }
        if (errorCount >= errorsToPrint)
            throw spe; // abort parsing
    }
BoundedErrorPrinter.java (3 of 3)
    public void fatalError(SAXParseException spe) throws SAXException
    {
        if (errorCount < errorsToPrint)
            System.err.println("Fatal: " + getParseExceptionInfo(spe));
        throw spe;
    }

    public boolean errorsFound()
    {
        return errorCount > 0;
    }

    private String getParseExceptionInfo(SAXParseException spe)
    {
        return "Line = " + spe.getLineNumber() + ": " + spe.getMessage();
    }
}