XML Programming - Parsers

					XML Programming

XML Parsers
• “Brute Force” - write your own “home grown” parser
• Apache XML Project's Xerces Java
• Apache XML Project's Xerces C++
• IBM's XML for Java
• Sun's Java API for XML
• Expat
• Libxml
• SAX in C++

Major Programming Functions
• Build XML – create XML
• Change the Structure of XML
• Parse XML – extract from XML

Other Parsers

XML Interfaces
• tree-based (DOM)
• event-based (SAX)

Parser APIs
• SAX, the Simple API for XML
– SAX1
– SAX2
• DOM, the Document Object Model
– DOM Level 0
– DOM Level 1
– DOM Level 2

With a tree-based interface, the parser creates an in-memory tree of nodes. You can then traverse the tree and extract any information you like. With an event-based interface, the parser triggers event callbacks for each significant XML parsing event (see Figure 2). Your application intercepts these events and thereby extracts the XML data. Generally, event-based interfaces are faster and require less memory than tree-based interfaces. Tree-based interfaces, by contrast, let you easily edit, delete, and move any node within the document tree.
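To make the contrast concrete, here is a minimal sketch that counts the elements of one document both ways. It assumes the JAXP parser classes bundled with the JDK rather than any particular Xerces build; the class name `TreeVersusEvents` and the sample document are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of any parser distribution.
class TreeVersusEvents {

    // Tree-based: the whole document is built in memory, then queried.
    static int countWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("*").getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    // Event-based: no tree is kept; a callback fires for every start tag.
    static int countWithSax(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<order><item/><item/></order>";
        System.out.println("DOM count: " + countWithDom(xml));
        System.out.println("SAX count: " + countWithSax(xml));
    }
}
```

Both methods give the same answer. The difference is that the DOM version holds every node in memory until it is discarded, while the SAX version keeps only the counter.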


Introduction to Xerces
Xerces is an open-source XML parser sponsored by the Apache XML Project. This quick start guide provides an overview of Xerces programming, including: understanding tree-based vs. event-based XML interfaces, creating a DOM parser, creating a SAX parser, turning validation on, and capturing errors.


DOM Tree
• DOM tree
– Each node represents an element, attribute, etc.

<?xml version = "1.0"?>
<message from = "Paul" to = "Tem">
   <body>Hi, Tim!</body>
</message>

• Node created for element message
– Element message has child node for body element
– Element body has child node for text "Hi, Tim!"
– Attributes from and to also have nodes in tree
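The node structure described above can be inspected with a short recursive walk. This is a sketch using the JDK's built-in DOM parser; the class `MessageTreeWalk` and its methods are illustrative names. The sample document is written on one line here so that no whitespace-only text nodes appear between elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class: lists the nodes of the sample message document.
class MessageTreeWalk {
    static final String MESSAGE_XML =
            "<?xml version = \"1.0\"?>"
            + "<message from = \"Paul\" to = \"Tem\"><body>Hi, Tim!</body></message>";

    // Parses the document and returns one line per element, attribute, and text node.
    static List<String> outline(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> lines = new ArrayList<>();
            walk(root, "", lines);
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    // Depth-first traversal; attributes are listed under their owning element.
    static void walk(Node node, String indent, List<String> lines) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            lines.add(indent + "element " + node.getNodeName());
            NamedNodeMap attributes = node.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node a = attributes.item(i);
                lines.add(indent + "  attribute " + a.getNodeName() + " = " + a.getNodeValue());
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            lines.add(indent + "text " + node.getNodeValue());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, indent + "  ", lines);
        }
    }

    public static void main(String[] args) {
        outline(MESSAGE_XML).forEach(System.out::println);
    }
}
```

Note that attributes are reached through `getAttributes()` rather than through the child list, which is why the walk handles them separately.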

Parser Functions
Method Name                    Description
createElement                  Creates an element node.
createAttribute                Creates an attribute node.
createTextNode                 Creates a text node.
createComment                  Creates a comment node.
createProcessingInstruction    Creates a processing instruction node.
createCDATASection             Creates a CDATA section node.
getDocumentElement             Returns the document’s root element.
appendChild                    Appends a child node.
getChildNodes                  Returns the child nodes.
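As a sketch of how these factory methods combine, the following builds a small document from scratch with the Java DOM API in the JDK. The class name `BuildDocument`, the processing instruction target `app`, and the comment and CDATA contents are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: creates nodes with the Document factory methods.
class BuildDocument {

    static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }

        // Root element with one attribute.
        Element message = doc.createElement("message");
        doc.appendChild(message);
        Attr from = doc.createAttribute("from");
        from.setValue("Paul");
        message.setAttributeNode(from);

        // Children of the root: a comment, an element with text, a CDATA section.
        message.appendChild(doc.createComment("greeting follows"));
        Element body = doc.createElement("body");
        body.appendChild(doc.createTextNode("Hi, Tim!"));
        message.appendChild(body);
        message.appendChild(doc.createCDATASection("1 < 2"));

        // A processing instruction placed before the root element.
        doc.insertBefore(doc.createProcessingInstruction("app", "mode=\"demo\""), message);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = build();
        System.out.println("root: " + doc.getDocumentElement().getNodeName());
        System.out.println("children of root: "
                + doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Each `create...` call only produces a detached node; it becomes part of the tree when it is attached with `appendChild`, `insertBefore`, or `setAttributeNode`.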

Node Functions
Method Name      Description
appendChild      Appends a child node.
cloneNode        Duplicates the node.
getAttributes    Returns the node’s attributes.
getChildNodes    Returns the node’s child nodes.
getNodeName      Returns the node’s name.
getNodeType      Returns the node’s type (e.g., element, attribute, text, etc.). Node types are described in greater detail in Fig. 8.9.
getNodeValue     Returns the node’s value.
getParentNode    Returns the node’s parent.
hasChildNodes    Returns true if the node has child nodes.
removeChild      Removes a child node from the node.
replaceChild     Replaces a child node with another node.
setNodeValue     Sets the node’s value.
insertBefore     Inserts a child node in front of an existing child node.
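The editing methods in this table can be tried on a tiny tree. The sketch below uses the JDK's DOM implementation; `EditNodes`, the `<list>` document, and the single-letter element names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: restructures a small tree with the Node methods.
class EditNodes {

    // Returns the child names of a node as a comma-separated string.
    static String childNames(Node parent) {
        StringBuilder names = new StringBuilder();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (i > 0) {
                names.append(',');
            }
            names.append(children.item(i).getNodeName());
        }
        return names.toString();
    }

    static Element edit() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<list><b/><c/></list>")));
        } catch (Exception e) {
            throw new IllegalStateException("sample document failed to parse", e);
        }
        Element list = doc.getDocumentElement();
        Node b = list.getFirstChild();
        Node c = list.getLastChild();

        list.insertBefore(doc.createElement("a"), b);            // children: a, b, c
        list.replaceChild(doc.createElement("d"), c);            // children: a, b, d
        list.removeChild(b);                                     // children: a, d
        list.appendChild(list.getFirstChild().cloneNode(true));  // children: a, d, a
        return list;
    }

    public static void main(String[] args) {
        System.out.println(childNames(edit()));
    }
}
```

`insertBefore` and `replaceChild` both take the new node first and the existing reference node second, which is easy to get backwards.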


Getting Started
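#include <iostream>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>

XERCES_CPP_NAMESPACE_USE

int main(int argC, char* argV[])
{
    // Initialize the XML4C2 (Xerces-C++) system before any other Xerces call
    try {
        XMLPlatformUtils::Initialize();
    }
    catch (const XMLException& toCatch) {
        // getMessage() returns XMLCh*; transcode it to the local code page for printing
        char* message = XMLString::transcode(toCatch.getMessage());
        std::cerr << "Error during Xerces-c Initialization.\n"
                  << "  Exception message: " << message << std::endl;
        XMLString::release(&message);
        return 1;
    }
    // Release Xerces resources once all parsing work is done
    XMLPlatformUtils::Terminate();
    return 0;
}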





Creating a DOM Parser
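The slide content for this step is not preserved in the source, so the following is only a sketch of one way to create a DOM parser and turn validation on. It uses the JAXP `DocumentBuilderFactory` bundled with the JDK rather than the Xerces-C++ classes; the class `DomParserSetup` and its method are illustrative names.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical demo class: configures and creates a DOM parser.
class DomParserSetup {

    // Settings are made on the factory; the builder it returns inherits them.
    static DocumentBuilder newParser(boolean validating) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);   // validate against the document's DTD
        factory.setNamespaceAware(true);     // report namespace URIs and local names
        try {
            return factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("requested parser configuration is unavailable", e);
        }
    }

    public static void main(String[] args) {
        DocumentBuilder parser = newParser(true);
        System.out.println("validating: " + parser.isValidating());
        System.out.println("namespace aware: " + parser.isNamespaceAware());
    }
}
```

Validation is off by default in JAXP, so it has to be requested explicitly before the builder is created.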


Calling the Parser
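The original slide for this step is also missing, so this is a hedged sketch of calling a validating DOM parser and capturing the errors it reports. It assumes the JDK's JAXP classes; `DomErrorCapture`, the message format, and the sample DTD in `main` are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: parses with validation on and collects reported problems.
class DomErrorCapture {

    static List<String> problems(String xml) {
        final List<String> found = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    found.add("warning: " + e.getMessage());
                }
                @Override
                public void error(SAXParseException e) {
                    // Validity errors are recoverable: record them and keep parsing.
                    found.add("error: " + e.getMessage());
                }
                @Override
                public void fatalError(SAXParseException e) throws SAXParseException {
                    // Well-formedness errors stop the parse.
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            found.add("fatal (line " + e.getLineNumber() + "): " + e.getMessage());
        } catch (Exception e) {
            throw new IllegalStateException("parser could not be run", e);
        }
        return found;
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE message [<!ELEMENT message (body)><!ELEMENT body (#PCDATA)>]>";
        problems(dtd + "<message><subject/></message>").forEach(System.out::println);
    }
}
```

Without an error handler, validity errors are not collected for the application and parsing simply continues, so installing one is what makes validation results usable.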

Java DOM

• SAX
– Simple API for XML
– Another method for accessing an XML document’s contents
– Developed by XML-DEV mailing-list members
– Uses event-based model
• Notifications (events) are raised as document is parsed

DOM vs. SAX
• DOM
– Tree-based model
• Stores document data in node hierarchy
– Data is accessed quickly
– Provides facilities for adding and removing nodes
• SAX
– Invoke methods when markup (specific tag) is encountered
– Greater performance than DOM
– Less memory overhead than DOM
– Typically used for reading documents (not modifying them)

What SAX 1.0 doesn't do
• Completely ignores the document type declaration
• Validation and other optional results of the DTD (attribute defaulting, external entities, etc.) are at parser default
• Does not report comments
• Does not report the XML declaration
• Does not report CDATA sections, entity references, and other non-canonical information from the document
• No explicit support for namespaces

SAX Functions

Method Name              Description
setDocumentLocator       Invoked at the beginning of parsing.
startDocument            Invoked when the parser encounters the start of an XML document.
endDocument              Invoked when the parser encounters the end of an XML document.
startElement             Invoked when the start tag of an element is encountered.
endElement               Invoked when the end tag of an element is encountered.
characters               Invoked when text characters are encountered.
ignorableWhitespace      Invoked when whitespace that can be safely ignored is encountered.
processingInstruction    Invoked when a processing instruction is encountered.
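A handler that simply records each callback shows the order in which these methods fire. This sketch extends the SAX 2.0 `DefaultHandler` from the JDK (the SAX 1.0 `HandlerBase` callbacks have the same names but different element signatures); `SaxEventLog` and the sample document are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo handler: logs every parsing event it receives.
class SaxEventLog extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void setDocumentLocator(Locator locator) { events.add("setDocumentLocator"); }
    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one run of text in several chunks.
        events.add("characters " + new String(ch, start, length));
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        events.add("ignorableWhitespace");
    }

    @Override
    public void processingInstruction(String target, String data) {
        events.add("processingInstruction " + target);
    }

    // Parses the given XML and returns the recorded event names in order.
    static List<String> record(String xml) {
        SaxEventLog log = new SaxEventLog();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), log);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
        return log.events;
    }

    public static void main(String[] args) {
        record("<message><?app go?><body>Hi, Tim!</body></message>")
                .forEach(System.out::println);
    }
}
```

Because `characters` can be called more than once for a single text run, real handlers usually append to a buffer and read it in `endElement`.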

SAX 2.0
• SAX 2.0
– Recently released
– We have been using JAXP
• JAXP supports only SAX 1.0 (currently)

– Xerces parser (Apache) supports SAX 2.0

SAX 2.0 major changes
• SAX 2.0 major changes
– Class HandlerBase replaced with DefaultHandler
– Element and attribute processing support namespaces
– Loading and parsing processes have changed
– Methods for retrieving and setting parser properties
• e.g., whether parser performs validation
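These changes can be seen together in a short sketch: a `DefaultHandler` receives namespace URIs and local names, and the parser is configured through the SAX 2.0 `XMLReader` feature methods. It assumes the JDK's bundled parser; `Sax2Namespaces` and the `urn:example:msg` namespace are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: namespace-aware SAX 2.0 parsing via XMLReader.
class Sax2Namespaces {

    // Returns each element as "{namespace-uri}local-name", in document order.
    static List<String> qualifiedNames(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();

            // Parser behaviour is controlled through named features;
            // this one is the standard SAX 2.0 switch for validation.
            reader.setFeature("http://xml.org/sax/features/validation", false);

            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    names.add("{" + uri + "}" + localName);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<m:message xmlns:m='urn:example:msg'><m:body/></m:message>";
        qualifiedNames(xml).forEach(System.out::println);
    }
}
```

The prefix `m` never reaches the result; with namespace processing on, elements are identified by URI and local name, so a document using a different prefix for the same namespace yields the same names.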

Creating a SAX Parser

SAX Handler Class

Shared by: Vinothkumar Vinothkumar, Engineer