(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 5, May 2011
Analyzing and Comparing the Parsing
Techniques of Asynchronous Message
Mr. P. Krishna Sankar
Assistant Professor,
Department of Computer Science and Engineering,
Dr. Mahalingam College of Engineering & Technology, Pollachi – 642 003, Tamilnadu, India.
pksankar@gmail.com

Ms. N. P. Shangaranarayanee
Student,
Department of Computer Science and Engineering,
Angel College of Engineering and Technology, Tirupur-641 665, Tamilnadu, India.
npsnarayanee@gmail.com

Abstract: The Java API for XML Processing (JAXP) originally provided two methods for processing XML: the Document Object Model (DOM) and the Simple API for XML (SAX). DOM parses the whole document and constructs a complete document tree in memory before it returns control to the client, while SAX keeps control of the parsing process and pushes events to the client; neither is well suited to tasks such as document screening or processing several documents at once. StAX was introduced to address this and does not suffer from the drawbacks faced while using DOM and SAX. A parser is a computer program, or a component of a program, that analyses the grammatical structure of an input with respect to a given formal grammar in a process known as parsing. Typically, a parser transforms some input text into a data structure that can be processed easily, e.g. for semantic checking, code generation or to help in understanding the input. Such a data structure usually captures the implied hierarchy of the input and forms a tree or even a graph. An XML document can be viewed as a general tree structure, and the processing task as an extension of the parallel tree traversal algorithms used for classic discrete optimization problems. Unlike the Simple API for XML (SAX), StAX offers an API for writing XML documents. To be precise, it offers two APIs: a low-level, cursor-based API (XMLStreamWriter), and a higher-level, event-based API (XMLEventWriter). While the cursor-based API is best used in data binding scenarios (for example, creating a document from application data), the event-based API is typically used in pipelining scenarios where a new document is constructed from the data of input documents.

Keywords: DOM, SAX, StAX, API, XML

I. INTRODUCTION

XML stands for the Extensible Markup Language. It is a markup language for documents, and it is likely to become a much more common tool for sharing and storing data. XML can communicate structured information to other users [1]. In other words, if a group of users agree to implement the same kinds of tags to describe a certain kind of information, XML applications can assist these users in communicating their information in a more robust and efficient manner. XML can make it easier to exchange information between cooperating entities. XML technique can be categorized by four factors: Strength of XML, XML Parser, XML Goals and Types of XML Parsers [5]. XML parsing is a core operation performed on an XML document for it to be accessed and manipulated. This operation is known to cause performance bottlenecks in applications and systems that process large volumes of XML data [3].

XML processing can incur significant run-time overhead in XML-based infrastructural middleware such as Web service application servers [4]. An XML document can be viewed as a general tree structure, and the XML processing task as an extension of the parallel tree traversal algorithms used for classic discrete optimization problems. We analyse the standard parsing techniques JDOM, SAX and StAX and, on that basis, compare the efficiency of the different parsers.

The Java API for XML Processing (JAXP) provided two methods for processing XML: the Document Object Model (DOM) method, which uses a standard object model to represent XML documents, and the Simple API for XML (SAX) method, which uses application-supplied event handlers to process XML. Processing several XML documents simultaneously can be a significant challenge [2]. We use Java and open source software to exercise parsers such as JDOM, SAX and StAX. SAX parsers, for example, deliver the parsing events through callbacks to the client application. Because the SAX parser controls this process, the client application does not really have a chance to synchronize the different input sources. Therefore, programmers usually resort to the DOM parser when it comes to multi-document processing. However, the penalty here is excessive resource usage; the node trees of all input documents must completely reside in memory.

II. PARSING ANALYSIS

A parser is a computer program, or a component of a program, that analyses the grammatical structure of an input with respect to a given formal grammar in a process known as parsing. Typically, a parser transforms some input text into a data structure that can be processed easily, e.g. for semantic checking, code generation or to help in understanding the input.

A. JDOM

JDOM is a tree-based API for processing XML documents with Java that threw out DOM’s limitations and assumptions and started from scratch. It is designed purely for XML, purely for Java, and with no concern for backwards compatibility with earlier, similar APIs. JDOM is written in and for Java. It consistently uses the Java
coding conventions and the class library. It is thus much cleaner and much simpler than DOM. Most developers find JDOM to be far more intuitive and easy to use than DOM. It’s not that JDOM will enable you to do anything you can’t do with DOM. However, writing the same program with JDOM will normally take less time and produce fewer bugs, simply because of the greater intuitiveness of the API. In many ways, JDOM is to DOM as Java is to C++: a much improved, incompatible replacement for the earlier, more complex technology. JDOM is an open source, tree-based, pure Java API for parsing, creating, manipulating, and serializing XML documents. JDOM was invented by Brett McLaughlin and Jason Hunter in the spring of 2000.

JDOM can build a new XML tree in memory. Data for the tree can come from a non-XML source like a database, from literals in the Java program, or from calculations performed by the program. When creating new XML documents from scratch (rather than reading them from a parser), JDOM checks all the data for well-formedness. For example, unlike many DOM implementations, JDOM does not allow programs to create comments whose data includes the double hyphen -- or elements and attributes whose namespace mappings conflict in impossible ways.

Once a document has been loaded into memory, whether by creating it from scratch or by parsing it from a stream, JDOM can modify the document. A JDOM tree is fully read-write. All parts of the tree can be moved, deleted, and added to, subject to the usual restrictions of XML.

B. SAX

SAX stands for Simple API for XML. The SAX standard is currently at version 2.0. It is used to read data from an XML document. A parser that uses SAX parses the XML serially: SAX parsing is unidirectional, and previously parsed data cannot be re-read without starting the parsing operation again. The API is event driven, and events are fired as XML features are encountered. The memory used by a SAX parser is relatively low and, owing to its event-driven nature, parsing an XML document is fast. SAX follows push-based parsing: the parser scans the XML document from top to bottom, and whenever it finds a node (start node, end node, text node, etc.) it pushes a notification to the application in the form of an event. SAX is thus basically a sequential, event-based parser built on callbacks. As it reads each fundamental unit of XML, it creates an event that the host program can use. This allows the application to ignore the bits it doesn't care about, and just keep or use what is needed. SAX is often used in certain high-performance applications or in areas where the size of the XML might exceed the memory available to the running program. In mainstream languages, event-based interfaces are usually implemented using callback functions, a style familiar in graphical user interface (GUI) programming and the like. In object-oriented languages, callbacks are usually registered methods for an object, using polymorphism to match the method name to the handler code, and using encapsulation to manage state in the handler between callbacks. This overall model of event-based programming is known as a push model and has a reputation for being difficult for many programmers to master. Most models that are considered easier to program, however, require random access to the document, and thus can lead to inefficiencies, so SAX has the reputation for being the most efficient standard way to process XML, if far from the easiest.

C. StAX

Streaming API for XML (StAX) is an application programming interface (API) to read and write XML documents, originating from the Java programming language community. Traditionally, XML APIs are either tree based (the entire document is read into memory as a tree structure for random access by the calling application) or event based (the application registers to receive events as entities are encountered within the source document).

StAX is a standardized Java-based API for pull-parsing XML. StAX has two basic functions: to allow users to read and write XML as efficiently as possible and be easy to use (cursor API), and to be easy to extend and allow for easy pipelining (event iterator API). Pull parsing differs from the traditional SAX-based iteration and the DOM-based tree model in that it is optimized for speed and performance. StAX is often referred to as “pull parsing.” The developer uses a simple iterator-based API to “pull” the next XML construct in the document. However, the common streaming APIs like SAX are all push APIs. They feed the content of the document to the application as soon as they see it, whether the application is ready to receive that data or not. SAX and XNI are fast and efficient, but the patterns they require programmers to adopt are unfamiliar and uncomfortable to many developers.

Pull APIs are a more comfortable alternative for streaming processing of XML. A pull API is based around the more familiar iterator design pattern rather than the less well-known observer design pattern. In a pull API, the client program asks the parser for the next piece of information rather than the parser telling the client program when the next datum is available. In a pull API the client program drives the parser. In a push API the parser drives the client.

Reading with StAX is done through XMLStreamReader, the key interface in StAX. This interface represents a cursor that is moved across an XML document from beginning to end. At any given time, this cursor points at one thing: a text node, a start-tag, a comment, the beginning of the document, etc. The cursor always moves forward, never backward, and normally only moves one item at a time.

A typical StAX program is built around an event loop that repeatedly asks the reader for the next event and dispatches on its integer type code, usually in a switch statement (a chain of if-else statements works as well). There are also a few ways to filter the event stream. The reliance on integer type codes and big switch statements is probably the main criticism of StAX: they are relics of procedural thinking, whereas object-oriented programs should be based around classes, inheritance hierarchies, and polymorphism instead.
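
A minimal sketch of such an event loop is given here, assuming the StAX implementation that ships with the JDK. The class name StaxCursorLoop, the method describe and the sample product document are illustrative only and do not come from the measurements in this paper.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class StaxCursorLoop {

    /** Pulls events one at a time and records element names and non-blank text. */
    static List<String> describe(String xml) throws XMLStreamException {
        List<String> seen = new ArrayList<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to deliver adjacent character data as a single event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            // The client drives the parser: each call to next() advances the cursor.
            while (reader.hasNext()) {
                int event = reader.next();
                switch (event) {
                    case XMLStreamConstants.START_ELEMENT:
                        seen.add("start:" + reader.getLocalName());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        if (!reader.isWhiteSpace()) {
                            seen.add("text:" + reader.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        seen.add("end:" + reader.getLocalName());
                        break;
                    default:
                        // Comments, processing instructions, etc. are ignored here.
                        break;
                }
            }
        } finally {
            reader.close();
        }
        return seen;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical sample input.
        String xml = "<products><product>Tea</product></products>";
        for (String entry : describe(xml)) {
            System.out.println(entry);
        }
    }
}
```

Because the loop is ordinary client code, it can stop as soon as it has what it needs, or interleave calls to several readers; this is what lets the application, rather than the parser, control the parsing process.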

StAX is a fast, potentially extremely fast, straightforward, memory-thrifty way of loading data from an XML document whose structure is well known in advance. State management is much simpler in StAX than in SAX, so if the SAX logic is getting too complex to follow or debug, StAX is well worth exploring. A few features such as validation, schema support, and entity resolution are either not available or are not functional in the current reference implementation, but these should soon be available in independent implementations [6]. StAX will be a very useful addition to any Java developer's XML toolkit.

III. COMPARISON

The Java API for XML Processing (JAXP) provided two methods for processing XML: the Document Object Model (DOM) method, which uses a standard object model to represent XML documents, and the Simple API for XML (SAX) method, which uses application-supplied event handlers to process XML. Processing several XML documents simultaneously can be a significant challenge. SAX parsers, for example, deliver the parsing events through callbacks to the client application. Because the SAX parser controls this process, the client application does not really have a chance to synchronize the different input sources. Therefore, programmers usually resort to the DOM parser when it comes to multi-document processing. However, the penalty here is excessive resource usage; the node trees of all input documents must completely reside in memory. Each step of the parsing process builds part of the Java object model (Fig. 1).

FIG. 1. PARSING OF XML DOCUMENT

The screening or classification of XML documents is a common problem, especially in XML middleware. Routing XML documents to specific processors may require analysis of both the document type and the document content. The problem here is obtaining the required information from the document with the least possible overhead. Traditional parsers such as DOM or SAX are not well suited to this task. Fig. 2 shows the XML document being converted into the Java object model.

FIG. 2. CONVERT THE XML INTO JAVA OBJECT MODEL

DOM, for example, parses the whole document and constructs a complete document tree in memory before it returns control to the client. Even DOM parsers that employ deferred node expansion, and thus are able to parse a document partially, have high resource demands because the document tree must be at least partially constructed in memory. This is simply not acceptable for screening purposes.

Like DOM, SAX parsers control the complete parsing process. By default, a SAX parser starts parsing at the beginning of a document and continues until the end. Client event handlers are informed through callbacks about the events during this parsing process. To avoid unnecessary overhead during document screening, such an event handler may want to stop the parsing process once it has gathered the required information. A common technique for achieving this in SAX is throwing an exception, which causes SAX to stop the parsing process.

The information gathered by the event handler must then be encoded in an error message that is wrapped in an exception object and posted to the parser's client. A special error handler in the client receives this exception and must parse the parser's error message to retrieve the required information. This may be a solution to the screening problem, but it is a complicated one.

StAX does not suffer from the above drawbacks. As its name indicates, it is targeted at streaming applications such as the merging of two documents, for example two documents that each contain a list of products.

StAX also changes how XML is written. Unlike the Simple API for XML (SAX), StAX offers an API for writing XML documents. To be precise, it offers two APIs: a low-level, cursor-based API (XMLStreamWriter), and a higher-level, event-based API (XMLEventWriter). While the cursor-based API is best used in data binding scenarios (for example, creating a document from application data), the event-based API is
typically used in pipelining scenarios where a new document is constructed from the data of input documents. The cursor-based API offers a variety of specific methods for creating the various items of the XML information set, such as elements, attributes, processing instructions, document type declarations, and character content. These methods take care of many formatting issues. For example, the method writeCharacters() automatically escapes characters like the less-than sign (<), the greater-than sign (>), and the ampersand (&). The method writeEndDocument() automatically closes all open structures, so a missing final call to writeEndElement() does not leave the document unterminated.

StAX can even generate namespace prefixes for namespaces that have not been formally declared. This happens when the property javax.xml.stream.isRepairingNamespaces (named javax.xml.stream.isPrefixDefaulting in early drafts of the specification) has been set to true for the output factory. If this property has been set to false, you must explicitly declare each namespace prefix and each namespace using the methods setPrefix() and writeNamespace(). Compared with the widely used DOM and SAX methods, StAX offers better parsing efficiency and is more comfortable for developers. As the name StAX indicates, it is targeted at streaming applications such as the merging of two documents and the exchange of information between cooperating entities.

StAX allows an application to process multiple XML sources simultaneously. For example, when one document includes or imports another document, the application can process the imported document while processing the original document. This use case is common when the application is reading documents such as XML Schemas or WSDL documents [4]. StAX has two basic functions: to allow users to read and write XML as efficiently as possible and be easy to use (cursor API), and to be easy to extend and allow for easy pipelining.

This approach to XML processing gives more control to the client application than to the parser, enabling faster and more memory-efficient processing. It is becoming a standard across different domains of XML processing. For example, Apache Axis2, one of the prominent SOAP processing engines, improved its performance four times, on average, over its predecessor by using a StAX-based XML processing model called Axiom. Axiom is more memory-efficient and performant than the existing object models available today because it uses StAX as its XML parsing technology.

TABLE I. COMPARING JDOM, SAX AND StAX

Parser API: DOM
  Advantages:    Rich set of APIs. Easy navigation. Entire tree loaded into memory, allowing random access to the XML document.
  Disadvantages: XML document must be parsed at one time. Expensive to load entire tree into memory. Generic DOM node not ideal for object-type binding.

Parser API: SAX
  Advantages:    Entire document is not loaded into memory, resulting in low memory consumption. Allows registration of multiple content handlers.
  Disadvantages: No built-in document navigation support. No random access to XML document. No support for modifying XML in place. No support for namespace scoping.

Parser API: StAX
  Advantages:    Contains two parsing models, for ease of use or performance. Application controls parsing, easily supporting multiple inputs. Powerful filtering capabilities provide efficient data retrieval.
  Disadvantages: No built-in document navigation support. No random access to XML document. No support for modifying XML in place. Still in an immature state.

IV. RESULT

The results are based on the time taken to parse the XML content with the JDOM, SAX and StAX techniques.

TABLE II. TIME TAKEN BY QUAD-CORE PROCESSOR TO PARSE AN XML

Time taken (nanoseconds)
Nodes  JDOM          SAX           StAX
1      0.0543514280  0.0302972850  0.0229867110
2      0.0551824870  0.0305735140  0.0230995120
3      0.0552194060  0.0308396360  0.0231593400
4      0.0552609610  0.0310450710  0.0232124300
5      0.0552998820  0.0312222530  0.0232675710
6      0.0553378660  0.0314174210  0.0233201160
7      0.0553779660  0.0315977420  0.0233766030
8      0.0554156950  0.0317844120  0.0234354000
9      0.0554545450  0.0319676010  0.0234957610
10     0.0554942100  0.0321462610  0.0235535530
11     0.0555336980  0.0323332980  0.0236119220
12     0.0555771380  0.0325268310  0.0236688970
13     0.0556225780  0.0327048640  0.0237263470
14     0.0556670760  0.0328829090  0.0237855310
15     0.0557098490  0.0330663200  0.0238462250
16     0.0557561020  0.0332445070  0.0239056370
17     0.0557997820  0.0334221430  0.0239639500
18     0.0558468040  0.0336052810  0.0240300790
19     0.0558944750  0.0337868240  0.0240976250
20     0.0559429890  0.0339688360  0.0241573290
21     0.0559919680  0.0341568860  0.0242198540
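
Timings of this kind could be collected with a small harness along the following lines. This is only a sketch under stated assumptions, not the procedure used for the tables: the JAXP DOM parser stands in for JDOM (which is a separate library), the generated product document and the names ParserTiming, ElementCounter, sampleDocument, countWithDom, countWithSax, countWithStax and timeNanos are hypothetical, and a single System.nanoTime() measurement without warm-up is not a rigorous benchmark.

```java
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

class ParserTiming {

    /** A parsing strategy that returns the number of product elements found. */
    interface ElementCounter {
        int count(String xml) throws Exception;
    }

    /** Builds a hypothetical test document with the given number of product nodes. */
    static String sampleDocument(int nodes) {
        StringBuilder sb = new StringBuilder("<products>");
        for (int i = 1; i <= nodes; i++) {
            sb.append("<product id=\"").append(i).append("\">item ").append(i).append("</product>");
        }
        return sb.append("</products>").toString();
    }

    /** Tree model (JAXP DOM, assumed stand-in for JDOM): builds the whole tree first. */
    static int countWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("product").getLength();
    }

    /** Push model: the parser calls back into the handler for every start tag. */
    static int countWithSax(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("product".equals(qName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    /** Pull model: the client asks the reader for each event in turn. */
    static int countWithStax(String xml) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "product".equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    /** Elapsed wall-clock time of a single run, in nanoseconds. */
    static long timeNanos(ElementCounter counter, String xml) throws Exception {
        long start = System.nanoTime();
        counter.count(xml);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Nodes\tDOM\tSAX\tStAX");
        for (int nodes = 1; nodes <= 21; nodes++) {
            String xml = sampleDocument(nodes);
            System.out.printf("%d\t%d\t%d\t%d%n", nodes,
                    timeNanos(ParserTiming::countWithDom, xml),
                    timeNanos(ParserTiming::countWithSax, xml),
                    timeNanos(ParserTiming::countWithStax, xml));
        }
    }
}
```

The three counting methods do the same work through the tree, push and pull models respectively, so the only thing that varies between columns is the parsing technique. The values printed depend on the machine and the JVM state and are not expected to reproduce the figures in the tables.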

FIG. 3. TIME TAKEN BY QUAD-CORE PROCESSOR TO PARSE AN XML (panel title: Quad Core Execution (Parsing); x-axis: Nodes, 1 to 21; y-axis: Time taken (nanoseconds); series: JDOM, SAX, StAX)

FIG. 4. TIME TAKEN BY PENTIUM D PROCESSOR TO PARSE AN XML (panel title: Pentium D Execution (Parsing); x-axis: Nodes, 0 to 21; y-axis: Time taken (nanoseconds); series: JDOM, SAX, StAX)

TABLE III. TIME TAKEN BY PENTIUM D PROCESSOR TO PARSE AN XML

Time taken (nanoseconds)
Nodes  JDOM          SAX           StAX
1      0.5790705250  0.1829763280  0.0863534340
2      0.6085354430  0.1833428550  0.0865333450
3      0.6086161790  0.1836504360  0.0866400620
4      0.6086890940  0.1839482390  0.0867403540
5      0.6087569790  0.1842457630  0.0868412050
6      0.6088259820  0.1859532420  0.0869440110
7      0.6088958240  0.1862731160  0.0870443030
8      0.6089712520  0.1865784620  0.0871476690
9      0.6090441670  0.1868706780  0.0872543860
10     0.6091218300  0.1871617760  0.0873608240
11     0.6092008900  0.1874640490  0.0875563800
12     0.6092768780  0.1877724680  0.0876628180
13     0.6093567760  0.1880747410  0.0877661830
14     0.6094341600  0.1883831600  0.0878751350
15     0.6095171320  0.1886907410  0.0879947040
16     0.6095973100  0.1891279480  0.0881047730
17     0.6096822370  0.1894360870  0.0882223860
18     0.6097646490  0.1897319350  0.0883324560
19     0.6098479000  0.1900311350  0.0884402910
20     0.6099308710  0.1903364810  0.0885922660
21     0.6100155190  0.1922856120  0.0887062470

The graphs in Fig. 3 and Fig. 4 show that StAX takes less time than the other parsers, JDOM and SAX, on both processors.

V. CONCLUSIONS

The Java API for XML Processing (JAXP) commonly processes XML documents using the Document Object Model (DOM) method, the Simple API for XML (SAX) method, and the Streaming API for XML (StAX) method. As the name StAX indicates, it is targeted at streaming applications such as the merging of two documents and the exchange of information between cooperating entities. StAX allows an application to process multiple XML sources simultaneously. Compared with the widely used DOM and SAX methods, StAX offers better parsing efficiency and is more comfortable for developers.

REFERENCES

[1] Rami Alnaqeib, Fahad H. Alshammari, Zaidan M.A., Zaidan A.A., Zaidan B.B., Zubaidah M. Hazza (2010) “An Overview: Extensible Markup Language Technology”, in Journal of Computing, Volume 2, Issue 6, June 2010.
[2] Nayak, Richi and Witt, Rebecca and Tonev, Anton (2002) “Data mining and XML documents”, in Proceedings International Conference on Internet Computing, IC'2002 3, pp. 660-666.
[3] Guy Lapalme (2010) “Exploring and Extracting Nodes from Large XML Files”, available at http://www.iro.umontreal.ca/~lapalme/ExamineXML.
[4] Wei Zhang and Robert A. van Engelen (2009) “An Adaptive XML Parser for Developing High-Performance Web Services”, in Proceedings of the International Symposium on Web Services (ISWS).
[5] Antoniu Nicula, Doina Zmaranda and Codruţa Vancea (2010) “Issues on efficiency of XML parsers”, http://www.rpbourret.com/xml/XMLData Binding.
[6] Morris Matsa, Eric Perkins, Abraham Heifets, Margaret Gaitatzes Kostoulas, Daniel Silva, Noah Mendelsohn, Michelle Leger (2007) “A High Performance Interpretive Approach to Schema Directed Parsing”, in International World Wide Web Conference (IW3C2) paper.
[7] Toshiro Takase, Hisashi Miyashita, Toyotaro Suzumura, Michiaki Tatsubori (2005) “An Adaptive, Fast, and Safe XML Parser Based on Byte Sequences Memorization”, in International World Wide Web Conference (IW3C2), May 2005.

AUTHOR PROFILE
1. Mr. P. Krishna Sankar, M.E., is working as Assistant Professor at Dr. Mahalingam College of Technology, Pollachi, Coimbatore (Dt), Tamilnadu (State), India. Earlier, he worked at IBM India Software Lab, Bangalore, as a Software Engineer for a year. He completed his Master of Engineering in Computer Science and Engineering at PSG College of Technology, Coimbatore (Dt), Tamilnadu (State), India, and his Bachelor of Engineering in Computer Science and Engineering at K.S.Rangasamy College of Technology, Tiruchengode, Namakkal (Dt), Tamilnadu (State), India. He has published 3 National Conference papers and performed 8 Technical Paper Presentations.

2. Ms. N. P. Shangaranarayanee is a student pursuing the Bachelor of Engineering in Computer Science and Engineering at Angel College of Engineering and Technology, Tirupur (Dt), Tamilnadu (State), India. She has published 1 National Conference paper and performed 4 Technical Paper Presentations.

http://sites.google.com/site/ijcsis/
ISSN 1947-5500