Improved GML parsing through pull parsing and Java-centric binding

Donny Hallman

Agenda

• Background
• Pull versus Push Parsing
• Java-centric versus XML-centric
• Performance comparisons
• JiBX
• Conclusions
• Questions

Background

• First there was DOM and SAX:
– “DOM is a tree-based API, one which builds an in-memory representation of an XML instance”
– “SAX is an event-driven API, one which, rather than building an in-memory representation of an XML, calls event handlers as it encounters, serially, particular features of the XML instance”
• Reality:
– DOM is easy to use, but slow and uses lots of memory
– SAX is fast and efficient but a pain to use
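
The DOM and SAX styles described above can be contrasted with a small sketch that counts elements both ways using only the JDK parsers. The class name, method names, and the sample `<points>` document are made up for illustration and are not taken from the talk.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class DomVersusSax {

    // DOM: the whole document is built in memory first, then queried.
    public static int countWithDom(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName(tag).getLength();
    }

    // SAX: no tree is built; the parser calls back into the handler for each element.
    public static int countWithSax(String xml, String tag) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document, not real GML.
        String xml = "<points><point>1,2</point><point>3,4</point></points>";
        System.out.println("DOM: " + countWithDom(xml, "point"));
        System.out.println("SAX: " + countWithSax(xml, "point"));
    }
}
```

The DOM version is a single query once the tree exists, while the SAX version has to keep its own state (the counter) across callbacks, which is the "pain to use" trade-off in miniature.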

Background

The first approaches for the project:
• Create a GML-specific parser
– This would be too involved and complicated for the time given for the project
• Create a tool that would generate Java-centric code
– Used reflection; too similar to tools already in existence

Background

• Found out about JiBX through an agenda for a Northern Virginia Software Symposium
• Decided to focus on JiBX
– Based on some principles of my first project idea
– Used a parsing approach that I was not familiar with (pull parsing)
– Preliminary research showed that it was a very fast and efficient tool

Pull versus Push Parsing

• XML Pull Parsing
– Some small projects existed before 2000
– In 2000 some groups worked to standardize XML parsing
– JSR 173: Streaming API for XML request came in March 2002
– StAX released 2004

Pull versus Push Parsing
(Directly from Sun Microsystems) Pull parsing provides several advantages over push parsing when working with XML streams:
• With pull parsing, the client controls the application thread, and can call methods on the parser when needed. By contrast, with push processing, the parser controls the application thread, and the client can only accept invocations from the parser.
• Pull parsing libraries can be much smaller and the client code to interact with those libraries much simpler than with push libraries, even for more complex documents.
• Pull clients can read multiple documents at one time with a single thread.
• A StAX pull parser can filter XML documents such that elements unnecessary to the client can be ignored.
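
The first and last points above can be illustrated with a minimal StAX sketch, in which the client drives the loop and keeps only the elements it cares about. The class name, the `readNames` helper, and the `<feature>`/`<name>`/`<geometry>` tags are invented for this example and are not real GML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class PullFilterSketch {

    // Collects the text of every <name> element and ignores everything else.
    public static List<String> readNames(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            // The client asks for the next event when it is ready for one.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    // Reads the element's text and moves to its end tag.
                    names.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document; the tag names are placeholders.
        String xml = "<features>"
                + "<feature><name>Road</name><geometry>0,0 1,1</geometry></feature>"
                + "<feature><name>River</name><geometry>2,2 3,3</geometry></feature>"
                + "</features>";
        System.out.println(readNames(xml));
    }
}
```

Because the loop belongs to the client, nothing stops it from holding several `XMLStreamReader` instances and advancing them in turn on one thread, which is the "multiple documents" point in the list.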

Java-Centric versus XML-Centric
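<MyReport>
  <NumPages>24</NumPages>
  <DateCompleted>02/02/2006</DateCompleted>
</MyReport>

// Java-centric: fields are typed for how the application uses them
import java.util.Date;

class Report {
    private int pages;
    private Date completedDate;
}

// XML-centric: the class mirrors the XML structure and keeps values as Strings
class MyReport {
    private String numPages;
    private String dateCompleted;
}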

Java-Centric versus XML-centric
XML-centric
• Java representation of the XML structure, into which the XML will be unmarshalled
• The data is (usually) stored as Strings
Java-centric
• XML gets mapped to pre-existing Java classes, which don’t necessarily represent the XML structure
• Allows data to be stored as it will be used (int, Date, float, etc.)
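
To make the Java-centric idea concrete, here is a hand-written sketch that maps the `MyReport` document onto a `Report` object with typed fields. It is not JiBX and not generated code; it only shows the kind of mapping a Java-centric binding performs. It uses `java.time.LocalDate` rather than `java.util.Date` for brevity, and the `MM/dd/yyyy` date pattern is an assumption about the sample data.

```java
import java.io.StringReader;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class JavaCentricSketch {

    // Java-centric target: typed fields whose names differ from the XML tags.
    public static class Report {
        public int pages;
        public LocalDate completedDate;
    }

    // Assumption: 02/02/2006 in the sample is month/day/year.
    private static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("MM/dd/yyyy");

    public static Report unmarshal(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Report report = new Report();
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String tag = reader.getLocalName();
                if (tag.equals("NumPages")) {
                    // Converted once, at parse time, to the type the application uses.
                    report.pages = Integer.parseInt(reader.getElementText().trim());
                } else if (tag.equals("DateCompleted")) {
                    report.completedDate =
                            LocalDate.parse(reader.getElementText().trim(), DATE);
                }
            }
        } finally {
            reader.close();
        }
        return report;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<MyReport><NumPages>24</NumPages>"
                + "<DateCompleted>02/02/2006</DateCompleted></MyReport>";
        Report r = unmarshal(xml);
        System.out.println(r.pages + " pages, completed " + r.completedDate);
    }
}
```

An XML-centric class would instead keep both values as Strings and leave the `Integer.parseInt` and date parsing to every caller.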

Performance

• Compared three different studies/projects
• Each evaluated at least four different XML binding tools

Performance (Sosnoski)

Dennis Sosnoski, January 2003
• Tested 6 binding tools
– Castor, JAXB, Zeus, JBind, Quick, JiBX
– dom4j and SAX2 used as baselines
• ttcomp: used values as attributes
• ttfull: used more elements
• Small: 34 docs of 1.4-3.3 KB or 2.2-5.8 KB (not shown)
• Large: 1 doc of 107 KB or 212 KB

Performance (SOA)

• Study by four engineers of XML processing (SOAP) for web services
– Article published August 2005
– Dr. Srinivas Padmanabhuni; Bijoy Majumdar; Ujval Mysore; Vikram Sitaram
– Compared Castor, JAXB 2.0, XMLBeans, JiBX

Performance (Bindmark)

• Bindmark
– Most comprehensive study
– Involved 21 commercial or open source tools
– Study from January 2005 to October 2005
– For small tests: 1.42 KB, 55 lines, 2 levels deep
– For medium tests: 123 KB, 1817 lines, 20 levels deep
– For large tests: 1003 KB, 8567 lines, 45 levels deep

Performance (Bindmark)
Strategy used for extracting averages:
• For small tests: over 1000 successive invocations
• For medium tests: over 100 successive invocations
• For large tests: over 10 successive invocations
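
An averaging scheme like the one above might look like the following sketch. How Bindmark actually measured time is not stated in these slides, so the `averageMillis` helper and the stand-in workload are assumptions for illustration only.

```java
public class AverageTiming {

    // Runs the task back to back and returns the mean wall-clock time per call, in ms.
    public static double averageMillis(Runnable task, int invocations) {
        if (invocations <= 0) {
            throw new IllegalArgumentException("invocations must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < invocations; i++) {
            task.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / invocations;
    }

    public static void main(String[] args) {
        // Stand-in workload; a real benchmark would unmarshal a test document here.
        Runnable standIn = () -> Integer.parseInt("24");

        // Invocation counts follow the small/medium/large strategy from the slide.
        System.out.printf("small:  %.6f ms per call%n", averageMillis(standIn, 1000));
        System.out.printf("medium: %.6f ms per call%n", averageMillis(standIn, 100));
        System.out.printf("large:  %.6f ms per call%n", averageMillis(standIn, 10));
    }
}
```

Fewer invocations for larger documents keeps total run time manageable, at the cost of a noisier average for the large tests.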

Performance (Bindmark)

Legend for the results table:
• Green cells indicate performance better than the reference hard-coded marshalling and SAX unmarshalling implementations
• Red cells indicate performance worse than the reference reflection marshalling and DOM unmarshalling implementations

JiBX

• Uses XML pull parsing
• Uses a mapping file to show how to map XML to Java classes (Java-centric)
• Uses an XML binding compiler to enhance the byte code (aspect weaving) of existing classes

JiBX

Binding examples (figures not preserved):
• Simple Binding Example
• Flattening Example
• Split Binding
• Ignored Components

JiBX
Disadvantages:
• Creating the XML mapping file can be cumbersome
• Debugging is difficult

Conclusions
Java-centric binding depends on the application:
• Is GML processing the focus of the application or just a subset of the functionality?
• How much of the GML data is really needed for the application?

Conclusions
XML Pull Parsing:
• Is all of the data going to be used?
JiBX:
• Do you want Java-centric binding and XML pull parsing?
• Performance or convenience?

Questions?
