What’s All This About Different Kinds of XML Parsers?

How Do XML Parsers Differ?

A. By the relationship between the XML parser and its client:

1. A DOM parser returns a “tree” representation of the XML document.
2. A push parser calls the client’s methods with XML events.
3. A pull parser returns XML events to a client on request.

B. By how data is returned:

4. Data-copying XML parsers copy all the information in the parsed XML document into objects, which are returned to the client.
5. In-situ XML parsers, as much as possible, indicate where data was found in the parsed XML document.

C. By what information is returned to the client:

6. Element structure and properties, and data content information.
7. Internal entity location and value information.
8. External entity information, and the ability of the client to participate in entity resolution.

There aren’t necessarily clean boundaries between the different kinds of XML parsers:

9. Even the best in-situ parsers have to provide some information using objects.
10. Most pull-model parsers revert to the push model when accessing external entity data.
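
The three client relationships in part A can be sketched with Python's standard library. This is only an illustration under some assumptions: the sample document and the `ItemCollector` class are made up for the example, `xml.sax` stands in for a push parser, and `xml.etree.ElementTree.XMLPullParser` stands in for a pull parser (it also builds element objects as it goes, so it is not a pure event source).

```python
import xml.dom.minidom
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document used by all three styles.
DOC = "<order id='7'><item>pen</item><item>ink</item></order>"

# 1. DOM: the parser returns a whole tree; the client walks it afterwards.
tree = xml.dom.minidom.parseString(DOC)
dom_items = [n.firstChild.data for n in tree.getElementsByTagName("item")]

# 2. Push: the parser drives, calling the client's methods for each event.
class ItemCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._text = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._text = []

    def characters(self, content):
        # Character data may arrive in several chunks, so accumulate it.
        if self._in_item:
            self._text.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._text))
            self._in_item = False

collector = ItemCollector()
xml.sax.parseString(DOC.encode("utf-8"), collector)
push_items = collector.items

# 3. Pull: the client drives, asking the parser for events when it wants them.
pull = ET.XMLPullParser(events=("start", "end"))
pull.feed(DOC)
pull.close()
pull_items = [
    elem.text
    for event, elem in pull.read_events()
    if event == "end" and elem.tag == "item"
]

print(dom_items, push_items, pull_items)
```

All three styles recover the same item text; what differs is who holds control while the document is being read.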
XML Parsers: Power vs. History

Relationship between parsers:

1. Given either a push or a pull XML parser, you can easily build a DOM parser. Push or pull will give you XML events, and you just create DOM tree nodes to represent and link those events.
2. Given a pull XML parser, you can easily build a push parser: get the events from the pull parser and call the appropriate methods in the push fashion.

In both cases, the other way round doesn’t really work. You can “fake” a pull or push parser based on a DOM parser, but you lose the low-footprint serial parsing of both those models. You can’t really build a pull parser using a push parser (in the absence of coroutines, but that’s another story).

A bit of history:

In the realm of generally available parsers, SAX and SAX-like push-model parsers came first. Using an object-oriented programming language, a SAX-like parser is the easiest to build.

DOM and DOM-like parsers came next. Given a SAX-like parser, a DOM or DOM-like parser is easy to build. Without a push parser, you have to reproduce all the logic of a pull parser within the DOM parser.

The next step is pull-model parsers. Pull-model parsers are hard to build, because the current state of parsing, held in the tree in a DOM parser and in the program state in a SAX parser, must be completely saved away between calls from the pulling client.
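
The two "easy" constructions described under "Relationship between parsers" can be sketched as follows. The names `pull_events`, `Handler`, `push_from_pull` and `tree_from_pull` are hypothetical, chosen for this illustration. Python's `XMLPullParser` is assumed as the underlying pull source, and only element start and end events are handled. The tree is a plain nested tuple rather than a real DOM.

```python
import xml.etree.ElementTree as ET

def pull_events(text):
    """Pull side: a generator the client advances one event at a time."""
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(text)
    parser.close()
    for event, elem in parser.read_events():
        yield event, elem.tag

class Handler:
    """Hypothetical push-style client: the parser calls these methods."""
    def __init__(self):
        self.calls = []

    def start(self, tag):
        self.calls.append(("start", tag))

    def end(self, tag):
        self.calls.append(("end", tag))

def push_from_pull(text, handler):
    """Push built on pull: fetch each event, then call the matching method."""
    for event, tag in pull_events(text):
        getattr(handler, event)(tag)

def tree_from_pull(text):
    """Tree built on pull: create a node per start event and link it to its parent."""
    root = None
    stack = []  # open elements, innermost last
    for event, tag in pull_events(text):
        if event == "start":
            node = (tag, [])
            if stack:
                stack[-1][1].append(node)
            else:
                root = node
            stack.append(node)
        else:
            stack.pop()
    return root

handler = Handler()
push_from_pull("<a><b/></a>", handler)
tree = tree_from_pull("<a><b/><c><d/></c></a>")
print(handler.calls)
print(tree)
```

Both adapters are just loops over the pull events, which is why these directions are easy. Going from push to pull would require suspending the parser in the middle of a callback, which is where coroutines come in.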
History vs. Power

It’s not surprising that more powerful parsing models have come later on: that’s the conventional direction of “progress”. On the other hand, had we had freely available pull parsers to start with, history would have been different.

In-situ vs. data-copying

In-situ is more powerful than data-copying: if the client doesn’t care where the data is, either will do; if the client cares, only in-situ will do. […] first, and in-situ later.

Anything next?

Pull-everything, in-situ XML parsers are about as far as you can go: they can do anything. So we’re getting close to the end of this bit of history.

What’s missing?

The current pull-model parsers only partially implement a pull-everything interface, and most are data-copying.

Why pull-everything? Extending the pull interface to entity-side events and giving the client full control over external entity resolution using a pull model puts fewer constraints on the client and widens the range of application of an XML parser.

In-situ parsing is a good fit for small-footprint devices (see Antonio J. Sierra’s paper), but has other applications. For example, it allows you to use XML as the internal “native” format of data in an XML-aware text editor. Again, the idea is to put the choices in the client’s hands.
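
The difference between data-copying and in-situ reporting can be illustrated with a toy tokenizer. This is a sketch only, not a conforming XML parser: the regular expression and the function names are assumptions made for the example, and it ignores comments, CDATA sections, entities, and `>` inside attribute values.

```python
import re

# Toy tokenizer: a token is either a tag or a run of text between tags.
TAG_OR_TEXT = re.compile(r"<[^>]+>|[^<]+")

def copying_tokens(doc):
    """Data-copying: each token is handed to the client as a new string."""
    return [m.group(0) for m in TAG_OR_TEXT.finditer(doc)]

def in_situ_tokens(doc):
    """In-situ: each token is a (start, end) span into the original buffer."""
    return [m.span() for m in TAG_OR_TEXT.finditer(doc)]

doc = "<p>Hello <b>world</b></p>"
copies = copying_tokens(doc)
spans = in_situ_tokens(doc)

# An editor-like client can use a span to change the buffer in place.
start, end = spans[3]  # the text inside <b>...</b>
edited = doc[:start] + "there" + doc[end:]

print(copies)
print(spans)
print(edited)
```

A client that only wants the content can slice the buffer with the spans and get what the copying version returns. A client that needs positions, such as the text editor mentioned above, cannot recover them from the copies.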