Querying XML with Locator Semantics
Peter Fankhauser
joint work with:
Matthias Friedrich, Gerald Huck, Ingo Macherius, Jonathan Robie
GMD German National Research Center for Information Technology
Institute for Integrated Publication and Information Systems
GMD-IPSI
http://xml.darmstadt.gmd.de/
Querying XML with Locator Semantics Slide 1
Overview
Requirements for Querying XML
XQL Overview
Locators
Locator Algebra
IPSI XML-Brokering Framework
General Requirements for Querying XML
(Excerpt from Dave Maier, W3C QL 98)
Require no schema
• flexibly match irregular structure
• preserve (irregular) structure
Query & Preserve Order and Association
• sibling order
• hierarchy
Precise Semantics
• rewrite rules
• compositional semantics
Closedness/Completeness
• XML to XML
• when is a QL for XML complete?
Running Example
Bookstore:
• Non Uniform Hierarchy
• sci-fi: 2 levels
• mystery: 3 levels
Customers: Flat Table
<books_and_customers>
  <bookstore>
    <fiction>
      <sci-fi>
        <book>
          <isbn>0006482805</isbn>
          <title>Do androids dream of electric sheep</title>
          <author>Philip K. Dick</author>
        </book>
      </sci-fi>
      <fantasy>
        <mystery>
          <book>
            <isbn>0261102362</isbn>
            <title>The two towers</title>
            <author>JRR Tolkien</author>
          </book>
        </mystery>
      </fantasy>
    </fiction>
  </bookstore>
  <customers>
    <customer>
      <name>Jason Woolsey</name>
      <boughtbooks>
        <isbn>0261102362</isbn>
        <isbn>0593488321</isbn>
      </boughtbooks>
    </customer>
    <customer>
      <name>P.W. Ellis</name>
      <boughtbooks>
        <isbn>0006482805</isbn>
        <isbn>0261102362</isbn>
      </boughtbooks>
    </customer>
  </customers>
</books_and_customers>
Functional Requirements for Querying XML
(Dave Maier, W3C QL 98)
Selection and Extraction:
• all sci-fi books by P.K. Dick
Reduction:
• drop all authors but 1st author
Combination:
• combine all books with their customers via isbn
Restructuring:
• return flat lists of title/author pairs
• and vice versa
Multidocument Handling:
• get reviews and books from different sites
• follow (dereference) links in books to authors
XQL Overview (State W3C QL 98)
Basic Concept: Selection of Subtrees
• Originated as QL for DOM
• adopted for selectors in XSL-templates
(now merged with XPointer to XPel to XPath to ????)
• Defined along search contexts = an (ordered) set of document nodes
Path Expressions and Filters:
• A query is essentially a navigation in element trees
• Navigation and filters modify the search context
• Query result is the last search context
Selection of nodes by:
• Element- and attribute name
• Type (element, attribute, comment, etc.)
• Content or value of nodes
• Relationship between nodes: hierarchy, sequence, index
Combination by: union, intersection
XQL 98 Examples
Selection and Extraction:
• all books by P.K. Dick
//book[author="P.K. Dick"]
Reduction:
• drop all but 1st author
//*?/book?/(isbn | author[0] | title)
• * matches all elements along paths to book
• shallow return operator (?) retains nesting hierarchy
• union preserves document order (title before author, regardless of the order in the query)
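The two queries above can be approximated outside an XQL engine. The following is a minimal sketch in Python, assuming the limited XPath subset of the standard library's ElementTree as a stand-in for XQL; the helper name reduce_book is hypothetical, and the author value is spelled as in the running example rather than as in the query.

```python
import xml.etree.ElementTree as ET

# Bookstore part of the running example.
DOC = """
<bookstore>
  <fiction>
    <sci-fi>
      <book>
        <isbn>0006482805</isbn>
        <title>Do androids dream of electric sheep</title>
        <author>Philip K. Dick</author>
      </book>
    </sci-fi>
    <fantasy>
      <mystery>
        <book>
          <isbn>0261102362</isbn>
          <title>The two towers</title>
          <author>JRR Tolkien</author>
        </book>
      </mystery>
    </fantasy>
  </fiction>
</bookstore>
"""

root = ET.fromstring(DOC)

# Selection and extraction: rough counterpart of //book[author="..."].
# ElementTree supports the [child='text'] predicate.
dick_books = root.findall(".//book[author='Philip K. Dick']")
titles = [book.findtext("title") for book in dick_books]


def reduce_book(book):
    """Reduction: keep isbn, title and only the first author.

    Children are visited in document order, so the result lists
    title before author even though the query names author first.
    """
    first_author = book.find("author")
    return [
        child.tag
        for child in book
        if child.tag in {"isbn", "title"} or child is first_author
    ]


print(titles)
print(reduce_book(dick_books[0]))
```

Unlike the shallow return operator (?), this sketch does not retain the nesting hierarchy above book; it only illustrates the selection predicate and the document-order behaviour of the union.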
XQL 98 lacked:
Selection Functionality
• comparison operators for fulltext (in progress)
• regular path expressions for hierarchy (only // for recursive
descent and * for matching all nodes in a search context)
Restructuring
• Suggestions: return operators (SAG), XSLT (W3C), Application
Level (e.g. WebMethods)
Combination
• joins; Suggestions: see below
Graphs
• no navigation along ID/IDREF
• no multi-documents (dereferencing URIs)
• Suggestions: docref, ref, keyref, idref
Delegation
• external functions
• wrappers
Extended XQL Examples
Combination:
• combine all books with customers via isbn
$root//*?/book?[$i:=isbn]/
(* | $root//customer?[boughtbooks/isbn=$i])
• New concepts
• combination with nodes outside of search context ($root//customer)
• correlation variables for expressing join predicate [$i:=isbn]
• $root used for clarity...
• Irregular structure of bookstore is preserved
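The join predicate can be illustrated with ordinary nested iteration. This is a minimal sketch in Python with ElementTree, not the extended XQL engine: the function name customers_per_book is hypothetical, the document is a shortened version of the running example, and the result is a flat mapping, so the irregular bookstore hierarchy that the XQL query preserves is deliberately lost here.

```python
import xml.etree.ElementTree as ET

# Shortened running example (assumption: category elements omitted for brevity).
DOC = """
<books_and_customers>
  <bookstore>
    <book><isbn>0006482805</isbn><title>Do androids dream of electric sheep</title></book>
    <book><isbn>0261102362</isbn><title>The two towers</title></book>
  </bookstore>
  <customers>
    <customer>
      <name>Jason Woolsey</name>
      <boughtbooks><isbn>0261102362</isbn><isbn>0593488321</isbn></boughtbooks>
    </customer>
    <customer>
      <name>P.W. Ellis</name>
      <boughtbooks><isbn>0006482805</isbn><isbn>0261102362</isbn></boughtbooks>
    </customer>
  </customers>
</books_and_customers>
"""

root = ET.fromstring(DOC)


def customers_per_book(root):
    """Map each book title to the names of the customers who bought it."""
    result = {}
    for book in root.iter("book"):
        isbn = book.findtext("isbn")  # plays the role of [$i := isbn]
        buyers = [
            customer.findtext("name")
            for customer in root.iter("customer")  # $root//customer
            if any(b.text == isbn for b in customer.findall("boughtbooks/isbn"))
        ]
        result[book.findtext("title")] = buyers
    return result


joined = customers_per_book(root)
for title, buyers in joined.items():
    print(title, "->", buyers)
```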
Multidocuments/Delegation:
• get multiple bookstores from a bookmark list (HTTP-GET)
docref('http://www.bookstores')/docref(.//@href)//bookstore
• the same with a form (HTTP-POST - simplified!)
docref('http://www.bookstores/search.cfm','country','us')//bookstore
• the same with a wrapper (application program delivering XML)
wrapper("bookstore")//bookstore
Towards a Datamodel for querying XML
[Figure: one sample document shown in several data models]
XML Serialization: Structured Text
<document>
  <person id="jonathanr">
    <firstname>Jonathan</firstname>
    <lastname>Robie</lastname>
  </person>
  <person id="joel">
    <firstname>Joe</firstname>
    <lastname>Lapp</lastname>
    <!-- ... -->
</document>
W3C-DOM: Element Tree
• element nodes: person, person, article, author, firstname, lastname, title, year
• text nodes: Jonathan, Robie, Joe, Lapp, XQL for Dummies, 1999
OEM: Graph
Relational Tables (generic massive join option)
• FlatElemTable, NonFlatElemTable, DocElemTable, DocumentTable, attrRecTable
Locators: Lists of Paths
document
document.person
document.person.@id
document.person.@id."joel"
document.person.firstname
document.person.firstname."Joe"
document.person.firstname."Lapp"
...
Locators for Bookstore
bookstore#1
bookstore#1.fiction#2
bookstore#1.fiction#2.sci-fi#3
bookstore#1.fiction#2.sci-fi#3.book#4
bookstore#1.fiction#2.sci-fi#3.book#4.isbn#5
bookstore#1.fiction#2.sci-fi#3.book#4.title#6
bookstore#1.fiction#2.sci-fi#3.book#4.author#7
…
bookstore#1.fiction#2.fantasy#8
bookstore#1.fiction#2.fantasy#8.mystery#9
bookstore#1.fiction#2.fantasy#8.mystery#9.book#10
bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.isbn#11
bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.title#12
bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.author#13
...
Locators <-> XML Serialization
Locators are lists of paths
XML-document->Locators
• each element-node gets id in document-order (depth first, left to
right traversal)
• each element-node is located by the entire path from root
• attributes are attached to element-nodes
• content is attached to leaf nodes
Locators->XML-document:
• clean up: discard locators $prefix which are followed by at least
one locator $prefix.$postfix
• generate tree
(1) for all locators generate nested serialization
(2) fill up with content and attributes
Mappings should be total, 1:1
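The document-to-locators direction and the clean-up step can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: locators are tuples of "name#id" strings, attributes and content are ignored, the function names to_locators and clean_up are hypothetical, and "followed by" is read as "is a proper prefix of any other locator in the list".

```python
import xml.etree.ElementTree as ET


def to_locators(root):
    """Number element nodes depth first, left to right, and emit one
    locator (the entire path from the root) per element node."""
    locators = []
    counter = 0

    def visit(elem, prefix):
        nonlocal counter
        counter += 1
        path = prefix + (f"{elem.tag}#{counter}",)
        locators.append(path)
        for child in elem:
            visit(child, path)

    visit(root, ())
    return locators


def clean_up(locators):
    """Discard every locator that is a proper prefix of another locator."""
    return [
        u
        for u in locators
        if not any(len(v) > len(u) and v[: len(u)] == u for v in locators)
    ]


doc = ET.fromstring(
    "<bookstore><fiction><sci-fi><book>"
    "<isbn>0006482805</isbn>"
    "<title>Do androids dream of electric sheep</title>"
    "<author>Philip K. Dick</author>"
    "</book></sci-fi></fiction></bookstore>"
)

locs = to_locators(doc)
for path in locs:
    print(".".join(path))

leaves = clean_up(locs)
```

After clean-up only the three locators ending in isbn, title and author remain, which is enough to regenerate the nested serialization because each one carries its full path.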
Locator Sets vs. Relations
Commonalities
• flat sets
• identity defined by identity of components
• concatenation to derive new locators/tuples
Differences
• arity
• locators: variable length
• tuples: fixed
• access to components:
• locators: by navigation
• tuples: by position/attribute
• data:
• locator components: document nodes
• tuple components: values
Locator Algebra (0)
Operator | Relational Algebra | Locator Algebra
∪, ∩, − | On tuple sets | On locator sets
Select | Selects tuples with a predicate | Selects locators with a predicate
Project | By absolute component selection | Not available; implicit projection by dependent join
Cross Product | Concatenate each tuple in one set with each tuple in another set | Dependent join concatenating locators from a context set with locators from a dependent set
Theta-Join | Combination of cross product with select | Combination of dependent join, select, and variable binding
Tree-Operators | Not applicable | DOM methods
Locator Algebra (1)
Preliminaries
• L: domain of locator sets
• x, y ∈ L
• PL: domain of locators
• u, v ∈ PL
• tail(u) … last component of u
• prefix(u) … u − tail(u)
Tree-Operators
• navigation in document tree using DOM methods
• root, parent, children: PL → L
• applied to locator sets from L using d-join (see below)
Set-Operators
• ∪, ∩, −: L × L → L
defined as usual
• order preservation due to total ordering on document nodes
Locator Algebra (2)
Select
• select[p]: L → L, where p: PL → Boolean
select[p](x) = {u | u ∈ x, p(tail(u))}
• Example: select[nodename(.) = "book"](x) =
select["book"](x)
Return
• Corresponds to project:
duplicates tail of locator for preserving it in
subsequent d-join (see below)
• return: PL → PL
return(u) = concat(u, tail(u))
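The select and return definitions translate almost literally into code. The following is a minimal sketch in Python, assuming locators are tuples of "name#id" strings and locator sets are Python sets; the names ret (chosen because return is a Python keyword) and nodename are assumptions of this sketch.

```python
def tail(u):
    """Last component of locator u."""
    return u[-1]


def prefix(u):
    """Locator u without its last component."""
    return u[:-1]


def select(p, x):
    """select[p](x) = {u | u in x, p(tail(u))}"""
    return {u for u in x if p(tail(u))}


def ret(u):
    """return(u) = concat(u, tail(u)): duplicate the tail of the locator."""
    return u + (tail(u),)


def nodename(component):
    # Assumption: a component is a string such as "book#4".
    return component.split("#")[0]


x = {
    ("bookstore#1", "fiction#2", "sci-fi#3", "book#4"),
    ("bookstore#1", "fiction#2", "sci-fi#3", "book#4", "title#6"),
}

books = select(lambda n: nodename(n) == "book", x)
kept = ret(("bookstore#1", "fiction#2"))
print(books)
print(kept)
```

The predicate only ever sees tail(u), which is why a locator of any length can be tested without knowing its arity in advance.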
Locator Algebra (3)
Dependent-Join:
• d-join[f]: L → L, where f: PL → L
d-join[f](x) = ⋃ (u ∈ x) concat(prefix(u), f(tail(u)))
• Example: return all titles of books in their book context
select["title"](d-join[children(.)]
(select["book"](d-join[return(children(.))](x)))) =
/book?/title
Kleene Star:
• fixpoint-operator for recursive descent queries
• *[f]: L → L, where f: L → L
*[f](x) = f(x) ∪ *[f](f(x))
• Example: select all titles in their original context
select["title"](d-join[children(.)]
(*[d-join[return(children(.))](.)](x))) =
//*?/title
• maybe too general for physical algebra
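Both examples can be traced on a toy document. This is a minimal Python sketch under stated assumptions: the tree TREE is hypothetical, children(n) returns single-component locators, locator sets are kept as lists in document order, and the recursion of the Kleene star stops once a step yields no locators.

```python
# Hypothetical toy document: node -> children, in document order.
TREE = {
    "root#0": ["book#1", "shelf#4"],
    "book#1": ["title#2", "author#3"],
    "shelf#4": ["book#5"],
    "book#5": ["title#6"],
}


def tail(u):
    return u[-1]


def prefix(u):
    return u[:-1]


def ret(u):
    """return(u) = concat(u, tail(u))"""
    return u + (tail(u),)


def children(n):
    """Single-component locators for the children of node n."""
    return [(c,) for c in TREE.get(n, [])]


def return_children(n):
    """return(children(.)): children with duplicated tails."""
    return [ret(v) for v in children(n)]


def select(name, x):
    """select["name"](x): keep locators whose tail has the given node name."""
    return [u for u in x if tail(u).split("#")[0] == name]


def d_join(f, x):
    """d-join[f](x): replace the tail of each u by every locator in f(tail(u))."""
    return [prefix(u) + v for u in x for v in f(tail(u))]


def star(f, x):
    """*[f](x) = f(x) union *[f](f(x)), ending when f(x) is empty."""
    step = f(x)
    if not step:
        return []
    return step + star(f, step)


x = [("root#0",)]

# /book?/title
book_titles = select(
    "title", d_join(children, select("book", d_join(return_children, x)))
)

# //*?/title
all_titles = select(
    "title", d_join(children, star(lambda s: d_join(return_children, s), x))
)

print(book_titles)
print(all_titles)
```

Because return duplicates the tail, the following d-join consumes only the duplicate, so book#1 survives in ("book#1", "title#2") while the root, which was never returned, is dropped.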
Locator Algebra (4)
Varbind, Varget
• to realize joins across contexts
• varbind[i,f]: L → L, where i ∈ Name, f: PL → L
varbind[i,f](x):
for all u ∈ x: vars(u) := vars(u) ∪ ⋃ (v ∈ f(tail(u))) <i,v>
• varget[i]: PL → L
varget[i](u): {v | <i,v> ∈ vars(u)}
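One way to picture vars(u) is as a set of bindings carried alongside each locator. The sketch below is a Python illustration under that assumption: each entry is a (locator, bindings) pair, and the lookup table ISBN_CHILD and the function isbn_of are hypothetical stand-ins for select["isbn"](children(.)).

```python
# Hypothetical stand-in for select["isbn"](children(.)) on the running example.
ISBN_CHILD = {"b#4": "isbn#5", "b#10": "isbn#11"}


def isbn_of(node):
    """Return the isbn children of a book node as a list."""
    return [ISBN_CHILD[node]] if node in ISBN_CHILD else []


def varbind(i, f, x):
    """For every (u, vars(u)) in x, add <i, v> for each v in f(tail(u))."""
    return [(u, bound | {(i, v) for v in f(u[-1])}) for u, bound in x]


def varget(i, entry):
    """varget[i](u) = {v | <i, v> in vars(u)}"""
    _, bound = entry
    return {v for name, v in bound if name == i}


# Book locators as abbreviated in the join example, with no bindings yet.
B = [
    (("bc#0", "bs#1", "f#2", "sf#3", "b#4"), frozenset()),
    (("bc#0", "bs#1", "f#2", "fa#8", "mi#9", "b#10"), frozenset()),
]

D = varbind("$i", isbn_of, B)
for entry in D:
    print(".".join(entry[0]), sorted(varget("$i", entry)))
```

The locator itself is left unchanged by varbind; only its bindings grow, which is what lets a later select compare values across contexts.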
Join Example (1)
bc#0
$A=*[d-join[return(children(.))](.)](x)=
//*?
bc#0.bookstore#1
bc#0.bookstore#1.fiction#2
bc#0.bookstore#1.fiction#2.sci-fi#3
...
$B=select["book"](d-join[return(children(.))]($A))=
//*?/book
bc#0.bs#1.f#2.sf#3.b#4
bc#0.bs#1.f#2.fa#8.mi#9.b#10
...
$C=d-join[return(children(.))]($B)=//*?/book?/*
bc#0.bs#1.f#2.sf#3.b#4.isbn#5
bc#0.bs#1.f#2.sf#3.b#4.title#6
...
$D=varbind[$i,select["isbn"](children(.))]($B)=
//*?/book[$i:=isbn]?
bc#0.bs#1.f#2.sf#3.b#4<$i,isbn#5>
bc#0.bs#1.f#2.fa#8.mi#9.b#10<$i,isbn#11>
...
$E=select["customer"](d-join[children(.)]
(*[d-join[return(children(.))](.)](d-join[root(.)]($D)))
=//*?/customer
customers#14.customer#15
customers#14.customer#20
$F=d-join(select[
select["isbn"](d-join[children(.)]
(select["boughtbooks"](d-join[children(.)](.)))
= varget[$i](.)]("$E")]($D)=
//*?/book[$i:=isbn]?/
(//*?/customer[boughtbooks/isbn=$i])
bc#0.bs#1.f#2.sf#3.b#4.cs#14.customer#20
bc#0.bs#1.f#2.fa#8.mi#9.b#10.cs#14.customer#15
bc#0.bs#1.f#2.fa#8.mi#9.b#10.cs#14.customer#20
Join Example (2)
<books_and_customers>
  <bookstore>
    <fiction>
      <sci-fi>
        <book>
          <isbn>0006482805</isbn>
          <title>Do androids dream of electric sheep</title>
          <author>Philip K. Dick</author>
          <customers>
            <customer>
              <name>P.W. Ellis</name>
              <boughtbooks>
                <isbn>0006482805</isbn>
                <isbn>0261102362</isbn>
              </boughtbooks>
            </customer>
          </customers>
        </book>
      </sci-fi>
      <fantasy>
        <mystery>
          <book>
            <isbn>0261102362</isbn>
            <title>The two towers</title>
            <author>JRR Tolkien</author>
            <customers>
              <customer>
                <name>Jason Woolsey</name>
                <boughtbooks>
                  <isbn>0261102362</isbn><isbn>0593488321</isbn>
                </boughtbooks>
              </customer>
              <customer>
                <name>P.W. Ellis</name>
                <boughtbooks>
                  <isbn>0006482805</isbn> <isbn>0261102362</isbn>
                </boughtbooks>
              </customer>
            </customers>
          </book>
        </mystery>
      </fantasy>
    </fiction>
  </bookstore>
</books_and_customers>
Some Equivalence Transformations for L’Algebra
Commutativity:
• union(A,B) = union(B,A) (within single document)
• but d-join is not commutative
Associativity:
• union, intersect, d-join
Idempotence:
• union(A,A) = A
Distributivity:
• //book/(title | author) = //book/title | //book/author
Neutral Elements:
• union: {}
• d-join: $root(?)
Open Issues
Combination with relational algebra
Graphs/Multidocuments
• DAGs: Multiple paths from root-context to node (serialization?)
• Role of URIs in locators?
Typing
• Role of XSD (XML Schema Definition)
• Inference
Constructors
• attribute to element and vice versa….
• Grouping, Skolems
Details
• Investigate conformance of locator concept to W3C Infoset
• Constraints on locators/mappings to guarantee wellformedness
Political
• XQL-Implementations shipping:
underlying semantics node-based, not locator-based
The IPSI XML Brokering Framework
[Figure: components of the framework, top to bottom]
• Visualization: HTML, CSS
• XSL Processor: URL + Queries, XQL, XML
• XML Server (HTTP, URL): Queries, XQL, XML
• Program: DOM
• Queryprocessor: XML Query Language (XQL)
• Datamodel: Document Object Model (W3C-DOM)
• Persistent DOM
• Warehouse
• HTTP/HTML Robot, Generic Wrappers, JEDI Framework, Specific Wrappers
Wrappers
Jedi Framework for Wrappers
• Pivot Object Model
• Scripting language for control-flow
• Access to dynamic sources (ODBC, CORBA) with iterators
Generic Wrappers
• Generic Mapping of structured formats to XML
• Examples: SGML, XML, HTML, MS-RTF
Jedi Parser
• for irregularly formatted sources
• context free, attributed grammars
• fault-tolerant, efficient parser: unlimited lookahead, interpretation
of ambiguous, incomplete grammars by specificity ordering
HTTP-Access
• Access plans for delegation integrated with XQL Engine
Mediator: XQL Engine + Persistent DOM
XQL 98 Implementation
• efficient recursive descent queries by signature-index
+ Joins
+ Multi Document Handling
• extends XQL with external references (via http-get, http-post)
• Multidocument DOM; for every node namespace and URI
+ User defined functions
• input: context (reference-node-set, reference-node-pointer),
parameters: constants, XQL-expressions (lazy evaluation)
• output: node-functions, collection-functions (set of nodes),
comparison-operators
can attach base-URIs
• variables
Application 1: An XML Broker for Golfers
[Figure labels: XSL, Query, XML Broker]
<golfdemo>
  <golfplatz>
    <adresse> ... </adresse>
    <greenfee> ... </greenfee>
    ...
  </golfplatz>
  <wetter> ... </wetter>
  <route> ... </route>
</golfdemo>
<golfplatz id="platz0001">
  <adresse>
    [...]
  </adresse>
  <policy>
    ...
  </policy>
  <handicap>
    <wochentag>34</wochentag>
    <wochenende>34</wochenende>
  </handicap>
</golfplatz>
<www.wetter.de>
  <wetter>
    <plz>87724</plz>
    <datum>981001</datum>
    <temperatur>16</temperatur>
    <regen>90</regen>
    <wind>9</wind>
    <prognose>13</prognose>
  </wetter>
  <!-- ... -->
</www.wetter.de>
<www.reiseplanung.de>
  <route>
    <von>53757</von>
    <nach>93333</nach>
    <entfernung>481.9</entfernung>
    <fahrzeit>274</fahrzeit>
    <karte>5375793333.gif</karte>
  </route>
  <!-- ... -->
</www.reiseplanung.de>
Application 2: RELIMO Integrating Bioinformatics Data
[Figure: clients, broker, and sources]
• XML Application (e.g. Office 2000)
• XML Browser (e.g. Mozilla 5)
• XSL Formatter (e.g. Lotus-XSL)
• XML Broker
• RELIBASE with XML-RPC
• PDB as local PDOM
Application Data
XML Broker for Golfers
• Sources: www.golffuehrer.de (500 KB), www.wetter.de (200 KB),
www.routen-information.de (200 KB)
• Joins (via zip-code) ~ 2 to 3 secs
RELIMO (Germany)
• Sources: Relibase (XML-RPC), PDB (5 GB -> 25 MB XML, 30 MB
PDOM)
• response time (100 MB) 50 to 30000 ms
MIROWEB (ESPRIT)
• JEDI for importing several sources to Oracle 8
Shakespeare
• all plays
• 10 MB (Tests with duplicated data up to 0.5 GB)
Some Links & Acks
XQL FAQ
• http://metalab.unc.edu/xql/
IPSI XML Research & Development
• http://xml.darmstadt.gmd.de
• XQL-Engine 1.0.1 download (non-commercial use)
• JEDI download (non-commercial use)
XML Brokering Framework Licensing Info (Infonyte)
• hemmje@globit.com
• www.infonyte.com
Many thanks to
• Karl Aberer, Harald Schöning, Guido Mörkotte