                 Querying XML with Locator Semantics
                            Peter Fankhauser
                             joint work with:
    Matthias Friedrich, Gerald Huck, Ingo Macherius, Jonathan Robie

     GMD German National Research Center for Information Technology
       Institute for Integrated Publication and Information Systems
                                 GMD-IPSI
                       http://xml.darmstadt.gmd.de/



Querying XML with Locator Semantics                             Slide 1
             Overview


         Requirements for Querying XML


         XQL Overview


         Locators


         Locator Algebra


         IPSI XML-Brokering Framework



Querying XML with Locator Semantics      Slide 2
             General Requirements for Querying XML
             (Excerpt from Dave Maier, W3C QL 98)


          Require no schema
           • flexibly match irregular structure
           • preserve (irregular) structure
          Query & Preserve Order and Association
           • sibling order
           • hierarchy
          Precise Semantics
           • rewrite rules
           • compositional semantics
          Closedness/Completeness
           • XML to XML
           • when is a QL for XML complete?




Querying XML with Locator Semantics                  Slide 3
              Running Example

           Bookstore:
            • Non Uniform Hierarchy
            • sci-fi: 2 levels
            • mystery: 3 levels
           Customers: Flat Table

           <books_and_customers>
            <bookstore>
             <fiction>
              <sci-fi>
               <book>
                <isbn>0006482805</isbn>
                <title>Do androids dream of electric sheep</title>
                <author>Philip K. Dick</author>
               </book>
              </sci-fi>
              <fantasy>
               <mystery>
                <book>
                 <isbn>0261102362</isbn>
                 <title>The two towers</title>
                 <author>JRR Tolkien</author>
                </book>
               </mystery>
              </fantasy>
             </fiction>
            </bookstore>
            <customers>
             <customer>
              <name>Jason Woolsey</name>
              <boughtbooks>
               <isbn>0261102362</isbn>
               <isbn>0593488321</isbn>
              </boughtbooks>
             </customer>
             <customer>
              <name>P.W. Ellis</name>
              <boughtbooks>
               <isbn>0006482805</isbn>
               <isbn>0261102362</isbn>
              </boughtbooks>
             </customer>
            </customers>
           </books_and_customers>


Querying XML with Locator Semantics                                                                 Slide 4
             Functional Requirements for Querying XML
             (Dave Maier, W3C QL 98)


          Selection and Extraction:
           • all sci-fi books by P.K. Dick
          Reduction:
           • drop all authors but 1st author
          Combination:
           • combine all books with their customers via isbn
          Restructuring:
           • return flat lists of title/author pairs
           • and vice versa
          Multidocument Handling:
           • get reviews and books from different sites
           • follow (dereference) links in books to authors



Querying XML with Locator Semantics                            Slide 5
               XQL Overview (State W3C QL 98)

         Basic Concept: Selection of Subtrees
           • Originated as QL for DOM
           • adopted for selectors in XSL-templates
             (now merged with XPointer to XPel to XPath to ????)
           • Defined along search contexts = an (ordered) set of document nodes
         Path Expressions and Filters:
           • A query is essentially a navigation in element trees
           • Navigation and filters modify the search context
           • Query result is the last search context
         Selection of nodes by:
           •   Element- and attribute name
           •   Type (element, attribute, comment, etc.)
           •   Content or value of nodes
           •   Relationship between nodes: hierarchy, sequence, index
         Combination by: union, intersection

Querying XML with Locator Semantics                                     Slide 6
             XQL 98 Examples



       Selection and Extraction:
         • all books by P.K. Dick
           //book[author="P.K. Dick"]
       Reduction:
         • drop all but 1st author
           //*?/book?/(isbn | author[0] | title)
         • * matches all elements along paths to book
         • shallow return operator (?) retains nesting hierarchy
         • union preserves document order (title before author)
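
The two queries above can be approximated outside XQL. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and its limited XPath subset as a stand-in for an XQL engine. The sample document is a trimmed version of the running example, and the second author of the second book is hypothetical, added only so the reduction has something to drop.

```python
import xml.etree.ElementTree as ET

# Trimmed running example; the second <author> of book 2 is hypothetical.
DOC = """<bookstore>
  <book>
    <isbn>0006482805</isbn>
    <title>Do androids dream of electric sheep</title>
    <author>Philip K. Dick</author>
  </book>
  <book>
    <isbn>0261102362</isbn>
    <title>The two towers</title>
    <author>JRR Tolkien</author>
    <author>Hypothetical Co-Author</author>
  </book>
</bookstore>"""

root = ET.fromstring(DOC)

# Selection and extraction: all books with a given author child.
dick_books = root.findall(".//book[author='Philip K. Dick']")
dick_titles = [book.findtext("title") for book in dick_books]

# Reduction: drop all authors but the first one, in place.
for book in list(root.iter("book")):
    for extra in book.findall("author")[1:]:
        book.remove(extra)

authors_per_book = [len(book.findall("author")) for book in root.iter("book")]
print(dick_titles, authors_per_book)
```

Unlike the shallow return operator (?), this sketch edits the tree in place, so the nesting hierarchy is retained trivially rather than reconstructed from a query result.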




Querying XML with Locator Semantics                                Slide 7
             XQL 98 lacked:

          Selection Functionality
           • comparison operators for fulltext (in progress)
           • regular path expressions for hierarchy (only // for recursive
             descent and * for matching all nodes in a search context)
          Restructuring
           • Suggestions: return operators (SAG), XSLT (W3C), Application
             Level (e.g. WebMethods)
          Combination
           • joins; Suggestions: see below
          Graphs
           • no navigation along ID/IDREF
           • no multi-documents (dereferencing URIs)
           • Suggestions: docref, ref, keyref, idref
          Delegation
           • external functions
           • wrappers

Querying XML with Locator Semantics                                          Slide 8
             Extended XQL Examples


       Combination:
         • combine all books with customers via isbn
           $root//*?/book?[$i:=isbn]/
                    (* | $root//customer?[boughtbooks/isbn=$i])
         • New concepts
              • combination with nodes outside of search context ($root//customer)
              • correlation variables for expressing join predicate [$i:=isbn]
              • $root used for clarity...
         • Irregular structure of bookstore is preserved
       Multidocuments/Delegation:
         • get multiple bookstores from a bookmark list (HTTP-GET)
           docref('http://www.bookstores')/docref(.//@href)//bookstore
         • the same with a form (HTTP-POST - simplified!)
           docref('http://www.bookstores/search.cfm','country','us')//bookstore
         • the same with a wrapper (application program delivering XML)
           wrapper("bookstore")//bookstore

Querying XML with Locator Semantics                                                Slide 9
             Towards a Datamodel for querying XML

    [Figure: one XML document shown in five candidate representations]

     XML Serialization: Structured Text
         <document>
          <person id="jonathanr">
            <firstname>Jonathan</firstname>
            <lastname>Robie</lastname>
          </person>
          <person id="joel">
            <firstname>Joe</firstname>
            <lastname>Lapp</lastname>
          <!-- ... -->
         </document>

     W3C-DOM: Element Tree

     OEM: Graph
         (nodes: person, person, article, author, firstname, lastname, title, year)

     Relational Tables (generic massive join option)
         (FlatElemTable, NonFlatElemTable, DocElemTable, DocumentTable, attrRecTable)

     Locators: Lists of Paths
         document
         document.person
         document.person.@id
         document.person.@id."joel"
         document.person.firstname
         document.person.firstname."Joe"
         ...

Querying XML with Locator Semantics                                                                                                                                          Slide 10
             Locators for Bookstore


    bookstore#1
    bookstore#1.fiction#2
    bookstore#1.fiction#2.sci-fi#3
    bookstore#1.fiction#2.sci-fi#3.book#4
    bookstore#1.fiction#2.sci-fi#3.book#4.isbn#5
    bookstore#1.fiction#2.sci-fi#3.book#4.title#6
    bookstore#1.fiction#2.sci-fi#3.book#4.author#7
    …
    bookstore#1.fiction#2.fantasy#8
    bookstore#1.fiction#2.fantasy#8.mystery#9
    bookstore#1.fiction#2.fantasy#8.mystery#9.book#10
    bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.isbn#11
    bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.title#12
    bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.author#13
    ...


Querying XML with Locator Semantics                                      Slide 11
             Locators <-> XML Serialization


       Locators are lists of paths
       XML-document->Locators
         • each element-node gets id in document-order (depth first, left to
           right traversal)
         • each element-node is located by the entire path from root
         • attributes are attached to element-nodes
         • content is attached to leaf nodes
       Locators->XML-document:
         • clean up: discard locators $prefix which are followed by at least
           one locator $prefix.$postfix
         • generate tree
           (1) for all locators generate nested serialization
           (2) fill up with content and attributes
       Mappings should be total, 1:1
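
The two mappings above can be sketched in a few lines. This is a minimal illustration, assuming Python's xml.etree.ElementTree as parser; the function names to_locators, clean_up, to_xml and show are hypothetical, and attributes and content are left out for brevity. A locator is modelled as a tuple of (name, id) pairs.

```python
import xml.etree.ElementTree as ET

def to_locators(xml_text):
    """XML -> locators: ids in document order, one full root path per element."""
    locators, counter = [], [0]

    def visit(elem, prefix):
        counter[0] += 1                      # depth first, left to right
        path = prefix + ((elem.tag, counter[0]),)
        locators.append(path)
        for child in elem:
            visit(child, path)

    visit(ET.fromstring(xml_text), ())
    return locators

def clean_up(locators):
    """Discard every locator that is a proper prefix of another locator."""
    return [u for u in locators
            if not any(len(v) > len(u) and v[:len(u)] == u for v in locators)]

def to_xml(locators):
    """Locators -> nested elements (single root assumed, no content)."""
    nodes, root = {}, None
    for u in locators:
        parent = None
        for tag, node_id in u:
            if node_id not in nodes:
                if parent is None:
                    nodes[node_id] = root = ET.Element(tag)
                else:
                    nodes[node_id] = ET.SubElement(parent, tag)
            parent = nodes[node_id]
    return root

def show(u):
    return ".".join(f"{tag}#{node_id}" for tag, node_id in u)

DOC = """<bookstore><fiction><sci-fi><book>
<isbn>0006482805</isbn><title>Do androids dream of electric sheep</title>
<author>Philip K. Dick</author></book></sci-fi></fiction></bookstore>"""

locs = to_locators(DOC)
leaves = clean_up(locs)
rebuilt = to_xml(leaves)
for u in locs:
    print(show(u))
```

The printed locators reproduce the first seven lines of the bookstore listing; after clean-up only the three leaf locators remain, and they are enough to regenerate the element nesting.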

Querying XML with Locator Semantics                                       Slide 12
             Locator Sets vs. Relations


           Commonalities
            • flat sets
            • identity defined by identity of components
            • concatenation to derive new locators/tuples
           Differences
            • arity
                  • locators: variable length
                  • tuples: fixed
            • access to components:
                  • locators: by navigation
                  • tuples: by position/attribute
            • data:
                  • locator components: document nodes
                  • tuple components: values




Querying XML with Locator Semantics                         Slide 13
             Locator Algebra (0)


               Operator        Relational Algebra      Locator Algebra

               ∪, ∩, −         On tuple sets           On locator sets

               Select          Selects tuples with a   Selects locators with a predicate
                               predicate

               Project         By absolute             Not available, implicit projection by
                               component selection     dependent join

               Cross Product   Concatenate each        Dependent join concatenating locators
                               tuple in one set with   from a context set with locators from
                               each tuple in another   dependent set
                               set

               Theta-Join      Combination of cross    Combination of dependent join, select,
                               product with select     and variable binding

               Tree-Operators  Not applicable          DOM-methods




Querying XML with Locator Semantics                                                             Slide 14
             Locator Algebra (1)


           Preliminaries
            • L: domain of locator sets
                  • x, y ∈ L
            • PL: domain of locators
                  • u, v ∈ PL
            • tail(u) … last component of u
              prefix(u) … u - tail(u)
           Tree-Operators
            • navigation in document tree using DOM methods
            • root, parent, children: PL → L
            • applied to locator sets from L using d-join (see below)
           Set-Operators
            • ∪, ∩, −: L × L → L
              defined as usual
            • order preservation due to total ordering on document nodes
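
A minimal sketch of the set operators, assuming locators are modelled as tuples of (name, id) pairs and that the ids were assigned in document order. The helper names are hypothetical. Sorting by the id sequence is one way to realize the order preservation mentioned above.

```python
def doc_order(u):
    """Sort key: the id sequence of a locator (ids follow document order)."""
    return [node_id for _name, node_id in u]

def union(x, y):
    return sorted(set(x) | set(y), key=doc_order)

def intersect(x, y):
    return sorted(set(x) & set(y), key=doc_order)

def minus(x, y):
    return sorted(set(x) - set(y), key=doc_order)

book = (("bookstore", 1), ("fiction", 2), ("sci-fi", 3), ("book", 4))
titles = [book + (("title", 6),)]
authors = [book + (("author", 7),)]

both = union(authors, titles)   # title comes first, whatever the argument order
print(both)
```

Because the result is always re-sorted, the argument order of union does not matter, which matches the remark that union preserves document order (title before author).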


Querying XML with Locator Semantics                                     Slide 15
             Locator Algebra (2)


           Select
            • select[p]: L → L, where p: PL → Boolean
              select[p](x) = {u | u ∈ x, p(tail(u))}
            • Example: select[nodename(.) = "book"](x) =
              select["book"](x)
           Return
            • Corresponds to project:
              duplicates tail of locator for preserving it in
              subsequent d-join (see below)
            • return: PL → PL
              return(u) = concat(u, tail(u))
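
The two definitions above translate almost literally into code. This is a minimal sketch, assuming locators are tuples of (name, id) pairs and locator sets are lists; ret stands in for return, which is a reserved word in Python.

```python
def tail(u):
    """Last component of a locator."""
    return u[-1]

def prefix(u):
    """Everything but the last component."""
    return u[:-1]

def select(p, x):
    """select[p](x): keep the locators whose tail satisfies predicate p."""
    return [u for u in x if p(tail(u))]

def ret(u):
    """return(u) = concat(u, tail(u)): duplicate the tail of a locator."""
    return u + (tail(u),)

x = [
    (("bookstore", 1), ("fiction", 2)),
    (("bookstore", 1), ("fiction", 2), ("sci-fi", 3), ("book", 4)),
    (("bookstore", 1), ("fiction", 2), ("fantasy", 8), ("mystery", 9), ("book", 10)),
]

books = select(lambda node: node[0] == "book", x)
kept = [ret(u) for u in books]
print(kept[0])
```

After ret, removing the tail of a locator gives back the original locator, which is what lets a subsequent d-join replace the tail without losing the node.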




Querying XML with Locator Semantics                             Slide 16
             Locator Algebra (3)

       Dependent-Join:
         • d-join[f]: L → L, where f: PL → L
           d-join[f](x) = ⋃_{u ∈ x} concat(prefix(u), f(tail(u)))
         • Example: return all titles of books in their book context
           select["title"](d-join[children(.)]
                             (select["book"](d-join[return(children(.))](x)))) =
           /book?/title
       Kleene Star:
         • fixpoint-operator for recursive descent queries
         • *[f]: L → L, where f: L → L
           *[f](x) = f(x) ∪ *[f](f(x))
         • Example: select all titles in their original context
           select["title"](d-join[children(.)]
                             (*[d-join[return(children(.))](.)](x))) =
           //*?/title
         • maybe too general for physical algebra
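
A minimal sketch of d-join and the Kleene star on a toy tree. Two readings are assumptions of this sketch, not statements from the slides: return(children(.)) is taken to mean "apply return to each child locator", and the start set is return(root), so that the root stays in the context. The CHILDREN table and all function names are hypothetical.

```python
# Toy document tree: node -> child nodes; a node is a (name, id) pair.
CHILDREN = {
    ("bookstore", 1): [("fiction", 2)],
    ("fiction", 2): [("sci-fi", 3)],
    ("sci-fi", 3): [("book", 4)],
    ("book", 4): [("isbn", 5), ("title", 6), ("author", 7)],
}

def tail(u): return u[-1]
def prefix(u): return u[:-1]
def ret(u): return u + (tail(u),)

def children(node):
    """children: PL -> L, one single-component locator per child."""
    return [(child,) for child in CHILDREN.get(node, [])]

def return_children(node):
    """Assumed reading of return(children(.)): duplicate each child."""
    return [ret(v) for v in children(node)]

def d_join(f, x):
    """d-join[f](x): union over u in x of concat(prefix(u), f(tail(u)))."""
    out = []
    for u in x:
        for v in f(tail(u)):
            w = prefix(u) + v
            if w not in out:
                out.append(w)
    return out

def select(name, x):
    return [u for u in x if tail(u)[0] == name]

def star(f, x):
    """*[f](x) = f(x) U *[f](f(x)); stops once f yields nothing new to expand."""
    step = f(x)
    if not step:
        return []
    return step + [u for u in star(f, step) if u not in step]

x = [ret((("bookstore", 1),))]                       # assumed start set
contexts = star(lambda s: d_join(return_children, s), x)
titles = select("title", d_join(children, contexts))
print(titles)
```

The recursion terminates here because the tree is finite; on a graph with cycles a visited set would be needed, which may be one reason the slide calls the operator too general for a physical algebra.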


Querying XML with Locator Semantics                                         Slide 17
             Locator Algebra (4)

      Varbind, Varget
        • to realize joins across contexts
        • varbind[i,f]: L → L, where i ∈ Name, f: PL → L
          varbind[i,f](x):
           for all u ∈ x: vars(u) := vars(u) ∪ ⋃_{v ∈ f(tail(u))} {<i,v>}
        • varget[i]: PL → L
          varget[i](u): {v | <i,v> ∈ vars(u)}
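
A minimal sketch of a join across contexts with varbind and varget, under several assumptions: bindings live in a dictionary keyed by locator, the boughtbooks level is flattened away, and the ids of the customers' isbn nodes (18, 19, 23, 24) are made up by continuing the document-order numbering. All names are hypothetical.

```python
# isbn nodes reachable from a book or customer node (boughtbooks flattened).
ISBNS = {
    ("book", 4): [("isbn", 5)],
    ("book", 10): [("isbn", 11)],
    ("customer", 15): [("isbn", 18), ("isbn", 19)],
    ("customer", 20): [("isbn", 23), ("isbn", 24)],
}
TEXT = {5: "0006482805", 11: "0261102362", 18: "0261102362",
        19: "0593488321", 23: "0006482805", 24: "0261102362"}

def tail(u): return u[-1]

def isbn_of(node):
    """f: PL -> L, the isbn nodes of a node as single-component locators."""
    return [(n,) for n in ISBNS.get(node, [])]

def text(v):
    return TEXT[tail(v)[1]]

def varbind(name, f, x, bindings):
    """vars(u) := vars(u) U {<name, v> | v in f(tail(u))} for all u in x."""
    for u in x:
        bindings.setdefault(u, set()).update((name, v) for v in f(tail(u)))
    return x

def varget(name, u, bindings):
    """{v | <name, v> in vars(u)}"""
    return {v for (n, v) in bindings.get(u, set()) if n == name}

books = [
    (("bookstore", 1), ("fiction", 2), ("sci-fi", 3), ("book", 4)),
    (("bookstore", 1), ("fiction", 2), ("fantasy", 8), ("mystery", 9), ("book", 10)),
]
customers = [
    (("customers", 14), ("customer", 15)),
    (("customers", 14), ("customer", 20)),
]

bindings = {}
varbind("$i", isbn_of, books, bindings)

joined = []
for u in books:
    wanted = {text(v) for v in varget("$i", u, bindings)}
    for c in customers:
        if wanted & {text(v) for v in isbn_of(tail(c))}:
            joined.append(u + c)          # book context followed by customer
print(joined)
```

The loop compares isbn text values rather than node identities, since the book's isbn node and the customer's isbn node are different document nodes with equal content.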




Querying XML with Locator Semantics                                 Slide 18
            Join Example (1)


 bc#0

 $A=*[d-join[return(children(.))](.)](x)=
    //*?
  bc#0.bookstore#1
  bc#0.bookstore#1.fiction#2
  bc#0.bookstore#1.fiction#2.sci-fi#3
  ...

 $B=select["book"](d-join[return(children(.))]($A))=
    //*?/book
  bc#0.bs#1.f#2.sf#3.b#4
  bc#0.bs#1.f#2.fa#8.mi#9.b#10
  ...

 $C=d-join[return(children(.))]($B)=//*?/book?/*
  bc#0.bs#1.f#2.sf#3.b#4.isbn#5
  bc#0.bs#1.f#2.sf#3.b#4.title#6
  ...

 $D=varbind[$i,select["isbn"](children(.))]($B)=
    //*?/book[$i:=isbn]?
  bc#0.bs#1.f#2.sf#3.b#4<$i,isbn#5>
  bc#0.bs#1.f#2.fa#8.mi#9.b#10<$i,isbn#11>
  ...

 $E=select["customer"](d-join[children(.)]
     (*[d-join[return(children(.))](.)](d-join[root(.)]($D))))
    =//*?/customer
  customers#14.customer#15
  customers#14.customer#20

 $F=d-join[select[
       select["isbn"](d-join[children(.)]
       (select["boughtbooks"](d-join[children(.)](.))))
       = varget[$i](.)]("$E")]($D)=
    //*?/book[$i:=isbn]?/
         (//*?/customer[boughtbooks/isbn=$i])
  bc#0.bs#1.f#2.sf#3.b#4.cs#14.customer#20
  bc#0.bs#1.f#2.fa#8.mi#9.b#10.cs#14.customer#15
  bc#0.bs#1.f#2.fa#8.mi#9.b#10.cs#14.customer#20


Querying XML with Locator Semantics                                                              Slide 19
              Join Example (2)

        <books_and_customers>
         <bookstore>
          <fiction>
           <sci-fi>
            <book>
             <isbn>0006482805</isbn>
             <title>Do androids dream of electric sheep</title>
             <author>Philip K. Dick</author>
             <customers>
              <customer>
               <name>P.W. Ellis</name>
               <boughtbooks>
                <isbn>0006482805</isbn>
                <isbn>0261102362</isbn>
               </boughtbooks>
              </customer>
             </customers>
            </book>
           </sci-fi>
           <fantasy>
            <mystery>
             <book>
              <isbn>0261102362</isbn>
              <title>The two towers</title>
              <author>JRR Tolkien</author>
              <customers>
               <customer>
                <name>Jason Woolsey</name>
                <boughtbooks>
                 <isbn>0261102362</isbn>
                 <isbn>0593488321</isbn>
                </boughtbooks>
               </customer>
               <customer>
                <name>P.W. Ellis</name>
                <boughtbooks>
                 <isbn>0006482805</isbn>
                 <isbn>0261102362</isbn>
                </boughtbooks>
               </customer>
              </customers>
             </book>
            </mystery>
           </fantasy>
          </fiction>
         </bookstore>
        </books_and_customers>
Querying XML with Locator Semantics                                                                     Slide 20
          Some Equivalence Transformations for L’Algebra


       Commutativity:
         • union(A,B) = union(B,A) (within single document)
         • but d-join is not commutative
       Associativity:
         • union, intersect, d-join
       Idempotence:
         • union(A,A) = A
       Distributivity:
         • //book/(title | author) = //book/title | //book/author
       Neutral Elements:
         • union: {}
         • d-join: $root(?)




Querying XML with Locator Semantics                                 Slide 21
             Open Issues

         Combination with relational algebra
         Graphs/Multidocuments
           • DAGs: Multiple paths from root-context to node (serialization?)
           • Role of URIs in locators?
         Typing
           • Role of XSD (XML Schema Definition)
           • Inference
         Constructors
           • attribute to element and vice versa….
           • Grouping, Skolems
         Details
           • Investigate conformance of locator concept to W3C Infoset
           • Constraints on locators/mappings to guarantee wellformedness
         Political
           • XQL-Implementations shipping:
             underlying semantics node-based, not locator-based

Querying XML with Locator Semantics                                       Slide 22
            The IPSI XML Brokering Framework


       [Architecture diagram, layers from top to bottom]

         Visualization: HTML, CSS
         XSL Processor (receives XML; sends URL + queries in XQL)
         Server (HTTP, URL) (receives queries in XQL; returns XML)
         Queryprocessor: XML Query Language (XQL) (program access via DOM)
         Datamodel: Document Object Model (W3C-DOM)
         Persistent DOM Warehouse
         Sources: HTTP/HTML Robot, Generic Wrappers, JEDI Framework, Specific Wrappers


Querying XML with Locator Semantics                                          Slide 23
             Wrappers


       Jedi Framework for Wrappers
         • Pivot Object Model
         • Scripting language for control-flow
         • Access to dynamic sources (ODBC, CORBA) with iterators
       Generic Wrappers
         • Generic Mapping of structured formats to XML
         • Examples: SGML,XML, HTML, MS-RTF
       Jedi Parser
         • for irregularly formatted sources
         • context free, attributed grammars
         • fault-tolerant, efficient parser: unlimited lookahead, interpretation
           of ambiguous, incomplete grammars by specificity ordering
       HTTP-Access
         • Access plans for delegation integrated with XQL Engine


Querying XML with Locator Semantics                                         Slide 24
             Mediator: XQL Engine + Persistent DOM



         XQL 98 Implementation
           • efficient recursive descent queries by signature-index
         + Joins
         + Multi Document Handling
           • extends XQL with external references (via http-get, http-post)
           • Multidocument DOM; for every node namespace and URI
         + User defined functions
           • input: context (reference-node-set, reference-node-pointer),
             parameters: constants, XQL-expressions (lazy evaluation)
           • output: node-functions, collection-functions (set of nodes),
             comparison-operators
             can attach base-URIs
           • variables



Querying XML with Locator Semantics                                         Slide 25
              Application 1: An XML Broker for Golfers

       Result of the XML Broker (rendered with XSL):

           <golfdemo>
              <golfplatz>
                <adresse> ... </adresse>
                <greenfee> ... </greenfee>
                ...
              </golfplatz>
              <wetter>    ... </wetter>
              <route>     ... </route>
           </golfdemo>

       Query -> XML Broker

       Sources:

           <golfplatz id="platz0001">
               <adresse>
               [...]
               </adresse>
               <policy>
                    ...
               </policy>
               <handicap>
                   <wochentag>34</wochentag>
                   <wochenende>34</wochenende>
               </handicap>
           </golfplatz>

           <www.wetter.de>
             <wetter>
               <plz>87724</plz>
               <datum>981001</datum>
               <temperatur>16</temperatur>
               <regen>90</regen>
               <wind>9</wind>
               <prognose>13</prognose>
             </wetter>
             <!-- ... -->
           </www.wetter.de>

           <www.reiseplanung.de>
             <route>
               <von>53757</von>
               <nach>93333</nach>
               <entfernung>481.9</entfernung>
               <fahrzeit>274</fahrzeit>
               <karte>5375793333.gif</karte>
             </route>
             <!-- ... -->
           </www.reiseplanung.de>




Querying XML with Locator Semantics                                                                 Slide 26
             Application 2: RELIMO Integrating
             Bioinformatics Data


       [Architecture diagram]

         Clients:
           • XML Application (e.g. Office 2000)
           • XML Browser (e.g. Mozilla 5)
           • XSL Formatter (e.g. Lotus-XSL)
         XML Broker
         Sources:
           • RELIBASE with XML RPC
           • PDB as local PDOM



Querying XML with Locator Semantics                                            Slide 27
             Application Data


       XML Broker for Golfers
         • Sources: www.golffuehrer.de (500 KB), www.wetter.de (200 KB),
           www.routen-information.de (200 KB)
         • Joins (via zip-code) ~ 2 to 3 secs
       RELIMO (Germany)
         • Sources: Relibase (XML-RPC), PDB (5 GB -> 25 MB XML, 30 MB
           PDOM)
         • response time (100 MB) 50 to 30000 ms
       MIROWEB (ESPRIT)
         • JEDI for importing several sources to Oracle 8
       Shakespeare
         • all plays
         • 10 MB (Tests with duplicated data up to 0.5 GB)




Querying XML with Locator Semantics                                  Slide 28
             Some Links & Acks


       XQL FAQ
         • http://metalab.unc.edu/xql/

       IPSI XML Research & Development
         • http://xml.darmstadt.gmd.de
         • XQL-Engine 1.0.1 download (non-commercial use)
         • JEDI download (non-commercial use)

       XML Brokering Framework Licensing Info (Infonyte)
         • hemmje@globit.com
         • www.infonyte.com

       Many thanks to
         • Karl Aberer, Harald Schöning, Guido Mörkotte



Querying XML with Locator Semantics                         Slide 29

						