Transforming XPath Queries for Bottom Up Query Processing

					     Transforming XPath Queries for
      Bottom-Up Query Processing

                      Yoshiharu Ishikawa
                         Takaaki Nagai
                       Hiroyuki Kitagawa
                     University of Tsukuba
             {ishikawa,kitagawa}@is.tsukuba.ac.jp

Sept. 27, 2002              ISDB’02
                     Presentation Overview
    Background
    Motivation and Our Approach
    The Proximal Nodes Model
    Query Translation
    Translation Example
    Related Work
    Conclusions and Future Work


                      Background
    XML: content-description language on the Web
    XPath
         pattern-based query language for XML
         extracts XML nodes based on the specified
          pattern
         has navigational semantics
         XSLT uses XPath for the node specification
         XQuery also uses XPath

                 XML Example
    <itemlist>
     <item category="audio equipment">
       <catalog-info>
         <type>CD player</type>
         <manufacturer>Star Electronics</manufacturer>
         <catalog-no>CDP-R55N</catalog-no>
       </catalog-info>
       <sales-info>
         <prod-year>2001</prod-year>
         <price>125.00</price>
       </sales-info>
     </item>
     ...
    </itemlist>
                           XPath Query
    Sample query Q: retrieve prices of CD
     players
       /itemlist/item[@category = "audio equipment"]
         [catalog-info/type = "CD player"]/sales-info/price

    XPath sentence
         contains location steps separated by "/"
         a location step has the format
          axis::node_test[predicate]...[predicate]
         location steps can be abbreviated
               e.g., /descendant-or-self::node()/child::foo → //foo, attribute::bar → @bar
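
To make the location steps of sample query Q concrete, here is a minimal Python sketch that evaluates Q step by step with the standard library's xml.etree.ElementTree. ElementTree supports only a subset of XPath, so the second predicate is checked in Python rather than inside the path string. The variable names are illustrative and do not come from the presentation.

```python
import xml.etree.ElementTree as ET

# The sample item list from the "XML Example" slide (elided items omitted).
DOC = """
<itemlist>
  <item category="audio equipment">
    <catalog-info>
      <type>CD player</type>
      <manufacturer>Star Electronics</manufacturer>
      <catalog-no>CDP-R55N</catalog-no>
    </catalog-info>
    <sales-info>
      <prod-year>2001</prod-year>
      <price>125.00</price>
    </sales-info>
  </item>
</itemlist>
"""

root = ET.fromstring(DOC)  # root is the <itemlist> element

# Steps /itemlist/item[@category = "audio equipment"]
items = root.findall("item[@category='audio equipment']")

# Predicate [catalog-info/type = "CD player"]: true if any such node matches
cd_players = [
    item for item in items
    if any(t.text == "CD player" for t in item.findall("catalog-info/type"))
]

# Steps /sales-info/price
prices = [p.text for item in cd_players for p in item.findall("sales-info/price")]
print(prices)  # ['125.00']
```

Each step narrows the node set produced by the previous one, starting from the root, which is the navigational (top-down) reading of an XPath expression.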

                      XPath Semantics
   XPath assumes top-down query processing
        Not efficient for large XML databases
        Bottom-up processing is better in some cases
      query: /article/authors[author = "Miller"]

   [Figure: the same document tree drawn twice, labeled "top-down" and "bottom-up". Each tree has an article root, two authors nodes, and four author nodes with the values "Smith", "Miller", "White", and "Chen"; the "Miller" leaf is marked in both.]
        Bottom-Up Query Processing
    We can process the example query when
         we can determine the specified leaf elements (i.e., "Miller") with the help of an index, and
         we can select the parent for a specific author node.
    We do not need to access all the authors/author elements
    [Figure: the article tree (article, authors, author; leaf values "Smith", "Miller", "White", "Chen") with the "Miller" leaf marked.]
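
The two conditions above can be sketched in Python as follows. This is an illustration only: the document is reconstructed from the figure (the grouping of authors under each authors node is an assumption), and the value index and parent map are plain dictionaries standing in for whatever index structures a real system would use.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed document, reconstructed from the figure on the slide.
DOC = """
<article>
  <authors><author>Smith</author><author>Miller</author></authors>
  <authors><author>White</author><author>Chen</author></authors>
</article>
"""
root = ET.fromstring(DOC)

# Built once, ahead of query time: a parent map and a value index.
parent = {child: node for node in root.iter() for child in node}
value_index = defaultdict(list)  # (tag, text) -> elements with that tag and text
for elem in root.iter():
    if elem.text and elem.text.strip():
        value_index[(elem.tag, elem.text.strip())].append(elem)


def authors_with(name):
    """Bottom-up evaluation of /article/authors[author = name]."""
    result = []
    for leaf in value_index[("author", name)]:  # index lookup, no tree scan
        authors = parent.get(leaf)
        if authors is None or authors.tag != "authors":
            continue
        # The authors node must be a child of the root <article> element.
        if parent.get(authors) is root and root.tag == "article":
            if authors not in result:
                result.append(authors)
    return result


hits = authors_with("Miller")
print([[a.text for a in h] for h in hits])  # [['Smith', 'Miller']]
```

At query time only the matching leaf and its ancestors are visited; the second authors node is never touched.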
          Our Objective and Approach
    Our Objective
         Efficient bottom-up processing of XPath queries
          with the help of index structures
    Our Approach
         Use of the proximal nodes model as the
          underlying retrieval model
               The model enables bottom-up query evaluation
         Development of transformation rules from XPath
          queries to proximal nodes expressions


      The Proximal Nodes Model (1)
    Proposed by Navarro and Baeza-Yates [7] as a
     structured document retrieval model
    Uses bottom-up query processing approach
    XML data can be treated as nested nodes:
         a node corresponds to an element or attribute in XML
         each node has an associated text region (called the
          segment): segments can take nested structure
    Expressive power and efficiency are well-balanced
         evaluation cost is almost O(n), where n is the number of nodes
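
As a rough illustration of nested segments, the sketch below assigns every element an interval so that a descendant's interval lies inside its ancestor's. It uses a traversal counter as a stand-in for real text offsets, which is an assumption made for brevity; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def assign_segments(root):
    """Return {element: (start, end)} such that descendants get nested intervals."""
    segments = {}
    counter = 0

    def visit(elem):
        nonlocal counter
        start = counter
        counter += 1
        for child in elem:
            visit(child)
        segments[elem] = (start, counter)
        counter += 1

    visit(root)
    return segments


def contains(outer, inner):
    """True if segment `inner` lies strictly inside segment `outer`."""
    return outer[0] < inner[0] and inner[1] < outer[1]


root = ET.fromstring(
    "<item><catalog-info><type>CD player</type></catalog-info><sales-info/></item>"
)
seg = assign_segments(root)
info, sales = root[0], root[1]
type_ = info[0]

print(contains(seg[root], seg[type_]))   # True: item encloses type
print(contains(seg[sales], seg[type_]))  # False: sales-info does not
```

With segments in place, structural relationships reduce to interval comparisons, which is what allows node sets to be combined without walking the tree.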



      The Proximal Nodes Model (2)
    The model consists of three components
    Text pattern matching language
         specifies pattern matching conditions
         implementation dependent
         returns a set of the matched nodes
         example: "ABC Corporation"
    Retrieval operators based on document structures
         returns a set of nodes for a given element or attribute
          name
         example: chapter, price
    Operators to integrate partial retrieval results
         calculates the result node set from the given node sets
         efficient computation based on segment relationships
         Proximal Nodes Operators
P in Q            the set of P nodes contained in one or more Q nodes
P with Q          the set of P nodes that contain one or more Q nodes
P child Q         the set of P nodes each of which is a child of a Q node
P parent Q        the set of P nodes each of which is a parent of a Q node
P + Q             the union of P and Q
P - Q             the difference of P and Q
P is Q            the intersection of P and Q
P same Q          the set of P nodes each of which is equal to a Q node

P and Q are sets of nodes with associated segments
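
A naive Python sketch of these operators is given below. It is not the algorithm of the proximal nodes model, which relies on more efficient segment-based computation; it only spells out the set semantics. The Node class, the depth field used to tell children from deeper descendants, and the reading of "equal" as "same segment" are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str   # element or attribute name
    start: int  # segment start
    end: int    # segment end
    depth: int  # nesting depth (assumed available; 1 = outermost)


def inside(p, q):
    """p's segment lies within q's segment and p is nested deeper than q."""
    return q.start <= p.start and p.end <= q.end and p.depth > q.depth


def op_in(P, Q):
    return {p for p in P if any(inside(p, q) for q in Q)}


def op_with(P, Q):
    return {p for p in P if any(inside(q, p) for q in Q)}


def op_child(P, Q):
    return {p for p in P if any(inside(p, q) and p.depth == q.depth + 1 for q in Q)}


def op_parent(P, Q):
    return {p for p in P if any(inside(q, p) and q.depth == p.depth + 1 for q in Q)}


def op_same(P, Q):
    return {p for p in P if any((p.start, p.end) == (q.start, q.end) for q in Q)}


# Union, difference, and intersection map to Python's |, -, and & on sets.

item = Node("item", 0, 100, 1)
info = Node("catalog-info", 5, 40, 2)
type_ = Node("type", 10, 19, 3)
price = Node("price", 60, 66, 3)

print(op_in({type_, price}, {info}) == {type_})  # True
print(op_child({type_}, {item}) == set())        # True: type is a grandchild
```

Each operator here costs O(|P|·|Q|); it is meant to be read, not to be fast.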

Example of Proximal Nodes Expression
    Example expression of proximal nodes model

          item with (type same "CD player")
    Query processing steps
         1. determine the node sets that correspond to the
          elements "item" and "type" using indexes
         2. determine the node set that corresponds to the pattern
          "CD player" using an index
         3. compute the result of "same" operator
         4. compute the result of "with" operator
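
The four steps can be traced in a few lines of Python. The two indexes below are hypothetical, hand-written dictionaries that map a name or a text pattern to a list of (start, end) segments; the segment values are made up for the example, and containment is tested on segments only.

```python
# Hypothetical prebuilt indexes: name or pattern -> list of (start, end) segments.
NAME_INDEX = {
    "item": [(0, 100), (100, 200)],
    "type": [(10, 19), (110, 115)],
}
TEXT_INDEX = {
    # The match coincides with the content of the first <type> element.
    "CD player": [(10, 19)],
}


def same(P, Q):
    """P nodes whose segment equals the segment of some Q node."""
    q_set = set(Q)
    return [p for p in P if p in q_set]


def with_(P, Q):
    """P nodes whose segment contains the segment of some Q node."""
    return [p for p in P if any(p[0] <= q[0] and q[1] <= p[1] for q in Q)]


items = NAME_INDEX["item"]         # step 1
types = NAME_INDEX["type"]         # step 1
matches = TEXT_INDEX["CD player"]  # step 2
cd_types = same(types, matches)    # step 3
answer = with_(items, cd_types)    # step 4
print(answer)  # [(0, 100)]
```

Only index lookups and segment comparisons are involved; the document tree itself is never traversed.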


                     Translation Rules (1)
    Supports major XPath patterns
    Based on the XPath semantic description by
     Wadler [10]
    Use of denotational semantics




                 Translation Rules (2)




                 Translation Rules (3)




                 Auxiliary Functions




    Simplification Using the Knowledge
          of Document Structure
    If we know the DTD of the target XML, we
     can derive simpler translation results




                     Translation Example
   Original query Q
        /itemlist/item[@category = "audio equipment"]
          [catalog-info/type = "CD player"]/sales-info/price
   Translation result:
       t1 = item with (item with (category same "audio equipment"))
       t2 = catalog-info child t1
       t3 = t1 with (t1 with (((type child t2) child t2) same "CD player"))
       t4 = sales-info child t3
       ans = (((price child t4) child t4) child t3) child itemlist


      Simplification of Query Plan (1)
    The translated result contains multiple
     applications of the same operator
    We can delete redundant operators
     considering the operator semantics
    Example:
       t1 = item with (item with (category same "audio equipment"))
          → item with (category same "audio equipment")
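
One way to picture this kind of cleanup is as a rewrite over an expression tree. The sketch below is an assumption-laden illustration, not the presentation's procedure: the Name and Op classes are hypothetical, the "with" rule is the one shown in the example above, and the "child" rule is inferred from the repeated "child t2" in the translated t3.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Name:
    value: str  # element or attribute name, or a quoted text pattern


@dataclass(frozen=True)
class Op:
    op: str  # "with", "child", "same", ...
    left: "Expr"
    right: "Expr"


Expr = Union[Name, Op]


def simplify(e: Expr) -> Expr:
    """Rewrite P with (P with X) to P with X, and (A child B) child B to A child B."""
    if isinstance(e, Name):
        return e
    left, right = simplify(e.left), simplify(e.right)
    if e.op == "with" and isinstance(right, Op) and right.op == "with" and right.left == left:
        return right
    if e.op == "child" and isinstance(left, Op) and left.op == "child" and left.right == right:
        return left
    return Op(e.op, left, right)


def show(e: Expr) -> str:
    """Render an expression with explicit parentheses."""
    if isinstance(e, Name):
        return e.value
    return f"({show(e.left)} {e.op} {show(e.right)})"


cond = Op("same", Name("category"), Name('"audio equipment"'))
t1 = Op("with", Name("item"), Op("with", Name("item"), cond))
print(show(simplify(t1)))  # (item with (category same "audio equipment"))
```

Because simplification runs on subexpressions first, nested repetitions collapse in a single pass.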


      Simplification of Query Plan (2)
   If we can use the DTD information, we can
    further simplify the expressions
   Example:
       t3  = t1 with ((type child (catalog-info child t1)) same "CD player")
          → t1 with ((type in t1) same "CD player")
   Simplified query plan for query Q
       t1  = item with (category same "audio equipment")
       ans = price in (t1 with ((type in t1) same "CD player"))

                     Related Work
    Translation of XQL queries into proximal nodes expressions (Baeza-Yates & Navarro [2])
    Rewriting techniques for XQL queries (Wood [13])
    Use of document structure for query optimization [3,11,12,13]
    Optimization of regular path expressions in the context of semistructured DBs [4,8]
        Conclusions and Future Work
    Conclusions
         Bottom-up processing approach for XPath
          queries
         Support of major XPath query patterns
         Translation to proximal nodes expressions
         Simplification and optimization techniques
    Future work
         Support of more complete XPath semantics
         Application of hybrid approach (top-down and
          bottom-up)
