Docstoc

XML

Document Sample
XML Powered By Docstoc
					                                               XML and RDBMSs




                      XML in RDBMSs




        Qiuyue Wang
Master Informatique      XML Data Management                    1
                                                XML and RDBMSs


                      Why Bother?
• Most (structured) data stored in RDBMS
     – Seamless integration of XML and rel. Data
• Exploit capabilities of RDBMS
     – Scalability, Availability, Performance(?), ...
     – Many people with knowledge around
• XML-RDBMS Applications
     – Legal documents, decision support on XML
• When not to use RDBMS
     – Streaming data (RSS, SOAP), IR (Google)

        Qiuyue Wang
Master Informatique       XML Data Management                    2
                                                 XML and RDBMSs


                      Approaches
• Store XML as a CLOB/BLOB in relational database
• Shredding into relations
     – Covered in the previous lecture
     – Manual/middleware based approaches
• SQL/XML (now part of commercial products)
     – Extend SQL with XML data type
     – Plus: integrates well with relational data
     – Minus: not clear how it integrates with application, odd
       „marriage“



        Qiuyue Wang
Master Informatique        XML Data Management                    3
                                                       XML and RDBMSs


                  Overview of SQL/XML
• Goal: standardization of interaction/integration of
  SQL and XML
     – Store XML Data in Relational Databases
     – Publish relational data as XML
     – Query XML data in RDBMS using XQuery
• Part 14 of the SQL standard
     – First edition in SQL:2003
           • Limited functionality: storage and publishing
     – Second edition in SQL:2006
           • More complete integration of XQuery + XQuery Data Model
           • Transforming XML data into relational (table) format
           • Validating an XML value according to some XML Schema

        Qiuyue Wang
Master Informatique              XML Data Management                    4
                                                                  XML and RDBMSs


                  XML Data Type in SQL
• A new data type (like varchar, date, numeric)
     – No comparison operators defined
     – Can have optimized internal representation (different from character string)
• In SQL:2003, XML values can be
     – An XML document
     – The content of an XML element (“well-formed external parsed entity”)
     – Null value
• In SQL:2006, XML values can be
     – Any legal value of the XQuery Data Model
     – Null value
           • XQuery empty sequence is different from null value
      Proper superset of XML values in SQL/XML:2003
     – Additional, optional type modifiers can restrict possible XML values
           • XML(SEQUENCE), XML(CONTENT(ANY)), XML(DOCUMENT(UNTYPED)), …



        Qiuyue Wang
Master Informatique                  XML Data Management                           5
                                                  XML and RDBMSs


                      XML Data Type
CREATE TABLE employees
 ( id CHAR(6),
   lastname VARCHAR (30),
   ...,
   resume XML
   );




        Qiuyue Wang
Master Informatique         XML Data Management                    6
                                                     XML and RDBMSs


          XMLPARSE & XMLSERIALIZE
• SQL functions to convert from/to character strings and BLOBs
• Examples:
     INSERT INTO employees VALUES (‘123456’, ‘Smith’, …, XMLPARSE(DOCUMENT
        ‘<?xml version="1.0"?> <resume
        xmlns="http://www.res.com/resume"><name> … </name><address>
        …</address>...</resume>’ PRESERVE WHITESPACE) );

     SELECT e.id, XMLSERIALIZE(DOCUMENT e.resume AS VARCHAR (2000)) AS
        resume
     FROM employees AS e
     WHERE e.id = ‘123456’;




        Qiuyue Wang
Master Informatique            XML Data Management                           7
                                                       XML and RDBMSs


                      Publishing Functions
• SQL functions/operators for generating XML constructs (elements,
  attributes, ...) within a query
     –   XMLELEMENT generates an XQuery element node
     –   XMLCONCAT concatenates XML values
     –   XMLFOREST generates multiple XQuery element nodes
     –   XMLAGG aggregates XML values across multiple relational tuples
     –   XMLCOMMENT generates an XQuery comment node
     –   XMLPI generates an XQuery processing instruction node
     –   XMLCAST converts SQL to XML values and vice versa
     –   XMLQUERY evaluates an XQuery expression
     –   XMLVALIDATE validates a given XML value according to some XML Schema
     –   XMLDOCUMENT wraps an XQuery document node around an XML value
     –   XMLTEXT generates an XQuery text node



        Qiuyue Wang
Master Informatique              XML Data Management                            8
                                                         XML and RDBMSs


                           XMLELEMENT
• Produces an XML value that corresponds to an XML element,
  given:
     – An SQL identifier that acts as its name
     – An optional list of namespace declarations
     – An optional list of named expressions that provides names and values
       of its attributes
     – An optional list of expressions that provides its content
     – An option how to handle NULL content:
           •   EMPTY ON NULL (default since SQL/XML:2003 behavior)
           •   NULL ON NULL
           •   ABSENT ON NULL
           •   NIL ON NULL
           •   NIL ON NO CONTENT


        Qiuyue Wang
Master Informatique                XML Data Management                    9
                                                   XML and RDBMSs


                      XMLELEMENT
• XMLELEMENT can produce simple element structures:
     SELECT e.id, XMLELEMENT (NAME "Emp", e.fname || ' ' || e.lname) AS
       "result“
     FROM employees e
     WHERE ... ;




        Qiuyue Wang
Master Informatique          XML Data Management                      10
                                                       XML and RDBMSs


                        XMLELEMENT
• XMLELEMENT can produce nested elements (with mixed content):
     SELECT e.id, XMLELEMENT (NAME "Emp",
                       'Employee ',
                       XMLELEMENT (NAME "name", e.lname ),
                      ' was hired on ',
                      XMLELEMENT (NAME "hiredate", e.hire )
                  ) AS "result“
     FROM employees e;




        Qiuyue Wang
Master Informatique             XML Data Management                     11
                                                       XML and RDBMSs


                        XMLELEMENT
• XMLELEMENT can take subqueries as arguments:
     SELECT e.id, XMLELEMENT (NAME "Emp",
                       XMLELEMENT (NAME "name", e.lname ),
                       XMLELEMENT (NAME "dependants",
                                   (SELECT COUNT (*)
                                    FROM dependants d
                                    WHERE d.parent = e.id))
                   ) AS "result“
     FROM employees e
     WHERE … ;




        Qiuyue Wang
Master Informatique             XML Data Management                     12
                                                       XML and RDBMSs

                  XMLATTRIBUTES (within
                     XMLELEMENT)
• Attribute specifications must be bracketed by XMLATTRIBUTES keyword
  and must appear directly after element name and optional namespace
  declaration.
• Each attribute can be named implicitly or explicitly.
     SELECT e.id, XMLELEMENT (NAME "Emp",
                      XMLATTRIBUTES (e.id, e.lname AS "name")
                  ) AS "result"
     FROM employees e
     WHERE … ;




        Qiuyue Wang
Master Informatique             XML Data Management                     13
                                                     XML and RDBMSs

                XMLNAMESPACES (within
                    XMLELEMENT)
• Namespace declarations are bracketed by XMLNAMESPACES keyword and
  must appear directly after element name.
     SELECT empno,
             XMLELEMENT(NAME "admi:employee",
                 XMLNAMESPACES(’http://www.admi.com’ AS "admi"),
                 XMLATTRIBUTES(e.workdept AS "admi:department"),
                 e.lastname
             ) AS "result"
     FROM employees e
     WHERE e. job = ’ANALYST’;




        Qiuyue Wang
Master Informatique            XML Data Management                    14
                                                      XML and RDBMSs


                         XMLCONCAT
• Produces an XML value given two or more expressions of XML type.
• If any of the arguments evaluate to the null value, it is ignored.
     SELECT e.id, XMLCONCAT (XMLELEMENT (NAME "first", e.fname),
                                XMLELEMENT ( NAME "last", e.lname)
                  ) AS "result"
     FROM employees e;




        Qiuyue Wang
Master Informatique             XML Data Management                    15
                                                       XML and RDBMSs


                          XMLFOREST
• Produces a sequence of XML elements given named
  expressions as arguments. Arguments can also contain a list
  of namespace declarations and an option how to handle NULL
  content:
     –   NULL ON NULL (default since SQL/XML:2003 behavior)
     –   EMPTY ON NULL
     –   ABSENT ON NULL
     –   NIL ON NULL
     –   NIL ON NO CONTENT
• Element can have an explicit name:
     – e.salary AS "empSalary"
• Element can have an implicit name, if the expression is a
  column reference:
     – e.salary
        Qiuyue Wang
Master Informatique              XML Data Management                    16
                                                    XML and RDBMSs


                  XMLFOREST - Example
SELECT e.id, XMLFOREST (e.hire, e.dept AS "department") AS "result"
FROM employees e
WHERE ... ;




        Qiuyue Wang
Master Informatique           XML Data Management                     17
                                                     XML and RDBMSs


                             XMLAGG
• An aggregate function, similar to SUM, AVG, etc.
     – The argument for XMLAGG must be an expression of XML type.
• Semantics
     – For each row in a group G, the expression is evaluated and the
       resulting XML values are concatenated to produce a single XML value
       as the result for G.
     – An ORDER BY clause can be specified to order the results of the
       argument expression before concatenating.
     – All null values are dropped before concatenating.
     – If all inputs to concatenation are null or if the group is empty, the
       result is the null value.




        Qiuyue Wang
Master Informatique            XML Data Management                        18
                                                   XML and RDBMSs


                      XMLAGG - Example
SELECT XMLELEMENT ( NAME "Department",
            XMLATTRIBUTES ( e.dept AS "name" ),
            XMLAGG (XMLELEMENT (NAME "emp", e.lname))
        ) AS "dept_list",
        COUNT(*) AS "dept_count"
FROM employees e
GROUP BY dept ;




        Qiuyue Wang
Master Informatique          XML Data Management                    19
                                                   XML and RDBMSs


               XMLAGG and ORDER BY
SELECT XMLELEMENT ( NAME "Department",
            XMLATTRIBUTES ( e.dept AS "name" ),
            XMLAGG (XMLELEMENT (NAME "emp", e.lname)
                       ORDER BY e.lname)
        ) AS "dept_list", COUNT(*) AS "dept_count"
FROM employees e
GROUP BY dept ;




        Qiuyue Wang
Master Informatique          XML Data Management                    20
                                                                  XML and RDBMSs


              XMLCOMMENT & XMLPI
• XMLCOMMENT
     – Creates an XQuery comment node (XML comment)
     – One mandatory character string input argument

     XMLCOMMENT (‘This is a comment’)
      <!--This is a comment-->
• XMLPI
     – Creates an XQuery processing instruction node
     – Two input arguments:
           • One mandatory name for the PI (an SQL identifier); a.k.a. the PI target
           • One optional character string input argument; defines the content of the PI target

     XMLPI (NAME “includeFile”, ‘/POs/template.hls’)
      <?includeFile /POs/template.hls?>


        Qiuyue Wang
Master Informatique                   XML Data Management                                    21
                                                                  XML and RDBMSs


                                   XMLTEXT
• Creates an XQuery text node
• One mandatory character string
input argument
• Example:


SELECT XMLELEMENT (NAME "p",
          XMLAGG (XMLCONCAT
                   (XMLText (T.Text),
                    XMLELEMENT (NAME "em", T.Emphasized_text) )
              ORDER BY T.Seqno ) ) AS “result”
FROM T




        Qiuyue Wang
Master Informatique                   XML Data Management                          22
                                               XML and RDBMSs


                      XMLDOCUMENT
• Wraps an XQuery document node around an
  XML value
• According to the rules of XQuery’s document
  constructor
• Useful for converting an arbitrary XQuery
  sequence into an XML document or a “well-
  formed external parsed entity”



        Qiuyue Wang
Master Informatique      XML Data Management                    23
                                                  XML and RDBMSs


                        XMLQUERY
• Evaluates an XQuery or XPath expression
     – Provided as a character string literal
• Allows for optional arguments to be passed in
     – Zero or more named arguments
     – At most one unnamed argument can be passed in as the
       XQuery context item
     – Arguments can be of any predefined SQL data type incl.
       XML
     – Non-XML arguments will be implicitly converted using
       XMLCAST
• Returns a sequence of XQuery nodes

        Qiuyue Wang
Master Informatique         XML Data Management                    24
                                                    XML and RDBMSs


                 XMLQUERY - Examples
SELECT XMLQUERY(‘<e>hi</e>’ RETURNING SEQUENCE BY REF) AS “result”
FROM T




SELECT XMLQUERY(‘for $i in $po/purchaseOrder/customer
                 where $i/zip[@type=“US”]=“95141” return $i/name’
          PASSING BY REF po AS “po”
          RETURNING SEQUENCE) AS “Name_elements”
FROM POrders




        Qiuyue Wang
Master Informatique           XML Data Management                    25
                                                                      XML and RDBMSs


                                      XMLCAST
•   Convert an SQL value into an XML value
     – Values of SQL predefined types are cast to XQuery atomic types using
           • The defined mapping of SQL types/values to XML Schema types/values
           • The semantics of XQuery’s cast expression
     – XMLCAST(NULL AS XML) returns the SQL null value typed as XML
•   Convert an XML value into an SQL value
     – XML values are converted to values of SQL predefined types using a combination of
           •   The defined mapping of SQL types to XML Schema types
           •   XQuery’s fn:data function
           •   XQuery’s cast expression
           •   SQL’s CAST specification
     – An XML value that is the empty sequence is converted to a NULL value of the specified
       SQL data type
•   Note: XMLCAST to/from character strings is different from XMLSERIALIZE and
    XMLPARSE



        Qiuyue Wang
Master Informatique                      XML Data Management                               26
                                                   XML and RDBMSs


                      XMLCAST - Example
SELECT XMLCAST (
           XMLQUERY (‘<e>hi</e>’ RETURNING SEQUENCE BY REF)
           AS VARCHAR(20) ) AS “result”
FROM T




        Qiuyue Wang
Master Informatique          XML Data Management                    27
                                                      XML and RDBMSs


                            XMLTABLE
• Transforming XML data into table format
• Allows for optional arguments to be passed in
     – Just like XMLQUERY
• Element/attribute values are mapped to column values using
  path expressions (PATH) – the “column pattern”
     – Provided as a character string literal
• Names and SQL data types for extracted values/columns need
  to be specified
• Default values for “missing” columns can be provided
• ORDINALITY column can be generated
     – Contains a sequential number of the corresponding XQuery item in
       the XQuery sequence (result of the row pattern)

        Qiuyue Wang
Master Informatique             XML Data Management                       28
                                                  XML and RDBMSs


                      XMLTABLE - Example
SELECT X.*
FROM POrders P,
      XMLTABLE (‘$po//customer’ PASSING P.po AS “po”
                COLUMNS
                “#num” FOR ORDINALITY,
                “CID” INTEGER PATH ‘@id’,
                “Name” VARCHAR(30) PATH ‘name’,
                “ZipType” CHAR(2) PATH ‘zip/@type’,
                “Zip” VARCHAR(6) PATH ‘zip’
      ) AS “X”




        Qiuyue Wang
Master Informatique         XML Data Management                    29
                                                     XML and RDBMSs


         SQL Predicates on XML Type
• IS DOCUMENT
     – Checks whether an XML value conforms to the definition of a well-
       formed XML document
• IS CONTENT
     – Checks whether an XML value conforms to the definition of either a
       well-formed XML document or a well-formed external parsed entity
• IS VALID
     – Checks whether an XML value is valid according to a given XML
       Schema
     – Does not validate/modify the XML value; i.e., no default values are
       supplied.
• XMLEXISTS
     – Checks whether the result of an XQuery expression (an XQuery
       sequence) contains at least one XQuery item



        Qiuyue Wang
Master Informatique            XML Data Management                           30
                                                  XML and RDBMSs


                      XMLEXISTS - Example
SELECT id, bib
FROM bibliographies
WHERE XMLEXISTS(‘$doc/bib/book/@price’ PASSING
  bib AS “doc”);




        Qiuyue Wang
Master Informatique         XML Data Management                    31
                                                  XML and RDBMSs


                      Relation  XML

                XML view                         relational view

                  XMLELEMENT,
                  …           XMLTABLE,
         XMLQUERY                                        SQL
                              …



                XML data                         relational data
                            default view




        Qiuyue Wang
Master Informatique        XML Data Management                     32
                                              XML and RDBMSs


                  Summary of SQL/XML
   Function                  Functions/Operators/Keywords
   String  XML              XMLPARSE
   XML  String              XMLSERIALIZE
   Relation  XML            XMLELEMENT (XMLNAMESPACE ,
                             XMLATTRIBUTES), XMLFOREST,
                             XMLCONCAT, XMLAGG,
                             XMLCOMMENT, XMLPI, XMLTEXT,
                             XMLDOCUMENT

   XML  XML                 XMLQUERY
   XML  Relation            XMLTABLE
   XML  Boolean             XMLEXISTS

        Qiuyue Wang
Master Informatique     XML Data Management                    33
                                             XML and RDBMSs




                 IBM DB2 v9.5 pureXML




        Qiuyue Wang
Master Informatique    XML Data Management                    34
                                                     XML and RDBMSs


                        System Architecture




K. Beyer, et al., “System RX: One Part Relational, One Part XML”, SIGMOD 2005
          Qiuyue Wang
  Master Informatique          XML Data Management                     35
                                                XML and RDBMSs


      Native XML Storage Requirements
• Must support direct access and subdocument
  updates
• Documents need to span pages
• Must support XQuery’s node reference semantics
• Must support rollback and recovery
• Need to reuse of existing relational infrastructure
     – Table spaces, buffer pools, lock manager and log manager
       used without modification
     – Reuse of transactions, concurrency, scalability and
       recoverability simplified implementation and is essential
       for coexistence with relational data

        Qiuyue Wang
Master Informatique       XML Data Management                    36
                                                       XML and RDBMSs


          Native XML Store Highlights
• XML is stored as instances of XQuery Data Model in a structured and type
  annotated tree
     – Every node contains pointers to its children and parent
     – Path expressions are directly executed over the native store
     – Each node has a unique identifier
     – A persistent dictionary maps strings to identifiers
     – Regions of nodes for a given XML document are grouped together on pages
       and linked by a logical regions index
     – Direct access to nodes using node identifiers and the regions index




        Qiuyue Wang
Master Informatique             XML Data Management                          37
                                            XML and RDBMSs


        Efficient Document Tree Storage




        Qiuyue Wang
Master Informatique   XML Data Management                    38
                                            XML and RDBMSs


           Information for Every Node
• Tag name (encoded as unique StringID)
• A nodeID (Dewey node identifier)
• Node kind (e.g. element, attribute, etc.)
• Namespace / Namespace prefix
• Type annotation
• Pointer to parent
• Array of child pointers
• Hints to the kind & name of child nodes (for early-
  out navigation)
• For text/attribute nodes: the data itself

        Qiuyue Wang
Master Informatique   XML Data Management                    39
                                                XML and RDBMSs


             XML Node Storage Layout
• Node hierarchy of an XML document stored on DB2 pages
• Documents that don’t fit on 1 page: split into regions / pages
• Docs < 1 page: 1 region, multiple docs/regions per page




        Qiuyue Wang
Master Informatique       XML Data Management                    40
                                              XML and RDBMSs


       XML Storage: “Regions Index”
• not user defined, default component of XML storage layer

                                              • maps nodeIDs   to
                                              regions & pages
                                              • allows to fetch
                                              required regions
                                              instead of full
                                              documents
                                              • allows intelligent
                                              prefetching




        Qiuyue Wang
Master Informatique     XML Data Management                      41
                                                            XML and RDBMSs


                            XML Indexes
• Define 0, 1 or multiple XML Value Indexes per XML column
• Can index all elements/attributes, but not forced to do so
• Index definition uses an XML pattern to specify which elements/attributes
  to index (and which not to)
     – XML pattern = XPath without predicates, only child axis, attribute axis, self
       axis, descendent axis and descendent-or-self axis
• XML index maps: (pathID, value) → (nodeID, rowID)
• Index any elements or attributes, incl. mixed content
• E.g.
     – CREATE UNIQUE INDEX idx2 ON dept(deptdoc) USING XMLPATTERN
       '/dept/employee/@id' AS double;
     – CREATE INDEX idx3 ON dept(deptdoc) USING XMPPATTERN
       '/dept/employee/name' AS varchar(35);



        Qiuyue Wang
Master Informatique                XML Data Management                                 42
                                                     XML and RDBMSs


                      XML Index Selection
• Given an XPath query Q and an XPath Index I
     – Determine if the index is applicable
     – Build index pushdown QI : Q = QC (QI)
• Index I can be used to answer query Q only if Q  I, i.e. every
  node returned by Q is also returned by V.
• XPath containment problem
     – XPath{[ ],*,//] subset is NP Complete
• A. Balmin, F. Özcan, K. Beyer, R. Cochrane, H. Pirahesh, “A
  Framework for Using Materialized XPath Views in XML Query
  Processing”, VLDB 2004



        Qiuyue Wang
Master Informatique            XML Data Management                    43
                                                XML and RDBMSs


       When Indexes cannot be Used
• When path expressions are in let bindings or return
  clause, indexes cannot be used
     for $i in db2-fn:xmlcolumn('T.XMLCOL')/a/b
     return <Customer> {$i[@c = 10]} </Customer>
• Outerjoin semantics prevents index usage
• Need to return correct number of empty
  <Customer> elements
• Indexes will skip nodes
• A. Balmin, K. Beyer, F. Özcan, “On the Path to
  Efficient XML Queries”, VLDB 2006
        Qiuyue Wang
Master Informatique       XML Data Management                    44
                                                    XML and RDBMSs


            Querying XML Data in DB2
• XQuery/XPath as a stand-alone language
     – for $d in db2-fn:xmlcolumn(‘dept.deptdoc’)/dept
       let $emp := $d//employee/name
       return <EmpList> {$d/@bldg, $emp} </EmpList>
• SQL embedded in XQuery
     – for $d in db2-fn:sqlquery('select deptdoc from dept where deptID =
       “PR27” ')…
     – for $d in db2-fn:xmlcolumn(‘dept.deptdoc’)/dept, $e in db2-
       fn:sqlquery('select xmlforest(name, desc) from unit u’)…
• XQuery/XPath embedded in SQL/XML
     – xmlexists, xmlquery, xmltable
• Plain SQL for full-document retrieval

        Qiuyue Wang
Master Informatique           XML Data Management                           45
                                                      XML and RDBMSs


                 Hybrid Query Compiler
• Two major design decisions
     – No static typing to support dynamic, changing schemas
     – No XPath normalization into explicit FLWR blocks
           • Steps are not broken into selections, as in XPERANTO
     – Path expressions consisting of solely navigational steps are
       expressed as an atomic unit
• Built on extensibility features of Starburst




        Qiuyue Wang
Master Informatique             XML Data Management                    46
                                            XML and RDBMSs




        Qiuyue Wang
Master Informatique   XML Data Management                    47
                                                 XML and RDBMSs


                      Semantic Modeling
• XQuery is represented via an internal Query Graph
  Model (QGM)
• QGM
     – Is a semantic network used to represent the data flow in a
       query
     – Consists of operations and arcs, representing data flow
       between operations
• QGM is augmented with native constructs which are
  specific to XML, such as complex navigation


        Qiuyue Wang
Master Informatique        XML Data Management                    48
                                                 XML and RDBMSs


                      Example QGM Graph




        Qiuyue Wang
Master Informatique        XML Data Management                    49
                                                     XML and RDBMSs


     Query Rewrite Transformations
• QGM generated by the parser captures the most general
  semantics, and is not efficient in most cases
• Two goals
     – Optimize the data flow by consolidating operations in the graph
     – Normalize QGM representation for cost-based optimizer
• Rewrite transformations use heuristics, such as applying
  selections as early as possible
• Rewrite transformations are part of a rule-based engine
     – Rules organized into classes




        Qiuyue Wang
Master Informatique            XML Data Management                       50
                                                 XML and RDBMSs


         XQuery Rewrite Optimization
• Exploit existing rewrite techniques
     – E.g. merge nested query blocks, eliminate unused
       variables
• Provide XQuery/XPath specific rewrite techniques
     –   XPath merging
     –   Predicate pushdown
     –   Join pull up
     –   XPath normalization
     –   ……



        Qiuyue Wang
Master Informatique        XML Data Management                    51
                                               XML and RDBMSs


                      XPath Merging
1. Merge XPath expressions
     FOR $i in collection(‘T’)//Order, $j in $i/LineItem
         RETURN $j
     -->
     FOR $i in collection(‘T’)//Order/LineItem RETURN $i
2. A. Balmin, F. Ozcan, et. al., “Grouping and
   Optimization of XPath Expressions in DB2
   pureXML”, SIGMOD 2008


        Qiuyue Wang
Master Informatique      XML Data Management                    52
                                                         XML and RDBMSs


  Predicate Pushdown into Exatraction
• For pushdown is straightforward
     FOR $i in collection(‘T’)//LineItem
     WHERE $i/Price > 5000
     RETURN $i
     -->
     FOR $i in collection(‘T’)//LineItem[Price > 5000]
     RETURN $i;
• Let pushdown is tricky
     FOR $i in collection(‘T’)/Order
     LET $j := $i/LineItem
     WHERE $j/Price > 5000 RETURN $i
     -->
     FOR $i in collection(‘T’)/Order
     LET $j := $i/LineItem[Price > 5000]
     WHERE not(isEmpty($j)) RETURN $i
        Qiuyue Wang
Master Informatique                XML Data Management                    53
                                              XML and RDBMSs


                      Join Pull Up
• Simple decorrelation
     for $i in db2-fn:xmlcolumn(”T2.DOC”)/c,
     $j in db2-fn:xmlcolumn(”T1.DOC”)/a[b=$i/d]
     return $j
     
     for $i in db2-fn:xmlcolumn(”T2.DOC”)/c,
     $j in db2-fn:xmlcolumn(”T1.DOC”)/a
     where $j/b=$i/d
     return $j


        Qiuyue Wang
Master Informatique     XML Data Management                    54
                                                       XML and RDBMSs


         XPath Normalization Rewrites
• Descendant-or-self elimination
     –   “//” does not mean descendant
     –   “//” is actually shorthand for descendant-or-self::node()
     –   Inefficient to select all nodes, to eliminate most later
     –   //a can be rewritten into descendant::a
     –   //a[2] cannot: descendant-or-self::node()/child::a[2]
• Multiple predicate elimination
     – a[p1][p2] --> a[p1 and p2], if p1 and p2 do not contain positional
       predicates
• Parent elimination
     – a/b[.. = 10] --> a[. = 10]/b



        Qiuyue Wang
Master Informatique              XML Data Management                        55
                                                             XML and RDBMSs


                      Cost-based Optimizer
• Infrastructure to support XML path expressions
• Planning:
     – Reuse plan and operator structures
           • E.g., scan (on collection), index anding/oring, value-based joins,
             aggregates
     – New operators for XML index and navigation: XISCAN and XSCAN
     – Rules to combine XML data access operators and with conventional
       ones
     – Arguments and properties for new, as well as conventional, operators
• Costing:
     – Cost model for new physical operators
• Statistics:
     – New notions of XML data statistics
     – Path occurrences and correlations


        Qiuyue Wang
Master Informatique                 XML Data Management                           56
                                                 XML and RDBMSs


                      Query Plan Example




        Qiuyue Wang
Master Informatique        XML Data Management                    57
                                                                  XML and RDBMSs


                XML Navigation: XSCAN
•   Corresponds to the TableAccess operator in relational, but much more powerful
     – Evaluates predicates over pages in shared memory
     – Evaluates paths by traversing parent-child links in the storage
•   Returns tuples of variable bindings
     – Node references to XML fragments
     – Atomic values – copy of the value, as in relational + XML type
•   Can evaluate multiple xpath steps from multiple xpath expressions in the query,
    i.e. evaluates complex pattern trees
•   One pass, document order preserving, evaluation
     – very important for order-aware languages as XQuery
     – duplicate elimination in most cases for free
•   Designed to perform using limited memory
•   Based on the TurboXPath Research technology
     – V. Josifovski, M. Fontoura and A. Barta, “Querying XML streams”, VLDB Journal 14(2),
       197-210, 2005


        Qiuyue Wang
Master Informatique                   XML Data Management                                 58
                                            XML and RDBMSs




            Microsoft SQL Server 2005




        Qiuyue Wang
Master Informatique   XML Data Management                    59
                                                  XML and RDBMSs


        XML Support in SQL Server 2005
   Create table DOCS (ID int primary key,
       XDOC xml)
• XML stored in an internal, binary form (‘blob’)
• Optionally typed by a collection of XML schemas
     – Used for storage and query optimizations
• 3 of 5 methods on XML data type (embedding XQuery):
     – query(): returns XML type
     – value(): returns scalar value
     – exist(): checks conditions on XML nodes
• XML indexing


        Qiuyue Wang
Master Informatique         XML Data Management                    60
                                                 XML and RDBMSs


            XQuery – Supported Features
• XQuery clauses “for”, “where”, “return” and “order by”
• XPath axes – child, descendant, parent, attribute, self and
  descendant-or-self
• Functions – numeric, string, Boolean, nodes, context, sequences,
  aggregate, constructor, data accessor
• SQL Server extension functions to access SQL variable and column
  data within XQuery
• Numeric operators (+, -, *, div, mod)
• Value comparison operators (eq, ne, lt, gt, le, ge)
• General comparison operators (=, !=, <, >, <=, >=)




        Qiuyue Wang
Master Informatique        XML Data Management                    61
                                                XML and RDBMSs


                  Indexing XML Column
• Primary XML index on an XML column
     – Creates B+ tree on data model content of the XML nodes
     – Adds column Path_ID for the reversed, encoded path from
       each XML node to root of XML tree
• ORDPATH labelling scheme is used for XML nodes
     – Relative order of nodes
     – Document hierarchy




        Qiuyue Wang
Master Informatique       XML Data Management                    62
                                                            XML and RDBMSs


              Primary XML Index Entries
ID    ORDPATH         TAG               NODETYPE          VALUE          PATH_ID


7     1               1 (Book)          10 (ns:bT)        NULL           #1
7     1.1             2 (ISBN)          2 (xs:string)     '1-55860-…'    #2#1
7     1.3             3 (Section)       11 (ns:sT)        NULL           #3#1
7     1.3.1           4 (Title)         2 (xs:string)     'Bad Bugs'     #4#3#1
7     1.3.3           5 (Figure)        12 (ns:fT)        NULL           #5#3#1
7     1.5             3 (Section)       11 (ns:sT)        NULL           #3#1
7     1.5.1           4 (Title)         2 (xs:string)     'Tree frogs'   #4#3#1
7     1.5.3           5 (Figure)        12 (ns:fT)        NULL           #5#3#1

               - Encoding of tags & types stored in system meta-data
Clustering key - Additional details not shown

        Qiuyue Wang
Master Informatique                 XML Data Management                            63
                                                          XML and RDBMSs


                Secondary XML indexes
• To speed up different classes of commonly occurring queries


PATH                  path-based queries        PATH_ID, VALUE, ID, ORDPATH

VALUE                 value-based               VALUE, PATH_ID, ID, ORDPATH
                      queries
PROPERTY              Object properties         ID, PATH_ID, VALUE, ORDPATH

• Statistics created on key columns of the primary and
  secondary XML indexes
     – Used for cost-based selection of secondary XML indexes


        Qiuyue Wang
Master Informatique                 XML Data Management                    64
                                                 XML and RDBMSs


                      Handling Types
• If XML column is typed
     – Values are stored in XML blob and XML indexes with
       appropriate typing
• Untyped XML
     – Values are stored as strings
     – Convert to appropriate types for operations
• SQL typed values stored in primary XML index
     – Most SQL types are compatible with XQuery types
       (integer)
     – Value comparisons on XML index columns suffice
     – Some types (e.g. xs:datetime) are stored in internal format
       and processed specially
        Qiuyue Wang
Master Informatique        XML Data Management                    65
                                                         XML and RDBMSs


     XQuery Processing Architecture
          XQuery expression             •   XQuery Compiler:
                                             –   Parses XQuery expr.
                                             –   Checks static type correctness
           XQuery Compiler
                                             –   Type annotations
                                             –   Applies static optimizations
  XML algebra tree (XmlOp ops)                       • Path collapsing
                                                     • Rewrites using XML schemas

         XML Operator Mapper            •   XML Operator Mapper
                                             – Recursively traverses XML algebra
                                               tree
      Relational Operator Tree
                                             – Converts each XmlOp to relation+
      (relational+ operators)                  operator sub-tree
                                             – Mapping depends upon existence of
      Reln Query Processor                     primary XML index




        Qiuyue Wang
Master Informatique            XML Data Management                                  66
                                              XML and RDBMSs


          Examples of XML Operators
 XmlOp_Select         In: list of items, condition
                      Out: items satisfying condition

 XmlOp_Path           In: simple paths, no predicates
                      Opt: path context to collapse paths
                      Out: eligible XML nodes
 XmlOp_Apply          In: two item lists
                      Out: one item list
                      Variable binding in “for” expression
 XmlOp_Construct      In: sub-nodes for element construction,
                      otherwise value
                      Out: constructed node

        Qiuyue Wang
Master Informatique     XML Data Management                     67
                                                       XML and RDBMSs


    XML Operator Mapping – Overview
                                                              PATH Index
           XQUERY
                                PK    OrdPath
                                 1
                                 1
     PK       XML
                      Primary    1                          VALUE Index
                      XML
     1                Index
                                 1
                                 20
     20                          20
     35                          20
                                                          PROPERTY Index
                                 35
                                 35
                                Special handling for
        REL+ tree               SELECT * | XDOC


        Qiuyue Wang
Master Informatique             XML Data Management                        68
                                                   XML and RDBMSs


                      New Operators
•      Some produce N rows from M (≠ N) rows
     –     XML_Reader – streaming, pull-model XML parser
     –     XML_Serializer – to serialize query result as XML
•      Some are for efficiency
     –     Contains – to evaluate XQuery contains()
     –     TextAdd – to evaluate the XQuery function string()
     –     Data – to evaluate XQuery data() function
•      Some are for specific needs
     –     Check – validate XML during insertion or modification


        Qiuyue Wang
Master Informatique          XML Data Management                    69
                                             XML and RDBMSs


                XML Operator Mapping
• Following categories:
     – Mapping of XPath expressions
     – Mapping of XQuery expressions
     – Mapping of XQuery built-in functions




        Qiuyue Wang
Master Informatique    XML Data Management                    70
                                                    XML and RDBMSs


          Non-Indexed XML, Full Path
  XML operator tree:                   • XML_Reader produces
                                         subtrees of <SECTION>
  XmlOp_Path PATH =
    “/BOOK/SECTION”                       – Node table rows
                                          – Contains OrdPath
  Rel+ operator tree:
                                          – No PK or PATH_ID
              XML_Serialize
                                       • XML_Serialize reassembles
                                         those row into XML data
                                         type
                                          – To output result
         XML_Reader (XDOC,
         “/BOOK/SECTION”)



        Qiuyue Wang
Master Informatique           XML Data Management                    71
                                                           XML and RDBMSs

      Sample Query Execution Using
          Primary XML Index
ID    ORDPATH         TAG               NODETYPE          VALUE          PATHID


7     1               1 (Book)          10 (ns:bT)        NULL           #1
7     1.1             2 (ISBN)          2 (xs:string)     '1-55860-…'    #2#1
7     1.3             3 (Section)       11 (ns:sT)        NULL           #3#1
7     1.3.1           4 (Title)         2 (xs:string)     'Bad Bugs'     #4#3#1
7     1.3.3           5 (Figure)        12 (ns:fT)        NULL           #5#3#1
7     1.5             3 (Section)       11 (ns:sT)        NULL           #3#1
7     1.5.1           4 (Title)         2 (xs:string)     'Tree frogs'   #4#3#1
7     1.5.3           5 (Figure)        12 (ns:fT)        NULL           #5#3#1

                      • /Book/Section  #3#1 (by XML Op Mapper)
Clustering key

        Qiuyue Wang
Master Informatique                 XML Data Management                           73
                                                        XML and RDBMSs


                 Indexed XML, Full Path
        XML_Serialize                      • XmlOp_Path mapped to
                                             SELECT
                           Assemble
              Apply
                            Subtree        • GET(PXI) – rows from
                                             primary XML index
   Select ($b)           Select                 – Match PATH_ID
                                           • Not shown:
    GET                GET $b.OrdP              – JOIN with base table on
   (PXI)              (PXI) ≤ OrdP<               PK
                            DL($b)

  Path_ID=#SECTION#BOOK


        Qiuyue Wang
Master Informatique               XML Data Management                    74
                                                    XML and RDBMSs


                      XML Index – PATH
PATH_ID               VALUE            ID              ORDPATH
#1           NULL                      7               1
#2#1         '1-55860-…'               7               1.1
#3#1         NULL                      7               1.3
#3#1         NULL                      7               1.5
#4#3#1       'Bad Bugs'                7               1.3.1
#4#3#1       'Tree frogs'              7               1.5.1
#5#3#1       NULL                      7               1.3.3
             NULL
#5#3#1 up path evaluations
• Speeds                               7               1.5.3
• Example – /Book/Section  #3#1

        Qiuyue Wang
Master Informatique           XML Data Management                    75
                                                    XML and RDBMSs


       Indexed XML, Imprecise Paths
                                       /BOOK/SECTION//
          XML_Serialize                TITLE
                                       • Matched using LIKE
               Apply                     operator on Path_ID
                       Assemble
                       subtree of
 Select ($s)            <TITLE>


  GET
 (PXI)
            Path_ID LIKE
      #TITLE%#SECTION#BOOK


        Qiuyue Wang
Master Informatique           XML Data Management                    76
                                                      XML and RDBMSs


                      Predicate Evaluation
   XML_Serialize
                                         /BOOK[@ISBN = “12”]
                                         • Search value compared with
                  Apply                    VALUE column in PXI
                                         • Collapsed path /BOOK/@ISBN
         Apply            Assemble          – Induce index seeks
                          subtree of
                           <BOOK>           – Reduce intermediate result
 Select ($b) Select                            size
                                         • Parent check – Par($b)
                                              – Using OrdPath
   GET       GET Path_ID=#@I             • Value conversion might be
  (PXI)     (PXI) SBN#BOOK &               needed
                  VALUE=“12”
                       &
       Path_ID=
                    Par($b)
        #BOOK


        Qiuyue Wang
Master Informatique             XML Data Management                    77
                                                  XML and RDBMSs


                      Ordinal Predicate
• /BOOK[n]
• Adds ranking column to the rows for <BOOK>
  elements
     – Retrieves the nth <BOOK> node
• Special optimizations
     – [1]  TOP 1 ascending
     – [last()]  TOP 1 descending
     – Avoids sorting when input is sorted
           • Example – in XML_Serializer


        Qiuyue Wang
Master Informatique         XML Data Management                    78
                                                            XML and RDBMSs


                               “for” Iterator
                                               for $s in /BOOK//SECTION
                    XML_Serialize
                                               where $s/@num >= 3
                                               return $s/TITLE
                      Apply                    • XML op for “for” is
                                                 XmlOp_Apply
                            Assemble
                Apply ($s) <SECTION>                – Maps to APPLY
                                                    – Binds $s and iterates over
          Select          Path_ID LIKE                <SECTION>
                          #TITLE#SECTION%
                                                    – Determines its <TITLE> children
                   Exists #BOOK & Par($s)
                                               • Nested “for” and “for” with
  Select ($s)         Select                     multiple bindings turn into
   GET        GET(PXI)        Path_ID LIKE
                                                 nested APPLY
  (PXI)                       #@num#SEC%            – Each APPLY binds to a different
                              #BK & VALUE             variable
  Path_ID LIKE
                              >= 3 & Par($s)
 #SECTION%#BOOK


        Qiuyue Wang
Master Informatique                  XML Data Management                           79
                                                     XML and RDBMSs


     XQuery “order by” and “where”
• Order by:
     – Sorts rows based on order-by expression
     – Adds a ranking column to these rows
     – Ranking column converted into OrdPath values
           • Yield the new order of the rows
           • Fits rest of query processing framework
• Where
     – Becomes SELECT on input sequence
     – Filters rows satisfying specified condition



        Qiuyue Wang
Master Informatique            XML Data Management                    80
                                                XML and RDBMSs


                      XQuery “return”
• Return nodes sequence in document order
     – Use OrdPath values and
     – XML_Serialize operator
• New element and sequence constructions
     – Merge constructed and existing nodes into a
       single sequence (SWITCH_UNION)




        Qiuyue Wang
Master Informatique       XML Data Management                    81
                                             XML and RDBMSs


      XQuery Functions & Operators
• Built-in fn and op are mapped to relational fn
  and op if possible
     – fn:count()  count()
• Additional support for XQuery types, functions
  and operators that cannot be mapped directly
     – Intrinsics




        Qiuyue Wang
Master Informatique    XML Data Management                    82
                                               XML and RDBMSs


                      Optimizations
• Exploiting Ordered Sets
     – Sorting information (OrdPath) made available to
       further relational operators
     – XML_Serialize is an example
• Using static type information
     – Eliminates CONVERT() in operations
     – Allows range scan on VALUE index
• Path collapsing

        Qiuyue Wang
Master Informatique      XML Data Management                    83
                                                 XML and RDBMSs




                      Oracle 11g XML DB




        Qiuyue Wang
Master Informatique        XML Data Management                    84
                                                            XML and RDBMSs


                               XML Type

              XQuery           XPath               XSLT           SQL / XML


                           XML Type abstraction
                  Physical Storage
                                                     XML Type Views


            CLOB             Shredded
                                                           Relational Data
            Hybrid          Binary XML




        Qiuyue Wang
Master Informatique                  XML Data Management                      85
                                                  XML and RDBMSs


          Oracle XML Storage Models
• Object Relational Storage (OR)
     – Schema-based mapping to object relational tables
     – Preserves DOM fidelity (more than traditional shredding)
     – XPath queries translated to table/column access
• Binary XML Storage
     – Encode XML document as SAX-like opcodes
     – XPath queries evaluated using NFA in streaming fashion
• CLOB
     – Document fidelity: maintains the original XML data, byte for
       byte
     – XML operations on the document requires creating a DOM from
       the document
        Qiuyue Wang
Master Informatique         XML Data Management                    86
                                                 XML and RDBMSs


                      XMLIndex Structure
• Logically organized into a path index, order index and
  one or more value indexes
     – Path index - useful for identifying fragments based on
       simple (navigational) path expressions
     – Order index – useful for determining parent-child,
       ancestor-descendant, and sibling relationships
     – Value index – allow type aware lookups based on value
       equality or range
• Ad-hoc XML queries are decomposed into a
  composite operation that performs path, order, and
  value constrained lookups.

        Qiuyue Wang
Master Informatique        XML Data Management                    87
                                              XML and RDBMSs


         Indexing Structured Portions
• Many semi-structured XML datasets contain
  structured portions
     – Relational access required for structured portions
     – Relational views defined using SQL/XML,
       XMLTable syntax
• Structured portions projected into separate
  tables and columns
     – XMLTable Index



        Qiuyue Wang
Master Informatique     XML Data Management                    88
                                                 XML and RDBMSs


                 XMLTable Based Index
• Extract commonly searched structured data into a
  table
• All structured leaf data in the same group (having
  the same parent node) are stored in one row
• Project out each leaf data value as a column in the
  table
     – All xpath matching is avoided at run time
     – All joins to ensure the structured leaf data from the same
       parent node is avoided



        Qiuyue Wang
Master Informatique        XML Data Management                    89
                                            XML and RDBMSs


      Example of XMLTable Based Index




        Qiuyue Wang
Master Informatique   XML Data Management                    90
                                                                   XML and RDBMSs


         XML: One Size does not Fit All
                                                    Primary Content
     XML Content is
                                         Structured                     Unstructured
                                Structured XML                     ‘Islands of Structure’

                                PurchaseOrder, Trade

                  Structured    Schema Based : Object relational   Binary XML with
                                Storage                            combination of Ad-Hoc
     Fragments
     Embedded




                                Non Schema : Binary XML with       XPaths and XMLTable
                                XMLTable constructs                based Indexing


                                ‘Unstructured Fragments’           Unstructured XML

                                Resume                             Generic Word 2007
                                                                   Document
                 Unstructured
                                Object Relational
                                Embedded Fragments indexed         Binary XML with Ad-Hoc
                                with Ad-Hoc XPaths                 Xpaths




        Qiuyue Wang
Master Informatique                 XML Data Management                                     91
                                                 XML and RDBMSs


   Requirements for XQuery Engine
1. XQuery Engine should be storage independent yet
   choose most optimal plan for each storage
    – No one XML storage satisfies all applications
2. Better interoperability with SQL engine to enable
   cross-language optimizations
    – Needed for hybrid SQL and XML use cases
3. Leverage mature relational infrastructure as much
   as possible
    – Iterator based execution infrastructure


        Qiuyue Wang
Master Informatique        XML Data Management                    92
                                                         XML and RDBMSs


         XQuery Compilation (Step 1)

                          XQuery            SQL/XML


                      SQL/XML XQuery Hybrid Compiler


                      XML Extended Relational Algebra
                                 (XERA)




        Qiuyue Wang
Master Informatique                XML Data Management                    93
                                                  XML and RDBMSs

     XERA (XML Extended Relational
               Algebra)
• Not really a translation of XQuery into SQL, rather
• Compile XQuery into query graph structure
     – Referred to as XML extended relational algebra
• XERA is based on SQL relational algebra extensibility
     –   Type extensibility – new XML native data type
     –   F & O extensibility – new XQSQL functions
     –   User defined aggregator extensibility – new aggregator
     –   Collection iterator extensibility – table function




        Qiuyue Wang
Master Informatique         XML Data Management                    94
                                                   XML and RDBMSs


                      XERA – New XMLType
  • XMLType is added as type into XERA
  • XMLType models the XQDM data type
       – Atomic Value is tagged with Type
             • Support Dynamic type Match & Check
             • Support Heterogeneous XQDM sequence
       – Tree node is logically a node reference
             • Node IDs for node identity & document order
               support




        Qiuyue Wang
Master Informatique          XML Data Management                    95
                                                             XML and RDBMSs


                XERA – XQSQL operators
  • New XQSQL functions operating on XMLType are added to
    XERA
         –   XQuery Constructor operators
         –   XPath Node Navigation operator - XQPath
         –   Full XQuery F&O XQSQL operators
         –   XQuery Type Match & Dynamic Type Check operators
         –   XQuery Polymorphic arithmetic & comparison operators
               • If static type check fails, using polymorphic operators
         – XQuery Node Id Comparison operators for node comparison,
           document order sorting
         – Union/Intersect/Except XQuery F&O uses SQL
           Union/Intersect/Except with Node Id Comparison operator



        Qiuyue Wang
Master Informatique                 XML Data Management                       96
                                               XML and RDBMSs


                  XERA - Table Function
• XQDM iteration is compiled into SQL iterator
  using table function
        – Select .. From Table(XQSequence(…))
• Implements open-fetch-close interface
• XQDM is not materialized whenever possible
• Table function iterator works with the rest of
  SQL iterator trees
        – streaming among SQL & XML data


        Qiuyue Wang
Master Informatique      XML Data Management                    97
                                             XML and RDBMSs


   XERA - User Defined Aggregators
• Polymorphic user defined aggregators for
  XQuery aggregation whose type is unknown
  statically
• XQAgg aggregates table function result into
  XQDM
• Important algebraic identity
     – XQDMSeq = Select XQAgg(t.value order by t.pos)
       from table(xs(XQDMSeq)) t


        Qiuyue Wang
Master Informatique    XML Data Management                    98
                                              XML and RDBMSs


                  Example – User Query
  xmlt is a table storing XML document
    instances

  select count(*)
  from xmlt v
  where xmlexists('$x/a/b[c="cv"][d="dv"]' passing
    value(v) as "x") ;




        Qiuyue Wang
Master Informatique     XML Data Management                    99
                                                          XML and RDBMSs


                      Example (Step 1)
  • XML Extended Relational Algebra (XERA)
  • Produced by XQuery-SQL/XML Hybrid Compiler
      select count(*)
      from xmlt v
      where exists(
        select 1
        from table(xs(xqpath('$x/a/b') passing value(v) as "x")) v1
        where
           exists
             (select 1
              from table(xs(xqpath('$v1/c') passing value(v1))) v2
              where xqexval(value(v2)) = "cv")
            and
          exists
             (select 1
              from table(xs(xqpath('$v1/d') passing value(v1))) v3
              where xqexval(value(v3)) = "dv")
        )
        Qiuyue Wang
Master Informatique               XML Data Management                      100
                                                         XML and RDBMSs


                      XERA - Optimization
                          XQuery            SQL/XML


                      SQL/XML XQuery Hybrid Compiler
  XERA

                 XML Storage/Index Independent Optimizations
                              (Logical Rewrite)



                             Optimized XERA




        Qiuyue Wang
Master Informatique                XML Data Management                    101
                                                  XML and RDBMSs


                      XERA - Optimization
• Optimistic static typing
• Join optimization
     – Ordered left dependency join  SQL joins
•   Nested FLWR block merge (view merge)
•   Existence to semi-join (subquery unnesting)
•   XQPath push down
•   XQPath cancellation with construction
•   Variable & Function inlining & factoring
•   ……

        Qiuyue Wang
Master Informatique         XML Data Management                    102
                                                XML and RDBMSs


                      Example (Step 2)
   • Post logical rewrite – optimized XERA
 select count(*)
 from xmlt v,
      semi-join lateral(table(xs(xqpath('$x/a/b') passing
 value(v) as "x"))) v1,
      semi-join lateral(table(xs(xqpath('$v1/c') passing
 value(v1) as "v1"))) v2,
      semi-join lateral(table(xs(xqpath('$v1/d') passing
 value(v1) as "v1"))) v3
 where xqexval(value(v2)) = 'cv' and xqexval(value(v3)) =
 'dv'


        Qiuyue Wang
Master Informatique       XML Data Management                    103
                                                          XML and RDBMSs

         Overall XQuery Architecture
                          XQuery            SQL/XML


                      SQL/XML XQuery Hybrid Compiler
  XERA

                 XML Storage/Index Independent Optimizations
Optimized                     (Logical Rewrite)
 XERA


   XML Storage/Index Dependent Optimizations (Physical Rewrite)
Structured Storage, SQL/XML over relational data, Binary XML with XMLIndex


                                                                     Cost
                      Relational Optimization with Cost             Analysis



        Qiuyue Wang
Master Informatique                XML Data Management                         104
                                                XML and RDBMSs


                      Physical Rewrite
• XQSQL for XPath Navigation
     – XQPath(xqdm, xpath)
• Physical Rewrite for XQPath Operator
     – OR storage: object relational query
     – SQL/XML view: relational query
     – XMLIndex (XPath Index / Value Index): query over
       index storage tables
     – Binary XML: NFA based streaming evaluation
     – Hybrid of XMLIndex with NFA based streaming
       evaluation
        Qiuyue Wang
Master Informatique       XML Data Management                    105
                                            XML and RDBMSs

 Example - Post Physical Rewrite –
           Object Relational Storage

 select count(*)
 from xmlt v,
       semi-join tab_b b, semi-join tab_c c, semi-join
 tab_d d
 where v.docid = b.docid and b.nid = c.nid and b.nid2 =
 d.nid and c.val = 'cv' and d.val = 'dv'




        Qiuyue Wang
Master Informatique   XML Data Management                    106
                                                XML and RDBMSs

 Example - Post Physical Rewrite –
                      using XMLIndex

 select count(*)
 from xmlt v, semi-join pathtable p1,
      semi-join pathtable p2, semi-join pathtable p3
 where v.docid = p1.docid and p1.pid = pid('/a/b') and
        p2.pid = pid('/a/b/c') and p2.docid = p1.docid and
 p2.value = "cv“
        and parent_key(p2.orderkey) = p1.orderkey
        and p3.pid = pid('/a/b/d') and p3.docid = p1.docid
 and p3.value = "dv"
        and parent_key(p3.orderkey) = p1.orderkey)


        Qiuyue Wang
Master Informatique       XML Data Management                    107
                                            XML and RDBMSs

 Example - Post Physical Rewrite –
            Binary Stream Evaluation

 select count(*)
 from xmlt v,
      semi-join lateral(xpathtable('$x/a/b')             passing
 value(v) as "x")) v1,
      semi-join lateral(xpathtable('$v1/c')              passing
 value(v1) as "v1")) v2,
      semi-join lateral(xpathtable('$v1/d')              passing
 value(v1) as "v1")) v3
 where v2.value= 'cv' and v3.value = 'dv'



        Qiuyue Wang
Master Informatique   XML Data Management                    108
                                            XML and RDBMSs

 Example - Post Physical Rewrite –
         XMLIndex & Binary Stream

 select count(*)
 from xmlt v, semi-join (select mkini(p1) as v1
                         from pathtable p1
                         where p1.pid = pid(‘/a/b’) and
 p1.docid = v.docid) pv,
      semi-join lateral(xpathtable('$v1/c') passing pv.v1
 as "v1")) v2,
      semi-join lateral(xpathtable('$v1/d') passing pv.v1
 as "v1")) v3
 where v2.value= 'cv' and v3.value = 'dv'


        Qiuyue Wang
Master Informatique   XML Data Management                    109

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:6
posted:1/25/2013
language:Unknown
pages:108