					                          CIS 550

                   Handout 7 -- XPATH and Quilt




CIS550 Handout 6
                            XPath
• Primary goal: to permit access to some nodes from a given
  document
• XPath's main construct: axis navigation
• An XPath path consists of one or more navigation steps,
  separated by /
• A navigation step is a triplet: axis + node-test + list of
  predicates
• Examples
    – /descendant::node()/child::author
    – /descendant::node()/child::author[parent::node()/attribute::booktitle = "XML"][2]

• XPath also offers some shortcuts
    – no axis means child
    – // means /descendant-or-self::node()/
• XPath/XSL-T quickref
          http://www.mulberrytech.com/quickref/index.html


             XPath- child axis navigation
• author is shorthand for child::author. Examples:
     – aaa -- all the child nodes labeled aaa (1,3)
     – aaa/bbb -- all the bbb grandchildren of aaa children (4)
     – */bbb -- all the bbb grandchildren of any child (4,6)

                                                        context node

                     1   aaa    2    ccc    3    aaa

              4            5          6           7
                   bbb         aaa         bbb         ccc


     – . -- the context node
     – / -- the root node
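
The three child-axis examples above can be tried with Python's standard xml.etree.ElementTree module, which supports only a subset of XPath. This is a minimal sketch: the id attributes and the placement of nodes 4-7 under nodes 1-3 are assumptions chosen to agree with the answers (1,3), (4) and (4,6); the edges of the original figure are not recoverable from the text.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the example tree. The ids mirror the node numbers on the
# slide; which grandchild hangs under which child is a guess that is
# consistent with the answers given in the examples.
context = ET.fromstring(
    '<ctx>'
    '<aaa id="1"><bbb id="4"/><aaa id="5"/></aaa>'
    '<ccc id="2"><bbb id="6"/></ccc>'
    '<aaa id="3"><ccc id="7"/></aaa>'
    '</ctx>'
)


def ids(nodes):
    """Return the id attributes of the selected elements, in document order."""
    return [n.get("id") for n in nodes]


child_aaa = ids(context.findall("aaa"))    # child::aaa
aaa_bbb = ids(context.findall("aaa/bbb"))  # child::aaa/child::bbb
any_bbb = ids(context.findall("*/bbb"))    # child::*/child::bbb

print(child_aaa, aaa_bbb, any_bbb)
```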



     XPath- child axis navigation (cont)
     – /doc -- all the doc children of the root
     – ./aaa -- all the aaa children of the context node
       (equivalent to aaa)
     – text() -- all the text children of the context node
     – node() -- all the children of the context node (includes
       text nodes, but not attribute nodes: attributes are not children)
     – .. -- parent of the context node
     – .// -- the context node and all its descendants
     – // -- the root node and all its descendants
     – //para -- all the para nodes in the document
     – //text() -- all the text nodes in the document
     – @font -- the font attribute node of the context node
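
A rough Python counterpart of these abbreviations, again using xml.etree.ElementTree. The sample document is made up for illustration. ElementTree has no text or attribute nodes, so text() and @font are approximated here with .text, itertext() and get(); this is a sketch of the idea, not a full XPath evaluation.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, used only to exercise the abbreviations.
doc = ET.fromstring(
    '<doc font="serif">intro<para>one</para><sec><para>two</para></sec></doc>'
)
ctx = doc  # take the document element as the context node

same_children = ctx.findall("./para") == ctx.findall("para")  # ./aaa vs aaa
all_paras = ctx.findall(".//para")        # para elements at any depth below ctx
parent_of_para = ctx.find("sec/para/..")  # .. steps back up to the sec element
font = ctx.get("font")                    # stands in for @font
leading_text = ctx.text                   # text before the first child element
all_text = list(ctx.itertext())           # stands in for .//text()

print(same_children, len(all_paras), parent_of_para.tag, font, all_text)
```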



                          Predicates
     – node()[2] -- the second child node of the context node
     – chapter[5] -- the fifth chapter child of the context node
     – node()[last()] -- the last child node of the context node
     – chapter[title="introduction"] -- the chapter children of the
       context node that have one or more title children whose
       string-value is "introduction" (the string-value is the
       concatenation of the text of all descendant text nodes)
     – person[.//firstname = "joe"] -- the person children of the
       context node that have among their descendants a firstname
       element with string-value "joe"
     – From the XPath specification:
       NOTE: If $x is bound to a node-set, then $x = "foo" does not mean
       the same as not($x != "foo") ...
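
The note from the specification follows from the fact that comparing a node-set with a string is existential: $x = "foo" holds if some node in $x has string-value "foo", and $x != "foo" holds if some node has a different string-value. The sketch below illustrates this in Python. The book document and the helper names ns_eq and ns_ne are made up for illustration; the predicates are evaluated with xml.etree.ElementTree, which supports this small subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the second chapter has two title children.
doc = ET.fromstring(
    "<book>"
    "<chapter><title>introduction</title></chapter>"
    "<chapter><title>axes</title><title>introduction</title></chapter>"
    "<chapter><title>predicates</title></chapter>"
    "</book>"
)

second = doc.find("chapter[2]")      # positional predicate, counted from 1
last = doc.find("chapter[last()]")   # last chapter child
# chapter[title="introduction"]: chapters with at least one matching title
intro = doc.findall("chapter[title='introduction']")


def ns_eq(values, s):
    """$x = s on a node-set: true if SOME string-value equals s."""
    return any(v == s for v in values)


def ns_ne(values, s):
    """$x != s on a node-set: true if SOME string-value differs from s."""
    return any(v != s for v in values)


# String-values of the title children of the second chapter.
x = [t.text for t in doc.findall("chapter[2]/title")]

print(ns_eq(x, "introduction"))      # some title equals "introduction"
print(not ns_ne(x, "introduction"))  # holds only if ALL titles equal it
```

On this data the two tests disagree, because one title of the second chapter matches and the other does not.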



             Unions of Path Expressions
• employee | consultant -- the union of the employee and
  consultant nodes that are children of the context node
• For some reason person/(employee | consultant) -- as in regular
  path expressions -- is not allowed
• However person/node()[boolean(employee | consultant)] is
  allowed!!
• From the XPATH specification:
   – The boolean function converts its argument to a boolean as
      follows:
          • a number is true if and only if it is neither positive or negative zero
            nor NaN
          • a node-set is true if and only if it is non-empty
          • a string is true if and only if its length is non-zero
          • an object of a type other than the four basic types is converted to a
            boolean in a way that is dependent on that type
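
A small Python sketch of the two ideas on this slide: a union of child steps, and the conversion rules of boolean(). ElementTree has no | operator, so the union is emulated by filtering the children in document order. The names union_children and xpath_boolean, and the sample person element, are hypothetical; Python lists stand in for node-sets.

```python
import xml.etree.ElementTree as ET

# Hypothetical context node with mixed children.
person = ET.fromstring(
    "<person><employee/><note/><consultant/><employee/></person>"
)


def union_children(context, *tags):
    """Emulate tag1 | tag2 | ... on the child axis, keeping document order."""
    wanted = set(tags)
    return [child for child in context if child.tag in wanted]


def xpath_boolean(value):
    """Approximate XPath 1.0 boolean() for Python stand-ins of the basic types."""
    if isinstance(value, bool):
        return value
    if isinstance(value, float) and value != value:
        return False                  # NaN is the only value unequal to itself
    if isinstance(value, (int, float)):
        return value != 0             # covers both +0 and -0
    if isinstance(value, str):
        return len(value) > 0         # non-empty string
    if isinstance(value, (list, tuple)):
        return len(value) > 0         # non-empty node-set
    raise TypeError("no conversion defined for this type in this sketch")


selected = union_children(person, "employee", "consultant")
print([e.tag for e in selected], xpath_boolean(selected))
```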

                       Axis navigation
• So far, nearly all our expressions have moved us down the tree by
  moving to child nodes. Exceptions were
     –   . -- stay where you are
     –   / -- go to the root
     –   // -- all descendants of the root
     –   .// -- all descendants of the context node
• All other expressions have been abbreviations for child::…
  e.g. child::para. child:: is an example of an axis
• XPath has several axes: ancestor, ancestor-or-self, attribute,
  child, descendant, descendant-or-self, following,
  following-sibling, namespace, parent, preceding, preceding-sibling, self
     – Some of these (self, parent) describe single nodes, others
       describe sequences of nodes.
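
ElementTree does not implement most of these axes, but several can be sketched in a few lines of Python once a child-to-parent map is built. The tree and the helper names below are made up for illustration, and results are returned as plain lists rather than XPath node-sets.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree: a has children b and c; b has d and e; c has f.
root = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")

# ElementTree elements do not know their parent, so record it explicitly.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """ancestor axis, nearest ancestor first."""
    out = []
    p = parent_of.get(node)
    while p is not None:
        out.append(p)
        p = parent_of.get(p)
    return out


def descendants(node):
    """descendant axis, in document order (the node itself is excluded)."""
    return [n for n in node.iter() if n is not node]


def _siblings_and_index(node):
    p = parent_of.get(node)
    if p is None:
        return [node], 0
    sibs = list(p)
    return sibs, next(i for i, s in enumerate(sibs) if s is node)


def preceding_siblings(node):
    """preceding-sibling axis, listed in document order."""
    sibs, i = _siblings_and_index(node)
    return sibs[:i]


def following_siblings(node):
    """following-sibling axis, in document order."""
    sibs, i = _siblings_and_index(node)
    return sibs[i + 1:]


e = root.find("b/e")
b = root.find("b")
print([n.tag for n in ancestors(e)], [n.tag for n in descendants(root)])
```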




                       XPath Navigation Axes
                                      (merci, Arnaud)

[Figure: tree diagram of the navigation axes around the self node:
 ancestor, preceding-sibling, following-sibling, child, attribute,
 namespace, preceding, following, descendant]

               XPath abbreviated syntax


          (nothing)       child::
          @               attribute::
          //              /descendant-or-self::node()/
          .               self::node()
          .//             self::node()/descendant-or-self::node()/
          ..              parent::node()
          /               (document root)




                              XPath
• Reasonably widely adopted -- in XML-Schema and
  query languages.
• Incomparable with regular path expressions: neither more
  nor less expressive (e.g. can’t do (ab)* )
• Particularly messy in some areas:
     – defining order of results
     – overloading of operations,
          • e.g. [chapter/title = "Introduction"]
          • why not [ "Introduction" IN chapter/title] ?




                              Quilt
            proposed by Chamberlin, Robie and Florescu

                     (from the authors’ slides)

• Leverage the most effective features of several existing
  and proposed query languages
• Design a small, clean, implementable language
• Cover the functionality required by all the XML Query use
  cases in a single language
• Write queries that fit on a slide
• Design a quilt, not a camel




                    Quilt/Kweelt URLs

Quilt (the language)
http://www.almaden.ibm.com/cs/people/chamberlin/quilt_lncs.pdf

Kweelt (the implementation)
    http://db.cis.upenn.edu/Kweelt/
    http://db.cis.upenn.edu/Kweelt/useCases
         (examples in these notes)




Quilt = XPath + “comprehension” syntax
• XML-QL
      where     <pattern> in <XML-expression>      (bind variables)
                <pattern> in <XML-expression>
                …
                <condition>                        (use variables)
      construct <expression>

• Quilt
      for    x in <XPath-expression>               (bind variables)
             y in <XPath-expression>
             …
      where  <condition>                           (use variables)
      return <expression>
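
The for / where / return shape is the same as a list comprehension. A minimal Python analogy, assuming a small bids document in the style of the use-case data and an arbitrary threshold of 40 chosen only for illustration:

```python
import xml.etree.ElementTree as ET

# Two bid tuples in the style of the use-case data (bid_date omitted).
bids = ET.fromstring(
    "<bids>"
    "<bid_tuple><userid>U02</userid><itemno>1001</itemno><bid>35</bid></bid_tuple>"
    "<bid_tuple><userid>U04</userid><itemno>1001</itemno><bid>40</bid></bid_tuple>"
    "</bids>"
)

result = [
    b.findtext("userid")                   # return: build the answer
    for b in bids.findall(".//bid_tuple")  # for: bind b via a path expression
    if float(b.findtext("bid")) >= 40      # where: a condition that uses b
]

print(result)
```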
                          Examples of Quilt
     (from http://db.cis.upenn.edu/Kweelt/useCases/R/Q1.qlt )
Relational data -- two DTDs:
                   <?xml version="1.0" ?>
                   <!DOCTYPE items [
                    <!ELEMENT items       (item_tuple*)>
                    <!ELEMENT item_tuple (itemno, description, offered_by, start_date?,
                                              end_date?, reserve_price? )>
                    <!ELEMENT itemno       (#PCDATA)>
                    <!ELEMENT description (#PCDATA)>
                    <!ELEMENT offered_by (#PCDATA)>
                    <!ELEMENT start_date (#PCDATA)>
                    <!ELEMENT end_date (#PCDATA)>
                    <!ELEMENT reserve_price (#PCDATA)>
                   ]>
                   <?xml version="1.0" ?>
                   <!DOCTYPE bids [
                    <!ELEMENT bids       (bid_tuple*)>
                    <!ELEMENT bid_tuple (userid, itemno, bid, bid_date)>
                    <!ELEMENT userid (#PCDATA)>
                    <!ELEMENT itemno (#PCDATA)>
                    <!ELEMENT bid       (#PCDATA)>
                    <!ELEMENT bid_date (#PCDATA)>
                   ]>

                                  The data
      <items>
        <item_tuple>
          <itemno>1001</itemno>
          <description>Red Bicycle</description>
          <offered_by>U01</offered_by>
          <start_date>1999-01-05</start_date>
          <end_date>1999-01-20</end_date>
          <reserve_price>40</reserve_price>
        </item_tuple>
        <item_tuple>
          <itemno>1002</itemno>
          <description>Motorcycle</description>
          <offered_by>U02</offered_by>
          <start_date>1999-02-11</start_date>
          <end_date>1999-03-15</end_date>
          <reserve_price>500</reserve_price>
        </item_tuple>
        …
      </items>

      <bids>
        <bid_tuple>
          <userid>U02</userid>
          <itemno>1001</itemno>
          <bid>35</bid>
          <bid_date>99-01-07</bid_date>
        </bid_tuple>
        <bid_tuple>
          <userid>U04</userid>
          <itemno>1001</itemno>
          <bid>40</bid>
          <bid_date>99-01-08</bid_date>
        </bid_tuple>
        …
      </bids>

                            Query 1
 FUNCTION date()
 {
   "1999-02-01"
 }
 <result>
   (
   FOR $i IN document("items.xml")//item_tuple
   WHERE $i/start_date LEQ date()
     AND $i/end_date GEQ date()
     AND contains($i/description, "Bicycle")
   RETURN
     <item_tuple>
       $i/itemno ,
       $i/description
     </item_tuple> SORTBY (itemno)
   )
 </result>
• FUNCTION date() is a simple function definition
• XPath expressions were shown in orange on the slide
• Dates are formatted so that lexicographic ordering gives the right
  result
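
A rough Python rendering of Query 1, as a sketch rather than what Kweelt does internally. Items 1001 and 1002 come from the data slide; the start and end dates given to item 1003 are invented for illustration, since the handout does not show them. Dates are compared as strings, relying on the lexicographic ordering mentioned above.

```python
import xml.etree.ElementTree as ET

items = ET.fromstring("""
<items>
  <item_tuple>
    <itemno>1001</itemno><description>Red Bicycle</description>
    <start_date>1999-01-05</start_date><end_date>1999-01-20</end_date>
  </item_tuple>
  <item_tuple>
    <itemno>1002</itemno><description>Motorcycle</description>
    <start_date>1999-02-11</start_date><end_date>1999-03-15</end_date>
  </item_tuple>
  <item_tuple>
    <itemno>1003</itemno><description>Old Bicycle</description>
    <start_date>1999-01-10</start_date><end_date>1999-02-20</end_date>
  </item_tuple>
</items>
""")  # the dates for item 1003 are hypothetical


def date():
    """Counterpart of FUNCTION date() in the query."""
    return "1999-02-01"


# FOR ... WHERE ..., then SORTBY (itemno)
matches = sorted(
    (
        i for i in items.findall(".//item_tuple")
        if i.findtext("start_date") <= date() <= i.findtext("end_date")
        and "Bicycle" in i.findtext("description")
    ),
    key=lambda i: i.findtext("itemno"),
)

# RETURN: build one item_tuple element per match.
result = ET.Element("result")
for i in matches:
    out = ET.SubElement(result, "item_tuple")
    ET.SubElement(out, "itemno").text = i.findtext("itemno")
    ET.SubElement(out, "description").text = i.findtext("description")

print(ET.tostring(result, encoding="unicode"))
```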

                      Output from Q1

               <?xml version="1.0" ?>
               <result>
                <item_tuple>
                  <itemno> 1003 </itemno>
                  <description> Old Bicycle </description>
                </item_tuple>
                <item_tuple>
                  <itemno> 1007 </itemno>
                  <description> Racing Bicycle </description>
                </item_tuple>
               </result>




                          Query Q2
For all bicycles, list the item number, description, and
highest bid (if any), ordered by item number.
<result>
  (
  FOR $i IN document("items.xml")//item_tuple
  LET $b := document("bids.xml")//bid_tuple[itemno = $i/itemno]
  WHERE contains($i/description, "Bicycle")
  RETURN
    <item_tuple>
      $i/itemno ,
      $i/description ,
      IF ($b) THEN
         <high_bid> NumFormat("#####.##", max(-1, $b/bid)) </high_bid>
      ELSE ""
    </item_tuple> SORTBY (itemno)
  )
</result>
• Use of variables in XPath (e.g. $i/itemno)
• Lots of coercion
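
A Python sketch of the same join, assuming only the two bids shown on the data slide, so the highest bid computed here differs from the output slide, which reflects the full data set. Item 1008 is taken from the output slide to show the case with no bids. The number formatting done by NumFormat is left out.

```python
import xml.etree.ElementTree as ET

items = ET.fromstring(
    "<items>"
    "<item_tuple><itemno>1008</itemno><description>Broken Bicycle</description></item_tuple>"
    "<item_tuple><itemno>1001</itemno><description>Red Bicycle</description></item_tuple>"
    "<item_tuple><itemno>1002</itemno><description>Motorcycle</description></item_tuple>"
    "</items>"
)
bids = ET.fromstring(
    "<bids>"
    "<bid_tuple><userid>U02</userid><itemno>1001</itemno><bid>35</bid></bid_tuple>"
    "<bid_tuple><userid>U04</userid><itemno>1001</itemno><bid>40</bid></bid_tuple>"
    "</bids>"
)

rows = []
for i in items.findall(".//item_tuple"):           # FOR $i IN ...
    if "Bicycle" not in i.findtext("description"):  # WHERE contains(...)
        continue
    # LET $b := ...//bid_tuple[itemno = $i/itemno]  (b is a list, possibly empty)
    b = [
        t for t in bids.findall(".//bid_tuple")
        if t.findtext("itemno") == i.findtext("itemno")
    ]
    # IF ($b) THEN the highest bid ELSE nothing
    high_bid = max(float(t.findtext("bid")) for t in b) if b else None
    rows.append((i.findtext("itemno"), i.findtext("description"), high_bid))

rows.sort(key=lambda r: r[0])                       # SORTBY (itemno)
print(rows)
```

Note that the explicit float() call is where this sketch does by hand what the query leaves to implicit coercion.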
                       Output from Q2
          <result>
                   <item_tuple>
                    <itemno> 1001 </itemno>
                    <description> Red Bicycle </description>
                    <high_bid> 55 </high_bid>
                   </item_tuple>
                   <item_tuple>
                    <itemno> 1003 </itemno>
                    <description> Old Bicycle </description>
                    <high_bid> 20 </high_bid>
                   </item_tuple>
                   <item_tuple>
                    <itemno> 1007 </itemno>
                    <description> Racing Bicycle </description>
                    <high_bid> 225 </high_bid>
                   </item_tuple>
                   <item_tuple>
                    <itemno> 1008 </itemno>
                    <description> Broken Bicycle </description>
                   </item_tuple>
          </result>
                             Query Q3
Find cases where a user with a rating worse than "C" (alphabetically
greater) offers an item with a reserve price of more than 1000.

<result>
  (
  FOR $u IN document("users.xml")//user_tuple,
     $i IN document("items.xml")//item_tuple
  WHERE $u/rating GT 'C'
    AND $i/reserve_price GT 1000
    AND $i/offered_by = $u/userid
  RETURN
    <warning>
      <user_name>$u/name/text()</user_name>,
      <user_rating>$u/rating/text()</user_rating>,
      <item_description>$i/description/text()</item_description>,
      $i/reserve_price
    </warning>
  )
</result>
• Comparing sets with singletons: same rules as in XPath? In this
  case the DTD gives uniqueness
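
The two FOR clauses form a join, which in Python is a comprehension with two generators. The handout does not show users.xml, so the user tuples below (and item 9001) are entirely hypothetical; only the element names are taken from the query. Ratings are compared as strings and reserve prices as numbers.

```python
import xml.etree.ElementTree as ET

# Hypothetical users: the element names follow the query, the values do not
# come from the handout.
users = ET.fromstring(
    "<users>"
    "<user_tuple><userid>U01</userid><name>User A</name><rating>B</rating></user_tuple>"
    "<user_tuple><userid>U02</userid><name>User B</name><rating>D</rating></user_tuple>"
    "</users>"
)
# Items 1001 and 1002 follow the data slide; item 9001 is made up.
items = ET.fromstring(
    "<items>"
    "<item_tuple><itemno>1001</itemno><description>Red Bicycle</description>"
    "<offered_by>U01</offered_by><reserve_price>40</reserve_price></item_tuple>"
    "<item_tuple><itemno>1002</itemno><description>Motorcycle</description>"
    "<offered_by>U02</offered_by><reserve_price>500</reserve_price></item_tuple>"
    "<item_tuple><itemno>9001</itemno><description>Hypothetical Item</description>"
    "<offered_by>U02</offered_by><reserve_price>2000</reserve_price></item_tuple>"
    "</items>"
)

warnings = [
    {
        "user_name": u.findtext("name"),
        "user_rating": u.findtext("rating"),
        "item_description": i.findtext("description"),
        "reserve_price": i.findtext("reserve_price"),
    }
    for u in users.findall(".//user_tuple")            # FOR $u IN ...
    for i in items.findall(".//item_tuple")            #     $i IN ...
    if u.findtext("rating") > "C"                      # string comparison
    and float(i.findtext("reserve_price")) > 1000      # numeric comparison
    and i.findtext("offered_by") == u.findtext("userid")
]

print(warnings)
```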

             Quilt -- Attributes and IDs
<?xml version="1.0" ?>
<!DOCTYPE census [
 <!ELEMENT census (person*)>
 <!ELEMENT person (person*)>
 <!ATTLIST person
    name ID       #REQUIRED
    spouse IDREF #IMPLIED
    job CDATA #IMPLIED >
]>

<census>
 <person name = "Bill" job = "Teacher">
  <person name = "Joe" job = "Painter" spouse = "Martha">
    <person name = "Sam" job = "Nurse">
      <person name = "Fred" job = "Senator" spouse = "Jane">
      </person>
    </person>
    <person name = "Karen" job = "Doctor" spouse = "Steve">
    </person>
  </person>
  <person name = "Mary" job = "Pilot">
    <person name = "Susan" job = "Pilot" spouse = "Dave">
    </person>
  </person>
 </person>
 <person name = "Frank" job = "Writer">
  <person name = "Martha" job = "Programmer" spouse = "Joe">
    <person name = "Dave" job = "Athlete" spouse = "Susan">
    </person>
  </person>
   ...
 </person>
</census>
                          Query Q1

Find Martha's spouse:

FOR $m IN document("census.xml")//person[@name="Martha"]
RETURN shallow($m/@spouse->{person@name})



   • The shallow function strips an element of its subelements.
   • -> : dereferencing.
   • {person@name} : a hack. Kweelt does not read the DTD.
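
Dereferencing and shallow can be imitated in Python with a name index, as a sketch of the idea only. The census fragment is an abridged copy of the sample data; deref and by_name are hypothetical helpers standing in for @spouse->{person@name}, and this shallow is an approximation that copies the tag, attributes and text but none of the subelements.

```python
import xml.etree.ElementTree as ET

# Abridged fragment of the census data.
census = ET.fromstring(
    '<census>'
    '<person name="Bill" job="Teacher">'
    '<person name="Joe" job="Painter" spouse="Martha"/>'
    '</person>'
    '<person name="Frank" job="Writer">'
    '<person name="Martha" job="Programmer" spouse="Joe">'
    '<person name="Dave" job="Athlete" spouse="Susan"/>'
    '</person>'
    '</person>'
    '</census>'
)

# Index person elements by their name attribute, playing the role of the
# ID declaration that is not read from the DTD.
by_name = {p.get("name"): p for p in census.iter("person")}


def deref(name):
    """Follow a reference by name; returns None if there is no such person."""
    return by_name.get(name)


def shallow(elem):
    """Copy an element without its subelements."""
    copy = ET.Element(elem.tag, dict(elem.attrib))
    copy.text = elem.text
    return copy


m = census.find(".//person[@name='Martha']")
spouse = shallow(deref(m.get("spouse")))
print(spouse.get("name"), spouse.get("job"), len(spouse))
```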




                       Query Q6

  Find Bill's grandchildren.


  <result>
    (
    FOR $b IN document("census.xml")//person[@name = "Bill"] ,
      $c IN $b/person | $b/@spouse->{person@name}/person ,
      $g IN $c/person | $c/@spouse->{person@name}/person
    RETURN
      shallow($g)
    )
  </result>
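
A Python sketch of the grandchildren query over the census data as shown (the part elided by "..." is simply absent). The helper children_with_spouse is hypothetical: it stands in for $p/person | $p/@spouse->{person@name}/person, and it appends the spouse's children after the person's own rather than re-sorting into document order.

```python
import xml.etree.ElementTree as ET

census = ET.fromstring("""
<census>
 <person name="Bill" job="Teacher">
  <person name="Joe" job="Painter" spouse="Martha">
   <person name="Sam" job="Nurse">
    <person name="Fred" job="Senator" spouse="Jane"/>
   </person>
   <person name="Karen" job="Doctor" spouse="Steve"/>
  </person>
  <person name="Mary" job="Pilot">
   <person name="Susan" job="Pilot" spouse="Dave"/>
  </person>
 </person>
 <person name="Frank" job="Writer">
  <person name="Martha" job="Programmer" spouse="Joe">
   <person name="Dave" job="Athlete" spouse="Susan"/>
  </person>
 </person>
</census>
""")

by_name = {p.get("name"): p for p in census.iter("person")}


def children_with_spouse(p):
    """Children of p together with the children of p's spouse, no duplicates."""
    kids = list(p.findall("person"))
    spouse = by_name.get(p.get("spouse"))  # None if no spouse or not in the data
    if spouse is not None:
        kids += [k for k in spouse.findall("person") if k not in kids]
    return kids


bill = census.find(".//person[@name='Bill']")
grandchildren = [
    g.get("name")                      # shallow($g) reduced to the name here
    for c in children_with_spouse(bill)
    for g in children_with_spouse(c)
]
print(grandchildren)
```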



                   Status of XML types
• DTDs -- widely used, but limited
     – lack of base types
     – untyped pointers (IDs and IDREFs)
     – no tuple types (hence no record subtyping or inheritance)
• XML-schema -- lots of hoopla, but
     – not stable
     – too complex
• Others: RDF (not really types for XML), SOX,
  Relax, Schematron
• Opinions:
     – None of these is good for database design.
     – Something new is needed (some core of XML-schema)


        Status of XML Query languages
• None of them are really typed (by a DTD or
  anything else).
• Type errors show up as empty answers
• XML-QL probably the most elegant, but too
  powerful.
• XSL and descendants are working (in IE 5)
• Quilt -- nice extension of XPath, but XPath is
  quite complex.
• Nothing like an “algebra” for any of these (though
  some ideas are now emerging)
• Nothing like database optimization yet exists.
• Do we need something simpler?