					                          CIS 550

                   Handout 7 -- XPATH and Quilt

CIS550 Handout 6
                            XPath
• Primary goal: to permit access to some nodes from a given document
• XPath main construct: axis navigation
• An XPath path consists of one or more navigation steps,
  separated by /
• A navigation step is a triplet: axis + node-test + list of
  predicates
• Examples
    – /descendant::node()/child::author
    – /descendant::node()/child::author[parent/attribute::booktitle = “XML”][2]

• XPath also offers some shortcuts
    – no axis means child
    – // means /descendant-or-self::node()/
• XPath/XSL-T quickref

             XPath - child axis navigation
• author is shorthand for child::author. Examples:
     – aaa -- all the child nodes labeled aaa (1,3)
     – aaa/bbb -- all the bbb grandchildren of aaa children (4)
     – */bbb -- all the bbb grandchildren of any child (4,6)

     [Figure: a context node with children 1 (aaa), 2 (ccc) and 3 (aaa);
      the grandchildren are 4 (bbb), 5 (aaa), 6 (bbb) and 7 (ccc).]

     – . -- the context node
     – / -- the root node
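Python's standard xml.etree.ElementTree module implements a limited subset of XPath in findall(), enough to try the child-axis examples above. A minimal sketch: the slide does not say which node each grandchild hangs under, so the tree below is a hypothetical arrangement chosen only because it reproduces the listed answers; the <ctx> element stands in for the context node.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree: one arrangement of nodes 1-7 consistent with the answers
# on the slide (the parent of each grandchild is an assumption).
DOC = """
<ctx>
  <aaa id="1"><bbb id="4"/><aaa id="5"/></aaa>
  <ccc id="2"><bbb id="6"/></ccc>
  <aaa id="3"><ccc id="7"/></aaa>
</ctx>
"""

ctx = ET.fromstring(DOC)  # plays the role of the context node


def ids(path):
    """Evaluate a relative child-axis path from the context node; return node ids."""
    return [int(e.get("id")) for e in ctx.findall(path)]


print(ids("aaa"))      # child nodes labeled aaa            -> [1, 3]
print(ids("aaa/bbb"))  # bbb grandchildren via aaa children -> [4]
print(ids("*/bbb"))    # bbb grandchildren via any child    -> [4, 6]
```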

     XPath - child axis navigation (cont)
     – /doc -- all the doc children of the root
     – ./aaa -- all the aaa children of the context node
       (equivalent to aaa)
     – text() -- all the text children of the context node
     – node() -- all the children of the context node (includes
       text nodes, but not attribute nodes: attributes are not
       children in the XPath data model)
     – .. -- parent of the context node
     – .// -- the context node and all its descendants
     – // -- the root node and all its descendants
     – //para -- all the para nodes in the document
     – //text() -- all the text nodes in the document
     – @font -- the font attribute node of the context node
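ElementTree does not implement every form in this list (there is no text() step, and elements keep no parent pointer), but each one can be approximated in plain Python. The sketch below uses a hypothetical document; it also makes the data-model point visible that attributes sit beside the children rather than among them.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, used only for illustration.
DOC = """<doc>
  <chapter font="serif">
    <para>first <b>bold</b> text</para>
    <section><para>nested</para></section>
  </chapter>
</doc>"""

root = ET.fromstring(DOC)         # the <doc> element
chapter = root.find("chapter")    # treat <chapter> as the context node

# .//para -- para elements among the descendants of the context node
print(len(chapter.findall(".//para")))      # 2

# //para -- every para in the document (iter() visits root and all descendants)
print(len(list(root.iter("para"))))         # 2

# @font -- attributes live in a separate dict, not among the children
print(chapter.get("font"))                  # serif
print([c.tag for c in chapter])             # ['para', 'section']

# .. -- elements carry no parent pointer, so build a child -> parent map
parent = {c: p for p in root.iter() for c in p}
print(parent[chapter].tag)                  # doc

# //text() -- the text pieces of the whole document, in document order
print("".join(root.itertext()).split())     # ['first', 'bold', 'text', 'nested']
```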

     – [2] -- the second child node of the context node
     – chapter[5] -- the fifth chapter child of the context node
     – [last()] -- the last child node of the context node
     – chapter[title=“introduction”] -- the chapter children of the
       context node that have one or more title children whose
       string-value is “introduction” (the string-value is the
       concatenation of all the text on descendant text nodes)
     – person[.//firstname = “joe”] -- the person children of the
       context node that have in their descendants a firstname
       element with string-value “joe” (comparison is case-sensitive)
     – From the XPath specification:
       NOTE: If $x is bound to a node set then $x = “foo” does not mean
       the same as not ($x != “foo”) ...
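Two points above can be tried directly: positional and string-value predicates (which ElementTree's findall() supports in a limited form), and the existential meaning of comparing a node-set with a string, which is why $x = "foo" and not($x != "foo") differ. A minimal sketch; the document and the helper names eq/ne are hypothetical, and node-sets are modelled as Python lists of string-values.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; <book> is the context node.
DOC = """<book>
  <chapter><title>Preface</title></chapter>
  <chapter><title>Introduction</title></chapter>
  <chapter><title>intro<i>duction</i></title></chapter>
</book>"""
book = ET.fromstring(DOC)


def titles(chapters):
    # string-value of each title: concatenation of all descendant text
    return ["".join(c.find("title").itertext()) for c in chapters]


print(titles(book.findall("chapter[2]")))       # ['Introduction']
print(titles(book.findall("chapter[last()]")))  # ['introduction']
# [title='...'] compares the full text content (case-sensitive), so only
# the third chapter matches, even though its text is split across nodes.
print(len(book.findall("chapter[title='introduction']")))  # 1


# XPath 1.0 compares a node-set with a string existentially:
#   $x = "foo"   is true if SOME node in $x has string-value "foo"
#   $x != "foo"  is true if SOME node in $x has a different string-value
def eq(node_set, s):
    return any(v == s for v in node_set)


def ne(node_set, s):
    return any(v != s for v in node_set)


x = ["foo", "bar"]                        # string-values of a two-node set
print(eq(x, "foo"), not ne(x, "foo"))     # True False
print(eq([], "foo"), not ne([], "foo"))   # False True
```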

             Unions of Path Expressions
• employee | consultant -- the union of the employee and
  consultant nodes that are children of the context node
• For some reason person/(employee | consultant) -- as in regular
  path expressions -- is not allowed
• However, person/node()[boolean(self::employee | self::consultant)] is allowed
• From the XPATH specification:
   – The boolean function converts its argument to a boolean as follows:
          • a number is true if and only if it is neither positive or negative zero
            nor NaN
          • a node-set is true if and only if it is non-empty
          • a string is true if and only if its length is non-zero
          • an object of a type other than the four basic types is converted to a
            boolean in a way that is dependent on that type
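The conversion rules quoted above fit in a few lines. This is a minimal sketch of XPath 1.0 boolean() for the four basic types, under the assumption (made only for this sketch) that a node-set is modelled as a Python list; other types are rejected rather than converted.

```python
import math


def xpath_boolean(obj):
    """Sketch of XPath 1.0 boolean(); node-sets are modelled as Python lists."""
    if isinstance(obj, bool):            # boolean: unchanged (check before int)
        return obj
    if isinstance(obj, (int, float)):    # number: neither +/-0 nor NaN
        return not (obj == 0 or math.isnan(obj))
    if isinstance(obj, str):             # string: length is non-zero
        return len(obj) > 0
    if isinstance(obj, list):            # node-set: non-empty
        return len(obj) > 0
    raise TypeError("conversion of other types is type-dependent")


print(xpath_boolean(-0.0), xpath_boolean(float("nan")),
      xpath_boolean("0"), xpath_boolean([]))   # False False True False
```

Note the contrast between the number 0 (false) and the string "0" (true); in the predicate person/node()[boolean(...)], the union is a node-set, so the test simply asks whether it is non-empty.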

                       Axis navigation
• So far, nearly all our expressions have moved us down the tree by
  moving to child nodes. Exceptions were
     –   . -- stay where you are
     –   / -- go to the root
     –   // -- all descendants of the root
     –   .// -- all descendants of the context node
• All other expressions have been abbreviations for child::…
  e.g. child::para. child:: is an example of an axis
• XPath has thirteen axes: ancestor, ancestor-or-self, attribute,
  child, descendant, descendant-or-self, following, following-
  sibling, namespace, parent, preceding, preceding-sibling, self
     – Some of these (self, parent) contain at most one node, others
       describe sequences of nodes.

                       XPath Navigation Axes
                                      (merci, Arnaud)

[Figure: the navigation axes around a context node, showing preceding-sibling
 and following-sibling, and preceding and following.]

               XPath abbreviated syntax

          (nothing)       child::
          @               attribute::
          //              /descendant-or-self::node()/
          .               self::node()
          .//             self::node()/descendant-or-self::node()/
          ..              parent::node()
          /               (document root)
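The table is a purely syntactic rewrite, so it can be applied mechanically. A minimal sketch (the function names expand and expand_step are hypothetical): it splits a location path on "/", treats the empty step produced by "//" as descendant-or-self::node(), and maps each remaining step through the table. It deliberately does not handle predicates that contain "/", unions, or function calls.

```python
def expand_step(step: str) -> str:
    """Unabbreviate one step of a location path."""
    if step == "":                  # the empty step between the two slashes of //
        return "descendant-or-self::node()"
    if step == ".":
        return "self::node()"
    if step == "..":
        return "parent::node()"
    if step.startswith("@"):
        return "attribute::" + step[1:]
    if "::" in step:                # already has an explicit axis
        return step
    return "child::" + step         # no axis means child


def expand(path: str) -> str:
    """Rewrite a simple abbreviated XPath 1.0 path into unabbreviated form."""
    if path == "/":
        return "/"
    parts = path.split("/")
    absolute = parts[0] == ""       # a leading / means "start at the root"
    steps = parts[1:] if absolute else parts
    body = "/".join(expand_step(s) for s in steps)
    return "/" + body if absolute else body


print(expand("//para"))   # /descendant-or-self::node()/child::para
print(expand(".//para"))  # self::node()/descendant-or-self::node()/child::para
print(expand("../@font")) # parent::node()/attribute::font
```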

• Reasonably widely adopted -- in XML-Schema and
  query languages.
• Neither more nor less expressive than regular path
  expressions (the two are incomparable); e.g. XPath can't do (ab)*
• Particularly messy in some areas:
     – defining order of results
     – overloading of operations
          • e.g. [chapter/title = “Introduction”]
          • why not [ “Introduction” IN chapter/title] ?
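A regular path expression such as (ab)* needs a closure that XPath 1.0 does not provide, so it has to be computed by iterating outside XPath. A sketch, assuming a and b are element names and the closure follows child edges: start from the context node (zero repetitions) and repeatedly apply the single XPath step a/b until no new nodes appear. The function name ab_star and the test document are hypothetical.

```python
import xml.etree.ElementTree as ET


def ab_star(context):
    """Nodes reachable from context along child paths labeled (ab)*."""
    reached, frontier = [context], [context]
    seen = {context}
    while frontier:                        # fixpoint: stop when nothing is new
        nxt = []
        for n in frontier:
            for g in n.findall("a/b"):     # one more repetition of "ab"
                if g not in seen:
                    seen.add(g)
                    reached.append(g)
                    nxt.append(g)
        frontier = nxt
    return reached


root = ET.fromstring(
    '<r><a><b id="1"><a><b id="2"/></a><c><b id="x"/></c></b></a></r>')
print([n.get("id") for n in ab_star(root)])   # [None, '1', '2']
```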

            Quilt -- proposed by Chamberlin, Robie and Florescu

                     (from the authors’ slides)

• Leverage the most effective features of several existing
  and proposed query languages
• Design a small, clean, implementable language
• Cover the functionality required by all the XML Query use
  cases in a single language
• Write queries that fit on a slide
• Design a quilt, not a camel

                    Quilt/Kweelt URLs

• Quilt (the language)
• Kweelt (the implementation; used for the examples in these notes)

Quilt = XPath + “comprehension” syntax
• XML-QL:  where     <pattern> in <XML-expression>,     (bind variables)
                     <pattern> in <XML-expression>,
                     <condition>                         (use variables)
           construct <expression>

• Quilt:   for    $x in <XPath-expression>,             (bind variables)
                  $y in <XPath-expression>
           where  <condition>                            (use variables)
           return <expression>
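The for/where/return shape is the same as a comprehension: generators bind variables, a condition filters the bindings, and an expression builds each result. A rough Python analogy over a hypothetical document (this is not how Kweelt evaluates queries), with the corresponding clause noted on each line:

```python
import xml.etree.ElementTree as ET

# Hypothetical document, for illustration only.
doc = ET.fromstring("""<db>
  <x k="1"/><x k="2"/><x k="3"/>
  <y k="2"/><y k="3"/><y k="4"/>
</db>""")

pairs = [
    ET.Element("pair", k=x.get("k"))   # return <expression>: build new XML
    for x in doc.iter("x")             # for $x in <XPath-expression>
    for y in doc.iter("y")             #     $y in <XPath-expression>
    if x.get("k") == y.get("k")        # where <condition>
]
print([p.get("k") for p in pairs])     # ['2', '3']
```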
                          Examples of Quilt
     (from http://db.cis.upenn.edu/Kweelt/useCases/R/Q1.qlt )
Relational data -- two DTDs:
                   <?xml version="1.0" ?>
                   <!DOCTYPE items [
                    <!ELEMENT items         (item_tuple*)>
                    <!ELEMENT item_tuple    (itemno, description, offered_by, start_date?,
                                             end_date?, reserve_price?)>
                    <!ELEMENT itemno        (#PCDATA)>
                    <!ELEMENT description   (#PCDATA)>
                    <!ELEMENT offered_by    (#PCDATA)>
                    <!ELEMENT start_date    (#PCDATA)>
                    <!ELEMENT end_date      (#PCDATA)>
                    <!ELEMENT reserve_price (#PCDATA)>
                   ]>

                   <?xml version="1.0" ?>
                   <!DOCTYPE bids [
                    <!ELEMENT bids          (bid_tuple*)>
                    <!ELEMENT bid_tuple     (userid, itemno, bid, bid_date)>
                    <!ELEMENT userid        (#PCDATA)>
                    <!ELEMENT itemno        (#PCDATA)>
                    <!ELEMENT bid           (#PCDATA)>
                    <!ELEMENT bid_date      (#PCDATA)>
                   ]>

                                  The data
      <description>Red Bicycle</description>


                            Query 1
 FUNCTION date()
 {
   "1999-02-01"
 }
   FOR $i IN document("items.xml")//item_tuple
   WHERE $i/start_date LEQ date()
     AND $i/end_date GEQ date()
     AND contains($i/description, "Bicycle")
   RETURN
     <item_tuple>
       $i/itemno ,
       $i/description
     </item_tuple> SORTBY (itemno)

 – FUNCTION date() is a simple function definition
 – XPath expressions (e.g. document("items.xml")//item_tuple, $i/start_date)
   are embedded directly in the query
 – dates are formatted so that lexicographic ordering gives the right result
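The logic of Query 1 (filter on a date window and a substring, construct itemno/description, sort) can be mirrored in Python to see why ISO-style dates work: strings of the form YYYY-MM-DD compare lexicographically in the same order as the dates they denote. A sketch over hypothetical data shaped like the items DTD (not the real items.xml); treating a missing optional date as "condition false" is an assumption, modelled on how an XPath comparison with an empty node-set behaves.

```python
import xml.etree.ElementTree as ET

# Hypothetical items data (not the real items.xml).
ITEMS = """<items>
  <item_tuple><itemno>2</itemno><description>Blue Bicycle</description>
    <offered_by>U1</offered_by>
    <start_date>1999-01-15</start_date><end_date>1999-03-01</end_date></item_tuple>
  <item_tuple><itemno>1</itemno><description>Green Bicycle</description>
    <offered_by>U2</offered_by>
    <start_date>1999-01-20</start_date><end_date>1999-02-20</end_date></item_tuple>
  <item_tuple><itemno>3</itemno><description>Lamp</description>
    <offered_by>U1</offered_by>
    <start_date>1999-01-01</start_date><end_date>1999-12-31</end_date></item_tuple>
  <item_tuple><itemno>4</itemno><description>Tandem Bicycle</description>
    <offered_by>U3</offered_by></item_tuple>
</items>"""


def date():
    return "1999-02-01"            # same constant as FUNCTION date()


def in_progress(i):
    start, end = i.findtext("start_date"), i.findtext("end_date")
    # start_date/end_date are optional; a missing one makes the test false
    return start is not None and end is not None and start <= date() <= end


root = ET.fromstring(ITEMS)
result = sorted(
    ((i.findtext("itemno"), i.findtext("description"))
     for i in root.iter("item_tuple")
     if in_progress(i) and "Bicycle" in i.findtext("description")),
    key=lambda t: t[0],            # SORTBY (itemno)
)
print(result)   # [('1', 'Green Bicycle'), ('2', 'Blue Bicycle')]
```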

                      Output from Q1

               <?xml version="1.0" ?>
               <item_tuple>
                  <itemno> 1003 </itemno>
                  <description> Old Bicycle </description>
               </item_tuple>
               <item_tuple>
                  <itemno> 1007 </itemno>
                  <description> Racing Bicycle </description>
               </item_tuple>

                          Query Q2
For all bicycles, list the item number, description, and
highest bid (if any), ordered by item number.
  (
  FOR $i IN document("items.xml")//item_tuple
  LET $b := document("bids.xml")//bid_tuple[itemno = $i/itemno]
  WHERE contains($i/description, "Bicycle")
  RETURN
    <item_tuple>
      $i/itemno ,
      $i/description ,
      IF ($b) THEN
         <high_bid> NumFormat("#####.##", max(-1, $b/bid)) </high_bid>
      ELSE ""
    </item_tuple> SORTBY (itemno)
  )

 – use of the variable $i inside an XPath predicate: [itemno = $i/itemno]
 – lots of coercion (a node-set used as a condition in IF ($b);
   bid text treated as numbers by max)
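LET differs from FOR: it binds $b to the whole set of matching bids instead of iterating over it, and IF ($b) then tests whether that set is non-empty. A Python sketch of the same logic over hypothetical data shaped like the two DTDs (not the real files); the explicit float() conversion and the -1 seed for max stand in for the coercions the query relies on, and NumFormat is not reproduced.

```python
import xml.etree.ElementTree as ET

# Hypothetical data (not the real items.xml / bids.xml).
items = ET.fromstring("""<items>
  <item_tuple><itemno>1</itemno><description>Green Bicycle</description></item_tuple>
  <item_tuple><itemno>2</itemno><description>Blue Bicycle</description></item_tuple>
  <item_tuple><itemno>3</itemno><description>Lamp</description></item_tuple>
</items>""")
bids = ET.fromstring("""<bids>
  <bid_tuple><userid>U1</userid><itemno>1</itemno><bid>35</bid><bid_date>1999-01-20</bid_date></bid_tuple>
  <bid_tuple><userid>U2</userid><itemno>1</itemno><bid>40</bid><bid_date>1999-01-22</bid_date></bid_tuple>
  <bid_tuple><userid>U2</userid><itemno>3</itemno><bid>10</bid><bid_date>1999-01-25</bid_date></bid_tuple>
</bids>""")

out = []
for i in items.iter("item_tuple"):                    # FOR $i IN ...
    if "Bicycle" not in i.findtext("description"):   # WHERE contains(...)
        continue
    # LET $b := ...//bid_tuple[itemno = $i/itemno] -- binds the whole set
    b = [t for t in bids.iter("bid_tuple")
         if t.findtext("itemno") == i.findtext("itemno")]
    row = {"itemno": i.findtext("itemno"),
           "description": i.findtext("description")}
    if b:                                              # IF ($b): non-empty set
        row["high_bid"] = max([-1.0] + [float(t.findtext("bid")) for t in b])
    out.append(row)
out.sort(key=lambda r: r["itemno"])                    # SORTBY (itemno)
print(out)
```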
                       Output from Q2
                    <item_tuple>
                       <itemno> 1001 </itemno>
                       <description> Red Bicycle </description>
                       <high_bid> 55 </high_bid>
                    </item_tuple>
                    <item_tuple>
                       <itemno> 1003 </itemno>
                       <description> Old Bicycle </description>
                       <high_bid> 20 </high_bid>
                    </item_tuple>
                    <item_tuple>
                       <itemno> 1007 </itemno>
                       <description> Racing Bicycle </description>
                       <high_bid> 225 </high_bid>
                    </item_tuple>
                    <item_tuple>
                       <itemno> 1008 </itemno>
                       <description> Broken Bicycle </description>
                    </item_tuple>
                             Query Q3
Find cases where a user with a rating worse (alphabetically greater
than "C") offers an item with a reserve price of more than 1000.

  FOR $u IN document("users.xml")//user_tuple,
      $i IN document("items.xml")//item_tuple
  WHERE $u/rating GT 'C'
    AND $i/reserve_price GT 1000
    AND $i/offered_by = $u/userid
  RETURN

 – Comparing sets with singletons: same rules as in XPath? In this
   case the DTD gives uniqueness.
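Each comparison in the WHERE clause compares a node-set (such as $i/reserve_price) with a single value. Under XPath's rules that means "some node satisfies it", so an empty set makes the condition false; since the DTD allows at most one reserve_price per item, here the existential reading coincides with an ordinary comparison. A sketch with node-sets modelled as Python lists of string-values; the data and the helper name some are hypothetical.

```python
def some(node_set, pred):
    """XPath-style set-vs-value comparison: true if SOME node satisfies pred
    (and therefore false for an empty node-set)."""
    return any(pred(v) for v in node_set)


# Hypothetical data (not the real users.xml / items.xml).
users = [
    {"userid": ["U1"], "rating": ["B"]},
    {"userid": ["U2"], "rating": ["D"]},
]
items = [
    {"itemno": ["1"], "offered_by": ["U2"], "reserve_price": ["1500"]},
    {"itemno": ["2"], "offered_by": ["U2"], "reserve_price": []},  # optional, absent
    {"itemno": ["3"], "offered_by": ["U1"], "reserve_price": ["2000"]},
]

matches = [
    (u["userid"][0], i["itemno"][0])
    for u in users                                           # FOR $u IN ...
    for i in items                                           #     $i IN ...
    if some(u["rating"], lambda r: r > "C")                  # $u/rating GT 'C'
    and some(i["reserve_price"], lambda p: float(p) > 1000)  # GT 1000
    and some(i["offered_by"],                                # set = set
             lambda o: some(u["userid"], lambda uid: o == uid))
]
print(matches)   # [('U2', '1')]
```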

             Quilt -- Attributes and IDs
<?xml version="1.0" ?>
<!DOCTYPE census [
 <!ELEMENT census (person*)>
 <!ELEMENT person (person*)>
 <!ATTLIST person
    name   ID    #REQUIRED
    spouse IDREF #IMPLIED
    job    CDATA #IMPLIED >
]>

<person name = "Bill" job = "Teacher">
  <person name = "Joe" job = "Painter" spouse = "Martha">
    <person name = "Sam" job = "Nurse">
      <person name = "Fred" job = "Senator" spouse = "Jane">
    <person name = "Karen" job = "Doctor" spouse = "Steve">
  <person name = "Mary" job = "Pilot">
    <person name = "Susan" job = "Pilot" spouse = "Dave">
<person name = "Frank" job = "Writer">
  <person name = "Martha" job = "Programmer" spouse = "Joe">
    <person name = "Dave" job = "Athlete" spouse = "Susan">
                          Query Q1

Find Martha's spouse:

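FOR $m IN document("census.xml")//person[@name="Martha"]
RETURN shallow($m/@spouse->{person@name})

 – The shallow function strips an element of its subelements.
 – Dereferencing: @spouse->{person@name} follows the spouse IDREF to the
   person element whose name attribute matches it. A hack: Kweelt does
   not read the DTD, so the query must say which attribute is the ID.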
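Without DTD support, dereferencing an IDREF amounts to building an index on the ID attribute yourself, which is essentially what {person@name} tells Kweelt to do. A sketch over a hypothetical, simplified census; shallow here is our own helper that copies the tag and attributes but drops subelements (and text), following the slide's description rather than Kweelt's actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified census (not the full data on the slide).
census = ET.fromstring("""<census>
  <person name="Joe" job="Painter" spouse="Martha">
    <person name="Sam" job="Nurse"/>
  </person>
  <person name="Martha" job="Programmer" spouse="Joe"/>
</census>""")

# Index on the ID attribute: the DTD is not consulted, so we name it ourselves.
by_name = {p.get("name"): p for p in census.iter("person")}


def shallow(elem):
    """Copy of elem with its attributes but without subelements."""
    return ET.Element(elem.tag, dict(elem.attrib))


# FOR $m IN //person[@name="Martha"] RETURN shallow($m/@spouse->{person@name})
results = [shallow(by_name[m.get("spouse")])
           for m in census.findall(".//person[@name='Martha']")]
print([(r.get("name"), len(r)) for r in results])   # [('Joe', 0)]
```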

                       Query Q6

  Find Bill's grandchildren.

    FOR $b IN document("census.xml")//person[@name = "Bill"] ,
      $c IN $b/person | $b/@spouse->{person@name}/person ,
      $g IN $c/person | $c/@spouse->{person@name}/person
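Query Q6 counts a person's children as their own person children plus those of their spouse, reached through the IDREF. A Python sketch of the same two-level navigation over a hypothetical, simplified census; the union removes duplicates as XPath's | does, but re-sorting into document order is omitted. The helper name children is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified census (not the full data on the slide).
census = ET.fromstring("""<census>
  <person name="Bill">
    <person name="Joe" spouse="Martha"><person name="Sam"/></person>
    <person name="Mary"/>
  </person>
  <person name="Frank">
    <person name="Martha" spouse="Joe"><person name="Dave"/></person>
  </person>
</census>""")
by_name = {p.get("name"): p for p in census.iter("person")}


def children(p):
    """$p/person | $p/@spouse->{person@name}/person, duplicates removed."""
    kids = p.findall("person")
    spouse = by_name.get(p.get("spouse"))   # None when there is no spouse
    if spouse is not None:
        kids += spouse.findall("person")
    return list(dict.fromkeys(kids))        # union: each node at most once


bill = census.find(".//person[@name='Bill']")
grandchildren = [g for c in children(bill) for g in children(c)]
print([g.get("name") for g in grandchildren])   # ['Sam', 'Dave']
```

Without the spouse branch, Dave (Martha's child) would be missed, since he is not a descendant of Bill in the document tree.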

                   Status of XML types
• DTDs -- widely used, but limited
     – lack of base types
     – untyped pointers (IDs and IDREFs)
     – no tuple types (hence no record subtyping or inheritance)
• XML-schema -- lots of hoopla, but
     – not stable
     – too complex
• Others: RDF (not really types for XML), SOX,
  RELAX, Schematron
• Opinions:
     – None of these is good for database design.
     – Something new is needed (some core of XML-schema)

        Status of XML Query languages
• None of them are really typed (by a DTD or
  anything else).
• Type errors show up as empty answers
• XML-QL probably the most elegant, but too
• XSL and descendants are working (in IE 5)
• Quilt -- nice extension of XPath, but XPath is
  quite complex.
• Nothing like an “algebra” for any of these (though
  some ideas are now emerging)
• Nothing like database optimization yet exists.
• Do we need something simpler?
