					 Introduction to XQuery
     Bob DuCharme
    www.snee.com/bob
      bob@snee.com
these slides: www.snee.com/xml
            What is XQuery?

“A query language that uses the structure of
XML intelligently can express queries across
all these kinds of data, whether physically
stored in XML or viewed as XML via
middleware. This specification describes a
query language called XQuery, which is
designed to be broadly applicable across
many types of XML data sources.”


“XQuery 1.0: An XML Query Language”
W3C Working Draft
                  History

• February 1998: XML (Rec)
• November 1999: XSLT 1.0, XPath 1.0 (Recs)
• (as of 8 June 2005): XPath 2.0, XSLT 2.0,
  XQuery 1.0 in “last call Working Draft”
  status
• Steps for a W3C “standard”:
   – Working Draft
   – Last Call Working Draft
   – Candidate Recommendation
   – Proposed Recommendation
   – Recommendation
     input1.xml sample document



<doc>
  <p>This is a sample file.</p>
  <p>This line <emph>really</emph> has an inline
element.</p>
  <p>This line doesn't.</p>
  <p>Do <emph>you</emph> like inline elements?</p>
</doc>
                Our first query



Querying from the command line:
java net.sf.saxon.Query "{doc('input1.xml')//p[emph]}"


Result:
<?xml version="1.0" encoding="UTF-8"?>
<p>This line <emph>really</emph> has an inline
  element.</p>
<p>Do <emph>you</emph> like inline elements?</p>
         Query stored in a file

• xq1.xqy:
(: Here is an XQuery comment. :)
doc('input1.xml')//p[emph]


• Executing it:
java net.sf.saxon.Query xq1.xqy
      Simplifying the command line

• Linux shell script xquery :

  java net.sf.saxon.Query "$@"


• Windows batch file xquery.bat :

  java net.sf.saxon.Query %*


  (assuming saxon8.jar is in classpath)

• Executing either:

  xquery xq1.xqy
    Data for more serious examples

• RecipeML: DTD and documentation
  http://www.formatdata.com/recipeml


• Squirrel's RecipeML Archive
  http://dsquirrel.tripod.com/recipeml/indexrecipes2.html



• My sample: 294 files
         RecipeML: typical structure
<recipeml version="0.5">
  <recipe>

   <head>
     <title>Walnut Vinaigrette</title>
     <categories><cat>Dressings</cat></categories>
     <yield>1</yield>
   </head>

   <ingredients>
     <ing>
       <amt><qty>1</qty><unit>cup</unit></amt>
       <item>Canned No Salt Chicken</item></ing>
     <!-- more ing elements -->
   </ingredients>

   <directions>
     <step>Bring chicken broth to a boil.</step>
     <!-- more step elements -->
   </directions>

  </recipe>
</recipeml>
     Saxon and collection() function

• The function's argument names a catalog
  document in this format:

<collection>
 <doc href="_Band__Sloppy_Joes.xml"/>
 <doc href="_Cheese__Fricadelle.xml"/>
 <!-- more doc elements... -->
 <doc href="Walton_Mountain_Coffee_Cake.xml"/>
 <doc href="Walty's_Dressing.xml"/>
 <doc href="Wan_Tan_(Wonton).xml"/>
</collection>
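
A quick way to check that the catalog is being picked up is to count what it delivers. The query below is a minimal sketch, assuming the catalog is stored at recipeml/docs.xml (the path used in the queries on the following slides) and that each listed file has a recipeml root element; the summary element name is made up for illustration.

```xquery
(: Count the documents listed in the catalog and the recipes they contain.
   The <summary> element is a hypothetical wrapper, not part of RecipeML. :)
let $docs := collection('recipeml/docs.xml')
return
  <summary documents="{count($docs)}"
           recipes="{count($docs/recipeml/recipe)}"/>
```

If the documents count is zero, the catalog path or the href values are the first things to check.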
          Looking for some sugar



collection('recipeml/docs.xml')/recipeml/
  recipe/head/title
  [//ingredients/ing/item[contains(.,'sugar')]]
         A more SQL-like approach



for $ingredient in collection('recipeml/docs.xml')//
          ingredients/ing/item[contains(.,'sugar')]
     return $ingredient/../../../head/title
         Outputting well-formed XML


<sweets>
  {
      let $target := 'sugar'


      for $ingredient in collection('recipeml/docs.xml')//
               ingredients/ing/item[contains(., $target)]
      return $ingredient/../../../head/title
  }
</sweets>
            FLWOR expressions
•   for
•   let
•   where
•   order by
•   return

"a FLWOR expression ... supports iteration
   and binding of variables to intermediate
   results. This kind of expression is often
   useful for computing joins between two or
   more documents and for restructuring
   data."
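
To see the five clauses working together, here is a small sketch against the recipe collection. It assumes the catalog at recipeml/docs.xml and at most one yield element per recipe; number() is used so that a yield that is missing or not numeric becomes NaN and is simply filtered out rather than raising an error.

```xquery
(: Titles of recipes that serve more than 20, largest yield first. :)
for $recipe in collection('recipeml/docs.xml')/recipeml/recipe   (: iterate :)
let $yield := number($recipe/head/yield)                         (: bind an intermediate result :)
where $yield > 20                                                (: filter :)
order by $yield descending                                       (: sort :)
return $recipe/head/title                                        (: build the result :)
```

The clauses must appear in this order, and only return is required in every FLWOR expression, together with at least one for or let clause.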
       Extracting subsets: XPath vs. FLWOR approach
• Get the title element for each recipe whose yield is
  greater than 20:


collection('recipeml/docs.xml')/recipeml/
  recipe/head/title[../yield > 20]


• Go through all the documents in the collection, and
  for any with a yield of more than 20, get the title:

for $doc in
  collection('recipeml/docs.xml')/recipeml
where $doc/recipe/head/yield > 20
return $doc/recipe/head/title
       Doing more with the for clause variable
(: Create an HTML page linking to recipes
   that serve more than 20 people.              :)


<html><head><title>Food for a Crowd</title></head>
<body>
  <h1>Food for a Crowd</h1>
  {
      for $doc in collection('recipeml/docs.xml')
      where $doc/recipeml/recipe/head/yield > 20
      return
         <p><a href="{document-uri($doc)}">
         {$doc/recipeml/recipe/head/title/text()}
         </a></p>
  }
</body></html>
  Calling functions from a let clause

(: Which recipe(s) serve the most people?   :)


let $maxYield :=
 max(collection('recipeml/docs.xml')/recipeml/
 recipe/head/yield)


return collection('recipeml/docs.xml')/recipeml/
  recipe[head/yield = $maxYield]
    distinct-values and order by
(: A sorted list of the unique
  ingredients in the recipe collection,
    with URLs to link to the recipes. :)

<ingredients>
{
  for $ingr in
distinct-values( collection('recipeml/docs.xml')/
   recipeml/recipe/ingredients/ing/item )
  order by $ingr
  return
    <item name="{$ingr}">
    {
      for $doc in
         collection('recipeml/docs.xml')
      where $doc/recipeml/recipe/
            ingredients/ing/item = $ingr
   distinct-values and order by, continued

      return
        <title url="{document-uri($doc)}">
        {$doc/recipeml/recipe/head/title/text()}
        </title>
    }
    </item>
}
</ingredients>
                      Excerpt from output
<ingredients>
  <!-- some item elements removed -->
  <item name=" (12-oz) tomato paste ">
    <title url="file:/C:/dat/recipeml/_Best_Ever__Pizza_Sauce.xml">
      "Best Ever" Pizza Sauce</title>
  </item>
  <item name=" Baking Powder">
    <title url="file:/c:/dat/recipeml/_Blondie__Brownies.xml">
      "Blondie" Brownies</title>
    <title url="file:/c:/dat/recipeml/Walnut_Pound_Cake.xml">
      Walnut Pound Cake</title>
  </item>
  <item name=" Baking Soda ">
    <title url="file:/c:/dat/recipeml/_Faux__Sourdough.xml">
      "Faux" Sourdough</title>
  </item>
  <item name=" Baking potatoes ">
    <title url="file:/c:/dat/recipeml/_Indian_Chili_.xml">
      "Indian Chili"</title>
  </item>
  <item name=" Baking powder ">
    <title url="file:/c:/dat/recipeml/_Best__Apple_Nut_Pudding.xml">
      "Best" Apple Nut Pudding</title>
    <title url="file:/c:/dat/recipeml/_Gold_Room__Scones.xml">
      "Gold Room" Scones</title>
    <title url="file:/c:/dat/recipeml/_Outrageous_Chocolate_Chipper.xml">
      "Outrageous" Chocolate-Oatmeal Chipper (Cooki</title>
  </item>
  <item name="Baking soda">
    <title url="file:/c:/dat/recipeml/_First__Ginger_Cookies.xml">
      "First" Ginger Molasses Cookies</title>
    <title url="file:/c:/dat/recipeml/_Foot_in_the_Cake.xml">
      "Foot in the Fire" Chocolate Cake</title>
  </item>
  <item name="Tomato paste">
    <title url="file:/C:/dat/recipeml/Crawfish_Etouff'ee.xml">
      "Frank's Place" Crawfish Etouff'ee
    </title>
    <title url="file:/C:/dat/recipeml/Hamburger____Ground_Meat_Balti.xml">
      "Hamburger" / Ground Meat Balti
    </title>
    <title url="file:/C:/dat/recipeml/Indian_Chili_.xml">
      "Indian Chili"</title>
  </item>
  <!-- some item elements removed -->
</ingredients>
  RecipeML: varying markup richness
• One way to do it:

  <ing><item>
    (12-oz) tomato paste
  </item></ing>


• Another way:
  <ing>
    <amt>
      <qty>12</qty>
      <unit>oz</unit>
    </amt>
    <item>tomato paste</item>
  </ing>
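
One rough way to see how often the less structured form occurs is to look for ing elements that have no amt child and whose item text starts with a parenthesis. This is only a sketch: it assumes the catalog at recipeml/docs.xml used in the other queries, and the opening-parenthesis test is a guess at how inline amounts usually look in this data.

```xquery
(: List item elements that appear to carry the amount inline:
   no sibling amt element, and text beginning with "(". :)
collection('recipeml/docs.xml')//ing[not(amt)]/item[matches(., '^\s*\(')]
```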
     Normalizing data with declared functions
(: A sorted list of the unique ingredients in
   the recipe collection, with URLs to link to them.
   Ingredient names get normalized by functions
   declared in the query prolog. :)

declare namespace sn = "http://www.snee.com/ns/misc/" ;

declare function sn:normIngName($ingName) as xs:string {
  (: Normalize ingredient name. :)
  (: remove parenthesized expression that may begin
      string, e.g. in "(10 ozs) Rotel diced tomatoes",
      allowing for whitespace in front of it :)
  let $normedName := replace($ingName,"^\s*\(.*?\)\s*","")
  (: convert to all lower-case :)
  let $normedName := lower-case($normedName)
  (: replace multiple spaces with a
     single one :)
  let $normedName := normalize-space($normedName)

  return $normedName
};
 Normalizing data with functions, part 2 of 3

declare function sn:normIngList($ingList) as item()* {
   (: Normalize a list of ingredient names. :)
   for $ingName in $ingList
     return sn:normIngName($ingName)
};


<ingredients>
{
  let $normIngNames :=
sn:normIngList(collection('recipeml/docs.xml')//
   ing/item)
Normalizing data with functions, part 3 of 3
for $ingr in distinct-values($normIngNames)
  order by $ingr
  return
    <item name="{$ingr}">
    {
      for $doc in
        collection('recipeml/docs.xml'),
        $i in $doc/recipeml/recipe/ingredients/ing/item
      where sn:normIngName($i) = $ingr
      return
      <title url="{document-uri($doc)}">
{$doc/recipeml/recipe/head/title/text()}
        </title>
    }
    </item>
}
</ingredients>
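
The effect of the normalization can be checked without the recipe files. The sketch below applies the same three steps (strip a leading parenthesized amount, lower-case, collapse whitespace) to four raw names taken from the earlier output excerpt. The names element is made up for illustration, and the pattern here also skips whitespace before the opening parenthesis, since the sample names start with a space.

```xquery
(: Four raw names as they appeared in the unnormalized output. :)
let $raw := (" (12-oz) tomato paste ",
             "Tomato paste",
             " Baking Powder",
             " Baking powder ")
(: Apply the three normalization steps to each name. :)
let $normed :=
  for $name in $raw
  return normalize-space(lower-case(replace($name, "^\s*\(.*?\)\s*", "")))
(: Compare the number of distinct names before and after. :)
return
  <names raw="{count(distinct-values($raw))}"
         normalized="{count(distinct-values($normed))}"/>
```

This should report 4 distinct raw names and 2 normalized ones ("tomato paste" and "baking powder"), which is the merging that the item elements in the final output depend on.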
    Specs at http://www.w3.org/tr

• XQuery 1.0: An XML Query Language
• XQuery 1.0 and XPath 2.0 Formal Semantics
• XQuery 1.0 and XPath 2.0 Data Model
• XSLT 2.0 and XQuery 1.0 Serialization
• XQuery 1.0 and XPath 2.0 Functions and
  Operators
• XML Query Use Cases
              Other resources

• eXist: http://www.exist-db.org

• MarkLogic: http://www.marklogic.com

• Mike Kay “Comparing XSLT and XQuery”:
  http://idealliance.org/proceedings/xtech05/papers/02-03-01/

• http://www.w3.org/TR:

   – XQuery Update Requirements

   – XQuery 1.0 and XPath 2.0 Full-Text

				