									XQueryP: programming for XML
        Daniela Florescu (Oracle)*
              Don Chamberlin (IBM),
              Mike Carey (BEA),
              Donald Kossmann (ETH),
              Mary Fernandez (AT&T),
              Jonathan Robie (DataDirect),
              Jerome Simeon (IBM),
              Giorgio Ghelli (Univ. Pisa)
  * This is my own opinion, not that of my company
XQuery: the beginning
 December 1998, Boston, it was really
 XML-QL was the spark
 100 people excited about the potential of
 querying the large volume of XML data that
 we all anticipated
 60 proposed articles, several languages and
 algebras, architectures
 Effort estimated at 2-3 years :)
 Many (conflicting) goals and traditions
XQuery: the influences
 “Database” community
   SQL, OQL; understood declarative processing and optimization
   Two sub-camps:
      “SQL is all you’ll ever need”
      “XML is significantly different, we need a new query language”
   Took us a long time to understand the utility of the
   “document” aspect of XML
 “Document” community
   Understood XML and SGML
   Understood the utility and usage of information markup
   Didn’t understand why the DB folks were so worried about
   the XPath 1.0 equality not being transitive
 “Functional programming” community
   Believed that functional programming was the most
   elegant programming paradigm, static typing is a must
XQuery: the present
 Proposed Recommendation in Nov. 2006
 An entire family of (consistently designed) specifications:
   Abstract XML Data Model (XDM)
   XQuery 1.0
   XPath 2.0
   XSLT 2.0
   XML Functions and Operators
   XQueryX: the XML syntax for XQuery 1.0
   Formal Semantics and XML-Schema based type system
XQuery: the successes
Almost 50 implementations
The three major databases (Oracle, SQL Server, DB2)
implement it
Used in application servers (Oracle, BEA)
Open source projects are flourishing
  Saxon is widely used, same for BerkeleyDB-XML
Enthusiastic customers
  not completely satisfied, but that’s a good sign…
Used in a variety of contexts:
  Temporary data, persistent data, streaming XML data
  Usage scenarios: data transformation, pure querying, analytics,
  publishing, etc
Classes in major universities
Flourishing research community around XML processing

Why this work, why this talk ?

 Let’s look at what we have done.

 How are users trying to use XQuery ?

 Where should we go from here ?

 What happens if we do go there ?

Plan of the talk
 Status of XQuery and its update extension
 Why do we need anything else ?
   More use case scenarios
 XQueryP: the technical proposal
 Implementation and usage experience
 XQueryP: frequent criticism
 Alternatives / related work
 Potential impact on software architectures

XQuery 1.0: a read-only language
  Functional heritage
    XQuery programs are expressions
    Expressions can be combined with full generality
    No real side-effects
  XQuery semantics was carefully crafted to
  allow optimizations:
    subexpressions can be evaluated in almost any order
    lazy evaluation is possible
    errors are allowed to be non-deterministic
    And, or are commutative, etc
  Compilers have freedom to do code rewriting
XQuery 1.0: the expressions
 Variable, constants
 Arithmetics, boolean, etc
 Function calls
 FLWOR expressions
   equivalent of SQL’s SELECT-FROM-WHERE
 if (expr) then (expr) else (expr)
 Node constructors
 Expressions fully composable
Adding updates to XQuery
 The XQuery update extension: first step to
 add real (declarative) side-effects to XQuery
 W3C Working Draft Nov. 2006

XML Updates
Primitive update expressions
  do insert <age>24</age> into $person[name="Jim"]
  do delete $book[@year<2000]
  do rename $article as "publication"
  do replace ($books/book)[1] with <book>….</book>
  do replace value of $title with "New Title"

Conditional updates
  then do delete $book/year
  else do rename $book/year as "publicationTime"

Collection-oriented updates
  for $x in $book
  where $x/year<2000
  return do rename $x as "oldBook"
XQuery update expressions
 Insert, delete, rename, etc. are normal expressions
 They are not fully composable with the rest of
 the expression language
   Semantic, not syntactic restrictions
 Distinguish the side-effecting expressions vs.
 non-side-effecting expressions
   Only in “control-flow” style expressions (FLWOR,
   typeswitch, conditionals)
   Side-effecting functions vs. read-only functions
Single snapshot program

  No side-effects are visible until the end of
  an entire XQuery program
    Not to concurrent XQuery applications
    Not to the current application either
  Database tradition
    Good: existing optimizations and code reorganizations
    remain valid
    Bad: very hard to write complex applications

What are users trying to do
with XQuery
 Process XML
 Write complex applications
 Apply computations on input values coming
 from several data sources, and then output new
 fragments of XML data as result
   Human resources
   Business data
New use case scenarios
 Implementation of Web Services
 XML Data transformation and integration
 of heterogeneous data sources (data
 Processing RSS feeds and other XML
 message streams
 Coordination of services in an SOA
 XML data cleansing or normalization
 Complex manipulations of persistent XML
Are such customers happy
with XQuery 1.0 ?
 Sure. They like XQuery.
 But are not entirely satisfied.
 Even with the update extension certain pieces of
 their application logic must still be expressed
 outside the XQuery/XML world.
 They lack support for common programming
 The friction/cost between “inside” and “outside”
 of an XML world is high
   Productivity and performance

What are they missing ?
1. The ability to preserve state during computation
      All functional programming languages have
      variable assignment !
2. The ability to “see” the results of their side-
   effects during the computation
3. The ability to invoke external side-effecting
   functions (e.g. Web Services) that cannot
   participate in snapshot semantics
4. The ability to recover (in a controlled way)
   from dynamic errors
5. The ability to model graphs
XQueryP overview
A small language extension
  More computation can be expressed directly in XQuery
  Minimize the friction between the inside and outside
  of an XML world
A big step towards helping our customers
build richer XML applications
  Supported by Oracle, BEA, DataDirect, etc
  Proposed to the W3C
Surprisingly: very small extensions can make
XQueryP a good and complete XML
processing language
 The XQueryP concrete proposal
1.   A well-defined evaluation order for XQuery expressions
     (“sequential order”)
2.   Reduce the granularity of the snapshot to each individual
     atomic update expression
3.   Adds new expressions:
        Break, Continue
4.   An error handling facility (try-catch)
5.   A way of modeling graphs in XML
6.   A way of mapping XQueryP <-> Web Services
      The entire package is proposed as an optional feature
(1) Sequential evaluation order
 Slight modification to existing rules:
    FLWOR: FLWO clauses are evaluated first; result
    in a tuple stream; then Return clause is evaluated
    in order for each tuple. Side-effects made by one
    row are visible to the subsequent rows.
    COMMA: subexpressions are evaluated in order
    (UPDATING) FUNCTION CALL: arguments are
    evaluated first before body gets evaluated

Required (only) if we add side-effects immediately
visible to the program: e.g. variable assignments or
single snapshot atomic updates; otherwise semantics
not deterministic.
(2) Reduce snapshot granularity
Today: the update snapshot is the entire query
Proposal:
  Every single atomic update expression (insert,
  delete, rename, replace) is executed and made
  effective immediately
  The effects of side-effecting external functions
  are visible immediately
Semantics is deterministic because of the
sequential evaluation order (point 1)

Sequential evaluation mode
and the FLWOR
for $x in <expression/>
let $y := <expression/>
where <expression/>
order by <expression/>
    (No side-effects are visible until here.)
return <side-effecting expression/>    (evaluated in order for each $x, $y tuple)

(3) Adding new expressions
 Block expressions
 Assignment expressions
 While expressions
 Early return in function bodies

 Only under sequential evaluation mode
Block expression
     “{“ ( BlockDecl “;”)* Expr (“;” Expr)* “}”
BlockDecl :=
       (“declare” $VarName TypeDecl? (“:=“ ExprSingle) ?)?
       (“,” $VarName TypeDecl? (“:=“ ExprSingle) ? )*
  Declare a set of updatable variables, whose scope is only
  the block expression (in order)
  Evaluate each expression (in order) and make the effects
  visible immediately
  Return the value of the last expression
Updating if body contains an updating expression
Assignment expression
   “set” $VarName “:=“ ExprSingle
   Change the value of the variable
   Variable has to be external or declared in a
   block (no let, for or typeswitch)
 Updating expression
 Semantics is deterministic because of the
 sequential evaluation order
Function and blocks
   In the function syntax we change
      EnclosedExpr /* the body */ => Block
   Note: compatible change
   We relax the restriction that a function cannot both update
   nodes and return a value
declare updating function local:prune($d as xs:integer) as
   declare $count as xs:integer := 0;
   for $m in /mail/message[date lt $d]
   return { do delete $m;
             set $count := $count + 1
}
While expression
 “while” “(“ exprSingle “)” “return” expr
   Evaluate the test condition
   If “true” then evaluate the return clause; repeat
   If “false” return the concatenation of the values
   returned by all previous evaluations of return
 Syntactic sugar, mostly for convenience
   Could be written using recursive functions
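
The semantics listed above (test, evaluate, repeat, then concatenate) can be sketched in Python. This is an illustrative model only: `while_expr`, `while_recursive` and `make_counter` are hypothetical helper names, Python lists stand in for XQuery sequences, and a small dict plays the role of an updatable block variable.

```python
def while_expr(test, body):
    """Model of `while (test) return body`: concatenate the body's results."""
    result = []
    while test():
        result.extend(body())   # each evaluation returns a sequence
    return result


def while_recursive(test, body):
    """Same behaviour written with recursion instead of a loop."""
    if not test():
        return []
    first = body()              # evaluate the body before re-testing
    return first + while_recursive(test, body)


def make_counter(limit):
    state = {"i": 0}            # updatable variable, as declared in a block

    def test():
        return state["i"] < limit

    def body():
        state["i"] += 1         # corresponds to `set $i := $i + 1`
        return [state["i"] * state["i"]]

    return test, body


loop_result = while_expr(*make_counter(4))
recursive_result = while_recursive(*make_counter(4))
```

Both versions return the same sequence, which is the sense in which the while expression is syntactic sugar.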

Break, Continue, Return
 Traditional semantics, nothing surprising
 Break (or continue) the closest FLWOR or
 WHILE iteration
 Return: early exit from a function body
 Hard(er) to implement in a “database” style
 evaluation engine
   Because of the lazy evaluation

Atomic Blocks
 “atomic” “{“ . . . “}”
   If the evaluation of Expr does not raise errors,
   then result is returned
   If the evaluation of Expr raises a dynamic error
   then no partial side-effects are performed (all
   are rolled back) and the result is the error
 Only the largest atomic scope is effective
 Note: XQuery! had a similar construct
   Snap {…} vs. atomic {…}
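
The all-or-nothing behaviour of an atomic block can be modelled in Python as follows. This is a sketch under simplifying assumptions, not how an XQueryP engine would implement it: the store is a dict, rollback is done by restoring a deep copy, and the names `atomic`, `transfer` and `AtomicError` are hypothetical.

```python
import copy


class AtomicError(Exception):
    """Stands in for a dynamic error raised inside an atomic block."""


def atomic(store, block):
    """Run block(store); on error, undo its side-effects and re-raise."""
    backup = copy.deepcopy(store)
    try:
        return block(store)
    except Exception:
        store.clear()
        store.update(backup)    # roll back all partial side-effects
        raise                   # the result of the block is the error


def transfer(store):
    store["a"] -= 50            # first side-effect succeeds
    if store["b"] + 50 > 100:
        raise AtomicError("limit exceeded")
    store["b"] += 50
    return store["a"], store["b"]


accounts = {"a": 100, "b": 80}
try:
    atomic(accounts, transfer)
    outcome = "committed"
except AtomicError:
    outcome = "rolled back"
```

After the failed call, `accounts` is unchanged even though the first update inside the block had already been applied.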
More complex example
declare updating function myNs:cumCost($projects)
  as element( )*
  declare $total-cost as xs:decimal :=0;
  for $p in $projects[year eq 2005]
      {set $total-cost := $total-cost+$p/cost;
}
Today: additional self join, or recursive function
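
The contrast between a running total kept in an updatable variable and the self-join formulation can be sketched in Python. The slide's function body is truncated, so the shape of the output (one record per 2005 project carrying the cumulative cost) is an assumption, as are the function names and the sample data.

```python
from decimal import Decimal


def cum_cost_sequential(projects):
    """One pass with an updatable variable, like `set $total-cost := ...`."""
    total = Decimal("0")
    out = []
    for p in projects:
        if p["year"] == 2005:
            total += p["cost"]
            out.append({"name": p["name"], "cumulative": total})
    return out


def cum_cost_self_join(projects):
    """Without assignment: for each project, re-sum all earlier ones."""
    selected = [p for p in projects if p["year"] == 2005]
    return [
        {
            "name": p["name"],
            "cumulative": sum(
                (q["cost"] for q in selected[: i + 1]), Decimal("0")
            ),
        }
        for i, p in enumerate(selected)
    ]


# Hypothetical sample data.
sample_projects = [
    {"name": "p1", "year": 2005, "cost": Decimal("10.5")},
    {"name": "p2", "year": 2004, "cost": Decimal("99")},
    {"name": "p3", "year": 2005, "cost": Decimal("4.5")},
]
sequential = cum_cost_sequential(sample_projects)
self_join = cum_cost_self_join(sample_projects)
```

Both functions produce the same records; the sequential version touches each project once, while the self-join version re-reads all earlier projects for every output row.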
Putting everything together:
the sequential mode
  New setter in the prolog
    “declare” “execution” “sequential”
  Granularity: query or module
  What does it mean:
    Sequential evaluation mode for expressions
    Single atomic update snapshot
    Several new updating expressions (blocks, set, while,
    break, continue)
  If the query has no side-effects, sequential mode is
  irrelevant, and traditional optimizations are still applicable
Sequential mode and optimization
 Sequential mode required to ensure deterministic
 semantics in case of visible side-effects
 In theory, more constraints on evaluation order
 imply fewer optimizations
 Decades-old tension between
   adding side-effects
   still allowing the optimizations
 Compromise to be made
   Errors might still be allowed to be non-deterministic ?
 The idea that optimization is not possible anymore is
 certainly not true
   More complex dataflow analysis and intelligence required to
   trace when side-effects are being applied, on what data,
    when two side-effects commute, etc
(4) Try-catch
Errors in XQuery 1.0, XPath 2.0, XSLT 2.0
  fn:error(err:USER0005, "Value out of range", $value)
Traditional design for try-catch
  try ( target-expr )
  catch ( $name as QName1, $desc, $obj )
       return handler-expr1
  catch ( $name as QName2, $desc, $obj )
       return handler-expr2. . .
  default ( $name, $desc, $obj )
       return general-handler-expr
  let $x := expr
      try ( <a>{ $x }</a> )
      catch (err:XQTY0024)
      return <a> {$x[self::attribute()],$x[fn:not(self::attribute())]} </a>
Try-catch (follow-up)
Semantic issues
  Definition assumes eager evaluation
  let $x := 1 div 0
       try ( <a>{ $x }</a> )
       catch (err:FOAR0001)
       return “division by zero”   THIS WILL NOT BE INVOKED !!
Optimization issues
  Almost all XQuery rewriting rules will be impacted
  Input and output cannot be streamed (pipeline breaker)
Alternative design: atomic try-catch
  If an error occurs, all side-effects are undone and the catch
  expression is evaluated
  Database style programming, high productivity
  Harder to implement, esp. external function calls….
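
The semantic issue with the `let $x := 1 div 0` example is that the outcome depends on when `$x` is evaluated. A small Python model makes this concrete. It is only an analogy: Python's `ZeroDivisionError` stands in for the XQuery dynamic error, a lambda stands in for a lazily evaluated binding, and all function names are hypothetical.

```python
def eager_program():
    """Eager evaluation: the division runs before the try block is entered."""
    x = 1 // 0                          # `let $x := 1 div 0` raises here
    try:
        return f"<a>{x}</a>"
    except ZeroDivisionError:
        return "division by zero"       # never reached under eager evaluation


def lazy_program():
    """Lazy evaluation: $x is a thunk, forced only when its value is needed."""
    x = lambda: 1 // 0                  # binding is deferred, no error yet
    try:
        return f"<a>{x()}</a>"          # the error surfaces inside the try
    except ZeroDivisionError:
        return "division by zero"


def run(program):
    """Report whether the program's own handler saw the error."""
    try:
        return program()
    except ZeroDivisionError:
        return "uncaught: error escaped the try"


eager_outcome = run(eager_program)
lazy_outcome = run(lazy_program)
```

The same source text yields two different results, so a try-catch definition has to fix the evaluation strategy, which in turn constrains the optimizer.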
(5) Invoking and coordinating Web Services
   WS are the standard way of sending and receiving XML data
   XQuery (and XSLT) are the standard way to program the XML
   We should design them consistently, natural fit
XQuery                                   Web Services
module                                   service
functions/operations                     operations
arguments                                ports
values for arguments and result: XML     value for input and output messages: XML
   XQueryP proposes:
      A standard way of importing a Web Service into an XQuery program
      A standard way of invoking a WS operation as a normal function
      A standard way of exporting an XQuery module as a Web Service
   Many XQuery implementations already support this. We have
   to agree on a standard.
     Calling Google...
import service namespace
declare execution sequential;
declare variable $result;
declare variable $query;
set $query := mxq:readLine();
set $result :=
$query, 0,\10, fn:true(), "", fn:false(), "", "UTF-8", "UTF-8");
<results query="{$query}">
     for $url in $result/resultElements/item/URL
     return data($url)
(6) Adding references to XML
XML tree, not graph
E/R model graph, not tree
Inherent tension, XML Data Model is the source of
the problem, not XQuery
  let $x := <a><b/></a> return <c>{$x/b}</c> (: copy of <b/> :)
Nodes in XDM have node identifiers
  Lifetime and scope of nodeids, implementation defined
XQueryP solution:
  fn:ref($x as node()) as xs:anyURI
  fn:deref($x as xs:anyURI) as node()
Lifetime and scope of URIs, implementation defined
Untyped references (URIs)
No changes required to:
  XML Schema, XDM Data Model, XQuery type system
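
The difference between embedding a copy of a node and holding a reference to it can be modelled in Python. This is a sketch of the idea behind `fn:ref` and `fn:deref`, not their specification: the registry, the `urn:example:node:` URI scheme and the dict-based nodes are all assumptions made for illustration.

```python
import copy
import uuid

_registry = {}   # maps reference URIs to live node objects


def ref(node):
    """Return a URI-like string identifying this exact node object."""
    for uri, known in _registry.items():
        if known is node:
            return uri                        # same node, same reference
    uri = f"urn:example:node:{uuid.uuid4()}"  # hypothetical URI scheme
    _registry[uri] = node
    return uri


def deref(uri):
    """Return the node a reference points to (identity preserved)."""
    return _registry[uri]


b = {"name": "b", "children": []}
a = {"name": "a", "children": [b]}

# Embedding by value copies the node, as an element constructor does.
c_by_copy = {"name": "c", "children": [copy.deepcopy(b)]}
# Embedding by reference stores only the URI of the original node.
c_by_ref = {"name": "c", "ref": ref(b)}

b["children"].append({"name": "new"})   # later update to the original node
```

The copy inside `c_by_copy` does not see the later update, while `deref(c_by_ref["ref"])` returns the original node with its new child, which is what allows graph-shaped data to be expressed on top of trees.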
XQueryP usage scenarios
XQueryP programs in the browsers
  We all love Ajax (the results). A pain to program. Really primitive as
  XML processing goes.
  Embedding XQueryP in browsers
  XQueryP code can take input data from WS, RSS streams, directly
  from databases
  Automatically change the XHTML of the page
XQueryP programs in the databases
  Complex data manipulation executed directly inside the database
  Takes advantage of the DB goodies, performance, scalability,
  security, etc
XQueryP programs in application servers
  Orchestration of WS calls, together with data extraction from a variety
  of data sources (applications, databases, files), and XML data
  XML data mashups
Related work
 Programming for XML:
   Extensions to other programming languages
     XLinq, ECMAScript, PHP, XJ, etc
   Extensions to XQuery
     XL, XQuery!, MarkLogic’s extension
   Re-purposing other technologies: BPEL
 Long history of adding control flow logic to
 query languages
   15 years of success of PL/SQL and others
   SQL might have failed otherwise !
 This is certainly not new research, but a
 natural evolution
Extensions to other
programming languages
Every programming language is extended with “XML support”
  Through APIs and/or native in the language
  SQL (SQL/XML), Java (XJ), C# (XLinq), Python, JavaScript
  (ECMAScript), etc
Two approaches to reach the same goal
  Language extensions are less disruptive than XQueryP
  Both can co-exist (thanks WS and SOA)
Advantages of extending XQuery:
  Standard, broader range of applications, vendor and platform independent
  More “declarative”
     Easier to automatically generate the code
     Easier to optimize automatically for large volumes of data, streaming data
     Global optimization possible (no system or language barrier)
XL (Florescu, Kossmann, 2001)
 Goal: implementation of Web Services
 Adds statements to XQuery in the same
 way PL/SQL extended SQL
 Insert, delete, etc are statements, not expressions
 Lack of composability => overloading of
     If-then-else, iteration, etc both as statements
     and expressions
XQuery! (J. Simeon, G. Ghelli, 2006)
 Same goal
 Extends XQuery in slightly different ways
 Full composability between side-effecting
 and non side-effecting expressions
 Sequential evaluation mode
 No variable assignment
 User controlled snapshot granularity
   Snap { expression }
 Snap {…} vs. atomic {…}
   Orthogonal, maybe both are needed, atomic
   more important
MarkLogic’s XML application
development platform
 Same goal, different XQuery extension
   ML view: snapshot at the level of each individual
   update expression is bad for performance
 Compromise: should we have two versions of
 insert, delete, etc.?
   One visible immediately
     Apply delete //a
   One that is delayed for a later “commit” time
     Do delete //a
 Should we have a user-controlled commit ?
Frequent criticism
“Programmers do not know how to program
   What about SQL !?
“We don’t know how to optimize a language that is
not purely declarative”
   Why is the alternative any better !?
   Take ideas from both database optimization and
   programming languages compilation, plus innovate
“If you give users variable assignment, they’ll use it (
and abuse it) !”
  Teach them not to, rewrite automatically if they (still) do
“XQuery is too complicated”

Frequent criticism (2)
 “XML Pipelines are the answer”
    Redundancy of concepts is not good
       Global optimization
 “It will never perform as well as if we write the application
 in Java + SAX”
    Maybe true today, not sure in near future
    Optimizing a single XML application vs. optimizing an
    XQuery(P) engine (i.e. all XML applications)
 “This would require users to learn a new language”
    Smooth transition, easy integration of pieces written in other
    languages (thanks WS!)
 “There are no libraries”
    Let’s build some…
    We will not need the same libraries as in Java or C#
       Different level of abstraction
       The target applications are different
Is it all we need to add to XQuery ?
  Certainly not. Outside the scope of this talk.
  More query-like extensions still needed
    Streaming and windowing (e.g. RSS processing)
  General purpose functionalities
    higher-order functions
    Dynamic dispatch
  Improve the aesthetics and usability
  Better dynamic NS support
  Semantic based search
  Standard modules and libraries
XQueryP implementations
 Prototype in Big OracleDB
   Presented at Plan-X 2005
 Prototype in BerkeleyDB-XML
   Might be open sourced (if interest)
 Open source prototype built in ETH, Zurich
   http://www.mxquery.org (Java)
   Runs on mobile phones: Java CLDC1.1; some cuts
   even run CLDC 1.0
   Eclipse Plugin available in March 2007

 Zorba C++ engine (FLWOR Foundation)
   Small footprint, performance, extensibility,
   potentially embeddable in many contexts
XQueryP Pet Projects (at ETH)
Airline Alliances
   every student programs his/her own airline
   form alliances
   experiment: do this in Java/SQL first; then in XQueryP
Public Transportation
   mobile phone computes best route (S-Bahn)
   integrate calendar, address book, ZVV, GPS
Context-sensitive Remote Control
   mote captures „clicks“ and movements
   mobile phone determines context and action (TV, garage, ..)
Lego Mindstorms
   move to warmest place in a room
Less of a toy (Oracle): XML Schema validator in XQueryP
Automatic XQueryP code
 Our goal will only be achieved if the code
 can be automatically generated, based on:
   Higher-level description (UI)
 Better chances with XQueryP than with
 any other alternative technology
   The higher level of abstraction, the easier
 Very important open research problem…
XQueryP’s potential impact
Let’s imagine the following scenario:
  OracleDB, BerkeleyDB, MySQL, all run XQueryP
  (or more, XQueryP executable can be part of a
  normal Linux installation..)
  Any HTTP server understands XQueryP, is able
  to execute it, and responds in XML
  Content managers understand XML/XQuery
  XQueryP runs in the browsers
  XQueryP runs on all kinds of mobile devices
  We manage to optimize it properly (<28ms)
  We find a good UI paradigm for automatically
  generating XQueryP code
 Potential impact, conclusion
 What will happen to the N-tiered architectures ?
 What will happen to client-server architectures ?
 What will happen to the cost of building and changing
 an application ?

 One step closer to the goal of XML technologies
   Enable the flow of digital information
   Automate the processing of information
   Decrease the cost of building and running Web applications
   Transform Web applications into a commodity

Thank you.
