XQueryP: programming for XML
Daniela Florescu (Oracle)*
Don Chamberlin (IBM),
Mike Carey (BEA),
Donald Kossmann (ETH),
Mary Fernandez (AT&T),
Jonathan Robie (DataDirect),
Jerome Simeon (IBM),
Giorgio Ghelli (Univ. Pisa)
* This is my own opinion, not that of my company
XQuery: the beginning
December 1998, Boston. It was really cold…
XML-QL was the spark
100 people excited about the potential of
querying the large volume of XML data that
we all anticipated
60 proposed articles, several languages and
algebras, architectures
Effort estimated at 2-3 years :)
Many (conflicting) goals and traditions
XQuery: the influences
“Database” community
SQL, OQL, understood declarative processing and
optimization
Two sub-camps:
“SQL is all you’ll ever need”
“XML is significantly different, we need a new query language”
Took us a long time to understand the utility of the
“document” aspect of XML
“Document” community
Understood XML and SGML
Understood the utility and usage of information markup
Didn’t understand why the DB folks were so worried about
the XPath 1.0 equality not being transitive
“Functional programming” community
Believed that functional programming was the most
elegant programming paradigm and that static typing is a must
XQuery: the present
Proposed Recommendation in Nov. 2006
An entire family of (consistently designed)
specifications
Abstract XML Data Model (XDM)
XQuery 1.0
XPath 2.0
XSLT 2.0
XML Functions and Operators
XQueryX: the XML syntax for XQuery 1.0
Formal Semantics and XML-Schema based type
system
XQuery: the successes
Almost 50 implementations
The three major databases (Oracle, SQL Server, DB2)
implement it
Used in application servers (Oracle, BEA)
Open source projects are flourishing
Saxon is widely used, same for BerkeleyDB-XML
Enthusiastic customers
not completely satisfied, but that’s a good sign…
Used in a variety of contexts:
Temporary data, persistent data, streaming XML data
Usage scenarios: data transformation, pure querying, analytics,
publishing, etc.
Classes in major universities
Flourishing research community around XML processing
Why this work, why this talk ?
Let’s look at what we have done.
How are users trying to use XQuery ?
Where should we go from here ?
What happens if we do go there ?
Plan of the talk
Status of XQuery and its update extension
Why do we need anything else ?
More use case scenarios
XQueryP: the technical proposal
Implementation and usage experience
XQueryP: frequent criticism
Alternatives / related work
Potential impact on software architectures
XQuery 1.0: a read-only language
Functional heritage
XQuery programs are expressions
Expressions can be combined with full generality
No real side-effects
XQuery semantics was carefully crafted to
allow optimizations:
subexpressions can be evaluated in almost any
order
lazy evaluation is possible
errors are allowed to be non-deterministic
“and” and “or” are commutative, etc.
Compilers have freedom to do code rewriting
XQuery 1.0: the expressions
Variables, constants
Arithmetic, Boolean, etc.
Function calls
FLWOR expressions
equivalent of SQL’s SELECT-FROM-WHERE
Conditionals
if (expr) then (expr) else (expr)
Node constructors
<element>{expression}</element>
Expressions fully composable
Adding updates to XQuery
The XQuery update extension: first step to
add real (declarative) side-effects to
XQuery
W3C Working Draft Nov. 2006
XML Updates
Primitive update expressions
do insert <age>24</age> into $person[name="Jim"]
do delete $book[@year<2000]
do rename $article as "publication"
do replace ($books/book)[1] with <book>….</book>
do replace value of $title with "New Title"
Conditional updates
if ($book/year<2000)
then do delete $book/year
else do rename $book/year as "publicationTime"
Collection-oriented updates
for $x in $book
where $x/year<2000
return do rename $x as "oldBook"
XQuery update expressions
Insert, delete, rename, etc are normal
expressions
They are not fully composable with the rest of
the expression language
Semantic, not syntactic restrictions
Distinguish the side-effecting expressions vs.
non-side-effecting expressions
Only in “control-flow” style expressions (FLWOR,
typeswitch, conditionals)
Side-effecting functions vs. read-only functions
Single snapshot program
No side-effects are visible until the end of
an entire XQuery program
Not to concurrent XQuery applications
Not to the current application either
Database tradition
Consequence:
Good: existing optimizations and code rewritings
remain valid
Bad: very hard to write complex applications
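The single-snapshot behaviour described on this slide can be modelled outside XQuery. What follows is a minimal Python sketch under stated assumptions, not the XQuery Update implementation: the class `SnapshotStore` and its methods are hypothetical names, and a plain dict stands in for an XML document. Updates are queued on a pending list and applied only when the program ends, so a read in the middle of the program still sees the old value.

```python
class SnapshotStore:
    """Toy model of single-snapshot semantics (hypothetical, for illustration)."""

    def __init__(self, data):
        self.data = dict(data)   # stands in for the XML document
        self.pending = []        # pending update list

    def replace_value(self, key, value):
        # The update is only recorded here, not applied.
        self.pending.append((key, value))

    def read(self, key):
        # Reads always see the state as of the start of the program.
        return self.data[key]

    def end_of_program(self):
        # All queued updates become visible at once.
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()


store = SnapshotStore({"title": "Old Title"})
store.replace_value("title", "New Title")
seen_during = store.read("title")   # still the old value
store.end_of_program()
seen_after = store.read("title")    # the update is visible only now
```

In this model the program cannot branch on its own updates, which is one way to read the "very hard to write complex applications" remark.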
What are users trying to do with XQuery ?
Process XML
Write complex applications
Apply computations on input values coming
from several data sources, and then output new
fragments of XML data as result
Examples:
HealthCare
Financial
Human resources
Business data
New use case scenarios
Implementation of Web Services
XML Data transformation and integration
of heterogeneous data sources (data
mashups)
Processing RSS feeds and other XML
message streams
Coordination of services in an SOA
environment
XML data cleansing or normalization
Complex manipulations of persistent XML
data
Are such customers happy with XQuery 1.0 ?
Sure. They like XQuery.
But they are not entirely satisfied.
Even with the update extension certain pieces of
their application logic must still be expressed
outside the XQuery/XML world.
They lack support for common programming
tasks
The friction/cost between “inside” and “outside”
of an XML world is high
Productivity and performance
What are they missing ?
1. The ability to preserve state during
computation
All functional programming languages have
variable assignment !
2. The ability to “see” the results of their side-
effects during the computation
3. The ability to invoke external side-effecting
functions (e.g. Web Services) that cannot
participate in snapshot semantics
4. The ability to recover (in a controlled way)
from dynamic errors
5. The ability to model graphs
XQueryP overview
A small language extension
More computation can be expressed directly in
XQuery
Minimize the friction between the inside and the outside
of the XML world
A big step towards helping our customers
build richer XML applications
Supported by Oracle, BEA, DataDirect, etc
Proposed to the W3C
Surprisingly: very small extensions can make
XQueryP a good and complete XML
processing language
The XQueryP concrete proposal
1. A well-defined evaluation order for XQuery expressions
(“sequential order”)
2. Reduce the granularity of the snapshot to each individual
atomic update expression
3. Adds new expressions:
Block
Set
While
Break, Continue
4. An error handling facility (try-catch)
5. A way of modeling graphs in XML
6. A way of mapping XQueryP <-> Web Services
The entire package is proposed as an optional feature
(1) Sequential evaluation order
Slight modification to existing rules:
FLWOR: FLWO clauses are evaluated first, resulting
in a tuple stream; then the Return clause is evaluated
in order for each tuple. Side-effects made by one
row are visible to the subsequent rows.
COMMA: subexpressions are evaluated in order
(UPDATING) FUNCTION CALL: arguments are
evaluated first before body gets evaluated
Required (only) if we add side-effects that are immediately
visible to the program, e.g. variable assignments or
single-snapshot atomic updates; otherwise the semantics
is not deterministic.
(2) Reduce snapshot granularity
Today's update snapshot: the entire query
Change:
Every single atomic update expression (insert,
delete, rename, replace) is executed and made
effective immediately
The effects of side-effecting external functions
are visible immediately
Semantics is deterministic because of the
sequential evaluation order (point 1)
Sequential evaluation mode and the FLWOR
for $x in <expression/>
let $y := <expression/>
where <expression/>
order by <expression/>
(: no side-effects are visible until here; the clauses above produce the tuple stream ($x, $y) :)
return
<side-effecting expression/>
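The FLWOR rule under sequential mode can be sketched in Python. This is only a toy model under stated assumptions: `sequential_flwor`, `restock` and the `inventory` dict are hypothetical names, not part of the proposal. The for/where/order-by part is materialised into a tuple stream first, and the return clause is then evaluated once per tuple, in order, with its side effects visible to later tuples.

```python
def sequential_flwor(items, where, order_key, return_clause):
    """Toy model of a FLWOR under sequential mode (names are hypothetical)."""
    # for / let / where / order by: materialise the whole tuple stream first,
    # before any side effect of the return clause can happen.
    tuples = sorted((x for x in items if where(x)), key=order_key)
    results = []
    # return: evaluated once per tuple, strictly in order.
    for x in tuples:
        results.extend(return_clause(x))
    return results


inventory = {"apple": 3, "pear": 0, "plum": 5}
log = []


def restock(name):
    # Side-effecting return clause: the change is applied immediately,
    # so later tuples observe it (here through the length of the log).
    inventory[name] += 1
    log.append(name)
    return [(name, inventory[name], len(log))]


out = sequential_flwor(
    items=list(inventory),
    where=lambda n: inventory[n] > 0,   # evaluated before any restock happens
    order_key=lambda n: n,
    return_clause=restock,
)
```

The third component of each result shows that the second tuple sees the side effect made while processing the first one.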
(3) Adding new expressions
Block expressions
Assignment expressions
While expressions
Break, Continue
Early return in function bodies
Only under sequential evaluation mode
Block expression
Syntax:
"{" ( BlockDecl ";" )* Expr ( ";" Expr )* "}"
BlockDecl :=
( "declare" $VarName TypeDecl? ( ":=" ExprSingle )? )?
( "," $VarName TypeDecl? ( ":=" ExprSingle )? )*
Semantics:
Declare a set of updatable variables, whose scope is only
the block expression (in order)
Evaluate each expression (in order) and make the effects
visible immediately
Return the value of the last expression
Updating if body contains an updating expression
Assignment expression
Syntax:
"set" $VarName ":=" ExprSingle
Semantics:
Change the value of the variable
Variable has to be external or declared in a
block (no let, for or typeswitch)
Updating expression
Semantics is deterministic because of the
sequential evaluation order
Functions and blocks
In the function syntax we change
EnclosedExpr /* the body */ => Block
Note: compatible change
We relax the restriction that a function cannot both update
nodes and return a value
declare updating function local:prune($d as xs:integer) as xs:integer
{
  declare $count as xs:integer := 0;
  for $m in /mail/message[date lt $d]
  return { do delete $m;
           set $count := $count + 1
         };
  $count
}
While expression
Syntax:
"while" "(" ExprSingle ")" "return" Expr
Semantics:
Evaluate the test condition
If “true” then evaluate the return clause; repeat
If “false” return the concatenation of the values
returned by all previous evaluations of return
Syntactic sugar, mostly for convenience
Could be written using recursive functions
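As a rough illustration of these semantics, the Python sketch below models the value of a while expression as the concatenation of the values of each evaluation of its return clause, and then rewrites it as a recursive function. All names (`while_expr`, `while_recursive`, `make_counter`) are hypothetical, and a mutable dict plays the role of an assignable XQueryP variable.

```python
def while_expr(test, body):
    """Value of `while (test) return body`: concatenation of all body results."""
    result = []
    while test():
        result.extend(body())
    return result


def while_recursive(test, body):
    """The same value, written as a recursive function without a loop."""
    if not test():
        return []
    head = body()            # evaluate the body first (sequential order)
    return head + while_recursive(test, body)


def make_counter(limit):
    state = {"i": 0}

    def test():
        return state["i"] < limit

    def body():
        state["i"] += 1      # plays the role of `set $i := $i + 1`
        return [state["i"] * state["i"]]

    return test, body


loop_value = while_expr(*make_counter(4))
recursive_value = while_recursive(*make_counter(4))
```

Both forms yield the same sequence, which matches the slide's remark that the construct is mostly syntactic sugar.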
Break, Continue, Return
Traditional semantics, nothing surprising
Break (or continue) the closest FLWOR or
WHILE iteration
Return: early exit from a function body
Hard(er) to implement in a “database” style
evaluation engine
Because of the lazy evaluation
Atomic Blocks
Syntax:
"atomic" "{" . . . "}"
Semantics:
If the evaluation of Expr does not raise errors,
then result is returned
If the evaluation of Expr raises a dynamic error
then no partial side-effects are performed (all
are rolled back) and the result is the error
Only the largest atomic scope is effective
Note: XQuery! had a similar construct
Snap {…} vs. atomic {…}
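The all-or-nothing behaviour of an atomic block can be sketched in Python as follows. This is a toy model under stated assumptions: `Store`, `atomic` and `risky` are hypothetical names, rollback is done by restoring a deep copy taken on entry (a real engine would more likely use a log), and the rule about the largest atomic scope is not modelled.

```python
import copy


class Store:
    """Stands in for the updatable data visible to the block (hypothetical)."""

    def __init__(self, data):
        self.data = data


def atomic(store, block):
    """Run block(store); on error undo every side effect and re-raise."""
    saved = copy.deepcopy(store.data)   # state on entry to the block
    try:
        return block(store)             # no error: the result is returned
    except Exception:
        store.data = saved              # roll back all partial side effects
        raise                           # the result is the error


def risky(store):
    store.data["messages"].append("m3")   # partial side effect
    raise ValueError("dynamic error")


store = Store({"messages": ["m1", "m2"]})
try:
    atomic(store, risky)
    outcome = "ok"
except ValueError:
    outcome = "error"
```

After the failed block the store holds exactly the two original messages.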
More complex example
declare updating function myNs:cumCost($projects)
as element( )*
{
  declare $total-cost as xs:decimal := 0;
  for $p in $projects[year eq 2005]
  return
  { set $total-cost := $total-cost + $p/cost;
    <project>
      <name>{$p/name}</name>
      <cost>{$p/cost}</cost>
      <cumCost>{$total-cost}</cumCost>
    </project>
  }
}
Today (without XQueryP): additional self join, or recursive function
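To make the contrast concrete, here is a small Python sketch of the two styles. The project data and the function names are invented for illustration. The first function keeps a running total in a variable, as the XQueryP version does; the second recomputes each cumulative cost from the preceding items, which is roughly what the self-join formulation amounts to.

```python
from decimal import Decimal

# Hypothetical sample data standing in for $projects.
projects = [
    {"name": "A", "year": 2005, "cost": Decimal("10.0")},
    {"name": "B", "year": 2004, "cost": Decimal("99.0")},
    {"name": "C", "year": 2005, "cost": Decimal("2.5")},
    {"name": "D", "year": 2005, "cost": Decimal("7.5")},
]


def cum_cost_with_state(projects):
    """Running total kept in an assignable variable (one pass)."""
    total = Decimal("0")
    out = []
    for p in projects:
        if p["year"] == 2005:
            total += p["cost"]
            out.append((p["name"], p["cost"], total))
    return out


def cum_cost_self_join(projects):
    """No assignment: each total is recomputed from all preceding items."""
    selected = [p for p in projects if p["year"] == 2005]
    return [
        (p["name"], p["cost"],
         sum((q["cost"] for q in selected[: i + 1]), Decimal("0")))
        for i, p in enumerate(selected)
    ]


with_state = cum_cost_with_state(projects)
with_join = cum_cost_self_join(projects)
```

The two results are equal; the stateful version touches each project once, while the other does quadratic work unless an optimizer recognises the pattern.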
Putting everything together: the sequential mode
New setter in the prolog
Syntax:
"declare" "execution" "sequential"
Granularity: query or module
What does it mean:
Sequential evaluation mode for expressions
Single atomic update snapshot
Several new updating expressions (blocks, set, while,
break, continue)
If the query has no side-effects, sequential mode is
irrelevant, and traditional optimizations are still
applicable
Sequential mode and optimization
Sequential mode required to ensure deterministic
semantics in case of visible side-effects
In theory, more constraints on evaluation order
imply fewer optimizations
Decades-old tension between
adding side-effects
still allowing the optimizations
Compromise to be made
Errors might still be allowed to be non-deterministic ?
The idea that optimization is not possible anymore is
certainly not true
More complex dataflow analysis and intelligence required to
trace when side-effects are being applied, on what data,
when two side-effects commute, etc.
(4) Try-catch
Errors in XQuery 1.0, XPath 2.0, XSLT 2.0
fn:error(err:USER0005, "Value out of range", $value)
Traditional design for try-catch
try ( target-expr )
catch ( $name as QName1, $desc, $obj )
return handler-expr1
catch ( $name as QName2, $desc, $obj )
return handler-expr2
. . .
default ( $name, $desc, $obj )
return general-handler-expr
Example
let $x := expr
return
try ( <a>{ $x }</a> )
catch (err:XQTY0024)
return <a> {$x[self::attribute()],$x[fn:not(self::attribute())]} </a>
Try-catch (follow-up)
Semantic issues
Definition assumes eager evaluation
let $x := 1 div 0
return
try ( <a>{ $x }</a> )
catch (err:FOAR0001)
return "division by zero"   (: THIS WILL NOT BE INVOKED !! :)
Optimization issues
Almost all XQuery rewriting rules will be impacted
Input and output cannot be streamed (pipeline breaker)
Alternative design: atomic try-catch
If an error occurs, all side-effects are undone and the catch
expression is evaluated
Database style programming, high productivity
Harder to implement, esp. external function calls….
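One reading of the semantic issue above is that whether the handler fires depends on when the bound expression is evaluated. The Python sketch below illustrates that reading only; the function names are hypothetical, Python's `ZeroDivisionError` stands in for the XQuery division-by-zero error, and a lambda stands in for a lazily evaluated `let` binding.

```python
def eager():
    """`let $x := 1 div 0` is evaluated before the try is entered."""
    try:
        x = 1 // 0                 # error raised here, outside the inner try
        try:
            return ["a", x]
        except ZeroDivisionError:
            return "handler invoked"
    except ZeroDivisionError:
        # Outer handler exists only so the sketch can report what happened.
        return "error escaped the try"


def lazy():
    """$x is a thunk that is only forced when its value is needed."""
    x = lambda: 1 // 0             # nothing is evaluated yet
    try:
        return ["a", x()]          # error raised here, inside the try
    except ZeroDivisionError:
        return "handler invoked"


eager_result = eager()
lazy_result = lazy()
```

The two strategies give different answers for the same program, which is why a try-catch definition has to pin down the evaluation order.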
(5) Invoking and coordinating Web Services
WS are the standard way of sending and receiving XML data
XQuery (and XSLT) are the standard way to program the XML
processing
We should design them consistently, natural fit
XQuery                                Web Services
module                                service
functions/operations                  operations
arguments                             ports
values for arguments and result: XML  values for input and output messages: XML
XQueryP proposes:
A standard way of importing a Web Service into an XQuery program
A standard way of invoking a WS operation as a normal function
A standard way of exporting an XQuery module as a Web Service
Many XQuery implementations already support this. We have
to agree on a standard.
Calling Google...
import service namespace
ws="http://api.google.com/GoogleSearch.wsdl";
declare execution sequential;
declare variable $result;
declare variable $query;
set $query := mxq:readLine();
set $result :=
ws:doGoogleSearch("oIqddkdQFHIlwHMXPerc1KlNm+FDcPUf",
$query, 0, 10, fn:true(), "", fn:false(), "", "UTF-8", "UTF-8");
<results query="{$query}">
{
for $url in $result/resultElements/item/URL
return data($url)
}
</results>
(6) Adding references to XML
XML: tree, not graph
E/R model: graph, not tree
Inherent tension; the XML Data Model is the source of
the problem, not XQuery
Example
let $x := <a><b/></a> return <c>{$x/b}</c> (: copy of <b/> :)
Nodes in XDM have node identifiers
Lifetime and scope of nodeids, implementation defined
XQueryP solution:
fn:ref($x as node()) as xs:anyURI
fn:deref($x as xs:anyURI) as node()
Lifetime and scope of URIs, implementation defined
Untyped references (URIs)
No changes required to:
XML Schema, XDM Data Model, XQuery type system
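The difference between copying a node and referring to it can be sketched in Python. This is a toy model under stated assumptions: `Node`, `ref`, `deref` and the registry dict are hypothetical stand-ins for fn:ref and fn:deref, and the `urn:uuid:` form of the URI is an arbitrary choice, since the slide leaves lifetime and scope implementation-defined.

```python
import copy
import uuid

_registry = {}   # URI -> node; lifetime and scope are implementation-defined


class Node:
    """Minimal tree node standing in for an XDM element (hypothetical)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children or [])


def ref(node):
    """Return an untyped reference (a URI string) to this very node."""
    uri = "urn:uuid:" + str(uuid.uuid4())
    _registry[uri] = node
    return uri


def deref(uri):
    """Return the node a reference points to, not a copy of it."""
    return _registry[uri]


b = Node("b")
a = Node("a", [b])
c_copy = Node("c", [copy.deepcopy(b)])   # element construction copies <b/>
link = ref(b)                            # a reference keeps node identity
```

Storing `link` inside another tree gives a graph edge to `b` without any change to the tree data model itself.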
XQueryP usage scenarios
XQueryP programs in the browsers
We all love Ajax (the results). A pain to program. Really primitive as
XML processing goes.
Embedding XQueryP in browsers
XQueryP code can take input data from WS, RSS streams, directly
from databases
Automatically change the XHTML of the page
XQueryP programs in the databases
Complex data manipulation executed directly inside the database
Takes advantage of the DB goodies, performance, scalability,
security, etc
XQueryP programs in application servers
Orchestration of WS calls, together with data extraction for a variety
of data sources (applications, databases, files), and XML data
transformations
XML data mashups
Related work
Programming for XML:
Extensions to other programming languages
XLinq, ECMAScript, PHP, XJ, etc.
Extensions to XQuery
XL, XQuery!, MarkLogic’s extension
Re-purposing other technologies: BPEL
Long history of adding control flow logic to
query languages
15 years of success of PL/SQL and others
SQL might have failed otherwise !
This is certainly not new research, but a
natural evolution
Extensions to other programming languages
Every programming language is extended with “XML support”
Through APIs and/or native in the language
SQL (SQL/XML), Java (XJ), C# (XLinq), Python, JavaScript
(ECMAScript), etc.
Two approaches to reach the same goal
Language extensions less disruptive than XQueryP
Both can co-exist (thanks WS and SOA)
Advantages of extending XQuery:
Standard, broader range of applications, vendor and platform
independent
More “declarative”
Easier to automatically generate the code
Easier to optimize automatically for large volumes of data, streaming data
Global optimization possible (no system or language barrier)
XL
( Florescu, Kossmann, 2001)
Goal: implementation of Web Services
Adds statements to XQuery in the same
way PL/SQL extended SQL
Insert, delete, etc are statements, not
expressions
Lack of composability => overloading of
concepts
If-then-else, iteration, etc both as statements
and expressions
XQuery!
( J. Simeon, G. Ghelli, 2006 )
Same goal
Extends XQuery in slightly different ways
Full composability between side-effecting
and non side-effecting expressions
Sequential evaluation mode
No variable assignment
User controlled snapshot granularity
Snap { expression }
Snap {…} vs. atomic {…}
Orthogonal, maybe both are needed, atomic
more important
MarkLogic’s XML application development platform
Same goal, different XQuery extension
Differences:
ML view: snapshot at the level of each individual
update expression is bad for performance
Compromise: should we have two versions of
insert, delete, etc.?
One visible immediately
apply delete //a
One that is delayed for a later “commit” time
do delete //a
Should we have a user-controlled commit ?
Frequent criticism
“Programmers do not know how to program
declaratively”
What about SQL !?
“We don’t know how to optimize a language that is
not purely declarative”
Why is the alternative any better !?
Take ideas from both database optimization and
programming languages compilation, plus innovate
“If you give users variable assignment, they’ll use it
(and abuse it)!”
Teach them not to, rewrite automatically if they (still) do
“XQuery is too complicated”
Frequent criticism (2)
“XML Pipelines are the answer”
Redundancy of concepts is not good
Productivity
Global optimization
“It will never perform as well as if we write the application
in Java + SAX”
Maybe true today, not sure in near future
Optimizing a single XML application vs. optimizing an
XQuery(P) engine (i.e. all XML applications)
“This would require users to learn a new language”
Smooth transition, easy integration of pieces written in other
languages (thanks WS!)
“There are no libraries”
Let’s build some…
We will not need the same libraries as in Java or C#
Different level of abstraction
The target applications are different
Is it all we need to add to XQuery ?
Certainly not. Outside the scope of this talk.
More query-like extensions still needed
Group-by
Outer-joins
Streaming and windowing (e.g. RSS processing)
General purpose functionalities
eval(“query-string”)
higher-order functions
Dynamic dispatch
Improve the aesthetics and usability
Better dynamic NS support
Semantic based search
Standard modules and libraries
XQueryP implementations
Prototype in Big OracleDB
Presented at Plan-X 2005
Prototype in BerkeleyDB-XML
Might be open sourced (if interest)
Open source prototype built at ETH Zurich
http://www.mxquery.org (Java)
Runs on mobile phones: Java CLDC 1.1; some cuts
even run on CLDC 1.0
Eclipse Plugin available in March 2007
Zorba C++ engine (FLWOR Foundation)
Small footprint, performance, extensibility,
potentially embeddable in many contexts
XQueryP Pet Projects (at ETH)
Airline Alliances
every student programs his/her own airline
form alliances
experiment: do this in Java/SQL first; then in XQueryP
Public Transportation
mobile phone computes best route (S-Bahn)
integrate calendar, address book, ZVV, GPS
Context-sensitive Remote Control
mote captures “clicks” and movements
mobile phone determines context and action (TV, garage, ..)
Lego Mindstorms
move to warmest place in a room
Less of a toy (Oracle): XML Schema validator in XQueryP
Automatic XQueryP code generation
Our goal will only be achieved if the code
can be automatically generated, based on:
Metadata
Higher-level description (UI)
Better chances with XQueryP than with
any other alternative technology
The higher the level of abstraction, the easier
Very important open research problem…
XQueryP’s potential impact
Let’s imagine the following scenario:
OracleDB, BerkeleyDB, MySQL, all run XQueryP
(or more, XQueryP executable can be part of a
normal Linux installation...)
Any HTTP server understands XQueryP, is able
to execute it, and responds in XML
Content managers understand XML/XQuery
XQueryP runs in the browsers
XQueryP runs on all kinds of mobile devices
We manage to optimize it properly (<28ms)
We find a good UI paradigm for automatically
generating XQueryP code
Potential impact, conclusion
What will happen to the N-tiered architectures ?
What will happen to client-server architectures ?
What will happen to the cost of building and changing
an application ?
One step closer to the goal of XML technologies
Enable the flow of digital information
Automate the processing of information
Decrease the cost of building and running Web applications
Turn Web applications into a commodity
Thank you.