16-XML-Views

Document Sample
16-XML-Views Powered By Docstoc
					          Views and Storing XML in Relational
          Databases



                                                              Susan B. Davidson
                                                         University of Pennsylvania
                                            CIS330 – Database Management Systems




Some slide content courtesy of Zack Ives.
Outline
 Views of XML data and XQuery
 Embedding and querying XML data in relational
  databases
 Storing and querying XML data using relational
  databases.




                                                   2
What are views?
 We frequently want to reference data in a way that differs
  from the way it’s stored
      XML data  HTML, text, etc.
      Relational data  XML data
      Relational data  Different relational representation
      XML data  Different XML representation

 Generally, these can all be thought of as different views over
  the data
    A view is a named query
    It is the DB analog of methods with arguments in PL
    We will start with views in the same model (relational, XML), and
     then address moving XML data to a relational format.

                                                                         3
 Views in SQL
  A view is first created (CREATE VIEW).
  We then use the name of the view to invoke the query
   (treating it as if it were the relation it returns)
Creating a view:                 Using the view:
CREATE VIEW V(A,B,C) AS          SELECT *
      SELECT A,B,C FROM R        FROM V, R
      WHERE R.A = “123”;         WHERE V.B=5 AND V.C=R.C

        This expands in the query processor to:
               SELECT *
               FROM (SELECT A,B,C FROM R
                    WHERE R.A = “123”) AS V, R
               WHERE V.B = 5
               AND V.C = R.C                               4
Views in XQuery
 In XQuery, views are defined as functions.
 Creating a view:
 declare function V() as element(content)* {
                for $r in doc(“R”)/root/tree,
                 $a in $r/a, $b in $r/b, $c in $r/c
                where $a = “123”
                return <content>{$a, $b, $c}</content> }
 Using a view:
 for $v in V()/content,
     $r in doc(“R”)/root/tree
 where $v/b = $r/b
 return $v
                                                           5
Why are views useful?
In addition to describing transformations from one
   schema to another, views have several common
   uses:
 Providing security/access control
   • We can assign users permissions on different views
   • Can select rows or project out columns so we only reveal
     what we want!
 Can be used as relations in other queries
   • Allows the user to query things that make more sense


                                                            6
Outline
 Views of XML data and XQuery
 Publishing and embedding XML data in relational
  databases
 Storing and querying XML data using relational
  databases.




                                                    7
“Publishing” XML Views of Relational Data
It can be done with SQL/XML, an extension of the SQL standard
    (see http://sqlxml.org)
(Don’t confuse with old and lame SQLXML for SQL Server.)


select xmlelement(name “Customer”,
            xmlelement(name “CustID”, c.CustId),
            xmlelement(name “CustName”, c.CustName),
            xmlelement(name “City”, c.City) )
from customers c
where c.Status = “preferred”

This is a very valuable tool for B2B (business-to-business) data exchange. Available
   in Oracle, DB2, SQL Server.
                                                                              8
Embedding XML in a Relational Database

 Straightforward solution: add attributes of type “XML”.
  Promoted by the same SQL/XML standard.

create table clients(
  id              int   primary key          not null,
  name            varchar(50),
  status          varchar(10),
  contactinfo     xml )


Available in Oracle, DB2, SQL Server. Syntax may vary. Above
  syntax is from DB2.

                                                               9
 Querying Relationally Embedded XML
 with SQL/XML
SQL/XML standard specifies xmlexists and xmlquery for
  embedding XPath and XQuery into SQL. DB2 syntax.

select name from clients
where xmlexists('$c/Client/Address[zip="95116"]'
                passing clients.contactinfo as "c")

select name,
       xmlquery('for $e in $c/Client/email return $e'
                passing contactinfo as "c")
from clients
where status = 'Gold’

                                                        10
 SQL Extensions for XQuery (DB2)
for $y in
  db2-fn:xmlcolumn('CLIENTS.CONTACTINFO')/Client/Address
return $y

for $y in db2-fn:sqlquery('select contactinfo
                                 from clients
                                 where status=''Gold'' ’
                          )/Client
where $y/Address/city="San Jose"
return (
     if ($y/email) then <emailList>{$y/email}
                             </emailList>
                   else $y/Address )
                                                     11
Outline
 Views of XML data and XQuery
 Embedding and querying XML data in relational
  databases
 Storing and querying XML data using relational
  databases.




                                                   12
More Uniform Ways for Storing XML in
an RDBMS: Mapping Relational  XML

 We know the following:
    XML data is tree-like
    XML is SEMI-structured
       There’s some structured “stuff”, especially if it follows a DTD
       There is some unstructured “stuff”, eg. text
 Issues relate to describing XML structure,
  particularly parent/child in a relational encoding
    Relations are flat
    Tuples can be “connected” via foreign-key/primary-key
     links

                                                                          13
The Simplest Way to Encode a Tree
 Suppose we had:               BinaryLikeEdge(key, label,
  <tree id=“0”>                  type, value, parent)
    <content id=“1”>
      <sub-content>XYZ key   label       type   value   parent
      </sub-content>    0    tree        ref    -       -
      <i-content>14     1    content     ref    -       0
      </i-content>
    </content>          2    sub-        ref    -       1
  </tree>                    content
                        3    i-content   ref    -       1
 If we have no IDs, we 4    -           str    XYZ     2
  CREATE values…
                        5    -           int    14      3
    What are shortcomings here?                               14
Florescu/Kossmann
Improved Edge Approach
 Consider order, typing;
  separate the values                     Vint(vid, value)
    Edge(parent, ordinal, label,
     flag, target)                            vid value
                                              v3   14
parent ord label            flag target
-      1   tree             ref 0
0      1   content          ref 1        Vstring(vid, value)

1       1     sub-content str       v2        vid value
1       2     i-content   int       v3        v2   XYZ

                                                                15
How Do You Compute the XML?
 Assume we know the structure of the XML tree
  (we’ll see how to avoid this later)
 We can compute an “XML-like” SQL relation using
  “outer unions”
   Idea: if we take two non-union-compatible expressions,
    pad each with NULLs, we can UNION them together

   Let’s see how this works…




                                                             16
A Relation that Mirrors the
XML Hierarchy
 <tree id=“0”>
   <content id=“1”>
     <sub-content> XYZ </sub-content>
     <i-content>14</i-content>
   </content>
  </tree>
 Write an SQL query whose output looks like:

rLabel rid rOrd clabel cid cOrd          sLabel      sid sOrd   str   int
 tree   0    1       -      -     -         -        -    -      -     -
  -     0    1    content   1     1         -        -    -      -     -
  -     0    1       -      1     1    sub-content   2    1      -     -
  -     0    1       -      1     1         -        2    1     XYZ    -
  -     0    1       -      1     2     i-content    3    1      -     -
  -     0    1       -      1     2         -        3    1      -    14
                                                                      17
A Relation that Mirrors the
XML Hierarchy
 Output relation would look like:


rLabel rid rOrd clabel cid cOrd            sLabel      sid sOrd   str   int
 tree   0     1        -      -      -        -        -    -      -         -
  -     0     1    content   1       1        -        -    -      -         -
  -     0     1        -     1       1   sub-content   2    1      -         -
  -     0     1        -     1       1        -        2    1     XYZ        -
  -     0     1        -     1       2    i-content    3    1      -         -
  -     0     1        -     1       2        -        3    1      -    14




                                                                        18
A Relation that Mirrors the
XML Hierarchy
 Output relation would look like:


rLabel rid rOrd clabel cid cOrd            sLabel      sid sOrd   str   int
 tree     0   1        -      -      -        -        -    -      -         -
  -       0   1    content   1       1        -        -    -      -         -
  -       0   1        -     1       1   sub-content   2    1      -         -
  -       0   1        -     1       1        -        2    1     XYZ        -
  -       0   1        -     1       2    i-content    3    1      -         -
  -       0   1        -     1       2        -        3    1      -    14

        Colors are representative of separate SQL queries…
                                                                        19
SQL for Outputting XML
 For each sub-portion we preserve the keys (target, ord) of
  the ancestors
 Root:
      select E.label AS rLabel, E.target AS rid, E.ord AS rOrd, null AS
      cLabel, null AS cid, null AS cOrd, null AS subOrd, null AS sid, null AS
      str, null AS int
      from Edge E
      where parent IS NULL
 First-level children:
      select null AS rLabel, E.target AS rid, E.ord AS rOrd, E1.label AS
      cLabel, E1.target AS cid, E1.ord AS cOrd, null AS …
      from Edge E, Edge E1
      where E.parent IS NULL AND E.target = E1.parent


                                                                                20
The Rest of the Queries
 Grandchild:
      select null as rLabel, E.target AS rid, E.ord AS rOrd, null AS cLabel, E1.target
      AS cid, E1.ord AS cOrd, E2.label as sLabel, E2.target as sid, E2.ord AS sOrd,
      null as …
      from Edge E, Edge E1, Edge E2
      where E.parent IS NULL AND E.target = E1.parent AND
      E1.target = E2.parent
 Strings:
      select null as rLabel, E.target AS rid, E.ord AS rOrd, null AS cLabel, E1.target
      AS cid, E1.ord AS cOrd, null as sLabel, E2.target as sid, E2.ord AS sOrd, Vi.val
      AS str, null as int
      from Edge E, Edge E1, Edge E2, Vstring Vi
      where E.parent IS NULL AND E.target = E1.parent AND
      E1.target = E2.parent AND Vi.vid = E2.target
 How would we do integers?


                                                                                     21
Finally…
 Union them all together:
     ( select E.label as rLabel, E.target AS rid, E.ord AS rOrd, …
       from Edge E
       where parent IS NULL)
     UNION (
       select null as rLabel, E.target AS rid, E.ord AS rOrd, E1.label AS cLabel,
       E1.target AS cid, E1.ord AS cOrd, null as …
       from Edge E, Edge E1
       where E.parent IS NULL AND E.target = E1.parent
     ) UNION (
       .
       :
     ) UNION (
       .
       :
     )
 Then another module will add the XML tags, and we’re done!
                                                                                    22
“Inlining” Techniques
 Folks at Wisconsin noted we can exploit the
  “structured” aspects of semi-structured XML
   If we’re given a DTD, often the DTD has a lot of required
    (and often singleton) child elements
      Book(title, author*, publisher)
   Recall how normalization worked:
      Decompose until we have everything in a relation determined by
       the keys
      … But don’t decompose any further than that
   Shanmugasundaram et al. try not to decompose XML
    beyond the point of singleton children

                                                                        23
Inlining Techniques
 Start with DTD, build a graph representing structure
                         tree    ?
                          *     @id
                        content
                                             @id
                        *             *
                  sub-content          i-content
• The edges are annotated with ?, * indicating repetition,
  optionality of children
• They simplify the DTD to figure this out
                                                             24
Building Schemas
 Now, they tried several alternatives that differ in
  how they handle elements w/multiple ancestors
    Can create a separate relation for each path book author
    Can create a single relation for each element
    Can try to inline these                    name
 For tree examples, these are basically the same
    Combine non-set-valued things with parent
    Add separate relation for set-valued child elements
    Create new keys as needed


                                                           25
Schemas for Our Example
   TheRoot(rootID)
   Content(parentID, id, @id)
   Sub-content(parentID, varchar)
   I-content(parentID, int)

 If we suddenly changed DTD to
  <!ELEMENT content(sub-content*, i-content?)
  what would happen?


                                                26
XQuery to SQL
 Inlining method needs external knowledge about the
  schema
    Needs to supply the tags and info not stored in the tables
 We can actually directly translate simple XQuery
  into SQL over the relations – not simply reconstruct
  the XML




                                                              27
An Example
  for $X in document(“mydoc”)/tree/content
  where $X/sub-content = “XYZ”
  return $X

 The steps of the path expression are generally joins
    … Except that some steps are eliminated by the fact we’ve inlined
     subelements
    Let’s try it over the schema:
       TheRoot(rootID)
       Content(parentID, id, @id)
       Sub-content(parentID, varchar)
       I-content(parentID, int)


                                                                         28
Summary
 We’ve seen that views are useful things, and are heavily used
  within database applications as well as for data exchange.
 We’ve seen an example of a view that changes from XML to
  relations – and we’ve even seen how such a view can be
  posed in XQuery and “unfolded” into SQL
 We have also seen how to define XML views of relational
  data




                                                                  29

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:0
posted:2/25/2013
language:English
pages:29