SQL: Queries, Programming, Triggers - PowerPoint

							         Database Systems I

The Semistructured Data Model




   CMPT 354, Simon Fraser University, Fall 2008, Martin Ester   311
The Web Today
 HTML documents
   generated by humans or by applications,
   consumed by humans only,
   easy access: across platforms, across organizations.
   only layout, no semantic information
 Limited application interoperability
   HTML not understood by applications
    at most, some heuristic rules.
   Database technology
    SQL is a standard, but implementations still have
      many vendor-specific aspects.
XML Data Exchange Format
 A standard from the W3C (World Wide Web
 Consortium, http://www.w3.org).
 The mission of the W3C
 “. . . developing common protocols that
 promote its evolution and ensure its
 interoperability. . .”.
 Basic ideas
   XML = data
   XML generated by applications
   XML consumed by applications
   Easy access: across platforms, organizations.

Paradigm Shift on the Web
 For web search engines:
   From documents (HTML) to data (XML)
   From document management to document
   understanding (e.g., question answering)
   From information retrieval to data management
 For database systems:
   From relational (structured) model to
   semistructured data
   From data processing to data/query translation
   From storage to transport


The Semistructured Data Model
 Developed by the DBS community to address
 the following, emerging issues
 Data sets with non-rigid structure
    Biological data
   sequence data, 3D data, text data . . .
   and their relationships
    Web data
 Integration of heterogeneous sources
 not only, but especially for Web data and
 biological data.


The Semistructured Data Model
 Data is self-describing, i.e. the data description is
 integrated with the data itself rather than in a
 separate schema.
 Database is a collection of nodes and arcs
 (directed graph).
 Leaf nodes represent data of some atomic type
 (atomic objects, such as numbers or strings).
 Interior nodes represent complex objects
 consisting of components (child nodes),
 connected by arcs to this node.
 Arcs are directed and connect two nodes.

The Semistructured Data Model
 Arc labels indicate the relationship between
 the two corresponding nodes.
 The root node is the only interior node without
 in-arcs, representing the entire database.
 All database objects are children of the root
 node.
 Every node must be reachable from the root.
 A general graph structure is possible, i.e. the
 graph need not be a tree structure.
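 The rules above can be sketched in a few lines of Python. This is only an
 illustration, not part of the original slides: the arc list reuses oids from
 the deck's bibliography example, and the function names are hypothetical.
 It stores the database as labeled arcs, finds the node without in-arcs (the
 root), and checks that every node is reachable from it.

```python
from collections import deque

# Each arc is (source oid, label, target oid). The oids are illustrative.
ARCS = [
    ("&o1", "paper", "&o12"),
    ("&o1", "book", "&o24"),
    ("&o1", "paper", "&o29"),
    ("&o29", "references", "&o12"),  # &o12 has two in-arcs: not a tree
    ("&o29", "references", "&o24"),
    ("&o29", "title", "&o93"),
]

def all_nodes(arcs):
    """Collect every oid that appears as a source or a target."""
    return {s for s, _, _ in arcs} | {t for _, _, t in arcs}

def nodes_without_in_arcs(arcs):
    """Nodes that are never the target of an arc (candidates for the root)."""
    return all_nodes(arcs) - {t for _, _, t in arcs}

def reachable(root, arcs):
    """Breadth-first search along the directed arcs, starting at root."""
    children = {}
    for source, _label, target in arcs:
        children.setdefault(source, []).append(target)
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

roots = nodes_without_in_arcs(ARCS)
```

 A node such as &o12 with more than one in-arc is what makes the structure a
 general graph rather than a tree.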


Graphical Representation
 [Figure: the bibliography database drawn as a directed graph. The root
 Bib (&o1) is a complex object with arcs labeled paper (&o12), book (&o24),
 and paper (&o29). Further arcs are labeled author, title, year, http,
 publisher, references, page, firstname, lastname, first, and last. The
 leaves are atomic objects such as “Serge”, “Abiteboul”, 1997, “Victor”,
 “Vianu”, 122, and 133.]
Textual Representation
 Example:
   Bib: &o1 { paper: &o12 { … },
       book: &o24 { … },
       paper: &o29
           { author: &o52 “Abiteboul”,
             author: &o96 { firstname: &o243 “Victor”,
                           lastname: &o206 “Vianu”},
             title: &o93 “Regular path queries with constraints”,
             references: &o12,
             references: &o24,
             page: &o25 { first: &o64 122, last: &o92 133}
           }
        }
 Nested tuples, set-values, object identifiers (oids)


Textual Representation
 Simplified textual representation.
 Can omit oids.

 { paper: { author: “Abiteboul”,
         author: { firstname: “Victor”,
                   lastname: “Vianu”},
         title: “Regular path queries …”,
         page: { first: 122, last: 133 }
       }
 }
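 One way to hold this oid-free form in a program is sketched below in Python.
 It is an assumption of this sketch, not something the slides prescribe, that
 an object is a list of (label, value) pairs; a plain dictionary would not
 work because a label such as author may occur more than once. The helper
 name values_for is hypothetical.

```python
# A complex object is a list of (label, value) pairs; a value is either an
# atomic object (string, number) or another complex object.
paper = [
    ("author", "Abiteboul"),
    ("author", [("firstname", "Victor"), ("lastname", "Vianu")]),
    ("title", "Regular path queries …"),
    ("page", [("first", 122), ("last", 133)]),
]
bib = [("paper", paper)]

def values_for(obj, label):
    """Return all values reached from obj by an arc with the given label."""
    return [value for name, value in obj if name == label]

authors = values_for(paper, "author")      # two values of different shape
years = values_for(paper, "year")          # missing attribute: empty list
last_page = values_for(values_for(paper, "page")[0], "last")
```

 The two author values have different shapes (a string and a nested object),
 and the missing year simply yields an empty list.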




Comparison with Relational Model
  Missing attributes
  Additional attributes
  Multiple attribute values (set-valued attributes)
  Objects as attribute values
  No global schema

 Only the first characteristic (missing attributes, via NULL values) is
  supported by the relational model; the others are not.



Comparison with Relational Model
 Semistructured data
    Self-describing,
    Irregular data,
    No a-priori structure.


 Relational DB
    Separate schema,
    Regular data,
    A-priori structure.



Comparison with Relational Model
Example
 Relational table:

 name        phone
 John         3634
 Sue          6343
 Dick         6363

 Semistructured representation (a root with three arcs labeled row,
 each leading to an object with a name and a phone):

 { row: { name: “John”, phone: 3634 },
   row: { name: “Sue”, phone: 6343 },
   row: { name: “Dick”, phone: 6363 }
 }

XML
A W3C standard for an Extensible Markup
Language.
Origins: Structured text SGML (Standard
Generalized Markup Language).
Motivation
  HTML describes presentation only, XML
  describes content and its meaning (semantics).
  HTML is a fixed language; XML lets you define your
  own markup languages.

          HTML  XML  SGML
From HTML to XML




   HTML describes the presentation / layout

From HTML to XML

HTML example

<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
     Abiteboul, Hull, Vianu
     <br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
     Abiteboul, Buneman, Suciu
     <br> Morgan Kaufmann, 1999


From HTML to XML
XML example
 <bibliography>
    <book> <title> Foundations… </title>
           <author> Abiteboul </author>
           <author> Hull </author>
           <author> Vianu </author>
           <publisher> Addison Wesley </publisher>
           <year> 1995 </year>
    </book>
    …
 </bibliography>
XML describes the content
Elements
Tags
   book, title, author, …
  start tag: <book>, end tag: </book>
  defined by user / programmer (different from
  HTML!)
Elements
   <book>…</book>, <author>…</author>
  An element consists of a matching start and end tag
  and the enclosed content.
  Elements can be nested, i.e. the content of one element
  can consist of a sequence of other elements.
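Because the tags carry the meaning, a program can pick out content by element
name. The sketch below uses Python's standard xml.etree.ElementTree module on
the bibliography example. The choice of Python and the variable names are
assumptions of this illustration, not part of the slides.

```python
import xml.etree.ElementTree as ET

DOC = """
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
"""

root = ET.fromstring(DOC)           # the single root element
book = root.find("book")            # first nested <book> element
title = book.findtext("title")      # text content of <title>
authors = [a.text for a in book.findall("author")]
child_tags = [child.tag for child in book]   # nested elements, in order
```

Doing the same for the HTML version would require guessing from layout tags
such as i and br which text is a title and which is an author.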

Attributes
Attributes can be associated with any element.
Provide additional information about elements.
Attributes can have only one value.
Example
 <book price="55" currency="USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
   …
  <year> 1995 </year>
 </book>
Attributes can also be used to connect elements.

Non-tree-like XML
So far: only tree-like XML documents,
i.e. each element is nested within at most one
other element.
Attributes can also be used to create non-tree
XML documents.
Attributes with a domain of ID serve as
primary keys of elements.
Attributes with a domain of IDREF serve as
foreign keys referencing the ID of another
element.

 Non-tree-like XML
  Example of a non-tree structure
<persons>
 <person personid="o555">
   <name> Jane </name>
 </person>
 <person personid="o456">
   <name> Mary </name>
   <children refs="o123 o555"/>
 </person>
 <person personid="o123" mother="o456">
   <name> John </name>
 </person>
</persons>
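The ID/IDREF links can be followed by a program much like foreign keys. The
Python sketch below is illustrative only: it assumes the children element is
written as an empty element with a refs attribute, and the helper names
(name_of, children_of, mother_of) are hypothetical. Without a DTD the parser
does not know that personid is an ID, so the lookup table is built by hand.

```python
import xml.etree.ElementTree as ET

DOC = """
<persons>
  <person personid="o555"><name>Jane</name></person>
  <person personid="o456">
    <name>Mary</name>
    <children refs="o123 o555"/>
  </person>
  <person personid="o123" mother="o456"><name>John</name></person>
</persons>
"""

root = ET.fromstring(DOC)
# Index the elements by their ID attribute (the "primary key").
by_id = {p.get("personid"): p for p in root.findall("person")}

def name_of(person):
    return person.findtext("name").strip()

def children_of(person):
    """Follow the IDREFS-style refs attribute, a space-separated list of IDs."""
    refs = person.find("children")
    if refs is None:
        return []
    return [name_of(by_id[ref]) for ref in refs.get("refs").split()]

def mother_of(person):
    """Follow the IDREF-style mother attribute, if present."""
    ref = person.get("mother")
    return name_of(by_id[ref]) if ref is not None else None
```

John is nested only inside persons, yet he is also referenced from Mary's
children list, which is what makes the document graph-like rather than a tree.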
 Namespaces
    An XML document can involve tags that
    come from multiple sources.
    One and the same tag can appear in more
    than one source.
<table> <tr>
    <td>Apples</td>
    <td>Bananas</td>
</tr> </table>

<table>
    <name>African Coffee Table</name>
    <width>80</width>
    <length>120</length>
</table>
Namespaces
 Name conflicts can be resolved by prefixing tag
 names according to their source.
 <h:table>
     <h:tr> <h:td>Apples</h:td>
     <h:td>Bananas</h:td> </h:tr>
 </h:table>
 <f:table>
     <f:name>African Coffee Table</f:name>
     <f:width>80</f:width>
     <f:length>120</f:length>
 </f:table>
 When using prefixes in XML, a namespace for the
 prefix must be defined.
 The namespace must be referenced (via a URI) in
 the start tag of an enclosing element.
 Namespaces
<h:table xmlns:h="http://www.w3.org/TR/html4/">
 <h:tr> . . .
</h:tr> </h:table>
<f:table xmlns:f="http://www.w3schools.com/furniture"> . . .
</f:table>

Or alternatively:
<root xmlns:h="http://www.w3.org/TR/html4/"
    xmlns:f="http://www.w3schools.com/furniture">
    <h:table>
    ...
    </h:table>
    <f:table>
    ...
    </f:table>
</root>
Namespaces
 A URI is a Uniform Resource Identifier, typically
 a URL.
 The document referenced by the URI describes the
 meaning of the tags in the namespace.
 This description is informal and is not used by the
 XML parser.
 The description can even be empty.
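 A parser treats the URI purely as a name and does not fetch it. The Python
 sketch below (an illustration, using the standard xml.etree.ElementTree
 module and the two table vocabularies from the earlier slides) shows that the
 two table elements end up with different qualified names. The NS mapping is
 a convenience defined by this sketch for writing queries.

```python
import xml.etree.ElementTree as ET

DOC = """
<root xmlns:h="http://www.w3.org/TR/html4/"
      xmlns:f="http://www.w3schools.com/furniture">
  <h:table>
    <h:tr> <h:td>Apples</h:td> <h:td>Bananas</h:td> </h:tr>
  </h:table>
  <f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</root>
"""

# Prefix-to-URI mapping used only for the queries below.
NS = {
    "h": "http://www.w3.org/TR/html4/",
    "f": "http://www.w3schools.com/furniture",
}

root = ET.fromstring(DOC)
# ElementTree expands each prefix to {namespace-URI}localname.
tags = [child.tag for child in root]
cells = [td.text for td in root.findall("h:table/h:tr/h:td", NS)]
furniture_name = root.findtext("f:table/f:name", namespaces=NS)
```

 The prefixes h and f are only shorthand. What distinguishes the two table
 elements after parsing is the URI each prefix is bound to.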




Well-Formed XML
A well-formed XML document satisfies the following
conditions:
  Begins with a declaration that it is XML.
  Has a single root element that encloses the whole
  document.
  Consists of properly nested elements, i.e. start and
  end tag of an element are within the same
  enclosing element.
standalone="yes" states that the document does not depend
on an external DTD. In this mode, you can invent your own
tags, as in the semistructured data model.
Well-Formed XML
 <?xml version="1.0" standalone="yes"?>
 <bibliography>
       <book> <title> Foundations… </title>
                <author> Abiteboul </author>
                <author> Hull </author>
                <author> Vianu </author>
                <publisher> Addison Wesley </publisher>
                <year> 1995 </year>
       </book>
       <book> <title> … </title>
                ...
       </book>
       …
 </bibliography>


Well-Formed XML
HTML browsers will display documents with errors
(like missing end tags).
 The W3C XML specification states that a program
should stop processing an XML document if it finds
an error.
The main reason is that XML is consumed by
programs rather than by humans (as HTML is).
W3C provides a validator that checks whether an
XML document is well-formed.
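The stop-on-error behavior can be observed with any conforming parser. The
sketch below uses Python's standard xml.etree.ElementTree module rather than
the W3C validator, and is_well_formed is a hypothetical helper name. It
reports whether the parser accepts a string.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser stops at the first error instead of guessing.
        return False
    return True

nested_ok = is_well_formed("<book><title>Foundations</title></book>")
bad_nesting = is_well_formed("<book><title>Foundations</book></title>")
missing_end_tag = is_well_formed("<book><title>Foundations</title>")
two_roots = is_well_formed("<book></book><book></book>")
```

Only the first string is accepted. The other three violate proper nesting,
omit an end tag, and have more than one root element, respectively.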




Valid XML
The validator can also check whether an XML
document is valid, i.e. conforms to a Document Type
Definition (DTD).
A DTD specifies the allowable tags and how they can
be nested.
XML with a DTD is no longer semistructured (self-
describing).
However, a DTD is less rigid than the schema of a
relational DB. E.g., a DTD allows missing and
multiple attributes / elements.

Document Type Definitions
Document Type Definition (DTD): set of rules
(grammar) specifying elements, attributes and all
other aspects of XML documents.
For each element, specify name and content type.
Content type can, e.g., be
   #PCDATA (character string),
   other elements,
   regular expression made of the above content types
    * = zero or more occurrences
    ? = zero or one occurrence
    + = one or more occurrences
    , = sequence of elements.
Document Type Definitions
Specification of element type
    "<!ELEMENT" <Name> <Content> ">"
Specification of attributes
    "<!ATTLIST" <ElementName> <AttributeName>
         <Content> <Type> ">"
Attribute type either #REQUIRED or #IMPLIED
(optional).
 <!ELEMENT Book (title, author*)>

 <!ELEMENT title (#PCDATA)>
 <!ELEMENT author (name, address, age?)>

 <!ATTLIST Book id ID #REQUIRED>
 <!ATTLIST Book pub IDREF #IMPLIED>
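A content model such as (title, author*) is a regular expression over the
names of an element's children. The Python sketch below makes that concrete.
It is not a DTD validator: the two patterns are translated by hand from the
declarations above, and children_match is a hypothetical helper. Python's
standard library does not validate against DTDs, so a real application would
use a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models; each child name is followed by a comma.
BOOK_MODEL = r"title,(author,)*"           # (title, author*)
AUTHOR_MODEL = r"name,address,(age,)?"     # (name, address, age?)

def children_match(element, pattern):
    """Check the sequence of child tag names against a content-model regex."""
    names = "".join(child.tag + "," for child in element)
    return re.fullmatch(pattern, names) is not None

two_authors = ET.fromstring(
    "<Book><title>T</title><author/><author/></Book>")
no_authors = ET.fromstring("<Book><title>T</title></Book>")
wrong_order = ET.fromstring("<Book><author/><title>T</title></Book>")
author_no_age = ET.fromstring("<author><name/><address/></author>")
author_two_ages = ET.fromstring(
    "<author><name/><address/><age/><age/></author>")
```

Zero authors is accepted because of the *, a missing age because of the ?,
while a wrong order or a second age is rejected.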
Document Type Definitions
ID: domain with unique values within the given
document.
IDREF: references one ID.
IDREFS: references a list of IDs.
Example
   <Book id="book1" pub="book5" . . .>
   ...
    <Book id="book5" pub="book4" . . .>


Document Type Definitions
Document type contains all corresponding element
types:
 "<!DOCTYPE" <Name> "[" <ElementTypes> "]>"
Use of DTD by some document:
   reference DTD in document opening line
   standalone="no".
Example
 <?xml version="1.0" standalone="no"?>
 <!DOCTYPE Book SYSTEM "Book.dtd">
 Example DTD: Product Catalog
<!DOCTYPE CATALOG [
<!ELEMENT CATALOG (PRODUCT+)>
<!ELEMENT PRODUCT (SPECIFICATIONS+,OPTIONS?,PRICE+,NOTES?)>
<!ATTLIST PRODUCT NAME CDATA #IMPLIED
  CATEGORY (HandTool|Table|Shop-Professional) "HandTool"
  PARTNUM CDATA #IMPLIED
  PLANT (Pittsburgh|Milwaukee|Chicago) "Chicago"
  INVENTORY (InStock|Backordered|Discontinued) "InStock">
<!ELEMENT SPECIFICATIONS (#PCDATA)>
<!ATTLIST SPECIFICATIONS WEIGHT CDATA #IMPLIED
 POWER CDATA #IMPLIED>
<!ELEMENT OPTIONS (#PCDATA)>
<!ATTLIST OPTIONS FINISH (Metal|Polished|Matte) "Matte"
 ADAPTER (Included|Optional|NotApplicable) "Included"
 CASE (HardShell|Soft|NotApplicable) "HardShell">
<!ELEMENT PRICE (#PCDATA)>
<!ATTLIST PRICE MSRP CDATA #IMPLIED
 WHOLESALE CDATA #IMPLIED
 STREET CDATA #IMPLIED
 SHIPPING CDATA #IMPLIED>
<!ELEMENT NOTES (#PCDATA)> ]>
XML Schema
The successor of DTDs to specify a schema for XML
documents.
A W3C standard.
Includes and extends functionality of DTDs.
In particular, XML Schemas support data types. This
makes it easier to validate the correctness of data and
to work with data from a database.
XML Schemas are written in XML. You don't have to
learn a new language and can use your XML parser to
parse your Schema files.

Simple Elements
Simple elements contain only text.
They can have one of the built-in datatypes:
xs:string, xs:decimal, xs:integer, xs:boolean
xs:date, xs:time.
Example
    <xs:element name="lastname" type="xs:string"/>
    <xs:element name="age" type="xs:integer"/>

    <xs:element name="dateborn" type="xs:date"/>




Simple Elements
Restrictions allow you to further constrain the content
of simple elements.


<xs:element name="age">
 <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="120"/>
    </xs:restriction>
 </xs:simpleType>
</xs:element>
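Since a schema is itself XML, an ordinary XML parser can read the restriction.
The Python sketch below is not a schema validator: it only reads the two
facets from the age declaration above and applies them to candidate values.
The xmlns:xs declaration is added so the fragment parses on its own, and
read_bounds and age_is_valid are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

SCHEMA_FRAGMENT = """
<xs:element name="age" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="120"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
"""

def read_bounds(schema_text):
    """Extract the minInclusive/maxInclusive facets as a (low, high) pair."""
    root = ET.fromstring(schema_text)
    restriction = root.find(f"{{{XS}}}simpleType/{{{XS}}}restriction")
    low = int(restriction.find(f"{{{XS}}}minInclusive").get("value"))
    high = int(restriction.find(f"{{{XS}}}maxInclusive").get("value"))
    return low, high

def age_is_valid(text, bounds):
    """Accept text only if it is an integer within the inclusive bounds."""
    try:
        value = int(text.strip())
    except ValueError:
        return False
    low, high = bounds
    return low <= value <= high

bounds = read_bounds(SCHEMA_FRAGMENT)
```

A DTD could only declare age as #PCDATA, so neither the integer check nor the
range check could be expressed there.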

Attributes
Attributes can be specified using the attribute element:
    <xs:attribute name="xxx" type="yyy"/>
Attribute declarations are nested within the declaration of the
element with which they are associated.
By default, attributes are optional.
To make an attribute mandatory, use
<xs:attribute name="lang" type="xs:string" use="required"/>
Attributes can have the same built-in datatypes as
simple elements.



Complex Elements
Complex elements can contain other elements and can
have attributes.
Nested elements need to occur in the order specified.
The number of repetitions of an element is controlled
by the attributes minOccurs and maxOccurs. Both
default to 1.
A complex element with an attribute:
<xs:element name="product">
    <xs:complexType>
       <xs:attribute name="prodid" type="xs:positiveInteger"/>
    </xs:complexType>
</xs:element>
 Complex Elements
  A complex element containing a sequence of nested
  (simple) elements:

<xs:element name="employee">
   <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
      </xs:sequence>
   </xs:complexType>

</xs:element>


 Complex Elements
  If you name the complex element, other elements can
  reference and include it:

<xs:complexType name="persontype">
  <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>


<xs:element name="person" type="persontype"/>

 XML Document With Schema
  An XML document that uses a schema has to
  reference the schema in the schemaLocation
  attribute of its root element :
<?xml version="1.0"?>
<note xmlns="http://www.w3schools.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
 Example XML Schema
<schema version="1.0"
  xmlns="http://www.w3.org/1999/XMLSchema">
<element name="author" type="string" />
<element name="date" type="date" />
<element name="abstract">
  <type> … </type>
</element>
<element name="paper">
  <type>
    <attribute name="keywords" type="string"/>
    <element ref="author" minOccurs="0"
      maxOccurs="*" />
    <element ref="date" />
    <element ref="abstract" minOccurs="0"
      maxOccurs="1" />
    <element ref="body" />
  </type>
</element>
</schema>

XML vs. Semistructured Data
Both described best by a graph.
Both are schema-less, self-describing
(XML without DTD / XML schema).
XML is ordered, semistructured data is not.
XML can mix text and elements:
<talk> Making Java easier to type and easier to type
      <speaker> Phil Wadler </speaker>
</talk>
XML has lots of other stuff: attributes, entities,
processing instructions, comments.
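Mixed content shows up directly in a parser's API. In Python's standard
xml.etree.ElementTree module (used here only as an illustration), the text
before the first child of talk is stored on the talk element itself, next to
its speaker child. The variable names are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

talk = ET.fromstring(
    "<talk> Making Java easier to type and easier to type"
    " <speaker> Phil Wadler </speaker></talk>"
)

# Text directly inside <talk>, before its first child element.
leading_text = talk.text.strip()
# Text inside the nested <speaker> element.
speaker = talk.find("speaker").text.strip()
# All text in document order, with whitespace normalized.
all_text = " ".join("".join(talk.itertext()).split())
```

In the graph model of semistructured data an interior node has only labeled
arcs to children, so there is no natural place for such free text.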

Summary
Due to their variable and complex structure, Web
documents cannot naturally be modeled using
the relational model.
The Semistructured Data Model is a self-
describing data model providing sufficient
flexibility for representing Web documents.
One of the weaknesses of the Web is that
(HTML) documents cannot be processed
automatically.
The purpose of XML is to provide a way of
recording the semantics of Web documents and
their components. For this sake, XML allows you
to define your application-specific tags.
Summary
XML documents are lists of elements and
attributes. Elements can be nested to form tree-
like structures.
Non-hierarchical structures are also possible.
Document type definitions (DTDs) are similar to
but less restrictive than DB schemas, specifying
rules that corresponding XML documents have
to satisfy.
XML schemas are a more recent and more DB-
like extension of DTDs.

