XML Data and Technologies


  Chapter 30.2.2, 30.3.2 - 30.3.5,
  30.4, 30.5 (but 30.5.1, 30.5.2,
         30.5.4 – 30.5.6)
XML Document Modeling
   XML vocabularies:
       Are designed for a specific type of content
       Conform to general XML standards
       All documents from the same application must
        also conform to a consistent set of rules
   Modeling tools:
       XML Document Type Definitions (DTDs)
       XML schemas


                      Marina G. Erechtchoukova         2
Document Type Definitions
   DTD forms a vocabulary:
       Defines a set of allowable elements
       Defines a set of allowable attributes
   DTD forms the grammar:
       Content model is a pattern that
        determines where elements and attributes
        may appear
   DTD facilitates content management
    Declaring a DTD
 Internal DTD – placed within the same
  document:
<!DOCTYPE root_element
[ declarations… ]>
 External DTD – all declarations are placed
  in a separate file with the extension ".dtd"
    Declaring an External DTD
 Private external DTD – stored locally on
  the server:
<!DOCTYPE root_element SYSTEM "URL">
 Public external DTD:
<!DOCTYPE root_element
                PUBLIC "FPI" "URL">
where FPI is a formal public identifier
Declaring Document Elements
<!ELEMENT element content-model>
 Content-model specifies the type of
  element content:
     Any – no restriction on content
     Empty – cannot store any content
     Character data – text string
     Elements – contains child (nested) elements
     Mixed – contains both character data and child
      elements
Element Definition – ANY Content
<!ELEMENT element_name ANY>
 Definition:

<!ELEMENT card ANY>
 Element appearance in XML document:
<card> <name> Toon Mermaid </name>
  <kind> Monster </kind></card>
Or
<card> Toon Mermaid </card>
Element Definition – EMPTY
Content
<!ELEMENT element_name EMPTY>
 Definition:

<!ELEMENT attack EMPTY>
 Element appearance in XML document:

     <attack></attack>
Or
           <attack/>
Element Definition – Character
Data Only Content
<!ELEMENT element_name (#PCDATA)>
Where PCDATA – parsed-character data
 Definition:

<!ELEMENT rarity (#PCDATA)>
 Element appearance in XML document:

     <rarity> Super rare foil</rarity>
NOT VALID appearance:
  <rarity> <class>Super rare</class>
           <type> foil</type></rarity>
Element Definition – Element
Content
<!ELEMENT element_name (child_elements)>

 Definition:
<!ELEMENT rarity (class)>
 Element appearance in XML document:
      <rarity><class> Super rare
  foil</class></rarity>
NOT VALID appearance:
  <rarity> <class>Super rare</class>
           <type> foil</type></rarity>
List of Child Elements
 Sequence is a list of elements with a
  defined order:
<!ELEMENT element_name
  (child1,…,childN)>
 Choice lists possible elements

<!ELEMENT element_name
  (child1|child2|…)>
Only one child-element is allowed
   Choice and sequence can be combined
Occurrence of Elements
   Modifying symbols:
       Allow zero or one - element?
       Allow one or more - element+
       Allow zero or more - element*
   Modifying symbols can be applied to a
    sequence or choice


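A content model built from sequences, choices, and occurrence symbols behaves much like a regular expression over the names of the child elements. The sketch below is only an illustration of that idea, not how a validating parser is implemented; the model (name, kind?, (attack | effect)+) is a hypothetical combination of element names used elsewhere in these slides.

```python
import re

# Hypothetical DTD content model: (name, kind?, (attack | effect)+)
# Each child name is followed by a space so that names cannot run together.
CARD_MODEL = re.compile(r"name (kind )?((attack|effect) )+")


def matches_model(child_names):
    """Return True if the ordered list of child names fits CARD_MODEL."""
    encoded = "".join(name + " " for name in child_names)
    return CARD_MODEL.fullmatch(encoded) is not None


in_order = matches_model(["name", "kind", "attack"])      # sequence respected
repeated = matches_model(["name", "effect", "effect"])    # "+" allows repeats
wrong_order = matches_model(["kind", "name", "attack"])   # order violated
missing = matches_model(["name"])                         # "+" needs at least one
```

The comma maps to concatenation, the bar to alternation, and ?, +, * keep their usual regular-expression meaning.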
Working with Mixed Content
<!ELEMENT element_name
  (#PCDATA|Child1|Child2|…)*>
 Definition:
<!ELEMENT rarity (#PCDATA|type)*>
 Element appearance in XML document:
<rarity> Super rare foil</rarity> Or
<rarity> <type> Super rare
  foil</type></rarity>
Limits control over the document structure
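To see what mixed content looks like to a program, the following sketch parses a rarity element that contains both character data and a type child. It uses Python's standard xml.etree.ElementTree, which does not validate against a DTD; the sample document is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Mixed content: character data and a child element side by side.
rarity = ET.fromstring("<rarity>Super <type>rare</type> foil</rarity>")

leading_text = rarity.text              # text before the first child
child = rarity[0]                       # the <type> element
child_text = child.text                 # text inside <type>
trailing_text = child.tail              # text after </type>
all_text = "".join(rarity.itertext())   # all character data in document order
```

Because text can appear before, between, and after child elements, the parser has to track it in several places, which is one reason mixed content is harder to control.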
Defining Element Attributes
   Attribute-list declaration specifies:
       The names of the attributes associated
        with a specific element
       The data type of each attribute
       Whether each attribute is required
        or optional
       A default value for the attribute
Attribute Declaration Syntax
<!ATTLIST element
     attribute1 type1 default1
     attribute2 type2 default2…>
Or
<!ATTLIST element attribute1 type1 default1>
<!ATTLIST element attribute2 type2 default2>
<!ATTLIST element attribute3 type3 default3>
…
Attribute Types
  String type – CDATA
   attribute CDATA
  Enumerated types
    attribute (value1|value2|…)
<!ATTLIST card kind (Magic|Trap) #REQUIRED>
  Tokenized types specify rules
   for the format and content
    attribute token
Attribute Tokens
Tokenized Type   Description
ID               Creates a unique identifier
IDREF            References the ID attribute of another element
ENTITY           Corresponds to the name of a single entity
NMTOKEN          Restricted form of string (a valid XML name token)
Others           IDREFS, ENTITIES, NMTOKENS
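The ID and IDREF rules can be pictured as two checks: every ID value is unique in the document, and every IDREF value matches some ID. A validating parser performs these checks from the DTD; the sketch below imitates them by hand with the standard library, and the attribute names id and ref, like the sample document, are assumptions.

```python
import xml.etree.ElementTree as ET


def check_ids(root, id_attr="id", idref_attr="ref"):
    """Return (duplicate ID values, IDREF values that match no ID)."""
    ids = [el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None]
    duplicates = sorted({value for value in ids if ids.count(value) > 1})
    known = set(ids)
    dangling = sorted(
        el.get(idref_attr)
        for el in root.iter()
        if el.get(idref_attr) is not None and el.get(idref_attr) not in known
    )
    return duplicates, dangling


# Hypothetical deck: "c2" is used twice as an ID and "c9" is never defined.
deck = ET.fromstring(
    '<deck><card id="c1"/><card id="c2" ref="c1"/><card id="c2" ref="c9"/></deck>'
)
duplicates, dangling = check_ids(deck)
```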
Entities
   An entity – a physical storage unit
   Entity reference – the way to refer
    to the storage unit
   Entities are used to:
       Refer to often-repeated text
       Include the content of external files
   Entity reference in XML documents:
       &entity_name;
    Defining Entities
 Internal entity:
  <!ENTITY entity_name "value">
 Example:
Definition:
<!ENTITY effect1 "Destroys one opponent’s
  monster">
In XML document: <effect>&effect1;</effect>
 External entity:
<!ENTITY entity_name SYSTEM "URL">
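The effect1 example can be tried directly: when an internal entity is declared in the internal DTD subset, the parser replaces the reference with the entity's value. This is a minimal sketch using Python's standard xml.etree.ElementTree; the surrounding card element is an assumption.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring the internal entity effect1.
DOC = """<!DOCTYPE card [
  <!ENTITY effect1 "Destroys one opponent's monster">
]>
<card><effect>&effect1;</effect></card>"""

card = ET.fromstring(DOC)
effect_text = card.find("effect").text   # the reference is already expanded
```

The application never sees &effect1; itself, only the replacement text.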
    Attribute Defaults
Default                  Description
#REQUIRED                The attribute must appear with every
                         occurrence of the element
#IMPLIED                 The attribute is optional
"default_value"          The attribute is optional. If the value is
                         not specified, the parser uses the default
                         value
#FIXED "default_value"   The attribute is optional. If the value is
                         specified, it must match the default
                         value
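The effect of a default value can be observed with a parser that reads the internal DTD subset. The sketch below uses Python's xml.parsers.expat, which by default reports attributes supplied by ATTLIST declarations as well as those written in the document. The deck document and the game attribute are hypothetical.

```python
import xml.parsers.expat

# kind has a default value; game is #FIXED. Both are optional in the document.
DOC = """<!DOCTYPE deck [
  <!ELEMENT deck (card*)>
  <!ELEMENT card EMPTY>
  <!ATTLIST card name CDATA #REQUIRED
                 kind (Magic|Trap) "Magic"
                 game CDATA #FIXED "toon">
]>
<deck><card name="A"/><card name="B" kind="Trap"/></deck>"""


def card_attributes(xml_text):
    """Collect the attributes the parser reports for each card element."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        if name == "card":
            seen.append(dict(attrs))

    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return seen


cards = card_attributes(DOC)
```

Expat does not validate, so it would not complain about a missing #REQUIRED attribute; it only fills in declared defaults.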
Merging XML Documents
   An XML document can be created from
    several XML documents
   Name collisions can occur; to resolve them:
       Declare a namespace in the document
       Assign a prefix to the namespace
       Apply the prefix to the corresponding
        elements
Namespaces
 Namespace – a defined collection of
  element and attribute names
 Declaring a namespace (as an attribute of an element):
<element xmlns:pr_name="URI">
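A short sketch of how prefixes resolve a name collision: two vocabularies both define card and name, and the namespace tells them apart. It uses Python's standard xml.etree.ElementTree. The URI http://deck/monster/ns appears in a later example in these slides; the second URI and the card names are assumptions.

```python
import xml.etree.ElementTree as ET

DOC = """<set xmlns:mc="http://deck/monster/ns" xmlns:sc="http://deck/spell/ns">
  <mc:card><mc:name>Toon Mermaid</mc:name></mc:card>
  <sc:card><sc:name>Example Spell</sc:name></sc:card>
</set>"""

# Prefix-to-URI map used only by the queries below.
NS = {"mc": "http://deck/monster/ns", "sc": "http://deck/spell/ns"}

root = ET.fromstring(DOC)
monster_names = [n.text for n in root.findall("mc:card/mc:name", NS)]
spell_names = [n.text for n in root.findall("sc:card/sc:name", NS)]
qualified_tag = root[0].tag   # ElementTree stores names as {URI}local
```

The prefix is only shorthand; what identifies the element is the pair of namespace URI and local name.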
Combining Several XML Documents
   Although namespaces can be used to
    distinguish elements from different XML
    documents in a given document, a new
    DTD must be created to check the
    validity of the document
   A DTD cannot be associated with a
    namespace
DTD Limitations
   Written in a different (non-XML) syntax:
    Extended Backus-Naur Form
   Do not work well with namespaces
   Limited data typing
   Limited control over mixed content
XML Schema
   A definition of a specific XML structure
   It is an XML document
   Defines:
       Each element type of the structure
       Each data type associated with the
        element type
   Schema dialects

     Creating XML Schema
   W3C Schema Working Group
   File written in the XML Schema dialect with
    the extension ".xsd"
   Single root element – schema
<xsd:schema
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
Element declarations
</xsd:schema>
Schema Element Types
   Simple type: the element contains no
    attributes or child elements
   Complex type: the element contains
    attributes and/or child elements
   A complex type is defined by a
    compositor and attribute declarations
Compositors
   Sequence compositor requires elements in
    the XML document to appear in the same
    order as in the schema
   Choice compositor specifies that only one
    of the elements in the list may be used in the
    XML document
   All compositor allows the elements
    to appear in the XML document in any order
   Compositors can be nested
Element Declaration. Simple type
<xsd:element name="element_name"
  type="xsd:data_type"/>
Where data_type can be:
 string
 decimal
 integer, positiveInteger, and others
 boolean
 date
 …
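A rough way to picture simple types is as parsers for element text: a value is valid for a type if the matching parser accepts it. The sketch below maps a few type names to Python parsers. It is an approximation made for illustration, since Python's parsers are looser than the XML Schema lexical rules in some corner cases, and it is not a schema validator.

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def parse_boolean(text):
    """xsd:boolean accepts the literals true, false, 1 and 0."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError("not a boolean literal: " + text)


# Approximate parsers for a few built-in simple types.
PARSERS = {
    "string": str,
    "decimal": Decimal,
    "integer": int,
    "boolean": parse_boolean,
    "date": date.fromisoformat,   # YYYY-MM-DD; time zones are ignored here
}


def is_valid(type_name, text):
    """Return True if text parses as the given simple type."""
    try:
        PARSERS[type_name](text.strip())
    except (ValueError, InvalidOperation):
        return False
    return True
```

For instance, is_valid("integer", "4.2") is rejected while is_valid("decimal", "4.2") is accepted, which is the kind of distinction a DTD cannot express.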
Element Declaration. Complex
type
<xsd:element name="element_name">
 <xsd:complexType>
    <xsd:compositor>
    element declarations
    </xsd:compositor>
    attribute declarations
 </xsd:complexType>
</xsd:element>
Element Cardinality
   The number of possible occurrences
   The minimum number:
       minOccurs
   The maximum number:
       maxOccurs
   Default value is 1 for both attributes
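The cardinality rule amounts to counting how often a child element occurs and comparing the count with minOccurs and maxOccurs. A minimal sketch of that check follows; the helper occurs_ok and the sample card are hypothetical, and None stands in for the schema value "unbounded".

```python
import xml.etree.ElementTree as ET


def occurs_ok(parent, child_name, min_occurs=1, max_occurs=1):
    """Check the child count against the bounds; both default to 1."""
    count = len(parent.findall(child_name))
    if count < min_occurs:
        return False
    return max_occurs is None or count <= max_occurs


card = ET.fromstring(
    "<card><name>Toon Mermaid</name><effect>a</effect><effect>b</effect></card>"
)
name_ok = occurs_ok(card, "name")                          # exactly one
effect_default = occurs_ok(card, "effect")                 # two exceeds the default
effect_unbounded = occurs_ok(card, "effect", 0, None)      # zero or more
attack_required = occurs_ok(card, "attack")                # missing, minimum is 1
```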
Declaring Attributes
   An attribute must be declared along with the
    element it belongs to
<xsd:attribute name="attr_name"
 type="xsd:data_type"
 use="Is_required?"
 default="default_value"
 fixed="fixed_value" />
   Is_required can be: required, optional,
    prohibited
   default and fixed cannot be used together
Elements with Simple Content
and Attributes
<xsd:element name="element_name">
 <xsd:complexType>
     <xsd:simpleContent>
     <xsd:extension base="data_type">
     attribute declarations
     </xsd:extension>
     </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
Using Schemas in Combined
Documents
   To attach the schemas to different parts
    of the document:
       Add the XML Schema-instance namespace
        to the document’s root element
       Assign namespaces to the different parts of
        the document
       Add a schemaLocation attribute to the parent
        element of each part
    Example:
<?xml version="1.0"?>
<mc:set
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:mc="http://deck/monster/ns"
 xsi:schemaLocation="URI schema">
 <card>
 …
 </mc:set>
Structuring a Schema
   Russian Doll design – set of nested
    declarations
   Flat Catalog design – all element
    declarations are made globally and
    used through references:
    <xsd:element ref="element_name"/>
        Displaying Contents of XML
        Documents
   Cascading Style Sheets have limitations:
       Can’t change the format of the content (like date
        format)
       Can’t add additional text
       Hard to work with images and hyperlinks
       Are applied to elements, but not attributes
   Extensible Stylesheet Language (XSL):
       XSL Formatting Objects (page layout and design)
       XSL Transformations (transforms XML content into
        another presentation format)
       XPath (locates information and performs operations)
XSLT Style Sheets
   Convert a source document (XML
    document) into a result document
   Transformation is performed by an XSLT
    processor:
       Server-side transformation
       Client-side transformation
   Browser support for XSLT:
       Internet Explorer 6.0 fully supports the
        W3C XSLT specification
XSLT Transformation
 XSLT can create a result document
  as an HTML, XML, or text file.
<xsl:output method="html"
  version="4.0"/>
 Some XSLT processors generate
  result documents according to this
  output specification.
  Creating XSLT
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl=
"http://www.w3.org/1999/XSL/Transform">
     Content
</xsl:stylesheet>
 An XSLT file has the extension ".xsl"
XSLT Content
   Template – a set of elements that defines
    how a part of the source document should be
    transformed in the result document
Template
<xsl:template match="node">
    XSLT and literal result elements
</xsl:template>
Where node is either "/", "name", or an XPath
 expression
Template
   XSLT elements – commands to XSLT
    processor
   Literal result element – text sent to the
    result document, but not processed by
    XSLT processor:
       The example – HTML tags



Referencing Parts of XML
Documents
   XPath is the language for referencing the
    content of an XML document
   An XML document is viewed as a node tree
   Path syntax is similar to Unix or DOS file
    paths
   Reaching elements:
       /child1/child2
   Reaching an attribute of child2:
       /child1/child2/@attribute
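The path notation can be tried with Python's standard xml.etree.ElementTree, which supports a limited subset of XPath. Its paths are relative to the element they are called on, and attributes are read with get() rather than selected with @. The deck document below is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<deck>
  <card id="c1"><name>Toon Mermaid</name><kind>Monster</kind></card>
  <card id="c2"><name>Second Card</name><kind>Trap</kind></card>
</deck>"""

deck = ET.fromstring(DOC)

# Corresponds to the XPath /deck/card/name (relative to the root element here).
names = [n.text for n in deck.findall("card/name")]

# Corresponds to /deck/card/@id: select the elements, then read the attribute.
ids = [c.get("id") for c in deck.findall("card")]

# A predicate narrows the selection: cards whose kind child is "Trap".
trap_names = [c.findtext("name") for c in deck.findall("card[kind='Trap']")]
```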
Working with XSLT
   Inserting the value of a node into the result
    document:
<xsl:value-of
 select="XPath_expression"/>
Applying XSLT to an XML document:
<?xml-stylesheet type="text/xsl"
 href="URL" ?>
Querying XML Data
   XML Query Working Group
   Basic Documents:
       XML Query Requirements
       XML Query Data Model
       XML Query Algebra
       XQuery



Query Requirements
   Language must be declarative
   Language must be independent of any protocol
   Data model must:
       Represent XML 1.0 data and the data types
        of the XML Schema specification
       Support references within and outside of
        the document
   The language must support specific
    operations
XML Query Data Model
   A node – basic construct
   Node types:
       Document,
       Element,
       Value,
       Attribute,
       Namespace,
       Processing instruction (PI) ,
       Comment

Node Document
   Root node is connected to all the nodes
    that are reachable directly or indirectly
    from it
   Connected nodes form a tree
   Every node belongs to exactly one tree
   Every tree has exactly one root node


XML Query Algebra
       Projection
       Selection
       Iteration
       Join
       Sorting
       Aggregation


XQuery
   Is applied to XML
   Returns a sequence of XML fragments
    or atomic values
   XQuery relies on XPath and XML
    Schema data types.
   XQuery is not an XML language


XQuery Expressions
   Path expressions
   Element constructors
   FLWOR ("flower") expressions
   List expressions
   Conditional expressions
   Quantified expressions
   Datatype expressions
FLWOR Expression
   For-Let-Where-Order by-Return
   Similar to SELECT-FROM-WHERE-GROUP BY
    in SQL
   For – binds values to one or more
    variables using a path expression; used
    when iteration is required:
       Generates a list of bindings per variable
FLWOR Expression (cont…)
   Let – binds values to one or more
    variables without iteration:
       Single binding per variable
   Where – specifies a qualification condition
   Order by – sorts the bindings
   Return – constructs the output of the
    expression:
       Node
       Set of nodes
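The clauses of a FLWOR expression map naturally onto an ordinary loop. The sketch below mirrors a small query in Python; it illustrates the evaluation order only and is not an XQuery processor. The deck document, the attack element, and its values are assumptions.

```python
import xml.etree.ElementTree as ET

# The XQuery being imitated (assuming a hypothetical deck.xml):
#   for $c in doc("deck.xml")/deck/card
#   let $n := $c/name
#   where $c/attack > 1000
#   order by $n
#   return $n
DOC = """<deck>
  <card><name>Toon Mermaid</name><attack>1400</attack></card>
  <card><name>Blue Dragon</name><attack>3000</attack></card>
  <card><name>Small Imp</name><attack>500</attack></card>
</deck>"""


def strong_card_names(deck, threshold):
    """Names of cards with attack above threshold, in sorted order."""
    selected = []
    for card in deck.findall("card"):             # for: one binding per card
        name = card.findtext("name")              # let: single binding
        attack = int(card.findtext("attack"))
        if attack > threshold:                    # where: qualification
            selected.append(name)
    return sorted(selected)                       # order by, then return


result = strong_card_names(ET.fromstring(DOC), 1000)
```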
XML and Databases
   Databases store data for machine
    processing
   Data-centric model:
       Data is stored in a database
       Data is transferred as XML documents
   Document-centric model:
       Documents are designed for human
        consumption

Storing XML to and Retrieving XML
from a Database
   Documents with simple recordsets:
       Relational mapping
   Documents with hierarchical recordsets:
       Object-relational mapping
       Object-oriented mapping
   Schema-independent representation:
       Relations are used to describe the structure of
        the XML document
Converting a Relation into XML
Document
[Figure: Root/Relation]
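One common way to carry out this conversion is to make the relation the root element, each tuple a child element, and each column a grandchild holding the value. The sketch below assumes that layout; the element name row, the helper relation_to_xml, and the sample data are all hypothetical.

```python
import xml.etree.ElementTree as ET


def relation_to_xml(relation_name, columns, rows):
    """Build <relation><row><column>value</column>...</row>...</relation>."""
    root = ET.Element(relation_name)
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            ET.SubElement(row_el, column).text = str(value)
    return root


cards = relation_to_xml(
    "cards",
    ["name", "kind"],
    [("Toon Mermaid", "Monster"), ("Example Trap", "Trap")],
)
xml_text = ET.tostring(cards, encoding="unicode")
```

Column names must be valid XML names for this to produce a well-formed document; a real tool would also have to handle NULL values and escaping rules for names.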
SQL and XML
   SQL/XML Standard
       Oracle XML
   XML SQL Utility:
       Directly serves and stores XML from the database
       Takes SQL queries and generates XML documents
        from the results
       Flexible XML output can be produced as text or as
        DOM trees
       Generates DTDs and XML schemas from database
        schemas