					          DTD
(Document Type Definition)
     Imposing Structure on
       XML Documents
     (W3Schools on DTDs)

                             1
             Motivation
• A DTD adds syntactic requirements on top of
  the well-formedness requirement
• It helps in eliminating errors when
  creating or editing XML documents
• It clarifies the intended semantics
• It simplifies the processing of XML
  documents

                                            2
              An Example
• In an address book, where can a phone
  number appear?
  – Under <person>, under <name> or under
    both?
• If we have to check for all possibilities,
  processing takes longer and it may not
  be clear to whom a phone number belongs



                                               3
  Document Type Definitions

• Document Type Definitions (DTDs)
  impose structure on XML documents
• There is some relationship between a
  DTD and a schema, but it is not close –
  hence the need for additional “typing”
  systems (XML schemas)
• The DTD is a syntactic specification

                                            4
   Example: An Address Book
<person>
   <name> Homer Simpson </name>                  <!-- exactly one name -->
   <greet> Dr. H. Simpson </greet>               <!-- at most one greeting -->
   <addr>1234 Springwater Road </addr>           <!-- as many address lines -->
   <addr> Springfield USA, 98765 </addr>         <!-- as needed (in order) -->
   <tel> (321) 786 2543 </tel>                   <!-- mixed telephones -->
   <fax> (321) 786 2544 </fax>                   <!-- and faxes -->
   <tel> (321) 786 2544 </tel>
   <email> homer@math.springfield.edu </email>   <!-- as many as needed -->
</person>
                                                             5
    Specifying the Structure


• name          to specify a name element
• greet?        to specify an optional
                (0 or 1) greet element
• name, greet? to specify a name followed by
               an optional greet



                                               6
      Specifying the Structure
              (cont’d)

• addr*       to specify 0 or more address
              lines
• tel | fax   a tel or a fax element
• (tel | fax)* 0 or more repeats of tel or fax
• email*      0 or more email elements



                                                 7
    Specifying the Structure
            (cont’d)
• So the whole structure of a person entry
  is specified by

     name, greet?, addr*, (tel | fax)*, email*


• This is known as a regular expression
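Because the content model is a regular expression over element names, one way to try it out is to encode the child tags of a person as a string and hand that string to an ordinary regex engine. The sketch below assumes such an encoding (each tag name followed by one space). The names PERSON_MODEL and matches_person are made up for illustration, and this is not how a real validating parser is implemented.

```python
import re

# Assumed encoding: each child element name is followed by one space,
# so the sequence [name, addr, tel] becomes "name addr tel ".
# The pattern mirrors: name, greet?, addr*, (tel | fax)*, email*
PERSON_MODEL = re.compile(r"name (greet )?(addr )*((tel|fax) )*(email )*")


def matches_person(children):
    """Return True if the list of child tag names fits the person model."""
    encoded = "".join(tag + " " for tag in children)
    return PERSON_MODEL.fullmatch(encoded) is not None


# The children of the Homer Simpson entry, in document order.
homer = ["name", "greet", "addr", "addr", "tel", "fax", "tel", "email"]
print(matches_person(homer))              # True
print(matches_person(["greet", "name"]))  # False: name must come first
```

Only the order and number of child elements is checked here; text content and attributes are ignored.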


                                                 8
       Element Type Definition
• For each element type E, a declaration of the form:
             <!ELEMENT E P>
  where P is a regular expression, i.e.,
    P ::= EMPTY | ANY | #PCDATA | E’ |
          P1, P2 | P1 | P2 | P? | P+ | P*
   –   E’: element type
   –   P1 , P2: concatenation
   –   P1 | P2: disjunction
   –   P?: optional
   –   P+: one or more occurrences
   –   P*: the Kleene closure


                                                   9
Summary of Regular Expressions

• A         The tag (i.e., element) A occurs
• e1,e2     The expression e1 followed by
            e2
•   e*      0 or more occurrences of e
•   e?      Optional: 0 or 1 occurrences
•   e+      1 or more occurrences
•   e1 | e2 either e1 or e2
•   (e)     grouping
                                           10
The Definition of an Element Consists of
     Exactly One of the Following
 • A regular expression (as defined
   earlier)
 • EMPTY means that the element has no
   content
 • ANY means that content can be any
   mixture of PCDATA and elements
   defined in the DTD
 • Mixed content which is defined as
   described on the next slide
 • (#PCDATA)
                                         11
 The Definition of Mixed Content
• Mixed content is described by a
  repeatable OR group
     (#PCDATA | element-name | …)*
  – Inside the group, no regular expressions –
    just element names
  – #PCDATA must come first, followed by 0 or
    more element names, separated by |
  – The group can be repeated 0 or more
    times
                                                 12
 An Address-Book XML Document
      with an Internal DTD
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!ELEMENT person
     (name, greet?, address*, (fax | tel)*, email*)>
  <!ELEMENT name    (#PCDATA)>
  <!ELEMENT greet   (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT tel     (#PCDATA)>
  <!ELEMENT fax     (#PCDATA)>
  <!ELEMENT email   (#PCDATA)>
]>
• The name of the DTD is addressbook
• The syntax of a DTD is not XML syntax
• “Internal” means that the DTD and the
  XML document are in the same file
                                       13
      The Rest of the
Address-Book XML Document

<addressbook>
 <person>
     <name> Jeff Cohen </name>
     <greet> Dr. Cohen </greet>
     <email> jc@penny.com </email>
 </person>
</addressbook>

                                     14
       Regular Expressions
• Each regular expression determines a
  corresponding finite-state automaton
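• Let’s start with a simpler example:
           name, addr*, email
  [Figure: automaton with a name transition out of the start state,
   an addr loop on the middle state, and an email transition into the
   accepting state; a double circle denotes an accepting state]
 This suggests a simple parsing program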
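A minimal sketch of such a parsing program for name, addr*, email follows. The state numbering, the TRANSITIONS table and the function name accepts are assumptions made for this illustration; the slide only shows the automaton as a picture.

```python
# States: 0 = start, 1 = after name (addr loops here), 2 = accepting.
TRANSITIONS = {
    (0, "name"): 1,
    (1, "addr"): 1,
    (1, "email"): 2,
}
ACCEPTING = {2}


def accepts(tags):
    """Run the automaton over a sequence of child tag names."""
    state = 0
    for tag in tags:
        key = (state, tag)
        if key not in TRANSITIONS:
            return False          # no transition: the sequence is rejected
        state = TRANSITIONS[key]
    return state in ACCEPTING


print(accepts(["name", "addr", "addr", "email"]))  # True
print(accepts(["name", "email"]))                  # True
print(accepts(["name", "addr"]))                   # False: email is missing
```

Each child tag is looked at exactly once, so the check runs in time linear in the number of children.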
                                             15
    Another Example
name,address*,(tel | fax)*,email*
[Figure: automaton for this expression, with transitions labeled name, address, tel, fax and email]




                                            16
Some Things are Hard to Specify
Each employee element should contain name,
age and ssn elements in some order

<!ELEMENT employee
  ( (name, age, ssn) | (age, ssn, name) |
    (ssn, name, age) | ...
  )>

Suppose that there were many more fields!

                                             17
Some Things are Hard to Specify
           (cont’d)

<!ELEMENT employee
  ( (name, age, ssn) | (age, ssn, name) |
    (ssn, name, age) | ...
  )>
Suppose there were many more fields!
       There are n! different
        orders of n elements
       It is not even polynomial
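To get a feel for the growth, the following sketch generates the "any order" content model mechanically and counts its alternatives. The helper name any_order_model is hypothetical; the field names are the ones from the employee example.

```python
from itertools import permutations
from math import factorial


def any_order_model(fields):
    """Build a DTD content model that allows the fields in every order."""
    groups = ["(" + ", ".join(order) + ")" for order in permutations(fields)]
    return "(" + " | ".join(groups) + ")"


model = any_order_model(["name", "age", "ssn"])
print(model)
print(model.count("|") + 1)   # 6 alternatives for 3 fields

# The number of alternatives is n!, which outgrows any polynomial in n.
for n in (3, 5, 10):
    print(n, factorial(n))
```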
                                            18
 Specifying Attributes in the DTD
<!ELEMENT height (#PCDATA)>
<!ATTLIST height
   dimension CDATA #REQUIRED
   accuracy CDATA #IMPLIED >


The dimension attribute is required
The accuracy attribute is optional

CDATA is the “type” of the attribute – it means
“character data,” and may take any literal string
as a value
                                                    19
The Format of an Attribute Definition

• <!ATTLIST element-name attr-name
  attr-type default-value>
• The default value is given inside quotes
• attribute types:
  – CDATA
  – ID, IDREF, IDREFS
  –…

                                             20
         Summary of Attribute
           Default Values
• #REQUIRED means that the attribute must
  be included in the element
• #IMPLIED means that the attribute is
  optional
• #FIXED "value"
  – The given value (inside quotes) is the only
    possible one
• "value"
  – The default value of the attribute if none is given
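The four kinds of default can be summarized as a small sketch of what a validating processor might do with the attributes of one element. The table format and the function name effective_attributes are assumptions for illustration. The dimension and accuracy attributes come from the height example; source and version are hypothetical additions so that all four cases appear.

```python
# Attribute name -> (kind, value); kind is one of
# "REQUIRED", "IMPLIED", "FIXED" or "DEFAULT".
HEIGHT_ATTRS = {
    "dimension": ("REQUIRED", None),
    "accuracy": ("IMPLIED", None),
    "source": ("DEFAULT", "measured"),   # hypothetical attribute
    "version": ("FIXED", "1.0"),         # hypothetical attribute
}


def effective_attributes(given, decls):
    """Return (attributes after defaulting, list of error messages)."""
    result = dict(given)
    errors = []
    for name, (kind, value) in decls.items():
        if kind == "REQUIRED" and name not in given:
            errors.append(f"missing required attribute {name}")
        elif kind == "FIXED":
            if name in given and given[name] != value:
                errors.append(f"attribute {name} must be {value}")
            result.setdefault(name, value)
        elif kind == "DEFAULT":
            result.setdefault(name, value)
    for name in given:
        if name not in decls:
            errors.append(f"undeclared attribute {name}")
    return result, errors


print(effective_attributes({"dimension": "cm"}, HEIGHT_ATTRS))
print(effective_attributes({"accuracy": "0.1"}, HEIGHT_ATTRS))
```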


                                                          21
           Recursive DTDs
<!DOCTYPE genealogy [
   <!ELEMENT genealogy (person*)>
   <!-- the two person children are the mother and the father, in that order -->
   <!ELEMENT person (
             name,
             dateOfBirth,
             person,
             person )>
   ...
]>
What is the problem with this?
Each person should have a father and a mother. This leads to either
infinite data or a person that is a descendant of herself.
A parser does not notice it!
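Although a parser does not notice the problem, it can be detected with a simple fixed-point computation over the DTD. The sketch below is an assumption-laden simplification: it records, for each element type, only the children that its content model requires at least once, and it does not handle choices. The declarations of name and dateOfBirth are hidden behind the "..." in the DTD, so they are assumed here to be (#PCDATA).

```python
# Children required at least once (optional and starred parts are left out).
REQUIRED_CHILDREN = {
    "genealogy": [],                                   # (person*)
    "person": ["name", "dateOfBirth", "person", "person"],
    "name": [],                                        # assumed (#PCDATA)
    "dateOfBirth": [],                                 # assumed (#PCDATA)
}


def finite_types(required):
    """Return the element types that can occur in some finite document."""
    finite = set()
    changed = True
    while changed:
        changed = False
        for elem, children in required.items():
            if elem not in finite and all(c in finite for c in children):
                finite.add(elem)
                changed = True
    return finite


print("person" in finite_types(REQUIRED_CHILDREN))   # False

# With person?, person? the two person children are no longer required.
relaxed = dict(REQUIRED_CHILDREN, person=["name", "dateOfBirth"])
print("person" in finite_types(relaxed))             # True
```

A person element can never be completed under the first DTD, so the only valid genealogy is the empty one.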
                                                  22
     Recursive DTDs (cont’d)
<!DOCTYPE genealogy [
   <!ELEMENT genealogy (person*)>
   <!-- the optional person children are the mother and the father, in that order -->
   <!ELEMENT person (
             name,
             dateOfBirth,
             person?,
             person? )>
   ...
]>
What is now the problem with this?
If a person only has a father, how can you tell that he has a father
and does not have a mother?

                                                  23
Using ID and IDREF Attributes
 <!DOCTYPE family [
 <!ELEMENT family (person)*>
 <!ELEMENT person (name)>
 <!ELEMENT name (#PCDATA)>
 <!ATTLIST person
           id       ID   #REQUIRED
           mother IDREF #IMPLIED
           father IDREF #IMPLIED
           children IDREFS #IMPLIED>
]>

                                       24
              IDs and IDREFs
• ID attribute: unique within the entire document.
   – An element can have at most one ID attribute.
   – No default (fixed default) value is allowed.
      • #REQUIRED: a value must be provided
      • #IMPLIED: a value is optional
   – The value must be an XML name, so it cannot start with a digit.
• IDREF attribute: its value must be the ID value of some
  element in the document.
• IDREFS attribute: its value is a set, each element of
  the set is the ID value of some element in the
  document.
  <person id="p898" father="p332" mother="p336"
       children="p982 p984 p986">


                                                          25
       Some Conforming Data
<family>
    <person id="lisa" mother="marge" father="homer">
        <name> Lisa Simpson </name>
    </person>
    <person id="bart" mother="marge" father="homer">
        <name> Bart Simpson </name>
    </person>
    <person id="marge" children="bart lisa">
        <name> Marge Simpson </name>
    </person>
    <person id="homer" children="bart lisa">
        <name> Homer Simpson </name>
    </person>
</family>

                                                       26
ID References do not Have Types
• The attributes mother and father are
  references to IDs of other elements
• However, those are not necessarily
  person elements!
• The mother attribute is not necessarily a
  reference to a female person
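Since the DTD cannot express the type of a reference target, an application has to check it itself. The sketch below uses a hypothetical document in which a pet element (not part of the family DTD) also carries an id, and the function name untyped_reference_problems is made up. ElementTree does not read attribute types from a DTD, so the code simply assumes that the attribute called id is the ID attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: lisa's father attribute points at a pet element.
DOC = """
<family>
  <person id="lisa" mother="marge" father="spot">
    <name> Lisa Simpson </name>
  </person>
  <person id="marge" children="lisa">
    <name> Marge Simpson </name>
  </person>
  <pet id="spot"/>
</family>
"""


def untyped_reference_problems(root, ref_attrs=("mother", "father"),
                               target_tag="person"):
    """List references whose target exists but is not a target_tag element."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    problems = []
    for el in root.iter():
        for attr in ref_attrs:
            ref = el.get(attr)
            if ref in by_id and by_id[ref].tag != target_tag:
                problems.append((el.get("id"), attr, by_id[ref].tag))
    return problems


print(untyped_reference_problems(ET.fromstring(DOC)))
# [('lisa', 'father', 'pet')]
```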



                                          27
     An Alternative Specification
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE family [
   <!ELEMENT family (person)*>
   <!ELEMENT person (name, mother?, father?, children?)>
   <!ATTLIST person id ID #REQUIRED>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT mother EMPTY>
   <!ATTLIST mother idref IDREF #REQUIRED>
   <!ELEMENT father EMPTY>
   <!ATTLIST father idref IDREF #REQUIRED>
   <!ELEMENT children EMPTY>
   <!ATTLIST children idrefs IDREFS #REQUIRED>
]>

                                                           28
                The Revised Data
<family>
  <person id="marge">
    <name> Marge Simpson </name>
    <children idrefs="bart lisa"/>
  </person>
  <person id="homer">
    <name> Homer Simpson </name>
    <children idrefs="bart lisa"/>
  </person>
  <person id="bart">
    <name> Bart Simpson </name>
    <mother idref="marge"/>
    <father idref="homer"/>
  </person>
  <person id="lisa">
    <name> Lisa Simpson </name>
    <mother idref="marge"/>
    <father idref="homer"/>
  </person>
</family>
                                                              29
    Consistency of ID and IDREF
          Attribute Values
• If an attribute is declared as ID
   – The associated value must be distinct, i.e., different
     elements (in the given document) must have
     different values for the ID attribute (no confusion)
      • Even if the two elements have different element names
• If an attribute is declared as IDREF
   – The associated value must exist as the value of
     some ID attribute (no dangling “pointers”)
• Similarly for all the values of an IDREFS
  attribute
• ID, IDREF and IDREFS attributes are not typed
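The two consistency rules (distinct ID values, no dangling references) can be sketched as follows. The function name check_id_consistency is hypothetical, and the attribute names id, mother, father and children are taken from the earlier family DTD; a validating parser would learn them from the ATTLIST declarations instead of having them passed in.

```python
import xml.etree.ElementTree as ET


def check_id_consistency(root, id_attr="id",
                         idref_attrs=("mother", "father"),
                         idrefs_attrs=("children",)):
    """Return a list of ID/IDREF/IDREFS violations found in the tree."""
    errors = []
    seen = set()
    for el in root.iter():                 # rule 1: ID values are distinct
        value = el.get(id_attr)
        if value is None:
            continue
        if value in seen:
            errors.append(f"duplicate ID {value}")
        seen.add(value)
    for el in root.iter():                 # rule 2: no dangling references
        refs = [el.get(a) for a in idref_attrs if el.get(a) is not None]
        for a in idrefs_attrs:
            refs.extend((el.get(a) or "").split())
        for ref in refs:
            if ref not in seen:
                errors.append(f"dangling reference {ref}")
    return errors


BAD = """
<family>
  <person id="bart" mother="marge"><name> Bart </name></person>
  <person id="bart" mother="mona"><name> Bart again </name></person>
  <person id="marge" children="bart lisa"><name> Marge </name></person>
</family>
"""
print(check_id_consistency(ET.fromstring(BAD)))
```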
                                                                30
 Adding a DTD to the Document
• A DTD can be internal
  – The DTD is part of the document file
• or external
  – The DTD and the document are in
    separate files
  – An external DTD may reside
    • In the local file system
       (where the document is)
    • In a remote file system
                                           31
 Connecting a Document with its DTD
• An internal DTD:
 <?xml version="1.0"?>
 <!DOCTYPE db [<!ELEMENT ...> … ]>
 <db> ... </db>
• A DTD from the local file system:
 <!DOCTYPE db SYSTEM "schema.dtd">
• A DTD from a remote file system:
 <!DOCTYPE db SYSTEM
 "http://www.schemaauthority.com/schema.dtd">
                                           32
Well-Formed XML Documents
• An XML document (with or without a DTD) is
  well-formed if
  – Tags are syntactically correct
  – Every start tag has a matching end tag
  – Tags are properly nested
  – There is a single root element
  – A start tag does not have two occurrences of the
    same attribute
• An XML document must be well-formed
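Any XML parser has to reject documents that are not well-formed, so a quick way to experiment with these rules is to feed small strings to a parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<a><b></b></a>"))        # True
print(is_well_formed("<a><b></a>"))            # False: missing end tag
print(is_well_formed("<a><b></a></b>"))        # False: improper nesting
print(is_well_formed("<a></a><b></b>"))        # False: two root elements
print(is_well_formed('<a x="1" x="2"></a>'))   # False: duplicate attribute
```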


                                                       33
          Valid Documents

• A well-formed XML document is valid if
  it conforms to its DTD, that is,
  – The document conforms to the regular-
    expression grammar,
  – The types of attributes are correct, and
  – The constraints on references are satisfied
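The difference between well-formed and valid shows up with a non-validating parser. In the sketch below, the address-book DTD from the earlier slide is embedded in a document whose person element puts email before name. Python's xml.etree.ElementTree, which does not validate, still parses it; the final check on the first child is a hand-written stand-in for what a validating parser would report.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!ELEMENT person (name, greet?, address*, (fax | tel)*, email*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT greet (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT fax (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<addressbook>
  <person>
    <email> jc@penny.com </email>
    <name> Jeff Cohen </name>
  </person>
</addressbook>
"""

root = ET.fromstring(DOC)            # succeeds: the document is well-formed
person = root.find("person")
child_tags = [child.tag for child in person]
print(child_tags)                    # ['email', 'name']

# The DTD requires name to come first, so the document is not valid,
# even though the non-validating parser accepted it.
starts_with_name = bool(child_tags) and child_tags[0] == "name"
print(starts_with_name)              # False
```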

                                               34

				