DTD

Document Type Definition
CSE3201/4500 Information Retrieval Systems



(C) Monash University, Maria Indrawan, 2004
       Valid XML Document
• Is a well-formed document
• That also complies with the
  – syntactic,
  – structural,
  – and other rules
  defined in a Document Type Definition (DTD) or an XML Schema
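
As a minimal sketch of the difference between well-formed and valid, the document below nests and closes its tags correctly, so it is well-formed, but it breaks the content model declared in its own DTD, so it is not valid. The element names (note, to, body) are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<!-- Well-formed, but NOT valid: the DTD requires a <to> element
     before <body>, and <to> is missing here. -->
<note>
  <body>See you at the lecture.</body>
</note>
```

A non-validating parser would accept this document; a validating parser should report the missing to element.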

                 DTD Example
• XML document
<bookshop>
  <book>
    <title>Harry Potter and the Philosopher’s Stone</title>
    <author>
      <initials>J.K</initials>
      <surname>Rowling</surname>
    </author>
    <price value="16.95"></price>
  </book>
  …
</bookshop>

• DTD
<!DOCTYPE bookshop [
<!ELEMENT bookshop (book)+>
<!ELEMENT book (title, author, price)+>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (initials, surname)>
<!ELEMENT initials (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT price EMPTY>
<!ATTLIST price
  value CDATA #IMPLIED
>
]>
           DTD Components
• DOCTYPE
  – Document name and the name of its root element
• ELEMENT
  – Define content model of an element
• ATTLIST
  – List of attributes for an element
  – An element may have more than one ATTLIST
    statement
• ENTITY

       ELEMENT declaration

 <!ELEMENT name content_category>

• Content Category: ANY, EMPTY, or a content model




   Content Category – Any and Empty
• Any
  – <!ELEMENT AnythingGoesInHere ANY>

  – AnythingGoesInHere can contain anything, e.g.
    declared elements, character data, comments, etc.
• Empty
  – <!ELEMENT price EMPTY>

  – No content, but may carry attributes
  – <img src='logo.png'/> => <!ELEMENT img EMPTY>



    Content Category – Content Model
<!ELEMENT name (content_model) cardinality>



• Content Model:
   – text only
   – element only
   – mixed


    Content Model – Text only
• Text Only
  – <!ELEMENT title (#PCDATA)>




• An element declared with this content
  model can only contain textual data (simple
  string) and entity references.


 Content Model – Element Only
• Element Only
  – <!ELEMENT bookshop (book)+>



  – An element declared with this content model
    can only contain elements and entity references.




       Content Model - Mixed
• Mixed
  – intersperse child elements with character data
• Example
  <?xml version="1.0"?>
  <!DOCTYPE root_element [
  <!ELEMENT root_element (#PCDATA|a_child_element)*>
  <!ELEMENT a_child_element (#PCDATA)>
  ]>

  <root_element> text mixed with <a_child_element> child
    text</a_child_element></root_element>


                 Cardinality
• Defines how many child elements may
  appear in a content model
      Operator   Description
      (none)     Exactly one child
      ?          Zero or one child
      *          Zero or more children
      +          One or more children
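
The four cardinality cases can be sketched in a single content model. The contact element and its children below are hypothetical names used only to show each operator once; they are not part of the course examples.

```xml
<?xml version="1.0"?>
<!DOCTYPE contact [
<!ELEMENT contact (name, nickname?, phone*, email+)>
<!ELEMENT name (#PCDATA)>      <!-- no operator: exactly one -->
<!ELEMENT nickname (#PCDATA)>  <!-- ?: zero or one -->
<!ELEMENT phone (#PCDATA)>     <!-- *: zero or more -->
<!ELEMENT email (#PCDATA)>     <!-- +: one or more -->
]>
<contact>
  <name>Jenny Genius</name>
  <email>jgenius@einstein.com</email>
  <email>jenny@example.org</email>
</contact>
```

This instance omits nickname and phone, which the ? and * operators allow, and repeats email, which + allows. Leaving out name or every email would make it invalid.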



           Sequence Indicator
• "Followed by" (AND) => ","
  <!ELEMENT personName
    (title,firstName,middleName,lastName,suffix)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT firstName (#PCDATA)>
  <!ELEMENT middleName (#PCDATA)>
  <!ELEMENT lastName (#PCDATA)>
  <!ELEMENT suffix (#PCDATA)>

  <personName>
     <title>Mr</title>
     <firstName>John</firstName>
     <middleName> V </middleName>
     <lastName> Smart </lastName>
     <suffix>Jr</suffix>
  </personName>
            Sequence Indicator (2)
• Choice (OR) => "|"
<!ELEMENT personName
   ((Mr|Ms|Dr),firstName,middleName,lastName,(Jr|Sr))>
<!ELEMENT Mr EMPTY>
<!ELEMENT Ms EMPTY>
<!ELEMENT Dr EMPTY>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT middleName (#PCDATA)>
<!ELEMENT lastName (#PCDATA)>
<!ELEMENT Jr EMPTY>
<!ELEMENT Sr EMPTY>

<personName>
   <Mr />
   <firstName>John</firstName>
   <middleName> V </middleName>
   <lastName> Smart </lastName>
   <Jr />
</personName>



           Attribute Declaration
<!ATTLIST elementName
     attrName1 attrType1 attrDefault defaultValue1
     attrName2 attrType2 attrDefault defaultValue2
     …
     attrNameN attrTypeN attrDefault defaultValueN>


<personName title="Dr" firstName="Jenny" surname="Genius"/>

<!ELEMENT personName EMPTY>
<!ATTLIST personName
       title          CDATA      #IMPLIED
       firstName      CDATA      #REQUIRED
       surname        CDATA      #REQUIRED
>
        Attribute Type (some)
Type         Description
CDATA        Character data (simple string)
Enumerated   One of a listed series of values
ID           An identifier that is unique within the document
IDREF        A reference to an element that has an ID-type
             attribute
ENTITY       The name of an unparsed entity declared in the DTD
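
A short sketch of the enumerated, ID and IDREF types working together. The perID attribute follows the naming used in the course examples; the partner attribute and the simplified person content model are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE friends [
<!ELEMENT friends (person)+>
<!ELEMENT person (#PCDATA)>
<!ATTLIST person
  perID   ID          #REQUIRED
  title   (Mr|Ms|Dr)  #IMPLIED
  partner IDREF       #IMPLIED
>
]>
<friends>
  <!-- partner="p2" must match the ID value of some element in this document -->
  <person perID="p1" title="Dr" partner="p2">Jenny Genius</person>
  <person perID="p2">John Howard</person>
</friends>
```

A validating parser should reject a second perID="p1", a title outside the listed values, or a partner value that matches no ID in the document.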


           Attribute Defaults

Values                  Description
#REQUIRED               Attribute must appear in every instance of the
                        element.
#IMPLIED                Attribute is optional.
#FIXED (plus default    Attribute is optional.
value)                  If it does appear, it must match the default
                        value.
                        If it does not appear, the parser may supply
                        the default value.


             #Required - Example
<?xml version="1.0"?>
<!DOCTYPE friends [
<!ELEMENT friends (person)+>
<!ELEMENT person (personName,email) >
<!ELEMENT personName (firstName,surname) >
<!ELEMENT firstName (#PCDATA) >
<!ELEMENT surname (#PCDATA) >
<!ELEMENT email (#PCDATA) >
<!ATTLIST person perID ID #REQUIRED>
]>
<friends>
   <person perID="p1">
         <personName >
                  <firstName> Jenny </firstName>
                  <surname> Genius </surname>
         </personName>
         <email>jgenius@einstein.com</email>
   </person>
</friends>

            #Implied - Example
<?xml version="1.0"?>
<!DOCTYPE friends [
<!ELEMENT friends (person)+>
<!ELEMENT person (personName,email) >
<!ELEMENT personName (firstName,surname) >
<!ELEMENT firstName (#PCDATA) >
<!ELEMENT surname (#PCDATA) >
<!ELEMENT email (#PCDATA) >
<!ATTLIST person title CDATA #IMPLIED >
]>
… (next slide)




              #Implied - Example
<friends>
   <person title="Dr">
         <personName>
                  <firstName> Jenny </firstName>
                  <surname> Genius </surname>
         </personName>
         <email>jgenius@einstein.com</email>
   </person>
   <person>
         <personName >
                  <firstName> John </firstName>
                  <surname> Howard </surname>
         </personName>
         <email>jhoward@oz.gov.au</email>
   </person>
</friends>



            #Fixed-Example
<?xml version="1.0"?>
<!DOCTYPE friends [
<!ELEMENT friends (person)+>
<!ELEMENT person (personName,email) >
<!ELEMENT personName (firstName,surname) >
<!ELEMENT firstName (#PCDATA) >
<!ELEMENT surname (#PCDATA) >
<!ELEMENT email (#PCDATA) >
<!ATTLIST person title CDATA #FIXED "Dr" >
]>


         #Fixed-Valid Instances
<friends>
   <person title="Dr">
        <personName>
               <firstName> Jenny </firstName>
               <surname> Genius </surname>
        </personName>
        <email>jgenius@einstein.com</email>
   </person>
   <person>
        <personName>
               <firstName> John </firstName>
               <surname> Howard </surname>
        </personName>
        <email>jhoward@oz.gov.au</email>
   </person>
</friends>
       #Fixed – Invalid Instance
<friends>
   <person title="Ms">
       <personName>
              <firstName> Jenny </firstName>
              <surname> Genius </surname>
       </personName>
       <email>jgenius@einstein.com</email>
   </person>
</friends>




                   Entity
• A storage unit: a named piece of content
• An entity is declared in the DTD (except for the
  predefined entities) and is referenced from the DTD
  or the XML document.



             Entity Example
<?xml version="1.0"?>
<!DOCTYPE footNote [
<!ELEMENT footNote (#PCDATA)>
<!ENTITY copy "&#xA9;2001">
<!ENTITY uni "Monash University">
<!ENTITY disclaimer "No warranty &copy; &uni;">
]>
<footNote>All &uni; websites contain the
   following disclaimer &quot;&disclaimer;&quot;
</footNote>


              Category of Entity
• By the location of usage
   – General entities
      • used within XML documents
   – Parameter entities
      • used only within the DTD
• By the treatment taken by the parser
   – Parsed entities
      • used to store text
   – Unparsed entities
      • used to store non-textual content, e.g. binary data
      • must have an associated NOTATION declaration
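
As a rough illustration of a parameter entity, the fragment below declares one with % and references it as %nameParts; inside a content model. It is written as an external DTD file because, in the internal subset, parameter-entity references may not appear inside other declarations. The entity name nameParts is an assumption; the element names follow the friends examples.

```xml
<!-- Sketch of an external DTD file; the entity name nameParts is hypothetical. -->
<!ENTITY % nameParts "firstName,surname">
<!ELEMENT personName (%nameParts;)>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
```

After expansion, the personName declaration is equivalent to (firstName,surname), so the same list of children can be reused in several declarations and changed in one place.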
      Notations – non XML data
• A notation is used for:
   – anything that the XML processor can't understand and
     parse
   – unparsed entities

<!NOTATION gif89a PUBLIC "-//Compuserve//NOTATION Graphic
   Interchange Format 89a//EN" "gif">
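
The pieces fit together roughly as follows: a notation names the format, an unparsed entity points at the external file and is tied to the notation with NDATA, and an ENTITY-type attribute refers to the entity by name. The catalogue and logo elements, the entity name monashLogo and the file logo.gif are all hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE catalogue [
<!ELEMENT catalogue (logo)>
<!ELEMENT logo EMPTY>
<!ATTLIST logo image ENTITY #REQUIRED>
<!NOTATION gif89a PUBLIC "-//Compuserve//NOTATION Graphic Interchange Format 89a//EN" "gif">
<!-- Unparsed entity: the parser does not read logo.gif, it only reports it -->
<!ENTITY monashLogo SYSTEM "logo.gif" NDATA gif89a>
]>
<catalogue>
  <logo image="monashLogo"/>
</catalogue>
```

Note that the attribute value is the bare entity name, not an &monashLogo; reference; unparsed entities cannot be referenced in content.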




                  External DTD
• Re-use of DTD.
• Easy to maintain
  – single update
• Public DTD
  <!DOCTYPE article PUBLIC "MyPublicDTD/Book"
    "http://www.csse.monash.edu.au/DTDs/maria/book.dtd">

• Local (SYSTEM) DTD
  <!DOCTYPE article SYSTEM
    "http://www.csse.monash.edu.au/DTDs/maria/book.dtd">

       External DTD Example
• DTD file (external.dtd)
<!ELEMENT   friends (person)+>
<!ELEMENT   person (personName,email) >
<!ELEMENT   personName (firstName,surname) >
<!ELEMENT   firstName (#PCDATA) >
<!ELEMENT   surname (#PCDATA) >
<!ELEMENT   email (#PCDATA) >
<!ATTLIST   person title CDATA #IMPLIED >



            External DTD Example
• XML file
<?xml version="1.0" standalone="no"?>
<!DOCTYPE friends SYSTEM "external.dtd">
<friends>
   <person title="Dr">
         <personName>
                  <firstName> Jenny </firstName>
                  <surname> Genius </surname>
         </personName>
         <email>jgenius@einstein.com</email>
   </person>
   <person>
         <personName >
                  <firstName> John </firstName>
                  <surname> Howard </surname>
         </personName>
         <email>jhoward@oz.gov.au</email>
   </person>
</friends>
                     Mixed DTDs
• Internal and external DTD subsets can be mixed.
• The external DTD reference comes first, followed by the
  internal subset in [ ].
       <!DOCTYPE article PUBLIC "MyPublicDTD/Book"
         "http://www.csse.monash.edu.au/DTDs/maria/book.dtd"
       [
            DTD declarations (internal DTD subset)
            …
       ]>

Conflict management:
   – the internal DTD subset always takes priority
   – an internal declaration overrides the external declaration
     of the same entity or attribute.
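
A sketch of the override rule, reusing the external.dtd file from the External DTD Example, where person/title is declared as CDATA #IMPLIED. The internal declaration below is an assumed addition for illustration. Because the internal subset is processed first, its declaration of the title attribute is the one that applies.

```xml
<?xml version="1.0" standalone="no"?>
<!DOCTYPE friends SYSTEM "external.dtd" [
<!-- Internal subset: read before external.dtd, so this declaration
     of person/title wins and the #IMPLIED one is ignored. -->
<!ATTLIST person title CDATA #FIXED "Dr">
]>
<friends>
   <person>
         <personName>
                  <firstName> Jenny </firstName>
                  <surname> Genius </surname>
         </personName>
         <email>jgenius@einstein.com</email>
   </person>
</friends>
```

With this internal subset, a person written with title="Ms" would be invalid, even though external.dtd on its own would accept it.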
           Limitations of DTD
• Non-XML syntax
• Not extensible
• Weak data typing (content is essentially text; no numeric or date types)
• No inheritance

• Possible solution: XML Schema
