					Extensible Markup Language

      MSI 602 – Spring 2003

• Data is facts and figures
• Database is a related set of data
Kinds of databases
• Unstructured
     – Meaning of data interpreted by user
•    Semi-Structured
     – Structure of data wrapped around data
•    Structured
     – Fixed structure of data
     – Data added to the fixed structure
    Definition and Example
•    XML is a text based markup language that is fast becoming a
     standard of data interchange
     –   An open standard from W3C
     –   A direct descendant from SGML

Example: Product Inventory Data
        <ModelNumber>R3456d2h</ModelNumber>
        <Manufacturer>General Electric</Manufacturer>
    Data Interchange
•    XML's key role is data interchange
•    Two business partners want to exchange customer data
     –   Agree on a set of tags
     –   Exchange data without having to change internal databases
•    Other business partners can participate by using the same tags
     –   New tags can be added to extend the functionality

          Key to successful data interchange is building
             consensus on, and standardizing, tag sets
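
The exchange described above can be sketched in a few lines of Python. This is only an illustration: the tag set (Customer, CustomerId, Name, City) and the sample record are assumptions made up for the example, not something the slides define. Each partner keeps its own internal representation (here a dict) and only the agreed XML crosses the wire.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag set that both partners have agreed on.
AGREED_TAGS = ("CustomerId", "Name", "City")

def export_customer(record):
    """Partner A: turn an internal record (a dict) into the agreed XML."""
    customer = ET.Element("Customer")
    for tag in AGREED_TAGS:
        ET.SubElement(customer, tag).text = str(record[tag])
    return ET.tostring(customer, encoding="unicode")

def import_customer(xml_text):
    """Partner B: read the agreed XML back into its own dict layout."""
    customer = ET.fromstring(xml_text)
    return {child.tag: child.text for child in customer}

# Sample record invented for the sketch.
message = export_customer({"CustomerId": 42, "Name": "Sanjay Goel", "City": "Albany"})
received = import_customer(message)
print(message)
print(received)
```

Neither side has to change its internal database; adding a new tag later only requires extending the agreed tag set.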
    Universal Data
•    TCP/IP     Universal Networking
•    HTML       Universal Rendering
•    Java       Universal Code
•    XML        Universal Data

•    Numerous standards bodies are set up for standardization of
     tags in different domains
     –   ebXML
     –   XBRL
     –   MML
     –   CML
    HTML vs. XML
•    Both are markup languages
     –   HTML has fixed set of tags
     –   XML allows user to specify the tags based on requirements
•    Usage
     –   HTML tags specify how to display data
     –   XML tags specify semantics of the data
•    Tag Interpretation
     –   HTML specifies what each tag and attribute means
     –   XML tags delimit data & leave interpretation to the parsing application
•    Well formedness
     –   HTML very tolerant of rule violations (nesting, matching tags)
     –   XML very strictly follows rules of well formedness
•    Prolog
     –     Instructs the parser as to what it is parsing
     –     Contains processing instructions for processor
•    Body
     –     Tags                 - Entities
     –     Attributes           - Properties of Entities
     –     Comments             - Statements for clarification in the document
     <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>    Prolog
     <contact>                                                   Body
           <first_name>Sanjay</first_name>
           <last_name>Goel</last_name>
           <address>
                 <street>56 Della Street</street>
           </address>
     </contact>
•    Syntax: <?xml version="1.0" encoding="UTF-8"
               standalone="yes" ?>
•    Contains a declaration that identifies a document as XML
•    Version
     –   Version of XML markup language used in the data
     –   Not optional
•    Encoding
     –   Identifies the character set used to encode the data
     –   Default is UTF-8, a compact encoding of Unicode
•    Standalone
     –   Tells whether or not this document references external entities
•    May contain entity definitions and tag specifications
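
As a small illustration of the declaration described above, the following sketch asks Python's standard ElementTree serializer to emit it. The contact/street content is borrowed from the slides' example; note that ElementTree writes only the version and encoding pseudo-attributes and does not emit standalone, so this is an approximation of the slide's prolog rather than an exact copy.

```python
import xml.etree.ElementTree as ET

contact = ET.Element("contact")
ET.SubElement(contact, "street").text = "56 Della Street"

# xml_declaration=True (available in Python 3.8+) asks the serializer to
# write the XML declaration before the root element.
data = ET.tostring(contact, encoding="UTF-8", xml_declaration=True)

# The declaration is the first line; the document body follows it.
declaration, _, body = data.partition(b"\n")
print(declaration.decode("utf-8"))
print(body.decode("utf-8"))
```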
    XML Syntax
    Elements & Attributes
•    Uses less-than and greater-than characters (<…>) as tag delimiters
•    Every opening tag must have an accompanying closing tag
     –   <FirstName>Sanjay</FirstName>
     –   Empty tags do not require an accompanying closing tag.
     –   Empty tags have a forward slash before the greater-than sign
•    Tags can have attributes, whose values must be enclosed in quotes
     –   <name first="Sanjay" last="Goel">
•    Elements should be properly nested
     –   The nesting can not be interleaved
     –   Each document must have one single root element
•    Element and attribute names are case sensitive
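
The syntax rules above can be checked mechanically. The sketch below uses Python's standard ElementTree parser, which rejects any document that is not well formed; the sample documents and the helper name is_well_formed are made up for the illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One sample per rule; the labels are informal names for this sketch.
samples = {
    "nested":      "<contact><name>Sanjay</name></contact>",
    "empty tag":   "<contact><photo/></contact>",
    "interleaved": "<contact><name><first>Sanjay</name></first></contact>",
    "unclosed":    "<contact><name>Sanjay</contact>",
    "two roots":   "<contact/><contact/>",
    "case":        "<Name>Sanjay</name>",
}
results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "well formed" if ok else "rejected")
```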
    Tree Structure
•    XML documents have a tree structure containing multiple
     levels of nested tags.
     –     Root element is a single XML element which encloses all of the other
           XML elements and data in the document
     –     All other elements are children of the root element

     <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
     <contact>                                          Root Element
           <first_name>Sanjay</first_name>
           <last_name>Goel</last_name>
           <street>56 Della Street</street>             Child Elements
     </contact>
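
To make the tree structure concrete, here is a minimal sketch that parses a contact document and walks it from the root down. The element names first_name and last_name are assumptions (XML names cannot contain spaces), and the helper outline is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """<contact>
  <first_name>Sanjay</first_name>
  <last_name>Goel</last_name>
  <street>56 Della Street</street>
</contact>"""

def outline(element, depth=0):
    """Return (depth, tag) pairs for the element and its descendants, in document order."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

root = ET.fromstring(doc)
tree_rows = outline(root)
for depth, tag in tree_rows:
    print("  " * depth + tag)
```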
    Definition and Example

•     Attributes are properties associated with an element
•     Each attribute is a name value pair
      –     No element may contain two attributes with same name
      –     Name and value are strings
      <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
            <name first="Sanjay" last="Goel"></name>     Attributes
                       <street>56 Della Street</street>  Nested Elements
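
A short sketch of how an application sees attributes, using Python's standard ElementTree. It also shows that a parser rejects an element that repeats an attribute name; the second document is deliberately malformed for that purpose.

```python
import xml.etree.ElementTree as ET

name = ET.fromstring('<name first="Sanjay" last="Goel"></name>')

first = name.get("first")            # look up one attribute by name
all_attributes = dict(name.attrib)   # every name/value pair; values are strings

# Repeating an attribute name on one element is a well-formedness error.
try:
    ET.fromstring('<name first="Sanjay" first="S."></name>')
    duplicate_allowed = True
except ET.ParseError:
    duplicate_allowed = False

print(first, all_attributes, duplicate_allowed)
```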
    Elements vs. Attributes
•    Data should be stored in Elements
•    Information about data (meta-data) should be stored in attributes
     –   When in doubt use elements
•    Rules of thumb
     –   Elements should hold information that someone may want to read.
     –   Attributes are appropriate for information about the document that
         has nothing to do with its content
         e.g. URLs, units, references, ids belong in attributes
     –   What is meta-data to you may be data to someone else
•    XML comments begin with "<!--" and end with "-->"
     –    All data between these delimiters is discarded
     –    <!-- This is a list of names of people -->
•    Comments should not come before XML declaration
•    Comments can not be placed inside a tag
•    Comments may be used to hide and surround tags
         <!-- <last>Goel</last> -->                      Last tag is ignored
•    The string "--" may not occur inside a comment except as part of
     its opening and closing delimiters
     –    <!-- the Red door -- that is the second -->    Illegal
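
The comment rules can be tried out with Python's standard ElementTree parser, which discards comments by default. This is a sketch: the names and door documents are built from the slide's fragments, and the helper parses is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """<names>
  <!-- This is a list of names of people -->
  <first>Sanjay</first>
  <!-- <last>Goel</last> -->
</names>"""

# The default parser drops comments, so the commented-out <last>
# element never reaches the application.
root = ET.fromstring(doc)
tags = [child.tag for child in root]

def parses(text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# "--" inside a comment, and a comment placed inside a tag, are both rejected.
double_hyphen_ok = parses("<door><!-- the Red door -- that is the second --></door>")
comment_in_tag_ok = parses("<door <!-- note --> />")
print(tags, double_hyphen_ok, comment_in_tag_ok)
```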
•    XML documents come from different sources
     – Combining elements from different sources can result in
       name conflict
     – Namespaces allow the interpreter to resolve the elements
•    Namespaces
     – Declared within element start-tag using attribute xmlns
     – Identified by a URI, which makes each namespace name
       globally unique
     –   e.g. <Collection xmlns:book=""
     –   Here book and cd are shorthand prefixes for the full namespace names
•    Default namespace is used if no other namespace is defined
     –   It does not have any prefix associated with it
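
The following sketch shows how a namespace-aware parser keeps same-named elements apart. The two namespace URIs are placeholders invented for this example (the URIs in the slides did not survive), and the document is a cut-down version of the collection example.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, assumed for this sketch only.
BOOK_NS = "http://example.com/ns/book"
CD_NS = "http://example.com/ns/cd"

doc = f"""<COLLECTION xmlns:book="{BOOK_NS}" xmlns:cd="{CD_NS}">
  <book:ITEM Status="in">
    <book:TITLE>The Marble Faun</book:TITLE>
  </book:ITEM>
  <cd:ITEM>
    <cd:TITLE>Violin Concerto in D</cd:TITLE>
  </cd:ITEM>
</COLLECTION>"""

root = ET.fromstring(doc)

# ElementTree expands each prefix to "{uri}local-name", so the two ITEM
# elements have different tags even though their local names match.
item_tags = [child.tag for child in root]

prefixes = {"book": BOOK_NS, "cd": CD_NS}
book_titles = [t.text for t in root.findall("book:ITEM/book:TITLE", prefixes)]
cd_titles = [t.text for t in root.findall("cd:ITEM/cd:TITLE", prefixes)]
print(item_tags)
print(book_titles, cd_titles)
```

The prefix itself carries no meaning; only the URI it is bound to identifies the namespace.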
<?xml version="1.0"?>
<!-- File Name: Collection.xml -->
<COLLECTION
  xmlns:book=""
  xmlns:cd="">
  <ITEM Status="in">
    <TITLE>The Adventures of Huckleberry Finn</TITLE>
    <AUTHOR>Mark Twain</AUTHOR>
  </ITEM>
  <ITEM Status="in">
    <TITLE>The Marble Faun</TITLE>
    <AUTHOR>Nathaniel Hawthorne</AUTHOR>
  </ITEM>
  <ITEM Status="out">
    <TITLE>Leaves of Grass</TITLE>
    <AUTHOR>Walt Whitman</AUTHOR>
  </ITEM>
  <ITEM Status="out">
    <TITLE>The Legend of Sleepy Hollow</TITLE>
    <AUTHOR>Washington Irving</AUTHOR>
  </ITEM>
</COLLECTION>

<?xml version="1.0"?>
<!-- File Name: Collection.xml -->
<COLLECTION>
  <ITEM>
    <TITLE>Violin Concertos Numbers 1, 2, and 3</TITLE>
    <COMPOSER>Mozart</COMPOSER>
    <PRICE>$16.49</PRICE>
  </ITEM>
  <ITEM>
    <TITLE>Violin Concerto in D</TITLE>
  </ITEM>
</COLLECTION>
  Books and CDs are tracked in different files; combining them leads to name conflicts
<?xml version="1.0"?>
<!-- File Name: Collection.xml -->
<COLLECTION xmlns:book="" xmlns:cd="">
  <book:ITEM Status="in">
    <book:TITLE>The Adventures of Huckleberry Finn</book:TITLE>
    <book:AUTHOR>Mark Twain</book:AUTHOR>
    <book:PRICE>$5.49</book:PRICE>
  </book:ITEM>
  <cd:ITEM>
    <cd:TITLE>Violin Concerto in D</cd:TITLE>
    <cd:COMPOSER>Beethoven</cd:COMPOSER>
  </cd:ITEM>
  <book:ITEM Status="out">
    <book:TITLE>Leaves of Grass</book:TITLE>
    <book:AUTHOR>Walt Whitman</book:AUTHOR>
  </book:ITEM>
  <cd:ITEM>
    <cd:TITLE>Violin Concertos Numbers 1, 2, and 3</cd:TITLE>
  </cd:ITEM>
  <book:ITEM Status="out">
    <book:TITLE>The Legend of Sleepy Hollow</book:TITLE>
    <book:AUTHOR>Washington Irving</book:AUTHOR>
    <book:PRICE>$2.95</book:PRICE>
  </book:ITEM>
  <book:ITEM Status="in">
    <book:TITLE>The Marble Faun</book:TITLE>
    <book:AUTHOR>Nathaniel Hawthorne</book:AUTHOR>
  </book:ITEM>
</COLLECTION>
    Display XML
    Style Sheets

•    A style sheet is a file that contains instructions for
     rendering individual elements in an XML document
•    Two kinds of style sheets exist
     – Cascading Style Sheets (CSS)
     – Extensible Stylesheet Language Transformations (XSLT)
•    Please refer to the following web site for
     comprehensive information on style sheets
    Cascading Style Sheets
<?xml version="1.0"?>
<!-- File Name: Inventory01.xml -->
<?xml-stylesheet type="text/css"
        href="Inventory01.css"?>
<INVENTORY>
  <BOOK>
    <TITLE>The Adventures of Huckleberry Finn</TITLE>
    <AUTHOR>Mark Twain</AUTHOR>
    <BINDING>mass market paperback</BINDING>
    <PAGES>298</PAGES>
    <PRICE>$5.49</PRICE>
  </BOOK>
  <BOOK>
    <TITLE>Leaves of Grass</TITLE>
    <AUTHOR>Walt Whitman</AUTHOR>
    <BINDING>hardcover</BINDING>
    <PAGES>462</PAGES>
    <PRICE>$7.75</PRICE>
  </BOOK>
  <BOOK>
    <TITLE>The Legend of Sleepy Hollow</TITLE>
    <AUTHOR>Washington Irving</AUTHOR>
    <BINDING>mass market paperback</BINDING>
    <PRICE>$2.95</PRICE>
  </BOOK>
  <BOOK>
    <TITLE>The Marble Faun</TITLE>
    <AUTHOR>Nathaniel Hawthorne</AUTHOR>
    <BINDING>trade paperback</BINDING>
    <PAGES>473</PAGES>
    <PRICE>$10.95</PRICE>
  </BOOK>
  <BOOK>
    <TITLE>Moby-Dick</TITLE>
    <AUTHOR>Herman Melville</AUTHOR>
    <BINDING>hardcover</BINDING>
    <PAGES>724</PAGES>
    <PRICE>$9.95</PRICE>
  </BOOK>
</INVENTORY>
  Cascading Style Sheets
/* File Name: Inventory02.css */
BOOK
  {display:block;
   font-size:12pt;
   font-weight:bold}
BINDING
  {display:block;
   margin-left:15pt}
PRICE
  {display:block;
   margin-left:15pt}
    Formal Languages/Grammars
•    A formal language is a set of strings
     –   It is characterized by a set of rules which determine which strings are
         a part of the language and which are not
     –   In case of programming languages, programs which compile are
         grammatically correct (others are not)
     –   In a natural language, like English, correct sentences follow the rules
         of English grammar
•    More precisely, a grammar defines four things
     –   A vocabulary out of which the strings are constructed (terminal
         symbols)
     –   A vocabulary that is used to formulate grammar rules (nonterminal
         symbols)
     –   Grammar rules (productions), each of which has a left-hand side (lhs)
         and a right-hand side (rhs)
     –   A designated start symbol
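
The four parts of a grammar can be written down directly as data. The toy grammar below (one tag, optionally nested) is invented for this sketch and is far smaller than the real XML grammar; derives is a hypothetical helper that decides whether a token string belongs to the language by trying productions top-down.

```python
# A toy grammar, assumed for illustration only.
TERMINALS = {"<a>", "</a>", "text"}
NONTERMINALS = {"ELEMENT", "CONTENT"}
PRODUCTIONS = {
    # lhs        rhs alternatives
    "ELEMENT": [["<a>", "CONTENT", "</a>"]],
    "CONTENT": [["text"], ["ELEMENT"], []],
}
START = "ELEMENT"

def derives(symbols, tokens):
    """Return True if the symbol sequence can be rewritten into exactly `tokens`."""
    if not symbols:
        return not tokens
    head, rest = symbols[0], symbols[1:]
    if head in TERMINALS:
        # A terminal must match the next input token.
        return bool(tokens) and tokens[0] == head and derives(rest, tokens[1:])
    # A nonterminal is replaced by the rhs of each of its productions in turn.
    return any(derives(rhs + rest, tokens) for rhs in PRODUCTIONS[head])

in_language = derives([START], ["<a>", "<a>", "text", "</a>", "</a>"])
not_in_language = derives([START], ["<a>", "text"])
print(in_language, not_in_language)
```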
    Validated XML Document
•    An XML document is valid if it conforms to the grammar of
     the language
     –   Validity is different from well-formedness
•    Two ways to specify the grammar of the language
     –   Document Type Definition (DTD)
     –   XML Schema
•    Why bother with the language grammar
     –   It provides the blueprint of the language
     –   Ensures that the data is interchangeable
     –   Eliminates processing errors in custom software which expects a
         particular document content and structure
•    Validity of the document is checked by using a validator
    Document Type Declaration
•    Document type declaration is a block of XML markup added
     to the prolog of the document
     –   It has to follow the XML declaration
     –   It has to be outside of any other markup
•    It defines the content and structure of the language
     –   Without a document type declaration or schema a document is merely
         checked for well-formedness and not validity
•    The form of a document type declaration is:
     –   <!DOCTYPE Name DTD>
     –   DTD is document type definition
     –   Name specifies the name of the document element
    Document Type Definitions
•    Document type definition (DTD) consists of a series of
     markup declarations enclosed in square brackets
        <?xml version="1.0" standalone="yes"?>
        Hello XML!
•    A DTD can also be stored separately from the XML document
     and referenced in it.
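
A complete document with an internal DTD might look like the one embedded in the sketch below. GREETING is a hypothetical element name chosen for the example. Python's standard ElementTree parser reads the DTD but does not validate against it, so this shows only where the declaration sits, not validation itself.

```python
import xml.etree.ElementTree as ET

# GREETING is an assumed element name for this sketch.
doc = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE GREETING [
  <!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>Hello XML!</GREETING>"""

# The parser accepts the internal DTD subset and checks well-formedness only.
root = ET.fromstring(doc)
greeting_text = root.text
print(root.tag, greeting_text)
```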
    Document Type Definitions
• Element Type Declaration
     –   Syntax: <!ELEMENT Name contentspec>
     –   Name is the name of the element
     –   contentspec is the content specification
•    Example:
     –   <!ELEMENT Title (#PCDATA)>
•    Content specification can have four types of values
     –   EMPTY content – Element must not have content
           <!ELEMENT Image EMPTY>
     –   ANY Content – Can contain anything
           <!ELEMENT misc ANY>
     –   Element Content – Child elements but no character data
           <!DOCTYPE BOOK [
              <!ELEMENT BOOK (TITLE, AUTHOR)>
              <!ELEMENT TITLE (#PCDATA)>
              <!ELEMENT AUTHOR (#PCDATA)>
           ]>
     –   Mixed Content – character data and child elements interspersed
    Element Content Specification
• Content Specification indicates allowed child elements and
  their order
     –   If element has element content it can not contain any character data
•    Types of content specifications
     –   Sequence: Indicates that each element must have a specific sequence
         of child elements
     –   Example
           <!DOCTYPE MOUNTAIN [
              <!ELEMENT NAME (#PCDATA)>
              <!ELEMENT HEIGHT (#PCDATA)>
              <!ELEMENT STATE (#PCDATA)>
     –   Valid XML
              <STATE>New Mexico</STATE>
  Element Content Specification
• Types of content specifications
    –   Choice: Indicates that element can have one of a series of child
        elements
    –   Each element is separated by a | sign
    –   Example
          <!DOCTYPE FILM [
             <!ELEMENT STAR (#PCDATA)>
             <!ELEMENT NARRATOR (#PCDATA)>
    –   Valid XML
    –   Invalid XML
              <NARRATOR>Sir Gregory Parsloe</NARRATOR>
              <INSTRUCTOR>Galahad Threepwood</INSTRUCTOR>
    Element Content Specification
    Number of Elements
•    Specifying the number of elements allowed
     –   ? zero or one
     –   + one or more
     –   * zero or more
     –   Example
           <!DOCTYPE MOUNTAIN [
              <!ELEMENT NAME (#PCDATA)>
              <!ELEMENT HEIGHT (#PCDATA)>
              <!ELEMENT STATE (#PCDATA)>
     –   Valid XML
              <NAME>Pueblo Peak</NAME>
              <NAME>Taos Mountain</NAME>
              <STATE>New Mexico</STATE>
    Element Content Specification

•    Modifying a group of elements
     – Example
          <!DOCTYPE FILM [
             <!ELEMENT STAR (#PCDATA)>
             <!ELEMENT NARRATOR (#PCDATA)>
     – Valid XML
             <NARRATOR>Sir Gregory Parsloe</NARRATOR>
    Element Content Specification
•    Nesting in specification
     –   Example
          <!DOCTYPE FILM [
             <!ELEMENT TITLE (#PCDATA)>
             <!ELEMENT CLASS (#PCDATA)>
             <!ELEMENT STAR (#PCDATA)>
             <!ELEMENT NARRATOR (#PCDATA)>
     –   Valid XML
              <TITLE>The Net</TITLE>
              <STAR>Sandra Bullock</STAR>
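
Sequence (,), choice (|), the occurrence indicators (?, +, *) and nested groups behave much like a regular expression over child element names. The sketch below makes that analogy concrete. It is not a validating parser: the content models passed in are assumptions written for the example, and the helpers content_model_to_regex and children_match are hypothetical. It handles element content only, not #PCDATA, EMPTY or ANY.

```python
import re

def content_model_to_regex(spec):
    """Translate a DTD element-content model into a regex over child names.

    Each child is encoded as its name followed by one space, so ',' becomes
    plain concatenation while '|', '?', '+' and '*' keep their regex meaning.
    """
    compact = re.sub(r"\s+", "", spec)
    pattern = re.sub(r"[A-Za-z_][\w.\-]*",
                     lambda m: "(?:" + re.escape(m.group(0)) + " )",
                     compact)
    return re.compile(pattern.replace(",", ""))

def children_match(model, child_names):
    """Check a list of child element names against a compiled content model."""
    encoded = "".join(name + " " for name in child_names)
    return model.fullmatch(encoded) is not None

# Assumed content models, in the spirit of the MOUNTAIN and FILM examples.
mountain = content_model_to_regex("(NAME+, HEIGHT?, STATE)")
film = content_model_to_regex("(TITLE, CLASS?, (STAR | NARRATOR)*)")

print(children_match(mountain, ["NAME", "NAME", "STATE"]))
print(children_match(film, ["TITLE", "STAR"]))
print(children_match(film, ["STAR", "TITLE"]))
```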
    Element Content Specification
    Mixed Content Model
•    Mixed Content Model: Allows element to contain
     –   Character Data
     –   Child elements in any position and any frequency (zero or more times)
     –   Child elements can be interspersed with data
•    Character data only
     –   Example
           <!ELEMENT TITLE (#PCDATA)>

•    Character data and elements
     –   Example:
           <!ELEMENT TITLE (#PCDATA | SUBTITLE)*>
     –   Valid XML
           <TITLE>Moby Dick <SUBTITLE>Or, The Whale</SUBTITLE></TITLE>
           <TITLE><SUBTITLE>Or, The Whale</SUBTITLE>Moby Dick</TITLE>
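
An application reading mixed content has to collect both the character data and the child elements. In Python's standard ElementTree, text before the first child is stored in the parent's text, and text after a child is stored in that child's tail. The helper pieces is hypothetical; the two documents are the valid examples above.

```python
import xml.etree.ElementTree as ET

def pieces(xml_text):
    """List the character data and child elements of the root, in document order."""
    root = ET.fromstring(xml_text)
    parts = []
    if root.text:
        parts.append(("text", root.text))
    for child in root:
        parts.append(("element", child.tag))
        if child.tail:
            parts.append(("text", child.tail))
    return parts

data_first = pieces("<TITLE>Moby Dick <SUBTITLE>Or, The Whale</SUBTITLE></TITLE>")
child_first = pieces("<TITLE><SUBTITLE>Or, The Whale</SUBTITLE>Moby Dick</TITLE>")
print(data_first)
print(child_first)
```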
    Attribute Specification
•    All attributes in the document need to be specified using an
     attribute-list declaration, which
     –   Defines the name of each attribute
     –   Defines the data type of each attribute
     –   Specifies whether an attribute is required or not
•    Syntax: <!ATTLIST Name Attdefs>
     –   Name is the name of the element
     –   Attdefs is a series of one or more attribute definitions
•    Attribute definition Syntax: Name AttType DefaultDecl
     –   Name is the attribute name
     –   AttType is the type of the attribute (CDATA, tokenized type, or
         enumerated type)
     –   DefaultDecl specifies if attribute is required & default values
     –   Example:
           <!ATTLIST FILM Class CDATA "fictional" Year CDATA #REQUIRED>
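
The effect of a default value can be observed even with a non-validating parser. In the sketch below, Python's standard ElementTree (built on expat) fills in the Class default declared in the internal DTD. The FILM document is an assumption assembled from the slide's fragments, and the Year value is sample data. Because the parser does not validate, it does not enforce #REQUIRED.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE FILM [
  <!ELEMENT FILM (#PCDATA)>
  <!ATTLIST FILM Class CDATA "fictional" Year CDATA #REQUIRED>
]>
<FILM Year="1995">The Net</FILM>"""

film = ET.fromstring(doc)
attributes = dict(film.attrib)   # Class is supplied from the declared default

# A non-validating parser accepts the document even when the
# #REQUIRED attribute is missing; a validator would reject it.
without_year = ET.fromstring(doc.replace(' Year="1995"', ""))
attributes_without_year = dict(without_year.attrib)
print(attributes)
print(attributes_without_year)
```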
    Entity Specification

•    There are two kinds of entities in XML documents
     – Character entities, referred to by Unicode character number
     – Named entities, referred to by name
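
Both kinds of entity reference are expanded by the parser before the application sees the text. In this sketch, &#169; is a character reference, &lt; and &gt; are predefined named entities, and company is a hypothetical named entity declared in the internal DTD for the example.

```python
import xml.etree.ElementTree as ET

# "company" is an assumed entity name, declared here for illustration.
doc = """<!DOCTYPE note [
  <!ENTITY company "General Electric">
]>
<note>&#169; 2003 &company; &lt;sales&gt;</note>"""

# After parsing, the references have been replaced by the characters
# or text they stand for.
note_text = ET.fromstring(doc).text
print(note_text)
```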
    XML Parsing
    Definition and Types
•    An XML parser is a program that reads an XML document
     and makes its contents available for processing
•    There are two standard types of parsers for XML
     –   Document Object Model (DOM) which makes the document available
         as a tree
     –   Simple API for XML (SAX) which associates an event with each tag
         and each block of text
•    XML parsers are available from many vendors
     –   Each vendor conforms to the standardized XML interfaces
     –   Apache Xerces is a widely used parser
     –   Sun's API for XML parsing is JAXP (supports basic classes and
         interfaces that a Java XML parser should support)
     –   DOM parsers are often built on top of SAX parsers
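
For comparison with the event-driven style, here is a minimal DOM sketch. It uses Python's xml.dom.minidom rather than the Java parsers named above, so it illustrates the DOM idea (the whole document held in memory as a tree) and not JAXP or Xerces specifically. The replacement street value is made up.

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<contact><name first='Sanjay' last='Goel'/>"
    "<street>56 Della Street</street></contact>"
)

# The whole document is in memory, so any node can be reached at any time.
root = doc.documentElement
street = root.getElementsByTagName("street")[0]
street_text = street.firstChild.data
first_name = root.getElementsByTagName("name")[0].getAttribute("first")

# The tree can be edited in place and serialized again.
street.firstChild.data = "58 Della Street"
updated = root.toxml()
print(street_text, first_name)
print(updated)
```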
    SAX Parser
•    As the parser scans the document it sends notifications of
     events, for instance
     –    Element start
     –    Element end
     –    Character sequence between two elements is found
•    SAX provides standard names for the callback functions that
     are triggered by these events
     void characters (char[] ch, int start, int length): notification of character data
     void startDocument(): notification of start of document
     void endDocument(): notification of end of document
     void startElement(String name, AttributeList atts): notification of start of element
     void endElement(String name): notification of end of element
     void processingInstruction(String target, String data): notification of processing
     instruction
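
The callbacks above are the Java signatures. The sketch below shows the same idea with Python's xml.sax module, whose ContentHandler uses the same callback names with Python argument lists. EventLog is a hypothetical handler that simply records each notification in the order it arrives.

```python
import xml.sax

class EventLog(xml.sax.ContentHandler):
    """Records each SAX callback as it fires."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append("startElement " + name)

    def endElement(self, name):
        self.events.append("endElement " + name)

    def characters(self, content):
        # A parser may deliver one run of text in several pieces.
        self.events.append("characters " + content)

log = EventLog()
xml.sax.parseString(b"<contact><first>Sanjay</first><last>Goel</last></contact>", log)
for event in log.events:
    print(event)
```

Unlike a DOM parser, nothing is kept in memory unless the handler chooses to store it.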
 SAX Parser

From Professional JSP, page 658
    XSLT Parser
    Definition and Uses
•    XSLT is an XML structure transforming language
     – Any tree transforming language needs an ability to refer
       to tree paths
     – XPath is the sub-language underneath XSLT for tree paths
•    There are two scenarios for use of XSLT
     – Browser contains an XSLT processor and uses it to render XML
     – XSLT is used for changing the structure of an existing
       XML document
•     To run XSLT the following components are required
     – Java 1.4 standard development kit
     – James Clark’s xt (xt.jar)
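
To give a feel for tree paths without installing an XSLT processor, the sketch below uses the limited XPath subset built into Python's standard ElementTree. It is not XSLT and not full XPath; the inventory document is a cut-down variant of the slides' example, and the Status attribute on BOOK is an assumption added for the illustration.

```python
import xml.etree.ElementTree as ET

inventory = ET.fromstring("""<INVENTORY>
  <BOOK Status="in"><TITLE>The Marble Faun</TITLE><PRICE>$10.95</PRICE></BOOK>
  <BOOK Status="out"><TITLE>Leaves of Grass</TITLE><PRICE>$7.75</PRICE></BOOK>
  <BOOK Status="in"><TITLE>Moby-Dick</TITLE><PRICE>$9.95</PRICE></BOOK>
</INVENTORY>""")

# A path of child steps: every TITLE under every BOOK under the root.
all_titles = [t.text for t in inventory.findall("./BOOK/TITLE")]

# A predicate on an attribute: only books whose Status is "in".
in_stock = [b.findtext("TITLE") for b in inventory.findall(".//BOOK[@Status='in']")]

# A positional predicate: the price of the second BOOK.
second_price = inventory.findtext("./BOOK[2]/PRICE")
print(all_titles, in_stock, second_price)
```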
    XSLT Parser
•    XSLT style sheet is an XML document
•    Consists of two parts
     –   Standard XML declaration including namespace declarations
     –   Top level elements that set up the general framework for the output,
         e.g., variables or import parameters from the command line
•    Processing involves the following
     –   A current list of nodes from the source document is created by
         matching a pattern
     –   Output for the current node is generated by instantiating the template
         corresponding to the matched pattern
     –   In process of transformation new nodes can be added to the list
     –   The processing begins by processing a list containing the entire
         document
     –   Transformation ends when the node list is empty
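
The processing model above can be imitated in plain Python to show its shape. This is a simplified sketch, not an XSLT processor: templates are ordinary functions keyed by element name (real XSLT matches XPath patterns), and the default rule here skips unmatched nodes and descends into their children, which differs from XSLT's built-in rules for text. All function names and the sample document are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical "templates": one function per element name, each returning
# output text for the node it matched.
def book_template(node, apply):
    return "* " + apply(list(node)) + "\n"

def title_template(node, apply):
    return node.text

def author_template(node, apply):
    return " by " + node.text

TEMPLATES = {"BOOK": book_template, "TITLE": title_template, "AUTHOR": author_template}

def apply_templates(nodes):
    """Process a node list: instantiate the matching template for each node."""
    output = []
    for node in nodes:
        template = TEMPLATES.get(node.tag)
        if template is None:
            # Default rule in this sketch: emit nothing for the node itself
            # and process its children as a new node list.
            output.append(apply_templates(list(node)))
        else:
            output.append(template(node, apply_templates))
    return "".join(output)

source = ET.fromstring(
    "<INVENTORY>"
    "<BOOK><TITLE>Moby-Dick</TITLE><AUTHOR>Herman Melville</AUTHOR><PAGES>724</PAGES></BOOK>"
    "<BOOK><TITLE>Leaves of Grass</TITLE><AUTHOR>Walt Whitman</AUTHOR><PAGES>462</PAGES></BOOK>"
    "</INVENTORY>"
)

# Processing starts with a list holding only the top of the document.
report = apply_templates([source])
print(report)
```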
Web Services
    Web Services
•    Web Services are software programs that use XML to exchange
     information with other software programs via common Internet
     protocols
     –   Web services communicate over the network to provide specific
         methods that other applications can invoke.
     –   Thus applications residing on different computers can work
         synergistically by invoking methods on each other
     –   HTTP is the key protocol used for Web Services.
•    Characteristics
     –   Programmable
     –   Encapsulate a task
     –   XML based data exchange allows programs on heterogeneous platforms
         to communicate (SOAP)
     –   Self-describing (WSDL)
     –   Discoverable (UDDI)
    Web Services
•    SOAP – Simple Object Access Protocol
     –   Enables data transfer between systems distributed over a network
     –   A SOAP message sent to a Web Service invokes a method provided
         by the service
     –   Web Service may return the result via another SOAP message
•    SOAP consists of standardized XML schemas
•    Defines a format for transmitting XML messages over network
     –   Includes data types and message structure
•    Layered over an Internet protocol, such as HTTP, and can be
     used to transfer data across the Web and other networks
     –   HTTP allows message transfer across firewalls since HTTP messages are
         usually accepted by firewalls
    Web Services
•    SOAP message consists of three parts
     –   Envelope
     –   Header
     –   Body
•    Envelope wraps the entire message and contains header and body
•    Header (optional) provides information on security and routing
•    Body contains application specific data that is being transferred
•    An alternative to SOAP is XML-RPC
     –   SOAP is the de facto standard due to its simplicity and extensibility
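
The three-part message structure can be sketched by building an envelope with Python's standard ElementTree. The envelope namespace is the SOAP 1.1 one; the service namespace, the GetManufacturer method and its ModelNumber parameter are hypothetical names invented for this example. Nothing is sent over the network here.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.com/inventory"             # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)

def build_request(model_number):
    """Build a SOAP request whose body calls a hypothetical GetManufacturer method."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")   # optional part, left empty here
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}GetManufacturer")
    ET.SubElement(call, f"{{{SERVICE_NS}}}ModelNumber").text = model_number
    return ET.tostring(envelope, encoding="unicode")

request_xml = build_request("R3456d2h")

# The receiving side unwraps the envelope and reads the method call from the body.
parsed = ET.fromstring(request_xml)
body = parsed.find(f"{{{SOAP_ENV}}}Body")
method = body[0]
print(request_xml)
print(method.tag, method[0].text)
```

In practice the serialized envelope would be posted to the service over HTTP, and the reply would arrive as another envelope of the same shape.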
    Web Services
•    WSDL – Web Services Description Language
•    Provides means to provide information about a web service
     –   Instructions of its use
     –   Capability of the service
•    Provides information on how to connect to the service
•    Syntax is fairly complex
     –   Normally created using automated tools
     –   Not important to understand the precise syntax of WSDL while
         developing web services
    Web Services
•    UDDI – Universal Description, Discovery and Integration
     –   Allows developers and businesses to publish and locate web services on
         a network via use of registries
     –   The registries can be made private or public
•    Structure similar to a phone book
     –   White pages contain contact information and textual description
     –   Yellow pages provide classification information about companies and
         details of a company's electronic capabilities
     –   Green pages list technical data relating to services and business
