					                                                                         http://www.xfront.com/SchemaVersioning.html




          Impact of XML Schema Versioning on System
                           Design
                             (Strategies for Facilitating System Evolution)
                               by Roger L. Costello and Melissa Utzinger

         Introduction
         Creating a new version of an XML Schema may have effects that ripple through many
         parts of a system. Managing these effects can be expensive. So it is worthwhile to
         examine ways to mitigate the costly ripple effects of new versions of a Schema.

         Frequently, Schema versioning is considered in isolation from the rest of the system.
         However, as noted, Schema changes may impact other parts of a system, so we
         recommend that Schema versioning be part of an integrated system evolution plan.
         Schema versioning is one of the drivers for system evolution.

         As a strategy for facilitating system evolution we focus on these three parts of a system
         - Schemas, instance documents, and applications. To treat these three parts in a holistic
         fashion we make the following recommendations:

         Schema Design Recommendations:
              - Use the same namespace for all Schema versions.
              - Give each new Schema version a different filename or a different
              URL location or both.
              - Don't use anonymous types. Instead, use named types.
              - If you change a type when you create a new version of a Schema
              then give the type a different name.
              - Change the name of an element's type only if its immediate
              content has changed.
              - Use a version attribute on the root element. If an instance
              document is a compound document - that is, an assembly of XML
              fragments - then place a version attribute on the root of each
              fragment.

         Instance Document Design Recommendations:
              - Use the schemaLocation attribute to identify the target Schema
              (i.e., don't have the Schema validator use out-of-band information
              to identify the target Schema)

         Application Design Recommendations:
              - Applications should use the tag names to locate data in instance
              documents. (Applications should be designed to anticipate that
              the order of tags may change)




1 of 8                                                                                             5/11/2006 5:16 PM

              - Define a system-wide protocol (e.g., fault reporting mechanism)
              to be used when an application is unable to process an instance
              document it receives from another application.

         The rationale for each of these recommendations is explained over the course of this
         paper. But first we begin by defining the nature of the systems being targeted.

         The System
         We assume that the system under consideration possesses these characteristics:

              - The system is comprised of multiple independent applications that collaborate by
              exchanging XML documents (henceforth called "instance documents").
              - All instance documents conform to a common XML Schema. (The applications are
              part of a community which uses the same XML Schema)
              - All applications are independent and are not required to have prior knowledge,
              understanding, or agreements with one another.
              - The XML Schema is periodically updated, i.e., periodically a new version is created.
              - The XML Schemas (both old and new versions) are accessible by the applications.
              - Applications are not required to upgrade in lockstep. For example, application A
              may have been upgraded to send and receive instance documents which conform
              to the latest version of an XML Schema. Meanwhile, the application that it
              exchanges data with may still be coded to the last version.
              - The semantics of an element does not change with new versions of the Schema.
              Thus, a <location> element always means "position", although in the version 1
              Schema its contents may be <lat> and <lon> whereas in the version 2 Schema its
              contents may be <x> and <y>. Thus, the representation of <location> may
              change between versions, but its semantics remains constant.
              - Instance documents use the schemaLocation attribute to reference the XML
              Schema(s).

         Given the above system characterization we now state the problem.

         Problem Statement
         How can a system be designed to minimize breakage with each new version of the
         XML Schema? Specifically, what strategies can be employed when designing these
         three things:

              - XML Schemas
              - Instance documents
              - Applications

         Categories of Schema Changes that Impact Instance
         Documents and Applications

         There are six categories of changes to an XML Schema that can impact instance
         documents and applications:

           1. Namespace: the new version (of the XML Schema) could have a different
              namespace (i.e., targetNamespace).
           2. Location: the new version could physically reside at a new location (URL).
           3. Change: the new version could have changed the contents of an element.
           4. Shuffle: the new version could have reorganized the data in some way, such as by
              changing the order of elements.
           5. Remove: the new version could have removed an element or attribute that was
              previously in the old version.
           6. Add: the new version could have added an element or attribute that was not in the
              old version.

         Note: There are many other kinds of changes that could occur in an XML Schema than
         those listed above. However, they are changes internal to the schema and have no
         manifestation in instance documents.

         Below we discuss how to mitigate the impact of each of these changes.

         1. Namespace-Aware Applications
         Most XML applications are "namespace aware". That is, the application is designed to
         process elements belonging to a specific namespace.

         For example, an Extensible Stylesheet Language Transformations (XSLT) Processor is an
         application which understands the XSLT namespace:

            http://www.w3.org/1999/XSL/Transform

         Concretely, this means that an XSLT Processor (application) knows how to process
         elements such as <template>, <for-each>, <if>, etc., provided the elements are
         associated with the XSLT namespace.

         Changing namespaces results in breaking namespace-aware applications. This brings us
         to our first recommendation:

         Recommendation 1: To avoid breaking namespace-aware applications with each new
         version of an XML Schema use the same namespace for all versions.
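
         The breakage described above can be sketched as follows. This is a minimal
         illustration in Python using the standard library's xml.etree.ElementTree; the
         namespace URIs and the <aircraft> wrapper element are hypothetical and are not
         taken from any real Schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces, used only for illustration.
NS_V1 = "http://www.example.org/flight"
NS_CHANGED = "http://www.example.org/flight/v2"

def find_location(instance_xml, namespace=NS_V1):
    """Return the <location> child in the namespace the application was coded for."""
    root = ET.fromstring(instance_xml)
    # ElementTree expands names to "{namespace}local-name", so the
    # namespace is part of every lookup the application performs.
    return root.find("{%s}location" % namespace)

same_ns = '<aircraft xmlns="%s"><location/></aircraft>' % NS_V1
new_ns = '<aircraft xmlns="%s"><location/></aircraft>' % NS_CHANGED

found_same = find_location(same_ns)  # the element is found
found_new = find_location(new_ns)    # None: the application no longer sees it
```

         The application code is unchanged between the two calls; only the namespace of
         the instance document differs, and that alone is enough to make the lookup fail.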

         2. Place the New Version of a Schema at a New
         Location to Avoid Breaking Old Instance Documents
         Suppose that a new version of an XML Schema is created (using the same namespace, as
         described above), and that the new version simply overwrites the old version. That is, the
         new version has the same filename and the same URL location as the old Schema.
         Depending on the kinds of changes made, this may break all instance
         documents that were written to conform to the old Schema.

         Recommendation 2: To prevent breaking old instance documents give the new Schema
         version a different filename or a different URL location or both.
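
         To illustrate, the sketch below (Python, standard library only) reads the
         schemaLocation attribute of two instance documents that share one namespace but
         point at different Schema files. The namespace URI and the file names
         flight-v1.xsd and flight-v2.xsd are hypothetical.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def schema_locations(instance_xml):
    """Map each namespace to the schema URL named in xsi:schemaLocation."""
    root = ET.fromstring(instance_xml)
    tokens = root.get("{%s}schemaLocation" % XSI, "").split()
    # The attribute value is a whitespace-separated list of
    # (namespace, location) pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))

old_instance = (
    '<aircraft xmlns="http://www.example.org/flight" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.example.org/flight '
    'http://www.example.org/schemas/flight-v1.xsd"/>'
)
new_instance = old_instance.replace("flight-v1.xsd", "flight-v2.xsd")

old_loc = schema_locations(old_instance)
new_loc = schema_locations(new_instance)
```

         Because the old Schema file is still at its original location, the old instance
         document continues to reference a Schema that it conforms to.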

         3. Dealing with Change to an Element's Content Model

         A common occurrence when creating a new version of an XML Schema is to change an
         element's content. (The technical expression is: "change an element's content model")




         For example, in a version 1 Schema the <location> element may have been declared to
         be comprised of <lat> and <lon> whereas in version 2 its contents may be <x> and
         <y>.

         Suppose that an application receives an instance document which conforms to the latest
         version of the Schema. And let's suppose that the application is still coded to the
         previous version of the Schema. The application parses through the instance document
         and arrives, say, at the <location> element. How will the application recognize that
         location's content model has changed?

         It would be useful if the application could consult the parser: "What's the type (content
         model) of <location>?" If the type is not one that it expects then the application must
         decide how to proceed.


               There are many possible courses of action that an application
               may take when it encounters an element with an unfamiliar
               content model. For example it may (1) simply skip the <location>
               element, or it may (2) attempt to dynamically understand the
               new content model by consulting an ontology. Which action is
               taken depends on the application and is beyond the scope of this
               paper.

         How can we help an application recognize an element's type? That is, how do we
         enable an application to determine the type of each element it encounters? Answer: the
         XML Schema must be designed to provide explicit type information.

         Recommendation 3: To facilitate an application in recognizing that an element's content
         has changed, don't use anonymous types. Instead, use named types.

         Example. Do not design your Schema like this:

         <element name="location">
           <complexType>
              <sequence>
                <element name="x" type="decimal"/>
                <element name="y" type="decimal"/>
              </sequence>
           </complexType>
         </element>

         What is location's type? Answer: it is anonymous. This Schema is not designed to
         facilitate an application in obtaining type information.

         Instead, design your Schema like this:

         <complexType name="locationType-x_y_version">
           <sequence>
             <element name="x" type="decimal"/>
             <element name="y" type="decimal"/>
           </sequence>
         </complexType>

         <element name="location" type="locationType-x_y_version"/>

         What is location's type? Answer: the named type, locationType-x_y_version. Thus, this
         Schema is designed to facilitate an application in obtaining type information.




         Suppose that a Schema designer follows Recommendation 3 and always uses named
         types. This will enable applications to query a parser for type information, e.g., "What's
         the type of <location>?" The parser will reply with: "the type is
         locationType-x_y_version". If this is a type that the application did not expect (i.e., was
         not coded to understand) then it will take appropriate steps (as described above).


               As a parser validates an instance document against a Schema, it
               collects from the Schema information about each element in the
               instance document (such as the datatype of each element). The
               instance's information set, augmented with this information, is
               called the post-schema-validation infoset (PSVI).

         Let's continue with the <location> example. Above we saw the motivation for using
         named types - it enables an application to easily discover an element's content model. Of
         course, if a new version of a Schema is created and <location>'s type is changed but the
         new type is given the same name as the old type, then it defeats the whole purpose of
         type information. This leads us to the next recommendation:

         Recommendation 4: If you change a type when you create a new version of a Schema
         then give the type a different name.

         Example. Suppose that in the version 1 Schema <location> has this as its contents:
         <lat> and <lon>. The Schema declares this named type:

         <complexType name="locationType">
           <sequence>
             <element name="lat" type="decimal"/>
             <element name="lon" type="decimal"/>
           </sequence>
         </complexType>

         <element name="location" type="locationType"/>

         Now suppose that in the version 2 Schema the contents of <location> are changed to
         <x> and <y>. It is important to give a new name to location's type:

         <complexType name="locationType-x_y_version">
           <sequence>
             <element name="x" type="decimal"/>
             <element name="y" type="decimal"/>
           </sequence>
         </complexType>

         <element name="location" type="locationType-x_y_version"/>

         Thus, if a version 1 application receives a version 2 instance document then, when it
         parses down to the <location> element, it will be able to easily recognize that
         <location>'s content model has changed (it has changed from locationType to
         locationType-x_y_version).
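
         A rough sketch of this check is shown below. Python's standard library does not
         expose a PSVI, so the sketch reads the type name directly from the Schema
         document instead; this lookup is a stand-in for querying a validating parser, not
         a description of how such a parser works. The function name declared_type and
         the set KNOWN_TYPES are hypothetical, and a real Schema may write the type as
         a prefixed name.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

def declared_type(schema_xml, element_name):
    """Return the named type of a global element declaration, or None if anonymous."""
    schema = ET.fromstring(schema_xml)
    # findall() with a plain tag matches direct children only, i.e. the
    # global element declarations of the schema.
    for decl in schema.findall("{%s}element" % XSD):
        if decl.get("name") == element_name:
            return decl.get("type")
    return None

SCHEMA_V2 = """
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <complexType name="locationType-x_y_version">
    <sequence>
      <element name="x" type="decimal"/>
      <element name="y" type="decimal"/>
    </sequence>
  </complexType>
  <element name="location" type="locationType-x_y_version"/>
</schema>
"""

# Types that this (version 1) application was coded to understand.
KNOWN_TYPES = {"locationType"}

location_type = declared_type(SCHEMA_V2, "location")
content_model_changed = location_type not in KNOWN_TYPES
```

         With an anonymous type the lookup yields nothing to compare, and if the version 2
         type had kept the name locationType the comparison would wrongly report no change.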

         3.b Localize Type Changes
         Suppose that the <location> element is nested within an <aircraft> element, e.g.,

         <complexType name="aircraftType">
           <sequence>
             <element name="location" type="locationType-x_y_version"/>
           </sequence>
         </complexType>

         <element name="aircraft" type="aircraftType"/>

         Technically, the contents of <aircraft> have changed since the contents of <location> have
         changed. Should the type name for <aircraft> be changed? Answer: no. The reason is
         that we want to minimize changes. That is, we want an application to see as many
         familiar elements and types as possible. The aircraftType is a familiar type. It still has as
         its contents a <location> element. We want to preserve this familiarity.

         Recommendation 5: Change the name of an element's type only if its immediate
         content has changed.

         3.c Use a Version attribute
         An application will find it useful to have an indication of whether it can expect changes as it
         processes an instance document. This can be accomplished using a version attribute on
         the root element.

         Note that this is what XSLT does. As the XSLT technology has migrated to a new
         version, instance documents (i.e., XSLT documents) indicate which version is being used
         with a version attribute on the root element.

         Recommendation 6: Use a version attribute on the root element. If an instance
         document is a compound document - that is, an assembly of XML fragments - then place
         a version attribute on the root of each fragment.
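
         As a sketch of how an application might use these attributes, the Python code
         below walks a compound document and reports the fragments whose version it was
         not coded against. The element names <mission> and <crew>, the version values,
         and the set SUPPORTED_VERSIONS are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Versions this application was coded against (an assumption for the sketch).
SUPPORTED_VERSIONS = {"1.0"}

def unfamiliar_fragments(instance_xml):
    """Return the tag names of elements whose version attribute is not supported."""
    root = ET.fromstring(instance_xml)
    flagged = []
    # iter() visits the root and every descendant, so fragment roots in a
    # compound document are checked along with the document root.
    for element in root.iter():
        version = element.get("version")
        if version is not None and version not in SUPPORTED_VERSIONS:
            flagged.append(element.tag)
    return flagged

compound = (
    '<mission version="1.0">'
    '<aircraft version="2.0"><location><x>1</x><y>2</y></location></aircraft>'
    '<crew version="1.0"/>'
    '</mission>'
)

flagged = unfamiliar_fragments(compound)
```

         Here only the <aircraft> fragment is flagged, so the application knows where to
         expect unfamiliar types before it descends into that fragment.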

         4. Effect of Shuffling Elements
         A new version of a Schema may make a change as simple as reordering the contents of
         an element. For example, in a version 1 Schema the order may be A, B, C, e.g.,

         <complexType name="...">
           <sequence>
             <element name="A" type="..."/>
             <element name="B" type="..."/>
             <element name="C" type="..."/>
           </sequence>
         </complexType>

         In the version 2 Schema the order may be changed to B, C, A, e.g.,

         <complexType name="...">
           <sequence>
             <element name="B" type="..."/>
             <element name="C" type="..."/>
             <element name="A" type="..."/>
           </sequence>
         </complexType>

         If an application is coded to expect a certain ordering of the data then the new version of
         the Schema will break the application. To avoid this an application should never depend
         on specific ordering of data. It should locate the data using the tags.

         Recommendation 7: Applications should use the tag names to locate data in instance
         documents. Applications should be designed to anticipate that the order of tags may
         change.


               Thus, an application should treat the ordering given by a Schema's
               <sequence> particle as notional only.
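
         The difference between the two styles of access can be sketched in Python as
         follows. The element names A, B, and C come from the example above; the
         <record> wrapper and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

v1 = "<record><A>1</A><B>2</B><C>3</C></record>"  # version 1 order: A, B, C
v2 = "<record><B>2</B><C>3</C><A>1</A></record>"  # version 2 order: B, C, A

def read_a_by_position(instance_xml):
    # Fragile: assumes A is always the first child.
    return ET.fromstring(instance_xml)[0].text

def read_a_by_name(instance_xml):
    # Robust to reordering: looks the element up by its tag name.
    return ET.fromstring(instance_xml).findtext("A")

by_position = (read_a_by_position(v1), read_a_by_position(v2))
by_name = (read_a_by_name(v1), read_a_by_name(v2))
```

         The positional reader silently returns B's value for the version 2 document,
         whereas the name-based reader returns A's value in both cases.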


         5. Effect of Removing an Element or Attribute
         Creating a new version of an XML Schema may result in removing an element or
         attribute. Consider an application that has not been upgraded to the new version, and
         receives an instance document that conforms to the new version. The application must
         decide whether the lack of the element or attribute is catastrophic or whether it can live
         without the information. The action taken is application-specific (and is outside the scope
         of this paper).

         6. Effect of Adding an Element or Attribute
         Creating a new version of an XML Schema may result in adding an element or attribute.
         Consider an application that has not been upgraded to the new version, and receives an
         instance document that conforms to the new version. The application must decide what
         to do with the additional information. Again, what action is taken is application-specific.
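
         Although the action taken is application-specific, detecting removed and added
         elements can be sketched generically. In the Python code below, the split into
         REQUIRED and OPTIONAL elements and the element names <altitude> and <speed> are
         assumptions chosen for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy: the application requires <lat> and <lon>
# and can live without <altitude>.
REQUIRED = ("lat", "lon")
OPTIONAL = ("altitude",)

def read_location(location_xml):
    """Return (values, missing_required, unexpected) for a <location> element."""
    location = ET.fromstring(location_xml)
    present = {child.tag: child.text for child in location}
    # Absent elements are reported as None rather than raising an error.
    values = {name: present.get(name) for name in REQUIRED + OPTIONAL}
    missing = [name for name in REQUIRED if name not in present]
    unexpected = sorted(set(present) - set(REQUIRED) - set(OPTIONAL))
    return values, missing, unexpected

values, missing, unexpected = read_location(
    "<location><lat>42.3</lat><speed>450</speed></location>"
)
```

         The application can then decide whether a non-empty list of missing elements is
         catastrophic, and whether the unexpected elements can simply be ignored.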

         What to do when an Application Breaks

         The above recommendations will help mitigate breakage due to Schema changes.
         However, they do not guarantee that applications will not break. An old application may
         receive a new instance that is missing crucial information, or the content model of a
         crucial element may have changed to a type that cannot be dynamically understood.

         To anticipate such occurrences it will be beneficial to institute a system protocol that
         specifies what actions should be taken by applications when breakage occurs. One
         possible protocol is for an application to respond to the sender with a fault message.

         Recommendation 8: Define a system-wide protocol (e.g., fault reporting mechanism) to
         be used when an application is unable to process an instance document it receives from
         another application.
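
         One possible shape for such a fault message is sketched below in Python. The
         element names <fault>, <reason>, <element>, and <receivedVersion> are
         hypothetical; a real system would define its own fault vocabulary.

```python
import xml.etree.ElementTree as ET

def build_fault(reason, element_name, received_version):
    """Build a fault message to return to the sender of an unprocessable instance."""
    # All element names here are hypothetical; a real system would
    # define them in its own fault Schema.
    fault = ET.Element("fault")
    ET.SubElement(fault, "reason").text = reason
    ET.SubElement(fault, "element").text = element_name
    ET.SubElement(fault, "receivedVersion").text = received_version
    return ET.tostring(fault, encoding="unicode")

message = build_fault("unrecognized content model", "location", "2.0")
```

         Naming the offending element and the version received gives the sender enough
         information to resend the data in a form the receiver can process.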

         Summary

         To minimize impact to existing instance documents and applications as new versions of
         XML Schemas are created, we make the following recommendations:

         Recommendation 1: To avoid breaking namespace-aware applications with each new
         version of an XML Schema use the same namespace for all versions.

         Recommendation 2: To prevent breaking old instance documents give the new Schema
         version a different filename or a different URL location or both.

         Recommendation 3: To facilitate an application in recognizing that an element's content
         has changed, don't use anonymous types. Instead, use named types.



         Recommendation 4: If you change a type when you create a new version of a Schema
         then give the type a different name.

         Recommendation 5: Change the name of an element's type only if its immediate
         content has changed.

         Recommendation 6: Use a version attribute on the root element. If an instance
         document is a compound document - that is, an assembly of XML fragments - then place
         a version attribute on the root of each fragment.

         Recommendation 7: Applications should use the tag names to locate data in instance
         documents. Applications should be designed to anticipate that the order of tags may
         change.

         Recommendation 8: Define a system-wide protocol (e.g., fault reporting mechanism) to
         be used when an application is unable to process an instance document it receives from
         another application.




                             XML Schema Versioning


Issue
What is the Best Practice for versioning XML schemas?

Introduction
It is clear that XML schemas will evolve over time and it is important to capture the
schema’s version. This write-up summarizes two cases for schema changes and some
options for schema versioning. It then provides some ‘best practice’ guidelines for XML
schema versioning.

Schema Changes – Two Cases
Consider two cases for changes to XML schemas:

   Case 1.     The new schema changes the interpretation of some element.
               For example, a construct that was valid and meaningful for the previous
               schema does not validate against the new schema.

   Case 2.     The new schema extends the namespace (e.g., by adding new elements),
               but does not invalidate previously valid documents.

Versioning Approaches
Some options for identifying a new schema version are to:
1. Change the (internal) schema version attribute.
2. Create a schemaVersion attribute on the root element.
3. Change the schema's targetNamespace.
4. Change the name/location of the schema.


Option 1: Change the (internal) schema version attribute.
In this approach one would simply change the number in the optional version attribute at
the start of the XML schema. For example, in the code below one could change
version="1.0" to version="1.1".

       <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified"
               attributeFormDefault="unqualified"
               version="1.0">

    Advantages:
     - Easy. Part of the schema specification.
     - Instance documents would not have to change if they remain valid with the new
        version of the schema (case 2 above).
     - The schema contains information that informs applications that it has changed.
        An application could interrogate the version attribute, recognize that this is a
        new version of the schema, and take appropriate action.


     Disadvantages:
      - The validator ignores the version attribute. Therefore, it is not an enforceable
         constraint.
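
Since the validator does not act on the version attribute, the interrogation has to be done by the application itself. A minimal sketch in Python (standard library only) might look like the following; the stored value LAST_SEEN_VERSION is hypothetical.

```python
import xml.etree.ElementTree as ET

def schema_version(schema_xml):
    """Return the version attribute of the xs:schema element, or None if absent."""
    return ET.fromstring(schema_xml).get("version")

SCHEMA = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'elementFormDefault="qualified" '
    'attributeFormDefault="unqualified" '
    'version="1.1"/>'
)

LAST_SEEN_VERSION = "1.0"  # hypothetical value the application has stored
current = schema_version(SCHEMA)
schema_changed = current != LAST_SEEN_VERSION
```

The attribute is optional, so an application that relies on it should also decide what to do when it is absent.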

Option 2: Create a schemaVersion attribute on the root element.
With this approach an attribute is included on the element that introduces the namespace.
In the examples below, this attribute is named ‘schemaVersion’. This option could be
used in two ways.

Usage A: First, like option 1, this attribute could be used to capture the schema version.
In this case, one could make the attribute required and the value fixed. Then each
instance that used this schema would have to set the value of the attribute to the value
used in the schema. This makes schemaVersion a constraint that is enforceable by the
validator. With the example schema below, the instance would have to include a
schemaVersion attribute with a value of 1.0 for the instance to validate.
       <xs:schema xmlns="http://www.exampleSchema"
                  targetNamespace="http://www.exampleSchema"
                  xmlns:xs="http://www.w3.org/2001/XMLSchema"
                  elementFormDefault="qualified" attributeFormDefault="unqualified">
           <xs:element name="Example">
               <xs:complexType>
                   ….
                   <xs:attribute name="schemaVersion" type="xs:decimal" use="required" fixed="1.0"/>
               </xs:complexType>
           </xs:element>


     Advantages:
      - The schemaVersion attribute is an enforceable constraint. Instances would not
         validate without the same version number.

     Disadvantages:
      - The schemaVersion number in the instance must match exactly. This does not
         allow an instance to indicate that it is valid using multiple versions of a schema.

Usage B: The second approach uses the schemaVersion attribute in an entirely different
way. It no longer captures the version of the schema within the schema (i.e., it is not a
fixed value). Rather, it is used in the instance to declare the version (or versions) of the
schema with which the instance is compatible. This approach would have to be done in
conjunction with option 1 (or an alternative indicator in the schema file to identify its
version).

The schemaVersion attribute’s value could be a list or a convention could be used to
define how this attribute is used. For example, if the convention was that the
schemaVersion attribute declares the latest schema version with which the instance is
compatible, then the example instance below states that the instance should be valid with
schema version 1.2 or earlier.

With this approach, an application could compare the schema version (captured in the
schema file) with the version to which the instance reports that it is compatible.



       Sample Schema (declares its version as 1.3)
       <xs:schema xmlns="http://www.exampleSchema"
                  targetNamespace="http://www.exampleSchema"
                  xmlns:xs="http://www.w3.org/2001/XMLSchema"
                  elementFormDefault="qualified" attributeFormDefault="unqualified"
                  version="1.3">
           <xs:element name="Example">
               <xs:complexType>
                   ….
                   <xs:attribute name="schemaVersion" type="xs:decimal" use="required"/>
               </xs:complexType>
           </xs:element>


       Sample Instance (declares it is compatible with version 1.2
              (or 1.2 and other versions depending upon the convention used))
       <Example schemaVersion="1.2"
               xmlns="http://www.exampleSchema"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.exampleSchema MyLocation\Example.xsd">


    Advantages:
     - Instance documents may not have to change if they remain valid with the new
        schema version (case 2).
     - Like option 1, an application would receive an indication that the schema has
        changed.
     - Could provide an alternative to schemaLocation as a means to point to the
        correct schema version. This could be desirable where the business practice
        requires the use of a schema in a controlled repository, rather than an arbitrary
        location.

    Disadvantages:
     - Requires extra processing by an application. For example, an application
        would have to pre-parse the instance to determine the schema version with
        which it is compatible, and compare this value to the version number stored in
        the schema file.
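
The comparison described under Usage B can be sketched in Python as follows. The sketch assumes the convention mentioned above, namely that schemaVersion names the latest schema version with which the instance is compatible. The function names are hypothetical, and version strings are compared as tuples of integers, which is a choice made for this sketch rather than something the xs:decimal type prescribes.

```python
import xml.etree.ElementTree as ET

def as_tuple(version):
    """Turn "1.2" into (1, 2) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def instance_covers_schema(instance_xml, schema_xml):
    """True if the schema's version is no later than the one the instance declares."""
    instance_version = ET.fromstring(instance_xml).get("schemaVersion")
    schema_version = ET.fromstring(schema_xml).get("version")
    return as_tuple(schema_version) <= as_tuple(instance_version)

SCHEMA_1_3 = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.3"/>'
SCHEMA_1_1 = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.1"/>'
INSTANCE = '<Example xmlns="http://www.exampleSchema" schemaVersion="1.2"/>'

covered_by_1_3 = instance_covers_schema(INSTANCE, SCHEMA_1_3)  # False
covered_by_1_1 = instance_covers_schema(INSTANCE, SCHEMA_1_1)  # True
```

Under this convention the instance makes no claim about version 1.3, so the application would need to validate it or fall back to an earlier schema version.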

Option 3: Change the schema's targetNamespace.
In this approach, the schema’s ‘targetNamespace’ could be changed to designate that a
new version of the schema exists. One way to do this is to include a schema version
number in the designation of the target namespace as shown in the example below.

       <xs:schema xmlns="http://www.exampleSchemaV1.0"
               targetNamespace="http://www.exampleSchemaV1.0"
               xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified" attributeFormDefault="unqualified">

    Advantages:
     - Applications are notified of a change to the schema (i.e., an application would
        not recognize the new namespace).
     - Requires action to assure that there are no compatibility problems with the new
        schema. At a minimum, the instance documents that use the schema, and
        schemas that include the relevant schema, must change to reference the new
        targetNamespace. This is both an advantage and a disadvantage.


    Disadvantages:
     - With this approach, instance documents will not validate until they are changed
        to designate the new targetNamespace. However, one does not want to force
        all instance documents to change, even if the change to the schema is really
        minor and would not impact an instance.
     - Any schemas that ‘include’ this schema would have to change because the
        target namespace of the included components must be the same as the target
        namespace of the including schema.

Option 4: Change the name/location of the schema.
This approach changes the file name or location of the schema. This mimics the
convention that many people use for naming their files so that they know which version
is the most current (e.g., append version number or date to end of file name).

       Advantages:

       Disadvantages:
        - As with option 3, this approach forces all instance documents to change,
           even if the change to the schema would not impact that instance.
        - Any schemas that import the modified schema would have to change since
           the import statement provides the name and location of the imported schema.
        - Unlike the previous options, with this approach an application receives no
           hint that the meaning of various element/attribute names has changed.
        - The schemaLocation attribute in the instance document is optional and is not
           authoritative even if it is present. It is a hint to help the processor to locate
           the schema. Therefore, relying on this attribute is not a good practice (with
           the current reading of the specification).


XML Schema Versioning Best Practices

[1] Capture the schema version somewhere in the XML schema.

[2] Identify, in the instance document, the version or versions of the schema with which
    the instance is compatible.

[3] Make previous versions of an XML schema available.

   This allows applications to use previous versions. It also allows users to migrate to
   new versions of the schema as compatibility is assured.

   One way to do this is to have applications pre-parse the instance and choose the
   appropriate schema based on the version number. For example, one could have the
   schemaLocation URI point to a document that includes a list of the locations of the
   available versions of the schema. A tool could then be used to obtain the correct
   version of the schema. The disadvantage of this approach is that this pre-parsing
   requires two passes at the XML instance (one to get the correct version of the schema
   and one to validate).

[4] When an XML schema is only extended (e.g., new elements, attributes, or extensions to
    an enumerated list), one should strive not to invalidate existing instance
    documents.

   For example, if one is adding new elements or attributes, one could consider making
   them optional where this makes sense.

   Also, one could come up with a convention for schema versioning to indicate whether
   the schema changed significantly (case 1) or was only extended (case 2). For
   example, for case 1 a version could increment by one (e.g., v1.0 to v2.0) whereas for
   case 2 a version could increment by less than one (e.g., v1.2 to v1.3).

   In this case, a possible approach would be to do the following with respect to the
   schema:
       a. Change the schema version number within the schema (e.g., option 1).
       b. Record the changes in the schema in a change history.
       c. Make the new and previous versions of the schema available (therefore, one
            would want to change the file name/location as well).

[5] Where the new schema changes the interpretation of some element (e.g., a construct
    that was valid and meaningful for the previous schema does not validate against the
    new schema), one should change the target namespace.

   In this case, the changes with respect to the schema are the same as with [4], with one
   addition:
       d. Change the target namespace.

   In this case there are also required changes with respect to the instances that use this
   schema.
       e. Update the instances to reflect the new target namespace.
       f. Confirm that there are no compatibility problems with the new schema.
       g. Change the attribute that identifies the version/versions of the schema with
            which the instance is valid.
       h. Update the schema name/location if appropriate.
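
As a minimal sketch of practices [1], [2], and [4], consider a schema that is only extended. The Book vocabulary, the example.org namespace, the file name Book_v1.1.xsd, and the schemaVersion attribute name are all assumptions made for illustration; none of them is prescribed above.

```xml
<?xml version="1.0"?>
<!-- Book_v1.1.xsd (hypothetical file name): extends v1.0 without breaking it -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/book"
            xmlns="http://www.example.org/book"
            elementFormDefault="qualified"
            version="1.1">
    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Title" type="xsd:string"/>
                <xsd:element name="Author" type="xsd:string"/>
                <!-- New in v1.1; optional, so v1.0 instances still validate -->
                <xsd:element name="ISBN" type="xsd:string" minOccurs="0"/>
            </xsd:sequence>
            <!-- Instance-side version marker (the attribute name is an assumption) -->
            <xsd:attribute name="schemaVersion" type="xsd:string" use="required"/>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```

Assuming v1.0 declared the same schemaVersion attribute, an instance written against v1.0 remains valid against v1.1 because the only addition is optional:

```xml
<?xml version="1.0"?>
<Book xmlns="http://www.example.org/book"
      schemaVersion="1.0">
    <Title>Illusions</Title>
    <Author>Richard Bach</Author>
</Book>
```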




                     Hide (Localize) Namespaces
                               Versus
                        Expose Namespaces
Table of Contents
               Issue
               Introduction
               Example
               Technical Requirements for Hiding (Localizing) Namespaces
               Best Practice

Issue

When should a schema be designed to hide (localize) within the schema the namespaces of the
elements and attributes it is using, versus when should it be designed to expose the namespaces
in instance documents?
Introduction

A typical schema will reuse elements and types from multiple schemas, each with different
namespaces.


          A.xsd
                    <xsd:schema …
                       targetNamespace="A">

          B.xsd
                    <xsd:schema …
                       targetNamespace="B">

          C.xsd
                    <xsd:schema …
                              targetNamespace="C">
                        <xsd:import namespace="A"
                                    schemaLocation="A.xsd"/>
                        <xsd:import namespace="B"
                                    schemaLocation="B.xsd"/>
                        …
                    </xsd:schema>




A schema, then, may be composed of components from multiple namespaces. Thus, when a
schema is designed the schema designer must decide whether or not the origin (namespace) of
each element should be exposed in the instance documents.


                                   Instance Document

                    <myDoc …
                       schemaLocation="C C.xsd">
                       …
                    </myDoc>
                          The namespaces of the components are not
                              visible in the instance documents.


                                   Instance Document

                    <myDoc …
                       schemaLocation="C C.xsd">
                       …
                    </myDoc>
                      The namespaces of the components are visible
                              in the instance documents.



A binary switch attribute in the schema is used to control the hiding/exposure of namespaces: by
setting elementFormDefault=“unqualified” the namespaces will be hidden (localized) within the
schema, and by setting elementFormDefault=“qualified” the namespaces will be exposed in
instance documents.


                  elementFormDefault - the Exposure “Switch” (hide / expose)

              <xsd:schema …                               <xsd:schema …
                  elementFormDefault="qualified">   vs        elementFormDefault="unqualified">




Example:


       [Diagram: Nikon.xsd, Olympus.xsd, and Pentax.xsd feed into Camera.xsd]


Below is a schema for describing a camera. The camera schema reuses components from other
schemas - the camera’s <body> element reuses a type from the Nikon schema, the camera’s
<lens> element reuses a type from the Olympus schema, and the camera’s <manual_adapter>
element reuses a type from the Pentax schema.




Camera.xsd



    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.camera.org"
                xmlns:nikon="http://www.nikon.com"
                xmlns:olympus="http://www.olympus.com"
                xmlns:pentax="http://www.pentax.com"
                elementFormDefault="unqualified">
        <xsd:import namespace="http://www.nikon.com"
                     schemaLocation="Nikon.xsd"/>
        <xsd:import namespace="http://www.olympus.com"
                     schemaLocation="Olympus.xsd"/>
        <xsd:import namespace="http://www.pentax.com"
                     schemaLocation="Pentax.xsd"/>
        <xsd:element name="camera">
            <xsd:complexType>
                <xsd:sequence>
                    <xsd:element name="body" type="nikon:body_type"/>
                    <xsd:element name="lens" type="olympus:lens_type"/>
                    <xsd:element name="manual_adapter"
                                 type="pentax:manual_adapter_type"/>
                </xsd:sequence>
            </xsd:complexType>
        </xsd:element>
    </xsd:schema>

             This schema is designed to hide namespaces


Note the three <import> elements for importing the Nikon, Olympus, and Pentax components.
Also note that the <schema> attribute, elementFormDefault has been set to the value of
unqualified. This is a critical attribute. Its value controls whether the namespaces of the elements
being used by the schema will be hidden or exposed in instance documents (thus, it behaves like
a switch turning namespace exposure on/off). Because it has been set to “unqualified” in this
schema, the namespaces will remain hidden (localized) within the schema, and will not be
visible in instance documents, as we see here:




Camera.xml (namespaces hidden)



                                    Instance Document
   <?xml version="1.0"?>
   <my:camera xmlns:my="http://www.camera.org"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation=
                      "http://www.camera.org
                       Camera.xsd">
      <body>
         <description>Ergonomically designed casing for easy handling</description>
      </body>
      <lens>
         <zoom>300mm</zoom>
         <f-stop>1.2</f-stop>
      </lens>
      <manual_adapter>
         <speed>1/10,000 sec to 100 sec</speed>
      </manual_adapter>
   </my:camera>

     Instance document with namespaces hidden (localized) in the schema.
     --> The fact that the <description> element comes from the Nikon schema,
     the <zoom> and <f-stop> elements come from the Olympus schema, and the
     <speed> element comes from the Pentax schema is totally transparent to the
     instance document.


The only namespace qualifier exposed in the instance document is on the <camera> root
element. The rest of the document is completely free of namespace qualifiers. The Nikon,
Olympus, and Pentax namespaces are completely hidden (localized) within the schema!

Looking at the instance document one would never realize that the schema got its components
from three other schemas. Such complexities are localized to the schema. Thus, we say that the
schema has been designed in such a fashion that its component namespace complexities are
“hidden” from the instance document.

On the other hand, if the above schema had set elementFormDefault=“qualified” then the
namespace of each element would be exposed in instance documents. Here’s what the instance
document would look like:




Camera.xml (namespaces exposed)



                                Instance Document
         <?xml version="1.0"?>
         <c:camera xmlns:c="http://www.camera.org"
                   xmlns:nikon="http://www.nikon.com"
                   xmlns:olympus="http://www.olympus.com"
                   xmlns:pentax="http://www.pentax.com"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation=
                            "http://www.camera.org
                             Camera.xsd">
             <c:body>
                 <nikon:description>Ergonomically designed casing for easy handling</nikon:description>
             </c:body>
             <c:lens>
                 <olympus:zoom>300mm</olympus:zoom>
                 <olympus:f-stop>1.2</olympus:f-stop>
             </c:lens>
             <c:manual_adapter>
                 <pentax:speed>1/10,000 sec to 100 sec</pentax:speed>
             </c:manual_adapter>
         </c:camera>

                       Instance document with namespaces exposed


Note that each element is explicitly namespace-qualified. Also, observe the declaration for each
namespace. Due to the way the schema has been designed, the complexities of where the schema
obtained its components have been “pushed out” to the instance document. Thus, the reader of
this instance document is “exposed” to the fact that the schema obtained the description element
from the Nikon schema, the zoom and f-stop elements from the Olympus schema, and the speed
element from the Pentax schema.

All Schemas must have a Consistent Value for elementFormDefault!

Be sure to note that elementFormDefault applies just to the schema that it is in. It does not
apply to schemas that it includes or imports. Consequently, if you want to hide namespaces then
all schemas involved must have set elementFormDefault=“unqualified”. Likewise, if you want to
expose namespaces then all schemas involved must have set elementFormDefault=“qualified”.
To see what happens when you “mix” elementFormDefault values, let’s suppose that Camera.xsd
and Olympus.xsd have both set in their schema elementFormDefault=“unqualified”, whereas
Nikon.xsd and Pentax.xsd have both set elementFormDefault=“qualified”.




                     Camera.xsd      elementFormDefault="unqualified"
                     Olympus.xsd     elementFormDefault="unqualified"
                     Nikon.xsd       elementFormDefault="qualified"
                     Pentax.xsd      elementFormDefault="qualified"



Here’s what an instance document looks like with this “mixed” design:
Camera.xml (mixed design)



                                           Instance Document
               <?xml version="1.0"?>
               <my:camera xmlns:my="http://www.camera.org"
                           xmlns:nikon="http://www.nikon.com"
                           xmlns:pentax="http://www.pentax.com"
                           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                           xsi:schemaLocation=
                                   "http://www.camera.org
                                    Camera.xsd">
                  <body>
                     <nikon:description>Ergonomically designed casing for easy handling</nikon:description>
                  </body>
                  <lens>
                     <zoom>300mm</zoom>
                     <f-stop>1.2</f-stop>
                  </lens>
                  <manual_adapter>
                     <pentax:speed>1/10,000 sec to 100 sec</pentax:speed>
                  </manual_adapter>
               </my:camera>


              Hiding/exposure mix: This instance document has the Nikon and
              Pentax namespaces exposed, while the Camera and Olympus
              namespaces are hidden.




Observe that in this instance document some of the elements are namespace-qualified, while
others are not. Namely, those elements from the Camera and Olympus schemas are not qualified,
whereas the elements from the Nikon and Pentax schemas are qualified.
Technical Requirements for Hiding (Localizing) Namespaces

There are two requirements on an element for its namespace to be hidden from instance
documents:
[1] The value of elementFormDefault must be “unqualified”.
[2] The element must not be globally declared. For example:
            <?xml version="1.0"?>
            <xsd:schema ...>
              <xsd:element name="foo">
                ...
              </xsd:element>
            </xsd:schema>

The element foo can never have its namespace hidden from instance documents, regardless of the
value of elementFormDefault. foo is a global element (i.e., an immediate child of <schema>) and
therefore must always be qualified. To enable namespace hiding the element must be a local
element.
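
A minimal sketch of the difference, assuming a hypothetical target namespace http://www.example.org/t and a hypothetical local child element bar (neither appears above). The schema sets elementFormDefault to unqualified, yet only the local element can take advantage of it:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/t"
            elementFormDefault="unqualified">
    <!-- Global: an immediate child of <schema>, so always qualified in instances -->
    <xsd:element name="foo">
        <xsd:complexType>
            <xsd:sequence>
                <!-- Local: nested inside foo, so elementFormDefault decides -->
                <xsd:element name="bar" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```

A conforming instance qualifies foo but not bar:

```xml
<?xml version="1.0"?>
<t:foo xmlns:t="http://www.example.org/t">
    <bar>only foo carries a namespace qualifier</bar>
</t:foo>
```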
Best Practice

For this issue there is no definitive Best Practice with respect to whether to design your schemas
to hide (localize) namespaces or to expose them. Sometimes it’s best to hide the
namespaces. Other times it’s best to expose them. Both approaches have their pluses and
minuses, as discussed below.

However, there are Best Practices with regard to other aspects of this issue. They are:
1. Whenever you create a schema, make two copies of it. The copies should be identical,
   except that one sets elementFormDefault=“qualified” and the other sets
   elementFormDefault=“unqualified”. If you make two versions of all your schemas then
   people who use your schemas will be able to implement either design approach - hide
   (localize) namespaces, or expose namespaces.
2. Minimize the use of global elements and attributes so that elementFormDefault can behave
   as an “exposure switch”. The rationale for this was described above, in Technical
   Requirements for Hiding (Localizing) Namespaces.

Advantages of Hiding (Localizing) Component Namespaces within the Schema

The instance document is simple. It’s easy to read and understand. There are no namespace
qualifiers cluttering up the document, except for the one on the document element (which is okay
because it shows the domain of the document). The knowledge of where the schema got its
components is irrelevant and localized to the schema.




Design your schema to hide (localize) namespaces within the schema ...
– when simplicity, readability, and understandability of instance documents are of utmost
  importance and namespaces in the instance document provide no necessary additional
  information. In many scenarios the users of the instance documents are not XML experts.
  Namespaces would distract and confuse such users, who are concerned only with
  structure and content.
– when you need the flexibility of being able to change the schema without impact to instance
  documents. To see this, imagine that when a schema is originally designed it imports elements/
  types from another namespace. Since the schema has been designed to hide (localize) the
  namespaces, instance documents do not see the namespaces of the imported elements. Then,
  imagine that at a later date the schema is changed such that instead of importing the elements/
  types, those elements and types are declared/defined right within the schema (inline). This
  change from using elements/types from another namespace to using elements/types in the local
  namespace has no impact to instance documents because the schema has been designed to
  shield instance documents from where the components come from.
Advantages of Exposing Namespaces in Instance Documents

If your company spends the time and money to create a reusable schema component, and makes
it available to the marketplace, then you will most likely want recognition for that component.
Namespaces provide a means to achieve recognition. For example,
            <nikon:description>
                Ergonomically designed casing for easy handling
            </nikon:description>


There can be no misunderstanding that this component comes from Nikon. The namespace
qualifier is providing information on the origin/lineage of the description element.

Another case where it is desirable to expose namespaces is when processing instance documents.
Oftentimes when processing instance documents the namespace is required to determine how an
element is to be processed (e.g., “if the element comes from this namespace then we’ll process it
in this fashion, if it comes from this other namespace then we’ll process it in a different
fashion”). If the namespaces are hidden then your application is forced to do a lookup in the
schema for every element, which can be unacceptably slow.
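
One way such namespace-driven processing could look is sketched below as an XSLT 1.0 stylesheet applied to the exposed Camera.xml. The choice of XSLT and the output text are assumptions made for illustration; the same dispatch could be written in any namespace-aware API.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:nikon="http://www.nikon.com"
                xmlns:olympus="http://www.olympus.com">
    <xsl:output method="text"/>

    <!-- Any element in the Nikon namespace is processed one way ... -->
    <xsl:template match="nikon:*">
        <xsl:text>Nikon component: </xsl:text>
        <xsl:value-of select="local-name()"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>

    <!-- ... and any element in the Olympus namespace another way. -->
    <xsl:template match="olympus:*">
        <xsl:text>Olympus component: </xsl:text>
        <xsl:value-of select="local-name()"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>

    <!-- Suppress text from elements that no template above matches. -->
    <xsl:template match="text()"/>
</xsl:stylesheet>
```

The match patterns work only because the qualifiers are present in the instance document; against the hidden version of Camera.xml neither template would match anything.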

Design your schema to expose namespaces in instance documents ...
– when lineage/ownership of the elements is important to the instance document users (such as
  for copyright purposes).
– when there are multiple elements with the same name but different semantics, and you
  want to namespace-qualify them so that they can be differentiated (e.g., publisher:body versus
  human:body). [In some cases you have multiple elements with the same name and different
  semantics but the context of the element is sufficient to determine its semantics. Example: the
  title element in <person><title> is easily distinguished from the title element in
  <chapter><title>. In such cases there is less justification for designing your schema to expose
  the namespaces.]

– when processing (by an application) of the instance document elements is dependent upon
  knowledge of the namespaces of the elements.
Note about elementFormDefault and xpath Expressions

We have seen how to design your schema so that elementFormDefault acts as an “exposure
switch”. Simply change the value of elementFormDefault and it dictates whether or not elements
are qualified in instance documents. In general, no other changes are needed in the schema other
than changing the value of elementFormDefault. However, if your schema is using <key> or
<unique> then you will need to make modifications to the xpath expressions when you change
the value of elementFormDefault.

If elementFormDefault=“qualified” then you must qualify all the references in the xpath
expression.
Example:

<xsd:key name="PK">
 <xsd:selector xpath="c:Camera/c:lens"/>
 <xsd:field xpath="c:zoom"/>
</xsd:key>

Note that each element in the xpath expression is namespace-qualified.

If elementFormDefault=“unqualified” then you must NOT qualify the references in the xpath
expression.
Example:

<xsd:key name="PK">
 <xsd:selector xpath="Camera/lens"/>
 <xsd:field xpath="zoom"/>
</xsd:key>

Note that none of the elements in the xpath expressions are namespace-qualified.

So, as you switch between exposing and hiding namespaces you will need to take the xpath
changes into account.
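
The sketch below places a key in context for the qualified case. It is a simplified, self-contained stand-in for Camera.xsd (lens and zoom are declared inline rather than imported from Olympus.xsd, which is an assumption made to keep the example short). Note that the c prefix used in the xpath values must be bound to the target namespace in the schema itself, because unprefixed names in these expressions match only elements in no namespace.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.camera.org"
            xmlns:c="http://www.camera.org"
            elementFormDefault="qualified">
    <xsd:element name="camera">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="lens" maxOccurs="unbounded">
                    <xsd:complexType>
                        <xsd:sequence>
                            <xsd:element name="zoom" type="xsd:string"/>
                        </xsd:sequence>
                    </xsd:complexType>
                </xsd:element>
            </xsd:sequence>
        </xsd:complexType>
        <!-- The key is scoped to camera; paths are relative to it -->
        <xsd:key name="PK">
            <xsd:selector xpath="c:lens"/>
            <xsd:field xpath="c:zoom"/>
        </xsd:key>
    </xsd:element>
</xsd:schema>
```

If this schema were switched to elementFormDefault="unqualified", the selector would become xpath="lens" and the field xpath="zoom", since the local elements would then be in no namespace.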




                             Global versus Local
Table of Contents
               Issue
               Introduction
               Russian Doll Design
               Salami Slice Design
               Russian Doll Design Characteristics
               Salami Slice Design Characteristics
               Venetian Blind Design
               Venetian Blind Design Characteristics
               Best Practice

Issue
When should an element or type be declared global versus when should it be declared local?

Introduction
[Recall that a component (element, complexType, or simpleType) is “global” if it is an
immediate child of <schema>, whereas it is “local” if it is not an immediate child of <schema>,
i.e., it is nested within another component.]

What advice would you give to someone who asked you, “In general, when should an
element (or type) be declared global versus when should it be declared local?” The purpose of
this chapter is to provide answers to that question.

Below is a snippet of an XML instance document. We will explore the different design strategies
using this example.

               <Book>
                 <Title>Illusions</Title>
                 <Author>Richard Bach</Author>
               </Book>

Russian Doll Design
This design approach has the schema structure mirror the instance document structure, e.g.,
declare a Book element and within it declare a Title element followed by an Author element:

               <xsd:element name="Book">
                 <xsd:complexType>
                   <xsd:sequence>
                     <xsd:element name="Title" type="xsd:string"/>
                     <xsd:element name="Author" type="xsd:string"/>
                   </xsd:sequence>
                 </xsd:complexType>
               </xsd:element>




The instance document has all its components bundled together. Likewise, the schema is
designed to bundle together all its element declarations.

This design represents one end of the design spectrum.

Salami Slice Design
The Salami Slice design represents the other end of the design spectrum. With this design we
disassemble the instance document into its individual components. In the schema we define each
component (as an element declaration), and then assemble them together:

               <xsd:element name="Title" type="xsd:string"/>
               <xsd:element name="Author" type="xsd:string"/>
               <xsd:element name="Book">
                 <xsd:complexType>
                   <xsd:sequence>
                     <xsd:element ref="Title"/>
                     <xsd:element ref="Author"/>
                   </xsd:sequence>
                 </xsd:complexType>
               </xsd:element>

Note how the schema declared each component individually (Title, and Author) and then
assembled them together (by ref’ing them) in the creation of the Book component.

These two designs represent opposite ends of the design spectrum.

To understand these designs it may help to think in terms of boxes, where a box represents an
element or type:
• The Russian Doll design corresponds to having a single box, and it has nested within it boxes,
  which in turn have boxes nested within them, and so on. (boxes within boxes, just like a
  Russian Doll!)
• The Salami Slice design corresponds to having many separate boxes which are assembled
  together (separate boxes combined together, just like Salami slices brought together in a
  sandwich!)
Let’s examine the characteristics of each of the two designs. (Doing so will yield insights
into another design.)

Russian Doll Design Characteristics
[1] Opaque content. The content of Book is opaque to other schemas, and to other parts of the
    same schema. The impact of this is that none of the types or elements within Book are
    reusable.
[2] Localized scope. The region of the schema where the Title and Author element declarations
    are applicable is localized to within the Book element. The impact of this is that if the
    schema has set elementFormDefault=“unqualified” then the namespaces of Title and Author
    are hidden (localized) within the schema.




[3] Compact. Everything is bundled together into a tidy, single unit.
[4] Decoupled. With this design approach each component is self-contained (i.e., they don’t
    interact with other components). Consequently, changes to the components will have limited
    impact. For example, if the components within Book change, the impact will be limited
    since they are not coupled to components outside of Book.
[5] Cohesive. With this design approach all the related data is grouped together into self-
    contained components, i.e., the components are cohesive.
Salami Slice Design Characteristics
[1] Transparent content. The components which make up Book are visible to other schemas,
    and to other parts of the same schema. The impact of this is that the types and elements
    within Book are reusable.
[2] Global scope. All components have global scope. The impact of this is that, irrespective of
    the value of elementFormDefault, the namespaces of Title and Author will be exposed in
    instance documents.
[3] Verbose. Everything is laid out and clearly visible.
[4] Coupled. In our example we saw that the Book element depends on the Title and Author
    elements. If those elements were to change it would impact the Book element. Thus, this
    design produces a set of interconnected (coupled) components.
[5] Cohesive. With this design approach all the related data is also grouped together into self-
    contained components. Thus, the components are cohesive.
The two design approaches differ in a couple of important ways:
• The Russian Doll design facilitates hiding (localizing) namespace complexities. The Salami
  Slice design does not.
• The Salami Slice design facilitates component reuse. The Russian Doll design does not.
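
To make the first difference concrete, assume both Book schemas declare a hypothetical target namespace http://www.example.org/book and set elementFormDefault to unqualified (neither detail is given above). With the Russian Doll schema, Title and Author are local, so only the root is qualified:

```xml
<?xml version="1.0"?>
<b:Book xmlns:b="http://www.example.org/book">
    <Title>Illusions</Title>
    <Author>Richard Bach</Author>
</b:Book>
```

With the Salami Slice schema, Title and Author are global, so they must be qualified regardless of the elementFormDefault setting:

```xml
<?xml version="1.0"?>
<b:Book xmlns:b="http://www.example.org/book">
    <b:Title>Illusions</b:Title>
    <b:Author>Richard Bach</b:Author>
</b:Book>
```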
Is there a design which facilitates hiding (localizing) namespace complexities, and facilitates
component reuse? Yes there is!

Consider the Book example again. An alternative design is to create a global type definition
which nests the Title and Author element declarations within it:

               <xsd:complexType name="Publication">
                 <xsd:sequence>
                   <xsd:element name="Title" type="xsd:string"/>
                   <xsd:element name="Author" type="xsd:string"/>
                 </xsd:sequence>
               </xsd:complexType>
               <xsd:element name="Book" type="Publication"/>

This design has both benefits:
• it is capable of hiding (localizing) the namespace complexity of Title and Author, and
• it has a reusable Publication type component.
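
A sketch of what that reuse might look like follows. The Pamphlet and Magazine elements, the MagazineType type, the Issue element, and the example.org namespace are hypothetical additions for illustration; only Publication and Book come from the example above.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/publications"
            xmlns="http://www.example.org/publications"
            elementFormDefault="unqualified">
    <xsd:complexType name="Publication">
        <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
    <xsd:element name="Book" type="Publication"/>

    <!-- Reuse as-is: another element with the same type -->
    <xsd:element name="Pamphlet" type="Publication"/>

    <!-- Reuse by extension: Title and Author plus one new local element -->
    <xsd:complexType name="MagazineType">
        <xsd:complexContent>
            <xsd:extension base="Publication">
                <xsd:sequence>
                    <xsd:element name="Issue" type="xsd:string"/>
                </xsd:sequence>
            </xsd:extension>
        </xsd:complexContent>
    </xsd:complexType>
    <xsd:element name="Magazine" type="MagazineType"/>
</xsd:schema>
```

Title, Author, and Issue all remain local declarations, so their namespace exposure is still governed by elementFormDefault.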




Venetian Blind Design
With this design approach we disassemble the problem into individual components, as the Salami
Slice design does, but instead of creating element declarations, we create type definitions. Here’s
what our example looks like with this design approach:

               <xsd:simpleType name="Title">
                 <xsd:restriction base="xsd:string">
                   <xsd:enumeration value="Mr."/>
                   <xsd:enumeration value="Mrs."/>
                   <xsd:enumeration value="Dr."/>
                 </xsd:restriction>
               </xsd:simpleType>
               <xsd:simpleType name="Name">
                 <xsd:restriction base="xsd:string">
                   <xsd:minLength value="1"/>
                 </xsd:restriction>
               </xsd:simpleType>
               <xsd:complexType name="Publication">
                 <xsd:sequence>
                   <xsd:element name="Title" type="Title"/>
                   <xsd:element name="Author" type="Name"/>
                 </xsd:sequence>
               </xsd:complexType>
               <xsd:element name="Book" type="Publication"/>

This design has:
• maximized reuse (there are four reusable components - the Title type, the Name type, the
  Publication type, and the Book element)
• maximized the potential to hide (localize) namespaces [note how this has been phrased:
  “maximized the potential ...” Whether, in fact, the namespaces of Title and Author are hidden
  or exposed is determined by the elementFormDefault “switch”].
The Venetian Blind design espouses these guidelines ...
• Design your schema to maximize the potential for hiding (localizing) namespace complexities.
• Use elementFormDefault to act as a switch for controlling namespace exposure - if you want
  element namespaces exposed in instance documents, simply turn the elementFormDefault
  switch to “on” (i.e., set elementFormDefault=“qualified”); if you don’t want element
  namespaces exposed in instance documents, simply turn the elementFormDefault switch to
  “off” (i.e., set elementFormDefault=“unqualified”).
• Design your schema to maximize reuse.
• Use type definitions as the main form of component reuse.
• Nest element declarations within type definitions.
Let’s compare the Venetian Blind design with the Salami Slice design. Recall our example:




Salami Slice Design:
               <xsd:element name="Title" type="xsd:string"/>
               <xsd:element name="Author" type="xsd:string"/>
               <xsd:element name="Book">
                 <xsd:complexType>
                   <xsd:sequence>
                     <xsd:element ref="Title"/>
                     <xsd:element ref="Author"/>
                   </xsd:sequence>
                 </xsd:complexType>
               </xsd:element>

The Salami Slice design also results in creating reusable (element) components, but it has
absolutely no potential for namespace hiding.

“However”, you argue, “suppose that I want namespaces exposed in instance documents. [We
have seen cases where this is desired.] So the Salami Slice design is a good approach for me.
Right?”

Let’s think about this for a moment. What if at a later date you change your mind and wish to
hide namespaces (what if your users hate seeing all those namespace qualifiers in instance
documents)? You will need to redesign your schema (possibly scrapping it and starting over).

Better to adopt the Venetian Blind Design, which allows you to control whether namespaces are
hidden or exposed by simply setting the value of elementFormDefault. No redesign of your
schema is needed as you switch from exposing to hiding, or vice versa.
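
As a rough illustration of the switch, here is the same Book instance under each setting. This is a sketch, not part of the original example: it assumes the Venetian Blind schema above is given a targetNamespace, and the URI http://www.example.org/book and the b prefix are placeholders. The two fragments are separate instance documents shown together; only the value of elementFormDefault in the schema differs between them.

```xml
<!-- Case 1: the schema sets elementFormDefault="qualified".
     The nested Title and Author elements are namespace-qualified (exposed). -->
<b:Book xmlns:b="http://www.example.org/book">
  <b:Title>The First and Last Freedom</b:Title>
  <b:Author>J. Krishnamurti</b:Author>
</b:Book>

<!-- Case 2: the schema sets elementFormDefault="unqualified".
     Only the global Book element is qualified; the locally declared
     Title and Author elements are in no namespace (hidden). -->
<b:Book xmlns:b="http://www.example.org/book">
  <Title>The First and Last Freedom</Title>
  <Author>J. Krishnamurti</Author>
</b:Book>
```

Because Title and Author are declared locally inside the Publication type, neither case requires any change to the type definitions themselves.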

[That said ... your particular project may need to sacrifice the ability to turn on/off namespace
exposure because you require instance documents to be able to use element substitution. In such
circumstances the Salami Slice design approach is the only viable alternative.]
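
To make the element substitution case concrete, here is a minimal sketch built on the Salami Slice example above. The Writer element is hypothetical (it does not appear in the original schema); it is declared substitutable for the global Author element through the substitutionGroup attribute. This relies on Author being a global declaration that Book references with ref, which is what the Salami Slice design provides.

```xml
<!-- Schema fragment: Author is global, so it can head a substitution group.
     Writer is a hypothetical element added for illustration. -->
<xsd:element name="Author" type="xsd:string"/>
<xsd:element name="Writer" type="xsd:string" substitutionGroup="Author"/>

<!-- Instance fragment: Writer appears where Book's content model says ref="Author" -->
<Book>
  <Title>The First and Last Freedom</Title>
  <Writer>J. Krishnamurti</Writer>
</Book>
```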

Here are the characteristics of the Venetian Blind Design.

Venetian Blind Design Characteristics:
[1] Maximum reuse. The primary components of reuse are type definitions.
[2] Maximum namespace hiding. Element declarations are nested within types, thus maximizing
    the potential for namespace hiding.
[3] Easy exposure switching. Whether namespaces are hidden (localized) in the schema or
    exposed in instance documents is controlled by the elementFormDefault switch.
[4] Coupled. This design generates a set of components which are interconnected (i.e.,
    dependent).
[5] Cohesive. As with the other designs, the components group together related data. Thus, the
    components are cohesive.




Best Practice
[1] The Venetian Blind design is the one to choose where your schemas require the flexibility to
    turn namespace exposure on or off with a simple switch, and where component reuse is
    important.
[2] Where your task requires that you make available to instance document authors the option
    to use element substitution, then use the Salami Slice design.
[3] Where minimizing the size and coupling of components is of utmost concern, use the
    Russian Doll design.




             Creating Extensible Content Models
Table of Contents
               Issue
               Definition
               Introduction
               Extensibility via Type Substitution
               Extensibility via the <any> Element
               Non-determinism and the <any> element
               Best Practice

Issue
What is Best Practice for creating extensible content models?

Definition
An element has an extensible content model if in instance documents the authors can extend the
contents of that element with additional elements beyond what was specified by the schema.

Introduction
             <xsd:element name="Book">
               <xsd:complexType>
                 <xsd:sequence>
                   <xsd:element name="Title" type="xsd:string"/>
                   <xsd:element name="Author" type="xsd:string"/>
                   <xsd:element name="Date" type="xsd:string"/>
                   <xsd:element name="ISBN" type="xsd:string"/>
                   <xsd:element name="Publisher" type="xsd:string"/>
                 </xsd:sequence>
               </xsd:complexType>
             </xsd:element>

This schema snippet dictates that in instance documents each <Book> element must always
contain exactly five child elements: <Title>, <Author>, <Date>, <ISBN>, and <Publisher>. For
example:

             <Book>
                 <Title>The First and Last Freedom</Title>
                 <Author>J. Krishnamurti</Author>
                 <Date>1954</Date>
                 <ISBN>0-06-064831-7</ISBN>
                 <Publisher>Harper &amp; Row</Publisher>
             </Book>

The schema specifies a fixed/static content model for the Book element. Book’s content must
rigidly conform to just the schema specification. Sometimes this rigidity is a good thing.
Sometimes we want to give our instance documents more flexibility.

How do we design the schema so that Book’s content model is extensible? Below are two
methods for implementing extensible content models.

Extensibility via Type Substitution
Consider this version of the above schema, where Book’s content model has been defined using a
type definition:

            <xsd:complexType name="BookType">
              <xsd:sequence>
                <xsd:element name="Title" type="xsd:string"/>
                <xsd:element name="Author" type="xsd:string"/>
                <xsd:element name="Date" type="xsd:string"/>
                <xsd:element name="ISBN" type="xsd:string"/>
                <xsd:element name="Publisher" type="xsd:string"/>
              </xsd:sequence>
            </xsd:complexType>
            <xsd:element name="BookCatalogue">
              <xsd:complexType>
                <xsd:sequence>
                  <xsd:element name="Book" type="BookType"
                         maxOccurs="unbounded"/>
                </xsd:sequence>
              </xsd:complexType>
            </xsd:element>

Recall that via the mechanism of type substitutability, the contents of <Book> can be substituted
by any type that derives from BookType.


                                   <Book>
                                     -- content --
                                   </Book>
For example, if a type is created which derives from BookType:

            <xsd:complexType name="BookTypePlusReviewer">
              <xsd:complexContent>
                <xsd:extension base="BookType">
                  <xsd:sequence>
                    <xsd:element name="Reviewer" type="xsd:string"/>
                  </xsd:sequence>
                </xsd:extension>
              </xsd:complexContent>
            </xsd:complexType>




then instance documents can create a <Book> element that contains a <Reviewer> element,
along with the other five elements:

               <Book xsi:type="BookTypePlusReviewer">
                  <Title>My Life and Times</Title>
                  <Author>Paul McCartney</Author>
                  <Date>1998</Date>
                  <ISBN>94303-12021-43892</ISBN>
                  <Publisher>McMillin Publishing</Publisher>
                  <Reviewer>Roger Costello</Reviewer>
               </Book>

Thus, Book’s content model has been extended with a new element (Reviewer)!

In this example, BookTypePlusReviewer has been defined within the same schema as BookType.
In general, however, this may not be the case. Other schemas can import/include the
BookCatalogue schema and define types which derive from BookType. Thus, the contents of
Book may be extended without modifying the BookCatalogue schema, as shown below:


              Extend a Schema, without Touching it!

BookCatalogue.xsd (default namespace: xmlns="http://www.publishing.org"):

        <xsd:complexType name="BookType">
           <xsd:sequence>
              <xsd:element name="Title" type="xsd:string"/>
              <xsd:element name="Author" type="xsd:string"/>
              <xsd:element name="Date" type="xsd:gYear"/>
              <xsd:element name="ISBN" type="xsd:string"/>
              <xsd:element name="Publisher" type="xsd:string"/>
           </xsd:sequence>
        </xsd:complexType>

        <xsd:element name="Book" type="BookType"/>

MyTypeDefinitions.xsd (default namespace: xmlns="http://www.publishing.org"):

        <xsd:include schemaLocation="BookCatalogue.xsd"/>

        <xsd:complexType name="BookTypePlusReviewer">
           <xsd:complexContent>
               <xsd:extension base="BookType">
                 <xsd:sequence>
                    <xsd:element name="Reviewer" type="xsd:string"/>
                 </xsd:sequence>
               </xsd:extension>
           </xsd:complexContent>
        </xsd:complexType>


And here’s what an instance document would look like:




        xmlns="http://www.publishing.org"
        xsi:schemaLocation="http://www.publishing.org
                            MyTypeDefinitions.xsd"

        <Book xsi:type="BookTypePlusReviewer">
              <Title>The First and Last Freedom</Title>
              <Author>J. Krishnamurti</Author>
              <Date>1954</Date>
              <ISBN>0-06-064831-7</ISBN>
              <Publisher>Harper &amp; Row</Publisher>
              <Reviewer>Roger L. Costello</Reviewer>
        </Book>

We have type-substituted Book's content with the type specified in the new schema. Thus, we have
extended BookCatalogue.xsd without touching it!


This type substitutability mechanism is a powerful extensibility mechanism. However, it suffers
from two problems:

Disadvantages:
Location Restricted Extensibility: The extensibility is restricted to appending elements onto the
end of the content model (after the <Publisher> element). What if we wanted to extend <Book>
by adding elements to the beginning (before <Title>), or in the middle, etc? We can’t do it with
this mechanism.

Unexpected Extensibility: If you look at the declaration for Book:

            <xsd:element name="Book" type="BookType"
                   maxOccurs="unbounded"/>

and the definition for BookType:

            <xsd:complexType name="BookType">
              <xsd:sequence>
                <xsd:element name="Title" type="xsd:string"/>
                <xsd:element name="Author" type="xsd:string"/>
                <xsd:element name="Date" type="xsd:gYear"/>
                <xsd:element name="ISBN" type="xsd:string"/>
                <xsd:element name="Publisher" type="xsd:string"/>
              </xsd:sequence>
            </xsd:complexType>




it is easy to be fooled into thinking that in instance documents the <Book> elements will always
contain just <Title>, <Author>, <Date>, <ISBN>, and <Publisher>. It is easy to forget that
someone could extend the content model using the type substitutability mechanism.
Extensibility is unexpected! Consequently, if you write a program to process BookCatalogue
instance documents, you may forget to take into account the fact that a <Book> element may
contain more than five children.

It would be nice if there were a way to explicitly flag places where extensibility may occur: “hey,
instance documents may extend <Book> at this point, so be sure to write your code taking this
possibility into account.” In addition, it would be nice if we could extend Book’s content model
at locations other than just the end ... The <any> element gives us these capabilities beautifully,
as is discussed in the next section.

Extensibility via the <any> Element
An <any> element may be inserted into a content model to enable instance documents to contain
additional elements. Here’s an example showing an <any> element at the end of Book’s content
model:

            <xsd:element name="Book">
              <xsd:complexType>
                <xsd:sequence>
                  <xsd:element name="Title" type="xsd:string"/>
                  <xsd:element name="Author" type="xsd:string"/>
                  <xsd:element name="Date" type="xsd:string"/>
                  <xsd:element name="ISBN" type="xsd:string"/>
                  <xsd:element name="Publisher" type="xsd:string"/>
                  <xsd:any namespace="##any" minOccurs="0"/>
                </xsd:sequence>
              </xsd:complexType>
            </xsd:element>

“The content of Book is Title, Author, Date, ISBN, Publisher and then (optionally) any well-
formed element. The new element may come from any namespace.”

Note the <any> element may be inserted at any point, e.g., it could be inserted at the top, in the
middle, etc.




In this version of the schema it has been explicitly specified that after the <Publisher> element
any well-formed XML element may occur and that XML element may come from any
namespace. For example, suppose that the instance document author discovers a schema
containing a declaration for a Reviewer element:


                   <xsd:element name="Reviewer">
                      <xsd:complexType>
                          <xsd:sequence>
                             <xsd:element name="Name">
                                 <xsd:complexType>
                                    <xsd:sequence>
                                        <xsd:element name="First" type="xsd:string"/>
                                        <xsd:element name="Last" type="xsd:string"/>
                                    </xsd:sequence>
                                 </xsd:complexType>
                             </xsd:element>
                          </xsd:sequence>
                      </xsd:complexType>
                   </xsd:element>



And suppose that for an instance document author it is important that, in addition to specifying
the Title, Author, Date, ISBN, and Publisher of each Book, he/she specify a Reviewer. Because
the schema has been designed with extensibility in mind, the instance document author can use
the Reviewer element in his/her BookCatalogue:


                   <Book>
                         <Title>The First and Last Freedom</Title>
                         <Author>J. Krishnamurti</Author>
                         <Date>1954</Date>
                         <ISBN>0-06-064831-7</ISBN>
                         <Publisher>Harper &amp; Row</Publisher>
                         <rev:Reviewer>
                             <rev:Name>
                                 <rev:First>Roger</rev:First>
                                 <rev:Last>Costello</rev:Last>
                             </rev:Name>
                         </rev:Reviewer>
                   </Book>

The instance document author has enhanced the instance document with an element that the
schema designer may have never even envisioned. We have empowered the instance author with
a great deal of flexibility in creating the instance document. Wow!



An alternate schema design is to create a BookType (as we did above) and embed the <any>
element within the BookType:


              <xsd:complexType name="BookType">
                 <xsd:sequence>
                     <xsd:element name="Title" type="xsd:string"/>
                     <xsd:element name="Author" type="xsd:string"/>
                     <xsd:element name="Date" type="xsd:gYear"/>
                     <xsd:element name="ISBN" type="xsd:string"/>
                     <xsd:element name="Publisher" type="xsd:string"/>
                     <xsd:any namespace="##any" minOccurs="0"/>
                 </xsd:sequence>
              </xsd:complexType>

and then declare Book of type BookType:

                     <xsd:element name="Book" type="BookType"/>

However, we are then back to the “unexpected extensibility” problem. Namely, after the
<Publisher> element any well-formed XML element may occur, and after that a derived type
could append still more content.

There is a way to control the extensibility and still use a type. We can add a block attribute to
Book:

           <xsd:element name="Book" type="BookType" block="#all"/>

The block attribute prohibits derived types from being used in Book’s content model. Thus, by
this method we have created a reusable component (BookType), and yet we still have control
over the extensibility.
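
As a brief sketch of the effect, reusing the BookTypePlusReviewer type from the earlier section: once Book is declared with block="#all", an instance that tries to substitute the derived type via xsi:type should be rejected by a schema validator, while the <any> element inside BookType remains the only extension point.

```xml
<!-- Schema fragment: Book's type may not be replaced by a derived type -->
<xsd:element name="Book" type="BookType" block="#all"/>

<!-- Instance fragment: expected to be invalid, since substituting a
     derived type for BookType is blocked on this element -->
<Book xsi:type="BookTypePlusReviewer">
  <Title>My Life and Times</Title>
  <Author>Paul McCartney</Author>
  <Date>1998</Date>
  <ISBN>94303-12021-43892</ISBN>
  <Publisher>McMillin Publishing</Publisher>
  <Reviewer>Roger Costello</Reviewer>
</Book>
```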

With the <any> element we have complete control over where, and how much, extensibility we
want to allow. For example, suppose that we want to allow at most two new elements
at the top of Book’s content model. Here’s how to specify that using the <any> element:




       <xsd:complexType name="BookType">
            <xsd:sequence>
                 <xsd:any namespace="##other" minOccurs="0" maxOccurs="2"/>
                 <xsd:element name="Title" type="xsd:string"/>
                 <xsd:element name="Author" type="xsd:string"/>
                 <xsd:element name="Date" type="xsd:string"/>
                 <xsd:element name="ISBN" type="xsd:string"/>
                 <xsd:element name="Publisher" type="xsd:string"/>
            </xsd:sequence>
       </xsd:complexType>

Note how the <any> element has been placed at the top of the content model, and it has set
maxOccurs="2". Thus, in instance documents the <Book> content will always end with <Title>,
<Author>, <Date>, <ISBN>, and <Publisher>. Prior to that, up to two well-formed XML elements
from another namespace may occur.
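
For illustration, an instance that uses both extension slots might look like the sketch below. The ext prefix, its namespace URI, and the Rating and Edition elements are invented placeholders. Since the wildcard does not set processContents, the default (strict) applies, so we assume the validator can locate declarations for these two elements.

```xml
<Book xmlns:ext="http://www.example.org/extensions">
  <!-- Up to two elements from a namespace other than the targetNamespace -->
  <ext:Rating>5</ext:Rating>
  <ext:Edition>2</ext:Edition>
  <!-- The fixed part of the content model follows -->
  <Title>The First and Last Freedom</Title>
  <Author>J. Krishnamurti</Author>
  <Date>1954</Date>
  <ISBN>0-06-064831-7</ISBN>
  <Publisher>Harper &amp; Row</Publisher>
</Book>
```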

In summary:
• We can put the <any> element specifically “where” we desire extensibility.
• If we desire extensibility at multiple locations, we can insert multiple <any> elements.
• With maxOccurs we can specify “how much” extensibility we will allow.

Non-Determinism and the <any> element
In the above BookType definition we used an <any> element at the beginning of the content
model. We specified that the <any> element must come from another namespace (i.e., it must not
be an element from the targetNamespace). If, instead, we had specified namespace="##any" then
we would have gotten a “non-deterministic content model” error when validating an instance
document. Let’s see why.

A non-deterministic content model is one where, upon encountering an element in an instance
document, it is ambiguous which path was taken in the schema document. For example, suppose
that we were to declare BookType using ##any, as follows:


      <xsd:complexType name="BookType">
           <xsd:sequence>
                <xsd:any namespace="##any" minOccurs="0" maxOccurs="2"/>
                <xsd:element name="Title" type="xsd:string"/>
                <xsd:element name="Author" type="xsd:string"/>
                <xsd:element name="Date" type="xsd:string"/>
                <xsd:element name="ISBN" type="xsd:string"/>
                <xsd:element name="Publisher" type="xsd:string"/>
           </xsd:sequence>
      </xsd:complexType>


And suppose that we have this (snippet of an) instance document:



            <Book>
                  <Title>The First and Last Freedom</Title>
                  <Author>J. Krishnamurti</Author>
                  <Date>1954</Date>
                  <ISBN>0-06-064831-7</ISBN>
                  <Publisher>Harper &amp; Row</Publisher>
            </Book>

Let’s see what happens when a schema validator gets to the <Title> element in this instance
document. The schema validator must determine which declaration in the schema document this
Title element corresponds to. Do you see the ambiguity? There is no way to know,
without doing a look-ahead, whether the Title element comes from the <any> element, or comes
from the <xsd:element name="Title" .../> declaration. This is a non-deterministic content model:
if your schema has a content model which would require a schema validator to “look-ahead” then
your schema is non-deterministic. Non-deterministic schemas are not allowed.

The solution in our example is to declare that the <any> element must come from another
namespace, as was shown earlier. That works fine in this example where all the BookCatalogue
elements come from the targetNamespace, and the <any> element comes from a different
namespace. Suppose, however, that the BookCatalogue schema imported element declarations
from other namespaces. For example:


            <?xml version="1.0"?>
            <xsd:schema ...
                   xmlns:bk="http://www.books.com" …>
              <xsd:import namespace="http://www.books.com"
                     schemaLocation="Books.xsd"/>
              <xsd:complexType name="Book">
                     <xsd:sequence>
                            <xsd:any namespace="##other" minOccurs="0"/>
                            <xsd:element ref="bk:Title"/>
                            <xsd:element name="Author" type="xsd:string"/>
                            <xsd:element name="Date" type="xsd:string"/>
                            <xsd:element name="ISBN" type="xsd:string"/>
                            <xsd:element name="Publisher" type="xsd:string"/>
                     </xsd:sequence>
              </xsd:complexType>
            </xsd:schema>




Now consider this instance document:

             <Book>
                 <bk:Title>The First and Last Freedom</bk:Title>
                 <Author>J. Krishnamurti</Author>
                 <Date>1954</Date>
                 <ISBN>0-06-064831-7</ISBN>
                 <Publisher>Harper &amp; Row</Publisher>
             </Book>

When a schema validator encounters bk:Title it will try to validate it against the appropriate
element declaration in the schema. But is this the Title referred to by the schema (i.e., in the
http://www.books.com namespace), or does this Title come from using the <any> element? It is
ambiguous, and consequently non-deterministic. Thus, this schema is also illegal.

As you can see, prohibiting non-deterministic content models makes the use of the <any>
element quite restricted. So, what do you do when you want to enable extensibility at arbitrary
locations? Answer: put in an optional <other> element and let its content be <any>. Here’s how
to do it:

                     <xsd:complexType name="BookType">
                          <xsd:sequence>
                               <xsd:element name="other" minOccurs="0">
                                   <xsd:complexType>
                                       <xsd:sequence>
                                           <xsd:any namespace="##any" maxOccurs="2"/>
                                       </xsd:sequence>
                                   </xsd:complexType>
                               </xsd:element>
                               <xsd:element name="Title" type="xsd:string"/>
                               <xsd:element name="Author" type="xsd:string"/>
                               <xsd:element name="Date" type="xsd:string"/>
                               <xsd:element name="ISBN" type="xsd:string"/>
                               <xsd:element name="Publisher" type="xsd:string"/>
                          </xsd:sequence>
                     </xsd:complexType>


Now, instance document authors have an explicit container element (<other>) in which to put
additional elements. This isn’t the most ideal solution, but it’s the best that we can do given the
rule that schemas may not have non-deterministic content models. Write to the XML Schema
working group and tell them that you want the prohibition of non-deterministic content models
revoked!
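
A sketch of how an instance document author might use the <other> container is shown below. It reuses the Reviewer element from the earlier example; the rev namespace URI is a placeholder, since the original does not give one. Because every extension sits inside <other>, the validator never has to guess whether an element belongs to the wildcard or to the fixed part of the content model.

```xml
<Book>
  <!-- Extensions go inside the explicit container element -->
  <other>
    <rev:Reviewer xmlns:rev="http://www.example.org/reviewer">
      <rev:Name>
        <rev:First>Roger</rev:First>
        <rev:Last>Costello</rev:Last>
      </rev:Name>
    </rev:Reviewer>
  </other>
  <Title>The First and Last Freedom</Title>
  <Author>J. Krishnamurti</Author>
  <Date>1954</Date>
  <ISBN>0-06-064831-7</ISBN>
  <Publisher>Harper &amp; Row</Publisher>
</Book>
```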

Best Practice
The <any> element is an enabling technology. It turns instance documents from static/rigid
structures into rich, dynamic, flexible data objects. It shifts focus from the schema designer to the
instance document author in terms of defining what data makes sense. It empowers instance
document authors with the ability to decide what data makes sense to them.

As a schema designer you need to recognize your limitations. You have no way of anticipating all
the varieties of data that an instance document author might need in creating an instance
document. Be smart enough to know that you’re not smart enough to anticipate all possible
needs! Design your schemas with flexibility built-in.

Definition: an open content schema is one that allows instance documents to contain additional
elements beyond what is declared in the schema. As we have seen, this may be achieved by using
the <any> (and <anyAttribute>) element in the schema.
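
As one possible shape for such an open content type (a sketch, not the authors' schema), the fragment below adds both wildcards to BookType. The choices of namespace="##other", processContents="lax", and maxOccurs="unbounded" are assumptions made for illustration; a project may well prefer stricter settings. Note that <anyAttribute> is placed after the content model, not inside the sequence.

```xml
<xsd:complexType name="BookType">
  <xsd:sequence>
    <xsd:element name="Title" type="xsd:string"/>
    <xsd:element name="Author" type="xsd:string"/>
    <xsd:element name="Date" type="xsd:string"/>
    <xsd:element name="ISBN" type="xsd:string"/>
    <xsd:element name="Publisher" type="xsd:string"/>
    <!-- Open content: any number of extra elements from other namespaces -->
    <xsd:any namespace="##other" processContents="lax"
             minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
  <!-- Open content: extra attributes from other namespaces -->
  <xsd:anyAttribute namespace="##other" processContents="lax"/>
</xsd:complexType>
```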

Sprinkling <any> and <anyAttribute> elements liberally throughout your schema will yield
benefits in terms of how evolvable your schema is:

Enabling Schema Evolution using Open Content Schemas
In today’s rapidly changing market static schemas will be less commonplace, as the market
pushes schemas to quickly support new capabilities. For example, consider the cellphone
industry. Clearly, this is a rapidly evolving market. Any schema that the cellphone community
creates will soon become obsolete as hardware/software changes extend the cellphone
capabilities. For the cellphone community rapid evolution of a cellphone schema is not just a
nicety, the market demands it!

Suppose that the cellphone community gets together and creates a schema, cellphone.xsd.
Imagine that every week NOKIA sends out to the various vendors an instance document
(conforming to cellphone.xsd), detailing its current product set. Now suppose that a few months
after cellphone.xsd is agreed upon NOKIA makes some breakthroughs in their cellphones - they
create new memory, call, and display features, none of which are supported by cellphone.xsd. To
gain a market advantage NOKIA will want to get information about these new capabilities to its
vendors ASAP. Further, they will have little motivation to wait for the next meeting of the
cellphone community to consider upgrades to cellphone.xsd. They need results NOW. How does
open content help? That is described next.

Suppose that the cellphone schema is declared “open”. Immediately NOKIA can extend its
instance documents to incorporate data about the new features. How does this change impact the
vendor applications that receive the instance documents? The answer is - not at all. In the worst
case, the vendor’s application will simply skip over the new elements. More likely, however, the
vendors are showing the cellphone features in a list box and these new features will be
automatically captured with the other features. Let’s stop and think about what has just been
described. Without modifying the cellphone schema and without touching the vendor’s
applications, information about the new NOKIA features has been instantly disseminated to the
marketplace! Open content in the cellphone schema is the enabler for this rapid dissemination.

Clearly some types of instance document extensions may require modification to the vendor’s
applications. Recognize, however, that the vendors are free to upgrade their applications in their
own time. The applications do not need to be upgraded before changes can be introduced into
instance documents. At the very worst, the vendor’s applications will simply skip over the
extensions. And, of course, those vendors do not need to upgrade in lock-step.

To wrap up this example suppose that several months later the cellphone community reconvenes
to discuss enhancements to the schema. The new features that NOKIA first introduced into the
marketplace are then officially added into the schema. Thus completes the cycle. Changes to the
instance documents have driven the evolution of the schema.



                Zero, One, or Many Namespaces?
Table of Contents
               Issue
               Introduction
               Example
                 Heterogeneous Namespace Design
                 Homogeneous Namespace Design
                 Chameleon Namespace Design
               Impact of Design Approach on Instance Documents
               <redefine> - only Applicable to Homogeneous and Chameleon Namespace Designs
               Default Namespace and the Chameleon Namespace Design
               Avoiding Name Collisions with Chameleon Components
               Creating Tools for Chameleon Components
               Best Practice

Issue:
In a project where multiple schemas are created, should we give each schema a different
targetNamespace, or should we give all the schemas the same targetNamespace, or should some
of the schemas have no targetNamespace?



               Managing Multiple Schemas - Same or
                  Different targetNamespaces?


                                                      ...


           Schema-1.xsd           Schema-2.xsd                Schema-n.xsd


                          … or no targetNamespace?


Introduction
In a typical project many schemas will be created. The schema designer is then confronted with
this issue: “shall I define one targetNamespace for all the schemas, or shall I create a different
targetNamespace for each schema, or shall I have some schemas with no targetNamespace?”
What are the tradeoffs? What guidance would you give someone starting on a project that will
create multiple schemas?

Here are the three design approaches for dealing with this issue:

            [1] Heterogeneous Namespace Design:
                give each schema a different targetNamespace
            [2] Homogeneous Namespace Design:
                give all schemas the same targetNamespace
            [3] Chameleon Namespace Design:
                give the “main” schema a targetNamespace and give no
                targetNamespace to the “supporting” schemas (the no-namespace
                supporting schemas will take on the targetNamespace of the main
                schema, just like a Chameleon)

To describe and judge the merits of the three design approaches it will be useful to take an
example and see each approach “in action”.

Example: XML Data Model of a Company
Imagine a project which involves creating a model of a company using XML Schemas. One very
simple model is to divide the schema functionality along these lines:

            Company schema
              Person schema
              Product schema

“A company is comprised of people and products.”

Here are the company, person, and product schemas using the three design approaches.




[1] Heterogeneous Namespace Design
This design approach says to give each schema a different targetNamespace, e.g.,



     <xsd:schema …                                      <xsd:schema …
         targetNamespace="A">                               targetNamespace="B">

           A.xsd                                                    B.xsd

                 <xsd:schema …
                          targetNamespace="C">
                     <xsd:import namespace="A"
                                 schemaLocation="A.xsd"/>
                     <xsd:import namespace="B"
                                 schemaLocation="B.xsd"/>
                     …
                 </xsd:schema>
                                     C.xsd


Below are the three schemas designed using this design approach. Observe that each schema has
a different targetNamespace.

Product.xsd


   <?xml version="1.0"?>
   <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.product.org"
                xmlns="http://www.product.org"
                elementFormDefault="qualified">
     <xsd:complexType name="ProductType">
       <xsd:sequence>
          <xsd:element name="Type" type="xsd:string" minOccurs="1" maxOccurs="1"/>
       </xsd:sequence>
     </xsd:complexType>
   </xsd:schema>




Person.xsd


             <?xml version="1.0"?>
             <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                           targetNamespace="http://www.person.org"
                          xmlns="http://www.person.org"
                          elementFormDefault="qualified">
               <xsd:complexType name="PersonType">
                 <xsd:sequence>
                    <xsd:element name="Name" type="xsd:string"/>
                    <xsd:element name="SSN" type="xsd:string"/>
                 </xsd:sequence>
               </xsd:complexType>
             </xsd:schema>


Company.xsd

                <?xml version="1.0"?>
                <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                              targetNamespace="http://www.company.org"
                              xmlns="http://www.company.org"
                             elementFormDefault="qualified"
                             xmlns:per="http://www.person.org"
                             xmlns:pro="http://www.product.org">
                  <xsd:import namespace="http://www.person.org"
                         schemaLocation="Person.xsd"/>
                  <xsd:import namespace="http://www.product.org"
                         schemaLocation="Product.xsd"/>
                  <xsd:element name="Company">
                     <xsd:complexType>
                       <xsd:sequence>
                         <xsd:element name="Person" type="per:PersonType" maxOccurs="unbounded"/>
                         <xsd:element name="Product" type="pro:ProductType" maxOccurs="unbounded"/>
                       </xsd:sequence>
                     </xsd:complexType>
                  </xsd:element>
                </xsd:schema>




Note the three namespaces that were created by the schemas:

              http://www.product.org
              http://www.person.org
              http://www.company.org




[2] Homogeneous Namespace Design
This design approach says to create a single, umbrella targetNamespace for all the schemas, e.g.,



  <xsd:schema …                                      <xsd:schema …
      targetNamespace="Library">                         targetNamespace="Library">
   LibraryBookCatalogue.xsd                                LibraryEmployees.xsd




     <xsd:schema …
              targetNamespace="Library">
         <xsd:include schemaLocation="LibraryBookCatalogue.xsd"/>
         <xsd:include schemaLocation="LibraryEmployees.xsd"/>
         …
     </xsd:schema>
                                       Library.xsd


Below are the three schemas designed using this approach. Observe that all schemas have the
same targetNamespace.

Product.xsd


       <?xml version="1.0"?>
       <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                    targetNamespace="http://www.company.org"
                    xmlns="http://www.company.org"
                    elementFormDefault="qualified">
         <xsd:complexType name="ProductType">
            <xsd:sequence>
              <xsd:element name="Type" type="xsd:string" minOccurs="1" maxOccurs="1"/>
            </xsd:sequence>
         </xsd:complexType>
       </xsd:schema>




Person.xsd


              <?xml version="1.0"?>
              <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                           targetNamespace="http://www.company.org"
                           xmlns="http://www.company.org"
                           elementFormDefault="qualified">
                <xsd:complexType name="PersonType">
                   <xsd:sequence>
                     <xsd:element name="Name" type="xsd:string"/>
                     <xsd:element name="SSN" type="xsd:string"/>
                   </xsd:sequence>
                </xsd:complexType>
              </xsd:schema>


Company.xsd


      <?xml version="1.0"?>
      <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                   targetNamespace="http://www.company.org"
                   xmlns="http://www.company.org"
                   elementFormDefault="qualified">
        <xsd:include schemaLocation="Person.xsd"/>
        <xsd:include schemaLocation="Product.xsd"/>
        <xsd:element name="Company">
           <xsd:complexType>
             <xsd:sequence>
                <xsd:element name="Person" type="PersonType" maxOccurs="unbounded"/>
                <xsd:element name="Product" type="ProductType" maxOccurs="unbounded"/>
             </xsd:sequence>
           </xsd:complexType>
        </xsd:element>
      </xsd:schema>


Note that all three schemas have the same targetNamespace:

              http://www.company.org

Also note the mechanism used for accessing components in other schemas which have the same
targetNamespace: <include>. When accessing components in a schema with a different
namespace the <import> element is used, as we saw above in the Heterogeneous Design.
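
The include/import rule can be checked mechanically. Below is a minimal Python sketch, not a schema processor: it assumes the schema documents are held in memory as strings, and the names check_references and docs are ours, chosen for illustration. It also accepts an included schema that has no targetNamespace at all (the Chameleon case discussed next).

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

def check_references(docs, main):
    """Return a list of problems with <include>/<import> usage in docs[main].

    docs maps a file name to schema text (a hypothetical in-memory store).
    """
    root = ET.fromstring(docs[main])
    own_ns = root.get("targetNamespace")
    problems = []
    for child in root:
        if child.tag not in (XSD + "include", XSD + "import"):
            continue
        location = child.get("schemaLocation")
        if location not in docs:
            continue  # nothing to compare against
        other_ns = ET.fromstring(docs[location]).get("targetNamespace")
        if child.tag == XSD + "include" and other_ns not in (None, own_ns):
            problems.append(location + ": <include> needs the same or no targetNamespace")
        if child.tag == XSD + "import" and other_ns == own_ns:
            problems.append(location + ": <import> needs a different targetNamespace")
    return problems

docs = {
    "Person.xsd": '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
                  'targetNamespace="http://www.company.org"/>',
    "Company.xsd": '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
                   'targetNamespace="http://www.company.org">'
                   '<xsd:include schemaLocation="Person.xsd"/></xsd:schema>',
}
print(check_references(docs, "Company.xsd"))
```

For the Homogeneous Design the list of problems is empty; giving Person.xsd a different targetNamespace while keeping the <include> would produce one entry.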




[3] Chameleon Namespace Design
This design approach says to give the “main” schema a targetNamespace, and the “supporting”
schemas have no targetNamespace, e.g.,



          <xsd:schema …>                                      <xsd:schema …>

            Q.xsd                                                   R.xsd




             <xsd:schema …
                      targetNamespace="Z">
                 <xsd:include schemaLocation="Q.xsd"/>
                 <xsd:include schemaLocation="R.xsd"/>
                 …
             </xsd:schema>
                                       Z.xsd


In our example, the company schema is the main schema. The person and product schemas are
supporting schemas. Below are the three schemas using this design approach:

Product.xsd (no targetNamespace)

     <?xml version="1.0"?>
     <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                  elementFormDefault="qualified">
       <xsd:complexType name="ProductType">
         <xsd:sequence>
            <xsd:element name="Type" type="xsd:string" minOccurs="1" maxOccurs="1"/>
         </xsd:sequence>
       </xsd:complexType>
     </xsd:schema>




Person.xsd (no targetNamespace)


     <?xml version="1.0"?>
     <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                  elementFormDefault="qualified">
       <xsd:complexType name="PersonType">
         <xsd:sequence>
            <xsd:element name="Name" type="xsd:string" minOccurs="1" maxOccurs="1"/>
            <xsd:element name="SSN" type="xsd:string" minOccurs="1" maxOccurs="1"/>
         </xsd:sequence>
       </xsd:complexType>
     </xsd:schema>


Company.xsd (main schema, uses the no-namespace-schemas)


       <?xml version="1.0"?>
       <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                    targetNamespace="http://www.company.org"
                    xmlns="http://www.company.org"
                    elementFormDefault="qualified">
         <xsd:include schemaLocation="Person.xsd"/>
         <xsd:include schemaLocation="Product.xsd"/>
         <xsd:element name="Company">
            <xsd:complexType>
              <xsd:sequence>
                 <xsd:element name="Person" type="PersonType" maxOccurs="unbounded"/>
                 <xsd:element name="Product" type="ProductType" maxOccurs="unbounded"/>
              </xsd:sequence>
            </xsd:complexType>
         </xsd:element>
       </xsd:schema>


There are two things to note about this design approach:

First, as shown above, a schema is able to access components in schemas that have no
targetNamespace, using <include>. In our example, the company schema uses the components in
Product.xsd and Person.xsd (and they have no targetNamespace).

Second, note the chameleon-like characteristics of schemas with no targetNamespace:
– The components in the schemas with no targetNamespace get namespace-coerced. That is, the
  components “take on” the targetNamespace of the schema that is doing the <include>.

For example, ProductType in Product.xsd gets implicitly coerced into the company
targetNamespace.

“Chameleon effect” is a term coined by Henry Thompson to describe the ability of
components in a schema with no targetNamespace to take on the namespace of other schemas.
This is powerful!
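
To make the coercion concrete, here is a small Python sketch. It only computes the expanded names that the global components of an included schema end up with; it does not validate anything. The function name coerced_names is ours, and the Product.xsd text is a trimmed copy of the listing above.

```python
import xml.etree.ElementTree as ET

PRODUCT_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">
  <xsd:complexType name="ProductType">
    <xsd:sequence>
      <xsd:element name="Type" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""

def coerced_names(included_text, including_namespace):
    """Expanded names of the global components after an <include>."""
    root = ET.fromstring(included_text)
    # A schema with its own targetNamespace keeps it; only a
    # no-namespace schema adopts the including schema's namespace.
    namespace = root.get("targetNamespace") or including_namespace
    return ["{%s}%s" % (namespace, child.get("name"))
            for child in root if child.get("name")]

print(coerced_names(PRODUCT_XSD, "http://www.company.org"))
print(coerced_names(PRODUCT_XSD, "http://www.example.org"))
```

The same ProductType shows up under whichever namespace the <include>ing schema declares, which is exactly the chameleon behavior.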

Impact of Design Approach on Instance Documents
Above we have shown how the schemas would be designed using the three design approaches.
Let’s turn now to the instance document. Does an instance document differ depending on the
design approach? All of the above schemas have been designed to expose the namespaces in
instance documents (as directed by: elementFormDefault=“qualified”). If they had instead all
used elementFormDefault=“unqualified” then instance documents would all have this form:


                   <?xml version="1.0"?>
                   <c:Company xmlns:c="http://www.company.org"
                               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                               xsi:schemaLocation=
                                       "http://www.company.org
                                        Company.xsd">
                        <Person>
                             <Name>John Doe</Name>
                             <SSN>123-45-6789</SSN>
                        </Person>
                        <Product>
                             <Type>Widget</Type>
                        </Product>
                   </c:Company>


It is when the schemas expose their namespaces in instance documents that differences appear. In
the above schemas, they all specified elementFormDefault=“qualified”, thus exposing their
namespaces in instance documents. Let’s see what the instance documents look like for each
design approach:

[1] Company.xml (conforming to the multiple targetNamespaces version)


                   <?xml version="1.0"?>
                   <Company xmlns="http://www.company.org"
                              xmlns:per="http://www.person.org"
                              xmlns:prod="http://www.product.org"
                              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                              xsi:schemaLocation=
                                       "http://www.company.org
                                        Company.xsd">
                       <Person>
                            <per:Name>John Doe</per:Name>
                            <per:SSN>123-45-6789</per:SSN>
                       </Person>
                       <Product>
                            <prod:Type>Widget</prod:Type>
                       </Product>
                   </Company>



Note that:
– there needs to be a namespace declaration for each namespace
– the elements must all be uniquely qualified (explicitly or with a default namespace)

[2] Company.xml (conforming to the single, umbrella targetNamespace version)


                  <?xml version="1.0"?>
                  <Company xmlns="http://www.company.org"
                             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                             xsi:schemaLocation=
                                     "http://www.company.org
                                      Company.xsd">
                      <Person>
                           <Name>John Doe</Name>
                           <SSN>123-45-6789</SSN>
                      </Person>
                      <Product>
                           <Type>Widget</Type>
                      </Product>
                  </Company>



Since all the schemas are in the same namespace the instance document is able to take advantage
of that by using a default namespace.

[3] Company.xml (conforming to the main targetNamespace with supporting no-
targetNamespace version)


                  <?xml version="1.0"?>
                  <Company xmlns="http://www.company.org"
                             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                             xsi:schemaLocation=
                                      "http://www.company.org
                                       Company.xsd">
                      <Person>
                           <Name>John Doe</Name>
                           <SSN>123-45-6789</SSN>
                      </Person>
                      <Product>
                           <Type>Widget</Type>
                      </Product>
                  </Company>


Both of the schemas that have no targetNamespace take on the company targetNamespace
(à la the Chameleon effect). Thus, all components are in the same targetNamespace and the
instance document takes advantage of this by declaring a default namespace.
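
The difference between the instance documents is easy to see from an application's point of view. The sketch below uses Python's standard ElementTree parser, which reports every element with its namespace attached. The two documents are copies of the heterogeneous and homogeneous instances above, with the xsi:schemaLocation attribute left out for brevity; the helper names are ours.

```python
import xml.etree.ElementTree as ET

HETEROGENEOUS = """<Company xmlns="http://www.company.org"
         xmlns:per="http://www.person.org"
         xmlns:prod="http://www.product.org">
  <Person>
    <per:Name>John Doe</per:Name>
    <per:SSN>123-45-6789</per:SSN>
  </Person>
  <Product>
    <prod:Type>Widget</prod:Type>
  </Product>
</Company>"""

HOMOGENEOUS = """<Company xmlns="http://www.company.org">
  <Person>
    <Name>John Doe</Name>
    <SSN>123-45-6789</SSN>
  </Person>
  <Product>
    <Type>Widget</Type>
  </Product>
</Company>"""

def namespace_of(tag):
    """Namespace part of an ElementTree tag such as '{uri}local'."""
    return tag[1:].split("}")[0] if tag.startswith("{") else ""

def namespaces_used(text):
    """Set of namespace URIs carried by the elements of a document."""
    return {namespace_of(el.tag) for el in ET.fromstring(text).iter()}

print(sorted(namespaces_used(HETEROGENEOUS)))
print(sorted(namespaces_used(HOMOGENEOUS)))
```

An application processing the heterogeneous document has to know three namespaces; for the homogeneous and Chameleon documents it needs only one.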




<redefine> - only Applicable to Homogeneous and Chameleon Namespace Designs
The <redefine> element is used to enable access to components in another schema, while
simultaneously giving the capability to modify zero or more of the components. Thus, the
<redefine> element has a dual functionality:
– it does an implicit <include>. Thus it enables access to all the components in the referenced
  schema
– it enables you to redefine zero or more of the components in the referenced schema, i.e.,
  extend or restrict components

Example. Consider again the Company.xsd schema above. Suppose that it wishes to use
ProductType in Product.xsd. However, it would like to extend ProductType to include a product
ID. Here’s how to do it using redefine:



            <?xml version="1.0"?>
            <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                         targetNamespace="http://www.company.org"
                         xmlns="http://www.company.org"
                         elementFormDefault="qualified">
              <xsd:include schemaLocation="Person.xsd"/>
              <xsd:redefine schemaLocation="Product.xsd">
                <xsd:complexType name="ProductType">
                   <xsd:complexContent>
                     <xsd:extension base="ProductType">
                        <xsd:sequence>
                          <xsd:element name="ID" type="xsd:ID"/>
                        </xsd:sequence>
                     </xsd:extension>
                   </xsd:complexContent>
                </xsd:complexType>
              </xsd:redefine>
              <xsd:element name="Company">
                <xsd:complexType>
                   <xsd:sequence>
                     <xsd:element name="Person" type="PersonType" maxOccurs="unbounded"/>
                     <xsd:element name="Product" type="ProductType" maxOccurs="unbounded"/>
                   </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
            </xsd:schema>




Now the <Product> element in instance documents will contain both <Type> and <ID>, e.g.,




                  <?xml version="1.0"?>
                  <Company xmlns="http://www.company.org"
                             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                             xsi:schemaLocation=
                                    "http://www.company.org
                                     Company.xsd">
                      <Person>
                           <Name>John Doe</Name>
                           <SSN>123-45-6789</SSN>
                      </Person>
                      <Product>
                           <Type>Widget</Type>
                           <ID>P1001-1-00</ID>
                      </Product>
                  </Company>


The <redefine> element is very powerful. However, it can only be used with schemas with the
same targetNamespace or with no targetNamespace. Thus, it only applies to the Homogeneous
Namespace Design and the Chameleon Namespace Design.
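
To see what the extension inside <redefine> amounts to, here is a minimal Python sketch. It is not a schema processor: it simply lists the child element names of the original and the redefined ProductType, with both definitions pasted in as strings. The helper name child_names is ours.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"
NS_DECL = 'xmlns:xsd="http://www.w3.org/2001/XMLSchema"'

BASE = """<xsd:complexType %s name="ProductType">
  <xsd:sequence>
    <xsd:element name="Type" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>""" % NS_DECL

REDEFINED = """<xsd:complexType %s name="ProductType">
  <xsd:complexContent>
    <xsd:extension base="ProductType">
      <xsd:sequence>
        <xsd:element name="ID" type="xsd:ID"/>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>""" % NS_DECL

def child_names(type_text):
    """Names of the element declarations found anywhere inside a type definition."""
    return [el.get("name") for el in ET.fromstring(type_text).iter(XSD + "element")]

# Extension appends the new content after the content of the base type.
effective = child_names(BASE) + child_names(REDEFINED)
print(effective)
```

The resulting order, Type followed by ID, matches the order of the children of <Product> in the instance document above.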

Name collisions
When a schema uses Chameleon components those components become part of the including
schema’s targetNamespace, just as though the schema author had typed the element declarations
and type definitions inline. If the schema <include>s multiple no-namespace schemas then there
will be a chance of name collisions. In fact, the schema may end up not being able to use some of
the no-namespace schemas because their use results in name collisions with other Chameleon
components. To demonstrate the name collision problem, consider this example:

Suppose that there are two schemas with no targetNamespace:

            1.xsd
               A
               B
            2.xsd
             A
             C

Schema 1 creates no-namespace elements A and B. Schema 2 creates no-namespace elements A
and C. Now if schema 3 <include>s these two no-namespace schemas there will be a name
collision:

            3.xsd
            targetNamespace="http://www.example.org"
            <include schemaLocation="1.xsd"/>
            <include schemaLocation="2.xsd"/>




This schema has a name collision - A is defined twice. [Note: within a single content model, two
local element declarations may share a name provided they have the same type. Global components,
however, must be unique within their symbol space, so two different global declarations of A
collide.]

Namespaces are the standard way of avoiding such collisions. Above, if instead the components
in 1.xsd and 2.xsd resided in different namespaces then 3.xsd could have <import>ed them and
there would be no name collision. [Recall that two elements/types can have the same name if the
elements/types are in different namespaces.]
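
A collision of this kind can be spotted before 3.xsd is ever written. The following Python sketch collects the global element names of a set of no-namespace schemas and reports any name declared in more than one file. The helper names schema and collisions are ours, and the xsd:string type given to A, B, and C is an assumption made only so that the schemas are complete.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XSD_NS = "http://www.w3.org/2001/XMLSchema"

def schema(*element_names):
    """Build a tiny no-namespace schema declaring the given global elements."""
    body = "".join('<xsd:element name="%s" type="xsd:string"/>' % n
                   for n in element_names)
    return '<xsd:schema xmlns:xsd="%s">%s</xsd:schema>' % (XSD_NS, body)

def collisions(docs):
    """Map each global element name declared in more than one schema to its files."""
    seen = defaultdict(list)
    for filename, text in docs.items():
        for el in ET.fromstring(text).findall("{%s}element" % XSD_NS):
            seen[el.get("name")].append(filename)
    return {name: files for name, files in seen.items() if len(files) > 1}

docs = {"1.xsd": schema("A", "B"), "2.xsd": schema("A", "C")}
print(collisions(docs))
```

Only A is reported, since B and C each appear in a single schema.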

How do we address the name collision problem that the Chameleon design presents? That’s next.

Resolving Name Collisions using Proxy Schemas
There is a very simple solution to the name collision problem: for each no-namespace
schema create a companion namespaced-schema (a “proxy schema”) that <include>s the no-
namespace schema. Then, the main schema <import>s the proxy schemas.


                  <xsd:schema …>                                 <xsd:schema …>

                       Q.xsd                                          R.xsd


       <xsd:schema …                                    <xsd:schema …
                targetNamespace="Z1">                            targetNamespace="Z2">
           <xsd:include schemaLocation="Q.xsd"/>            <xsd:include schemaLocation="R.xsd"/>

               Q-Proxy.xsd                                                R-Proxy.xsd



                <xsd:schema …         targetNamespace="Z">
                    <xsd:import namespace="Z1" schemaLocation="Q-Proxy.xsd"/>
                    <xsd:import namespace="Z2" schemaLocation="R-Proxy.xsd"/>
                    …
                </xsd:schema>

                                               Z.xsd



With this approach we avoid name collisions. This design approach has the added advantage that
it also enables the proxy schemas to customize the Chameleon components using <redefine>.

Thus, this approach is a two-step process:
               1. Create the Chameleon schemas
               2. Create a proxy schema for each Chameleon schema

The “main” schema <import>s the proxy schemas.
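
Since a proxy schema is nothing more than a targetNamespace plus an <include>, it can be generated. Here is a minimal Python sketch; the function name proxy_schema is ours, and Z1 and Z2 are the placeholder namespaces from the figure above, not real URIs.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

TEMPLATE = (
    '<?xml version="1.0"?>\n'
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"\n'
    '            targetNamespace={ns}\n'
    '            xmlns={ns}\n'
    '            elementFormDefault="qualified">\n'
    '  <xsd:include schemaLocation={loc}/>\n'
    '</xsd:schema>\n'
)

def proxy_schema(chameleon_file, namespace):
    """Return the text of a proxy schema that <include>s chameleon_file."""
    # quoteattr adds the surrounding quotes and escapes special characters.
    return TEMPLATE.format(ns=quoteattr(namespace), loc=quoteattr(chameleon_file))

for name, ns in (("Q.xsd", "Z1"), ("R.xsd", "Z2")):
    root = ET.fromstring(proxy_schema(name, ns))  # also checks well-formedness
    print(root.get("targetNamespace"), root[0].get("schemaLocation"))
```

The main schema would then <import> the generated files under the namespaces Z1 and Z2.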

The advantage of this two-step approach is that it enables applications to decide on a domain
(namespace) for the components that they are reusing. Furthermore, applications are able to
refine/customize the Chameleon components. This approach requires an extra step (i.e., creating
proxy schemas) but in return it provides a lot of flexibility.

Contrast the above two-step process with the below one-step process where the components are
assigned to a namespace from the very beginning:

            1-fixed.xsd
            targetNamespace="http://www.1-fixed.org"
               A
               B
            2-fixed.xsd
            targetNamespace="http://www.2-fixed.org"
             A
             C
            main.xsd
            targetNamespace="http://www.main.org"
            <xsd:import namespace="http://www.1-fixed.org"
                  schemaLocation="1-fixed.xsd"/>
            <xsd:import namespace="http://www.2-fixed.org"
                  schemaLocation="2-fixed.xsd"/>

This achieves the same result as the above two-step version. In this example, the components are
not Chameleon. Instead, A, B, and C were hardcoded with a namespace from the very beginning
of their life. The downside of this approach is that if main.xsd wants to <redefine> any of the
elements it cannot. Also, applications are forced to use a domain (namespace) defined by
someone else. These components are in a rigid, static, fixed namespace.

Creating Tools for Chameleon Components
We have seen repeatedly how Chameleon components are able to blend in with the schemas that
use them. That is, they adopt the namespace of the schema that <include>s them.

                                           <xsd:schema …>




             <xsd:schema …                               <xsd:schema …
                      targetNamespace="Z1">                    targetNamespace="Z2">
                 <xsd:include schemaLocation="Q.xsd"/>    <xsd:include schemaLocation="Q.xsd"/>




                 Chameleon components take-on the namespace of the <include>ing schema




How do you write tools for components that can assume so many different faces (namespaces)?




                                     <xsd:schema …>                Tool
                                                                     ?


      <xsd:schema …                                <xsd:schema …
               targetNamespace="Z1">                     targetNamespace="Z2">
          <xsd:include schemaLocation="Q.xsd"/>     <xsd:include schemaLocation="Q.xsd"/>




               How does a tool identify components that can assume many faces?
               Certainly not by namespaces.


Consider this no-namespace schema:

            1.xsd
              A
              B

Suppose that we wish to create a tool, T, which must process the two Chameleon components A
and B, regardless of what namespace they reside in. The tool must be able to handle the
following situation: imagine a schema, main.xsd, which <include>s 1.xsd. In addition, suppose
that main.xsd has its own element called A (in a different symbol space, so there’s no name
collision). For example:

            main.xsd
            targetNamespace="http://www.example.org"
            <include schemaLocation="1.xsd"/>
            <element name="stuff">
              <complexType>
               <sequence>
                 <element name="A" type="xxx"/>
                 ...
               </sequence>
              </complexType>
            </element>

How would the tool T be able to distinguish between the Chameleon component A and the local
A in an instance document?




Chameleon Component Identification
One simple solution is that when you create Chameleon components you assign each a globally
unique id (a GUID). The XML Schema spec allows you to add an attribute, id, to all element,
attribute, complexType, and simpleType components. Its type is xsd:ID, so the value must be an
NCName (no colons or slashes) and must be unique within the schema document.

             <xsd:element name="Lat_Lon"
                          id="www.geospacial.org"
                          …>
             </xsd:element>

            Each component (element, complexType, simpleType, attribute)
            in a schema can have an associated id attribute. This can be used
            to uniquely identify each Chameleon component, regardless of
            its namespace.

Note that the id attribute is purely local to the schema. There is no representation in the instance
documents. This id attribute could be used by a tool to “locate” a Chameleon component,
regardless of what “face” (namespace) it currently wears. That is, a tool that also has access to
the schema (for example, by opening the schema documents using DOM) can read the id value of
the declaration behind each component that appears in an instance document.


                                       <xsd:schema …>
                                       id="www.geospacial.org"
                                                                            Tool



    <xsd:schema …                                        <xsd:schema …
             targetNamespace="Z1">                             targetNamespace="Z2">
        <xsd:include schemaLocation="Q.xsd"/>             <xsd:include schemaLocation="Q.xsd"/>


             id="www.geospacial.org"                              id="www.geospacial.org"




      A tool can locate the Chameleon component by using the id attribute.
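
Here is a minimal Python sketch of such a lookup. It reads the schema document rather than an instance document, since that is where the id lives. The function name find_by_id is ours, and the xsd:string type on Lat_Lon is an assumption made only to complete the declaration.

```python
import xml.etree.ElementTree as ET

Q_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Lat_Lon" id="www.geospacial.org" type="xsd:string"/>
</xsd:schema>"""

def find_by_id(schema_text, component_id):
    """Return (kind, name) of the schema component carrying the given id, or None."""
    for node in ET.fromstring(schema_text).iter():
        if node.get("id") == component_id:
            kind = node.tag.split("}")[1]  # local part of the tag, e.g. "element"
            return kind, node.get("name")
    return None

print(find_by_id(Q_XSD, "www.geospacial.org"))
```

Because the lookup never consults a namespace, the result is the same whether Q.xsd is <include>d into Z1, Z2, or any other schema.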

Best Practice
Above we explored the “design space” for this issue. We looked at the three design approaches in
action, both schemas and instance documents. So which design is better? Under what
circumstances?



When you are reusing schemas that someone else created you should <import> those schemas,
i.e., use the Heterogeneous Namespace design. It is a bad idea to copy those components into
your namespace, for two reasons: (1) soon your local copies would get out of sync with the other
schemas, and (2) you lose interoperability with any existing applications that process the other
schema’s components. The interesting case (the case we have been considering throughout this
discussion) is how to deal with namespaces in a collection of schemas that you created. Here are
our guidelines for this case:

Use the Chameleon Design:
– with schemas which contain components that have no inherent semantics by themselves,
– with schemas which contain components that have semantics only in the context of an
  <include>ing schema,
– when you don’t want to hardcode a namespace to a schema, rather you want <include>ing
  schemas to be able to provide their own application-specific namespace to the schema

Example. A repository of components - such as a schema which defines an array type, or vector,
linked list, etc. - should be declared with no targetNamespace (i.e., Chameleon).

As a rule of thumb, if your schema just contains type definitions (no element declarations) then
that schema is probably a good candidate for being a Chameleon schema.

Use the Homogeneous Namespace Design:
• when all of your schemas are conceptually related
• when there is no need to visually identify in instance documents the origin/lineage of each
  element/attribute. In this design all components come from the same namespace, so you lose
  the ability to identify in instance documents that “element A comes from schema X”.
  Oftentimes that’s okay - you don’t want to categorize elements/attributes differently. This
  design approach is well suited for those situations.

Use the Heterogeneous Namespace Design:
• when there are multiple elements with the same name (to avoid name collisions)
• when there is a need to visually identify in instance documents the origin/lineage of each
  element/attribute. In this design the components come from different namespaces, so you have
  the ability to identify in instance documents that “element A comes from schema X”.

Lastly, as we have seen, in a schema each component can be uniquely identified with an id
attribute. (This is NOT the same as providing an id attribute on an element in instance documents.
We are talking here about a schema-internal way of identifying each schema component.)
Consider identifying each schema component using the id attribute. This will enable a finer
degree of traceability than is possible using namespaces. The combination of namespaces plus
the schema id attribute is a powerful tandem for visually and programmatically identifying
components.



