Comparing RDMS to XML

Document Sample
Comparing RDMS to XML Powered By Docstoc
					Comparing RDMS to XML

Understanding the Differences
              David Plotkin
          Data Quality Manager
   Wells Fargo Consumer Credit Group
    David.Plotkin@WellsFargo.com
             What we will cover
XML from a Data Practitioner’s standpoint
Relational databases and schemas
A comparison of relational and XML architecture
What each technology is good at – and when to use it.
Relational metadata and XML metadata
How to navigate a relational schema vs. an XML schema
Gathering data (queries) in a database vs. an XML document
Differences in flexibility between the two
Cost considerations
Data integrity considerations
So…what do we do?
XML: A Data Practitioner’s View (1)
 XML is a specification for designing tag-
 based languages.
 The specification allows for:
   Metadata (XML Schemas) that define
     Valid data structures
     Defining of user data types
     Valid lists of values, ranges, and patterns
     Optionality/cardinality
   Elements and Attributes
   Reusability of data, data types, and schemas
   Creation of instance documents based on an
   XML Schema
XML: A Data Practitioner’s View (2)
 The potential exists for:
   Industries to agree on an XML-based language
   for data exchange.
   Exchange of XML instance documents between
   trading partners
 An entire industry has grown up around:
   Providing XML tools (and repositories)
   Ongoing development of standards
 XML has gained very wide acceptance!
                                XML Content
    Here is an easy-to-understand sample of XML:
<?xml version = "1.0" encoding = "UTF-8"?>
<DataTransmission xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation = "PatientSearchResponse.xsd" Source = "CentralPatient" Target = "Store042"
MsgTypeCode = "PSRS" MsgTypeDesc = "PatientSearchResponse" FoundFlag = "true">
           <PatientList>
                        <PatientIdFields>
                                     <FirstName>Maria</FirstName>
                                     <LastName>Montes</LastName>
                                     <Birthdate>1951-11-05</Birthdate>
                                     <Gender>F</Gender>
                                     <StreetAddress>1969 Ygnacio Valley Road</StreetAddress>
                                     <CityAddress>Walnut Creek</CityAddress>
                                     <StateCode>CA</StateCode>
                                     <ZipCode>94597</ZipCode>
                                     <PatientID>109993345</PatientID>
                        </PatientIdFields>
           </PatientList>
</DataTransmission>
     The XML Document(1)
An XML document is:
  A self-contained set of structured (“tagged”) data
  Decoupled from the source systems
    Tagged data in the document does not change when
    the source system data changes
    The document represents a snapshot of the source
    system data at a particular moment in time.
  Packaged with the metadata (tags) for a specific
  business transaction
    The XML Document (2)
An XML Document has:
 Semantic content
   Values (tagged data)
   Metadata (the tags)
 Order
 Hierarchy
 Structure
        The XML Schema
An XML document is usually (but not
always) validated by an XML Schema.
The XML Schema provides the information
on whether the XML document “followed
the rules” set up in the XML Schema.
An XML Schema is an agreement between
the sender and the receiver of a document as
to the structure of that document.
       Elements vs. Attributes
Elements:
  Basic building blocks of XML
  Contain content which can be a structure
Attributes
  Specify additional information about an element.
  Contain only simple type content
Some data could be either an Element or an
Attribute (so you need standards on how to
decide which to use).
   Element & Attribute in XML Schema
        Element and Attribute declaration:
<xsd:element name = "DataTransmission">
           <xsd:complexType>
                       <xsd:sequence>
                                   <xsd:element ref = "FirstName" minOccurs = "0"/>
                                   <xsd:element ref = "LastName" minOccurs = "0"/>
                                   <xsd:element ref = "Phone" minOccurs = "0"/>
                                   <xsd:element ref = "Birthdate" minOccurs = "0"/>
                                   <xsd:element ref = "Gender" minOccurs = "0"/>
                                   <xsd:element ref = "StreetAddress" minOccurs = "0"/>
                                   <xsd:element ref = "CityAddress" minOccurs = "0"/>
                                   <xsd:element ref = "StateCode" minOccurs = "0"/>
                                   <xsd:element ref = "ZipCode" minOccurs = "0"/>
                                   <xsd:element ref = "SSN" minOccurs = "0"/>
                                    <xsd:element name = "SafetyCapDate" type = "xsd:date"/>
                       </xsd:sequence>
                       <xsd:attribute name = "Source" use = "required" type = "xsd:string"/>
                       <xsd:attribute name = "Target" use = "required" type = "xsd:string"/>
                       <xsd:attribute name = "MsgTypeCode" use = "required" type = "MsgTypeCodeType"/>
                       <xsd:attribute name = "MsgTypeDesc" use = "required" type = "xsd:string"/>
            </xsd:complexType>
</xsd:element>
   Element and Attribute XML Document
         Element and Attribute content:
<DataTransmission xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation
= "PatientSearchRequest.xsd" Source = "Store599" Target = "CentralPatient" MsgTypeCode = "PSRQ"
MsgTypeDesc = "PatientSearchRequest">
           <FirstName>Maria</FirstName>
           <LastName>Montes</LastName>
           <Birthdate>1951-11-05</Birthdate>
           <Gender>F</Gender>
           <StreetAddress>1969 Ygnacio Valley Road</StreetAddress>
           <CityAddress>Walnut Creek</CityAddress>
           <StateCode>CA</StateCode>
           <ZipCode>94597</ZipCode>
           <SSN>561-88-9208</SSN>
           <SafetyCapDate>2001-05-22</SafetyCapDate>
</DataTransmission>
Simple data types in an XML Schema
 Comes with “atomic” simple data types
   Integer, boolean, date, decimal, string, etc.
 You can build user-defined simple data types
   Built on the included “atomic” data types
   Allows declaration of
     valid values, ranges, Patterns, Length, total digits
     And more…
   Attributes or Elements can be of a simple data
   type (either atomic or user-defined).
                 Simple data type examples
<xsd:simpleType name = "SevenPlaceInteger">
            <xsd:restriction base = "xsd:integer">       builds on atomic simple data type
                         <xsd:totalDigits value = "7"/>
            </xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name = "GenderType">
            <xsd:restriction base = "xsd:string">
                         <xsd:enumeration value = "M"/>
                         <xsd:enumeration value = "F"/>
                         <xsd:length value = "1"/>
            </xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name = "RelationshipCodeType">
            <xsd:restriction base = "xsd:string">
                         <xsd:enumeration value = "self"/>
                         <xsd:enumeration value = "spouse"/>
                         <xsd:enumeration value = "dependent"/>
                         <xsd:enumeration value = "other"/>
            </xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name = "SevenPlacePositiveInteger">         builds on custom simple data type
            <xsd:restriction base = "SevenPlaceInteger">
                         <xsd:minInclusive value = "0"/>
            </xsd:restriction>
</xsd:simpleType>
Complex data types in XML Schema
Builds a structure of Elements.
Each subelement is either a simple data type or
another structure of Elements.
Only Elements can be of a complex data type.
Can be named and reusable or anonymous and
used only by a single Element.
Can be an extension or restriction of another
complex type.
                 Complex data type examples
<xsd:complexType name = "AddressType">        declaration of named complex data type
           <xsd:sequence>
                       <xsd:element ref = "StreetAddress"/>
                       <xsd:element ref = "CityAddress"/>
                       <xsd:element ref = "StateCode"/>
           </xsd:sequence>
</xsd:complexType>
<xsd:element name = "WorkAddress" type = "AddressType"/> association of Element with named complex data type

<xsd:complexType name = "AddressWithCountryType"> new complex data type extends existing complex data type
            <xsd:complexContent>
                       <xsd:extension base = "AddressType">
                                    <xsd:sequence>
                                                <xsd:element name = "CountryCode" type = "xsd:string"/>
                                    </xsd:sequence>
                       </xsd:extension>
            </xsd:complexContent>
</xsd:complexType>
<xsd:element name = "PatientInsurance"> element with anonymous complex data type
            <xsd:complexType>
                       <xsd:sequence>
                                    <xsd:element ref = "Patient"/>
                                    <xsd:element ref = "TPMembership" minOccurs = "0" maxOccurs = "unbounded"/>
                       </xsd:sequence>
            </xsd:complexType>
</xsd:element>
           Using the XML Schema


 Source                                 Target
database                               database

       Data                         Data


Extract                                 Parse
program          XML                   program
               Document   Network


  XML                                   XML
 Schema                                Schema
                      Reusing XML Schemas

    Statecode.xsd                                                 XML Schemas can
                                                                  build on each other to
                                                                  provide reusability.
       Base                     <xsd:include schemaLocation =
                                "StateCodes.xsd"/>
  Definitions.xsd




     Patient                           Patient                            Patient
     Search                            Search                             Update
   Request.xsd                      Response.xsd                        Request.xsd
<xsd:include schemaLocation =     <xsd:include schemaLocation =     <xsd:include schemaLocation =
"BaseDefinitions.xsd"/>           "BaseDefinitions.xsd"/>           "BaseDefinitions.xsd"/>
                  An XML Schema example
<xsd:element name = "Patient">
              <xsd:complexType>
                            <xsd:sequence>
                                           <xsd:element ref = "PatientID"/>
                                           <xsd:element ref = "FirstName"/>
                                           <xsd:element ref = "LastName"/>
                                           <xsd:element ref = "Birthdate"/>
                                           <xsd:element ref = "Gender"/>
                                           <xsd:element name = "PrivacyInd" type = "xsd:boolean"/>
                                           <xsd:element ref = "StreetAddress"/>
                                           <xsd:element ref = "CityAddress"/>
                                           <xsd:element ref = "StateCode"/>
                                           <xsd:element ref = "ZipCode"/>
                                           <xsd:element ref = "EMailAddress" minOccurs = "0"/>
                                           <xsd:element ref = "SSN" minOccurs = "0"/>
                                           <xsd:element name = "HIPAANotifInd" type = "xsd:boolean"/>
                                           <xsd:element name = "SafetyCapInd" type = "xsd:boolean"/>
                                           <xsd:element name = "SafetyCapDate" type = "xsd:date"/>
                                           <xsd:element ref = "StatusCode"/>
                                           <xsd:element ref = "Doctor" minOccurs = "0"/>
                                           <xsd:element ref = "Phone" maxOccurs = "unbounded"/>
                                           <xsd:element ref = "PatDrugAllergy" minOccurs = "0" maxOccurs = "unbounded"/>
                                           <xsd:element ref = "OtherDrugTaken" minOccurs = "0" maxOccurs = "unbounded"/>
                            </xsd:sequence>
                            <xsd:attribute name = "PrivacyInd" type = "xsd:boolean"/>
              </xsd:complexType>
</xsd:element>
The Structure of an XML Schema
               •Elements in an XML Schema
               are hierarchical.
               •To expand the hierarchy with
               this tool (Tibco’s XML
               Authority), click here.
Expanding an Element
                This is the result you
                get – you can now see
                the elements that
                make up the structure
                of the the
                OtherDrugTaken
                element.
      Managing XML Schemas
Avoid chaos by managing XML metadata
across the Enterprise:
  Create reusable base definition schemas
  Create and document:
    Widely used elements (with their attributes)
    Complex data types
    Simple data types
  Keep track of which schemas use other schemas
  Keep track of which documents are validated by
  which schemas (XML repository).
             XML Architecture
  XML is hierarchical:
                                            This hierarchy is
               Patient
                                            useful for starting
                                            with a patient and
Insurance    Prescription          Doctor   finding all the
                                            information about
                                            them – such as a
    Drug           Fills           Doctor   list of their
                                            prescriptions and
                                            when the
                                            prescriptions were
            Drug           Claim
                                            filled.
       XML Hierarchy Revisited
  The Hierarchy can change depending on
  what the XML document is used for.
          Prescription                            Each version of
                                                  the hierarchy
                                                  serves a different
Drug       Fills           Doctor     Patient     purpose

                               This version of the hierarchy is
   Drug            Claim       useful for starting with a
                               Prescription and finding all the
                               information about it, including the
                               Patient and Drug.
      Database Architecture
Database architecture is relational:
  Normalized to eliminate data redundancy
  Join on any two columns that have the same
  data type.
  Foreign keys can enforce data integrity
Relational Metadata – the Schema
Relational metadata is stored in the database
  Database control tables fully define the structure
  of the database.
  Without the DBMS metadata the contents of the
  database are worthless.
  Completely self-contained (not reusable)
  Tables are structured, each column is a “bucket”
  for a specific kind of data
  In most databases, the metadata does not include
  descriptions, so a Data Dictionary is necessary.
 XML Metadata – the Document
Metadata built into the document
  Every element has a tag to tell you where the
  data is stored in the document.
  Descriptive tags give structure to the document
  and tell you what the data means (sort of).
  “Sort of” because it only tells the tag name, so
  this only has meaning to someone who already
  understands what the element or attribute
  means.
  Document cannot be parsed for storage on its
  own. What else is needed?…
  XML Metadata – the Schema
An XML Schema (or DTD) is needed to:
  Provide standardization (basis of agreement)
  Allow meaningful parsing and data storage
  Specify agreement on document structure
A data dictionary is still necessary to
provide definition for Elements and
Attributes
Without an XML Schema, a document is
essentially only good for transmitting
blocks of data for humans to read.
  Comparing XML to RDMS Metadata
      An XML Schema establishes the valid structure of an
      XML document, like a database schema establishes the
      valid structure of a database.

         Database Architecture                XML Architecture


•Data Model
     •Cardinality                   •XML Schema
     •Optionality                      •Cardinality
     •Data Type                        •Optionality
     •Describes Relationships          •Data Type
     •Has Metadata                     •Describes Relationships
     •Generate DDL                     •Has Metadata
                                       •Valid XML
                                       •Sequence of Elements

         Final Result: Database        Final Result: Valid XML Document
What are relational databases good at?
  Data Storage
    General purpose data storage and retrieval
    Used for many purposes, such as queries and
    analysis
    Generalized view of data for shared use
    Ideally shared across business units or the
    Enterprise
    Works well to store the contents of an XML
    document.
      What is XML good at?
Data Exchange
  Exchange of data in a document
  Usually designed for a specific communication
  Works well to move data between databases
    Important when source or target database is outside
    your firewall.
    Usually don’t have direct access to such databases.
  Works well (with style sheets or XSLT) to
  display data on the web because browsers
  inherently are designed to display documents.
   When do I use each technology?
RDBMS                               XML
  Store data                           Transmit data
  Query data                           Exchange data with
  Mine data                            outside agencies
  Create generalized reports           Replace flat files
                                       Create specific reports

Which one?
  Use them together
  Each for its own purpose
  Build an infrastructure that:
     Creates XML document (using XML schemas) from database contents.
     Parses XML documents to store their contents in a database
                                                 Patient

                              PK     PatientID             SERIAL




Relationships: RDBMS
                                     BirthDate             DATE
                                     Gender                CHAR(1)
                                     FirstName             CHAR(25)
                                     LastName              CHAR(35)
                                     PrivacyInd            CHAR(1)
                                     StreetAddress         CHAR(50)
                                     CityAddress           CHAR(35)
                                     StateCode             CHAR(2)
                                     ZipCode               CHAR(9)


  RDBMS                              HIPAANotifInd
                                     EMailAddress
                                     SSN
                                     SafetyCapInd
                                     SafetyCapDate
                                                           CHAR(1)
                                                           VARCHAR(70)
                                                           CHAR(9)
                                                           CHAR(1)
                                                           DATE
                                     StatusCode            CHAR(2)

    Explicit table to         FK1    DoctorID              INTEGER



                                     has membership by / is for


    table: if not declared                        TPMembership
                             PK     TPMembershipID                      SERIAL

    by a foreign key, it            CardholderLastName
                                    CardholderFirstName
                                                                        CHAR(35)
                                                                        CHAR(25)



    doesn’t exist.
                                    CardholderRelationshipCode          CHAR(2)
                                    CardholderID                        VARCHAR(20)
                                    CopayAmt                            DECIMAL(8,2)
                                    InsCoverageType                     SMALLINT
                                    IssueDate                           DATE
                                    MembTypeCode                        SMALLINT
                                    TerminationDate                     DATE
                             FK1    PatientID                           INTEGER
                             FK2    TPInsuranceID                       VARCHAR(20)


                                                 is for / is in


                                          ThirdPartyInsurance

                              PK    TPInsuranceID                 VARCHAR(20)

                                    AcceptAssignFlag              CHAR(1)
                                    AcceptSecondBillFlag          CHAR(1)
                                    BillMedcrFirstFlag            CHAR(1)
                                    BillingMethodCode             INTEGER
                                    CarrierTypeCode               CHAR(2)
                                    DrQualifierTypeCode           CHAR(3)
                                    ExpireDate                    DATE
                                    GroupCode                     VARCHAR(15)
                                    MbrIDLengthNum                SMALLINT
                                    MedicaidStateCode             CHAR(2)
                                    NCPDPTypeCode                 CHAR(3)
                                    PayorIDTypeCode               CHAR(2)
                                    PharmacyIDType                CHAR(10)
                                    PlanCode                      CHAR(8)
                                    PlanName                      VARCHAR(20)
                                    ProcessorTPIDCode             CHAR(10)
                                    ProgramCode                   CHAR(5)
                                    ProgramName                   VARCHAR(20)
                                    RefrlPhoneNum                 VARCHAR(20)
                                    ReminderFlag                  CHAR(1)
                                    RoutingCode                   CHAR(6)
                                    TPIDNumber                    CHAR(10)
Relationships: XML
                     <Patient>

XML                               <PatientID>10988453</PatientID>
                                  <FirstName>Maria</FirstName>
                                  <LastName>Montes</LastName>
 Implied                          <Birthdate>1951-11-05</Birthdate>
                                  <Gender>F</Gender>
                                  <PrivacyInd>false</PrivacyInd>
 positioning:                     <StreetAddress>1969 Ygnacio Valley Road</StreetAddress>
                                  <CityAddress>Walnut Creek</CityAddress>
   Part-of relationship           <StateCode>CA</StateCode>
                                  <ZipCode>94597</ZipCode>
   implied by                     <HIPAANotifInd>true</HIPAANotifInd>
                                  <SafetyCapInd>false</SafetyCapInd>
   positioning                    <SafetyCapDate>2000-01-01</SafetyCapDate>
                                  <StatusCode>A</StatusCode>
                                  <Phone>
   Parallel elements                            <PhoneID>45569009</PhoneID>
                                                <AreaCode>925</AreaCode>
   have same                                    <PhoneNumber>5556964</PhoneNumber>
                                                <PhoneTypeCode>RS</PhoneTypeCode>
   relationships as               </Phone>
                                  <PatDrugAllergy>
   sibling elements                             <PatDrugAllergyID>43325564</PatDrugAllergyID>
                                                <InformedDate>2001-01-01</InformedDate>
                                                <DrugID>432678945</DrugID>
                                  </PatDrugAllergy>
                     </Patient>
  Querying: XML vs. RDBMS
Relational
  Easy to build queries by:
    Navigating the database joins … or …
    Creating joins on any pair of tables that share
    matching columns
XML
  Difficult to build queries because:
    Must stick to the structure of the document
    Pointer-based navigation is restrictive
 Navigating in XML for Queries
Must follow the structure of the document.
Example: DOM Node navigation:
  getFirstChild
  getLastChild
  getPreviousSibling
  getNextSibling
  getParentNode
Structure Flexibility : Relational vs. XML
    Relational: Rigid structure
      Lots of work to change the structure because of
      foreign keys, Views, Stored procs, and triggers
      Plus programs that assume the database looks a
      certain way!
    XML: Flexible structure
      Easy to change the structure of documents to
      account for new data, data type changes
      Everything is text so it is easy to implement
      Or is it?
             XML Flexibility
Changing the structure of an XML document is
easy.
Changing the XML Schema is easy too, but have
you considered that you must:
  Get agreement from business partners
  Change the programs that create documents from the
  database and parse documents for the database.
  Version your XML Schemas if you want to be able
  to open and parse historical documents (do you keep
  them?).
Data Integrity: Relational vs. XML
Both provide ways to control data types,
ranges, patterns, valid values, and
optionality.
But XML does not provide data validation
via look-up as relational does with foreign
keys.
However, this lack does make the XML
Schema a less rigid structure than a
database schema.
             Cost Considerations
People always ask – which is cheaper to implement?
  XML is essentially all text, so it is easy to create, change
  and manage.
  XML doesn’t need the infrastructure that an RDBMS
  does.
  XML tools are cheap ($99!)
But does this cost difference really matter?
  The two technologies do different things.
  You can’t use one as replacement of the other.
  If you need it, you need it.
  So, XML is cheaper – and it doesn’t really matter.
        So what do we do?
Do you remember the previous slide?
  The two technologies do different things
  They complement each other:
    RDBMS stores data in a generally usable way
    XML exchanges data in a specific way is easy to
    display on the web.
  They both present challenges for metadata
  management.
So…we use them both for the appropriate
purpose.
Thank you!