Role Of XML
XML and The Web
XML Language Basics
Revolutions Of XML
Service Oriented Architecture (SOA)
THE ROLE OF XML
XML is a metalanguage (literally a language about languages) defined by the World Wide Web
Consortium (W3C).
XML is a set of rules and guidelines for describing structured data in plain text rather than
proprietary binary representations.
XML was standardized by the W3C in 1998.
In its short history, XML has given rise to numerous vertical industry vocabularies in support of
B2B e-commerce, horizontal vocabularies that provide services to a wide range of industries, and
XML’s influence has been felt in three waves, from industry-specific vocabularies, to horizontal
industry applications, to protocols that describe how businesses can exchange data across the
XML is a language for creating other languages based on the insertion of tags to help describe
data. However, XML is actually more than just tags. XML is a combination of tags and content
in which the tags add meaning to the content. The following is a simple XML markup of
<Name>John von Neumann</Name>
However, elements are only one way to describe data. It’s also possible to represent the data
using attributes within a single element:
<Customer name="John von Neumann" phone="914.631.7722" />
XML allows data to be stored in either elements or attributes.
Elements and attributes can be named to give the data meaning.
Start tags and end tags define elements that are the basis for XML tree-structured
representations of documents.
Elements can contain text data and/or other elements.
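As a rough illustration of the points above, the following Python sketch parses the same customer record once as child elements and once as attributes, using the standard library module xml.etree.ElementTree. The <Customer> wrapper in the element form and the <Phone> tag are assumptions added for illustration; only the name and phone values come from the text.

```python
import xml.etree.ElementTree as ET

# Element form: each piece of data is the text content of a child element.
# The <Customer> wrapper and the <Phone> tag are assumed for this sketch.
element_form = """
<Customer>
  <Name>John von Neumann</Name>
  <Phone>914.631.7722</Phone>
</Customer>
"""

# Attribute form: the same data carried on a single empty element.
attribute_form = '<Customer name="John von Neumann" phone="914.631.7722" />'

by_element = ET.fromstring(element_form)
by_attribute = ET.fromstring(attribute_form)

# Child element text is read with findtext(); attributes are read with get().
name_from_element = by_element.findtext("Name")
name_from_attribute = by_attribute.get("name")

print(name_from_element == name_from_attribute)
```

Either form carries the same information; the difference lies in how a program reaches it.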
The XML Advantage
XML has had an impact across a broad range of areas. The following is a list of some of the
factors that have influenced XML’s adoption by a variety of organizations and individuals.
XML files are human-readable. Such is not the case with binary data formats.
Widespread industry support exists for XML. Numerous tools and utilities are being
provided with Web browsers, databases, and operating systems, making it easier and less
expensive for small and medium-sized organizations to import and export data in XML
Major relational databases now have the native capability to read and generate XML data.
A large family of XML support technologies is available for the interpretation and
transformation of XML data for Web page display and report generation.
XML: Design by Omission
There are three key design elements that by omission contribute to XML’s success:
No display is assumed. Unlike HTML, XML makes no assumptions about how tags will
be rendered in a browser or other display device. Auxiliary technologies such as style
sheets add this capability.
There is no built-in data typing. DTDs and XML Schema provide support for defining the
structure and data types associated with an XML document.
No transport is assumed. The XML specification makes no assumption about how XML
is to be transported across the Internet.
XML AND THE WEB
XML may be used to communicate directly with partners and suppliers. Instead of exchanging
information about purchases and orders either manually or over proprietary networks, data vocabularies can
be defined using XML and delivered from server to server using standard protocols such as
HTTP or FTP.
Associated with this ability to move data freely across the Web is the rise in the use of
messaging servers and software that sit between conversational participants. These servers,
supporting what is known as Message Oriented Middleware, provide guarantees of delivery and
the ability to broadcast communications to multiple recipients.
Web services is an ambitious initiative that is moving the Web to new levels of B2B (that is,
software-to-software) interaction while trying to fulfill object technology's promise of reusable
components from a service interface perspective.
SOAP is the XML glue that lets clients and providers talk to each other and exchange XML data.
SOAP builds on XML and common Web protocols (HTTP, FTP, and SMTP) to enable
communication across the Web. SOAP brings to the table a set of rules for moving data, either
directly in a point-to-point fashion or by sending the data through a message queue intermediary.
Prior to SOAP, there were three basic options for doing distributed computing: Microsoft’s
Distributed Component Object Model (DCOM), Java’s Remote Method Invocation (RMI), or the
Object Management Group’s Common Object Request Broker Architecture (CORBA). Their
drawback is that they limit the potential reach of the enterprise to servers that share the same
object infrastructure. With SOAP, however, the potential space of interconnection is the entire
Web itself.
Web services is both a process and set of protocols for finding and connecting to software
exposed as services over the Web. By assuming a SOAP foundation, Web services can
concentrate on what data to exchange instead of worrying about how to get it from point A to
point B, which is the job of SOAP. To make things even easier SOAP also defines an XML
envelope to carry XML and a convention for doing remote procedure calls so that a service can
advertise "call me here" and a program will be able to do so without concern for language or
platform. Although SOAP may be used with a variety of protocols, the only bindings specified in
the proposed SOAP specification are for HTTP. The Web services technical infrastructure
ensures that services even from different vendors will interoperate to create a complete business
process. Web services takes the object-oriented vision of assembling software from component
building blocks to the next level. With Web services, however, the emphasis is on the assembly
of services that may or may not be built on object technology.
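To make the idea of an XML envelope concrete, here is a minimal sketch in Python that wraps a remote procedure call in a SOAP 1.1 envelope. The service namespace, the GetPrice operation, and the Symbol parameter are hypothetical names invented for this example; only the envelope namespace is the one used by SOAP 1.1. Nothing is sent over the network here.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for the service's own vocabulary.
SERVICE_NS = "http://example.com/stockquote"

# Ask the serializer to use readable prefixes instead of ns0, ns1.
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("m", SERVICE_NS)


def build_request(symbol: str) -> bytes:
    """Wrap a hypothetical GetPrice call in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}GetPrice")
    ET.SubElement(call, f"{{{SERVICE_NS}}}Symbol").text = symbol
    return ET.tostring(envelope, encoding="utf-8")


request = build_request("IBM")
print(request.decode("utf-8"))
```

In a real exchange the resulting bytes would typically travel as the body of an HTTP POST, which is the binding the text mentions.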
The interconnections opened up by the Web make possible a new way of interacting through the
registration, discovery, and connection of software packaged as Web services. There are three
major aspects to Web Services:
A service provider provides an interface for software that can carry out a specified set of
tasks.
A service requester discovers and invokes a software service to provide a business solution.
The requester will commonly invoke a remote procedure call on the service provider, passing
parameter data to the provider and receiving a result in reply.
A broker manages and publishes the service. Service providers publish their services with
the broker, and requesters access those services by creating bindings to the service provider.
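The three roles above can be sketched with a toy in-memory registry. This is only an illustration of publish, discover, and bind; the Broker class, the AddTax service name, and the add_tax function are all hypothetical, and a real broker would be a networked registry rather than a Python dictionary.

```python
from typing import Callable, Dict


class Broker:
    """Hypothetical in-memory registry standing in for a service broker."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[..., object]] = {}

    def publish(self, name: str, endpoint: Callable[..., object]) -> None:
        # A provider registers its service under a name.
        self._services[name] = endpoint

    def discover(self, name: str) -> Callable[..., object]:
        # A requester looks up a service and receives a binding to it.
        return self._services[name]


# Provider: exposes an interface that carries out a task.
def add_tax(amount: float, rate: float) -> float:
    return round(amount * (1 + rate), 2)


broker = Broker()
broker.publish("AddTax", add_tax)

# Requester: discovers the service, then invokes it in the style of a
# remote procedure call, passing parameters and receiving a result.
binding = broker.discover("AddTax")
result = binding(100.0, 0.08)
print(result)
```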
XML: THE THREE REVOLUTIONS
Three revolutions centered on XML and the Web
The Data Revolution
XML-based industry-specific data vocabularies provide alternatives to specialized Electronic
Data Interchange (EDI) solutions by facilitating B2B data exchange and playing a key role as a
messaging infrastructure for distributed computing.
XML's strength is its data independence.
XML is pure data description, not tied to any programming language, operating system or
transport protocol. Data is free to move about globally without the constraints imposed by
tightly coupled, transport-dependent architectures. Protocols such as HTTP have had a
tremendous impact on XML's viability and have opened the door to alternatives to CORBA, RMI
and DCOM, each of which relies on its own wire protocol rather than on standard Web
protocols. XML does this by focusing on data and leaving other issues to supporting
technologies.
The Architectural Revolution
The architectural revolution surrounding XML is reflected in a move from tightly coupled
systems based on established infrastructures such as CORBA, RMI and DCOM, each with their
own transport protocol, to loosely coupled systems riding atop standard Web protocols such as
The loose coupling of the Web makes possible new system architectures built around message-
based middleware or less structured peer-to-peer interaction.
XML plays a key role in this new architecture for distributed computing through a new XML
protocol language called SOAP, the Simple Object Access Protocol.
SOAP simply defines a set of XML tags for moving XML data around the Web using standard
Web protocols, accomplishing in one simple initiative what client-server computing had been
trying to do for over a decade. Associated with this ability to move data freely across the Web is
the rise in the use of messaging servers and software that sit between conversational participants.
These servers, supporting what is known as Message Oriented Middleware, are playing an
increasingly important role in the new extended enterprise by providing guarantees of delivery
and the ability to broadcast communications to multiple recipients.
The Software Revolution
During the 1970s and 1980s, software was constructed as monolithic applications built to solve
specific problems. The problem with large software projects is that, by trying to tackle multiple
problems at once, the software is often ill suited to adding new functionality and adapting to
technological change. In the 1990s a different model for software emerged based on the concept
of simplicity. Instead of trying to define all requirements up front, this new philosophy was built
around the concept of creating building blocks capable of combination with other building
blocks that either already existed or were yet to be created. Figure 4 illustrates the software
A case in point is the Web. After decades of attempts to build complex infrastructures for
exchanging information across distributed networks, the Web emerged from an assemblage of
foundational technologies such as HTTP, HTML, browsers and a longstanding networking
technology known as TCP/IP that had been put in place in the 1970s.
SERVICE ORIENTED ARCHITECTURE
SOA is an architectural style whose goal is to achieve loose coupling among interacting software
A service is a unit of work done by a service provider to achieve desired end results for a service
consumer. Both provider and consumer are roles played by software agents on behalf of their
First, the messages must be descriptive, rather than instructive, because the service provider is
responsible for solving the problem. This is like going to a restaurant: you tell your waiter what
you would like to order and your preferences but you don't tell their cook how to cook your dish
step by step.
Second, service providers will be unable to understand your request if your messages are not
written in a format, structure, and vocabulary that is understood by all parties. Limiting the
vocabulary and structure of messages is a necessity for any efficient communication. The more
restricted a message is, the easier it is to understand the message, although it comes at the
expense of reduced extensibility.
Third, extensibility is vitally important. If messages are not extensible, consumers and providers
will be locked into one particular version of a service.
Fourth, an SOA must have a mechanism that enables a consumer to discover a service provider
under the context of a service sought by the consumer.
Structuring With Schemas and DTD
XML is a markup language for documents containing structured information.
Structured information contains both content (words, pictures, etc.) and some indication of what
role that content plays
A markup language is a mechanism to identify structures in a document. The XML specification
defines a standard way to add markup to documents.
XML specifies neither semantics nor a tag set. In fact XML is really a meta-language for
describing markup languages. In other words, XML provides a facility to define tags and the
structural relationships between them. Since there's no predefined tag set, there can't be any
preconceived semantics. All of the semantics of an XML document will either be defined by the
applications that process them or by style sheets.
Is XML just like SGML? Not quite. XML is defined as an application profile of SGML. SGML is the Standard
Generalized Markup Language defined by ISO 8879. SGML has been the standard, vendor-
independent way to maintain repositories of structured documentation for more than a decade,
but it is not well suited to serving documents over the web (for a number of technical reasons
beyond the scope of this article). Defining XML as an application profile of SGML means that
any fully conformant SGML system will be able to read XML documents. However, using and
understanding XML documents does not require a system that is capable of understanding the
full generality of SGML. XML is, roughly speaking, a restricted form of SGML.
XML was created so that richly structured documents could be used over the web. The only
viable alternatives, HTML and SGML, are not practical for this purpose.
HTML, as we've already discussed, comes bound with a set of semantics and does not provide
arbitrary structure.
SGML provides arbitrary structure, but is too difficult to implement just for a web browser. Full
SGML systems solve large, complex problems that justify their expense. Viewing structured
documents sent over the web rarely carries such justification.
This is not to say that XML can be expected to completely replace SGML. While XML is being
designed to deliver structured content over the web, some of the very features it lacks to make
this practical, make SGML a more satisfactory solution for the creation and long-time storage of
complex documents. In many organizations, filtering SGML to XML will be the standard
procedure for web delivery.
XML stands for EXtensible Markup Language
XML is a markup language much like HTML
XML was designed to describe data
XML tags are not predefined. You must define your own tags
XML uses a Document Type Definition (DTD) or an XML Schema to describe the data
XML with a DTD or XML Schema is designed to be self-descriptive
XML is a W3C Recommendation
The Extensible Markup Language (XML) became a W3C Recommendation on 10 February 1998.
XML was designed to carry data. XML is not a replacement for HTML. XML was designed to
describe data and to focus on what data is. HTML was designed to display data and to focus on
how data looks. HTML is about displaying information, while XML is about describing
information.
XML was created to structure, store and to send information. The following example is a note to
Tove from Jani, stored as XML:
<body>Don't forget me this weekend!</body>
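As a small sketch of the note described above, the following Python code builds the document with xml.etree.ElementTree and serializes it. The sender, recipient, and body come from the text; the <heading> element's content is an assumed value chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Build the note element by element.
note = ET.Element("note")
ET.SubElement(note, "to").text = "Tove"
ET.SubElement(note, "from").text = "Jani"
# The heading text is an assumed value for illustration.
ET.SubElement(note, "heading").text = "Reminder"
ET.SubElement(note, "body").text = "Don't forget me this weekend!"

# Serialize to a plain string so it can be stored or sent.
serialized = ET.tostring(note, encoding="unicode")
print(serialized)
```

The resulting string is plain text, which is what makes it easy to store in a file or send over any transport.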
The tags used to mark up HTML documents and the structure of HTML documents are
predefined. The author of HTML documents can only use tags that are defined in the HTML
standard (like <p>, <h1>, etc.). XML allows the author to define his own tags and his own
document structure. It is important to understand that XML is not a replacement for HTML. In
future Web development it is most likely that XML will be used to describe the data, while
HTML will be used to format and display the same data. My best description of XML is this:
XML is a cross-platform, software and hardware independent tool for transmitting information.
XML TECHNOLOGY FAMILY
Use of Elements vs. Attributes
Data can be stored in child elements or in attributes.
In the first example sex is an attribute. In the last, sex is a child element. Both examples provide
the same information. There are no rules about when to use attributes, and when to use child
elements. My experience is that attributes are handy in HTML, but in XML you should try to
avoid them. Use child elements if the information feels like data. Some of the problems with
using attributes are:
attributes cannot contain multiple values (child elements can)
attributes are not easily expandable (for future changes)
attributes cannot describe structures (child elements can)
attributes are more difficult to manipulate by program code
attribute values are not easy to test against a Document Type Definition (DTD) - which is
used to define the legal elements of an XML document
If you use attributes as containers for data, you end up with documents that are difficult to read
and maintain. Try to use elements to describe data. Use attributes only to provide information
that is not relevant to the data.
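The first limitation listed above, that attributes cannot hold multiple values, can be checked directly. In this sketch the <person> and <phone> names and the phone numbers are hypothetical. Repeating a child element is fine, while repeating an attribute name on one start tag is rejected by the parser because the document is no longer well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical person record with two phone numbers as child elements.
with_children = ET.fromstring(
    "<person><phone>555-0100</phone><phone>555-0101</phone></person>"
)
phones = [p.text for p in with_children.findall("phone")]

# The same idea with attributes is not well-formed XML: an attribute
# name may appear only once per start tag, so the parser rejects it.
try:
    ET.fromstring('<person phone="555-0100" phone="555-0101" />')
    duplicate_attribute_accepted = True
except ET.ParseError:
    duplicate_attribute_accepted = False

print(phones, duplicate_attribute_accepted)
```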
XML Namespaces provide a method to avoid element name conflicts. Since element names in
XML are not predefined, a name conflict will occur when two different documents use the same
element names. This XML document carries information in a table:
<name>African Coffee Table</name>
If these two XML documents were added together, there would be an element name conflict
because both documents contain a <table> element with different content and definition.
Solving Name Conflicts Using a Prefix
This XML document carries information in a table:
This XML document carries information about a piece of furniture:
<f:name>African Coffee Table</f:name>
Now there will be no name conflict because the two documents use a different name for their
<table> element (<h:table> and <f:table>).
By using a prefix, we have created two different types of <table> elements.
This XML document carries information in a table:
This XML document carries information about a piece of furniture:
<f:name>African Coffee Table</f:name>
Instead of using only prefixes, we have added an xmlns attribute to the <table> tag to give the
prefix a qualified name associated with a namespace.
The XML Namespace (xmlns) Attribute
The XML namespace attribute is placed in the start tag of an element and has the following
syntax:
xmlns:namespace-prefix="namespaceURI"
When a namespace is defined in the start tag of an element, all child elements with the same
prefix are associated with the same namespace.
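The effect of prefixed namespaces can be observed from a program. The sketch below, using Python's xml.etree.ElementTree, places an HTML-style table and a furniture table in one document. The h and f prefixes follow the text; the furniture namespace URI, the <root> wrapper, and the "Apples" cell are placeholders invented for this example.

```python
import xml.etree.ElementTree as ET

# The HTML namespace URI appears in the text; the furniture URI is a
# placeholder invented for this sketch.
HTML_NS = "http://www.w3.org/TR/html4/"
FURNITURE_NS = "http://example.com/furniture"

document = f"""
<root xmlns:h="{HTML_NS}" xmlns:f="{FURNITURE_NS}">
  <h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>
  <f:table><f:name>African Coffee Table</f:name></f:table>
</root>
"""

root = ET.fromstring(document)

# ElementTree reports each tag as {namespace-URI}local-name, so the two
# table elements are distinguishable even though both are called "table".
tags = [child.tag for child in root]

# Lookups use a prefix map; the prefixes here are local to this code and
# need not match the prefixes used inside the document.
ns = {"html": HTML_NS, "furn": FURNITURE_NS}
furniture_name = root.findtext("furn:table/furn:name", namespaces=ns)

print(tags)
print(furniture_name)
```

Note that the program never fetches the namespace URI; it is used purely as a unique name.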
Note that the address used to identify the namespace is not used by the parser to look up
information. The only purpose is to give the namespace a unique name. However, very often
companies use the namespace as a pointer to a real Web page containing information about the
namespace. For example, try visiting http://www.w3.org/TR/html4/.
Uniform Resource Identifier (URI)
A Uniform Resource Identifier (URI) is a string of characters which identifies an Internet
Resource. The most common URI is the Uniform Resource Locator (URL) which identifies an
Internet domain address. Another, not so common type of URI is the Uniform Resource Name
(URN). In our examples we will only use URLs.
Defining a default namespace for an element saves us from using prefixes in all the child
elements. It has the following syntax:
xmlns="namespaceURI"
This XML document carries information in a table:
This XML document carries information about a piece of furniture:
<name>African Coffee Table</name>
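A default namespace applies to the element that declares it and to its unprefixed descendants. The following sketch shows how that appears to a Python program; the namespace URI is a placeholder, not one taken from the original example.

```python
import xml.etree.ElementTree as ET

FURNITURE_NS = "http://example.com/furniture"  # placeholder URI

document = f"""
<table xmlns="{FURNITURE_NS}">
  <name>African Coffee Table</name>
</table>
"""

table = ET.fromstring(document)
name = table[0]

# Both the element and its unprefixed child are in the default namespace.
print(table.tag)
print(name.tag)

# A lookup that ignores the namespace does not match the child.
unqualified = table.find("name")
qualified = table.find(f"{{{FURNITURE_NS}}}name")
print(unqualified is None, qualified is name)
```

This is a common source of confusion: the document contains no prefixes, yet code that reads it still has to use the qualified name.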
Namespaces in Real Use
When you start using XSL, you will soon see namespaces in real use. XSL style sheets are used
to transform XML documents into other formats, like HTML.
If you take a close look at the XSL document below, you will see that most of the tags are
HTML tags. The tags that are not HTML tags have the prefix xsl, identified by the namespace
<?xml version="1.0" encoding="ISO-8859-1"?>
<h2>My CD Collection</h2>
STRUCTURING WITH SCHEMAS
The purpose of an XML Schema is to define the legal building blocks of an XML document, just
like a DTD.
An XML Schema:
defines elements that can appear in a document
defines attributes that can appear in a document
defines which elements are child elements
defines the order of child elements
defines the number of child elements
defines whether an element is empty or can include text
defines data types for elements and attributes
defines default and fixed values for elements and attributes
One of the greatest strengths of XML Schemas is the support for data types.
With support for data types:
It is easier to describe allowable document content
It is easier to validate the correctness of data
It is easier to work with data from a database
It is easier to define data facets (restrictions on data)
It is easier to define data patterns (data formats)
It is easier to convert data between different data types
XML Schemas use XML Syntax
Another great strength of XML Schemas is that they are written in XML. Some benefits of
XML Schemas being written in XML:
You don't have to learn a new language
You can use your XML editor to edit your Schema files
You can use your XML parser to parse your Schema files
You can manipulate your Schema with the XML DOM
You can transform your Schema with XSLT
XML Schemas Secure Data Communication
When sending data from a sender to a receiver, it is essential that both parties have the same
"expectations" about the content. With XML Schemas, the sender can describe the data in a way
that the receiver will understand. A date like "03-11-2004" will, in some countries, be interpreted
as 3 November and in other countries as 11 March. However, an XML element with a data type
ensures a mutual understanding of the content, because the XML data type "date" requires the
format "YYYY-MM-DD".
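The benefit of a fixed lexical format can be illustrated without a schema processor. Python's standard library does not include an XML Schema validator, so the sketch below is only an analogy: it uses datetime.date.fromisoformat, which expects the same year-month-day layout as the schema "date" type, to show that the unambiguous form parses and the ambiguous one is rejected rather than guessed at.

```python
from datetime import date

# Year-month-day lexical form, the layout used by the schema "date" type.
parsed = date.fromisoformat("2004-03-11")
print(parsed.year, parsed.month, parsed.day)

# The ambiguous form from the text is rejected rather than interpreted.
try:
    date.fromisoformat("03-11-2004")
    ambiguous_accepted = True
except ValueError:
    ambiguous_accepted = False

print(ambiguous_accepted)
```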
XML Schemas are Extensible
XML Schemas are extensible, because they are written in XML. With an extensible Schema
definition you can:
Reuse your Schema in other Schemas
Create your own data types derived from the standard types
Reference multiple schemas in the same document
Well-Formed is not Enough
A well-formed XML document is a document that conforms to the XML syntax rules, like:
it should begin with the XML declaration (recommended, though optional in XML 1.0)
it must have one unique root element
start-tags must have matching end-tags
element names are case sensitive
all elements must be closed
all elements must be properly nested
all attribute values must be quoted
entities must be used for special characters
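Several of the rules above can be exercised with any XML parser, since a conforming parser must reject input that is not well-formed. A minimal sketch in Python follows; the sample documents and the is_well_formed helper are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One acceptable document followed by one violation per rule.
samples = {
    "ok": '<order><item qty="5">laser printer</item></order>',
    "unclosed": "<order><item>laser printer</order>",
    "bad nesting": "<b><i>text</b></i>",
    "case mismatch": "<Item>laser printer</item>",
    "unquoted attribute": "<item qty=5>laser printer</item>",
    "two roots": "<a/><b/>",
}

results = {label: is_well_formed(xml) for label, xml in samples.items()}
for label, accepted in results.items():
    print(label, accepted)
```

A check like this says nothing about whether the quantity 5 is sensible, which is the gap that schema validation is meant to fill.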
Even if documents are well-formed they can still contain errors, and those errors can have
serious consequences. Think of the following situation: you order 5 gross of laser printers, instead
of 5 laser printers. With XML Schemas, most of these errors can be caught by your validating
DOCUMENT TYPE DEFINITION (DTD)
A Document Type Definition (DTD) defines the legal building blocks of an XML document. It
defines the document structure with a list of legal elements and attributes. A DTD can be
declared inline inside an XML document, or as an external reference.
Internal DTD Declaration
If the DTD is declared inside the XML file, it should be wrapped in a DOCTYPE definition with
the following syntax:
<!DOCTYPE root-element [element-declarations]>
Example XML document with an internal DTD:
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
The DTD above is interpreted like this:
!DOCTYPE note defines that the root element of this document is note.
!ELEMENT note defines that the note element contains four elements: "to,from,heading,body".
!ELEMENT to defines the to element to be of the type "#PCDATA".
!ELEMENT from defines the from element to be of the type "#PCDATA".
!ELEMENT heading defines the heading element to be of the type "#PCDATA".
!ELEMENT body defines the body element to be of the type "#PCDATA".
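The content model described above can be approximated by hand to show what a validating parser would enforce. Python's built-in parser is non-validating: it reads the DOCTYPE but does not check the document against it. The sketch below therefore uses a hand-written check, matches_note_model, which is a hypothetical helper and not a DTD validator; the heading text and the reordered sample are assumed values.

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>"""

# Hand-written stand-in for the content model (to,from,heading,body).
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]


def matches_note_model(note: ET.Element) -> bool:
    # Children must appear exactly once each, in the declared order,
    # and must hold only character data (no nested elements).
    names = [child.tag for child in note]
    only_text = all(len(child) == 0 for child in note)
    return note.tag == "note" and names == EXPECTED_CHILDREN and only_text


note = ET.fromstring(document)
valid = matches_note_model(note)

# Swapping two children keeps the document well-formed but breaks the model.
reordered = ET.fromstring(
    "<note><from>Jani</from><to>Tove</to>"
    "<heading>Reminder</heading><body>Hi</body></note>"
)
reordered_valid = matches_note_model(reordered)

print(valid, reordered_valid)
```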
External DTD Declaration
If the DTD is declared in an external file, it should be wrapped in a DOCTYPE definition with
the following syntax:
<!DOCTYPE root-element SYSTEM "filename">
This is the same XML document as above, but with an external DTD:
<!DOCTYPE note SYSTEM "note.dtd">
<body>Don't forget me this weekend!</body>
And this is the file "note.dtd" which contains the DTD:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
Why Use a DTD?
With a DTD, each of your XML files can carry a description of its own format. With a DTD,
independent groups of people can agree to use a standard DTD for interchanging data. Your
application can use a standard DTD to verify that the data you receive from the outside world is
valid. You can also use a DTD to verify your own data.
The XML DOM is the Document Object Model for XML
The XML DOM is platform- and language-independent
The XML DOM defines a standard set of objects for XML
The XML DOM defines a standard way to access XML documents
The XML DOM defines a standard way to manipulate XML documents
The XML DOM is a W3C standard
The DOM views XML documents as a tree structure. All elements, the text they contain, and
their attributes can be accessed through the DOM tree. Their contents can be modified or
deleted, and new elements can be created. The elements, their text, and their attributes are all
known as nodes.
According to the DOM, everything in an XML document is a node. The DOM says that:
The entire document is a document node
Every XML tag is an element node
The texts contained in the XML elements are text nodes
Every XML attribute is an attribute node
Comments are comment nodes
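The node kinds listed above can be inspected with Python's xml.dom.minidom, one implementation of the DOM interfaces. The document used here is a shortened bookstore fragment with a comment added for illustration, so its exact shape is an assumption of this sketch.

```python
from xml.dom import minidom, Node

document = minidom.parseString(
    "<bookstore><!-- first book only -->"
    '<book><title lang="en">Everyday Italian</title>'
    "<year>2005</year></book></bookstore>"
)

root = document.documentElement            # element node <bookstore>
comment = root.firstChild                  # comment node
title = document.getElementsByTagName("title")[0]
lang = title.getAttributeNode("lang")      # attribute node
year = document.getElementsByTagName("year")[0]

# The element itself holds no text; its first child is the text node.
year_text = year.firstChild

print(document.nodeType == Node.DOCUMENT_NODE)
print(root.nodeType == Node.ELEMENT_NODE)
print(comment.nodeType == Node.COMMENT_NODE)
print(lang.nodeType == Node.ATTRIBUTE_NODE, lang.value)
print(year.nodeValue, year_text.data)
```

Reading year.nodeValue gives None, while the text lives in year.firstChild.data, which is why code that expects the element to contain the text directly goes wrong.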
Nodes have a hierarchical relationship to each other. All nodes in an XML document form a
document tree (or node tree). Each element, attribute, text, etc. in the XML document represents
a node in the tree. The tree starts at the document node and continues to branch out until it has
reached all text nodes at the lowest level of the tree. The terms "parent" and "child" are used to
describe the relationships between nodes. Some nodes may have child nodes, while other nodes
do not have children (leaf nodes). Because the XML data is structured in a tree form, it can be
traversed without knowing the exact structure of the tree and without knowing the type of data
DOM Node Hierarchy Example
Look at the following XML file: books.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<title lang="en">XQuery Kick Start</title>
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
Notice that the root element in the XML document above is named <bookstore>. All other
elements in the document are contained within <bookstore>. The <bookstore> element represents
the root node of the DOM tree. The root node <bookstore> holds four <book> child nodes. The
first <book> child node also holds four children: <title>, <author>, <year>, and <price>, each
of which contains one text node: "Everyday Italian", "Giada De Laurentiis", "2005", and "30.00".
Text is always stored in text nodes. A common error in DOM processing is to navigate to an
element node and expect it to contain the text. However, even the simplest element node has a
text node under it. For example, in <year>2005</year>, there is an element node (year), and a
text node under it, which contains the text (2005). The following image illustrates a fragment of
the DOM node tree from the XML document above:
SAX is a common interface implemented for many different XML parsers (and things that pose
as XML parsers), just as the JDBC is a common interface implemented for many different
relational databases (and things that pose as relational databases). If you want to use SAX, you'll
need all of the following:
Java 1.1 or higher.
A SAX2-compatible XML parser installed on your Java classpath.
The SAX2 distribution installed on your Java classpath. (This probably came with your
Most Java/XML tools distributions include SAX2 and a parser using it. Most web applications
servers use it for their core XML support. In particular, environments with JAXP 1.1 support
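The event-driven style that SAX defines can be sketched briefly. The text discusses the Java distribution; the example below uses Python's xml.sax module instead, which follows the same model of a parser calling back into a handler as it reads the document. The TitleCollector class and the sample document are made up for this illustration.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element as events arrive."""

    def __init__(self) -> None:
        super().__init__()
        self.titles = []
        self._buffer = None

    def startElement(self, name, attrs):
        # Begin buffering when a <title> start tag is reported.
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        # On the matching end tag, store the joined text.
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


handler = TitleCollector()
xml.sax.parseString(
    b"<bookstore><book><title>Everyday Italian</title></book>"
    b"<book><title>Learning XML</title></book></bookstore>",
    handler,
)
print(handler.titles)
```

Unlike the DOM, no tree is kept in memory; the handler sees each start tag, text chunk, and end tag once, in document order.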
CSS and XSL
XSL stands for EXtensible Stylesheet Language. The World Wide Web Consortium (W3C)
started to develop XSL because there was a need for an XML-based stylesheet language.
CSS = HTML Style Sheets
HTML uses predefined tags, and the meaning of the tags is well understood. The <table> element
in HTML defines a table, and a browser knows how to display it. Adding styles to HTML elements
is simple: telling a browser to display an element in a special font or color is easy with CSS.
XSL = XML Style Sheets
XML does not use predefined tags (we can use any tag names we like), and the meaning of these
tags is not well understood. A <table> element could mean an HTML table, a piece of furniture,
or something else, and a browser does not know how to display it. XSL describes how the XML
document should be displayed.
XSL - More Than a Style Sheet Language
XSL consists of three parts:
XSLT - a language for transforming XML documents
XPath - a language for navigating in XML documents
XSL-FO - a language for formatting XML documents
XHTML stands for EXtensible HyperText Markup Language
XHTML is aimed to replace HTML
XHTML is almost identical to HTML 4.01
XHTML is a stricter and cleaner version of HTML
XHTML is HTML defined as an XML application
XHTML is a W3C Recommendation
An XForms Processor built into the browser will be responsible for submitting the XForms data
to a target.
The data can be submitted as XML and could look something like this:
Or it can be submitted as text, looking something like this:
VoiceXML (VXML) is the W3C's standard XML format for specifying interactive voice
dialogues between a human and a computer. It allows voice applications to be developed and
deployed in an analogous way to HTML for visual applications. Just as HTML documents are
interpreted by a visual web browser, VoiceXML documents are interpreted by a voice browser.
A common architecture is to deploy banks of voice browsers attached to the public switched
telephone network (PSTN) so that users can use a telephone to interact with voice applications.
The following is an example of a VoiceXML document:
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
<form><block><prompt>Hello world</prompt></block></form></vxml>
When interpreted by a VoiceXML interpreter this will output "Hello world" with synthesized
speech.
Typically, HTTP is used as the transport protocol for fetching VoiceXML pages. Some
applications may use static VoiceXML pages, while others rely on dynamic VoiceXML page
generation using an application server like Tomcat, Weblogic, IIS, or WebSphere. In a well-
architected web application, the voice interface and the visual interface share the same back-end
Historically, VoiceXML platform vendors have implemented the standard in different ways, and
added proprietary features. But the VoiceXML 2.0 standard, adopted as a W3C
Recommendation 16 March 2004, clarified most areas of difference. The VoiceXML Forum, an
industry group promoting the use of the standard, provides a conformance testing process that
certifies vendors' implementations as conformant.
XLink defines a standard way of creating hyperlinks in XML documents. XPointer allows the
hyperlinks to point to more specific parts (fragments) in the XML document.
XLink is short for the XML Linking Language
XLink is a language for creating hyperlinks in XML documents
XLink is similar to HTML links - but it is a lot more powerful
ANY element in an XML document can behave as an XLink
XLink supports simple links (like HTML) and extended links (for linking multiple
With XLink, the links can be defined outside of the linked files
XLink is a W3C Recommendation
XPointer is short for the XML Pointer Language
XPointer allows the hyperlinks to point to specific parts of the XML document
XPointer uses XPath expressions to navigate in the XML document
XPointer is a W3C Recommendation
In HTML, we know (and all the browsers know!) that the <a> element defines a hyperlink.
However, this is not how it works with XML. In XML documents, you can use whatever element
names you want - therefore it is impossible for browsers to predict what hyperlink elements will
be called in XML documents. The solution for creating links in XML documents was to put a
marker on elements that should act as hyperlinks. Below is a simple example of how to use
XLink to create links in an XML document:
To get access to the XLink attributes and features we must declare the XLink namespace at the
top of the document.
The XLink namespace is: "http://www.w3.org/1999/xlink".
The xlink:type and the xlink:href attributes in the <homepage> elements define that the type and
href attributes come from the xlink namespace.
The xlink:type="simple" creates a simple, two-ended link (means "click from here to go there").
We will look at multi-ended (multidirectional) links later.
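How a program recognizes XLink markers can be sketched as follows. The XLink namespace and the <homepage> element name come from the text; the <homepages> wrapper, the link targets, and the link text are placeholders. Since the attributes belong to the XLink namespace, they are looked up by their namespace-qualified names.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# The wrapper element and the link targets are placeholders.
document = f"""
<homepages xmlns:xlink="{XLINK_NS}">
  <homepage xlink:type="simple" xlink:href="http://www.example.com/a">Site A</homepage>
  <homepage xlink:type="simple" xlink:href="http://www.example.com/b">Site B</homepage>
</homepages>
"""

root = ET.fromstring(document)

# Any element can act as a link; what marks it is the presence of
# attributes from the XLink namespace, not the element's own name.
links = [
    (el.get(f"{{{XLINK_NS}}}type"), el.get(f"{{{XLINK_NS}}}href"))
    for el in root.iter("homepage")
]
print(links)
```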
In HTML, we can create a hyperlink that either points to an HTML page or to a bookmark inside
an HTML page (using #).
Sometimes it is more useful to point to more specific content. For example, let's say that we want
to link to the third item in a particular list, or to the second sentence of the fifth paragraph. This
is easy with XPointer.
If the hyperlink points to an XML document, we can add an XPointer part after the URL in the
xlink:href attribute, to navigate (with an XPath expression) to a specific place in the document.
For example, in the example below we use XPointer to point to the fifth item in a list with a
unique id of "rock":
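A rough sketch of how such a link could be resolved. The link syntax shown, xpointer(id('rock')/item[5]), is one plausible form under the XPointer framework; the URL, the sample document, and the resolve helper are assumptions. The Python standard library has no XPointer processor, so the sketch approximates id('rock') with a lookup on an id attribute.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Hypothetical target document and link; names and URL are illustrative.
CD_LIST = """<cdlist>
  <list id="pop"><item>P1</item></list>
  <list id="rock">
    <item>R1</item><item>R2</item><item>R3</item><item>R4</item><item>R5</item>
  </list>
</cdlist>"""

HREF = "http://www.example.com/cdlist.xml#xpointer(id('rock')/item[5])"

# Only this one pointer shape is recognised by the sketch.
POINTER = re.compile(r"xpointer\(id\('([^']+)'\)/(\w+)\[(\d+)\]\)")


def resolve(href, xml_text):
    """Resolve the XPointer part of href against an already fetched document."""
    _url, fragment = urldefrag(href)
    match = POINTER.fullmatch(fragment)
    if match is None:
        raise ValueError("unsupported pointer: " + fragment)
    target_id, child_name, position = match.groups()
    root = ET.fromstring(xml_text)
    # Approximation: id('rock') is treated as "element whose id attribute is rock".
    return root.find(f".//*[@id='{target_id}']/{child_name}[{position}]")


print(resolve(HREF, CD_LIST).text)
```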
XPath is the result of an effort to provide a common syntax and semantics for functionality
shared between XSL Transformations [XSLT] and XPointer [XPointer]. The primary purpose of
XPath is to address parts of an XML [XML] document. In support of this primary purpose, it
also provides basic facilities for manipulation of strings, numbers and booleans. XPath uses a
compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values.
XPath operates on the abstract, logical structure of an XML document, rather than its surface
syntax. XPath gets its name from its use of a path notation as in URLs for navigating through the
hierarchical structure of an XML document.
In addition to its use for addressing, XPath is also designed so that it has a natural subset that can
be used for matching (testing whether or not a node matches a pattern); this use of XPath is
described in XSLT.
XPath models an XML document as a tree of nodes. There are different types of nodes, including
element nodes, attribute nodes and text nodes. XPath defines a way to compute a string-value for
each type of node. Some types of nodes also have names. XPath fully supports XML
Namespaces [XML Names]. Thus, the name of a node is modeled as a pair consisting of a local
part and a possibly null namespace URI; this is called an expanded-name. The data model is
described in detail in the Data Model section of the XPath specification.
The primary syntactic construct in XPath is the expression. An expression matches the
production Expr. An expression is evaluated to yield an object, which has one of the following
four basic types:
node-set (an unordered collection of nodes without duplicates)
boolean (true or false)
number (a floating-point number)
string (a sequence of UCS characters)
Expression evaluation occurs with respect to a context. XSLT and XPointer specify how the
context is determined for XPath expressions used in XSLT and XPointer respectively. The
context consists of:
a node (the context node)
a pair of non-zero positive integers (the context position and the context size)
a set of variable bindings
a function library
the set of namespace declarations in scope for the expression
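The role of the context node can be illustrated with a small sketch. Python's xml.etree.ElementTree supports only a limited subset of XPath and always returns a list of elements (roughly a node-set), so booleans, numbers and strings are not shown. The sample catalog is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data for illustration.
DOC = """<catalog>
  <cd><title>Album A</title><price>10.90</price></cd>
  <cd><title>Album B</title><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(DOC)

# Context node: the <catalog> element. The path selects every cd/title below it.
titles_from_root = [t.text for t in root.findall("cd/title")]

# Context node: the second <cd>. The relative path "title" now selects one node.
second_cd = root.findall("cd")[1]
titles_from_cd = [t.text for t in second_cd.findall("title")]

# The same expression yields nothing when the context node has no title child.
titles_direct = root.findall("title")

print(titles_from_root, titles_from_cd, titles_direct)
```

The same expression gives different results depending on the node it is evaluated against, which is why XSLT and XPointer must each define how the context is established.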
XQuery is the language for querying XML data
XQuery for XML is like SQL for databases
XQuery is built on XPath expressions
XQuery is supported by all the major database engines (IBM, Oracle, Microsoft, etc.)
XQuery is a W3C Recommendation
XQuery is a language for finding and extracting elements and attributes from XML
documents. Here is an example of a question that XQuery could solve: "Select all CD records with
a price less than $10 from the CD collection stored in the XML document called cd_catalog.xml"
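In XQuery this question could be written roughly as: for $x in doc("cd_catalog.xml")/catalog/cd where $x/price < 10 return $x. The Python standard library has no XQuery engine, so the sketch below is only an analog of that query using ElementTree; the catalog content and the cheap_cds helper are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the content of cd_catalog.xml.
CD_CATALOG = """<catalog>
  <cd><title>Album A</title><price>10.90</price></cd>
  <cd><title>Album B</title><price>9.90</price></cd>
  <cd><title>Album C</title><price>7.20</price></cd>
</catalog>"""


def cheap_cds(xml_text, limit=10.0):
    """Return the <cd> elements whose price is below limit."""
    root = ET.fromstring(xml_text)
    return [cd for cd in root.findall("cd")
            if float(cd.findtext("price")) < limit]


titles = [cd.findtext("title") for cd in cheap_cds(CD_CATALOG)]
print(titles)
```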
XQuery can be used to:
Extract information to use in a Web Service
Generate summary reports
Transform XML data to XHTML
Search Web documents for relevant information
XQuery is compatible with several W3C standards, such as XML, Namespaces, XSLT, XPath, and
XML Schema. XQuery 1.0 became a W3C Recommendation on January 23, 2007.
XML INFRASTRUCTURE TECHNOLOGIES
The XML Infoset is an abstract Data Model describing the information available from an XML
document. For many applications, this way of looking at an XML document is more useful than
having to analyze and interpret XML syntax. DOM describes an API through which the
information in an XML Infoset (i.e., the information available from a specific XML document)
can be accessed from different programming languages.
The XML Information Set (Infoset) defines a data model for XML. This data model is a set of abstractions
that detail the properties of XML trees. These abstractions provide a common viewpoint from which to think
about XML APIs and higher-level specifications such as XPath, XSLT and XML Schema, as shown in Figure 1.
RDF stands for Resource Description Framework
RDF is a framework for describing resources on the web
RDF provides a model for data, and a syntax so that independent parties can exchange
and use it
RDF is designed to be read and understood by computers
RDF is not designed for being displayed to people
RDF is written in XML
RDF is a part of the W3C's Semantic Web Activity
RDF is a W3C Recommendation
RDF - Examples of Use
Describing properties for shopping items, such as price and availability
Describing time schedules for web events
Describing information about web pages, such as content, author, created and modified
Describing content and rating for web pictures
Describing content for search engines
Describing electronic libraries
RDF is Designed to be Read by Computers
RDF was designed to provide a common way to describe information so it can be read and
understood by computer applications.
RDF descriptions are not designed to be displayed on the web.
RDF is Written in XML
RDF documents are written in XML. The XML language used by RDF is called RDF/XML.
By using XML, RDF information can easily be exchanged between different types of computers
using different types of operating systems and application languages.
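A small sketch of what an RDF/XML description looks like and how a program might read it. The rdf namespace URI is the standard one; the ex vocabulary, the page URL and the property values are assumptions made for illustration. A real application would normally use a dedicated RDF library rather than a plain XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical description of a web page (author and creation date).
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://www.example.com/terms#">
  <rdf:Description rdf:about="http://www.example.com/page.html">
    <ex:author>Jane Doe</ex:author>
    <ex:created>2001-10-06</ex:created>
  </rdf:Description>
</rdf:RDF>"""


def triples(xml_text):
    """Return (resource, property URI, value) for each simple property."""
    root = ET.fromstring(xml_text)
    result = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join them into one URI.
            prop_uri = prop.tag[1:].replace("}", "", 1)
            result.append((subject, prop_uri, prop.text))
    return result


for triple in triples(DOC):
    print(triple)
```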
RDF and "The Semantic Web"
The RDF language is a part of the W3C's Semantic Web Activity. W3C's "Semantic Web
Vision" is a future where:
Web information has exact meaning
Web information can be understood and processed by computers
Computers can integrate information from the web
OVERVIEW OF SOAP
SOAP allows Java objects and COM objects to talk to each other in a distributed, decentralized
environment. More generally, SOAP allows objects (or code) of any kind -- on any platform, in any
language -- to cross-communicate. At present, SOAP has been implemented in over 60 languages
on over 20 platforms.
SOAP is simply one component in the emerging picture of the Web as a standards-based,
language- and platform-neutral framework for business operations. These operations are
commonly lumped under the generic tag "Web services," but Web services themselves are only
as good as the infrastructure that supports them.
Three network tiers are evident in the evolution of Web services: TCP/IP, HTTP/HTML, and
XML. These tiers build successively on top of each other and remain compatible today.
The first tier, the TCP/IP protocol, is concerned primarily with passing data across the wire in
packets. A protocol that guarantees transmission across public networks, TCP/IP emphasizes
reliability of data transport and physical connectivity. Originally the putty holding proprietary
networks together, it's now the backbone protocol of the Web on which higher-level, standard
protocols such as HTTP rely.
The second tier, HTML over HTTP, is a presentation tier and concerns itself with browser-based
search, retrieval and sharing of information. The emphasis here is on GUI-based navigation and
the manipulation of presentation formats. In many ways, HTML is more show than go and lacks
both extensibility and true programming power. Nonetheless, sharing hypertext-linked
documents in a browser-based environment revolutionized the way humans communicate text-
based information to one another. Networked desktop environments, burdened with proprietary
operating systems and platform dependent software, are slowly but surely giving way to the
standards-based, open-systems computing of the Internet.
Leading the charge into this brave new standards-based world is XML, the third and possibly the
most compelling tier on the Internet. XML, a strongly-typed data interchange format, provides a
new dimension to the HTTP/HTML tier, one in which machine-to-machine communication is
made possible through standard interfaces. This layer -- variously described as A2A (application
to application), B2B (business to business) or C2C (computer to computer) -- allows programs to
exchange data formatted in a platform- and presentation-independent manner. XSLT style sheets
may be added as an optional presentation and/or transformational component.
SOAP is "a lightweight protocol for exchange of information in a decentralized, distributed
environment."
SOAP does not mandate a single programming model -- nor does it define language bindings for
a specific programming language. In the context of the Java programming language, it's up to the
Java community to define the specific language binding. Java language bindings are now being
pursued through the JAX-RPC initiative.
SOAP is an extensible, text-based framework for enabling communication between diverse
parties -- in general, objects -- that have no prior knowledge of each other or of each other's
platforms. From the point of view of objects on the net, SOAP is the ultimate blind date. Client
applications can interoperate in loosely-coupled environments to discover and connect
dynamically to services without any previous agreements having been established between them.
SOAP is extensible, because SOAP clients, servers and the protocol itself can evolve without
breaking existing apps. SOAP, moreover, is generous in terms of supporting intermediaries and
layered architectures. This means processing nodes can sit on the path a request takes between
the client and server. These intermediate nodes process parts of the message specified by SOAP
through the use of headers, which allow clients to identify which node works on what part of the
message. This type of intermediate header processing is performed by private contract between
the client application and the intermediate processing node. SOAP provides a mustUnderstand
attribute for headers, which allows the client to specify whether the processing is mandatory or
optional. If mustUnderstand is set to 1, the server must either perform the intermediate
processing or fail.
In order to fetch a web page for you, your web browser must "talk" to a web server somewhere
else. When web browsers talk to web servers, they speak a language known as HTTP, which
stands for HyperText Transfer Protocol. This language is actually very simple and
understandable and is not difficult for the human eye to follow.
A Simple HTTP Example
The browser says:
GET / HTTP/1.0
Host: www.boutell.com
And the server replies:
HTTP/1.0 200 OK
Content-Type: text/html
HTTP is the protocol that drives the WWW. It was conceived by Sir Tim Berners-Lee (that’s
right, they knighted him). The Web is based on the client-server programming model in which
the client (your browser) requests a resource (a Web page) from a server. A brief negotiation is
made and the server returns the resource, after which the browser renders the page and you
can view (or perhaps listen to) it.
The first line of the browser's request, GET / HTTP/1.0, indicates that the browser wants to see
the home page of the site, and that the browser is using version 1.0 of the HTTP protocol. The
second line, Host: www.boutell.com, indicates the web site that the browser is asking for. This is
required because many web sites may share the same IP address on the Internet and be hosted by
a single computer. The Host: line was added a few years after the original release of HTTP 1.0 in
order to accommodate this.
The first line of the server's reply, HTTP/1.0 200 OK, indicates that the server is also speaking
version 1.0 of the HTTP protocol, and that the request was successful. If the page the browser
asked for did not exist, the response would read HTTP/1.0 404 Not Found. The second line of
the server's reply, Content-Type: text/html, tells the browser that the object it is about to receive
is a web page. This is how the browser knows what to do with the response from the server. If
this line were Content-Type: image/png, the browser would know to expect a PNG image file
rather than a web page, and would display it accordingly.
A modern web browser would say a bit more using the HTTP 1.1 protocol, and a modern web
server would respond with a bit more information, but the differences are not dramatic and the
above transaction is still perfectly valid; if a browser made a request exactly like the one above
today, it would still be accepted by any web server, and the response above would still be
accepted by any browser. This simplicity is typical of most of the protocols that grew up around
the Internet.
In fact, you can try being a web browser yourself, if you are a patient typist. If you are using
Windows, click the Start menu, select "Run," and type "telnet www.mywebsitename.com 80" in
the dialog that appears. Then click OK. Users of other operating systems can do the same thing;
just start your own telnet program and connect to your web site as the host and 80 as the port
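number. When the connection is made, type the request shown in the example above (the GET line
followed by the Host: line) and finish with a blank line.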
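The same exchange can be sketched in Python instead of telnet. The helper names (build_request, parse_response, fetch) and the sample response are assumptions made for illustration; fetch opens a real network connection and is therefore defined but not called here.

```python
import socket


def build_request(host, path="/"):
    """Build the two-line HTTP/1.0 request described above, ending with a blank line."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_response(raw):
    """Split a raw response into status code, reason, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    _version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body


def fetch(host, port=80, path="/"):
    """Send the request over a TCP socket and read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_response(b"".join(chunks))


# A canned reply in the shape described in the text, parsed without any network.
SAMPLE = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
status, reason, headers, body = parse_response(SAMPLE)
print(status, reason, headers["content-type"])
```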
XML-RPC
Inside every computer, every time you click a key or the mouse, thousands of "procedure calls"
are spawned, analyzing, computing and then acting on your gestures.
A procedure call is the name of a procedure, its parameters, and the result it returns.
Every program is just a single procedure called main, every operating system has a main
procedure called a kernel. There's a top level to every program that sits in a loop waiting for
something to happen and then distributes control to a hierarchy of procedures that respond. This
is at the heart of interactivity and networking, it's at the heart of software.
What is RPC?
RPC is a very simple extension to the procedure call idea, it says let's create connections between
procedures that are running in different applications, or on different machines.
Conceptually, there's no difference between a local procedure call and a remote one, but they are
implemented differently, perform differently (RPC is much slower) and therefore are used for
different things.
Remote calls are "marshalled" into a format that can be understood on the other side of the
connection. As long as two machines agree on a format, they can talk to each other. That's why
Windows machines can be networked with other Windows machines, and Macs can talk to
Macs, etc. The value in a standardized cross-platform approach for RPC is that it allows Unix
machines to talk to Windows machines and vice versa.
What is XML-RPC?
There are an almost infinite number of formats possible. One possible format is XML, a new
language that both humans and computers can read. XML-RPC uses XML as the marshalling
format. It allows Macs to easily make procedure calls to software running on Windows machines
and BeOS machines, as well as all flavors of Unix and Java, and IBM mainframes, and PDAs
and sewing machines (they have computers in them too these days).
With XML it's easy to see what it's doing, and it's also relatively easy to marshall the internal
procedure call format into a remote format.
OK, now that we understand what XML-RPC is, let the XML part fade into the background. It's
an implementation detail. Programmers are interested in XML, as are web developers, but if
you're a user or an investor, XML is about as important as C++ or Java. The developers like it, or
seem to, and that's the only major take-away from the XML part of XML-RPC.
But RPC is important, no matter what format is used, because it allows choices, you can replace
a component with another one; and it opens possibilities, empowering advanced users to develop
solutions with packaged software that the developers didn't anticipate.
XML-RPC is among the simplest and most foolproof web service approaches, and makes it easy
for computers to call procedures on other computers.
XML-RPC permits programs to make function or procedure calls across a network.
XML-RPC uses the HTTP protocol to pass information from a client computer to a server
computer.
XML-RPC uses a small XML vocabulary to describe the nature of requests and responses.
XML-RPC clients specify a procedure name and parameters in the XML request, and the server
returns either a fault or a response in the XML response.
XML-RPC parameters are a simple list of types and content - structs and arrays are the most
complex types available.
XML-RPC has no notion of objects and no mechanism for including information that uses other
XML vocabularies.
With XML-RPC and web services, however, the Web becomes a collection of procedural
connections where computers exchange information along tightly bound paths.
XML-RPC emerged in early 1998; it was published by UserLand Software and initially
implemented in their Frontier product.
XML-RPC consists of three relatively small parts:
XML-RPC data model
A set of types for use in passing parameters, return values, and faults (error messages)
XML-RPC request structures
An HTTP POST request containing method and parameter information
XML-RPC response structures
An HTTP response that contains return values or fault information
The XML-RPC specification defines six basic data types and two compound data types that
represent combinations of types.
Basic data types in XML-RPC
Type               Value                                      Examples
int or i4          32-bit integers between -2,147,483,648     <int>27</int>
                   and 2,147,483,647                          <i4>27</i4>
double             64-bit floating-point numbers
boolean            true (1) or false (0)
string             ASCII text, though many implementations    <string>Hello</string>
                   support Unicode                            <string>bonkers! @</string>
dateTime.iso8601   Dates in ISO8601 format
base64             Binary information encoded as Base 64,
                   as defined in RFC 2045
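As a quick check of the six basic types, the sketch below lets Python's standard xmlrpc.client module marshal one value of each kind and lists the element names it emits. The sample values are arbitrary.

```python
import datetime
import xml.etree.ElementTree as ET
import xmlrpc.client

# One Python value per basic XML-RPC type (arbitrary sample values).
values = (
    27,                                          # int
    27.5,                                        # double
    True,                                        # boolean
    "Hello",                                     # string
    datetime.datetime(2001, 10, 6, 23, 20, 4),   # dateTime.iso8601
    b"raw bytes",                                # base64
)

payload = xmlrpc.client.dumps(values)

# Each parameter is wrapped in a <value> element whose first child names the type.
type_tags = [value[0].tag for value in ET.fromstring(payload).iter("value")]
print(type_tags)
```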
These basic types are always enclosed in value elements. Strings (and only strings) may be
enclosed in a value element but omit the string element. These basic types may be combined into
two more complex types, arrays and structs. Arrays represent sequential information, while
structs represent name-value pairs, much like hashtables, associative arrays, or properties.
Arrays are indicated by the array element, which contains a data element holding the list of
values. Like other data types, the array element must be enclosed in a value element. For
example, the following array contains four strings:
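A sketch of such an array, generated with Python's standard xmlrpc.client module rather than written by hand. The four strings are placeholders; a struct is marshalled alongside to show the name-value form.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client

names = ["Alpha", "Beta", "Gamma", "Delta"]   # placeholder strings

# A Python list is marshalled as <value><array><data>...</data></array></value>.
array_xml = xmlrpc.client.dumps((names,))
print(array_xml)

root = ET.fromstring(array_xml)
items = [s.text for s in root.findall("param/value/array/data/value/string")]

# A Python dict is marshalled as a struct of <member> name-value pairs.
struct_xml = xmlrpc.client.dumps(({"name": "Alpha", "rank": 1},))
member_names = sorted(
    m.findtext("name") for m in ET.fromstring(struct_xml).iter("member")
)
```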
XML-RPC requests are a combination of XML content and HTTP headers. The XML content
uses the data typing structure to pass parameters and contains additional information identifying
which procedure is being called, while the HTTP headers provide a wrapper for passing the
request over the Web.
Each request contains a single XML document, whose root element is a methodCall element.
Each methodCall element contains a methodName element and a params element. The
methodName element identifies the name of the procedure to be called, while the params
element contains a list of parameters and their values. Each params element includes a list of
param elements which in turn contain value elements.
For example, to pass a request to a method called circleArea, which takes a Double parameter
(for the radius), the XML-RPC request would look like:
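A sketch of that request body, produced with Python's standard xmlrpc.client module. The radius value 2.41 is an assumption chosen for illustration.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client

# Marshal a call to circleArea with a single double parameter (assumed radius).
request_xml = xmlrpc.client.dumps((2.41,), methodname="circleArea")
print(request_xml)

root = ET.fromstring(request_xml)
method_name = root.findtext("methodName")
radius_text = root.findtext("params/param/value/double")
```

The printed document has methodCall as its root, with one methodName element and one params element, as described above.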
The HTTP headers for these requests will reflect the senders and the content. The basic template
is as follows:
POST /target HTTP/1.0
User-Agent: Identifier
Host: host.making.request
Content-Type: text/xml
Content-Length: length of request in bytes
For example, if the circleArea method were available from an XML-RPC server listening at
/xmlrpc, the request might look like:
POST /xmlrpc HTTP/1.0
Assembled, the entire request would look like:
POST /xmlrpc HTTP/1.0
It's an ordinary HTTP request, with a carefully constructed payload.
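A sketch of how the headers and payload fit together. The build_http_request helper, the User-Agent string and the host name are assumptions; the Content-Length is computed from the encoded payload rather than typed in.

```python
import xmlrpc.client


def build_http_request(path, host, payload):
    """Wrap an XML-RPC payload in an HTTP/1.0 POST request."""
    body = payload.encode("utf-8")
    head = [
        f"POST {path} HTTP/1.0",
        "User-Agent: example-xmlrpc-client/1.0",   # hypothetical identifier
        f"Host: {host}",
        "Content-Type: text/xml",
        f"Content-Length: {len(body)}",            # length of the body in bytes
    ]
    # A blank line separates the headers from the payload.
    return "\r\n".join(head).encode("ascii") + b"\r\n\r\n" + body


payload = xmlrpc.client.dumps((2.41,), methodname="circleArea")
request = build_http_request("/xmlrpc", "localhost", payload)
print(request.decode("utf-8"))
```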
Responses are much like requests, with a few extra twists. If the response is successful - the
procedure was found, executed correctly, and returned results - then the XML-RPC response will
look much like a request, except that the methodCall element is replaced by a methodResponse
element and there is no methodName element:
An XML-RPC response can only contain one parameter.
That parameter may be an array or a struct, so it is possible to return multiple values
It is always required to return a value in response. A "success value" - perhaps a boolean set to
true (1) - can be used when there is nothing else to return.
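Both kinds of response can be sketched with Python's standard xmlrpc.client module. The return value and the fault code and text are made up for illustration.

```python
import xmlrpc.client

# A successful response carries exactly one parameter.
ok_xml = xmlrpc.client.dumps((18.25,), methodresponse=True)
params, method_name = xmlrpc.client.loads(ok_xml)
print(params)

# A fault response carries a fault code and string instead of a parameter.
fault_xml = xmlrpc.client.dumps(
    xmlrpc.client.Fault(4, "Too many parameters."), methodresponse=True
)
try:
    xmlrpc.client.loads(fault_xml)
    fault = None
except xmlrpc.client.Fault as exc:
    # loads() turns a fault response into a Python exception.
    fault = exc
print(fault.faultCode, fault.faultString)
```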
Like requests, responses are packaged in HTTP and have HTTP headers. All XML-RPC
responses use the 200 OK response code, even if a fault is contained in the message. Headers use
a common structure similar to that of requests, and a typical set of headers might look like:
HTTP/1.1 200 OK
Date: Sat, 06 Oct 2001 23:20:04 GMT
Server: Apache/1.3.12 (Unix)
Content-Type: text/xml
XML-RPC only requires HTTP 1.0 support, but HTTP 1.1 is compatible.
The Content-Type must be set to text/xml
The Content-Length header specifies the length of the response in bytes.
A complete response, with both headers and a response payload, would look like:
HTTP/1.1 200 OK
Date: Sat, 06 Oct 2001 23:20:04 GMT
Server: Apache/1.3.12 (Unix)
Content-Type: text/xml
After the response is delivered from the XML-RPC server to the XML-RPC client, the
connection is closed. Follow-up requests need to be sent as separate XML-RPC connections.
SOAP is a simple XML-based protocol to let applications exchange information over HTTP.
SOAP stands for Simple Object Access Protocol
SOAP is a communication protocol
SOAP is for communication between applications
SOAP is a format for sending messages
SOAP is designed to communicate via Internet
SOAP is platform independent
SOAP is language independent
SOAP is based on XML
SOAP is simple and extensible
SOAP allows you to get around firewalls
SOAP is a W3C Recommendation (SOAP 1.2, 2003)
It is important for application development to allow Internet communication between programs.
Today's applications communicate using Remote Procedure Calls (RPC) between objects like
DCOM and CORBA, but HTTP was not designed for this. RPC represents a compatibility and
security problem; firewalls and proxy servers will normally block this kind of traffic.
A better way to communicate between applications is over HTTP, because HTTP is supported by
all Internet browsers and servers. SOAP was created to accomplish this.
SOAP provides a way to communicate between applications running on different operating
systems, with different technologies and programming languages
A SOAP message is an ordinary XML document containing the following elements:
A required Envelope element that identifies the XML document as a SOAP message
An optional Header element that contains header information
A required Body element that contains call and response information
An optional Fault element that provides information about errors that occurred while processing
the message
All the elements above are declared in the default namespace for the SOAP envelope:
http://www.w3.org/2001/12/soap-envelope
Here are some important syntax rules:
1. A SOAP message MUST be encoded using XML
2. A SOAP message MUST use the SOAP Envelope namespace
3. A SOAP message MUST use the SOAP Encoding namespace
4. A SOAP message must NOT contain a DTD reference
5. A SOAP message must NOT contain XML Processing Instructions
The SOAP Envelope Element
The required SOAP Envelope element is the root element of a SOAP message. It defines the
XML document as a SOAP message.
Note the use of the xmlns:soap namespace. It should always have the value of:
http://www.w3.org/2001/12/soap-envelope
and it defines the Envelope as a SOAP Envelope. The message information goes inside the
Envelope element.
The xmlns:soap Namespace
A SOAP message must always have an Envelope element associated with the
"http://www.w3.org/2001/12/soap-envelope" namespace.
If a different namespace is used, the application must generate an error and discard the message.
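A minimal sketch of an envelope skeleton and of the namespace check described above. It uses the namespace URI given in these notes; the final SOAP 1.2 Recommendation uses http://www.w3.org/2003/05/soap-envelope instead. The helper names are assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace URI as used in these notes.
SOAP_ENV = "http://www.w3.org/2001/12/soap-envelope"


def make_envelope(with_header=False):
    """Build an empty envelope: optional Header first, then the required Body."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    if with_header:
        ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")
    ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    return envelope


def is_soap_envelope(xml_text):
    """True only if the root is an Envelope in the expected namespace."""
    return ET.fromstring(xml_text).tag == f"{{{SOAP_ENV}}}Envelope"


skeleton = ET.tostring(make_envelope(with_header=True), encoding="unicode")
print(skeleton)
print(is_soap_envelope(skeleton))
print(is_soap_envelope('<Envelope xmlns="urn:some-other-namespace"/>'))
```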
The SOAP Header Element
The optional SOAP Header element contains application specific information (like
authentication, payment, etc) about the SOAP message. If the Header element is present, it must
be the first child element of the Envelope element.
All immediate child elements of the Header element must be namespace-qualified.
For example, a header can carry a "Trans" element with a "mustUnderstand" attribute
value of "1" and a value of 234.
SOAP defines three attributes in the default namespace
("http://www.w3.org/2001/12/soap-envelope"). These attributes are: actor, mustUnderstand, and
encodingStyle. The attributes defined in the SOAP Header define how a recipient should process
the SOAP message.
The actor Attribute
A SOAP message may travel from a sender to a receiver by passing different endpoints along the
message path. Not all parts of the SOAP message may be intended for the ultimate endpoint of
the SOAP message but, instead, may be intended for one or more of the endpoints on the
message path.
The SOAP actor attribute may be used to address the Header element to a particular endpoint.
The mustUnderstand Attribute
The SOAP mustUnderstand attribute can be used to indicate whether a header entry is mandatory
or optional for the recipient to process.
If you add mustUnderstand="1" to a child element of the Header element it indicates that the
receiver processing the Header must recognize the element. If the receiver does not recognize the
element it must fail when processing the Header.
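A sketch of a message with a Trans header entry and of the mustUnderstand check a receiver might perform. The transaction namespace URI and the not_understood helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2001/12/soap-envelope"
TRANS = "{http://www.example.com/transaction/}Trans"   # hypothetical namespace

MESSAGE = """<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope">
  <soap:Header>
    <m:Trans xmlns:m="http://www.example.com/transaction/"
             soap:mustUnderstand="1">234</m:Trans>
  </soap:Header>
  <soap:Body/>
</soap:Envelope>"""


def not_understood(xml_text, understood):
    """List mandatory header entries that this receiver does not recognize."""
    header = ET.fromstring(xml_text).find(f"{{{SOAP_ENV}}}Header")
    if header is None:
        return []
    return [
        entry.tag
        for entry in header
        if entry.get(f"{{{SOAP_ENV}}}mustUnderstand") == "1"
        and entry.tag not in understood
    ]


# A receiver that knows nothing about Trans must fail.
print(not_understood(MESSAGE, understood=set()))
# A receiver that recognizes Trans may go on to process the message.
print(not_understood(MESSAGE, understood={TRANS}))
```

A non-empty result is the situation in which the receiver has to fail while processing the Header.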
The SOAP Body Element
The required SOAP Body element contains the actual SOAP message intended for the ultimate
endpoint of the message.
Immediate child elements of the SOAP Body element may be namespace-qualified. SOAP
defines one element inside the Body element in the default namespace
("http://www.w3.org/2001/12/soap-envelope"). This is the SOAP Fault element, which is used to
indicate error messages.
For example, a request for the price of apples could carry an m:GetPrice element that contains
an Item element. Note that m:GetPrice and Item are application-specific elements. They are not
a part of the SOAP standard.
A SOAP response is built the same way, with an application-specific response element inside the
Body.
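A sketch of such a request and of a handler that builds the matching response. GetPrice and Item come from the text; the m namespace URI, the GetPriceResponse and Price element names, the price table and the handle function are assumptions.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2001/12/soap-envelope"
M_NS = "http://www.example.com/prices"   # hypothetical application namespace

REQUEST = """<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope">
  <soap:Body>
    <m:GetPrice xmlns:m="http://www.example.com/prices">
      <m:Item>Apples</m:Item>
    </m:GetPrice>
  </soap:Body>
</soap:Envelope>"""


def handle(request_xml, prices):
    """Read the Item from a GetPrice request and wrap its price in a response."""
    body = ET.fromstring(request_xml).find(f"{{{SOAP_ENV}}}Body")
    get_price = body.find(f"{{{M_NS}}}GetPrice")
    item = get_price.findtext(f"{{{M_NS}}}Item")

    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    out_body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    response = ET.SubElement(out_body, f"{{{M_NS}}}GetPriceResponse")
    ET.SubElement(response, f"{{{M_NS}}}Price").text = str(prices[item])
    return ET.tostring(envelope, encoding="unicode")


reply = handle(REQUEST, {"Apples": "1.90"})   # made-up price table
print(reply)
```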
The SOAP Fault Element
An error message from a SOAP message is carried inside a Fault element.
If a Fault element is present, it must appear as a child element of the Body element. A Fault
element can only appear once in a SOAP message.
The SOAP Fault element has the following sub elements:
Sub Element     Description
<faultcode>     A code for identifying the fault
<faultstring>   A human readable explanation of the fault
<faultactor>    Information about who caused the fault to happen
<detail>        Holds application specific error information related to the Body element
The faultcode values defined below must be used in the faultcode element when describing
faults:
Error             Description
VersionMismatch   Found an invalid namespace for the SOAP Envelope element
MustUnderstand    An immediate child element of the Header element, with the
                  mustUnderstand attribute set to "1", was not understood
Client            The message was incorrectly formed or contained incorrect information
Server            There was a problem with the server so the message could not proceed
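A sketch of a fault message and of how a client might read it. The fault text and the read_fault helper are assumptions; the sub elements are written unqualified, as in the table above.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2001/12/soap-envelope"

# Hypothetical fault: the client sent a badly formed request.
FAULT = """<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope">
  <soap:Body>
    <soap:Fault>
      <faultcode>soap:Client</faultcode>
      <faultstring>Item element is missing</faultstring>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>"""


def read_fault(xml_text):
    """Return the Fault sub elements as a dict, or None if there is no Fault."""
    root = ET.fromstring(xml_text)
    fault = root.find(f"{{{SOAP_ENV}}}Body/{{{SOAP_ENV}}}Fault")
    if fault is None:
        return None
    return {child.tag: (child.text or "").strip() for child in fault}


print(read_fault(FAULT))
```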
SOAP WITH ATTACHMENTS
You can associate a SOAP message with one or more attachments in their native format (for
example GIF or JPEG) by using a multipart MIME structure for transport. There are two core
standards that define how to do this:
SOAP with Attachments (SwA) or MIME for Web Services refers to the method of using Web
Services to send and receive files using a combination of SOAP and MIME, primarily over
HTTP.
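The multipart MIME structure can be sketched with Python's standard email package. This only illustrates the packaging idea; the Content-ID values, the envelope content and the placeholder image bytes are assumptions, and a real SwA stack may set further parameters.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

SOAP_XML = (
    '<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope">'
    "<soap:Body/></soap:Envelope>"
)
IMAGE_BYTES = b"GIF89a"   # placeholder bytes, not a complete image

# Root part: the SOAP message itself.
soap_part = MIMEText(SOAP_XML, "xml", "utf-8")
soap_part.add_header("Content-ID", "<soap-envelope>")

# Attachment part: the file in its native format.
image_part = MIMEImage(IMAGE_BYTES, "gif")
image_part.add_header("Content-ID", "<picture>")

# multipart/related ties the attachment to the SOAP root part.
package = MIMEMultipart("related", type="text/xml", start="<soap-envelope>")
package.attach(soap_part)
package.attach(image_part)

print(package.get_content_type())
print([part.get_content_type() for part in package.get_payload()])
```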
Web Services Definition by W3C
● A Web service is a software application
● identified by a URI,
● whose interfaces and binding are capable of being defined, described and discovered by
XML artifacts and
● supports direct interactions with other software applications
● using XML based messages
● via internet-based protocols
Characteristics of Web Services
● XML based everywhere
● Programming language independent
● Could be dynamically located
● Could be dynamically assembled or aggregated
● Accessed over the internet
● Loosely coupled
● Based on industry standards
● Are platform neutral
● Are accessible in a standard way
● Are accessible in an interoperable way
● Use simple and ubiquitous plumbing
● Are relatively cheap
● Simplify enterprise integration
Why Web Services?
● Interoperable – Connect across heterogeneous networks using ubiquitous web-based standards
● Economical – Recycle components, no installation and tight integration of software
● Automatic – No human intervention required even for highly complex transactions
● Accessible – Legacy assets & internal apps are exposed and accessible on the web
● Available – Services on any device, anywhere, anytime
● Scalable – No limits on scope of applications and amount of heterogeneous applications
WEB SERVICE ARCHITECTURE AND KEY TECHNOLOGIES
Web services are software components that can be accessed over the Web through standards-
based protocols such as HTTP or SMTP for use in other applications. They provide a
fundamentally new framework and set of standards for a computing environment that can
include servers, workstations, desktop clients, and lightweight "pervasive" clients such as phones
and PDAs. Web services are not limited to the Internet; they supply a powerful architecture for
all types of distributed computing.
Web services standards are the glue that allows computers and devices to interact, forming a
greater computing whole that can be accessed from any device on the network.
In Web services, computing nodes have three roles—client, service, and broker.
o A client is any computer that accesses functions from one or more other computing
nodes on the network. Typical clients include desktop computers, Web browsers, Java
applets, and mobile devices. A client process makes a request for a computing service
and receives results for that request.
o A service is a computing process that receives and responds to requests and returns a set
of results.
o A broker is essentially a service metadata portal for registering and discovering services.
Any network client can search the portal for an appropriate service.
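The three roles can be sketched in a few lines of in-process Python. This is a toy model only: the Broker class, its methods and the circle_area service are assumptions, and no network or real registry protocol is involved.

```python
import math


class Broker:
    """Service metadata portal: services register, clients search."""

    def __init__(self):
        self._entries = {}

    def register(self, name, description, service):
        self._entries[name] = (description, service)

    def search(self, keyword):
        """Return the names of services whose description mentions keyword."""
        keyword = keyword.lower()
        return [
            name
            for name, (description, _service) in self._entries.items()
            if keyword in description.lower()
        ]

    def lookup(self, name):
        return self._entries[name][1]


def circle_area(radius):
    """Service: receives a request and returns a result."""
    return math.pi * radius ** 2


# The service registers itself with the broker.
broker = Broker()
broker.register("circleArea", "Computes the area of a circle", circle_area)

# The client discovers a suitable service, then calls it.
matches = broker.search("area")
result = broker.lookup(matches[0])(2.0)
print(matches, result)
```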
Because Web services can support the integration of information and services that are
maintained on a distributed network, they are appealing to local governments and other
organizations that have departments that independently collect and manage spatial data but must
integrate these datasets.
A series of protocols—eXtensible Markup Language (XML); Simple Object Access Protocol
(SOAP); Web Service Description Language (WSDL); and Universal Description, Discovery,
and Integration (UDDI)—provides the key standards for Web services and supports sophisticated
communications between various nodes on a network. These protocols enable smarter
communication and collaborative processing among nodes built within any Web services-
UDDI allows clients to discover Web services. In a GIS context, the UDDI node plays the role of
a metadata server for registered Web services. A user can search the UDDI directory and locate
the distributed service providers or services that exist on a network.
Web services interoperate (i.e., communicate) through an XML-based protocol known as SOAP.
This is an XML API for the functions provided by a Web service. Each Web service advertises
its SOAP API using WSDL that allows easy discovery of any service's capabilities.
Web services provide an open, interoperable, and highly efficient framework for implementing
systems. Software components communicate with each other via standard SOAP and XML
protocols. A developer need only wrap an application with a SOAP API and it can talk (either
calling or serving) with other applications. Web services are efficient because they build on the
stateless (i.e., loosely coupled) environment of the Internet. A number of nodes can be
dynamically connected only when needed to carry out a specific task such as updating a database
or providing a particular service.
Universal Description, Discovery and Integration (UDDI) is a platform-independent, XML-
based registry for businesses worldwide to list themselves on the Internet.
UDDI is an open industry initiative, sponsored by OASIS, enabling businesses to publish service
listings and discover each other and define how the services or software applications interact
over the Internet.
A UDDI business registration consists of three components:
White Pages — address, contact, and known identifiers;
Yellow Pages — industrial categorizations based on standard taxonomies;
Green Pages — technical information about services exposed by the business.
UDDI is one of the core Web services standards
It is designed to be interrogated by SOAP messages and to provide access to Web Services
Description Language documents describing the protocol bindings and message formats required
to interact with the web services listed in its directory.
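UDDI was written in August 2000, at a time when its authors envisioned web service consumers
being linked with providers through a public or private dynamic brokerage system. In such a
world, the publicly operated UDDI node or broker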
would be critical for everyone. For the consumer, public or open brokers would only return
services listed for public discovery by others, while for a service producer, getting a good
placement, by relying on metadata of authoritative index categories, in the brokerage would be
critical for effective placement.
The UDDI was integrated into the Web Services Interoperability (WS-I) standard as a central
pillar of web services infrastructure. By the end of 2005, it was on the agenda for use by more
than seventy percent of the Fortune 500 companies in either a public or private implementation,
and particularly among those enterprises that seek to optimize software or service reuse. Many of
these enterprises subscribe to some form of service-oriented architecture (SOA), server programs
or database software licensed by some of the professed founders of the UDDI.org and OASIS.
The UDDI specifications supported a publicly accessible Universal Business Registry in which a naming
system was built around the UDDI-driven service broker. IBM, Microsoft and SAP announced they were
closing their public UDDI nodes in January 2006.
Some assert that the most common place that a UDDI system can be found is inside a company
where it is used to dynamically bind client systems to implementations. They would say that
much of the search metadata permitted in UDDI is not used for this relatively simple role.
However, the core of the trade infrastructure under UDDI, when deployed in the Universal
Business Registries (now being disabled), has made all the information available to any client
application, regardless of heterogeneous computing domains.
UDDI registries come in two forms: public and private. Both types comply with the same
specifications.
A private registry enables you to publish and test your internal e-business applications in a
secure, private environment. Rational® Developer products include a private UDDI registry.
A public registry is a collection of peer directories that contain information about businesses and
services. It locates services that are registered at one of its peer nodes and facilitates the
discovery of published Web services.
Data is replicated at each of the registries on a regular basis. This ensures consistency in service
description formats and makes it easy to track changes as they occur.
What is WSDL?
WSDL stands for Web Services Description Language
WSDL is written in XML
WSDL is an XML document
WSDL is used to describe Web services
WSDL is also used to locate Web services
WSDL 1.1 is a W3C Note, not a W3C standard (WSDL 2.0 became a W3C Recommendation in 2007)
WSDL is a document written in XML. The document describes a Web service. It specifies the
location of the service and the operations (or methods) the service exposes.
WSDL 1.1 was submitted as a W3C Note by Ariba, IBM and Microsoft for describing services
for the W3C XML Activity on XML Protocols in March 2001.
(A W3C Note is made available by the W3C for discussion only. Publication of a Note by W3C
indicates no endorsement by W3C or the W3C Team, or any W3C Members.)
The first Working Draft of WSDL 1.2 was released by W3C in July 2002.
A WSDL document is just a simple XML document.
It contains a set of definitions to describe a web service.
The WSDL Document Structure
A WSDL document describes a web service using these major elements:
<portType> The operations performed by the web service
<message> The messages used by the web service
<types> The data types used by the web service
<binding> The communication protocols used by the web service
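The main structure of a WSDL document looks like this:
<definitions>
<types>
  definition of types........
</types>
<message>
  definition of a message....
</message>
<portType>
  definition of a port.......
</portType>
<binding>
  definition of a binding....
</binding>
</definitions>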
A WSDL document can also contain other elements, like extension elements and a service
element that makes it possible to group together the definitions of several web services in one
single WSDL document.
The <portType> element is the most important WSDL element.
It describes a web service, the operations that can be performed, and the messages that are
involved.
The <portType> element can be compared to a function library (or a module, or a class) in a
traditional programming language.
The <message> element defines the data elements of an operation.
Each message can consist of one or more parts. The parts can be compared to the parameters of a
function call in a traditional programming language.
The <types> element defines the data types that are used by the web service.
For maximum platform neutrality, WSDL uses XML Schema syntax to define data types.
The <binding> element defines the message format and protocol details for each port.
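This is a simplified fragment of a WSDL document:
<message name="getTermRequest">
  <part name="term" type="xs:string"/>
</message>
<message name="getTermResponse">
  <part name="value" type="xs:string"/>
</message>
<portType name="glossaryTerms">
  <operation name="getTerm">
    <input message="getTermRequest"/>
    <output message="getTermResponse"/>
  </operation>
</portType>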
In this example the <portType> element defines "glossaryTerms" as the name of a port, and
"getTerm" as the name of an operation.
The "getTerm" operation has an input message called "getTermRequest" and an output message
called "getTermResponse".
The <message> elements define the parts of each message and the associated data types.
Compared to traditional programming, "glossaryTerms" is a function library, and "getTerm" is a
function with "getTermRequest" as the input parameter and "getTermResponse" as the return
parameter.
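The function-library analogy can be checked mechanically. The following sketch, using only the Python standard library, reads a WSDL fragment with the names discussed above (glossaryTerms, getTerm, getTermRequest, getTermResponse) and lists each operation with its input and output messages. The enclosing definitions element and its namespace declarations are added here so that the fragment parses on its own; the helper name describe_operations is hypothetical.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

WSDL_FRAGMENT = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <message name="getTermRequest">
    <part name="term" type="xs:string"/>
  </message>
  <message name="getTermResponse">
    <part name="value" type="xs:string"/>
  </message>
  <portType name="glossaryTerms">
    <operation name="getTerm">
      <input message="getTermRequest"/>
      <output message="getTermResponse"/>
    </operation>
  </portType>
</definitions>
"""


def describe_operations(wsdl_text):
    """Map 'portType.operation' to its (input message, output message) pair."""
    ns = {"w": WSDL_NS}
    root = ET.fromstring(wsdl_text)
    operations = {}
    for port_type in root.findall("w:portType", ns):
        for op in port_type.findall("w:operation", ns):
            key = f'{port_type.get("name")}.{op.get("name")}'
            operations[key] = (
                op.find("w:input", ns).get("message"),
                op.find("w:output", ns).get("message"),
            )
    return operations


ops = describe_operations(WSDL_FRAGMENT)
print(ops)
```

Tools that generate client stubs from WSDL do essentially this lookup, then go on to read the message parts and the binding to produce callable functions.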
Web Service Caveat
1. Different implementations may not work together
2. SOAP messages on port 80 may bypass firewalls
3. Transactions must be specified outside the web services framework
4. Change management is not addressed
Electronic Business using eXtensible Markup Language, commonly known as e-business XML,
or ebXML is a family of XML based standards sponsored by OASIS and UN/CEFACT whose
mission is to provide an open, XML-based infrastructure that enables the global use of electronic
business information in an interoperable, secure, and consistent manner by all trading partners.
The ebXML architecture is a unique set of concepts; part theoretical and part implemented in the
existing ebXML standards work.
The ebXML work stemmed from earlier work on ooEDI (object oriented EDI), UML / UMM,
XML markup technologies and the X12 EDI "Future Vision" work sponsored by ANSI X12
ebXML was started in 1999 as a joint initiative between the United Nations Centre for Trade
Facilitation and Electronic Business (UN/CEFACT) and the Organization for the Advancement of
Structured Information Standards (OASIS). A joint coordinating committee composed of
representatives from each of the two organizations led the effort. Quarterly meetings of the
working groups were held between November 1999 and May 2001. At the final plenary a
Memorandum of Understanding was signed by the two organizations, splitting up responsibility
for the various specifications but continuing oversight by the joint coordinating committee.
The original project envisioned five layers of data specification, including XML standards for:
· Business processes,
· Collaboration protocol agreements,
· Core data components,
· Messaging,
· Registries and repositories.
After completion of the specifications by the two organizations, the work was submitted to ISO
TC 154 for approval. The International Organization for Standardization (ISO) has approved the
following five ebXML specifications as the ISO 15000 standard, under the general title,
Electronic business eXtensible markup language:
· ISO 15000-1: ebXML Collaborative Partner Profile Agreement
· ISO 15000-2: ebXML Messaging Service Specification
· ISO 15000-3: ebXML Registry Information Model
· ISO 15000-4: ebXML Registry Services Specification
· ISO 15000-5: ebXML Core Components Technical Specification, Version 2.01.
Registry: A central server that stores a variety of data necessary to make ebXML work. Amongst
the information a Registry makes available in XML form are: Business Process & Information
Meta Models, Core Library, Collaboration Protocol Profiles, and Business Library. Basically,
when a business wants to start an ebXML relationship with another business, it queries a
Registry in order to locate a suitable partner and to find information about requirements for
dealing with that partner.
Business Processes: Activities that a business can engage in (and for which it would generally
want one or more partners). A Business Process is formally described by the Business Process
Specification Schema (a W3C XML Schema and also a DTD), but may also be modeled in UML.
Collaboration Protocol Profile (CPP): A profile filed with a Registry by a business wishing to
engage in ebXML transactions. The CPP will specify some Business Processes of the business,
as well as some Business Service Interfaces it supports.
Business Service Interface: The ways that a business is able to carry out the transactions
necessary in its Business Processes. The Business Service Interface also includes the kinds of
Business Messages the business supports and the protocols over which these messages might
travel.
Business Messages: The actual information communicated as part of a business transaction. A
message will contain multiple layers. At the outside layer, an actual communication protocol
must be used (such as HTTP or SMTP). SOAP is an ebXML recommendation as an envelope for
a message "payload." Other layers may deal with encryption or authentication.
Core Library: A set of standard "parts" that may be used in larger ebXML elements. For
example, Core Processes may be referenced by Business Processes. The Core Library is
contributed by the ebXML initiative itself, while larger elements may be contributed by specific
industries or businesses.
Collaboration Protocol Agreement (CPA): In essence, a contract between two or more
businesses that can be derived automatically from the CPPs of the respective companies. If a
CPP says "I can do X," a CPA says "We will do X together."
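The idea of deriving a CPA from two CPPs can be sketched as an intersection of capabilities. The example below is a deliberately simplified model: real CPPs and CPAs are XML documents defined by the ebXML specifications, whereas the dictionaries, field names, party names, and the derive_cpa helper here are all hypothetical.

```python
def derive_cpa(cpp_a, cpp_b):
    """Return the processes and transports both profiles support, or None."""
    shared_processes = sorted(set(cpp_a["processes"]) & set(cpp_b["processes"]))
    shared_transports = sorted(set(cpp_a["transports"]) & set(cpp_b["transports"]))
    if not shared_processes or not shared_transports:
        return None  # no common ground, so no agreement can be formed
    return {
        "parties": [cpp_a["party"], cpp_b["party"]],
        "processes": shared_processes,
        "transports": shared_transports,
    }


buyer_cpp = {
    "party": "BuyerCo",
    "processes": ["PlaceOrder", "RequestQuote"],
    "transports": ["HTTP", "SMTP"],
}
seller_cpp = {
    "party": "SellerCo",
    "processes": ["PlaceOrder", "ShipNotice"],
    "transports": ["HTTP"],
}

cpa = derive_cpa(buyer_cpp, seller_cpp)
print(cpa)
```

Each profile states what one party can do; the agreement keeps only what both can do, which is the "We will do X together" step.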
Simple Object Access Protocol (SOAP): A W3C protocol for exchange of information in a
distributed environment endorsed by the ebXML initiative. Of interest for ebXML is SOAP's
function as an envelope that defines a framework for describing what is in a message and how to
process it.
OVERVIEW OF .NET
It offers multiple language support.
It has a rich set of libraries, a la JVM.
It's open-standard friendly (e.g., HTTP and XML) -- it may even become a standard itself.
Its code is compiled natively, regardless of language or deployment (Web or desktop).
It's yet another platform to consider, which generally means rewriting and learning new tricks.
Microsoft tends to have good ideas, but mediocre implementation.
Currently, it's only available on Windows.
Microsoft claims C#, IL, and CLR/CLS will be submitted to ECMA, but there's still no clear
view on what will be standardized from the platform.
Microsoft's .NET initiative has its origins in the increasing importance of the Web in almost all
areas of application development. Previous development tools, exemplified by Visual Studio
version 6.0, were designed for the needs of a decade ago, when the ruling paradigm was
applications that were stand-alone or were distributed over a local area network (LAN). As the
need for Web-related capabilities grew, ad hoc solutions were crafted as enhancements to
existing tools. Because the Web capabilities were not built into the development tools from the
beginning, however, there were inevitable problems with deployment, maintenance, and
Things are different with .NET. The .NET Framework provides a comprehensive set of classes
that are designed for just about any programming task you can imagine. From the very
beginning, the Framework was designed to integrate Web-related programming functionality.
The Framework can be used by any of Microsoft's three programming languages: Visual Basic,
C++, and C# (pronounced "C sharp"). The new releases of Visual Basic and C++ will be familiar
to anyone who has used earlier versions, although there are numerous changes to accommodate
the .NET architecture. C# is a new language that is similar to Java in many respects, although there
are significant differences between the two. Some observers consider C# to be a Java
replacement made necessary because legal problems have forced Microsoft to stop supporting
Java (or Visual J++, as Microsoft's version of Java was called).
For the XML developer, .NET was designed to support XML from the ground up. There are no
add-ons required, such as the MSXML Parser or the SOAP Toolkit. Everything you need is
provided by the Framework. Please remember that as of this writing, the .NET Framework is a
beta product. It is believed that the XML support is fairly stable, but it is possible that there will
be some changes before the final product is released (which may happen by the time you read
this).
3.XML security framework
5.XML digital signatures
The basic security requirements:
Confidentiality: ensuring that information is not made available or disclosed to unauthorized
parties.
Authentication:
(i) the ability to determine that the message really comes from the listed sender;
(ii) non-repudiation: preventing the originator of the document from denying having
sent it.
Integrity: ensuring that information is not tampered with in transit.
Approaches to cryptography fall into two categories:
(i) single key cryptography
(ii) public key cryptography
Single key cryptography
A single key is used for both encryption and decryption.
The key must be known to both sender and receiver
The difficulty in this approach is the distribution of the key
Example: DES (Data Encryption Standard)
Single key systems are effective for secure communication between ATM machines and
banks.
However, this approach does not scale up to the Web, where e-commerce depends on individuals
just showing up to do business.
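The defining property of single key cryptography, one shared key for both directions, can be seen in a small toy cipher. The sketch below is not DES and is not secure; it derives a keystream from the shared key with SHA-256 and XORs it with the data, purely to show that the same key encrypts and decrypts. The function names and the sample key are hypothetical.

```python
import hashlib


def keystream(key, length):
    """Derive `length` pseudo-random bytes from the shared key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key, data):
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


shared_key = b"secret shared by sender and receiver"
ciphertext = xor_cipher(shared_key, b"transfer 100 to account 42")
plaintext = xor_cipher(shared_key, ciphertext)
print(plaintext)
```

Both parties must already hold shared_key before any message is sent, which is exactly the key distribution difficulty noted above.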
Public key cryptography
Enables secure communication without having to exchange a secret key.
It uses a mathematical formula to generate two separate but related keys.
One key is open to public view; the other is private, known only to one individual.
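A textbook RSA example with very small numbers shows how two related keys can be generated from one formula. This is a sketch for intuition only: the primes 61 and 53 and the exponent 17 are illustrative choices, real keys are far larger, and real systems add padding and should rely on a vetted cryptographic library.

```python
from math import gcd

# Toy parameters: real keys use primes hundreds of digits long.
p, q = 61, 53
n = p * q                      # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, must be coprime with phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)            # private exponent: modular inverse (Python 3.8+)

public_key = (e, n)            # open to public view
private_key = (d, n)           # known only to the owner


def encrypt(message, key):
    """Encrypt an integer smaller than n with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    """Recover the integer with the matching private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


ciphertext = encrypt(65, public_key)
recovered = decrypt(ciphertext, private_key)
print(ciphertext, recovered)
```

Anyone may encrypt with public_key, but only the holder of private_key can decrypt, so no secret has to be exchanged in advance.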
XML canonicalization addresses the following variations between logically equivalent documents:
Encoding schemes used to represent characters
Normalization of attribute values
Double quotes for attribute values
Special characters in attribute values and character content
XML and DTD declarations
White space outside the document element
White space in start and end tags
Ordering of namespace declarations and attributes
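Several of these points can be observed with the canonicalize function in Python's standard xml.etree.ElementTree module (available from Python 3.8, implementing C14N 2.0). The two sample order documents below are hypothetical; they differ in quoting, attribute order, white space inside the start tag, the XML declaration, and the empty-element form.

```python
import xml.etree.ElementTree as ET

# Same logical content, written in two physically different ways.
doc_a = """<?xml version="1.0"?>
<order id='42' currency='USD'><item/></order>"""

doc_b = '<order   currency="USD"   id="42"><item></item></order>'

canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)

print(canon_a)
print(canon_a == canon_b)
```

Because both inputs reduce to the same canonical text, a digest or signature computed over the canonical form does not change when a document is merely re-serialized.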
XML security framework
W3C is driving three XML security technologies:
XML digital signature
XML encryption
XML key management services
An important issue not addressed by SSL is encrypting only part of the data being exchanged.
XML encryption overcomes this by allowing part of the data to be encrypted.
It can also handle both XML and non-XML data.
It does not support encryption of attributes.
Steps for XML encryption
1. Selecting the XML to be encrypted
2. Converting it into canonical form
3. Encrypting the resulting canonical form with a public key
4. Sending the encrypted XML
XML Digital Signature
1. Defines both the syntax and the rules for processing XML digital signatures.
2. Defines a series of XML elements for describing details of the signature:
SignedInfo - holds the information that is actually signed.
CanonicalizationMethod - algorithm used to canonicalize the SignedInfo.
SignatureMethod - algorithm used to convert the canonicalized SignedInfo into the
signature value. It is a combination of a digest algorithm and a key-dependent algorithm.
Reference - includes the method used to compute the digital hash and the identified data
object. The signature is later checked via reference validation and signature validation.
KeyInfo - indicates the key to be used to validate the signature.
Transforms - optional ordered list of processing steps applied to the resource's content
before the digest was computed.
DigestMethod - algorithm applied to the data after the transforms are applied to yield the digest.
DigestValue - holds the value computed based on the data being signed.
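The two-stage structure (a digest over the data, then a key-dependent algorithm over the signed information) can be sketched with the standard library alone. The sketch below makes several simplifying assumptions that should be read as such: it omits the XML Signature namespace and the algorithm-identifying elements, it uses HMAC-SHA256 with a shared demo key in place of a public-key signature, and the helper names and sample order document are hypothetical. Production code should use a vetted XML signature library.

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET


def digest_value(xml_text):
    """Canonicalize the data, hash it, and base64-encode the hash."""
    canonical = ET.canonicalize(xml_text).encode("utf-8")
    return base64.b64encode(hashlib.sha256(canonical).digest()).decode("ascii")


def sign(signed_info_xml, key):
    """Key-dependent step: HMAC over the canonicalized signed info."""
    canonical = ET.canonicalize(signed_info_xml).encode("utf-8")
    mac = hmac.new(key, canonical, hashlib.sha256).digest()
    return base64.b64encode(mac).decode("ascii")


def verify(data_xml, signed_info_xml, signature, key):
    """Reference validation (digest matches) plus signature validation."""
    recorded = ET.fromstring(signed_info_xml).findtext("Reference/DigestValue")
    reference_ok = recorded == digest_value(data_xml)
    signature_ok = hmac.compare_digest(sign(signed_info_xml, key), signature)
    return reference_ok and signature_ok


key = b"shared demo key"  # hypothetical key, for illustration only
order = "<order id='42'><total>100</total></order>"
signed_info = (
    "<SignedInfo><Reference URI='#order'>"
    f"<DigestValue>{digest_value(order)}</DigestValue>"
    "</Reference></SignedInfo>"
)
signature_value = sign(signed_info, key)
print(verify(order, signed_info, signature_value, key))
```

Only the small signed-info block is run through the key-dependent algorithm; the data itself is protected indirectly through the digest recorded inside that block.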
Public key infrastructure
- An arrangement that binds public keys to their respective user identities by means of a
certificate authority (CA), an entity that issues digital certificates for use by other parties.
The CA issues a digital certificate that contains a public key and the identity of the owner.
It attests that the public key contained in the certificate belongs to the person,
organization, server, or other entity noted in the certificate.
PKI consists of client software, server hardware and software, legal contracts, and
operational procedures.
XML encryption and digital signatures rely on PKI to help encrypt, decrypt, sign, and
verify various documents.
Various PKI solutions are available: X.509, PGP, SPKI.
Applications need to integrate with a PKI solution.
Different organizations use different PKIs.
- XKMS allows management of PKI by moving the complexity of managing the PKI from
client applications to a trusted third party.
- The trusted third party hosts the XKMS service while providing a PKI interface to
client applications.
- This allows a client application to access PKI features, thereby reducing the client's
complexity.
The XKMS specification is made up of two parts:
1. X-KRSS (XML Key Registration Service Specification): registration of public keys
2. X-KISS (XML Key Information Service Specification): retrieval of information based on key information