"XML and Web Services"
Recommended Reading: Chapter 11
Dr. Thomas Tran, CSI 5389 (E-Commerce Technologies)

Outline
Why XML?
An Introduction to XML
Web Services

Why XML?

What's Wrong with HTML?
HTML (Hypertext Markup Language) was developed by Tim Berners-Lee in the early 1990s as a simple markup language based on SGML (Standard Generalized Markup Language). It is a simple language, well suited for hypertext, multimedia, and the display of small and reasonably simple documents.
SGML is a standard language for defining and using document formats (ISO 8879). It is too complicated to understand and use, and is accessible only to experts.
Although HTML is workable for simple documents, it mixes up the structure of a document with the display of that document.

What's Wrong with HTML? (cont.)
HTML has been extended in disorganized and incompatible ways by Netscape and Microsoft. To compete with each other, these two companies added their own HTML tags and implemented different interpretations of the same tags.
Many Web sites today contain tagging written for a specific browser. These Web pages work properly only with their intended browser, and not with other browsers.

What's Wrong with HTML? (cont.)
HTML also has other limitations:
Extensibility: HTML does not allow users to specify their own tags or attributes in order to parameterize or semantically qualify their data.
Structure: HTML does not support the specification of deep structures needed to represent database schemas or object-oriented hierarchies.
Validation: HTML does not support the kind of language specification that allows consuming applications to check data for structural validity on importation.
The XML Effort
XML (Extensible Markup Language) was developed starting in 1996 by a working group of the W3C (World Wide Web Consortium). XML is a standardized language for representing structured data as text.
XML advantages:
XML provides strong separation between the structure of a document and the display of that document.
Information providers can define new tags and attributes at will.
Document structures can be nested to any level of complexity.
Any XML document can contain an optional description of its grammar for use by applications that need to perform structural validation.

The Main Point
By defining our own markup language, we can encode the information in our documents much more precisely than is possible with HTML. This means that programs processing these documents can "understand" them much better and can therefore process the information in ways that are impossible with HTML.
Example: imagine that we mark up recipes (say, for seafood dishes) according to some definition in which we enter the amount of each ingredient needed for each dish. We can write a program that, given a list of the contents of our fridge, goes through the recipes and lists the dishes we could make with the available ingredients. Given nutritional information about the ingredients, the program could sort the dishes by calories. Given price information for the ingredients, it could sort the dishes by price, and so on. The possibilities are almost endless, because the information is encoded in a way that the computer can "understand".

Web Applications of XML
The applications that need XML are those that cannot be accomplished within the limitations of HTML. They can be divided into four categories:
1. Applications that require the Web client to mediate between two or more heterogeneous databases.
2. Applications that attempt to distribute a significant proportion of the processing load from the Web server to the Web client.
3. Applications that require the Web client to present different views of the same data to different users.
4. Applications in which intelligent Web agents attempt to tailor information discovery to the needs of individual users.

Web Applications of XML: An Example
Consider a typical example of the first category: the information tracking system of a home health care agency. A patient entering a home health care agency is represented to the information system by a large collection of paper-based records of the patient's medical history. The major task in accepting the patient into the system is the manual entry of these materials into the agency's database.

Web Applications of XML: An Example (cont.)
First solution (commonly used in practice):
1. Log into the hospital's Web site.
2. Become an authorized user.
3. Access the patient's medical records using a Web browser.
4. Print out the records from the Web browser.
5. Manually key in the data from the printouts.
Second solution (slightly better): instead of printing out the patient's medical records, the operator reads the records in the Web browser and keys the data directly into the agency's online forms in a separate window. This saves the paper that the printouts would have needed, but does nothing to address the root of the problem.

Web Applications of XML: An Example (cont.)
Desired solution:
1. Log into the hospital's Web site.
2. Become an authorized user.
3. Access the patient's medical records in a Web-based interface that represents the patient's records as a folder icon.
4. Drag the folder from the Web application over to the internal database application.
5. Drop the folder into the database.
This solution is not possible within the limitations of HTML, for three reasons:
1. The HTML tag set is too limited to represent or identify the many database fields mixed into the medical documents.
2. HTML is incapable of representing the variety of structures in those documents.
3. HTML has no mechanism for checking data for structural validity before the application attempts to import it into the target database.

Web Applications of XML: An Example (cont.)
One technically feasible solution is to require all hospitals and health care agencies to use a single standard system dictated by the government. However, when many health care agencies and hospitals are in financial difficulty, it is hardly practical to require them to replace their existing heterogeneous systems with a single new system.
The other way to enable interchange between heterogeneous systems is to adopt a single industry-wide interchange format that serves as the single output format for all exporting systems and the single input format for all importing systems. In other words, we need a standard language for exporting and importing data: XML.

An Introduction to XML

XML: A Simple Example
<?xml version="1.0"?>
<Address>
  <Name>Larry Stewart</Name>
  <Street>11 Serissa Circle</Street>
  <City>Wayland</City>
  <State>MA</State>
  <Zip>01778</Zip>
</Address>
The above XML document contains an address in the U.S. We are free to define new tags such as <Name> and <Street> to identify the parts of the address. This arrangement makes XML easy for disparate software tools to create and use.
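The recipe scenario from "The Main Point" can be sketched in Python with the standard xml.etree.ElementTree module. The vocabulary (recipes, recipe, ingredient, and the name, amount, and calories attributes) is hypothetical, invented for this sketch in the same way the address example invents <Name> and <Street>; calories are assumed to be given per required amount of an ingredient. This is a minimal illustration, not a complete program:

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe markup: every tag and attribute name here is invented
# for illustration. Calories are assumed to be per required amount.
RECIPES_XML = """<?xml version="1.0"?>
<recipes>
  <recipe name="Grilled salmon">
    <ingredient name="salmon" amount="1" calories="400"/>
    <ingredient name="lemon" amount="1" calories="20"/>
  </recipe>
  <recipe name="Shrimp pasta">
    <ingredient name="shrimp" amount="1" calories="200"/>
    <ingredient name="pasta" amount="1" calories="350"/>
  </recipe>
  <recipe name="Mussels in wine">
    <ingredient name="mussels" amount="2" calories="300"/>
    <ingredient name="wine" amount="1" calories="120"/>
  </recipe>
</recipes>"""


def makeable_dishes(xml_text, fridge):
    """Return (dish, total_calories) pairs for every recipe whose ingredients
    are all in the fridge in the required amounts, sorted by calories."""
    root = ET.fromstring(xml_text)
    dishes = []
    for recipe in root.iter("recipe"):
        ingredients = recipe.findall("ingredient")
        # The tags tell the program which numbers are amounts and which
        # are calories, so it can compare and add them directly.
        if all(fridge.get(ing.get("name"), 0) >= float(ing.get("amount"))
               for ing in ingredients):
            total = sum(float(ing.get("calories")) for ing in ingredients)
            dishes.append((recipe.get("name"), total))
    return sorted(dishes, key=lambda dish: dish[1])


if __name__ == "__main__":
    fridge = {"salmon": 2, "lemon": 1, "shrimp": 1, "pasta": 1, "mussels": 1}
    for name, calories in makeable_dishes(RECIPES_XML, fridge):
        print(f"{name}: {calories:.0f} kcal")
```

With this fridge the sketch lists Grilled salmon (420 kcal) before Shrimp pasta (550 kcal) and skips the mussels, for which two portions are needed. Sorting by price would work the same way with a further hypothetical price attribute.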
Well-Formed and Valid XML Documents
An XML document is well formed if it follows XML's syntax rules (for example, it has a single root element, and its start and end tags match and are properly nested). It is valid if, in addition, it specifies a document type definition (DTD) and complies with the constraints expressed in that DTD. Any XML parser can process a well-formed document; a validating parser additionally checks the document against its DTD and reports documents that are not valid.
A DTD is a schema for a class of XML documents, appropriate for a given domain. The DTD acts as a rule book that allows authors to create new documents with the same characteristics as the base document.
XML provides strong separation between the structure of a document and its display. The structure is encoded in XML, while the display is managed by the Extensible Stylesheet Language (XSL).

XML Building Blocks
Elements
Attributes

XML Elements
XML elements are similar to records in a programming language. An element declaration has the following form:
<!ELEMENT ElementName (ElementContents)>
This declaration defines which child elements an element contains, the order in which they occur, and how many times each may occur.

XML Elements (cont.)
If an element X consists of elements A, B, and C in that order, it is declared as follows:
<!ELEMENT X (A, B, C)>
If only one of A, B, or C is used, the declaration is:
<!ELEMENT X (A | B | C)>
XML DTDs have no connector meaning "in any order" (SGML's "&" connector is not part of XML). If A, B, and C may appear in any order, either list each permitted order as an alternative, or use a looser model such as (A | B | C)*, which allows each of the elements to appear any number of times.
If element X consists of zero or more As followed by one or more Bs, the declaration is:
<!ELEMENT X (A*, B+)>
A question mark after an element means that the element is optional (it may occur zero times or once):
<!ELEMENT X (A, B?, C?)>
Note that elements can be nested.
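The difference between well-formed and valid can be seen in a short Python sketch. Python's standard xml.etree.ElementTree parser checks only well-formedness (it raises ParseError on a syntax error) and does not validate against a DTD, so the validity side is modeled by hand: a DTD content model behaves like a regular expression over the names of an element's children. The CONTENT_MODELS table and the is_well_formed and children_match helpers are illustrative assumptions, not a real DTD validator; the table encodes the declarations <!ELEMENT X (A*, B+)> and <!ELEMENT X (A, B?, C?)> from above, renaming the second element Y so both fit in one table.

```python
import re
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Well-formedness is checked by any XML parser: a syntax error
    such as a mismatched end tag makes ElementTree raise ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hand-written regular expressions standing in for DTD content models.
# Each child tag name is followed by a space, so names stay separated.
#   <!ELEMENT X (A*, B+)>      and      <!ELEMENT Y (A, B?, C?)>
CONTENT_MODELS = {
    "X": r"(A )*(B )+",
    "Y": r"A (B )?(C )?",
}


def children_match(element, models=CONTENT_MODELS):
    """Check that the sequence of the element's child tags fits its model."""
    sequence = "".join(child.tag + " " for child in element)
    return re.fullmatch(models[element.tag], sequence) is not None


if __name__ == "__main__":
    print(is_well_formed("<X><A/><B/></X>"))                     # True
    print(is_well_formed("<X><A></X>"))                          # False: mismatched tags
    print(children_match(ET.fromstring("<X><A/><A/><B/></X>")))  # True
    print(children_match(ET.fromstring("<Y><A/><C/><B/></Y>")))  # False: B after C
```

The last document is well formed but would not be valid, which is exactly the distinction a validating parser adds on top of plain parsing.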
XML Element Content Types
#PCDATA: Parsed character data. The element contains text, which the XML parser scans for markup and entity references.
ANY: The element may contain any mixture of character data and declared elements, in any order. Its content is still parsed.
EMPTY: The element has no content.

XML Attributes
Attribute declarations describe information about an element. More than one attribute can be defined for one element. Attributes appear within the start tag of an element. They are declared as follows:
<!ATTLIST ElementName
  AttributeName1 DeclaredValue1 DefaultValue1
  AttributeName2 DeclaredValue2 DefaultValue2
  ...
  AttributeNameN DeclaredValueN DefaultValueN>
The declared value is either a list of permissible values or one of the predefined attribute types. The default value specifies whether a value must be present and, if not, what value is used by default.

XML Attributes: Declared Value Types
CDATA: Character data. The value may contain any text; only the delimiting quote character, <, and & must be escaped (for example as &quot;, &lt;, and &amp;).
NMTOKEN: The value must be a single name token: a string of name characters (letters, digits, periods, hyphens, underscores, and colons) with no spaces. Unlike an XML name, a name token need not begin with a letter; for example, 2024 is a valid NMTOKEN.
NMTOKENS: One or more NMTOKENs separated by spaces.

XML Attributes: Declared Value Types (cont.)
ID: Identifier. The value must be an XML name that is unique among all ID values in the document, so that it identifies exactly one element.
IDREF: The value matches the value of some ID attribute of an element in the same XML document. It is used to point to that element.
IDREFS: One or more IDREFs separated by spaces.

XML Attributes: Default Value Types
#REQUIRED: A value must be specified for this attribute.
#IMPLIED: The attribute is optional and the DTD supplies no default value; if it is omitted, the application decides what, if anything, to use.
'value': The given value is the default; other permissible values may also be specified.
#FIXED 'value': If the attribute is present, it must have exactly this value; if it is omitted, the parser supplies this value.

XML Example: FAQ Document
<?xml version="1.0"?>
<!DOCTYPE FAQ SYSTEM "http://www.server.com/DTDs/faq.dtd">
<FAQ>
  <INFO>
    <SUBJECT>XML</SUBJECT>
    <AUTHOR>Lars Marius Garshol</AUTHOR>
    <EMAIL>email@example.com</EMAIL>
    <VERSION>1.0</VERSION>
    <DATE>June 20 2005</DATE>
  </INFO>
  <PART NO="1">
    <Q NO="1">
      <QTEXT>What is XML?</QTEXT>
      <A>Simplified SGML.</A>
    </Q>
    <Q NO="2">
      <QTEXT>What can I use it for?</QTEXT>
      <A>Anything.</A>
    </Q>
  </PART>
</FAQ>
In this example, the DOCTYPE declaration tells the parser where to find the DTD, elements such as INFO are delimited by start and end tags (<INFO> and </INFO>), and NO is an attribute of the PART and Q elements.

XML Abstract Syntax Tree
FAQ
  INFO
    SUBJECT
    AUTHOR
    EMAIL
    VERSION
    DATE
  PART
    Q
      QTEXT
      A
    Q
      QTEXT
      A

DTD for the FAQ System (faq.dtd)
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT FAQ (INFO, PART+)>
<!ELEMENT INFO (SUBJECT, AUTHOR, EMAIL?, VERSION?, DATE?)>
<!ELEMENT SUBJECT (#PCDATA)>
<!ELEMENT AUTHOR (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT VERSION (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT PART (Q+)>
<!ELEMENT Q (QTEXT, A)>
<!ELEMENT QTEXT (#PCDATA)>
<!ELEMENT A (#PCDATA)>
<!ATTLIST PART NO CDATA #IMPLIED
               TITLE CDATA #IMPLIED>
<!ATTLIST Q NO CDATA #IMPLIED>

Linking in XML
XML links can connect two or more resources, which can be either files (not necessarily XML or HTML files) or elements within files.
A link is represented as an element with attributes:
<!ELEMENT simplink ANY>
<!ATTLIST simplink
  ACTUATE (AUTO|USER) "USER"
  SHOW (REPLACE|EMBED|NEW) "REPLACE"
  … >
The ACTUATE attribute specifies when a link is followed: when the user explicitly requests it, for instance by clicking (value USER), or automatically when the system reads the link (value AUTO).
(These attribute names follow an early draft of the XML linking specification; the final W3C XLink recommendation uses the attributes xlink:actuate, with values such as onRequest and onLoad, and xlink:show, with values such as replace, embed, and new.)

Linking in XML (cont.)
The SHOW attribute specifies what happens when a link is followed. It can take the following values:
EMBED: The resource the link points to is inserted into the document.
REPLACE: The resource the link points to replaces the resource in which the link was followed. (Hence, if you have two different versions of a paragraph, you can link them so that following the link shows the other version in the same context.)
NEW: The resource the link points to is processed or displayed in a new context (e.g., a new window), leaving the original undisturbed.
Ordinary HTML links behave like REPLACE, since the new page is displayed in place of the previous one.

XML Processing
SAX (Simple API for XML): SAX is an event-driven API. The parser calls application-supplied functions whenever it encounters specific XML constructs (start tags, end tags, character data, and so on). It is suited to transforming or outputting data while the document is being parsed, without building the whole document in memory.
DOM (Document Object Model): DOM is a tree-based API focused on the data structure. It provides functions that the client uses to traverse the in-memory tree of an XML document, and functions for creating and altering that tree or building a new document.
XPath (XML Path Language): XPath provides a query syntax for addressing parts of an XML document (that is, nodes in the abstract syntax tree).
XSLT (Extensible Stylesheet Language Transformations): XSLT provides rules for transforming an XML document into other XML formats or into other formats, such as HTML.
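The SAX, DOM, and XPath styles can be compared on the FAQ document using Python's standard library: xml.sax for event-driven parsing, xml.dom.minidom for the DOM, and xml.etree.ElementTree, whose find() accepts a limited subset of XPath. The DOCTYPE line is omitted so that the sketch does not try to fetch the DTD from the example URL, and XSLT is left out because the standard library has no XSLT processor. The class and helper names (QuestionCollector, tree_lines) are hypothetical; this is a sketch, not a complete application.

```python
import xml.sax
import xml.etree.ElementTree as ET
from xml.dom import minidom

# The FAQ example, without its DOCTYPE line, so no external DTD is fetched.
FAQ = """<FAQ>
  <INFO>
    <SUBJECT>XML</SUBJECT>
    <AUTHOR>Lars Marius Garshol</AUTHOR>
    <EMAIL>email@example.com</EMAIL>
    <VERSION>1.0</VERSION>
    <DATE>June 20 2005</DATE>
  </INFO>
  <PART NO="1">
    <Q NO="1"><QTEXT>What is XML?</QTEXT><A>Simplified SGML.</A></Q>
    <Q NO="2"><QTEXT>What can I use it for?</QTEXT><A>Anything.</A></Q>
  </PART>
</FAQ>"""


class QuestionCollector(xml.sax.ContentHandler):
    """SAX: the parser calls these methods as it meets each construct."""

    def __init__(self):
        super().__init__()
        self.questions = []
        self._buffer = None  # collects text while inside a QTEXT element

    def startElement(self, name, attrs):
        if name == "QTEXT":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "QTEXT":
            self.questions.append("".join(self._buffer).strip())
            self._buffer = None


def tree_lines(elem, depth=0):
    """Return element names as an indented outline (the abstract syntax tree)."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(tree_lines(child, depth + 1))
    return lines


# SAX: stream through the document, reacting to events.
handler = QuestionCollector()
xml.sax.parseString(FAQ.encode("utf-8"), handler)

# DOM: build the whole tree in memory, read it, then alter it.
dom = minidom.parseString(FAQ)
author = dom.getElementsByTagName("AUTHOR")[0].firstChild.data
new_q = dom.createElement("Q")
new_q.setAttribute("NO", "3")
dom.getElementsByTagName("PART")[0].appendChild(new_q)

# XPath subset: address a node by path and attribute value.
root = ET.fromstring(FAQ)
answer = root.find(".//Q[@NO='2']/A").text

if __name__ == "__main__":
    print(handler.questions)  # ['What is XML?', 'What can I use it for?']
    print(author)             # Lars Marius Garshol
    print(len(dom.getElementsByTagName("Q")))  # 3 after the DOM change
    print(answer)             # Anything.
    print("\n".join(tree_lines(root)))
```

The SAX handler never holds the whole document, which is why it suits streaming output; the DOM version can read and modify any part of the tree at the cost of keeping it all in memory. The printed outline reproduces the abstract syntax tree shown for the FAQ document.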
XML on the Web
[Figure: XML on the Web. Client side: HTML GUI browser, DOM, XSLT. Connection: HTTP. Server side: parse and process (SAX), database (DB).]

Web Services

A Simple Example
Web services are simply applications made accessible over the Web. Consider a shipping rate calculator provided by a logistics company. Turning this calculator into a Web service requires the following steps:
1. Encapsulate the logic of the calculator (but not the user interface) in a subroutine.
2. Define the API for the calculator using the Web Services Description Language (WSDL).
3. Host the subroutine on a Web server that supports the Simple Object Access Protocol (SOAP).
4. Publish the calculator's description to an appropriate UDDI (Universal Description, Discovery, and Integration) directory.

A Simple Example (cont.)
A programmer who wants to use the rate calculator from an e-commerce system can then do the following:
1. Look up the service in the UDDI directory.
2. Use SOAP to make a remote call from the client application to the rate calculator.
3. Use the results of the call in the application.
Web services make it easy for service providers to make business logic available for remote use.

A Simple Example (cont.)
[Figure: The Web services host publishes the service to the UDDI registry; the Web services client looks the service up in the registry, then sends a SOAP call over the Internet to the host and receives a SOAP response.]

The Vision of Web Services
Web services provide a straightforward, interoperable means for programs to communicate with each other over the Web.
Web services also provide directories in which providers can advertise services and users can search for them.
This makes it possible to develop a market for heavyweight remote services, such as payment systems, logistics, and business messaging.
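The publish, look up, and call steps of the rate calculator example can be simulated in a single Python process, which also illustrates the client and server stubs described under Remote Procedure Calls. Everything specific here is an assumption for illustration: the pricing rule, the urn:example:shipping namespace, the ShippingRate element names, the helper functions, and the REGISTRY dictionary standing in for a UDDI directory. No WSDL is generated, and the "network" is a direct function call where a real client would send the envelope by HTTP POST. Only the SOAP 1.1 envelope namespace is real.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC = "urn:example:shipping"  # hypothetical service namespace
ET.register_namespace("soap", SOAP_ENV)


def q(ns, tag):
    """Qualified name in ElementTree's {namespace}tag notation."""
    return f"{{{ns}}}{tag}"


def envelope(operation, fields):
    """Wrap an operation element and its child fields in a SOAP envelope."""
    env = ET.Element(q(SOAP_ENV, "Envelope"))
    body = ET.SubElement(env, q(SOAP_ENV, "Body"))
    op = ET.SubElement(body, q(SVC, operation))
    for name, value in fields.items():
        ET.SubElement(op, q(SVC, name)).text = str(value)  # parameters as XML text
    return ET.tostring(env, encoding="unicode")


# Step 1: the calculator logic as a plain subroutine (pricing rule invented).
def shipping_rate(weight_kg, zone):
    return round(5.0 + 1.5 * weight_kg * zone, 2)


# Server stub: decode the request, call the real subroutine, encode the result.
def server_stub(request_xml):
    body = ET.fromstring(request_xml).find(q(SOAP_ENV, "Body"))
    call = body.find(q(SVC, "ShippingRate"))
    weight = float(call.find(q(SVC, "weightKg")).text)
    zone = int(call.find(q(SVC, "zone")).text)
    return envelope("ShippingRateResponse", {"rate": shipping_rate(weight, zone)})


# Stand-in for a UDDI registry: service name -> endpoint. Here the endpoint
# is the server stub itself; in practice it would be a URL.
REGISTRY = {}
REGISTRY["ShippingRateCalculator"] = server_stub  # publish


# Client stub: same signature as the local subroutine, but it encodes the
# arguments as XML and sends them to the endpoint found in the registry.
def shipping_rate_remote(weight_kg, zone):
    endpoint = REGISTRY["ShippingRateCalculator"]  # look up
    request = envelope("ShippingRate", {"weightKg": weight_kg, "zone": zone})
    response = endpoint(request)  # a real client would POST this over HTTP
    rate = next(ET.fromstring(response).iter(q(SVC, "rate")))
    return float(rate.text)


if __name__ == "__main__":
    print(envelope("ShippingRate", {"weightKg": 2, "zone": 3}))
    print(shipping_rate_remote(2, 3))  # same result as shipping_rate(2, 3)
```

The caller of shipping_rate_remote cannot tell that the work happened behind an XML round trip, which is the point of a client stub; swapping the direct call for an HTTP request would not change its signature.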
Remote Procedure Calls
Web services are built on the concept of remote procedure calls (RPC). In an RPC, the calling program, instead of invoking a local subroutine, invokes a client stub that has the same API as the desired subroutine. The client stub communicates with a remote server, where a server stub makes the actual call to the actual subroutine. In addition, the calling program must bind its interface to the appropriate server by using a network directory service.
In Web services, the service directory is implemented using UDDI, and the API is defined using WSDL, an XML-based language. Actual parameters and return values are encoded as text in XML. Web services are built on standard Web servers and HTTP. Taken together, these decisions let programs communicate with each other using the existing Internet infrastructure.

SOAP
The Simple Object Access Protocol (SOAP) specifies how RPCs are implemented over the Web. SOAP has three aspects:
1. The SOAP calling conventions explain how to represent calls to remote procedures and their responses.
2. The SOAP encoding rules explain how to represent application data, namely the arguments and return values of remote procedure calls.
3. The SOAP envelope defines the contents of a SOAP message and the rules for processing it.
SOAP is almost always used with HTTP as the transport protocol, but it can also be used with other communication systems.

WSDL
The Web Services Description Language (WSDL) is the interface definition language for Web services. Most commonly, WSDL is used to describe services that are available via SOAP and HTTP. WSDL defines Web services in terms of the following six concepts:
1. Types: The data type definitions used to describe messages.
2. Message: An abstract definition of the data being transmitted.
3. Port Type: A set of abstract operations, each of which has input and output messages.
4. Binding: The concrete protocol and data format specifications for a port type.
5. Port: An address for a single communication endpoint (a binding combined with a network address).
6. Service: An aggregation of a set of related ports.

UDDI
Universal Description, Discovery, and Integration (UDDI) is not so much a protocol as a process. The idea is to operate directories, or registries, of business entities and the services they offer, so that people and programs can find providers of the Web services they need. See www.uddi.org for further information.

References
Dr. Stan Matwin's lecture slides.
Lars Marius Garshol, An Introduction to XML (http://www.garshol.priv.no/download/text/xml-intro/index-en.html).
Jon Bosak, XML, Java, and the Future of the Web (http://www.ibiblio.org/pub/sun-info/standards/xml/why/xmlapps.htm).