Remote Procedure Calls and Web Services

Zachary G. Ives
University of Pennsylvania
CIS 455 / 555 – Internet and Web Systems

August 30, 2012
• Reminder: HW2 Milestone 2 due tonight
• HW3 “pre-release” today

What Does MapReduce Do Well?
• What are its strengths?
• What about its weaknesses?

MapReduce Is a Particular Programming Model
… But it’s not especially general (though things like Pig Latin improve it)
Suppose we have autonomous application components that wish to communicate
Communication Mechanisms
We’ve already seen a few:
    Request/response from client to server
        HTTP itself
    Asynchronous messages
        Router “gossip” protocols
        P2P “finger tables”, etc.

Are there general mechanisms and principles?
   (Of course!)

… Let’s first look at what happens if we need in-order messaging

Message-Queuing Model (1)
• Four combinations for loosely coupled communication using queues


Message-Queuing Model (2)
• Basic interface to a queue in a message-queuing system:

Primitive   Meaning
Put         Append a message to a specified queue
Get         Block until the specified queue is nonempty, then remove the first message
Poll        Check a specified queue for messages and remove the first; never block
Notify      Install a handler to be called when a message is put into the specified queue
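The four primitives can be sketched as a single-process Java class. This is a minimal illustration, assuming a java.util.concurrent.LinkedBlockingQueue as the backing store; the class name MessageQueue and the method addNotifyHandler are hypothetical, and a real message-queuing system would add persistence, addressing, and routing.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical in-memory queue offering the four primitives from the table. */
class MessageQueue<T> {
    private final LinkedBlockingQueue<T> messages = new LinkedBlockingQueue<>();
    private final List<Consumer<T>> handlers = new CopyOnWriteArrayList<>();

    /** Put: append a message to the tail of the queue, then run any installed handlers. */
    void put(T message) {
        messages.add(message);
        for (Consumer<T> handler : handlers) {
            handler.accept(message); // handlers only observe; the message stays queued
        }
    }

    /** Get: block until the queue is nonempty, then remove and return the first message. */
    T get() throws InterruptedException {
        return messages.take();
    }

    /** Poll: remove and return the first message if there is one; never blocks (null if empty). */
    T poll() {
        return messages.poll();
    }

    /** Notify: install a handler to be called whenever a message is put into this queue. */
    void addNotifyHandler(Consumer<T> handler) {
        handlers.add(handler);
    }
}
```

Get maps onto take(), which waits; Poll maps onto poll(), which returns null instead of waiting. That difference is exactly the synchronous versus asynchronous choice the queue model offers.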

General Architecture of a Message-Queuing System (1)
• The relationship between queue-level addressing and network-level addressing

General Architecture of a Message-Queuing System (2)
• The general organization of a message-queuing system with routers


Benefits of Message Queuing
• Allows both synchronous (blocking) and asynchronous (polling or event-driven) communication
• Ensures messages are delivered (or at least readable) in the order received
• The basis of many transactional systems
    e.g., Microsoft Message Queuing (MSMQ), IBM MQSeries

Some Common Modes of Building Distributed Applications
• XQuery (fetch XML from multiple sites, produce new XML)
    Turing-complete functional programming language
    Good for Web Services; not much support for I/O, etc.
• MapReduce (built over a DHT or distributed file system)
    Single filter (map), followed by single aggregation (reduce)
    Related languages and systems: Sawzall, Pig Latin, Dryad, …
• Message passing / request-response:
    e.g., over a DHT, sockets, or a message queue
    Communication via asynchronous messages
    Processing in a message-handler loop
• Function calls:
    Remote procedure call / remote method invocation
Fully Synchronous Request/Response: Remote Procedure Calls
• Remote procedure calls have been around forever, including:
    COM+
    CORBA
    Java RMI
• The basic idea: put a function elsewhere in the system and call it remotely, but using standard languages and ordinary method calls
• An RPC API defines a format for:
    Initiating a call on a particular server, generally in a reliable way
    Sending parameters to the server (marshalling)
    Receiving a return value, which may require marshalling as well
• An RPC call is synchronous (i.e., it generally blocks)

A Remote Procedure Call Visualized
[Figure: timeline of an RPC. The client is working, then blocks when it issues the call; meanwhile the server, which was waiting for a request, is busy executing the function.]


How RPC Generally Works
• You write an application with a series of functions
• One of these functions, F, will be distributed remotely
• You call a “stub generator”
    A caller stub emulates the function F:
        Opens a connection to the server
        Requests F, marshalling all parameters
        Receives F’s return status and results
    A server stub emulates the caller:
        Receives a request for F with parameters
        Unmarshals the parameters and invokes F
        Takes F’s return status (e.g., protection fault) and return value, and marshals them back to the client
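The stub mechanics can be sketched in plain Java. This is a single-process illustration under stated assumptions, not what any real stub generator emits: Calculator, CalculatorClientStub, and CalculatorServerStub are hypothetical names, a Function<byte[], byte[]> stands in for the network connection, and DataOutputStream/DataInputStream provide an ad hoc marshalling format (function name, then parameters; the reply is a status byte, then the result or an error message).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Function;

/** The function F to be distributed remotely (hypothetical example). */
interface Calculator {
    int add(int x, int y);
}

/** The real implementation of F, which lives on the server. */
class CalculatorImpl implements Calculator {
    public int add(int x, int y) {
        return x + y;
    }
}

/** Server stub: unmarshals the request, invokes F, and marshals status + result back. */
class CalculatorServerStub {
    static final byte OK = 0;
    static final byte ERROR = 1;
    private final Calculator impl;

    CalculatorServerStub(Calculator impl) {
        this.impl = impl;
    }

    byte[] handle(byte[] request) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(request));
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            String function = in.readUTF(); // which function is being requested
            if (function.equals("add")) {
                int result = impl.add(in.readInt(), in.readInt()); // unmarshal params, invoke F
                out.writeByte(OK);
                out.writeInt(result);
            } else {
                out.writeByte(ERROR);
                out.writeUTF("unknown function: " + function);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Caller stub: looks like a local Calculator, but marshals each call. */
class CalculatorClientStub implements Calculator {
    private final Function<byte[], byte[]> connection; // stands in for a network connection

    CalculatorClientStub(Function<byte[], byte[]> connection) {
        this.connection = connection;
    }

    public int add(int x, int y) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF("add"); // request F by name
            out.writeInt(x);     // marshal the parameters
            out.writeInt(y);
            out.flush();
            byte[] reply = connection.apply(bytes.toByteArray()); // synchronous: waits for the reply
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(reply));
            if (in.readByte() != CalculatorServerStub.OK) {
                throw new IllegalStateException("remote error: " + in.readUTF());
            }
            return in.readInt(); // unmarshal the return value
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The caller simply writes f.add(2, 3) and cannot tell that the call was marshalled, shipped, and unmarshalled. A real system would also need timeouts and retries, since this in-process "connection" can never fail.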

Passing Value Parameters
• Steps involved in doing remote computation through RPC


RPC Components
• Generally, you need to write:
    Your function, in a compatible language
    An interface definition, analogous to a C header file, so other people can program for F without having its source
• Generally, software will take the interface definition and generate the appropriate stubs
    (In the case of Java, rmic knows enough about Java to work directly on your compiled classes, with no separate interface definition language)
• The server stubs will generally run in some type of daemon process on the server
    Each function will need a globally unique name or GUID

Parameter Passing Can Be Tricky Because of References
• The situation when passing an object by reference or by value
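To see why references complicate parameter passing, here is a sketch that uses Java serialization as a stand-in for an RPC marshalling format (an assumption; ParamPassing, marshalRoundTrip, and appendItem are hypothetical names). Marshalling an object sends a copy, so changes the callee makes are not visible to the caller, unlike a local by-reference call. Java RMI behaves this way for arguments that are Serializable but not themselves remote objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;

class ParamPassing {
    /** Simulates marshalling: serialize, then deserialize, yielding an independent deep copy. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T marshalRoundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value); // "send": flatten the object graph into bytes
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject(); // "receive": rebuild a fresh copy
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("marshalling failed", e);
        }
    }

    /** The procedure being called: it modifies its argument. */
    static void appendItem(List<String> cart) {
        cart.add("book");
    }
}
```

Calling appendItem(cart) locally grows the caller's list; calling appendItem(marshalRoundTrip(cart)) leaves the caller's list unchanged, which is the by-value behavior a remote call naturally produces.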


What Are the Hard Problems with RPC? Esp. Inter-Language RPC?
• Resolving different data formats between languages (e.g., Java vs. Fortran arrays)
• Reliability, security
• Finding remote procedures in the first place
• Extensibility/maintainability
• (Some of these might look familiar from when we talked about data exchange!)
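One concrete data-format mismatch: Java represents a 2-D array as an array of row arrays (which may even be ragged), whereas Fortran stores a multidimensional array contiguously in column-major order. An inter-language RPC layer would have to rearrange the elements when marshalling; this sketch (a hypothetical ArrayLayout.toColumnMajor helper) shows that conversion for a rectangular array of doubles.

```java
/** Hypothetical helper for marshalling a Java 2-D array to a Fortran callee. */
class ArrayLayout {
    /** Flattens a rectangular Java array into column-major (Fortran) order. */
    static double[] toColumnMajor(double[][] a) {
        int rows = a.length;
        int cols = (rows == 0) ? 0 : a[0].length;
        double[] flat = new double[rows * cols];
        for (int i = 0; i < rows; i++) {
            if (a[i].length != cols) {
                throw new IllegalArgumentException("ragged arrays have no Fortran equivalent");
            }
            for (int j = 0; j < cols; j++) {
                flat[j * rows + i] = a[i][j]; // element (i, j) lands at offset i + j * rows
            }
        }
        return flat;
    }
}
```

For {{1, 2, 3}, {4, 5, 6}} this produces 1, 4, 2, 5, 3, 6: each column is laid out in turn.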
Web Services
• Goal: provide an infrastructure for connecting components, building applications in a way similar to hyperlinks between …
• It’s another distributed computing platform for the Web
    Goal: Internet-scale, language-independent, upwards-compatible where possible
• This one is based on many familiar concepts
    Standard protocols: HTTP
    Standard marshalling formats: XML-based, XML Schemas
    All new data formats are XML-based

One Alternative: REST (Representational State Transfer)
• Not really a standard; more a style of development
    Data is represented in XML, e.g., with a schema
    Function call interface uses URIs
       Server is to be stateless
    And the HTTP request type specifies the operation
       e.g., GET http://my.com/rest/service1
       e.g., POST http://my.com/rest/service1 {body} adds the body to the service
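A minimal sketch of the REST style, assuming a hypothetical in-memory RestDispatcher rather than a real HTTP server: the URI path names the resource, the HTTP method selects the operation, and no per-client session is kept between requests. The status codes follow common HTTP usage; the resource layout is illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical REST-style dispatcher: the URI names the resource, the HTTP verb picks the operation. */
class RestDispatcher {
    // Resource state keyed by URI path. No per-client session state is kept,
    // so each request carries everything needed to process it.
    private final Map<String, List<String>> resources = new HashMap<>();

    /** Returns a status line, followed by a body for successful GETs. */
    synchronized String handle(String method, String path, String body) {
        switch (method) {
            case "GET": {
                List<String> items = resources.get(path);
                return items == null ? "404 Not Found" : "200 OK\n" + String.join("\n", items);
            }
            case "POST": // add the body to the resource named by the URI
                resources.computeIfAbsent(path, p -> new ArrayList<>()).add(body);
                return "201 Created";
            case "DELETE":
                return resources.remove(path) != null ? "204 No Content" : "404 Not Found";
            default:
                return "405 Method Not Allowed";
        }
    }
}
```

Because the server keeps no conversation state, any replica holding the same resources could answer the next request, which is one reason the style scales well.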

The “Standard” for Web Services:
Three Parts
1. “Wire” / messaging protocols
      Data encodings, RPC calls or document passing, etc.
2. Describing what goes on the wire
      Schemas for the data
3. “Service discovery”
      Means of finding web services

The Protocol Stacks of Web Services
[Figure: three protocol stacks, listed bottom to top.
   Wire Format Stack: XML; SOAP, XML-RPC; SOAP Attachments, WS-Addressing, WS-Security; WS-AtomicTransaction and other extensions
   Description Stack: XML Schema; Service Description (WSDL); Service Capabilities; Message Sequencing; Orchestration (WS-BPEL), i.e., high-level state-transition + messaging diagrams between modules
   Discovery Stack: Inspection; Directory (UDDI)]
Enhanced and expanded from a figure in IBM’s “Web Services Insider”

Messaging Protocol: SOAP
• Simple Object Access Protocol: XML-based format for passing parameters
    Has a SOAP header and body inside an envelope
    Has a defined HTTP binding (POST with a content-type of application/soap+xml)
    A companion specification, SOAP Attachments, encapsulates other (MIME) data
• The header defines information about processing: encoding, signatures, etc.
    It’s extensible, and there’s a special attribute called mustUnderstand that is attached to elements that must be supported by the callee
• The body defines the actual application-defined data

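A SOAP Envelope
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://www.w3.org/2003/05/soap-envelope"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <SOAP-ENV:Header>
     <t:Transaction xmlns:t="www.mytrans.com" SOAP-ENV:mustUnderstand="1" />
  </SOAP-ENV:Header>
  <SOAP-ENV:Body>
     <m:PlaceOrder xmlns:m="www.somewhere/there">
       <orderno xsi:type="xsd:string">12</orderno>
     </m:PlaceOrder>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>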
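The mustUnderstand rule can be sketched as a check the callee runs before processing the body: any header block flagged as mandatory whose namespace the callee does not support means the message must be rejected. This uses the JDK's standard DOM parser; the class name, the idea of passing a set of "supported namespaces", and the choice of the SOAP 1.2 envelope namespace are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch of a callee enforcing SOAP's mustUnderstand rule on incoming header blocks. */
class MustUnderstandCheck {
    // Assumption: SOAP 1.2 envelope namespace (SOAP 1.1 uses http://schemas.xmlsoap.org/soap/envelope/).
    static final String SOAP_NS = "http://www.w3.org/2003/05/soap-envelope";

    /** Returns local names of mandatory header blocks whose namespace this callee does not support. */
    static List<String> notUnderstood(String envelopeXml, Set<String> supportedNamespaces) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(envelopeXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed envelope", e);
        }
        List<String> missing = new ArrayList<>();
        NodeList headers = doc.getElementsByTagNameNS(SOAP_NS, "Header");
        if (headers.getLength() == 0) {
            return missing; // the Header is optional
        }
        NodeList blocks = headers.item(0).getChildNodes();
        for (int i = 0; i < blocks.getLength(); i++) {
            Node node = blocks.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments
            }
            Element block = (Element) node;
            String flag = block.getAttributeNS(SOAP_NS, "mustUnderstand"); // "" if absent
            boolean mandatory = flag.equals("1") || flag.equals("true");
            String ns = block.getNamespaceURI();
            if (mandatory && (ns == null || !supportedNamespaces.contains(ns))) {
                missing.add(block.getLocalName());
            }
        }
        return missing;
    }
}
```

A non-empty result would be answered with a MustUnderstand fault rather than a normal reply.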

Making a SOAP Call
• To execute a call to service PlaceOrder:

   POST /PlaceOrder HTTP/1.1
   Host: my.server.com
   Content-Type: application/soap+xml; charset="utf-8"
   Content-Length: nnn

   <SOAP-ENV:Envelope …> … </SOAP-ENV:Envelope>
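The same request can be assembled with the java.net.http client API (Java 11 or later). This sketch only builds the request so it can be inspected; SoapPost.build and the endpoint URL are hypothetical, and the envelope string is whatever SOAP message you want to send.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds (but does not send) the HTTP POST that carries a SOAP envelope. */
class SoapPost {
    static HttpRequest build(String endpoint, String envelopeXml) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                // Host and Content-Length are restricted headers: the HTTP client fills them in.
                .header("Content-Type", "application/soap+xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(envelopeXml)) // UTF-8 encoded body
                .build();
    }
}
```

To actually send it, HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) would block until the response envelope arrives, mirroring the synchronous RPC model.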


SOAP Return Values
• If successful, the SOAP response will generally be another SOAP message with the return data values, much like the request
• On failure, the contents of the SOAP envelope will generally be a Fault message, along the lines of:

     <SOAP-ENV:Fault xmlns="mynamespace">
      <faultstring>Could not parse message</faultstring>
     </SOAP-ENV:Fault>
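On the client side, the first thing to do with a response is to decide whether it is a normal reply or a Fault. A minimal DOM-based sketch, assuming the response is a complete envelope in either the SOAP 1.1 or the SOAP 1.2 namespace; SoapFaults.faultText is a hypothetical helper that returns the faultstring text (or all of the Fault's text if there is no faultstring).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch: distinguish a normal SOAP reply from a Fault reply. */
class SoapFaults {
    // Envelope namespaces: SOAP 1.1 first, then SOAP 1.2.
    private static final String[] SOAP_NAMESPACES = {
        "http://schemas.xmlsoap.org/soap/envelope/",
        "http://www.w3.org/2003/05/soap-envelope"
    };

    /** Returns the fault's message if the response carries a Fault, or null for a normal reply. */
    static String faultText(String responseXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(responseXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed response", e);
        }
        for (String ns : SOAP_NAMESPACES) {
            NodeList faults = doc.getElementsByTagNameNS(ns, "Fault");
            if (faults.getLength() == 0) {
                continue;
            }
            Element fault = (Element) faults.item(0);
            NodeList children = fault.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // SOAP 1.1-style faults carry a human-readable <faultstring> child.
                if (child.getNodeType() == Node.ELEMENT_NODE && "faultstring".equals(child.getLocalName())) {
                    return child.getTextContent().trim();
                }
            }
            return fault.getTextContent().trim(); // otherwise fall back to all of the Fault's text
        }
        return null;
    }
}
```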

How Do We Declare Functions?
• WSDL is the interface definition language for web services
    Defines notions of protocol bindings, ports, and services
    Generally describes data types using XML Schema
• In CORBA, this was called an IDL
• In Java, the interface uses the same language as the Java code

A WSDL Service
[Figure: a Service containing several Ports; each Port is associated with a PortType, which groups several Operations, and with a Binding]

Web Service Terminology
• Service: the entire Web Service
• Port: maps a set of port types to a transport binding (a protocol, frequently SOAP, COM, CORBA, …)
• Port Type: abstract grouping of operations (roughly, an interface)
• Operation: the type of operation, e.g., request/response or one-way
    Input message and output message; maybe also fault
• Types: the XML Schema type definitions

Example WSDL
<service name="POService">
     <port binding="my:POBinding">
          <soap:address location="http://yyy:9000/POSvc"/>
     </port>
</service>
<binding xmlns:my="…" name="POBinding">
     <soap:binding style="rpc" transport="http://www.w3.org/2001/..." />
     <operation name="POrder">
         <soap:operation soapAction="POService/POBinding" style="rpc" />
         <input name="POrder">
              <soap:body use="literal" … namespace="POService" …/>
         </input>
         <output name="POrderResult">
              <soap:body use="literal" … namespace="POService" …/>
         </output>
     </operation>
</binding>

JAX-RPC: Java and Web Services
• To write a JAX-RPC web service “endpoint”, you need two parts:
    An endpoint interface: this is basically like the IDL
    An implementation class: your actual code

// BookQuote.java
public interface BookQuote extends java.rmi.Remote {
   public float getBookPrice(String isbn) throws java.rmi.RemoteException;
}
// BookQuote_Impl_1.java (a separate file, since both types are public)
public class BookQuote_Impl_1 implements BookQuote {
   public float getBookPrice(String isbn) { return 3.22f; } // 3.22 alone is a double
}

Different Options for Calling
• The conventional approach is to generate a stub, as in the RPC model described earlier
• You can also dynamically generate the call to the remote interface, e.g., by looking up an interesting function to call
• Finally, the “DII” (Dynamic Invocation Interface) method allows you to assemble the SOAP call on your own
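The second option, generating the call dynamically, can be illustrated with the JDK's java.lang.reflect.Proxy, which manufactures an object implementing an interface at run time. This is a sketch of the idea, not the JAX-RPC API: PriceLookup, DynamicCaller, and the string "request" format are hypothetical, and a Function stands in for sending a SOAP message and waiting for the reply.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.function.Function;

/** Hypothetical service interface, shaped like BookQuote but without java.rmi. */
interface PriceLookup {
    float getBookPrice(String isbn);
}

/** Sketch of the dynamic style: the client-side "stub" is generated at run time. */
class DynamicCaller {
    /**
     * Creates an object implementing iface whose every call is turned into a request string
     * and handed to transport (standing in for sending a SOAP message and awaiting the reply).
     */
    static <T> T proxyFor(Class<T> iface, Function<String, Object> transport) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object[] actual = (args == null) ? new Object[0] : args;
            String request = method.getName() + Arrays.toString(actual); // e.g. getBookPrice[123]
            return transport.apply(request); // must match the method's (boxed) return type
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}
```

Every call on the proxy, including toString() and equals(), reaches the handler, so a production version would special-case Object's methods.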

Creating a Java Web Service
• A compiler called wscompile is used to generate your WSDL file and stubs
    You need to start with a configuration file that says something about the service you’re building and the interfaces that you’re converting into Web Services

Example Configuration File
<?xml version="1.0" encoding="UTF-8"?>
<configuration xmlns="http://java.sun.com/xml/ns/jax-…">
   <service name="StockQuote" …>
       <interface name="stockqt.StockQuoteProvider" … />
   </service>
</configuration>

Starting a WAR
• The Web-application counterpart of a Java JAR file is a Web Archive (WAR)
• There’s a tool called wsdeploy that generates WAR files
• Generally this will automatically be called from a build tool such as Ant
• Finally, you may need to add the WAR file to the appropriate location in Apache Tomcat (or WebSphere, etc.) and enable it
• See WSPack2/jaxrpc.html for a detailed example

Finding a Web Service
• UDDI: Universal Description, Discovery, and Integration registry
• Think of it as DNS for web services
    It’s a replicated database, hosted by IBM, HP, SAP, MS
• UDDI takes SOAP requests to add and query web service interface data

What’s in UDDI
White pages:
    Information about business names, contact info, Web site name, etc.
Yellow pages:
    Types of businesses, locations, products
    Includes predefined taxonomies for location, industry, etc.
Green pages – what we probably care the most about:
    How to interact with business services; business process definitions;
    Pointer to WSDL file(s)
    Unique ID for each service

Data Types in UDDI
• businessEntity: top-level structure describing info about the business
• businessService: name and description of a service
• bindingTemplate: how to access the service
• tModel (t = type/technical): unique identifier for each service-template specification
• publisherAssertion: describes relationship between businessEntities (e.g., department, division)

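Relationships between UDDI Data Types
[Figure: cardinalities among the structures: a businessEntity contains businessServices; each businessService contains 1:n bindingTemplates; bindingTemplates and tModels are related n:m]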
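The containment relationships can be sketched as plain Java classes: a businessEntity holds businessServices, each businessService holds bindingTemplates, and bindingTemplates refer to tModels by key. The classes and the findByTModel query are hypothetical and heavily simplified (no publisherAssertions, no SOAP inquiry API); the query is loosely the kind of green-pages lookup described earlier: find every endpoint that implements a given technical specification.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, heavily simplified in-memory model of the core UDDI structures. */
class UddiSketch {
    /** How to reach a service, plus the keys of the tModels (specs) it implements. */
    static class BindingTemplate {
        final String accessPoint;
        final List<String> tModelKeys;

        BindingTemplate(String accessPoint, List<String> tModelKeys) {
            this.accessPoint = accessPoint;
            this.tModelKeys = tModelKeys;
        }
    }

    /** A named service; one service has many binding templates. */
    static class BusinessService {
        final String name;
        final List<BindingTemplate> bindings = new ArrayList<>();

        BusinessService(String name) {
            this.name = name;
        }
    }

    /** Top-level business record; one business offers many services. */
    static class BusinessEntity {
        final String name;
        final List<BusinessService> services = new ArrayList<>();

        BusinessEntity(String name) {
            this.name = name;
        }
    }

    /** Returns the access point of every binding that implements the given tModel. */
    static List<String> findByTModel(List<BusinessEntity> registry, String tModelKey) {
        List<String> hits = new ArrayList<>();
        for (BusinessEntity entity : registry) {
            for (BusinessService service : entity.services) {
                for (BindingTemplate binding : service.bindings) {
                    if (binding.tModelKeys.contains(tModelKey)) {
                        hits.add(binding.accessPoint);
                    }
                }
            }
        }
        return hits;
    }
}
```

Note that the lookup can only match on exact tModel keys, which foreshadows the limitation discussed next: UDDI cannot find services "like Y".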

Example UDDI businessEntity
<businessEntity businessKey="0123…" xmlns="urn:uddi-org:api_v2">
    <discoveryURLs>
        <discoveryURL useType="businessEntity"> … </discoveryURL>
    </discoveryURLs>
    <name>My Books</name>
    <description>Technical Book Wholesaler</description>
    …
    <categoryBag> … </categoryBag>   <!-- keyedReferences to tModels -->
</businessEntity>

UDDI in Perspective
• Original idea was that it would just organize itself in a way that people could find anything they wanted
• Today UDDI is basically a very simple catalog of services, which can be queried with standard APIs
    It’s not clear that it really does what people want: they want to find services “like Y” or “that do Z”

The Problem: With UDDI and Plenty of Other Situations
There’s no universal, unambiguous way of describing “what I mean”
    Relational database idea of “normalization” doesn’t convert concepts
     into some normal form – it just helps us cluster our concepts in
     meaningful ways
    “Knowledge representation” tries to encode definitions clearly – but
     even then, much is up to interpretation

The best we can do: describe how things relate
    pollo = chicken = poulet = 雞 = 鸡 = jī = मुर्गी = murg
    Note that this mapping may be imprecise or situation-specific!
        Calling someone a chicken, vs. a chicken that’s a bird

This Brings Us to XQuery, Whose Main Role Is to Relate XML
Suppose we define an XML schema for our target data and our source data

Can directly translate between XML schemas or structures
     Describes a relationship between two items
         Transform 2 into 6 by “add 4” operation
         Convert from S1 to S2 by applying the query described by view V

Often, we don’t need to transfer all data – instead, we want to use the data
   at one source to help answer a query over another source…

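Lazy Evaluation: A Virtual View
[Figure: on the server(s), Source1.xml is exposed through an XQuery as a virtual XML document; a query form in the browser/app poses an XQuery over it, the server evaluates the composed XQuery, and the query results are rendered with XSLT]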
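The point of a virtual view is that nothing is translated until a query asks for it. A small sketch of that idea, with Java streams standing in for XQuery (an assumption; VirtualView and its conversion counter are hypothetical): the view definition is composed with the user's query, and because streams are lazy and findFirst short-circuits, only as many source items are converted as the query needs.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Predicate;

/** Sketch of a virtual view: source items are converted only when a query pulls them. */
class VirtualView<S, T> {
    private final List<S> source;          // stands in for Source1.xml
    private final Function<S, T> viewDef;  // stands in for the view's XQuery
    final AtomicInteger conversions = new AtomicInteger(); // how many times viewDef ran

    VirtualView(List<S> source, Function<S, T> viewDef) {
        this.source = source;
        this.viewDef = viewDef;
    }

    /** Composes the user's query with the view definition and evaluates the result lazily. */
    Optional<T> findFirst(Predicate<T> query) {
        return source.stream()
                .map(s -> {
                    conversions.incrementAndGet();
                    return viewDef.apply(s);
                })
                .filter(query)
                .findFirst(); // short-circuits: stops converting once a match is found
    }
}
```

A materialized view would instead convert every source item up front, paying the full cost even when the query needs only one answer.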

Let’s Look at Some Simple Mappings
• Beginning with examples of using XQuery to convert from one schema to another, e.g., to import data
• First: let’s review what our mappings need to do

Challenges of Mapping Schemas
In a perfect world, it would be easy to match up items from one
   schema with another
    Each element would have a simple correspondence to an element in
     the other schema
    Every value would clearly map to a value in the other schema
Real world: as with human languages, things don’t map clearly!
      Different decompositions into elements
      Different structures
      Tag name vs. value
      Values may not exactly correspond
      It may be unclear whether a value is the same
It’s a tough job, but often things can be mapped

Example Schemas
Bob’s Movie Database            Mary’s Art List
  <movie>                         <workOfArt>
   <title>…</title>                <id>…</id>
   <year>…</year>                  <type>…</type>
   <director>…</director>          <artist>…</artist>
   <editor>…</editor>              <subject>…</subject>
   <star>…</star>*                 <title>…</title>
  </movie>*                       </workOfArt>*

  Want to map data from one schema to the other
Mapping Bob’s Movies → Mary’s Art
Start with the schema of the output as a template:
Then figure out where to find the values in the source,
  and create XPaths

The Final Schema Mapping
Mary’s Art ← Bob’s Movies
    for $m in doc("movie.xml")//movie,
        $a in $m/director/text(),
        $i in $m/title/text(),
        $t in $m/title/text()
    return <workOfArt>
             <id>{$i}</id>
             <type>movie</type>
             <artist>{$a}</artist>
             <title>{$t}</title>
           </workOfArt>
Note the absence of subject: we had no reasonable source, so we are leaving it out.
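For comparison, the same mapping can be hand-coded in Java with the JDK's DOM and XPath APIs. This is a sketch, not an XQuery engine: MovieToArt is hypothetical, it assumes one title per movie, uses the title as the id (which is what the $i binding suggests), and builds output strings directly rather than constructing XML nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch of the Bob's Movies to Mary's Art mapping, hand-coded with DOM and XPath. */
class MovieToArt {
    /** Returns one workOfArt fragment per (movie, director) pair, as the for-clause iterates. */
    static List<String> map(String movieXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(movieXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList movies = (NodeList) xpath.evaluate("//movie", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < movies.getLength(); i++) {
                Node movie = movies.item(i);
                String title = xpath.evaluate("title", movie); // assumes one title per movie
                NodeList directors = (NodeList) xpath.evaluate("director", movie, XPathConstants.NODESET);
                for (int j = 0; j < directors.getLength(); j++) {
                    String artist = directors.item(j).getTextContent();
                    result.add("<workOfArt>"
                            + "<id>" + escape(title) + "</id>"       // id taken from the title
                            + "<type>movie</type>"
                            + "<artist>" + escape(artist) + "</artist>"
                            + "<title>" + escape(title) + "</title>" // no subject: no source for it
                            + "</workOfArt>");
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not map movie data", e);
        }
    }

    /** Minimal escaping so text values cannot break the generated markup. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The declarative XQuery states only where each output value comes from; the Java version has to spell out iteration, string building, and escaping, which is much of the argument for a mapping language.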

