Web Services by yaofenji


									Reliable Distributed Systems

         Web Services
   Web Services – Introduction
   “Remote Procedure Call” in WS
       Binding, Marshalling…
   Using TCP as the transport for RPCs
       Connectivity Issues: NAT, Firewall
What are Web Services?
   Today, we normally use Web browsers
    to talk to Web sites
       Browser names document via URL (lots of
        fun and games can happen here)
       Request and reply encoded in HTML, using
        HTTP to issue request to the site
   Web Services generalize this model so
    that computers can talk to computers
   [Figure: a Client System connected to a SOAP Router]

What are Web Services?
   “Web Services are software
    components described via
    WSDL which are capable of
    being accessed via standard
    network protocols such as SOAP
    over HTTP.”

   Today, SOAP is the primary standard. SOAP provides rules
    for encoding the request and its arguments.

   Similarly, the architecture doesn’t assume that all access
    will employ HTTP over TCP. In fact, .NET uses Web Services
    “internally” even on a single machine. But in that case,
    communication is over COM.

   WSDL documents are used to drive object generation
    and other tools.
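Web Services are often Front Ends
   [Figure: on the client platform, applications (e.g., a C# app
    or a CORBA app) go through a Web Service invoker. SOAP
    messaging connects them to the server platform, where a Web
    server (e.g., IBM WebSphere, BEA WebLogic) hosts a
    WSDL-described Web Service in front of back-end systems
    such as SAP.]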
The Web Services “stack”
   Layers, listed from top to bottom:
       BPEL4WS (IBM only, for now)
       Quality of Service: Security, Transactions, Coordination
       Description: WSDL, UDDI, Inspection
       Messaging: SOAP, XML, Encoding, Other Protocols
       Transport: TCP/IP or other network transport protocols
What are Web Services?
   Amazon would hand out
    “serverlets” for 3rd party
    developers to use
   This connects their applications
    directly to Amazon’s system


Advantages of web services?*
   Web services provide interoperability between various
    software applications running on various platforms.
       “vendor, platform, and language agnostic”
   Web services leverage open standards and protocols.
    Protocols and data formats are text based where possible
       Easy for developers to understand what is going on.
   By piggybacking on HTTP, web services can work through
    many common firewall security measures without requiring
    changes to their filtering rules.

        *: From Wikipedia
How Web Services work
   First the client discovers the service.
   Typically, client then binds to the server
       By setting up TCP connection to the
        discovered address.
       But binding not always needed.
How it works…
   Next build the SOAP request: (Marshalling)
       Fill in what service is needed, and the arguments.
        Send it to server side.
       XML is the standard for encoding the data (but is
        very verbose and results in HUGE overheads)
   SOAP router routes the request to the
    appropriate server (assuming more than one
    available server)
       Can do load balancing here.
How it works…
   Server unpacks the request (Demarshalling),
    handles it, computes the result.
   Result sent back in the reverse
    direction: from the server to the SOAP
    router back to the client.
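
The marshalling and demarshalling steps above can be sketched in a few lines of Python. This is only an illustration using the standard library: the envelope namespace is the SOAP 1.1 one, while the service namespace, the GetTemperature method and the zipcode argument are hypothetical names invented for the example. No network is involved; the bytes are simply handed from the "client" function to the "server" function.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:temperature"  # hypothetical service namespace


def marshal_request(method, args):
    """Client side: encode a method name and its arguments as a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{method}")
    for name, value in args.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def demarshal_request(data):
    """Server side: recover the method name and arguments from the bytes."""
    envelope = ET.fromstring(data)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = body[0]  # the single element inside Body is the call
    method = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    args = {child.tag: child.text for child in call}
    return method, args


# Hypothetical request: method and argument names are made up.
request = marshal_request("GetTemperature", {"zipcode": "14850"})
method, args = demarshal_request(request)
print(method, args)
print(len(request), "bytes on the wire for one five-character argument")
```

The last line gives a feel for the verbosity of XML mentioned above: the envelope is many times larger than the argument it carries.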
Marshalling Issues
   Data exchanged between client and
    server needs to be in a platform
    independent format.
       “Endian”ness differs between machines.
       Data alignment issue (16/32/64 bits)
       Multiple floating point representations.
       Pointers
       (Have to support legacy systems too)
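
A small sketch of the byte-order issue, using Python's standard struct module. The values are arbitrary; the point is that the same integer has two different byte layouts, and that agreeing on one wire format (here, network byte order with fixed sizes) removes the ambiguity.

```python
import struct

value = 0x01020304

little = struct.pack("<I", value)  # little-endian 32-bit unsigned
big = struct.pack(">I", value)     # big-endian, i.e. "network order"

print(little.hex())  # 04030201
print(big.hex())     # 01020304

# A receiver that assumes the wrong byte order silently gets a different number.
(misread,) = struct.unpack("<I", big)
print(hex(misread))  # 0x4030201

# "!" selects network byte order with standard sizes and no padding,
# so both ends agree on layout regardless of platform.
wire = struct.pack("!id", -7, 2.5)  # 4-byte int followed by 8-byte IEEE 754 double
print(len(wire), struct.unpack("!id", wire))
```

Pointers have no such fix: an address is meaningless on another machine, so the data it refers to has to be copied into the message instead.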
Discovery
   This is the problem of finding the
    “right” service
       In our example, we saw one way to do it –
        with a URL
       Web Services community favors what they
        call a URN: Uniform Resource Name
   But the more general approach is to use
    an intermediary: a discovery service
Example of a repository
Name                                         Type            Publisher     Toolkit       Language      OS
Web Services Performance and Load Tester     Application     LisaWu                      N/A           Cross-Platform
Temperature Service Client                   Application     vinuk         Glue          Java          Cross-Platform
Weather Buddy                                Application     rdmgh724890   MS .NET       C#            Windows
DreamFactory Client                          Application     billappleton  DreamFactory  Javascript    Cross-Platform
Temperature Perl Client                      Example Source  gfinke13                    Perl          Cross-Platform
Apache SOAP sample source                    Example Source  xmethods.net  Apache SOAP   Java          Cross-Platform
ASS 4                                        Example Source  TVG           SOAPLite      N/A           Cross-Platform
PocketSOAP demo                              Example Source  simonfell     PocketSOAP    C++           Windows
easysoap temperature                         Example Source  a00           EasySoap++    C++           Windows
Weather Service Client with MS-Visual Basic  Example Source  oglimmer      MS SOAP       Visual Basic  Windows
TemperatureClient                            Example Source  jgalyan       MS .NET       C#            Windows
Repository summary
   A database listing servers
   Each is described using the UDDI language,
    which is defined over XML
       Hence can be searched with XML queries
   An extensible standard
       Defines some required information about
        interfaces available and argument types, etc
       But services can provide extra information too.
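
To illustrate what "searched with XML queries" can look like, here is a minimal sketch using Python's standard ElementTree module. The element and attribute names below are invented for the example and are not the real UDDI schema; the entries are a few rows from the example repository.

```python
import xml.etree.ElementTree as ET

# Illustrative registry only: this layout is an assumption, not UDDI itself.
REGISTRY = """
<registry>
  <entry type="Application" language="Java" os="Cross-Platform">
    <name>Temperature Service Client</name><toolkit>Glue</toolkit>
  </entry>
  <entry type="Application" language="C#" os="Windows">
    <name>Weather Buddy</name><toolkit>MS .NET</toolkit>
  </entry>
  <entry type="Example Source" language="C#" os="Windows">
    <name>TemperatureClient</name><toolkit>MS .NET</toolkit>
  </entry>
  <entry type="Example Source" language="C++" os="Windows">
    <name>PocketSOAP demo</name><toolkit>PocketSOAP</toolkit>
  </entry>
</registry>
"""

root = ET.fromstring(REGISTRY)


def find_names(path):
    """Return the <name> of every entry matched by an XPath-style query."""
    return [entry.findtext("name") for entry in root.findall(path)]


# All entries written in C#.
csharp = find_names("./entry[@language='C#']")
# Example sources built with the MS .NET toolkit.
dotnet_examples = find_names("./entry[@type='Example Source'][toolkit='MS .NET']")

print(csharp)
print(dotnet_examples)
```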
   UDDI is used to write down the
    information that became a “row” in the
     repository (“I have a temperature…”)
   WSDL documents the interfaces and
    data types used by the service
   But this isn’t the whole story…
Discovery and naming
   The topic raises some tough questions
       Many settings, like the big data centers run
        by large corporations, have rather standard
        structure. Can we automate discovery?
       How to debug if applications might
        sometimes bind to the wrong service?
       Delegation and migration are very tricky
       Should a system automatically launch
        services on demand?
Example: Why discovery is hard
   Client has opinions
       “I want current map data for Disneyland showing
        line-lengths for the rides right now”
   Service has opinions
       Amazon.com would like requests from Ithaca to
        go to the NJ-3 datacenter, and if possible, to the
        same server instance within each clustered service
   DNS has opinions
       Many systems play with name -> IP bindings
   Internet has opinions (routing)
So, what’s tricky?
   Web Services doesn’t standardize these
    four steps; it just assumes that people
    will hack solutions
   Hence some are hard to implement, we
    lack standards, and in some cases,
    solutions are poor ones
   UDDI and WSDL are just a corner of the
    overall picture!
Network address translation…
   Another issue: Often, the internal address is
    not addressable from outside!
       A tiny bit of security.
       But if RPC server is behind a NAT, trouble!
            NAT needs the host behind it to start the connection
            Need to configure NAT to let specified traffic through.
            Generally, HTTP (WS traffic) is let through.
       Tough to have a connection in between two hosts
        behind NATs.
            There are some tricks to bypass this though.
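
A toy model of a NAT's translation table may help show why a server behind a NAT is unreachable until something is configured. This is a deliberately simplified sketch: the class, the starting port number and the addresses (taken from the private and documentation ranges) are all assumptions made for the example, and real NATs also track the remote endpoint and protocol.

```python
class Nat:
    """Toy NAT: maps (private ip, port) pairs to ports on one public address."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.next_port = 40000  # arbitrary starting port for this sketch
        self.out = {}           # (private_ip, private_port) -> public_port
        self.back = {}          # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """A host behind the NAT opens a connection: create or reuse a mapping."""
        key = (private_ip, private_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.out[key]

    def inbound(self, public_port):
        """A packet arrives from outside: deliver only if a mapping exists."""
        return self.back.get(public_port)  # None means the packet is dropped

    def forward(self, public_port, private_ip, private_port):
        """Manual configuration: let specified traffic through to a server."""
        self.back[public_port] = (private_ip, private_port)


nat = Nat("203.0.113.5")

# An outside client tries to reach an RPC server behind the NAT: no mapping yet.
print(nat.inbound(8080))            # None

# The inside host starts a connection, so replies can find their way back.
_, port = nat.outbound("10.0.0.7", 5555)
print(nat.inbound(port))            # ('10.0.0.7', 5555)

# After explicit configuration, the server becomes reachable from outside.
nat.forward(8080, "10.0.0.9", 80)
print(nat.inbound(8080))            # ('10.0.0.9', 80)
```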
Firewalls
   These allow/disallow traffic, depending on source,
    destination, protocol used, etc.
       Often only allow connection from the inside to the
        outside.
   Stateful: remember active flows, and disallow unexpected
    packets (NAT)
       Again, need to configure to ensure server traffic gets
        through. (General RPC)
       Again, HTTP (WS) does not face as much of a restriction.
   Get traffic statistics.
   Spam/virus checking, etc.
   NAT and firewall typically in the same box.
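
The "stateful" behaviour described above can be sketched as follows. This is a minimal illustration, not how any particular firewall product works: the class name, the rule format (a set of inbound ports that are always allowed) and the addresses are assumptions made for the example.

```python
class StatefulFirewall:
    """Toy stateful filter: inside hosts may connect out, replies are allowed
    back in, and unsolicited inbound traffic needs an explicit rule."""

    def __init__(self, allowed_inbound_ports):
        self.allowed_inbound_ports = set(allowed_inbound_ports)
        self.flows = set()  # (inside_host, outside_host, outside_port)

    def outbound(self, inside, outside, outside_port):
        """Connections from the inside are allowed and remembered."""
        self.flows.add((inside, outside, outside_port))
        return True

    def inbound(self, outside, outside_port, inside, inside_port):
        """Allow replies on active flows, or traffic matching a configured rule."""
        if (inside, outside, outside_port) in self.flows:
            return True
        return inside_port in self.allowed_inbound_ports


# Hypothetical configuration: only HTTP (port 80) is let in from outside.
fw = StatefulFirewall(allowed_inbound_ports={80})

print(fw.inbound("198.51.100.2", 4000, "10.0.0.9", 80))    # True: HTTP rule
print(fw.inbound("198.51.100.2", 4000, "10.0.0.9", 9000))  # False: general RPC port
fw.outbound("10.0.0.7", "198.51.100.2", 443)
print(fw.inbound("198.51.100.2", 443, "10.0.0.7", 51000))  # True: reply on active flow
```

The second call shows the point made about general RPC: a server listening on an arbitrary port is blocked until the firewall is reconfigured, whereas traffic carried over HTTP passes.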
Demilitarized Zone (DMZ)
   DMZ: used to host publicly accessible services like
    company webpages, ftp, dns.
   Good place to host the Web Service.
   DMZ situated outside the private network.
   No outgoing connections from the DMZ.
   If DMZ attacked, damage limited to DMZ.
Client talks to eStuff.com
   Moving on… let’s oversimplify and just
    assume the client manages to find the
    data center
   We think of remote method invocation
    and Web Services as a simple chain:

   [Figure: Client system -> SOAP RPC -> SOAP router -> Web Services]
So… suppose we get in
   Assuming we can connect to the data
    center (to its Web Services router),
    then what?
   If you just use Visual Studio out of the
    box, you end up with a single-machine
    Web Server
   But massive datacenters are common!
A glimpse inside eStuff.com

   [Figure: “front-end applications” at the top; below them,
    pub-sub combined with point-to-point communication
    technologies like TCP; at the bottom, a row of services,
    each behind its own load balancer (LB)]
Clusters and load balancing
   Idea here is that some form of load
    balancer spreads work over a cluster
   And cluster replicates data for
    availability and load management
   How it does this is a topic we need to
    discuss in more detail (not today)
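
As a rough illustration of "spreads work over a cluster", here is a minimal round-robin load balancer in Python. Round-robin is just one simple policy chosen for the sketch; the slides do not say which policy eStuff.com uses, and the server names are hypothetical.

```python
import itertools
from collections import Counter


class RoundRobinBalancer:
    """Hands each incoming request to the next server in the cluster, in turn."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def route(self, request):
        """Return the (server, request) pair chosen for this request."""
        return next(self._cycle), request


# Hypothetical three-member cluster for one service.
lb = RoundRobinBalancer(["svc-1", "svc-2", "svc-3"])
assignments = [lb.route(f"req-{i}")[0] for i in range(6)]

print(assignments)
print(Counter(assignments))  # each server receives the same share
```

Note that this policy ignores which server may already hold cached data for a client, which is one reason discovery and routing get harder later in the lecture.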
What about “legacy” applications?
   Some of these Web services are really just
    front-ends to older legacy applications
       So to talk to an old IBM database, we might
            Run the database on some sort of machine, or virtual
             machine
            Build one of these translator front-ends
            And then register it with the Web Services router
   This may sound expensive (it is) but it works!
   Obviously, our fancy clustering and load-balancing
    won’t apply to a legacy application, so those fancy
    tricks are only for “new” code
Discovery in eStuff.com
   Data centers are increasingly common
   And they raise hard questions!
       How can a data center in California control
        decisions a client is making in Ithaca?
       Services are clustered. How should a client
        request be “routed” to the right member?
       Once you start talking to a server it may
        cache data for you. How can you be sure
        to get the right one next time?
These are modern challenges
   Web Services can be seen as evolving
    from prior work
   Most often cited: CORBA, which also
    was used in many big data centers
   But CORBA didn’t assume that clients
    came in over the public Internet
       More often, CORBA was used between a
        hand-built client and the service it talks to
CORBA approach
   CORBA had what are called
       Ways to export specialized client stubs
            The client stub could include server provided
             decision logic, like “which data center to
             connect with”
            Gives data center a form of remote control
       Factory services: manufacture certain kinds
        of objects as needed
            Effect was that “discovery” can also be a
             “service creation” activity
CORBA is object oriented
   Seems obvious… and it is. CORBA is centered
    around the notion of an object
       Objects can be passive (data)
       … active (programs)
       … persistent (data that gets saved)
       … volatile (state only while running)
   In CORBA the application that manages the object is
    inseparable from the object
       And the stub on the client side is part of the application
       The request per se is an action by the object on itself and
        could even exploit various special protocols
       We can’t do this in Web Services
Web Services are document-oriented
   That is, communication is by sending documents (like
    pages) from client to server and back
   And most guarantees or properties are associated
    with the document itself, not the service
       For example, WS-Reliability isn’t about making services
        reliable, it defines rules for writing reliability requests down
        and attaching them to documents
       In contrast, CORBA fault-tolerance standard tells how to
        make a CORBA service into a highly available clustered
        service
Will Web Services “help” with naming and discovery?
   Web Services tells us how
       One client can…
       … find one server and
       … bind to that server and
       … send a request that will make sense
       … and make sense of the response
   So sure, WS will help
But Web Services won’t…
   Allow the data center to control decisions the
    client makes
   Assist us in implementing naming and
    discovery in scalable cluster-style services
       How to load balance? How to replicate data?
        What precisely happens if a node crashes or one
        is launched while the service is up?
       Help with dynamics. For example, best server for
        a given client can be a function of load but also
        affinity, recent tasks, etc
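
To make "a function of load but also affinity" concrete, here is one possible scoring rule, sketched in Python. Everything in it is an assumption for illustration: the load figures, the size of the affinity bonus and the server names are invented, and Web Services defines no such mechanism, which is exactly the gap described above.

```python
def pick_server(client, servers, load, recent, affinity_bonus=0.5):
    """Pick the server with the lowest score for this client.

    load:   server -> current load in [0, 1]
    recent: client -> server that served it last (and may hold cached data)
    A server the client used recently gets its score reduced by affinity_bonus.
    """
    def score(server):
        bonus = affinity_bonus if recent.get(client) == server else 0.0
        return load[server] - bonus

    return min(servers, key=score)


servers = ["a", "b"]
recent = {"alice": "b"}  # alice's data is probably cached on b

# b is busier, but affinity still wins for alice; bob just gets the idle server.
print(pick_server("alice", servers, {"a": 0.4, "b": 0.6}, recent))   # b
print(pick_server("bob", servers, {"a": 0.4, "b": 0.6}, recent))     # a

# Once b is close to saturation, load outweighs affinity even for alice.
print(pick_server("alice", servers, {"a": 0.4, "b": 0.95}, recent))  # a
```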
How we do it now
   Client queries directory to find the service
   Server has several options:
       Web pages with dynamically created URLs
            Server can point to different places, by changing host names
            Content hosting companies remap URLs on the fly. E.g.
             http://www.akamai.com/www.cs.cornell.edu (reroutes
             requests for www.cs.cornell.edu to Akamai)
       Server can control mapping from host to IP addr.
            Must use short-lived DNS records; overheads are very high!
            Can also intercept incoming requests and redirect on the fly
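
The cost of short-lived DNS records can be seen with a toy caching resolver. This sketch is an assumption-laden model, not real DNS: the resolver class, the one-lookup-per-second workload, the host name and the address (from a documentation range) are all made up. It only counts how often the cache misses and the authoritative server must be asked.

```python
class CachingResolver:
    """Toy resolver that caches name -> IP answers until their TTL expires."""

    def __init__(self, authority):
        self.authority = authority  # callable: name -> (ip, ttl_seconds)
        self.cache = {}             # name -> (ip, expiry_time)
        self.misses = 0             # lookups that had to go to the authority

    def resolve(self, name, now):
        hit = self.cache.get(name)
        if hit is not None and now < hit[1]:
            return hit[0]
        self.misses += 1
        ip, ttl = self.authority(name)
        self.cache[name] = (ip, now + ttl)
        return ip


def count_misses(ttl, seconds=600):
    """One lookup per second for `seconds` seconds; return the cache misses."""
    resolver = CachingResolver(lambda name: ("192.0.2.10", ttl))
    for now in range(seconds):
        resolver.resolve("www.example.com", now)
    return resolver.misses


print(count_misses(ttl=300))  # 2 trips to the authority in ten minutes
print(count_misses(ttl=5))    # 120 trips: the price of fast remapping
```

A short TTL lets the data center change the mapping within seconds, but every expiry turns into another request that reaches the data center's name servers.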
Why this isn’t good enough
   The mechanisms aren’t standard and are
    hard to implement
       Akamai, for example, does content hosting using
        all sorts of proprietary tricks
   And they are costly
       The DNS control mechanisms force DNS cache
        misses and hence many requests do RPC to the
        data center
   We lack a standard, well supported, solution!
Coming up?
   How content is managed in even larger
    systems that have multiple data centers
   The main example is Akamai…
