Web Services Globus OGSA - PowerPoint Presentation by liwenting


									Grid Technology A
  Web Services
  Globus OGSA
     & Grid
Prosperity Games
   August 2003
      Geoffrey Fox
  Community Grids Lab
   Indiana University
                 With Thanks to
• Tony Hey my co-speaker at CERN and

•   I adapted presentations from
•   Marlon Pierce
•   Dennis Gannon
•   Globus
•   Malcolm Atkinson
•   David de Roure
What is a High Performance Computer?
• We might wish to consider three classes of multi-node computers
• 1) The Grid or distributed systems of computers around the
   – Latencies of inter-node communication – 100’s of milliseconds but can have
     good bandwidth
• 2) The classic cluster which can vary from configurations like 1) to
  3) but typically have millisecond latency and modest bandwidth
• 3) Classic MPP with microsecond latency and scalable internode
  bandwidth (tcomm/tcalc ~ 10 or so)
• All have same peak CPU performance but synchronization costs
  decrease as one goes from 1) to 3)
• Cost of system (dollars per gigaflop) increases by factors of 2 at
  each step from 1) to 2) to 3)
• One should NOT use classic MPP if class 1) or 2) suffices unless
  some security or data issue dominates over cost-performance
                  What is a Grid I?
• Collaborative Environment (Ch2.2,18)
• Combining powerful resources, federated computing and a security
  structure (Ch38.2)
• Coordinated resource sharing and problem solving in dynamic multi-
  institutional virtual organizations (Ch6)
• Data Grids as Managed Distributed Systems for Global Virtual
  Organizations (Ch39)
• Distributed Computing or distributed systems (Ch2.2,10)
• Enabling Scalable Virtual Organizations (Ch6)
• Enabling use of enterprise-wide systems, and someday nationwide
  systems, that consist of workstations, vector supercomputers, and
  parallel supercomputers connected by local and wide area networks.
  Users will be presented the illusion of a single, very powerful
  computer, rather than a collection of disparate machines. The system
  will schedule application components on processors, manage data
  transfer, and provide communication and synchronization in such a
  manner as to dramatically improve application performance. Further,
  boundaries between computers will be invisible, as will the location
  of data and the failure of processors. (Ch10)
                What is a Grid II?
• Supporting e-Science representing increasing global collaborations of
  people and of shared resources that will be needed to solve the new
  problems of Science and Engineering (Ch36)
• As infrastructure that will provide us with the ability to dynamically
  link together resources as an ensemble to support the execution of
  large-scale, resource-intensive, and distributed applications. (Ch1)
• Makes high-performance computers superfluous (Ch6)
• Metasystems or metacomputing systems (Ch10,37)
• Middleware as the services needed to support a common set of
  applications in a distributed network environment (Ch6)
• Next Generation Internet (Ch6)
• Peer-to-peer Network (Ch10, 18)
• Realizing thirty year dream of science fiction writers that have spun
  yarns featuring worldwide networks of interconnected computers that
  behave as a single entity. (Ch10)
           SERVOGrid Caricature
[Figure: Repositories and Federated Databases, Sensor Nets and Streaming Data feed Loosely Coupled Filters and Closely Coupled Compute Nodes, leading to Analysis and Visualization]
       What is Grid Technology?
• Grids support distributed collaboratories or virtual
  organizations integrating concepts from
• The Web
• Distributed Objects (CORBA Java/Jini COM)
• Globus Legion Condor NetSolve Ninf and other High
  Performance Computing activities
• Peer-to-peer Networks
• With perhaps the Web being the most important for
  “Information Grids” and Globus for “Compute Grids”
• Use Information Grids and not usual Data Grids as
  “distributed file systems” (holding lots of data!) are
  handled in Compute Grids
PPPH: Paradigms Protocols Platforms and Hosting I
• We will start from the Web view and assert that
  basic paradigm is
• Meta-data rich Web Services communicating via messages
• These have some basic support from some runtime
  such as .NET, Jini (pure Java), Apache
  Tomcat+Axis (Web Service toolkit), Enterprise
  JavaBeans, WebSphere (IBM) or GT3 (Globus
  Toolkit 3)
  – These are the distributed equivalent of operating
    system functions as in UNIX Shell
• Called Hosting Environment or platform
       SERVOGrid Requirements
• Seamless Access to Data repositories and large scale
• Integration of multiple data sources including sensors,
  databases, file systems with analysis system
   – Including filtered OGSA-DAI
• Rich meta-data generation and access with SERVOGrid
  specific Schema extending openGIS standards and using
  Semantic Grid
• Portals with component model for user interfaces and web
  control of all capabilities
• Collaboration to support world-wide work
• Basic Grid tools: workflow and notification
• Build on e-Science methodology and Grid
• Science applications with multi-scale models, scalable parallelism,
  data assimilation as key issues
  – Data-driven models for earthquakes, climate, environment …
• Use existing code/database technology (SQL/Fortran/C++) linked to
  “Application Web/OGSA services”
  – XML specification of models, computational steering, scale
    supported at “Web Service” level as don’t need “high performance” here
  – Allows use of Semantic Grid technology
[Figure labels: Application WS; WS linking to user and Other WS (data sources); Typical]
   Integration of Data and Filters
• One has the OGSA-DAI Data repository interface
  combined with WSDL of the (Perl, Fortran, Python …)
• User only sees WSDL not data syntax
• Some non-trivial issues as to where the filtering compute
  power is
   – Microsoft says filter next to data

[Figure: WSDL of Filter alongside the OGSA-DAI Interface]
SERVOGrid Complexity Computing Environment


[Figure: Database Service, Compute Service and Sensor Service sit behind a Middle Tier with XML Interfaces. The middle tier holds the Application Services, the XML Meta-data Service, the Complexity Service, Visualization Service and CCE Control (Portal Aggregation), which faces the Users]
[Figure: layered stack inside a Hosting Environment, top to bottom]
   Portal such as “Jetspeed”
   AWS, AWS, AWS, AWS
   Application/User Framework supporting development and deployment
     of OGSI compliant AWS (Application Web Services)
   Generic Application Services
   OGSA Interoperability Layer
   “Sophisticated” System Services
   OGSA Interoperability Layer
   Resource Grid Services
   Database and Resources (e.g. DAI compliant)
[Side labels: Grid Computing or Programming Environments; Web Services; Grid]
Taxonomy of Grid Operational Style
Name of Grid Style      Description of Grid Operational or Architectural Style

Semantic Grid           Integration of Grid and Semantic Web meta-data
                        and ontology technologies
Peer-to-peer Grid       Grid built with peer-to-peer mechanisms
Lightweight Grid        Grid designed for rapid deployment and minimum
                        life-cycle support costs
Collaboration Grid      Grid supporting collaborative tools like the Access
                        Grid, whiteboard and shared applications
R3 or Autonomic Grid    Fault tolerant and self-healing Grid
                        (R3: Robust, Reliable, Resilient)
      Taxonomy of Grid Functionalities
Name of Grid Type       Description of Grid Functionality

Compute/File Grid       Run multiple jobs with distributed compute and data
or Data File Grid       resources (Global “UNIX Shell”)
Desktop Grid            “Internet Computing” and “Cycle Scavenging” with secure
e.g. SETI@Home          sandbox on large numbers of untrusted computers
Information Grid        Grid service access to distributed information, data and
or Data Service Grid    knowledge repositories
Complexity or           Hybrid combination of Information and Compute/File Grid
Hybrid Grid             emphasizing integration of experimental data, filters and
                        simulations: Data assimilation
Campus Grid             Grid supporting University community computing
Enterprise Grid         Grid supporting a company’s enterprise infrastructure
            What is a Web Service I
• A web service is a computer program running on either the local
  or remote machine with a set of well defined interfaces (ports)
  specified in XML (WSDL)
• In principle, the computer program can be in any language (Fortran ..
  Java .. Perl .. Python) and the interfaces can be implemented in
  any way whatsoever
   – Interfaces can be method calls, Java RMI Messages, CGI Web
      invocations, totally compiled away (inlining) but
• The simplest implementations involve XML messages (SOAP)
  and programs written in net friendly languages like Java and
• Web Services separate the meaning of a port (message) interface
  from its implementation
• Enhances/Enables Re-usable component model of ANY
  electronic resource
[Figure: Raw Data sources sit behind a (Virtual) XML Data Interface. Layers of Web Services (WS) are linked by XML WS to WS Interfaces. A (Virtual) XML Knowledge (User) Interface renders to an XML Display Format, which reaches Clients through a (Virtual) XML Rendering Interface]
         What is a Web Service II
• Web Services have important implication that ALL
  interfaces are XML messages based. In contrast
• Most Windows programs have interfaces defined as
  interrupts due to user inputs
• Most software has interfaces defined as methods which
  might be implemented as a message but this is often NOT
[Figure: Security, Catalog, Credit Card and shipping services linked through WSDL interfaces]
          What is a Web Service III
• “Everything electronic” is a resource
   – Computers; Programs; People
   – Data (from sensors to this presentation to email to databases)
• “Everything electronic” is a distributed object
• All resources have interfaces which are defined in XML for both
  properties (data-structure) and methods (service, function,
  subroutine) (Resources are Services)
   – We can assume that a data-structure property has
     getproperty() and setproperty(value) methods to act as
• All resources are linked by messages with structure, which must
  be specifiable in XML
• All resources have a URI such as unique://a/b/c …….
              WSDL Abstractions
• WSDL abstracts a program as an entity that does
  something given one or more inputs with its results
  defined by streams on one or more outputs.
• Functions are defined by method name and parameters
  methodname(parm1,parm2, … parmN)
  – Where parameters are “Input”, “Output” or both
• In WSDL, we will have a Web Service which, like a
  Java or CORBA program, can be thought of as a
  (distributed) object with many methods
  – Instead of a function call, the “calling routine” sends an XML
    message to the Web Service specifying methodname and
    values of the parameters
  – Note name of function is just another parameter
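The message-instead-of-call idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (call, methodname, parm) and the AdderService class are invented for the example and are not SOAP or any real toolkit's wire format.

```python
import xml.etree.ElementTree as ET


def build_call_message(method_name, **params):
    """Encode methodname(parm1, ..., parmN) as an XML message.

    The element names are hypothetical; a real Web Service would use SOAP.
    """
    call = ET.Element("call")
    # The name of the function is just another parameter of the message.
    ET.SubElement(call, "methodname").text = method_name
    for name, value in params.items():
        parm = ET.SubElement(call, "parm", name=name)
        parm.text = str(value)
    return ET.tostring(call, encoding="unicode")


class AdderService:
    """Toy 'Web Service': a (distributed) object with many methods."""

    exposed = {"add", "scale"}  # only these methods may be invoked by message

    def add(self, a, b):
        return float(a) + float(b)

    def scale(self, a, factor):
        return float(a) * float(factor)

    def handle(self, xml_message):
        """Decode the message and dispatch to the named method."""
        call = ET.fromstring(xml_message)
        method_name = call.findtext("methodname")
        if method_name not in self.exposed:
            raise ValueError(f"unknown method: {method_name}")
        kwargs = {p.get("name"): p.text for p in call.findall("parm")}
        return getattr(self, method_name)(**kwargs)
```

The caller never touches the service object directly; it only builds a message, for example build_call_message("add", a=2, b=3), and hands it to handle().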
       Details of WSDL Protocol Stack
• UDDI finds where programs are
   – remote (distributed) programs are just Web Services
   – (not a great success)
• WSFL links programs together (under revision as BPEL4WS)
• WSDL defines interface (methods, parameters, data formats)
• SOAP defines structure of message including serialization of information
• HTTP is negotiation/transport protocol
• TCP/IP is layers 3-4 of OSI
• Physical Network is layer 1 of OSI
[Figure: protocol stack, top to bottom: UDDI or WSIL; WSFL; WSDL; SOAP or RMI; HTTP or SMTP or IIOP or RMTP; Physical Network]
        Education as a Web Service
• Can link to Science as a Web Service and substitute educational
• “Learning Object” XML standards already exist from IMS/ADL
  http://www.adlnet.org – need to update architecture
• Web Services for virtual university include:
• Registration
• Performance (grading)
• Authoring of Curriculum
• Online laboratories for real and virtual instruments
• Homework submission
• Quizzes of various types (multiple choice, random parameters)
• Assessment data access and analysis
• Synchronous Delivery of Curricula
• Scheduling of courses and mentoring sessions
• Asynchronous access, data-mining and knowledge discovery
• Learning Plan agents to guide students and teachers
 What are System and Application Services?
• There are generic Grid system services: security, collaboration,
  persistent storage, universal access
   – OGSA (Open Grid Service Architecture) is implementing these as
     extended Web Services
• An Application Web Service is a capability used either by another
  service or by a user
   – It has input and output ports – data is from sensors or other services
• Consider Satellite-based Sensor Operations as a Web Service
   – Satellite management (with a web front end)
   – Each tracking station is a service
   – Image Processing is a pipeline of filters – which can be grouped
     into different services
   – Data storage is an important system service
   – Big services built hierarchically from “basic” services
• Portals are the user (web browser) interfaces to Web services
                Application Web Services
• Service model integrates sensors, sensor analysis, simulations and people
• An Application Web Service is a capability used either by another service or by a user
     – It has input and output ports – data is from users, sensors or other services
     – Big services built hierarchically from “basic” services
• Build as multiple Filter Web Services
• Build as multiple Programs
[Figure: Filter1 WS, Filter2 WS and Filter3 WS; Prog1 WS and Prog2 WS; Sensor, Sensor Data as a Web service (WS), Analysis WS, Simulation WS and Visualization WS]
     The Application Service Model
• As bandwidth of communication (between) services increases
  one can support smaller services
• A service “is a component” and is a replacement for a library in
  cases where performance allows
• Services (components) are a sustainable model of software
  development – each service has documented capability with
  standards compliant interfaces
   – XML defines interfaces at several levels
   – WSDL at Service interface level and XSIL or equivalent for
     scientific data format
• A service can be written as Perl, Python, Java Servlet, Enterprise
  JavaBean, CORBA (C++ or Fortran) Object …
• Communication protocol can be RMI (Java), IIOP (CORBA) or
             7 Primitives in WSDL
• types: which provides data type definitions used to describe the
  messages exchanged.
• message: which represents an abstract definition of the data
  being transmitted. A message consists of logical parts, each of
  which is associated with a definition within some type system.
• operation: an abstract description of an action supported by the service.
• portType: which is a set of abstract operations. Each operation
  refers to an input message and output messages.
• binding: which specifies concrete protocol and data format
  specifications for the operations and messages defined by a
  particular portType.
• port: which specifies an address for a binding, thus defining a
  single communication endpoint.
• service: which is used to aggregate a set of related ports
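To make the seven primitives concrete, here is a minimal sketch that parses a skeleton WSDL 1.1 document with Python's standard library and lists the elements found for each primitive. The Filter service, its names and its urn:example:filter namespace are invented for illustration, and the skeleton omits the SOAP extension elements a deployable binding and port would carry.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

# Hypothetical service description: every name below is an assumption.
EXAMPLE_WSDL = f"""
<definitions name="Filter" targetNamespace="urn:example:filter"
    xmlns="{WSDL_NS}" xmlns:tns="urn:example:filter"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <types><xsd:schema targetNamespace="urn:example:filter"/></types>
  <message name="FilterRequest"><part name="data" type="xsd:string"/></message>
  <message name="FilterResponse"><part name="result" type="xsd:string"/></message>
  <portType name="FilterPortType">
    <operation name="applyFilter">
      <input message="tns:FilterRequest"/>
      <output message="tns:FilterResponse"/>
    </operation>
  </portType>
  <binding name="FilterBinding" type="tns:FilterPortType"/>
  <service name="FilterService">
    <port name="FilterPort" binding="tns:FilterBinding"/>
  </service>
</definitions>
"""

PRIMITIVES = ("types", "message", "operation", "portType",
              "binding", "port", "service")


def summarize(wsdl_text):
    """Return {primitive: [name attribute of each matching element]}."""
    root = ET.fromstring(wsdl_text)
    summary = {}
    for primitive in PRIMITIVES:
        # Elements are matched by their namespace-qualified tag.
        elements = root.iter(f"{{{WSDL_NS}}}{primitive}")
        summary[primitive] = [element.get("name") for element in elements]
    return summary
```

Reading the result top to bottom mirrors the list above: abstract parts (types, message, operation, portType) first, then the concrete ones (binding, port, service).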
 OGSA OGSI & Hosting Environments
• Start with Web Services in a hosting environment
• Add OGSI to get a Grid service and a component model
• Add OGSA to get Interoperable Grid “correcting” differences in base
  platform and adding key functionalities

[Figure: layered architecture, top to bottom]
   Not OGSA:                  Domain-specific services
   Possibly OGSA:             More specialized services: data replication,
                              workflow, etc., etc.
   OGSA:                      Broadly applicable services: registry,
                              authorization, monitoring, data access, etc., etc.
   Given to us from on high:  OGSI on Web Services;
                              Hosting Environment for WS
   Alongside the layers:      Models for resources & other entities
    OGSI Open Grid Service Interface
•   http://www.gridforum.org/ogsi-wg
•   It is a “component model” for web services.
•   It defines a set of behavior patterns that each OGSI service must exhibit.
•   Every “Grid Service” portType extends a common base type.
     – Defines an introspection model for the service
     – You can query it (in a standard way) to discover
           • What methods/messages a port understands
           • What other port types does the service provide?
           • If the service is “stateful” what is the current state?
•   Factory Model
•   A set of standard portTypes for
     – Message subscription and notification
     – Service collections
•   Each service is identified by a URI called the “Grid Service Handle”
•   GSHs are bound dynamically to Grid Services References (typically wsdl
     – A GSR may be transient. GSHs are fixed.
     – Handle map services translate GSHs into GSRs.
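The handle/reference split can be sketched as a small lookup table. This is a toy model, not the OGSI HandleMap portType: the class names, fields and URLs are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridServiceReference:
    """Transient binding information for a service (hypothetical fields)."""
    endpoint: str   # where the service instance currently lives
    wsdl_url: str   # description a client would use to bind to it


class HandleMap:
    """Toy handle map: fixed GSH (a URI string) -> current GSR."""

    def __init__(self):
        self._current = {}

    def register(self, gsh, gsr):
        # Re-registering replaces the GSR; the handle itself never changes.
        self._current[gsh] = gsr

    def resolve(self, gsh):
        try:
            return self._current[gsh]
        except KeyError:
            raise LookupError(f"no reference known for handle {gsh}") from None
```

A client keeps only the handle; if the service moves, the map is updated and the next resolve() returns the new reference.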
        Categories of Worldwide Grid Services
           to be exploited by SERVOGrid
•   1) Types of Grid
    – R3
    – Lightweight
    – P2P
    – Federation and Interoperability
•   2) Core Infrastructure and Hosting Environment
    – Service Management
    – Component Model
    – Service wrapper/Invocation
    – Messaging
•   3) Security Services
    – Certificate Authority
    – Authentication
    – Authorization
    – Policy
•   4) Workflow Services and Programming Model
    – Enactment Engines (Runtime)
    – Languages and Programming
    – Compiler
    – Composition/Development
•   5) Notification Services
•   6) Metadata and Information Services
    – Basic including Registry
    – Semantically rich Services and meta-data
    – Information Aggregation (events)
    – Provenance
•   7) Information Grid Services
    – OGSA-DAI/DAIT
    – Integration with compute resources
    – P2P and database models
•   8) Compute/File Grid Services
    – Job Submission
    – Job Planning Scheduling Management
    – Access to Remote Files, Storage and Computers
    – Replica (cache) Management
    – Virtual Data
    – Parallel Computing
•   9) Other services including
    – Grid Shell
    – Accounting
    – Fabric Management
    – Visualization Data-mining and Computational Steering
    – Collaboration
•   10) Portals and Problem Solving Environments
•   11) Network Services
    – Performance
    – Reservation
    – Operations
    Functional Level above OGSA
•   Systems Management and Automation
•   Workload / Performance Management
•   Security
•   Availability / Service Management
•   Logical Resource Management
•   Clustering Services
•   Connectivity Management
•   Physical Resource Management
•   Perhaps Data Access belongs here
         Two-level Programming I
• The paradigm implicitly assumes a two-level Programming
• We make a Service (same as a “distributed object” or
  “computer program” running on a remote computer) using
  conventional technologies
   – C++ Java or Fortran Monte Carlo module
   – Data streaming from a sensor or Satellite
   – Specialized (JDBC) database access
• Such nuggets accept and produce data from users files and
• The Grid is built by coordinating such nuggets assuming
  we have solved problem of programming the nugget
         Two-level Programming II
• The Grid is discussing the linkage and distribution of the
  nuggets with the only addition being runtime interfaces
  to Grid as opposed to UNIX data streams
[Figure: four linked nuggets, Nugget1 to Nugget4]

• Familiar from use of UNIX Shell, PERL or Python scripts
  to produce real applications from core programs
• Such interpretative environments are the single processor
  analog of Grid Programming
• Some projects like GrADS from Rice University are
  looking at integration between nugget levels but dominant
  effort looks at each level separately
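The two levels can be illustrated with a single-processor analog in Python. The nugget functions and the link() helper below are hypothetical names invented for this sketch; in a real Grid each nugget would be a separately programmed service and the linkage would run over Grid interfaces rather than function calls.

```python
from functools import reduce


# Level 1: each nugget is programmed on its own with conventional technology.
def sensor_nugget(_unused=None):
    """Stand-in for data streaming from a sensor."""
    return [3.0, 1.0, 4.0, 1.0, 5.0]


def filter_nugget(samples):
    """Keep only readings above a fixed threshold."""
    return [s for s in samples if s > 1.0]


def analysis_nugget(samples):
    """Reduce the filtered stream to its mean."""
    return sum(samples) / len(samples)


# Level 2: the "Grid program" only describes how nuggets are linked.
def link(*nuggets):
    """Chain nuggets so each one consumes the previous one's output."""
    def pipeline(data=None):
        return reduce(lambda value, nugget: nugget(value), nuggets, data)
    return pipeline


workflow = link(sensor_nugget, filter_nugget, analysis_nugget)
```

Calling workflow() runs the three nuggets in order, much as a shell pipeline would chain three core programs.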
Why we can dream of using HTTP and
          that slow stuff
•   We have at least three tiers in computing environment
•   Client (user portal discussed Thursday)
•   “Middle Tier” (Web Servers/brokers)
•   Back end (databases, files, computers etc.)
•   In Grid programming, we use HTTP (and used to use
    CORBA and Java RMI) in middle tier ONLY to
    manipulate a proxy for real job
    – Proxy holds metadata
    – Control communication in middle tier only uses metadata
    – “Real” (data transfer) high performance communication in
      back end
[Figure: Portal Services and Grid Computing environments sit above middle-tier System Services and Application Services carrying Application Metadata. Further System Services link down to the Actual Application, Raw (HPC) Resources and Database]
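The proxy idea from this slide can be sketched as follows. It is an illustration only: the JobProxy fields and the gsiftp URL are assumptions, and no real Globus or portal API is being shown.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class JobProxy:
    """Middle-tier stand-in for a real job: it holds metadata only."""
    job_id: str
    executable: str
    data_url: str            # where the bulk data lives in the back end
    status: str = "pending"


def control_message(proxy):
    """Serialize the proxy for the (slow) middle-tier control channel.

    Only metadata travels here; the data named by data_url is moved
    separately by whatever high performance transport the back end uses.
    """
    return json.dumps(asdict(proxy))


proxy = JobProxy(
    job_id="job-001",
    executable="/bin/date",
    data_url="gsiftp://storage.example.org/big/run17.dat",  # hypothetical
)
```

However large the file behind data_url is, the control message stays a few hundred bytes, which is why HTTP-level performance is acceptable in the middle tier.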
PPPH: Paradigms Protocols Platforms and Hosting II
• Self-describing programs/interfaces are key to scaling
   – Minimize amount of work system has to do
   – Hide as much as possible in services and applications
• Protocols describe (in “principle” at least) those rules that
  system obeys and uses to deliver information between
  services (processes)
• Interfaces tell the service what to do to interpret the results
  of communication
• HTTP is the dominant transport protocol of the Web
• HTML is the “interface” telling browser how to render
• But you can extend interface to allow PDF, multimedia,
  PowerPoint using “helper applications” which are (with
  more or less convenience) “automatically”
  downloaded if not already available
   – “Mime types” essentially “self-describe” each interface
                Analogy with Web II
• HTTP and HTML are the analogies on the client side
• A “Web Service” generalizes a CGI Script on server side
   – CGI is essentially a Distributed Object technology
     allowing server to access an arbitrary program labeled by
     a URL plus an ugly syntax to specify name and
     parameters of program to run
• Roughly WSDL (Web Service Description Language) is a
  better way to specify program name and its parameters
• Web uses other protocols – HTTPS for secure links and
  RTP etc. for multimedia (UDP) streams
   – These again are required to integrate system – codecs like
     MPEG are interfaces interpreted by client
   – There are further protocols like H323 and SIP which will
     be replaced (IMHO) by HTTP plus RTP etc. We should
     minimize number of protocols to get maintainable
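As a small illustration of the CGI point above, the program name and its parameters are both packed into the URL and have to be picked apart by convention. The sketch below does that with Python's standard urllib; the URL and the parameter names are invented for the example.

```python
from urllib.parse import urlsplit, parse_qs


def parse_cgi_call(url):
    """Split a CGI-style URL into (program name, parameter dict)."""
    parts = urlsplit(url)
    # By convention the last path segment names the program to run.
    program = parts.path.rsplit("/", 1)[-1]
    # Every parameter arrives as an untyped string in the query part.
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return program, params


# Hypothetical URL for a filter script.
example = "http://example.org/cgi-bin/filter.pl?input=quake.dat&window=5"
```

Nothing in the URL says that window should be an integer or that input is required; that is the kind of information a WSDL description states explicitly.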
PPPH: Paradigms Protocols Platforms and Hosting III
• There is a set of system capabilities which cannot be captured as
  standalone services and permeate Grid
• Meta-data rich Message-linked Web Services is permeating paradigm
• Component Model such as “Enterprise JavaBean (EJB)” or OGSI
  describes the formal structure of services – EJB if used lives inside
  OGSI in our Grids
• Invocation Framework describes how you interact with system
• Security in fine grain fashion to provide selective authorization
  (Globus and EDG WP6)
• Policy context describes rules for this particular Grid
• Transport mechanisms abstract concepts like ports and Quality of Service
• Messaging abstracts destination and customization of content
• Network (monitoring, performance) EDG WP7
• Fabric (resources) EDG WP4
• The Grid could and sometimes does virtualize various
• Location: URI (Universal Resource Identifier) virtualizes
• Replica management (caching) virtualizes file location
  generalized by GriPhyn virtual data concept
• Protocol: message transport and WSDL bindings
  virtualize transport protocol as a QoS request
• P2P or Publish-subscribe messaging virtualizes matching
  of source and destination services
• Semantic Grid virtualizes Knowledge as a meta-data
• Brokering virtualizes resource allocation
• Virtualization implies references can be indirect
 IFS: Interfaces and Functionality and Semantics I
• The Grid platform tries to minimize detail in protocols and
  maximize detail in interfaces to enhance scaling
• However rich meta-data and semantics are critical for
  correct and interesting operation
   – Put as much semantic interpretation as you can into specific
   – Lack of Semantic interoperation is in fact main weakness of
     today’s Grids and Web services
• Everything becomes a service (See example of education)
  whether system or application level
• There are some very important “Global Services”
   – Discovery (look up) and Registration of service metadata
   – Workflow
   – MetaSchedulers
IFS: Interfaces and Functionality and Semantics II
• There are many other generally important services
• OGSA-DAI The Database Service
• Portal Service linked to by WSRP (Web services
  for Remote Portals)
• Notification of events
• Job submission
• Provenance – interpret meta-data about history of
• File Interfaces
• Sensor service – satellites …
• Visualization
• Basic brokering/scheduling
    Globus in a Nutshell from IPG
• GT2 (or Globus Toolkit 2) is original (non web
  service based) version which is basis of EDG
  (European Data Grid) work
• C programs and libraries
• See Chapter 5 of book with background in chapters
  2-4 and 37
• http://www.ipg.nasa.gov/ipgusers/globus/
• http://www.globusworld.org/globusworld_web/jw2
          Globus GT2 from IPG
• The goal of the Globus GT2 is to provide dependable,
  consistent, pervasive access to high-end resources.
   – This is original Grid “start” generalized recently to virtual
     organizations and data grids
• The Globus Project offers the most widely used
  computing grid middleware. The Globus Project is a joint
  effort of Argonne National Laboratory, the Information
  Sciences Institute of the University of Southern
  California, in collaboration with numerous other
  organizations including NCSA, NPACI, UCSD, and
  NASA. See http://www.globus.org/ for history, goals,
  release and usage notes, software distributions, and
  research papers.
                          Globus GT2 II
• Grid Fabric: Layer One
   The fabric of the Grid comprises the underlying systems, computers, operating
   systems, networks, storage systems, and routers—the building blocks.
• Grid Services: Layer Two
   Grid services integrate the components of the Grid fabric. Examples of the services
   that are provided by Globus Toolkit 2:
   The Globus Resource Allocation Manager, GRAM, is a basic library service that
   provides capabilities to do remote-submission job start up. GRAM unites Grid
   machines, providing a common user interface so that you can submit a job to
   multiple machines on the Grid fabric. GRAM is a general, ubiquitous service, with
   specific application toolkit commands built on top of it
  The Monitoring and Discovery Service (MDS), also known as GIS, the Grid Information
  Service, provides information services. You query MDS to discover the properties of
  the machines, computers and networks that you want to use: how many processors
  are available at this moment? What bandwidth is provided? Is the storage on tape or
  disk? Is the visualization device an immersive desk or CAVE? Using an LDAP
  (Lightweight Directory Access Protocol) server, MDS provides middleware
  information in a common interface to put a unifying picture on top of disparate
• Contd …
                    Globus GT2 III
• GSI gss-api library for adding authentication to a program. GSI
  provides programs, such as grid-proxy-init, to facilitate login to a
  variety of sites, while each site has its own flavor of security
  measures. That is, on the fabric layer, the various machines you want
  to use might be governed by disparate security policies; GSI provides
  a means of simplifying multiple remote logins. The standard
  installation is based on a PKI security system; the Kerberos
  installation of Globus is less standard. (Some installations with DoE
  and DoD insist on Kerberos)
• GridFTP A new (in Globus 2.0) protocol for file transfer over a
  grid. This is a Global Grid Forum standard
• GASS Globus Access to Secondary Storage, provides command-line
  tools and C APIs for remotely accessing data. GASS integrates
  GridFTP, HTTP, and local file I/O to enable secure transfers using
  any combination of these protocols.
                      Globus GT2 IV
• Application Toolkits: Layer Three
    Application toolkits use Grid Services to provide higher-level
    capabilities, often targeted to specific classes of application.
•   For example, the Globus development team has created a set of Grid
    service tools and a toolkit of programs for running remotely
    distributed jobs. These include remote job submission commands
    (globusrun, globus-job-submit, globus-job-run), built on top of the
    GRAM service, and MPICH-G2, a Grid-enabled implementation of
    the Message Passing Interface (MPI).
•   A more modern interface is through CoG Kits (Commodity Grid) for
    different languages (Perl, Python, Java); see Chapter 26 of Book
•   The Java CoG kit provides a natural way to link GT2 to a Web
    service framework
•   Globus Toolkit 3 (GT3) effectively integrated CoG Kit interface with
    core Globus by wrapping all Globus Services as Web services
            Job Submission in Globus
• Very similar to UNIX Shell – build Portal Web Interfaces to specific
  or general Shell commands. Some example commands:
• globusrun Runs a single executable on a remote site with an RSL
• globus-job-cancel Cancels a job previously started using globus-job-
• globus-job-run Allows you to run a job at one or several remote
  resources. It translates the program arguments to an RSL request and
  uses globusrun to submit the job.
• globus-job-clean Kills the job if it is still running and cleans the
  information concerning the job.
• globus-job-status Displays the status of the job. See also
  globus-job-get-output to check the standard output or standard error of your job.
• These are all controlled by metadata specified by the Globus
  Resource Specification Language (RSL) which provides a common
  language to describe jobs and the resources required to run them.
• http://www.globus.org/gram/gram_rsl_parameters.html
• The simplest RSL expression looks something like the following.
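The example from the original slide did not survive extraction. As a stand-in, here is a minimal Python sketch that assembles an RSL string in the common conjunction form `&(attribute=value)...`. The helper name `build_rsl` is hypothetical, and the attributes shown (`executable`, `count`) are assumed to be among those listed on the RSL parameters page above.

```python
def build_rsl(attributes):
    """Join (name=value) relations into a single RSL conjunction.

    Hypothetical helper for illustration only: it performs no quoting
    or validation of the values it is given.
    """
    relations = "".join(f"({name}={value})" for name, value in attributes.items())
    return "&" + relations


# Describe a job that runs /bin/date once.
rsl = build_rsl({"executable": "/bin/date", "count": 1})
print(rsl)
```

A string of this shape is what a command such as globusrun would be handed as the job description.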
Virtual Data Toolkit (VDT) from GriPhyN
• http://www.lsc-group.phys.uwm.edu/vdt/
• Trillium (PPDG from DoE, GriPhyN and iVDGL from NSF) is a
  major US effort building Grid application software with a
  strong particle physics emphasis
• VDT is their major software release and its heart is Condor
  and GT2.
   – There is some “virtual data” software as well but not clear
     if this is of interest in production use (interesting research
• Condor (Chapter 11 of Book) is a powerful job scheduler for
  clusters and “cycle scavenging”
   – It has a well developed interface (ClassAds) for defining
     requirements of jobs and matching to compute capabilities
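To make the matching idea concrete, here is a minimal Python sketch of two-sided matchmaking. It is not Condor's ClassAd syntax or API: the dictionary layout, the `requirements` and `rank` callables, and the machine names are all assumptions made for illustration.

```python
def matches(job, machine):
    """A match needs each side's requirements to accept the other side."""
    return job["requirements"](machine) and machine["requirements"](job)


def matchmake(job, machines):
    """Return the acceptable machine the job ranks highest, or None."""
    candidates = [m for m in machines if matches(job, m)]
    return max(candidates, key=job["rank"], default=None)


# Hypothetical machine advertisements.
machines = [
    {"name": "node1", "arch": "INTEL", "memory_mb": 256,
     "requirements": lambda job: True},
    {"name": "node2", "arch": "INTEL", "memory_mb": 1024,
     "requirements": lambda job: job["owner"] != "guest"},
    {"name": "node3", "arch": "SPARC", "memory_mb": 2048,
     "requirements": lambda job: True},
]

# Hypothetical job advertisement: needs an INTEL machine with 512 MB or more
# and prefers the machine with the most memory.
job = {
    "owner": "alice",
    "requirements": lambda m: m["arch"] == "INTEL" and m["memory_mb"] >= 512,
    "rank": lambda m: m["memory_mb"],
}

best = matchmake(job, machines)
print(best["name"] if best else "no match")
```

Both the job and the machine state requirements, so an owner can refuse jobs just as a job can refuse machines.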
     OGSA/OGSI Top Level View
Chapters 7 to 9 of Book
• OGSA is the set of “core” Grid services
   – Stuff you can’t live without
   – If you built a Grid you would need to invent these things
[Figure: OGSA layers, top to bottom: Domain-specific services; More specialized services (data replication, workflow, etc.); Broadly applicable services (registry, authorization, monitoring, data access, etc.); OGSI; Hosting Environment & Protocol Bindings (Hosting Environment, Transport). Alongside: Models for resources & other entities.]
    OGSI Open Grid Service Interface
• http://www.gridforum.org/ogsi-wg
• It is a “component model” for web services.
• It defines a set of behavior patterns that each OGSI service must exhibit.
• Every “Grid Service” portType extends a common base type.
   – Defines an introspection model for the service
   – You can query it (in a standard way) to discover
         • What methods/messages a port understands
         • What other port types does the service provide?
         • If the service is “stateful” what is the current state?
• A set of standard portTypes for
   – Message subscription and notification
   – Service collections
• Each service is identified by a URI called the “Grid Service Handle”
• GSHs are bound dynamically to Grid Service References (typically wsdl
   – A GSR may be transient. GSHs are fixed.
   – Handle map services translate GSHs into GSRs.
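A minimal Python sketch of the handle-to-reference idea follows. The class `HandleMap`, its methods, and the example URIs are hypothetical and are not taken from the OGSI specification. The point is only that the handle stays fixed while the reference it resolves to can be replaced.

```python
class HandleMap:
    """Toy handle map: translates a fixed handle (GSH) into the current reference (GSR)."""

    def __init__(self):
        self._references = {}

    def bind(self, gsh, gsr):
        """Bind or rebind a handle to a reference."""
        self._references[gsh] = gsr

    def resolve(self, gsh):
        """Return the reference currently bound to the handle."""
        try:
            return self._references[gsh]
        except KeyError:
            raise LookupError(f"no reference bound to handle {gsh}") from None


handle_map = HandleMap()
gsh = "http://example.org/handles/job-42"  # hypothetical handle URI

handle_map.bind(gsh, {"endpoint": "http://host-a.example.org:8080/job"})
first = handle_map.resolve(gsh)

# The service moves: same handle, new transient reference.
handle_map.bind(gsh, {"endpoint": "http://host-b.example.org:8080/job"})
second = handle_map.resolve(gsh)
print(first["endpoint"], "->", second["endpoint"])
```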
            OGSI and Stateful Services
• Sometimes you can send a message to a service, get a result and
  that’s the end
   – This is a statefree service
• However most non-trivial services need state to allow persistent
  asynchronous interactions
• OGSI is designed to support Stateful services through two
   – Information Port: where you can query for SDE (Service
     Data Elements)
   – “Factories” that allow one to view a Service as a “class” (in an
     object-oriented language sense) and create separate instances for
     each Service invocation
• There are several interesting issues here
   – Difference between Stateful interactions and Stateful services
   – System or Service managed instances
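The factory view of a service as a "class" can be sketched in a few lines of Python. All names here (`CounterService`, `CounterFactory`, `service_data`) are hypothetical. This shows the pattern only and is not the OGSI Factory portType.

```python
import itertools


class CounterService:
    """A stateful service instance: its state outlives any single call."""

    def __init__(self, handle):
        self.handle = handle
        self._count = 0

    def increment(self):
        self._count += 1

    def service_data(self):
        """State that can be queried outside any particular interaction."""
        return {"handle": self.handle, "count": self._count}


class CounterFactory:
    """Explicit factory: creates instances and keeps track of them."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.instances = {}

    def create_service(self):
        handle = f"counter-{next(self._ids)}"
        instance = CounterService(handle)
        self.instances[handle] = instance
        return instance


factory = CounterFactory()
a = factory.create_service()
b = factory.create_service()
a.increment()
a.increment()
print(a.service_data(), b.service_data())
```

Each instance carries its own state, and the factory's registry is where lifetime management would attach in a fuller design.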
                             Factories and OGSI
• Stateful interactions are typified by amazon.com where messages carry correlation
  information allowing multiple messages to be linked together
   – Amazon preserves state in this fashion which is in fact preserved in its
      database permanently
• Stateful services have state that can be queried outside a particular interaction
• Also note difference between implicit and explicit factories
   – Some claim that implicit factories scale as each service manages its own
      instances and so do not need to worry about registering instances and lifetime
• See WS-Addressing, largely from IBM and Microsoft

[Figure: Implicit Factory (left) and Explicit Factory (right), each drawn with numbered service instances]
Open Grid Service Architecture

• OGSA-WG chaired by
  – Ian Foster, ANL and Univ. of Chicago
  – Jeff Nick, IBM
  – Dennis Gannon, IU
• Active Members from
  – IBM, Fujitsu, NEC, SUN, Hitachi, Avaki
  – Univ. of Mich, Chicago, Indiana (not much
    academic involvement)
        OGSA Core Services I

• Registries, and namespace bindings
   – Registry is a collection of services indexed by service
      • “find me a service with property X.”
   – Directory is a map from a namespace to GSHs.
   – A namespace is a human understandable version of a
     Grid Handle
• Queues
   – For building schedulers and resource brokers
   – Jobs and other requests are in queues
   – This is high-level messaging
• Base this on Web Services Security
• Authentication
   – 2-way. Who are you and who am I?
• Authorization
   – What am I authorized to use/see/modify
• Accounting/Billing
   – (not really security – see monitoring)
• Privacy
• Group Access
   – Easily create a group to share access to a virtual Grid.
• Very complex issues related to services and message
    Common Resource Model

• Every resource on the grid that is
  manageable is represented by a service
  – CRM is the Schema hierarchy that defines each
    resource (with its meta-data)
  – Service for a resource presents its management
    interface to authorized parties.
                           Policy Management
• Policy management services
   – Mechanism to publish policy and the services it applies to.
   – Policy life-cycle mgmt.
• Policy languages exist for routing, security, resource use
[Figure: Policy Service Core. Labels: Producer of Policies (Admin GUI / Autonomic Manager); Policies; Policy Service Manager; XML Repository; Policy Transformation Service; Policy Validation Service; Policy Resolution; Canonical Policies; Policy Service Agent; Consumer of Policies; Common Resource Model (Device / Resource)]
Policy Component Requirements:
   – A management control point for policy lifecycle (PSM)
   – A canonical way to express policies (AC 4-tuple)
   – A distribution point for policy dissemination (PSA)
   – A way to express that a service is “policy aware” (PEP)
   – A way to effect change on a resource (CRM)
      Grid Service Orchestration
• Creating new services by composing other
• Two types of Orchestration
  – Composition in space
     • One service directly invokes another
  – Composition in time
     • Managing the workflow
        – First do this.
        – Then do this and that
        – When that is done do this
            » If something goes wrong do this
        – And so on…
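Composition in time can be sketched as a small Python driver. The function `run_workflow` and the step names are hypothetical, and the "this and that" steps run one after another here for simplicity, whereas a real workflow engine might run them concurrently.

```python
def run_workflow(first, middle_steps, last, on_error):
    """Run first, then each middle step, then last; divert to on_error on failure."""
    log = []
    try:
        log.append(first())            # First do this.
        for step in middle_steps:      # Then do this and that.
            log.append(step())
        log.append(last())             # When that is done do this.
    except Exception as exc:           # If something goes wrong do this.
        log.append(on_error(exc))
    return log


# Hypothetical steps standing in for calls to other services.
def stage_input():
    return "staged"

def run_solver():
    return "solved"

def run_monitor():
    return "monitored"

def archive():
    return "archived"

def failing_solver():
    raise RuntimeError("solver crashed")

def cleanup(exc):
    return f"cleaned up after: {exc}"


ok = run_workflow(stage_input, [run_solver, run_monitor], archive, cleanup)
bad = run_workflow(stage_input, [failing_solver, run_monitor], archive, cleanup)
print(ok)
print(bad)
```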
              Data Services

•   Distributed Data Access
•   Data Caching
•   Data Replication Services
•   Metadata Catalog Services
•   Storage Services
Metering Resource Consumption
            • At what granularity do
              services report resource
            • How do they report it?
            • How are services metered?

• Two threads/workflows must synchronize
  and agree they have done so before moving
  – Usually involves modification to two or more
    persistent states
  – WS-transactions has been “proposed”.
     Messaging, Events, Logging
• Messaging
   – Delivery Model
   – Queuing and Pub/Sub message delivery (not clear to me why
     these are different, as publish/subscribe can be implemented as
     topic-labeled queues)
• Events
   – Time stamped messages
   – Standard XML schemas
• Standard Logging
• MQSeries (IBM), JMS (Java Message Service) and
  NaradaBrokering (Indiana) provide this, but most
  naturally at the level of the “platform/hosting environment”
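The remark that publish/subscribe amounts to topic-labeled queues can be illustrated with a minimal in-memory Python sketch. The class `TopicBroker` and its methods are hypothetical and do not reflect the API of MQSeries, JMS or NaradaBrokering.

```python
from collections import defaultdict, deque


class TopicBroker:
    """Publish/subscribe built from one queue per (topic, subscriber) pair."""

    def __init__(self):
        self._queues = defaultdict(dict)  # topic -> {subscriber: deque}

    def subscribe(self, topic, subscriber):
        self._queues[topic].setdefault(subscriber, deque())

    def publish(self, topic, message):
        # Publishing is just enqueueing on every queue labeled with the topic.
        for queue in self._queues[topic].values():
            queue.append(message)

    def receive(self, topic, subscriber):
        """Pop the oldest pending message, or None if the queue is empty."""
        queue = self._queues[topic][subscriber]
        return queue.popleft() if queue else None


broker = TopicBroker()
broker.subscribe("chat", "alice")
broker.subscribe("chat", "bob")
broker.publish("chat", "hello")
broker.publish("video", "frame-1")  # no subscribers, so nothing is queued

alice_first = broker.receive("chat", "alice")
alice_second = broker.receive("chat", "alice")
bob_first = broker.receive("chat", "bob")
print(alice_first, alice_second, bob_first)
```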
       Where should Messaging be?
• One can define messaging at the OGSA level “above the
  hosting environment” but that makes it difficult to virtualize
  messaging and support network performance
   – Publish-subscribe or better queued messaging naturally
     supports optimized routing based on network
• One can naturally support collaborative Web services in the
  same fashion, in a way that is MUCH easier than what
  GrooveNetworks and other collaborative environments
  (WebEx, Placeware (Microsoft)) do, as long as every
  application is a Web service
• OGSA location of messages is fine for low volume logging
  or notification events
   – Not good for events on “video” application where each frame is an
     update event
[Figure: Application as a Web service, drawn for a Participating Client and a Master Client. Each shows Events, Rendering, W3C DOM Events and User Interface, with links labelled From Collaboration As a WS, From Master and To Collaborative]
        Collaboration: Shared Display
    Sharing can be done at any point on “object” or Web Service
    Shared Display shares framebuffer with events corresponding to
    changed pixels in master client.
    As long as pipeline uses messages, easy to make collaborative
    Windows framebuffers and in fact most applications do NOT
    expose a message based update interface
[Figure: pipeline stages Object, Object’, Object’’, Object Viewer, Object Display, with sharing points labelled Shared Object, Shared Web Service, Shared Export and Shared Display, and an Event (Message) Service]
Shared Input Port (Replicated WS) Collaboration
[Figure: three replicated Web Services, each with RFIO and UFIO ports, a WS Viewer and a WS Display. Labels: Collaboration as a WS; Set up Session with XGSP; Event]
Shared Output Port Collaboration
[Figure: one Application or Content source Web Service with RFIO and UFIO ports, feeding several WS Viewer and WS Display pairs. Labels: Collaboration as a WS; Set up Session with XGSP; Web Service Message; Event (Message); Text Chat; Multiple masters; Other Participants]
• Based on a network of cooperating broker nodes
   – Cluster based architecture allows system to scale to arbitrary
• Originally designed to provide uniform software
  multicast to support real-time collaboration linked to
  publish-subscribe for asynchronous systems.
• Now has four major core functions
   – Message transport (based on performance measurement) in
     heterogeneous multi-link fashion
   – General publish-subscribe including JMS & JXTA and
     support for RTP-based audio/video conferencing
   – Filtering for heterogeneous clients
   – Federation of multiple instances of Grid services
    Role of Event/Message Brokers
• We will use events and messages interchangeably
   – An event is a time stamped message
• Our systems are built from clients, servers and “event brokers”
   – These are logical functions – a given computer can have one
     or more of these functions
   – In P2P networks, computers are typically multifunction; in Grids
     one tends to have separate function computers
   – Event Brokers “just” provide message/event services; servers
     provide traditional distributed object services as Web services
• There are functionalities that only depend on event itself and
  perhaps the data format; they do not depend on details of
  application and can be shared among several applications
   – NaradaBrokering is designed to provide these functionalities
   – MPI provided such functionalities for all parallel computing
      Engineering Issues Addressed
      by Event / Messaging Service
• Application level Quality of Service
   – e.g. give audio highest priority
• Tunnel through firewalls & proxies
• Filter messages to slow (collaborative/real-time) clients
• Choose Hardware or Software multicast
• Scaling of software multicast
   – Efficient calculation of destinations and routes.
• Integrate synchronous and asynchronous collaboration with
  same messaging, control, archiving for all functions
• Transparently replace single server JMS systems with a
  distributed solution.
• Provides reliable inter-peer group messaging for JXTA
• Open Source (high quality) messaging
NaradaBrokering implements an Event Service
[Figure: Web Service 1 and Web Service 2, each with WSDL Ports, joined through a Broker holding a (Virtual) Queue. Broker functions labelled Routing, Source Matching, Filter and workflow]
• Filter is mapping to PDA or slow communication channel
  (universal access) – see our PDA adaptor
• Workflow implements message process
• Routing illustrated by JXTA and includes firewall
• Destination-Source matching illustrated by JMS using
  Publish-Subscribe mechanism
• These use Security model (being implemented) based on WS-Sec
     Narada Broker Network
[Figure: network of Brokers for message/events service, linking several (P2P) Communities and Data. Annotations: Software multicast; Hypercube topology; Tree for distance education with teacher at root]
           NaradaBrokering Communication
• Applications interface to NaradaBrokering through UserChannels,
  which NB constructs as a set of links between NB Broker waystations
  that may need to be dynamically instantiated
• UserChannels have publish/subscribe semantics with XML topics
• Links implement a single conventional “data” protocol.
   – Interface to add new transport protocols within the Framework
   – Administrative channel negotiates the best available communication
     protocol for each link
• Different links can have different underlying transport implementations
   – Implementations in the current release include support for
     TCP, UDP, Multicast, SSL and RTP.
   – HTTP, HTTPS support will be available in Feb 2003 release.
   – Supports communication through proxies such as iPlanet, Netscape
     and Apache.
   – Supports communication through firewalls such as Microsoft ISA,
Performance/Routing in Message-based Architecture
[Figure: brokers B1 and B3 with links labelled Firewall, HTTP, Satellite, Software Multicast, Dial-up and Link Protocol]
• In traveling from cities A to B (say 3 separate passengers), one
  chooses between and changes transport mechanism at
  waystations to optimize cost, time, comfort, scenic beauty …
• Waystations are now NB brokers where one chooses transport
  protocol (individual or collective)
   – Able to choose between car, type of car, plane, train etc.
   – Able to dynamically create waystations to cope with problems and act as
     hubs for multicast messages
   – Knows about traffic jams and can assign the “HOV lane”
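Choosing a transport at each waystation can be sketched as picking, among the protocols both ends of a link support, the one with the best measured performance. The function `choose_transport`, the use of transit delay as the only cost, and the numbers below are assumptions for illustration, not NaradaBrokering's actual negotiation logic.

```python
def choose_transport(protocols_a, protocols_b, measured_delay_ms):
    """Pick the mutually supported protocol with the lowest measured delay.

    Returns None when the two ends share no measured protocol.
    Ties are broken alphabetically so the choice is deterministic.
    """
    common = set(protocols_a) & set(protocols_b)
    usable = [p for p in common if p in measured_delay_ms]
    if not usable:
        return None
    return min(usable, key=lambda p: (measured_delay_ms[p], p))


# Hypothetical measurements for one link, in milliseconds.
delays = {"TCP": 12.0, "UDP": 5.0, "HTTP": 40.0}

open_link = choose_transport(["TCP", "UDP", "HTTP"], ["TCP", "HTTP", "SSL"], delays)
firewalled_link = choose_transport(["TCP", "UDP", "HTTP"], ["HTTP"], delays)
no_link = choose_transport(["UDP"], ["SSL"], delays)
print(open_link, firewalled_link, no_link)
```

A broker behind a firewall that only passes HTTP ends up on the slower protocol, which is the "change of transport at a waystation" in the travel analogy.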
             Note on Optimization
• Note in parallel computing, couldn’t do much dynamic
  optimization as aiming at microsecond latency
   – Natural to use hardware routing
• In Grid, time scales are different
   – 100 milliseconds is quite normal network latency
   – 30 milliseconds is the typical packet time sensitivity (this is one audio
     or video frame) but even here one can buffer 10-100 frames on
     client (conferencing to streaming)
   – 1 millisecond is time for a Java server to “think”
• Jitter in latency (transit time) due to routing, processing
  (in NB) or packet loss recovery is an important property
• Grid needs and can tolerate significant dynamic
[Figure: Transit delay for message samples in NaradaBrokering, different communication hops (hop-3, hop-5, hop-7), internal machines. Sender/receiver/broker: Pentium-3, 1 GHz, 256 MB RAM; 100 Mbps LAN; JDK-1.3; Red Hat Linux 7.3. Horizontal axis: Message Payload Size, 1000 to 5000]
[Figure: Standard Deviation for message samples in NaradaBrokering, different communication hops (legend entry hop-3), internal machines. Horizontal axis: Message Payload Size, 1000 to 5000]
[Figure: Average delays/packet for 12 (of the 400 total) video clients, NaradaBrokering compared with JMF-RTP. NaradaBrokering Avg=80.76 ms, JMF Avg=229.23 ms. Horizontal axis: Packet Number, 0 to 2000]
[Figure: Average jitter/packet for 12 (of the 400 total) video clients. NaradaBrokering Avg=13.38 ms, JMF Avg=15.55 ms. Horizontal axis: Packet Number, 0 to 2000]
Narada Performance Web Service
• Performance measurements are used by Links in
   – Reconfiguring connectivity between nodes
   – Deciding underlying transport
   – Determining possible filtering
• Each node determines performance of links of which it is endpoint
• Individual node web services are aggregated as another Web Service
• Factors measured include transit delays, bandwidth, jitter, receiving rates.
• Performance measurements are
   – Spaced out at increasing intervals for healthy
   – Factors selectively measured for unhealthy
   – No repeated measurements of bandwidth for
   – Injected into Narada network as XML events
• Probably should replace by a more sophisticated measurement package
[Figure: Administrative Interface]
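As a rough illustration of measuring a link and reporting the result as an XML event, here is a minimal Python sketch. Taking jitter to be the population standard deviation of the transit delays is an assumption, and the element and attribute names (`performanceEvent`, `link`) are hypothetical rather than NaradaBrokering's schema.

```python
import statistics
import xml.etree.ElementTree as ET


def summarize_link(transit_delays_ms):
    """Reduce raw transit-delay samples to a mean delay and a jitter figure."""
    return {
        "mean_delay_ms": statistics.mean(transit_delays_ms),
        # Assumption: jitter is reported as the standard deviation of the delay.
        "jitter_ms": statistics.pstdev(transit_delays_ms),
    }


def to_xml_event(link_id, summary):
    """Wrap a summary in a small XML event (hypothetical element names)."""
    event = ET.Element("performanceEvent", link=link_id)
    for name, value in summary.items():
        ET.SubElement(event, name).text = f"{value:.2f}"
    return ET.tostring(event, encoding="unicode")


summary = summarize_link([10.0, 14.0, 10.0, 14.0])
xml_event = to_xml_event("b1-b3", summary)
print(xml_event)
```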
         The Overall Architecture
• The Grid is defined by a collection of distributed Services
   – For many users the primary interaction with the Grid will be
     through a portal

[Figure: overall architecture. Labels: Portal Server; MyProxy Server; Directory & index Services; Event and ... Services; Application ...; ... and group]
Application Portal in a Minute (box)
• Systems like Unicore, GPDK, Gridport (HotPage),
  Gateway, Legion provide “Grid or GCE Shell”
  interfaces to users (user portals)
   – Run a job; find its status; manipulate files
   – Basic UNIX Shell-like capabilities
• Application Portals (Problem Solving Environments)
  are often built on top of “Shell Portals” but this can be
  quite time consuming
   – Application Portal = Shell Portal Web Service + Application
     (factory) Web service
             Application Web service
• Application Web Service is ONLY metadata
   – Application is NOT touched
• Application Web service defined by two sets of schema:
   – First set defines the abstract state of the application
       » What are my options for invoking myapp?
       » Dub these to be “abstract descriptors”
   – Second set defines a specific instance of the application
       » I want to use myapp with input1.dat on
       » Dub these to be “instance descriptors”.
• Each descriptor group consists of
   – Application descriptor schema
   – Host (resource) descriptor schema
   – Execution environment (queue or shell) descriptor schema
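The split between abstract and instance descriptors can be sketched with two small Python data classes. The field names, host names and queue names are hypothetical. Only `myapp` and `input1.dat` come from the slide, and the actual descriptors are XML schemas rather than Python objects.

```python
from dataclasses import dataclass


@dataclass
class AbstractDescriptor:
    """What are my options for invoking the application?"""
    application: str
    allowed_hosts: list
    allowed_queues: list


@dataclass
class InstanceDescriptor:
    """One specific invocation chosen from those options."""
    application: str
    host: str
    queue: str
    input_file: str


def instantiate(abstract, host, queue, input_file):
    """Build an instance descriptor, rejecting choices the abstract one does not offer."""
    if host not in abstract.allowed_hosts:
        raise ValueError(f"{abstract.application} is not offered on host {host}")
    if queue not in abstract.allowed_queues:
        raise ValueError(f"{abstract.application} cannot use queue {queue}")
    return InstanceDescriptor(abstract.application, host, queue, input_file)


# Hypothetical hosts and queues; only metadata is handled, the application is not touched.
myapp = AbstractDescriptor("myapp", ["hostA.example.org", "hostB.example.org"], ["batch", "shell"])
run = instantiate(myapp, "hostA.example.org", "batch", "input1.dat")
print(run)
```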
              Web Services as a Portlet
• Each Web Service naturally has a user interface specified as
  “just another port”
   – Customizable for universal access
• This gives each Web Service a Portlet view specified (in XML as
  always) by WSRP (Web services for Remote Portals)
• So component model for resources “automatically” gives a
  component model for user interfaces
   – When you build your application, you define portlet at same time
[Figure: Application as a WS. Labels: General Application Ports (Interface with other Web ...); Application or Content source Web Service; WSRP; User Face of Web Service; WSRP Ports define WS as a Portlet; Web Services have other ports (Grid Service) to be OGSI compliant]
Online Knowledge Center built from Portlets
                                      A set of UI

• Web Services provide a component model
  for the middleware (see large “common
  component architecture” effort in Dept. of
• Should match each WSDL component with
  a corresponding user interface component
• Thus one “must use” a component model
  for the portal with again an XML
  specification (portalML) of portal
      Architecture
[Figure: portal architecture. Labels: Turbine Servlet; JSP template; ECS Root to HTML; PSML; Screen Manager; PortletControllers; ECS elements; Portlets. Portlet data sources: XML (RSS, OCS, or other; local or remote); HTML (local files); JSP or VM (local templates); WebPage (remote HTML); Portlets user implemented using Portal API]
       Portlets and Portal Stacks
• User interfaces to Portal services (Code Submission, Job
  Monitoring, File Management for Host X) are all managed as
• Users, administrators can customize their portal
  interfaces to just precisely the services they want.
[Figure: portal stack, top to bottom: Aggregation Portals (Jetspeed); User facing Web Service Ports; Application Grid Web Services; Core Grid Services. Alongside: Message Security, Information Services]
Jetspeed Computing Portal: Choose Portlets

                           4 available portlets
                           linking to Web Services
                           I choose two
Choose Portlet Layout

    Choose 1-column Layout

    Original 2-column Layout
File management

                  Tabs indicate available
                    portlet interfaces.

                     Lists user files on
                  selected host, noahsark.
                  File operations include
                    Upload, download,
                  Copy, rename, crossload
   Sample page with
    several portlets:
proxy credential manager,
 submission, monitoring
          Administer Grid Portal
                                   Provide information
                                    about application
                                     host parameters

Select application
      to edit
