WS-Resource Framework and WS-Notification

Globus Toolkit: Status and Futures
Carl Kesselman
Globus Alliance @ USC/ISI
   Globus Toolkit in context
   What’s all this about WSRF?
   Status and Roadmap
     Security
     Data services
     GRAM
     MDS
     Testing
   Summary

            Globus Toolkit:
    “Standard Plumbing” for the Grid
   Not turnkey solutions, but building blocks
    and tools for application developers and
    system integrators
        Some components (e.g., file transfer) go
         farther than others (e.g., remote job
         submission) toward end-user relevance
   Since these solutions exist and others are
    already using them (and they’re free), it’s
    easier to reuse than to reinvent
        And compatibility with other Grid systems
         comes for free!
               Globus in Context
   Implementations are provided by a mix of
       Application-specific code
       “Off the shelf” tools and services
       Tools and services from the Globus Toolkit
       GT-compatible tools and services from the
        Grid community
   Glued together by …
       Application development
       System integration
   Deployed and supported
               What a Grid Application Looks Like
[Figure: a Grid application in four layers, built from off-the-shelf, Globus, and Grid community components]
   Users work with client applications
       Web and tool clients, e.g., a Data Viewer Tool
   Application services organize VOs & enable access to other services
       CHEF, CHEF Chat Teamlet, MyProxy, Certificate
   Collective services aggregate &/or virtualize resources
       Globus Index Service, Telepresence Monitor, MCS/RLS
   Resources implement standard access & management interfaces
       Globus GRAM in front of each Compute Server
       Globus DAI in front of each Database service
       Camera
    Globus Toolkit and Web Services
   Web services have some advantages for Grids
       Standard interface definition
       Good commercial tooling (eventually)
   Not a silver bullet or complete solution!!
   Globus Alliance working to advance specs ...
       OGSI/WSRF, OGSA-DAI, WS-Agreement, etc.
       WSDL 2.0, WSDM, WS-Security, etc.
   … and implementation
       WS-based interface to existing services
       New WS-based services
       Implementations of low-level specifications
                   Convergence of
             Grid and Web Services (1)
[Figure: Grid and Web services started far apart in apps & tech and have been converging; the meeting point is marked "?"]
    However, despite enthusiasm for OGSI, adoption
    within Web community turned out to be problematic

                   Convergence of
             Grid and Web Services (2)
[Figure: the same converging paths, now meeting at WSRF]
   The definition of WSRF means that Grid and Web
    communities can move forward on a common base
   Support from major WS vendors, especially in
    management space: e.g., CA, HP, IBM
              Core Ideas in WSRF
   Separate service from resource
       Service is static and stateless
       Resource is dynamic and stateful
   Leverage WS-Addressing
   Make WS-I compliant
       But note that WS-I alone doesn’t make the
        problems go away, still need to worry about
        how to manage lifetime, naming, state, …
   Preserves OGSI functionality
       Lifetime, state publication, notification, error
        types
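The split between a static, stateless service and dynamic, stateful resources can be illustrated with a minimal Python sketch. It is not WSRF or Globus code: `CounterService`, the reference layout (`address` plus `key`), and the example URL are hypothetical names chosen for illustration, and the in-memory dictionary stands in for whatever store would hold resource state in a real deployment.

```python
import itertools
import time


class CounterService:
    """Front end with no per-client state: every call names its resource."""

    def __init__(self):
        # Stand-in for an external resource store (hypothetical).
        self._resources = {}
        self._ids = itertools.count(1)

    def create(self, lifetime_s, now=None):
        """Factory pattern: create a resource and return a reference to it."""
        now = time.time() if now is None else now
        key = f"res-{next(self._ids)}"
        self._resources[key] = {"value": 0, "expires": now + lifetime_s}
        # The reference pairs the service address with a resource key,
        # loosely in the spirit of a WS-Addressing endpoint reference.
        return {"address": "http://example.org/counter", "key": key}

    def add(self, ref, amount, now=None):
        """Operate on the resource that the reference points to."""
        state = self._lookup(ref, now)
        state["value"] += amount
        return state["value"]

    def get_property(self, ref, name, now=None):
        """State publication: read one property of a resource."""
        return self._lookup(ref, now)[name]

    def _lookup(self, ref, now=None):
        """Resolve a reference, destroying the resource if its lifetime ended."""
        now = time.time() if now is None else now
        state = self._resources.get(ref["key"])
        if state is None or state["expires"] <= now:
            self._resources.pop(ref["key"], None)
            raise KeyError(f"unknown or expired resource: {ref['key']}")
        return state
```

Two references created from the same service keep independent values, and a call made after a resource's lifetime has elapsed raises `KeyError`, which is the sketch's stand-in for a fault type.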
             Services and Resources
Scenario: Resource management & scheduling
[Figure: a Grid Scheduler service placing jobs (J) on resources (R): computers, network, storage]
   Local processor manager is “front-ended” with
    a Web service interface
   WS-Resource used to “model” physical
    processor resources
   WS-Resource Properties “project” processor status
    (like utilization)
   WS-Notification can be used to “inform” the
    scheduler about processor utilization
   Grid Scheduler is a Web Service
   Grid “Jobs” and “tasks” are also modeled using
    WS-Resources and Resource Properties
   Service Level Agreement (WS-Agreement) is modeled
    as a WS-Resource
   Lifetime of SLA Resource is tied to the duration
    of the agreement
   Other kinds of resources are also “modeled” as
    WS-Resources
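The scheduling scenario above (a processor modeled as a resource, its status projected as a property, and the scheduler informed by notification) can be sketched in plain Python. This is an illustration of the pattern only, assuming an in-process callback instead of WS-Notification messages; `ProcessorResource`, `Scheduler`, and their methods are hypothetical names.

```python
class ProcessorResource:
    """Models one physical processor; status is exposed as properties."""

    def __init__(self, name):
        self.name = name
        self._properties = {"utilization": 0.0}
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callable to be told about property changes."""
        self._subscribers.append(callback)

    def get_property(self, prop):
        return self._properties[prop]

    def set_property(self, prop, value):
        """Update a property and notify subscribers only if it changed."""
        old = self._properties.get(prop)
        self._properties[prop] = value
        if value != old:
            for callback in self._subscribers:
                callback(self.name, prop, value)


class Scheduler:
    """Keeps the latest notified utilization for each processor."""

    def __init__(self):
        self.latest = {}

    def on_change(self, resource_name, prop, value):
        self.latest[(resource_name, prop)] = value

    def least_loaded(self):
        """Pick the processor with the lowest notified utilization."""
        loads = {
            name: value
            for (name, prop), value in self.latest.items()
            if prop == "utilization"
        }
        return min(loads, key=loads.get)
```

A scheduler would subscribe its `on_change` method to each processor; after the processors report their utilization, `least_loaded()` returns the name of the least busy one without the scheduler ever polling.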

            From OGSI to WSRF:
          Refactoring and Evolution

   OGSI                          WSRF
   Grid Service Reference        WS-Addressing Endpoint Reference
   Grid Service Handle           WS-Addressing Endpoint Reference
   HandleResolver portType       WS-RenewableReferences
   Service data defn & access    WS-ResourceProperties
   GridService lifetime mgmt     WS-ResourceLifetime
   Notification portTypes        WS-Notification
   Factory portType              Treated as a pattern
   ServiceGroup portTypes        WS-ServiceGroup
   Base fault type               WS-BaseFaults

    What Does this Mean for Globus?
   Invisible from high-level services
        GRAM, MDS, RFT, etc.
        And from perspective of VDT, etc.
   GT services will be WS-I + WS-Addressing
    compliant Web services
        Client tooling will work with Globus services
        Just ignores “extra stuff”
   GT services are standard Apache services
        Well-defined interface from service to
         backend resources
    Globus Toolkit >> WSRF/OGSI
   WSRF/OGSI impl is just one small part of GT
       GT = Services for execution mgt, data mgt,
        monitoring, and discovery
       E.g., GRAM, GridFTP, RFT, RLS, OGSA-DAI
   GT services are largely unaffected by WSRF
       Translate dynamic service to dynamic resource
       Interfaces and semantics largely unchanged
       But not interoperable due to protocol changes
   Pre-WS components continue to be supported
       GridFTP, RLS, pre-WS GRAM

 Components in Globus Toolkit 3.0
   GSI           WU GridFTP                     MDS2         WS Core

                    RFT          WS GRAM       WS-Index        OGSI
                   (OGSI)         (OGSI)        (OGSI)       C Bindings


                       Data        Resource    Information      WS
                    Management    Management     Services       Core

 Components in Globus Toolkit 3.2
   GSI           WU GridFTP                         MDS2            WS Core

                    RFT          WS GRAM          WS-Index           OGSI
                   (OGSI)         (OGSI)           (OGSI)          C Bindings

   CAS                                                               OGSI
  (OGSI)                                                         Python Bindings

 SimpleCA         OGSA-DAI


                       Data        Resource        Information        WS
                    Management    Management         Services         Core

    Planned Components in GT 4.0
   GSI           New GridFTP                         MDS2         WS Core

                     RFT          WS-GRAM          WS-Index       C WS Core
                   (WSRF)          (WSRF)          (WSRF)          (WSRF)

  CAS                                CSF
 (WSRF)                          (contribution)

 SimpleCA         OGSA-DAI

                       Data          Resource       Information      WS
                    Management      Management        Services       Core

         Globus Priorities for 2004
   Stabilize Web services-based implementation
       Greatly improve usability, performance,
        reliability, testing, scalability, documentation
       WSRF support
       Leverage Apache even more: Axis, WS-Security,
        WS-Addressing, …
   Bring to fruition new functionality in pipeline
       Data access & integration (tech preview)
       Enhanced GridFTP
       Community scheduling framework (contribution)
       Monitoring & discovery capabilities
   GT2→GT3 bump >> GT3→GT4 bump
              GT4 Security Roadmap
   Improved authorization capabilities:
       SAML-based authorization callout
            As specified in GGF OGSA-Authz WG
            Being tested now by PERMIS
            Initial deployments next month (UK BRIDGES project)
       Integrated policy decision engine
            XACML policy language, per-operation policies

   Better independent data unit (IDU) support
     WS-Security standard messages
     Adding replay prevention and encryption

   MyProxy integration
   One time password (OTP) support
         eXtensible Input/Output (XIO)
   Brand new in GT3.2
       Read/Write/Open/Close (RWOC) abstraction, globus_io()
       Drivers are written for file, TCP, UDP, GSI, etc.
       Different functionality achieved by protocol stacks
   Plans for GT4.0
       GridFTP driver will allow a 3rd party application to
        access files stored behind a GridFTP server with
        RWOC semantics
       Mode E driver will allow a 3rd party application to
        add parallel TCP
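The idea that different functionality comes from stacking drivers behind one read/write/open/close interface can be sketched as follows. This is a Python illustration of the pattern, not the Globus XIO C API: `Driver`, `MemoryTransport`, `CompressDriver`, `ChecksumDriver`, and `build_stack` are hypothetical names, and the in-memory transport is an assumption made so the example runs without a network.

```python
import zlib
from collections import deque


class MemoryTransport:
    """Bottom of the stack: keeps each written chunk in memory."""

    def __init__(self):
        self.chunks = deque()
        self.is_open = False

    def open(self):
        self.is_open = True

    def write(self, data):
        self.chunks.append(data)

    def read(self):
        return self.chunks.popleft()

    def close(self):
        self.is_open = False


class Driver:
    """Pass-through driver; subclasses override read and write."""

    def __init__(self, lower):
        self.lower = lower

    def open(self):
        self.lower.open()

    def write(self, data):
        self.lower.write(data)

    def read(self):
        return self.lower.read()

    def close(self):
        self.lower.close()


class CompressDriver(Driver):
    """Transform driver: compress on write, decompress on read."""

    def write(self, data):
        self.lower.write(zlib.compress(data))

    def read(self):
        return zlib.decompress(self.lower.read())


class ChecksumDriver(Driver):
    """Transform driver: append a CRC32 on write, verify it on read."""

    def write(self, data):
        self.lower.write(data + zlib.crc32(data).to_bytes(4, "big"))

    def read(self):
        raw = self.lower.read()
        payload, crc = raw[:-4], raw[-4:]
        if zlib.crc32(payload).to_bytes(4, "big") != crc:
            raise ValueError("checksum mismatch")
        return payload


def build_stack(transport, *driver_classes):
    """Wrap the transport in each driver, first listed is closest to it."""
    handle = transport
    for driver_class in driver_classes:
        handle = driver_class(handle)
    return handle
```

The application only ever calls `open`, `write`, `read`, and `close` on the top handle; swapping `MemoryTransport` for another transport, or changing the list of drivers passed to `build_stack`, changes behaviour without touching application code.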
               GridFTP in GT 4.0
   100% open source Globus Code
       No licensing issues
       Stable, extensible code base
   Uses XIO directly → different transports
   IPv6 support
   Striping → multi-Gb/sec wide-area transport
   HPSS version of GridFTP
       Working with HPSS consortium
       Works with Kerberos
       Other backends should be easy
        Reliable File Transfer (RFT)
   GT3.2
       Improved scalability and performance
       Scale from a few 1,000 to 20,000 files per
   GT4.0
       Improved scalability: target is 1 million files
       Usability and documentation improvements
       WSRF implementation

                Replica Services
   Replica location service
       Joint EDG/GA design
       First released in GT3.0
       Production deployments with over 3 million
        LFNs and over 30 million PFNs (LIGO)
       Much performance testing (HPDC 2004 paper)
   Copy and Registration Service
       GTR (Grid Technology Repository): Jan 2004
   GT 4.0 plans
       Performance improvements
       Technology preview of WSRF-based GGF
        replication specification
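A replica location service maps each logical file name (LFN) to the physical file names (PFNs) of its copies. The sketch below shows that mapping as a small in-memory catalog. It says nothing about how RLS itself is implemented or queried; `ReplicaCatalog`, its methods, and the example names are hypothetical.

```python
from collections import defaultdict


class ReplicaCatalog:
    """In-memory mapping from logical file names to physical replicas."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn, pfn):
        """Record that a physical copy of the logical file exists."""
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn, pfn):
        """Forget one physical copy; drop the LFN when no copies remain."""
        copies = self._replicas.get(lfn)
        if copies is None:
            return
        copies.discard(pfn)
        if not copies:
            del self._replicas[lfn]

    def lookup(self, lfn):
        """Return all known physical copies, sorted for stable output."""
        return sorted(self._replicas.get(lfn, ()))

    def counts(self):
        """Return (number of LFNs, number of PFNs) held in the catalog."""
        pfns = sum(len(copies) for copies in self._replicas.values())
        return len(self._replicas), pfns
```

Registering `gsiftp://host-a.example.org/data/run1` and `gsiftp://host-b.example.org/data/run1` under the logical name `run1` makes `lookup("run1")` return both locations, so a client can choose whichever copy is closest.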
        GRAM (Job Submission and
        Management Service) Goals
   Significant improvements in performance,
    scalability, and stability
   Reduce load on head node and support larger
    numbers of jobs and job bursts
       Enable a single head node to handle many
        thousands of jobs (target=10,000 pending,
        1,000 active, 100 with active streaming)
   Decrease client latency, increase throughput
       Hundreds of job submissions per minute
   Better server controls
       GRAM should never “fail” under high load
               Improving GRAM:
                Progress to Date
   GT3.0 introduced a WS-based GRAM with
    performance inferior to that of pre-WS GRAM
   GT3.2 WS-based GRAM has performance
    comparable to that of pre-WS GRAM
       Tuning and performance testing
       Improvements in GT core
       CERN evaluation effort a big help (thanks!!)
   Job submission rates: 4/sec → 16/sec
   Total jobs: 60 → 600+ (? still evaluating)
   Clearly not good enough yet!!
    GT 4.0: Major GRAM Redesign

   Re-architect to reduce overheads
       Reduce memory footprint, optimize interfaces
   No performance penalty for optional features
       E.g., cost of supporting streaming output
   Control cost of security
     E.g., pay for delegation only when needed
     Reduce from 6 to 1 roundtrips per job

   Continued tuning to improve performance
     Better response to server load
     Improved client implementations

     Better server management and concurrency
    GT4.0 WS GRAM Major Changes
   Single hosting environment
       No more dynamic server creation
   Per-user, not per-job, credentials
       Accelerate submission of multiple jobs
   Flexible credential management
       Introduce credential cache service
       Delegation used only when needed
   Remove file management (GASS)
       GridFTP for staging and streaming I/O
       Removes overhead when not used
   Augment RIPS with optimized state tracking
          New GRAM GT 4.0 Features
   Ability for job to self-organize (myJob)
       Important for MPICH-G2
   Better job state tracking
       Application-specific job status
       Record the history of all job states
       Exit code of a job available to clients
   Ability to target specific cluster nodes
   User-mode GridFTP for staging
       Eliminate the complexity and inefficiencies of
        GASS cache, file streaming and URL staging
      GT4.0 GRAM Changes that will
          Improve Performance
   Job submission decreases from 6 round
    trips to 1 (if no delegation)
   Job represented by WS-Resource (<1000
    bytes) not service (~40K bytes)
   Data transfers performed by compute
    nodes not head node
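The effect of these changes can be estimated with simple arithmetic. The sketch below combines the per-job sizes above with the 10,000 pending jobs target from the GRAM goals. Reading "~40K" as 40,000 bytes, treating "<1000 bytes" as an upper bound, and the 50 ms round-trip time are all assumptions made for illustration, not measurements.

```python
# Figures quoted on the slides.
PENDING_JOBS_TARGET = 10_000    # "target=10,000 pending"
BYTES_PER_SERVICE = 40_000      # "~40K bytes"; K read as 1000 (assumption)
BYTES_PER_WS_RESOURCE = 1_000   # "<1000 bytes", used as an upper bound

# Hypothetical value, not from the slides: one client/server round trip.
ASSUMED_ROUND_TRIP_S = 0.05


def state_footprint_mb(jobs, bytes_per_job):
    """Memory needed to hold per-job state, in megabytes (10**6 bytes)."""
    return jobs * bytes_per_job / 1_000_000


def submission_network_time_s(round_trips, round_trip_s=ASSUMED_ROUND_TRIP_S):
    """Network time for one submission, ignoring server-side processing."""
    return round_trips * round_trip_s


def footprint_summary(jobs=PENDING_JOBS_TARGET):
    """Return (MB as services, MB as WS-Resources) for the given job count."""
    return (
        state_footprint_mb(jobs, BYTES_PER_SERVICE),
        state_footprint_mb(jobs, BYTES_PER_WS_RESOURCE),
    )
```

Under these assumptions, 10,000 pending jobs need about 400 MB when each job is a service and at most about 10 MB when each job is a WS-Resource, and cutting six round trips to one removes five round-trip times from every submission that does not need delegation.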

             MDS4 Target Use Cases
   Discovery of job submission (GRAM) services
       Basic resource and service discovery for users
            Cluster type data, queue data, software stack
            GLUE schema
       Data archived and viewed with histograms
   Service availability
       Site and cross-site tests for basic services
       GRAM, GridFTP, RFT
   Automatic discovery-Job Submission Service
       Use case 1 with assisted resource selection and job
        submission through GRAM

                Index Service Issues
   Index performance
       ~1000 resources, ~1 update/min
            Tracking of compute element, data element state
       ~100 resources, ~1 update/5 seconds
            Tracking of job state, transfer state

   Add pieces needed for vertical solutions
       Efficient & full-functioned resource
       Displays and programmer APIs

               GT 3.2 Index Service
   Significant performance improvement over GT3.0
       Improvements in core
       Better buffering and queue management, update
        rate limiting to improve scalability
   Service group based implementation
   Command line tools, browser access (in GTR)
   Per-service status information
       RIPS
   API components
       To help write specialized indices and provide service
        data in resource services

               GT3.2 Index Performance

[Figure: test topology. Each client node (B) sends 1 msg/sec at 1 KB to a single index node (A).]

A: Dual P4 Xeon 2 GHz w/HT; 1 GB RAM
B: Single P3 (Katmai) 550 MHz; 1.5 GB RAM

Container: GT3 Standalone Container, default Java heap size, 20 worker threads
Test iterated on a variable number of nodes (B): 12, 24, 36 and 48
Target message rate for updates was one 1 KB message per second for each node
Index Service Update Performance
[Figure: plot of update response time (ms)]
Sustained 24 x 1 KB updates/sec → 1000 resources at 1 update/min
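The step from the sustained rate to a resource count is a unit conversion, and it can be checked against both index workloads listed under Index Service Issues. The sketch below assumes updates are spread evenly over time and that the sustained 24 updates/sec figure is the only limit; the function names are hypothetical.

```python
SUSTAINED_UPDATES_PER_SEC = 24  # measured figure quoted on the slide


def required_updates_per_sec(resources, update_period_s):
    """Aggregate update rate if every resource updates once per period."""
    return resources / update_period_s


def fits(resources, update_period_s, capacity=SUSTAINED_UPDATES_PER_SEC):
    """True if the workload stays within the sustained update rate."""
    return required_updates_per_sec(resources, update_period_s) <= capacity


def max_resources(update_period_s, capacity=SUSTAINED_UPDATES_PER_SEC):
    """Largest resource count the sustained rate supports at this period."""
    return int(capacity * update_period_s)
```

With these definitions, 1000 resources updating once a minute need roughly 16.7 updates/sec and 100 resources updating every 5 seconds need 20 updates/sec, so both targets fit within 24 updates/sec; the ceiling is 1440 resources at one update per minute.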

                  GT4 MDS Plans
   Index and web UI – similar architecture,
    hardened and ported to WSRF
   Archiver to store historical attribute data
   Richer per-service information for resources
       Interface to Ganglia cluster monitor for GRAM
       GT services (RFT, GRAM, GridFTP) provide better
        status information
   Documentation for useful deployment
   API components improved based on GT3

                       GT Testing

Increase toolkit reliability by testing:
   earlier in the development process,
   more often,
   more thoroughly,
   with user-centric test cases
               GT Release Timeline
[Figure: release timeline, 2004 to 2005]
   3.2: improved robustness, scalability, performance
   4.0 beta (Q2 2004) and 4.0 (Q3 2004): WSRF; some new
    functionality; further usability, performance
   4.2 (Q2 ‘05): numerous new WSRF-based services
   Note: We are not waiting for finalization of WSRF specs
         EGEE Engagement:
    What We’d Like to See from You
   Requirements specification
       Use cases for desired features & performance
   Joint work on component evaluation
       Testing
       E.g., GRAM & MDS performance evaluation
       Collaboration with CERN has been wonderful
   Collaborative design/implementation
       E.g., a common RLS; input on GRAM design
   Creating higher-level layered services
       E.g., data management, job management
   GT provides foundation for interoperable
    EGEE infrastructure
   Major GT focus on performance and stability
   GT4 will start migration of Globus
    infrastructure to WSRF
       Impact of WSRF is small, big up-side
   Need to ensure good communication
    between Globus Alliance/EGEE
       We are part of EGEE