BPEL and OGSA

					Grid Middleware – Principles, Practice and Potential
       (Or: What do Wombats and Grid have
                   in common?)

          UK OGSA Evaluation Project
       (UCL, Imperial, Newcastle, Edinburgh)
    UCL Project Members: Paul Brebner, Wolfgang Emmerich
            University College London
             P.Brebner@cs.ucl.ac.uk
    What do Wombats and Grid have in common?



A They are secretive and misunderstood creatures?
B They live in complex underground burrows?
C You wouldn’t want to meet one in a confined
  space in the dark?
D All of the above?

                                ?
                        Grid – Abstract



• Principles
   – What are the principles of Grid middleware?
• Practice (and pitfalls)
   – How easy is it to use in practice? What are the pitfalls?
• Potential
   – What potential does Grid middleware have to
      • (1) provide insight into different ways of using Service
        Oriented Architectures, and
      • (2) support automatic deployment and debugging?
                       Grid – Principles



Grid Principles – cluster, enterprise, internet
           Grid Principles – Grid vs Enterprise



• What’s the difference between Grid and
  Enterprise? (Typical generalisations…)
• Grid
   – Crosses firewalls and organisational boundaries
   – Resource and code focussed
      • scientist has some code, and wants to execute it on as many
        resources as possible, to solve ever bigger problems
   – Developer, deployer and user may be the same person
                          Grid Principles – Grid

User wants: Infinite resources, scalability, monitoring

Organisations want: Fair sharing, ease of maintenance?

[Figure: diagram with labels Code, Data, New Data]
        Grid Principles – Grid vs Enterprise



• Enterprise
  – Code developed, deployed and maintained by
    enterprises behind firewall
  – Exposed as web services for intra and inter
    organisational interoperability
  – Users don’t develop or deploy code
                     Grid Principles – Enterprise

User wants: Response time, availability

Enterprise wants: Interoperability, scalability, security

[Figure: diagram with labels Query or Transaction, Response, Service developer]
           Grid Principles – Grid vs Enterprise



• Grid (User view)
   – I have some code, make it run fast for me.
   – Concerns: Finding resources, platform
     portability, deploying, running and monitoring
     “jobs”, security, data management.
• Enterprise (Enterprise owner view)
   – I have some business logic exposed as Web service –
     ensure internal and external users get required QoS.
   – Concerns: QoS, interoperability, transactions,
     performance/scalability, security, multiple
     applications sharing services.
   Grid Principles – Just another component model?




• In spite of these differences, they have something
  in common
• OGSI has J2EE origins
   – “What does it mean to ship a J2EE-based Grid
     environment, something that can deliver OGSI-compliant services?
     It means that you provide a server programming environment that
     makes it very easy for service writers to implement services that
     conform to the set of standards that are OGSI.”
   – Containers, lifecycle management
   – Goal: Easy to write services and interoperability at
     interface level
Grid Principles – OGSA vs OGSI
Grid Principles – OGSA without OGSI
Grid Principles – OGSA and ?




                     ?
          Grid Principles - Architecture

J2EE – n-tiered architecture
          Grid Principles - Architecture

OGSA – semi-layered, or “sum of services”
          Grid Principles - Architecture

GT3 – (core) server side components
          Grid Principles – OGSA Services



•   Infrastructure services
•   Execution Management services
•   Data Services
•   Resource Management Services
•   Security Services
•   Self-Management Services
•   Information Services
             Grid Principles – J2EE cf OGSI



Feature             J2EE                                     OGSI
Containers          Multiple (4)                             One
Components          Multiple                                 One (+inheritance)
Roles               Explicit                                 Implicit
Implementations     Many                                     1-2 (sort of)
Component purpose   Presentation/Business logic/persistence  High-level grid services
                   Grid Principles - State



• Treatment of stateful instances?
   – J2EE has stateful session and entity beans
      • CMP Entity beans: lifecycle management
        (passivation/activation/pooling), caching, and automatic
        persistence support
      • Typically accessed via Stateless Session Beans or MDBs
   – GT3 has stateful instances (created by Factories)
      • Accessed via SOAP and handles
      • No automatic passivation/activation or persistence
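
The factory and handle pattern on this slide can be sketched in a few lines. This is an illustration in Python, not GT3 code: the names CounterFactory and CounterInstance, the URL, and the handle format are all hypothetical. The "container" is just an in-memory dictionary with no passivation or persistence, which mirrors the GT3 limitation noted above.

```python
import itertools


class CounterInstance:
    """A stateful service instance: its state lives only in container memory."""

    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value
        return self.total


class CounterFactory:
    """Hypothetical factory: creates instances and hands back handles."""

    def __init__(self, base_url):
        self.base_url = base_url
        self._ids = itertools.count(1)
        self._instances = {}  # handle -> live instance (never passivated)

    def create(self):
        handle = f"{self.base_url}/instance/{next(self._ids)}"
        self._instances[handle] = CounterInstance()
        return handle

    def resolve(self, handle):
        # Raises KeyError if the handle is unknown, e.g. after a restart.
        return self._instances[handle]


# Made-up service URL for illustration only.
factory = CounterFactory("http://host.example:8080/ogsa/services/Counter")
h1 = factory.create()
h2 = factory.create()
factory.resolve(h1).add(5)
factory.resolve(h1).add(2)
factory.resolve(h2).add(10)
```

Each handle addresses its own state, so the two instances accumulate independently. Nothing here survives a container restart.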
                     Grid Principles - Roles



• J2EE
   –   Component developer
   –   Application assembler
   –   Deployer
   –   System Administrator
        • Not to mention product and tool providers, system
          architect, and database designer and administrator, etc
• Many products provide distributed/remote tool
  support
               Grid Principles - Roles



• Grid?
  – Increasing number of roles in practice
  – But, no explicit definition of Grid roles, and
  – Poor tool support for cross-organisational
    support of roles
           Grid Principles - Deployment



• Treatment of deployment?
  – J2EE has explicit deployment role, and
    typically good tool support for remote
    deployment
  – Support for product independent deployment
    (JSR-88 since J2EE 1.4)
  – GT3 has built-in support for remote
    “code/executable” deployment (staging), but
    none for remote “service” deployment
      Grid Principles – Confusion/alternatives




• How is Globus intended to be used?
  – 1: Science as first-order services
     • Middleware for building and hosting Grid
       Applications, by exposing science code as Grid
       services.
  – 2: High-level grid services
     • Middleware for building a set of high level Grid
       services, composed to provide new Grid
       functionality. Science isn’t first-order service, but
       executed and managed by Grid services.
       Grid Principles – Science services or Grid services

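1. Science services: directly callable, described, discoverable

2. Science: indirectly callable, not directly described or discoverable

[Figure: in (1) the Client calls the science codes E=mc2 and D=A+2B+C2 directly; in (2) the Client sees Execution and Data services, with the science codes E=mc2 and D=A+2B+C2 behind them]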
                         Grid – Practice



            Grid Practice – What to evaluate?



• OGSA > OGSI > GT3.2 – Grid SOA exemplar
   – Initially evaluate installation, configuration, and
     security
   – Then performance and scalability, deployment,
     architectural choices, etc.
• What’s the point? What are we trying to learn?
   – What are some of the s/w engineering and architectural
     issues surrounding Grid infrastructure? Across
     organisational boundaries?
   – What improvements are required before it is suitable
     for production environments?
         Grid Practice – “Realistic” test-bed




• Heterogeneous platforms
  – Linux, Solaris, Windows
• Cross-organisational
  – Four nodes
  – Independently administered
  – Firewalls and access restrictions
• Security
  – UK e-Science CA
              Grid Practice – Incremental




• Start with Core Package (Just container and basic
  services – e.g. container registry service)
• Add Security
• Then try “All Services”
• Simple enough – in theory
   – Relationship between packages not well understood
   – Java and non-Java components
   – Poor integration between some parts
Grid Practice – single node



[Figure: steps stacked on the OS/HW of a single node, built up one at a time: Install GT3, Configure, Deploy, Run]
      Grid Practice – Multiple sites

[Figure: a row of GT3 installations at multiple sites, with layers added across them one at a time: Interoperate, Secure, Manage]
            Grid Practice – What we found


• Port number management (conflicts, discovery)
• Host access (requirements and site policies)
• Remote visibility of
  installation, container, services
  (what, configuration, version)
• Installation by System Administrators (role
  division, extra effort)
• Tomcat or Test container (different configuration)
• Linux is the only well supported platform
• Exponential increase in testing complexity as
  number of nodes increases.
                 Grid Practice – Security


• Grid Security Infrastructure (GSI)
   – X.509 certificates
   – Mutual authentication (client/host)
   – Proxy certificates (delegation and single sign-on)
• Authentication (Who are you?)
   – Secure Message (Basic)
   – Secure Conversation
      • Signing or Encryption (prevent unauthorised altering/reading)
• Authorisation (Who is authorised to use
  container, factory, service, method)
   – Gridmap file (Access Control List – maps Grid to Local
     identities)
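
As a rough sketch of what the gridmap file does: each entry pairs a quoted certificate subject (DN) with a local account name, and a caller whose DN is not listed is refused. The entries below are made up for illustration, and parse_gridmap and authorise are hypothetical helper names, not Globus functions.

```python
import shlex


def parse_gridmap(text):
    """Map certificate subject (DN) -> local account name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # shlex honours the double quotes around the DN, which contains spaces.
        dn, account = shlex.split(line)[:2]
        mapping[dn] = account
    return mapping


def authorise(mapping, dn):
    """Return the local account for a DN, or None if the caller is not listed."""
    return mapping.get(dn)


# Made-up entries for illustration only.
EXAMPLE = '''
# subject DN, then local account
"/C=UK/O=eScience/OU=Example/CN=alice example" alice
"/C=UK/O=eScience/OU=Example/CN=bob example" griduser
'''

gridmap = parse_gridmap(EXAMPLE)
print(authorise(gridmap, "/C=UK/O=eScience/OU=Example/CN=alice example"))
```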
                   Grid Practice – Security


• In theory just have to
   –   obtain (and update) host, client, and CA certificates
   –   convert
   –   install
   –   configure (server, client side, container, services, etc)
   –   generate (and update) proxies.
• However, parts of “All Services” package also
  needed.
                Grid Practice – Security



• Interactions between security for multiple
  installations
• Essential to test non-secure interoperability first
• Windows client-side security
• Testing and viewing security configuration
• Debugging secure calls
• Client side security is programmatic
• Security management scalability
   – Construction and maintenance of user accounts and
     grid-map file entries.
                Grid Practice – Security



• Interactions between security for multiple
  installations
  – For testing may want
     • multiple versions, or duplicates (with different
       configurations) of same versions.
     • One container with no security, and another
       container with security
  – May want test/production environments
              Grid Practice – Security



• Essential to test non-secure interoperability
  first
  – Trying to test interoperability and security
    simultaneously wasn’t fun
              Grid Practice – Security



• Windows client-side security
  – Not obvious exactly what parts of Globus are
    needed for client side code with security (no
    “client side + security” package).
                Grid Practice – Security



• Testing and viewing security configuration
  – View/edit and check security configuration for
    containers and services
  – Confusion about hierarchical security settings
     • Virtual
       Organisations, clusters, servers, containers, factories
       , services, methods, and instances.
  – Remotely
  – Validate security deployment before run-time
                  Grid Practice – Security



• Debugging secure calls (or any stateful service)
   – Proxy interceptor approach (e.g. TCPMON) won’t
     work with stateful services
      • As grid handle returned to client contains the port number of
        the instance, not the proxy
   – But proxies are an important design pattern for SOAs…
   – GT4/WS-RF may be different
      • Handle resolvers, WS-Addressing and WS-
        RenewableReferences
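
The reason an interceptor misses calls to stateful instances can be shown with plain URLs. In this sketch the port numbers, paths, and the create_instance_via function are assumptions standing in for a real factory call. The point is only that the handle is built from the container's own address, so it bypasses the proxy port.

```python
from urllib.parse import urlsplit

SERVER_PORT = 8080   # assumed port of the real container
PROXY_PORT = 8081    # assumed port the TCP monitor listens on


def create_instance_via(port):
    """Stand-in for a factory call; the reply does not depend on the route taken.

    The container builds the handle from its own address, so the handle
    names SERVER_PORT even when the request arrived through the proxy.
    """
    return f"http://localhost:{SERVER_PORT}/ogsa/services/Counter/instance/1"


def goes_through_proxy(url):
    """True if a call to this URL would be seen by the interceptor."""
    return urlsplit(url).port == PROXY_PORT


factory_url = f"http://localhost:{PROXY_PORT}/ogsa/services/CounterFactory"
handle = create_instance_via(urlsplit(factory_url).port)

print(goes_through_proxy(factory_url))  # the factory call is intercepted
print(goes_through_proxy(handle))       # later calls on the instance are not
```

Making the instance calls visible would require either rewriting the handle on the way back to the client, or a resolver step between handle and address.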
              Grid Practice – Security



• Client side security is programmatic
  – Client side code modifications required to call
    services/methods with required protocols
  – Should be declarative
  – Sensitive to server side security credentials
                    Grid Practice – Security


• Security management scalability
   – Construction and maintenance of user accounts and grid-map file
     entries.
   – For each server, each user needs an account, and an entry in the
     container gridmap file (mapping client certificate to account)
   – May also need service specific gridmap files
   – Not scalable for large numbers of users, servers, services.
   – Revocation of certificates, host certificate expiry problem
• Alternatives?
   – Tool support
   – Role based authentication
   – Shared accounts or certificates (probably evil)
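
A back-of-envelope count shows why this does not scale. The sketch assumes one local account and one container gridmap entry per user per server, as stated above. It also assumes that every service-specific gridmap lists every user, which is an assumption made for illustration. The user and server numbers are invented.

```python
def admin_entries(users, servers, services_with_own_gridmap=0):
    """Count account and gridmap entries to create and maintain."""
    accounts = users * servers            # one local account per user per server
    container_entries = users * servers   # one container gridmap line each
    # Assumption: each service-specific gridmap lists every user.
    service_entries = users * servers * services_with_own_gridmap
    return accounts + container_entries + service_entries


print(admin_entries(users=10, servers=4))
print(admin_entries(users=100, servers=20, services_with_own_gridmap=3))
```

The count is multiplicative in users, servers, and services, and each entry must also be revisited when a certificate is revoked or expires.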
                Grid Practice - Performance

• First approach (initial results)
   – Scientific benchmark (SciMark2.0) modified to
     measure throughput, and invoked as a Stateful Grid
     Service
   – Metric is Calls Per Minute (CPM) – one unit of work.
   – No large-scale data movement, just SOAP parameters
     and result, and computation/memory load.
• Good performance and scalability
   –   Minimal overhead cf standalone benchmark
   –   Security has minimal overhead
   –   Sustained 4200 “jobs” an hour throughput
   –   Problem with client side timeouts as response times
       increase
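
The CPM metric and the "jobs an hour" figure on this slide are the same quantity in different units, assuming one call counts as one job. A small conversion sketch (the function names are ours):

```python
def jobs_per_hour(cpm):
    """Convert Calls Per Minute to jobs per hour (one call = one job)."""
    return cpm * 60


def calls_per_minute(jobs_each_hour):
    """Convert jobs per hour back to Calls Per Minute."""
    return jobs_each_hour / 60


sustained_cpm = calls_per_minute(4200)
print(sustained_cpm)
```

So the sustained figure of 4200 jobs an hour corresponds to 70 CPM across the test-bed.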
                                  Grid Practice - Performance


[Figure: ART (s) against number of threads (0 to 70); y-axis Time (s), 0 to 200. Series: UCL (4 cpu Sun), Newcastle (2 cpu Intel), Imperial (2 cpu Intel), Edinburgh (4 hyperthread cpu Intel), All]

Tomcat
Fastest: 3.6s (Edinburgh)
Slowest: 25s (UCL)
                    Grid Practice - Performance


[Figure: Throughput (CPM) against number of threads (0 to 80); y-axis CPM, 0 to 80. Series: UCL (4 cpu Sun), Newcastle (2 cpu Intel), Imperial (2 cpu Intel), Edinburgh (4 hyperthread cpu Intel), All (12 cpus), Theoretical Maximum]

95% of predicted maximum throughput
              Grid Practice - Performance


• Tomcat vs Test container
   – No difference on 3 out of 4 nodes
   – But 67% faster on one node (Newcastle, slowest Intel
     box)
• Attachments will work with GT3 and Tomcat
   – But not with security
   – Limit of 1GB (DIME)
   – Bug in Axis – doesn’t clean up temporary files.
               Grid Practice - Performance


• Stateful instances visible externally can be
  problematic
   – Intermittent unreliability
      • On some runs, 1 exception in 300 calls (reliability of .9967)
          – But non-repeatable, SOAP/network related?
      • What is the safe response to exceptions? Can’t just retry.
   – Possible to kill container (relies on clients being well
     behaved):
      • By invoking same instance/method more than once.
      • By consuming container resources
          – But instances can be passivated/activated in theory
          – Could be used to enable fine-grain (per instance) control over
            resource usage.
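
Why a client "can't just retry" against a stateful instance: if the call ran on the server but the reply was lost, a blind retry applies the update twice. The sketch below simulates that case. The Accumulator class is hypothetical, and the request-id check is a common general technique shown for contrast, not something GT3 is claimed to provide.

```python
class Accumulator:
    """Stateful instance with a non-idempotent and an idempotent method."""

    def __init__(self):
        self.total = 0
        self.seen = set()

    def add(self, amount):
        self.total += amount
        return self.total

    def add_once(self, request_id, amount):
        # Idempotent variant: a repeated request id is ignored.
        if request_id not in self.seen:
            self.seen.add(request_id)
            self.total += amount
        return self.total


def call_with_lost_reply(method, *args):
    """Simulate a call that executes on the server but whose reply is lost."""
    method(*args)
    raise ConnectionError("reply lost")


naive = Accumulator()
try:
    call_with_lost_reply(naive.add, 10)
except ConnectionError:
    naive.add(10)                      # blind retry applies the update twice

safe = Accumulator()
try:
    call_with_lost_reply(safe.add_once, "req-1", 10)
except ConnectionError:
    safe.add_once("req-1", 10)         # retry with the same id is harmless

print(naive.total, safe.total)
```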
                  Grid Practice - Pitfalls


• Production quality Grid middleware needs
  (“What this bike needs is …”)
• Support for
  –   Remote
  –   location independent
  –   cross-organisational
  –   multiple role scenarios
  –   Such as…
         Grid Practice - Pitfalls (continued)



– Platform independent, automatic, installation.
– Tool support for configuration and deployment
  creation, validation, viewing and editing.
– Management console for grid, nodes, globus
  packages, containers and services.
– Remote deployment and management of services.
– Remote distributed debugging of grid
  installations, services, and applications.
– Tool support, and more scalable processes for security.
                        Grid – Potential



        Grid Potential – Architectural alternatives



• Evaluate the two approaches in more detail
   – Science exposed as services, vs science code managed
     by higher level grid services.
• Explore alternative mechanisms for:
   –   Executing science code
   –   Load balancing and scheduling/resource management
   –   Directory services (service and resource discovery)
   –   Data movement (e.g. SOAP Attachments vs GridFTP)
      Grid Potential – Architectural evaluation



• Evaluation approach
  – Loosely based on ATAM + mechanisms
  – Clarify the role of different GT3 mechanisms,
    and quantify pros/cons
  – Two versions of application
  – Evaluate with
     • Architecture
     • Roles
     • Scenarios (to quantify quality attributes)
        Grid Potential – Architectural evaluation


• Pick a number of roles of interest
   – Define attributes of interest, and scenarios to exercise
     and measure them
• Deployment
   – Consistency of deployment, and time to deploy
• Debugging
   – Ability to locate root cause of problem and rectify
• Security admin
   – Cost/time to secure increasing number of clients/nodes
• Grid owner
   – Scalability and ease of management
       Grid Potential – Architectural evaluation



• Hypothesis
  – Both approaches to using Grid are identical
  – But won’t be surprised by some differences – e.g.
    scalability, discovery, deployment
• Problems with
  – MDS3 (Directory and resource discovery service)
    working with aggregated service data across sites
  – GridFTP
  – Wrapping Science code with MMJFS
              Grid Potential - Deployment


• How to install and configure Grid infrastructure
  and services - scalably and securely?
• Install GT3 infrastructure and security manually
   – MMJFS allows executable code to be staged
     automatically (But not services - could provide a
     deployment service).
• Install bootstrapping code, and then install and
  deploy all other code and security automatically.
   – Using SmartFrog (HP) in the lab, and then test-bed.
   – Firewalls, platform specific configurations, user sand-
     boxing, configuring GT3 security remotely, and “trust”
     with System Administrators are open issues.
      Grid Potential – Deployment Speculation


• Explicit deployment-flows?
  – In the Enterprise, applications are increasingly represented
    as work-flows.
     • Good for distributed execution, and comprehensibility.
  – What if deployment plans are also represented
    explicitly as flows (deployment-flows)?
  – Some work on work-flow aware resource management
    (for Grid).
  – Deployment-flows could even be auto-magically
    generated from work-flows, and executed to ensure
    resources are deployed correctly JIT for work-flow
    execution.
     Grid Potential – Deployment Speculation


• For example:
  – Work-flow with two tasks
     • 1st task requires 10 nodes, 2nd task 100 nodes.
  – Produce deployment-flow which is interleaved
    with work-flow to:
     • Deploy 1st service for first task to 10, and start
       execution
     • Deploy 2nd service to 100 nodes concurrent with
       execution of 1st task, and ready for execution of 2nd.
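
The two-task example can be sketched as a small generator that derives an interleaved plan from a work-flow. This is speculative illustration only: the Task type, the interleave function, and the string form of the steps are all hypothetical. The rule it applies is that each service is deployed one stage before the task that needs it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    service: str
    nodes: int


def interleave(workflow):
    """Return stages; steps within one stage may run concurrently.

    Each service is deployed one stage before the task that needs it,
    so deployment for task k+1 overlaps execution of task k.
    """
    stages = []
    for i, task in enumerate(workflow):
        if i == 0:
            # Nothing is running yet, so the first deployment stands alone.
            stages.append([f"deploy {task.service} x {task.nodes}"])
        stage = [f"execute {task.name} x {task.nodes}"]
        if i + 1 < len(workflow):
            nxt = workflow[i + 1]
            stage.append(f"deploy {nxt.service} x {nxt.nodes}")
        stages.append(stage)
    return stages


workflow = [Task("T1", "S1", 10), Task("T2", "S2", 100)]
plan = interleave(workflow)
for stage in plan:
    print(" || ".join(stage))
```

For the example above this yields three stages: deploy S1, then execute T1 alongside deploying S2, then execute T2.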
       Grid Potential – Deployment Speculation

[Figure: timeline. Deploy S1 x 10, then Execute T1 (T1 x 10) while Deploy S2 x 100 proceeds, then Execute T2 (T2 x 100). Could also include un-deployment.]
        Grid Potential - Deployment + Debugging

• Debugging distributed systems is tricky
   – Need better support for cross-cutting non-functional concerns such
     as deployment and debugging.
   – (One) problem with debugging services is not knowing the context
     of errors (to aid diagnosis or cure) – a service is just a black box
     with an interface.
• Deployment aware debugging:
   – Starting from functional work-flows, generate deployment-
     flows, which are executed prior to, or concurrent with, functional
     work-flows.
       • This ensures that deployment is done consistently and automatically
         with respect to application execution.
   – If failure in functional work-flow, then corresponding deployment-
     flow is examined to determine likely causes, and parts are re-
     executed.
   – Failure in deployment-flow can also possibly be managed.
            Grid Potential - Deployment + Debugging

• Three phases of Debugging
• Debug deployment
    – Relies on deployment infrastructure and deployment-flows
    – What works locally or on one node may not work remotely, or identically
      on all nodes without modification, and deployment framework itself may
      be an extra cause of failure
• Debug/trace application + infrastructure to get working initially
    – Relies on visibility/transparency of deployed and running infrastructure
      and application
    – Ideally want integrated (active), or at least proxy/sniffer (passive),
      debugging (profiling, tracing, stepping) support.
• Debug working application upon failure
    – But multiple failure modes
    – Has application + infrastructure been analysed and/or tested for them all?
    – Can diagnosis and rectification be done anyway?
        Grid Potential - Deployment + Debugging


• Backtrack through deployment steps (Like peeling an onion)
   – Some steps will need to be reversed, and then redone correctly
   – Manage dependent, redundant, and inconsistent operations
• This approach may fix an (interesting) sub-class of problems:
       • Those which can be fixed by simply redoing (or replicating) (part of) the
         installation, E.g.
           – Intermittent failure of container or services
           – Resource starvation or overload – deploy services to more resources
       • Security problems that can be fixed with reconfiguration or refresh of
         certificates/proxies.
   – But not:
       • network, or all configuration and security/access problems.
       • Or “Enterprise Web services” (from a user perspective, as users can’t
         deploy)
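
Backtracking through a deployment-flow can be sketched as "undo in reverse back to the suspect step, then redo forwards". The Step class, its undo action, and the step names are hypothetical. A real deployment framework would also need to handle the dependent, redundant, and inconsistent operations noted above.

```python
class Step:
    """One deployment step with a hypothetical undo action."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def do(self):
        self.log.append(f"do {self.name}")

    def undo(self):
        self.log.append(f"undo {self.name}")


def redeploy_from(steps, suspect):
    """Undo steps in reverse order back to `suspect`, then redo them in order."""
    start = next(i for i, s in enumerate(steps) if s.name == suspect)
    tail = steps[start:]
    for step in reversed(tail):
        step.undo()
    for step in tail:
        step.do()


log = []
names = ("install", "configure", "refresh proxy", "deploy service")
flow = [Step(n, log) for n in names]
for step in flow:
    step.do()          # initial deployment
log.clear()            # keep only the repair actions in the log

redeploy_from(flow, "refresh proxy")
print(log)
```

Only the layers from the suspect step outwards are peeled back. Earlier steps such as install and configure are left alone.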
       Grid Potential - Deployment + Debugging

[Figure: timeline with a failure. Deploy S1 x 10, then Execute T1 (T1 x 10) while Deploy S2 x 100 proceeds, then Execute T2, during which a node fails (Failure!), followed by Redeploy S2 on failed node.]
        Grid Potential - Deployment + Debugging


• What’s still needed?
   – Connection between executing client code and
     deployment infrastructure
   – Ability to reason about relationship between work-
     flow/client failures, deployment-flows and grid
     infrastructure, diagnose failure causes, and plan solutions
   – Ideally want applications and deployment represented
     explicitly as flows – work and deployment flows.
   – Could possibly infer work-flow and therefore
     deployment-flow from running system in the absence of
     explicit information?
   – Justification – is the problem significant, and how far does
     this solution go?
            UK OGSA Evaluation Project


• Thank you 

• Email: P.Brebner@cs.ucl.ac.uk
  – After November: Paul.Brebner@csiro.au


• Not (quite) the End…
          Postscript – The Secret Life of Grid?

Our experience evaluating Grid technology reminds me of an
Australian book (“The Secret Life of Wombats”) about a school boy
who used to sneak out of his dormitory after everyone was asleep to go
“wombatting”. He spent his nights secretly crawling down Wombat
burrows with a flashlight – a potentially lethal activity (not just from
cave-ins, as wombats are ferocious when cornered!) – and wrote
copious notes resulting in a substantial increase in knowledge of these
“mysterious and often misunderstood creatures”.


                   UK OGSA Evaluation Project Report 1.0
                       Evaluation of Globus Toolkit 3.2 (GT3.2)
                       Installation
                       http://sse.cs.ucl.ac.uk/UK-OGSA/Report1.doc

				