					The Grid:
Globus and the Open Grid Services Architecture

Dr. Carl Kesselman
Center for Grid Technologies
Information Sciences Institute
University of Southern California
   Why Grids
   Grid Technology
   Applications of Grids in Physics
   Summary
Grid Computing
How do we solve problems?
   Communities committed to common goals
    - Virtual organizations
   Teams with heterogeneous members
   Distributed geographically and politically
    - No location/organization possesses all required skills
      and resources
   Adapt as a function of the situation
    - Adjust membership, reallocate responsibilities,
      renegotiate resources
 The Grid Vision
“Resource sharing & coordinated problem
solving in dynamic, multi-institutional virtual
organizations”
 - On-demand, ubiquitous access to computing, data,
   and services
 - New capabilities constructed dynamically and
   transparently from distributed services

 “When the network is as fast as the computer's
  internal links, the machine disintegrates across
 the net into a set of special purpose appliances”
                  (George Gilder)
The Grid Opportunity:
eScience and eBusiness
   Physicists worldwide pool resources for peta-op
    analyses of petabytes of data
   Civil engineers collaborate to design, execute, &
    analyze shake table experiments
   An insurance company mines data from partner
    hospitals for fraud detection
   An application service provider offloads excess
    load to a compute cycle provider
   An enterprise configures internal & external
    resources to support eBusiness workload
Grid Communities & Applications:
Data Grids for High Energy Physics
[Diagram: CMS data grid hierarchy (1 TIPS is approximately 25,000 SpecInt95 equivalents)]
 - There is a “bunch crossing” every 25 nsecs; there are 100 “triggers” per second; each triggered event is ~1 MByte in size
 - Online System → Offline Processor Farm (~20 TIPS) at ~100 MBytes/sec
 - Tier 0: CERN Computer Centre, feeding Tier 1 at ~622 Mbits/sec (or air freight, deprecated)
 - Tier 1: Regional Centres in France, Germany, Italy; FermiLab (~4 TIPS); linked at ~622 Mbits/sec
 - Tier 2: centres of ~1 TIPS each (e.g. Caltech), linked at ~622 Mbits/sec
 - Tier 3: institute servers (~0.25 TIPS) with physics data caches; physicists work on analysis “channels”, each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server
 - Tier 4: physicist workstations, fed at ~1 MBytes/sec
Grid Communities and Applications:
Network for Earthquake Eng. Simulation
     NEESgrid: US national
      infrastructure to couple
      earthquake engineers with
      experimental facilities,
      databases, computers, &
      each other
     On-demand access to
      experiments, data streams,
      computing, and archives

NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
    Living in an Exponential World
    (1) Computing & Sensors
Moore’s Law: transistor count doubles each 18 months

[Figure: star formation visualization]
Living in an Exponential World:
(2) Storage
   Storage density doubles every 12 months
   Dramatic growth in online data (1 petabyte =
    1000 terabytes = 1,000,000 gigabytes)
    - 2000   ~0.5 petabyte
    - 2005   ~10 petabytes
    - 2010   ~100 petabytes
    - 2015   ~1000 petabytes?
   Transforming entire disciplines in physical and,
    increasingly, biological sciences; humanities
An Exponential World: (3) Networks
(Or, Coefficients Matter …)
          Network vs. computer performance
            - Computer speed doubles every 18 months
            - Network speed doubles every 9 months
            - Difference = order of magnitude per 5 years
          1986 to 2000
            - Computers: x 500
            - Networks: x 340,000
          2001 to 2010
            - Computers: x 60
            - Networks: x 4000
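The doubling periods above can be compounded directly; a quick back-of-the-envelope check (not from the slides) confirms the "order of magnitude per 5 years" claim:

```python
# Compound the doubling periods quoted above over a 5-year (60-month) window.
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Multiplicative improvement after `months`, given a doubling period."""
    return 2.0 ** (months / doubling_period_months)

computer = growth_factor(60, 18)  # computer speed: doubles every 18 months
network = growth_factor(60, 9)    # network speed: doubles every 9 months

# The gap compounds to roughly one order of magnitude every 5 years.
print(round(computer), round(network), round(network / computer))  # 10 102 10
```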

Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American
(Jan. 2001) by Cleo Vilett; source: Vinod Khosla, Kleiner, Caufield and Perkins.
Requirements Include …
   Dynamic formation and management of virtual
    organizations
   Online negotiation of access to services: who,
    what, why, when, how
   Establishment of applications and systems
    able to deliver multiple qualities of service
   Autonomic management of infrastructure
   Open, extensible, evolvable infrastructure
The Grid World: Current Status
   Dozens of major Grid projects in scientific &
    technical computing/research & education
   Considerable consensus on key concepts and
    technologies
    - Open source Globus Toolkit™ a de facto standard for
      major protocols & services
    - Far from complete or perfect, but out there, evolving
      rapidly, and large tool/user base
   Industrial interest emerging rapidly
   Opportunity: convergence of eScience and
    eBusiness requirements & technologies
    Globus Toolkit
       Globus Toolkit is the source of many of the
        protocols described in “Grid architecture”
       Adopted by almost all major Grid projects
        worldwide as a source of infrastructure
       Open source, open architecture framework
        encourages community development
       Active R&D program continues to move
        technology forward
       Developers at ANL, USC/ISI, NCSA, LBNL, and
        other institutions
            The Globus Toolkit in One Slide
           Grid protocols (GSI, GRAM, …) enable resource
            sharing within virtual orgs; toolkit provides reference
            implementation ( = Globus Toolkit services)
[Diagram: Globus Toolkit components]
 - GSI (Grid Security Infrastructure): the user authenticates and creates a proxy credential
 - GRAM (Grid Resource Allocation & Management): reliable remote invocation; a gatekeeper (factory) creates user processes, each holding its own proxy
 - MDS-2 (Meta Directory Service): created processes register with a reporter (registry + discovery); a GIIS (Grid Information Index Server) answers soft-state enquiries for discovery
 - Other GSI-authenticated remote service requests go to other services (e.g. GridFTP)
           Protocols (and APIs) enable other tools and services
            for membership, discovery, data mgmt, workflow, …
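The soft-state registration and enquiry pattern mentioned above can be illustrated with a small sketch (class and method names are invented for illustration; this is not the real MDS-2 protocol): services re-register periodically, and entries that are not refreshed within their time-to-live simply age out of the index.

```python
import time

# Illustrative soft-state registry: registrations expire unless refreshed.
# All names here are hypothetical, not actual Globus Toolkit interfaces.
class SoftStateRegistry:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._entries = {}  # service name -> time of last registration

    def register(self, name: str, now: float = None) -> None:
        """A service (re)announces itself; the timestamp is refreshed."""
        self._entries[name] = time.time() if now is None else now

    def discover(self, now: float = None):
        """Answer an enquiry: purge stale entries, return fresh names."""
        now = time.time() if now is None else now
        self._entries = {n: t for n, t in self._entries.items()
                         if now - t <= self.ttl}
        return sorted(self._entries)

registry = SoftStateRegistry(ttl_seconds=30)
registry.register("gram.cluster-a", now=0)     # never refreshed again
registry.register("gridftp.cluster-b", now=20)
print(registry.discover(now=40))  # ['gridftp.cluster-b']
```

The appeal of the design is that failed or departed services need no explicit deregistration: silence is enough to remove them from the index.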
    Globus Toolkit: Evaluation (+)
   Good technical solutions for key problems, e.g.
       Authentication and authorization
       Resource discovery and monitoring
       Reliable remote service invocation
       High-performance remote data access
   This & good engineering is enabling progress
       Good quality reference implementation, multi-
        language support, interfaces to many systems, large
        user base, industrial support
       Growing community code base built on tools
     Globus Toolkit: Evaluation (-)
   Protocol deficiencies, e.g.
       Heterogeneous basis: HTTP, LDAP, FTP
       No standard means of invocation, notification,
        error propagation, authorization, termination, …
   Significant missing functionality, e.g.
       Databases, sensors, instruments, workflow, …
       Virtualization of end systems (hosting envs.)
   Little work on total system properties, e.g.
       Dependability, end-to-end QoS, …
       Reasoning about system properties
“Web Services”
   Increasingly popular standards-based framework
    for accessing network applications
    - W3C standardization; Microsoft, IBM, Sun, others
   WSDL: Web Services Description Language
    - Interface Definition Language for Web services
   SOAP: Simple Object Access Protocol
    - XML-based RPC protocol; common WSDL target
   WS-Inspection
    - Conventions for locating service descriptions
   UDDI: Universal Desc., Discovery, & Integration
    - Directory for Web services
Web Services Example:
Database Service
   WSDL definition for “DBaccess” portType
    defines operations and bindings, e.g.:
    - Query(QueryLanguage, Query, Result)
    - SOAP protocol

   Client C, Java, Python, etc., APIs can then be
    generated from the WSDL definition
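As a sketch of what such a generated client stub puts on the wire, the following builds a SOAP request for the Query operation (the service namespace and element names are invented for illustration; only the SOAP envelope namespace is the standard one):

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 envelope
DB_NS = "urn:example:dbaccess"  # hypothetical namespace for the DBaccess portType

def build_query_request(language: str, query: str) -> bytes:
    """Serialize a SOAP envelope carrying a Query(QueryLanguage, Query) call."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{DB_NS}}}Query")
    ET.SubElement(op, f"{{{DB_NS}}}QueryLanguage").text = language
    ET.SubElement(op, f"{{{DB_NS}}}Query").text = query
    return ET.tostring(envelope)

msg = build_query_request("SQL", "SELECT * FROM events")
assert b"SELECT * FROM events" in msg  # the payload sits inside the Body
```

In practice the stub is generated from the WSDL, so the client never constructs this XML by hand; that separation of interface description from invocation is exactly what makes multiple protocol bindings possible.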
Transient Service Instances
   “Web services” address discovery & invocation
    of persistent services
    - Interface to persistent state of entire enterprise
   In Grids, must also support transient service
    instances, created/destroyed dynamically
    - Interfaces to the states of distributed activities
    - E.g. workflow, video conf., dist. data analysis
   Significant implications for how services are
    managed, named, discovered, and used
    - In fact, much of our work is concerned with the
      management of service instances
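A toy model of this instance management (all names invented; OGSA defines these behaviors as WSDL portTypes, not as a Python API): a factory creates uniquely named transient instances whose leases must be extended, with explicit destruction available for eager cleanup.

```python
import itertools

# Hypothetical sketch of factory-managed transient service instances.
class Factory:
    def __init__(self):
        self._ids = itertools.count(1)
        self.instances = {}  # handle -> lease expiry time

    def create(self, now: float, initial_lease: float) -> str:
        """Create a transient instance and return its unique handle."""
        handle = f"service-instance-{next(self._ids)}"
        self.instances[handle] = now + initial_lease
        return handle

    def extend_lease(self, handle: str, now: float, extra: float) -> None:
        """Keepalive: push the expiry further into the future."""
        self.instances[handle] = max(self.instances[handle], now) + extra

    def destroy(self, handle: str) -> None:
        """Explicit destruction for clients that finish early."""
        self.instances.pop(handle, None)

    def reap(self, now: float) -> None:
        """Soft-state cleanup: drop instances whose lease has expired."""
        self.instances = {h: t for h, t in self.instances.items() if t > now}

f = Factory()
a = f.create(now=0, initial_lease=10)
b = f.create(now=0, initial_lease=10)
f.extend_lease(a, now=5, extra=10)  # client still alive, extends the lease
f.destroy(b)                        # client done, destroys eagerly
f.reap(now=12)
print(sorted(f.instances))  # only the extended instance remains
```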
OGSA Design Principles
   Service orientation to virtualize resources
    - Everything is a service
   From Web services
    - Standard interface definition mechanisms: multiple
      protocol bindings, local/remote transparency
   From Grids
    - Service semantics, reliability and security models
    - Lifecycle management, discovery, other services
   Multiple “hosting environments”
    - C, J2EE, .NET, …
OGSA Service Model
   System comprises (typically few) persistent
    services & (potentially many) transient service
    instances
     - Everything is a service
   OGSA defines basic behaviors of services:
    fundamental semantics, life-cycle, etc.
    - More than defining WSDL wrappers
Open Grid Services Architecture:
Fundamental Structure
   WSDL conventions and extensions for
    describing and structuring services
    - Useful independent of “Grid” computing
   Standard WSDL interfaces & behaviors for
    core service activities
    - portTypes and operations => protocols
      The Grid Service =
      Interfaces + Service Data
[Diagram: a Grid service]
 - GridService interface: service data access, explicit destruction, soft-state lifetime
 - Other interfaces: reliable invocation, notification, authorization, service creation, service registry
 - Service implementations run in a hosting environment/runtime (“C”, J2EE, .NET, …)
The GriPhyN Project
   Amplify science productivity through the Grid
    - Provide powerful abstractions for scientists:
      datasets and transformations, not files and programs
    - Using a grid is harder than using a workstation. GriPhyN
      seeks to reverse this situation!
   These goals challenge the boundaries of computer
    science in knowledge representation and distributed
    computing
   Apply these advances to major experiments
    - Not just developing solutions, but proving them in
      real experimental settings
GriPhyN Approach
   Virtual Data
    - Tracking the derivation of experiment data with
      high fidelity
    - Transparency with respect to location
      and materialization
   Automated grid request planning
    - Advanced, policy driven scheduling
   Achieve this at peta-scale magnitude
   We present here a vision that is still 3 years away, but
    the foundation is starting to come together
Virtual Data
   Track all data assets
   Accurately record how they were derived
   Encapsulate the transformations that produce
    new data objects
   Interact with the grid in terms of requests for
    data derivations
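One way to make the idea concrete is a toy sketch (the data model below is invented for illustration, not GriPhyN's actual catalog schema): record, for every dataset, the transformation and inputs that produced it, so any dataset can be re-derived on demand rather than shipped or stored.

```python
# Hypothetical virtual-data catalog: datasets are either raw or described
# by a recipe (transformation name + input dataset names).
class VirtualDataCatalog:
    def __init__(self):
        self.raw = {}         # dataset name -> concrete value
        self.recipes = {}     # dataset name -> (transform name, input names)
        self.transforms = {}  # transform name -> callable

    def add_raw(self, name, value):
        self.raw[name] = value

    def add_transform(self, name, fn):
        self.transforms[name] = fn

    def declare(self, name, transform, inputs):
        """Register how `name` is derived, without materializing it."""
        self.recipes[name] = (transform, tuple(inputs))

    def materialize(self, name):
        """Recursively (re)derive a dataset from its recorded recipe."""
        if name in self.raw:
            return self.raw[name]
        transform, inputs = self.recipes[name]
        args = (self.materialize(i) for i in inputs)
        return self.transforms[transform](*args)

vdc = VirtualDataCatalog()
vdc.add_raw("run1.events", [3, 1, 4, 1, 5])
vdc.add_transform("select_gt2", lambda xs: [x for x in xs if x > 2])
vdc.add_transform("count", len)
vdc.declare("run1.selected", "select_gt2", ["run1.events"])
vdc.declare("run1.n_selected", "count", ["run1.selected"])
print(vdc.materialize("run1.n_selected"))  # derived on demand: prints 3
```

Because derivations are recorded, the same request can transparently be answered from a cached copy, a replica, or a fresh re-derivation, whichever policy prefers.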
  Data Grid Architecture

[Diagram: data grid request flow]
 - Planner: consumes an abstract DAG; consults Catalog Services (MCAT; GriPhyN catalogs), Info Services, Monitoring (MDS), Policy/Security (GSI, CAS), and Replica Management
 - Executor (DAGMan, Kangaroo): runs the concrete DAG produced by the planner
 - A Reliable Transfer layer connects the executor to resources
 - Compute Resource: GRAM
 - Storage Resource: GridFTP; GRAM; SRM
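The planner's abstract-to-concrete step can be sketched as follows (the greedy least-loaded policy is invented for illustration; real planners are policy-driven and consult catalogs, monitoring, and security services as shown above):

```python
# Bind each job of an abstract DAG to a concrete site, balancing load.
def plan(abstract_dag, sites):
    """abstract_dag: {job: [dependency jobs]}. Returns {job: site}.
    Assumes the DAG is acyclic."""
    load = {s: 0 for s in sites}
    binding = {}
    remaining = dict(abstract_dag)
    while remaining:
        # Jobs whose dependencies are already bound can be placed now.
        ready = [j for j, deps in remaining.items()
                 if all(d in binding for d in deps)]
        for job in sorted(ready):
            site = min(load, key=lambda s: (load[s], s))  # least-loaded site
            binding[job] = site
            load[site] += 1
            del remaining[job]
    return binding

dag = {"gen": [], "sim": ["gen"], "reco": ["sim"], "ana": ["reco"]}
print(plan(dag, ["caltech", "ncsa"]))
```

An executor such as DAGMan would then walk this concrete DAG, submitting each bound job via GRAM and moving data with GridFTP.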
    GriPhyN Challenge Problem:
    CMS Event Reconstruction
[Diagram: workflow with a master Condor job running at Caltech]
2) Launch secondary Condor job on Wisconsin (WI) pool; input files via Globus GASS
3) 100 Monte Carlo jobs run on the Wisconsin Condor pool
4) 100 data files (~1 GB each) transferred via GridFTP
5) Secondary job reports complete to master
6) Master starts reconstruction jobs via Globus jobmanager on NCSA Linux cluster
7) GridFTP fetches data from UniTree (NCSA UniTree, a GridFTP-enabled FTP server)
8) Processed Objectivity database stored to UniTree
9) Reconstruction job reports complete to master
Work of: Scott Koranda, Miron Livny, Vladimir Litvin, & others
          GriPhyN-LIGO SC2001 Demo

[Diagram: demo architecture]
 - An XML frontend with a CGI interface accepts requests for a desired single-channel time series
 - Catalogs and replica selection feed a Planner, which produces a G-DAG for DAGMan
 - The Executor (CondorG/DAGMan) runs the plan; Monitoring collects logs
 - GridFTP and GRAM/LDAS services at UWM, Caltech, and the SC floor move frame data and run compute
 - Component status legend: prototype, in design, Globus component, in integration
Work of: Ewa Deelman, Gaurang Mehta, Scott Koranda, & others
    iVDGL: A Global Grid Laboratory
    “We propose to create, operate and evaluate, over a
    sustained period of time, an international research
    laboratory for data-intensive science.”
                                From NSF proposal, 2001

   International Virtual-Data Grid Laboratory
      - A global Grid laboratory (US, Europe, Asia, South America, …)
      - A place to conduct Data Grid tests “at scale”
      - A mechanism to create common Grid infrastructure
      - A laboratory for other disciplines to perform Data Grid tests
      - A focus of outreach efforts to small institutions
   U.S. part funded by NSF (2001-2006)
     - $13.7M (NSF) + $2M (matching)
iVDGL Components
   Computing resources
    - 2 Tier1 laboratory sites (funded elsewhere)
     - 7 Tier2 university sites → software integration
     - 3 Tier3 university sites → outreach effort
   Networks
    - USA (TeraGrid, Internet2, ESNET), Europe (Géant, …)
    - Transatlantic (DataTAG), Transpacific, AMPATH?, …
   Grid Operations Center (GOC)
    - Joint work with TeraGrid on GOC development
   Computer Science support teams
    - Support, test, upgrade GriPhyN Virtual Data Toolkit
   Education and Outreach
   Coordination, management
iVDGL Components (cont.)
   High level of coordination with DataTAG
    - Transatlantic research network (2.5 Gb/s) connecting
      EU & US
   Current partners
    - TeraGrid, EU DataGrid, EU projects, Japan, Australia
   Experiments/labs requesting participation
    - ALICE, CMS-HI, D0, BaBar, BTEV, PDC (Sweden)
Initial US iVDGL Participants
-   Tier2 / Software:
      U Florida (CMS); Caltech (CMS, LIGO); UC San Diego (CMS, CS);
      Indiana U (ATLAS, GOC); Boston U (ATLAS); U Wisconsin, Milwaukee (LIGO);
      Penn State (LIGO); Johns Hopkins (SDSS, NVO)
-   CS support:
      U Chicago/Argonne (CS); U Southern California (CS); U Wisconsin, Madison (CS)
-   Tier3 / Outreach:
      Salish Kootenai (Outreach, LIGO); Hampton U (Outreach, ATLAS);
      U Texas, Brownsville (Outreach, LIGO)
-   Tier1 / Labs (funded elsewhere):
      Fermilab (CMS, SDSS, NVO); Brookhaven (ATLAS); Argonne Lab (ATLAS, CS)
Summary
   Technology exponentials are changing the
    shape of scientific investigation & knowledge
    - More computing, even more data, yet more
      networking
   The Grid: Resource sharing & coordinated
    problem solving in dynamic, multi-institutional
    virtual organizations
   Current Grid technology: the Globus Toolkit,
    evolving toward the Open Grid Services Architecture
Partial Acknowledgements
   Open Grid Services Architecture design
    - Karl Czajkowski @ USC/ISI
    - Ian Foster, Steve Tuecke @ANL
    - Jeff Nick, Steve Graham, Jeff Frey @ IBM
   Globus Toolkit R&D also involves many fine
    scientists & engineers at ANL, USC/ISI, and
    elsewhere
   Strong links with many EU, UK, US Grid
    projects
   Support from DOE, NASA, NSF, Microsoft
For More Information
   Grid Book
   The Globus Project™
   OGSA
   Global Grid Forum
