Grid Computing:
Concepts, Applications, and
Technologies
Ian Foster
Mathematics and Computer Science Division
Argonne National Laboratory
and
Department of Computer Science
The University of Chicago
http://www.mcs.anl.gov/~foster
Grid Computing in Canada Workshop, University of Alberta, May 1, 2002
Outline
The technology landscape
Grid computing
The Globus Toolkit
Applications and technologies
– Data-intensive; distributed computing;
collaborative; remote access to facilities
Grid infrastructure
Open Grid Services Architecture
Global Grid Forum
Summary and conclusions
foster@mcs.anl.gov ARGONNE CHICAGO
Living in an Exponential World
(1) Computing & Sensors
Moore’s Law: transistor count doubles each 18 months
[Figure: magnetohydrodynamics, star formation]
Living in an Exponential World:
(2) Storage
Storage density doubles every 12 months
Dramatic growth in online data (1 petabyte
= 1000 terabytes = 1,000,000 gigabytes)
– 2000 ~0.5 petabyte
– 2005 ~10 petabytes
– 2010 ~100 petabytes
– 2015 ~1000 petabytes?
Transforming entire disciplines in physical
and, increasingly, biological sciences;
humanities next?
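The projections above can be turned into implied yearly growth rates. This is a minimal sketch that uses only the slide's own milestones; the function name `annual_growth` is ours, and the comparison assumes "doubles every 12 months" means a factor of 2 per year.

```python
def annual_growth(start: float, end: float, years: float) -> float:
    """Compound annual growth factor needed to go from `start` to `end` in `years`."""
    return (end / start) ** (1.0 / years)

# Online-data projections from the slide, in petabytes.
projection = {2000: 0.5, 2005: 10.0, 2010: 100.0, 2015: 1000.0}

milestones = sorted(projection)
for y0, y1 in zip(milestones, milestones[1:]):
    factor = annual_growth(projection[y0], projection[y1], y1 - y0)
    print(f"{y0}-{y1}: x{factor:.2f} per year")

# Storage density doubling every 12 months: x2 per year, x32 over 5 years.
print(f"storage density over 5 years: x{2 ** 5}")
```

Read this way, the 2000 to 2005 projection (x20 in five years, about x1.82 per year) and the later ones (x10 per five years, about x1.58 per year) grow somewhat more slowly than storage density itself.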
Data Intensive Physical Sciences
High energy & nuclear physics
– Including new experiments at CERN
Gravity wave searches
– LIGO, GEO, VIRGO
Time-dependent 3-D systems (simulation, data)
– Earth Observation, climate modeling
– Geophysics, earthquake modeling
– Fluids, aerodynamic design
– Pollutant dispersal scenarios
Astronomy: Digital sky surveys
Ongoing Astronomical Mega-Surveys
Large number of new surveys
– Multi-TB in size, 100M objects or larger
– In databases
– Individual archives planned and under way
Multi-wavelength view of the sky
– > 13 wavelength coverage within 5 years
Impressive early discoveries
– Finding exotic objects by unusual colors
> L,T dwarfs, high redshift quasars
– Finding objects by time variability
> Gravitational micro-lensing
Surveys shown: MACHO, 2MASS, SDSS, DPOSS, GSC-II, COBE, MAP, NVSS, FIRST, GALEX, ROSAT, OGLE, ...
Crab Nebula in 4 Spectral Regions
[Figure: X-ray, optical, infrared, and radio images]
Coming Floods of Astronomy Data
The planned Large Synoptic Survey
Telescope will produce over 10 petabytes
per year by 2008!
– All-sky survey every few days, so will have
fine-grain time series for the first time
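To put "over 10 petabytes per year" in network terms, the yearly volume can be converted to an average sustained rate. A minimal sketch, assuming decimal units (1 PB = 10^9 MB) and a 365.25-day year; the helper name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds

def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average sustained rate in MB/s needed to move a yearly volume (1 PB = 1e9 MB)."""
    return petabytes_per_year * 1e9 / SECONDS_PER_YEAR

rate = average_rate_mb_per_s(10)
print(f"10 PB/year is about {rate:.0f} MB/s, around the clock")  # about 317 MB/s
```

This is an average only; bursts during observing would need more headroom.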
Data Intensive Biology and Medicine
Medical data
– X-Ray, mammography data, etc. (many petabytes)
– Digitizing patient records (ditto)
X-ray crystallography
Molecular genomics and related disciplines
– Human Genome, other genome databases
– Proteomics (protein structure, activities, …)
– Protein interactions, drug delivery
Virtual Population Laboratory (proposed)
– Simulate likely spread of disease outbreaks
Brain scans (3-D, time dependent)
A Brain is a Lot of Data!
(Mark Ellisman, UCSD)
And comparisons must be made among many
We need to get to one micron to know the location of every cell. We’re just now
starting to get to 10 microns; Grids will help get us there and further
An Exponential World: (3) Networks
(Or, Coefficients Matter …)
Network vs. computer performance
– Computer speed doubles every 18 months
– Network speed doubles every 9 months
– Difference = order of magnitude per 5 years
1986 to 2000
– Computers: x 500
– Networks: x 340,000
2001 to 2010
– Computers: x 60
– Networks: x 4000
[Figure: Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan 2001) by Cleo Vilett; source: Vinod Khosla, Kleiner, Caufield and Perkins.]
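The doubling times on this slide fully determine the growth factors, so the "order of magnitude per 5 years" claim can be checked directly. A minimal sketch using only the slide's 18-month and 9-month doubling times; the helper name `growth` is ours.

```python
def growth(months: float, doubling_months: float) -> float:
    """Growth factor after `months` if performance doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)

COMPUTER_DOUBLING = 18  # months, from the slide
NETWORK_DOUBLING = 9    # months, from the slide

five_years = 60
gap = growth(five_years, NETWORK_DOUBLING) / growth(five_years, COMPUTER_DOUBLING)
print(f"network/computer gap after 5 years: x{gap:.1f}")  # about x10

nine_years = 108  # 2001 to 2010
print(f"computers x{growth(nine_years, COMPUTER_DOUBLING):.0f}, "
      f"networks x{growth(nine_years, NETWORK_DOUBLING):.0f}")
```

For 2001 to 2010 the model gives 2^6 = 64 and 2^12 = 4096, which the slide rounds to x60 and x4000.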
Evolution of the Scientific Process
Pre-electronic
– Theorize &/or experiment, alone or in small
teams; publish paper
Post-electronic
– Construct and mine very large databases of
observational or simulation data
– Develop computer simulations & analyses
– Exchange information quasi-instantaneously
within large, distributed, multidisciplinary
teams
Evolution of Business
Pre-Internet
– Central corporate data processing facility
– Business processes not compute-oriented
Post-Internet
– Enterprise computing is highly distributed,
heterogeneous, inter-enterprise (B2B)
– Outsourcing becomes feasible => service
providers of various sorts
– Business processes increasingly computing-
and data-rich
The Grid
“Resource sharing & coordinated problem
solving in dynamic, multi-institutional
virtual organizations”
An Example Virtual Organization:
CERN’s Large Hadron Collider
1800 Physicists, 150 Institutes, 32 Countries
100 PB of data by 2010; 50,000 CPUs?
Grid Communities & Applications:
Data Grids for High Energy Physics
– Into the Online System: ~PBytes/sec
– Online System -> Offline Processor Farm (~20 TIPS): ~100 MBytes/sec
– Offline Processor Farm -> Tier 0, CERN Computer Centre: ~100 MBytes/sec
– Tier 0 -> Tier 1 regional centres (France, Germany, Italy; FermiLab ~4 TIPS): ~622 Mbits/sec, or air freight (deprecated)
– Tier 1 -> Tier 2 centres (Caltech and other Tier 2 centres, ~1 TIPS each): ~622 Mbits/sec
– Tier 2 -> Tier 3 institutes (~0.25 TIPS each, with a physics data cache): ~622 Mbits/sec
– Institute -> Tier 4 physicist workstations: ~1 MBytes/sec
There is a “bunch crossing” every 25 nsecs.
There are 100 “triggers” per second.
Each triggered event is ~1 MByte in size.
1 TIPS is approximately 25,000 SpecInt95 equivalents.
Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more
channels; data for these channels should be cached by the institute server.
www.griphyn.org www.ppdg.net www.eu-datagrid.org
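The rates in this tier diagram follow from the trigger figures listed with it. A minimal sketch using only those numbers (25 ns crossings, 100 triggers/s, ~1 MByte per event, 25,000 SpecInt95 per TIPS); the variable names are ours.

```python
BUNCH_CROSSING_NS = 25       # one bunch crossing every 25 ns
TRIGGERS_PER_S = 100         # triggered events kept per second
EVENT_SIZE_MB = 1.0          # each triggered event is ~1 MByte
SPECINT95_PER_TIPS = 25_000  # 1 TIPS ~ 25,000 SpecInt95 equivalents

crossings_per_s = 1e9 / BUNCH_CROSSING_NS             # crossings per second
rejection = crossings_per_s / TRIGGERS_PER_S          # crossings per kept event
offline_rate_mb_s = TRIGGERS_PER_S * EVENT_SIZE_MB    # rate after the trigger
rate_mbit_s = offline_rate_mb_s * 8                   # same rate in Mbit/s

print(f"{crossings_per_s:.0f} crossings/s, 1 kept per {rejection:.0f}")
print(f"{offline_rate_mb_s:.0f} MBytes/sec = {rate_mbit_s:.0f} Mbit/s")
print(f"offline farm: {20 * SPECINT95_PER_TIPS:,} SpecInt95 equivalents")
```

The ~100 MBytes/sec figure in the diagram is simply 100 triggers per second times ~1 MByte per event; the trigger keeps roughly one crossing in 400,000.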
Data Integration and Mining (credit: Sara Graves):
From Global Information to Local Knowledge
[Figure: emergency response, precision agriculture, urban environments, weather prediction]
Intelligent Infrastructure:
Distributed Servers and Services
Grid Computing
The Grid:
A Brief History
Early 90s
– Gigabit testbeds, metacomputing
Mid to late 90s
– Early experiments (e.g., I-WAY), academic
software projects (e.g., Globus, Legion),
application experiments
2002
– Dozens of application communities & projects
– Major infrastructure deployments
– Significant technology base (esp. Globus Toolkit™)
– Growing industrial interest
– Global Grid Forum: ~500 people, 20+ countries
The Grid World: Current Status
Dozens of major Grid projects in scientific &
technical computing/research & education
– www.mcs.anl.gov/~foster/grid-projects
Considerable consensus on key concepts
and technologies
– Open source Globus Toolkit™ a de facto
standard for major protocols & services
Industrial interest emerging rapidly
– IBM, Platform, Microsoft, Sun, Compaq, …
Opportunity: convergence of eScience and
eBusiness requirements & technologies
Grid Technologies:
Resource Sharing Mechanisms That …
Address security and policy concerns of
resource owners and users
Are flexible enough to deal with many
resource types and sharing modalities
Scale to large numbers of resources, many
participants, many program components
Operate efficiently when dealing with large
amounts of data & computation
Aspects of the Problem
1) Need for interoperability when different
groups want to share resources
– Diverse components, policies, mechanisms
– E.g., standard notions of identity, means of
communication, resource descriptions
2) Need for shared infrastructure services to
avoid repeated development, installation
– E.g., one port/service/protocol for remote
access to computing, not one per tool/application
– E.g., Certificate Authorities: expensive to run
A common need for protocols & services
The Hourglass Model
Focus on architecture issues
– Propose set of core services as basic infrastructure
– Use to construct high-level, domain-specific solutions
Design principles
– Keep participation cost low
– Enable local control
– Support for adaptation
– “IP hourglass” model
[Figure: hourglass with Applications and Diverse global services at the top, Core services at the narrow waist, and Local OS at the base]
Layered Grid Architecture
(By Analogy to Internet Architecture)
Grid layers, with their Internet Protocol Architecture counterparts:
– Application
– Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services (Internet: Application)
– Resource: “Sharing single resources”: negotiating access, controlling use (Internet: Application)
– Connectivity: “Talking to things”: communication (Internet protocols) & security (Internet: Transport, Internet)
– Fabric: “Controlling things locally”: access to, & control of, resources (Internet: Link)
Globus Toolkit™
A software toolkit addressing key technical
problems in the development of Grid-enabled
tools, services, and applications
– Offer a modular set of orthogonal services
– Enable incremental development of grid-
enabled tools and applications
– Implement standard Grid protocols and APIs
– Available under liberal open source license
– Large community of developers & users
– Commercial support
General Approach
Define Grid protocols & APIs
– Protocol-mediated access to remote resources
– Integrate and extend existing standards
– “On the Grid” = speak “Intergrid” protocols
Develop a reference implementation
– Open source Globus Toolkit
– Client and server SDKs, services, tools, etc.
Grid-enable wide variety of tools
– Globus Toolkit, FTP, SSH, Condor, SRB, MPI, …
Learn through deployment and applications
Key Protocols
The Globus Toolkit™ centers around four
key protocols
– Connectivity layer:
> Security: Grid Security Infrastructure (GSI)
– Resource layer:
> Resource Management: Grid Resource Allocation
Management (GRAM)
> Information Services: Grid Resource Information
Protocol (GRIP) and Index Information Protocol (GIIP)
> Data Transfer: Grid File Transfer Protocol (GridFTP)
Also key collective layer protocols
– Info Services, Replica Management, etc.
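The protocol list above is organized by layer, and the later resource-layer slide notes that everything there is "built on connectivity layer: GSI & IP". A minimal sketch of that layering rule as a data structure; the numeric layer values and the `may_depend_on` helper are illustrative assumptions, not part of the Globus Toolkit.

```python
from enum import IntEnum

class Layer(IntEnum):
    """Grid layers ordered bottom-up; the numbers only encode the ordering."""
    FABRIC = 1
    CONNECTIVITY = 2
    RESOURCE = 3
    COLLECTIVE = 4
    APPLICATION = 5

# Globus Toolkit protocols by layer, as listed on the slide.
PROTOCOLS = {
    "GSI": Layer.CONNECTIVITY,
    "GRAM": Layer.RESOURCE,
    "GRIP": Layer.RESOURCE,
    "GIIP": Layer.RESOURCE,
    "GridFTP": Layer.RESOURCE,
}

def may_depend_on(upper: str, lower: str) -> bool:
    """Assumed layering rule: a protocol may build only on strictly lower layers."""
    return PROTOCOLS[lower] < PROTOCOLS[upper]

print(may_depend_on("GRAM", "GSI"))     # True: resource layer over connectivity
print(may_depend_on("GSI", "GridFTP"))  # False: would invert the layering
```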
Globus Toolkit Structure
[Figure: compute resources run GRAM (with job managers) and MDS over GSI; data resources run GridFTP and MDS over GSI; other services or applications run ??? over GSI; common concerns across all: service naming, soft state management, reliable invocation, notification]
Connectivity Layer
Protocols & Services
Communication
– Internet protocols: IP, DNS, routing, etc.
Security: Grid Security Infrastructure (GSI)
– Uniform authentication, authorization, and
message protection mechanisms in multi-
institutional setting
– Single sign-on, delegation, identity mapping
– Public key technology, SSL, X.509, GSS-API
– Supporting infrastructure: Certificate
Authorities, certificate & key management, …
GSI: www.gridforum.org/security/gsi
Why Grid Security is Hard
Resources being used may be extremely valuable
& the problems being solved extremely sensitive
Resources are often located in distinct
administrative domains
– Each resource may have own policies & procedures
The set of resources used by a single computation
may be large, dynamic, and/or unpredictable
– Not just client/server
It must be broadly available & applicable
– Standard, well-tested, well-understood protocols
– Integration with wide variety of tools
Grid Security Requirements
User View
1) Easy to use
2) Single sign-on
3) Run applications: ftp, ssh, MPI, Condor, Web, …
4) User-based trust model
5) Proxies/agents (delegation)
Resource Owner View
1) Specify local access control
2) Auditing, accounting, etc.
3) Integration w/ local system: Kerberos, AFS, license mgr.
4) Protection from compromised resources
Developer View
API/SDK with authentication, flexible message protection,
flexible communication, delegation, ...
Direct calls to various security functions (e.g., GSS-API)
Or security integrated into higher-level SDKs:
e.g., GlobusIO, Condor-G, MPICH-G2, HDF5, etc.
Grid Security Infrastructure (GSI)
Extensions to existing standard protocols & APIs
– Standards: SSL/TLS, X.509 & CA, GSS-API
– Extensions for single sign-on and delegation
Globus Toolkit reference implementation of GSI
– SSLeay/OpenSSL + GSS-API + delegation
– Tools and services to interface to local security
> Simple ACLs; SSLK5 & PKINIT for access to K5, AFS, etc.
– Tools for credential management
> Login, logout, etc.
> Smartcards
> MyProxy: Web portal login and delegation
> K5cert: Automatic X.509 certificate creation
GSI in Action: “Create Processes at A and B
that Communicate & Access Files at C”
– User: single sign-on via “grid-id” & generation of proxy credential (or: retrieval of proxy credential from online repository), yielding a User Proxy
– User Proxy -> GSI-enabled GRAM servers at Site A (Kerberos) and Site B (Unix): remote process creation requests*
> Each GRAM server: authorize, map to local id, create process, generate credentials
– At each site, a Process runs under a local id with a restricted proxy (at Site A, also a Kerberos ticket); the processes communicate*
– Process -> GSI-enabled FTP server at Site C (Kerberos): remote file access request*
> FTP server: authorize, map to local id, access file on the storage system
* With mutual authentication
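The flow above rests on delegation: the user's credential signs a short-lived proxy, which in turn signs restricted proxies handed to remote processes. A minimal sketch of the structural checks on such a chain, assuming a hypothetical signature-free `Credential` record; real GSI uses X.509 proxy certificates and cryptographic signatures, which this sketch omits entirely. The subject names and the `/CN=proxy` suffix convention are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Credential:
    """Hypothetical, signature-free stand-in for an X.509 (proxy) certificate."""
    subject: str
    issuer: str
    not_after: int            # expiry time, e.g. seconds since some epoch
    restricted: bool = False

def delegate(parent: Credential, lifetime: int, now: int,
             restricted: bool = False) -> Credential:
    """Issue a proxy (notionally signed by `parent`) that never outlives it."""
    return Credential(
        subject=parent.subject + "/CN=proxy",
        issuer=parent.subject,
        not_after=min(now + lifetime, parent.not_after),
        restricted=restricted or parent.restricted,  # restrictions are inherited
    )

def chain_ok(chain: list[Credential], trusted_cas: set[str], now: int) -> bool:
    """Structural checks only: trusted root, issuer linkage, lifetimes, restriction."""
    if not chain or chain[0].issuer not in trusted_cas:
        return False
    for parent, child in zip(chain, chain[1:]):
        if child.issuer != parent.subject or child.not_after > parent.not_after:
            return False
        if parent.restricted and not child.restricted:
            return False
    return all(c.not_after > now for c in chain)

CA = "/O=Grid/CN=Example CA"  # hypothetical CA name
user = Credential("/O=Grid/CN=Alice", CA, not_after=10_000)
proxy = delegate(user, lifetime=12 * 3600, now=0)                # single sign-on
remote = delegate(proxy, lifetime=3600, now=0, restricted=True)  # given to a job
print(chain_ok([user, proxy, remote], {CA}, now=0))     # True
print(chain_ok([user, proxy, remote], {CA}, now=5000))  # False: remote proxy expired
```

Capping each proxy's lifetime at its issuer's expiry mirrors the idea that delegated rights can only shrink, never grow.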
GSI Working Group Documents
Grid Security Infrastructure (GSI) Roadmap
– Informational draft overview of working group
activities and documents
Grid Security Protocols & Syntax
– X.509 Proxy Certificates
– X.509 Proxy Delegation Protocol
– The GSI GSS-API Mechanism
Grid Security APIs
– GSS-API Extensions for the Grid
– GSI Shell API
GSI Futures
Scalability in numbers of users & resources
– Credential management
– Online credential repositories (“MyProxy”)
– Account management
Authorization
– Policy languages
– Community authorization
Protection against compromised resources
– Restricted delegation, smartcards
Community Authorization
1. CAS request, with resource names and operations: User -> CAS
> CAS checks: Does the collective policy authorize this request for this user? (using user/group membership, resource/collective membership, and collective policy information)
2. CAS reply, with capability and resource CA info: CAS -> User
3. Resource request, authenticated with capability: User -> Resource
> Resource checks: Is this request authorized by the capability? Is this request authorized for the CAS? (using local policy information)
4. Resource reply: Resource -> User
Laura Pearlman, Steve Tuecke, Von Welch, others
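The two checks in the CAS flow combine as an intersection: the resource trusts the CAS for some rights, and the CAS grants each member a subset of them. A minimal sketch under hypothetical policies and names; it models only the authorization logic and omits the signing and authentication a real capability would carry.

```python
# Hypothetical policies: each maps an identity to the (resource, operation) pairs allowed.
RESOURCE_LOCAL_POLICY = {   # held by the resource: what it lets the community's CAS grant
    "cas.example-community": {("storage1", "read"), ("storage1", "write")},
}
CAS_COLLECTIVE_POLICY = {   # held by the CAS: what each community member may do
    "alice": {("storage1", "read")},
}

def cas_issue_capability(user, requested):
    """Steps 1-2: the CAS grants only what collective policy allows this user."""
    allowed = CAS_COLLECTIVE_POLICY.get(user, set())
    return {"issuer": "cas.example-community", "holder": user,
            "rights": set(requested) & allowed}

def resource_authorize(capability, op):
    """Step 3: authorized by the capability AND authorized for the CAS locally."""
    by_capability = op in capability["rights"]
    for_cas = op in RESOURCE_LOCAL_POLICY.get(capability["issuer"], set())
    return by_capability and for_cas

cap = cas_issue_capability("alice", [("storage1", "read"), ("storage1", "write")])
print(resource_authorize(cap, ("storage1", "read")))   # True
print(resource_authorize(cap, ("storage1", "write")))  # False: CAS did not grant it
```

The resource never needs per-user entries: it delegates membership decisions to the CAS while keeping a local ceiling on what the community as a whole can do.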
Resource Layer
Protocols & Services
Grid Resource Allocation Management (GRAM)
– Remote allocation, reservation, monitoring,
control of compute resources
GridFTP protocol (FTP extensions)
– High-performance data access & transport
Grid Resource Information Service (GRIS)
– Access to structure & state information
Others emerging: Catalog access, code
repository access, accounting, etc.
All built on connectivity layer: GSI & IP
GRAM, GridFTP, GRIS: www.globus.org
Resource Management
The Grid Resource Allocation Management
(GRAM) protocol and client API allow
programs to be started and managed on
remote resources, despite local
heterogeneity
Resource Specification Language (RSL) is
used to communicate requirements
A layered architecture allows application-
specific resource brokers and co-allocators
to be defined in terms of GRAM services
– Integrated with Condor, PBS, MPICH-G2, …
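To make "RSL is used to communicate requirements" concrete, here is a minimal sketch that renders a job description in the classic GT2-style RSL form, a conjunction of `(attribute=value)` relations prefixed by `&`, with `+` combining sub-requests for a co-allocator. The syntax details (always quoting strings, no escaping of embedded quotes) and all helper names are simplifying assumptions; consult the toolkit's RSL documentation before relying on them.

```python
def rsl_value(value) -> str:
    """Render one value: integers bare, everything else as a double-quoted literal."""
    if isinstance(value, int):
        return str(value)
    text = str(value)
    if '"' in text:
        raise ValueError("embedded quotes are not handled in this sketch")
    return f'"{text}"'

def to_rsl(attrs: dict) -> str:
    """Single request as a conjunction of relations: &(name=value)(name=value)..."""
    parts = []
    for name, value in attrs.items():
        if isinstance(value, (list, tuple)):
            rendered = " ".join(rsl_value(v) for v in value)  # value sequence
        else:
            rendered = rsl_value(value)
        parts.append(f"({name}={rendered})")
    return "&" + "".join(parts)

def to_multirequest(requests: list) -> str:
    """Several sub-requests for a co-allocator: +(&...)(&...)."""
    return "+" + "".join(f"({to_rsl(r)})" for r in requests)

job = {"executable": "/bin/hostname", "count": 4, "arguments": ["-f"]}
print(to_rsl(job))
# &(executable="/bin/hostname")(count=4)(arguments="-f")
print(to_multirequest([job, {"executable": "/bin/date", "count": 1}]))
```

In the layered architecture, a broker would produce requests like these and a co-allocator would split a multi-request into one simple request per GRAM.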
Resource Management Architecture
– Application -> Broker: RSL
– Broker: RSL specialization, using queries & info from the Information Service
– Broker -> Co-allocator: ground RSL
– Co-allocator -> GRAM (one per resource): simple ground RSL
– GRAM -> local resource managers, e.g., LSF, Condor, NQE
Data Access & Transfer
GridFTP: extended version of popular FTP
protocol for Grid data access and transfer
Secure, efficient, reliable, flexible, extensible,
parallel, concurrent, e.g.:
– Third-party data transfers, partial file transfers
– Parallelism, striping (e.g., on PVFS)
– Reliable, recoverable data transfers
Reference implementations
– Existing clients and servers: wuftpd, ncftp
– Flexible, extensible libraries in Globus Toolkit
The Grid Information Problem
Large numbers of distributed “sensors” with
different properties
Need for different “views” of this information,
depending on community membership, security
constraints, intended purpose, sensor type
The Globus Toolkit Solution: MDS-2
Registration & enquiry protocols, information
models, query languages
– Provides standard interfaces to sensors
– Supports different “directory” structures
supporting various discovery/access strategies
Globus Applications and
Deployments
Application projects include
– GriPhyN, PPDG, NEES, EU DataGrid, ESG,
Fusion Collaboratory, etc., etc.
Infrastructure deployments include
– DISCOM, NASA IPG, NSF TeraGrid, DOE
Science Grid, EU DataGrid, etc., etc.
– UK Grid Center, U.S. GRIDS Center
Technology projects include
– Data Grids, Access Grid, Portals, CORBA,
MPICH-G2, Condor-G, GrADS, etc., etc.
Globus Futures
Numerous large projects are pushing hard
on production deployment & application
– Much will be learned in next 2 years!
Active R&D program, focused for example on
– Security & policy for resource sharing
– Flexible, high-perf., scalable data sharing
– Integration with Web Services etc.
– Programming models and tools
Community code development producing a
true Open Grid Architecture
Important Grid Applications
Data-intensive
Distributed computing (metacomputing)
Collaborative
Remote access to, and computer
enhancement of, experimental facilities
Data Intensive Science: 2000-2015
Scientific discovery increasingly driven by IT
– Computationally intensive analyses
– Massive data collections
– Data distributed across networks of varying capability
– Geographically distributed collaboration
Dominant factor: data growth (1 Petabyte = 1000 TB)
– 2000 ~0.5 Petabyte
– 2005 ~10 Petabytes
– 2010 ~100 Petabytes
– 2015 ~1000 Petabytes?
How to collect, manage, access and interpret this quantity of data?
Drives demand for “Data Grids” to handle
additional dimension of data access & movement
Data Grid Projects
Particle Physics Data Grid (US, DOE)
– Data Grid applications for HENP expts.
GriPhyN (US, NSF)
– Petascale Virtual-Data Grids
iVDGL (US, NSF)
– Global Grid lab
TeraGrid (US, NSF)
– Dist. supercomp. resources (13 TFlops)
European Data Grid (EU, EC)
– Data Grid technologies, EU deployment
CrossGrid (EU, EC)
– Data Grid technologies, EU emphasis
DataTAG (EU, EC)
– Transatlantic network, Grid applications
Japanese Grid Projects (APGrid) (Japan)
– Grid deployment throughout Japan
Common threads: collaborations of application scientists & computer scientists; infrastructure devel. & deployment; Globus based
Biomedical Informatics Research Network (BIRN)
Evolving reference set of
brains provides essential
data for developing
therapies for neurological
disorders (multiple sclerosis,
Alzheimer’s, etc.).
Today
– One lab, small patient base
– 4 TB collection
Tomorrow
– 10s of collaborating labs
– Larger population sample
– 400 TB data collection: more
brains, higher resolution
– Multiple scale data integration
and analysis
Digital Radiology
(Hollebeek, U. Pennsylvania)
[Figure: mammograms, X-rays, MRI, CAT scans, endoscopies, ...]
Hospital digital data
– Very large data sources: great clinical value to
digital storage and manipulation and significant
cost savings
– 7 Terabytes per hospital per year
– Dominated by digital images
Why mammography
– Clinical need for film recall & computer analysis
– Large volume (4,000 GB/year; 57% of total)
– Storage and records standards exist
– Great clinical value
Earth System Grid
(ANL, LBNL, LLNL, NCAR, ISI, ORNL)
Enable a distributed community [of
thousands] to perform computationally
intensive analyses on large climate datasets
Via
– Creation of Data Grid supporting secure, high-
performance remote access
– “Smart data servers” supporting reduction and
analyses
– Integration with environmental data analysis
systems, protocols, and thin clients
www.earthsystemgrid.org (soon)
Earth System Grid Architecture
– Application -> Metadata Catalog: attribute specification, resolved to a logical collection and logical file name
– Replica Catalog: multiple locations for the selected logical file
– Replica Selection: chooses a replica using performance information & predictions (MDS, NWS)
– GridFTP commands: move the data from the selected replica location
Replica Locations 1 to 3 hold disk caches, a disk array, and a tape library
Data Grid Toolkit Architecture
Collective Data Management Service
Collective File Movement
(Collection mgmt., priority, fault recovery, replication, resource selection)
Data Transfer Service
End-to-End File Transfer
(Link optimization, performance guarantees, admission control)
Data Movement Service
Optimized Endpoint Management
(Bulk parallel transfer, rate-limited transfer, disk/network scheduling)
A Universal
Access/Transport Protocol
Suite of communication libraries and related tools
that support
– GSI security
– Third-party transfers
– Parameter set/negotiate
– Partial file access
– Reliability/restart
– Logging/audit trail [later]
– Integrated instrumentation
– Parallel transfers
– Striping (cf DPSS)
– Policy-based access control
– Server-side computation
All based on a standard, widely deployed protocol
And the Universal Protocol is …
GridFTP
Why FTP?
– Ubiquity enables interoperation with many
commodity tools
– Already supports many desired features,
easily extended to support others
We use the term GridFTP to refer to
– Transfer protocol which meets requirements
– Family of tools which implement the protocol
Note GridFTP > FTP
Note that despite name, GridFTP is not
restricted to file transfer!
GridFTP: Basic Approach
FTP is defined by several IETF RFCs
Start with most commonly used subset
– Standard FTP: get/put etc., 3rd-party transfer
Implement RFCed but often unused features
– GSS binding, extended directory listing,
simple restart
Extend in various ways, while preserving
interoperability with existing servers
– Stripe/parallel data channels, partial file,
automatic & manual TCP buffer setting,
progress and extended restart
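The extensions above (parallel data channels, partial file transfer, extended restart) share one idea: a byte range is cut into blocks that can travel independently and be re-sent selectively. A minimal sketch of that bookkeeping only; it is not GridFTP's wire format or API, and `plan_blocks` and `remaining` are hypothetical helpers.

```python
def plan_blocks(offset: int, length: int, block_size: int, streams: int):
    """Split a (partial) file range into blocks dealt round-robin across streams."""
    plan = {s: [] for s in range(streams)}
    pos, i, end = offset, 0, offset + length
    while pos < end:
        size = min(block_size, end - pos)
        plan[i % streams].append((pos, size))  # (start offset, byte count)
        pos += size
        i += 1
    return plan

def remaining(offset: int, length: int, done):
    """Byte ranges still missing after a failure, given completed (start, size) blocks."""
    gaps, pos = [], offset
    for start, size in sorted(done):
        if start > pos:
            gaps.append((pos, start - pos))
        pos = max(pos, start + size)
    if pos < offset + length:
        gaps.append((pos, offset + length - pos))
    return gaps

# Partial transfer of 10,000 bytes starting at offset 1000, over 2 parallel streams.
plan = plan_blocks(offset=1000, length=10_000, block_size=4096, streams=2)
print(plan)  # {0: [(1000, 4096), (9192, 1808)], 1: [(5096, 4096)]}

# Suppose stream 1 failed: only its block needs to be re-sent on restart.
print(remaining(1000, 10_000, plan[0]))  # [(5096, 4096)]
```

Because blocks carry their own offsets, they can arrive out of order, which is what lets parallel and striped transfers restart without resending completed data.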
The GridFTP Family of Tools
Patches to existing FTP code
– GSI-enabled versions of existing FTP client
and server, for high-quality production code
Custom-developed libraries
– Implement full GridFTP protocol, targeting
custom use, high-performance
Custom-developed tools
– E.g., high-performance striped FTP server
High-Performance Data Transfer
[Figure: two endpoints linked by a GridFTP control channel and data channels (TCP, BTP…); each endpoint stacks bulk transfer and TCP transfer protocols under rate limiting interfaces, with GRAM and GRIP providing resource management (disk, NIC) via scheduling modules and an enquiry interface]
GridFTP for Efficient WAN Transfer
Transfer Tb+ datasets
– Highly-secure authentication
– Parallel transfer for speed
Parallel Transfer
– Fully utilizes bandwidth of network interface on single nodes.
[Figure: LLNL->Chicago transfer (slow site network interfaces), GridFTP (globus-url-copy) between parallel filesystems: bandwidth (Mb/s) vs. # of parallel streams]
Striped Transfer
– Fully utilizes bandwidth of Gb+ WAN using multiple nodes.
FUTURE: Integrate striped GridFTP with parallel storage systems, e.g., HPSS
GridFTP for User-Friendly
Visualization Setup
High-res visualization is too large
for display on a single system
– Needs to be tiled, 24bit->16bit
depth
– Needs to be staged to display
units
GridFTP/ActiveMural integration
application performs tiling, data
reduction, and staging in a single
operation
– PVFS/MPI-IO on server
– MPI process group transforms
data as needed before transfer
– Performance is currently bounded
by 100Mb/s NICs on display
nodes
Distributed Computing+Visualization
– Job Submission: simulation code submitted to remote center for execution on 1000s of nodes
– Remote Center: generates Tb+ datasets from simulation code
– WAN Transfer: FLASH data transferred to ANL for visualization; GridFTP parallelism utilizes high bandwidth (capable of utilizing >Gb/s WAN links)
– Chiba City: visualization code constructs and stores high-resolution visualization frames for display on many devices
– LAN/WAN Transfer: user-friendly striped GridFTP application tiles the frames and stages tiles onto display nodes
– ActiveMural Display: displays very high resolution large-screen dataset animations
FUTURE (1-5 yrs)
• 10s Gb/s LANs, WANs
• End-to-end QoS
• Automated replica management
• Server-side data reduction & analysis
• Interactive portals
SC’2001 Experiment:
Simulation of HEP Tier 1 Site
Tiered (Hierarchical) Site Structure
– All data generated at lower tiers must be
forwarded to the higher tiers
– Tier 1 sites may have many sites transmitting to
them simultaneously and will need to sink a
substantial amount of bandwidth
– We demonstrated the ability of GridFTP to support
this at SC 2001 in the Bandwidth Challenge
– 16 Sites, with 27 Hosts, pushed a peak of 2.8 Gb/s
to the showfloor in Denver with a sustained
bandwidth of nearly 2 Gb/s
Visualization of Network Traffic
During the Bandwidth Challenge
The Replica
Management Problem
Maintain a mapping between logical names
for files and collections and one or more
physical locations
Important for many applications
Example: CERN high-level trigger data
– Multiple petabytes of data per year
– Copy of everything at CERN (Tier 0)
– Subsets at national centers (Tier 1)
– Smaller regional centers (Tier 2)
– Individual researchers will have copies
Our Approach to Replica
Management
Identify replica cataloging and reliable
replication as two fundamental services
– Layer on other Grid services: GSI, transport,
information service
– Use as a building block for other tools
Advantage
– These services can be used in a wide variety
of situations
Replica Catalog Structure:
A Climate Modeling Example
Replica Catalog
– Logical Collection: C02 measurements 1998
> Filenames: Jan 1998, Feb 1998, …
> Logical Files: Jan 1998, Feb 1998, each with the collection as its parent and attributes such as Size: 1468762
> Location jupiter.isi.edu: Filenames Mar 1998, Jun 1998, Oct 1998; Protocol: gsiftp; UrlConstructor: gsiftp://jupiter.isi.edu/nfs/v6/climate
> Location sprite.llnl.gov: Filenames Jan 1998 … Dec 1998; Protocol: ftp; UrlConstructor: ftp://sprite.llnl.gov/pub/pcmdi
– Logical Collection: C02 measurements 1999
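The catalog structure above maps a logical file to physical URLs by way of each location's URL constructor. A minimal sketch of that lookup using the slide's example, assuming the physical URL is simply the constructor plus the URL-encoded logical file name (the real mapping may store explicit physical names); the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import quote

@dataclass
class Location:
    host: str
    protocol: str
    url_constructor: str                         # URL prefix for files held here
    filenames: set = field(default_factory=set)

@dataclass
class LogicalCollection:
    name: str
    locations: list = field(default_factory=list)

    def physical_urls(self, logical_file: str) -> list:
        """Every physical URL for one logical file, one per location that holds it."""
        return [f"{loc.url_constructor.rstrip('/')}/{quote(logical_file)}"
                for loc in self.locations if logical_file in loc.filenames]

co2_1998 = LogicalCollection("C02 measurements 1998", [
    Location("jupiter.isi.edu", "gsiftp", "gsiftp://jupiter.isi.edu/nfs/v6/climate",
             {"Mar 1998", "Jun 1998", "Oct 1998"}),
    # The slide elides most of sprite's list; only its first and last entries are used.
    Location("sprite.llnl.gov", "ftp", "ftp://sprite.llnl.gov/pub/pcmdi",
             {"Jan 1998", "Dec 1998"}),
])
print(co2_1998.physical_urls("Mar 1998"))
# ['gsiftp://jupiter.isi.edu/nfs/v6/climate/Mar%201998']
```

Keeping the protocol and URL prefix per location, rather than per file, is what lets a whole site's copies move or change protocol with a single catalog update.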
Giggle: A Scalable Replica
Location Service
Local replica catalogs maintain definitive
information about replicas
Publish (perhaps approximate) information
using soft state techniques
Variety of indexing strategies possible
[Figure: two plots vs. number of LFNs ('000): time for soft-state update (sec) with 1 LRC and 2 LRCs; and time (ms) for Create (LRC), Add (LRC), Delete (LRC), Query (LRC), and Query (RLI)]
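The split between definitive local catalogs (LRCs) and approximate, soft-state indexes (RLIs) can be sketched in a few lines. This is a minimal model under stated assumptions: each LRC periodically republishes its full list of logical file names (LFNs), and an RLI claim simply expires if not refreshed within a hypothetical `ttl`. Site names, URLs and the explicit `now` clock are illustrative.

```python
class LocalReplicaCatalog:
    """LRC: the authoritative LFN -> physical file name mappings at one site."""

    def __init__(self, name):
        self.name = name
        self.mappings = {}  # lfn -> set of physical file names

    def add(self, lfn, pfn):
        self.mappings.setdefault(lfn, set()).add(pfn)

    def delete(self, lfn):
        self.mappings.pop(lfn, None)


class ReplicaLocationIndex:
    """RLI: which LRCs claim each LFN; claims lapse unless refreshed (soft state)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.claims = {}  # lfn -> {lrc name: expiry time}

    def soft_state_update(self, lrc, now):
        # The LRC republishes every LFN it currently holds; each refresh extends expiry.
        for lfn in lrc.mappings:
            self.claims.setdefault(lfn, {})[lrc.name] = now + self.ttl

    def query(self, lfn, now):
        # Possibly stale answer; the caller confirms with the LRCs themselves.
        return sorted(n for n, exp in self.claims.get(lfn, {}).items() if exp > now)


lrc = LocalReplicaCatalog("lrc-a")                     # hypothetical site name
lrc.add("lfn-001", "gsiftp://host-a.example/data/f1")  # hypothetical URL
rli = ReplicaLocationIndex(ttl=60)
rli.soft_state_update(lrc, now=0)
print(rli.query("lfn-001", now=30))  # ['lrc-a']
lrc.delete("lfn-001")
rli.soft_state_update(lrc, now=50)   # lfn-001 is no longer republished
print(rli.query("lfn-001", now=55))  # still ['lrc-a']: the old claim has not lapsed
print(rli.query("lfn-001", now=61))  # []
```

The stale answer at time 55 is the price of soft state: the index needs no explicit delete messages, and it converges once the claim expires.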
GriPhyN = App. Science + CS + Grids
GriPhyN = Grid Physics Network
– US-CMS High Energy Physics
– US-ATLAS High Energy Physics
– LIGO/LSC Gravity wave research
– SDSS Sloan Digital Sky Survey
– Strong partnership with computer scientists
Design and implement production-scale grids
– Develop common infrastructure, tools and services
– Integration into the 4 experiments
– Application to other sciences via “Virtual Data Toolkit”
Multi-year project
– R&D for grid architecture (funded at $11.9M + $1.6M)
– Integrate Grid infrastructure into experiments through VDT
GriPhyN Institutions
– U Florida
– U Chicago
– Boston U
– Caltech
– U Wisconsin, Madison
– USC/ISI
– Harvard
– Indiana
– Johns Hopkins
– Northwestern
– Stanford
– U Illinois at Chicago
– U Penn
– U Texas, Brownsville
– U Wisconsin, Milwaukee
– UC Berkeley
– UC San Diego
– San Diego Supercomputer Center
– Lawrence Berkeley Lab
– Argonne
– Fermilab
– Brookhaven
GriPhyN: PetaScale Virtual Data Grids
Users: Production Team, Individual Investigator, Workgroups
~1 Petaop/s, ~100 Petabytes
– Interactive User Tools
– Virtual Data Tools; Request Planning & Scheduling Tools; Request Execution & Management Tools
– Resource Management Services; Security and Policy Services; Other Grid Services
– Transforms
– Raw data source; distributed resources (code, storage, CPUs, networks)
foster@mcs.anl.gov ARGONNE CHICAGO
89
GriPhyN/PPDG
Data Grid Architecture
[Diagram; legend: highlighted components mark where an initial solution is operational.
– Application → (DAG) → Planner → (DAG) → Executor (DAGMAN, Kangaroo)
– Catalog Services (MCAT; GriPhyN catalogs), Info Services (MDS), Monitoring (MDS), Repl. Mgmt. (GDMP), Policy/Security (GSI, CAS)
– Reliable Transfer Service; Globus
– Compute Resource (GRAM); Storage Resource (GridFTP; GRAM; SRM)]
Ewa Deelman, Mike Wilde
90
GriPhyN Research Agenda
Virtual Data technologies
– Derived data, calculable via algorithm
– Instantiated 0, 1, or many times (e.g., caches)
– "Fetch value" vs. "execute algorithm"
– Potentially complex (versions, cost calculation, etc.)
E.g., LIGO: "Get gravitational strain for 2 minutes
around 200 gamma-ray bursts over last year"
For each requested data value, need to
– Locate item materialization, location, and algorithm
– Determine costs of fetching vs. calculating
– Plan data movements, computations to obtain results
– Execute the plan
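The per-item decision above (locate, determine costs, plan, execute) can be sketched as a simple choice of the cheapest option. Everything here is a hypothetical simplification:

- Transfer time is size / bandwidth.
- Compute time is staging the inputs plus CPU-seconds divided by available CPUs.
- The names `Replica`, `Derivation`, and `plan`, and the numbers, are illustrative.

Real planners would also weigh the local and global policies mentioned on the next slide.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    site: str
    size_mb: float
    bandwidth_mb_s: float       # estimated bandwidth from that site to the requester

@dataclass
class Derivation:
    cpu_seconds: float          # cost of re-running the transformation
    cpus_available: int
    input_fetch_seconds: float  # time to stage the derivation's inputs

def plan(replicas, derivation):
    """Return ("fetch", site, seconds) or ("compute", None, seconds), whichever is cheaper."""
    options = [("fetch", r.site, r.size_mb / r.bandwidth_mb_s) for r in replicas]
    if derivation is not None:
        compute = (derivation.input_fetch_seconds
                   + derivation.cpu_seconds / derivation.cpus_available)
        options.append(("compute", None, compute))
    if not options:
        raise LookupError("item is neither materialized nor derivable")
    return min(options, key=lambda option: option[2])

replicas = [Replica("archive", 2000, 2.5), Replica("regional-cache", 2000, 40)]
deriv = Derivation(cpu_seconds=36000, cpus_available=100, input_fetch_seconds=30)
print(plan(replicas, deriv))   # ('fetch', 'regional-cache', 50.0)
print(plan([], deriv))         # ('compute', None, 390.0)
```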
91
Virtual Data
in Action
Data request may
– Compute locally
– Compute remotely
– Access local data
– Access remote data
Scheduling based on
– Local policies
– Global policies
– Cost
[Diagram: a "fetch item" request resolved through local facilities and caches, regional facilities and caches, and major facilities and archives.]
92
GriPhyN Research Agenda (cont.)
Execution management
– Co-allocation (CPU, storage, network transfers)
– Fault tolerance, error reporting
– Interaction, feedback to planning
Performance analysis (with PPDG)
– Instrumentation, measurement of all components
– Understand and optimize grid performance
Virtual Data Toolkit (VDT)
– VDT = virtual data services + virtual data tools
– One of the primary deliverables of R&D effort
– Technology transfer to other scientific domains
93
Programs as Community Resources:
Data Derivation and Provenance
Most scientific data are not simple
"measurements"; essentially all are:
– Computationally corrected/reconstructed
– And/or produced by numerical simulation
And thus, as data and computers become
ever larger and more expensive:
– Programs are significant community resources
– So are the executions of those programs
94
[Diagram: Data, Transformation, and Derivation, linked by created-by, consumed-by/generated-by, and execution-of relations (a Derivation is an execution of a Transformation), surrounded by four user needs:]
– "I’ve come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes."
– "I’ve detected a calibration error in an instrument and want to know which derived data to recompute."
– "I want to search an astronomical database for galaxies with certain characteristics. If a program that performs this analysis exists, I won’t have to write one from scratch."
– "I want to apply an astronomical analysis program to millions of objects. If the results already exist, I’ll save weeks of computation."
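The calibration-error scenario is a graph query. Start from the data produced with the faulty calibration and follow consumed-by and generated-by links to collect every downstream derived product. The sketch below is a minimal version that assumes a hypothetical in-memory catalog in which each derivation records its input and output dataset names. It is not the Chimera schema or VDL, and the dataset names are invented.

```python
from collections import deque

# Hypothetical derivation records: name -> (inputs, outputs)
derivations = {
    "calibrate": (["raw.frames", "calib.v1"], ["calibrated.frames"]),
    "extract":   (["calibrated.frames"], ["objects.cat"]),
    "cluster":   (["objects.cat"], ["clusters.cat"]),
    "thumbnail": (["raw.frames"], ["preview.png"]),
}

def consumers(dataset):
    """Derivations that consumed `dataset` as an input."""
    return [name for name, (inputs, _) in derivations.items() if dataset in inputs]

def affected_by(bad_dataset):
    """All derived datasets downstream of `bad_dataset` (breadth-first)."""
    stale, queue = set(), deque([bad_dataset])
    while queue:
        for deriv in consumers(queue.popleft()):
            for out in derivations[deriv][1]:
                if out not in stale:
                    stale.add(out)
                    queue.append(out)
    return stale

print(sorted(affected_by("calib.v1")))
# ['calibrated.frames', 'clusters.cat', 'objects.cat']
```

The same records answer the first scenario in the reverse direction: following generated-by links upstream from a dataset shows which transformations, and therefore which corrections, produced it.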
95
The Chimera Virtual Data System
(GriPhyN Project)
Virtual data catalog
– Transformations, derivations, data
Virtual data language
– Data definition + query
Applications include browsers and data analysis applications
[Diagram: applications use the Virtual Data Language (definition and query). The VDL interpreter manipulates derivations and transformations and produces task graphs (compute and data movement tasks, with dependencies) for Data Grid resources (distributed execution and data management). The Virtual Data Catalog (implements the Chimera Virtual Data Schema) is accessed via SQL.]
96
SDSS Galaxy Cluster Finding
97
Cluster-finding Data Pipeline
[Diagram: five-stage pipeline starting from tsObj files: field (1), brg (2), core (3), cluster (4), catalog (5). Four tsObj → field → brg chains feed two core steps, which feed one cluster step and the final catalog.]
98
Virtual Data in CMS
Virtual Data Long Term Vision of CMS:
CMS Note 2001/047, GRIPHYN 2001-16
99
Early GriPhyN Challenge Problem:
CMS Data Reconstruction
[Diagram: a master Condor job running on a Caltech workstation coordinates work across the Wisconsin (WI) Condor pool, an NCSA Linux cluster, and NCSA UniTree (a GridFTP-enabled FTP server):
2) Launch secondary job on WI pool; input files via Globus GASS
3) 100 Monte Carlo jobs on Wisconsin Condor pool
4) 100 data files transferred via GridFTP, ~1 GB each
5) Secondary reports complete to master
6) Master starts reconstruction jobs via Globus jobmanager on cluster
7) GridFTP fetches data from UniTree
8) Processed Objectivity database stored to UniTree
9) Reconstruction job reports complete to master]
Scott Koranda, Miron Livny, others
100
Trace of a Condor-G Physics Run
[Figure: activity over clock time during the run (y axis 0 to 120), with series for Pre / Post, Simulation Jobs (UW Condor), ooDigis at NCSA, and ooHits at NCSA; an annotation marks a "Delay due to script error".]
102
Distributed Computing
Aggregate computing resources & codes
– Multidisciplinary simulation
– Metacomputing/distributed simulation
– High-throughput/parameter studies
Challenges
– Heterogeneous compute & network
capabilities, latencies, dynamic behaviors
Example tools
– MPICH-G2: Grid-aware MPI
– Condor-G, Nimrod-G: parameter studies
103
Multidisciplinary Simulations:
Aviation Safety
Wing Models
•Lift capabilities
•Drag capabilities
•Responsiveness
Stabilizer Models
•Deflection capabilities
•Responsiveness
Airframe Models
Human Models
•Crew capabilities: accuracy, perception, stamina, reaction times, SOPs
Engine Models
•Thrust performance
•Reverse thrust performance
•Responsiveness
•Fuel consumption
Landing Gear Models
•Braking performance
•Steering capabilities
•Traction
•Dampening capabilities
Whole system simulations are produced by coupling all of the sub-system simulations
104
MPICH-G2: A Grid-Enabled MPI
A complete implementation of the Message
Passing Interface (MPI) for heterogeneous,
wide area environments
– Based on the Argonne MPICH implementation
of MPI (Gropp and Lusk)
Requires services for authentication, resource
allocation, executable staging, output, etc.
Programs run in wide area without change
– Modulo accommodating heterogeneous
communication performance
See also: MetaMPI, PACX, STAMPI, MAGPIE
www.globus.org/mpi
105
Grid-based Computation: Challenges
Locate "suitable" computers
Authenticate with appropriate sites
Allocate resources on those computers
Initiate computation on those computers
Configure those computations
Select "appropriate" communication methods
Compute with "suitable" algorithms
Access data files, return output
Respond "appropriately" to resource changes
106
MPICH-G2 Use of Grid Services
% grid-proxy-init
% mpirun -np 256 myprog
[Diagram: mpirun locates hosts via MDS and generates a resource specification. globusrun submits multiple jobs and stages executables via GASS, and DUROC coordinates startup. At each site, GRAM authenticates, initiates the job, monitors/controls it, and detects termination, using the local scheduler (fork, LSF, LoadLeveler). Processes (P1, P2) communicate via vendor MPI and TCP/IP (globus-io).]
107
Cactus
(Allen, Dramlitsch, Seidel, Shalf, Radke)
Modular, portable framework for
parallel, multidimensional simulations
Construct codes by linking
– Small core ("flesh"): mgmt services
– Selected modules ("thorns"): numerical methods, grids & domain decomps, visualization and steering, etc.
Custom linking/configuration tools
Developed for astrophysics, but not
astrophysics-specific
www.cactuscode.org
[Diagram: thorns plugged into the Cactus "flesh".]
108
Cactus: An Application
Framework for Dynamic Grid Computing
Cactus thorns for active management of
application behavior and resource use
Heterogeneous resources, e.g.:
– Irregular decompositions
– Variable halo for managing message size
– Msg compression (comp/comm tradeoff)
– Comms scheduling for comp/comm overlap
Dynamic resource behaviors/demands, e.g.:
– Perf monitoring, contract violation detection
– Dynamic resource discovery & migration
– User notification and steering
109
Cactus Example: Gig-E
[Diagram: SDSC IBM SP (1024 procs; a 5x12x17 = 1020 processor decomposition) and NCSA Origin Array (256+128+128 procs; 5x12x(4+2+2) = 480), connected via Gig-E (100MB/sec) and an OC-12 line (but only 2.5MB/sec).]
Solved Einstein's equations (EEs) for gravitational waves (real code)
– Tightly coupled: communication required through derivatives
– Must communicate 30MB/step between machines
– Time step takes 1.6 sec
Used 10 ghost zones along direction of machines: communicate every 10 steps
Compression/decompression on all data passed in this direction
Achieved 70-80% scaling, ~200GF (only 14% scaling
without tricks)
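A one-dimensional explicit stencil shows why wider ghost zones reduce how often machines must communicate. The sketch below is a hypothetical illustration of the technique, not Cactus code. Two subdomains each keep `halo` ghost cells copied from their neighbour. After one exchange, each side can take `halo` local steps, because the stale region creeps inward by one cell per step and never reaches the owned cells. The result matches a single-domain computation exactly.

```python
def step(u, a):
    """One explicit diffusion step; the first and last cells are held fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + a * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new

def run_single(u, a, steps):
    """Reference: the whole domain on one machine."""
    for _ in range(steps):
        u = step(u, a)
    return u

def run_split(u, a, steps, halo):
    """Two subdomains split at the midpoint, exchanging `halo` ghost cells.

    Requires 1 <= halo <= len(u) // 2. Returns (result, number_of_exchanges).
    """
    m = len(u) // 2
    exchanges = done = 0
    while done < steps:
        k = min(halo, steps - done)
        # Exchange: each side copies `halo` cells owned by its neighbour.
        left, right = u[:m + halo], u[m - halo:]
        exchanges += 1
        # Local steps with no communication; ghost cells go stale from the
        # outer edge inward, one cell per step.
        for _ in range(k):
            left, right = step(left, a), step(right, a)
        # Keep only owned cells, which are still exact after k <= halo steps.
        u = left[:m] + right[halo:]
        done += k
    return u, exchanges

u0 = [0.0] * 20
u0[5], u0[14] = 1.0, 2.0
reference = run_single(u0, 0.25, 30)
for halo in (1, 10):
    result, exchanges = run_split(u0, 0.25, 30, halo)
    print(halo, exchanges, result == reference)   # 1 30 True, then 10 3 True
```

The trade-off matches the slide. Ten ghost zones cut the number of exchanges tenfold. Each exchange carries more cells, and the ghost cells are computed redundantly, so the gain comes mainly from fewer high-latency wide-area messages.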
110
Cactus Example (2):
Migration in Action
[Figure: iterations/second (0 to 1.4) versus clock time. Phases, in order: running at UC; load applied; 3 successive contract violations; resource discovery & migration (migration time not to scale); running at UIUC.]
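The migration trigger in the figure (migrate after 3 successive contract violations) amounts to a small state machine. The following sketch makes three assumptions: the contract is a minimum iterations-per-second rate, samples arrive periodically, and `violations_needed=3` mirrors the figure. The class and its names are hypothetical and are not Cactus thorn APIs.

```python
class PerformanceContract:
    """Flags migration after N successive samples below a minimum rate."""

    def __init__(self, min_rate, violations_needed=3):
        self.min_rate = min_rate
        self.violations_needed = violations_needed
        self.successive = 0

    def observe(self, iterations_per_second):
        """Record one sample; return True when migration should be triggered."""
        if iterations_per_second < self.min_rate:
            self.successive += 1
        else:
            self.successive = 0      # a good sample resets the count
        return self.successive >= self.violations_needed

contract = PerformanceContract(min_rate=0.8)
samples = [1.2, 1.1, 0.7, 1.0, 0.5, 0.4, 0.4, 0.3]
decisions = [contract.observe(s) for s in samples]
print(decisions.index(True))   # 6: the third violation in a row (0.5, 0.4, 0.4)
```

Requiring several successive violations, rather than reacting to one slow sample, avoids migrating on a transient dip. The isolated 0.7 sample above does not trigger anything.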
111
IPG Milestone 3: Large Computing Node
Completed 12/2000
OVERFLOW on IPG using Globus and MPICH-G2 for intra-problem, wide area communication
[Figure: high-lift subsonic wind tunnel model; sites Glenn (Cleveland, OH), Ames (Moffett Field, CA), and Langley (Hampton, VA); machines Sharp, Whitcomb, and Lomax; a 512 node SGI Origin 2000.]
Application POC: Mohammad J. Djomehri
Slide courtesy Bill Johnston, LBNL & NASA
112
High-Throughput Computing:
Condor
High-throughput computing platform for
mapping many tasks to idle computers
Three major components
– Scheduler manages pool(s) of [distributively
owned or dedicated] computers
– DAGMan manages user task pools
– Matchmaker schedules tasks to computers
Parameter studies, data analysis
Condor-G extensions support wide area
execution in Grid environment
www.cs.wisc.edu/condor
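The matchmaker's role can be illustrated with a much-simplified, two-sided match in the spirit of Condor ClassAds. Jobs and machines each advertise attributes plus a requirements predicate, and a match needs both predicates to hold. This Python sketch is hypothetical: real ClassAds use their own expression language, and Condor's negotiation also involves user priorities. Ranking is reduced here to a single function, and the machine names are invented.

```python
def matches(job, machine):
    """Both sides' requirements must accept the other party's attributes."""
    return job["requirements"](machine) and machine["requirements"](job)

def matchmake(jobs, machines):
    """Greedy matching: each job gets its highest-ranked idle machine that matches."""
    idle = list(machines)
    assignments = {}
    for job in jobs:
        candidates = [m for m in idle if matches(job, m)]
        if candidates:
            best = max(candidates, key=job["rank"])
            assignments[job["name"]] = best["name"]
            idle.remove(best)
    return assignments

machines = [
    {"name": "vulture", "arch": "INTEL", "memory": 512, "owner_active": False,
     "requirements": lambda job: job["owner"] != "guest"},   # owner policy
    {"name": "condor07", "arch": "INTEL", "memory": 2048, "owner_active": False,
     "requirements": lambda job: True},
    {"name": "desk12", "arch": "SUN4u", "memory": 1024, "owner_active": True,
     "requirements": lambda job: True},
]
jobs = [
    {"name": "sim-1", "owner": "alice",
     "requirements": lambda m: m["arch"] == "INTEL" and not m["owner_active"],
     "rank": lambda m: m["memory"]},
    {"name": "sim-2", "owner": "guest",
     "requirements": lambda m: m["arch"] == "INTEL",
     "rank": lambda m: m["memory"]},
]
print(matchmake(jobs, machines))   # {'sim-1': 'condor07'}
```

Job sim-2 stays unmatched. The only remaining INTEL machine accepts the job's requirements, but its owner's policy rejects guest jobs. This two-sided agreement is what lets distributively owned machines join a pool.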
113
Defining a DAG
A DAG is defined by a .dag file, listing each
of its nodes and their dependencies:
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
[Diagram: Job A at the top, Jobs B and C in the middle, Job D at the bottom.]
Each node runs the Condor job specified by
its accompanying Condor submit file
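DAGMan submits a node only after all of its parents have completed. The Python sketch below parses the diamond.dag text, reading only the Job and Parent/Child lines (case-insensitively). It then produces waves of nodes whose parents are done, using a level-by-level form of Kahn's topological sort. It illustrates the dependency semantics only. It is not DAGMan, and it ignores DAGMan's other keywords (e.g., for retries and pre/post scripts).

```python
from collections import defaultdict

DIAMOND = """\
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
"""

def parse_dag(text):
    """Return ({node: submit_file}, {node: set_of_parents})."""
    jobs, parents = {}, defaultdict(set)
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        keyword = words[0].upper()
        if keyword == "JOB":
            jobs[words[1]] = words[2]
        elif keyword == "PARENT":
            split = [w.upper() for w in words].index("CHILD")
            for child in words[split + 1:]:
                parents[child].update(words[1:split])
    return jobs, parents

def waves(jobs, parents):
    """Group nodes into waves; every node's parents appear in earlier waves."""
    remaining = {node: set(parents[node]) for node in jobs}
    order = []
    while remaining:
        ready = sorted(n for n, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cycle detected")
        order.append(ready)
        for n in ready:
            del remaining[n]
        for deps in remaining.values():
            deps.difference_update(ready)
    return order

jobs, parents = parse_dag(DIAMOND)
print(waves(jobs, parents))   # [['A'], ['B', 'C'], ['D']]
```

Jobs B and C land in the same wave, so they may run concurrently once A finishes, and D waits for both.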
114
High-Throughput Computing:
Mathematicians Solve NUG30
Looking for the solution to the NUG30 quadratic assignment problem
An informal collaboration of mathematicians and computer scientists
Condor-G delivered 3.46E8 CPU seconds in 7 days (peak 1009 processors) in U.S. and Italy (8 sites)
[Figure: the solution found, a permutation of 1 to 30: 14, 5, 28, 24, 1, 3, 16, 15, 10, 9, 21, 2, 4, 29, 25, 22, 13, 26, 17, 30, 6, 20, 19, 8, 18, 7, 27, 12, 11, 23]
MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
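The figures quoted above imply an average level of parallelism. The short worked calculation below uses only those numbers, taking 7 days as exactly 7 × 86,400 s and a year as about 3.156 × 10^7 s:

```latex
\begin{aligned}
T &= 7 \times 86\,400\ \text{s} = 604\,800\ \text{s} \\
\bar{P} &= \frac{3.46 \times 10^{8}\ \text{CPU-s}}{6.048 \times 10^{5}\ \text{s}} \approx 572\ \text{processors} \\
\frac{\bar{P}}{P_{\text{peak}}} &= \frac{572}{1009} \approx 0.57 \\
\frac{3.46 \times 10^{8}\ \text{CPU-s}}{3.156 \times 10^{7}\ \text{s/yr}} &\approx 11\ \text{CPU-years}
\end{aligned}
```

On average about 570 processors were busy, a little over half of the 1009-processor peak.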
115
Grid Application Development Software
(GrADS) Project
hipersoft.rice.edu/grads
117
Access Grid
High-end group work and
collaboration technology
Grid services being used for
discovery, configuration,
authentication
O(50) systems deployed worldwide
Basis for SC’2001 SC Global
event in November 2001
– www.scglobal.org
[Figure: an Access Grid node, with presenter mic, presenter camera, ambient mic (tabletop), and audience camera.]
www.accessgrid.org
119
Grid-Enabled Research Facilities:
Leverage Investments
Research instruments, satellites, particle
accelerators, MRI machines, etc., cost a
great deal
Data from those devices can be accessed
and analyzed by many more scientists
– Not just the team that gathered the data
More productive use of instruments
– Calibration, data sampling during a run, via
on-demand real-time processing
120
Telemicroscopy & Grid-Based Computing
[Diagram: imaging instruments (data acquisition), computational resources (processing, analysis), large-scale databases, and advanced visualization, linked by the network.]
121
APAN Trans-Pacific Telemicroscopy
Collaboration, Osaka-U, UCSD, ISI
(slide courtesy Mark Ellisman@UCSD)
[Diagram: the 1st and 2nd UHVEM (Osaka, Japan) linked to NCMIR (San Diego) via Tokyo XP, TransPAC, STAR TAP (Chicago), vBNS, SDSC (San Diego), and UCSD, with CRL/MPT and Globus.]
122
Network for Earthquake Engineering Simulation
NEESgrid: US national
infrastructure to couple
earthquake engineers
with experimental
facilities, databases,
computers, & each other
On-demand access to
experiments, data
streams, computing,
archives, collaboration
Argonne, Michigan, NCSA, UIUC, USC
www.neesgrid.org
123
"Experimental Facilities" Can
Include Field Sites
Remotely controlled sensor grids for field
studies, e.g., in seismology and biology
– Wireless/satellite communications
– Sensor net technology for low-cost
communications
125
Nature and Role of Grid
Infrastructure
Persistent Grid infrastructure is critical to
the success of many eScience projects
– High-speed networks, certainly
– Remotely accessible compute & storage
– Persistent, standard services: PKI,
directories, reservation, …
– Operational & support procedures
Many projects creating such infrastructures
– Production operation is the goal, but much
to learn about how to create & operate
126
A National Grid Infrastructure
[Map: a national Grid infrastructure, with many sites (marked "A") grouped around five regional centers.]
127
Example Grid Infrastructure
Projects
I-WAY (1995): 17 U.S. sites for one week
GUSTO (1998): 80 sites worldwide, experimental
NASA Information Power Grid (since 1999)
– Production Grid linking NASA laboratories
INFN Grid, EU DataGrid, iVDGL, … (2001+)
– Grids for data-intensive science
TeraGrid, DOE Science Grid (2002+)
– Production Grids link supercomputer centers
U.S. GRIDS Center
– Software packaging, deployment, support
128
The 13.6 TF TeraGrid:
Computing at 40 Gb/s
[Diagram: four sites, each with site resources and external network connections: Caltech (HPSS), Argonne (HPSS), SDSC (4.1 TF, 225 TB, HPSS), and NCSA/PACI (8 TF, 240 TB, UniTree).]
NCSA, SDSC, Caltech, Argonne
www.teragrid.org
129
TeraGrid (Details)
[Diagram: detailed TeraGrid configuration.
– Caltech: 0.5 TF, 0.4 TB memory, 86 TB disk
– Argonne: 1 TF, 0.25 TB memory, 25 TB disk
– SDSC: 256 nodes, 4.1 TF, 2 TB memory, 225 TB disk
– NCSA: 500 nodes, 8 TF, 4 TB memory, 240 TB disk
– New nodes are quad-processor McKinley servers (4 GF per processor, 8 or 12 GB memory per server), on Myrinet Clos spines with Fibre Channel storage
– Existing systems shown include Chiba City (574p IA-32), Blue Horizon (1176p IBM SP), a 1500p Origin, HP X-Class and V2500 systems, Sun Starcat and E10K, and HPSS and UniTree archives
– Sites connect through Juniper routers to the DTF core switch/routers in Chicago & LA, with external links to ESnet, Abilene, MREN, Calren, vBNS, NTON, HSCC, and Starlight]
130
Targeted StarLight
Optical Network Connections
[Map: planned optical connections to the StarLight hub in Chicago, reaching CERN, SURFnet, CA*net4 (Vancouver), Asia-Pacific, NTON (Seattle, Portland, San Francisco, Los Angeles), San Diego (SDSC) and NCSA/UIUC via the 40Gb DTF, the I-WIRE Chicago cross connect (ANL, UIC, Ill Inst of Tech, Univ of Chicago, NW Univ (Chicago)), U Wisconsin, PSC, NYC, IU, Indianapolis (Abilene NOC GigaPoP), St Louis, Atlanta, and AMPATH.]
www.startap.net
131
CA*net 4 Architecture
[Map: CA*net 4 nodes linked by ORAN DWDM and carrier DWDM, with CANARIE GigaPOPs. Nodes span Victoria, Vancouver, Calgary, Edmonton, Saskatoon, Regina, Winnipeg, Thunder Bay, Windsor, Toronto, Ottawa, Montreal, Quebec, Fredericton, Charlottetown, Halifax, and St. John’s, with connections to Seattle, Chicago, New York, and Boston. The legend distinguishes CA*net 4 nodes from possible future CA*net 4 nodes.]
132
APAN Network Topology 2001.9.3
[Map: APAN links among Japan, Korea, China, Hong Kong, Thailand, Vietnam, the Philippines, Malaysia, Sri Lanka, Singapore, Indonesia, and Australia, with connections to Europe and to STAR TAP (USA), including a 622Mbps x 2 link; the legend distinguishes current status from 2001 plans.]
133
iVDGL: A Global Grid Laboratory
“We propose to create, operate and evaluate, over a
sustained period of time, an international research
laboratory for data-intensive science.”
From NSF proposal, 2001
International Virtual-Data Grid Laboratory
– A global Grid laboratory (US, Europe, Asia, South
America, …)
– A place to conduct Data Grid tests "at scale"
– A mechanism to create common Grid infrastructure
– A laboratory for other disciplines to perform Data Grid
tests
– A focus of outreach efforts to small institutions
U.S. part funded by NSF (2001-2006)
– $13.7M (NSF) + $2M (matching)
134
Initial US-iVDGL Data Grid
[Map: initial US-iVDGL sites, including SKC, BU, Wisconsin, PSU, BNL, Fermilab, Indiana, JHU, Hampton, Caltech, UCSD, Florida, and Brownsville; legend: Tier1 (FNAL), proto-Tier2, and Tier3 university sites.]
Other sites to be added in 2002
135
iVDGL:
International Virtual Data Grid Laboratory
[Map legend: Tier0/1, Tier2, and Tier3 facilities; 10 Gbps, 2.5 Gbps, 622 Mbps, and other links.]
U.S. PIs: Avery, Foster, Gardner, Newman, Szalay www.ivdgl.org
136
iVDGL Architecture
(from proposal)
137
US iVDGL Interoperability
US-iVDGL-1 Milestone (August 02)
[Diagram: the iGOC coordinating ATLAS, CMS, LIGO, and SDSS/NVO sites in the US-iVDGL-1 grid, August 2002.]
138
Transatlantic Interoperability
iVDGL-2 Milestone (November 02)
[Diagram: the iVDGL-2 grid, November 2002. iGOC, Outreach, and CS Research are linked with DataTAG. ATLAS, CMS, LIGO, and SDSS/NVO sites include ANL, BNL, BU, CIT, FNAL, HU, IU, ISI, JHU, LBL, NU, OU, PSU, UC, UCB, UCSD, UF, UM, UTA, UTB, UW, and UWM, plus CERN, INFN, UK PPARC, and U of A.]
139
Another Example:
INFN Grid in Italy
20 sites, ~200 persons, ~90 FTEs, ~20 IT
Preliminary budget for 3 years: 9 M Euros
Activities organized around
– Software development with DataGrid, DataTAG, …
– Testbeds (financed by INFN) for DataGrid,
DataTAG, US-EU Intergrid
– Experiment applications
– Tier1..Tiern prototype infrastructure
Large scale testbeds provided by LHC
experiments, Virgo, …
140
U.S. GRIDS Center
NSF Middleware Infrastructure Program
GRIDS = Grid Research, Integration,
Deployment, & Support
NSF-funded center to provide
– State-of-the-art middleware infrastructure
to support national-scale collaborative
science and engineering
– Integration platform for experimental
middleware technologies
ISI, NCSA, SDSC, UC, UW
NMI software release one: May 2002
www.grids-center.org
142
Globus Toolkit: Evaluation (+)
Good technical solutions for key problems, e.g.
– Authentication and authorization
– Resource discovery and monitoring
– Reliable remote service invocation
– High-performance remote data access
This & good engineering is enabling progress
– Good quality reference implementation, multi-
language support, interfaces to many systems,
large user base, industrial support
– Growing community code base built on tools
143
Globus Toolkit: Evaluation (-)
Protocol deficiencies, e.g.
– Heterogeneous basis: HTTP, LDAP, FTP
– No standard means of invocation, notification,
error propagation, authorization, termination, …
Significant missing functionality, e.g.
– Databases, sensors, instruments, workflow, …
– Virtualization of end systems (hosting envs.)
Little work on total system properties, e.g.
– Dependability, end-to-end QoS, …
– Reasoning about system properties
144
Globus Toolkit Structure
[Diagram: GRAM (with job managers) for compute resources and GridFTP for data resources, each paired with MDS, plus "???" for other services or applications; all secured by GSI. Cross-cutting needs: service naming, soft state management, reliable invocation, notification.]
Lots of good mechanisms, but (with the exception of GSI) not that easily
incorporated into other systems
145
Open Grid Services Architecture
Service orientation to virtualize resources
Define fundamental Grid service behaviors
– Core set required, others optional
A unifying framework for interoperability &
establishment of total system properties
Integration with Web services and hosting
environment technologies
Leverage tremendous commercial base
Standard IDL accelerates community code
Delivery via open source Globus Toolkit 3.0
Leverage GT experience, code, mindshare
146
"Web Services"
Increasingly popular standards-based
framework for accessing network applications
– W3C standardization; Microsoft, IBM, Sun, others
WSDL: Web Services Description Language
– Interface Definition Language for Web services
SOAP: Simple Object Access Protocol
– XML-based RPC protocol; common WSDL target
WS-Inspection
– Conventions for locating service descriptions
UDDI: Universal Desc., Discovery, & Integration
– Directory for Web services
147
Web Services Example:
Database Service
WSDL definition for "DBaccess" portType
defines operations and bindings, e.g.:
– Query(QueryLanguage, Query, Result)
– SOAP protocol
Client C, Java, Python, etc., APIs can then
be generated
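To make the WSDL-to-SOAP mapping concrete, the Python sketch below builds the kind of SOAP 1.1 request a generated client stub might send for the Query operation. The service namespace, the element names, and the use of SQL as the query language are hypothetical. A real stub would follow whatever the DBaccess WSDL actually declares, and the Result would arrive in the response message.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
DB_NS = "urn:example:dbaccess"                           # hypothetical service namespace

def build_query_request(query_language, query):
    """Return a SOAP 1.1 envelope (as a string) invoking a Query operation."""
    ET.register_namespace("soapenv", SOAP_ENV)
    ET.register_namespace("db", DB_NS)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{DB_NS}}}Query")                 # the operation
    ET.SubElement(op, f"{{{DB_NS}}}QueryLanguage").text = query_language
    ET.SubElement(op, f"{{{DB_NS}}}Query").text = query           # the Query parameter
    return ET.tostring(envelope, encoding="unicode")

request = build_query_request("SQL", "SELECT name FROM proteins WHERE organism = 'E. coli'")
print(request)
```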
148
Transient Service Instances
"Web services" address discovery & invocation
of persistent services
– Interface to persistent state of entire enterprise
In Grids, must also support transient service
instances, created/destroyed dynamically
– Interfaces to the states of distributed activities
– E.g. workflow, video conf., dist. data analysis
Significant implications for how services are
managed, named, discovered, and used
– In fact, much of our work is concerned with the
management of service instances
149
The Grid Service =
Interfaces + Service Data
[Diagram: a Grid service implements the GridService interface plus other interfaces (e.g., notification, authorization, service creation, service registry, manageability, concurrency). It supports service data access, explicit destruction, and soft-state lifetime, with reliable invocation and authentication. It holds service data elements, and its implementation runs in a hosting environment/runtime ("C", J2EE, .NET, …).]
150
Open Grid Services Architecture:
Fundamental Structure
1) WSDL conventions and extensions for
describing and structuring services
– Useful independent of "Grid" computing
2) Standard WSDL interfaces & behaviors for
core service activities
– portTypes and operations => protocols
151
WSDL Conventions & Extensions
portType (standard WSDL)
– Define an interface: a set of related operations
serviceType (extensibility element)
– List of port types: enables aggregation
serviceImplementation (extensibility element)
– Represents actual code
service (standard WSDL)
– instanceOf extension: map descr.->instance
compatibilityAssertion (extensibility element)
– portType, serviceType, serviceImplementation
152
Structure of a Grid Service
[Diagram: two layers.
– Service Instantiation: service instances, each related by instanceOf to a serviceImplementation.
– Service Description: each serviceImplementation implements a serviceType, which aggregates PortTypes; compatibilityAssertions (cA) relate serviceImplementations, serviceTypes, and PortTypes.
service and PortType are standard WSDL elements.]
cA = compatibilityAssertion
153
Standard Interfaces & Behaviors:
Four Interrelated Concepts
Naming and bindings
– Every service instance has a unique name,
from which its supported bindings can be discovered
Information model
– Service data associated with Grid service
instances, operations for accessing this info
Lifecycle
– Service instances created by factories
– Destroyed explicitly or via soft state
Notification
– Interfaces for registering interest and
delivering notifications
154
OGSA Interfaces and Operations
Defined to Date
GridService (required)
– FindServiceData
– Destroy
– SetTerminationTime
NotificationSource
– SubscribeToNotificationTopic
– UnsubscribeToNotificationTopic
NotificationSink
– DeliverNotification
Factory
– CreateService
PrimaryKey
– FindByPrimaryKey
– DestroyByPrimaryKey
Registry
– RegisterService
– UnregisterService
HandleMap
– FindByHandle
Authentication, reliability are binding properties
Manageability, concurrency, etc., to be defined
155
Service Data
A Grid service instance maintains a set of
service data elements
– XML fragments encapsulated in standard
<name, type, TTL-info> containers
– Includes basic introspection information,
interface-specific data, and application data
FindServiceData operation (GridService
interface) queries this information
– Extensible query language support
See also notification interfaces
– Allows notification of service existence and
changes in service data
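A sketch of the service data idea follows. It assumes each element is a <name, type, TTL-info> container holding an XML fragment, and that a FindServiceData-style lookup by name skips expired elements. The Python class and method names are illustrative only. OGSA defines the operation in WSDL, and its query language is extensible rather than fixed to lookup by name.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class ServiceDataElement:
    name: str
    type: str
    expires_at: float      # TTL information, stored as an absolute time
    value: ET.Element      # the XML fragment itself

class GridServiceInstance:
    def __init__(self):
        self.service_data = {}

    def set_service_data(self, name, type_, xml_text, now, ttl):
        self.service_data[name] = ServiceDataElement(
            name, type_, now + ttl, ET.fromstring(xml_text))

    def find_service_data(self, name, now):
        """Query by name; return the XML fragment, or None if absent or stale."""
        sde = self.service_data.get(name)
        if sde is None or sde.expires_at <= now:
            return None
        return ET.tostring(sde.value, encoding="unicode")

db = GridServiceInstance()
db.set_service_data("currentLoad", "xsd:int", "<currentLoad>7</currentLoad>", now=0, ttl=60)
db.set_service_data("queryLanguages", "xsd:string",
                    "<queryLanguages><lang>SQL</lang></queryLanguages>", now=0, ttl=3600)
print(db.find_service_data("currentLoad", now=30))   # <currentLoad>7</currentLoad>
print(db.find_service_data("currentLoad", now=90))   # None (TTL expired)
```

A short TTL suits fast-changing data such as current load, and a long one suits near-static data such as the supported query languages.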
156
Grid Service Example:
Database Service
A DBaccess Grid service will support at
least two portTypes
– GridService
– DBaccess
Each has service data
– GridService: basic introspection
information, lifetime, …
– DBaccess: database type, query languages
supported, current load, …, …
[Diagram: a Grid service with a GridService portType (name, lifetime, etc.) and a DBaccess portType (DB info).]
159
Lifetime Management
GS instances created by factory or manually;
destroyed explicitly or via soft state
– Negotiation of initial lifetime with a factory
(=service supporting Factory interface)
GridService interface supports
– Destroy operation for explicit destruction
– SetTerminationTime operation for keepalive
Soft state lifetime management avoids
– Explicit client teardown of complex state
– Resource "leaks" in hosting environments
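Soft-state lifetime management can be sketched as a hosting environment that does three things:

- Tracks each instance's termination time.
- Lets clients extend that time (keepalive) or destroy the instance explicitly.
- Reaps anything whose time has passed.

The method names echo the Destroy and SetTerminationTime operations, but the class, its methods, and the explicit clock are assumptions for illustration.

```python
class HostingEnvironment:
    """Tracks Grid service instances and their soft-state termination times."""

    def __init__(self):
        self.termination_time = {}   # handle -> absolute time

    def create(self, handle, now, initial_lifetime):
        self.termination_time[handle] = now + initial_lifetime

    def set_termination_time(self, handle, new_time):
        """Keepalive: a client pushes the termination time into the future."""
        if handle not in self.termination_time:
            raise KeyError(f"{handle} no longer exists")
        self.termination_time[handle] = new_time

    def destroy(self, handle):
        """Explicit destruction."""
        self.termination_time.pop(handle, None)

    def reap(self, now):
        """Remove expired instances; clients need not tear anything down."""
        expired = [h for h, t in self.termination_time.items() if t <= now]
        for h in expired:
            del self.termination_time[h]
        return expired

env = HostingEnvironment()
env.create("miner", now=0, initial_lifetime=10)
env.create("database", now=0, initial_lifetime=1000)
env.set_termination_time("miner", 25)   # keepalive received before time 10
print(env.reap(now=30))                 # ['miner']: keepalives stopped
print(sorted(env.termination_time))     # ['database']
```

If a client crashes or simply forgets the instance, the state still disappears on schedule. This is the leak-avoidance property the slide describes.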
160
Factory
Factory interface’s CreateService operation
creates a new Grid service instance
– Reliable creation (once-and-only-once)
CreateService operation can be extended to
accept service-specific creation parameters
Returns a Grid Service Handle (GSH)
– A globally unique URL
– Uniquely identifies the instance for all time
– Based on name of a home handleMap service
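A Factory can guarantee once-and-only-once creation if the client attaches a request identifier and reuses it on retry. A retried CreateService then returns the same handle instead of creating a second instance. The identifier scheme, the handle format (a URL under a hypothetical home handleMap address), and the class names are assumptions for illustration, not the OGSA wire protocol.

```python
import uuid

class Factory:
    def __init__(self, handle_map_url):
        self.handle_map_url = handle_map_url.rstrip("/")
        self.created = {}     # client request id -> Grid Service Handle
        self.instances = {}   # GSH -> creation parameters

    def create_service(self, request_id, **params):
        """Return a GSH; a retried request with the same id gets the same GSH."""
        if request_id in self.created:
            return self.created[request_id]
        gsh = f"{self.handle_map_url}/{uuid.uuid4()}"   # globally unique name
        self.instances[gsh] = params
        self.created[request_id] = gsh
        return gsh

factory = Factory("http://handles.example.org/handleMap")
first = factory.create_service("req-17", kind="database", initial_lifetime=1000)
retry = factory.create_service("req-17", kind="database", initial_lifetime=1000)
print(first == retry, len(factory.instances))   # True 1
```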
161
Transient Database Services
[Diagram: a client asks a Factory Grid service (instance name, etc.; factory info) "What services can you create?" and then "Create a database service", yielding a DBaccess Grid service (name, lifetime, etc.; DB info). The client also asks a Registry Grid service (instance name, etc.; registry info) "What database services exist?"]
162
Example:
Data Mining for Bioinformatics
[Diagram: a user application; a community registry; a mining factory at a compute service provider; a database factory at a storage service provider; and a database service offering BioDB 1 … BioDB n. The user's goal: "I want to create a personal database containing data on e.coli metabolism".]
163
Example:
Data Mining for Bioinformatics
[Diagram: user application, community registry, mining factory (compute service provider), database factory (storage service provider), and BioDB 1 … BioDB n. The user asks the registry: "Find me a data mining service, and somewhere to store data".]
164
Example:
Data Mining for Bioinformatics
[Diagram: user application, community registry, mining factory (compute service provider), database factory (storage service provider), and BioDB 1 … BioDB n. The registry returns GSHs for the mining and database factories.]
165
Example:
Data Mining for Bioinformatics
[Diagram: user application, community registry, mining factory (compute service provider), database factory (storage service provider), and BioDB 1 … BioDB n. The user asks the mining factory to "Create a data mining service with initial lifetime 10" and the database factory to "Create a database with initial lifetime 1000".]
166
Example:
Data Mining for Bioinformatics
[Diagram: same participants and requests. The factories have now created a Miner at the compute service provider and a Database at the storage service provider.]
167
Example:
Data Mining for Bioinformatics
[Diagram: user application, Miner (compute service provider), Database (storage service provider), and BioDB 1 … BioDB n. The user application sends a query to the Miner, which queries the BioDB databases.]
168
Example:
Data Mining for Bioinformatics
[Diagram: user application, Miner, Database, and BioDB 1 … BioDB n. While the queries run, the user application sends keepalives to both the Miner and the Database.]
169
Example:
Data Mining for Bioinformatics
[Diagram: user application, Miner, Database, and BioDB 1 … BioDB n. Results flow from the Miner to the user application and to the Database; keepalives continue.]
170
Example:
Data Mining for Bioinformatics
[Diagram: user application, Miner, Database, and BioDB 1 … BioDB n. Keepalives to the Miner have stopped; the user application continues to send keepalives to the Database.]
171
Example:
Data Mining for Bioinformatics
[Diagram: user application, Database, and BioDB 1 … BioDB n. The Miner, no longer kept alive, has expired and is gone; the Database persists, still receiving keepalives.]
172
Notification Interfaces
NotificationSource for client subscription
– One or more notification generators
> Generates notification messages of a specific type
> Typed interest statements: e.g., filters, topics, …
> Supports messaging services, 3rd-party filter services, …
– Soft-state subscription to a generator
NotificationSink for asynchronous delivery
of notification messages
A wide variety of uses are possible
– E.g., dynamic discovery/registry services,
monitoring, application error notification, …
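A NotificationSource holds subscriptions as soft state: each one lapses unless the subscriber renews it, so a vanished client costs the source nothing beyond one missed renewal period. A minimal sketch, assuming interest statements are reduced to a single topic name and time is an integer logical clock; `NotificationSource`, `NotificationSink`, `subscribe`, `renew`, and `publish` are hypothetical names. Delivery here is a direct call, standing in for the asynchronous delivery the slide describes.

```python
from dataclasses import dataclass
from typing import Dict, List


class NotificationSink:
    """Collects delivered messages; a real sink would receive them asynchronously."""

    def __init__(self) -> None:
        self.received: List[str] = []

    def deliver(self, topic: str, message: str) -> None:
        self.received.append(f"{topic}: {message}")


@dataclass
class Subscription:
    sink: NotificationSink
    topic: str       # the interest statement, reduced here to one topic name
    expires_at: int  # soft state: lapses unless renewed before this logical time


class NotificationSource:
    """Holds soft-state subscriptions and pushes matching messages to sinks."""

    def __init__(self) -> None:
        self._subs: Dict[int, Subscription] = {}
        self._next_id = 0

    def subscribe(self, sink: NotificationSink, topic: str, now: int, lifetime: int) -> int:
        sub_id = self._next_id
        self._next_id += 1
        self._subs[sub_id] = Subscription(sink, topic, now + lifetime)
        return sub_id

    def renew(self, sub_id: int, now: int, lifetime: int) -> bool:
        sub = self._subs.get(sub_id)
        if sub is None or sub.expires_at < now:
            return False
        sub.expires_at = now + lifetime
        return True

    def publish(self, topic: str, message: str, now: int) -> int:
        # Forget lapsed subscriptions, then deliver to those whose topic matches.
        self._subs = {i: s for i, s in self._subs.items() if s.expires_at >= now}
        delivered = 0
        for sub in self._subs.values():
            if sub.topic == topic:
                sub.sink.deliver(topic, message)
                delivered += 1
        return delivered
```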
Notification Example
Notifications can be associated with any
(authorized) service data elements
[Figure, built up over four steps: a Grid Service acting as Notification Sink and the Grid Service DBaccess acting as Notification Source; both expose service data "Name, lifetime, etc." and "DB info", and DBaccess also exposes "Subscribers"]
– Step 1: the sink subscribes to the source: "Notify me of new data about membrane proteins"
– Step 2: keepalive messages maintain the subscription
– Step 3: the source delivers "New data" notifications to the sink
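Tying notifications to service data elements means a subscriber names an element (such as the DB info) and a filter (such as "new data about membrane proteins"), and the service notifies it whenever that element changes in a way the filter accepts. The sketch below is one hypothetical way to express this; the class name, the `notifiable` set standing in for authorization, and the record fields are all illustrative assumptions.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Callback = Callable[[str, Any], None]
Predicate = Callable[[Any], bool]


class GridServiceWithServiceData:
    """Hypothetical service whose named service data elements (SDEs) can be watched."""

    def __init__(self, service_data: Dict[str, Any], notifiable: Set[str]) -> None:
        self.service_data = dict(service_data)
        self.notifiable = set(notifiable)  # SDEs clients are authorized to watch
        self.subscribers: List[Tuple[str, Predicate, Callback]] = []

    def subscribe(self, element: str, predicate: Predicate, callback: Callback) -> None:
        if element not in self.notifiable:
            raise PermissionError(f"notifications not permitted on {element!r}")
        self.subscribers.append((element, predicate, callback))

    def update(self, element: str, value: Any) -> int:
        """Set an SDE and notify each subscriber whose filter accepts the new value."""
        self.service_data[element] = value
        notified = 0
        for watched, predicate, callback in self.subscribers:
            if watched == element and predicate(value):
                callback(element, value)
                notified += 1
        return notified
```

Here the filter runs at the source, so only matching updates cross the network; a third-party filter service, also mentioned in the slides, would move that predicate elsewhere.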
Open Grid Services Architecture:
Summary
Service orientation to virtualize resources
– Everything is a service
From Web services
– Standard interface definition mechanisms:
multiple protocol bindings, local/remote
transparency
From Grids
– Service semantics, reliability and security models
– Lifecycle management, discovery, other services
Multiple "hosting environments"
– C, J2EE, .NET, …
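The Web services half of the summary, one interface definition with multiple protocol bindings and local/remote transparency, can be illustrated with a toy. In this sketch, assuming JSON over an in-process "transport" as a stand-in for a real wire protocol, the same abstract interface is satisfied by a direct in-process object and by a message-passing proxy, and client code cannot tell them apart. All names (`ServiceDataQuery`, `RemoteProxy`, `make_endpoint`) are hypothetical.

```python
import json
from abc import ABC, abstractmethod
from typing import Callable, Dict


class ServiceDataQuery(ABC):
    """One abstract interface; callers do not care how it is bound."""

    @abstractmethod
    def find(self, name: str) -> str: ...


class LocalService(ServiceDataQuery):
    """In-process binding: a direct method call."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = data

    def find(self, name: str) -> str:
        return self._data.get(name, "")


def make_endpoint(impl: ServiceDataQuery) -> Callable[[bytes], bytes]:
    """Server side of a toy message binding: decode request, call impl, encode reply."""
    def handle(request: bytes) -> bytes:
        msg = json.loads(request)
        return json.dumps({"result": impl.find(msg["name"])}).encode()
    return handle


class RemoteProxy(ServiceDataQuery):
    """Message-based binding: same interface, requests serialized to JSON bytes."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self._transport = transport  # stands in for HTTP/SOAP or another wire protocol

    def find(self, name: str) -> str:
        reply = self._transport(json.dumps({"op": "find", "name": name}).encode())
        return json.loads(reply)["result"]


def client(svc: ServiceDataQuery) -> str:
    # Client code is identical for either binding.
    return svc.find("lifetime")
```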
Recap: The Grid Service
[Figure: anatomy of a Grid Service]
– Interfaces: GridService (service data access, explicit destruction, soft-state lifetime) plus … other interfaces … (notification, authorization, service creation, service registry, manageability, concurrency)
– Also shown: reliable invocation, authentication
– Service data elements exposed by the service
– Implementation, running in a hosting environment/runtime ("C", J2EE, .NET, …)
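The three behaviors attributed to the GridService interface (service data access, soft-state lifetime, explicit destruction) can be written down as an abstract interface. This is a hypothetical Python rendering; the method names are chosen to mirror the figure's labels and are not the actual WSDL operations. The rule that a service may grant an earlier termination time than requested is an assumption added for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class GridService(ABC):
    """Hypothetical interface for the GridService behaviors named in the figure."""

    @abstractmethod
    def find_service_data(self, name: str) -> Optional[Any]:
        """Service data access: return a named service data element, if present."""

    @abstractmethod
    def set_termination_time(self, requested: int) -> int:
        """Soft-state lifetime: request a termination time; return the time granted."""

    @abstractmethod
    def destroy(self) -> None:
        """Explicit destruction."""


class SimpleGridService(GridService):
    """In-memory implementation; the cap on termination time is an assumed local policy."""

    def __init__(self, name: str, termination_time: int, max_termination_time: int) -> None:
        self._service_data: Dict[str, Any] = {"name": name, "terminationTime": termination_time}
        self._max_termination_time = max_termination_time
        self.destroyed = False

    def _check_alive(self) -> None:
        if self.destroyed:
            raise RuntimeError("service has been destroyed")

    def find_service_data(self, name: str) -> Optional[Any]:
        self._check_alive()
        return self._service_data.get(name)

    def set_termination_time(self, requested: int) -> int:
        self._check_alive()
        # The service may grant less than requested (assumed policy, not from the slides).
        granted = min(requested, self._max_termination_time)
        self._service_data["terminationTime"] = granted
        return granted

    def destroy(self) -> None:
        self._check_alive()
        self.destroyed = True
        self._service_data.clear()
```

Note that the current termination time is itself exposed as a service data element, so a client can discover a service's lifetime through the same query path it uses for everything else.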
OGSA and the Globus Toolkit
Technically, OGSA enables
– Refactoring of protocols (GRAM, MDS-2, etc.),
while preserving all GT concepts/features!
– Integration with hosting environments:
simplifying components, distribution, etc.
– Greatly expanded standard service set
Pragmatically, we are proceeding as follows:
– Develop open source OGSA implementation
> Globus Toolkit 3.0; supports Globus Toolkit 2.0 APIs
– Partnerships for service development
– Also expect commercial value-adds
GT3: An Open Source OGSA-Compliant
Globus Toolkit
GT3 Core
– Implements Grid service interfaces & behaviors
– Reference implementation of evolving standard
– Java first, C soon, C#?
GT3 Base Services
– Evolution of current Globus Toolkit capabilities
– Backward compatible
Many other Grid services
[Figure: layered stack with GT3 Core at the bottom, GT3 Base Services above it, and GT3 Data Services and Other Grid Services on top]
Hmm, Isn’t This Just Another
Object Model?
Well, yes, in a sense
– Strong encapsulation
– We (can) profit greatly from experiences of
previous object-based systems
But
– Focus on encapsulation, not inheritance
– Does not require OO implementations
– Value lies in specific behaviors: lifetime,
notification, authorization, …, …
– Document-centric, not type-centric
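"Document-centric, not type-centric" can be made concrete: instead of exposing one typed accessor per field, a service publishes its state as a document, and clients pull out whatever they need with a path expression, including data the interface designer did not anticipate. A minimal sketch using Python's standard `xml.etree.ElementTree`; the document layout and element names are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical service data document; element names are illustrative only.
SERVICE_DATA = """
<serviceData>
  <name>BioDB-1</name>
  <terminationTime>1000</terminationTime>
  <dbInfo>
    <table rows="120000">proteins</table>
    <table rows="4500">membrane_proteins</table>
  </dbInfo>
</serviceData>
"""


def query(document: str, path: str) -> List[Optional[str]]:
    """Return the text of every element matching an ElementPath expression."""
    root = ET.fromstring(document.strip())
    return [el.text for el in root.findall(path)]
```

A client asking `query(SERVICE_DATA, "dbInfo/table[@rows='4500']")` needs no generated stub for "table" or "rows"; it only needs to know the document's shape. The trade-off is weaker compile-time checking in exchange for looser coupling between client and service.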
Grids and OGSA:
Research Challenges
Grids pose profound problems, e.g.
– Management of virtual organizations
– Delivery of multiple qualities of service
– Autonomic management of infrastructure
– Software and system evolution
Can OGSA provide a foundation for tackling
these problems in a rigorous fashion?
– Structured establishment/maintenance of
global properties
– Reasoning about total system properties
Summary
OGSA represents a refactoring of current
Globus Toolkit protocols and their integration
with Web services technologies
Several desirable features
– Significant evolution of functionality
– Uniform IDL facilitates code sharing
– Allows for alignment of potentially divergent
directions (e.g., info service, service
registry, monitoring)
Evolution
This is not happening all at once
– We have an early prototype of Core (alpha
release May?)
– Next we will work on Base, others
– Full release by end of 2002??
– Establishing partnerships for other services
Backward compatibility
– API level seems straightforward
– Protocol level: gateways?
– We need input on best strategies
For More Information
OGSA architecture and overview
– "The Physiology of the Grid: An Open Grid
Services Architecture for Distributed Systems
Integration", at www.globus.org/ogsa
Grid service specification
– At www.globus.org/ogsa
Open Grid Services Infrastructure WG, GGF
– www.gridforum.org/ogsi (?), soon
Globus Toolkit OGSA prototype
– www.globus.org/ogsa
GGF Objectives
An open process for development of standards
– Grid "Recommendations" process modeled after the
Internet Standards Process (IETF)
A forum for information exchange
– Experiences, patterns, structures
A regular gathering to encourage shared effort
– In code development: libraries, tools…
– Via resource sharing: shared Grids
– In infrastructure: consensus standards
GGF Groups
Working Groups
– Tightly focused on development of a spec or set of related specs
> Protocol, API, etc.
– Finite set of objectives and schedule of milestones
Research Groups
– More exploratory than Working Groups
– Focused on understanding requirements, taxonomies, models, methods for solving a particular set of related problems
– May be open-ended but with a definite set of objectives and milestones to drive progress
Groups are approved and evaluated by the GGF Steering Group (GFSG) based on written charters. Among the criteria for group formation:
• Is this work better done (or already being done) elsewhere, e.g. IETF, W3C?
• Are the leaders involved and/or in touch with relevant efforts elsewhere?
Current GGF Groups
(Out-of-date List, Sorry…)
Grid Information Services
– Working Groups: Grid Object Specification; Grid Notification Framework; Metacomputing Directory Services
– Research Groups: Relational Database Information Services
Scheduling and Resource Management
– Working Groups: Advanced Reservation; Scheduling Dictionary; Scheduler Attributes
Security
– Working Groups: Grid Security Infrastructure; Grid Certificate Policy
Performance
– Working Groups: Grid Performance Monitoring Architecture
Architectures
– Working Groups: JINI; NPI
– Research Groups: Grid Protocol Architecture; Accounting Models
Data
– Working Groups: GridFTP
– Research Groups: Data Replication
Applications, Programming Models, and User Environments
– Research Groups: Applications; Grid User Services; Grid Computing Env.; Adv Programming Models; Adv Collaboration Env
Proposed GGF Groups
(Again, Out of Date …)
Scheduling and Resource Management
– Working Groups: Scheduling Command Line API; Distributed Resource Mgmt Applic API; Grid Resource Management Protocol
– Research Groups: Scheduling Optimization
Performance
– Working Groups: Network Monitoring/Measurement; Sensor Management; Grid Event Service
Architectures
– Working Groups: Open Grid Services Architecture
– Research Groups: Grid Economies
Data
– Working Groups: Archiving Command Line API; Persistent Archives
– Research Groups: DataGrid Schema; Application Metadata; Network Storage
Area TBD…
– Working Groups: Open Source Software Licensing; Cluster Standardization
– Research Groups: High-Performance Networks for Grids
Getting Involved
Participate in a GGF Meeting
– Meets 3x/year; the last meeting had 500 people
– July 21-24, 2002 in Edinburgh (with HPDC)
– October 15-17, 2002 in Chicago
Join a working group or research group
– Electronic participation via mailing lists (see
www.gridforum.org)
Grid Events
Global Grid Forum: working meeting
– Meets 3 times/year, alternates U.S.-Europe,
with July meeting as major event
HPDC: major academic conference
– HPDC’11 in Scotland with GGF’5, July 2002
Other meetings with Grid content include
– SC’XY, CCGrid, Globus Retreat
www.gridforum.org, www.hpdc.org
Summary
The Grid problem: Resource sharing &
coordinated problem solving in dynamic,
multi-institutional virtual organizations
– Real application communities emerging
– Significant infrastructure deployments
– Substantial open source/architecture
technology base: Globus Toolkit
– Pathway defined to industrial adoption, via
open source + OGSA
– Rich set of intellectual challenges
Major Application
Communities are
Emerging
Intellectual buy-in, commitment
– Earthquake engineering: NEESgrid
– Exp. physics, etc.: GriPhyN, PPDG,
EU Data Grid
– Simulation: Earth System Grid,
Astrophysical Sim. Collaboratory
– Collaboration: Access Grid
Emerging, e.g.
– Bioinformatics Grids
– National Virtual Observatory
Major Infrastructure
Deployments are
Underway
For example:
– NSF "National Technology Grid"
– NASA "Information Power Grid"
– DOE ASCI DISCOM Grid
– DOE Science Grid
– EU DataGrid
– iVDGL
– NSF Distributed Terascale
Facility ("TeraGrid")
– DOD MOD Grid
A Rich Technology Base
has been Constructed
6+ years of R&D have produced a substantial
code base built on open architecture
principles, especially the Globus Toolkit, including
– Grid Security Infrastructure
– Resource directory and discovery services
– Secure remote resource access
– Data Grid protocols, services, and tools
Essentially all projects have adopted this as a
common suite of protocols & services
Enabling a wide range of higher-level services
Pathway Defined to
Industrial Adoption
Industry need
– eScience applications, service provider models,
need to integrate internal infrastructures,
collaborative computing in general
Technical capability
– Maturing open source technology base
– Open Grid Services Architecture enables
integration with industry standards
Result likely to be exponential industrial
uptake
Rich Set of Intellectual Challenges
Transforming the Internet into a robust,
usable computational platform
Delivering (multi-dimensional) qualities of
service within large systems
Community dynamics and collaboration
modalities
Program development methodologies and
tools for Internet-scale applications
Etc., etc., etc.
New Programs
U.K. eScience
program
EU 6th Framework
U.S. Committee on
Cyberinfrastructure
Japanese Grid
initiative
U.S. Cyberinfrastructure:
Draft Recommendations
New INITIATIVE to revolutionize science and engineering research at NSF
and worldwide to capitalize on new computing and communications
opportunities. 21st-century cyberinfrastructure includes supercomputing,
but also massive storage, networking, software, collaboration, visualization,
and human resources
– Current centers (NCSA, SDSC, PSC) are a key resource for the INITIATIVE
– Budget estimate: incremental $650 M/year (continuing)
An INITIATIVE OFFICE with a highly placed, credible leader empowered to
– Initiate competitive, discipline-driven, path-breaking applications of
cyberinfrastructure within NSF that contribute to the shared goals of the
INITIATIVE
– Coordinate policy and allocations across fields and projects, with participants
across NSF directorates, Federal agencies, and international e-science
– Develop high quality middleware and other software that is essential and
special to scientific research
– Manage individual computational, storage, and networking resources at least
100x larger than individual projects or universities can provide.
For More Information
The Globus Project™
– www.globus.org
Grid concepts, projects
– www.mcs.anl.gov/~foster
Open Grid Services
Architecture
– www.globus.org/ogsa
Global Grid Forum
– www.gridforum.org
GriPhyN project
– www.griphyn.org