Grid Computing Lecture
1. The Grid Dream
a) Grid Electricity
b) Social Perspective
c) The Grid Push
2. Examples
a) Physics Data Experiment
b) Aircraft Design Collaboration
c) Common Goals
3. Virtual Organisations
a) VO concepts
4. A Grid
a) Criteria
5. Globus
a) A brief overview of the Globus toolkit

The Grid
The Grid takes its name from an analogy with the electrical power grid:
– electricity on demand via wall socket
– source unknown but reliable
– transparency and resilience are keys to its success
The Grid dream is to allow users to tap into resources off the internet as easily as electrical power can be drawn from a wall socket - imagine …
Can this Happen?
To make this happen, what do we need:
• pervasive deployment of infrastructure
• security
• accountancy, i.e. pay for what you use…
• transparent access
– The user is not aware (and doesn't care) what computing resources are used to solve their problem
… just as one has with power.

Social Uptake ..
• The history of the evolution of infrastructures, e.g. the electricity grid, shows that returns on initial investments are an important factor in providing access to the capital required for further roll-outs (and hence a reduction in the 'Digital Divide'). This is why Edison chose the Wall Street area of New York:
– He had to compete with other technologies
– Needed dense population
– Needed to find switching costs ..
• We need the same for Grid computing
– strong industrial involvement (and profit)
– pervasive uptake – need standards based infrastructure
The Uptake of the Grid
Strong commercial involvement:
e.g. IBM and Globus Announce Open Grid Services for Commercial Computing - http://www.ibm.com/news/be/en/2002/02/211.html
Infrastructure
The US National Science Foundation committed:
• 2001: $53 million on the TeraGrid - 13.6 teraflops of computing power, over 450 terabytes of data storage, and high-resolution visualisation systems, interconnected by a 40 Gbps network.
• 2002: $35 million supplement
• 2003: further $10 million supplement
Standards Based
All Grid software is based on open software and tries to be standards based through GGF (Global Grid Forum – www.ggf.org) or other
Advertising ..
e.g. An Overview of Distributed Grid Computing, Grid Today, http://www.gridtoday.com/02/1104/100635.html

Example: UK e-Science Program
• Spending Reviews
– 2000: £120m for 3 years
– 2002: further £115m for years 4 & 5
– 2003: further £16.2 million
– 2004: further £18 million
• Development of key IT infrastructure to support e-Science
• Managed by Research Councils & DTI
• Application specific Pilot Projects
• Core programme to identify and develop generic Grid middleware
Tony Hey, Director of the UK e-Science Core Programme, but recently joined Microsoft ….
UK e-Science Network
• National Centre in Edinburgh/Glasgow
• 8 regional centres
• Grid support centre
[Figure: map of UK e-Science centres. Labels: Glasgow, Edinburgh, Newcastle, Belfast, DL, Manchester, Oxford, Cambridge, London, Hinxton (EBI), Cardiff, RAL, Southampton]

Welsh e-Science Centre
• Based at Cardiff University
– Department of Computer Science
– Funded by DTI, WDA and CU
• Role:
– Promote e-Science research and development in Wales and South-west of England
– Accelerate the adoption of e-Science (Grid) capabilities
• http://www.wesc.ac.uk/
Alex Hardisty, Manager, Welsh e-Science Centre
Example 1: Collaborative Scientific Experiments
http://eu-datagrid.web.cern.ch/eu-datagrid/
• Physicists collaborating in an international experiment need to share:
– Experimental data and storage resources.
– Computers and software for extracting information from this data.
– Computers and software for interpreting the data using large-scale computer simulations.
Large Hadron Collider (CERN):
– raw data rate = 1 Petabyte/sec
– filtered rate = 100 Mbyte/sec = 1 Petabyte/year = 1 million CD-ROMs (~200 m³!)

Example 2: Engineering Design
• An industrial consortium collaborates to integrate sophisticated tools to simulate a next-generation supersonic aircraft.
• Collaborating organisations need to share:
– Digital blueprints of the design
– Supercomputers for performing multidisciplinary simulations
– Sensitive proprietary software components
A new aircraft may involve 10,000 collaborating engineers
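The Large Hadron Collider figures quoted in Example 1 can be checked with a little arithmetic. The sketch below assumes decimal units (1 Petabyte = 10^15 bytes, 1 Mbyte = 10^6 bytes); the slide does not say which convention it uses. Under that assumption, 100 Mbyte/sec adds up to 1 Petabyte after 10^7 seconds of data taking, which is roughly 116 days rather than a full calendar year, and fitting 1 Petabyte on one million discs implies about 1 Gbyte per disc. A real CD-ROM holds somewhat less than that, so the slide's numbers are best read as order-of-magnitude estimates.

```python
# Back-of-envelope check of the LHC figures (decimal units are an assumption).
PETABYTE = 10**15          # bytes
MEGABYTE = 10**6           # bytes
SECONDS_PER_DAY = 86_400

raw_rate = 1 * PETABYTE            # bytes per second, from the slide
filtered_rate = 100 * MEGABYTE     # bytes per second, from the slide

# Factor by which filtering reduces the raw data stream.
reduction_factor = raw_rate // filtered_rate

# Seconds of data taking needed to accumulate 1 PB at the filtered rate.
seconds_for_one_pb = PETABYTE // filtered_rate
days_for_one_pb = seconds_for_one_pb / SECONDS_PER_DAY

# Capacity each disc would need for 1 PB to fit on exactly one million discs.
bytes_per_disc = PETABYTE // 1_000_000

print(f"reduction factor:        {reduction_factor:.0e}")
print(f"seconds to collect 1 PB: {seconds_for_one_pb:.0e}")
print(f"days to collect 1 PB:    {days_for_one_pb:.1f}")
print(f"bytes per disc:          {bytes_per_disc:.0e}")
```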
Elements in Common
• Coordinated problem solving
– Beyond client-server: distributed data analysis, computation, collaboration, …
– … Problem Solving Environments
• Resource sharing
– Computers, data, instruments, networks
• Multi-institutional “virtual organisations”
– Overlying traditional organisational structures
– Large or small, static or dynamic

Brief History
1. First Generation: early metacomputing environments, such as FAFNER (http://www.npac.syr.edu/factoring.html) and the I-WAY (see next slide)
2. Second Generation:
   1. Core Grid technologies like the Globus toolkit (www.globus.org – later) and Legion (http://legion.virginia.edu/download/)
   2. Distributed object systems e.g. Jini (www.jini.org) and CORBA (www.corba.org)
   3. Grid resource brokers and schedulers e.g.
      1. Condor (http://www.cs.wisc.edu/condor/) and
      2. SGE (http://wwws.sun.com/software/gridware/sge.html)
   4. Integrated systems including Cactus (cactuscode.org), DataGrid, UNICORE (www.unicore.org) and P2P computing frameworks e.g. Jxta (jxta.org)
   5. Application user interfaces for remote steering and visualization e.g. Portals and Grid Computing Environments (later..)
3. The Third Generation:
   1. Introduction of a service-oriented approach (e.g. OGSA later ..)
   2. Increasing use of metadata (giving more detailed information describing services)
The I-WAY
• Connected supercomputers and other resources at 17 sites across North America based on ATM
• Consisted of a number of I-POP (point of presence) machines
– Connected by the internet or ATM networks
• I-Soft software could access the configured I-POP machines and provided an environment that consisted of a number of services, including:
– scheduling
– security (authentication and auditing)
– parallel programming support (process creation and communication)
– distributed file system (using AFS, the Andrew File System)
• I-WAY became Globus ..
[Figure: The I-WAY. Labels: Internet or ATM; ATM Switch; Local Resource; Possible Firewall; I-POP (AFS, Kerberos, Scheduler)]
Current Grid Definition
• “The Grid is flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources”
• The concept of Virtual Organisations
Foster I, Kesselman C and Tuecke S, (2001) The Anatomy of the Grid: Enabling Scalable Virtual Organizations

Virtual Organisations (VOs)
Virtual Organisations provide a highly controlled environment to allow each resource provider to specify exactly what they want to share, who is allowed to share it and the conditions whereby this sharing occurs. The set of individuals and/or institutions that provide such sharing rules are collectively known as a virtual organisation (VO).
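The sharing rules of a virtual organisation (what is shared, who may use it, and under what conditions) can be pictured as a small data structure. The following is a minimal Python sketch for illustration only: the class names `SharingRule` and `VirtualOrganisation`, the `max_cpu_hours` condition, and the example users and resource are all hypothetical and are not part of Globus or any other Grid middleware.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SharingRule:
    """One provider's rule: what is shared, with whom, and on what condition."""
    resource: str              # what the provider is willing to share
    allowed_users: frozenset   # who is allowed to use it
    max_cpu_hours: int         # hypothetical condition attached to the sharing


@dataclass
class VirtualOrganisation:
    """A VO modelled as the collection of its providers' sharing rules."""
    name: str
    rules: list = field(default_factory=list)

    def may_use(self, user: str, resource: str, cpu_hours: int) -> bool:
        """Return True if at least one rule grants this user this request."""
        return any(
            rule.resource == resource
            and user in rule.allowed_users
            and cpu_hours <= rule.max_cpu_hours
            for rule in self.rules
        )


# Hypothetical example: one provider shares a cluster with two named users.
vo = VirtualOrganisation("physics-vo")
vo.rules.append(
    SharingRule("cardiff-cluster", frozenset({"alice", "bob"}), max_cpu_hours=100)
)

print(vo.may_use("alice", "cardiff-cluster", 10))    # listed user, within limit
print(vo.may_use("carol", "cardiff-cluster", 10))    # user not listed in any rule
print(vo.may_use("alice", "cardiff-cluster", 500))   # request exceeds the condition
```

Real middleware expresses such policies through certificates and authorization services rather than in-memory objects; the sketch only shows the shape of the information a VO has to capture.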
A VO Overview
[Figure: A VO overview. Labels: Users/Clients; Internet Routing; Middleware; Virtual Organization (VO) Resources]
In Grid computing you can execute your own code on remote resources
• Must be secure !!
• The VO provides a blanket security policy for sharing between organizations
• VO is implemented by middleware - Globus

Multiple VOs
[Figure: Multiple VOs. Labels: VO1; VO2; VO3]
VOs are dynamically accessible from a Grid application and such applications are capable of spanning a number of different organizations, each running their own VO.
To be or not to be a Grid
The Criteria
A Grid must:
• coordinate resources that are not subject to centralized control
• use standard, open, general-purpose protocols and interfaces
• deliver non-trivial qualities of service (QoS)
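The three criteria can be read as an all-or-nothing test: a system that fails any one of them is not a Grid. A minimal Python sketch of that reading follows. The `SystemProfile` record, its field names and the two example systems are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    """Hypothetical description of a distributed system (illustrative only)."""
    name: str
    decentralized_control: bool     # resources not under one central administrator
    open_standard_protocols: bool   # standard, open, general-purpose interfaces
    nontrivial_qos: bool            # delivers non-trivial qualities of service


def is_grid(system: SystemProfile) -> bool:
    """A system counts as a Grid only if all three criteria hold."""
    return (
        system.decentralized_control
        and system.open_standard_protocols
        and system.nontrivial_qos
    )


# Made-up examples: a centrally administered cluster and a multi-site testbed.
single_site = SystemProfile("single-site cluster", False, True, True)
multi_site = SystemProfile("multi-institution testbed", True, True, True)

for system in (single_site, multi_site):
    verdict = "Grid" if is_grid(system) else "not a Grid"
    print(f"{system.name}: {verdict}")
```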
Decentralized Control
The first point in the checklist concerns how the resources that make up the distributed system are controlled, i.e. whether they:
1. are centrally controlled by one administrator (a non-Grid)
2. consist of a number of interacting administrative domains that pull resources together using common policies.
Therefore, computational Grids should connect resources together at different administrative domains – see VOs later
Standard, Open, General-Purpose Protocols
Grid computing aims to help standardize the way we do distributed computing, rather than having a multitude of non-interoperable distributed systems. A standards-based open architecture promotes extensibility, interoperability and portability because the standards have general agreement within the community. To help with this standardization process, the Grid community has the Global Grid Forum (GGF). It has also recently adopted Web Services standards for OGSA.
QoS
There are three types of quality support that can be provided:
1. None: No QoS is supported at all...
2. Soft: You can specify QoS requirements and the system will try to meet them, but they cannot be guaranteed. This is the most common form of QoS implemented in Grid applications.
3. Hard: This is where all nodes on the Grid support and guarantee the level of QoS requested.
A Grid should be able to deliver non-trivial QoS, whether for example this is measured by:
1. performance
2. service, data availability
3. data transfer etc.
QoS is application specific - it depends on the needs of the application… For example, in a Physics experiment, the QoS may be specified in terms of computational throughput, but in other experiments the QoS may be specified in terms of reliability of file transfers or data content.
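The three levels of support can be sketched as an enumeration, together with what happens to a request under each level. This is a minimal Python illustration, not a toolkit API: `QoSLevel`, `request_outcome` and the example numbers are hypothetical. Treating hard QoS as "reject anything that cannot be guaranteed" is an assumption made here for the sake of the example.

```python
from enum import Enum


class QoSLevel(Enum):
    NONE = "none"   # no QoS supported at all
    SOFT = "soft"   # requirements accepted, met on a best-effort basis
    HARD = "hard"   # requirements supported and guaranteed


def request_outcome(level: QoSLevel, requested: float, available: float) -> str:
    """Describe the fate of a request for `requested` units of an
    application-specific metric (e.g. jobs per hour) when `available`
    units can currently be offered."""
    if level is QoSLevel.NONE:
        return "ignored: requirements cannot be expressed"
    if level is QoSLevel.SOFT:
        # Best effort: the request is accepted even if it may not be met.
        if available >= requested:
            return "accepted: currently met"
        return "accepted: not guaranteed"
    # HARD (assumption): only accept what can actually be guaranteed.
    if available >= requested:
        return "accepted: guaranteed"
    return "rejected"


# Hypothetical request: 10 units wanted, 5 units available.
for level in QoSLevel:
    print(f"{level.value}: {request_outcome(level, requested=10, available=5)}")
```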
Globus – globus.org
Consists of three elements:
• Information Services: to provide information about Grid services
• Data Management: involves accessing and managing data
• Resource Management: to allocate resources provided by a Grid.
And, of course, security:
• Security: to provide authentication, delegation and authorization
The Globus Grid
[Figure: The Globus Grid. Labels: Users/Clients; Internet Routing; Middleware (Globus); VO; Resources; GSI; GRAM; GridFTP; MDS; X.509; Single Sign-on; Mutual Authentication]