Grid Computing

DCS861A Emerging Computing II

Spring 2005

DPS Team 2

29/11/2011

What is “Grid Computing”?

“…a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed "autonomous" resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements.”

Source: Grid Computing Info Centre (www.gridcomputing.com)

Where Are These Resources?

• Mainframes are idle about 35% of the time
• UNIX servers are actually "serving" something less than 15% of the time
• And most PCs do nothing for 95% of a typical day

Imagine an airline with 85% of its fleet on the ground, an automaker with 35% of its assembly plants idle, a hotel chain with 95% of its rooms unoccupied!

“Computing Grid As Utility”

A common metaphor in the literature:

“…[a computing grid is] analogous to electric power network (grid) where power generators are distributed, but the users are able to access electric power without bothering about the source of energy and its location.”

Source: Grid Computing Info Centre

“Grid as Utility” Origins

As early as 1969, Len Kleinrock, one of the original Arpanet designers, wrote:

“We will probably see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country.”

On-demand, Dispersed Resources

[Figure: axes labeled “Quality, economies of scale” and “Time”]

Decouples production & consumption, enabling…
• On-demand access
• Economies of scale
• Consumer flexibility
• New devices

Source: Ian Foster, U. of Chicago

Grid Computing Scales

[Figure: three scales of grid, labeled Cluster Grids, Enterprise Grids, Global Grids]

“But Computing isn’t Electricity”

• Usually users only consume electricity, they don’t also produce it; software applications both consume and produce data
• “Computing” is not a homogeneous “thing”, but is highly heterogeneous: data, sensors, services, software, computing hardware, …
• This complicates things, but it means that the result can be greater than the sum of the parts
• It also raises some fundamental questions:
  - How do we build applications that exploit the infrastructure?
  - How do we operate such a complex environment?
  - How do we manage heterogeneous resources that are not centrally owned?
  - How do we ensure QoS across these distributed services?

Another Way of Looking at Grids

• From a less technical viewpoint:

“Grid computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation...we [define] the “Grid problem”…as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources - what we refer to as virtual organizations.”

Source: Ian Foster, Carl Kesselman, Steven Tuecke, “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, Intl. Journal Supercomputer Applications, 2001

Virtual Organizations (VOs)

In VOs a grid infrastructure is more a means to an end:
• Enables integration & sharing of distributed resources
• Removes geographical constraints on teams
• Creates consistent qualities of service via fault-tolerance, dynamic workload balancing, etc.

Grid History: I-WAY, a Seminal Event

• Experiment led by researchers at the University of Illinois at Chicago and Argonne National Laboratory
• For a week in November 1995, it linked 11 research networks to create one high-speed network infrastructure
• Connected 17 sites across the US and Canada
• Demonstrated 60 applications, from distributed computing to virtual reality collaboration
• Attempted to construct a unified software infrastructure providing scheduling, single sign-on, and other grid-enabled services

Early Grids: Govt.-funded Science

• GUSTO (1998): 80 global research sites
  - 3,000+ host grid software testbed
• NASA Information Power Grid (since 1999)
  - Production grid linking NASA laboratories
• INFN Grid, EU DataGrid, iVDGL, … (2001+)
  - Grids for data-intensive science
• TeraGrid, DOE Science Grid (2002+)
  - Production grids linking supercomputer centers
• U.S. GRIDS Center
  - Software packaging, deployment, support

Why are Grids Hot Now?

• Hardware performance improving exponentially
  - Computer speed doubles every 18 months
  - Network speed doubles every 9 months
  - Difference = order of magnitude every 5 years
• 1986 to 2000…
  - Computers: x 500
  - Networks: x 340,000
• 2001 to 2010…
  - Computers: x 60
  - Networks: x 4,000

[Figure: Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan-2001) by Cleo Vilett, source Vinod Khosla, Kleiner Perkins Caufield & Byers.]
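
The "order of magnitude every 5 years" figure follows from the two doubling periods quoted on the slide. The sketch below works through that arithmetic, assuming pure exponential growth; the function names are illustrative and do not come from the slides.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Factor by which a quantity grows if it doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


COMPUTER_DOUBLING = 18  # months, as quoted on the slide
NETWORK_DOUBLING = 9    # months, as quoted on the slide


def gap(years: float) -> float:
    """How many times faster networks improve than computers over `years`."""
    months = years * 12
    return (growth_factor(months, NETWORK_DOUBLING)
            / growth_factor(months, COMPUTER_DOUBLING))


if __name__ == "__main__":
    for label, years in [("5 years", 5), ("1986 to 2000", 14), ("2001 to 2010", 9)]:
        months = years * 12
        print(f"{label}: computers x{growth_factor(months, COMPUTER_DOUBLING):,.0f}, "
              f"networks x{growth_factor(months, NETWORK_DOUBLING):,.0f}, "
              f"gap x{gap(years):.1f}")
```

Under these assumptions the 9-year projection comes out at 64 and 4,096, close to the slide's rounded x 60 and x 4,000. The 14-year figures come out somewhat higher than the slide's x 500 and x 340,000, which suggests those were measured or rounded rather than derived from the doubling periods.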

Why are Grids Hot Now?

• Grids begin to address some real-world IT issues:
  - Low overall utilization of enterprise resources
  - High cost of provisioning for peak demand
  - Lack of information integration
  - Physical distribution of teams is increasing
  - Inability to apply available resources to advanced computation & data-intensive applications when and where they are needed
• However, the marketing hype is outrageous; every possible SW & HW product has been “gridified”

Early Commercial Adopters

• Aerospace and automotive (collaborative design and modelling)
• Architecture (engineering and construction)
• Electronics (design and testing)
• Energy (oil and gas exploration)
• Finance/insurance/real estate (securities and brokerage, especially stock/portfolio analysis and risk management)
• Life sciences (particularly pharmaceuticals)
• Manufacturing (inter/intra-team collaborative design, process management)
• Media/entertainment (generating digital animation)
• Utilities (improving efficiency while dealing with peaks and valleys in utilization)

Grid Market Projections

Leading adopters (Oct 2003)…
• Financial services: 31%
• Life sciences: 26%
• Manufacturing: 18%

[Figure: Grid Services Market Opportunities 2005. Sectors shown: Financial Services, Manufacturing, LS / Bioinformatics, Energy, Entertainment, Other, all resting on a “Gridified” Infrastructure. Example applications shown include derivatives, statistical, portfolio and risk analysis; mechanical/electronic design, process simulation, finite element and failure analysis; cancer research, drug discovery, protein folding and protein sequencing; seismic and reservoir analysis; digital rendering, massive multi-player games and streaming media; web applications, weather analysis, code breaking/simulation and academic uses.]

Sources: IDC, 2000 and Bear Stearns - Internet 3.0 - 5/01; analysis by SAI

Example Adopter: Novartis

• PC-based grid of 3,700 desktop systems
• R&D pharmaceutical applications
• Potentially mainstream business computing
• > 5 teraflop/s computing power
• Estimated savings of $200M over 3 years

“We have projects we calculate would take 6 years on a single supercomputer. Today, the run time is 12 hours.”
(Peter Sany, Novartis CIO)

Grid Application Attributes

• Computational complexity
  - Genome research
  - Financial product creation
  - Geophysical studies
  - Digital animation creation
• Massive data requirements
  - Digital mammography diagnostics
  - Particle physics research
  - Astronomical observation analysis

Computational Complexity: Protein Analysis

• Example: Determining the structure of a complex molecule, such as the cholera toxin shown here, is the kind of computationally intense operation that grids are intended to tackle

(Adapted from G. von Laszewski et al., Cluster Computing, volume 3(3), page 187, 2000)

Massive Data Requirements

• Storage density doubling every 12 months
• Dramatic growth in online data (1 petabyte = 1000 terabytes = 1,000,000 gigabytes)
  - 2000 ≈ 0.5 petabyte
  - 2005 ≈ 10 petabytes
  - 2010 ≈ 100 petabytes
  - 2015 ≈ 1000 petabytes?
• These are sometimes called “data grids”

Massive Data Requirements: Digital Mammography

• Digital Radiology (hospital digital data)
  - Mammogram X-rays
  - MRI / CAT scans
  - Endoscopies
• Very large data sources
  - 7 terabytes per hospital per year
  - Dominated by digital images
• Why target mammography?
  - Increasing need for film recall & computer analysis
  - Large volumes (4,000 GB/year, 57% of total)
  - Storage and records standards exist
  - Great clinical value

Grid Management Challenges

• Scale of data and compute resources is huge
• QoS and performance criteria are severe
• Platform must be scalable, able to evolve, fault-tolerant, robust, persistent and reliable
• It should work seamlessly and transparently: the user might not know or care where their calculation is done, using how many machines, or where data is actually held
• Resource configurations are transient, dynamic and volatile as services (databases, sensors, compute servers) are switched in and out
• They are ad-hoc, as service consortia have no central location or control and no existing trust relationships
• They may be large, with hundreds of services orchestrated at any time
• They may be long-lived; for example, a protein folding simulation could take weeks

Technical Challenges

How does a grid infrastructure, in a dynamic, multi-institutional, physically distributed setting,…
• Locate suitable computers?
• Authenticate & authorize user requests?
• Allocate resources on those computers?
• Select appropriate communication methods?
• Configure the computations?
• Initiate these computations on those computers?
• Access data files and return output?
• Respond appropriately to resource changes?
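
The first two questions above (locating suitable computers and authorizing requests) can be pictured as a matchmaking step. The following is a minimal sketch under invented assumptions: the `Resource` and `Request` types, their fields, and the "cheapest first" policy are hypothetical and are not taken from any grid toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical description of one machine offered to the grid."""
    name: str
    cpus: int
    available: bool
    cost_per_hour: float
    authorized_users: frozenset


@dataclass(frozen=True)
class Request:
    """Hypothetical user request with capability and cost limits."""
    user: str
    min_cpus: int
    max_cost_per_hour: float


def locate(resources, request):
    """Return the resources that can serve `request`, cheapest first."""
    candidates = [
        r for r in resources
        if r.available                                    # availability
        and request.user in r.authorized_users            # authorization
        and r.cpus >= request.min_cpus                    # capability
        and r.cost_per_hour <= request.max_cost_per_hour  # cost
    ]
    return sorted(candidates, key=lambda r: r.cost_per_hour)


if __name__ == "__main__":
    pool = [
        Resource("cluster-a", 64, True, 4.0, frozenset({"alice"})),
        Resource("desktop-b", 2, True, 0.1, frozenset({"alice", "bob"})),
        Resource("cluster-c", 128, True, 2.5, frozenset({"alice", "bob"})),
    ]
    for r in locate(pool, Request("alice", 32, 5.0)):
        print(r.name, r.cost_per_hour)
```

A real grid has to answer the same questions without a single in-memory list: resource descriptions come from many owners and change while the request is being matched.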

Grid Software Sources

• Academic & scientific researchers
  - U. of Chicago & USC (Globus Toolkit)
  - UC Berkeley (BOINC)
• Public consortium-based organizations
  - Global Grid Forum (OGSA)
• Commercial vendors
  - IBM, Entropia, United Devices, etc.

Globus Toolkit (www.globus.org)

• Early open-source grid infrastructure toolkit
• Set of protocols, services & software libraries that supports grids and grid applications
• Includes software for…
  - security
  - information infrastructure
  - resource management
  - data management
  - communication
  - fault detection
  - portability

Evolving Open Grid Standards

[Figure: timeline from 1990 to 2010 showing increased functionality and standardization. Stages shown: custom solutions; Internet standards; Globus Toolkit (de facto standard, single implementation); Open Grid Services Arch (real standards, multiple implementations; Web services, etc.); managed shared virtual systems. A “Research” label also appears.]

OGSA (www.gridforum.org)

• Grid technologies, including the Globus Toolkit, are evolving toward the Open Grid Services Architecture (OGSA)
• OGSA provides an extensible set of services that virtual organizations can aggregate in various ways
• Built on concepts and technologies from both the Grid and Web services communities
• OGSA defines:
  - Grid service semantics (like Web services)
  - Standard mechanisms for creating, naming, & discovering transient grid service instances
  - Location transparency and multiple protocol bindings for service instances
  - Support for integration with underlying native platform facilities
• OGSA also supports (via WSDL):
  - creating/composing complex distributed systems
  - lifetime management
  - change management
  - notification
  - reliable invocation
  - authentication & authorization
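
To make "creating, naming, & discovering transient grid service instances", "lifetime management" and "notification" more concrete, here is a toy in-process model. It is only a sketch of the ideas: the class and method names are invented for illustration and are not the OGSA interfaces (which are specified in WSDL) or any Globus Toolkit API.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServiceInstance:
    """Hypothetical transient service instance with an expiry time."""
    handle: str          # the unique name given to the instance
    kind: str
    expires_at: float
    subscribers: List[Callable[[str, str], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[str, str], None]) -> None:
        self.subscribers.append(callback)

    def notify(self, event: str) -> None:
        for callback in self.subscribers:
            callback(self.handle, event)


class Factory:
    """Creates, names, registers and expires instances (illustrative only)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._registry: Dict[str, ServiceInstance] = {}

    def create(self, kind: str, now: float, lifetime: float) -> ServiceInstance:
        handle = f"{kind}-{next(self._ids)}"
        instance = ServiceInstance(handle, kind, now + lifetime)
        self._registry[handle] = instance
        return instance

    def extend(self, handle: str, now: float, lifetime: float) -> None:
        """A client keeps an instance alive by renewing its lifetime."""
        self._registry[handle].expires_at = now + lifetime

    def reap(self, now: float) -> None:
        """Destroy expired instances and notify their subscribers."""
        expired = [h for h, s in self._registry.items() if s.expires_at <= now]
        for handle in expired:
            self._registry.pop(handle).notify("destroyed")

    def discover(self, kind: str, now: float) -> List[str]:
        """Return handles of live instances of the given kind."""
        self.reap(now)
        return [h for h, s in self._registry.items() if s.kind == kind]
```

Time is passed in explicitly (`now`) so the behaviour is deterministic; an instance whose lifetime is not extended simply disappears from discovery, which is one simple way to clean up after clients that vanish.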

Grid Standards: Summary

• Grid Services and Web Services are merging
• Web Services standards landscape is in flux
• OGSA will need to evolve with it
• Fuzzy security & policy standards are a concern
• W3C, OASIS, GGF are key standards orgs
• Open source software important for adoption

Some Commercial Grid Software Vendors

• IBM (www.ibm.com/grid)
• Avaki (www.avaki.com)
• GridIron Software (www.gridironsoftware.com)
• United Devices (www.ud.com)
• Platform Computing (www.platform.com)
• DataSynapse (www.datasynapse.com)
• Entropia (www.entropia.com)
• Oracle 10g (www.oracle.com/technologies/grid)

“Wait a second! What about…”

• SETI@home (extra-terrestrial signal search)
• GIMPS (Great Internet Mersenne Prime Search)
• folding@home (protein manipulation)
• Distributed.net (brute force decryption)

…and all those other Internet “grid” projects I’ve been reading about?

“Public Resource” Computing

• These are all examples of what Dave Anderson of Berkeley calls “public resource computing”
  - Most of the world's computing power is no longer in supercomputer centers or institutional machine rooms
  - Instead, it is now distributed in the hundreds of millions of personal computers, game consoles, and TV set-top boxes
• “If all this computing power could be made available to researchers somehow…”

Hallmarks of Public Resource Computing

• Public resource computing shares some traits with grid computing, but is qualitatively different
  - “Open” vs. “closed” society of resources
  - “Asymmetric usage”: more suppliers of resources than consumers, e.g., millions of PC screensavers vs. small team of researchers
  - Must be able to attract “altruistic” participants
  - Often some “reward” mechanisms will exist for resource suppliers

Public Resource Application Profile

• High computing to data ratio is typical
• Computation independence & parallelism is crucial
• Must be tolerant to errors and outages
• Must be able to handle “malicious” users
• Sporadic connectedness is the norm

Public Resource vs. Grid Computing

                              Public-Resource      Grid
Managed resources?            no                   yes
Secure resources?             no                   yes
Always on?                    no                   yes
Always connected?             no                   yes
Network bandwidth             expensive, scarce    abundant
Network connection            1 way (pull)         2 way (pull or push)
Must be unobtrusive?          yes                  no
Credit system?                yes                  maybe
How to get resources          complex              complex
Public education/outreach?    yes                  no
Self-upgrading?               yes                  no

Source: David Anderson, BOINC project (UC Berkeley)

Example: SETI@home

• SETI = “Search for Extraterrestrial Intelligence”
• Goal: detect intelligent life outside the Earth
• Uses radio telescopes to listen for narrow-bandwidth radio signals (not known to occur naturally) from space
• Initial version used hand-crafted server architecture and workstation clients

SETI Computational Model

• Signal data is divided into fixed-size work units that are distributed, via the Internet, to a client program running on numerous computers
• Client program computes a result (a set of candidate signals), returns it to the server, and gets another work unit
• Each work unit is processed multiple times to detect and discard results from faulty processors and from malicious users
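
The model above (split the data into work units, compute each unit on several clients, keep only results that agree) can be sketched as follows. This is an illustration rather than the SETI@home code: the threshold-based `analyze` function stands in for the real signal analysis, and the quorum-of-two agreement rule is an assumption, since the slide says only that each unit is processed multiple times.

```python
from collections import Counter


def split_into_work_units(samples, unit_size):
    """Cut the signal into consecutive chunks of `unit_size` samples.

    The final chunk may be shorter if the data does not divide evenly.
    """
    return [samples[i:i + unit_size] for i in range(0, len(samples), unit_size)]


def analyze(work_unit, threshold=0.9):
    """Stand-in for the client's analysis: positions of strong samples."""
    return tuple(i for i, s in enumerate(work_unit) if s >= threshold)


def validate(results, quorum=2):
    """Accept the result that at least `quorum` clients agree on.

    Returns None when no result reaches the quorum, in which case a
    server would hand the work unit out again (assumed policy).
    """
    if not results:
        return None
    result, votes = Counter(results).most_common(1)[0]
    return result if votes >= quorum else None


if __name__ == "__main__":
    signal = [0.1, 0.95, 0.2, 0.3, 0.99, 0.5]
    for unit in split_into_work_units(signal, 3):
        honest = analyze(unit)
        faulty = (0, 2)  # a wrong answer from a broken or malicious client
        print(unit, "->", validate([honest, faulty, honest]))
```

Redundancy multiplies the computing cost per work unit, which is acceptable here because donated CPU time is plentiful while trust in any single client is not.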

SETI@home at Work

[Figure: SETI@home client at work]

SETI@home Technical Specs

• SETI@home client program is written in C++
  - Platform-independent framework with platform-specific implementations
  - graphics library
  - SETI-specific data analysis code
  - SETI-specific graphics code
• Client ported to 175 different platforms using the GNU toolset
• Client can run as a background process, as a GUI application, or as a screensaver

SETI@home Results to Date (as of 03/31/2005)

                                 Totals                   Last 24 Hours
Users                            5,388,068                784
Results received                 1,811,656,328            1,339,532
Total CPU time                   2,251,657.404 years      925.204 years
Floating Point Operations        6.649645e+21             5.224175e+18 (60.46 TeraFLOPs/sec)
Average CPU time per work unit   10 hr 53 min 15.2 sec    6 hr 03 min 01.6 sec
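
The derived figures in the results table can be cross-checked from the raw counts. The short sketch below does that arithmetic, assuming 365-day years for the CPU-time totals (the slide does not say which year length was used); the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_YEAR = 365 * 24  # assumption: 365-day years


def tflops(floating_point_ops: float, seconds: float) -> float:
    """Sustained rate in teraFLOP/s for a given operation count and duration."""
    return floating_point_ops / seconds / 1e12


def hours_per_work_unit(cpu_years: float, results: int) -> float:
    """Average CPU hours spent per returned result."""
    return cpu_years * HOURS_PER_YEAR / results


def as_hms(hours: float) -> str:
    """Format a duration in hours as 'H hr MM min SS.S sec'."""
    h, rem = divmod(hours * 3600, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h)} hr {int(m):02d} min {s:04.1f} sec"


if __name__ == "__main__":
    print("Last 24 hours:", round(tflops(5.224175e18, SECONDS_PER_DAY), 2), "TFLOP/s")
    print("Totals:", as_hms(hours_per_work_unit(2_251_657.404, 1_811_656_328)))
    print("Last 24 hours:", as_hms(hours_per_work_unit(925.204, 1_339_532)))
```

The drop in average CPU time per work unit between the all-time and last-24-hours columns is consistent with participants' machines getting faster over the life of the project.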

Lessons from SETI@home

• Public resource computing concept does work, but…
  - How do you make it easy for researchers to access the public’s resources & good will?
  - How do you make it easy for the public to contribute their resources to multiple projects?
• One answer: the BOINC public resource computing platform from UC Berkeley

BOINC Goals

• For computing projects
  - easy/cheap to create and operate projects
  - support a wide range of applications
  - no central authority
• For participants
  - easy to participate in multiple projects
  - resource allocation among projects
  - invisible use of disk, CPU, network

Source: David Anderson, BOINC project (UC Berkeley)

BOINC Architecture

[Figure: BOINC architecture diagram]

Some BOINC-based Projects

• SETI@home (updated for BOINC support)
• Predictor@home (protein-related disease)
• Einstein@home (gravitational waves, LIGO)
• CERN (particle physics)
• UCB/Intel network performance study
• climateprediction.net (future climate impact)

Example: climateprediction.net

• The Earth is likely to warm over the coming century. The question is by how much.
• climateprediction.net is the world’s largest climate modelling experiment, set up to try to answer this question
• 62,000 participants in 130 countries (8/04)

climateprediction.net Summary

1. Each user downloads and runs a unique simulation model of the Earth's climate
2. Models undergo an initial calibration
3. Each model is tested by simulating 20th century climate
4. Models which cannot reproduce present and past climate are discarded
5. All remaining models are run to predict the 21st century climate
6. These results create the probabilistic forecast for the 21st century climate
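
Steps 3 to 6 above describe an ensemble workflow: test each model variant against the past, discard those that fail, and summarize the spread of the survivors' predictions. The sketch below shows that shape only. The one-line `simulate` function is a toy stand-in, not a climate model, and every number passed to it is an arbitrary placeholder in arbitrary units rather than climate data.

```python
from statistics import median


def simulate(sensitivity: float, forcing: float) -> float:
    """Toy stand-in for one model run: response = sensitivity * forcing."""
    return sensitivity * forcing


def forecast(sensitivities, past_forcing, observed_past, tolerance, future_forcing):
    """Filter model variants on a hindcast, then summarize their predictions.

    Returns None if no variant reproduces the past within `tolerance`.
    """
    # Steps 3-4: keep only variants that reproduce the observed past.
    kept = [s for s in sensitivities
            if abs(simulate(s, past_forcing) - observed_past) <= tolerance]
    # Step 5: run the surviving variants forward.
    runs = sorted(simulate(s, future_forcing) for s in kept)
    if not runs:
        return None
    # Step 6: the spread of the surviving runs is the probabilistic forecast.
    return {"models_kept": len(runs), "low": runs[0],
            "median": median(runs), "high": runs[-1]}


if __name__ == "__main__":
    variants = [0.2, 0.5, 0.6, 0.7, 1.5]  # placeholder parameter values
    print(forecast(variants, past_forcing=1.0, observed_past=0.6,
                   tolerance=0.15, future_forcing=4.0))
```

Each variant is independent of the others, which is why this kind of experiment suits public resource computing: one participant's machine can run one model from start to finish with almost no communication.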

For More Information

• Globus Alliance: www.globus.org
• Globus Consortium: www.globusconsortium.com
• Global Grid Forum: www.ggf.org
• Open Science Grid: www.opensciencegrid.org
• Grid Today newsletter: www.gridtoday.com
• Grid Blog: www.gridblog.com
• BOINC: boinc.berkeley.edu

