CSS434: Parallel & Distributed Computing - PowerPoint


CSS434 Grid Computing
Textbook: No corresponding chapters
Professor: Munehiro Fukuda
A portion of these slides was compiled from The Grid: Blueprint for a New
Computing Infrastructure.
Network Infrastructure
[Figure: high-speed information highway connecting organizations]
Users first log in to their own organization's systems, locally or remotely.
If they are affiliated with other organizations, they can log in from the
system of their main use to some other systems. (They are given an
opportunity to use those resources in parallel.)
Problems:
They must orchestrate job execution among the resources they use.
Should those resources be limited to such a handful of researchers?
Purposes of Computational Grid
Use computing resources connected to the high-speed information
highway as easily as we use the electric power grid.
Only 30% utilization in academic/commercial environments.
Many applications have only episodic requirements. So why
not share computational resources?
Computational results and data should also be made
available to all users.
Users:
Computational scientists and engineers
Experimental scientists
Associations and corporations
Training and education
Consumers (E-commerce)
Grid Applications
Category                     Examples                            Characteristics
Distributed supercomputing   DIS and stellar dynamics            Very large problems needing lots of computing resources at a time
High throughput              Chip design and parameter studies   Harnessing many idle resources to increase aggregate throughput
On demand                    Medical instrumentation             Allocating special resources dynamically
Data intensive               Sky survey                          Using distributed data and needing high-volume data flows
Collaborative                Collaborative design, education     Supporting communication or collaborative work
Grid Services Architecture
from www.globus.org slide
Applications: high-energy physics data analysis, collaborative engineering, on-line instrumentation, regional climate studies, parameter studies
Application Toolkit Layer: distributed computing, collab. design, remote control, data-intensive, remote viz
Grid Services Layer: information, resource mgmt, security, data access, fault detection, ...
Grid Fabric Layer: transport, multicast, instrumentation, control interfaces, QoS mechanisms, ...
Programming Model
Uniform Access
Paradigm
Bag of tasks or master-worker (Condor-MW)
Client server (NetSolve)
Object oriented (Legion)
Synchronous applications (Not suited for massively
parallel computation.)
Language Support
MPICH-G – message passing (Globus)
OpenMP – shared memory
Math Library – remote procedure (NetSolve)
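The bag-of-tasks (master-worker) paradigm listed above can be sketched in a few lines. This is only an illustration on a single machine: a thread pool stands in for remote grid workers, and the names simulate and run_bag_of_tasks are hypothetical, not part of Condor-MW or any other system named on this slide.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def simulate(param):
    """Hypothetical independent task, e.g. one point of a parameter study."""
    return param * param


def run_bag_of_tasks(task_params, num_workers=4):
    """Master: hand independent tasks to workers and gather the results.

    Tasks do not communicate with each other, so they can finish in any
    order; the master only maps each result back to its input parameter.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = {pool.submit(simulate, p): p for p in task_params}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results


bag_results = run_bag_of_tasks(range(8))
```

Because the tasks are independent, adding workers raises aggregate throughput without changing the master's logic, which is why this paradigm suits high-throughput applications.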
Resource Management
Discovery, Allocation, and Scheduling
Centralized resource manager
Systems   Resource descriptions          Front-end process         Resource manager   Job launcher
Globus    RSL: resource spec. language   Broker                    MDS                GRAM
Condor    ClassAd and DAGMan             Schedd Agent and startd   Matchmaker         Sandbox (Starter)
Legion    IDL: interface def. language   Scheduler                 Collection         Enactor
+: easy to manage
–: a bottleneck
Decentralized resource manager
A collection of centralized managers (Condor’s gateway flocking)
A combination of meta and local schedulers.
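To make the idea of a centralized resource manager concrete, here is a minimal matchmaking sketch loosely inspired by the matchmaker role in the table above. It is an assumption-laden illustration: the dictionaries are not real ClassAd syntax, and the field names (memory_mb, mips, owner, requirements, rank) and the function match are hypothetical.

```python
def match(job_ad, machine_ads):
    """Return the best machine for the job, or None if nothing fits.

    A machine is a candidate only if the job's requirements accept the
    machine AND the machine's requirements accept the job (two-sided
    matching). Candidates are then ordered by the job's rank function.
    """
    candidates = [
        m for m in machine_ads
        if job_ad["requirements"](m) and m["requirements"](job_ad)
    ]
    if not candidates:
        return None
    return max(candidates, key=job_ad["rank"])


# Hypothetical resource descriptions advertised to the central manager.
machines = [
    {"name": "node1", "memory_mb": 512, "mips": 100,
     "requirements": lambda job: job["owner"] != "guest"},
    {"name": "node2", "memory_mb": 2048, "mips": 300,
     "requirements": lambda job: True},
    {"name": "node3", "memory_mb": 4096, "mips": 200,
     "requirements": lambda job: job["owner"] == "admin"},
]

# Hypothetical job description: needs 1 GB of memory, prefers fast CPUs.
job = {
    "owner": "alice",
    "requirements": lambda m: m["memory_mb"] >= 1024,
    "rank": lambda m: m["mips"],
}

chosen = match(job, machines)
```

Every request passes through the single match function, which shows both sides of the trade-off above: one place to manage policy, and one place that can become a bottleneck.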
Fault Tolerance
Check-pointing
At the master (Condor)
At each node but collected at the master (Catalina)
Use a whiteboard (Optimal Grid)
Re-execution of faulty worker jobs from the
beginning (Bayanihan, Optimal Grid)
Error code (NetSolve)
The user is responsible for handling errors.
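The difference between check-pointing and re-execution from the beginning can be illustrated with a small sketch. It assumes a job whose state fits in a JSON file on local disk; the function run_with_checkpoints, its fail_at parameter, and the file name job.ckpt are hypothetical and do not reflect how Condor or the other systems store checkpoints.

```python
import json
import os
import tempfile


def run_with_checkpoints(n_steps, ckpt_path, fail_at=None):
    """Sum 0..n_steps-1, saving state after every step.

    If a checkpoint file exists, the job resumes from the saved step
    instead of starting over. fail_at simulates a crash at that step.
    """
    state = {"next_step": 0, "total": 0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    for step in range(state["next_step"], n_steps):
        if step == fail_at:
            raise RuntimeError("simulated worker failure")
        state["total"] += step
        state["next_step"] = step + 1
        with open(ckpt_path, "w") as f:
            json.dump(state, f)  # checkpoint after each completed step
    return state["total"]


ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
try:
    run_with_checkpoints(10, ckpt, fail_at=6)  # crashes before step 6
except RuntimeError:
    pass
resumed_total = run_with_checkpoints(10, ckpt)  # resumes at step 6
```

Without the checkpoint file the second call would simply redo all ten steps, which corresponds to the re-execution strategy.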
Security
Resources covered with security layers
Legion (Message/MayI layers)
Entropia (Intercepting all system calls)
Use of commodity tools
SSL
Public key
Security Certificate
Java sandbox
Kerberos
NetSolve
http://icl.cs.utk.edu/netsolve/
RPC-based approach
Clients
Include a set of APIs called as (asynchronous) RPCs
Agents
Match client’s requests for services with servers
Servers
Encapsulate remotely accessed numerical libraries
[Figure: NetSolve architecture. Labels: client, agent, choice, request, reply, network of servers, scalar server, MPP servers]
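The client/agent/server interaction described above can be sketched as follows. This is not NetSolve's API: the classes Server and Agent, the load values, and the "dot" service are hypothetical, and a local thread stands in for a remote server so that the asynchronous call pattern is visible.

```python
from concurrent.futures import ThreadPoolExecutor


class Server:
    """Hypothetical server wrapping a small numerical library."""

    def __init__(self, name, load, services):
        self.name = name
        self.load = load          # assumed load metric reported to the agent
        self.services = services  # service name -> callable
        self._pool = ThreadPoolExecutor(max_workers=1)

    def submit(self, service, *args):
        """Start the call and return a future immediately (asynchronous RPC)."""
        return self._pool.submit(self.services[service], *args)


class Agent:
    """Matches a client's service request with a suitable server."""

    def __init__(self, servers):
        self.servers = servers

    def choose(self, service):
        offering = [s for s in self.servers if service in s.services]
        if not offering:
            raise LookupError("no server offers " + service)
        return min(offering, key=lambda s: s.load)  # least-loaded server


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


agent = Agent([
    Server("scalar1", 0.7, {"dot": dot}),
    Server("mpp1", 0.2, {"dot": dot}),
])
server = agent.choose("dot")                           # agent replies with a choice
pending = server.submit("dot", [1, 2, 3], [4, 5, 6])   # non-blocking request
rpc_result = pending.result()                          # client waits for the reply
```

Between submit and result the client is free to do other work, which is the point of offering the asynchronous form of the call.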
Legion
http://legion.virginia.edu/
Legion classes
Act as managers and make policy
Core objects
Provide mechanisms that classes use to implement policies: hosts
(processors), vaults (memory), context, binding agents, etc.
Per-program scheduling
Participating sites can assure their local policies.
Users can choose a scheduling policy.
[Figure: Legion scheduling. Labels: Prog, request, Scheduler, Enactor, reserve, search, resource database (collection), Class, Host, tty, Resources; names converted to Legion object IDs by context objects; IDs converted to Legion object addresses by binding agents]
Condor
http://www.cs.wisc.edu/condor/
[Figure legend]
A: User’s local agent
R: Each computer resource
M: Central manager
I/O forwarded to a user’s home
AgentTeamwork at UWB
Architecture
Paper Review by Students
Globus
Legion
Condor
NetSolve
Discussions
What programming or execution model is each system
based on?
What resource allocation and scheduling algorithm does
each system use?
Are they fault-tolerant?
Do they provide any special security features of their own?