How to Program Thousands of Processors in a Massive Compute Grid
Curt Harpold
Sr. SPARC Technical Specialist
Sun Microsystems
1
What is Grid Computing?
[Figure: "Grid Computing" surrounded by related industry terms: Autonomic Computing, On-Demand, Organic IT, Dynamic Systems Initiative, Real Time Enterprise, Enterprise Grids, Adaptive Enterprise, Adaptive Infrastructure, Utility Computing, Dynamic IT]
2
Grid Computing Defined
“The combination of distributed
resources with a corresponding
management infrastructure hosting at
least one type of service or
workload...”
Fritz Ferstl
Director, Sun Grid Engine
Sun Microsystems, Inc.
3
What is Grid Computing?
• The network is the computer™
> Distributed resources
> Management infrastructure
> Targeted service or workload
• Utilization & performance ↑, costs & complexity ↓
• Examples:
> Rendering and simulation “farms”
> Aggregating desktops for computation, aka cycle stealing
– e.g. SETI@Home
> Managing an entire rack from a single interface
4
Compute Grids
• Solving problems horizontally
> High Performance [Technical] Computing
> Data center optimization
• Examples:
> EDA
> Modeling
> Transaction validation
• Increasing utilization
> 15%-25% is typical
> Cycle stealing
5
Dynamic Resource Management
[Diagram labels: Distributed Resource Manager; Jobs; Dispatch; Results]
6
Grid-enabled Applications
• Applications which take advantage of the grid
> Integrated
• Embarrassingly parallel
> Completely independent tasks
• Or just parallel
> Tasks that need to talk to each other
• Or something that can live on the grid...
7
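The embarrassingly parallel case can be sketched in a few lines of Python. This is a minimal illustration, assuming the work is submitted as a Grid Engine array job (for example with `qsub -t 1-8`), where each task reads its own index from the `SGE_TASK_ID` environment variable. The helper name `slice_for_task` and the list of frames are hypothetical.

```python
import os

def slice_for_task(items, task_id, task_count):
    """Return the items that array task `task_id` (1-based) should process."""
    if not 1 <= task_id <= task_count:
        raise ValueError("task_id must be between 1 and task_count")
    # Round-robin split: task 1 takes items 0, n, 2n, ...; task 2 takes 1, n+1, ...
    return items[task_id - 1::task_count]

if __name__ == "__main__":
    # Inside an array job SGE_TASK_ID holds the task index; fall back to 1
    # so the script also runs outside the grid.
    raw = os.environ.get("SGE_TASK_ID", "1")
    task_id = int(raw) if raw.isdigit() else 1
    frames = ["frame%03d" % i for i in range(20)]  # hypothetical work items
    for frame in slice_for_task(frames, task_id, task_count=8):
        print("task", task_id, "renders", frame)
```

Because no task needs anything from another task, every slice can run on a different host with no communication between them.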
Typical Data Center
No Virtualization
• Every machine managed individually
• Disproportionate number of admins
• No clear accounting or accountability
• 10%-25% utilization
8
Data Center Virtualization
Compute Grid
• Users interact with grid, not individual machines
• Increased utilization
• Increased availability
9
SGE In a Nutshell
• Distributed Resource Manager
> Matches workload* to resources
> Optimizes resource utilization
• Loves heterogeneity
> Solaris, Linux, Windows, MacOS, AIX, HP-UX...
• Why SGE?
> Very feature competitive
> Very cost competitive
> Credible open source & product offering
> Thriving community
* Best with jobs > ~30s
10
Architecture Overview
[Diagram: the clients (qsub, qrsh, qlogin, qmon, qtcsh, and DRMAA applications) and ARCo connect to the Qmaster and Scheduler, which manage several Execution Daemons]
11
Example Configuration
[Diagram: jobs occupy slots in queue instances on hosts]
• Queues: all.q, sparc, amd, smp, blast
• Hosts: solsparc1, solsparc2, solsparc3, solamd1, solamd2
• Host groups: @allhosts, @dev, @prod
12
Resource Matching
[Diagram: a job passes through two phases, Selection and Scheduling]
• User: groups, roles, departments, projects
• Inputs: user policies, job policies, resources, system characteristics, system status
13
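The two phases named on the Resource Matching slide, selection and scheduling, can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `Host` and `Job` classes, their fields, and the "least loaded host wins" rule are all hypothetical and much simpler than what Grid Engine's scheduler actually evaluates. The host names are taken from the example configuration.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    arch: str
    free_slots: int
    load: float

@dataclass
class Job:
    name: str
    slots: int
    requests: dict = field(default_factory=dict)  # e.g. {"arch": "sparc"}

def select_hosts(job, hosts):
    """Selection: keep only hosts that satisfy the job's requests."""
    return [h for h in hosts
            if h.free_slots >= job.slots
            and all(getattr(h, key, None) == value
                    for key, value in job.requests.items())]

def schedule(job, hosts):
    """Scheduling: choose the least loaded eligible host, or None."""
    eligible = select_hosts(job, hosts)
    return min(eligible, key=lambda h: h.load).name if eligible else None

hosts = [Host("solsparc1", "sparc", 4, 0.9),
         Host("solsparc2", "sparc", 2, 0.1),
         Host("solamd1", "amd", 8, 0.0)]
print(schedule(Job("blast", slots=2, requests={"arch": "sparc"}), hosts))
```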
Data Management
• No explicit data management
• Shared File System
> NFS “by default”
• Script files are transferred
> Binaries are not
• File Staging
> Copy data in before job
> Copy results out after
> Not inherent feature
– Configured via scripting hooks
14
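File staging follows a copy-in, run, copy-out pattern. The sketch below shows only that pattern in plain Python; it is not a Grid Engine API, and the function name `run_staged` and its callback signature are hypothetical. A real deployment would place equivalent logic in the scripting hooks the slide mentions.

```python
import shutil
import tempfile
from pathlib import Path

def run_staged(input_path, output_path, work):
    """Copy the input to local scratch, run `work`, copy the result back."""
    input_path, output_path = Path(input_path), Path(output_path)
    with tempfile.TemporaryDirectory() as scratch:
        local_in = Path(scratch) / input_path.name
        local_out = Path(scratch) / output_path.name
        shutil.copy(input_path, local_in)    # stage in before the job
        work(local_in, local_out)            # the job works on local copies
        shutil.copy(local_out, output_path)  # stage out after the job
    return output_path
```

The job body only ever sees local paths, so it behaves the same whether or not the original data lives on a shared file system.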
Security
• Access Control Lists
> Explicitly allow or disallow
> Users and groups
• Restricted operations
> Managers and operators
> Submit and admin hosts
• Certificate-based encryption
> Hides and protects data
> Guarantees identity
• Replace rsh/rlogin/telnet with ssh
15
Maximum Flexibility
• Almost every behavior can be configured
• Resources
> Load sensors
• Hierarchical
> Hosts, host groups, queues, etc.
> Users, user groups, departments, projects, etc.
• Script-based integration points
> Suspend/resume
> Job execution
> Checkpointing, Parallel Environments
16
Policies and Priorities
User 1 Project C
Team B
Enterprise-wide
Resource Demand
User 2
Department 1 Project A Contractor X
Department 2
Departmental
Resource Access
Department 3
Department 5 Department 4
17
100% Scriptable
• Equivalence of GUI and command-line tools
> Most administrators do not use the GUI
• Non-interactive alternatives
> Instead of launching an editor
• Simple output text
> Easily parsed
• XML output
> Schema provided
• Rich ecosystem of tools
18
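As an illustration of how the XML output lends itself to scripting, here is a small Python sketch that lists jobs from a job-status document. The sample XML is hand-written and only loosely modelled on what `qstat -xml` prints; the element names are an assumption and should be checked against the schema shipped with the product.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; element names are assumed, not copied from real output.
SAMPLE = """<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>1</JB_job_number>
      <JB_name>BLAST</JB_name>
      <JB_owner>dant</JB_owner>
    </job_list>
  </queue_info>
  <job_info>
    <job_list state="pending">
      <JB_job_number>3</JB_job_number>
      <JB_name>reseq.sh</JB_name>
      <JB_owner>andy</JB_owner>
    </job_list>
  </job_info>
</job_info>"""

def list_jobs(xml_text):
    """Return one dict per <job_list> element, in document order."""
    root = ET.fromstring(xml_text)
    return [{"id": int(node.findtext("JB_job_number")),
             "name": node.findtext("JB_name"),
             "owner": node.findtext("JB_owner"),
             "state": node.get("state")}
            for node in root.iter("job_list")]

for job in list_jobs(SAMPLE):
    print(job["id"], job["name"], job["owner"], job["state"])
```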
Ease of Use
• No configuration files
• Full equivalence of GUI and command line
• Changes do not require restart
• No client-side installation
> Assuming a shared file system...
• Simple install
> Even complex installation in minutes
> Can be fully automated
– Including remote access
19
Scalability
• Sun Grid Engine 6.1 target:
> 10k+ hosts (hosts ≤ CPUs)
> 500k+ jobs (no task limit)
• Sun Grid Engine 6.1:
> Job round-trip 0.4s
– Mostly fork and exec
> Submit rate >120 Jobs/sec
– Using DRMAA
• Sun Grid Engine 6.2 target:
> 90k+ cores
20
Sophisticated Scheduler
• Align resource usage with business policies
> Historical usage tracking
> Time-based priorities
> Resource-based priorities
> Fine-grained quotas
• Maximize utilization
> Hardware and software
• Dynamic, continuously evaluated
> Changes take effect immediately
– No restart
21
Accounting and Reporting
• ARCo: Accounting and Reporting Console
> Fine-grained resource accounting
– Stored in RDBMS in well-defined schema
– Standard SQL access for 3rd party tools
– Customizable and extensible
> Web-based console tool
– Generate reports, queries, etc.
– Customizable queries and report formats
– Spreadsheet report generation for offline analysis
22
Distributed Resource Management Application API (DRMAA)
• Standard from the Open Grid Forum
> Submit, monitor, control jobs
> Language & platform agnostic
• ISVs
> “Grid-enable” their applications
> Avoid DRM/Grid system lock-in
• In-house developers
> Integrate Grid tasks into workflow, orchestration, online
apps, etc.
23
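The submit, monitor, and control cycle can be sketched as follows. The sketch assumes the DRMAA 1.0 Python binding (the `drmaa` package) and uses its method names, but takes the session as a parameter so that the function itself has no dependency on a running grid. The helper name `run_and_wait` is hypothetical.

```python
def run_and_wait(session, command, args, timeout=-1):
    """Submit one job through a DRMAA session and block until it finishes."""
    template = session.createJobTemplate()
    try:
        template.remoteCommand = command
        template.args = list(args)
        job_id = session.runJob(template)
        # A timeout of -1 means "wait forever" in DRMAA 1.0.
        info = session.wait(job_id, timeout)
        return job_id, info.exitStatus
    finally:
        # Always release the template, even if submission fails.
        session.deleteJobTemplate(template)

# Usage on a submit host with the drmaa package installed (assumption):
#   import drmaa
#   session = drmaa.Session()
#   session.initialize()
#   print(run_and_wait(session, "/bin/sleep", ["5"]))
#   session.exit()
```

Because the code talks only to the DRMAA session, the same application logic should work against any DRM system that provides a DRMAA implementation, which is the lock-in argument made on the slide.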
User Interfaces
[Diagram: user interfaces to Sun Grid Engine]
• Browser (accounting)
• Graphical
• Command-line
• Programmatic (DRMAA): C and Java
24
Utility Computing
• Everything gets logged
> All events – job, host, queue, etc.
> Usage information
> Projects, accounts, departments
• Accounting file
> qacct -j job_id
• Reporting file
> DBWriter → ARCo
• Core to the Sun Grid Compute Utility
25
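Accounting records are plain text and easy to aggregate. The Python sketch below parses records shaped like `qacct -j` output (one "key value" pair per line, records separated by a line of "=" characters) and totals wall-clock time per owner. The sample text, including every number in it, is invented for illustration, and the exact field list varies by version.

```python
# Invented sample records; real qacct output contains many more fields.
SAMPLE = """\
==============================================================
qname        all.q
hostname     solsparc1
owner        dant
jobname      BLAST
jobnumber    1
ru_wallclock 120
==============================================================
qname        all.q
hostname     solamd1
owner        andy
jobname      reseq.sh
jobnumber    3
ru_wallclock 45
"""

def parse_qacct(text):
    """Split the text into one dict per accounting record."""
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("===="):      # separator starts a new record
            if current:
                records.append(current)
            current = {}
        elif line.strip():
            key, _, value = line.partition(" ")
            current[key] = value.strip()
    if current:
        records.append(current)
    return records

def wallclock_by_owner(records):
    """Sum the ru_wallclock field (seconds) for each owner."""
    totals = {}
    for record in records:
        owner = record["owner"]
        totals[owner] = totals.get(owner, 0) + int(record["ru_wallclock"])
    return totals

print(wallclock_by_owner(parse_qacct(SAMPLE)))
```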
How to Get Grid Engine
• Sun Grid Engine – Licensed Product
> Support and Customer Indemnification
> Limited platforms
• Grid Engine Open Source Project
> Same source tree as Sun Grid Engine
> Runs on almost anything
> Supported by the Community
> Free
• Sun Download Center
> Same as Licensed Product, for Free
> Add support contract later
> Ultimate Try and Buy
26
Open Source Project
• Foundation for Sun Grid Engine
> Development happens in open source
• Very widely adopted – strong community
> Active mailing lists
– Monitored by the development engineers
• Licensed under SISSL
• http://gridengine.sunsource.net/
• http://gridengine.info/
> By the community, for the community
27
6.1 Supported Platforms
• Master host and compute host
> Solaris 8, 9, 10 on SPARC
> Solaris 9, 10 on x86
> Solaris 10 on x64
> Linux kernel 2.4-2.6 on x86/x64/ia64 (glibc >= 2.3.2)
• Compute host only
> Windows 2000/XP Pro, 2000/2003 Server
> Mac OS X 10.4 on PPC/x86
> AIX 5.1, 5.3
> HP-UX 11.0+ (32 & 64 bit)
> Irix 6.5
28
What's New?
• Sun Grid Engine 6.2
> Out since May
> Advance reservations
> Improved interactive job support
> Improved scalability
– 63k cores
– Streamlined communications
– Tighter memory management
> Scheduler thread
> Array task dependencies
> JMX API
29
Advance Reservation
• Enables users to schedule compute time
> Schedule around people, places, and data
• Just like calling a restaurant:
> “I'd like a table for 4 at 6:00PM on Tuesday, and we'll
need a booster seat.”
> “I'd like 4 nodes at 6:00PM on Tuesday for 2 hours, and
I'll need the Boost library.”
• Users can create and delete their own reservations
> The scheduler makes sure everything fits
30
Advance Reservation Details
• Jobs can be submitted to a reservation
> Scheduled within reservation boundaries
> Terminated when reservation ends
• Reservations can be shared
> Multiple users and/or groups
> Declared when making reservations
• Backfill before reservation
> Jobs with run time limits
• New tools for reservation administration
> qrsub, qrstat, qrdel
31
Advance Reservation Scheduling
• All requested resources must be available
• All allowed users must have access
• Unbounded jobs
> Have no soft or hard run time limit, or an infinite default duration
> Cannot be scheduled before an advance reservation
> An advance reservation cannot be scheduled after one
• Reserved resources can be used as desired
> May be shared by multiple users
> May be used for multiple jobs
> May go unused
32
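The backfill rule described on these slides can be written as a single check. This is a simplified model, not Grid Engine's scheduler code: it assumes a job's run time limit is known up front and represents an unbounded job by `None`. The function name `can_backfill` is hypothetical.

```python
from datetime import datetime, timedelta

def can_backfill(now, run_time_limit, reservation_start):
    """True if a job started at `now` must finish before the reservation."""
    if run_time_limit is None:
        # Unbounded job: it could still be running when the reservation starts.
        return False
    return now + run_time_limit <= reservation_start

start = datetime(2008, 7, 27, 14, 0)   # hypothetical reservation start
now = datetime(2008, 7, 27, 13, 0)
print(can_backfill(now, timedelta(minutes=45), start))
print(can_backfill(now, None, start))
```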
Advance Reservation Example
• % qrsub -a 07271400 -d 0:30:0 -l arch=sol-sparc64 -pe mpi 16 -u dant,andy,@prgeng
• Your advance reservation 1 has been granted
• % qsub -pe mpi 4 -ar 1 blast1.csh
• Your job 1 ("BLAST") has been submitted
• % qsub -t 1-8 -ar 1 reseq.sh
• Your job 3 ("reseq.sh") has been submitted
33
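When reservations are created from a script, the `qrsub` arguments can be assembled programmatically. The Python sketch below rebuilds the argument list of the example above, using only the options that appear on the slide (`-a`, `-d`, `-l`, `-pe`, `-u`). The helper name `qrsub_args` is hypothetical, and the year in the usage line is arbitrary because the `MMDDhhmm` form does not include it.

```python
from datetime import datetime, timedelta

def qrsub_args(start, duration, pe, slots, users, resources=None):
    """Build an argument list for qrsub; nothing is executed here."""
    total = int(duration.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    args = ["qrsub",
            "-a", start.strftime("%m%d%H%M"),              # MMDDhhmm
            "-d", "%d:%d:%d" % (hours, minutes, seconds)]  # h:m:s
    for name, value in (resources or {}).items():
        args += ["-l", "%s=%s" % (name, value)]
    args += ["-pe", pe, str(slots), "-u", ",".join(users)]
    return args

args = qrsub_args(datetime(2008, 7, 27, 14, 0), timedelta(minutes=30),
                  "mpi", 16, ["dant", "andy", "@prgeng"],
                  {"arch": "sol-sparc64"})
print(" ".join(args))
```

The resulting list can be handed to `subprocess.run` on a submit host; building a list rather than a single string avoids shell quoting problems.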
Streamlined Communications
[Diagram: the clients (qsub, qrsh, qlogin, qmon, qtcsh, and DRMAA applications) and ARCo communicate with the qmaster, which is shown together with the Scheduler, the Shadow Master, and several Execution Daemons]
34
Scheduler As a Thread
[Diagram: the architecture overview again, with the Scheduler drawn inside the Qmaster; clients (qsub, qrsh, qlogin, qmon, qtcsh, and DRMAA applications), ARCo, and several Execution Daemons connect to the Qmaster]
35
Sun Grid Engine Multi-Clustering
[Diagram: Sun Grid Engine grid #1, Sun Grid Engine grid #2, and a Spare Pool sit on top of the Service Domain Manager; speech bubbles read “I need resources” and “I have 2 free”]
36
Sun Grid Engine Multi-Clustering
[Diagram: Sun Grid Engine grid #1 and Sun Grid Engine grid #2 sit on top of the Service Domain Manager; speech bubbles read “I still need resources” and “I can spare some”]
37
Sun Grid Engine Multi-Clustering
• Grids are monitored by Service Level Objectives
• Policies control relative grid priorities
[Diagram: Sun Grid Engine grid #1 and Sun Grid Engine grid #2 on top of the Service Domain Manager]
38
Multi-Clustered Accounting
• Multiple grids can use the same ARCo database
> All accounting data available from the same web interface
[Diagram: Sun Grid Engine grid #1 and Sun Grid Engine grid #2 both feed ARCo]
39
Job Dependencies Before 6.2
• Jobs can declare a dependency list
> Cannot be scheduled until all dependencies finish
• Parametric jobs work the same way
> No tasks can start until all tasks in dependencies finish
[Diagram: Job 1, Job 2, Job 3]
40
Array Job Interdependencies
• Parametric job tasks can depend on other tasks
• Task dependencies can be “chunked”
> One task can depend on several
> Several tasks can depend on one
[Diagram: Job 1, Job 2, Job 3]
41
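Chunked task dependencies can be pictured with a toy model in Python. The assumptions here are that each array job covers the same index range, that a task with id `t` and step `s` covers indexes `t` through `t + s - 1`, and that a task depends on every predecessor task whose range overlaps its own. This illustrates the idea only and is not Grid Engine's implementation; the function names are hypothetical.

```python
def task_ids(first, last, step):
    """Task ids of an array job, as in a range of the form first-last:step."""
    return list(range(first, last + 1, step))

def depends_on(b_task, b_step, a_first, a_last, a_step):
    """Ids of job A's tasks whose index range overlaps that of B's task."""
    b_lo, b_hi = b_task, b_task + b_step - 1
    return [a for a in task_ids(a_first, a_last, a_step)
            if a <= b_hi and a + a_step - 1 >= b_lo]

# One task depends on several: B runs in chunks of 4, A in single steps.
print(depends_on(5, 4, 1, 8, 1))
# Several tasks depend on one: A runs in chunks of 4, B in single steps.
print(depends_on(6, 1, 1, 8, 4), depends_on(7, 1, 1, 8, 4))
```

Under this model a task in the later job can start as soon as its own predecessors finish, instead of waiting for the whole earlier job as in the pre-6.2 behaviour.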
Array Job Interdependencies
• Provides more flexibility
> Translating processes to job workflows
> Handling multi-step jobs
• Contribution by Rising Sun Pictures
> Charlotte's Web, Harry Potter, Blood Diamond, etc.
> More cost effective to contribute than switch DRMs
> Promoting Grid Engine as foundation for open source special
effects generation platform
42
Other 6.2 Features
• Improved scalability (up to 90k cores)
> Streamlined internal protocols
> Scheduler as a thread in the qmaster
• ARCo scalability enhancements
• Java™ Virtual Machine as a qmaster thread
> JGDI via JMX
• Driven by TACC
43
For the Price Of a Cup of Coffee...
• You too can make a difference
• Docs on wikis.sun.com
> Best practices on wiki.gridengine.info
• Open source at http://gridengine.sunsource.net
• Users alias at users@gridengine.sunsource.net
44
How to Program Thousands of Cores
Curt Harpold
curt.harpold@sun.com
45