How to Program Thousands of Processors in a Massive Compute Grid
● Curt Harpold
● Sr. SPARC Technical Specialist
● Sun Microsystems

                                  1
What is Grid Computing?
[Figure: word cloud of overlapping terms surrounding "Grid Computing": Organic IT, Autonomic Computing, On-Demand, Dynamic Systems Initiative, Real Time Infrastructure, Enterprise Grids, Adaptive Enterprise, Utility Computing, Dynamic IT]

                                                                  2
Grid Computing Defined

 “The combination of distributed resources with a corresponding management infrastructure hosting at least one type of service or workload...”
                              Fritz Ferstl
                    Director, Sun Grid Engine
                      Sun Microsystems, Inc.


                                                3
What is Grid Computing?
• The network is the computer™
  > Distributed resources
  > Management infrastructure
  > Targeted service or workload
• Utilization & performance ↑, costs & complexity ↓
• Examples:
  > Rendering and simulation “farms”
  > Aggregating desktops for computation, aka cycle stealing
     – e.g. SETI@Home
  > Managing an entire rack from a single interface
                                                               4
Compute Grids
• Solving problems horizontally
  > High Performance [Technical] Computing
  > Data center optimization
• Examples:
  > EDA
  > Modeling
  > Transaction validation
• Increasing utilization
  > 15%-25% is typical
  > Cycle stealing
                                             5
Dynamic Resource Management
[Figure: jobs go into a Distributed Resource Manager, which dispatches them for execution; results come back]

                                         6
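The dispatch loop in the diagram can be sketched in a few lines of Python. This is a minimal first-fit model, not Sun Grid Engine's scheduler: the `Host` class and `dispatch` function are hypothetical, and real dispatch also weighs policies, resource requests, and load.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_slots: int


def dispatch(jobs, hosts):
    """First-fit dispatch: each job goes to the first host with a free slot.

    Returns (placements, pending): a job -> host mapping, and the jobs that
    must stay queued until a slot frees up.
    """
    placements = {}
    pending = []
    for job in jobs:
        host = next((h for h in hosts if h.free_slots > 0), None)
        if host is None:
            pending.append(job)
            continue
        host.free_slots -= 1
        placements[job] = host.name
    return placements, pending


hosts = [Host("solsparc1", 2), Host("solamd1", 1)]
placed, waiting = dispatch(["j1", "j2", "j3", "j4"], hosts)
print(placed)   # {'j1': 'solsparc1', 'j2': 'solsparc1', 'j3': 'solamd1'}
print(waiting)  # ['j4']
```

The host names are borrowed from the later example configuration purely for flavor.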
Grid-enabled Applications
• Applications which take advantage of the grid
  > Integrated
• Embarrassingly parallel
  > Completely independent tasks
• Or just parallel
  > Tasks that need to talk to each other
• Or something that can live on the grid...



                                                  7
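Embarrassingly parallel work is the easy case: each task depends only on its own input, so tasks can run in any order on any host. The sketch below uses a local thread pool as a stand-in for the grid's execution hosts. `render_frame` is a hypothetical placeholder for real work; on a grid, each call would typically become one task of an array job instead.

```python
from concurrent.futures import ThreadPoolExecutor


def render_frame(frame: int) -> str:
    # Hypothetical placeholder: reads and writes nothing shared,
    # so every frame can be computed independently of the others.
    return f"frame-{frame:04d}.png"


def run_all(frames, workers: int = 4):
    # pool.map returns results in input order, even if tasks finish out of order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_frame, frames))


print(run_all(range(3)))  # ['frame-0000.png', 'frame-0001.png', 'frame-0002.png']
```

Tasks that need to talk to each other do not fit this pattern; they are usually run as one parallel job (for example an MPI job in a parallel environment) rather than as independent tasks.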
Typical Data Center
No Virtualization
• Every machine managed individually
• Disproportionate number of admins
• No clear accounting or accountability
• 10%-25% utilization
                                                 8
Data Center Virtualization
Compute Grid
• Users interact with the grid, not individual machines
• Increased utilization
• Increased availability

                                                     9
SGE (Sun Grid Engine) in a Nutshell
• Distributed Resource Manager
  > Matches workload* to resources
  > Optimizes resource utilization
• Loves heterogeneity
  > Solaris, Linux, Windows, Mac OS X, AIX, HP-UX...
• Why SGE?
  > Very feature competitive
  > Very cost competitive
  > Credible open source & product offering
  > Thriving community
* Best with jobs longer than ~30 s
                                                 10
Architecture Overview
[Figure: architecture overview. Clients: qsub, qrsh, qlogin, qmon, qtcsh, and applications using DRMAA. Central components: Qmaster, Scheduler, ARCo. Compute side: four Execution Daemons]

                                      11
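Most work reaches the Qmaster through `qsub`, usually as a shell script whose `#$` lines carry embedded `qsub` options. The Python helper below only builds such a script as text. The helper name and its defaults (a 30-minute `h_rt` limit, `/bin/sh`) are assumptions for illustration, while `-S`, `-N`, `-cwd`, `-l h_rt=`, `-t`, and `-pe` are standard `qsub` options.

```python
def job_script(name, command, runtime="00:30:00", tasks=None, pe=None):
    """Return the text of an SGE job script with embedded qsub options.

    tasks: optional array-task range such as "1-8" (becomes -t).
    pe:    optional (environment, slots) tuple such as ("mpi", 4) (becomes -pe).
    """
    lines = [
        "#!/bin/sh",
        "#$ -S /bin/sh",               # shell SGE should use for the script
        f"#$ -N {name}",               # job name shown by qstat
        "#$ -cwd",                     # run in the submission directory
        f"#$ -l h_rt={runtime}",       # hard run time limit
    ]
    if tasks:
        lines.append(f"#$ -t {tasks}")
    if pe:
        env, slots = pe
        lines.append(f"#$ -pe {env} {slots}")
    lines.append(command)
    return "\n".join(lines) + "\n"


print(job_script("reseq", "./reseq.sh $SGE_TASK_ID", tasks="1-8"))
```

Saved as, say, `reseq.job`, the script would be submitted with `qsub reseq.job`; each array task sees its own index in `$SGE_TASK_ID`.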
Example Configuration
[Figure: example configuration relating Jobs, Slots, Queue Instances, Queues, Hosts, and Host Groups. Names shown: all.q, smp, blast, sparc, amd; hosts solsparc1, solsparc2, solsparc3, solamd1, solamd2; host groups @dev, @prod, @allhosts]
                                                                  12
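A queue in Sun Grid Engine spans the hosts and host groups in its host list, and each queue-on-host pairing is a queue instance, named `queue@host`. The sketch below expands a configuration into queue instances. The group memberships (all SPARC hosts in `@dev`, both AMD hosts in `@prod`) and the hosts chosen for `blast` are hypothetical; the figure does not spell them out.

```python
# Hypothetical membership, loosely following the names in the figure.
HOST_GROUPS = {
    "@dev": ["solsparc1", "solsparc2", "solsparc3"],
    "@prod": ["solamd1", "solamd2"],
}
HOST_GROUPS["@allhosts"] = HOST_GROUPS["@dev"] + HOST_GROUPS["@prod"]

# Each cluster queue lists the hosts or host groups it spans.
QUEUES = {
    "all.q": ["@allhosts"],
    "blast": ["@prod"],
}


def queue_instances(queues, host_groups):
    """Expand every queue's host list into queue@host instance names."""
    instances = []
    for queue, targets in queues.items():
        for target in targets:
            # A plain host name is treated as a group of one.
            for host in host_groups.get(target, [target]):
                instances.append(f"{queue}@{host}")
    return instances


for qi in queue_instances(QUEUES, HOST_GROUPS):
    print(qi)
```

Slots are then counted per queue instance, so a job placed in `blast@solamd1` consumes one of that instance's slots.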
Resource Matching
[Figure: resource matching. Selection considers the user, user policies, groups, roles, departments, and projects; the job brings job policies and resource requests; scheduling considers system characteristics, system status, and resources]

                                                                13
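The two halves of the figure can be read as two questions: which job goes next (selection, driven by user, group, and project policies), and where it runs (scheduling, driven by the job's resource requests and the system's state). Here is a minimal sketch under simplifying assumptions: a single numeric `priority` stands in for all policies, and only `arch` and free slots are checked. The dictionaries and field names are hypothetical; `sol-sparc64` and `sol-amd64` follow Sun Grid Engine's architecture strings.

```python
def match(jobs, hosts):
    """Return (job name, host name or None) for the next dispatch decision."""
    # Selection: the pending job with the highest priority goes first.
    job = max(jobs, key=lambda j: j["priority"])
    # Scheduling: keep hosts that meet the job's hard requests...
    eligible = [
        h for h in hosts
        if h["slots_free"] > 0 and h["arch"] == job.get("arch", h["arch"])
    ]
    if not eligible:
        return job["name"], None  # job stays pending
    # ...and prefer the least-loaded one.
    return job["name"], min(eligible, key=lambda h: h["load"])["name"]


jobs = [
    {"name": "sim", "priority": 5},
    {"name": "blast", "priority": 10, "arch": "sol-amd64"},
]
hosts = [
    {"name": "solsparc1", "arch": "sol-sparc64", "load": 0.1, "slots_free": 2},
    {"name": "solamd1", "arch": "sol-amd64", "load": 0.9, "slots_free": 1},
    {"name": "solamd2", "arch": "sol-amd64", "load": 0.3, "slots_free": 1},
]
print(match(jobs, hosts))  # ('blast', 'solamd2')
```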
Data Management
• No explicit data management
• Shared File System
  > NFS “by default”
• Script files are transferred
  > Binaries are not
• File Staging
  > Copy data in before job
  > Copy results out after
  > Not a built-in feature
     – Configured via scripting hooks (e.g. queue prolog/epilog scripts)
                                        14
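Because staging is left to scripting hooks, it helps to see what such a hook does. This Python sketch copies inputs into a private scratch directory, runs the job's work there, and copies named results back. `run_with_staging` and its arguments are hypothetical; in a real setup the copy-in and copy-out halves would live in separate prolog and epilog scripts configured on the queue.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_staging(inputs, work, results_dir):
    """Stage inputs in, run work(scratch_dir) -> output names, stage outputs out."""
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp)
        for src in inputs:
            shutil.copy(src, scratch)                 # stage in
        produced = work(scratch)                      # the job proper, on local copies
        for name in produced:
            shutil.copy(scratch / name, results_dir)  # stage out
    # The scratch directory is removed when the with-block exits, even if work() raises.
    return [results_dir / name for name in produced]
```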
Security
• Access Control Lists
  > Explicitly allow or disallow
  > Users and groups
• Restricted operations
  > Managers and operators
  > Submit and admin hosts
• Certificate-based encryption
  > Hides and protects data
  > Guarantees identity
• Replace rsh/rlogin/telnet with ssh
                                       15
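The allow/deny rule for access lists can be sketched as a small predicate. It follows our understanding of the queue `user_lists` / `xuser_lists` semantics: a deny match wins, and no allow lists means everyone else is allowed. `@group` entries name Unix groups, as in Sun Grid Engine access lists. The function and example lists are hypothetical; check the `queue_conf` and `access_list` man pages for the exact rules in your release.

```python
def has_access(user, groups, allow_lists, deny_lists):
    """allow_lists / deny_lists: lists of access lists, each a set of names.

    Entries are user names or "@group" for Unix groups.
    """
    principals = {user} | {f"@{g}" for g in groups}
    if any(principals & acl for acl in deny_lists):
        return False              # a deny match overrides any allow match
    if not allow_lists:
        return True               # no allow lists: open to everyone not denied
    return any(principals & acl for acl in allow_lists)


prgeng = {"@prgeng", "dant"}
contractors = {"andy"}
print(has_access("andy", ["prgeng"], [prgeng], [contractors]))  # False
```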
Maximum Flexibility
• Almost every behavior can be configured
• Resources
  > Load sensors
• Hierarchical
  > Hosts, host groups, queues, etc.
  > Users, user groups, departments, projects, etc.
• Script-based integration points
  > Suspend/resume
  > Job execution
  > Checkpointing, Parallel Environments
                                                      16
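Load sensors are the main way to add site-specific resources. As we understand the protocol, the execution daemon starts the sensor once. Each load interval, it writes a line to the sensor's stdin (or `quit` to stop it) and reads back a block framed by `begin` and `end`, with one `host:name:value` line per value. The sketch below follows that shape in Python. The `scratch_free_gb` metric is a hypothetical example, and such a value would also need to be defined as a complex before the scheduler can use it.

```python
import shutil
import socket
import sys


def report(host, values):
    """Format one load report: begin, host:name:value lines, end."""
    lines = ["begin"]
    lines += [f"{host}:{name}:{value}" for name, value in values.items()]
    lines.append("end")
    return "\n".join(lines) + "\n"


def measure():
    # Hypothetical metric: free space in /tmp, in whole gigabytes.
    return {"scratch_free_gb": shutil.disk_usage("/tmp").free // 2**30}


def main(stdin=sys.stdin, stdout=sys.stdout, host=None):
    host = host or socket.gethostname()
    for line in stdin:               # one request per load interval
        if line.strip() == "quit":
            break
        stdout.write(report(host, measure()))
        stdout.flush()               # execd is waiting on this pipe
```

A deployed sensor script would simply end with a call to `main()`.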
Policies and Priorities

[Figure: enterprise-wide resource demand versus departmental resource access, with competing consumers: Departments 1-5, User 1, User 2, Project A, Project C, Team B, Contractor X]

                                                                                         17
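One policy behind this picture is the share tree. Each node's entitlement is split among its children in proportion to their shares, so a leaf's entitlement is the product of its ratios along the path. The recursive sketch below computes those target fractions for a hypothetical tree built from names in the figure. The share numbers are invented, and the real policy also compares targets against decayed historical usage.

```python
def entitlements(tree, portion=1.0):
    """tree: {name: (shares, subtree)}; an empty subtree marks a leaf.

    Returns {leaf name: fraction of the whole grid it is entitled to}.
    """
    total = sum(shares for shares, _ in tree.values())
    result = {}
    for name, (shares, subtree) in tree.items():
        part = portion * shares / total
        if subtree:
            result.update(entitlements(subtree, part))
        else:
            result[name] = part
    return result


# Hypothetical shares.
tree = {
    "Department 1": (60, {"Project A": (1, {}), "Project C": (1, {})}),
    "Department 2": (40, {"User 1": (3, {}), "User 2": (1, {})}),
}
for leaf, share in entitlements(tree).items():
    print(f"{leaf}: {share:.0%}")
```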
100% Scriptable
• Equivalence of GUI and command-line tools
  > Most administrators do not use the GUI
• Non-interactive alternatives
  > Instead of launching an editor
• Simple output text
  > Easily parsed
• XML output
  > Schema provided
• Rich ecosystem of tools
                                              18
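Because `qstat -xml` emits structured output, a status report needs only a standard XML parser. The sketch below parses an abbreviated, hypothetical sample. It follows the shape we understand `qstat -xml` to produce, with running jobs under `queue_info` and pending jobs under a nested `job_info`. Check the element names against the schema shipped with your release.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>1</JB_job_number>
      <JB_name>BLAST</JB_name>
      <JB_owner>dant</JB_owner>
      <state>r</state>
    </job_list>
  </queue_info>
  <job_info>
    <job_list state="pending">
      <JB_job_number>3</JB_job_number>
      <JB_name>reseq.sh</JB_name>
      <JB_owner>andy</JB_owner>
      <state>qw</state>
    </job_list>
  </job_info>
</job_info>"""


def jobs_from_qstat_xml(xml_text):
    """Return (job number, name, owner, state) for every job_list element."""
    root = ET.fromstring(xml_text)
    return [
        (int(j.findtext("JB_job_number")), j.findtext("JB_name"),
         j.findtext("JB_owner"), j.findtext("state"))
        for j in root.iter("job_list")  # document order: running, then pending
    ]


for job in jobs_from_qstat_xml(SAMPLE):
    print(job)
```

On a submit host, the input would come from something like `subprocess.run(["qstat", "-xml"], capture_output=True, text=True).stdout`.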
Ease of Use
•   No configuration files
•   Full equivalence of GUI and command line
•   Changes do not require restart
•   No client-side installation
    > Assuming a shared file system...
• Simple install
    > Even complex installations in minutes
    > Can be fully automated
       – Including remote access

                                               19
Scalability
• Sun Grid Engine 6.1 target:
  > 10k+ hosts (hosts ≤ CPUs)
  > 500k+ jobs (no task limit)
• Sun Grid Engine 6.1:
  > Job round-trip 0.4s
     – Mostly fork and exec
  > Submit rate >120 Jobs/sec
     – Using DRMAA
• Sun Grid Engine 6.2 target:
  > 90k+ cores

                                 20
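Here is a back-of-the-envelope reading of these figures, taken at face value. It assumes a steady submit rate and a fixed round trip per job; the helper names are hypothetical.

```python
def submission_minutes(jobs, submit_rate_per_s):
    """Time to submit `jobs` at a steady rate, in minutes."""
    return jobs / submit_rate_per_s / 60


def overhead_fraction(round_trip_s, job_runtime_s):
    """Share of total wall time spent on the dispatch round trip."""
    return round_trip_s / (round_trip_s + job_runtime_s)


print(f"{submission_minutes(500_000, 120):.0f} min")   # 69 min
print(f"{overhead_fraction(0.4, 30):.1%}")             # 1.3%
print(f"{overhead_fraction(0.4, 1):.1%}")              # 28.6%
```

The last two lines show why the earlier advice favors jobs longer than about 30 s: by that length, the fixed round trip is a small fraction of each job's wall time.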
Sophisticated Scheduler
• Align resource usage with business policies
  >   Historical usage tracking
  >   Time-based priorities
  >   Resource-based priorities
  >   Fine-grained quotas
• Maximize utilization
  > Hardware and software
• Dynamic, continuously evaluated
  > Changes take effect immediately
     – No restart

                                                21
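Historical usage tracking typically discounts old usage, so that yesterday's heavy user is not penalized forever. Sun Grid Engine expresses this with a half-life setting (`halftime`, in hours, in the scheduler configuration). The sketch below shows the exponential decay idea itself and is not the scheduler's exact formula. `decayed_usage` and the sample events are hypothetical.

```python
def decayed_usage(events, now_h, halftime_h):
    """events: (timestamp in hours, usage) pairs.

    Usage recorded one half-life ago counts half, two half-lives ago a quarter.
    """
    return sum(u * 0.5 ** ((now_h - t) / halftime_h) for t, u in events)


week = 168  # hours
history = [(0, 100.0), (week, 100.0)]  # same usage, one week apart
print(decayed_usage(history, now_h=week, halftime_h=week))  # 150.0
```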
Accounting and Reporting
• ARCo: Accounting and Reporting Console
  > Fine-grained resource accounting
     – Stored in RDBMS in well-defined schema
     – Standard SQL access for 3rd party tools
     – Customizable and extensible
  > Web-based console tool
     – Generate reports, queries, etc.
     – Customizable queries and report formats
     – Spreadsheet report generation for offline analysis




                                                            22
Distributed Resource Management Application API (DRMAA)
• Standard from the Open Grid Forum
  > Submit, monitor, control jobs
  > Language & platform agnostic
• ISVs
  > “Grid-enable” their applications
  > Avoid DRM/Grid system lock-in
• In-house developers
  > Integrate Grid tasks into workflow, orchestration, online
    apps, etc.

                                                                23
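DRMAA programs follow a fixed sequence: open a session, describe the job in a template (at minimum the remote command and its arguments), run it, wait for it, and close the session. In the C binding these steps are `drmaa_init`, `drmaa_allocate_job_template`, `drmaa_set_attribute` / `drmaa_set_vector_attribute`, `drmaa_run_job`, `drmaa_wait`, and `drmaa_exit`. To keep the example runnable without a grid, the hypothetical `LocalSession` class below mirrors that sequence but simply runs the command as a local process. It is not a DRMAA binding, and the job template is collapsed into `run_job`'s arguments.

```python
import subprocess
import sys


class LocalSession:
    """Hypothetical, local-only stand-in shaped like a DRMAA session."""

    def __enter__(self):                           # plays the role of drmaa_init
        self._jobs = {}
        self._next_id = 1
        return self

    def run_job(self, remote_command, args=()):    # ~ drmaa_run_job
        job_id = str(self._next_id)
        self._next_id += 1
        self._jobs[job_id] = subprocess.Popen([remote_command, *args])
        return job_id

    def wait(self, job_id):                        # ~ drmaa_wait; returns exit status
        return self._jobs.pop(job_id).wait()

    def __exit__(self, *exc_info):                 # ~ drmaa_exit
        for proc in self._jobs.values():
            proc.wait()                            # reap anything still running
        self._jobs.clear()


with LocalSession() as session:
    job = session.run_job(sys.executable, ["-c", "raise SystemExit(3)"])
    print("job", job, "exited with status", session.wait(job))
```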
User Interfaces
[Figure: Sun Grid Engine accessed through a browser (accounting), graphical, command-line, and programmatic (DRMAA, from C and Java) interfaces]
                                                                 24
Utility Computing
• Everything gets logged
  > All events – job, host, queue, etc.
  > Usage information
  > Projects, accounts, departments
• Accounting file
  > qacct -j job_id
• Reporting file
  > DBWriter → ARCo
• Core to the Sun Grid Compute Utility

                                          25
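`qacct` is the supported way to summarize the accounting file. The file itself is plain text, one colon-separated record per finished job, which makes ad hoc reports easy. The sketch below totals wall-clock seconds per owner. It assumes the classic field order from the `accounting(5)` man page (qname, hostname, group, owner, job_name, job_number, account, priority, submission_time, start_time, end_time, failed, exit_status, ru_wallclock, ...); verify that order against your release. The sample records are hypothetical and truncated after `ru_wallclock`.

```python
from collections import defaultdict

OWNER = 3          # assumed position of the owner field
RU_WALLCLOCK = 13  # assumed position of the wall-clock seconds field


def wallclock_by_owner(lines):
    totals = defaultdict(float)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split(":")
        totals[fields[OWNER]] += float(fields[RU_WALLCLOCK])
    return dict(totals)


records = [
    "all.q:solamd1:staff:dant:BLAST:1:sge:0:1217160000:1217160010:1217160130:0:0:120",
    "all.q:solamd2:staff:andy:reseq.sh:3:sge:0:1217160000:1217160005:1217160065:0:0:60",
    "all.q:solamd1:staff:dant:BLAST:4:sge:0:1217160200:1217160210:1217160240:0:0:30",
]
print(wallclock_by_owner(records))  # {'dant': 150.0, 'andy': 60.0}
```

In practice, the lines would be read from the accounting file, typically `$SGE_ROOT/$SGE_CELL/common/accounting`.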
How to Get Grid Engine
• Sun Grid Engine – Licensed Product
  > Support and Customer Indemnification
  > Limited platforms
• Grid Engine Open Source Project
  >   Same source tree as Sun Grid Engine
  >   Runs on almost anything
  >   Supported by the Community
  >   Free
• Sun Download Center
  > Same as Licensed Product, for Free
  > Add support contract later
  > Ultimate Try and Buy
                                              26
Open Source Project
• Foundation for Sun Grid Engine
  > Development happens in open source
• Very widely adopted – strong community
  > Active mailing lists
     – Monitored by the development engineers
• Licensed under SISSL (Sun Industry Standards Source License)
• http://gridengine.sunsource.net/
• http://gridengine.info/
  > By the community, for the community

                                                27
6.1 Supported Platforms
• Master host
  > Solaris 8, 9, 10 on SPARC
  > Solaris 9, 10 on x86
  > Solaris 10 on x64
  > Linux kernel 2.4-2.6 on x86/x64/ia64 (glibc >= 2.3.2)
• Compute host: all master host platforms, plus
  > Windows 2000/XP Pro, 2000/2003 Server
  > Mac OS X 10.4 on PPC/x86
  > AIX 5.1, 5.3
  > HP-UX 11.0+ (32 & 64 bit)
  > Irix 6.5

                                                                        28
What's New?
• Sun Grid Engine 6.2
  > Out since May
  > Advance reservations
  > Improved interactive job support
  > Improved scalability
     – 63k cores
     – Streamlined communications
     – Tighter memory management
  > Scheduler thread
  > Array task dependencies
  > JMX API
                                    29
Advance Reservation
• Enables users to schedule compute time
  > Schedule around people, places, and data
• Just like calling a restaurant:
  > “I'd like a table for 4 at 6:00PM on Tuesday, and we'll need a booster seat.”
  > “I'd like 4 nodes at 6:00PM on Tuesday for 2 hours, and I'll need the Boost library.”
• Users can create and delete their own reservations
  > The scheduler makes sure everything fits

                                                               30
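“The scheduler makes sure everything fits” is, at its core, an interval check. The simplified sketch below considers only slots on one pool of hosts. A new reservation fits if, at every moment in its window, the slots already reserved plus the new request stay within capacity. Usage only rises where some reservation begins, so checking those instants is enough. The function and the numbers (times in hours, 16 slots) are hypothetical; the real scheduler checks every requested resource.

```python
def fits(existing, new, capacity):
    """existing: (start, end, slots) reservations; intervals are [start, end)."""
    start, end, slots = new

    def in_use(t):
        return sum(n for s, e, n in existing if s <= t < e)

    # Test the new window's start and every existing start inside the window.
    checkpoints = {start} | {s for s, _, _ in existing if start <= s < end}
    return all(in_use(t) + slots <= capacity for t in checkpoints)


booked = [(14, 16, 12)]                          # 12 slots from 14:00 to 16:00
print(fits(booked, (15, 17, 4), capacity=16))    # True: 12 + 4 = 16
print(fits(booked, (13, 15, 8), capacity=16))    # False: 12 + 8 > 16 at 14:00
```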
Advance Reservation Details
• Jobs can be submitted to a reservation
  > Scheduled within reservation boundaries
  > Terminated when reservation ends
• Reservations can be shared
  > Multiple users and/or groups
  > Declared when making reservations
• Backfill before reservation
  > Jobs with run time limits
• New tools for reservation administration
  > qrsub, qrstat, qrdel
                                            31
Advance Reservation Scheduling
• All requested resources must be available
• All allowed users must have access
• Unbounded jobs
  > No soft or hard run time limit, with an infinite default duration
  > Cannot be scheduled before an advance reservation
  > An advance reservation cannot be scheduled after one on the same resources
• Reserved resources are free to use as desired
  > May be shared by multiple users
  > May be used for multiple jobs
  > May go unused
                                                               32
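The backfill and unbounded-job rules above reduce to one comparison. A job may start on resources reserved for later only if its run time limit guarantees it ends before the reservation begins, and a job with no limit can never make that guarantee. Here is a minimal sketch; the function name is hypothetical and times are in seconds.

```python
def can_backfill(now, run_limit, reservation_start):
    """run_limit: the job's run time limit in seconds, or None if unbounded."""
    if run_limit is None:
        return False  # an unbounded job might still be running at reservation start
    return now + run_limit <= reservation_start


print(can_backfill(0, 3600, 7200))     # True: ends at 1:00, reservation at 2:00
print(can_backfill(0, None, 7200))     # False: unbounded
print(can_backfill(5000, 3600, 7200))  # False: would overrun the reservation
```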
Advance Reservation Example
• % qrsub -a 07271400 -d 0:30:0 -l arch=sol-sparc64 -pe mpi 16 -u dant,andy,@prgeng
• Your advance reservation 1 has been granted
• % qsub -pe mpi 4 -ar 1 blast1.csh
• Your job 1 ("BLAST") has been submitted
• % qsub -t 1-8 -ar 1 reseq.sh
• Your job 3 ("reseq.sh") has been submitted


                                                      33
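When reservations are created from scripts, the only fiddly part is the argument format. In the example, `-a 07271400` is a start time written as MMDDhhmm, and `-d` takes a duration as hours:minutes:seconds. The hypothetical Python helper below builds the same command line from `datetime` values; the year 2008 is an arbitrary assumption, since the example gives none. The resulting list could be passed to `subprocess.run` on a submit host.

```python
from datetime import datetime, timedelta


def qrsub_args(start, duration, pe, slots, users, resources=None):
    """Build a qrsub argument list for an advance reservation."""
    hours, rest = divmod(int(duration.total_seconds()), 3600)
    minutes, seconds = divmod(rest, 60)
    args = ["qrsub",
            "-a", start.strftime("%m%d%H%M"),                  # MMDDhhmm
            "-d", f"{hours:02d}:{minutes:02d}:{seconds:02d}"]
    if resources:
        args += ["-l", ",".join(f"{k}={v}" for k, v in resources.items())]
    args += ["-pe", pe, str(slots), "-u", ",".join(users)]
    return args


print(" ".join(qrsub_args(datetime(2008, 7, 27, 14, 0), timedelta(minutes=30),
                          "mpi", 16, ["dant", "andy", "@prgeng"],
                          {"arch": "sol-sparc64"})))
```

The `00:30:00` it produces is the same 30-minute duration the example writes as `0:30:0`.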
Streamlined Communications
[Figure: the architecture overview revisited. Clients (qsub, qrsh, qlogin, qmon, qtcsh, DRMAA applications) connect to a QMaster drawn as several qmaster threads; ARCo, the Scheduler, and a Shadow Master are also shown, with four Execution Daemons]

                                               34
Scheduler As a Thread
[Figure: the architecture overview repeated (clients, Qmaster, Scheduler, ARCo, four Execution Daemons), here illustrating the Scheduler running as a thread of the Qmaster]

                                      35
Sun Grid Engine Multi-Clustering
[Figure: Sun Grid Engine grid #1 ("I need resources"), Sun Grid Engine grid #2, and a spare pool ("I have 2 free"), all under the Service Domain Manager]

                                                                            36
Sun Grid Engine Multi-Clustering
[Figure: Sun Grid Engine grid #1 ("I still need resources") and grid #2 ("I can spare some"), both under the Service Domain Manager]

                                                       37
Sun Grid Engine Multi-Clustering
• Grids are monitored by Service Level Objectives
• Policies control relative grid priorities
[Figure: Sun Grid Engine grid #1 and grid #2 under the Service Domain Manager]

                                                            38
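These three slides describe a negotiation rather than an algorithm, so the sketch below is purely illustrative. It models a hypothetical rule for where the Service Domain Manager takes a host from when a grid misses its Service Level Objective: the spare pool first, then a lower-priority grid with idle hosts. The data layout and the use of numeric priorities are assumptions, not the Service Domain Manager's actual interface.

```python
def host_source(needy, grids, spare_pool):
    """Return where a host for grid `needy` should come from, or None."""
    if spare_pool > 0:
        return "spare pool"
    donors = [
        name for name, g in grids.items()
        if name != needy
        and g["idle_hosts"] > 0
        and g["priority"] < grids[needy]["priority"]
    ]
    # Take from the least important donor first.
    return min(donors, key=lambda name: grids[name]["priority"], default=None)


grids = {
    "grid #1": {"priority": 2, "idle_hosts": 0},  # "I need resources"
    "grid #2": {"priority": 1, "idle_hosts": 3},  # "I can spare some"
}
print(host_source("grid #1", grids, spare_pool=2))  # spare pool
print(host_source("grid #1", grids, spare_pool=0))  # grid #2
print(host_source("grid #2", grids, spare_pool=0))  # None
```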
Multi-Clustered Accounting
• Multiple grids can use the same ARCo database
  > All accounting data available from the same web interface
[Figure: Sun Grid Engine grid #1 and grid #2 both feeding a single ARCo]
                                                                   39
Job Dependencies Before 6.2
• Jobs can declare a dependency list
  > Cannot be scheduled until all dependencies finish
• Parametric (array) jobs work the same way
  > No tasks can start until all tasks in dependencies finish
[Figure: Job 1, Job 2, and Job 3 as a dependency chain]
                                                             40
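Job-level dependencies (declared with `qsub -hold_jid`) form a directed acyclic graph, so the order in which jobs become eligible is a topological order. The sketch below groups jobs into waves with Python's `graphlib` module (3.9+). The job names are hypothetical. Each array job counts as a single node, which is exactly the pre-6.2 limitation: no task of a dependent job starts until every task it depends on has finished.

```python
from graphlib import TopologicalSorter


def release_waves(holds):
    """holds: {job: [jobs it waits for]}. Returns jobs grouped into waves."""
    sorter = TopologicalSorter(holds)
    sorter.prepare()                      # raises CycleError on circular holds
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)               # pretend the whole wave finished
    return waves


print(release_waves({"job2": ["job1"], "job3": ["job2"]}))
# [['job1'], ['job2'], ['job3']]
```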
Array Job Interdependencies
• Parametric job tasks can depend on other tasks
• Task dependencies can be “chunked”
  > One task can depend on several
  > Several tasks can depend on one
[Figure: Job 1, Job 2, and Job 3 with task-level dependencies]
                                                  41
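Chunked task dependencies are a mapping between task IDs. The two hypothetical helpers below show the two shapes the slide names, using 1-based IDs as Sun Grid Engine array tasks do. In the first, each downstream task waits for a consecutive chunk of upstream tasks; in the second, each upstream task releases a chunk of downstream tasks. Grid Engine 6.2 exposes array task dependencies through `qsub -hold_jid_ad`, so check its man page for exactly how task ranges are paired; this sketch illustrates the idea only.

```python
def chunk_in(downstream_tasks, chunk):
    """One task depends on several: task j waits for a block of `chunk` tasks."""
    return {j: list(range((j - 1) * chunk + 1, j * chunk + 1))
            for j in range(1, downstream_tasks + 1)}


def chunk_out(downstream_tasks, chunk):
    """Several tasks depend on one: each block of `chunk` tasks waits for one task."""
    return {j: [(j - 1) // chunk + 1] for j in range(1, downstream_tasks + 1)}


print(chunk_in(2, 3))   # {1: [1, 2, 3], 2: [4, 5, 6]}
print(chunk_out(4, 2))  # {1: [1], 2: [1], 3: [2], 4: [2]}
```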
Array Job Interdependencies
• Provides more flexibility
  > Translating processes into job workflows
  > Handling multi-step jobs
• Contribution by Rising Sun Pictures
  > Charlotte's Web, Harry Potter, Blood Diamond, etc.
  > More cost effective to contribute than to switch DRMs
  > Promoting Grid Engine as the foundation for an open source special effects generation platform



                                                               42
Other 6.2 Features
• Improved scalability (up to 90k cores)
  > Streamlined internal protocols
  > Scheduler as a thread in the qmaster
• ARCo scalability enhancements
• Java™ Virtual Machine as a qmaster thread
  > JGDI via JMX
• Driven by TACC (Texas Advanced Computing Center)



                         Sun Confidential: Internal Only   43
For the Price of a Cup of Coffee...
• You too can make a difference
• Docs on wikis.sun.com
  > Best practices on wiki.gridengine.info
• Open source at http://gridengine.sunsource.net
• Users alias at users@gridengine.sunsource.net




                                                  44
How to Program Thousands of Cores
● Curt Harpold
● curt.harpold@sun.com

                        45

						