                                  Desktop Grids

                                   Ashok Adiga
                       Texas Advanced Computing Center
                            adiga@tacc.utexas.edu




SURA Cyberinfrastructure Workshop Series: Grid Technology: The Rough Guide   December 8 & 9, 2005, Austin, TX
                                             Topics

       • What makes Desktop Grids different?
       • What applications are suitable?
       • Three Solutions:
             – Condor
             – United Devices Grid MP
             – BOINC



          Compute Resources on the Grid
    • Traditional: SMPs, MPPs, clusters, …
          – High speed, Reliable, Homogeneous, Dedicated,
            Expensive (but getting cheaper)
          – High speed interconnects
          – Up to 1000s of CPUs
    • Desktop PCs and Workstations
          – Low Speed (but improving!), Heterogeneous, Unreliable, Non-
            dedicated, Inexpensive
          – Generic network connections (e.g. Ethernet)
          – 1000s-10,000s of CPUs
          – Grid compute power increases as desktops are upgraded


                   Desktop Grid Challenges
    • Unobtrusiveness
          – Harness underutilized computing resources without impacting
            the primary Desktop user
    • Added Security requirements
          – Desktop machines typically not in secure environment
          – Must protect desktop & program from each other (sandboxing)
          – Must ensure secure communications between grid nodes
    • Connectivity characteristics
          – Not always connected to network (e.g. laptops)
          – Might not have fixed identifier (e.g. dynamic IP addresses)
    • Limited Network Bandwidth
          – Ideal applications have high compute to communication ratio
          – Data management is critical to performance
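           As a back-of-the-envelope illustration (assumed numbers,
           not from the original slides): a 1 MB work unit shipped
           over a 1 Mbit/s link takes about 8 seconds to transfer
           (1 MB = 8 Mbit). A work unit that then computes for an
           hour spends well under 1% of its time communicating; one
           that computes for only 10 seconds is dominated by transfer
           and gains little from the grid.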


         Desktop Grid Challenges (cont’d)
        • Job scheduling on heterogeneous, non-
          dedicated resources is complex
             – Must match application requirements to resource
               characteristics
             – Meeting QoS is difficult since program might have
               to share the CPU with other desktop activity
        • Desktops are typically unreliable
             – System must detect & recover from node failure
        • Scalability issues
             – Software has to manage thousands of resources
             – Conventional application licensing is not set up for desktop
               grids

                         Application Feasibility
        • Only some applications map well to Desktop
          grids
             –   Coarse-grain data parallelism
             –   Parallel chunks relatively independent
             –   High computation-data communication ratios
             –   Non-Intrusive behavior on client device
                   • Small memory footprint on the client
                   • I/O activity is limited
              – Executable and data sizes are constrained by the
                available bandwidth


                           Typical Applications
        • Desktop Grids naturally support
          data parallel applications
             – Monte Carlo methods
             – Large Database searches
             – Genetic Algorithms
             – Exhaustive search techniques
             – Parametric Design
             – Asynchronous Iterative algorithms

                                            Condor
 • Condor manages pools of workstations and dedicated
   clusters to create a distributed high-throughput computing
   (HTC) facility.
       – Created at University of Wisconsin
       – Project established in 1985
 • Initially targeted at scheduling clusters providing functions
   such as:
       –   Queuing
       –   Scheduling
       –   Priority Scheme
       –   Resource Classifications
 • And then extended to manage non-dedicated resources
       – Sandboxing
       – Job preemption

                            Why use Condor?
  • Condor has several unique mechanisms such as:
        –   ClassAd Matchmaking
        –   Process checkpoint / restart / migration (sketch below)
        –   Remote System Calls
        –   Grid Awareness
        –   Glideins
  • Support for multiple “Universes”
        – Vanilla, Java, MPI, PVM, Globus, …
  • Very simple to install, manage, and use
        – Natural environment for application developers
  • Free!
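   A minimal sketch of the checkpoint mechanism in use (filenames
   hypothetical): the job is relinked against Condor's checkpoint
   library with condor_compile, then submitted to the standard
   universe.

        $ condor_compile gcc -o myprog myprog.c

        # myprog.sub -- standard-universe submit description (sketch)
        universe   = standard
        executable = myprog
        output     = myprog.out
        error      = myprog.err
        log        = myprog.log
        queue

        $ condor_submit myprog.sub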


                            Typical Condor Pool

     (Original figure: one box per node listing its Condor daemons;
     arrows marked spawned processes and ClassAd communication
     pathways.)

     • Central Manager: master, collector, negotiator (plus schedd
       and startd, so it can also submit and run jobs)
     • Submit-Only node: master, schedd
     • Execute-Only nodes: master, startd
     • Regular Nodes (submit and execute): master, startd, schedd
                              Condor ClassAds
        • ClassAds are at the heart of Condor
        • ClassAds
             – are a set of uniquely named expressions;
               each expression is called an attribute
             – combine query and data
             – semi-structured : no fixed schema
             – extensible



                               Sample ClassAd
          MyType = "Machine"
          TargetType = "Job"
          Machine = "froth.cs.wisc.edu"
          Arch = "INTEL"
          OpSys = "SOLARIS251"
          Disk = 35882
          Memory = 128
          KeyboardIdle = 173
          LoadAvg = 0.1000
          Requirements = TARGET.Owner=="smith" ||
            LoadAvg<=0.3 && KeyboardIdle>15*60
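          For comparison, a job ClassAd that could be matched against
          the machine ad above might look like the following sketch
          (attribute values are illustrative); the negotiator pairs
          ads whose Requirements expressions are mutually satisfied.

          MyType = "Job"
          TargetType = "Machine"
          Owner = "smith"
          Cmd = "/home/smith/a.out"
          Requirements = TARGET.Arch=="INTEL" &&
            TARGET.OpSys=="SOLARIS251" && TARGET.Memory >= 64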


                                Condor Flocking
    • Central managers can allow schedds from other
      pools to submit to them.

       (Figure: the Submit Machine's schedd talks to its own Central
       Manager (CONDOR_HOST) and, with flocking enabled, also to the
       Central Managers of Pool-Foo and Pool-Bar; each Central
       Manager runs a Collector and a Negotiator.)
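     In the Condor configuration, flocking is enabled through the
     FLOCK_TO / FLOCK_FROM settings; a sketch with hypothetical
     hostnames:

        # condor_config on the submit machine
        FLOCK_TO = cm.pool-foo.edu, cm.pool-bar.edu

        # condor_config on each remote pool's central manager
        FLOCK_FROM = submit.mydomain.edu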



      Example: POVray on UT Grid Condor
     (Figure: a rendering that took 2 h 17 min on a single machine is
     split into slices of 5-8 min each; run concurrently on the
     Condor pool, the full image now finishes in about 15 min.)
               Parallel POVray on Condor
    A. Submitting POVray to Condor Pool – Perl Script
         1.    Automated creation of image “slices” (see the sample
               slice .ini file below)
         2.    Automated creation of Condor submit files
         3.    Automated creation of DAG file
         4.    Using DAGMan for job flow control

    B. Multiple Architecture Support
         1. Executable = povray.$$(OpSys).$$(Arch)

    C. Post-processing with a C executable
         1. “Stitching” image slices back together into one image file
         2. Using “xv” to display the image on the user’s desktop
            • Alternatively, transferring the image file back to the
              user’s desktop
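     One auto-generated slice file might look like the sketch below
     (image size and row range are hypothetical); POVray's INI
     options Start_Row and End_Row select the band of the image each
     job renders:

        ; glasschess_0.ini -- render rows 1-60 of the full image
        Input_File_Name=glasschess.pov
        Output_File_Name=glasschess_0.ppm
        Output_File_Type=P        ; PPM output, easy to stitch
        Width=800
        Height=600
        Start_Row=1
        End_Row=60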


         POVray Submit Description File
       Universe = vanilla
       Executable = povray.$$(OpSys).$$(Arch)
       Requirements = (Arch == "INTEL" && OpSys == "LINUX") || \
                        (Arch == "INTEL" && OpSys == "WINNT51") || \
                        (Arch == "INTEL" && OpSys == "WINNT52")
       transfer_files = ONEXIT
       Input = glasschess_0.ini
       Error = Errfile_0.err
       Output = glasschess_0.ppm
       transfer_input_files = glasschess.pov,chesspiece1.inc
       arguments = glasschess_0.ini
       log = glasschess_0_condor.log
       notification = NEVER
       queue
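       Each slice's submit file is queued with condor_submit, e.g.:

            $ condor_submit povray_submit_0.cmd

       In practice the slices are submitted together through DAGMan,
       as shown on the next slides.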

                              DAGMan Job Flow

     (Figure: slice jobs A0, A1, A2, …, An are PARENT nodes; the
     barrier job B is their common CHILD. A PRE script attached to B
     runs once all parents finish, performing the post-processing
     prior to executing Job B.)
               DAGMan Submission Script

     $ condor_submit_dag povray.dag

     # Filename: povray.dag
     Job A0 ./submit/povray_submit_0.cmd
     Job A1 ./submit/povray_submit_1.cmd
     Job A2 ./submit/povray_submit_2.cmd
     Job A3 ./submit/povray_submit_3.cmd
     Job A4 ./submit/povray_submit_4.cmd
     Job A5 ./submit/povray_submit_5.cmd
     Job A6 ./submit/povray_submit_6.cmd
     Job A7 ./submit/povray_submit_7.cmd
     Job A8 ./submit/povray_submit_8.cmd
     Job A9 ./submit/povray_submit_9.cmd
     Job A10 ./submit/povray_submit_10.cmd
     Job A11 ./submit/povray_submit_11.cmd
     Job A12 ./submit/povray_submit_12.cmd
     Job B barrier_job_submit.cmd
     PARENT A0 CHILD B
     PARENT A1 CHILD B
     PARENT A2 CHILD B
     PARENT A3 CHILD B
     PARENT A4 CHILD B
     PARENT A5 CHILD B
     PARENT A6 CHILD B
     PARENT A7 CHILD B
     PARENT A8 CHILD B
     PARENT A9 CHILD B
     PARENT A10 CHILD B
     PARENT A11 CHILD B
     PARENT A12 CHILD B
     Script PRE B postprocessing.sh glasschess

     The barrier job B itself just sleeps; its PRE script does the
     real work:

     #!/bin/sh
     /bin/sleep 1

     postprocessing.sh stitches the slices, cleans up, and displays
     the result with xv:

     #!/bin/sh
     ./stitchppms glasschess > glasschess.ppm 2> /dev/null
     rm *_*.ppm *.ini Err* *.log povray.dag.*
     /usr/X11R6/bin/xv $1.ppm
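     The repetitive povray.dag above is a natural candidate for
     generation; a shell sketch (slice count hypothetical):

        #!/bin/sh
        # Generate povray.dag for slice jobs A0..AN plus barrier job B
        N=12
        > povray.dag
        for i in $(seq 0 $N); do
            echo "Job A$i ./submit/povray_submit_$i.cmd" >> povray.dag
        done
        echo "Job B barrier_job_submit.cmd" >> povray.dag
        for i in $(seq 0 $N); do
            echo "PARENT A$i CHILD B" >> povray.dag
        done
        echo "Script PRE B postprocessing.sh glasschess" >> povray.dag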

                     United Devices Grid MP
 • Commercial product that aggregates unused cycles
   on desktop machines to provide a computing
   resource.
 • Originally designed for non-dedicated resources
       – Security, non-intrusiveness, scheduling, …
        – Screensaver/GUI display on the client desktop
 • Support for multiple clients
       – Windows, Linux, Mac, AIX, & Solaris clients




                                How Grid MP™ Works

   (Figure: users, an administrator, and Grid MP Agents all
   communicate through the central Grid MP Services.)

   • User (web browser interface, command line interface, XML Web
     services API)
        – Submits jobs
        – Monitors job progress
        – Processes results
   • Grid MP Services
        – Authenticates users and devices
        – Dispatches jobs based on priority
        – Monitors and reschedules failed jobs
        – Collects job results
   • Grid MP Agent (on each resource)
        – Advertises capability
        – Launches jobs and executes them securely
        – Returns results
        – Caches data for reuse
        – Clusters take low-latency parallel jobs; servers take large
          sequential jobs; workstations and desktops take large
          data-parallel jobs
                  UD Management Features
        • Enterprise features make it easier to convince
          traditional IT organizations and individual
          desktop users to install the software
             – Browser-based administration tools allow local
               management and policy specification for
                   • Devices
                   • Users
                   • Workloads
             – Single-click install of the client on PCs
                   • Easily customizable to work with software
                     management packages


              Grid MP™ Provisioning Example

   (Figure: the Root Administrator delegates Device Groups X, Y, and
   Z to Device Group Administrators; User Groups A and B consume the
   resources. Each device group carries its own provisioning policy:)

   • Device Group X (business hours): User Group A = 50%, B = 25%;
     usage 8am-5pm, 2-hr cut-off; runnable application list
   • Device Group Y: User Group B = 100%; usage 24 hrs, 1-hr cut-off;
     runnable application list
   • Device Group X (off-hours): User Groups A = 50%, B = 50%;
     usage 6pm-8am, 8-hr cut-off; runnable application list
               Application Types Supported
        •     Batch jobs
             – Use the mpsub command to run a single executable on
               a single remote desktop (illustrative commands below)
        •     MPI jobs
             – Use the ud_mpirun command to run an MPI job across
               a set of desktop machines
        •     Data Parallel jobs
             – Single job consists of several independent
               workunits that can be executed in parallel
             – Application developer must create program
               modules and write application scripts to create
               workunits
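        The invocations below are illustrative shapes only; the exact
        Grid MP command flags are not shown in the source, so treat
        them as hypothetical:

             $ mpsub my_simulation input.dat      # batch job
             $ ud_mpirun -np 16 my_mpi_app        # 16-way MPI job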
                          Hosted Applications
       •      Hosted Applications are easier to manage
             – Provide users with a managed application
             – Great for applications that are run frequently but
               rarely updated
             – Data-parallel applications fit best in the hosted
               scenario
             – Users do not have to deal with application
               maintenance; only the developer does


      •      Grid MP is optimized for running hosted applications
            – Applications and data are cached at client nodes
            – Affinity scheduling to minimize data movement by re-using
                cached executables and data.
            – Hosted application can be run across multiple platforms by
                registering executables for each platform



            Example: Reservoir Simulation
     • Landmark’s VIP product benchmarked on Grid MP
     • Workload consisted of 240 simulations for 5 wells
           – Sensitivities investigated include:
                 •   2 PVT cases,
                 •   2 fault connectivity cases,
                 •   2 aquifer cases,
                 •   2 relative permeability cases,
                 •   5 combinations of 5 wells, and
                 •   3 combinations of vertical permeability multipliers
                 (2 × 2 × 2 × 2 × 5 × 3 = 240 combinations)
           – Each simulation was packaged as a separate piece of work.
     • Similar Reservoir simulation application has been
       developed at TACC (with Dr. W. Bangerth, Institute of
       Geophysics)


                Example: Drug Discovery
• Think & LigandFit
  applications
      – Internet project in partnership
        with Oxford University to model
        interactions between proteins
        and potential drug molecules
     – Virtual screening of drug
       molecules to reduce time-
       consuming, expensive lab
       testing by 90%
     – Drug Database of 3.5 billion
       candidate molecules.
     – Over 350K active computers
       participating all over the world.


                                               Think
        • Code developed at Oxford University
        • Application Characteristics
             – Typical Input Data File: < 1 KB
             – Typical Output File: < 20 KB
             – Typical Execution Time: 1000-5000
               minutes
             – Floating-point intensive
             – Small memory footprint
              – Fully resolved executable is ~3 MB in size.
     Grid MP: POVray Application Portal

   (Screenshot: web-portal interface for submitting POVray jobs to
   Grid MP.)
                                            BOINC
  • Berkeley Open Infrastructure for Network
    Computing (BOINC)
       – Open source follow-on to SETI@home
       – General architecture supports multiple
         applications
       – Solution targets volunteer resources, and not
         enterprise desktops/workstations
       – More information at http://boinc.berkeley.edu
  • Currently being used by several internet
    projects
               Structure of a BOINC project
     (Figure: server-side components of a BOINC project.)

     • BOINC DB (MySQL)
     • Scheduling server (C++)
     • Web interfaces (PHP)
     • Data servers (HTTP)
     • Back-end daemons: work generation, retry generation, result
       processing, result validation, garbage collection

     Ongoing tasks:
     - monitor server correctness
     - monitor server performance
     - develop and maintain applications
                                            BOINC
       • No enterprise management tools
             – Focus on “volunteer grid”
                  • Provide incentives (points, teams, website)
                  • Basic browser interface to set usage preferences on PCs
                  • Support for user community (forums)
        • Simple interface for job management
             – Application developer creates scripts to submit
               jobs and retrieve results (see the sketch below)
       • Provides sandbox on client
       • No encryption: uses redundant computing to
         prevent spoofing
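        On the server side, such submission scripts are typically
        built around BOINC's create_work utility; a sketch with
        hypothetical application and template names:

             bin/create_work \
                 -appname myapp \
                 -wu_name myapp_wu_001 \
                 -wu_template templates/myapp_wu.xml \
                 -result_template templates/myapp_result.xml \
                 input_001.dat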

                        Projects using BOINC
        • Climateprediction.net: study climate change
        • Einstein@home: search for gravitational signals
          emitted by pulsars
        • LHC@home: improve the design of the CERN LHC
          particle accelerator
        • Predictor@home: investigate protein-related diseases
        • Rosetta@home: help researchers develop cures for
          human diseases
        • SETI@home: Look for radio evidence of
          extraterrestrial life
        • Cell Computing biomedical research (Japanese;
          requires nonstandard client software)
        • World Community Grid: advance our knowledge of
          human disease. (Requires 5.2.1 or greater)
                                          SETI@home




     • Analysis of radio telescope data from Arecibo
           – SETI: search for narrowband signals
           – Astropulse: search for short broadband signals
     • 0.3 MB in, ~4 CPU hours, 10 KB out
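      These numbers show the compute-to-communication ratio desktop
      grids need (arithmetic added here, link speed assumed): 0.3 MB
      in plus 10 KB out is roughly 2.5 Mbit, about 45 seconds on a
      56 kbit/s modem, against ~4 CPU hours (14,400 s) of
      computation, a ratio of several hundred to one.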
                             Climateprediction.net




        •   Climate change study (Oxford University)
             –   Met Office model (FORTRAN, 1M lines)
        •   Input: ~10MB executable, 1MB data
        •   Output per workunit:
             –   10 MB summary (always upload)
             –   1 GB detail file (archive on client, may upload)
        •   CPU time: 2-3 months (can't migrate)
             –   trickle messages
             –   preemptive scheduling
                    Why use Desktop Grids?
 • Desktop Grid solutions are typically complete &
   standalone
        – Easy to set up and manage
       – Good entry vehicle to try out grids.
 • Use existing (but underutilized) resources
       – Number of desktops/workstations on campus (or in an
         enterprise) is typically an order of magnitude greater than
         traditional compute resources.
       – Power of grid grows over time as new, faster desktops
         are added
 • Typical (large) numbers of resources on desktop
   grids enable new approaches to solving problems

				