					             HYDRA
Using Windows Desktop Systems
in Distributed Parallel Computing



     NSF Site Visit
     2-23-2006
                 Introduction…
• Windows desktop systems at IUB student labs
  – 2,300 systems on a 3-year replacement cycle
  – Pentium IV (>= 1.6 GHz), 256/512/1024 MB memory, 10/100 Mbps or GigE, Windows XP
  – More than 1.5 TFLOPS in aggregate



  Possibly Utilize Idle Cycles?




[Figure: Condor pool usage over time. Red: total owner, Blue: total idle, Green: total Condor]


       Problem Description
• Once again... Windows desktop systems at IUB student labs:
  – As a scientific resource
  – Harvest idle cycles



                   Constraints

• Systems are dedicated to students running desktop office applications, not parallel scientific computing, which makes their availability unpredictable and sporadic
• Microsoft Windows environment
• Daily software rebuild (updates)


     What could these systems be used for?
• Many small computations and a few small messages
  – Foreman-worker
  – Parameter studies
  – Monte Carlo
• Goal: High Throughput Computing (not HPC)
  – Parallel runs of the aforementioned small computations to make better use of the resource
  – Parallel libraries (MPI, PVM, etc.) have constraints when resource availability is ephemeral, i.e. not predictable; see the foreman-worker sketch below
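A minimal sketch of the foreman-worker / Monte Carlo shape in plain MPI. The slides name the pattern but show no code; every identifier below is invented for illustration, and the later SMBL slides show the MPI-like calls this would be ported to.

// Foreman-worker Monte Carlo (estimating pi): many small independent
// computations, one small message each back to the foreman.
#include <mpi.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long kSamples = 1000000;  // samples per worker
    if (rank == 0) {
        // Foreman: collect one partial count per worker, in any order.
        long total = 0, hits = 0;
        for (int w = 1; w < size; ++w) {
            MPI_Recv(&hits, 1, MPI_LONG, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            total += hits;
        }
        std::printf("pi ~= %f\n",
                    4.0 * total / ((double)kSamples * (size - 1)));
    } else {
        // Worker: a small independent computation, then one small message.
        std::srand(rank);
        long hits = 0;
        for (long i = 0; i < kSamples; ++i) {
            double x = std::rand() / (double)RAND_MAX;
            double y = std::rand() / (double)RAND_MAX;
            if (x * x + y * y <= 1.0) ++hits;
        }
        MPI_Send(&hits, 1, MPI_LONG, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}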


                     Solution
• Simple Message Brokering Library (SMBL)
  – Limited replacement for MPI
     • Both server and client library are based on a TCP socket abstraction
  – Porting from MPI is fairly straightforward
• Process and Port Manager (PPM)
• Plus …
  – Condor for job management and file transfer (no checkpointing or parallelism)
  – Web portal for job submission


                 The Big Picture
   We’ll discuss each part in more detail next…




[Figure: Hydra architecture diagram. The shaded box indicates components hosted on multiple desktop computers.]



                  SMBL (Server)
• SMBL server maintains a dynamic pool of client process connections
• Worker job manager hides details of ephemeral workers at the application level

SMBL Server Process Table for a 4-CPU parallel session:

  SMBL Rank      Condor Assigned Node
  0 (Foreman)    Wrubel Computing Center, sacramento
  1              Chemistry Student Lab, computer_14
  2              CS Student Lab, computer_8
  3              Library, computer_6
                  SMBL (Server) cont’d
• The process table is updated in place as lab machines leave and join the pool; compared with the previous slide, rank 2 is now backed by a different node:

  SMBL Rank      Condor Assigned Node
  0 (Foreman)    Wrubel Computing Center, sacramento
  1              Chemistry Student Lab, computer_14
  2              Physics Student Lab, computer_11
  3              Library, computer_6
                  SMBL (Client)
• Client library implements selected MPI-like calls
  – MPI_Send() → SMBL_Send()
  – MPI_Recv() → SMBL_Recv()
• In charge of message delivery for each parallel process; see the porting sketch below
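A porting sketch built on the mapping above. Only the call names MPI_Send() → SMBL_Send() and MPI_Recv() → SMBL_Recv() come from the slide; the header name, the SMBL_LONG and SMBL_ANY_SOURCE constants, and the exact argument lists are assumptions to check against the SMBL sources at http://smbl.sourceforge.net.

#include "smbl.h"  // hypothetical header name

// Worker side: report a partial Monte Carlo count to the foreman (rank 0).
void report_hits(long hits) {
    // MPI version: MPI_Send(&hits, 1, MPI_LONG, 0, 0, MPI_COMM_WORLD);
    SMBL_Send(&hits, 1, SMBL_LONG, /*dest rank*/ 0, /*tag*/ 0);
}

// Foreman side: receive a partial count from whichever worker finishes next.
void collect_hits(long* hits) {
    // MPI version: MPI_Recv(hits, 1, MPI_LONG, MPI_ANY_SOURCE, 0,
    //                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    SMBL_Recv(hits, 1, SMBL_LONG, SMBL_ANY_SOURCE, /*tag*/ 0);
}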


   Process and Port Manager (PPM)
• Starts the SMBL server and application processes on demand
• Assigns a port/host pair to each parallel session
• Directs workers to their servers; a hypothetical lookup is sketched below
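The slides do not document PPM’s wire protocol, so the following is purely illustrative: it invents a trivial line-based TCP exchange ("WHERE <session>" answered by "host:port") just to show where a worker’s lookup against the port broker would sit.

#include <cstring>
#include <string>
#include <netdb.h>
#include <unistd.h>
#include <sys/socket.h>

// Ask a (hypothetical) PPM at ppm_host:ppm_port where session_id’s
// SMBL server lives; returns e.g. "hostname:5000\n", or "" on error.
std::string ppm_lookup(const char* ppm_host, const char* ppm_port,
                       const char* session_id) {
    addrinfo hints, *res = nullptr;
    std::memset(&hints, 0, sizeof hints);
    hints.ai_socktype = SOCK_STREAM;  // TCP, as SMBL itself uses
    if (getaddrinfo(ppm_host, ppm_port, &hints, &res) != 0) return "";

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
        if (fd >= 0) close(fd);
        freeaddrinfo(res);
        return "";
    }
    freeaddrinfo(res);

    std::string req = std::string("WHERE ") + session_id + "\n";
    send(fd, req.c_str(), req.size(), 0);

    char buf[128] = {0};
    ssize_t n = recv(fd, buf, sizeof buf - 1, 0);
    close(fd);
    return n > 0 ? std::string(buf, (size_t)n) : std::string();
}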


                PPM (cont’d ...)
PPM with two SMBL servers (two parallel sessions):

  Session     SMBL Rank      Condor Assigned Node
  Session 1   0 (Foreman)    Wrubel Computing Center, sacramento
              1              Chemistry Student Lab, computer_14
              2              CS Student Lab, computer_8
              3              Wells Library, computer_6
  Session 2   0 (Foreman)    Wrubel Computing Center, sacramento
              1              Wells Library, computer_27
              2              Biology Student Lab, computer_4
              3              CS Student Lab, computer_2

Once again … the big picture




[Figure: Hydra architecture diagram, repeated. The shaded box indicates components hosted on multiple desktop computers.]



        Recent Development
• Hydra cluster TeraGrid-enabled (Nov 2005)
  – Allows TeraGrid users to use the resource
  – Virtual-host-based solution: two different URLs for IU and TeraGrid users
  – TeraGrid users authenticate against PSC’s Kerberos server



              System Layout

• PPM, SMBL server, Condor, and the web portal run on one Linux server
  – Dual Intel Xeon 3.0 GHz, 4 GB memory, GigE
• A second Linux server runs Samba to serve the BLAST database



                           Portal
• Creates and submits Condor job files and handles data files; a sketch of a generated submit description follows
• Apache/PHP-based
• Kerberos authentication
• URLs:
  – http://hydra.indiana.edu (IU users)
  – http://hydra.iu.teragrid.org (Teragrid users)
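For concreteness, a sketch of the kind of Condor submit description the portal might generate on a user’s behalf. The executable, arguments, and file names are hypothetical; the requirements line targets the Windows XP lab machines, which Condor of this era reports as OpSys "WINNT51".

# Hypothetical portal-generated submit description
universe                = vanilla
executable              = blast_worker.exe
arguments               = -query input.fasta
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.fasta
requirements            = (OpSys == "WINNT51" && Arch == "INTEL")
output                  = job.out
error                   = job.err
log                     = job.log
queue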


     Utilization of Idle Cycles




[Figure: Condor pool utilization. Red: total owner, Blue: total idle, Green: total Condor]


                    Summary
• Large parallel computing facility created at low cost
  – SMBL: a parallel message-passing library that can deal with ephemeral resources
  – PPM: a port broker that can handle multiple parallel sessions
• SMBL homepage
  – http://smbl.sourceforge.net (Open Source)



          Links and References
• Hydra Portal
  – http://hydra.indiana.edu (IU users)
  – http://hydra.iu.teragrid.org (Teragrid users)
• SMBL home page: http://smbl.sourceforge.net
• Condor home page: http://www.cs.wisc.edu/condor/
• IU Teragrid home page: http://iu.teragrid.org



    Links and References (cont’d ...)

• Parallel FastDNAml: http://www.indiana.edu/~rac/hpc/fastDNAml
• BLAST: http://www.ncbi.nlm.nih.gov/BLAST
• MEME: http://meme.sdsc.edu/meme/intro.html





				