What's New in Condor

					Condor RoadMap


       Todd Tannenbaum
Computer Sciences Department
University of Wisconsin-Madison
   condor-admin@cs.wisc.edu
 http://www.cs.wisc.edu/condor
                    Outline
› The “Big Picture”
› Version 6.7.x
   Availability
     • Failover
   Scalability
     • Resources, jobs, matchmaking framework, files
   Accessibility
     • APIs, more Grid middleware, network




                Big Picture
What do we want to achieve in a new
    Condor developer series?
›   Technology Transfer
     Building a bridge between the Condor
     production software development activity and
     the academic core research activity
    BAD-FS, Stork, Diskrouter, Parrot (transparent
     I/O), Schedd Glidein, VO Schedulers, HA,
     Management, Improved ClassAds…


       What do we want to
         achieve, cont.
New Ports: Go to where the cycles are!
• The Red Hat Dilemma
• Our porting ‘hopper’:
       AIX 5.1L on the PowerPC architecture
       Red Hat AS server on x86
       Fedora Core on x86
       Fedora Core 2 on x86
       Red Hat AS server on AMD64
       SuSE 8.0 on AMD64
       Red Hat AS server on IA64
       HP-UX 11.11 64-bit

What do we want to achieve,
          cont.
› Improve existing ports
  Move “clipped wing” ports to full ports
   (w/ checkpoint, process migration)
    • Mac OS X, Windows
  Better integration into environments
    • Windows: operate better w/ DFS, use MSI
    • Unix: operate w/ AFS



What do we want to achieve,
          cont.
› Address changes in the computing
 landscape
  Firewalls, NATs
  64-bit operating systems
  Emphasis on data
  Movement towards standards such as
   WS, OGSA, …


    Version 6.7.x Theme
› Version 6.7.x
  Scalability
    • Resources, jobs, matchmaking framework,
      security
  Availability
    • Failover
  Accessibility
    • APIs, more Grid middleware, network



           High Availability in
                 v6.7.x
  What happens if my
submit machine reboots?
Once upon a time, only one answer: job restarts.


     Checkpoint?

     No Checkpoint?


 New: Job Progress continues
 if connection is interrupted
› For Vanilla and Java universe jobs,
  Condor now supports reestablishment of
  the connection between the submitting and
  executing machines.
› To take advantage of this feature, put the
  following line into your job’s submit
  description file:
     JobLeaseDuration = <N seconds>
For example:
      JobLeaseDuration = 1200
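
In context, a complete submit description file might look like the sketch below. Only the JobLeaseDuration line comes from this talk and is written exactly as in the example above; the executable and file names are hypothetical placeholders.

```
# Sketch of a Vanilla universe submit description file.
# my_job and the my_job.* file names are hypothetical placeholders.
universe         = vanilla
executable       = my_job
output           = my_job.out
error            = my_job.err
log              = my_job.log

# Lease in seconds (1200 = 20 minutes), as in the example above.
JobLeaseDuration = 1200

queue
```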


What if the submission point
 spontaneously explodes?

(don’t try this at home)




     More High Availability
          Solutions
› Condor can support a submit machine
 “hot spare”
  If your submit machine is down for
   longer than N minutes, a second machine
   can take over
› Two mechanisms available
  Job Mirroring
    • Described by Jaime earlier today
  High Availability Daemon Failover
    • Just tell the condor_master to run only ONE
      instance of a daemon (for example, the schedd)
      across the participating machines
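
A minimal configuration sketch for daemon failover of the schedd follows. It assumes the macro names used in the Condor manual's high availability section (MASTER_HA_LIST, HA_LOCK_URL, HA_LOCK_HOLD_TIME, HA_POLL_PERIOD) and a file system shared by both machines. The /share/spool path and the timing values are hypothetical, and the sketch is not a complete configuration, so check the manual for your version before relying on it.

```
# Sketch only: place in the configuration of BOTH submit machines.
# Have the condor_master manage the schedd as a high availability daemon.
MASTER_HA_LIST = SCHEDD

# The job queue and the lock live on storage both machines can reach.
# /share/spool is a hypothetical path.
SPOOL       = /share/spool
HA_LOCK_URL = file:/share/spool

# Assumed timings, in seconds: how long a master holds the lock before
# it must be refreshed, and how often a master polls the lock.
HA_LOCK_HOLD_TIME = 300
HA_POLL_PERIOD    = 60
```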

          Daemon Failover
[Diagram: Machine A and Machine B, each running a Master and a SchedD.
 Lock operations shown: Obtain Lock, Refresh Lock, Check Lock.
 Machine A is labeled Active; Machine B is labeled (hot spare), then Active.]


             Accessibility
› Support for GCB (Generic Connection Brokering)
   Condor working w/ NATs, firewalls
› Distributed Resource Management
 Application API (DRMAA)
   GGF Working Group
   An API specification for the submission and
    control of jobs to one or more Distributed
    Resource Management (DRM) systems
   Condor DRMAA interface to appear in v6.7.0



 SOAP/Grid Service


[Diagram: the condor_schedd with three interfaces:
 Cedar; Web Service (SOAP, HTTPS); OGSI (SOAP, HTTPG).]



     New “Grid Universe”
› With new Grid Universe, always
  specify a ‘gridtype’. So the old
  “globus” Universe is now declared as:
     universe = grid
     gridtype = gt2
› Other gridtypes? GT3 for OGSA-
  based Globus Toolkit 3
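
For illustration, a complete submit description file for the old “globus” case might look like the sketch below. The universe and gridtype lines are taken from the slide above. globusscheduler is the submit command Condor-G used in this era to name the remote gatekeeper; the host name, job manager, executable, and file names are hypothetical.

```
# Sketch of a Grid Universe job sent to a Globus GT2 resource.
universe        = grid
gridtype        = gt2

# Hypothetical gatekeeper contact string.
globusscheduler = gatekeeper.example.edu/jobmanager-fork

# Hypothetical executable and file names.
executable      = my_job
output          = my_job.out
error           = my_job.err
log             = my_job.log
queue
```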

    Condor-G improvements
› Condor-G can submit to either Globus GT2 or GT3
  resources, including support for GT3 with web
  services.
    Condor-G includes everything required; no need for client
     to have a GT3 installation.
    Good migration path to OGSA
› Condor-G can also submit to NorduGrid, UNICORE, Condor, Oracle
› Support for credential refresh via the MyProxy
  Online Credential Management in NMI
   http://grid.ncsa.uiuc.edu/myproxy/




   Why Condor + MyProxy?
› Long-lived tasks or services need
  credentials
    Task lifetime is difficult to predict
› Don’t want to delegate long-lived
  credentials
    Fear of compromise
› Instead, renew credentials with MyProxy
  as needed during the task’s lifetime
    Provides a single point of monitoring and
     control
    Renewal policy can be modified at any time
      • For example, disable renewals if compromise is
        detected or suspected
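
As a sketch of how renewal might be requested at submit time, the fragment below uses the MyProxy commands listed in the condor_submit manual page (MyProxyHost, MyProxyServerDN, MyProxyCredentialName, MyProxyRefreshThreshold, MyProxyNewProxyLifetime). Their availability in a given 6.7.x release is an assumption, and the host, DN, credential name, and values are hypothetical.

```
# Sketch only: lines added to a Condor-G submit description file.

# Hypothetical MyProxy server, given as host:port.
MyProxyHost             = myproxy.example.edu:7512
# Hypothetical subject (DN) of the MyProxy server certificate.
MyProxyServerDN         = /O=Example/CN=myproxy.example.edu
# Hypothetical name of the credential stored on the MyProxy server.
MyProxyCredentialName   = my_long_lived_credential
# Renew when the proxy has fewer than this many seconds left.
MyProxyRefreshThreshold = 3600
# Lifetime, in minutes, requested for each renewed proxy.
MyProxyNewProxyLifetime = 1440
```

Keeping the renewed lifetime short limits what a stolen proxy is worth, which is the point of the “fear of compromise” bullet above.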


           Credential Renewal
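[Diagram: two sides, Home and Remote.
 Home: the user submits jobs to the Condor-G Scheduler and enables renewal
 with MyProxy; the Condor-G Scheduler retrieves credentials from MyProxy.
 Home to Remote: the Condor-G Scheduler launches the job at, and refreshes
 credentials on, the Resource Manager.
 Remote: the Resource Manager refreshes credentials for the Job.]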

                More…
› Condor can now transfer job data
 files larger than 2 GB.
  On all platforms that support 64-bit file
   offsets
› Real-time spooling of stdout, stderr, and stdin in
 any universe, including Vanilla
  Real-time monitoring of job progress
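
A sketch of what this can look like in a submit description file, assuming the stream_output and stream_error submit commands from the condor_submit manual page. Whether they are present in a particular 6.7.x release is an assumption, and the executable and file names are hypothetical.

```
# Sketch: Vanilla universe job whose stdout and stderr are streamed
# back to the submit machine while the job runs.
universe      = vanilla
executable    = my_job
output        = my_job.out
error         = my_job.err
log           = my_job.log
stream_output = True
stream_error  = True
queue
```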


Thank you!



