Using Condor: An Introduction

Condor Week 2007
Condor Project
Computer Sciences Department
University of Wisconsin-Madison
condor-admin@cs.wisc.edu
http://www.cs.wisc.edu/condor

Tutorial Outline
›The story of Frieda, the scientist
›Using Condor to manage jobs
›Using Condor to manage resources
›Condor architecture and mechanisms
›Condor on the grid
  Flocking
  Condor and other grid technologies
›Stop me if you have any questions!

       Meet Frieda.

   She is a
scientist with
a big problem.



    Frieda’s Application …
Run a Parameter Sweep of F(x,y,z) for 20
values of x, 10 values of y and 3 values of z
20×10×3 = 600 combinations
F takes 6 hours on average to compute on a
 “typical” workstation (total = 600 × 6 = 3600 hours)
F requires a “moderate” (256MB) amount of
 memory
F performs “moderate” I/O - (x,y,z) is 5 MB and
 F(x,y,z) is 50 MB
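
The arithmetic on this slide can be checked with a short Python sketch. The value ranges are placeholders: the slide gives only how many values x, y and z take, not the values themselves.

```python
from itertools import product

# Placeholder ranges: only the counts (20, 10 and 3) come from the slide.
x_values = range(20)
y_values = range(10)
z_values = range(3)

HOURS_PER_RUN = 6    # average time for one F(x, y, z) on a "typical" workstation
INPUT_MB = 5         # size of one (x, y, z) input
OUTPUT_MB = 50       # size of one F(x, y, z) result

# Every (x, y, z) combination is one independent run of F.
combinations = list(product(x_values, y_values, z_values))

total_hours = len(combinations) * HOURS_PER_RUN
total_input_mb = len(combinations) * INPUT_MB
total_output_mb = len(combinations) * OUTPUT_MB

print(len(combinations), "runs,", total_hours, "CPU hours")
print(total_input_mb, "MB of input,", total_output_mb, "MB of output")
```

Run one after another on a single workstation, 3600 hours is about 150 days, which is why Frieda is looking for help.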



   I have 600
simulations to run.

Where can I get
    help?

While sharing a beverage with
some colleagues, she shares her
problem. Somebody asks “Have
you tried Condor? It’s free.”




         Getting Condor
› Available as a free download from
   http://www.cs.wisc.edu/condor
› Download Condor for your operating
  system
  Available for most UNIX (including Linux
   and Apple’s Mac OS X) platforms
  Also for Windows NT / XP




         Condor Releases
› Stable / Developer Releases
  Version numbering scheme similar to
   that of the (pre 2.6) Linux kernels …
  Major.minor.release
    • Minor is even (a.b.c): Stable
       – Examples: 6.6.3, 6.8.4, 6.8.5
       – Very stable, mostly bug fixes
    • Minor is odd (a.b.c): Developer
       – New features, may have some bugs
       – Examples: 6.7.11, 6.9.1, 6.9.2
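
The even/odd rule is easy to apply mechanically. The helper below is a hypothetical illustration written for this tutorial, not a Condor tool.

```python
def release_series(version: str) -> str:
    """Classify a Major.minor.release string by the rule on this slide:
    an even minor number is a stable release, an odd one a developer release."""
    major, minor, release = (int(part) for part in version.split("."))
    return "stable" if minor % 2 == 0 else "developer"

for version in ("6.8.4", "6.9.1"):
    print(version, "->", release_series(version))
```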




 Frieda Installs a “Personal
  Condor” on her machine…
› What do we mean by a “Personal” Condor?
  Condor on your own workstation
  No root / administrator access required
  No system administrator intervention needed
› After installation, Frieda submits her jobs
  to her Personal Condor…




Frieda’s Condor Pool
[Diagram: 600 Condor jobs, such as F(3,4,5), go to the personal Condor on Frieda’s workstation.]




    Personal Condor?!

 What’s the benefit of a
Condor “Pool” with just one
  user and one machine?


Your Personal Condor will ...
› Keep an eye on your jobs and will keep you
    posted on their progress
›   Implement your policy on the execution
    order of the jobs
›   Keep a log of your job activities
›   Add fault tolerance to your jobs
›   Implement your policy on when the jobs can
    run on your workstation



             Definitions
› Job
  The Condor representation of your work
› Machine
  The Condor representation of computers
   that can perform the work
› Match Making
  Matching a job with a machine “Resource”



                  Job
Jobs state their requirements and
 preferences:
  I need a Linux/x86 platform
  I want the machine with the most memory
  I prefer a machine in the chemistry
    department




               Machine
Machines state their requirements and
 preferences:
  Run jobs only when there is no keyboard
    activity
  I prefer to run Frieda’s jobs
  I am a machine in the physics department
  Never run jobs belonging to Dr. Smith




 The Magic of Matchmaking
› Jobs and machines state their
 requirements and preferences

› Condor matches jobs with machines
based on requirements and preferences
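
The idea of two-sided matching can be sketched in a few lines of Python. This is a toy model only: ads are plain dicts, requirements are predicates over the other party's ad, and only the job's preference is used to pick a winner. The machine names and attribute values are made up, and Condor's real matchmaker considers more than this.

```python
def best_match(job, machines):
    """Return the name of the machine the job prefers among those where
    both sides accept each other, or None if nothing matches."""
    candidates = [
        machine for machine in machines
        if job["requirements"](machine) and machine["requirements"](job)
    ]
    if not candidates:
        return None
    return max(candidates, key=job["rank"])["name"]

job = {
    "owner": "frieda",
    # "I need a Linux/x86 platform"
    "requirements": lambda m: m["opsys"] == "LINUX" and m["arch"] == "INTEL",
    # "I want the machine with the most memory"
    "rank": lambda m: m["memory"],
}

machines = [
    # "Never run jobs belonging to Dr. Smith"
    {"name": "a", "opsys": "LINUX", "arch": "INTEL", "memory": 511,
     "requirements": lambda j: j["owner"] != "smith"},
    # Has more memory, but only accepts Smith's jobs.
    {"name": "b", "opsys": "LINUX", "arch": "INTEL", "memory": 1024,
     "requirements": lambda j: j["owner"] == "smith"},
    # Accepts anyone, but fails the job's platform requirement.
    {"name": "c", "opsys": "WINNT", "arch": "INTEL", "memory": 2048,
     "requirements": lambda j: True},
]

print(best_match(job, machines))
```

Machine "b" has more memory than "a", but it rejects Frieda's job, so preferences only come into play among machines where both sets of requirements hold.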



     Getting Started:
 Submitting Jobs to Condor
› Overview:
  Choose a “Universe” for your job
  Make your job “batch-ready”
  Create a submit description file
  Run condor_submit to put your job in the
   queue




  1. Choose the “Universe”
› Controls how Condor handles jobs
› Choices include:
  Vanilla
  Standard
  Grid
  Java
  Parallel




 Using the Vanilla Universe
• The Vanilla Universe:
  – Allows running almost
    any “serial” job
  – Provides automatic file
    transfer, etc.
  – Like vanilla ice cream
    • Can be used in just
      about any situation


   2. Make your job batch-ready
Must be able to run in
  the background
• No interactive input
• No windows
• No GUI



Make your job batch-ready
       (continued)…
 Job can still use STDIN, STDOUT, and
  STDERR (the keyboard and the screen),
  but files are used for these instead of
  the actual devices
 Similar to UNIX shell:
   • $ ./myprogram <input.txt >output.txt
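
To see what "files instead of devices" means in practice, here is a small Python sketch that runs a stand-in program with its standard input and output bound to files. The program and file contents are hypothetical; the point is only that a batch-ready job never needs a keyboard or a screen.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for a batch-ready program: reads stdin, writes stdout,
# never prompts and never opens a window. (Hypothetical example job.)
PROGRAM = "import sys; sys.stdout.write(sys.stdin.read().upper())"

workdir = Path(tempfile.mkdtemp())
input_path = workdir / "input.txt"
output_path = workdir / "output.txt"
input_path.write_text("hello condor\n")

# Equivalent of:  ./myprogram <input.txt >output.txt
with input_path.open("rb") as stdin, output_path.open("wb") as stdout:
    subprocess.run([sys.executable, "-c", PROGRAM],
                   stdin=stdin, stdout=stdout, check=True)

result = output_path.read_text()
print(result, end="")
```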




       3. Create a Submit
         Description File
› A plain ASCII text file
› Condor does not care about file extensions
› Tells Condor about your job:
  Which executable, universe, input, output and error
    files to use, command-line arguments, environment
    variables, any special requirements or preferences
    (more on this later)
› Can describe many jobs at once (a “cluster”),
  each with different input, arguments, output,
  etc.

  Simple Submit Description
            File

# Simple condor_submit input file
# (Lines beginning with # are comments)
# NOTE: the words on the left side are not
#       case sensitive, but filenames are!
Universe   = vanilla
Executable = my_job
Output     = output.txt
Queue
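
Because a submit description file is just "Name = value" lines followed by a Queue command, it can also be generated by a script. The helper below is a hypothetical sketch, not part of Condor; it only formats text.

```python
def render_submit_file(settings, queue_count=1):
    """Render 'Name = value' lines followed by a Queue command."""
    lines = [f"{name} = {value}" for name, value in settings.items()]
    lines.append("Queue" if queue_count == 1 else f"Queue {queue_count}")
    return "\n".join(lines) + "\n"

text = render_submit_file({
    "Universe": "vanilla",
    "Executable": "my_job",
    "Output": "output.txt",
})
print(text, end="")
```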


    4. Run condor_submit
› You give condor_submit the name of
 the submit file you have created:
  condor_submit my_job.submit
› condor_submit:
  Parses the submit file, checks for errors
  Creates a “ClassAd” that describes your
   job(s)
  Puts job(s) in the Job Queue



               ClassAd ?
› Condor’s internal data representation
  Similar to classified ads (as the name
   implies)
  Represent an object & its attributes
    • Usually many attributes
  Can also describe what an object
   matches with




          ClassAd Details
› ClassAds can contain a lot of details
  The job’s executable is analysis.exe
  The machine’s load average is 5.6
› ClassAds can specify requirements
  I require a machine with Linux
› ClassAds can specify preferences
  This machine prefers to run jobs from
    the physics group

 ClassAd Details (continued)
› ClassAds are:
  semi-structured
  user-extensible
  schema-free
  Attribute = Expression




ClassAd Example
Example:
MyType       = "Job"         ·String
TargetType   = "Machine"
ClusterId    = 1377          ·Number
Owner        = "roy"
Cmd          = "sim.exe"
Requirements =               ·Boolean
   (Arch == "INTEL")
&& (OpSys == "LINUX")
&& (Disk >= DiskUsage)
&& ((Memory * 1024)>=ImageSize)
…




The Dog ClassAd
Type  = "Dog"
Color = "Brown"
Price = 12

ClassAd for the “Job”
...
Requirements =
   (type == "Dog") &&
   (color == "Brown") &&
   (price <= 15)
...



          The Job Queue
› condor_submit sends your job’s
  ClassAd(s) to the schedd
› The schedd (more details later):
  Manages the local job queue
  Stores the job in the job queue
    • Atomic operation, two-phase commit
    • “Like money in the bank”
› View the queue with condor_q

               Example
      condor_submit and condor_q
% condor_submit my_job.submit
Submitting job(s).
1 job(s) submitted to cluster 1.

% condor_q

-- Submitter: perdita.cs.wisc.edu : <128.105.165.34:1027> :
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  1.0    frieda           6/16 06:52   0+00:00:00 I 0    0.0 my_job

1 jobs; 1 idle, 0 running, 0 held

%




 Input, output & error files
› Controlled by submit file settings
› You can define the job’s standard input,
  standard output and standard error:
  Read job’s standard input from “input_file”:
     • Input = input_file
     • Shell equivalent: program <input_file
  Write job’s standard output to “output_file”:
     • Output = output_file
     • Shell equivalent: program >output_file
  Write job’s standard error to “error_file”:
     • Error = error_file
     • Shell equivalent: program 2>error_file




      Email about your job
• Condor sends email about job
  events to the submitting user
• Specify “notification” in your
  submit file to control which
  events:
Notification   =   complete           ·Default
Notification   =   never
Notification   =   error
Notification   =   always

     Feedback on your job
› Create a log of job events
› Add to submit description file:
  log = sim.log
› Becomes the Life Story of a Job
  Shows all events in the life of a job
  Always have a log file



      Sample Condor User Log
000 (0001.000.000) 05/25 19:10:03 Job submitted from host:
<128.105.146.14:1816>
...
001 (0001.000.000) 05/25 19:12:17 Job executing on host:
<128.105.146.14:1026>
...
005 (0001.000.000) 05/25 19:13:06 Job terminated.
(1) Normal termination (return value 0)
...
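
Since each event starts with a header line of the same shape, the log is easy to scan with a script. The Python sketch below parses only the header lines seen in the sample above; the regular expression is inferred from that sample, and the meaning of the third number in the parenthesised ID is not assumed here.

```python
import re

# Header shape inferred from the sample: code, (cluster.process.third), date, time, text.
EVENT_HEADER = re.compile(
    r"^(?P<code>\d{3}) "
    r"\((?P<cluster>\d+)\.(?P<process>\d+)\.(?P<third>\d+)\) "
    r"(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<message>.*)$"
)

def parse_events(log_text):
    """Return one dict per event header line; other lines are skipped."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_HEADER.match(line)
        if match:
            events.append({
                "code": match["code"],
                "job_id": f"{int(match['cluster'])}.{int(match['process'])}",
                "when": f"{match['date']} {match['time']}",
                "message": match["message"].strip(),
            })
    return events

SAMPLE_LOG = """\
000 (0001.000.000) 05/25 19:10:03 Job submitted from host: <128.105.146.14:1816>
...
001 (0001.000.000) 05/25 19:12:17 Job executing on host: <128.105.146.14:1026>
...
005 (0001.000.000) 05/25 19:13:06 Job terminated.
(1) Normal termination (return value 0)
...
"""

for event in parse_events(SAMPLE_LOG):
    print(event["code"], event["job_id"], event["when"], event["message"])
```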




Example Submit Description
    File With Logging
# Example condor_submit input file
# (Lines beginning with # are comments)
# NOTE: the words on the left side are not
#       case sensitive, but filenames are!
Universe   = vanilla
Executable = /home/frieda/condor/my_job.condor
Log        = my_job.log    ·Job log (from Condor)
Input      = my_job.in     ·Program’s standard input
Output     = my_job.out    ·Program’s standard output
Error      = my_job.err    ·Program’s standard error
Arguments = -a1 -a2        ·Command line arguments
InitialDir = /home/frieda/condor/run
Queue


  “Clusters” and “Processes”
› If your submit file describes multiple jobs, we call
  this a “cluster”
› Each cluster has a unique “cluster number”
› Each job in a cluster is called a “process”
    Process numbers always start at zero
› A Condor “Job ID” is the cluster number, a period,
  and the process number (e.g. 2.1)
    A cluster can have a single process
      • Job ID = 20.0              ·Cluster 20, process 0
    Or, a cluster can have more than one process
      • Job ID: 21.0, 21.1, 21.2  ·Cluster 21, process 0, 1, 2
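
The Job ID convention is simple enough to capture in a short Python sketch. The type and function names are hypothetical, chosen for this illustration.

```python
from typing import NamedTuple

class JobId(NamedTuple):
    cluster: int
    process: int

def parse_job_id(text: str) -> JobId:
    """Split 'cluster.process' (for example '2.1') into its two numbers."""
    cluster, process = text.split(".")
    return JobId(int(cluster), int(process))

def job_ids(cluster: int, count: int) -> list:
    """Job IDs for a cluster of `count` processes, numbered from zero."""
    return [f"{cluster}.{process}" for process in range(count)]

print(parse_job_id("2.1"))
print(job_ids(21, 3))
```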




     Submit File for a Cluster
# Example submit file for a cluster of 2 jobs
# with separate input, output, error and log files
Universe   = vanilla
Executable = my_job
Arguments = -x 0
log        = my_job_0.log
Input      = my_job_0.in
Output     = my_job_0.out
Error      = my_job_0.err
Queue                ·Job 2.0 (cluster 2, process 0)
Arguments = -x 1
log        =   my_job_1.log
Input      =   my_job_1.in
Output     =   my_job_1.out
Error      =   my_job_1.err
Queue                 ·Job 2.1 (cluster 2, process 1)


             Submitting The Job
% condor_submit my_job.submit-file
Submitting job(s).
2 job(s) submitted to cluster 2.
% condor_q
-- Submitter: perdita.cs.wisc.edu : <128.105.165.34:1027> :
 ID      OWNER    SUBMITTED     RUN_TIME   ST PRI SIZE CMD
 1.0     frieda   4/15 06:52   0+00:02:11 R  0   0.0  my_job -a1 -a2
 2.0     frieda   4/15 06:56   0+00:00:00 I  0   0.0  my_job -x 0
 2.1     frieda   4/15 06:56   0+00:00:00 I  0   0.0  my_job -x 1
3 jobs; 2 idle, 1 running, 0 held
%




    Back to our 600 jobs…
› We could put all input, output, error &
 log files in one directory
  One of each type for each job
  That’d be 2400 files (4 files × 600 jobs)
  Difficult to sort through
› Better: Create a subdirectory for
 each run

    Organize your files and
    directories for big runs
› Create subdirectories for each “run”
  run_0, run_1, … run_599
› Create input files in each of these
  run_0/simulation.in
  run_1/simulation.in
  …
  run_599/simulation.in
› The output, error & log files for each job
  will be created by Condor from your job’s
  output
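
Creating 600 directories by hand is tedious, so a script is the natural tool. The Python sketch below builds the run_N layout in a temporary directory. The "x y z" input format is an assumption made for illustration; the slides do not say what simulation.in contains.

```python
import tempfile
from pathlib import Path

def create_run_directories(base_dir, inputs):
    """Create run_0, run_1, ... under base_dir, one per entry in `inputs`,
    and write each entry to <run dir>/simulation.in."""
    run_dirs = []
    for index, contents in enumerate(inputs):
        run_dir = Path(base_dir) / f"run_{index}"
        run_dir.mkdir(parents=True, exist_ok=True)
        (run_dir / "simulation.in").write_text(contents)
        run_dirs.append(run_dir)
    return run_dirs

# Hypothetical input format: one "x y z" line per run.
inputs = [f"{x} {y} {z}\n"
          for x in range(20) for y in range(10) for z in range(3)]

base = tempfile.mkdtemp()
run_dirs = create_run_directories(base, inputs)
print(len(run_dirs), "run directories created under", base)
```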

Frieda’s simulation directory
  sim.exe
  sim.sub
  run_0/
    simulation.in
    simulation.out
    simulation.err
    simulation.log
  …
  run_599/
    simulation.in
    simulation.out
    simulation.err
    simulation.log



   Submit Description File for
           600 Jobs
# Cluster of   600 jobs with different directories
Universe   =   vanilla
Executable =   sim
Log        =   simulation.log
...
Arguments =    -x 0
InitialDir =   run_0     ·Log, input, output & error files -> run_0
Queue                    ·Job 3.0 (Cluster 3, Process 0)

Arguments = -x 1
InitialDir = run_1       ·Log, input, output & error files -> run_1
Queue                    ·Job 3.1 (Cluster 3, Process 1)

·Do this 598 more times…………



 Submit File for a Big Cluster
           of Jobs
› We just submitted 1 cluster with 600
  processes
› All the input/output files will be in
  different directories
› The submit file is pretty unwieldy (over
  1200 lines)
› Isn’t there a better way?

 Submit File for a Big Cluster
 of Jobs (the better way) #1
› We can queue all 600 in 1 “Queue”
 command
  Queue 600
› Condor provides $(Process) and
 $(Cluster)
  $(Process) will be expanded to the
   process number for each job in the cluster
    • 0, 1, … 599
  $(Cluster) will be expanded to the
   cluster number
    • Will be 4 for all jobs in this cluster

 Submit File for a Big Cluster
 of Jobs (the better way) #2
› The initial directory for each job can
 be specified using $(Process)
  InitialDir = run_$(Process)
  Condor will expand these to “run_0”,
   “run_1”, … “run_599” directories
› Similarly, arguments can be variable
  Arguments = -x $(Process)
  Condor will expand these to “-x 0”,
   “-x 1”, … “-x 599”
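
The effect of $(Process) and $(Cluster) can be imitated in Python to preview what each job will receive. This is a sketch of the substitution idea only, not Condor's macro processor; it handles just these two names and leaves anything else untouched.

```python
import re

def expand_macros(template, cluster, process):
    """Replace $(Cluster) and $(Process) in a submit-file value."""
    values = {"Cluster": str(cluster), "Process": str(process)}

    def substitute(match):
        # Names this sketch does not know are left exactly as written.
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\$\((\w+)\)", substitute, template)

for process in (0, 1, 599):
    print(expand_macros("InitialDir = run_$(Process)", cluster=4, process=process),
          "|",
          expand_macros("Arguments = -x $(Process)", cluster=4, process=process))
```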

   Better Submit File for 600
             Jobs
# Example condor_submit input file that defines
# a cluster of 600 jobs with different directories
Universe   = vanilla
Executable = my_job
Log        = my_job.log
Input      = my_job.in
Output     = my_job.out
Error      = my_job.err
Arguments  = -x $(Process)    ·-x 0, -x 1, … -x 599
InitialDir = run_$(Process)   ·run_0 … run_599
Queue 600                     ·Jobs 4.0 … 4.599




        Now, we submit it…
$ condor_submit my_job.submit
Submitting job(s)
  ......................................................
  ......................................................
  ......................................................
  ......................................................
  .......................................
Logging submit event(s)
  ......................................................
  ......................................................
  ......................................................
  ......................................................
  .......................................
600 job(s) submitted to cluster 4.




         And, Check the queue
$ condor_q
-- Submitter:   x.cs.wisc.edu : <128.105.121.53:510> : x.cs.wisc.edu
ID    OWNER     SUBMITTED    RUN_TIME   ST PRI SIZE CMD
4.0   frieda    4/20 12:08   0+00:00:05 R  0   9.8  my_job -arg1 -x 0
4.1   frieda    4/20 12:08   0+00:00:03 I  0   9.8  my_job -arg1 -x 1
4.2   frieda    4/20 12:08   0+00:00:01 I  0   9.8  my_job -arg1 -x 2
4.3   frieda    4/20 12:08   0+00:00:00 I  0   9.8  my_job -arg1 -x 3
...
4.598 frieda    4/20 12:08   0+00:00:00 I  0   9.8  my_job -arg1 -x 598
4.599 frieda    4/20 12:08   0+00:00:00 I  0   9.8  my_job -arg1 -x 599

600 jobs; 599 idle, 1 running, 0 held




         Removing jobs
› If you want to remove a job from the
  Condor queue, you use condor_rm
› You can only remove jobs that you
  own
› A privileged user can remove any job
  “root” on UNIX
  “administrator” on Windows




 Removing jobs (continued)
› Remove an entire cluster:
  condor_rm 4      ·Removes the whole cluster


› Remove a specific job from a cluster:
  condor_rm 4.0 ·Removes a single job


› Or, remove all of your jobs with “-a”
  condor_rm -a     ·Removes all jobs / clusters



Another Universe




More about Condor Universes
› Multiple Condor Universes
  Different feature sets
› We’ve been using the “Vanilla”
 universe
  Can be used to run any serial job
› And, introducing:
  Scheduler
  Local


           Condor Universes:
          Scheduler and Local
› Scheduler Universe
  Plug in a meta-scheduler
  Developed for DAGMan (more later)
  Similar to Globus’s fork job manager
› Local
  Very similar to vanilla, but jobs run on
   the local host
  Has more control over jobs than
   scheduler universe


Frieda’s Condor Pool
[Diagram: 600 Condor jobs, such as F(3,4,5), go to the personal Condor on Frieda’s workstation.]
Frieda can still only run one job at a time, however.




Good News
[Image: Boss Fat Cat]
The Boss says Frieda can add her co-workers’ desktop machines into her
Condor pool as well… but only if they can also submit jobs.




             Adding nodes
› Frieda installs Condor on the desktop
 machines, and configures them with
 her machine as the central manager
  The central manager:
    • Central repository for the whole pool
    • Performs job / machine matching, etc.
› These are “non-dedicated” nodes,
 meaning that they can't always run
 Condor jobs

Frieda’s Condor Pool
[Diagram: 600 Condor jobs go to the Condor Pool.]
Now, Frieda and her co-workers can run multiple jobs at a time
so their work completes sooner.




                  condor_status
% condor_status
Name          OpSys   Arch    State      Activ  LoadAv  Mem  ActvtyTime
antipholus.cs LINUX   INTEL   Unclaimed  Idle   0.020   511  0+02:28:42
coral.cs.wisc LINUX   INTEL   Claimed    Busy   0.990   511  0+01:27:21
doc.cs.wisc.e LINUX   INTEL   Unclaimed  Idle   0.260   511  0+00:20:04
dsonokwa.cs.w LINUX   INTEL   Claimed    Busy   0.810   511  0+00:01:45
ferdinand.cs. LINUX   INTEL   Claimed    Suspe  1.130   511  0+00:00:55
vm1@pinguino. LINUX   INTEL   Unclaimed  Idle   0.000   255  0+01:03:28
vm2@pinguino. LINUX   INTEL   Unclaimed  Idle   0.190   255  0+01:03:29




How can my jobs
access their data
      files?



 Access to Data in Condor
› Use shared filesystem if available
› No shared filesystem?
  Condor can transfer files
    • Can automatically send back changed files
    • Atomic transfer of multiple files
    • Can be encrypted over the wire
  Remote I/O Socket
  Standard Universe can use remote
   system calls (more on this later)


       Condor File Transfer
› ShouldTransferFiles = YES
   Always transfer files to execution site
› ShouldTransferFiles = NO
   Rely on a shared filesystem
› ShouldTransferFiles = IF_NEEDED
   Will automatically transfer the files if the submit and
    execute machine are not in the same FileSystemDomain

Universe   = vanilla
Executable = my_job
Log        = my_job.log
ShouldTransferFiles   = IF_NEEDED
Transfer_input_files = dataset.$(Process), common.data
Transfer_output_files = TheAnswer.dat
Queue 600
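
The three settings boil down to a small decision rule, sketched here in Python. The function and the domain names are hypothetical; only the YES / NO / IF_NEEDED behaviour comes from the slide.

```python
def will_transfer_files(should_transfer_files, submit_domain, execute_domain):
    """Decide whether files get transferred for one job/machine pairing."""
    setting = should_transfer_files.upper()
    if setting == "YES":
        return True                      # always transfer to the execution site
    if setting == "NO":
        return False                     # rely on a shared filesystem
    if setting == "IF_NEEDED":
        # Transfer only when the two machines are in different FileSystemDomains.
        return submit_domain != execute_domain
    raise ValueError(f"unknown ShouldTransferFiles value: {should_transfer_files!r}")

print(will_transfer_files("IF_NEEDED", "cs.example.edu", "cs.example.edu"))
print(will_transfer_files("IF_NEEDED", "cs.example.edu", "stat.example.edu"))
```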


   We Always Want More
Condor is managing and
 running our jobs, but
   Our CPU requirements are
    greater than our resources
   Jobs are preempted more
    often than we like




        Happy Day! Frieda’s
      organization purchased a
         Dedicated Cluster!
› Frieda installs Condor on all
    the dedicated Cluster nodes
›   Frieda also adds a dedicated
    central manager
›   She configures her entire pool
    with this new host as the
    central manager…


Frieda’s Condor Pool
[Diagram: 600 Condor jobs go to the Condor Pool, which now includes the Dedicated Cluster.]
With the additional resources, Frieda and her co-workers can get
their jobs completed even faster.




What Condor Daemons
 are running on my
machine, and what do
      they do?



             condor_master
› Starts up all other Condor daemons
› If there are any problems and a daemon
    exits, it restarts the daemon and sends email
    to the administrator
›   Acts as the server for many Condor remote
    administration commands:
    condor_reconfig, condor_restart,
      condor_off, condor_on,
      condor_config_val, etc.

Condor Daemon Layout
Personal Condor / Central Manager
[Diagram: Master, startd, schedd, negotiator and collector. Legend: process spawned.]


               Central Manager:
               condor_collector
› Central manager: central repository and match
    maker for whole pool
›   Collects information from all other Condor daemons
    in the pool
     “Directory Service” / Database for a Condor pool
› Each daemon sends a periodic update called a
    “ClassAd” to the collector
›   Services queries for information:
     Queries from other Condor daemons
     Queries from users (condor_status)
› Only on the Central Manager
› At least one collector per pool


Condor Pool Layout: Collector
[Diagram of daemons per host. Legend: process spawned; ClassAd communication pathway.]
  Central Manager: Master, Collector




            Central Manager:
            condor_negotiator
› Performs “matchmaking” in Condor
› Each “Negotiation Cycle” (typically 5 minutes):
    Gets information from the collector about all available
     machines and all idle jobs
    Tries to match jobs with machines that will serve them
    Both the job and the machine must satisfy each other’s
     requirements
› Only one negotiator per pool
› Only on the Central Manager




Condor Pool Layout: Negotiator
[Diagram of daemons per host. Legend: process spawned; ClassAd communication pathway.]
  Central Manager: Master, negotiator, Collector




             Execute Hosts:
             condor_startd
› Execute host: machines that run user jobs
› Represents a machine to the Condor
    system
›   Responsible for starting, suspending, and
    stopping jobs
›   Enforces the wishes of the machine owner
    (the owner’s “policy”… more on this in the
    administrator’s tutorial)
›   Creates a “starter” for each running job
›   One startd runs on each execute node


Condor Pool Layout: startd
[Diagram of daemons per host. Legend: process spawned; ClassAd communication pathway.]
  Central Manager: Master, negotiator, Collector
  Cluster Node (×2): Master, startd




               Submit Hosts:
               condor_schedd
› Submit hosts: machines that users can submit
    jobs on
›   Maintains the persistent queue of jobs
›   Responsible for contacting available machines and
    sending them jobs
›   Services user commands which manipulate the job
    queue:
     condor_submit, condor_rm, condor_q, condor_hold,
      condor_release, condor_prio, …
› Creates a “shadow” for each running job
› One schedd runs on each submit host


Condor Pool Layout: schedd
[Diagram of daemons per host. Legend: process spawned; ClassAd communication pathway.]
  Central Manager: Master, negotiator, schedd, Collector
  Cluster Node (×2): Master, startd
  Desktop (×2): Master, startd, schedd



Condor Pool Layout: master
[Diagram of daemons per host. Legend: process spawned; ClassAd communication pathway.]
  Central Manager: Master, negotiator, schedd, Collector
  Cluster Node (×2): Master, startd
  Desktop (×2): Master, startd, schedd



             Now what?
› Some of the machines in the pool can’t
 run my jobs
  Not enough RAM
  Not enough scratch disk space
  Required software not installed
  Etc.




     Specify Requirements
› An expression (syntax similar to C or Java)
› Must evaluate to True for a match to be
  made
 Universe   =   vanilla
 Executable =   my_job
 Log        =   my_job.log
 InitialDir =   run_$(Process)
 Requirements   = Memory >= 256 && Disk > 10000
 Queue 600




    Advanced Requirements
› Requirements can match custom attributes in your
  Machine Ad
    Can be added by hand to each machine
    Or, automatically using the “Hawkeye” mechanism

 Universe   = vanilla
 Executable = my_job
 Log        = my_job.log
 InitialDir = run_$(Process)
 Requirements = Memory >= 256 && Disk > 10000 \
 && ( (HaveProg =!= UNDEFINED) && HaveProg )
 Queue 600
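
The "=!= UNDEFINED" guard is there because a machine that never advertises HaveProg has no value for it at all. The Python sketch below mimics that check, using a missing dict key to stand in for an UNDEFINED attribute. The machine names and values are made up, and this only approximates ClassAd evaluation.

```python
def meets_requirements(machine):
    """Python rendering of:
    Memory >= 256 && Disk > 10000 && ((HaveProg =!= UNDEFINED) && HaveProg)
    A missing key plays the role of an UNDEFINED attribute."""
    have_prog = machine.get("HaveProg")      # None when not advertised
    return (
        machine["Memory"] >= 256
        and machine["Disk"] > 10000
        and have_prog is not None            # the "=!= UNDEFINED" guard
        and bool(have_prog)
    )

machines = [
    {"Name": "m1", "Memory": 511, "Disk": 50000, "HaveProg": True},
    {"Name": "m2", "Memory": 511, "Disk": 50000},                     # HaveProg undefined
    {"Name": "m3", "Memory": 128, "Disk": 50000, "HaveProg": True},   # too little memory
    {"Name": "m4", "Memory": 511, "Disk": 50000, "HaveProg": False},  # program not installed
]

eligible = [m["Name"] for m in machines if meets_requirements(m)]
print(eligible)
```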



         And, Specify Rank
› All matches which meet the requirements
    can be sorted by preference with a Rank
    expression.
›   The higher the Rank, the better the match
Universe   = vanilla
Executable = my_job
Log        = my_job.log
Arguments = -arg1 -arg2
InitialDir = run_$(Process)
Requirements = Memory >= 256 && Disk > 10000
Rank = (KFLOPS*10000) + Memory
Queue 600
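
To see how Requirements and Rank work together, the Python sketch below first filters machines and then orders the survivors by the Rank expression from this slide. The machine names and attribute values are invented for illustration.

```python
def meets_requirements(machine):
    """Requirements = Memory >= 256 && Disk > 10000"""
    return machine["Memory"] >= 256 and machine["Disk"] > 10000

def rank(machine):
    """Rank = (KFLOPS*10000) + Memory"""
    return machine["KFLOPS"] * 10000 + machine["Memory"]

# Made-up machines; only the attribute names come from the slide.
machines = [
    {"Name": "slow", "KFLOPS": 100000, "Memory": 2048, "Disk": 50000},
    {"Name": "fast", "KFLOPS": 200000, "Memory": 512,  "Disk": 50000},
    {"Name": "tiny", "KFLOPS": 900000, "Memory": 128,  "Disk": 50000},
]

# Requirements decide who is eligible; Rank orders the eligible machines.
eligible = [m for m in machines if meets_requirements(m)]
ordered = sorted(eligible, key=rank, reverse=True)

for machine in ordered:
    print(machine["Name"], rank(machine))
```

Note that "tiny" has the highest KFLOPS but is never ranked, because it fails the memory requirement. With this expression KFLOPS dominates, and Memory mostly breaks ties between machines of equal speed.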


My jobs aren’t running!!




               Check the queue
› Check the queue with condor_q:
bash-2.05a$ condor_q
-- Submitter: x.cs.wisc.edu : <128.105.121.53:510> :x.cs.wisc.edu
ID  OWNER   SUBMITTED    RUN_TIME   ST PRI SIZE CMD
5.0 frieda  4/20 12:23   0+00:00:00 I  0   9.8  my_job -arg1 -n 0
5.1 frieda  4/20 12:23   0+00:00:00 I  0   9.8  my_job -arg1 -n 1
5.2 frieda  4/20 12:23   0+00:00:00 I  0   9.8  my_job -arg1 -n 2
5.3 frieda  4/20 12:23   0+00:00:00 I  0   9.8  my_job -arg1 -n 3
5.4 frieda  4/20 12:23   0+00:00:00 I  0   9.8  my_job -arg1 -n 4
5.5 frieda  4/20 12:23   0+00:00:00 I  0   9.8  my_job -arg1 -n 5
5.6 frieda  4/20 12:23   0+00:00:00 I  0   9.8  my_job -arg1 -n 6
5.7 frieda  4/20 12:23   0+00:00:00 I  0   9.8  my_job -arg1 -n 7
6.0 frieda  4/20 13:22   0+00:00:00 H  0   9.8  my_job -arg1 -arg2
9 jobs; 8 idle, 0 running, 1 held




        Look at jobs on hold
% condor_q -hold
-- Submitter: x.cs.wisc.edu : <128.105.121.53:510> : x.cs.wisc.edu
 ID     OWNER   HELD_SINCE  HOLD_REASON
 6.0    frieda  4/20 13:23  Error from starter on vm1@skywalker.cs.wisc

9 jobs; 8 idle, 0 running, 1 held


Or, see full details for a job
% condor_q -l 6.0


         Check machine status
› Verify that there are idle machines with condor_status:
bash-2.05a$   condor_status
Name          OpSys Arch State      Activity    LoadAv Mem      ActvtyTime
vm1@tonic.c   LINUX INTEL Claimed   Busy        0.000   501     0+00:00:20
vm2@tonic.c   LINUX INTEL Claimed   Busy        0.000   501     0+00:00:19
vm3@tonic.c   LINUX INTEL Claimed   Busy        0.040   501     0+00:00:17
vm4@tonic.c   LINUX INTEL Claimed   Busy        0.000   501     0+00:00:05

              Total Owner Claimed Unclaimed Matched Preempting
INTEL/LINUX       4     0       4         0       0          0
      Total       4     0       4         0       0          0




                Look in Job Log
› Look in your job log for clues:
bash-2.05a$ cat my_job.log
000 (031.000.000) 04/20 14:47:31 Job submitted from host:
  <128.105.121.53:48740>
...
007 (031.000.000) 04/20 15:02:00 Shadow exception!
        Error from starter on gig06.stat.wisc.edu: Failed
  to open '/scratch.1/frieda/workspace/v67/condor-
  test/test3/run_0/my_job.in' as standard input: No such
  file or directory (errno 2)
        0 - Run Bytes Sent By Job
        0 - Run Bytes Received By Job
...




      Still not running?
   Exercise a little patience
› On a busy pool, it can take a while
  to match and start your jobs
› Wait at least a negotiation cycle
  or two (typically 5 minutes)




    Look to condor_q for help:
        condor_q -analyze
bash-2.05a$ condor_q -ana 29
---
029.000: Run analysis summary. Of 1243 machines,
   1243 are rejected by your job's requirements
0 are available to run your job

WARNING: Be advised:
   No resources matched request's constraints
   Check the Requirements expression below:
Requirements = ((Memory > 8192)) && (Arch == "INTEL") &&
  (OpSys == "LINUX") && (Disk >= DiskUsage) &&
  (TARGET.FileSystemDomain == MY.FileSystemDomain)




   Better analysis (Linux only):
    condor_q -better-analyze
bash-2.05a$ condor_q -better-ana 29
The Requirements expression for your job is:

( ( target.Memory > 8192 ) ) && ( target.Arch == "INTEL" ) &&
( target.OpSys == "LINUX" ) && ( target.Disk >= DiskUsage ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )
  Condition                                       Machines Matched  Suggestion
  ---------                                       ----------------  ----------
1 ( ( target.Memory > 8192 ) )                    0                 MODIFY TO 4000
2 ( TARGET.FileSystemDomain == "cs.wisc.edu" )    584
3 ( target.Arch == "INTEL" )                      1078
4 ( target.OpSys == "LINUX" )                     1100
5 ( target.Disk >= 13 )                           1243
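
The idea behind a per-condition breakdown can be sketched as follows. This is a rough illustration in plain Python, not how condor_q computes its report: the three-machine pool and the condition table are hypothetical. Each clause of the Requirements expression is tested against every machine on its own, so a clause that matches nothing stands out.

```python
# Hypothetical three-machine pool; values are invented for illustration.
analysis_pool = [
    {"Memory": 4000, "Arch": "INTEL", "OpSys": "LINUX", "Disk": 500},
    {"Memory": 2048, "Arch": "INTEL", "OpSys": "LINUX", "Disk": 500},
    {"Memory": 1024, "Arch": "X86_64", "OpSys": "LINUX", "Disk": 500},
]

# One entry per clause of the Requirements expression.
conditions = {
    "Memory > 8192": lambda ad: ad["Memory"] > 8192,
    'Arch == "INTEL"': lambda ad: ad["Arch"] == "INTEL",
    'OpSys == "LINUX"': lambda ad: ad["OpSys"] == "LINUX",
    "Disk >= 13": lambda ad: ad["Disk"] >= 13,
}

# Count how many machines satisfy each clause on its own.
match_counts = {
    label: sum(1 for ad in analysis_pool if test(ad))
    for label, test in conditions.items()
}

# Clauses that no machine satisfies are the ones to reconsider.
blocking = [label for label, count in match_counts.items() if count == 0]
```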



             Learn about available
                  resources:
bash-2.05a$ condor_status -const 'Memory > 8192'
(no output means no matches)
bash-2.05a$ condor_status -const 'Memory > 4096'
Name         OpSys     Arch       State       Activ LoadAv Mem     ActvtyTime
vm1@s0-03.cs. LINUX    X86_64 Unclaimed Idle        0.000   5980   1+05:35:05
vm2@s0-03.cs. LINUX    X86_64 Unclaimed Idle        0.000   5980 13+05:37:03
vm1@s0-04.cs. LINUX    X86_64 Unclaimed Idle        0.000   7988   1+06:00:05
vm2@s0-04.cs. LINUX    X86_64 Unclaimed Idle        0.000   7988 13+06:03:47


                  Total Owner Claimed Unclaimed Matched Preempting
   X86_64/LINUX       4       0           0         4        0            0
          Total       4       0           0         4        0            0




      Job Policy Expressions
› User can supply job policy expressions in
    the submit file.
›   Can be used to describe a successful run.
     on_exit_remove = <expression>
     on_exit_hold = <expression>
     periodic_remove = <expression>
     periodic_hold = <expression>



        Job Policy Examples
› Do not remove if exits with a signal:
   on_exit_remove = ExitBySignal == False

› Place on hold if exits with nonzero status
 or ran for less than an hour:
  on_exit_hold = ( (ExitBySignal==False)
    && (ExitCode != 0) ) || (
    (ServerStartTime - JobStartDate) < 3600)
› Place on hold if job has spent more than
 50% of its time suspended:
   periodic_hold = CumulativeSuspensionTime >
    (RemoteWallClockTime / 2.0)
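
Two of these policies can be restated as ordinary predicates. This is a sketch in plain Python, not Condor's ClassAd evaluator: the job dictionaries are hypothetical stand-ins for job ClassAds, with attribute names taken from the slide and values invented.

```python
def on_exit_remove(job):
    """Mirror of on_exit_remove = ExitBySignal == False."""
    return job["ExitBySignal"] is False


def periodic_hold(job):
    """Mirror of CumulativeSuspensionTime > (RemoteWallClockTime / 2.0)."""
    return job["CumulativeSuspensionTime"] > job["RemoteWallClockTime"] / 2.0


# Hypothetical job ads: one exits normally, one is killed and often suspended.
clean_exit = {"ExitBySignal": False,
              "CumulativeSuspensionTime": 100, "RemoteWallClockTime": 3600}
killed_job = {"ExitBySignal": True,
              "CumulativeSuspensionTime": 2000, "RemoteWallClockTime": 3600}

policy_results = {
    "clean_exit": (on_exit_remove(clean_exit), periodic_hold(clean_exit)),
    "killed_job": (on_exit_remove(killed_job), periodic_hold(killed_job)),
}
```

The job that exited normally leaves the queue and is never held. The job killed by a signal stays in the queue, and because it spent more than half of its wall-clock time suspended, it would be placed on hold.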
  Insert ClassAd attributes
› Special purpose usage
› In the submit description file,
 introduce an attribute for the job
  +Department = "biochemistry"
 causes the ClassAd to contain
  Department = "biochemistry"



We’ve seen how Condor can:
 Keep an eye on your jobs and
  keep you posted on their progress
 Implement your policy on the
  execution order of the jobs
 Keep a log of your job activities




 My new jobs run for 20
         days…

› What happens when a job is
 forced off its CPU?
   Preempted by higher priority
   user or job
  Vacated because of user activity
› How can I add fault tolerance
 to my jobs?


      Run them in
Todd’s Private Universe?




 Condor’s Standard Universe
        to the rescue!
› Support for transparent process
  checkpoint and restart
› Remote system calls (remote I/O)
   Your job can read / write files
    as if they were local



       Remote System Calls in
       the Standard Universe

› I/O system calls are trapped and sent back
 to the submit machine
  Examples: open a file, write to a file
› No source code changes typically required
› Programming language independent




    Process Checkpointing in the
         Standard Universe
› Condor’s process checkpointing provides a
    mechanism to automatically save the
    state of a job
›   The process can then be restarted from
    right where it was checkpointed
    After preemption, crash, etc.




           Checkpointing:
           Process Starts
checkpoint: the entire state of a program,
  saved in a file
   CPU registers, memory image, I/O

[Figure: timeline of the process as it starts running]




   Checkpointing:
Process Checkpointed
[Figure: timeline on which the process has written checkpoints 1, 2, and 3]




Checkpointing:
Process Killed
[Figure: timeline on which the process is killed some time after checkpoint 3]




 Checkpointing:
Process Resumed
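[Figure: timeline on which the process resumes from checkpoint 3; the run up to checkpoint 3 and the work after the restart are labeled goodput, and the work lost between checkpoint 3 and the kill is labeled badput]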
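
The goodput / badput split can be worked through with a small calculation. This is a sketch under simple assumptions, not a Condor API: the function name is hypothetical, times are minutes since the job started, and only the work done since the last checkpoint is lost when the job is killed.

```python
def goodput_badput(checkpoint_times, killed_at):
    """Split the run time before a kill into kept work and lost work.

    checkpoint_times: times (since job start) at which checkpoints were written.
    killed_at: time (since job start) at which the job was killed.
    """
    # Only checkpoints written before the kill can be used for a restart.
    saved = [t for t in checkpoint_times if t <= killed_at]
    last_checkpoint = max(saved) if saved else 0
    goodput = last_checkpoint             # work preserved by the checkpoint
    badput = killed_at - last_checkpoint  # work that must be redone
    return goodput, badput


# Checkpoints at 60, 120 and 180 minutes; the job is killed at 200 minutes.
with_checkpoints = goodput_badput([60, 120, 180], killed_at=200)
# Without any checkpoint, everything done before the kill is lost.
without_checkpoints = goodput_badput([], killed_at=200)
```

With checkpoints, 180 of the 200 minutes survive the kill. Without them, all 200 minutes are badput.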




         When will Condor
       checkpoint your job?
› Periodically, if desired
    For fault tolerance
› When your job is preempted by a higher
    priority job
›   When your job is vacated because the
    execution machine becomes busy
›   When you explicitly run the condor_checkpoint,
    condor_vacate, condor_off, or
    condor_restart command

         Making the Standard
           Universe Work
› The job must be relinked with Condor’s
    standard universe support library
›   To relink, place condor_compile in front of
    the command used to link the job:
    % condor_compile gcc -o myjob myjob.c
    - OR -
    % condor_compile f77 -o myjob filea.f fileb.f
    - OR -
    % condor_compile make -f MyMakefile



           Limitations of the
           Standard Universe
› Condor’s checkpointing is not at the kernel
  level.
  In the Standard Universe, the job may not:
      • fork()
      • Use kernel threads
      • Use some forms of IPC, such as pipes and shared
        memory
› Must have access to source code to relink
› Many typical scientific jobs are OK


      Connecting Condors

› Frieda knows people with
  their own Condor pools, and
  gets permission to use
  their computing resources…
› How can Condor help her do
  this?



        Connect Condors
         with Flocking

› Frieda configures her Condor pool
  to “flock” to her friend’s pool.
› Flocking is a Condor-specific
  technology.




      Frieda’s Condor Pool
[Figure: Frieda's 600 Condor jobs in her Condor Pool, which flocks to a Friendly Condor Pool]




      Frieda meets The Grid
› Frieda also has access to grid resources
    she wants to use
    She has certificates and access to Globus or
      other resources at remote institutions
› But Frieda wants Condor’s queue
    management features for her jobs!
›   She installs Condor so she can submit “Grid
    Universe” jobs to Condor




         “Grid” Universe
› All handled in your submit file
› Supports a number of “back end” types:
  Globus: GT2, GT3, GT4
  NorduGrid
  UNICORE
  Condor
  PBS
  LSF



Grid Universe & Globus 2/3
› Used for a Globus GT2 / GT3 back-end
   “Condor-G”
› Format:
Grid_Resource = (gt2|gt3) Head-Node
Globus_rsl = <RSL-String>
› Example:
Universe = grid
Grid_Resource = gt2 beak.cs.wisc.edu/jobmanager
Globus_rsl = (queue=long)(project=atom-smasher)




  Grid Universe & Globus 4
› Used for a Globus GT4 back-end
› Format:
Grid_Resource = gt4 <Head-Node> <Scheduler-Type>
Globus_XML = <XML-String>

› Example:
Universe = grid
Grid_Resource = gt4 beak.cs.wisc.edu Condor
Globus_xml = <queue>long</queue><project>atom-smasher</project>



    Grid Universe & Condor
› Used for a Condor back-end
  “Condor-C”
› Format:
Grid_Resource = condor <Schedd-Name> <Collector-Name>
Remote_<param> = <value>
  “Remote_” part is stripped off
› Example:
Universe = grid
Grid_Resource = condor beak condor.cs.wisc.edu
Remote_Universe = standard




Grid Universe & NorduGrid
› Used for a NorduGrid back-end
› Format:
Grid_Resource = nordugrid <Host-Name>
› Example:
Universe = grid
Grid_Resource = nordugrid ngrid.cs.wisc.edu




 Grid Universe & UNICORE
› Used for a UNICORE back-end
› Format:
Grid_Resource = unicore <USite> <VSite>
› Example:
Universe = grid
Grid_Resource = unicore uhost.cs.wisc.edu vhost




     Grid Universe & PBS
› Used for a PBS back-end
› Format:
Grid_Resource = pbs
› Example:
Universe = grid
Grid_Resource = pbs




    Grid Universe & LSF
› Used for an LSF back-end
› Format:
Grid_Resource = lsf
› Example:
Universe = grid
Grid_Resource = lsf




       Credential Management
› Condor will do The Right Thing™ with your
    X509 certificate and proxy
›   Override default proxy:
     X509UserProxy = /home/frieda/other/proxy
› Proxy may expire before jobs finish
    executing
    Condor can use MyProxy to renew your proxy
    When a new proxy is available, Condor will
     forward the renewed proxy to the job
    This works for non-grid jobs, too



My jobs have
 dependencies…
Can Condor help solve my
  dependency problems?




Frieda learns DAGMan
› Directed Acyclic Graph Manager
› DAGMan allows you to specify the
  dependencies between your Condor jobs, so
  it can manage them automatically for you.

› (e.g., “Don’t run job “B” until job “A” has
  completed successfully.”)


            What is a DAG?
› A DAG is the data structure
  used by DAGMan to represent
  these dependencies.

› Each job is a “node” in the
  DAG.

› Each node can have any
  number of “parent” or
  “children” nodes – as long as
  there are no loops!

[Figure: diamond-shaped DAG in which Job A is the parent of Job B and Job C, which are both parents of Job D]


            Defining a DAG
› A DAG is defined by a .dag file, listing each of its
  nodes and their dependencies:
   # diamond.dag
   Job A a.sub
   Job B b.sub
   Job C c.sub
   Job D d.sub
   Parent A Child B C
   Parent B C Child D

[Figure: the diamond DAG, with Job A above Job B and Job C, and Job D below them]

› Each node will run the Condor job specified by its
  accompanying Condor submit file
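
To make the dependency rules concrete, here is a minimal sketch in plain Python that reads the diamond example and derives an order in which the nodes could run. It is not DAGMan's parser or scheduler: the `parse_dag` and `run_order` helpers are hypothetical and handle only the Job and Parent ... Child lines shown on the slide.

```python
from collections import deque

DIAMOND_DAG = """\
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
"""


def parse_dag(text):
    """Return (submit_files, parents) from Job and Parent/Child lines only."""
    submit_files = {}
    parents = {}
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = words[0].upper()
        if keyword == "JOB":
            name, submit_file = words[1], words[2]
            submit_files[name] = submit_file
            parents.setdefault(name, set())
        elif keyword == "PARENT":
            split_at = [w.upper() for w in words].index("CHILD")
            for child in words[split_at + 1:]:
                parents.setdefault(child, set()).update(words[1:split_at])
    return submit_files, parents


def run_order(parents):
    """Kahn's algorithm: list nodes so each one follows all of its parents."""
    remaining = {node: set(p) for node, p in parents.items()}
    ready = deque(sorted(n for n, p in remaining.items() if not p))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        del remaining[node]
        for other in sorted(remaining):
            if node in remaining[other]:
                remaining[other].discard(node)
                if not remaining[other]:
                    ready.append(other)
    if remaining:
        raise ValueError("cycle detected: not a DAG")
    return order


submit_files, parents = parse_dag(DIAMOND_DAG)
order = run_order(parents)
```

For the diamond, A comes first, B and C become ready together once A is done, and D waits for both. A file with a loop would raise the `ValueError`, which is the "no loops" rule in code form.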



        Submitting a DAG
› To start your DAG, just run
 condor_submit_dag with your .dag file,
 and Condor will start a personal DAGMan
 daemon to begin running your jobs:
  % condor_submit_dag diamond.dag

› condor_submit_dag submits DAGMan itself as
  a job to the schedd
  The DAGMan daemon is “watched” by Condor,
    so you don’t have to


          Running a DAG
› DAGMan acts as a “meta-scheduler”,
 managing the submission of your jobs to
 Condor based on the DAG dependencies.
[Figure: DAGMan reads the .dag file and places job A in the Condor job queue; jobs B, C, and D have not yet been submitted]



   Running a DAG (cont’d)
› DAGMan holds & submits jobs to the
 Condor queue at the appropriate times.

[Figure: jobs B and C are in the Condor job queue; job D has not yet been submitted]



    Running a DAG (cont’d)
› In case of a job failure, DAGMan continues until it
  can no longer make progress, and then creates a
  “rescue” file with the current state of the DAG.

[Figure: job C is marked with an X as failed; the Condor job queue is empty, job D has not run, and DAGMan writes a rescue file]
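
The "run until no progress, then record the state" behavior can be sketched as a small simulation. This is plain Python under simplifying assumptions, not DAGMan itself: the `run_dag` helper is hypothetical, every job either succeeds at once or is listed in `failing`, and the returned set of finished nodes plays the role of the rescue file.

```python
def run_dag(parents, failing, already_done=()):
    """Run nodes whose parents are all done; stop when nothing more can run.

    Returns (done, not_run) as sorted lists of node names.
    """
    done = set(already_done)
    progressed = True
    while progressed:
        progressed = False
        for node in sorted(parents):
            if node in done or node in failing:
                continue
            if parents[node] <= done:  # every parent has completed
                done.add(node)
                progressed = True
    not_run = sorted(set(parents) - done)
    return sorted(done), not_run


diamond = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

# First attempt: job C fails, so D can never start.
first_done, first_left = run_dag(diamond, failing={"C"})

# Second attempt: start from the recorded state once C has been fixed.
second_done, second_left = run_dag(diamond, failing=set(),
                                   already_done=first_done)
```

The first pass finishes A and B and leaves C and D. The second pass skips the finished nodes and runs only C and then D.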



        Recovering a DAG
› Once the failed job is ready to be re-run,
  the rescue file can be used to restore the
  prior state of the DAG.
[Figure: DAGMan reads the rescue file and places job C back in the Condor job queue]



 Recovering a DAG (cont’d)
› Once that job completes, DAGMan will
 continue the DAG as if the failure never
 happened.
[Figure: job D is in the Condor job queue]



            Finishing a DAG
› Once the DAG is complete, the DAGMan
 job itself is finished, and exits.

[Figure: all four jobs have run and the Condor job queue is empty]



      Additional DAGMan
           Features
› Provides other handy features
 for job management…
  nodes can have PRE & POST scripts
  failed nodes can be automatically re-
   tried a configurable number of times
  job submission can be “throttled”
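
The automatic re-try idea can be sketched in general terms. This is an illustration in plain Python, not DAGMan's implementation or syntax: `run_with_retries` is a hypothetical helper, and `attempt_job` stands for whatever submits a node and reports success or failure.

```python
def run_with_retries(attempt_job, max_retries):
    """Call attempt_job() until it succeeds, allowing up to max_retries re-tries.

    Returns (succeeded, attempts_used).
    """
    attempts = 0
    while True:
        attempts += 1
        if attempt_job():
            return True, attempts
        if attempts > max_retries:  # the first try plus max_retries re-tries
            return False, attempts


# A node that fails twice and then succeeds, with three re-tries allowed.
outcomes = iter([False, False, True])
succeeded, attempts_used = run_with_retries(lambda: next(outcomes),
                                            max_retries=3)

# A node that always fails gives up after the first try plus two re-tries.
gave_up = run_with_retries(lambda: False, max_retries=2)
```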


       General User Commands
›   condor_status               View Pool Status
›   condor_q                    View Job Queue
›   condor_submit               Submit new Jobs
›   condor_rm                   Remove Jobs
›   condor_prio                 Intra-User Prios
›   condor_history              Completed Job Info
›   condor_submit_dag           Submit new DAG
›   condor_checkpoint           Force a checkpoint
›   condor_compile              Link Condor library


    Condor Job Universes
• Serial Jobs
  • Vanilla Universe
  • Standard Universe
  • Grid Universe
  • Scheduler Universe
  • Local Universe
  • Java Universe
• Parallel Jobs
  • MPI Universe
  • PVM Universe
  • Parallel Universe

    Why have a special
  Universe for Java jobs?
› Java Universe provides more than just
 inserting “java” at the start of the execute
 line of a vanilla job:
  Knows which machines have a JVM installed
  Knows the location, version, and performance of
   JVM on each machine
  Knows about jar files, etc.
  Provides more information about Java job
   completion than just JVM exit code
     • Program runs in a Java wrapper, allowing Condor to
       report Java exceptions, etc.



       Universe Java Job
› Example Java Universe Submit file:
Universe = java
Executable = Main.class
jar_files = MyLibrary.jar
Input = infile
Output = outfile
Arguments = Main 1 2 3
Queue


               Java support, cont.
bash-2.05a$   condor_status -java
  Name         JavaVendor Ver     State         Actv   LoadAv Mem
abulafia.cs   Sun Microsy 1.5.0_ Claimed        Busy   0.180   503
acme.cs.wis   Sun Microsy 1.5.0_ Unclaimed      Idle   0.000   503
adelie01.cs   Sun Microsy 1.5.0_ Claimed        Busy   0.000 1002
adelie02.cs   Sun Microsy 1.5.0_ Claimed        Busy   0.000 1002
…
                  Total Owner Claimed Unclaimed Matched Preempting
    INTEL/LINUX    965 179     516       250      20          0
  INTEL/WINNT50    102    6     65        31       0          0
SUN4u/SOLARIS28      1    0      0         1       0          0
   X86_64/LINUX    128    2    106        20       0          0

          Total    1196 187     687          302          20         0



Frieda wants Condor features
     on remote resources
› She wants to run standard universe
 jobs on Grid-managed resources
  For matchmaking and dynamic scheduling
   of jobs
  For job checkpointing and migration
  For remote system calls




        Condor GlideIn
› Frieda can use the Grid Universe to run
    Condor daemons on Grid resources
›   When the resources run these GlideIn
    jobs, they will temporarily join her Condor
    Pool
›   She can then submit Standard, Vanilla,
    PVM, or MPI Universe jobs and they will be
    matched and run on the remote resources
›   Currently only supports Globus GT2
    We hope to fix this limitation



[Figure: Frieda's 600 Condor jobs in her Condor Pool, which flocks to a Friendly Condor Pool and sends glide-in jobs to PBS, LSF, and Condor resources on the remote Grid]



               How It Works
[Figure: Condor jobs enter a Personal Condor (Master, Collector & Negotiator, Schedd, Shadow, Grid Manager), which sends GlideIn jobs to a remote resource (LSF), where a Startd and Starter run the User Job]



           GlideIn Concerns
› What if the remote resource kills my GlideIn job?
    That resource will disappear from your pool and your jobs
     will be rescheduled on other machines
    Standard universe jobs will resume from their last
     checkpoint as usual
› What if all my jobs are completed before a
  GlideIn job runs?
    If a GlideIn Condor daemon is not matched with a job in
     10 minutes, it terminates, freeing the resource




             In Review
With Condor’s help, Frieda can:
  Manage her compute job workload
  Access local machines
  Access remote Condor Pools via flocking
  Access remote compute resources on
   the Grid via “Grid Universe” jobs
  Carve out her own personal Condor Pool
   from the Grid with GlideIn technology


Advanced Topics




     Administrator Commands
›   condor_vacate                 Leave a machine now
›   condor_on                     Start Condor
›   condor_off                    Stop Condor
›   condor_reconfig               Reconfig on-the-fly
›   condor_config_val             View/set config
›   condor_userprio               User Priorities
›   condor_stats                  View detailed usage
                                   accounting stats


My boss wants to watch what
      Condor is doing




              Use CondorView!
› Provides visual graphs of current and past
    utilization
›   Data is derived from Condor's own accounting
    statistics
›   Interactive Java applet
›   Quickly and easily view:
     How much Condor is being used
     How many cycles are being delivered
     Who is using them
     Utilization by machine platform or by user



CondorView Usage Graph




        A Common Question
› My Personal Condor is flocking with a bunch of
  Solaris and Linux machines, and also doing a
  GlideIn to a SGI O2K. I do not want to statically
  partition my jobs.
    Solution: In your submit file, specify:
    Executable = myjob.$$(OpSys).$$(Arch)
    Requirements = (Arch=="INTEL" && OpSys=="LINUX")\
      ||(Arch=="SUN4u" && OpSys=="SOLARIS28")\
      ||(Arch=="SGI" && OpSys=="IRIX65")
    The “$$(xxx)” notation is replaced with attributes from
     the machine ClassAd which was matched with your job.
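
The substitution itself is easy to picture. This is a minimal sketch in plain Python, not Condor's own expansion code: `expand_dollar_dollar` is a hypothetical helper and the two machine dictionaries stand in for matched machine ClassAds.

```python
import re


def expand_dollar_dollar(template, machine_ad):
    """Replace each $$(Attr) with the matched machine ad's value for Attr."""
    def lookup(match):
        return str(machine_ad[match.group(1)])
    return re.sub(r"\$\$\(([A-Za-z_][A-Za-z0-9_]*)\)", lookup, template)


# Hypothetical matched machines.
linux_box = {"OpSys": "LINUX", "Arch": "INTEL"}
irix_box = {"OpSys": "IRIX65", "Arch": "SGI"}

linux_exe = expand_dollar_dollar("myjob.$$(OpSys).$$(Arch)", linux_box)
irix_exe = expand_dollar_dollar("myjob.$$(OpSys).$$(Arch)", irix_box)
```

One submit file therefore covers every platform, as long as a binary with the matching name (for example myjob.LINUX.INTEL) exists for each platform that Requirements allows.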




      Thank you!
  Check us out on the Web:
http://www.condorproject.org

          Email:
 condor-admin@cs.wisc.edu




						