Interactive Condor Tutorial
Carey Kireyev
ckireyev@cs.wisc.edu
Boulder ACM Seminar
June 12, 2004

Creating a Grid with Condor
In this tutorial we will walk through the complete process of setting up a real computing Grid with Condor, including:
- Installing Condor software on your machine
- Starting Condor on your machine
- Connecting to the pool
- Using Condor tools
- Submitting a job and watching it execute remotely

Oct 16, 2004 Interactive Condor Tutorial 2

Creating a Grid with Condor
Requirements:
- Computer(s) running Windows, Linux, or MacOS
- Network / DNS
- Windows users: unless you have SP2, stop the Firewall service:
  Control Panel > Admin Tools > Services > Internet Connection Firewall (ICF) / Internet Connection Sharing (ICS) > Stop
- Condor software

Downloading Condor
- Go to the Condor website: http://www.condorproject.org
- Click on "Download Condor Software"
- Choose "Condor 6.6.7" (the latest stable release)
- Subscribe to "condor-world/condor-users"
- For UNIX/Linux, pick the dynamic executable (2nd column)
- Fill in user information
- Pick the correct operating system

Installing Condor on Windows
On Windows XP/2000/NT:
1. Run the downloaded InstallShield executable
2. Enter your name and company name
3. Pick "Joining an existing pool"
4. Enter the name of the central manager: central-manager.gridtutorial.com
5. Answer "yes" to:
   - Submit jobs to the Condor pool
   - Allow jobs to run on this machine
6. Pick an installation directory, e.g.: C:\Condor
7. Java Universe: leave blank (you can always configure it later)
8. Email/SMTP: leave blank
9. Domain: gridtutorial.com
10. Access:
   - Read (who can query jobs?): *.gridtutorial.com
   - Write (who can submit jobs?): *.gridtutorial.com
   - Admin (who can reconfigure?): leave blank
11. Pick "Always run Condor jobs"
12. Pick "Leave job in memory"
13. After installation, go to Control Panel > Admin Tools > Services and make sure the Condor service is started

Installing Condor on Linux
1. Untar the tarball:
    tar xzf condor-6.6.7-linux-x86-glibc23-dynamic.tar.gz
2. Install:
    ./condor_configure --install --install-dir=/opt/condor --central-manager=central-manager.gridtutorial.com --type=submit,execute
3. Set $CONDOR_CONFIG to /opt/condor/etc/condor_config
4. Add /opt/condor/bin and /opt/condor/sbin to your $PATH
5. Start Condor:
    /opt/condor/sbin/condor_master

Condor daemons
- Look at the processes running on your computer:
  - Linux: type ps -x
  - Windows: Ctrl-Alt-Del > Task Manager > Processes
- You should see the following processes:
  - condor_master: manages the other Condor daemons
  - condor_schedd: keeps track of your job queue
  - condor_startd: runs jobs on your machine
- The SchedD allows you to submit jobs; the StartD allows jobs to be executed. You can configure which capabilities you want for each machine in your pool.

Configuration files
- Condor has many "knobs"
- On Windows:
    C:\Condor\condor_config
    C:\Condor\condor_config.local
- On Linux:
    /opt/condor/etc/condor_config
    /opt/condor/local.myhost/condor_config.local
- For a pool, the common configuration entries usually live in a shared file, and the machine-specific entries in the local file

Logs
- Daemons log all their events; the logs are useful for troubleshooting
- On Windows: C:\Condor\log\...
- On Linux: /opt/condor/local.myhost/log/…
- Logs rotate automatically after they reach a certain size
- Log detail can be changed by config entries, e.g.:
    SCHEDD_DEBUG = D_FULLDEBUG
  (see the Condor manual for more debug flags)

Using Condor tools
- Go to a shell prompt (Windows: Start > Run > cmd)
- Check your job queue with condor_q (condor_q is in the Condor bin directory):

    C:\>condor_q

    -- Submitter: IBM-F9D9C420761 : <128.105.48.190:3292> : IBM-F9D9C420761
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD

    0 jobs; 0 idle, 0 running, 0 held

- There are currently no jobs in your queue; there will be once you submit them

Using Condor tools
- Look at your pool with condor_status:

    C:\> condor_status

    Name               OpSys  Arch   State    Activity  LoadAv  Mem   ActivityTime
    purple.gridtutori  LINUX  INTEL  Claimed  Busy      0.990   500   0+01:00
    central-manager.g  LINUX  INTEL  Claimed  Busy      1.000   1004  0+00:34
    yellow.gridtutori  WINNT  INTEL  Owner    Idle      0.000   500   0+06:40

- This information is collected by the Collector
- condor_status has many options:
  - condor_status -long to see detailed information
  - condor_status -total to see totals

Submitting a job to the Condor pool
- Prepare your job:
  - It must be able to run in the background: no interactive input, windows, GUI, etc.
  - It can still use stdin, stdout, and stderr (the keyboard and the screen), but files are used for these instead of the actual devices
  - Make sure dynamic libraries are available, or don't use them at all!
  - Scripts are OK, but make sure the interpreter is installed!
  - Organize data files

Simple Job (Windows)

    @echo off
    echo Hello, I'm a grid job
    echo I am running on a machine:
    hostname
    echo The time is:
    date /T

- Save to disk, e.g.: C:\Condor\submit\my_job.bat

Simple Job (Linux)

    #!/bin/sh
    echo "Hello, I'm a grid job"
    echo "I am running on a machine:"
    hostname
    echo "The time is:"
    date

- Save to disk, e.g.: /opt/condor/submit/my_job.sh
- Make it executable: chmod +x my_job.sh

Creating a Submit Description File
- A plain ASCII text file
- Tells Condor about your job:
  - executable
  - universe
  - input, output and error files
  - user log
  - command-line arguments
  - environment variables
  - any special requirements or preferences
  - custom attributes
- Useful macros for submitting multiple runs of different datasets

Creating a Submit Description File
- Let's create it in:
  - C:\Condor\submit\my_job.submit (Windows)
  - /opt/condor/submit/my_job.submit (Linux)

    Universe = vanilla
    # On Windows, use instead: Executable = my_job.bat
    Executable = my_job.sh
    Output = my_job.output
    Error = my_job.error
    Log = my_job.log
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    Queue

Windows users: storing credentials
- Make sure your account is password-protected
- Windows users must first run:
    condor_store_cred add
- Enter the password for your account
- Why? condor_store_cred stores the password of a user/domain pair securely in the Windows registry. Using this stored password, Condor is able to run jobs with the user ID of the submitting user when running scheduler universe jobs and DAGMan. In addition, Condor uses this password to acquire the submitting user's credentials when writing output or log files. The password is stored in the same manner as the system does when setting or changing account passwords.
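The submit description file slides mention macros for submitting multiple runs over different datasets. The sketch below shows one way to generate such a file from the shell before running condor_submit. It relies on the standard $(Process) macro, which condor_submit expands to each job's index within the cluster (0, 1, 2, ...). The function name write_submit_file and the per-run input files data.0, data.1, ... are hypothetical and exist only for this illustration.

```sh
#!/bin/sh
# Hypothetical helper: write a submit description file that queues
# several runs of my_job.sh, one per (hypothetical) dataset data.N.
write_submit_file() {
    submit_file=$1   # output path, e.g. my_job.submit
    runs=$2          # number of jobs to queue

    # The quoted 'EOF' delimiter stops the shell from expanding $(Process);
    # condor_submit expands it later to 0, 1, ..., runs-1 for each job.
    cat > "$submit_file" <<'EOF'
Universe   = vanilla
Executable = my_job.sh
# Hypothetical per-run inputs: data.0, data.1, ...
Input      = data.$(Process)
Output     = my_job.$(Process).output
Error      = my_job.$(Process).error
Log        = my_job.log
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
EOF
    # Only the Queue count needs shell expansion, so append it separately.
    printf 'Queue %s\n' "$runs" >> "$submit_file"
}

# Example: three runs in one cluster, reading data.0, data.1 and data.2.
write_submit_file my_job.submit 3
```

Submitting the generated file with condor_submit my_job.submit should place three jobs in a single cluster. Each job writes its own output and error files, and all of them share one user log.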
Running condor_submit
- You give condor_submit the name of the submit file you have created:
    condor_submit my_job.submit
- condor_submit does the following:
  - parses the file
  - checks for errors
  - creates a "ClassAd" that describes your job(s)
  - sends your job's ClassAd(s) and executable to the SchedD, which stores the job in its queue
- The SchedD reports the job ad(s) to the Central Manager, which tries to match them with a resource

See your job in the queue
- condor_q

    C:\>condor_q

    -- Submitter: IBM-F9D9C420761 : <128.105.48.190:3292> : IBM-F9D9C420761
     ID      OWNER      SUBMITTED     RUN_TIME   ST PRI SIZE  CMD
      2.0    ckireyev   10/16 12:59   0+0:00:00  I  0   372.3 my_job.ba

    1 jobs; 1 idle, 0 running, 0 held

- You can see your job in the job queue
- The job is in the "I" (idle) state: it has not started running yet

See the job's ClassAd
- condor_q -l (long view)

    MyType = "Job"
    Owner = "ckireyev"
    Cmd = "/opt/condor-6.9./submit/my_job.sh"
    UserLog = "/opt/condor-6.6.7/submit/my_job.log"
    In = "/dev/null"
    Out = "my_job.output"
    Err = "my_job.error"
    Requirements = (Arch == "INTEL") && (OpSys == "LINUX") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) && (TARGET.FileSystemDomain == MY.FileSystemDomain)
    . . .

Job Requirements
- Notice the Requirements clause:
    Requirements = (Arch == "INTEL") && (OpSys == "LINUX") …
  - It is generated automatically, to ensure that the job runs on a compatible OS/architecture
- Users can specify custom requirements in the submit file, e.g.:
  - (HasJava == True), (Disk > 10000), (Name == "m1.grid.com")
  - "Don't run large executables"
- Machines can have requirements/preferences too

Job user log
- Specified in the submit file
- Optional but highly recommended!
- Documents the major milestones in the job's lifetime:

    [/opt/condor-6.6.7/submit] cat my_job.log
    000 (002.000.000) 10/12 15:50:47 Job submitted from host: <128.105.121.21:45261>
    ...

- Well-defined format
- Can be "monitored" by a script
- Condor can email the user log to the submitter when the job completes or fails

Meanwhile, the gears are turning…
- Meanwhile:
  - The SchedD reports jobs to the Collector
  - The Negotiator retrieves the jobs from the Collector…
  - …analyzes all the job ClassAds…
  - …compares them with all the machine ads…
  - …finds machines to run the jobs on ("matchmaking")…
  - …and notifies the SchedDs about the StartDs available to them
- To see the "matching process", look in the log, e.g. /opt/condor-6.6.7/local./log/NegotiatorLog on the Central Manager machine

See the job run…
- condor_q: notice the job in the "R" (running) state:

    [/opt/condor-6.6.7/submit] condor_q

     ID      OWNER      SUBMITTED     RUN_TIME   ST PRI SIZE  CMD
      2.0    ckireyev   10/16 12:59   0+21:29:33 R  0   372.3 my_job.sh

    1 jobs; 0 idle, 1 running, 0 held

- Notice the new event in the user log:

    [/opt/condor-6.6.7/submit] cat my_job.log
    000 (002.000.000) 10/12 15:50:47 Job submitted from host: <128.105.121.21:45261>
    ...
    001 (002.000.000) 10/12 15:55:55 Job executing on host: <128.105.121.21:45260>
    ...

If all goes well…
- Eventually you can see that the job is finished:

    [/opt/condor-6.6.7/submit] cat my_job.log
    ...
    005 (002.000.000) 10/12 15:55:55 Job terminated.
        (1) Normal termination (return value 0)
            Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
            Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
            Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
            Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
        0  -  Total Bytes Sent By Job
        0  -  Total Bytes Received By Job

Job output…
- Notice the output file (the job's stdout):

    [/opt/condor-6.6.7/submit] cat my_job.output
    Hello, I'm a grid job
    I am running on a machine:
    blue.gridtutorial.com
    The time is:
    Tue Oct 12 16:16:36 CDT 2004

- Similarly, the job's stderr is in my_job.error
- If the job had created any other output files during its execution on the remote machine, they would also have been transferred back when the job completed

What if your job doesn't run?
- Ideally, everything works great. But in the Grid world, problems are common.
- Your job may not run for many reasons:
  - There are no available machines in the pool:
    - they're running other Condor jobs or being used by their owners
    - they don't fit your job's requirements
  - Network communication errors (can't connect to the Central Manager or the remote host)
  - Authentication errors
  - File permission errors

condor_q -analyze
- Try condor_q -analyze:

    [/opt/condor-6.6.7/submit] condor_q -analyze
    004.000:  Run analysis summary.  Of 7 machines,
          7 are rejected by your job's requirements
          0 reject your job because of their own requirements
          0 match but are serving users with a better priority in the pool
          0 match but reject the job for unknown reasons
          0 match but will not currently preempt their existing job
          0 are available to run your job

    WARNING:  Be advised:
       No resources matched request's constraints
       Check the Requirements expression below:

    Requirements = ((Disk >= 1000000000)) && (Arch == "INTEL") && (OpSys == "LINUX") && ((Memory * 1024) >= ImageSize) && (TARGET.FileSystemDomain == MY.FileSystemDomain)

- Here the custom (Disk >= 1000000000) clause is the likely culprit: Disk is measured in kilobytes, so it asks for roughly a terabyte of free disk space

Check the SchedD log
- Next, look at the SchedD log for clues about what the SchedD was able to do with this job:
    .../local.myhost/log/SchedLog
- If necessary, increase the debug level:
  - Set SCHEDD_DEBUG = D_FULLDEBUG in the local config file (condor_config.local)
  - Run condor_reconfig (to make the daemons re-read the config files)
- Potential causes:
  - The SchedD can't contact the Collector
  - The SchedD can't authenticate with the Collector
  - The Collector won't allocate any machines for the user

Check the Shadow log
- Check the Shadow log for information about what happens after a successful match has been made:
    .../local.myhost/log/ShadowLog
- You can increase the debug detail with SHADOW_DEBUG = D_FULLDEBUG
- Possible causes of problems:
  - The Shadow can't connect to the Starter assigned to it
  - The Shadow can't read the executable or input file(s)

If all else fails…
- Check the Condor manual: http://www.cs.wisc.edu/condor/manual/v6.6
- Post your questions on the condor-users mailing list
- Send your question/problem to our support system:
  - condor-admin@cs.wisc.edu
  - Your question will be answered by a Condor developer
  - Paid "VIP" support is also available
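The user log slides note that the log has a well-defined format and can be "monitored" by a script. Below is a minimal sketch of such a script, assuming only the event-header layout shown in the sample logs (a 3-digit event code followed by "(cluster.proc.subproc)"). It maps a handful of standard event codes: 000 submitted, 001 executing, 004 evicted, 005 terminated, 009 aborted, 012 held, 013 released. All other codes are ignored. The function names job_states and wait_for_jobs are hypothetical.

```sh
#!/bin/sh
# Sketch: print the latest known state of every job in a Condor user log.
# Event header lines look like:
#   005 (002.000.000) 10/12 15:55:55 Job terminated.
job_states() {
    awk '
        # An event header: 3-digit event code, then (cluster.proc.subproc)
        /^[0-9][0-9][0-9] \([0-9]+\.[0-9]+\.[0-9]+\)/ {
            code = $1
            id = $2
            gsub(/[()]/, "", id)          # "(002.000.000)" -> "002.000.000"
            if      (code == "000") s = "submitted"
            else if (code == "001") s = "executing"
            else if (code == "004") s = "evicted"
            else if (code == "005") s = "terminated"
            else if (code == "009") s = "aborted"
            else if (code == "012") s = "held"
            else if (code == "013") s = "released"
            else next                     # ignore all other event codes
            if (!(id in state)) order[++n] = id   # keep first-seen order
            state[id] = s
        }
        END { for (i = 1; i <= n; i++) print order[i], state[order[i]] }
    ' "$1"
}

# Poll the log until every job in it has terminated or been aborted.
# A held job keeps this waiting, since it needs attention anyway.
wait_for_jobs() {
    while job_states "$1" | grep -qv -e ' terminated$' -e ' aborted$'; do
        sleep "${2:-30}"                  # poll interval in seconds
    done
}
```

For the complete log shown on the "If all goes well…" slide, job_states my_job.log prints 002.000.000 terminated. Polling the log from a script keeps the monitor independent of the Condor tools. The trade-off is that it only sees what the SchedD has already written to the log.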
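The troubleshooting slides suggest raising a daemon's log detail by setting a knob such as SCHEDD_DEBUG = D_FULLDEBUG or SHADOW_DEBUG = D_FULLDEBUG in condor_config.local, and then running condor_reconfig. The sketch below scripts those two steps, under the assumption that the local config file path is passed in (for example /opt/condor/local.myhost/condor_config.local). The function name enable_full_debug is hypothetical. It refuses to touch a knob that is already set, so an existing setting is never silently overridden.

```sh
#!/bin/sh
# Sketch: enable full debug logging for one daemon by appending
# "<DAEMON>_DEBUG = D_FULLDEBUG" to the local config file, then asking
# the running daemons to re-read their configuration.
enable_full_debug() {
    config=$1    # local config file, e.g. .../condor_config.local
    daemon=$2    # daemon prefix, e.g. SCHEDD or SHADOW
    knob="${daemon}_DEBUG"

    # Leave the file alone if this knob is already set in it.
    if grep -q "^[[:space:]]*${knob}[[:space:]]*=" "$config" 2>/dev/null; then
        echo "$knob is already set in $config; edit it by hand" >&2
        return 1
    fi
    echo "$knob = D_FULLDEBUG" >> "$config"

    # condor_reconfig makes the daemons re-read the config files.
    if command -v condor_reconfig >/dev/null 2>&1; then
        condor_reconfig
    else
        echo "condor_reconfig not found on PATH; run it by hand" >&2
    fi
}
```

For example, enable_full_debug /opt/condor/local.myhost/condor_config.local SCHEDD would add SCHEDD_DEBUG = D_FULLDEBUG. More detail then shows up in the SchedLog.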
