Condor J2 + Developer APIs to Condor + A Tutorial on Condor’s Web Service Interface
Todd Tannenbaum
Computer Sciences Department, University of Wisconsin-Madison
matt@cs.wisc.edu, tannenba@cs.wisc.edu
http://www.cs.wisc.edu/condor
CondorJ2
Overview of CondorJ2
› Quill/Quill++: the database reflects the state of the Condor pool
› CondorJ2: the database is the state of the Condor pool
› Use a database to maintain operational data (workflow state, machine state, configuration policies, etc.)
› Implement workflow management, resource management, and resource allocation in a J2EE application server environment
› Modify the master, startd, and starter to be web service clients
› Provide a web interface for all system services (workflow submission, machine reconfiguration, etc.)
Motivation
› Flexibility
› Centralized administrability
› Attempt to leverage standard “enterprise” technology in this space
› Scalability: as big as you want, if you are willing to pay the big $$$
Java Application Servers
› Industrial-strength middleware for high-performance, scalable web applications
› Widely deployed systems: Oracle AS 10g, IBM WebSphere, BEA WebLogic, JBoss (open source)
› Key features
  • Database connection pooling
  • Support for transactions
  • Web service interfaces
  • Support for clustering (for scalability)
  • Pluggable security models / role-based authorization
  • Backend database independence
[Figure: CondorJ2 architecture. An application server hosts the machine, matchmaking, and workflow modules, the Condor pool web site, and the Condor web services, and talks to the Condor database via JDBC. Users reach it from a web browser (HTTP) or from custom tools and web service clients (SOAP over HTTP). On the execute machines, the master, startd, and starter also talk to the application server.]
[Figure: Scaling out CondorJ2. Several application servers behind a load balancer share the pool database via JDBC; execute machines running startd, starter, and jobs connect through a firewall/NAT using SOAP over HTTP.]
What can you do in CondorJ2 via browsers and web services?
› Add and configure new machines
› Reconfigure machines on the fly
› Specify, submit, monitor, and manage workflows
› Monitor global system state
› No matchmaking (yet)
› Where do we stand now?
  • CondorJ2 is currently research work. When will it ship? Will it ever ship? Only time will tell.
Interfacing Applications w/ Condor
› Suppose you have an application which needs a lot of compute cycles
› You want this application to utilize a pool of machines
› How can this be done?
Some Condor APIs
› MW (previous talk)
› Command line tools
  • condor_submit, condor_q, etc.
› DRMAA
› Condor GAHP
› Condor Perl Module
› SOAP
Command Line Tools
› Don’t underestimate them!
› Your program can create a submit file on disk and simply invoke condor_submit:
system("echo universe=VANILLA > /tmp/condor.sub");
system("echo executable=myprog >> /tmp/condor.sub");
. . .
system("echo queue >> /tmp/condor.sub");
system("condor_submit /tmp/condor.sub");
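The same on-disk approach can be sketched in Java without going through a shell. This is a minimal illustration, not a Condor API: the class and method names (`SubmitFileSketch`, `describe`, `submit`) are hypothetical, and it assumes `condor_submit` is on the `PATH`. Passing the file path as a separate `ProcessBuilder` argument avoids the quoting pitfalls of building `echo` commands.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Sketch: write a submit description to a temporary file, then run condor_submit on it. */
final class SubmitFileSketch {

    /** Renders one "command = value" line per entry, then the final queue statement. */
    static String describe(Map<String, String> commands) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : commands.entrySet()) {
            sb.append(e.getKey()).append(" = ").append(e.getValue()).append('\n');
        }
        return sb.append("queue\n").toString();
    }

    /** Writes the description to a temp file, runs condor_submit, and returns its exit code. */
    static int submit(Map<String, String> commands) throws IOException, InterruptedException {
        Path subFile = Files.createTempFile("condor", ".sub");
        try {
            Files.write(subFile, describe(commands).getBytes(StandardCharsets.UTF_8));
            // No shell is involved, so the path needs no quoting or escaping.
            Process p = new ProcessBuilder("condor_submit", subFile.toString())
                    .inheritIO()   // let condor_submit print to our console
                    .start();
            return p.waitFor();
        } finally {
            Files.deleteIfExists(subFile);   // condor_submit has finished reading it by now
        }
    }
}
```

Using a `LinkedHashMap` for the commands keeps the generated file in the order you wrote it, which makes it easier to compare against a hand-written submit file.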
Command Line Tools
› Your program can create a submit file and give it to condor_submit through stdin:
PERL:
open(SUBMIT, "|condor_submit");
print SUBMIT "universe=VANILLA\n";
. . .
close(SUBMIT);
C/C++:
FILE *s = popen("condor_submit", "w");
fputs("universe=VANILLA\n", s);
. . .
pclose(s);
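A Java equivalent of the stdin approach starts the process and writes the submit description to its standard input. This is a hedged sketch: `StdinSubmitSketch` and `runWithStdin` are hypothetical names, and it assumes `condor_submit` reads the description from stdin when given no file argument, as the slide describes. Closing stdin is what tells the child that the description is complete.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Sketch: feed text to a child process's stdin and collect its stdout. */
final class StdinSubmitSketch {

    /**
     * Starts the command, writes the input to its stdin, closes stdin (so the child
     * sees end-of-file), and returns everything the child printed on stdout.
     * Fine for small inputs such as a submit description; very large inputs would
     * need stdout to be drained concurrently to avoid blocking.
     */
    static String runWithStdin(List<String> command, String input)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT)   // show errors on our console
                .start();
        try (OutputStream stdin = p.getOutputStream()) {
            stdin.write(input.getBytes(StandardCharsets.UTF_8));
        }
        byte[] out;
        try (InputStream stdout = p.getInputStream()) {
            out = stdout.readAllBytes();
        }
        p.waitFor();
        return new String(out, StandardCharsets.UTF_8);
    }
}
```

For example, `runWithStdin(List.of("condor_submit"), "universe=VANILLA\nexecutable=myprog\nqueue\n")` would submit one job and return condor_submit's report.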
Command Line Tools
› Using a +Attribute with condor_submit:
universe = VANILLA
executable = /bin/hostname
output = job.out
log = job.log
+webuser = "zmiller"
queue
Command Line Tools
› Use -constraint and -format with condor_q:
% condor_q -constraint 'webuser=="zmiller"'
-- Submitter: bio.cs.wisc.edu : <128.105.147.96:37866> : bio.cs.wisc.edu
 ID        OWNER     SUBMITTED    RUN_TIME   ST PRI SIZE CMD
 213503.0  zmiller   10/11 06:00  0+00:00:00 I  0   0.0  hostname
% condor_q -constraint 'webuser=="zmiller"' -format "%i\t" ClusterId -format "%s\n" Cmd
213503 /bin/hostname
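Calling condor_q from Java removes one layer of quoting: without a shell, the constraint is passed as a single argument exactly as the ClassAd expression reads. The sketch below is illustrative (`CondorQSketch`, `command`, `parse`, and `query` are hypothetical names); it assumes `condor_q` is on the `PATH` and that the value contains no double quotes. It passes real tab and newline characters in the format strings, which produce the same separators the shell example shows.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch: run condor_q with -constraint/-format and parse "cluster<TAB>cmd" lines. */
final class CondorQSketch {

    static final class JobLine {
        final int clusterId;
        final String cmd;
        JobLine(int clusterId, String cmd) { this.clusterId = clusterId; this.cmd = cmd; }
    }

    /** Builds the argument list; no shell, so the constraint needs no extra quoting. */
    static List<String> command(String attr, String value) {
        String constraint = attr + "==\"" + value + "\"";   // e.g. webuser=="zmiller"
        return List.of("condor_q", "-constraint", constraint,
                       "-format", "%i\t", "ClusterId",
                       "-format", "%s\n", "Cmd");
    }

    /** Parses output lines of the form "213503<TAB>/bin/hostname", skipping malformed lines. */
    static List<JobLine> parse(String output) {
        List<JobLine> jobs = new ArrayList<>();
        for (String line : output.split("\n")) {
            String[] parts = line.split("\t", 2);
            if (parts.length < 2) continue;
            jobs.add(new JobLine(Integer.parseInt(parts[0].trim()), parts[1]));
        }
        return jobs;
    }

    /** Runs condor_q and returns the parsed jobs. */
    static List<JobLine> query(String attr, String value) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(attr, value)).start();
        String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        p.waitFor();
        return parse(out);
    }
}
```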
Command Line Tools
› condor_wait will watch a job log file and wait for a certain job (or all jobs) to complete:
system("condor_wait job.log");
› You can specify a timeout
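condor_wait has its own timeout option; as an alternative, a Java caller can enforce the timeout itself. This minimal sketch (hypothetical names `WaitSketch`, `runWithTimeout`) runs any command, such as `condor_wait job.log`, and gives up after a deadline, returning an empty result in that case.

```java
import java.io.IOException;
import java.util.List;
import java.util.OptionalInt;
import java.util.concurrent.TimeUnit;

/** Sketch: run a command such as "condor_wait job.log" with a caller-side timeout. */
final class WaitSketch {

    /** Returns the exit code, or empty if the timeout expired (the child is then terminated). */
    static OptionalInt runWithTimeout(List<String> command, long timeoutMillis)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        if (p.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return OptionalInt.of(p.exitValue());
        }
        p.destroy();   // ask the child to terminate
        p.waitFor();   // reap it so nothing is left behind
        return OptionalInt.empty();
    }
}
```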
Command Line Tools
› condor_q and condor_status have a -xml option
› So it is relatively simple to build on top of Condor’s command line tools alone, and they can be accessed from many different languages (C, Perl, Python, PHP, etc.)
› However…
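Picking up the -xml option: output in XML can be read with the JDK's built-in DOM parser instead of scraping columns. The sketch assumes the old-style ClassAd XML layout (`<classads>` containing one `<c>` per ad, with `<a n="Name">` attributes wrapping typed values such as `<i>`, `<s>`, or `<b v="t"/>`); that layout is stated from memory and should be checked against a real `condor_q -xml` dump. `ClassAdXmlSketch` and `parse` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Sketch: turn ClassAd XML output into one attribute map per ad (values kept as text). */
final class ClassAdXmlSketch {

    static List<Map<String, String>> parse(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            // Do not try to fetch the DTD named in a DOCTYPE line.
            f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            doc = f.newDocumentBuilder()
                   .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not parseable as ClassAd XML", e);
        }
        List<Map<String, String>> ads = new ArrayList<>();
        NodeList adNodes = doc.getElementsByTagName("c");            // one <c> per ClassAd
        for (int i = 0; i < adNodes.getLength(); i++) {
            Map<String, String> attrs = new LinkedHashMap<>();
            NodeList attrNodes = ((Element) adNodes.item(i)).getElementsByTagName("a");
            for (int j = 0; j < attrNodes.getLength(); j++) {
                Element a = (Element) attrNodes.item(j);             // <a n="Name">...</a>
                Element v = firstChildElement(a);                    // <i>, <s>, <b v="t"/>, ...
                String text = (v != null && v.hasAttribute("v")) ? v.getAttribute("v")
                                                                 : a.getTextContent();
                attrs.put(a.getAttribute("n"), text.trim());
            }
            ads.add(attrs);
        }
        return ads;
    }

    private static Element firstChildElement(Element e) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) return (Element) n;
        }
        return null;
    }
}
```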
DRMAA
› DRMAA is a GGF-standardized job submission API
› Has C (and now Java) bindings
› Is not Condor-specific: your app could submit to any job scheduler with minimal changes (probably just linking in a different library)
› SourceForge project: http://sourceforge.net/projects/condor-ext
DRMAA
› Easy to use, but…
› Unfortunately, the DRMAA API does not support some very important features, such as:
  • Two-phase commit
  • Fault tolerance
  • Transactions
Condor GAHP
› The Condor GAHP is a relatively low-level protocol based on simple ASCII messages through stdin and stdout
› Supports a rich feature set including two-phase commits, transactions, and optional asynchronous notification of events
› Is available in Condor 6.7.X
GAHP, cont
Example (S: lines are sent to the GAHP; R: lines are its replies):
R: $GahpVersion: 1.0.0 Nov 26 2001 NCSA\ CoG\ Gahpd $
S: GRAM_PING 100 vulture.cs.wisc.edu/fork
R: E
S: RESULTS
R: E
S: COMMANDS
R: S COMMANDS GRAM_JOB_CANCEL GRAM_JOB_REQUEST GRAM_JOB_SIGNAL GRAM_JOB_STATUS GRAM_PING INITIALIZE_FROM_FILE QUIT RESULTS VERSION
S: VERSION
R: S $GahpVersion: 1.0.0 Nov 26 2001 NCSA\ CoG\ Gahpd $
S: INITIALIZE_FROM_FILE /tmp/grid_proxy_554523.txt
R: S
S: GRAM_PING 100 vulture.cs.wisc.edu/fork
R: S
S: RESULTS
R: S 0
S: RESULTS
R: S 1
R: 100 0
S: QUIT
R: S
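A GAHP client mostly needs to split reply lines into tokens and check the leading result code. In the transcript above, spaces inside a single value are escaped with a backslash (`NCSA\ CoG\ Gahpd`), and replies start with `S` or `E`. The helper below is a hypothetical sketch (`GahpLineSketch`, `tokenize`, `isSuccess`, `escape`) that assumes backslash also escapes itself; it handles only line parsing, not process management.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: tokenize GAHP lines where "\ " is a literal space and "\\" a literal backslash. */
final class GahpLineSketch {

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (ch == '\\' && i + 1 < line.length()) {
                cur.append(line.charAt(++i));     // take the escaped character literally
            } else if (ch == ' ') {
                if (cur.length() > 0) {           // an unescaped space ends a token
                    tokens.add(cur.toString());
                    cur.setLength(0);
                }
            } else {
                cur.append(ch);
            }
        }
        if (cur.length() > 0) tokens.add(cur.toString());
        return tokens;
    }

    /** True if a reply line starts with the "S" (success) code. */
    static boolean isSuccess(String reply) {
        List<String> t = tokenize(reply);
        return !t.isEmpty() && t.get(0).equals("S");
    }

    /** Escapes backslashes and spaces in an argument before sending it. */
    static String escape(String arg) {
        return arg.replace("\\", "\\\\").replace(" ", "\\ ");
    }
}
```

For instance, `escape("/tmp/my proxy.txt")` yields a single token that `tokenize` turns back into the original path.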
Condor Perl Module
› Perl module to parse the “job log file”
› Recommended instead of polling with condor_q
› Call-back event model
› (Note: the job log can be written in XML)
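The call-back model can be illustrated in Java: scan the job log and invoke a handler per event instead of polling condor_q. This is a Java analogue, not the Perl module's API; `JobLogSketch`, `Event`, and `scan` are hypothetical. It assumes the classic text log layout, where each event header is a three-digit code, `(cluster.proc.subproc)`, and a timestamp, and events are separated by `...` lines; the XML log variant would need a different parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of a call-back style job log reader (a Java analogue, not the Perl module itself). */
final class JobLogSketch {

    static final class Event {
        final int code;      // e.g. 0 = submitted, 1 = executing, 5 = terminated
        final int cluster;
        final int proc;
        final String text;   // rest of the header line: timestamp and description
        Event(int code, int cluster, int proc, String text) {
            this.code = code; this.cluster = cluster; this.proc = proc; this.text = text;
        }
    }

    // Header lines look like: 005 (213503.000.000) 10/11 06:02:00 Job terminated.
    private static final Pattern HEADER =
            Pattern.compile("(\\d{3}) \\((\\d+)\\.(\\d+)\\.(\\d+)\\) (.*)");

    /** Calls onEvent once per event header; detail lines and "..." separators are skipped. */
    static void scan(BufferedReader log, Consumer<Event> onEvent) throws IOException {
        for (String line = log.readLine(); line != null; line = log.readLine()) {
            Matcher m = HEADER.matcher(line);
            if (m.matches()) {
                onEvent.accept(new Event(Integer.parseInt(m.group(1)),
                        Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3)), m.group(5)));
            }
        }
    }
}
```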
SOAP
› Simple Object Access Protocol
  • Mechanism for doing RPC using XML (typically over HTTP or HTTPS)
  • A World Wide Web Consortium (W3C) standard
› SOAP Toolkit: transforms a WSDL into a client library
Benefits of a Condor SOAP API
› Condor becomes a service
  • Can be accessed with standard web service tools
› Condor is accessible from platforms where its command-line tools are not supported
› Talk to Condor with your favorite language and SOAP toolkit
Condor SOAP API functionality
› Submit jobs
› Retrieve job output
› Remove/hold/release jobs
› Query machine status
› Query job status
Getting machine status via SOAP
[Figure: Your program calls queryStartdAds() through a SOAP library; the request travels as SOAP over HTTP to the condor_collector, which returns the machine list.]
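To show what the SOAP library does on your program's behalf, the sketch below builds a SOAP 1.1 envelope by hand and POSTs it over HTTP. It is illustrative only: the namespace `urn:condor` and the single `constraint` parameter are placeholder assumptions, not taken from Condor's WSDL, and `RawSoapSketch` is a hypothetical name. In practice a toolkit such as Axis generates this plumbing from the WSDL.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Sketch of the SOAP-over-HTTP exchange a generated stub performs for you. */
final class RawSoapSketch {

    // ASSUMPTION: placeholder namespace and parameter name; use the values in Condor's WSDL.
    static final String NS = "urn:condor";

    /** Builds a SOAP 1.1 envelope for a one-argument operation. */
    static String envelope(String operation, String constraint) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\""
                + " xmlns:c=\"" + NS + "\"><soap:Body>"
                + "<c:" + operation + "><constraint>" + escapeXml(constraint) + "</constraint>"
                + "</c:" + operation + "></soap:Body></soap:Envelope>";
    }

    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** POSTs the envelope and returns the raw XML reply. */
    static String post(String endpoint, String envelope) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        conn.setRequestProperty("SOAPAction", "\"\"");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(envelope.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```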
Let’s get some details…
The API
› The core API, described with WSDL, is designed to be as flexible as possible
  • File transfer is done in chunks
  • Transactions are explicit
› Wrapper libraries aim to make common tasks as simple as possible
  • Currently in Java and C#
  • Expose an object-oriented interface
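Chunked file transfer means a client sends a file as a series of (offset, bytes) pieces rather than in one message. The sketch shows only that chunking loop; `ChunkSink` is a hypothetical stand-in for whatever per-chunk remote call the generated client exposes, and is not Condor's actual operation signature.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Sketch: split a stream into fixed-size chunks, each tagged with its file offset. */
final class ChunkedUploadSketch {

    /** HYPOTHETICAL stand-in for a per-chunk remote call; not the real Condor operation. */
    interface ChunkSink {
        void send(long offset, byte[] data) throws IOException;
    }

    /** Sends the input in chunks of at most chunkSize bytes; returns the total bytes sent. */
    static long upload(InputStream in, int chunkSize, ChunkSink sink) throws IOException {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        byte[] buf = new byte[chunkSize];
        long offset = 0;
        int n;
        while ((n = in.readNBytes(buf, 0, chunkSize)) > 0) {   // 0 means end of stream
            sink.send(offset, Arrays.copyOf(buf, n));            // copy so the sink may keep it
            offset += n;
        }
        return offset;
    }
}
```

Small chunks keep each SOAP message bounded in size, at the cost of more round trips; the wrapper libraries hide this loop from the caller.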
Condor setup
› Start with a working condor_config
› The SOAP interface is off by default
  • Turn it on by adding ENABLE_SOAP=TRUE
› Access to the SOAP interface is denied by default
  • Set ALLOW_SOAP and DENY_SOAP; they work like ALLOW_READ/WRITE/…
  • See section 3.7.4 of the v6.7 manual for a description
  • Example: ALLOW_SOAP=*/*.cs.wisc.edu
› If using HTTP, you must set QUEUE_ALL_USERS_TRUSTED=TRUE (not needed/wanted with HTTPS)
Necessary tools
› You need a SOAP toolkit
  • Apache Axis (Java) - http://ws.apache.org/axis/
  • Microsoft .Net - http://microsoft.com/net/
  • gSOAP (C/C++) - http://gsoap2.sf.net/
  • ZSI (Python) - http://pywebsvcs.sf.net/
  • SOAP::Lite (Perl) - http://soaplite.com/
  • All our examples are in Java using Apache Axis
› You need Condor’s WSDL files
  • Find them in lib/webservice/ in your Condor release
› Put the two together to generate a client library
  $ java org.apache.axis.wsdl.WSDL2Java condorSchedd.wsdl
› Compile that client library
  $ javac condor/*.java
Helpful tools
› The core API has some complex spots
› A wrapper library is available in Java and C#
  • Makes the API a bit easier to use (e.g. simpler file transfer & job ad submission)
  • Makes the API more OO, no need to remember and pass around transaction ids
› We are going to use the Java wrapper library for our examples
  • You can download it from http://www.cs.wisc.edu/condor/birdbath/birdbath.jar
  • Will be included in Condor release
Submitting a job
› The CLI way…
cp.sub (explicit bits):
universe = vanilla
executable = /bin/cp
arguments = cp.sub cp.worked
should_transfer_files = yes
transfer_input_files = cp.sub
when_to_transfer_output = on_exit
queue 1
Implicit bits:
clusterid = X
procid = Y
owner = matt
requirements = Z
$ condor_submit cp.sub
Submitting a job
› The SOAP way…
1. Begin transaction
2. Create cluster
3. Create job
4. Send files
5. Describe job
6. Commit transaction
› Repeat steps 2-5 to submit multiple clusters; repeat steps 3-5 to submit multiple jobs in a single cluster
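The six steps above map naturally onto a begin/try/commit pattern, with an abort if any step fails so that a half-built submission is discarded. The interface below is HYPOTHETICAL and only mirrors the steps; it is not the birdbath or core Condor API, whose method names and signatures may differ.

```java
import java.util.List;

/** Sketch: the six submission steps, with abort-on-failure, against a hypothetical interface. */
final class SubmitSequenceSketch {

    /** HYPOTHETICAL interface mirroring the steps; not the actual Condor or birdbath API. */
    interface SubmitTransaction {
        void begin(int timeoutSeconds);
        int createCluster();
        int createJob(int cluster);
        void sendFile(int cluster, int job, String name);
        void describeJob(int cluster, int job, String executable, String args);
        void commit();
        void abort();
    }

    /** Submits one cluster of jobCount jobs; aborts the transaction if any step throws. */
    static void submitCluster(SubmitTransaction x, int jobCount, String executable, List<String> files) {
        x.begin(30);                                                  // 1. begin transaction
        try {
            int cluster = x.createCluster();                          // 2. create cluster
            for (int i = 0; i < jobCount; i++) {                      // repeat 3-5 per job
                int job = x.createJob(cluster);                       // 3. create job
                for (String f : files) x.sendFile(cluster, job, f);   // 4. send files
                x.describeJob(cluster, job, executable, "");          // 5. describe job
            }
            x.commit();                                               // 6. commit transaction
        } catch (RuntimeException e) {
            x.abort();                                                // discard partial work
            throw e;
        }
    }
}
```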
Submission from Java
Schedd schedd = new Schedd("http://…");
Transaction xact = schedd.createTransaction();
xact.begin(30);                                 // 1. Begin transaction
int cluster = xact.createCluster();             // 2. Create cluster
int job = xact.createJob(cluster);              // 3. Create job
File[] files = { new File("cp.sub") };
xact.submit(cluster, job, "owner", UniverseType.VANILLA, "/bin/cp",
            "cp.sub cp.worked", "requirements", null, files);
                                                // 4 & 5. Send files & describe job
xact.commit();                                  // 6. Commit transaction
Submission from Java
Schedd schedd = new Schedd("http://…");         // Schedd’s location
Transaction xact = schedd.createTransaction();
xact.begin(30);                                 // max time between calls (seconds)
int cluster = xact.createCluster();
int job = xact.createJob(cluster);
File[] files = { new File("cp.sub") };
xact.submit(cluster, job,
            "owner",               // job owner, e.g. "matt"
            UniverseType.VANILLA,
            "/bin/cp",
            "cp.sub cp.worked",
            "requirements",        // requirements, e.g. "OpSys==\"Linux\""
            null,                  // extra attributes, e.g. Out="stdout.txt" or Err="stderr.txt"
            files);
xact.commit();
Querying jobs
› The CLI way…
$ condor_q
-- Submitter: localhost : <127.0.0.1:1234> : localhost
 ID      OWNER   SUBMITTED    RUN_TIME   ST PRI SIZE CMD
 1.0     matt    10/27 14:45  0+02:46:42 C  0   1.8  sleep 10000
…
42 jobs; 1 idle, 1 running, 1 held, 1 unexpanded
Querying jobs
› The SOAP way from Java…
String[] statusName = { "", "Idle", "Running", "Removed", "Completed", "Held" };
int cluster = 1;
int job = 0;
Schedd schedd = new Schedd("http://…");
ClassAd ad = new ClassAd(schedd.getJobAd(cluster, job));
int status = Integer.parseInt(ad.get("JobStatus"));
System.out.println("Job is " + statusName[status]);
› Also getJobAds, given a constraint, e.g. "Owner==\"matt\""
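The statusName lookup above throws ArrayIndexOutOfBoundsException if the Schedd reports a code outside the table; later Condor releases added further JobStatus values. A slightly more defensive sketch (hypothetical class `JobStatusSketch`) uses the same table but falls back to a readable "Unknown" label.

```java
/** Sketch: map JobStatus codes to names, tolerating codes the table does not know. */
final class JobStatusSketch {

    // Index = JobStatus value, matching the statusName array on the slide.
    private static final String[] NAMES = { "", "Idle", "Running", "Removed", "Completed", "Held" };

    /** Returns the name for a status code, or "Unknown (n)" if it is out of range. */
    static String name(int status) {
        if (status > 0 && status < NAMES.length) {
            return NAMES[status];
        }
        return "Unknown (" + status + ")";
    }

    /** Convenience for the textual attribute value, e.g. what ad.get("JobStatus") returns. */
    static String nameOf(String jobStatusText) {
        return name(Integer.parseInt(jobStatusText.trim()));
    }
}
```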
Retrieving a job
› The CLI way…
› Well, if you are submitting to a local Schedd, the Schedd will have all of a job’s output written back for you
› If you are doing remote submission, you need condor_transfer_data, which takes a constraint and transfers all files in the spool directories of matching jobs
Retrieving a job
› The SOAP way in Java…
int cluster = 1;
int job = 0;
Schedd schedd = new Schedd("http://…");
Transaction xact = schedd.createTransaction();
xact.begin(30);
FileInfo[] files = xact.listSpool(cluster, job);    // discover available files
for (FileInfo file : files) {
    xact.getFile(cluster, job,
                 file.getName(),                  // remote file
                 file.getSize(),
                 new File(file.getName()));       // local file
}
xact.commit();
Authentication for SOAP
› Authentication is done via mutual SSL authentication
  • Both the client and server have certificates and identify themselves
› Possible in 6.7.20
› It is not always necessary, e.g. in some controlled environments (a portal) where the submitting component is trusted
› A necessity in an open environment: remember that the submit call takes the job’s owner as a parameter
Questions?
Authentication setup
› Create and sign some certificates
› Use OpenSSL to create a CA
  CA.sh -newca
› Create a server cert and password-less key
  CA.sh -newreq && CA.sh -sign
  mv newcert.pem server-cert.pem
  openssl rsa -in newreq.pem -out server-key.pem
› Create a client cert and key
  CA.sh -newreq && CA.sh -sign && mv newcert.pem client-cert.pem && mv newreq.pem client-key.pem
Authentication config
› Config options…
› ENABLE_SOAP_SSL is FALSE by default
› <SUBSYS>_SOAP_SSL_PORT
  • Set this to a different port for each SUBSYS you want to talk to over SSL; the default is a random port
  • Example: SCHEDD_SOAP_SSL_PORT=1980
› SOAP_SSL_SERVER_KEYFILE is required and has no default
  • The file containing the server’s certificate AND private key, i.e. “keyfile” after
    cat server-cert.pem server-key.pem > keyfile
Authentication config
› Config options continue…
› SOAP_SSL_CA_FILE is required
  • The file containing public CA certificates used in signing client certificates, e.g. demoCA/cacert.pem
› All options except SOAP_SSL_PORT have an optional SUBSYS_* version
  • For instance, turn on SSL for everyone except the Collector with
    ENABLE_SOAP_SSL=TRUE
    COLLECTOR_ENABLE_SOAP_SSL=FALSE
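The SUBSYS_* override rule can be modeled as "look for the subsystem-prefixed name first, then the plain name". This is a simplified sketch of that precedence only (hypothetical `SubsysConfigSketch.lookup`); Condor's real configuration handling involves more, such as macro expansion.

```java
import java.util.Map;
import java.util.Optional;

/** Sketch: SUBSYS_NAME takes precedence over NAME, as in COLLECTOR_ENABLE_SOAP_SSL. */
final class SubsysConfigSketch {

    static Optional<String> lookup(Map<String, String> config, String subsys, String name) {
        String specific = config.get(subsys + "_" + name);   // e.g. COLLECTOR_ENABLE_SOAP_SSL
        if (specific != null) return Optional.of(specific);
        return Optional.ofNullable(config.get(name));        // e.g. ENABLE_SOAP_SSL
    }
}
```

With the two settings from the slide, the collector resolves ENABLE_SOAP_SSL to FALSE while every other subsystem sees TRUE.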
One last bit of config
› The certificates we generated have a principal name, which is not standard across many authentication mechanisms
› Condor maps authenticated names (here, principal names) to canonical names that are authentication-method independent
› This is done through mapfiles, given by SEC_CANONICAL_MAPFILE and SEC_USER_MAPFILE
› Canonical map:
  SSL .*emailAddress=(.*) \1
› “SSL” is the authentication method, “.*emailAddress=(.*)” is a pattern to match against authenticated names, and “\1” is the canonical name, in this case the email address taken from the principal
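The map line above is an ordinary regular expression with one capture group, so its effect can be checked with `java.util.regex`. In this sketch (hypothetical `CanonicalMapSketch`), `\1` corresponds to `group(1)`; the principal string in the usage example is an illustrative OpenSSL-style subject, not one from a real certificate.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: apply the "SSL .*emailAddress=(.*) \1" canonical map rule to a principal. */
final class CanonicalMapSketch {

    // Same pattern as the map line; the captured group becomes the canonical name.
    private static final Pattern SSL_RULE = Pattern.compile(".*emailAddress=(.*)");

    static Optional<String> canonicalName(String principal) {
        Matcher m = SSL_RULE.matcher(principal);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

For example, a principal ending in `/emailAddress=matt@cs.wisc.edu` maps to `matt@cs.wisc.edu`, while a subject with no emailAddress field does not match this rule at all.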
HTTPS with Java
› Set up keys…
  keytool -import -keystore truststore -trustcacerts -file demoCA/cacert.pem
  openssl pkcs12 -export -inkey client-key.pem -in client-cert.pem -out keystore
› All the previous code stays the same; just set some properties
  • javax.net.ssl.trustStore, javax.net.ssl.keyStore, javax.net.ssl.keyStoreType, javax.net.ssl.keyStorePassword
  • Example: java -Djavax.net.ssl.trustStore=truststore -Djavax.net.ssl.keyStore=keystore -Djavax.net.ssl.keyStoreType=PKCS12 -Djavax.net.ssl.keyStorePassword=pass
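If setting JVM-wide system properties is inconvenient, the same keystore and truststore can be loaded programmatically with standard JSSE classes. This is a sketch (hypothetical `SslContextSketch`); installing the result with `SSLContext.setDefault(ctx)` early at startup is one option, but whether a particular SOAP toolkit picks up that default is an assumption you should verify for your toolkit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/** Sketch: build a mutual-TLS SSLContext from the keystore and truststore created above. */
final class SslContextSketch {

    /** Loads a keystore file of the given type, e.g. "PKCS12" for the openssl-exported keystore. */
    static KeyStore load(Path file, String type, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance(type);
        try (InputStream in = Files.newInputStream(file)) {
            ks.load(in, password);
        }
        return ks;
    }

    /** Client key identifies us; trustStore holds the CA (null means the JDK's default CAs). */
    static SSLContext build(KeyStore keyStore, char[] keyPassword, KeyStore trustStore)
            throws GeneralSecurityException {
        KeyManagerFactory kmf =
                KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, keyPassword);
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
        return ctx;
    }
}
```

Typical use: `build(load(Path.of("keystore"), "PKCS12", pass), pass, load(Path.of("truststore"), KeyStore.getDefaultType(), storePass))`, where `pass` and `storePass` are the passwords chosen when the files were created.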