Leveraging Database Technologies in Condor
Jeff Naughton April 25, 2006
Overview
› Introducing ourselves
› What we have done since last year
• Obtained funding (Yay! thank you NSF!)
• Quill: deployed DB-centric data tool
• Quill++: more comprehensive, deployed in test cluster, running (guinea pig) user jobs
• Condor J2EE: radical departure experimental system, deployed last week in test cluster
• Published some research papers…
› BOF 1:30 on Thursday
Who we are
› Faculty: David DeWitt, Jeff Naughton
› Students: Jiansheng Huang, Ameet Kini, Christine Reilly, Eric Robinson, Srinath Shankar, Lakshmikant Shrinivas
How do we fit in?
› Advanced Development/Research group focused on data management.
› Goal:
• Interact frequently with Condor Dev. Team and users
• Design and prototype new technology;
• transfer to Condor team for deployment.
› What we don’t do:
• Determine roadmap and schedules for deployment within Condor.
Why Condor and DBMS?
› Premise: A running Condor system is awash in data:
• Operational data, Historical data, User data
› DBMS technology can help capture, organize, manage, archive, and query this data.
› This can make Condor even more powerful, usable, and useful.
Quill
› Non-invasive approach to capturing job related information
› Works by sniffing updates to the job queue log
› Serves condor_q and condor_history queries
› Independent, reliable, and efficient querying of job related information, with underlying SQL interface
So how does it work?
Quill Architecture
[Figure: Quill architecture. Components shown: Master, Startd, Schedd, …, Quill, the Job Queue log, and an RDBMS holding the Queue + History Tables.]
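The log-sniffing idea can be sketched roughly as follows. This is a minimal illustration, not Quill's code: the record format (`SET <job_id> <attr> <value>` and `REMOVE <job_id>`), the `queue` and `history` table layouts, and the function names `open_db` and `apply_log_lines` are all hypothetical, and SQLite stands in for the RDBMS.

```python
import sqlite3


def open_db():
    """Create an in-memory database with hypothetical queue/history tables."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE queue (job_id TEXT, attr TEXT, value TEXT, "
        "PRIMARY KEY (job_id, attr))"
    )
    db.execute("CREATE TABLE history (job_id TEXT, attr TEXT, value TEXT)")
    return db


def apply_log_lines(db, lines):
    """Replay simplified (hypothetical) job queue log records into the tables."""
    for line in lines:
        parts = line.split(maxsplit=3)
        if not parts:
            continue
        if parts[0] == "SET" and len(parts) == 4:
            _, job_id, attr, value = parts
            # Latest value wins for each (job, attribute) pair.
            db.execute(
                "INSERT OR REPLACE INTO queue VALUES (?, ?, ?)",
                (job_id, attr, value),
            )
        elif parts[0] == "REMOVE" and len(parts) == 2:
            job_id = parts[1]
            # A job leaving the queue is moved to the history table.
            db.execute(
                "INSERT INTO history SELECT * FROM queue WHERE job_id = ?",
                (job_id,),
            )
            db.execute("DELETE FROM queue WHERE job_id = ?", (job_id,))
    db.commit()


if __name__ == "__main__":
    db = open_db()
    apply_log_lines(db, ["SET 1.0 Owner alice", "SET 2.0 Owner bob", "REMOVE 1.0"])
    print(db.execute("SELECT job_id, value FROM queue").fetchall())
```

Because the reader only consumes the log, the daemon that writes it needs no changes, which is the sense in which the approach is non-invasive. Queue and history questions then become ordinary SQL queries against the two tables.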
Quill++
› More comprehensive than Quill (data from all daemons, not just SchedD)
› Built on Quill code base
› Condor daemons write to SQL logs, Quill daemon reads and inserts in DBMS
› Central database serves entire pool
› Web-based query GUI
Data Capture in Quill++
› Condor daemons augmented to record important events in a database
› Database is in addition to standard daemon logs
› Pool will run unaffected even in the absence of a database
[Figure: A Machine, showing the Schedd, Shadow, Startd, Starter, and Negotiator daemons.]
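One way to picture daemons writing SQL logs that a separate loader later inserts, with the pool unaffected when the database is absent, is the sketch below. The function names `record_event` and `drain_sql_log`, the one-statement-per-line log format, and the use of SQLite are assumptions made for illustration; they are not taken from Quill++.

```python
import os
import sqlite3
import tempfile


def record_event(log_path, statement):
    """Daemon side: append one SQL statement per line; never touches the database."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(statement.replace("\n", " ") + "\n")


def drain_sql_log(log_path, connect):
    """Loader side: replay logged statements, or leave the log alone if the DB is down."""
    try:
        db = connect()
    except sqlite3.Error:
        return 0  # database absent: daemons keep logging and nothing is lost
    with open(log_path, encoding="utf-8") as log:
        statements = [line.strip() for line in log if line.strip()]
    with db:  # one transaction for the whole batch
        for statement in statements:
            db.execute(statement)
    open(log_path, "w").close()  # truncate only after a successful commit
    db.close()
    return len(statements)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    log_path = os.path.join(workdir, "sql.log")
    db_path = os.path.join(workdir, "pool.db")
    record_event(log_path, "CREATE TABLE IF NOT EXISTS events (daemon TEXT, what TEXT)")
    record_event(log_path, "INSERT INTO events VALUES ('startd', 'claimed')")
    print(drain_sql_log(log_path, lambda: sqlite3.connect(db_path)))
```

The daemon's only dependency is a local file, so a database outage delays inserts without blocking job execution. A real loader would also need to guard against partial writes and untrusted statement text, which this sketch ignores.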
Quill++ Architecture
[Figure: Quill++ architecture. Components shown: Master, Startd, Schedd, …, Quill++, Event logs, the Job Queue log, and an RDBMS holding Queue, History, Machine, Match etc. tables.]
Implementation Details
› Quill++: First class condor daemon
• Managed by Condor Master
• Native PostgreSQL API
• Can be ported to any platform for which PostgreSQL drivers are available (AIX, BSD, IRIX, HP-UX, Linux, Solaris, Windows etc.)
• Porting Quill++ to other databases involves implementing a database virtual class
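The "database virtual class" idea can be sketched with an abstract base class, the Python analogue of a C++ class with pure virtual methods. The interface below (`Database`, with `connect`, `execute`, and `close`), the `SQLiteDatabase` backend, and the `store_event` helper are hypothetical names chosen for illustration; the actual Quill++ interface is not shown in these slides.

```python
import sqlite3
from abc import ABC, abstractmethod


class Database(ABC):
    """Hypothetical backend interface; porting means implementing these methods."""

    @abstractmethod
    def connect(self):
        """Open a connection to the backend."""

    @abstractmethod
    def execute(self, sql, params=()):
        """Run one statement and return any result rows as a list."""

    @abstractmethod
    def close(self):
        """Release the connection."""


class SQLiteDatabase(Database):
    """Example backend (an assumption; SQLite is used only because it is built in)."""

    def __init__(self, path=":memory:"):
        self.path = path
        self.conn = None

    def connect(self):
        self.conn = sqlite3.connect(self.path)

    def execute(self, sql, params=()):
        rows = self.conn.execute(sql, params).fetchall()
        self.conn.commit()
        return rows

    def close(self):
        self.conn.close()


def store_event(db, daemon, event):
    """Caller code depends only on the abstract interface, not on a backend."""
    db.execute("INSERT INTO events VALUES (?, ?)", (daemon, event))


if __name__ == "__main__":
    db = SQLiteDatabase()
    db.connect()
    db.execute("CREATE TABLE events (daemon TEXT, event TEXT)")
    store_event(db, "schedd", "submit")
    print(db.execute("SELECT daemon, event FROM events"))
    db.close()
```

With this shape, supporting another database means adding one more subclass while code such as `store_event` stays unchanged.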
Web Interface
› Useful for:
• User job monitoring
• Administrative monitoring over jobs and resources
• Debugging
Condordb Admin Screen
Jobs in queue
History jobs
Machine Status
Recency summary
Job history by owner
Machine Report
Status about a job
Classad Info
Run Info
Event Info
Match Info
Rejects Info
Recency info for exceptional data sources
Present Status
› Deployed in testbed
• dbc cluster (93 machines)
• Has successfully run almost 100,000 jobs.
• Working with Condor team planning future distribution with Condor.
Caveats
› Web interface to DB
• Basic prototype implemented
• Needs to be made more robust, user friendly (!)
› Gathers incomplete information in multiple pool scenarios (flocking, glide-in, condor-c)
CondorJ2
› To boldly go where no one has gone before
› Overview of CondorJ2
• Quill/Quill++: Database reflects state of Condor pool
• Condor J2: Database is the state of Condor pool
• Use database to maintain operational data (workflow state, machine state, config policies, etc.)
• Implement workflow management, resource management and resource allocation in Application Server environment
• Modify master, startd and starter to be thin web service clients
• Provide web interface for all system services (workflow submission, machine reconfiguration etc.)
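To make "database is the state" concrete, here is a small sketch in which machine and job state live only in tables and a resource-allocation step is a single transaction over them. The schema (`machines`, `jobs`), the matching rule (smallest idle machine with enough memory), and the function names are all assumptions for illustration; CondorJ2's actual schema and matchmaking logic are not described in these slides, and SQLite stands in for the Condor database.

```python
import sqlite3


def make_pool_db():
    """Create hypothetical tables holding machine and job state."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE machines (name TEXT PRIMARY KEY, memory_mb INTEGER, state TEXT)"
    )
    db.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, min_memory_mb INTEGER, machine TEXT)"
    )
    return db


def match_job(db, job_id):
    """Assign the job to the smallest idle machine that fits; return its name or None."""
    with db:  # one transaction: the match and both state changes commit together
        row = db.execute(
            "SELECT m.name FROM machines m "
            "JOIN jobs j ON m.memory_mb >= j.min_memory_mb "
            "WHERE j.id = ? AND j.machine IS NULL AND m.state = 'idle' "
            "ORDER BY m.memory_mb, m.name LIMIT 1",
            (job_id,),
        ).fetchone()
        if row is None:
            return None
        db.execute("UPDATE machines SET state = 'claimed' WHERE name = ?", (row[0],))
        db.execute("UPDATE jobs SET machine = ? WHERE id = ?", (row[0], job_id))
        return row[0]


if __name__ == "__main__":
    db = make_pool_db()
    db.executemany(
        "INSERT INTO machines VALUES (?, ?, 'idle')", [("a", 512), ("b", 2048)]
    )
    db.execute("INSERT INTO jobs VALUES (1, 1024, NULL)")
    print(match_job(db, 1))
```

Since no component keeps its own copy of the state, anything that can reach the database, such as a web page or a web service, sees and changes the same pool.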
Motivation
› Scalability
› Flexibility
› Administrability
Java Application Servers
› Industrial strength middleware for high performance & scalable web applications
› Widely deployed systems
• Oracle AS 10g, IBM WebSphere, BEA WebLogic, JBoss (open source)
› Key features
• Support for transactions
• Web service interfaces
• Support for clustering (for scalability)
• Configurable security
• Backend database independence
[Figure: CondorJ2 architecture. Labels shown: Condor Database; JDBC; Application Server containing Machine Modules, Matchmaking Modules, Workflow Modules, Condor Pool Web Site, and Condor Web Services; HTTP; SOAP over HTTP; User’s Web Browser; User’s Custom Tools; Web Service Clients (master, startd, starter) on the Execute Machines.]
What can you do in CondorJ2 via browsers and web services?
› Add and configure new machines
› Reconfigure machines on the fly
› Specify, submit, monitor and manage workflows
› Monitor global system state
Virtuous Cycle
› As we learn where Condor can use DBMS technology, we learn where DBMS technology can be (must be?) improved.
• Support for sparse data sets [ICDE 2006].
• Pushing match-making style operations into DBMS [SIGMOD 2006].
• Data provenance as byproduct of Quill++ data capture. [IPAW 2006]
› Improving DBMS technology will lead to more places that it can be installed.
Other ongoing work…
› File caching in Condor pools
› Techniques for explaining data consistency rather than dictating consistency
› Automatic monitoring of system “health” by mining captured data
Visit us and see demos!
› Come see demos of Quill, Quill++, and CondorJ2 in Rm. 216/218 Fluno Center on Thurs. afternoon 1:30 – 2:30pm.