Evolution of BOSS, a tool for job submission and tracking



                       W. Bacchi, G. Codispoti, C. Grandi, INFN Bologna, Italy
               D. Colling, B. MacEvoy, S. Wakefield, Y. Zhang, Imperial College London

Abstract

   BOSS (Batch Object Submission System) has been developed to provide logging, bookkeeping and real-time monitoring of jobs submitted to a local farm or a Grid system. The information is persistently stored in a relational database for further processing. By means of user-supplied filters, BOSS extracts the specific job information to be logged from the standard streams of the job itself and stores it in the database in a structured form that allows easy and efficient access. BOSS has been used since 2002 for CMS Monte Carlo productions and is being re-engineered to satisfy the needs of user analysis in a highly distributed environment. The new architecture has the concepts of composite jobs and of job clusters (Tasks) and benefits from a factorization of the monitoring system and the job archive.

                 INTRODUCTION

   BOSS has been used within CMS computing since 2002, providing job submission and monitoring [1]. BOSS was initially designed for use with a local batch system (Fig. 1), but is also capable of working in a distributed environment. Currently CMS uses BOSS to provide low-level job submission and monitoring for the CMS production and analysis components, both of which have used BOSS in a distributed environment. The scale of these activities is set to increase vastly. Although BOSS had been modified to work in a distributed environment, experience had shown that more modification was required to meet fully the needs of large-scale CMS distributed computing.

        PREVIOUS FUNCTIONALITY

   Central to BOSS was the idea of a job. A job consisted of a user-defined executable with a set of optional monitoring attributes. BOSS provided a uniform mechanism for submitting and monitoring jobs.
   To provide persistency, BOSS utilized a relational database (MySQL [2]) to store all information; only the BOSS binaries and configuration were stored as files.
   As well as monitoring the standard attributes of a job (start time, finish time, exit code etc.), BOSS could also provide monitoring tailored specifically to the executable within a job. To do this BOSS introduced the concept of a job type. The monitoring consisted of parsing the standard input, output and error of the executable and searching for pre-determined expressions. To facilitate this, filters were run over the executable's input and output before, during and after runtime. When defining a job type, a schema describing the parameters to be monitored had to be provided. The schema was used to create a table within the BOSS database to store the parameters for jobs of that type.
   The monitoring information was stored in the database via a direct database connection from each job. If that connection could not be established (for example, due to a firewall) then the information was collected from the BOSS journal file. The journal file was created by each job on the execution host and contained a complete log of the job's actions. This was accomplished by running a separate BOSS command.
   When a job was created, BOSS created its own programs to be sent as well. These included a wrapper script, the job type monitoring filters and the dbUpdator process, which was tasked with returning information from the running job to the database.

                 Figure 1: Basic flow of BOSS.

   BOSS was not a batch system; instead, it interfaced with the scheduler (LSF, PBS, Condor, LCG, gLite etc.) to provide the required functionality (Fig. 1). It did this via a set of scripts, with one for each interfaced command (submit, query, kill etc.). These scripts were particular to each scheduler system and were written by local farm administrators, although scripts for the most popular batch systems were included in the BOSS distribution. Using the BOSS client it was possible to perform job commands (e.g. submit and kill) without understanding the details of the underlying scheduler.
   A user would create a job by either providing the appropriate job options to the BOSS client command line or, for more complex jobs, by providing a configuration file. BOSS job specification files were written in the ClassAd [3] syntax. This syntax allows both simple and complex expressions and is also used by the Condor, LCG and gLite schedulers for their job descriptions. The configuration of the job was then saved to the database, returning a unique job identifier.
   The user then submitted their job by running the BOSS client with the submit flag, the job id and the desired scheduler. Further job operations (e.g. query and kill) were possible with the BOSS client using the job id.
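The job-type filter mechanism can be illustrated with a short sketch. The following Python fragment is not BOSS code: the schema parameter names and the regular expressions are invented for the example, standing in for the user-supplied filters that scanned a job's standard streams for pre-determined expressions.

```python
import re

# Hypothetical job-type schema: each monitored parameter is paired with a
# pre-determined expression to search for in the executable's output.
SCHEMA_PATTERNS = {
    "events_processed": re.compile(r"processed\s+(\d+)\s+events"),
    "exit_status": re.compile(r"job finished with status\s+(\S+)"),
}

def run_filter(stdout_text):
    """Return the monitored parameters found in the job's standard output."""
    params = {}
    for line in stdout_text.splitlines():
        for name, pattern in SCHEMA_PATTERNS.items():
            match = pattern.search(line)
            if match:
                # Last occurrence wins, mimicking a filter that is run
                # repeatedly before, during and after runtime.
                params[name] = match.group(1)
    return params

# Example job output
sample = "reading config\nprocessed 1500 events\njob finished with status OK\n"
print(run_filter(sample))  # {'events_processed': '1500', 'exit_status': 'OK'}
```

In BOSS the extracted parameters were written into the job-type table of the BOSS database, or recovered later from the journal file when no direct connection was possible.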
BOSS and distributed computing
   For several years BOSS had the functionality to treat various Grid implementations (EDG, LCG and gLite) as schedulers. This allowed CMS to take advantage of these Grids, in particular the LCG, in their computing model. Version 3.4 of BOSS proved itself useful in this regard. However, as CMS plans to increase vastly its use of distributed computing in the next few years as the experiment begins taking data, the suitability and functionality of BOSS must also increase. These changes will result in a new release (BOSS v4).

                 BOSS EVOLUTION

   Large-scale use of distributed computing had revealed various limitations and areas where improvements in BOSS were required. There was also the need to prepare BOSS for 2007/8 when CMS starts taking data. At this time computing usage is expected to increase significantly, possibly by several orders of magnitude. As BOSS will be used very heavily during this period it is important that it is highly robust and will provide all of the necessary features.
   Use by physicists for distributed analysis has shown that the requirement for all users to have access to a MySQL server is not always appropriate. The CMS distributed analysis client has also been hampered by the lack of a suitable API and having to run the BOSS client as an external program.
   Use of the real time monitoring capabilities has shown that the architecture described above, with a direct connection from each running job to the BOSS database, has problems due to scaling and firewalls.
   The CMS production and analysis tools are similar in that they both contain a computational task that is split into multiple pieces for parallel processing. As the previous version of BOSS had no understanding of this model, systems had to register each job individually. This meant that the BOSS client had to be run for each job registration, which wasted time and did not take advantage of the shared properties.
   The two main features of BOSS are the ability to manage jobs with an arbitrary scheduler and the logging and monitoring of these jobs. For the CMS community this logging is vital to allow users to access their bookkeeping information.
   The previous implementation of logging and bookkeeping was interleaved with the real time monitoring with no clear separation. This was undesirable as it made it difficult to address the peculiarities of the two. Also, if the job was unable to make a connection to the database this would result in an incomplete logging record. In principle, this could be rectified after the job had finished by telling the BOSS client to parse the journal file, if it was available. However, this required the user to perform a manual operation. Manual intervention is undesirable in a production-quality system used by (possibly) novice users.
   Therefore it was decided that BOSS needed to be re-engineered to modify certain features to work better in a distributed environment, and to include new features necessary for providing a feature-rich tool for use within CMS.

New Features
   The major improvements to the system are to the logging and monitoring information and the support for more complicated job flows.
   The logging and real time monitoring information have been logically and physically split. Logging records are always complete and reliable, while real time monitoring is available if activated and allowed by the site. This functionality is already available in BOSS version 3.6.2.
   To represent the job flows that are required by the CMS computing and analysis systems we have introduced the concepts of a task, a chain and a program. A program is analogous to an old BOSS job in that it consists of a single executable with various attributes, e.g. input files and output files. A chain is a group of programs that may be executed in an arbitrarily complex order on a single worker node. A task is a homogeneous group of chains with only slight differences, e.g. in the attributes. This hierarchy is well-suited to a typical HEP workflow where a computational task must be split into multiple parallel jobs. The introduction of chains allows much more complicated job flows, for instance event generation, simulation and reconstruction one after the other on the same worker node.
   The other features include:
   • Functionality suited to use cases ranging from the simple tasks of a single user to a large-scale scheduled production activity;
   • A searchable, robust and complete archive of all logging information relating to users' tasks;
   • The possibility of re-submitting a chain while retaining the logging information of the previous submission. These runtime executions are called jobs to separate them from the logical view;
   • An interface to various batch systems, both local and distributed, with the same presentation for both and exploiting the possibility of submitting jobs in bulk to schedulers that support this functionality;
   • An optional run-time monitoring system that is scalable in a distributed environment and that can collect information during the running of a chain and present it to a user within the BOSS interface; and
   • Specialised monitoring at the program level providing custom monitoring of each program in a chain.

Architecture
   The architecture of BOSS has been modified from Ref. [1] in the following ways:
   • The logging and real time monitoring have been split into separate databases;
   • The logging database no longer has running jobs connecting directly to it and is only contacted via the BOSS client;
   • The BOSS client contains a command for output retrieval that also processes the BOSS journal and updates the logging database;
   • The real time database is only contacted by the BOSS client and the real time updater, via a specialized monitoring layer (MySQL, R-GMA [4], MonALISA [5], etc.); and
   • The BOSS job wrapper is more complex in order to support program chains. A tool that supports simple chaining is provided, but the ability to use a third party tool that can implement complex chains (e.g. ShReek [6]) also exists.

                 Figure 2: New BOSS architecture.

Logical vs. Execution view
   It is important to keep the logical view of the necessary work separate from the execution view. This is required for various reasons, including the possibility of job resubmissions due to scheduler and processing farm problems. The logical view in BOSS is the view at declaration time and consists of a user task composed of multiple chains, each made of several programs.
   At submission time BOSS deals with the task as composed of jobs of program executions. It is therefore possible, in the case of resubmission, to have multiple jobs related to the same chain.

BOSS components
   The main components in the new BOSS architecture may be seen in Fig. 2. The BOSS client runs on the user machine, from which the logging database is also accessible. The database may or may not be local to that machine. The BOSS client uses the specified batch scheduler to perform job submission, control (cancel, suspend and resume), query operations and output retrieval.
   Once running, the job wrapper is responsible for starting all BOSS processes, including the monitoring and the chaining tool. It is then the responsibility of the chaining tool to execute the user's chain.
   The journal file is created by the job wrapper and contains a complete log of all steps carried out by the job. This includes information from the wrapper about the job (e.g. start time, other initial conditions and when other components of the job are started) and information from the specialized program monitoring.
   If real time monitoring has been requested, the real time updater monitors the journal file and periodically sends new information back to the real-time database via a monitoring transport mechanism. Once in the database the user may query this information with the BOSS client.
   The chaining tool is responsible for correctly running and monitoring the user's programs. It follows the user's chaining directives and starts each program in the prescribed order. Each program is started in its own environment along with its own specialised monitoring services. These monitoring scripts monitor the programs' input and output according to the program monitoring schema.

The BOSS client
   The BOSS client includes:
   • Access to the (possibly local) logging database;
   • An interface for task creation and submission;
   • An interface for logging, database querying and management;
   • An interface to the real time monitoring database;
   • An interface to the schedulers providing job submission, deletion, querying and output retrieval; and
   • An interface to register schedulers, program types, chaining tools and monitoring mechanisms.
   The user may choose the logging database technology from a list of supported types. As the database is only contacted via the BOSS client it is possible to implement it in an embeddable database such as SQLite [7] (which only requires access to the local file system), or in a database with server capabilities (such as MySQL or even ORACLE [8]).
   The BOSS client had already been implemented as a CLI and has now been implemented as an API with C++ and Python bindings. The Python bindings are generated by SWIG [9] from the C++ source, so further language support is possible.
   All database management is through the BOSS client, including facilities such as program, chainer and monitoring tool registration.
   The interface to the schedulers remains similar to previous versions, but with improved handling of input and output files. Upon completion of a job, output is automatically retrieved, the journal file is read and the logging database is updated.
   It is still possible for a user to submit a task with a single chain and program using only options in the CLI, but to describe more complicated tasks a specification file is required. Writing this kind of complex structure in ClassAds would have been difficult, so it was decided to move to XML as it is ideally suited to describing complex task hierarchies and is relatively easy for the user to understand.
   An example is shown in Fig. 3, where we have taken the idea that a task is a composition of chains with only minor differences and used an iterator to automate the differences. The iterator tag is used to loop over lower elements and, when it encounters its name, to print its current value.

   <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
   <task schema="schema.xml">
    <iterator name="ITR" start="0" end="100" step="1">
     <chain scheduler="glite" rtupdater="mysql" ch_tool_name="jobExecutor">
      <program exec="test.pl"
               outtopdir="" />
     </chain>
    </iterator>
   </task>

                 Figure 3: An example XML task specification file.

                 CURRENT STATUS

   Of the new features described, most have been fully implemented, including:
   • Task, chain and program hierarchy;
   • XML task description;
   • An API in C++ and Python;
   • A basic chaining tool, with only linear program execution;
   • Separate logging and monitoring databases; and
   • Optional real-time monitoring using a variety of technologies (currently MySQL) and direct connections to the BOSS DB.
   Features being implemented at present include:
   • Allowing usage of third party chainer plug-ins; and
   • Additional real-time monitoring technologies.
   It is possible that a new component will be added which is capable of taking pro-active measures depending on the current status of the job, for instance killing a program stuck in an infinite loop.
   A graphical client will also be developed to allow users to browse their tasks in a convenient manner.
   An interim release has been created containing the completed features. It is envisaged that a release containing the full range of new features (Version 4) will be available shortly.
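The iterator semantics sketched in Fig. 3 can be illustrated in a few lines of Python. This is not the BOSS implementation: the `outfile` attribute is invented for the example, and treating `end` as inclusive is an assumption; the sketch only shows how one `<iterator>` element could expand into a homogeneous group of chains.

```python
import xml.etree.ElementTree as ET

# A task specification mirroring the shape of Fig. 3, with a small range
# and an invented outfile attribute so the substitution is visible.
TASK_XML = """
<task schema="schema.xml">
 <iterator name="ITR" start="0" end="2" step="1">
  <chain scheduler="glite" rtupdater="mysql">
   <program exec="test.pl" outfile="out_ITR.log" />
  </chain>
 </iterator>
</task>
"""

def expand_task(xml_text):
    """Expand each <iterator> into concrete chains, one per iterator value."""
    task = ET.fromstring(xml_text)
    chains = []
    for it in task.findall("iterator"):
        name = it.get("name")
        start, end, step = (int(it.get(k)) for k in ("start", "end", "step"))
        for value in range(start, end + 1, step):  # assumes 'end' is inclusive
            for chain in it.findall("chain"):
                # Deep-copy the chain, substituting the iterator name with
                # its current value in every attribute of every descendant.
                new_chain = ET.fromstring(ET.tostring(chain))
                for elem in new_chain.iter():
                    for attr, val in elem.attrib.items():
                        elem.set(attr, val.replace(name, str(value)))
                chains.append(new_chain)
    return chains

chains = expand_task(TASK_XML)
print(len(chains))                               # 3
print(chains[0].find("program").get("outfile"))  # out_0.log
```

Substitution-based expansion of this kind is what makes the task a homogeneous group of chains: the structure is declared once, and only the iterated attribute values differ between chains.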
                 REFERENCES

[1] C. Grandi, A. Renzi, "Object Based System for Batch Job Submission and Monitoring (BOSS)", CMS NOTE 2003/005, http://boss.bo.infn.it/
[2] http://mysql.com
[3] ClassAd web page at University of Wisconsin:
[4] "R-GMA: An Information Integration System for Grid Monitoring", Lecture Notes in Computer Science, 2003, Issue 2888, pages 462-481
[5] "MonALISA: A Distributed Monitoring Service Architecture", Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, CA, USA, March 2003
[6] "Managing Workflows with ShReek", Computing in High Energy and Nuclear Physics (CHEP06), Mumbai, India, Feb 2006
[7] http://sqlite.org
[8] http://oracle.com/database
[9] http://swig.org
