   Automated software packaging and installation
   for the ATLAS experiment
              Simon George
   Royal Holloway, University of London
      Christian Arnault, LAL Orsay; Michael Gardner, RHUL; Roger Jones,
            University of Lancaster; Saul Youssef, Boston University

                       e-Science All Hands Meeting
                           2-4 September 2003                           
   This talk is about packaging, distribution and
    installation for a large software project
   It is essential because
       The project computing resources are widely distributed
        around 140 institutes, who all want to use the software
       We want to be able to use Grid resources that do not
        have locally managed installations of the software
       Our working model also requires the ability to deploy
        user code that is not part of an official distribution
   I’ll describe the process developed and the tools

Wed 03Sep03                Simon George RHUL                      2
 ATLAS and its software
 Requirements
 Tools and formats
 Meta data
 Naming conventions
 Creating and installing the kits
 Conclusions and outlook

          The ATLAS Experiment
       A Particle Physics experiment at
        the Large Hadron Collider, CERN
       1600 physicists, 140 institutes,
        6 continents
       Studies include
         • search for the origin of mass
         • excess of matter over antimatter in
           the universe
         • evidence for Supersymmetry
         • other new physics

              ATLAS software suite
   Simulation, data processing and analysis
   500 “packages”, 50 external, inter-dependent.
   100s of developers and 1000s of users in 140 institutes
   One release build is 2.5 GB of files
   It takes 10 hours to build
   Build types and frequencies
       Production release 3-4 times per year
       Developer release every 2-3 weeks
       Nightly build of snapshot
   Build configuration permutations
       Optimised, debug and sometimes also profile builds.
       Two platforms (RedHat 7.3 on Intel x86, Solaris 8 on SPARC)
       One or more compilers (gcc 3.2)
   Config. management, build and install handled by CMT
   So not a trivial task to package, distribute and install
              CMT
   Configuration management tool
       Concerned with setting up user’s environment to build
        and run software
       A large project needs the help of tools for this
   CMT helps to define and impose conventions
       For naming packages, files, directories
       For describing their relationships
       In other words, package metadata
       This is the key feature exploited for this project.
   Useful features to manage sub-projects
   A broad user base, especially in Particle Physics
    and Astronomy experiments.

        Packaging Requirements
   Three types of kit required
       Binary kit
         • Pre-built executables, libraries and configuration files needed to
           run the software
         • Used for data challenges, production, basic users
       Developer’s kit
         • Binary kit plus
         • Headers, libraries and configuration needed to build against it
         • For developers and most users
       Full source kit
         • To rebuild from scratch on binary-incompatible platforms
         • When local source code browsing is required
   For each permutation of platform, config, compiler
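As a rough illustration of how many kits this implies, the sketch below enumerates one kit per combination of kit type, platform, build configuration and compiler. The values are the ones quoted in this talk; the function name `kit_variants` and the printed layout are assumptions made for this example, not the project's actual tooling.

```python
from itertools import product

# Values quoted in the talk; names and layout here are hypothetical.
KIT_TYPES = ["binary", "developer", "source"]
PLATFORMS = ["RedHat 7.3 on Intel x86", "Solaris 8 on SPARC"]
BUILD_CONFIGS = ["optimised", "debug"]
COMPILERS = ["gcc 3.2"]


def kit_variants(kit_types=KIT_TYPES, platforms=PLATFORMS,
                 configs=BUILD_CONFIGS, compilers=COMPILERS):
    """Return one (kit type, platform, config, compiler) tuple per kit."""
    return list(product(kit_types, platforms, configs, compilers))


variants = kit_variants()
for kit_type, platform, config, compiler in variants:
    print(f"{kit_type:9} | {platform:24} | {config:9} | {compiler}")
print(len(variants), "kits per release in this example")
```

Adding a profile build or a second compiler multiplies the count, which is one reason the kit creation has to be automated.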
        Installation requirements
   For large facilities: unattended, push button deployment
   For normal user: relocatable, no root access
   Automatic configuration
   Updates, multiple versions
   Avoid duplication and unnecessary downloads
   Possibility to take subset of software
   Self contained, apart from …
   Prerequisite software: modest list and automatic check
   Set up user’s environment (e.g. LD_LIBRARY_PATH)
   Reversible: uninstall
   Install and work disconnected from network,
    e.g. install onto a laptop from CDs
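The requirement to be relocatable while still setting up the user's environment can be sketched as follows: every path is derived from an install root chosen by the user, so nothing is hard-wired. This is only an illustration. The helper names are hypothetical, the `InstallArea` layout is the one described later in this talk, and the sub-project directory name `ATLAS` is an assumption.

```python
import posixpath


def prepend_path(current, new_entry):
    """Prepend new_entry to a colon-separated path list, dropping duplicates."""
    entries = [e for e in current.split(":") if e and e != new_entry]
    return ":".join([new_entry] + entries)


def release_environment(install_root, sub_project, release_id, base_env):
    """Return a copy of base_env with PATH and LD_LIBRARY_PATH extended.

    All locations are computed from install_root, so the same kit can be
    installed anywhere the user has write access.
    """
    install_area = posixpath.join(install_root, sub_project, release_id,
                                  "InstallArea")
    env = dict(base_env)
    env["PATH"] = prepend_path(env.get("PATH", ""),
                               posixpath.join(install_area, "bin"))
    env["LD_LIBRARY_PATH"] = prepend_path(env.get("LD_LIBRARY_PATH", ""),
                                          posixpath.join(install_area, "lib"))
    return env


env = release_environment("/home/user/atlas", "ATLAS", "6.5.0",
                          {"PATH": "/usr/bin", "LD_LIBRARY_PATH": ""})
print(env["PATH"])
print(env["LD_LIBRARY_PATH"])
```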

              Constraints
   ATLAS software is divided into sub-projects
       Currently ATLAS and Gaudi
       Could be more in the future, e.g. split ATLAS into
        simulation and reconstruction
       Each sub-project consists of many packages
   External/Internal package distinction
       Internal packages are developed and managed within
        the ATLAS software project
       External packages are the opposite, e.g. software from
        the Particle Physics community, public domain software
        or commercial products.
       Interface packages for externals
         • Pure metadata package
         • Actual external sw can be installed anywhere, any way.
         • Gives it the outside appearance of an internal package

              Constraints, continued
   Existing use of CMT
       Package structure already in place
       Meta data provided by packages or implied by default
        policies is already enough for automated packaging.
   Problems
       ATLAS software is written by large communities with a
        mixed level of experience
       All such software projects will have small flaws
        introduced in each release
       These must be worked around when they impact on the
       For example, one problem of particular relevance to
        packaging & installation is cyclic dependencies
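To make the cyclic dependency problem concrete, here is a minimal sketch of cycle detection on a package dependency graph using depth-first search. It is not the CMT feature used in the project. The function name, the input format (a dict from package name to the packages it uses) and the package names are all hypothetical.

```python
def find_cycles(uses):
    """Return a list of dependency cycles found by depth-first search.

    `uses` maps each package name to the list of packages it uses.
    Each cycle is reported as a list of names such as ['A', 'B', 'A'].
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    state = {pkg: WHITE for pkg in uses}
    cycles = []
    path = []

    def visit(pkg):
        state[pkg] = GREY
        path.append(pkg)
        for dep in uses.get(pkg, []):
            if state.get(dep, WHITE) == GREY:
                # dep is already on the current path: a cycle closes here.
                cycles.append(path[path.index(dep):] + [dep])
            elif state.get(dep, WHITE) == WHITE:
                visit(dep)
        path.pop()
        state[pkg] = BLACK

    for pkg in sorted(uses):
        if state[pkg] == WHITE:
            visit(pkg)
    return cycles


example = {
    "PkgA": ["PkgB"],
    "PkgB": ["PkgC"],
    "PkgC": ["PkgA"],   # closes the cycle PkgA -> PkgB -> PkgC -> PkgA
    "PkgD": ["PkgA"],
}
print(find_cycles(example))
```

An installer that follows dependencies naively would loop forever on such a graph, which is why cycles have to be found and kept out of the kit metadata.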

          Packaging: starting point
 One kit per package
       Follow existing granularity
 Separate metadata and payload
       Two parts to each kit
 Performed by librarian as integral part of
  release procedure
 Distribution by web or distributed filesystem
  (e.g. AFS)

                      Tools used
   CMT
       Define and impose conventions on packages
       Query the metadata needed for packaging
   Pacman
       Metadata format
       Tool used to manage kit installation
   Tar and RPM
       Payload format – the package itself
   “Deployment tools” shell scripts
       Construct the kits using CMT
       Control location of Pacman cache and distribution
       Post-installation configuration
Overview of process and tools
   [Diagram of the workflow; labels: CMT, Create kits, Web server or AFS,
    Pacman, Local s/w, Local, CMT, Run software]
              Pacman

   A package manager
   Packager defines how the software should be
    fetched, installed, configured, updated, in a
    “Pacman” file. The package itself can be in any
    format as that file is separate.
   A directory of these files is known as a cache,
    usually available on the web.
   Pacman tool is used to install the software
   Pacman’s feature list is a good match to the
    requirements for installation.
   Already used by several Particle Physics and
    GRID projects.
    Package distribution format
   Tar vs. RPM
   Both can be made relocatable
   Feature set
       Tar has a simple feature set but is complementary to CMT and
        Pacman
       RPM overlaps with CMT and Pacman
         • e.g. RPM also handles dependencies and prerequisites
   Platforms
       RPM is only widely used on Linux, while tar is standard on pretty
        much any Unix
   Annoyances
       Default RPM database needs root access to write to it
         • There are workarounds for this, but they are not pretty
   Conclusion
       Decided to use tar
       but retained RPM as an option
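One practical consequence of choosing tar is that relocatability depends on how the archive is made: members must have relative paths. The sketch below, using Python's standard `tarfile` module, flags members that would break relocation. The function names and the archive contents are hypothetical, and this check is an illustration rather than part of the deployment scripts.

```python
import io
import tarfile


def non_relocatable_members(tar):
    """Return member names that are absolute or contain a '..' component."""
    bad = []
    for member in tar.getmembers():
        parts = member.name.split("/")
        if member.name.startswith("/") or ".." in parts:
            bad.append(member.name)
    return bad


def make_demo_archive(names):
    """Build a small in-memory tar.gz of empty files with the given names."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name in names:
            info = tarfile.TarInfo(name=name)
            info.size = 0
            tar.addfile(info, io.BytesIO(b""))
    buffer.seek(0)
    return buffer


demo = make_demo_archive([
    "ATLAS/6.5.0/InstallArea/lib/libExample.so",   # relative: fine
    "/opt/atlas/bin/tool",                         # absolute: not relocatable
])
with tarfile.open(fileobj=demo, mode="r:gz") as tar:
    problems = non_relocatable_members(tar)
print(problems)
```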

                               Meta data
   For each package
       Other packages it uses (dependencies)
       Location of constituents
         •    Applications and libraries
         •    Header files
         •    Run time/config files
         •    CMT requirements file
   External packages
       Pure meta data “glue” packages
       Just define paths to export
   All defined in CMT requirements files
       or implied by default conventions of ATLAS
   Can be queried through cmt
       cmt show uses
       cmt show macro <package>_export_paths

               Naming and structure
   Package naming convention
       Packages in a sub-project
         • <package name>-<sub-project release id>
       External packages
         • <package name>-<version id>
       These names are used when expressing the inter-package
        dependencies
   Directory structure within each kit
       <sub-project>/<release-id>/InstallArea/
         • contains the sub-directories bin, lib, include, share.
       <sub-project>/<release-id>/<package>/<version>/cmt/
         • Contains the configuration management files
       <external-package>/
         • Assumed to have their own internal structure for versions & builds
   This is designed to support coexistence of:
       Different versions of every piece of software
       Different binary versions (platform and build config)
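The conventions above are mechanical enough to be written as small helper functions. The following sketch builds kit names and directory paths from the templates on this slide. The function names are hypothetical, and the example values (`ATLAS`, `6.5.0`, `ExamplePkgA-01-07-02`) are only illustrative.

```python
import posixpath


def kit_name(package, identifier):
    """<package name>-<sub-project release id>, or <package name>-<version id>
    for an external package."""
    return f"{package}-{identifier}"


def install_area(sub_project, release_id):
    """<sub-project>/<release-id>/InstallArea"""
    return posixpath.join(sub_project, release_id, "InstallArea")


def cmt_dir(sub_project, release_id, package, version):
    """<sub-project>/<release-id>/<package>/<version>/cmt"""
    return posixpath.join(sub_project, release_id, package, version, "cmt")


print(kit_name("ExamplePkgA", "6.5.0"))     # package in a sub-project
print(kit_name("ExampleExtPkg", "v1"))      # external package
for sub in ("bin", "lib", "include", "share"):
    print(posixpath.join(install_area("ATLAS", "6.5.0"), sub))
print(cmt_dir("ATLAS", "6.5.0", "ExamplePkgA", "ExamplePkgA-01-07-02"))
```

Because the release id is part of every path, two releases unpacked under the same root do not overwrite each other.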

CMT requirements file:
package ExamplePkgA                         Package name and author
author A. Person <>
use ExamplePkgB                             Inter-package dependencies
use ExampleExtPkg
library ExamplePkgA *.cxx                   Instruction to build a library
                                            from source files
apply pattern component_library             Type of library to build,
                                            implies library file names
apply pattern declare_runtime               Default location implied

Pacman file:
description = 'Package ExamplePkgA-01-07-02 in release 6.5.0'
download = { '*':'ExamplePkgA-6.5.0.tar.gz' }
depends = [ 'ExamplePkgB-6.5.0', 'ExampleExtPkg-v1' ]
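The mapping from package metadata to a Pacman file like the one above can be sketched in a few lines. This is an illustration under stated assumptions: it emits only the three fields shown on this slide, the function name `pacman_file_text` is hypothetical, and a real Pacman file may need further fields.

```python
def pacman_file_text(package, version_tag, release_id, depends):
    """Return the text of a minimal Pacman-style file for one package kit.

    package      e.g. 'ExamplePkgA'
    version_tag  e.g. 'ExamplePkgA-01-07-02'
    release_id   e.g. '6.5.0'
    depends      kit names of the packages this one uses
    """
    lines = [
        f"description = 'Package {version_tag} in release {release_id}'",
        f"download = {{ '*': '{package}-{release_id}.tar.gz' }}",
        f"depends = {list(depends)!r}",
    ]
    return "\n".join(lines) + "\n"


text = pacman_file_text(
    "ExamplePkgA",
    "ExamplePkgA-01-07-02",
    "6.5.0",
    ["ExamplePkgB-6.5.0", "ExampleExtPkg-v1"],
)
print(text)
```

The dependency names follow the naming convention of the kits, so the `depends` list can be produced directly from the package's `use` statements.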

                     Creating the kits
   First, build a release
   Discover cycles in the dependencies
       Use a feature of CMT to discover cycles in the dependencies, as
        these must not be propagated to the kits. Record the output in a
        file
   Then, use a feature of CMT to visit every package in a
    dependency tree and apply a command there
       cmt broadcast <command>
   Usage of the script to create a kit:
     -release <release-id> -cycles <file>        [-rpm]
        <target distribution directory>
       Creates a pacman file and tar file, optional RPM file
   Finally, there are often a few things to fix by hand specific
    to each release.
   Note that CMT itself is included as a kit
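The role of the recorded cycles can be illustrated as follows: when the dependency list for each kit is written, any edge recorded as part of a cycle is left out. This is a sketch only. The format of the cycles file is not given in this talk, so the one-edge-per-line format `<user> <used>` assumed here is hypothetical, as are the function and package names.

```python
def read_cycle_edges(text):
    """Parse lines of the assumed form '<user> <used>' into a set of edges."""
    edges = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            edges.add((fields[0], fields[1]))
    return edges


def kit_dependencies(uses, cycle_edges):
    """Return per-package dependency lists with recorded cyclic edges dropped."""
    return {
        pkg: [dep for dep in deps if (pkg, dep) not in cycle_edges]
        for pkg, deps in uses.items()
    }


uses = {
    "PkgA": ["PkgB", "ExtPkg"],
    "PkgB": ["PkgA"],            # PkgA <-> PkgB is a cycle
}
cycles_text = "PkgB PkgA\n"      # stands in for the recorded output
deps = kit_dependencies(uses, read_cycle_edges(cycles_text))
print(deps)
```

Dropping one edge per cycle keeps the kit dependency graph acyclic while leaving every package reachable from the release kit.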
                     Installing the kits
   Performed by site software manager or end user on
    desktop or laptop
   Straightforward procedure:
       Install Pacman, if not already done
       Install prerequisite software
         • Currently just RedHat 7.3 o/s, gcc-3.2 and Java SDK 1.4.1
       Choose directory for the installation
         • Probably the same as before
       Choose which release to install
         • Available releases are listed on a web page
       Use Pacman to download, install and configure it, e.g.
        pacman -get ATLAS:AtlasRelease-6.5.0
         • Dependencies followed automatically to get everything you need
       Optionally, run script to set up a user environment and run a test
   User configures software in the usual way
       Just choose release and private working area as normal
       Run a setup script provided by CMT
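What "dependencies followed automatically" means can be sketched as a depth-first walk over the `depends` lists, installing each kit only after the kits it depends on. This is not Pacman's implementation. The in-memory `cache` dict and the function name are hypothetical stand-ins for a directory of Pacman files, and the walk assumes the cycles have already been removed from the kits.

```python
def install_order(cache, target):
    """Return kit names in an order where dependencies come first.

    `cache` maps a kit name to a dict with a 'depends' list.
    """
    order = []
    seen = set()

    def visit(name):
        if name in seen:
            return              # already scheduled: avoid duplicate installs
        seen.add(name)
        for dep in cache[name]["depends"]:
            visit(dep)
        order.append(name)      # all dependencies are scheduled before this

    visit(target)
    return order


cache = {
    "AtlasRelease-6.5.0": {"depends": ["ExamplePkgA-6.5.0"]},
    "ExamplePkgA-6.5.0": {"depends": ["ExamplePkgB-6.5.0", "ExampleExtPkg-v1"]},
    "ExamplePkgB-6.5.0": {"depends": ["ExampleExtPkg-v1"]},
    "ExampleExtPkg-v1": {"depends": []},
}
for kit in install_order(cache, "AtlasRelease-6.5.0"):
    print("install", kit)
```

Asking for a smaller target, such as a single package kit, yields only the subset of software that package needs.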
                     Conclusions
   Procedures and tools have been developed for the
    packaging, distribution and installation of ATLAS software
   Based on Pacman, CMT, tar/rpm and some shell scripts
   The basic principles could be applied more generally
       Using some or all of the same tools
   It satisfies most of the requirements for run-time and
    developers’ kits and for installation.
       Full source kit still to be done.
   Early adopters have given useful feedback and it is now
    being imported into Grid production systems
   Must now move to its use as part of the standard release
    procedure in ATLAS
       by December 2003, for our global “Data Challenge 2”

              Future developments
 Better handling of prerequisite software and
   platform compatibility checks
       EDG WP4 configuration management task
 Potential to work with an installation on
   demand mechanism for GRID farms
       Meta packaging proposal for Grid middleware
        and applications, O. Barring et al.
 Pacman version 3
