ATLAS Management


• We commend US ATLAS S&C
  – for progress made during the past year,
  – for the careful evaluation of the experiment’s
    computing requirements and the thorough Computing
  – and for the very clear presentations during the review.
• US ATLAS S&C should continue to examine
  carefully whether the specific computing model
  choices can be further optimized.
• US ATLAS has been responsive to the
  suggestions of the previous year’s review:
  – e.g. change control has been implemented and
    appears to be working,
  – Implementation of an analysis support model,
              General 2
• The distributed Tier 2 model
  leverages multiple university
  infrastructures effectively
  – Ref. Rome WS grid production
• The planning for the transition from
  construction to operations appears
  – Careful planning of the Computing
    System Commissioning (CSC)
• A resource allocation committee (RAC)
  has been set up.
  – We commend US ATLAS S&C for anticipating
    this important need
• The efficacy of this body may be limited by
  its size and composition
  – The membership should be revisited as
    experience accrues
           Collaboration Growth
• We note that the number of US collaborators in ATLAS
  is likely to increase
   – A rational mechanism to control the growth was presented,
   – Computing costs for additional collaborators were estimated:
       • Increased computing needs both at Tier 1 and Tier 2 to support
         additional US scientists.
• The committee finds this defensible.
   – More US collaborators require additional Tier 1 and Tier 2
     resources to support analysis
• We believe that the collaboration should remain open in
  spite of pressure on resources
   – They should continue to assess the physics priorities and
     computing needs,
   – Pursue other funding opportunities,
   – Work with the funding agencies to obtain adequate resources.
            Analysis Support
• DC2 and the Rome workshop have revealed the
  current lack of user support (as noted during this
  review 2 years ago…)
• A user support model based on three Analysis
  Support Centers (ASC) and an Analysis Support
  Group (ASG) has been agreed upon.
• This model requires significant funding (~1M$/y),
  the source of which is currently not identified
  – We recommend that US ATLAS optimize their
    planning so as to accommodate the analysis
    support within their projected budget
  Funding in 2008 and beyond
• There is a funding contention of the order
  of 3M$/y in 2008 and beyond between
  M&O, Computing and Upgrade R&D.
  – This is a concern since the planning already
    appears subcritical. However, this is an
    internal ATLAS issue and it should be
    resolved accordingly.
Dependence on Grid Technologies
• The 2005 DC2 and Rome WS have
  exposed risks due to poorly performing
  grid deliverables.
  – PanDA is a reasonable way of dealing with
    these dependencies.
  – We endorse the plan of integrating additional
    grid deliverables as they become available
    and robust
US ATLAS: Facilities & Grids
             Tier1 - Observations
• Note that BNL has started charging explicitly for power for the Tier1
  facility. The projected power consumption may be a factor of 2 too
  low; US ATLAS is following up on this. There is additional
  uncertainty in the projected cost.
• The projected manpower profile (ramp-up) is a reasonable plan.
• A manpower reduction caused by not receiving the anticipated $2M
  MR in FY08 and beyond would put the ability of the Tier1 facility
  to provide the required functionality at serious risk.
• US ATLAS has chosen to deploy a disk-based event store using
  dCache-managed disk distributed across compute nodes. This reduces
  hardware cost compared to traditional disk arrays but may
  increase risk because it is used at an unprecedented scale.
        Tier2 - Observations
• Three Tier2 centers have been awarded
  and established. Each of these is a multi-
  site installation.
• Some have achieved significant leveraging
  of resources.
• There has been some work on addressing
  the physics analysis & user support
  questions and the relation between a Tier2
  and an Analysis Support Center.
      Network - Observations
• The Tier1 WAN connection is not redundant.
  This is understood to be an issue and is
  being investigated, but at present there is
  no concrete plan to add redundancy.
• The Tier2 connectivity planned is more
  than adequate and is expected to achieve
  10 Gbps at each site by end of 2007.
         Grids - Observations
• The OSG “volunteer effort” is currently providing
  the VDT software, software validation,
  integration, and operations which US ATLAS
  depends upon. Many of these functions would
  need to be picked up by US ATLAS if OSG is
  not funded.
• We note that the job submission failure rate
  under PanDA is significantly lower than that of
  the system it replaced, and that PanDA can
  handle a significantly higher rate of job
  submission.
• We recommend further efforts to understand the effect
  and usage model of individual physicists at the Tier2
  centers and their relation to the Analysis Support Centers.
• US ATLAS should apply all available pressure to the grid
  middleware projects to get middleware personnel
  engaged in the service challenges this year and to get
  scaling problems addressed.
• We recommend that US ATLAS pursue acquisition of
  redundant network connectivity to the Tier1 center.
• We recommend testing the scalability of the dCache
  managed distributed disk.
                                ATLAS Core Software

•   Core software has matured considerably since last presented. The necessary
    support for the Data Challenges and physics meetings was provided.

•   PanDA appears to be a targeted solution to the failures seen in the Data
    Challenges, and will provide better automation and database handling of
     –   While datasets make data easier to find, they may be of quite coarse granularity
         and cause more data movement traffic than intended
     –   For example, a Tier 3 site with 2TB of disk might have difficulties swallowing a dataset

•   The Core Framework group has delivered a working product and is now in a
    mode of maintenance, small upgrades and (if they can) longer term upgrades.
    They are collecting input from the community on usability.

•   The Data Management group is concentrating on the details of efficient use of
    Root with ATLAS data, including looking at the details of tag databases and
    schema evolution, as well as I/O efficiency.

•   Devise a plan to test how well the schema evolution solution works

•   Revisit the analysis support group staffing periodically to determine
    whether it is working effectively, especially vis-à-vis the use of
    software professionals in this role.

•   Examine use case tradeoffs between the advantages of centrally defined
    datasets and the downside of a potential traffic increase
     – Ensure definition of finer-grained datasets is supported
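
The granularity tradeoff in the last recommendation can be made concrete with a small calculation. The following is a minimal sketch in Python, not a description of PanDA or of any ATLAS data management tool: the `Dataset` class, the helper functions, the dataset name, the file sizes, and the number of files the user wants are all hypothetical and chosen only for illustration. Only the 2 TB Tier 3 disk size comes from the observation above.

```python
from dataclasses import dataclass

TB = 10**12  # decimal terabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a centrally defined dataset."""
    name: str
    file_sizes: tuple  # size of each file, in bytes

    @property
    def total_size(self) -> int:
        return sum(self.file_sizes)


def fits_on_site(dataset: Dataset, site_disk_bytes: int) -> bool:
    """True if the whole dataset fits on the site's disk."""
    return dataset.total_size <= site_disk_bytes


def bytes_moved(dataset: Dataset, wanted_files, whole_dataset_only: bool) -> int:
    """Bytes transferred to serve a request for the files at the given indices.

    With coarse granularity the whole dataset is the unit of transfer;
    with finer granularity only the requested files move.
    """
    if whole_dataset_only:
        return dataset.total_size
    return sum(dataset.file_sizes[i] for i in wanted_files)


# Assumed numbers: 1500 files of 2 GB each (3 TB in total),
# of which a user needs only the first 100 (200 GB).
coarse = Dataset("hypothetical.dataset", tuple([2 * GB] * 1500))
wanted = range(100)
tier3_disk = 2 * TB  # the Tier 3 disk size quoted in the observations

print(fits_on_site(coarse, tier3_disk))
print(bytes_moved(coarse, wanted, whole_dataset_only=True) / TB, "TB moved (coarse)")
print(bytes_moved(coarse, wanted, whole_dataset_only=False) / TB, "TB moved (fine)")
```

Under these assumed numbers the coarse dataset does not fit on the Tier 3 disk at all, while a finer-grained request moves only the 200 GB the user needs. Real datasets would have to be measured before drawing any conclusion.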
