Database Summary 2009
(focus on Conditions DB)

     Elizabeth Gallas – Oxford

  2009 Software & Computing
 Post Mortem Workshop (Part I)
         Jan 18-19, 2010
 General Database Summary 2009
 Distributed Database Summary 2009
 Conditions DB in 2009
    Coming of age
       Release 15 and other major improvements to Conditions access

       Frontier – Grid-wide access to Conditions via Frontier
               Sasha will cover in next presentation
 Conditions DB Usage 2009 and Challenges for 2010
 Briefly: Other DB Projects 2009 → 2010
        Real Data Conditions in Simulation (Sim)
        Luminosity in Conditions DB (LWG, LMTF)
        Data Quality in Conditions DB entry/usage (DQ)
        Conditions and Catalogue Metadata for TAGs (TAG)
        AGIS – ATLAS Grid Information System (ADC)
 Thanks to many …
 Summary

18-Jan-2010                     Elizabeth Gallas - Databases      2
  General Database Summary 2009
 General Database 2009 summary in the Software
  Week Extended CMB (so I will just summarize here)
       Online stability and isolation from GPN – many aspects
       Schema movement to final locations – many applications
       Oracle Streams performance (ONL → OFL → Tier1) – Good
       DB Issues logging and follow up
          continued improvements in identifying and fixing problems

       Plot: Service Usage by application
          Highlights largest DB resource consumers

       Creation of Archive instance – increases robustness
          Makes offline “Standby Database” possible in 2010
       Grid-wide Frontier Deployment
          It not only works, people are using it …

 Distributed Database Summary 2009
 ATLAS represented at “3D Workshop” in November 2009
     (This workshop brought WLCG DBAs, system managers, and
       experiment representatives together – thanks to Maria Girone)
      ATLAS Distributed Databases include
          AMI … Trigger … Conditions … TAGs

      TAG DB distribution and operation at volunteer sites
          Discussion of the evolution of the TAG DB/services model

      Conditions DB access – biggest challenge since the majority of
       ATLAS processing/analysis happens on “The Grid”
          DB Release – continued usage for reprocessing
                But can use other access methods
             Direct Oracle access – can deploy if needed
               “Message Board” style queue throttles COOL jobs based on load
          Frontier – see talk in workshop by Douglas Smith
          More details in next talk: Sasha Vaniachine
 Very useful to enhance our communication with Tier-1s
    Making them aware of our usage and optimizing feedback
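
The load-based throttling mentioned above for direct Oracle access can be illustrated with a small sketch. This is not the actual ATLAS queue: `wait_for_slot`, `read_load`, the load threshold, and the exponential back-off policy are all hypothetical, chosen only to show the idea of a job deferring its database connection while the server reports high load.

```python
import time


def wait_for_slot(read_load, max_load, sleep=time.sleep,
                  base_delay=1.0, max_delay=60.0, max_tries=10):
    """Block until the reported load drops to `max_load` or below.

    `read_load` is a caller-supplied function returning the current load
    (hypothetical; stands in for whatever the queue publishes).
    Returns the number of times the job had to wait.
    """
    delay = base_delay
    for attempt in range(max_tries):
        if read_load() <= max_load:
            return attempt
        sleep(delay)
        # Assumed policy: double the wait each time, up to a ceiling.
        delay = min(delay * 2, max_delay)
    raise TimeoutError("database still overloaded after %d tries" % max_tries)


# Usage with canned load readings and a recording stand-in for sleep.
readings = iter([0.9, 0.8, 0.3])
waits = []
tries = wait_for_slot(lambda: next(readings), max_load=0.5, sleep=waits.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.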

 Conditions DB Improvements in 2009 (1)
 See TWiki
    Itemizes some (but not all) of the improvements on next few slides
 Updates for underlying infrastructure changes …
    COOL 2.6 … removed SEAL dependencies … LCG 56 …
 IOVDbSvc:
    Reduce number of simultaneous COOL connections (15.0)
    Request reloading conditions which have changed (detect open-ended
    Add internal monitoring to check which folders are used and how much
      data is read … summary written to log files
    Alignment of IOV queries and greater use of read-cache
             improves performance with Frontier
 “COOL schema split” was completed in 15.4 (online conditions moved to
  offline folders if they are not needed online)
    Associated global tags reviewed to ensure non-needed relations are
 Support for locked UPDx tags was added (with internal lock/unlock cycle)
    Essential for implementation of UPD1-mode tags
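
One way to read "alignment of IOV queries" is sketched here: if every job widens its requested interval to common boundaries, jobs reading nearby data issue identical queries, which a cache can answer without another trip to the database. This is a minimal sketch under that assumption, not the IOVDbSvc implementation; `align_iov_query`, `CachedReader`, and the fixed `granularity` are hypothetical.

```python
def align_iov_query(since, until, granularity):
    """Widen the half-open interval [since, until) to multiples of `granularity`."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    if until <= since:
        raise ValueError("interval must be non-empty")
    aligned_since = (since // granularity) * granularity
    aligned_until = -(-until // granularity) * granularity  # ceiling division
    return aligned_since, aligned_until


class CachedReader:
    """Caches results keyed on the aligned query, counting backend calls."""

    def __init__(self, fetch, granularity):
        self._fetch = fetch            # caller-supplied (folder, since, until) -> data
        self._granularity = granularity
        self._cache = {}
        self.backend_calls = 0

    def read(self, folder, since, until):
        key = (folder,) + align_iov_query(since, until, self._granularity)
        if key not in self._cache:
            self.backend_calls += 1
            self._cache[key] = self._fetch(*key)
        return self._cache[key]


# Two different requests fall in the same aligned window, so only one
# backend query is made.
reader = CachedReader(lambda folder, s, u: (folder, s, u), granularity=100)
first = reader.read("/EXAMPLE/FOLDER", 105, 110)
second = reader.read("/EXAMPLE/FOLDER", 150, 160)
```

The trade-off is that each query returns somewhat more data than strictly needed in exchange for a higher cache hit rate.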

 Conditions DB Improvements in 2009 (2)
   Detector Status (DQ) enhancement:
      black flag, additional detector IDs, combined performance flags
   Magnetic field migrated to Conditions DB (15.4)
   Frontier support (15.4)
   Better ways to configure access to resources (Oracle server,
      POOL file catalogues) at external sites
          based on job information system or site-specific customisation
   By default, ignore SQLite files in the DB release when
      running with real data
          uses Oracle or Frontier in Release 15.4+
   Fix Conditions access to non-POOL files
          these are COOL managed histogram files -- currently are the
           reference MC histograms but will also arise with real data

 Additional Conditions DB refinements
  Completion of mechanisms for COOL offline → online

  Improved Conditions DB POOL file registration system
     optimized for better grid distribution
          (introduction of Conditions dataset families)
  Implementation of UPD1-mode tags
      (ensure Conditions data used for any given already-taken
      run cannot be overridden. )
  Savannah #51429: Single-row data retrieval for CLOB
     Fixed inefficiency … performance comparisons pending
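
The UPD1 rule stated above (conditions used for an already-taken run cannot be overridden) can be sketched as a guard on the start of validity of each update. This is an illustration only, not the COOL API: `Upd1Folder`, `Upd1ProtectionError`, and the use of plain run numbers as interval keys are assumptions.

```python
class Upd1ProtectionError(Exception):
    """Raised when an update would change conditions for a run already taken."""


class Upd1Folder:
    def __init__(self):
        # (since_run, payload) pairs, kept sorted by since_run; each payload
        # is valid from since_run until the next entry starts.
        self._intervals = []

    def store(self, since_run, payload, last_taken_run):
        if since_run <= last_taken_run:
            raise Upd1ProtectionError(
                "update from run %d would override run(s) up to %d"
                % (since_run, last_taken_run))
        # Only entries starting at or after since_run (all in the future)
        # may be replaced.
        self._intervals = [(s, p) for s, p in self._intervals if s < since_run]
        self._intervals.append((since_run, payload))

    def lookup(self, run):
        result = None
        for since_run, payload in self._intervals:
            if since_run <= run:
                result = payload
        if result is None:
            raise KeyError(run)
        return result


folder = Upd1Folder()
folder.store(1, "v1", last_taken_run=0)
folder.store(11, "v2", last_taken_run=10)   # allowed: run 11 not yet taken
try:
    folder.store(5, "bad", last_taken_run=10)  # rejected: run 5 already taken
    rejected = False
except Upd1ProtectionError:
    rejected = True
```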

 Conditions DB Operation
 In 2009: Generally worked well, no major failures
 Issues with high load on CERN Oracle servers due to
    Jobs analysing LBs out of order produced bad behaviour in
     IOVDbSvc caching -- now fixed.
    Jobs with lots of small files with 'simple' events which don't take
     much time to process, but each require full job initialisation and
     data reading from COOL
 Experience so far in ramping up of Frontier usage with locally
    cached POOL files on Grid
        Many user questions … high support load on Rod Walker,
         Sasha Vaniachine, and Richard Hawkings
           people still learning how real data analysis differs from MC

 No formal 'on call' rotation for Conditions DB issues
    still many things only Richard Hawkings can fix.

 Conditions DB Tag and Data Management
  In 2009: Worked well, but under strain
  For reproducibility of Conditions, all groups must understand
         importance of using the global tags properly, eliminating hard-coded tags, adding
          UPD1 protection to SV (single version) folders
  Paul Laycock, COOL Tag Coordinator
         Spending significant time training / corresponding with the subsystem experts and
          various production coordination groups
                 Hope this will lessen in 2010,
                 but scope/number of global tags increases every month !
                 Communities are diverse (Tier0, HLT, online monitoring, MC, Reprocessing)
         Subsystem experts must be aware of which global tags exist…ensure that the
          correct conditions go into each one … there's no way that Paul can do that.
  The high multiplicity of tags to be maintained comes from:
         different B field configurations,
         particularly in MC
         lots of global tags which differ only by small things
                 (e.g. beamspot size for different energies).
    Is this the best approach for handling small 'delta' changes ?
  Need:
         Better awareness from subsystems and groups
         Better communication between the Data and Monte Carlo coordination groups
         Time / Help to develop COOL Tag management / browsing tools
                 sorely needed even at the current level of global tags
    Global Conditions Tag Summary (from Paul)
     Good
          4 global tags for tier0 processing, 1 for HLT and 1 for monitoring.
          *All* conditions linked to the global tag were locked before we started taking collisions
          Subsystem experts were educated in the process and are now mostly capable of
           dealing with conditions-related issues using the high-level AtlCool tools.

 Bad
          *Not all* conditions are linked to the global tag
               need to make this a priority this month,
               there was no possibility to change this situation once data-taking had started.
          Single-version folders are not linked to the global tag, but any that are used should
           have UPD1 protection. Again, this didn't happen because of the risk of breaking
           things; this is the second-highest priority item.

 Actions:
          Paul: contacting each sub-system individually and cc Andreas and Beate.
          Vakho: identified some manpower for work on web browser for inspecting global tags
               In the meantime, Paul will generate a list of global tags automatically in the nightly
                checks, so at least the information is guaranteed to be up to date
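
The two "Bad" items above amount to a check that could run alongside the nightly listing: which multi-version folders are not linked to a given global tag, and which single-version folders lack UPD1 protection. The sketch below works on a hand-written folder list; the `Folder` record, the folder names, and the tag name are all hypothetical, and a real check would read this information from COOL instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Folder:
    name: str
    multi_version: bool
    linked_global_tags: frozenset = frozenset()
    upd1_protected: bool = False


def audit(folders, global_tag):
    """Return (unlinked multi-version folders, unprotected single-version folders)."""
    unlinked = [f.name for f in folders
                if f.multi_version and global_tag not in f.linked_global_tags]
    unprotected = [f.name for f in folders
                   if not f.multi_version and not f.upd1_protected]
    return unlinked, unprotected


folders = [
    Folder("/EXAMPLE/Align", True, frozenset({"EXAMPLE-GLOBAL-TAG"})),
    Folder("/EXAMPLE/Calib", True),                      # not linked
    Folder("/EXAMPLE/Status", False, upd1_protected=True),
    Folder("/EXAMPLE/Noise", False),                     # no UPD1 protection
]
unlinked, unprotected = audit(folders, "EXAMPLE-GLOBAL-TAG")
```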

Update Issues Summary (from Paul)
     Good
         The online/offline split was completed in time and the P1 gateway transfer
          mechanism worked well

  Bad
         TRT had problems updating the alignment, understood and corrected.
                 They were updating only some of the channels, and thus old un-updated channels
                  conditions "peeked through".
                 The tags have been truncated and a warning now reports if your update is missing
                  some channels in the folder.
         Rt and t0 constants are in different folders but are completely correlated.
                 A problem arose when both folders were being updated and a run start happened in between:
                     Rt is valid from run X, t0 from run X+1. No solution to this, but the expected low
                      frequency of run starts during data taking should mean this isn't an issue.
         Two SCT online folders had problems in the very beginning
                 Code couldn't deal with locked folders
                 This was quickly resolved.
         Not everybody who needs to know about updates, does.

  Actions:
         Richard is updating the AtlCool tools to send the information to more email-lists
          and make them more informative.
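
The TRT problem above (a partial update letting old channel conditions "peek through") and the warning added in response can be sketched as a comparison between the channels a folder holds and the channels an update provides. This is an illustration, not the AtlCool code; the function names and the use of Python's `warnings` module are assumptions.

```python
import warnings


def missing_channels(folder_channels, updated_channels):
    """Channels present in the folder but absent from the update."""
    return sorted(set(folder_channels) - set(updated_channels))


def check_update(folder_channels, update):
    """`update` maps channel id -> new payload. Warn if any channel is left out."""
    missing = missing_channels(folder_channels, update)
    if missing:
        warnings.warn(
            "update leaves %d channel(s) with old conditions: %s"
            % (len(missing), missing),
            UserWarning)
    return missing


# Channels 1 and 3 would keep their old values under this update.
left_out = missing_channels([0, 1, 2, 3], [0, 2])
```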

 Known Additional Challenges for 2010
 Have not done any 24hr calibration loop processing yet
     Will bring a new set of challenges (and work required)
    Any indication when this will start ?
 Need to complete migration of MC simulation using BField from COOL
    Personpower needed !
 Elimination of hard-coded COOL tags in JobOptions
    Highest priority: locking tags used by Tier-0
    Plan: eliminate from real data first, then MC
             Complicated by need to reverse-engineer what was used previously and
              "not breaking anything" (higher priority)
             …uncertainty whether or when complete elimination is possible …
 We must continue to review the most resource-consuming subdetector usage
  of Conditions DB in Athena … and suggest optimisations
    This is an ongoing effort, increased when real data analysis started,
    some involve particular grid sites where particular types of processing
 Must enhance level of our communication of expected activity to Tier-1s
  and improve monitoring / feedback with Tier-1s
    Oracle … Frontier … Squid servers
    … and understanding limitations at each level …

 ATLAS DB Projects 2009 → 2010
  I wanted to mention a few ATLAS projects under considerable
     development / usage using DB based information and the
     groups under which they are evolving:
   Luminosity in Conditions DB (LWG, LMTF)
          Model for Luminosity in COOL laid out … related data is being
           added as available (online and offline)
   Real Data Conditions in Simulation (Simulation)
      Special Simulation meeting Wednesday this week:

   Data Quality in Conditions DB entry/usage (DQ)
      Data Quality and Good Run Lists now in computing tutorials
   Conditions and Catalogue Metadata for TAGs (TAG)
      Evolution of TAG DB and services will increasingly use
       metadata collected from many systems to enhance user
   AGIS – ATLAS Grid Information System (ADC)
      Support the optimization of the ATLAS computing grid

 Thanks to many …
Oracle Databases are a shared resource essential to smooth operation
 THANKS: many many application developers !
    Applications are living, breathing, consuming, and evolving systems …
     requiring careful intervention for collective stability
 Thanks: DBAs, System Administrators, Other Application support
    ATLAS – Gancho Dimitrov, Florbela Viegas
    CERN – Maria Girone and Physics Database Services;
   Andrea Valassi, others for Conditions DB, CORAL support
    Tier-1s – many DBAs and system admins
 Thanks: ADC Operations
 Thanks: Fellow DB Coordinators
    Online – Giovanna Lehmann
             PVSS – Stefan Schlenker, Slava Khomuntnikov
        TAG / Metadata – David Malon, Eric Torrence
        Operations – Sasha Vaniachine
        Frontier – John DeStefano, Rod Walker
             Squid – Douglas Smith
     Conditions / COOL Tagging – Richard Hawkings, Paul Laycock
 Others ! RDSchaffer, Shaun Roe, Hans von der Schmitt, Uli Felzmann …

 Summary (from 3D Workshop in Nov 2009)
   It’s November … and the data is coming …
   intensive and diverse analysis will follow …
   Are we ready ?
      Schemas are stable
      Access is controlled
      Monitoring in place
      Depth of experience
      Some redundancy and tools in back pockets (COOL Pilot, Frontier)

   Will we be challenged ? Certainly !
   Are we ready ? YES !

 Sorry for lack of pictures in this talk … images from the web …
  Washup:                            Post mortem:

  Rename this meeting for 2010 ?
     … but it's no Jamboree either …
