                                        Atlas Computing
                     Alessandro De Salvo <Alessandro.DeSalvo@roma1.infn.it>
                          Terzo workshop sul calcolo dell'INFN, 27-5-2004




                                                         Outline

                           Computing model
                           Activities in 2004
                           Conclusions



        Atlas Data Rates per year

                                        Rate(Hz)     sec/year     Events/year        Size(MB)      Total(TB)
Raw Data                                       200    1.00E+07        2.00E+09              1.6        3200

ESD (Event Summary Data)                       200    1.00E+07        2.00E+09              0.5        1000

General ESD                                    180    1.00E+07        1.80E+09              0.5         900

General AOD (Analysis Object Data)             180    1.00E+07        1.80E+09              0.1         180

General TAG                                    180    1.00E+07        1.80E+09            0.001            2

Calibration                                                                                              40

MC Raw                                                                1.00E+08                2         200

ESD Sim                                                               1.00E+08              0.5          50

AOD Sim                                                               1.00E+08              0.1          10

TAG Sim                                                               1.00E+08            0.001            0

Tuple                                                                                      0.01


   Nominal year: 10^7 s
   Accelerator efficiency: 50%
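
   The yearly volumes in the table are just rate × live time × event size. A minimal
   sketch reproducing the arithmetic (the names and layout are mine, not from the slides):

       # Reproduce the data-rate arithmetic:
       #   events/year = rate (Hz) x nominal year (s)
       #   total (TB)  = events/year x event size (MB) / 10^6
       NOMINAL_YEAR_S = 1.0e7  # 10^7 s; accelerator efficiency already folded in

       samples = {
           # name: (rate in Hz, event size in MB)
           "Raw Data":    (200, 1.6),
           "ESD":         (200, 0.5),
           "General ESD": (180, 0.5),
           "General AOD": (180, 0.1),
           "General TAG": (180, 0.001),
       }

       for name, (rate_hz, size_mb) in samples.items():
           events = rate_hz * NOMINAL_YEAR_S
           total_tb = events * size_mb / 1.0e6
           print("%-12s %.1e events/year %8.1f TB" % (name, events, total_tb))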

 Processing times
 Reconstruction
    Time/event for Reconstruction now: 60 kSI2k sec
       • We could recover a factor 4:
             •   factor 2 from running only one default algorithm
             •   factor 2 from optimization
       • Foreseen reference: 15 kSI2k sec/event

 Simulation
    Time/event for Simulation now: 400 kSI2k sec
       • We could recover a factor 4:
             •   factor 2 from optimization (work already in progress)
             •   factor 2 on average from the mixture of different physics processes (and rapidity
                 ranges)
       • Foreseen reference: 100 kSI2k sec/event

     Number of simulated events needed: 10^8 events/year
       • Generate samples about 3-6 times the size of their streamed AOD samples
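
   A back-of-envelope sketch of how these per-event times translate into sustained farm
   power, assuming (my assumption, not stated on the slide) that the load is spread evenly
   over the nominal 10^7 s year:

       # Sustained CPU power = events x (kSI2k sec/event) / wall time (s)
       NOMINAL_YEAR_S = 1.0e7

       def farm_ksi2k(n_events, ksi2k_sec_per_event, wall_s=NOMINAL_YEAR_S):
           return n_events * ksi2k_sec_per_event / wall_s

       # Simulation at the foreseen reference of 100 kSI2k sec/event:
       print(farm_ksi2k(1e8, 100))   # 1000 kSI2k (1 MSI2k) sustained
       # Reconstruction of 2x10^9 raw events at the foreseen 15 kSI2k sec/event:
       print(farm_ksi2k(2e9, 15))    # 3000 kSI2k (3 MSI2k) sustained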




Production/analysis model
   Central analysis
          Central production of tuples and TAG collections from ESD
          Estimated data reduction to 10% of the full AOD
              •   About 720 GB/group/year
          0.5 kSI2k sec per event (estimate); quasi real time → 9 MSI2k

   User analysis
         Tuples/streams analysis
         New selections
         Each user will perform 1/N of the MC non-central simulation load
             •   analysis of WG samples and AOD
             •   private simulations
          Total requirement 4.7 kSI2k and 1.5/1.5 TB disk/tape
         Assume this is all done on T2s


    DC2 will provide very useful information in this domain



Computing centers in Atlas
   Tiers defined by capacity and level of service
        Tier-0 (CERN)
           •   Hold a copy of all raw data to tape
           •   Copy in real time all raw data to Tier-1’s (second copy useful also for later reprocessing)
           •   Keep calibration data on disk
           •   Run first-pass calibration/alignment and reconstruction
           •   Distribute ESD’s to external Tier-1’s
                  •   (1/3 to each one of 6 Tier-1’s)


        Tier-1’s (at least 6):
           •   Regional centers
           •   Keep on disk 1/3 of the ESD's and full copies of the AOD's and TAG's
           •   Keep on tape 1/6 of Raw Data
           •   Keep on disk 1/3 of currently simulated ESD’s and on tape 1/6 of previous versions
           •   Provide facilities for physics group controlled ESD analysis
           •   Calibration and/or reprocessing of real data (once per year)

        Tier-2’s (about 4 per Tier-1)
           •   Keep on disk a full copy of TAG and roughly one full AOD copy per four T2s
           •   Keep on disk a small selected sample of ESD’s
           •   Provide facilities (CPU and disk space) for user analysis and user simulation (~25
               users/Tier-2)
           •   Run central simulation



                                              Tier-1 Requirements
                               (numbers from R. Jones, Atlas Software Workshop, May 2004)

                           External T1: storage requirement

                                                  Disk (TB)    Tape (TB)    Fraction
                           General ESD (curr.)         429          150        1/3
                           General ESD (prev.)         214          150        1/6
                           AOD                         257          180        1/1
                           TAG                           3            2        1/1
                           RAW Data (sample)             6          533        1/6
                           RAW Sim                     0.0         33.3        1/6
                           ESD Sim (curr.)            23.8          8.3        1/3
                           ESD Sim (prev.)            11.9          8.3        1/6
                           AOD Sim                      14           10        1/1
                           Tag Sim                       0            0        1/1
                           User Data (20 groups)       171          120        1/3
                           Total                      1130         1195

                           Processing for Physics Groups: 1760 kSI2k
                           Reconstruction: 588 kSI2k
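
   The storage columns follow from the yearly dataset sizes in the data-rates table, the
   Fraction column and the 70% disk efficiency quoted in the "Tier 0/1/2 sizes" slide. A
   sketch of my reconstruction (the rule is inferred from the numbers, not stated on the
   slides); the same rule also reproduces the Tier-2 table that follows:

       # Inferred rule: disk = fraction x dataset size / 0.70 (disk efficiency),
       #                tape = fraction x dataset size  (tape assumed 100% efficient)
       DISK_EFF = 0.70

       def disk_tb(dataset_tb, fraction):
           return dataset_tb * fraction / DISK_EFF

       def tape_tb(dataset_tb, fraction):
           return dataset_tb * fraction

       print(disk_tb(900, 1.0 / 3))   # General ESD (curr.): ~429 TB, as in the table
       print(tape_tb(900, 1.0 / 6))   # General ESD (prev.) on tape: 150 TB
       print(disk_tb(180, 1.0))       # AOD, full copy: ~257 TB
       print(disk_tb(180, 1.0 / 4))   # Tier-2 AOD share (1/4): ~64 TB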

                                              Tier-2 Requirements
                               (numbers from R. Jones, Atlas Software Workshop, May 2004)

                           External T2: storage requirement

                                                           Disk (TB)    Tape (TB)    Fraction
                           General ESD (curr.)                  26            0        1/50
                           General ESD (prev.)                   0           18        1/50
                           AOD                                  64            0         1/4
                           TAG                                   3            0         1/1
                           ESD Sim (curr.)                     1.4            0        1/50
                           ESD Sim (prev.)                       0            1        1/50
                           AOD Sim                              14           10         1/1
                           User Data (600/6/4 = 25 users)       37           26
                           Total                               146           57

                           Simulation: 21 kSI2k
                           Reconstruction: 2 kSI2k
                           Users: 176 kSI2k
                           Total: 199 kSI2k
                                              Tier 0/1/2 sizes
                                                 Efficiencies (LCG numbers, Atlas sw workshop May 2004 – R. Jones)
                                                       Scheduled CPU activity: 85% efficient
                                                       Chaotic CPU activity: 60% efficient
                                                       Disk usage: 70% efficient
                                                       Tape: assumed 100% efficient

                                                                    CERN T0+T1/2    All T1 (6)    All T2 (24)    Total
                                              Auto tape (PB)                 4.4           7.2            1.4     12.9
                                              Shelf tape (PB)                3.2           0.0            0.0      3.2
                                              Disk (PB)                      1.9           6.8            3.5     12.2
                                              CPU (MSI2k)                    4.8          14.2            4.8     23.8



                                              Atlas Computing System
                 (summary of the original data-flow diagram, R. Jones, Atlas Software Workshop, May 2004;
                  PC (2004) = ~1 kSpecInt2k)

    Event Builder: receives ~Pb/sec from the detector and feeds the Event Filter (~159 kSI2k) at 10 GB/sec
    Event Filter → Tier 0 at 450 Mb/sec; some data for calibration and monitoring goes to the
     institutes, and calibrations flow back
    Tier 0: ~5 MSI2k, ~9 Pb/year, no simulation; ships ESD to the Tier-1's at ~300 MB/s/T1/expt
    Tier-1's (US, Italian, French, UK (RAL) regional centres, ...): ~7.7 MSI2k and ~2 Pb/year each,
     with 622 Mb/s links towards the Tier-2's
    Tier-2's: ~200 kSI2k and ~200 Tb/year each; candidate Italian sites LNF, NA, MI, RM1; a physics
     data cache feeds workstations/desktops at 100 - 1000 MB/s
    Each Tier 2 has ~25 physicists working on one or more channels, holds the full AOD, TAG &
     relevant Physics Group summary data, and does the bulk of the simulation
    Atlas computing in 2004
   “Collaboration” activities
       Data Challenge 2
         • May-August 2004
         • Real test of computing model for computing TDR (end 2004)
         • Simulation, reconstruction, analysis & calibration
     Combined test-beam activities
         • Combined test-beam operation concurrent with DC2
           and using the same tools

 “Local” activities
         •    Single muon simulation (Rome1, Naples)
         •    Tau studies (Milan)
         •    Higgs production (LNF)
         •    Other ad-hoc productions


        Goals in 2004
   DC2/test-beam
       Computing model studies
       Pile-up digitization in Athena
       Deployment of the complete Event Data Model and the Detector Description
       Simulation of full Atlas and 2004 Combined Testbeam
       Test of the calibration and alignment procedures
       Full use of Geant4, POOL and other LCG applications
       Make wide use of the GRID middleware and tools
       Large scale physics analysis
       Run as much of the production as possible on the GRID
         • Test the integration of multiple GRIDs

 “Local” activities
         • Run local, ad-hoc productions using the LCG tools



                             DC2 timescale
                        (slide from Gilbert Poulard)

    September 03: Release 7
       Put in place, understand & validate:
          •   Geant4; POOL; LCG applications
          •   Event Data Model
          •   Digitization; pile-up; byte-stream
          •   Conversion of DC1 data to POOL; large-scale persistency tests and reconstruction

    Mid-November 03: pre-production release
       Testing and validation
          •   Run test-production

    March 17th 04: Release 8 (production)
       Testing and validation
          •   Continuous testing of s/w components
          •   Improvements on Distribution/Validation Kit
       Start final validation
          •   Intensive test of the “Production System”
       Event generation ready

    May 17th 04: Simulation ready
          •   Data preparation
          •   Data transfer

    June 23rd 04: Reconstruction ready

    July 15th 04: Tier 0 exercise

    August 1st 04: Physics and Computing model studies
          •   Analysis (distributed)
          •   Reprocessing
          •   Alignment & calibration

          DC2 resources

Phase I (May-June-July)

Process                  No. of    Time       CPU power    Volume of    At CERN    Off site
                         events    (months)   (kSI2k)      data (TB)    (TB)       (TB)
Simulation               10^7      2          1000         20           4          16
RDO                      10^7      2           100         20           4          16
Pile-up /                10^7      2           100         30           30         24
 digitization
Event mixing &           10^7      2          (small)      20           20         0
 byte-stream
Total Phase I            10^7      2          1200         90           58         56

Phase II (> July)

Reconstruction Tier-0    10^7      0.5         600         5            5          10
Reconstruction Tier-1    10^7      2           600         5            0          5

Total                    10^7                              100          63         71
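
   As a consistency check (my arithmetic, not on the slide), the 1000 kSI2k quoted for
   Phase I simulation corresponds to a per-event time of the same order as the current
   ~400 kSI2k sec/event once spread over the two-month window:

       MONTH_S = 2.6e6   # ~30 days in seconds

       def ksi2k_sec_per_event(power_ksi2k, months, n_events):
           # Invert: power = events x (time/event) / wall time
           return power_ksi2k * months * MONTH_S / n_events

       print(ksi2k_sec_per_event(1000, 2, 1e7))   # ~520 kSI2k sec/event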




 Tiers in DC2                                                            More than 23 countries involved


Country                     “Tier-1”          Sites             Grid                kSI2k (ATLAS DC)

Australia                                                        NG                        12
Austria                                                         LCG                        7
Canada                      TRIUMF              7               LCG                       331
CERN                         CERN               1               LCG                       700
China                                                           LCG                        30
Czech Republic                                                  LCG                        25
France                      CCIN2P3             1               LCG                      ~ 140
Germany                      GridKa             3               LCG                        90
Greece                                                          LCG                        10
Israel                                                          LCG                        23
Italy                        CNAF               5               LCG                       200
Japan                        Tokyo              1               LCG                       127
Netherlands                 NIKHEF              1               LCG                        75
NorduGrid                     NG               30                NG                       380
Poland                                                          LCG                        80
Russia                                                          LCG                       ~ 70
Slovakia                                                        LCG
Slovenia                                                         NG
Spain                         PIC               4               LCG                        50
Switzerland                                                     LCG                        18
Taiwan                       ASTW               1               LCG                        78
UK                            RAL               8               LCG                      ~ 1000
US                            BNL              28            Grid3/LCG                   ~ 1000



    DC2 tools
   Installation tools
        Atlas software distribution kit
        Validation suite

   Production system
        Atlas production system interfaced to LCG, US-Grid, NorduGrid and
         legacy systems (batch systems)
        Tools
          •    Production management
          •    Data management
          •    Cataloguing
           •    Bookkeeping
          •    Job submission

   GRID distributed analysis
          •    ARDA domain: test services and implementations


    Software installation
   Software installation and configuration via PACMAN
        Full use of the Atlas Code Management Tool (CMT)
   Relocatable, multi-release distribution
   No root privileges needed to install
   GRID-enabled installation
        Grid installation via submission of a job to the destination sites
   Software validation tools, integrated with the GRID installation procedure
        A site is marked as validated after the installed software is checked with the validation tools

   Distribution format
        Pacman packages (tarballs)

   Kit creation
        Building scripts (Deployment package)
        Built in about 3 hours, after the release is built

   Kit requirements
        RedHat 7.3
        >= 512 MB of RAM
        Approx 4 GB of disk space + 2 GB in the installation phase for a full installation of a single release

   Kit installation
         pacman -get http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/pacman/cache:7.5.0/AtlasRelease
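
   A minimal sketch of what the grid-enabled installation job could look like once it lands
   on the destination site (the wrapper structure and the validation entry point are my
   assumptions; only the pacman command comes from this slide):

       #!/usr/bin/env python
       # Hypothetical installation-job payload: install the kit with pacman in a
       # relocatable, non-root area, then run the validation suite; the exit code
       # tells the submitter whether to mark the site as validated.
       import os
       import subprocess
       import sys

       RELEASE = "7.5.0"
       CACHE = ("http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/pacman/"
                "cache:" + RELEASE + "/AtlasRelease")

       def run(cmd):
           print("+ " + " ".join(cmd))
           return subprocess.call(cmd)

       area = "atlas-" + RELEASE          # no root privileges needed
       os.makedirs(area, exist_ok=True)
       os.chdir(area)

       if run(["pacman", "-get", CACHE]) != 0:
           sys.exit(1)                    # installation failed

       sys.exit(run(["./kitval.sh", RELEASE]))   # hypothetical validation entry point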

   Documentation (building, installing and using)
        http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/sit/Distribution
    Atlas Production System components
    Production database
         Oracle based
         Holds the definitions of the job transformations
         Holds the relevant data on the jobs' life cycle

    Supervisor (Windmill)
         Consumes jobs from the production database
         Dispatches the work to the executors (see sketch below)
         Collects info on the job life-cycle
         Interacts with the DMS for data registration and movements among the systems

    Executor
         One for each grid flavour and legacy system
            •    LCG (Lexor)
            •    NorduGrid (Dulcinea)
            •    US Grid (Capone)
            •    LSF
         Communicates with the supervisor
         Executes the jobs on the specific subsystems
            •    Flavour-neutral job definitions are specialized for the specific needs
            •    Submit to the GRID/legacy system
            •    Provide access to GRID-flavour-specific tools

    Data Management System (Don Quijote)
         Global cataloguing system
         Allows global data management
         Common interface on top of the system-specific facilities
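
   A minimal sketch of the supervisor/executor pattern above (only the component names come
   from the slides; the structure is mine): the supervisor consumes flavour-neutral job
   definitions from the production database and dispatches each to the executor registered
   for its grid flavour.

       class Executor:
           """One concrete executor per grid flavour / legacy system."""
           def submit(self, job):
               raise NotImplementedError

       class LexorExecutor(Executor):       # LCG
           def submit(self, job):
               print("LCG submit:", job["transformation"])

       class DulcineaExecutor(Executor):    # NorduGrid
           def submit(self, job):
               print("NG submit:", job["transformation"])

       class Supervisor:
           def __init__(self, executors):
               self.executors = executors   # e.g. {"lcg": ..., "ng": ...}
           def run_once(self, prod_db):
               for job in prod_db.fetch_pending():             # consume jobs from the DB
                   self.executors[job["flavour"]].submit(job)  # dispatch to the right grid

       class FakeProdDB:
           """Stand-in for the Oracle production database."""
           def fetch_pending(self):
               return [{"flavour": "lcg", "transformation": "dc2.simul"},
                       {"flavour": "ng",  "transformation": "dc2.recon"}]

       Supervisor({"lcg": LexorExecutor(),
                   "ng": DulcineaExecutor()}).run_once(FakeProdDB())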



Atlas Production System architecture
   (summary of the original architecture diagram)

    Definitions: Task = [job]*, Dataset = [partition]*
    Job description flow: the Data Management System provides a location hint for each task
     (dataset); the task transformation definition plus the physics signature define the task,
     which is split, with human intervention, into jobs (partitions) carrying the transformation
     info, the release version and the signature; job run info goes back to the DMS
    Four supervisors drive the executors, communicating with them via Jabber:
         •   US Grid executor (Chimera) → US Grid
         •   LCG executor → LCG Resource Broker (RB)
         •   NG executor → NorduGrid RB
         •   LSF executor → local batch system

    DC2 status
 DC2 first phase started May 3rd
     Test the production system
     Start the event generation/simulation tests

 Full production should start next week
     Full use of the 3 GRIDs and legacy systems

   DC2 jobs will be monitored via GridICE
    and an ad-hoc monitoring system, interfaced to the
    production DB and the production systems




        Atlas Computing & INFN (1)
    Coordinators & managers
        D. Barberis
           •   Genoa; initially a member of the Computing Steering Group as Inner Detector software
               coordinator, now ATLAS Computing Coordinator
        G. Cataldi
           •   Lecce; new coordinator of Moore, the OO muon reconstruction program
        S. Falciano
           •   Roma1; TDAQ/LVL2 coordinator
        A. Farilla
           •   Roma3; initially Moore coordinator and SCASI scientific secretary, now Muon Reconstruction
               Coordinator and Combined Test Beam software coordinator
        L. Luminari
           •   Roma1; INFN representative in the ICB and contact person for computing-model activities in Italy
        A. Nisati
           •   Roma1; representing the LVL1 simulation, and Chair of the TDAQ Institute Board
        L. Perini
           •   Milan; chair, ATLAS Grid Co-convener, ATLAS representative in various LCG and EGEE bodies
        G. Polesello
           •   Pavia; ATLAS Physics Coordinator
        A. Rimoldi
           •   Pavia; ATLAS Simulation Coordinator and member of the Software Project Management Board
        V. Vercesi
           •   Pavia; PESA Coordinator and member of the Computing Management Board




    Atlas Computing & INFN (2)
   Atlas INFN sites LCG compliant for DC2
      Tier-1
           •    CNAF (G.Negri)
      Tier-2
           •    Frascati (M. Ferrer)
           •    Milan (L. Perini, D. Rebatto, S. Resconi, L. Vaccarossa)
           •    Naples (G. Carlino, A. Doria, L. Merola)
           •    Rome1 (A. De Salvo, A. Di Mattia, L. Luminari)

   Activities
      Development of the LCG interface to the Atlas Production Tool
           •    F. Conventi, A. De Salvo, A. Doria, D. Rebatto, G. Negri, L. Vaccarossa
         Participation in DC2 using the GRID middleware (May - July 2004)
         Local productions with GRID tools
         Atlas VO management (A. De Salvo)
         Atlas code distribution (A. De Salvo)
            •    Atlas code distribution model (PACMAN based) fully deployed
            •    The current installation system/procedure easily allows the Atlas software to
                 coexist with other experiments’ environments
         Atlas distribution kit validation (A. De Salvo)
         Transformations for DC2 (A. De Salvo)


     Conclusions
   First real test of the Atlas computing model is starting
       DC2 tests started at the beginning of May
       “Real” production starting in June
        Will give important information for the Computing TDR
 Very intensive use of the GRIDs
      Atlas Production System interfaced to LCG, NG and US Grid (GRID3)
     Global data management system


   Getting closer to the real experiment computing model





				