

                                ATLAS STORAGE CLASSES AT CERN

                                     (January 24, version 0.3)


This note describes a proposal for the ATLAS storage classes at CERN. Initially, the role of the
Tier-0 at CERN was limited to calibration and initial processing of the data. In addition, a copy
of the RAW data from the detector would be archived on tape.

Gradually it has become clear that many people also want to do group-based analysis on the data
at CERN, which makes it a Tier-1 as well (although re-processing is not foreseen). Moreover,
people at CERN also use the computing infrastructure for group and end-user analysis. This
makes CERN a Tier-2 too (although simulation production is not foreseen) and even a Tier-3.

This proposal tries to create some order in the situation that has grown over the past years.
Although this was not foreseen in the ATLAS Computing Model, people do use the CERN
infrastructure for non-Tier-0 tasks and we cannot tell them not to. We can only point them to
alternatives (using real Tier-1s), which may work as well, or even better, when the load
on the CERN resources increases. It is also unrealistic to remove group and user data at this
stage just because it was not mentioned in the Computing Model. We have to adapt the
infrastructure to cope with it better.

                                      THE CERN TIER-0

In the ATLAS Computing Model the Tier-0 is meant for calibration and initial processing of the
data from the detector. The RAW data from the detector is written into the t0atlas pool. From
there it goes to tape and to the farm for calibration and initial reconstruction. Raw and derived
data are also exported from this pool to the Tier-1 and Tier-2 sites. There is a second disk pool,
t0merge, for temporary files. The t0atlas and t0merge pools must be accessible for READ by DDM
and for READ and WRITE by the DAQ. The disks should be accessible neither for READ nor for
WRITE by anybody else in ATLAS.

Just as the ESD, AOD and TAG files are exported to the Tier-1 sites to remain on disk, they are
also made available to users on disk at CERN. A full copy of the AOD and TAG files from raw
data reconstruction will be on disk. In addition, a representative sample of RAW data will be
made available; which sample is to be determined on request. A sample of ESD data could also be
made available when requested. These extra samples of RAW and ESD data can only be of limited
size, and an upper limit to the storage space for them must be set beforehand.

The atldata pool was intended to contain those AOD files from detector data but has also been
used for many other things. These other data now have to be removed or copied to other
places. The atldata pool should be writable for the T0 managers and readable for everybody in
ATLAS.

These storage spaces are managed by the T0 managers only. They are completely internal to the
Tier-0 and are not managed through SRM.

  Pool Name   Storage Type   Used for                 Space [TB]   Write Access   Read Access

  t0merge     T0D1           Temporary files                  50   T0             T0

  t0atlas     T1D0           RAW, ESD, AOD and TAG           200   T0, DAQ        T0, DDM

  atldata     T0D1           AODs from initial               100   T0             All
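
As an illustration only, the Tier-0 table can be written down as a small lookup structure with an
access check. This is a minimal sketch and not an existing CASTOR or SRM interface: the names
T0_POOLS and is_allowed are hypothetical, and the role strings simply mirror the "Write Access"
and "Read Access" columns of the table.

```python
# Hypothetical encoding of the Tier-0 pool table (names are illustrative).
T0_POOLS = {
    "t0merge": {"storage": "T0D1", "space_tb": 50,
                "write": {"T0"}, "read": {"T0"}},
    "t0atlas": {"storage": "T1D0", "space_tb": 200,
                "write": {"T0", "DAQ"}, "read": {"T0", "DDM"}},
    "atldata": {"storage": "T0D1", "space_tb": 100,
                "write": {"T0"}, "read": {"All"}},
}


def is_allowed(pool, role, mode):
    """Return True if `role` may access `pool` in `mode` ('read' or 'write')."""
    allowed = T0_POOLS[pool][mode]
    # "All" in the table means every ATLAS user may use that access mode.
    return "All" in allowed or role in allowed


def total_space_tb():
    """Sum of the space column over all Tier-0 pools."""
    return sum(pool["space_tb"] for pool in T0_POOLS.values())
```

With this encoding, is_allowed("t0atlas", "DAQ", "write") is true while
is_allowed("t0merge", "USER", "read") is false, and the three pools add up to 350 TB.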

                                      THE CERN TIER-1

AOD data from the reconstruction of simulated data was subscribed to all Tier-1s and also to
CERN, although this was not foreseen in the Computing Model. To date there is about 50 TB of
such data in the atldata pool. Because of the different data types and retention periods it seems
logical to separate AODs from detector data and AODs from simulated data. This is also done at
all other Tier-1s. This pool may also contain other data types from simulation if requested. The
trigger community, for example, has requested RDO samples to be available for special studies,
and this seems to be the logical place to put them.

The atlprod disk pool at CERN is foreseen to contain this sort of data. At this moment it contains
many other data that need to be removed or copied to other places. The distribution of AODs
from simulated data is done by DDM, and this pool should be writable only by DDM. It must be
readable for all ATLAS users. Data is written to this pool using SRM and space tokens must be

  Pool      Space Token     Storage Type   Used for           Space [TB]   Write Access   Read Access

  atldata   ATLASDATADISK   T0D1           ESD and AOD from          100   T0, DDM        All

  atlprod   ATLASMCDISK     T0D1           ESD and AOD from          100   DDM            All

                        GROUP ANALYSIS AT THE CERN TIER-1

If all AOD data from the detector and from simulation are available, it seems logical to allow
for group analysis as well. Many different DPD samples are created by each physics group from
AODs. All other Tier-1s have a dedicated disk pool for each of those physics groups, and the same
should be made available at CERN. As this is still organized analysis, this disk space must be
writable only for group analysis coordinators and for nobody else. If it were writable for each
member of the physics group, the space quota would be harder to control. The data on disk must
again be readable for everybody from the ATLAS collaboration with an ATLAS certificate.

  Storage Token    Storage Type   Used for       Size [TB]   Write Access     Read Access

  ATLASGRPTOP      T0D1           DPD and ntup          10   Group managers   All

  ATLASGRPTAU      T0D1           DPD and ntup          10   Group managers   All

  ATLASGRPEGAMMA   T0D1           DPD and ntup          10   Group managers   All

  ATLASGRPMUON     T0D1           DPD and ntup          10   Group managers   All

  ATLASGRPJETS     T0D1           DPD and ntup          10   Group managers   All

  ATLASGRPSM       T0D1           DPD and ntup          10   Group managers   All

  ATLASGRPBPHYS    T0D1           DPD and ntup          10   Group managers   All
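
The group space tokens listed in the table all follow the same pattern, and the text restricts
write access to the group analysis coordinators. The sketch below shows one way this could be
expressed. It is an assumption for illustration, not an existing tool: the helper names and the
idea of passing in the set of groups a person coordinates are hypothetical.

```python
# Groups taken from the table; each gets 10 TB of T0D1 space.
GROUPS = ["TOP", "TAU", "EGAMMA", "MUON", "JETS", "SM", "BPHYS"]
GROUP_SIZE_TB = 10


def group_token(group):
    """Build the space token name for a physics group, e.g. ATLASGRPTOP."""
    return "ATLASGRP" + group.upper()


def group_spaces():
    """Return one record per group space, mirroring the table columns."""
    return [
        {"token": group_token(g), "storage": "T0D1", "size_tb": GROUP_SIZE_TB}
        for g in GROUPS
    ]


def may_write(token, managed_groups):
    """True only if the user coordinates the group that owns `token`.

    `managed_groups` is a hypothetical set of group names the user manages.
    """
    return any(group_token(g) == token for g in managed_groups)


def may_read(token, has_atlas_certificate):
    """Every ATLAS member with an ATLAS certificate may read group data."""
    return has_atlas_certificate
```

For the seven groups in the table this amounts to 70 TB in total, and a coordinator of TOP can
write to ATLASGRPTOP but not to ATLASGRPTAU.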

Other groups that need to be set up in the same way are: HIGGS, HEAVYION, EXOTIC, SUSY and

                      END USER ANALYSIS AT THE CERN TIER-3

Again, it is not described in the ATLAS Computing Model, but there is a substantial amount of
end-user analysis going on at CERN. In ATLAS Computing Model terms this is a Tier-3 activity. It
can be debated whether this should be allowed only for people on the CERN payroll or for all
CERN-resident people, but the fact is that there is already a lot of such data on our disks and
we had better make some rules to be able to control the quota. The following is a proposal.

Disk space must be made available at CERN for all ATLAS users in the default area. This is the
user’s home directory on the AFS servers. This should allow them to store small data samples
while doing their analysis on DPD and TAG data.

Larger data samples will not fit in their home directories, and a sizeable stage area has to be
made available for those data sets on a disk pool, analysis, with space token ATLASENDUSER. This
pool is primarily a cache, and the data is supposed to be copied elsewhere: a local disk of a
desktop or laptop, or a storage server in the home institute of the user. The analysis pool at
CERN should be readable and writable for all ATLAS users. The pool should not have a tape
backend but a garbage collector that removes the oldest and least used files from the disks. The
data deletion policy for this server must be well defined and be made very clear to the users.
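
To make the deletion policy concrete, here is a minimal sketch of how a garbage collector could
pick its victims. It is an illustration under stated assumptions, not the CASTOR garbage
collector: the CachedFile record, the high and low water marks (90% and 70%), and the ordering
rule (oldest last access first, then fewest accesses) are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    name: str
    size_gb: float
    last_access: float   # seconds since the epoch
    access_count: int


def select_for_deletion(files, capacity_gb, high_water=0.9, low_water=0.7):
    """Pick files to delete once the pool is fuller than the high water mark.

    Files are removed oldest first, with the least used file going first
    among equally old ones, until usage drops to the low water mark.
    """
    used = sum(f.size_gb for f in files)
    if used <= high_water * capacity_gb:
        return []  # enough free space, nothing to delete

    victims = []
    ordered = sorted(files, key=lambda f: (f.last_access, f.access_count))
    for f in ordered:
        if used <= low_water * capacity_gb:
            break
        victims.append(f)
        used -= f.size_gb
    return victims
```

Whatever rule is finally chosen, writing it down this explicitly is what allows it to be
communicated clearly to the users.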

If CERN has a Tier-3 role in the ATLAS Computing Model it must also make disk space available
for its own people: CERN employees without a home institute on which they can rely to store
their data. ATLAS considers this a CERN responsibility and not an ATLAS task, just as it is the
responsibility of every participating institute to make storage space available for its
end-users. It must be disk space at CERN but not accounted against ATLAS, and it should be
readable and writable only for those CERN-based end-users. User and space management is also a
CERN task.

  Storage Token   Storage Class   Size [TB]   Used for        Read Access   Write Access

  ATLASENDUSER    T0D0                   10   Temp. Storage   All           All

  ATLASCERNUSER   T0D1                   10   End-user data   CERN empl.    CERN empl.

                                    MIGRATION PLAN

To have an orderly migration from the current situation, the following steps are proposed:

   1. Close atldata and atlprod. Discuss with the CASTOR team how we can make sure that
      only the above-mentioned users can write to and read from those pools.
   2. Set up the other storage classes with the space mentioned above. Again we must make
      sure that only the people mentioned in the tables above are allowed to write to those
      pools and nobody else.
      NB It is probably better to base the permissions on service certificates rather than
      personal certificates.
   3. Where appropriate, move the data we recognize from the atldata and atlprod
      pools into the pools where we think they belong and tell the owners of those data.
   4. Remove all data that we cannot identify, after having told the user community and
      after having given them some time to react.
   5. From now on, better monitor and manage those pools at the T0.
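
Steps 3 and 4 can be read as a simple decision rule per data set. The sketch below is only an
illustration: the category labels, the target-pool mapping and the 30-day grace period are
assumptions that do not come from this note, which only says that users must be given "some
time to react".

```python
from datetime import date, timedelta

# Hypothetical mapping from a recognized data category to its target pool.
TARGET_POOL = {
    "detector_aod": "atldata",
    "simulated_aod": "atlprod",
    "end_user": "analysis",
}


def triage(dataset, today, grace_days=30):
    """Decide what to do with one data set found in atldata or atlprod.

    `dataset` is a dict with keys 'pool', 'category' (None if the data
    cannot be identified) and 'notified_on' (a date, or None if the user
    community has not yet been told).
    """
    category = dataset.get("category")
    if category in TARGET_POOL:
        target = TARGET_POOL[category]
        if target == dataset["pool"]:
            return "keep"
        return "move to " + target          # step 3: move and tell the owners

    notified_on = dataset.get("notified_on")
    if notified_on is None:
        return "notify owners"              # step 4: tell the community first
    if today - notified_on >= timedelta(days=grace_days):
        return "remove"                     # step 4: grace period has passed
    return "wait"
```

For example, simulated AODs found in atldata would be moved to atlprod, while unidentified data
is removed only after the owners have been notified and the grace period has expired.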

                                          NOTA BENE

NB1 Note that no ATLAS Tier-2 simulation role is attributed to CERN. This means that we do not
expect any such production to be done. If this is not true, the corresponding storage
classes for simulation data will have to be set up as well. Neither have we attributed the Tier-1
re-processing role to CERN. Again, if it turns out that re-processing of raw data has to be
done, the corresponding storage classes will have to be set up.

NB2 We need to find a place for other data types that need to be stored at the Tier-0 or at CERN.
Calibration data is not mentioned here. Is it just data like any other, with the calibration
constants uploaded to the calibration database? Or is it another data type for which we need
to find a proper place?

NB3 Database releases are distributed through DDM. Where are they stored? And what are
their retention characteristics?

Ref.1 The ATLAS Space Token document can be found at:

Ref.2 The note on the storage tokens for the Tier-1&2 sites to be used for FDR1 and CCRC-1:
