                        DIAL requirements for
                        ATLAS database

                      ATLAS Software Workshop
                          Database Session

                           David Adams
                              BNL
                           May 14, 2003

        David Adams

ATLAS
                       Contents
Design
Datasets
ATLAS datasets
Use cases
Requirements
Conclusions




        David Adams

ATLAS                 DIAL requirements   ATLAS SW DB session   May 14, 2003 2
                                Design
DIAL has the following components
        • Dataset describing the data of interest
             – Organized into events
        • Application
             – Event loop providing access to the data
        • Task
             – Result to fill for each event
             – Code to process each event
        • Scheduler
             – Distributes processing and combines results
[Figure: DIAL processing flow. Boxes: User Analysis (e.g. ROOT),
Application (e.g. athena), Task (Result, Code), Dataset, Dataset 1,
Dataset 2, Scheduler, Job 1, Job 2, Result. Labeled steps:
1. Create or locate; 2. select; 3. Create or select; 4. select;
5. submit(app,tsk,ds); 6. split; 7. create; 8. run(app,tsk,ds1) and
run(app,tsk,ds2); 9. fill; 10. gather]
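The submit, split, run, fill and gather steps in the figure can be sketched as follows. This is only an illustration: the class and function names are hypothetical and are not the DIAL API, and the jobs run one after another here instead of being distributed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Dataset:
    """Hypothetical stand-in: maps each file name to the event IDs it holds."""
    events_by_file: Dict[str, List[int]]

    def split(self) -> List["Dataset"]:
        # Step 6: one sub-dataset per file, so no event is ever divided.
        return [Dataset({name: ids}) for name, ids in self.events_by_file.items()]


@dataclass
class Task:
    """Hypothetical task: an empty result, per-event code, and a merge rule."""
    new_result: Callable[[], Dict[str, int]]
    process_event: Callable[[Dict[str, int], int], None]
    merge: Callable[[Dict[str, int], Dict[str, int]], None]


def application(task: Task, dataset: Dataset) -> Dict[str, int]:
    """Event loop (steps 8 and 9): fill one result from every event."""
    result = task.new_result()
    for ids in dataset.events_by_file.values():
        for event_id in ids:
            task.process_event(result, event_id)
    return result


def submit(app, task: Task, dataset: Dataset) -> Dict[str, int]:
    """Scheduler (steps 5 to 10): split, run one job per piece, gather."""
    total = task.new_result()
    for sub_dataset in dataset.split():      # steps 6 and 7: split, create jobs
        partial = app(task, sub_dataset)     # steps 8 and 9: run and fill
        task.merge(total, partial)           # step 10: gather
    return total


def count_event(result: Dict[str, int], event_id: int) -> None:
    result["n_events"] += 1


def merge_counts(total: Dict[str, int], partial: Dict[str, int]) -> None:
    total["n_events"] += partial["n_events"]


task = Task(lambda: {"n_events": 0}, count_event, merge_counts)
dataset = Dataset({"file_a": [1, 2, 3], "file_b": [4, 5]})
result = submit(application, task, dataset)
```

The task supplies both the empty result and the rule for merging partial results, which is what lets the scheduler combine the output of independent jobs without knowing what the result contains.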
                                      Datasets
Datasets specify event data to be processed
Datasets provide the following
        • List of event identifiers
        • Content
             – E.g. raw data, refit tracks, cone=0.3 jets, …
        • Means to locate the data
             – List of logical files where data can be found
             – Mapping from each event ID and content to a file
             – Means to access data iterating over events
                      > For athena, return an event collection
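A minimal sketch of what such a dataset might hold, assuming integer event IDs and content named by plain strings. The class, its methods and the file names are hypothetical and only illustrate the items listed above.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Set, Tuple


@dataclass
class Dataset:
    """Hypothetical dataset description (not the DIAL interface)."""
    event_ids: List[int]                  # list of event identifiers
    content: Set[str]                     # e.g. "raw", "tracks"
    file_map: Dict[Tuple[int, str], str]  # (event ID, content) -> logical file

    def files(self) -> List[str]:
        """Logical files where the data can be found."""
        return sorted(set(self.file_map.values()))

    def locate(self, event_id: int, content: str) -> str:
        """File holding the given content for the given event."""
        return self.file_map[(event_id, content)]

    def events(self) -> Iterator[int]:
        """Iterate over the events in the dataset."""
        return iter(self.event_ids)


dataset = Dataset(
    event_ids=[1, 2],
    content={"raw", "tracks"},
    file_map={
        (1, "raw"): "raw_1.dat",
        (2, "raw"): "raw_1.dat",
        (1, "tracks"): "reco_1.dat",
        (2, "tracks"): "reco_1.dat",
    },
)
```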
[Figure: Example dataset with mapping to files. Axes: Event and
Content (Raw, Clusters, Tracks, Jets, Electrons, AOD). Legend: File,
Data object]


                         Datasets (cont)
User may specify content of interest
        • File list should be restricted to those required to
          access this content
        • Only this subset required for processing
        • Application processing dataset (e.g. athena)
          must be able to run with only these files
        • Content selection may be ignored
             – Lose the optimization of only delivering files with
               useful data
             – No loss if the data for each event is contained in a
               single file
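The effect of content selection on the file list can be sketched as below, assuming the dataset carries a mapping from (event ID, content) to file name. The function and file names are hypothetical. The second example shows the case noted above where each event sits in a single file, so the selection does not shorten the file list.

```python
from typing import Dict, List, Set, Tuple

FileMap = Dict[Tuple[int, str], str]  # (event ID, content) -> file name


def select_content(file_map: FileMap, wanted: Set[str]) -> Tuple[FileMap, List[str]]:
    """Keep only the wanted content and return the files still required."""
    selected = {key: name for key, name in file_map.items() if key[1] in wanted}
    return selected, sorted(set(selected.values()))


# Content stored in separate files: selecting tracks drops the raw file.
split_by_content = {
    (1, "raw"): "raw.dat", (2, "raw"): "raw.dat",
    (1, "tracks"): "reco.dat", (2, "tracks"): "reco.dat",
}
# All content for an event in one file: every file is still needed.
one_file_per_event = {
    (1, "raw"): "evt1.dat", (1, "tracks"): "evt1.dat",
    (2, "raw"): "evt2.dat", (2, "tracks"): "evt2.dat",
}

_, files_a = select_content(split_by_content, {"tracks"})
_, files_b = select_content(one_file_per_event, {"tracks"})
```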
[Figure: Example dataset with content selection. Axes: Event and
Content (Raw, Clusters, Tracks, Jets, Electrons, AOD). Legend:
Selected file, File, Data object]



                          Datasets (cont)
Distributed analysis requires means to divide
dataset into sub-datasets on event boundaries
        • Sub-dataset is another dataset
        • Do not split data from any one event
        • Usually split along file boundaries
             – Jobs can be assigned where files are already present
             – Split most likely done at high level (grid, site, farm)
        • May assign different events from one file to
          different jobs to speed processing
             – This split likely done at low level (farm, node)
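Both kinds of split can be sketched as follows, assuming each file holds whole events and events are identified by integers. The function names are hypothetical. Either way an event is assigned to exactly one piece.

```python
from typing import Dict, List


def split_by_file(events_by_file: Dict[str, List[int]]) -> List[Dict[str, List[int]]]:
    """High-level split (grid, site, farm): one sub-dataset per file."""
    return [{name: ids} for name, ids in events_by_file.items()]


def split_events(event_ids: List[int], n_parts: int) -> List[List[int]]:
    """Low-level split (farm, node): divide one file's events among jobs.

    Pieces are contiguous and differ in size by at most one event;
    empty pieces are dropped when there are fewer events than jobs.
    """
    size, extra = divmod(len(event_ids), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        stop = start + size + (1 if i < extra else 0)
        if stop > start:
            parts.append(event_ids[start:stop])
        start = stop
    return parts


by_file = split_by_file({"file_a": [1, 2, 3, 4, 5], "file_b": [6, 7]})
within_file = split_events([1, 2, 3, 4, 5], 2)
```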
[Figure: Example sub-dataset with content selection. Axes: Event and
Content (Raw, Clusters, Tracks, Jets, Electrons, AOD). Legend:
Selected events, Selected file, File, Data object]



                        ATLAS datasets
ATLAS dataset implementations
        • Dataset is interface for distributed processing
        • ATLAS implementations of this interface can
          be processed with DIAL
        • ATLAS dataset candidates:
             – AthenaRoot file (existence proof)
             – Combined ntuple hbook file (soon)
             – Pool event collection (next?)
        • What is required from ATLAS DB?
             – See following…
                      ATLAS datasets (cont)
Following slides show
        • System dependencies
             – Note ATLAS datasets are not in the ATLAS system
        • ATLAS DIAL use cases
             – For dataset which corresponds to an event collection
        • Requirements for ATLAS DB and POOL
             – These follow from the use cases




                      ATLAS datasets (cont)

[Figure: System dependencies. Components: ATLAS_DIAL, DIAL,
ATLAS_dataset, dataset, ATLAS, ROOT, POOL, Xerces. Annotations:
ATLAS dataset implementations; Requirements]


                              Use cases
1. Build starting dataset
        • Select event collection (EC)
             – Query on production database (PDB)
        • Identify files holding the data for this collection
             – From PDB or EC
        • Determine the event IDs included in these files
             – From EC (if explicit)
        • Determine content
             – Might be in PDB or EC
             – Or might be derived from xform and parent EC
             – Or read the files (slow)
                          Use cases (cont)
2. Split dataset along file boundaries
        • Required to distribute processing
        • Either create new EC from subset of files in the
          original EC
             – By copying relevant event headers or
             – New EC which is old plus filter
        • Or if EC is hierarchical, split it along its
          internal boundaries
             – Presumably these are file boundaries
             – Natural that file names are stored in hierarchy

                      Use cases (cont)
3. Select content
        • Optional; may speed processing
        • Most useful if the selected content is in
          different files than the unneeded content
        • Identify files which hold the selected content
        • Either copy EC and modify event headers to
          drop uninteresting objects
        • Or use existing EC and assume athena does not
          load the missing content

                      Use cases (cont)
4. Apply event selection
        • Result of processing can be a list of selected
          events
        • Next step is then to apply this selection to the
          original dataset
        • Either copy only the selected event headers to a
          new EC
        • Or reference the original EC and add a filter
          which only exposes the selected events

                           Requirements
Content specification
        • Which data is found for each event in a dataset
        • For combined ntuples, content is list of blocks
             – Each subdetector contributes a block
        • For reconstructed data, natural to follow the
          StoreGate model
             – EDO specified by type and string key
             – Content is a collection of type-keys
             – Or maybe a grouping of these (e.g. tracking data)
             – Would be nice to have an ATLAS class to represent
               content
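One way such a content description might look, following the type-and-key idea above. This is a sketch only: the class is not an existing ATLAS class, and the type names, keys and group name are made up for illustration.

```python
from typing import Dict, FrozenSet, NamedTuple


class TypeKey(NamedTuple):
    """One event data object, identified by type name and string key."""
    type_name: str
    key: str


Content = FrozenSet[TypeKey]  # content is a collection of type-keys

# Hypothetical named grouping of type-keys (e.g. tracking data).
GROUPS: Dict[str, Content] = {
    "tracking": frozenset({
        TypeKey("TrackCollection", "Tracks"),
        TypeKey("VertexCollection", "Vertices"),
    }),
}


def has_content(available: Content, required: Content) -> bool:
    """True if every required type-key is present in the available content."""
    return required <= available


event_content: Content = frozenset({
    TypeKey("TrackCollection", "Tracks"),
    TypeKey("VertexCollection", "Vertices"),
    TypeKey("JetCollection", "Cone03Jets"),
})
```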
                      Requirements (cont)
Production database
        • Provides desired queries to locate event
          collections for analysis
        • Might provide file list, event ID list or content
          if not available from the EC




                      Requirements (cont)
Transformation catalog
        •   Transformations to apply to an EC
        •   Part of a virtual data system
        •   Might want to register DIAL task/applications
        •   Transformation entry should include input and
            output content
             – To determine suitability for application to an input
               EC/dataset
             – Can be used to deduce the content added to an output
               EC/dataset
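The two uses of input and output content can be sketched as below, assuming content is a set of plain strings and that a transformation adds to the content without removing any. The class and function names are hypothetical and do not describe an existing catalog.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Transformation:
    """Hypothetical catalog entry with declared input and output content."""
    name: str
    input_content: FrozenSet[str]
    output_content: FrozenSet[str]


def is_applicable(xform: Transformation, ec_content: FrozenSet[str]) -> bool:
    """Suitable if the EC provides everything the transformation reads."""
    return xform.input_content <= ec_content


def content_after(xform: Transformation, ec_content: FrozenSet[str]) -> FrozenSet[str]:
    """Content of the output EC: the input content plus what was added."""
    if not is_applicable(xform, ec_content):
        raise ValueError(f"{xform.name} needs {sorted(xform.input_content)}")
    return ec_content | xform.output_content


tracking = Transformation("tracking", frozenset({"raw"}), frozenset({"tracks"}))
jet_finding = Transformation("jet_finding", frozenset({"clusters"}), frozenset({"jets"}))
raw_content = frozenset({"raw"})
```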

                      Requirements (cont)
Provenance catalog
        • Records the input ECs and the transformation used to
          produce an EC
        • This is another component of a virtual data
          system
        • Provenance of an EC can be used to deduce the
          full content of the EC
             – Append output content of each transformation
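Deducing content from provenance can be sketched as a walk back through the input ECs, collecting the output content of each transformation on the way. The record layout and names are hypothetical, and the sketch assumes no transformation drops content.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass
class ProvenanceEntry:
    """Hypothetical provenance record for one event collection."""
    name: str
    output_content: FrozenSet[str]  # content added by the producing step
    inputs: List["ProvenanceEntry"] = field(default_factory=list)


def full_content(entry: ProvenanceEntry) -> Set[str]:
    """Union of the output content of every step in the provenance chain."""
    content = set(entry.output_content)
    for parent in entry.inputs:
        content |= full_content(parent)
    return content


raw_ec = ProvenanceEntry("raw_ec", frozenset({"raw"}))
reco_ec = ProvenanceEntry("reco_ec", frozenset({"clusters", "tracks"}), [raw_ec])
jet_ec = ProvenanceEntry("jet_ec", frozenset({"jets"}), [reco_ec])
```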



                       Requirements (cont)
Event collection
        • Must provide
             – List of files
             – Event IDs
             – Content (optional)
             – Map of event ID and content to file
             – Split into sub-datasets
             – Content selection (if content is used)
             – Event selection
        • Separately consider implicit, explicit and
          hierarchical ECs
                      Requirements (cont)
Event filter
        • We often construct an EC which is made up of
          a subset of the event headers in an existing EC
        • Of course, this can be done by copying the
          relevant event headers
        • But we can avoid the copy with a new
          collection which references the original and has
          an event filter
        • User querying or iterating over the new
          collection only sees the selected events
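A filtered collection of this kind can be sketched as a thin wrapper that references the original and applies the filter during iteration. The classes below are hypothetical and are not the POOL or athena collection interface; events are represented by integer IDs.

```python
from typing import Callable, Iterable, Iterator


class EventCollection:
    """Hypothetical explicit collection holding event IDs."""

    def __init__(self, event_ids: Iterable[int]) -> None:
        self._event_ids = list(event_ids)

    def __iter__(self) -> Iterator[int]:
        return iter(self._event_ids)


class FilteredCollection:
    """References a parent collection and exposes only accepted events."""

    def __init__(self, parent: Iterable[int], accept: Callable[[int], bool]) -> None:
        self.parent = parent  # a reference, not a copy of the event headers
        self.accept = accept

    def __iter__(self) -> Iterator[int]:
        return (event for event in self.parent if self.accept(event))


original = EventCollection(range(1, 7))
selected_ids = {2, 3, 5}  # e.g. the result of an earlier processing step
selected = FilteredCollection(original, selected_ids.__contains__)
even_selected = FilteredCollection(selected, lambda event: event % 2 == 0)
```

Because a filtered collection is itself iterable, filters can be stacked, as `even_selected` shows, and the original collection is left unchanged.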
                             Requirements (cont)
Content filter
        • We may want to restrict the visible content of
          an EC (e.g. tracks but not jets)
        • This can be done by copying but requires
          modifying all of the event headers to drop the
          unwanted content
        • Probably better to implement this on the
          application side: teach athena to
             – Only load specified data
                      > EC could carry list of desired content
             – Or only load data for which files are accessible
                      Requirements (cont)
Implicit event collection
        • Is a collection of files from which event headers
          can be constructed
        • Assume each file is a valid EC
        • Must open files to get event IDs and content
        • Mapping to files and splitting are trivial




                      Requirements (cont)
Explicit event collection
        • Event IDs obtained with query
        • Content can be obtained with query if EC
          includes attributes to describe content
        • Files and file mapping could be obtained with a
          query if file attributes are added




                             Requirements (cont)
Hierarchical event collection
        • Is a collection of collections
        • Mapping to files and splitting are easy if
             – Lowest-level branch maps to a file
             – File ID attribute assigned to that branch
                      > (instead of to each event or EDO!)
        • Dataset and hierarchical collection have very
          similar motivations
        • Will likely need to impose requirements on the
          hierarchical attributes and their queries
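A sketch of why file mapping and splitting become easy under these assumptions: the file ID lives on the lowest-level branch, and splitting just returns the child collections. The class and names are hypothetical and do not describe the planned POOL hierarchical collections.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Collection:
    """Hypothetical hierarchical collection (a collection of collections)."""
    name: str
    file_id: Optional[str] = None  # set only on lowest-level branches
    event_ids: List[int] = field(default_factory=list)
    children: List["Collection"] = field(default_factory=list)

    def files(self) -> List[str]:
        """File list gathered from the lowest-level branches."""
        if not self.children:
            return [self.file_id] if self.file_id else []
        return [name for child in self.children for name in child.files()]

    def events(self) -> List[int]:
        """All event IDs below this node."""
        if not self.children:
            return list(self.event_ids)
        return [event for child in self.children for event in child.events()]

    def split(self) -> List["Collection"]:
        """Split along internal boundaries; a leaf cannot be split further."""
        return list(self.children) if self.children else [self]


run = Collection("run", children=[
    Collection("part_1", "file_1", [1, 2]),
    Collection("part_2", "file_2", [3, 4]),
])
```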
                            Conclusions
ATLAS event collections and DIAL
        • A DIAL dataset can be constructed from an
          ATLAS event collection
        • This will make it possible to use DIAL to
          process ATLAS event collections
             – Dataset interface used for distributing the processing
             – EventCollection interface used for athena processing
               of event data
        • Event collections should be implemented
          carefully to optimize DIAL processing
        • Expect similar benefit to other distributed
          production or analysis systems
                       Conclusions (cont)
ATLAS requirements
        • With appropriate attributes, an EC dataset can
          be constructed and used without opening event
          data files
             – Event IDs and content obtained by queries on EC
             – EC is explicit or (better) hierarchical
        • Embedding filters in an EC circumvents the
          need to copy the EC upon event selection
        • Content selection can be handled by athena
             – EC might carry the selected content so queries
               return the appropriate files
                       Conclusions (cont)
ATLAS requirements (cont)
        • Hierarchical collection is an especially
          attractive option for distributed processing
             – Natural decomposition for distributed processing
             – If appropriate hierarchy is chosen
POOL requirements
        • Collections should support filtering, i.e.
          embedded queries for iteration
        • We need careful comparison of our needs with
          the plans for hierarchical collections