
  European Laboratory for Particle Physics
  Laboratoire Européen pour la Physique des Particules
  CH-1211 Genève 23 - Suisse




  Atlas Offline Software Application
  Metadata
  Requirements
  Document Version:                 1
  Document Date:                    18 July 2002 12:35
  Document Status:                  DRAFT
  Document Author:                  Solveig ALBRAND, Jerome FULACHIER




  Abstract

  This document describes the functions which will be provided by the user interfaces of the
  Atlas Offline Software Application Metadata catalogue.

  Table 1 Document Change Record

   Title:      Atlas Offline Software Application Metadata Requirements

   ID:         [Document ID]

   Version:    1                                         Originator:      S. Albrand

   Date:       2002-07-16                                Approved By:

   Page        Paragraph         Reason for Change




1 Introduction

  This document is a list of the requirements for an Atlas Application Metadata Catalogue, and
  the interfaces used to access the catalogue.

  An application metadata base or metadata catalogue is also known as a bookkeeping
  database. It could also be called a data “warehouse” because in practice, several different
  databases will be manipulated, and the evolution of their structure must be managed.




                                             DRAFT                                         page 1
Atlas Offline Software Application Metadata                                                         Requirements
2 Definition of terms. References                                                                Version/Issue: 1/1




             Its purpose is to
                   •    contain a logical description of the data produced in various processing steps which
                        may be necessary in the analysis of physics data. This data may be simulated data
                        (Monte Carlo data) or come from real detector output (raw data).
                   •    provide a set of user interfaces to access and to manage the metadata.



    1.1 Purpose of this document

             This document attempts to list a complete set of user requirements, in the sense that all user
             requirements should be included, even if some of them cannot yet be completely defined.
             User requirements should be testable: the list of user requirements can be considered as a list
             of tests which can be used to accept or reject a design and implementation of an application.

             The requirements list is used to construct the design of the application. No details of design or
             implementation are discussed in the present document. The requirements document is not
             based on, or related to, an existing system.



    1.2 Structure of the document

             Section 2 contains a definition of the terms used in the document, and a list of documents
             used in its preparation.

             Section 3 discusses the context, constraints and dependencies of the application.

             Section 4 lists the specific constraints, assumptions and dependencies, and Section 5 lists the
             use cases and requirements.

             TBD             Some items are followed by a paragraph in this format, which marks an item
                             still to be discussed.




         2 Definition of terms. References

             No Atlas-wide reference document currently exists which defines the terms to be used to
             describe the entities which the bookkeeping application must manage. This situation leads
             to ambiguity, as each application is obliged to define its own terms. Below is the glossary of
             terms used in the present document.








    2.1 Glossary

             event
                             The ensemble of data for a particular beam crossing, or a subset of this data.
                             Event data may be “real”, directly recorded from the detector for a particular set
                             of trigger conditions, or simulated, using Monte Carlo techniques.
             eventID
                             A tag, which could just be an integer, which defines the event, either within the
                             dataset, or uniquely within all Atlas.
             dataset
                             A collection of events.
             datasetNumber
                             A tag, which could just be an integer, which is assigned to a dataset. We use
                             this term to mean the part of the total identification of the data which is given
                             to the dataset at its creation. This means that, for real data, the datasetNumber is
                             the same as the run number assigned by the DAQ.
             partition
                             A file which contains a part of a dataset. Datasets have to be divided into
                             partitions because of file size limitations.
             partitionNumber
                             An integer, from 1 to N, where N is the number of partitions created for a given
                             dataset.
             project
                             A set of datasets which have been created for the same physics or computing
                             purpose. Each project has a project name, for example “dc0”, “dc1”.
             processingStep
                             A dataset, once created, may undergo a sequence of different processes. We refer
                             to each process in the sequence as a processingStep. Each processingStep has a
                             name, for example “simul”. Different projects may choose to define different sets
                             of processing steps. A processing step maps to a particular algorithm or sequence
                             of algorithms.
             passNumber
                             A dataset may undergo the same processingStep several times with different
                             parameters. The passNumber distinguishes these successive passes.
             datasetID
                             A datasetID (or dataset name) is a combination of other terms which is unique
                             within Atlas. For example datasetNumber.processingStep.passNumber.
             attribute
                             An attribute is a named property of a dataset or a partition. Each project and
                             processing step pair is associated with a set of attributes, and the set of relations
                             between these attributes.
             logical file name
                             A tag which completely identifies a partition. It must be unique within the Atlas
                             collaboration. It consists of, at least, the dataset name (datasetID) and the
                             partitionNumber.
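             The glossary gives datasetNumber.processingStep.passNumber as an example datasetID, and
             says an LFN contains at least the dataset name and the partitionNumber. The sketch below
             is a minimal illustration under those definitions only. The dot separator, the "_p" prefix
             and the four-digit padding are assumptions, not the Atlas LFN scheme (see CO03 and [2]).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetID:
    """Hypothetical composite dataset name: datasetNumber.processingStep.passNumber."""

    dataset_number: int
    processing_step: str
    pass_number: int

    def __str__(self) -> str:
        return f"{self.dataset_number}.{self.processing_step}.{self.pass_number}"


def logical_file_name(dataset: DatasetID, partition_number: int, n_partitions: int) -> str:
    """Build an illustrative LFN from the dataset name and the partitionNumber.

    The "_p" prefix and four-digit zero padding are assumptions made for this sketch.
    """
    # partitionNumber runs from 1 to N, N being the number of partitions of the dataset.
    if not 1 <= partition_number <= n_partitions:
        raise ValueError(f"partition {partition_number} outside 1..{n_partitions}")
    return f"{dataset}_p{partition_number:04d}"


if __name__ == "__main__":
    ds = DatasetID(2001, "simul", 1)
    print(logical_file_name(ds, 3, 10))  # 2001.simul.1_p0003
```

             Because DatasetID is frozen, it is hashable and could serve directly as a dictionary key in
             an in-memory catalogue.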







  2.1.1 Acronyms and abbreviations

             AM              Application Metadata
             AMB             Application Metadata Base
             MC              Metadata Catalogue (synonym for AMB)
             AMI             Application Metadata Interface
             LFN             Logical File Name



    2.2 References

                   1          Use cases for LAr Bookkeeping. 200-06-19.
                              http://a.home.cern.ch/a/albrand/www/bookkeeping/index.html
                   2          Logical File Names for DC0.
                              http://atlasinfo.cern.ch/Atlas/GROUPS/SOFTWARE/DC/doc/LogicalFileNamesforDC0.pdf
                   3          Application Metadata Base for DC0. 2001-11-13.
                              http://a.home.cern.ch/a/albrand/www/AMBforDC0.pdf
                   4          Hybrid Event Store. Ed. David Adams. 2002-02-28.
                              http://www.usatlas.bnl.gov/~dladams/hybrid/hybrid.pdf
                   5          Replica Selection in the Globus Data Grid. S. Vazhkudai et al. Proceedings of the
                              1st IEEE/ACM International Conference on Cluster Computing and the Grid.
                              IEEE Computer Society Press, May 2001.
                              http://www.globus.org/research/papers/repsel.pdf
                   6          Job Configuration, Data Production, Bookkeeping. LHCb data management
                              working group. 2001-11-20.
                              http://lhcb-comp.web.cern.ch/lhcb-comp/Frameworks/DataManagement/Documents/Use_Cases_and_Requirements.pdf
                   7          ATLAS TDAQ/DCS Online Software, Online Bookkeeper Requirements.
                              A. Amorim et al. 2002-02-21.
                   8          Virtual data catalogue. P. Nevski. Talk given at Atlas Software Week, 2002-05-30.
                              http://doc.cern.ch/archive/electronic/other/agenda/a02248/a02248s10t2/transparencies/atlasVDC.pdf
                   9          EUDG WP1: L&B Advanced Queries Extensions. Ales Krenek, Ludek Matyska,
                              Zdenek Salvet.
                              http://edmsoraweb.cern.ch:8001/cedar/doc.info?document_id=345842&version=1.7&p_tab=
                  10          The GriPhyN Virtual Data System: Technical Report GriPhyN-2002-02.
                              Jens-S. Vöckler, Mike Wilde, Ian Foster.
                              http://www.griphyn.org/documents/document_server/uploaded_documents/doc--151--VDS1.V8.020118.pdf








                  11          The Raw Data Flow in Atlas. Atlas EDM Group. 2002-06-01.
                              http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/architecture/EventDataModel/RawDataFlow.pdf
                  12          Athena framework and Grid Architecture. C.E. Tull. Talk given at Atlas Software
                              Week, 2002-05-30.
                              http://documents.cern.ch/cgi-bin/setlink?base=agenda&categ=a02248&id=a02248s16t3/transparencies




       3 General Description; context, constraints, assumptions
         and dependencies.

             In this section we give a general description of the application and of the external systems
             with which the application must interact.



    3.1 Context

             The system described in this document is to be applied to Atlas offline software. It must
             therefore be compliant with the general architecture of Atlas offline software where necessary.
             Compliance means conforming to the interfaces defined in the Athena framework, and to the
             conventions and definitions of terms adopted by the collaboration in the Event Data Model and
             in the database components (references [11], [12]).

             Since Atlas offline software itself aims to be “grid-capable”, Atlas Bookkeeping must also be
             grid-capable.

             The application described in this document applies particularly to Monte Carlo simulation,
             but it should be adaptable to real data. In particular, it should be able to communicate in the
             future with the online bookkeeping [7].



    3.2 Required capabilities of the system

             The bookkeeping application is a database of application metadata. This means that it
             provides a way of storing data about data. The real physics data consists of large binary files
             written on mass storage devices which are relatively slow to access. A cataloguing system
             should provide a rapid way of determining the physics content of a data file.

             The metadata base must be reliable, robust and secure.

             The bookkeeping application must provide mechanisms for input and output of application
             metadata, with interfaces adapted to each different group of users. Perhaps the most important
             of these interfaces is the one which allows users to query the catalogue using diverse search
             criteria.

             The potential users of the system are widely distributed geographically, so all functions
             must also be available in a distributed manner.

             It is not possible to know at the outset the exact set of attributes which will be needed to
             describe the physics data throughout the lifetime of the application. Therefore any
             implementation must be flexible and able to evolve gracefully.
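             One way to meet this requirement is to keep the attribute set of each (project,
             processingStep) pair, as defined in the glossary, as data rather than as fixed code, and to
             give every new attribute a default so that records written earlier remain readable. The
             sketch below is a minimal illustration of that idea. The class, the version counter and the
             attribute "generatorSeed" are all hypothetical.

```python
from __future__ import annotations

from typing import Any


class AttributeSchema:
    """Hypothetical registry of the attributes catalogued for each (project, processingStep) pair."""

    def __init__(self) -> None:
        # (project, processing_step) -> {attribute name: default value}
        self._attributes: dict[tuple[str, str], dict[str, Any]] = {}
        # (project, processing_step) -> schema version, incremented on every change
        self._versions: dict[tuple[str, str], int] = {}

    def add_attribute(self, project: str, step: str, name: str, default: Any = None) -> int:
        """Declare a new attribute and return the new schema version for that pair."""
        key = (project, step)
        attributes = self._attributes.setdefault(key, {})
        if name in attributes:
            raise ValueError(f"attribute {name!r} already defined for {key}")
        attributes[name] = default
        self._versions[key] = self._versions.get(key, 0) + 1
        return self._versions[key]

    def read(self, project: str, step: str, record: dict[str, Any]) -> dict[str, Any]:
        """Complete a stored record with defaults for attributes added after it was written."""
        defaults = self._attributes.get((project, step), {})
        return {**defaults, **record}


if __name__ == "__main__":
    schema = AttributeSchema()
    schema.add_attribute("dc1", "simul", "eventType")
    old_record = {"eventType": "B->J/Psi"}  # written before the next attribute existed
    schema.add_attribute("dc1", "simul", "generatorSeed", default=0)
    print(schema.read("dc1", "simul", old_record))
    # {'eventType': 'B->J/Psi', 'generatorSeed': 0}
```

             Filling gaps at read time avoids rewriting every stored record when the schema grows. A
             database implementation might instead add a column with a default value.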

             The desired functionality can be divided into four groups:
                  1.   Database management.
                  2.   Structure: obtaining information about the state of the catalogue.
                  3.   Input: inserting and updating information in the catalogue database.
                  4.   Output: querying the catalogue.



    3.3 General constraints

             A constraint is something that affects the way in which requirements are met. It imposes
             restrictions on the design of the system that do not affect the external behaviour of the system,
             but must be fulfilled to meet technical or project obligations. In this section we must consider
             such factors as time, money, technology, and interaction with already existing systems.



  3.3.1 Time

             The development of the application should keep pace with the general development of Atlas
             offline software. In particular, the bookkeeping application should always be able to meet the
             needs of the Atlas data challenges.



  3.3.2 Money and manpower

             Since software projects in particle physics are generally not richly endowed with either
             money or manpower, the application should make use of low-cost software components
             wherever feasible.

             The design should take into account the limitations of manpower available for the project.
             This implies that existing tools should be reused where possible. It also implies that all stages
             of the project shall be well documented so that maximum use can be made of collaborators
             with only a limited time of participation.








  3.3.3 Technology

             The choice of technology is determined by five factors.
                   •   The technology must be adapted to the requirements.
                   •   As in all software projects, it is dangerous to become too dependent on a particular
                       technology, as available technologies are in rapid evolution.
                   •   Since manpower is limited, it is useful to choose technology according to the
                       competence and experience of those who work on the project.
                   •   The large number of potential users, and their geographic distribution implies that
                       any technology chosen must be fairly ubiquitous.
                   •   Certain sites may impose a particular technology, or configuration.



  3.3.4 Interaction with existing systems

             Interaction with the Atlas Athena framework implies that the bookkeeping shall provide a
             C++ interface which complies with the Athena “Service” architecture.

             Interaction with the datagrid implies that the application must be aware of the datagrid
             architecture. The application metadata catalogue is not a part of the grid; it contains
             information which has no relevance to the grid mechanisms. However some parts of the grid
             may need to query the catalogue. These are the replica selection service (reference [5]), and the
             virtual data system (references [8] and [10]). The job submission components of the grid may
             also be involved in the input of data to the application metadata catalogue (references [9] and [12]).

             Interaction with datagrid tools will imply that certain specific grid compliant interfaces must
             be provided.



    3.4 General Assumptions and Dependencies

             In establishing a list of requirements we may be obliged to make some assumptions about the
             external systems with which the bookkeeping application shall interact. These assumptions
             may become constraints on the external system.

             The application may be dependent on external systems, for example to supply input to the
             application in a specific way.

             The biggest assumptions concern the Atlas offline collaboration itself. The bookkeeping
             application is dependent on clear definitions of the entities which it must manage. It is
             unthinkable for the bookkeeping application to define the way in which an event is
             identified, or the algorithms which can be used in a particular processing step. On the other
             hand, the bookkeeping cannot efficiently manage metadata unless these clear definitions exist.

             Efficient searching mechanisms rely on an organization of the data to be searched. It will
             therefore be necessary to establish some constraints for users. For example, after detailed
             requirements gathering, the bookkeeping may establish valid sets of values for a dataset attribute.
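             A valid set of values for an attribute amounts to a controlled vocabulary checked at input
             time. The sketch below is a minimal illustration. The vocabularies are hypothetical: only
             "simul" and the two event types appear in this document, and the real sets would come out
             of the requirements gathering mentioned above.

```python
from __future__ import annotations

# Hypothetical controlled vocabularies; real attribute names and value sets are still to be defined.
ALLOWED_VALUES: dict[str, frozenset[str]] = {
    "eventType": frozenset({"B->J/Psi", "B->pi pi"}),
    "processingStep": frozenset({"simul", "reco"}),
}


def validate(attributes: dict[str, str]) -> list[str]:
    """Return one error message per attribute value outside its allowed set (empty list = valid)."""
    errors = []
    for name, value in attributes.items():
        allowed = ALLOWED_VALUES.get(name)
        # Attributes without a controlled vocabulary are accepted as free text.
        if allowed is not None and value not in allowed:
            errors.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    return errors


if __name__ == "__main__":
    print(validate({"eventType": "B->J/Psi", "comment": "test run"}))  # []
    print(validate({"processingStep": "simulation"}))
```

             Rejecting unknown values at input keeps later searches reliable: a query on
             eventType = "B->J/Psi" cannot silently miss datasets that were entered under a variant
             spelling.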







             For the offline bookkeeping application we need not make any assumptions about interactions
             with hardware components; for example, we have no need to consider messages from DAQ
             crates. We assume that interfaces with other software components can always be defined
             using standard, agreed formats such as CSV or XML. In the specific case of datagrid replica
             managers, the interface will be based on the logical file name, which is by definition unique.
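             As an illustration of a CSV exchange keyed on the logical file name, the sketch below uses
             Python's standard csv module to write catalogue rows with the LFN as the leading column, and
             to read them back while rejecting duplicate LFNs. The column name "lfn", the attribute names
             and the LFN format are hypothetical. An XML format would serve equally well.

```python
from __future__ import annotations

import csv
import io


def export_csv(rows: list[dict[str, str]], attribute_names: list[str]) -> str:
    """Serialise catalogue rows to CSV, with the LFN as the leading key column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["lfn", *attribute_names], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing attributes are written as empty fields
    return buffer.getvalue()


def import_csv(text: str) -> dict[str, dict[str, str]]:
    """Parse CSV produced by export_csv into {lfn: {attribute: value}}, rejecting duplicate LFNs."""
    result: dict[str, dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        lfn = row.pop("lfn")
        if lfn in result:
            raise ValueError(f"duplicate LFN {lfn!r}")
        result[lfn] = row
    return result


if __name__ == "__main__":
    rows = [{"lfn": "2001.simul.1_p0001", "eventType": "B->J/Psi", "nEvents": "500"}]
    text = export_csv(rows, ["eventType", "nEvents"])
    print(text, end="")
    # lfn,eventType,nEvents
    # 2001.simul.1_p0001,B->J/Psi,500
    print(import_csv(text))
```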



    3.5 Users


  3.5.1 Database Administrators

             A group of 2 or 3 people who have complete access to all tables.



  3.5.2 Project Managers

             Project managers work with database administrators to ensure that the correct schema is
             available for their project. This involves the definition of processing steps to be used by the
             project, and establishing the set of attributes which must be catalogued for each step. Project
             managers may also pre-populate the catalogue, as a way of informing site production
             physicists of the work assigned to their group.



  3.5.3 Site Production Managers

             Each site should have at least one person who has the power to edit and delete information in
             a subset of the bookkeeping catalogue. The site production manager is responsible for
             ensuring that the correct metadata from his site is uploaded to the AMB.



  3.5.4 Physicists

             Physicists can query the database using any of the interfaces provided. They may have write
             access on a subset of the bookkeeping catalogue.



  3.5.5 Framework and Grid Components

             These are processes which may query the databases. We expect that framework components
             will use the C++ interface, whereas Grid Components may require some special interface
             development. Some Grid components may provide information to be input to the AMB.
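             The user groups above imply a role-based access policy. The sketch below is one possible
             reading of it, under stated assumptions. The role names and action set are hypothetical.
             The "subset of the catalogue" on which site production managers and physicists may write
             is modelled here as the records of their own site. Granting every physicist write access is
             a simplification, since the text says only that they "may" have it.

```python
from enum import Enum, auto


class Action(Enum):
    QUERY = auto()
    INSERT = auto()
    UPDATE = auto()
    DELETE = auto()
    DEFINE_SCHEMA = auto()


# Hypothetical mapping of the user groups of section 3.5 to the actions they may perform.
ROLE_ACTIONS = {
    "database_administrator": set(Action),
    "project_manager": {Action.QUERY, Action.INSERT, Action.DEFINE_SCHEMA},
    "site_production_manager": {Action.QUERY, Action.INSERT, Action.UPDATE, Action.DELETE},
    "physicist": {Action.QUERY, Action.INSERT, Action.UPDATE},
    "grid_component": {Action.QUERY, Action.INSERT},
}

# Roles whose write access is limited to a subset of the catalogue, modelled here as one site.
SITE_SCOPED_ROLES = {"site_production_manager", "physicist"}


def is_allowed(role: str, action: Action, user_site: str, record_site: str) -> bool:
    """Return True if a user with `role` at `user_site` may perform `action` on a record of `record_site`."""
    if action not in ROLE_ACTIONS.get(role, set()):
        return False
    if action is Action.QUERY or role not in SITE_SCOPED_ROLES:
        return True
    # Writes by site-scoped roles are only allowed on records belonging to their own site.
    return user_site == record_site


if __name__ == "__main__":
    print(is_allowed("site_production_manager", Action.DELETE, "lyon", "cern"))  # False
```

             Keeping the policy in a table rather than in scattered checks makes it easy for the database
             administrators to review and change when the real groups and permissions are agreed.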








       4 Specific Constraints, Assumptions and Dependencies

            In this section we give numbered lists of specific items. Each item should be unambiguous.



    4.1 Constraints

            These are interactions with already existing systems. They impose restrictions which the AM
            implementation is not free to alter.



  4.1.1 Dataset Identification

CO01        The AMB will conform to the dataset identification scheme decided by the Atlas Collaboration.



  4.1.2 Event Numbering Scheme

CO02        The AMB will conform to the event numbering scheme decided by the Atlas Collaboration.



  4.1.3 Logical File Names

CO03        The AMB will conform to the Logical File Name scheme decided by the Atlas Collaboration.



  4.1.4 Job submission

CO04        The AMB Interface must provide a mechanism for both job submission scripts and GUI
            programs to input and output application metadata.

            Job submission may entail using the AMI to query the AMB to obtain suitable input to a new
            job. It is also this stage which will permit the definition of a new dataset.

            Job submission may use a “classic” batch script, a special “Grid-aware” job submission
            script such as the EUDG WP1 Job Description Language, or a GUI program which is not yet
            defined. This means that the AMI must be ready to support several I/O formats.








CO05        The AMI must support the interfaces required by grid job submission tools which wish to input
            information to the catalogue.



  4.1.5 Grid resource brokers, replica services and catalogues.

CO06        The AMI must provide a mechanism for communicating a Logical File Name, or a list of
            Logical File Names, resulting from a query based on the attributes of a dataset.
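            The query that CO06 describes maps dataset attributes to the LFNs of the matching
            datasets' partitions. The sketch below is a minimal in-memory illustration, assuming simple
            equality matching. The record layout, the attribute names and the LFN format are
            hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry: a dataset, its attributes and the LFNs of its partitions."""

    dataset_id: str
    attributes: dict[str, Any] = field(default_factory=dict)
    lfns: list[str] = field(default_factory=list)


def lfns_matching(catalogue: list[DatasetRecord], criteria: dict[str, Any]) -> list[str]:
    """Return the LFNs of every partition of the datasets whose attributes equal all criteria."""
    result: list[str] = []
    for record in catalogue:
        if all(record.attributes.get(name) == value for name, value in criteria.items()):
            result.extend(record.lfns)
    return result


if __name__ == "__main__":
    catalogue = [
        DatasetRecord("2001.simul.1", {"eventType": "B->J/Psi"}, ["2001.simul.1_p0001", "2001.simul.1_p0002"]),
        DatasetRecord("2002.simul.1", {"eventType": "B->pi pi"}, ["2002.simul.1_p0001"]),
    ]
    print(lfns_matching(catalogue, {"eventType": "B->J/Psi"}))
    # ['2001.simul.1_p0001', '2001.simul.1_p0002']
```

            Only equality matching is shown. Range queries on numeric attributes, such as generator
            parameters, would need richer criteria. The returned list is what a replica service would
            then resolve to physical copies.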



  4.1.6 Grid security.

CO07        The AMI must conform to the security requirements of the grid architecture.




    4.2 Assumptions

AS01        The dataset number will be allocated by a separate mechanism, and known to the user before
            any information is added to the AM.

            We anticipate that physicists responsible for the Monte Carlo data generation will obtain a
            dataset number, or a set of dataset numbers from a specific server, or from a person
            designated by ATLAS who will distribute dataset numbers.

            In the case of real data, the dataset numbers correspond to the run numbers allocated by the
            DAQ.

            The dataset number will stay with the dataset through every processing step it undergoes.
            It could be used to tag the events which belong to the dataset.

            TBD            Will users want to merge events from different datasets into a new dataset? If so
                           will a new number be given to this new dataset? How? Perhaps datasets formed
                           in this way will be collections of event tags, i.e. references to events, and not
                           events themselves. See section 10 of reference [4].


AS02        The datasetID is unique for all Atlas production.

            The datasetID consists of several parts, one of which is the datasetNumber. The datasetID
            will be unique within Atlas. It could be used as part of the logical file name.

            We assume that the events within a dataset are numbered consecutively, and that the eventID is
            unique only within the dataset.

            An alternative would be that every event in Atlas has a unique eventID, for example a time
            stamp.

            TBD            Can we assume that the first event generated in a dataset will always be
                           numbered “1”?








AS03        An event is uniquely identified by two tags: the datasetID (dataset name) of the event collection
            to which it belongs, and the eventID.


AS04        Logical File Names will be unique by construction.

            Nevertheless, since the AMB is in a position to check the uniqueness of LFNs, it should
            provide a mechanism for doing so.
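            A minimal sketch of such a uniqueness check is shown below. It is an in-memory registry
            that records which dataset registered each LFN and refuses a second registration. The class
            and method names and the LFN format are hypothetical.

```python
from __future__ import annotations


class LFNRegistry:
    """Hypothetical AMB-side check that no Logical File Name is registered twice."""

    def __init__(self) -> None:
        self._owner: dict[str, str] = {}  # LFN -> datasetID that registered it

    def register(self, lfn: str, dataset_id: str) -> None:
        owner = self._owner.get(lfn)
        if owner is not None:
            raise ValueError(f"LFN {lfn!r} already registered by dataset {owner}")
        self._owner[lfn] = dataset_id

    def __contains__(self, lfn: str) -> bool:
        return lfn in self._owner


if __name__ == "__main__":
    registry = LFNRegistry()
    registry.register("2001.simul.1_p0001", "2001.simul.1")
    try:
        registry.register("2001.simul.1_p0001", "2002.simul.1")
    except ValueError as err:
        print(err)  # LFN '2001.simul.1_p0001' already registered by dataset 2001.simul.1
```

            In a relational implementation, the same guarantee would normally come from a UNIQUE
            constraint or primary key on the LFN column. The database then enforces it even when several
            sites insert concurrently.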


AS05        There will be a published schema for Atlas LFNs.

            The schema will evolve as we progress through the different Atlas data challenges. A schema
            has been published for DC0 [2].


AS06        In the absence of a clear understanding of the details of interactions with Grid tools, we
            assume that LFNs are constructed by the job submission mechanism, which will inform both
            the AMB and the Grid Replica Catalogues.

            An alternative would be that the LFN is attributed by the Replica Catalogue itself, in which
            case it would follow global Grid rules, and not be a specifically Atlas-defined name.


AS07        We assume that the Grid Virtual Data Service will not interact directly with the metadata catalog.
            Communication will be through the replica service, and will use the Logical File Name.




    4.3 Dependencies

            This is the list of components whose behaviour may be affected by interaction with the AMB.



  4.3.1 Framework Persistency Service

DE01        When a new file of events is written, the Framework persistency service will be required to
            inform the AMB.

            Even if the job submission mechanism declares that a new partition will be written by a job, it
            is only when the job terminates successfully that the file can really be considered to exist.
            See reference [12] for two scenarios.
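            The distinction in DE01 between a partition that has been declared and one that really
            exists can be captured as a small state machine. The sketch below is one possible model:
            partitions are declared at submission and confirmed only when the job ends successfully.
            The state names and methods are hypothetical and are not taken from reference [12].

```python
from __future__ import annotations

from enum import Enum


class PartitionState(Enum):
    DECLARED = "declared"    # announced at job submission, file not yet trusted
    CONFIRMED = "confirmed"  # persistency service reported a successful write
    FAILED = "failed"        # job ended without producing the file


class PartitionTracker:
    def __init__(self) -> None:
        self._state: dict[str, PartitionState] = {}

    def declare(self, lfn: str) -> None:
        """Called when the (hypothetical) job submission step announces a partition."""
        if lfn in self._state:
            raise ValueError(f"{lfn!r} already declared")
        self._state[lfn] = PartitionState.DECLARED

    def job_finished(self, lfn: str, success: bool) -> None:
        """Called when the job ends; only a successful job makes the partition exist."""
        if self._state.get(lfn) is not PartitionState.DECLARED:
            raise ValueError(f"{lfn!r} is not awaiting completion")
        self._state[lfn] = PartitionState.CONFIRMED if success else PartitionState.FAILED

    def existing(self) -> list[str]:
        """Partitions that queries should report as really existing."""
        return [lfn for lfn, s in self._state.items() if s is PartitionState.CONFIRMED]


if __name__ == "__main__":
    tracker = PartitionTracker()
    tracker.declare("2001.simul.1_p0001")
    tracker.declare("2001.simul.1_p0002")
    tracker.job_finished("2001.simul.1_p0001", success=True)
    tracker.job_finished("2001.simul.1_p0002", success=False)
    print(tracker.existing())  # ['2001.simul.1_p0001']
```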



  4.3.2 Online bookkeeping

DE02        There should be a mechanism for exchanging information between the two bookkeeping
            catalogues.

            The requirements of the online bookkeeping are given in reference [7].








       5 Use Cases and Requirements.


    5.1 Sources of Use Cases

             Use cases come from references [1], [4] and [6], and also from private communications to the
             authors.



    5.2 List of Use Cases

UC01         Retrieval of datasets for physics analysis.

             The physicist wants to access the datasets which contain the information he is looking for.

             He wants to select datasets according to several criteria:
                   •    the type of event, selected from a list of known event types such as “B-> J/Psi”;
                   •    a set of generator parameters.

             He may also want to restrict his selection to datasets where certain job configuration
             parameters have a given value.

             Using a Web display program, he is able to make selections on different channels and to get a
             list of the data found.

             At the same time, he also wants to retrieve basic information about the resulting datasets,
             such as the total number of events.


UC02         Retrieve additional information about a dataset.

             A data analysis job has produced results which cannot be explained. To understand the
             discrepancy with expectations, the physicist suspects that, for example, the Monte Carlo event
             generation was performed with incorrect parameters. He wants to inspect the parameter set
             which was used to produce the dataset in question. Using a display program (e.g. a WWW
             browser), he is able to retrieve all relevant information from the individual processing steps
             which were used to produce this dataset:

                   •    Code configuration.
                   •    Parameter configuration.
                   •    Input datasets.
                   •    Log files.


UC03         Access to event data for application tests.

             Program developers have needs very similar to those of physicists performing data analysis.
             Their selection is often smaller than for data analysis, and they sometimes want to look at
             some details (such as log files). They would like to be able to make their selections on the
             bookkeeping database directly from Gaudi/Athena. This feature is typically used just to
             check that programs basically work. For this purpose they would like to access event data by
             specifying a statement like “10 events of type B->pi pi”.
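             A request such as “10 events of type B->pi pi” can be resolved into event identifiers using
             AS03, under which an event is identified by its datasetID and eventID. The sketch below
             relies on the assumption in section 4.2 that events within a dataset are numbered
             consecutively. It also assumes, pending the open TBD there, that numbering starts at 1. The
             dictionary keys are hypothetical.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterator


def first_n_events(datasets: list[dict], event_type: str, n: int) -> list[tuple[str, int]]:
    """Return up to n (datasetID, eventID) pairs taken from datasets of the requested event type.

    Assumes events are numbered consecutively within a dataset, starting at 1.
    """

    def candidates() -> Iterator[tuple[str, int]]:
        for dataset in datasets:
            if dataset["eventType"] == event_type:
                for event_id in range(1, dataset["nEvents"] + 1):
                    yield dataset["datasetID"], event_id

    return list(islice(candidates(), n))


if __name__ == "__main__":
    catalogue = [
        {"datasetID": "2001.simul.1", "eventType": "B->pi pi", "nEvents": 4},
        {"datasetID": "2002.simul.1", "eventType": "B->J/Psi", "nEvents": 1000},
        {"datasetID": "2003.simul.1", "eventType": "B->pi pi", "nEvents": 500},
    ]
    events = first_n_events(catalogue, "B->pi pi", 10)
    print(events[:2], len(events))  # [('2001.simul.1', 1), ('2001.simul.1', 2)] 10
```

             The generator is consumed lazily, so only the requested number of identifiers is produced
             even when the matching datasets are large.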


UC04         Updating the Bookkeeping database

             A subsystem application (production, analysis, ...) has an output (data collection) to write to
             the persistent event store. The Bookkeeping must be informed. The producer of the output
             can supplement the information written by the Bookkeeping by adding a text comment.
             Precondition
                             The application has permission to write in the persistent data store. A message
                             service exists between the bookkeeping and the event store.
             Flow of Events
                             An application has produced a data collection which it wishes to put in the
                             persistent store. A request is sent to the Event Persistency Service.
                             The request is examined. The Event Persistency Service requires information on
                             the origin of the new data collection in order to allocate a name. (or maybe just its
                             pre- assigned logical name)
                             If the Event Persistency service is successful in storing the data collection a new
                             data collection name is assigned which will give access to the newly stored data
                             collection.
                             The Event Persistency service informs the Bookkeeping of the existence of the
                             new object and sends all the information about it.
                             The new name is returned to the application
                             The application may now use this new name write additional information on the
                             new dataset, including a text comment, to the Bookkeeping.
                             Later, another application, or a physicist using a direct interface to the
                             bookkeeping may update the information and add other text comments about the
                             data collection, accessing it by its name.
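             The flow of events above can be sketched as follows. This is a minimal illustration,
             not a design: the classes Bookkeeping and EventPersistencyService, the naming scheme
             "<origin>.<serial>" and the recorded fields are all hypothetical, and the actual write
             to the event store is left out.

```python
import itertools


class Bookkeeping:
    """Hypothetical in-memory stand-in for the bookkeeping database."""

    def __init__(self):
        self.records = {}

    def register(self, name, info):
        # Called by the persistency service once the collection is stored.
        self.records[name] = {"info": dict(info), "comments": []}

    def add_comment(self, name, text):
        # Any application or physicist may annotate a collection by name.
        self.records[name]["comments"].append(text)


class EventPersistencyService:
    """Hypothetical service that names collections and informs the bookkeeping."""

    def __init__(self, bookkeeping):
        self.bookkeeping = bookkeeping
        self._serial = itertools.count(1)

    def store(self, origin, events):
        # The origin of the collection is needed to allocate a name.
        name = f"{origin}.{next(self._serial):05d}"
        # Writing `events` to the persistent event store is omitted here.
        self.bookkeeping.register(name, {"origin": origin, "n_events": len(events)})
        return name  # the new name is returned to the application


bk = Bookkeeping()
eps = EventPersistencyService(bk)
name = eps.store("mc.B_pipi", [101, 102, 103])
bk.add_comment(name, "test production, generator parameters unchanged")
print(name)  # mc.B_pipi.00001
```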


UC05         Monitoring of production by the production manager

             A production manager has put in place a chain of processing steps to be performed on several
             datasets. Each dataset consists of several thousand events. The work has been distributed
             over a large number of sites and physicists.

             Each site manager completes the assigned work and updates the bookkeeping database.

             The production manager can query the bookkeeping to determine how many sites have
             completed their work, or how many events have passed each processing step of the chain.
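             The production manager's query can be sketched as below. The sketch assumes
             (hypothetically) that each site writes one record per processing step, giving the number
             of events produced and a completion flag. A site counts as finished once it has reported
             and has no step still pending. StepReport and summarise are illustrative names only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StepReport:
    """Hypothetical bookkeeping record: one site's result for one processing step."""
    site: str
    step: str
    events_out: int
    completed: bool


def summarise(reports, all_sites):
    """Return (number of finished sites, events that passed each step)."""
    events_per_step = defaultdict(int)
    pending = set()
    for r in reports:
        events_per_step[r.step] += r.events_out
        if not r.completed:
            pending.add(r.site)
    reported = {r.site for r in reports}
    # A site is finished if it has reported and none of its steps is pending.
    finished = [s for s in all_sites if s in reported and s not in pending]
    return len(finished), dict(events_per_step)


reports = [
    StepReport("CERN", "generation", 5000, True),
    StepReport("CERN", "simulation", 4800, True),
    StepReport("Lyon", "generation", 5000, True),
    StepReport("Lyon", "simulation", 0, False),
]
print(summarise(reports, ["CERN", "Lyon", "BNL"]))  # (1, {'generation': 10000, 'simulation': 4800})
```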



    5.3 List of Functional Requirements.

             Functional requirements describe the behaviour which the system should have.



  5.3.1 Main Functions Required

UR01         The metadata catalogue will provide information about real or simulated physics data given
             a logical file name, or given a set of attributes which define a logical file.


UR02         The metadata catalogue will provide information about real or simulated physics data given
             a dataset name, or given a set of attributes which define a dataset.


UR03         The metadata catalogue will permit retrieval of a set of logical file names based on other
             information provided by the user.


UR04         The information with which the metadata catalogue is concerned is the physics metadata. This
             information should permit physicists to completely determine the contents of a data file without
             having to actually read the file.

             TBD             How are we going to test this?


UR05         In addition to the physics metadata described in UR04, the metadata catalogue should contain
             any information which the users consider necessary for retrieval purposes.

             An example is what can be considered “sociological” information, such as the name of the
             physicist who ran the process, or the production site. There seems to be no reason why the
             AMB should not have a certain amount of overlap with other relevant database applications,
             such as the replica catalogue or the virtual data catalogue.



  5.3.2 Organization of Metadata

UR06         The user should not be required to know the schema of the databases in order to use the set of
             interfaces to it.

             Interfaces should be able to hide the implementation details of databases.


UR07         The organization of metadata (schema) is expected to evolve during the lifetime of the project.
             Reorganization should be transparent for the user.

             This means that the interfaces to the database must be generic.




  5.3.3 Metadata Set of Attributes

UR08         The set of attributes which describe the contents of data files produced by particular processing
             steps is likely to evolve over the lifetime of the project. Therefore the system must manage
             evolution of sets of attributes.


UR09         Since it is not possible to foresee all the attributes of a dataset, the bookkeeping must
             allow users to add extra attributes to a particular dataset.


UR010        Different datasets may have different sets of attributes.


UR011        Since a particular logical file may be processed (used as input) several times, it should be
             possible for any physicist to attach a text comment to a logical file.


UR012        It should be possible to associate each attribute with a comment which explains its meaning.
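             One way to satisfy UR09, UR010 and UR012 together is to store the attributes of each
             dataset as an open-ended mapping rather than as fixed columns. The sketch below assumes
             that model. Dataset, Attribute and add_attribute are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    value: object
    comment: str = ""  # UR012: what the attribute means


@dataclass
class Dataset:
    name: str
    # UR09/UR010: an open mapping, so each dataset can carry its own attribute set.
    attributes: dict = field(default_factory=dict)

    def add_attribute(self, key, value, comment=""):
        """Attach an extra attribute; refuse to silently overwrite an existing one."""
        if key in self.attributes:
            raise KeyError(f"attribute {key!r} already defined for {self.name}")
        self.attributes[key] = Attribute(value, comment)


mc = Dataset("mc.B_pipi")
mc.add_attribute("generator", "Pythia", "event generator used")
raw = Dataset("raw.run1234")
raw.add_attribute("trigger_menu", "test_menu")
```

             The two datasets end up with different attribute sets, as UR010 allows.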



  5.3.4 Metadata Integrity

UR013        The metadata shall be subject to a certain number of “business” rules which ensure that it is
             coherent.

             numeric data    Units should always be specified. An application needs to make sure that
                             numeric data is entered in the correct units.
             relational data Min <= Max.
             text            Fields which are not free comments should be regarded as case sensitive.
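             The rules above can be checked mechanically before data is inserted. The following
             sketch assumes a hypothetical record layout in which each attribute carries a "type" field
             ("numeric", "range" or "text"). check_business_rules is an illustrative name. The text
             rule needs no check here because it only forbids case-folding.

```python
def check_business_rules(record):
    """Return a list of rule violations for one metadata record.

    Hypothetical layout: each attribute is a dict with a "type" key
    ("numeric", "range" or "text") plus "value", "unit", "min" or "max".
    """
    errors = []
    for name, attr in record.items():
        kind = attr["type"]
        if kind == "numeric" and not attr.get("unit"):
            errors.append(f"{name}: numeric value has no unit")
        elif kind == "range" and attr["min"] > attr["max"]:
            errors.append(f"{name}: Min > Max")
        # "text" fields are compared case-sensitively elsewhere, so nothing is normalised here.
    return errors


record = {
    "beam_energy": {"type": "numeric", "value": 7.0, "unit": "TeV"},
    "eta_cut": {"type": "range", "min": 2.5, "max": -2.5},
    "integrated_luminosity": {"type": "numeric", "value": 10.0},
}
print(check_business_rules(record))
# ['eta_cut: Min > Max', 'integrated_luminosity: numeric value has no unit']
```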



  5.3.5 Metadata Acquisition

UR014        Metadata acquisition must be possible in several formats.

             Some physicists like command line interfaces, and some like GUIs. Some like plain text and
             some like XML. Some know how to use spreadsheets, others detest them.


UR015        Metadata acquisition must be available in a distributed way.

             This means that it should be possible to input to the application metadata catalogue from
             many different sites.


UR016        One of the formats of data input to the AMB should be close to that proposed by EU Grid WP1.

             This will probably be XML.
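             As an illustration of XML acquisition, the sketch below parses one dataset description
             with Python's standard xml.etree.ElementTree module. The element and attribute names
             (dataset, attribute, name, unit) are assumptions made for this example. This is not the
             WP1 format, which is not specified here.

```python
import xml.etree.ElementTree as ET

# Hypothetical input format, for illustration only.
SAMPLE = """<dataset name="mc.B_pipi.0001">
  <attribute name="generator">Pythia</attribute>
  <attribute name="beam_energy" unit="TeV">7.0</attribute>
</dataset>"""


def parse_dataset(xml_text):
    """Return (dataset name, {attribute name: {"value": ..., "unit": ...}})."""
    root = ET.fromstring(xml_text)
    attrs = {
        a.get("name"): {"value": (a.text or "").strip(), "unit": a.get("unit")}
        for a in root.findall("attribute")
    }
    return root.get("name"), attrs


name, attrs = parse_dataset(SAMPLE)
print(name, attrs["beam_energy"])  # mc.B_pipi.0001 {'value': '7.0', 'unit': 'TeV'}
```

             Values are returned as strings. Converting them, and checking them against the UR013
             rules, would be a separate step.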




UR017        Insertion and update of data into the AMB must be possible from the Athena Framework.

             An Athena Bookkeeping service is required. It remains to be seen whether the user will call it
             directly, or whether it will be called by a persistency service. For the present purpose, the
             answer does not matter.


UR018        Insertion and update of data into the AMB must be possible from a command line interface.

             This of course facilitates communication with other applications, such as the replica
             catalogue.


UR019        Physicists should be able to keep local copies of the data that they have sent to the AMB.

             This is evident when using a command line interface. The requirement implies that any GUI
             developed should produce log files.


UR020        Messages should be sent to the users confirming the database update.


UR021        If an error condition prevents the database update the user must be informed.



  5.3.6 Accessing Metadata

UR022        The AMB will be available to all users for read access.


UR023        Physicists will have write access to a subset of the AMB.


UR024        Site managers will have delete access to a subset of the AMB.


UR025        To facilitate interactive access to the data, a web interface must be provided.


UR026        Search masks must be provided for the most common searches.

             These should be available both from the web and from a command line.


UR027        It should be possible to refine the result of a query.


UR028        One of the standard queries should allow the selection of a list of logical file names defined
             by attribute values.

             Even if the query was to find a particular dataset, the result always maps to a list of LFNs.
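             UR027 and UR028 can be sketched together. The sketch assumes a hypothetical in-memory
             catalogue that maps each LFN to its attribute values. A query returns LFNs even when the
             criterion is a dataset name. A refinement filters an earlier result instead of
             re-running the query.

```python
def _matches(attrs, criteria):
    return all(attrs.get(key) == value for key, value in criteria.items())


def select_lfns(catalogue, **criteria):
    """UR028: sorted LFNs whose attributes match every criterion."""
    return sorted(lfn for lfn, attrs in catalogue.items() if _matches(attrs, criteria))


def refine(catalogue, lfns, **criteria):
    """UR027: narrow an earlier result instead of re-running the full query."""
    return [lfn for lfn in lfns if _matches(catalogue[lfn], criteria)]


catalogue = {
    "mc.B_pipi._0001.root": {"dataset": "mc.B_pipi", "step": "simulation"},
    "mc.B_pipi._0002.root": {"dataset": "mc.B_pipi", "step": "reconstruction"},
    "mc.Zee._0001.root": {"dataset": "mc.Zee", "step": "reconstruction"},
}
first = select_lfns(catalogue, dataset="mc.B_pipi")
print(refine(catalogue, first, step="reconstruction"))  # ['mc.B_pipi._0002.root']
```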




UR029        One of the standard queries should return the LFN which contains a particular event.
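             Suppose each logical file holds a contiguous, non-overlapping range of event numbers.
             This is an assumption; real files may be organised differently, for example per run.
             Under that assumption, the UR029 lookup reduces to a binary search over the start points
             of the ranges. lfn_for_event is a hypothetical name.

```python
import bisect


def lfn_for_event(index, event):
    """Return the LFN holding `event`, or None.

    `index` is a list of (first_event, last_event, lfn) tuples, sorted by
    first_event, with non-overlapping ranges (hypothetical layout).
    """
    starts = [first for first, _, _ in index]  # rebuilt per call; cache it in practice
    i = bisect.bisect_right(starts, event) - 1
    if i >= 0 and event <= index[i][1]:
        return index[i][2]
    return None


index = [(1, 1000, "file_A"), (1001, 2000, "file_B"), (3001, 4000, "file_C")]
print(lfn_for_event(index, 1500))  # file_B
print(lfn_for_event(index, 2500))  # None
```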


UR030        One of the standard queries should allow the display of attribute values for a given logical
             file name.


UR031        One of the standard queries should allow the display of the history of a logical file.

             This means that it must be possible to see which LFNs are ancestors of a particular LFN.
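             Suppose the AMB records, for each LFN, the list of LFNs that were used as its input.
             The history in UR031 is then a breadth-first traversal of that parent relation. The sketch
             below lists ancestors nearest first and reports each one only once, even when lineages
             merge. The parents mapping is hypothetical.

```python
from collections import deque


def ancestors(parents, lfn):
    """Return all LFNs from which `lfn` was derived, nearest generation first.

    `parents` maps an LFN to the list of LFNs that were used as its input.
    """
    seen, order = set(), []
    queue = deque(parents.get(lfn, []))
    while queue:
        current = queue.popleft()
        if current in seen:  # lineages can merge; report each ancestor once
            continue
        seen.add(current)
        order.append(current)
        queue.extend(parents.get(current, []))
    return order


parents = {
    "reco.0001": ["simul.0001"],
    "simul.0001": ["gen.0001", "geometry.0003"],
    "gen.0001": [],
}
print(ancestors(parents, "reco.0001"))  # ['simul.0001', 'gen.0001', 'geometry.0003']
```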


UR032        The result of a query which has selected a set of logical file names should display the total
             number of events selected.


UR033        It must be possible to convert the result of a query which has selected a set of logical file
             names into input which can be understood by an analysis program.
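             The following sketch illustrates UR032 and UR033, assuming the AMB knows the number of
             events in each LFN. The output of to_input_list (one LFN per line) is a hypothetical
             format chosen only for illustration. The real input format depends on the analysis
             program.

```python
def total_events(events_per_lfn, lfns):
    """UR032: total number of events in a selected set of LFNs."""
    return sum(events_per_lfn[lfn] for lfn in lfns)


def to_input_list(lfns):
    """UR033 (sketch): one LFN per line, a hypothetical input format."""
    return "".join(f"{lfn}\n" for lfn in lfns)


events_per_lfn = {"file_A": 1000, "file_B": 1000, "file_C": 250}
selection = ["file_A", "file_C"]
print(total_events(events_per_lfn, selection))  # 1250
print(to_input_list(selection), end="")
```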


UR034        A mechanism should be provided to allow users to specify their own queries.


UR035        Users should be able to save their private queries for later reuse.


UR036        An Athena service must be provided for AMB access.




  5.3.7 Communication with the Replica Catalogue

             The Replica Catalogue contains the physical description of data.


UR037        The base parameter of the interface with the replica catalogue must be the LFN.

             It will be necessary to communicate with the replica catalogue to satisfy requirement UR033.




    5.4 List of Non Functional Requirements.

             Non-functional requirements determine the manner in which the application satisfies the
             functional requirements.



  5.4.1 Choice of Technology

UR038        The design of the AMB shall not depend on any particular platform.


UR039        The design of the AMB must not depend on any particular technology.


UR040        The design of the AMB shall not depend on any particular implementation of a particular
             technology.



  5.4.2 Scalability

UR041        The AMB design must enable the management of an as yet undetermined amount of data.

             TBD             This is a tricky question: how should the amount of data be estimated, both in
                             terms of the size of the database itself and in terms of the number of records
                             that the database may be expected to manage? It is probably possible to estimate
                             the number of datasets which Atlas will produce, but the number of records
                             depends strongly on, for example, the size of partitions.



  5.4.3 Response Time

UR042        To make sure that the response time of a query remains on an interactive scale (of the order of a
             few minutes), some kind of limiting mechanism should be included.

             This is both a technical and a psychological question. A very long query could potentially
             block access to other users. Also after a certain time users lose confidence and decide that the
             application does not work. Either they go away disgusted, or they start launching new
             queries, which will probably have the effect of making the situation worse.

             This could take the form of a warning, or of an upper limit on the number of records which a
             query response can contain.
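             The upper-limit variant can be sketched as follows. The default of 1000 records is an
             arbitrary placeholder, not a figure from this document. Reading one record beyond the
             limit is enough to know that the result was truncated, so the whole result set never
             has to be fetched.

```python
import itertools


def limited_query(rows, max_records=1000):
    """Return (at most max_records rows, warning or None).

    `rows` may be any iterable, e.g. a database cursor; only
    max_records + 1 rows are ever read from it.
    """
    head = list(itertools.islice(rows, max_records + 1))
    if len(head) > max_records:
        warning = f"result truncated to {max_records} records; please refine the query"
        return head[:max_records], warning
    return head, None


rows, warning = limited_query(iter(range(10_000)), max_records=3)
print(rows, warning)
```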




  5.4.4 Conviviality

UR043        AMI will include user help facilities and user documentation.


UR044        AMI should remain stable and homogeneous, even if the implementation changes.

             If we change technology, the commands, the Athena service and the web interface should
             remain unchanged for the user.



  5.4.5 Distribution

UR045        The design of the AMB should permit its implementation in a distributed environment.

             This is probably the best way of ensuring scalability and availability.



  5.4.6 Availability

UR046        The context of an international collaboration requires the database to be available 24 hours
             a day, 365 days a year. The design should ensure that the application does not depend on the
             availability of a single server.



  5.4.7 Security and Robustness

             Robustness means that the application does not break if a particular resource is not available,
             or if a user performs an unexpected action.

             Security means that the system should prevent unauthorised access to the data.


UR047        An unexpected or inappropriate input from a user should not be able to corrupt data. It should
             be signalled by an error message indicating the source of the problem.


UR048        Write access will be managed by passwords which limit access to subsets of the AMB.


UR049        The AMB will implement an interface which ensures compliance with grid certification
             authorisation.



  5.4.8 Reliability (Back-Up)

             This section is about not losing data.


UR050        Only database administrators have delete privileges on data which is used for searching.



             This implies a buffering stage between data acquisition (a phase during which site managers
             have delete privileges on a particular subset of the data) and the central databases, which
             are available to the whole collaboration.


UR051        All AMB servers should be backed up nightly.


UR052        In addition to the regular nightly back-up of the disk image, the AMB must be saved in such a
             way that the database can be recreated at another site.

             This facilitates changing servers.


UR053        It should be possible to dump all the data of the AMB into text files which are accessible
             without the database engine.

             This facilitates changing technology.
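             Below is a minimal sketch of such a dump. It uses SQLite from the Python standard library
             purely as a stand-in engine, since UR039 rules out depending on any particular
             technology. Every user table is written out as CSV text with a header row. Writing the
             strings to files and handling very large tables are left out.

```python
import csv
import io
import sqlite3


def dump_tables(conn):
    """Return {table name: CSV text with a header row} for every user table."""
    dumps = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (table,) in tables:
        # Table names come from the database's own catalogue, not from user input.
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        writer.writerow([column[0] for column in cursor.description])
        writer.writerows(cursor)
        dumps[table] = buffer.getvalue()
    return dumps


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (name TEXT, n_events INTEGER)")
conn.execute("INSERT INTO dataset VALUES ('mc.B_pipi', 10000)")
print(dump_tables(conn)["dataset"])
```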



