CMS HLT production using Grid tools

    Flavia Donno (INFN Pisa)
    Claudio Grandi (INFN Bologna)
    Ivano Lippi (INFN Padova)
    Francesco Prelz (INFN Milano)
    Andrea Sciabà (INFN Pisa)
    Massimo Sgaravatto (INFN Padova)
    Zhen Xie (INFN Pisa)
Introduction
   Goals
       Evaluate the existing GRID technologies with real
        applications and on real production environments
       Can these GRID tools be useful to “manage”
        these HEP applications ?
   Collaboration between:
       CMS
       INFN-GRID WP 1 (Installation and Evaluation of
        the Globus toolkit) http://www.infn.it/globus
       DataGrid WP 1 (Grid Workload Management)

                    M. Sgaravatto - INFN Padova
Applications
   [Diagram: CMS production chain]
   MC Prod.
       Signal and MB HEPEVT ntuples -> CMSIM -> Zebra files
        with HITS
   ORCA Prod.
       ORCA ooHit Formatter -> Objectivity Database
       ORCA Digitization (merge signal and MB) -> Objectivity
        Database
       Catalog import steps link the Objectivity databases
   HLT Algorithms
       New Reconstructed Objects -> Objectivity Database
   Mirrored Db’s
       Objectivity Databases mirrored to the HLT Grp Databases
Tested configuration for CMS production
   [Diagram, from top to bottom]
   Production manager submits jobs with condor_submit
    (Globus Universe)
   Condor-G (Padova) as reliable, crash-proof submitting
    service
   Globus GRAM as uniform interface to different resource
    management systems
   Local Resource Management Systems: CONDOR (Bologna),
    LSF (Pisa)
   CMS farms: Bologna, Pisa
Overview
   PC farms at each site installed and configured
    using the CMS farm kickstart toolkit
   PC farms managed by possibly different local
    resource management systems
   Globus GRAM as uniform interface to the
    different local resource management systems
   Globus deployment using the INFNGRID
    distribution toolkit (see Zhen’s presentation)
    considering the INFN setup

Overview
   Condor-G as reliable, crash-proof submitting service
   Job submission and monitoring by the production
    manager from a single machine
   The production manager decides on which Globus
    resource (farm) the job must be executed
   Executable and input files stored on the executing
    farm
   Output files created on the executing machine
   Log files created on the submitting machine
   Authentication using Globus GSI (use of certificates
    signed by INFN CA)
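
The submission setup listed above can be sketched as a small Python helper that writes a Condor-G submit description for one job. This is only an illustration under stated assumptions, not the production scripts used by the authors: the gatekeeper host names, the executable path and the log file naming are hypothetical placeholders, and the submit commands follow the Globus universe syntax of Condor-G releases of that period.

```python
from dataclasses import dataclass


@dataclass
class Farm:
    name: str          # site label, as on the slides
    gram_contact: str  # GRAM contact string "host/jobmanager-<lrms>"


# Hypothetical gatekeeper host names; only the site names and the local
# resource management systems (CONDOR, LSF) come from the slides.
FARMS = {
    "bologna": Farm("Bologna", "gatekeeper.bologna.example/jobmanager-condor"),
    "pisa": Farm("Pisa", "gatekeeper.pisa.example/jobmanager-lsf"),
}


def submit_description(farm, executable, job_id, arguments=""):
    """Build the text of a submit description for one Globus universe job."""
    lines = [
        "universe = globus",
        # The production manager picks the farm; Condor-G contacts its GRAM.
        f"globusscheduler = {farm.gram_contact}",
        f"executable = {executable}",
        # The executable is already stored on the executing farm.
        "transfer_executable = false",
    ]
    if arguments:
        lines.append(f"arguments = {arguments}")
    # The job log is written on the submitting machine.
    lines.append(f"log = job_{job_id}.log")
    lines.append("queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical executable path and argument, for illustration only.
    print(submit_description(FARMS["pisa"], "/opt/cms/bin/run_cmsim.sh", 7, "run7"))
```

The resulting text would be saved to a file and passed to condor_submit on the submitting machine, after creating a GSI proxy from a certificate signed by the INFN CA.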

Results
   The CMS production using Globus and Condor-G
    failed
       Numerous memory leaks found in the Globus
        jobmanager
       ... but we (Francesco Prelz, INFN Milano) have been able to
        provide fixes for these bugs
       Fixes reported to the Globus team
            Feedback received only on the bugs in the GAA and
             GSS modules (new fixes “merged” with the original ones)
   Work in progress
       Tests with these fixes
       Fixes included in the INFN-GRID distribution


Other problems
   Globus GRAM
       Some minor bugs found and fixed (fixes included
        in the INFN-GRID distribution)
       Necessary to “address” some “major” problems
            Scalability (one jobmanager for each job)
            Reliability (the jobmanager is not persistent)
            …
   Condor-G
       Some problems in the current implementation (it’s
        a prototype)
            Scalability in the submitting machine
            Logging
            …
Next steps
   New tests considering the next CMS productions with the
    “patched” Globus jobmanager
   New tests with the new implementations of Condor-G and
    Globus jobmanager (by Condor team)
   Tests with bypass
        Tool written by D. Thain (Condor team) that allows redirection of
         standard input/output/error to a remote machine (the submitting
         machine) while the program is running (split execution system)
        Use of GSI authentication mechanisms
        New implementation resilient to several kinds of failures
   Tests with the first WP 1 prototype
   “Integration” with software provided by the other WPs (e.g.
    replica management tools, ...)


Prototype workload management system architecture
   [Diagram, from top to bottom]
   Jobs are submitted (using Class-Ads) to the Master
   Master performs resource discovery through the Grid
    Information Service (GIS), which provides information on
    characteristics and status of local resources (and other info)
   Master chooses in which Globus resources the jobs must
    be submitted, using condor_submit (Globus Universe)
   Condor-G able to provide a reliable/crashproof job
    submission service
   Globus GRAM as uniform interface to different local
    resource management systems
   Local Resource Management Systems: CONDOR, LSF, PBS
   Farms: Site1, Site2, Site3
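
The Master's choice of a Globus resource in the architecture above can be illustrated with a toy matchmaking step. This is a minimal sketch in plain Python, not the ClassAd language or the actual WP 1 prototype: the attribute names (free_cpus, has_dataset), their values and the pairing of each site with a local resource management system are assumptions made up for the example, standing in for what the Grid Information Service would publish.

```python
def match(job, resources):
    """Return the best resource for the job, or None if none qualifies.

    job["requirements"] is a predicate that a resource must satisfy;
    job["rank"] scores the acceptable resources (higher is better).
    """
    candidates = [r for r in resources if job["requirements"](r)]
    if not candidates:
        return None
    return max(candidates, key=job["rank"])


# Hypothetical snapshot of resource information, as the GIS might publish it.
resources = [
    {"name": "Site1", "lrms": "condor", "free_cpus": 4, "has_dataset": True},
    {"name": "Site2", "lrms": "lsf", "free_cpus": 12, "has_dataset": False},
    {"name": "Site3", "lrms": "pbs", "free_cpus": 8, "has_dataset": True},
]

# Hypothetical job description: needs the input data at the site and at
# least one free CPU; prefers the site with the most free CPUs.
job = {
    "requirements": lambda r: r["has_dataset"] and r["free_cpus"] > 0,
    "rank": lambda r: r["free_cpus"],
}

if __name__ == "__main__":
    best = match(job, resources)
    print(best["name"])  # Site3
```

Once a resource is selected, the Master would hand the job to Condor-G with that site's GRAM contact, so the choice of local resource management system stays hidden behind GRAM.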