LHC Computing Grid Project - GridPP10 - CERN - 02jun03

					      LCG   LHC Computing Grid Project - LCG

                         LCG and GridPP

                GridPP Collaboration Meeting at CERN
                              2-4 June 2004

                   Les Robertson – LCG Project Leader
              CERN – European Organization for Nuclear Research
                           Geneva, Switzerland

                                les robertson - cern-it 1


                   Status of the LCG Project
                   LCG & GridPP
                   Preparing for 2007

31-May-04              les robertson - cern-it-2
                                   LCG Status
               Half-way through the R&D Phase (2002-2005)

       Applications Area

           POOL – persistency for event data integrated in three experiments

           SEAL – tools widely used by experiments, robust LCG dictionary

           SPI – comprehensive software development infrastructure, used
            well beyond AA: experiments, other LCG areas, EGEE, other
            projects (CLHEP, probably Geant4)

           Successful, and deepening, collaboration with ROOT
              data store technology in POOL
              analysis environment used either directly or via interfaces
                                       (pyROOT, pyLCGDict, AIDA ROOT)
                                 LCG Status (ii)
               Half-way through the R&D Phase (2002-2005)

           Simulation - important steps in simulation physics validation –
              Geant4 em and hadronic physics "as good as or better than Geant3"
              simulation physics requirements for LHC experiments documented
              collaboration on validation work with both Geant4 and FLUKA

           Generator library GENSER developed, populated, and is being
            evaluated/adopted by experiments

           Strong LHC Geant4 program squarely focused on LHC priorities -
            successfully deployed in CMS and ATLAS

    LCG                       LCG Status (iii)

            Applications Area Highlights for the next year
               Common conditions DB
               Common math library development
               Closer relations with ROOT –
                   Aiming for convergence with ROOT on mathlib and
                   ROOT will use LCG AA software components, as well as vice versa
               Physicist-level event collections: collaboration
               POOL and Geant4 in Data Challenge production in CMS,
                ATLAS and LHCb
               Experiment adoption and validation will continue to be the
                measure of success

    LCG                        LCG Status (iv)
       CERN Fabric
           Fabric management automation
              First version of the ELFms suite in production at CERN
                 - important contribution from EDG WP4
              In-sourcing of systems administration → cost containment for Phase 2

           High performance data recording
               File system → network → tape storage → 1 GB/s in April 2003
               ALICE – Mass Storage Data Challenges at CERN
                     2002 – target 200 MB/s sustained – achieved 280 MB/s
                     2003 – target 300 MB/s – achieved 280 MB/s
                     2004 → 450 MB/s, 2005 → 700 MB/s
                     CDR in 2008 → 1.2 GB/s
           New version of CASTOR ready for release
           Computer Centre upgrade on track → 2.5 MW, second computer room
           Phase 2 acquisition process starting
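
To put the recording rates listed for 2004, 2005 and 2008 in perspective, the following is a minimal sketch that converts a sustained rate into a daily data volume. It assumes decimal units (1 GB = 1000 MB, 1 TB = 1,000,000 MB) and uninterrupted recording for 24 hours; neither assumption is stated on the slide, and the names used are illustrative only.

```python
# Recording rates in MB/s, as listed on the slide above.
RATES_MB_PER_S = {
    "2004": 450,
    "2005": 700,
    "2008 (CDR)": 1200,  # 1.2 GB/s, assuming 1 GB = 1000 MB
}

SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_mb_per_s: float) -> float:
    """Volume in TB written in 24 hours at a constant rate (decimal units)."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1_000_000


for label, rate in RATES_MB_PER_S.items():
    print(f"{label}: {rate} MB/s is about {daily_volume_tb(rate):.1f} TB/day")
```

Real data challenges run with interruptions, so these figures are upper bounds for a given rate rather than expected daily volumes.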
    LCG                         LCG Status (v)

            Grid Deployment
                Service opened on 15 September 2003 – with 12 sites
                Middleware package - components from EDG & VDT
                About 30 sites by the end of the year –
                        - but not heavily used
                Upgraded version of the grid software (LCG-2) in February
                Good reliability, main issues are with data management
                Now more than 50 sites connected - > 3,500 processors

               VO for D0 managed by NIKHEF
               Hewlett Packard to provide “Tier 2-like” services for LCG,
                initially in Puerto Rico

    LCG     LCG for the 2004 Data Challenges

 LCG-2 target
    – the 2004 “LHC Data Challenges”
   ALICE and CMS data challenges started at the beginning of March
   LHCb and ATLAS – started in May

 Much has to be learned about sustained operation of a grid

 The big challenge for this year - data –
      - file catalogue,
      - replica management,
      - database access,
      - integrating mass storage

 Grid Operations Centre at RAL
 User Support Centre at FZK
 Planning for a second operations & support centre in Taipei

    LCG            LCG-2 Support Agreements

        VDT (US tools)                                   VDT team at Wisconsin (NSF
        DataGrid resource broker                         INFN/CERN
        DataGrid replica management                      CERN
        DataGrid relational information system (RGMA)    RAL
        GridIce monitoring tools                         INFN
        GLUE schema                                      INFN
        VOMS                                             INFN
        dCache storage manager                           DESY
        CASTOR storage manager                           CNAF/PIC/CERN
        Security & VO policies and                       Leadership by RAL
    LCG                         LCG Status (vi)
            Middleware Development
               Managed by EGEE –
                Exploiting and integrating experience, expertise and
                technology from EDG, VDT, AliEn
               Joint EGEE-VDT design team
               Focus on HEP requirements + bio-medical
               Strongly coupled to ARDA - a new LHC distributed analysis
               We need to see an early prototype soon,
                                   involving HEP applications and users
                and a “usable system” within a year - stability, performance
                as important as functionality

            By this time next year we will have to start making decisions
                 about the middleware to be used in 2007

    LCG                     LCG Status (vii)

ARDA – distributed physics analysis
         batch to interactive
         end-user emphasis

      4 pilots by the LHC experiments (core of the HEP activity in EGEE NA4)
      Rapid prototyping → pilot service
      Providing focus for the first products of the EGEE middleware
      Kept realistic by what the EGEE middleware can deliver

      [Diagram: ALICE, ATLAS, CMS and LHCb distributed analysis pilots; ARDA Project (Collaboration, Coordination, Planning); EGEE Middleware Activity]

    LCG                 LCG & GridPP
 GridPP is a major part of the common LCG activity at CERN

   Active in most key areas of the project

   [Chart: GridPP Staff by Activity. Segment labels as extracted:
      LCG Management
      Distributed
      Software Process
      Core Libraries and Services
      Grid Infrastructure, Operations and User Support
      Grid deployment integration &
      Simulation (GEANT4, etc)
      Grid Data Management
      ROOT
      Fabric automation
      Grid Security
      Data Storage]
             LCG                            LCG & GridPP (ii)
 GridPP is a major part of the common LCG activity at CERN
 [Chart: LCG Phase 1 Agreed External Personnel Profile, 2002 to 2005.
  Legend: EU; CERN (P+M); Taipei; Portugal; Switzerland; Italy; UK funded staff.
  Chart annotation: The largest participation of any country]

    LCG                  LCG & GridPP (iii)
                Total participation – staff and materials
                       Staff         Staff           Materials   Total    Share
                       (FTE-years)   (MCHF-equiv.)   (MCHF)      (MCHF)
   United Kingdom      78.5          10.8            2.9         13.7     38%
   Germany             14.4          1.85            3.7          5.6     15%
   Italy               34.7          3.6             1.6          5.2     14%
   United States       23.7          2.8             0.4          3.2      9%
   France              22            2.4             0            2.4      7%
   Spain                7.8          0.8             0.6          1.4      4%
   Russia               7.3          0.9             0            0.9      2%
   Israel               8.6          0.6             0            0.6      2%
   Switzerland          5.1          0.6             0            0.6      2%
   Taipei               5.2          0.6             0.03         0.6      2%
   Belgium              0            0               0.4          0.4      1%
   Japan                0            0               0.4          0.4      1%
   Portugal             4.5          0.4             0            0.4      1%
   Sweden               4.1          0.3             0.1          0.4      1%
   Hungary              3.5          0.3             0            0.3      0.8%
   Finland              0            0               0.1          0.1      0.3%
   Total              219.4         26.0            10.2         36.2    100%
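
The total and share columns follow from the staff and materials figures. As a rough check, the sketch below recomputes them, assuming that each country's total is staff cost plus materials and that its share is that total divided by the grand total. The variable names are illustrative, and because the table entries are rounded, recomputed values can differ slightly from the printed ones (for example 5.55 against 5.6 for Germany).

```python
# (country, staff cost in MCHF-equiv., materials in MCHF), copied from the table.
ROWS = [
    ("United Kingdom", 10.8, 2.9),
    ("Germany", 1.85, 3.7),
    ("Italy", 3.6, 1.6),
    ("United States", 2.8, 0.4),
    ("France", 2.4, 0.0),
    ("Spain", 0.8, 0.6),
    ("Russia", 0.9, 0.0),
    ("Israel", 0.6, 0.0),
    ("Switzerland", 0.6, 0.0),
    ("Taipei", 0.6, 0.03),
    ("Belgium", 0.0, 0.4),
    ("Japan", 0.0, 0.4),
    ("Portugal", 0.4, 0.0),
    ("Sweden", 0.3, 0.1),
    ("Hungary", 0.3, 0.0),
    ("Finland", 0.0, 0.1),
]

# Assumed relation: total = staff + materials, per country.
totals = {country: staff + materials for country, staff, materials in ROWS}
grand_total = sum(totals.values())

# Share of the grand total, in percent.
shares = {country: 100 * total / grand_total for country, total in totals.items()}

print(f"Grand total: {grand_total:.1f} MCHF")
for country, total in totals.items():
    print(f"{country:15s} {total:5.2f} MCHF  {shares[country]:5.1f}%")
```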
    LCG                       LCG & GridPP (iv)

               The operation centre at RAL
                  Key role in deploying the global grid
                  Design, definition, implementation of process and tools
                  Central accounting
               Leadership of the Security Group
                  Defining processes, negotiating agreements
                  Close collaboration with site security officers
                  Excellent leadership by David Kelsey
               Major middleware role in EDG and EGEE
               Advanced planning of Tier-1/Tier-2 infrastructure in the UK
               Constructive participation in LCG management boards
                  Oversight Board, SC2, GDB

      LCG               Preparing for 2007
           2003 – has demonstrated event production
           In 2004 we must show that we can also handle the data – even if the
            computing model is very simple
            -- This is a key goal of the 2004 Data Challenges

           Target for end of this year –
              Basic model demonstrated using current grid middleware
              All Tier-1s and ~25% of Tier-2s operating a reliable service
              Validate security model, understand storage model
              Clear idea of the performance, scaling, and management issues

           [Timeline: core data handling and batch; Decisions on final core middleware; 2006: Installation and commissioning; Initial service in operation; 2007: first data]

    LCG         LCG-2 and Next Generation Middleware
           [Timeline diagram, 2004 to 2005. Labels as extracted: Next Generation; prototype; product development; mainline service]

           LCG-2 will be the main service for the 2004 data challenges
           This will provide essential experience on operating and managing a
            global grid service – and will be supported and developed
           Target is to establish a base (fallback) solution for early LHC years

           LCG-2 will be maintained until the new generation has proven itself

    LCG                      Service Challenges
               Purpose
                  Understand what it takes to operate a real grid service
                    – run for days/weeks at a time (outside of experiment
                    Data Challenges)
                  Trigger/encourage the Tier-1 & large Tier-2 planning –
                    move towards real resource planning – based on realistic
                    usage patterns
                  Get the essential grid services ramped up to target
                    levels of reliability, availability, scalability, end-to-end
                  Set out milestones needed to achieve goals during the
                    service challenges
               NB: This is focussed on Tier 0 – Tier 1/large Tier 2
                  Data management, batch production and analysis
               Short term goal – by end 2004 –
                have in place a robust and reliable data management service
                and support infrastructure and robust batch job submission
                                     Ian Bird – ian.bird@cern.ch
    LCG           Service challenges – examples
           Data Management
               Networking, file transfer, data management
               Storage management and interoperability
               Fully functional storage element (SE)
           Continuous job probes
               Understand limits
           Operations centres
               Accounting, assume levels of service responsibility, etc
               Hand-off of responsibility (RAL-Taipei-US/Canada)
           "Security incident"
               Detection, incident response, dissemination and resolution
           User support
               Assumption of responsibility, demonstrate staff in place, etc
           VO management
               Robust and flexible registration, management interfaces, etc
           Etc.
                                    Ian Bird – ian.bird@cern.ch
    LCG         Data Management Service Challenge
            First steps in planning at last week’s HEPiX in Edinburgh
                Evolve towards a sustainable service
                   Permanent service infrastructure
                   Workload generator – simulating realistic data traffic
                   Identify problems, develop solid (long-term) fixes
                   Frequent performance limits tests
                        1-2 week periods with extra resources brought in
                   But the goal is to integrate this in the standard LCG service
                    as soon as practicable
               Focus on
                   Service operability - minimal interventions, automated
                    problem discovery and recovery
                   Reliable data transfer service
                   End-to-end performance
               Complete high level planning during June

    LCG                               Conclusions

               LCG now has first applications products in production within
               The LHC Grid is in sustained operation, with reasonable reliability,
                but still missing basic functionality
               This year we begin to tackle the data problems
                -- and the operation of a permanent, reliable grid service
                  Experiment data challenges
                  Service data challenges
               The UK, through GridPP, is making a major contribution to LCG
                through work in the experiments, UK regional centres and a very
                large presence in the common project activities at CERN
               I look forward to continued close collaboration as we prepare for

                            Thank you for your support
