
LHC Computing Grid – Technical Design Report
LCG: The LHC Computing Grid Project
Technical Design Report
LHCC, 29 June 2005
Jürgen Knobloch, IT Department, CERN
Last update: 20/05/2009 23:18
This file is available at: http://cern.ch/lcg/tdr/LCG_TDR.ppt

Slide 2: Technical Design Report – limitations
• Computing is different from detector building
  – It’s ‘only’ software: it can and will be adapted as needed
  – Technology evolves rapidly, and we have no control over it
  – Prices go down (Moore’s law)
    – Buy just in time
    – Understand startup
• We are in the middle of planning the next phase
  – The Memorandum of Understanding (MoU) is being finalized
  – The list of Tier-2 centres is evolving
  – Baseline Services have been agreed
  – EGEE continuation is being discussed
  – Experience from Service Challenges will be incorporated
  – Some of the information is made available from (dynamic) Web sites
• The LCG TDR appears simultaneously with the experiments’ TDRs
  – Some inconsistencies may have passed undetected
  – Some people were occupied on both sides

Slide 3: The LCG Project
• Approved by the CERN Council in September 2001
• Phase 1 (2001-2004): development and prototyping of a distributed production prototype at CERN and elsewhere, operated as a platform for the data challenges and leading to a Technical Design Report, which will serve as a basis for agreeing the relations between the distributed Grid nodes and their coordinated deployment and exploitation.
• Phase 2 (2005-2007): installation and operation of the full world-wide initial production Grid system, requiring continued manpower efforts and substantial material resources.
• A Memorandum of Understanding has been developed, defining the Worldwide LHC Computing Grid Collaboration of CERN as host lab and the major computing centres.
  – It defines the organizational structure for Phase 2 of the project.

Slide 4: Organizational Structure for Phase 2
• LHC Committee (LHCC): scientific review
• Computing Resources Review Board (C-RRB): funding agencies
• Collaboration Board (CB): experiments and regional centres
• Overview Board (OB)
• Management Board (MB): management of the project
• Grid Deployment Board: coordination of Grid operation
• Architects Forum: coordination of common applications

Slide 5: Cooperation with other projects
• Network Services
  – LCG will be one of the most demanding applications of national research networks such as the pan-European backbone network, GÉANT
• Grid Software
  – Globus, Condor and VDT have provided key components of the middleware used. Key members participate in OSG and EGEE
  – Enabling Grids for E-sciencE (EGEE) includes a substantial middleware activity
• Grid Operational Groupings
  – The majority of the resources used are made available as part of the EGEE Grid (~140 sites, 12,000 processors). EGEE also supports Core Infrastructure Centres and Regional Operations Centres.
  – The US LHC programmes contribute to and depend on the Open Science Grid (OSG), with a formal relationship to LCG through the US-ATLAS and US-CMS computing projects.
  – The Nordic Data Grid Facility (NDGF) will begin operation in 2006. Prototype work is based on the NorduGrid middleware ARC.

Slide 6: The Hierarchical Model
• Tier-0 at CERN
  – Record RAW data (1.25 GB/s for ALICE)
  – Distribute a second copy to the Tier-1s
  – Calibrate and do first-pass reconstruction
• Tier-1 centres (11 defined)
  – Manage permanent storage: RAW, simulated, processed
  – Capacity for reprocessing, bulk analysis
• Tier-2 centres (~100 or more identified)
  – Monte Carlo event simulation
  – End-user analysis
• Tier-3
  – Facilities at universities and laboratories
  – Access to data and processing in Tier-2s, Tier-1s
  – Outside the scope of the project

Slide 7: Tier-1s
[Table: experiments (ALICE, ATLAS, CMS, LHCb) served with priority by each Tier-1 centre]
Tier-1 centres: TRIUMF (Canada); GridKa (Germany); CC-IN2P3 (France); CNAF (Italy); SARA/NIKHEF (Netherlands); Nordic Data Grid Facility (NDGF); ASCC (Taipei); RAL (UK); BNL (US); FNAL (US); PIC (Spain)

Slide 8: Tier-2s
• ~100 identified, and the number is still growing

Slide 9: The Eventflow
Experiment  Rate   RAW     ESD/rDST/RECO  AOD    Monte Carlo  Monte Carlo
            [Hz]   [MB]    [MB]           [kB]   [MB/evt]     [% of real]
ALICE HI     100   12.5    2.5            250    300          100
ALICE pp     100   1       0.04           4      0.4          100
ATLAS        200   1.6     0.5            100    2            20
CMS          150   1.5     0.25           50     2            100
LHCb        2000   0.025   0.025          –      0.5          20
• 50 days running in 2007
• 10⁷ seconds/year pp from 2008 on → ~10⁹ events/experiment
• 10⁶ seconds/year heavy ion

Slide 10: CPU Requirements
[Chart: CPU requirements in MSI2000, 2007–2010, stacked by experiment (ALICE, ATLAS, CMS, LHCb) and tier (CERN, Tier-1, Tier-2); annotation: 58% pledged]
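
The event-flow table lends itself to a quick consistency check. The Python sketch below multiplies the trigger rate by the RAW event size to get the rate written at Tier-0, and by the live time (10⁷ s/year for pp, 10⁶ s/year for heavy ions) to get yearly event counts and RAW volumes. It covers one copy of RAW data only; ESD/AOD, Monte Carlo and the second copy at the Tier-1s are left out, so it is not a model of the requirement charts.

```python
"""Back-of-the-envelope event-flow arithmetic for the LHC experiments.

Rates and RAW sizes are copied from the event-flow table; live times are the
nominal 1e7 s/year (pp) and 1e6 s/year (heavy ion). One RAW copy only.
"""

# experiment: (trigger rate [Hz], RAW event size [MB], live seconds per year)
EVENT_FLOW = {
    "ALICE HI": (100, 12.5, 1e6),
    "ALICE pp": (100, 1.0, 1e7),
    "ATLAS": (200, 1.6, 1e7),
    "CMS": (150, 1.5, 1e7),
    "LHCb": (2000, 0.025, 1e7),
}


def raw_rate_mb_per_s(rate_hz: float, raw_mb: float) -> float:
    """RAW data rate written at Tier-0 while the detector is taking data."""
    return rate_hz * raw_mb


def yearly_summary(rate_hz: float, raw_mb: float, live_s: float) -> tuple:
    """Return (events per year, RAW volume per year in PB, with 1 PB = 1e9 MB)."""
    events = rate_hz * live_s
    raw_pb = raw_rate_mb_per_s(rate_hz, raw_mb) * live_s / 1e9
    return events, raw_pb


if __name__ == "__main__":
    for name, (rate, raw, live) in EVENT_FLOW.items():
        events, raw_pb = yearly_summary(rate, raw, live)
        print(f"{name:9s} {raw_rate_mb_per_s(rate, raw):7.1f} MB/s "
              f"{events:8.1e} events/yr {raw_pb:5.2f} PB RAW/yr")
```

For ALICE heavy-ion running this gives 1250 MB/s, matching the 1.25 GB/s RAW recording rate quoted for Tier-0, while ALICE pp, ATLAS and CMS each come out between 1 and 2 × 10⁹ events per year.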

Slide 11: Disk Requirements
[Chart: disk requirements in PB, 2007–2010, stacked by experiment (ALICE, ATLAS, CMS, LHCb) and tier (CERN, Tier-1, Tier-2); annotation: 54% pledged]

Slide 12: Tape Requirements
[Chart: tape requirements in PB, 2007–2010, stacked by experiment (ALICE, ATLAS, CMS, LHCb) at CERN and the Tier-1s; annotation: 75% pledged]

Slide 13: Experiments’ Requirements
• Single Virtual Organization (VO) across the Grid
• Standard interfaces for Grid access to Storage Elements (SEs) and Computing Elements (CEs)
• A reliable Workload Management System (WMS) is needed to exploit distributed resources efficiently
• Non-event data such as calibration and alignment data, but also detector construction descriptions, will be held in databases
  – Read/write access to central (Oracle) databases at Tier-0, read access at Tier-1s, and a local database cache at Tier-2s
• Analysis scenarios and specific requirements are still evolving
  – Prototype work is in progress (ARDA)
• Online requirements are outside the scope of LCG, but there are connections:
  – Raw data transfer and buffering
  – Database management and data export
  – Some potential use of Event Filter Farms for offline processing

Slide 14: Architecture – Grid services
• Storage Element
  – Mass Storage System (MSS) (CASTOR, Enstore, HPSS, dCache, etc.)
  – Storage Resource Manager (SRM) provides a common way to access MSS, independent of implementation
  – File Transfer Services (FTS), provided e.g. by GridFTP or srmCopy
• Computing Element
  – Interface to the local batch system, e.g. Globus gatekeeper
  – Accounting, status query, job monitoring
• Virtual Organization Management
  – Virtual Organization Management Services (VOMS)
  – Authentication and authorization based on the VOMS model
• Grid Catalogue Services
  – Mapping of Globally Unique Identifiers (GUIDs) to local file names
  – Hierarchical namespace, access control
• Interoperability
  – EGEE and OSG both use the Virtual Data Toolkit (VDT)
  – Different implementations are hidden by common interfaces

Slide 15: Baseline Services
Mandate
The goal of the working group is to forge an agreement between the experiments and the LHC regional centres on the baseline services to be provided to support the computing models for the initial period of LHC running, which must therefore be in operation by September 2006. The services concerned are those that supplement the basic services for which there is already general agreement and understanding (e.g. provision of operating system services, local cluster scheduling, compilers, …) and which are not already covered by other LCG groups such as the Tier-0/1 Networking Group or the 3D Project. …
Members
• Experiments: ALICE: L. Betev; ATLAS: M. Branco, A. de Salvo; CMS: P. Elmer, S. Lacaprara; LHCb: P. Charpentier, A. Tsaregorodtsev
• Projects: ARDA: J. Andreeva; Apps Area: D. Düllmann; gLite: E. Laure
• Sites: F. Donno (Italy), A. Waananen (Nordic), S. Traylen (UK), R. Popescu, R. Pordes (US)
• Chair: I. Bird; Secretary: M. Schulz
Timescale: 15 February to 17 June 2005
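
The Grid Catalogue Services listed under Architecture – Grid services (GUID to local file name mapping, hierarchical namespace, access control) can be illustrated with a small in-memory model. This is a hypothetical sketch: `ToyCatalogue`, its methods, and the site names and paths are invented for illustration and do not reproduce the interface of any LCG, gLite or VDT catalogue.

```python
"""Toy Grid file catalogue: GUIDs, a hierarchical namespace and replicas.

Hypothetical sketch: class, method, site and path names are illustrative
only and do not reproduce the interface of any real Grid catalogue.
"""
import uuid
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Entry:
    guid: str                   # globally unique identifier of the file
    lfn: PurePosixPath          # logical file name in the hierarchical namespace
    owner_vo: str               # only this VO may register replicas
    replicas: dict = field(default_factory=dict)  # site -> local file name


class ToyCatalogue:
    def __init__(self) -> None:
        self._by_guid: dict = {}
        self._by_lfn: dict = {}

    def register(self, lfn: str, vo: str) -> str:
        """Create a namespace entry and return its newly minted GUID."""
        path = PurePosixPath(lfn)
        if not path.is_absolute():
            raise ValueError("logical file names must be absolute paths")
        if path in self._by_lfn:
            raise FileExistsError(lfn)
        entry = Entry(guid=str(uuid.uuid4()), lfn=path, owner_vo=vo)
        self._by_guid[entry.guid] = entry
        self._by_lfn[path] = entry
        return entry.guid

    def add_replica(self, guid: str, site: str, local_name: str, vo: str) -> None:
        """Record where a copy of the file lives at one site."""
        entry = self._by_guid[guid]
        if entry.owner_vo != vo:  # crude per-VO access control
            raise PermissionError(f"{vo} may not modify {entry.lfn}")
        entry.replicas[site] = local_name

    def local_name(self, guid: str, site: str) -> str:
        """Resolve a GUID to the file name used by one site's storage."""
        return self._by_guid[guid].replicas[site]

    def listdir(self, directory: str) -> list:
        """Logical names directly below a directory of the namespace."""
        d = PurePosixPath(directory)
        return sorted(str(p) for p in self._by_lfn if p.parent == d)


if __name__ == "__main__":
    cat = ToyCatalogue()
    g = cat.register("/grid/vo1/raw/run1/f1", vo="vo1")
    cat.add_replica(g, "site-A", "/pool/raw/f1", vo="vo1")
    print(g, cat.local_name(g, "site-A"), cat.listdir("/grid/vo1/raw/run1"))
```

Each file gets one GUID; the logical name places it in the namespace, and each site records its own local file name for its replica, so a job can resolve a GUID to whatever name the local storage uses. Real catalogues add persistence, replication and proper ACLs; here access control is reduced to a per-VO owner check.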

Slide 16: Baseline Services – preliminary priorities
Service                                 ALICE  ATLAS  CMS   LHCb
Storage Element                         A      A      A     A
Basic transfer tools                    A      A      A     A
Reliable file transfer service          A      A      A/B   A
Catalogue services                      B      B      B     B
Catalogue and data management tools     C      C      C     C
Compute Element                         A      A      A     A
Workload Management                     B/C    A      A     C
VO agents                               A      A      A     A
VOMS                                    A      A      A     A
Database services                       A      A      A     A
Posix-I/O                               C      C      C     C
Application software installation       C      C      C     C
Job monitoring tools                    C      C      C     C
Reliable messaging service              C      C      C     C
Information system                      A      A      A     A

Slide 17: Architecture – Tier-0
[Diagram: Tier-0 network. Labels: WAN; 2.4 Tb/s core; campus network; experimental areas; distribution layer; Gigabit, ten-gigabit and double ten-gigabit Ethernet links; 10 Gb/s to 32×1 Gb/s]
• ~6000 CPU servers × 8000 SPECint2000 (2008)
• ~2000 tape and disk servers

Slide 18: Tier-0 components
• Batch system (LSF) to manage CPU resources
• Shared file system (AFS)
• Disk pool and mass storage (MSS) manager (CASTOR)
• Extremely Large Fabric management system (ELFms)
  – Quattor: system administration, installation and configuration
  – LHC Era MONitoring (LEMON) system, server/client based
  – LHC-Era Automated Fabric (LEAF): high-level commands to sets of nodes
• CPU servers: ‘white boxes’, Intel processors, (Scientific) Linux
• Disk storage: Network Attached Storage (NAS), mostly mirrored
• Tape storage: currently STK robots; future system under evaluation
• Network: fast gigabit Ethernet switches connected to multigigabit backbone routers

Slide 19: Tier-0/1/2 Connectivity
National Research Networks (NRENs) at Tier-1s: ASnet, LHCnet/ESnet, GARR, RENATER, DFN, SURFnet6, NORDUnet, RedIRIS, UKERNA, CANARIE
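
The Tier-0 architecture figures support a rough sizing estimate. The sketch below rests on two readings of the diagram that are assumptions, not statements from the slide: 8000 SPECint2000 is taken as the rating of one CPU server, and "10 Gb/s to 32×1 Gb/s" as a distribution switch with a 10 Gb/s uplink serving 32 servers attached at 1 Gb/s each.

```python
"""Rough Tier-0 sizing from the architecture diagram.

Assumed readings (not stated on the slide): 8000 SPECint2000 per CPU server,
and one 10 Gb/s uplink per distribution switch serving 32 x 1 Gb/s ports.
"""

CPU_SERVERS = 6000         # "~6000 CPU servers" (2008)
SI2K_PER_SERVER = 8000     # assumed per-server SPECint2000 rating

UPLINK_GBPS = 10           # assumed uplink of one distribution switch
SERVERS_PER_SWITCH = 32
SERVER_LINK_GBPS = 1


def total_msi2k(servers: int, si2k_each: float) -> float:
    """Aggregate capacity in MSI2000 (millions of SPECint2000 units)."""
    return servers * si2k_each / 1e6


def oversubscription(n_ports: int, port_gbps: float, uplink_gbps: float) -> float:
    """Ratio of edge bandwidth to uplink bandwidth (1.0 means non-blocking)."""
    return n_ports * port_gbps / uplink_gbps


def per_server_share_mb_s(n_ports: int, uplink_gbps: float) -> float:
    """Uplink bandwidth per server in MB/s when all ports send at once."""
    return uplink_gbps * 1000 / 8 / n_ports  # 1 Gb/s = 125 MB/s


if __name__ == "__main__":
    print(f"CPU capacity: {total_msi2k(CPU_SERVERS, SI2K_PER_SERVER):.0f} MSI2000")
    ratio = oversubscription(SERVERS_PER_SWITCH, SERVER_LINK_GBPS, UPLINK_GBPS)
    print(f"Oversubscription: {ratio:.1f}:1")
    share = per_server_share_mb_s(SERVERS_PER_SWITCH, UPLINK_GBPS)
    print(f"Per-server uplink share: {share:.0f} MB/s")
```

Under these readings the farm totals 48 MSI2000, and each distribution switch is oversubscribed 3.2:1, leaving about 39 MB/s of uplink per server when all 32 send at once.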

Slide 20: Technology – Middleware
• Currently, the LCG-2 middleware is deployed at more than 100 sites
• It originated from Condor, EDG, Globus, VDT and other projects
• It will now evolve to include functionality of the gLite middleware provided by the EGEE project, which has just been made available
• In the TDR, we describe the basic functionality of the LCG-2 middleware as well as the enhancements expected from gLite components
• Site services include security, the Computing Element (CE), the Storage Element (SE), and Monitoring and Accounting Services, currently available from both LCG-2 and gLite
• VO services such as the Workload Management System (WMS), File Catalogues, Information Services and File Transfer Services exist in both flavours (LCG-2 and gLite), maintaining close relations with VDT, Condor and Globus

Slide 21: Technology – Fabric Technology
• Moore’s law still holds for processors and disk storage
  – For CPU and disks we count a lot on the evolution of the consumer market
  – For processors we expect an increasing importance of 64-bit architectures and multicore chips
• The cost break-even point between disk and tape storage will not be reached for the initial LHC computing
• Mass storage (tapes and robots) is still a computer-centre item with computer-centre pricing
  – It is too early to draw conclusions on new tape drives and robots
• Networking has seen a rapid evolution recently
  – Ten-gigabit Ethernet is now in the production environment
  – Wide-area networking can already count on 10 Gb/s connections between Tier-0 and the Tier-1s. This will move gradually to the Tier-1 to Tier-2 connections.
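
The ‘buy just in time’ argument (prices go down, Moore’s law) can be made concrete with a toy cost model. It is not taken from the TDR: the price of one unit of capacity is assumed to halve every 18 months, a placeholder value rather than a measured figure, and hardware lifetime, installation lead time and discounting are ignored.

```python
"""Toy model of 'buy just in time' under steadily falling hardware prices.

Assumptions (not from the TDR): the price of one unit of capacity halves every
HALVING_MONTHS months, and capacity bought early simply waits until needed.
"""

HALVING_MONTHS = 18.0  # placeholder halving time, not a measured figure


def unit_price(month: float, p0: float = 1.0) -> float:
    """Price of one capacity unit bought `month` months after the start."""
    return p0 * 2 ** (-month / HALVING_MONTHS)


def cost_all_upfront(increments: dict) -> float:
    """Buy every increment at month 0, at today's price."""
    return sum(units * unit_price(0.0) for units in increments.values())


def cost_just_in_time(increments: dict) -> float:
    """Buy each increment in the month it is first needed."""
    return sum(units * unit_price(month) for month, units in increments.items())


if __name__ == "__main__":
    ramp = {0: 100, 12: 100, 24: 100}  # hypothetical: month needed -> units
    print(f"up front:     {cost_all_upfront(ramp):.1f}")
    print(f"just in time: {cost_just_in_time(ramp):.1f}")
```

For the hypothetical ramp of 100 units needed at months 0, 12 and 24, buying everything up front costs 300 price units, whereas buying each increment when it is needed costs about 202.7.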

Slide 22: Common Physics Applications
[Diagram: common software domains under the simulation, reconstruction and analysis programs: experiment frameworks, event generators, detector simulation, geometry, calibration, persistency, file catalogue, conditions, data management, histograms, fitters, NTuple, 2D/3D graphics, GUI, interpreter, dictionary, plugin manager, math libraries, core utilities, OS binding]
• Core software libraries
  – SEAL-ROOT merger
  – Scripting: CINT, Python
  – Mathematical libraries
  – Fitting, MINUIT (in C++)
• Data management
  – POOL: ROOT I/O for bulk data, RDBMS for metadata
  – Conditions database: COOL
• Event simulation
  – Event generators: generator library (GENSER)
  – Detector simulation: GEANT4 (ATLAS, CMS, LHCb)
  – Physics validation: compare GEANT4, FLUKA, test beam
• Software development infrastructure
  – External libraries
  – Software development and documentation tools
  – Quality assurance and testing
  – Project portal: Savannah

Slide 23: Prototypes
• It is important that the hardware and software systems developed in the framework of LCG be exercised in more and more demanding challenges
• Data Challenges were recommended by the ‘Hoffmann Review’ of 2001 and have now been done by all experiments. Though the main goal was to validate the distributed computing model and to gradually build the computing systems, the results have been used for physics performance studies and for detector, trigger and DAQ design. Limitations of the Grids have been identified and are being addressed.
• Presently, a series of Service Challenges aims at realistic end-to-end testing of experiment use-cases over an extended period, leading to stable production services.
• The project ‘A Realisation of Distributed Analysis for LHC’ (ARDA) is developing end-to-end prototypes of distributed analysis systems using the EGEE middleware gLite for each of the LHC experiments.

Slide 24: Data Challenges
• ALICE
  – PDC04 used AliEn services, native or interfaced to the LCG Grid: 400,000 jobs were run, producing 40 TB of data for the Physics Performance Report.
  – PDC05: event simulation, first-pass reconstruction, transmission to Tier-1 sites, second-pass reconstruction (calibration and storage), analysis with PROOF, using Grid services from LCG SC3 and AliEn
• ATLAS
  – Using tools and resources from LCG, NorduGrid and Grid3 at 133 sites in 30 countries, with over 10,000 processors, 235,000 jobs produced more than 30 TB of data using an automatic production system.
• CMS
  – 100 TB of simulated data reconstructed at a rate of 25 Hz, distributed to the Tier-1 sites and reprocessed there
• LHCb
  – LCG provided more than 50% of the capacity for the first data challenge in 2004-2005. The production used the DIRAC system.

Slide 25: Service Challenges
• A series of Service Challenges (SC) set out to successively approach the production needs of LHC
• While SC1 did not meet the goal of transferring continuously for 2 weeks at a rate of 500 MB/s, SC2 exceeded the goal (500 MB/s) by sustaining a throughput of 600 MB/s to 7 sites.
• SC3 will start now, using gLite middleware components, with disk-to-disk throughput tests and 10 Gb/s networking of the Tier-1s to CERN, providing an SRM (1.1) interface to managed storage at the Tier-1s. The goal is to achieve 150 MB/s disk-to-disk and 60 MB/s to managed tape. There will also be Tier-1 to Tier-2 transfer tests.
• SC4 aims to demonstrate that all requirements from raw data taking to analysis can be met at least 6 months prior to data taking. The aggregate rate out of CERN is required to be 1.6 GB/s to tape at the Tier-1s.
• The Service Challenges will turn into production services for the experiments.

Slide 26: SC2 – throughput to Tier-1s from CERN
[Chart: SC2 data throughput from CERN to the Tier-1s]

Slide 27: Key dates for Service Preparation
• Sep05: SC3 Service Phase
• May06: SC4 Service Phase
• Sep06: Initial LHC Service in stable operation
• Apr07: LHC Service commissioned
[Timeline 2005–2008: SC3, SC4, LHC Service Operation; milestones: cosmics, first beams, first physics, full physics run]
• SC3: reliable base service
  – most Tier-1s, some Tier-2s
  – basic experiment software chain
  – grid data throughput 1 GB/s, including mass storage 500 MB/s (150 MB/s and 60 MB/s at Tier-1s)
• SC4: all Tier-1s, major Tier-2s
  – capable of supporting the full experiment software chain, including analysis
  – sustain nominal final grid data throughput (~1.5 GB/s mass storage throughput)
• LHC Service in Operation: September 2006
  – ramp up to full operational capacity by April 2007
  – capable of handling twice the nominal data throughput

Slide 28: ARDA – A Realisation of Distributed Analysis for LHC
• Distributed analysis on the Grid is the most difficult and least defined topic
• ARDA sets out to develop end-to-end analysis prototypes using the LCG-supported middleware.
  – ALICE uses the AliRoot framework based on PROOF.
  – ATLAS has used DIAL services with the gLite prototype as backend.
  – CMS has prototyped the ‘ARDA Support for CMS Analysis Processing’ (ASAP), which is used by several CMS physicists for daily analysis work.
  – LHCb has based its prototype on GANGA, a common project between ATLAS and LHCb.

Slide 29: Thanks to …
• Editorial Board
  – I. Bird, K. Bos, N. Brook, D. Duellmann, C. Eck, I. Fisk, D. Foster, B. Gibbard, C. Grandi, F. Grey, J. Harvey, A. Heiss, F. Hemmer, S. Jarp, R. Jones, D. Kelsey, J. Knobloch, M. Lamanna, H. Marten, P. Mato Vila, F. Ould-Saada, B. Panzer-Steindel, L. Perini, L. Robertson, Y. Schutz, U. Schwickerath, J. Shiers, T. Wenaus
• Contributions from
  – J.P. Baud, E. Laure, C. Curran, G. Lee, A. Marchioro, A. Pace, D. Yocum, A. Aimar, I. Antcheva, J. Apostolakis, G. Cosmo, O. Couet, M. Girone, M. Marino, L. Moneta, W. Pokorski, F. Rademakers, A. Ribon, S. Roiser and R. Veenhof
• Quality assurance by
  – The CERN Print Shop, F. Baud-Lavigne, S. Leech O’Neale, R. Mondardini and C. Vanoli
• … and the members of the Computing Groups of the LHC experiments who either directly contributed or have provided essential feedback.
