"CMS Offline and Computing Project Status Plans"
CMS Offline and Computing Project: Status & Plans
Lucia Silvestris, INFN-Bari
Napoli, 14/02/07

Outline
Scope and organizational structure of the project
Status: some highlights from 2006
Plans for 2007:
– Goals and Milestones
– Offline Release Plan & Production
Key tasks and manpower needs

14/2/2007 Planning and Milestones L. Silvestris 2

CMS Organization Chart
[Figure: organization chart of the Computing, Offline and Detector Performance Groups (DPG). Entries below are read from the flattened chart layout.]
Computing Coordination: M. Kasemann, P. McBride
– Integration: I. Fisk, NN
– Facilities / Infrastructure Operations: D. Bonacorsi, NN
– Computing Commissioning: S. Belforte, F. Würthwein
– Data Operations: C. Paus, NN
– User Support: K. Lassila-Perini, NN
Offline Coordination: J. Harvey, L. Silvestris
– Software Development Tools: D. Lange, A. Pfeiffer
– Framework: L. Sexton-Kennedy, NN
– Data & Workflow Management: P. Elmer, NN
Common Integration Tasks
– Calibration/Alignment: O. Buchmuller, L. Malgeri
– Fast Simulation: F. Beaudette, P. Janot
– Reconstruction: T. Boccali, S. Rahatlou
– HLT and DQM framework: E. Meschi, NN
– Simulation: F. Cossutti, D. Elvira
– L1 Software: J. Brooke, W. Sun
– Analysis Tools: C. Jones, L. Lista
Commissioning & Run Coordination: T. Camporesi, D. Acosta
Detector Performance Groups: ECAL DPG, HCAL DPG, MUON DPG, TRACKER DPG

Scope of the CMS Computing Project
Much of the new organization is built on continuity
– Changes are motivated by preparations for operations
– Effective operation of data taking requires close coordination with Offline and Computing
Scope of Computing
The Computing Project is responsible for managing the computing infrastructure and for all operational aspects of CMS data processing activities.
Scope of CMS Offline Project
Coordinates release of all CMS Offline Software
Core Software Programme: specialist software skills
– Develops framework that implements common software services needed in all data processing applications
– Delivers data and workload management tools, interfaces and monitoring services for the production system
– Supports the development, optimisation, testing, release and distribution of CMS offline software
Integration tasks: physics and software skills
– provide oversight and integration framework for delivering detector subsystem software and high-level physics objects
– Reconstruction, Simulation, Fast Simulation, Calibration & Alignment, Analysis Tools, L1 emulation s/w
Offline discharges responsibility for integration, validation and delivery of releases using the Software Release Plan

Status: Highlights from 2006
New framework
– ~0.5 M lines of CMSSW code developed since April 2005
MTCC
– Supported readout, calibration, local reconstruction and visualisation of data
– components of Event Filter integrated with DAQ/Run Control system
– > 30M events; maximum rate of 200 Hz; transfer rate to CASTOR 20 MB/s
– parasitic stream to serve event display and Data Quality Monitoring consumers
CSA06
– Simulation chain extensively validated in terms of robustness and performance (SVSuite); ~60M events generated
– All main reconstruction components included
  • local, global, physics objects
– Calibration/Alignment: tested ability to pull in constants from Offline DB
– fully functional prototype of production system
  • WM: Tier 0, MC production system (ProdAgent), CMS Remote Analysis Builder (CRAB), which is used for submission of analysis jobs to the grid
  • DM: successful integration of DBS, DLS, PhEDEx, FronTier, Dashboard
Feb 2007: major milestone for validation of CMSSW wrt ORCA
Plans for 2007: Milestones
End Feb: CMSSW validation complete
– Physics-TDR Volume I recovered
End May: MTCC3 & Commissioning tests
– First Global Run foreseen; repeated on a monthly basis
Mid June: HLT exercise complete
– Full trigger table, algorithms, CPU
Beg July: CSA07 Computing Offline and Analysis Exercise
– 5-05 (June): Software complete - HLT, reconstruction, simulation, calibration & alignment, visualisation and analysis
October 15: First physics papers prepared
– Full analysis breakdown; details of how and what for each physics paper
October 15: Offline and Computing ready for Pilot Run
http://cms.cern.ch/iCMS/jsp/page.jsp?mode=cms&action=url&urlkey=CMS_OFFLINE

Offline Release Plan & Production
End Jan: 1_2_2
– Complete Physics validation of CMSSW
– Production of samples for HLT Studies (Geant4.7.1)
Mid Feb: 1_2_3
– Production of samples for Physics Studies (Geant4.7.1)
End Feb: 1_3_0
– all components needed for HLT exercise; no changes in geometry
– Re-Reco processing with 1_3_x in March
End March: 1_4_0
– Changes to geometry allowed; new/improved local reconstruction algorithms
– Improved DQM and HLT software; support for partial releases
– Integration and commissioning tests with 1_4_x in May-June
– Production of SIMU + Digi events for CSA07 starts end of April
Mid May: 1_5_0
– new/improved global reconstruction algorithms and calibration alignment algorithms
– 1_5_x: Production of RECO and AOD for CSA07 can start mid June
– 1_5_x: New cycle of integration and commissioning tests in July-August

Offline Release Plan & Production (continued)
Mid June: 1_6_0
– complete calibration & alignment, visualization, analysis components for CSA07.
– 1_6_x: Production of AlCaReco and Physics streams; T1-T2 involved
– 1_6_x: New cycle of integration and commissioning tests in July-August
Mid July: 1_7_0
– new cycle with improvements/fixes
– 1_7_x: New cycle of integration and commissioning tests August-November
Beg Sept: 1_8_0
– new cycle with improvements/fixes
– Lessons learnt from CSA07 and from integration and commissioning tests
– 1_8_x: New cycle of integration and commissioning tests August-November
Mid Oct: 1_9_0
– 1_9_x: Pilot Run

Towards CSA07
Need to demonstrate performance and capacity of CMS Data and Analysis Model at >50% of 2008 capacity and performance.
– Aim to include DAQ and HLT
– Aim for simultaneous functionality
– Include functionality we didn't include in the CSA06 scope
Schedule: summer 2007

CSA07 Goals: Increase Scale
CMS demonstrated 25% performance in 2006. We have two more factors of 2 to ramp up before data taking in 2008
The data transfer between Tier-0 and Tier-1 reached about 50% of scale
– Very successful test, but some signs of system stress were visible
Job submission rate reached 25%. Another factor of 2 will be possible with the current system
– The next factor of two is less clear
We plan another formal challenge in 2007
A >50% challenge in the summer of 2007
– Extend the system to include the HLT farm
– Add elements like simulation production
– Increase user load
– Run concurrently with other experiments stressing the system

Key tasks and Manpower needs in Offline & Computing
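The scale-up arithmetic quoted on the "CSA07 Goals: Increase Scale" slide can be checked in a few lines. This is a minimal Python sketch and not part of the original talk: the function name `scale_after` is made up for illustration, and the only inputs are the 25% figure for 2006 and the two factors of 2 quoted on that slide.

```python
from fractions import Fraction


def scale_after(start, doublings):
    """Fraction of the 2008 target reached after a number of factor-of-2 steps."""
    return start * 2 ** doublings


# Figure quoted on the slide: 25% of the 2008 scale demonstrated in 2006.
demonstrated_2006 = Fraction(1, 4)

# One factor of 2 gives the CSA07 target; the slide asks for ">50%",
# so this value is the lower edge of that target.
csa07_target = scale_after(demonstrated_2006, 1)

# The second factor of 2 reaches the full scale needed for 2008 data taking.
full_scale_2008 = scale_after(demonstrated_2006, 2)

for label, value in [
    ("2006 (CSA06)", demonstrated_2006),
    ("2007 (CSA07, lower edge)", csa07_target),
    ("2008 (data taking)", full_scale_2008),
]:
    print(f"{label}: {float(value):.0%}")
```

Exact fractions are used so that the two doublings land on exactly 100% with no floating-point rounding.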
Databases
Variety of DB Subsystems
– Construction DB
– Equipment Mgt DB
– Geometry DB
– Configuration DB
– Conditions DB
– Bookkeeping DB (DBS/DLS)
Figure shows OMDS / ORCON / ORCOF deployment model
– All critical CMS data should reside in this set of databases
Create a Core Database Team to plan evolution of DB systems together with subsystems
Conditions = calibration, alignment, slow controls data
[Figure: database deployment model. Labels recoverable from the diagram: Offline Apps; Tier 0; HLT; HCAL; PVSS; Offline Conditions ORCOF ('objects'); Online Conditions ORCON ('objects'); Online Monitoring, Configuration, Conditions OMDS ('tables').]

Core Database Team
Main responsibilities are to:
– increase commonalities between subsystems
– implement and maintain common services and policies
– manage common schedules and milestones
Need new effort to set up a core team comprising:
– A DB coordinator: physicist profile (physics data, analysis)
– Core DB software specialists: C++ and SQL developer profile
– Operations team: DBA profile
  • co-operate with IT + other LHC experiments to establish a 24/7 service

Calibration and Alignment
Delivers executable for prompt calibration and alignment task and for the export of constants to the HLT
Provides forum for discussing global strategies and policies for driving new calibration updates
– Works closely with the Detector Performance Groups
Provides mechanisms for accessing databases and migrating data between them, in collaboration with the Online Project
Specific tasks requiring new effort:
– develop the general alignment strategy utilizing the global alignment approaches (e.g.
Millepede II)
  • How to align >100K parameters with global alignment
– implement global alignment approaches into the work and data flow of prompt alignment/calibration at Tier 0
– work and dataflow management for alignment at Tier 1 / Tier 2 centers
– establish offline monitoring of online calibration constants
  • requires development of new interface in CMSSW

HLT and DQM
Provides infrastructure for executing HLT algorithms in the Filter Farm
New effort needed for commissioning of detector code in Event Filter
– organize and lead online sessions with subdetectors to validate code
– prepare and document validation procedures
– write documentation for shifters
– need to understand many aspects of reconstruction code plus implications of online operation (access to resources/robustness issues)
– become one of the "experts on call" for the filter farm
New effort needed for developing Conditions DB access infrastructure
– Collaborate on tests of various options for access to calibration data from the Filter Farm (direct Oracle access, Frontier/Squid, ...)
– This task needs to converge rapidly on a viable scheme to be used in the Filter Farm in 2008. Once the solution is identified, the person would become responsible for the deployment/maintenance of the required infrastructure in Cessy. Depending on the solution chosen, this may be in collaboration with external teams
Ideally should be resident at CERN for these tasks
HLT and DQM
DQM provides the monitoring environment for safe operation of the HLT software and Filter Farm
– DQM components are also used in offline monitoring
DQM system essential for the commissioning and operation of CMS
– Work with sub-detectors to modularise DQM code and make it more uniform
– Work on commissioning aspects of DQM in online environment
– Manage technical aspects of DQM shift organization/documentation/training
– Liaise with Commissioning and Run Coordination
  • DQM half-day workshop in next CMS Week
DQM development work also required
– Implementation of history plots and development / adaptation of data management tools for DQM data
– Implementation of a "DQM catalog" database
Effort urgently needed to work on the DQM GUI
– Component of IGUANA (event display): debug tool for offline simulation and reconstruction, data analysis, test beams, and detector monitoring

Reconstruction
Delivers executables for event selection in HLT and prompt reconstruction in the Tier 0 (and re-reconstruction in Tier 1)
Oversees development and manages integration of all reconstruction code and RECO data formats
– local, global, high-level physics objects
Tasks requiring effort
– certifying releases used for physics, organizing/testing/integrating
– following the nightly builds; check what went ok/not ok every day, and help developers to fix problems
– make performance optimisation studies (CPU, memory footprint); act as a contact person to the Performance Task Force
Framework
Provides the necessary services for building CMS data processing applications (reconstruction, analysis)
– Module scheduling, event setup, provenance tracking, EDM…
Effort is concentrated at FNAL - people based in Europe (preferably CERN) are needed to work closely with HLT and Tier 0 projects, especially during commissioning
Development effort is needed for several tasks
– Geometry infrastructure; improvements to the DDD (XML-based detector description language) and regression testing and validation of the CMS geometry at every release
– Infrastructure code for the Calibration and Alignment databases
– Development and maintenance of the event setup and parameter set services

Simulation
Delivers full CMS simulation program
– framework that interfaces to geometry, event generators, and Geant4
– Performs tuning and validation, profiling and optimisation studies
– Oversees production workflow of MC data samples
Most urgent tasks requiring effort
– improvement/maintenance of Magnetic Field interface to Geant4, implementation of new field maps, development of validation code
– work on local magnetic field management is needed to speed up full simulation. Needs significant Geant4 expertise.
– GFLASH infrastructure, implementation and tuning of EM and HAD showers to full simulation.
– develop simulation tools/features requested by users, interact and support users
Software Development Tools
Prepares and maintains the software environment for development, release, distribution and deployment
A major task requiring effort relates to development of the Software Validation Suite (SVSuite) for certifying new releases of CMSSW applications
– involves communication with all groups involved: now mainly simulation, reconstruction, ... in the future other groups doing trigger, calib/align, fast simulation, MC/data comparison packages
– Integration/testing of all packages into the MC Production environment
– Communication with computing about production of RelVal samples and with the rest of the software groups on configuration file availability
– Coordination of the software validation process at the time of a release.
– Integration of DQM tools into SVSuite: histogram transfer to server, web publishing, implementation of macros as DQM client programs

Data and Workflow Mgt Tools
Development of tools used in production system
– Data management tools: DBS and DLS, PhEDEx, Frontier
– Workload management tools: CRAB, ProdAgent system, Tier 0
– Monitoring tools, CMS Dashboard
– Web interfaces to enhance end-user usability
Tasks requiring effort
– a developer to work on the Tier 0 workflow system, and its integration with the HLT and Data Management system
– a developer to join the team working on the production system, including various features and improvements: https://twiki.cern.ch/twiki/bin/view/CMS/ProdAgentDevPlanCurrent
– a developer to work on web interfaces to the WM/DM tools

Fast Simulation
Provides a framework for fast simulation and reconstruction
– interaction of particles in the tracker are simulated and shower parametrizations are used in the calorimeters
– standard reconstruction algorithms used for track reconstruction, b-tagging, clustering, jets etc.
Tasks requiring effort
– simulation of the L1 calorimeter trigger; the standard L1 emulator cannot be used, since it relies on the digis, which are not available in the FastSimulation; a parametrization of the L1 response on a trigger tower basis has to be developed
– simulation of the hits in the muon detector; the path and the energy deposit of the muons in the calorimeters have to be computed
– precise tuning of the fast simulation using real (testbeam) data
– detailed comparison of the Fast Simulation output w.r.t. the Full Simulation output over a wide range of energy

Analysis Tools
Provide a software suite integrated with the framework that facilitates data and physics analysis
Organise and manage release of libraries developed in physics analysis projects
Provide frameworks that support batch and interactive analysis
– FWLite, Iguana, Python, PROOF, ROOT
Effort needed for all parts of the work programme including
– porting the FWLite based code to Microsoft's VC++ and work on a Windows 'friendly' installation
– creating a 'provenance viewer' for EDM ROOT files which runs nicely in ROOT

Level 1 Software
The L1 Trigger offline software includes packages that provide bit-level emulation of the trigger electronics, along with tools for interpreting, monitoring and analysing L1 data.
A further task would be to figure out exactly how the HLT Supervisor will communicate with CMSSW modules (e.g. through EventSetup).
Another project would be to develop a prototype offline DB schema for emulator configuration constants.

Integration
N.N. for Integration
I.Fisk, N.N.
Organize the program of work for integrating the software infrastructure (production tools, applications) and data handling systems (facilities, data operations) in preparation for data taking
Senior Physicist/Computer Scientist with computing expertise, operations experience and organizational talents
Presence at CERN: 50% (I.Fisk is 25% at CERN); coordination of remote activities can be done remotely (this was done by M.Ernst)

Data Operations
N.N. for Data Operations
C.Paus, N.N.
Responsible for data processing and executing production workflows.
Overall responsibility to ensure that detector corrections, calibration and alignment constants are readily applied at prompt processing and distributed.
Senior Physicist with computing expertise, operations experience and organizational talents
Presence at CERN: 50% (C.Paus is >75% at CERN); coordination of remote activities from a remote location is feasible

Facility/Infrastructure
N.N. for Facility / Infrastructure Operations
D.Bonacorsi, N.N.
Responsible for providing and maintaining a working distributed computing fabric.
– consistent working environment for the Data operation and the users
– coordination of facilities operation, resource management
– liaison to external projects and organisations involved in this process.
Physicist/Computer Scientist with strong systems and grid systems expertise
– Presence at CERN required 100% (D.Bonacorsi is 50% at CERN)

User Support
N.N. for User Support
K.Lassila-Perini, N.N.
Provide user support for analysis
Helpdesk, triaging of problems
Provide guidance to users for troubleshooting analysis jobs
Organize and provide documentation
Organize tutorials
Define what is needed for users to monitor and track jobs and data in the system
User accounts and VO management
Coordination based at CERN
– team based at CERN
– involvement of SW developers
– involvement of Physicists teaching how to perform analysis
– user support front-end person at each T1/T2
Physicist or Computer Scientist with good data analysis experience, writing skills to assist users
– Presence at CERN: up to 50% (K.Lassila-Perini is 100% at CERN)

Back-up

Relationship to Run Coordination
Commissioning will involve large scale tests of data- and work-flow in steps up to full scale operation in 2008
Run coordination is making a plan for the commissioning and operation of the CMS detector
Effective operation of data-taking requires close coordination with Offline and Computing
Weekly to daily interaction with contacts, in particular
– DQM: monitoring, error diagnostics
– HLT: operation of online filter software
– Calibration and alignment: derivation of constants and population of database
– Reconstruction: identify and solve problems, ensure quality of data
Integration tasks are a critical component of commissioning and run activities (releases, databases)

Meetings

Weekly Offline Meeting Schedule
[Table: weekly meeting grid, Monday to Friday, 08:30 to 19:00. Entries recoverable from the extracted text: Plenary/Joint Physics Offline; Commissioning; Detectors; Offline; L1 Coordination; L2; Joint Comp/Off; Task Forces/Production Tools; COMPUTING; Framework & SW Devel Tools; Simulation/Level1 SW/Fast Simulation; Online Selection HLT&DQM; Reconstruction/Analysis Tools; Calibration & Alignment. The day and time of each entry cannot be recovered.]
Physics Days
[Table: Physics Days grid, Monday to Friday, 08:30 to 18:30. Entries recoverable from the extracted text: Reports from DPGs, alignment & calibration, commissioning news; Analysis Groups presentations; Plenary Physics Meeting (group highlights, analysis approvals); POG and Common analysis meetings; Joint Offline-Physics meeting; Offline-Computing meeting; Physics Coordination Meeting. The day and time of each entry cannot be recovered.]

Joint Offline/Physics Meetings during Physics Days
Joint meeting    Physics Days / event
6 Feb            6-8 Feb
27 Mar           27-29 Mar
24 Apr           Physics Trigger Week
22 May           22-24 May
                 11-15 Jun Annual Review
10 Jul           10-12 Jul
31 Jul           31 Jul-2 Aug
28 Aug           28-30 Aug
23 Oct           Physics Trigger Week
13 Nov           13-15 Nov
Offline and Computing Weeks
– 16 Apr: Offline & Computing Week
– 15 Oct: Offline & Computing Week

CMS weeks
[Table: CMS week grid, Monday to Friday, 8:30 to 19:00. Entries recoverable from the extracted text: Sub-Detectors Parallel; Plenary Technical Coordn; Plenary Computing and Software; Physics Plenary I; Physics Plenary II; OPENING SESSION; CB; Plenary Commissioning; PM Reports; FB; Special MB. The day and time of each entry cannot be recovered.]

CMS Week Meetings
26 Feb - 2 Mar
Plenary Offline and Computing Meeting on Thursday afternoon instead of Thursday morning
Parallel sessions can be held during Tuesday full day, Wednesday and Thursday morning
CMS Tutorial on Tuesday morning

Schedules

Offline/Production Schedule
(each Releases/Milestones entry is followed by the Production Activities listed beside it)
– End Jan: 1_2_1: Production version for HLT Studies
  • Production: 1_2_1: HLT Production and Physics Production Feb-March. Mainly Generation and Simulation step.
– End Jan: 1_2_2: Version to be used for Physics analysis; Validation of CMSSW
– Mid Feb: Complete Physics Validation of CMSSW
– End Feb: 1_3_0: all components needed for HLT exercise; no changes in geometry; Geant4.7.1
  • Production: 1_3_x: re-reprocessing for HLT in March
– End Feb: 1_3_0_Geant4.81: all components needed for HLT exercise; no changes in geometry; Geant4.7.1
  • Production: 1_3_x: Production of Physics Samples (30M/mth), Reco Step in March
  • Production: 1_3_x_geant4.81: Validation samples
– End March: 1_4_0: Changes to geometry allowed. New/improved local reconstruction algorithms. Improved DQM and HLT SW. This release should resolve dependency problems so that Online releases can be made without Geant4 components.
  • Production: 1_4_x: HLT test in April
  • Production: 1_4_x: Integration and Commissioning tests in May
  • Production: 1_4_x: Production of SIMU end of April

Offline/Production Schedule
– Mid May: 1_5_0: new/improved global reconstruction algorithms and calibration alignment algorithms
  • Production: 1_5_x: Production of RECO and AOD can start mid June
  • Production: T0 Production for CSA07
  • Production: 1_5_x: New cycle of integration and commissioning tests in July-August
– Mid June: 1_6_0: complete calibration & alignment, visualization and analysis components needed for CSA07.
  • Production: 1_6_x: Production of AlCaReco and Physics streams. T1-T2 involved
  • Production: 1_6_x: New cycle of integration and commissioning tests in July-August
– Mid July: 1_7_0: new cycle with improvements/fixes. This version will also need an Online Release. Further updates on Geometry and Data format
  • Production: 1_7_x: New cycle of integration and commissioning tests August-November

Offline/Production Schedule
– Beg Sept: 1_8_0: new cycle with improvements/fixes. Lessons learnt from CSA07 and integration and commissioning tests
  • Production: 1_8_x: New cycle of integration and commissioning tests August-November and Pilot Run.
– Mid Oct: 1_9_0
  • Production: 1_9_x: Pilot Run
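The release tags in the schedule appear to follow a `major_minor_patch` pattern, with `1_3_x` standing for any patch release in the 1_3 cycle. The following is a minimal Python sketch of how such tags could be ordered and matched to a cycle. The helper names `parse_release` and `in_cycle` and the example tag `1_3_2` are hypothetical and are not CMS tools. The dates are copied from the schedule, and tags carrying a suffix such as `1_3_0_Geant4.81` are deliberately not handled.

```python
def parse_release(tag):
    """Split a purely numeric tag such as '1_3_0' into a tuple of integers."""
    return tuple(int(part) for part in tag.split("_"))


def in_cycle(tag, cycle):
    """True if a tag like '1_3_2' belongs to a cycle written as '1_3_x'."""
    major, minor, _ = cycle.split("_")
    return parse_release(tag)[:2] == (int(major), int(minor))


# Planned release dates, copied from the schedule slides.
PLANNED = {
    "1_9_0": "Mid Oct",
    "1_3_0": "End Feb",
    "1_5_0": "Mid May",
    "1_4_0": "End March",
    "1_8_0": "Beg Sept",
    "1_6_0": "Mid June",
    "1_7_0": "Mid July",
}

# Sorting on the parsed tuple keeps numeric order even for a two-digit
# component, where plain string sorting would misplace the tag.
ordered = sorted(PLANNED, key=parse_release)

for tag in ordered:
    print(f"{tag}: {PLANNED[tag]}")

# '1_3_2' is an illustrative patch tag, not one listed in the schedule.
print(in_cycle("1_3_2", "1_3_x"))
```

Comparing parsed tuples rather than raw strings is the main design choice here; it is what keeps a tag such as `1_10_0` after `1_9_0`.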
Updated Milestones for LHCC