CMS Grid Activities in the United States
I. Fisk5, J. Amundson3, P. Avery7, L. A. T. Bauerdick3, J. Branson5, J.J. Bunn1, R. Clare6, I. Gaines3, G. Graham3, T.M. Hickey1, K. Holtman1, I. Legrand1, V. Litvin1, V. S. Muzaffar3, H.B. Newman1, V. O'Dell3, A. Samar1, S. Singh1, C. Steenberg1, D. Stickland4, K. Stockinger2, H.Wenzel3, T. Wildish4, R. Wilkinson1 (for the CMS collaboration)

1(Caltech), 2(Caltech/CERN), 3(Fermilab), 4(Princeton), 5(UCSD), 6(UC Riverside), 7(Univ. of Florida)

The CMS groups in the USA are actively involved in several grid-related projects, including the DoE-funded Particle Physics Data Grid (PPDG) and the NSF-funded Grid Physics Network (GriPhyN). We present developments of: the Grid Data Management Pilot (GDMP) software; a Java Analysis Studio-based prototype remote analysis service for CMS data; tools for automating job submission schemes for large-scale distributed simulation and reconstruction runs for CMS; modeling and development of job scheduling schemes using the MONARC toolkit; and a robust execution service for distributed processors. The deployment and use of these tools at prototype Tier1 and Tier2 computing centers in the USA is described.

Keywords: CMS, LHC, GRID, regional center, USA

1. Introduction
CMS has adopted a distributed computing model for data analysis, event simulation, and event reconstruction in which two-thirds of the total computing resources are located at regional centers. The unprecedented size of the LHC collaborations and the complexity of the computing task require new approaches that allow globally dispersed physicists to participate efficiently. The Grid computing projects, including PPDG, GriPhyN, and DataGrid, are potentially excellent sources of tools to help CMS facilitate distributed computing. As the largest single-country participant in CMS, and a long way from CERN, the US may have the most to gain from the successful implementation of efficient distributed computing, and it consequently devotes considerable resources to the Grid projects. CMS needs a combination of tools to efficiently exploit the global grid of computing resources being planned: tools to submit jobs, tools to move data and results, and tools to decide when it is appropriate to perform each.

In this paper we summarize some of the US CMS Grid development efforts. We give details of the CMS development of a robust execution service, which is better suited for use in a distributed computing environment than conventional queuing systems, and describe progress on the Grid Data Management Pilot (GDMP), which CMS has used successfully in a production environment to replicate databases between regional centers. We then discuss tools for automating job submission for large-scale distributed simulation, and eventually data reconstruction, which integrate elements of job submission and data replication, and introduce CMS work toward distributed analysis. Finally, we close with a description of the US development of prototype computing centers. CMS has a tiered architecture, with Tier0, Tier1, and Tier2 regional centers; the US is committed to building one Tier1 center and five Tier2 centers.
Due to the complex nature of some of the subjects and a severely limited page count, many of the advanced details are left to the references and web links.

2. A Robust Execution Service
A member of CMS in the US has been developing a robust execution service called RES. The ultimate goal of this project is a tool that helps high energy physicists use computational resources distributed worldwide effectively. A number of tools available today are aimed at similar goals (LSF, PBS, DQS, Condor, etc.), but none is quite adequate. The first step in the project was to identify a few technical issues, important to high energy physics experiments, that have not been addressed by existing tools: keeping track of large numbers of long-running jobs, supporting collaboration among multiple physicists, conserving limited network bandwidth, maintaining high availability, and tolerating the partition failures that are common in wide-area networks. RES addresses the first two issues by introducing the notion of sessions. A session serves as a container for multiple related jobs. Jobs in the same session can be examined or killed with a single command, and multiple physicists can share a session, with RES ensuring safe access to it. Bandwidth conservation is addressed by allowing the selection of processors based not only on load or type but also on where data resides. High availability is addressed by running multiple copies of the servers. The servers share some state so that resources are neither over-allocated nor under-utilized; the shared state is replicated at each server and kept consistent. In an environment where network partitions can occur, availability and consistency are at odds with each other. RES addresses this by allowing multiple partitions to coexist temporarily with potentially divergent states, then merging the states consistently as soon as the partitions are healed. The implementation of RES uses two existing technologies: the functional language ML and a group communication toolkit. ML is strongly typed and garbage collected; these two features make programs written in ML much less prone to bugs than those written in more popular languages. Group communication helps programmers write distributed programs; its two important concepts are group membership and ordered communication. A prototype of RES exists and has been tested on the California Tier2 prototype center, and implementation of a new prototype aimed at the larger Tier1/Tier0 centers has just begun. More detailed information about RES is found in references [1,2].
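The session abstraction and data-aware processor selection described above can be illustrated with a small sketch. All class and method names here are hypothetical, chosen only to mirror the concepts in the text; they are not the actual RES interface, and the real system is written in ML rather than Python.

```python
# Illustrative sketch (not the RES API): sessions group related jobs so
# they can be examined or killed with one command, and processors are
# selected by data locality as well as load to conserve bandwidth.

class Job:
    def __init__(self, job_id, input_files):
        self.job_id = job_id
        self.input_files = set(input_files)
        self.state = "queued"

class Processor:
    def __init__(self, name, local_files, load=0.0):
        self.name = name
        self.local_files = set(local_files)
        self.load = load

class Session:
    """Container for related jobs, shareable by several physicists."""
    def __init__(self, name):
        self.name = name
        self.jobs = []

    def submit(self, job, processors):
        # Prefer the processor already holding the most input data,
        # breaking ties by lower load -- this conserves network bandwidth.
        best = max(processors,
                   key=lambda p: (len(p.local_files & job.input_files), -p.load))
        job.state = "running@" + best.name
        self.jobs.append(job)
        return best

    def kill_all(self):
        # A single command acts on every job in the session.
        for job in self.jobs:
            job.state = "killed"

procs = [Processor("cern", {"run1.db"}, load=0.9),
         Processor("ucsd", {"run1.db", "run2.db"}, load=0.2)]
session = Session("muon-study")
chosen = session.submit(Job("j1", {"run1.db", "run2.db"}), procs)
print(chosen.name)   # ucsd: it already holds both input files
session.kill_all()
print(session.jobs[0].state)   # killed
```

The locality-first ordering in `submit` reflects the point made above: in a wide-area network, moving a job to the data is usually cheaper than moving the data to the job.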

3. The Grid Data Management Pilot
GDMP was started as a pilot project to evaluate the applicability of available Grid tools to data replication. The first releases could replicate only Objectivity database files, but were able to publish a catalog of available files, securely authenticate users using Globus certificates, transfer files between remote sites, and integrate them into the local database federation. The pilot was widely used in the CMS production environment, and its success has led PPDG and DataGrid to work together with CMS to improve the project, with the aim of creating a data replication solution common to several experiments.

GDMP is based on a flexible, modular architecture and is built using Globus middleware tools. The core modules are Control Communication, Request Manager, Security, Database Manager, and Data Mover. The Control Communication module handles communication between the clients and the server. The Request Manager generates and executes requests on either the client or server side; it currently contains several request generator/executor pairs, and more can be added to customize the functionality. The Security module handles authentication and is built around the Globus Security Infrastructure. The Database Manager interacts with the Database Management System; it is used to query the database catalog and integrate incoming database files into the local federation. The final module is the Data Mover, which handles the data transfer [3].

During the Fall 2000 production, CMS successfully transferred about 1 TB of Objectivity database files between CERN and the regional centers using GDMP, and the experience was fed back to the development team. Recently GDMP was integrated with the PPDG-developed Hierarchical Resource Manager, HRM, to provide a more consistent interface to mass storage systems. The future plans are quite promising. The GDMP developers have implemented the Globus replica catalogue in GDMP, which allows the transfer of files independent of the file format. The general release is expected in September and represents a big step toward the goal of a general grid data replicator. There is an ambitious plan to implement specialized functionality for files of arbitrary types (e.g. ROOT, ZEBRA files) and for Objectivity, including the ability to check a database or lock server, to create a new federation, and to load or update a federation schema. Much more information about the design, status, and plans of GDMP is available in references [3,4,5].
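The publish/replicate cycle that the GDMP modules cooperate on can be sketched as follows. This is a toy model of the roles described above (catalog publication, certificate-based authentication, data movement and federation integration); the names are illustrative and the real GDMP interface, built on Globus middleware, looks nothing like plain Python classes.

```python
# Toy sketch of a GDMP-style replication flow (not the real GDMP API).
# Each Site plays server and client: it publishes files to a catalog,
# authenticates peers by certificate subject, and pulls missing files.

class Site:
    def __init__(self, name, trusted_subjects):
        self.name = name
        self.catalog = {}                      # filename -> size (published)
        self.files = {}                        # local storage
        self.trusted = set(trusted_subjects)   # stand-in for the Security module

    def publish(self, filename, data):
        # Database Manager role: store the file and make it visible to peers.
        self.files[filename] = data
        self.catalog[filename] = len(data)

    def replicate_from(self, remote, subject):
        # Control Communication role: authenticate, then compare catalogs.
        if subject not in remote.trusted:
            raise PermissionError("certificate subject not authorized")
        missing = [f for f in remote.catalog if f not in self.catalog]
        for f in missing:
            # Data Mover role: transfer and integrate into the local federation.
            self.publish(f, remote.files[f])
        return missing

cern = Site("cern", trusted_subjects={"/O=Grid/CN=fnal-prod"})
fnal = Site("fnal", trusted_subjects=set())
cern.publish("events_001.db", b"x" * 10)
moved = fnal.replicate_from(cern, "/O=Grid/CN=fnal-prod")
print(moved)   # ['events_001.db']
```

The catalog-diff step is the key idea: a subscriber never re-transfers files it already holds, which is what made replicating a terabyte of Objectivity files between CERN and the regional centers practical.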

4. Tools for Job Automation
CMS needs large amounts of Monte Carlo simulation to validate the detector reconstruction software and to complete the trigger studies and the Physics Technical Design Report. While CMS has been successful at building dedicated prototype computing facilities and obtaining time at shared facilities, staff to run event production are in short supply, and considerable expertise is required to be successful. For both the short and the long term, CMS needs tools that automate production to make it easier, more efficient, and less manpower intensive. Groups in the US have pursued two development programs: a short-term proof-of-concept system and a longer-term development effort. The proof-of-concept system, described in references [6,7], when implemented with the full CMS production chain allowed one production manager to utilize facilities at Los Alamos, Argonne, the University of Wisconsin, and the California CMS Tier2 prototype. The second effort, the Monte Carlo Production system (MOP), seeks to integrate existing grid tools with CMS job specification tools to form a modular automated production system. MOP consists of three main modules: a job manager, a queue manager, and a file manager. The job manager is the most application-specific module; it converts a user-specified English description into job scripts. The queue manager, which is based on Condor-G, controls the submission of jobs to the regional facilities; one of the advantages of Condor-G is that it interfaces to a number of locally installed queuing systems. Finally, the file manager, which uses GDMP, handles the transfer of input files to the processors and of output files back to the central archive. MOP is currently being deployed as a test between Fermilab and the University of Wisconsin, and other US facilities should be integrated by the end of the summer. More information on the design and current status of MOP can be found in references [8,9].
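The three MOP modules form a simple pipeline, which can be sketched as below. Each module is reduced to a plain function and the job specification to a dict; all names, the `cmsim` command line, and the site list are illustrative stand-ins, since the real system delegates to Condor-G and GDMP rather than to Python functions.

```python
# Rough sketch of the MOP pipeline (illustrative, not the real MOP code):
# job manager -> queue manager (Condor-G stand-in) -> file manager (GDMP stand-in).

def job_manager(spec):
    # Convert a user-level production request into per-job scripts.
    return [f"cmsim --dataset {spec['dataset']} "
            f"--events {spec['events_per_job']} --seed {i}"
            for i in range(spec['njobs'])]

def queue_manager(scripts, sites):
    # Condor-G stand-in: distribute jobs round-robin over the facilities,
    # leaving the local queuing system at each site to Condor-G itself.
    return [(sites[i % len(sites)], script)
            for i, script in enumerate(scripts)]

def file_manager(assignments, archive):
    # GDMP stand-in: record that each job's output must flow back
    # to the central archive.
    return [(site, script, archive) for site, script in assignments]

spec = {"dataset": "jets", "events_per_job": 500, "njobs": 4}
plan = file_manager(queue_manager(job_manager(spec), ["fnal", "wisconsin"]),
                    archive="fnal-enstore")
print(len(plan))                 # 4
print(plan[0][0], plan[1][0])    # fnal wisconsin
```

Keeping the three stages as separable modules is what makes MOP "modular" in the sense described above: the application-specific job manager can change without touching the Condor-G or GDMP layers.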

5. CMS Analysis and Load Balancing
Considerable effort is currently being expended within CMS to develop tools that facilitate the relatively organized tasks of event production and, eventually, data reconstruction. The much more chaotic environment of data analysis is more difficult, however, and is only now beginning to be addressed. Work has begun in the US on a remote analysis server based on Java Analysis Studio; a detailed description is given in CHEP 3-044. The work with JAS begins to define a reasonable interface for remote physicists to submit analysis jobs and perform analysis tasks, but the problem of how to efficiently run analysis jobs submitted by hundreds of remote physicists remains open. In CMS, the data required for an analysis job may not be at the facility where the job was originally submitted, it may not be on a system with sufficient accessible computing resources, and it may not even all be at the same facility. Fairly complex systems may be needed to efficiently assign jobs to computing resources and to move data or jobs appropriately. Work is progressing on self-organizing neural networks to complete this task. This approach allows the system to evolve and improve itself dynamically, learning from "past experiences". Clustering data and forming correlations in the high-dimensional input space is done using a growing self-organizing network. The structure is not predefined, and the addition and removal of neurons is handled as part of the learning process. The connections within the network are assigned ages, and the network can recover from incorrect or incomplete input by removing connections that are not reestablished over a period of time. CMS efforts on the MONARC simulation toolkit were instrumental in allowing this type of development to proceed. A complete description of the project is available in references [10,11,12].
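The connection-aging mechanism described above can be illustrated in isolation. The sketch below shows only the edge-aging and pruning step, in the spirit of growing self-organizing networks; node insertion and the full learning rule are omitted, and the node names, `max_age` value, and `AgingNetwork` class are all illustrative assumptions, not the actual SONN scheduler.

```python
# Toy illustration of connection aging in a growing self-organizing
# network (not the CMS SONN code): every edge carries an age, a
# reactivated edge is reset to zero, and edges that are not
# reestablished in time are pruned -- this is how the network recovers
# from incorrect or incomplete input.

class AgingNetwork:
    def __init__(self, max_age=3):
        self.edges = {}          # frozenset({a, b}) -> age
        self.max_age = max_age

    def observe(self, a, b):
        """An input activates nodes a and b together."""
        active = frozenset((a, b))
        self.edges[active] = 0   # refresh (or create) the active connection
        for edge in list(self.edges):
            if edge != active:
                self.edges[edge] += 1
                if self.edges[edge] > self.max_age:
                    del self.edges[edge]   # not reestablished in time

net = AgingNetwork(max_age=2)
net.observe("cpu_slot", "data_at_fnal")
for _ in range(3):
    net.observe("cpu_slot", "data_at_cern")   # stale correlation ages out
print(len(net.edges))   # 1
```

Because pruning happens as a side effect of normal learning steps, the network needs no separate clean-up pass: correlations that stop recurring simply age out, which is what lets the scheduler adapt as the load pattern changes.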

6. Prototype Computing Facility Development
The US is involved in the development of one Tier1 and two Tier2 prototype computing facilities. The CMS plan calls for one Tier0 facility, located at CERN, with one-third of the computing resources; five Tier1 facilities (four located away from CERN), which together provide the second third; and 25 Tier2 facilities located away from CERN with the remaining computing resources. The development of computing prototypes serves several purposes. R&D is performed on the distributed computing model and on strategies for production processing and data analysis; software tools are tested and evaluated; and hardware for computing, storage, and networking is evaluated and benchmarked. This gives a starting point for extrapolating the resources and cost required to complete the proposed computing task. The prototypes also serve as dedicated production facilities for CMS, helping with production and building remote expertise, and as centers for prototypical US-based analysis communities. The three centers constructed in the US are located at Fermilab, the site of the US Tier1 facility, and in Southern California and Florida, the proposed permanent Tier2 sites. The Fermilab and California facilities each consist of 40 dual-CPU Linux computation nodes for batch farm use, and Fermilab is expected to have a dedicated user analysis facility soon. Florida is the largest US facility for CMS, with 76 dual-CPU nodes. All facilities have a few TB of disk storage and access to tape storage. More information on the prototype effort can be found in references [13,14].

References
[1] Takako M. Hickey and Robbert van Renesse, "An Execution Service for a Partitionable Low Bandwidth Network", in Proceedings of the IEEE Fault-Tolerant Computing Symposium, June 1999.
[2] The Robust Execution Service Homepage.
[3] A. Samar and H. Stockinger, "Grid Data Management Pilot (GDMP): A Tool for Wide Area Replication", IASTED International Conference on Applied Informatics (AI2001), Innsbruck, Austria, February 19-22, 2001.
[4] The GDMP Homepage.
[5] A. Samar and H. Stockinger, "Grid Data Management Pilot (GDMP) User Guide for GDMP V1.2.2".
[6] S. Koranda and V. Litvin, "Infrastructure for CMS Production Runs on NCSA/Alliance Resources: A Prototype", presentation.
[7] V. Litvin et al., "Grid Infrastructure for CMS Production on Alliance Resources", presentation to NCSA.
[8] J. Amundson and G. Graham, "MonteDisPro: A System for CMS Monte Carlo Distributed Production".
[9] J. Amundson, "MOP Status", presentation to PPDG.
[10] I. Legrand and H. Newman, "A Self Organizing Neural Network for Job Scheduling in Distributed Systems", CMS Note 2001/009.
[11] I. Legrand and H. Newman, "A Self Organizing Neural Network for Job Scheduling in Distributed Systems", contribution to ACAT 2000.
[12] The MONARC SONN Simulation Page.
[13] California US CMS Tier2 Center Prototype Homepage.
[14] US CMS Fermilab Computing Page.
