BNL Facility Status and Service Challenge 3

HEPiX
Karlsruhe, Germany
May 9-13, 2005
Zhenping Liu, Razvan Popescu, and Dantong Yu
      USATLAS/RHIC Computing Facility
           Brookhaven National Lab
Outline

- Lessons learned from SC2
- Goals of BNL Service Challenges
- Detailed SC3 planning
  - Throughput Challenge (simple)
    - Network upgrade plan
    - USATLAS dCache system at BNL
    - MSS
    - Tier 2 integration planning
    - File Transfer System
    - Tier 2 sites' involvement
  - Service Phase Challenge to include ATLAS applications (difficult)

One-day data transfer of SC2

[Figure: plot of one day of data transfers during SC2]
Goals
- Network, disk, and tape service
  - Sufficient network bandwidth: 2 Gbit/sec
  - Quality of service: 150 MByte/sec to storage and up to 60 MByte/sec to tape, done efficiently and effectively (a throughput sketch follows this list)
  - Functionality/services, high reliability, data integrity, high performance

- Robust file transfer service
  - Storage servers
  - File Transfer Software (FTS)
  - Data management software (SRM, dCache)
  - Archiving service: tape servers, tape robots, tapes, tape drives

- Sustainability
  - Weeks in a row of uninterrupted 24/7 operation

- Involve ATLAS experiment applications
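
As a rough sanity check of these targets, here is a small arithmetic sketch in Python (my own back-of-the-envelope numbers, assuming the quoted rates are sustained around the clock):

    # Back-of-the-envelope arithmetic for the SC3 throughput targets.
    SECONDS_PER_DAY = 24 * 3600

    network_gbit_s = 2.0     # WAN bandwidth target, Gbit/s
    storage_mb_s = 150.0     # target rate into the storage element, MByte/s
    tape_mb_s = 60.0         # target rate onto tape, MByte/s

    # 1 Gbit/s = 125 MByte/s, ignoring protocol overhead.
    network_mb_s = network_gbit_s * 1000 / 8
    print(f"2 Gbit/s WAN   ~= {network_mb_s:.0f} MByte/s raw capacity")

    # Daily volumes implied by the sustained rates (decimal TB).
    print(f"Storage target ~= {storage_mb_s * SECONDS_PER_DAY / 1e6:.1f} TB/day")
    print(f"Tape target    ~= {tape_mb_s * SECONDS_PER_DAY / 1e6:.1f} TB/day")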
BNL Network Topology

[Figure: BNL network topology diagram]
Network Upgrade Status and Plan

- WAN connection: OC-48.

- Dual GigE links connect the BNL border router to the ESnet router.

- LAN upgrade from 1 GigE to 10 GigE is in progress; target completion: mid-June 2005.




BNL Storage Element: dCache System
- Allows transparent access to a large number of data files distributed over disk in dCache pools or stored on HPSS.
  - Provides users with one unique namespace for all data files.

- Significantly improves the efficiency of the connected tape storage system through caching, i.e. gather & flush, and scheduled staging techniques.
- Clever selection mechanism (a simplified sketch follows this list):
  - The system determines whether the file is already stored on one or more disks or on HPSS.
  - The system determines the source or destination dCache pool based on the storage group and the network mask of the client, as well as CPU load, disk space, and the configuration of the dCache pools.
- Optimizes throughput to and from data clients and balances the load of the connected disk storage nodes by dynamically replicating files upon detection of hot spots.
- Tolerant of failures of its data servers.
- Supports various access protocols, including GridFTP, SRM, and dCap (dccp).
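
To make the selection mechanism above concrete, here is a toy Python sketch of pool selection; the field names, eligibility rules, and cost weights are my own assumptions for illustration and do not reproduce dCache's actual cost module:

    # Illustrative only: a toy dCache-style pool selection.
    # Field names, eligibility rules, and weights are assumptions, not dCache internals.
    from dataclasses import dataclass
    from ipaddress import ip_address, ip_network

    @dataclass
    class Pool:
        name: str
        storage_groups: set       # storage groups this pool accepts
        client_networks: list     # client networks this pool serves, e.g. ["130.199.0.0/16"]
        cpu_load: float           # 0.0 (idle) .. 1.0 (saturated)
        free_fraction: float      # fraction of pool space still free

    def eligible(pool, storage_group, client_ip):
        """Candidate pools must accept the storage group and serve the client's network."""
        in_net = any(ip_address(client_ip) in ip_network(n) for n in pool.client_networks)
        return storage_group in pool.storage_groups and in_net

    def cost(pool):
        """Lower is better: combine CPU load and space pressure (weights are arbitrary)."""
        return 0.7 * pool.cpu_load + 0.3 * (1.0 - pool.free_fraction)

    def select_pool(pools, storage_group, client_ip):
        candidates = [p for p in pools if eligible(p, storage_group, client_ip)]
        return min(candidates, key=cost) if candidates else None

A caller would pass the list of configured pools plus the client's storage group and IP address, and fall back to staging from HPSS when no eligible pool is found.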

BNL dCache Architecture

[Figure: BNL dCache architecture diagram. DCap, GridFTP, and SRM clients reach the system through DCap, GridFTP, and SRM doors over the control channel, and exchange data with the pools directly over the data channel. The dCache system contains external and internal write pools and read pools, coordinated by the Pnfs Manager and Pool Manager, with HPSS and the Oak Ridge batch system behind the write pools.]
dCache System, Continued
- The BNL USATLAS dCache system works as a disk caching front end to the Mass Storage System.

- Current configuration: 72 nodes in total with 50.4 TB of disk:
  - Core server nodes and a database server
  - Internal/external read pools: 65 nodes, 49.45 TB
  - Internal write pool nodes: 4 nodes, 532 GB
  - External write pool nodes: 2 nodes, 420 GB
  - dCache version: V1.2.2.7-2
  - Access protocols: GridFTP, SRM, dCap, gsi-dCap

Immediate dCache Upgrade

- The existing dCache has 50 TB of data storage.

- 288 new dual-CPU 3.4 GHz Dell hosts will be on site on May 11, 2005:
  - 2 x 250 GB SATA drives
  - 2 GB memory and dual on-board Gigabit ports

- These hosts will be split into more than two dCache systems.

- One of the systems will be used for SC3. Its disk pool nodes will be connected directly to the ATLAS router, which has a 10 Gbit uplink.

- SL3 will be installed on all of these Dell hosts.

- File system to be installed: XFS; tuning is needed to improve disk utilization per host. (A capacity sketch follows this list.)
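
A quick check of the raw capacity these hosts bring (my own arithmetic; usable dCache space will be lower after file-system overhead and the split into multiple instances):

    # Raw disk capacity added by the new Dell hosts (decimal units, before any overhead).
    hosts = 288
    drives_per_host = 2
    drive_gb = 250

    raw_tb = hosts * drives_per_host * drive_gb / 1000.0
    print(f"Raw capacity: {raw_tb:.0f} TB across {hosts} hosts")   # -> 144 TB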
BNL ATLAS MSS

- Two 9940B tape drives. The data transfer rate is between 10 and 30 MB/second. These two tape drives are saturated by daily USATLAS production.

- 200 GB tapes.

- We need to borrow tape drives from other BNL in-house experiments in July to meet the 60 MByte/second performance target (a drive-count sketch follows this list).
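
A rough estimate of how many drives the 60 MByte/s target implies, assuming the quoted 10-30 MB/s range is per drive (my own arithmetic; mount and seek overheads would push the real number higher):

    import math

    # Drives needed to sustain the 60 MByte/s tape target at various per-drive rates.
    target_mb_s = 60
    for per_drive_mb_s in (10, 20, 30):      # 9940B range quoted on the slide
        drives = math.ceil(target_mb_s / per_drive_mb_s)
        print(f"At {per_drive_mb_s} MB/s per drive: {drives} drive(s) needed")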




File Transfer Service

ATLAS sees benefits in trying gLite FTS as soon as possible:
- To see ASAP whether it meets the data transfer requirements
- Data transfer requires significant effort to ramp up; learn from SC2
- Helps with debugging gLite FTS
- Transfers between Tier 0, Tier 1, and a few Tier 2 sites
- Real usage with test data
- A uniform low-level file transfer layer interfacing with several implementations of SRM: dCache/SRM, DPM, even vanilla GridFTP? (A sketch of such a layer follows this list.)
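
To illustrate what such a uniform low-level layer could look like, here is a minimal Python sketch; the class names, endpoint URLs, and stubbed behaviour are hypothetical and this is not the gLite FTS interface:

    # Hypothetical sketch of a uniform transfer layer over different storage back ends.
    # Class names and endpoint URLs are invented for illustration; this is not the FTS API.
    from dataclasses import dataclass

    @dataclass
    class TransferJob:
        source: str        # e.g. an SRM URL at the Tier 1
        destination: str   # e.g. an SRM URL at a Tier 2

    class TransferBackend:
        """Common interface the higher-level service codes against."""
        def submit(self, job: TransferJob) -> str: ...
        def status(self, job_id: str) -> str: ...

    class SrmBackend(TransferBackend):
        """Would wrap an SRM copy client (dCache/SRM or DPM) behind the common interface."""
        def submit(self, job):
            print(f"submitting {job.source} -> {job.destination} via SRM")  # stub
            return "job-0001"
        def status(self, job_id):
            return "Done"

    class GridFtpBackend(TransferBackend):
        """Would wrap a plain GridFTP transfer for 'vanilla GridFTP' endpoints."""
        def submit(self, job):
            print(f"submitting {job.source} -> {job.destination} via GridFTP")  # stub
            return "job-0002"
        def status(self, job_id):
            return "Done"

    # The caller never needs to care which SRM implementation sits behind an endpoint.
    backend = SrmBackend()
    job_id = backend.submit(TransferJob("srm://tier1.example/atlas/file1",
                                        "srm://tier2.example/atlas/file1"))
    print(backend.status(job_id))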

Tier 2 Plans

- Choose two USATLAS Tier 2 sites.

- Each site will deploy a DPM server as its storage element with an SRM interface.

- gLite FTS (file transfer service) will transfer data from BNL to each of the two chosen sites at a rate of 75 MByte/second (see the bandwidth check after this list).

- Files will be kept in the BNL Tier 1 dCache until they have been read once by the Tier 2 centers.
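
A quick check (my own arithmetic) of what two simultaneous 75 MByte/s streams would mean for the outbound WAN capacity:

    # Aggregate outbound rate to the two Tier 2 sites, compared with the 2 Gbit/s WAN target.
    sites = 2
    per_site_mb_s = 75
    aggregate_mb_s = sites * per_site_mb_s
    aggregate_gbit_s = aggregate_mb_s * 8 / 1000.0
    print(f"{aggregate_mb_s} MByte/s total ~= {aggregate_gbit_s:.1f} Gbit/s of the 2 Gbit/s WAN")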


ATLAS and SC3 Service Phase

- September
  - ATLAS release 11 (mid-September)
    - Will include use of the conditions database and COOL
  - We intend to use COOL for several sub-detectors
    - Not clear how many sub-detectors will be ready
    - Also not clear how we will use COOL
      - Central COOL database or COOL distributed database
  - Debug scaling for distributed conditions data access: calibration/alignment, DDM, event data distribution and discovery
  - Tier 0 exercise testing
  - A dedicated server is requested for the initial ATLAS COOL service
  - Issues around FroNtier are still under discussion and ATLAS is interested
  - Data can be thrown away
ATLAS & SC3 Service Phase
- April-July: Preparation phase
  - Test of FTS (“gLite-SRM”)
  - Integration of FTS with DDM

- July: Scalability tests (commissioning data; Rome Physics Workshop data)

- September: Test of new components and preparation for real use of the service
  - Intensive debugging of COOL and DDM
  - Prepare for “scalability” running

- Mid-October
  - Use of the service
  - Scalability tests of all components (DDM)
  - Production of real data (Monte Carlo; Tier-0; …)

- Later
  - “Continuous” production mode
  - Re-processing
  - Analysis


Conclusion
- The storage element and network upgrades are going well.

- The whole system chain will be tuned before the end of May.

- Waiting for the FTS software that will control the data transfers.

- Talking with USATLAS Tier 2 sites about participating in SC3.

- Discussing how the experiment software can be involved.




        Thank You!





				