
An Atlas Computing Facility at BNL

U.S. ATLAS Computing Facilities

(Overview)





Bruce G. Gibbard



Brookhaven National Laboratory



Review of U.S. LHC Software and Computing Projects



Fermi National Laboratory

November 27-30, 2001

Outline

• US ATLAS Computing Facilities Definition
  ◆ Mission
  ◆ Architecture & Elements
• Motivation for Revision of the Computing Facilities Plan
  ◆ Schedule
  ◆ Computing Model & Associated Requirements
  ◆ Technology Evolution
  ◆ Tier 1 Budgetary Guidance
• Tier 1 Personnel, Capacity, & Cost Profiles for New Facilities Plan





B. Gibbard Review of US LHC Software and Computing Projects 27-30 November, 2001 2

US ATLAS Computing Facilities Mission





• Facilities procured, installed and operated
  ◆ …to meet U.S. “MOU” obligations to ATLAS
    ▪ Direct IT support (Monte Carlo generation, for example)
    ▪ Support for detector construction, testing, and calibration
    ▪ Support for software development and testing
  ◆ …to enable effective participation by US physicists in the ATLAS physics program!
    ▪ Direct access to and analysis of physics data sets
    ▪ Simulation, re-reconstruction, and reorganization of data as required to complete such analyses








Elements of US ATLAS Computing Facilities

• A Hierarchy of Grid Connected Distributed Resources Including:
  ◆ Tier 1 Facility Located at Brookhaven – Rich Baker / Bruce Gibbard
    ▪ Operational at < 0.5% level
  ◆ 5 Permanent Tier 2 Facilities (to be Selected in April ’03)
    ▪ 2 Prototype Tier 2’s selected earlier this year and now active
      ▪ Indiana University – Rob Gardner
      ▪ Boston University – Jim Shank
  ◆ Tier 3 / Institutional Facilities
    ▪ Several currently active; most are candidates to become Tier 2’s
      ▪ Univ. of California at Berkeley, Univ. of Michigan, Univ. of Oklahoma, Univ. of Texas at Arlington, Argonne Nat. Lab.
  ◆ Distributed IT Infrastructure – Rob Gardner
    ▪ US ATLAS Persistent Grid Testbed – Ed May
    ▪ HEP Networking – Shawn McKee
  ◆ Coupled to Grid Projects with designated liaisons
    ▪ PPDG – Torre Wenaus
    ▪ GriPhyN – Rob Gardner
    ▪ iVDGL – Rob Gardner
    ▪ EU Data Grid – Craig Tull




Tier 2’s

◆ Mission of Tier 2’s for US ATLAS
  ▪ A primary resource for simulation
  ▪ Empower individual institutions and small groups to do relatively autonomous analysis using high performance regional networks and more directly accessible and locally managed resources
◆ Prototype Tier 2’s were selected based on their ability to contribute rapidly to Grid architecture development
◆ Goal in future Tier 2 selections will be to leverage particularly strong institutional resources of value to ATLAS
◆ Aggregate of the 5 Tier 2’s is expected to be comparable to Tier 1 in CPU and disk capacity available for analysis




US ATLAS Persistent Grid Testbed



[Figure: map of the US ATLAS Persistent Grid Testbed sites and their network connections]
Sites shown: U Michigan, Boston University, UC Berkeley / LBNL-NERSC, Argonne National Laboratory, Brookhaven National Laboratory, Oklahoma University, Indiana University, University of Texas at Arlington
Network labels shown: ESnet, MREN, NPACI, Abilene, CalREN, NTON
Legend: Prototype Tier 2s; HPSS sites






Evolution of US ATLAS Facilities Plan

• In Response to Changes or Potential Changes in
  ◆ Schedule
  ◆ Computing Model & Requirements
  ◆ Technology
  ◆ Budgetary Guidance










Changes in Schedule

• LHC start-up projected to be a year later
  ◆ 2005/2006 → 2006/2007
• ATLAS Data Challenges (DC’s) have, so far, stayed fixed
  ◆ DC0 – Nov/Dec 2001 – 10^5 events
    ▪ Software continuity test
  ◆ DC1 – Feb/Jul 2002 – 10^7 events
    ▪ ~1% scale test
  ◆ DC2 – Jan/Sep 2003 – 10^8 events
    ▪ ~10% scale test
    ▪ A serious functionality & capacity exercise
    ▪ A high level of US ATLAS facilities participation is deemed very important








Computing Model and Requirements

◆ Nominal model was:
  ▪ At Tier 0 (CERN)
    ▪ Raw → ESD/AOD/TAG pass done, result shipped to Tier 1’s
  ▪ At Tier 1’s (six anticipated for ATLAS)
    ▪ TAG/AOD/~25% of ESD on disk, tertiary storage for remainder of ESD
    ▪ Selection passes through complete ESD ~monthly
    ▪ Analysis of TAG/AOD/selected ESD/etc. (n-tuples) on disk, with an analysis pass by ~200 users within 4 hours
  ▪ At Tier 2’s (five in U.S.)
    ▪ Data access primarily via Tier 1 (to control load on CERN and transatlantic link)
    ▪ Support ~50 users as above, but frequent access to ESD on disk at Tier 1 is likely
◆ Serious limitations are
  ▪ A month is a long time to wait for the next selection pass
  ▪ Only 25% of ESD is available for event navigation from TAG/AOD during analysis
  ▪ The 25% of ESD on disk will rarely have been consistently selected (once a month) and will be continuously rotating, altering the accessible subset of data




Changes in Computing Model and Requirements (2)

• Underlying problem:
  ◆ Selection pass and analysis event navigation access to ESD is sparse
    ▪ Estimated to be ~1 out of 100 events per analysis
  ◆ ESD is on tape rather than on disk
    ▪ Tape is a sequential medium
      ▪ Must access 100 times more data than needed
    ▪ Tape is expensive per unit of I/O bandwidth
      ▪ As much as 10 times that of disk
  ◆ Thus penalty in access cost relative to disk may be a factor of ~1000
• Solution:
  ◆ Get all ESD on disk
  ◆ Methods for accomplishing this are:
    ▪ Buy more disk at Tier 1 – most straightforward
    ▪ Unify/coordinate use of existing disk across multiple Tier 1’s – more economical
    ▪ Some combination of above – compromise as necessitated by available funding
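The factor of ~1000 quoted on this slide is the product of two independent penalties: reading roughly 100 events from sequential tape for every event actually needed, and paying up to about 10 times more per unit of I/O bandwidth on tape than on disk. The sketch below only restates that arithmetic; the function and parameter names are illustrative assumptions, not part of the facilities plan.

```python
def tape_access_penalty(events_read_per_event_needed: int,
                        tape_to_disk_bandwidth_cost: int) -> int:
    """Relative cost of serving a sparse selection from tape instead of disk.

    events_read_per_event_needed: read amplification on a sequential medium
        (the slide estimates ~100, since ~1 in 100 events is wanted).
    tape_to_disk_bandwidth_cost: cost ratio per unit of I/O bandwidth
        (the slide quotes "as much as 10").
    """
    return events_read_per_event_needed * tape_to_disk_bandwidth_cost


# Slide values: 1-in-100 sparse access, tape bandwidth ~10x the cost of disk.
penalty = tape_access_penalty(events_read_per_event_needed=100,
                              tape_to_disk_bandwidth_cost=10)
print(f"Access cost penalty relative to disk: ~{penalty}x")
```

Because the two factors multiply, halving either one (denser selections, or cheaper tape bandwidth) would only halve the penalty, which is why the slide's proposed remedy is to move the ESD to disk rather than to tune tape access.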






“2007” Capacities for U.S. Tier 1 Options

                      Tape Based    3 Tier 1      Standalone
                      Model         Disk Model    Disk Model
CPU (kSPECint95)      209           329           500
Disk (TBytes)         365           483           1000
Tape (PBytes)         1.85          1.85          1.85
Disk (GBytes/sec)     18.3          18.3          18.3
Tape (MBytes/sec)     802           185           185
WAN (Mbit/sec)        4610          9864          9864

Annotations: 1/3+1/6 of ESD on disk; Add other 2/3 of ESD; ESD pass each month; ESD pass per group each day

• “3 Tier 1” Model (Complete ESD found on disk of U.S. plus 2 other Tier 1’s)
  ◆ Highly dependent on the performance of other Tier 1’s and the Grid middleware and network (transatlantic) used to connect to them
• “Standalone” Model (Complete ESD on disk of US Tier 1)
  ◆ While avoiding above dependencies, is more expensive




Changes in Technology

• No dramatic new technologies
  ◆ Previously assumed technologies are tracking Moore’s Law well
• Recent price performance points from RHIC Computing Facility
  ◆ CPU: IBM procurement - $33/SPECint95
    ▪ 310 Dual 1 GHz Pentium III nodes @ 97.2 SPECint95/Node
    ▪ Delivered Aug 2001
    ▪ $1M fully racked including cluster management hardware & software
  ◆ Disk: OSSI/LSI procurement - $27k/TByte
    ▪ 33 Usable TB of high availability Fibre Channel RAID 5 @ 1400 MBytes/sec
    ▪ Delivered Sept 2001
    ▪ $887k including SAN switch
• Strategy is to project, somewhat conservatively, from these points for facilities design and costing
  ◆ Actually used a 20-month rather than the observed <18-month price/performance halving time for disk and CPU
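The unit prices on this slide follow directly from the two procurements, and the projection strategy amounts to an exponential decay in unit cost with a fixed halving time. The following is a minimal sketch of that arithmetic, assuming a simple continuous halving model; the function names and the 60-month horizon are illustrative assumptions, not figures from the plan.

```python
def unit_cost(total_cost_usd: float, units: float) -> float:
    """Cost per unit of capacity for a single procurement."""
    return total_cost_usd / units


def projected_unit_cost(cost_now: float, months_ahead: float,
                        halving_months: float = 20.0) -> float:
    """Project a unit cost forward, assuming it halves every `halving_months`."""
    return cost_now * 0.5 ** (months_ahead / halving_months)


# CPU point: 310 nodes at 97.2 SPECint95 each for $1M (slide values).
cpu_capacity_specint95 = 310 * 97.2
cpu_cost = unit_cost(1_000_000, cpu_capacity_specint95)   # about $33/SPECint95

# Disk point: 33 usable TB for $887k (slide values).
disk_cost = unit_cost(887_000, 33)                        # about $27k/TByte

# Hypothetical horizon of 60 months, comparing the planning assumption
# (20-month halving) with an 18-month halving time.
horizon = 60
cpu_planned = projected_unit_cost(cpu_cost, horizon, halving_months=20)
cpu_observed = projected_unit_cost(cpu_cost, horizon, halving_months=18)

print(f"CPU today:  ${cpu_cost:.2f}/SPECint95")
print(f"Disk today: ${disk_cost / 1000:.1f}k/TByte")
print(f"CPU in {horizon} months: ${cpu_planned:.2f} (20-month) "
      f"vs ${cpu_observed:.2f} (18-month)")
```

Using the longer halving time yields a higher projected unit cost, which is the sense in which the projection is conservative.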




Changes in Budgetary Assumptions (2)

◆ Assumed Funding Profiles (At Year $K)

Planning Date   FY 01   FY 02   FY 03   FY 04   FY 05   FY 06   FY 07   FY 08
Nov-00          1411    1609    2398    3270    5074    8348
Nov-01          855     839     1600    2500    4600    7000    10600   8000

• For the revised LHC startup schedule, new profile is better
• For ATLAS DC 2, which stayed fixed in ’03, new profile is worse
  ◆ Hardware capacity goals of DC 2 will not be met
  ◆ Personnel intensive facility development may be as much as 1 year behind
• Hope is that another DC will be added, allowing validation of a more nearly fully developed Tier 1 and US ATLAS facilities Grid










Profiles for Standalone Disk Option

• Much higher functionality (than other options) and, given new stretched out LHC schedule, within budget guidance
• Fractions in revised profiles in table below are of a final system which has nearly 2.5 times the capacity of that discussed last year

Year                 2001    2002    2003*   2004    2005    2006    2007
Previous Profiles
  ATLAS                              5%      15%     40%     100%
  US ATLAS           1%      2%      5%      10%     20%     100%
Revised Profiles
  ATLAS *                                    "5%"    18%     45%     100%
  US ATLAS           0.1%    0.2%    1%      3%      10%     30%     100%

* Converted from a funding profile
* DC2
Legend: Region of strictly limited funding






Associated Labor Profile





                             FY '01  FY '02  FY '03  FY '04  FY '05  FY '06  FY '07  FY '08
11/00 Projection (FTE's)     5       7       10      15      25      25      25      25
11/01 Projection* (FTE's)    2.7     4.2     6.5     11      16      22      25      25
Labor Cost (@Yr $K)          419     677     1090    1918    2901    4149    4903    5099
Support Costs (@Yr $K)       50      66      91      141     199     271     313     322
Total Cost (@Yr $K)          469     743     1181    2058    3100    4420    5216    5421

* Not including .5 FTE of PPDG in FY '02-'04
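As a consistency check on the labor table, the total cost row should equal labor plus support, and dividing labor cost by the 11/01 FTE projection gives an implied cost per FTE for each year. The sketch below uses only the table values; the implied per-FTE figures are derived here for illustration and are not quoted on the slide.

```python
years = ["FY01", "FY02", "FY03", "FY04", "FY05", "FY06", "FY07", "FY08"]
fte_nov01 = [2.7, 4.2, 6.5, 11, 16, 22, 25, 25]
labor_k = [419, 677, 1090, 1918, 2901, 4149, 4903, 5099]
support_k = [50, 66, 91, 141, 199, 271, 313, 322]
total_k = [469, 743, 1181, 2058, 3100, 4420, 5216, 5421]

# Labor + support should reproduce the total row to within $1K of rounding.
mismatches = [
    year
    for year, labor, support, total in zip(years, labor_k, support_k, total_k)
    if abs(labor + support - total) > 1
]

# Implied labor cost per FTE in at-year $K (a derived quantity).
cost_per_fte_k = [labor / fte for labor, fte in zip(labor_k, fte_nov01)]

for year, cost in zip(years, cost_per_fte_k):
    print(f"{year}: ~${cost:.0f}K per FTE")
print("Rows that do not add up:", mismatches or "none")
```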










Summary Tier 1 Cost Profile (At Year $K)

                       2001    2002    2003    2004    2005    2006    2007     TOTAL    2008
CPU                      30       -      59     117     305     565    1,316    2,392
Disk                    100       -     118     263     564   1,058    2,446    4,549
Tertiary Storage         55       6      45     140     120     225      305      896
LAN                      79       -      20      20      90     100      250      559
Other Infrastructure     40       -      11      26      53      90      207      427
Sftwr, Lic. & Maint.     50      89     128     165     215     307      443    1,398
Overhead                 35      19      47      80     136     228      455      999
Hardware                389     114     428     811   1,484   2,573    5,422   11,220   2,572
Labor                   469     743   1,181   2,058   3,100   4,420    5,216   17,187   5,421
Total                   857     857   1,609   2,869   4,584   6,992   10,638   28,407   7,993
Guidance                855     839   1,600   2,500   4,600   7,000   10,700   28,094   8,000

• Current plan violates guidance by $370k in FY ’04, but this is a year of some flexibility in guidance
• Strict adherence to FY ’04 guidance would …
  ◆ reduce facility capacity from 3% to 1.5% or staff by 2 FTE’s
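The FY ’04 overage quoted above can be reproduced from the Hardware, Labor, and Guidance rows of the cost profile. The sketch below recomputes each year's total and its difference from guidance; variable names are illustrative, and all figures are the at-year $K values from the table.

```python
years = [2001, 2002, 2003, 2004, 2005, 2006, 2007]
hardware_k = [389, 114, 428, 811, 1484, 2573, 5422]
labor_k = [469, 743, 1181, 2058, 3100, 4420, 5216]
guidance_k = [855, 839, 1600, 2500, 4600, 7000, 10700]

# Positive values mean the plan exceeds guidance for that year.
overage_k = {
    year: hw + lab - guide
    for year, hw, lab, guide in zip(years, hardware_k, labor_k, guidance_k)
}

worst_year = max(overage_k, key=overage_k.get)

for year in years:
    print(f"{year}: plan minus guidance = {overage_k[year]:+d} $K")
print(f"Largest overage: {worst_year} ({overage_k[worst_year]} $K)")
```

The 2004 difference comes out at $369K, matching the roughly $370k on the slide. Every other year is within a few tens of $K of guidance.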








Tier 1 Capacity Profile





                     2001    2002    2003    2004    2005    2006    2007
CPU (kSPECint95)     3       3       6       15      50      150     500
Disk (TBytes)        2       2       8       30      100     300     1,000
Disk (MBytes/sec)    40      40      200     600     2,000   6,000   20,000
Tape (PBytes)        0.01    0.02    0.05    0.09    0.15    0.65    1.85
Tape (MBytes/sec)    10      10      20      20      48      106     212
WAN (Mbits/sec)      155     155     622     622     2488    9952    9952










Tier 1 Cost Profiles



[Figure: chart of Tier 1 Hardware and Labor costs by year, 2001 to 2007, in At Year $K; vertical axis from $- to $6,000K]










Standalone Disk Model Benefits



◆ All ESD, AOD, and TAG data on local disk
  ▪ Enables analysis specific 24 hour selection passes (versus one month aggregated passes) – faster, better tuned, more consistent selection
  ▪ Allows navigation for individual events (to all processed, but not Raw, data) without recourse to tape and associated delay – faster, more detailed analysis of larger consistently selected data sets
  ▪ Avoids contention between analyses over ESD disk space and the need for complex algorithms to optimize management of that space – better result with less effort
  ▪ While prepared to serve appropriate levels of data access to other Tier 1’s, US will not in general be unduly sensitive to the performance of other Tier 1’s or intervening network (transatlantic) and middleware – improved system reliability, availability, robustness and performance




Tier 2 Issues



◆ The high availability of the complete ESD set on disk at the Tier 1 and the associated increased frequency of ESD selection passes will, for connected Tier 2’s (and Tier 3’s), lead to …
  ▪ More analysis activity – (Increasing CPU & Disk utilization)
    ▪ More frequent analysis passes on
    ▪ More and larger usable TAG, AOD and ESD subsets
  ▪ More network traffic into the site from the Tier 1 – (Increasing WAN utilization)
    ▪ Selection results
    ▪ Event navigation into the full disk resident ESD
◆ As in the case of the Tier 1, an additional year of funding before turn-on and the increased effectiveness of “year later” funding contribute to satisfying these increased needs within or near the integrated out year (’05-’07) budget guidance
  ▪ The delay of some ’06 funding to ’07 is required for a better match of profiles




Tier 2 Distribution Of Hardware Cost

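[Figure: pie chart, “Total Tier 2 Hardware Costs”]
Categories: CPU, Disks, Interactive, FireWall, Spec. Purp, LAN, Desktop, Backup, SW, Travel, Videoconf, Supplies, Tapes, Tape HW, Tape LM
Slice values (pairing with categories not preserved): 32%, 24%, 10%, 8%, 5%, 4%, 4%, 3%, 2%, 2%, 2%, 2%, 1%, 1%, 0%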










Tier 1 Distribution Of Hardware Cost



[Figure: pie chart, “Total Procurements Through 2007”]

Disk                    45%
CPU                     23%
Sftwr, Lic. & Maint.    14%
Tertiary Storage        9%
LAN                     5%
Other Infrastructure    4%
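The six percentages on this chart can be cross-checked against the TOTAL column of the Summary Tier 1 Cost Profile. The sketch below assumes that the chart excludes the Overhead row, an assumption made here because the six charted categories then reproduce the chart's values; the variable names are illustrative.

```python
# TOTAL column (at-year $K) from the Summary Tier 1 Cost Profile,
# with the Overhead row left out (assumption; see lead-in).
totals_k = {
    "CPU": 2392,
    "Disk": 4549,
    "Tertiary Storage": 896,
    "LAN": 559,
    "Other Infrastructure": 427,
    "Sftwr, Lic. & Maint.": 1398,
}

grand_total_k = sum(totals_k.values())

# Share of each category, rounded to whole percent as on the chart.
shares_pct = {
    name: round(100 * cost / grand_total_k) for name, cost in totals_k.items()
}

for name, pct in sorted(shares_pct.items(), key=lambda item: -item[1]):
    print(f"{name:22s} {pct:3d}%")
```

With these inputs, disk accounts for a little under half of the hardware spending and CPU for a little under a quarter.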










FY 2007 Capacity Comparison of Models



                      Previous    New
Tier 1
  CPU (kSPECint95)    209         500
  Disk (TBytes)       365         1000
  Tape (TBytes)       2000        < 2000
Tier 2 a-e
  CPU (kSPECint95)    250         500
  Disk (TBytes)       375         500
  Tape (TBytes)       1000        < 1000
Total
  CPU (kSPECint95)    459         1000
  Disk (TBytes)       740         1500
  Tape (TBytes)       3000        < 3000
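The comparison can be summarized as growth factors between the previous and new plans. The sketch below computes them for CPU and disk from the figures on this slide and confirms that the Total rows are the sum of Tier 1 and Tier 2. Tape is omitted because the new values are given only as upper bounds. The dictionary layout and names are illustrative assumptions.

```python
# (previous, new) capacities for FY 2007 from the comparison slide.
capacity = {
    "Tier 1": {"cpu_kspecint95": (209, 500), "disk_tb": (365, 1000)},
    "Tier 2 a-e": {"cpu_kspecint95": (250, 500), "disk_tb": (375, 500)},
}
stated_total = {"cpu_kspecint95": (459, 1000), "disk_tb": (740, 1500)}

# Sum Tier 1 and Tier 2 to compare with the stated Total rows.
computed_total = {
    resource: tuple(
        sum(capacity[tier][resource][i] for tier in capacity) for i in (0, 1)
    )
    for resource in stated_total
}

# Growth factor (new / previous) per tier and resource.
growth = {
    tier: {res: new / prev for res, (prev, new) in resources.items()}
    for tier, resources in capacity.items()
}

for tier, factors in growth.items():
    for resource, factor in factors.items():
        print(f"{tier:11s} {resource:15s} x{factor:.2f}")
print("Totals consistent:", computed_total == stated_total)
```

The Tier 1 factors, about 2.4 for CPU and 2.7 for disk, are in line with the "nearly 2.5 times the capacity" stated earlier for the standalone disk option.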










Conclusions

• Standalone disk model
  ◆ A dramatic improvement over previous tape based model – Functionality & Performance
  ◆ A significant improvement over multi-Tier 1 disk model – Performance, Reliability & Robustness
  ◆ Respects funding guidance in model sensitive out-years
• If costs are higher or funding lower than expected, a graceful fallback is to access some of the data on disks at other Tier 1’s
  ◆ Adiabatically move toward multi-Tier 1 model






