
  Computing & Networking
  User Group Meeting

       Roy Whitney
      Andy Kowalski
      Sandy Philpott
       Chip Watson

         17 June 2008


                 Users and JLab IT
• Ed Brash is the User Group Board of Directors’ representative
  on the IT Steering Committee.

• Physics Computing Committee (Sandy Philpott)

• Helpdesk and CCPR requests and activities

• Challenges
   – Constrained budget
       • Staffing
       • Aging infrastructure
   – Cyber Security




Computing and Networking
     Infrastructure

       Andy Kowalski




              CNI Outline
• Helpdesk
• Computing
• Wide Area Network
• Cyber Security
• Networking and Asset Management




                     Helpdesk
• Hours: 8am-12pm M-F
  – Submit a CCPR via http://cc.jlab.org/
  – Dial x7155
  – Send email to helpdesk@jlab.org

• Windows XP, Vista and RHEL5 Supported
  Desktops
  – Migrating older desktops

• Mac Support?


                     Computing
• Email Servers Upgraded
   – Dovecot IMAP Server (Indexing)
   – New File Server and IMAP Servers (Farm Nodes)

• Servers Migrating to Virtual Machines

• Printing
   – Centralized Access via jlabprt.jlab.org
   – Accounting Coming Soon

• Video Conferencing (working on EVO)

          Wide Area Network
• Bandwidth
  – 10Gbps WAN and LAN backbone
  – Offsite Data Transfer Servers
    • scigw.jlab.org (bbftp)
    • qcdgw.jlab.org (bbcp)
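
To put the 10 Gbps figure in perspective, here is a rough sketch of how long an offsite transfer would take at a given fraction of line rate. The function name and the `efficiency` parameter are illustrative assumptions; real bbftp or bbcp throughput depends on disks, stream counts, and the remote site, none of which are modeled here.

```python
def transfer_time_seconds(size_bytes: float,
                          link_bits_per_s: float = 10e9,
                          efficiency: float = 1.0) -> float:
    """Seconds to move size_bytes over a link running at a fraction of line rate.

    link_bits_per_s defaults to the 10 Gbps backbone quoted on the slide;
    efficiency is a hypothetical knob for protocol and disk overhead.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_s * efficiency)


one_tb = 1e12  # decimal terabyte, in bytes
print(transfer_time_seconds(one_tb))                        # 800.0 seconds at full line rate
print(transfer_time_seconds(one_tb, efficiency=0.5) / 60)   # minutes at half line rate
```

At full line rate a 1 TB dataset moves in roughly 13 minutes; halving the achieved rate doubles that.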




          Cyber Security Challenge
• The threat: sophistication and volume of attacks continue
  to increase.
   – Phishing Attacks
      • Spear Phishing/Whaling are now being observed at JLab.

• Federal requirements, including DOE's, for meeting the cyber
  security challenges call for additional measures.

• JLab uses a risk-based approach that balances achieving
  the mission against dealing with the threat.




                  Cyber Security
• Managed Desktops

   – Skype Allowed From Managed Desktops On Certain Enclaves

• Network Scanning

• Intrusion Detection

• PII/SUI (CUI) Management




Networking and IT Asset Management
• Network Segmentation/Enclaves
   – Firewalls

• Computer Registration
   – https://reggie.jlab.org/user/index.php

• Managing IP Addresses
   – DHCP
       • Assigns all IP addresses (most static)
       • Integrated with registration

• Automatic Port Configuration
   – Rolling out now
   – Uses registration database

 Scientific Computing


Chip Watson & Sandy Philpott




      Farm Evolution Motivation
• Capacity upgrades
  – Re-use of HPC clusters

• Movement to Open Source
  – O/S upgrade
  – Change from LSF to PBS




         Farm Evolution Timetable
Nov 07: Auger/PBS available – RHEL3 - 35 nodes
Jan 08: Fedora 8 (F8) available – 50 nodes
May 08: Friendly-user mode; IFARML4,5
Jun 08: Production
   – F8 only; IFARML3 + 60 nodes from LSF IFARML alias
Jul 08: IFARML2 + 60 nodes from LSF
Aug 08: IFARML1 + 60 nodes from LSF
Sep 08: RHEL3/LSF->F8/PBS Migration complete
   – No renewal of LSF or RHEL for cluster nodes


          Farm F8/PBS Differences
• Code must be recompiled
   – 2.6 kernel
   – gcc 4

• Software installed locally via yum
   – cernlib
   – MySQL

• Time limits: 1 day default, 3 days max

• stdout/stderr to ~/farm_out

• Email notification
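
The time limits above can be sketched as a small check a user might run before submitting. This is only an illustration: the slide does not say whether over-limit requests are rejected or truncated, so rejecting them is an assumption, and the helper names are hypothetical. The `HH:MM:SS` string is the form commonly used for PBS walltime requests.

```python
DEFAULT_LIMIT_S = 1 * 24 * 3600   # 1 day default (from the slide)
MAX_LIMIT_S = 3 * 24 * 3600       # 3 days max (from the slide)


def effective_walltime(requested_s=None):
    """Return the time limit in seconds that a job would run under.

    No request means the default applies. Requests above the maximum are
    rejected here, which is an assumption about site policy.
    """
    if requested_s is None:
        return DEFAULT_LIMIT_S
    if requested_s <= 0:
        raise ValueError("time limit must be positive")
    if requested_s > MAX_LIMIT_S:
        raise ValueError("time limit exceeds the 3 day maximum")
    return requested_s


def as_hms(seconds):
    """Format a whole number of seconds as HH:MM:SS (hours may exceed 24)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(as_hms(effective_walltime()))           # 24:00:00
print(as_hms(effective_walltime(36 * 3600)))  # 36:00:00
```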


                Farm Future Plans
• Additional nodes
   – from HPC clusters
       • CY08: ~120 4g nodes
       • CY09-10: ~60 6n nodes
   – Purchase as budgets allow

• Support for 64-bit systems when feasible & needed




                    Storage Evolution

• Deployment of Sun x4500 “thumpers”

• Decommissioning of Panasas
   (old /work server)

• Planned replacement of old cache nodes




                       Tape Library
• Current STK “Powderhorn” silo is nearing end-of-life
   – Reaching capacity & running out of blank tapes
   – Doesn’t support upgrade to higher density cartridges
   – Is officially end-of-life December 2010

• Market trends
   – LTO (Linear Tape Open) Standard has proliferated since 2000
   – LTO-4 is 4x density, capacity/$, and bandwidth of 9940b:
     800 GB/tape, $100/TB, 120 MB/s
   – LTO-5, out next year, will double capacity, 1.5x bandwidth:
     1600 GB/tape, 180 MB/s
   – LTO-6 will be out prior to the 12 GeV era:
     3200 GB/tape, 270 MB/s
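
The per-generation figures above translate directly into cartridge counts and fill times. The sketch below uses only the capacities and bandwidths quoted on the slide, assumes decimal units (1 PB = 1,000,000 GB, 1 GB = 1000 MB), and derives the implied 9940b figures from the "4x" statement; the function names are illustrative.

```python
import math

# (GB per tape, MB/s) as quoted on the slide
LTO = {
    "LTO-4": (800, 120),
    "LTO-5": (1600, 180),
    "LTO-6": (3200, 270),
}


def tapes_per_petabyte(gb_per_tape):
    """Cartridges needed to hold 1 PB, assuming 1 PB = 1e6 GB."""
    return math.ceil(1e6 / gb_per_tape)


def hours_to_fill(gb_per_tape, mb_per_s):
    """Hours to write one full cartridge at the quoted streaming rate."""
    return gb_per_tape * 1000 / mb_per_s / 3600


for name, (gb, mbs) in LTO.items():
    print(f"{name}: {tapes_per_petabyte(gb)} tapes/PB, "
          f"{hours_to_fill(gb, mbs):.2f} h to fill one tape")

# Implied 9940b figures if LTO-4 is 4x in capacity and bandwidth
print(LTO["LTO-4"][0] / 4, "GB/tape,", LTO["LTO-4"][1] / 4, "MB/s")
```

Capacity doubles with each generation while bandwidth grows only 1.5x, so the time to fill a single cartridge rises from generation to generation.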


           Tape Library Replacement
• Competitive procurement now in progress
    – Replace old system, support 10x growth over 5 years

• Phase 1 in August
    – System integration, software evolution
    – Begin data transfers, re-use 9940b tapes

• Tape swap through January

• 2 PB capacity by November

• DAQ to LTO-4 in January 2009

• Old silo gone in March 2009

End result: breakeven on cost by the end of 2009!
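
As a rough illustration of what "10x growth over 5 years" implies, the sketch below converts it to an equivalent steady annual growth factor. Uniform compounding is an assumption; the slide does not say how the growth is distributed across the five years.

```python
def annual_growth_factor(total_factor: float, years: int) -> float:
    """Constant yearly multiplier that compounds to total_factor over `years`."""
    if total_factor <= 0 or years <= 0:
        raise ValueError("total_factor and years must be positive")
    return total_factor ** (1.0 / years)


factor = annual_growth_factor(10, 5)
print(f"{factor:.3f}x per year")  # about 1.585x, i.e. roughly 58% growth per year
```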


            Long Term Planning
• Continue to increase compute & storage capacity
  in the most cost-effective manner

• Improve processes & planning
   – PAC submission process
   – 12 GeV Planning…




          E.g.: Hall B Requirements

Event Simulation                      2012      2013      2014      2015      2016
SPECint_rate2006 sec/event             1.8       1.8       1.8       1.8       1.8
Number of events                  1.00E+12  1.00E+12  1.00E+12  1.00E+12  1.00E+12
Event size (KB)                         20        20        20        20        20
% Stored Long Term                     10%       25%       25%       25%       25%
Total CPU (SPECint_rate2006)       5.7E+04   5.7E+04   5.7E+04   5.7E+04   5.7E+04
Petabytes / year (PB)                    2         5         5         5         5

Data Acquisition                      2012      2013      2014      2015      2016
Average event size (KB)                 20        20        20        20        20
Max sustained event rate (kHz)           0         0        10        10        20
Average event rate (kHz)                 0         0        10        10        10
Average 24-hour duty factor (%)         0%        0%       50%       60%       65%
Weeks of operation / year                0         0         0        30        30
Network (n*10gigE)                       1         1         1         1         1
Petabytes / year                       0.0       0.0       0.0       2.2       2.4

1st Pass Analysis                     2012      2013      2014      2015      2016
SPECint_rate2006 sec/event             1.5       1.5       1.5       1.5       1.5
Number of analysis passes                0         0       1.5       1.5       1.5
Event size out / event size in           2         2         2         2         2
Total CPU (SPECint_rate2006)       0.0E+00   0.0E+00   0.0E+00   7.8E-03   8.4E-03
Silo Bandwidth (MB/s)                    0         0       900       900      1800
Petabytes / year                       0.0       0.0       0.0       4.4       4.7

Total SPECint_rate2006             5.7E+04   5.7E+04   5.7E+04   5.7E+04   5.7E+04
SPECint_rate2006 / node                600       900      1350      2025      3038
# nodes needed (current year)           95        63        42        28        19
Petabytes / year                         2         5         5        12        12

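
Several rows of the table can be reproduced from the inputs with simple arithmetic. The formulas below are inferred from the numbers, not stated on the slide, so treat them as a plausible reconstruction: a 365-day year for the simulation CPU, decimal units (1 KB = 1000 bytes, 1 PB = 1e15 bytes), and the average (not maximum) event rate for the data acquisition volume. Function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
SECONDS_PER_WEEK = 7 * 24 * 3600


def simulation_cpu(sec_per_event, n_events):
    """SPECint_rate2006 needed to simulate n_events within one calendar year."""
    return sec_per_event * n_events / SECONDS_PER_YEAR


def simulation_pb(n_events, event_kb, stored_fraction):
    """Petabytes per year of simulated data kept long term."""
    return n_events * event_kb * 1e3 * stored_fraction / 1e15


def daq_pb(avg_rate_khz, event_kb, duty_factor, weeks):
    """Petabytes per year written by data acquisition."""
    bytes_per_s = avg_rate_khz * 1e3 * event_kb * 1e3
    return bytes_per_s * duty_factor * weeks * SECONDS_PER_WEEK / 1e15


def nodes_needed(total_cpu, cpu_per_node):
    """Node count, rounded to the nearest whole node as the table appears to do."""
    return round(total_cpu / cpu_per_node)


total = simulation_cpu(1.8, 1e12)
print(f"{total:.1E}")                          # 5.7E+04
print(simulation_pb(1e12, 20, 0.10))           # about 2 PB (2012)
print(round(daq_pb(10, 20, 0.60, 30), 1))      # 2.2 PB (2015)
print(round(daq_pb(10, 20, 0.65, 30), 1))      # 2.4 PB (2016)

per_node = [600, 900, 1350, 2025, 3038]        # from the table: grows about 1.5x per year
print([nodes_needed(total, p) for p in per_node])  # [95, 63, 42, 28, 19]
```

The 1st pass analysis volumes (4.4 and 4.7 PB) are consistent with twice the data acquisition volume, matching the "event size out / event size in" ratio of 2.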
               LQCD Computing
• JLab operates 3 clusters with nearly 1100 nodes,
  primarily for LQCD plus some accelerator modeling

• National LQCD Computing Project
  (2006-2009: BNL, FNAL, JLab; USQCD Collaboration)

• LQCD II proposal 2010-2014 would double the hardware
  budget to enable key calculations

• JLab Experimental Physics & LQCD computing share
  staff (operations & software development) & tape silo,
  providing efficiencies for both



								