The CERN Computer Centres October 14th 2005 Tony.Cass@CERN.ch


Talk Outline
Where
Why
What
How
Who
Where
– B513
» Main Computer Room, ~1,500m², cooling 1.5kW/m², built for mainframes
in 1970, upgraded for LHC PC clusters 2003-2005.
» Second ~1,200m² room created in the basement in 2003 as
additional space for LHC clusters and to allow ongoing operations
during the main room upgrade. Cooling limited to 500W/m².
– Tape Robot building ~50m from B513
» Constructed in 2001 to avoid loss of all CERN data due to an
incident in B513.
Why
– Support
» Laboratory computing infrastructure
Campus networks: general purpose and technical
Home directory, email & web servers (10k+ users)
Administrative computing servers
» Physics computing services
Interactive cluster
Batch computing
Data recording, storage and management
Grid computing infrastructure
Physics Computing Requirements
25,000k SI2K in 2008, rising to 56,000k in 2010
– 2,500-3,000 boxes
– 500kW-600kW @ 200W/box
– 2.5MW @ 0.1W/SI2K
6,800TB online disk in 2008, 11,800TB in 2010
– 1,200-1,500 boxes
– 600kW-750kW
15PB of data per year
– 30,000 500GB cartridges/year
– Five 6,000 slot robots/year
Sustained data recording at up to 2GB/s
– Over 250 tape drives and associated servers
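The figures on this slide hang together arithmetically. The sketch below rechecks them, assuming decimal units (1 PB = 1,000,000 GB); the 500W per disk server is derived from the quoted totals rather than stated on the slide.

```python
# Back-of-envelope check of the 2008 capacity figures quoted above.
# Decimal units are assumed (1 PB = 1,000,000 GB); the W/box for disk servers is derived.

def power_kw(boxes: int, watts_per_box: float) -> float:
    """Electrical load in kW for `boxes` machines drawing `watts_per_box` each."""
    return boxes * watts_per_box / 1000.0

# CPU servers: 2,500-3,000 boxes at 200 W/box
cpu_low_kw = power_kw(2500, 200)
cpu_high_kw = power_kw(3000, 200)

# Alternative CPU estimate: 25,000k SI2K at 0.1 W per SI2K
cpu_alt_mw = 25_000_000 * 0.1 / 1_000_000

# Disk servers: 600 kW spread over 1,200 boxes gives the per-box draw
disk_w_per_box = 600_000 / 1200

# Tape: 15 PB/year written to 500 GB cartridges, stored in 6,000-slot robots
cartridges_per_year = 15 * 1_000_000 // 500
robots_per_year = cartridges_per_year // 6000

print(f"CPU: {cpu_low_kw:.0f}-{cpu_high_kw:.0f} kW, or {cpu_alt_mw:.1f} MW at 0.1 W/SI2K")
print(f"Disk: {disk_w_per_box:.0f} W per box")
print(f"Tape: {cartridges_per_year} cartridges and {robots_per_year} robots per year")
```

Note that the per-SI2K estimate comes out roughly four times higher than the per-box estimate for the same 2008 capacity.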
What are the major issues
– Commodity equipment from multiple vendors
– Large scale clusters
– Infrastructure issues
» Power and cooling
» Limited budget
Commodity equipment & many vendors
Given the requirements, significant pressure to limit cost
per SI2K and cost per TB.
Open tender purchase process
– Requirements in terms of box performance
– Reliability criteria are seen as subjective and so are difficult to incorporate
in the process.
» Also, as internal components are similar, are branded boxes intrinsically
more reliable?
Cost requirements and the tender process lead to "white box"
equipment, not branded.
Tender purchase process leads to frequent changes of
bidder.
– Good in that there is competition and we aren't reliant on a single
supplier.
– Bad as we must deal with many companies, most of whom are remote
and subcontract maintenance services.
Large Scale Clusters
The large number of boxes leads to problems in
terms of
– Maintaining software homogeneity across the clusters
– Maintaining services despite the inevitable failures
– Logistics
» Boxes arrive in batches of O(500)
» Are vendors respecting the contractual warranty times?
(Have they returned the box we sent them last week…)
» How to manage service upgrades,
especially as not all boxes for a service will be up at the time of the upgrade
– …
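The warranty question above can be answered mechanically once every repair is recorded. A minimal sketch, assuming a hypothetical repair record with send and return dates and a contractual turnaround expressed in days; the record fields, vendor names, serials and dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

@dataclass
class Repair:
    """Hypothetical record of one box sent back to a vendor under warranty."""
    serial: str
    vendor: str
    sent: date
    returned: Optional[date] = None  # None while the box is still with the vendor

def overdue(repairs: List[Repair], max_days: int, today: date) -> List[Repair]:
    """Return the repairs whose turnaround exceeds the contractual limit."""
    limit = timedelta(days=max_days)
    late = []
    for r in repairs:
        # An open repair is measured up to today
        end = r.returned if r.returned is not None else today
        if end - r.sent > limit:
            late.append(r)
    return late

repairs = [
    Repair("A1", "vendor-x", date(2005, 10, 1)),                     # still out after 13 days
    Repair("B2", "vendor-y", date(2005, 10, 3), date(2005, 10, 7)),  # back after 4 days
]
late = overdue(repairs, max_days=7, today=date(2005, 10, 14))
print([r.serial for r in late])
```

With batches of O(500) boxes, grouping the result by vendor would show which suppliers are systematically late.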
Infrastructure Issues
Cooling capacity limits the equipment we can install
– Maximum cooling of 1.5kW/m²
– 40x1U servers @ 200W/box = 8kW/m²
We cannot provide diesel backup for the full
computer centre load.
– Swiss/French auto-transfer covers most failures.
– Dedicated zone for "critical equipment" with diesel
backup and dual power supplies.
» Limited to 250kW for networks and laboratory computing
infrastructure.
» … and physics services such as Grid and data management servers,
but not all of the physics network, so careful planning is needed for
switch/router allocations and power connections.
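The cooling mismatch is roughly a factor of five. A minimal sketch of the arithmetic, assuming heat load equals electrical draw and that a full rack occupies about 1m² (as the 8kW/m² figure implies); the critical-zone allocation in the example is invented, only the 250kW limit comes from the slide.

```python
# Why 40x1U racks do not fit the room's cooling budget.
# Assumes heat load equals electrical draw and a rack footprint of about 1 m^2.

ROOM_COOLING_W_PER_M2 = 1500   # main computer room limit
BOX_W = 200                    # per 1U server
RACK_W = 40 * BOX_W            # 40 x 1U servers = 8,000 W

def area_needed_m2(load_w: float, cooling_w_per_m2: float) -> float:
    """Floor area whose cooling allowance matches the given heat load."""
    return load_w / cooling_w_per_m2

def boxes_per_m2(box_w: int, cooling_w_per_m2: int) -> int:
    """Whole servers that fit within one square metre's cooling allowance."""
    return cooling_w_per_m2 // box_w

def fits_critical_zone(loads_kw: dict, limit_kw: float = 250.0) -> bool:
    """True if the combined load of the diesel-backed zone is within its limit."""
    return sum(loads_kw.values()) <= limit_kw

print(f"A full rack needs ~{area_needed_m2(RACK_W, ROOM_COOLING_W_PER_M2):.1f} m2 of cooling")
print(f"At most {boxes_per_m2(BOX_W, ROOM_COOLING_W_PER_M2)} servers per m2")

# Hypothetical allocation of the critical zone (figures invented for illustration)
critical = {"networks": 120, "lab_infrastructure": 80, "grid_and_data_mgmt": 60}
print(fits_critical_zone(critical))  # 260 kW exceeds the 250 kW limit
```

In practice this means either leaving racks partly empty or spacing full racks out across the floor.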
How
– Rigorous, centralised control
ELFms
Extremely Large Farm management system
– box nodes in:
» deliver the required configuration
» monitor performance and any deviation from the required state
» track nodes through hardware and software state changes
(Diagram labels: Node Configuration Management, Node Management)
Three components:
– quattor for configuration, installation and node
management
– Lemon for system and service monitoring
– Leaf for managing state changes, both hardware (HMS)
and software (SMS)
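ELFms separates delivering a desired state, observing the actual state and acting on the difference. A minimal Python sketch of the comparison step only, assuming both states are flat key/value maps per node; the node names and settings are hypothetical and this is not code taken from quattor or Lemon.

```python
from typing import Dict, Optional, Tuple

State = Dict[str, str]

def deviations(
    desired: Dict[str, State], observed: Dict[str, State]
) -> Dict[str, Dict[str, Tuple[str, Optional[str]]]]:
    """For each node, list settings whose observed value differs from the desired one.

    A node missing from `observed` reports every desired setting as (wanted, None).
    """
    report = {}
    for node, wanted in desired.items():
        seen = observed.get(node, {})
        diff = {k: (v, seen.get(k)) for k, v in wanted.items() if seen.get(k) != v}
        if diff:
            report[node] = diff
    return report

# Hypothetical desired state (configuration side) and observed state (monitoring side)
desired = {
    "node-a": {"kernel": "v2", "batch": "enabled"},
    "node-b": {"kernel": "v2", "batch": "enabled"},
}
observed = {
    "node-a": {"kernel": "v2", "batch": "enabled"},
    "node-b": {"kernel": "v1", "batch": "enabled"},
}
print(deviations(desired, observed))
```

Only node-b is reported, with its kernel as the single deviating setting; reconciling it is then a job for the configuration side.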
quattor
quattor takes care of the configuration, installation
and management of nodes.
– A Configuration Database holds the 'desired state' of all
fabric elements
» Node setup (CPU, HD, memory, software RPMs/PKGs, network,
system services, location, audit info…)
» Cluster (name and type, batch system, load balancing info…)
» Defined in templates arranged in hierarchies – common
properties set only once
– Autonomous management agents running on the node
take care of
» Base installation
» Service (re-)configuration
» Software installation and management
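The template hierarchy means that site-wide properties are written once and refined by cluster and node templates. quattor templates are written in its own Pan language; the sketch below does not use Pan syntax. It only illustrates the layering idea in Python, assuming a "most specific template wins" rule applied key by key. All template contents are hypothetical.

```python
def merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` on `base`, returning a new dict (inputs are not mutated)."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

def resolve(*templates: dict) -> dict:
    """Apply templates from most general to most specific."""
    profile: dict = {}
    for template in templates:
        profile = merge(profile, template)
    return profile

# Hypothetical templates: common properties are set once at the site level
site = {"network": {"dns": "dns-1", "ntp": "ntp-1"}, "services": {"sshd": "on"}}
cluster = {"cluster": {"name": "batch", "batch_system": "example-batch"},
           "services": {"batch_client": "on"}}
node = {"hardware": {"cpus": 2, "memory_mb": 1024},
        "network": {"hostname": "node-a", "ntp": "ntp-local"}}  # overrides site NTP

profile = resolve(site, cluster, node)
print(profile["network"])
```

The node inherits the site DNS server, adds its own hostname and overrides only the NTP server.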
quattor architecture
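(Diagram: a Configuration server holds the CDB, with SQL and XML backends, accessed through a CLI, a GUI and scripts via SOAP. XML configuration profiles are served over HTTP to the Managed Nodes. On each node, the Node Configuration Manager (NCM) runs components CompA, CompB, CompC that configure ServiceA, ServiceB, ServiceC, and the SW Package Manager (SPMA) installs RPMs/PKGs fetched over HTTP from the SW server(s) and repository. An Install server with an Install Manager installs the base OS via HTTP/PXE using the system installer.)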
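The package-management step on each node amounts to reconciling the package list in the node's profile with what is actually installed. The slide does not describe SPMA's algorithm, so this is a minimal sketch assuming the profile's list is authoritative and packages are compared as name-version strings; the package names are hypothetical.

```python
from typing import Set, Tuple

def plan_transaction(desired: Set[str], installed: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Packages to install and to remove so that `installed` matches `desired`.

    Packages are identified by name-version strings, so an upgrade appears as
    one removal plus one installation.
    """
    to_install = desired - installed
    to_remove = installed - desired
    return to_install, to_remove

desired = {"pkgA-1.0", "pkgB-2.1", "pkgC-1.0"}    # from the node's configuration profile
installed = {"pkgA-1.0", "pkgB-2.0", "pkgD-0.9"}  # what the package database reports

install, remove = plan_transaction(desired, installed)
print(sorted(install))
print(sorted(remove))
```

A real agent would also need a policy for packages installed locally outside the profile; here they are simply removed.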
Lemon
Lemon (LHC Era Monitoring) is a client-server tool
suite for monitoring status and performance
comprising
– sensors to measure the values of various metrics
» Several sensors exist for node performance, processes, hardware and
software, databases, security and alarms
» "External" sensors for metrics such as hardware errors and
computer centre power consumption.
– a monitoring agent running on each node. This manages
the sensors and sends data to the central repository
– a central repository to store the full monitoring history
» two implementations, Oracle or flat file based
– an RRD-based display framework
» Pre-processes data into RRD files and creates cluster summaries,
including "virtual" clusters such as the set of nodes being used by a given
experiment.
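The agent/repository split and the cluster summaries can be illustrated in a few lines. A minimal in-memory sketch, assuming sensors are simple callables and a summary is the mean of one metric over a set of nodes; the class names, the "loadavg" metric and the node names are hypothetical, and the real repository is Oracle or flat-file based with RRD files for display.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional, Set, Tuple

class Agent:
    """Runs each sensor on one node and returns a dict of metric -> sample."""
    def __init__(self, node: str, sensors: Dict[str, Callable[[], float]]):
        self.node = node
        self.sensors = sensors

    def sample(self) -> Dict[str, float]:
        return {metric: read() for metric, read in self.sensors.items()}

class Repository:
    """Keeps the full history as (timestamp, node, metric, value) rows."""
    def __init__(self) -> None:
        self.rows: List[Tuple[int, str, str, float]] = []

    def store(self, ts: int, node: str, samples: Dict[str, float]) -> None:
        for metric, value in samples.items():
            self.rows.append((ts, node, metric, value))

    def summary(self, nodes: Set[str], metric: str, ts: int) -> Optional[float]:
        """Mean of `metric` over a (possibly virtual) cluster at time `ts`."""
        values = [v for t, n, m, v in self.rows if t == ts and n in nodes and m == metric]
        return mean(values) if values else None

# Hypothetical sensors returning fixed load values instead of reading the system
loads = {"node-a": 1.0, "node-b": 2.0, "node-c": 6.0}
agents = [Agent(n, {"loadavg": (lambda v=v: v)}) for n, v in loads.items()]

repo = Repository()
for agent in agents:
    repo.store(0, agent.node, agent.sample())

cluster = {"node-a", "node-b", "node-c"}
experiment_x = {"node-a", "node-b"}  # a "virtual" cluster: nodes used by one experiment
print(repo.summary(cluster, "loadavg", 0), repo.summary(experiment_x, "loadavg", 0))
```

Because the repository keeps per-node rows, a virtual cluster is just a different node set passed to the same summary function.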
Lemon architecture
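(Diagram: on each node a Monitoring Agent runs several Sensors and sends data over TCP/UDP to the Monitoring Repository, which has an SQL backend. The repository is accessed via SOAP by Correlation Engines and the Lemon CLI; an RRDTool / PHP display served by apache delivers Lemon Web pages over HTTP to browsers on user workstations.)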
Leaf
LEAF (LHC Era Automated Fabric) is a collection of
workflows for high level node hardware and software
state management, built on top of quattor and Lemon.
– HMS (Hardware Management System)
» Track systems through all physical steps in their lifecycle, e.g. installation,
moves, vendor calls, retirement
» Automatically issues install, retirement etc. requests to technicians
» GUI to locate equipment physically
» HMS implementation is CERN specific, but concepts and design should be
generic
– SMS (State Management System)
» Automated handling (and tracking) of high-level configuration steps
Reconfigure and reboot all LXPLUS nodes for new kernel and/or physical move
Drain and reconfig nodes for diagnosis / repair operations
» Issues all necessary (re)configuration commands via quattor
» extensible framework – plug-ins for site-specific operations possible
– CCTracker (in development)
» shows location of equipment in room
Use Case: Move rack of machines
(Diagram: workflow between HMS, SMS, CDB, LAN DB, Sysadmins, Operations and the nodes. Step labels in order:)
1. Import
2. Set to standby
3. Update
4. Refresh
5. Take out of production
6. Shutdown work order
7. Request move
8. Update
9. Update
10. Install work order
11. Set to production
12. Update
13. Refresh
14. Put into production
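A workflow like this is safe to automate only if nodes cannot skip steps, for example being shut down while still in production. A minimal state-machine sketch of that idea: "production" and "standby" follow the use case, while the "shutdown" state, the transition table and the `move_rack` helper are hypothetical and do not reproduce SMS internals.

```python
# Hypothetical node states; "production" and "standby" follow the use case above,
# "shutdown" is an illustrative state for a node that may be physically moved.
ALLOWED = {
    "production": {"standby"},
    "standby": {"production", "shutdown"},
    "shutdown": {"standby"},
}

class Node:
    def __init__(self, name: str, state: str = "production"):
        self.name = name
        self.state = state
        self.history = [state]

    def set_state(self, new: str) -> None:
        """Change state, refusing any transition not listed in ALLOWED."""
        if new not in ALLOWED[self.state]:
            raise ValueError(f"{self.name}: {self.state} -> {new} is not allowed")
        self.state = new
        self.history.append(new)

def move_rack(nodes) -> None:
    """Drive every node in a rack through the move, one phase at a time."""
    for phase in ("standby", "shutdown"):
        for n in nodes:
            n.set_state(phase)
    # Physical move, LAN DB and CDB updates and reinstallation would happen here.
    for phase in ("standby", "production"):
        for n in nodes:
            n.set_state(phase)

rack = [Node("node-a"), Node("node-b")]
move_rack(rack)
print(rack[0].history)
```

Keeping the allowed transitions in one table makes it easy to audit which operations a workflow may request.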
Tony.Cass@CERN.ch 19
Tony.Cass@CERN.ch 20
Tony.Cass@CERN.ch 21
Tony.Cass@CERN.ch 22
Tony.Cass@CERN.ch 23
Tony.Cass@CERN.ch 24
Tony.Cass@CERN.ch 25
Tony.Cass@CERN.ch 26
Tony.Cass@CERN.ch 27
Tony.Cass@CERN.ch 28
Tony.Cass@CERN.ch 29
Tony.Cass@CERN.ch 30
Who
– Contract Shift Operators: 1 person 24x7
– Technician level System Administration Team
» 10 team members plus 3 people for machine room operations plus
engineer level manager
– Engineer level teams for Physics computing
» System & Hardware support: approx 10FTE
» Service support: approx 10FTE
» ELFms software: 3FTE plus students and collaborators.
~30FTE-years total investment since 2001
Summary
Physics requirements, budget and tendering
process lead to large scale clusters of commodity
hardware.
We have developed and deployed tools to install,
configure, monitor nodes and to automate hardware
and software lifecycle steps.
Services must cope with individual node failures
– already the case for simple services such as batch
– new data management software being introduced to
reduce reliance on individual servers
– focussing now on grid level services
We believe we are well prepared for LHC computing
– but expect managing the large scale, complex
environment to be an exciting adventure