Cambridge HEP Group - site report
April 2005
John Hill
27 April 2005 HEP SYSMAN meeting 1
Group hardware (non-GRID)
• Three file servers:
– 1 AlphaServer (Digital UNIX 4.0F), soon to be decommissioned (no other Digital UNIX systems left in the group).
– 2 DELL (Intel) servers (one running Red Hat Linux 7.3, the other SLC 3.0.4).
– total of ~10TB disk space served
• 3TB IDE-SCSI RAID
• 5.8TB SCSI-SCSI RAID
• Rest JBODs
– DLT8000 autoloader
– LTO-3 autoloader on order
– all UPS-protected
Group hardware (non-GRID)
• RAID experiences:
– IDE-SCSI: very painful! Individual disk “failures” occur daily when the array is being used heavily. We found out, about 2 years too late, that this seems to be a feature of IDE disks. Very few of these failures are terminal: the disk works fine when reintroduced into the array. Only a handful of disks have been replaced in 2.5 years.
– SCSI-SCSI: so far so good. 2 disk failures (out of 48 in use) in the last 4-5 months, neither due to heavy usage. At present I’m assuming the bathtub model (i.e. treating these as early-life failures).
• Of course, the downside is that the SCSI-SCSI disks are much more
expensive, and it is not a route we can continue down.
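As a rough illustration of the SCSI-SCSI figures quoted above (2 failures out of 48 disks in 4-5 months), the following sketch extrapolates to an annualised failure rate. It assumes a constant failure rate, which is a simplification: under the bathtub model the early-life rate would be expected to fall, so this is probably an upper estimate. The function name is illustrative only.

```python
def annualised_failure_rate(failures: int, disks: int, months: float) -> float:
    """Fraction of disks expected to fail per year, assuming a constant rate."""
    if disks <= 0 or months <= 0:
        raise ValueError("disks and months must be positive")
    return (failures / disks) * (12.0 / months)


# Figures from the slide: 2 failures among 48 disks over 4-5 months.
low = annualised_failure_rate(2, 48, 5)   # longer window gives the lower rate
high = annualised_failure_rate(2, 48, 4)  # shorter window gives the higher rate
print(f"Annualised failure rate: {low:.1%} to {high:.1%}")
```

On these numbers the naive extrapolation comes out at roughly 10% to 12.5% of disks per year.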
Group hardware (non-GRID)
• 69 desktop PCs (give-or-take...)
– 41 Linux RedHat 7.3 (soon SLC3)
– 27 Windows XP/2000
– 1 Windows 95 (for our probestation)
– wide range of performance (400MHz Pentium II up to 3.2 GHz
Pentium 4)
• 1 Mac desktop to complicate matters!
• 33 PC and 6 Mac laptops registered (but certainly not all are real machines...). Purchased with a confusing mix of group, college and private funds.
• Also a few PDAs starting to appear.
Group hardware (non-GRID)
• Printers
– 2 Lexmark B&W laser printers (20ppm+), 1 of which supports A3 for (e.g.) CAD drawings
– HP Color LaserJet 4550 (for paper)
– 2 HP Business DeskJets (for transparencies)
– Epson A2 colour inkjet (specialised printing: CAD, posters,...)
Hardware for GRID work
• Long-standing: 20-CPU analysis farm (MIMCluster from
Workstations UK)
– 1.13 GHz Pentium III CPUs
– Linux SL3
• Recently added further 10 systems:
– 2.8 GHz Xeon dual-CPU DELL PowerEdge 1850 servers
– Linux SLC3
• Also 4 systems provided by GridPP:
– 2.8GHz Xeon dual-CPU Streamline Computing servers running
SL3
– Intended as front-end boxes for LCG2 deployment.
• 3TB IDE-SCSI RAID specifically for GRID work (i.e. in addition to that mentioned earlier).
• GRID computers are being used both for HEP grids and for CamGrid –
this is creating a few configuration headaches!
Network
• Wired:
– departmental network based on switches from Extreme Networks
– Gigabit Ethernet fibre-optic backbone between switches
– a minimum of Fast Ethernet to all desktops (with Gigabit available,
though not yet deployed)
– Gigabit connection to campus backbone
– Gigabit Ethernet on campus backbone (likely to be upgraded to
10Gbps soon)
– Campus network connects to EastNET via a ~8Gbps link and
hence to SuperJANET.
– departmental network currently has a mainly “flat” topology - but
VLANs are being slowly introduced.
Network
• Wireless:
– Group bought a cheap and cheerful interim solution using Buffalo
wireless hub
– Use limited to registered laptops and PDAs
– 19 PC and 5 Mac laptops, 3 PDAs registered in hub
– Intended initially for the convenience of users meeting in the group library (where the hub is sited), but we find it is used heavily from offices too.
– Range is limited (~10-15m, which helps with security!) due to
metal framework of building, so not all group members can see the
hub.
– Department considering a rollout of wireless, but not a high
priority (mainly because many influential groups will not permit it
in their areas, as it interferes with experiments).
[Network diagram, not reproduced. Recoverable labels: group hubs and group switch; central departmental switch; routers; campus; city PoP; SuperJANET; printer, server, computer; Physics Department. Link types: 1Gbps optical fibre, 100Mbps UTP. SuperJANET connection via EastNET (~8Gbps total).]
Video Conferencing
• “mid-range” system (Zydacron Z360 (H.323) and ZC206 (ISDN)
cards)
• Sony EVI D31 Camera (pan, tilt, zoom)
• hosted by (aging) 500MHz Pentium III PC
• use (existing) data projector to display video on projection screen
• OK for up to ~12 people (though best for 6 or fewer!)
• in use for nearly 4 years now, so beginning to consider a replacement
• Possible that the department may provide a VC system within the next
year.
Software
• Nothing special…
– AutoCAD, CADENCE etc. for mechanical and electronic CAD work. All CAD work is now PC-based (SUN decommissioned last Christmas).
– Moving to SLC3 on desktops as soon as feasible.
– Departmental license for Mathematica – allows home use for no
extra cost.
– XFree86 on Windows XP for X11 provision.
– Group continues to run its own mail server (using Exim) –
probably would be better to use campus facilities, but users not
convinced!
– GRID nodes use Condor for batch, with a common pool for LCG and CamGrid. Conflicting networking requirements, as well as Condor’s immaturity in some areas, are causing some difficulty with this arrangement at present.
Future plans
• As usual, our plans are very fluid – the world changes quickly, and
physicists rarely know what they really want ahead of time! Hence try
to avoid large purchases where possible.
• Cycle of desktop replacement is slowing – even 3-4 year-old PCs are
adequate and the performance of new PCs is relatively static at present.
• Extra disk space will be needed: ~6-8TB/year is current best guess.
• Continue to enhance our farm provision – partly by taking advantage
of CamGrid for HEP use.
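The disk-space estimate above (~6-8TB/year) can be turned into a rough projection. This sketch assumes linear growth from the ~10TB currently served by the non-GRID file servers; the baseline, the three-year horizon and the function name are illustrative assumptions, not figures from a purchasing plan.

```python
def projected_capacity_tb(current_tb: float, growth_tb_per_year: float, years: int) -> float:
    """Total disk space needed after `years`, assuming linear growth."""
    return current_tb + growth_tb_per_year * years


CURRENT_TB = 10.0  # ~10TB served today (non-GRID); assumed baseline

for years in range(1, 4):  # three-year horizon chosen for illustration
    low = projected_capacity_tb(CURRENT_TB, 6.0, years)
    high = projected_capacity_tb(CURRENT_TB, 8.0, years)
    print(f"after {years} year(s): {low:.0f}-{high:.0f} TB")
```

Under these assumptions the group would need somewhere between 28TB and 34TB after three years.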
Concerns
• Main concern is how we manage all the extra kit in the medium term –
especially as the system management team will become heavily
embroiled in ATLAS/LHCb commissioning very soon.
• Security obviously a permanent concern. Departmental procedures are
improving (from a very low base...). Most recent incidents have been
due to human error rather than software loopholes.
• Also a problem coping with experimental bloatware, which is not confined to LHC experiments.