Application Note
Grid Computing
Enterasys and CERN on the LHC Grid—the Openlab Project
Page 1 of 11 • Application Note
Table of Contents
Introduction
CERN’s Need for Grid Computing
How Enterasys Helps CERN in its Efforts
Enterasys Products Deployed at CERN
  Production Network
CERN Openlab for DataGrid Applications
Summary
Introduction
The last decade has seen a substantial increase in commodity computer and network performance, primarily as a result of faster hardware and more sophisticated software. Nevertheless, there are still problems in science, engineering, and business that cannot be handled effectively by the current generation of supercomputers. Because of their size and complexity, these problems are often numerically intensive, data intensive, or both, and consequently require a variety of heterogeneous resources that are not available on a single machine.

A number of teams have conducted experimental studies on the cooperative use of geographically distributed resources unified to act as a single powerful computer. This approach is known by several names, such as metacomputing, scalable computing, global computing, Internet computing and, more recently, peer-to-peer or Grid computing. The early efforts in Grid computing started as a project to link supercomputing sites, but have now grown far beyond their original intent. In fact, many applications, including collaborative engineering, data exploration, high-throughput computing, and distributed supercomputing, can benefit from the Grid infrastructure.

Moreover, the rapid growth of the Internet and IP/Ethernet has driven growing interest in IP/Ethernet-based distributed computing, and many projects aim to exploit it as an infrastructure for running coarse-grained distributed and parallel applications. In this context, IP/Ethernet can serve both as a platform for parallel and collaborative work and as a key technology for creating a pervasive and ubiquitous Grid-based infrastructure.
[Figure: From Supercomputer through Cluster to Grid-Computing Utility. Timeline: 1980, 1990, 2000.]
CERN’s Need for Grid Computing
Funded by 20 European member states and located in Geneva, Switzerland, CERN is the world’s largest particle physics center. The center is staffed by 2,500 scientists, including physicists and engineers, and 6,500 visiting scientists, who represent 80 nations and more than 500 universities.

Particle physics looks at elementary particles, which make up all matter in the universe, and the fundamental forces that hold matter together. Special tools are required to create and study new particles:
• Accelerators. Huge machines able to speed up particles to very high energies before colliding them with other particles.
• Detectors. Massive instruments that register the particles produced when the accelerated particles collide.

The Large Hadron Collider (LHC), the world’s largest accelerator, is currently under construction. The LHC will collide beams of protons at an energy of 14 TeV. Using the latest superconducting technologies, it will operate at about -270 degrees Celsius, just above absolute zero. With its 27 km circumference, the accelerator will be the largest superconducting installation in the world.
After a particle collision, a physicist’s goal is to count, trace and characterize all the particles produced and fully reconstruct the process. Among all tracks, the presence of “special shapes” is a sign of interesting interactions.
Selectivity: 1 in 10^13. Like looking for 1 person in a thousand world populations! Or for a needle in 20 million haystacks! You are looking for the “signature”.
The LHC creates 40 million collisions per second; after filtering, 100 collisions of interest per second remain. A Megabyte of digitised information for each collision results in a recording rate of 0.1 Gigabytes per second. Each year, 10^11 recorded collisions result in 10 Petabytes of data. This creates a real challenge for storage and CPU capacity.
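The recording rate quoted above follows directly from the post-filter collision rate and the size of each recorded collision. A minimal sketch of that arithmetic, assuming decimal units (1 Megabyte = 10^6 bytes, 1 Gigabyte = 10^9 bytes); the function and constant names are illustrative only:

```python
# Figures quoted in the text above.
COLLISIONS_PER_SECOND = 40_000_000   # raw collision rate in the LHC
RECORDED_PER_SECOND = 100            # collisions of interest left after filtering
BYTES_PER_COLLISION = 1_000_000      # one Megabyte, assuming decimal units


def recording_rate_gigabytes_per_second(events_per_second, bytes_per_event):
    """Sustained recording rate in Gigabytes per second (1 GB = 10**9 bytes)."""
    return events_per_second * bytes_per_event / 1e9


def filter_reduction_factor(raw_per_second, kept_per_second):
    """Number of raw collisions that occur for each collision that is kept."""
    return raw_per_second // kept_per_second


rate = recording_rate_gigabytes_per_second(RECORDED_PER_SECOND, BYTES_PER_COLLISION)
reduction = filter_reduction_factor(COLLISIONS_PER_SECOND, RECORDED_PER_SECOND)
print(f"Recording rate: {rate} GB/s")
print(f"One collision kept out of every {reduction:,}")
```

The filter therefore keeps one collision in 400,000, and the surviving 100 collisions per second at one Megabyte each give the 0.1 Gigabytes per second stated in the text.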
[Figure: Estimated Mass Storage at CERN. Vertical axis: PetaBytes (0 to 140). Horizontal axis: year (1998 to 2010). Series: LHC; other experiments.]
[Figure: Estimated CPU Capacity at CERN. Vertical axis: K SI95 (0 to 6,000). Horizontal axis: year (1998 to 2010). Series: LHC; Moore’s law (based on 2000 data).]
How Enterasys Helps CERN in its Efforts
CERN has undertaken a number of projects that contribute to the amount of storage and CPU capacity required:
• CERN projects: LHC Computing Grid (LCG)
• EC-funded projects led by CERN: European DataGrid (EDG), Enabling Grids for E-Science in Europe (EGEE)
• Industry-funded projects: CERN Openlab for DataGrid applications

The LCG interconnects all data centers to form a Grid.
[Figure: LCG tiers. Tier 0, CERN: 10,000+ CPUs, 2 Tb/s+ network. Tier 1 centers: 5,000+ CPUs each. Tier 2 centers. Link label: 10 Gbps.]
Within the CERN Openlab for DataGrid applications, participating organizations like Enterasys, IBM, Intel, Hewlett Packard and others provide the latest technology, which enables CERN to:
• Build an ultra high-performance computer cluster
• Link it to the DataGrid and test its performance
• Evaluate the potential of new technologies for LCG
In this context, IBM is primarily responsible for the storage, Intel for the 64-bit Itanium CPUs and the Gigabit and 10-Gigabit Ethernet NICs, and Hewlett Packard for the servers. Enterasys delivers the networking equipment: high-speed Gigabit and 10-Gigabit Ethernet switches and routers.

[Figure: The CERN Opencluster. Labels: remote fabrics; WAN; 10 Gigabit external links; Terabit LAN; CPU servers; disk servers; storage systems.]
Enterasys Products Deployed at CERN
The relationship between Enterasys and CERN began in late 1998. At that time, CERN’s need for bandwidth had increased rapidly and their existing FDDI routing infrastructure could not cope. They wanted to migrate to Gigabit Ethernet routing, and after a market survey, they chose Enterasys’ X-Pedition™ switch routers to fulfill their requirements.

Production Network

The X-Pedition switch routers met a range of CERN’s criteria especially well:
• Large switching capacity
• Redundancy
• Layer 3 functionality (Unicast and Multicast Routing, ACLs etc.)
• Layer 4 features and Quality of Service
• Debugging
• Stability
• Software maturity
The network was up and running by February 1999 and is still in place today. In 2001, when a new project came up for the accelerator controls, more X-Pedition switch routers were deployed. By mid-2004, the CERN infrastructure was made up of the following:
• Enterasys X-Pedition switch routers: 28 X-Pedition 8600s and 50 X-Pedition 8000s
• 1,200 subnets
• 650 switches; ~15,000 ports
• 860 Ethernet hubs; ~20,000 ports
• 400+ servers with Gigabit Ethernet attachment
• 15,000 active connections
• 35,000 sockets; 1,200 km of UTP cable
• 170 starpoints (from 20 to 1,000 outlets)
• 2,500 km of fibers
• 1,500+ requests for adds, moves and changes per month
• Multivendor site using only standards-based equipment

The network design is fully redundant, so maximum uptime can be achieved.
[Figure: CERN production network. Labels: server farms; technical network; computer center; remote major starpoints; firewall; CIXP; Internet.]
CERN Openlab for DataGrid Applications
Since fall 2004, the Openlab network has been upgraded: it now includes more than 32 Hewlett Packard Itanium CPUs, the number of 10-Gigabit Ethernet connections has increased, and up to 400 Linux CPUs are attached with Gigabit Ethernet. The requirements for core network equipment include:
• Raw throughput for Gigabit and 10-Gigabit Ethernet, non-blocking: The product must be able to route and switch all front-panel ports at wire speed concurrently.
• Scalability beyond 10-Gigabit Ethernet: Throughput per slot should be upgradable to at least 80 Gbps to cope with new high-speed Ethernet standards (40-Gigabit Ethernet is under discussion).
• Extreme fairness for all attached ports: In the case of oversubscription by network design, all ports must be served equally, even without Quality of Service enabled.
• Guaranteed Quality of Service through the whole system: Queuing algorithms need to measure the congestion of an outgoing link. When the sources transmit at the maximum speed of the link, the link connected to the destination could become congested. The concept of the virtual queue is to collect the queue length information of all packets having a common output port and to transmit this information from the ingress module of the source input ports to the egress module of the common output port. This avoids congestion on the outgoing link.
• Throughput independent of traffic patterns: Typical enterprise switch products are flow based. This architecture enables a whole set of desirable features but is also a limiting factor for core products. If the traffic pattern is not predictable and the number of new flows exceeds the capacity of the system, system throughput decreases dramatically. An architecture that does a packet-by-packet lookup has clear advantages here: the throughput is the same whether the traffic consists of a single stream or entirely of new flows.
• High availability, 24 x 7 operation: This requires a system that can cover any failure without disrupting the traffic that flows through it. Even software updates have to be done without any service interruption.
• Standards based: Because of the scale of a Grid environment, any function or protocol used within it must be standards based.
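The virtual-queue and fairness requirements above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a description of how any Enterasys product is implemented: each ingress module keeps one queue for the common output port, the queue lengths are made visible to the egress side, and the egress port serves the non-empty queues in round-robin order at its own link rate. The class, method and parameter names are hypothetical.

```python
from collections import deque


class EgressPort:
    """Egress side of one output port, fed by per-ingress virtual queues."""

    def __init__(self, ingress_ids, packets_per_tick):
        # One virtual queue per ingress module for this common output port.
        self.virtual_queues = {i: deque() for i in ingress_ids}
        self.order = list(ingress_ids)            # round-robin service order
        self.next_index = 0                       # where the next search starts
        self.packets_per_tick = packets_per_tick  # capacity of the outgoing link

    def enqueue(self, ingress_id, packet):
        """An ingress module queues a packet destined for this output port."""
        self.virtual_queues[ingress_id].append(packet)

    def queue_lengths(self):
        """Queue-length information reported from ingress to egress."""
        return {i: len(q) for i, q in self.virtual_queues.items()}

    def tick(self):
        """Send at most packets_per_tick packets, one per queue in turn."""
        sent = []
        while len(sent) < self.packets_per_tick:
            lengths = self.queue_lengths()
            if not any(lengths.values()):
                break  # nothing is waiting on any ingress module
            # Find the next non-empty queue in round-robin order.
            for offset in range(len(self.order)):
                idx = (self.next_index + offset) % len(self.order)
                ingress_id = self.order[idx]
                if lengths[ingress_id] > 0:
                    sent.append(self.virtual_queues[ingress_id].popleft())
                    self.next_index = (idx + 1) % len(self.order)
                    break
        return sent


# Three ingress modules oversubscribe one output port; A offers far more than B or C.
port = EgressPort(ingress_ids=["A", "B", "C"], packets_per_tick=3)
for n in range(30):
    port.enqueue("A", ("A", n))
for n in range(6):
    port.enqueue("B", ("B", n))
    port.enqueue("C", ("C", n))

served = {"A": 0, "B": 0, "C": 0}
for _ in range(4):  # four time slots on the outgoing link
    for source, _seq in port.tick():
        served[source] += 1
print(served)
```

Although source A offers five times as much traffic as B or C, each source receives the same share of the congested link, and the egress port never sends more than the outgoing link can carry in a time slot.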
To meet and exceed these criteria within the new core network environment, CERN has deployed state-of-the-art products from Enterasys. The Matrix N-Series is used to aggregate 10/100/1000 Ethernet connections into 10-Gigabit Ethernet trunks toward the core. Within the core itself, the Matrix X, Enterasys’ next-generation technology, is being evaluated. The Matrix X provides non-blocking, single-stream 10-Gigabit Ethernet true routing capabilities. The overall system performance of the Matrix X can reach up to 5.12 Terabits per second and the I/O performance can reach up to 1.2 Terabits per second. This architecture supports up to 64 10-Gigabit ports in a single system, or up to 128 10-Gigabit ports in a 19" rack. Enterasys Operating System firmware provides the Matrix X with standards-based support for redundancy (802.1w RSTP, 802.3ad LACP, VRRP, OSPF, IS-IS, BGP), and its integrated high-availability features, such as hitless software upgrades, ensure that this product is perfectly suited to core Grid applications. The following diagram depicts the current Openlab cluster design.
[Figure: Openlab cluster design. Labels: 400 Linux CPUs; 1000Base-TX Ethernet; Enterasys Matrix N7; 10GBase-SR Ethernet (trunks); Enterasys Matrix X16; HP Itanium 64-bit.]
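As a rough consistency check on the Matrix X port counts quoted above, the aggregate capacity of the front-panel ports can be computed directly. The sketch below assumes full-duplex ports and counts each direction separately; whether the quoted I/O figure is counted this way is an assumption, and the helper names are illustrative.

```python
PORT_SPEED_GBPS = 10          # 10-Gigabit Ethernet
PORTS_PER_SYSTEM = 64         # maximum quoted for a single system
PORTS_PER_RACK = 128          # maximum quoted for a 19-inch rack
SYSTEM_PERFORMANCE_GBPS = 5120  # 5.12 Terabits per second, as quoted


def aggregate_gbps(ports, port_speed_gbps, full_duplex=True):
    """Total port capacity in Gbps; full duplex counts both directions."""
    directions = 2 if full_duplex else 1
    return ports * port_speed_gbps * directions


def is_non_blocking(fabric_gbps, ports, port_speed_gbps):
    """True if the fabric can carry every port at wire speed in both directions."""
    return fabric_gbps >= aggregate_gbps(ports, port_speed_gbps)


system_one_way = aggregate_gbps(PORTS_PER_SYSTEM, PORT_SPEED_GBPS, full_duplex=False)
system_two_way = aggregate_gbps(PORTS_PER_SYSTEM, PORT_SPEED_GBPS)
rack_two_way = aggregate_gbps(PORTS_PER_RACK, PORT_SPEED_GBPS)
fits = is_non_blocking(SYSTEM_PERFORMANCE_GBPS, PORTS_PER_SYSTEM, PORT_SPEED_GBPS)

print(f"Single system, one direction:   {system_one_way} Gbps")
print(f"Single system, both directions: {system_two_way} Gbps")
print(f"Full rack, both directions:     {rack_two_way} Gbps")
print(f"Within quoted system performance: {fits}")
```

Under these assumptions, 64 ports add up to 1,280 Gbps in both directions, which is close to the quoted I/O performance of up to 1.2 Terabits per second and well within the quoted overall system performance.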
Summary
CERN faces a number of challenges in meeting all the requirements for LHC computing. Enterasys and CERN have a long-term relationship, a large part of which is Enterasys’ continued participation in the Openlab for DataGrid applications. The new Enterasys products, especially the Matrix X, are perfectly suited to meeting the needs of Grid computing. For more information about the Matrix X, or about any other Enterasys product or solution, visit the web at enterasys.com.
All contents are copyright © 2005 Enterasys Networks, Inc. All rights reserved. Lit. #9013829-1 01/05