Providing Quality of Service Networking to High Performance GRID Applications
    Miguel Rio, Javier Orellana, Andrea di Donato, Frank Saka, Ian Bridge, Peter Clarke
          Networked Systems Centre of Excellence, University College London

ABSTRACT

This paper reports on QoS experiments and demonstrations carried out in the MB-NG and DataTAG EU projects. These are leading-edge network research projects involving more than 50 researchers in the UK, Europe and North America, concerned with the development and testing of protocols and standards for the next generation of high speed networks.

We implemented and tested the Differentiated Services Architecture (DiffServ) in a multi-domain, 2.5 Gbit/s network (the first such deployment), defining appropriate Service Level Agreements (SLAs) to be used between administrative domains to guarantee end-to-end Quality of Service.

We also investigated the behaviour of DiffServ on high bandwidth, high delay development networks connecting Europe and North America using a variety of manufacturers' equipment.

These quality of service tests also included innovative MPLS (Multi-Protocol Label Switching) experiments to establish guaranteed bandwidth connections for GRID applications in a fast and efficient way.

Finally, we report on our experience delivering Quality of Service networking to high performance applications such as Particle Physics data transfer and High Performance Computation. This included the implementation and development of middleware, incorporated in the Globus toolkit, that enables these applications to use these network services easily.

I. INTRODUCTION

Recent years have seen the appearance of a wide range of scientific applications with extremely high demands on the network. Once more, scientists require the network to be pushed to its limits, challenging the idea that bandwidth availability is no longer a problem.

Unlike traditional applications such as email, the WWW or even peer-to-peer systems, these applications require reliable file transfers on the order of 1 Gb/s and often have tight delay requirements. High Energy Physics, Radio Astronomy and High Performance Steered Simulations cannot achieve their goals in a sustainable, efficient and reliable way over current production networks, which are almost entirely based on a best-effort service model.

Although part of the networking community believes that bandwidth over-provisioning will always solve every network problem, our work shows that Quality of Service enabled networks play a vital role in supporting high performance applications efficiently, inexpensively and with little additional configuration work.

QoS performance has been studied exhaustively through analytical work and simulation (see for example [1]). This work is of great importance and relevance, but it has two drawbacks. First, simulation models have proved to be incomplete [2,3] because they fail to represent all possible real configurations of the network. Second, they fail to account for several implementation details of "real" networks, such as operating system tuning, driver configuration, and memory and CPU overflows. Testbed networks therefore play a vital role in network research as a way of consolidating technology and, through exhaustive debugging and testing, of providing implementation guidelines to the QoS network user community.

The work reported here used two testbeds with different characteristics: a United Kingdom testbed used in the context of the MB-NG
project and a European/Transatlantic one used in the context of the DataTAG project.

The Managed Bandwidth - Next Generation (MB-NG) project [4] created a pan-UK networking and Grid testbed that focused on advanced networking issues and the interoperability of administrative domains. The project addressed the issues which arise in high performance inter-Grid networking, including sustained and reliable high performance data replication and end-to-end advanced network services.

The MB-NG testbed is shown in Figure 1. It consists of a triangle connecting RAL, the University of Manchester and London at OC-48 (2.5 Gb/s) speeds using Cisco 12000 routers. Each of the edge domains is built with two Cisco 7600 routers interconnected at 1 Gb/s.

Figure 1: MB-NG Network (labels: Manchester, UKERNA, London; 2.5Gb/s)

The DataTAG project [5] created a large-scale intercontinental Grid testbed involving the European DataGrid project, several national projects in Europe, and related Grid projects in the USA. It involves more than 40 people and, among other things, is researching inter-domain Quality of Service and high throughput transfers over high delay networks, making use of an intercontinental link connecting Geneva to Chicago (see Figure 2).

This paper is organized as follows. In the next section we describe the Differentiated Services Architecture and some experimental results from implementing it in our testbeds. In section III we describe MPLS and why it is a useful technology for GRID applications. In section IV we discuss the definition of Service Level Agreements between administrative domains, followed by the description of Middleware and the Control Plane in section V. We describe some GRID demonstrations using our QoS testbeds in section VI and present conclusions and further work in section VII.

Figure 2: DataTAG Network

II. DIFFERENTIATED SERVICES

In the last decade there have been many attempts to provide a Quality of Service enabled network that extends the current best-effort Internet by allowing applications to request specific bandwidth, delay or loss. These included completely new networks like ATM [6] or "extensions" to TCP/IP like the Integrated Services Architecture [7]. Both these approaches required state to be stored in every router along the entire path for every connection. It was soon realized that core routers would not be able to cope and that these architectures would not scale.

With the Differentiated Services Architecture [8] a simpler solution was proposed. Traffic at the edges is classified into classes, and routers in the core only have to schedule the traffic among these classes. Typically, routers in the core only deal with approximately 10 classes, which makes the architecture easier to implement and manage. Routers at the edges may have to maintain per-flow state (especially the first router in the path), but since the amount of traffic they handle is several orders of magnitude smaller this is not a significant problem.

However, applications can now only choose the class into which they want to be classified, as opposed to giving a complete traffic specification as in the IntServ architecture. This makes experimentation in testbeds crucial to understand how applications should make use of a DiffServ enabled network.
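The split described above, with classification at the edge and per-class scheduling in the core, can be sketched as a small fluid model in Python. This is an illustration under stated assumptions, not router code: the names `classify`, `allocate` and `ClassConfig` are hypothetical; the DSCP values (46 for EF, 0 for BE, 8 for LBE) and the link shares (EF 10%, BE 88%, LBE 1%) are borrowed from the Cisco IOS example in Appendix A; and the scheduler is modelled as strict priority for EF plus weighted max-min sharing of the remaining capacity, which only approximates what a real class-based scheduler does. For simplicity EF is capped at its share at all times.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical edge classification table; the DSCP values mirror Appendix A.
DSCP_TO_CLASS = {46: "EF", 0: "BE", 8: "LBE"}


def classify(dscp: int) -> str:
    """Edge router: map a packet's DSCP value to a class (unknown -> BE)."""
    return DSCP_TO_CLASS.get(dscp, "BE")


@dataclass
class ClassConfig:
    ef_percent: float           # link share given to (and enforced on) EF
    weights: Dict[str, float]   # bandwidth percent of each non-EF class


def allocate(capacity: float, offered: Dict[str, float],
             cfg: ClassConfig) -> Dict[str, float]:
    """Core router: share one link among class aggregates (fluid model).

    EF is served first but policed to ef_percent of the link. The rest is
    divided among the other classes by weighted max-min fairness, so
    capacity left idle by one class is reused by the others.
    """
    result = {"EF": min(offered.get("EF", 0.0),
                        capacity * cfg.ef_percent / 100.0)}
    remaining = capacity - result["EF"]
    for c in cfg.weights:
        result[c] = 0.0
    active = {c for c in cfg.weights if offered.get(c, 0.0) > 0}

    while active and remaining > 1e-9:
        total_w = sum(cfg.weights[c] for c in active)
        share = {c: remaining * cfg.weights[c] / total_w for c in active}
        # Classes whose whole demand fits within their weighted share.
        satisfied = {c for c in active if offered[c] <= share[c]}
        if not satisfied:
            # Every class is bottlenecked: hand out the shares and stop.
            for c in active:
                result[c] = share[c]
            break
        for c in satisfied:
            result[c] = float(offered[c])
            remaining -= offered[c]
        active -= satisfied
    return result


if __name__ == "__main__":
    cfg = ClassConfig(ef_percent=10, weights={"BE": 88, "LBE": 1})
    # 1000 Mbps link, offered loads in Mbps.
    print(allocate(1000, {"EF": 200, "BE": 600, "LBE": 600}, cfg))
    # {'EF': 100.0, 'BE': 600.0, 'LBE': 300.0}
```

In this run EF is held to its 10% share even though it offers 200 Mbps, Best-effort gets everything it asks for, and LBE only receives what is left over. When both BE and LBE saturate the link, LBE falls back to its small weighted share (about 1% of the link), which is the qualitative behaviour the LBE and EF tests below set out to verify. The core only needs the per-class table, never per-flow state.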
Our first tests used iperf [9], a traffic generation tool (which we used to generate UDP flows), to understand the end-to-end effects produced by such a network. We tested the implementation of two new services, Less than Best-effort (LBE) and Expedited Forwarding (EF), which will be of use to different kinds of applications. Both classes were tested individually against Best-effort traffic.

The Less than Best-effort tests can be seen in Figure 3, where we injected traffic in two classes, increasing the offered load on both of them simultaneously. Here the scheduling mechanisms guarantee at least 1% of the link capacity to LBE and the rest to normal Best-effort. When there is no congestion both classes share the link equally. When the link becomes saturated LBE traffic is dropped and, in the extreme, only 1% of the link capacity is guaranteed to LBE.

Figure 3: Less than Best-effort (throughput in Mbps against per-flow offered load in Mbps; Best Effort and LBE traffic)

Expedited Forwarding is the premium service in the DiffServ architecture. Whatever the congestion level of the link, EF traffic should always receive the same treatment. In our example (see Figure 4) 10% of the link capacity is allocated to EF and this is always guaranteed. If EF traffic exceeds that percentage the excess traffic is dropped or, less frequently, remarked to a lower priority.

Figure 4: Expedited Forwarding (panel "Throughput OC-48 (EF-10%)": throughput in Mbps against per-flow offered load in Mbps; curves: BE1+BE2+BE3 received, EF received)

III. MPLS

MPLS (Multiprotocol Label Switching) [10] has its origins in the IP over ATM effort. It was soon realized, however, that a label switching technology could be extended to other layer 2 technologies and complement IP on a global scale.

MPLS is considered by some as a layer 2.5 technology since it resides between IP and the underlying medium. It adds a small label to each packet, and the forwarding decision is made solely on the basis of this label. To establish these labels a signaling protocol, such as RSVP-TE [11], must be used. These signaling protocols allow the route that MPLS flows will take to be specified explicitly, bypassing normal routing tables (see Figure 5). Here we can see Label Edge Routers (LER) classifying traffic into a specific label and Label Switch Routers (LSR) forwarding traffic according to these labels.

MPLS has two major uses: Traffic Engineering and VPNs (Virtual Private Networks). In this work we were mainly concerned with the former. The ability to switch based on a label, as opposed to traditional IP forwarding based solely on the IP destination address, allows us to manage available bandwidth more efficiently. Since GRID applications have traffic flows orders of magnitude bigger than those of traditional applications but will represent a small percentage of the flows in the network, it becomes cost efficient to select dedicated paths for their flows. This way we can be sure that the network complies with the QoS constraints and that the available bandwidth is used more efficiently.

Figure 5: MPLS Example
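The LER/LSR division of labour described above can be illustrated with a toy model in Python. It is a hedged sketch, not an MPLS implementation: `LSR`, `setup_lsp` and `forward` are hypothetical names; each router allocates labels from its own counter starting at 16 (label values 0 to 15 are reserved in MPLS); labels are chosen from the egress backwards in a way that only loosely resembles an RSVP-TE reservation; and neither the signaling messages nor any bandwidth accounting are modelled. What it does show is that, once the path is set up, interior routers forward on the label alone and the explicit route is followed regardless of what the IP routing table would choose.

```python
import itertools
from typing import Dict, Hashable, List, Optional, Tuple

# (outgoing label, next hop); (None, None) means "pop and hand to IP".
Entry = Tuple[Optional[int], Optional[str]]


class LSR:
    """A label switch router with a per-router label forwarding table."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.lfib: Dict[int, Entry] = {}
        self._labels = itertools.count(16)  # values 0-15 are reserved

    def allocate_label(self) -> int:
        return next(self._labels)


def setup_lsp(routers: Dict[str, LSR], path: List[str], fec: Hashable,
              ingress_table: Dict[Hashable, Entry]) -> None:
    """Install an explicitly routed LSP along `path` (ingress LER first).

    Each downstream router picks the label it expects to receive, working
    back from the egress; signaling messages are not modelled.
    """
    out_label: Optional[int] = None   # the egress pops the label
    next_hop: Optional[str] = None
    for name in reversed(path[1:]):
        in_label = routers[name].allocate_label()
        routers[name].lfib[in_label] = (out_label, next_hop)
        out_label, next_hop = in_label, name
    # The ingress LER maps the FEC to the first label and first hop.
    ingress_table[fec] = (out_label, next_hop)


def forward(routers: Dict[str, LSR], ingress_table: Dict[Hashable, Entry],
            fec: Hashable) -> List[str]:
    """Return the routers a packet of `fec` visits after the ingress LER."""
    label, hop = ingress_table[fec]       # LER: classify and push a label
    visited: List[str] = []
    while hop is not None:
        visited.append(hop)
        out_label, next_hop = routers[hop].lfib[label]  # label lookup only
        if out_label is None:             # egress: pop, continue as plain IP
            break
        label, hop = out_label, next_hop
    return visited


if __name__ == "__main__":
    routers = {n: LSR(n) for n in "ABCDE"}
    ingress: Dict[Hashable, Entry] = {}
    grid_fec = ("192.0.2.0/24", 46)       # hypothetical FEC: prefix + DSCP
    setup_lsp(routers, ["A", "B", "C", "D"], grid_fec, ingress)
    print(forward(routers, ingress, grid_fec))  # ['B', 'C', 'D']
```

Here the LSP is pinned to A-B-C-D. A second LSP from A via E to D could coexist with it, and because labels are only locally significant the same value (16) can appear at several routers without ambiguity.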
Unfortunately, at the time of writing, the MPLS implementations we used did not allow MPLS traffic to be isolated in case of congestion. Although the signaling protocol allows a bandwidth to be specified for the flow, this bandwidth is not guaranteed when links are congested. To enforce the bandwidth guarantee, traffic needs to be policed at the edge of the network.

In MB-NG we executed tests to verify whether MPLS could be used to reserve bandwidth for a given TCP flow. Figure 6 shows the result of reserving an MPLS tunnel for a given TCP connection using explicit routes. Not only could we optimize the use of network bandwidth, but we could also guarantee, with very good precision, a 400 Mbit/s connection to our application (in this case just simulated traffic generated with iperf). The typical TCP saw-tooth behaviour did not prevent us from having a very stable TCP connection in a congested network.

Figure 6: MPLS for Flow Reservation (panel "TCP flow over MPLS tunnel"; curves: Single TCP connection, Background Traffic)

IV. SERVICE LEVEL AGREEMENTS

An important issue in the implementation of an end-to-end Differentiated Services enabled network is the definition of Service Level Agreements (SLAs) between Administrative Domains. Because individual flows are only inspected at the edge of the network, strong, enforceable agreements about the traffic aggregates need to be made at the border between every pair of domains, so that all the guarantees made by all the providers can be met.

One of the goals of MB-NG, as a leading multi-domain DiffServ experimental network, is to provide guidelines for the definition and implementation of Service Level Agreements.

As a first definition, we are trying to standardize the definition of IP Premium (or EF, Expedited Forwarding [12], in the DiffServ literature). This service is to be used by applications that require tight delay bounds. The SLA is divided into two parts: an administrative part and a Service Level Specification (SLS) part. The SLS contains information about:
•  Scope - defines the topological region to which the IP Premium service will be provided
•  Flow Description - indicates to which IP packets the QoS guarantees of the SLS will be applied
•  Performance Guarantees - describes the guarantees that the network offers to the customer for the packet stream described by the flow descriptor over the topological extent given by the scope value. The suggested performance parameters for IP Premium are:
     o  One-way delay
     o  Inter-packet delay variation
     o  One-way packet loss
     o  Capacity
     o  Maximum Transfer Unit
•  Traffic Envelope and Traffic Conformance
•  Excess treatment
•  Service Schedule
•  User visible SLS metrics

This SLA template is inspired by the one DANTE defined in [13] and is described in more detail in [14].

V. MIDDLEWARE FOR THE CONTROL PLANE

The final piece needed to make the potential of QoS enabled networks available to applications is a usable and efficient control plane. Even when the network is configured to support Differentiated Services and/or MPLS, it is unreasonable to assume human intervention for every flow request. There has to be a way for applications to request resources from the network. In the Integrated Services Architecture, applications would use a signaling protocol like RSVP [15] to allocate resources.
In a DiffServ network, resources are not allocated along the entire path for a specific flow, so another way of reserving them is needed. The literature [16] describes two mechanisms to achieve this: in the first, RSVP is used and DiffServ clouds are seen as single hops; in the second, a Bandwidth Broker [8] per domain is used. The application "contacts" the Bandwidth Broker, which is responsible for checking, guaranteeing and possibly reserving resources.

It is still unclear how a network of Bandwidth Brokers will work in the future, and serious doubts about its scalability are often raised. In our context, with a small number of users on a small number of computers in a small number of administrative domains, we can postpone the scalability issue and implement a bandwidth broker architecture that works in our testbeds.

We are currently researching the implementation of two architectures: the GARA architecture [17] and, to a lesser extent, the GRS architecture [18].

GARA was first presented in [17] and is tightly connected to the Globus toolkit middleware package [19] (although in the future it may be made standalone). It is designed to be the module that reserves resources (mainly, but not only, network resources) in GRIDs.

GARA follows the Bandwidth Broker architecture shown in Figure 7: a program called the Bandwidth Broker (BB) receives requests from applications and reserves, when possible, appropriate resources in the network. The Bandwidth Broker is designed to interact with heterogeneous networks, hiding the particulars of each router/switch implementation from the final users. Applications should be linked with the client part of the middleware to interact with the BB.

In both MB-NG and DataTAG we are contributing to the development of GARA and implementing it on the testbeds, running trials with simple applications. Since applications need to be modified to interact with GARA, it is unreasonable to expect all applications to work with it. We are porting simple file transfer applications and creating documentation on how to port others.

Figure 7: Bandwidth Broker Architecture

VI. DEMONSTRATIONS

The concluding part of the work reported here was to run demonstrations of High Performance Applications on our QoS testbed. These kinds of applications will drive future network research and are, therefore, a vital piece of our work. We worked with two applications: High Performance Visualisation and High Energy Particle Physics.

High Performance Computing (HPC) Visualisation applications have requirements quite different from those of pure data transfer applications. Because they are frequently interactive, they have tight delay constraints in both directions of the communication. When these requirements are coupled with high transfer rates (between 500 Mbit/s and 1 Gb/s) the need for new network paradigms becomes evident.

The RealityGRID project (see Figure 8) aims to grid-enable the realistic modelling and simulation of complex condensed matter structures at the meso- and nanoscale levels, as well as the discovery of new materials. The project also involves applications in bioinformatics, and its long-term ambition is to provide generic technology for grid-based scientific, medical and commercial activities.

Our tests with RealityGRID consist of transferring visualisation data from a remote high performance graphics server to a user's client interface. The Differentiated Services enabled network allows the application to run
seamlessly across multiple domains under several degrees of network congestion.

Figure 8: Communications between simulation, visualisation and client in the RealityGRID project

The second application on which we focused our tests is High Energy Particle Physics (HEP). By the nature of its large international collaborations and data-intensive experiments, particle physics has long been in the vanguard of computer networking for scientific research. The need for particle physics to engage in the development of high-performance networks for research is becoming stronger in the latest generation of experiments. These experiments are producing, or will produce, so much data that the traditional model of data production and analysis, centred on the laboratories at which the experiments are located, is no longer viable, and the experiments can only be exploited through the creation of large distributed computing facilities. These facilities need to exchange very large volumes of data in a controlled production schedule, often with low real-time requirements on characteristics such as delay or jitter, but with very high aggregate throughputs.

The requirements of HEP in data transport and management are one of the high-profile motivations for "hybrid service networks". The next generation of collider experiments will produce vast datasets measured in Petabytes that can only be processed by globally distributed computing resources (see Figure 9). High-bandwidth data transport between federated processing centres is therefore an essential component of the reconstruction and analysis of events recorded by HEP experiments.

Our tests in an HEP environment tend to involve bulk data transfer, where the delay requirements are not as tight. The use of LBE (Less than Best-effort) is appropriate for this scenario, since we can use spare capacity when it is available without affecting the rest of the traffic when the network is congested or near congestion.

Figure 9: HEP Data Collection

VII. CONCLUSIONS AND FURTHER WORK

Experimental work in network testbeds plays a crucial part in network research. Many problems that go undetected by analytical and simulation work are found by practical experiments at an early stage. Experiments also provide good feedback for new topics of theoretical research.

From our experiments we conclude that quality of service networks will play an important role in future GRIDs and in IP networks in general. Bandwidth over-provisioning of the core network does not solve all the problems, and on its own it cannot guarantee end-to-end quality of service.

Both Differentiated Services and MPLS allow the creation of valuable services for the scientific community with no major extra administration effort.

We successfully demonstrated high performance applications in a multi-domain QoS network, showing major qualitative improvements in the quality perceived by the final users.

As current work we are trying to integrate the GARA middleware into the OGSA [20] architecture and researching how to scale the Bandwidth Broker architecture to several domains. Work is also being done on the integration of AAA (Authentication, Authorization and Accounting) mechanisms into the GARA framework to solve crucial security problems arising in the GRID.
Our QoS tests are being extended to research the behaviour of new TCP proposals in a Differentiated Services enabled network. This will enable applications that require reliable transfers to use a QoS network more efficiently.

VIII. REFERENCES

[1] Best-Effort versus Reservations: A Simple Comparative Analysis, Lee Breslau and Scott Shenker, in Proceedings of SIGCOMM 98, September 1998.
[2] Difficulties in Simulating the Internet, Sally Floyd and Vern Paxson, IEEE/ACM Transactions on Networking, volume 9, number 4, 2001.
[3] Internet Research Needs Better Models, Sally Floyd and Eddie Kohler, in Proceedings of HotNets-I, October 2002.
[4]
[6] Essentials of ATM Networks and Services, Oliver Ibe, Addison Wesley, 1997.
[7] RFC 1633 - Integrated Services in the Internet Architecture: an Overview, R. Braden, D. Clark, S. Shenker, June 1994.
[8] RFC 2475 - An Architecture for Differentiated Services, S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, W. Weiss, December 1998.
[10] MPLS: Technology and Applications, Bruce S. Davie and Yakov Rekhter, Morgan Kaufmann Series on Networking, May 2000.
[11] RSVP Signaling Extensions for MPLS Traffic Engineering, White Paper, Juniper Networks, May
[12] RFC 2598 - An Expedited Forwarding PHB, Van Jacobson, K. Nichols, K. Poduri, June 1999.
[13] SLA definition for the provision of an EF-based service, Christos Bouras, Mauro Campanella and Afrodite Sevasti, Technical Report.
[14] SLA definition - MB-NG Technical Report (work in progress).
[15] RSVP: A New Resource Reservation Protocol, Lixia Zhang, Steve Deering, Deborah Estrin, Scott Shenker, Daniel Zappala, IEEE Network, volume 7, number 5, September 1993.
[16] Internet Quality of Service: Architectures and Mechanisms, Zheng Wang, March 2001.
[17] A Quality of Service Architecture that Combines Resource Reservation and Application Adaptation, Ian Foster, A. Roy, V. Sander, in Proceedings of the 8th International Workshop on Quality of Service (IWQoS 2000), June 2000.
[18] Decentralised QoS Reservations for Protected Network Capacity, S. N. Bhatti, S. A. Sorenson, P. Clarke and J. Crowcroft, in Proceedings of TERENA Networking Conference 2003, 19-22 May 2003.
[20] The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration, Ian Foster, Carl Kesselman, Jeffrey M. Nick and Steven Tuecke.

APPENDIX A: CONFIGURATION EXAMPLE

The following is an example of a DiffServ configuration in Cisco IOS. As can be seen, the amount of configuration needed in each router is minimal.

    class-map match-any EF
      match ip dscp 46
    class-map match-any BE
      match ip dscp 0
    class-map match-any LBE
      match ip dscp 8
    !
    !
    policy-map UCL_policy
      class BE
        bandwidth percent 88
      class LBE
        bandwidth percent 1
      class EF
        priority percent 10
    interface POS4/1
      …
      service-policy output UCL_policy
      …
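Because this configuration is little more than a table of classes, DSCP values and link shares, it can be generated rather than typed router by router. The helper below is a hypothetical sketch, not a tool used in MB-NG: `render_policy` and the `CLASSES` table are illustrative names, the output uses only the IOS commands that appear in the example above, and the 100% check is a basic sanity test rather than a platform rule (real routers may impose their own limits on reservable bandwidth).

```python
from typing import List, Tuple

# Hypothetical class table: (class name, DSCP, IOS action, percent of link).
# The entries reproduce the Appendix A example.
CLASSES: List[Tuple[str, int, str, int]] = [
    ("EF", 46, "priority", 10),
    ("BE", 0, "bandwidth", 88),
    ("LBE", 8, "bandwidth", 1),
]


def render_policy(classes: List[Tuple[str, int, str, int]],
                  policy_name: str, interface: str) -> str:
    """Emit class-maps, a policy-map and the interface attachment as text.

    Only the commands used in the Appendix A example are generated;
    anything else an interface needs must be added by hand.
    """
    for name, _, action, _ in classes:
        if action not in ("priority", "bandwidth"):
            raise ValueError(f"unsupported action {action!r} for class {name}")
    total = sum(pct for _, _, _, pct in classes)
    if total > 100:
        raise ValueError(f"class shares add up to {total}% of the link")

    lines: List[str] = []
    for name, dscp, _, _ in classes:
        lines += [f"class-map match-any {name}", f"  match ip dscp {dscp}"]
    lines += ["!", f"policy-map {policy_name}"]
    for name, _, action, pct in classes:
        lines += [f"  class {name}", f"    {action} percent {pct}"]
    lines += ["!", f"interface {interface}",
              f"  service-policy output {policy_name}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_policy(CLASSES, "UCL_policy", "POS4/1"))
```

Keeping the class table separate from the syntax means the same SLA parameters can be pushed consistently to every edge router of a domain, and a table whose shares over-commit the link is rejected before it reaches a router.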
