Preproposal

ITR/SI+AP: GUARANTEEING HIGH-BANDWIDTH CONNECTIONS IN A
DYNAMIC DISTRIBUTED NETWORK: DEVELOPMENT OF A
DEPLOYABLE, AUTHENTICATED QoS



INVESTIGATORS

William A. Adamson, Research Investigator, Information Technology Division, Assistant
       Director, Center for Information Technology Integration, University of Michigan

Brian Athey, Assistant Professor of Biology, Director, University of Michigan Visible Human
       Project

Steve Corbató, Director, Backbone Network, University Corporation for Advanced Internet
       Development

Farnam Jahanian, Associate Professor of Electrical Engineering and Computer Science, Director
      of Software Systems Laboratory

Shawn P. McKee, Assistant Research Scientist, Department of Physics, University of Michigan

Eric Myers, Research Physicist, Department of Physics, University of Michigan

Homer A. Neal, Professor of Physics, Director, UM-ATLAS Collaboratory Project, University of
      Michigan

Jeffrey C. Ogden, UM Internet2 Coordinator, Associate Director, Merit Network

Kang G. Shin, Professor of Electrical Engineering and Computer Science, Director, Real-Time
      Computing Laboratory, University of Michigan

Victor K. Wong, Academic Liaison and Director, Information Technology for Research,
       University of Michigan



PARTICIPATING INSTITUTIONS

University of Michigan
University Corporation for Advanced Internet Development
Merit Network, Inc.


EXECUTIVE SUMMARY

We propose to develop a secured, dynamic Quality of Service (QoS) computer network in a
production environment. To our knowledge, this has not been done before. We will focus our efforts mainly on
implementing QoS at end-points and at the gigaPoP level, but with a view towards using QoS
over intermediate Internet2 QoS domains when that becomes possible. This proposal builds on
current work, seed funded by several institutions, to develop a working QoS channel between
researchers at two end-points, specifically the University of Michigan (U-M) Physics
Department in Ann Arbor and the European Laboratory for Particle Physics (CERN) in Geneva,
Switzerland. We will do this by 1) implementing a system of bandwidth brokers which use
distributed authentication and authorization to dynamically control differentiated services at the
router level; 2) implementing additional intermediate QoS domains and/or additional end-points;
3) studying this network to further understand the requirements for a scalable authorization
policy and to understand the effects of network topology on performance; and 4) testing the
network in real-world situations against practical research applications that require QoS. We
expect our proposed QoS network to serve as a deployable template for other research and
education communities that need to make use of QoS. Our larger goal is to enable a whole series
of collaborative research applications that require QoS for optimal performance, such as
high-quality interactive video streaming and large-scale distributed computing. We will test our QoS
network with several such applications, including those of direct use to the current research
efforts of the ATLAS project – one of the world’s largest scientific collaborations – and the NGI
Visible Human project. We will also test our work with a variety of network monitoring and
diagnostic tools.


BACKGROUND: The need for QoS

Many research tasks of current and future interest require the transfer of large amounts of data
across computer networks in a timely fashion. The Internet as it is presently constructed is not
up to this task. Today’s Internet provides only a “best effort” data delivery service where all
network traffic receives the same priority, whether it contains time-sensitive medical data or is
just part of a popular music file. When network traffic exceeds the capacity of some network
segments to carry all of the traffic, the network responds to the congestion by dropping some
packets, which must then be resent. Network congestion is usually not serious enough to keep
the resent packets from eventually reaching their final destination, but the need to re-send the
packets causes delays. Well-behaved network applications are expected to slow their rate of data
transmission when they re-send packets, and thus adapt to the network resources that are
available. This allows large numbers of users and applications to share the network, but the
speed of delivery and the amount of network bandwidth available to a specific user and
application varies. The variability in network performance is beyond the control of any single
user or application.
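
The back-off behavior described above can be illustrated with a minimal sketch of additive-increase/multiplicative-decrease (AIMD), the rule used in TCP congestion avoidance. The function name, the window units (segments per round trip), and the loss pattern are illustrative assumptions, not measurements from any network.

```python
def aimd_window(loss_events, start=1.0, increase=1.0, decrease=0.5):
    """Return the sending window (in segments) after each round trip.

    loss_events: iterable of booleans, True if a packet was lost in that round.
    The window grows by `increase` after a loss-free round and is multiplied
    by `decrease` after a loss, never dropping below one segment.
    """
    window = start
    history = []
    for lost in loss_events:
        if lost:
            window = max(1.0, window * decrease)
        else:
            window += increase
        history.append(window)
    return history


# Hypothetical loss pattern: congestion on the 4th and 7th round trips.
pattern = [False, False, False, True, False, False, True, False]
print(aimd_window(pattern))  # [2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 2.0, 3.0]
```

The sawtooth in the output is the variability referred to above: the rate an application achieves depends on when other traffic causes loss, not on what the application needs.
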
        The variable performance provided by the Internet’s “best effort” delivery service works
well for many network applications, but not for others. Examples of demanding applications that
do not work well on today’s Internet include high quality interactive video, the real-time control
of remote scientific instruments, scientific visualization where the computational function and
the display function are remote from each other, and distributed “grid” computations where the
use of remote computational and storage elements must be tightly coordinated in order to
perform the computation efficiently.
         Two approaches are being pursued to overcome the limitations of “best effort” delivery.
The first and so far the most common approach is to build specialized portions of the network
that are capable of very high performance and which are open to relatively few users. This
approach over-provisions portions of the network in the hope that there will be little or no
network congestion, and hence no need to re-send packets, and no need for applications to slow
the data rate at which they transmit to the network, resulting in good network performance with
little variation due to external factors.
         The other approach is to implement different levels of network service that may be
requested by an application, so that important or time-sensitive traffic is given preference over
other traffic, much like first-class mail is given preference over parcel post. With this approach
there is no longer a single best effort service. Instead, data packets are marked to indicate the
service level they require and capable networks give the packets different priorities. The general
name for such differentiated service is “Quality of Service” (QoS). These two approaches,
over-provisioning and QoS, are not mutually exclusive. In fact, it seems likely that significant
progress will require that both approaches be pursued.
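
As a minimal sketch of what marking a packet can look like at an end-point, the following assumes the IETF Differentiated Services convention, in which a 6-bit code point (DSCP) occupies the upper six bits of the IP type-of-service byte. The choice of the Expedited Forwarding code point (46) and the helper names are assumptions made for illustration; the proposal does not prescribe a particular code point. The `IP_TOS` socket option is available on most Unix-like systems but may not be on every platform.

```python
import socket

# Expedited Forwarding code point from the Differentiated Services
# standards (an assumption here; no code point is named in the text).
DSCP_EF = 46


def dscp_to_tos(dscp):
    """Place a 6-bit DSCP value in the upper six bits of the IP TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


def open_marked_udp_socket(dscp=DSCP_EF):
    """Create a UDP socket whose outgoing IPv4 packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


print(hex(dscp_to_tos(DSCP_EF)))  # 0xb8
```

Marking alone grants nothing: routers honor the mark only where the network has been configured to do so, which is why the admission and policing steps discussed later matter.
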
         A particular example of the need for QoS is the ATLAS project [1], which involves
almost 2000 physicists around the world working to design, construct, test, and operate a particle
physics detector for CERN’s Large Hadron Collider (LHC). ATLAS participants need
guaranteed high bandwidth and low latency now for interactive collaboration (e.g., high quality
video conferencing) and for testing and implementing distributed grid computing. As the
ATLAS detector becomes operational it will generate, on average, several terabytes of data per
day (on the order of a petabyte per year), and distributed grid computing is seen as possibly the
only way that meaningful physics can be extracted by dispersed researchers from such a large
volume of data. ATLAS researchers, both at Michigan and elsewhere, are beginning initial work
to implement both QoS and distributed grid computing.
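
To make the scale concrete, the short calculation below converts a daily data volume into a yearly total and a sustained network rate. The figure of 3 terabytes per day is an assumed stand-in for "several", and decimal units (1 TB = 10^12 bytes) are assumed throughout.

```python
TB = 10**12  # bytes in a decimal terabyte
PB = 10**15  # bytes in a decimal petabyte
SECONDS_PER_DAY = 86_400


def yearly_volume_pb(tb_per_day):
    """Total data volume over a 365-day year, in petabytes."""
    return tb_per_day * TB * 365 / PB


def sustained_mbps(tb_per_day):
    """Average rate needed to move one day's data in one day, in Mbit/s."""
    return tb_per_day * TB * 8 / SECONDS_PER_DAY / 10**6


daily_tb = 3  # assumed value for "several terabytes per day"
print(f"{yearly_volume_pb(daily_tb):.3f} PB/year")  # 1.095 PB/year
print(f"{sustained_mbps(daily_tb):.0f} Mbit/s")     # 278 Mbit/s
```

Under these assumptions, simply keeping up with the detector requires a sustained rate of a few hundred megabits per second to each site that receives a full copy, before any allowance for retransmission or bursts.
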
         Guaranteed high bandwidth and low latency will also be important in medical and life
sciences applications. An immediate example is provided by the Next Generation Internet (NGI)
Visible Human (VH) Project [2]. Its aim is to develop NGI systems to serve visible human
datasets in novel and educationally useful ways. These include a comprehensive set of
interactive 2D and 3D VH browsers featuring arbitrary 2D cutting and 3D visualizations as well
as an interactive web navigation engine to create and visualize anatomic flythroughs under the
haptic control of the users. This will allow for delivery of several simultaneous high quality
digital streams, enabled by the QoS network system, creating structured medical knowledge
using the VH datasets.
         Although the idea behind QoS is simple, implementing it in practice in a real-world
network environment has turned out to be a daunting task. The ability to create a dedicated
preferred channel has only recently been demonstrated [3], but a number of other issues remain.
In a dynamic environment bandwidth allocations would have to be created and released as
needed, by making a request to a “Bandwidth Broker” (BB), which is a program that controls the
differentiated service of routers. Separate bandwidth brokers will control different QoS domains,
so some means of coordinating bandwidth allocations across separate QoS domains is needed.
Abuse of preferred services could cause serious problems, so authentication and authorization to
and between Bandwidth Brokers will be an important component of a practical QoS system.
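
The roles described above can be sketched as a toy admission-control model: one broker per QoS domain, an authorization check before any bandwidth is granted, and an all-or-nothing reservation along the path. This is an illustration only and is not the interface of GARA or any other broker software; the class, the method names, the user name, and the capacities are hypothetical, and the user identity is assumed to have been authenticated already.

```python
class BandwidthBroker:
    """Toy admission control for one QoS domain (hypothetical interface)."""

    def __init__(self, capacity_mbps, authorized_users):
        self.capacity_mbps = capacity_mbps
        self.authorized_users = set(authorized_users)
        self.reservations = {}  # reservation id -> (user, mbps)
        self._next_id = 1

    def available_mbps(self):
        """Premium bandwidth not yet reserved in this domain."""
        return self.capacity_mbps - sum(m for _, m in self.reservations.values())

    def reserve(self, user, mbps):
        """Return a reservation id, or None if the request is refused."""
        if user not in self.authorized_users:
            return None  # authorization failure
        if mbps <= 0 or mbps > self.available_mbps():
            return None  # would exceed the premium capacity of this domain
        rid = self._next_id
        self._next_id += 1
        self.reservations[rid] = (user, mbps)
        return rid

    def release(self, rid):
        """Free a reservation; return True if it existed."""
        return self.reservations.pop(rid, None) is not None


def reserve_end_to_end(brokers, user, mbps):
    """Reserve in every domain on the path, or roll back and return None."""
    granted = []
    for broker in brokers:
        rid = broker.reserve(user, mbps)
        if rid is None:
            for earlier_broker, earlier_rid in granted:
                earlier_broker.release(earlier_rid)
            return None
        granted.append((broker, rid))
    return granted


campus = BandwidthBroker(100, {"alice"})
remote = BandwidthBroker(40, {"alice"})
print(reserve_end_to_end([campus, remote], "alice", 30) is not None)  # True
print(reserve_end_to_end([campus, remote], "alice", 30))  # None
print(campus.available_mbps())  # 70
```

The second request fails in the remote domain, so the bandwidth provisionally granted on campus is released again. Coordinating this kind of rollback between independently administered brokers is one of the open issues noted above.
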
         Several other hurdles exist. One is the fact that it may not be possible to have any control
over intermediate network segments, which are often commercial commodity networks and
which may not support QoS. One solution to this problem is the creation of separate over-
provisioned intermediate network segments, such as the Internet2 Abilene network. The biggest
problems with QoS, however, are expected to be at the end-point networks, roughly at the level
of either a campus network or a gigaPoP. The effects of adding QoS flows to existing network
traffic are presently unknown. One cannot hope to tune a network to use QoS effectively until
the effects of QoS on existing networks can be measured and understood.


PRESENT WORK: Authenticated QoS Signaling

Because of the importance of QoS, work on implementing QoS is already underway at the
University of Michigan, with seed funding and participation from several U-M sources: the
Physics Department, the Center for Information Technology Integration (CITI), the Office of the
Vice President for Research (OVPR), the office of the Chief Information Officer (CIO), and the
College of Literature, Science and the Arts (LS&A). Support is also provided by the University
Corporation for Advanced Internet Development (UCAID), Merit Network, Inc., the European
Laboratory for Particle Physics (CERN), the NGI Visible Human project, and the Argonne
National Laboratory.
        This non-NSF funded project focuses just on the problems of Authenticated QoS
Signaling [4]. In a dynamic network environment, QoS requires signaling to bandwidth brokers
to make and release bandwidth reservations. To avoid abuse, these signals must be
authenticated, and the bandwidth broker must verify the authorization of the authenticated
signals. The system we are constructing uses the Akenti Access Control System [5] to provide
Public Key (PK) based authorization decisions, the Lightweight Directory Access Protocol
(LDAP) for storing and retrieving directory data (such as PK certificates, user, resource or
Akenti authorization data), and the Globus Architecture for Reservation and Allocation (GARA)
bandwidth broker software from Argonne National Laboratory.
        The Internet2 Middleware Working Group is developing an LDAP schema to provide a
common directory namespace for Internet2 organizations. This effort has so far produced a user
schema, called eduPerson. One goal of our present project is to use the Internet2 Middleware
Working Group’s directory definitions and provide some real-world feedback to them.
        GARA is part of Argonne National Laboratory’s (ANL’s) Globus system for grid computing [6], so the system
will be tested using Globus as one of the underlying applications requiring QoS. Because the
University of Michigan relies heavily on Kerberos for authentication and authorization, an
important additional component is the KX509 software from U-M’s Center for Information
Technology Integration (CITI), which creates and signs short-term PK certificates based on valid
Kerberos authentication of the requester, thus joining the Kerberos and PK systems. Integration
of KX509 with GARA will result in the ability to use Kerberos as a bandwidth broker
authentication method.
        Although this project has only just started, significant progress has already been made.
Basic services are in place. The Akenti software has been compiled and tested and is running in
its base form. CITI’s LDAP service is currently being configured with the Internet2 eduPerson
schema. The Globus software has been compiled and tested and is now running. GARA, the Globus
bandwidth broker component, has been compiled and is currently being tested. A router has been made
available for the project. Once the router hardware has been upgraded and the bandwidth broker
is up and running, we will begin to reserve bandwidth with GARA in its current form. KX509,
Akenti and LDAP integration will follow.
        Solving the problem of authenticated QoS authorization is an important step toward a
fully operational QoS network, though it is only one of many steps toward that goal.


NEXT CHALLENGES: A secured, dynamic QoS that works end-to-end

This ITR grant will enable us to tackle the next set of problems that must be overcome in order
to create a practical QoS system. Our next goal is to move from a simple demonstration of QoS
signaling between two specific end-points toward a production QoS deployment by developing
the tools needed to realize end-to-end QoS for diverse, and ultimately any, end-points. Once this
has been accomplished we will test the network using practical research applications that require
QoS. In particular, we propose to meet the following challenges:
I) Analyze and understand how QoS router configurations affect a working production network.
   Installing and removing configurations can produce combinations of configuration features
   that cause undesirable network behavior. These configurations include:
    • Configuring edge routers to allow marked packets to enter the network from end points
       when the packets are associated with a network session that was previously approved
       through a request to a local Bandwidth Broker.
    • Configuring edge routers to prevent marked packets from entering the network from end
       points when the packets are not associated with a network session that was previously
       approved through a request to a local Bandwidth Broker.
    • Configuring edge routers to police marked packets from approved network sessions to
       ensure that they are not using more network resources than requested or authorized.
    • Configuring all routers to expedite the delivery of marked packets.
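
The policing step in the list above is commonly implemented with a token bucket. The following is a minimal sketch under that assumption; the class name, the rate and burst values, and the packet arrival times are hypothetical, and time is passed in explicitly so that the example is deterministic.

```python
class TokenBucketPolicer:
    """Police a marked flow to its reserved rate using a token bucket."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last_time = 0.0

    def conforms(self, now, packet_bytes):
        """Return True if the packet is within profile at time `now` (seconds)."""
        elapsed = now - self.last_time
        self.last_time = now
        # Refill at the reserved rate, but never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # out of profile: drop or re-mark as best effort


policer = TokenBucketPolicer(rate_bytes_per_s=1000, burst_bytes=1500)
arrivals = [(0.0, 1000), (0.1, 1000), (1.0, 1000), (1.1, 1000)]
print([policer.conforms(t, size) for t, size in arrivals])
# [True, False, True, False]
```

Packets sent faster than the reserved rate are flagged as out of profile, which is how an edge router can keep an approved session from using more than it requested. Whether such packets are dropped or forwarded as best-effort traffic is a policy choice that this sketch leaves open.
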

II) Significantly extend the simple end-point-only QoS domain model by:
     • Implementing additional Quality of Service domains along the network path between
        Ann Arbor and CERN when over-provisioning to prevent network congestion is not
        possible on portions of the network.
     • Adding end-points beyond Ann Arbor and CERN, such as the Pittsburgh
        Supercomputing Center (PSC), Argonne, Fermilab, SDSC, and other universities.

III) In our present work, a minimal set of Akenti attributes is used to describe a simple
    authorization policy. In a production system, the policies will be much more complex. While
    Akenti is designed to accommodate such complexity, we will:
       • Discover what additional attributes will be needed to manage a production system.
       • Meet head-on the issues of managing the underlying common LDAP namespace.
       • Determine whether the authorization information exchanged between remote sites within
         the bandwidth broker to bandwidth broker protocol is sufficient to make an authorization
         decision in the common case, or whether out-of-band communication is necessary.

IV) Analyze and understand the effects of network topology on performance. This includes:
    • Deploying network performance monitors and beacons to assist in troubleshooting
      problems in the delivery of end-to-end QoS.
    • Developing new software tools or integrating software tools available from others to help
      network engineers diagnose and troubleshoot problems in delivering requested levels of
      network performance and quality of service.
    • Integrating network performance and QoS diagnostic capabilities into a web-based tool
      set available to end users as well as network engineers.
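
As a rough sketch of the kind of summary a performance monitor might report, the function below reduces a series of probe results to a loss fraction, a mean round-trip time, and a simple jitter figure (the mean absolute difference between consecutive replies). The function name, the jitter definition, and the sample values are assumptions for illustration and do not describe any existing tool.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize probe results; None marks a probe that received no reply."""
    if not rtts_ms:
        raise ValueError("at least one probe is required")
    received = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(received)
    if len(received) > 1:
        jitter = mean(abs(b - a) for a, b in zip(received, received[1:]))
    else:
        jitter = 0.0
    return {
        "loss": lost / len(rtts_ms),
        "mean_rtt_ms": mean(received) if received else None,
        "jitter_ms": jitter,
    }


# Hypothetical round-trip times in milliseconds; the third probe was lost.
print(summarize_probes([120, 122, None, 121, 125]))
```

Comparing such summaries for marked and unmarked traffic over the same path is one simple way to check whether a reservation is actually being honored.
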

V) Train first ourselves and then additional network engineers, departmental network
   administrators, application developers, and end users in what is required to develop, deploy,
   operate, troubleshoot, and tune high performance networking and QoS solutions.

VI) Test the performance of the network in real-world situations against practical research
    applications that require QoS, particularly those that will be of direct use to the ATLAS
    project and the Visible Human project. We expect these to include (but not be limited to):
     • High-quality interactive video conferencing.
     • Distributed grid computing, using the Globus [6] and GriPhyN [7] toolkits, to perform
        distributed simulations of parts of the ATLAS detector and to perform analysis of test
        beam data in the ATLAS computing environment.
     • Comparisons of the performance of distributed computing applications when
        non-QoS-enabled sites are included.

         There are a number of reasons why the University of Michigan is an ideal choice for this
project. There is strong interest at Michigan in making QoS work, as evinced by the fact that
work on QoS has already started. Multiple organizations are committed to the project, both in
and outside of the University. These organizations have strong track records in software
development and advanced network design and operation, including the operation of production
networks. Through our partners we have access to the underlying network infrastructure within
departments at U-M, the U-M campus network, the Michigan gigaPoP, Internet2/Abilene,
STAR TAP, and CERN. We have several real-world applications that require QoS, and a strong
incentive to get those applications working. We anticipate the participation of students from our
Research Experience for Undergraduates at CERN program, which will involve them in both the
latest research in particle physics and the newest developments in computer networking. This
project will support real science and education by putting applications that use QoS in the hands
of end users through our work with the ATLAS and Visible Human projects.
