Overcast: Reliable Multicasting with an Overlay Network

John Jannotti    David K. Gifford    Kirk L. Johnson    M. Frans Kaashoek    James W. O'Toole, Jr.

Cisco Systems
{jj,gifford,tuna,kaashoek,otoole}@cisco.com


Abstract

Overcast is an application-level multicasting system that can be incrementally deployed using today's Internet infrastructure. These properties stem from Overcast's implementation as an overlay network. An overlay network consists of a collection of nodes placed at strategic locations in an existing network fabric. These nodes implement a network abstraction on top of the network provided by the underlying substrate network.

Overcast provides scalable and reliable single-source multicast using a simple protocol for building efficient data distribution trees that adapt to changing network conditions. To support fast joins, Overcast implements a new protocol for efficiently tracking the global status of a changing distribution tree.

Results based on simulations confirm that Overcast provides its added functionality while performing competitively with IP Multicast. Simulations indicate that Overcast quickly builds bandwidth-efficient distribution trees that, compared to IP Multicast, provide 70%-100% of the total bandwidth possible, at a cost of somewhat less than twice the network load. In addition, Overcast adapts quickly to changes caused by the addition of new nodes or the failure of existing nodes without causing undue load on the multicast source.

1   Introduction

Overcast is motivated by real-world problems faced by content providers using the Internet today. How can bandwidth-intensive content be offered on demand? How can long-running content be offered to vast numbers of clients? Neither of these challenges is met by today's infrastructure, though for different reasons. Bandwidth-intensive content (such as 2Mbit/s video) is impractical because the bottleneck bandwidth between content providers and consumers is considerably less than the natural consumption rate of such media. With currently available bandwidth, a 10-minute news clip might require an hour of download time. On the other hand, large-scale (thousands of simultaneous viewers) use of even moderate-bandwidth live video streams (perhaps 128Kbit/s) is precluded because network costs scale linearly with the number of consumers.

Overcast attempts to address these difficulties by combining techniques from a number of other systems. Like IP Multicast, Overcast allows data to be sent once to many destinations. Data are replicated at appropriate points in the network to minimize bandwidth requirements while reaching multiple destinations. Overcast also draws from work in caching and server replication. Overcast's multicast capabilities are used to fill caches and create server replicas throughout a network. Finally, Overcast is designed as an overlay network, which allows Overcast to be incrementally deployed. As nodes are added to an Overcast system the system's benefits are increased, but Overcast need not be deployed universally to be effective.

An Overcast system is an overlay network consisting of a central source (which may be replicated for fault tolerance), any number of internal Overcast nodes (standard PCs with permanent storage) sprinkled throughout a network fabric, and standard HTTP clients located in the network. Using a simple tree-building protocol, Overcast organizes the internal nodes into a distribution tree rooted at the source. The tree-building protocol adapts to changes in the conditions of the underlying network fabric. Using this distribution tree, Overcast provides large-scale, reliable multicast groups, especially suited for on-demand and live data delivery. Overcast allows unmodified HTTP clients to join these multicast groups.

Overcast permits the archival of content sent to multicast groups. Clients may specify a starting point
when joining an archived group, such as the beginning of the content. This feature allows a client to "catch up" on live content by tuning back ten minutes into a stream, for instance. In practice, the nature of a multicast group will most often determine the way it is accessed. A group containing stock quotes will likely be accessed live. A group containing a software package will likely be accessed from start to finish; "live" would have no meaning for such a group. Similarly, high-bandwidth content cannot be distributed live when the bottleneck bandwidth from client to server is too small. Such content will always be accessed relative to its start.

We have implemented Overcast and used it to create a data distribution system for businesses. Most current users distribute high-quality video that clients access on demand. These businesses operate geographically distributed offices and need to distribute video to their employees. Before using Overcast, they met this need with low-resolution Web-accessible video or by physically reproducing and mailing VHS tapes. Overcast allows these users to distribute high-resolution video over the Internet. Because high-quality videos are large (approximately 1 Gbyte for a 30-minute MPEG-2 video), it is important that the videos are efficiently distributed and available from a node with high bandwidth to the client. To a lesser extent, Overcast is also being used to broadcast live streams. Existing Overcast networks typically contain tens of nodes and are scheduled to grow to hundreds of nodes.

The main challenge in Overcast is the design and implementation of protocols that can build efficient, adaptive distribution trees without knowing the details of the substrate network topology. The substrate network's abstraction provides the appearance of direct connectivity between all Overcast nodes. Our goal is to build distribution trees that maximize each node's bandwidth from the source and utilize the substrate network topology efficiently. For example, the Overcast protocols should attempt to avoid sending data multiple times over the same physical link. Furthermore, Overcast should respond to transient failures or congestion in the substrate network.

Consider the simple network depicted in Figure 1. The network substrate consists of a root node (R), two Overcast nodes (O), a router, and a number of links. The links are labeled with bandwidth in Mbit/s. There are three ways of organizing the root and the Overcast nodes into a distribution tree. The organization shown optimizes bandwidth by using the constrained link only once.

[Figure 1: An example network and Overcast topology. The straight lines are the links in the substrate network. These links are labeled with bandwidth in Mbit/s. The curved lines represent connections in the overlay network. S represents the source, O represents two Overcast nodes.]

The contributions of this paper are:

  • A novel use of overlay networks. We describe how reliable, highly-scalable, application-level multicast can be provided by adding nodes that have permanent storage to the existing network fabric.

  • A simple protocol for forming efficient and scalable distribution trees that adapt to changes in the conditions of the substrate network without requiring router support.

  • A novel protocol for maintaining global status at the root of a changing distribution tree. This state allows clients to join an Overcast group quickly while maintaining scalability.

  • Results from simulations that show Overcast is efficient. Overcast can scale to a large number of nodes; its efficiency approaches that of router-based systems; it quickly adjusts to configuration changes; and a root can track the status of an Overcast network in a scalable manner.

Section 2 details Overcast's relation to prior work. Overcast's general structure is examined in Section 3, first by describing overlay networks in general, then providing the details of Overcast. Section 4 describes the operation of the Overcast network performing reliable application-level multicast. Finally, Section 5 examines Overcast's ability to build a bandwidth-efficient overlay network for multicasting and to adapt efficiently to changing network conditions.
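The tree-organization tradeoff of Figure 1 can be made concrete with a small calculation. The sketch below is our own toy model, not code from the Overcast system; the node names S, R (the router), O1, and O2, the 10/100 Mbit/s capacities, and the substrate path table are assumptions based on the figure. It counts how often each substrate link carries the stream under each candidate distribution tree:

```python
# Toy model of the Figure 1 scenario (illustrative only, not Overcast's
# actual algorithm). Overlay hops are unicast streams, so every substrate
# link a hop crosses carries one more copy of the data.

# Substrate links and their capacities in Mbit/s (assumed from the figure).
CAPACITY = {("S", "R"): 10, ("R", "O1"): 100, ("R", "O2"): 100}

# Substrate path taken by each possible overlay hop (all via the router R).
PATHS = {
    ("S", "O1"): [("S", "R"), ("R", "O1")],
    ("S", "O2"): [("S", "R"), ("R", "O2")],
    ("O1", "O2"): [("R", "O1"), ("R", "O2")],
    ("O2", "O1"): [("R", "O2"), ("R", "O1")],
}

def link_loads(tree_edges):
    """Count how many copies of the stream cross each substrate link."""
    loads = {link: 0 for link in CAPACITY}
    for hop in tree_edges:
        for link in PATHS[hop]:
            loads[link] += 1
    return loads

# The three ways of organizing the source and the two Overcast nodes.
trees = {
    "star  (S->O1, S->O2)": [("S", "O1"), ("S", "O2")],
    "chain (S->O1->O2)":    [("S", "O1"), ("O1", "O2")],
    "chain (S->O2->O1)":    [("S", "O2"), ("O2", "O1")],
}

for name, edges in trees.items():
    loads = link_loads(edges)
    # The achievable rate is limited by the most heavily shared link.
    rate = min(CAPACITY[l] / n for l, n in loads.items() if n > 0)
    print(f"{name}: per-node bandwidth <= {rate:g} Mbit/s")
```

Under this toy model the chain organizations deliver the full 10 Mbit/s to every node, while the star halves the rate because it crosses the constrained 10 Mbit/s link twice, which is the tradeoff described above.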
2   Related Work

Overcast seeks to marry the bandwidth savings of an IP Multicast distribution tree with the reliability and simplicity of store-and-forward operation using reliable communication between nodes. Overcast builds on research in IP multicast, content distribution (caching, replication, and content routing), and overlay networks. We discuss each in turn.

IP Multicast   IP Multicast [11] is designed to provide efficient group communication as a low-level network primitive. Overcast has a number of advantages over IP Multicast. First, as it requires no router support, it can be deployed incrementally on existing networks. Second, Overcast provides bandwidth savings both when multiple clients view content simultaneously and when multiple clients view content at different times. Third, while reliable multicast is the subject of much research [19, 20], problems remain when various links in the distribution tree have widely different bandwidths. A common strategy in such situations is to decrease the fidelity of content over lower bandwidth links. Although such a strategy has merit when content must be delivered live, Overcast also supports content types that require bit-for-bit integrity, such as software.

Express [15] is a single-source multicasting system that addresses some of IP Multicast's deficits. Express alleviates difficulties relating to IP Multicast's small address space, susceptibility to denial of service attacks, and billing difficulties, which may lie at the root of IP Multicast's lack of deployment on commercial networks. In these three respects Overcast bears a great deal of similarity to Express. Overcast differs mainly by stressing deployability and flexibility. Overcast does not require router modifications, simplifying adoption and increasing flexibility. Although Overcast provides a useful range of functionality, we recognize that there are needs for which Overcast may not be suited. Express standardizes a single model in the router, which locks out applications with different needs.

Content Distribution Systems   Others have advocated distributing content servers in the network fabric, from initial proposals [10] to larger projects, such as Adaptive Caching [26], Push Caching [14], Harvest [8], Dynamic Hierarchical Caching [7], Speculative Data Dissemination [6], and Application-Level Replication [4]. Overcast extends this previous work by building an overlay network using a self-organizing algorithm. This algorithm, operating continuously, not only eliminates the need for manually determined topology information when the overlay network is created, but also reacts transparently to the addition or removal of nodes in the running system. Initialization, expansion, and fault tolerance are unified.

A number of service providers (e.g., Adero, Akamai, and Digital Island) operate content distribution networks, but in-depth information describing their internals is not publicly available. FastForward's product is described below as an example of an overlay network.

Overlay Networks   A number of research groups and service providers are investigating services based on overlay networks. In particular, many of these services, like Overcast, exist to provide some form of multicast or content distribution. These include End System Multicast [16], Yoid [13] (formerly Yallcast), X-bone [24], RMX [9], FastForward [1], and PRISM [5]. All share the goal of providing the benefits of IP multicast without requiring direct router support or the presence of a physical broadcast medium. However, except for Yoid, these approaches do not exploit the presence of permanent storage in the network fabric.

End System Multicast is an overlay network that provides small-scale multicast groups for teleconferencing applications; as a result the End System Multicast protocol (Narada) is designed for multi-source multicast. The Overcast protocols differ from Narada in order to support large-scale multicast groups.

Yoid is a generic architecture for overlay networks with a number of new protocols, which are in development. The most striking difference between Yoid and Overcast is in approach. Yoid strives to be a general-purpose overlay network and content distribution toolkit, addressing applications as diverse as netnews, streaming broadcasts, and bulk email distribution. While these goals are laudable, we believe that because Overcast is more focused on providing single-source multicast, our protocols are simpler to understand and implement. Nonetheless, there remains a great deal of similarity between Overcast and Yoid, including URL-like group naming, the use of disk space to "time-shift" multicast distribution, and automatic tree configuration.

X-bone is also a general-purpose overlay network that can support many different network services. The overlay networks formed by X-bone are meshes, which are statically configured.
RMX focuses on real-time reliable multicast. As such, its focus is on reconciling the heterogeneous capabilities and network connections of various clients with the need for reliability. Therefore their work focuses on semantic rather than data reliability. For instance, RMX can be used to change high-resolution images into progressive JPEGs before transmittal to underprovisioned clients. Our work is less concerned with interactive response times. Overcast is designed for content that clients are interested in only at full fidelity, even if it means that the content does not become available to all clients at the same time.

FastForward Networks produces a system sharing many properties with RMX. Like RMX, FastForward focuses on real-time operation and includes provisions for intelligently decreasing the bandwidth requirements of rich media for low-bandwidth clients. Beyond this, FastForward's product differs from Overcast in that its distribution topology is statically configured by design. Within this statically configured topology, the product can pick dynamic routes. In this way FastForward allows experts to configure the topology for better performance and predictability while allowing for a limited degree of dynamism. Overcast's design seeks to minimize human intervention to allow its overlay networks to scale to thousands of nodes. Similarly, FastForward achieves fault tolerance by statically configuring distribution topologies to avoid single points of failure, while Overcast seeks to dynamically reconfigure its overlay in response to failures.

PRISM is an architecture for distributing streaming media over IP. Its architecture bears some similarity to Overcast, but their work appears focused on the naming of content and the design of interior nodes of the system. PRISM's high-level design includes an overlay-based content distribution mechanism, but it is assumed that such a system can be "plugged in" to the rest of PRISM. Overcast could provide that mechanism.

Active Services   Active Services [2] is a framework for implementing services at the application level throughout the fabric of the network. In that sense, there is a strong similarity in mindset between our works. However, Active Services must contend with the difficulty of sharing the resources of a single computer among multiple services, a difficulty we avoid by using dedicated nodes. Perhaps because of this challenge, Active Service applications have focused on real-time multimedia streaming, an application with transient resource needs. Our application uses large amounts of disk space for long periods of time, which is problematic in a shared environment.

Our observation is that one-time hardware costs do not drive the total costs of systems on the scale that we propose. Total cost is dominated by bandwidth, maintenance, and continual hardware obsolescence. Therefore Overcast seeks to minimize the use of bandwidth, cut maintenance costs by simplifying node deployment, and avoid obsolescence by structuring the system to allow older nodes to continue to contribute to the total efficiency of the overlay network.

Active Networks   One may view overlay networks as an alternative implementation of active networks [23]. In active networks, new protocols and application code can dynamically be downloaded into routers, allowing for rapid innovation of network services. Overcast avoids some of the hard problems of active networks by focusing on a single application; it does not have to address the problems created by dynamic downloading of code and sharing resources among multiple competing applications. Furthermore, since Overcast requires no changes to existing routers, it is easier to deploy. The main challenge for Overcast is to be competitive with solutions that are directly implemented on the network level.

3   The Overcast Network

This section describes the overlay network created by the Overcast system. First, we argue the benefits and drawbacks of using an overlay network. After concluding that an overlay network is appropriate for the task at hand, we explore the particular design of an overlay network to meet Overcast's demands. To do so, we examine the key design requirement of the Overcast network: single-source distribution of bandwidth-intensive media on today's Internet infrastructure. Finally, we illustrate the use of Overcast with an example.

3.1   Why overlay?

Overcast was designed to meet the needs of content providers on the Internet. This goal led us to an overlay network design. To understand why we chose an overlay network, we consider the benefits and drawbacks of overlays.
An overlay network provides advantages over both centrally located solutions and systems that advocate running code in every router. An overlay network is:

Incrementally Deployable   An overlay network requires no changes to the existing Internet infrastructure, only additional servers. As nodes are added to an overlay network, it becomes possible to control the paths of data in the substrate network with ever greater precision.

Adaptable   Although an overlay network abstraction constrains packets to flow over a particular set of links, that set of links is constantly being optimized over metrics that matter to the application. For instance, the overlay nodes may optimize latency at the expense of bandwidth. The Detour Project [21] has discovered that there are often routes between two nodes with less latency than the routes offered by today's IP infrastructure. Overlay networks can find and take advantage of such routes.

Robust   By virtue of the increased control and the adaptable nature of overlay networks, an overlay network can be more robust than the substrate fabric. For instance, with a sufficient number of nodes deployed, an overlay network may be able to guarantee that it is able to route between any two nodes in two independent ways. While a robust substrate network can be expected to repair faults eventually, such an overlay network might be able to route around faults immediately.

Customizable   Overlay nodes may be multipurpose computers, easily outfitted with whatever equipment makes sense. For example, Overcast makes extensive use of disk space. This allows Overcast to provide bandwidth savings even when content is not consumed simultaneously in different parts of the network.

Standard   An overlay network can be built on the least common denominator network services of the substrate network. This ensures that overlay traffic will be treated as well as any other. For example, Overcast uses TCP (in particular, HTTP over port 80) for reliable transport. TCP is simple, well understood, network friendly, and standard. Alternatives, such as a "home grown" UDP protocol with retransmissions, are less attractive by all these measures. For better or for worse, creativity in reliable transport is a losing battle on the Internet today.

On the other hand, building an overlay network faces a number of interesting challenges. An overlay network must address:

Management complexity   The manager of an overlay network is physically far removed from the machines being managed. Routine maintenance must either be unnecessary or possible from afar, using tools that do not scale in complexity with the size of the network. Physical maintenance must be minimized and be possible by untrained personnel.

The real world   In the real world, IP does not provide universal connectivity. A large portion of the Internet lies behind firewalls. A significant and growing share of hosts are behind Network Address Translators (NATs) and proxies. Dealing with these practical issues is tedious, but crucial to adoption.

Inefficiency   An overlay cannot be as efficient as code running in every router. However, our observation is that when an overlay network is small, the inefficiency, measured in absolute terms, will be small as well; as the overlay network grows, its efficiency can approach the efficiency of router-based services.

Information loss   Because the overlay network is built on top of a network infrastructure (IP) that offers nearly complete connectivity (limited only by firewalls, NATs, and proxies), we expend considerable effort deducing the topology of the substrate network.

The first two of these problems can be addressed and nearly eliminated by careful design. To address management complexity, management of the entire overlay network can be concentrated at a single site. The key to a centralized-administration design is guaranteeing that newly installed nodes can boot and obtain network connectivity without intervention. Once that is accomplished, further instructions may be read from the central management server.

Firewalls, NATs, and HTTP proxies complicate Overcast's operation in a number of ways. Firewalls force Overcast to open all connections "upstream" and to communicate using HTTP on port 80. This allows an Overcast network to extend exactly to those portions of the Internet that allow web browsing. NATs are devices used to multiplex a small set of IP addresses (often exactly one) over a number of clients. The clients are configured to use the NAT as their default router. At the NAT, TCP connections are rewritten to use one of the small number of IP addresses managed by the NAT. TCP port numbers allow the NAT to demultiplex return
packets back to the correct client. The complication for Overcast is that client IP addresses are obscured. All Overcast nodes behind the NAT appear to have the same IP address. HTTP proxies have the same effect.

Although private IP addresses are never directly used by external Overcast nodes, there are times when an external node must correctly report the private IP address of another node. For example, an external node may have internal children. During tree building a node must report its children's addresses so that they may be measured for suitability as parents themselves. Only the private address is suitable for such purposes. To alleviate this complication, all Overcast messages contain the sender's IP address in the payload of the message.

The final two disadvantages are not so easily dismissed. They represent the true tradeoff between overlay networks and ubiquitous router-based software. For Overcast, the goal of instant deployment is important enough to sacrifice some measure of efficiency. However, the amount of inefficiency introduced is a key metric by which Overcast should be judged.

3.2   Single-Source Multicast

Overcast is a single-source multicast system. This contrasts with IP Multicast, which allows any member of a multicast group to send packets to all other members of the group. Beyond the fact that this closely models our intended application domain, there are a number of reasons to pursue this particular refinement to the IP Multicast model.

Simplicity   Both conceptually and in implementation, a single-source system is simpler than an any-source model. For example, a single source provides an obvious rendezvous point for group joins.

Optimization   It is difficult to optimize the structure of the overlay network without intimate knowledge of the substrate network topology. This only becomes harder if the structure must be optimized for all paths [16].

Address space   Single-source multicast groups provide a convenient alternative to the limited IP Multicast address space. The namespace can be partitioned by first naming the source, then allowing further subdivision of the source's choosing. In contrast, IP Multicast's address space is flat, limited, and without obvious administration to avoid collisions amongst new groups.

On the other hand, a single-source model clearly offers reduced functionality compared to a model that allows any group member to multicast. As such, Overcast is not appropriate for applications that require extensive use of such a model. However, many applications which appear to need multi-source multicast, such as a distributed lecture allowing questions from the class, do not. In such an application, only one "non-root" sender is active at any particular time. It would be a simple matter for the sender to unicast to the root, which would then perform the true multicast on behalf of the sender. A number of projects [15, 17, 22] have used or advocated such an approach.

3.3   Bandwidth Optimization

Overcast is designed for distribution from a single source. As such, small latencies are expected to be of less importance to its users than increased bandwidth. Extremely low latencies are only important for applications that are inherently two-way, such as video conferencing. Overcast is designed with the assumption that broadcasting "live" video on the Internet may actually mean broadcasting with a ten to fifteen second delay.

Overcast distribution trees are built with the sole goal of creating high-bandwidth channels from the source to all nodes. Although Overcast makes no guarantees that the topologies created are optimal, our simulations show that they perform quite well. The exact method by which high-bandwidth distribution trees are created and maintained is described in Section 4.2.

3.4   Deployment

An important goal for Overcast is to be deployable on today's Internet infrastructure. This motivates not only the use of an overlay network, but many of its details. In particular, deployment must require little or no human intervention, costs per node should be minimized, and unmodified HTTP clients must be able to join multicast groups in the Overcast network.

To help ease the human costs of deployment, nodes in the Overcast network configure themselves in an
adaptive distributed tree with a single root. No hu-     a web page announcing the availability of the con-
man intervention is required to build efficient dis-       tent. When a user clicks on the URL for published
tribution trees, and nodes can be a part of multiple     content, Overcast redirects the request to a nearby
distribution trees.                                      appliance and the appliance serves the content. If
                                                         the content is video, no special streaming software
Overcast’s implementation on commodity PCs run-          is needed. The user can watch the video over stan-
ning Linux further eases deployment. Development         dard protocols and a standard MPEG player, which
is speeded by the familiar programming environ-          is supplied with most browsers.
ment, and hardware costs are minimized by con-
tinually tracking the best price/performance ratio       An administrator at the studio can control the over-
available in off-the-shelf hardware. The exact hard-      lay network from a central point. She can view the
ware configuration we have deployed has changed           status of the network (e.g., which appliances are
many times in the year or so that we have deployed       up), collect statistics, control bandwidth consump-
Overcast nodes.                                          tion, etc.

The final consumers of content from an Overcast           Using this system, bulk data can be distributed effi-
network are HTTP clients. The Overcast proto-            ciently, even if the network between the appliances
cols are carefully designed so that unmodified Web        and the studio consists of low-bandwidth or inter-
browsers can become members of a multicast group.        mittent links. Given the relative prices of disk space
In Overcast, a multicast group is represented as an      and network bandwidth, this solution is far less ex-
HTTP URL: the hostname portion names the root            pensive than upgrading all network links between
of an Overcast network and the path represents a         the studio and every client.
particular group on the network. All groups with
the same root share a single distribution tree.
                                                         4     Protocols
Using URLs as a namespace for Overcast groups
has three advantages. First, URLs offer a hierar-         The previous section described the structure and
chal namespace, addressing the scarcity of multi-        properties of the Overcast overlay network. This
cast group names in traditional IP Multicast. Sec-       section describes how it functions: the initializa-
ond, URLs and the means to access them are an            tion of individual nodes, the construction of the
existing standard. By delivering data over a simple      distribution hierarchy, and the automatic mainte-
HTTP connection, Overcast is able to bring multi-        nance of the network. In particular, we describe
casting to unmodified applications. Third, a URL’s        the “tree” protocol to build distribution trees and
richer structure allows for simple expression of the     the “up/down” protocol to maintain the global state
increased power of Overcast over tradition multi-        of the Overcast network efficiently. We close by de-
cast. For example, a group suffix of start=10s may         scribing how clients (web browsers) join a group and
be defined to mean “begin the content stream 10           how reliable multicasting to clients is performed.
seconds from the beginning.”
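The URL-as-group-name scheme is easy to see in code. The following is a minimal sketch: the hostname/path split follows the text, but treating a suffix such as start=10s as an HTTP query string, and all names used here, are assumptions for illustration.

```python
from urllib.parse import urlparse, parse_qs

def parse_group_url(url):
    """Split an Overcast group URL into the pieces the text describes:
    the hostname names the root, the path names the group on that root,
    and an optional suffix (e.g. start=10s) modifies delivery."""
    parts = urlparse(url)
    options = {k: v[0] for k, v in parse_qs(parts.query).items()}
    return {"root": parts.hostname, "group": parts.path, "options": options}

info = parse_group_url("http://root.example.com/news/clip42?start=10s")
# info["root"] == "root.example.com", info["group"] == "/news/clip42",
# info["options"] == {"start": "10s"}
```

Because the result is an ordinary URL, an unmodified browser can fetch it; only the Overcast nodes need to understand the naming convention.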
3.5   Example usage

We have used Overcast to build a content-distribution application for high-quality video and live streams. The application is built out of a publishing station (called a studio) and nodes (called appliances). Appliances are installed at strategic locations in their networks. The appliances boot, contact their studio, and self-organize into a distribution tree, as described below. No local administration is required.

The studio stores content and schedules it for delivery to the appliances. Typically, once the content is delivered, the publisher at the studio generates a web page announcing the availability of the content. When a user clicks on the URL for published content, Overcast redirects the request to a nearby appliance and the appliance serves the content. If the content is video, no special streaming software is needed. The user can watch the video over standard protocols with a standard MPEG player, which is supplied with most browsers.

An administrator at the studio can control the overlay network from a central point. She can view the status of the network (e.g., which appliances are up), collect statistics, control bandwidth consumption, etc.

Using this system, bulk data can be distributed efficiently, even if the network between the appliances and the studio consists of low-bandwidth or intermittent links. Given the relative prices of disk space and network bandwidth, this solution is far less expensive than upgrading all network links between the studio and every client.
4   Protocols

The previous section described the structure and properties of the Overcast overlay network. This section describes how it functions: the initialization of individual nodes, the construction of the distribution hierarchy, and the automatic maintenance of the network. In particular, we describe the "tree" protocol to build distribution trees and the "up/down" protocol to maintain the global state of the Overcast network efficiently. We close by describing how clients (web browsers) join a group and how reliable multicasting to clients is performed.

4.1   Initialization

When a node is first plugged in or moved to a new location, it automatically initializes itself and contacts the appropriate Overcast root(s). The first step in the initialization process is to determine an IP address and gateway address that the node can use for general IP connectivity. If there is a local DHCP server, the node can obtain IP configuration data directly using the DHCP protocol [12]. If DHCP is unavailable, a utility program can be run from a nearby workstation for manual configuration.

Once the node has an IP configuration it contacts a global, well-known registry, sending along its unique serial number. Based on a node's serial number, the registry provides a list of the Overcast networks the node should join, an optional permanent IP configuration, the network areas it should serve, and the access controls it should implement. If a node is intended to become part of a particular content distribution network, the configuration data returned will be highly specific. Otherwise, default values will be returned and the networks a node joins can be controlled using a web-based GUI.
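The registry lookup described above can be sketched as a simple table keyed by serial number. The field names, serial-number format, and default values here are hypothetical; the paper does not specify the registry's schema.

```python
# Hypothetical registry table; all field names are illustrative only.
REGISTRY = {
    "SN-0042": {"networks": ["http://root.example.com/"],
                "permanent_ip": None,              # optional fixed IP config
                "areas": ["us-east"],
                "access_controls": ["allow-all"]},
}

# Unknown nodes get defaults; their networks can be set later via the GUI.
DEFAULTS = {"networks": [], "permanent_ip": None,
            "areas": [], "access_controls": []}

def register(serial_number):
    """Return the configuration the registry would hand a booting node."""
    return REGISTRY.get(serial_number, DEFAULTS)
```

A node intended for a specific content-distribution network gets a specific entry; any other node falls through to the defaults.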
4.2   The Tree Building Protocol

Self-organization of appliances into an efficient, robust distribution tree is the key to efficient operation in Overcast. Once a node initializes, it begins a process of self-organization with other nodes of the same Overcast network. The nodes cooperatively build an overlay network in the form of a distribution tree with the root node at its source. This section describes the tree-building protocol.

As described earlier, the virtual links of the overlay network are the only paths on which data is exchanged. Therefore the choice of distribution tree can have a significant impact on the aggregate communication behavior of the overlay network. By carefully building a distribution tree, the network utilization of content distribution can be significantly reduced. Overcast stresses bandwidth over other conceivable metrics, such as latency, because of its expected applications. Overcast is not intended for interactive applications; optimizing a path to shave small latencies at the expense of total throughput would therefore be a mistake. On the other hand, Overcast's architecture as an overlay network allows this decision to be revisited. For instance, it may be decided that trees should have a fixed maximum depth to limit buffering delays.

The goal of Overcast's tree algorithm is to maximize bandwidth to the root for all nodes. At a high level, the algorithm proceeds by placing a new node as far away from the root as possible without sacrificing bandwidth to the root. This approach leads to "deep" distribution trees in which the nodes nonetheless observe no worse bandwidth than obtaining the content directly from the root. By choosing a parent that is nearby in the network, the distribution tree will form along the lines of the substrate network topology.

The tree protocol begins when a newly initialized node contacts the root of an Overcast group. The root thereby becomes the current node. Next, the new node begins a series of rounds in which it will attempt to locate itself further away from the root without sacrificing bandwidth back to the root. In each round the new node considers its bandwidth to current as well as the bandwidth to current through each of current's children. If the bandwidth through any of the children is about as high as the direct bandwidth to current, then one of these children becomes current and a new round commences. In the case of multiple suitable children, the child closest (in terms of network hops) to the searching node is chosen. If no child is suitable, the search for a parent ends with current.

To approximate the bandwidth that will be observed when moving data, the tree protocol measures the download time of 10 Kbytes. This measurement includes all the costs of serving actual content. We have observed that this approach to measuring bandwidth gives better results than approaches based on low-level bandwidth measurements, such as using ping. On the other hand, we recognize that a 10 Kbyte message is too short to accurately reflect the bandwidth of "long fat pipes". We plan to move to a technique that uses progressively larger measurements until a steady state is observed.

When the measured bandwidths to two nodes are within 10% of each other, we consider the nodes equally good and select the node that is closest, as reported by traceroute. This avoids frequent topology changes between two nearly equal paths, as well as decreasing the total number of network links used by the system.
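The rounds described above condense into a short sketch. Here `bandwidth` and `hops` stand in for the 10 Kbyte download measurement and the traceroute hop count just discussed, and the 10% tolerance mirrors the tie-breaking rule; the `Node` class and canned numbers are invented for illustration.

```python
class Node:
    """Minimal stand-in for an Overcast node and its children."""
    def __init__(self, name, children=()):
        self.name, self.children = name, list(children)

def choose_parent(root, bandwidth, hops, tolerance=0.10):
    """One pass of the tree protocol: starting at the root, step down to
    a child whenever bandwidth through that child is about as high as
    the direct bandwidth to the current node; among suitable children,
    prefer the one fewest network hops away."""
    current = root
    while True:
        threshold = bandwidth(current) * (1 - tolerance)
        suitable = [c for c in current.children if bandwidth(c) >= threshold]
        if not suitable:
            return current              # search ends: current is the parent
        current = min(suitable, key=hops)

# Toy topology: measurements are canned numbers instead of real transfers.
root = Node("root", [Node("a"), Node("b")])
bw = {"root": 100, "a": 90, "b": 100}         # measured throughput per node
nhops = {"root": 1, "a": 2, "b": 3}           # traceroute hop counts
parent = choose_parent(root, lambda n: bw[n.name], lambda n: nhops[n.name])
```

With these numbers both children are "about as good" as the root, so the search descends and settles on the nearer child.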
A node periodically reevaluates its position in the tree by measuring the bandwidth to its current siblings (an up-to-date list is obtained from the parent), parent, and grandparent. Just as in the initial building phase, a node will relocate below a sibling if that does not decrease its bandwidth back to the root. The node checks bandwidth directly to the grandparent as a way of testing its previous decision to locate under its current parent. If necessary, the node moves back up in the hierarchy to become a sibling of its parent. As a result, nodes constantly reevaluate their position in the tree, and an Overcast network is inherently tolerant of non-root node failures. If a node goes off-line for some reason, any nodes that were below it in the tree will reconnect themselves to the rest of the routing hierarchy. When a node detects that its parent is unreachable, it will simply relocate beneath its grandparent. If its grandparent is also unreachable, the node will continue to move up its ancestry until it finds a live node. The ancestor list also allows cycles to be avoided as nodes asynchronously choose new parents. A node simply refuses to become the parent of a node it believes to be its own ancestor. A node that chooses such a node will be forced to rechoose.

While there is extensive literature on faster fail-over algorithms, we have not yet found a need to optimize beyond the strategy outlined above. It is important to remember that the nodes participating in this protocol are dedicated machines that are less prone to failure than desktop computers. If this becomes an issue, we have considered extending the tree building algorithm to maintain backup parents (excluding a node's own ancestry from consideration) or an entire backup tree.

By periodically remeasuring network performance, the overlay network can adapt to network conditions that manifest themselves at time scales longer than the interval at which the distribution tree reorganizes. For example, a tree that is optimized for bandwidth-efficient content delivery during the day may be significantly suboptimal during the overnight hours (when network congestion is typically lower). The ability of the tree protocol to automatically adapt to these kinds of changing network conditions provides an important advantage over simpler, statically configured content distribution schemes.
4.3   The Up/Down Protocol

To allow web clients to join a group quickly, the Overcast network must track the status of the Overcast nodes. It may also be important to report statistical information back to the root, so that content providers might learn, for instance, how often certain content is being viewed. This section describes a protocol for efficient exchange of information in a tree of network nodes; it provides the root of the tree with information from nodes throughout the network. For our needs, this protocol must scale sublinearly in terms of network usage at the root, but may scale linearly in terms of space (both with respect to the number of Overcast nodes). This is a simple result of the relative demands placed on these two resources and the cost of those resources. Overcast might store (conservatively) a few hundred bytes about each Overcast node, but even in a group of millions of nodes, the total RAM cost for the root would be under $1,000.
We call this protocol the "up/down" protocol because our current system uses it mainly to keep track of which nodes are up and which are down. However, arbitrary information in either of two large classes may be propagated to the root. In particular, if the information either changes slowly (e.g., up/down status of nodes) or can be combined efficiently from multiple children into a single description (e.g., group membership counts), it can be propagated to the root. Rapidly changing information that cannot be aggregated during propagation would overwhelm the root's bandwidth capacity.

Each node in the network, including the root node, maintains a table of information about all nodes lower than itself in the hierarchy and a log of all changes to the table. Therefore the root node's table contains up-to-date information for all nodes in the hierarchy. The table is stored on disk and cached in the memory of a node.

The basis of the protocol is that each node periodically checks in with the node directly above it in the tree. If a child fails to contact its parent within a preset interval, the parent will assume the child and all its descendants have "died". That is, either the node has failed, an intervening link has failed, or the child has simply changed parents. In any case, the parent node marks the child and its descendants "dead" in its table. Parents never initiate contact with descendants; this is a byproduct of a design that is intended to cross firewalls easily. All node failures must be detected by a failure to check in, rather than by active probing.

During these periodic check-ins, a node reports new information that it has observed or been informed of since it last checked in. This includes:

  • "Death certificates" - children that have missed their expected report time.

  • "Birth certificates" - nodes that have become children of the reporting node.

  • Changes to the reporting node's "extra information."

  • Certificates or changes that have been propagated to the node from its own children since its last check-in.

This simple protocol exhibits a race condition when a node chooses a new parent. The moving node's former parent propagates a death certificate up the hierarchy, while at nearly the same time the new parent begins propagating a birth certificate up the tree. If the birth certificate arrives at the root first, then when the death certificate arrives the root will believe that the node has failed. This inaccuracy will remain indefinitely, since a new birth certificate will only be sent in response to a change in the hierarchy, which may not occur for an arbitrary period of time.
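The parent's side of the check-in scheme, marking a silent child and its recorded descendants dead, can be sketched as follows. The table layout is an assumption; the paper does not specify how the per-node table is stored.

```python
def expire_children(table, last_checkin, now, interval):
    """Parent-side expiry: a child that has not checked in within the
    preset interval is marked dead, along with every descendant the
    table records for it. The parent never probes; silence is the only
    failure signal (so the protocol crosses firewalls easily)."""
    for child, last in last_checkin.items():
        if now - last > interval:
            for n in [child] + table[child]["descendants"]:
                table[n]["status"] = "dead"
    return table

# Child c1 (with descendant g1) went silent at t=0; c2 checked in at t=95.
table = {"c1": {"descendants": ["g1"], "status": "up"},
         "g1": {"descendants": [],     "status": "up"},
         "c2": {"descendants": [],     "status": "up"}}
expire_children(table, {"c1": 0, "c2": 95}, now=100, interval=30)
```

Note that "dead" here covers all three cases the text lists: node failure, link failure, or a child that simply moved to a new parent.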
                                                          HTTP clients. The root handles such requests by
                                                          redirection, which is far less resource intensive than
To alleviate this problem, a node maintains a se-
                                                          actually delivering the requested content. Nonethe-
quence number indicating of how many times it has
                                                          less, the possibility of overload remains for particu-
changed parents. All changes involving a node are
                                                          larly popular groups. The root is also a single point
tagged with that number. A node ignores changes
                                                          of failure.
that are reported to it about a node if it has already
seen a change with a higher sequence number. For
                                                          To address this, overcast uses a standard technique
instance, a node may have changed parents 17 times.
                                                          used by many popular websites. The DNS name of
When it changes again, its former parent will propa-
                                                          the root resolves to any number of replicated roots
gate a death certificate annotated with 17. However,
                                                          in round-robin fashion. The database used to per-
its new parent will propagate a birth certificate an-
                                                          form redirections is replicated to all such roots. In
notated with 18. If the birth certificate arrives first,
                                                          addition, IP address takeover may be used for imme-
the death certificate will be ignored since it is older.
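The sequence-number rule above can be made concrete: any certificate that is not newer than what has already been seen for a node is dropped. The data layout in this sketch is assumed.

```python
def apply_certificate(table, node, parent, seq, kind):
    """Apply a birth or death certificate unless it is stale: a
    certificate carrying a lower sequence number than one already seen
    for that node is ignored."""
    if node in table and seq <= table[node]["seq"]:
        return table                          # older news: drop it
    table[node] = {"seq": seq, "parent": parent, "alive": kind == "birth"}
    return table

t = {}
apply_certificate(t, "n", "new-parent", 18, "birth")   # new parent reports
apply_certificate(t, "n", "old-parent", 17, "death")   # stale: ignored
```

This reproduces the 17/18 example from the text: even though the death certificate arrives second, its lower number marks it as older news.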
An important optimization to the up/down protocol prevents large sets of birth certificates from arriving at the root in response to a node with many descendants choosing a new parent. Normally, when a node moves to a new parent, a birth certificate must be sent to its new parent for each of its descendants. This maintains the invariant that a node knows the parent of all its descendants. Keep in mind that a birth certificate is not only a record that a node exists, but also that it has a certain parent.

Although this large set of updates is required, it is usually unnecessary for these updates to continue far up the hierarchy. For example, when a node relocates beneath a sibling, the sibling must learn about all of the node's descendants, but when the sibling, in turn, passes these certificates to the original parent, the original parent notices that they do not represent a change and quashes the certificates from further propagation.

Using the up/down protocol, the root of the hierarchy will receive timely updates about changes to the network. The freshness of the information can be tuned by varying the length of time between check-ins. Shorter periods between updates guarantee that information will make its way to the root more quickly. Regardless of the update frequency, bandwidth requirements at the root will be proportional to the number of changes in the hierarchy rather than the size of the hierarchy itself.
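The quashing step described above needs only a comparison against the receiving node's own table: a certificate that changes nothing is not forwarded. The (node, parent, seq) tuple layout in this sketch is an assumption.

```python
def propagate(table, cert):
    """Forward a (node, parent, seq) certificate upward only if it tells
    this node something new; otherwise quash it."""
    node, parent, seq = cert
    if table.get(node) == (parent, seq):
        return None                 # no change for us: stop propagation
    table[node] = (parent, seq)
    return cert                     # changed our view: pass it on upward

ancestors_view = {"n": ("p", 5)}    # this ancestor already knows n under p
```

An original parent that already records a grandchild under the same parent and sequence number therefore absorbs the certificate, and the burst of updates caused by a relocation dies out close to where it happened.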
4.4   Replicating the root

In Overcast, there appears to be the potential for significant scalability and reliability problems at the root. The up/down protocol works to alleviate the scalability difficulties in maintaining global state about the distribution tree, but the root is still responsible for handling all join requests from all HTTP clients. The root handles such requests by redirection, which is far less resource intensive than actually delivering the requested content. Nonetheless, the possibility of overload remains for particularly popular groups. The root is also a single point of failure.

To address this, Overcast uses a standard technique employed by many popular websites. The DNS name of the root resolves to any number of replicated roots in round-robin fashion. The database used to perform redirections is replicated to all such roots. In addition, IP address takeover may be used for immediate failover, since DNS caching may cause clients to continue to contact a failed replica. This simple, standard technique works well for this purpose because handling joins from HTTP clients is a read-only operation that lends itself well to distribution over numerous replicas.

There remains, however, a single point of failure for the up/down protocol. The functionality of the root in the up/down protocol cannot be distributed so easily, because its purpose is to maintain changing state. However, the up/down protocol has the useful property that all nodes maintain state for the nodes below them in the distribution tree. Therefore, a convenient technique to address fault tolerance is to specially construct the top of the hierarchy.

Starting with the root, some number of nodes are configured linearly; that is, each has only one child. In this way all other Overcast nodes lie below these top nodes. Figure 2 shows a distribution tree in which the top three nodes are arranged linearly. Each of these nodes has enough information to act as the root of the up/down protocol in case of a failure. This technique has the drawback of increasing the latency of content distribution unless special-case code skips the extra roots during distribution. If latency were important to Overcast this would be an important, but simple, optimization.

Figure 2: A specially configured distribution topology that allows either of the grey nodes to quickly stand in as the root (black) node. All filled nodes have complete status information about the unfilled nodes.

"Linear roots" work well with the need for replication to address scalability, as mentioned above. The set of linear nodes has all the information needed to perform Overcast joins; therefore these nodes are perfect candidates to be used in the DNS round-robin approach to scalability. By choosing these nodes, no further replication is necessary.
4.5   Joining a multicast group

To join a multicast group, a Web client issues an HTTP GET request with the URL for a group. The hostname of the URL names the root node(s). The root uses the pathname of the URL, the location of the client, and its database of the current status of the Overcast nodes to decide where to connect the client to the multicast tree. Because status information is constantly propagated to the root, a decision may be made quickly without further network traffic, enabling fast joins.

Joining a group consists of selecting the best server and redirecting the client to that server. The details of the server selection algorithm are beyond the scope of this paper, as considerable previous work [3, 18] exists in this area. Furthermore, Overcast's particular choices are constrained considerably by a desire to avoid changes at the client. Without such a constraint, simpler choices could have been made, such as allowing clients to participate directly in the Overcast tree building protocol.

Although we do not discuss server selection here, a number of Overcast's details exist to support this important functionality, however it may actually be implemented. A centralized root performing redirections is convenient for an approach involving large tables containing collected Internet topology data. The up/down algorithm allows for redirections to nodes that are known to be functioning.
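A join, as described above, reduces to filtering the up/down table for live nodes and redirecting the client to one of them. The distance-based selection below is only a placeholder for the server-selection work cited in the text, and all names are illustrative.

```python
def handle_join(group_path, client_location, status_table, distance):
    """Sketch of a join at the root: consult the up/down table for nodes
    known to be up and redirect the client to the 'best' one. `distance`
    stands in for whatever server-selection metric is used."""
    live = [n for n, info in status_table.items() if info["status"] == "up"]
    best = min(live, key=lambda n: distance(n, client_location))
    return "302 Found", f"http://{best}{group_path}"   # plain HTTP redirect

status = {"a.example": {"status": "up"},
          "b.example": {"status": "down"},   # filtered out by up/down data
          "c.example": {"status": "up"}}
dist = {("a.example", "client-net"): 5, ("c.example", "client-net"): 2}
code, url = handle_join("/news/clip42", "client-net", status,
                        lambda n, loc: dist[(n, loc)])
```

Because the reply is an ordinary HTTP redirect, an unmodified browser follows it with no special support.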
                                                                  attached to each node in the backbones.
4.6   Multicasting with Overcast

We refer to reliable multicasting on an Overcast network as "overcasting". Overcasting proceeds along the distribution tree built by the tree protocol. Data is moved between parent and child using TCP streams. If a node has four children, four separate connections are used. The content may be pipelined through several generations in the tree. A large file or a long-running live stream may be in transit over tens of different TCP streams at a single moment, in several layers of the distribution hierarchy.

If a failure occurs during an overcast, the distribution tree will rebuild itself as described above. After rebuilding the tree, the overcast resumes on-demand distributions where it left off. In order to do so, each node keeps a log of the data it has received so far. After recovery, a node inspects the log and restarts all overcasts in progress.

Live content on the Internet today is typically buffered before playback. This compensates for momentary glitches in network throughput. Overcast can take advantage of this buffering to mask the failure of a node being used to overcast data. As long as the failure occurs in a node that is not at the edge of the Overcast network, an HTTP client need never become aware that the path of data from the root has changed in the face of failure.
5   Evaluation

In this section, the protocols presented above are evaluated by simulation. Although we have deployed Overcast in the real world, we have not yet deployed on a sufficiently large network to run the experiments we have simulated.

To evaluate the protocols, an overlay network is simulated with increasing numbers of Overcast nodes while keeping the total number of network nodes constant. Overcast should build better trees as more nodes are deployed, but protocol overhead may grow.

We use the Georgia Tech Internetwork Topology Models [25] (GT-ITM) to generate the network topologies used in our simulations. We use the "transit-stub" model to obtain graphs that more closely resemble the Internet than a pure random construction. GT-ITM generates a transit-stub graph in stages: first a number of random backbones (transit domains), then the random structure of each backbone, then random "stub" graphs attached to each node in the backbones.

We use this model to construct five different 600-node graphs. Each graph is made up of three transit domains. These domains are guaranteed to be connected. Each transit domain consists of an average of eight stub networks. The stub networks contain edges amongst themselves with a probability of 0.5. Each stub network consists of an average of 25 nodes, in which nodes are once again connected with a probability of 0.5. These parameters are taken from the sample graphs in the GT-ITM distribution; we are unaware of any published work that describes parameters that might better model common Internet topologies.

We extended the graphs generated by GT-ITM with bandwidth information. Links internal to the transit domains were assigned a bandwidth of 45 Mbit/s; edges connecting stub networks to the transit domains were assigned 1.5 Mbit/s; finally, edges within the local stub domains were assigned 100 Mbit/s. These reflect commonly used network technologies: T3s, T1s, and Fast Ethernet. All measurements are averages over the five generated topologies.

Figure 3: Fraction of potential bandwidth provided by Overcast. The plot shows the fraction of possible bandwidth achieved (0.0 to 1.0) against the number of Overcast nodes (0 to 600), with one curve for the Backbone placement strategy and one for Random placement.
                                                           compared to IP Multicast.
Empirical measurements from actual Overcast
                                                           The main observation is that, as expected, the back-
nodes show that a single Overcast node can eas-
                                                           bone strategy for placing Overcast nodes is more
ily support twenty clients watching MPEG-1 videos,
                                                           effective than the random strategy, but the results
though the exact number is greatly dependent on
                                                           of random placement are encouraging nonetheless.
the bandwidth requirements of the content. Thus
                                                           Even a small number of deployed Overcast nodes,
with a network of 600 overcast nodes, we are simu-
                                                           positioned at random, provide approximately 70%-
lating multicast groups of perhaps 12,000 members.
                                                           80% of the total possible bandwidth.

5.1   Tree protocol                                        It is extremely encouraging that, when using the
                                                           backbone approach, no node receives less bandwidth
                                                           under Overcast than it would receive from IP Mul-
The efficiency of Overcast depends on the position-
                                                           ticast. However some enthusiasm must be withheld,
ing of Overcast nodes. In our first experiments, we
                                                           because a simulation artifact has been left in these
compare two different approaches to choosing po-
                                                           numbers to illustrate a point.
sitions. The first approach, labelled “Backbone”,
preferentially chooses transit nodes to contain Over-      Notice that the backbone approach and the random
cast nodes. Once all transit nodes are Overcast            approach differ in effectiveness even when all 600
nodes, additional nodes are chosen at random. This         nodes of the network are Overcast nodes. In this
approach corresponds to a scenario in which the            case the same nodes are participating in the proto-
owner of the Overcast nodes places them strategi-          col, but better trees are built using the backbone
cally in the network. In the second, labelled “Ran-        approach. This illustrates that the trees created by
dom”, we select all Overcast nodes at random. This         the tree-building protocol are not unique. The back-
approach corresponds to a scenario in which the            bone approach fares better by this metric because
owner of Overcast nodes does not pay attention to          in our simulations backbone nodes were turned on
where the nodes are placed.                                first. This allowed backbone nodes to preferrentially
                                                           form the “top” of the tree. This indicates that in
The goal of Overcast’s tree-building protocol is to        future work it may be beneficial to extend the tree-
optimize the bottleneck bandwidth available back           building protocol to accept hints that mark certain
to the root for all nodes. The goal is to provide          nodes as “backbone” nodes. These nodes would
each node with the same bandwidth to the root that         preferentially form the core of the distribution tree.
the node would have in an idle network. Figure 3
compares the sum of all nodes’ bandwidths back to          Overcast appears to perform quite well for its in-
the root in Overcast networks of various sizes to          tended goal of optimizing available bandwidth, but
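To make the bottleneck-bandwidth objective concrete, the following sketch evaluates whether a node can descend in the tree without losing bandwidth to the root. This is a simplified model rather than the deployed protocol: real Overcast nodes discover bandwidth by active measurement, whereas here the `parent` map, `children` map, `bandwidth` dictionary, and the `choose_parent` helper are all hypothetical.

```python
# Sketch of the bottleneck-bandwidth objective of the tree protocol.
# Hypothetical model: deployed nodes measure bandwidth over the network;
# here link capacities are given as a dict keyed by (parent, child).

def bottleneck_to_root(parent, bandwidth, node):
    """Minimum link capacity on the path from `node` up to the root."""
    best = float("inf")
    while parent[node] is not None:
        best = min(best, bandwidth[(parent[node], node)])
        node = parent[node]
    return best

def choose_parent(parent, children, bandwidth, node):
    """Relocate `node` beneath a sibling when doing so preserves its
    bottleneck bandwidth to the root; deeper positions relieve the root."""
    current = bottleneck_to_root(parent, bandwidth, node)
    for sibling in children[parent[node]]:
        if sibling == node or (sibling, node) not in bandwidth:
            continue
        via = min(bottleneck_to_root(parent, bandwidth, sibling),
                  bandwidth[(sibling, node)])
        if via >= current:
            return sibling    # same bandwidth at greater depth
    return parent[node]       # no improvement: keep the current parent

# Example: root 0 with children 1 and 2; node 2 can descend below node 1
# because its 10 Mbit/s bottleneck to the root is unchanged.
parent = {0: None, 1: 0, 2: 0}
children = {0: [1, 2]}
bandwidth = {(0, 1): 10, (0, 2): 10, (1, 2): 10}
```

In this toy topology `choose_parent` moves node 2 beneath node 1, deepening the tree while leaving the root's outgoing capacity free for other children, which mirrors the preference for deep positions discussed above.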
[Figure 4: Ratio of the number of times a packet must "hit the wire" to be propagated through an Overcast network to a lower bound estimate of the same measure for IP Multicast. The figure plots average waste against the number of Overcast nodes, for the Backbone and Random placement strategies.]

[Figure 5: Number of rounds to reach a stable distribution tree as a function of the number of Overcast nodes and the length of the lease period. The figure plots rounds against the number of Overcast nodes, for leases of 5, 10, and 20 rounds.]


Overcast appears to perform quite well for its intended goal of optimizing available bandwidth, but it is reasonable to wonder what costs are associated with this performance.

To explore this question we measure the network load imposed by Overcast. We define network load to be the number of times that a particular piece of data must traverse a network link to reach all Overcast nodes. To compare with IP Multicast, Figure 4 plots the ratio of the network load imposed by Overcast to a lower bound estimate of IP Multicast's network load. For a given set of nodes, we assume that IP Multicast would require exactly one less link than the number of nodes. This assumes that all nodes are one hop away from another node, which is unlikely to be true in sparse topologies, but it provides a lower bound for comparison.

Figure 4 shows that for Overcast networks with more than 200 nodes, Overcast imposes somewhat less than twice as much network load as IP Multicast. In return for this extra load, Overcast offers reliable delivery, immediate deployment, and future flexibility. For networks with few Overcast nodes, Overcast appears to impose a considerably higher network load than IP Multicast. This is a result of our optimistic lower bound on IP Multicast's network load, which assumes that 50 randomly placed nodes in a 600-node network can be spanned by 49 links.

Another metric of the effectiveness of an application-level multicast technique is stress, proposed in [16]. Stress indicates the number of times that the same data traverses a particular physical link. By this metric, Overcast performs quite well, with average stresses between 1 and 1.2. We do not present a detailed analysis of Overcast's performance by this metric, however, because we believe that network load is more telling for Overcast. That is, Overcast has quite low scores for average stress, but that metric does not describe how often a longer route was taken when a shorter route was available.

Another question is how fast the tree protocol converges to a stable distribution tree, assuming a stable underlying network. This depends on three parameters. The round period controls how long a node that has not yet determined a stable position in the hierarchy will wait before evaluating a new set of potential parents. The reevaluation period determines how long a node will wait before reevaluating its position in the hierarchy once it has obtained a stable position. Finally, the lease period determines how long a parent will wait to hear from a child before reporting the child's death.

For convenience, we measure all convergence times in terms of the fundamental unit, the round time. We also set the reevaluation period and the lease period to the same value. Figure 5 shows how long Overcast requires to converge if an entire Overcast network is activated simultaneously. To demonstrate the effect of changing the reevaluation and lease periods, we plot results for the "standard" lease time of 10 rounds, as well as for longer and shorter periods. Lease periods shorter than five rounds are impractical because children actually renew their leases a small random number of rounds (between one and three) before their leases expire, to avoid being thought dead. We expect that a round period on the order of 1-2 seconds will be practical for most applications.

We next measure convergence times for an existing Overcast network in which Overcast nodes are added or fail.
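The interaction of these periods can be sketched in a few lines. The early-renewal window of one to three rounds comes from the description above; the bookkeeping itself, including the function and constant names, is a hypothetical simplification of our simulator rather than the deployed protocol.

```python
import random

# Sketch of lease bookkeeping measured in rounds. Children renew their
# leases one to three rounds early so that a healthy child is never
# mistakenly reported dead by its parent.

LEASE_PERIOD = 10  # the "standard" lease time, in rounds

def renewal_round(granted_at, lease=LEASE_PERIOD):
    """Round in which a child renews: 1-3 rounds before the lease expires."""
    return granted_at + lease - random.randint(1, 3)

def parent_reports_dead(last_renewal, now, lease=LEASE_PERIOD):
    """Parent-side check: a silent child is reported dead once more than
    a full lease period has elapsed since its last renewal."""
    return now - last_renewal > lease
```

Because `renewal_round` always fires before the lease expires, a child that remains reachable is never declared dead; only a child that actually stops renewing is reported after the lease period elapses.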
[Figure 6: Number of rounds to recover a stable distribution tree as a function of the number of nodes that change state and the number of nodes in the network. The figure plots rounds against the number of Overcast nodes, for one, five, and ten node additions and for one, five, and ten node failures.]

[Figure 7: Certificates received at the root in response to node additions. The figure plots certificates against the number of Overcast nodes before additions, for one, five, and ten new nodes.]

We simulate Overcast networks of various sizes until they quiesce, add and remove Overcast nodes, and then simulate the network until it quiesces once again. We measure the time, in rounds, for the network to quiesce after the changes. We measure for various numbers of additions and removals, allowing us to assess the dependence of convergence on how many nodes have changed state. We measure only the backbone approach.

Figure 6 plots convergence times (using a 10-round lease time) against the number of Overcast nodes in the network. The convergence time for node failures is quite modest. In all simulations the Overcast network reconverged after less than three lease times. Furthermore, the reconvergence time scaled well against both the number of failing nodes and the total number of nodes in the Overcast network; in neither case was the convergence time even linearly affected.

For node additions, convergence times do appear more closely linked to the size of the Overcast network. This makes intuitive sense, because new nodes are navigating the network to determine their best location. Even so, in all simulations fewer than five lease times are required. It is important to note that an Overcast network continues to function even while stabilizing. Performance may be somewhat impacted by increased measurement traffic and by TCP setup and teardown overhead as parents change, but such disruptions are localized.

5.2   Up/Down protocol

The goal of the up/down algorithm is to minimize the bandwidth required at the root node while maintaining timely status information for the entire network. Factors that affect the amount of bandwidth used include the size of the Overcast network and the rate of topology changes. Topology changes occur when the properties of the underlying network change, nodes fail, or nodes are added. We therefore evaluate the up/down algorithm by simulating Overcast networks of various sizes in which various numbers of failures and additions occur.

To assess the up/down protocol's ability to provide timely status updates to the root without undue overhead, we track the number of certificates (for both "births" and "deaths") that reach the root during the previous convergence tests. This is indicative of the bandwidth required at the root node to support an Overcast network of the given size, and it depends on the amount of topology change induced by the additions and deletions.

Figure 7 graphs the number of certificates received by the root node in response to new nodes being brought up in the Overcast network. Remember, the root may receive multiple certificates per node addition, because the addition is likely to cause some topology reconfiguration: each time a node picks a new parent, that parent propagates a birth certificate. These results indicate that the number of certificates is quite modest: certainly no more than four certificates per node addition, and usually approximately three. More important, the number of certificates scales more closely with the number of new nodes than with the size of the Overcast network. This gives evidence that Overcast can scale to large networks.
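The propagation of birth certificates can be sketched as follows. The sketch also models the quashing of redundant updates, which is discussed below in the context of Figure 8. The data structures and the `propagate_birth` helper are hypothetical simplifications; in the real system, certificates travel over the distribution tree's control connections.

```python
# Sketch of up/down certificate propagation. A birth certificate climbs
# toward the root but is quashed at the first ancestor that already
# records the (child, parent) relationship. Structures are hypothetical.

def propagate_birth(parent, known, child, new_parent):
    """Forward a birth certificate up the tree; return the number of
    nodes that receive it before it is quashed or passes the root."""
    received = 0
    node = new_parent
    while node is not None:
        if known[node].get(child) == new_parent:
            break                        # ancestor already knows: quash
        known[node][child] = new_parent  # record and keep forwarding
        received += 1
        node = parent[node]
    return received

# Example: chain 0 <- 1 <- 2. When node 3 first attaches beneath node 2,
# the certificate reaches the root; a repeated report is quashed at once.
parent = {0: None, 1: 0, 2: 1}
known = {0: {}, 1: {}, 2: {}}
```

In this sketch, an attachment high in the tree reaches the root in few hops, while a redundant report is suppressed immediately, consistent with the observation that certificate counts track the number of changes rather than the network size.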
Similarly, Overcast requires few certificates to react to node failures. Figure 8 shows that in the common case, no more than four certificates are required per node failure. Again, because the number of certificates is proportional to the number of failures rather than to the size of the network, Overcast appears to offer the ability to scale to large networks.

[Figure 8: Certificates received at the root in response to node deletions. The figure plots certificates against the number of Overcast nodes before deletions, for one, five, and ten node failures.]

On the other hand, Figure 8 shows that there are some cases that fall far outside the norm. The large spikes for the 50- and 150-node networks with 5 and 10 failures occurred because of failures that happened to occur near the root. When a node with a substantial number of children chooses a new parent, it must convey its entire set of descendants to its new parent. That parent then propagates the entire set. However, when the information reaches a node that already knows the relationships in question, the update is quashed. In these cases, because the reconfigurations occurred high in the tree, there was no chance to quash the updates before they reached the root. In larger networks such failures are less likely.

6   Conclusions

We have described a simple tree-building protocol that yields bandwidth-efficient distribution trees for single-source multicast, and our up/down protocol for providing timely status updates to the root of the distribution tree in a scalable manner. Overcast implements these protocols in an overlay network over the existing Internet. The protocols allow Overcast networks to adapt dynamically to changes (such as congestion and failures) in the underlying network infrastructure and to support large, reliable single-source multicast groups. Geographically dispersed businesses have deployed Overcast nodes in small-scale Overcast networks for distribution of high-quality, on-demand video to unmodified desktops.

Simulation studies with topologies created with the Georgia Tech Internetwork Topology Models show that Overcast networks work well on large-scale networks, supporting multicast groups of up to 12,000 members. Given these results and the low cost of Overcast nodes, we believe that putting computation and storage in the network fabric is a promising approach for adding new services to the Internet incrementally.

Acknowledgements

We thank Hari Balakrishnan for helpful input concerning the tree-building algorithm; Suchitra Raman, Robert Morris, and our shepherd, Fred Douglis, for detailed comments that improved our presentation in many areas; and the many anonymous reviewers whose reviews helped us to see our work with fresh eyes.

References

 [1] FastForward Networks' broadcast overlay architecture. Technical report, FastForward, 2000. www.ffnet.com/pdfs/BOA-whitepaperv6.PDF.

 [2] Elan Amir, Steven McCanne, and Randy H. Katz. An active service framework and its application to real-time multimedia transcoding. In Proc. ACM SIGCOMM Conference (SIGCOMM '98), pages 178–190, September 1998.

 [3] Yair Amir, Alec Peterson, and David Shaw. Seamlessly selecting the best copy from Internet-wide replicated web servers. In The 12th International Symposium on Distributed Computing (DISC '98), pages 22–23, September 1998.

 [4] Michael Baentsch, Georg Molter, and Peter Sturm. Introducing application-level replication and naming into today's web. In Proc. 5th International World Wide Web Conference, May 1996.

 [5] A. Basso, C. Cranor, R. Gopalakrishnan, M. Green, C. R. Kalmanek, D. Shur, S. Sibal, C. J. Sreenan, and J. E. van der Merwe. PRISM, an IP-based architecture for broadband access to TV and other streaming media. In Proc. IEEE International Workshop on Network and Operating System Support for Digital Audio and Video, June 2000.

 [6] Azer Bestavros. Speculative data dissemination and service to reduce server load, network traffic, and response time in distributed information systems. In Proc. 1996 International Conference on Data Engineering (ICDE '96), March 1996.

 [7] M. Blaze. Caching in Large-Scale Distributed File Systems. PhD thesis, Princeton University, January 1993.

 [8] Anawat Chankhunthod, Peter B. Danzig, Chuck Neerdaels, Michael F. Schwartz, and Kurt J. Worrell. A hierarchical Internet object cache. In Proc. USENIX 1996 Annual Technical Conference, pages 153–164, January 1996.

 [9] Yatin Chawathe, Steven McCanne, and Eric Brewer. RMX: Reliable multicast for heterogeneous networks. In Proc. IEEE Infocom, March 2000.

[10] P. Danzig, R. Hall, and M. Schwartz. A case for caching file objects inside internetworks. In Proc. ACM SIGCOMM Conference (SIGCOMM '93), pages 239–248, September 1993.

[11] S. E. Deering. Multicast Routing in a Datagram Internetwork. PhD thesis, Stanford University, December 1991.

[12] R. Droms. Dynamic host configuration protocol. RFC 2131, Internet Engineering Task Force, March 1997. ftp://ftp.ietf.org/rfc/rfc2131.txt.

[13] Paul Francis. Yoid: Your Own Internet Distribution. Technical report, ACIRI, April 2000. www.aciri.org/yoid.

[14] J. Gwertzman and M. Seltzer. The case for geographical push-caching. In Proc. 5th Workshop on Hot Topics in Operating Systems (HotOS-V), pages 51–57. IEEE Computer Society Technical Committee on Operating Systems, May 1995.

[15] Hugh W. Holbrook and David R. Cheriton. IP multicast channels: EXPRESS support for large-scale single-source applications. In Proc. ACM SIGCOMM Conference (SIGCOMM '99), pages 65–78, September 1999.

[16] Yang-hua Chu, Sanjay G. Rao, and Hui Zhang. A case for end system multicast. In Proc. ACM SIGMETRICS Conference (SIGMETRICS '00), June 2000.

[17] M. Frans Kaashoek, Robbert van Renesse, Hans van Staveren, and Andrew S. Tanenbaum. FLIP: An internetwork protocol for supporting distributed systems. ACM Trans. Computer Systems, 11(1):77–106, February 1993.

[18] D. R. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, and R. Panigrahy. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web. In Proc. 29th ACM Symposium on Theory of Computing, pages 654–663, May 1997.

[19] Steven McCanne and Van Jacobson. Receiver-driven layered multicast. In Proc. ACM SIGCOMM Conference (SIGCOMM '96), pages 117–130, August 1996.

[20] Jörg Nonnenmacher, Ernst W. Biersack, and Don Towsley. Parity-based loss recovery for reliable multicast transmission. In Proc. ACM SIGCOMM Conference (SIGCOMM '97), pages 289–300, September 1997.

[21] Stefan Savage, Tom Anderson, Amit Aggarwal, David Becker, Neal Cardwell, Andy Collins, Eric Hoffman, John Snell, Amin Vahdat, Geoff Voelker, and John Zahorjan. Detour: A case for informed Internet routing and transport. IEEE Micro, 19(1):50–59, January 1999.

[22] Michael D. Schroeder, Andrew D. Birrell, Michael Burrows, Hal Murray, Roger M. Needham, Thomas L. Rodeheffer, Edwin H. Satterthwaite, and Charles P. Thacker. Autonet: A high-speed, self-configuring local area network using point-to-point links. IEEE/ACM Trans. Networking, 9(8):1318–1335, October 1991.

[23] David L. Tennenhouse, Jonathan M. Smith, W. David Sincoskie, David J. Wetherall, and Gary J. Minden. A survey of active network research. IEEE Communications Magazine, 35(1):80–86, January 1997.

[24] J. Touch and S. Hotz. The X-Bone (white paper). Technical report, ISI, May 1997. www.isi.edu/x-bone.

[25] Ellen W. Zegura, Kenneth L. Calvert, and Samrat Bhattacharjee. How to model an internetwork. In Proc. IEEE Infocom, pages 40–52, March 1996.

[26] Lixia Zhang, Scott Michel, Khoi Nguyen, Adam Rosenstein, Sally Floyd, and Van Jacobson. Adaptive web caching: Towards a new global caching architecture. In Proc. 3rd International World Wide Web Caching Workshop, June 1998.

				