Overcast: Reliable Multicasting with an Overlay Network
John Jannotti David K. Gifford Kirk L. Johnson
M. Frans Kaashoek James W. O’Toole, Jr.
Abstract

Overcast is an application-level multicasting system that can be incrementally deployed using today's Internet infrastructure. These properties stem from Overcast's implementation as an overlay network. An overlay network consists of a collection of nodes placed at strategic locations in an existing network fabric. These nodes implement a network abstraction on top of the network provided by the underlying substrate network.

Overcast provides scalable and reliable single-source multicast using a simple protocol for building efficient data distribution trees that adapt to changing network conditions. To support fast joins, Overcast implements a new protocol for efficiently tracking the global status of a changing distribution tree.

Results based on simulations confirm that Overcast provides its added functionality while performing competitively with IP Multicast. Simulations indicate that Overcast quickly builds bandwidth-efficient distribution trees that, compared to IP Multicast, provide 70%-100% of the total bandwidth possible, at a cost of somewhat less than twice the network load. In addition, Overcast adapts quickly to changes caused by the addition of new nodes or the failure of existing nodes without causing undue load on the multicast source.

1 Introduction

Overcast is motivated by real-world problems faced by content providers using the Internet today. How can bandwidth-intensive content be offered on demand? How can long-running content be offered to vast numbers of clients? Neither of these challenges is met by today's infrastructure, though for different reasons. Bandwidth-intensive content (such as 2Mbit/s video) is impractical because the bottleneck bandwidth between content providers and consumers is considerably less than the natural consumption rate of such media. With currently available bandwidth, a 10-minute news clip might require an hour of download time. On the other hand, large-scale (thousands of simultaneous viewers) use of even moderate-bandwidth live video streams (perhaps 128Kbit/s) is precluded because network costs scale linearly with the number of consumers.

Overcast attempts to address these difficulties by combining techniques from a number of other systems. Like IP Multicast, Overcast allows data to be sent once to many destinations. Data are replicated at appropriate points in the network to minimize bandwidth requirements while reaching multiple destinations. Overcast also draws from work in caching and server replication. Overcast's multicast capabilities are used to fill caches and create server replicas throughout a network. Finally, Overcast is designed as an overlay network, which allows Overcast to be incrementally deployed. As nodes are added to an Overcast system the system's benefits increase, but Overcast need not be deployed universally to be effective.

An Overcast system is an overlay network consisting of a central source (which may be replicated for fault tolerance), any number of internal Overcast nodes (standard PCs with permanent storage) sprinkled throughout a network fabric, and standard HTTP clients located in the network. Using a simple tree-building protocol, Overcast organizes the internal nodes into a distribution tree rooted at the source. The tree-building protocol adapts to changes in the conditions of the underlying network fabric. Using this distribution tree, Overcast provides large-scale, reliable multicast groups, especially suited for on-demand and live data delivery. Overcast allows unmodified HTTP clients to join these multicast groups.

Overcast permits the archival of content sent to multicast groups. Clients may specify a starting point
when joining an archived group, such as the beginning of the content. This feature allows a client to "catch up" on live content by tuning back ten minutes into a stream, for instance. In practice, the nature of a multicast group will most often determine the way it is accessed. A group containing stock quotes will likely be accessed live. A group containing a software package will likely be accessed from start to finish; "live" would have no meaning for such a group. Similarly, high-bandwidth content cannot be distributed live when the bottleneck bandwidth from client to server is too small. Such content will always be accessed relative to its start.

We have implemented Overcast and used it to create a data distribution system for businesses. Most current users distribute high quality video that clients access on demand. These businesses operate geographically distributed offices and need to distribute video to their employees. Before using Overcast, they met this need with low resolution Web-accessible video or by physically reproducing and mailing VHS tapes. Overcast allows these users to distribute high-resolution video over the Internet. Because high quality videos are large (approximately 1 Gbyte for a 30 minute MPEG-2 video), it is important that the videos are efficiently distributed and available from a node with high bandwidth to the client. To a lesser extent, Overcast is also being used to broadcast live streams. Existing Overcast networks typically contain tens of nodes and are scheduled to grow to hundreds of nodes.

The main challenge in Overcast is the design and implementation of protocols that can build efficient, adaptive distribution trees without knowing the details of the substrate network topology. The substrate network's abstraction provides the appearance of direct connectivity between all Overcast nodes. Our goal is to build distribution trees that maximize each node's bandwidth from the source and utilize the substrate network topology efficiently. For example, the Overcast protocols should attempt to avoid sending data multiple times over the same physical link. Furthermore, Overcast should respond to transient failures or congestion in the substrate network.

Consider the simple network depicted in Figure 1. The network substrate consists of a root node (R), two Overcast nodes (O), a router, and a number of links. The links are labeled with bandwidth in Mbit/s. There are three ways of organizing the root and the Overcast nodes into a distribution tree. The organization shown optimizes bandwidth by using the constrained link only once.

Figure 1: An example network and Overcast topology. The straight lines are the links in the substrate network. These links are labeled with bandwidth in Mbit/s. The curved lines represent connections in the Overlay network. S represents the source, O represents two Overcast nodes.

The contributions of this paper are:

• A novel use of overlay networks. We describe how reliable, highly-scalable, application-level multicast can be provided by adding nodes that have permanent storage to the existing network fabric.

• A simple protocol for forming efficient and scalable distribution trees that adapt to changes in the conditions of the substrate network without requiring router support.

• A novel protocol for maintaining global status at the root of a changing distribution tree. This state allows clients to join an Overcast group quickly while maintaining scalability.

• Results from simulations that show Overcast is efficient. Overcast can scale to a large number of nodes; its efficiency approaches router-based systems; it quickly adjusts to configuration changes; and a root can track the status of an Overcast network in a scalable manner.

Section 2 details Overcast's relation to prior work. Overcast's general structure is examined in Section 3, first by describing overlay networks in general, then providing the details of Overcast. Section 4 describes the operation of the Overcast network performing reliable application-level multicast. Finally, Section 5 examines Overcast's ability to build a bandwidth-efficient overlay network for multicasting and to adapt efficiently to changing network conditions.
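The bandwidth reasoning behind the Figure 1 example can be made concrete with a small sketch. The topology, link capacities, and helper names below are hypothetical stand-ins for the figure, under the simplifying assumption that each substrate link's capacity is shared equally by the overlay streams crossing it: pulling both Overcast nodes directly from the source sends two streams over the constrained link, while chaining one node behind the other uses that link only once.

```python
# Sketch (hypothetical stand-in for Figure 1): compare two overlay
# distribution trees by the bandwidth each Overcast node can sustain
# from the source S across a shared substrate.

def node_bandwidths(overlay_links, paths, capacity):
    """overlay_links: overlay edge -> substrate links it traverses.
    paths: node -> ordered overlay edges from the source to that node."""
    # Each substrate link is shared by every overlay edge crossing it;
    # a node's rate is the bottleneck along its overlay path.
    load = {link: 0 for link in capacity}
    for links in overlay_links.values():
        for link in links:
            load[link] += 1
    rate = {edge: min(capacity[link] / load[link] for link in links)
            for edge, links in overlay_links.items()}
    return {node: min(rate[edge] for edge in path)
            for node, path in paths.items()}

# Hypothetical capacities in Mbit/s: the source's access link is the
# constrained one, the router's links to the Overcast nodes are fast.
capacity = {"S-R": 10, "R-O1": 100, "R-O2": 100}

# Star: both nodes pull directly from S; two streams share link S-R.
star = node_bandwidths(
    {("S", "O1"): ["S-R", "R-O1"], ("S", "O2"): ["S-R", "R-O2"]},
    {"O1": [("S", "O1")], "O2": [("S", "O2")]},
    capacity)

# Chain: O2 pulls through O1; the constrained link carries one stream.
chain = node_bandwidths(
    {("S", "O1"): ["S-R", "R-O1"], ("O1", "O2"): ["R-O1", "R-O2"]},
    {"O1": [("S", "O1")], "O2": [("S", "O1"), ("O1", "O2")]},
    capacity)

print(star)   # {'O1': 5.0, 'O2': 5.0}
print(chain)  # {'O1': 10.0, 'O2': 10.0}
```

The chain doubles traffic on the fast R-O1 link (it is traversed twice), which mirrors the paper's reported tradeoff: near-full bandwidth at a cost of somewhat less than twice the network load.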
2 Related Work

Overcast seeks to marry the bandwidth savings of an IP Multicast distribution tree with the reliability and simplicity of store-and-forward operation using reliable communication between nodes. Overcast builds on research in IP multicast, content distribution (caching, replication, and content routing), and overlay networks. We discuss each in turn.

IP Multicast IP Multicast is designed to provide efficient group communication as a low level network primitive. Overcast has a number of advantages over IP Multicast. First, as it requires no router support, it can be deployed incrementally on existing networks. Second, Overcast provides bandwidth savings both when multiple clients view content simultaneously and when multiple clients view content at different times. Third, while reliable multicast is the subject of much research [19, 20], problems remain when various links in the distribution tree have widely different bandwidths. A common strategy in such situations is to decrease the fidelity of content over lower bandwidth links. Although such a strategy has merit when content must be delivered live, Overcast also supports content types that require bit-for-bit integrity, such as software.

Express is a single-source multicasting system that addresses some of IP Multicast's deficits. Express alleviates difficulties relating to IP Multicast's small address space, susceptibility to denial of service attacks, and billing difficulties which may lie at the root of IP Multicast's lack of deployment on commercial networks. In these three respects Overcast bears a great deal of similarity to Express. Overcast differs mainly by stressing deployability and flexibility. Overcast does not require router modifications, simplifying adoption and increasing flexibility. Although Overcast provides a useful range of functionality, we recognize that there are needs for which Overcast may not be suited. Express standardizes a single model in the router which works to lock out applications with different needs.

Content Distribution Systems Others have advocated distributing content servers in the network fabric, from initial proposals to larger projects, such as Adaptive Caching, Push Caching, Harvest, Dynamic Hierarchical Caching, Speculative Data Dissemination, and Application-Level Replication. Overcast extends this previous work by building an overlay network using a self-organizing algorithm. This algorithm, operating continuously, not only eliminates the need for manually determined topology information when the overlay network is created, but also reacts transparently to the addition or removal of nodes in the running system. Initialization, expansion, and fault tolerance are unified.

A number of service providers (e.g., Adero, Akamai, and Digital Island) operate content distribution networks, but in-depth information describing their internals is not public information. FastForward's product is described below as an example of an overlay network.

Overlay Networks A number of research groups and service providers are investigating services based on overlay networks. In particular, many of these services, like Overcast, exist to provide some form of multicast or content distribution. These include End System Multicast, Yoid (formerly Yallcast), X-bone, RMX, FastForward, and PRISM. All share the goal of providing the benefits of IP multicast without requiring direct router support or the presence of a physical broadcast medium. However, except Yoid, these approaches do not exploit the presence of permanent storage in the network fabric.

End System Multicast is an overlay network that provides small-scale multicast groups for teleconferencing applications; as a result the End System Multicast protocol (Narada) is designed for multi-source multicast. The Overcast protocols differ from Narada in order to support large-scale multicast groups.

Yoid is a generic architecture for overlay networks with a number of new protocols, which are in development. The most striking difference between Yoid and Overcast is in approach. Yoid strives to be a general purpose overlay network and content distribution toolkit, addressing applications as diverse as netnews, streaming broadcasts, and bulk email distribution. While these goals are laudable, we believe that because Overcast is more focused on providing single-source multicast our protocols are simpler to understand and implement. Nonetheless, there remains a great deal of similarity between Overcast and Yoid, including URL-like group naming, the use of disk space to "time-shift" multicast distribution, and automatic tree configuration.

X-bone is also a general-purpose overlay network that can support many different network services. The overlay networks formed by X-bone are meshes, which are statically configured.
RMX focuses on real-time reliable multicast. As such, its focus is on reconciling the heterogeneous capabilities and network connections of various clients with the need for reliability. Therefore their work focuses on semantic rather than data reliability. For instance, RMX can be used to change high resolution images into progressive JPEGs before transmittal to underprovisioned clients. Our work is less concerned with interactive response times. Overcast is designed for content that clients are interested in only at full fidelity, even if it means that the content does not become available to all clients at the same time.

FastForward Networks produces a system sharing many properties with RMX. Like RMX, FastForward focuses on real-time operation and includes provisions for intelligently decreasing the bandwidth requirements of rich media for low-bandwidth clients. Beyond this, FastForward's product differs from Overcast in that its distribution topology is statically configured by design. Within this statically configured topology, the product can pick dynamic routes. In this way FastForward allows experts to configure the topology for better performance and predictability while allowing for a limited degree of dynamism. Overcast's design seeks to minimize human intervention to allow its overlay networks to scale to thousands of nodes. Similarly, FastForward achieves fault tolerance by statically configuring distribution topologies to avoid single points of failure, while Overcast seeks to dynamically reconfigure its overlay in response to failures.

PRISM is an architecture for distributing streaming media over IP. Its architecture bears some similarity to Overcast, but their work appears focused on the naming of content and the design of interior nodes of the system. PRISM's high level design includes an overlay based content distribution mechanism, but it is assumed that such a system can be "plugged in" to the rest of PRISM. Overcast could provide that mechanism.

Active Services Active Services is a framework for implementing services at the application-level throughout the fabric of the network. In that sense, there is a strong similarity in mindset between our works. However, Active Services must contend with the difficulty of sharing the resources of a single computer among multiple services, a difficulty we avoid by using dedicated nodes. Perhaps because of this challenge, Active Service applications have focused on real-time multimedia streaming, an application with transient resource needs. Our application uses large amounts of disk space for long periods of time, which is problematic in a shared environment.

Our observation is that one-time hardware costs do not drive the total costs of systems on the scale that we propose. Total cost is dominated by bandwidth, maintenance, and continual hardware obsolescence. Therefore Overcast seeks to minimize the use of bandwidth, cut maintenance costs by simplifying node deployment, and avoid obsolescence by structuring the system to allow older nodes to continue to contribute to the total efficiency of the system.

Active Networks One may view overlay networks as an alternative implementation of active networks. In active networks, new protocols and application code can dynamically be downloaded into routers, allowing for rapid innovation of network services. Overcast avoids some of the hard problems of active networks by focusing on a single application; it does not have to address the problems created by dynamic downloading of code and sharing resources among multiple competing applications. Furthermore, since Overcast requires no changes to existing routers, it is easier to deploy. The main challenge for Overcast is to be competitive with solutions that are directly implemented on the network level.

3 The Overcast Network

This section describes the overlay network created by the Overcast system. First, we argue the benefits and drawbacks of using an overlay network. After concluding that an overlay network is appropriate for the task at hand, we explore the particular design of an overlay network to meet Overcast's demands. To do so, we examine the key design requirement of the Overcast network: single source distribution of bandwidth-intensive media on today's Internet infrastructure. Finally we illustrate the use of Overcast with an example.

3.1 Why overlay?

Overcast was designed to meet the needs of content providers on the Internet. This goal led us to an overlay network design. To understand why we chose an overlay network, we consider the benefits and drawbacks of overlays.
An overlay network provides advantages over both centrally located solutions and systems that advocate running code in every router. An overlay network is:

Incrementally Deployable An overlay network requires no changes to the existing Internet infrastructure, only additional servers. As nodes are added to an overlay network, it becomes possible to control the paths of data in the substrate network with ever greater precision.

Adaptable Although an overlay network abstraction constrains packets to flow over a constrained set of links, that set of links is constantly being optimized over metrics that matter to the application. For instance, the overlay nodes may optimize latency at the expense of bandwidth. The Detour Project has discovered that there are often routes between two nodes with less latency than the routes offered by today's IP infrastructure. Overlay networks can find and take advantage of such routes.

Robust By virtue of the increased control and the adaptable nature of overlay networks, an overlay network can be more robust than the substrate fabric. For instance, with a sufficient number of nodes deployed, an overlay network may be able to guarantee that it is able to route between any two nodes in two independent ways. While a robust substrate network can be expected to repair faults eventually, such an overlay network might be able to route around faults immediately.

Customizable Overlay nodes may be multi-purpose computers, easily outfitted with whatever equipment makes sense. For example, Overcast makes extensive use of disk space. This allows Overcast to provide bandwidth savings even when content is not consumed simultaneously in different parts of the network.

Standard An overlay network can be built on the least common denominator network services of the substrate network. This ensures that overlay traffic will be treated as well as any other. For example, Overcast uses TCP (in particular, HTTP over port 80) for reliable transport. TCP is simple, well understood, network friendly, and standard. Alternatives, such as a "home grown" UDP protocol with retransmissions, are less attractive by all these measures. For better or for worse, creativity in reliable transport is a losing battle on the Internet today.

On the other hand, building an overlay network faces a number of interesting challenges. An overlay network must address:

Management complexity The manager of an overlay network is physically far removed from the machines being managed. Routine maintenance must either be unnecessary or possible from afar, using tools that do not scale in complexity with the size of the network. Physical maintenance must be minimized and be possible by untrained personnel.

The real world In the real world, IP does not provide universal connectivity. A large portion of the Internet lies behind firewalls. A significant and growing share of hosts are behind Network Address Translators (NATs) and proxies. Dealing with these practical issues is tedious, but crucial to adoption.

Inefficiency An overlay cannot be as efficient as code running in every router. However, our observation is that when an overlay network is small, the inefficiency, measured in absolute terms, will be small as well, and as the overlay network grows, its efficiency can approach the efficiency of router-based services.

Information loss Because the overlay network is built on top of a network infrastructure (IP) that offers nearly complete connectivity (limited only by firewalls, NATs, and proxies), we expend considerable effort deducing the topology of the substrate network.

The first two of these problems can be addressed and nearly eliminated by careful design. To address management complexity, management of the entire overlay network can be concentrated at a single site. The key to a centralized-administration design is guaranteeing that newly installed nodes can boot and obtain network connectivity without intervention. Once that is accomplished, further instructions may be read from the central management server.

Firewalls, NATs and HTTP proxies complicate Overcast's operation in a number of ways. Firewalls force Overcast to open all connections "upstream" and to communicate using HTTP on port 80. This allows an Overcast network to extend exactly to those portions of the Internet that allow web browsing. NATs are devices used to multiplex a small set of IP addresses (often exactly one) over a number of clients. The clients are configured to use the NAT as their default router. At the NAT, TCP connections are rewritten to use one of the small number of IP addresses managed by the NAT. TCP port numbers allow the NAT to demultiplex return
packets back to the correct client. The complication for Overcast is that client IP addresses are obscured. All Overcast nodes behind the NAT appear to have the same IP address. HTTP proxies have the same effect.

Although private IP addresses are never directly used by external Overcast nodes, there are times when an external node must correctly report the private IP address of another node. For example, an external node may have internal children. During tree building a node must report its children's addresses so that they may be measured for suitability as parents themselves. Only the private address is suitable for such purposes. To alleviate this complication all Overcast messages contain the sender's IP address in the payload of the message.

The final two disadvantages are not so easily dismissed. They represent the true tradeoff between overlay networks and ubiquitous router-based software. For Overcast, the goal of instant deployment is important enough to sacrifice some measure of efficiency. However, the amount of inefficiency introduced is a key metric by which Overcast should be judged.

3.2 Single-Source Multicast

Overcast is a single-source multicast system. This contrasts with IP Multicast which allows any member of a multicast group to send packets to all other members of the group. Beyond the fact that this closely models our intended application domain, there are a number of reasons to pursue this particular refinement to the IP Multicast model.

Simplicity Both conceptually and in implementation, a single-source system is simpler than an any-source model. For example, a single-source provides an obvious rendezvous point for group joins.

Optimization It is difficult to optimize the structure of the overlay network without intimate knowledge of the substrate network topology. This only becomes harder if the structure must be optimized for all paths.

Address space Single-source multicast groups provide a convenient alternative to the limited IP Multicast address space. The namespace can be partitioned by first naming the source, then allowing further subdivision of the source's choosing. In contrast, IP Multicast's address space is flat, limited, and without obvious administration to avoid collisions amongst new groups.

On the other hand, a single-source model clearly offers reduced functionality compared to a model that allows any group member to multicast. As such, Overcast is not appropriate for applications that require extensive use of such a model. However, many applications which appear to need multi-source multicast, such as a distributed lecture allowing questions from the class, do not. In such an application, only one "non-root" sender is active at any particular time. It would be a simple matter for the sender to unicast to the root, which would then perform the true multicast on behalf of the sender. A number of projects [15, 17, 22] have used or advocated such an approach.

3.3 Bandwidth Optimization

Overcast is designed for distribution from a single source. As such, small latencies are expected to be of less importance to its users than increased bandwidth. Extremely low latencies are only important for applications that are inherently two-way, such as video conferencing. Overcast is designed with the assumption that broadcasting "live" video on the Internet may actually mean broadcasting with a ten to fifteen second delay.

Overcast distribution trees are built with the sole goal of creating high bandwidth channels from the source to all nodes. Although Overcast makes no guarantees that the topologies created are optimal, our simulations show that they perform quite well. The exact method by which high-bandwidth distribution trees are created and maintained is described in Section 4.2.

3.4 Deployment

An important goal for Overcast is to be deployable on today's Internet infrastructure. This motivates not only the use of an overlay network, but many of its details. In particular, deployment must require little or no human intervention, costs per node should be minimized, and unmodified HTTP clients must be able to join multicast groups in the Overcast network.

To help ease the human costs of deployment, nodes in the Overcast network configure themselves in an
adaptive distributed tree with a single root. No human intervention is required to build efficient distribution trees, and nodes can be a part of multiple distribution trees.

Overcast's implementation on commodity PCs running Linux further eases deployment. Development is speeded by the familiar programming environment, and hardware costs are minimized by continually tracking the best price/performance ratio available in off-the-shelf hardware. The exact hardware configuration we have deployed has changed many times in the year or so that we have deployed Overcast nodes.

The final consumers of content from an Overcast network are HTTP clients. The Overcast protocols are carefully designed so that unmodified Web browsers can become members of a multicast group. In Overcast, a multicast group is represented as an HTTP URL: the hostname portion names the root of an Overcast network and the path represents a particular group on the network. All groups with the same root share a single distribution tree.

Using URLs as a namespace for Overcast groups has three advantages. First, URLs offer a hierarchical namespace, addressing the scarcity of multicast group names in traditional IP Multicast. Second, URLs and the means to access them are an existing standard. By delivering data over a simple HTTP connection, Overcast is able to bring multicasting to unmodified applications. Third, a URL's richer structure allows for simple expression of the increased power of Overcast over traditional multicast. For example, a group suffix of start=10s may be defined to mean "begin the content stream 10 seconds from the beginning."

3.5 Example usage

We have used Overcast to build a content-distribution application for high-quality video and live streams. The application is built out of a publishing station (called a studio) and nodes (called appliances). Appliances are installed at strategic locations in their network. The appliances boot, contact their studio, and self-organize into a distribution tree, as described below. No local administration is required.

The studio stores content and schedules it for delivery to the appliances. Typically, once the content is delivered, the publisher at the studio generates a web page announcing the availability of the content. When a user clicks on the URL for published content, Overcast redirects the request to a nearby appliance and the appliance serves the content. If the content is video, no special streaming software is needed. The user can watch the video over standard protocols and a standard MPEG player, which is supplied with most browsers.

An administrator at the studio can control the overlay network from a central point. She can view the status of the network (e.g., which appliances are up), collect statistics, control bandwidth consumption, etc.

Using this system, bulk data can be distributed efficiently, even if the network between the appliances and the studio consists of low-bandwidth or intermittent links. Given the relative prices of disk space and network bandwidth, this solution is far less expensive than upgrading all network links between the studio and every client.

4 Protocols

The previous section described the structure and properties of the Overcast overlay network. This section describes how it functions: the initialization of individual nodes, the construction of the distribution hierarchy, and the automatic maintenance of the network. In particular, we describe the "tree" protocol to build distribution trees and the "up/down" protocol to maintain the global state of the Overcast network efficiently. We close by describing how clients (web browsers) join a group and how reliable multicasting to clients is performed.

4.1 Initializing Nodes

When a node is first plugged in or moved to a new location it automatically initializes itself and contacts the appropriate Overcast root(s). The first step in the initialization process is to determine an IP address and gateway address that the node can use for general IP connectivity. If there is a local DHCP server then the node can obtain IP configuration data directly using the DHCP protocol. If DHCP is unavailable, a utility program can be used from a nearby workstation for manual configuration.

Once the node has an IP configuration it contacts a global, well-known registry, sending along its unique
serial number. Based on a node's serial number, the registry provides a list of the Overcast networks the node should join, an optional permanent IP configuration, the network areas it should serve, and the access controls it should implement. If a node is intended to become part of a particular content distribution network, the configuration data returned will be highly specific. Otherwise, default values will be returned and the networks a node will join can be controlled using a web-based GUI.

4.2 The Tree Building Protocol

Self-organization of appliances into an efficient, robust distribution tree is the key to efficient operation in Overcast. Once a node initializes, it begins a process of self-organization with other nodes of the same Overcast network. The nodes cooperatively build an overlay network in the form of a distribution tree with the root node at its source. This section describes the tree-building protocol.

As described earlier, the virtual links of the overlay network are the only paths on which data is exchanged. Therefore the choice of distribution tree can have a significant impact on the aggregate communication behavior of the overlay network. By carefully building a distribution tree, the network utilization of content distribution can be significantly reduced. Overcast stresses bandwidth over other conceivable metrics, such as latency, because of its expected applications. Overcast is not intended for interactive applications, therefore opti-

root thereby becomes the current node. Next, the new node begins a series of rounds in which it will attempt to locate itself further away from the root without sacrificing bandwidth back to the root. In each round the new node considers its bandwidth to current as well as the bandwidth to current through each of current's children. If the bandwidth through any of the children is about as high as the direct bandwidth to current, then one of these children becomes current and a new round commences. In the case of multiple suitable children, the child closest (in terms of network hops) to the searching node is chosen. If no child is suitable, the search for a parent ends with current.

To approximate the bandwidth that will be observed when moving data, the tree protocol measures the download time of 10 Kbytes. This measurement includes all the costs of serving actual content. We have observed that this approach to measuring bandwidth gives us better results than approaches based on low-level bandwidth measurements such as using ping. On the other hand, we recognize that a 10 Kbyte message is too short to accurately reflect the bandwidth of "long fat pipes". We plan to move to a technique that uses progressively larger measurements until a steady state is observed.

When the measured bandwidths to two nodes are within 10% of each other, we consider the nodes equally good and select the node that is closest, as reported by traceroute. This avoids frequent topology changes between two nearly equal paths, as well
mizing a path to shave small latencies at the ex- as decreasing the total number of network links used
pense of total throughput would be a mistake. On by the system.
the other hand, Overcast’s architecture as an over-
lay network allows this decision to be revisited. For A node periodically reevaluates its position in the
instance, it may be decided that trees should have tree by measuring the bandwidth to its current sib-
a ﬁxed maximum depth to limit buﬀering delays. lings (an up-to-date list is obtained from the par-
ent), parent, and grandparent. Just as in the initial
The goal of Overcast’s tree algorithm is to max- building phase, a node will relocate below its sib-
imize bandwidth to the root for all nodes. At a lings if that does not decrease its bandwidth back
high level the algorithm proceeds by placing a new to the root. The node checks bandwidth directly
node as far away from the root as possible with- to the grandparent as a way of testing its previous
out sacriﬁcing bandwidth to the root. This ap- decision to locate under its current parent. If nec-
proach leads to “deep” distribution trees in which essary the node moves back up in the hierarchy to
the nodes nonetheless observe no worse bandwidth become a sibling of its parent. As a result, nodes
than obtaining the content directly from the root. constantly reevaluate their position in the tree and
By choosing a parent that is nearby in the network, an Overcast network is inherently tolerant of non-
the distribution tree will form along the lines of the root node failures. If a node goes oﬀ-line for some
substrate network topology. reason, any nodes that were below it in the tree
will reconnect themselves to the rest of the rout-
The tree protocol begins when a newly initialized ing hierarchy. When a node detects that its parent
node contacts the root of an Overcast group. The is unreachable, it will simply relocate beneath its
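The round-based parent search above can be sketched as follows. `Node`, `measure_bw`, and `hops` are illustrative stand-ins, not the paper's implementation: in the real protocol `measure_bw` would be the 10 Kbyte download measurement described above, and "about as high" is modeled here as a 0.9 factor.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def choose_parent(root, measure_bw, hops, threshold=0.9):
    """Descend from the root toward any child whose measured bandwidth
    back to the root is about as high as the bandwidth to `current`,
    preferring the closest (fewest hops) of the suitable children."""
    current = root
    while True:
        direct = measure_bw(current)
        suitable = [c for c in current.children
                    if measure_bw(c) >= threshold * direct]
        if not suitable:
            return current  # no child is suitable: join beneath current
        current = min(suitable, key=hops)

# Toy example: child "b" nearly preserves bandwidth, child "a" does not,
# so the new node ends its search beneath "b".
a, b = Node("a"), Node("b")
root = Node("root", [a, b])
bw = {"root": 10.0, "a": 4.0, "b": 9.5}
parent = choose_parent(root, lambda n: bw[n.name], lambda n: 1)
```

Because each round only descends one level, a joining node contacts a number of nodes proportional to the depth of the tree, not its size.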
If its grandparent is also unreachable the node will continue to move up its ancestry until it finds a live node. The ancestor list also allows cycles to be avoided as nodes asynchronously choose new parents. A node simply refuses to become the parent of a node it believes to be its own ancestor. A node that chooses such a node will be forced to rechoose.

While there is extensive literature on faster fail-over algorithms, we have not yet found a need to optimize beyond the strategy outlined above. It is important to remember that the nodes participating in this protocol are dedicated machines that are less prone to failure than desktop computers. If this becomes an issue, we have considered extending the tree building algorithm to maintain backup parents (excluding a node's own ancestry from consideration) or an entire backup tree.

By periodically remeasuring network performance, the overlay network can adapt to network conditions that manifest themselves at time scales larger than the frequency at which the distribution tree reorganizes. For example, a tree that is optimized for bandwidth-efficient content delivery during the day may be significantly suboptimal during the overnight hours (when network congestion is typically lower). The ability of the tree protocol to automatically adapt to these kinds of changing network conditions provides an important advantage over simpler, statically configured content distribution.

4.3 The Up/Down Protocol

To allow web clients to join a group quickly, the Overcast network must track the status of the Overcast nodes. It may also be important to report statistical information back to the root, so that content providers might learn, for instance, how often certain content is being viewed. This section describes a protocol for efficient exchange of information in a tree of network nodes to provide the root of the tree with information from nodes throughout the network. For our needs, this protocol must scale sublinearly in terms of network usage at the root, but may scale linearly in terms of space (all with respect to the number of Overcast nodes). This is a simple result of the relative requirements of a client for these two resources and the cost of those resources. Overcast might store (conservatively) a few hundred bytes about each Overcast node, but even in a group of millions of nodes, total RAM cost for the root would be under $1,000.

We call this protocol the "up/down" protocol because our current system uses it mainly to keep track of what nodes are up and what nodes are down. However, arbitrary information in either of two large classes may be propagated to the root. In particular, if the information either changes slowly (e.g., up/down status of nodes), or the information can be combined efficiently from multiple children into a single description (e.g., group membership counts), it can be propagated to the root. Rapidly changing information that can not be aggregated during propagation would overwhelm the root's bandwidth capacity.

Each node in the network, including the root node, maintains a table of information about all nodes lower than itself in the hierarchy and a log of all changes to the table. Therefore the root node's table contains up-to-date information for all nodes in the hierarchy. The table is stored on disk and cached in the memory of a node.

The basis of the protocol is that each node periodically checks in with the node directly above it in the tree. If a child fails to contact its parent within a preset interval, the parent will assume the child and all its descendants have "died". That is, either the node has failed, an intervening link has failed, or the child has simply changed parents. In any case, the parent node marks the child and its descendants "dead" in its table. Parents never initiate contact with descendants. This is a byproduct of a design that is intended to cross firewalls easily. All node failures must be detected by a failure to check in, rather than active probing.

During these periodic check-ins, a node reports new information that it has observed or been informed of since it last checked in. This includes:

• "Death certificates" - Children that have missed their expected report time.

• "Birth certificates" - Nodes that have become children of the reporting node.

• Changes to the reporting node's "extra information."

• Certificates or changes that have been propagated to the node from its own children since its last checkin.

This simple protocol exhibits a race condition when a node chooses a new parent. The moving node's former parent propagates a death certificate up the hierarchy, while at nearly the same time the new parent begins propagating a birth certificate up the tree.
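A parent's side of these check-ins can be sketched as follows. This is a minimal sketch with illustrative names; real nodes would also persist the table and change log to disk.

```python
import time

class ParentState:
    """Track children's leases; a child that misses its check-in interval
    is marked dead together with all of its recorded descendants."""
    def __init__(self, lease):
        self.lease = lease
        self.last_seen = {}    # child -> time of last check-in
        self.descendants = {}  # child -> set of nodes known below that child
        self.pending = []      # certificates to forward at our own next check-in

    def check_in(self, child, certificates, now=None):
        self.last_seen[child] = time.time() if now is None else now
        self.pending.extend(certificates)  # pass children's news upward

    def expire(self, now):
        for child, seen in list(self.last_seen.items()):
            if now - seen > self.lease:
                # The child and everything below it are presumed dead.
                for node in {child} | self.descendants.get(child, set()):
                    self.pending.append(("death", node))
                del self.last_seen[child]
```

Note that `expire` never contacts the child; consistent with the firewall-friendly design, death is inferred purely from a missed check-in.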
If the birth certificate arrives at the root first, when the death certificate arrives the root will believe that the node has failed. This inaccuracy will remain indefinitely since a new birth certificate will only be sent in response to a change in the hierarchy that may not occur for an arbitrary period of time.

To alleviate this problem, a node maintains a sequence number indicating how many times it has changed parents. All changes involving a node are tagged with that number. A node ignores changes that are reported to it about a node if it has already seen a change with a higher sequence number. For instance, a node may have changed parents 17 times. When it changes again, its former parent will propagate a death certificate annotated with 17. However, its new parent will propagate a birth certificate annotated with 18. If the birth certificate arrives first, the death certificate will be ignored since it is older.

An important optimization to the up/down protocol avoids large sets of birth certificates from arriving at the root in response to a node with many descendants choosing a new parent. Normally, when a node moves to a new parent, a birth certificate must be sent out for each of its descendants to its new parent. This maintains the invariant that a node knows the parent of all its descendants. Keep in mind that a birth certificate is not only a record that a node exists, but that it has a certain parent.

Although this large set of updates is required, it is usually unnecessary for these updates to continue far up the hierarchy. For example, when a node relocates beneath a sibling, the sibling must learn about all of the node's descendants, but when the sibling, in turn, passes these certificates to the original parent, the original parent notices that they do not represent a change and quashes the certificate from further propagation.

Using the up/down protocol, the root of the hierarchy will receive timely updates about changes to the network. The freshness of the information can be tuned by varying the length of time between check-ins. Shorter periods between updates guarantee that information will make its way to the root more quickly. Regardless of the update frequency, bandwidth requirements at the root will be proportional to the number of changes in the hierarchy rather than the size of the hierarchy itself.

4.4 Replicating the root

In Overcast, there appears to be the potential for significant scalability and reliability problems at the root. The up/down protocol works to alleviate the scalability difficulties in maintaining global state about the distribution tree, but the root is still responsible for handling all join requests from all HTTP clients. The root handles such requests by redirection, which is far less resource intensive than actually delivering the requested content. Nonetheless, the possibility of overload remains for particularly popular groups. The root is also a single point of failure.

To address this, Overcast uses a standard technique used by many popular websites. The DNS name of the root resolves to any number of replicated roots in round-robin fashion. The database used to perform redirections is replicated to all such roots. In addition, IP address takeover may be used for immediate failover, since DNS caching may cause clients to continue to contact a failed replica. This simple, standard technique works well for this purpose because handling joins from HTTP clients is a read-only operation that lends well to distribution over replicated servers.

There remains, however, a single point of failure for the up/down protocol. The functionality of the root in the up/down protocol cannot be distributed so easily because its purpose is to maintain changing state. However the up/down protocol has the useful property that all nodes maintain state for nodes below them in the distribution tree. Therefore, a convenient technique to address fault tolerance is to specially construct the top of the hierarchy.

Starting with the root, some number of nodes are configured linearly, that is, each has only one child. In this way all other overcast nodes lie below these top nodes. Figure 2 shows a distribution tree in which the top three nodes are arranged linearly. Each of these nodes has enough information to act as the root of the up/down protocol in case of a failure. This technique has the drawback of increasing the latency of content distribution unless special-case code skips the extra roots during distribution. If latency were important to Overcast this would be an important, but simple, optimization.

"Linear roots" work well with the need for replication to address scalability, as mentioned above. The set of linear nodes has all the information needed to perform Overcast joins.
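The sequence-number rule from Section 4.3 can be sketched as follows. Certificates are modeled here as `(kind, node, parent, seq)` tuples, and the table layout is illustrative rather than the system's actual on-disk format.

```python
def apply_certificate(table, cert):
    """Apply a birth or death certificate, ignoring anything stale:
    a change is dropped if a change with an equal or higher sequence
    number for the same node has already been seen."""
    kind, node, parent, seq = cert
    known = table.get(node)
    if known is not None and known["seq"] >= seq:
        return False  # stale certificate: ignore
    table[node] = {"alive": kind == "birth", "parent": parent, "seq": seq}
    return True

# The race from the text: the birth certificate (18) arrives first,
# so the older death certificate (17) is ignored.
table = {}
apply_certificate(table, ("birth", "n", "new_parent", 18))
apply_certificate(table, ("death", "n", "old_parent", 17))
```

Because the rule is applied at every node, not only at the root, stale certificates are quashed as early as possible on their way up the hierarchy.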
These nodes are therefore perfect candidates to be used in the DNS round-robin approach to scalability. By choosing these nodes, no further replication is necessary.

Figure 2: A specially configured distribution topology that allows either of the grey nodes to quickly stand in as the root (black) node. All filled nodes have complete status information about the unfilled nodes.

4.5 Joining a multicast group

To join a multicast group, a Web client issues an HTTP GET request with the URL for a group. The hostname of the URL names the root node(s). The root uses the pathname of the URL, the location of the client, and its database of the current status of the Overcast nodes to decide where to connect the client to the multicast tree. Because status information is constantly propagated to the root, a decision may be made quickly without further network traffic, enabling fast joins.

Joining a group consists of selecting the best server and redirecting the client to that server. The details of the server selection algorithm are beyond the scope of this paper as considerable previous work [3, 18] exists in this area. Furthermore, Overcast's particular choices are constrained considerably by a desire to avoid changes at the client. Without such a constraint simpler choices could have been made, such as allowing clients to participate directly in the Overcast tree building protocol.

Although we do not discuss server selection here, a number of Overcast's details exist to support this important functionality, however it may actually be implemented. A centralized root performing redirections is convenient for an approach involving large tables containing collected Internet topology data. The up/down algorithm allows for redirections to nodes that are known to be functioning.

4.6 Multicasting with Overcast

We refer to reliable multicasting on an overcast network as "overcasting". Overcasting proceeds along the distribution tree built by the tree protocol. Data is moved between parent and child using TCP streams. If a node has four children, four separate connections are used. The content may be pipelined through several generations in the tree. A large file or a long-running live stream may be in transit over tens of different TCP streams at a single moment, in several layers of the distribution hierarchy.

If a failure occurs during an overcast, the distribution tree will rebuild itself as described above. After rebuilding the tree, the overcast resumes for on-demand distributions where it left off. In order to do so, each node keeps a log of the data it has received so far. After recovery, a node inspects the log and restarts all overcasts in progress.

Live content on the Internet today is typically buffered before playback. This compensates for momentary glitches in network throughput. Overcast can take advantage of this buffering to mask the failure of a node being used to Overcast data. As long as the failure occurs in a node that is not at the edge of the Overcast network, an HTTP client need not ever become aware that the path of data from the root has been changed in the face of failure.

5 Evaluation

In this section, the protocols presented above are evaluated by simulation. Although we have deployed Overcast in the real world, we have not yet deployed on a sufficiently large network to run the experiments we have simulated.

To evaluate the protocols, an overlay network is simulated with increasing numbers of overcast nodes while keeping the total number of network nodes constant. Overcast should build better trees as more nodes are deployed, but protocol overhead may grow.

We use the Georgia Tech Internetwork Topology Models (GT-ITM) to generate the network topologies used in our simulations. We use the "transit-stub" model to obtain graphs that more closely resemble the Internet than a pure random construction. GT-ITM generates a transit-stub graph in stages: first a number of random backbones (transit domains), then the random structure of each backbone, then random "stub" graphs are attached to each node in the backbones.

We use this model to construct five different 600 node graphs. Each graph is made up of three transit domains. These domains are guaranteed to be connected. Each transit domain consists of an average of eight stub networks. The stub networks contain edges amongst themselves with a probability of 0.5. Each stub network consists of an average of 25 nodes, in which nodes are once again connected with a probability of 0.5. These parameters are from the sample graphs in the GT-ITM distribution; we are unaware of any published work that describes parameters that might better model common Internet topologies.

We extended the graphs generated by GT-ITM with bandwidth information. Links internal to the transit domains were assigned a bandwidth of 45Mbits/s, edges connecting stub networks to the transit domains were assigned 1.5Mbits/s, and finally, in the local stub domain, edges were assigned 100Mbit/s. These reflect commonly used network technology: T3s, T1s, and Fast Ethernet. All measurements are averages over the five generated topologies.

Empirical measurements from actual Overcast nodes show that a single Overcast node can easily support twenty clients watching MPEG-1 videos, though the exact number is greatly dependent on the bandwidth requirements of the content. Thus with a network of 600 overcast nodes, we are simulating multicast groups of perhaps 12,000 members.

5.1 Tree protocol

The efficiency of Overcast depends on the positioning of Overcast nodes. In our first experiments, we compare two different approaches to choosing positions. The first approach, labelled "Backbone", preferentially chooses transit nodes to contain Overcast nodes. Once all transit nodes are Overcast nodes, additional nodes are chosen at random. This approach corresponds to a scenario in which the owner of the Overcast nodes places them strategically in the network. In the second, labelled "Random", we select all Overcast nodes at random. This approach corresponds to a scenario in which the owner of Overcast nodes does not pay attention to where the nodes are placed.

The goal of Overcast's tree-building protocol is to optimize the bottleneck bandwidth available back to the root for all nodes. The goal is to provide each node with the same bandwidth to the root that the node would have in an idle network. Figure 3 compares the sum of all nodes' bandwidths back to the root in Overcast networks of various sizes to the sum of all nodes' bandwidths back to the root in an optimal distribution tree using router-based software. This indicates how well Overcast performs compared to IP Multicast.

Figure 3: Fraction of potential bandwidth provided by Overcast.

The main observation is that, as expected, the backbone strategy for placing Overcast nodes is more effective than the random strategy, but the results of random placement are encouraging nonetheless. Even a small number of deployed Overcast nodes, positioned at random, provide approximately 70%-80% of the total possible bandwidth.

It is extremely encouraging that, when using the backbone approach, no node receives less bandwidth under Overcast than it would receive from IP Multicast. However some enthusiasm must be withheld, because a simulation artifact has been left in these numbers to illustrate a point.

Notice that the backbone approach and the random approach differ in effectiveness even when all 600 nodes of the network are Overcast nodes. In this case the same nodes are participating in the protocol, but better trees are built using the backbone approach. This illustrates that the trees created by the tree-building protocol are not unique. The backbone approach fares better by this metric because in our simulations backbone nodes were turned on first. This allowed backbone nodes to preferentially form the "top" of the tree. This indicates that in future work it may be beneficial to extend the tree-building protocol to accept hints that mark certain nodes as "backbone" nodes. These nodes would preferentially form the core of the distribution tree.

Overcast appears to perform quite well for its intended goal of optimizing available bandwidth, but it is reasonable to wonder what costs are associated with this performance.
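The metric behind Figure 3 can be sketched as follows (the naming is ours, not the simulator's): each node's bandwidth back to the root is the bottleneck of its delivery path, and the achieved sum is reported as a fraction of the ideal sum.

```python
def bottleneck_bw(path_links, capacity):
    """A node's bandwidth to the root is limited by the slowest link on its path."""
    return min(capacity[link] for link in path_links)

def bandwidth_fraction(overcast_paths, ideal_paths, capacity):
    """Sum of bottleneck bandwidths under Overcast's tree, as a fraction of
    the same sum when every node uses its best path in an idle network."""
    achieved = sum(bottleneck_bw(p, capacity) for p in overcast_paths.values())
    possible = sum(bottleneck_bw(p, capacity) for p in ideal_paths.values())
    return achieved / possible

# Toy example with the paper's link classes: a T1 stub uplink bottlenecks
# both the direct path and the overlay path, so the fraction is 1.0.
capacity = {"t3": 45.0, "t1": 1.5, "ether": 100.0}
fraction = bandwidth_fraction({"n": ["ether", "t1"]}, {"n": ["t1"]}, capacity)
```

The toy example also illustrates why "deep" trees can be harmless: adding overlay hops does not lower a node's score unless one of the extra links is slower than the node's own bottleneck.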
To explore this question we measure the network load imposed by Overcast. We define network load to be the number of times that a particular piece of data must traverse a network link to reach all Overcast nodes. In order to compare to IP Multicast, Figure 4 plots the ratio of the network load imposed by Overcast to a lower bound estimate of IP Multicast's network load. For a given set of nodes, we assume that IP Multicast would require exactly one less link than the number of nodes. This assumes that all nodes are one hop away from another node, which is unlikely to be true in sparse topologies, but provides a lower bound for comparison.

Figure 4: Ratio of the number of times a packet must "hit the wire" to be propagated through an Overcast network to a lower bound estimate of the same measure for IP Multicast.

Figure 4 shows that for Overcast networks with greater than 200 nodes Overcast imposes somewhat less than twice as much network load as IP Multicast. In return for this extra load Overcast offers reliable delivery, immediate deployment, and future flexibility. For networks with few Overcast nodes, Overcast appears to impose a considerably higher network load than IP Multicast. This is a result of our optimistic lower bound on IP Multicast's network load, which assumes that 50 randomly placed nodes in a 600 node network can be spanned by 49 links.

Another metric to measure the effectiveness of an application-level multicast technique is stress, proposed in prior work on end-system multicast. Stress indicates the number of times that the same data traverses a particular physical link. By this metric, Overcast performs quite well with average stresses of between 1 and 1.2. We do not present detailed analysis of Overcast's performance by this metric, however, because we believe that network load is more telling for Overcast. That is, Overcast has quite low scores for average stress, but that metric does not describe how often a longer route was taken when a shorter route was available.

Another question is how fast the tree protocol converges to a stable distribution tree, assuming a stable underlying network. This is dependent on three parameters. The round period controls how long a node that has not yet determined a stable position in the hierarchy will wait before evaluating a new set of potential parents. The reevaluation period determines how long a node will wait before reevaluating its position in the hierarchy once it has obtained a stable position. Finally the lease period determines how long a parent will wait to hear from a child before reporting the child's death.

For convenience, we measure all convergence times in terms of the fundamental unit, the round time. We also set the reevaluation period and lease period to the same value. Figure 5 shows how long Overcast requires to converge if an entire Overcast network is simultaneously activated. To demonstrate the effect of a changing reevaluation and lease period, we plot for the "standard" lease time of 10 rounds, as well as longer and shorter periods. Lease periods shorter than five rounds are impractical because children actually renew their leases a small random number of rounds (between one and three) before their lease expires to avoid being thought dead. We expect that a round period on the order of 1-2 seconds will be practical for most applications.

Figure 5: Number of rounds to reach a stable distribution tree as a function of the number of overcast nodes and the length of the lease period.

We next measure convergence times for an existing Overcast network in which overcast nodes are added or fail. We simulate overcast networks of various sizes until they quiesce.
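The comparison behind Figure 4 can be sketched as follows (illustrative naming): each parent-to-child transfer crosses every physical link on its unicast path, and the IP Multicast lower bound is one link per node.

```python
def network_load(delivery_paths):
    """Times a single piece of data hits the wire: the sum, over all
    parent-to-child transfers, of the physical links each one traverses."""
    return sum(len(path) for path in delivery_paths)

def load_ratio(delivery_paths, num_overcast_nodes):
    """Overcast's network load over the optimistic IP Multicast lower
    bound of exactly one less link than the number of nodes."""
    return network_load(delivery_paths) / (num_overcast_nodes - 1)

# Toy example: three transfers crossing 1, 2, and 3 links respectively
# in a 4-node group give a load ratio of 6 / 3 = 2.0.
ratio = load_ratio([["l1"], ["l1", "l2"], ["l2", "l3", "l4"]], 4)
```

Because the denominator is a lower bound, the ratio overstates Overcast's relative cost, especially for sparse deployments.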
We then add and remove Overcast nodes, and simulate the network until it quiesces once again. We measure the time, in rounds, for the network to quiesce after the changes. We measure for various numbers of additions and removals, allowing us to assess the dependence of convergence on how many nodes have changed state. We measure only the backbone approach.

Figure 6: Number of rounds to recover a stable distribution tree as a function of the number of nodes that change state and the number of nodes in the network.

Figure 6 plots convergence times (using a 10 round lease time) against the number of overcast nodes in the network. The convergence time for node failures is quite modest. In all simulations the Overcast network reconverged after less than three lease times. Furthermore, the reconvergence time scaled well against both the number of nodes failing and the total number of nodes in the overcast network. In neither case was the convergence time even linearly affected.

For node additions, convergence times do appear more closely linked to the size of the Overcast network. This makes intuitive sense because new nodes are navigating the network to determine their best location. Even so, in all simulations fewer than five lease times are required. It is important to note that an Overcast network continues to function even while stabilizing. Performance may be somewhat impacted by increased measurement traffic and by TCP setup and tear down overhead as parents change, but such disruptions are localized.

5.2 Up/Down protocol

The goal of the up/down algorithm is to minimize the bandwidth required at the root node while maintaining timely status information for the entire network. Factors that affect the amount of bandwidth used include the size of the overcast network and the rate of topology changes. Topology changes occur when the properties of the underlying network change, nodes fail, or nodes are added. Therefore the up/down algorithm is evaluated by simulating overcast networks of various sizes in which various numbers of failures and additions occur.

To assess the up/down protocol's ability to provide timely status updates to the root without undue overhead we keep track of the number of certificates (for both "birth" and "death") that reach the root during the previous convergence tests. This is indicative of the bandwidth required at the root node to support an overcast network of the given size and is dependent on the amount of topology change induced by the additions and deletions.

Figure 7: Certificates received at the root in response to node additions.

Figure 7 graphs the number of certificates received by the root node in response to new nodes being brought up in the overcast network. Remember, the root may receive multiple certificates per node addition because the addition is likely to cause some topology reconfiguration. Each time a node picks a new parent that parent propagates a birth certificate. These results indicate that the number of certificates is quite modest: certainly no more than four certificates per node addition, usually approximately three. What is more important is that the number of certificates scales more closely to the number of new nodes than the size of the overcast network. This gives evidence that overcast can scale to large networks.

Similarly, Overcast requires few certificates to react to node failures. Figure 8 shows that in the common case, no more than four certificates are required per node failure. Again, because the number of certificates is proportional to the number of failures rather than the size of the network, Overcast appears to offer the ability to scale to large networks.
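The early lease-renewal rule noted in the convergence experiments (children renew one to three rounds before expiry) can be sketched as follows. This is a guess at one reasonable formulation; the text specifies only the one-to-three-round jitter, not the exact scheduling function.

```python
import random

def next_checkin_round(current_round, lease_rounds, rng=random):
    """Schedule the next check-in a small random number of rounds (one to
    three) before the lease would expire, so a slightly delayed report is
    not mistaken for a failure."""
    early = rng.randint(1, 3)  # randint bounds are inclusive
    return current_round + max(1, lease_rounds - early)
```

The jitter also spreads sibling check-ins over several rounds, which avoids synchronized bursts of traffic at a shared parent.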
that Overcast networks work well on large-scale net-
|| works, supporting multicast groups of up to 12,000
|| || 10 node failures
| | 5 node failures members. Given these results and the low cost for
40 1 node failure Overcast nodes, we believe that putting computa-
tion and storage in the network fabric is a promis-
|| ing approach for adding new services to the Internet
20 || || || incrementally.
| | | |
0 200 400 600
Overcast nodes (before deletions) We thank Hari Balakrishnan for helpful input
concerning the tree building algorithm; Suchitra
Figure 8: Certiﬁcates received at the root in response to
node deletions. Raman, Robert Morris, and our shepherd, Fred
Douglis, for detailed comments that improved our
presentation in many areas; and the many anony-
than the size of the network, Overcast appears to of- mous reviewers whose reviews helped us to see our
fer the ability to scale to large networks. work with fresh eyes.
On the other hand, Figure 8 shows that there are
some cases that fall far outside the norm. The large
spikes at 50 and 150 node networks with 5 and 10 References
failures occurred because of failures that happened
to occur near the root. When a node with a sub-  FastForward Networks’ broadcast overlay archi-
stantial number of children chooses a new parent tecture. Technical report, FastForward, 2000.
it must convey it’s entire set of descendants to its www.ffnet.com/pdfs/BOA-whitepaperv6.PDF.
new parent. That parent then propagates the entire  Elan Amir, Steven McCanne, and Randy H. Katz.
set. However, when the information reaches a node An active service framework and its application to
that already knows the relationships in question, the real time multimedia transcoding. In Proc. ACM
update is quashed. In these cases, because the re- SIGCOMM Conference (SIGCOMM ’98), pages
conﬁgurations occurred high in the tree there was 178–190, September 1998.
no chance to quash the updates before they reached  Yair Amir, Alec Peterson, and David Shaw. Seam-
the root. In larger networks such failures are less lessly selecting the best copy from Internet-wide
likely. replicated web servers. In The 12th International
Symposium on Distributed Computing (DISC’98),
pages 22–23, September 1998.
6 Conclusions

We have described a simple tree-building protocol that yields bandwidth-efficient distribution trees for single-source multicast, and our up/down protocol for providing timely status updates to the root of the distribution tree in a scalable manner. Overcast implements these protocols in an overlay network over the existing Internet. The protocols allow Overcast networks to dynamically adapt to changes (such as congestion and failures) in the underlying network infrastructure and support large, reliable single-source multicast groups. Geographically dispersed businesses have deployed Overcast nodes in small-scale Overcast networks for distribution of high-quality, on-demand video to unmodified desktops.

Simulation studies with topologies created with the Georgia Tech Internetwork Topology Models show [...] Overcast nodes, we believe that putting computation and storage in the network fabric is a promising approach for adding new services to the Internet incrementally.

Acknowledgments

We thank Hari Balakrishnan for helpful input concerning the tree building algorithm; Suchitra Raman, Robert Morris, and our shepherd, Fred Douglis, for detailed comments that improved our presentation in many areas; and the many anonymous reviewers whose reviews helped us to see our work with fresh eyes.

References

[1] FastForward Networks' broadcast overlay architecture. Technical report, FastForward Networks, 2000. www.ffnet.com/pdfs/BOA-whitepaperv6.PDF.

[2] Elan Amir, Steven McCanne, and Randy H. Katz. An active service framework and its application to real time multimedia transcoding. In Proc. ACM SIGCOMM Conference (SIGCOMM '98), pages 178–190, September 1998.

[3] Yair Amir, Alec Peterson, and David Shaw. Seamlessly selecting the best copy from Internet-wide replicated web servers. In The 12th International Symposium on Distributed Computing (DISC '98), pages 22–23, September 1998.

[4] Michael Baentsch, Georg Molter, and Peter Sturm. Introducing application-level replication and naming into today's web. In Proc. 5th International World Wide Web Conference, May 1996.

[5] A. Basso, C. Cranor, R. Gopalakrishnan, M. Green, C. R. Kalmanek, D. Shur, S. Sibal, C. J. Sreenan, and J. E. van der Merwe. PRISM, an IP-based architecture for broadband access to TV and other streaming media. In Proc. IEEE International Workshop on Network and Operating System Support for Digital Audio and Video, June 2000.

[6] Azer Bestavros. Speculative data dissemination and service to reduce server load, network traffic, and response time in distributed information systems. In Proc. of the 1996 International Conference on Data Engineering (ICDE '96), March 1996.

[7] M. Blaze. Caching in Large-Scale Distributed File Systems. PhD thesis, Princeton University, January 1993.

[8] Anawat Chankhunthod, Peter B. Danzig, Chuck Neerdaels, Michael F. Schwartz, and Kurt J. Worrell. A hierarchical Internet object cache. In Proc. USENIX 1996 Annual Technical Conference, pages 153–164, January 1996.

[9] Yatin Chawathe, Steven McCanne, and Eric Brewer. RMX: Reliable multicast for heterogeneous networks. In Proc. IEEE Infocom, March 2000.

[10] P. Danzig, R. Hall, and M. Schwartz. A case for caching file objects inside internetworks. In Proc. ACM SIGCOMM Conference (SIGCOMM '93), pages 239–248, September 1993.

[11] S. E. Deering. Multicast Routing in a Datagram Internetwork. PhD thesis, Stanford University, December 1991.

[12] R. Droms. Dynamic host configuration protocol. RFC 2131, Internet Engineering Task Force, March 1997.

[13] Paul Francis. Yoid: Your Own Internet Distribution. Technical report, ACIRI, April 2000.

[14] J. Gwertzman and M. Seltzer. The case for geographical push-caching. In Proc. 5th Workshop on Hot Topics in Operating Systems (HotOS-V), pages 51–57. IEEE Computer Society Technical Committee on Operating Systems, May 1995.

[15] Hugh W. Holbrook and David R. Cheriton. IP multicast channels: EXPRESS support for large-scale single-source applications. In Proc. ACM SIGCOMM Conference (SIGCOMM '99), pages 65–78, September 1999.

[16] Yang-hua Chu, Sanjay G. Rao, and Hui Zhang. A case for end system multicast. In Proc. ACM SIGMETRICS Conference (SIGMETRICS '00), June 2000.

[17] M. Frans Kaashoek, Robbert van Renesse, Hans van Staveren, and Andrew S. Tanenbaum. FLIP: an internetwork protocol for supporting distributed systems. ACM Trans. Computer Systems, 11(1):77–106, February 1993.

[18] D. R. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, and R. Panigrahy. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In Proc. 29th ACM Symposium on Theory of Computing, pages 654–663, May 1997.

[19] Steven McCanne and Van Jacobson. Receiver-driven layered multicast. In Proc. ACM SIGCOMM Conference (SIGCOMM '96), pages 117–130, August 1996.

[20] Jörg Nonnenmacher, Ernst W. Biersack, and Don Towsley. Parity-based loss recovery for reliable multicast transmission. In Proc. ACM SIGCOMM Conference (SIGCOMM '97), pages 289–300, September 1997.

[21] Stefan Savage, Tom Anderson, Amit Aggarwal, David Becker, Neal Cardwell, Andy Collins, Eric Hoffman, John Snell, Amin Vahdat, Geoff Voelker, and John Zahorjan. Detour: A case for informed Internet routing and transport. IEEE Micro, 19(1):50–59, January 1999.

[22] Michael D. Schroeder, Andrew D. Birrell, Michael Burrows, Hal Murray, Roger M. Needham, Thomas L. Rodeheffer, Edwin H. Satterthwaite, and Charles P. Thacker. Autonet: A high-speed, self-configuring local area network using point-to-point links. IEEE Journal on Selected Areas in Communications, 9(8):1318–1335, October 1991.

[23] David L. Tennenhouse, Jonathan M. Smith, W. David Sincoskie, David J. Wetherall, and Gary J. Minden. A survey of active network research. IEEE Communications Magazine, 35(1):80–86, January 1997.

[24] J. Touch and S. Hotz. The X-bone (white paper). Technical report, USC/ISI, May 1997. www.isi.edu/x-bone.

[25] Ellen W. Zegura, Kenneth L. Calvert, and Samrat Bhattacharjee. How to model an internetwork. In Proc. IEEE Infocom, pages 40–52, March 1996.

[26] Lixia Zhang, Scott Michel, Khoi Nguyen, Adam Rosenstein, Sally Floyd, and Van Jacobson. Adaptive web caching: Towards a new global caching architecture. In Proc. 3rd International World Wide Web Caching Workshop, June 1998.