Dynamic Packet Processors
A new abstraction for router extensibility

Gísli Hjálmtýsson, Heimir Sverrisson, Björn Brynjúlfsson, and Ólafur R. Helgason
Department of Computer Science
Reykjavík University
Reykjavík, Iceland

Abstract - The history of the Internet is that of rapid change and adaptability. The simple IP service model and the informal standardization process have fostered rapid evolution of systems and services. However, as the Internet has transformed from an academic playground to an essential commercial infrastructure, the service model and the standardization process have become more complex. While outsourcing and optimizations are driving services into the network, the IETF is increasingly the battleground of vendors. Consequently, introducing new services to the Internet has become increasingly complicated.

Active and programmable networking is an effort to reclaim the flexibility of the early Internet and reduce the need for standardization.

In this paper we introduce a new abstraction, packet processors, to evolve the lower layer router facilities and to extend the programmable interface in a type safe manner. We show how this abstraction is sufficient to capture all facilities commonly found in data-paths of commodity routers and how it gives the ability to introduce new types of packet processors. Packet processors can add type specific methods, thereby extending the programmable interface while maintaining the semantic integrity of the Node OS and the Execution Environment. We discuss how paths can be constructed from a sequence of packet processors, and how these paths can cross the services/Node OS boundary and hardware processor boundaries. Our measurements show that the added flexibility adds negligible overhead.

  I.   INTRODUCTION

The success of the Internet is a history of change and adaptability. The simple IP service model and the informal standardization process have fostered rapid evolution of systems and services. In the traditional Internet model, services are provided by users at end-systems, without the need for network wide consensus or standardization. Standardization was limited to a small set of basic network functionality essential for interoperability. However, as the Internet has transformed from an academic playground to an essential commercial infrastructure, the service model and the standardization process have become more complex. While outsourcing and optimizations are driving services into the network, the IETF is increasingly the battleground of vendors. As a consequence, introducing new services to the Internet has become increasingly arduous, to the point of threatening the future evolution of the Internet.

Active and programmable networking is an effort to reclaim the flexibility of the early Internet and to reduce the need for standardization, by allowing service programmers to introduce services into network nodes without detailed knowledge of the underlying hardware. Instead of proprietary access to router internals, service programmers access router facilities through an open Service Application Programming Interface (SAPI). While this is a significant departure from traditional thinking on the roles of routers, the reality of the current Internet includes an ad-hoc collection of proxies, transcoders, load balancers, firewalls, NATs, and boundary gateways, already blurring the distinctions between router and end-system functions. In part this transformation is facilitated by technology, as increasing hardware capacity reduces the need for excessive optimizations and affords more intense computing as part of packet processing. There is an emerging consensus to adopt a two-layer architecture similar to that of a computing system, where services executing within Execution Environments (EEs) access the Node OS kernel functions via a well-defined SAPI [1, 2, 3, 4].

While the exact realization of the two-layer model is still being debated, there is a clear need to provide programmability at both levels. Whereas introducing a service within an Execution Environment enriches the services offered, programmability at the Node OS kernel level extends the service generic facilities of the (commodity) router’s forwarding engine. Whereas services are developed by service programmers irrespective of the specifics of the underlying hardware, enhancements of the kernel level forwarding engine are developed by the respective router vendor with detailed knowledge and with the intent of fully exploiting details of the underlying hardware. In addition, an extensible router architecture benefits router vendors by allowing delay in some design decisions, promoting modularity, and supporting dynamic upgrades, customization and bug-fixes throughout the lifetime of the router. With technology advances rapidly rendering router performance deficient, the lifecycle of a router has become a progression from the performance focused network core, to the WAN, to the service rich edge. Dynamically enhancing the functionality of the OS facilities to adapt to the changing roles will greatly enhance the value of routers throughout the lifecycle.

Dynamic extensibility at the Node OS kernel level requires comparable extensibility of the programmable interface (SAPI). Although most research on extensible routers has ignored this aspect of extensibility, extending the programmable interface dynamically reduces the set of interface primitives that need to be standardized, thus helping to facilitate the emergence of a Node OS standard. Without extending the interface to match the enhanced functionality
services are not able to exploit it. A good example of this is link scheduling. Whereas a simple priority scheduler needs only to be initialized with the number of priority levels, a sophisticated hierarchical scheduler supports continuous manipulation of the scheduling hierarchy. Similarly, while a simple IP tunnel may only support IP-in-IP encapsulation, richer tunneling abstractions may support resource reservations, or mapping of tunnels to physical resources such as optical light paths [5]. While determining and standardizing a programmable interface remains important, reducing this set to a bare minimum lowers that barrier significantly.

In this paper we introduce packet processors, a new abstraction to evolve and extend the lower layer router facilities while extending the programmable interface in a type safe manner. We discuss how packet processors are sufficient to capture all facilities commonly found in data-paths of commodity routers and how our system has the ability to dynamically introduce new types of packet processors. Paths are constructed from chains of packet processors, potentially having multiple branches, and can cross the EE/OS boundary and hardware processor boundaries. Packet processors can add type specific methods, thereby extending the programmable interface while maintaining the semantic integrity of the Node OS and the Execution Environment. Packet processors can do arbitrary processing, including additional classification, branching and more. We have implemented a number of interesting packet processors, and multiple services to validate the abstraction and the router architecture. We show through measurements that the added modularity and late binding incur negligible cost.

The rest of the paper is organized as follows. We begin by discussing related work in Section 2. In Section 3 we discuss the PPROC system, packet processors and paths in detail. In Section 4 we give some examples of packet processors that we have implemented and experimented with, specifically tunnels, queuing and more. In Section 5 we discuss the implementation, and in Section 6 applications and experimentation, commencing by briefly discussing the Pronto Router in Section 6.A and then discussing an implementation of a novel multicast using packet processors in Section 6.B. In Section 7 we show measurements to demonstrate that the benefits of this architecture come at a negligible performance cost. In Section 8 we conclude.

 II.   RELATED WORK

While a significant amount of research on active networking is being conducted (for example [6, 7, 8]), this work is most closely related to that of Router Plugins/Crossbow [9], and the Scout based work on extensible routers [10, 11].

The work on Router Plugins [9] is an approach to router programmability where router modules can be plugged in at the kernel level, thus customizing a router without the performance penalties of user level protections. In this way, router plugins are similar to kernel level packet processors. In both models, security and safety are provided externally to the programmable model. However, packet processors are not limited to the kernel level. Like packet processors, router plugins encapsulate arbitrary processing, classification, forwarding and scheduling, and thus can segment the data path into independently schedulable segments. Unlike packet processors, however, plugins are placed at predetermined gates, limiting the flexibility in branching and placement of functions in the path. In essence, router plugins are designed with a particular hardware model in mind. In contrast, packet processors are designed to maximize the flexibility in hardware architectures and to isolate the hardware from the service programmers.

More generally, the Crossbow/Router plugins differ significantly in their overall approach to router programmability. In the Crossbow architecture, router plugins are the only means of introducing code to the router, implying that new services are implemented as router plugins. This model therefore does not support the separation between the services and router facilities that is the hallmark of our architecture. Developing services therefore requires the same set of skills and detailed hardware knowledge as extending the router facilities. In contrast, fundamental to packet processors is the separation of responsibilities between service development and the dynamic extensibility of router facilities provided by vendors. The Crossbow architecture gives service programmers access to type-specific functionality only in an ad-hoc manner.

VERA is an extensible router architecture based on Scout, exploring hierarchical router architectures built on general-purpose commodity components. Fundamental to Scout is the abstraction of a path combining functionality and resource management into a single abstraction. VERA defines three distinguishable functions in the data-path: classification, forwarding, and scheduling at an output port. A path is an instantiation of a forwarder, which in turn can consist of one or more stages.

Packet processors share many of the attributes of Scout (unless specifically applicable to Scout and not to VERA, or vice versa, we refer to both as Scout). In fact, the upper layers of VERA can be realized in our model, with packet processor instances and paths realizing paths of stages in Scout. VERA models the hardware layer of routers. We have not done this, in part because we see the mapping of packet processors onto hardware facilities as vendors’ domain where the desire for using proprietary methods is strong and a driver for standardization is wanting. Our paths can have multiple branches, may branch at any packet processor in the chain and consist of multiple asynchronous segments that may be scheduled separately. In contrast, in Scout all resources required for all stages of the forwarder must be scheduled before a packet is processed. Scout paths are point-to-point, and are described by a linear string of modules whose instantiations constitute the forwarder. Instead, our paths can have multiple branches dynamically added and removed during the lifetime of a path. Moreover, individual packet processors may be added to or removed from an active packet processor path dynamically.

Although on the surface the Scout router API is simple, introducing only three methods, each method contains one or more magic parameters. In particular, the createPath method takes as an argument a forwarder followed by forwarder parameters that are passed to the forwarder. Forwarders are typically
composed from multiple modules; extensibility is achieved by introducing new modules. We fail to see how access to type specific functionality can be provided across this interface in a type safe manner, without substantial additional machinery. In contrast, with packet processors, type specific methods maintain program invariants and the semantic integrity of the router system even across the EE/OS boundary.

We are inspired by the Click work [12]. Their performance numbers are our envy. However, while Click elements give modularity, the Click router is not dynamically programmable, and thus not extensible in the sense that Crossbow, Scout or the work described herein are. Moreover, Click elements are tied together at configuration time. In contrast, new packet processors can be dynamically installed over the network; new paths can be constructed on a per flow basis; new branches and packet processors can be added and removed from paths during the lifetime of a path. Like our system, Click offers access to element specific functions; this is achieved through handlers and the Linux proc file system. In contrast, packet processors use late binding of system calls to offer a typed interface to processor specific functions.

 III.   PACKET PROCESSOR AND PATHS

Abstracting the functions of the data-path, a packet processor (PPROC) is an entity where packets arrive, are processed and then delivered either to an output device or to another packet processor. Abstractly, a packet processor is an abstract data type (a class) implementing a common interface, potentially having type and instance specific state and methods, and performing arbitrary functions. Examples of packet processors include basic IP forwarding, tunnel entry, queuing and scheduling, reclassification and more. The only functions in the traditional router data-path that packet processors do not subsume are classification on (physical) input devices and the device drivers.

An important part of packet processors is the ability to introduce and export type specific methods to the services layer, thereby extending the programmable interface. Services manage, manipulate and interact with packet processor implementations through service level library objects. For kernel level packet processors these are effectively proxy objects.

Packet processors are chained together as needed to make paths through the router. A simple path consists of basic IP forwarding, followed by a FIFO queue at an outgoing device. A path may have multiple branches, and may branch at an arbitrary packet processor in the chain. Multiple branches or paths may converge onto a single packet processor, e.g., at a link scheduler. Packet processors may be “synchronous”, implying immediate execution, or “asynchronous”, where an arriving packet is queued and scheduled for subsequent processing. This enables paths to cross the Node OS/EE boundary as well as physical processor boundaries, while maintaining both the packet processor and the path abstraction intact. A path is assigned to a filter in the packet classifier. Multiple filters may map to the same path.

A. The PPROC System

The Packet Processor System consists of three main components: type management, type specific operations and instance operations. Each component is implemented in part at the kernel level, with complementary services (user) level functionality. All aspects of the PPROC system are controlled and managed from user level, either from the services executing in an EE, or as part of the general EE support. Moreover, for each kernel level packet processor instance there exists a corresponding proxy instance within the service application.

Type management is responsible for the introduction of new types, and removal of obsolete types. The user level type management daemon supports dynamic installation over the network, and is otherwise responsible for managing a kernel level type map and installed modules and libraries. The user level type manager implements install, remove and list methods. For kernel level packet processor modules these have semantics analogous to those of Linux insmod, rmmod and lsmod respectively, but supporting network wide names (URLs). User level packet processors can be similarly managed and dynamically installed using dynamically linkable libraries as described in [13]. The user level type manager is furthermore responsible for cleanup, caching (libraries and kernel modules) and soft-state maintenance. Although the presence of a packet processor type is soft state, timeout intervals are very large. The kernel level type manager maintains a simplified version of what the EE level daemon maintains, mapping type names of installed packet processor types to the corresponding type handle. The kernel level type manager exports two methods to the upper layer, get_handle and remove, as follows:

  interface PPROC {
    get_handle( type_name ) returns type_handle
    remove( type_handle )
  }

Given a valid type name, get_handle returns the requested type handle if one exists. Remove invalidates the type, preventing further instance creation of that type. Instances have access to the validity status of their respective types. Implementations are responsible for determining if invalidation triggers the destruction of existing instances automatically. Note that the type management daemon and/or the service application can remove instances at any time.

At kernel level, the type manager exports in addition a method to register a new type, which all packet processor types must call upon installation to register their presence:

  interface PPROC_k extends PPROC {
    register( type_name, type_handle )
  }

Type specific operations (class methods) are dispatched to the corresponding type object. The only method required by all types of packet processors is a create method, returning an instance of the specified type.
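
As a rough illustration of the type management just described, the following C++ sketch models a type map with register, get_handle and remove, together with the create class method. It is a user-space simplification and not the kernel implementation: the names TypeManager, PprocType, FifoType and kNoType, the use of integer handles, and the choice to let the manager allocate the handle (rather than receive it from the registering type) are all assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a packet processor instance.
struct PacketProcessor {
    virtual ~PacketProcessor() = default;
    virtual const char* type_name() const = 0;
};

// A type object; create() is the one class method every type must provide.
struct PprocType {
    virtual ~PprocType() = default;
    virtual std::unique_ptr<PacketProcessor> create() const = 0;
};

using TypeHandle = int;             // assumption: handles are small integers
constexpr TypeHandle kNoType = -1;  // assumption: marker for "no such type"

class TypeManager {
public:
    // Called by a type on installation. The paper passes the handle in;
    // here the manager allocates it, which is a simplification.
    TypeHandle register_type(const std::string& name,
                             std::shared_ptr<const PprocType> type) {
        const TypeHandle handle = next_handle_++;
        names_[name] = handle;
        types_[handle] = Entry{std::move(type), true};
        return handle;
    }

    // Returns the handle for a valid, installed type name, else kNoType.
    TypeHandle get_handle(const std::string& name) const {
        const auto it = names_.find(name);
        if (it == names_.end()) return kNoType;
        return types_.at(it->second).valid ? it->second : kNoType;
    }

    // Invalidates the type so that no further instances can be created.
    void remove(TypeHandle handle) {
        const auto it = types_.find(handle);
        if (it != types_.end()) it->second.valid = false;
    }

    // Dispatches the create class method to the type object.
    std::unique_ptr<PacketProcessor> create(TypeHandle handle) const {
        const auto it = types_.find(handle);
        if (it == types_.end() || !it->second.valid) return nullptr;
        return it->second.type->create();
    }

private:
    struct Entry {
        std::shared_ptr<const PprocType> type;
        bool valid;
    };
    std::map<std::string, TypeHandle> names_;
    std::map<TypeHandle, Entry> types_;
    TypeHandle next_handle_ = 0;
};

// Example type: a FIFO queue processor (packet handling omitted).
struct Fifo : PacketProcessor {
    const char* type_name() const override { return "fifo"; }
};
struct FifoType : PprocType {
    std::unique_ptr<PacketProcessor> create() const override {
        return std::make_unique<Fifo>();
    }
};
```

In this sketch, instances created before remove stay alive until they are destroyed separately. That is only one of the policies the text leaves to implementations; automatic destruction on invalidation would be equally consistent with it.
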
                                                                        Packet processors implement two interfaces: a standard
                                                                    packet processor interface shared by all packet processors, and
                                                                    for all but the most trivial types of packet processors a type
specific service interface, thereby extending the SAPI. The standard interface contains two methods: the in-data-path method arrive, taking a packet control block as a parameter, and a link-to method to enable path construction, taking as a parameter the target object to link to. All packet processors implement a (virtual) destructor. Due to factory construction, instance creation is without parameters, requiring in almost all cases at least a type specific init function to be defined. Instance methods implemented by packet processors at kernel level follow an RPC pattern from the services level proxy object to the kernel level implementation. The proxy object maintains an instance handle that is provided as a parameter in each system call, and mapped to the kernel level instance using the instance map maintained by the type manager.

The PPROC system bootstraps with a very small interface; only a total of six methods are defined by the standard packet processor interface. All type specific methods are late bound and dispatched by the type introducing the method. This is also true across the service/kernel boundary. In particular, our implementation introduces only three proper PPROC system calls.

Packet processors provide a very strong abstraction, hiding all implementation details of packet processors. A service programmer manages and manipulates packet processor objects, regardless of where and how that particular processor is implemented. Moreover, the implementation of particular types of packet processors can differ between platforms and can depend on hardware configurations without affecting the service applications.

Paths can be broken into segments by asynchronous packet processors. This gives added flexibility in scheduling and in where processing takes place without exposing added complexity to service programmers. Rather than waiting for resources for the whole path to be available, progress can be made on a per segment basis. A common example is where a path crosses the EE/OS boundary. The ability to segment paths facilitates parallelism and is particularly suitable for hierarchical hardware architectures. A practical example is a high performance router, with the control plane and the service level execution environment implemented on an attached workstation. Another example is routers employing sophisticated interface cards with powerful network processors [14]. To fully exploit such hardware the interface vendor would supply a set of packet processors to execute on the network processor, effectively placing segments of paths onto the network processor. On a more elementary router the same might be processed by the router’s single CPU.

B. The Classifier

The packet classifier separates the packet stream into flows. Abstractly, the classifier maps packets into equivalence classes based on attributes of the packet. Typically these attributes include the IP header and the transport level header, but may include other attributes such as IP options. The equivalence relation is defined by a filter, F, so that two packets, p_i and p_j, are equivalent iff F(p_i) = F(p_j). We define the equivalence classes in terms of a byte mask, M, and a value, V, such that F(p) = ( (p & M) = V ), where & is the bitwise AND operator. Multiple filters may map to the same equivalence class, i.e., an equivalence class is defined as E = { p | F_0(p) or F_1(p) or … or F_n(p) }. Each equivalence class maps to a path descriptor, each path descriptor having a set of resource allocators (potentially buffer, bandwidth, and CPU), and a list of branches, plus some additional state. All other functionality is absorbed into the respective packet processors. Our system supports multiple classifiers and multiple instances of different classifiers, with instances assigned to interfaces or packet processors. Moreover, we have moved packet classification into the device drivers as part of the receive interrupt handling routine. Although strictly speaking not necessary for the PPROC system, doing so enables finer grained resource management and other optimizations. The packet classifiers implement and export the following interface to services:

  interface classifier {
    add( filter_definition, path_descriptor ) returns f_handle
    remove( f_handle )
  }

where add creates a new entry in the classifier, associating a given path descriptor to the filter definition specified, and remove undoes a previous add.

Implementations of classifiers furthermore implement and export to the Node OS a lookup function performing the classification, i.e., mapping a packet to a path descriptor. This is described by:

  interface k_classifier extends classifier {
    lookup( lookup_buffer ) returns path_descriptor;
  }

where the lookup_buffer is a pointer to the beginning of the data buffer containing the “value” (V). To perform the classifier lookup before the whole packet is read into memory (whose allocator is determined by the lookup) the device driver modifications employ “frame peeking” to view the first few bytes of the data (64 bytes in our current implementation).

  1) Path descriptors

Path descriptors contain references to resource allocators (potentially buffer, bandwidth, and CPU), a list of branches, plus some additional state. Path descriptors are managed and manipulated via the following interface:

  interface path_descriptor {
    create( set of allocators ) returns path_handle
    add_branch(path_handle, pproc_type, pproc_instance)
         returns branch_handle
    remove_branch(path_handle, branch_handle)
  }

 IV.   EXAMPLES OF PACKET PROCESSORS

A. Basic forwarding

Basic forwarding piggybacks on Linux’s standard forwarding. Three versions of the wrapper exist: the default basic forwarding, an address specific forwarding with flow pinning, and an address specific forwarding without flow pinning. The first, used as part of the default path when the classifier does not return a specific entry, simply wraps the functions of the Linux forwarding path, i.e., routing table lookup and elementary forwarding functions (ip_forward), into a packet processor implementing the PPROC interface. The
latter two are used for forwarding on destination specific paths,   send from local, link protection, and various snoop and
such as tunnels, multicast and more, and are on paths               instrumentation packet processors.
corresponding to a specific entry in the classifier. Flow
pinning implies that the output port does not change during the     E. Service level packet processors:
lifetime of the path, whereas without flow pinning forwarding       A more limited flora of service level packet processors exists,
to the specific destination reflects current routing information    so far targeted at functional verification and experimentation.
for that destination.                                               However, wrappers for service level UDP and TCP protocol
                                                                    processing are being developed, and we have embarked on
B. Tunnels
                                                                    implementing IPSEC at the services layer.
    Using packet processors, tunnel exit and tunnel entry are
implemented as separate types of packet processors, as               V.    IMPLEMENTATION
opposed to the traditional virtual device driver implementation.        We have built a prototype implementation of the PPROC
Tunnel exit is trivial to implement, but requires post exit         system on the Pronto Router (described below), an active and
reclassification of the originally encapsulated packet. Post exit   programmable software router based on Linux. The PPROC
classification can either employ the classifier at the incoming     system is realized as one main module at user and kernel level,
interface, or a tunnel specific classifier, thus enabling accounting and
filtering of the tunneled data. The tunnel entry packet processor exploits
the flexibility of allocators, ensuring that sufficient buffer head-room is
allocated on entry and thus avoiding a copy of the data to perform the
encapsulation. Using only standard packet processor facilities we thus
achieve line rate tunnel entry and exit. Basic tunnel entry is initialized
with the target (remote end of tunnel) IP address. More elaborate tunnels
may support resource reservations or an association to physical resources
such as optical light paths [5].

C. Queuing and scheduling
    Scheduling is a good example of where the SAPI needs type specific
methods. Whereas a simple priority scheduler only needs to be initialized
with the number of priority levels, richer scheduling disciplines, such as
Class Based Queuing (CBQ) [15] or other hierarchical schedulers, require the
ongoing manipulation and maintenance of the scheduling hierarchy. We have
implemented a simple priority scheduler and a fine-grained hierarchical
start-time fair scheduler that provides cumulatively predictable scheduling
(for details see [16]). The interface of the latter provides a general
interface for hierarchical schedulers. The bandwidth allocator assigns
queues upon classification.

  1) A hierarchical link scheduler
    Hierarchical schedulers require the ongoing manipulation and maintenance
of the scheduling hierarchy. For each output device governed by the
hierarchical scheduler, the scheduler is initialized, creating a root for
the scheduling hierarchy and a default FIFO queue. Our hierarchical link
scheduler interface adds to the SAPI the following seven primitives:

  interface hierarchical_scheduler {
    init ( device )
    get_root( device ) return node
    add( path/name, weight ) return internal-node
    add_leaf( path/name, weight, Qpolicy ) return leaf-node
    find( path/name ) return node
    remove( node )
    attach( queue [, weight ] )
  }

D. Other examples
    We have implemented an array of kernel level packet processors,
including a NAT processor (full header swap capability), dev/null for
discarding packets, restart to restart processing of a packet from the
arrival classifier, local delivery,

implementing the type management and kernel level instance mapping, plus a
user level library included by services. We have a number of modules
implementing classifiers and the various types of packet processors.
Currently, the user level library and the service level packet processors
only exist for C++.

    All kernel level implementation is done in kernel modules using C++. Our
kernel level implementations exploit the powerful abstractions of C++ to the
fullest: strong typing, inheritance, virtual functions, and run time type
checking. We are currently instrumenting the use of kernel level exceptions
to determine where their use is affordable. All packet processors implement
a common interface, defined in C++ as an abstract class of pure virtual
functions, from which all packet processors inherit. This interface is shown
in Figure 1. New types extend the SAPI by adding to this interface. For
kernel level packet processors, each type specific method exported as part
of the type specific service interface translates into a corresponding type
specific system call.

    class pproc_iface {
    public:
         pproc_iface() {}
         virtual ~pproc_iface() {}

         virtual int arrive(struct sk_buff *skb)                    =0;
         virtual int link_to( int pptype_id, int ppinst_id )        =0;
         virtual const char *type_name()                            =0;
         virtual int destroy()                                      =0;
    };

          Figure 1. The abstract packet processor interface.

    To enable us to keep using the latest available Linux kernel, we have
strived to keep the footprint on the kernel proper at an absolute minimum.
In particular, our implementation introduces only four proper system calls
into the kernel's global system call table, three for the PPROC system and
one for classifiers. No new entries are added to this table when packet
processor specific methods are introduced. In fact, doing so would require a
rebuild of the kernel. Instead, all type or instance specific functions
crossing the EE/OS boundary (i.e., system calls) are dispatched into the
implementing module, where the call is late bound to the appropriate method.
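As a rough illustration of the interface of Figure 1 and of the late binding
of type specific calls, the following user level sketch implements a
dev/null style packet processor and dispatches a type specific method
through a single generic entry point. It is not the Pronto kernel code: the
stub sk_buff, the classes devnull_pproc and pproc_table, and the method
call_dropped are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <memory>
#include <utility>

// Stand-in for the Linux kernel's sk_buff; only a length field is modelled.
struct sk_buff { unsigned len; };

// The abstract interface of Figure 1, repeated so the sketch compiles alone.
class pproc_iface {
public:
    pproc_iface() {}
    virtual ~pproc_iface() {}
    virtual int arrive(struct sk_buff *skb) = 0;
    virtual int link_to(int pptype_id, int ppinst_id) = 0;
    virtual const char *type_name() = 0;
    virtual int destroy() = 0;
};

// Hypothetical "dev/null" processor: discards every packet and counts them.
class devnull_pproc : public pproc_iface {
public:
    int arrive(struct sk_buff *) override { ++dropped_; return 0; }
    int link_to(int, int) override { return -1; }  // a sink has no successor
    const char *type_name() override { return "devnull"; }
    int destroy() override { dropped_ = 0; return 0; }
    // Type specific method, not part of the common interface.
    int dropped() const { return dropped_; }
private:
    int dropped_ = 0;
};

// Hypothetical instance table keyed by (type id, instance id).
class pproc_table {
public:
    void add(int type, int inst, std::unique_ptr<pproc_iface> pp) {
        table_[std::make_pair(type, inst)] = std::move(pp);
    }
    pproc_iface *find(int type, int inst) {
        auto it = table_.find(std::make_pair(type, inst));
        return it == table_.end() ? nullptr : it->second.get();
    }
    // Stands in for a type specific call entering through one generic entry
    // point: it is late bound to the method using run time type checking,
    // and fails with -1 if the instance is missing or of another type.
    int call_dropped(int type, int inst) {
        auto *dn = dynamic_cast<devnull_pproc *>(find(type, inst));
        return dn ? dn->dropped() : -1;
    }
private:
    std::map<std::pair<int, int>, std::unique_ptr<pproc_iface>> table_;
};
```

In this sketch, adding a new processor type only adds a class and a
dispatch case in the table; the generic entry point itself stays fixed,
which mirrors the way new type specific methods leave the kernel's system
call table untouched.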
    We have experimented with techniques to allow zero copy forwarding for
paths crossing into EEs. This is of interest to some of our ongoing work,
including implementing IPSEC at the services level. While sharing pools of
memory is rather simple, we are still experimenting with how to balance
flexibility and performance against protection and security.

A. The Pronto Router
    The Pronto active router architecture is depicted in Figure 2. It
consists of three major parts: a forwarding engine providing service generic
facilities, an execution environment where service specific functions reside
and execute, and the interface between the two. The most essential part of
the architecture is the separation of the service generic facilities of the
forwarding engine from the service specific functions running in an
execution environment. While this fundamental architecture has not changed
from earlier incarnations, a subtle but important difference lies in the
data-path, where packet processors subsume all functions other than
classification, and in how the different service models described in [3] are
realized using packet processors.

          Figure 2. The Pronto Router Architecture
          (labels recoverable from the figure: Environment manager,
          Active Services, Signaling, Other, CPU Sched, Packet Processors)

    The execution environment consists of a virtual machine and an
environment manager. The environment manager is responsible for
authenticating, installing and managing the programs executing in its
environment. In addition, the environment manager dispatches control
messages to programs in its environment. The execution environment we have
used for prototyping and experimentation of services is the native virtual
machine. However, the router can support other environments that have been
proposed. Moreover, the router can support multiple execution environments
simultaneously.

    Below the abstraction interface are the forwarding engine and other
service generic operating system and hardware facilities. The focus of this
paper is the extensibility of the facilities in the data-path. The data path
consists of two conceptual components, a flow classifier and packet
processors (see Figure 2). A simple packet path starting from the flow
classifier consists of an elementary forwarding processor, followed by an
output queue on the target output device. This mirrors the typical data path
in all routers: in traditional IPv4 routers the classifier classifies only
on destination prefix, the elementary forwarding processing is that of IPv4,
and the output queue is a simple FIFO queue. More recent commercial routers
support richer classification, the ability to configure multiple queues, and
scheduling among these queues. A key benefit of our new packet processor
abstraction is that it provides an integral framework to implement and
exploit arbitrary functionality without complicating the architecture of the
data-path or the service level program model.

    Outside of the data path, the Pronto router includes a service generic
signal processor and a resource manager. The first enhances and generalizes
current IP signaling to better support service specific signaling. The
resource manager allocates processors, memory and bandwidth, and ensures
resource isolation of the service specific programs.

B. Multicast
    As an application of Active and Programmable Networking, and to verify
the applicability of packet processors, we have implemented a new IP
multicast on the Pronto router using packet processors [17]. Our multicast
is a single source multicast similar to Express [18], identifying multicast
groups with the pair 〈source, group〉, and employing dynamic tunneling to
construct a virtual topology for the multicast distribution tree on a per
tree basis. This way the new multicast avoids all multicast specific
infrastructure, and in particular avoids multicast routing.

    Implementing this multicast on the Pronto platform demonstrates the
power and usability of packet processors. The Topology Management Protocol
is implemented as a service level daemon, avoiding all protocol complexity
or multicast semantics at the kernel level. A first join to the group 〈S,G〉
received at an intermediate router triggers a new path, and a new entry in
the classifier mapping 〈S,G〉 to the new path. If the join was sent by an
immediate neighbor router, the path consists of a destination specific basic
forwarding packet processor with flow pinning. Additional joins received on
different interfaces each trigger an additional branch added to the path
descriptor. If a join is received from a non-neighbor, the receiving router
creates a tunnel to the downstream system (router or receiver) using a
tunnel entry processor, adjusting the required head-room for the path. In
our experimentation with this new multicast we have implemented additional
packet processors to address practical problems encountered on the first and
last hop, including basic forwarding using a multicast MAC address, and a
UDP level tunnel processor. Using packet processors to realize multiple
branches, each consisting of different processors (forwarding, tunneling,
NAT, or several of these), has proven invaluable.

C. Other applications
    We have implemented a dynamic request driven firewall. In addition to
traditional firewall functions, an authorized user, presenting credentials
to that effect, can have the firewall open up for flows that would not be
accepted using the configured rule base. The firewall exploits the two layer
Pronto model to the fullest, implementing the rule base and making per flow
grant/deny decisions at the services level while employing the classifier
and NAT packet processors for enforcement.
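The split between services level decisions and data-path enforcement in the
request driven firewall can be sketched as follows. This is a simplified
user level model under stated assumptions, not the Pronto implementation:
the flow_key layout, the class request_firewall, and the use of a plain
string as a credential are all hypothetical, and the set of granted flows
merely stands in for entries installed in the kernel classifier.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <tuple>

// Hypothetical flow key; the classifier's real key format is not specified.
struct flow_key {
    std::uint32_t src, dst;
    std::uint16_t dport;
    bool operator<(const flow_key &o) const {
        return std::tie(src, dst, dport) < std::tie(o.src, o.dst, o.dport);
    }
};

class request_firewall {
public:
    // Configured rule base: destination ports accepted for every flow.
    void allow_port(std::uint16_t port) { open_ports_.insert(port); }

    // Register a credential that authorizes dynamic requests.
    void authorize(const std::string &credential) {
        credentials_.insert(credential);
    }

    // Services level decision: a holder of a valid credential may open one
    // specific flow that the rule base alone would reject.
    bool request_open(const std::string &credential, const flow_key &flow) {
        if (credentials_.count(credential) == 0) return false;
        granted_.insert(flow);  // stands in for adding a classifier entry
        return true;
    }

    // Withdraw a previously granted flow.
    void close(const flow_key &flow) { granted_.erase(flow); }

    // Enforcement check, standing in for the classifier lookup per packet.
    bool accepts(const flow_key &flow) const {
        return open_ports_.count(flow.dport) != 0 ||
               granted_.count(flow) != 0;
    }

private:
    std::set<std::uint16_t> open_ports_;
    std::set<std::string> credentials_;
    std::set<flow_key> granted_;
};
```

Only request_open and close touch the granted set, so the per packet check
stays a pure lookup; this is the property that lets the decision logic live
at the services level while enforcement remains in the data-path.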
          Figure 3. Latency in microseconds as a function of packet size
          (legend: Linux, Classifier, Classifier with flow).

          Figure 4. Throughput in Mbps as a function of packet size.
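The quantities plotted in Figures 3 and 4 are derived from raw counters:
cycle-counter timestamps for latency, and packet counts over time for
throughput. The helpers below are a minimal sketch of these unit
conversions, using the tick rate and the 64 byte results reported in
Section VII. The function names are hypothetical. Note that 53.5K packets
per second of 64 bytes each amounts to about 27.4 Mbps; the reported 33.4
Mbps is reproduced if 14 further bytes per packet are counted, which is an
assumption of this sketch and not something the measurements state.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Convert a difference of two cycle-counter timestamps to microseconds.
double cycles_to_usec(std::uint64_t cycles, double ticks_per_second) {
    return static_cast<double>(cycles) * 1e6 / ticks_per_second;
}

// Convert a packet rate to Mbps for a given number of bytes per packet.
double pps_to_mbps(double packets_per_second, unsigned bytes_per_packet) {
    return packets_per_second * bytes_per_packet * 8.0 / 1e6;
}

// Relative excess of `measured` over `baseline`, in percent.
double percent_over(double baseline, double measured) {
    return (measured - baseline) / baseline * 100.0;
}
```

For example, percent_over(20.72, 20.94) evaluates to roughly 1.06, which
matches the latency difference of about 1% quoted for 64 byte packets.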

VII. INSTRUMENTATION

    Although optimizing performance is not the highest priority in our
research, our approach departs in significant ways from traditional router
architectures, even for software-based routers. We have instrumented our
implementation of the router data-path using packet processors to verify
that the enhanced extensibility and flexibility of packet processors does
not impede forwarding performance. We report on two measurements:
throughput and latency. For each of these we measure the Linux forwarding
performance as a baseline; we then instrument the Pronto router, first with
the kernel patch, the classifiers and the packet processor system installed
but not active, and then with data forwarded through a path constructed
from a basic forwarding packet processor.

    The measurement setup is the following. The router is constructed using
Linux 2.4.17 and the Pronto platform. The Pronto router is a single CPU Dell
PC with a 1GHz Pentium processor using 10/100 Mbps 3Com network interfaces
and their standard Linux driver (3c59x.c). The end systems in these tests
use the same hardware. The systems are configured in tandem, with the router
in the middle.

    Latency is measured on a lightly loaded router (about 10 thousand
packets per second). To instrument latency we augment the skbuff packet
control block of Linux to fit five 64 bit timestamps. Time is measured in
clock-cycles, with our processor clocking 996,784,000 ticks per second. We
measure the latency at five points in the data-path: at first touch in the
device driver, at queuing out of the device driver, at entry to ip_recv
(when processing resumes after interrupt handling), at queuing at the output
device, and immediately after writing the packet on the wire on the output
device.

    Figure 3 shows the total time taken to traverse the data path from the
time the packet is first touched until it has been written out on the wire.
For the very smallest packets latency increases until the DMA threshold is
reached and the IO starts to dominate the latency. For packets larger than
200 bytes the latency increases linearly, with vanilla Linux having the
bottom curve and Pronto using packet processors on top. However, this
difference is small. For the smallest packets (64 bytes) the latency for
vanilla Linux is 20.72 microseconds, whereas for Pronto with packet
processors it is measured at 20.94 microseconds, a difference of just about
1%. For packets of other sizes the difference is similar. We conclude that
this difference is negligible, is within the range we expect to reach with
minor optimizations, and is in any case acceptable for the benefits provided
by packet processors. As shown in [12], traditional trickeries of polling
vs. interrupts will yield more dramatic performance effects.

    We measure the throughput of the Pronto router using two metrics:
packets per second, and bits per second. The former metric peaks with
minimum packet size, and is thus limited by the CPU. The latter is limited
by network bandwidth and peaks just below the MTU of 1500 bytes. For each
packet size we measure the three configurations. For our throughput
measurements, packets are generated at a source, sent through the router,
and discarded at the sink. Each measurement batch sends one hundred thousand
packets. Time is measured at the router from the time the first packet is
received until the last packet is forwarded. As we approach the throughput
limit packet losses begin. We require less than one in a thousand packets
lost for a valid throughput measurement.

    Figure 4 shows the throughput in Mbps for various packet lengths. This
figure verifies that the Pronto platform with packet processors has
negligible impact on router performance compared to using Linux unchanged.
The three lines cannot be told apart. The difference of 1% observed with
latency is insufficient to separate the throughput curves. For 64 byte
packets the maximum packet rate is reached at 53.5K packets per second,
corresponding to 33.4Mbps. Throughput increases rapidly until flattening out
as the link saturates. We achieve performance above 96Mbps with only 256
byte packets, and 99Mbps at 1024 bytes. We conclude that even with moderate
packet sizes we can afford some additional processing without impeding
performance. We also conclude that the Pronto platform performs practically
identically to Linux without it.

VIII. SUMMARY
    Packet processors are a new abstraction for extending the OS level
functionality of routers and for extending the programmable interface
correspondingly in a type safe manner. Packet processors are implemented as
a strongly typed object system, providing a simple extensible programmable
interface to service
programmers. Paths of packet processors can be segmented into separately
schedulable segments, and thus extend across the EE/OS boundary and across
multiple processors, allowing hardware vendors a consistent framework to
exploit proprietary hardware capabilities.

    We have implemented services exploiting the capabilities of packet
processors, validating the power and practicality of the packet processor
abstraction and the corresponding router architecture.

    Our measurements demonstrate that the added expressive power and
flexibility has negligible impact on router performance.

REFERENCES
[1] Larry Peterson, Yitzchak Gottlieb, Mike Hibler, Patrick Tullmann, Jay
Lepreau, Stephen Schwab, Hrishikesh Dandekar, Andrew Purtell, and John
Hartman, "A NodeOS Interface for Active Networks," IEEE JSAC, March 2001.
[2] S. Merugu, S. Bhattacharjee, E. Zegura and K. Calvert, "Bowman: A Node
OS for Active Networks," Proceedings of IEEE Infocom 2000, Tel Aviv, Israel,
March 2000.
[3] Gísli Hjálmtýsson, "The Pronto Platform - A Flexible Toolkit for
Programming Networks using a Commodity Operating System," in the Proceedings
of OpenArch 2000, Tel Aviv, Israel, March 2000.
[4] J.E. van der Merwe, S. Rooney, I.M. Leslie and S.A. Crosby, "The Tempest
- A Practical Framework for Network Programmability," IEEE Network, Vol. 12,
No. 3, May/June 1998, pp. 20-28.
[5] Gísli Hjálmtýsson, Jennifer Yates, Sid Chaudhuri and Albert Greenberg,
"Smart Routers - Simple Optics: An Architecture for the Optical Internet,"
IEEE/OSA Journal of Lightwave Technology, December 2000.
[6] D. Wetherall, J. Guttag, and D. L. Tennenhouse, "ANTS: A Toolkit for
Building and Dynamically Deploying Network Protocols," in the Proceedings of
OpenArch 1998, San Francisco, CA, April 1998.
[7] D. Scott Alexander, Marianne Shaw, Scott Nettles and Jonathan M. Smith,
"Active Bridging," in the Proceedings of Sigcomm 1997, Cannes, France,
September 1997.
[8] Michael Hicks, Angelos D. Keromytis, and Jonathan M. Smith, "A Secure
PLAN (Extended Version)," in Proceedings, DARPA Active Networks Conference
and Exposition, IEEE Computer Society Press, San Francisco, CA (2002),
pp. 224-237.
[9] D. Decasper, Z. Dittia, G. Parulkar, B. Plattner, "Router Plugins: A
Software Architecture for Next Generation Routers," in Proceedings of the
ACM SIGCOMM '98, pp. 229-240, September
[10] Larry Peterson, Scott Karlin, and Kai Li, "OS Support for
General-Purpose Routers," HotOS Workshop, March 1999.
[11] Scott Karlin and Larry Peterson, "VERA: An Architecture for Extensible
Routers," Computer Networks, Vol. 38, Issue 3, February
[12] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek, "The
Click modular router," ACM Transactions on Computer Systems, 18(3),
pp. 263-297, August 2000.
[13] Robert Gray and Gísli Hjálmtýsson, "Dynamic C++ classes - A Lightweight
mechanism to update code in a running program," in Proceedings of the USENIX
Annual Technical Conference, pp. 65-76, June 1998.
[14] Intel Corporation, "Internet Exchange Architecture Network Processors:
Flexible, Wire-Speed Processing from the Customer Premises to the Network
Core," White Paper, 2002.
[15] Sally Floyd and Van Jacobson, "Link-sharing and Resource Management
Models for Packet Networks," IEEE/ACM Transactions on Networking, Vol. 3,
No. 4, 1995.
[16] P. Goyal, H.M. Vin, and H. Cheng, "Start-time Fair Queuing: A Scheduling
Algorithm for Integrated Services Packet Switching Networks," IEEE/ACM
Transactions on Networking, Vol. 5, No. 5, pp. 690-704, October 1997.
[17] Gísli Hjálmtýsson,, 2002
[18] Hugh W. Holbrook and David R. Cheriton, "IP Multicast Channels: EXPRESS
Support for Large-Scale Single-Source Applications," in the Proceedings of
ACM SIGCOMM '99, Cambridge, Massachusetts, September 1999.