                 The Implementation of a Differentiated Services
                       Architecture on Network Processors
                                         Tyrell Sassen, Neco Ventura, Sven Shepstone
                                 Department of Electrical Engineering, University of Cape Town,
                                            Rondebosch, Cape Town, South Africa

                                                 {tsassen, neco, sven}

Abstract— For IP to become a truly unified network protocol, it
needs to provide some mechanism for guaranteeing QoS; the
Differentiated Services (DS) architecture is one such mechanism.
DS works by classifying packets at the network edge and then
using this classification to specify the per-hop behaviour of the
packet in the network core. This means that the edge routers are
required to perform complex packet processing at high speeds.
Network Processors (NPs) are discussed as a means for
implementing this.
   NPs are software programmable devices specifically designed
to process packets at line speeds. This architecture provides a
good platform to implement a DS-enabled edge router. Several
implementation designs are described, along with their benefits
and limitations.
   It is also noted that packet classification algorithms require
large amounts of memory, which is seen to be a problem.

  Index Terms—Differentiated Services, Quality of Service,
Network Processors.

   The authors would like to thank Telkom SA, Siemens, the National
Research Foundation (NRF) and the Department of Trade and Industry (DTI)
for supporting this research project.

                         I. INTRODUCTION

   WHEN computer networks were first introduced, a variety
of network protocols were introduced with them. These
protocols were designed with specific services in mind, and
were tailored to meet the different needs of these services.
Some of these protocols include the Internet Protocol (IP),
which was originally used to connect UNIX hosts to the
Internet, the Internetwork Packet Exchange (IPX) protocol, a
datagram protocol developed for use in the Novell NetWare
operating systems, and Systems Network Architecture (SNA),
a proprietary protocol developed by IBM and used by IBM-
compatible mainframes.
   In the last few years, we have seen a move away from this
protocol and service separation and towards a universal
network protocol that can connect a variety of devices and
support a variety of services. IP has emerged as the clear favourite
for this task, due largely to the huge popularity of the Internet.
Today, it is not uncommon to see IP networks being used for a
multitude of purposes, from transferring emails and files to
more real-time applications such as voice and video
transmission. Whereas in the past many companies would
typically have a TDM voice network, an ISDN link for video
conferencing and an IP network for transmitting data, it is now
becoming possible to provide all these services using a unified
IP network [1].

            II. PROVIDING A GUARANTEED LEVEL OF SERVICE

   Even though it is possible to use IP for all these services, the
protocol was not designed to support real-time applications. IP
was designed as a connectionless protocol and therefore does
not provide any form of guaranteed bandwidth. IP networks
are said to provide ‘best-effort’ data delivery [2]. Best-effort
protocols do not provide full reliability between network end-
points; this is done purposefully, to move the complexity of the
protocol to the end hosts and allow the network nodes to
remain relatively simple. Although this class of protocol makes
the core network scalable and easier to implement, it is limited
in that it does not provide any differentiation between traffic
on the network, which means that all packets on the network
are given the same level of service regardless of their needs.

  A. Service Level Measures
   A number of parameters can affect the service level of a
traffic stream: network availability, bandwidth, delay, jitter,
and loss [1].
   Network availability can have a significant effect on the
service level; if the network is unavailable, even for a brief
period of time, it would mean unpredictable performance for
the current services.
   The second most significant factor in providing a good
service level is bandwidth; this parameter is especially
applicable to real-time applications. Real-time services
generally require a guaranteed minimum bandwidth to function
properly. If this bandwidth is not met, the result is
substantial service-quality degradation.
   Delay is defined as the transit time of a particular packet
from network ingress point to network egress point. Most
applications are designed to compensate for small amounts of
delay. There are two types of delay: fixed delays and variable
delays. Fixed delays are caused by application delays, such as
packet processing and encoding, and propagation delay based
on the transmission distance. Variable delays are caused when
packets are queued at network nodes.
   Jitter is the variation in the delay between consecutive
packets. This has a very pronounced effect on real-time
applications, as these applications expect to receive packets at
a fairly constant rate, with a fairly constant delay between
consecutive packets.
   Loss occurs when packets are lost due to physical errors in
the transmission medium, or are dropped due to congestion in
network nodes.
   When these parameters are outside the expected limits, an
application will receive a diminished overall level of service
when transmitting data over the network. This is generally
perceived as a loss of quality in real-time voice/video
applications and as longer download and response times in
non-real-time applications.
   All these parameters contribute to the quality of the service
provided by the network, and are commonly referred to
collectively as Quality of Service or QoS. The properties
identified above are part of the reason that IP is not
particularly well suited for handling certain real-time
applications (e.g. VoIP, video conferencing and streaming
video-on-demand). It therefore becomes necessary to try and
rectify some of these problems, which is why a large amount of
research has gone into providing QoS over IP networks.
   To try and solve this issue an IETF working group
(Diffserv) [3] is currently drafting definitions and a framework
to provide a Differentiated Services architecture for IP. This
will effectively allow different levels of service over a shared
IP network.

  B. Overview of Differentiated Services
   Differentiated Services (DS) provides a simple mechanism
for classifying different services [4]. This is done by assigning
each IP packet to a particular service class according to a
predetermined policy; this service class is identified by the DS
codepoint (DSCP). The DS codepoint is stored in the DS field
of the IP header [5] and will be used in the network core to
determine the Per-Hop Behaviour (PHB) at each DS-enabled
network node. Per-Hop Behaviours provide a means of sharing
buffers and available bandwidth between different traffic
streams in each node. This provides a granular means of
prioritising different network traffic.
   There are currently two standard PHBs defined for DS:
Expedited Forwarding (EF) [8] and Assured Forwarding (AF)
[7].
     1) Expedited Forwarding
   EF is defined by a single DS codepoint and can be used to
provide a low latency, low jitter, assured bandwidth end-to-
end service. This service will appear to the end points as a
point-to-point connection or virtual leased-line.
     2) Assured Forwarding
   The AF PHB is defined by four classes with three levels of
drop precedence per class. Each AF class is assigned a specific
amount of resources (buffer space and bandwidth) in each DS-
enabled node. This results in each class having a specific loss
probability based on the rate at which packets of a particular
class are being forwarded. The more resources assigned to a
particular class, the lower the loss probability of that class.
When the traffic in a particular class starts exceeding its
profile (the temporal properties of a traffic stream such as rate
and burst size), the network node must then start selecting
packets to drop. This is done using the drop precedence
parameter of the particular class. Packets with a higher drop
precedence will be dropped first to try and keep the stream in
profile.

  C. Overview of Differentiated Services Edge-Router
   Part of the design philosophy behind DS was to keep the
complexity at the edge of the network. This is beneficial in two
ways. Firstly, it allows DS to be relatively cheap to implement,
as most of the complex equipment will only need to be added
on the edge of the network. Secondly, since the edge of the
network generally operates at a lower line speed than the core
network, DS can perform most of the packet processing here
without sacrificing the speed of the core network.
   The DS-enabled network ingress nodes¹ must therefore
provide a mechanism for classifying and marking data with a
DSCP and making sure that traffic does not exceed the profile
for the particular node. We can see a block diagram describing
the flow of packets through the node in Figure 1.

   ¹ The terms ingress nodes and edge routers are used interchangeably in
this document, since it is assumed that all ingress points to the network
will be DS-enabled edge routers.

   [Figure 1: block diagram; recoverable labels: Classifier, Marker, Meter.]
Figure 1. Logical View of a Packet Classifier and Traffic Conditioner [4].

   The node will firstly classify the incoming packet based on
particular fields in the packet headers. This classification can
be done using fields from any of the packet layers (including
the IP, TCP and application layers) and is done using
predefined, customisable policies on the node.
   Once the packets have been classified they will be marked
with a particular DS codepoint, which defines the class of
traffic. The marker is responsible for setting this DSCP.
   As packets are forwarded through the node, the meter will
measure the temporal properties of the traffic stream. These
measures will be used to ensure that the traffic remains in
profile and, if necessary, to signal the node to shape or drop
the stream.
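
   As an illustration of how a meter might decide whether a stream is in
profile, the following minimal Python sketch uses a token bucket, a common
metering technique that this paper does not itself prescribe. The class
name TokenBucketMeter, the byte-based units and the example rate and burst
values are assumptions made purely for illustration; a real NP would
implement something comparable in PPE microcode or dedicated hardware.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucketMeter:
    """Hypothetical single-rate meter: the profile is (rate, burst size)."""

    rate_bytes_per_s: float   # long-term rate allowed by the profile
    burst_bytes: float        # bucket depth, i.e. the largest allowed burst
    tokens: float = field(init=False)
    last_time: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Start with a full bucket so an initial burst is permitted.
        self.tokens = self.burst_bytes

    def in_profile(self, now: float, packet_len: int) -> bool:
        """Return True if a packet of packet_len bytes arriving at time
        `now` (seconds) keeps the stream within its profile."""
        elapsed = now - self.last_time
        self.last_time = now
        # Refill tokens for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if packet_len <= self.tokens:
            self.tokens -= packet_len
            return True
        # Out of profile: the shaper/dropper would delay or drop this packet.
        return False


# Assumed example profile: 1000 bytes/s with a 1500-byte burst allowance.
meter = TokenBucketMeter(rate_bytes_per_s=1000, burst_bytes=1500)
results = [
    meter.in_profile(0.0, 1000),   # fits in the initial burst
    meter.in_profile(0.0, 1000),   # only 500 tokens remain
    meter.in_profile(0.5, 1000),   # 500 tokens refilled after 0.5 s
    meter.in_profile(0.5, 1),      # bucket is empty again
]
```

   In this sketch the meter only reports whether a packet is in profile;
the decision to shape or drop is deliberately left to a separate
shaper/dropper stage, mirroring the subsystem split described in the text.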
   Based on information received from the meter, the
shaper/dropper will then delay or drop packets (paying
attention to the drop precedence of the currently queued
packets) to ensure that the network traffic does not exceed the
limits placed on it. This is done to alter the temporal properties
of the stream and keep the traffic in profile.
   Once the packet has been passed through the network
ingress node and properly marked, it can be forwarded through
the network. This can be done at high speeds since core nodes
need only examine the DSCP to determine the PHB required
for that particular packet.

   As core network speeds increase, the same process is
happening at the network edges. These advances mean that
edge routers must continue to operate at higher and higher line
speeds [13]. Add to this the significant processing power
required to classify and shape traffic for DS-enabled networks
and we can see that not only do our edge routers have to
operate at high speeds, but they must also provide a great deal
of flexibility and complexity.
   For a long time, the high-speed realm of networks has
remained the domain of Application Specific Integrated
Circuits (ASICs) and other hard-wired solutions [10]. These
solutions sacrifice programmability and flexibility to achieve
high line speeds. General-Purpose Processors (GPPs), on the
other hand, have provided a high level of programmability and
customisation because most of the functionality is
implemented in software. However, this comes at the cost of
low line speeds, as software generally runs slower than
hardwired solutions.
   QoS protocols like DS are relatively complex to implement
and require a high level of customisability. In addition, these
protocols are still very much in a development stage, and
implementations will need a large degree of flexibility if they
are to keep pace as the protocol evolves. Since DS edge routers
will also be required to handle large amounts of traffic at
possibly rather high network speeds, it becomes necessary to
find a solution that can provide the best of both worlds,
providing a certain degree of programmability and flexibility,
while still being able to function at high line speeds.
   Luckily, the last couple of years have seen a vast amount of
research and development going into Network Processors
(NPs). Network Processors are software programmable
devices that are designed specifically to process network
packets at high line speeds. This is accomplished using a
variety of different mechanisms, which will be described later.
   NPs provide an attractive solution to building network
devices. Along with their high speeds and flexibility, they offer
the possibility to cut down dramatically on development time,
because designing software is much faster and cheaper than
designing hardware with similar functionality. All these factors
have led to a number of companies developing their own
network processors; some of these companies include Intel,
IBM, Cisco and Motorola [11].

  A. Overview of Network Processor Architecture
   Although there are many different layouts and architectures
for NPs [11], in Figure 2 we can see a typical layout. The
whole system consists of a series of network interfaces, a
controlling processor, memory, function specific hardware and
a set of programmable processing engines.

   [Figure 2: block diagram; recoverable labels: Function Specific
Hardware, PPE (several), MAC, Memory.]
Figure 2. A Block Diagram Showing the Simplified Layout of a Typical
Network Processor.

     1) Packet Flow Through The System
   The movement of an incoming packet through the system is
described as follows:
   The NP will connect to the network using a set of standard
Media Access Controller (MAC) chips implementing a
number of network ports. These ports will be used for all
incoming and outgoing packets passing through the network
processor. Once packets are received they will be queued for
processing.
   Most NPs take advantage of the parallel nature of packet
processing by implementing a number of individual
programmable processing engines (PPEs) that function in
parallel. Each incoming packet is then assigned to an
individual PPE; this allows multiple packets to be processed
simultaneously and at high speeds.
   After the packets have been processed they are sent to a
particular output port and queued there until they can be sent
over the network.
     2) Controlling Processor and other Hardware
   This whole process is controlled by the Controlling
Processor. This is usually a GPP, which is responsible for
initialising the system and dealing with exceptional cases that
cannot be handled by the PPEs.
   These NPs also usually contain function specific hardware
specifically designed to provide the specialised functionality
required in packet processing; these can include packet
matching, table lookup, data manipulation and queue
management [10].
   Although not as fast as ASICs and not as versatile as GPPs,
network processors provide a good middle ground between the
two extremes, combining some of the best features of both. The
relationship between flexibility and performance for the
different solutions can be seen in Figure 3.

   [Figure 3: plot of performance against flexibility; recoverable
label: GPP.]
Figure 3. Performance vs. Flexibility for General Purpose Processors (GPP),
Network Processors (NP) and Application Specific Integrated Circuits
(ASIC).

 IV. IMPLEMENTATION OF DIFFERENTIATED SERVICES IN THE
                NETWORK PROCESSOR
   The parallel architecture of the NP provides us with three
main methods for implementing a DS-enabled edge router:
single stage processing, multistage pipeline processing and
forked pipeline processing.

  A. Single Stage Processing
   The first method is to implement all DS subsystems, namely
the classifier, the marker, the meter and the shaper/dropper, in
each PPE. Therefore if we have n PPEs in our NP, we will
have n packet processing engines working in parallel. The
shortcoming of this method is that implementing all
functionality in each PPE will require a large amount of
processing time for each packet and could result in periods of
low bandwidth utilisation, followed by short bursts of high rate
data transmission. An example layout is shown in Figure 4a.

  B. Multistage Pipeline Processing
   The second method is to implement the DS subsystems
using a multistage pipeline. This can be done by implementing
each DS subsystem in a different PPE, and then allowing a
packet to move through each PPE as it moves through the DS
system. This method may allow the router to have a higher
average throughput, once the pipeline stages have been filled.
It may also be possible to implement more than one of these
pipelines working in parallel, depending on the number of
available PPEs. Figure 4b shows an example layout for eight
PPEs.

  C. Forked Pipeline Processing
   The third, and probably most practical, solution would be to
assign PPEs to DS subsystems based on the complexity and
amount of processing power needed by each subsystem. This
would create a forked pipeline, and allow the router to use its
resources efficiently, without too many wasted clock
cycles. One pitfall of this method is that it requires some kind
of queuing mechanism at the forking stages of the pipeline;
this queuing could create unacceptable delays in the system.
An example layout can be seen in Figure 4c.

   [Figure 4: block layouts of PPEs for panels (a), (b) and (c).
Legend: C: Classifier, Me: Meter, M: Marker, S: Shaper/Dropper.]
Figure 4. (a) Example block layout for Single Stage Processing. (b) Example
block layout for Multistage Processing. (c) Example block layout for Forked
Pipeline Processing.

   It is also possible that the PPEs will not provide enough
functionality to implement one or more of the DS subsystems;
it would then become necessary to implement these
subsystems in the controlling processor.
   Ultimately the implementation will depend largely on the
architecture and functionality of the NP used. It is therefore
necessary to delay any final design until this has been decided.

  V. PROBLEMS WITH NETWORK PROCESSORS AND PACKET
                   CLASSIFICATION
   Although NPs seem to be well suited to this task, this is not
to say that NPs are without their own problems. According to
[12], one of the key problems in NPs is the lack of memory.
NPs are generally designed to have access to relatively small
amounts of memory. This is due largely to the fact that in most
NPs the time needed to access memory can often be a
bottleneck in the system. This is especially a problem when
trying to implement packet classification mechanisms, like
those found in DS, which often require packet headers to be
compared against tables of predefined rules to try and find a
matching classification. These tables are often very large and
need to be stored in memory, where they can be accessed often
and quickly.
   This problem can be addressed to a large extent by using
different packet classification algorithms that don’t use large
amounts of memory. Further research needs to be done into this.
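
   To make the rule-table comparison concrete, the following minimal
Python sketch classifies a packet by scanning an ordered rule list and
returning the DSCP of the first match. It is an illustration under stated
assumptions, not the classification scheme of any particular NP: the Rule
fields, the example rules and the function names af_dscp and classify are
hypothetical. The codepoint values follow the recommended encodings in
RFC 2597 [7] (AF class x with drop precedence y maps to 8x + 2y) and
RFC 3246 [8] (EF is 46).

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional

EF_DSCP = 46          # recommended EF codepoint, binary 101110
DEFAULT_DSCP = 0      # best-effort traffic


def af_dscp(af_class: int, drop_precedence: int) -> int:
    """Recommended DSCP for AF class 1..4 and drop precedence 1..3."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF class must be 1..4, drop precedence 1..3")
    return 8 * af_class + 2 * drop_precedence


@dataclass(frozen=True)
class Rule:
    """Hypothetical classification rule; None means 'match anything'."""
    src_net: str
    protocol: Optional[int]
    dst_port: Optional[int]
    dscp: int


def classify(rules: List[Rule], src_ip: str, protocol: int,
             dst_port: int) -> int:
    """Linear search: return the DSCP of the first matching rule."""
    addr = ip_address(src_ip)
    for rule in rules:
        if addr not in ip_network(rule.src_net):
            continue
        if rule.protocol is not None and rule.protocol != protocol:
            continue
        if rule.dst_port is not None and rule.dst_port != dst_port:
            continue
        return rule.dscp
    return DEFAULT_DSCP


# Assumed example policy (protocol 17 = UDP, 6 = TCP).
RULES = [
    Rule("10.0.0.0/8", 17, 5060, EF_DSCP),      # voice signalling -> EF
    Rule("0.0.0.0/0", 6, 80, af_dscp(1, 1)),    # web traffic -> AF11
]
```

   A linear scan like this keeps memory use proportional to the number of
rules, but the lookup time also grows with the table size; faster schemes
typically trade extra memory for speed, which is the tension described
above.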

                            VI. CONCLUSION
   Despite the problems identified, it would seem that NPs
provide a good platform to implement DS-enabled nodes that
will not only be useful now but will hopefully offer the
possibility of being easily extendable in this ever evolving area
of research. There is still a lot of research that needs to be
done into the exact implementation of the system, but the
initial groundwork looks promising.

                              REFERENCES
[1]    Nortel Networks, “An Introduction to Quality of Service (QoS)”,
       l/56058.25_022403.pdf, February 2004.
[2]    Yee’s Homepage, Quality of Service, Introduction, July 2003.
[3]    Differentiated Services (diffserv) Charter, September
[4]    Carlson, M., Weiss, W., Blake, S., Wang, Z., Black, D. and E. Davies,
       “An Architecture for Differentiated Services”, RFC 2475, December
       1998.
[5]    Nichols, K., Blake, S., Baker, F. and D. Black, “Definition of the
       Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers”,
       RFC 2474, December 1998.
[6]    Yee’s Homepage, Quality of Service, DiffServ, July 2003.
[7]    Heinanen, J., Baker, F., Weiss, W. and J. Wroclawski, “Assured
       Forwarding PHB Group”, RFC 2597, June 1999.
[8]    Davie, B., Charny, A., Bennett, J.C.R., Benson, K., Le Boudec, J.Y.,
       Courtney, W., Davari, S., Firoiu, V. and D. Stiliadis, “An Expedited
       Forwarding PHB (Per-Hop Behavior)”, RFC 3246, March 2002.
[9]    T. Spalink, S. Karlin, L. Peterson, Y. Gottlieb, “Building a Robust
       Software-Based Router Using Network Processors”, ACM SIGOPS
       Operating Systems Review , Proceedings of the eighteenth ACM
       symposium on Operating systems principles, Volume 35 Issue 5, pages
       216-229, October 2001.
[10]   A. Heppel, “An Introduction to Network Processors”, Roke Manor
       Research Ltd. Roke Manor, Romsey, Hants, SO51, 0ZN, UK, January
[11]   N. Shah, “Understanding Network Processors”, Master's Thesis, Dept.
       of Electrical Engineering and Computer Science, Univ. of California,
       Berkeley. June 2001.
[12]   M.E. Kounavis, A. Kumar, H. Vin, R. Yavatkar and A.T. Campbell,
       “Directions in Packet Classification for Network Processors”, Second
       Workshop on Network Processors (NP2), Anaheim, California,
       February 2003.
[13]   D. Comer, “An Overview of Network Processors”, Computer Science
       Department, Purdue University, January 2004.
