
             Investigating occurrence of duplicate updates in BGP announcements

   Jong Han Park∗ (jpark@cs.ucla.edu), Dan Jen∗ (jenster@cs.ucla.edu), Mohit Lad† (mohit.lad@nokia.com),
   Shane Amante‡ (Shane.Amante@level3.com), Danny McPherson§ (danny@tcb.net), Lixia Zhang∗ (lixia@cs.ucla.edu)

∗ University of California, Los Angeles.
† Nokia
‡ Level-3 Communications Inc.
§ Arbor Networks
0 We would like our paper to be considered for the best paper.

ABSTRACT
BGP is a hard-state protocol that uses TCP connections to reliably exchange routing state
updates between neighbor BGP routers. According to the protocol, only routing changes should
trigger a BGP router to generate updates; updates that do not express any routing changes are
superfluous and should not occur. Nonetheless, such ‘duplicate’ BGP updates have been observed
in reports as early as 1998 and as recently as 2007. To date, no quantitative measurement has
been conducted on how many of these duplicates get sent, who is sending them, when they are
observed, what impact they have on the global health of the Internet, or why these ‘duplicate’
updates are even being generated. In this paper, we address all of the above through a
systematic assessment of BGP duplicate updates. We first show that duplicates have a negative
impact on reachability and router processing loads. We then reveal that there is a significant
number of duplicates on the Internet: about 13% of all BGP routing updates are duplicates.
Finally, through a detailed investigation of duplicate properties, we discover the major cause
behind the generation of pathological duplicate BGP updates.

1.   INTRODUCTION
   BGP is the de facto standard inter-domain routing protocol used to exchange destination
reachability information on the Internet. BGP was designed as a hard-state protocol, so all
BGP updates sent by a router should always communicate some change or addition to the most
current routing information reported by the router [15]. However, actual observations of BGP
dynamics reveal that routers occasionally send BGP updates that convey absolutely no change
to the most current routing information reported by the router. In fact, there are many cases
where routers send exact copies of the most recent update previously sent. To date, there has
been no explanation as to why these ‘duplicate’ routing updates occur in BGP today.
   The existence of duplicate updates in BGP was first reported in 1998. The seminal work of
Labovitz et al. [5] on BGP measurements showed that the actual number of BGP updates observed
was an order of magnitude more than expected. Labovitz revealed that a large portion of the
total updates were in fact duplicates, and he attributed this to problems with routers from
specific vendors. The industry quickly responded with a software fix to address the duplicate
generation problem, and it was believed that the fix would eliminate the duplicate pathology
observed in [5]. However, in 2007 Li et al. [7] re-examined the health of BGP dynamics and
discovered that, despite industry attempts to stop duplicate generation, duplicates were still
seen in BGP. To date, nobody has been able to determine the cause of these duplicates. There
have also never been any reports on the effects, if any, that duplicates have on Internet
health.
   In this paper, we make the following contributions.
   • We attempt to provide a better understanding of the nature of duplicate generation by
     quantifying the amount of duplicate updates from different points on the Internet. We
     also look at duplicates from different moments in time.
   • We reveal the impact of duplicates on Internet health. Contrary to the common belief that
     duplicates are relatively benign, we show that they actually negatively impact
     reachability and router processing load.
   • As part of our work towards understanding duplicates, we provide a methodology for
     mapping eBGP updates to their corresponding iBGP updates. This task is non-trivial, since
     timing differences can exist between corresponding eBGP and iBGP updates. We believe that
     our methodology can be useful toward future studies that require a mapping of eBGP to
     iBGP updates, or vice versa.
   • Using our observations of duplicate behavior, we manage to finally determine the exact
     cause behind duplicate generation.

   This paper is organized as follows. Section 2 briefly goes over some aspects of BGP and
routing that are relevant to


our investigation of duplicate generation. Section 3 explains the negative impact that
duplicates have on reachability and router processing loads. Section 4 takes a thorough look
at duplicate generation across time and space. We look into the quantity of duplicates as well
as patterns of duplicate generation. In Section 5, we reveal the cause of duplicate update
generation, along with supporting evidence. In Section 6, we discuss the ramifications of our
observations and discoveries, and present data suggesting that duplicates are not the only
superfluous BGP updates being propagated throughout the Internet.

2.     BACKGROUND
   In this section, we review some routing details that are particularly relevant to our study
of duplicates. Specifically, we discuss the definition of ‘duplicate updates’, BGP session
establishment/resets, BGP peering topologies, and RouteViews/RIPE monitors. Readers with a
strong background in routing may skip this section and refer to it as needed.

2.1      Definition of Duplicates
   A BGP update for prefix p sent by router r is a ‘duplicate’ if and only if all attributes in
the update are the same as the most recent previous update for prefix p sent by router r, and
both the update and the previous update belong to the same BGP session. Since an update and
its duplicate may be sent at different times, the timestamps for a BGP update and its
duplicate may be different. A router might send other BGP updates regarding other prefixes
between an update and its duplicate. For example, a router r might send an update for prefix
p, then send updates for prefixes a, b, and c before finally sending a duplicate update for
prefix p. This last update is still considered a duplicate. If an update for prefix p is sent
that is identical to an update for prefix p from a previous BGP session, we do not consider
the update a duplicate. An update and its duplicate must belong to the same BGP session.

2.2      BGP Session Establishments/Resets
   When a BGP session is established between two routers, the routers first exchange their
entire routing state information via a series of BGP updates. This is referred to as ‘full
table exchange’. Once the full table exchange is completed, each router sends updates only
when there is a change in its routing state (e.g. change in topology or policy). If the BGP
session between the routers gets reset, the routers need to go through the full table exchange
again. For all of the measurements presented in this paper, we carefully ensure that updates
and their duplicates belong to the same BGP session. The full table exchanges allow us to
differentiate between sessions. Section 4 provides further details on how we differentiate
between sessions.

2.3      BGP Peering Topologies
   Today, BGP is used for both inter-domain routing (eBGP), as well as intra-domain routing
(iBGP). Here we briefly describe the common peering topologies for both inter- and
intra-domain routing.

External BGP: When BGP is used to convey reachability information between two routers that
reside in different domains (inter-domain routing), the session between these two routers is
called an eBGP session. The routing information in each update is conveyed in the form of BGP
attributes. The attributes most relevant to this paper are AS-path, MED, and Community.
AS-path is particularly important because it is used to prevent update messages from looping.
If a router receives an update from an eBGP peer containing its own AS in the AS-path
attribute, it discards the update to avoid creating an update loop. Otherwise, it prepends its
own AS number to the AS-path attribute value before sending the update to its other eBGP
peers.

Internal BGP: iBGP is used to convey reachability information within a domain. For
inter-domain routing, full-mesh peering is not necessary since AS-path can be used to prevent
update loops. However, all routers involved in intra-domain routing belong to the same AS, so
AS-path attribute values cannot be used to avoid update loops. Therefore, iBGP requires all
peers to be reached within one hop in a full-mesh topology as shown in Figure 1(a). In
practice, this approach is not scalable and too expensive to manage. This leads to the use of
route reflectors (RR) [1] and AS confederations [16], which relax the full-mesh requirement.

Route Reflector: Route reflector architectures consist of a route reflector server (RR server)
and route reflector clients (RR client). Under an RR architecture, non-RR iBGP routers connect
to a route reflector server. The non-RR clients send updates to their RR server, and the RR
server will reflect this route to all of its other clients. Figure 1(b) illustrates an example
of an RR architecture. In practice, large ISPs often have many RR servers that are fully
meshed with each other, and there are often many hierarchical levels of route reflector
servers to balance traffic load. However, hierarchies re-introduce the possibility of update
loops [19], so a new attribute called cluster-list was added to iBGP in order to address this
problem. Each RR is assigned a cluster ID, and the cluster-list attribute works similarly to
eBGP’s AS-path attribute, except cluster IDs are used instead of AS numbers.

AS confederation: AS confederation topologies group a number of routers together into
sub-ASes. This leads to many sub-ASes within a domain, and the sub-ASes communicate with each
other within the domain as shown in Figure 1(c). Within each sub-AS, iBGP speakers must be
fully meshed. Routing loops are avoided between sub-ASes via a new BGP attribute called
as-confed-sequence, which works similarly to the AS-path attribute in inter-domain routing.

iBGP and eBGP interaction: Since certain BGP attributes such as cluster-list and
as-confed-sequence are meant to be used only for intra-domain routing, routers that have both


      (a) iBGP full mesh          (b) iBGP with route reflector          (c) iBGP with AS confederation

                                 Figure 1: Different iBGP infrastructure
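
The scalability concern behind Figure 1(a) can be made concrete by counting sessions. The sketch below is illustrative only: the function names are hypothetical, and the route reflector count assumes that every client peers with every RR server and that the RR servers are fully meshed among themselves, which is one possible layout rather than the one any particular network uses.

```python
def full_mesh_sessions(n: int) -> int:
    """Number of iBGP sessions when all n routers peer directly with each other."""
    return n * (n - 1) // 2


def route_reflector_sessions(clients: int, servers: int) -> int:
    """Number of iBGP sessions under the assumed RR layout:
    each client peers with every RR server, and the servers form a full mesh."""
    return clients * servers + full_mesh_sessions(servers)


if __name__ == "__main__":
    # Full-mesh session count grows quadratically with the number of routers.
    for n in (5, 100, 1000):
        print(n, "routers, full mesh:", full_mesh_sessions(n), "sessions")
    # Same 100 routers split into 98 clients and 2 RR servers.
    print("98 clients + 2 RR servers:", route_reflector_sessions(98, 2), "sessions")
```

For the five routers drawn in Figure 1(a) a full mesh needs only 10 sessions, so the saving from route reflection appears at larger router counts.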


iBGP and eBGP peers will remove these iBGP-specific attributes when forwarding iBGP-learned
updates to eBGP peers. Such attributes are referred to as non-mandatory non-transitive BGP
attributes.

2.4        RouteViews and RIPE RIS
   We use eBGP data collected from RouteViews and RIPE routers [17, 12]. These public-data
collection routers connect to BGP routers in different ASes and passively collect updates from
them. We refer to these passive collection routers as monitors.

3.    DUPLICATES CAN HARM THE INTERNET
   We start by measuring the impact that duplicates have on Internet health. Up until now,
duplicate updates have been considered relatively benign; people believed that duplicates do
not hinder routing efficiency in any significant way [6]. However, we find that duplicates are
far from harmless; they negatively affect reachability and are also responsible for the
majority of router processing loads during their busiest times. These results suggest that
solving the mystery of duplicate generation is not merely an academic exercise; there are very
practical reasons for removing duplicate updates from the Internet.

3.1        Impact on Processing Loads
   A router must perform a series of tasks on each BGP update it receives. Some of these
tasks, such as ingress filtering and RIB-in updating, must be done regardless of the update’s
content [3]. Therefore, the amount of processing is proportional to the number of updates the
router receives, and duplicates contribute to the processing load of the routers. Previous
studies have shown that higher processing loads can lead to more session resets, routing
loops, and packet losses [20, 2].
   Thus, we measured how much duplicates contribute to the router processing loads during
their busiest times during the month of March 2009. We define ‘busiest times’ as the top 0.01%
of seconds (corresponding to 268 seconds) within which the largest number of updates were
received. Our data set consists of a specific subset of all RouteViews/RIPE monitors. The
monitors were carefully chosen such that each monitor was available for the entire month of
March 2009 and that there was at most one monitor per AS in our dataset. The number of tier-1,
transit, and stub monitors we ended up with were 8, 55, and 27 respectively, for a total of 90
monitors. Figure 2(a) shows the percent of duplicate traffic during busiest times for each of
the 90 ASes we monitored. Notice that for 22% (20 out of 90) of all monitored ASes, duplicates
contribute 50% or more of the update traffic during busiest times. As we describe in section
4, we carefully verified that the busy times were not due to session resets, and that the
duplicates seen here are from the same BGP sessions as the original updates.
   Figure 2(b) is a close-up look at a particularly bad case of our measurement, AS1853.
Overall, 86.42% of total updates during the top 0.01% of busiest times were duplicates. As
Figure 2(b) shows, during the busiest second the router in AS1853 had to process about 175,000
updates.

                 (a) Distribution of % duplicates
           (b) % Duplicates during the busiest seconds
           Figure 2: Impact on processing loads

3.2     Impact on Reachability
   We also find that duplicates negatively impact reachability when combined with route flap
damping (RFD). Duplicates can cause routers to label many stable routes as unstable, rendering
them unusable to the router. Duplicates also increase the time that routes remain unusable.

3.2.1    Route flap damping
   Before going into our measurements, we briefly review how RFD operates. Route Flap Damping
(RFD) [18] is a mechanism designed to prevent very unstable routes from affecting routers.
Route flap damping was built into mainstream routers around the late 90s. Under RFD, routers
maintain a penalty value for each prefix mentioned in BGP updates from their peers. For every
BGP update (announcement or withdrawal) that a peer sends the router, the router increases the
penalty value for the prefixes affected by the update. This penalty value decreases
exponentially over time. If the penalty value for any prefix goes above a certain threshold
(cutoff threshold), the router will mark the route as ‘suppressed’, meaning the route will not
be used by the router until the accumulated penalty value drops below another threshold (reuse
threshold).
   Different types of updates contribute different penalty values to the penalty total of the
prefix, and different vendors use different default RFD parameters. Table 1 lists default RFD
parameters for different router RFD implementations.

   RFD parameter               Cisco   Juniper   Quagga   XORP
   Withdrawal penalty          1000     1000      1000    1000
   Re-advertisement penalty      0      1000      1000    1000
   Attributes change penalty    500      500      1000    1000
   Cutoff threshold            2000     3000      2000    3000
   Half-life (min)              15       15        15      15
   Reuse threshold              750      750      750      750
   Max suppress time (min)      60       60        60      60

         Table 1: Default RFD parameter values

   Although people have suggested that RFD be turned off [8, 22, 13], a quick survey on the
North American Network Operators’ Group (NANOG) [10] indicates that, in practice, RFD is still
used mainly for two reasons: 1) to protect resources such as CPU during peak times, and 2) to
confine a customer’s frequent route oscillations locally without affecting other networks.

3.2.2    Duplicates and RFD in action
   Since penalty values increase as a function of update counts, duplicate updates can
contribute to the suppression of prefixes when combined with RFD. Duplicates can increase both
the number of prefixes suppressed, as well as the length of time that a prefix remains
suppressed.
   To illustrate the negative impact duplicates can have on reachability when combined with
RFD, we measured the number of prefixes suppressed with and without duplicate updates during
the month of March 2009. We simulated RFD by reading updates sequentially and applying the
algorithm described in the RFD RFC [18]. As mentioned in our RFD primer, different router
implementations have different default RFD penalty parameters. For our measurement, we used
the Cisco default parameters, since Cisco routers have the greatest Internet presence amongst
all router vendors.
   Figure 3(a) shows the additional number of prefixes suppressed due to duplicates for each
of our 90 monitors during March 2009. About 2/3 of ASes have noticeable amounts of additional
prefixes suppressed due to duplicates.

           (a) Distribution of additionally suppressed prefix count
           (b) Number of suppressed prefixes
           Figure 3: Impact on reachability

   The monitor in AS812 is particularly affected by duplicates. Figure 3(b) shows the number
of suppressed prefixes (X-axis) with cumulative suppression time (Y-axis) with and without
duplicates during the month of March 2009 for the monitor in AS812. For example, if prefix p
is suppressed twice for 1000 seconds each during March 2009, its cumulative suppression time
would be 2000 seconds.
   During this one month period, AS812 announced 289,006 prefixes at least once. With
duplicates, 208,920 prefixes will be dampened if the neighbor router has RFD turned on with
default parameters (1000 as the penalty value for an update). Without duplicates, the monitor
in AS812 only suppresses 76,131 prefixes. The additional median cumulative suppression time
for a prefix was 26.06 hours. In other words, duplicates caused a median of 26.06 extra hours
that routes remained unnecessarily suppressed.
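
The way a single extra update can tip a prefix into suppression can be illustrated with a small penalty model. What follows is a minimal sketch under stated assumptions, not the simulator used for the measurements: the class and method names are hypothetical, every update is charged a flat penalty of 1000 (following the parenthetical above), the cutoff, reuse, and half-life values are the Cisco column of Table 1, the maximum suppress time is ignored, and decay is applied lazily whenever a prefix is touched instead of by timers.

```python
from dataclasses import dataclass


@dataclass
class RfdParams:
    cutoff: float = 2000.0         # suppress when the penalty rises above this
    reuse: float = 750.0           # release when the penalty decays below this
    half_life_s: float = 15 * 60   # penalty halves every 15 minutes


@dataclass
class PrefixState:
    penalty: float = 0.0
    last_seen: float = 0.0
    suppressed: bool = False


class RfdSimulator:
    """Simplified per-prefix damping state (hypothetical helper, max suppress time omitted)."""

    def __init__(self, params=None):
        self.params = params or RfdParams()
        self.state = {}

    def _decay(self, st, now):
        # Exponential decay since the last time this prefix was touched.
        elapsed = now - st.last_seen
        st.penalty *= 0.5 ** (elapsed / self.params.half_life_s)
        st.last_seen = now
        if st.suppressed and st.penalty < self.params.reuse:
            st.suppressed = False

    def on_update(self, prefix, now, penalty=1000.0):
        """Charge one update (timestamps must be non-decreasing); return suppression status."""
        st = self.state.setdefault(prefix, PrefixState(last_seen=now))
        self._decay(st, now)
        st.penalty += penalty
        if st.penalty > self.params.cutoff:
            st.suppressed = True
        return st.suppressed

    def is_suppressed(self, prefix, now):
        st = self.state.get(prefix)
        if st is None:
            return False
        self._decay(st, now)
        return st.suppressed


if __name__ == "__main__":
    with_dup, without_dup = RfdSimulator(), RfdSimulator()
    for t in (0, 60, 120):          # the update at t=60 plays the role of a duplicate
        with_dup.on_update("p", t)
    for t in (0, 120):
        without_dup.on_update("p", t)
    print("with duplicate:", with_dup.is_suppressed("p", 120))
    print("without duplicate:", without_dup.is_suppressed("p", 120))
```

In this toy trace two updates two minutes apart stay below the cutoff, while the same two updates plus one duplicate in between push the prefix over it.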
   We note that different routers have different default parameter values, which will affect
the number of prefixes suppressed with and without duplicates. However, as long as an update
is associated with a penalty value greater than 0, duplicates will have a negative impact on
reachability when combined with RFD.

4.    UNDERSTANDING DUPLICATES ACROSS TIME AND SPACE
   Now that we understand the negative impact that duplicates can have on Internet health, we
analyze duplicate generation in detail to gain a better understanding of this duplicate
pathology, and maybe even discover the cause of duplicate generation. Not only do we measure
the prevalence of duplicate updates on the Internet today, we also measure the number of
duplicates that we have seen over the past few years. We then explore whether topological
factors (such as size of AS or connectivity) show any correlation with occurrences of
duplicates.
   Our data set consisted of the same 90 monitors we used for our measurements in section 3.
Recall that the number of tier-1, transit, and stub monitors were 8, 55, and 27 respectively,
for a total of 90 monitors. As mentioned in 2.2, an update cannot be a duplicate of another
update from a different BGP session. For that reason, we preprocess all of our data using the
minimum collection time (MCT) algorithm [21] to filter out updates due to session resets
before performing any of the measurements presented in this paper.

4.1    Are Duplicates Observed at All Times?
   We start by measuring the number of duplicates seen during the month of March 2009 from all
monitors. Figure 5 shows the amount of duplicates along with the total number of updates from
all 90 monitors during March 2009. It turns out that duplicate generation is not just a
pathological behavior rarely seen on the Internet. In this month alone, the total aggregated
number of updates was about 677 million. Among those, about 91 million updates were
duplicates. Thus duplicates make up 13.4% of aggregated BGP traffic.

            Figure 4: Amount of duplicates observed in time

   Next, we look at the prevalence of duplicates on the Internet over the last few years.
Figure 4 shows how long duplicates have existed in BGP by showing the maximum, minimum, and
95% confidence intervals of % duplicates observed by different monitors for the month of March
from 2002 through 2009. For each year, we selected monitors based on the criteria described in
section 3. Table 2 shows the number of monitors we used from 2002 through 2009. The number of
qualified monitors generally increases over time, mainly because more ASes peered with
RouteViews and RIPE over time. The exception was between 2008 and 2009. This was because some
of the collectors in RIPE had problems during March 2009, and we did not use any monitors that
did not have complete data for the month. We performed the same measurement for other months
from 2002 through 2009, and the results were all similar. The amount of duplicates we counted
also agrees with the amount observed in previous studies [7, 20, 11].
   Figure 5 shows that duplicates are consistently observed over time and often go up and down
with the total number of updates. If duplicates were observed in sporadic bursts, that might
have suggested the cause of duplicates to be some discrete events correlated with the times
that duplicates were observed. However, this does not appear to be the case, since we observe
duplicates to come at a fairly constant rate.

Result Summary: We observe that duplicate updates have been consistently observed over time,
accounting for anywhere between 10% and 20% of total BGP updates from all monitors.
Furthermore, duplicates come at a fairly constant rate, and are not bursty in nature.

4.2     Are Duplicates Observed from All Networks?
   The Internet is a network of networks, and their behaviors can vary widely. In general, a
tier-1 network has a very different topology from a stub network, and the networks are
optimized to achieve very different goals. Thus our next measurement is aimed at understanding
if size or type (e.g.


                 Year                           2002     2003     2004      2005     2006    2007    2008    2009
                 # Monitors                       27       37       54        67       79     100     109      90
                 # Total updates (10^6)        129.5    207.3    316.4     426.5    423.7   511.2   652.2   677.4
                 # Duplicate updates (10^6)     12.7     32.0     68.9      74.6     63.8   137.1   111.0    91.3

                                          Table 2: Aggregated number of updates
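
The totals and duplicate counts in Table 2 rest on applying the definition from Section 2.1 to each monitor's update stream. A minimal sketch of that classification follows. The `Update` record and `count_duplicates` function are hypothetical names introduced here, the input is assumed to be already parsed and already filtered for session resets, and the treatment of withdrawals (represented as `attributes=None` and never counted as duplicates) is an assumption of the sketch, since the text does not spell it out.

```python
from typing import Hashable, Iterable, NamedTuple, Optional, Tuple


class Update(NamedTuple):
    session: Hashable             # identifies one BGP session of one peer
    prefix: str
    attributes: Optional[Tuple]   # canonical attribute tuple; None marks a withdrawal


def count_duplicates(updates: Iterable[Update]) -> Tuple[int, int]:
    """Return (total updates, duplicate updates) for a time-ordered stream.

    An announcement is a duplicate when its attributes equal those of the most
    recent previous update for the same prefix within the same session.
    """
    last_seen = {}
    total = duplicates = 0
    for u in updates:
        key = (u.session, u.prefix)
        total += 1
        if u.attributes is not None and last_seen.get(key) == u.attributes:
            duplicates += 1
        last_seen[key] = u.attributes
    return total, duplicates


if __name__ == "__main__":
    attrs = ("AS-path: 64500 64501", "MED: 0")   # made-up attribute values
    stream = [
        Update("session-1", "p", attrs),
        Update("session-1", "a", attrs),
        Update("session-1", "b", attrs),
        Update("session-1", "c", attrs),
        Update("session-1", "p", attrs),   # duplicate despite updates for a, b, c in between
        Update("session-2", "p", attrs),   # different session, so not a duplicate
    ]
    total, dup = count_duplicates(stream)
    print(f"{dup} of {total} updates are duplicates")
```

Keying the lookup on the pair (session, prefix) is what keeps an update that merely repeats one from an earlier session from being counted.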




  Figure 5: Amount of duplicates during March 2009

  Figure 6: Amount of duplicates in different AS types

stub, tier-1) of network has any correlation with observed duplicates. As mentioned in section 3, the March 2009 monitors consisted of 8 tier-1 networks, 55 transit networks, and 27 stub networks. We measured the percentage of duplicates out of total updates that each network generated for the month of March 2009.
   Figure 6 summarizes our findings. All three types of networks generate duplicates, with some variation in their percentages. The minimum percentage of duplicates was very low in all three cases. At the same time, the maximum percentage of duplicates was quite high for all types, showing a large variation in behavior even amongst networks of the same type, which is typical of a system as huge as the Internet. We note that the large confidence interval range for tier-1s is mainly due to the small number of data points available to us (8 tier-1s compared to 55 transits and 27 stubs). Later, in section 6, we discuss why the amount of duplicates observed varies so widely amongst networks of the same type.

Result Summary: The topological role of an AS (e.g. stub, transit or tier-1) does not have any strong correlation with the generation of duplicate updates.

4.3   Where Do Duplicates Originate?
   So far, we have observed duplicates from different monitors. However, we do not quite know where these duplicates originate. By specification, a BGP router should not propagate a duplicate it receives. Thus, when a router receives a duplicate from peer X with a path X-Y-Z, where Z is the origin AS, we hypothesized that the duplicate message must be generated by X and not by Y or Z. Our next exercise is to verify our hypothesis.
   For this, we looked specifically at duplicates for particular prefixes where the following was true. First, the observed duplicate for prefix p from peer X had an AS-path ending with X-Y. Second, we had to have monitors for both AS X and AS Y. With this, we can see whether the duplicates actually originate at (or within) AS X, or whether they were sent to X from Y. Our case study consisted of prefix 85.249.120.0/23 advertised by AS 9002, a direct customer of AS 3356. We had monitors in both AS 9002 and AS 3356.
   During March 2009, AS 9002 announced and withdrew prefix 85.249.120.0/23 21 times. Upon receipt of these announcement and withdrawal pairs, AS 3356 sends out the announcement to the monitor with a prepended AS-path, but AS 3356 never sends the withdrawal. Instead, AS 3356 sends a duplicate announcement to our monitor. In total, AS 3356 generates 53 duplicates on prefix 85.249.120.0/23 after receiving 21 pairs of announcement and withdrawal messages, as shown in Figure 7.
   Not only does our experiment back up our hypothesis that the sender of duplicates is the originator of duplicates, but it also suggests that the cause of duplicates may have something to do with the way internal topology dynamics interact with eBGP updates.

Result Summary: Duplicates are originated by the peer sending the duplicate. Furthermore, it is likely that the cause of duplicate generation involves iBGP dynamics inside an AS. In the next section, we examine what exactly happens inside an AS that results in duplicate generation.

5.     DISCOVERING THE CAUSE OF DUPLICATES

  [Figure 7 diagram: AS9002 flaps prefix 85.249.120.0/23 21 times, sending "W P: 85.249.120.0/23" and "A P: 85.249.120.0/23, ASPATH: 9002" to AS3356. AS3356 emits 53 duplicates of "A P: 85.249.120.0/23, ASPATH: 3356 9002".]
  Figure 7: External view of duplicate generation

   Once we suspected that duplicates may be generated due to some interaction between iBGP and eBGP, we ran an experiment designed to collect eBGP update+duplicate pairs, match them with their iBGP counterparts, and compare these iBGP updates to see what we might learn about duplicate generation.

5.1    Collecting iBGP and eBGP Data from a Tier-1 ISP
   Our first step was to obtain the data needed for our investigation. We teamed up with a tier-1 ISP who provided us with access to both iBGP and eBGP updates generated by one of their routers. Figure 8 illustrates our data-collection setup. Rs is the router sending updates to our two collector boxes, Ri and Re. Ri is configured as an iBGP client of Rs (i.e. a route reflector client), collecting iBGP data from Rs. Re is an eBGP peer of Rs, collecting eBGP updates from Rs. Both the iBGP and eBGP sessions have their MRAI timers disabled, so that Rs sends updates to our collectors as soon as it has updates to send.

  [Figure 8 diagram: Rs sends updates to Re, the eBGP collector (eBGP peer), and to Ri, the iBGP collector (client of Rs).]
  Figure 8: Data collection settings

5.2    Mapping iBGP and eBGP Updates
   Now that we had obtained the necessary data, we needed a way to match up eBGP updates to their corresponding iBGP updates for comparison. Performing this match is nontrivial. The main challenge in matching eBGP update streams with iBGP update streams is timing. The times at which two updates, triggered by the same event, are sent out from Rs may differ due to the nondeterministic behavior of Rs. Also, Ri and Re's system clocks may not be synchronized, leading to different timestamps on the received updates. We resolve these timing issues by introducing the notion of update 'signatures', which we now describe.

   sig(u) = peer asn prefix aspath origin comm agg

   The signature of an update contains all of BGP's transitive attributes that should be the same in Rs's updates to either Ri or Re. These attributes include: peer, AS number, prefix, AS-path, origin, Community, and aggregator. Note that the Community attribute is an optional transitive attribute, meaning it is only optionally retained when forwarding updates to eBGP peers. For our experiment, Rs was configured to retain the Community attribute value when forwarding eBGP updates. In order for the signature of update u1 to match the signature of update u2, u1 and u2 must have the same transitive attribute values.
   Now that we have defined what a signature is, we use this notion to calculate the time difference td observed between eBGP updates and their iBGP counterparts.

  Algorithm 1: Inferring time difference of Re and Ri
    Data: iBGP, eBGP
    Result: time difference td
    // generate signature for all updates
    generate_sig(iBGP)
    generate_sig(eBGP)
    // search range is 30 seconds
    tw = 15
    ts = start time
    te = end time
    // init [-tw, tw]
    for td from (-tw) to (tw) do
        matched_fraction[td] = 0
    end
    // accumulate matched fraction
    for tebgp from (ts + tw) to (te - tw) do
        for tibgp from (tebgp - tw) to (tebgp + tw) do
            td = tibgp - tebgp
            fraction = match(eBGP, tebgp, iBGP, tibgp)
            matched_fraction[td] += fraction
        end
    end
    return td with maximum matched_fraction[td]

  [Figure 9 diagram: two aligned time axes. The eBGP axis (ticks t-1, t, t+1, t+2) shows 10 updates in one second; the iBGP axis (ticks t'-1, t', t'+1, t'+2) shows the number of matched signatures per second, with values 1, 7, and 2.]
  Figure 9: Inferring time difference td
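As a rough illustration of the signature idea, the following Python sketch builds a signature tuple from a parsed update. The `Update` record and all of its field names are assumptions made for this example, not the data format used in the experiment. The text also does not say how the AS-path is normalized between the iBGP and eBGP sessions, so the sketch simply compares it as given.

```python
from typing import NamedTuple, Optional, Tuple


class Update(NamedTuple):
    """Hypothetical parsed BGP announcement; field names are illustrative only."""
    timestamp: int                       # seconds, as stamped by the collector
    peer: str                            # address of the sending router (Rs)
    asn: int                             # AS number of the peer
    prefix: str
    aspath: Tuple[int, ...]
    origin: str
    communities: frozenset = frozenset()
    aggregator: Optional[str] = None
    # iBGP-only attributes: deliberately left out of the signature.
    cluster_list: Tuple[str, ...] = ()
    originator_id: Optional[str] = None


def sig(u: Update) -> tuple:
    """Signature: the attributes expected to be identical on both sessions."""
    return (u.peer, u.asn, u.prefix, u.aspath, u.origin,
            u.communities, u.aggregator)


# Two iBGP updates that differ only in cluster-list share one signature.
a = Update(1, "10.0.0.1", 64500, "192.0.2.0/24", (64500, 64501), "IGP",
           cluster_list=("RR1",))
b = a._replace(timestamp=2, cluster_list=("RR2",))
print(sig(a) == sig(b))  # True
```

Leaving the timestamp out of the signature is what allows updates recorded at slightly different times on the two collectors to be compared at all.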


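Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: each update is assumed to be already reduced to a (timestamp in seconds, signature) pair, and match() is interpreted as the fraction of eBGP signatures in one second that also appear in a given iBGP second. All function and variable names are hypothetical.

```python
from collections import Counter, defaultdict


def bucket_by_second(updates):
    """Group (timestamp, signature) pairs into one signature multiset per second."""
    buckets = defaultdict(Counter)
    for ts, s in updates:
        buckets[int(ts)][s] += 1
    return buckets


def match_fraction(e_sigs, i_sigs):
    """Fraction of the eBGP signatures of one second found in one iBGP second."""
    total = sum(e_sigs.values())
    if total == 0:
        return 0.0
    return sum((e_sigs & i_sigs).values()) / total  # multiset intersection


def infer_time_difference(ebgp, ibgp, tw=15):
    """Return the lag td in [-tw, tw] with the largest accumulated match."""
    e_buckets = bucket_by_second(ebgp)
    i_buckets = bucket_by_second(ibgp)
    matched_fraction = {td: 0.0 for td in range(-tw, tw + 1)}
    if not e_buckets:
        return 0, matched_fraction
    ts, te = min(e_buckets), max(e_buckets)
    for t_ebgp in range(ts + tw, te - tw + 1):
        e_sigs = e_buckets.get(t_ebgp)
        if not e_sigs:
            continue
        for td in range(-tw, tw + 1):
            i_sigs = i_buckets.get(t_ebgp + td)
            if i_sigs:
                matched_fraction[td] += match_fraction(e_sigs, i_sigs)
    best = max(matched_fraction, key=matched_fraction.get)
    return best, matched_fraction


# Synthetic example: the iBGP collector's clock runs 3 seconds ahead.
ebgp = [(t, f"sig-{t}") for t in range(100)]
ibgp = [(t + 3, f"sig-{t}") for t in range(100)]
best_td, fractions = infer_time_difference(ebgp, ibgp)
print(best_td)  # 3
```

If the accumulated fractions show no clear peak, the window `tw` is too small for the clock offset and should be widened.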
   Algorithm 1 describes how we infer the time difference for updates sent to our two collectors, Ri and Re. tebgp is a value in seconds. We look at the signatures of all updates received during the tebgp second, and then search for the second in iBGP, tibgp, that yields the maximum fraction of matched signatures. In Algorithm 1, we used a search window of 30 seconds (15 seconds before and after tebgp). When there is no distinct maximum peak within the search window tw, one should increase its value. In our case, the peak fraction of matched signatures was about 0.7 at a lag value of 0 (i.e. td = 0). The remaining 0.3 were dispersed within a 10-second range centered at tebgp, as shown in Figure 9. This means that the system clocks of Ri and Re are synchronized to within one second.
   After discovering that the time difference td was always within 10 seconds, we were able to map eBGP updates to their iBGP counterparts using a heuristic algorithm involving signature and timestamp comparisons.

  Algorithm 2: Mapping eBGP Updates to iBGP Updates
    Data: ibgp, ebgp
    Result: observed ibgp update difference
    td = 0
    tj = 10
    forall ebgp update ue for prefix p in time do
        // record history
        tnow = timestamp(ue)
        sig = sig(ue)
        insert_new(history_p, (tnow, sig))
        remove_old(history_p, 300)
        if ue is a duplicate then
            // ibgp window [ws, we]
            ws = tnow - 300 + td - tj
            we = tnow + td + tj
            // find the match
            m = find_match(iBGP, ws, we, history_p)
            // find the difference
            code = ibgp_diff(m)
            record(code)
        end
    end

   Algorithm 2 describes our heuristic at a high level. We collected one day of iBGP and eBGP updates, putting them in sequential order as sent from Rs. We start with the first eBGP update in the sequence. As we moved down the sequence, we kept a per-prefix history of signatures for every update we encountered within a time window of 5 minutes. For each eBGP duplicate update for prefix p we found as we moved forward, we looked at the corresponding iBGP time window to find a match for the sequence of signatures we recorded in eBGP for this prefix p. We say the sequence has a match when the exact sequence of update signatures appears within the iBGP time window. In case of multiple iBGP matches for an eBGP 5-minute stream, we choose the iBGP stream occurring at the time closest to the time of the eBGP stream. Note that we found very few cases where an eBGP stream matched multiple iBGP streams.
   Since we discovered that iBGP and matching eBGP updates are always sent within 10 seconds of each other, we add 10 seconds to both ends of the iBGP time window. For example, if the eBGP time window is the 5 minutes [t-300, t], then the corresponding iBGP time window is [t+td-300-10, t+td+10]. In our case, td is 0, and our iBGP window is [t-310, t+10].
   Using our heuristic, we were able to match 95.61% of eBGP update+duplicate pairs to their iBGP counterparts. 4.39% of eBGP update+duplicate pairs could not be mapped to any iBGP counterparts. We believe this is due to the problems with bgpdump mentioned in [4]. While many of the issues mentioned there have been fixed, some have not. Furthermore, users have experienced other undocumented bugs that occur during writes to disk and conversions from one file format to another (e.g. mrt to text), and these could also have contributed to our inability to match all updates.

5.3   Comparing iBGP and eBGP updates
   After mapping eBGP updates to their iBGP counterparts, we took each eBGP update+duplicate pair and compared the contents of their corresponding iBGP updates. For 100% of the 176,266 matched eBGP update+duplicate pairs, we observed that their iBGP counterparts had differing non-mandatory attribute values. Table 3 shows our results. 0.15% of pairs were exceptions, differing only in MED values. We plan to look into these exceptions as future work. For the other 99.85% of eBGP update+duplicate pairs, we observed corresponding iBGP update pairs with cluster-list and/or originator-id differences. These attribute differences represent changes in intra-domain routing path selections.

5.4    The Cause of Duplicates
   The results of our experiment allowed us to determine the cause of eBGP duplicate updates. Our theory proved to be correct; duplicates are caused by an unintended interaction between eBGP and iBGP. Recall from section 2 that when forwarding iBGP updates to eBGP peers, a router strips all non-eBGP attributes before sending the update. The reason that duplicates are generated is that routers are receiving updates via iBGP which differ in iBGP attribute values alone, and thus the router believes the updates to be unique. However, once the router processes the update, strips the iBGP attribute values, and sends the update to its eBGP peer, the two updates look identical from the point of view of the peer. Figure 10 illustrates a case where duplicates are generated due to changes in an iBGP attribute (cluster-list in this case). Assume that we have an RR architecture within an ISP. Originally, RRC1 (RR Client 1) in AS1 receives an update to reach P1 via RR1-RRC2 and chooses the route as the best path. Since this is the new best path, RRC1 sends out an update to its neighbors. When the update is sent


  eBGP duplicate count    % Total    Observed iBGP differences
  173,594                   94.77    cluster-list only
  244                        0.13    cluster-list and others
  1,371                      0.75    originator-id and others
  1,057                      0.58    cluster-list + originator-id + others
  269                        0.15    MED
  6,647                      3.63    no match found
  Total: 183,182           100.00

  Table 3: Matched iBGP updates
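The categories in the rightmost column of Table 3 could be produced by a small classifier that compares the two matched iBGP updates attribute by attribute. The sketch below is illustrative only: the dictionary keys are hypothetical, and the exact boundaries between categories (for instance, whether "and others" may be empty) are assumptions read off the row labels rather than rules stated in the text.

```python
def ibgp_diff(first: dict, second: dict) -> str:
    """Label how two iBGP updates for the same prefix differ (Table 3 rows)."""
    keys = first.keys() | second.keys()
    changed = {k for k in keys if first.get(k) != second.get(k)}
    clist = "cluster_list" in changed
    orig = "originator_id" in changed
    if changed == {"cluster_list"}:
        return "cluster-list only"
    if clist and orig:
        return "cluster-list + originator-id + others"
    if clist:
        return "cluster-list and others"
    if orig:
        return "originator-id and others"
    if changed == {"med"}:
        return "MED"
    return "other"  # identical updates, or a change Table 3 does not list


# The Figure 10 scenario: only the cluster-list changes between updates.
u1 = {"prefix": "P1", "cluster_list": ("RR1",), "med": 0}
u2 = {"prefix": "P1", "cluster_list": ("RR2",), "med": 0}
print(ibgp_diff(u1, u2))  # cluster-list only
```

Counting the returned labels over all matched pairs would give the per-category totals of the kind reported in the table.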


out, the cluster-list attribute is filtered since this is a non-mandatory non-transitive attribute meant to be used only internally within the network. Shortly after RRC1 sends out an update, it receives an update to reach P1 via RR2-RRC2 and chooses this route as the new best path to reach P1. RRC1 then sends another eBGP update to its neighbors after filtering the cluster-list attribute again. To RRC1's neighbors, these two updates are identical, and the latter update is a duplicate. Here, what we essentially observe are internal route flaps being leaked to the external network in the form of duplicates. That is, one will observe more duplicates when there is more internal path exploration. Based on the amount of duplicates, one may be able to infer the complexity of a network's internal connectivity.

  [Figure 10 diagram: AS1 contains RRC1, RR1, RR2, and RRC2, with P1 reachable via RRC2. Labeled messages are "P1: CLIST = RR1", "P1: CLIST = RR2", and two outgoing "P1: NO-CLIST" updates; steps are numbered 1 to 3.]
  Figure 10: Cause of duplicates

6.    DISCUSSION
   In this section we discuss some of the ramifications of our measurements and discoveries.

6.1    Implications of Our Duplicate Investigation
   In our study, we observed that duplicates are generated due to oscillations in cluster-list and originator-id values under route reflector architectures. However, it is not only ASes using route reflectors that are affected. ASes using AS confederation architectures will also generate duplicates due to the use of a non-mandatory non-transitive attribute named AS-confed-sequence, which is essentially the AS confederation version of the cluster-list attribute under route reflector architectures.
   It follows from our conclusions about the cause of duplicates that more internal path exploration within an AS will lead to a higher number of duplicates generated. The number of duplicates will be proportional to the number of paths the router tries before selecting the new best path. Larger ASes (e.g. tier-1s) tend to have more complex internal connectivity, leading to more internal path exploration. Given that larger ASes generally have more direct customers, a greater portion of the Internet may have to suffer the negative effects of duplicates.
   We found that duplicates can cause good routes to be suppressed under RFD. When a network experiences internal route flaps, duplicates will be generated to eBGP peers. If route flap dampening is turned on at the neighboring eBGP router, the route to the prefix through the network can be unnecessarily suppressed.

6.2    Differences in the Amount of Observed Duplicates
   As observed in section 4.2, ASes of the same type vary in the proportion of duplicates they generate. One reason may be a difference in MRAI timer settings amongst the networks. Duplicates are generated during internal path exploration. During path exploration, updates come in bursts, and thus MRAI timers can prevent many updates from being sent. Thus, networks that turn off their MRAI timers are likely to generate more duplicates than networks with MRAI timers turned on. AT&T, a network believed to have MRAI timers turned off, does appear to generate a greater proportion of duplicates as compared to other tier-1 networks. However, MRAI timer differences do not fully explain why the amount of observed duplicates varies so much from one AS to another. Our analysis shows that internal path exploration with changed non-mandatory attributes such as cluster-list or originator-id can generate duplicates. During our experiments involving eBGP and iBGP interactions, we noticed that cluster-list changes were often coupled with a change in Community or MED attribute values. In other words, non-transitive attribute changes were coupled with transitive attribute changes. These cases did not result in eBGP duplicates; instead, we observed updates with fluctuating Community/MED values. We asked operators at our tier-1 ISP and they confirmed that this was quite deliberate; routers were configured to make changes in certain transitive attribute values whenever there was a change in certain non-mandatory attribute values in


accordance with [9, 14]. [9, 14] suggest using Community attribute values as a general purpose attribute to convey informational tags as well as action tags to receiving networks. MED values were also used for traffic engineering purposes. However, operators admit that not all peers need or use this Community information, and for those routers that do not use the Community information, these BGP updates are as useless as duplicates. Moreover, such updates can be more detrimental than duplicates in one significant way: with duplicates, the negative impact is limited to the direct neighbors. As described earlier, duplicates do not travel more than one hop. However, if some other (optional) transitive attribute such as Community is changed, then the update is no longer a duplicate and can be propagated throughout the Internet. Community value changes are not useful to networks that are more than 1 hop away, and yet these networks still must suffer the same negative impacts of receiving a superfluous BGP update.
   Our discovery of these non-duplicate wasteful BGP updates led us to wonder if other ASes generated similarly wasteful non-duplicate updates. We looked at all updates from tier-1s observed by our monitors for the month of March 2009, and classified the updates into 3 types: duplicates, Community/MED change, and remainder. Duplicates are defined as described in section 2.1. Community/MED change updates are defined as follows. An update for prefix p sent by router r is a Community/MED change update if and only if the Community and/or MED attribute value differs from the most recent previous update for prefix p sent by router r, all other attributes in the update are the same as in the most recent previous update for prefix p sent by router r, and both the update and the previous update belong to the same BGP session. Remainder updates are defined as updates that are neither duplicates nor Community/MED change updates. Figure 11 shows our results. While AS3549 and AS2914 generated almost no duplicates, 50% or more of their total updates were Community/MED change updates. We suspect that many of these updates could be useless to many networks that receive them, for reasons similar to those described earlier in this section. We intend to verify our suspicion in future work.

  Figure 11: Other potential noises

7.   CONCLUSION
   In this paper, we conducted the first comprehensive measurement study quantifying the prevalence of duplicates on the Internet across space and time. We discovered that duplicates make up over 10% of all BGP update traffic. We examined the impact that duplicates have on the overall health of the Internet, and discovered that duplicates can negatively impact reachability and router processing loads. We developed a heuristic to match eBGP updates with their corresponding iBGP counterparts. Finally, we combined our observations with our heuristic to discover the major cause of duplicates on the Internet: duplicates are caused by an unintended interaction between iBGP and eBGP. We are currently working with a vendor to fix this.
   While pure duplicates are clearly unnecessary BGP overhead, our work revealed that duplicates may not be the only superfluous BGP updates floating around on the Internet. As described in section 6.2, updates that couple non-transitive attribute changes with transitive attribute changes may not be useful to all recipients. It would be interesting to identify all forms of superfluous BGP updates and gain an exact measure of how much BGP traffic is simply unwanted noise. In the meantime, we hope that our work allows the Internet community to take a significant step towards an optimal and clean routing communication system.

8.   REFERENCES

 [1] T. Bates and R. Chandra. RFC 1966: BGP route
     reflection an alternative to full mesh IBGP, 1998.
 [2] J. Cowie and A. Ogielski. Global routing instabilities
     during Code Red II and Nimda worm propagation.
     NANOG 23.
 [3] S. Halabi and D. McPherson. Internet routing
     architectures, 2nd ed., 2001.
 [4] H. Kong. The consistency verification of zebra BGP
     data collection.
     http://www.ripe.net/projects/ris/papers/report.pdf.
 [5] C. Labovitz, G. R. Malan, and F. Jahanian. Internet
     routing instability. ACM/IEEE Transactions on
     Networking, 6(5):515–528, October 1998.
 [6] C. Labovitz, G. R. Malan, and F. Jahanian. Origins of
     internet routing instability. ACM/IEEE Infocom,
     1:218–226, March 1999.
 [7] J. Li, M. Guidero, Z. Wu, E. Purpus, and
     T. Ehrenkranz. BGP Dynamics Revisited. In ACM
     Sigcomm Computer Communications Review, April
     2007.
 [8] Z. M. Mao, R. Govindan, G. Varghese, and R. H. Katz.
     Route flap damping exacerbates internet routing
     convergence. SIGCOMM Comput. Commun. Rev.
     (CCR), 32(4):75–84, 2002.


 [9] D. Meyer. RFC 4384: BGP Communities for data
     collection, 2006.
[10] NANOG. Nanog. http://www.nanog.org.
[11] J. Rexford, J. Wang, Z. Xiao, and Y. Zhang. BGP
     routing stability of popular destinations. In ACM
     SIGCOMM Internet Measurement Workshop (IMW),
     2002.
[12] RIPE NCC. Routing Information Service.
     http://www.ris.ripe.net/.
[13] P. Smith and C. Panigl. Recommendations on
     route-flap damping.
     http://www.ripe.net/ripe/docs/ripe-378.html.
[14] R. A. Steenbergen and T. Scholl. BGP Communities:
     a guide for service provider networks, 2007.
[15] P. Traina. RFC 1774: BGP-4 protocol analysis, 1995.
[16] P. Traina, D. McPherson, and J. Scudder. RFC 3065:
     Autonomous system confederations for BGP, 1998.
[17] University of Oregon. Route Views Project.
     http://www.routeviews.org.
[18] C. Villamizar, R. Chandra, and R. Govindan. RFC
     2439: BGP route flap damping, 1998.
[19] M. Vutukuru, P. Valiant, S. Kopparty, and
     H. Balakrishnan. How to construct a correct and
     scalable iBGP configuration. In ACM/IEEE Infocom,
     2006.
[20] L. Wang, X. Zhao, D. Pei, R. Bush, D. Massey,
     A. Mankin, S. F. Wu, and L. Zhang. Observation and
     analysis of BGP behavior under stress. In IMW ’02:
     Proceedings of the 2nd ACM SIGCOMM Workshop on
     Internet measurment, pages 183–195, New York, NY,
     USA, 2002. ACM.
[21] B. Zhang, V. Kambhampati, M. Lad, D. Massey, and
     L. Zhang. Identifying BGP routing table transfers. In
     MineNet ’05: Proceedings of the 2005 ACM
     SIGCOMM workshop on Mining network data, pages
     213–218, New York, NY, USA, 2005. ACM.
[22] B. Zhang, D. Pei, D. Massey, and L. Zhang. Timer
     interaction in route flap damping. In Distributed
     computing systems, ICDCS 2005, Proceedings, 25th
     IEEE, pages 393–403, 2005.



