Near-Deterministic Inference of AS Relationships

                                         Udi Weinsberg



A thesis submitted toward the degree of Master of Science in
Electrical and Electronic Engineering
Under the guidance of Dr. Yuval Shavitt and Eran Shir.
 Outline

 Introduction and Theory
 Problem and Algorithm

 Experimental Results

 Conclusion
Introduction and
Theory
         Introduction

   Today's Internet consists of thousands of
    networks administered by various
    Autonomous Systems (ASes).
     Large Provider - AS7018 – AT&T
     Small Provider - AS1680 – NetVision
     Educational Network – AS378 – ILAN
   ASes are assigned one or more blocks of
    IP prefixes and communicate routing information
    to each other using the Border Gateway Protocol
    (BGP).
         Type-of-Relationship

 ASes use a set of local policies for selecting the
  best route for each reachable prefix.
 These policies are based on the Type-of-
  Relationship (ToR) that exists between ASes.
 ToRs are used to calculate paths between
  ASes.
     ToRs are regarded as proprietary information.
     Deducing them is an important yet difficult
      problem
         Type-of-Relationship (2)

   Three major commercial relationships between
    neighboring ASes:
     Customer-to-Provider (C2P)
     Peer-to-Peer (P2P)
     Sibling-to-Sibling (S2S)
       Customer-to-Provider (C2P)

 Customer AS pays a Provider AS for traffic
  that is sent between the two.
 Provider AS is usually larger than the customer.

                             Provider



                     C2P                C2P




                  Customer              Customer
       Peer-to-Peer (P2P)

 Two ASes freely exchange traffic between
  themselves and their customers.
 Do not exchange traffic from or to their providers
  or other peers.


                           P2P

                    Peer         Peer
       Sibling-to-Sibling (S2S)

 Two ASes administratively belong to the same
  organization.
 Freely exchange traffic between their providers,
  customers, peers, or other siblings


                             S2S

                   Sibling         Sibling
         Valley Free Routing

   BGP paths must comply with the following
    Valley-Free hierarchical pattern:
     An uphill segment of zero or more c2p or s2s
      links,
     Followed by zero or one p2p links,
     Followed by a downhill segment of zero or more
      p2c or s2s links.
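
The pattern above can be checked mechanically. A minimal sketch, assuming each path has already been turned into a list of per-hop labels; the function name and the label strings are illustrative and do not come from the thesis:

```python
def is_valley_free(rels):
    """Check a path given as per-hop labels: 'c2p', 'p2c', 'p2p' or 's2s'."""
    UPHILL, PEERED, DOWNHILL = 0, 1, 2
    state = UPHILL
    for rel in rels:
        if rel == "s2s":
            continue                  # sibling links may appear in either segment
        if rel == "c2p":
            if state != UPHILL:
                return False          # climbing again after the top is a valley
        elif rel == "p2p":
            if state != UPHILL:
                return False          # at most one peer link, only at the top
            state = PEERED
        elif rel == "p2c":
            state = DOWNHILL
        else:
            raise ValueError(f"unknown relationship: {rel}")
    return True
```

For example, `["c2p", "p2p", "p2c"]` passes, while `["p2c", "c2p"]` is rejected because it descends and then climbs again.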
Valley Free Routing (1)




          
Valley Free Routing (2)




           
Problem and
Algorithm
           The Problem

   Given the AS graph (ASes as vertices with
    interconnecting edges), find the type-of-
    relationship between all adjacent ASes.
       Inferring ToR = Classifying edges.
                                   Peer
                                 Customer
                                 Provider



                            ?               ?




                      Provider
                        Peer
                      Customer
           Related Works

   Current relationship inference algorithms use
    one of two techniques:
       Using heuristic assumptions
         • Comparing AS degree to determine the “larger” AS.
       Optimizing some aspects of the ToR
        assignments
         • Minimizing number of paths that are not valley-free
         • Not allowing cycles in the resulting directed AS graph
       The Gap

 Using heuristic assumptions throughout the
  relationship inference process causes
  erroneous ToRs to spread over all
  inter-AS links.
 Optimization models fail to capture the true
  Internet hierarchy.
           Work Goal

 Improve on existing methods by providing a
  near-deterministic inference algorithm for
  solving the ToR problem.
 We use the Internet Core, a sub-graph that
  consists of the global top-level providers of
  the Internet.
       Their interconnecting edges are already
        classified.
           Near-Deterministic Inference

   Theoretically, given an accurate core with no
    relationship errors, the algorithm
    deterministically infers most of the remaining
    AS relationships using the AS-level paths
    relative to this core
       Without incurring additional inference errors!
   In real-world scenarios, where the core and AS-
    level paths can contain errors, the algorithm
    introduces minimal inference errors.
       Why Near-?

 For the remaining set of relationships that
  cannot be inferred deterministically, a heuristic
  inference method is deployed.
 This group is relatively small, so it is still
  possible to provide a strict bound on the
  inference error.
            Algorithm - Definitions

   Input
     S – a set of AS-level routing paths.
     G(V_G, E_G) – the graph whose vertices V_G represent all
      ASes and whose edges E_G are the interconnecting edges
      that need to be classified.
     Core(V_C, E_C) – the vertices and interconnecting
      edges that represent the core of G, which is
      assumed to contain all the top-level ASes.
   Output
       E_G – the edges of the input graph with votes for ToRs.
           Deterministic Algorithm
                                  Pre-processing
 Prior to starting the relationships inference
  algorithm, we infer S2S relationships.
 We use S2S data collected from CAIDA
       Obtained from IRR databases (RIPE, ARIN,
        APNIC).
           Deterministic Algorithm
                                                Phase 1
   Assuming that the input core consists
    of the global top-level ASes.
       Use the valley-free model of Internet
        routing.
   All paths that pass through the core
    are split into three segments:
       A segment of zero or more uphill C2P
        edges towards the core,
       At most one P2P edge in the core,
       A downhill segment of zero or more
        P2C edges from the core.
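
A minimal sketch of this phase, assuming paths are lists of AS numbers and the core is a set of AS numbers. The helper names, the vote table and the convention of storing each edge under the key (smaller ASN, larger ASN) are assumptions made for illustration, not the thesis implementation:

```python
from collections import Counter, defaultdict

def add_vote(votes, a, b, rel):
    """Record one vote for hop a -> b, stored on the edge key (min, max)."""
    if a > b:
        # Flip the direction so the label is relative to the stored key.
        a, b = b, a
        rel = {"c2p": "p2c", "p2c": "c2p"}.get(rel, rel)
    votes[(a, b)][rel] += 1

def phase1(paths, core, votes):
    """Vote on the edges outside the core of every path that touches the core."""
    for path in paths:
        in_core = [i for i, asn in enumerate(path) if asn in core]
        if not in_core:
            continue                              # left for the second phase
        first, last = in_core[0], in_core[-1]
        for i in range(first):                    # uphill, towards the core
            add_vote(votes, path[i], path[i + 1], "c2p")
        for i in range(last, len(path) - 1):      # downhill, away from the core
            add_vote(votes, path[i], path[i + 1], "p2c")

votes = defaultdict(Counter)
phase1([[10, 20, 1, 2, 30]], core={1, 2}, votes=votes)
```

In this toy path, ASes 1 and 2 form the core, so the hops 10-20 and 20-1 receive uphill votes and the hop 2-30 receives a downhill vote. Edges between core vertices are skipped because the core arrives already classified.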
           Deterministic Algorithm
                                                    Phase 2
   Paths that do not traverse the core fail
    to provide us with a direct method for
    classification.
       There are paths that partly overlap other
        paths that traverse the core.
   For each of the remaining paths:
       Edges that precede a C2P edge must
        reside in an uphill segment, and be of
        type C2P.
       Edges that follow a P2C edge must be in
        a downhill segment, and be of type P2C.
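
The two rules can be sketched as a single pass over one path. This assumes the path is given as per-hop labels in which hops left unclassified by the first phase are `None`; the function name and this representation are illustrative only:

```python
def propagate(labels):
    """Fill unknown hops (None) that are forced by already known hops."""
    out = list(labels)
    # Every hop before the last known c2p hop lies in the uphill segment.
    last_c2p = max((i for i, r in enumerate(out) if r == "c2p"), default=-1)
    for i in range(last_c2p):
        if out[i] is None:
            out[i] = "c2p"
    # Every hop after the first known p2c hop lies in the downhill segment.
    first_p2c = min((i for i, r in enumerate(out) if r == "p2c"), default=len(out))
    for i in range(first_p2c + 1, len(out)):
        if out[i] is None:
            out[i] = "p2c"
    return out
```

A hop that sits between a known c2p hop and a known p2c hop stays `None` here; neither rule determines its type.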
           Deterministic Algorithm
                                                 Voting
   The data we use might be noisy and reflect transient
    routing effects.
       Especially when performing relationships inference over
        a long time frame.
   To avoid incorrect inferences resulting from these
    effects, we use a voting technique:
       The above methods vote for the ToR of each traversed
        edge.
   Once the algorithm is finished, we count the votes and
    assign each edge the type whose relative
    vote count passes a given threshold.
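
A minimal sketch of the final count for one edge. The function name and the default threshold of 0.8 are placeholders chosen for illustration; the slides do not state the threshold that was actually used:

```python
from collections import Counter

def decide(edge_votes, threshold=0.8):
    """Return the winning type if its share of the votes reaches the threshold."""
    total = sum(edge_votes.values())
    if total == 0:
        return None                   # the edge was never voted on
    rel, count = edge_votes.most_common(1)[0]
    # threshold=0.8 is an assumed value, not taken from the thesis.
    return rel if count / total >= threshold else None
```

Edges for which this returns `None` are the ones that remain for the heuristic steps.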

        Non-Deterministic Algorithm

 The deterministic algorithm fails to
  classify several types of edges.
 We use heuristic assumptions to classify
  these edges.
         Non-Deterministic Algorithm
                                                Peers
   Edges that appear in paths that do
    not traverse the core, and reside
    between a c2p edge and a p2c
    edge.
     Each c2p or p2c edge should
      participate in at least one path that
      passes through the core.
     The path may have a p2p
      relationship between its two top-
      level vertices.
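
This heuristic can be sketched on the same kind of per-hop label list, with `None` marking a hop that is still unclassified. The function name and the representation are assumptions for illustration:

```python
def infer_peers(labels):
    """Label a single unknown hop between a c2p hop and a p2c hop as p2p."""
    out = list(labels)
    for i in range(1, len(out) - 1):
        if out[i] is None and out[i - 1] == "c2p" and out[i + 1] == "p2c":
            out[i] = "p2p"            # the edge joins the two top-level vertices
    return out
```

Only a single unknown hop is labelled. Two consecutive unknown hops are left alone, since at most one of them could be a peer link.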
          Non-Deterministic Algorithm
                                           Voting Ties
   Edges that have a similar number of votes for
    two or more types of relationships:
     The result of changes in the commercial
      relationship over the measurements period.
     More complex peering agreements that can
      cause the same edge to behave differently as
      seen from different view points in the Internet.
        • Internet Exchange Points.
   Compare AS degrees to resolve ambiguities.
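
The slides do not say how the degrees are compared. One possible rule, given purely as an assumption, treats the AS with the clearly larger degree as the provider and calls the edge a peering when the degrees are close; the function name and the `ratio` parameter are hypothetical:

```python
def break_tie(a, b, degree, ratio=2.0):
    """Classify hop a -> b from AS degrees (assumed rule, not from the thesis)."""
    if degree[b] > ratio * degree[a]:
        return "c2p"                  # b is much larger: a is the customer
    if degree[a] > ratio * degree[b]:
        return "p2c"                  # a is much larger: a is the provider
    return "p2p"                      # comparable sizes: treat as peers
```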
           Non-Deterministic Algorithm
                                           Valleys
   Edges might appear in non-
    valley-free paths.
     Result of valid paths that pass a
      malformed core,
     Or invalid paths that pass an
      accurate core.
   These invalid paths occur in only
    a small fraction of paths:
       less than 1% of the investigated
        paths per week, on average.
         Core Graph Construction

   We use three core construction methods that
    result in cores that vary in size and density:
     Greedy Max Clique
     Kmax-Core
     CAIDA Peers
         Core Graph Construction (1)

   Greedy Max Clique
     Tauro et al. proposed the Jellyfish model.
     The core is a clique of high-degree vertices.
     The first vertex in the core is the one with the
      highest degree.
     Vertices are processed in non-increasing degree order.
     A vertex is added to the core only if it forms a
      clique with the vertices already in the core.
     The resulting core is a clique but not necessarily
      the maximum clique of the graph.
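
The greedy construction can be sketched as follows, assuming the AS graph is given as a dictionary that maps each vertex to the set of its neighbours; the function name is illustrative:

```python
def greedy_clique(adj):
    """Greedily build a clique, visiting vertices in non-increasing degree order."""
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    core = []
    for v in order:
        # Keep v only if it is adjacent to every vertex already in the core.
        if all(u in adj[v] for u in core):
            core.append(v)
    return set(core)
```

Because vertices are only ever added, an early high-degree choice can block a larger clique elsewhere in the graph, which is why the result need not be the largest clique.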
          Core Graph Construction (2)

   Kmax-Core (kCore)
     Carmi et al. proposed the Medusa model.
     Use a k-pruning algorithm to decompose the
      Internet AS graph and extract a nucleus
        • The Kmax-Core, which is a very well connected globally
          distributed subgraph.
     This algorithm extracts a core by looking at the
      entire graph (global approach).
     The nucleus plays a critical role in BGP routing,
      since its vertices lie in a large fraction of the
      paths that connect different ASes.
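
A minimal sketch of k-pruning, assuming the same neighbour-set dictionary as input. It repeatedly removes vertices whose degree is below k, and the Kmax-Core is the last non-empty result as k grows. The function names are illustrative, and this is the textbook k-core procedure rather than the DIMES code:

```python
def k_core(adj, k):
    """Return the subgraph left after repeatedly pruning vertices of degree < k."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    while True:
        low = [v for v, nbrs in adj.items() if len(nbrs) < k]
        if not low:
            return adj
        for v in low:
            for u in adj.pop(v):
                if u in adj:
                    adj[u].discard(v)

def kmax_core(adj):
    """Return (kmax, vertices of the kmax-core)."""
    k, core = 0, adj
    while True:
        nxt = k_core(core, k + 1)
        if not nxt:
            return k, set(core)
        k, core = k + 1, nxt
```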
             Core Graph Construction (3)




Taken from

http://www.netdimes.org/
         Core Graph Construction (4)

   CAIDA Peers
     Constructed from ASes and edges that exhibit
      P2P relationship under the inference method of
      Dimitropoulos et al.
     Used the Automated AS ranking provided by
      CAIDA and constructed a graph that contains all
      the edges classified as P2P.
     Selected the largest connected component
      that contains some of the largest tier-1 ASes.
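
Selecting the largest connected component of the P2P edges can be sketched with a breadth-first search. The input format, a list of AS pairs, and the function name are assumptions; checking that the component contains known tier-1 ASes is left out:

```python
from collections import defaultdict, deque

def largest_component(edges):
    """Return the vertex set of the largest connected component."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:                          # breadth-first search from start
            v = queue.popleft()
            for u in adj[v]:
                if u not in comp:
                    comp.add(u)
                    queue.append(u)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best
```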
           Algorithm – Recall in brief

   Construct AS-level graph and extract the Core.
   Classify all edges in paths relative to the core:
       Uphill to the core.
       Downhill from the core.
   Classify all edges in the remaining paths that now have
    some classified edges.
   Count votes to decide on types.
   Classify remaining edges using heuristics:
       A single edge between a C2P edge and a P2C edge is probably P2P.
       Break voting ties using AS degree.
Experimental
Results
         Data Sources

   Combined data from RouteViews and DIMES
     Maximize the size and density of the topology.
     RouteViews collects BGP advertisements using
      several routers.
     DIMES performs ~2 million daily active
      traceroute measurements from hundreds of
      Agents.
   The raw DIMES data was filtered in order to
    reduce inference mistakes.
         Data Sources
                              Topology




On a weekly average, we filtered approximately 5,100 DIMES
edges that were measured only once, which is over 15% of the
edges measured by DIMES. Around half of these edges
appear in RouteViews.
         Sensitivity Analysis
                                    Core Construction




The smallest core, GMC, results in the lowest deterministic
inference percentage, while the largest core, CAIDA Peers,
has the highest percentage.
           Sensitivity Analysis
                             Core Construction
   kCore provides an excellent overall inference
    percentage.
       Over 95% deterministically inferred and around
        75% matching CAIDA.
   CAIDA Peers core seems to result in the best
    overall performance.
     However, almost all 6,000 edges marked as P2P
      in CAIDA are connected.
     This is very unlikely to be the case, and causes a
      bias.
           Size Sensitivity Analysis
                                     Robustness to Core Size




With more than 20 vertices in the core, the algorithm's classification success
and its similarity to CAIDA do not change significantly, while the number of
deterministically classified edges increases.
         Size Sensitivity Analysis
                             Non-Valley-Free paths




The increase in the number of deterministically classified
edges comes with an increase in the percentage of non-valley-
free paths.
         Time Sensitivity Analysis
                              Increasing Time Frame




Using data from a single week results in over 90% of the
edges being classified for all core types.
         Time Sensitivity Analysis
                             Matching CAIDA




At any time frame, the algorithms agree on over 92% of the
edges.
         Mistake Sensitivity Analysis
                          Heuristically Classified Edges




While the algorithm's performance decreases as we increase
the randomness of the core, the overall degradation is not as
high as one would expect.
        Mistake Sensitivity Analysis
               Type of Heuristically Classified Edges




As more errors are injected, the algorithm needs to use more
heuristics.
         P2P Analysis
                Validating the DIMES Promise




While on average the p2p relationships comprise 4-5% of the
total number of edges, they make up around 12% of the edges
that appear only in DIMES. Approximately 40% of the p2p
edges inferred by our algorithm do not appear in
RouteViews.
Conclusion
       Conclusion (1)

 The common weakness of previously proposed
  AS relationship inference algorithms is their
  lack of a guarantee on the inference errors
  introduced during the process.
 This work improves on existing methods by
  providing a near-deterministic algorithm that,
  given a classified error-free input core, does not
  introduce additional inference errors.
            Conclusion (2)

   The proposed algorithm provides accurate inferences
       Robust under changes in the core's size and creation
        technique.
       A core containing as few as 20 almost fully-connected
        ASes is sufficient for good inference results.
       Heuristic methods can still play an important role in
        inferring the remaining relationships.
   Using a single week's data, the algorithm runs for only
    about 2 hours and yields over 95% deterministically
    inferred relationships.
Thank You!
Backup Slides…
         Voting Threshold
                                    Validation




On average, over 94% of the edges have votes for exactly
one relationship type, and almost 99% of the edges have over
80% of the votes for a single relationship type.
Deterministic Algorithm
                     Phase 1
Deterministic Algorithm
                     Phase 2
Deterministic Algorithm
                     Voting
          Data Sources
                               Filtering
   The raw DIMES data was filtered in order to
    reduce inference mistakes and inclusion of false
    links:
     Included edges that were seen from at least two
      agents.
     Trimmed all traces that exhibit known traceroute
      problems.
        • Routing loops
        • Destination impersonation
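
A minimal sketch of the first two filters. It assumes each measurement is an (agent id, AS-level path) pair and that consecutive repeats of the same AS have already been collapsed; the function names are illustrative, and destination impersonation is not handled here:

```python
from collections import defaultdict

def has_loop(path):
    """True if some AS appears more than once in the path."""
    return len(set(path)) != len(path)

def filter_edges(traces, min_agents=2):
    """Keep undirected edges seen by at least min_agents distinct agents."""
    agents = defaultdict(set)
    for agent, path in traces:
        if has_loop(path):
            continue                          # drop traces with routing loops
        for a, b in zip(path, path[1:]):
            agents[frozenset((a, b))].add(agent)
    return {edge for edge, seen in agents.items() if len(seen) >= min_agents}
```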
           Internet Exchange Points
                                             Definition
   An Internet exchange point (IXP)
    is a physical infrastructure
    that allows different ASes to
    exchange traffic between them
       Edges connecting an IXP to
        adjacent ASes can exhibit
        different ToR depending on the
        AS-level path they participate in.
           Internet Exchange Points
                                           Analysis
   We identify edges between ASes and IXPs and
    create corresponding virtual edges
       A virtual edge is an edge that connects two
        ASes that have indirect peering via an IXP.
   The algorithm then infers relationships in the
    paths using these virtual edges
       Instead of using the original edges between the
        ASes and IXPs.
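
The substitution can be sketched as removing IXP vertices from a path, so that the two ASes on either side become directly adjacent. This assumes IXPs appear in the path as AS numbers from a known set; the function name is illustrative, and a path that starts or ends at an IXP simply loses that vertex:

```python
def collapse_ixps(path, ixps):
    """Replace AS - IXP - AS with a virtual AS - AS edge by dropping IXP vertices."""
    return [asn for asn in path if asn not in ixps]
```

For example, with 999 as an IXP, the path 10, 999, 20, 30 becomes 10, 20, 30, and the hop 10-20 is the virtual edge that the inference then classifies.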
         Sensitivity Analysis
                                    Comparing Cores




Less than 6% of the edges were differently classified using
two cores in each week. The difference between kCore and
GMC is much smaller.

						