Near-Deterministic Inference of AS Relationships


Udi Weinsberg
A thesis submitted toward the degree of Master of Science in
Electrical and Electronic Engineering
Under the guidance of Dr. Yuval Shavitt and Eran Shir.
Outline
Introduction and Theory
Problem and Algorithm
Experimental Results
Conclusion
Introduction and Theory
Introduction
Today's Internet consists of thousands of
networks administered by various
Autonomous Systems (ASes).
Large Provider – AS7018 – AT&T
Small Provider – AS1680 – NetVision
Educational Network – AS378 – ILAN
ASes are assigned one or more blocks of
IP prefixes and communicate routing information
to each other using the Border Gateway Protocol
(BGP).
Type-of-Relationship
ASes use a set of local policies for selecting the
best route for each reachable prefix.
These policies are based on the Type-of-
Relationship (ToR) that exists between ASes.
ToRs are used to calculate paths between
ASes.
ToRs are regarded as proprietary information.
Deducing them is an important yet difficult
problem.
Type-of-Relationship (2)
Three major commercial relationships between
neighboring ASes:
Customer-to-Provider (C2P)
Peer-to-Peer (P2P)
Sibling-to-Sibling (S2S)
Customer-to-Provider (C2P)
Customer AS pays a Provider AS for traffic
that is sent between the two.
Provider AS is usually larger than the customer.
[Figure: a provider connected to two customers by C2P edges]
Peer-to-Peer (P2P)
Two ASes freely exchange traffic between
themselves and their customers.
They do not exchange traffic from or to their providers
or other peers.
[Figure: two peers connected by a P2P edge]
Sibling-to-Sibling (S2S)
Two ASes administratively belong to the same
organization.
They freely exchange traffic between their providers,
customers, peers, or other siblings.
[Figure: two siblings connected by an S2S edge]
Valley Free Routing
BGP paths must comply with the following
Valley-Free hierarchical pattern:
An uphill segment of zero or more c2p or s2s
links,
Followed by zero or one p2p links,
Followed by a downhill segment of zero or more
p2c or s2s links.
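The valley-free pattern above can be checked with a small state machine. This is a minimal sketch, assuming each path is given as a list of edge-type strings in path order; the function name and the string labels are illustrative choices, not part of the original slides.

```python
def is_valley_free(tors):
    """Return True if the edge types (in path order) form a valley-free path.

    tors: list of 'c2p', 'p2p', 'p2c' or 's2s' strings (assumed encoding).
    """
    downhill = False  # becomes True after the p2p link or the first p2c link
    for tor in tors:
        if tor == "s2s":
            continue  # sibling links are allowed in both segments
        if tor == "c2p":
            if downhill:
                return False  # climbing again after the top is a valley
        elif tor == "p2p":
            if downhill:
                return False  # second p2p, or p2p after a p2c link
            downhill = True
        elif tor == "p2c":
            downhill = True
        else:
            raise ValueError(f"unknown relationship type: {tor!r}")
    return True
```

For example, a path with edge types c2p, c2p, p2p, p2c is accepted, while p2c followed by c2p is rejected.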
Problem and Algorithm
The Problem
Given the AS graph (ASes as vertices with
interconnecting edges), find the type-of-
relationship between all adjacent ASes.
Inferring ToR = Classifying edges.
[Figure: AS graph whose edges are to be classified as Peer, Customer or Provider; unclassified edges are marked "?"]
Related Work
Current relationship inference algorithms use
one of two techniques:
Using heuristic assumptions
• Comparing AS degrees to determine the “larger” AS.
Optimizing some aspects of the ToR
assignments
• Minimizing the number of paths that are not valley-free
• Not allowing cycles in the resulting directed AS graph
The Gap
Using heuristic assumptions throughout the
relationship inference process causes
erroneous ToRs to spread over all
interconnecting AS links.
Optimization models fail to capture the true
Internet hierarchy.
Work Goal
Improve on existing methods by providing a
near-deterministic inference algorithm for
solving the ToR problem.
We use the Internet Core, a sub-graph that
consists of the global top-level providers of
the Internet.
Their interconnecting edges are already
classified.
Near-Deterministic Inference
Theoretically, given an accurate core with no
relationship errors, the algorithm
deterministically infers most of the remaining
AS relationships using the AS-level paths
relative to this core,
without incurring additional inference errors!
In real-world scenarios, where the core and AS-
level paths can contain errors, the algorithm
introduces minimal inference errors.
Why Near-?
For the remaining set of relationships that
cannot be inferred deterministically, a heuristic
inference method is deployed.
This group is relatively small, so it is still
possible to provide a strict bound on the
inference error.
Algorithm - Definitions
Input
S – a set of AS-level routing paths.
G(V_G, E_G) – the vertices that represent all
ASes, and the interconnecting edges that need
to be classified.
Core(V_C, E_C) – the vertices and interconnecting
edges that represent the core of G, which is
assumed to contain all the top-level ASes.
Output
E_G – the edges of the input graph with votes for ToRs.
Deterministic Algorithm
Pre-processing
Prior to starting the relationship inference
algorithm, we infer S2S relationships.
We use S2S data collected by CAIDA,
obtained from IRR databases (RIPE, ARIN,
APNIC).
Deterministic Algorithm
Phase 1
Assume that the input core consists
of the global top-level ASes.
Use the valley-free model of Internet
routing.
All paths that pass through the core
are split into three segments:
A segment of zero or more uphill C2P
edges towards the core,
At most one P2P edge in the core,
A downhill segment of zero or more
P2C edges from the core.
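The Phase 1 split can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis code: paths are lists of AS numbers, the core is a set of AS numbers, and every vote is stored on a (customer, provider) pair so that uphill and downhill observations of the same edge accumulate together. The function name `phase1_votes` is hypothetical.

```python
from collections import Counter

def phase1_votes(paths, core):
    """Vote C2P for edges before the core and P2C for edges after it.

    Returns a Counter keyed by (customer, provider) pairs.
    Edges between the first and last core vertex are left untouched,
    since core edges are assumed to be classified already.
    """
    votes = Counter()
    for path in paths:
        core_idx = [i for i, asn in enumerate(path) if asn in core]
        if not core_idx:
            continue  # paths that miss the core are handled later
        first, last = core_idx[0], core_idx[-1]
        # Uphill segment: each AS is a customer of the next one.
        for i in range(first):
            votes[(path[i], path[i + 1])] += 1
        # Downhill segment: each AS is a provider of the next one.
        for i in range(last, len(path) - 1):
            votes[(path[i + 1], path[i])] += 1
    return votes
```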
Deterministic Algorithm
Phase 2
Paths that do not traverse the core fail
to provide us with a direct method for
classification.
However, some of these paths partly overlap other
paths that traverse the core.
For each of the remaining paths:
Edges that precede a C2P edge must
reside in an uphill segment, and be of
type C2P.
Edges that follow a P2C edge must be in
a downhill segment, and be of type P2C.
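The two Phase 2 rules can be sketched as a single pass over the remaining paths. This is an illustrative sketch under assumptions: `c2p` is a set of already classified (customer, provider) pairs, paths are lists of AS numbers, and the function name `phase2_votes` is hypothetical.

```python
from collections import Counter

def phase2_votes(paths, c2p):
    """Propagate known C2P/P2C edges along paths that miss the core.

    c2p: set of (customer, provider) pairs that are already classified.
    Returns a Counter of new votes keyed by (customer, provider).
    """
    votes = Counter()
    for path in paths:
        edges = list(zip(path, path[1:]))
        uphill = [i for i, (a, b) in enumerate(edges) if (a, b) in c2p]
        downhill = [i for i, (a, b) in enumerate(edges) if (b, a) in c2p]
        if uphill:
            # Everything before the last known C2P edge is also uphill.
            for a, b in edges[:uphill[-1]]:
                votes[(a, b)] += 1
        if downhill:
            # Everything after the first known P2C edge is also downhill.
            for a, b in edges[downhill[0] + 1:]:
                votes[(b, a)] += 1
    return votes
```

A single pass is shown for brevity; newly classified edges could be fed back in and the pass repeated until no new votes appear.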
Deterministic Algorithm
Voting
The data we use might be noisy and reflect transient
routing effects,
especially when performing relationship inference over
a long time frame.
To avoid incorrect inferences resulting from these
effects, we use a voting technique:
The above methods vote for the ToR of each traversed
edge.
Once the algorithm is finished, we count the votes and
assign each edge the type whose relative
vote count passes a given threshold.
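The vote-counting step can be sketched as below. The data layout (a dict from edge to per-type counts) and the default threshold of 0.8 are assumptions made for illustration; the slides do not state the threshold value used.

```python
def decide_types(votes, threshold=0.8):
    """Assign each edge the type whose share of votes passes the threshold.

    votes: dict mapping an edge to a dict {type: vote count}.
    Edges with no sufficiently dominant type map to None and are
    left for the heuristic stage.
    """
    decided = {}
    for edge, counts in votes.items():
        total = sum(counts.values())
        if total == 0:
            continue  # no votes at all, nothing to decide
        tor, top = max(counts.items(), key=lambda kv: kv[1])
        decided[edge] = tor if top / total >= threshold else None
    return decided
```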
Non-Deterministic Algorithm
The deterministic algorithm fails to
classify several types of edges.
We use heuristic assumptions to classify
these edges.
Non-Deterministic Algorithm
Peers
Edges that appear in paths that do
not traverse the core, and reside
between a c2p edge and a p2c
edge.
A c2p or p2c edge should
participate in at least one path that
passes through the core.
The path may have a p2p
relationship between its two top-
level vertices.
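The peer heuristic can be sketched as follows. This is an illustration under assumptions: `c2p` holds the (customer, provider) pairs classified so far, an edge counts as unclassified when it appears in `c2p` in neither direction, and the function name `infer_peers` is hypothetical.

```python
def infer_peers(paths, core, c2p):
    """Mark an unclassified edge as P2P when it sits between a c2p edge
    and a p2c edge on a path that does not traverse the core.

    Returns a set of frozenset({a, b}) peer edges.
    """
    peers = set()
    for path in paths:
        if any(asn in core for asn in path):
            continue  # only paths that miss the core are considered
        edges = list(zip(path, path[1:]))
        for i in range(1, len(edges) - 1):
            a, b = edges[i]
            if (a, b) in c2p or (b, a) in c2p:
                continue  # already classified
            prev_up = edges[i - 1] in c2p            # edge before is c2p
            nxt = edges[i + 1]
            next_down = (nxt[1], nxt[0]) in c2p      # edge after is p2c
            if prev_up and next_down:
                peers.add(frozenset((a, b)))
    return peers
```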
Non-Deterministic Algorithm
Voting Ties
Edges that have a similar number of votes for
two or more types of relationships:
The result of changes in the commercial
relationship over the measurements period.
More complex peering agreements that can
cause the same edge to behave differently as
seen from different view points in the Internet.
• Internet Exchange Points.
Compare AS degrees to resolve ambiguities.
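A degree comparison for breaking ties might look like the sketch below. The slides only say that AS degrees are compared; the specific rule used here (the higher-degree AS is taken to be the provider, equal degrees are treated as peers) is an assumption for illustration, as is the function name.

```python
def break_tie(a, b, degree):
    """Resolve an ambiguous edge (a, b) by comparing AS degrees.

    degree: dict mapping AS number to its degree in the AS graph.
    Assumed rule: the AS with the larger degree is the provider.
    Returns the relationship of a towards b.
    """
    if degree[a] < degree[b]:
        return "c2p"  # a looks like the customer
    if degree[a] > degree[b]:
        return "p2c"  # a looks like the provider
    return "p2p"      # equal degrees: treated as peers (assumption)
```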
Non-Deterministic Algorithm
Valleys
Edges might appear in non-
valley-free paths.
These are the result of valid paths that pass a
malformed core,
or invalid paths that pass an
accurate core.
These invalid paths occur in only
a small fraction of paths:
less than 1% of the investigated
paths per week, on average.
Core Graph Construction
We use three core construction methods, that
result in cores that vary in size and density:
Greedy Max Clique
Kmax-Core
CAIDA Peers
Core Graph Construction (1)
Greedy Max Clique
Tauro et al. proposed the Jellyfish model.
The core is a clique of high-degree vertices.
Vertices are sorted in non-increasing degree order.
The first vertex in the core is the one with the
highest degree.
A vertex is added to the core only if it forms a
clique with the vertices already in the core.
The resulting core is a clique but not necessarily
the maximum clique of the graph.
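The greedy construction can be sketched in a few lines. This assumes the AS graph is given as a dict from vertex to its set of neighbors; the function name is illustrative.

```python
def greedy_max_clique(adj):
    """Build a clique greedily, visiting vertices by non-increasing degree.

    adj: dict mapping each vertex to the set of its neighbors.
    A vertex joins the core only if it is adjacent to every vertex
    already in the core.
    """
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    core = []
    for v in order:
        if all(u in adj[v] for u in core):
            core.append(v)
    return set(core)
```

Because the highest-degree vertex is always taken first, the result depends on the degree order and can miss a larger clique that excludes that vertex.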
Core Graph Construction (2)
Kmax-Core (kCore)
Carmi et al. proposed the Medusa model.
Use a k-pruning algorithm to decompose the
Internet AS graph and extract a nucleus:
• The Kmax-Core, which is a very well connected, globally
distributed subgraph.
This algorithm extracts a core by looking at the
entire graph (global approach).
The nucleus plays a critical role in BGP routing,
since its vertices lie in a large fraction of the
paths that connect different ASes.
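The k-pruning idea can be sketched as repeated removal of low-degree vertices with increasing k, keeping the last non-empty result. This is a generic k-core decomposition written for illustration, assuming an adjacency dict of neighbor sets; it is not taken from the Medusa model's reference implementation.

```python
def kmax_core(adj):
    """Return (k_max, vertices of the k_max-core) of an undirected graph.

    adj: dict mapping each vertex to the set of its neighbors.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    if not adj:
        return 0, set()
    k, core = 0, set(adj)
    while adj:
        core = set(adj)  # the (k)-core that survived the previous round
        k += 1
        # Repeatedly remove vertices whose degree dropped below k.
        queue = [v for v in adj if len(adj[v]) < k]
        while queue:
            v = queue.pop()
            if v not in adj:
                continue  # already removed via an earlier queue entry
            for u in adj.pop(v):
                adj[u].discard(v)
                if len(adj[u]) < k:
                    queue.append(u)
    return k - 1, core
```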
Core Graph Construction (3)
[Figure taken from http://www.netdimes.org/]
Core Graph Construction (4)
CAIDA Peers
Constructed from ASes and edges that exhibit a
P2P relationship under the inference method of
Dimitropoulos et al.
We used the automated AS ranking provided by
CAIDA and constructed a graph that contains all
the edges classified as P2P.
We selected the largest connected component,
which contains some of the largest tier-1 ASes.
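Selecting the largest connected component of the P2P graph can be sketched as follows. The input format (a list of AS pairs classified as P2P) and the function name are assumptions for illustration.

```python
from collections import defaultdict

def largest_p2p_component(p2p_edges):
    """Return the vertex set of the largest connected component
    of the graph formed by the given P2P edges."""
    adj = defaultdict(set)
    for a, b in p2p_edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        component, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for u in adj[v]:
                if u not in component:
                    component.add(u)
                    stack.append(u)
        seen |= component
        if len(component) > len(best):
            best = component
    return best
```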
Algorithm – Recall in brief
Construct AS-level graph and extract the Core.
Classify all edges in paths relative to the core:
Uphill to the core.
Downhill from the core.
Classify all edges in remaining paths, that now have
some classified edges.
Count votes to decide on types.
Classify remaining edges using heuristics:
A single edge between C2P and P2C is probably a P2P.
Break voting ties using AS degree.
Experimental Results
Data Sources
Combined data from RouteViews and DIMES
to maximize the size and density of the topology.
RouteViews collects BGP advertisements using
several routers.
DIMES performs ~2 million daily active
traceroute measurements from hundreds of
agents.
The raw DIMES data was filtered in order to
reduce inference mistakes.
Data Sources
Topology
On a weekly average, we filtered approximately 5,100 DIMES
edges that were measured only once, which is over 15% of the
edges measured by DIMES. Around half of these edges
appear in RouteViews.
Sensitivity Analysis
Core Construction
The smallest core, GMC, results in the lowest deterministic
inference percentage, while the largest core, CAIDA Peers,
has the highest percentage.
Sensitivity Analysis
Core Construction
kCore provides an excellent overall inference
percentage
(over 95% deterministically inferred and around
75% matching CAIDA).
The CAIDA Peers core seems to result in the best
overall performance.
However, almost all 6,000 edges marked as P2P
in CAIDA are connected.
This is very unlikely to be the case, and causes a
bias.
Size Sensitivity Analysis
Robustness to Core Size
For more than 20 vertices in the core the algorithm classification success
and similarity to CAIDA do not significantly change, while the number of
deterministically classified edges increases.
Size Sensitivity Analysis
Non-Valley-Free paths
The increase in the number of deterministically classified
edges comes with an increase in the percentage of non-valley-
free paths.
Time Sensitivity Analysis
Increasing Time Frame
Using data from a single week results in over 90% of the
edges being classified for all core types.
Time Sensitivity Analysis
Matching CAIDA
At any time frame, the algorithms agree on over 92% of the
edges.
Mistake Sensitivity Analysis
Heuristically Classified Edges
While the algorithm's performance decreases as we increase
the randomness of the core, the overall degradation is not as
high as one would expect.
Mistake Sensitivity Analysis
Type of Heuristically Classified Edges
As more errors are injected, the algorithm needs to use more
heuristics.
P2P Analysis
Validating the DIMES Promise
While on average the p2p relationships comprise 4-5% of the
total number of edges, this goes up to around 12% of the edges
that appear only in DIMES. Approximately 40% of the p2p
edges inferred by our algorithm do not appear in
RouteViews.
Conclusion
Conclusion (1)
The common weakness of previously proposed
AS relationships inference algorithms is their
lack of guarantee on inference errors
introduced during the process.
This work improves on existing methods by
providing a near-deterministic algorithm that,
given a classified error-free input core, does not
introduce additional inference errors.
Conclusion (2)
The proposed algorithm provides accurate inferences.
It is robust under changes in the core's size and creation
technique.
A core containing as few as 20 almost fully-connected
ASes is sufficient for good inference results.
Heuristic methods can still play an important role in
inferring the remaining relationships.
Using a single week's data, the algorithm runs for only
about 2 hours and yields over 95% deterministically
inferred relationships.
Thank You!
Backup Slides…
Voting Threshold
Validation
On average, over 94% of the edges have votes for exactly
one relationship type, and almost 99% of the edges have over
80% of the votes for a single relationship type.
Data Sources
Filtering
The raw DIMES data was filtered in order to
reduce inference mistakes and inclusion of false
links:
Included edges that were seen from at least two
agents.
Trimmed all traces that exhibit known traceroute
problems.
• Routing loops
• Destination impersonation
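The two filtering rules can be sketched as below. The input format (tuples of agent identifier and the two AS numbers of an observed edge), the function names, and the simple loop test are assumptions for illustration; detecting destination impersonation is not covered here.

```python
from collections import defaultdict

def filter_edges(observations, min_agents=2):
    """Keep only edges that were seen from at least `min_agents` agents.

    observations: iterable of (agent_id, as_a, as_b) tuples.
    Returns a set of undirected edges as frozensets.
    """
    agents_per_edge = defaultdict(set)
    for agent, a, b in observations:
        agents_per_edge[frozenset((a, b))].add(agent)
    return {edge for edge, agents in agents_per_edge.items()
            if len(agents) >= min_agents}

def has_routing_loop(as_path):
    """Return True if an AS appears more than once in the path.

    Assumes consecutive repeats of the same AS were already collapsed.
    """
    return len(set(as_path)) != len(as_path)
```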
Internet Exchange Points
Definition
An Internet exchange point (IXP)
is a physical infrastructure
that allows different ASes to
exchange traffic between them.
Edges connecting an IXP to
adjacent ASes can exhibit
different ToRs depending on the
AS-level path they participate in.
Internet Exchange Points
Analysis
We identify edges between ASes and IXPs and
create corresponding virtual edges.
A virtual edge is an edge that connects two
ASes that have indirect peering via an IXP.
The algorithm then infers relationships in the
paths using these virtual edges,
instead of using the original edges between the
ASes and IXPs.
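Replacing AS–IXP edges with virtual edges can be sketched as a rewrite of each path. This assumes IXPs appear in paths under their own AS numbers and that the set of IXP AS numbers is known in advance; the function name is hypothetical.

```python
def use_virtual_edges(path, ixps):
    """Drop IXP hops from an AS path and report the virtual edges created.

    path: list of AS numbers; ixps: set of AS numbers that are IXPs.
    Returns (new_path, virtual_edges), where each virtual edge directly
    connects the two ASes that were joined through an IXP.
    """
    new_path, virtual = [], set()
    via_ixp = False
    for asn in path:
        if asn in ixps:
            via_ixp = True
            continue
        if via_ixp and new_path:
            virtual.add((new_path[-1], asn))
        via_ixp = False
        new_path.append(asn)
    return new_path, virtual
```

The inference phases then run on `new_path`, so votes land on the virtual edge rather than on the two AS–IXP edges.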
Sensitivity Analysis
Comparing Cores
Less than 6% of the edges were differently classified using
two cores in each week. The difference between kCore and
GMC is much smaller.