					Resilient Overlay Networks

     CS294-4 Presentation
       Nikita Borisov
        Sep 15, 2003
    Internet Routing Inefficient
• BGP is designed for scalability, sacrificing
• Link outages common, but routing tables
  take minutes to update
• Summarized data creates inefficient paths
• No response to congestion
       Network Redundancies
• Multiple paths exist between most hosts
  – Many are not advertised due to private peering
• Link outages lead to non-transitive connectivity
  – A and C can’t reach each other but B can reach
    them both
• Indirect paths often offer better performance
  – (though possibly violate AUPs)
                RON goals
• Fast failure detection and recovery
  – Seconds, not minutes
• Integration with application
  – Optimize routes for latency, throughput, etc.
• Fine-grained policy specification
  – E.g. keep commercial traffic off Internet2
           Overlay Network
• Small network - 3-50 nodes
• Continuous measurement of each pairwise link
• Connectivity/performance stats distributed
• Pick best path out of direct and indirect
  – Restrict search to one indirect hop
            Failure Detection
• Active monitoring
  – Send probes on each virtual link
  – One probe every 14s
  – Fast timeout probes if one is lost
• Detect failure in under 20s
  – Faster than any TCP timeout
  – Good enough for even human scale
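
A minimal sketch of the probing schedule described above. Only the 14 s probe interval and the under-20 s detection target come from the slide; the fast follow-up spacing (FAST_INTERVAL_S), the loss threshold (LOSSES_FOR_OUTAGE), and the LinkMonitor class are assumptions chosen for illustration, and probe timeouts are ignored.

```python
from dataclasses import dataclass

PROBE_INTERVAL_S = 14.0  # from the slide: one probe every 14 s
FAST_INTERVAL_S = 1.0    # assumption: spacing of follow-up probes after a loss
LOSSES_FOR_OUTAGE = 4    # assumption: consecutive losses that mark a link as down


@dataclass
class LinkMonitor:
    """Tracks probe outcomes for one virtual link (hypothetical helper)."""
    consecutive_losses: int = 0
    down: bool = False

    def on_probe_result(self, received: bool) -> float:
        """Record one probe outcome and return seconds until the next probe."""
        if received:
            # Any answered probe clears the loss streak and the outage flag.
            self.consecutive_losses = 0
            self.down = False
            return PROBE_INTERVAL_S
        self.consecutive_losses += 1
        if self.consecutive_losses >= LOSSES_FOR_OUTAGE:
            self.down = True
        # After a loss, probe again quickly instead of waiting a full interval.
        return FAST_INTERVAL_S


def worst_case_detection_s() -> float:
    """Link fails right after a good probe: wait one full interval for the
    first loss, then the remaining losses arrive at the fast spacing."""
    return PROBE_INTERVAL_S + (LOSSES_FOR_OUTAGE - 1) * FAST_INTERVAL_S
```

With these assumed constants, worst_case_detection_s() returns 17.0, which stays under the 20 s target on the slide.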
        Performance Metrics
• Estimate latency based on RTT of probes
  – Moving weighted average
  – Assume latency is symmetric
• Estimate loss rate based on probes received
  – Average of last 100 samples
• Estimate TCP throughput
  – Model TCP performance based on latency and
    loss rate
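
The three estimators above could be sketched roughly as follows. The smoothing weight alpha and the LinkStats class are assumptions, not taken from the slides. The throughput score uses the widely cited simplified TCP model, throughput proportional to sqrt(1.5) / (RTT * sqrt(p)); the slides only say that TCP is modeled from latency and loss rate, so the exact formula here is an assumption too.

```python
import math
from collections import deque


class LinkStats:
    """Per-link estimates built from probe results (hypothetical helper)."""

    def __init__(self, alpha: float = 0.9, window: int = 100):
        self.alpha = alpha                    # assumed weight on the old RTT estimate
        self.rtt_s = None                     # smoothed round-trip time, seconds
        self.outcomes = deque(maxlen=window)  # True = probe answered

    def record(self, rtt_s=None):
        """Record one probe; rtt_s is None when the probe was lost."""
        self.outcomes.append(rtt_s is not None)
        if rtt_s is None:
            return
        if self.rtt_s is None:
            self.rtt_s = rtt_s
        else:
            # Exponentially weighted moving average of the RTT.
            self.rtt_s = self.alpha * self.rtt_s + (1 - self.alpha) * rtt_s

    def latency_s(self):
        """One-way latency, assuming the path is symmetric (half the RTT)."""
        return None if self.rtt_s is None else self.rtt_s / 2

    def loss_rate(self) -> float:
        """Fraction of probes lost over the last `window` samples."""
        if not self.outcomes:
            return 0.0
        return 1 - sum(self.outcomes) / len(self.outcomes)

    def throughput_score(self) -> float:
        """Relative TCP throughput score; higher is better."""
        if self.rtt_s is None:
            return 0.0       # no probe has ever been answered
        p = self.loss_rate()
        if p == 0:
            return math.inf  # model is unbounded without loss
        return math.sqrt(1.5) / (self.rtt_s * math.sqrt(p))
```

The score is only meaningful for comparing paths against each other; it is not a bandwidth prediction in any particular unit.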
                Path Selection
• Always route around outages
• Application can optimize for latency, loss rate, or throughput
   – Throughput hard to optimize
   – Avoid bad-throughput routes instead
   – Exhaustively search all one-hop paths
• Introduce hysteresis to prevent “route flapping”
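
A sketch of the selection step for the latency metric: compare the direct path with every one-hop indirect path, skip paths with a failed link, and keep the current path unless a candidate is better by some margin. The function name best_path, the latency table layout, and the 5% hysteresis margin are assumptions for illustration.

```python
def best_path(src, dst, nodes, latency, current=None, hysteresis=0.05):
    """Pick the lowest-latency path among the direct and one-hop paths.

    latency maps (a, b) to the estimated latency of virtual link a -> b;
    a missing or None entry means the link is considered down.
    Returns (src, dst), (src, hop, dst), or None if nothing is usable.
    """
    def cost(path):
        total = 0.0
        for a, b in zip(path, path[1:]):
            link = latency.get((a, b))
            if link is None:
                return None  # one dead link makes the whole path unusable
            total += link
        return total

    # Exhaustive search: the direct path plus one candidate per intermediate node.
    candidates = [(src, dst)]
    candidates += [(src, hop, dst) for hop in nodes if hop not in (src, dst)]

    usable = []
    for path in candidates:
        c = cost(path)
        if c is not None:
            usable.append((c, path))
    if not usable:
        return None

    best_cost, best = min(usable)

    # Hysteresis: stay on the current path unless the best candidate beats it
    # by more than the margin. A current path that is down is always abandoned.
    current_cost = cost(current) if current else None
    if current_cost is not None and best_cost > current_cost * (1 - hysteresis):
        return current
    return best
```

Limiting candidates to one intermediate hop keeps the search linear in the number of nodes for each source/destination pair.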
             Routing Policy
• Policies specify which virtual links to use
• Separate routing tables per policy
• Packets classified with policy tag and
  routed accordingly
• Sample policy: exclusive clique
  – Only members of clique can use links between
    each other
  – E.g. Internet2 hosts
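
One way to picture the exclusive-clique policy is as a filter that builds a separate set of usable virtual links per policy tag. The tag names, host names, and both functions below are hypothetical; the slides do not describe the actual data structures.

```python
def link_allowed(a, b, packet_tag, clique, clique_tag="internet2"):
    """Exclusive clique: a link between two clique members carries only
    traffic tagged for the clique. Every other link is open to all tags."""
    if a in clique and b in clique:
        return packet_tag == clique_tag
    return True


def build_tables(links, clique, tags=("internet2", "commercial")):
    """Return one set of usable links per policy tag."""
    return {
        tag: {(a, b) for (a, b) in links if link_allowed(a, b, tag, clique)}
        for tag in tags
    }


# Hypothetical topology: U1 and U2 are clique members, ISP1 is not.
links = {("U1", "U2"), ("U1", "ISP1"), ("ISP1", "U2")}
tables = build_tables(links, clique={"U1", "U2"})
```

Here tables["commercial"] omits the U1-U2 link, so commercial traffic between the two members has to go through ISP1, while tables["internet2"] keeps all three links.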
• Two studies (RON1 and RON2)
• RON recovers from 100% (RON1) or 60%
  (RON2) of outages and high loss rates
• Routes around bad throughput failures
  – Doubles TCP throughput in 5% of all samples
• Reduces loss rate by 0.05 in 5% of samples
       Performance Problems
• RON worse in some cases
  – Measurement inaccuracies
  – Information propagation delays
  – Hysteresis
• But …
  – RON win in most cases
  – RON loss never very large
  – RON win, though, can be dramatic
• Probing traffic - grows O(N)
• Routing state traffic - grows O(N^2)
• Total BW consumed
  – 2.2Kbps with 10 nodes
  – 33Kbps with 50 nodes
• A limiting factor for scaling
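
As a back-of-the-envelope check on the scaling claim, the two bandwidth figures above can be fitted to a model with one linear and one quadratic term. The model kbps = a*N + b*N^2 and the function fit_overhead are assumptions for illustration; two data points cannot validate the model, only show what it would imply.

```python
def fit_overhead(n1, kbps1, n2, kbps2):
    """Solve kbps = a*N + b*N**2 through two (N, kbps) points via Cramer's rule."""
    det = n1 * n2**2 - n2 * n1**2
    a = (kbps1 * n2**2 - kbps2 * n1**2) / det  # linear (probing-like) coefficient
    b = (n1 * kbps2 - n2 * kbps1) / det        # quadratic (routing-state-like) coefficient
    return a, b


# The two figures from the slide: 2.2 Kbps at 10 nodes, 33 Kbps at 50 nodes.
a, b = fit_overhead(10, 2.2, 50, 33.0)
quadratic_share_at_50 = b * 50**2 / 33.0
```

Under this assumed model a is about 0.11 and b about 0.011, so the quadratic term would account for roughly 83% of the 50-node figure, which fits the point that overhead limits how far a RON can scale.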
• Is this overhead excessive?
  – Less than 10% of a broadband link
• What if RONs become more popular?
• Is using a RON “cheating”?
• Videoconferencing
• Cooperating ISPs
• Branch offices of companies
• Others?
