EXPLOITING SYSTEM DIVERSITY IN
PEER-TO-PEER PUBLISH-SUBSCRIBE SYSTEMS

Final Exam of Jay A. Patel (April 1, 2009)
    Outline
    Computing Today
- Continued growth of Internet services
  - search engines, webmail systems, social networks, blogging platforms, file-sharing systems, etc.
- Many use the publish-subscribe paradigm
  - Facebook, Twitter, YouTube
- Scaling via geographic distribution of infrastructure
  - requires moving around large sets of data
    - content distribution, log collection, archiving data

    System Diversity
- Successful distributed systems must provide high reliability, availability, performance, and scalability
- System diversity arises from variations in either
  - the resources or characteristics available to end hosts
    - e.g., network diversity
  - the requirements of individual end hosts themselves, usually arising due to end-user demands
    - e.g., interest diversity

    Many Forms of System Diversity
- Interest
  - subscription heterogeneity
- Network
  - bandwidth, location, connectivity, packet loss rates
- Platform
  - hardware and software
- Workload
  - time-of-day effects, flash crowds
- Other
  - availability

     Taxonomies of Publish-Subscribe Systems
- Subscription model
  - subject-based
  - content-based
- Architecture
  - client-server
  - peer-to-peer
- Delivery mechanisms
  - unicast ("1-to-1")
  - single-source multicast ("1-to-n")
  - convergecast ("n-to-1")
  - general-purpose multicast ("m-to-n")
- Example systems: RSS and e-mail listservs (contemporary delivery mechanisms), IP Multicast, RMTP [Paul96], ESM [Chu02], Gossip [Demers88], Bimodal Multicast [Birman99], AnySee [Liao06], SplitStream [Castro03], Scribe [Castro02], Bayeux [Zhuang01], FeedTree [Sandler05], Gryphon [Bhola03], Siena [Carzaniga01], SpiderCast [Chockler07], Corona [Ramasubramanian06], Cobra [Rose07], Sub-2-Sub [Voulgaris05]

    Thesis Focus
- We choose to focus on the convergecast and general-purpose multicast paradigms because
  - these paradigms remain the least explored with respect to system diversity
    - unicast is meant for simplicity of use
    - multicast is well studied
  - these paradigms are most relevant to today's computing
    - multi-site cloud computing, grid computing, etc.
    - improved delivery for RSS

    Thesis Statement
"We show that by directly addressing interest and network diversity as a first-class design principle, the scale and performance of peer-to-peer publish-subscribe systems can be improved."

    Thesis Contributions
- Confluence (n-to-1)
  - a system for lossless collection of data from multiple sources to a single sink
  - exploits both spatial and temporal network diversity
- RANS Framework
  - provides realistic and deterministic simulation at large scales
  - helps model network, interest, and availability diversity
- Rappel (m-to-n)
  - a system that leverages interest and network locality to improve fairness
  - exploits network and interest diversity

     Outline
     Confluence: A System for Lossless Multi-Source Data Collection
     Introduction
- An emerging paradigm: fetching large files from multiple sources to a central clearing house
  - multiple data centers (cloud computing)
  - PlanetLab servers
  - tele-immersive video gateways
  - grid computing
- An n-to-1 publish-subscribe system
  - multiple sources = multiple publishers
  - single sink = single subscriber

     The Objective
"To minimize the total time required to transfer the necessary files from the source nodes to the sink node."

- Currently, there are no known systems that optimize for this goal
  - end users generally use the direct transfer (1-to-1) strategy to fetch files

     Key Observation
- The diversity of connections amongst Internet hosts has been widely observed and falls into two categories
  - spatial diversity: different links have different bandwidth availabilities
  - temporal diversity: the available bandwidth at a single link varies over time
[Figure: exploiting natural parallelism. Hosts x and y each hold 5000 MB destined for host t. Sending directly, x's 1 MBps link needs 5000 s and y's 5 MBps link needs 1000 s; the x-y link offers 2 MBps (2500 s). Routing part of x's data through y, in parallel with the direct transfers, shortens the overall completion time.]

     Motivating Example
- The transfer process can be sped up by routing data via intermediate nodes
  - 37% of PlanetLab node pairs achieve better throughput by leveraging a third node

     System Assumptions
- Files can be subdivided into blocks
- All files (file blocks) are unique
  - no a priori replicas are present in the system
- Source node failures do not occur
  - if a source node fails, Confluence provides no resiliency guarantees
  - acceptable, since Direct Transfer suffers from the same problem

     Network Graph Model
[Figure: the network graph G. Each host x is expanded into vertices x+, x0, and x-; edges of capacity Cx+ and Cx- model the host's ISP upload and download limits, while edges of capacity cxy model the available bandwidth on the path between hosts x and y.]

- The network graph G supports both
  - asymmetric link capacities
  - asymmetric ISP connectivity limits

     Theoretical Solution (Spatial Only)
- Graph translation G^T (T is the transfer duration)
  - every capacity edge of G is scaled by T
- Add a super source s
  - connect s to each source node via a file edge; its capacity b_x equals the file size held at that node
  - the total capacity out of s thus equals the total number of blocks B
- Compute the maximum s -> t0 flow in G^T
  - corresponding flow: f^T
- Find the smallest time T* that can move all B blocks from s to t0 (binary search sketched below)
  - binary search on [0, Tmax)
  - corresponding flow: f^(T*)
  - optimal flow rates: f* = f^(T*) / T*
  - time complexity: O(log(VBC) · VE log(V^2/E))

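The binary search described above pairs with any standard max-flow routine. The following is a self-contained, illustrative C++ sketch, not the Confluence implementation: FlowGraph, build_gt, and smallest_duration are hypothetical names, and Edmonds-Karp is used only to keep the example short.

#include <algorithm>
#include <functional>
#include <queue>
#include <vector>

// Simple residual-graph max-flow (Edmonds-Karp); any max-flow algorithm works.
struct FlowGraph {
  struct Edge { int to; double cap; int rev; };
  std::vector<std::vector<Edge>> adj;
  explicit FlowGraph(int n) : adj(n) {}

  void add_edge(int u, int v, double cap) {
    adj[u].push_back({v, cap, static_cast<int>(adj[v].size())});
    adj[v].push_back({u, 0.0, static_cast<int>(adj[u].size()) - 1});  // residual edge
  }

  double max_flow(int s, int t) {
    double flow = 0.0;
    for (;;) {
      std::vector<int> pv(adj.size(), -1), pe(adj.size(), -1);
      std::queue<int> q;
      q.push(s);
      pv[s] = s;
      while (!q.empty()) {                           // BFS for an augmenting path
        int u = q.front(); q.pop();
        for (int i = 0; i < static_cast<int>(adj[u].size()); ++i) {
          const Edge& e = adj[u][i];
          if (e.cap > 1e-9 && pv[e.to] == -1) { pv[e.to] = u; pe[e.to] = i; q.push(e.to); }
        }
      }
      if (pv[t] == -1) return flow;                  // no augmenting path left
      double aug = 1e18;
      for (int v = t; v != s; v = pv[v]) aug = std::min(aug, adj[pv[v]][pe[v]].cap);
      for (int v = t; v != s; v = pv[v]) {
        Edge& e = adj[pv[v]][pe[v]];
        e.cap -= aug;
        adj[e.to][e.rev].cap += aug;
      }
      flow += aug;
    }
  }
};

// build_gt(T) must construct the translated graph G^T (every capacity edge of G
// scaled by T, plus the super source s and its file edges); its implementation
// depends on the measured network graph and is omitted here. B is the total
// number of blocks to move; the search returns the smallest feasible duration T*.
double smallest_duration(const std::function<FlowGraph(double)>& build_gt,
                         int s, int t0, double B, double t_max, double eps = 1.0) {
  double lo = 0.0, hi = t_max;
  while (hi - lo > eps) {
    double T = 0.5 * (lo + hi);
    FlowGraph gt = build_gt(T);
    if (gt.max_flow(s, t0) >= B) hi = T;   // feasible: all B blocks fit within duration T
    else                         lo = T;   // infeasible: need more time
  }
  return hi;   // T*; the optimal per-link rates follow as f* = f^(T*) / T*
}

In practice the coordinator would run this search on the measured graph G and then derive per-link rate directives from the resulting flow.
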
     System Design
- Maintaining the network graph G
  - via a coordinator
- Link information
  - staleness: large gap between probes (round robin)
  - cost: must send data packets to measure
  - our solution: k << n
    - each participating node has k peers
    - chosen uniformly at random by the coordinator
- Measuring cxy and Cx-
  - pathChirp [Ribeiro03]
  - PathBlast (brute force)
    - needed only once, during bootstrapping

     Transfer Plan
- The coordinator calculates the plan based on the theoretical solution
  - sends directives to nodes: how many blocks, to whom
  - what about the optimal flow rate?
- First approach: flow control system [Mehra03]
  - subpar performance with multiple flows
  - temporal diversity galore
- Better approach: dynamic adaptation at the application level

     Dynamic Adaptation
- Data is pushed from senders to all receivers simultaneously
- Periodic recomputation
  - update the network graph G with new state information
    - the number of residual file blocks: b'_x
    - if r_xy > f*_xy * (1 - slack) then c_xy = max(c_xy, r_xy); else c_xy = max(c_xy / 2, r_xy)
      (r_xy: measured flow rate; f*_xy: optimal flow rate; slack: prevents hysteresis; see the sketch below)
  - the coordinator recalculates the transfer plan periodically
  - state inconsistency: final_computation flag

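A minimal sketch of the capacity-update rule above, with illustrative names; the default slack value shown is an assumption.

#include <algorithm>

// Update the capacity estimate c_xy for link x->y after a recomputation period.
// r_xy is the measured flow rate, f_star_xy the optimal rate from the last
// transfer plan, and slack keeps small shortfalls from triggering hysteresis.
double update_capacity(double c_xy, double r_xy, double f_star_xy,
                       double slack = 0.05 /* assumed value */) {
  if (r_xy > f_star_xy * (1.0 - slack))
    return std::max(c_xy, r_xy);       // link met its target: keep or raise the estimate
  return std::max(c_xy / 2.0, r_xy);   // link underperformed: decay the stale estimate
}
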
      Exploiting Block Replication
- Replication naturally occurs during the transfer process
  - e.g., node y holds some blocks originating at node x
  - purge-immediately policy
- "Tag" blocks with their origin node and a unique ID (sketched below)
- File replication edge
  - conservation of blocks
- If the solution uses a file replication edge
  - REPLICATED_BLOCKS directive
  - lets the origin node retransmit
- Note: threshold for replication

[Figure: the translated graph with a file replication edge; the super source connects to node y with an extra edge for the b_(y->x) blocks that originated at x but are held at y, alongside the residual file edges b'_x and b'_y.]

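An illustrative sketch of how tagged blocks and the coordinator directive might be represented; apart from REPLICATED_BLOCKS, every name below is an assumption rather than the actual Confluence message format.

#include <cstdint>
#include <vector>

// Each block carries its origin so replicas created in transit stay attributable.
struct BlockTag {
  uint32_t origin_node;   // node at which the block originated
  uint64_t block_id;      // unique ID of the block within that origin's file
};

// Directive types sent by the coordinator as part of a transfer plan.
enum class Directive {
  SEND_BLOCKS,            // assumed name: ordinary "send these blocks to this peer" directive
  REPLICATED_BLOCKS       // the plan uses a file replication edge for these blocks
};

struct CoordinatorDirective {
  Directive type;
  uint32_t destination;            // peer the blocks should be sent to
  std::vector<BlockTag> blocks;    // which (origin, id) pairs the directive covers
};
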
     Experimental Methodology
- Implemented using the ns2 network simulator
  - TCP CUBIC as the transport-layer protocol
- PlanetLab topology
  - trace measurements collected via S3 [Yalagandula06]
- Up to 100 source nodes
  - 100 MB file per source node
- Confluence parameters
  - number of neighbors: k = 10
  - recomputation interval: p = 15
  - replication enabled with the purge-immediately policy

      Direct Transfer: Microbenchmark
[Plot: microbenchmark of Direct Transfer with different connection pool sizes (e.g., pool of 10 vs. pool of 49 parallel connections).]

- Maximizing the number of parallel connections is the best approach
  - connections to poorly-connected sources start earlier

        Direct Transfer: Macrobenchmark
- 100 total nodes
  - 50 runs: in run #i, node #i acts as the sink node
- First result: fetch from the first 50 nodes (49 source nodes)
- Second result: fetch from all nodes (99 source nodes)
- y-axis = ratio of the second result to the first result

- Direct Transfer scales well with an increasing number of source nodes
  - well-connected nodes are better able to exploit parallelism

     Confluence vs. Direct Transfer
- 50 participating nodes, 50 different runs
  - in run #i, node #i acts as the sink node; the other nodes act as source nodes
  - each sink node fetches from all other participating nodes

- Confluence performs better than Direct Transfer
  - 70% of runs perform better with Confluence on a planetary scale
  - 90% of runs perform better with Confluence on a continental scale

     Overheads in Confluence
- Measuring the network graph G
  - measurements may be inaccurate, stale, or both
- The k peers may not be able to saturate the sink
- Delayed start
  - metadata must be collected by the coordinator, the solution calculated, and the transfer plan directives sent
- State inconsistency
  - the final set of blocks, sent directly to the coordinator, may take some time to finish
- Tracking each block

     Confluence vs. Direct Transfer II
- Confluence performs well on different planetary-scale topologies
  - all nodes perform better in 3 topologies, 90% in one, and 70% in another
- Confluence excels in scenarios with a small number of source nodes
  - 100% under 50 nodes, 80% with 75 nodes, 70% with 100 nodes

     Related Work
- CoBlitz
  - uses PlanetLab nodes to improve 1-to-1 transfers
- GridFTP
  - tool to move around large sets of data on the grid
- BitTorrent [Cohen03], CDNs [Akamai]
  - multiple replicas of data available
- Miscellaneous
  - estimating bandwidth: pathChirp [Ribeiro03], PTR/IGI [Hu03], Pathload [Jain02], pathrate [Dovrolis04], Pathneck [Hu04], etc.
  - TCP for long fat networks (LFNs): High-Speed TCP [Chase01], TCP BIC [Xu04], TCP CUBIC [Rhee05], FAST TCP [Wei06], TCP-Illinois [Liu06], etc.

     Outline
     RANS Framework: Realistic and Deterministic Simulation at Large Scales
     Motivation
- Need to study system diversity at large scales
  - network diversity: end-to-end latency fluctuations
  - interest diversity: subscription correlation
- System deployment is labor-intensive and limited
  - PlanetLab usually has only about 400 accessible nodes
- Experiments are not replayable
- Simulations provide an alternative, but are not realistic

     Introduction
- RANS objectives
  - Realism
    - simulation results should match deployment observations
    - simulations should run the same code as an actual implementation
  - Deterministic replay
    - an unmodified application should yield the same result when provided with identical input as a previous execution
  - Large scale
    - ability to simulate several thousand end nodes
    - selective-granularity simulation

     Application Programming Interface
- An application interfaces with the EventManager and the TransportManager (a hypothetical sketch follows below)
- An application can run multiple protocols

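The slides name the two interfaces but not their methods, so the following is a hypothetical C++ sketch of the programming model only; every method signature below is an assumption. The point it illustrates is that a protocol written against these two interfaces can run unchanged over real sockets or inside the simulator.

#include <chrono>
#include <functional>
#include <string>

struct EventManager {        // schedules application timers and events
  virtual void schedule(std::chrono::milliseconds delay,
                        std::function<void()> callback) = 0;
  virtual ~EventManager() = default;
};

struct TransportManager {    // sends messages and delivers incoming ones to a handler
  virtual void send(const std::string& dest, const std::string& payload) = 0;
  virtual void on_receive(
      std::function<void(const std::string& src, const std::string& payload)> handler) = 0;
  virtual ~TransportManager() = default;
};

// A protocol only touches the two interfaces, so the same code can run over
// boost::asio sockets or inside the trace-based simulator.
class PingProtocol {
 public:
  PingProtocol(EventManager& ev, TransportManager& tr) : ev_(ev), tr_(tr) {
    tr_.on_receive([](const std::string&, const std::string&) { /* handle reply */ });
    ev_.schedule(std::chrono::seconds(1), [this] { tr_.send("peer-0", "ping"); });
  }
 private:
  EventManager& ev_;
  TransportManager& tr_;
};
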
     Events
     Messages
     Implementation
- Sockets implementation
  - single-threaded, uses boost::asio
- Trace-based simulator
  - Overnet churn trace

     Topology Manager
- Goal: realistic end-to-end latency fluctuations
  - problem: artificial topology, limited scale of trace data
  - solution: topology fitting
- PlanetLab RTT trace with fluctuations
  - 226 end hosts, over 4 hours
  - continuous fluctuations between node pairs (median latency)
- Internet AS topology
  - 20062 stub networks, 175 transit networks, and 8279 transit-and-stub networks
- Topology fitting via trial and error (a simplified sketch follows below)
  - match simulator-generated latencies with PlanetLab median latencies
    - the first 10% of inter-AS links: latency between 0 ms and 4 ms
    - the next 30% of inter-AS links: latency between 4 ms and 30 ms
    - the final 60% of inter-AS links: latency between 30 ms and 115 ms
  - map each simulator node pair to a random PlanetLab node pair (for fluctuations)

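A simplified sketch of the percentile-based latency assignment above; the bucket boundaries come from the bullets, while the link ordering and the uniform sampling within each bucket are assumptions.

#include <cstddef>
#include <random>
#include <vector>

// Assign a base latency (in ms) to each inter-AS link according to its percentile
// bucket: 0-4 ms for the first 10%, 4-30 ms for the next 30%, 30-115 ms for the rest.
// A fixed seed keeps the assignment deterministic, which matters for replay.
std::vector<double> assign_link_latencies(std::size_t num_links, unsigned seed = 1) {
  std::mt19937 rng(seed);
  std::vector<double> latency_ms(num_links);
  for (std::size_t i = 0; i < num_links; ++i) {
    double pct = static_cast<double>(i) / static_cast<double>(num_links);  // assumes links are pre-ordered
    double lo, hi;
    if (pct < 0.10)      { lo = 0.0;  hi = 4.0;   }   // first 10% of inter-AS links
    else if (pct < 0.40) { lo = 4.0;  hi = 30.0;  }   // next 30%
    else                 { lo = 30.0; hi = 115.0; }   // final 60%
    latency_ms[i] = std::uniform_real_distribution<double>(lo, hi)(rng);
  }
  return latency_ms;
}
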
     Topology Fitting Results
- The latencies modeled by RANS closely match the latencies experienced within PlanetLab

     Simulation vs. PlanetLab:
     Rappel Per-Feed Dissemination Tree
- Experiment: 1 publisher, 250 subscribers, 4 hours, 1 update every minute

- The close match of results validates the RANS framework
  - 95% of nodes consistently receive updates in under 0.5 seconds

     Simulation vs. PlanetLab II
- The close match of results validates the RANS framework
- The stretch ratio w.r.t. the underlying coordinate space is lower

     Related Work
- Network simulators
  - ns2, QualNet, OPNET
- Emulators
  - ModelNet, Emulab
- Testbeds
  - PlanetLab
- Application-level network simulators
  - p2psim, GnutellaSim
- Programming libraries/languages for rapid deployment
  - Macedon, P2

     Outline
     Rappel: Using Locality to Improve Fairness in Publish-Subscribe Systems
     System Goals
- Peer-to-peer delivery for RSS
  - low publisher and subscriber overhead
  - optimizations due to the m-to-n delivery paradigm
- Client fairness
  - zero noise
  - load should scale with the number of subscriptions
    - exploit interest diversity
- Real-time update dissemination
  - via a low stretch ratio
  - exploit network diversity

          Design Overview
[Figure: example overlay of nodes; a global friends overlay connects nodes close in interest and network proximity, and each feed (e.g., publisher P1) has its own dissemination tree joined only by its subscribers.]

- A global control plane
  - used to locate other nodes ("friends") close in interest and network proximity
  - the "friends overlay"
- A multicast dissemination tree per feed
  - can be joined by contacting any active node
  - only subscribers join a feed's dissemination tree
    - eliminates noise

     Techniques Utilized in Rappel
- Friends overlay
  - utility-based selection of friends (an illustrative utility sketch follows below)
    - exploit interest and network locality using Bloom filters and network coordinates
  - discovering new nodes via gossip
    - periodic audits
- Per-feed dissemination trees
  - primitives based on network coordinates
  - periodic rejoin
  - push-pull of updates

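An illustrative utility function combining Bloom-filter interest overlap with network-coordinate distance; the slides do not give Rappel's exact formula, so the filter size, coordinate dimensionality, and weights below are assumptions.

#include <array>
#include <bitset>
#include <cmath>
#include <cstddef>

constexpr std::size_t kBloomBits = 1024;          // assumed filter size
using BloomFilter = std::bitset<kBloomBits>;      // digest of a node's subscriptions
using Coord = std::array<double, 3>;              // assumed 3-d network coordinates

double coordinate_distance(const Coord& a, const Coord& b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += (a[i] - b[i]) * (a[i] - b[i]);
  return std::sqrt(sum);
}

// Higher is better: reward shared Bloom-filter bits (interest overlap) and
// penalize estimated latency (coordinate distance). Weights are assumptions.
double friend_utility(const BloomFilter& mine, const BloomFilter& theirs,
                      const Coord& my_pos, const Coord& their_pos,
                      double interest_weight = 1.0, double latency_weight = 0.01) {
  double overlap = static_cast<double>((mine & theirs).count());
  return interest_weight * overlap - latency_weight * coordinate_distance(my_pos, their_pos);
}

Each node would periodically audit its current friends against newly gossiped candidates and keep the highest-utility set.
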
     Exploiting Network Diversity
- Experiment: 1 publisher (at UIUC), 25 subscribers (distributed across the USA)

- Rappel's per-feed dissemination trees exploit network diversity

     Exploiting Interest Diversity
- Experiment: 250 feeds, 5582 subscribers (simulation only)

- 91% of clients are perfectly covered; no "wasted" friends
- Rappel's friendship overlay effectively exploits interest diversity

     Outline
     Concluding Remarks
"We show that by directly addressing interest and network diversity as a first-class design principle, the scale and performance of peer-to-peer publish-subscribe systems can be improved."

- Confluence exploits spatial and temporal network diversity for a publish-subscribe system based on convergecast delivery
- The RANS framework provides realistic and deterministic simulation to study system diversity at large scales
- Rappel exploits both interest and network diversity via the use of locality to improve fairness and system performance

				