Stream Based Overlay Networks

Shared by: liaoqinmei
Categories
Tags
-
Stats
views:
0
posted:
11/8/2012
language:
Unknown
pages:
34
Document Sample
scope of work template
							               Network-Aware
            Operator Placement for
         Stream-Processing Systems



          Peter Pietzuch, Jonathan Ledlie, Jeffrey Shneidman,
            Mema Roussopoulos, Matt Welsh, Margo Seltzer
                           hourglass@eecs.harvard.edu

                              Systems Research Group
                    Division of Engineering and Applied Sciences
                                  Harvard University
November 12                     ICDE 2006 – Atlanta
        Large-Scale Stream-Processing

Many geographically distributed data sources
      e.g., sensors, network routers, RFID tag readers, …
 • High volume of real-time stream data

Many users, submitting individual stream queries
 • Queries use the Internet for stream transport

Queries include operators for stream-processing
      e.g., join, filter, aggregate, XPath, image analysis, …
 • Operators require nodes for execution
 • In-network processing can often reduce data volume

                                                                 2
Stream-Based Overlay Network


      transform


                  aggregate




                               SBON
                               Node

                               Data
                               Source

                               User
                                 3
              Operator Placement Problem

How do you map operators to overlay nodes?

Efficiency
  • Node and network resources are limited and shared
  • Operator placement must be network-aware
      Consider link latency, bandwidth, congestion, jitter, …
      Filter and aggregate data close to sources
Scalability
  • Must scale to many sources, overlay nodes and queries
  • No global view of the system
Adaptability
  • Resource conditions change over time
                                                                 4
                    Contributions

Stream-Based Overlay Network (SBON)
 • Generic layer between network and stream-processing apps
 • Shields applications from network complexity

Operator placement using a metric cost space
 • Decentralized framework for minimizing network impact
 • Relaxation placement algorithm for operator placement
 • Adaptive to change in network conditions

Deployment of SBON and sample applications on PlanetLab


                                                           5
                        Outline

Introduction and Background

Operator Placement Goals

Network-Aware Operator Placement
 • Cost Space
 • Relaxation Placement Algorithm
 • Evaluation Results

Related and Current Work

Summary
                                    6
            Operator Placement Goals

Two conflicting optimization goals:


1. Global system performance with concurrent queries
      Minimize network usage
      Balance node and network load
             Minimize global network usage

2. Individual query performance
      Minimize data delay
      Maximize stream throughput



                                                       7
                           Network Usage

                      Datarate = 50 MB/s
                      Latency = 100 ms


                                               Datarate = 50 MB/s
             A                                  Latency = 30 ms
              Datarate = 50 MB/s
               Latency = 40 ms B
                                           Datarate = 10 MB/s
                                           Latency = 200 ms = 10 MB/s
                                                    Datarate
Datarate = 50 MB/s                                    Latency = 80 ms
 Latency = 50 ms

                     Datarate = 75 MB/s                      In-flight Traffic:
                           Datarate =
                      Latency = 20 ms75 MB/s              ∑ Datarate * Latency
                            Latency = 100 ms
                                                                  = 17 MB
                                                                 = 5.8 MB
                                                                             8
     Network-Aware Operator Placement

Perform operator placement in a decentralized fashion
 • Need information about data rate and latency

But measuring network metrics is expensive
 • All pairs latency measurements are O(n2)
 • Network latencies change over time
 • No global knowledge of measurements

Idea: Approximate optimal query with a cost space [NetDB’05]
 1. Build metric cost space to encode current network latencies
 2. Find query with minimal network usage in cost space
 3. Map query back to physical Internet nodes and instantiate
                                                                  9
                        Cost Space

Embed latency measurements into a metric space
 • Assign each SBON node a coordinate in a cost space
 • Euclidean distance ≈ network latency
 • Vivaldi [MIT], Big Bang [Tel Aviv], Lighthouses [Cambridge]
     Repeated measurements to refine local coordinate
Advantages
 • Mathematical model for using
   geometric algorithms
 • Optimization decisions
   without global knowledge
 • Adaptive to change


                                                                 10
               Relaxation Placement




Find a location for an operator that reduces network usage
                                                         11
                Relaxation Placement


          Latency




                Datarate




Use spring relaxation technique to find best location
  • Spring extension ≈ latency
                                                        12
  • Spring constant ≈ data rate
                Relaxation Placement




Use spring relaxation technique to find best location
  • Springs “relax” to low energy state, minimizing network usage
                                                               13
  • Dynamically adapts to changes in cost space
                Relaxation Placement




Uses nearest k-neighbor search for mapping of coordinates
 • Interesting problem in decentralized context
                                                              14
      Geometric routing [HUJI], DHT range queries [UCB], …
                Relaxation Placement

Any SBON node can perform the placement for a new query
  • Local computation without global state
      Inputs are coordinates of nodes and data rates in query
  • Supports placement of arbitrary complex queries
      Model multiple queries as networks of spring


Each node is then responsible for the operators it is hosting
  • Periodically re-execute Relaxation placement
  • Dynamically migrate operator to reflect new placement
      Adapts to changes in latency and data rate




                                                                 15
                   Simulation Setup

Discrete event simulator to evaluate placement algorithms
 • GATech transit-stub topology with 1550 nodes
      10 transit domains and 150 stub domains
      Realistic Internet routing tables
 • 1000 queries with 5 random endpoints
 • Comparison of Relaxation placement
   to 4 other algorithms                          1KB/s



    Optimal      Exhaustive search
    Producer     Common strategy
    Consumer     Central data warehouse
    Random       Worst case
                                                            16
                                         Global Network Usage
                  100

                        90

                        80
Percentage of Queries




                                                                                              Avg. Network
                                                                           Placement
                        70                                                                       Usage
                                                                           Algorithm
                                                                                                Penalty
                        60
                                                                         Optimal (NU)              0%
                        50
                                                                         Relaxation              15%
                        40
                                                                         Producer                43%
                        30
                                                                         Consumer                60%
                        20
                                                                         Random                  81%
                        10

                        0
                        1000   1500   2000   2500   3000   3500   4000   4500   5000   5500
                                             Network Usage (in KB)


         • Relaxation placement performs close to Optimal                                                    17
                                                   Application Delay Penalty
                   100

                        90
                                                                                          Placement     Avg. Delay
Percentage of Queries




                        80
                                                                                          Algorithm      Penalty
                        70
                                                                                         Optimal (NU)     13%
                                   Consumer




                        60
                                                                                         Relaxation       24%
                        50                                                               Producer         75%
                        40                                                               Consumer           0%
                                                          Delay penalty:
                        30                                                               Random           76%
                        20                                 Longest path delay
                                                               IP delay
                        10

                        0
                            -50%              0%    50%    100%   150%     200%   250%   300%   350%
                                                              Delay penalty
       • Consumer has smallest delay penalty
       • Relaxation has low delay penalty for an overlay network                                                 18
                 Operator Migration




7 SBON nodes shown in the cost space over several hours
 • Query with one migrating aggregation operator      19
         Operator Migration on PlanetLab




                         Relative Improvement in Network Usage
                                                                   Improved
• 48 concurrent
                                                                 Network Usage
queries on 130 nodes
• ½ of the queries
could migrate
• Same initial place-
ment for migrating and
non-migrating queries
• Change in network                                                                  Worse
usage of migrating                                                               Network Usage
queries after 5 hours
                                                                             Query Pair Number


• Migration decreased network usage for 75% of queries
      17% less network usage and 11% lower application delay                                    20
                         Related Work

Borealis [MIT, Brown, Brandeis] , Medusa [MIT], Gates [Ohio]
  • Focus on high-availability and load management
  • Wide-area operator placement specified by user

SAND [Brown] , PIER [UCB]
  • Operator placement at edge (prod/cons) or in-network
  • Exploit DHT routing paths for operator placement
        Can lead to poor placement efficiency [IPTPS’05]


IrisNet [Intel]
  • Hierarchical placement following DNS structure

                                                               21
                      Current Work

SBON prototype deployment on PlanetLab
 • Java-based, event-driven implementation with 25K loc
 • Implementation of 3 stream-processing applications
     Borealis stream-processing engine
     PlanetLab network monitoring
     Real-time weblog analysis


Network-aware stream query optimization
 • Operator decomposition: Clustering in cost space
 • Operator reuse: Bounded geometric search in cost space
 • Query reliability: Trade-off between data fidelity & replication


                                                                 22
                         Summary

Large-scale stream applications need new systems support
 • SBON: Infrastructure for stream-processing applications
 • Provides network-aware stream query optimization
Cost space approach for query optimization
 • Metric space for decentralised optimization decisions
 • Express query optimization as geometric problem
Relaxation placement algorithm for operator placement
 • Scalable placement decisions reducing network usage
 • Continuous optimization as network conditions change


                Thank You. Any Questions?
                Peter Pietzuch <prp@eecs.harvard.edu>        23
  Backup Slides




Backup Slides


                  24
                 3. Operator Reuse




Share operators between overlapping sub-queries
 • Use cost space to bound search effort for reuse   25
                 Spring Relaxation

                     S
                                     P2

                                P1

Network of springs tries to minimize potential energy E
 F=½•k•s                   where k is the spring constant
                           and s is the spring extension
 ∑ E= ∑ F • s
    = ∑ ½ • k • s2         where E is the potential energy


 ∑ [DR • Lat]2             Cost function for placement
                                                            26
                 Previous Research

 What if a user just wants asynchronous event notification?
Hermes: An event-based middleware architecture
 • Self-organizing network of event brokers
 • Scalable, content-based routing of events using a DHT
 • Programming language integration with static type-checking


 What if the user is interested in complex event patterns?
DistCED: A system for distributed event pattern detection
 • Idea: Use extended FSA for pattern detection
 • Detectors are factorizable and have bounded resource usage

                                                            27
Cost Space Convergence


             Convergence after 30min with
             116 North American PL nodes




                                     28
               Resource Contention


Transit-domains more
                               300
popular for placement          250

 • Traffic goes there anyway   200
                                     Optimal
 • Enable transit domains      150   Placement
   for operator placement      100

                               50
Maximum number of                0

placed operators               300

                               250

 • Spreading the load          200
                                     Relaxation
 • “Power of 2 choices”        150
                                     Placement
                               100

                                50

                                 0
                                                  29
Latency Samples on PlanetLab




                               30
          Transit-Stub in Cost Space




• 1550-node transit stub topology in latency space
• Transit domains at center; stub domains at edges
                                                     31
               Cost Space with Load

Latency/Load cost space
 • 2 dimensions for latency
 • 1 dimension for load (with weighting function)




                                                    32
Relaxation Placement Algorithm




                                 33
 Scratch




Scratch


           34

						
Related docs
Other docs by liaoqinmei
WSSB Learning to Self Medicate
Views: 0  |  Downloads: 0
Out of School Club
Views: 0  |  Downloads: 0
Statements
Views: 143  |  Downloads: 0
the survey presentation
Views: 0  |  Downloads: 0
Individual Differences
Views: 77  |  Downloads: 0