Massive Scale-out of Expensive Continuous Queries
Erik Zeitler and Tore Risch
Uppsala Database Laboratory
Uppsala University
Outline
1. Introduction
2. Stream splitting strategies for scale-out
3. Evaluating stream splitting strategies
4. Cost model and heuristic
5. Energy efficiency
6. Related work
7. Conclusions and future work
31 Aug 2011 Erik Zeitler and Tore Risch 2
DSMS
SCSQ: Super Computer Stream Query processor, a Data Stream Management System (DSMS)
CQ: Continuous Queries (filters and transformations)
[Figure: a user or programmer submits queries to the DSMS (SCSQ). Its query processing software reads input data streams through stream data access software, uses metadata and stored data, and delivers the query result as a data stream.]
Research Questions
How to ensure scalable CQ execution
• with growing input stream rate?
• with high CQ execution cost?
Answer: by scale-out.
CQs are scaled out by splitting the input stream.
• applications require customizable input stream splitting, called splitstream
• both tuple routing and broadcast are allowed
How to split massive streams over massively parallel CQs?
• By parallelizing splitstream itself
[Figure: the input stream passes through a split step into several parallel splitstream processes, which feed the parallel CQs; the CQ results are merged.]
Outline
1. Introduction
2. Stream splitting strategies for scale-out
3. Scale-up of stream splitting strategies
4. Cost model and heuristic
5. Energy efficiency
6. Related work
7. Conclusions and future work
Defining stream splitting
splitstream(stream s, integer q,
            function rfn,
            function bfn)
    -> vector of stream sv
[Figure: splitstream takes the input stream s and produces the vector sv of q output streams.]
User defines rfn and bfn:
rfn(object tpl, integer q) -> integer
    rfnLRB(event e, integer q) -> integer as
        select expressway(e) where eventtype(e) = 0;
bfn(object tpl) -> boolean
    bfnLRB(event e) -> boolean as
        select eventtype(e) = 2;
rfn and bfn for streams are analogous to fragmentation and replication conditions in a distributed DBMS (DDBMS).
Unlike in a DDBMS, the execution of rfn and bfn is parallelized.
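The roles of rfn and bfn can be illustrated with a small Python sketch. It is not SCSQ code: streams are modelled as lists, events as dictionaries with hypothetical fields "type" and "xway", and two behaviours are assumptions of this sketch rather than statements from the slides: bfn is checked before rfn, and the value returned by rfn is wrapped into the range 0..q-1. A tuple for which rfn returns nothing and bfn is false is omitted.

```python
from typing import Any, Callable, List, Optional, Sequence

Event = Any


def splitstream(
    s: Sequence[Event],
    q: int,
    rfn: Callable[[Event, int], Optional[int]],
    bfn: Callable[[Event], bool],
) -> List[List[Event]]:
    """Split stream s into a vector sv of q output streams (lists here)."""
    sv: List[List[Event]] = [[] for _ in range(q)]
    for tpl in s:
        if bfn(tpl):
            # Broadcast: the tuple is copied to every output stream.
            for out in sv:
                out.append(tpl)
            continue
        target = rfn(tpl, q)
        if target is not None:
            # Route: assumption of this sketch, the index is wrapped modulo q.
            sv[target % q].append(tpl)
        # Otherwise the tuple is omitted.
    return sv


def rfn_lrb(e: Event, q: int) -> Optional[int]:
    """Mirrors rfnLRB: route events of type 0 by their expressway."""
    return e["xway"] if e["type"] == 0 else None


def bfn_lrb(e: Event) -> bool:
    """Mirrors bfnLRB: broadcast events of type 2."""
    return e["type"] == 2


events = [
    {"type": 0, "xway": 0},
    {"type": 0, "xway": 1},
    {"type": 2, "xway": 0},
    {"type": 3, "xway": 1},
]
sv = splitstream(events, 2, rfn_lrb, bfn_lrb)
```

With q = 2, the two type 0 events are routed to different output streams, the type 2 event appears in both, and the type 3 event is omitted.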
Naïve (flat) splitstream implementation: fsplit
fsplit(stream s, integer q,
       function rfn, function bfn)
    -> vector of stream sv
[Figure: a single fsplit process feeds all parallel CQs.]
Expensive stream splitting computations in a single process: bottleneck!
Tree shaped splitstream implementation: maxtree
maxtree(stream s, integer q,
        function rfn, function bfn)
    -> vector of stream sv
[Figure: a tree of fsplit processes feeds the parallel CQs.]
• Bottleneck is alleviated [Zeitler and Risch, DASFAA 2010]
• but still problematic
Scaled-out splitstream: parasplit
parasplit(stream s, integer q,
          function rfn, function bfn)
    -> vector of stream sv
[Figure: a window router PR feeds parallel fsplit processes; the streams from the fsplit processes are merged in front of each CQ.]
• Window router (PR): distributes entire windows
• Window splitter: fsplit
• Stream merge: in front of each CQ
Parasplit: route – //fsplit – //(merge – CQ)
(// denotes parallel execution: one window router, then parallel fsplit processes, then parallel merge and CQ pairs.)
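The three stages of parasplit (route, parallel fsplit, merge in front of each CQ) can be sketched in Python as follows. This is a sequential illustration under stated assumptions, not the SCSQ implementation: streams are lists, the window router hands out windows round-robin (the slides only say that it distributes entire windows), the merge simply concatenates the partial streams without any ordering guarantee, and the parameter names p and w for fsplit parallelism and window size in tuples are chosen for this sketch.

```python
from typing import Any, Callable, List, Optional, Sequence

Tpl = Any
Rfn = Callable[[Tpl, int], Optional[int]]
Bfn = Callable[[Tpl], bool]


def fsplit(window: Sequence[Tpl], q: int, rfn: Rfn, bfn: Bfn) -> List[List[Tpl]]:
    """Flat split of one window into q partial output streams."""
    out: List[List[Tpl]] = [[] for _ in range(q)]
    for tpl in window:
        if bfn(tpl):
            for o in out:
                o.append(tpl)
        else:
            k = rfn(tpl, q)
            if k is not None:
                out[k % q].append(tpl)  # assumption: index wrapped modulo q
    return out


def parasplit(s: Sequence[Tpl], q: int, rfn: Rfn, bfn: Bfn,
              p: int, w: int) -> List[List[Tpl]]:
    # Stage 1, window router: cut the stream into windows of w tuples and
    # hand entire windows to the p splitters (round-robin is an assumption).
    per_splitter: List[List[Sequence[Tpl]]] = [[] for _ in range(p)]
    for i in range(0, len(s), w):
        per_splitter[(i // w) % p].append(s[i:i + w])

    # Stage 2, window splitters: each splitter applies fsplit to its windows.
    # partial[j][k] holds the tuples that splitter j sends to CQ k.
    partial = [[[] for _ in range(q)] for _ in range(p)]
    for j in range(p):
        for win in per_splitter[j]:
            for k, part in enumerate(fsplit(win, q, rfn, bfn)):
                partial[j][k].extend(part)

    # Stage 3, stream merge: each CQ k merges the p partial streams it receives.
    return [[t for j in range(p) for t in partial[j][k]] for k in range(q)]


s = list(range(20))
rfn = lambda t, q: t            # route by value
bfn = lambda t: t % 7 == 0      # broadcast multiples of 7
flat = fsplit(s, 3, rfn, bfn)
para = parasplit(s, 3, rfn, bfn, p=2, w=4)
```

Each CQ receives the same set of tuples from parasplit as from a single fsplit; only the arrival order may differ, because the windows pass through different splitters.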
Tree shaped window routing: parasplit*
[Figure: a tree of window routers (PR) feeds the parallel fsplit processes, which feed CQ0 to CQ7.]
Outline
1. Introduction
2. Stream splitting strategies
3. Scale-up of stream splitting strategies
4. Cost model and heuristic
5. Energy efficiency
6. Related work
7. Conclusions and future work
Experimental set-up
Benchmark: Linear Road Benchmark (LRB), www.cs.brandeis.edu/~linearroad
Hardware
• Linux cluster
• Up to 70 nodes
• Each node has 2x quad-core Intel® Xeon® E5430 @ 2.66 GHz, 6 MB L2 cache
• TCP/IP over GbE
Performance number L: number of xways (expressways) the DSMS can handle
LRB result
Performance number L: number of xways the DSMS can handle
name              org                   year  L    cores  comment
Aurora            Brandeis, Brown, MIT  2004  2.5  1
Commercial sys A  Brandeis, Brown, MIT  2004  0.5  1
SPC               IBM                   2006  2.5  170    3GHz Xeon
XQuery            ETHZ                  2007  1.5  1
DataCell          CWI                   2009  1    4      1.4s avg RT
stream schema     ETHZ                  2010  5    4
SCSQ maxtree      UU                    2010  64   48     D disabled (later verified in MySQL)
SCSQ parasplit    UU                    2011  512  560    D disabled
Splitstream stream rate
[Figure: max stream rate [Mbps] (0 to 1000) versus CQ parallelism q (0 to 500) for parasplit*, parasplit, maxtree, and fsplit, with the 1 Gbps wire speed marked.]
Window router stream rate
[Figure: a window router PR sends windows of size W to p parallel fsplit processes, which feed the CQs.]
W – physical window size
p – number of parallel fsplit
Impact of window size W in window router
Network bound for large enough windows.
[Figure: max stream rate [Mbps] (0 to 1000) versus W [kB] (0 to 15) for p=4 and p=64.]
Impact of window size W in window router when scaling p
[Figure: max stream rate [Mbps] (0 to 1000) versus W [kB] (0 to 15) for p=4, p=64, p=128, p=256, and p=512.]
Parasplit*: tree shaped window router
[Figure: max stream rate [Mbps] (0 to 1000) versus fsplit parallelism p (0 to 500) at W = 16 kB, for the window router tree (parasplit*) and the single window router (parasplit).]
Outline
1. Introduction
2. Stream splitting strategies
3. Scale-up of stream splitting strategies
4. Cost model and heuristic
5. Energy efficiency
6. Related work
7. Conclusions and future work
Eliminate p
parasplit(stream s, integer q,
          function rfn, function bfn)
    -> vector of stream sv
[Figure: parasplit with a window router PR, p parallel fsplit processes, and q parallel CQs.]
Given
• Input stream rate ΦD
• Parallelism of continuous query q
Automatically determine
• fsplit parallelism p
Cost model for fsplit
[Figure: fsplit consumes the input stream, splits it, and emits the output streams R1 ... Rq.]
$C_{fsplit} = c_r + c_s (o + r + q b) + c_e (r + q b)$
cr – read cost per tpl (read + de-marshal)
cs – split cost per tpl (execute rfn and bfn)
ce – emit cost per tpl (marshal + print)
o – omit %
r – routing % according to rfn and bfn
b – broadcast %
q – number of output streams
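As a worked example of the fsplit cost model, the sketch below evaluates the formula read as C_fsplit = cr + cs (o + r + q b) + ce (r + q b). The numeric costs are invented for illustration (arbitrary units per tuple) and are not measurements from the experiments; o, r, and b are given as fractions rather than percentages.

```python
def fsplit_cost(cr: float, cs: float, ce: float,
                o: float, r: float, b: float, q: int) -> float:
    """Per-tuple cost of fsplit.

    cr, cs, ce: read, split, and emit cost per tuple.
    o, r, b: fractions of omitted, routed, and broadcast tuples.
    q: number of output streams.
    """
    return cr + cs * (o + r + q * b) + ce * (r + q * b)


# Hypothetical costs: cr = 1, cs = 2, ce = 3 (arbitrary units).
cost_q1 = fsplit_cost(1.0, 2.0, 3.0, o=0.0, r=0.99, b=0.01, q=1)
cost_q100 = fsplit_cost(1.0, 2.0, 3.0, o=0.0, r=0.99, b=0.01, q=100)
```

Because the broadcast share b is multiplied by q, even 1% broadcast tuples roughly doubles the split and emit terms at q = 100 compared with q = 1.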
Cost model for merge in CQ
[Figure: the CQ consumes the streams S1 ... Sp, merges them, computes, splits the result, and emits the streams R1 ... Rw.]
$C_{CQ} = c_r + p \, c_p + c_m + O$
cr – read cost per tpl (read + de-marshal)
cp – poll cost per tpl
cm – merge cost per tpl
O – cost of executing the CQ and emitting its result
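The dependence of the merge cost on p can be made concrete with a short sketch. It assumes the formula is read as C_CQ = cr + p cp + cm + O, i.e. the poll cost is paid once per input stream; the numeric costs are hypothetical and in arbitrary units.

```python
def cq_cost(cr: float, cp: float, cm: float, O: float, p: int) -> float:
    """Per-tuple cost of a CQ that merges p input streams.

    cr: read cost, cp: poll cost per input stream, cm: merge cost,
    O: cost of executing the CQ and emitting its result.
    """
    return cr + p * cp + cm + O


# Hypothetical costs; O = 0 isolates the merge overhead.
merge_overhead_p4 = cq_cost(cr=1.0, cp=0.5, cm=2.0, O=0.0, p=4)
merge_overhead_p8 = cq_cost(cr=1.0, cp=0.5, cm=2.0, O=0.0, p=8)
```

Under this reading the cost grows linearly in p, which is why a larger fsplit parallelism is not free for the CQs.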
Cost model for parasplit
[Figure: parasplit with a window router PR, p parallel fsplit processes, and q parallel CQs.]
$C_{PR} = c_{rW} + c_{sW} + c_{eW}$
$C_{fsplit} = c_{rW} + c_s (o + r + q b) + c_e (r + q b)$
$C_{CQ} = c_r + p \, c_p + c_m + O$
p can be eliminated using the cost model, but this requires extensive profiling of every component.
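Heuristic for estimating p
Assume
• 1% broadcast tuples (configurable)
• 0% omitted tuples (configurable)
Under these assumptions, the cost model
$C_{fsplit} = c_{rW} + c_s (o + r + q b) + c_e (r + q b)$
is approximated by
$\hat{C}_{fsplit} = (c_s + c_e)(0.99 + 0.01 q)$
Measure Φfsplit(1), the max stream rate of fsplit on rfn and bfn with q = 1:
$c_s + c_e = 1 / \Phi_{fsplit}(1)$
Estimate p by
$\hat{p} = \frac{\Phi_D}{\Phi_{fsplit}(1)} (0.99 + 0.01 q)$
[Figure: parasplit with input stream rate ΦD, a window router PR, p parallel fsplit processes, and q parallel CQs.]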
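The heuristic amounts to a one-line calculation, sketched below. The function name and the example rates are hypothetical; rounding the estimate up to a whole number of fsplit processes is an assumption of this sketch, and the omit share is fixed at 0% as on the slide.

```python
import math


def estimate_p(phi_d: float, phi_fsplit_1: float, q: int,
               broadcast: float = 0.01) -> int:
    """Estimate the fsplit parallelism p.

    phi_d: input stream rate.
    phi_fsplit_1: measured max stream rate of one fsplit at q = 1
                  (same unit as phi_d).
    q: CQ parallelism.
    broadcast: fraction of broadcast tuples (1% by default, 0% omitted).
    """
    factor = (1.0 - broadcast) + broadcast * q   # 0.99 + 0.01 q by default
    return max(1, math.ceil(phi_d / phi_fsplit_1 * factor))


# Hypothetical rates: 800 Mbps input, one fsplit sustains 100 Mbps at q = 1.
p_q64 = estimate_p(800.0, 100.0, q=64)
p_q1 = estimate_p(250.0, 100.0, q=1)
```

For q = 64 the factor is 0.99 + 0.64 = 1.63, so the estimate is 8 · 1.63 = 13.04, rounded up to 14 fsplit processes.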
p according to heuristics vs. p using exact cost model
[Figure: max stream rate [Mbps] (0 to 1000) versus CQ parallelism q (0 to 500) for parasplit (heuristic), the cost model, too high p (p=q), and too low p (p=1).]
Outline
1. Introduction
2. Stream splitting strategies
3. Scale-up of stream splitting strategies
4. Cost model and heuristic
5. Energy efficiency
6. Related work
7. Conclusions and future work
Estimating energy efficiency, η
How much extra energy does parasplit consume in comparison to fsplit?
Conservatively assume energy consumption proportional to CPU usage:
Useful work
• p ∙ Cfsplit
Overhead
• CPR
• q ∙ CCQ(O=0)
$\eta = \frac{p \, C_{fsplit}}{C_{PR} + p \, C_{fsplit} + q \, C_{CQ}(O=0)}$
[Figure: parasplit with a window router PR, p parallel fsplit processes, and q parallel CQs.]
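The efficiency estimate can be computed directly from the three cost terms, as in the sketch below. It reads η as useful work divided by useful work plus overhead; the cost values are hypothetical and in arbitrary units, not measured values.

```python
def energy_efficiency(c_pr: float, c_fsplit: float, c_cq_merge: float,
                      p: int, q: int) -> float:
    """Estimate eta = useful work / (useful work + overhead).

    c_pr: cost of the window router PR.
    c_fsplit: cost of one fsplit process.
    c_cq_merge: cost of one CQ with O = 0, i.e. its merge overhead only.
    p, q: fsplit parallelism and CQ parallelism.
    """
    useful = p * c_fsplit
    overhead = c_pr + q * c_cq_merge
    return useful / (useful + overhead)


# Hypothetical costs: PR = 1, fsplit = 10, merge overhead per CQ = 0.5.
eta = energy_efficiency(c_pr=1.0, c_fsplit=10.0, c_cq_merge=0.5, p=4, q=20)
```

Here the useful work is 40 and the overhead is 1 + 10 = 11, so η = 40/51, about 78%.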
Measuring energy efficiency
[Figure: efficiency (0% to 100%) versus CQ parallelism q (0 to 500) for parasplit*, parasplit, the cost model, too high p (p=q), and too low p (p=1).]
Related work
Nobody else has investigated strategies for scalable
customizable stream splitting
IBM SPADE/System S [Andrade et al 2009]
• Splitstream operator with broadcast capabilities
• Streaming throughput degrades when scaling q
Event based systems [Brenna et al 2009]
• Custom stream splitting shown to be a bottleneck
Gigascope [Johnson et al 2008]
• Assumes specialized stream splitting hardware
• No customizable stream splitting
GSDM [Ivanova, Risch 2005]
• Parallel execution of expensive UDFs
• More limited parallelization
Streaming MapReduce [Condie et al 2010]
• Does not handle scalable stream splitting
[Balkesen, Tatbul 2011]
• Distributing entire windows over CQs
• q≤4
Conclusions and future work
Conclusions
• Naïve stream splitting is prohibitive for scale-out of CQs
• Parasplit eliminates the bottleneck of stream splitting, providing network bound stream rates
• Parasplit* provides network bound stream rates for highly scaled-out stream splitting
Future work
• Push selection predicates from CQ to rfn of splitstream
• Improve energy efficiency
• High availability
SCSQ home page
• http://www.it.uu.se/research/group/udbl/SCSQ.html
Extra material
Window router tree
Cost model
LRB
• Parallelization of LRB
Additional related work
Single process window router, p=64
[Figure: a single window router process PR serving p = 64 fsplit processes.]
Tree shaped window router, p=64
Parasplit + tree shaped window router = parasplit*
[Figure: a tree of window router processes PR serving p = 64 fsplit processes.]
Heuristics for estimating p
Given
• Input stream rate ΦD
• Parallelism of continuous query q
Determine fsplit parallelism p
• If the max stream rate of fsplit is Φfsplit, choose p such that p ∙ Φfsplit ≥ ΦD
• $C_{CQ} = c_r + p \, c_p + c_m + O$, so CCQ increases with p
Must choose p carefully
[Figure: parasplit with input stream rate ΦD, a window router PR, p parallel fsplit processes, and q parallel CQs.]
Linear Road Benchmark
Simulates vehicles travelling (and colliding)
• on a number of expressways
• using variable tolling
• based on traffic conditions and accident proximity
Input: One stream of position reports and historical queries (account balance, daily tolls)
Continuous queries: Toll notifications, accident notifications
Output: Four result streams of responses to historical and continuous queries:
0. toll alerts
1. accident alerts
2. account balance responses
3. daily expenditure responses
L-rating: Number of xways processed within response time (RT) constraints
Parallelization of LRB using fsplit
[Figure: an fsplit feeds the parallel CQs (scale up q). The output of each CQ is split by a further fsplit into result streams, which are combined by union into toll alerts and accident alerts, and by groupby into account balance answers.]
Daily expenditure queries D are excluded here. Daily expenditure data is managed by a regular DBMS.
Other related work
Medusa [Balazinska et al 2004]
• Parallel DSMS
• Dynamic migration of operators between nodes
• Without scale-out, heavy operators are bottlenecks
Dryad [Isard et al 2007]
• User defined process graphs in QL (edges + vertices)
• SCSQ automatically generates such graphs from splitstream
SCOPE [Chaiken et al 2008], Map-reduce-merge [Yang et al 2007]
• All these are batch systems, not DSMSs
Distributed DBMS
• rfn and bfn are analogous for streams to fragmentation and
replication conditions in DDBMS
• DDBMS do not scale out fragmentation and replication, while
splitstream parallelizes rfn and bfn.