# On the fly

Document Sample

```					Determining the Cost of Algebraic Operators

Background: Memory, Files, Indexes

External Sort                  Selection

Join
Evaluating Individual Operators
Other

Finding Efficient Query Plans

Optimizer Architecture

Cost of Combining Operators

Enumerating Plans
1
Relational Query Optimization

Evaluating Queries with Several
Operators

2
Topic

• Until now, we have seen how to optimize
queries with a single operator
• Most queries have several operators
• We now discuss how queries with several
operators can be optimized

3
Simplest Way to Implement
Complex Queries
• Evaluate operators, one at a time
• After each evaluation, write temporary
results to disk
• Read temporary results from disk, as input to
the next operator

4
Simplest Way to Implement
Complex Queries
• Example: SELECT             S.sname
FROM           Sailors S, Reserves R
WHERE          S.sid = R.sid and
R.bid = 100 and S.rating>5

sname(bid=100 and rating>5(Reserves   Sailors))

• Compute join and write result T1 to disk
• Read T1, compute selection, and write result T2 to
disk
5
Simplest Way to Implement
Complex Queries
sname(bid=100 and rating>5(Reserves           Sailors))
• Compute join and write result T1 to disk
• Read T1, compute selection, and write result T2 to disk
• A query plan:                sname

bid=100   rating > 5   File scan, write to T2

Block nested loops, write to T1
sid=sid

Reserves           Sailors
6
Simplest Way to Implement
Complex Queries
Thinksname(bid=100 and rating>5(Reserves Sailors))
• 1) Does this seem write result T1 to disk
Compute join and efficient?
No. join and select can be calc together, can print
• Read namecompute selection, and write result T2 to disk
the T1, while choosing it.
Read T2, use an index for the selection? No.
the selection is for temp table that has no index.
• 3) querywould we implement the projection?
A How plan:           sname
as we said in last part (last slide).
bid=100   rating > 5   File scan, write to T2

Block nested loops, write to T1
sid=sid

Reserves           Sailors
7
Query Optimization: Basic Architecture

Query Parser

Query Optimizer

Plan        Plan Cost      Catalog
Generator     Estimator      Manager

Query Plan Evaluator
8
Simplifications
• SQL Queries may be composed of several blocks
• Each block is optimized separately
• In our discussion, we assume a single block
SELECT  S.sid, MIN(R.day)
FROM    Sailors S, Reserves R, Boats B
WHERE   S.sid = R.sid and R.bid = B.bid
and B.color = ‘red’ and
S.rating = (SELECT MAX(rating)
FROM Sailors)
GROUP BY S.sid
HAVING   count(*)>2                           9
Simplifications
• Queries may contain group-by and aggregation
• Are applied as a final step in evaluation (how? Hash/ Sort)
• We only consider queries without aggregation

SELECT  S.sid, MIN(R.day)
FROM    Sailors S, Reserves R, Boats B
WHERE   S.sid = R.sid and R.bid = B.bid
and B.color = ‘red’ and
S.rating = (SELECT MAX(rating)
FROM Sailors)
GROUP BY S.sid
HAVING   count(*)>2                                        10
Simplifications
• By making the simplifications discussed, we derive
queries that can be translated into relational
algebra. We show how to evaluate such queries.

SELECT  S.sid, MIN(R.day)
FROM    Sailors S, Reserves R, Boats B
WHERE   S.sid = R.sid and R.bid = B.bid
and B.color = ‘red’ and
S.rating = (SELECT MAX(rating)
FROM Sailors)
GROUP BY S.sid
HAVING   count(*)>2                                11
Query Plans

12
Translating to Algebra Trees
SELECT          S.sname
FROM            Sailors S, Reserves R
WHERE           S.sid = R.sid and
R.bid = 100 and S.rating>5

sname(bid=100 and rating>5(Reserves   Sailors))              sname

bid=100    rating > 5

sid=sid

Reserves           Sailors
13
Query Plans
• A query plan is a relational algebra tree, annotated
with access and evaluation methods
– Convention: In trees when using block/index nested
loops join, left child is outer relation
(On-the-fly)
sname
sname

bid=100     rating > 5     (On-the-fly)
bid=100    rating > 5

(Block Nested Loops)
sid=sid                                sid=sid

Reserves           Sailors         Reserves             Sailors                  14
Alternative Plans

• There may be many different query plans for the
same query. Query optimizer must:
– enumerate such plans
– choose the best plan
(On-the-fly)
sname
(On-the-fly)
sname

bid=100     rating > 5     (On-the-fly)
(Sort-Merge Join)
sid=sid

(Scan;                                  (Scan;                          (Block Nested Loops)
write to bid=100           rating > 5   write to
temp T1)                                temp T2)              sid=sid

Reserves           Sailors
Reserves             Sailors                  15
‫‪Pipelined Evaluation‬‬

‫”‪Evaluation “On the Fly‬‬
‫פעולת ביניים לא נכתבת אל הדיסק אלא מועברת‬
‫ישירות לפעולה הבאה‬

‫61‬
Pipelining

• When evaluating a query, the results of one
operator may have to be fed into another operator.
This can be done in two ways:
– Materializing: the first operator creates a temporary
relation, which is read by the second operator
– Pipelining (on the fly computation): the results of the
first operator are fed directly into the second operator

• Pipelining can save on the I/O costs!
– We save the cost of writing a temporary result to disk!
– We save the cost of reading for the following operation!   17
Pushing Selections/Projections
Why?
and Pipelining
Can Push through                               Can Pipeline
Selection/Projection                           into?
Block NL, Outer                                                                                 

Block NL, Inner                                                                                X

Index NL, Outer                                                                                 

Index NL, Inner                                         X                                       X

Sort Merge                                                                                     X
•   BNLO: R is read once. While reading check the selection cond. If it does add it to the projectile. Without writing to the
disk.
•   INLO: same.
18
•   When both possible it is always better to push and pipeline
When is Pipelining Impossible

• Impossible if:
– Input must be read several times
• Block nested loops join – inner relation

– Input must be read via an index
• Index nested loops join – inner relation

– Input must be read in a specific order (and the
previous operator does not output in that order
• Sort merge join

19
Example 1: Pipelining in Unary
Operators
• Suppose we want to apply two selections on the
table Reserves: date<05/05/05 and bid=100. We
have a BTree index on date:
– Materializing: Find tuples matching date condition using
BTree. Store in temporary relation. Scan relation to find
tuples matching bid condition
– Pipelining: Find tuples matching date condition using
BTree. As tuples are retrieved, apply second condition
• Pipelining into unary operators is called on-the-fly
evaluation
20
Example 2: Pipelining in Joins

• Suppose we want to compute (A          B)    C
• Materializing: Compute A         B, store and then read
and join with C
• Pipelining: The joins can be pipelined if some type
of nested loops join is used.
– Read blocks of tuples from A
– Find matching tuples from B (using a scan or index)
– Find matching tuples from C
C

A       B       21
Pipelining Joins: Questions

• Can the execution of the join be pipelined if
the join with C is performed using sort-merge
join?
– No.
– Example: R(A, B), S(B, C), T(C, D)
– Sort-merge (R, S) will be sorted by B
– Sort-merge (S, T) will be sorted by C
C
– Therefore C must be read from the file,
Meaning it must have been written to the file
A   B   22
Fully Pipelined Execution

• The execution is fully pipelined if no
temporary relations are written to disk
throughout the entire query processing

23
Pipelining Joins: Questions

• Which of the following trees can allow for
fully pipelined execution (e.g., using a
nested loops join)? Middle one.

D           D

C       C

A          B   C        D   A   B               A   B

24
Left Deep Plans

• A plan is left-deep if the inner relation of each join
(i.e., the right child of each join operator) is a base
relation. Which of the following are left deep?

D                D

C       C

A           B    C          D   A    B               A     B

• Left deep plans allow for fully pipelined execution
25
Alternative Plans

Motivating Example

26
Motivating Example
• We will consider the cost of various plans for the
following query.
– NOTE: No distinct in SELECT clause!
SELECT       S.sname
FROM         Sailors S, Reserves R
WHERE        S.sid = R.sid and
R.bid = 100 and S.rating>5

Tuple size Tuples per   Number of
block        blocks
Reserves 40           100          1000
Sailors    50         80           500            27
Tuple size Tuples per           Number of
block             blocks
Reserves 40                        100               1000
Sailors              50            80                500

• What is the cost of this plan, if
(On-the-fly)
sname
the buffer is of size 5?
BR + BS * BR / (B-2)
1000+500*1000/3= 168,000                       bid=100     rating > 5     (On-the-fly)
• What would you suggest to
change in this plan?
Switch between the join and the select                          (Block Nested Loops)
sid=sid
Switch between S and R
Do the projection early
Reserves             Sailors                  28
Alternative Plan: Pushing Selections

• Selections generally reduce the sizes of the relations
• Often a good strategy to push the selections, i.e., apply
early, so that the join is over smaller relations
• Which selections could be
(On-the-fly)
sname
pushed in this query?

bid=100     rating > 5     (On-the-fly)

(Block Nested Loops)
sid=sid

Reserves             Sailors                  29
Tuple size Tuples per           Number of
block            blocks
Reserves 40               100              1000
Sailors    50             80               500

• This is an plan has                                     (On-the-fly)
sname

fully-pushed                                                 (Block NL Join)
sid=sid
selections
(Scan;                                  (Scan;
write to bid=100           rating > 5   write to
‫כל הבחירות נעשות כמה שיותר‬        temp T1)                                temp T2)

‫מוקדם‬                 Reserves           Sailors

30
Tuple size Tuples per           Number of
block            blocks
Reserves 40               100              1000
Sailors    50             80               500

• Find cost, assuming:                                    (On-the-fly)
sname
– 5 buffer pages
– 100 different boats,                                      (Block NL Join)
sid=sid
uniform distribution on
(Scan;                                  (Scan;
bid                            write to bid=100
temp T1)
rating > 5   write to
temp T2)
– rating between 1 and                   Reserves           Sailors
10, uniformly distributed                                                    31
Tuple size Tuples per                   Number of
block             blocks
Reserves 40                              100               1000
Sailors            50                    80                500

• Finding cost:                                                             (On-the-fly)
sname
1000 :reserve ‫קריאת‬
10 ‫ הוא כ 001/0001. לכן עולה כ‬T1 ‫: גודל‬T1 ‫כתיבת‬
500 :salors ‫קריאת‬                               (Block NL Join)
sid=sid
250 ‫ תעלה כ‬T2 ‫: 2/005 = 052. לכן כתיבת‬T2 ‫כתיבת‬
Block Nested Loop: 10 + 250*10/3 = 1010       (Scan;                                  (Scan;
1000+10+500+250+1010 = 2770 :‫סה"כ העלות היא‬         write to bid=100           rating > 5   write to
temp T1)                                temp T2)

Reserves           Sailors

32
Pushing Selections/Projections
and Pipelining
Can Push through       Can Pipeline
Selection/Projection   into?
Block NL, Outer                                 

Block NL, Inner                                 X

Index NL, Outer                                 
If we tried to push
Index NL, Inner             X        selection/projection we
X
would no longer have
Sort Merge                                      X
an index

33
Alternative Plan: Pushing
Projections
• Projections also reduce the sizes of the relations
• Often a good strategy to push the projections, i.e., apply
early, so that the join is over smaller relations
• Which projections could be
(On-the-fly)
sname
pushed in this query?
• Note: in general, duplicate                      rating > 5     (On-the-fly)
bid=100

elimination is performed
only in the outermost                                 (Block Nested Loops)
sid=sid
projection
Reserves             Sailors                  34
Tuple size Tuples per         Number of
block              blocks
Reserves 40          100                1000
Sailors   50         80                 500

• This is an plan has                                  (On-the-fly)
sname

fully-pushed                                              (Block NL Join)
sid=sid
selections and          (On-the-fly;                             (On-the-fly;
write to T1)    sid                      write to T2)
sname,sid
projections                 (Scan)                                     (Scan)
bid=100           rating > 5

Reserves           Sailors               35
Tuple size Tuples per                Number of
block              blocks
Reserves 40                         100                1000
Sailors           50                80                 500

sid, ‫, ונניח‬byte 10 ‫ תופס‬sid ‫נניח ש‬
byte 25 ‫ תופס‬sname                                           (On-the-fly)
sname
‫חישוב עלות‬
1000 :Reserve ‫קריאת‬
(Block NL Join)
)‫: 3 = )4*001(/0001 (ערך עליון‬T1 ‫כתיבת‬                              sid=sid
(On-the-fly;                             (On-the-fly;
500 :Salors ‫קריאת‬         write to T1)    sid                      write to T2)
sname,sid
500/(2*2) = 125 :T2 ‫כתיבת‬
(Scan)      bid=100           rating > 5   (Scan)
3 + 125*3/3 = 128 :BNL ‫חישוב‬
1000+3+500+125+128 =   1756 :‫סה"כ העלות‬                     Reserves           Sailors               36
Alternative Plan: Using Indexes

• In there are indexes, then it may be
preferable to access one (or more) of the
relations via the indexes, instead of by using
a sequential scan
• Suppose we have a clustered hash index on
bid of Reserves and a hash index on sid of
Sailors

37
Tuple Plan: Using Indexes
Alternativesize Tuples per Number of
block                blocks
Reserves 40             100                  1000
Sailors    50           80                   500

• Find cost, assuming                                  sname
(On-the-fly)

rating > 5 (On-the-fly)

table indices costs 1.2
(Index Nested Loops,
I/Os                                               sid=sid    with pipelining )

• Is the selection fully            (Use hash
index; do
not write
bid=100      Sailors

result to
pushed? No. Is this                temp)
Reserves

good? No (index).                                                                    38
Tuple Plan: Using Indexes
Alternativesize Tuples per Number of
block                   blocks
Reserves 40                        100                     1000
Sailors         50                 80                      500

‫חישוב עלויות‬                                (On-the-fly)
sname
BR+TR(time to find matching raws in S) :INL ‫עלות‬
‫ היא בזמן הבחירה, לכן אין צורך לקרוא‬R ‫עלות קריאת‬                       rating > 5 (On-the-fly)
.INL ‫אותו שוב בשלב ה‬
1.2 + 1000*100/(100*100) = 11.2 :Reserve ‫קריאת‬                                    (Index Nested Loops,
sid=sid    with pipelining )
bid=100 ‫חלוקה אחת ב 001 כי בהסתברות 10.0 נקבל‬
.‫חלוקה שניה ב 001 היא בגלל שהאינדקס מכווץ‬     (Use hash
index; do    bid=100      Sailors
not write
1000*(1.2+1) :INL ‫עלות‬      result to
temp)
2211.2 :‫סה"כ העלות‬                  Reserves

39
Determining the Cost of Algebraic Operators

Background: Memory, Files, Indexes

External Sort                  Selection

Join
Evaluating Individual Operators
Other

Finding Efficient Query Plans

Optimizer Architecture

Cost of Combining Operators

Enumerating Plans
40
Calculating the Costs: Block NL
• N is number of blocks in the outer relation
• M is number of blocks in the inner relation
• B is number of buffer blocks
• Cost, selections/projections are not pipelined: N +
M(N/(B-2))
• Cost, selections/projections are pipelined on outer
relation: M(N/(B-2))
• Notes: If selections/projections are pushed through
the join, then N and M are the sizes of the relations
AFTER the selections/projections
41
Calculating the Costs: Index NL
• N is number of blocks in the outer relation
• T is number of tuples in the outer relation
• S is the cost of traversing the index
• Tm is number of tuples in the inner relation that match each
tuple in the outer relation
• P is number of tuples in the inner relation that fit into a page
• Notes: If selections/projections are pushed through the join,
then N and T are the sizes AFTER the selections/projections
Index clustered       Index unclustered
No pipelining     N+T(S+Tm/P)           N+T(S+Tm)
Pipelining on     T(S+Tm/P)             T(S+Tm)
outer relation                                                42
Calculating the Costs: Sort-Merge Join
• N is number of blocks in the outer relation
• M is number of blocks in the inner relation
• B is number of buffer blocks
• Cost:
2N(1+ logB-1( N/B)) + 2M(1+ logB-1( M/B)) + N + M
• If outer relation is already sorted:
2N(1+ logB-1( N/B)) + N + M
• If inner relation is already sorted:
2M(1+ logB-1( M/B)) + N + M
• Bother relations are already sorted: N + M
• Notes: If selections are pushed through the join, then N and
M are the sizes of the relations AFTER the selections       43
Determining the Cost of Algebraic Operators

Background: Memory, Files, Indexes

External Sort                  Selection

Join
Evaluating Individual Operators
Other

Finding Efficient Query Plans

Optimizer Architecture

Cost of Combining Operators

Enumerating Plans
44

```
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
 views: 5 posted: 10/27/2012 language: English pages: 44