# Cost-Based Plan Selection

Shared by:
Categories
Tags
-
Stats
views:
9
posted:
6/3/2012
language:
English
pages:
20
Document Sample

```							  Cost-Based Plan Selection
Choosing an Order for Joins
Chapter 16.5 and16.6 by:-
Vikas Vittal Rao
ID: 124/227
Chiu Luk
ID: 210
Agenda
•   Outline
•   Cost Estimation
•   Histograms
•   Computation of statistics
•   Reducing cost of heuristics
•   Enumerating Physical Plans
•   List of other approaches
Outline
• Query Optimizer estimates the “cost” of query
evaluation.
• This “cost” is based on number of disk I/O’s.
• Disk I/O’s influenced by:-
• Logical operators used
• Size of intermediate results
• The ordering of similar operations, especially
joins (discussed next in 16.6)
Cost Estimation
• Query Optimizer keeps track of certain
parameters like:
• T(R) – number of tuples in a relation R
• V( R , a ) – number of unique values in R for attribute ‘a’
• B(R) – number of blocks in which R can fit.

The Optimizer also computes a
“histogram” of the above parameters.
Histograms
• Equal Width - divide value range by fixed width w
and keep counts of each width
• Equal Height – similar to equal width, we pick the
lowest value v0, a fraction p and keep count of
values which are at “p from the lowest, 2 p from
the lowest, etc” up to the highest value.
• Most frequent values – list of most frequent
values and the number of their occurances
• Using histograms helps estimate the sizes of joins
more accurately(than previously discussed
methods)
Computation of Statistics
• Statistics are fault-tolerant, good for use in
estimates; e.g. small error does not significantly
change out come.

• Inspect the distribution of values across records in
relation

• Computation on entire relation is expensive;
however, can be computed using smaller sample
size
Reducing Cost by Heuristics

• Applies for logical query plan
• Estimate cost before and after a transformation
• Only choose/apply transformation when cost
estimations show beneficial
• Example:
• Deferring duplicate elimination is better
Enumerating Physical Plans

• Baseline approach (exhaustive): consider all
combinations, pick the smallest cost plan
• Other approaches categorized into:
• Top-down: compute cost from root, take the best
• Bottom-up: compute cost for all combinations for a
sub-expression, select the best, move up until root
evaluated
List of Other Approaches

• Heuristic selection
• Branch-and-bound plan enumeration: keep record
exceed best known cost
• Hill climbing
• Dynamic programming (PP): keep least cost of
each sub-expression; work bottom-up
• Selinger-style optimization: improved version of
DP; keep others beneficial plans besides least cost
plans
• Thank you. Chiu Luk will continue from this
point.
Introduction
• This section focuses on critical problem in
cost-based optimization:
– Selecting order for natural join of three or more
relations
• Compared to other binary operations, joins
take more time and therefore need effective
optimization techniques
Significance of Left and Right Join
Arguments
• The argument relations in joins determine the
cost of the join
• The left argument of the join is
– Called the build relation
– Assumed to be smaller
– Stored in main-memory
Significance of Left and Right Join
Arguments
• The right argument of the join is
– Called the probe relation
– Read a block at a time
– Its tuples are matched with those of build relation
• The join algorithms which distinguish between
the arguments are:
– One-pass join
– Nested-loop join
– Index join
Join Trees
• Order of arguments is important for joining
two relations
• Left argument, since stored in main-memory,
should be smaller
• With two relations only two choices of join
tree
• With more than two relations, there are n!
ways to order the arguments and therefore n!
join trees, where n is the no. of relations
Left-Deep Join Trees
• Consider 4 relations. Different ways to join
them are as follows
• In fig (a) all the right children are leaves. This
is a left-deep tree
• In fig (c) all the left children are leaves. This is
a right-deep tree
• Fig (b) is a bushy tree
• Considering left-deep trees is advantageous
for deciding join orders
Dynamic Programming to Select a Join
Order and Grouping
• Three choices to pick an order for the join of
many relations are:
– Consider all of the relations
– Consider a subset
– Use a heuristic o pick one
• Dynamic programming is used either to
consider all or a subset
– Construct a table of costs based on relation size
– Remember only the minimum entry which will
required to proceed
Dynamic Programming with More
Detailed Cost Functions
• Disadvantage of dynamic programming is that
it does not involve the actual costs of the joins
in the calculations
• Can be improved by considering
– Use disk’s I/O for evaluating cost
– When computing cost of R1 join R2, since we sum
cost of R1 and R2, we must also compute
estimates for there sizes
A Greedy Algorithm for Selecting a Join
Order
• It is expensive to use an exhaustive method
like dynamic programming
• Better approach is to use a join-order heuristic
for the query optimization
• Greedy algorithm is an example of that
– Make one decision at a time and never backtrack