# optimization by gegeshandong

VIEWS: 12 PAGES: 56

• pg 1
```									         Query Processing and Optimization

Chapters 12,14,15 in [R]

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke   1
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke   2
Overview of Query Optimization
   Plan: Tree of R.A. ops, with choice of alg for each op.
   Each operator typically implemented using a `pull’
interface: when an operator is `pulled’ for the next
output tuples, it `pulls’ on its inputs and computes them.
   Two main issues:
   For a given query, what plans are considered?
• Algorithm to search plan space for cheapest (estimated) plan.
   How is the cost of a plan estimated?
 Ideally: Want to find best plan. Practically: Avoid
worst plans!
 We will study the System R approach.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                3
Introduction
   Alternative ways of evaluating a given query
 Equivalent expressions
 Different algorithms for each operation (Chapter 13)
   Cost difference between a good and a bad way of
evaluating a query can be enormous
 Example: performing a r X s followed by a selection r.A =
s.B is much slower than performing a join on the same
condition
   Need to estimate the cost of operations
 Depends critically on statistical information about relations
which the database must maintain
• E.g. number of tuples, number of distinct values for join
attributes, etc.
 Need to estimate statistics for intermediate results to
compute cost of complex expressions
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                  4
Introduction (Cont.)
Relations generated by two equivalent expressions
have the same set of attributes and contain the same
set of tuples, although their attributes may be
ordered differently.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke   5
Introduction (Cont.)
   Generation of query-evaluation plans for an
expression involves several steps:
1. Generating logically equivalent expressions
• Use equivalence rules to transform an expression into an
equivalent one – Logical optimization
2. Annotating resultant expressions to get alternative query
plans
3. Choosing the cheapest plan based on estimated cost –
Physical optimization
   The overall process is called cost based
optimization.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                 6
Query optimization – the two phases
query

Catalog               Parse
schema info
Tree of RA
operators

Logical optimization – pushing operators down

Catalog    physical optimization method to implement each RA
Statistics
indexes info              Run-time code
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke              7
Search space “plan”
Search logical order
1                                    2
1                  2                                 3
X        or        X                    or             X

1       2                        2                 1

Take Plan 1: 4 Join methods, 3 select
methods
There are 3*3*4 = 36 Plans!
For a large tree the # of plans is huge!

The optimization problem:
Find the best plan!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke     8
Schema for Examples
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)

 Similar to old schema; rname added for variations.
 Reserves:
   Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
   Sailors:
   Each tuple is 50 bytes long, 80 tuples per page, 500 pages.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke              9
RA Tree:        sname

Motivating Example
bid=100     rating > 5
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND                                   sid=sid
R.bid=100 AND S.rating>5
Reserves             Sailors
   Cost: 500+500*1000 I/Os
(On-the-fly)
   By no means the worst plan!      Plan: sname
   Misses several opportunities:
selections could have been          bid=100   rating > 5 (On-the-fly)
`pushed’ earlier, no use is made
of any available indexes, etc.
(Simple Nested Loops)
   Goal of optimization: To find more       sid=sid
efficient plans that compute the
Reserves         Sailors
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                                    10
(On-the-fly)
Alternative Plans 1
sname

(No Indexes)                                                   sid=sid
(Sort-Merge Join)

(Scan;                                    (Scan;
write to bid=100             rating > 5   write to
temp T1)                                  temp T2)
   Main difference: push selects.
Reserves        Sailors
   With 5 buffers, cost of plan:
   Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats,
uniform distribution).
   Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
   Sort T1 (2*2*10), sort T2 (2*3*250), merge (10+250)
   Total: 3560 page I/Os.
   If we used BNL join, join cost = 10+4*250, total cost = 2770.
   If we `push’ projections, T1 has only sid, T2 only sid and sname:
   T1 fits in 3 pages, cost of BNL drops to under 250 pages, total < 2000.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                                     11
(On-the-fly)
sname

Alternative Plans 2
With Indexes
rating > 5 (On-the-fly)

   With clustered index on bid of                                            sid=sid
(Index Nested Loops,
with pipelining )
Reserves, we get 100,000/100 =
1000 tuples on 1000/100 = 10 pages.                      (Use hash
index; do    bid=100      Sailors
not write
   INL with pipelining (outer is not                         result to
temp)
materialized).                                                        Reserves

–Projecting out unnecessary fields from outer doesn’t help.
v   Join column sid is a key for Sailors.
–At most one matching tuple, unclustered index on sid OK.
v   Decision not to push rating>5 before the join is based on
availability of sid index on Sailors.
v   Cost: Selection of Reserves tuples (10 I/Os); for each,
must get matching Sailors tuple (1000*1.2); total 1210 I/Os.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                                           12
Logical and physical optimization

 Logical opt. – use equivalence rules to get the “last” equivalent
expression.
 Physical opt. – use cost estimates to select the best physical plan.
Note: the two are not independent!
The best physical may not be the best logical!

Example:
select Name from Sailors, Reserves
where Reserves.Sid = Sailors.Sid and Sailors.Rating >10
and assume both sorted on Sid and index exists on Rating.
Better not to push the select down!

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                   13
Relational Algebra Equivalences
 Allow us to choose different join orders and to
`push’ selections and projections ahead of joins.
 Selections:  c1 ...  cn  R    c1  . . .  cn  R   (Cascade)
 c1  c 2  R    c 2  c1  R         (Commute)
                                                  
Projections:  a1  R   a1 . . .  an  R                    (Cascade)

    Joins: R  (S  T)
                        (R S)  T
                        (Associative)
(R  S)  (S  R)
                                                 (Commute)

+ Show that:R  (S T)
                         (T R)  S (is T R a join?)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                    14
More Equivalences
 A projection commutes with a selection that only
uses attributes retained by the projection.
 Selection between attributes of the two arguments of
a cross-product converts cross-product to a join.
 A selection on just attributes of R commutes with
R  S. (i.e.,  (R  S)   (R)  S )
                               
 Similarly, if a projection follows a join R  S, we can

`push’ it by retaining only attributes of R (and S) that
are needed for the join or are kept by the projection.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke   15
Equivalence Rules
1. Conjunctive selection operations can be
deconstructed into a sequence of individual
selections.    ( E )    (  ( E ))
1   2          1       2

2. Selection operations are commutative.
  (  ( E ))    (  ( E ))
1       2              2     1

3. Only the last in a sequence of projection operations
is needed, the others can be omitted.
 t1 ( t2 (( tn (E ))))   t1 (E )

4.   Selections can be combined with Cartesian
products and theta joins.
a. (E1 X E2) = E1  E2
b. 1(E1 2 E2) = E1 1 2 E2

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke   16
Equivalence Rules (Cont.)

5. Theta-join operations (and natural joins) are
commutative.
E1  E2 = E2  E1
6. (a) Natural join operations are associative:
(E1 E2) E3 = E1 (E2 E3)

(b) Theta joins are associative in the following
manner:

(E1      1 E2)      2  3   E3 = E1      1 3   (E2   2   E3)

where 2 involves attributes from only E2 and E3.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                     17
Equivalence Rules (Cont.)

7. The selection operation distributes over the theta join
operation under the following two conditions:
(a) When all the attributes in 0 involve only the
attributes of one
of the expressions (E1) being joined.

0E1          E2) = (0(E1))          E2

(b) When  1 involves only the attributes of E1 and 2
involves
only the attributes of E2.
1 E1  E2) = (1(E1))  ( (E2))

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke            18
Equivalence Rules (Cont.)
8. The projections operation distributes over the
theta join operation as follows:
(a) if  involves only attributes from L1  L2:
 L1  L2 ( E1....... E2 )  ( L1 ( E1 )) ...... ( L2 ( E2 ))

(b) Consider a join E1                                  E2.
 Let L1 and L2 be sets of attributes from E1 and E2,
respectively.
 Let L3 be attributes of E1 that are involved in join
condition , but are not in L1  L2, and
 let L4 be attributes of E2 that are involved in join
condition , but are not in L1  L2.
 L1  L2 ( E1.....  E2 )   L1  L2 ((  L1  L3 ( E1 )) ...... ( L2  L4 ( E2 )))
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                                           19
Equivalence Rules (Cont.)
9.    The set operations union and intersection are
commutative
E1  E2 = E2  E1
E1  E2 = E2  E1
n (set difference is not commutative).
10.   Set union and intersection are associative.
(E1  E2)  E3 = E1  (E2  E3)
(E1  E2)  E3 = E1  (E2  E3)
11.   The selection operation distributes over ,  and –.
 (E1 – E2) =  (E1) – (E2)
and similarly for  and  in place of –
Also:            (E1  – E2) = (E1) – E2
and similarly for  in place of –, but not
for 
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke            20
Equivalence Rules (Cont.)
12.The projection operation distributes over union
L(E1  E2) = (L(E1))  (L(E2))

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke   21
Transformation Example

   Query: Find the names of all customers who have an
account at some branch located in Brooklyn.
customer-name(branch-city = “Brooklyn”
(branch (account depositor)))
   Transformation using rule 7a.
customer-name
((branch-city =“Brooklyn” (branch))
(account depositor))
   Performing the selection as early as possible reduces
the size of the relation to be joined.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke      22
Example with Multiple Transformations
   Query: Find the names of all customers with an account at
a Brooklyn branch whose account balance is over \$1000.
customer-name((branch-city = “Brooklyn”  balance > 1000
(branch        (account        depositor)))
   Transformation using join associatively (Rule 6a):
customer-name((branch-city = “Brooklyn”  balance > 1000
(branch        (account))         depositor)
   Second form provides an opportunity to apply the
“perform selections early” rule, resulting in the
subexpression
branch-city = “Brooklyn” (branch)  balance > 1000 (account)
   Thus a sequence of transformations can be useful
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke           23
Multiple Transformations (Cont.)

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke   24
Projection Operation Example
customer-name((branch-city = “Brooklyn” (branch)          account)   depositor)

   When we compute
(branch-city = “Brooklyn” (branch) account )
we obtain a relation whose schema is:
(branch-name, branch-city, assets, account-number, balance)
   Push projections using equivalence rules 8a and 8b;
eliminate unneeded attributes from intermediate results
to get:
 customer-name ((
 account-number ( (branch-city = “Brooklyn” (branch) account ))
depositor)

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                           25
Join Ordering Example

 For all relations r1, r2, and r3,
(r1 r2) r3 = r1 (r2 r3 )
 If r2 r3 is quite large and r1 r2 is small, we
choose

(r1 r2) r3
so that we compute and store a smaller
temporary relation.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke   26
Highlights of System R Optimizer

   Impact:
   Most widely used currently; works well for < 10 joins.
   Cost estimation: Approximate art at best.
   Statistics, maintained in system catalogs, used to estimate
cost of operations and result sizes.
   Considers combination of CPU and I/O costs.
   Plan Space: Too large, must be pruned.
   Only the space of left-deep plans is considered.
• Left-deep plans allow output of each operator to be pipelined into
the next operator without storing it in a temporary relation.
   Cartesian products avoided.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                   27
Cost Estimation
   For each plan considered, must estimate cost:
   Must estimate cost of each operation in plan tree.
• Depends on input cardinalities.
• We’ve already discussed how to estimate the cost of
operations (sequential scan, index scan, joins, etc.)
   Must also estimate size of result for each operation
in tree!
• Use information about the input relations.
• For selections and joins, assume independence of
predicates.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke              28
Enumeration of Alternative Plans
   There are two main cases:
   Single-relation plans
   Multiple-relation plans
   For queries over a single relation, queries consist of a
combination of selects, projects, and aggregate ops:
   Each available access path (file scan / index) is considered,
and the one with the least estimated cost is chosen.
   The different operations are essentially carried out
together (e.g., if an index is used for a selection, projection
is done for each retrieved tuple, and the resulting tuples
are pipelined into the aggregate computation).
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke        29
Choice of Evaluation Plans
   Must consider the interaction of evaluation techniques
when choosing evaluation plans: choosing the
cheapest algorithm for each operation independently
may not yield best overall algorithm. E.g.
 merge-join may be costlier than hash-join, but may provide a
sorted output which reduces the cost for an outer level
aggregation.
 nested-loop join may provide opportunity for pipelining
   Practical query optimizers incorporate elements of the
1. Search all the plans and choose the best plan in a
cost-based fashion.
2. Uses heuristics to choose a plan.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke              30
Cost-Based Optimization

   Consider finding the best join-order for r1 r2 . . . rn.
   There are (2(n – 1))!/(n – 1)! different join orders for
above expression. With n = 7, the number is 665280,
with n = 10, the number is greater than 176 billion!
   No need to generate all the join orders. Using
dynamic programming, the least-cost join order for
any subset of
{r1, r2, . . . rn} is computed only once and stored for
future use.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke         31
Dynamic Programming in
Optimization
   To find best join tree for a set of n relations:
 To find best plan for a set S of n relations,
consider all possible plans of the form: S1 (S
– S1) where S1 is any non-empty subset of S.
 Recursively compute costs for joining subsets of
S to find the cost of each plan. Choose the
cheapest of the 2n – 1 alternatives.
 When plan for any subset is computed, store it
and reuse it when it is required again, instead
of recomputing it
• Dynamic programming

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke       32
Join Order Optimization Algorithm
procedure findbestplan(S)
if (bestplan[S].cost  )
return bestplan[S]
// else bestplan[S] has not been computed earlier,
compute it now
for each non-empty subset S1 of S such that S1  S
P1= findbestplan(S1)
P2= findbestplan(S - S1)
A = best algorithm for joining results of P1 and P2
cost = P1.cost + P2.cost + cost of A
if cost < bestplan[S].cost
bestplan[S].cost = cost
bestplan[S].plan = “execute P1.plan; execute
P2.plan;
join results of P1 and P2 using A”
return bestplan[S]
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke      33
Left Deep Join Trees
   In left-deep join trees, the right-hand-
side input for each join is a relation, not
the result of an intermediate join.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke   34
Cost of Optimization
   To find best left-deep join tree for a set of n relations:
 Consider n alternatives with one relation as right-hand side
input and the other relations as left-hand side input.
 Using (recursively computed and stored) least-cost join
order for each alternative on left-hand-side, choose the
cheapest of the n alternatives.
   If only left-deep trees are considered, time
complexity of finding best join order is O(n 2n)
 Space complexity remains at O(2n)
   Cost-based optimization is expensive, but
worthwhile for queries on large datasets (typical
queries have small n, generally < 10)

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                35
Heuristic using Left-Deep Plans
   Left-deep plans differ only in the order of relations, the access
method for each relation, and the join method for each join.
   Enumerated using N passes (if N relations joined):
   Pass 1: Find best 1-relation plan for each relation.
   Pass 2: Find best way to join result of each 1-relation plan (as outer) to
another relation. (All 2-relation plans.)
   Pass N: Find best way to join result of a (N-1)-relation plan (as outer)
to the N’th relation. (All N-relation plans.)
   For each subset of relations, retain only:
   Cheapest plan overall, plus
   Cheapest plan for each interesting order of the tuples.
   Complexity: only O(N2 ) (but not necessarily optimal!)

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                       36
Interesting Orders in Cost-Based
Optimization
   Consider the expression (r1 r2 r3) r4 r5
   An interesting sort order is a particular sort order of
tuples that could be useful for a later operation.
 Generating the result of r1 r2 r3 sorted on the attributes
common with r4 or r5 may be useful, but generating it sorted
on the attributes common only r1 and r2 is not useful.
 Using merge-join to compute r1 r2 r3 may be costlier, but
may provide an output sorted in an interesting order.
   Not sufficient to find the best join order for each
subset of the set of n given relations; must find the
best join order for each subset, for each interesting
sort order
 Simple extension of earlier dynamic programming
algorithms
 Usually, number of interesting orders is quite small and
doesn’t affect time/space complexity significantly
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                  37
Heuristic Optimization
   Cost-based optimization is expensive, even with
dynamic programming.
   Systems may use heuristics to reduce the number of
choices that must be made in a cost-based fashion.
   Heuristic optimization transforms the query-tree by
using a set of rules that typically (but not in all cases)
improve execution performance:
 Perform selection early (reduces the number of tuples)
 Perform projection early (reduces the number of attributes)
 Perform most restrictive selection and join operations before
other similar operations.
 Some systems use only heuristics, others combine heuristics
with partial cost-based optimization.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke              38
Steps in Typical Heuristic
Optimization
1. Deconstruct conjunctive selections into a sequence of
single selection operations (Equiv. rule 1.).
2. Move selection operations down the query tree for the
earliest possible execution (Equiv. rules 2, 7a, 7b, 11).
3. Execute first those selection and join operations that will
produce the smallest relations (Equiv. rule 6).
4. Replace Cartesian product operations that are followed by
a selection condition by join operations (Equiv. rule 4a).
5. Deconstruct and move as far down the tree as possible
lists of projection attributes, creating new projections
where needed (Equiv. rules 3, 8a, 8b, 12).
6. Identify those subtrees whose operations can be pipelined,
and execute them using pipelining).

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke     39
Structure of Query Optimizers
   The System R/Starburst optimizer considers only
left-deep join orders. This reduces optimization
complexity and generates plans amenable to
pipelined evaluation.
System R/Starburst also uses heuristics to push
selections and projections down the query tree.
   Heuristic optimization used in some versions of
Oracle:
 Repeatedly pick “best” relation to join next
• Starting from each of n starting points. Pick best among these.
   For scans using secondary indices, some optimizers
take into account the probability that the page
containing the tuple is in the buffer.
   Intricacies of SQL complicate query optimization
 E.g. nested subqueries
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                        40
Enumeration of Plans (Contd.)
   ORDER BY, GROUP BY, aggregates etc. handled as a
final step, using either an `interestingly ordered’
plan or an addional sorting operator.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke   41
Sailors:
B+ tree on rating                             sname

Example                 Hash on sid
Reserves:
B+ tree on bid
   Pass1:                                                          sid=sid

 Sailors: B+ tree matches rating>5, and is probably cheapest.
However, if this selection is expected to                bid=500 rating>5
retrieve a lot of tuples, and index is unclustered,
file scan may be cheaper.
• Still, B+ tree plan kept (because tuples are in rating order). Reserves Sailors
   Reserves: B+ tree on bid matches bid=500; cheapest.
   So we have two plans on Sailors and one on Reserves
v   Pass 2:
– We consider each plan retained from Pass 1 as the outer, and
consider how to join it with the (only) other relation.
e.g., Reserves as outer: Hash index can be used to get Sailors tuples
that satisfy sid = outer tuple’s sid value. (only linear scan of SAILORS)
SELECT S.sname
FROM Sailors S
Nested Queries                                    WHERE EXISTS
(SELECT *
FROM Reserves R
   Nested block is optimized                                WHERE R.bid=103
independently, with the outer                            AND R.sid=S.sid)
tuple considered as providing a
selection condition.                                         Nested block to optimize:
SELECT *
   Outer block is optimized with
FROM Reserves R
the cost of `calling’ nested block
WHERE R.bid=103
computation taken into account.
AND S.sid= outer value
   Implicit ordering of these blocks
means that some good strategies                        Equivalent non-nested query:
are not considered. The non-                           SELECT S.sname
nested version of the query is                         FROM Sailors S, Reserves R
typically optimized better.                            WHERE S.sid=R.sid
AND R.bid=103
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke                          43
A comprehensive example
SELECT S.sid, COUNT(*) AS numres
FROM boats B, Reserves R, Sailors S
WHERE R.sid = S.sid AND B.bid = R.bid AND B.color = ‘red’
GROUP BY S.sid

Query finds the number of red boats reserved by each sailor
sid COUNT(*) AS numres

GROUPBY sid

Sid = sid

Bid = bid
sailors

boats         color = ‘red’                   reserves

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke             44
Assume the following Indexes
   For Reserves
 A B+tree on Sid
 A clustered B+tree on Bid
   For Boats
 Both B+tree and Hash indexes on Color
   For Sailors
 A clustered B+tree and Hash indexes on Sid

   Note, some indexes are especially contrived
for the example…
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke   45
A comprehensive example – cont.
   Pass 1 – best plan for each relation
 Reserves, sailors – file scan
 Boats – best is hash index but B-tree is used
because of interesting order.
   Pass 2 – all left deep plans
   Reserves (outer) sid boats (inner)
   Boats (outer) sid reserves (inner)
   Reserves (outer) sid sailors (inner)
   Sailors (outer) sid reserves (inner)
   Need to consider all relevant join methods:
BNL, SORT-Merge, Index,Hash
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke   46
A comprehensive example – cont.
   Some good plans:
 Boats sid reserves using index join since Bid is a
clustered index for reserves.
 Reserves sailors using sort-merge since sailors
already has clustered B tree index on sid.
   Pass 3 – consider each plan from stage 2 as an
outer. Then use sort-merge join on Sid if join
with Sailors is last, otherwise must sort result
on Sid. This since we need to do a GROUP-by
on Sid. So because of the Group-By the join
with Sailors on Sid may better be the last
one…
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke      47
The system R optimizer
Current relational query optimizers have been
greatly influenced by choices made in the
design of IBM’s system R query optimizer.
Important design choices in the system R
optimizer include:
 The use of statistics about the database instance to
estimate the cost of a query evaluation plan.
 A decision to consider only plans with binary joins in
which the inner relation is a base relation (I.e. not a
temporary relation). This heuristic reduces the
potentially very large number of alternative plans
that must be considered.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke       48
The system R optimizer

 A decision to focus optimization on the class of
SQL queries without nesting , and to treat nested
queries in a relatively as hoc way.
 A decision not to perform duplicate elimination
for projections (except as a final step in the query
evaluation when required by a DISTINCT clause).
 A model of cost that accounted for CPU costs as
well as I/O costs.
 Search the space of plans for an optimal plan
using hill-climbing.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke        49

 Index selection – general problem is NP-
complete.
 Query with many joins – AI technique like
simulated annealing. [J9]
 Parallel database machines.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke   50
Materialized Views**
   A materialized view is a view whose contents are
computed and stored.
   Consider the view
create view branch_total_loan(branch_name, total_loan)
as
select branch_name, sum(amount)
from loan
group by branch_name
   Materializing the above view would be very useful if
the total loan amount is required frequently
 Saves the effort of finding multiple tuples and
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke      51
Optimization with Materialized
views (Data Warehousing)
   Which views to materialize?

   Given a query and a set of materialized views:
 Can the query be answered using only subset of the
materialized views?
 If yes, what is the best sub-set?

   All these topic are associated with optimization in Data
warehousing!

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke       52
Query Optimization and Materialized
Views
   Rewriting queries to use materialized views:
 A materialized view v = r     s is available
 A user submits a query r s t
 We can rewrite the query as v t
• Whether to do so depends on cost estimates for the two alternative
   Replacing a use of a materialized view by the view definition:
 A materialized view v = r      s is available, but without any index on
it
 User submits a query A=10(v).
 Suppose also that s has an index on the common attribute B, and r has
an index on attribute A.
 The best plan for this query may be to replace v by r s, which can
lead to the query plan A=10(r) s
   Query optimizer should be extended to consider all above
alternatives and choose the best overall plan
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke               53
Materialized View Selection
   Materialized view selection: “What is the best set of views to
materialize?”.
   Index selection: “what is the best set of indices to create”
 closely related, to materialized view selection
• but simpler
   Materialized view selection and index selection based on typical
 Typical goal: minimize time to execute workload , subject to
constraints on space and time taken for some critical
 One of the steps in database tuning
• more on tuning in later chapters
   Commercial database systems provide tools (called “tuning
assistants” or “wizards”) to help the database administrator
choose what indices and materialized views to create
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke               54
Materialized View Maintenance
   The task of keeping a materialized view up-to-date with the
underlying data is known as materialized view maintenance
   Materialized views can be maintained by recomputation on every
update
   A better option is to use incremental view maintenance
 Changes to database relations are used to compute changes to
the materialized view, which is then updated
   View maintenance can be done by
 Manually defining triggers on insert, delete, and update of each
relation in the view definition
 Manually written code to update the view whenever database
relations are updated
 Periodic recomputation (e.g. nightly)
 Above methods are directly supported by many database systems
• Avoids manual effort/correctness issues
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke               55
Summary
   Query optimization is an important task in a relational DBMS.
   Must understand optimization in order to understand the
performance impact of a given database design (relations,
indexes) on a workload (set of queries).
   Two parts to optimizing a query:
 Consider a set of alternative plans.
• Must prune search space; typically, left-deep plans only.
 Must estimate cost of each plan that is considered.
• Must estimate size of result and cost for each plan node.
• Key issues: Statistics, indexes, operator implementations.
   Using materialized views (Data Warehousing)

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke     56

```
To top