Aung S. Win
Query processing and optimization.
Static versus dynamic query optimization.
How a query is decomposed and
How to create a relational algebra tree to
represent a query.
The rules of equivalence for the relational
Heuristic transformation rules.
The types of database statistics required to
estimate the cost of operations.
The different strategies for implementing the
relational algebra operations.
The difference between materialization and
The advantages of left-deep trees.
The activities involved in retrieving data from
The aims of query processing
(1)to transform a query written in a high-level
language into a low-level language
(2)to execute the strategy to retrieve the
Query processing can be divided into four
main phases: decomposition, optimization,
code generation, and execution.
The aims of query decomposition
(1)to transform a high-level query into a
relational algebra query.
(2)to check that the query is syntactically and
The typical stages of query decomposition
are analysis, normalization, semantic
analysis, simplification, and query
The query is lexically and syntactically
analyzed using the techniques of
programming language compilers.
Verifies that the relations and attributes
specified in the query are defined in the
Verifies that any operations applied to
database objects are appropriate for the
On completion of the Root
analysis, the high-level
query has been
transformed into some
internal representation Intermediate operations
(query tree) that is
more suitable for
Converts the query into a normalized form
that can be more easily manipulated.
There are two different normal forms,
conjunctive normal form and disjunctive
Conjunctive normal form
A sequence of conjuncts that are connected
with the and operator. Each conjunct
contains one or more terms connected by the
(position=‘Manager’ V salary>20000) ^
branchNo = ‘B003’
Disjunctive normal form
A sequence of disjuncts that are connected
with the or operator. Each disjunt contains
one or more terms connected by the and
(position=‘Manager’ ^ branchNo = ‘B003’) V
(salary>20000 ^ branchNo = ‘B003’)
The objective is to reject normalized queries
that are incorrectly formulated or
To detect redundant qualifications, eliminate
common subexpressions , and transform the
query to a semantically equivalent but more
easily and efficiently computed form.
Access restrictions, view definitions, and
integrity constraints are considered at this
The final stage of query decomposition.
The query is restructured to provide a more
The activity of choosing an efficient execution
strategy for processing a query.
An important aspect of query processing is
The aim of query optimization is to choose
the one that minimizes resource usage.
Every method of query optimization depend on
The statistics cover information about relations,
attribute, and indexes.
Keeping the statistics current can be problematic.
If the DBMS updates the statistics every time a tuple
is inserted, updated, or deleted, this would have a
significant impact on performance during peak
An alternative approach is to update the
statistics on a periodic basis, for example
nightly, or whenever the system is idle.
Dynamic query optimization
Advantage: all information required to select
an optimum strategy is up to date.
Disadvantage: the performance of the query
is affected because the query has to be
parsed, validated, and optimized before it
can be executed.
Static query optimization
The query is parsed, validated, and
optimized once that is similar to the approach
taken by a compiler for a programming
1)The runtime overhead is removed
2)More time available to evaluate a larger
number of execution strategies.
Disadvantage: the execution strategy that is
chosen as being optimal when the query is
compiled may no longer be optimal when the
query is run.
Transformation Rules for the
Relational Algebra Operations
By applying transformation rules, we can
transform one relational algebra into an
equivalent expression that is more efficient.
There are twelve rules that can be used to
restructure the relational algebra tree
generated during query decomposition.
Many DBMSs use heuristics to determine strategies
for query processing.
Heuristics rules include
-performing Selection and Projections as early as
-combining Cartesian product with a subsequent
selection whose predicate represents a join
condition into a join operation.
-using associativity of binary operations to
rearrange leaf nodes so that leaf nodes with
the most restrictive Selections are executed
Depends on statistical information held in the
Typical statistics include the cardinality of
each base relation, the number of blocks
required to store a relation, the number of
distinct values for each attribute, the
selection cardinality of each attribute, and the
number of levels in each multilevel index.
Block nested loop join
Indexed nested loop join
In materialization, the output of one operation is
stored in a temporary relation for processing by the
An alternative approach is to pipeline the results of
one operation to another operation without creating a
temporary relation to hold the intermediate result.
By using it, we can save on the cost of creating
temporary relations and reading the results back in
Left – deep trees
A relational algebra tree where the right-hand
relation is always a base relation.
Advantages: reducing the search space for
the optimum strategy and allowing the query
optimizer to be based on dynamic processing