Docstoc

Query Optimization II.ppt

Document Sample
Query Optimization II.ppt Powered By Docstoc
					Query Optimization

      R&G, Chapter 15
           Lecture 16
  Administrivia

• Homework 3 available today
   – Written exercise; will be posted on class website
   – Due date: Tuesday, March 20 by end of class period

• Homework 4 available later this week
   – Implement nested loops and hash join operators for
     minibase
   – Due date: April 5 (after Spring Break)

• Midterm 2 is 3/22, 2 weeks from today
   – In class, covers lectures 10-17
   – Review will be held Tuesday 3/20 7-9 pm 306 Soda Hall
      Review
                           Now you are here
                           •Query plans are a tree of operators that
                           compute the result of a query
   Query Optimization      •Optimization is the process of picking the
                           best plan
     and Execution
                           •Execution is the process of executing the
  Relational Operators     plan

Files and Access Methods    You were here

  Buffer Management        •Operators are the building blocks for
                           computing results of queries
Disk Space Management          •Sort
                               •Project
                               •Join
                               •Filter
          DB                   •Access methods for files
                               •...
             Query Plans: turning text into tuples
                                                      Query Result
 Query                                                                         Shift
                                                       Name
SELECT A.aname, max(F.feedingshift)
FROM Animals A, Feeding F                              Aslan                      3
WHERE A.aid = F.aid AND                                Bageera                    3
(A.species = 'Big Cat' or A.species = 'Bear')
GROUP BY A.aname                                       Elsa                       3
HAVING COUNT(*) > 1
                                                       Shere Khan                 3
                                                       Tigger                     2

                                                     Operators
                         Query Plan



                    10    2   1   100   3       40    100     Aslan     Big Cat
                                                50    300     Baloo     Bear
                    10    3   2   100   3
                                                60    100     Bageera   Big Cat
                    20    3   2   100   3
                                                              Shere
                    20    2   3   100   3       70    100               Big Cat
                                                              Khan
                    30    3   2   100   3       90    100     Dumbo     Elephant
                    …     …   …   …     …       …     …       …         …
    Operator Review

•   Access Path : pulls tuples from tables
     – File scans
     – Index scans (clustered or unclustered)
     – Index-only scans
•   Select (or Filter): conditionally excludes
    tuples
     – Can be ‘pushed’/combined with Access Path
       operator
         • Use indexes where possible and apply other
           predicates on the result
     – Can also be applied at intermediate point in
       query plan
•   Projection: removes columns and duplicates
     – Column projection often done by operators
     – Duplicate elimination via Sort or Hash
    Operator Review

• Sort: sorts tuples in a particular order
   – Simple merge sort
   – General external merge sort (with
     various optimizations)
   – B+ tree traversal
• Join: combine tuples from 2 other
  operators
   – Page nested loops
   – Block nested loops
   – Index nested loops
   – Sort-merge join
   – Hash-join
• Other operators for
   – Group By, Temping, …
   Query Optimization steps
       SELECT A.aname, max(F.feedingshift)
       FROM Animals A, Feeding F
       WHERE A.aid = F.aid AND                                      Query parser
       (A.species = 'Big Cat' or A.species = 'Bear')
       GROUP BY A.aname
       HAVING COUNT(*) > 1
                                                                Block 3
1. Parse query from text to                                                 Block 2
                                                                                       Block 1
   ‘intermediate model’
2. Traverse ‘intermediate model’ and
   produce alternate query plans
                                                                    Query optimizer
   –   Query plan = tree of relational
       operators
   –   Optimizer keeps track of cost and
       properties of plans
3. Pick the cheapest plan
4. Pass cheapest plan on to query                      Cost = 200         Cost = 150         Cost = 500

   execution engine to execute and
   produce results of query                                                       To execution engine
     Query Blocks: Units of Optimization
• Intermediate model is a set of      SELECT A.aname, max(F.feedingshift)
  query blocks                        FROM Animals A, Feeding F
                                      WHERE A.aid = F.aid AND (A.species = 'Big Cat'
   – 1 per                            or A.species = 'Bear')
     SELECT/FROM/WHERE/GROUP          GROUP BY A.aname
                                      HAVING COUNT(*) > 1
     BY/HAVING clause
                                                                           Query Block

 • Subqueries produce                         SELECT S.sname
   nested query blocks                        FROM Sailors S
    – treated as calls to a                   WHERE S.age IN
      subroutine, made once                     (SELECT MAX (S2.age)
      per tuple produced by                      FROM Sailors S2
                                                 GROUP BY S2.rating)
      outer query block
    – sometimes subqueries can
      be rewritten to produce

                                                     X
                                 Outer Query Block

      cheaper plan
                                                                           Rewritten Query Block
                                                      Nested Query Block
   Query blocks are optimized 1 at a time

1. Convert block to relational algebra
   tree
2. Traverse tree and build plan bottom
   up:                                               Query Block


    • Pick best access method for each
       relation in FROM clause
       •   Applying predicates if possible
   •   Consider all join trees
       •   All ways to join relations in FROM
           clause 1-at-a time
       •   Consider multiple permutations and join
           methods
            – But not all! too many choices
            – Restrict to left-deep plans
            – Prune bad plans along the way
       Converting Query Blocks to Relational
       Algebra Trees
 • SQL is relationally complete; can express
   everything in relational algebra
 SELECT S.sname
 FROM Reserves R, Sailors S
 WHERE R.sid=S.sid AND
   R.bid=100 AND S.rating>5
                                                                   sname



                                                           bid=100    rating > 5
(sname)(bid=100  rating > 5) (Reserves  Sailors)

                                                               sid=sid


                                                        Reserves           Sailors
  SQL extends Relational Algebra
• SQL is more powerful than relational algebra
   – extend relational algebra to include aggregate
     ops: GROUP BY, HAVING

• How is this query block expressed?
   SELECT S.sname
   FROM Sailors S
   WHERE S.age IN (constant set from subquery)

                 σ
          Πsname(σ(age in set from subquery) Sailors
                                             Sailors)
• And this query block?
   SELECT MAX (S2.age)
       FROM Sailors S2
       GROUP BY S2.rating
                                       ΠMax(age)(GroupByRating(Sailors) )
                                                 GroupBy Rating(Sailors)
   Why optimize?
                               sname



                       bid=100    rating > 5




                           sid=sid


                    Reserves           Sailors




• Operators have implementation choices
  – Index scan? File scan? Nested loop join? Sort merge?
• Operators can also be applied in different order!
       Motivating Example --
       Schema used
     Sailors (sid: integer, sname: string, rating: integer, age: real)
     Reserves (sid: integer, bid: integer, day: dates, rname: string)

• As seen in previous lectures…
• Reserves:
   – Each tuple is 40 bytes long, 100 tuples per page, 1000
     pages.
   – Assume there are 100 boats
• Sailors:
   – Each tuple is 50 bytes long, 80 tuples per page, 500 pages.
   – Assume there are 10 different ratings
• Assume we have 5 pages in our buffer pool!
         Motivating Example
  SELECT S.sname
                                                             (On-the-fly)
  FROM Reserves R, Sailors S                 sname

  WHERE R.sid=S.sid AND
    R.bid=100 AND S.rating>5                    rating > 5   (On-the-fly)
                                      bid=100

• Cost: 500+500*1000 I/Os
• Not the worst plan, but…                          (Page-Oriented
• Misses several opportunities:   Plan:     sid=sid Nested loops)
   – selections could have been
     `pushed’ earlier,
                                  Sailors            Reserves
   – indexes might have been                          1000
                                   500
     helpful….
• Goal of optimization: To find
  more efficient plans that
  compute the same answer.
 Selectivity calculation

• Sailors: 500 pages, 80 tuples per page, 10
  ratings
• Selectivity of S.rating > 5?
  – ½ -> 500*80/2 = 20,000 tuples
  – 20,000/80 = 250 pages
• Reserves: 1000 pages, 100 tuples per page,
  100 boats
• Selectivity of R.bid = 100?
  – 1/100 -> 1000*100/100 = 1000 tuples
  – 1000/100 = 10 pages
          Alternative Plans – Push Selects
          (No Indexes)
                                                                                    (On-the-fly)
                                                                   sname

                             (On-the-fly)
            sname
                                                                   bid=100     (On-the-fly)

      bid=100   rating > 5   (On-the-fly)
                                                                          (Page-Oriented
                                                                  sid=sid Nested loops)
                   (Page-Oriented                     250                    1000
           sid=sid Nested loops)
                                                  rating > 5
500                   1000                                     (On-the-fly) Reserves

Sailors             Reserves
                                              500 Sailors


   500,500 IOs                              500 + 250 *1000 =       250,500 IOs
             Alternative Plans – Push Selects
             (No Indexes)
                              (On-the-fly)
                 sname
                                                                           (On-the-fly)
                                                             sname

                 bid=100   (On-the-fly)
                                                                    (Page-Oriented
                                                            sid=sid Nested loops)
                                               250                      10
                        (Page-Oriented
                sid=sid Nested loops)          rating > 5            bid = 100
  250                      1000
                                                    (On-the-fly)             (On-the-fly)
rating > 5
             (On-the-fly) Reserves
                                          500 Sailors                Reserves    1000

Sailors

     250,500 IOs
                                     500 + 250 *1000 =      250,500 IOs
             Alternative Plans – Try different join
             order
                              (On-the-fly)                                    (On-the-fly)
                 sname                                         sname




                 bid=100                                     rating > 5   (On-the-fly)
                           (On-the-fly)



                        (Page-Oriented                               (Page-Oriented
                sid=sid Nested loops)                        sid=sid Nested loops)
                                                      10
                 swap                                                       500
rating > 5                                      bid=100                   Sailors
             (On-the-fly) Reserves                        (On-the-fly)


Sailors                                   1000 Reserves

    250,500 IOs                      1000 + 10 *500=         6000 IOs
            Alternative Plans – Push Selects and
            precompute result (No Indexes)
                                  (On-the-fly)
                 sname

                                                                             (On-the-fly)
                                                             sname
               rating > 5   (On-the-fly)

                                                                    (Page-Oriented
                                                            sid=sid Nested loops)
                       (Page-Oriented              10                    250
               sid=sid Nested loops)                                                (Scan &
       10                   500                  bid=100               rating > 5   Write to 250
                                                        (On-the-fly)                temp T2)
 bid=100                    Sailors
            (On-the-fly)

                                         1000 Reserves                   Sailors      500
Reserves
1000
             6000 IOs              1000 + 500+ 250 + (10 * 250) = 4250 IOs
         Alternative Plans – Try different join
         order

                             (On-the-fly)                                      (On-the-fly)
               sname                                          sname



                      (Page-Oriented                                 (Page-Oriented
              sid=sid Nested loops)                          sid=sid Nested loops)
       10                                          250                    10
                          250
              swap                  (Scan &                                       (Scan &
   bid=100             rating > 5   Write to     rating>5               bid=100   Write to 10
        (On-the-fly)                temp T2)             (On-the-fly)             temp T2)
                                      250
 Reserves                Sailors               Sailors                   Reserves
                           500                                                      1000
1000                                            500
             4250 IOs

                                     500 + 1000 +10 +(250 *10) = 4010 IOs
          Optimized query is 124x cheaper
          than the original!
                           (On-the-fly)                                 (On-the-fly)
           sname                                         sname



                                                                (Page-Oriented
    bid=100   rating > 5   (On-the-fly)
                                                        sid=sid Nested loops)

                                                                             (Scan &
                                            rating>5               bid=100   Write to
                  (Page-Oriented
          sid=sid Nested loops)                     (On-the-fly)             temp T2)


                                          Sailors                   Reserves
Sailors            Reserves


                                                              4010 IOs
   500,500 IOs
                                                              (On-the-fly)
         More Alternative Plans
                                                          sname



         (No Indexes)                                 sid=sid
                                                                  (Sort-Merge Join)


                                       (Scan;                                  (Scan;
• Main difference:                                                             write to
                                       write to bid=100           rating > 5
                                       temp T1)                                temp T2)
  Sort Merge Join
                                               Reserves           Sailors
• With 5 buffers, cost of plan:
  – Scan Reserves (1000) + write temp T1 (10 pages, if we
     have 100 boats, uniform distribution) = 1010.
  – Scan Sailors (500) + write temp T2 (250 pages, if have 10 ratings) =
     750.
  – Sort T1 (2*2*10) + sort T2 (2*4*250) + merge (10+250) = 2300

   – Total: 4060 page I/Os.
• If use BNL join, join = 10+4*250, total cost = 2770.
• Can also `push’ projections, but must be careful!
   – T1 has only sid, T2 only sid, sname:
   – T1 fits in 3 pgs, cost of BNL under 250 pgs, total < 2000.
                                                                          (On-the-fly)

          More Alt Plans: Indexes
                                                                sname


• With clustered index on bid
                                                                              (On-the-fly)
                                                                rating > 5
  of Reserves, we get
  100,000/100 = 1000 tuples
                                                                        (Index Nested Loops,
                                                           sid=sid with pipelining )
  on 1000/100 = 10 pages.                   (Use hash
                                            Index, do
• INL with outer not                        not write
                                                      bid=100       Sailors
  materialized.                             to temp)
    – Projecting out unnecessary fields from outer   Reserves
        doesn’t help.
   Join column sid is a key for Sailors.
     –At most one matching tuple, unclustered index on sid OK.
   Decision not to push rating>5 before the join is based on
    availability of sid index on Sailors.
   Cost: Selection of Reserves tuples (10 I/Os); then, for each,
    must get matching Sailors tuple (1000*1.2); total 1210 I/Os.
  What is needed for optimization?
• A closed set of operators
   – Relational ops (table in, table out)
   – Encapsulation based on iterators
• Plan space, based on
   – Based on relational equivalences, different
     implementations
• Cost Estimation, based on
   – Cost formulas
   – Size estimation, based on
      • Catalog information on base tables
      • Selectivity (Reduction Factor) estimation
• A search algorithm
   – To sift through the plan space based on cost!

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:6
posted:5/16/2012
language:
pages:24
wangnuanzg wangnuanzg http://
About