Docstoc

On the fly

Document Sample
On the fly Powered By Docstoc
					Determining the Cost of Algebraic Operators

   Background: Memory, Files, Indexes

               External Sort                  Selection

                                                Join
      Evaluating Individual Operators
                                               Other

Finding Efficient Query Plans

            Optimizer Architecture

         Cost of Combining Operators

              Enumerating Plans
                                                          1
Relational Query Optimization


 Evaluating Queries with Several
           Operators


                                   2
                   Topic

• Until now, we have seen how to optimize
 queries with a single operator
• Most queries have several operators
• We now discuss how queries with several
 operators can be optimized




                                            3
      Simplest Way to Implement
          Complex Queries
• Evaluate operators, one at a time
• After each evaluation, write temporary
 results to disk
• Read temporary results from disk, as input to
 the next operator




                                              4
         Simplest Way to Implement
             Complex Queries
• Example: SELECT             S.sname
               FROM           Sailors S, Reserves R
               WHERE          S.sid = R.sid and
                              R.bid = 100 and S.rating>5

     sname(bid=100 and rating>5(Reserves   Sailors))

• Compute join and write result T1 to disk
• Read T1, compute selection, and write result T2 to
  disk
• Read T2, compute projection, and return to user
                                                           5
        Simplest Way to Implement
            Complex Queries
     sname(bid=100 and rating>5(Reserves           Sailors))
• Compute join and write result T1 to disk
• Read T1, compute selection, and write result T2 to disk
• Read T2, compute projection, and return to user
• A query plan:                sname



                        bid=100   rating > 5   File scan, write to T2



                                        Block nested loops, write to T1
                            sid=sid


                    Reserves           Sailors
                                                                          6
          Simplest Way to Implement
              Complex Queries
       about it:
  Thinksname(bid=100 and rating>5(Reserves Sailors))
• 1) Does this seem write result T1 to disk
   Compute join and efficient?
     No. join and select can be calc together, can print
• Read namecompute selection, and write result T2 to disk
     the T1, while choosing it.
   Read T2, use an index for the selection? No.
• 2) Can wecompute projection, and return to user
     the selection is for temp table that has no index.
• 3) querywould we implement the projection?
   A How plan:           sname
     as we said in last part (last slide).
                            bid=100   rating > 5   File scan, write to T2



                                           Block nested loops, write to T1
                                sid=sid


                       Reserves           Sailors
                                                                             7
Query Optimization: Basic Architecture

     Query Parser



  Query Optimizer

    Plan        Plan Cost      Catalog
  Generator     Estimator      Manager




   Query Plan Evaluator
                                         8
               Simplifications
• SQL Queries may be composed of several blocks
• Each block is optimized separately
• In our discussion, we assume a single block
    SELECT  S.sid, MIN(R.day)
    FROM    Sailors S, Reserves R, Boats B
    WHERE   S.sid = R.sid and R.bid = B.bid
            and B.color = ‘red’ and
            S.rating = (SELECT MAX(rating)
                        FROM Sailors)
    GROUP BY S.sid
    HAVING   count(*)>2                           9
                  Simplifications
• Queries may contain group-by and aggregation
• Are applied as a final step in evaluation (how? Hash/ Sort)
• We only consider queries without aggregation


     SELECT  S.sid, MIN(R.day)
     FROM    Sailors S, Reserves R, Boats B
     WHERE   S.sid = R.sid and R.bid = B.bid
             and B.color = ‘red’ and
             S.rating = (SELECT MAX(rating)
                         FROM Sailors)
     GROUP BY S.sid
     HAVING   count(*)>2                                        10
                Simplifications
• By making the simplifications discussed, we derive
  queries that can be translated into relational
  algebra. We show how to evaluate such queries.

    SELECT  S.sid, MIN(R.day)
    FROM    Sailors S, Reserves R, Boats B
    WHERE   S.sid = R.sid and R.bid = B.bid
            and B.color = ‘red’ and
            S.rating = (SELECT MAX(rating)
                        FROM Sailors)
    GROUP BY S.sid
    HAVING   count(*)>2                                11
Query Plans




              12
        Translating to Algebra Trees
SELECT          S.sname
FROM            Sailors S, Reserves R
WHERE           S.sid = R.sid and
                R.bid = 100 and S.rating>5



sname(bid=100 and rating>5(Reserves   Sailors))              sname



                                                       bid=100    rating > 5




                                                           sid=sid


                                                    Reserves           Sailors
                                                                               13
                             Query Plans
• A query plan is a relational algebra tree, annotated
  with access and evaluation methods
   – Convention: In trees when using block/index nested
     loops join, left child is outer relation
                                                                (On-the-fly)
                                               sname
           sname


                                       bid=100     rating > 5     (On-the-fly)
   bid=100    rating > 5



                                                        (Block Nested Loops)
       sid=sid                                sid=sid



Reserves           Sailors         Reserves             Sailors                  14
                           Alternative Plans

 • There may be many different query plans for the
    same query. Query optimizer must:
     – enumerate such plans
     – choose the best plan
                                                                                (On-the-fly)
                                                               sname
                       (On-the-fly)
                   sname


                                                       bid=100     rating > 5     (On-the-fly)
                           (Sort-Merge Join)
               sid=sid

(Scan;                                  (Scan;                          (Block Nested Loops)
write to bid=100           rating > 5   write to
temp T1)                                temp T2)              sid=sid

        Reserves           Sailors
                                                   Reserves             Sailors                  15
    ‫‪Pipelined Evaluation‬‬


     ‫”‪Evaluation “On the Fly‬‬
‫פעולת ביניים לא נכתבת אל הדיסק אלא מועברת‬
           ‫ישירות לפעולה הבאה‬


                                            ‫61‬
                      Pipelining

• When evaluating a query, the results of one
  operator may have to be fed into another operator.
  This can be done in two ways:
  – Materializing: the first operator creates a temporary
    relation, which is read by the second operator
  – Pipelining (on the fly computation): the results of the
    first operator are fed directly into the second operator

• Pipelining can save on the I/O costs!
  – We save the cost of writing a temporary result to disk!
  – We save the cost of reading for the following operation!   17
    Pushing Selections/Projections
                                                                                                             Why?
            and Pipelining
                                       Can Push through                               Can Pipeline
                                       Selection/Projection                           into?
    Block NL, Outer                                                                                 

    Block NL, Inner                                                                                X

    Index NL, Outer                                                                                 

    Index NL, Inner                                         X                                       X

    Sort Merge                                                                                     X
•   BNLO: R is read once. While reading check the selection cond. If it does add it to the projectile. Without writing to the
    disk.
•   INLO: same.
                                                                                                                                18
•   When both possible it is always better to push and pipeline
     When is Pipelining Impossible

• Impossible if:
  – Input must be read several times
     • Block nested loops join – inner relation

  – Input must be read via an index
     • Index nested loops join – inner relation

  – Input must be read in a specific order (and the
    previous operator does not output in that order
     • Sort merge join

                                                      19
       Example 1: Pipelining in Unary
                Operators
• Suppose we want to apply two selections on the
  table Reserves: date<05/05/05 and bid=100. We
  have a BTree index on date:
  – Materializing: Find tuples matching date condition using
    BTree. Store in temporary relation. Scan relation to find
    tuples matching bid condition
  – Pipelining: Find tuples matching date condition using
    BTree. As tuples are retrieved, apply second condition
• Pipelining into unary operators is called on-the-fly
  evaluation
                                                             20
     Example 2: Pipelining in Joins

• Suppose we want to compute (A          B)    C
• Materializing: Compute A         B, store and then read
  and join with C
• Pipelining: The joins can be pipelined if some type
  of nested loops join is used.
  – Read blocks of tuples from A
  – Find matching tuples from B (using a scan or index)
  – Find matching tuples from C
                                                       C

                                           A       B       21
       Pipelining Joins: Questions

• Can the execution of the join be pipelined if
 the join with C is performed using sort-merge
 join?
  – No.
  – Example: R(A, B), S(B, C), T(C, D)
  – Sort-merge (R, S) will be sorted by B
  – Sort-merge (S, T) will be sorted by C
                                                               C
  – Therefore C must be read from the file,
  Meaning it must have been written to the file
                                                  A   B   22
       Fully Pipelined Execution

• The execution is fully pipelined if no
 temporary relations are written to disk
 throughout the entire query processing




                                           23
        Pipelining Joins: Questions

• Which of the following trees can allow for
    fully pipelined execution (e.g., using a
    nested loops join)? Middle one.

                                        D           D

                                    C       C

A          B   C        D   A   B               A   B




                                                        24
                     Left Deep Plans

• A plan is left-deep if the inner relation of each join
    (i.e., the right child of each join operator) is a base
    relation. Which of the following are left deep?


                                             D                D

                                         C       C

A           B    C          D   A    B               A     B



• Left deep plans allow for fully pipelined execution
                                                               25
Alternative Plans


Motivating Example




                     26
               Motivating Example
• We will consider the cost of various plans for the
  following query.
  – NOTE: No distinct in SELECT clause!
     SELECT       S.sname
     FROM         Sailors S, Reserves R
     WHERE        S.sid = R.sid and
                  R.bid = 100 and S.rating>5

                Tuple size Tuples per   Number of
                           block        blocks
     Reserves 40           100          1000
     Sailors    50         80           500            27
                           Tuple size Tuples per           Number of
                                         block             blocks
      Reserves 40                        100               1000
      Sailors              50            80                500

• What is the cost of this plan, if
                                                                            (On-the-fly)
                                                           sname
  the buffer is of size 5?
    BR + BS * BR / (B-2)
    1000+500*1000/3= 168,000                       bid=100     rating > 5     (On-the-fly)
• What would you suggest to
  change in this plan?
    Switch between the join and the select                          (Block Nested Loops)
                                                          sid=sid
    Switch between S and R
    Do the projection early
                                               Reserves             Sailors                  28
    Alternative Plan: Pushing Selections

• Selections generally reduce the sizes of the relations
• Often a good strategy to push the selections, i.e., apply
  early, so that the join is over smaller relations
• Which selections could be
                                                                (On-the-fly)
                                               sname
  pushed in this query?

                                       bid=100     rating > 5     (On-the-fly)



                                                        (Block Nested Loops)
                                              sid=sid



                                   Reserves             Sailors                  29
               Tuple size Tuples per           Number of
                              block            blocks
    Reserves 40               100              1000
    Sailors    50             80               500



• This is an plan has                                     (On-the-fly)
                                                      sname


 fully-pushed                                                 (Block NL Join)
                                                  sid=sid
 selections
                                   (Scan;                                  (Scan;
                                   write to bid=100           rating > 5   write to
 ‫כל הבחירות נעשות כמה שיותר‬        temp T1)                                temp T2)

                     ‫מוקדם‬                 Reserves           Sailors

                                                                                 30
               Tuple size Tuples per           Number of
                              block            blocks
    Reserves 40               100              1000
    Sailors    50             80               500


• Find cost, assuming:                                    (On-the-fly)
                                                      sname
  – 5 buffer pages
  – 100 different boats,                                      (Block NL Join)
                                                  sid=sid
    uniform distribution on
                                   (Scan;                                  (Scan;
    bid                            write to bid=100
                                   temp T1)
                                                              rating > 5   write to
                                                                           temp T2)
  – rating between 1 and                   Reserves           Sailors
    10, uniformly distributed                                                    31
                         Tuple size Tuples per                   Number of
                                               block             blocks
      Reserves 40                              100               1000
      Sailors            50                    80                500


• Finding cost:                                                             (On-the-fly)
                                                                        sname
                             1000 :reserve ‫קריאת‬
   10 ‫ הוא כ 001/0001. לכן עולה כ‬T1 ‫: גודל‬T1 ‫כתיבת‬
                                500 :salors ‫קריאת‬                               (Block NL Join)
                                                                    sid=sid
  250 ‫ תעלה כ‬T2 ‫: 2/005 = 052. לכן כתיבת‬T2 ‫כתיבת‬
       Block Nested Loop: 10 + 250*10/3 = 1010       (Scan;                                  (Scan;
 1000+10+500+250+1010 = 2770 :‫סה"כ העלות היא‬         write to bid=100           rating > 5   write to
                                                     temp T1)                                temp T2)

                                                             Reserves           Sailors

                                                                                                   32
      Pushing Selections/Projections
             and Pipelining
                  Can Push through       Can Pipeline
                  Selection/Projection   into?
Block NL, Outer                                 

Block NL, Inner                                 X

Index NL, Outer                                 
                                       If we tried to push
Index NL, Inner             X        selection/projection we
                                                 X
                                      would no longer have
Sort Merge                                      X
                                             an index


                                                               33
               Alternative Plan: Pushing
                      Projections
• Projections also reduce the sizes of the relations
• Often a good strategy to push the projections, i.e., apply
  early, so that the join is over smaller relations
• Which projections could be
                                                                (On-the-fly)
                                               sname
  pushed in this query?
• Note: in general, duplicate                      rating > 5     (On-the-fly)
                                       bid=100

  elimination is performed
  only in the outermost                                 (Block Nested Loops)
                                              sid=sid
  projection
                                   Reserves             Sailors                  34
             Tuple size Tuples per         Number of
                        block              blocks
   Reserves 40          100                1000
   Sailors   50         80                 500



• This is an plan has                                  (On-the-fly)
                                                   sname


 fully-pushed                                              (Block NL Join)
                                               sid=sid
 selections and          (On-the-fly;                             (On-the-fly;
                         write to T1)    sid                      write to T2)
                                                            sname,sid
 projections                 (Scan)                                     (Scan)
                                         bid=100           rating > 5


                                        Reserves           Sailors               35
                           Tuple size Tuples per                Number of
                                             block              blocks
         Reserves 40                         100                1000
         Sailors           50                80                 500

     sid, ‫, ונניח‬byte 10 ‫ תופס‬sid ‫נניח ש‬
               byte 25 ‫ תופס‬sname                                           (On-the-fly)
                                                                        sname
                                ‫חישוב עלות‬
                 1000 :Reserve ‫קריאת‬
                                                                                (Block NL Join)
)‫: 3 = )4*001(/0001 (ערך עליון‬T1 ‫כתיבת‬                              sid=sid
                                              (On-the-fly;                             (On-the-fly;
                    500 :Salors ‫קריאת‬         write to T1)    sid                      write to T2)
                                                                                 sname,sid
            500/(2*2) = 125 :T2 ‫כתיבת‬
                                                  (Scan)      bid=100           rating > 5   (Scan)
        3 + 125*3/3 = 128 :BNL ‫חישוב‬
 1000+3+500+125+128 =   1756 :‫סה"כ העלות‬                     Reserves           Sailors               36
    Alternative Plan: Using Indexes

• In there are indexes, then it may be
 preferable to access one (or more) of the
 relations via the indexes, instead of by using
 a sequential scan
• Suppose we have a clustered hash index on
 bid of Reserves and a hash index on sid of
 Sailors

                                                  37
            Tuple Plan: Using Indexes
     Alternativesize Tuples per Number of
                            block                blocks
    Reserves 40             100                  1000
    Sailors    50           80                   500

• Find cost, assuming                                  sname
                                                                 (On-the-fly)



  that access to hash
                                                       rating > 5 (On-the-fly)

  table indices costs 1.2
                                                                (Index Nested Loops,
  I/Os                                               sid=sid    with pipelining )



• Is the selection fully            (Use hash
                                     index; do
                                     not write
                                                  bid=100      Sailors

                                     result to
  pushed? No. Is this                temp)
                                                 Reserves

  good? No (index).                                                                    38
               Tuple Plan: Using Indexes
        Alternativesize Tuples per Number of
                                           block                   blocks
        Reserves 40                        100                     1000
        Sailors         50                 80                      500

                                       ‫חישוב עלויות‬                                (On-the-fly)
                                                                         sname
BR+TR(time to find matching raws in S) :INL ‫עלות‬
  ‫ היא בזמן הבחירה, לכן אין צורך לקרוא‬R ‫עלות קריאת‬                       rating > 5 (On-the-fly)
                          .INL ‫אותו שוב בשלב ה‬
1.2 + 1000*100/(100*100) = 11.2 :Reserve ‫קריאת‬                                    (Index Nested Loops,
                                                                       sid=sid    with pipelining )
  bid=100 ‫חלוקה אחת ב 001 כי בהסתברות 10.0 נקבל‬
        .‫חלוקה שניה ב 001 היא בגלל שהאינדקס מכווץ‬     (Use hash
                                                       index; do    bid=100      Sailors
                                                       not write
                           1000*(1.2+1) :INL ‫עלות‬      result to
                                                       temp)
                               2211.2 :‫סה"כ העלות‬                  Reserves

                                                                                                         39
Determining the Cost of Algebraic Operators

   Background: Memory, Files, Indexes

               External Sort                  Selection

                                                Join
      Evaluating Individual Operators
                                               Other

Finding Efficient Query Plans

            Optimizer Architecture

         Cost of Combining Operators

              Enumerating Plans
                                                          40
    Calculating the Costs: Block NL
• N is number of blocks in the outer relation
• M is number of blocks in the inner relation
• B is number of buffer blocks
• Cost, selections/projections are not pipelined: N +
  M(N/(B-2))
• Cost, selections/projections are pipelined on outer
  relation: M(N/(B-2))
• Notes: If selections/projections are pushed through
  the join, then N and M are the sizes of the relations
  AFTER the selections/projections
                                                        41
     Calculating the Costs: Index NL
• N is number of blocks in the outer relation
• T is number of tuples in the outer relation
• S is the cost of traversing the index
• Tm is number of tuples in the inner relation that match each
  tuple in the outer relation
• P is number of tuples in the inner relation that fit into a page
• Notes: If selections/projections are pushed through the join,
  then N and T are the sizes AFTER the selections/projections
                     Index clustered       Index unclustered
   No pipelining     N+T(S+Tm/P)           N+T(S+Tm)
   Pipelining on     T(S+Tm/P)             T(S+Tm)
   outer relation                                                42
      Calculating the Costs: Sort-Merge Join
• N is number of blocks in the outer relation
• M is number of blocks in the inner relation
• B is number of buffer blocks
• Cost:
  2N(1+ logB-1( N/B)) + 2M(1+ logB-1( M/B)) + N + M
• If outer relation is already sorted:
                 2N(1+ logB-1( N/B)) + N + M
• If inner relation is already sorted:
                2M(1+ logB-1( M/B)) + N + M
• Bother relations are already sorted: N + M
• Notes: If selections are pushed through the join, then N and
  M are the sizes of the relations AFTER the selections       43
Determining the Cost of Algebraic Operators

   Background: Memory, Files, Indexes

               External Sort                  Selection

                                                Join
      Evaluating Individual Operators
                                               Other

Finding Efficient Query Plans

            Optimizer Architecture

         Cost of Combining Operators

              Enumerating Plans
                                                          44

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:5
posted:10/27/2012
language:English
pages:44