CS 440 Database Management Systems by crt16941

VIEWS: 9 PAGES: 12

									                                                                                                                  3/1/2009




                 CS 440:                               Query Processing and Optimization
       Database Management Systems
                                                        How are queries processed
                                                         –   Input: SQL statements
                                                         –   Output: Results
                                                             O       R    l




Query Processing and Optimization                      Query Processing and Optimization

  How are queries processed                             SQL to extended relational algebra
   –   Input: SQL statements                             –   Query blocks
   –   Output: Results
       O       R    l                                          Each block has the i l t t
                                                               E h bl k h th simple structure of      f
                                                                 – SELECT-FROM-WHERE/GROUP BY and HAVING
                                                               Include aggregate functions
                                                               Nested queries are divided into different blocks
    SQL          Extended
 Statements      Relational    Query Tree    Results
                  Algebra




Query Processing and Optimization                      Query Processing and Optimization

  SQL to extended relational algebra                    SQL to extended relational algebra
   –   Uncorrelated nested queries                       –   Uncorrelated nested queries

        SELECT LName, FName FROM EMPLOYEE WHERE               SELECT LName, FName FROM EMPLOYEE WHERE
         Salary > (SELECT MAX(Salary) FROM EMPLOYEE            Salary > c;
                      WHERE DNo = 5);
                                                              SELECT MAX(Salary) FROM EMPLOYEE
                                                                          WHERE DNo = 5;




                                                                                                                        1
                                                                                                                        3/1/2009




Query Processing and Optimization                      Query Processing and Optimization

 SQL to extended relational algebra                     SQL to extended relational algebra
  –   Uncorrelated nested queries                        –   Uncorrelated nested queries
                                                         –   Relatively          i i
                                                             R l i l easy to optimize
       SELECT LName, FName FROM EMPLOYEE WHERE                SELECT LName, FName FROM EMPLOYEE WHERE
        Salary > c;π    (σ    EMPLOYEE)                        Salary > c;π LName, FName (σ Salary >c EMPLOYEE)
                       LName , FName   Salary > c


       SELECT MAX(Salary) FROM EMPLOYEE                       SELECT MAX(Salary) FROM EMPLOYEE
                   WHERE DNo = 5;                                                WHERE DNo = 5;
                FMAX Salary (σ DNo =5 EMPLOYEE )                    FMAX Salary (σ DNo =5 EMPLOYEE )




Query Processing and Optimization                      Query Processing and Optimization

 SQL to extended relational algebra                     Sorting
  –   Correlated nested queries                          –   Order by
  –   Hard to optimize
                                                         –   External sorting
                                                             E      l     i
                                                                Using Merge-Sort
  SELECT E.FName, E.LName
                                                                 – Data divided into small subfiles
  FROM EMPLOYEE AS E
                                                                 – Each subfile is sorted
  WHERE E.SSN IN (SELECT ESSN
                                                                 – The subfiles are merged iteratively
                     FROM DEPENDENT
                     WHERE E.FName = Dependent_name)




Query Processing and Optimization                      Query Processing and Optimization

 Algorithms for SELECT                                  Algorithms for SELECT
  –   Linear search                                      –   Binary search
         Every record i visited and t t d
         E          d is i it d d tested                        C diti i
                                                                Condition involves an equality comparison on a k attribute
                                                                              l            lit       i         key tt ib t
                                                                on which the data is ordered
           σ EName= ' Jane' EMPLOYEE

                                                                 σ SSN = '123456789' EMPLOYEE




                                                                                                                              2
                                                                                                                                       3/1/2009




Query Processing and Optimization                                     Query Processing and Optimization

 Algorithms for SELECT                                                 Algorithms for SELECT
  –   Primary index                                                     –   Primary index for multiple records
         Condition involves an equality comparison on a k attribute
         C diti i      l           lit        i         key tt ib t            C diti i
                                                                               Condition involves a non-equality comparison ( >, <=, >=)
                                                                                              l               lit      i    (<,        )
         with a primary index                                                  on a key attribute with a primary index


          σ UnivID = '930000000' EMPLOYEE                                             σ DNo >5 DEPT




Query Processing and Optimization                                     Query Processing and Optimization

 Algorithms for SELECT                                                 Algorithms for SELECT
  –   Clustering index for multiple records                             –   Secondary index
         C diti i        l             lit
         Condition involves an equality comparison on a non-key
                                              i             k                  Key
                                                                               K or non-key k
         attribute with a clustering index                                     =, <, >, <=, >=


              σ DNo =4 EMPLOYEE




Query Processing and Optimization                                     Query Processing and Optimization

 Algorithms for SELECT                                                 Algorithms for SELECT
  –   Conjunctive conditions (AND)                                      –   Conjunctive conditions (AND)
  –   Di j
      Disjunctive conditions (OR)
              i      di i                                                      Using     individual index
                                                                               U i an i di id l i d
                                                                               Using a composite index
                                                                               By intersection of record pointers
                                                                        –   Disjunctive conditions (OR)
                                                                               Little optimization is possible except in minor cases




                                                                                                                                             3
                                                                                                                         3/1/2009




Query Processing and Optimization                   Query Processing and Optimization

 Algorithms for PROJECT                              Algorithm for CARTISIAN PRODUCT
  –   Easy if key is included                         –   Straightforward but expensive
  –   Oh      i            d  li i     duplicates
      Otherwise, may need to eliminate d li           –   Try
                                                          T to substitute with other l
                                                                   b i                       i         i
                                                                            i h h less expensive operations
                                                          during optimization




Query Processing and Optimization                   Query Processing and Optimization

 Algorithms for UNION, MINUS,                        Algorithms for two-way EQUIJOIN
 INTERSECTION                                         –   Nest-loop join
  –   M        ti
      Merge-sort is used
                       d                                     For   h       d in A, t i             d in    d test
                                                             F each record i A retrieve every record i B and t t
                                                             whether the condition is met
                                                      –   Single-loop join
                                                             If one relation has an index, then loop through records in the
                                                             other relation and use the index to find out matching records




Query Processing and Optimization                   Query Processing and Optimization

 Algorithms for two-way EQUIJOIN                     Algorithms for two-way EQUIJOIN
  –   Sort-merge join                                 –   Sort-merge join

                                                          SSN          DNo                DNumber        DName
                                                          =========              2                   1      ADMIN
                                                          +++++++++              2                   2        DEV
                                                          $$$$$$$$$              3                   3           QC
                                                          ^^^^^^^^^^             4                   4      SALES
                                                          #########              5                   5        AED
                                                          *********              5                   6           HR




                                                                                                                               4
                                                                                                                                    3/1/2009




         SSN     and DNumber DName
Query ProcessingDNo Optimization                                           SSN     and DNumber DName
                                                                  Query ProcessingDNo Optimization

 Algorithms for two-way EQUIJOIN                                   Algorithms for two-way EQUIJOIN
  –   Sort-merge join                                               –   Sort-merge join

      SSN             DNo             DNumber       DName               SSN             DNo             DNumber       DName
      =========               2                 1       ADMIN           =========               2                 1       ADMIN
      +++++++++               2                 2        DEV            +++++++++               2                 2        DEV
      $$$$$$$$$               3                 3           QC          $$$$$$$$$               3                 3           QC
      ^^^^^^^^^^              4                 4       SALES           ^^^^^^^^^^              4                 4       SALES
      #########               5                 5        AED            #########               5                 5        AED
      *********               5                 6           HR          *********               5                 6           HR




         SSN     and DNumber DName
Query ProcessingDNo Optimization                                           SSN     and DNumber DName
                                                                  Query ProcessingDNo Optimization
                  =========       2                 2       DEV                     =========       2                 2       DEV

 Algorithms for two-way EQUIJOIN                                   Algorithms for two-way EQUIJOIN
  –   Sort-merge join                                               –   Sort-merge join

      SSN             DNo             DNumber       DName               SSN             DNo             DNumber       DName
      =========               2                 1       ADMIN           =========               2                 1       ADMIN
      +++++++++               2                 2        DEV            +++++++++               2                 2        DEV
      $$$$$$$$$               3                 3           QC          $$$$$$$$$               3                 3           QC
      ^^^^^^^^^^              4                 4       SALES           ^^^^^^^^^^              4                 4       SALES
      #########               5                 5        AED            #########               5                 5        AED
      *********               5                 6           HR          *********               5                 6           HR




         SSN     and DNumber DName
Query ProcessingDNo Optimization                                           SSN     and DNumber DName
                                                                  Query ProcessingDNo Optimization
                  =========       2                 2       DEV                     =========       2                 2       DEV
                  +++++++++                                                         +++++++++
                         2
 Algorithms for two-way EQUIJOIN 2                          DEV                            2
                                                                   Algorithms for two-way EQUIJOIN 2                          DEV
 – Sort-merge join                                                 – Sort-merge join

      SSN             DNo             DNumber       DName               SSN             DNo             DNumber       DName
      =========               2                 1       ADMIN           =========               2                 1       ADMIN
      +++++++++               2                 2        DEV            +++++++++               2                 2        DEV
      $$$$$$$$$               3                 3           QC          $$$$$$$$$               3                 3           QC
      ^^^^^^^^^^              4                 4       SALES           ^^^^^^^^^^              4                 4       SALES
      #########               5                 5        AED            #########               5                 5        AED
      *********               5                 6           HR          *********               5                 6           HR




                                                                                                                                          5
                                                                                                                                              3/1/2009




         SSN     and DNumber DName
Query ProcessingDNo Optimization                                                   SSN     and DNumber DName
                                                                          Query ProcessingDNo Optimization
                  =========              2                 2        DEV                     =========         2                 2       DEV
                  +++++++++                                                                 +++++++++
                         2
 Algorithms for two-way EQUIJOIN 2                                  DEV                            2
                                                                           Algorithms for two-way EQUIJOIN 2                            DEV
                                                                                    $$$$$$$$$      3       3                             QC
 – Sort-merge join                                                         – Sort-merge join

      SSN             DNo                    DNumber       DName                SSN             DNo               DNumber       DName
      =========                 2                      1       ADMIN            =========               2                   1       ADMIN
      +++++++++                 2                      2         DEV            +++++++++               2                   2        DEV
      $$$$$$$$$                 3                      3           QC           $$$$$$$$$               3                   3           QC
      ^^^^^^^^^^                4                      4       SALES            ^^^^^^^^^^              4                   4       SALES
      #########                 5                      5         AED            #########               5                   5        AED
      *********                 5                      6           HR           *********               5                   6           HR




         SSN     and DNumber DName
Query ProcessingDNo Optimization                                          Query Processing and Optimization
                  =========              2                 2        DEV
                  +++++++++
                         2
 Algorithms for two-way EQUIJOIN 2                                  DEV    Algorithms for aggregate functions
                  $$$$$$$$$              3                 3         QC
  –   Sort-merge join                                                       –   MAX, MIN
                                                                                   Having index helps (why?)
                                                                                   H i an i d h l ( h ?)
      SSN             DNo                    DNumber       DName
      =========                 2                      1       ADMIN
      +++++++++                 2                      2         DEV
                                                                            SELECT MAX (Salary)
      $$$$$$$$$                 3                      3           QC       FROM EMPLOYEE
      ^^^^^^^^^^                4                      4       SALES
      #########                 5                      5         AED
      *********                 5                      6           HR




Query Processing and Optimization                                         Query Processing and Optimization

 Algorithms for aggregate functions                                        Algorithms for outer join
  –   COUNT, AVERAGE, SUM                                                   –   Modified sort-merge join
                 g                    g
         Is straightforward when having a dense index
  –   GROUP BY
                                                                            –   S     d     h d
                                                                                Second method:
         Need to sort data based on the grouping attribute                         Perform natural join
         If an clustering index existing on the group attribute, super!            Perform set difference operation
                                                                                   Union the two
  SELECT DNo, AVG (Salary)
  FROM EMPLOYEE
  GROUP BY DNo




                                                                                                                                                    6
                                                                                     3/1/2009




Query Processing and Optimization           Query Processing and Optimization

 Queries are internally presented as         Heuristic optimization of query trees
  –   query trees (relational algebra) or     –   Query transformation
  –              h (relational calculus)
      query graphs ( l i       l l l )




Query Processing and Optimization           Query Processing and Optimization

 Heuristic optimization of query trees       Heuristic optimization of query trees
  –   Query transformation                    –   Query transformation




Query Processing and Optimization           Query Processing and Optimization

 Heuristic optimization of query trees       Heuristic optimization of query trees
  –   Query transformation                    –   Query transformation




                                                                                           7
                                                                                                              3/1/2009




Query Processing and Optimization        Query Processing and Optimization

 Heuristic optimization of query trees    Heuristic optimization of query trees
  –   Query transformation                 –   Query transformation




Query Processing and Optimization        Query Processing and Optimization

 Heuristic optimization of query trees    Heuristic optimization of query trees
  –   Query transformation                 –   Query transformation




Query Processing and Optimization        Query Processing and Optimization

 Heuristic optimization of query trees     Heuristic optimization of query trees
  –   Query transformation                 –    Query transformation rules
                                                1 Cascade of σ: A conjunctive selection condition
                                                1.
                                                can be broken up into a cascade (sequence) of
                                                individual s operations:
                                                σ c1 AND c2 AND ... AND cn(R) = σc1 (σc2 (...(σcn(R))...) )
                                                                          σ c1

                                           σ c1 AND c2                    σ c2

                                                 R                         R




                                                                                                                    8
                                                                                                                                         3/1/2009




Query Processing and Optimization                                  Query Processing and Optimization

  Heuristic optimization of query trees                              Heuristic optimization of query trees
  –   Query transformation rules                                     –   Query transformation rules
      2. Commutativity of σ: Th σ operation i
      2 C         i i   f The             i is                           3. Cascade of π: I a cascade (
                                                                         3 C         d f In                   d (sequence) of π
                                                                                                                             ) f
      commutative σc1 (σc2(R)) = σc2 (σc1(R))                            operations, all but the last one can be ignored:
                                                                            πList1 (πList2 (...(πListn(R))...) ) = πList1(R)
                                 σ c1                 σ c2                                   π List1

                                 σ c2                 σ c1                                   π List2            π List1

                                  R                    R                                         R                  R




Query Processing and Optimization                                  Query Processing and Optimization

  Heuristic optimization of query trees                              Heuristic optimization of query trees
  –   Query transformation rules                                     –   Query transformation rules
      4. Commuting σ with π: If the selection condition c
      4 C              i       ih      h        l i         di i         5. C
                                                                            Commutativity of ( and x ) Th
                                                                                     i i   f       d ): The                   i is
                                                                                                                        operation i
      involves only the attributes A1, ..., An in the                    commutative as is the x operation:
      projection list, the two operations can be commuted:               R C S = S C R; R x S = S x R
      πA1, A2, ..., An (σc (R)) = σc (πA1, A2, ..., An (R))
                        π List               σc
                                                                                         c                                 c
                          σc                 π List
                                                                            R                    S              S                    R
                          R                   R




Query Processing and Optimization                                  Query Processing and Optimization

  Heuristic optimization of query trees                              Heuristic optimization of query trees
  –   Query transformation rules                                     –   Query transformation rules
      6. Commuting σ with ( x ) If all the attributes i
      6 C         i       i h (or ):     ll h     ib      in             Al
                                                                         Alternatively, if the selection condition c can b
                                                                                 i l        h    l i        di i         be
      the selection condition c involve only the attributes              written as (c1 and c2), where condition c1 involves
      of one of the relations being joined—say, R—the                    only the attributes of R and condition c2 involves
      two operations can be commuted as follows :                        only the attributes of S, the operations commute as
      σc ( R     S ) = (σc (R))         S                                follows: σc ( R S ) = (σc1 (R)) (σc2 (S))
                 σc                                                                 σc
                                        σc                   S                                            σc1                  σc2

         R                S             R                                  R                 S              R                  S




                                                                                                                                               9
                                                                                                                                                  3/1/2009




Query Processing and Optimization                                                  Query Processing and Optimization

    Heuristic optimization of query trees                                            Heuristic optimization of query trees
    –   Query transformation rules                                                   –       Query transformation rules
        7. Commuting π with (or x ): Suppose that the projection list
        is L = {A1, ..., An, B1, ..., Bm}, where A1, ..., An are attributes                  8 C          i i   f          i     The
                                                                                             8. Commutativity of set operations: Th set
        of R and B1, ..., Bm are attributes of S. If the join condition c                    operations υ and ∩ are commutative but – is not.
        involves only attributes in L, the two operations can be
        commuted as follows:
        πL ( R     C   S ) = (πA1, ..., An (R))       C   (πB1, ..., Bm (S))                           ∩                         ∩

                       πL                                                                                   T           T                S
                                                                                             S
                                               πL1                      πL2

          R                         S             R                      S




Query Processing and Optimization                                                  Query Processing and Optimization

    Heuristic optimization of query trees                                            Heuristic optimization of query trees
    –   Query transformation rules                                                   –       Query transformation rules
        9 A
        9. Associativity of , x, υ, and ∩ : Th
                 i i i    f           d     These ffour                                      10. Commuting σ with set operations: Th σ
                                                                                             10 C        i      ih            i    The
        operations are individually associative; that is, if θ                               operation commutes with υ , ∩ , and –. If θ stands for
        stands for any one of these four operations                                          any one of these three operations, we have
        (throughout the expression), we have                                                 σc ( R θ S ) = (σc (R)) θ (σc (S))
               (RθS)θT = Rθ(SθT)
                       ∩                              ∩
                                                                                                       σc                         ∩

         ∩                      T          R                       ∩
                                                                                                       ∩
                                                                                                                         σc                  σc
R                  S                                      S                    T
                                                                                             R                  S           R                S




Query Processing and Optimization                                                  Query Processing and Optimization

    Heuristic optimization of query trees                                            Heuristic optimization of query trees
    –   Query transformation rules                                                   –       Query transformation rules
         11 Th π operation commutes with υ.
         11.The          i                 ih                                                12 C
                                                                                             12. Converting a ( x) sequence i
                                                                                                          i    (σ,                          h
                                                                                                                                 into : If the
        πL ( R υ S ) = (πL (R)) υ (πL (S))                                                   condition c of a σ that follows a x Corresponds to a
                                                                                             join condition, convert the (σ, x) sequence into a
                                                                                             as follows:
                                                                                             (σC (R x S)) = (R C S)
              πL                                      υ                                           σc
              υ                         πL                        πL
                                                                                                   X                                 c

    R                       S            R                        S
                                                                                         R                  S               R                S




                                                                                                                                                       10
                                                                                                                                                    3/1/2009




Query Processing and Optimization                                                 Query Processing and Optimization

  Heuristic optimization of query trees                                             Heuristic optimization of query trees
  –    Algorithm:
          Using rule 1, break up any select operations with conjunctive             –    Algorithm:
          conditions into a cascade of select operations.
                                                                                           Using l 3, 4, 7, d                   i the
                                                                                           U i rules 3 4 7 and 11 concerning th cascading ofdi     f
          Using rules 2, 4, 6, and 10 concerning the commutativity of select
          with other operations, move each select operation as far down the                project and the commuting of project with other operations,
          query tree as is permitted by the attributes involved in the select              break down and move lists of projection attributes down the
          condition.
                                                                                           tree as far as possible by creating new project operations
          Using rule 9 concerning associativity of binary operations,
          rearrange the leaf nodes of the tree so that the leaf node relations             as needed.
          with the most restrictive select operations are executed first in the            Identify subtrees that represent groups of operations that
          query tree representation.
          Using Rule 12, combine a cartesian product operation with a                      can be executed by a single algorithm.
          subsequent select operation in the tree into a join operation.




Query Processing and Optimization                                                 Query Processing and Optimization

 Heuristic optimization of query trees                                             Heuristic optimization of query trees
  –   Query transformation                                                          –   Query transformation




Query Processing and Optimization                                                 Query Processing and Optimization

 Heuristic optimization of query trees                                             Heuristic optimization of query trees
  –   Query transformation                                                          –   Query transformation




                                                                                                                                                         11
                                                                                                                         3/1/2009




Query Processing and Optimization                                               Query Processing and Optimization

 Heuristic optimization of query trees                                           Heuristic optimization of query trees
  –   Query transformation                                                        –   Query transformation




Query Processing and Optimization                                               Query Processing and Optimization

 Heuristic optimization of query trees                                           Heuristic optimization of query trees
  –   Query transformation                                                        –   Query transformation




Query Processing and Optimization

  Heuristic optimization of query trees
  –    Summary of Heuristics for Algebraic Optimization:
          The main heuristic is to apply first the operations that reduce the
           i    f intermediate results.
          size of i t    di t        lt
          Perform select operations as early as possible to reduce the
          number of tuples and perform project operations as early as
          possible to reduce the number of attributes. (This is done by
          moving select and project operations as far down the tree as
          possible.)
          The select and join operations that are most restrictive should be
          executed before other similar operations. (This is done by
          reordering the leaf nodes of the tree among themselves and
          adjusting the rest of the tree appropriately.)




                                                                                                                              12

								
To top