Docstoc

Avoiding Sorting and Grouping In Processing Queries

Document Sample
Avoiding Sorting and Grouping In Processing Queries Powered By Docstoc
					 Avoiding Sorting and
Grouping In Processing
       Queries
     Ketan Kalgaonkar




                        Slides reference from Sahak Maloyan
                     Outline
 Motivation
 Simple Example
 Order Properties
 Order Property Optimization
 Performance Results
 Conclusion
                       Motivation
 Inferred orderings
  – Make it possible to avoid sorting when preprocessing ORDER BY
     clauses of SQL query

 Inferred groupings
  – Avoid sorting or hashing prior to computing aggregates for GROUP
     BY clauses

 Inference of secondary ordering and grouping
  – Avoid unnecessary sorting or grouping over multiple attributes
  – Infer new primary orderings or groupings
             Simple Example
 Benefits of inferring grouping and secondary ordering
 TPC-H Query
  SELECT c_custkey, COUNT (*)                FROM
  Customer, Supplier                     WHERE
  c_nationkey = s_nationkey            GROUPBY
  c_custkey

  (How many suppliers could supply each customer
  directly without having to go through customs)
      Simple Example cont…
   sort-merge join result                  group
   is sorted (and hence               c_custkey,count(*)
   groupped) on
   c_nationkey.

                                           sort
                                         c_custkey



                                         merge-join
                                  c_nationkey=s_nationkey

Key: c_custkey
FD:
c_custkey                 sort                               sort
c_nationkey             c_nationkey                        s_nationkey


                     table scan                               table scan
                      customer                                  supplier
             Order Properties
 order properties have the form:
  A1α1  A2α2….Anαn

 each Ai is an attribute, each αi either specifies an
  ordering (αi = O) or a grouping (αi =G)

 A1α1 primary ordering or grouping and A2α2 secondary
       Order Properties cont…
 Identities
1. O oo’ (R)  O o (R)
     for any order property that holds of a physical relation, all
      prefixes of that order property also hold of R

2. O oBOo’ (R) O oBGo’ (R)
     an ordering on any attribute implies a grouping on that attribute

3. FD XB (R), O oBαo’ (R), X is subset of atts in o 
  O oo’ (R)
   If X functionally determines B, and an order property that
      includes all attributes in X (ordered or grouped) appearing
      before Bα, then Bα is redundant.
       Order Properties cont…
 Identities(cont…)
4. FD AB (R), O oAxo’Byo’’ (R) 
   O oAxo’o’’(R)
    special case of identity #3, covering the case where X
      consists of a single attribute

5. FD AB (R), O oBGAGo’ (R) 
   O oAGo’ (R)
    the grouping of an attribute that is functionally
      determined by the attribute that follows it in the order
      property is superfluous
 Order Property Optimisation
 Postgres Plan Operators


   The data structures for all plan nodes in postgres include
    the following fields:
   inp1,... inpn: the fields contained in all input tuples to the
    node
   left: the left subtree of the node
   right: the right subtree of the node
 Order Property Optimisation
 Postgres Plan Operators (cont…)


     Group
     Hash
     HJoin
     Merge
 Order Property Optimisation
 A Plan Refinement Algorithm


   Input: query plan tree generated by Postgres

   Output: an equivalent plan tree with unnecessary Sort
    operators (used either to order or group) removed

   Requires: 4 new attributes associated with every node in
    a query plan tree
  Order Property Optimisation
 A Plan Refinement Algorithm(cont…)
 New Attributes
   keys: a set of attribute sets that are guaranteed to be keys of
    inputs to n
   fds: a set of functional dependencies (attribute sets → attribute)
    that are guaranteed to hold of inputs to n
   req: a single order property that is required to hold of inputs
    either to n or some ancestor node of n for that node to execute
   sat: a set of order properties that are guaranteed to be satisfied
    by outputs of n
  Order Property Optimisation
 A Plan Refinement Algorithm (cont.)


   Idea:
  – decorate the input plan with the attributes above
  – remove any Sort operator n whose child node produces a
     result that is guaranteed to satisfy an order property
     required by its parent node
   Accomplished in 3 passes
               Order Property Optimisation
                                                                              keys = {{c_custkey,s_custkey}}
                                                        NOP                   fds = {s_suppkey  s_nationkey
                                                                              c_custkeyc_nationkey
                                                                              requires = -
 keys = {{c_custkey,s_suppkey}}
 fds = {s_supkey  s_nationkey
 c_custkeyc_nationkey                                 group
 requires = c_custkeyG                            c_custkey,count(*)


 keys = {{c_custkey,s_suppkey}}
 fds = {s_supkey  s_nationkey
 c_custkeyc_nationkey                                 sort                keys = {{c_custkey,s_suppkey}}
 requires = c_custkeyG                               c_custkey             fds = {s_supkey  s_nationkey
                                                                           c_custkeyc_nationkey
                                                                           requires = c_custkeyG



   keys = {c_custkey}                                 merge-join
   fds = c_custkeyc_nationkey
   requires = c_nationkeyO
                                               c_nationkey=s_nationkey               keys = {s_suppkey}
                                                                                     fds = {s_supkey  s_nationkey}
                                                                                     requires = c_nationkeyO



                                       sort                               sort
keys = {c_custkey}                                                                            keys = {s_suppkey}
fds = c_custkeyc_nationkey
                                    c_nationkey                        s_nationkey            fds = {s_supkey 
requires = c_nationkeyO                                                                       s_nationkey}
                                                                                              requires = s_cnationkeyO
                                  table scan                              table scan
                                   customer                                 supplier
  Order Property Optimisation
 A Plan Refinement Algorithm (cont.)

   Pass 1:Functional Dependencies and Keys
      A bottom-up pass, FDs and keys are propagated upwards when
        inferred to hold intermediate query result
   Pass 2:Required Order Properties
      Top-down pass requires order properties (req) which are propagated
       downwards from the root of the tree
      New required operators are generated by:
        – NOP: Node Order Property (called on the root of the plan to trigger the top-
            down pass)
        – Group and Unique
        – Join operator
        All other nodes pass the required order properties they inherit from parent nodes
            to their child nodes, except for Hash and Append which propagate the empty
            order property to their child nodes
  Order Property Optimisation
 A Plan Refinement Algorithm (cont.)


   Pass 3:Sort Elimination
     A bottom-up pass of the query plan tree that determines what
      order properties are guaranteed to be satisfied by outputs of
      each node (sat), and that concurrently removes any Sort
      operator, n for which n.left.sat Є n.req
               Performance Result
 Experiment #1                                         group
                                                   c_custkey,count(*)

  Postgres     Refined     Ratio
  Plan
                                                        sort
  6384.9 sec   487.9 sec   13.08                      c_custkey



                                                       merge-join
                                                c_nationkey=s_nationkey




                                        sort                               sort
                                     c_nationkey                        s_nationkey


                                   table scan                              table scan
                                    customer                                 supplier
               Performance Result
 Experiment #2

                                                                    NL-join
   Postgres    Refined         Ratio                           o_orderkey=l_order
   Plan                                                               key
   126.8       2729.9          0.05
                                                          Sort                      Index Scan
                                                       o_orderkey                     lineitem


Postgres   Refined      Ratio                            merge-join
Plan                                                 c_custkey=o_custkey
121.4      113.3        1.07

                                           sort                              Sort
                                         c_custkey                         o_custkey


                                      table scan                             table scan
                                       customer                                 order
                  Conclusion
 Formal approach to order optimization that integrates
  both orderings and groupings within the same
  comprehensive framework
 Secondary orderings and groupings
 By inferring secondary orderings and groupings, it is
  possible to avoid unnecessary sorting or grouping over
  multiple attributes

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:4
posted:10/14/2011
language:English
pages:19