                    Efficient Computation of the Skyline Cube

                Yidong Yuan
    School of Computer Science & Engineering
   The University of New South Wales & NICTA
                Sydney, Australia

Joint Work: Xuemin Lin (UNSW), Qing Liu (UNSW),
            Wei Wang (UNSW), Jeffrey Xu Yu (CUHK),
            Qing Zhang (UNSW & CSIRO)
            Outline
   - Introduction
   - Skycube Computation Techniques
   - Experiments
   - Summary




VLDB 2005             Yidong Yuan @DBG.UNSW   2
            Skyline Query
   - Dominance: $(x_1, x_2, \ldots, x_d)$ dominates $(y_1, y_2, \ldots, y_d)$ iff
     $\forall i,\ x_i \le y_i$ and $\exists k,\ x_k < y_k$
   - A real estate example (properties and values):

                price (100K)   dist   age   …
         P1          3           3     5    …
         P2          5           1     1    …
         P3          1           4     4    …
         P4          4           5     2    …
         P5          2           2     3    …

       e.g., P5 dominates P1
   - The skyline returns the data points that are not dominated by any other point
     [Figures: "Skyline on price & dist" (axes: price, dist) and
      "Skyline on price & age" (axes: price, age)]
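The dominance test and the skyline definition above can be checked on the real estate example with a short Python sketch. It is a naive O(n²) illustration, not one of the algorithms discussed in this talk; the names DATA, dominates and skyline are hypothetical, and smaller values are assumed to be better on every dimension.

```python
# Real estate example: (price in 100K, dist, age); smaller is better.
DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}
PRICE, DIST, AGE = 0, 1, 2


def dominates(x, y, dims):
    """x dominates y on dims: x_i <= y_i for all i, and x_k < y_k for some k."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(data, dims):
    """Naive skyline: keep every point that no other point dominates."""
    return {
        p
        for p, x in data.items()
        if not any(dominates(y, x, dims) for q, y in data.items() if q != p)
    }


print(dominates(DATA["P5"], DATA["P1"], (PRICE, DIST, AGE)))  # True
print(sorted(skyline(DATA, (PRICE, DIST))))  # ['P2', 'P3', 'P5']
print(sorted(skyline(DATA, (PRICE, AGE))))   # ['P2', 'P3', 'P4', 'P5']
```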
            Skyline Cube
   - Skycube: the union of the skyline results on all the non-empty subsets
     of a d-dimensional set ($2^d - 1$ subspaces)
       - skyline on price & dist & age
       - skyline on price & dist
       - skyline on price & age
       - …

   Skycube Example (A = price, B = dist, C = age)

     Dataset                     Skycube
          A   B   C              subspace   skyline
     P1   3   3   5              ABC        {P2, P3, P4, P5}
     P2   5   1   1              AB         {P2, P3, P5}
     P3   1   4   4              AC         {P2, P3, P4, P5}
     P4   4   5   2              BC         {P2}
     P5   2   2   3              A          {P3}
                                 B          {P2}
                                 C          {P2}

   Lattice Structure of a Skycube

                 ABC
           AB     AC     BC
           A      B      C
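A direct way to build the skycube is to run one skyline computation per non-empty subset of dimensions, which is exactly the unshared approach the Motivation slides argue against. The Python sketch below does this on the example dataset (A = price, B = dist, C = age); the function names are hypothetical and each per-subspace skyline is a naive nested loop.

```python
from itertools import combinations

DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}
NAMES = "ABC"


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(data, dims):
    # A point never dominates itself, so comparing against all points is safe.
    return {p for p, x in data.items()
            if not any(dominates(y, x, dims) for y in data.values())}


def skycube(data, d):
    """One independent skyline per non-empty subspace: 2^d - 1 in total."""
    return {
        dims: skyline(data, dims)
        for k in range(1, d + 1)
        for dims in combinations(range(d), k)
    }


cube = skycube(DATA, 3)
for dims, sky in cube.items():
    print("".join(NAMES[i] for i in dims), sorted(sky))
```

The seven printed rows agree with the Skycube Example table; nothing computed for one subspace is reused for another.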

            Motivation
   - How can the Skycube be computed efficiently?
       - existing skyline techniques are applicable
       - but they share no computation → Not efficient!




            Motivation (cont.)
   - nested-loop-based alg.
       - BNL [ICDE 01]
         [Figure: P1 to P5 plotted on dimensions A and B]

           Point   Candidate   Comparison for       Comparison for
                   List        Skyline on A         Skyline on A and B
           P1      (none)      --                   --
           P2      P1          P1(A) vs. P2(A)      P1(A) vs. P2(A)
                                                    P1(B) vs. P2(B)
           …

         redundant comparisons → Not efficient!
       - SFS [ICDE 03]: presort the dataset → keep the candidate list minimal
         repeated sorting → Not efficient!
            Motivation (cont.)
   - divide-and-conquer-based alg. (DC [ICDE 01])
     [Figures: "Divide Step of Skyline on A and B" (axes A, B) and
      "Divide Step of Skyline on A, B, and C" (axes A, BC); both split
      P1 to P5 at the same medians m'A, mA, m''A on dimension A]
   - the same divide/merge steps are repeated → Not efficient!

            Outline
   - Introduction
   - Skycube Computation Techniques
       - Bottom-Up Skycube Algorithm (BUS)
       - Top-Down Skycube Algorithm (TDS)
   - Experiments
   - Summary


            Property of Skycube
   - Distinct Value Condition
       - no two data points have the same value on the same dimension
       - $\mathrm{SKY}_U(S)$: the skyline on sub-dimension set $U$
           - $\mathrm{SKY}_U(S) \subseteq \mathrm{SKY}_V(S)$ if $U \subseteq V$
   - General Case
       - keep track of the "bad guys"
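The containment property can be checked exhaustively on the example dataset, whose values are distinct on every dimension. This Python sketch (hypothetical helper names) first tests the distinct value condition and then compares the skylines of every pair of subspaces U ⊂ V; it is a sanity check on one dataset, not a proof.

```python
from itertools import combinations

DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(data, dims):
    return {p for p, x in data.items()
            if not any(dominates(y, x, dims) for y in data.values())}


def distinct_values(data, d):
    """Distinct value condition: no two points share a value on any dimension."""
    return all(len({x[i] for x in data.values()}) == len(data) for i in range(d))


def containment_holds(data, d):
    """Check SKY_U(S) <= SKY_V(S) for every pair with U a proper subset of V."""
    spaces = [dims for k in range(1, d + 1) for dims in combinations(range(d), k)]
    sky = {dims: skyline(data, dims) for dims in spaces}
    return all(sky[u] <= sky[v] for u in spaces for v in spaces if set(u) < set(v))


print(distinct_values(DATA, 3), containment_holds(DATA, 3))  # True True
```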


            Basic Idea
   - compute the Skycube in a level-wise, bottom-up manner
   - compute each skyline with a nested-loop-based algorithm

                 ABC
           AB     AC     BC
           A      B      C

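            Sharing Strategies
   - share-results: $\mathrm{SKY}_U(S) \subseteq \mathrm{SKY}_V(S)$ for $U \subseteq V$
     (e.g., the skylines on A and on B are contained in the skyline on AB)
       - reduce the size of the input
       - reduce the number of dominance tests
   - share-sorting: sort the dataset once on each dimension
       - keep the candidate list minimal
       - reduce the number of sorts from $2^d - 1$ to $d$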
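Putting the two strategies together, a bottom-up skycube can be sketched in Python as below. This is a simplified illustration under the distinct value condition, not the authors' BUS implementation: the names are hypothetical, each subspace simply scans the presorted order of its first dimension, and a plain set stands in for the candidate list.

```python
from itertools import combinations

DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def bus_skycube(data, d):
    """Level-wise, bottom-up skycube sketch; assumes the distinct value condition.

    share-sorting: the points are sorted once per dimension (d sorts), and each
    subspace reuses the order of its first dimension.
    share-results: every child skyline SKY_U(S) (U one dimension smaller than V)
    is contained in SKY_V(S), so those points are accepted without any test.
    """
    order = {i: sorted(data, key=lambda p: data[p][i]) for i in range(d)}
    cube = {}
    for k in range(1, d + 1):
        for dims in combinations(range(d), k):
            known = set()
            for child in combinations(dims, k - 1):
                if child:  # level 1 has no non-empty children
                    known |= cube[child]
            result = set(known)
            for p in order[dims[0]]:
                if p in known:
                    continue  # already known to be a skyline point
                # Any skyline point that dominates p is either in `known` or
                # precedes p in this order (values on dims[0] are distinct).
                if not any(dominates(data[q], data[p], dims) for q in result):
                    result.add(p)
            cube[dims] = result
    return cube


for dims, sky in bus_skycube(DATA, 3).items():
    print(dims, sorted(sky))
```

Sorting is done d times in total, and at level 3 the four points inherited from AB, AC and BC are accepted without a single dominance test; only P1 is actually checked.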



            Filtering
   - Effective Dominance Test
       - filter function: $\Sigma p$ = the sum of p's coordinates
       - no false negatives: $\Sigma p \le \Sigma q \Rightarrow$ q does not dominate p
   - Skyline on A and B, scanning the data sorted on B:

         Sort on B              P2   P5   P1   P3   P4
         $\Sigma_{AB}\,p$        6    4    6    5    9

         Point   Candidate   Comparison           Comparison
                 List        (without filter)     (with filter)
         P5      P2          P2(A) vs. P5(A)      $\Sigma_{AB}$(P2) vs. $\Sigma_{AB}$(P5)
                             P2(B) vs. P5(B)

   - maintain the candidate list in non-decreasing order of filtering values
     (e.g., an AVL tree)
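The filtered scan can be sketched in Python as follows. This is a hypothetical illustration under the distinct value condition: a sorted Python list maintained with the standard `bisect` module stands in for the AVL tree, and `tests` counts only the full coordinate-by-coordinate dominance tests.

```python
import bisect

DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def nested_loop_with_filter(data, dims, scan_dim):
    """Nested-loop skyline on dims using the sum filter (distinct values assumed).

    Points are scanned in ascending order of scan_dim (a dimension of dims), so a
    new point can never dominate an earlier candidate. If q dominates p then
    sum(q) < sum(p), so only candidates with a strictly smaller filter value
    need the full dominance test.
    """
    def f(p):
        return sum(data[p][i] for i in dims)

    keys, cands, tests = [], [], 0  # filter values and candidates, kept sorted
    for p in sorted(data, key=lambda p: data[p][scan_dim]):
        fp = f(p)
        cut = bisect.bisect_left(keys, fp)  # candidates with f(q) < f(p)
        dominated = False
        for q in cands[:cut]:
            tests += 1
            if dominates(data[q], data[p], dims):
                dominated = True
                break
        if not dominated:
            pos = bisect.bisect_right(keys, fp)
            keys.insert(pos, fp)
            cands.insert(pos, p)
    return set(cands), tests


sky, tests = nested_loop_with_filter(DATA, dims=(0, 1), scan_dim=1)
print(sorted(sky), tests)  # ['P2', 'P3', 'P5'] 3
```

When P5 arrives, its filter value 4 is below P2's value 6, so the two points are never compared coordinate by coordinate.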
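            DC Algorithm
   [Figure: DC skyline on A and B. Divide step: split P1 to P5 at the median mA
    of A into S1 and S2. Merge step: split again at the median mB of B into
    S11, S12 (the parts of S1 below and above mB) and S21, S22 (likewise for S2).]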
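A simplified divide-and-conquer skyline can be sketched in Python, assuming distinct values. It keeps the divide step at the median of one dimension but replaces DC's recursive merge (the mB split into S11 to S22) with a plain pairwise comparison of the two partial skylines. The function name is hypothetical.

```python
DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def dc_skyline(data, points, dims):
    """Divide at the median of dims[0]; merge by pairwise comparison."""
    if len(points) <= 1:
        return list(points)
    pts = sorted(points, key=lambda p: data[p][dims[0]])
    mid = len(pts) // 2
    sky1 = dc_skyline(data, pts[:mid], dims)  # S1: smaller values on dims[0]
    sky2 = dc_skyline(data, pts[mid:], dims)  # S2: larger values on dims[0]
    # No point of S2 can dominate a point of S1, so only S2's skyline is filtered.
    return sky1 + [q for q in sky2
                   if not any(dominates(data[p], data[q], dims) for p in sky1)]


print(sorted(dc_skyline(DATA, list(DATA), (0, 1))))     # ['P2', 'P3', 'P5']
print(sorted(dc_skyline(DATA, list(DATA), (0, 1, 2))))  # ['P2', 'P3', 'P4', 'P5']
```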




            Sharing Opportunities
   - share-partitioning
     [Figures: "skyline on A and B" (axes A, B) and "skyline on A, B, and C"
      (axes A, BC); in both, P1 to P5 are split into S1 and S2 by the same
      medians (… mi … mA … mj …) on dimension A]
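The share-partitioning idea can be sketched by letting one divide-and-conquer recursion serve several subspaces at once. In this hypothetical Python sketch (distinct values assumed, merge simplified to pairwise comparison), every subspace must contain A, as on the path ABC → AB → A, so the median split on A is computed once per recursion step and reused by all of them.

```python
DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def dc_shared(data, points, subspaces):
    """DC skylines for several subspaces that all contain dimension 0 (A).

    The median split on dimension 0 is valid for every such subspace, so the
    divide step is performed once and shared; only the merge is done per
    subspace (here by pairwise comparison).
    """
    assert all(0 in dims for dims in subspaces)
    if len(points) <= 1:
        return {dims: list(points) for dims in subspaces}
    pts = sorted(points, key=lambda p: data[p][0])
    mid = len(pts) // 2
    left = dc_shared(data, pts[:mid], subspaces)   # S1, shared by all subspaces
    right = dc_shared(data, pts[mid:], subspaces)  # S2, shared by all subspaces
    return {
        dims: left[dims] + [q for q in right[dims]
                            if not any(dominates(data[p], data[q], dims)
                                       for p in left[dims])]
        for dims in subspaces
    }


path = [(0, 1, 2), (0, 1), (0,)]  # ABC -> AB -> A
for dims, sky in dc_shared(DATA, list(DATA), path).items():
    print(dims, sorted(sky))
```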

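            Sharing Opportunities (cont.)
   - share-merging: decompose the merge step
     [Figures: "skyline on A, B, and C" (axes A, BC) and "skyline on A and B"
      (axes A, B), both split into S1 and S2 at mA]

       skyline on A, B, and C:  merge {P3, P5} with {P1, P2, P4} on BC
       skyline on A and B:      merge {P3, P5} with {P1, P2} on B

       decomposition of the merge on BC:
         {P3, P5} with {{P1, P2}, {P4}} on BC
           = {P3, P5} with {P1, P2} on BC:
               merge {P3, P5} with {P1, P2} on B (shared with the skyline on A and B),
               then merge {P3, P5} with the above result on C
           + {P3, P5} with {P4} on BC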
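            TDS Algorithm
   - Basic Idea
       - compute the skylines on a path simultaneously (e.g., ABC → AB → A)
       - find a minimal set of paths
       - share-parent: use the parent's skyline result as the input
     [Figures: the skycube lattice (ABC; AB, AC, BC; A, B, C) and its cover by the
      paths ABC → AB → A, BC → B and AC → C, where the first path starts from S
      and the other two take $\mathrm{SKY}_{ABC}(S)$ as their input]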
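The share-parent step can be illustrated with a top-down Python sketch. It relies on the fact that, under the distinct value condition, $\mathrm{SKY}_V(S) \subseteq \mathrm{SKY}_U(S)$ for $V \subset U$ and the skyline of the parent's result on V equals $\mathrm{SKY}_V(S)$. The sketch shows only share-parent: it does not choose a minimal set of paths or compute a path simultaneously, and the function names and the parent-selection rule are hypothetical.

```python
from itertools import combinations

DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(data, points, dims):
    """Naive skyline of the given points on dims."""
    return {p for p in points
            if not any(dominates(data[q], data[p], dims) for q in points)}


def top_down_share_parent(data, d):
    """Compute each subspace skyline from one parent's skyline, top-down."""
    full = tuple(range(d))
    cube = {full: skyline(data, data.keys(), full)}  # computed from S itself
    inputs = {full: len(data)}                       # input size per subspace
    for k in range(d - 1, 0, -1):
        for dims in combinations(range(d), k):
            extra = min(set(range(d)) - set(dims))   # pick one parent
            parent = tuple(sorted(dims + (extra,)))
            cube[dims] = skyline(data, cube[parent], dims)
            inputs[dims] = len(cube[parent])
    return cube, inputs


cube, inputs = top_down_share_parent(DATA, 3)
for dims in sorted(cube, key=len, reverse=True):
    print(dims, sorted(cube[dims]), "input size:", inputs[dims])
```

Only the full space reads all five points; every other subspace starts from a parent skyline of three or four points.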

            Experiment Setting
   - Algorithms (* = with our sharing strategies applied)
       - BNLS: BNL-Skycube algorithm *
       - SFSS: SFS-Skycube algorithm *
       - DCS: DC-Skycube algorithm *
       - BUS: Bottom-Up Skycube algorithm
       - TDS: Top-Down Skycube algorithm
   - Dataset: correlated, independent, anti-correlated
   - Dimensionality: $d \in [4, 10]$
   - Cardinality: $n \in [100\mathrm{k}, 500\mathrm{k}]$

            Effect of Dimensionality
   [Figure: independent data; x-axis: dimensionality (n = 500k)]
            Effect of Dimensionality (cont.)
   [Figures: correlated and anti-correlated data; x-axis: dimensionality (n = 500k)]



            Effect of Cardinality
   [Figure: anti-correlated data; x-axis: cardinality (x100K, d = 8)]
            Effect of Duplicate Values
   [Figure: independent data (d = 8)]




            Summary
   - A novel concept: Skycube
   - Skycube computation techniques
       - Bottom-Up Skycube algorithm
           - share-results, share-sorting
       - Top-Down Skycube algorithm
           - share-partitioning, share-merging, share-parent
   - Future Work
       - I/O-based techniques
       - multiple skyline queries
            Q&A



                  Thank you.


            Preliminaries
   - Existing Skyline Computation Algorithms
       - nested-loop-based
           - Block-Nested-Loop (BNL) algorithm [BKS, ICDE 01]
           - Sort-Filter-Skyline (SFS) algorithm [CGG+, ICDE 03]
       - divide-and-conquer-based
           - Divide-and-Conquer (DC) algorithm [BKS, ICDE 01]
       - index-based
           - Bitmap, Index-Method [TEO, VLDB 01]
           - R-tree Index Based [KRR, VLDB 02; PTF+, SIGMOD 03]
            Preliminaries: BNL and SFS Algorithms
   BNL algorithm (skyline on A and B)

         Current Point   Cand. List         Results
         P1              (none)             P1
         P2              P1                 P1, P2
         P3              P1, P2             P1, P2, P3
         P4              P1, P2, P3         P1, P2, P3
         P5              P1, P2, P3         P2, P3, P5

   SFS algorithm
   - entropy value (an indicator of dominance power)
   - pre-sort the dataset (e.g., {P5, P2, P3, P1, P4})
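Both algorithms can be sketched in a few lines of Python on the example points, restricted to A and B. The window here is unbounded and in memory (real BNL spills to disk when the window fills), and the SFS score is an assumption: the sum of ln(1 + x_i), chosen because a point that dominates another always gets a strictly smaller score. The exact score behind the slide's example is not stated, so the order this sketch produces (P5, P3, P2, P1, P4) need not match it. All names are hypothetical.

```python
import math

DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}
AB = (0, 1)


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def bnl(data, dims):
    """Block-nested-loop skyline with an unbounded in-memory window."""
    window = []
    for p in data:  # scan in the dataset's own order (P1 .. P5)
        if any(dominates(data[q], data[p], dims) for q in window):
            continue  # p is dominated: discard it
        # p survives: evict window points it dominates, then add p
        window = [q for q in window if not dominates(data[p], data[q], dims)]
        window.append(p)
    return window


def entropy(x, dims):
    """Assumed SFS score: sum of ln(1 + x_i) over dims."""
    return sum(math.log1p(x[i]) for i in dims)


def sfs(data, dims):
    """Sort-filter-skyline: presort by score, then a window that never shrinks."""
    window = []
    for p in sorted(data, key=lambda p: entropy(data[p], dims)):
        # A dominator always has a smaller score, so it was scanned earlier.
        if not any(dominates(data[q], data[p], dims) for q in window):
            window.append(p)
    return window


print(bnl(DATA, AB))  # ['P2', 'P3', 'P5']
print(sfs(DATA, AB))  # ['P5', 'P3', 'P2']
```

The BNL run reproduces the table row by row: P4 is discarded by P1, and P5 later evicts P1 from the window.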
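            Preliminaries: DC Algorithm
   [Figure: DC skyline on A and B. Divide step: split P1 to P5 at the median mA
    of A into S1 and S2. Merge step: split again at the median mB of B into
    S11, S12 (the parts of S1 below and above mB) and S21, S22 (likewise for S2).]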




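            General Case
   - Issue: $\mathrm{SKY}_U(S) \subseteq \mathrm{SKY}_V(S)$ does not necessarily hold
     [Figure: P3, P5, P4 share the same value on B; P1, P2 lie above them (axes A, B)]
       $\mathrm{SKY}_B(S)$ = {P3, P4, P5}
       $\mathrm{SKY}_{AB}(S)$ = {P3}
   - Solution
       - share-results: re-examine $\mathrm{SKY}_U(S)$ on V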
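The counterexample can be reproduced with a few lines of Python. The coordinates are hypothetical values chosen to match the figure's layout (P3, P5, P4 tied on B, with P1 and P2 above them), and `reexamine` is an illustrative helper that performs the re-examination in the simplest possible way, by testing each inherited point against the whole dataset on V.

```python
DATA = {  # hypothetical (A, B) coordinates matching the figure's layout
    "P1": (4, 2),
    "P2": (5, 2),
    "P3": (1, 1),
    "P4": (3, 1),
    "P5": (2, 1),
}
B, AB = (1,), (0, 1)


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(data, dims):
    return {p for p, x in data.items()
            if not any(dominates(y, x, dims) for y in data.values())}


def reexamine(data, candidates, dims):
    """Keep a child-skyline point only if no point of S dominates it on dims."""
    return {p for p in candidates
            if not any(dominates(data[q], data[p], dims) for q in data)}


sky_b, sky_ab = skyline(DATA, B), skyline(DATA, AB)
print(sorted(sky_b), sorted(sky_ab))       # ['P3', 'P4', 'P5'] ['P3']
print(sky_b <= sky_ab)                     # False: containment fails
print(sorted(reexamine(DATA, sky_b, AB)))  # ['P3']
```

Ties on B leave P4 and P5 undominated in the subspace B, but on AB both are dominated by P3, so inherited points cannot be accepted blindly once values may repeat.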
            Motivation (cont.)
   - other techniques
       - Index method [VLDB 01]
       - R-tree based index [VLDB 02; SIGMOD 03]
   - pre-computation (e.g., an index) is not reusable
     → repeated pre-computation → Not efficient!
   - Goal
       - maximize shared computation!