# Efficient Computation of the Skyline Cube

```
Efficient Computation of the Skyline Cube

Yidong Yuan
School of Computer Science & Engineering
The University of New South Wales & NICTA
Sydney, Australia

Joint Work: Xuemin Lin (UNSW), Qing Liu (UNSW),
Wei Wang (UNSW), Jeffrey Xu Yu (CUHK),
Qing Zhang (UNSW & CSIRO)
Outline
   Introduction
   Skycube Computation Techniques
   Experiments
   Summary

VLDB 2005             Yidong Yuan @DBG.UNSW   2
Skyline Query
   dominance: (x1, x2, …, xd) dominates (y1, y2, …, yd) if ∀i, xi ≤ yi & ∃k, xk < yk
   A real estate example (Properties and Values)
            price (100K)   dist   age   …
      P1         3           3     5    …
      P2         5           1     1    …
      P3         1           4     4    …
      P4         4           5     2    …
      P5         2           2     3    …
   skyline returns data points not dominated by others
   [Figure: Skyline on price & dist (axes: price, dist)]
   [Figure: Skyline on price & age (axes: price, age)]

Skyline Cube
   Skycube
      Skyline on price & dist & age
      Skyline on price & dist
      Skyline on price & age
      ……
   A union of skyline results of all the non-empty subsets of d-dimensional set (2^d − 1)
   Skycube Example
      Dataset
               A    B    C
         P1    3    3    5
         P2    5    1    1
         P3    1    4    4
         P4    4    5    2
         P5    2    2    3
      Skycube
         ABC   {P2, P3, P4, P5}
         AB    {P2, P3, P5}
         AC    {P2, P3, P4, P5}
         BC    {P2}
         A     {P3}
         B     {P2}
         C     {P2}
   Lattice Structure of a Skycube
               ABC
         AB    AC    BC
         A     B     C

Motivation
   How to compute Skycube efficiently?
      existing skyline techniques are applicable
      no sharing of computation → not efficient!

Motivation (cont.)
   nested-loop-based alg.
      BNL [ICDE 01]
         Current   Candidate List   Comparison of Skyline on A   Comparison of Skyline on A and B
         P1        (empty)          --                           --
         P2        P1               P1(A) vs. P2(A)              P1(A) vs. P2(A), P1(B) vs. P2(B)
         ……
         redundant comparison → not efficient!
      SFS [ICDE 03]: presort the dataset → keep the candidate list minimum
         repeated sorting → not efficient!
   [Figure: points P1 to P5 on axes A and B]

Motivation (cont.)
   divide-and-conquer-based alg. (DC [ICDE 01])
   [Figure: Divide Step of Skyline on A and B (axes: A, B) and Divide Step of Skyline on A, B, and C (axes: A, BC); both split on A at m'A, mA, m''A]
   repeat same divide/merge steps → not efficient!

Outline
   Introduction
   Skycube Computation Techniques
   Bottom-Up Skycube Algorithm (BUS)
   Top-Down Skycube Algorithm (TDS)
   Experiments
   Summary

Property of Skycube
   Distinct Value Condition
      no two data points have the same value on the same dimension
      SKYU(S): skyline on sub-dimension set U
      SKYU(S) ⊆ SKYV(S) if U ⊆ V
   General Case
      Keep track of the “bad guys”

Basic Idea
   compute the Skycube in a level-wise and bottom-up manner
   each skyline is computed by a nested-loop-based algorithm
   [Figure: lattice]
               ABC
         AB    AC    BC
         A     B     C

Sharing Strategies
   share-results: SKYU(S) ⊆ SKYV(S)
      reduce the size of input
      reduce the # of dominance tests
      [Figure: lattice fragment with AB above A and B]
   share-sorting: sort the dataset on each dimension
      keep the candidate list minimum
      reduce the # of sorts from 2^d − 1 to d

Filtering
   Effective Dominance Test
      filter function: f(p) = sum of p's coordinates (written ABp for the subspace AB)
      no false negative: f(p) ≤ f(q) ⇒ q does not dominate p
   Skyline on A and B
      Sort on B   P2   P5   P1   P3   P4
      ABp          6    4    6    5    9

      Current   Candidate List   Comparison (without filter)          Comparison (with filter)
      P5        P2               P2(A) vs. P5(A), P2(B) vs. P5(B)     ABP2 vs. ABP5
   maintain the candidate list in a non-decreasing order of filtering values (e.g. AVL tree)
   [Figure: points P1 to P5 on axes A and B]

DC Algorithm
   [Figure: Divide Step splits the points at mA into S1 and S2; Merge Step further splits at mB into S11, S12, S21, S22 (axes: A, B)]

Sharing Opportunities
   share-partitioning
   [Figure: two panels, "skyline on A and B" (axes: A, B) and "skyline on A, B, and C" (axes: A, BC); each splits P1 to P5 into S1 and S2 at mA on axis A (… mi … mA … mj …)]

Sharing Opportunities (cont.)
   share-merging
   [Figure: two panels split into S1 and S2 at mA: skyline on A, B, and C (axes: A, BC) and skyline on A and B (axes: A, B)]
   merge step
      skyline on A, B, and C:   {P3, P5}   BC   {P1, P2, P4}
      skyline on A and B:       {P3, P5}   B    {P1, P2}
   decompose merge step
      {P3, P5}   BC   {{P1, P2}, {P4}}
         {P3, P5}   BC   {P1, P2}
            {P3, P5}   B   {P1, P2}
            {P3, P5}   C   above result
         {P3, P5}   BC   {P4}

TDS Algorithm
   Basic Idea
      compute skylines on a path simultaneously (e.g. ABC, AB, A)
      find a minimal set of paths
      share-parent: using parent’s skyline result as the input
   [Figure: lattice ABC / AB, AC, BC / A, B, C covered by paths; the path starting at ABC is labelled with input S, the paths starting at BC and AC with input SKYABC(S)]

Outline
   Introduction
   Skycube Computation Techniques
   Experiments
   Summary

Experiment Setting
   Algorithms (* our sharing strategies applied)
      BNLS: BNL-Skycube algorithm *
      SFSS: SFS-Skycube algorithm *
      DCS: DC-Skycube algorithm *
      BUS: Bottom-Up Skycube algorithm
      TDS: Top-Down Skycube algorithm
   Dataset: correlated, independent, anti-correlated
   Dimensionality: d ∈ [4, 10]
   Cardinality: n ∈ [100k, 500k]

Effect of Dimensionality
   [Figure: independent dataset; x-axis: Dimensionality (n = 500k)]

Effect of Dimensionality (cont.)
   [Figure: correlated and anti-correlated datasets; x-axis: Dimensionality (n = 500k)]

Effect of Cardinality
   [Figure: anti-correlated dataset; x-axis: Cardinality (x100K), d = 8]

Effect of Duplicate Values
   [Figure: independent dataset (d = 8)]

Outline
   Introduction
   Skycube Computation Techniques
   Experiments
   Summary

Summary
   A novel concept: Skycube
   Skycube computation techniques
      Bottom-Up Skycube algorithm
         share-results, share-sorting
      Top-Down Skycube algorithm
         share-partition-and-merging, share-parent
   Future Work
      I/O based techniques
      multiple skyline queries

Q&A

Thank you.

Preliminaries
   Existing Skyline Computation Algorithms
      nested-loop-based
         Block-Nested-Loop (BNL) algorithm [BKS, ICDE 01]
         Sort-Filter-Skyline (SFS) algorithm [CGG+, ICDE 03]
      divide-and-conquer-based
         Divide-and-Conquer (DC) algorithm [BKS, ICDE 01]
      index-based
         Bitmap, Index-Method [TEO, VLDB 01]
         R-tree Index Based [KRR, VLDB 02; PTF+, SIGMOD 03]

Preliminaries: BNL and SFS Algorithms
   BNL algorithm
      Current   Cand. List     Results
      P1        (empty)        P1
      P2        P1             P1, P2
      P3        P1, P2         P1, P2, P3
      P4        P1, P2, P3     P1, P2, P3
      P5        P1, P2, P3     P2, P3, P5
      [Figure: points P1 to P5 on axes A and B]
   SFS algorithm
      entropy value (indicator of the dominance power)
      pre-sort the dataset (e.g., {P5, P2, P3, P1, P4})

Preliminaries: DC Algorithm
   [Figure: Divide Step splits the points at mA into S1 and S2; Merge Step further splits at mB into S11, S12, S21, S22 (axes: A, B)]

General Case
   Issue: SKYU(S) ⊆ SKYV(S) does not necessarily hold
      e.g. SKYB(S) = {P3, P4, P5} but SKYAB(S) = {P3}
      [Figure: P3, P5, P4 in a row along axis A, with P1 and P2 above them on axis B]
   Solution
      share-results: re-examine SKYU(S) on V

Motivation (cont.)
   other techniques
      Index method [VLDB 01]
      R-tree based index [VLDB 02; SIGMOD 03]
      pre-computation (e.g. index) is not reusable → repeat pre-computation → not efficient!
   Goal
      Maximizing sharing computation!

```
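The dominance definition and the Skycube Example from the slides can be sketched in a few lines of Python. This is only the naive baseline that the Motivation slides argue against: every subspace skyline is computed independently by a nested loop, with no sharing. It is not BUS or TDS. The names `POINTS`, `DIMS`, `dominates`, `skyline`, and `skycube` are illustrative assumptions, not identifiers from the talk; the data are the five real estate points with A = price, B = dist, C = age, and smaller values are taken to be better.

```python
from itertools import combinations

# Dataset from the slides: (price in 100K, dist, age); smaller is better.
POINTS = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}
DIMS = "ABC"  # assumed labels: A = price, B = dist, C = age


def dominates(x, y, dims):
    """True if x is no worse than y on every dimension in dims and strictly better on one."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(points, dims):
    """Names of the points not dominated by any other point on dims (naive nested loop)."""
    return {
        name
        for name, p in points.items()
        if not any(dominates(q, p, dims) for q in points.values())
    }


def skycube(points, dim_names):
    """Skyline of every non-empty subspace, each computed from scratch (no sharing)."""
    cube = {}
    for size in range(1, len(dim_names) + 1):
        for subset in combinations(range(len(dim_names)), size):
            label = "".join(dim_names[i] for i in subset)
            cube[label] = skyline(points, subset)
    return cube


cube = skycube(POINTS, DIMS)
for label in sorted(cube, key=lambda s: (-len(s), s)):
    print(label, sorted(cube[label]))
# ABC ['P2', 'P3', 'P4', 'P5']
# AB ['P2', 'P3', 'P5']
# AC ['P2', 'P3', 'P4', 'P5']
# BC ['P2']
# A ['P3']
# B ['P2']
# C ['P2']
```

The printed skylines agree with the Skycube Example slide, and the cube has 2^3 − 1 = 7 entries. Because this dataset satisfies the distinct value condition, every subspace skyline here is contained in the skyline of each of its superspaces, which is the property that share-results exploits.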