Implementation Of Skyline Query Algorithms


                         Marios Kokkodis                                         Pamela Bhattacharya
                  University of California Riverside                        University of California Riverside
                         mak@cs.ucr.edu                                           pamelab@cs.ucr.edu




ABSTRACT
In this work we extend the publicly available Spatial Index Library to support
skyline queries. More specifically, we implement three basic algorithms (Block
Nested Loop, Divide and Conquer, and Branch and Bound) and the naive brute
force approach. Finally, we test these implementations on various datasets and
compare their efficiency in terms of disk I/Os and run time.

1.   INTRODUCTION
   Skyline points are those points of a dataset that are not dominated by any
other point. A point dominates another point if it is as good as or better in
all dimensions and better in at least one. Consequently, the skyline points are
incomparable with each other: none of them dominates another.
   Computing the skyline points of a dataset is essential for applications that
involve multi-criteria decision making. As an example, we adopt the one given
by Börzsönyi et al. in [1]. Tourists often search for “cheap hotels near the
beach”. Such a query should return all hotels that are not dominated by another
hotel with respect to the two attributes (i.e. cost and distance to the beach).
The result is the skyline of the queried dataset. This example, taken from [1],
is presented in Figure 1.
   The first skyline query algorithm was proposed by Kung et al. in 1975 [4].
It was based on the “divide and conquer” strategy and it had high complexity.
In the late eighties and nineties several algorithms were proposed, which did
not gain much popularity as they could not provide solutions for datasets that
would not fit in main memory. Börzsönyi et al. [1] in 2001 were the first to
introduce the notion of skyline queries for significantly large databases. They
provided three algorithms: Block Nested Loop, Basic Divide and Conquer and
M-way Divide and Conquer. In the same year, Tan et al. [6] presented two other
novel approaches for computing the skyline points: Bitmap and Index.
Furthermore, in 2002 Kossmann et al. [3] presented the NN-Algorithm for
evaluating 2-dimensional skylines. Finally, Papadias et al. in 2005 presented
the Branch and Bound algorithm [5]. Recently, several optimized and novel
approaches to computing skyline queries have been proposed, but they are out of
the scope of this work.
   In this work, we extend the Spatial Index Library by Marios Hadjieleftheriou
[2] to support skyline queries. More specifically, we implement the following
three skyline algorithms:

   • Block Nested Loop (BNL) - [1]

   • Divide and Conquer (DCA) - [1]

   • Branch and Bound (BBS) - [5]

Furthermore, we run those algorithms on various datasets, and we compare their
efficiency in terms of disk I/Os and run time.

2. ALGORITHMS DESCRIPTION

2.1 Computing “Dominance”
   The notion of dominance is crucial in the definition of skyline points.
Given two data points, we compute whether one dominates the other or they are
incomparable by using the dominating factor specified in the query. In the
classical example provided by Börzsönyi et al. in [1] the dominating factors
are whether a hotel is “cheap” and “near to the beach”. The formal definition
of dominance is as follows [1]:

   Tuple $p = (p_1, \ldots, p_k, p_{k+1}, \ldots, p_l, p_{l+1}, \ldots, p_m, p_{m+1}, \ldots, p_n)$

   Tuple $q = (q_1, \ldots, q_k, q_{k+1}, \ldots, q_l, q_{l+1}, \ldots, q_m, q_{m+1}, \ldots, q_n)$

Let's say that we have the following Skyline Query:

   Skyline of $d_1$ MIN, ..., $d_k$ MIN, $d_{k+1}$ MAX, ..., $d_l$ MAX, $d_{l+1}$ DIFF, ..., $d_m$ DIFF

where MIN (MAX, DIFF) indicates that we are looking for a minimum (maximum,
different) point in that specific dimension. Then we can say that tuple $p$
dominates tuple $q$ if the following 3 conditions hold:

   $p_i \le q_i$ for all $i = 1, \ldots, k$

   $p_i \ge q_i$ for all $i = (k + 1), \ldots, l$

   $p_i = q_i$ for all $i = (l + 1), \ldots, m$
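
The three conditions can be turned into a small comparison routine. The sketch below is only an illustration and is not the code of the library: the names Pref and dominates are hypothetical, and it additionally assumes that $p$ must be strictly better than $q$ in at least one MIN or MAX dimension, which the three conditions leave implicit. Attributes beyond the first $m$ (those not named in the query) are ignored.

```python
from enum import Enum
from typing import Sequence


class Pref(Enum):
    """Hypothetical per-dimension annotation of a skyline query."""
    MIN = "min"    # smaller values are better
    MAX = "max"    # larger values are better
    DIFF = "diff"  # values must be equal for the tuples to be comparable


def dominates(p: Sequence, q: Sequence, prefs: Sequence[Pref]) -> bool:
    """Return True if tuple p dominates tuple q under the given preferences.

    Only the first len(prefs) attributes are inspected; zip() stops there,
    so the remaining attributes of p and q play no role.
    """
    strictly_better = False
    for pi, qi, pref in zip(p, q, prefs):
        if pref is Pref.MIN:
            if pi > qi:
                return False
            strictly_better |= pi < qi
        elif pref is Pref.MAX:
            if pi < qi:
                return False
            strictly_better |= pi > qi
        else:  # Pref.DIFF
            if pi != qi:
                return False
    # Assumption of this sketch: equal tuples do not dominate each other.
    return strictly_better
```

For the hotel example, dominates((50, 1.0), (80, 2.0), [Pref.MIN, Pref.MIN]) holds, because the first hotel is both cheaper and closer to the beach, while two hotels that each win in one attribute are incomparable.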
Figure 1: Skyline of “Cheap hotels near the beach”

Figure 2: Merging Phase of Divide and Conquer


2.2 Block Nested Loop (BNL)
   A naive way of computing the skyline points of a given dataset was proposed
by Börzsönyi et al. in 2001 [1]. Before the skyline computation begins, a
window is allocated in main memory to hold the candidate skyline points. If
there is not enough memory for all the candidate skyline points, a temporary
file is created, which stores all the points that are incomparable with the
candidate skyline points of the window. Each tuple $p$ from the input dataset
is compared with all the tuples in the window. Such a comparison has three
possible outcomes:

   • Eliminate a tuple from input dataset: A tuple from the dataset is
     eliminated when it is dominated by at least one tuple in the window. The
     eliminated tuple is not considered for comparison in the rest of the
     algorithm.

   • Eliminate tuples from window: When a tuple in the dataset dominates one or
     more tuples in the window, the former replaces all the latter in the
     window. The tuples thus removed from the window are treated like a tuple
     that has been eliminated from the input dataset, and are hence not
     considered again for comparison.

   • Incomparable tuples: When a tuple is incomparable to all other tuples in
     the window, it is inserted in the window if there is enough space.
     Otherwise it is copied to a temporary file and is considered for
     comparison (only with the remaining tuples in the temporary file) in the
     next iteration.

For the very first iteration, both the window and the temporary file are empty
and hence the first tuple is inserted in the window. The iteration continues
until all the tuples in the dataset have been compared with the tuples in the
window.
   Complexity: The best case complexity is $O(n)$, which occurs when all the
candidate skyline points fit in memory at every instance. In the worst case,
the complexity is $O(n^2)$, which is the same as the naive nested loop
algorithm (brute force, see §3.4).

2.3 Basic Divide-and-Conquer Algorithm (BDC)
   As stated in [1], Kung et al. in 1975 proposed the first
“divide-and-conquer” algorithm for computing skylines. Börzsönyi et al. [1]
used the same approach and modified it in a way that is efficient for large
databases. The algorithm is stated as follows:

   • Compute the Median: In this step we compute the median $m_p$ of the input
     dataset for some “dominating” dimension $d_p$. For example, if we query
     for all “cheap” hotels in Los Angeles, our “dominating” dimension for
     computing the median is the dimension “price” of each input hotel. We then
     divide the input dataset into two parts: $P_1$ and $P_2$. $P_1$ contains
     all tuples whose value of attribute $d_p$ is “better” (for our example:
     less than, i.e. cheaper) than $m_p$. $P_2$ contains the remaining tuples
     of the given input.

   • Compute the Skyline recursively: In a recursive fashion, the partitions
     $P_1$ and $P_2$ computed in the earlier step are further partitioned, and
     this continues until $P_i$ ($i = 1, 2$) contains one (or very few) tuples.
     Then computing the skyline point (points) is trivial. The result consists
     of the skyline points of $P_i$, and it is kept in $S_i$ respectively.

   • Compute the Overall Skyline: From the way that we partitioned the data,
     none of the tuples in $S_1$ can be dominated by a tuple in $S_2$, since a
     tuple in $S_1$ is better in dimension $d_p$ than every tuple in $S_2$.
     Hence, the only thing that we need to do in the merging phase is to
     eliminate all the tuples in $S_2$ that are dominated by some tuple in
     $S_1$. Then, the skyline points will be $S_1 \cup merge(S_1, S_2)$ ^1.

Complexity: Both the best case and the worst case complexity is
$O(n (\log n)^{d-2}) + O(n \log n)$, where $n$ is the number of input tuples
and $d$ is the number of dimensions in the skyline. Hence, we expect this
algorithm to outperform BNL in worst cases and to be worse in good ones.

^1 In §3.2 we present in detail the merging phase, which is the most important
part of this algorithm.

2.4 Branch-and-Bound Skyline Algorithm (BBS)
   In 2005, Papadias et al. [5] proposed the Branch-and-Bound Skyline
algorithm. This algorithm is based on nearest-neighbor search and finds the
skyline points using a multidimensional index (R-tree). The algorithm is
divided into four steps:
Figure 3: Skyline Points of the Island dataset

Figure 4: Skyline Points of the Long Beach dataset


   • Index: use an R-tree to index the objects and set the list of skyline
     points ($S$) to null.

   • Access Root: Insert all the entries of the root in a heap, and sort them
     in ascending order of their minimum distance from a minimum point. For
     positive values this point could be the origin (i.e. $x = [0 \ldots 0]^T$),
     while for negative values this point could be a point that consists of the
     minimum values in each dimension. The minimum distance (mindist) of a node
     of the R-tree to a point is given by the Euclidean distance of the point
     to the lower left corner of the node's Minimum Bounding Rectangle (MBR).
     The mindist of two points is their Euclidean distance.

   • Compute Skyline Points: The procedure starts by examining the elements of
     the heap. If the top element is an intermediate node and it is not
     dominated by some point in $S$, it is expanded and all its children that
     are not dominated by some point in $S$ are inserted in the heap. If the
     top element of the heap is a data point, it is inserted in $S$.
     Summarizing, the following steps are taken:

        – Remove from the heap the entry $e$ which has the lowest mindist,

        – Compare $e$ with all the entries in $S$ and, if $e$ is “dominated”
          by any point in $S$, discard $e$,

        – If $e$ is not “dominated” and $e$ is an intermediate node, insert
          each child of $e$ that is not dominated in the heap,

        – If $e$ is not “dominated” and $e$ is a data point, insert it in the
          skyline list $S$.

   Complexity: The main memory requirement of BBS is of the same order as the
size of the skyline list, since both the heap and the main-memory R-tree sizes
are of this order. Furthermore, the number of node accesses by BBS is at most
$s \cdot h$, where $s$ is the number of skyline points and $h$ the height of
the R-tree.

3.     IMPLEMENTATION DETAILS
   All three algorithms take the same R-tree files as input. For both the Block
Nested Loop and the Divide and Conquer, the index was not necessary ^2.
However, since it was available, it helped us improve the I/Os of the Block
Nested Loop algorithm. For the Divide and Conquer, we just use the R-tree to
access the data points and store them in a list. Finally, the use of the R-tree
index is essential for the Branch and Bound, since it is a key point for
implementing the algorithm. In the next paragraphs, we present some details
about the implementation of the three algorithms.

^2 We could implement these two algorithms without the use of the R-tree index,
but since we wanted to be consistent when comparing these two with the branch
and bound algorithm, we used the same framework.

3.1 Block Nested Loop
   For this algorithm, we use a skyline list (the window), which at every
instance keeps the candidate skyline points. At each step, this list is
compared with the current tree node (index, leaf), and only if the node is not
dominated by some candidate skyline point do we visit its children. The same
thing happens at the leaf level: if an entry in a leaf is dominated by some
point in the list, we don't visit the actual data point. If not, we visit the
actual point and we include it in the candidate skyline points list. In the
end, when all the necessary nodes of the tree have been traversed, the
candidate skyline points list contains only the skyline points.
   When we visit actual points that are not dominated by some candidate skyline
point, we first check if there is enough memory to add them to the list. If
not, we store the data on disk (as the algorithm indicates [1]). However, for
our datasets (see §4) this wasn't necessary, since we had enough memory to keep
all the candidate skyline points in memory.
   The implementation can be found in the src/rtree/RTree.cc file of the
Spatialindex library [2] under the name bnl.

3.2 Divide and Conquer
   In order to implement this algorithm we need some functions, which we
describe in this section.
   Function travereTree(): This method traverses the tree and adds all the
points to a list which, if there is enough space, is kept in memory. Otherwise
the points are stored on disk.
   Function findMedian(): It finds the median, $m_p$, for a given dimension,
$d_p$.
Figure 5: Skyline Points of the Sierpinski dataset

   Function partition(): It splits our data into two partitions, $P_1$, $P_2$,
according to the median $m_p$ that has been found earlier by findMedian(). In
the case where the first partition is empty ^3 we find the median of another
dimension, $d'$, and we partition the data again. This way, we guarantee that
each point in $P_1$ is not dominated by any point in $P_2$.
   Function skylineBasic(data): If there is only one point in the data list,
then this point is returned. Otherwise, we find the median of the data list and
we partition it into $P_1$ and $P_2$ (by calling partition()). Then, we call
skylineBasic($P_1$) and skylineBasic($P_2$) and we keep the results in $S_1$
and $S_2$. So far, we know that no point in $S_1$ is dominated by any point in
$S_2$ (this is due to the way that we partition the data, see partition()
above). Hence we need to merge $S_1$ with $S_2$ so as to find the points in
$S_2$ that are not dominated by any point in $S_1$. In the end, the union of
$S_1$ and the result of the merge gives the skyline points.
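
The recursion of skylineBasic() can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in RTree.cc: every dimension is minimized, the names skyline_basic, partition and dominates are hypothetical, the merge step is a naive filter standing in for the recursive merge of [1], and the fallback used when no dimension yields a non-empty first partition is an addition of this sketch.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q when all dimensions are minimized (assumption of this sketch)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def partition(data: List[Point], dim: int) -> Tuple[List[Point], List[Point]]:
    """Split data around the median of dimension dim.

    P1 holds the points strictly below the median, P2 holds the rest,
    so no point of P2 can dominate a point of P1.
    """
    m = sorted(pt[dim] for pt in data)[len(data) // 2]
    p1 = [pt for pt in data if pt[dim] < m]
    p2 = [pt for pt in data if pt[dim] >= m]
    return p1, p2


def skyline_basic(data: List[Point]) -> List[Point]:
    if len(data) <= 1:
        return list(data)
    # Try each dimension in turn until the first partition is non-empty.
    for dim in range(len(data[0])):
        p1, p2 = partition(data, dim)
        if p1:
            break
    else:
        # No dimension separates the points: fall back to pairwise checks.
        return [p for p in data if not any(dominates(q, p) for q in data)]
    s1, s2 = skyline_basic(p1), skyline_basic(p2)
    # Naive stand-in for merge(): keep the points of S2 that no point of S1 dominates.
    survivors = [q for q in s2 if not any(dominates(p, q) for p in s1)]
    return s1 + survivors
```

With hotels given as (price, distance) pairs, skyline_basic([(50, 3.0), (80, 1.0), (60, 2.0), (90, 2.5), (55, 3.5)]) keeps (50, 3.0), (60, 2.0) and (80, 1.0); the other two hotels are dominated. The naive filter costs one comparison per pair of points in $S_1$ and $S_2$, which is exactly the work the merge of [1] is designed to reduce.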
   Function merge(): This is the most important phase of the algorithm. We
implement it as described in [1], so that we eliminate unnecessary comparisons.
First, we check whether either of the two lists $S_1$, $S_2$ has only one
element (trivial cases). Then, we check if the points are 2-dimensional ^4, and
if so, we find the minimum of $S_1$ (min) in a dimension $d'$, where
$d' \ne d_p$. As a result, we return all the points in $S_2$ whose value in
dimension $d'$ is less than min (those are the points that are incomparable
with all the points in $S_1$). If the dataset is of higher dimension (> 2), we
partition both sets in some other dimension $d_g$ (with median $m_g$) and we
get four sets: $S_{1,1}$, $S_{1,2}$ for $S_1$, and $S_{2,1}$, $S_{2,2}$ for
$S_2$ (notice again that no point in $S_{1,1}$ is dominated by any point in
$S_{1,2}$, etc.). Then we recursively merge $S_{1,1}$ with $S_{2,1}$ (as a
result we get all the points in $S_{2,1}$ that are not dominated by some point
in $S_{1,1}$), $S_{1,2}$ with $S_{2,2}$, and finally we merge $S_{1,1}$ with
the result of the last merge (that is, all the points in $S_{2,2}$ that are not
dominated by some point in $S_{1,2}$ are merged diagonally with $S_{1,1}$).
Schematically, those comparisons are presented in Figure 2 ^5. As a result, we
return the union of the first and the third merge:

   $merge(S_{1,1}, S_{2,1}) \cup merge(S_{1,1}, merge(S_{1,2}, S_{2,2}))$

   All these methods are implemented with the same names in the
src/rtree/RTree.cc file of the Spatialindex library [2].

^3 The first partition consists of the points whose value in dimension $d_p$ is
less than $m_p$. Hence, it is the only partition that can be empty, and this
can only happen when the median is also the minimum in dimension $d_p$ (all the
points belong to the second partition).
^4 Here we don't actually mean the dimension of the initial dataset, but the
number of dimensions that have not been partitioned yet.
^5 The figure is taken from [1].

Figure 6: Skyline Points of the Sierpinski dataset

3.3 Branch and Bound
   As already mentioned, the R-tree index is necessary for this algorithm.
After the generation of the R-tree, if the dataset is “positive” (it includes
only positive points) the origin is considered as the minimum point. The
minimum distances of the root entries are computed, and the heap is created by
inserting the root entries in ascending order. At each instance, we remove the
top entry $e$ of the heap, and we check if $e$ is dominated by some point in
our skyline points list, $S$. If it is, we discard it; otherwise we check
whether $e$ is an intermediate entry. If it is, we add all its children that
are not dominated by some point in $S$ into the heap. If $e$ is a data point,
we add it into $S$.
   If the dataset has negative points, we traverse the tree to find the minimum
coordinate in every dimension, create a point that consists of all these
coordinates, and compute the minimum distances to this point. In this case the
number of disk I/Os increases, since we have to go through all the data to find
the minimum values. However, if this minimum point is known beforehand, we
obtain the same I/Os as if the dataset had only positive points.
   The implementation of the algorithm can be found in the src/rtree/RTree.cc
file of the Spatialindex library [2], under the name bbsQuery.

3.4 Brute Force
   For validation and comparison purposes, we also implemented a brute force
approach for finding the skyline points. This method consists of one nested for
loop that compares each point with all the others; if the point is not
dominated, it is added to the skyline list. As soon as a point is found to be
dominated, we exit the inner loop and continue with the next point.
   The implementation can be found in the src/rtree/RTree.cc file of the
Spatialindex library [2], under the name bruteForce.
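
The brute force approach of §3.4 is short enough to sketch directly. The version below is an illustration only, not the bruteForce function of the library: it assumes that all dimensions are minimized, and the names brute_force_skyline and dominates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q when all dimensions are minimized (assumption of this sketch)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def brute_force_skyline(points: List[Point]) -> List[Point]:
    """Compare every point with all the others using one nested loop."""
    skyline = []
    for i, p in enumerate(points):
        dominated = False
        for j, q in enumerate(points):
            if i != j and dominates(q, p):
                dominated = True
                break  # exit the inner loop at the first dominating point
        if not dominated:
            skyline.append(p)
    return skyline
```

Because the inner loop stops at the first dominating point, a dataset whose dominating points appear early in the list is processed much faster than the quadratic worst case suggests. The result of such a routine can also serve as a reference when checking the output of the other algorithms.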
 Algorithm      Index I/Os    Leaves I/Os     Data I/Os   Total I/Os
    BF              15            924           63383       64322
   DCA              15            924           63383       64322
   BNL              15            924            5191        6130
   BBS              15            671             467        1153

Table 1: Island (2d) I/Os - total points: 63383, skyline points: 467

 Algorithm      Index I/Os    Leaves I/Os     Data I/Os   Total I/Os
    BF               8            526           36298       36832
   DCA               8            526           36298       36832
   BNL               8            526             263         797
   BBS              16            601           36324       36941

Table 2: Long Beach (2d) I/Os (negative points) - total points: 36298, skyline
points: 26

 Algorithm      Index I/Os    Leaves I/Os     Data I/Os   Total I/Os
    BF             108           7714          531441      539263
   DCA             108           7714          531441      539263
   BNL             108           7714               1        7823
   BBS              50             66               1         117

Table 4: Sierpinski (2d) I/Os - total points: 531441, skyline points: 1

 Algorithm      Index I/Os    Leaves I/Os     Data I/Os   Total I/Os
    BF               7            374           25375       25756
   DCA               7            374           25375       25756
   BNL               7            374             315         696
   BBS              15            511           25415       25940

Table 5: US cities (2d) I/Os - total points: 25375, skyline points: 40


4.      DATA
   We use five datasets to evaluate the three algorithms we implemented by
extending the Spatial Index Library [2]. Those datasets are:

      • Islands: 63,383 2-dimensional points of an island,

      • Beaches: 36,298 2-dimensional coordinates of road intersections in
        Long Beach County, CA (includes negative points),

      • Sierpinski: 531,441 2-dimensional points representing a Sierpinski
        triangle fractal,

      • US Cities: 25,375 2-dimensional points representing the coordinates
        of US cities (includes negative points),

      • NBA: 17,265 5-dimensional points representing some statistics of NBA
        players.

     A tuple in an n-dimensional dataset is in the form:

                             $id, v_1, \ldots, v_n$

5.     EXPERIMENTAL ANALYSIS
   In this section we analyze the results after applying all three algorithms
to all available datasets. The procedure follows two steps:

      • Create an R-tree index for each dataset

      • Load the stored index and find the skyline points by applying the
        algorithms

All comparisons take place after the creation of the R-tree index.

           Dataset     Dataset Points         Skyline Points
            Island         63383                   467
          L. Beach         36298                    26
          Sierpinski      531441                     1
          US cities        25375                    40
             NBA           17265                   495

             Table 3: Resulting Skyline Points

5.1 Correctness
   We first check the correctness of our implementations. We do this in two
ways:

   • For the 2-dimensional datasets, we plot the skyline points together with
     the dataset points in the same figure. By observing those figures, we can
     tell whether our resulting skyline points are correct or not.

   • For all the datasets, we find the skyline points by using the brute force
     method described in §3.4 and compare them with the ones that our
     implementations provide.

   After applying the algorithms, we obtained the same skyline points for each
dataset from every algorithm, and these also matched the skyline points that
resulted from the brute force approach. This agreement validates our
implementations.
   Table 3 presents the number of skyline points that we found for each
dataset. Furthermore, for all the 2-dimensional datasets we present the skyline
points in comparison with the dataset points in Figures 3, 4, 5, and 6.
Correctness for these datasets is also visible.

5.2 Disk Inputs - Outputs
   As we mentioned in §3.3, our implementation of BBS has a different number of
disk I/Os depending on whether the dataset includes negative points. Hence, we
split the datasets into the “positive” and the “negative” ones and we compare
the algorithms accordingly.
   Positive set {Island, Sierpinski, NBA}: Tables 1, 4 and 6 present the I/Os
of each algorithm for the Island, Sierpinski and NBA datasets respectively.
Branch and Bound has fewer I/Os than any other algorithm. This is due to the
fact that BBS visits only the skyline points. The way we implement Block Nested
Loop (exploiting the R-tree index) results in fewer I/Os than we would expect
without the use of the R-tree index (in that case, all the data would be
visited in order to be compared with the skyline points). Divide and Conquer
has to visit all the points, and so does the brute force approach (BF), and
hence they are equivalent as far as I/O cost is concerned (however, Divide and
Conquer is a lot faster than BF in most of the cases, as we present in §5.3).
It is important here to state again that all points of every dataset fit in
memory, and that's why Divide and Conquer has the same I/Os as BF.
   Negative Set {Long Beach, US Cities}: Tables 2 and 5 present the I/Os for
these two datasets. Since these
    Algorithm   Index I/Os   Leaves I/Os   Data I/Os   Total I/Os             Dataset                BF      DCA           BNL            BBS
       BF           5            253         17265       17523
      DCA           5            253         17265       17523
                                                                               Island                223     1.067         0.79           0.27
      BNL           5            253         9225         9483               L. Beach               0.335    0.565         0.084          0.17
      BBS           5            253          495          753               Sierpinski              2.3     9.534         0.93           0.045
                                                                             US cities              0.625    0.415         0.065          0.13
Table 6: NBA (5d) IOs - total points: 17265, skyline                            NBA                  3.7      0.58         0.57           0.21
points:495
                                                                    Table 7: Algorithm run time (in sec) for each dataset

datasets contain negative points, BBS first traverses the tree
to find the minimum in every dimension (as we described in
§3.3). Hence, it needs more IOs than any other algorithm.
This can easily be improved by passing the min point as an
input to the algorithm. In that case, the disk IOs would
be reduced to the number of skyline points, as before.
Negative points do not affect the rest of the algorithms:
Block Nested Loop is better than Divide and Conquer, which
has the same IOs as BF.
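
To illustrate why supplying the minimum point removes the extra
traversal, here is a minimal sketch. It assumes, as in the original
BBS formulation, that entries are ordered by their L1 distance
(mindist) from a reference corner. The names find_min_corner, mindist
and order_by_mindist, as well as the in-memory list of points, are
hypothetical and are not taken from the Spatial Index Library; the
actual procedure of §3.3 may differ.

```python
import heapq


def find_min_corner(points):
    """Scan every point to get the per-dimension minimum (the costly step)."""
    return tuple(min(coords) for coords in zip(*points))


def mindist(point, min_corner):
    """L1 distance of `point` from `min_corner` (assumed ordering key)."""
    return sum(p - m for p, m in zip(point, min_corner))


def order_by_mindist(points, min_corner=None):
    """Return the points in ascending mindist order.

    If `min_corner` is supplied by the caller, no extra pass over
    the data is needed before the ordering starts.
    """
    if min_corner is None:
        min_corner = find_min_corner(points)  # extra full scan
    heap = [(mindist(p, min_corner), p) for p in points]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

In this sketch, passing min_corner explicitly gives the same ordering
as the version that scans for it, but skips the initial pass over all
points.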

5.3 Run-Time Cost
   We run each algorithm three times on each dataset and
compute the average running time needed to find the skyline
points. Table 7 presents the results of our tests.
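
The measurement procedure described above can be sketched as follows.
This is only an illustration: the function name average_runtime and
the use of a wall-clock timer are assumptions, not the harness that
produced Table 7.

```python
import time


def average_runtime(algorithm, dataset, repeats=3):
    """Run `algorithm(dataset)` `repeats` times; return mean seconds."""
    elapsed = []
    for _ in range(repeats):
        start = time.perf_counter()
        algorithm(dataset)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)
```

Any callable that takes a dataset can be passed as the algorithm, for
example average_runtime(sorted, [3, 1, 2]).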
   Positive set {Island, Sierpinski, NBA}: As we expected,
BBS has the smallest running times on these datasets (rows 1,
3 and 5 in Table 7). Block Nested Loop (BNL) comes second.
Divide and Conquer (DCA) needs more time than the brute force
approach on the Sierpinski dataset. This has to do with the
type of the dataset. As we can see in Figure 5, this dataset
has only one skyline point. Hence it can happen that this
point is at the beginning of the list; once it has been found,
the nested loop never runs to completion, because every other
point is dominated early. As a result, brute force needs just
a few seconds. On the other hand, DCA takes relatively long to
produce its result. This is because many of the points of this
dataset (see Figure 5) have the same values in different
dimensions, which leads to empty partitions P1 and hence to
repartitioning P2 in other dimensions (see §3.2). As a result,
BF is faster than DCA for this dataset. Another thing to
notice here is the huge time that BF needs to find the skyline
of the Island dataset: 223 seconds. This happens because this
time there are 467 skyline points (a relatively large number),
and hence the nested loop breaks fewer times.
   Negative set {Long Beach, US Cities}: For these cases,
BNL is the fastest. BBS needs some time to traverse all the
data in the R-tree in order to find the minimum point
(described in §3.3). This extra time ranks BBS second among
all the algorithms. What we said before about DCA and BF holds
here too, since they are not affected by the “negative” points.
   Different Trivial Cases of Divide and Conquer: So far,
we have assumed that skylineBasic(data) reaches its trivial
case, and returns, only when the data size is equal to one. In
the following experiment we increase this value (parameter p)
from 1 to 1000 and find the skyline points of each such
partition by brute force. We run the algorithm again on all
the datasets. The results are presented in Figure 7 (see
footnote 6). We can see that the best choices for our datasets
are between p = 50 and p = 100. In that range, DCA runs faster
than before.

Footnote 6: We exclude the Sierpinski dataset from the figure
since its run time is huge compared to the others. However,
the dataset shows similar behavior.

[Figure 7: Run time of DCA vs the parameter p. x-axis:
Parameter p (1 to 1000); y-axis: Time (sec); series: NBA,
Island, Long Beach, US Cities.]

6. CONCLUSION
   In this project, we used the Spatial Index Library
framework [2] to implement three classical skyline algorithms:
Block Nested Loop, Divide and Conquer, and Branch and Bound.
We tested each of these algorithms on five multidimensional
datasets and compared them in terms of disk I/Os and running
time. We conclude that, as long as the R-tree index has been
generated, BBS runs faster and with fewer disk I/Os than any
other algorithm. For “negative” datasets, BNL exploits the
R-tree index characteristics and performs better, since BBS
needs an initial search to find the minimum point of the set.
Finally, we have seen that DCA generally runs faster than the
brute force approach, but still slower than the two other
approaches.

7. REFERENCES
[1] Stephan Borzsonyi, Konrad Stocker, and Donald
    Kossmann. The skyline operator. Data Engineering,
    International Conference on, 0:0421, 2001.
[2] Marios Hadjieleftheriou, Erik Hoel, and Vassilis J.
    Tsotras. Sail: A spatial index library for efficient
    application integration. Geoinformatica, 9(4):367–389,
    2005.
[3] Donald Kossmann, Frank Ramsak, and Steffen Rost.
    Shooting stars in the sky: an online algorithm for
    skyline queries. In VLDB ’02: Proceedings of the 28th
    international conference on Very Large Data Bases,
    pages 275–286. VLDB Endowment, 2002.
[4] H. T. Kung, F. Luccio, and F. P. Preparata. On finding
    the maxima of a set of vectors. Journal of the ACM,
    22(4):469–476, 1975.
[5] Dimitris Papadias, Yufei Tao, Greg Fu, and Bernhard
    Seeger. Progressive skyline computation in database
    systems. ACM Trans. Database Syst., 30(1):41–82,
    2005.
[6] Kian-Lee Tan, Pin-Kwang Eng, and Beng Chin Ooi.
    Efficient progressive skyline computation. In VLDB ’01:
    Proceedings of the 27th International Conference on
    Very Large Data Bases, pages 301–310, San Francisco,
    CA, USA, 2001. Morgan Kaufmann Publishers Inc.
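
As a supplementary illustration of Section 5.3, the early break of the
brute force nested loop and the DCA base case controlled by the
parameter p can be sketched as below. This is a simplified sketch, not
the implementation evaluated in this paper: it splits the list in half
instead of partitioning around a median as in §3.2, it assumes that
smaller values are better in every dimension, and the function names
are hypothetical.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline_brute_force(points):
    """Nested loop; the inner loop breaks once a dominating point is found."""
    result = []
    for p in points:
        for q in points:
            if dominates(q, p):
                break  # p is dominated, stop scanning early
        else:
            result.append(p)  # no point dominates p
    return result


def skyline_dc(points, p=1):
    """Divide and conquer with a brute force base case for size <= p."""
    if len(points) <= max(p, 1):
        return skyline_brute_force(points)
    mid = len(points) // 2
    left = skyline_dc(points[:mid], p)
    right = skyline_dc(points[mid:], p)
    # Merge: keep a point only if no point of the other half's
    # skyline dominates it.
    merged = [a for a in left if not any(dominates(b, a) for b in right)]
    merged += [b for b in right if not any(dominates(a, b) for a in left)]
    return merged
```

With a single strong skyline point near the front of the list, the
inner loop of skyline_brute_force breaks almost immediately for every
other point, which matches the behavior described for the Sierpinski
dataset. Larger values of p replace the deepest recursion levels with
one brute force call per partition.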

				