# Implementation Of Skyline Query Algorithms by jizhen1947

```									                Implementation Of Skyline Query Algorithms

Marios Kokkodis                                         Pamela Bhattacharya
University of California Riverside                        University of California Riverside
mak@cs.ucr.edu                                           pamelab@cs.ucr.edu

ABSTRACT
In this work we extend the publicly available Spatial Index Library to
support skyline queries. More specifically, we implement three basic
algorithms (Block Nested Loop, Divide and Conquer, and Branch and Bound)
and the naive brute force approach. Finally, we test these implementations
on various datasets, and we compare their efficiency in terms of disk I/Os
and run time.

1. INTRODUCTION
Skyline points are those points of a dataset that are not dominated by any
other point. A point dominates another point if it is as good as or better
in all dimensions (the formal definition is given in §2.1). The skyline
points are equally good, and do not dominate each other.
Computing the skyline points of a dataset is essential for applications
that involve multi-criteria decision making. As an example, we adopt the
one given by Börzsönyi et al. in [1]. Tourists often search for "cheap
hotels near the beach". Such a query should return every hotel that is not
dominated by any other hotel with respect to the two attributes (i.e. cost
and distance to the beach). The result is the skyline of the queried
dataset. This example, taken from [1], is presented in Figure 1.
The first skyline query algorithm was proposed by Kung et al. in 1975 [4].
It was based on the "divide and conquer" strategy and it had high
complexity. In the late eighties and nineties several algorithms were
proposed, which did not gain much popularity as they could not provide
solutions for datasets that do not fit in main memory. Börzsönyi et al. [1]
were the first, in 2001, to introduce the notion of skyline queries for
significantly large databases. They provided three algorithms: Block Nested
Loop, Basic Divide and Conquer, and M-way Divide and Conquer. In the same
year, Tan et al. [6] presented two other novel approaches for computing the
skyline points: Bitmap and Index. Furthermore, in 2002 Kossmann et al. [3]
presented the NN-Algorithm for evaluating 2-dimensional skylines. Finally,
Papadias et al. presented the Branch and Bound algorithm in 2005 [5].
Recently, several optimized and novel approaches to computing skyline
queries have been proposed, but they are out of the scope of this work.
In this work, we extend the Spatial Index Library by Marios
Hadjieleftheriou [2] to support skyline queries. More specifically, we
implement the following three skyline algorithms:

  • Block Nested Loop (BNL) - [1]

  • Divide and Conquer (DCA) - [1]

  • Branch and Bound (BBS) - [5]

Furthermore, we run those algorithms on various datasets, and we compare
their efficiency in terms of disk I/Os and run time.

2. ALGORITHMS DESCRIPTION

2.1 Computing "Dominance"
The notion of dominance is crucial in the definition of skyline points.
Given two data points, we compute whether one dominates the other or they
are incomparable by using the dominating factors specified in the query. In
the classical example provided by Börzsönyi et al. in [1], the dominating
factors are whether a hotel is "cheap" and "near to the beach". The formal
definition of dominance is as follows [1]:

  Tuple p = (p_1, ..., p_k, p_{k+1}, ..., p_l, p_{l+1}, ..., p_m, p_{m+1}, ..., p_n)

  Tuple q = (q_1, ..., q_k, q_{k+1}, ..., q_l, q_{l+1}, ..., q_m, q_{m+1}, ..., q_n)

Let's say that we have the following Skyline Query:

  Skyline of d_1 MIN, ..., d_k MIN, d_{k+1} MAX, ..., d_l MAX,
             d_{l+1} DIFF, ..., d_m DIFF

where MIN (MAX, DIFF) indicates that we are looking for a minimum (maximum,
different) point in that specific dimension. Then we can say that tuple p
dominates tuple q if the following 3 conditions hold:

  p_i ≤ q_i  for all i = 1, ..., k

  p_i ≥ q_i  for all i = (k + 1), ..., l

  p_i = q_i  for all i = (l + 1), ..., m
[Figure 1: Skyline of "Cheap hotels near the beach"]

[Figure 2: Merging Phase of Divide and Conquer]

2.2 Block Nested Loop (BNL)
A rather naive way of computing the skyline points of a given dataset was
proposed by Börzsönyi et al. in 2001 [1]. Before the skyline computation
begins, a window is allocated some space in main memory to hold the
resulting candidates. If there is not enough memory for all the candidate
skyline points, a temporary file is created which stores all the points
that are incomparable with the candidate skyline points of the window. Each
tuple p from the input dataset is compared with all the tuples in the
window. Such a comparison has three possible outcomes:

  • Eliminate a tuple from the input dataset: A tuple from the dataset is
    eliminated when it is dominated by at least one tuple in the window.
    The eliminated tuple is not considered for comparison in the rest of
    the algorithm.

  • Eliminate tuples from the window: When a tuple in the dataset dominates
    one or more tuples in the window, the former replaces all the latter
    ones in the window. The tuples thus removed from the window are treated
    like a tuple that has been eliminated from the input dataset, and are
    hence not considered again for comparison.

  • Incomparable tuples: When a tuple is incomparable to all other tuples
    in the window, it is inserted in the window if there is enough space.
    Otherwise it is copied to a temporary file and is considered for
    comparison (only with the remaining tuples in the temporary file) in
    the next iteration.

For the very first iteration, both the window and the temporary file are
empty and hence the first tuple is inserted in the window. The iteration
continues until all the tuples in the dataset have been compared with the
tuples in the window.
Complexity: The best case complexity is O(n), which occurs when all the
candidate skyline points fit in memory at every instance. In the worst
case, the complexity is O(n^2), which is the same as the naive nested loop
algorithm (brute force, see §3.4).

2.3 Basic Divide-and-Conquer Algorithm (BDC)
As stated in [1], Kung et al. proposed the first "divide-and-conquer"
algorithm for computing skylines in 1975. Börzsönyi et al. [1] used the
same approach and modified it in a way that is efficient for large
databases. The algorithm is stated as follows:

  • Compute the Median: In this step we compute the median m_p of the input
    dataset for some "dominating" dimension d_p. For example, if we query
    for all "cheap" hotels in Los Angeles, our "dominating" dimension for
    computing the median is the dimension "price" of each input hotel. We
    then divide the input dataset into two parts: P_1 and P_2. P_1 contains
    all tuples whose value of attribute d_p is "better" (in our example:
    less than, i.e. cheaper) than m_p. P_2 contains the remaining tuples of
    the given input.

  • Compute the Skyline recursively: The partitions P_1 and P_2 computed in
    the earlier step are further partitioned recursively, and this
    continues until P_i (i = 1, 2) contains one (or very few) tuples. Then
    computing the skyline point (or points) is trivial. The result consists
    of the skyline points of P_i, and it is kept in S_i respectively.

  • Compute the Overall Skyline: From the way that we partitioned the data,
    none of the tuples in S_1 can be dominated by a tuple in S_2, since a
    tuple in S_1 is better in dimension d_p than every tuple in S_2. Hence,
    the only thing that we need to do in the merging phase is to eliminate
    all the tuples in S_2 that are dominated by some tuple in S_1. Then,
    the skyline points will be S_1 ∪ merge(S_1, S_2) (footnote 1).

Complexity: Both the best case and the worst case complexity are
O(n (log n)^(d-2)) + O(n log n), where n is the number of input tuples and
d is the number of dimensions in the skyline. Hence, we expect this
algorithm to outperform BNL in bad cases and to be worse in good ones.

2.4 Branch-and-Bound Skyline Algorithm (BBS)
In 2005, Papadias et al. [5] proposed the Branch-And-Bound Skyline
algorithm. This algorithm is based on Nearest-Neighbor Search and finds the
skyline points using a multidimensional index (R-tree). The algorithm is
divided into the following steps:

Footnote 1: In section §3.2 we present in detail the merging phase, which
is the most important part of this algorithm.
[Figure 3: Skyline Points of the Island dataset]

[Figure 4: Skyline Points of the Long Beach dataset]

  • Index: Use an R-tree to index the objects and set the list of skyline
    points (S) to null.

  • Access Root: Insert all the entries of the root in a heap, and sort
    them in ascending order of their minimum distance from a minimum point.
    For positive values this point could be the origin (i.e.
    x = [0 ... 0]^T), while for negative values it could be a point that
    consists of the minimum values in each direction. The minimum distance
    (mindist) of a node of the R-tree to a point is given by the Euclidean
    distance of the point to the lower left corner of the node's Minimum
    Bounding Rectangle (MBR). The mindist of two points is their Euclidean
    distance.

  • Compute Skyline Points: The procedure starts by examining the elements
    of the heap. If the element is an intermediate node and it is not
    dominated by some point in S, it is expanded and all its children that
    are not dominated by some point in S are inserted in the heap. If the
    top element of the heap is a data point, it is inserted in S. In
    summary, the following steps are taken:

      – Expand the entry e in the root which has the lowest mindist.

      – Compare e with all the entries in S, and if e is "dominated" by any
        point in S, discard e.

      – If e is not "dominated" and e is an intermediate node, insert each
        child of e that is not dominated in the heap.

      – If e is a data point, insert it in the skyline list S.

Complexity: The main memory requirement of BBS is of the same order as the
size of the skyline list, since both the heap and the main-memory R-tree
sizes are of this order. Furthermore, the number of node accesses by BBS is
at most s·h, where s is the number of skyline points and h is the height of
the R-tree.

3. IMPLEMENTATION DETAILS
All three algorithms take the same R-tree files as input. For both the
Block Nested Loop and the Divide and Conquer, the index was not necessary
(footnote 2). However, since it was available, it helped us improve the
I/Os of the Block Nested Loop algorithm. For the Divide and Conquer, we
just use the R-tree to access the data points and store them in a list.
Finally, the use of the R-tree index is essential for the Branch and Bound,
since it is a key point for implementing the algorithm. In the next
paragraphs, we present some details about the implementation of the three
algorithms.

3.1 Block Nested Loop
For this algorithm, we use a skyline list (the window), which at every
instance keeps the candidate skyline points. At each step, this list is
compared with the current tree node (index, leaf), and we visit the node's
children only if the node is not dominated by some candidate skyline point.
The same thing happens at the leaf level: if an entry in a leaf is
dominated by some point in the list, we don't visit the actual data point.
If not, we visit the actual point and we include it in the candidate
skyline points list. In the end, when all the necessary nodes of the tree
have been traversed, the candidate skyline points list contains only the
skyline points.
When we visit actual points that are not dominated by some candidate
skyline point, we first check if there is enough memory to add them to the
list. If not, we store the data on disk (as the algorithm indicates [1]).
However, for our datasets (see §4) this wasn't necessary, since we had
enough memory to keep all the candidate skyline points in memory.
The implementation can be found in the src/rtree/RTree.cc file of the
Spatialindex library [2] under the name bnl.

3.2 Divide and Conquer
In order to implement this algorithm we need some functions, which we
describe in this section.
Function travereTree(): This method traverses the tree and adds all the
points to a list, which is kept in memory if there is enough space.
Otherwise the points are stored on disk.
Function findMedian(): It finds the median, m_p, for a given dimension,
d_p.

Footnote 2: We could implement these two algorithms without the use of the
R-tree index, but since we wanted to be consistent when comparing them with
the branch and bound algorithm, we used the same framework.
Function partition(): It splits our data into two partitions, P_1 and P_2,
according to the median m_p that has been found earlier by findMedian(). In
the case where the first partition is empty (footnote 3), we find the
median of another dimension, d', and we partition the data again. This way,
we guarantee that each point in P_1 is not dominated by any point in P_2.
Function skylineBasic(data): If there is only one point in the data list,
then this point is returned. Otherwise, we find the median of the data
list, and we partition it into P_1 and P_2 (by calling partition()). Then,
we call skylineBasic(P_1) and skylineBasic(P_2) and we keep the results in
S_1 and S_2. So far, we know that no point in S_1 is dominated by any point
in S_2 (this is due to the way that we partition the data, see §3.2). Hence
we need to merge S_1 with S_2 so as to find the points in S_2 that are not
dominated by any point in S_1. In the end, the union of S_1 and the result
of the merge are the skyline points.
Function merge(): This is the most important phase of the algorithm. We
implement it as described in [1], so that we eliminate unnecessary
comparisons. First, we check whether either of the two lists S_1, S_2 has
only one element (the trivial cases). Then, we check if the points are
2-dimensional (footnote 4), and if so, we find the minimum of S_1 (min) in
a dimension d', where d' ≠ d_p. As a result, we return all the points in
S_2 whose d'-dimension is less than min (those are the points that are
incomparable with all the points in S_1). If the dataset is of higher
dimension (> 2), we partition both sets in some other dimension d_g (with
median m_g) and we get four sets: S_{1,1}, S_{1,2} for S_1, and S_{2,1},
S_{2,2} for S_2 (notice again that no point in S_{1,1} is dominated by any
point in S_{1,2}, etc.). Then we recursively merge S_{1,1} with S_{2,1} (as
a result we get all the points in S_{2,1} that are not dominated by some
point in S_{1,1}), S_{1,2} with S_{2,2}, and finally we merge S_{1,1} with
the result of the last merge (that is, all the points in S_{2,2} that are
not dominated by some point in S_{1,2} are merged diagonally with S_{1,1}).
Schematically, those comparisons are presented in Figure 2 (footnote 5). As
a result, we return the union of the first and the third merge:

  merge(S_{1,1}, S_{2,1}) ∪ merge(S_{1,1}, merge(S_{1,2}, S_{2,2}))

All these methods are implemented with the same names in the
src/rtree/RTree.cc file of the Spatialindex library [2].

[Figure 5: Skyline Points of the Sierpinski dataset]

[Figure 6: Skyline Points of the Sierpinski dataset]

3.3 Branch and Bound
As already mentioned, the R-tree index is necessary for this algorithm.
After the generation of the R-tree, if the dataset is "positive" (it
includes only positive points), the origin is considered as the minimum
point. The minimum distances of the root's entries are computed, and the
heap is created by inserting these entries in ascending order. At each
instance, we remove the top entry e of the heap, and we check if e is
dominated by some point in our skyline points list, S. If it is, we discard
it; otherwise we check whether e is an intermediate entry. If it is, we add
all its children that are not dominated by some point in S into the heap.
If e is a data point, we add it into S.
If the dataset has negative points, we traverse the tree to find the
minimum coordinate in every direction, create a point that consists of all
these coordinates, and compute the minimum distances to this point. In this
case the number of disk I/Os increases, since we have to go through all the
data to find the minimum values. However, if we somehow have this minimum
point beforehand, we can have the same I/Os as if the dataset had only
positive points.
The implementation of the algorithm can be found in the src/rtree/RTree.cc
file of the Spatialindex library [2], under the name bbsQuery.

3.4 Brute Force
For validation and comparison purposes, we also implemented a brute force
approach for finding the skyline points. This method consists of one nested
for loop that compares each point with all the others, and if the point is
not dominated, it is added to the skyline list. As soon as a point is found
to be dominated, we exit the inner loop and continue with the next point.
The implementation can be found in the src/rtree/RTree.cc file of the
Spatialindex library [2], under the name bruteForce.

Footnote 3: The first partition consists of the points whose d_p-dimension
value is less than m_p. Hence, it is the only partition that can be empty,
and this can only happen in the case where the median is also the minimum
in the d_p dimension (all the points belong to the second partition).
Footnote 4: Here we don't actually mean the dimension of the initial
dataset, but the number of dimensions that have not been partitioned yet.
Footnote 5: The figure is taken from [1].
Algorithm   Index I/Os   Leaves I/Os   Data I/Os   Total I/Os
BF                  15           924       63383        64322
DCA                 15           924       63383        64322
BNL                 15           924        5191         6130
BBS                 15           671         467         1153

Table 1: Island (2d) I/Os - total points: 63383, skyline points: 467

Algorithm   Index I/Os   Leaves I/Os   Data I/Os   Total I/Os
BF                   8           526       36298        36832
DCA                  8           526       36298        36832
BNL                  8           526         263          797
BBS                 16           601       36324        36941

Table 2: Long Beach (2d) I/Os (negative points) - total points: 36298,
skyline points: 26

Algorithm   Index I/Os   Leaves I/Os   Data I/Os   Total I/Os
BF                 108          7714      531441       539263
DCA                108          7714      531441       539263
BNL                108          7714           1         7823
BBS                 50            66           1          117

Table 4: Sierpinski (2d) I/Os - total points: 531441, skyline points: 1

Algorithm   Index I/Os   Leaves I/Os   Data I/Os   Total I/Os
BF                   7           374       25375        25756
DCA                  7           374       25375        25756
BNL                  7           374         315          696
BBS                 15           511       25415        25940

Table 5: US cities (2d) I/Os - total points: 25375, skyline points: 40

4. DATA
We use five datasets to evaluate the three algorithms we implemented by
extending the Spatial Index Library [2]. Those datasets are:

  • Islands: 63,383 2-dimensional points of an island,

  • Beaches: 36,298 2-dimensional coordinates of road intersections in the
    Long Beach County, CA (includes negative points),

  • Sierpinski: 531,441 2-dimensional points representing a Sierpinski
    triangle fractal,

  • US Cities: 25,375 2-dimensional points representing the coordinates of
    US cities (includes negative points),

  • NBA: 17,265 5-dimensional points, representing some statistics of NBA
    players.

A tuple in an n-dimensional dataset is in the form:

  id, v_1, ..., v_n

5. EXPERIMENTAL ANALYSIS
In this section we analyze the results after applying all three algorithms
to all available datasets. The procedure follows two steps:

  • Create an R-tree index for each dataset

  • Load the stored index and find the skyline points by applying the
    algorithms

All comparisons take place after the creation of the R-tree index.

Dataset      Dataset Points   Skyline Points
Island                63383              467
L. Beach              36298               26
Sierpinski           531441                1
US cities             25375               40
NBA                   17265              495

Table 3: Resulting skyline points

5.1 Correctness
We first check the correctness of our implementations. We do this in two
ways:

  • For the 2-dimensional datasets, we depict the skyline points together
    with the dataset points in the same figure. By observing those figures,
    we can tell whether our resulting skyline points are correct or not.

  • For all the datasets, we find the skyline points by using the brute
    force method described in §3.4 and compare them with the ones that our
    implementations provide.

After applying the algorithms, we had the same skyline points for each
dataset, and also the same skyline points as the ones that resulted from
the brute force approach. This agreement supports the correctness of our
implementations.
Table 3 presents the number of skyline points that we found for each
dataset. Furthermore, for all the 2-dimensional datasets we present the
skyline points in comparison with the dataset points in Figures 3, 4, 5,
and 6. Correctness for these datasets is also visible there.

5.2 Disk Inputs - Outputs
As we mentioned in §3.3, our implementation of BBS has a different number
of disk I/Os depending on whether the dataset includes negative points.
Hence, we split the datasets into the "positive" and the "negative" ones
and we compare the algorithms accordingly.
Positive set {Island, Sierpinski, NBA}: Tables 1, 4, and 6 present the I/Os
of each algorithm for the Island, Sierpinski, and NBA datasets
respectively. Branch and Bound has fewer I/Os than any other algorithm.
This is due to the fact that BBS visits only the skyline points. The way we
implement Block Nested Loop (exploiting the R-tree index) results in fewer
I/Os than we would expect without the use of the R-tree index (in that
case, all the data would be visited in order to be compared with the
skyline points). Divide and Conquer has to visit all the points, and so
does the brute force approach (BF), and hence they are equivalent as far as
I/O cost is concerned (however, Divide and Conquer is a lot faster than BF
in most of the cases, as we present in §5.3). It is important to state
again here that all points of every dataset fit in memory, and that is why
Divide and Conquer has the same I/Os as BF.
Negative set {Long Beach, US Cities}: Tables 2 and 5 present the I/Os for
these two datasets. Since these
datasets contain negative points, BBS first traverses the tree to find the
minimum in every dimension (as we described in §3.3). Hence, it needs more
I/Os than any other algorithm. This can be easily improved by providing the
minimum point as an input to the algorithm. In that case, the disk I/Os
would be reduced to the number of skyline points, as before. Negative
points do not affect the rest of the algorithms: Block Nested Loop is
better than Divide and Conquer, which has the same I/Os as BF.

Algorithm   Index I/Os   Leaves I/Os   Data I/Os   Total I/Os
BF                   5           253       17265        17523
DCA                  5           253       17265        17523
BNL                  5           253        9225         9483
BBS                  5           253         495          753

Table 6: NBA (5d) I/Os - total points: 17265, skyline points: 495

5.3 Run - Time Cost
We run each algorithm three times on each dataset, and we compute the
average running time that it needs in order to find the skyline points.
Table 7 presents the results of our tests.

Dataset         BF      DCA     BNL     BBS
Island         223    1.067    0.79    0.27
L. Beach     0.335    0.565   0.084    0.17
Sierpinski     2.3    9.534    0.93   0.045
US cities    0.625    0.415   0.065    0.13
NBA            3.7     0.58    0.57    0.21

Table 7: Algorithm run time (in sec) for each dataset

Positive set {Island, Sierpinski, NBA}: As we expected, BBS has the
smallest running times when applied to these datasets (rows 1, 3, 5 in
Table 7). Block Nested Loop (BNL) comes second. For the Divide and Conquer
(DCA), we observe that it needs more time than the brute force approach for
the Sierpinski dataset. This has to do with the type of the dataset. As we
can see in Figure 5, this dataset has only one skyline point. Hence it may
happen that this point is near the beginning of the list, so that after
finding it the inner loop never runs to completion. As a result, brute
force needs just a few seconds. On the other hand, DCA needs a relatively
long time to produce its result. This is due to the fact that many of the
points of this dataset (see Figure 5) have the same values in different
dimensions, a fact that leads to empty partitions P_1, and hence to
repartitioning P_2 in other dimensions (see §3.2). As a result, BF is
faster than DCA for this dataset. Another thing to notice here is the huge
time that BF needs in order to find the skyline of the Island dataset: 223
seconds. This happens because this time there are 467 skyline points (a
relatively big number) and hence the nested loop breaks out early less
often.
Negative set {Long Beach, US Cities}: For these cases, BNL is the fastest.
BBS needs some time to traverse all the data from the R-tree in order to
find the minimum point (described in §3.3). This extra time ranks BBS
second among all the algorithms. What we said before about DCA and BF holds
here too, since they are not affected by the "negative" points.
Different Trivial Cases of Divide and Conquer: So far, we have assumed that
skylineBasic(data) returns only when the data size is equal to one. In the
following experiment we increase this value (parameter p) from 1 to 1000
and we find the skyline points of such a partition by brute force. We run
the algorithm again on all the datasets. The results are presented in
Figure 7 (footnote 6). We can notice that the best choices for our datasets
are between p = 50 and p = 100. In that range, DCA runs faster than before.

[Figure 7: Run time of DCA vs the parameter p. Axes: Time (sec) against
Parameter p (1 to 1000); series: NBA, Island, Long Beach, US Cities]

Footnote 6: We exclude the Sierpinski dataset from the figure since its run
time is huge compared to the others. However, the dataset shows similar
behavior.

6. CONCLUSION
In this project, we used the Spatial Index Library framework [2] to
implement three classical skyline algorithms: Block Nested Loop, Divide and
Conquer, and Branch and Bound. We tested each one of these algorithms on
five multidimensional datasets, and we compared them in terms of disk I/Os
and running time. We conclude that, as long as the R-tree index has been
generated, BBS runs faster and with fewer disk I/Os than any other
algorithm. For "negative" datasets, BNL exploits the R-tree index
characteristics and has better performance, since BBS needs to make a
search in the beginning to find the minimum point of the set. Finally, we
have seen that DCA generally runs better than the brute force approach, but
still worse than the two other approaches.

7. REFERENCES
[1] Stephan Börzsönyi, Konrad Stocker, and Donald Kossmann. The skyline
    operator. Data Engineering, International Conference on, 0:0421, 2001.
[2] Marios Hadjieleftheriou, Erik Hoel, and Vassilis J. Tsotras. Sail: A
    spatial index library for efficient application integration.
    Geoinformatica, 9(4):367–389, 2005.
[3] Donald Kossmann, Frank Ramsak, and Steffen Rost. Shooting stars in the
    sky: an online algorithm for skyline queries. In VLDB '02: Proceedings
    of the 28th international conference on Very Large Data Bases, pages
    275–286. VLDB Endowment, 2002.
[4] H. T. Kung, F. Luccio, and F. P. Preparata. On finding the maxima of a
    set of vectors. Journal of the ACM, 22(4):469–476, 1975.
[5] Dimitris Papadias, Yufei Tao, Greg Fu, and Bernhard Seeger. Progressive
    skyline computation in database systems. ACM Trans. Database Syst.,
    30(1):41–82, 2005.
[6] Kian-Lee Tan, Pin-Kwang Eng, and Beng Chin Ooi. Efficient progressive
    skyline computation. In VLDB '01: Proceedings of the 27th International
    Conference on Very Large Data Bases, pages 301–310, San Francisco, CA,
    USA, 2001. Morgan Kaufmann Publishers Inc.

```
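The dominance test of §2.1, the window logic of §2.2 and the brute force check of §3.4 can be illustrated with a short sketch. It is written in Python for brevity and is not the authors' C++ code from src/rtree/RTree.cc: the names `dominates`, `bnl_skyline` and `brute_force_skyline` are hypothetical, the whole window is assumed to fit in memory (no temporary file, no R-tree), and the test additionally requires a strict improvement in at least one MIN or MAX dimension, an assumption added here so that identical points do not eliminate each other.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point, spec: Sequence[str]) -> bool:
    """Return True if p dominates q.

    spec holds one annotation per dimension: "MIN", "MAX" or "DIFF".
    Assumption (not in the paper's definition): p must also be strictly
    better than q in at least one MIN or MAX dimension.
    """
    strictly_better = False
    for pi, qi, kind in zip(p, q, spec):
        if kind == "MIN":
            if pi > qi:
                return False
            strictly_better = strictly_better or pi < qi
        elif kind == "MAX":
            if pi < qi:
                return False
            strictly_better = strictly_better or pi > qi
        elif kind == "DIFF":
            if pi != qi:
                return False
        else:
            raise ValueError(f"unknown annotation: {kind}")
    return strictly_better


def bnl_skyline(points: Sequence[Point], spec: Sequence[str]) -> List[Point]:
    """In-memory Block Nested Loop: the window always fits in memory."""
    window: List[Point] = []
    for p in points:
        # Outcome 1: p is dominated by a window tuple and is eliminated.
        if any(dominates(w, p, spec) for w in window):
            continue
        # Outcome 2: p replaces every window tuple that it dominates.
        window = [w for w in window if not dominates(p, w, spec)]
        # Outcome 3 (and the tail of outcome 2): p joins the window.
        window.append(p)
    return window


def brute_force_skyline(points: Sequence[Point], spec: Sequence[str]) -> List[Point]:
    """Nested loop; any() stops as soon as a dominating point is found."""
    return [
        p
        for i, p in enumerate(points)
        if not any(dominates(q, p, spec) for j, q in enumerate(points) if j != i)
    ]


if __name__ == "__main__":
    # Hypothetical hotels as (price, distance to the beach); both minimized.
    hotels = [(50, 2.0), (80, 0.5), (60, 1.0), (90, 1.5), (40, 3.0)]
    print(bnl_skyline(hotels, ["MIN", "MIN"]))
```

Comparing the output of `bnl_skyline` with that of `brute_force_skyline` mirrors the validation approach of §5.1, where every implementation is checked against the brute force result.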