Efficient k Nearest Neighbor Queries on Remote Spatial Databases Using Range Estimation

Danzhou Liu, Ee-Peng Lim, Wee-Keong Ng
 Center for Advanced Information Systems, School of Computer Engineering
Nanyang Technological University, Nanyang Ave, Singapore 639798, Singapore
Outline
   Introduction
   Related work
   k-NN query algorithm based on range estimation
   Range estimation methods
   Experiments
   Conclusions




SSDBM2002                                            2
Introduction
   Spatial database provides persistent storage for
    spatial objects (e.g., points, polylines, polygons)
   Spatial database supports
       Representation of spatial attributes
       Storage/indexing of spatial data values using some
        spatial indices (e.g., R-tree and Quadtree)
       Queries involving spatial attributes




k-Nearest Neighbor Queries
   Definition
       k-Nearest Neighbor (k-NN) query: locating k spatial
        objects nearest to a given query point
   Wide range of applications:
       Geographic Information Systems (GIS), e.g., finding
        the nearest two hospitals
       Computer-Aided Design (CAD), e.g., finding the
        nearest three resistors in a circuit board
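
To make the definition concrete, here is a minimal sketch of a k-NN query answered by brute force when all objects are available locally. The function name `knn_brute_force` and the sample coordinates are illustrative assumptions, not taken from the slides.

```python
import heapq
import math

def knn_brute_force(points, query, k):
    """Return the k points closest to `query` by Euclidean distance."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))

# Hypothetical hospital locations; find the nearest two to the query point.
hospitals = [(1.0, 1.0), (4.0, 5.0), (2.0, 2.5), (9.0, 0.5)]
nearest_two = knn_brute_force(hospitals, (2.0, 2.0), k=2)
```

This only works when the full dataset can be scanned, which is the case the remote setting rules out.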




Motivation
   Large volume of spatial data on WWW
       Geospatial Data Clearinghouse (a collection of over
        250 spatial database servers)
       Yahoo, TIGER and other map services
   Limited Web-based query interfaces
       Support simple spatial queries (e.g., window
        queries)
       No support for remote index access




The Geospatial Data Clearinghouse
   Large amount of useful geospatial information on WWW




The Geospatial Data Clearinghouse
   Limited Web-based query interface; supports only window
    queries




Objective
   Develop efficient algorithms to evaluate k-NN
    queries on remote spatial databases using
    window queries:
       Propose a generic k-NN query processing
        algorithm that accommodates different range
        estimation methods
       Develop efficient range estimation methods
       Conduct experiments to evaluate performance of
        proposed range estimation methods
       Develop sampling methods to obtain statistical
        knowledge of remote databases needed for range
        estimation methods

Related Work
   Algorithms for simple k-NN queries may be
    divided into three major groups:
       Partition-based algorithms
       Graph-based algorithms
       Range-based algorithms




Partition-based Algorithms
   Retrieve k nearest neighbors from spatial indices
    by pruning away nodes that cannot lead to k
    nearest neighbors
   Examples
       Branch-and-bound R-tree traversal algorithm
       Pipelined fashion algorithm
   Not applicable to Web environment
       Spatial indices are usually not available to non-
        local applications
       Creating local indices is infeasible due to large
        amount of data

Graph-based Algorithms
   Pre-compute nearest neighbors of spatial objects;
    create new index structures for pre-computed
    nearest neighbor information to support search
   Example
       Voronoi-based algorithm



   Not applicable to Web environment
       Retrieving all spatial objects on remote database
        servers is sometimes impractical
       Creating local indices is infeasible due to large
        amount of data

Range-based Algorithms
   Use range queries to retrieve k nearest neighbors
   Examples
       Use sampling for range estimation
       Use distance distributions for range estimation
       Use reference points for range estimation
   Not applicable to Web environment
       Determining sample size and selecting samples of
        spatial objects properly are still a challenge
       Creating local indices is infeasible due to large
        amount of data


Proposed k-NN Algorithm
   Based on range estimation
   New strategies for k-NN query evaluation in Web
    environment are required
   Use window queries for probing spatial database
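
One way to picture a window-query-driven k-NN loop is sketched below. It assumes the remote server exposes only a rectangular window query, simulated here by the hypothetical `make_window_query`. The parameters `esti_range1` and `esti_range2` borrow the function names used on the slides, but their signatures and the doubling rule in the usage example are assumptions made for illustration, not the authors' definitions.

```python
import math

def make_window_query(points):
    """Hypothetical stand-in for a remote server that only answers window queries."""
    def window_query(xmin, ymin, xmax, ymax):
        return [p for p in points if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]
    return window_query

def knn_by_window_queries(window_query, query, k, esti_range1, esti_range2, max_iter=50):
    """Probe with square windows of half-width r until k neighbors are confirmed."""
    qx, qy = query
    r = esti_range1(query, k)  # initial range estimate
    for iteration in range(1, max_iter + 1):
        hits = window_query(qx - r, qy - r, qx + r, qy + r)
        # Every object outside the window is farther than r from the query,
        # so objects within distance r are guaranteed to be the true nearest ones.
        confirmed = sorted(
            (p for p in hits if math.dist(p, query) <= r),
            key=lambda p: math.dist(p, query),
        )
        if len(confirmed) >= k:
            return confirmed[:k], iteration
        r = esti_range2(query, k, r, len(confirmed))  # enlarge the range and retry
    raise RuntimeError("k neighbors not found within max_iter window queries")

# Usage on a synthetic 10 x 10 grid of points (illustrative data only).
grid = [(float(x), float(y)) for x in range(10) for y in range(10)]
neighbors, iterations = knn_by_window_queries(
    make_window_query(grid), (4.2, 4.1), 3,
    esti_range1=lambda q, k: 0.5,
    esti_range2=lambda q, k, r, found: 2.0 * r,
)
```

The loop accepts any pair of estimation functions, which mirrors the idea of a generic algorithm that accommodates different range estimation methods.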




Density-based Range Estimation Method
   Based on uniform spatial object distribution
    assumption
       Range estimated by EstiRange1 function is




       Ranges estimated by EstiRange2 function are




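The formulas for this slide did not survive in the text. As a hedged illustration of what a density-based estimate can look like under the uniform distribution assumption, the sketch below solves density × π r² = k for r, and enlarges a range that proved too small by rescaling with the locally observed count. These are assumed formulas for illustration, not necessarily the authors' EstiRange1 and EstiRange2; the names `esti_range1_density` and `esti_range2_density` are hypothetical.

```python
import math

def esti_range1_density(k, total_objects, area):
    """Initial radius: expect k objects in a circle of area pi * r^2 (assumed formula)."""
    density = total_objects / area
    return math.sqrt(k / (math.pi * density))

def esti_range2_density(k, prev_range, found):
    """Enlarged radius after only `found` < k objects lay within `prev_range` (assumed formula)."""
    if found == 0:
        return 2.0 * prev_range  # no local evidence, so simply double
    # Local density is found / (pi * prev_range^2); solve for the radius holding k objects.
    return prev_range * math.sqrt(k / found)

# 10,000 objects spread over a 100 x 100 region give a density of 1 object per unit area.
first_range = esti_range1_density(k=5, total_objects=10_000, area=100.0 * 100.0)
```

Because `found` is smaller than k whenever the second function is called, the returned radius always grows.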
Bucket-based Range Estimation Method
   Use summary information about partitions or
    buckets of spatial objects for range estimation
       Summary information
          Bucket MBB, number of spatial objects in bucket

       Buckets are created using different strategies [1]
       Sort the buckets by their maximum distance to the
        query point
       The estimated range is the smallest bucket-to-query-point
        maximum distance within which the buckets together
        contain at least k objects
       Use one window query
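
The steps above can be sketched as follows. The `Bucket` type, the helper `max_dist`, and the sample buckets are assumptions made for illustration; the slides specify only that each bucket carries its MBB and object count.

```python
import math
from typing import NamedTuple

class Bucket(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    count: int  # number of spatial objects in the bucket

def max_dist(query, b):
    """Distance from the query point to the farthest corner of the bucket MBB."""
    dx = max(abs(query[0] - b.xmin), abs(query[0] - b.xmax))
    dy = max(abs(query[1] - b.ymin), abs(query[1] - b.ymax))
    return math.hypot(dx, dy)

def esti_range_bucket(query, k, buckets):
    """Smallest max distance whose buckets together hold at least k objects."""
    total = 0
    for dist, count in sorted((max_dist(query, b), b.count) for b in buckets):
        total += count
        if total >= k:
            return dist
    raise ValueError("buckets contain fewer than k objects")

# Illustrative buckets, with k = 5 and the query point at (1, 1).
buckets = [Bucket(0, 0, 2, 2, 3), Bucket(2, 0, 4, 2, 4), Bucket(0, 2, 4, 4, 10)]
estimated = esti_range_bucket((1.0, 1.0), 5, buckets)
```

Every object in a counted bucket lies within that bucket's maximum distance, so the returned range is guaranteed to enclose at least k objects, which is why a single window query suffices.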


   Example: k = 5




Experiments
   New Jersey road dataset from TIGER [30]




   Performance measures:
       Number of iterations h
       Average accuracy: $\frac{1}{h} \sum_{i=1}^{h} \frac{nn_i}{k}$
       Average efficiency: $\frac{1}{h} \sum_{i=1}^{h} \frac{nn_i}{o}$




Experimental Results
   Minimum, maximum and upper bounds on the
    number of iterations of the density-based range
    estimation method




   Iteration and accuracy of the density-based range
    estimation method




Experimental Results
   Efficiency of density-based and bucket-based
    range estimation methods




Conclusions
   A window query approach to evaluate k-NN
    queries on remote spatial databases motivated
    by
       Large amount of spatial information on the Web
       Limited query interface
   Proposed range estimation methods
       Performance increases with k
       No clear winner




Types of Range Estimation Methods
   Tight estimation methods
       Estimated range may not be large enough; i.e., both
        EstiRange1 and EstiRange2 functions may be
        invoked
       e.g., density-based method
   Loose estimation methods
       Estimated range is large enough; i.e., only the
        EstiRange1 function is invoked
       e.g., bucket-based method




Future Work
   Extending range estimation methods with
    sampling techniques to determine data
    distribution
       Current range estimation methods depend on
        statistical knowledge provided by database owners
       Investigate how the statistical knowledge can be
        approximated through sampling
   Developing strategies to select the appropriate
    range estimation methods for evaluating k-NN
    queries.
   Developing Web applications of k-NN queries.

Four Strategies to Create Buckets
   Equi-Count, Equi-Area, Min-Skew, and Min-Overlap partitioning
    strategies [1]




   Figure panels: Charminar dataset; spatial densities in Charminar; Equi-Area partitioning; Equi-Count partitioning; Min-Skew partitioning; Min-Overlap partitioning
