Efficient k Nearest Neighbor Queries on Remote Spatial Databases Using Range Estimation
Danzhou Liu, Ee-Peng Lim, Wee-Keong Ng
Center for Advanced Information Systems, School of Computer Engineering
Nanyang Technological University, Nanyang Ave, Singapore 639798, Singapore
Outline
Introduction
Related work
k-NN query algorithm based on range estimation
Range estimation methods
Experiments
Conclusions
SSDBM2002 2
Introduction
A spatial database provides persistent storage for spatial objects (e.g., points, polylines, polygons)
A spatial database supports
Representation of spatial attributes
Storage/indexing of spatial data values using spatial indices (e.g., R-tree and Quadtree)
Queries involving spatial attributes
k-Nearest Neighbor Queries
Definition
k-Nearest Neighbor (k-NN) query: locating the k spatial objects nearest to a given query point
Wide range of applications:
Geographic Information Systems (GIS), e.g., finding the two nearest hospitals
Computer Aided Design (CAD), e.g., finding the three nearest resistors on a circuit board
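As a baseline for the definition above, a minimal Python sketch of a k-NN query over points held in local memory, using Euclidean distance (an assumption; the slides do not fix the metric). It needs every object locally, which is exactly what a remote Web database does not offer.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]

def knn_brute_force(points: List[Point], q: Point, k: int) -> List[Point]:
    """Return the k points closest to q (Euclidean distance), nearest first."""
    # heapq.nsmallest scans the data once and keeps only k candidates.
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))

hospitals = [(0.0, 5.0), (2.0, 1.0), (-1.0, -1.0), (4.0, 4.0)]
print(knn_brute_force(hospitals, (0.0, 0.0), 2))  # [(-1.0, -1.0), (2.0, 1.0)]
```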
Motivation
Large volume of spatial data on the WWW
Geospatial Data Clearinghouse (a collection of over 250 spatial database servers)
Yahoo, TIGER and other map services
Limited Web-based query interfaces
Support simple spatial queries (e.g., window
queries)
No support for remote index access
The Geospatial Data Clearinghouse
A large amount of useful geospatial information is available on the WWW
The Geospatial Data Clearinghouse
Limited Web-based query interface; supports only window
queries
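To make the "window queries only" constraint concrete, here is a minimal Python sketch of such an interface. `Window` and `WindowOnlyServer` are hypothetical names for illustration; real Clearinghouse servers are reached through Web forms, not through this API. The client can only ask for the objects inside an axis-aligned rectangle, and each call stands for one remote round trip.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass(frozen=True)
class Window:
    """Axis-aligned query rectangle [xmin, xmax] x [ymin, ymax]."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, p: Point) -> bool:
        return self.xmin <= p[0] <= self.xmax and self.ymin <= p[1] <= self.ymax

class WindowOnlyServer:
    """Hypothetical remote server: no index access, no k-NN operator."""

    def __init__(self, points: List[Point]):
        self._points = list(points)  # hidden from the client
        self.queries_issued = 0      # each call models one remote round trip

    def window_query(self, w: Window) -> List[Point]:
        self.queries_issued += 1
        return [p for p in self._points if w.contains(p)]

server = WindowOnlyServer([(1, 1), (2, 3), (5, 5), (-1, 0)])
print(server.window_query(Window(0, 0, 3, 3)))  # [(1, 1), (2, 3)]
```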
Objective
Develop efficient algorithms to evaluate k-NN
queries on remote spatial databases using
window queries:
Propose a generic k-NN query processing
algorithm that accommodates different range
estimation methods
Develop efficient range estimation methods
Conduct experiments to evaluate the performance of the proposed range estimation methods
Develop sampling methods to obtain statistical
knowledge of remote databases needed for range
estimation methods
Related Work
Algorithms for simple k-NN queries may be
divided into three major groups:
Partition-based algorithms
Graph-based algorithms
Range-based algorithms
Partition-based Algorithms
Retrieve k nearest neighbors from spatial indices
by pruning away nodes that cannot lead to k
nearest neighbors
Examples
Branch-and-bound R-tree traversal algorithm
Pipelined algorithm
Not applicable to the Web environment
Spatial indices are usually not available to non-local applications
Creating local indices is infeasible due to the large amount of data
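The pruning idea behind partition-based algorithms can be sketched in Python. This is not the branch-and-bound R-tree algorithm itself: a flat grid of square cells stands in for R-tree nodes (an assumption made to keep the sketch short). Cells are visited in order of MINDIST, the smallest possible distance from the query point to the cell, and the search stops once no remaining cell can beat the current k-th candidate. Note that it needs the index structure locally, which is why this family does not suit remote Web databases.

```python
import heapq
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]

def build_grid(points: List[Point], cell_size: float) -> Dict[Cell, List[Point]]:
    """Partition points into square cells (a flat stand-in for R-tree nodes)."""
    grid: Dict[Cell, List[Point]] = defaultdict(list)
    for p in points:
        grid[(math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))].append(p)
    return grid

def mindist(q: Point, cell: Cell, cell_size: float) -> float:
    """Smallest possible distance from q to any point inside the cell."""
    x0, y0 = cell[0] * cell_size, cell[1] * cell_size
    dx = max(x0 - q[0], 0.0, q[0] - (x0 + cell_size))
    dy = max(y0 - q[1], 0.0, q[1] - (y0 + cell_size))
    return math.hypot(dx, dy)

def knn_best_first(grid: Dict[Cell, List[Point]], cell_size: float,
                   q: Point, k: int) -> List[Point]:
    # Visit cells in increasing MINDIST order.
    cells = [(mindist(q, c, cell_size), c) for c in grid]
    heapq.heapify(cells)
    best: List[Tuple[float, Point]] = []  # max-heap of k candidates (negated distances)
    while cells:
        d_cell, c = heapq.heappop(cells)
        # Prune: with k candidates known, a cell farther than the k-th one
        # (and every cell after it in MINDIST order) cannot improve the answer.
        if len(best) == k and d_cell > -best[0][0]:
            break
        for p in grid[c]:
            d = math.dist(p, q)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    return [p for _, p in sorted(best, key=lambda t: -t[0])]

pts = [(0.5, 0.5), (1.5, 0.2), (3.7, 3.9), (9.0, 9.0), (0.1, 2.2)]
grid = build_grid(pts, cell_size=1.0)
print(knn_best_first(grid, 1.0, (0.0, 0.0), 2))  # [(0.5, 0.5), (1.5, 0.2)]
```

In this example the cells holding (3.7, 3.9) and (9.0, 9.0) are never opened.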
Graph-based Algorithms
Pre-compute nearest neighbors of spatial objects;
create new index structures for pre-computed
nearest neighbor information to support search
Example
Voronoi-based algorithm
Not applicable to the Web environment
Retrieving all spatial objects from remote database servers is sometimes impractical
Creating local indices is infeasible due to the large amount of data
Range-based Algorithms
Use range queries to retrieve k nearest neighbors
Examples
Use sampling for range estimation
Use distance distributions for range estimation
Use reference points for range estimation
Not applicable to the Web environment
Determining the sample size and properly selecting samples of spatial objects remain challenging
Creating local indices is infeasible due to the large amount of data
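One simple way to use a sample for range estimation can be sketched as follows. The rule is an assumption for illustration, not necessarily the cited methods: if a uniform random sample of n objects is drawn from N objects, each sample point stands for about N/n objects, so the k-NN radius is approximated by the distance to the ⌈k·n/N⌉-th nearest sample point. `estimate_range_from_sample` is a hypothetical name. A poorly chosen or too small sample gives a range that is too small, which illustrates the challenge noted above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def estimate_range_from_sample(sample: List[Point], total: int,
                               q: Point, k: int) -> float:
    """Estimate the k-NN radius around q from a uniform random sample.

    Assumption: `sample` is drawn uniformly from `total` objects, so each
    sample point represents about total / len(sample) objects.
    """
    n = len(sample)
    j = max(1, math.ceil(k * n / total))  # rank of the matching sample point
    if j > n:
        raise ValueError("sample too small for this k")
    dists = sorted(math.dist(p, q) for p in sample)
    return dists[j - 1]

sample = [(1.0, 0.0), (0.0, 2.0), (3.0, 0.0), (0.0, -4.0)]
print(estimate_range_from_sample(sample, total=40, q=(0.0, 0.0), k=15))  # 2.0
```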
Proposed k-NN Algorithm
Based on range estimation
New strategies for k-NN query evaluation in the Web environment are required
Use window queries to probe the spatial database
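A generic k-NN loop that accepts any range estimation method can be sketched in Python. The names and the callback signature are hypothetical, and the sketch assumes that the first range comes from an EstiRange1-style estimate and later, larger ranges from an EstiRange2-style estimate, as the "Types of Range Estimation Methods" slide suggests. Each iteration issues one square window query of half-side r centred on the query point and counts only objects within distance r, because objects in the square's corners can be farther away than unreturned objects just outside it.

```python
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
# window_query(xmin, ymin, xmax, ymax) -> objects inside the rectangle
WindowQuery = Callable[[float, float, float, float], List[Point]]
# estimate(q, k, previous_range_or_None, objects_found_within_previous_range) -> range
Estimator = Callable[[Point, int, Optional[float], int], float]

def knn_by_window_queries(window_query: WindowQuery, q: Point, k: int,
                          estimate: Estimator,
                          max_iters: int = 50) -> Tuple[List[Point], int]:
    """Return (k nearest objects, number of window queries issued)."""
    r = estimate(q, k, None, 0)  # initial estimate (EstiRange1 role)
    for h in range(1, max_iters + 1):
        objs = window_query(q[0] - r, q[1] - r, q[0] + r, q[1] + r)
        inside = sorted((math.dist(p, q), p) for p in objs)
        inside = [(d, p) for d, p in inside if d <= r]  # keep the inscribed circle
        if len(inside) >= k:
            # Every unreturned object lies outside the square, hence farther
            # than r, so the k closest objects in the circle are the true k-NN.
            return [p for _, p in inside[:k]], h
        r = estimate(q, k, r, len(inside))  # enlarge (EstiRange2 role)
    raise RuntimeError("no range with k objects found within max_iters")

# Usage: simulated remote data and a placeholder doubling estimator.
data = [(x * 0.5, y * 0.5) for x in range(10) for y in range(10)]

def window_query(xmin: float, ymin: float, xmax: float, ymax: float) -> List[Point]:
    return [p for p in data if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]

def doubling(q: Point, k: int, prev: Optional[float], found: int) -> float:
    return 0.25 if prev is None else 2.0 * prev

nn, h = knn_by_window_queries(window_query, (2.2, 2.2), 4, doubling)
print(h, nn)
```

The doubling estimator is only a stand-in; a better estimator reduces h, the number of remote window queries.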
Density-based Range Estimation Method
Assumes that spatial objects are uniformly distributed
The range estimated by the EstiRange1 function is:
The ranges estimated by the EstiRange2 function are:
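The following Python sketch shows one way density-based estimates could be computed under the uniform-distribution assumption. The functions `esti_range1` and `esti_range2` are hypothetical stand-ins named after EstiRange1 and EstiRange2; the formulas are derived here from the uniformity assumption and are not the slide's own. With N objects spread uniformly over area A, a circle of radius r holds about N·πr²/A objects, so k objects need r = √(kA/(πN)). If a query of radius r found only m < k objects, the observed local density suggests r·√(k/m).

```python
import math

def esti_range1(k: int, n_objects: int, area: float) -> float:
    """Initial radius under the uniform-distribution assumption.

    A circle of radius r is expected to hold n_objects * pi * r**2 / area
    objects; setting this equal to k and solving gives r.
    """
    return math.sqrt(k * area / (math.pi * n_objects))

def esti_range2(k: int, prev_range: float, found: int) -> float:
    """Enlarged radius after only `found` (< k) objects lay within prev_range.

    Uses the observed local density found / (pi * prev_range**2) when any
    object was seen; with no local evidence it simply doubles the radius.
    """
    if found == 0:
        return 2.0 * prev_range
    return prev_range * math.sqrt(k / found)

r1 = esti_range1(k=5, n_objects=10_000, area=100.0 * 100.0)
print(round(r1, 4))  # 1.2616
r2 = esti_range2(k=5, prev_range=r1, found=2)  # larger than r1 because found < k
```

Because such estimates can be too small, a second step may be required, which matches the "tight" behaviour described for the density-based method.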
Bucket-based Range Estimation Method
Use summary information about partitions or
buckets of spatial objects for range estimation
Summary information
Bucket MBB (minimum bounding box) and number of spatial objects in the bucket
Buckets are created using different strategies [1]
Sort the buckets by their maximum distance to the query point
The estimated range is the smallest such maximum distance for which the buckets within it together contain at least k objects
Only one window query is needed
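The bucket-based rule above can be sketched in Python. `maxdist` and `bucket_range` are hypothetical names, and the bucket values are toy data, not the slide's k = 5 example. The maximum distance from a query point to a box is the distance to its farthest corner. Every object in the selected buckets lies within the returned distance, so a single window query of that range is guaranteed to find at least k objects within it.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
MBB = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Bucket = Tuple[MBB, int]                 # (MBB, number of objects in bucket)

def maxdist(q: Point, mbb: MBB) -> float:
    """Largest distance from q to any point of the box (its farthest corner)."""
    xmin, ymin, xmax, ymax = mbb
    dx = max(abs(q[0] - xmin), abs(q[0] - xmax))
    dy = max(abs(q[1] - ymin), abs(q[1] - ymax))
    return math.hypot(dx, dy)

def bucket_range(buckets: List[Bucket], q: Point, k: int) -> float:
    """Smallest bucket max-distance whose buckets together hold >= k objects."""
    total = 0
    for d, count in sorted((maxdist(q, mbb), c) for mbb, c in buckets):
        total += count
        if total >= k:
            return d
    raise ValueError("fewer than k objects in all buckets")

buckets = [((0, 0, 2, 2), 3), ((2, 0, 4, 2), 4), ((0, 2, 2, 4), 2), ((5, 5, 6, 6), 10)]
print(round(bucket_range(buckets, (1.0, 1.0), k=5), 4))  # 3.1623
```

The guarantee is what makes this a loose method: the range may be larger than necessary, but it never has to be enlarged.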
Example: k = 5
Experiments
New Jersey road dataset from TIGER [30]
Performance measures:
Number of iterations $h$
Average accuracy $= \frac{1}{h}\sum_{i=1}^{h} \frac{nn_i}{k}$
Average efficiency $= \frac{1}{h}\sum_{i=1}^{h} \frac{nn_i}{o_i}$
Experimental Results
Minimum, maximum and upper bounds on the
number of iterations of the density-based range
estimation method
Iteration and accuracy of the density-based range
estimation method
Experimental Results
Efficiency of density-based and bucket-based
range estimation methods
Conclusions
A window query approach to evaluate k-NN
queries on remote spatial databases motivated
by
Large amount of spatial information on the Web
Limited query interface
Proposed range estimation methods
Performance increases with k
No clear winner
Types of Range Estimation Methods
Tight estimation methods
Estimated range may not be large enough, i.e., both the EstiRange1 and EstiRange2 functions may be invoked
e.g., density-based method
Loose estimation methods
Estimated range is always large enough, i.e., only the EstiRange1 function is invoked
e.g., bucket-based method
Future Work
Extending range estimation methods with sampling techniques to determine the data distribution
Current range estimation methods depend on
statistical knowledge provided by database owners
Investigate how the statistical knowledge can be
approximated through sampling
Developing strategies to select the appropriate
range estimation methods for evaluating k-NN
queries.
Developing Web applications of k-NN queries.
Four Strategies to Create Buckets
Equi-Count, Equi-Area, Min-Skew, and Min-Overlap partitioning
strategies [1]
Figure panels: Charminar dataset; spatial densities in Charminar; Equi-Area, Equi-Count, Min-Skew, and Min-Overlap partitioning
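As an illustration of how buckets with summary information might be produced, here is a Python sketch of an Equi-Count-style partitioning. The construction is an assumption and not necessarily the exact algorithm of [1]: points are split recursively along the longer side of their MBB, with each part receiving a number of points proportional to its share of the requested buckets. The output, a list of (MBB, count) pairs, is the summary information the bucket-based range estimation method uses. `equi_count_buckets` is a hypothetical name.

```python
from typing import List, Tuple

Point = Tuple[float, float]
MBB = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Bucket = Tuple[MBB, int]

def mbb(points: List[Point]) -> MBB:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def equi_count_buckets(points: List[Point], n_buckets: int) -> List[Bucket]:
    """Recursively split along the longer MBB side so buckets get similar counts."""
    if not points:
        return []
    if n_buckets <= 1 or len(points) <= 1:
        return [(mbb(points), len(points))]
    xmin, ymin, xmax, ymax = mbb(points)
    axis = 0 if (xmax - xmin) >= (ymax - ymin) else 1
    pts = sorted(points, key=lambda p: p[axis])
    half = n_buckets // 2
    mid = len(pts) * half // n_buckets  # points proportional to bucket share
    return (equi_count_buckets(pts[:mid], half)
            + equi_count_buckets(pts[mid:], n_buckets - half))

points = [(i, i % 3) for i in range(8)]
buckets = equi_count_buckets(points, 4)
print([c for _, c in buckets])  # [2, 2, 2, 2]
```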