> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 1
Distributed Geospatial Information Retrieval
Dave Kolas and Ryan Blace
Abstract—We propose a geospatial information retrieval system that combines the techniques and technologies of the Semantic Web with a traditional information retrieval system. Using the Web Ontology Language (OWL) to represent concepts and relationships, the system will combine keyword-based queries with ontologically defined user preferences to generate formal semantic queries. The system will then distribute the semantic queries across an arbitrary number of information sources, rank the elements of the result set against the original query, and present them through the Google Earth interface.

Index Terms— Geospatial, Information Retrieval, Semantic, Ontology, Web Ontology Language, OWL, SWRL

There are a number of motivations for proposing this type of information retrieval project. First, many information retrieval systems suffer from the fact that retrieval algorithms operate primarily on the syntactic, rather than semantic, similarity of documents and terms. Querying semantically has the potential to greatly enhance the precision of the result set by mitigating the effects of term ambiguity. Second, using formats and languages published through the World Wide Web Consortium (W3C) for performing distributed queries (the Web Ontology Language, a W3C Recommendation, and SWRL, a W3C Member Submission) significantly simplifies the task of integrating new information sources. Finally, Google Earth (earth.google.com) is a seminal geospatial visualization tool. The development of a method to interpret RDF/OWL in Google Earth would be immensely useful.

II. INTRODUCTION
Semantic Web technologies have been shown to aid in the discovery and utilization of distributed geospatial data services. However, much of the structured geospatial data available is not useful to average users because they do not know that the service exists or how to use it. Moreover, not all types of data with a geospatial nature are available in structured sources at all. We therefore propose to combine the power of distributed geospatial semantic data sources with information retrieval techniques to provide the user with a seamless integration of structured and unstructured geospatial information behind a familiar search interface. Our approach stands apart from typical information integration solutions in that it integrates the various data sources in a generic manner that will allow new sources of information to be integrated or swapped with existing sources with minimal effort. By utilizing RDF and OWL and mapping between the various data source ontologies and the application ontology, our approach will increase the flexibility of this integration. In addition, our approach experiments with various hybrid distributed methods for integrating structured and unstructured information from various sources.

III. RELATED WORK
Significant work has been done on retrieving geospatial concepts in the realms of both geospatial information retrieval and geospatial data retrieval. The proposed system draws upon work in both areas to create a hybrid system, ideally more useful than either independently.

A. Geospatial Information Retrieval

B. Geospatial Data Retrieval
The OGC Interoperability Experiment for the Geospatial Semantic Web explored the possibilities of presenting geospatial services with semantic definitions to ease discovery. Semantics were found to be extremely useful in abstracting the details of geospatial data source manipulation from the user, allowing a system such as the one created in the experiment to automatically discover and query sources.

<expand>

The GeoSWRL project aims to add geospatial processing capabilities to SWRL, allowing semantics-based systems to leverage the JTS Topology Suite for geospatial calculations. This is an essential building block for semantically combining multiple data sources in a geospatial manner, allowing rules to “deconflict” items from multiple data sources based on

<expand>

The semantic web search engine Swoogle (swoogle.umbc.edu) allows users to search for semantic content on the web using information retrieval methods. While on the surface this is similar to what we are proposing, our approach differs in that our semantic sources are known in advance, and we will actually leverage the data returned from them.

D2RQ is a piece of software designed to expose relational database data sources in a manner compatible with Semantic Web technologies. Though not directly utilized, this concept is a building block for the query decomposition component of the proposal.
IV. APPROACHES

A. Overall Architecture
[Architecture diagram. Recoverable labels: IR System (query parameters, geo-parsing, SPARQL generation); XML Geo-Location (Yahoo Map API); SPARQL/OWL; Semantic Query Decomposition; KML Results]

When a user queries the system, the system will first take into account their user preferences. Relevant parts of the query will then be geocoded, and an abstract geospatial semantic query will be built. This query is passed on to the semantic query decomposition engine. This component will break the query down into the requisite subqueries, pose the subqueries to the underlying data sources, and combine the answers. These answers will then be converted to KML for display in Google Earth.

B. User Preferences and Querying
The first time the user interacts with the system, he/she will fill out a short form that captures the user's general preferences with respect to the system. This will include any of the typical personal preferences one might encounter when populating a profile at a web site like MySpace or Facebook. This information will be stored in the system for future use. After that point, users will interact with the system by issuing keyword searches through a query interface in Google Earth.

<expand how>

C. Query Analysis and Semantic Extraction
The IR system will receive the user query and submit it for processing to a set of semantic information extraction components. For this project, there will be a single geospatial information extraction component that will be capable of taking a search string and extracting geospatial information from it.

<expand how>

D. Semantic Query Generation
The results of the extraction will be used in conjunction with the terms of the original query and the user preferences to generate a SPARQL query in terms of the IR system ontology. This query will be distributed to a query decomposition and

E. Query Issue and Retrieval
The query decomposition component will be responsible for decomposing the query into a set of sub-queries that can be distributed among the various information sources of which it is aware. After decomposition, the queries are issued to the information sources, which then respond with all relevant semantic information they contain. Finally, the system will process the result set to rank each element (analogous to a document) contained in the set. This component will build heavily on work done for the OGC Interoperability Experiment on the Geospatial Semantic Web.

F. Information Visualization
Figure 1: Google Earth Interface

The information will then be returned to the Google Earth interface, where it will be converted and rendered onto the Google Earth map. The user can interact with Google Earth, selecting the various features and viewing the data associated with each geospatial feature.

<expand how>

G. Data Sets
The Yahoo API will provide the system access to Flickr and Upcoming.org data feeds. The Google API will provide access to Google Maps geospatial information.

For demonstration purposes, the system will be integrated with Flickr, Upcoming.org, and the Google API. The system will query Flickr and Upcoming.org to retrieve information that is relevant to the user preferences, the query, and the geospatial information contained in the query. Structured sources will include geospatial service(s) presented as an OGC Web Feature Service (WFS). Minimally these will include services used in the OGC Interoperability Experiment. Events will be presented on the Google Earth map, along with images that are geospatially
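
The query generation, ranking, and visualization steps described in Sections D through F can be illustrated with a minimal sketch. It is not the system's implementation: the `ir:` namespace and the `ir:Feature` class are hypothetical stand-ins for the IR system ontology, which this proposal does not define; the `BoundingBox` and `Result` types and all function names are illustrative; coordinates are assumed to be expressed with the W3C Basic Geo vocabulary; the keyword-overlap score is a placeholder for whatever ranking function is eventually chosen; and the KML 2.2 namespace is assumed for the output.

```python
from dataclasses import dataclass
from xml.etree import ElementTree as ET

# Hypothetical namespace for the IR system ontology (not defined in the paper).
IR_NS = "http://example.org/ir#"
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"  # W3C Basic Geo vocabulary
KML_NS = "http://www.opengis.net/kml/2.2"            # assumed KML version


@dataclass
class BoundingBox:
    """Geocoded extent of the place names found in the query (degrees)."""
    south: float
    west: float
    north: float
    east: float


@dataclass
class Result:
    """One element of the result set, analogous to a document."""
    label: str
    lat: float
    lon: float


def _literal(text):
    """Quote a string as a SPARQL literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_sparql(keywords, bbox, interests):
    """Combine keywords, a geocoded extent, and user interests into one query."""
    filters = [
        f"?lat >= {bbox.south} && ?lat <= {bbox.north}",
        f"?lon >= {bbox.west} && ?lon <= {bbox.east}",
    ]
    terms = list(keywords) + list(interests)
    if terms:
        # A feature matches if its label contains any keyword or interest.
        filters.append(" || ".join(
            f"CONTAINS(LCASE(STR(?label)), {_literal(t.lower())})" for t in terms
        ))
    filter_lines = "\n".join(f"  FILTER ({f})" for f in filters)
    return (
        f"PREFIX ir: <{IR_NS}>\n"
        f"PREFIX geo: <{GEO_NS}>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?item ?label ?lat ?lon WHERE {\n"
        "  ?item a ir:Feature ;\n"
        "        rdfs:label ?label ;\n"
        "        geo:lat ?lat ;\n"
        "        geo:long ?lon .\n"
        f"{filter_lines}\n"
        "}"
    )


def rank(results, keywords):
    """Order results by how many query keywords appear in each label."""
    wanted = [k.lower() for k in keywords]
    return sorted(
        results,
        key=lambda r: sum(k in r.label.lower() for k in wanted),
        reverse=True,
    )


def to_kml(results):
    """Serialize ranked results as KML placemarks for Google Earth."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for r in results:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = r.label
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{r.lon},{r.lat}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    extent = BoundingBox(south=38.8, west=-77.1, north=39.0, east=-76.9)
    print(build_sparql(["jazz"], extent, ["music"]))
    hits = [Result("Art fair", 38.9, -77.0), Result("Jazz night", 38.89, -77.03)]
    print(to_kml(rank(hits, ["jazz"])))
```

In this sketch the user preferences enter the query only as additional label terms. A fuller treatment would express them as ontology-level constraints, and the decomposition step of Section E would sit between `build_sparql` and `rank`.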
A. Methods of Evaluation
B. Test Methodology
C. Test Results
VII. FUTURE WORK
REFERENCES
[1] H. Fang and C. Zhai, "Semantic term matching in axiomatic approaches to information retrieval," in Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, USA: ACM Press, 2006.
[2] S. Bechhofer, F. van Harmelen, J. Hendler, I. Horrocks, D. L. McGuinness, P. F. Patel-Schneider, and L. A. Stein, "OWL Web Ontology Language Reference," 2004.
[3] I. Horrocks, P. F. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, and M. Dean, "SWRL: A Semantic Web Rule Language Combining OWL and RuleML," 2004.
[4] M. Dean, T. Pehle, and J. Lieberman, in W3C Workshop on Frameworks for Semantics in Web Services, Innsbruck, Austria:
[5] D. Kolas, J. Hebeler, and M. Dean, "Geospatial Semantic Web: Architecture of Ontologies," Lecture Notes in Computer Science - Geospatial Semantics, vol. 3799/2005, p. 11, 2005.
[6] S. Bacharach, "OGC to Begin Geospatial Semantic Web Interoperability Experiment," opengeospatial.org, 2005.
[7] D. Kolas and W. Kammersell, "geoSWRL," 2005.
[8] "D2RQ - Treating Non-RDF Databases as Virtual RDF Graphs."