
           Distributed Geospatial Information Retrieval
                         and Integration
                                                  Dave Kolas and Ryan Blace
Abstract—We propose a geospatial information retrieval system that
incorporates the techniques and technologies of the Semantic Web with a
traditional Information Retrieval system. Using the Web Ontology Language
(OWL) to represent concepts and relationships, the system will combine
keyword-based queries with ontologically defined user preferences to
generate formal semantic queries. The system will then distribute the
semantic queries across an arbitrary number of information sources. The
system will rank elements of the result set according to the original query
and present them through the Google Earth interface.

  Index Terms— Geospatial, Information Retrieval, Semantic, Ontology, Web
Ontology Language, OWL, SWRL

                        I. MOTIVATION
There are a number of motivations for proposing this type of information
retrieval project. First, many information retrieval systems suffer from
the fact that retrieval algorithms operate primarily on the syntactic,
rather than semantic, similarity of documents and terms [1]. Querying
semantically has the potential to greatly enhance the precision of the
result set by mitigating the effects of term ambiguity. Second, using a
World Wide Web Consortium (W3C) recommended format and language for
performing distributed queries (Web Ontology Language [2], SWRL [3])
significantly simplifies the task of integrating new information sources.
Finally, Google Earth is a seminal geospatial visualization tool. The
development of a method to interpret RDF/OWL in Google Earth would be
immensely useful.

                       II. INTRODUCTION
Semantic Web technologies have been shown to aid in the discovery and
utilization of distributed geospatial data services [4], [5]. However, much
of the structured geospatial data available is not useful to the average
user because they do not know about the existence of the service or how to
use it. Moreover, not all types of data with a geospatial nature are
available in structured sources at all. Thus we propose to combine the
power of distributed geospatial semantic data sources with information
retrieval techniques to provide a user with a seamless integration of
structured and unstructured geospatial information behind a familiar search
interface. Our approach stands apart from typical information integration
solutions in that it integrates the various data sources in a generic
manner that will allow new sources of information to be integrated or
swapped with existing sources with minimal effort. By using RDF and OWL and
mapping between the various data source ontologies and the application
ontology, our approach will increase the flexibility of this integration.
In addition, our approach experiments with various hybrid distributed
methods for integrating structured and unstructured information from
various sources.

                       III. RELATED WORK
Significant work has been done on retrieving geospatial concepts both in
the realms of geospatial information retrieval and geospatial data
retrieval. The proposed system draws upon work in both areas to create a
hybrid system, ideally more useful than either approach alone.

  A. Geospatial Information Retrieval

  B. Geospatial Data Retrieval
The OGC Interoperability Experiment for the Geospatial Semantic Web
explored the possibilities of presenting geospatial services with semantic
definitions for easing discovery. Semantics were found to be extremely
useful in abstracting the details of geospatial data source manipulation
from the user, allowing a system such as the one created in the experiment
to automatically discover and query sources.
<expand>

The GeoSWRL project [7] aims to add geospatial processing capabilities to
SWRL, allowing semantics-based systems to leverage the JTS Topology Suite
for geospatial calculations. This is an essential building block for
semantically combining multiple data sources in a geospatial manner,
allowing rules to "deconflict" items from multiple data sources based on
<expand>

The semantic web search engine Swoogle allows users to search for semantic
content on the web via information retrieval methods. While on the surface
this is similar to what we are proposing, our approach differs in that our
semantic sources are known, and we will actually be leveraging the data
returned from them.

D2RQ [8] is a piece of software designed to expose relational database data
sources in a manner compatible with Semantic Web technologies. Though not
directly utilized, this concept is a building block for the query
decomposition component of the proposal.


                        IV. APPROACHES

  A. Overall Architecture

[Figure: overall architecture. Components: Google Earth; IR System
(geo-parsing, query expansion, SPARQL generation); Yahoo Map API; Semantic
Query Decomposition; Data Sources. Flow labels: User Profile, Query
Parameters; String Tokens; XML Geo-Location; SPARQL/OWL; KML Results.]

When a user queries the system, the system will first take into account
their user preferences. Relevant parts of the query will then be geocoded,
and an abstract geospatial semantic query will be built. This query is
passed on to the semantic query decomposition engine. This component will
break the query down into requisite subqueries, pose the subqueries to the
underlying data sources, and combine the answers. These answers will then
be converted to KML for display in Google Earth.

  B. User Preferences and Querying
The first time the user interacts with the system, he/she will fill out a
short form that will capture general preferences of the user with respect
to the system. This will include any of the typical personal preferences
one might encounter when populating a profile at a web site like MySpace or
Facebook. This information will be stored in the system for future use.
After that point, users will interact with the system by issuing keyword
searches through a query interface in Google Earth.

<expand how>

  C. Query Analysis and Semantic Extraction
The IR system will receive the user query and submit it for processing to a
set of semantic information extraction components. For this project, there
will be a single geospatial information extraction component that will be
capable of taking a search string and extracting geospatial information
from it.

<expand how>

  D. Semantic Query Generation
The results of the extraction will be used in conjunction with the terms of
the original query and the user preferences to generate a SPARQL query in
terms of the IR system ontology. This query will be distributed to a query
decomposition and distribution component.

<expand how>

  E. Query Issue and Retrieval
The query decomposition component will be responsible for decomposing the
query into a set of sub-queries that can be distributed between the various
information sources of which it is aware. After decomposition, the queries
are issued to the information sources, which then respond with all relevant
semantic information they contain. Finally, the system will process the
result set to rank each element (analogous to a document) contained in the
set. This component will build heavily on work done for the OGC
Interoperability Experiment on the Geospatial Semantic Web [6].

<expand how>

  F. Information Visualization

                  Figure 1: Google Earth Interface

The information will then be returned to the Google Earth interface where
it will be converted and rendered onto the Google Earth map. The user can
interact with Google Earth, selecting the various features and viewing data
associated with each geospatial feature.

<expand how>

  G. Data Sets
The Yahoo API will provide the system access to Flickr and [...] data
feeds. The Google API will provide access to Google Maps geospatial
information.

For demonstration purposes, the system will be integrated with Flickr,
[...], and the Google API. The system will query Flickr and [...] to
retrieve information that is relevant to the user preferences, the query,
and the geospatial information contained in the query. Structured sources
will include geospatial service(s) presented as an OGC WFS. Minimally these
will include services used in the OGC Interoperability Experiment. Events
will be presented on the Google Earth map, along with images that are
geospatially
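
As an illustration of the semantic query generation and visualization steps
described in Sections IV-D and IV-F, the following is a minimal sketch and
not the proposed implementation. It assumes that items carry an rdfs:label
and W3C Basic Geo (geo:lat, geo:long) coordinates, whereas the proposal
would phrase queries in terms of the IR system ontology. The function
names, the bounding-box input, the row format, and the use of the SPARQL
CONTAINS function for keyword matching are all assumptions made for the
example.

```python
from xml.sax.saxutils import escape

# Vocabulary choices are assumptions for this sketch, not the IR system ontology.
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def build_sparql_query(keywords, bbox):
    """Build a SPARQL SELECT string for labelled points inside a bounding box.

    keywords: list of plain search terms (hypothetical input format).
    bbox: (south, west, north, east) in decimal degrees, e.g. from geocoding.
    Boxes that cross the antimeridian are not handled.
    """
    south, west, north, east = bbox
    lines = [
        f"PREFIX geo: <{GEO}>",
        f"PREFIX rdfs: <{RDFS}>",
        f"PREFIX xsd: <{XSD}>",
        "SELECT ?item ?label ?lat ?long WHERE {",
        "  ?item rdfs:label ?label ; geo:lat ?lat ; geo:long ?long .",
        f"  FILTER (xsd:decimal(?lat) >= {south} && xsd:decimal(?lat) <= {north})",
        f"  FILTER (xsd:decimal(?long) >= {west} && xsd:decimal(?long) <= {east})",
    ]
    # Lowercase each term and escape backslashes and quotes so that it
    # cannot terminate the SPARQL string literal early.
    terms = [
        k.lower().replace("\\", "\\\\").replace('"', '\\"') for k in keywords
    ]
    if terms:
        tests = " || ".join(
            f'CONTAINS(LCASE(STR(?label)), "{t}")' for t in terms
        )
        lines.append(f"  FILTER ({tests})")
    lines.append("}")
    return "\n".join(lines)


def results_to_kml(rows):
    """Convert result rows (dicts with label, lat, long) to a KML document."""
    placemarks = []
    for row in rows:
        placemarks.append(
            "  <Placemark>\n"
            f"    <name>{escape(str(row['label']))}</name>\n"
            # KML lists coordinates as longitude,latitude.
            f"    <Point><coordinates>{row['long']},{row['lat']}"
            "</coordinates></Point>\n"
            "  </Placemark>"
        )
    body = "\n".join(placemarks)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "<Document>\n"
        f"{body}\n"
        "</Document>\n"
        "</kml>\n"
    )


if __name__ == "__main__":
    print(build_sparql_query(["museum"], (38.8, -77.1, 39.0, -76.9)))
    print(results_to_kml([{"label": "Example museum", "lat": 38.89, "long": -77.03}]))
```

In this sketch the bounding box stands in for the output of the geo-parsing
step, and the rows passed to results_to_kml stand in for the combined, ranked
answers returned by the query decomposition component.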

                          V. EVALUATION

  A. Methods of Evaluation

  B. Test Methodology

  C. Test Results

                        VI. CONCLUSIONS

                        VII. FUTURE WORK


[1]      H. Fang and C. Zhai, "Semantic term matching in axiomatic
         approaches to information retrieval," in Proceedings of the 29th
         Annual International ACM SIGIR Conference on Research and
         Development in Information Retrieval. Seattle, Washington, USA:
         ACM Press, 2006.
[2]      S. Bechhofer, F. van Harmelen, J. Hendler, I. Horrocks, D. L.
         McGuinness, P. F. Patel-Schneider, and L. A. Stein, "OWL Web
         Ontology Language Reference," 2004.
[3]      I. Horrocks, P. F. Patel-Schneider, H. Boley, S. Tabet, B. Grosof,
         and M. Dean, "SWRL: A Semantic Web Rule Language
         Combining OWL and RuleML," 2004.
[4]      M. Dean, T. Pehle, and J. Lieberman, in W3C Workshop on
         Frameworks for Semantics in Web Services. Innsbruck, Austria:
         W3C, 2005.
[5]      D. Kolas, J. Hebeler, and M. Dean, "Geospatial Semantic Web:
         Architecture of Ontologies," Lecture Notes in Computer Science -
         Geospatial Semantics, vol. 3799/2005, p. 11, 2005.
[6]      S. Bacharach, "OGC to Begin Geospatial Semantic Web
         Interoperability Experiment," 2005.
[7]      D. Kolas and W. Kammersell, "geoSWRL," 2005.
[8]      "D2RQ - Treating Non-RDF Databases as Virtual RDF Graphs."