Enabling Semantic Interoperability for Earth Science Data
Final Report to NASA Earth Science Technology Office (ESTO)
Rob Raskin
Jet Propulsion Laboratory

Abstract- Data interoperability across heterogeneous systems can be hampered by differences in terminology, particularly when multiple scientific communities are involved. To reconcile differences in semantics, a common semantic framework was created through the development of Earth science ontologies. Such a shared understanding of concepts enables ontology-aware software tools to understand the meaning of terms in documents and web pages. This report updates last year's Semantic Web for Earth and Environmental Terminology (SWEET) prototype. For the recent work, we incorporated concepts of other funded initiatives such as ESML, ESMF, grid computing, and OGC. We also created a system to update its knowledge base as needed, from gazetteers and other on-line Web sources. An accompanying search tool supports system-wide search and, ultimately, a wide range of semantically-based web services.

This report repeats some background material from last year’s report so that it conveys a self-contained understanding of the subject. It concludes with road maps for various technology initiatives.

1. INTRODUCTION

Earth system science data originate from many disciplines, spanning several community standards, terminologies, and data formats. Several initiatives are underway to develop a common infrastructure to improve data interoperability across the disciplines. Examples include the Earth Science Markup Language (ESML), the Earth Science Modeling Framework (ESMF), and the Open GIS Consortium (OGC). Key to the success of these initiatives is the development of a common semantic framework. Such a framework enables dataset and science concepts to be understood by software tools. The framework goes beyond data interoperability by supporting knowledge reuse, or the exchange of conceptual knowledge within and across these disciplines.

This framework can be achieved through the "Semantic Web" (Fensel, et al., 2003), an ambitious extension to the existing WWW environment, coordinated by the World Wide Web Consortium (W3C). The Semantic Web encodes common sense knowledge directly into web pages themselves, using broadly agreed upon namespaces and ontologies to define terms and their mutual relationships.

The motivation of our task is to improve semantic understanding of web resources by software tools, with specific application to discovery and use of Earth science data. Semantic understanding of text by automated tools is enabled through the combined use of i) ontologies and ii) software tools that can interpret the ontologies. An ontology is a formal representation of technical concepts and their interrelations in a form that supports domain knowledge. Generally, an ontology is hierarchical, with child concepts having explicit properties to specialize their parent concept(s).

A Semantic Web emerges if terms on web pages are associated with corresponding elements in ontologies. This is accomplished by placing an XML tag around a term to identify its associated ontology namespace. A search tool potentially can use these metadata tags to distinguish different uses of the same term (e.g. “fall” as a season vs. “fall” as a downward motion) to eliminate false hits. It also can locate resources without having an exact keyword match, because a term such as “El Nino” has an equivalent definition in terms of its defining scientific components.

To support potential Semantic Web activities, we developed a collection of ontologies for the Earth and environmental sciences and supporting areas. We created a common sense knowledge base of the Earth sciences using the Web Ontology Language (OWL), a standard adopted by the W3C. We use these ontologies in a prototype search tool that improves performance by creating additional relevant search terms based on the underlying semantics. We demonstrate how such a knowledge base can be “virtual” by adding a wrapper around remote, dynamic data repositories.

1.1 SEMANTIC INTEROPERABILITY

In the early days of computing, an initial level of data interoperability resulted when data structures (arrays) created on one computer system were readable by another computer. Data formats such as HDF emerged to extend this level of interoperability to more complex data structures and across vendor platforms, and enabled the preservation of variable names. The Internet later brought on protocols such as DODS, which supported modification of the data structure (subset extraction) during the transfer. Exchanges of this type say nothing about the scientific interpretation of the data on the receiving end. A variable name is assigned to a data structure, but human intervention is required to make sense of it.

The HDF-EOS format remedied the semantic interoperability problem for independent variables by standardizing the naming convention of spatial and temporal parameters. The Open GIS Consortium (OGC) provides a similar level of spatial/temporal interoperability in its Web Map Service (WMS) and Web Coverage Service (WCS) protocols. The HDF-EOS and OGC solutions enable a data seeker to query and access data by spatial/temporal parameters rather than by array rows and columns (which would require human intervention). Thus a software tool that understands these conventions can access any HDF-EOS or OGC-compliant dataset and be guaranteed that the spatial-temporal interpretation is known.

Semantic interoperability for dependent variables has generally meant the use of controlled keywords. For instance, the NASA GCMD defines approximately 1000 controlled keywords, each with a dictionary definition. Such a representation does not support the computer reasoning that would be required to respond to general queries or chain services together. It does not support inheritance of concepts for knowledge reuse, does not provide a rich expression of the relationships between the keywords, and is not directly extendable by the user. This project addresses a more scalable solution to semantic interoperability in the context of the Earth sciences.

2. ONTOLOGY DEVELOPMENT

An ontology is a formal representation of technical concepts and their interrelations in a form that captures domain knowledge. Generally, an ontology is hierarchical, with child concepts having explicit properties to specialize their parent concept(s). Thus, “hydrosphere” is the parent concept of “surface water”, which is a parent of “river”, which is a parent of “Mississippi River”, etc. In this paper, we describe our experiences with the development of Earth and environmental science ontologies.

In the initial year of ESTO funding, we created the Semantic Web for Earth and Environmental Terminology (SWEET) to prototype how a Semantic Web can be implemented in the Earth sciences. We used the terms in the Global Change Master Directory (GCMD) as a starting point in manually populating the ontologies, but reorganized and expanded the concepts to form a scalable framework. Later, we incorporated an analogous keyword list used in the Earth Science Modeling Framework (ESMF).

Earth Realm
The “spheres” of the Earth constitute an EarthRealm ontology, based upon the physical properties of the planet. Elements of this ontology include “atmosphere”, “ocean”, and “solid earth”, and associated subrealms (such as “ocean floor” and “atmospheric boundary layer”). The subrealms generally are distinguished from their parent classes based on the property of altitude, e.g., “troposphere” is the subclass of “atmosphere” where elevation is between 0 and 15 km.

Non-Living Element (Substance)
This ontology includes the non-living building blocks of nature, such as: particles, electromagnetic radiation, and chemical compounds.

Living Element
This ontology includes plant and animal species, imported from the GCMD “biosphere” taxonomy.

Physical Property
A separate ontology was developed for physical properties that might be associated with any component of EarthRealm, NonLivingElements, or LivingElements. PhysicalProperties include “temperature”, “pressure”, “height”, “albedo”, etc.

Units
Units are defined using Unidata’s UDUnits. The resulting ontology includes conversion factors between various units. Prefixed units such as km are defined as a special case of m with the appropriate conversion factor.

Numerical Entity
Numerical extents include: interval, point, 0, R^2, … Numerical relations include: greaterThan, max, … We defined multidimensional concepts such as coordinate systems, mathematical operators and functions.

Temporal Entity
Time is essentially a numerical scale with terminology specific to the temporal domain. We developed a time ontology in which the temporal extents and relations are special cases of numeric extents and relations, respectively. Temporal extents include: duration, season, century, 1996, … Temporal relations include: after, before, …

Spatial Entity
Space is essentially a 3-D numerical scale with terminology specific to the spatial domain. We developed a space ontology in which the spatial extents and relations are special cases of numeric extents and relations, respectively. Spatial extents include: country, Antarctica, equator, … Spatial relations include: above, northOf, …

Phenomena
A phenomena ontology is used to define transient events. A phenomenon crosses the bounds of other ontology elements. Examples include: hurricane, earthquake, El Nino, volcano, and terrorist event. Each has associated Time, Space, EarthRealms, NonLivingElements, LivingElements, etc. We also include specific instances of recent phenomena.

Human Activities
This ontology is included for representing the impacts of environmental phenomena on human activities such as commerce and fisheries.

Data
The data ontology provides support for dataset concepts, including representation, storage, modeling, format, resources, grid computing, and distribution.

2.1 ONTOLOGIES AS A UNIFYING KNOWLEDGE FRAMEWORK

The first several ontologies listed above represent orthogonal concepts (or dimensions), often called facets. Traversing down the tree associated with a facet follows the scientific path of reductionism by adding details to more abstract concepts.

A completely different type of ontology is encountered in “phenomena”, as this category is synergetic rather than orthogonal to the others. The phenomena entries describe synthesizing concepts that utilize elements from the other ontologies (e.g., a hurricane is associated with particular coastal areas, and is characterized by high winds, rainfall, flood impacts, etc.). Thus, phenomena are defined in terms of combinations of elements from the faceted concepts. The “Human activities” ontology also is a unifying, rather than reductionist, collection.

Taken together, these two complementary approaches mirror the scientist’s dual processes of reductionism and synthesis. This structure provides a relatively complete framework for capturing scientific knowledge. Using OWL, we relate concepts in these two approaches. Generally, unifying concepts are built up and defined in terms of individual facets. Alternatively, facets can be defined through projection operations on unifying concepts.

2.2 ONTOLOGY LANGUAGES

An ontology is expressed using a language that is typically a specialization of XML. XML is widely supported by existing software tools and is platform-independent. The World Wide Web Consortium (W3C) has adopted two XML languages as its standard method of representing ontologies: the Resource Description Framework (RDF) and the Web Ontology Language (OWL). Each of these languages is rich enough to express the hierarchical structures inherent in knowledge representation. RDF specializes XML by standardizing meanings for: class, subclass, property, subproperty, domain, range, etc. OWL is a further specialization of RDF; it adds standard meaning for: cardinality, inverse properties, synonyms, and many more concepts in three versions: OWL Lite, OWL DL, and OWL Full. The four languages (RDF, OWL Lite, OWL DL, OWL Full) offer a nested set of language capabilities. We adopted OWL Full due to its anticipated widespread acceptance over the coming years. Our ontologies initially were written in the DARPA Agent Markup Language (DAML), a predecessor to OWL, and we later converted them to OWL Full.

OWL has support for numbers only through a W3C specification. This spec defines number types (e.g., real numbers, unsigned integer) and some abilities to create derivations of these types (e.g. the closed interval between 0 and 1). It contains no operations or relations on these numbers. This is a deficiency, because basic scientific concepts are defined in terms of numeric concepts. For example, “brighter”, “higher”, “later”, and “more northerly” are special cases of the “greater than” relation, when applied in specific domains. In particular, spectral regions are defined in terms of wavelength (e.g. visible light is between approximately 0.4 and 0.7 micrometers), atmospheric layers are defined by altitude (e.g. troposphere is between 0 and 15 km), etc. This specification also has no notion of a multidimensional space R^n.

Repositories of OWL ontologies exist to enable the work of others to be extended. However, at present there are no ontologies supporting numeric operations (e.g. “greater than”, “max”). Several spatial and temporal ontologies exist, but these ontologies do not exploit the fact that space and time are numerical scales. Therefore, the numerical, space, time, and event ontologies that we developed for SWEET will be submitted to a general OWL ontology library.

XML-based languages such as OWL are well suited to data and model exchange, but are less practical for storage and query of large ontologies. Existing database management systems provide the needed functionality in storage and indexing of robust ontologies, including support for data integrity, concurrency control, etc. Consequently, we adopted the PostgreSQL object-relational DBMS to store the names and parent-child relations of our ontology elements. We created two-way translators between the internal DBMS representation and the standard XML representation of OWL properties. Because all term declarations reside in the DBMS, any search for terms is very rapid.

For representation of spatial concepts, we used bounding polygons to describe regions, where possible. Polygons are a native datatype in PostgreSQL.

3. SEMANTIC INTEROPERABILITY WITH OTHER INITIATIVES

The Earth Science Markup Language (ESML) combines an XML-based language for describing datasets with an API read library. Its XML tags are of two types: syntactic (for reading data) and semantic (for interpreting data). ESML no longer maintains semantic tags within its libraries and relies instead on external ontologies to provide that functionality. Thus, SWEET tags may be used to provide the semantic content of any ESML file. Examples include: science subject, geographic coordinate system, scaling factors and offsets, etc.

ESMF is an effort to make large Earth System models interoperable. Model interoperability involves knowing input/output compatibility and parameter tables. We defined within SWEET the model parameters required to ascertain model interoperability. ESMF also uses the list of 350 variable names defined under the CF/Standard name conventions. Most of these terms are concatenations of several terms (e.g. temperature_at_top_of_boundary_layer). We mapped the terms to the SWEET ontology, so that this list of terms could grow more naturally. We are working with Cecelia DeLuca, Project Associate at UCAR, to ensure compatibility between ESMF and SWEET, though there is no formal commitment from the ESMF project to use our ontology at this time.

The Earth System Grid (ESG) is a DOE-funded project to use grid computing in support of Earth system modeling. We included the grid concepts in the SWEET data ontology and are working with Line Pouchard, ESG Project Associate at ORNL, to achieve consistency between ESG and SWEET.

The Open GIS Consortium uses standard representations for coordinate systems and geometric objects. While it was not practical to include all of these entities in the SWEET spatial ontology, we included the more widely used ones.

4. DYNAMIC ACCESS TO ONTOLOGY ELEMENTS

Many Earth science facts reside in large external databases. We created OWL wrappers to enable the contents of several of these databases to be accessible as if they were local ontology elements. The databases include three gazetteers: CIA World Map, Getty Thesaurus, and the Calle Global Gazetteer. Gazetteers translate vernacular names to and from geographic coordinates. We added polygon boundaries to many gazetteer entries that otherwise contained only rectangular bounding boxes. Also included are the USGS real-time list of earthquakes and the Heavens Above real-time list of satellite locations. A Web Map Server (WMS) import capability was added to acquire images and maps accessible through WMS-compliant servers. A map-based interface demonstrates all of these capabilities by querying the external sources in response to user requests.

The gazetteer entries generally include fields for a bounding rectangle but not bounding polygons. The polygon information is available separately from other sources for state and international boundaries. We inserted the bounding polygon data into the internal SWEET database. In many cases, the size of the polygon exceeded what could be stored natively in PostgreSQL, and we reduced the spatial resolution.

5. INTELLIGENT SEARCH ENGINE

A search tool that is aided by an ontology can locate resources without having an exact keyword match. To demonstrate this capability, we created a search tool that consults the SWEET ontology to find related terms. These terms may be synonymous (same as), more specific (child of), or less specific (parent of) than those requested. The tool then submits the union of these terms to the GCMD search tool and presents the results. The results verified that the search found additional relevant terms, relative to the exact keyword search. The search tool is implemented as a web service using RDQL (RDF Data Query Language). Once the synonyms and parent-child relationships have been discovered, the augmented query returns the resulting GCMD DIF summaries. An extension of this search tool will be incorporated into the Earth Science Information Partner (ESIP) Federation Interactive Network for Discovery (FIND).

6. ROADMAPS

The following mini-strategies describe opportunities for exploiting semantic interoperability in future NASA and ESTO work. These recommendations may be difficult to implement immediately, due to the tendency of scientists to retain their narrow disciplinary perspectives. But emerging demands for cross-disciplinary science and automated data services will rely heavily on semantic interoperability.

Data Mining
The target of a data mining algorithm generally is a phenomenon of interest, as defined within the mining algorithm. The definition of the phenomenon often is hidden or incomplete, as there is no standard language for its expression. A complete description (including spatial/temporal resolution, model assumptions, etc.) is required to enable community comparisons and review. By defining a target concept in terms of ontology concepts, such a representation can be articulated. This functionality is particularly important in on-board processing systems, where knowledge must be reused to identify targets of interest for enhanced data collection. It is recommended that data mining activities be required to use the full expressive capabilities of an ontology. Without a formal requirement, it is unlikely that algorithm developers will voluntarily contribute this information.

Web Services
Based on current business application trends, it is likely that a wide range of Web services will be established in the Earth sciences to locate, acquire and use data. WSDL and UDDI are currently used to describe and advertise services, respectively. WSDL and UDDI address the semantics of requests only to the extent that ontologies are referenced in the service descriptions. It is recommended that future data-oriented Web service descriptions be required to use the full expressive capabilities of an ontology. This suggestion is especially pertinent in a grid computing environment, where ontologies can describe what services may be chained together and how this choreography is implemented.

Science Domain Specialist Involvement
Obtaining review of existing SWEET ontologies has been very difficult. This situation is due in part to the limited tools available for ontology visualization, as the dimensionality of the semantic space is very large. Dedicated workshops focusing on 3-D walkthroughs of the semantic space might inspire much greater community involvement in the review process. Making this happen will require investments in the relevant 3-D visualization technologies and support of workshops for domain specialists.
A complete description (including spatial/temporal Standards http://earthquake.usgs.gov/recenteqsww/Quakes/qu Currently, NASA- funded Earth science data akes_all.html products must be classified using the GCMD Science Keywords. It is recommended that this  Heavens Above. http://www.heavens- requirement be relaxed to allow an alternative above.com classification, such as representation in a SWEET ontology. This requirement is of secondary  Web Map Server. http://opengis.org importance because we provide a transformation table on our Web site between GCMD and SWEET representations. REFERENCES Fensel, D., J. Hendler, H. Lieberman, W. Wahlster (Eds.), 2003, Spinning the Semantic Web, MIT Press, Cambridge, 479 pp. INTERNET REFERENCES  OWL. http://www.w3.org/TR/owl-ref  SWEET. http://sweet.jpl.nasa.gov  GCMD Science Keywords and Directory Keywords. http://gcmd.nasa.gov/Resources/valids  CF Standard name table. http://www.cgd.ucar.edu/cms/eaton/cf- metadata/standard_name.html  XML Schema Part 2: Datatypes. http://www.w3.org/TR/xmlschema-2.  Earth System Grid. http://earthsystemgrid.org  CIA World Factbook. http://www.cia.gov/cia/publications/factbook/  Getty Thesaurus of Place Names. http://www.getty.edu/research/conducting_research /vocabularies/tgn/  Calle Global Gazetteer. http://www.calle.com/world  Earthquake List for World.