          Interoperable Web Services for Distributed Data Access
               and Analysis of Emissions Inventories
                              Stefan Falke
                                 stefan@wustl.edu
          Center for Air Pollution Impact and Trend Analysis (CAPITA)
        Department of Energy, Environmental and Chemical Engineering
                              Washington University
                                St. Louis, Missouri
                                       USA

                             Terry Keating
                          Keating.Terry@epa.gov
                     US Environmental Protection Agency
                          Office of Air & Radiation
                              Washington, DC,
                                     USA


                      GEIA 2006 Open Conference
                             Paris, France
                         November 30, 2006
Objectives: advance the implementation of the Networked Environmental
Information Systems for Global Emissions Inventories (NEISGEI), a US
EPA initiative to develop a web-based global air emissions inventory network to
provide
    • access to distributed emission inventory data at multiple spatial and temporal
    scales
    • tools for data processing and analysis
    • means for sharing data & tools
    • an environment for collaboration among researchers, regulators, policy
    analysts and interested public


Approach: Develop, test, and implement components of an air quality
cyberinfrastructure using the latest advances in information technology to make
multi-scale air emissions data and tools easier to find, use and integrate.


           An air emissions “cyberinfrastructure”
                             Cyberinfrastructure
  Cyberinfrastructure - information sciences and technologies used to build new
  types of scientific and engineering knowledge environments with the goal of pursuing
  research and management more effectively and efficiently.

“Contemporary projects require effective federation of both distributed resources (data and
facilities) and distributed, multidisciplinary expertise and cyberinfrastructure is a key to
making this possible.” - NSF Blue Ribbon Report on Cyberinfrastructure, 2003




                                                                     (Atkins, 2004)
                 Conceptual Diagram of an Emissions Cyberinfrastructure

[Diagram. Labels, grouped as they appear:
   Data: Emissions Inventories, Portals, Activity Data, Emissions Factors, Surrogates
   Wrappers/Adapters, XML Standards
   Data Catalogs: GEIA/ACCENT Data Portal, Geospatial One-Stop
   Web Tools/Services: GIS, Estimation Methods, Spatial Allocation, Transport Models
   Mediators/Portals
   Users & Projects: Report Generation, Data Analysis, Comparison of Emissions
   Methods, Model Development]
                  Networked Inventories Principles

Distributed/Federated. Data are shared but remain distributed and maintained by their
original inventory organizations. The data are dynamically accessed from multiple sources
through the Internet rather than collecting all emission data in a single repository.

Non-intrusive. The technologies needed to bring inventory nodes together in a
distributed network should not require substantial modifications by the emission inventory
organizations in order to participate. However, there will need to be some harmonization of
existing inventory data.

Transparent. From the emission inventory user’s perspective, the distributed data
should appear to originate from a single database. One interface to multiple data sets
should be possible without requiring special software or downloads onto the user’s
computer.

Flexible/Extendable (Interoperable). An emission data network should be designed
with the ability to easily incorporate new data and tools from new providers joining the
network so that they can be integrated with existing data and tools.
                          NEISGEI Web Portal


A community resource providing access to, descriptions of, and dialogues about
an array of content and services for exploring and sharing emissions data, tools
and ideas.

Built using Liferay, an open-source portal package.

Accessible through http://www.neisgei.org
               Federated data system - DataFed
   The Data Federation is a web-based infrastructure for distributed data access
   and collaborative processing/analysis of air quality data. (Husar et al., 2004)

   http://datafed.net

   NEISGEI is built on DataFed infrastructure and services.

   [Figure labels: 50+ Datasets; Export or connect to other web services]
                           Geospatial Web Standards
Standards for finding, accessing, portraying, and processing geospatial data are defined
by the Open Geospatial Consortium (OGC).
    • Web Map Service (WMS) exchanges map images
    • Web Feature Service (WFS) retrieves discrete feature data (roads, political
    boundaries)
    • Web Coverage Service (WCS) allows access to multidimensional data that
    represent coverages, such as grids or point monitoring data
    • Sensor Observation Service (SOS) provides multidimensional access to
    measurement data
While these standards are based on the geospatial domain, many are designed to be extended to
support non-geographic data “dimensions,” such as time and the many other dimension tables
found in emissions inventories.
                                                                   Geospatial One-Stop
  Web Coverage Service (WCS)
[Diagram: a WCS Client sends a GetCoverage request to a WCS Server, which returns
the coverage in a format such as GeoTiff, HDF, netCDF, CSV, ASCII…]


         GetCoverage Request
http://webapps.datafed.net/ogc_EPA.wsfl
?SERVICE=wcs
&REQUEST=GetCoverage
&VERSION=1.0.0
&CRS=EPSG:4326
&COVERAGE=EPA_CAMD_HOUR.SO2_MASS
&FORMAT=NetCDF-table
&BBOX=-82.4606,42.9258,-82.4606,42.9258,0,0
&TIME=2002-04-01T15:00:00Z/2002-04-30T15:00:00Z
&WIDTH=700
&HEIGHT=350
&DEPTH=99
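
The request above is a list of key-value pairs appended to the service endpoint, so it can be assembled programmatically. The following is a minimal Python sketch using only the standard library. The helper name `build_getcoverage_url` is illustrative and is not part of DataFed or any OGC toolkit; the parameter values are copied from the request above, and the sketch only builds the URL without contacting the server.

```python
from urllib.parse import urlencode

# Endpoint copied from the example request above.
ENDPOINT = "http://webapps.datafed.net/ogc_EPA.wsfl"


def build_getcoverage_url(endpoint, coverage, bbox, time_range, fmt,
                          width, height, depth, crs="EPSG:4326"):
    """Assemble a WCS 1.0.0 GetCoverage URL (illustrative helper)."""
    params = [
        ("SERVICE", "wcs"),
        ("REQUEST", "GetCoverage"),
        ("VERSION", "1.0.0"),
        ("CRS", crs),
        ("COVERAGE", coverage),
        ("FORMAT", fmt),
        # BBOX is a comma-separated list of corner coordinates.
        ("BBOX", ",".join(str(v) for v in bbox)),
        # TIME is a start/end interval separated by a slash.
        ("TIME", "/".join(time_range)),
        ("WIDTH", str(width)),
        ("HEIGHT", str(height)),
        ("DEPTH", str(depth)),
    ]
    # Leave ':', ',' and '/' unescaped so the URL reads like the one above.
    return endpoint + "?" + urlencode(params, safe=":,/")


url = build_getcoverage_url(
    ENDPOINT,
    coverage="EPA_CAMD_HOUR.SO2_MASS",
    bbox=(-82.4606, 42.9258, -82.4606, 42.9258, 0, 0),
    time_range=("2002-04-01T15:00:00Z", "2002-04-30T15:00:00Z"),
    fmt="NetCDF-table",
    width=700,
    height=350,
    depth=99,
)
print(url)
```

Note that the BBOX in the example has identical minimum and maximum corners, which appears to select a single location, while TIME spans a one-month interval.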
                Using Standard Interfaces for Web Access
Current Process:
Data access without standard interfaces:
1)   Find data in Portal
2)   Download data
3)   Reformat / “Wrapping”
4)   Repeat 1-3 for other datasets
5)   Browse, visualize, analyze

[Diagram labels: User; Emissions Portal; RETRO; Multi-dim cube; WCS; DataFed,
annotated with steps 1, 2, 3 and 5]

Possible Future Process:
Data access with standard interfaces:
1) Find data in Portal
2) Access through standard interfaces
3) Browse, visualize, analyze

[Diagram labels: User; Emissions Portal(s); RETRO, EDGAR and GEIA, with WCS
interfaces; annotated with steps 1, 2 and 3]
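
The future process rests on every inventory answering the same kind of request. The following Python sketch illustrates that idea under stated assumptions: the endpoint URLs, coverage names, and output format are hypothetical placeholders (the source gives no WCS addresses for RETRO, EDGAR, or GEIA), and the code only constructs request URLs without fetching data.

```python
from urllib.parse import urlencode

# Hypothetical endpoints and coverage names, for illustration only.
PROVIDERS = {
    "RETRO": ("http://example.org/retro/wcs", "biomass_burning"),
    "EDGAR": ("http://example.org/edgar/wcs", "biomass_burning"),
    "GEIA": ("http://example.org/geia/wcs", "biomass_burning"),
}


def getcoverage_url(endpoint, coverage, bbox, time):
    """Build one WCS 1.0.0 GetCoverage URL; the logic is the same for every provider."""
    params = [
        ("SERVICE", "wcs"),
        ("REQUEST", "GetCoverage"),
        ("VERSION", "1.0.0"),
        ("CRS", "EPSG:4326"),
        ("COVERAGE", coverage),
        ("FORMAT", "NetCDF"),  # assumed format name
        ("BBOX", ",".join(str(v) for v in bbox)),
        ("TIME", time),
        ("WIDTH", "700"),
        ("HEIGHT", "350"),
    ]
    return endpoint + "?" + urlencode(params, safe=":,/")


def federated_requests(bbox, time):
    """Return one request URL per provider for the same region and time."""
    return {
        name: getcoverage_url(endpoint, coverage, bbox, time)
        for name, (endpoint, coverage) in PROVIDERS.items()
    }


requests_by_provider = federated_requests(
    bbox=(-180, -90, 180, 90),
    time="2000-08-01T00:00:00Z",
)
for name, url in requests_by_provider.items():
    print(name, url)
```

Because only the endpoint differs between providers, adding a new inventory to this sketch means adding one entry to `PROVIDERS` rather than writing a new download-and-reformat step.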
                   Multi-dimensional Browsing
   RETRO Biomass Burning Emissions

[Figure: RETRO biomass burning emissions, August 2000, browsed from the
1960-2000 monthly series. Source: RETRO]

http://webapps.datafed.net/datafed.aspx?page=Emissions/RETRO
                         Custom Web Applications




Google Maps mashup using DataFed data access interfaces for browsing and visualizing a smoke
event in Idaho on September 12, 2006. The “standard” Google Maps application is augmented
with an HTML/JavaScript table that accesses monitoring data through standard interfaces.
               http://niceguy.wustl.edu/EmissionsGoogleMaps/
                           Summary
Information technologies (particularly service-oriented architectures
and web services) provide opportunities to realize the benefits of
distributed databases using standardized interfaces.

Distributed databases allow data to remain maintained by their owners
         - dynamically updated (avoids versioning issues)
         - make connection once – always get latest and greatest

Standard interfaces foster networked activity and sharing of data and
tools through interoperability
         - simplify integration and analysis by moving the information
           technology details to the background

Federated inventories, datasets, models, analysis tools, portals
       - no “one-stop” can meet all user needs
       - faster progress through distributed, shared efforts