
Rapid Prototyping Capabilities for Conducting Research of Sun-Earth System

T. Haupt, A. Kalyanasundaram, I. Zhuk
High Performance Computing Collaboratory, Mississippi State University

Abstract: This paper describes the requirements, design, and implementation progress of an e-Science environment that enables rapid evaluation of potential uses of NASA research products and technologies to improve future operational systems for societal benefits. The project is intended to be a low-cost effort focused on integrating existing open source, public domain, and/or community-developed software components and tools. Critical for success is a carefully designed implementation plan that allows for incremental enhancement of the scale and functionality of the system while keeping the system operational and hardening its implementation. This has been achieved by rigorously following the principles of separation of concerns, loose coupling, and service-oriented architecture, employing Portlet (GridSphere), Service Bus (ServiceMix), and Grid (Globus) technologies, as well as by introducing a new layer on top of the THREDDS data server. At the current phase, the system provides data access through a data explorer that allows the user to view the metadata and provenance of the datasets; to invoke data transformations such as subsampling, reprojection, format translation, and de-clouding of selected data sets or collections; and to generate simulated data sets approximating data feeds from future NASA missions.

Figure 1: The RPC concept as an integration platform for composing, executing, and analyzing numerical experiments for Earth-Sun System Science, supporting the location transparency of resources.

1. Introduction

    1.1. Objectives of Rapid Prototyping

The overall goal of the National Aeronautics and Space Administration's (NASA) initiative to create a Rapid Prototyping Capability (RPC) is to speed the evaluation of potential uses of NASA research products and technologies to improve future operational systems by reducing the time needed to access, configure, and assess the effectiveness of NASA products and technologies. The developed RPC infrastructure will accomplish this goal and contribute to NASA's Strategic Objective to advance scientific knowledge of the Earth system through space-based observation, assimilation of new observations, and development and deployment of enabling technologies, systems, and capabilities, including those with the potential to improve future operational systems.

The infrastructure to support Rapid Prototyping Capabilities (RPC) is thus expected to provide the capability to rapidly evaluate innovative methods of linking science observations. To this end, the RPC should provide the capability to integrate the tools needed to evaluate the use of a wide variety of current and future NASA sensors and research results, model outputs, and knowledge, collectively referred to as "resources". It is assumed that the resources are geographically distributed, and thus the RPC will provide support for the location transparency of the resources.

This paper describes a particular implementation of an RPC system under development by the Mississippi Research Consortium, in particular Mississippi State University, under a NASA/SSC contract as part of the NASA Applied Sciences Program. This is a work in progress, about one year from the inception of the project.

    1.2. RPC experiments

Results of NASA research (including that of NASA partners) provide the basis for candidate solutions that demonstrate the capacity to improve future operational systems through activities administered by NASA's Applied Sciences Program. Successfully extending NASA research results to operational organizations requires scientific rigor and capacity throughout the pathways from research to operations. A framework for the extension of applied sciences activities involves a Rapid Prototyping Capability (RPC) to accelerate the evaluation of research results in an effort to identify candidate configurations for future benchmarking efforts. The results from the evaluation activity are verified and validated in candidate operational configurations through RPC experiments. The products of RPC studies will be archived and made accessible to all customers, users, and stakeholders via the Internet, with the purpose of being utilized in competitively selected experiments proposed by the applied sciences community through NASA's "Decisions" solicitation process [1].

Examples of currently funded RPC experiments (through the NASA grant to the Mississippi Research Consortium (MRC)) include: rapid prototyping of new NASA sensor data into the SEVIR system; rapid prototyping of hyperspectral image analysis algorithms for improved invasive species decision support tools; an RPC evaluation of the watershed modeling program HSPF applied to existing NASA data, simulated future data streams, and model (LIS) data products; and evaluation of the NASA Land Information System (LIS) using rapid prototyping.

2. System Requirements

The requirements for the infrastructure to support RPC experiments fall into two categories: (1) a computational platform seamlessly integrating geographically distributed resources into a single system to perform RPC experiments, and (2) a collaborative environment for the dissemination of research results enabling a peer-review process.

    2.1. Enabling RPC Experiments

The RPC is expected to support at least two major categories of experiments (and subsequent analysis): comparing results of a particular model fed with data coming from different sources, and comparing different models using data coming from the same source, as depicted in Fig. 2.

Figure 2: Two major categories of experiments and subsequent analysis to be supported by RPC.

In spite of being conceptually simple, the two use cases defined in Fig. 2 in fact entail a significant technical challenge. The barriers currently faced by researchers include inadequate data access mechanisms, the lack of simulated data approximating feeds from sensors to be deployed by future NASA missions, a plethora of data formats and metadata systems, complex multi-step data pre-processing, and the rigorous statistical analysis of results (comparisons between results obtained using different models and/or data).

The data from NASA and other satellite missions are distributed by Distributed Active Archive Centers (DAACs) operated by NASA and its partners. The primary focus of DAACs is to provide post-processed data (e.g., calibrated, corrected for atmospheric effects, etc.) – referred to as data products – for operational use by the US government and organizational users around the world. Access to the data by an individual researcher is currently cumbersome (requests are processed asynchronously, as the data in most cases are not readily available online), and the pre-processing performed by DAACs usually does not meet the researcher's needs. In particular, the purpose of many RPC experiments is to define new pre-processing procedures that, if successful, can later be employed by DAACs to generate new data products.

The pre-processing of the data takes many steps, and the steps to be performed depend on the technical details of the sensor and the nature of the research. For the sake of brevity, only Moderate Resolution Imaging Spectroradiometer (MODIS) [2] data are discussed here as a representative example. MODIS sensors are deployed on two platforms, Aqua and Terra, that view the entire Earth's surface every 1 to 2 days, acquiring data in 36 spectral bands. The data (the planetary reflectance) are captured in swaths 2330 km (cross track) by 10 km (along track at nadir). The post-processing of MODIS data may involve the selection of the region of interest (which may require combining several swaths taken

at different times, and possibly merging data from Terra and Aqua), sub-sampling, re-projection, band selection or computation of vegetation or moisture indices by combining data from different spectral bands, noise removal and de-clouding, feature detection, correlation with GIS data, and more. The post-processed data are then fed into computational models and/or compared with in situ observations (changes in vegetation, changes in soil moisture, fires, etc.). Of particular interest for RPC experiments currently performed by the MRC is the time evolution of the Normalized Difference Vegetation Index (NDVI), defined as (NIR - RED)/(NIR + RED), where RED and NIR stand for the spectral reflectance measurements acquired in the red and near-infrared regions, respectively. Different algorithms are being tested for eliminating gaps in the data caused by cloud cover, fusing data collected by Aqua and Terra and applying weighted spatial and temporal interpolations. Finally, the comparison of data coming from different sources (and corresponding model predictions) requires handling differences in spatial and temporal resolutions, satellite orbits, spectral bands, and other sensor characteristics.

Enabling RPC experiments, in this context, thus means a radical simplification of access to both actual and simulated data, as well as to tools for data pre- and post-processing. The tools must be interoperable, allowing the user to create computational workflows with the data seamlessly transferred as needed, including third-party transfers to high-performance computing platforms. In addition, the provenance of the data must be preserved in order to document results of different what-if scenarios and to enable collaboration and data sharing between users.

The development of the RPC system does not involve developing the tools for data processing. These tools are expected to be provided by the researchers performing the experiments, by projects focused on tool development, and by the community at large. Indeed, many tools for handling Earth science data are available from different sources, including NASA, USGS, NOAA, UCAR/Unidata, and numerous universities. Instead, the RPC system is expected to be an integration platform supporting the addition ("plugging in") of tools as needed.

    2.2. Enabling Community-Wide Peer-Review Process

The essence of the RPC process is to provide an evaluation of the feasibility of transferring research capabilities into routine operations for societal benefits. The evaluation should result in a recommendation for the NASA administrators to pursue or abandon the topic under investigation. Since making an evaluation requires narrow expertise in a given field (e.g., invasive species, crop predictions, fire protection, etc.), the results presented by a particular researcher need to be peer-reviewed. One way of doing that is publishing papers in professional journals and conferences. However, this introduces latency into the process, and the information given in a paper is not always sufficient for a conclusive evaluation of the research results. The proposed remedy is to provide means for publishing the results electronically – that is, providing the community access not only to the final reports and publications but also to the data used and/or produced during the analysis, as well as access to the tools used to derive the conclusions of the evaluation. The intention is not to let peer scientists repeat a complete experiment, which may involve processing voluminous data on high-performance systems, but rather to provide means for testing the new procedures, tools, and final analysis developed in the course of performing the experiment.

3. Design Considerations

The development of an RPC system satisfying all the requirements described above is an immense task. Consequently, one of the most important design decisions was to prioritize the system features and select the sequence of actions that would lead towards the implementation of the full functionality. Taking into account the particular needs of the experiments carried out by the MRC, the following implementation roadmap has been agreed upon [3].

Phase I: Interactive web site for describing the experiments and gathering feedback from the community. All experiments are performed outside the RPC infrastructure.

Phase II: RPC data server acting as a cache for experimental data (Unidata's THREDDS server [4]). In the prototype deployment, a small amount (~6 TBytes) of disk space is made available to the experimenters, with support for transfers of the data between the RPC data server and a hierarchical storage facility at the High Performance Computing Collaboratory (HPCC) at Mississippi State University via a 2 MBytes/s link. The experiments obtain the data from DAACs "the old way" (through asynchronous requests) and store them at HPCC, or generate them using computational models run on HPCC Linux clusters. Once transferred to the RPC data server, the data sets are available online. This is a transitional step, and the experiments are still executed outside the RPC infrastructure.
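The NDVI definition given in Section 2.1 translates directly into array code. The sketch below is only illustrative: the array names are assumptions, and real MODIS processing also deals with fill values, scale factors, and quality flags.

```python
import numpy as np

def ndvi(red, nir):
    """Normalized Difference Vegetation Index: (NIR - RED) / (NIR + RED).

    `red` and `nir` are reflectance arrays for the red and near-infrared
    bands; pixels where both reflectances are zero are mapped to 0 to
    avoid division by zero."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    out = np.zeros_like(denom)
    # Divide only where the denominator is nonzero; elsewhere keep 0.
    np.divide(nir - red, denom, out=out, where=denom != 0.0)
    return out
```

For example, a pixel with red reflectance 0.1 and near-infrared reflectance 0.3 yields NDVI = 0.2/0.4 = 0.5, a value typical of vegetated surfaces.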

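One simple member of the family of gap-elimination algorithms mentioned in Section 2.1 is per-pixel temporal interpolation across a fused Aqua/Terra time series. The sketch below assumes a (time, y, x) NDVI stack and a boolean cloud mask; it is an illustration of the interpolation idea, not the algorithms under test by the MRC.

```python
import numpy as np

def fill_cloud_gaps(series, cloud_mask):
    """Replace cloudy pixels in a (time, y, x) stack by linear temporal
    interpolation between the nearest clear observations of each pixel."""
    filled = np.asarray(series, dtype=float).copy()
    mask = np.asarray(cloud_mask, dtype=bool)
    t = np.arange(filled.shape[0])
    for iy in range(filled.shape[1]):
        for ix in range(filled.shape[2]):
            clear = ~mask[:, iy, ix]
            # Interpolate only when some, but not all, observations are clear.
            if clear.any() and not clear.all():
                filled[~clear, iy, ix] = np.interp(
                    t[~clear], t[clear], filled[clear, iy, ix])
    return filled
```

A weighted spatio-temporal scheme, as described in the text, would additionally draw on neighboring pixels; the purely temporal version above is the simplest variant.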
However, since the data are online, they can be accessed by various standalone tools, such as Unidata's Integrated Data Viewer (IDV) [5].

Phase III: Online tools for data processing ("transformations"). The tools are deployed as web services and integrated with the RPC data server. Through a web interface, the user sets the transformation parameters and selects the input data sets by browsing or searching the RPC data server. The results of the transformations (together with the available provenance information) are stored back on the RPC data server at the location specified by the user. The provenance information depends on the tool: in some cases it is just the input parameter files and standard output, while other tools generate additional log files and metadata. Since the THREDDS server "natively" handles data in the netCDF [6] format, the primary focus is given to tools for transforming NASA's HDF-EOS [7] format (for example, MODIS data are distributed in this format), including the HDF-EOS to GeoTIFF Conversion Tool (HEG) [8], supporting reformatting, band selection, subsetting, and stitching, and the MODIS re-projection tools (MRT [9] and MRTSwath [10]). The second class of tools integrated with the RPC system comprises the Applications Research Toolbox (ART) and the Time Series Product Tool (TSPT), developed specially for the RPC system by the Institute for Technology Development (ITD) [11], located at the NASA Stennis Space Center. The ART tool is used for generating simulated Visible/Infrared Imager/Radiometer Suite (VIIRS) data. VIIRS is a part of the National Polar-orbiting Operational Environmental Satellite System (NPOESS) program, and it is expected to be deployed in 2009. The VIIRS data will replace MODIS. The TSPT tool generates layer stacks of various data products to assist in time series analysis (including de-clouding). In particular, TSPT operates on MODIS and simulated VIIRS data. Finally, the RPC system integrates the Performance Metrics Workbench tools for data visualization and statistical analysis. These tools have been developed at the GeoResources Institute at Mississippi State University and the Geoinformatics Center at the University of Mississippi. At this phase, the experimenters can use the RPC portal for rapid prototyping of experimental procedures using online, interactive tools on data uploaded to the RPC data server. Furthermore, peer researchers can test the proposed methods using the same data sets and the same tools.

Phase IV: Support for batch processing. The actual data analysis needed to complete an experiment usually requires processing huge volumes of data (e.g., a year's worth of data). This is impractical to perform interactively using web interfaces. Instead, support for asynchronous ("batch") processing is provided. The tools are still deployed as web services; however, they delegate the execution to remote high-performance computational resources. The user selects a range of files (or a folder), selects the transformation parameters, and submits the processing of all selected files by clicking a single submit button. Since the data pre-processing is usually embarrassingly parallel (the same operation is repeated for each file, or for each pixel across the group of files in TSPT), the user automatically gains by using the Portal, as the system seamlessly performs all necessary data transfers and parallelizes the execution. Since the batch execution is asynchronous, the Portal provides tools for monitoring the progress of the task. Furthermore, even very complex computational models (as opposed to relatively simple data transformation tools) can easily be converted to a Web service, and thus all of the computational needs of the user can be satisfied through the Portal. At this phase the user may actually perform the experiment using the RPC infrastructure, assuming that the input data sets are "prefetched" to the RPC data server or the HPCC storage facility, all computational models are installed on HPCC systems, and all tools are wrapped as Web services.

Phase V: The RPC system is deployed at NASA Stennis Space Center, and it becomes a seed for a Virtual Organization (VO). Each deployment comes with its own portal, creating a network of RPC points of access. Each Portal deploys a different set of tools that are accessible through a distributed Service Bus. Each site contributes storage and computational resources that are shared across the VO. In collaboration with DAACs, support for online access is developed.

4. Implementation

    4.1. Grid Portal

The functionality of the RPC Portal naturally splits into several independent modules, such as the interactive Web site, the data server, the tool interfaces, or the monitoring service. Each such module is implemented as an independent portlet [12]. The RPC Portal aggregates the different contents provided by the portlets into a single interface employing the popular GridSphere [13] open source portlet container. GridSphere, while fully JSR-168 compliant, also provides out-of-the-box support for user authentication and maintaining user credentials (X509 certificates, MyProxy [14]), vital when providing access to remote resources.
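As an illustration of the kind of provenance record mentioned for Phase III, the following sketch stores the tool name, parameters, input datasets, and captured standard output next to a transformation result. The record layout is an assumption for illustration, not the RPC system's actual format.

```python
import json
import time
from pathlib import Path

def write_provenance(output_path, tool, parameters, inputs, stdout_text):
    """Store a provenance record alongside a transformation result so that
    what-if scenarios can later be documented and peer-reviewed."""
    record = {
        "tool": tool,                        # e.g., "HEG" or "MRTSwath"
        "parameters": parameters,            # transformation parameters used
        "inputs": [str(p) for p in inputs],  # source datasets
        "stdout": stdout_text,               # captured standard output
        "created": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    prov_path = Path(str(output_path) + ".provenance.json")
    prov_path.write_text(json.dumps(record, indent=2))
    return prov_path
```

A record like this is enough for the data server to answer "how was this dataset produced?" without re-running the transformation.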

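Because the Phase IV pre-processing is embarrassingly parallel, the per-file fan-out behind the single submit button can be sketched with a worker pool. In the real system the work is delegated to remote HPC resources through web services; the local thread pool and placeholder transformation below are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(path):
    """Placeholder for one per-file transformation (e.g., a re-projection).
    In the RPC system this step is a call to a remote web service."""
    return f"{path}.out"

def run_batch(paths, workers=4):
    """Apply the same transformation to every selected file in parallel,
    mirroring the batch model described above; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform, paths))
```

Since each file is independent, throughput scales with the number of workers until data transfer becomes the bottleneck.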
Access to the full functionality of the Portal, which includes the right to modify the contents served by the Portal, is granted only to registered users, who must explicitly log in to start a Portal session. In addition, to access remote resources, the user must upload his or her certificate to the MyProxy server associated with the Portal, using a portlet developed by the GridSphere team. In phases II-IV of the RPC system deployment, the only remote resources available to the RPC users are those provided by HPCC. Remote access to the HPCC resources is granted to registered users with certificates signed by the HPCC Certificate Authority (CA).

Phase V of the deployment calls for establishing a virtual organization allowing the users to access the resources made available by the VO participants, perhaps including the NASA Columbia project and TeraGrid. To simplify the user's task of obtaining and managing certificates, the Grid Account Management Architecture (GAMA) [15] will be adopted. It remains to be determined, though, which CA(s) will be recognized.

    4.2. Interactive Web Site

It is imperative for the RPC system to provide an easy way for the experimenters to update the contents of the web pages describing their experiments, in particular, avoiding intermediaries such as a webmaster. A ready-to-use solution for this problem is a wiki: a collaborative website that can be directly edited by anyone with access to it [16]. From several available open source implementations, MediaWiki [17] has been chosen for the RPC portal, as the RPC developers are impressed by the robustness of the implementation, proven by the multi-million-page Wikipedia [18].

MediaWiki is deployed as a portlet managed by GridSphere. The only (small) modification introduced to MediaWiki in order to integrate it with the RPC Portal is replacing the direct MediaWiki login with an automatic login for users who have successfully logged in to GridSphere. With this modification, by a single login to the RPC Portal the user not only gets access to the RPC portlets and to the remote resources accessible to RPC users (through automatic retrieval of the user certificate from the MyProxy server) but also acquires the rights to modify the wiki contents.

The rights to modify the wiki contents are group-based. Each group is associated with a namespace, and only members of the group can make modifications to the pages in the associated namespace. For example, only participants of an RPC experiment can create and update pages describing that experiment, while anyone can contribute to the blog area and participate in the discussion of the experimental pages. In addition, each group is associated with a private namespace – not accessible to nonmembers at all – which enables collaborative development of confidential contents.

    4.3. Data Server

The science of the Sun-Earth system is notorious for collecting an incredible amount of observational data that come from different sources, in a variety of formats, and with inconsistent and/or incomplete metadata. The solution of the general problem of managing such data collections is the subject of numerous research efforts, and it goes far beyond the scope of this project. Instead, for the purpose of the RPC system it is desirable to adopt an existing solution representing common community practice. Even though such a solution is necessarily incomplete, by virtue of being actually used by researchers it is useful enough and robust enough to be incorporated into the RPC infrastructure.

From the available open source candidates, Unidata's THREDDS Data Server (TDS) has been selected and deployed as a portlet. In order to better integrate it with the other RPC Portal functionality, and in particular to provide user-friendly interfaces to the data transformations, a thin layer of software on top of TDS – referred to as the TDS Explorer – has been developed.

          4.3.1. THREDDS Data Server

THREDDS (Thematic Real-time Environmental Distributed Data Services) [4] is middleware developed to simplify the discovery and use of scientific data and to allow scientific publications and educational materials to reference scientific data. Catalogs are the heart of the THREDDS concept. They are XML documents that describe on-line datasets and can contain arbitrary metadata. The THREDDS Catalog Generator produces THREDDS catalogs by scanning or crawling one or more local or remote dataset collections. Catalogs can be generated periodically or on demand, using configuration files that control which directories get scanned and how the catalogs are created.

The THREDDS Data Server (TDS) actually serves the contents of the datasets, in addition to providing catalogs and metadata for them. The TDS uses the Common Data Model to read datasets in various formats, and serves them through OPeNDAP, OGC Web Coverage Service, NetCDF subset, and bulk HTTP file transfer services.
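Since THREDDS catalogs are plain XML, client code can walk them with a standard parser. The catalog below is a deliberately stripped-down illustration of the format (the real InvCatalog schema carries services, metadata elements, and nested dataset collections):

```python
import xml.etree.ElementTree as ET

# A minimal THREDDS-style catalog: an XML document describing on-line
# datasets. The dataset names and urlPath values are invented examples.
CATALOG = """\
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="RPC sample catalog">
  <dataset name="MODIS NDVI 2006" urlPath="rpc/modis/ndvi_2006.nc"/>
  <dataset name="Simulated VIIRS" urlPath="rpc/viirs/sim_2009.nc"/>
</catalog>
"""

TDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

def list_datasets(catalog_xml):
    """Return (name, urlPath) pairs for every dataset in a catalog."""
    root = ET.fromstring(catalog_xml)
    return [(d.get("name"), d.get("urlPath"))
            for d in root.iter("{%s}dataset" % TDS_NS)]
```

Walking catalogs this way is essentially what a catalog-browsing client does before requesting the data themselves through one of the TDS services.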

The first three allow the user to obtain subsets of the data, which is crucial for large datasets. Unidata's Common Data Model (CDM) is a common API for many types of data, including OPeNDAP, netCDF, HDF5, GRIB 1 and 2, BUFR, NEXRAD, and GINI. A pluggable framework allows other developers to add readers for their own specialized formats.

Out of the box, the TDS provides most of the functionality needed to support data sets commonly used in climatology applications (e.g., weather forecasting, climate change) and GIS applications, because of the supported file formats. It is possible to create CDM-based modules to handle other data formats, in particular HDF4-EOS, which is critical for many RPC experiments; however, that would possibly lead to a loss of the metadata embedded in HDF headers. Furthermore, while the TDS provides support for subsetting CDM-based data sets, it does not allow for other operations often performed on HDF4-EOS data, such as re-projections. To minimize the modifications and extensions to the TDS needed to integrate it with the RPC infrastructure, the new functionality needed for the RPC is developed as a separate package (a web application) that acts as an intermediary between the user interface and the TDS. Requests for services that can be rendered by the TDS are forwarded to the TDS, while the others are handled by the intermediary: the TDS Explorer.

          4.3.2. TDS Explorer

The TDS native interface allows browsing the TDS catalog one page at a time, which makes the navigation of a hierarchical catalog a tedious process. To remedy that, a new interface inspired by the familiar Microsoft File Explorer (MSFE) has been developed. The structure of the catalog is now represented as an expandable tree in a narrow pane (iframe) on the left-hand side of the screen. The selection of a tree node or leaf results in displaying the catalog page of the corresponding data collection. In addition, the TDS Explorer supports the following operations:

- Uploading the data, either from the user desktop using HTTP or from a remote location using gridFTP (until Phase V of the deployment, only the HPCC storage facility). In either case, the user selects the destination collection, selects the "uploadHTTP" or "uploadGridFTP" option from the menu, selects file(s) in the file chooser popup, and clicks OK (or Cancel).

- Renaming and deleting datasets and collections: select the dataset or data collection and use the corresponding option in the menu.

- Downloading the data, either to the user desktop using HTTP or by transferring it to a remote location using gridFTP.

- Displaying the provenance of a dataset. By choosing this option, the list of files (if any) that were generated when creating the dataset is displayed instead of the TDS catalog page. Typically, the provenance files are generated when an RPC tool is used to create a dataset, and the list may include the standard output, the input parameter file, a log file, a metadata record, or other files, depending on the tool.

- Invoking tools for the selected fileset(s) or collection. Some tools operate on a single dataset (e.g., the multispectral viewer), others may be invoked for several datasets (e.g., the HEG tool), and yet others operate on data collections (e.g., TSPT). The tool GUIs pop up as new iframes. The tools are described in Section 3.

The user interface of the TDS Explorer is implemented using JavaScript, including AJAX. The server-side functionality is a web application using JSP technology. The file operations (upload, delete, rename) are performed directly on the file system.
or data set, respectively, in an iframe occupying the      The changes in the file system are propagated to the
rest of the screen. The TDS explore not only makes         TDS Explorer tree by forcing TDS to recreate the
the navigation of the data repository more efficient, it   catalog by rescanning the file system (with
also simplifies the development of interfaces to other     optimizations prohibiting rescanning folders that
services not natively supported by TDS. Among the          have not changed). Finally, the TDS explorer web
services are:                                              application invokes TDS API for translating datasets
                                                           logical names into physical URLs, as needed.
        Creating containers for new data collections
         and uploading new data sets. The creation of
         a new container is analogous to creating a            4.4. RPC Tools
         new folder in MSFE: select a parent folder,       From the perspective of the integration, there are
         from menu select option “new collection”,         three types of tools. One type is standalone tools
         in a pop-up iframe type in the name of the        capable of connecting to the RPC data server to
         new collection and click OK (or cancel).          browse and select the dataset but otherwise
         There are two modes of uploading files:           performing all operations independently of the RPC
         from the user workstation using HTTP and          infrastructure. The Unidata IDV is an example of
                                                           such a tool. Obviously, such tools require no support

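A standalone client can locate the structure and subsets of an RPC-served data set purely through the standard OPeNDAP (DAP2) request conventions, in which ancillary responses are obtained by appending a suffix to the data set URL. The sketch below illustrates the idea; the class and method names, the host, and the data set path are illustrative assumptions, not part of the RPC codebase.

```java
// Sketch of how a standalone tool might address an OPeNDAP (DODS) data set
// served by TDS. Helper names and the example URL are illustrative only.
public class OpendapUrls {

    // DAP2 convention: the data set structure (DDS) is served at <url>.dds
    public static String ddsUrl(String datasetUrl) {
        return datasetUrl + ".dds";
    }

    // DAP2 convention: the data set attributes (DAS) are served at <url>.das
    public static String dasUrl(String datasetUrl) {
        return datasetUrl + ".das";
    }

    // DAP2 convention: a subset is requested with a constraint expression
    // after '?', e.g. the array hyperslab "Band1[0:99][0:99]"
    public static String subsetUrl(String datasetUrl, String constraint) {
        return datasetUrl + ".ascii?" + constraint;
    }

    public static void main(String[] args) {
        // Hypothetical data set published by the RPC data server
        String ds = "http://rpc.example.edu/thredds/dodsC/modis/granule1.hdf";
        System.out.println(ddsUrl(ds));
        System.out.println(subsetUrl(ds, "Band1[0:99][0:99]"));
    }
}
```

Because these request conventions are shared by all OPeNDAP servers, a community tool built against them can consume RPC data without any RPC-specific code.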
One of the advantages of TDS is that it supports many of the commonly used data protocols; consequently, the data served by the RPC data server may be accessed by many existing community-developed tools, immediately enhancing the functionality of the RPC system.

The second type is transformations, which take a data set or a collection as input and output the transformed files. Examples of such transformations are HEG, MRT, ART, and TSPT. They come with a command-line interface (a MATLAB executable in the case of ART and TSPT) and are controlled by a parameter input file. The integration of such tools with the RPC infrastructure is made in two steps. First, a Web-based GUI is developed (using JavaScript and AJAX as needed to create lightweight but rich interfaces) to produce the parameter file. The GUI is integrated with the TDS Explorer to simplify the user's task of selecting the input files and defining the destination of the output files. The second step is to install the executable on the system of choice and convert it to a service. To this end, the open-source ServiceMix [19], which implements the JSR-208 Java Business Integration specification [20], is used; implementations of JBI are usually referred to as a "Service Bus". Depending on the user-chosen target machine, the service either forks a new process on one of the servers associated with the RPC system or submits the job to the remote machine using Globus GRAM [21]. In the case of remote submission, the service registers itself as a listener for GRAM job status change notifications. The notifications are forwarded to a Job Monitoring Service (JMS), which stores the information on the jobs in a database (MySQL). A separate RPC portlet provides the user interface for querying the status of all jobs submitted by the user. The request for a job submission (local or remote) contains an XML job descriptor that specifies all information needed to submit the job: the location of the executable, values of environment variables, files to be staged in and out, etc. Consequently, the same ServiceMix service is used to submit any job, with the job descriptor generated by the transformation GUI (or a supporting JSP page). Furthermore, a new working directory is created for each instance of a job. Once the job completes, the result of the transformation is transferred to the TDS server, to the location specified by the user, while "byproducts" such as standard output and log files, if created, are transparently moved to a location specified by the RPC server: a folder with a name created automatically by hashing the physical URL of the transformation result. This approach eliminates unnecessary clutter in the TDS catalog: using the TDS Explorer, the user navigates only the actual data sets. If the provenance information is needed, the TDS Explorer recreates the hash from the data set URL and shows the contents of that directory, providing the user with access to all files there.

Finally, the data viewers and statistical analysis tools do not produce new data sets. In this regard, they are similar to standalone tools. The advantage of integrating them with the RPC infrastructure is that the data can be preprocessed on the server side, reducing the volume of the necessary data transfers. Because of their interactivity and rich functionality (visualizations), they are implemented as Java applets.

5. Summary

This paper describes the requirements, design, and implementation progress of an e-Science environment to enable rapid evaluation of innovative methods of processing science observations, in particular data gathered by sensors deployed on NASA-launched satellites. This project is intended to be a low-cost effort focused on integrating existing open source, public domain, and/or community-developed software components and tools. Critical for success is a carefully designed implementation plan allowing for incremental enhancement of the functionality of the system, including incorporating new tools per user requests, while maintaining an operational system and hardening its implementation. This has been achieved by rigorously following the principles of separation of concerns, loose coupling, and service-oriented architectures, employing Portlet (GridSphere), Service Bus (ServiceMix), and Grid (Globus) technologies, as well as introducing a new layer on top of the THREDDS data server (the TDS Explorer). At the time of writing this paper, the implementation is well into phase IV, while continuing to add new tools. The already deployed tools allow for subsampling, re-projections, format translations, and de-clouding of selected data sets and collections, as well as for generating simulated VIIRS data approximating data feeds from future NASA missions.

References

[1] NASA Science Mission Directorate, Applied Sciences Program, Rapid Prototyping Capability (RPC) Guidelines and Implementation Plan, 01_07.doc

[2]

[3] T. Haupt and R. Moorhead, “The Requirements and
Design of the Rapid Prototyping Capabilities System”,
2006 Fall Meeting of the American Geophysical Union,
San Francisco, USA, December 2006.
[12] JSR-168 Portlet Specification,
[14] MyProxy Credential Management Service,
[15] K. Bhatia, S. Chandra, K. Mueller, "GAMA: Grid
Account Management Architecture," First International
Conference on e-Science and Grid Computing (e-
Science'05), pp. 413-420, 2005.
[16] Howard G. "Ward" Cunningham,

