Project description
1 Background
The research groups and institutions in this proposal are pioneers in developing sophisticated planetary modeling, monitoring and forecasting tools [1-8]. In the last five years these groups have also been focusing attention on the cross-organization integration of computational resources. In this arena two major national efforts, ESMF[9] and SERVO[10], are led by members of our group. Both projects are at the cutting edge of the intersection between domain science and advanced cyberinfrastructure. The ESMF project[11] has designed and is currently implementing (and field testing) a standard software platform that application developers and software engineers can use to create shareable high-performance components for use in coupled oceanic, atmospheric, cryosphere and biosphere modeling. The SERVO[12, 13] project has implemented a Grid/web-service standards based system that gives solid-earth researchers virtualized access (via web portals[14]) to distributed database assets, seismic modeling assets and distributed hardware resources, and provides a mechanism to dynamically configure new assemblies of these pieces.
1.1 The scope of ESMF
The ESMF project is a major NASA Computational Technologies funded initiative in the US Earth system modeling community. It grew out of a shared desire in the climate and weather communities[15] for a common modeling infrastructure that would make it possible to (1) share simulation codes between development groups in different institutions, without overly restricting innovation, and (2) share the costs of maintaining and extending the high-performance, scalable, performance-portable support software that all climate and weather codes require, standardizing and extending the proprietary software infrastructure work of several key groups[16-21].

Figure 1 Illustrations from some of the models adopting the Earth System Modeling Framework. Currently these models are computationally incompatible; by adopting ESMF the components of these models will be able to interoperate.
In addressing these goals, the focus of effort under ESMF has been on a developer level toolkit that supports interoperable, modular Earth science components. Strong emphasis has been placed on defining (and implementing) a unified high-performance interface standard, applicable to a broad span of codes, for connecting modeling components that (1) perform tightly coupled parallel simulations (with component-component switching times in the microsecond range), (2) represent PDE solvers that are discretized at different physical and temporal resolutions (in coupled climate simulations, sub-components of the system such as sea-ice models and ocean models are often integrated on different physical grids) and (3) are created by distributed development teams. As illustrated in Figure 1, the resulting ESMF architecture [9] and implementation are currently being adopted in field tests at many sites. These include: adoption of ESMF into the primary national and global weather forecast codes [22, 23] from the NOAA National Centers for Environmental Prediction (NCEP,[24]); adoption of ESMF by the assimilation and prediction codes [2, 8] at the NASA Global Modeling and Assimilation Office (GMAO); and adoption of ESMF into the coupled modeling systems[1, 25, 26, 16, 27] of the two main institutions (GFDL and NCAR) responsible for the US climate forecasts for the Intergovernmental Panel on Climate Change (IPCC[28]).
1.2 The scope of SERVO
The SERVO project encompasses a) the improvement of several earthquake simulation codes using high performance computing techniques to enable large scale modeling; b) the development of XML-based data models and the deployment of relational and XML databases for managing earth modeling information; c) the development of Grid/Web Service infrastructure to provide the glue for connecting application services, visualization services, and data sources on distributed resources; and d) the development of component-based computing web portals to provide user access to SERVO services. Cyberinfrastructure work under SERVO, illustrated in Figure 2, emphasizes aggregating coarse-grained computational resources (such as simulation codes, databases, hardware platforms and visualization services) through Grid/web-services and providing user-level frontend portals (based largely on OGCE building blocks[29, 30]) to make interconnection and steering of complex composite systems manageable. Users interact with application-specific portal front-ends (bottom 2/3 of Figure 2) for launching and managing applications. The portal can also be used to automate code execution, file output staging, and high end parallel visualization services. The top bar of Figure 2 shows (left to right) a GeoFEST calculation for the Northridge, CA fault, VirtualCalifornia model results for sixty interacting California faults, and a GeoFEST simulation of the Landers, CA fault system. SERVO results so far have included the development of interfaces that provide integrated portal access to seismic simulation tools and paleoseismic datasets, GPS measurements and geodetic analyses, and InSAR and seismicity datasets. SERVO is built on standards based web-service technologies such as the SOAP and HTTP protocols and the Web Services Description Language (WSDL).
Figure 2 QuakeSim portal (bottom) can be used to launch remote applications and visualization services. GMT services provide quick analysis; MPEG movies of GeoFEST (top left and top right) and Virtual California (top middle) are made using RIVA and can be integrated with SERVO services. See http://pat.jpl.nasa.gov/public/RIVA/images.html.
1.3 Leveraging ESMF, SERVO and NMI
We propose to leverage ESMF, SERVO and NMI development. Table 1 highlights some key characteristics of the ESMF and SERVO efforts. Our proposal will deploy a set of services that combines and extends the ESMF and SERVO efforts, with significant potential benefits for technology and science agendas in planetary scale modeling.

Table 1 Key characteristics of the SERVO and ESMF systems.

ESMF
- Focus on climate and weather.
- High-performance, developer oriented framework.
- Support for tightly coupled components.
- A broad span of numerical components.
- A broad community of component developers from NSF, DOE, NOAA, NASA and universities (footnote 1).

SERVO
- Focus on seismic modeling.
- Service based, distributed components.
- User oriented environments.
- Built in information management.
- Built in standards based (web-services compliant) remote invocation.
- Integration of plotting and database assets.
- Grid based security and authentication.
1.3.1 Potential for shared support tools
Grid service enabling ESMF components gives these components access to visualization tools that support standards based Grid services (see SERVO examples in section 4.2) and to data repositories and libraries that are adopting portal, Grid and web-service paradigms (see for example [31-36]). At the same time our project will make modeling and assimilation technology currently under ESMF potentially accessible to the solid earth community.

1.3.2 Potential for shared modeling technology
From a domain technology perspective, planetary scale solid earth modeling and planetary scale weather and climate modeling have technical overlaps that remain largely unexplored. For example, the equation sets and numerical tools for mantle convection and for oceanic convection in the presence of rotation are very similar, as are the equations and algorithms for deformation of tectonic plates under stress and for sea-ice sheet dynamics. It is therefore possible to imagine that a Grid service from the fluid earth domain could be of use in solid earth modeling and vice-versa.

1.3.3 Potential for shared assimilation technology
Both the solid earth and fluid earth research communities are actively exploring increasingly sophisticated data assimilation techniques to provide planetary scale state-estimates of plate boundary movements [37] and ocean climate[38]. Many common issues in the non-linear optimization of high-dimensional problems are being tackled, and the possibility of shared services for aspects of this work is real. For example, ensemble filter approaches used in ocean assimilation and in plate boundary studies have a common need for scalable tools to calculate covariances in high-dimensional systems. Similarly, adjoint based approaches have been developed in both communities ([39, 40]). General optimization tools, available as services for large problem sizes, would be of direct benefit to both communities.
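The shared need for covariance calculations in high-dimensional ensembles can be illustrated with a minimal Python sketch. It assumes an ensemble of N state vectors of dimension n, with N much smaller than n, and applies the sample covariance to a vector through the ensemble anomalies rather than forming the n-by-n matrix. The function names are hypothetical and are not part of ESMF or SERVO.

```python
from typing import List

Vector = List[float]


def ensemble_anomalies(members: List[Vector]) -> List[Vector]:
    """Subtract the ensemble mean from every member (hypothetical helper)."""
    n_members = len(members)
    dim = len(members[0])
    mean = [sum(m[i] for m in members) / n_members for i in range(dim)]
    return [[m[i] - mean[i] for i in range(dim)] for m in members]


def apply_sample_covariance(members: List[Vector], v: Vector) -> Vector:
    """Return P v, where P is the unbiased sample covariance of the ensemble.

    With A the N-by-n matrix of anomalies, P = A^T A / (N - 1), so P v is
    computed as A^T (A v) / (N - 1) without storing the n-by-n matrix.
    """
    n_members = len(members)
    if n_members < 2:
        raise ValueError("need at least two ensemble members")
    anomalies = ensemble_anomalies(members)
    # Project v onto each anomaly: one scalar weight per ensemble member.
    weights = [sum(a_i * v_i for a_i, v_i in zip(a, v)) for a in anomalies]
    # Recombine the anomalies using those weights.
    return [
        sum(w * a[i] for w, a in zip(weights, anomalies)) / (n_members - 1)
        for i in range(len(v))
    ]
```

The cost of this form grows with N times n rather than n squared, which is the property that would make such a tool plausible as a shared service for large state vectors.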
1.3.4 Potential for combined geodetic models
In the area of planetary science our proposed work would lay the foundation for integrated modeling activities that span the solid earth and weather and climate disciplines. A clear role for this capability is in deconvolving signals detected by satellite gravity missions such as GRACE[41] that measure the variations in the Earth's gravitational field. The minute gravity field variations that GRACE can detect are due to a mix of planetary processes (plate movements, ocean and atmospheric mass interchange and redistribution, etc.). There are already studies showing that interpretation of gravity measurements requires considering both solid earth and fluid earth processes[42, 43].

Footnote 1: The Earth System Modeling Framework design and development team includes representatives from federally funded laboratories (including GSFC, JPL, Argonne National Laboratory, Los Alamos National Laboratory, NCAR, GFDL, NCEP) and from the university research community (including MIT, Princeton University, UCLA, University of Michigan). It spans groups ranging from those that execute weather modeling operationally to those developing next generation climate simulation codes.
2 Proposal outline
We call our proposed effort PSANDA (Planetary Simulation And Data Assimilation System). Under our proposal, high quality PSANDA “workbenches” would be constructed and deployed for a number of computational Earth science problems; we describe the planned workbenches in section 3.
2.1 Workbench role
Each workbench would transparently interconnect suites of high-end hardware resources at geographically dispersed locations (see section 4.5) that would be available to the workbench through Grid services. Workbenches would access existing services representing large repositories of data, such as those derived from remote sensing missions and from state estimation activities (for example data from the GRACE[41] and TOPEX/POSEIDON[44] satellite missions, data from geodetic networks[45], data from ocean state estimates[46]) and data from earthquake faults and seismic events [REF], and would demonstrate how to connect these to simulation, assimilation and analysis tools through standards based Grid/web-service protocols. The workbenches we will develop serve several purposes: (1) they provide practical demonstrations of how web-service enabling ESMF makes simulation and assimilation components that use the framework web-service ready; (2) they allow us to research the sharing of data management, visualization, and numerical simulation and assimilation tools between the solid earth and fluid earth planetary modeling and assimilation communities; (3) they will be suited to adoption into the production research and operational workflows of the proposal participants, enabling them to capitalize on service oriented technologies. The span of areas covered by the planned workbenches is sufficiently broad to ensure that the generic layers in our workbench software are flexible enough to accommodate many future earth science applications.
2.2 Technical innovation
The PSANDA workbenches will interact with and interconnect ESMF components (and other services) using Grid/web-service technologies. However, we will not “web-service-enable” each of these codes separately; instead we propose to web-service-enable ESMF itself. This will implicitly “web-service-enable” all the codes and software packages that are adopting, or are included within, ESMF. We describe the technical basis for this strategy in section 4.
2.3 Long term benefits: A sustainable approach to improving accessibility of high-end planetary scale modeling
The modeling groups involved in this proposal have all developed extremely capable simulation and analysis systems for examining planetary scale dynamics. For the most part the technical learning curve required to make use of these tools remains extremely steep. Past efforts at making these tools more accessible to people with appropriate domain expertise have often been frustrated by the costs of ongoing maintenance and by the lack of open standards for decoupling the complex back-end elements of these advanced systems from more comprehensible component level abstractions. Developments under NMI, and broader developments in the web-services arena, present an opportunity to address ease-of-use in a new way that holds significant promise. A successful series of deployments of PSANDA workbenches would not only improve the productivity of core groups developing the next generation of planetary scale monitoring, analysis and prediction systems, but would also provide the means for many more groups to gain access to these systems. This would lead to far greater awareness of the capabilities of advanced planetary scale modeling systems and to broader application in basic research, teaching and applied scenarios. The project we are proposing involves codes that are already openly available, together with support software that is or will be developed under open-source practices and that will adhere to open standards based principles. A significant barrier to making these codes available as services is the initial development cost and the overhead of ongoing maintenance and support.
The approach we propose here can dramatically lower both these costs, making viable the availability of services for ocean modeling, atmosphere modeling, sea-ice modeling, land-surface modeling, solid-earth modeling, data assimilation and general constrained optimization, including the services listed in Table 2, not just as technology demonstrations but as sustainable production quality resources.

Area of focus: Coupled climate simulator
Description: Workbench accessible services will be created that allow the simulation and analysis of climate evolution in coupled atmosphere, ocean, land and sea-ice models under user defined forcing and parametric settings and with a user definable set of simulation components (see section 3.2.1).

Area of focus: Downstream applications for high-resolution state estimates
Description: Workbench accessible services will be created to allow post-processing and transport modeling using high-resolution state estimates of the ocean circulation and sea-ice (see section 3.2.2).

Area of focus: Geodesy
Description: Workbench accessible services will be created that allow customizable processing for user selected regions of geodetic observations (see section 3.2.3).

Services created:
- Coupled climate simulation, including the GFDL CM2 system used for IPCC work.
- Climate system component assembly tool.
- Climate component (atmosphere, ocean, sea-ice, land-surface) simulations.
- Relocatable regional transport simulation.
- Biogeochemical and ecosystem simulation.
- Geospatial regridding services.
- GPS data server.
- Non-linear optimization.
Geo 2
Geo 2
Table 2 Some of the services that will be made available as part of the PSANDA project. As we explore techniques for partitioning codes, the list of services will almost certainly grow.
3 The PSANDA workbenches
To illustrate our goal we describe two scenarios that PSANDA workbenches will support. The first describes a use case in which a workbench would be useful for high-end ocean biogeochemistry; the second describes a workbench scenario in support of advanced seismic analysis.

Example 1. A graduate student has developed a numerical algorithm for modeling a key biogeochemical process such as iron cycling in the ocean. The student now wishes to assess the skill of the numerical scheme and use the scheme to make estimates of the role of ocean iron distribution in the large scale carbon cycle. Working with a PSANDA workbench the student would be able to incorporate their numerical scheme into a high-resolution state estimate of the three-dimensional, time evolving state of the global oceans during the 1990s. They would then be able to use facilities provided by PSANDA to interconnect their time dependent simulation with other biogeochemical components and make estimates of the impact of iron cycling on ocean CO2 uptake. All this would be possible from within an apparently integrated environment. The PSANDA workbench would make the complex mix of hardware and software assets required for this work appear to behave like a turnkey system, conceived for solving the student's problem.

Example 2. A researcher at an Earthscope[45] sponsored field station wishes to conduct an advanced analysis of local GPS data in a manner that exploits regional insight into the underlying reference frame model. Working from a laptop and using one of the proposed PSANDA workbenches, the researcher would be able to select custom processing of selected time series of GPS station data with different sets of user defined assumptions about errors in station data. The researcher would then be able to initiate one or more analyses of the time series and have the resulting plate movement estimates automatically plotted and compared. The time-series analysis would occur remotely on a parallel computing resource without the researcher having to develop any detailed configuration or control. The resulting analysis would be available in a variety of forms and could be archived together with references to the source data and processing procedure for others to share.

These examples illustrate how the work we are proposing would demonstrate that NMI and Grid services can dramatically streamline and compress computational science endeavors that are possible today, but that more often than not are considered barely viable due to the overwhelming technical demands.
3.1 What is required (and what is not required)
The key tools to carry out the two examples above (and many others like them) already exist today. However, deploying those tools to provide an end to end solution is much more difficult than it should be, because unifying platforms into which such tools could, in theory, fit simply do not exist. Attempting either of the projects sketched above would involve working with (and learning technical workflow details about) multiple different numerical simulation tools, analysis tools and hardware platforms and, in some cases, manipulating enormous (1 TB or more) datasets held in proprietary formats (and stored in different locations), while manually locating and assembling all the required software and hardware assets along the way. In principle, if all the pieces of the system were accessible from a coherent workbench, as pre-composed collections of components and services within an NMI and Grid based system of systems, end-to-end assembly could be reduced to an almost trivial task. Our proposal does not require the generation of new simulation tools or the development of brand new technologies. Rather, what we are proposing capitalizes on emerging grid technologies from projects such as NMI and on existing flagship software and hardware systems, deploying and integrating all these pieces to fit within a virtualized service model.
3.2 Target problems
We will target a number of high-profile science areas that are of interest to the project participants and to the broader climate, weather and ocean community. Having “productivity workbenches” in these areas will be of high impact. In each area we will create PSANDA workbenches that are fully functional and suitable for adoption into production workflows of the groups involved. In addition to these functioning workbenches, our end products will include glue software and standards that are used to fit our software and hardware into an NMI Grid. This glue software will be common to all PSANDA workbenches and will make extensive use of NMI technologies such as those available from the GRIDS Center [REF] and OGCE[29]. Like the workbench applications, the glue software will be publicly available, as will demonstration workbenches that can be easily replicated elsewhere and that we will make accessible to a broader community of researchers. We plan to create four production quality workbenches in all. Each workbench will focus on specific sub-areas within the domain of computational Earth science. The component pieces for each workbench already exist; the glue software and associated sub-area specific client will be designed and deployed during the lifetime of our project. The services supported by the workbenches will be accessed exclusively through Grid service based clients such as browsers and interactive modeling and simulation systems like Matlab and the CCA builder services client. We now outline each of the four proposed workbenches.

3.2.1 The COUPLE workbench: Climate simulation workbench
This workbench will provide a front-end that can be used to drive any ESMF component as a service from a portal that allows monitoring and visualization of the component's state as the component runs. As part of the ESMF project a set of so-called “interoperability experiments” is being created that will showcase the ESMF technology. Our Grid/web-service enabling approach will allow us to create a workbench that can run any of these experiments and allow the outputs from these experiments to be visualized. This workbench would also allow the driving of coupled climate tools from GFDL, MIT and GSFC (shown in Figure 1) as web-services, evaluating the CCA GUI[47] as a desktop portal for specifying component interconnections. The workbench will support execution of the coupled systems at different sites and with interchangeable sets of forcing and parametric data. Pre-existing ESMF components that regrid fields between the constituent models that make up these coupled systems will be made available as services. This will make it possible to interconnect constituent components from different coupled systems. Plotting and visualization services will be integrated into this workbench.

3.2.2 The HIGHRES workbench: High-resolution state-estimation product use workbench
The workbench will be used to support downstream analysis of high-resolution multi-terabyte outputs from the Estimating the Climate and Circulation of the Ocean project[46]. A geographically relocatable transport model service that can embed chemistry and biogeochemistry in a fluid model will be deployed together with graphics and analysis tools.
Figure 3 The HIGHRES workbench would support relocatable biogeochemical simulations using outputs from the ECCO high-resolution state estimation initiative. A transport model could be located anywhere in the simulation (represented by the white box), and dynamical and biogeochemical simulations could be made for that area using global high resolution fields for boundary conditions.
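The relocatable transport model idea can be sketched in Python as follows. The sketch assumes the global state estimate is available as a regular two-dimensional grid (a list of rows); the hypothetical helper cuts out a rectangular region and returns the surrounding ring of global values for use as boundary conditions. The grid layout and the function name are assumptions for illustration, not the ECCO data format or a PSANDA interface.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]


def extract_region(
    global_field: Grid, row0: int, row1: int, col0: int, col1: int
) -> Tuple[Grid, Dict[str, List[float]]]:
    """Cut rows [row0, row1) and columns [col0, col1) out of a global field.

    Returns the interior values plus the one-cell ring of global values
    around the box, which a regional model could use as boundary conditions.
    The box must leave at least one global cell free on every side.
    """
    n_rows, n_cols = len(global_field), len(global_field[0])
    if not (1 <= row0 < row1 <= n_rows - 1 and 1 <= col0 < col1 <= n_cols - 1):
        raise ValueError("region must lie strictly inside the global grid")
    interior = [row[col0:col1] for row in global_field[row0:row1]]
    boundary = {
        "top": global_field[row0 - 1][col0:col1],
        "bottom": global_field[row1][col0:col1],
        "left": [global_field[r][col0 - 1] for r in range(row0, row1)],
        "right": [global_field[r][col1] for r in range(row0, row1)],
    }
    return interior, boundary
```

Relocating the regional model then amounts to calling the helper again with a different box, leaving the global fields untouched.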
3.2.3 The GEOD workbench: Customizable, regional geodesy workbench
Words from Tom here
3.2.4 The VC workbench: Geo workbench scenario 2
Words from Jay here
Virtual California is a code that uses the Monte Carlo method to generate simulated, realistic earthquakes on an arbitrary fault surface mesh. It uses topologically realistic networks of independent fault segments that are mediated by elastic interactions. These segments can be designed to represent fault systems spanning the region of California. We are converging on the following concepts for the VC assimilation workbench:
- Provide a substantial (parallel processing) web service environment for setting up and running Virtual California.
- Provide a rich interface for comparing real-world data to simulation features, particularly GPS, InSAR, and seismicity (historic catalogs and recent events).
- Develop a Grid interface and algorithms to use the data to steer the simulation and develop a data assimilation operational facility. Determine whether Kalman filter methods, GA or other methods work best for this.
4 Technical Strategy
A novel aspect of our proposal is our plan to leverage extensively the interface standard provided by ESMF as a means to seamlessly clothe existing ESMF components as web-services. By making ESMF web-service aware, our approach will allow us to treat any ESMF conforming component as a web-service. Because the interface standard that ESMF defines is currently being adopted by a number of major climate and weather codes, PSANDA would automatically (and transparently) provide a fully functional starting point for a broad community of Earth system model developers to adapt their applications to a service oriented programming paradigm. In section 4.1 below we outline how this will be achieved technically. Following the approach in SERVO, we will align our service oriented “clothing” with OGSA (ref: ) complying forms of commercially supported service protocols (WSDL, SOAP, HTTP). This allows us to leverage, where appropriate, the significant business and commerce investments in production quality, economically sustainable software infrastructure. In section 4.2 below we describe the technologies developed in SERVO to date and lay out how these would be applied to develop the proposed PSANDA workbenches.
Combining ESMF and SERVO developed technologies enables us to develop maintainable, high-impact, production quality systems with relatively modest resources. The services to be integrated in PSANDA workbenches include simulation tools (atmosphere, ocean, seismic, biogeochemical codes), hardware platforms and massive datasets that represent planetary scale observations and state estimates. These services reside at numerous different locations, although in many cases location will be transparent to users of PSANDA workbenches. Different services would have radically different functions. We would use the emerging Grid and NMI “service description” standards to provide abstract representations of each service in terms of its external interfaces and its requirements. We would leverage work in the SERVO project, which has already begun down this path for some seismic scenarios, and work in the ESMF project, which is attaching standard interfaces to an array of atmosphere, ocean and climate codes.
4.1 Leveraging ESMF
In a user code that has adopted ESMF, the data and control flows into and out of that code pass through ESMF code. In this project we will make web-service aware those parts of ESMF that handle data-flow into and out of user code. This will provide the basis for codes that have adopted ESMF to become web-service compatible without requiring changes to the inner workings of those codes. The areas in ESMF that handle control and data flows into and out of user code are the ESMF component interface and the ESMF I/O and configuration attribute classes. These pieces of ESMF form part of the ESMF “sandwich” architecture shown in Figure 4. The component interface is part of the ESMF Superstructure layer. The I/O and configuration attribute classes are part of the ESMF Infrastructure layer. User code, such as an ocean simulation component or a component that maps between two different discrete grids, sits in between the Superstructure layer and the Infrastructure layer.

Figure 4 The ESMF "sandwich" architecture in which user written code sits between an upper level (the ESMF Superstructure layer) and a lower level (the ESMF Infrastructure layer). System software is behind the Infrastructure layer, and both User Code and the Superstructure layer code make calls to the Infrastructure layer.
We briefly describe the ESMF component interface and the ESMF I/O and configuration attribute layers below and then explain how they will be adapted to become web-service aware.

4.1.1 The ESMF component interface
Codes such as an atmospheric model or a data assimilation scheme become components within ESMF by supporting the ESMF component interface standard. The interface standard defines a small set of generic functions that can be used to drive a component. This set of functions is used to advance an ESMF component through its life-cycle. A component is driven by another ESMF component that is its “parent”. Components can be arrayed in a tree structure so that a parent component may have many children and may itself be a child of a component above it in the tree. Data flows between parent and child components are contained in a standard, extensible data-type called an ESMF State. The generic ESMF component interface functions accept input data contained in an ESMF State, called the component “import state”, and return output data contained in another ESMF State, called the component “export state”. This data flow is illustrated in Figure 5 by the arrows labeled 1 and 2 (showing data-flow into a component passing through ESMF code) and by the arrows labeled 7 and 8 (showing data-flow out of a component, again passing through ESMF code).

Figure 5 The standard ESMF architecture with key steps (1-8) of a component's life-cycle highlighted.
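The component interface concepts described in section 4.1.1 can be sketched in a few lines of Python. This is an illustration only: the class and method names below (State, Component, initialize, run, finalize) are hypothetical stand-ins chosen to mirror the life-cycle, the parent/child tree and the import/export states, and they do not reproduce the actual ESMF API.

```python
from typing import Dict, List

# Hypothetical stand-in for an ESMF State: a named, extensible bag of fields.
State = Dict[str, float]


class Component:
    """Minimal component with a three-phase life-cycle (names are assumptions)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List["Component"] = []
        self.log: List[str] = []  # records the life-cycle calls received

    def add_child(self, child: "Component") -> None:
        self.children.append(child)

    def initialize(self, import_state: State, export_state: State) -> None:
        self.log.append("initialize")
        for child in self.children:
            child.initialize(import_state, export_state)

    def run(self, import_state: State, export_state: State) -> None:
        self.log.append("run")
        for child in self.children:
            child.run(import_state, export_state)

    def finalize(self, import_state: State, export_state: State) -> None:
        self.log.append("finalize")
        for child in self.children:
            child.finalize(import_state, export_state)


class ToyOcean(Component):
    """Toy child: reads a forcing field and exports an updated temperature."""

    def run(self, import_state: State, export_state: State) -> None:
        super().run(import_state, export_state)
        heat_flux = import_state.get("heat_flux", 0.0)
        sst = export_state.get("sst", 10.0)
        export_state["sst"] = sst + 0.1 * heat_flux


def drive(parent: Component, import_state: State, steps: int) -> State:
    """Advance a component tree through its life-cycle; return the export state."""
    export_state: State = {}
    parent.initialize(import_state, export_state)
    for _ in range(steps):
        parent.run(import_state, export_state)
    parent.finalize(import_state, export_state)
    return export_state
```

In this sketch the parent only ever touches its children through the three generic calls and the two states, which is the property that lets a single interface layer stand between a component and whatever is driving it.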
4.1.2 The ESMF infrastructure layer
Two other parts of ESMF also handle data-flow into and out of components. These pieces handle reading of configuration parameters (from text files) and reading of large datasets of numeric and other information (from raw binary or from scientific data formats such as netCDF and HDF). The parts reside in the ESMF Configuration Attributes (CATTR) and ESMF I/O layers. Unlike the ESMF component interface, these parts are called from within an executing component. The arrows labeled 3 and 4 and the arrows labeled 5 and 6 show how these data flows into and out of user code also pass through ESMF code.

4.1.3 Grid/web service enabling ESMF
The ESMF component interface, the I/O and configuration attribute classes and the ESMF State type provide a natural way to connect to Grid/web services. Under this proposal we will deploy ESMF in a Grid/web-service enabled form, allowing any ESMF component to be controlled as a web-service as well as via a function call within a single program. This will allow ESMF components to be directed from remote clients (portals and other services) when desired. Figure 6 illustrates an augmented ESMF that supports the arrows labeled 1* and 8*. These arrows represent messages from or to Grid/web-services that are directed at the ESMF component interface generic functions. These messages will use standard Grid/web-service protocols and data transports, so the ESMF component interface layer will be extended through this proposal to support decoding and encoding of such messages in order to map them to the internal ESMF State type. The resulting ESMF State type is passed across the interfaces represented by the arrows labeled 2 and 7 just as in Figure 5. In the same manner, the arrows labeled 4* and 6* represent standard protocol Grid/web-service messages that must be translated to and from the ESMF types passed across the interfaces represented by the arrows labeled 3 and 4 in Figure 5 and in Figure 6.
Figure 6 The ESMF architecture augmented to support Grid/web-services. Only the ESMF library needs to be changed to support Grid/web-service control of an ESMF component. The user code of the component can remain unchanged.
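One way to picture the message decoding and encoding step is the following Python sketch. It assumes a simplified XML envelope (a state element with named field children) in place of a real SOAP message, and a hypothetical run function standing in for the component interface; none of these names come from ESMF or SERVO.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

State = Dict[str, float]  # hypothetical stand-in for an ESMF State


def decode_state(message: str) -> State:
    """Map an incoming XML message onto the internal state type (1* to 2)."""
    root = ET.fromstring(message)
    return {f.attrib["name"]: float(f.text) for f in root.findall("field")}


def encode_state(state: State) -> str:
    """Map an internal state onto an outgoing XML message (7 to 8*)."""
    root = ET.Element("state")
    for name in sorted(state):
        field = ET.SubElement(root, "field", {"name": name})
        field.text = repr(state[name])
    return ET.tostring(root, encoding="unicode")


def serve_request(message: str, run: Callable[[State], State]) -> str:
    """Decode a request, call the unchanged component code, encode the reply."""
    import_state = decode_state(message)
    export_state = run(import_state)
    return encode_state(export_state)


def toy_run(import_state: State) -> State:
    """Stand-in for user code; it never sees the XML, only the state type."""
    return {"sst": 10.0 + 0.1 * import_state.get("heat_flux", 0.0)}
```

A remote client would send something like `<state><field name="heat_flux">5.0</field></state>` to `serve_request`; the point of the design is that `toy_run` is identical whether it is called this way or directly within a single program.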
The approach we have described will allow ESMF components to function within a service based system without changing the ESMF component. When complete this will allow interconnection of ESMF components with other web-services for I/O purposes and for program data flow purposes. In this way we will, for example, be able to attach ESMF components to databases, pass ESMF component export states to visualization tools and control the operation of an ESMF component from a remote workbench. This will make ESMF components, including components used for constrained optimization, and SERVO services able to interoperate.
4.2
SERVOGrid technology
The SERVOGrid project is building a pure Web Service-based Grid to bind together earthquake modeling and forecasting codes with distributed databases and visualization services. All services are accessed through an OGCE-compatible portal system (Figure 2). SERVOGrid is an interdisciplinary endeavor, with a team composed of geologists, geophysicists, and computer scientists specializing in databases, high performance computing, and grid technologies. Partner institutions include JPL, the University of Southern California, Brown University, University of California (Irvine and Davis) and Indiana University. SERVO's international partners, interacting through ACES (http://www.aces.org.au/), include universities and government agencies in Australia, Japan, and China. SERVOGrid supports a wide range of deployed scientific applications, including finite element codes (GeoFEST), Monte Carlo applications (VirtualCalifornia), mesh generation tools, and data mining applications. These applications are linked to relational and XML databases that store crustal faults ("QuakeTables"), GPS data archives, and seismic catalogs. These databases may be used directly by users, or they may be coupled to simulation codes. Visualization services include GMT (Generic Mapping Tools), RIVA, and ParVox. The latter two are high performance visualization tools developed in-house at JPL. Apache Ant-based job management tools allow users to execute composed, multi-staged meta-applications. SERVO's initial development phase includes a testbed of distributed resources supplied by JPL, the University of Southern California, and Indiana University, linked through the Web Service infrastructure described below.
SERVOGrid's core Web Services include the following: a) file transfer from desktop to backend and between backend machines; b) remote command execution to support job submission, including multi-staged composite jobs; c) job monitoring on backend resources; d) event/call back services; e) services to manage user context (or session) metadata, allowing the user to archive and revisit project folders; f) database services that provide both user-based
and programmatic access to XML and relational databases; and g) application metadata management services that manage information about deployed applications. These services are bound into a cohesive Grid as shown in Figure 7. SERVO's approach is to use WSDL to define abstract interfaces to these general purpose services. We may then implement the services in any number of ways, including wrappers around the Java CoG Kit for accessing Grid services. Globus Toolkit 3 and higher-style services may also be built around this approach, although we favor pure Web services for the current implementation. The use of WSDL simplifies deployment, as it does not presuppose any underlying Grid infrastructure. The client stubs for these various services are managed through the User Interface Server. SERVO user access is based around a component-based grid computing portal that is compatible with OGCE technology. Pierce is the OGCE principal investigator. Future SERVOGrid work will concentrate on supporting the more sophisticated data flows needed to build data assimilation tools for computational steering. We are also developing more sophisticated grid workflows to support streaming data sources (particularly sensor grids), Geographical Information System grid/web services based on OpenGIS Consortium standards, and RDF/OWL-based information systems for managing the distributed components in Figure 7.
Figure 7 SERVO Grid architecture
(Figure 7 diagram labels: Portlet Based User Interface; HTTP; Client Stubs; SOAP; DB Service 1; Job Sub/Mon and File Services; Visualization Services; Grid RPC; JDBC; Operating and Queuing Systems; DB; GMT; Host 1; Host 2; Host 3.)
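The separation between an abstract service interface and its interchangeable implementations, which WSDL provides for the SERVO services, can be sketched in Python as follows. This is an analogy only, not SERVO code: the abstract base class plays the role of a WSDL interface definition, and the names JobSubmissionService, InMemoryJobService, LoggingJobService and run_and_check are hypothetical.

```python
from abc import ABC, abstractmethod


class JobSubmissionService(ABC):
    """Abstract interface, standing in for a WSDL-defined service."""

    @abstractmethod
    def submit(self, command: str) -> str:
        """Submit a command and return a job identifier."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Return the current status of a job."""


class InMemoryJobService(JobSubmissionService):
    """Toy implementation that records jobs locally instead of running them."""

    def __init__(self):
        self._jobs = {}

    def submit(self, command: str) -> str:
        job_id = f"local-{len(self._jobs) + 1}"
        self._jobs[job_id] = command
        return job_id

    def status(self, job_id: str) -> str:
        return "DONE" if job_id in self._jobs else "UNKNOWN"


class LoggingJobService(JobSubmissionService):
    """Second implementation: wraps another backend, much as a wrapper
    around an existing Grid toolkit would."""

    def __init__(self, backend: JobSubmissionService):
        self.backend = backend
        self.log = []

    def submit(self, command: str) -> str:
        self.log.append(command)
        return self.backend.submit(command)

    def status(self, job_id: str) -> str:
        return self.backend.status(job_id)


def run_and_check(service: JobSubmissionService, command: str):
    """Client code depends only on the abstract interface."""
    job_id = service.submit(command)
    return job_id, service.status(job_id)


result = run_and_check(LoggingJobService(InMemoryJobService()), "geofest input.dat")
print(result)
```

Because run_and_check is written against the abstract interface, the backend can be swapped without touching client code, which is the deployment benefit the text attributes to WSDL.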
4.3
Generic workbench architecture
We think of our workbenches as portal-based systems that have been customized to a specific application subdomain. In contrast to a traditional problem solving environment (PSE), a workbench will contain a collection of potentially distributed services arrayed together in a particular way for a specific purpose. Each PSANDA workbench would be built in a layered approach that allows specialization to particular sub-domains at levels visible to end-users whilst building on general portal technologies "under the hood". Thus much of the workbench construction can utilize existing software, and it is likely that areas of workbench development that require custom development may find application elsewhere. Under the hood, PSANDA workbenches will make extensive use of existing NMI technologies, including GRIDS Center releases and OGCE portal releases. As described in Sec 4.1, PSANDA workbenches build on high performance ESMF component technologies to build supercomputing applications and Grid/Web Service
technologies to manage deployment requirements and inter-component communications with latencies of a millisecond or more. The GRIDS Center release's core technologies, particularly the Globus Toolkit and the Condor scheduler, will be leveraged by this proposal. Globus technologies provide the tools needed to bring ESMF-built application workbenches onto deployed grids: Globus GSI security will be used to provide secure access to deployed host resources (described in Sec 4.5). We will manage user credentials with NMI tools such as MyProxy. GRAM and the more recent Job Manager Service will be used to launch applications remotely. GridFTP will be used for high performance file transfer since, as discussed in the workbench scenarios, many PSANDA applications will generate very large data sets. Condor-G will be used for high job throughput and sophisticated scheduling on distributed resources. PSANDA services will be built using Web and Grid Services. As Globus 4 technologies become hardened by NMI testing, we will adopt proven GT4 Web Service components that are available through the NMI. PSANDA will also take full advantage of the OGCE portal tools. We will base PSANDA workbenches on Service Oriented Architectures [REF], so different types of client-side tools may be developed. However, prominent effort will be devoted to Computing Web Portal-based workbenches. These will be based on the OGCE portlet tools for the standard Grid services enumerated above. As necessary, we will develop workbench portlets and contribute them back to the OGCE for use in other OGCE-compatible projects.
4.4
Workbench specialization to a specific sub-domain
Several of our workbenches will employ OGCE portal tools and therefore require limited customization. However, for other projects we wish to explore the use of custom non-browser client software. The alternate clients we will employ are Matlab[48] and the CCA[49] GUI client. These clients will respectively allow us to (1) use the Matlab advanced graphics and built-in linear algebra capabilities seamlessly and (2) utilize the CCA breadboard drawing tool to flexibly wire modeling components together. To make these clients compatible with standards-based Grid/web-service protocols, we will deploy proxies that map client calls to web-service requests.
4.5
Target hosts for deployment
The proposal team has available a number of high-end supercomputing resources and high-end cluster platforms. The platform mix will provide a solid basis for establishing the platform portability of our core software and will provide adequate resources to support significant production computation. We also anticipate applying for computational resources at NSF and DOE national centers. The testbed sites include: (1) at M.I.T., four clusters with an aggregate CPU count of 550 processors, interconnected by 10 gigabit Ethernet[50]; (2) at NASA-JPL, a $3M cluster facility, currently under purchase, with 1000 processors and a 64-processor SGI Altix; (3) at NASA GSFC, some access to a 1000 processor Alpha based system and a 500 processor Myrinet cluster; (4) at PU-NOAA GFDL, some limited access to a 1000 processor SGI MIPS based facility and extensive access to a 32 processor Linux cluster facility; (5) at NCAR, some access to a 1000 processor IBM Power series CPU cluster; (6) at NASA Ames, a 512 processor SGI Altix and a 24 processor SGI MIPS system. Network connections between the sites can sustain at least 50 Mbit/s, making it possible to transfer as much as 1 TB of data between these sites in a matter of days. As the project progresses we will work with systems groups at each of the sites to ensure that appropriate software and resource allocations are available for this project.
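The transfer-time estimate above can be checked with a short calculation. The sketch below assumes a decimal terabyte (10^12 bytes), a sustained rate of exactly 50 Mbit/s, and no protocol overhead; the function name transfer_days is hypothetical.

```python
def transfer_days(data_bytes: float, rate_bits_per_s: float) -> float:
    """Days needed to move data_bytes at a sustained rate_bits_per_s."""
    seconds = data_bytes * 8 / rate_bits_per_s
    return seconds / 86400  # 86400 seconds per day


# 1 TB (assumed 10**12 bytes) over a sustained 50 Mbit/s link.
days = transfer_days(1e12, 50e6)
print(f"{days:.2f} days")
```

Under these assumptions the transfer takes 160,000 seconds, a little under two days, which is consistent with "a matter of days".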
4.6
Bridges to the Common Component Architecture
The ESMF component model rivals the CCA for simplicity and minimality in its requirements. These properties are essential for a framework that needs to host a wide variety of scientific simulations and help provide a stepwise upgrade path from separate standalone codes to a single multidisciplinary simulation consisting of interoperating parts. Although it is out of the scope of this project to develop CCA/Grid/ESMF interoperability, our team does have the necessary connections to the CCA community (through Gannon and Bramley on the IU team) to ensure long term compatibility, establish voluntary interoperability working groups, and otherwise enable cooperation between the communities. We have extensive collaborations with the ESMF team working on the CCA. Nancy Collins, an ESMF Joint Specification Team member, has actively participated in CCA meetings for two years now, and hosted the last CCA quarterly meeting. The CCA is an ESMF partner (http://www.esmf.ucar.edu/). At an Indiana University sponsored
workshop (http://www.cs.indiana.edu/~bramley/cca/cca.html) last September, CCA interfaces were defined for some ESMF modules and prototypes were successfully run using CCA frameworks. This weeklong event included climate model researchers from other efforts (CCSM, the Model Coupling Toolkit, the DDB), so shared CCA interfaces may enable bridging among the different climate modeling frameworks. In addition, web service connections to DOE CCA frameworks will provide ESMF researchers with an extensive parallel toolkit of mesh generators, PDE solvers, linear and nonlinear solvers, etc. Bramley and Gannon have been actively researching the provision of CCA interfaces as web services, and the CCA standard has been influenced to be amenable to this approach. As part of our CCA work we are also working closely with ESMF researchers in developing data models for CCA. In June a jointly sponsored IU-ORNL workshop will be convened to develop data models, and it is now a requirement that the CCA model handle ESMF fields, grids, and local arrays. The web services approach proposed here also potentially adds significant value to the CCA effort by contributing distributed data service interfaces. Current CCA implementations are limited to standard file I/O and cannot access or work with remotely located data.
5
Project plans and milestones
Year 1 activities
Mapping ESMF component interface and ESMF State type to WSDL.
Determine granularity of services (i.e. how to partition codes).
Workbench requirements definition and detailed design.
Limited deployment of prototype OGCE portal based workbenches (OPTIM, HIGHRES and VC).
Add web-service binding to ESMF component interface.
Development of build and test tools for glue software and for monitoring deployed services.
Establishment of PSANDA front end web site and PSANDA open source distribution and support site.
Year 2 activities
Add web-service capability to ESMF CATTR and I/O interface.
Full deployment of OGCE workbenches.
Development of Matlab and CCA GUI workbenches (COUPLE, GEOD).
Publication of draft system and user documentation.
Development of custom code insertion service.
Year 3 activities
Development of workbenches for general install, including test tools.
Analysis of application web-services to internal ESMF component coupling.
Full deployment of Matlab and CCA GUI based workbenches.
Full deployment of code insertion service.
Publication of final system and user documentation.
5.1
Software Engineering Plan
The glue software we develop will be made available through SourceForge…, regression tests, netsaint based deployed service tests, web-site for user and developer documents. (ref: MITgcm testing and docs, ESMF testing and docs, developers HOWTOS etc…)
6
Results from Prior NSF Support
References
1. Boville, B.A. and P.R. Gent, The NCAR Climate System Model Version One. Journal of Climate, 1998. 11: p. 1115-1130.
2. DAO, GEOS-3 Data Assimilation System, in Office Note Series on Global Modeling and Data Assimilation. 1997, NASA.
3. Donellan, A. and G. Lyzenga, Fault afterslip and upper crustal relaxation following the Northridge earthquake. Journal of Geophysical Research, 1998. 103(21): p. 21,285-21,297.
4. Hudnut, K.W., et al., Co-seismic displacements of the 1994 Northridge California Earthquake. Bulletin of the Seismological Society of America, 1996. 86: p. S19-S36.
5. Marshall, J., et al., Hydrostatic, quasi-hydrostatic and nonhydrostatic ocean modeling. Journal of Geophysical Research, 1997. 102: p. 5733-5752.
6. Michalakes, J., et al. Development of a Next Generation Regional Weather Research and Forecast Model. in Ninth ECMWF Workshop on the Use of High Performance Computing in Meteorology. 2001. Reading, UK: World Scientific, Singapore. p. 269-276.
7. Pacanowski, R. and S. Griffies, The MOM 3 Manual. 1999, Geophysical Fluid Dynamics Laboratory: Princeton. p. 680.
8. Suarez, M. and L. Takacs, Documentation of the ARIES/GEOS dynamical core, Version 2, in NASA Technical Memo. 1995, Goddard Space Flight Center.
9. Hill, C., et al., The architecture of the earth system modeling framework. Computing in Science and Engineering, 2004. 6(1): p. 18-28.
10. Donellan, A., et al., Illuminating the Earth's Interior Through Advanced Computing. Computing in Science and Engineering, 2004. 6(1): p. 36-44.
11. ESMF, The Earth System Modeling Framework Web Site. 2004. http://www.esmf.ucar.edu
12. Quakesim, The Quakesim project web site. http://wwwaig.jpl.nasa.gov/public/dus/quakesim/index.html
13. SERVO, The Solid Earth Virtual Observatory web site. http://www.servogrid.org
14. Pierce, M., C. Youn, and G. Fox. Interacting Data Services for Distributed Earthquake Modeling. in International Conference on Computational Science. 2003: Springer-Verlag. Lecture Notes on Computer Science vol. 2659, p. 863-872.
15. Dickinson, R., et al., How can we advance our weather and climate models as a community. Bulletin of the American Meteorological Society, 2002. 83(3).
16. FMS, The Geophysical Fluid Dynamics Laboratory Flexible Modeling System Web Site. http://www.gfdl.noaa.gov/~fms
17. Hill, C., et al. A strategy for tera-scale climate modeling. in Eighth ECMWF Conference on the Use of Parallel Processors in Meteorology. 1997. Reading, UK: World Scientific, Singapore.
18. Jones, P., First- and Second-Order conservative remapping schemes for Grids in Spherical Coordinates. Monthly Weather Review, 1999. 127: p. 2204-2210.
19. Larson, J., et al. The Model Coupling Toolkit. in International Conference on Computer Science. 2001: Springer-Verlag. Lecture Notes in Computer Science vol. 2073, p. 185-194.
20. Sawyer, W., et al. Parallel Grid Manipulations in Earth Science Calculations. in 3rd International Meeting on Vector and Parallel Processing. 1998: Springer-Verlag. Lecture Notes in Computer Science - VECPAR, p. 666-679.
21. Schaffer, D.S. and M. Suarez, Design and performance analysis of a massively parallel Atmospheric General Circulation Model. Scientific Programming, 2000. 8: p. 40-57.
22. Derber, J.C., D.F. Parrish, and S.J. Lord, The new global operational system at the National Meteorological Center. Weather and Forecasting, 1991. 6: p. 538-547.
23. Sela, J., Spectral modeling at the National Meteorological Center. Monthly Weather Review, 1980. 108: p. 1279.
24. NCEP, The National Center for Environmental Prediction. http://wwwt.ncep.noaa.gov
25. CCSM, The Community Climate System Model Web Site. http://www.ccsm.ucar.edu/
26. Dukowicz, J., R. Smith, and R. Malone, A reformulation and implementation of the Bryan-Cox-Semtner ocean model on the connection machine. Journal of Atmosphere and Ocean Technology, 1993. 10: p. 195-208.
27. Vertenstein, M., K. Oleson, and S. Levis, CLM 2.0 Users Guide, in Community Climate System Model, National Center for Atmospheric Research.
28. IPCC, Intergovernmental Panel on Climate Change Web Site. http://www.ipcc.ch/
29. OGCE, The Open Grid Computing Environment Web Site.
30. Pierce, M., et al. Interoperable Web Services for Computational Portals. in Supercomputing 2002. 2002: IEEE. Proceedings of Supercomputing 2002.
31. DLESE, The Digital Library for Earth System Education web site. http://www.dlese.org
32. ESG, The Earth System Grid website. http://www.earthsystemgrid.org
33. ESP, The Earth Science Portal web site. http://data1.gfdl.noaa.gov/~ck/esp/webpages/
34. GEON, The GEON (Cyberinfrastructure for the Geosciences) web site. http://www.geongrid.org/
35. NOMADS, The NOAA Operational Model Archive and Distributions System web site. http://www.ncdc.noaa.gov/oa/climate/nomads/nomads.html
36. NVODS, The National Virtual Ocean Data System. http://www.po.gso.uri.edu/tracking/vodhub/vodhubhome.html
37. PBO-Steering-Committee, The Plate Boundary Observatory: Creating a Four-Dimensional Image of the Deformation of Western North America. 1999. http://www.unavco.org/research_science/publications/proposals/PBOwhitepaper.pdf
38. Stammer, D., et al., State estimation in modern oceanographic research. EOS, 2002. 83(27): p. 289, 294-295.
39. H.P., B., H. C., and T. B., Mantle circulation models with variational data assimilation: Inferring past mantle flow and structure from plate motion histories and seismic tomography. Geophys. J. Int., 2003. 152: p. 280-301.
40. Stammer, D., et al., Volume, heat and freshwater transports of the global ocean circulation 1992-1997, estimated from a general circulation model constrained by WOCE data. Journal of Geophysical Research, 2003: DOI: 10.1029/2001JC001115, 2002 C1, 2003.
41. GRACE, The Gravity Recovery and Climate Experiment web site. http://www.csr.utexas.edu/grace/
42. Dickey, J., et al., Recent Earth oblateness variations: Unraveling climate and postglacial rebound effects. Science, 2002. 298: p. 1975-1977.
43. Ponte, R., D. Stammer, and J. Marshall, Ocean signals in observed motions of the Earth's pole. Nature, 1998. 391(29): p. 476-479.
44. T/P, The TOPEX/POSEIDON Satellite Altimeter web site. http://topex-www.jpl.nasa.gov/
45. Earthscope, The Earthscope web site. http://www.earthscope.org
46. ECCO, The ECCO web site.
47. Zhou, S., et al. Prototyping an ESMF CCA interface. in NASA Earth System Technology Conference. 2003. vol. A4, p. 3.
48. Mathworks, The Mathworks Simulink and Matlab web site.
49. Armstrong, R., et al. Toward a Common Component Architecture for High-Performance Scientific Computing. in High Performance and Distributed Computing. 1999.
50. ACES, The Alliance for Computational Earth Science web site. http://acesgrid.org
7
Budget Justification
IU: 1 FTE + ????
Caltech/JPL: 0.75 FTE + 1 PostDoc
MIT: 0.5 + 0.5 FTE + 1 PostDoc + 0.2 project admin
NCAR: 1 Postdoc + 0.5 FTE senior ESMF developer co-sponsored
PU-GFDL: 0.5 FTE