QuakeSim: Grid Computing, Web
Services, and Portals for
Community Grids Lab
Prof. Geoffrey Fox, CGL Director
Many external collaborators: Andrea Donnellan and team
(JPL), Yehuda Bock and team (Scripps/UCSD), Neil
Devadason, John Buechler, and David Coats (POLIS)
Dr. Yili Gong
Choonhan Youn (now with GEON project)*
Mehmet S. Aktas
Jong Youl Choi
Grids and Cyberinfrastructure
Cyberinfrastructure is a term coined by the
National Science Foundation in the
famous “Atkins Report”.
Prof. Dan Atkins (UM) is now the head of
NSF’s Office of Cyberinfrastructure.
Roughly synonymous with
Grid Computing (DOE and NSF)
Global Information Grid (DOD), etc.
What Is CI, Really?
Computing, Data Storage, Networking
NSF TeraGrid (www.teragrid.org)
Open Science Grid (www.opensciencegrid.org)
Many international equivalents
Globus: multi-institutional security, job management, file
transfer, data management, system monitoring
Condor: Cycle-scavenging and job scheduling.
And many others: see for example the TeraGrid’s Common
TeraGrid Software Stack, the OSG’s Virtual Data Toolkit and
the NMI Grids Center for composite releases.
Scientific Gateways (like QuakeSim)
Useful Online Services
NIH’s PubMed, PubChem
Most Grids are built these days with Web Services
Contributions from Choonhan
Youn, Ahmet Sayar, Galip Aydin,
Harsh Gadgil, and collaborators
QuakeSim is an example of a science
Google “TeraGrid Science Gateways” for
Combines a Web portal and Web
services to access on-line data sources
and connect them to geophysical
applications running on computing
QuakeSim Applications and Their Data
Pattern Informatics (UC-Davis)
Earthquake forecasting code, uses seismic archives as
Regularized Dynamic Annealing Hidden Markov
Method (RDAHMM) (JPL)
Time series analysis code, can be applied to GPS and
Identifies signal components (possibly associated with
underlying physical causes) with no fixed parameters.
Finite element code for detailed modeling of fault
stresses, seismic displacements, uses fault models as
QuakeTables Fault Database
QuakeSim’s fault repository for California.
Compatible with GeoFEST, Disloc, VC
GPS Data sources and formats (RDAHMM and others).
Seismic Event Data (RDAHMM and others)
[Figure: QuakeSim architecture diagram. Labels: Browser Interface; JSP + Client Stubs; SOAP/HTTP; WSDL service interfaces; DB Service; Job Sub/Mon and Queuing; Visualization or Map; DB; Host 1 (WFS), Host 2 (Grid), Host 3 (WMS).]
GIS Services as a Data Grid
We decided that the Data Grid components of SERVO are
best implemented using standard GIS services.
Use Open Geospatial Consortium standards
Maximize reusability in future QuakeSim projects
Provide downloadable GIS software to the community as a side
effect of QuakeSim research.
We implemented two cornerstone standards
Web Feature Service (WFS): data service for storing abstract map
Faults, GPS, seismic records
Web Map Service (WMS): generate interactive maps from WFS’s
and other WMS’s.
We built these as Web Services
WSDL and SOAP: programming interfaces and messaging formats
You can work with the data and map services through programming
APIs as well as browser interfaces.
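As an illustration of the programming-API route, here is a minimal sketch of how a client might build a WMS GetMap request URL. The parameter names follow the OGC WMS 1.1.1 convention; the endpoint `http://example.org/wms` and the layer names are hypothetical placeholders, not QuakeSim's actual services.

```python
from urllib.parse import urlencode

def build_getmap_url(base_url, layers, bbox, width=512, height=512,
                     srs="EPSG:4326", image_format="image/png"):
    """Return a WMS 1.1.1 GetMap URL for the given layers and bounding box."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",                                # default style for every layer
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),      # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint and layer names, for illustration only.
url = build_getmap_url("http://example.org/wms",
                       ["faults", "gps_stations"],
                       (-125.0, 32.0, -114.0, 42.0))
print(url)
```

The same URL works from a browser or from a program, which is the point of exposing maps through a standard service interface.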
satellite maps with
overlays for Los
This has been our simplest “proving
Integrates (streaming) WFS, WMS,
WS-Context, and HPSearch’s
WSProxy services (wraps PI
executable and helper format
This is basically a linear workflow
Whole earth seismic catalog plotted on
NASA map server. Combines
streaming feature server and map
Pattern informatics results combined with
Feature and Map servers can be used to
forecast areas of increased earthquake
Data Flow or Event Flow?
Octopus slide implies a sequential data flow between
applications on distributed hosts.
Usually called “scientific workflow” in the CI community.
See http://vtcpc.isi.edu/wiki/ for an overview and players.
This is not MPI or parallel programming. It’s more like a stone
Services don’t need to know much about each other.
Don’t have to be from the same providers
Transfer data (or URL pointers) as needed.
Event flow and traditional message passing are better suited
for closely coupled applications.
See for example DOE’s CCA project and NASA’s Earth System
Modeling Framework (ESMF).
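A minimal sketch of the sequential data flow idea: each service only receives a data reference (here a URL string) and returns a new one, so the stages need to know nothing else about each other. The three stage functions and the `example.org` URLs are hypothetical stand-ins for remote services, not actual QuakeSim components.

```python
def run_workflow(stages, data_ref):
    """Pass a data reference through each stage in order.

    Returns the final reference and the list of all intermediate references.
    """
    history = [data_ref]
    for stage in stages:
        data_ref = stage(data_ref)
        history.append(data_ref)
    return data_ref, history

# Hypothetical stand-ins for remote services; each maps one URL to another.
def query_catalog(ref):
    return ref + "/catalog.xml"

def run_forecast(ref):
    return ref.replace("catalog.xml", "forecast.out")

def render_map(ref):
    return ref.replace("forecast.out", "map.png")

final_ref, history = run_workflow(
    [query_catalog, run_forecast, render_map],
    "http://example.org/data",
)
print(final_ref)
```

Swapping a stage for one from a different provider only requires that it accept and return the same kind of reference.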
We use JSR 168 portlets to
build sharable portal plugins.
Portlets: Portal Components
Web portals are essentially websites with
Personalization, content control, etc, derive from
Java portals are based on a standard
Components are called portlets
JSR 168 is the standard
Many TeraGrid and other science gateways
use this standard.
Portlet: Description
RDAHMM: Set up and run RDAHMM, query Scripps GRWS GPS Service, maintain persistent
ST_Filter: Similar to RDAHMM portlet; ST_Filter has much more input.
Station Monitor: Shows GPS stations on a Google Map, displays last 10 minutes of data.
Real Time RDAHMM: Displays RDAHMM results of last 10 minutes of GPS data in a Google map.
Seismic Archive Query Portlet: Google Map portlet that shows seismic events based on your query.
Fault Query Portlet: Allows you to query the QuakeTables fault database for information on faults.
RDAHMM Portlet: Main
RDAHMM Project Set Up
RDAHMM GRWS Query
RDAHMM Results Page
Real Time RDAHMM Portlet
Station Monitor Portlet
Managing Real Time GPS
Slides from Galip Aydin
California Real Time Network
Continuous GPS Stations (CGPS) are depicted as triangles while the Real-Time stations are
represented as circles. Image is obtained from SOPAC GPS Explorer at
http://sopac.ucsd.edu/projects/realtime

Network Data Rates by Message Format

Network         Time       RYO        ASCII      GML
CRTN GPS        1 second   1.5KB      4.03KB     48.7KB
                1 hour     5.31MB     14.18MB    171.31MB
                1 day      127.44MB   340.38MB   4.01GB
                1 month    3.8GB      9.97GB     123.3GB
                1 year     45.8GB     119.67GB   1.41TB
Network (250    1 year     1.23TB     16.18TB    160TB
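The larger rows of the data-rate table follow from the per-second message sizes by simple scaling. The sketch below shows the arithmetic for the RYO format; it assumes 1 MB = 1024 KB, which the slide does not state, and because the per-second figures are rounded the results only approximate the tabulated values.

```python
SECONDS_PER_WINDOW = {"1 hour": 3600, "1 day": 86400}

def volume_kb(per_second_kb, window):
    """Total volume in KB produced over the named time window."""
    return per_second_kb * SECONDS_PER_WINDOW[window]

def kb_to_mb(kb):
    """Convert KB to MB, assuming 1 MB = 1024 KB (an assumption)."""
    return kb / 1024

ryo_hour_mb = kb_to_mb(volume_kb(1.5, "1 hour"))
ryo_day_mb = kb_to_mb(volume_kb(1.5, "1 day"))
print(f"RYO, 1 hour: {ryo_hour_mb:.2f} MB")
print(f"RYO, 1 day:  {ryo_day_mb:.2f} MB")
```

The same function applied to the ASCII and GML per-second sizes shows how quickly the verbose formats grow over a day or a year.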
How does one manage all the data generated by the
85 stations? How can you get just the data you want?
Note this is fundamentally different from traditional
request/response style Web Services.
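One way to picture the difference from request/response is a topic-based publish/subscribe broker: clients register interest in a topic once, and matching messages are pushed to them as they arrive. The in-memory `Broker` class and the topic names below are illustrative assumptions only; they are not the messaging system used by the project.

```python
from collections import defaultdict

class Broker:
    """Toy in-memory publish/subscribe broker keyed by topic name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to be invoked for every message on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to all subscribers; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)

# Hypothetical topic names: one topic per GPS station.
broker = Broker()
received = []
broker.subscribe("gps/STATION_A", received.append)

broker.publish("gps/STATION_A", {"t": 0, "xyz": (1.0, 2.0, 3.0)})
broker.publish("gps/STATION_B", {"t": 0, "xyz": (4.0, 5.0, 6.0)})  # no subscriber
print(received)
```

A subscriber interested in one station never sees traffic from the others, which is how a client gets just the data it wants.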
Processing Real-Time GPS Streams
[Figure: filter chain from the GPS networks (raw data) through the ryo2nb, ryo2ascii, ascii2pos, Single Station, and RDAHMM filters.]
A Complete Sensor Message Processing Path, including a data analysis application.
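A message processing path like this can be sketched as a chain of generators, each consuming the previous filter's output. The three filters below are simplified stand-ins: the text record layout `STATION x y z` is invented for illustration and is not the real RYO or position format.

```python
def decode(raw_messages):
    """Stand-in for a binary-to-text filter: bytes -> str."""
    for raw in raw_messages:
        yield raw.decode("ascii")

def to_positions(lines):
    """Stand-in for a text-to-position filter: 'STATION x y z' -> dict."""
    for line in lines:
        station, x, y, z = line.split()
        yield {"station": station, "xyz": (float(x), float(y), float(z))}

def single_station(positions, station):
    """Keep only the records that belong to one station."""
    for position in positions:
        if position["station"] == station:
            yield position

# Invented sample records for two hypothetical stations.
raw = [b"STA1 1.0 2.0 3.0", b"STA2 4.0 5.0 6.0", b"STA1 1.1 2.1 3.1"]
chain = single_station(to_positions(decode(raw)), "STA1")
result = list(chain)
print(result)
```

Because each filter only depends on the shape of its input, an analysis step can be appended to the end of the chain without touching the earlier filters.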
Application Integration with Real-Time Filters
[Figure: the Station Monitor Filter and the RDAHMM Filter each record real-time data for 10 minutes. The diagram labels also mention calculating position, determining state changes in the XYZ signal, and a Graph Plotter Application.]
2 – Multiple Publishers Test
[Figure: Multiple Publishers Test results plotted against time of the day.]
We add more GPS networks by running more publishers.
The results show that 1000 publishers can be supported
with no performance loss. This is an operating system
4 – Multiple Brokers Test
creation of Broker networks.
We create a two-broker
Messages published to first broker can be received from the second broker.
We take timings on each
We connect 750 clients to each broker and run for 24 hours. We chose 750 clients to
stay well below the saturation
The results show that the performance is very good and similar to single broker test.
[Figure: two-broker test setup with a converter, simple filters, and two brokers.]
Slides courtesy of Zao Liu
Integrating Map Servers
Geographical Information Systems combine online dynamic
maps and databases.
Many GIS software packages exist
GIS servers around state of Indiana
ESRI ArcIMS and ArcMap Server (Marion, Vanderburgh,
Hancock, Kosciusko, Huntington, Tippecanoe)
Autodesk MapGuide (Hamilton, Hendricks, Monroe,
WTH Mapserver™ Web Mapping Application (Fulton,
Cass, Daviess, City of Huntingburg) based on several
Open Source projects (Minnesota Map Server)
Challenge: make 17 different county map servers from different
companies work together.
92 counties in Indiana, so potentially 92 different map
We assume heterogeneity in GIS map and feature
GIS services are organized bottom-up rather than top-down.
Local city governments, 92 different county governments,
multiple Indiana state agencies, inter-state (Ohio, Kentucky)
consideration, federal government data providers (Hazus).
Must find a way to federate existing services.
We must reconcile ESRI, Autodesk, OGC, Google Map,
and other technical approaches.
Must try to take advantage of Google, ESRI, etc rather than
We must have good performance and interactivity.
Servers must respond quickly--launching queries to 20 different
map servers is very inefficient.
Clients should have simplicity and interactivity of Google Maps
and similar AJAX style applications.
Caching and Tiling Maps
Federation through caching:
WMS and WFS resources are queried and results are stored on the
WMS images are stored as tiles.
These can be assembled into new images on demand (c. f. Google
Projections and styling can be reconciled.
We can store multiple layers this way.
We build adapters that can work with ESRI and OGC products; tailor to
Serving images as tiles
Client programs obtain images directly from our tile server.
That is, don’t go back to the original WMS for every request.
Similar approaches can be used to mediate WFS requests.
This works with Google Map-based clients.
The tile server can re-cache and tile on demand if tile sections are
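A minimal sketch of the caching idea: map a coordinate to a tile index, then serve tiles from a local cache and go back to the origin map server only on a miss. The tile indexing shown is the Web Mercator scheme commonly used by web map clients; the slides do not say which scheme the tile server uses, so treat it as an assumption. `fake_fetch` is a hypothetical stand-in for a request to an upstream WMS.

```python
import math

def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Return (x, y) tile indices for a point, using Web Mercator tiling."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

class TileCache:
    """Serve tiles from memory; call the origin server only on a miss."""

    def __init__(self, fetch_tile):
        self._fetch_tile = fetch_tile   # callable(layer, zoom, x, y) -> bytes
        self._tiles = {}
        self.misses = 0

    def get(self, layer, zoom, x, y):
        key = (layer, zoom, x, y)
        if key not in self._tiles:
            self.misses += 1
            self._tiles[key] = self._fetch_tile(layer, zoom, x, y)
        return self._tiles[key]

def fake_fetch(layer, zoom, x, y):
    """Hypothetical stand-in for a request to an upstream map server."""
    return f"{layer}/{zoom}/{x}/{y}".encode("ascii")

cache = TileCache(fake_fetch)
x, y = latlon_to_tile(0.0, 0.0, 1)
first = cache.get("parcels", 1, x, y)    # miss: fetched from the origin
second = cache.get("parcels", 1, x, y)   # hit: served from the cache
print(x, y, cache.misses)
```

Keying the cache on layer as well as tile position lets several layers from different county servers share one tile store.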
[Figure: map tile caching architecture. Components: Google Maps Server, Hamilton County Map Server, Cass County Map Server (OGC Web Map ...), one Adapter per map server, Tile Server, Cache Server, Browser.]
Must provide adapters for each Map Server type.
Tile Server requests map tiles at all zoom levels with all layers. These are converted to uniform projection, indexed, and stored.
The cache server fulfills Google map calls with cached tiles at the requested bounding box that fill the bounding box.
Browser client fetches image tiles for the bounding box using Google Map API.
Map Server Example
Marion and Hancock
county parcel plots and
IDs are overlaid on IU
images that are
accessed by this
mashup using Google
We cache and tile all
the images from several
different map servers.
(Marion and Hancock
actually use different
It’s the Data, Stupid
Grids have been distracted by complicated security
Accounts, allocations, authentication, etc on
It assumes a lot of people actually want to do this.
But arguably most people really want access to data
and results, not computers.
Ex: PubChem has properties on 12 million drug-like
molecules online, can be browsed for free.
The Grid security model is equivalent to actually giving you a
key to the lab.
My suggestion: leave the Grid to the experts and try
to think of as many online data services as can be
created using results from TeraGrid resources.
Challenge: use all of the TeraGrid, NASA, Open
Science Grid, China National Grid, etc, etc to
Multiple Grid Job Execution
QuakeSim and many similar science gateways
have a generally correct approach...
Web Services, online components.
...but arguably the details need to be changed.
We have been following the Enterprise model
(IBM, HP, MS, Sun).
JSR 168, WSRP, WSDL, SOAP, WS-*
Maybe time to switch to the Internet model
Google desktop, Netvibes startpage
Programmable Web, mash ups, AJAX, REST, etc.
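To make the contrast concrete, the sketch below expresses the same query twice: once as a SOAP 1.1 envelope (the Enterprise style) and once as a plain URL (the REST style). The operation name `getFault`, the namespace, and the `example.org` endpoint are hypothetical; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace

def soap_request(operation, namespace, params):
    """Build a SOAP 1.1 envelope wrapping one operation call."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{namespace}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{namespace}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")

def rest_request(base_url, resource, params):
    """Build the equivalent REST-style GET URL."""
    return f"{base_url}/{resource}?{urlencode(params)}"

# Hypothetical operation, namespace, and endpoint, for illustration only.
soap_xml = soap_request("getFault", "http://example.org/faults", {"name": "Hayward"})
rest_url = rest_request("http://example.org", "faults", {"name": "Hayward"})
print(soap_xml)
print(rest_url)
```

The REST form can be pasted into a browser or a mashup script, which is much of its appeal for lightweight clients.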
www.quakesim.org (being updated)
Tying It All Together:
HPSearch is an engine for orchestrating distributed Web Service
It uses an event system and supports both file transfers and data
HPSearch engine binds the flow to a particular set of remote
services and executes the script.
HPSearch engines are Web Services, can be distributed
and interoperate for load balancing.
ProxyWebService: a wrapper class that adds notification and
streaming support to a Web Service.
More info: http://www.hpsearch.org
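The wrapper idea can be sketched as a small proxy class that surrounds a service call with status notifications. This is only an illustration of the pattern; `NotifyingProxy`, its `invoke` method, and the status strings are invented here and are not the HPSearch ProxyWebService API.

```python
class NotifyingProxy:
    """Wrap a callable service and report its lifecycle to a listener."""

    def __init__(self, service, listener):
        self._service = service      # the wrapped service call
        self._listener = listener    # callable that receives status strings

    def invoke(self, *args, **kwargs):
        self._listener("started")
        try:
            result = self._service(*args, **kwargs)
        except Exception as exc:
            self._listener(f"failed: {exc}")
            raise
        self._listener("completed")
        return result

# Hypothetical service: any plain function can stand in for a remote call.
def add_offsets(values, offset):
    return [v + offset for v in values]

events = []
proxy = NotifyingProxy(add_offsets, events.append)
output = proxy.invoke([1, 2, 3], offset=10)
print(output, events)
```

An orchestration engine can listen for the "completed" notification to decide when to start the next step of a flow.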
Filters can be run as Web
Services to create workflows.
Filter Chains can be deployed
for complex processing.
Streaming messaging provide