

SIMDAT
Meteo Activity of the SIMDAT project:
Building components of the WIS
Baudouin Raoult
ECMWF
HALO meeting –11.07.06 BR 1
SIMDAT: Data Grids for Process and Product Development using Numerical Simulation and Knowledge Discovery
4-year project funded by the EU
- Contract with the EU was signed on 1 September 2004
SIMDAT focuses on 4 application areas:
- product design in automotive and aerospace,
- process design in pharmacology
- service provision in meteorology
Budget of 11 M €
Phase 1: Connectivity
- Deployment of Grid infrastructure with particular attention to data transport and management technologies
- Distributed DB access
Phase 2: Interoperability
- Virtual Data Repository
- Introduction of Grid services, workflows, research
- Introduction of VO
Phase 3: Knowledge
- Integration of analysis, discovery and data mining
SIMDAT Meteorology Partners
22 members in the consortium
Deutscher Wetterdienst (DWD)
ECMWF
EUMETSAT
Météo France
UK Met Office
Intel
Ontoprise
IBM
IT Innovation
NEC
Meteo Activity
To build an integrated and scalable framework for the collection and sharing
of distributed data (WIS building blocks)
- A single “virtual” GISC shared by the partners, instead of each National Met Service having its own GISC
- 2 DCPCs: ECMWF, EUMETSAT
Service oriented framework targeting meteorology, hydrology, climate and
environment and offering transparent access to distributed resources
- Grid enabled software
- Services to process the data, produce products and visualize them
Some key elements of the project are:
- A single view of the meteorological information distributed amongst the 5 partners
- Improve visibility and access to meteorological data through a comprehensive
discovery service
- Offer a variety of reliable services for routine dissemination and for collection of data
- Provide a global access control policy managed by the partners and integrated into
their existing security infrastructure
320 man-months, taking into account the technology contribution to the
meteo application
Virtual Meteorological Centre - functional view
Through the Distributed Portal, users search for and retrieve
data and subscribe to services such as routine dissemination,
subject to authentication and authorization
The Virtual Database Service provides a single view of the
partners' databases
Architectural Choices
Catalogue duplicated and synchronized at each site
- To provide fast discovery (browse & search phase) and a reliable system (clients can be
redirected to another node)
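Because every site holds a full replica of the catalogue, a client whose preferred node fails can redirect its query to another node. A minimal Python sketch of that redirection follows. It assumes a caller-supplied `search(node, query)` function that raises `ConnectionError` when a node is unreachable; that function, the node names and the `AllNodesFailed` exception are hypothetical and do not come from the SIMDAT software.

```python
from typing import Callable, List, Sequence


class AllNodesFailed(Exception):
    """Raised when no catalogue replica answered (hypothetical)."""


def search_with_failover(nodes: Sequence[str], query: str,
                         search: Callable[[str, str], List[str]]) -> List[str]:
    """Query replicated catalogue nodes in order of preference.

    `search` is a hypothetical caller-supplied function that queries one node
    and raises ConnectionError if that node cannot be reached.
    """
    failed = []
    for node in nodes:
        try:
            return search(node, query)
        except ConnectionError:
            failed.append(node)  # node unreachable: redirect to the next replica
    raise AllNodesFailed(f"no catalogue node answered; tried {failed}")


# Usage with a stand-in search function in which the first node is down.
def fake_search(node: str, query: str) -> List[str]:
    if node == "node-a":
        raise ConnectionError("node-a unreachable")
    return [f"{node}:{query}"]


print(search_with_failover(["node-a", "node-b"], "synop", fake_search))  # ['node-b:synop']
```

Since every replica answers the same catalogue queries, the client needs no knowledge of which node "owns" a record; it only needs an ordered list of nodes to try.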
Build an open and flexible framework integrating technologies from
different areas
- Allows picking the best components of each Grid middleware (Globus, OGSA-DAI)
- Combines J2EE and Grid/Web Services technologies to build solid components
QoS and Robustness are amongst the top priorities of the project
- Framework based on J2EE components
- Use pipelining, priority and queuing mechanisms to process users' requests
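One way to read "priority and queuing mechanisms" is a priority queue in front of the request processor, with first-in-first-out order inside each priority level. The Python sketch below assumes integer priorities where a lower number is served first. The `RequestQueue` class and the priority values are illustrative and are not taken from the SIMDAT code. Pipelining is not shown.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class QueuedRequest:
    priority: int                           # lower value is served first (assumption)
    seq: int                                # arrival order: FIFO within one priority level
    request_id: str = field(compare=False)  # not used for ordering


class RequestQueue:
    """Hypothetical priority queue for incoming user requests."""

    def __init__(self) -> None:
        self._heap: List[QueuedRequest] = []
        self._arrivals = itertools.count()

    def submit(self, request_id: str, priority: int) -> None:
        heapq.heappush(self._heap, QueuedRequest(priority, next(self._arrivals), request_id))

    def next_request(self) -> Optional[str]:
        """Return the next request to process, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).request_id


q = RequestQueue()
q.submit("archive-1", priority=5)
q.submit("realtime-1", priority=1)
q.submit("archive-2", priority=5)
print([q.next_request() for _ in range(3)])  # ['realtime-1', 'archive-1', 'archive-2']
```

The arrival counter breaks ties so that two requests with the same priority are never compared by identifier. This keeps long archive retrievals in submission order while short real-time requests overtake them.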
Architecture
3 main components to build the virtual database: Data Repository,
Catalogue Node and Portal
- installed on each partner site and interconnected through a dedicated secure
connection channel
Data Repository
- Interface to the partners' databases
- Offers metadata information to describe, search, locate data
- Offers interface to retrieve data from the associated local databases
Catalogue Node
- Maintains the registry and ensures synchronisation
- Harvests metadata and requests data from the Data Repository
- Ingests data and maintains the cache of the real-time data
- Serves clients: Portal or other Nodes
- Monitors the execution of the requests
Distributed Portal
- Offers interface to search/browse the catalogue
Architecture (cont.)
WMO Core metadata standard
WMO Core Profile 0.2, a profile of ISO 19115 for geo-referenced data
Not scalable
- Records are large and contain redundant information, slowing down the
database hosting the catalogue
- The same information is repeated in all metadata records, so unnecessary
information circulates over the network
- Some documents are orders of magnitude larger than the data itself
- Cannot represent very large archives with small granularity
Cannot fulfil all requirements to build the Virtual Meteorological
Centre
- Information on how to retrieve data from local databases
- Information to create a directory (Taxonomy of documents)
- Information to sub-select data from a dataset
Solutions
Split XML documents into fragments to solve the scalability issue
- WMO core metadata is structured
- Some parts are shared amongst many documents
Figure: a WMO Core record split into fragments: Owner (UKMO), Data type (Synop), Location (Heathrow), Date (2005-10-12)
Add specific extensions to define all the information needed to
implement the system that is not defined by the WMO core
- Internal unique ID
- Hierarchy relationship
- Physical location (which node holds the data)
- Information used to generate a valid request to retrieve data from the end
system
- Information used to create web interface for the end user
Work with WMO to integrate the extensions into future releases of the
standards
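The fragmenting idea can be illustrated with the example record on the slide. Owner, data type and location are shared by many records, so each distinct fragment is stored once and records keep only references to it. In this Python sketch, fragments are plain dictionaries rather than XML, and keys are SHA-1 digests of a canonical JSON form. Both choices are assumptions made for brevity, and the second record's date (2005-10-13) is invented for the example.

```python
import hashlib
import json
from typing import Dict


class FragmentStore:
    """Keep one copy of each distinct metadata fragment (hypothetical sketch)."""

    def __init__(self) -> None:
        self.fragments: Dict[str, dict] = {}

    def put(self, fragment: dict) -> str:
        # Canonical JSON (sorted keys) so identical fragments get identical keys.
        key = hashlib.sha1(json.dumps(fragment, sort_keys=True).encode("utf-8")).hexdigest()
        self.fragments.setdefault(key, fragment)
        return key


def split_record(record: Dict[str, dict], store: FragmentStore) -> Dict[str, str]:
    """Replace every part of a record by a reference to a stored fragment."""
    return {part: store.put(value) for part, value in record.items()}


store = FragmentStore()
common = {"owner": {"name": "UKMO"}, "datatype": {"name": "Synop"},
          "location": {"name": "Heathrow"}}
rec1 = split_record({**common, "date": {"value": "2005-10-12"}}, store)
rec2 = split_record({**common, "date": {"value": "2005-10-13"}}, store)
print(len(store.fragments))  # 5 fragments stored instead of 8
```

Only the date fragment differs between the two records, so the catalogue database and the synchronization traffic carry the shared parts once.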
WMO Information System (WIS) Requirements
Support a variety of data types (common to all WMO Programmes)
Support Archive and Real-time datasets
Build a Catalogue of all the meteorological data for exchange to
support WMO programmes
Support ad-hoc requests for data and products: Pull model
Support routine dissemination of all observed data and products,
both real-time and non-real-time: Push model
Support network security
Support different user profiles and data policies
Use different types of communication links (GTS, satellite, dedicated
links)
WIS Requirements
Support a variety of data types
Data Repository Functions
Interface to the existing Meteorological Databases
- Provides access to any kind of database (RDBMS, bespoke, flat files)
Metadata provider
- Provide metadata to discover, locate and describe data, conforming to a
defined XML metadata format
- Answer Catalogue Node metadata harvesting messages
Data provider
- Provide an interface to asynchronously request data from the associated
existing database (to support real-time & archive datasets)
- Transform the XML data request to the real database request
- Offer a data channel (HTTP, FTP, …) to send the retrieved data to the
Catalogue Node
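The step "transform the XML data request to the real database request" can be pictured as a small translator from the common XML format to whatever the local database expects. The Python sketch below assumes a hypothetical `<dataRequest>` element with `<param>` children and a hypothetical keyword=value native syntax. Neither format is taken from SIMDAT.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML data request; element and attribute names are illustrative only.
REQUEST_XML = """<dataRequest dataset="synop">
  <param name="station">03772</param>
  <param name="date">2005-10-12</param>
</dataRequest>"""


def to_native_request(xml_text: str) -> str:
    """Translate the XML request into a keyword=value request for a local database client.

    The keyword=value syntax is an assumption; a real repository would emit SQL,
    a command line or whatever its database understands.
    """
    root = ET.fromstring(xml_text)
    parts = [f"dataset={root.get('dataset')}"]
    for param in root.findall("param"):
        parts.append(f"{param.get('name')}={(param.text or '').strip()}")
    return ",".join(parts)


print(to_native_request(REQUEST_XML))  # dataset=synop,station=03772,date=2005-10-12
```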
Data Repository Implementation
Implemented as a web-service using a document-based interface
- Protocol entirely described in an XML Message
- Independent from the network transport (HTTP, SOAP, etc)
Three transport methods are supported
- OGSA-DAI WSRF
- Web Services (WS-I, WSDL, SOAP)
- REST (XML over HTTP)
VMCMessage Protocol
- A set of XML messages has been defined for metadata harvesting
(Info, GetMetadataRecord)
- A set of XML messages has been defined for data requests (Submit,
GetSubmitStatus, DeleteRequest)
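Because the protocol is entirely described by the XML message, the same document can be sent over any of the three transports. The Python sketch below builds such messages. The operation names come from the slide. The `VMCMessage` root element with an `operation` attribute and one child element per field is a hypothetical layout, not the actual SIMDAT schema.

```python
import xml.etree.ElementTree as ET

# Operation names are from the slide; the envelope layout below is hypothetical.
HARVEST_OPS = {"Info", "GetMetadataRecord"}
DATA_OPS = {"Submit", "GetSubmitStatus", "DeleteRequest"}


def build_message(operation: str, **fields: str) -> str:
    """Serialise one VMCMessage-style XML document.

    The same string can then be carried over OGSA-DAI WSRF, SOAP or plain
    HTTP, since the protocol is entirely described by the message.
    """
    if operation not in HARVEST_OPS | DATA_OPS:
        raise ValueError(f"unknown operation: {operation}")
    root = ET.Element("VMCMessage", operation=operation)
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


print(build_message("GetSubmitStatus", requestId="42"))
# <VMCMessage operation="GetSubmitStatus"><requestId>42</requestId></VMCMessage>
```

Keeping message construction separate from transport is what allows a document-based interface to stay independent of HTTP, SOAP or WSRF.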
WIS Requirements
Support real-time data
Figure: UMARF Satellite Data, Era40 Reanalysis Data, Unidart Climate Data, IAA NWP Outputs, JEDDS Aeronautical Data
Real-time Data Repository
A GTS Data Repository is being developed by Meteo-France
- Interfaced with the GTS (through a MSS)
- It publishes GTS collections
For Phase II: one source providing GTS data
- No data replication over the SIMDAT infrastructure
For Phase III: several sources plugged into SIMDAT
- Strategy to uniquely identify the datasets (using MD5 hash codes)
- Real-time data replication using the metadata synchronization mechanism
Generic solution that can be used by all the partners
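When several sources publish GTS data, identical datasets must map to the same identifier so that they are not replicated twice. The Python sketch below derives an MD5 identifier from the fields that define a dataset. Which fields those are is an assumption, and the bulletin-style headers in the example and the `merge_sources` helper are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def dataset_id(*identifying_fields: str) -> str:
    """Derive a stable identifier from the fields that define a dataset.

    Which fields to use is an assumption here; the separator must not occur
    inside the fields for the encoding to stay unambiguous.
    """
    canonical = "|".join(identifying_fields)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def merge_sources(sources: Iterable[Tuple[str, Tuple[str, ...]]]) -> Dict[str, str]:
    """Keep the first source seen for each dataset so that duplicates are not replicated."""
    owners: Dict[str, str] = {}
    for source, fields in sources:
        owners.setdefault(dataset_id(*fields), source)
    return owners


# Hypothetical sources publishing GTS-style collections; two publish the same one.
published = [("source-a", ("SMUK01", "EGRR")), ("source-b", ("SMUK01", "EGRR")),
             ("source-b", ("SMFR01", "LFPW"))]
print(len(merge_sources(published)))  # 2 distinct datasets
```

A content-derived key lets every node compute the same identifier independently, without a central registry handing out IDs.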
WIS Requirements
Build a Catalogue of all the available meteorological products
Catalogue Node
The Catalogue is built using the metadata harvested from the Data
Repositories
The Catalogue is synchronized and replicated on each Catalogue
Node
The Catalogue offers discovery services accessible to the user
through the distributed portal
The Catalogue contains the necessary information to retrieve and
sub-select the data
SIMDAT Infrastructure
Support ad-hoc requests for data & products: Pull model
Distributed Portal
A Portal is deployed on each site and offers a single view of all the
available datasets
Portal offers discovery mechanisms to the users
- Full text, temporal and geographical search (google-like)
- Directory browsing (yahoo-like browsing)
Portal provides request handling mechanisms to the users
- Submitted requests can be asynchronous to manage long-lived requests
- Users can manage their requests (check status, delete them …)
- Users retrieve the associated data when the request is complete
Portal uses the information contained in the metadata to create the
data sub-selection forms
- The metadata/data providers define how to access their datasets
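The asynchronous request handling described above can be modelled as a small per-user table of request states. In the Python sketch below, a request is registered and its ID returned immediately. The request is marked complete later, and data is only handed back once it is complete. The `UserRequests` class, its two-state model and the `mark_complete` callback are hypothetical simplifications, not the portal's actual design.

```python
import itertools
from enum import Enum
from typing import Dict, Optional


class Status(Enum):
    PENDING = "pending"
    COMPLETE = "complete"


class UserRequests:
    """Hypothetical portal-side bookkeeping for one user's asynchronous requests."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._status: Dict[int, Status] = {}
        self._data: Dict[int, bytes] = {}

    def submit(self) -> int:
        """Register a new request and return at once (asynchronous)."""
        rid = next(self._next_id)
        self._status[rid] = Status.PENDING
        return rid

    def status(self, rid: int) -> Optional[Status]:
        return self._status.get(rid)

    def mark_complete(self, rid: int, data: bytes) -> None:
        # Would be triggered when the Catalogue Node reports the data is ready.
        if rid in self._status:
            self._status[rid] = Status.COMPLETE
            self._data[rid] = data

    def delete(self, rid: int) -> None:
        self._status.pop(rid, None)
        self._data.pop(rid, None)

    def retrieve(self, rid: int) -> Optional[bytes]:
        """Return the data only once the request is complete."""
        if self._status.get(rid) is not Status.COMPLETE:
            return None
        return self._data[rid]


reqs = UserRequests()
rid = reqs.submit()
print(reqs.retrieve(rid))      # None: still pending
reqs.mark_complete(rid, b"data")
print(reqs.status(rid).name)   # COMPLETE
```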
How to create the database requests?
Keep the request language of the different databases
- Non intrusive solution
Add information in the <vgisc> metadata extension to build the end-system
request:
- <request>: holds the information needed to generate a valid request to the Data
Repository
- <variables>: holds the information needed to create a web interface that lets the user select
items from the dataset
The web portal uses the <variables> element to present selection dialogues to the user
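The two extension elements work together. <variables> tells the portal which choices to offer, and <request> is turned into the native request once the user has chosen. In the Python sketch below, only the element names <vgisc>, <request> and <variables> come from the slide. The <variable name=... values=...> children, the `$name` placeholders and the request syntax are hypothetical.

```python
import xml.etree.ElementTree as ET
from string import Template
from typing import Dict, List

# Hypothetical <vgisc> extension; only <vgisc>, <request> and <variables> come from the slide.
VGISC_XML = """<vgisc>
  <request>retrieve,param=$param,date=$date</request>
  <variables>
    <variable name="param" values="2t,msl"/>
    <variable name="date" values="20051012,20051013"/>
  </variables>
</vgisc>"""


def form_choices(vgisc: ET.Element) -> Dict[str, List[str]]:
    """The options the portal would offer in its selection dialogues."""
    return {v.get("name"): v.get("values").split(",")
            for v in vgisc.find("variables").findall("variable")}


def build_request(vgisc: ET.Element, selection: Dict[str, str]) -> str:
    """Check the user's selection against <variables>, then fill the <request> template."""
    choices = form_choices(vgisc)
    for name, value in selection.items():
        if value not in choices.get(name, []):
            raise ValueError(f"{name}={value} is not offered by this dataset")
    return Template(vgisc.find("request").text).substitute(selection)


root = ET.fromstring(VGISC_XML)
print(build_request(root, {"param": "2t", "date": "20051012"}))
# retrieve,param=2t,date=20051012
```

The approach is non-intrusive because the database's own request language lives in the metadata template, so the portal never needs to understand it.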
WIS Requirements
Support routine dissemination of all observed data and products, both real-time and non-real-time: Push model
Dissemination/Subscription
Will be addressed in Phase III of the project
WIS Requirements
Support Network Security
Inter-Node Communications secured using SSL
WIS Requirements
Support different user profiles & data policies
Virtual Organization Implementation:
Framework study and investigation in Phase II
First stable version due in Nov 06
VO Domains
Figure: VO domain diagram (labels A, B, C, D1, D2, E, F)
Domain
- Group of organisations that share a common policy (e.g. the RA-VI V-GISC)
- The VO might contain a number of sub-domains.
Authentication (AuthN)
- Users register with a node.
- Users are known to all the nodes in the same domain.
- Any node within the domain should be able to authenticate a user of the domain.
Authorisation (AuthZ)
- AuthZ is performed at the node level to allow/deny access to the data.
- Data Access policy is expressed within the metadata.
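Because the access policy travels with the metadata, each node can decide locally whether to release data. The Python sketch below assumes a policy that is either public or restricted to a set of domains. The `AccessPolicy` fields and the `authorise` function are hypothetical and are not the SIMDAT policy format.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class User:
    name: str
    domain: str  # domain whose nodes authenticated this user


@dataclass(frozen=True)
class AccessPolicy:
    """Access rules read from a dataset's metadata (field names are hypothetical)."""
    public: bool = False
    domains: FrozenSet[str] = frozenset()


def authorise(user: User, policy: AccessPolicy) -> bool:
    """Node-level decision: allow public data, or data restricted to the user's domain."""
    return policy.public or user.domain in policy.domains


alice = User("alice", "D1")
print(authorise(alice, AccessPolicy(domains=frozenset({"D1"}))))  # True
print(authorise(alice, AccessPolicy(domains=frozenset({"D2"}))))  # False
```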
Cross-domain issues
Figure: VO domain diagram (labels A, B, C, D1, D2, E, F)
Metadata is visible across all domains
- But some metadata can be explicitly hidden
Cross-domain authorisation involves user registration
- A user from domain “D2” wanting to access data restricted to domain “D1” will have to register with domain “D1”
Cross-domain authentication will be recognised on the basis of a previously established trust relationship.
- Authenticated users coming from “D2” into “D1” will be checked against the trusted CA domains.
The concept of domain needs to be validated by the VO working group
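The two cross-domain rules above are independent checks. Authentication succeeds if the user's certificate comes from a CA that the target domain trusts. Authorisation succeeds only if the user has registered with the domain that owns the data. The Python sketch below makes that split explicit. The trust and registration tables, and the `can_access` function, are hypothetical stand-ins for real configuration.

```python
from typing import Dict, Set


def can_access(user: str, user_ca: str, data_domain: str,
               trusted_cas: Dict[str, Set[str]],
               registrations: Dict[str, Set[str]]) -> bool:
    """Cross-domain check for data restricted to `data_domain`.

    Authentication: the user's certificate must come from a CA domain that
    `data_domain` trusts. Authorisation: the user must be registered with
    `data_domain`. Both tables are hypothetical stand-ins for real configuration.
    """
    authenticated = user_ca in trusted_cas.get(data_domain, set())
    authorised = data_domain in registrations.get(user, set())
    return authenticated and authorised


trust = {"D1": {"D1-CA", "D2-CA"}}  # D1 trusts certificates issued in D2
registered = {"bob": {"D2"}}        # bob is a D2 user

print(can_access("bob", "D2-CA", "D1", trust, registered))  # False: not registered with D1
registered["bob"].add("D1")
print(can_access("bob", "D2-CA", "D1", trust, registered))  # True
```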
WIS Requirements
Use different types of communication links
Currently deployed on the Internet
Phase II: Study of a dual RMDCN/Internet deployment for production
Phase III: RMDCN deployment and EUMETCast integration study
What do you need to publish data?
Installation
- Install a Catalogue
- Install a Data Repository
Develop a Module to request data from the existing database
- It can simply be a shell script calling the database client with the “zero
development” Data Repository
Define the metadata describing the datasets
- Define the discovery information (keyword, geographical, temporal)
- Define how to query the database
Static information necessary to access the database
Define how to sub-select data
- A metadata definition wizard is being developed
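With the "zero development" Data Repository, the only site-specific piece is the module that hands the native request to an existing database client and collects its output. The Python sketch below uses `subprocess.run`. It assumes the client script reads the request on standard input and writes the data to standard output. That calling convention, the `run_database_client` helper and the echo stand-in are assumptions, not the SIMDAT interface.

```python
import subprocess
import sys
from typing import Sequence


def run_database_client(command: Sequence[str], native_request: str,
                        timeout: float = 600.0) -> bytes:
    """Pass the request to an existing client script and return what it prints.

    Raises subprocess.CalledProcessError if the script exits with a non-zero
    status and subprocess.TimeoutExpired if it runs longer than `timeout` seconds.
    """
    result = subprocess.run(
        list(command),
        input=native_request.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
        timeout=timeout,
    )
    return result.stdout


# Stand-in for a real database client: a one-line Python "script" that echoes the request.
echo_client = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
print(run_database_client(echo_client, "dataset=synop,date=2005-10-12"))
# b'dataset=synop,date=2005-10-12'
```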
Milestones
Synchronization Engine Enhancements - June 06
Mesh Network Management Software - June 06
- Led by Intel and fully compatible with the new synchronization engine
WSRF interfaces implementation - Sep 06
Metadata Manager migration toward ebXML
- Led by UKMO, feasibility study by June 06
Development of a Real-time Data Repository
- To acquire GTS observations: led by Météo-France, first implementation
by Sep 06
Implementation of the security services of the VO - Feb 07
Ontology-based discovery service
- First Thesaurus implementation Sep 06, discovery interface Mar 07
CBS conference demonstration
Meshed network of GISCs and DCPCs
Based on SIMDAT software and including the 5 European partners,
JMA, CMA, BoM, NCAR, NODC
- JMA, CMA, BoM fully integrated in the grid architecture
- NCAR acting as DCPC and providing metadata information via OAI
- NODC currently investigating the SIMDAT software
Results Achieved
Five (+2.5) Meteorological Centres interconnected and exchanging data and
metadata
Users able to search, browse and retrieve data distributed across the partners
Unified Catalogue based on WMO Core Profile v0.2
First element of the security infrastructure
Figure: UMARF Satellite Data, Era40 Data, UNIDART Data, IAA Data, JEDDS Data
Results Achieved (cont.)
Flexible, non-intrusive architecture
- Supports any kind of database (RDBMS, XML, Flat File, Object, bespoke)
- Zero development Data Repository
- Supports asynchronous requests (Archive, long requests)
Interest shown by the meteorological community:
- JMA (Japan) and CMA (China) fully integrated
- BoM (Australia), KMA (Korea) and NODC (Russia) in progress
- NCAR (US) catalogue is harvested using OAI; users are redirected to the NCAR
portal
SIMDAT work feeds back into WMO through expert teams:
- ET-WISC: SIMDAT Meteo requirements are now used as the WIS requirements
- IPET-MI: findings have been used for the definition of the WMO Core Profile 0.3
- ET-CTS: the SIMDAT infrastructure is seen as a major infrastructure for implementing the WIS