simdat - ecmwf

 Meteo Activity of the SIMDAT project:
 Building components of the WIS

                         Baudouin Raoult

HALO meeting –11.07.06                     BR 1
SIMDAT: Data Grids for Process and Product Development using Numerical Simulation and Knowledge Discovery
 Four-year project funded by the EU
    -   Contract with the EU was signed on 1 September 2004
 SIMDAT focuses on 4 application areas:
    -   product design in automotive and aerospace
    -   process design in pharmacology
    -   service provision in meteorology
 Budget of 11 M€

 Phase 1: Connectivity
    -   Deployment of Grid infrastructure with particular attention to data transport and management
    -   Distributed DB access
 Phase 2: Interoperability
    -   Virtual Data Repository
    -   Introduction of Grid technologies research
    -   Introduction of VO
 Phase 3: Knowledge
    -   Integration of analysis services, workflows, discovery and data mining

SIMDAT Meteorology Partners
 22 members in the consortium

   Deutscher Wetterdienst (DWD)
   Météo France
   UK Met Office

 Intel
 Ontoprise
 IT Innovation
Meteo activity
 To build an integrated and scalable framework for the collection and sharing
  of distributed data (WIS building blocks)
    -   Instead of each National Met Service having a GISC, a “virtual” GISC
 Service oriented framework targeting meteorology, hydrology, climate and
  environment and offering transparent access to distributed resources
    -   Grid enabled software
    -   Services to process the data, elaborate products, visualize those products
 Some key elements of the project are:
    -   A single view of meteorological information which is distributed amongst the 5 partners
    -   Improve visibility and access to meteorological data through a comprehensive
        discovery service
    -   Offer a variety of reliable services for routine dissemination and for collection of data
    -   Provide a global access control policy managed by the partners and integrated into
        their existing security infrastructure
 320 man-months, taking into account the technology contribution to the
  meteo application
Virtual Meteorological Centre: functional view
 Through the Distributed Portal, users search for and retrieve data and
  subscribe to services such as routine dissemination, subject to
  authentication and authorisation

 The Virtual Database Service provides a single view of the partners' databases

Architectural Choices
 Catalogue duplicated and synchronized at each site
    -   To have a fast discovery (browse & search phase) and a reliable system (client
        redirection to another node)
 Build an open and flexible framework integrating technologies from
  different areas
    -   Allows picking the best components of each Grid middleware (Globus, OGSA-DAI)
    -   Combines J2EE and Grid/Web Services technologies to build solid components
 QoS and Robustness are amongst the top priorities of the project
    -   Framework based on J2EE components
    -   Use pipelining, priority and queuing mechanisms to process user’s requests
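
The priority and queuing idea can be illustrated with a minimal Python sketch. It is not the project's J2EE implementation: the class name RequestQueue, the request names and the numeric priority convention (lower number served first) are all assumptions made for illustration.

```python
import heapq
import itertools


class RequestQueue:
    """Priority queue of user requests (hypothetical helper, not SIMDAT code)."""

    def __init__(self):
        self._heap = []
        # Monotonic counter keeps FIFO order among requests of equal priority.
        self._counter = itertools.count()

    def submit(self, request_id, priority):
        # Assumed convention: a lower number is served first.
        heapq.heappush(self._heap, (priority, next(self._counter), request_id))

    def next_request(self):
        # Return the id of the next request to process, or None when idle.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = RequestQueue()
queue.submit("archive-retrieval", priority=5)
queue.submit("realtime-synop", priority=1)
queue.submit("archive-retrieval-2", priority=5)
order = [queue.next_request() for _ in range(3)]
```

In this sketch a short real-time request overtakes long archive retrievals, while requests of equal priority keep their submission order.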

Architecture
 3 main components to build the virtual database: Data Repository,
  Catalogue Node and Portal
    - installed on each partner site and interconnected through a dedicated secure
      connection channel
 Data Repository
    - Interface to the partners' databases
    - Offers metadata information to describe, search, locate data
    - Offers interface to retrieve data from the associated local databases
 Catalogue Node
    -   Maintains the registry and ensures synchronisation
    -   Harvests metadata and requests data from the Data Repository
    -   Ingests data and maintains the cache of the real-time data
    -   Serves clients: Portal or other Nodes
    -   Monitors the execution of the requests
 Distributed Portal
    - Offers interface to search/browse the catalogue

Architecture (cont.) [architecture diagram]
WMO Core metadata standard
 WMO Core Profile 0.2, a profile of ISO 19115 for geo-referenced data
 Not scalable
    - Records are large and contain redundant information, slowing down the
      database hosting the catalogue
    - Same information repeated in all metadata records, so unnecessary
      information circulates over the network
    - Some documents are orders of magnitude larger than the data itself
    - Cannot represent very large archives with small granularity
 Cannot fulfil all requirements to build the Virtual Meteorological Centre
    - Information on how to retrieve data from local databases
    - Information to create a directory (Taxonomy of documents)
    - Information to sub-select data from a dataset

Solutions
 Split XML documents into fragments to solve the scalability issue
    - WMO core metadata is structured
    - Some parts are shared amongst many documents
    - Example fragments of one record:
          Core        WMO
          Owner       UKMO
          Data type   Synop
          Location    Heathrow
          Date        2005-10-12
 Add specific extension to define all relevant information needed to
  implement the system and not defined by the WMO core
    - Internal unique ID
    - Hierarchy relationship
    - Physical location (which node holds the data)
    - Information used to generate a valid request to retrieve data from the end system
    - Information used to create web interface for the end user
 Work with WMO to integrate the extensions in future releases of the WMO Core Profile
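
The fragment idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SIMDAT storage layer: the function names, the flat key/value record layout and the use of a SHA-256 content key are all hypothetical, and the second date (2005-10-13) is invented for the example. Each fragment is stored once, and a record becomes a list of fragment keys.

```python
import hashlib


def fragment_key(kind, value):
    # Content-derived key: identical fragments always get the same key.
    return hashlib.sha256(f"{kind}={value}".encode("utf-8")).hexdigest()


def store_record(record, fragment_store):
    """Store each (kind, value) pair once and return the record as a list of keys."""
    keys = []
    for kind, value in record.items():
        key = fragment_key(kind, value)
        fragment_store.setdefault(key, (kind, value))
        keys.append(key)
    return keys


def load_record(keys, fragment_store):
    # Reassemble the full record from its fragment keys.
    return dict(fragment_store[key] for key in keys)


store = {}
rec1 = {"core": "WMO", "owner": "UKMO", "data_type": "Synop",
        "location": "Heathrow", "date": "2005-10-12"}
rec2 = dict(rec1, date="2005-10-13")  # same station, next day (illustrative)
keys1 = store_record(rec1, store)
keys2 = store_record(rec2, store)
```

Two records that differ only in their date share four of their five fragments, so the store holds six fragments instead of ten.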
WMO Information System (WIS)
 Support variety of data types (Common to all WMO Programmes)
 Support Archive and Real-time datasets
 Build a Catalogue of all the meteorological data for exchange to
  support WMO programmes
 Support ad-hoc requests for data and products: Pull model
 Support routine dissemination of all observed data and products,
  both real-time and non-real-time: Push model
 Support network security
 Support different user profiles and data policies
 Use different types of communication links (GTS, satellite, dedicated

WIS Requirements
 Support variety of data types
Data Repository Functions
 Interface to the existing Meteorological Databases
    - It provides access to any kind of database (RDBMS, bespoke, flat files)
 Metadata provider
    - Provide metadata to discover, locate and describe data, in
      accordance with a defined XML metadata format
    - Answer Catalogue Node metadata harvesting messages
 Data provider
    - Provide an interface to asynchronously request data from the associated
      existing database (to support real-time & archive datasets)
    - Transform the XML data request to the real database request
    - Offer a data channel (HTTP, FTP, …) to send the retrieved data to the
      Catalogue Node

Data Repository Implementation
 Implemented as a web-service using a document-based interface
    - Protocol entirely described in an XML Message
    - Independent from the network transport (HTTP, SOAP, etc)
 Three transport methods are supported
    - Web Services (WS-I, WSDL, SOAP)
    - REST (XML over HTTP)
 VMCMessage Protocol
    - A set of XML messages has been defined for metadata harvesting
    - A set of XML messages has been defined for data requests (Submit,
      GetSubmitStatus, DeleteRequest)
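
A document-based interface of this kind can be sketched as follows. Only the three operation names (Submit, GetSubmitStatus, DeleteRequest) come from the slide; the root element VMCMessage, the requestId attribute, the Request child element and the request text are assumptions made for illustration, not the actual VMCMessage schema.

```python
import xml.etree.ElementTree as ET

OPERATIONS = ("Submit", "GetSubmitStatus", "DeleteRequest")


def build_message(operation, request_id=None, body=None):
    """Build an XML message for one of the three data-request operations."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    root = ET.Element("VMCMessage")        # hypothetical root element
    op = ET.SubElement(root, operation)
    if request_id is not None:
        op.set("requestId", request_id)    # hypothetical attribute
    if body is not None:
        ET.SubElement(op, "Request").text = body  # hypothetical child element
    return ET.tostring(root, encoding="unicode")


submit_xml = build_message("Submit", body="dataset=synop,date=2005-10-12")
status_xml = build_message("GetSubmitStatus", request_id="42")
delete_xml = build_message("DeleteRequest", request_id="42")
```

Because the whole exchange is carried in the XML document, the same string could be posted over plain HTTP or wrapped in a SOAP envelope without changing the message itself.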

WIS Requirements
 Support archive and real-time datasets
 [Figure: data sources: UMARF Satellite Data, ERA-40 Reanalysis Data, UNIDART Climate Data, IAA NWP Outputs Data, JEDDS Aeronautical Data]
Real-time Data Repository
 A GTS Data Repository is being developed by Météo-France
    - Interfaced with the GTS (through an MSS)
    - It publishes GTS collections
 For Phase II: one source providing GTS data
    - No data replication over the SIMDAT infrastructure
 For Phase III: several sources plugged into SIMDAT
    - Strategy to uniquely identify the datasets (using MD5 hash codes)
    - Real-time data replication using the metadata synchronization mechanism
 Generic Solution which can be used by all the partners
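
The MD5-based identification strategy can be sketched as below. The slide does not say what exactly is hashed, so hashing the raw bulletin payload is an assumption, as are the function names and the sample bulletins. MD5 serves here only as a content fingerprint, not as a security mechanism.

```python
import hashlib


def dataset_id(payload: bytes) -> str:
    # Content fingerprint: identical payloads yield identical identifiers.
    return hashlib.md5(payload).hexdigest()


def merge_sources(*sources):
    """Merge bulletins from several sources, keeping one copy per identical payload."""
    merged = {}
    for source in sources:
        for payload in source:
            merged.setdefault(dataset_id(payload), payload)
    return merged


# Two hypothetical GTS sources that both received the same first bulletin.
source_a = [b"SYNOP bulletin 1", b"SYNOP bulletin 2"]
source_b = [b"SYNOP bulletin 1", b"SYNOP bulletin 3"]
merged = merge_sources(source_a, source_b)
```

With several sources plugged in, a bulletin received twice maps to the same identifier and is kept once.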

WIS Requirements
 Build a Catalogue of all the available data
Catalogue Node
 The Catalogue is built using the metadata harvested from the Data Repositories
 The Catalogue is synchronized and replicated on each Catalogue Node
 The Catalogue offers discovery services accessible to the user
  through the distributed portal
 The Catalogue contains the necessary information to retrieve and
  sub-select the data

SIMDAT Infrastructure
 Support ad-hoc requests for data and products: Pull model
Distributed Portal
 A Portal is deployed on each site and offers a unique view of all the
  datasets available
 Portal offers discovery mechanisms to the users
    - Full text, temporal and geographical search (Google-like)
    - Directory browsing (Yahoo-like browsing)
 Portal provides request handling mechanisms to the users
    - Submitted requests can be asynchronous to manage long-lived requests
    - Users can manage their requests (check status, delete them …)
    - Users retrieve the associated data when the request is complete
 Portal uses the information contained in the metadata to create the
  data sub-selection forms
    - The metadata/data providers define how to access their datasets
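
The combined full-text, temporal and geographical search can be sketched as a simple filter over catalogue records. The record layout (title, start, end, lon, lat), the function name matches and the sample records are assumptions for illustration; the real catalogue is built on WMO Core metadata and a database.

```python
from datetime import date


def matches(record, text=None, period=None, bbox=None):
    """Return True when a catalogue record satisfies every given criterion."""
    if text and text.lower() not in record["title"].lower():
        return False
    if period:
        start, end = period
        # Keep the record only if its time extent overlaps the query period.
        if record["end"] < start or record["start"] > end:
            return False
    if bbox:
        west, south, east, north = bbox
        if not (west <= record["lon"] <= east and south <= record["lat"] <= north):
            return False
    return True


# Illustrative records only.
catalogue = [
    {"title": "Synop observations Heathrow", "start": date(2005, 10, 12),
     "end": date(2005, 10, 12), "lon": -0.45, "lat": 51.47},
    {"title": "ERA-40 reanalysis", "start": date(1957, 9, 1),
     "end": date(2002, 8, 31), "lon": 0.0, "lat": 0.0},
]
hits = [r["title"] for r in catalogue
        if matches(r, text="synop", bbox=(-10, 45, 5, 60))]
```

Criteria that are left out are simply not applied, so the same function serves a free-text query alone or any combination with time and area.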

How to create the database requests?
 Keep the request language of the different databases
    -   Non-intrusive solution
 Add information in the metadata <vgisc> extension to build the end system
    -   <request>: holds the information needed to generate a valid request for the data
    -   <variables>: holds information on how to create a web interface that lets the user select
        items from the dataset
    -   The web portal uses the <variables> element to present selection dialogues to the user
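
How the <request> and <variables> elements could work together is sketched below. The dictionary stands in for an already parsed <vgisc> extension; its layout, the template syntax, the request string and the function names are all hypothetical, since the slide does not show the actual schema.

```python
from string import Template

# Hypothetical content of a <vgisc> extension, already parsed into a dict.
vgisc = {
    # <request>: template written in the request language of the local database.
    "request": "retrieve,date=$date,param=$param",
    # <variables>: what the user may select, and from which values.
    "variables": {
        "date": {"label": "Date", "choices": ["2005-10-12", "2005-10-13"]},
        "param": {"label": "Parameter", "choices": ["temperature", "pressure"]},
    },
}


def selection_form(vgisc):
    # One (name, label, choices) entry per user-selectable variable.
    return [(name, spec["label"], spec["choices"])
            for name, spec in vgisc["variables"].items()]


def build_request(vgisc, selection):
    # Reject values the metadata does not offer, then fill the template.
    for name, value in selection.items():
        if value not in vgisc["variables"][name]["choices"]:
            raise ValueError(f"{value!r} is not a valid choice for {name}")
    return Template(vgisc["request"]).substitute(selection)


form = selection_form(vgisc)
request = build_request(vgisc, {"date": "2005-10-12", "param": "temperature"})
```

The portal never needs to understand the database's request language: it only renders the form and fills in the template supplied by the data provider.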

WIS Requirements
 Support routine dissemination of all observed data and products,
  both real-time and non-real-time: Push model
    -   Will be addressed in Phase III of the project
WIS Requirements
 Support network security
    -   Inter-node communications secured using SSL
WIS Requirements
 Support different user profiles and data policies
    -   Virtual Organization implementation: framework study and investigation in Phase II
    -   First stable version to be delivered for Nov 06
VO Domains
 Domain
    - Group of organisations that share a common policy
      (e.g. the RA-VI V-GISC)
    - The VO might contain a number of sub-domains.
    - [Figure: VO Domain diagram with labels A, B, C, D1, D2 and F]
 Authentication (AuthN)
    - Users register with a node.
    - Users are known to all the nodes in the same domain
    - Any node within the domain should be able to authenticate a user of the domain
 Authorisation (AuthZ)
    - AuthZ is performed at the node level to allow/deny access to the data.
    - Data Access policy is expressed within the metadata.
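
The split between domain-wide authentication and node-level authorisation can be sketched as follows. The Node class, the shared user registry, the role names and the allowed_roles metadata field are assumptions made for illustration; the slide does not describe how the policy is encoded.

```python
class Node:
    """A node in a VO domain (hypothetical model, not the SIMDAT security code)."""

    def __init__(self, name, domain_users):
        self.name = name
        # Shared registry: every node in the domain sees the same users.
        self.domain_users = domain_users

    def register(self, user, roles):
        # A user registers with one node and becomes known domain-wide.
        self.domain_users[user] = set(roles)

    def authenticate(self, user):
        return user in self.domain_users

    def authorise(self, user, metadata):
        # The access policy travels with the metadata record.
        if not self.authenticate(user):
            return False
        allowed = set(metadata.get("allowed_roles", []))
        # An empty policy is treated as open data (assumed convention).
        return not allowed or bool(allowed & self.domain_users[user])


shared_registry = {}
node_a = Node("A", shared_registry)
node_b = Node("B", shared_registry)
node_a.register("alice", ["research"])
can_read = node_b.authorise("alice", {"allowed_roles": ["research"]})
```

A user registered at node A is authenticated at node B, while the allow/deny decision is taken locally by the node serving the data.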
Cross-domain issues
 Metadata is visible across all domains
    - But some metadata can be explicitly hidden
 Cross-domain authorisation involves user registration
    - A user from domain “D2” wanting to access data that is
      limited to domain “D1” will have to register with domain “D1”
 Cross-domain authentication will be recognised on the basis of a previously
  established trust relationship.
    - Authenticated users coming from “D2” into “D1” will be checked against the
      trusted CA domains.
 The concept of domain needs to be validated by the VO working group
 [Figure: VO Domain diagram with labels A, B, D1, D2 and F]

WIS Requirements
 Use different types of communication links
    -   Currently deployed on the Internet
    -   Phase II: study on a dual RMDCN/Internet deployment for production
    -   Phase III: RMDCN deployment and EUMETCast integration study
What do you need to publish data?
 Installation
    - Install a Catalogue
    - Install a Data Repository
 Develop a Module to request data from the existing database
    - It can simply be a shell script calling the database client with the “zero
      development” Data Repository
 Define the metadata describing the datasets
    - Define the discovery information (keyword, geographical, temporal)
    - Define how to request the database
              Static information necessary to access the database
              Define how to sub-select data
    - A metadata definition wizard is being developed

Milestones
 Synchronization Engine Enhancements - June 06
 Mesh Network Management Software - June 06
    - Led by Intel and fully compatible with the new synchronization engine
 WSRF interfaces implementation - Sep 06
 Metadata Manager migration toward ebXML
    - Led by UKMO, feasibility study by June 06
 Development of a Real-time Data Repository
    - To acquire GTS observations: led by Météo-France, first implementation
      by Sep 06
 Implementation of the security services of the VO - Feb 07
 Ontology-based discovery service
    - First Thesaurus implementation Sep 06, discovery interface Mar 07

CBS conference demonstration
 Meshed network of GISCs and DCPCs
 Based on SIMDAT software and including the 5 European partners,
  JMA, CMA, BoM, NCAR, NODC
    - JMA, CMA, BoM fully integrated in the grid architecture
    - NCAR acting as DCPC and providing metadata information via OAI
    - NODC currently investigating the SIMDAT software
Results Achieved
 Five (+2.5) Meteorological Centres interconnected and exchanging data
 Users able to search, browse and retrieve the distributed data
 Unified Catalogue based on WMO Core Profile v0.2
 First element of the security infrastructure
 [Figure: data sources: UMARF Satellite Data, ERA-40 Data, UNIDART Data, IAA Data, JEDDS Data]
Results Achieved (cont.)
 Flexible, non-intrusive architecture
    - Supports any kind of database (RDBMS, XML, flat file, object, bespoke)
    - Zero-development Data Repository
    - Supports asynchronous requests (archive, long requests)
 Interest shown by the meteorological community:
    - JMA (Japan) and CMA (China) fully integrated
    - BoM (Australia), KMA (Korea) and NODC (Russia) in progress
    - NCAR (US) catalogue is harvested using OAI, users are redirected to NCAR
 SIMDAT work feeds back into WMO through expert teams:
    - ET-WISC: SIMDAT Meteo requirements are now used as the WIS requirements
    - IPET-MI: findings have been used for the definition of the WMO Core Profile 0.3
    - ET-CTS: SIMDAT infrastructure is seen as a major infrastructure for
      implementing the WIS
