Services and the Semantic Grid - Indiana University

          Services and the Semantic Grid
  SKG2005 Beijing China November 28 2005

                Geoffrey Fox

    Computer Science, Informatics, Physics
      Pervasive Technology Laboratories
   Indiana University Bloomington IN 47401

             Data Deluged Science
   In the past, we worried about data in the form of parallel I/O or
    MPI-IO, but we didn’t consider it as an enabler of new science
    and new ways of computing
   Data assimilation was not central to HPCC
   DoE ASCI set up because didn’t want test data!
   Now particle physics will get 100 petabytes from CERN
    • Nuclear physics (Jefferson Lab) in same situation
    • Use around 30,000 CPU’s simultaneously 24X7
   Weather, climate, solid earth (EarthScope)
   Bioinformatics curated databases (Biocomplexity only 1000’s of
    data points at present)
   Virtual Observatory and SkyServer in Astronomy
   Environmental Sensor nets
     Information/Knowledge Grids
   Distributed (10’s to 1000’s) of data sources (instruments,
    file systems, curated databases …)
   Data Deluge: 1 (now) to 100’s petabytes/year (2012)
    • Moore’s law for Sensors
   Possible filters assigned dynamically (on-demand)
     • Run image processing algorithm on telescope image
     • Run Gene sequencing algorithm on compiled data
   Needs decision support front end with “what-if”
   Metadata (provenance)
    critical to annotate data
   Integrate across experiments
     as in multi-wavelength
Data Deluge comes from pixels/year available
 Semantically Rich Services with a Semantically
   Rich Distributed Operating Environment
   [Figure: a network of services exchanging messages. Legend: SS = Sensor
    Service, FS = Filter Service, OS = Other Service, MD = MetaData]

       Semantic Grid and Services
   Implications of SOA (Service Oriented Architectures) for SG
    (Semantic Grid)
     • Build services to implement SG
   Implications of SG for SOA
     • Build metadata rich systems of services using SG
   Services receive data in SOAP messages, manipulate it and
    produce transformed data as further messages
   Meta-data is carried in SOAP messages
   Meta-data controls processing and transport of SOAP Messages
   Knowledge is created from data by services
   The Grid enhances Web services with semantically rich system
    and application specific management
   One must exploit and work around the different approaches to
    meta-data and their manipulation in Web Services
      Structure of SOAP Messages
   [Figure: a SOAP message (headers H1 H2 H3 H4, then Body) passes through
    handlers F1 F2 F3 F4 to the Service. Labels: Container Workflow,
    Container Handlers]

   SOAP Messages have System information in the header
    including WS-Policy based meta-data defining processing
     • Processed by Handlers
   Application data and meta-data are in the body (controversies here!)
     • Processed by the Service itself
   Some meta-data like WS-RF is logically “only in messages”
   Other like that in WS-Context or the SRB are stored in logical
    equivalent of XML databases
   We only need to preserve semantic structure (XML/SOAP
    Infoset) so transport in fast XML and store in efficient relational
    databases
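
The split between header meta-data (processed by handlers) and body data (processed by the service) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `urn:example:semantic-grid` namespace, the header names `Security` and `Reliability`, and the functions `build_message` and `run_container` are hypothetical and are not part of any WS-* specification or container API.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
EX_NS = "urn:example:semantic-grid"  # hypothetical namespace for this sketch


def build_message(headers, payload):
    """Build an envelope: system meta-data in the header, application data in the body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    for name, value in headers.items():
        ET.SubElement(header, f"{{{EX_NS}}}{name}").text = value
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    ET.SubElement(body, f"{{{EX_NS}}}Data").text = payload
    return ET.tostring(envelope, encoding="unicode")


def run_container(xml_text, handlers, service):
    """Container: handlers see the header entries first, then the service sees the body."""
    envelope = ET.fromstring(xml_text)
    header = envelope.find(f"{{{SOAP_NS}}}Header")
    log = []
    for entry in header:
        name = entry.tag.split("}")[1]  # strip the namespace part of the tag
        if name in handlers:
            log.append(handlers[name](entry.text))
    data = envelope.find(f"{{{SOAP_NS}}}Body/{{{EX_NS}}}Data")
    return log, service(data.text)


msg = build_message({"Security": "token-123", "Reliability": "seq-7"},
                    "raw sensor reading")
handlers = {"Security": lambda v: f"checked {v}",
            "Reliability": lambda v: f"acked {v}"}
log, result = run_container(msg, handlers, str.upper)
```

Only the XML Infoset matters to the handlers and the service here; the same logic would apply if the envelope were carried in a faster binary encoding.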
What Type of Services are there?
   There are a horde of support services supplying security,
    collaboration, database access, user interfaces
   The support services are either associated with system or
     • We will study the WS-* and GS-* which implicitly or
       explicitly define many support services
   There are generalized filter services which are applications that
    accept messages and produce new messages with some data
    derived from that in input
     • Simulations (including PDE’s and reactive systems)
     • Data-mining
     • Transformations
     • Agents
     • Reasoning
     (all of these are termed filters here)
   There are services like “author ontology”, “parse RDF” or
    “attach provenance” that directly support Semantic Grid
   But all services and their interactions are bathed in sea of meta-
    data and so implicitly need and support the Semantic Grid
It’s a Composite Hierarchical World
   Filters can be a workflow which means they are “just collections
    of other simpler services”
     • One needs meta-data to control the workflow
   Services are programs that accept messages and produce
    new messages
   Grids are a distributed collection of services supporting
    managed shared resources
     • Management requires meta-data
   Grids are distributed systems that accept distributed messages
    and produce distributed result messages
     • Can always talk about Grids and view a service or a
        workflow as a special case of a Grid
   It just requires meta-data to send a message to a Grid and have it
    routed to “correct computer” holding “requested service”
     • Meta-data allows mapping of virtual to real addresses
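
A minimal Python sketch of this composite view, assuming messages are plain dictionaries: a workflow is built from simpler filters and is itself a filter, and a Grid uses meta-data in the message to map a virtual service name to the real service. The names `workflow`, `Grid`, `calibrate` and `summarize` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]           # a message: meta-data plus data
Filter = Callable[[Message], Message]  # a filter maps a message to a new message


def workflow(steps: List[Filter]) -> Filter:
    """A workflow is itself a filter: it runs simpler filters in order."""
    def composite(message: Message) -> Message:
        for step in steps:
            message = step(message)
        return message
    return composite


def calibrate(message: Message) -> Message:
    """Toy filter: double every data value."""
    return {**message, "data": [x * 2 for x in message["data"]]}


def summarize(message: Message) -> Message:
    """Toy filter: replace the data values by their sum."""
    return {**message, "data": sum(message["data"])}


class Grid:
    """Routes a message using meta-data (virtual service name -> real service)."""

    def __init__(self) -> None:
        self.registry: Dict[str, Filter] = {}

    def register(self, virtual_name: str, service: Filter) -> None:
        self.registry[virtual_name] = service

    def send(self, message: Message) -> Message:
        return self.registry[message["service"]](message)


grid = Grid()
grid.register("calibrate-and-sum", workflow([calibrate, summarize]))
reply = grid.send({"service": "calibrate-and-sum", "data": [1, 2, 3]})
```

The caller cannot tell whether `calibrate-and-sum` is a single service or a workflow, which is the sense in which a service or a workflow is a special case of a Grid.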
 Semantically Rich Services with a Semantically
   Rich Distributed Operating Environment
   [Figure: Grids of Grids Architecture. SOAP Message Streams link Sensor
    Services (SS), Filter Services (FS), Other Services (OS) and MetaData
    (MD). Stage labels: Raw Data, Data, Information, Wisdom, Decisions.
    Other labels: Another Service, Another Grid, Database, and the fragment
    “… is same as outward facing application service”]
The Grid and Web Service Institutional Hierarchy

    4: Application or Community of Interest Specific Services
       such as “Run BLAST” or “Look at Houses for sale”
    3: Generally Useful Services and Features
       such as “Access a Database” or “Submit a Job” or “Semantic Grid”
       or “Support a Portal” or “Collaborative Visualization”
       (side label: … and other GGF/W3C/ ………)
    2: System Services and Features
       Handlers like WS-RM, Security, Programming Models like BPEL
       or Registries like UDDI
       (side label: WS-* from …)
    1: Container and Run Time (Hosting) Environment
       (side label: Apache Axis, .NET etc.)

           The WS-* Infrastructure
   Core Grid Services build on and/or extend the 60 or so
    WS-* Infrastructure specifications which define
    • 1. Container Model, XML, WSDL …
    • 2. Service Internet ( (Reliable) Messaging, Addressing)
      including extensions for high performance transport and
      representation. This is natural basis for streaming
    • 3. Notification
    • 4. Workflow and Transactions
    • 5. Security
    • 6. Service Discovery
    • 7. Metadata and State including lifetime
    • 8. Management (service interactions)
      (callout: these categories are directly connected to metadata)
    • 9. Policy, Agreements
    • 10. Portals and User Interfaces
          A List of Web Services 6
• 6) Service Discovery
• UDDI (Broadly Supported OASIS Standard) V3 August
• WS-Discovery Web services Dynamic Discovery
  (Microsoft, BEA, Intel …) February 2004
• WS-IL Web Services Inspection Language, (IBM,
  Microsoft) November 2001
• Note WS-Context as a metadata catalog and WS-
  Management Catalog are examples of related services
• There are many UDDI extensions such as Grimoires from
  UK OMII which often are essentially providing semantic
  enrichment
   Discovery is just accessing part of meta-data
   defining Grid
          A List of Web Services 7
• 7) Metadata and State
• RDF Resource Description Framework (W3C) Set of
  recommendations expanded from original February 1999 standard
• DAML+OIL combining DAML (Darpa Agent Markup Language)
  and OIL (Ontology Inference Layer) (W3C) Note December 2001
• OWL Web Ontology Language (W3C) Recommendation February
• WS-MetadataExchange Web Services Metadata Exchange (BEA,
  IBM, Microsoft, SAP, Sun …) September 2004
• ASAP Asynchronous Service Access Protocol (OASIS) with V1.0
  working draft 2B December 11 2004
• WS-GAF Web Service Grid Application Framework (Arjuna,
  Newcastle University) August 2003
• WBEM Web-Based Enterprise Management including CIM
  (Common Information Model) from DMTF (Distributed
  Management Task Force) 2004-2005
        A List of Web Services 7
• 7) Metadata and State: Resource Framework
• WS-RF Web Services Resource Framework (OASIS)
• WS-Resource Framework Web Services Resource 1.2
  (OASIS) Public Review Draft 01, 10 June 2005
• WS-ResourceProperties Web Services Resource
  Properties V1.2 Public Review Draft 01, 10 June 2005
• WS-ResourceLifetime Web Services Resource Lifetime
  V1.2 Public Review Draft 01, 13 June 2005
• WS-ServiceGroup Web Services Service Group V1.2
  Public Review Draft 01, 10 June 2005
• WS-BaseFaults Web Services Base Faults V1.2 Public
  Review Draft 01, June 13, 2005
   These WS-* define syntax of Meta-data (RDF OWL CIM) and how to
   use it in system (WS-MetadataExchange) – especially headers (WS-RF)
          Metadata and Service Context
   Consider a collection of services working together
     • Workflow tells you how to specify service
       interaction but more basically there is shared
       information or context specifying/controlling
   WS-RF and WS-GAF have different approaches to
    contextualization – supplying a common “context”
    which at its simplest is a token to represent state
   More generally core shared information includes
    dynamic service metadata and the equivalent of
    configuration information.
   Two services linked by a stream are perhaps simplest
    example of a collection of services needing context
   Note that there is a tension between storing
    metadata in messages and services.
     • This is shared versus distributed memory debate in
       parallel computing
               Stateful Interactions
   There are (at least) four approaches to specifying state
     • OGSI use factories to generate separate services for
       each session in standard distributed object fashion
     • Globus GT-4 and WSRF use metadata of a resource
       to identify state associated with particular session
     • WS-GAF uses WS-Context to provide abstract
       context defining state. Has strength and weakness
       that reveals less about nature of session
     • WS-I+ “Pure Web Service” leaves state specification
       to the application – e.g. put a context in the SOAP body
   I think we should smile and write a great metadata
    (semantic) service hiding all these different models for
    state and metadata
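
As a rough sketch of what such a hiding layer might look like, the Python below resolves session state in the same way whether it travels inside the message (the “put a context in the SOAP body” style) or is referenced by a token held in a context store (the context-token style). Everything here is an assumption made for illustration: `ContextStore`, `session_state` and the message keys `context` and `context_token` are hypothetical and do not come from WS-Context, WS-RF or any other specification.

```python
import uuid


class ContextStore:
    """Hypothetical session store: a context token maps to shared session state."""

    def __init__(self):
        self._sessions = {}

    def begin(self, **state):
        """Start a session and return the token that represents its state."""
        token = str(uuid.uuid4())
        self._sessions[token] = dict(state)
        return token

    def get(self, token):
        return self._sessions[token]


def session_state(message, store):
    """Return session state regardless of where it is kept."""
    if "context" in message:                      # state carried in the message
        return message["context"]
    return store.get(message["context_token"])   # state referenced by a token


store = ContextStore()
token = store.begin(user="alice", step=3)
by_token = session_state({"context_token": token, "data": "x"}, store)
in_body = session_state({"context": {"user": "alice", "step": 3}, "data": "x"},
                        store)
```

A service written against `session_state` does not need to know which model the sender used, which is the point of a metadata service that hides the different state models.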
               Role of WS-Context
   There are many WS-* specifications addressing meta-data
    and both many approaches and many trade-offs
   We hear about Distributed Hash Tables (Chord) to achieve
    scalability in large scale networks
   Managed dynamic workflows as in sensor integration and
    collaboration require
     • Fault-tolerance and ability to support dynamic changes
       with few millisecond delay
     • But only a modest number of involved services (up to
       1000’s in a session)
     • Need Session NOT Service/Resource meta-data so don’t
       use WS-RF
   We are building a WS-Context compliant metadata catalog
    supporting distributed or central paradigms – see later talk
    by Mehmet Aktas
   Use for OGC Web catalog service with UDDI for slowly
    varying meta-data
       A List of Web Services 8
• 8) Management
• WS-DistributedManagement Web Services
  Distributed Management Framework with MUWS
  and MOWS below (OASIS)
• WSDM-MUWS Web Services Distributed
  Management: Management Using Web Services
  (OASIS) OASIS Standard March 9 2005
• WSDM-MOWS Web Services Distributed
  Management: Management of Web Services
  (OASIS) OASIS Standard March 9 2005
   A List of Web Services 8- Contd
• 8) Management: Microsoft Stack
• WS-Management Web Services for Management
  (Microsoft, Intel, Sun …) August 2005
• WS-Management Catalog The WS-Management
  Catalog (Microsoft, Intel, Sun …) August 2005
• WS-Transfer Web Service Transfer (Microsoft,
  BEA, Sonic Software etc.) September 2004
• WS-Enumeration Web Service Enumeration
  (Microsoft, BEA, Sonic Software etc.) September
   These WS-* define exchange of data and meta-data
   between services
         A List of Web Services 9
• 9) General Service Characteristics
• WS-PolicyFramework Web Services Policy
  Framework (BEA, IBM, Microsoft, SAP …)
  September 2004
• WS-PolicyAttachment Web Services Policy
  Attachment (BEA, IBM, Microsoft, SAP …)
  September 2004
• WS-PolicyAssertions Web Services Policy Assertions
  Language (BEA, IBM, Microsoft, SAP) 18 December
  2002 (Superseded by WS-PolicyFramework)
• WS-Agreement Web Services Agreement
  Specification (GGF under development) August 2004
   These WS-* define syntax of Meta-data defining
   structure of distributed System
   Grids are managed (meta-data enhanced)
   distributed collections of Internet Scale services
Activities in Global Grid Forum Working Groups
GGF Area                                   Standards Activities
1: Architecture     High Level Resource/Service Naming (level 2 of fig. 1),
                    Integrated Grid Architecture
2: Applications     Software Interfaces to Grid, Grid Remote Procedure Call,
                    Checkpointing and Recovery, Interoperability to Job Submittal services,
                    Information Retrieval,
3: Compute          Job Submission, Basic Execution Services, Service Level Agreements
                    for Resource use and reservation, Distributed Scheduling

4: Data             Database and File Grid access, Grid FTP, Storage Management, Data
                    replication, Binary data specification    and interface, High-level
                    publish/subscribe, Transaction management
5: Infrastructure   Network measurements, Role of IPv6 and high performance
                    networking, Data transport
6: Management       Resource/Service configuration, deployment and lifetime, Usage
                    records and access, Grid economy model
7: Security         Authorization, P2P and Firewall Issues, Trusted Computing
          Use the sea of meta-data supported by Semantic Grid
         Two-level Programming I
• The Web Service (Grid) paradigm implicitly assumes a
  two-level Programming Model
• We make a Service (same as a “distributed object” or
  “computer program” running on a remote computer) using
  conventional technologies
   – C++ Java or Fortran Monte Carlo module
   – Data streaming from a sensor or Satellite
   – Specialized (JDBC) database access
• Such services accept and produce data from users files and
• The Grid is built by coordinating such services assuming
  we have solved problem of programming the service
         Two-level Programming II
   The Grid is discussing the composition of distributed
    services with the runtime interfaces to Grid in
    analogy to UNIX pipes/data streams
   [Figure: boxes labelled Service1, Service2, Service3 and Service4]

   Familiar from use of UNIX Shell, PERL or Python
    scripts to produce real applications from core programs
   Such interpretative environments are the single
    processor analog of Grid Programming
   Some projects like GrADS from Rice University are
    looking at integration between service and composition
    levels but dominant effort looks at each level separately
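
The single-processor analog can be made concrete with a short Python sketch in which generator functions stand in for the core programs and a small `pipeline` helper plays the role of the shell pipe. The stage names (`source`, `square`, `keep_even`) and the helper are hypothetical; they only illustrate the composition level, not how any Grid workflow engine works.

```python
def source(values):
    """Core program 1: emit a stream of values."""
    for value in values:
        yield value


def square(stream):
    """Core program 2: transform each item of the stream."""
    for value in stream:
        yield value * value


def keep_even(stream):
    """Core program 3: filter the stream."""
    for value in stream:
        if value % 2 == 0:
            yield value


def pipeline(stream, *stages):
    """Composition level: connect stages like `a | b | c` in a UNIX shell."""
    for stage in stages:
        stream = stage(stream)
    return stream


result = list(pipeline(source(range(6)), square, keep_even))
```

Each stage is written without knowledge of the others, and the composition is expressed separately, mirroring the two levels described above.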
        3 Layer Programming Model
Web Service 1      WS 2                        WS N-1   Web Service N
              Level 1 Programming inside services
     Application expressed in Java Fortran C++ MPI etc.

                        WS-* Infrastructure

   Level 2 Programming choosing services by virtualization
  Application Semantics (Metadata, Ontology) Semantic Grid

     Level 3 Grid Programming composing multiple services
           Service Workflow, Transactions, Mediation

            Substantial work in UK e-Science program,
              international semantic web community
    Information Architecture and Semantic Grid

   WS-* provides key low level capability but deliberately
    does not define an information (data) architecture and
    leaves this to domain specific specification activities such
    as CellML/SBML for biology, WFS/GML for GIS and
    XGSP for Collaboration
   WS-* does define a primitive service discovery (UDDI)
    and meta-data capabilities including WS-Context, WS-
    RF, RDF and WS-MetadataExchange already discussed.
   GGF defines Grid data capabilities including info-D
    (publish/subscribe) and OGSA-DAI for data repositories
   Semantic Grid uses WS-* and GS-* extending meta-data
    and service discovery with data-mining and reasoning
    3 XML Databases of Importance
   WS-Context controlling a workflow
   (Extended) UDDI supporting semantic service discovery
   WFS or ASFS (see later) provides application specific
    data/meta-data repository
   These have different performance, scalability and data unit size
   In our implementation, each is currently “just an
    Oracle/MySQL” database front ended by filters that convert
    between XML (GML for WFS) and object-relational Schema
     • Example of Semantics (XML) versus representation (SQL)
   OGSA-DAI offers Grid interface to databases – we could use but
    don’t as we only need to expose WFS and not MySQL to Grid
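
The “semantics (XML) versus representation (SQL)” point can be illustrated with a small Python sketch: a pair of filters converts between an XML feature and a relational row, so clients only ever see XML. This is an assumption-laden toy, not the implementation described in the talk: it uses SQLite from the standard library instead of Oracle/MySQL, and the `<feature>` element with `name`, `lat` and `lon` children is a simplified stand-in for real GML.

```python
import sqlite3
import xml.etree.ElementTree as ET

FIELDS = ("name", "lat", "lon")  # hypothetical, much simpler than real GML


def store_feature(conn, xml_text):
    """Input filter: XML feature -> relational row."""
    feature = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO features (name, lat, lon) VALUES (?, ?, ?)",
        (feature.findtext("name"),
         float(feature.findtext("lat")),
         float(feature.findtext("lon"))),
    )


def load_feature(conn, name):
    """Output filter: relational row -> XML feature."""
    row = conn.execute(
        "SELECT name, lat, lon FROM features WHERE name = ?", (name,)
    ).fetchone()
    feature = ET.Element("feature")
    for tag, value in zip(FIELDS, row):
        ET.SubElement(feature, tag).text = str(value)
    return ET.tostring(feature, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (name TEXT PRIMARY KEY, lat REAL, lon REAL)")
xml_in = "<feature><name>fault-7</name><lat>34.5</lat><lon>-118.25</lon></feature>"
store_feature(conn, xml_in)
xml_out = load_feature(conn, "fault-7")
```

The database schema can change freely as long as the two filters keep producing and accepting the same XML.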

Information Management/Processing
   SOAP messages transport information expressed in a
    semantically rich fashion between sources and services that
    enhance and transform information so that complete system
     • Semantic Web technologies like RDF and OWL help us have
       rich expressivity
   Data → Information → Knowledge transformation
   We build application specific information
    management/transformation systems ASIS for each application
   One special domain is the system itself where the metadata
    associated with services, sessions, Grids, messages, streams and
    workflow is itself managed and supported by an SIIS

                 Generalizing a GIS
   Geographical Information Systems GIS have been
    hugely successful in all fields that study the earth and
    related worlds
    • They define Geography Syntax (GML) and ways to store,
      access, query, manipulate and display geographical features
    • In SOA, GIS corresponds to a domain specific XML language
      and a suite of services for different functions above
   However such a universal information model has not
    been developed in other areas even though there are
    many fields in which it appears possible
    •   BIS Biological Information System
    •   MIS Military Information System
    •   IRIS Information Retrieval Information System
    •   PAIS Physics Analysis Information System
    •   SIIS Service Infrastructure Information System
ASIS Application Specific Information System I
   a) Discovery capabilities that are best done using WS-*
   b) Domain specific metadata and data including
    search/store/access interface. (cf WFS). Let’s call generalization
    ASFS (Application Specific Feature Service)
    • Language to express domain specific features (cf GML). Let’s call
      this ASL (Application Specific Language)
    • Tools to manipulate information expressed in language and key
      data of application (cf coordinate transformations). Let’s call this
      ASTT (Application Specific Tools and Transformations)
    • ASL must support Data sources such as sensors (cf OGC metadata
      and data sensor standards) and repositories. Sensors need
      (common across applications) support of streams of data
    • Queries need to support archived (find all relevant data in past)
        and streaming (find all data in future with given properties)
    • Note all AS Services behave like Sensors and all sensors are
      wrapped as services
    • Any domain will have “raw data” (binary) and that which has been
      filtered to ASL. Let’s call the former ASBD (Application Specific
      Binary Data)
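
The two query styles listed above (archived and streaming) can be sketched as follows in Python. The class `FeatureService` and its methods are hypothetical names for illustration; features are plain dictionaries rather than documents in a real ASL.

```python
class FeatureService:
    """Toy ASFS: supports archived queries and streaming (standing) queries."""

    def __init__(self):
        self.archive = []        # everything seen so far
        self.subscriptions = []  # (predicate, callback) pairs for future data

    def query_archive(self, predicate):
        """Archived query: find all relevant data in the past."""
        return [f for f in self.archive if predicate(f)]

    def subscribe(self, predicate, callback):
        """Streaming query: be told about future data with given properties."""
        self.subscriptions.append((predicate, callback))

    def publish(self, feature):
        """A sensor (or a service acting like one) delivers a new feature."""
        self.archive.append(feature)
        for predicate, callback in self.subscriptions:
            if predicate(feature):
                callback(feature)


service = FeatureService()
service.publish({"sensor": "gps-1", "magnitude": 2.0})

alerts = []
service.subscribe(lambda f: f["magnitude"] >= 4.0, alerts.append)
service.publish({"sensor": "gps-2", "magnitude": 5.1})
service.publish({"sensor": "gps-3", "magnitude": 1.2})

past = service.query_archive(lambda f: f["magnitude"] < 3.0)
```

Here the streaming subscription only sees data published after it was registered, while the archived query sees everything stored so far.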
ASIS Application Specific Information System II
   Let’s call this ASVS (Application Specific Visualization Services)
    generalizing WMS for GIS
   The ASVS should both visualize information and provide a way of
    navigating (cf GetFeatureInfo) database (the ASFS)
   The ASVS can itself be federated and presents an ASFS output
   d) There should be an application service interface for ASIS from which all
    ASIS services inherit
   e) There will be other user services interfacing to ASIS
   All user and system services will input and output data in ASL using
    filters to cope with ASBD
   [Figure: ASIS data flow. AS Tool (generic), AS Service (user defined),
    AS Tool (generic) and ASVS Display, under the heading Filter,
    Transformation, Reasoning, Data-mining, Analysis. Messages using ASL.
    Other labels: Directly GS-* WS-*; ASFS; ASVS; OGSA-DAI and Sensor
    Standards]
   [Figure: IS = Information Service or … Interface labels: Receive
    Request/Select, Get Status, ASL Data Get. Surviving fragments: unit of
    information expressed in ASL; is a … or a message/…; … or Military
    Information …]
   [Figure: BFS = Basic Filter … with a Filter Resource. Interface labels:
    Issue Request/Select, Request Status, ASL Data Put; Receive
    Request/Select, Get Status, ASL Data Get]
               Filters either transform or aggregate Information
 FS = A Filter Service is a general workflow (the microscopic workflow)
      of Basic Filter Services
   [Figure: an FS containing four BFS boxes]

                          The output of a Filter Service is
                          indistinguishable from that of an IS

A transport link supports asynchronous publish/subscribe semantics
and Web Service Reliable messaging fault tolerance
Transport links can be multicast to support collaboration (typically
for last link before or after Presentation Service) or replication for
fault tolerance.
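
A minimal in-memory sketch of such a transport link in Python, assuming a hypothetical `Broker` class: publishers drop messages into per-subscriber queues and do not wait for consumers (asynchronous publish/subscribe), and several subscribers on one topic give multicast. It deliberately omits the reliable-messaging and fault-tolerance behaviour mentioned above, which a real messaging system would have to add.

```python
import queue
from collections import defaultdict


class Broker:
    """Toy asynchronous publish/subscribe link with multicast per topic."""

    def __init__(self):
        self._topics = defaultdict(list)  # topic -> list of subscriber inboxes

    def subscribe(self, topic):
        """Register a subscriber and return its private inbox."""
        inbox = queue.Queue()
        self._topics[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        """Deliver a copy of the reference to every inbox without waiting."""
        for inbox in self._topics[topic]:
            inbox.put(message)
        return len(self._topics[topic])


broker = Broker()
viewer_a = broker.subscribe("session-1")   # e.g. two collaborating clients
viewer_b = broker.subscribe("session-1")
delivered = broker.publish("session-1", {"frame": 1})
```

Each subscriber reads from its own inbox at its own pace, for example with `viewer_a.get_nowait()`.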
 IS Gridlet =
   [Figure: three IS boxes, each above an FS box. Note on figure: the top
    IS could be produced by a Filter Service]

 The basic unit (Gridlet) transforms and aggregates
           application specific information
 Gridlets are composed using Grid of Grids concept

   [Figure: several IS Gridlets composed into a larger Grid. Labels:
    Federation; Macroscopic Workflow; Session Management; Search; Planning;
    Construction; Management; Portal; Presentation; ASVS. General System
    Services: Security, Fault Tolerance, Metadata, Directory,
    Collaboration, Management]
 Data → Information → Knowledge as messages flow from original sources to top of Filter Grid
 Semantically Rich Services with a Semantically
   Rich Distributed Operating Environment
   [Figure: the Grids of Grids Architecture diagram, repeated]
   Virtualization everywhere
   Focus on semantics not representation to get
    performance combined with expressivity for transport
    and data access
   All this enabled by powerful meta-data services
   Grids add management to rich but potentially chaotic
    set of Web Services;
    • management and coherence enabled by meta-data
   Can define general information architectures (ASIS,
    GIS, SIIS) for both applications and system
   Knowledge from filters that span simulations, data-
    mining, reasoning and agents
   A service is just a special case of a Grid
   Build systems from SubGrids (Gridlets)