                Services and the Semantic Grid
  SKG2005 Beijing China November 28 2005

                Geoffrey Fox

    Computer Science, Informatics, Physics
      Pervasive Technology Laboratories
   Indiana University Bloomington IN 47401

             Data Deluged Science
   In the past, we worried about data in the form of parallel I/O or
    MPI-IO, but we didn’t consider it as an enabler of new science
    and new ways of computing
   Data assimilation was not central to HPCC
   DoE ASCI set up because didn’t want test data!
   Now particle physics will get 100 petabytes from CERN
    • Nuclear physics (Jefferson Lab) in same situation
    • Use around 30,000 CPU’s simultaneously 24X7
   Weather, climate, solid earth (EarthScope)
   Bioinformatics curated databases (Biocomplexity only 1000’s of
    data points at present)
   Virtual Observatory and SkyServer in Astronomy
   Environmental Sensor nets
     Information/Knowledge Grids
   Distributed (10’s to 1000’s of) data sources (instruments,
    file systems, curated databases …)
   Data Deluge: 1 (now) to 100’s petabytes/year (2012)
    • Moore’s law for Sensors
   Possible filters assigned dynamically (on-demand)
     • Run image processing algorithm on telescope image
     • Run Gene sequencing algorithm on compiled data
   Needs decision support front end with “what-if”
   Metadata (provenance)
    critical to annotate data
   Integrate across experiments
     as in multi-wavelength
Data Deluge comes from pixels/year available
 Semantically Rich Services with a Semantically
   Rich Distributed Operating Environment
       Semantic Grid and Services
   Implications of SOA (Service Oriented Architectures) for SG
    (Semantic Grid)
     • Build services to implement SG
   Implications of SG for SOA
     • Build metadata rich systems of services using SG
   Services receive data in SOAP messages, manipulate it and
    produce transformed data as further messages
   Meta-data is carried in SOAP messages
   Meta-data controls processing and transport of SOAP Messages
   Knowledge is created from data by services
   The Grid enhances Web services with semantically rich system
    and application specific management
   One must exploit and work around the different approaches to
    meta-data and their manipulation in Web Services
      Structure of SOAP Messages
   SOAP Messages have System information in the header
    including WS-Policy based meta-data defining processing
     • Processed by Handlers
   Application data and meta-data is the body (controversies here!)
     • Processed by the Service itself
   Some meta-data like WS-RF is logically “only in messages”
   Other like that in WS-Context or the SRB are stored in logical
    equivalent of XML databases
   We only need to preserve semantic structure (XML/SOAP
    Infoset) so transport in fast XML and store in efficient relational
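
   A minimal Python sketch of the header/body split described above,
   assuming a made-up meta-data namespace (urn:example:meta) and made-up
   element names (Delivery, Data). It is not a real WS-Policy or WS-RF
   implementation; it only shows a handler reading the header while the
   service reads the body.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
META_NS = "urn:example:meta"  # hypothetical namespace, for illustration only


def build_message(payload: str, reliable: bool) -> bytes:
    """Build an envelope: system meta-data in the header, data in the body."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(env, f"{{{SOAP_NS}}}Header")
    delivery = ET.SubElement(header, f"{{{META_NS}}}Delivery")  # hypothetical element
    delivery.set("reliable", "true" if reliable else "false")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    data = ET.SubElement(body, f"{{{META_NS}}}Data")  # hypothetical element
    data.text = payload
    return ET.tostring(env)


def handler(env: ET.Element) -> dict:
    """A handler inspects only the header meta-data."""
    delivery = env.find(f"{{{SOAP_NS}}}Header/{{{META_NS}}}Delivery")
    return {"reliable": delivery.get("reliable") == "true"}


def service(env: ET.Element) -> str:
    """The service inspects only the body; here it upper-cases the data."""
    data = env.find(f"{{{SOAP_NS}}}Body/{{{META_NS}}}Data")
    return data.text.upper()


wire = build_message("telescope image 42", reliable=True)
envelope = ET.fromstring(wire)
settings = handler(envelope)
result = service(envelope)
```

   Only the XML Infoset matters to the handler and the service, so the
   bytes on the wire could be swapped for a faster encoding without
   changing either function.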
What Type of Services are there?
   There is a horde of support services supplying security,
    collaboration, database access, user interfaces
   The support services are either associated with system or
     • We will study the WS-* and GS-* which implicitly or
       explicitly define many support services
   There are generalized filter services which are applications that
    accept messages and produce new messages with some data
    derived from that in input
     • Simulations (including PDE’s and reactive systems)
     • Data-mining
     • Transformations
     • Agents
     • Reasoning
       are all termed filters here
   There are services like “author ontology”, “parse RDF” or
    “attach provenance” that directly support Semantic Grid
   But all services and their interactions are bathed in sea of meta-
    data and so implicitly need and support the Semantic Grid
It’s a Composite Hierarchical World
   Filters can be a workflow which means they are “just collections
    of other simpler services”
     • One needs meta-data to control the workflow
   Services are programs that accept messages and produce
    result messages
   Grids are a distributed collection of services supporting
    managed shared resources
     • Management requires meta-data
   Grids are distributed systems that accept distributed messages
    and produce distributed result messages
     • Can always talk about Grids and view a service or a
        workflow as a special case of a Grid
   It just requires meta-data to send a message to a Grid and have it
    routed to “correct computer” holding “requested service”
     • Meta-data allows mapping of virtual to real addresses
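
   The composite view above can be sketched in Python under some loud
   assumptions: messages are plain dictionaries, the "to" key is the
   routing meta-data, and the names workflow, Grid, calibrate and
   summarize are invented for illustration. The sketch shows a workflow
   that is itself a filter, and a Grid that maps a virtual address to
   the real service.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Filter = Callable[[Message], Message]


def workflow(steps: List[Filter]) -> Filter:
    """A workflow is itself a filter: it runs simpler filters in order."""
    def run(msg: Message) -> Message:
        for step in steps:
            msg = step(msg)
        return msg
    return run


def calibrate(msg: Message) -> Message:
    """Hypothetical filter: scale every data point."""
    return {**msg, "data": [x * 2 for x in msg["data"]]}


def summarize(msg: Message) -> Message:
    """Hypothetical filter: reduce the data to a single number."""
    return {**msg, "data": sum(msg["data"])}


class Grid:
    """Maps virtual service names (meta-data) to the real service."""

    def __init__(self) -> None:
        self.registry: Dict[str, Filter] = {}

    def deploy(self, virtual_name: str, service: Filter) -> None:
        self.registry[virtual_name] = service

    def send(self, msg: Message) -> Message:
        # The "to" field is the only thing the sender needs to know.
        return self.registry[msg["to"]](msg)


grid = Grid()
grid.deploy("urn:example:reduce", workflow([calibrate, summarize]))
reply = grid.send({"to": "urn:example:reduce", "data": [1, 2, 3]})
```

   Because workflow returns something with the same signature as a
   single filter, the caller cannot tell a service from a workflow,
   which is the sense in which both are special cases of a Grid.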
The Grid and Web Service Institutional Hierarchy

    4: Application or Community of Interest Specific Services
       such as “Run BLAST” or “Look at Houses for sale”

    3: Generally Useful Services and Features
       such as “Access a Database” or “Submit a Job” or “Semantic
       Grid” or “Support a Portal” or “Collaborative Visualization”
       (… and other GGF/W3C/ ………)

    2: System Services and Features
       Handlers like WS-RM, Security, Programming Models like BPEL
       or Registries like UDDI
       (WS-* from …)

    1: Container and Run Time (Hosting) Environment
       (Apache Axis, .NET etc.)


           The WS-* Infrastructure
   Core Grid Services build on and/or extend the 60 or so
    WS-* Infrastructure specifications which define
    • 1. Container Model, XML, WSDL …
    • 2. Service Internet ( (Reliable) Messaging, Addressing)
      including extensions for high performance transport and
      representation. This is natural basis for streaming
    • 3. Notification
    • 4. Workflow and Transactions
    • 5. Security
    • 6. Service Discovery
    • 7. Metadata and State including lifetime
    • 8. Management (service interactions)
      [Callout beside these items: These categories are directly
       connected to metadata]
    • 9. Policy, Agreements
    • 10. Portals and User Interfaces
          A List of Web Services 6
• 6) Service Discovery
• UDDI (Broadly Supported OASIS Standard) V3 August
• WS-Discovery Web services Dynamic Discovery
  (Microsoft, BEA, Intel …) February 2004
• WS-IL Web Services Inspection Language, (IBM,
  Microsoft) November 2001
• Note WS-Context as a metadata catalog and WS-
  Management Catalog are examples of related services
• There are many UDDI extensions such as Grimoires from
  UK OMII which often are essentially providing semantic
  enrichment
   Discovery is just accessing part of meta-data
   defining Grid
          A List of Web Services 7
• 7) Metadata and State
• RDF Resource Description Framework (W3C) Set of
  recommendations expanded from original February 1999 standard
• DAML+OIL combining DAML (Darpa Agent Markup Language)
  and OIL (Ontology Inference Layer) (W3C) Note December 2001
• OWL Web Ontology Language (W3C) Recommendation February
• WS-MetadataExchange Web Services Metadata Exchange (BEA,
  IBM, Microsoft, SAP, Sun …) September 2004
• ASAP Asynchronous Service Access Protocol (OASIS) with V1.0
  working draft 2B December 11 2004
• WS-GAF Web Service Grid Application Framework (Arjuna,
  Newcastle University) August 2003
• WBEM Web-Based Enterprise Management including CIM
  (Common Information Model) from DMTF (Distributed
  Management Task Force) 2004-2005
        A List of Web Services 7
• 7) Metadata and State: Resource Framework
• WS-RF Web Services Resource Framework (OASIS)
• WS-Resource Framework Web Services Resource 1.2
  (OASIS) Public Review Draft 01, 10 June 2005
• WS-ResourceProperties Web Services Resource
  Properties V1.2 Public Review Draft 01, 10 June 2005
• WS-ResourceLifetime Web Services Resource Lifetime
  V1.2 Public Review Draft 01, 13 June 2005
• WS-ServiceGroup Web Services Service Group V1.2
  Public Review Draft 01, 10 June 2005
• WS-BaseFaults Web Services Base Faults V1.2 Public
  Review Draft 01, June 13, 2005
  These WS-* define syntax of Meta-data (RDF
  OWL CIM) and how to use it in system
  (WS-MetadataExchange) – especially headers (WS-RF)
          Metadata and Service Context
   Consider a collection of services working together
     • Workflow tells you how to specify service
       interaction but more basically there is shared
       information or context specifying/controlling
   WS-RF and WS-GAF have different approaches to
    contextualization – supplying a common “context”
    which at its simplest is a token to represent state
   More generally core shared information includes
    dynamic service metadata and the equivalent of
    configuration information.
   Two services linked by a stream are perhaps simplest
    example of a collection of services needing context
   Note that there is a tension between storing
    metadata in messages and services.
     • This is the shared versus distributed memory debate in
       parallel computing
               Stateful Interactions
   There are (at least) four approaches to specifying state
     • OGSI use factories to generate separate services for
       each session in standard distributed object fashion
     • Globus GT-4 and WSRF use metadata of a resource
       to identify state associated with particular session
     • WS-GAF uses WS-Context to provide abstract
       context defining state. Has strength and weakness
       that reveals less about nature of session
     • WS-I+ “Pure Web Service” leaves state specification
       to the application – e.g. put a context in the SOAP body
   I think we should smile and write a great metadata
    (semantic) service hiding all these different models for
    state and metadata
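
   Two of the approaches above can be contrasted in a small Python
   sketch. Everything here is an assumption made for illustration:
   messages are dictionaries, ContextStore is an invented in-memory
   stand-in for a context service, and no real WS-Context or WS-RF API
   is used. Style A keeps state in the message body; style B passes
   only an opaque token and keeps state in a shared store.

```python
import uuid
from typing import Dict


# Style A: the application carries its own state in the message body.
def counter_in_body(msg: Dict) -> Dict:
    return {"state": {"count": msg["state"]["count"] + 1}}


# Style B: the message carries only an opaque context token;
# the state itself lives in a shared store.
class ContextStore:
    """Hypothetical in-memory stand-in for a context service."""

    def __init__(self) -> None:
        self._contexts: Dict[str, Dict] = {}

    def begin(self) -> str:
        token = str(uuid.uuid4())
        self._contexts[token] = {"count": 0}
        return token

    def get(self, token: str) -> Dict:
        return self._contexts[token]


store = ContextStore()


def counter_with_context(msg: Dict) -> Dict:
    state = store.get(msg["context"])
    state["count"] += 1
    return {"context": msg["context"]}  # the token reveals nothing about the state


a = counter_in_body(counter_in_body({"state": {"count": 0}}))
token = store.begin()
b = counter_with_context(counter_with_context({"context": token}))
```

   A metadata service of the kind proposed above would sit in front of
   both styles, so callers would not need to know which one a given
   service uses.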
               Role of WS-Context
   There are many WS-* specifications addressing meta-data,
    with many approaches and many trade-offs
   We hear about Distributed Hash Tables (Chord) to achieve
    scalability in large scale networks
   Managed dynamic workflows as in sensor integration and
    collaboration require
     • Fault-tolerance and ability to support dynamic changes
       with few millisecond delay
     • But only a modest number of involved services (up to
       1000’s in a session)
     • Need Session NOT Service/Resource meta-data so don’t
       use WS-RF
   We are building a WS-Context compliant metadata catalog
    supporting distributed or central paradigms – see later talk
    by Mehmet Aktas
   Use for OGC Web catalog service with UDDI for slowly
    varying meta-data
       A List of Web Services 8
• 8) Management
• WS-DistributedManagement Web Services
  Distributed Management Framework with MUWS
  and MOWS below (OASIS)
• WSDM-MUWS Web Services Distributed
  Management: Management Using Web Services
  (OASIS) OASIS Standard March 9 2005
• WSDM-MOWS Web Services Distributed
  Management: Management of Web Services
  (OASIS) OASIS Standard March 9 2005
   A List of Web Services 8- Contd
• 8) Management: Microsoft Stack
• WS-Management Web Services for Management
  (Microsoft, Intel, Sun …) August 2005
• WS-Management Catalog The WS-Management
  Catalog (Microsoft, Intel, Sun …) August 2005
• WS-Transfer Web Service Transfer (Microsoft,
  BEA, Sonic Software etc.) September 2004
• WS-Enumeration Web Service Enumeration
  (Microsoft, BEA, Sonic Software etc.) September
  These WS-* define exchange of data and meta-data
  between services
         A List of Web Services 9
• 9) General Service Characteristics
• WS-PolicyFramework Web Services Policy
  Framework (BEA, IBM, Microsoft, SAP …)
  September 2004
• WS-PolicyAttachment Web Services Policy
  Attachment (BEA, IBM, Microsoft, SAP …)
  September 2004
• WS-PolicyAssertions Web Services Policy Assertions
  Language (BEA, IBM, Microsoft, SAP) 18 December
  2002 (Superseded by WS-PolicyFramework)
• WS-Agreement Web Services Agreement
  Specification (GGF under development) August 2004
  These WS-* define syntax of Meta-data defining
  structure of distributed System
  Grids are managed (meta-data enhanced)
  distributed collections of Internet Scale services
Activities in Global Grid Forum Working Groups
GGF Area                                   Standards Activities
1: Architecture     High Level Resource/Service Naming (level 2 of fig. 1),
                    Integrated Grid Architecture
2: Applications     Software Interfaces to Grid, Grid Remote Procedure Call,
                    Checkpointing and Recovery, Interoperability to Job Submittal services,
                    Information Retrieval
3: Compute          Job Submission, Basic Execution Services, Service Level Agreements
                    for Resource use and reservation, Distributed Scheduling
4: Data             Database and File Grid access, Grid FTP, Storage Management, Data
                    replication, Binary data specification and interface, High-level
                    publish/subscribe, Transaction management
5: Infrastructure   Network measurements, Role of IPv6 and high performance
                    networking, Data transport
6: Management       Resource/Service configuration, deployment and lifetime, Usage
                    records and access, Grid economy model
7: Security         Authorization, P2P and Firewall Issues, Trusted Computing
          Use the sea of meta-data supported by Semantic Grid
         Two-level Programming I
• The Web Service (Grid) paradigm implicitly assumes a
  two-level Programming Model
• We make a Service (same as a “distributed object” or
  “computer program” running on a remote computer) using
  conventional technologies
   – C++ Java or Fortran Monte Carlo module
   – Data streaming from a sensor or Satellite
   – Specialized (JDBC) database access
• Such services accept and produce data from users files and
• The Grid is built by coordinating such services assuming
  we have solved problem of programming the service
         Two-level Programming II
   The Grid is discussing the composition of distributed
    services with the runtime interfaces to Grid in
    analogy to UNIX pipes/data streams
    [Figure: Service1, Service2, Service3, Service4]
   Familiar from use of UNIX Shell, PERL or Python
    scripts to produce real applications from core programs
   Such interpretative environments are the single
    processor analog of Grid Programming
   Some projects like GrADS from Rice University are
    looking at integration between service and composition
    levels but dominant effort looks at each level separately
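
   The UNIX-pipe analogy can be sketched on a single processor with
   Python generators. The stage names (sensor, threshold, running_mean)
   and the sample readings are invented for illustration. Each stage
   stands in for a service written with conventional technology, and
   the last line is the composition level.

```python
from typing import Iterable, Iterator


def sensor(readings: Iterable[float]) -> Iterator[float]:
    """Stand-in for a data-streaming service."""
    for reading in readings:
        yield reading


def threshold(stream: Iterable[float], minimum: float) -> Iterator[float]:
    """Pass on only the values at or above the minimum."""
    for value in stream:
        if value >= minimum:
            yield value


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Emit the mean of all values seen so far."""
    total = 0.0
    for count, value in enumerate(stream, start=1):
        total += value
        yield total / count


# Composition level: analogous to `sensor | threshold | running_mean`.
pipeline = running_mean(threshold(sensor([1.0, 5.0, 3.0, 7.0]), minimum=3.0))
means = list(pipeline)
```

   Each stage pulls one value at a time from the stage before it, so
   data streams through the chain rather than being collected first.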
        3 Layer Programming Model
            Substantial work in UK e-Science program,
              international semantic web community
    Information Architecture and Semantic Grid

   WS-* provides key low level capability but deliberately
    does not define an information (data) architecture and
    leaves this to domain specific specification activities such
    as CellML/SBML for biology, WFS/GML for GIS and
    XGSP for Collaboration
   WS-* does define a primitive service discovery (UDDI)
    and meta-data capabilities including WS-Context, WS-
    RF, RDF and WS-MetadataExchange already discussed.
   GGF defines Grid data capabilities including info-D
    (publish/subscribe) and OGSA-DAI for data repositories
   Semantic Grid uses WS-* and GS-* extending meta-data
    and service discovery with data-mining and reasoning
    3 XML Databases of Importance
   WS-Context controlling a workflow
   (Extended) UDDI supporting semantic service discovery
   WFS or ASFS (see later) providing an application specific
    data/meta-data repository
   These have different performance, scalability and data unit size
   In our implementation, each is currently “just an
    Oracle/MySQL” database front ended by filters that convert
    between XML (GML for WFS) and object-relational Schema
     • Example of Semantics (XML) versus representation (SQL)
   OGSA-DAI offers Grid interface to databases – we could use but
    don’t as we only need to expose WFS and not MySQL to Grid
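
   A hedged Python sketch of the "XML outside, relational inside"
   pattern described above. It swaps in the standard-library sqlite3
   module for Oracle/MySQL, and the feature element with lat and lon
   children is a made-up schema, not real GML or the WFS interface.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_feature(db: sqlite3.Connection, xml_text: str) -> None:
    """Input filter: parse an XML feature and store it as a relational row."""
    feature = ET.fromstring(xml_text)
    db.execute(
        "INSERT INTO features (name, lat, lon) VALUES (?, ?, ?)",
        (
            feature.get("name"),
            float(feature.findtext("lat")),
            float(feature.findtext("lon")),
        ),
    )


def query_features(db: sqlite3.Connection, min_lat: float) -> str:
    """Output filter: run SQL, then rebuild XML from the rows."""
    root = ET.Element("features")
    rows = db.execute(
        "SELECT name, lat, lon FROM features WHERE lat >= ? ORDER BY name",
        (min_lat,),
    )
    for name, lat, lon in rows:
        feature = ET.SubElement(root, "feature", name=name)
        ET.SubElement(feature, "lat").text = str(lat)
        ET.SubElement(feature, "lon").text = str(lon)
    return ET.tostring(root, encoding="unicode")


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE features (name TEXT, lat REAL, lon REAL)")
store_feature(db, '<feature name="fault-A"><lat>34.5</lat><lon>-118.0</lon></feature>')
store_feature(db, '<feature name="fault-B"><lat>10.0</lat><lon>20.0</lon></feature>')
xml_out = query_features(db, min_lat=30.0)
```

   Clients see only XML in and XML out; the table layout can change
   without affecting them, which is the semantics versus representation
   point made above.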

Information Management/Processing
   SOAP messages transport information expressed in a
    semantically rich fashion between sources and services that
    enhance and transform information so that complete system
     • Semantic Web technologies like RDF and OWL help us have
       rich expressivity
   Data  Information  Knowledge transformation
   We build application specific information
    management/transformation systems ASIS for each application
   One special domain is the system itself where the metadata
    associated with services, sessions, Grids, messages, streams and
    workflow is itself managed and supported by an SIIS

                 Generalizing a GIS
   Geographical Information Systems GIS have been
    hugely successful in all fields that study the earth and
    related worlds
    • They define Geography Syntax (GML) and ways to store,
      access, query, manipulate and display geographical features
    • In SOA, GIS corresponds to a domain specific XML language
      and a suite of services for different functions above
   However such a universal information model has not
    been developed in other areas even though there are
    many fields in which it appears possible
    •   BIS Biological Information System
    •   MIS Military Information System
    •   IRIS Information Retrieval Information System
    •   PAIS Physics Analysis Information System
    •   SIIS Service Infrastructure Information System
ASIS Application Specific Information System I
   a) Discovery capabilities that are best done using WS-*
   b) Domain specific metadata and data including
    search/store/access interface (cf WFS). Let’s call generalization
    ASFS (Application Specific Feature Service)
    • Language to express domain specific features (cf GML). Let’s call
      this ASL (Application Specific Language)
    • Tools to manipulate information expressed in language and key
      data of application (cf coordinate transformations). Let’s call this
      ASTT (Application Specific Tools and Transformations)
    • ASL must support Data sources such as sensors (cf OGC metadata
      and data sensor standards) and repositories. Sensors need
      (common across applications) support of streams of data
    • Queries need to support archived (find all relevant data in past)
        and streaming (find all data in future with given properties)
    • Note all AS Services behave like Sensors and all sensors are
      wrapped as services
    • Any domain will have “raw data” (binary) and that which has been
      filtered to ASL. Let’s call the former ASBD (Application Specific
      Binary Data)
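
   The two query styles listed above (archived and streaming) can be
   sketched in Python. The FeatureService class, its method names and
   the sample records are all invented for illustration; this is not
   the WFS or any OGC interface. Features are plain dictionaries.

```python
from typing import Any, Callable, Dict, List, Tuple

Feature = Dict[str, Any]
Predicate = Callable[[Feature], bool]
Callback = Callable[[Feature], None]


class FeatureService:
    """Hypothetical ASFS-like store supporting both query styles."""

    def __init__(self) -> None:
        self._archive: List[Feature] = []
        self._subscriptions: List[Tuple[Predicate, Callback]] = []

    def query_archive(self, wanted: Predicate) -> List[Feature]:
        """Archived query: find all relevant data in the past."""
        return [f for f in self._archive if wanted(f)]

    def subscribe(self, wanted: Predicate, callback: Callback) -> None:
        """Streaming query: be told about matching data in the future."""
        self._subscriptions.append((wanted, callback))

    def publish(self, feature: Feature) -> None:
        """A sensor (or any service acting like one) delivers new data."""
        self._archive.append(feature)
        for wanted, callback in self._subscriptions:
            if wanted(feature):
                callback(feature)


service = FeatureService()
service.publish({"id": 1, "magnitude": 2.0})

alerts: List[Feature] = []
service.subscribe(lambda f: f["magnitude"] >= 5.0, alerts.append)
service.publish({"id": 2, "magnitude": 6.1})
service.publish({"id": 3, "magnitude": 3.0})

past_small = service.query_archive(lambda f: f["magnitude"] < 5.0)
```

   The same predicate type serves both query styles, so a client can
   ask one question about the past and the future.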
ASIS Application Specific Information System II
   Let’s call this ASVS (Application Specific Visualization Services)
    generalizing WMS for GIS
   The ASVS should both visualize information and provide a way of
    navigating (cf GetFeatureInfo) database (the ASFS)
   The ASVS can itself be federated and presents an ASFS output
   d) There should be an application service interface for ASIS from which all
    ASIS services inherit
   e) There will be other user services interfacing to ASIS
   All user and system services will input and output data in ASL using
    filters to cope with ASBD
   [Figure labels: Filter, Transformation, Reasoning, Data-mining, Analysis;
    AS; AS Tool; AS Service; AS Tool; ASVS]
 Semantically Rich Services with a Semantically
   Rich Distributed Operating Environment
   Virtualization everywhere
   Focus on semantics not representation to get
    performance combined with expressivity for transport
    and data access
   All this enabled by powerful meta-data services
   Grids add management to rich but potentially chaotic
    set of Web Services;
    • management and coherence enabled by meta-data
   Can define general information architectures (ASIS,
    GIS, SIIS) for both applications and system
   Knowledge from filters that span simulations, data-
    mining, reasoning and agents
   A service is just a special case of a Grid
   Build systems from SubGrids (Gridlets)