Services and the Semantic Grid - Indiana University
Services and the Semantic Grid
SKG2005 Beijing China November 28 2005
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
gcf@indiana.edu
http://www.infomall.org
Data Deluged Science
In the past, we worried about data in the form of parallel I/O or
MPI-IO, but we didn’t consider it as an enabler of new science
and new ways of computing
Data assimilation was not central to HPCC
DoE ASCI set up because didn’t want test data!
Now particle physics will get 100 petabytes from CERN
• Nuclear physics (Jefferson Lab) in same situation
• Use around 30,000 CPU’s simultaneously 24X7
Weather, climate, solid earth (EarthScope)
Bioinformatics curated databases (Biocomplexity only 1000’s of
data points at present)
Virtual Observatory and SkyServer in Astronomy
Environmental Sensor nets
Information/Knowledge Grids
10’s to 1000’s of distributed data sources (instruments,
file systems, curated databases …)
Data Deluge: 1 (now) to 100’s petabytes/year (2012)
• Moore’s law for Sensors
Possible filters assigned dynamically (on-demand)
• Run image processing algorithm on telescope image
• Run Gene sequencing algorithm on compiled data
Needs decision support front end with “what-if”
simulations
Metadata (provenance) critical to annotate data
Integrate across experiments as in multi-wavelength astronomy
Data Deluge comes from pixels/year available
Semantically Rich Services with a Semantically
Rich Distributed Operating Environment
[Figure: a network of interconnected services. Legend: Sensor Service (SS), Filter Service (FS), MetaData (MD), Other Service, Database]
Semantic Grid and Services
Implications of SOA (Service Oriented Architectures) for SG
(Semantic Grid)
• Build services to implement SG
Implications of SG for SOA
• Build metadata rich systems of services using SG
Services receive data in SOAP messages, manipulate it and
produce transformed data as further messages
Meta-data is carried in SOAP messages
Meta-data controls processing and transport of SOAP Messages
Knowledge is created from data by services
The Grid enhances Web services with semantically rich system
and application specific management
One must exploit and work around the different approaches to
meta-data and their manipulation in Web Services
Structure of SOAP Messages
[Figure: SOAP message layout H1 H2 H3 H4 | Body | F1 F2 F3 F4, with labels Container Handlers, Container Workflow and Service]
SOAP Messages have System information in the header
including WS-Policy based meta-data defining processing
options
• Processed by Handlers
Application data and meta-data are in the body (controversies here!)
• Processed by the Service itself
Some meta-data, like that in WS-RF, is logically “only in messages”
Other meta-data, like that in WS-Context or the SRB, is stored in the logical
equivalent of XML databases
We only need to preserve semantic structure (XML/SOAP
Infoset), so we can transport messages in fast XML and store them in efficient
relational databases
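A minimal sketch of the header/body split described on this slide, using only the Python standard library. The `urn:example:meta` namespace, the header names and the upper-casing "service" are illustrative assumptions, not part of any WS-* specification.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
META_NS = "urn:example:meta"  # hypothetical namespace, for illustration only


def build_message(headers, payload):
    """Build a SOAP envelope: system meta-data in the header, data in the body."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(env, f"{{{SOAP_NS}}}Header")
    for name, value in headers.items():
        ET.SubElement(header, f"{{{META_NS}}}{name}").text = value
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    ET.SubElement(body, f"{{{META_NS}}}Data").text = payload
    return ET.tostring(env, encoding="unicode")


def run_handlers(xml_text):
    """Container handlers: read the header blocks, pass the body on untouched."""
    env = ET.fromstring(xml_text)
    header = env.find(f"{{{SOAP_NS}}}Header")
    meta = {child.tag.split("}")[1]: child.text for child in header}
    return meta, env.find(f"{{{SOAP_NS}}}Body")


def service(body):
    """The service itself only sees the body (a stand-in transformation)."""
    return body.find(f"{{{META_NS}}}Data").text.upper()


message = build_message({"Reliability": "exactly-once", "Priority": "high"}, "raw reading")
meta, body = run_handlers(message)
result = service(body)
```

Only the Infoset (elements, names, text) matters to the handlers and the service here; the same structure could be carried in a faster binary encoding without changing either function's logic.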
What Type of Services are there?
There is a horde of support services supplying security,
collaboration, database access, user interfaces
The support services are either associated with system or
application
• We will study the WS-* and GS-* which implicitly or
explicitly define many support services
There are generalized filter services, which are applications that
accept messages and produce new messages with some data
derived from that in the input
• Simulations (including PDE’s and reactive systems)
• Data-mining
• Transformations
• Agents
• Reasoning
These are all termed filters here
There are services like “author ontology”, “parse RDF” or
“attach provenance” that directly support Semantic Grid
But all services and their interactions are bathed in a sea of meta-data
and so implicitly need and support the Semantic Grid
It’s a Composite Hierarchical World
Filters can be a workflow which means they are “just collections
of other simpler services”
• One needs meta-data to control the workflow
Services are programs that accept messages and produce
messages
Grids are a distributed collection of services supporting
managed shared resources
• Management requires meta-data
Grids are distributed systems that accept distributed messages
and produce distributed result messages
• Can always talk about Grids and view a service or a
workflow as a special case of a Grid
It just requires meta-data to send a message to a Grid and have it
routed to the “correct computer” holding the “requested service”
• Meta-data allows mapping of virtual to real addresses
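A small sketch of the hierarchy on this slide: services are functions from message to message, a workflow is a composition of services that is itself a service, and a Grid uses meta-data (a registry) to map a virtual name to a real address. All names (`compose`, `Grid`, `clean-data`, `host-a.example:8080`) are hypothetical.

```python
from typing import Any, Callable, Dict

Message = Dict[str, Any]
Service = Callable[[Message], Message]


def compose(*services: Service) -> Service:
    """A workflow is 'just a collection of simpler services' applied in order."""
    def workflow(msg: Message) -> Message:
        for svc in services:
            msg = svc(msg)
        return msg
    return workflow


def calibrate(msg: Message) -> Message:
    return {**msg, "value": msg["value"] * 2}


def annotate(msg: Message) -> Message:
    return {**msg, "provenance": msg.get("provenance", []) + ["annotate"]}


class Grid:
    """Routes a message addressed to a virtual name to the registered service."""

    def __init__(self) -> None:
        self.registry: Dict[str, tuple] = {}  # virtual name -> (real address, service)

    def register(self, virtual_name: str, real_address: str, service: Service) -> None:
        self.registry[virtual_name] = (real_address, service)

    def send(self, virtual_name: str, msg: Message) -> Message:
        real_address, service = self.registry[virtual_name]
        return {**service(msg), "served_by": real_address}


grid = Grid()
grid.register("clean-data", "host-a.example:8080", compose(calibrate, annotate))
reply = grid.send("clean-data", {"value": 21})
```

The caller never learns whether `clean-data` is a single service or a workflow, which is the sense in which a service or workflow can be viewed as a special case of a Grid.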
Semantically Rich Services with a Semantically
Rich Distributed Operating Environment
[Figure: Grids of Grids Architecture. SOAP Message Streams carry Raw Data, Data, Information, Knowledge, Wisdom and Decisions among Sensor Services (SS), Filter Services (FS), Other Services, MetaData (MD) and Databases, with links to “Another Service” and “Another Grid”. Legend: Sensor Service, Filter Service, MetaData, Other Service, Database, Grid]
The Grid and Web Service Institutional Hierarchy
4: Application or Community of Interest Specific Services
such as “Run BLAST” or “Look at Houses for sale”
3: Generally Useful Services and Features (OGSA and other GGF/W3C/………)
such as “Access a Database” or “Submit a Job” or “Semantic Grid” or “Support a Portal” or “Collaborative Visualization”
2: System Services and Features (WS-* from OASIS/W3C/Industry)
Handlers like WS-RM, Security, Programming Models like BPEL or Registries like UDDI
1: Container and Run Time (Hosting) Environment (Apache Axis, .NET etc.)
The WS-* Infrastructure
Core Grid Services build on and/or extend the 60 or so
WS-* Infrastructure specifications which define
• 1. Container Model, XML, WSDL …
• 2. Service Internet ( (Reliable) Messaging, Addressing)
including extensions for high performance transport and
representation. This is natural basis for streaming
applications
• 3. Notification
• 4. Workflow and Transactions
• 5. Security
• 6. Service Discovery
• 7. Metadata and State including lifetime
• 8. Management (service interactions)
These categories are directly connected to metadata
• 9. Policy, Agreements
• 10. Portals and User Interfaces
A List of Web Services 6
• 6) Service Discovery
• UDDI (Broadly Supported OASIS Standard) V3 August
2003
• WS-Discovery Web services Dynamic Discovery
(Microsoft, BEA, Intel …) February 2004
• WS-IL Web Services Inspection Language, (IBM,
Microsoft) November 2001
• Note WS-Context as a metadata catalog and WS-Management
Catalog are examples of related services
• There are many UDDI extensions such as Grimoires from
UK OMII which often are essentially providing semantic
enrichment
Discovery is just accessing part of meta-data defining Grid
A List of Web Services 7
• 7) Metadata and State
• RDF Resource Description Framework (W3C) Set of
recommendations expanded from original February 1999 standard
• DAML+OIL combining DAML (Darpa Agent Markup Language)
and OIL (Ontology Inference Layer) (W3C) Note December 2001
• OWL Web Ontology Language (W3C) Recommendation February
2004
• WS-MetadataExchange Web Services Metadata Exchange (BEA,
IBM, Microsoft, SAP, Sun …) September 2004
• ASAP Asynchronous Service Access Protocol (OASIS) with V1.0
working draft 2B December 11 2004
• WS-GAF Web Service Grid Application Framework (Arjuna,
Newcastle University) August 2003
• WBEM Web-Based Enterprise Management including CIM
(Common Information Model) from DMTF (Distributed
Management Task Force) 2004-2005
A List of Web Services 7
• 7) Metadata and State: Resource Framework
• WS-RF Web Services Resource Framework (OASIS)
including
• WS-Resource Framework Web Services Resource 1.2
(OASIS) Public Review Draft 01, 10 June 2005
• WS-ResourceProperties Web Services Resource
Properties V1.2 Public Review Draft 01, 10 June 2005
• WS-ResourceLifetime Web Services Resource Lifetime
V1.2 Public Review Draft 01, 13 June 2005
• WS-ServiceGroup Web Services Service Group V1.2
Public Review Draft 01, 10 June 2005
• WS-BaseFaults Web Services Base Faults V1.2 Public
Review Draft 01, June 13, 2005
These WS-* define syntax of Meta-data (RDF, OWL, CIM) and how to use it in system (WS-MetadataExchange), especially headers (WS-RF)
Metadata and Service Context
Consider a collection of services working together
• Workflow tells you how to specify service
interaction but more basically there is shared
information or context specifying/controlling
collection
WS-RF and WS-GAF have different approaches to
contextualization – supplying a common “context”
which at its simplest is a token to represent state
More generally core shared information includes
dynamic service metadata and the equivalent of
configuration information.
Two services linked by a stream are perhaps simplest
example of a collection of services needing context
Note that there is a tension between storing
metadata in messages and services.
• This is the shared versus distributed memory debate in
parallel computing
Stateful Interactions
There are (at least) four approaches to specifying state
• OGSI use factories to generate separate services for
each session in standard distributed object fashion
• Globus GT-4 and WSRF use metadata of a resource
to identify state associated with particular session
• WS-GAF uses WS-Context to provide abstract
context defining state. Has strength and weakness
that reveals less about nature of session
• WS-I+ “Pure Web Service” leaves state specification to
the application, e.g. put a context in the SOAP body
I think we should smile and write a great metadata
(semantic) service hiding all these different models for
state and metadata
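As a rough illustration of the context-token idea (state held by a shared context service, with messages carrying only an opaque token), here is a minimal in-memory sketch. `ContextStore` and `counting_service` are hypothetical names; this is not the WS-Context API, only the pattern it supports.

```python
import uuid


class ContextStore:
    """In-memory stand-in for a shared context (session meta-data) service."""

    def __init__(self):
        self._contexts = {}

    def begin(self, **state):
        """Start a session and return the token that represents its state."""
        token = str(uuid.uuid4())
        self._contexts[token] = dict(state)
        return token

    def get(self, token):
        return self._contexts[token]

    def update(self, token, **changes):
        self._contexts[token].update(changes)


def counting_service(store, message):
    """A stateless service: all session state lives behind the context token."""
    token = message["context"]
    count = store.get(token).get("count", 0) + 1
    store.update(token, count=count)
    return {"context": token, "body": f"{message['body']} #{count}"}


store = ContextStore()
token = store.begin(user="alice")
first = counting_service(store, {"context": token, "body": "hello"})
second = counting_service(store, {"context": token, "body": "hello"})
```

The token reveals nothing about the nature of the session, which is the strength and weakness noted for the WS-GAF approach above.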
Role of WS-Context
There are many WS-* specifications addressing meta-data
and both many approaches and many trade-offs
We hear about Distributed Hash Tables (Chord) to achieve
scalability in large scale networks
Managed dynamic workflows as in sensor integration and
collaboration require
• Fault-tolerance and ability to support dynamic changes
with few millisecond delay
• But only a modest number of involved services (up to
1000’s in a session)
• Need Session NOT Service/Resource meta-data so don’t
use WS-RF
We are building a WS-Context compliant metadata catalog
supporting distributed or central paradigms – see later talk
by Mehmet Aktas
Use for OGC Web catalog service with UDDI for slowly
varying meta-data
A List of Web Services 8
• 8) Management
• WS-DistributedManagement Web Services
Distributed Management Framework with MUWS
and MOWS below (OASIS)
• WSDM-MUWS Web Services Distributed
Management: Management Using Web Services
(OASIS) OASIS Standard March 9 2005
• WSDM-MOWS Web Services Distributed
Management: Management of Web Services
(OASIS) OASIS Standard March 9 2005
A List of Web Services 8- Contd
• 8) Management: Microsoft Stack
• WS-Management Web Services for Management
(Microsoft, Intel, Sun …) August 2005
• WS-Management Catalog The WS-Management
Catalog (Microsoft, Intel, Sun …) August 2005
• WS-Transfer Web Service Transfer (Microsoft,
BEA, Sonic Software etc.) September 2004
• WS-Enumeration Web Service Enumeration
(Microsoft, BEA, Sonic Software etc.) September
2004
These WS-* define exchange of data and meta-data
between services
A List of Web Services 9
• 9) General Service Characteristics
• WS-PolicyFramework Web Services Policy
Framework (BEA, IBM, Microsoft, SAP …)
September 2004
• WS-PolicyAttachment Web Services Policy
Attachment (BEA, IBM, Microsoft, SAP …)
September 2004
• WS-PolicyAssertions Web Services Policy Assertions
Language (BEA, IBM, Microsoft, SAP) 18 December
2002 (Superseded by WS-PolicyFramework)
• WS-Agreement Web Services Agreement
Specification (GGF under development) 9 August 2004
These WS-* define syntax of Meta-data defining
structure of distributed System
Grids are managed (meta-data enhanced)
distributed collections of Internet Scale services
Activities in Global Grid Forum Working Groups
GGF Area: Standards Activities
Area 1, Architecture: High Level Resource/Service Naming (level 2 of fig. 1), Integrated Grid Architecture
Area 2, Applications: Software Interfaces to Grid, Grid Remote Procedure Call, Checkpointing and Recovery, Interoperability to Job Submittal services, Information Retrieval
Area 3, Compute: Job Submission, Basic Execution Services, Service Level Agreements for Resource use and reservation, Distributed Scheduling
Area 4, Data: Database and File Grid access, Grid FTP, Storage Management, Data replication, Binary data specification and interface, High-level publish/subscribe, Transaction management
Area 5, Infrastructure: Network measurements, Role of IPv6 and high performance networking, Data transport
Area 6, Management: Resource/Service configuration, deployment and lifetime, Usage records and access, Grid economy model
Area 7, Security: Authorization, P2P and Firewall Issues, Trusted Computing
Use the sea of meta-data supported by Semantic Grid
Two-level Programming I
• The Web Service (Grid) paradigm implicitly assumes a
two-level Programming Model
• We make a Service (same as a “distributed object” or
“computer program” running on a remote computer) using
conventional technologies
– C++ Java or Fortran Monte Carlo module
– Data streaming from a sensor or Satellite
– Specialized (JDBC) database access
• Such services accept and produce data from users, files and
databases
[Figure: Service connected to Data]
• The Grid is built by coordinating such services assuming
we have solved the problem of programming the service
Two-level Programming II
The Grid is discussing the composition of distributed
services with the runtime interfaces to Grid in
analogy to UNIX pipes/data streams
[Figure: Service1, Service2, Service3, Service4]
Familiar from use of UNIX Shell, PERL or Python
scripts to produce real applications from core programs
Such interpretative environments are the single
processor analog of Grid Programming
Some projects like GrADS from Rice University are
looking at integration between service and composition
levels but dominant effort looks at each level separately
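The single-processor analog mentioned above can be sketched with Python generators: each stage is written on its own (level one) and the application is produced by chaining stages like a UNIX pipe (level two). The stage names and sample readings are made up for illustration.

```python
def sensor(readings):
    """Source stage: emits raw readings one at a time."""
    for reading in readings:
        yield reading


def threshold(stream, minimum):
    """Filter stage: passes on only readings at or above the minimum."""
    for value in stream:
        if value >= minimum:
            yield value


def running_mean(stream):
    """Aggregating stage: emits the mean of everything seen so far."""
    total = 0.0
    for count, value in enumerate(stream, start=1):
        total += value
        yield total / count


# Composition level: roughly `sensor | threshold 4 | running_mean` in shell terms.
pipeline = running_mean(threshold(sensor([1, 5, 3, 7, 9]), minimum=4))
results = list(pipeline)
```

In a Grid the stages would be remote services and the links SOAP message streams, but the separation between writing a stage and composing stages is the same.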
3 Layer Programming Model
[Figure: Web Service 1, WS 2, WS N-1, Web Service N]
Level 1 Programming: inside services
Application expressed in Java, Fortran, C++, MPI etc.
WS-* Infrastructure
Level 2 Programming: choosing services by virtualization
Application Semantics (Metadata, Ontology), Semantic Grid
Level 3 Grid Programming: composing multiple services
Service Workflow, Transactions, Mediation
Substantial work in UK e-Science program,
international semantic web community
Information Architecture and Semantic Grid
WS-* provides key low level capability but deliberately
does not define an information (data) architecture and
leaves this to domain specific specification activities such
as CellML/SBML for biology, WFS/GML for GIS and
XGSP for Collaboration
WS-* does define a primitive service discovery (UDDI)
and meta-data capabilities including WS-Context, WS-
RF, RDF and WS-MetadataExchange already discussed.
GGF defines Grid data capabilities including info-D
(publish/subscribe) and OGSA-DAI for data repositories
Semantic Grid uses WS-* and GS-* extending meta-data
and service discovery with data-mining and reasoning
3 XML Databases of Importance
WS-Context controlling a workflow
(Extended) UDDI supporting semantic service discovery
WFS or ASFS (see later) providing an application specific
data/meta-data repository
These have different performance, scalability and data unit size
requirements
In our implementation, each is currently “just an
Oracle/MySQL” database front ended by filters that convert
between XML (GML for WFS) and object-relational Schema
• Example of Semantics (XML) versus representation (SQL)
difference
OGSA-DAI offers Grid interface to databases – we could use but
don’t as we only need to expose WFS and not MySQL to Grid
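The "semantics (XML) versus representation (SQL)" point can be illustrated with a toy filter pair that accepts and returns XML while storing rows in a relational table. This sketch uses SQLite and a made-up `<feature>` element rather than Oracle/MySQL and real GML; the table and element names are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_feature(conn, xml_text):
    """Inbound filter: XML on the wire becomes a relational row."""
    feature = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO features (name, lat, lon) VALUES (?, ?, ?)",
        (feature.findtext("name"),
         float(feature.findtext("lat")),
         float(feature.findtext("lon"))),
    )


def load_feature(conn, name):
    """Outbound filter: the relational row is re-expressed as XML."""
    row = conn.execute(
        "SELECT name, lat, lon FROM features WHERE name = ?", (name,)
    ).fetchone()
    feature = ET.Element("feature")
    for tag, value in zip(("name", "lat", "lon"), row):
        ET.SubElement(feature, tag).text = str(value)
    return ET.tostring(feature, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (name TEXT PRIMARY KEY, lat REAL, lon REAL)")
store_feature(conn, "<feature><name>station-1</name><lat>39.17</lat><lon>-86.52</lon></feature>")
xml_out = ET.fromstring(load_feature(conn, "station-1"))
```

Clients see only the XML interface, so the storage engine behind the filters could be swapped without changing what is exposed to the Grid.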
Information Management/Processing
SOAP messages transport information expressed in a
semantically rich fashion between sources and services that
enhance and transform information so that the complete system
provides Data to Information to Knowledge transformation
• Semantic Web technologies like RDF and OWL help us have
rich expressivity
We build application specific information
management/transformation systems ASIS for each application
domain
One special domain is the system itself where the metadata
associated with services, sessions, Grids, messages, streams and
workflow is itself managed and supported by an SIIS
Generalizing a GIS
Geographical Information Systems GIS have been
hugely successful in all fields that study the earth and
related worlds
• They define Geography Syntax (GML) and ways to store,
access, query, manipulate and display geographical features
• In SOA, GIS corresponds to a domain specific XML language
and a suite of services for different functions above
However such a universal information model has not
been developed in other areas even though there are
many fields in which it appears possible
• BIS Biological Information System
• MIS Military Information System
• IRIS Information Retrieval Information System
• PAIS Physics Analysis Information System
• SIIS Service Infrastructure Information System
ASIS Application Specific Information System I
a) Discovery capabilities that are best done using WS-*
standards
b) Domain specific metadata and data including
search/store/access interface (cf WFS). Let’s call the generalization
ASFS (Application Specific Feature Service)
• Language to express domain specific features (cf GML). Let’s call
this ASL (Application Specific Language)
• Tools to manipulate information expressed in language and key
data of application (cf coordinate transformations). Let’s call this
ASTT (Application Specific Tools and Transformations)
• ASL must support Data sources such as sensors (cf OGC metadata
and data sensor standards) and repositories. Sensors need
(common across applications) support of streams of data
• Queries need to support archived (find all relevant data in past)
and streaming (find all data in future with given properties)
• Note all AS Services behave like Sensors and all sensors are
wrapped as services
• Any domain will have “raw data” (binary) and that which has been
filtered to ASL. Let’s call the former ASBD (Application Specific Binary Data)
ASIS Application Specific Information System II
c) Let’s call this ASVS (Application Specific Visualization Services)
generalizing WMS for GIS
The ASVS should both visualize information and provide a way of
navigating (cf GetFeatureInfo) database (the ASFS)
The ASVS can itself be federated and presents an ASFS output
interface
d) There should be an application service interface for ASIS from which all
ASIS services inherit
e) There will be other user services interfacing to ASIS
All user and system services will input and output data in ASL using
filters to cope with ASBD
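The two query styles asked of the ASFS (archived: find all relevant data in the past; streaming: find all data in the future with given properties) can be sketched as one small class. `FeatureService` and the `magnitude` field are hypothetical, chosen only to make the example concrete.

```python
class FeatureService:
    """Toy ASFS: an archive plus standing (streaming) queries."""

    def __init__(self):
        self.archive = []
        self.standing_queries = []  # list of (predicate, callback) pairs

    def query_archived(self, predicate):
        """Return every stored feature that matches, i.e. data from the past."""
        return [f for f in self.archive if predicate(f)]

    def query_streaming(self, predicate, callback):
        """Register interest in future features that match."""
        self.standing_queries.append((predicate, callback))

    def ingest(self, feature):
        """New data from a sensor or service: store it, then notify matches."""
        self.archive.append(feature)
        for predicate, callback in self.standing_queries:
            if predicate(feature):
                callback(feature)


def is_large(feature):
    return feature["magnitude"] >= 4


asfs = FeatureService()
asfs.ingest({"magnitude": 5.1})
asfs.ingest({"magnitude": 2.0})

past = asfs.query_archived(is_large)
future = []
asfs.query_streaming(is_large, future.append)
asfs.ingest({"magnitude": 6.3})
asfs.ingest({"magnitude": 1.2})
```

Because `ingest` treats every source alike, a repository replay and a live sensor look the same to subscribers, in line with the note that all AS services behave like sensors.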
[Figure labels: Filter, Transformation, Reasoning, Data-mining, Analysis; AS Repository; AS “Sensor”; AS Tool (generic); AS Service (user defined); AS Tool (generic); ASVS Display]
Military Information Management System
[Figure labels: Everything is a Service or a message/Information Nugget; Messages using ASL; Directly GS-* WS-*; Filters/ASTT; ASVS]
[Figure labels: MIO or Military Information Object, the Unit of Managed Information expressed in ASL; ASFS; OGSA-DAI and Sensor Standards; Info-D, WS-Notification, WS-Eventing]
IS = Information Service (Sensor, Service or Repository)
BFS = Basic Filter Service
[Figure: an Information Resource exposed by an IS and a Filter Resource exposed by a BFS. Interface labels on the IS and on the BFS output: Receive Request/Select, Get Status, Get ASL Data. Interface labels on the BFS input: Issue Request/Select, Request Status, Put ASL Data]
Filters either transform or aggregate Information
A Filter Service (FS) is a general workflow
(the microscopic workflow) of Basic
Filter Services (BFS)
[Figure: FS = a network of BFS]
The output of a Filter Service is
indistinguishable from that of an IS
A transport link supports asynchronous publish/subscribe semantics
and Web Service Reliable messaging fault tolerance
Transport links can be multicast to support collaboration (typically
for last link before or after Presentation Service) or replication for
fault tolerance.
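A compact sketch of this slide: basic filters either transform or aggregate, they are joined by publish/subscribe links, and the composite's output link can be subscribed to exactly like an information source. The `Link` class is a synchronous in-memory stand-in (the real transport is asynchronous and reliable); all names and the temperature example are assumptions.

```python
class Link:
    """Publish/subscribe transport link; several subscribers give multicast."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, item):
        for callback in self.subscribers:
            callback(item)


class TransformFilter:
    """Basic filter that maps each input item to one output item."""

    def __init__(self, fn, out):
        self.fn, self.out = fn, out

    def __call__(self, item):
        self.out.publish(self.fn(item))


class AggregateFilter:
    """Basic filter that combines every `size` inputs into one output."""

    def __init__(self, size, fn, out):
        self.size, self.fn, self.out = size, fn, out
        self.buffer = []

    def __call__(self, item):
        self.buffer.append(item)
        if len(self.buffer) == self.size:
            self.out.publish(self.fn(self.buffer))
            self.buffer = []


# Filter Service = workflow of two basic filters: Celsius to Fahrenheit, then pairwise mean.
source, middle, output = Link(), Link(), Link()
source.subscribe(TransformFilter(lambda c: c * 9 / 5 + 32, middle))
middle.subscribe(AggregateFilter(2, lambda xs: sum(xs) / len(xs), output))

received = []
output.subscribe(received.append)
for celsius in [0, 100, 10, 30]:
    source.publish(celsius)
```

A consumer subscribed to `output` cannot tell whether it is fed by a raw sensor or by this two-stage workflow, matching the claim that a Filter Service's output is indistinguishable from that of an IS.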
Top IS could be produced by a Filter Service
[Figure: Gridlet = several IS feeding several FS]
The basic unit (Gridlet) transforms and aggregates
application specific information
Gridlets are composed using Grid of Grids concept
[Figure: Federation / Macroscopic Workflow of IS Gridlets, supported by General System Services. Service labels, in extraction order: Messaging/Data transport, Notification, Search, Security, Session, Planning, Fault Tolerance, Construction, Presentation, Management, Metadata, Management, Directory, ASVS, Collaboration, Replica, Portal, Management]
Data to Information to Knowledge as messages flow from original sources to top of Filter Grid
[Figure: Grids of Grids Architecture, repeated from earlier. SOAP Message Streams carry Raw Data, Data, Information, Knowledge, Wisdom and Decisions among Sensor Services (SS), Filter Services (FS), Other Services, MetaData (MD) and Databases, with links to “Another Service” and “Another Grid”]
Summary
Virtualization everywhere
Focus on semantics not representation to get
performance combined with expressivity for transport
and data access
All this enabled by powerful meta-data services
Grids add management to rich but potentially chaotic
set of Web Services;
• management and coherence enabled by meta-data
Can define general information architectures (ASIS,
GIS, SIIS) for both applications and system
Knowledge from filters that span simulations, data-
mining, reasoning and agents
A service is just a special case of a Grid
Build systems from SubGrids (Gridlets)