Automatic Template-based Web service configuration




Joost Broekhuizen

Student number: 1090690
Email: jbroekh@cs.vu.nl
MSc Thesis
Intelligent Interactive Distributed Systems Group,
Department of Computer Science,
Division of Mathematics and Computer Science,
Faculty of Sciences,
Vrije Universiteit Amsterdam
de Boelelaan 1081a
1081 HV, Amsterdam,
The Netherlands

Supervisors:
Drs. S. van Splunter
Prof. Dr. F.M.T. Brazier

Second reader:
Dr. N.J.E. Wijngaards
Contents

1 Introduction
2 Web service properties / Research
   2.1 Terms and definitions
      2.1.1 Web services
         Layout of service technology
      2.1.2 WSDL and OWL-S
         WSDL
         Detailed structure of WSDL
         OWL-S
         Overall structure of OWL-S
         Detailed structure of OWL-S
   2.2 Configuration methods
      2.2.1 Configuration methods described
      2.2.2 Configuration methods compared
      2.2.3 The Agent Factory
         2.2.3.1 Structuring design process
         2.2.3.2 Structuring design artefact
         2.2.3.3 Templates
3 Application Domain
   3.1 Problem statement
      3.1.1 Initial situation
      3.1.2 Current situation
   3.2 Design Process
      3.2.1 Initial situation
      3.2.2 Current situation
         Template 1
         Template 2
         Template 3
4 Implementation
   4.1 Web service implementation
      4.1.1 GetFile
      4.1.2 GathAndStoreBib
      4.1.3 BibInfoGath
      4.1.4 ReplaceSIATag
      4.1.5 Rdf2Bib
      4.1.6 StorePub
   4.2 Web service annotation
      4.2.1 GetFile
      4.2.2 GathAndStoreBib
      4.2.3 BibInfoGath
      4.2.4 ReplaceSIATag
      4.2.5 Rdf2Bib
      4.2.6 StorePub
      4.2.7 Domain ontology
      4.2.8 Template
5 Discussion and Conclusions
6 References
Appendix A – Techniques and definitions
Appendix B – Design process considerations
Appendix C
   C.1 GetFile
   C.2 GathAndStoreBib
   C.3 BibInfoGath
   C.4 ReplaceSIATag
   C.5 Rdf2Bib
   C.6 StorePub
Appendix D
   D.1 GetFile
   D.2 GathAndStoreBib
   D.3 BibInfoGath
   D.4 ReplaceSIATag
   D.5 Rdf2Bib
   D.6 StorePub
   D.7 Ontology
   D.8 Template




1 Introduction

In a rapidly increasing number of situations, Web services are used to perform tasks for users.
While these services are very useful, they still leave much to be desired. It is very hard to find a
Web service that provides exactly the functionality a user is looking for, so it would be very
useful if a Web service could be created on demand.
There is a large number of Web services available on the Internet, and each is useful in its own
way. However, there are situations in which multiple Web services, or components of different
Web services, are required. In such cases there is a need for an application that can combine these
services or components into a single application that presents one service to the user and fulfils
all of the user's needs. This can be realised using Web service templates and the template-based
configuration functionality of the Agent Factory [Brazier, Wijngaards, 2001].

This thesis focuses on automated Web service configuration using the Agent Factory. The Agent
Factory is an automated configuration service that can be used to configure agents and other
structured applications, such as Web services.
The Web service configuration process of the Agent Factory uses templates, together with
descriptions of services, as a guideline [Richards, Splunter et al, 2003]. Templates specify
different processes that together define meaningful Web services. These processes are defined by
their inputs, outputs and additional conditions. The Agent Factory is designed to perform
template-based configuration. The work presented here concentrates on creating a number of
templates that describe different Web services. In addition, a number of Web services are
described that can be used to fill in the templates. Combined, these templates and services fulfil
the requirements for the Agent Factory to create a Web service configuration.
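As an illustration of template-based configuration, the following minimal Python sketch models a template as an ordered list of process slots, each defined only by its inputs and outputs, and fills every slot with a matching service description. All class names, the exact-match rule and the example services (ServiceA, ServiceB) are hypothetical; this is not the Agent Factory's actual representation or algorithm, and additional conditions are ignored.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical description of an available Web service."""
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


@dataclass(frozen=True)
class ProcessSlot:
    """A process in a template, defined only by its inputs and outputs."""
    label: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


def fill_template(template: List[ProcessSlot],
                  services: List[ServiceDescription]) -> Optional[Dict[str, str]]:
    """Assign one service to every slot, or return None if any slot cannot be filled.

    Simplifying assumption: a service fits a slot only when its inputs and
    outputs are exactly the slot's inputs and outputs.
    """
    configuration: Dict[str, str] = {}
    for slot in template:
        match = next((s for s in services
                      if s.inputs == slot.inputs and s.outputs == slot.outputs), None)
        if match is None:
            return None  # no candidate service for this slot: configuration fails
        configuration[slot.label] = match.name
    return configuration


# Hypothetical two-step template and two candidate services.
template = [
    ProcessSlot("fetch", frozenset({"URL"}), frozenset({"Document"})),
    ProcessSlot("convert", frozenset({"Document"}), frozenset({"BibTeX"})),
]
services = [
    ServiceDescription("ServiceA", frozenset({"URL"}), frozenset({"Document"})),
    ServiceDescription("ServiceB", frozenset({"Document"}), frozenset({"BibTeX"})),
]
print(fill_template(template, services))  # {'fetch': 'ServiceA', 'convert': 'ServiceB'}
```

Here a single unfillable slot makes the whole configuration fail; a real configurator might instead backtrack or report which slot could not be filled.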

Section 2 focuses on the requirements a Web service must meet to be configurable, and on the
requirements for automating its configuration using the Agent Factory. Web services are described
in terms of these requirements and different configuration methods are compared. Section 3
presents the application domain of this thesis: it compares the initial and the current situation
within the scope of this project and describes the techniques and tools that were used to realise
the implementation.
Section 4 describes the implementation of the solution to the problem statement: the Web
services, their annotations, and the ontology and templates that have been designed for the
configuration process. Section 5 discusses the difficulties identified during this project and
presents its conclusions.




2 Web service properties / Research

This section provides an overview of the properties that a configurable Web service should have.
Section 2.1 describes the terms and definitions commonly used in this thesis and provides some
insight into why particular properties have been chosen for this Master's project.
Section 2.2 describes various methods for automating Web service configuration, one of which is
the Agent Factory [Brazier, Wijngaards, 2001].


2.1 Terms and definitions

This section describes the terms used throughout this thesis and gives the reader more insight into
the standards and terminology of this field of research.

2.1.1 Web services

The Stencil Group provides the definition of Web services that is used throughout this thesis. The
definition of Web services according to [The Stencil Group, 2001] is: “Loosely coupled, reusable
software components that semantically encapsulate discrete functionality and are distributed and
programmatically accessible over standard Internet protocols.”
This definition is useful for several reasons, explained below.
First, reusable software components make it possible to create new services based on existing
software. Second, Web services semantically encapsulate discrete functionality, in the sense that
they define their own functionality and behaviour. This makes it possible for any other service
(for example the Agent Factory) to determine what the service does, how to use it and what to
expect from it.
A third property is that Web services can be accessed programmatically. Web services are not
necessarily accessed only by human users; they may also be accessed by other services, e.g., the
Agent Factory [Brazier, Wijngaards, 2001]. The final property is that the services can be
distributed over the Internet and use standard protocols to make them accessible. This ensures
simple access, without the need to implement application-specific interfaces.




Figure A: Overview of Web service technology

Layout of service technology

Figure A gives an overview of Web service technology: a standard combination of technologies for
building Web services using SOAP, WSDL, UDDI and BPEL4WS.
Each layer in this figure is discussed below working from the bottom to the top.

The Transport Layer provides access to the Web services: using the protocols in this layer it
becomes possible to access the Web services and to exchange information between services.
The XML Messaging Layer facilitates the exchange of data via typed messages and remote calls.
The Service Description Layer describes the functionality of the Web services, including their
inputs and outputs. WSDL and OWL-S are languages often used for this purpose; they are described
in more detail in section 2.1.2.
The Web Service Composition/Discovery Layer covers the different techniques for Web service
composition, configuration and discovery.
The Applications Layer includes the actual implementation of Web services and their properties.

HTTP, SMTP and FTP, in the bottom layer, are used for information exchange.
The Simple Object Access Protocol (SOAP)1, in the second layer, is a lightweight XML-based
protocol designed to exchange structured and typed information on the Web.
The goal of SOAP is to enable rich and automated Web services based on a shared and open Web
infrastructure. SOAP can be used in combination with a variety of existing Internet protocols and
formats, including Hypertext Transfer Protocol (HTTP), Simple Mail Transfer Protocol (SMTP),
and Multipurpose Internet Mail Extensions (MIME), and can support a wide range of
applications, from messaging systems to remote procedure calls (RPCs).
WSDL and OWL-S, which belong to the next layer, are described in detail in the next section.
Universal Description Discovery and Integration (UDDI)2 is used to look up Web services.
UDDI contains references to all kinds of Web services.
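To show what a SOAP message looks like in practice, the sketch below builds and reads back an RPC-style SOAP 1.1 request using only Python's standard xml.etree.ElementTree module. The envelope namespace is the standard SOAP 1.1 one; the operation getDocument, its title parameter and the urn:example:bib namespace are hypothetical, and sending the message over a transport such as HTTP is not shown.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical application namespace for the example operation.
APP_NS = "urn:example:bib"

ET.register_namespace("soap", SOAP_ENV_NS)
ET.register_namespace("bib", APP_NS)


def build_soap_request(operation: str, params: dict) -> str:
    """Wrap an RPC-style call (operation name plus parameters) in a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def _local(tag: str) -> str:
    """Strip the '{namespace}' part of an ElementTree tag."""
    return tag.split("}", 1)[1]


def read_call(message: str) -> tuple:
    """Recover the operation name and parameters from a request built above."""
    body = ET.fromstring(message).find(f"{{{SOAP_ENV_NS}}}Body")
    call = body[0]  # the first (and only) child of Body is the RPC element
    return _local(call.tag), {_local(child.tag): child.text for child in call}


request = build_soap_request("getDocument", {"title": "Agent Factory"})
print(request)
print(read_call(request))
```

The receiving side only needs the shared envelope format to find the call inside the Body, which is what makes SOAP usable across different transport protocols.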

As mentioned above, UDDI can be used to search for Web services. However, there is also an
intelligent software agent that can be used for this task and that makes searching for Web services
much easier. This agent, called the Matchmaker3 and introduced by [Paolucci et al, 2002], uses
UDDI and OWL-S together with a matching algorithm to provide a service for finding the right
service for a task.

1 Simple Object Access Protocol: http://www.soapware.org/
2 Universal Description Discovery and Integration of Web services: http://www.uddi.org
3 The DAML-S Matchmaker: http://www-2.cs.cmu.edu/~softagents/daml_Mmaker/daml-s_matchmaker.htm

The Matchmaker is also a Web service that helps make connections between service requesters
and service providers. The Matchmaker serves as a "yellow pages" of service capabilities. The
Matchmaker allows users and/or software agents to find each other by providing a mechanism for
registering service capabilities. Registration information is stored as advertisements. When the
Matchmaker agent receives a query from a user or another software agent, it searches its dynamic
database of advertisements for agents that can fulfil the incoming request(s). Thus, the
Matchmaker also serves as a liaison between a service requester and a service provider.
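The matching step can be illustrated with a simplified Python sketch of the degree-of-match idea from the Matchmaker work of Paolucci et al.: a requested output concept is compared with each advertised output concept in a small class hierarchy, rated exact, plug-in, subsumes or fail, and the advertisements are ranked accordingly. The toy ontology, the advertisement names and the restriction to a single output are assumptions for illustration; the actual Matchmaker also matches inputs and defines each degree more precisely.

```python
from typing import Dict, List, Optional, Tuple

# Toy ontology (hypothetical): each concept maps to its direct superclass.
PARENT: Dict[str, Optional[str]] = {
    "Publication": None,
    "Article": "Publication",
    "JournalArticle": "Article",
    "Book": "Publication",
}

RANK = {"exact": 3, "plug-in": 2, "subsumes": 1, "fail": 0}


def is_subclass(sub: str, sup: str) -> bool:
    """True if `sub` equals `sup` or lies below it in the hierarchy."""
    current: Optional[str] = sub
    while current is not None:
        if current == sup:
            return True
        current = PARENT.get(current)
    return False


def degree_of_match(requested: str, advertised: str) -> str:
    if requested == advertised:
        return "exact"
    if is_subclass(requested, advertised):
        return "plug-in"    # advertised output is more general than requested
    if is_subclass(advertised, requested):
        return "subsumes"   # advertised output is more specific than requested
    return "fail"


def rank_advertisements(requested: str,
                        adverts: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (service, degree) pairs, best first, leaving out failed matches."""
    scored = [(name, degree_of_match(requested, out)) for name, out in adverts.items()]
    scored = [pair for pair in scored if pair[1] != "fail"]
    return sorted(scored, key=lambda pair: RANK[pair[1]], reverse=True)


# Hypothetical advertisements: service name -> advertised output concept.
adverts = {"ArticleFinder": "Article", "PubFinder": "Publication",
           "JournalFinder": "JournalArticle", "BookFinder": "Book"}
print(rank_advertisements("Article", adverts))
# [('ArticleFinder', 'exact'), ('PubFinder', 'plug-in'), ('JournalFinder', 'subsumes')]
```

Ranking by degree rather than accepting only exact matches is what lets a requester still find a usable service when no advertisement uses precisely the requested concept.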

2.1.2 WSDL and OWL-S

This section describes the WSDL and OWL-S layer, the third layer in Figure A.

WSDL

Web Services Description Language (WSDL)4 is used to describe the functionality of a service.
WSDL is an XML format for describing network services as a set of endpoints operating on
messages containing either document-oriented or procedure-oriented information. Operations and
messages are described abstractly, and then bound to a concrete network protocol and message
format to define an endpoint. Related concrete endpoints are combined into abstract endpoints
(services). WSDL is extensible to allow description of endpoints and their messages regardless of
what message formats or network protocols are used to communicate.
As communication protocols and message formats are standardized in the Web community, it
becomes possible and increasingly important to be able to describe the communications in some
structured way. WSDL addresses this need by defining an XML grammar for describing network
services as collections of communication endpoints capable of exchanging messages. WSDL
service definitions provide documentation for distributed systems and serve as a recipe for
automating the details involved in application communication.
A WSDL document defines services as collections of network endpoints, or ports. In WSDL, the
abstract definition of endpoints and messages is separated from their concrete network
deployment or data format bindings. This allows the reuse of abstract definitions: messages,
which are abstract descriptions of the data being exchanged, and port types, which are abstract
collections of operations. The concrete protocol and data format specifications for a particular
port type constitute a reusable binding. A port is defined by associating a network address with a
reusable binding, and a collection of ports defines a service.
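The separation between abstract and concrete definitions can be seen by following the references in a WSDL 1.1 document: a port points to a binding, and the binding points to a port type with its abstract operations. The Python sketch below follows that chain with xml.etree.ElementTree. The document, its names and the endpoint address are hypothetical, and the WSDL is deliberately incomplete (no types section and no operation details inside the binding).

```python
import xml.etree.ElementTree as ET

WSDL = "http://schemas.xmlsoap.org/wsdl/"        # WSDL 1.1 namespace
SOAP = "http://schemas.xmlsoap.org/wsdl/soap/"   # WSDL 1.1 SOAP binding namespace

# Hypothetical, minimal WSDL 1.1 document.
DOC = f"""<definitions xmlns="{WSDL}" xmlns:soap="{SOAP}"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="urn:example:bib" targetNamespace="urn:example:bib">
  <message name="LookupRequest"><part name="title" type="xsd:string"/></message>
  <message name="LookupResponse"><part name="bibtex" type="xsd:string"/></message>
  <portType name="BibPortType">
    <operation name="lookup">
      <input message="tns:LookupRequest"/>
      <output message="tns:LookupResponse"/>
    </operation>
  </portType>
  <binding name="BibBinding" type="tns:BibPortType">
    <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/>
  </binding>
  <service name="BibService">
    <port name="BibPort" binding="tns:BibBinding">
      <soap:address location="http://example.org/bib"/>
    </port>
  </service>
</definitions>"""


def local(qname: str) -> str:
    """Drop the prefix from a QName reference such as 'tns:BibBinding'."""
    return qname.split(":", 1)[-1]


def describe_endpoints(doc: str) -> list:
    """For each port: (port name, address, port type name, abstract operations)."""
    root = ET.fromstring(doc)
    ns = {"w": WSDL, "soap": SOAP}
    bindings = {b.get("name"): b for b in root.findall("w:binding", ns)}
    port_types = {p.get("name"): p for p in root.findall("w:portType", ns)}
    result = []
    for port in root.findall("w:service/w:port", ns):
        binding = bindings[local(port.get("binding"))]       # concrete: port -> binding
        port_type = port_types[local(binding.get("type"))]   # binding -> abstract port type
        address = port.find("soap:address", ns).get("location")
        operations = [op.get("name") for op in port_type.findall("w:operation", ns)]
        result.append((port.get("name"), address, port_type.get("name"), operations))
    return result


print(describe_endpoints(DOC))
# [('BibPort', 'http://example.org/bib', 'BibPortType', ['lookup'])]
```

Because the port type never mentions an address or protocol, the same abstract definition could be reused by a second binding and port without changes.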

Detailed structure of WSDL

A WSDL description of a Web service can contain the following elements5:

   • Types – a container for data type definitions using some type system (such as XSD).
   • Message – an abstract, typed definition of the data being communicated.
   • Operation – an abstract description of an action supported by the service.
   • Port Type – an abstract set of operations supported by one or more endpoints.
   • Binding – a concrete protocol and data format specification for a particular port type.
   • Port – a single endpoint defined as a combination of a binding and a network address.
   • Service – a collection of related endpoints.

4 Web Services Description Language: http://www.w3.org/TR/wsdl
5 WSDL elements described: http://www.w3.org/TR/wsdl


WSDL is used to describe the grounding of a Web service: it defines how to access the
service. The elements that are most commonly used in Web services are Message, Port/Port
Type, Binding and Service.
The Message part describes the data that is communicated and links it to the corresponding
OWL-S parameters.
The port or port type part defines names for the input and output of the Web service and creates
a port or port type for the input or output. The ports or port types are then linked by the Binding
to the SOAP level, to ensure communication between different Web services.
Finally, the service description combines the port types and bindings and links them to a SOAP
location on the Internet.

OWL-S




Figure B: The Layered Cake6

The Layered Cake shown in Figure B depicts the different levels described by the Semantic Web
community. Each layer is described below, working from the bottom up.
The bottom layers, layers one and two, are similar to the lower layers of the Web service
technologies in Figure A. URI provides global identifiers and Unicode is a character-encoding
standard that supports international characters. These layers provide the global perspective,
already present in the WWW, for the Semantic Web.
Whereas HTML expresses what is to be shown to the user, XML adds a syntax to describe what
kind of information is expressed and makes this information reusable.


6 The Layered Cake, Tim Berners-Lee: http://www.w3.org/2002/Talks/04-sweb/slide12-0.html
RDF and RDF Schema, in layer three, are used to define and construct the building blocks for
realising the Semantic Web. Together with an ontology, it becomes possible to reason about data
on the Web.
Ontologies, in layer four, are necessary when the expressiveness achieved with semantic
network-like tools is not enough. Metadata vocabularies defined by RDF Schemas can be
considered simplified ontologies; the tools included in this layer raise these vocabularies to the
category of ontologies. Ontologies are especially suited to formalise domain-specific knowledge.
Once knowledge is formalised, it can easily be interconnected with other formalisations. This
facilitates interoperability between independent communities, and thus ontologies are one of the
fundamental building blocks of the Semantic Web.
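The claim that RDF together with an ontology makes reasoning about Web data possible can be illustrated with a small Python sketch that stores RDF-style triples as plain tuples and applies two RDFS entailment rules (transitivity of rdfs:subClassOf, and inheritance of rdf:type along it) until no new triples appear. The ex: resources are hypothetical, and a real application would use an RDF library and a complete reasoner rather than this hand-written loop.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data: RDF-style triples using rdf:type and rdfs:subClassOf.
TRIPLES: Set[Triple] = {
    ("ex:JournalArticle", "rdfs:subClassOf", "ex:Article"),
    ("ex:Article", "rdfs:subClassOf", "ex:Publication"),
    ("ex:paper42", "rdf:type", "ex:JournalArticle"),
}


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS entailment rules until a fixpoint is reached:
    rdfs11: rdfs:subClassOf is transitive;
    rdfs9:  an instance of a subclass is also an instance of the superclass."""
    closure = set(triples)
    while True:
        sub = [(s, o) for s, p, o in closure if p == "rdfs:subClassOf"]
        new = {(a, "rdfs:subClassOf", d) for a, b in sub for c, d in sub if b == c}
        new |= {(x, "rdf:type", sup) for x, p, cls in closure if p == "rdf:type"
                for c, sup in sub if c == cls}
        if new <= closure:
            return closure  # nothing new can be derived
        closure |= new


facts = infer(TRIPLES)
for triple in sorted(facts - TRIPLES):
    print(triple)  # the statements that were not asserted but follow from the schema
```

The data never states that ex:paper42 is a publication; that conclusion comes entirely from the schema-level triples, which is the added value the ontology layers provide.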
The purpose of the logic layer, layer five, is to provide the features of First Order Logic,
offering logical capabilities at a reasonable computational cost.
The sixth layer, the proof layer, makes use of inference engines and makes the Semantic Web
open, in contrast to computer programs that apply the black-box principle: an inference engine
can be asked how it has reached a conclusion, i.e. it provides proofs of its conclusions.
The trust layer is the top layer of the Semantic Web architecture. Agents that want to work with
the full-featured Semantic Web will form a Web of Trust7. The trust layer makes use of all the
Semantic Web layers below, layers three through six. The rules that bind the upper two layers
allow proof without the full logic machinery. They capture dynamic knowledge as a set of
conditions that must be fulfilled in order to achieve the set of consequences of the rule. Rules can
specify queries and inferences in Web ontologies, mappings between Web ontologies, and
dynamic Web behaviour of workflows, services, and agents. All statements on the Web occur in
a context, defined by RDF statements and matched to other statements by ontologies.
Applications need this context to evaluate the trustworthiness of statements using logic reasoning.
With logic reasoning proof can be found that statements are actually true or false. The actual trust
that is placed on statements, based on the proof, is evaluated in a different way by each
application.
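The proof layer's key property, that an inference engine can explain how it reached a conclusion, and the description of rules as conditions leading to consequences, can be sketched as a tiny forward-chaining engine that records the rule and premises behind every derived fact. The propositional (variable-free) facts and the rule names are hypothetical; actual Semantic Web rule and proof languages are considerably richer.

```python
from typing import Dict, List, Tuple

Fact = str
Rule = Tuple[str, List[Fact], Fact]      # (rule name, conditions, consequence)
Justification = Tuple[str, List[Fact]]   # (rule name or "given", premises)

# Hypothetical rules: all conditions must hold for the consequence to follow.
RULES: List[Rule] = [
    ("r1", ["signedBy(doc, alice)", "trusted(alice)"], "authentic(doc)"),
    ("r2", ["authentic(doc)", "cites(doc, paper)"], "endorsed(paper)"),
]


def forward_chain(facts: List[Fact], rules: List[Rule]) -> Dict[Fact, Justification]:
    """Fire rules until nothing new is derived, remembering why each fact holds."""
    why: Dict[Fact, Justification] = {f: ("given", []) for f in facts}
    changed = True
    while changed:
        changed = False
        for name, conditions, consequence in rules:
            if consequence not in why and all(c in why for c in conditions):
                why[consequence] = (name, conditions)
                changed = True
    return why


def explain(fact: Fact, why: Dict[Fact, Justification], depth: int = 0) -> List[str]:
    """Return the proof tree of `fact` as indented lines."""
    rule, premises = why[fact]
    lines = ["  " * depth + f"{fact}  [{rule}]"]
    for premise in premises:
        lines += explain(premise, why, depth + 1)
    return lines


derived = forward_chain(["signedBy(doc, alice)", "trusted(alice)", "cites(doc, paper)"], RULES)
print("\n".join(explain("endorsed(paper)", derived)))
```

An application deciding whether to trust endorsed(paper) can inspect this tree, for instance to check whether it accepts trusted(alice) as a given, which mirrors the point that each application evaluates trust in its own way.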

The third and fourth layers (RDF / Ontology) are where OWL comes into the picture. This
technology is an important feature of Web service design and is therefore explained in more
depth below.

OWL-S8 is an OWL9-based Web service ontology, which supplies Web service providers with a
core set of mark-up language constructs for describing the properties and capabilities of their
Web services in unambiguous, computer-interpretable form. OWL-S mark-up of Web services
has been designed to facilitate the automation of Web service tasks, including automated Web
service discovery, execution, composition and interoperation. OWL-S, an extension of OWL,
was previously known as DAML-S. Current developments are focused on creating an ontology
for the Semantic Web10 community to enable automation of services on the Semantic Web.
OWL has been primarily designed to represent information about categories of objects and how
objects are interrelated, the sort of information that is often referred to as ontology. OWL can
also represent information about the objects themselves, the data [Horrocks, Patel-Schneider and
van Harmelen 2003].

7 Web of Trust: http://www.w3.org/2002/Talks/04-sweb/slide21-0.html
8 OWL-S, Semantic Markup for Web Services: http://www.daml.org/services/owl-s/
9 Web Ontology Language (OWL): http://www.w3.org/2004/OWL
10 Semantic Web: http://www.w3.org/2001/sw/ ; Tim Berners-Lee, founder of the Semantic Web: http://www.w3.org/People/Berners-Lee/

OWL-S has been designed to establish a framework within which OWL descriptions are made
and shared. Web sites should be able to employ a set of basic classes and properties for declaring
and describing services, and the ontology structuring mechanisms of OWL provide the
appropriate framework within which to do this [The OWL Services Coalition, 2003], [Martin et
al, 2004].
In [Sabou, Richards and Splunter, 2003] an experience report is given on DAML-S which
identifies some problems that need attention in the future, but also explains the advantages of
using DAML-S for the description of a service. DAML-S goes beyond syntactic description by
providing semantic descriptions and this allows reasoning about a service and creates the
possibility of dynamic service discovery and usage.
OWL-S is becoming widely accepted in the Web service community, and a version of it has been
submitted to the W3C as a Member Submission. This promotes a standardized way of describing
Web services.

In this Master's project, OWL-S version 1.0 is used for the annotation of Web services. This is the
most recent non-beta version of OWL-S.

Overall structure of OWL-S

[Ankolekar et al, 2001] describes the structure of OWL-S, as illustrated in Figure C.




Figure C: Service description structure

In OWL-S each Web service has a Profile11, a Model12 and a Grounding13 [Ankolekar et al, 2001].
The Profile describes what the service requires from its user and what it provides. Using the
Profile, other services, or the Agent Factory, can judge whether this service meets their needs with
respect to input, output, conditions, etc.

11 OWL-S Profile ontology: http://www.daml.org/services/owl-s/1.0/Profile.owl
12 OWL-S Model ontology: http://www.daml.org/services/owl-s/1.0/Process.owl
13 OWL-S Grounding ontology: http://www.daml.org/services/owl-s/1.0/Grounding.owl

The service Model (process Model) is used for process monitoring and coordination, although the
monitoring part is not yet available. It also describes what happens when the service is carried
out. The process model describes both simple and composed services. For simple services this
description is not used, as simple services are considered to be black boxes. For composed
services, however, these descriptions become more interesting, because composed services may
contain sub-services that can be used for configuration; in this case the service Model becomes
important. A composed service can have any number of sub-services, executed in parallel or in
sequential order. The OWL-S Profile only describes the process at a high level: it defines only the
input of the first part of the service and the output of the last part of the service. This is enough
to know about a service if the goal of the configuration completely matches the functionality of
the service. For complex services, however, the different sub-services and scripts within the
service can also be of interest from the perspective of re-use.
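The observation that the Profile of a composed service only exposes the input of its first part and the output of its last part can be illustrated with a small sketch of a sequential composite process. The sketch also checks that each sub-process's inputs are produced by its predecessor, which is the kind of internal structure only the process model reveals. Process and parameter names are hypothetical, and only a strict sequence is modelled (OWL-S also offers parallel and conditional control constructs).

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class AtomicProcess:
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


@dataclass
class SequenceProcess:
    """A composite process whose sub-processes run strictly in order."""
    name: str
    steps: List[AtomicProcess]

    def is_well_formed(self) -> bool:
        """Simplification: each step's inputs must come from the step just before it."""
        return all(nxt.inputs <= prev.outputs
                   for prev, nxt in zip(self.steps, self.steps[1:]))

    def profile_view(self) -> Tuple[FrozenSet[str], FrozenSet[str]]:
        """What a Profile-level description exposes: first inputs, last outputs."""
        return self.steps[0].inputs, self.steps[-1].outputs


fetch = AtomicProcess("Fetch", frozenset({"URL"}), frozenset({"Document"}))
extract = AtomicProcess("Extract", frozenset({"Document"}), frozenset({"RDF"}))
convert = AtomicProcess("Convert", frozenset({"RDF"}), frozenset({"BibTeX"}))
pipeline = SequenceProcess("FetchToBibTeX", [fetch, extract, convert])

print(pipeline.is_well_formed())  # True
print(pipeline.profile_view())    # (frozenset({'URL'}), frozenset({'BibTeX'}))
```

The profile view hides Extract entirely, so a configurator that wants to re-use only that step has to look at the process model rather than the Profile.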
The Grounding describes how the service is used. Typically, the Grounding specifies how to
access the service, for example by specifying the communication protocols to use, address
information, etc. The Grounding also refers to the WSDL descriptions in which the protocols for
accessing the service are specified.
OWL-S is a layer above the XML-messaging layer (see Figure A), of which SOAP is an
example. The link between OWL-S and SOAP is that OWL-S describes what the different scripts
accomplish and how to access them. The OWL-S descriptions describe the input of the scripts
and the output of the scripts and what exactly happens within that process. SOAP ensures that
these XML-messages arrive at the correct locations.

Detailed structure of OWL-S

The Profile, Model and Grounding are defined in ontologies, which are discussed in this section.

The ontology of a Service Profile14:
The ontology first defines the Profile: Profile is a subclass of ServiceProfile. This separation
makes it possible to create profiles for services other than the one defined in OWL-S.
The second part of a profile describes the non-functional properties of the service. Examples of
such properties are ServiceName, ContactInformation, quality of service and additional
information that may help to evaluate the service (TextDescription). The ServiceName refers to
the name of the service that is being offered. ContactInformation is used to record contact
information about the entity that issued the Profile. The TextDescription provides a brief
description of the service. It summarises what the service offers, or is used to describe what
service is requested.
After the description of the non-functional properties, the functional properties follow. These
include definitions of the service's inputs, outputs, preconditions, postconditions and effects,
which help to specify what the service provides.
The last part of the Service Profile contains some additional classes. These classes specify details
of the Profile such as the service category, service parameters and the quality rating of the service.
The service category refers to an ontology of services that may be on offer. High-level services
could include classification on the basis of industry taxonomies such as NAICS15, or others that
may be used. Additionally, it can be used to specify other classification systems, such as
Products, Problem Solving Capabilities, Commercial Services, Information, etc.

14 Service Profile ontology: http://www.daml.org/services/owl-s/1.0/Profile.owl
15 North American Industry Classification System: http://www.census.gov/epcd/www/naics.html
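The parts of a Service Profile described above can be summarised as a plain data structure. The sketch below is a simplified Python model whose field names follow the property names used in this section; it is not the OWL-S RDF/XML serialization. The helper satisfies is hypothetical and shows how the functional properties can be used to judge whether a service meets a request, while the non-functional properties are only informative.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class ServiceProfile:
    # Non-functional properties
    service_name: str
    contact_information: str = ""
    text_description: str = ""
    # Functional properties
    inputs: FrozenSet[str] = frozenset()
    outputs: FrozenSet[str] = frozenset()
    preconditions: FrozenSet[str] = frozenset()
    effects: FrozenSet[str] = frozenset()
    # Additional classification
    service_category: Optional[str] = None


def satisfies(profile: ServiceProfile, available_inputs: FrozenSet[str],
              wanted_outputs: FrozenSet[str], true_conditions: FrozenSet[str]) -> bool:
    """Hypothetical check: the requester can supply every input, every
    precondition holds, and the service produces every wanted output."""
    return (profile.inputs <= available_inputs
            and profile.preconditions <= true_conditions
            and wanted_outputs <= profile.outputs)


# Hypothetical profile of a bibliographic lookup service.
profile = ServiceProfile(
    service_name="BibLookup",
    text_description="Returns a BibTeX record for a publication title.",
    inputs=frozenset({"Title"}),
    outputs=frozenset({"BibTeX"}),
    preconditions=frozenset({"TitleIsKnown"}),
    service_category="Information",
)
print(satisfies(profile, frozenset({"Title", "Author"}), frozenset({"BibTeX"}),
                frozenset({"TitleIsKnown"})))  # True
```

Supplying extra inputs (here Author) does no harm, but a missing input or an unmet precondition rules the service out.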

The ontology of a Service Model16:
First the input, output, conditions and effects are defined, after which the connection to the upper
level Service ontology is defined. The internal processes of the service follow. Processes can
have a name, parameters, preconditions, and (conditional) effects.
There can be three kinds of processes in the Service Model: atomic processes, simple processes
and composite processes.
Atomic processes are the basic units of implementation. To interact with an atomic process
involves (at most) 2 messages: one carrying its inputs, and one carrying its outputs. (Note,
however, that messages are not explicitly defined in this ontology, but rather are specified
by the Grounding.) An atomic process is a "black box" representation; that is, no description is
given of how the process works (apart from inputs, outputs, preconditions, and effects). To be
used, an atomic process must be associated with a Grounding. This association is expressed
indirectly, by means of a Grounding instance, which is declared independently of the process.
Thus, to get to the Grounding for a given atomic process, one needs to navigate from the process
to the service object (via "describes"), and then from the service object to its grounding (via
"supports"). The grounding contains a relation mapping atomic processes to their groundings.
Simple processes provide an (optional) level of abstraction. They are described in the same way
as Atomic processes, but, unlike atomics, they have additional characterization of how they work,
in terms of other processes (using the "expandsTo" and "realizedBy" properties). They are not
directly callable. A simple process can be thought of as a "view" on either an atomic or a
composite process. Simple processes provide a means of characterizing other processes at
varying levels of granularity, for purposes of planning and reasoning.
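The indirect route from an atomic process to its Grounding, described above for atomic processes, can be illustrated with plain dictionaries standing in for RDF statements. Only the property names describes and supports are taken from the text; the resource names and the WSDL operation name are hypothetical, and a real implementation would query an RDF store instead.

```python
from typing import Dict, Tuple

# Hypothetical RDF-like statements, keyed by (subject, property) -> object
TRIPLES: Dict[Tuple[str, str], str] = {
    ("GetPriceProcess", "describes"): "PriceService",
    ("PriceService", "supports"): "PriceGrounding",
}

# Each grounding maps atomic processes to (hypothetical) WSDL operations
GROUNDING_MAPS: Dict[str, Dict[str, str]] = {
    "PriceGrounding": {"GetPriceProcess": "getPriceOperation"},
}


def find_grounding(atomic_process: str) -> str:
    """Navigate process -> service ("describes") -> grounding ("supports")."""
    service = TRIPLES[(atomic_process, "describes")]
    grounding = TRIPLES[(service, "supports")]
    # The grounding relates the atomic process to its concrete operation
    return GROUNDING_MAPS[grounding][atomic_process]


print(find_grounding("GetPriceProcess"))  # getPriceOperation
```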
Composite processes are composed of subprocesses and specify constraints on the ordering and conditional execution of these subprocesses. These constraints are captured by the “composedOf” property, which is required for a composite process. Composite processes bottom out in non-composite (atomic and/or simple) processes. Once the processes within the Service Model have been defined, control constructs can be used to specify how processes are composed. Examples of control constructs are “sequence”, “split”, “unordered” and “condition”. A sequence, for example, is defined by a list of component processes that make up its body; each of these processes can have its own conditions, parameters and effects. The effect of the sequence can then be defined as the union of the effects of its members, and the parameters of the sequence as the union of the parameters of its members.
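The simple definition of a sequence given above, in which parameters and effects are the unions of those of its members, can be sketched as follows. The Process and Sequence classes and the example names are illustrative assumptions, not the OWL-S Process ontology itself.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Process:
    """A simplified process: a name plus its parameters and effects."""
    name: str
    parameters: Set[str] = field(default_factory=set)
    effects: Set[str] = field(default_factory=set)


@dataclass
class Sequence:
    """A composite process whose body is an ordered list of component processes."""
    components: List[Process]

    @property
    def parameters(self) -> Set[str]:
        # Parameters of the sequence: union of the members' parameters
        return set().union(*(p.parameters for p in self.components))

    @property
    def effects(self) -> Set[str]:
        # Effect of the sequence: union of the members' effects
        return set().union(*(p.effects for p in self.components))


body = Sequence([
    Process("load_text", {"text"}, {"TextLoaded"}),
    Process("format_text", {"text_object"}, {"TextFormatted"}),
])
print(sorted(body.effects))  # ['TextFormatted', 'TextLoaded']
```

A plain union ignores the ordering within the sequence, for example an effect of an early step that satisfies a precondition of a later step; it reflects only the definition given in the text.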

The ontology of a Service Grounding17:
The Grounding of a service mainly contains a mapping to corresponding WSDL specifications of
services. These WSDL specifications are explained in section 2.1.3.

Now that the general architecture and the description methods used for Web service design have been explained, it is time to look further. With the methods explained above it is possible to create well-defined Web services with semantic descriptions. With these services and their descriptions, it becomes possible to create methods that configure these Web services and use them to their full extent.

16 Service Model ontology: http://www.daml.org/services/owl-s/1.0/Process.owl
17 Service Grounding ontology: http://www.daml.org/services/owl-s/1.0/Grounding.owl

2.2 Configuration methods

This section describes and compares several methods to configure Web services, and selects one of them for further use. The configuration methods discussed in this section are: the Agent Factory, agent wrappers, composition using agents, BPEL4WS, Web service orchestration, the Web service composer and WebTransact. The Agent Factory is described in more detail in section 2.2.3.

2.2.1 Configuration methods described

This section describes different methods of configuring Web services.

The first configuration method is the Agent Factory [Brazier, Wijngaards, 2001], a component-based approach to the design of agents using templates. It automates the creation and redesign of both the conceptual and the operational design, based on requirements on the function, behaviour and state of an agent. The configuration method has also been applied to agents, which is a further distinguishing feature. For a more in-depth analysis of the Agent Factory see section 2.2.3.

The first alternative to the Agent Factory is Agent Wrappers. [Richards, Sabou et al 2003] write:
“Web services are componential, independent, software applications similar to agents. However,
agents are also reactive, social and capable of reasoning. If we wish Web services to work
together, we need to give them social and reasoning capabilities. This can be achieved by
wrapping a service in an agent.” This agent wrapper allows services to collaborate. An agent can reason about a service based on its DAML-S description. This allows an agent to know what other agents are capable of doing and to use this information to determine whether another agent can assist in meeting its goal. In The Racing Project18 a number of different agent wrappers are supported: user, query translation, query planning, resource wrapper, ontology, matchmaking, cloning and coordination agents. Agent wrappers thus allow multi-agent system technology to be applied to Web services.
The Bee-gent framework19 [Kawamura et al, 1999] uses agent wrappers to protect information and to provide a bridge for communication with and about the internal processes. This method of configuration can be automatic, since the agents communicate with each other and try to find other agents that can perform a specific task. Each agent must be able to take a set of tasks and find other agents with which it can complete them. This coordination task could be assigned to a separate agent or to each agent.
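The coordination task described above can be sketched as follows, assuming that each wrapper agent merely advertises a set of task labels (a stand-in for reasoning over a DAML-S description) and that a separate coordinator assigns the task set. The agent names and the trivial wrapped functions are hypothetical; the task labels are borrowed from the Racing agent types listed above. This is not how Racing or Bee-gent are implemented.

```python
from typing import Callable, Dict, List, Optional


class WrapperAgent:
    """Wraps a single service and advertises the tasks it can perform."""

    def __init__(self, name: str, capabilities: List[str],
                 service: Callable[[str], str]) -> None:
        self.name = name
        self.capabilities = set(capabilities)  # stand-in for a DAML-S description
        self.service = service

    def can_perform(self, task: str) -> bool:
        return task in self.capabilities

    def perform(self, data: str) -> str:
        # The agent forwards the request to the service it wraps
        return self.service(data)


def coordinate(tasks: List[str],
               agents: List[WrapperAgent]) -> Dict[str, Optional[str]]:
    """Assign each task to the first agent that can perform it (None if nobody can)."""
    assignment: Dict[str, Optional[str]] = {}
    for task in tasks:
        helper = next((a for a in agents if a.can_perform(task)), None)
        assignment[task] = helper.name if helper else None
    return assignment


agents = [
    WrapperAgent("translator", ["query translation"], str.upper),
    WrapperAgent("matcher", ["matchmaking"], str.strip),
]
print(coordinate(["matchmaking", "query planning"], agents))
# {'matchmaking': 'matcher', 'query planning': None}
```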

Another alternative is composing Web services using agents. In this approach, Web services and user constraints are marked up in DAML-S. A generic task procedure is selected by the user and given to a DAML(-S) enabled agent, which customizes the procedure according to the user-specific constraints. Agents are well suited for this configuration task because of their automated nature. In many cases agents have been used to perform selection tasks based on user-specified constraints. Agents are also useful for this task when the preferences of the user change: if a user, for example, comes to prefer speed over stability, the agent can simply adapt its behaviour accordingly. Examples of systems that use agents for Web service configuration can be found in [Bechhofer et al, 2001] and [Cheng et al, 2002].

18 http://www.zsu.zp.ua/racing/
19 http://www.toshiba.co.jp/beegent/whatsbge.htm
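The adaptive selection argument can be made concrete with a small sketch in which an agent, for each step of a generic task procedure, picks the candidate service with the highest weighted score. The candidate services, their speed and stability scores and the weighted-sum rule are assumptions for illustration; the cited systems may use quite different selection mechanisms.

```python
from typing import Dict, List

# Hypothetical candidate services per step of a generic task procedure,
# each scored on [0, 1] for speed and stability
CANDIDATES: Dict[str, Dict[str, Dict[str, float]]] = {
    "convert": {
        "FastConverter": {"speed": 0.9, "stability": 0.5},
        "SafeConverter": {"speed": 0.4, "stability": 0.95},
    },
}


def score(qualities: Dict[str, float], prefs: Dict[str, float]) -> float:
    """Weighted sum of a service's qualities under the user's preferences."""
    return sum(weight * qualities.get(attr, 0.0) for attr, weight in prefs.items())


def customise(procedure: List[str], prefs: Dict[str, float]) -> List[str]:
    """Pick, for every step of the procedure, the best-scoring candidate service."""
    return [max(CANDIDATES[step], key=lambda name: score(CANDIDATES[step][name], prefs))
            for step in procedure]


print(customise(["convert"], {"speed": 0.8, "stability": 0.2}))  # ['FastConverter']
print(customise(["convert"], {"speed": 0.2, "stability": 0.8}))  # ['SafeConverter']
```

Changing the user's preferences only changes the weights; the generic task procedure itself stays the same.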

The Business Process Execution Language for Web Services (BPEL4WS)20 [Wohes et al 2002]
provides a language for the formal specification of business processes and business interaction
protocols. By doing so, it extends the Web services interaction model and enables it to support
business transactions. BPEL4WS defines an interoperable integration model that should facilitate the expansion of automated process integration in both the intra-corporate and the business-to-business spaces. When multiple services are needed to perform a certain task, BPEL4WS ensures that the services are executed in the correct order, so that each service receives the right input and passes its output to the right service.
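BPEL4WS states the execution order explicitly in its process definition (for example with sequence and flow activities). Purely to illustrate the underlying requirement that every service receives its input from the right predecessor, the sketch below derives an order from declared inputs and outputs with a topological sort (Kahn's algorithm). The service names and data labels are hypothetical, and this is not how a BPEL4WS engine works internally.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical services: name -> (inputs it needs, outputs it produces)
SERVICES: Dict[str, Tuple[Set[str], Set[str]]] = {
    "Store": ({"RdfStream"}, {"Repository"}),
    "Convert": ({"BibUrl"}, {"RdfStream"}),
    "Display": ({"Repository"}, {"PortalUrl"}),
}


def execution_order(services: Dict[str, Tuple[Set[str], Set[str]]]) -> List[str]:
    """Order services so that each runs after the producers of its inputs.

    Assumes every data item has a single producer.
    """
    producers = {out: name for name, (_, outs) in services.items() for out in outs}
    successors: Dict[str, List[str]] = {name: [] for name in services}
    indegree = {name: 0 for name in services}
    for name, (ins, _) in services.items():
        for needed in ins:
            if needed in producers:  # inputs with no producer come from outside
                successors[producers[needed]].append(name)
                indegree[name] += 1
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order: List[str] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(services):
        raise ValueError("cyclic dependency between services")
    return order


print(execution_order(SERVICES))  # ['Convert', 'Store', 'Display']
```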

Web service orchestration is another configuration method. Web services orchestration is about providing an open, standards-based approach for connecting Web services together to create higher-level business processes [Peltz 2003]. This approach focuses on Web service configuration for business processes, on how to manage these processes, and on aspects such as business efficiency and adaptivity. Web service orchestration is closely related to BPEL4WS. There are several implementations of Web service orchestration. One example is the Collaxa Orchestration Server21. Collaxa provides a simple, standards-based software infrastructure for integrating collaborative business processes. The Collaxa product supports the BPEL4WS, WS-Coordination [Pires et al 2002] and WS-Transaction [Pires et al 2002] specifications and includes an orchestration server and a management console. The orchestration server provides the underlying infrastructure: handling asynchronous processing, flow and transaction coordination, and monitoring of business processes.
All orchestration implementations have a user interface of some kind and therefore need some interaction with a user to function.

Web service composer [Sirin et al, 2003] is a semi-automatic Web service configuration method.
This configuration method requires user interaction during the composition process. The basic
functionality of the composer is to let the users invoke web services annotated with DAML-S.
The user is presented with a list of services registered to the system and can execute an individual Web service by entering values for its input parameters. The DAML-S services are executed using the WSDL grounding information. The prototype of the Web service composer is the first system to directly combine DAML-S semantic service descriptions with actual invocations of the WSDL descriptions, allowing the composed services to be executed on the Web.
Using the composer one can generate a workflow of Web services. The composition is done in a semi-automatic fashion: at each step the composer presents the available choices and a human controller makes the selection. Compositions generated by the user can be saved as a new service that can be used in other compositions.
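The semi-automatic composition loop can be sketched as follows: at each step only the services whose inputs are already available are offered, and a callback plays the role of the human controller. The registry contents are hypothetical, and the composer's actual matching is based on DAML-S descriptions rather than the plain string labels assumed here.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical registry: service -> (required inputs, produced outputs)
REGISTRY: Dict[str, Tuple[Set[str], Set[str]]] = {
    "FindBook": ({"Title"}, {"ISBN"}),
    "PriceBook": ({"ISBN"}, {"Price"}),
    "ReviewBook": ({"ISBN"}, {"Review"}),
}


def compose(available: Set[str], goal: str,
            choose: Callable[[List[str]], str], max_steps: int = 10) -> List[str]:
    """Build a workflow step by step; `choose` plays the role of the human controller."""
    workflow: List[str] = []
    known = set(available)
    while goal not in known:
        # Offer only services whose inputs are already available and not yet used
        options = sorted(name for name, (ins, _) in REGISTRY.items()
                         if ins <= known and name not in workflow)
        if not options or len(workflow) >= max_steps:
            raise RuntimeError("goal cannot be reached with the offered services")
        picked = choose(options)
        workflow.append(picked)
        known |= REGISTRY[picked][1]  # outputs become available for later steps
    return workflow


# A simulated user who always picks the first option offered
print(compose({"Title"}, "Price", lambda options: options[0]))
# ['FindBook', 'PriceBook']
```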


20 Business Process Execution Language for Web Services: http://www-106.ibm.com/developerworks/library/ws-bpel/
21 http://www.javaskyline.com/20030311_collaxa.html

Given the number of services available on the Web, it would be infeasible for a user to scroll through a list presenting all available services. The composer therefore provides a filtering mechanism that limits the services shown and lets the user locate the most relevant service for the current task. The ontology descriptions of DAML-S ServiceProfiles are used to dynamically build a filtering panel in which constraints on various properties of the service can be entered.
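A minimal sketch of such a filtering panel, assuming each service profile has already been reduced to a flat dictionary of property values and that every constraint is a simple equality test; the composer derives its panel from the DAML-S ontology and may allow richer constraints. The property names and values are illustrative (N3Export is hypothetical).

```python
from typing import Any, Dict, List

# Illustrative profile properties of registered services (values are assumptions)
SERVICES: List[Dict[str, Any]] = [
    {"name": "Bib2Rdf", "category": "Conversion", "output": "RDF"},
    {"name": "ISesame", "category": "Storage", "output": "Repository"},
    {"name": "N3Export", "category": "Conversion", "output": "N3"},
]


def panel_fields(services: List[Dict[str, Any]]) -> List[str]:
    """Build the filter panel dynamically from the properties the profiles use."""
    return sorted({key for s in services for key in s if key != "name"})


def filter_services(services: List[Dict[str, Any]],
                    constraints: Dict[str, Any]) -> List[str]:
    """Keep the services whose profile satisfies every constraint entered in the panel."""
    return [s["name"] for s in services
            if all(s.get(prop) == value for prop, value in constraints.items())]


print(panel_fields(SERVICES))                                 # ['category', 'output']
print(filter_services(SERVICES, {"category": "Conversion"}))  # ['Bib2Rdf', 'N3Export']
```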

The WebTransact [Paulo et al 2002] architecture encapsulates the message format, content, and
transaction support of multiple Web services and provides different levels of value added
services. First, the WebTransact architecture provides the functionality of uniform access to
multiple Web services. Remote services resolve conflicts involving the dissimilar semantics and
message formats from different Web services, and conflicts due to the mismatch in the content
capability of each Web service. Besides resolving structural and content conflicts, remote
services also provide information on the interface and the transaction semantics supported by
Web services. Second, Mediator services integrate semantically equivalent remote services
providing a homogenized view on heterogeneous Web services. Finally, transaction interaction
patterns are built on top of those mediator services generating composite mediator services that
can be used by application programs or exposed as new complex Web services.
Application programs interact with composite mediator services written by composition
developers. Such compositions are defined through transaction interaction patterns of mediator
services. Mediator services provide a homogenized interface of (several) semantically equivalent
remote services. Remote services integrate Web services providing the necessary mapping
information to convert messages from the particular format of the Web service to the mediator
format.
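The layering of remote and mediator services can be sketched as follows. Two hypothetical Web services return the same information in different message formats; each remote service maps its messages to a common mediator format, and the mediator offers one homogenised interface over both. Service names, message formats and the mediator format are assumptions, and WebTransact's transaction support and interaction patterns are not modelled.

```python
from typing import Any, Callable, Dict, List


# Two hypothetical Web services that return the same information in different formats
def service_a(isbn: str) -> Dict[str, Any]:
    return {"ISBN": isbn, "cost_eur": 12.5}


def service_b(isbn: str) -> Dict[str, Any]:
    return {"id": isbn, "price": {"amount": 1250, "unit": "cent"}}


class RemoteService:
    """Wraps one Web service and maps its messages to the mediator format."""

    def __init__(self, call: Callable[[str], Dict[str, Any]],
                 to_mediator: Callable[[Dict[str, Any]], Dict[str, Any]]) -> None:
        self.call = call
        self.to_mediator = to_mediator

    def invoke(self, isbn: str) -> Dict[str, Any]:
        return self.to_mediator(self.call(isbn))


class MediatorService:
    """Offers one homogenised interface over semantically equivalent remote services."""

    def __init__(self, remotes: List[RemoteService]) -> None:
        self.remotes = remotes

    def invoke(self, isbn: str) -> List[Dict[str, Any]]:
        return [remote.invoke(isbn) for remote in self.remotes]


mediator = MediatorService([
    RemoteService(service_a,
                  lambda m: {"isbn": m["ISBN"], "price_eur": m["cost_eur"]}),
    RemoteService(service_b,
                  lambda m: {"isbn": m["id"], "price_eur": m["price"]["amount"] / 100}),
])
offers = mediator.invoke("123")
print(offers)  # [{'isbn': '123', 'price_eur': 12.5}, {'isbn': '123', 'price_eur': 12.5}]
```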


2.2.2 Configuration methods compared

This section compares the different methods and describes arguments to support the selection of
one of these methods.

An important factor to consider when choosing a Web service configuration method is the degree of automation of the process. While the Agent Factory is fully automated when provided with a template and a source of Web services, the other methods are only partly automated or not automated at all. Agent wrappers could be implemented to perform a full configuration of Web services, but how this automation should be implemented is not yet known. BPEL4WS is automated and ensures a sequential activation of Web services; however, BPEL4WS focuses on business processes and is therefore not the right choice for the scope of this thesis. When agents are used for the configuration process, the process is not fully automated: user intervention is needed to select the task procedure and to specify constraints. Both Web service orchestration and the Web service composer need user interaction as well, and since the goal of this thesis is an automated process, they do not suffice either. The last method to consider is WebTransact: although it is an automated procedure, it has the disadvantage of an extra step for reformatting the service messages. To conclude, the Agent Factory is currently the best option for creating an automated Web service configuration application.




The Agent Factory fulfils all requirements of this project: it is fully automated and offers an
existing and structured implementation. This thesis assumes the Agent Factory is used for Web
service configuration.


2.2.3 The Agent Factory

This section describes the Agent Factory [Brazier, Wijngaards, 2001] in more detail.

The Agent Factory [Brazier, Wijngaards, 2001] approach focuses on automating the creation and adaptation of compositional agents. The components of which an agent is composed are grey-box components. These components provide mechanisms to implement an agent's processes, knowledge and information, and control. Component composition is regulated by explicitly defined ‘open slots’ in components and templates, based on generic agent models. Components are defined at two levels of abstraction: conceptual and operational. Minimal ontologies are used to annotate components and interfaces, without adhering to standard ontologies or languages.

The Agent Factory was originally designed for the configuration of agent software. The configuration process of a software agent in the (re-)design centre is based on the Generic Design Model (GDM) presented in [Brazier et al 1994]. In short, the assumption behind this model is that both the requirements (and their qualifications) and the description of an artefact evolve during a design process. In practice, for example, often not all initial requirements can be satisfied; the artefact is then designed to satisfy a set of these requirements.

The Agent Factory works with a component based design process to create agents. These
components have well-defined functionality, input, output and conditions. It is this component
based design approach that makes the Agent Factory also suitable for Web service configuration.
Web services can likewise be configured from well-defined building blocks (components). Composing agents can be viewed as a (re-)configuration task, and the same holds for Web services [Splunter, Brazier et al, 2003]. This thesis focuses on the use of the Agent Factory for Web services.

2.2.3.1 Structuring design process

This section provides an overview of the subprocesses of the Agent Factory.




Figure D: Agent Factory components

(Re-)Design. The (Re-)Design process is responsible for the actual (re-)design of a Web service configuration, based on given requirements, both hard and soft, provided by, e.g., an agent or a human user.
Building Block Retrieval. Building Block Retrieval is responsible for retrieving building blocks, i.e. parts of a Web service configuration, by means of queries in which characteristics with respect to functionality, behaviour and state are specified.
Assembly. In Assembly operational code is assembled on the basis of the operational blueprint
produced by the design process. Based on the output of the design process a new OWL-S
specification is generated.

2.2.3.2 Structuring design artefact

The Agent Factory requires components, knowledge and information, and coordination patterns
to be explicitly defined [Splunter, Wijngaards, Brazier, 2003].

Web services consist of several concepts: the OWL-S descriptions of the service, namely the Profile, Model and Grounding. The knowledge and information of Web services are described by the service's Profile, Process and Grounding, which define the service's input and output as well as the ontology that provides the context of the service. The coordination patterns are defined within the service Profile and Process by the (pre)conditions, effects and control constructs, which are used to express interdependencies between services. In this way, the structures required by the Agent Factory map directly onto OWL-S concepts.

2.2.3.3 Templates

Building blocks can be combined in many ways, creating services with very diverse functionality. However, combining building blocks at random would seldom create a working and useful application. To ensure a useful combination of building blocks, the Agent Factory uses templates. These templates have open slots; each open slot defines the functionality a component must have to fill it, as well as the component's input, output, etc. The templates are defined in a small extension of OWL-S. A template describes the OWL-S profiles of the components that fit its open slots and defines a process to express the interdependencies between, and control over, the slots.

Figure E: A blank template. The template has two open slots: Slot1 specifies Building block Property A and Slot constant X; Slot2 specifies Building block Property B and Slot constant Y.

Figure E shows the structure of a possible template with two open slots.
When designing templates for Web services, the first step is to gain insight into the functional requirements. For example, the goal of a template could be to display a piece of text in bold Arial format. The next step is to divide the process of reaching this goal into small atomic steps. For example: let the user enter a piece of text and load it into an object, then change the format of the text within this object to bold Arial, and finally display the text in the object on screen.
With the descriptions of these simple subprocesses it is possible to create a template with open slots that each require exactly the functionality of one step.

The example:
Template 1, Display formatted text
    Open slot 1 (load_text):
        Description: Let a user enter a piece of text and load it into an object
        Input: text
        Output: Object containing text

    Open slot 2 (format_text):
        Description: Manipulate text format within an object to bold Arial format
        Input: Object containing text
        Output: Object containing formatted text

    Open slot 3 (display_text):
        Description: Display text in an object on screen
        Input: Object containing (formatted) text
        Output: Display instructions containing text from object

The above template is defined in plain text. A graphical representation of the first two slots can
be seen in Figure F.
Figure F: A template with defined open slots. Slot load_text has building block property User_text_input and slot constant X; slot format_text has building block property change_text_format_to_arial and slot constant Y.
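The ‘Display formatted text’ template can be mirrored in a small data structure that makes the slot-filling step explicit: a building block fits an open slot when its input and output match those of the slot. This is a sketch under simplifying assumptions (exact string matching, first match wins, and the input of slot 3 written as ‘Object containing formatted text’); User_text_input and change_text_format_to_arial are taken from Figure F, while show_text_on_screen is hypothetical. The Agent Factory itself matches on OWL-S descriptions and ontologies rather than on strings.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Component:
    """An open slot or a building block, described by name, input and output."""
    name: str
    input_type: str
    output_type: str


# The 'Display formatted text' template
TEMPLATE: List[Component] = [
    Component("load_text", "text", "Object containing text"),
    Component("format_text", "Object containing text",
              "Object containing formatted text"),
    Component("display_text", "Object containing formatted text",
              "Display instructions containing text from object"),
]

# Candidate building blocks; show_text_on_screen is hypothetical
LIBRARY: List[Component] = [
    Component("User_text_input", "text", "Object containing text"),
    Component("change_text_format_to_arial", "Object containing text",
              "Object containing formatted text"),
    Component("show_text_on_screen", "Object containing formatted text",
              "Display instructions containing text from object"),
]


def fill_slots(template: List[Component],
               library: List[Component]) -> Dict[str, Optional[str]]:
    """Fill each open slot with the first building block whose input and output match."""
    filled: Dict[str, Optional[str]] = {}
    for slot in template:
        match = next((b for b in library
                      if (b.input_type, b.output_type) == (slot.input_type, slot.output_type)),
                     None)
        filled[slot.name] = match.name if match else None
    return filled


print(fill_slots(TEMPLATE, LIBRARY))
```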

Translated into OWL-S, an open slot is defined as follows.

Open slot 1 (load_text):

<profile:serviceName>load_text</profile:serviceName>
<profile:textDescription>
    Let a user enter a piece of text and load it into an object
</profile:textDescription>

<!-- descriptions of IOs -->
<profile:input>
    <profile:ParameterDescription rdf:ID="Plain_Text">
        <profile:parameterName>Plain_Text</profile:parameterName>
        <profile:restrictedTo rdf:resource="&ontology;#Information"/>
        <profile:refersTo rdf:resource="#Plain_TextIn"/>
    </profile:ParameterDescription>
</profile:input>

<profile:output>
    <profile:ParameterDescription rdf:ID="String_Object">
        <profile:parameterName>String_Object</profile:parameterName>
        <profile:restrictedTo rdf:resource="&ontology;#Object"/>
        <profile:refersTo rdf:resource="#String_ObjectOut"/>
    </profile:ParameterDescription>
</profile:output>
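The slot definition above is a fragment: it lacks a root element and namespace declarations, and &ontology; is an entity defined elsewhere. The sketch below wraps a shortened version of it (textDescription and refersTo are omitted) in a profile:Profile element, assumes a profile namespace derived from the ontology URL in footnote 14 plus the standard RDF namespace, replaces &ontology; with the hypothetical URI http://example.org/ontology, and reads the slot's name and the ontology types of its input and output with Python's standard ElementTree module.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

PROFILE = "http://www.daml.org/services/owl-s/1.0/Profile.owl#"  # assumed namespace
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"profile": PROFILE, "rdf": RDF}

SLOT_XML = f"""<profile:Profile xmlns:profile="{PROFILE}" xmlns:rdf="{RDF}">
  <profile:serviceName>load_text</profile:serviceName>
  <profile:input>
    <profile:ParameterDescription rdf:ID="Plain_Text">
      <profile:parameterName>Plain_Text</profile:parameterName>
      <profile:restrictedTo rdf:resource="http://example.org/ontology#Information"/>
    </profile:ParameterDescription>
  </profile:input>
  <profile:output>
    <profile:ParameterDescription rdf:ID="String_Object">
      <profile:parameterName>String_Object</profile:parameterName>
      <profile:restrictedTo rdf:resource="http://example.org/ontology#Object"/>
    </profile:ParameterDescription>
  </profile:output>
</profile:Profile>"""


def restricted_types(root: ET.Element, direction: str) -> List[Optional[str]]:
    """Ontology types that the slot's inputs or outputs are restricted to."""
    path = f"profile:{direction}/profile:ParameterDescription/profile:restrictedTo"
    return [el.get(f"{{{RDF}}}resource") for el in root.findall(path, NS)]


def read_slot(xml_text: str) -> Dict[str, object]:
    """Extract the slot name and the types of its inputs and outputs."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("profile:serviceName", namespaces=NS),
        "inputs": restricted_types(root, "input"),
        "outputs": restricted_types(root, "output"),
    }


print(read_slot(SLOT_XML))
```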




3 Application Domain

This section defines the scope of the practical part of this master's project, the chosen application domain and the templates used. Scenarios are used to describe a “walkthrough” defining the steps that need to be taken to identify Web service properties and to assemble several services. In the last part of this section, the techniques used in this project are described.


3.1 Problem statement

This paragraph describes the focus and goal of the application. Section 3.1.1 describes the initial situation and the domain of application. Section 3.1.2 describes the current situation, its problems and the challenges that remain open. Section 3.2 describes a more extensive scenario and the additional services needed.

3.1.1 Initial situation

Originally there was a portal that could visualise publications and, in particular, the cooperation between authors of publications. By manually calling different Web services it was possible to add new references to the portal and to visualise the cooperation between papers by identifying the same individuals as authors of different papers. [Richards, Splunter et al, 2003] describes how this process was automated. A full explanation of this situation and of the scenario used to solve the problem is given in section 3.2.1.




Figure G: IIDS website22

Figure G depicts a screenshot of the initial portal. Users can browse the IIDS website for publications and can also use search criteria to obtain a list sorted by author name or title.

3.1.2 Current situation

The focus of this extension is portal creation for the IIDS research group website23, specifically its publication section24.
In [Richards, Splunter et al, 2003] an example of Web service configuration for portal creation is presented. This example is used as the basis for the implementation of the IIDS portal.

The goal of the configuration is to display references contained in a BibTeX file on the IIDS portal. BibTeX files contain the information needed to reference papers, such as the authors, title and publication date. With the information in these files it is possible to create an overview of all papers published by the same author, by searching through all available BibTeX files and selecting the matching authors and their published papers.
Problems that occur during this process are related to the format of the available references [Richards, Splunter et al, 2003]: the same author may be written in different ways in different references. Information in BibTeX files is often not formatted in a standardised way, which means that it has to be converted into, for example, RDF before it can be used. RDF gives semantics to the information and makes it possible to compare authors, since their names now carry the semantic meaning that they are names of authors. In the previous scenario the reference information was also translated into RDF.
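A deliberately naive sketch of grouping papers per author, assuming the references have already been parsed into dictionaries and that ‘last name plus first initial’ identifies an author. In practice this heuristic is not reliable: it merges different people who share a surname and initial, and it mishandles names with prefixes such as ‘van’, which is why semantic descriptions such as RDF are attractive here. Names and titles are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def author_key(name: str) -> str:
    """Normalise 'Last, First' and 'First Last' to 'last:f' (naive heuristic)."""
    name = name.strip()
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
    else:
        *firsts, last = name.split()
        first = " ".join(firsts)
    return f"{last.lower()}:{first[:1].lower()}"


def papers_per_author(references: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group paper titles by normalised author key (BibTeX separates authors with ' and ')."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ref in references:
        for author in ref["author"].split(" and "):
            grouped[author_key(author)].append(ref["title"])
    return dict(grouped)


refs = [
    {"title": "Paper A", "author": "Jansen, Piet and Anna Bakker"},
    {"title": "Paper B", "author": "P. Jansen"},
]
print(papers_per_author(refs))
# {'jansen:p': ['Paper A', 'Paper B'], 'bakker:a': ['Paper A']}
```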


3.2 Design Process

This paragraph describes the scenarios for the different steps of the design process briefly described in the previous paragraph.

3.2.1 Initial situation

This section describes the initial situation as described in [Richards, Splunter et al, 2003].

The first step towards automatic Web service configuration is to ensure that all Web services are (re-)configurable. To do this, semantic descriptions are added to each service, using DAML-S to mark them up. The services that are used and marked up with DAML-S (v0.7) descriptions are Bib2Rdf, ISesame, ESesame, SameIndividualAs and DisplayCreator.
A short description of these services (all of which were atomic processes):

22
     http://www.iids.org/publicationdata/pubviewer?property=year&value=2004
23
     http://www.iids.org
24
     http://www.iids.org/publications




  - Bib2Rdf (B2R): a file conversion service that takes a file in BibTeX format and outputs a file in RDF format.
  - ISesame: a file import service that takes a file in RDF format and adds it to a specified public or private repository in Sesame25 [Broekstra et al, 2002]. Sesame is a repository-based RDF(S) storage and query facility.
  - ESesame: a file export service that extracts data from a specified public or private repository in Sesame and outputs the data in RDF, N3 or N-Triples.
  - SameIndividualAs (SIA): a utility service that reads in a file in RDF format and adds the sameIndividualAs DAML tag to RDF resources that reference the same person.
  - DisplayCreator - AIdMinistrator26 (AI-DIS): a service which takes the contents of a Sesame repository and displays it in a Web portal.

In addition, an ontology was defined that provided a context for this scenario. This made it possible to reason about Web services and to match them to the required functionality. Some templates were also created for this scenario, describing a design of the required functionality. The Agent Factory could use these templates to configure Web services.

Once the Web services were ready for configuration, the Agent Factory was presented with its
first assignment from a user: “Display bibliographical references available in BibTex, per author using
AIdMinistrator services”.
From an external point of view, the Agent Factory accepts the above assignment and internally creates a DAML-S description of Web services that can fulfil the task. These DAML-S descriptions can then be matched with, or linked to, Web services that have the required functionality.

For decision making, the Agent Factory (AF) first scans the services for DAML-S descriptions to find out what the services do. It then looks at the data currently available and its format. The AF then finds a service that needs input in the format the AF can provide and that can fulfil the task or part of it. The AF then looks at the next step, and so on, designing the entire configuration process in such a way that the task is eventually fulfilled. In this way, when a step in the process (a service) requires a certain kind of input or has preconditions, these can be fulfilled before that service is invoked.
After this scan, the Bib2Rdf service is selected to translate the BibTeX-formatted information into RDF. This service needs a RefBibURL as input, which is exactly the data the AF has available, and produces a RefRdf stream as output. Now that the information is available in RDF format, the AF needs to store this data so that all references can later be sorted by author. To do this, the AF finds the ISesame service, which accepts only an RDF stream as input.

ISesame has a precondition stating that, if the sameIndividualAs tag has been added, ISesame will handle it. Since the AF has been asked to display references per author, it must identify which references are from the same author. Therefore the SameIndividualAs (SIA) service is added to fulfil the task.
All these services use an RDF stream as input and output, each with its own semantic description for its respective task.


25
     http://aduna.biz/index.html




Finally, when the AF has gathered all the relevant data, all that remains is to display the information to the user. To display the data in AIdMinistrator format, the last service, AIdMdisplayCreator, is used.
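The forward, data-driven reasoning described in this section can be sketched as a breadth-first search over sets of available facts, using conditional effects of the kind OWL-S processes may have. The fact names, the encoding of the ‘per author’ requirement as a conditional effect and the rule that each service is used at most once are simplifications introduced for this sketch; the Agent Factory's actual reasoning works on DAML-S descriptions and templates.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass
class Service:
    name: str
    needs: FrozenSet[str]
    adds: FrozenSet[str]
    # Conditional effects: if the condition fact holds, the extra fact is added as well
    conditional: Dict[str, str] = field(default_factory=dict)

    def apply(self, facts: FrozenSet[str]) -> FrozenSet[str]:
        extra = {fact for cond, fact in self.conditional.items() if cond in facts}
        return facts | self.adds | extra


SERVICES = [
    Service("Bib2Rdf", frozenset({"RefBibURL"}), frozenset({"RdfStream"})),
    Service("SameIndividualAs", frozenset({"RdfStream"}),
            frozenset({"SameIndividualAsTags"})),
    Service("ISesame", frozenset({"RdfStream"}), frozenset({"InRepository"}),
            {"SameIndividualAsTags": "AuthorsLinkedInRepository"}),
    Service("AIdMdisplayCreator", frozenset({"InRepository"}), frozenset({"Displayed"}),
            {"AuthorsLinkedInRepository": "DisplayedPerAuthor"}),
]


def plan(start: Set[str], goal: Set[str], services: List[Service]) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of services that reaches the goal."""
    queue = deque([(frozenset(start), ())])
    seen = set()
    while queue:
        facts, steps = queue.popleft()
        if goal <= facts:
            return list(steps)
        for service in services:
            # Each service is used at most once and only when its inputs are available
            if service.name in steps or not service.needs <= facts:
                continue
            new_facts = service.apply(facts)
            new_steps = steps + (service.name,)
            key = (new_facts, frozenset(new_steps))
            if key not in seen:
                seen.add(key)
                queue.append((new_facts, new_steps))
    return None


print(plan({"RefBibURL"}, {"DisplayedPerAuthor"}, SERVICES))
# ['Bib2Rdf', 'SameIndividualAs', 'ISesame', 'AIdMdisplayCreator']
```

Asking only for Displayed yields the shorter chain without SameIndividualAs, which mirrors the reasoning above: the SIA service is added only because references must be shown per author.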

3.2.2 Current situation

This section describes three scenarios that aim to overcome the shortcomings of the initial situation. Together they represent three different phases in the design of a useful template.

Template 1
  Template:
      Input: BibTeX URL
      Output: URL IIDS portal

      Service1: GetFile
          Openslot1: Retrieve BibTeX
              Input: BibTeX URL
              Output: BibTeX File

      Service2: ProcessFile (Process All Instances)
          Openslot1: SelectInstance
              SelectSinglePublication
                  Input: BibTeX File
                  Output: BibTeX File containing 1 publication
          Openslot2: StoreInstance
              StorePublicationInZope
                  Input: Simple BibTeX file
                  Output: Feedback

      Service3: ShowPortal
          Openslot1: Display newly added reference
              Input: Feedback
              Output: URL to IIDS portal containing relevant reference

Figure H: The first scenario in template format

The template in figure H describes the following scenario:
A user wants to display a set of references using the IIDS portal. The user has a BibTeX file
containing information about the publication, such as its author, topic and time of publication.
To add this reference to the IIDS portal, the user provides the system with the URL of the BibTeX
file. The goal of the system is then to add the information about the publication as a reference on
the IIDS portal. First, the BibTeX file must be retrieved so that the data it contains can be
processed. The Web service 'GetFile' downloads the BibTeX file and provides this file to the next
Web service, 'ProcessFile', which ensures that all reference information in the BibTeX file is
processed. Since a BibTeX file can contain more than one reference to a publication, it is important
to separate these references. To do this the component 'SelectInstance' is called, which takes the
first reference information element (which can consist of multiple properties such as author
information, URLs etc.) and creates a new BibTeX file containing only this one element. It then
passes this file back to the 'ProcessFile' Web service, creates another file for the next element,
passes this one on
and so on. The 'ProcessFile' service passes each file to the 'StoreInstance' component. This
component creates reference files in the Zope environment (Zope is explained in section 3.3),
which means that a file containing a URL to the publication is created, with properties such as
author information. When the 'StoreInstance' component is done, it sends a feedback signal to
the 'ProcessFile' service indicating whether the operation was successful. If it was not, the
'ProcessFile' service can restart. If the operation was successful, a URL to the newly created
reference file in the Zope environment is forwarded to the 'ShowPortal' service. This service
opens a new browser window for the user at the URL of the new reference.
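The separation step performed by 'SelectInstance' can be illustrated with a minimal Java sketch. It is not the thesis implementation: the class SelectInstance and the method splitEntries are hypothetical, and the sketch assumes that every BibTeX entry begins with '@' at the start of a line. Text before the first entry is dropped, and special entries such as @string or @comment would be treated like ordinary entries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the 'SelectInstance' step: split the text of a
// BibTeX file into single-entry "files" (strings). Assumes each entry starts
// with '@' as the first character of a line; this is a simplification.
class SelectInstance {

    static List<String> splitEntries(String bibtex) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null;
        for (String line : bibtex.split("\\R")) {     // \R matches any line break
            if (line.startsWith("@")) {
                // A new entry begins: close the previous one, if any.
                if (current != null) {
                    entries.add(current.toString().trim());
                }
                current = new StringBuilder();
            }
            if (current != null) {                     // ignore text before the first entry
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            entries.add(current.toString().trim());
        }
        return entries;
    }
}
```

Each returned string corresponds to one single-entry "file" that 'ProcessFile' could hand on to 'StoreInstance'.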

The above template proved insufficient. Its open slots were too generic to find a suitable Web
service matching the requirements; for example, the input and output specifications were too
generic. In addition, this template could not recognise the same individual as the author of
different papers, nor match the differently written names of that individual in different references.

Template 2
 Template:
             Input: BibTeX URL
             Output: URL IIDS portal

                       Service1: GetFile
                                            Openslot1:
                                                         Retrieve BibTeX
                                                                   Input: BibTeX URL
                                                                   Output: BibTeX File
                       Service2: GatherAllBibtexInfo
                                           Openslot1:
                                                       RetrieveAllBibtex
                                                                  Input: search parameter
                                                                   Output: File with references to all other BibTeX files
                                                                             on the portal
                       Service3: ProcessFile (Process All Instances)
                                            Openslot1:
                                                       SelectInstance
                                                                  SelectSinglePublication
                                                                             Input: BibTeX File
                                                                             Output: BibTeX File containing 1 publication
                                                                                          (simplest BibTeX file possible)
                                            Openslot2:
                                                       PassAllBibtexFiles
                                                                  SelectBibtexAndPass
                                                                             Input: File with references to other BibTeX files
                                                                             Output: BibTeX file
                                            Openslot3:
                                                       Bib2RDF
                                                                  AddRDFTags
                                                                             Input: BibTeX file
                                                                             Output: RDF file
                                            Openslot4:
                                                       SameIndividualAs
                                                                  AddDAML-STag
                                                                             Input: RDF file
                                                                             Output: RDF file with SIA tag
                                            Openslot5:
                                                       RDF2Bib
                                                                  RemoveRDFTagsPreserveDAML-STags
                                                                             Input: RDF stream with SIA tag
                                                                             Output: BibTeX File with SIA tag
                                            Openslot6:
                                                       IdentifyAuthor
                                                                  Remove DAML-S tag and replace with correct author name
                                                                             Input: BibTeX File with SIA tag
                                                                             Output: BibTeX File with right author name
                                            Openslot7:
                                                       StoreInstance
                                                                  StorePublicationInZope
                                                                             Input: Simple BibTeX file
                                                                             Output: Feedback
                       Service4: ShowPortal
                                            Openslot1:
                                                       Display new added reference
                                                                  Input: Feedback
                                                                  Output: URL to IIDS portal containing relevant
                                                                              reference




Figure I: The second template

P.T.G. Hoofd                Hoofd, P.T.G.               Pieter T. G. Hoofd
Piet Hoofd                  P. Hoofd                    Hoofd
Hoofd, P.T.G.               Pieter Theo Hoofd           Pieter
Piet Theo Gerard Hoofd      Pieter Gerard Hoofd         Hoofd, Pieter Gerard
Table 1: Different possibilities to spell a name

Extra functionality has been added to the service in the second template. There is now a check to
ensure that publications are linked to the right author even if his or her name is written
differently. This can be the case if an author's name is spelled as P.T.G. Hoofd in one publication
and as Piet Hoofd in another. This check has a downside, though: to use this service, all other
references have to be in the same file or data stream, so that the service can identify identical
authors. Therefore all BibTeX information currently available on the portal needs to be gathered
and provided to this service.
To integrate this new functionality, the 'simple' BibTeX file is first converted to RDF data. The
existing service Bib2RDF is used for this purpose. This service adds RDF tags to the BibTeX data
so that reasoning about the content becomes possible.
All other reference information on the portal is needed as well. For this purpose all BibTeX
information on the portal is converted to RDF with the Bib2RDF service and added to the same
RDF file. To do this, RetrieveAllBibtex searches the portal for all BibTeX files and stores their
URLs in a file. PassAllBibtexFiles can then gather the BibTeX files one by one and pass them to
the Bib2RDF service, which creates one large file with the RDF information from all the BibTeX
files.
The created RDF file is then forwarded to the existing service SameIndividualAs. This service
adds DAML-S tags to the data to identify author names that refer to the same author.
Since this information still needs to be stored and displayed, the RDF data must then be
converted back to BibTeX format. To do this a service called 'RDF2Bib' is used. This service
outputs a BibTeX file identical to the original BibTeX file, except that the SIA tag is still
included. This SIA tag specifies:
DAML:SameIndividualAs(“P.T.G. Hoofd, Piet Hoofd”)
Only the new references are converted back to BibTeX; all other RDF data is discarded.
The final step before the service can continue with the previously described StoreInstance
functionality is to remove the SIA tag and replace the author name with the name that is
commonly used on the IIDS portal. To do this the service IdentifyAuthor is used. This service
outputs a 'simple' BibTeX file containing a single author name that is identical to the name used
in all other references of this author on the IIDS portal.

This template introduced new problems. For example, processing and combining BibTeX sources
was more easily accomplished by splitting this service into two or three separate services.
Another solution to the SameIndividualAs problem was also needed. The final template for the
IIDS portal follows:

Template 3
 Template:
             Input: BibTeX URL
             Output: URL IIDS portal

                       Service1: GetFile
                                             Openslot1:
                                                          Retrieve BibTeX
                                                                    Input: BibTeX URL
                                                                    Output: BibTeX File Object
                       Service2: GathAndStoreBib
                                          Openslot2:
                                                          Search Portal for BibTeX
                                                                    Input: Website URL
                                                                    Output: File Object containing links to BibTeX
                                                                                sources
                       Service3: BibInfoGath
                                           Openslot3:
                                                          Combine different BibTeX sources
                                                                   Input: Local or remote BibTeX source,
                                                                          File containing links to BibTeX sources
                       Service4: Bib2Rdf
                                             Openslot4:
                                                          AddRDFTags
                                                                 Input: BibTeX File Object
                                                                 Output: RDF file
                       Service5: SameIndividualAs
                                          Openslot5:
                                                          AddDAML-STag
                                                                 Input: RDF File
                                                                 Output: SIA File Object
                       Service6: ReplaceSIATag
                                          Openslot6:
                                                          Identify individuals and select one spelling for their name,
                                                          and replace all other spellings with that one
                                                                     Input: RDF File Object, SIA File Object
                                                                     Output: RDF File Object
                       Service7: Rdf2Bib
                                             Openslot7:
                                                          Convert the RDF data back into BibTeX format
                                                                    Input: RDF File Object
                                                                    Output: BibTeX File Object
                       Service8: StorePub
                                             Openslot8:
                                                          Store BibTeX reference information on a Zope portal in plain
                                                          text format
                                                                      Input: BibTeX File Object
                                                                      Output: URL to plain format file


Figure J: Template three

The final template specifies the following. First, the GetFile service loads a BibTeX file into a
readable file object. The next step is to search through a portal or website for BibTeX reference
files. If any are found, links to those files are saved in a file object; GathAndStoreBib is the open
slot that defines this functionality. After this, the file objects from GetFile and GathAndStoreBib
can be used by a Web service that fits in the third open slot, 'BibInfoGath'. This open slot defines
the functionality of combining new BibTeX information (the file object from GetFile) with old,
already known BibTeX information (the file object from GathAndStoreBib).
The next open slot takes this combined BibTeX source and converts it to RDF data. Once the data
is in RDF format, the open slot named SameIndividualAs can use the RDF file object produced by
the Bib2Rdf open slot to identify authors and the different spellings of their names.
The ReplaceSIATag open slot then uses that information to filter out the different spellings of
each author name, preserving only one spelling.
The next step is to convert the RDF data back to BibTeX format so that it can be used as
reference information again. Finally, this BibTeX information is stored in a Zope environment
by the Web service that fits in the StorePub open slot.

This last scenario is sufficient to solve the problems identified when adding a reference to a
portal. The scenario is specific enough to create a template which can be used for Web service
configuration.
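What "specific enough" means for configuration can be illustrated with a small Java sketch that checks whether a linear chain of open slots is type-consistent, i.e. whether every input of a slot is available either as a template input or as the output of an earlier slot. This is a hypothetical illustration, not the Agent Factory's configuration algorithm: the class SlotChain and the normalised type names (for example "RDF File Object") are assumptions, and GathAndStoreBib and BibInfoGath are left out of the example chain.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a linear chain of open slots is "type-consistent" when
// every input of a slot is a template input or the output of an earlier slot.
class SlotChain {

    static final class Slot {
        final String name;
        final List<String> inputs;
        final String output;

        Slot(String name, List<String> inputs, String output) {
            this.name = name;
            this.inputs = inputs;
            this.output = output;
        }
    }

    // Returns the name of the first slot whose inputs are not yet available,
    // or null when every slot in the chain can be served.
    static String firstUnsatisfied(Set<String> templateInputs, List<Slot> slots) {
        Set<String> available = new HashSet<>(templateInputs);
        for (Slot slot : slots) {
            if (!available.containsAll(slot.inputs)) {
                return slot.name;
            }
            available.add(slot.output);   // later slots may consume this output
        }
        return null;
    }
}
```

For instance, the chain GetFile, Bib2Rdf, SameIndividualAs, ReplaceSIATag, Rdf2Bib with the template input "BibTeX URL" is reported as satisfiable, whereas dropping the Bib2Rdf and SameIndividualAs slots makes ReplaceSIATag the first slot whose inputs are missing.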

4 Implementation

This section describes the practical part of this MSc project, including worked-out examples and
the design decisions that were made.


4.1 Web service implementation

This section describes the implementation of the different Web services in Java. These Web
services fit into the open slots of the new template defined in section 3.2.2. Some of the Web
services that match the functionality requirements of the open slots were already implemented
for the previous scenario (section 3.2.1) and are therefore not described in this section.

4.1.1 GetFile

The 'GetFile' service accepts a URL to a remote file, loads its content into memory and stores a
copy of the file locally. Since this service accepts URLs to remote files, it assumes that these
files are accessible over the Internet and that the service is granted at least 'read' rights. For
testing purposes the service has been developed with a user interface. The user is asked to
provide a URL to the remote BibTeX file and a name under which a local copy is stored.
The service checks whether this file already exists; if so, it overwrites the file, otherwise the file
is created. This keeps the service stateless: the result of a call does not depend on earlier calls.
Once the file has been read, copied and loaded, it is printed on the screen for the user. Again, this
feature is for testing, to check whether the service has copied the BibTeX information correctly.
The user interface is created in HTML and is connected to the Web service through JSP.
In principle the GetFile service can read any file that is provided through a URL; however, it
cannot yet handle binary input, so binary files should not be presented to this service. An error
handler that checks for this is left for future work.
For further insight into this service, see appendix C.1, which contains a Javadoc-like overview of
the Java implementation in which methods and important variables are explained.
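A minimal sketch of GetFile's core behaviour, using only the Java standard library, could look as follows. The class name GetFileSketch is hypothetical and the HTML/JSP front end is omitted. The rejection of non-ASCII bytes is an assumption that approximates the missing binary-input check mentioned above; it does not catch every binary file (one that happens to contain only 7-bit bytes would pass).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of GetFile: download a remote text file, store a local
// copy (overwriting any existing file) and return the contents.
class GetFileSketch {

    static String fetch(URL source, Path localCopy) throws IOException {
        byte[] data;
        try (InputStream in = source.openStream()) {
            data = in.readAllBytes();
        }
        // Rough binary check (an assumption): reject bytes outside 7-bit ASCII.
        // Bytes >= 0x80 are negative in Java's signed byte type.
        for (byte b : data) {
            if (b < 0) {
                throw new IOException("Non-ASCII input in " + source);
            }
        }
        // Files.write creates the file or truncates an existing one.
        Files.write(localCopy, data);
        return new String(data, StandardCharsets.US_ASCII);
    }
}
```

Because Files.write truncates an existing file by default, calling fetch twice with the same local path overwrites the earlier copy, mirroring the overwrite behaviour described above.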

4.1.2 GathAndStoreBib

The 'GathAndStoreBib' service collects all the BibTeX information that is available on a
website. Given the URL of a website, this service searches through its pages to find links to
BibTeX files. If such links are found, the URLs of these BibTeX files are stored in a local file.
A user interface has been designed to test this service. The user is asked to provide the service
with the URL of a website and the name of the output file that will contain the links to BibTeX
files.
For further insight into this service, see appendix C.2, which contains a Javadoc-like overview of
the Java implementation with methods and important variables explained.
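The link-finding part of GathAndStoreBib could be sketched as below. The sketch handles a single page only; how the thesis service walks through the other pages of a website is not reproduced. The class BibLinkFinder is hypothetical, and matching href attributes with a regular expression is a simplification of proper HTML parsing.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the link-extraction part of GathAndStoreBib: find
// href attributes that point to *.bib files in one HTML page and resolve them
// against the page URL. Crawling further pages is left out.
class BibLinkFinder {

    // Matches href="...bib" or href='...bib' (case-insensitive).
    private static final Pattern BIB_HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"']+\\.bib)[\"']", Pattern.CASE_INSENSITIVE);

    static List<String> findBibLinks(URI pageUri, String html) {
        List<String> links = new ArrayList<>();
        Matcher m = BIB_HREF.matcher(html);
        while (m.find()) {
            // Turn relative links such as "refs/a.bib" into absolute URLs.
            String absolute = pageUri.resolve(m.group(1)).toString();
            if (!links.contains(absolute)) {
                links.add(absolute);   // keep the first occurrence, drop duplicates
            }
        }
        return links;
    }
}
```

The resulting list is what would be written, one URL per line, to the output file named by the user.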

Figure K: GathAndStoreBib user interface

4.1.3 BibInfoGath

The 'BibInfoGath' service collects all BibTeX information and stores it both in a local file and in
memory. This service takes the following input: a URL or local path to the BibTeX file containing
the new reference information, and the location of (or an object containing) the links to the 'old'
BibTeX files. These two sources are then combined and saved in a new file and object, which
contain all BibTeX information from the portal or website together with the new reference
information.
For further insight into this service, see appendix C.3, which contains a Javadoc-like overview of
the Java implementation with methods and important variables explained.
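The combination step of BibInfoGath can be sketched as follows, assuming that the new BibTeX file is available locally and that the links file lists one absolute URL (http: or file:) per line. The class name BibInfoGathSketch and the output layout (sources separated by newlines) are assumptions, not details taken from the implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of BibInfoGath: concatenate the new BibTeX file with
// every BibTeX source listed in a links file and save the result.
class BibInfoGathSketch {

    static String combine(Path newBib, Path linksFile, Path output) throws IOException {
        StringBuilder all = new StringBuilder(Files.readString(newBib)).append('\n');
        for (String line : Files.readAllLines(linksFile)) {
            String link = line.trim();
            if (link.isEmpty()) {
                continue;                        // skip blank lines in the links file
            }
            try (InputStream in = URI.create(link).toURL().openStream()) {
                all.append(new String(in.readAllBytes(), StandardCharsets.UTF_8)).append('\n');
            }
        }
        Files.writeString(output, all.toString());   // overwrite any previous result
        return all.toString();
    }
}
```

Returning the text in addition to writing the file reflects the description above, where the combined information is kept both in a file and in memory.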


4.1.4 ReplaceSIATag

The 'ReplaceSIATag' service takes two inputs: BibTeX information in RDF format, provided as an
object, and an RDF file containing DAML-S SameIndividualAs tags based on the former RDF data
(provided through a URL).
This service parses the RDF data containing BibTeX references and replaces the names of
authors with identical ids, so that they all refer to the same name. The name of an author is
defined between 'Person' tags in RDF.
The procedure for selecting one spelling of a name is quite simple: it selects the first name that
is connected to an id.
For further insight into this service, see appendix C.4, which contains a Javadoc-like overview of
the Java implementation with methods and important variables explained.
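The selection rule above (the first name connected to an id wins) can be sketched in Java. The sketch assumes that the RDF has already been parsed into (id, name) pairs in document order, and that the SameIndividualAs information has already been used to give identical individuals the same id; both steps, the class name CanonicalNames and the second example author are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the name-selection rule of ReplaceSIATag: for every
// individual id, the first name seen in document order becomes the canonical
// spelling. RDF parsing is left out; input is a list of {id, name} pairs.
class CanonicalNames {

    static Map<String, String> firstNamePerId(List<String[]> idNamePairs) {
        Map<String, String> canonical = new LinkedHashMap<>();
        for (String[] pair : idNamePairs) {
            canonical.putIfAbsent(pair[0], pair[1]);   // keep only the first spelling
        }
        return canonical;
    }

    // Replaces each author name by the canonical name of its id.
    static List<String> rewrite(List<String[]> idNamePairs) {
        Map<String, String> canonical = firstNamePerId(idNamePairs);
        return idNamePairs.stream()
                .map(pair -> canonical.get(pair[0]))
                .collect(Collectors.toList());
    }
}
```

Because the rule depends on document order, the chosen spelling may well be an abbreviation such as "P.T.G. Hoofd" rather than a full name; preferring, for example, the longest variant would be an alternative rule.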

4.1.5 Rdf2Bib

The 'Rdf2Bib' service takes an RDF file as input and converts its content to BibTeX. To do this,
the service creates a Persons object that gathers information about authors and their names
(possibly with different spellings), and an Organizations object that contains publisher
information. To create these objects, the RDF document is first parsed for the relevant
information.

Using these objects, the service creates a new BibTeX object, reference by reference. For each
reference that is translated to BibTeX, the service reads the RDF information line by line,
removes the RDF tags and inserts BibTeX-formatted text.
This service can handle a predefined set of entry types and field names; all other entry types and
field names are ignored. This predefined set is based on the selection described on this
website27.
For further insight into this service, see appendix C.5, which contains a Javadoc-like overview of
the Java implementation with methods and important variables explained.
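The final writing step of Rdf2Bib, once the RDF has been parsed into an entry type, a citation key and field values, can be sketched as follows. The class BibWriter is hypothetical, and the sets of supported entry types and fields are illustrative placeholders, not the selection taken from the BibTeX format website.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the output side of Rdf2Bib: write one BibTeX entry
// from an entry type, a key and field values already extracted from RDF.
// The supported sets below are an illustrative subset, not the thesis list.
class BibWriter {

    static final Set<String> ENTRY_TYPES = Set.of("article", "book", "inproceedings", "misc");
    static final Set<String> FIELDS = Set.of("author", "title", "year", "journal", "publisher", "url");

    // Returns null for unsupported entry types; unsupported fields are skipped.
    static String toBibtex(String type, String key, Map<String, String> fields) {
        if (!ENTRY_TYPES.contains(type)) {
            return null;
        }
        StringBuilder sb = new StringBuilder("@").append(type).append('{').append(key);
        for (Map.Entry<String, String> f : fields.entrySet()) {
            if (FIELDS.contains(f.getKey())) {
                sb.append(",\n  ").append(f.getKey()).append(" = {").append(f.getValue()).append('}');
            }
        }
        return sb.append("\n}").toString();
    }
}
```

Passing the fields in an insertion-ordered map (such as LinkedHashMap) keeps the field order of the original reference in the output.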

4.1.6 StorePub

The 'StorePub' service takes BibTeX information as input, either as a URL to a BibTeX file or as
an object containing the BibTeX information.
The service then stores this BibTeX information (or the information from the remote BibTeX
file) on a portal as plain text, without the BibTeX formatting.
The service is designed to store the publication in a Zope environment for the IIDS portal. To
store the file in Zope, the service opens an FTP session to Zope and stores the file at some
location in Zope. To accomplish this, the service needs to be provided with the location of the
Zope portal, the FTP port, a username and password for the Zope environment with write rights,
and a location in Zope to store the file. This last input has a default value: if no location is
provided, the file is saved in /website/publicationdata/db/.
StorePub only saves predefined BibTeX fields. These fields are again a selection from the website
mentioned earlier. The IIDS portal has some predefined fields of its own; however, those are
ignored in this version of the Web service.
For further insight into this service, see appendix C.6, which contains a Javadoc-like overview of
the Java implementation with methods and important variables explained.
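Two small parts of StorePub, choosing the storage location and producing the plain-text representation, are sketched below. The FTP upload to Zope is deliberately left out. The class StorePubSketch is hypothetical, and the "name: value" text layout and the set of stored fields are assumptions; only the default location /website/publicationdata/db/ comes from the text above.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of two small parts of StorePub: choosing the Zope
// target folder (defaulting to the location given in the text) and turning
// BibTeX fields into plain "name: value" lines. The FTP upload is left out.
class StorePubSketch {

    static final String DEFAULT_LOCATION = "/website/publicationdata/db/";
    // Illustrative subset of fields to keep; not the thesis selection.
    static final Set<String> STORED_FIELDS = Set.of("author", "title", "year", "url");

    static String targetFolder(String location) {
        return (location == null || location.isBlank()) ? DEFAULT_LOCATION : location;
    }

    static String toPlainText(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> f : fields.entrySet()) {
            if (STORED_FIELDS.contains(f.getKey())) {   // drop fields outside the selection
                sb.append(f.getKey()).append(": ").append(f.getValue()).append('\n');
            }
        }
        return sb.toString();
    }
}
```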


4.2 Web service annotation

This section describes the annotation of the different Web services in OWL-S, and explains the
problems encountered and the decisions made.

The annotation of each Web service involves creating the following files: an overall service
description stating where to find the Profile, Process and Grounding; separate files describing
the Profile, Process and Grounding of the Web service; and finally the WSDL grounding of the
Web service.

4.2.1 GetFile

The annotation of GetFile involved creating a Service file describing where to find the
service's Profile, Process and Grounding.
The Profile describes the service itself, its creator, the contact information of the creator, and the
input, output and preconditions of the service. For GetFile the input is a URL pointing to a
BibTeX file. The output is a file object containing the contents of the BibTeX file referenced by
the input URL. The preconditions for GetFile are (1) that the BibTeX file exists (and therefore
that the URL is valid) and (2) that the file contains ASCII input only.
The Process file describes the types of input, output and preconditions and connects them to some
term in the ontology so that it becomes possible to reason about these properties and place them
in a context. The Process description is an atomic process and therefore does not have any
sub-processes.
The Grounding of the GetFile service is mainly the bridge to the WSDL descriptions of this Web
service. It connects the input, output and conditions to WSDL descriptions.
The WSDL Grounding describes how to access the service, how to deliver the needed input, what
output to expect and what ports, locations, bindings and other connection types to use.
The implementation of the annotation of this Web service can be found in Appendix D.

27
     BibTeX format website: http://www.ecst.csuchico.edu/~jacobsd/bib/formats/bibtex.html

4.2.2 GathAndStoreBib

The annotation of GathAndStoreBib involved creating a Service file describing where to find the
service's Profile, Process and Grounding.
The Profile describes the service itself, its creator, the contact information of the creator, and the
input, output and preconditions of the service. For GathAndStoreBib the input is a URL pointing
to a website or portal. The output is a file object containing links to the BibTeX reference files
that were found on the provided website or portal. The preconditions for GathAndStoreBib are
(1) that the website or portal exists (and therefore that the URL is valid) and (2) that the website
contains an index.html file defining the home page of the website or portal.
The Process file describes the types of input, output and preconditions and connects them to some
term in the ontology so that it becomes possible to reason about these properties and place them
in a context. The Process description is an atomic process and therefore does not have any other
sub-processes.
The Grounding of the GathAndStoreBib service is mainly the bridge to the WSDL descriptions
of this Web service. It connects the input, output and conditions to WSDL descriptions.
The WSDL Grounding describes how to access the service, how to deliver the needed input, what
output to expect and what ports, locations, bindings and other connection types to use.
The implementation of the annotation of this Web service can be found in Appendix D.

4.2.3 BibInfoGath

The annotation of BibInfoGath involved creating a Service file describing where to find the
service's Profile, Process and Grounding.
The Profile describes the service itself, its creator, the contact information of the creator, and the
input, output and preconditions of the service. BibInfoGath has several inputs: the first input is
a location pointing to a BibTeX file; the second input is a URL to a file containing links to BibTeX
reference files; the third input is the location of a file containing links to BibTeX references; the
last input is a string containing the name of the output file.
The output is a file object containing the contents of a BibTeX file that consists of the
concatenation of the first input and the BibTeX references found via the second or third input.
The preconditions for BibInfoGath are (1) that the locations and (2) the URLs of the different
files exist (and therefore that the locations and URLs are valid) and (3) that the files contain ASCII
input only.
The Process file describes the types of input, output and preconditions and connects them to some
term in the ontology so that it becomes possible to reason about these properties and place them
in a context. The Process description is an atomic process and therefore does not have any other
sub-processes.
The Grounding of the BibInfoGath service is mainly the bridge to the WSDL descriptions of this
Web service. It connects the input, output and conditions to WSDL descriptions.
The WSDL Grounding describes how to access the service, how to deliver the needed input, what
output to expect and what ports, locations, bindings and other connection types to use.
The implementation of the annotation of this Web service can be found in Appendix D.

4.2.4 ReplaceSIATag

The annotation of ReplaceSIATag involved creating a Service file describing where to find the
service's Profile, Process and Grounding.
The Profile describes the service itself, its creator, the contact information of the creator, and the
input, output and preconditions of the service. ReplaceSIATag has two inputs: the first input is a
URL pointing to a SIA file; the second input is an RDF object. The output is a file object
containing RDF information with filtered names of individuals: each individual now has only one
spelling of his or her name. The preconditions for ReplaceSIATag are (1) that the SIA file exists
(and therefore that the URL is valid) and (2) that the file contains ASCII input only.
The Process file describes the types of input, output and preconditions and connects them to
terms in the ontology so that it becomes possible to reason about these properties and place them
in a context. The Process description is an atomic process and therefore does not have any other
sub-processes.
The Grounding of the ReplaceSIATag service is mainly the bridge to the WSDL descriptions of
this Web service. It connects the input, output and conditions to WSDL descriptions.
The WSDL Grounding describes how to access the service, how to deliver the needed input, what
output to expect and what ports, locations, bindings and other connection types to use.
The implementation of the annotation of this Web service can be found in Appendix D.

4.2.5 Rdf2Bib

The annotation of Rdf2Bib involved creating a Service file describing where to find the
service's Profile, Process and Grounding.
The Profile describes the service itself, its creator, the contact information of the creator, and the
input, output and preconditions of the service. For Rdf2Bib the input is a URL pointing to an
RDF file, or an RDF object. The output is a file object containing the BibTeX references extracted
from the RDF descriptions in the provided RDF file or object. The preconditions for Rdf2Bib are
(1) that the RDF file exists (and therefore that the URL is valid) and (2) that the file contains
ASCII input only.
The Process file describes the types of input, output and preconditions and connects them to some
term in the ontology so that it becomes possible to reason about these properties and place them
in a context. The Process description is an atomic process and therefore does not have any other
sub-processes.
The Grounding of the Rdf2Bib service is mainly the bridge to the WSDL descriptions of this
Web service. It connects the input, output and conditions to WSDL descriptions.
The WSDL Grounding describes how to access the service, how to deliver the needed input, what
output to expect and what ports, locations, bindings and other connection types to use.
The implementation of the annotation of this Web service can be found in Appendix D.

4.2.6 StorePub

The annotation of StorePub involved creating a Service file describing where to find the
service's Profile, Process and Grounding.
The Profile describes the service itself, its creator, the contact information of the creator, and the
input, output and preconditions of the service. StorePub has six inputs. The first input is a URL
pointing to a BibTeX file. The second input is the location of a Zope portal. The third input is the
FTP port of the Zope portal. The fourth input is the username used to log in to the Zope portal.
The fifth input is the password belonging to that username. The last input is the location on the
Zope portal where the publication(s) should be stored.
The output is a URL pointing to the first added reference, which should now be accessible on the
portal. The preconditions for StorePub are (1) that the BibTeX file exists (and therefore that the
URL is valid) and (2) that the file contains ASCII input only.
The Process file describes the types of input, output and preconditions and connects them to some
term in the ontology so that it becomes possible to reason about these properties and place them
in a context. The Process description is an atomic process and therefore does not have any other
sub-processes.
The Grounding of the StorePub service is mainly the bridge to the WSDL descriptions of this
Web service. It connects the input, output and conditions to WSDL descriptions.
The WSDL Grounding describes how to access the service, how to deliver the needed input, what
output to expect and what ports, locations, bindings and other connection types to use.
The implementation of the annotation of this Web service can be found in Appendix D.

4.2.7 Domain ontology

The ontology created for the automatic configuration of the Web services defined above is based
on the ontology defined in [Richards, Splunter et al, 2003]. The concepts added to this ontology
are "File Object", "BibFile" and "String".
The implementation of the ontology can be found in Appendix D.

4.2.8 Template

The template that has been implemented is a very specific implementation of the scenario in
section 3.2.2. It has open slots that can be filled precisely by the implemented Web services: each
slot is restricted to the exact input/output requirements that can be fulfilled by the Web services
implemented in section 4.1. The implementation of the template can be found in Appendix D.
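Exact slot matching of this kind can be sketched in Java by reducing each service Profile to sets of input and output concepts. The class SlotMatcher and the concept names in the usage note are hypothetical; actual matching would operate on the OWL-S Profiles and the domain ontology rather than on plain strings.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of exact matching between an open slot of the template
// and service profiles: a service fits a slot only if its declared input and
// output concepts are exactly the ones the slot requires.
class SlotMatcher {

    // Inputs and outputs a service declares in its Profile, reduced here to
    // sets of concept names.
    static final class Profile {
        final String service;
        final Set<String> inputs;
        final Set<String> outputs;

        Profile(String service, Set<String> inputs, Set<String> outputs) {
            this.service = service;
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    // Names of all services whose inputs and outputs equal the slot's exactly.
    static List<String> candidates(Set<String> slotInputs, Set<String> slotOutputs,
                                   List<Profile> registry) {
        return registry.stream()
                .filter(p -> p.inputs.equals(slotInputs) && p.outputs.equals(slotOutputs))
                .map(p -> p.service)
                .collect(Collectors.toList());
    }
}
```

With hypothetical concepts, a slot requiring BibTeXURL as input and FileObject as output selects GetFile but not GathAndStoreBib, whose input would be a WebsiteURL. Exact matching is simple but brittle: a slot described with a more general concept such as URL matches neither service, and relaxing this would require reasoning over the concept hierarchy of the ontology.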

5 Discussion and Conclusions

This section provides an overview of problems encountered. Appendix B describes some
additional considerations about the design process.

The first goal of this thesis was to provide an overall view of the Web service research field. This
goal has been reached by describing what a Web service is, what kind of technology is used for
creating and describing Web services, and which methods exist for configuring Web services.

The second goal was to create a scenario for the IIDS portal in which a new publication reference
is added to the portal, dealing with all problems that come with that scenario, for example
individuals with different spellings of their name. A scenario and template that solve this
problem are presented in section 3.2.2.

Another goal of this project was to implement actual Web services that fulfil the tasks set out in
the template and the scenario. As indicated in section 5 and Appendix C, together with the
implementation files, this goal has also been reached.
In addition to the service descriptions, a template and an ontology have been implemented to make
automatic configuration possible. To check the validity of the annotations of the implemented
Web services, they have been checked with an OWL parser28.

The last goal of this project, however, was not reached: the Web services have not yet been put
to the test by using the Agent Factory to configure them automatically.

Has this thesis been successful in describing and creating automatic Web service configuration?
This thesis has provided the insight needed to create and describe Web services. Together with a
template and an ontology, this opens a path towards automatic Web service configuration. The
configuration itself has not been tested, but the Web services, template and ontology are complete,
and their annotations are syntactically correct as far as the available RDF/OWL validator can
check (see Appendix B).

The real challenge in Web service configuration lies in describing Web services and in the
configuration methods used. Designing and implementing Web services requires some programming
skill, but these are not the areas to focus on in the future, because knowledge of programming
and of creating Web interfaces is already widely available. The processes of annotating and
configuring Web services, however, can be a challenge. Annotating Web services requires
knowledge of OWL-S and of ontology modelling. Although OWL-S offers a standardised structure
to follow, understanding how to apply the OWL-S ontology remains time-consuming. This could
be simplified by tools that support applying the predefined structure to a Web service.
Annotations of different Web services do not differ much from each other, so a tool that offers a
standard structure that can be applied to an arbitrary Web service would be a welcome development.
The Profile, Model and Grounding descriptions, in OWL-S as well as in WSDL, have a similar
structure for each Web service. Modelling the ontology is a different story. For this project it was
possible to reuse the ontology created in [Richards, Splunter et al, 2003], but designing Web
services for a different domain would require a new ontology design and much more work.
Existing ontologies could be used, but their context and concepts need to match perfectly.
Choosing the right configuration method before designing a Web service is of the greatest
importance. Each configuration method, for example agents, templates or mediators (see section
2.1.1), requires a different approach and therefore additional work. The developer must know in
advance which method will be used in order to deliver working and matching Web services.

28
     http://www.w3.org/RDF/Validator/




6 References


[Ankolekar et al, 2001] A. Ankolekar, M. Burstein, J. Hobbs, O. Lassila, D. Martin, S. McIlraith,
S. Narayanan, M. Paolucci, T. Payne, K. Sycara, H. Zeng: 2001, "DAML-S: Semantic Markup
for Web Services" In Proceedings of the International Semantic Web Working Symposium
(SWWS), July 30-August 1, 2001.

[Bechhofer et al, 2001] Bechhofer, S. and Goble, C.: “Toward Annotation Using DAML+OIL”,
1st International Conference on Knowledge Capture (K-CAP’2001), Workshop on Semantic
Markup and Annotation, Victoria, BC, Canada, Oct. 2001

[Brazier, Wijngaards, 2001] Brazier, F.M.T. and Wijngaards, N.J.E.: “Automated Servicing of
Agents” In: AISB Journal, Vol. 1, Number 1, pp. 5-20 (Special Issue on Agent Technology), 2001

[Broekstra et al, 2002] Broekstra, J., Kampman, A., van Harmelen, F.: “Sesame: A Generic
Architecture for Storing and Querying RDF and RDF Schema”, Proceedings of the First
International Semantic Web Conference, Lecture Notes in Computer Science, pages 54-68, July
2002

[Cheng et al, 2002] Cheng, Z., Singh, M.P. and Vouk, M.A.: “Composition Constraints for
Semantic Web Services” WWW2002 Workshop on Real World RDF and Semantic Web
Applications, May 7, 2002

[Kawamura et al, 1999] T. Kawamura, Y. Tahara, T. Hasegawa, A. Ohsuga and S. Honiden:
“Bee-gent: Bonding and Encapsulation Enhancement Agent Framework for Development of
Distributed Systems", Journal of the IEICEJ, D-I, Vol. J82-D-I, No.9, 1999.

[Martin et al, 2004] D. Martin, M. Paolucci, S. McIlraith, M. Burstein, D. McDermott, D.
McGuinness, B. Parsia, T. Payne, M. Sabou, M. Solanki, N. Srinivasan, K. Sycara: "Bringing
Semantics to Web Services: The OWL-S Approach", Proceedings of the First International
Workshop on Semantic Web Services and Web Process Composition (SWSWPC 2004), July 6-9,
2004, San Diego, California, USA.

[Paolucci et al, 2002] Massimo Paolucci, Takahiro Kawamura, Terry R. Payne, Katia Sycara:
"Semantic Matching of Web Services Capabilities." In Proceedings of the 1st International
Semantic Web Conference (ISWC), 2002.

[Paulo et al 2002] Paulo F. Pires, Mário R. F. Benevides, and Marta Mattoso: “Building Reliable
Web Services Compositions”, In Proceedings of the NET.Object Days Conference (WS-RDS'02),
pages 551-562, Erfurt, Germany, October 2002

[Peltz 2003] Chris Peltz: “Web services orchestration: a review of emerging technologies, tools,
and standards”, 2003, Hewlett Packard, Co.
http://devresource.hp.com/drc/technical_articles/wsOrchestration.pdf


[Pires et al 2002] Paulo F. Pires and Mario Benevides and Marta Mattoso: “Building Reliable
Web Services Compositions” in Net.Object Days - WS-RSD'02, pages 551-562, 2002

[Richards, Splunter et al, 2003] D. Richards, S. van Splunter, F.M.T. Brazier, M. Sabou:
“Composing Web services using an Agent Factory” In: Proceedings of AAMAS Workshop on
Web Services and Agent-Based Engineering (WSABE), Melbourne, Australia, pp. 57-66, 2003

[Richards, Sabou et al 2003] D. Richards, M. Sabou, S. van Splunter, F.M.T. Brazier: “Artificial
Intelligence: a Promised Land for Web Services”, In: The Proceedings of The 8th Australian and
New Zealand Intelligent Information Systems Conference (ANZIIS2003), Macquarie University,
Sydney, Australia, pp. 205-210, 2003

[Sabou, Richards and Splunter, 2003] M. Sabou, D. Richards, S. van Splunter: “An experience
report on using DAML-S” In: Proceedings of WWW 2003 Workshop on E-Services and the
Semantic Web (ESSW'03), Budapest, Hungary, 2003

[Sirin et al, 2003] Evren Sirin, James Hendler, Bijan Parsia: “Semi-automatic Composition of
Web Services using Semantic Descriptions”, In "Web Services: Modeling, Architecture and
Infrastructure" workshop in conjunction with ICEIS2003, 2003.
http://www.mindswap.org/~evren/composer/

[Splunter, Brazier et al, 2003] S. van Splunter, M. Sabou, F.M.T. Brazier, D. Richards:
“Configuring Web Services, using Structuring and Techniques from Agent Configuration”
In: Proceedings of the 2003 IEEE/WIC International Conference on Web Intelligence (WI 2003),
pp. 153-160, 2003

[Splunter, Wijngaards, Brazier, 2003] S. van Splunter, N.J.E. Wijngaards, F.M.T. Brazier:
“Structuring Agents for Adaptation” In: Alonso, E., Kudenko, D. and Kazakov, D. (editors),
Adaptive Agents and Multi-Agent Systems, Lecture Notes in Artificial Intelligence (LNAI), Vol.
2636, pp. 174-186, 2003

[The OWL Services Coalition, 2003] The OWL Services Coalition: “OWL-S: Semantic Markup
for Web Services”, 11-2003 (http://www.daml.org/services/owl-s/1.0/owl-s.html)

[The Stencil Group, 2001] The Stencil Group: “The Stencil Scope, An Analysis Memo from
The Stencil Group”, 2001, http://www.stencilgroup.com

[Wohes et al 2002] P. Wohed, W.M.P. van der Aalst, M. Dumas, and A.H.M. ter Hofstede:
“Pattern-Based Analysis of BPEL4WS”, QUT Technical Report FIT-TR-2002-04, Queensland
University of Technology, Brisbane, 2002




Appendix A – Techniques and definitions

This appendix describes the tools and techniques used in this project.

One of the tools that have been used is ZOPE29.
ZOPE makes it possible to create an environment in which portals, and the scripts and pages
within these portals, are easy to access. The scripts can create, manipulate and delete data. A
script can call another script, so that a chain of actions can be created.




Figure L: ZOPE environment

The IIDS portal is based on ZOPE. Figure L shows a screenshot of the ZOPE environment in which
new publications and references can be added. The scripts and resulting files of the new scenario
will also be placed in this ZOPE environment. Scripts are needed to add information about
references to the IIDS portal so that it can be displayed.

BibTeX30 is a program and file format designed by Oren Patashnik and Leslie Lamport in 1985
for the LaTeX document preparation system. The format is entirely character based, so it can be
used by any program (although accented characters are conventionally written as TeX commands).
It is field (tag) based, and programs that process BibTeX files ignore unknown fields, so the
format is extensible. It is probably the most common format for bibliographies on the Internet.
BibTeX features a standard set of entry types, such as article and book, that can be used to
describe a publication, and a standard set of fields, such as author, title and year, to define the
properties of that publication.

29
     http://www.zope.org
30
     http://www.ecst.csuchico.edu/~jacobsd/bib/formats/bibtex.html
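To illustrate the field-based structure of BibTeX described above, the following sketch parses a single entry into its entry type, citation key and a map of fields. It is a simplified, hypothetical parser (the class name `BibEntry` is an assumption and not part of the implemented services): it handles braced, quoted and bare values, but not `@string` macros or `#` concatenation. As with other BibTeX tools, a consumer can simply ignore the fields it does not know.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Minimal, hypothetical model of one BibTeX entry: type, citation key and fields. */
final class BibEntry {
    final String type;                                        // e.g. "article"
    final String key;                                         // citation key
    final Map<String, String> fields = new LinkedHashMap<>(); // field name -> value

    private BibEntry(String type, String key) {
        this.type = type;
        this.key = key;
    }

    /**
     * Parses one well-formed entry such as "@ARTICLE{key, title = {A {B} C}, year = 2003}".
     * Values may be braced (inner braces kept), double-quoted or bare.
     * Not supported: @string macros, '#' concatenation, escaped quotes.
     */
    static BibEntry parse(String text) {
        int at = text.indexOf('@');
        int open = text.indexOf('{', at);
        int comma = text.indexOf(',', open);
        BibEntry entry = new BibEntry(
                text.substring(at + 1, open).trim().toLowerCase(Locale.ROOT),
                text.substring(open + 1, comma).trim());
        int i = comma + 1;
        int eq;
        while ((eq = text.indexOf('=', i)) >= 0) {
            String name = text.substring(i, eq).trim().toLowerCase(Locale.ROOT);
            int j = eq + 1;
            while (j < text.length() && Character.isWhitespace(text.charAt(j))) j++;
            StringBuilder value = new StringBuilder();
            if (j < text.length() && text.charAt(j) == '{') {        // {braced value}
                int depth = 0;
                for (; j < text.length(); j++) {
                    char c = text.charAt(j);
                    if (c == '{' && ++depth == 1) continue;           // skip the outer brace
                    if (c == '}' && --depth == 0) { j++; break; }     // outer brace closed
                    value.append(c);
                }
            } else if (j < text.length() && text.charAt(j) == '"') { // "quoted value"
                for (j++; j < text.length() && text.charAt(j) != '"'; j++) value.append(text.charAt(j));
                j++;                                                  // skip the closing quote
            } else {                                                  // bare value, e.g. 2003
                for (; j < text.length() && ",} \t\r\n".indexOf(text.charAt(j)) < 0; j++) {
                    value.append(text.charAt(j));
                }
            }
            entry.fields.put(name, value.toString().trim());
            int next = text.indexOf(',', j);
            if (next < 0) break;                                      // no further fields
            i = next + 1;
        }
        return entry;
    }
}
```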

For the practical part of this master’s project, Java, JSP and XML were chosen for implementing
the Web services that fill the open slots in the pre-defined templates. During experimentation and
implementation, simple Web server software proved to be the best solution. Sun’s Web services
development kit31 with the Apache Web server was too complex and imposed too many constraints
on this project. Therefore the Blazix32 Web server software was used to test all Web service
implementations. Blazix is a Java application Web server and was easy to set up and use. The
created Web services can, however, also be run on an Apache Web server33, which is open source.
Most Web services have been extended with a user interface for testing the service.




31
     http://java.sun.com/webservices/downloads/webservicespack.html
32
     http://www.blazix.com/
33
     http://www.apache.org/



Appendix B – Design process considerations

This appendix describes a few considerations that address issues encountered during the design
process.

It was assumed that the portal deletes its own content after each use, so that the portal is
stateless. This limits the number of factors that need to be taken into account: using a portal
that already contains references, and therefore has state, would have made this master’s project
too big. If the portal had not been stateless, additional problems would have had to be solved. For
example, when saving a reference on the portal, it would be necessary to check whether that
reference is already available on the portal.
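For a portal that does keep state, the duplicate check mentioned above could look like the sketch below. It was not implemented in this project; the class `ReferenceIndex` and the choice of title plus year as the identifying key are assumptions. Titles are normalised (case, accents, punctuation and BibTeX braces) so that small differences in spelling do not hide a duplicate.

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical duplicate check for a portal that keeps its references (not implemented). */
final class ReferenceIndex {
    private final Set<String> keys = new HashSet<>();

    /** Identifying key: title and year, ignoring case, accents, punctuation and BibTeX braces. */
    static String key(String title, String year) {
        String normalized = Normalizer.normalize(title, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "")                 // drop accent marks
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", " ")            // punctuation, braces, spacing -> one space
                .trim();
        return normalized + "|" + year.trim();
    }

    /** Records the reference and returns true if it is new; returns false for a duplicate. */
    boolean addIfAbsent(String title, String year) {
        return keys.add(key(title, year));
    }
}
```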

When considering the SameIndividualAs service, a decision had to be made: which of the different
names that the service tags as referring to the same individual should be used for a new reference.
Currently the ReplaceSIA service picks the first spelling of the name that it finds, which is a fairly
arbitrary choice. A better implementation would count the occurrences of each spelling, or define a
standard spelling and select it when present. Another option is to look at the references already
present on the portal and select the spelling used there, to keep the portal uniform. This feature has
not been implemented, both for simplicity and because the portal was assumed to be stateless.
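The occurrence-counting alternative could be sketched as follows. The class `SpellingSelector`, the method `mostFrequent` and its inputs (the spellings listed in one sameIndividualAs tag, and all author names read from the references) are hypothetical. Ties are broken by order of appearance, so when no spelling is more frequent than another, the result is the same as the current first-spelling behaviour.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical alternative to picking the first spelling in a sameIndividualAs tag. */
final class SpellingSelector {
    private SpellingSelector() {}

    /**
     * Returns the spelling from {@code spellings} that occurs most often in {@code occurrences}.
     * Ties go to the spelling listed first, so without any occurrences the first spelling
     * is returned, as in the current ReplaceSIA implementation.
     */
    static String mostFrequent(List<String> spellings, List<String> occurrences) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String s : spellings) counts.putIfAbsent(s, 0);
        for (String o : occurrences) counts.computeIfPresent(o, (name, n) -> n + 1);
        String best = spellings.get(0);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > counts.get(best)) best = e.getKey();
        }
        return best;
    }
}
```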

BibTeX has a large number of entry types and field names. There is a more or less standard set of
commonly used names, but it is far from a uniform way of representing references: every file
creator uses his or her own field names or entry types, and new ones are created regularly.
Therefore it is hard to create a good system for converting these files into RDF, or into the plain-
text format used for storing them on a ZOPE portal. The StorePub service (section 4.1.6) only
translates a pre-defined set of field names and entry types; all other names and types are ignored.
While this set is usually enough to describe a publication, it is not a perfect solution for a broader
audience.

The IIDS portal, on which this thesis focuses, runs in a ZOPE environment; the StorePub Web
service has been created to support this environment. This service is, however, not safe to use: it
asks for a username and password to log in to the ZOPE server, but the name and password are
sent to the portal unencrypted. This is insecure and should be addressed before the service is
offered to a wider audience.

Using OWL-S creates a need for more specific domain concepts. OWL-S definitely offers a well-
structured way of describing Web services, but everything stands or falls with the domain
ontology used in the Web service description. Configuration decisions are based on the ontology;
if the ontology is insufficient, the Web services may never be used.
The same holds for the decisions made when defining templates and ontologies. Templates have
open slots that define the inputs, outputs etc. of the Web services that are to fill them, but how
many restrictions should be placed on the input, and how many conditions? If a template is too
generic, it will find so many Web services to fill its slots that the probability that the correct
functionality is selected decreases rapidly. If a template is too specific, however, no service
matching the requirements may be found at all. The same holds for an ontology. If it is too generic,
the context may not be clear and Web services designed for a totally different domain may be
selected. But an ontology that is too specific can also constrain the functionality. For example, the
GetFile service from section 4.1.1 loads a file into a readable object and constrains the type of file:
it may not be binary and it should have the BibTeX extension .bib, etc. An ontology that contains
concepts this specific makes it hard to find a Web service that matches these requirements, while
other services may be available that can do the same task without these limitations.

Because the Agent Factory was not fully ready for automatic Web service configuration at the
time of writing, it was not possible to check whether the semantic descriptions of the Web
services, the template and the ontology were sufficiently detailed to match and automatically
configure the supplied set of Web services. The OWL-S descriptions of the Web services have been
created, but whether they describe the services to a level at which the Agent Factory can actually
use them is impossible to say. It was not yet possible to validate the syntax of the OWL-S
descriptions of the Web services, because no working syntax checker/validator was available that
can deal with OWL-S 1.0; the available syntax checkers focus on OWL and RDF, not yet on
OWL-S. If it becomes possible to check the syntax in the future, this would help to ensure the
validity of the code and therefore increase the chance that the Agent Factory can use the
descriptions.




Appendix C


C.1 GetFile:

/* This class reads a remote file into a file object */

/* This method creates the file object and reads the remote file */
      public String getLink();

/* This method writes the information into the outputFile / object */
      public void setLink(String value);
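The core step of GetFile, fetching a remote file and returning its text, could look like the sketch below. It also applies the file-type constraints mentioned in Appendix B (a `.bib` extension, no binary content). The class name `BibFetcher`, the UTF-8 decoding and the binary test (any zero byte) are assumptions for illustration, not the actual implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical sketch of GetFile's core step: fetch a remote BibTeX file as text. */
final class BibFetcher {
    private BibFetcher() {}

    static String fetch(String link) throws IOException {
        // Constraint from the annotation: only files with the .bib extension are accepted.
        if (!link.toLowerCase(Locale.ROOT).endsWith(".bib")) {
            throw new IllegalArgumentException("not a .bib file: " + link);
        }
        byte[] bytes;
        try (InputStream in = URI.create(link).toURL().openStream()) {
            bytes = in.readAllBytes();
        }
        // Crude binary check (assumption): BibTeX is plain text, so a zero byte means binary.
        for (byte b : bytes) {
            if (b == 0) throw new IOException("binary content in " + link);
        }
        return new String(bytes, StandardCharsets.UTF_8);   // assumes UTF-8 encoded files
    }
}
```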




C.2 GathAndStoreBib:

/* This class searches a website for links to BibTeX sources. The links of all
   BibTeX sources that are found are stored in a file.
*/

//adds links to BibTeX files to a file
      public void addToList(String url, String hostName);

// Purpose - checks to see if a page has already been
// visited by the search thread
      boolean checkAlreadyFound (String page);

//   Purpose - adds a page visited by the search thread to
//   the list of visited pages
//   This prevents the same link from being followed if it
//   is on multiple pages.
        public void incrementPages (String page);

// Purpose - returns the number of pages that the search
// thread has visited
      public int getTotalPages ();

// checks to see if the output file exists and if so it is deleted and a
// new one is created. The website is searched for links to “.bib” extension
// files.

       public String getServer();


//=========================================================================
//                        Class SearchPages
//=========================================================================



// This thread performs the search. The search starts with the index.html or
// index.htm page and then follows all local links
// Note external links are ignored.

//   Search state transitions
//   First find top level pages (from the index page)
//   Search the above pages first
//   Search all other pages
        final byte FIND_TOP_LEVEL_PAGES = 0;
        final byte SEARCH_TOP_LEVEL_PAGES = 1;
        final byte SEARCH_OTHER_PAGES = 2;

       String hostName;        // Host name of site e.g babbage
       HomePageSearch app;     // Parent applet
       String textToFind;      // String to find
       int maxPages;           // Maximum number of pages to visit
       int hitsFound = 0;      // No of occurrences of search string
       static final byte URLCOUNT = 2;
       boolean pageOpened = false;     // Flag to indicate if index page
                               // opened OK
       boolean proxyDetected = false; // Flag to indicate if a proxy server
                               // or firewall has been detected
       int topLevelSearch;             // Search the index page links first
       Vector topLevelPages;           // Page names in the index page
       Vector nextLevelPages;          // Lower level pages
       boolean match;                // True if current link is BibTeX file

// Constructor
      SearchPages(HomePageSearch applet, String hn, String text, int maxSearch);

        public void run();
//   State 1: search the index page, remembering all links on
//   the index page
//   Check to see if a proxy is being used. If so then we use
//   IP address rather than hostnames


//   Function: detectProxyServer
//   Purpose: attempt to see if a proxy server or firewall is blocking
//   a connection back to the originating server. If so then the
//   variable proxyDetected is set to true and all future connections
//   to the server will use the IP Address (if passed as a parameter)

       final boolean detectProxyServer ();


// Purpose - read all lines on a page - extracting local links
// and checking for the presence of the search string
       final void searchPage(DataInputStream dis, BufferedReader br,
                        String url);

//   Purpose - scan a line of text looking for links to other
//   pages. The following types of links are currently supported
//   1. Normal links, e.g <A HREF="page.html">Text</A>
//   2. Frames, e.g <FRAME scrolling=yes SRC="contents.html">


      final String parseForLink (String upperCaseInput, String input);


// Purpose - scan a line of text to see if the search string is
// present. If so then add the line to the list of matches.
      final void checkMatch (String url);


// Purpose - remove HTML tags from a line (e.g <BR>). The
// algorithm is a bit simplistic in that it cannot handle
// HTML tags split over more than one line.
      final String removeHTMLTags (String inputLine);


// Purpose - checks validity of a link. If the link is valid
// it's added to the list of visited links and then followed
      final void checkLink (String link);
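The link extraction performed by `parseForLink` and the test for BibTeX links could be combined as in the following regular-expression sketch. It recognises the two link types listed above (`HREF` in anchors and `SRC` in frames) with quoted or bare attribute values, and keeps only links to `.bib` files. The class `BibLinkScanner` is hypothetical, and unlike `SearchPages` it does not filter out external links.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper combining link extraction and the ".bib" test for one line of HTML. */
final class BibLinkScanner {
    private BibLinkScanner() {}

    // HREF (anchors) or SRC (frames), case-insensitive, with a "double", 'single' or bare value.
    private static final Pattern LINK = Pattern.compile(
            "(?i)\\b(?:href|src)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))");

    /** Returns the links on the line that point to files with the .bib extension. */
    static List<String> bibLinks(String htmlLine) {
        List<String> result = new ArrayList<>();
        Matcher m = LINK.matcher(htmlLine);
        while (m.find()) {
            String link = m.group(1) != null ? m.group(1)
                        : m.group(2) != null ? m.group(2)
                        : m.group(3);
            String path = link.split("[?#]", 2)[0];          // ignore query string and fragment
            if (path.toLowerCase(Locale.ROOT).endsWith(".bib")) {
                result.add(link);
            }
        }
        return result;
    }
}
```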




C.3 BibInfoGath:

/* This class combines the new publication
   information found in the NewBibTeX file with
   all bibtex information provided by links in a
   new file which is provided by the user / other service.
*/

/* The following variables are used for recognising BibTeX references in a
   file. The entry types described here are the only entry types for a
   reference that are accepted by this service. Obviously this can be extended
   to support more entry types.
*/
      String book = "@BOOK";
      String article = "@ARTICLE";
      String techReport = "@TECHREPORT";
      String inCollection = "@INCOLLECTION";

/* This method checks whether the output file already exists. If not, it is
   created. After that, the new BibTeX source is read, followed by the sources
   from the second file.
*/
      public String getOutputFile();

/* This method reads the provided BibTeX file and adds the references in that
   file to the new output file
*/
      public void addNewBib();
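The merge that BibInfoGath performs could be sketched as follows: the new BibTeX source is read first, then the gathered sources, and only entries of the four accepted types are kept. The class `BibMerger`, the use of in-memory strings instead of files, and the case-insensitive type test are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of BibInfoGath's merge step on in-memory BibTeX text. */
final class BibMerger {
    private BibMerger() {}

    // The only entry types BibInfoGath accepts (see the variables above).
    static final Set<String> ACCEPTED = Set.of("@BOOK", "@ARTICLE", "@TECHREPORT", "@INCOLLECTION");

    /** Returns the accepted entries of the new source, followed by those of the gathered sources. */
    static List<String> merge(String newBib, List<String> gathered) {
        List<String> sources = new ArrayList<>();
        sources.add(newBib);
        sources.addAll(gathered);
        List<String> entries = new ArrayList<>();
        for (String source : sources) {
            // Split before every '@'; assumes '@' only occurs at the start of an entry.
            for (String chunk : source.split("(?=@)")) {
                int brace = chunk.indexOf('{');
                if (brace < 0) continue;                      // text outside any entry
                String type = chunk.substring(0, brace).trim().toUpperCase(Locale.ROOT);
                if (ACCEPTED.contains(type)) entries.add(chunk.trim());
            }
        }
        return entries;
    }
}
```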




C.4 ReplaceSIATag:
 public static final int NUMBER_OF_AUTHOR_NAMES = 10; // each author can have at most 10 different names
 public static final String DAML_STRING = "           <daml:sameIndividualAs rdf:resource = \"&ns_0;"; // DAML-S SIA tag
 String rdf;    // This string contains the rdf information from input 1
 File siaFile; // This file contains the sia information from input file 2
 BufferedReader sia;
 boolean allSIAHandled;
 String[] authorNames;
 int numberOfNames;

/* This method reads the rdf file containing SIA tags from the provided url */
        public void setSiaFile(String value);

/* This method initiates the two streams.
 * An array is created containing the different spellings of author names.
 * After this the RDF file is parsed and all author names that are found that are equal to
 * a spelling in the array are replaced by the author name spelling in the first location
 * in the array.
 * This procedure is repeated until all tags have been handled.
 */
        public String getRdf();

/* This method selects a SIA tag and saves all author names in this tag into an array.*/
        public void selectTag();

/* This method searches through the rdf input for a line containing an author name
 * from the author array. If such an author is found in a line, this name is replaced
 * with the first name in the array.
 */
        public void replaceNames();

/* This method replaces a word in a string with another word, in this case these words are author
 * names.
 */
        public void replaceName(String authorNameToReplace, String authorNameToWrite);
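Replacing every spelling by the first one in a single pass avoids a subtle problem of repeated string replacement: if one spelling is contained in another (for example `Brazier` in `F.M.T. Brazier`), replacing the variants one after another can rewrite text that was just inserted. The sketch below, with the hypothetical class `SiaReplacer`, builds one regular expression from all spellings, longest first, and replaces every match by the first spelling. It matches plain substrings, not whole words.

```java
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical single-pass version of ReplaceSIATag's name replacement. */
final class SiaReplacer {
    private SiaReplacer() {}

    /** Replaces every spelling in the list by the first one (the canonical spelling). */
    static String replaceNames(String rdf, List<String> spellings) {
        String canonical = spellings.get(0);
        // One alternation of all spellings, longest first, so that at a given position the
        // longest spelling wins; the canonical spelling is included so that it is kept intact.
        String alternation = spellings.stream()
                .sorted(Comparator.comparingInt(String::length).reversed())
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Matcher m = Pattern.compile(alternation).matcher(rdf);
        // The replacement text is not re-scanned, so inserted names are never replaced again.
        return m.replaceAll(Matcher.quoteReplacement(canonical));
    }
}
```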


C.5 Rdf2Bib:

FileWriter out; // To write to the temp file, which will contain the BibTeX formatted information
BufferedReader rdf, rdf2, rdf3;
Persons persons; //Object to collect and later reuse author information.
Organizations organizations; //Object to collect and later reuse publisher information.
BufferedWriter bw;

       // These are the Rdf tags that can be dealt with by this program.
       String[] rdfEntry = {"Article", "Book", "Booklet", "Conference",
                          "InBook", "InCollection", "InProceedings", "Manual",
                          "Mastersthesis", "Misc", "PhDThesis", "Proceedings",
                          "TechReport", "Unpublished"};

/* These are the bibtex reference tags that correspond to the above rdf tags, and that
 * are most commonly used in the bibtex format
 */
String[] bibtexEntry = {"@ARTICLE", "@BOOK", "@BOOKLET", "@CONFERENCE",
                          "@INBOOK", "@INCOLLECTION", "@INPROCEEDINGS",
                          "@MANUAL", "@MASTERSTHESIS", "@MISC",
                          "@PHDTHESIS", "@PROCEEDINGS", "@TECHREPORT",
                          "@UNPUBLISHED"};

/* Together with the below bibtexFields, these form the allowed fields for a rdf file
 * that is presented to this program
 */
String[] rdfFields = {"address", "annote", "author", "booktitle", "chapter",
         "crossref", "edition", "editor", "howpublished", "institution", "journal", "key",
         "month", "note", "number", "organization", "pages", "publisher", "school",
         "series", "title", "type", "volume", "year", "affiliation", "abstract", "contents",
         "copyright", "ISBN", "ISSN", "keywords", "language", "location", "LCCN",
         "mrnumber", "price", "size", "url", "editors", "fullauthor", "fulleditor",
         "full_author", "issue"};

String[] bibtexFields = {"address = ", "annote = ", "author = ", "booktitle = ", "chapter = ",
        "crossref = ", "edition = ", "editor = ", "howpublished = ", "institution = ",
        "journal = ", "key = ", "month = ", "note = ", "number = ", "organization = ",
        "pages = ", "publisher = ", "school = ", "series = ", "title = ", "type = ", "volume = ",
        "year = ", "affiliation = ", "abstract = ", "contents = ", "copyright = ", "ISBN = ",
        "ISSN = ", "keywords = ", "language = ", "location = ", "LCCN = ", "mrnumber = ",
        "price = ", "size = ", "url = ", "editors = ", "fullauthor = ", "fulleditor = ",
        "full_author = ", "issue = "};

/* This method reads the remote rdf file, and stores two copies of it
 * One for parsing and gathering author and publisher information
 * and one for reading and translating to bibtex. */
public void setRdf(String value);

/* First the RDF file needs to be parsed to match author names with persons.
 * Then each reference can be translated and written to BibTeX format.
 */
public String getRdf();

/* This method creates a Persons and an Organizations object that match
 * persons with author names and organizations with publishers.*/
public void parseRdfForAuthorsAndOrganizations();



       /* This method works through the entire rdf file.
        */
       public void translateAndWrite();

/* This method reads references one by one, which are recognised by ow:Article, ow:Book etc.
 * Each reference is translated to BibTeX and then written to a new file/object.
 * - First the entry type is translated and written.
 * - Second the rest of the entry/reference is translated and written.
 * There are three possible types of input.
 * - The field that is read contains author information, in which case writeAuthor() is called
 * - The field that is read contains publisher information, in which case writePublisher() is called
 * - The field that is read contains a field other than author or publisher, in which case
 * writeBibtexField() is called.
 */
         public void addReference(String s, BufferedReader br, int index);

/* This method finds the name that belongs to the current author(s) and translates and writes
 * this into bibtex format.
 * The relevant information is selected from between the irrelevant rdf tags.
 * In case of more than one author, all authors are read and translated.
 * Because there can be multiple authors, the next line in the file is always read;
 * if it doesn't contain author information the line is forwarded to writeBibtexField()
 */
          public BufferedReader writeAuthor(String field, int index, BufferedReader br);

/* This method finds the name that belongs to the current publisher(s) and translates and writes
 * this into bibtex format.
 * The relevant information is selected from between the irrelevant rdf tags.
 * In case of more than one publisher, all publishers are read and translated.
 * Because there can be multiple publishers, the next line in the file is always read;
 * if it doesn't contain publisher information the line is forwarded to writeBibtexField()
 */
          public BufferedReader writePublisher(String field, int index, BufferedReader br);

/* This method translates and writes a field into bibtex format.
 * The relevant information is selected from between the irrelevant rdf tags.
 * If the information is spread out over more than one line, for example when a reference
 * has a long title or an abstract is present, the other lines are also read (see the if/while part).
 * At the beginning a few checks are made. For example, if a closing reference tag is forwarded
 * to this method by accident, the method returns.
 * Url is also a special case, since it has no closing tags in rdf.
 */
          public BufferedReader writeBibtexField(String field, BufferedReader br);
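The two translation steps described above could be sketched as follows. Because every BibTeX entry type in `bibtexEntry` is `@` followed by the upper-cased RDF class name, the sketch derives the entry type instead of keeping a second array. Field elements are assumed to look like `<ow:title>...</ow:title>` on a single line; multi-line values, `url`, authors, publishers and the check against `rdfFields` are left out. The class `RdfToBibtex` is hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of Rdf2Bib's entry-type and single-line field translation. */
final class RdfToBibtex {
    private RdfToBibtex() {}

    // The RDF classes handled by Rdf2Bib (the rdfEntry array above).
    static final List<String> RDF_ENTRY = List.of("Article", "Book", "Booklet", "Conference",
            "InBook", "InCollection", "InProceedings", "Manual", "Mastersthesis", "Misc",
            "PhDThesis", "Proceedings", "TechReport", "Unpublished");

    // A field element on one line, e.g. <ow:title>Some title</ow:title> (assumed format).
    private static final Pattern FIELD = Pattern.compile("<ow:(\\w+)>(.*)</ow:\\1>");

    /** Maps an RDF class to its BibTeX entry type, e.g. "TechReport" -> "@TECHREPORT". */
    static Optional<String> entryType(String rdfClass) {
        return RDF_ENTRY.contains(rdfClass)
                ? Optional.of("@" + rdfClass.toUpperCase(Locale.ROOT))
                : Optional.empty();
    }

    /** Translates a single-line field element to a BibTeX field, e.g. "title = {Some title},". */
    static Optional<String> field(String rdfLine) {
        Matcher m = FIELD.matcher(rdfLine.trim());
        if (!m.matches()) return Optional.empty();   // multi-line values etc. need other handling
        return Optional.of(m.group(1) + " = {" + m.group(2).trim() + "},");
    }
}
```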




C.6 StorePub:
/* This class stores a BibTeX file into flat text compatible with the IIDS
portal / ZOPE */

/* This array contains BibTeX formatted fields */
      String[] bibtexFormat = {"title", "author", "year", "pages",
                            "note", "abstract", "publisher", "editor",
                            "series", "location", "place", "booktitle",
                            "journal", "volume", "number", "address",
                            "file", "isn", "school", "subjects"};

/* This array contains ZOPE formatted fields */
      String[] portalFormat = {"Title:", "Author:", "Year:", "Pages:",
                            "Note:", "Abstract:", "Publisher:", "Editor:",
                            "Series:", "Location:", "Place:", "Booktitle:",
                            "Journal:", "Volume:", "Number:", "Address:",
                            "File:", "ISN:", "School:", "Subjects:"};


/* This array contains BibTeX formatted field entries */
      String[] bibtexEntry = {"@article", "@book", "@booklet", "@conference",
                            "@inbook", "@incollection", "@inproceedings",
                            "@manual", "@mastersthesis", "@misc",
                            "@phdthesis", "@proceedings", "@techreport",
                            "@unpublished"};


/* This method loads the provided bibtex file, through a URL, into a buffer
*/
      public void setBibtex(String value);


/* This method initiates the process, and will later return an object
 * containing url(s) to the created files
 */
      public String getBibtex();


/* This method writes the BibTeX info into the file in plain text format.
 * For every entry from the above defined bibtexEntry list, a file with a
 * reference is created.
 */
       public StringBuffer writeBibInfo();

/* This method writes a reference to a buffer it searches for compatible
 * bibtex format and translates it to plain (ZOPE) text
 */
      public StringBuffer translateAndWriteReference();


/* This method deletes characters that are no longer needed */
      public StringBuffer cleanString(StringBuffer sb);
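The translation from BibTeX fields to the portal format could be sketched as below, reusing the two field arrays above: a field whose name occurs in the BibTeX array is written with the label at the same index in the portal array, and any other field is ignored, as Appendix B describes. The class `PortalFormatter`, the assumption of one field per line, and the removal of braces (standing in for `cleanString`) are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of StorePub's field translation from BibTeX to ZOPE plain text. */
final class PortalFormatter {
    private PortalFormatter() {}

    // Index-aligned copies of the bibtexFormat and portalFormat arrays above.
    private static final String[] BIBTEX = {"title", "author", "year", "pages", "note",
            "abstract", "publisher", "editor", "series", "location", "place", "booktitle",
            "journal", "volume", "number", "address", "file", "isn", "school", "subjects"};
    private static final String[] PORTAL = {"Title:", "Author:", "Year:", "Pages:", "Note:",
            "Abstract:", "Publisher:", "Editor:", "Series:", "Location:", "Place:", "Booktitle:",
            "Journal:", "Volume:", "Number:", "Address:", "File:", "ISN:", "School:", "Subjects:"};

    // One field per line: name = {value}, name = "value" or name = value, optional comma.
    private static final Pattern FIELD =
            Pattern.compile("\\s*(\\w+)\\s*=\\s*[{\"]?(.*?)[}\"]?\\s*,?\\s*");

    /** Translates the fields of one entry; fields not in BIBTEX are ignored. */
    static String translate(String entry) {
        StringBuilder out = new StringBuilder();
        for (String line : entry.split("\\R")) {
            Matcher m = FIELD.matcher(line);
            if (!m.matches()) continue;                         // entry header, closing brace, etc.
            int i = indexOf(m.group(1).toLowerCase(Locale.ROOT));
            if (i < 0) continue;                                // unknown field: ignored
            String value = m.group(2).replaceAll("[{}]", "");   // like cleanString: drop braces
            out.append(PORTAL[i]).append(' ').append(value).append('\n');
        }
        return out.toString();
    }

    private static int indexOf(String name) {
        for (int i = 0; i < BIBTEX.length; i++) {
            if (BIBTEX[i].equals(name)) return i;
        }
        return -1;
    }
}
```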




Appendix D

This appendix lists the URLs at which the annotations of the different Web services can be found.
Including the full annotations here would make this appendix unreadable, so instead the URLs of
the files, which are publicly available, are given.

D.1 GetFile

Service description: http://www.cs.vu.nl/~jbroekh/services/GetFileService.owl
Profile description: http://www.cs.vu.nl/~jbroekh/services/GetFileProfile.owl
Model description: http://www.cs.vu.nl/~jbroekh/services/GetFileProcess.owl
Grounding descriptions: http://www.cs.vu.nl/~jbroekh/services/GetFileGrounding.owl
                          http://www.cs.vu.nl/~jbroekh/services/GetFileGrounding.wsdl


D.2 GathAndStoreBib

Service description: http://www.cs.vu.nl/~jbroekh/services/GathAndStoreBibService.owl
Profile description: http://www.cs.vu.nl/~jbroekh/services/GathAndStoreBibProfile.owl
Model description: http://www.cs.vu.nl/~jbroekh/services/GathAndStoreBibProcess.owl
Grounding descriptions: http://www.cs.vu.nl/~jbroekh/services/GathAndStoreBibGrounding.owl
                          http://www.cs.vu.nl/~jbroekh/services/GathAndStoreBibGrounding.wsdl



D.3 BibInfoGath

Service description: http://www.cs.vu.nl/~jbroekh/services/BibInfoGathService.owl
Profile description: http://www.cs.vu.nl/~jbroekh/services/BibInfoGathProfile.owl
Model description: http://www.cs.vu.nl/~jbroekh/services/BibInfoGathProcess.owl
Grounding descriptions: http://www.cs.vu.nl/~jbroekh/services/BibInfoGathGrounding.owl
                          http://www.cs.vu.nl/~jbroekh/services/BibInfoGathGrounding.wsdl



D.4 ReplaceSIATag

Service description: http://www.cs.vu.nl/~jbroekh/services/ReplaceSIATagService.owl
Profile description: http://www.cs.vu.nl/~jbroekh/services/ReplaceSIATagProfile.owl
Model description: http://www.cs.vu.nl/~jbroekh/services/ReplaceSIATagProcess.owl
Grounding descriptions: http://www.cs.vu.nl/~jbroekh/services/ReplaceSIATagGrounding.owl
                          http://www.cs.vu.nl/~jbroekh/services/ReplaceSIATagGrounding.wsdl




D.5 Rdf2Bib

Service description: http://www.cs.vu.nl/~jbroekh/services/Rdf2BibService.owl
Profile description: http://www.cs.vu.nl/~jbroekh/services/Rdf2BibProfile.owl
Model description: http://www.cs.vu.nl/~jbroekh/services/Rdf2BibProcess.owl
Grounding descriptions: http://www.cs.vu.nl/~jbroekh/services/Rdf2BibGrounding.owl
                          http://www.cs.vu.nl/~jbroekh/services/Rdf2BibGrounding.wsdl



D.6 StorePub

Service description: http://www.cs.vu.nl/~jbroekh/services/StorePubService.owl
Profile description: http://www.cs.vu.nl/~jbroekh/services/StorePubProfile.owl
Model description: http://www.cs.vu.nl/~jbroekh/services/StorePubProcess.owl
Grounding descriptions: http://www.cs.vu.nl/~jbroekh/services/StorePubGrounding.owl
                          http://www.cs.vu.nl/~jbroekh/services/StorePubGrounding.wsdl


D.7 Ontology

Ontology description: http://www.cs.vu.nl/~jbroekh/services/Ontology.owl



D.8 Template

Template description: http://www.cs.vu.nl/~jbroekh/services/Display_BibTeXTemplate.owl
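The service entries D.1 to D.6 all follow one naming pattern: a common base directory, the service name, and a fixed suffix per annotation file (Service.owl, Profile.owl, Process.owl, Grounding.owl and Grounding.wsdl). The following hypothetical helper, which is not part of the software described in this thesis, reconstructs the five URLs of a service from its name, for instance when the annotations are to be fetched in bulk.

```java
import java.util.ArrayList;
import java.util.List;

/* Hypothetical helper: rebuilds the annotation URLs listed in D.1 to D.6
 * from the naming pattern they share. */
class AnnotationUrls {

    static final String BASE = "http://www.cs.vu.nl/~jbroekh/services/";

    /* Returns the service, profile, process model and grounding URLs. */
    static List<String> annotationUrls(String service) {
        List<String> urls = new ArrayList<>();
        urls.add(BASE + service + "Service.owl");
        urls.add(BASE + service + "Profile.owl");
        urls.add(BASE + service + "Process.owl");
        urls.add(BASE + service + "Grounding.owl");
        urls.add(BASE + service + "Grounding.wsdl");
        return urls;
    }

    public static void main(String[] args) {
        String[] services = {"GetFile", "GathAndStoreBib", "BibInfoGath",
                             "ReplaceSIATag", "Rdf2Bib", "StorePub"};
        for (String service : services) {
            for (String url : annotationUrls(service)) {
                System.out.println(url);
            }
        }
    }
}
```

The ontology (D.7) and the template (D.8) do not follow this pattern, so they are not covered by the helper.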



