

Adaptive High-End Grid Services
The Grid Administration Toolkit

      Murali Sangubhatla

    Master's Plan B Report

     Dr. Jon B. Weissman
        Faculty Advisor

            June 2003
Table of Contents

1. Introduction

2. Overview of Grid Technologies

      2.1 Introduction to Grid Technologies
      2.2 Computational Grids
      2.3 Open Grid Services Architecture – a brief overview
      2.4 Community Services – a Grid Infrastructure

3. Gene Comparison Grid Service

      3.1 Genomic Computations
      3.2 Implementing the Gene Comparison Grid Service
      3.3 Multiple Client Interfaces
      3.4 Adaptive Gene Comparison Grid Service

4. Adaptive Resource Management Constructs – Expand / Shrink

      4.1 Dynamic Hosting Environments
      4.2 Resource Addition Construct: Expand
      4.3 Resource Removal Construct: Shrink

5. Grid Administration Toolkit

      5.1 Basic Functionalities
      5.2 Architecture and Implementation

6. Conclusions and Future Work

7. References


I would like to thank my advisor, Dr. Jon B. Weissman, for his constant encouragement
and motivation. I would also like to thank Byoung Dai Lee for his suggestions and his
patience in answering my questions, and all the members of the Distributed Computing
Systems Group for their support.

                                                                Murali Sangubhatla


Grid environments are typified by the dynamic sharing of distributed resources. Designing high-end
applications for these complex environments demands expertise in parallel/distributed computing
and knowledge of infrastructure management. High-end grid services hide this inherent complexity
of grids from end-users and allow them to invoke these services online to solve real problems.
These services are, however, themselves sensitive to the dynamism of grid environments and should
be capable of smoothly shifting their computations to new resources. This report describes the
development of the Gene Comparison Grid Service, a real, scientific high-end grid service, and the
mechanisms for transparently adding resources to, or removing resources from, instances of such
services at runtime. In addition, the report presents the Community Services Administration Toolkit,
which makes the tedious tasks of grid monitoring and administration easy and user-friendly.

1. Introduction

Grid Computing is transforming scientific computing and e-business with the promise of coordinated
resource sharing and computational power on demand. Numerous scientific applications demand
significant resources in terms of computation, communication or data storage; such applications can
be called high-end applications. While grid environments address the need for significant resources,
developing applications for these complex environments demands parallel/distributed computing
expertise and knowledge about maintaining the infrastructure. High-end grid services abstract these
details and let end-users invoke the services across the network to solve real problems. Deploying
these high-end grid services can be very challenging. Moreover, grid environments are becoming
increasingly dynamic, typified by the sharing of resources on the fly, and grid services are
themselves sensitive to this dynamism of the resources. Dynamic grid infrastructures provide
mechanisms for transparently shifting several instances of these grid services to different resources
in dynamic hosting environments. This report describes the development of the Gene Comparison Grid
Service, a real, scientific high-end grid service, and the mechanisms for adding resources to, or
removing resources from, an instance of a grid service at runtime.

Another challenging task is the administration of the grid itself. Common tasks such as resource and
service monitoring or remote administration can be mundane and time-consuming, requiring
interaction with every resource to be monitored (for example, typing a password for each one). This
report also describes the Grid Administration Toolkit, which has been developed to make these tasks
easy by providing a GUI and a back-end Script Server with single sign-on to the resources. Though the
functionality of this administration toolkit can easily be extended to generic grid environments, it
has been implemented for the Grid Infrastructure of the Distributed Computing Systems Group and is
hence also referred to as the Community Services Administration Toolkit. The report is organized as
follows: Section 2 gives an overview of Grid Computing by introducing Grid Technologies, the Open Grid
Services Architecture, the necessary grid terminology, and Community Services, the Grid
Infrastructure of the Distributed Computing Systems Group.

Section 3 describes the architecture and implementation of the Gene Comparison Grid Service, the
different architectures that support multiple client interfaces, the publication of a WSDL for the
service following the guidelines of the Open Grid Services Architecture, and the adaptation of the
code to facilitate scheduling so that the service could be added to the existing Test Bed of the
Infrastructure.

Section 4 describes the application-transparent implementation of the runtime resource addition and
removal constructs, Expand and Shrink.

Section 5 describes the Community Services Administration Toolkit, its 2-tier architecture and the
single sign-on feature. Conclusions are presented in section 6 and the references in section 7.

2. Overview of Grid Technologies

2.1 Introduction to Grid Technologies

Until recently, application developers could often assume a target environment that was (to a useful
extent) homogeneous, reliable, secure, and centrally managed. Increasingly, however, computing is
concerned with collaboration, data sharing, and other new modes of interaction that involve
distributed resources. The result is an increased focus on the interconnection of systems both within
and across enterprises, whether in the form of intelligent networks, switching devices, caching
services, appliance servers, storage systems, or storage area network management systems. In
addition, companies are realizing that they can achieve significant cost savings by outsourcing
nonessential elements of their IT environment to various forms of service providers. These
evolutionary pressures generate new requirements for distributed application development and
deployment. Today, applications and middleware are typically developed for a specific platform
(e.g., Windows NT, a flavor of Unix, a mainframe, J2EE, Microsoft .NET) that provides a hosting
environment for running applications. The capabilities provided by such platforms may range from
integrated resource management functions to database integration, clustering services, security,
workload management, and problem determination, with different implementations, semantic
behaviors, and APIs for these functions on different platforms. We require new abstractions and
concepts that allow applications to access and share resources and services across distributed, wide
area networks. Such problems have long been a central concern of the developers of
distributed systems for large-scale scientific research. Work within this community has led to the
development of Grid technologies, which address precisely these problems and which are seeing
widespread and successful adoption for scientific and technical computing.

Grid Technologies support the sharing and coordinated use of diverse resources. The real and
specific problem that underlies the Grid Concept is the coordinated resource sharing and problem
solving in dynamic, multi-institutional virtual organizations. The sharing is not primarily that of file
exchange but rather direct access to computers, software, data and other resources as is required by a
range of collaborative problem solving and resource brokering technologies emerging in industry,
science and engineering. This sharing is, necessarily, highly controlled, with resource providers and
consumers defining clearly and carefully just what is shared, who is allowed to share, and the
conditions under which sharing occurs. A set of such individuals and/or institutions forms a Virtual
Organization (VO).

2.2 Computational Grids

Computational Grids are an enabling technology that permits the transparent coupling of
geographically dispersed resources (machines, networks, data storage, visualization devices, and
scientific instruments) for large-scale distributed applications. Grid Middleware binds such resources
by coupling various software and hardware components in a grid.

                 Figure 2.1: An example of a Computational Grid. Image Courtesy: NASA

A Grid Service is any service in a grid environment that
         1. can be implemented in any language and executes in an environment supporting it,
         2. insulates its clients from native implementation details, and
         3. provides a WSDL and is thus accessible to clients implemented in any language for which
             WSDL bindings are available.

2.3 Open Grid Services Architecture – a brief overview

OGSA can be seen as an extension and a refinement of the emerging Web Services architecture.
The designers of the Web Service Description Language anticipated the need for extensions to
the core language and provided the requisite hooks to make that possible. The extensions introduced
by OGSA include the concept of a "service type", which allows us to describe families of services
defined by collections of ports of specified types. OGSA also provides a mechanism to specify that a
service instance is an instance of a particular service implementation of a specified service type,
and a way to state that this service is compatible with others. These extensions provide a mechanism
to describe service semantic evolution and versioning. A basic WSDL instance document can only state
that a service implements a port with the given interface; it cannot convey any information about
what the service does with that port. The OGSA extensions allow us to name families of services that
have identical semantics and to assert that a particular service implements these semantics. Clients
of the service thus have an indication of what behavior to expect from it.

OGSA specifies three things that a web service must have before it qualifies as a Grid Service.
First, it must be an instance of a service implementation of some service type, as described
above. Second, it must have a Grid Service Handle (GSH), which is a type of Grid URI for the
service instance. The GSH is not a direct link to the service instance; rather, it is bound to a
Grid Service Reference (GSR). The GSR might be (OGSA allows for other representations)
the WSDL document for the service instance with the required "instanceOf" and other OGSA
extensions. The idea is that the handle provides a constant way to locate the current GSR for
the service instance, because the GSR may change if the service instance changes or is

2.4 Community Services – a Grid Infrastructure

The following figure provides an overview of Community Services: grid infrastructure and test bed
used by the Distributed Computing Systems Group.

The standard communication API and reusable components layer forms the essential grid
middleware binding the computational resources provided by the Beowulf cluster and the Computer
Science Workstations interconnected by the CS – LAN. This layer also provides standard reusable
components for deploying a grid service.

   Figure 2.2: Community Services: Grid Infrastructure and Test Bed of the Distributed Computing Systems
   Group. Layers, from top to bottom: Gene Comparison Grid Service Client Interface Components and
   N-Body Simulation Grid Service Client Request Interface; Scheduling Framework; Gene Comparison and
   N-Body Simulation Grid Service Middleware; Communication API & Reusable Components.

The Gene Comparison Grid Service and N-Body Simulation Grid Service Middleware layer, built on top of
the Communication API and Reusable Components layer, consists of the computation hierarchy specific
to each grid service. The components in this layer interact closely with the Scheduling layer.

The topmost layer, hosting the Service components of the Gene Comparison Grid Service and the N-Body
Simulation Grid Service, provides a standard API for client request submission. Extensions that
support multiple client interfaces, such as Web Clients and Web Services Clients (discussed in
Section 3.3), are built on top of this layer.

Computation Hierarchy in Community Services

The Computation hierarchy is essentially another perspective of the infrastructure that determines
the flow of information.

                      Figure 2.3: Computation Hierarchy in the Grid Infrastructure. Levels, from top to
                      bottom: Client Interfaces; Service Manager (SM), with its Performance DB and Data
                      Structures; Request Managers; User Tasks.

Process Manager (PM)

Although not shown in the software architecture of Figure 2.2, the Process Manager (PM) is the primary
interface to a computational resource in the Grid Infrastructure. It acts as a gatekeeper to the
resource and grants access upon authorization.

Service Manager (SM)

This component, present in the top layer, is a primary component of the infrastructure. It provides an
API for clients to submit their requests. It interacts very closely with the Scheduler; the binding
between the Scheduler and the SM is so tight that the boundary between these two layers is blurred.
The Scheduler maintains several data structures, such as the active request queue and the wait queue,
as well as a performance database for high-performance scheduling. The SM periodically monitors the
status of active requests.

Request Manager (RM)

The Request Manager is in charge of one client request and is created by the Service Manager (SM)
on receipt of a client request. It also gets an initial allotment of the resources for its computation.
The Request Manager can also receive resource allotment change notifications and should adapt the
computations accordingly.

User Tasks (UT)

User Tasks are the application-specific code components of a grid service and are created by the
Request Manager (RM), which dictates their life cycles and tasks.
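The division of labour among the SM, the RMs and the User Tasks can be illustrated with a toy Python model. It is only a sketch under assumptions, not the infrastructure's native code: the class names, the `submit` and `change_allotment` methods and the `task@<resource>` placeholders are hypothetical, and real RMs and User Tasks run as separate processes on grid resources.

```python
from dataclasses import dataclass, field

@dataclass
class RequestManager:
    """One RM per client request (hypothetical model)."""
    request_id: int
    resources: list                               # current resource allotment
    tasks: list = field(default_factory=list)     # User Tasks owned by this RM

    def start(self):
        # The RM creates one User Task per allotted resource and controls its life cycle.
        self.tasks = [f"task@{r}" for r in self.resources]

    def change_allotment(self, new_resources):
        # On an allotment-change notification the RM adapts its computation.
        self.resources = list(new_resources)
        self.start()

class ServiceManager:
    """Accepts client requests and creates an RM for each one (hypothetical model)."""

    def __init__(self, resources):
        self.free = list(resources)
        self.active = {}          # request_id -> RequestManager
        self.next_id = 0

    def submit(self, num_resources):
        # Initial allotment: take up to num_resources free resources.
        allot, self.free = self.free[:num_resources], self.free[num_resources:]
        rm = RequestManager(self.next_id, allot)
        rm.start()
        self.active[rm.request_id] = rm
        self.next_id += 1
        return rm

sm = ServiceManager(["n1", "n2", "n3", "n4"])
rm = sm.submit(3)
print(rm.tasks)                      # ['task@n1', 'task@n2', 'task@n3']
rm.change_allotment(["n1", "n2"])
print(rm.tasks)                      # ['task@n1', 'task@n2']
```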

3. Gene Comparison Grid Service

3.1 Genomic Computations

Bioinformatics is evolving rapidly, with extensive research going into gene matching applications.
Several communities maintain repositories of gene sequences and can potentially share these huge
genomic libraries, along with their computational resources, with other organizations. Grid Computing
is indispensable for realizing the power of such data/computation sharing. A gene sequence match
between a source string of length m and a target string of length n takes O(m*n) time (efficient
exact string-matching algorithms such as KMP reduce this complexity to O(m+n)). Despite the use of
efficient string-matching algorithms, these genomic computations are computationally intensive owing
to the huge sizes of the genomic libraries (megabytes to terabytes of data). This creates the need for
parallelization and high-end resources, which motivated us to develop a Gene Comparison Grid Service
and to provide high-performance gene comparison on computational grids by incorporating this service
into the existing Test Bed of the Community Services infrastructure.
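To make the O(m+n) bound concrete, the following is the standard Knuth-Morris-Pratt (KMP) exact-matching algorithm in Python. Note that it only finds exact occurrences of a pattern; it is not the similarity-scoring algorithm used by complib, and the function names are ours.

```python
def prefix_function(pattern):
    """pi[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]                 # fall back instead of re-scanning the text
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi

def kmp_search(pattern, text):
    """Return the start indices of all occurrences of pattern in text in O(m + n) time."""
    if not pattern:
        return list(range(len(text) + 1))
    pi = prefix_function(pattern)
    matches, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = pi[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):             # full match ending at position i
            matches.append(i - k + 1)
            k = pi[k - 1]                 # allow overlapping matches
    return matches

print(kmp_search("GATA", "CGATATAGATAGATA"))   # [1, 7, 11]
```

Each character of the text is consumed once and the fallback steps are amortized against earlier advances, which is where the linear bound comes from.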

3.2 Implementing the Gene Comparison Grid Service

The core of the Gene Comparison Grid Service is the "complib" code, developed at the University of
Virginia. The complib code is essentially a data-parallel application that can be described by the
following pseudo-code:
             num_of_slaves = n; target_library_size = T;
             for each source sequence s_i {
                    start = 0;
                    for each slave S {
                            end = start + T / n;      // the last slave uses end = T
                            send source sequence s_i to S;
                            send target library boundary (start, end) to S;
                            start = end;
                    }
                    collect and assemble the partial results from all slaves;
             }

                             Figure 3.1: Parallel Gene Sequence Comparison

The master breaks up the entire target library among the slaves. For each gene sequence from the
source library, the master sends out the library boundaries and other parameters such as the source
sequence, the target library name and the algorithm to use. The slaves compute the matches and send
back their partial results. The master assembles all the partial results and sends them back to the client.
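The master/slave decomposition described above can be simulated sequentially in Python. This is a sketch under stated assumptions: slaves are plain function calls rather than remote processes, the target library is a list of strings, `score` is a hypothetical stand-in for the real comparison algorithm (it counts equal positions), and the remainder of `T / n` goes to the last slave so that every target entry is compared exactly once.

```python
def partition(T, n):
    """Split target indices [0, T) into n contiguous (start, end) boundaries."""
    size = T // n
    bounds, start = [], 0
    for s in range(n):
        end = T if s == n - 1 else start + size   # last slave absorbs the remainder
        bounds.append((start, end))
        start = end
    return bounds

def score(a, b):
    # Hypothetical similarity score: number of equal positions.
    return sum(x == y for x, y in zip(a, b))

def slave(source_seq, target_library, start, end):
    # A slave compares one source sequence against its slice of the target library.
    return [(j, score(source_seq, target_library[j])) for j in range(start, end)]

def master(source_library, target_library, n):
    results = {}
    for i, s in enumerate(source_library):
        partial = []
        for start, end in partition(len(target_library), n):
            partial.extend(slave(s, target_library, start, end))  # "send" work, collect partials
        results[i] = sorted(partial, key=lambda p: -p[1])         # assemble: best matches first
    return results

targets = ["ACGT", "ACGA", "TTTT", "ACGG", "AGGT"]
print(partition(len(targets), 2))            # [(0, 2), (2, 5)]
print(master(["ACGT"], targets, 2)[0][0])    # (0, 4)
```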

The basic master and slave components, along with the algorithm implementation components, have
been extracted and transformed into the components of the computation hierarchy described in
Section 2.4. For example, the User Tasks code essentially consists of the core of the slaves, and the
Request Managers for the Gene Comparison Grid Service consist of the core of the master. Some
changes were also made to the architecture libraries to make the code platform independent. This
code has been successfully deployed and tested on both Solaris and Linux platforms.


By providing a multi-threaded service interface, the service has been made scalable to client
requests. However, a memory leak in the object libraries for gene sequence matching led to service
crashes. I detected the leak and provided a module that seals it; as a result, the service scales
well with the number of client requests.

3.3 Multiple Client Interfaces

Clients of the Grid Service can be classified into three main categories:

Intra-Grid Client Requests

The clients within the grid can submit requests by using the API provided by the SM. This, however,
restricts the clients to being implemented in the same language as the grid service.

Web Clients and Clients from any Java Platform

By exploiting the platform independence of the Java programming language, clients implemented in
Java on any platform can be supported by the 2-tier architecture described below. The clients can
also be implemented as Java Applets embedded in HTML pages, thus extending the accessibility of the
Service to a client as thin as a Web browser. The design of this architecture was collaborative, with
Zhaoxin Ding, a member of the group, providing client interfaces in Java and Java Applets. I have
made changes to the applet interface to conform to the Java 2 security model.


               Figure 3.2: 2-tier Architecture to support Java/Web Clients from any platform
               (labels: Source Library, Target Library, JNI, User Tasks, Performance History)

A client request essentially consists of the names of the source and target libraries, the number of
sequences to be compared and the algorithm to use. The request is processed by a multi-threaded Java
server (the Proxy Service) that invokes native code through the Java Native Interface (JNI). The
native code marshals the source sequences and submits the request on behalf of the client using the
API provided by the SM. The source library must be accessible to the native code invoked by the Proxy
Service. Similarly, the target library must be accessible to the User Tasks (which are essentially slaves).
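The shape of such a request, and the one-thread-per-request handling done by the Proxy Service, can be modelled in Python. This is a hypothetical sketch, not the Java/JNI implementation: the field names, the placeholder algorithm names `alg1`/`alg2` and the `submit_to_sm` function (standing in for the JNI call into native code that uses the SM API) are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass(frozen=True)
class GeneRequest:
    source_library: str   # name of the source library (read by the native code)
    target_library: str   # name of the target library (read by the User Tasks)
    num_sequences: int    # number of source sequences to compare
    algorithm: str        # which comparison algorithm to use

    def __post_init__(self):
        if self.num_sequences <= 0:
            raise ValueError("num_sequences must be positive")

def submit_to_sm(req):
    # Hypothetical stand-in for the JNI call into native code, which would marshal
    # the source sequences and submit the request through the SM API.
    return (f"submitted {req.num_sequences} x {req.source_library} "
            f"vs {req.target_library} ({req.algorithm})")

def handle_all(requests, workers=4):
    # Requests are handled concurrently on a pool of worker threads;
    # map() still returns the results in submission order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(submit_to_sm, requests))

reqs = [GeneRequest("src.lib", "tgt.lib", 10, "alg1"),
        GeneRequest("src2.lib", "tgt.lib", 5, "alg2")]
for line in handle_all(reqs):
    print(line)
```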

Web Services Clients

By providing a WSDL (Web Services Description Language) document for the Gene Comparison Service
with the 3-tier architecture described below, the service can be called a true Grid Service, since
it satisfies the following criteria:

          1. The service can be implemented in any language and executed in any environment
              supporting it.
          2. The implementation language details must be insulated from clients.
          3. The service should provide a WSDL and must thus be accessible to clients
              implemented in any language for which WSDL bindings are available.

Figure 3.3: Web Services Clients can use the WSDL published by the Java Web Service (JWS) and invoke
the service by using the Simple Object Access Protocol (SOAP). The JWS passes the request on to the
Proxy Service in Java, just as the Applet interface submits requests to the Proxy Service. (Flow:
Client Code -> SOAP request/response via Apache AXIS -> Web Service (JWS) -> Proxy Service in Java
-> Native Code -> Service Manager.)

The JWS thus provides a Web Services wrapper for the Proxy Service. The JWS itself is contained
in a Web Services container that provides automatic service compilation and service instance
creation on demand. I have used the Apache AXIS API for SOAP request/response handling.

3.4 Adaptive Gene Comparison Grid Service

In order to facilitate scheduling policies such as resource harvesting or shortest-job-first without
interrupting or restarting the active computation, the code must be adaptive. In other words, the grid
application should be malleable so that it can execute on fewer or additional resources.
For example, consider a Grid Service that has been allotted 5 resources. A new client request arrives
that is a short job; however, all of the resources are active (busy with earlier computations). In
such cases, the Scheduler might want to harvest some resources from earlier requests and execute the
shorter job. This is not possible if the code cannot adjust to fewer resources at runtime. Similarly,
the Scheduler might want to allot more resources to an existing computation in order to finish its
execution. The service should thus also be capable of using additional resources at runtime.

             Figure 3.4: Flowchart for adapting the grid service. On a change in resource allotment:
             1. re-compute slave loads; 2. re-assign target library boundaries; 3. synchronize with
             existing slaves; 4. start new slaves if additional resources are allotted. Then send
             tasks to the slaves.

Thus, the Gene Comparison Grid Service must be adaptive. When instructed by the scheduler, the
master should synchronize with a number of slaves that matches the new resource allotment.

The flowchart in Figure 3.4 describes the mechanism used for adapting the code to dynamic resource
allotment changes. If the request gets fewer than the required resources, the Service Manager (SM)
restarts the computation by placing the request in the ready queue. Re-computing the target library
boundaries and slave loads can be achieved by reinitializing the library module. The re-initialization
has been verified and tested to be free from any global-state effects on the application. With this
feature added, the Gene Comparison Grid Service is a full-fledged Adaptive Grid Service.
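In code, the adaptation step of Figure 3.4 amounts to re-running the boundary computation with the new slave count before the next source sequence is dispatched. The Python sketch below assumes that allotment changes are applied only between source sequences, models scheduler notifications as a hypothetical `{source_index: new_slave_count}` map, and reduces the restart path to raising `NeedsRestart` (a name we introduce); steps 3 and 4 of the flowchart are not modelled.

```python
def partition(T, n):
    """Split target indices [0, T) into n contiguous (start, end) boundaries."""
    size, bounds, start = T // n, [], 0
    for s in range(n):
        end = T if s == n - 1 else start + size   # last slave absorbs the remainder
        bounds.append((start, end))
        start = end
    return bounds

class NeedsRestart(Exception):
    """The new allotment is too small; the SM would place the request back on the ready queue."""

def adaptive_master(num_sources, T, n, changes, min_slaves=1):
    """Dispatch each source sequence, adapting to allotment changes between sequences.

    changes: hypothetical {source_index: new_slave_count} map standing in for
    notifications from the scheduler.
    """
    plan = []
    for i in range(num_sources):
        if i in changes:                       # resource allotment changed
            n = changes[i]
            if n < min_slaves:
                raise NeedsRestart(f"only {n} slave(s) allotted before sequence {i}")
            # Steps 1-2: slave loads and target boundaries are re-computed below.
        plan.append((i, partition(T, n)))      # send tasks to the slaves
    return plan

for i, bounds in adaptive_master(3, 10, 2, {1: 3, 2: 1}):
    print(i, bounds)
# 0 [(0, 5), (5, 10)]
# 1 [(0, 3), (3, 6), (6, 10)]
# 2 [(0, 10)]
```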

It is important to understand that the adaptivity described above takes place at the request level
(internal to the resources of the Service Manager), and that the Service Manager itself runs on a
static set of resources.

A short note to clarify my contributions in adapting the Gene Comparison Grid Service: the basic
synchronization between master and slave components was already part of the Reusable Components
layer in the Grid Infrastructure, but the original complib code runs with a static set of slaves
throughout the computation lifetime of a request. My contribution was to track down the library
components of the complib code necessary to bring about the desired adaptation, to verify and test
them for global-state effects, and to reuse these components in the master/slave code components for
4. Adaptive Resource Management Constructs – Expand / Shrink

4.1 Dynamic Hosting Environments

Going a step further, the acquisition of resources by Adaptive Grid Services can itself be dynamic. An
instance of an adaptive grid service might have to smoothly move its computation in one of two ways:

     1. Move all of the computation to a totally new hosting environment

       For example, consider a scenario where the resource provider would like to reclaim the
       shared resources hosting the Grid Service. If the Grid Service cannot move its computation
       without a restart, it has to restart the current computations. This is not desirable if the
       current computations have already executed for a long time (which is typically the case in
       computationally intensive applications such as gene comparisons).

     2. Shift part of the computation to a new environment

       Such cases arise when the Grid Service has to surrender some of the resources it is
       currently using and expand its computation to new resources, while continuing the
       computation on the resources that need not be surrendered. Again, for the same reason
       described above, a restart of computations is not desired.

Implicit in the above requirements is the need for resource addition/removal constructs for
adaptive grid services. For example, the shift of partial computations can be implemented by the
following pseudo-code:

             Adaptive Grid Service running on R = {r1, r2, ..., ri, ..., rj}
             Receive Message Shrink(Rs);          where Rs = {ri, ..., rj}
             Receive Message Expand(R's);         where R's is the set of new resources

The old resource provider sends the shrink message and the new resource provider sends the expand
message. The Grid infrastructure provides an API to external entities such as resource providers by
implementing mechanisms to update its internal data structures. However, the services must
themselves be adaptive as described in section 3.4.
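In set terms, the two messages move the service from R to (R - Rs) ∪ R's. A minimal Python sketch, assuming resources are plain string identifiers and that a Shrink may only name resources the service currently holds; `shrink` and `expand` here are hypothetical helper names, not the infrastructure's API.

```python
def shrink(current, victims):
    """Remove the victim resources Rs from R; Rs must be a subset of R."""
    if not victims <= current:
        raise ValueError(f"cannot surrender resources not held: {sorted(victims - current)}")
    return current - victims

def expand(current, new):
    """Add the new resources R's to the current set."""
    return current | new

R = {"r1", "r2", "r3", "r4"}
R = shrink(R, {"r3", "r4"})    # sent by the old resource provider
R = expand(R, {"s1", "s2"})    # sent by the new resource provider
print(sorted(R))               # ['r1', 'r2', 's1', 's2']
```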

The following Sections 4.2 and 4.3 describe the implementation of these resource addition/removal
constructs.

4.2 Resource Addition Construct: Expand

The following is a description of the application transparent implementation of the resource addition
construct – expand.

Let us examine the pseudo-code of the original adaptive grid service:

              1. Read the performance database into memory
              2. Initialize the active and ready queues
              3. Initialize the Resource Allotment Table
              4. Open the necessary communication channels
              Start listening to client requests
              while (running) do
                      Schedule the requests
                      Process the requests
                      Update the performance database at the end of each request computation

Clearly, the grid service is initialized with a static set of resources and it continues to schedule client
requests on this set throughout its life time. Thus, to expand the computation to new resources, the
service code must be modified to update the resource table in a way that it is visible to the scheduler.

The following issues must be addressed:

1. The addition of new resources should not exclude any of the existing resources. This is an
   important consideration: if a new resource being added has the same resource-id internal to the
   resource table as an existing one, the new resource simply replaces the existing resource, which
   is NOT desired. Thus, the resource-ids must be unique. This can be achieved by mapping the
   global resource-ids to local resource-ids and incrementing the local resource-id on each resource
   addition.

2. The addition of new resources should not interrupt any of the active computations. With the
   computation hierarchy clearly defined, the addition of new resources is totally transparent to any
   layer below the scheduler. Thus, the active computations are not affected.
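Issue 1 can be met with a table that maps global resource IDs to local IDs drawn from a counter that only ever increases. A minimal Python sketch (the class and method names are hypothetical; the real table lives in the SM's native data structures, and a multithreaded SM would also guard it with a mutex): because local IDs are never reused, an added resource can never overwrite an existing entry.

```python
import itertools

class ResourceTable:
    def __init__(self):
        self._next_local = itertools.count()   # monotonically increasing local IDs
        self.global_to_local = {}
        self.entries = {}                      # local_id -> resource info

    def expand(self, global_ids):
        """Add resources without disturbing existing entries; return their new local IDs."""
        added = []
        for gid in global_ids:
            if gid in self.global_to_local:    # already present: never replace it
                continue
            lid = next(self._next_local)
            self.global_to_local[gid] = lid
            self.entries[lid] = {"global_id": gid, "state": "IDLE"}
            added.append(lid)
        return added

    def remove(self, gid):
        lid = self.global_to_local.pop(gid)
        del self.entries[lid]                  # lid is never handed out again

table = ResourceTable()
print(table.expand(["hostA", "hostB"]))   # [0, 1]
table.remove("hostA")
print(table.expand(["hostC", "hostB"]))   # [2]
```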

With these issues in mind, the Expand construct has been provided and tested for the Gene
Comparison Grid Service.

4.3 Resource Removal Construct: Shrink

The resource removal construct is trickier because of synchronization issues. Whether a certain grace
time is given for surrendering the resources is a policy issue between the resource provider and the
Grid Service. Thus, there are two possible variants of Shrink:

         1. Forced Shrink
             In this implementation of Shrink, the Service might simply be denied access to the
             resources to be surrendered. The Service must be able to distinguish between a resource
             unavailability due to failure and a resource unavailability due to loss of authorization. It
             must thus be capable of computation recovery and restart.

         2. Graceful Shrink
             In this variant of the Shrink construct, the resource provider notifies the Service to
             surrender the resources. The resource provider lets the service continue its computation
             for certain time so that the service can gracefully shrink its computation.

I have provided an implementation of the second variant - Graceful Shrink. The implementation
mainly performs the following actions:

   1. Change Resource Allotments to Request Managers on Victim Resources
   2. Collect the Victim Resources after Adaptation
   3. Surrender the Resources

The Scheduler must identify the computations running on the victim resources and notify the
respective request managers to adapt to fewer resources (excluding the victim resources). If a
request computation cannot proceed after surrendering these resources, it is restarted by placing it
on the ready queue.
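The identification step can be sketched in Python, assuming the scheduler keeps a map from request ID to its allotted resources and that each request has a minimum resource count below which it must be restarted. `plan_graceful_shrink` and `min_resources` are hypothetical names; requests whose RM is still being created are not modelled here.

```python
def plan_graceful_shrink(allotments, victims, min_resources):
    """Plan a graceful shrink.

    allotments:    {request_id: set of allotted resources}
    victims:       set of resources to surrender
    min_resources: {request_id: fewest resources the request can run on}
    Returns (new allotments to send to RMs, requests to restart, idle victims).
    """
    notify, restart, busy = {}, [], set()
    for req, res in allotments.items():
        hit = res & victims
        if not hit:
            continue                       # request unaffected by the shrink
        busy |= hit
        remaining = res - victims
        if len(remaining) >= min_resources[req]:
            notify[req] = remaining        # RM adapts to fewer resources
        else:
            restart.append(req)            # cannot proceed: back on the ready queue
    idle = victims - busy                  # unused victims can be surrendered at once
    return notify, sorted(restart), idle

allot = {"q1": {"a", "b", "c"}, "q2": {"d"}, "q3": {"e"}}
notify, restart, idle = plan_graceful_shrink(
    allot, {"c", "d", "f"}, {"q1": 1, "q2": 1, "q3": 1})
print({k: sorted(v) for k, v in notify.items()})   # {'q1': ['a', 'b']}
print(restart)                                     # ['q2']
print(sorted(idle))                                # ['f']
```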

However, since creating a request manager when a new request arrives is NOT an atomic operation,
an adaptation notification fails if it is sent while the request manager is still being created.
Having identified this critical section, the following solution can be provided:

               if (request manager not allotted) {
                      wait for request manager allotment;
               }
               send the adaptation event;

Clearly, busy waiting is not a solution here, since it is (a) not efficient and (b) not possible with
a single-threaded SM implementation. Thus, the SM has been upgraded to be multithreaded, and platform
independence is preserved by conforming to the POSIX standards.

With the multithreaded code in place, the solution is efficient with the use of Condition Variables as


           Construct the Victim Resource Set R
           for each resource r in R {
                  if (r is IDLE) {
                             delete the information from resource table
                             surrender the resource
                  } else {
                             if (RM is not allotted) {
                                      create-thread(wait-for-RM, resource-args);
                             } else {
                                      send-adaptation-notification to RM;
                             }
                  }
           }


BEGIN wait-for-RM
           acquire the mutex
           while (RM for resource not allotted) {
                  wait on the condition variable;   // woken up by the signaling thread
           }
           release the mutex
           send-adaptation message to RM.
END wait-for-RM

Message-listening Thread:

    if (new RM allotted) {           // creation acknowledged by the RM
            broadcast signal to all waiting threads;
    }

    if (resource adaptation ACK-RELEASED from RM) {
            if (released resource is a Victim resource) {
                     if (resource was borrowed from another request) {
                              ...     // borrowed resource: see the RESERVED state below
                     } else {
                              mark the resource as UNAVAILABLE;
                     }
            }
    }

Resource Collector Thread:

            collect all the UNAVAILABLE resources
            ASSERT that they are not being used by the service
            delete their information from the resource table
            surrender the resources
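The wait-for-RM and message-listening threads above map directly onto a mutex plus a condition variable. The sketch below uses Python's `threading.Condition` in place of the POSIX condition-variable calls the SM uses; the class name `RMRegistry` and its methods are hypothetical, and "sending" an adaptation message is simulated by appending to a list.

```python
import threading


class RMRegistry:
    """Hypothetical tracker of which resources have an allotted RM."""

    def __init__(self):
        self._cond = threading.Condition()  # condition variable + its mutex
        self._allotted = set()              # resources whose RM is up
        self.sent = []                      # simulated adaptation messages

    def rm_allotted(self, resource):
        # Message-listening thread: the RM acknowledged its creation.
        with self._cond:
            self._allotted.add(resource)
            self._cond.notify_all()         # broadcast to all waiting threads

    def send_adaptation(self, resource):
        with self._cond:
            self.sent.append(resource)      # stands in for the real message

    def wait_for_rm(self, resource, timeout=None):
        # wait-for-RM thread: block without busy waiting. wait_for()
        # re-checks the predicate after every wake-up (the `while` loop in
        # the pseudocode), which also guards against spurious wake-ups.
        with self._cond:                    # acquire the mutex
            ok = self._cond.wait_for(lambda: resource in self._allotted,
                                     timeout)
        # the mutex is released on leaving the `with` block
        if ok:
            self.send_adaptation(resource)
        return ok

    def notify_or_defer(self, resource):
        # Shrink loop: notify now if the RM exists, else spawn a waiter.
        with self._cond:
            ready = resource in self._allotted
        if ready:
            self.send_adaptation(resource)
            return None
        waiter = threading.Thread(target=self.wait_for_rm, args=(resource,))
        waiter.start()
        return waiter
```

Because the waiter re-checks the predicate under the mutex before sleeping, an RM that is allotted between `notify_or_defer` releasing the lock and the thread starting is not missed.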

                                Figure 3.1: State Transitions for a Resource (states: IDLE, ACTIVE, RESERVED)

The IDLE and ACTIVE states are self-explanatory. A resource is RESERVED if it was borrowed from
another active request and must be re-allotted to the original request once the shorter request (the
request that borrowed the resource) finishes. The Resource Collector thread waits for such a resource
to return to the IDLE state before releasing it. A similar solution can be provided for another possible
synchronization event: a request finishing its computation at the same time as a Shrink message is
received for its resource set.
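The resource-table side (the message-listening thread marking released victims, and the Resource Collector surrendering them) can be sketched as below. This is a simplified, single-threaded Python illustration under stated assumptions: the class and method names are hypothetical, a borrowed victim is recognized by its RESERVED state, and an UNAVAILABLE state is added to the three states of Figure 3.1 to mark released victims awaiting surrender, as in the pseudocode.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    ACTIVE = auto()
    RESERVED = auto()     # borrowed; owed back to its original request
    UNAVAILABLE = auto()  # released victim, waiting to be surrendered


class ResourceTable:
    """Hypothetical resource table used by this Shrink sketch."""

    def __init__(self, states):
        self.states = dict(states)  # resource id -> State
        self.surrendered = []       # every resource handed back so far

    def on_ack_released(self, resource, victims):
        # Message-listening thread: an RM acknowledged releasing `resource`.
        if resource not in victims:
            return
        if self.states[resource] is State.RESERVED:
            return  # borrowed: the collector waits until it becomes IDLE
        self.states[resource] = State.UNAVAILABLE

    def on_idle(self, resource):
        # The original owner has finished with a RESERVED resource.
        self.states[resource] = State.IDLE

    def collect(self, victims):
        # Resource Collector: surrender victims no longer used by the
        # service, i.e. those that are UNAVAILABLE or back to IDLE.
        newly = []
        for r in sorted(victims):
            if self.states.get(r) in (State.UNAVAILABLE, State.IDLE):
                del self.states[r]  # delete from the resource table
                newly.append(r)     # surrender to the resource provider
        self.surrendered.extend(newly)
        return newly
```

A RESERVED victim is simply skipped by `collect` until `on_idle` moves it to IDLE, which mirrors the collector "waiting" for the resource without blocking on it.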

5. Grid Administration Toolkit

The Grid Administration Toolkit makes grid administration tasks easy, such as building and tearing
down the infrastructure, monitoring a resource, and managing remote processes on a node or across a
grid. Through a user-friendly GUI, the toolkit translates user events into the complex, routine tasks of
grid administration.

5.1 Basic Functionalities

Grid Monitoring and Services Administration

      Resource Monitoring
         Monitor the resource utilization of any machine in the computational grid.

                    Figure 5.1: Resource Monitoring using the Administration Toolkit

The following functionalities are available at two levels:
             a. Node (single-resource) level
             b. Group level (under the Group Management tab)

      Process/Service Monitoring
         Monitor any machine in the computational grid to identify the processes running on it
         in general, and grid-infrastructure-specific services in particular (for example, to view
         all the active requests on a resource).

Figure 5.2: Monitoring Community Services on remote hosts using the Grid Administration Toolkit

Figure 5.3: General Process Monitoring using Grid Administration Toolkit

Figure 5.4: Group Process Monitoring using Grid Administration Toolkit

   Service Launch
      Launch a service (such as a Gate-Keeper service or the PM) on a resource/machine

Figure 5.5: Launching a Service using the Administration Toolkit. The Script Server checks whether a
Process-Manager (Gate-Keeper Service) is already running and starts it only if it is not.

       Process/Service Termination
          In general, terminate any process running on the system; specifically, tear down
          services running on the resource. The processes/services can be terminated by
              a. Process Id
              b. Process Command Line key-word match
              c. Functionality – this is specific to the grid infrastructure described in section 2.2.

   Figure 5.6: Terminating Processes by keyword match using Grid Administration Toolkit
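Keyword-based termination and the "start only if not already running" check of Figure 5.5 both reduce to matching command lines in a process listing. The toolkit itself does this in Perl over SSH; the Python sketch below only illustrates the matching logic on the text output of `ps -eo pid,args` (a standard `ps` output format), and the process names in the example (`gatekeeper.pl`, `request_manager`) are hypothetical.

```python
def parse_ps(output):
    """Parse `ps -eo pid,args` output into (pid, command line) pairs."""
    procs = []
    for line in output.splitlines()[1:]:    # skip the header line
        fields = line.split(None, 1)        # PID, then the rest of the line
        if not fields:
            continue
        cmd = fields[1].strip() if len(fields) > 1 else ""
        procs.append((int(fields[0]), cmd))
    return procs


def pids_matching(procs, keyword):
    """PIDs whose command line contains `keyword` (plain substring match)."""
    return [pid for pid, cmd in procs if keyword in cmd]


def needs_launch(procs, service_keyword):
    """True if no running process matches the service (launch-if-absent)."""
    return not pids_matching(procs, service_keyword)


def kill_command(pids, signal_name="TERM"):
    """Build a POSIX `kill` command line for the selected PIDs."""
    return ["kill", f"-{signal_name}", *map(str, pids)]
```

A plain substring match is deliberately simple; the keyword should be specific enough not to match unrelated processes that merely mention it on their command line.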

In a dynamic grid environment, there may be multiple resource providers managing different pools of
resources. The following functionalities have been added to the Grid Administration Toolkit to simplify
grid administration in a dynamic grid environment with multiple resource providers and several
instances of grid services.

Resource Management using Resource Providers

      Resource Information Service for a Resource Provider
       Provides information such as Resource-Ids, Platform, IP-Addresses of the resources managed
       by the particular Resource Provider

      Resource Leasing Information for a Resource Provider
       Lets the administrator view the leasing information details such as start-of-lease, duration-of-
       lease, grid service instances associated with the lease for all the resources managed by the
       resource provider.

       Figure 5.7: View Resource Leasing Information using Grid Administration Toolkit

      Resource Addition/Removal for a Resource Provider
       The Administrator can add (remove) resources to (from) a resource provider.
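The information kept for each Resource Provider can be modeled with a small data structure. The sketch below is hypothetical: its field names follow the items listed above (resource id, platform, IP address, start and duration of a lease, associated grid service instance), and the rule that a leased resource cannot be removed is an assumption of this sketch, not documented toolkit behavior.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Lease:
    resource_id: str
    service_instance: str   # grid service instance holding the lease
    start: datetime
    duration: timedelta

    @property
    def end(self):
        return self.start + self.duration


@dataclass
class ResourceProvider:
    name: str
    resources: dict = field(default_factory=dict)  # id -> (platform, ip)
    leases: list = field(default_factory=list)

    def add_resource(self, rid, platform, ip):
        self.resources[rid] = (platform, ip)

    def remove_resource(self, rid):
        # Assumed policy: a leased resource must be released (e.g. via
        # Shrink) before it can be removed from the provider.
        if any(lease.resource_id == rid for lease in self.leases):
            raise ValueError(f"{rid} is currently leased")
        del self.resources[rid]

    def lease(self, rid, service_instance, start, duration):
        if rid not in self.resources:
            raise KeyError(rid)
        self.leases.append(Lease(rid, service_instance, start, duration))

    def leasing_info(self):
        # (resource, service instance, start-of-lease, end-of-lease)
        return [(l.resource_id, l.service_instance, l.start, l.end)
                for l in self.leases]
```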

Adaptive Grid Services Management

      Launch an Adaptive Grid Service
       By simply providing the details of the Resource Provider, the list of resources on which to host
       the service instance, and the resource on which the application interface or the SM is to be
       launched (SM-location), the administrator can launch an adaptive grid service with the click of a button.
       The following actions take place when an Adaptive Grid Service is launched:

                    1. Book-Keeping for Lease Management
                    2. Generation of configuration files for the Grid Service
                    3. Verify the availability of a Gate-Keeper service on each of the resources. If
                        the service is not already up, bring it up.
                    4. Set the necessary environment for launching the SM.
                    5. Start the SM on the specified SM-location.

        If the infrastructure does not use a distributed file system, the necessary code libraries might
        have to be deployed on each of the resources; this step can easily be added to the administration toolkit.

        Figure 5.8: Launching an Adaptive Grid Service using the Grid Administration Toolkit

       Stop an Adaptive Grid Service
        Since the Adaptive Grid Service can receive more client requests while it is bringing itself
        down, it must stop accepting new requests before anything else. Bringing down an adaptive
        grid service must therefore be carried out in top-down order. The following actions take
        place, in this order:

    1     Bring down the Proxy Services and the interfaces for client requests.
    2     Bring down the SM.
    3     If the current computations must also stop immediately, terminate the Request Managers
          and User Tasks on all the resources. Otherwise, wait for the active computations to complete.

         The concerned Request Managers can send the results of active computations to active

      View the Leasing Information
       The administrator can view the resources being used by the adaptive grid service.
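The launch and stop sequences above are mainly about ordering. The minimal sketch below, assuming a hypothetical `gatekeeper_up` map that stands in for probing each resource, builds each sequence as an ordered list of steps; in the toolkit, the Script Server would execute each step remotely.

```python
def plan_launch(service, resources, sm_location, gatekeeper_up):
    """Ordered launch steps; gatekeeper_up: resource -> bool (hypothetical probe)."""
    steps = [f"record lease: {service} on {', '.join(resources)}",  # book-keeping
             f"write config for {service}"]                         # config files
    for r in resources:
        if not gatekeeper_up.get(r, False):
            steps.append(f"start gate-keeper on {r}")  # only where it is not up
    steps.append(f"set SM environment on {sm_location}")
    steps.append(f"start SM on {sm_location}")
    return steps


def plan_stop(service, resources, immediate):
    """Top-down shutdown: stop accepting requests first, computations last."""
    steps = [f"stop proxy services and client interfaces of {service}",
             f"stop SM of {service}"]
    if immediate:
        steps += [f"kill request managers and user tasks on {r}"
                  for r in resources]
    else:
        steps.append("wait for active computations to complete")
    return steps
```

Returning a plan instead of executing it directly keeps the required order (in particular the top-down order on shutdown) explicit and easy to test.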

5.2 Architecture and Implementation

The Grid Administration Toolkit uses the 2-tier architecture shown in Figure 5.9. The Administration
GUI is the user interface presenting all the features described in section 5.1. All the remote
administration and monitoring tasks are performed by the multithreaded Script Server in the back end.
The Script Server has access to the resource-books (for book-keeping), the configuration files
(specific to the Adaptive Grid Services or Resource Providers), and other environment set-up
information.

                     Figure 5.9: 2-tier architecture of the Grid Administration Toolkit
                     (the Admin. GUI sends user events to the Script Server, which handles
                     1. Remote Administration, 2. Resource Book-Keeping, 3. Grid Monitoring, 4. Service)

A well-defined protocol between the GUI and the Script Server translates user events into complex
grid administration tasks.
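The GUI/Script Server protocol itself is not specified here. Purely as an illustration, assuming a hypothetical line-oriented protocol in which the GUI sends `VERB arg ...` and the Script Server replies with a line beginning with `OK` or `ERR`, the server-side dispatch could look like this:

```python
import shlex


def handle_request(line, handlers):
    """Dispatch one hypothetical protocol line to a handler.

    handlers maps an upper-case verb to a function returning a string.
    The reply is 'OK <result>' or 'ERR <reason>'.
    """
    try:
        parts = shlex.split(line)          # honours quoted arguments
    except ValueError as exc:              # e.g. an unbalanced quote
        return f"ERR parse: {exc}"
    if not parts:
        return "ERR empty request"
    verb, args = parts[0].upper(), parts[1:]
    handler = handlers.get(verb)
    if handler is None:
        return f"ERR unknown verb {verb}"
    try:
        return "OK " + handler(*args)
    except Exception as exc:               # wrong arity, remote failure, ...
        return f"ERR {verb}: {exc}"
```

Keeping every reply on one line with a fixed prefix lets the GUI react to user events without parsing free-form Script Server output.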

Most of the high-end computational resources that typify a computational grid deny access to
X Windows or other interactive applications. The 2-tier architecture overcomes this problem by
separating the GUI from the Script Server, so that only the non-interactive Script Server needs to
log into the grid resources.

                         Figure 5.10: Single sign-on for the Script Server using SSH
                         (components: User, SSH Agent, Resources; the user registers a credential with the SSH Agent)

The Script Server uses single sign-on over the SSH protocol so that the grid resources do not prompt
for a password. A public-private key pair is created and the public key is made available on each of
the grid resources (if the resources do not share a distributed file system, it must be copied securely
to each of them). Once the user credential (the private key, unlocked with its passphrase) has been
registered with the SSH agent, the Script Server can log into any resource without supplying a
password. The GUI is developed using Perl/Tk and the Script Server in Perl.
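The Script Server's non-interactive login can be illustrated with a small command builder. The sketch assumes OpenSSH: `-o BatchMode=yes` makes `ssh` fail instead of prompting for a password, and a running agent is detected through the `SSH_AUTH_SOCK` environment variable that `ssh-agent` sets. The helper names are hypothetical; the toolkit itself does this from Perl.

```python
import os
import subprocess


def ssh_command(host, remote_cmd, user=None):
    """Build a non-interactive ssh invocation (OpenSSH assumed)."""
    target = f"{user}@{host}" if user else host
    # BatchMode=yes: never prompt; fail fast if key-based login fails.
    return ["ssh", "-o", "BatchMode=yes", target, remote_cmd]


def agent_available(env=None):
    """True if an ssh-agent appears to be reachable from this process."""
    env = os.environ if env is None else env
    return "SSH_AUTH_SOCK" in env


def run_remote(host, remote_cmd, user=None, timeout=30):
    """Run `remote_cmd` on `host`; returns (exit status, stdout)."""
    if not agent_available():
        raise RuntimeError("no ssh-agent found; start one and run ssh-add")
    result = subprocess.run(ssh_command(host, remote_cmd, user),
                            capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout
```

A typical one-time setup is `ssh-keygen` to create the key pair, appending the public key to `~/.ssh/authorized_keys` on each resource, and then `eval "$(ssh-agent)"` followed by `ssh-add` in the shell that starts the Script Server.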

One advantage of this toolkit is that it is very lightweight and needs no special privileges on the
computational grid. It can also serve as a general remote administration toolkit and as a debugging
utility for distributed application developers.

6. Conclusions and Future Work

Grid environments are complex in nature, and a distributed/parallel application can be transformed
into a grid service by using the standard protocols and services specified by the Open Grid Services
Architecture. The code libraries and service components must be carefully packaged for easy
deployment. Moreover, the growing number of scientific and e-Business projects relying on grid
infrastructures makes dynamic resource addition/removal constructs such as Expand/Shrink
necessary; these constructs facilitate moving computation-intensive applications into dynamic
hosting environments. This report presented an application-transparent implementation of these
constructs using the Community Services grid infrastructure.

The Grid Administration Toolkit is an effort towards making grid administration as user-friendly as
possible, and I plan to make it more reusable and generic to suit the dynamic hosting environments
now coming into existence.

We are currently working on evolving the Community Services Infrastructure with additional
services, focusing on profiling and prediction services for dynamic hosting environments.

7. References

     1.   IBM Grid Computing
     2.   The Globus Project
     3.   Grid Computing: Making the Global Infrastructure a Reality – edited by Fran Berman et al.
     4.   An Open Grid Services Architecture for Distributed Systems Integration – Ian Foster, Carl
              Kesselman, Jeffrey Nick, Steven Tuecke
     5.   The Anatomy of the Grid – Ian Foster, Carl Kesselman, Steven Tuecke
     6.   Building Web Services with Java – Steve Graham et al.
     7.   An Analysis of the Open Grid Services Architecture – Dennis Gannon et al.

