Operating System Services for Grid Architectures - An Outline


Pradeep Padala
      ppadala@cise.ufl.edu


v0.1, 2002-08-12

Revision History
Revision 0.1                 2002-08-12   Revised by: ppadala
Initial Draft



Table of Contents

1. Introduction
2. Intent
3. OS Services
       3.1. OS Network Interface
       3.2. OS IO Interface
       3.3. Scheduling
       3.4. Resource Accounting
       3.5. Resource Discovery and Match-Making
       3.6. Security
       3.7. Execution Environment
       3.8. Performance Monitoring, Instrumentation etc.
4. Goals
       4.1. Generic Goals
       4.2. Short-Term Goals
5. Implementation
6. Q & A
       6.1. Why Linux?
       6.2. Why Globus and Legion?
7. References




1. Introduction
There has been great interest in grid technologies recently. Companies such as IBM and Sun have started
investing considerable effort in these cutting-edge technologies. This has also caused some confusion over
the term "Grid": people often confuse grids with clusters.

Before we start, let us informally clarify the term "Grid Computing". Grid computing is concerned with
"coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations".
The emphasis is on sharing resources managed by different organizations whose policies may be
drastically different. There is no central authority that controls or maintains information about the resources.
This is where a grid differs from a cluster, which is typically owned and administered by a single organization.

Ian Foster has provided a three-point checklist [1] for a better understanding of the grid.




2. Intent
Many attempts have been made to harness the power of distributed resources. Projects like Condor [2] have
concentrated on a specific area such as high-throughput computing. Many special-purpose tools have been
developed to address issues like "message passing", "resource discovery", "scheduling" and "process
migration". Some of these research challenges require careful design and a robust architecture.

Middleware such as the Globus toolkit [3], Legion [4] and JXTA [5] tries to address some of these issues by
providing a robust framework on which to build tools. This middleware is designed with the specific goal of
making things easy for application writers.

Such middleware is developed with platform independence in mind and tries to fill the gaps left by
deficiencies in operating systems. For example, Condor provides a user-level library [6] for checkpointing
processes. This checkpointing is crucial to the transparent process migration done by Condor. Similarly, the
Globus toolkit provides mechanisms to discover and match resources.

Our intent is to provide operating system services that make the job of the middleware easier. We will also
develop mechanisms to facilitate the high-performance computing commonly seen in grids.




3. OS Services
Surprisingly little research has been done on providing operating system services for grid architectures. So
far, the emphasis has been on building the toolkits necessary to write grid applications.

On the other hand, extensive research has been done on distributed operating systems. We have to adapt
some of these ideas to grid-based systems. The trend is towards making machines that run a commodity
operating system part of a grid, rather than writing a special-purpose distributed operating system.

As described in the "Grid Book" [7], high-performance computing suffers from the so-called "OS
bottleneck". Traditionally, the network and IO subsystems in operating systems have concentrated on
providing unified, transparent interfaces to the underlying hardware. These subsystems incur high per-byte
processing overhead as a result. Accounting and scheduling of OS resources are almost non-existent. Both
are very important for grid and high-performance architectures.


3.1. OS Network Interface
The network interface provided by the OS plays a significant role in achieving high performance. There have
been different attempts at solving this issue. One attempt close to our work is PODOS [8], a Performance
Oriented Distributed Operating System. It attempts to achieve high network performance through custom
protocols. The work seems to have been in limbo for some time.


3.2. OS IO Interface
The IO interface also plays an important role in achieving high performance. The network and IO subsystems
cannot actually be separated so easily: some of their functionality overlaps, and similar techniques are
required to achieve high performance in both.


3.3. Scheduling
Scheduling has a profound impact on grid technologies. In a grid architecture, different subsystems provide
job scheduling, CPU scheduling, network resource scheduling (for QoS purposes) and so on. Each of these
subsystems provides a different API to applications. To achieve high performance, global end-to-end
scheduling of resources is required. I believe this can be achieved by delegating some parts of the
scheduling to the OS.


3.4. Resource Accounting
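Resource accounting is almost non-existent in general-purpose operating systems. It is essential to the grid
because of its distributed nature: resources are shared across organizations, so their use has to be
attributed to whoever consumed them.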
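
As a point of comparison, the closest facility a commodity Unix kernel offers is per-process usage reporting
through getrusage. The sketch below (Python, Unix only) runs a job as a child process and builds a usage
record from what the kernel charged to it. The JobUsage record and the run_and_account helper are
hypothetical names chosen for illustration; they are not part of any middleware discussed here. A grid-level
accounting service would presumably also need to aggregate such records per user and per organization,
which this sketch does not attempt.

```python
import resource
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class JobUsage:
    """Hypothetical per-job accounting record (illustration only)."""
    command: list
    exit_code: int
    user_cpu_s: float
    system_cpu_s: float
    max_rss: int  # units are platform-dependent (kilobytes on Linux)


def run_and_account(command):
    """Run `command` to completion and report what the kernel charged to it."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    proc = subprocess.run(command, stdout=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return JobUsage(
        command=list(command),
        exit_code=proc.returncode,
        # RUSAGE_CHILDREN is cumulative, so charge only the difference.
        user_cpu_s=after.ru_utime - before.ru_utime,
        system_cpu_s=after.ru_stime - before.ru_stime,
        # High-water mark over all children waited for so far.
        max_rss=after.ru_maxrss,
    )


if __name__ == "__main__":
    job = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    print(run_and_account(job))
```

Note that this only covers CPU time and memory of a single process tree; network and IO usage are not
reported this way.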


3.5. Resource Discovery and Match-Making
This is a complex topic with a variety of solutions. Some researchers, such as those behind JXTA, have tried
to provide generic resource discovery protocols. Others have gone as far as matching resources against
requirements.

I believe some of these services should stay at user level only. We certainly do not want to match two XML
documents from a buyer and a seller at the OS level. However, the OS has the best knowledge of hardware
resource usage, consumption and so on. The OS should therefore provide a proper interface to query this
data and present it in a format useful to the middleware. The actual resource matching decisions should be
made at user level.
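
The split described above can be sketched as follows, in Python for brevity. The first function gathers a few
facts the OS already knows and returns them in a form that serializes directly to JSON; the second is a
user-level matching policy that consumes such a snapshot. The function names, the field names and the
matching rule are all assumptions made for illustration. The outline does not define a query interface, and a
real OS service would expose far more detail than these standard library calls do.

```python
import json
import os
import platform
import time


def local_resource_snapshot():
    """Collect a few facts the OS already knows about this machine."""
    snapshot = {
        "hostname": platform.node(),
        "os": platform.system(),
        "architecture": platform.machine(),
        "cpu_count": os.cpu_count(),
        "timestamp": time.time(),
    }
    try:
        one, five, fifteen = os.getloadavg()
        snapshot["load_average"] = {"1min": one, "5min": five, "15min": fifteen}
    except (AttributeError, OSError):
        # Load average is not available on every platform.
        snapshot["load_average"] = None
    return snapshot


def matches(snapshot, min_cpus, max_load):
    """User-level policy: does this machine satisfy a (hypothetical) request?"""
    load = snapshot["load_average"]
    if load is None:
        return False
    return (snapshot["cpu_count"] or 0) >= min_cpus and load["1min"] <= max_load


if __name__ == "__main__":
    snap = local_resource_snapshot()
    print(json.dumps(snap, indent=2))
    print("suitable for a 2-CPU job:", matches(snap, min_cpus=2, max_load=1.0))
```

The OS-side function makes no decisions; swapping in a different matching policy only changes user-level
code.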


3.6. Security
Security is a huge topic with many ramifications. Security services should be present in every layer of the
grid architecture. The OS, for its part, should provide services that allow the safe execution of downloaded
code. If possible, it should also protect against software bugs not anticipated by the author. See the next
section.
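
To give a feel for what existing kernels already offer here, the sketch below (Python, Unix only) runs a
command with a hard CPU-time limit applied through setrlimit before the program starts. The run_untrusted
helper and its default limits are hypothetical. This is only a small part of safe execution: resource limits
stop runaway code, but they do not restrict which files or network endpoints the code can reach, so this is
not the service the outline is asking for, merely a starting point.

```python
import resource
import subprocess
import sys


def run_untrusted(command, cpu_seconds=2, wall_seconds=30):
    """Run `command` with a CPU-time cap enforced by the kernel."""

    def limit_cpu():
        # Runs in the child, after fork and before exec.
        soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
        wanted = cpu_seconds
        if hard != resource.RLIM_INFINITY:
            wanted = min(wanted, hard)
        # Lower soft and hard together so the child cannot raise the limit.
        resource.setrlimit(resource.RLIMIT_CPU, (wanted, wanted))

    return subprocess.run(
        command,
        preexec_fn=limit_cpu,
        capture_output=True,
        text=True,
        timeout=wall_seconds,  # separate guard against code that just sleeps
    )


if __name__ == "__main__":
    ok = run_untrusted([sys.executable, "-c", "print('hello from the job')"])
    print(ok.returncode, ok.stdout.strip())

    runaway = run_untrusted([sys.executable, "-c", "while True: pass"], cpu_seconds=1)
    # A negative return code means the child was terminated by a signal.
    print("runaway job exit status:", runaway.returncode)
```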


3.7. Execution Environment
This section is somewhat related to the previous one. We need a better interface to the execution
environment than the one provided by traditional operating systems.


3.8. Performance Monitoring, Instrumentation etc.
This is another issue that is not addressed by traditional commodity operating systems. We need services
that the middleware can use to monitor performance. Services for instrumentation can probably be
implemented at user level on top of other OS services.




4. Goals
In this section we discuss the plan to implement these services.


4.1. Generic Goals
The long-term goal is to provide operating system services that make it easy to write middleware. The
middleware can then concentrate on other high-level ideas like resource matching and load balancing. Here
is a list of general goals.

      • Provide all the OS services relevant to high-performance and grid architectures.
      • The services should be easy to add. They are written as a module, so a simple insmod grid-module
        should add them. Removing the services should be similarly simple.
      • Each service should be considered carefully to avoid unnecessary additions. If something can be done
        at user level without a performance penalty, the design should be rethought.
      • A systematic study of the middleware is needed to identify which functionality should be refactored
        into OS services.


4.2. Short-Term Goals
      • A quick proof-of-concept implementation
      • A simple framework for testing
      • More thorough research on the existing technologies




5. Implementation
The implementation will be done on Linux. The services will be built as a module which is kept as small as
possible.

Globus and Legion will be used as the main target middleware.




6. Q & A
This section explains the reasons behind some of the design and implementation decisions.


6.1. Why Linux?
Because I like it :-)

It is the ideal choice for playing with the kernel. It is one of the most successful open-source commodity
operating systems, and it is being improved at a frenetic pace.


6.2. Why Globus and Legion?
These two are the most mature grid middleware available today. Globus is quickly becoming the standard for
grid technologies. Similarly, Legion has a wealth of research devoted to it.

We will also look into JXTA, but it seems to be more oriented towards P2P computing.




7. References
    1. Ian Foster, "What is the Grid? A Three Point Checklist", Argonne National Lab & University of
       Chicago. [HTML]
    2. Jim Basney and Miron Livny, "Deploying a High Throughput Computing Cluster", High Performance
       Cluster Computing, Rajkumar Buyya, Editor, Vol. 1, Chapter 5, Prentice Hall PTR, May 1999.
       [Postscript]
    3. I. Foster, C. Kesselman, S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual
       Organizations", International J. Supercomputer Applications, 15(3), 2001. [PDF]
    4. "Legion: The Next Logical Step Toward a Nationwide Virtual Computer", UVa CS Technical Report.
       [PDF]
    5. "Project JXTA: An Open, Innovative Collaboration". [PDF]
    6. Michael Litzkow, Todd Tannenbaum, Jim Basney, and Miron Livny, "Checkpoint and Migration of
       UNIX Processes in the Condor Distributed Processing System", University of Wisconsin-Madison
       Computer Sciences Technical Report #1346, April 1997. [Postscript]
    7. The Grid: Blueprint for a New Computing Infrastructure, edited by Ian Foster and Carl Kesselman,
       July 1998, ISBN 0-97028-467-5
    8. Vazhkudai, S., Maginnis, P. T., "A High Performance Communication Subsystem for PODOS",
       Technical Report, Department of Computer Science, University of Mississippi, Oxford, MS, August
       1999. [Postscript]



