					    Deployment of High-Performance
     Computing Platforms for Non-
          Computer Scientists

            Evan Lee Turner, Claus Walter Spitzer

    Clustering, System Monitoring, Usability, Abstraction,

                           Table of contents

    Abstract
    Introduction: HpCDC
        o Our Mission
        o HEX
    The User Environment
    Our Tools
        o The Imp Suite
        o Monitoring Tools
    Conclusions
    References

                                      Abstract
        High-performance parallel computing platforms have become very popular in
the last decade. This paper describes work done at the High-Performance Computing
Development Center [HpCDC 03] at Texas A&M University - Corpus Christi
(TAMU-CC) to extend the usability of the center's main cluster, HEX. We designed
the user environment and created a suite of tools so that the facility can be used by
researchers whose main background is not computer science, but whose work may
still benefit from a high-performance computing platform (HPCP). This category
includes biologists and chemists, among others, but is not restricted to the natural
sciences. Computer scientists whose research focus is not on high-performance
applications may also benefit from this setup.

                                  Introduction: HpCDC
Our Mission

        HpCDC has its origins in an independent project run by two students who
were undergraduates at the time, Evan Lee Turner¹ and Claus W. Spitzer². The goal
of the project was to create a low-cost computer cluster using old hardware donated
by Texas A&M University – Corpus Christi and independent entities. The project
succeeded in creating a large zero-budget cluster capable of 16 GFLOPS. In 2003
Dr. Michelle Moore and Dr. Mirley Balasubramanya coauthored a grant proposal to
the National Science Foundation [NSF 03]; the resulting grant provided the resources
to build on the initial cluster project and fund the creation of HpCDC.

        The mission of the High Performance Computing Development Center is to
support the development of computational science applications. The HpCDC
operates a scalable GNU/Linux cluster targeted for academic scientific and
engineering research at TAMU-CC and other universities. The HpCDC staff assists
in the development and implementation of client research applications.

        Because the center supports many disparate research emphases and
backgrounds, it is a challenge to maintain a complex computer system that must
handle all manner of programs at the same time without disrupting support services.
For a supporting institution, a central goal is to place as much abstraction as
possible between the end user and the underlying system. When building the cluster
that would be the main system for HpCDC, the focus was therefore that the cluster
software would be built and tailored around user programs, rather than user
programs being written specifically for the system. This created many challenges
during the design phase because the system must be able to run a multitude of
clustering packages simultaneously. As a result, users are able to run software
using these separate cluster software systems concurrently without ill effects.

    ¹ Evan Turner is a graduate student at Texas A&M University – Corpus Christi
    ² Claus Spitzer is a graduate student at the University of Waterloo

        As with any computer system, reliability is a key concern: failures can
adversely affect user research and damage the perceived expertise and
professionalism of the institution. Problems will occur in any large system, but
they are especially challenging in cluster systems because of the sheer mass of
independent hardware and software that must run on every machine in the cluster.
For example, no single computer in the cluster should be a critical system whose
failure would cause user programs to break. Unfortunately, many clustering systems
exhibit this behavior, which must be worked around by other system services.


HEX

        HEX is the nickname given to our main clustering platform. HEX currently
consists of 12 Dell 2650 dual-Xeon-capable servers, each running at 3.06 GHz and
connected with copper Gigabit networking. The operating system is Debian
GNU/Linux 3.0r1, with a 2.4.21 kernel featuring the openMosix kernel patch [BAR
02]. openMosix is a free single-system image (SSI) clustering tool that provides
automatic load balancing, a key feature for the mission of our facility. openMosix is
a fork of the original MOSIX clustering system [BARAK 98], which became
proprietary in late 2001. For extended functionality we have applied the migratable
shared memory (MigShm) patch made available by the MAASK team [MAASK 03].

        In addition to the openMosix SSI clustering system, HEX also supports
several other concurrency languages and modeling systems. The most widely used
parallel programming interface on the cluster is LAM-MPI version 6.5.9
[BURNS 94], an open source implementation of the Message Passing Interface
specification [MPI-2 97]. Also running on the system is NAMD Scalable Molecular
Dynamics, which was developed by the Theoretical and Computational Biophysics
Group, a resource for macromolecular modeling and bioinformatics, in the Beckman
Institute for Advanced Science and Technology at the University of Illinois at
Urbana-Champaign [KALÉ 99]. Using either the MPI or openMosix system to
distribute work units, NAMD is a valuable modeling tool with which chemists at
Texas A&M University - Corpus Christi solve large computational problems.

                            The User Environment
        The principal goal when setting up the user environment was to hide as
much of the cluster mechanics as possible from the end user without sacrificing
efficiency and expandability. We purposefully did not implement directory services
such as LDAP and YP because of their complexity and the difficulty of adapting
them to the project goals. Instead, a single machine, the so-called head machine,
was designated as the entry point to the cluster. Although most of the core daemons
are located on a different server for load balancing, users use the head machine
for logon and process spawning. User operations on the head machine proceed as if
the cluster were a single system, hence the aforementioned importance of
openMosix. User processes automatically and transparently take advantage of
the entire cluster's capabilities.

                                     Our Tools
         Preexisting software tools for cluster computing did not provide the
functionality needed to achieve optimal transparency without a high cost in features
or stability. This led to the development of the set of utilities described below.

The Imp Suite

      The Imp suite encompasses all the tools created specifically for the HEX
computer cluster. These tools are mostly administrative in nature and are largely
dependent on the cluster running the openMosix system.

         imp_dsh is the most valuable utility within the Imp suite and the
foundation of many of the other tools. It is a distributed shell (dsh) tool; in other
words, it allows the user to execute commands across a network of computers with
minimal effort. By default, imp_dsh uses real-time information provided by the
openMosix clustering system to collect the list of hosts that it should run on, but it
includes an option to manually specify a file with the list of hostnames to be used.
This option is useful when openMosix is unavailable. imp_dsh can also suppress
execution output, a feature added because of the need to run scripts in silent mode.
Note that silent mode only suppresses messages sent to standard output; error
messages are still displayed. imp_dsh is preferred over other, more feature-rich
distributed shells such as clusterssh because it runs on a simple command line
interface [CLUSTERSSH 03]. Also, because it is scripted (imp_dsh is written
completely in Perl), it is easier for administrators to tailor it to their needs. If
needed, imp_dsh can be reduced to its primary components: the list-based
imp_massdo, which was originally based on do_massdo [FREED 03], and the
hostname collection tool, collect_hosts, which was created to address the
static nature of imp_massdo. This allows it to be the base for future dynamic
administrative tools, such as the planned parallel version of imp_dsh.
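
         The behavior described above (a host list, serial execution, and a silent
mode that hides standard output but not errors) can be sketched as follows. This is
an illustrative Python sketch, not the authors' Perl implementation. The function
names, the host-file format (one hostname per line, # for comments), and the use of
ssh as the transport are all assumptions made for the example.

```python
import subprocess
import sys


def read_host_file(path):
    """Return hostnames from a text file, one per line.

    Assumed format: blank lines and anything after '#' are ignored.
    """
    hosts = []
    with open(path) as handle:
        for line in handle:
            name = line.split("#", 1)[0].strip()
            if name:
                hosts.append(name)
    return hosts


def ssh_runner(host, command):
    """Run command on host over ssh (assumed transport).

    Returns a tuple (exit_code, stdout_text, stderr_text).
    """
    done = subprocess.run(["ssh", host, command],
                          capture_output=True, text=True)
    return done.returncode, done.stdout, done.stderr


def _prefixed(host, text):
    """Prefix every line of text with the host it came from."""
    return "".join(f"{host}: {line}\n" for line in text.splitlines())


def run_on_hosts(hosts, command, quiet=False, runner=ssh_runner,
                 out=sys.stdout, err=sys.stderr):
    """Run command on each host in turn (serially).

    With quiet=True, standard output is suppressed but error
    messages are still written. Returns {host: exit_code}.
    """
    results = {}
    for host in hosts:
        code, stdout, stderr = runner(host, command)
        if stdout and not quiet:
            out.write(_prefixed(host, stdout))
        if stderr:
            err.write(_prefixed(host, stderr))
        results[host] = code
    return results
```

         A call such as run_on_hosts(read_host_file("hosts.txt"), "uptime",
quiet=True) would then visit each listed host in order; the file name is
hypothetical. Because the runner is passed in as a parameter, a dynamic host
collector or a parallel runner could be substituted without changing the loop.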

        Second in importance within the Imp suite are the user setup tools.
Decreasing the complexity of end-user tasks increased the amount of work needed
for administrative tasks. Many features, such as the ability to run MPI programs,
require user accounts to roam all machines within the cluster without having to
supply a password. These features are enabled for end users, although the
implementation is hidden. The HpCDC cluster is a dynamic environment, meaning
there is a constant need to add or remove users as research projects begin and end.
The underlying complexity of such tasks led to the deployment of a set of scripts,
uberadduser and setupuser, to address these issues. These tools automatically set
up user accounts to take advantage of the distributed nature of the cluster, such as
password-less roaming and shared home directories.
         Third are the tools aimed at users of the LAM-MPI parallel programming
interface. Like the administrative scripts, these tools were created to support
dynamic clusters, where nodes are not guaranteed to provide 100% uptime. By using
the same hostname collection tool present in imp_dsh, these tools provide a
dependable starter kit for researchers using LAM-MPI. The kit contains two core
applications, lamjumpstart and mpirun_sh. The lamjumpstart utility starts the MPI
daemon on all available machines without the need for a static host list. To
maintain system stability, lamjumpstart dynamically omits dead or unavailable
nodes from the boot process, so a node failure in the cluster will not affect the MPI
system. mpirun_sh is used once the MPI daemon has been started and is designed
to run MPI-based programs with the optimal number of separate processes for the
cluster configuration at that time. Therefore, even during maintenance periods or
unplanned failures, the MPI system remains operational for all users and optimized
for the cluster.
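
         The idea behind these two utilities can be sketched in a few lines of
Python: drop the nodes that do not respond, write a host list from what remains,
and size the MPI job to the CPUs that are actually available. This is a minimal
sketch under stated assumptions, not the authors' code. The liveness test (a TCP
connection to port 22), the one-process-per-CPU rule, the hostnames, and the
"hostname cpu=N" line format are all assumptions; the last should be checked
against the boot schema syntax of the LAM version in use.

```python
import socket


def is_reachable(host, port=22, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds in time.

    Assumption: a node that accepts connections on the ssh port is up.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def live_nodes(cpus_by_host, alive=is_reachable):
    """Keep only the hosts that pass the liveness check, in order."""
    return {host: cpus for host, cpus in cpus_by_host.items() if alive(host)}


def boot_schema(nodes):
    """Render one 'hostname cpu=N' line per live node (assumed format)."""
    return "".join(f"{host} cpu={cpus}\n" for host, cpus in nodes.items())


def process_count(nodes):
    """Assumed sizing rule: one MPI process per CPU on the live nodes."""
    return sum(nodes.values())


if __name__ == "__main__":
    # Hypothetical three-node cluster with two CPUs per node.
    cluster = {"hex01": 2, "hex02": 2, "hex03": 2}
    up = live_nodes(cluster)
    print(boot_schema(up), end="")
    print("processes:", process_count(up))
```

         The generated text would be written to a file and handed to the MPI boot
command, and the process count would be passed to the run command. Keeping the
liveness check as a parameter makes it easy to swap in the openMosix-based host
collection that the paper describes.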

Cluster Monitor Utilities

        Within the HpCDC environment, monitoring user processes is a necessary
task. Users require confirmation that their processes are executing, as well as
running statistics of load and memory usage.

        Mosmon, a utility provided by the openMosix userland tools, is a graphical
monitor that shows load, memory usage, and other openMosix features
[OPENMOSIX 97]. Although it is appropriate for most needs on the openMosix
system, several other tools have been created to supplement or replace mosmon in
specific situations. For example, although mosmon's CPU and network usage is
negligible for a single instance, system load can increase dramatically when many
users run it simultaneously. Also, mosmon only operates on true SSH terminals
with reasonable network bandwidth. Clients on slow dial-up connections or without
proper terminals are left without a monitor utility.

      To counter the limitations of the openMosix mosmon utility, two separate
monitor utilities were created to give statistical load information on the terminal.

        The impmon utility collects real-time information from the openMosix
clustering software. Lacking any graphical functions, impmon provides a low-key
solution for slower connections and is the preferred method of monitoring on HEX.
Impmon also allows an optional parameter to be passed to it upon execution for
variable refresh rates.

                                 Figure 1: impmon

       In addition, if the cluster is booted into a non-openMosix kernel, mosmon
and impmon are non-functional. A separate monitoring tool was created for use in
this case.

       Emonitor is the terminal monitor tool that was developed in 2001 for the
ECluster clustering system that was used on the predecessor of HEX [TURNER 03].

         Emonitor queries a ruptime service on all node machines and prints the
corresponding loads with text graphics. Since Emonitor relies on a system service
on each computer, it can be scripted to run on startup of the non-openMosix kernel.
Thus, even during the special and sporadic occasions when the cluster needs to be in
a non-openMosix state, monitoring can still be achieved.

                                Figure 2: Emonitor
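
         As an illustration of printing loads "with text graphics", the following
Python sketch turns a table of per-host load averages into one text bar per host.
It is only a sketch of the display step, not the Emonitor source. The loads are
assumed to have been collected already, and the bar width, the full-scale load
value, and the bar characters are arbitrary choices made for the example.

```python
def load_bar(load, max_load=2.0, width=20):
    """Return a fixed-width text bar; the filled part is proportional to load.

    Loads above max_load fill the whole bar; negative loads show empty.
    """
    fraction = min(max(load / max_load, 0.0), 1.0)
    filled = round(fraction * width)
    return "#" * filled + "." * (width - filled)


def render_loads(loads, max_load=2.0, width=20):
    """Format {hostname: load} as one line per host, sorted by hostname.

    Each line holds the padded hostname, the bar, and the numeric value.
    """
    if not loads:
        return ""
    name_width = max(len(host) for host in loads)
    lines = []
    for host in sorted(loads):
        bar = load_bar(loads[host], max_load, width)
        lines.append(f"{host:<{name_width}} [{bar}] {loads[host]:.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample values; real values would come from each node.
    print(render_loads({"hex01": 0.5, "hex02": 1.75, "hex03": 0.0}))
```

         Showing both the bar and the number keeps the output readable on plain
terminals and slow connections, which matches the "displays values" and "terminal
friendly" features listed for the custom monitors.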

        The following table depicts the feature list of the three aforementioned
cluster monitoring utilities.

Utility    Needs       Graphical   Fast        Displays   Displays    Terminal
           openMosix               operation   values     hostnames   friendly
mosmon     *           *
impmon     *                       *           *          *           *
Emonitor               *(text)     *           *          *           *

                         Table 1: Feature list of monitoring tools

      Table 1 clearly illustrates how our custom utilities exceed the original
monitoring tool in desired features.

                            Conclusions/Future Work
         Administration of an all-purpose, highly abstract system has proven to be a
difficult task. Highly specialized systems are abundant, but not enough effort is
being put into providing support for the layperson. The HpCDC holds that this
area deserves more attention.

        Finally, the tools created are far from satisfying all needs of the cluster. For
example, imp_dsh is a capable distributed shell utility; however, it is limited because
its execution is serial in nature. Effort is currently being put into creating a
parallel version, which will become the base for more complex tools.

                                    References

BARAK 98: Barak A., La'adan O., The MOSIX Multicomputer Operating System
   for High Performance Cluster Computing, Journal of Future Generation
   Computer Systems, Vol. 13, No. 4-5, pp. 361-372, March 1998, available at

BURNS 94: Burns, G., Daoud, R., Vaigl, J., LAM: An Open Cluster Environment
   for MPI, Proceedings of Supercomputing Symposium, pp. 379-386, 1994,
   available at, last
   visited April 14, 2004.

CLUSTERSSH 03: ClusterSSH Cluster Admin Via SSH, available at, last visited April 14, 2004.

FREED 03: Freed, D. Aaron, The KludgeKollection, September 2003, available at, last visited April 14, 2004.

HpCDC 03: Spitzer, C., Turner, E., High-performance Computing Development
   Center, August 2003, available at, last visited
   April 14, 2004.

KALÉ 99: Kalé, L., Skeel, R., Bhandarkar, et-al, NAMD2: Greater scalability for
   parallel molecular dynamics, Journal of Computational Physics, 151:283-312,

MAASK 03: Maya, Anu, Asmita, Snehal, Krushna (MAASK), The MigShm, March
   4, 2003, available at,
   last visited April 14, 2004.

MPI-2 97: MPI-2 Extensions, University of Tennessee, 1997, available at, last visited April 14, 2004.

NSF 03: Moore, M., Balasubramanya, M., RUI-Development of a Cluster System to
    Support Computational Science Research, NSF Grant 0321218, available at, last visited April 14,

OPENMOSIX 97: openMosix, openMosix load monitor, August 28, 1997, Linux
   man pages.

RECHENBURG 03: Rechenburg, Matt, openMosix, a Linux Kernel Extension for
   Single System Image Clustering, 10th International Linux System Conference,
   October 14, 2003, Saarbrücken, Germany, available at http://www.linux-, last visited April 14, 2004.

TURNER 03: Turner, Evan Lee, ECluster: A Serial Program Parallel Subsystem
   (SPPS) Designed for Beowulf Computer Clusters, Proceedings of the NCUR,

