
Installation of a Condor Supercomputing Pool


Brain Campbell
Bryce Carmichael
Unquiea Wade

Dr. Eric Akers
The International Polar Year was designed to study and better understand the current state of the climatic changes to the world's ice sheets. For the last few decades, automated weather stations and satellites in geosynchronous orbit have created data sets. Today, large amounts of these data remain unexplored due to insufficient funding and the scarcity of resources. For this reason, the Polar Grid concept was proposed to delegate the analysis of the existing data sets.

The goal of the Elizabeth City State University Polar Grid team was to construct a model network to serve as the base for a supercomputing pool. The supercomputing pool will be constructed on the university's campus and linked to the overall Polar Grid system. Numerous software packages and protocols currently in use at other institutions around the nation were researched. From these, the Condor software, created and developed at the University of Wisconsin, was chosen for its ease of use and its capacity for expansion.

An eighteen-node computing pool was constructed and tested within Dixon Hall's second-floor lab using Condor. This pool was composed of seventeen desktops running on a Windows NT platform, with the pool's master housed in Lane Hall acting as a Linux-based server.
The goals were to:
• Utilize all of our available computing resources.
• Gain knowledge about supercomputing.
• Set up a pool of computers that can be accessed by Polar Grid.
• Familiarize team members with job submission and the overall operation of the Condor pool.
What is Supercomputing?
Supercomputing is a term given to a system capable of processing at speeds much greater than commercially available machines. High-throughput computing is used to describe systems with intermediate processing abilities.
• Distributed computing utilizes a network of many computers, each accomplishing a portion of an overall task, to achieve a computational result much more quickly than with a single computer.

• Distributed computing also allows many users to interact and connect openly.

• Parallel processing is the simultaneous processing of the same task on two or more microprocessors in order to obtain faster results.

• The computer resources can include a single computer with multiple processors.

• Parallel processing allows closer communication between nodes, increasing efficiency.

• As the size of the network grows, communication takes up a greater part of the CPU's time.

• This can be limited by using more than one type of protocol in a system.
Condor is a specialized workload management
system for compute-intensive jobs. Like other full-featured
batch systems, Condor provides a job queueing mechanism,
scheduling policy, priority scheme, resource monitoring, and
resource management.

Beowulf is a design for high-performance parallel computing clusters on inexpensive personal computer hardware. A Beowulf cluster is a group of usually identical PC computers running a Free and Open Source Software (FOSS) Unix-like operating system, such as BSD, Linux, or Solaris.

BOINC is a software platform for volunteer computing and
desktop Grid computing. BOINC is designed to support
applications that have large computation requirements,
storage requirements, or both.
The Condor project was started in 1988. Condor was built from the results of the Remote Unix project and from the continuation of research in the area of Distributed Resource Management (DRM).

Condor was created at the University of Wisconsin-Madison (UW-Madison), and it was first installed as a production system in the UW-Madison Department of Computer Science.
• Versatility – capable of switching between distributed and parallel computing, accepts multiple programming languages for simple execution of jobs, and operates on multiple platforms.
• Availability – open-source software.
• Easy expansion – any number of nodes can be added to an existing pool.
• Cost efficiency – any CPU meeting the base requirements can be used efficiently.
Windows
• Condor for Windows requires Windows 2000 (or better) or Windows XP.
• 300 megabytes of free disk space is recommended. Significantly more disk space may be desired to run jobs with large data files.
• Condor for Windows will operate on either an NTFS or FAT file system. However, for security purposes, NTFS is preferred.

Unix
• The size requirements for the downloads currently vary from about 20 Mbytes (statically linked HP Unix on a PA-RISC) to more than 50 Mbytes (dynamically linked Irix on an SGI).
• In addition, you will need a lot of disk space in the local directory of any machines that are submitting jobs to Condor.

The Condor software can be accessed through its main website.

Condor can be downloaded for various platforms such as Solaris, Linux/Unix, Windows, and others.

Administrative and user manuals are also available on the website.
Installation – overseen through the Windows installation wizard.

Changes to the default configuration:

Pool master node – a Linux-based machine in Lane Hall; having a Linux-based master will allow the eventual use of the full array of Condor options.

Read & write access – parameters changed to include 10.*.*.* to allow feedback and access from different nodes.

Due to the use of the CERSER labs during class hours, each node is required to be idle for 15 minutes before it is available to perform tasks. If a task is interrupted, it will be restarted on a different machine if the original node is not freed within ten minutes.
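These policies correspond to a handful of entries in each node's Condor configuration file. The snippet below is only a minimal sketch of what such a configuration might look like: the host name is hypothetical, and the macro names follow the classic Condor configuration file, so exact names may differ between versions.

    ## Central manager: the Linux-based master housed in Lane Hall
    ## (host name is hypothetical)
    CONDOR_HOST = condor-master.lane.ecsu.edu

    ## Allow read and write access from the campus 10.*.*.* address range
    HOSTALLOW_READ  = 10.*
    HOSTALLOW_WRITE = 10.*

    ## Start jobs only after the keyboard has been idle for 15 minutes,
    ## so lab machines in use during class hours are left alone
    START = KeyboardIdle > (15 * $(MINUTE))

    ## Evict a running job as soon as a local user returns to the machine
    PREEMPT = KeyboardIdle < $(MINUTE)

The ten-minute grace period before an interrupted task is restarted elsewhere roughly corresponds to the vacate timeout in Condor's stock policy, which can be tuned if needed.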
Jobs can be submitted using any executable file format through the condor/bin directory.
Jobs are submitted through the Condor bin using the command condor_submit filename. The status of the nodes within the system can be checked using the command condor_status, which will bring up a listing of the current platform and availability of each node. Availability is signified by the one-word qualifiers in the fourth column.

Unclaimed: The node is available but has not been claimed for a task.

Claimed: The node is currently running a specified task.

Matched: The node has been matched with a task, but the claim has not yet completed.

Owner: The node has a local user demanding its attention.
After submission, a task can be traced through the pool using the condor_q command.

The results of the tasks can be seen within the output files created through the executable, or through the .log file that is created automatically for each job. A minimal example of this submission workflow is sketched below.
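As an illustration, a submit description file of the following form could be used for a simple test job; the file and executable names here are hypothetical.

    # hello.sub -- hypothetical submit description for a simple test job
    universe   = vanilla
    executable = hello.exe
    output     = hello.out
    error      = hello.err
    log        = hello.log
    queue

The job would then be submitted with condor_submit hello.sub, the state of the nodes checked with condor_status, and the job's progress followed with condor_q. Condor writes the program's standard output to hello.out and a record of the job's life cycle to hello.log.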
A Condor pool composed of 17 nodes running on the Windows NT platform has been established in the Dixon Hall laboratory, operating under a Linux-based master housed at the Lane Hall offices.

To date, simple tasks have been submitted using C++ code and have run successfully through the pool; a hypothetical example of such a task is sketched below.
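The following is a minimal sketch of the kind of simple C++ task that could be run through the pool; the program itself is hypothetical. It prints a greeting and the name of the node it ran on to standard output, which Condor captures in the job's output file.

    // hello.cpp -- hypothetical example of a simple Condor test job.
    #include <cstdlib>
    #include <iostream>

    int main() {
        // COMPUTERNAME is set on the Windows desktops; HOSTNAME is a
        // common fallback on Unix-like systems.
        const char* node = std::getenv("COMPUTERNAME");
        if (node == NULL) {
            node = std::getenv("HOSTNAME");
        }

        std::cout << "Hello from the Condor pool";
        if (node != NULL) {
            std::cout << " (executed on " << node << ")";
        }
        std::cout << std::endl;
        return 0;
    }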

Diagnostic assessment showed two CPUs unconnected to the network, as well as naming redundancies that hindered the installation of the Condor system.
• Installation of Condor was a success.
• Expansion of the cluster is easy and can be done efficiently with minimal cost.
• Management of and programming with Condor can be done at the undergraduate level and is encouraged.
• Familiarize more of the CERSER teams with the Condor software.
• Continue the expansion of the Condor pool.
• Link ECSU to the Polar Grid network.
• Encourage the development of programs to aid future CERSER research.
