					Distributed & Parallel Computing Cluster




                   Patrick McGuigan
                   mcguigan@cse.uta.edu




                                           2/12/04
DPCC Background

   NSF-funded Major Research Instrumentation
    (MRI) grant
   Goals
   Personnel
    –   PI
    –   Co-PIs
    –   Senior Personnel
    –   Systems Administrator

DPCC Goals

   Establish a regional Distributed and Parallel
    Computing Cluster at UTA (DPCC@UTA)
   An inter-departmental and inter-institutional facility
   Facilitate collaborative research that requires large-scale
    storage (terabytes to petabytes), high-speed access (gigabit
    or more), and massive processing (hundreds of processors)




DPCC Research Areas
   Data mining / KDD
     –   Association rules, graph mining, stream processing, etc.
   High Energy Physics
     –   Simulation, moving towards a regional Dø Center
   Dermatology/skin cancer
     –   Image database, lesion detection and monitoring
   Distributed computing
     –   Grid computing, PICO
   Networking
     –   Non-intrusive network performance evaluation
   Software Engineering
     –   Formal specification and verification
   Multimedia
     –   Video streaming, scene analysis
   Facilitate collaborative efforts that need high-performance computing



DPCC Personnel

   PI Dr. Chakravarthy
   Co-PIs
    –   Drs. Aslandogan, Das, Holder, Yu
   Senior Personnel
     – Paul Bergstresser, Kaushik De, Farhad
       Kamangar, David Kung, Mohan Kumar, David
       Levine, Jung-Hwan Oh, Gregely Zaruba
   Systems Administrator
    –   Patrick McGuigan

DPCC Components

 Establish a distributed-memory cluster
  (150+ processors)
 Establish a symmetric or shared-memory
  multiprocessor (SMP) system
 Establish large, shareable, high-speed
  storage (hundreds of terabytes)

DPCC Cluster as of 2/1/2004

   Located in 101 GACB
   Inauguration 2/23 as part of E-Week
   5 racks of equipment + UPS




   Photos of the cluster (scaled for presentation) [images not reproduced here]
DPCC Resources

   97 machines
    –   81 worker nodes
    –   2 interactive nodes
     –   10 IDE-based RAID servers
     –   4 nodes supporting the Fibre Channel SAN
   50+ TB storage
    –   4.5 TB in each IDE RAID
    –   5.2 TB in FC SAN

DPCC Resources (continued)

   1 Gb/s network interconnections
    –   core switch
    –   satellite switches
   1 Gb/s SAN network
   UPS




    DPCC Layout

[Diagram: cluster network layout. Worker nodes node1–node32 attach directly to the Foundry FastIron 800 core switch, along with master.dpcc.uta.edu and grid.dpcc.uta.edu (each with a campus uplink) and RAID servers raid3, raid6, raid8, raid9, and raid10. Nodes node33–node81 attach through Linksys 3512 satellite switches, each satellite group paired with an IDE RAID server (raid1, raid2, raid4, raid5, raid7; 6 TB each). A 100 Mb/s switch carries the GFS lockserver and gfsnode1–gfsnode3, which reach the ArcusII 5.2 TB array through a Brocade 3200 FC switch. Several core-switch ports remain open.]
DPCC Resource Details

   Worker nodes
    –   Dual Xeon processors
            32 machines @ 2.4 GHz
            49 machines @ 2.6 GHz
    –   2 GB RAM
    –   IDE Storage
            32 machines @ 60 GB
            49 machines @ 80 GB
     –   Red Hat Linux 7.3 (2.4.20 kernel)


DPCC Resource Details (cont.)

   RAID servers
    –   Dual Xeon processors (2.4 GHz)
    –   2 GB RAM
    –   4 RAID controllers
            2-port controller (qty 1): mirrored OS disks
            8-port controllers (qty 3): RAID 5 with hot spare
    –   24 × 250 GB disks
    –   2 × 40 GB disks
    –   Storage exported to worker nodes via NFS

DPCC Resource Details (cont.)

   FC SAN
    –   RAID 5 array
    –   42 × 142 GB FC disks
    –   FC switch
    –   3 GFS nodes
            Dual Xeon (2.4 GHz)
            2 GB RAM
            Global File System (GFS)
            Serve the SAN storage to the cluster via NFS
    –   1 GFS Lockserver


Using DPCC

   Two nodes available for interactive use
    –   master.dpcc.uta.edu
    –   grid.dpcc.uta.edu
   More nodes are likely to support other services
    (Web, DB access)
   Access through SSH (version 2 client)
    –   Freeware Windows clients are available (ssh.com)
     –   File transfers through SCP/SFTP (see the session sketch below)
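    A minimal session sketch (the user name and file names are placeholders, not real accounts):

     $ ssh youruser@master.dpcc.uta.edu                   # log in to an interactive node
     $ scp data.tar youruser@master.dpcc.uta.edu:~/       # upload a file to your home directory
     $ scp youruser@master.dpcc.uta.edu:~/results.txt .   # copy results back to your workstation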



Using DPCC (continued)

   User quotas are not yet implemented on home
    directories; be sensible in your usage.
   Large data sets will be stored on the RAIDs
    (requires coordination with the sys admin)
   All storage is visible to all nodes.




Getting Accounts
   Have your supervisor request an account
   The account will be created
   Bring your ID to 101 GACB to receive your password
     –   Keep password safe
   Login to any interactive machine
     –   master.dpcc.uta.edu
     –   grid.dpcc.uta.edu
   Use the yppasswd command to change your password (see the example below)
   If you forget your password
      –   See me in my office (bring your ID) and I will reset your password
      –   Call or e-mail me and I will reset your password to the original
          password
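    For example, to change your password after logging in to an interactive node (yppasswd prompts for the current password, then the new one):

     $ yppasswd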



User environment

   Default shell is bash
    –   Change with ypchsh
    –   Customize user environment using startup files
            .bash_profile (login session)
            .bashrc (non-login)
    –   Customize with statements like:
            export <variable>=<value>
            source <shell file>
     –   Much more information in the bash man page; a sample startup file follows
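    A sample ~/.bash_profile sketch; the MPICH path comes from the MPI slides later in this talk, and the editor setting is only an illustration:

     # ~/.bash_profile -- read by bash for login sessions
     if [ -f ~/.bashrc ]; then
         source ~/.bashrc                            # share settings with non-login shells
     fi
     export PATH=$PATH:/usr/local/mpich-1.2.5/bin    # make mpicc/mpirun easy to reach
     export EDITOR=vi                                # example personal preference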


Program development tools

   GCC 2.96
    –   C
    –   C++
    –   Java (gcj)
    –   Objective C
    –   Chill
   Java
     –   Sun J2SDK 1.4.2

Development tools (cont.)

   Python
    –   python = version 1.5.2
    –   python2 = version 2.2.2
   Perl
    –   Version 5.6.1
   Flex, Bison, gdb…
   If your favorite tool is not available, we’ll
    consider adding it!

Batch Queue System

   OpenPBS
    –   Server runs on master
    –   pbs_mom runs on worker nodes
    –   Scheduler runs on master
    –   Jobs can be submitted from any interactive node
     –   User commands (a quick workflow sketch follows this list)
            qsub – submit a job for execution
            qstat – determine status of job, queue, server
            qdel – delete a job from the queue
            qalter – modify attributes of a job
    –   Single queue (workq)
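    A quick end-to-end sketch of these commands (the script name and job ID are illustrative; the following slides cover each command in detail):

     $ qsub myjob.sh                  # submit; prints a job ID such as 12345.master.cluster
     $ qstat 12345.master.cluster     # check the job's status
     $ qdel 12345.master.cluster      # remove it if it is no longer needed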

PBS qsub

   qsub used to submit jobs to PBS
    –   A job is represented by a shell script
    –   Shell script can alter environment and proceed
        with execution
    –   Script may contain embedded PBS directives
    –   Script is responsible for starting parallel jobs (not
        PBS)



Hello World
[mcguigan@master pbs_examples]$ cat helloworld
echo Hello World from $HOSTNAME

[mcguigan@master pbs_examples]$ qsub helloworld
15795.master.cluster

[mcguigan@master pbs_examples]$ ls
helloworld helloworld.e15795 helloworld.o15795

[mcguigan@master pbs_examples]$ more helloworld.o15795
Hello World from node1.cluster




Hello World (continued)

   The job ID is returned by qsub
   Default attributes allow the job to run with:
    –   1 Node
    –   1 CPU
    –   36:00:00 CPU time
   Standard output and standard error streams are
    returned as files (<jobname>.o<number>, <jobname>.e<number>)


Hello World (continued)

   Environment of job
    –   Defaults to the login shell (override with #! or the -S switch)
    –   The login environment variables, plus PBS additions:
            PBS_O_HOST
            PBS_O_QUEUE
            PBS_O_WORKDIR
            PBS_ENVIRONMENT
            PBS_JOBID
            PBS_JOBNAME
            PBS_NODEFILE
            PBS_QUEUE
    –   Additional environment variables may be transferred using the “-v”
        switch (see the example below)
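    For instance, to hand extra variables to a job at submission time (the variable name, path, and script are made up for illustration):

     $ qsub -v DATASET=/data11/mydata myjob.sh   # myjob.sh can then read $DATASET
     $ qsub -V myjob.sh                          # or pass the entire submission environment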


PBS Environment Variables
   PBS_ENVIRONMENT
   PBS_JOBCOOKIE
   PBS_JOBID
   PBS_JOBNAME
   PBS_MOMPORT
   PBS_NODENUM
   PBS_O_HOME
   PBS_O_HOST
   PBS_O_LANG
   PBS_O_LOGNAME
   PBS_O_MAIL
   PBS_O_PATH
   PBS_O_QUEUE
   PBS_O_SHELL
   PBS_O_WORKDIR
   PBS_QUEUE (e.g. workq)
   PBS_TASKNUM (e.g. 1)


qsub options
   Output streams:
     –   -e (error output path)
     –   -o (standard output path)
     –   -j (join error + output as either output or error)
   Mail options
     –   -m [aben] when to mail (abort, begin, end, none)
     –   -M who to mail
   Name of job
      –   -N job name (max 15 printable characters; the first must be alphabetic)
   Which queue to submit job to
     –   -q [name] Unimportant for now
   Environment variables
     –   -v pass specific variables
     –   -V pass all environment variables of qsub to job
    Additional attributes
     –   -W specify additional attributes, e.g. job dependencies


Qsub options (continued)

   -l switch used to specify needed resources
    –   Number of nodes
            nodes = x
    –   Number of processors
            ncpus = x
    –   CPU time
            cput=hh:mm:ss
    –   Walltime
            walltime=hh:mm:ss
   See man page for pbs_resources

Hello World

   qsub -l nodes=1 -l ncpus=1 -l cput=36:00:00 -N
    helloworld -m a -q workq helloworld
   Options can be included in script:
    #PBS -l nodes=1
    #PBS -l ncpus=1
    #PBS -m a
    #PBS -N helloworld2
    #PBS -l cput=36:00:00

    echo Hello World from $HOSTNAME

qstat

   Used to determine status of jobs, queues, server
   $ qstat
   $ qstat <job id>
   Switches
    –   -u <user> list jobs of user
    –   -f provides extra output
    –   -n provides nodes given to job
    –   -q status of the queue
    –   -i show idle jobs


qdel & qalter

   qdel used to remove a job from a queue
    –   qdel <job ID>
   qalter used to alter the attributes of a currently
    queued job
    –   qalter [attributes] <job id> (options similar to qsub; see the examples below)
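    For example, using the job ID returned by the earlier Hello World submission (the new CPU-time limit is illustrative):

     $ qalter -l cput=48:00:00 15795.master.cluster   # raise the job's CPU-time limit
     $ qdel 15795.master.cluster                      # or remove the job entirely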




Processing on a worker node

   All RAID storage is visible to all nodes
     –   /dataxy, where x is the RAID ID and y is the volume number (1-3)
     –   /gfsx, where x is the GFS volume number (1-3)
   Local storage on each worker node
    –   /scratch
   Data-intensive applications should, when possible, copy
    input data to /scratch for processing and copy results
    back to RAID storage (see the sketch below)
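    A sketch of this staging pattern as a PBS script; the input path under /data11 and the program name are hypothetical:

     #!/bin/sh
     #PBS -l nodes=1
     #PBS -l cput=04:00:00
     #PBS -N stage_example

     SCRATCH=/scratch/$PBS_JOBID
     mkdir -p $SCRATCH                                # private work area on the worker node
     cp /data11/myproject/input.dat $SCRATCH          # stage input from RAID storage
     cd $SCRATCH

     ~/bin/myprogram input.dat > output.dat           # hypothetical application

     cp output.dat /data11/myproject/                 # copy results back to RAID storage
     rm -rf $SCRATCH                                  # clean up local scratch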



Parallel Processing

   MPI installed on interactive + worker nodes
    –   MPICH 1.2.5
    –   Path: /usr/local/mpich-1.2.5
   Asking for multiple processors
    –   -l nodes=x
    –   -l ncpus=2x




Parallel Processing (continued)

   PBS node file created when job executes
   Available to job via $PBS_NODEFILE
   Used to start processes on remote nodes
    –   mpirun
    –   rsh




Using node file (example job)
#!/bin/sh
#PBS -m n
#PBS -l nodes=3:ppn=2
#PBS -l walltime=00:30:00
#PBS -j oe
#PBS -o helloworld.out
#PBS -N helloword_mpi
NN=`cat $PBS_NODEFILE | wc -l`
echo "Processors received = "$NN

echo "script running on host `hostname`"
cd $PBS_O_WORKDIR
echo

echo "PBS NODE FILE"
cat $PBS_NODEFILE
echo

/usr/local/mpich-1.2.5/bin/mpirun -machinefile $PBS_NODEFILE -np $NN ~/mpi-example/helloworld
MPI

   Shared Memory vs. Message Passing
   MPI
     –   A C-based library that allows programs to communicate
     –   Each cooperating process runs the same program
         image
     –   Different processes can do different computations based on
         the notion of rank
    –   MPI primitives allow for construction of more sophisticated
        synchronization mechanisms (barrier, mutex)



helloworld.c
#include   <stdio.h>
#include   <unistd.h>
#include   <string.h>
#include   "mpi.h"

int main( argc, argv )
     int argc;
     char **argv;
{
  int rank, size;
  char host[256];
  int val;

 val = gethostname(host,255);
 if ( val != 0 ){
   strcpy(host,"UNKNOWN");
 }

 MPI_Init( &argc, &argv );
 MPI_Comm_size( MPI_COMM_WORLD, &size );
 MPI_Comm_rank( MPI_COMM_WORLD, &rank );
 printf( "Hello world from node %s: process %d of %d\n", host, rank, size );
 MPI_Finalize();
 return 0;
}
Using MPI programs

   Compiling
     –   $ /usr/local/mpich-1.2.5/bin/mpicc -o helloworld helloworld.c
   Executing
    –   $ /usr/local/mpich-1.2.5/bin/mpirun <options> \
        helloworld
    –   Common options:
            -np number of processes to create
            -machinefile list of nodes to run on


Resources for MPI

   http://www-hep.uta.edu/~mcguigan/dpcc/mpi
    –   MPI documentation
   http://www-unix.mcs.anl.gov/mpi/indexold.html
    –   Links to various tutorials
   Parallel programming course





				