					      Kickstart tutorial
             for Sciblade Cluster Users

                   19 Oct, 2010

High Performance Cluster Computing Centre (HPCCC)
                Faculty of Science
            Hong Kong Baptist University
               Outline
•   Hardware configurations
•   Recent Software Installed
•   Basic Login and job submission
    procedure
•   Parallel Program Examples
•   Policy for using
    sciblade.sci.hkbu.edu.hk
•   Acknowledgement
http://www.sci.hkbu.edu.hk/hpccc/sciblade

                                            2
    Latest Cluster
Hardware configurations
         Cluster Hardware
This 256-node PC cluster (sciblade) consists of:
•   Master node x 2
•   IO nodes x 3 (storage)
•   Compute nodes x 256
•   Blade Chassis x 16
•   Management network
•   Interconnect fabric
•   1U console & KVM switch
• Emerson Liebert Nxa 120kVA UPS


                                                  4
   Sciblade Cluster




A 256-node cluster supported by funding from the RGC

                                               5
    Hardware Configuration
• Master Node
  – Dell PE1950, 2x Xeon E5450 3.0GHz (Quad Core)
  – 16GB RAM, 73GB x 2 SAS drive
• IO nodes (Storage)
  – Dell PE2950, 2x Xeon E5450 3.0GHz (Quad Core)
  – 16GB RAM, 73GB x 2 SAS drive
  – 3TB storage Dell PE MD3000
• Compute nodes x 256, each:
  – Dell PE M600 blade server w/ Infiniband network
  – 2x Xeon E5430 2.66GHz (Quad Core)
  – 16GB RAM, 73GB SAS drive


                                                      6
    Hardware Configuration
• Blade Chassis x 16
  – Dell PE M1000e
  – Each hosts 16 blade servers
• Management Network
  – Dell PowerConnect 48 (Gigabit Ethernet) x 6
• Interconnect fabric
  – Qlogic SilverStorm 9120 switch
• Console and KVM switch
  – Dell AS-180 KVM
  – Dell 17FP Rack console
• Emerson Liebert Nxa 120kVA UPS
                                                  7
           Software List
• Operating System
  – ROCKS 5.1 Cluster OS
  – CentOS 5.3 kernel 2.6.18
• Job Management System
  – Portable Batch System
  – MAUI scheduler
• Compilers, Languages
  – Intel Fortran/C/C++ Compiler for Linux V11
  – GNU 4.1.2/4.4.0 Fortran/C/C++ Compiler


                                                 8
          Software List
• Message Passing Interface (MPI)
  Libraries
  – MVAPICH 1.1
  – MVAPICH2 1.2
  – OPEN MPI 1.3.2
• Mathematic libraries
  – ATLAS 3.8.3
  – FFTW 2.1.5/3.2.1
  – SPRNG 2.0a(C/Fortran) /4.0(C++/Fortran)
  – ScaLAPACK 1.8.0
                                              9
             Software List
• Molecular Dynamics & Quantum Chemistry
  –   Gamess 2009R1
  –   Gromacs 4.0.7
  –   LAMMPS
  –   Siesta 3.0b
  –   GPAW 0.7.2
• Third-party Applications
  –   MATLAB 2008b with pmatlab
  –   Lumerical FDTD
  –   TAU 2.18.2, Visit 1.11.2
  –   Xmgrace 5.1.22

                                           10
            Software List
• Queuing system
  – Torque/PBS
  – Maui scheduler
• Editors
  – vi
  – emacs




                            11
              Hostnames
• Master node
  – External : sciblade.sci.hkbu.edu.hk
  – Internal : frontend-0
• IO nodes (storage)
  – pvfs2-io-0-0, pvfs2-io-0-1, pvfs2-io-0-2
• Compute nodes
  – compute-0-0.local, …, compute-0-255.local




                                                12
Basic Login and Job Submission
           Procedure
              Basic login
• Remote login to the master node
• Terminal login
  – using secure shell
    ssh -l username sciblade.sci.hkbu.edu.hk
• Graphical login
  – PuTTY & vncviewer e.g.
    [username@sciblade]$ vncserver
     New ‘sciblade.sci.hkbu.edu.hk:3
     (username)' desktop is
     sciblade.sci.hkbu.edu.hk:3
  It means that your session will run on display 3.


                                                      14
          Graphical login
• Using PuTTY to set up a secure
  connection: Host Name = sciblade.sci.hkbu.edu.hk




                                                   15
    Graphical login (con’t)
• ssh protocol version




                              16
    Graphical login (con’t)
• Port to forward = 5900 + display number
  (i.e. 5903 for display 3 in this case); an
  equivalent command-line tunnel is sketched below
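If you use a command-line SSH client instead of PuTTY, the same tunnel can be set up roughly as follows (a minimal sketch, assuming display 3, i.e. port 5903; adjust to your own display number):

   # forward local port 5903 to display 3 of the VNC server on sciblade
   ssh -L 5903:localhost:5903 -l username sciblade.sci.hkbu.edu.hk
   # then point VNC Viewer on your PC at localhost:3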




                                              17
      Graphical login (con’t)
• Next, click Open, and log in to sciblade
• Finally, run VNC Viewer on your PC, and enter
  "localhost:3" (3 is the display number)




• You should terminate your VNC session after you
  have finished your work. To terminate the VNC
  session running on sciblade, run the command
  [username@sciblade]$ vncserver -kill :3



                                                    18
                           Linux commands
• Both master and compute nodes are installed with
  Linux
• Frequently used Linux commands in the PC cluster
  http://www.sci.hkbu.edu.hk/hpccc/sciblade/faq_sciblade.php
    cp        cp f1 f2 dir1             copy files f1 and f2 into directory dir1

    mv        mv f1 dir1                move/rename file f1 into dir1

    tar       tar xzvf abc.tar.gz       uncompress and untar a .tar.gz archive

    tar       tar czvf abc.tar.gz abc   create an archive of abc with gzip compression

    cat       cat f1 f2                 print the contents of files f1 and f2

    diff      diff f1 f2                compare the text of two files

    grep      grep student *            search all files for the word student

    history   history 50                list the last 50 commands stored in the shell

    kill      kill -9 2036              terminate the process with PID 2036

    man       man tar                   display the on-line manual page for tar

    nohup     nohup runmatlab a         run matlab (a.m) without hanging up after logout

    ps        ps -ef                    list all processes running on the system

    sort      sort -r -n studno         sort studno in reverse numerical order


                                                                                      19
 ROCKS specific commands
• ROCKS provides the following commands for
  users to run programs on all compute nodes
  (sample invocations follow the list), e.g.
   – cluster-fork
     • Run program in all compute nodes
  – cluster-fork ps
     • Check user process in each compute node
  – cluster-kill
     • Kill user processes on all nodes at one time
  – tentakel
     • Similar to cluster-fork but runs faster
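For example (a minimal sketch; exact behaviour and output depend on the ROCKS release installed):

   cluster-fork uptime       # run uptime on every compute node, one node after another
   cluster-fork ps           # list processes on each compute node, as on this slide
   tentakel uptime           # similar, but dispatched in parallel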

                                                 20
                  Ganglia
Web-based management and monitoring
• http://sciblade.sci.hkbu.edu.hk/ganglia




                                            21
Job Submission Procedures
    Job Submission Procedure
• Prepare and compile a program, e.g.
     mpicc –o hello hello.c
• Prepare a job submission script, e.g.
     Qhello.pbs
• Submit the job using qsub. e.g.
     qsub Qhello.pbs
• Note the JobID returned.
• Monitor with showq or qstat
• Examine the error and output file. e.g.
     hello.oJobID, hello.eJobID
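Putting these steps together, a typical session might look like the following sketch (the job ID 15238 is only an example; use the ID that qsub actually returns):

   mpicc -o hello hello.c            # compile the MPI program
   qsub Qhello.pbs                   # submit; prints e.g. 15238.sciblade2.sci.hkbu.edu.hk
   qstat 15238                       # or showq, to monitor the job
   cat hello.o15238 hello.e15238     # inspect the output and error files when it finishes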


                                            23
       Sample Program: hello.c
#include <stdio.h>
#include "mpi.h"                          // MPI header file

int main(int argc, char **argv)
{
   int nproc, myrank, ierr;

    ierr = MPI_Init(&argc, &argv);          // MPI initialization

    // Get number of MPI processes
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    // Get process id for this processor
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    printf("Hello World!! I'm process %d of %d\n", myrank, nproc);

    ierr = MPI_Finalize();                  // Terminate all MPI processes
    return 0;
}

                                                                           24
Compiling & Running MPI Programs
• Using mvapich 1.1
1. Set the path; at the command prompt, type:
    export PATH=/u1/local/mvapich1/bin:$PATH
     (add or uncomment this line in your ~/.bashrc so it is set at every login)
2. Compile using mpicc, mpiCC, mpif77 or mpif90, e.g.
    mpicc -o hello hello.c
3. Prepare a hostfile (e.g. machines) listing the compute
   nodes to use, one per line:
     compute-0-0
     compute-0-1
     compute-0-2
     compute-0-3
4. Run the program with the desired number of processes
   (sample output is shown below):
     mpirun -np 4 -machinefile machines ./hello
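With 4 processes, hello should print something like the lines below (the ordering of the lines is not deterministic):

     Hello World!! I'm process 0 of 4
     Hello World!! I'm process 2 of 4
     Hello World!! I'm process 1 of 4
     Hello World!! I'm process 3 of 4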


                                                        25
Prepare parallel job script, Qhello.pbs
#!/bin/sh
### Job name
#PBS -N hello
### Declare job non-rerunable
#PBS -r n
#PBS -l nodes=10:ppn=2
#PBS -l walltime=00:08:00
# This job's working directory
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo `cat $PBS_NODEFILE`
# Define number of processors
NPROCS=`wc -l < $PBS_NODEFILE`
echo This job has allocated $NPROCS nodes
# Run the parallel MPI executable "hello"
/u1/local/mvapich1/bin/mpirun -v -machinefile $PBS_NODEFILE -np $NPROCS ./hello




                                                                                  26
 Job submission and monitoring
• Submit the job
       qsub Qhello.pbs
• Note the jobID. e.g.
       15238.sciblade2.sci.hkbu.edu.hk
• Monitor by qstat, e.g. qstat 15238
  Job id                    Name             User            Time Use S Queue
  ------------------------- ---------------- --------------- -------- - -----
  15238.sciblade2          hello            morris                 0 R default




                                                                                 27
                       Job monitoring
 • Show the status of submitted jobs
           showq
  13896                  dhhe       Running     16    INFINITY   Mon May   3 04:48:25
  14402                  dhhe       Running     16    INFINITY   Wed May   5 23:46:09
  14403                  dhhe       Running     16    INFINITY   Wed May   5 23:47:07

      67 Active Jobs    2012 of 2024 Processors Active (99.41%)
                         253 of 253 Nodes Active       (100.00%)

  IDLE JOBS----------------------
  JOBNAME            USERNAME         STATE   PROC     WCLIMIT              QUEUETIME

  0 Idle Jobs

  BLOCKED JOBS----------------
  JOBNAME            USERNAME         STATE   PROC     WCLIMIT              QUEUETIME

  14951                    ggl         Idle     32 4:00:00:00    Mon May 10 00:55:19
  15011                 justin         Idle     32 7:00:00:00    Mon May 10 15:48:36
  15098                 hkbu09         Idle     50 33:08:00:00   Tue May 11 11:46:45



• Delete jobID by qdel, e.g.
       qdel 15238
                                                                                        28
Assorted Program Examples
         Example codes
• Updated example codes have been
  stored in /u1/local/share/examples/
• All codes are also packed in a single archive:
  /u1/local/share/examples.tar.gz
• Unzip and Untar using
     tar xzvf examples.tar.gz
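For example, to work on a private copy in your home directory (a minimal sketch; the extracted directories are listed on the following slides):

     cp /u1/local/share/examples.tar.gz ~
     cd ~
     tar xzvf examples.tar.gz     # unpacks the example directories (ring, prime, sorting, mcPi, etc.)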




                                        30
Example: Ring
(the slide shows a ring of four processes: 0 -> 1 -> 2 -> 3 -> 0)
ring/ring.c
ring/Makefile
ring/machines

Compile the program by the command: make
Run the program in parallel by:
mpirun -np 4 -machinefile machines ./ring < in

Example: Prime
prime/prime.c
prime/prime.f90
prime/primeParallel.c
prime/Makefile
prime/machines

Compile by the command: make
Run the serial program by: ./primeC or ./primeF
Run the parallel program by:
mpirun -np 4 -machinefile machines ./primeMPI

Example: Sorting
sorting/qsort.c
sorting/bubblesort.c
sorting/script.sh
sorting/qsort
sorting/bubblesort

Submit job to the PBS queuing system by: qsub script.sh

Example: mcPi
mcPi/mcPi.c
mcPi/mc-Pi-mpi.c
mcPi/Makefile
mcPi/QmcPi.pbs

Compile by the command: make
Run the serial program by: ./mcPi ##
Submit job to the PBS queuing system by: qsub QmcPi.pbs
                                                                      31
Example 1: OpenMP

 /u1/local/share/examples/omp
                  OpenMP
• The OpenMP Application Program Interface
  (API) supports multi-platform shared-memory
  parallel programming in C/C++ and Fortran on
  all architectures, including Unix platforms and
  Windows NT platforms.
• Jointly defined by a group of major computer
  hardware and software vendors.
• OpenMP is a portable, scalable model that
  gives shared-memory parallel programmers a
  simple and flexible interface for developing
  parallel applications for platforms ranging from
  the desktop to the supercomputer.

                                                     33
     OpenMP compiler choice
• gcc 4.4 or above
  – compile with -fopenmp
• Intel 10.1 or above
  – compile with -Qopenmp on Windows
  – compile with -openmp on Linux
• PGI compiler
  – compile with -mp
• Absoft Pro Fortran
  – compile with -openmp

                                       34
       Sample OpenMP example
#include <omp.h>
#include <stdio.h>
int main() {
   #pragma omp parallel
   printf("Hello from thread %d, nthreads %d\n",
   omp_get_thread_num(),
   omp_get_num_threads());
}
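Assuming the code above is saved as omp_hello.c (the executable name used in omp.pbs later in this section), it can be compiled and run with GNU gcc roughly as follows:

   gcc -fopenmp -o omp_hello omp_hello.c    # GNU 4.4 or above
   export OMP_NUM_THREADS=4                 # choose the number of threads
   ./omp_hello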




                                                   35
                        serial-pi.c
#include <stdio.h>
static long num_steps = 10000000;
double step;
int main ()
{ int i; double x, pi, sum = 0.0;
    step = 1.0/(double) num_steps;
    for (i=0;i< num_steps; i++){
          x = (i+0.5)*step;
          sum = sum + 4.0/(1.0+x*x);
    }
    pi = step * sum;
    printf("Est Pi= %f\n",pi);
}



                                       36
        OpenMP version: spmd-pi.c
#include <omp.h>
#include <stdio.h>
static long num_steps = 10000000;
double step;
#define NUM_THREADS 8
int main ()
{ int i, nthreads; double pi, sum[NUM_THREADS];
   step = 1.0/(double) num_steps;
   omp_set_num_threads(NUM_THREADS);
#pragma omp parallel
   {
     int i, id,nthrds;
     double x;
     id = omp_get_thread_num();
     nthrds = omp_get_num_threads();
     if (id == 0) nthreads = nthrds;
     for (i=id, sum[id]=0.0;i< num_steps; i=i+nthrds) {
          x = (i+0.5)*step;
          sum[id] += 4.0/(1.0+x*x);
     }
   }
   for(i=0, pi=0.0;i<nthreads;i++)
          pi += sum[i] * step;
   printf("Est Pi= %f using %d threads \n",pi,nthreads);
}

                                                           37
Submit parallel jobs to the Torque batch queue
 Prepare a job script, say omp.pbs, like the following:
   #!/bin/sh
   ### Job name
   #PBS -N OMP-spmd
   ### Declare job non-rerunable
   #PBS -r n
   ### Mail to user
   ##PBS -m ae
   ### Queue name (small, medium, long, verylong)
   ### Number of nodes
   #PBS -l nodes=1:ppn=8
   #PBS -l walltime=00:08:00
   cd $PBS_O_WORKDIR
   export OMP_NUM_THREADS=8
   ./omp_hello
   ./omp_test
   ./serial-pi
   ./omp-spmd-pi

 Submit it using qsub
   qsub omp.pbs

                                                         38
      Example 2: Siesta 3.0b
• Spanish Initiative for Electronic
  Simulations with Thousands of Atoms
• Performs electronic structure calculations
  and ab initio molecular dynamics
  simulations of molecules and solids
• Project website:
  http://www.icmab.es/siesta
• Example directory:
  /u1/local/share/examples/siesta/h2o

                                              39
 Siesta example input file h2o.fdf
• Input file: Flexible data format (FDF), e.g. h2o.fdf
  SystemName            Water molecule
  SystemLabel           h2o
  NumberOfAtoms         3
  NumberOfSpecies       2

  %block ChemicalSpeciesLabel
  1  8  O     # Species index, atomic number, species label
  2  1  H
  %endblock ChemicalSpeciesLabel

  AtomicCoordinatesFormat Ang
  %block AtomicCoordinatesAndAtomicSpecies
   0.000 0.000 0.000 1
   0.757 0.586 0.000 2
  -0.757 0.586 0.000 2
  %endblock AtomicCoordinatesAndAtomicSpecies


                                                         40
 Siesta sample pbs file h2o.pbs
 #!/bin/bash
 #PBS -N siesta-h2o
 #PBS -l nodes=8
 #PBS -l walltime=6:00:00
 #PBS -l pmem=512mb
 NCPU=`wc -l < $PBS_NODEFILE`
 cd $PBS_O_WORKDIR
 MPIPATH=/u1/local/mvapich2/bin
 ${MPIPATH}/mpirun_rsh -np ${NCPU} -hostfile ${PBS_NODEFILE} /u1/local/bin/siesta < h2o.fdf

• Submit the above h2o.pbs using qsub
   qsub h2o.pbs


                                                  41
       Example 3: pmatlab
• pMatlab was developed by MIT Lincoln
  Laboratory
• Installed with MATLAB 2008b
• Example directory:
  /u1/local/share/examples/pmatlab
• Startup.m : matlab startup file
• RUN.m : control file for running in
  compute nodes
• sample_app.m : main program
• Qpmatlab.pbs : submit script

                                        42
Pmatlab : idea of distributed matrix
• New data type: dmat
• Overloaded functions: zeros, ones, rand, with an
  additional parameter Map
• A Map tells pMatlab how and where a dmat must
  be distributed, using three components:
  – Grid, e.g. [2 3] means a 2 x 3 processor grid
  – Distribution:
     • block – contiguous blocks of data
     • cyclic – data are interleaved across processors
     • block-cyclic
  – Processor list, e.g. [0:Ncpus-1]


                                                       43
Pmatlab: examples of map grid




                                44
                                 RUN.m
% RUN.m is a generic script for running pMatlab scripts.
% Define number of processors to use
Ncpus = 4;
% Name of the script you want to run
mFile = 'sample_app';
% Define cpus.
% Empty implies run on host.
% cpus = {};
% Get path to PBS node file on ITC Linux cluster
pbs_path=getenv('PBS_NODEFILE');
% Specify machine names to run remotely.
cpus = textread(pbs_path,'%s')';
% Specify which machines to run on
% cpus = {'compute-0-0', 'compute-0-1', 'compute-0-2', 'compute-0-3'};
% Abort left over jobs
MPI_Abort;
pause(2.0);
% Delete left over MPI directory
MatMPI_Delete_all;
pause(2.0);
% Define global variables
global pMATLAB;
% Run the script.
['Running: ' mFile ' on ' num2str(Ncpus) ' cpus']
eval(MPI_Run(mFile, Ncpus, cpus));

                                                                         45
                             sample_app.m
N = 2^10; % NxN Matrix size.
M = 8;
format long;
% Turn parallelism on or off.
PARALLEL = 1; % Can be 1 or 0. OK to change.
% Create Maps.
mapX = 1; mapY = 1;
if (PARALLEL)
   % Initialize pMatlab.
   pMatlab_Init;
   Ncpus = pMATLAB.comm_size;
   my_rank = pMATLAB.my_rank;
   % Break up channels.
   mapX = map([1 Ncpus], {}, 0:Ncpus-1);
   mapY = map([1 Ncpus], {}, 0:Ncpus-1);
end

% Allocate data structures.
X1 = rand(N,M,mapX);
X2 = rand(N,M,mapX);
Y = zeros(N,M,mapY);
Z = zeros(1,M,mapY);
x1local = local(X1);
x2local = local(X2);
R = x1local .* x1local + x2local .* x2local;
R = R <= 1;
sumR = sum(R);
Z = put_local(Z,sumR);
A = agg(Z);
epi = sum(A)/M/N * 4.0

% Finalize the pMATLAB program
disp('SUCCESS');
if (PARALLEL)
   pMatlab_Finalize;
end

                                                                                46
                     Qpmatlab.pbs
#!/bin/sh
### Job name
#PBS -N pmatlab
### Declare job non-rerunable
#PBS -r n
### Output files
###PBS -e pmatlab.err
###PBS -o pmatlab.log
### Mail to user
#PBS -m ae
### Queue name (small, medium, long, verylong)
### Number of nodes (node property ev67 wanted)
#PBS -l nodes=4
#PBS -l walltime=00:20:00

# This job's working directory
###echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR

/u1/local/bin/matlab -nodisplay -r RUN
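As with the other examples, submit the script with qsub:

   qsub Qpmatlab.pbs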



                                                  47
    Policy for using
sciblade.sci.hkbu.edu.hk
                   Policy
1. Every user shall apply for his/her own computer user
   account to login to the master node of the PC cluster,
   sciblade.sci.hkbu.edu.hk.
2. The user must not share his/her account and
   password with other users.
3. Every user must deliver jobs to the PC cluster from
   the master node via the PBS job queuing system.
   Automatic dispatching of jobs using scripts or
   robots is not allowed.
4. Users are not allowed to login to the compute nodes.
5. Foreground jobs on the PC cluster are restricted to
   program testing and the time duration should not
   exceed 1 minute of CPU time per job.

                                                            50
           Policy (continued)
6. Any background jobs run on the master node or
   compute nodes are strictly prohibited and will be
   killed without prior notice.
7. The current restrictions of the job queuing system are
   as follows,
    – The maximum number of running jobs in the job queue is 8.
    – The maximum total number of CPU cores in use at any one
      time cannot exceed 512.
8. The restrictions in item 7 will be reviewed from time to
   time as the number of users and the computational
   demand grow.




                                                              51
 Good Practice in using sciblade
• Log out from the master node after use
• Delete unused files and compress
  temporary data
• Estimate the walltime of your jobs and
  request just enough walltime when
  submitting them
• Never run foreground jobs on the
  master node or the compute nodes
• Report abnormal behaviour

                                           52
          Acknowledgement
• When you make presentations or publish papers, we
  would appreciate it if you would kindly acknowledge
  the HPCCC by including:
  "This research was conducted using the resources of
  the High Performance Cluster Computing Centre,
  Hong Kong Baptist University, which receives funding
  from the Research Grants Council, University Grants
  Committee of the HKSAR and Hong Kong Baptist
  University."
• Use of Center resources constitutes an agreement to
  provide copies of any publication or news stories
  concerning research conducted using our systems
  and/or consulting services.
• Please send acknowledgement e-mail to
  hpccc@sci.hkbu.edu.hk. Thank you

                                                         53
Thank you!


  Questions?

				