Introduction to HPC at UNBC

The Enhanced High Performance Computing Center

     Dr. You Qin (Jean) Wang

       February 13, 2008
Summary of the presentation:
Who needs HPC?
What kind of software do we have?
What kind of hardware do we have?
How to access the HPC systems?
Parallel programming basics
Who needs HPC?
HPC Domains of Applications at UNBC:
 Atmospheric Science
 Environmental Science
 Computer Science
Who needs HPC?
 We use HPC to solve problems that can't be solved in a reasonable amount of time on a single desktop computer.
 Problems solved using HPC:
   Need a large amount of RAM
   Require a large number of CPUs
HPC Users Summary
On February 6, 2008:

 Total Users: 73
 Professors: 16
 Post-doctoral fellows: 7
 Ph.D. students: 5
 Master's students and others: 45
What kind of software do we have?
 MATLAB + Toolboxes
 NAG Fortran Library
 PGI Compilers
 Intel Compilers
What kind of software do we have?
 IDL – the ideal software for data analysis, visualization, and cross-platform application development
 ENVI – the premier software solution to quickly, easily, and accurately extract information from geospatial imagery
What kind of software do we have?
 MATLAB is a high-level technical computing
 language and interactive environment for
 algorithm development, data visualization, data
 analysis, and numeric computation.

 MATLAB Toolboxes:
 – Curve Fitting
 – Distributed Computing
 – Image Processing
 – Mapping
 – Neural Network
 – Statistics
What kind of software do we have?
 Figure: two images plotted using Tecplot by Dr. Jean Wang – pressure contour around a prolate body.
What kind of software do we have?
 Why use STATA?
 STATA is a complete, integrated statistical package that provides everything you need for data analysis, data management, and graphics.
What kind of software do we have?
 The NAG Fortran Library - the largest
 commercially available collection of numerical
 algorithms for Fortran today

 Calling the NAG Library:
 – Set environment variables before you run your job:

    export LM_LICENSE_FILE

    /opt/intel/fc/9.0/bin/ifort -r8 test.for -L /usr/local/fll6420dcl/lib/ -o test.exe
What kind of software do we have?

 FLUENT – Flow Modeling Software
What kind of hardware do we have?
 SGI Altix 3000 – 64 processor
 Linux Cluster – 128 processor
 File Server
 Windows Terminal Server
 10 Workstations in HPC Lab
 Geowall systems for visualization
SGI Altix 3000 –
 64 processors
  – Intel Itanium 2 (1.5 GHz)
  – 4 MB cache
 64 GB RAM
  – 1 GB/processor
 NUMAlink interconnect
  – 6.4 GB/s
  – fat tree topology
 10 GbE network connection
 SUSE Linux Enterprise Server 9
Linux Cluster –
 64 nodes (128 processors) + head node
  – AMD Opteron (2.1 GHz)
  – 144 GB RAM (2 GB/node + 16 GB on the head node)
 GigE interconnect
  – two Nortel switches
  – network access via the head node
 Operating system
  – SUSE 9.3
  – 1.7 TB of local storage on the head node for software and local copies
File Server
 SGI Altix 350
 – 4 processors, 8 GB RAM
 SGI TP9100
 – 6 TB storage
 – RAID 5 with hot spare
 10 GbE network connection
  Windows Terminal Server –
Dell PowerEdge 6800
– 4 processors (Intel Xeon, 2.4 GHz)
– 8 GB RAM
Local RAID for system volume
– 600 GB volume
Accessible from anywhere.
Runs Windows applications.
  Workstations at HPC Lab
Dell Precision 470
– 2 Intel Xeon processors (3.2 GHz)
– 2 GB RAM
– NVIDIA Quadro FX 3400 / 256 MB
– 2 Dell 20” LCD displays
GeoWall Systems
 Two systems
 Both have a 2-processor server and 1.5 TB RAID 5 storage
 GeoWall Room (8-111) has a rear-projected display
 The portable unit has a front-projected display
How to access the HPC systems?
From Windows to Windows

 From: Start -> All Programs -> Accessories -> Communications -> Remote Desktop


  Log on to: UNI
How to access the HPC systems?

From Linux to Windows
 rdesktop -a15 -g 1280x1024 pg-hpc-ts-

   Log on to: UNI
How to access the HPC systems?
From Linux to Linux:
ssh -X
 – [pg-hpc-clnode-head ~]> ssh -X pg-hpc-
 – [pg-hpc-clnode-63 ~]>
How to access the HPC systems?
From Windows to Linux:
Download software “Xmanager 2.0”
How to access the HPC systems?
How to mount /hpc file system?
Under windows:
 – Simply right-click on "My Computer", select "Map Network Drive", and then choose \\\LOGIN
 – replacing LOGIN with your UNI login.
How to access the HPC systems?
How to mount /hpc file system?
On a Linux machine:
 – smbmount //pg-hpc-fs- MOUNTPOINT -o
 – replacing MOUNTPOINT with the name
   of a directory that the system will be
   mounted to.
Reminder to HPC users:
 Don’t run applications directly on the cluster head node. Always remember to switch to node 63 or 64 first, then run your applications, such as MATLAB, IDL, etc.
 Submit your jobs via PBS on both Columbia and Andrei.
What is PBS?
 Portable Batch System (or simply
 PBS) is the name of computer
 software that performs job
 scheduling. Its primary task is to
 allocate computational tasks, i.e.,
 batch jobs, among the available
 computing resources.
 If you want to know more about
 PBS, please contact Dr. Jean Wang.
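As a sketch, a minimal PBS job script might look like the following; the job name, resource requests, and program name are illustrative assumptions, not site defaults (check with Dr. Jean Wang before submitting):

```shell
#!/bin/bash
# Minimal PBS job script (illustrative values only).
#PBS -N myjob                 # job name (placeholder)
#PBS -l nodes=1:ppn=2         # request 1 node, 2 processors
#PBS -l walltime=01:00:00     # wall-clock limit of 1 hour

cd $PBS_O_WORKDIR             # start in the directory the job was submitted from
./myprogram > myprogram.out   # 'myprogram' is a placeholder for your executable
```

Submit the script with `qsub myjob.pbs` and check its status with `qstat`.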
Parallel programming Basics
 What is parallelism?
 Less fish vs. more fish
What is Parallelism?
 Parallelism is the use of multiple
 processors to solve a problem, and in
 particular, the use of multiple
 processors working concurrently on
 different parts of a problem.
 The different parts could be different
 tasks, or the same task on different
 pieces of the problem’s data.
Kinds of Parallelism
 Shared Memory: Auto Parallel,
 OpenMP, MPI

 Distributed Memory – MPI
The Jigsaw Puzzle Analogy
 Serial computing: suppose you want to do a jigsaw puzzle that has 1000 pieces. Let’s say that you can put the puzzle together in an hour.
 Shared memory parallelism: if Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Shared Memory Parallelism
 Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.
 And from time to time you will have to work together (communicate) at the interface between his half and yours.
 The speedup will be nearly 2-to-1: the two of you will take 35 minutes instead of the ideal 30.
The More the Merrier?
 Now let’s put Mike and Sam on the other two
 sides of the table. Each of you can work on a part
 of the puzzle, but there will be a lot more
 contention for the shared resource (the pile of
 puzzle pieces) and a lot more communication at
 the interfaces.
 So you will get noticeably less than a 4-to-1
 speedup, but you will still have an
 improvement, say the four of you can
 get it done in 20 min instead of an hour.
Diminishing Returns
 If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource, and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.
 Adding too many workers onto a shared resource eventually yields diminishing returns.
Distributed Parallelism
 Now let’s set up two tables, and let’s put you at one of them and Tom at the other. Let’s put half of the puzzle pieces on your table and the other half on Tom’s.
 Now you all can work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.
Distributed Parallelism
 Processors are independent of each other.
 All data are private.
 Processes communicate by passing messages.
 The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).
Parallel Overhead
 Parallelism isn’t free. The compiler and the hardware have to do a lot of work to make parallelism happen – and this work takes time. This time is called parallel overhead.

 The overhead typically includes:
   Managing the multiple processes;
   Communication between processes;
   Synchronization: everyone stops until
   everyone is ready.
OpenMP and MPI programming paradigms

 MPI… parallelizing data
 OpenMP… parallelizing tasks
 Data parallelism: each translator (fluent in both Spanish and French) takes one volume – one translates Volume 1, the other Volume 2.
 Task parallelism: each translator takes both volumes and translates them into one language – one into Spanish, the other into French.
Compilers on ACT cluster (andrei):

 GNU – C/C++, g77
 PGI – C/C++, f77, f90

Compilers on Altix 3000 (columbia):
 Intel – C/C++, Fortran
 GNU – C/C++, g77
PGI Compilers (cluster)
PGI Compiler:
  For 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH
  For 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-
Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
C: pgcc, mpicc
C++: pgCC, mpicxx
Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and
  a Fortran code mpihello.f:

On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
Compilers for MPI codes
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

Which mpirun?
  [pg-hpc-clnode-head ~]> which mpirun

  [pg-hpc-altix-01 ~]> which mpirun
  /opt/mpich/ch-p4/bin/mpirun -np 4 …
More than one “mpirun” – SGI MPI and MPICH
Intel Compilers
How to compile a parallel code
MPI codes:

  ifort -options myMPIcode.f -lmpi
  icc -options myMPIcode.c -lmpi

Code with OpenMP directives:

  ifort -options -openmp myOpenMpcode.f
  icc -options -openmp myOpenMpcode.c

Automatic Parallelization:

  ifort -parallel mycode.f
  icc -parallel mycode.c
More About Compilers
On columbia:
 man ifort -M /opt/intel/fc/9.0/man
 man icc -M /opt/intel/cc/9.0/man

On andrei:
 man pgCC -M
 man pgf90 -M
Getting started with OpenMP
 Key points

 – Shared memory multiprocessor nodes
 – Parallel programming using compiler directives
 – Fortran 77/90/95 and C/C++
C OpenMP compiler directive
    Parallel regions in C …
#include <stdio.h>
int main (void)
{
#pragma omp parallel
    printf ("Hello, world\n");
    return 0;
}
Fortran OpenMP compiler directive

   Parallel regions in Fortran …

         program hello
   c$omp parallel
         print*, 'Hello, world'
   c$omp end parallel
         end
Compiling and Running
 Intel (-openmp) or SGI (-mp)
 – “icc test.cpp -openmp -o test-openmp.exe”
 – “ifort test.f -openmp -o test-openmp.exe”
 – “time ./test-openmp.exe”
Two work directories –
  /home/user-id & /hpc/home/user-id

 /home/user-id:
  – CTS server
  – Email box
  – Login files
  – Backup daily
  – Contact help desk for increasing the space

 /hpc/home/user-id:
  – HPC server
  – Research area
  – Backup once a week
  – Contact Jean Wang for increasing the space