					 Nizhni Novgorod State University



  High-performance computations
  and the technologies of Microsoft



Prof. Gergel V.P., D.Sc.,
Software Department, NNSU
 www.unn.ru
 www.software.unn.ac.ru
   Needs of High Performance Computations
    (HPC)
   Windows based Clusters –
    Microsoft Compute Cluster Server
   HPC: Simplicity vs Complexity –
    Brief Introduction to MPI
   How to overcome the complexity –
    HPC Curriculum



Microsoft Academic Days 2006 Russia,    High-performance computations
Moscow, 28.04.2005                     and the technologies of Microsoft   2-61
                                                  Gergel V.P.
Needs of High Performance Computations…
• The time-consuming nature of many scientific and
  engineering problems ("Grand Challenge" problems)
• Growth of serial computer performance is limited
• The cost of parallel computational systems
  (clusters, …) is decreasing
• "Parallelism" at the processor level –
  Hyper-Threading, multicore (70% of the market in
  2006)

Needs of High Performance Computations…
                 1991                1998                  2005
  System     Cray Y-MP C916    Sun HPC 10000       Shuttle@NewEgg.com
  Structure  16 x Vector,      24 x 333 MHz        4 x 2/2 GHz x64,
             4 GB, Bus         UltraSPARC II,      4 GB, GigE
                               24 GB, SBus
  OS         UNICOS            Solaris 2.5.1       Windows Compute
                                                   Cluster Server 2003
  GFlops     ~10               ~10                 ~10
  Price      $40,000,000       $1,000,000          $4,000


The price has been reduced by a factor of more than 10,000!
               Supercomputing Goes Personal
Needs of High Performance Computations…
MIMD systems:
• Multiprocessors (shared memory)
   – UMA: PVP; SMP (incl. multi-core)
   – NUMA: COMA, CC-NUMA, NCC-NUMA
• Multicomputers (distributed memory) – NORMA
   – Cluster
   – MPP

Needs of High Performance Computations
Cluster:
• A group of computers (a local network) capable of
  working as a unified computational unit,
• Higher reliability and efficiency than a local
  network,
• Substantially lower cost compared to other
  types of parallel computational systems (through
  the use of commodity off-the-shelf hardware
  and software)
   Needs of High Performance Computations
    (HPC)
   Windows based Clusters –
    Microsoft Compute Cluster Server
   HPC: Simplicity vs Complexity –
    Brief Introduction to MPI
   How to overcome the complexity –
    HPC Curriculum



Windows based Clusters - Microsoft Compute
Cluster Server…
• Microsoft's vision in the HPC area
• Compute Cluster Server (CCS) consists of:
       – A dedicated release of the Windows Server 2003 OS –
         Cluster Edition
       – Compute Cluster Pack (CCP):
             * MS MPI – an implementation of the MPI-2 standard,
             * Cluster management system,
             * GUI, CUI, COM and other interfaces for job submission
• Current release – Community Preview Release #3
• The first release of CCS became available in
  November 2005
    Download – http://www.connect.microsoft.com

Microsoft Compute Cluster Server…
• Computational Nodes:
       -   64-bit processors of the x86 family,
       -   > 512 MB RAM,
       -   > 4 GB HDD,
       -   64-bit Microsoft Windows Server 2003
• Parallel Software Development:
       - PC under MS Windows XP, 2003, Vista,
       - MS Compute Cluster Pack SDK,
       - Recommended IDE – MS Visual Studio 2005




Microsoft Compute Cluster Server…
Job Management:
• CCS provides job management and efficient use
  of the resources on the compute cluster,
• The interfaces for scheduling jobs include:
      -   Command Line Interface (CLI),
      -   GUI,
      -   Web UI,
      -   Web-services, COM, …




Microsoft Compute Cluster Server
Job management provides the ability to:
• Schedule job execution,
• Inspect the current state of jobs,
• Terminate jobs,
etc.




Microsoft Compute Cluster Server…
Developing and executing MPI programs:
• IDE – VS 6.0, VS 2003, VS 2005,
• Language – C,
• MS MPI is compatible with MPICH-2 (at the source-code
  level),
• mpiexec is used to run MPI programs in the same
  way as with MPICH-2




Microsoft Compute Cluster Server…
Debugging MPI programs:
• Visual Studio 2005 and CCP have a built-in
  MPI debugger!




   Needs of High Performance Computations (HPC)
   Windows based Clusters –
    Microsoft Compute Cluster Server
   HPC: Simplicity vs Complexity –
    Brief Introduction to MPI
   How to overcome the complexity –
    HPC Curriculum




HPC: Simplicity vs Complexity –
Brief Introduction to MPI
The processors in computer systems with distributed
memory operate independently.

[Figure: two processors with caches and RAM, connected by a data communication network]

It is necessary to be able:
 - to distribute the computational load,
 - to organize information communication (data
    transmission) among the processors.

     The solution to both of these problems is
     provided by MPI (the Message Passing Interface)

Brief Introduction to MPI…
Example: computing the constant π
• The value of the constant π can be computed by
  means of the integral

      π = ∫₀¹ 4 / (1 + x²) dx

• To compute this integral, the method of
  rectangles can be used for numerical
  integration

[Figure: plot of 4/(1+x²) on [0, 1], approximated by rectangles]
Brief Introduction to MPI…
// Serial program
#include <stdio.h>
#include <math.h>
double f(double a) {
  return (4.0 / (1.0 + a*a));
}
int main(int argc, char *argv[]) {
  int n, i;
  double PI25DT = 3.141592653589793238462643;
  double mypi, h, sum, x;
  printf("Enter the number of intervals: ");
  scanf("%d",&n);
  // calculating
  h = 1.0 / (double) n;
  sum = 0.0;
  for ( i = 1; i <= n; i++ ) {
    x = h * ((double)i - 0.5);
    sum += f(x);
  }
  mypi = h * sum;
  printf("pi is approximately %.16f, error is %.16f\n",
    mypi, fabs(mypi - PI25DT));
  return 0;
}

Brief Introduction to MPI…
 Parallel method:
 • A cyclic scheme can be used to distribute the
   calculations among the processors
 • The partial sums calculated on the different
   processors have to be summed

[Figure: rectangles of the integrand colour-coded by processor (0, 1, 2)]

Brief Introduction to MPI…
  #include "mpi.h"
  #include <stdio.h>
  #include <math.h>
  double f(double a) {
    return (4.0 / (1.0 + a*a));
  }
  int main(int argc, char *argv[]) {
    int ProcRank, ProcNum, n, i;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x, t1, t2;
    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&ProcNum);
    MPI_Comm_rank(MPI_COMM_WORLD,&ProcRank);
    if ( ProcRank == 0 ) {
        printf("Enter the number of intervals: ");
        scanf("%d",&n);
        t1 = MPI_Wtime();
    }

Brief Introduction to MPI
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    // calculating the local sums
    h = 1.0 / (double) n;
    sum = 0.0;
    for (i = ProcRank + 1; i <= n; i += ProcNum) {
      x = h * ((double)i - 0.5);
      sum += f(x);
    }
    mypi = h * sum;
    // reduction
    MPI_Reduce(&mypi,&pi,1,MPI_DOUBLE,MPI_SUM,0,MPI_COMM_WORLD);
    if ( ProcRank == 0 ) { // printing results
       t2 = MPI_Wtime();
       printf("pi is approximately %.16f, error is %.16f\n",
              pi, fabs(pi - PI25DT));
       printf("wall clock time = %f\n", t2 - t1);
    }
    MPI_Finalize();
    return 0;
}


   Needs of High Performance Computations (HPC)
   Windows based Clusters –
    Microsoft Compute Cluster Server
   HPC: Simplicity vs Complexity –
    Brief Introduction to MPI
   How to overcome the complexity –
    HPC Curriculum




How to overcome the complexity –
HPC Curriculum…
HPC: Required Skills and Knowledge:
• Architecture of parallel computer
  systems
• Computation models and methods for
  analyzing complexity of calculations
• Parallel computation methods
• Parallel programming (languages,
  development environments, libraries),…
   It is important to have an integrated teaching
           course on parallel programming
HPC Curriculum…
     The essential part of the curriculum is an
 integrated course, "HPC and Parallel
 Programming", which provides:
   • study of the models of parallel
     computations,
   • mastery of parallel algorithms, and
   • practical experience in parallel
     programming.
 The course gives students solid knowledge
 across many areas of parallel programming
 (models, methods, technologies, programs).
 Learning combines theoretical classes and
 laboratory work.

[Figure: resource state diagram (request, acquisition, release)]
HPC Curriculum…
Components:
• Course syllabus,
• Syllabus of laboratory works,
• E-textbook,
• Program system for supporting laboratory works,
• Program system user manual,
• Function library,
• Function library reference guide,
• PowerPoint presentations for all lectures

[Figure: e-textbook screenshot – odd-even transposition sort example]

               http://www.software.unn.ac.ru/ccam
Development of the HPC curriculum has been supported
                   by Microsoft
HPC Curriculum…

Highlights of the course:
• Comprehensive learning of the spectrum of
  parallel programming issues (models, methods,
  technologies, programs)
• Organic combination of theoretical classes and
  laboratory training
• Intensive use of research and educational
  software systems for carrying out computational
  experiments

HPC Curriculum…
 Syllabus:
 • Architecture of parallel computers and their
   classification,
 • Modeling and analysis of parallel computations,
 • Analysis of communication complexity of parallel
   programs,
 • Technology for developing parallel programs:
     – Parallel extensions of industrial programming
       languages (OpenMP),
     – Developer libraries for parallel programming
       (MPI)
 • Principles of parallel algorithm design,
 • Parallel computation methods
HPC Curriculum…
Modeling and analysis of parallel
  computations:
• A computation model in the form of an
  "operations – operands" graph,
• Description of the scheme for the parallel
  execution of an algorithm,
• Predicting the execution time of a
  parallel algorithm,
• Efficiency criteria for a parallel
  algorithm

[Figures: resource state diagram and an "operations – operands" graph for summing the products a_i · b_i]
HPC Curriculum:
Modeling and analysis of parallel computations…

Characteristics of Parallel Algorithm
 Efficiency:
• Speedup:     Sp(n) = T1(n) / Tp(n)
• Efficiency:  Ep(n) = T1(n) / (p · Tp(n)) = Sp(n) / p

  Very often these two criteria are antagonistic!



HPC Curriculum:
Modeling and analysis of parallel computations…
Example: Total Sum Computation…
• The computation of the total sum of the available
  set of values (a particular case of the general
  reduction problem):

      S = x1 + x2 + … + xn




HPC Curriculum:
Modeling and analysis of parallel computations…
Example: Total Sum Computation…
• Sequential summation of the elements of a series
  of values:

      S = x1 + x2 + … + xn

[Figure: chain of additions (((x1 + x2) + x3) + x4)]

    This "standard" sequential summation algorithm
        allows only strictly serial execution and
                 cannot be parallelized
HPC Curriculum:
Modeling and analysis of parallel computations…
Example: Total Sum Computation…
• Cascade Summation Scheme

[Figure: binary tree of additions over x1, x2, x3, x4]

      Sp = T1 / Tp = (n − 1) / log2 n,
      Ep = T1 / (p · Tp) = (n − 1) / (p · log2 n) = (n − 1) / ((n / 2) · log2 n),

      !!! lim Ep = 0 as n → ∞
HPC Curriculum:
Modeling and analysis of parallel computations
 Example: Total Sum Computation…
 • Modified Cascade Scheme:

[Figure: two-stage summation tree over x1 … x16]

      Sp = T1 / Tp = (n − 1) / (2 · log2 n),
      Ep = T1 / (p · Tp) = (n − 1) / (2 · (n / log2 n) · log2 n) = (n − 1) / (2n),
      Ep = (n − 1) / (2n) ≥ 0.25,  lim Ep = 0.5 as n → ∞.
HPC Curriculum:
Analysis of communication complexity…

• Characteristics of the topology
  of data communication network,
• General description of data
  communication techniques,
• Analysis of time complexity for
  data communication operations,
• Methods of logic representation
  of communication topology


HPC Curriculum:
Principles of parallel algorithm design
The general scheme of parallel algorithm design
 (proposed by I. Foster):
  1. Problem decomposition into subtasks
  2. Analysis of information dependencies
  3. Scaling the subtasks
  4. Distributing the subtasks among processors

HPC Curriculum: Parallel algorithms…

•    Matrix-vector multiplication
•    Matrix multiplication
•    Sorting
•    Graph processing
•    Partial differential equations
•    Optimization

Example: odd-even transposition sort on 4 processors

  Iteration            Proc 1   Proc 2   Proc 3   Proc 4
  Initial data         2 3      3 8      5 6      1 4
  1 odd  (1,2),(3,4)   2 3      3 8      1 4      5 6
  2 even (2,3)         2 3      1 3      4 8      5 6
  3 odd  (1,2),(3,4)   1 2      3 3      4 5      6 8
  4 even (2,3)         1 2      3 3      4 5      6 8
                                                                                                                   8
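The odd-even block scheme traced above can be sketched as a short sequential simulation (illustrative code: each processor pair merges its two sorted blocks and keeps the lower and upper halves, mirroring the compare-split step of the parallel algorithm):

```python
def odd_even_block_sort(blocks):
    """Parallel odd-even transposition sort, simulated sequentially.

    blocks: list of lists, one block per processor.  In every phase,
    neighbouring processor pairs merge their blocks; the left processor
    keeps the lower half, the right one keeps the upper half.
    """
    q = len(blocks)
    blocks = [sorted(b) for b in blocks]  # every block starts locally sorted
    for phase in range(q):
        # Odd phases pair (1,2),(3,4),...; even phases pair (2,3),(4,5),...
        start = 0 if phase % 2 == 0 else 1
        for i in range(start, q - 1, 2):
            merged = sorted(blocks[i] + blocks[i + 1])
            half = len(blocks[i])
            blocks[i], blocks[i + 1] = merged[:half], merged[half:]
    return blocks
```

Running it on the blocks of the trace, `odd_even_block_sort([[2, 3], [3, 8], [5, 6], [1, 4]])`, reproduces the final row `[[1, 2], [3, 3], [4, 5], [6, 8]]`.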




HPC Curriculum: Parallel algorithms…

Example: Matrix multiplication by Cannon’s Method…
• Data distribution – Checkerboard scheme
  [Figure: checkerboard block partitioning of the matrices A × B = C]
• The basic subtask is the procedure that calculates all the
  elements of one block of the matrix C
  $$\begin{pmatrix} A_{00} & \cdots & A_{0,q-1} \\ \vdots & \ddots & \vdots \\ A_{q-1,0} & \cdots & A_{q-1,q-1} \end{pmatrix}\times\begin{pmatrix} B_{00} & \cdots & B_{0,q-1} \\ \vdots & \ddots & \vdots \\ B_{q-1,0} & \cdots & B_{q-1,q-1} \end{pmatrix}=\begin{pmatrix} C_{00} & \cdots & C_{0,q-1} \\ \vdots & \ddots & \vdots \\ C_{q-1,0} & \cdots & C_{q-1,q-1} \end{pmatrix},\quad C_{ij}=\sum_{s=0}^{q-1} A_{is}B_{sj}$$

HPC Curriculum: Parallel algorithms…
Example: Matrix multiplication by Cannon’s Method…
• Analysis of Information Dependencies:
 - The subtask with the number (i,j) calculates the block Cij of the
   result matrix C; as a result, the subtasks form a q×q two-
   dimensional grid,
 - The initial distribution of the matrix blocks in Cannon's algorithm
   is chosen so that the first block multiplication can be performed
   without any additional data transmission:
          At the beginning each subtask (i,j) holds the blocks Aij and Bij,
          In the i-th row of the subtask grid the matrix A blocks are
           shifted (i-1) positions to the left,
          In the j-th column of the subtask grid the matrix B blocks are
           shifted (j-1) positions upward,
 - These data transmission operations are an example of circular shift
   communication
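The initial skew can be sketched in plain Python (0-based indices, so row i moves i positions left and column j moves j positions up, which matches the (i-1)/(j-1) shifts of the 1-based description; the block contents here are just labels showing where each block ends up):

```python
def initial_skew(A_blocks, B_blocks):
    """Cannon's initial redistribution on a q x q grid of blocks.

    Row i of A is cyclically shifted i positions to the left; column j
    of B is cyclically shifted j positions upward, so that subtask (i, j)
    ends up holding A[i][(i+j) % q] and B[(i+j) % q][j].
    """
    q = len(A_blocks)
    A = [[A_blocks[i][(i + j) % q] for j in range(q)] for i in range(q)]
    B = [[B_blocks[(i + j) % q][j] for j in range(q)] for i in range(q)]
    return A, B

# Label every block by its grid coordinates to see where the skew puts it.
q = 3
A0 = [[("A", i, j) for j in range(q)] for i in range(q)]
B0 = [[("B", i, j) for j in range(q)] for i in range(q)]
A1, B1 = initial_skew(A0, B0)
```

After the skew, subtask (i,j) holds A[i][(i+j) mod q] and B[(i+j) mod q][j], so the first local product is already a valid term of Cij.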
HPC Curriculum: Parallel algorithms…

Example: Matrix multiplication by Cannon’s Method…
• Analysis of Information Dependencies:
      - After the redistribution performed at the first stage, the
        matrix blocks can be multiplied without additional data
        transmission operations,
      - To obtain all the remaining blocks, after each block
        multiplication operation:
               the matrix A blocks are shifted one position left along
                the grid row,
               the matrix B blocks are shifted one position upward along
                the grid column.
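Putting the two stages together, the whole algorithm can be simulated sequentially with NumPy (index arithmetic stands in for the circular MPI shifts; the function name and block layout are illustrative):

```python
import numpy as np

def cannon_matmul(A, B, q):
    """Cannon's algorithm, simulated sequentially on a q x q grid of subtasks."""
    n = A.shape[0]
    b = n // q  # block size (n is assumed to be divisible by q)
    # Checkerboard decomposition into q x q grids of b x b blocks.
    Ab = [[A[i*b:(i+1)*b, j*b:(j+1)*b] for j in range(q)] for i in range(q)]
    Bb = [[B[i*b:(i+1)*b, j*b:(j+1)*b] for j in range(q)] for i in range(q)]
    # Initial redistribution: row i of A left by i, column j of B up by j.
    Ab = [[Ab[i][(i + j) % q] for j in range(q)] for i in range(q)]
    Bb = [[Bb[(i + j) % q][j] for j in range(q)] for i in range(q)]
    Cb = [[np.zeros((b, b)) for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Block multiplication performed by every subtask (i, j).
        for i in range(q):
            for j in range(q):
                Cb[i][j] += Ab[i][j] @ Bb[i][j]
        # Circular shifts: A blocks one position left along the grid row,
        # B blocks one position up along the grid column.
        Ab = [[Ab[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        Bb = [[Bb[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return np.block(Cb)
```

After t iterations subtask (i,j) holds A[i][(i+j+t) mod q] and B[(i+j+t) mod q][j], so over q iterations it accumulates every term of the sum for Cij.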

HPC Curriculum: Parallel algorithms…

Example: Matrix multiplication by Cannon’s Method…
• Scaling and Distributing the Subtasks among the
  Processors:
      - The sizes of the matrix blocks can be chosen so that the
        number of subtasks coincides with the number of
        available processors p,
      - The most efficient execution of the parallel Cannon's
        algorithm is achieved when the communication
        network topology is a two-dimensional grid,
      - In this case the subtasks can be distributed among the
        processors in a natural way: the subtask (i,j) is
        placed on the processor pi,j
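With a row-major enumeration of the processors, the natural placement reduces to a rank/coordinate conversion (this is what MPI_Cart_rank / MPI_Cart_coords provide for a two-dimensional Cartesian communicator):

```python
def subtask_to_rank(i, j, q):
    """Row-major rank of subtask (i, j) on a q x q processor grid."""
    return i * q + j

def rank_to_subtask(rank, q):
    """Inverse mapping: processor rank -> grid coordinates (i, j)."""
    return divmod(rank, q)
```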
HPC Curriculum: Parallel algorithms…

Example: Matrix multiplication by Cannon’s Method…
• Efficiency Analysis…
     - Speed-up and Efficiency generalized estimates:
               $$S_p = \frac{n^2}{n^2/p} = p, \qquad E_p = \frac{n^2}{p\,(n^2/p)} = 1$$


        The developed method of parallel computations thus
     attains the ideal speedup and efficiency estimates




HPC Curriculum: Parallel algorithms…
Example: Matrix multiplication by Cannon’s Method…
• Efficiency Analysis (detailed estimates):
- Cannon's algorithm differs from Fox's algorithm only in the
  types of communication operations; the computation time is:

               $$T_p(calc) = \frac{n^2}{p}\,(2n-1)\,\tau$$

- Time of the initial redistribution of the matrix blocks:

               $$T_p^{1}(comm) = 2\left(\alpha + w\,\frac{n^2/p}{\beta}\right)$$

- After every block multiplication the matrix blocks are shifted:

               $$T_p^{2}(comm) = 2\left(\alpha + w\,\frac{n^2/p}{\beta}\right)$$

- The total time of the parallel algorithm execution is:

   $$T_p = q\left[\frac{n^2}{p}\left(\frac{2n}{q}-1\right) + \frac{n^2}{p}\right]\tau + (2q+2)\left(\alpha + w\,\frac{n^2/p}{\beta}\right)$$
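The total-time formula can be turned into a small cost model. The values of τ, α, β and w below are illustrative placeholders, not the calibrated parameters behind the experiments on the next slide:

```python
def cannon_time(n, p, tau=1e-9, alpha=1e-5, beta=1e8, w=8):
    """Model of the Cannon's algorithm execution time (seconds).

    n     - matrix order, p - number of processors (a perfect square),
    tau   - time of one elementary computational operation,
    alpha - communication latency, beta - network bandwidth (bytes/s),
    w     - size of one matrix element in bytes (all values illustrative).
    """
    q = int(round(p ** 0.5))           # the processor grid is q x q
    calc = q * ((n * n / p) * (2 * n / q - 1) + n * n / p) * tau
    comm = (2 * q + 2) * (alpha + w * (n * n / p) / beta)
    return calc + comm

def serial_time(n, tau=1e-9):
    """Sequential algorithm: n^2 elements, 2n-1 operations each."""
    return n * n * (2 * n - 1) * tau
```

The modelled speedup `serial_time(n) / cannon_time(n, p)` stays slightly below p because of the communication term.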
HPC Curriculum: Parallel algorithms…
Example: Matrix multiplication by Cannon’s Method…
• Results of computational experiments…
     - Comparison of theoretical estimations and results of
       computational experiments

       Execution time on 4 processors (seconds):

       Matrix Size      Model      Experiment
       500×500           0.5908      0.6676
       1000×1000         4.4445      4.7065
       1500×1500        14.6868     15.4247
       2000×2000        34.4428     36.5024

       [Chart: model vs. experiment execution time as a function of the matrix size]
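The agreement between model and experiment in the table can be quantified directly (table values restated in Python; the relative error is taken against the measured time, and stays within roughly 12%):

```python
results = {  # matrix size n -> (model time, experiment time), seconds, 4 processors
    500: (0.5908, 0.6676),
    1000: (4.4445, 4.7065),
    1500: (14.6868, 15.4247),
    2000: (34.4428, 36.5024),
}

errors = {n: abs(exp - mod) / exp for n, (mod, exp) in results.items()}
for n, err in errors.items():
    print(f"n = {n}: relative model error {err:.1%}")
```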




HPC Curriculum: Parallel algorithms
Example: Matrix multiplication by Cannon’s Method
• Results of computational experiments:
    - Speedup
       Matrix Size    Sequential     Parallel Algorithm, 4 processors
                      Algorithm      Time          Speedup
       500×500           2.0628        0.6676       3.0899
       1000×1000        16.5152        4.7065       3.5090
       1500×1500        56.5660       15.4247       3.6672
       2000×2000       133.9128       36.5024       3.6686

       [Chart: speedup as a function of the matrix size]
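The speedup column is simply the ratio of the two measured times, and dividing by p = 4 gives the efficiency (table values restated in Python):

```python
rows = [  # (matrix size, sequential time, parallel time on 4 processors)
    (500, 2.0628, 0.6676),
    (1000, 16.5152, 4.7065),
    (1500, 56.566, 15.4247),
    (2000, 133.9128, 36.5024),
]

for n, t_seq, t_par in rows:
    speedup = t_seq / t_par
    efficiency = speedup / 4
    print(f"n = {n}: S = {speedup:.4f}, E = {efficiency:.2f}")
```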




HPC Curriculum: Laboratory classes…

• Methods of parallel program development
  for multiprocessor systems with shared and
  distributed memory using the OpenMP and MPI
  technologies
• Training in developing parallel algorithms
  and programs for solving computational
  problems
• Training in using parallel method libraries
  for solving complex scientific and engineering
  problems

HPC Curriculum: Laboratory classes…

   • Computational experiments on parallel systems



    [Diagram: the cluster classroom; dual-processor and quad-processor nodes
     connected through hubs (1 Gbit backbone, 100 Mbit segments) together with
     Pentium IV workstations]

HPC Curriculum: Laboratory classes…

     Intensive use of research and education software
      systems for modeling computations on various
      multiprocessor systems and for visualizing
      parallel computation processes:
            The Parallel Laboratory (ParaLab) system,
             a software system for studying and
             investigating parallel methods for solving time-
             consuming problems



HPC Curriculum: Laboratory classes with
ParaLab…
• Modelling a parallel computing system,
• Choice of computing problems and methods to
  solve them,
• Carrying out computational experiments,
• Parallel computation visualization,
• Information gathering and analysis of results
  (the "experiment log"),
• Data archiving



        HPC Curriculum: Laboratory classes with
        ParaLab…
 [Screenshot: ParaLab main window, showing the area for experiment data
  visualization, the visualization of a processor's operations, and the
  experiment results]

HPC Curriculum: Laboratory classes with
ParaLab…

 • Modelling a parallel
   computing system




HPC Curriculum: Laboratory classes with
ParaLab…
 • Choosing computing problems and methods to
   solve them




HPC Curriculum: Laboratory classes with
ParaLab…
 • Computational experiments and parallel
   computation visualization
Matrix computations                           Sorting




HPC Curriculum: Laboratory classes with
ParaLab…
 • Information gathering and analysis of results
   (the "experiment log")




HPC Curriculum: Laboratory classes with
ParaLab


  Experience with the system shows that ParaLab can be
  useful both for novices who are just starting to learn
  parallel computing and, at times, even for experts in this
  promising field of strategic computer technology




Winter School on Parallel Computing
2004, 2005, 2006…
• January 25 – February 7, 2004,
• 39 participants from 11 cities
   of the CIS,
• 6 lecture courses given by
   leading specialists
   in parallel computing,
• scientific seminar




Winter School on Parallel Computing
2004, 2005, 2006…

  School syllabus:
  – Technologies of parallel programming
    (Gergel V., NNSU; Popova N., MSU),
  – Parallel databases (Sokolinsky L., ChelSU),
  – Parallel computation models (based on the
    DVM system) (Krukov V., IPM RAN),
  – Parallel computational algorithms
    (Yakobovski M., IMM RAN)



Winter School on Parallel Computing
2004, 2005, 2006…
School highlights
• Intensive schedule (classes 9:00–18:00 daily,
   self-instruction work till 21:00),
• Predominance of practical classes and laboratory
   works,
• Remote access to many Russian high-performance
   resources (clusters of NNSU, MSU, RCC MSU,
   ICC RAN, SPbSU, IAP RAN),
• Training in parallel software development
   tools (Intel),
• A research and educational seminar for
   students and scientists
      Winter School has been supported by Intel
Conclusions

• High performance computing - Challenge for
  CS and IT
• Microsoft Vision – Clusters under Compute
  Cluster Server
• The UNN HPC Curriculum provides an easy
  entry into the HPC world




Contacts


    University of Nizhni Novgorod
    23, Gagarin Avenue
    603950, Nizhni Novgorod

    Tel: 7 (8312) 65-48-59,
    E-mail: gergel@unn.ac.ru




Questions,
               Remarks,
                                 Something to add…





				