PGAS programming with UPC and Fortran coarrays.pdf

PGAS: new languages for parallel computing
DEISA / TeraGrid Workshop, 5/10/2010

David Henty (EPCC)

Thanks to: Alan Simpson, Michele Weiland, Jim Enright,
           Savvas Petrou (EPCC);
           Harvey Richardson, Bill Long, Nathan Wichmann (Cray)
The PGAS Programming Model
Motivation
• Explicit message‐passing is cumbersome
• Performance may not be optimal
   • two‐sided model
   • hard to hide communications latency
   • memory‐hungry
   • invisible to compiler
   • not multicore‐aware
   • ...
• Take a higher‐level view and involve the compiler
   • Partitioned Global Address Space model
   • have both local and remote data

Why consider new programming models?
• Next‐generation architectures bring new challenges:
   • very large numbers of processors with many cores
   • complex memory hierarchy
   • even today (2010) we are at 200k cores
• Parallel programming is hard; we need to make it simpler
• Some of the models we currently use are
   • bolt‐ons to existing languages as APIs or directives
   • hard to program for the underlying architecture
   • unable to scale due to overheads
• So, is there an alternative to the models prevalent today?
   • the most popular are OpenMP and MPI …


Shared‐memory directives and OpenMP

[Figure: several threads all accessing a single shared memory]

OpenMP: work distribution

[Figure: one shared memory; four threads take iterations 1-8, 9-16, 17-24 and 25-32]

     !$OMP PARALLEL DO
     do i=1,32
       a(i)=a(i)*2
     end do

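The split shown on this slide (1-8, 9-16, 17-24, 25-32) can be reproduced with simple block arithmetic. The following is a minimal sketch in plain C that does not use OpenMP itself; `block_range` and `print_distribution` are hypothetical helper names, and the contiguous block split is an assumption chosen to match the figure, since the schedule an OpenMP implementation uses by default is implementation-defined.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of OpenMP: computes the inclusive, 1-based
 * iteration range [*first, *last] that thread `tid` (0-based) owns when `n`
 * iterations are split into contiguous blocks over `nthreads` threads. */
void block_range(int n, int nthreads, int tid, int *first, int *last)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    int base = n / nthreads;   /* iterations every thread gets */
    int extra = n % nthreads;  /* the first `extra` threads get one more */
    int start = tid * base + (tid < extra ? tid : extra);
    int count = base + (tid < extra ? 1 : 0);
    *first = start + 1;
    *last = start + count;
}

/* Prints the range owned by each thread. */
void print_distribution(int n, int nthreads)
{
    for (int tid = 0; tid < nthreads; tid++) {
        int first, last;
        block_range(n, nthreads, tid, &first, &last);
        printf("thread %d: iterations %d-%d\n", tid, first, last);
    }
}
```

Calling `print_distribution(32, 4)` from a `main` function lists the four ranges drawn in the figure.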
OpenMP implementation

[Figure: a single process owning the memory; its threads run on the cpus]

Cooperating Processes Models

[Figure: one PROBLEM divided among several cooperating processes]

Message Passing, MPI
[Figure: three processes, each with its own memory and cpu]

MPI
[Figure: process 0 and process 1, each with its own memory and cpu]

           process 0             process 1
      MPI_Send(a,...,1,…)   MPI_Recv(a,...,0,…)

Fortran coarray model
[Figure: three images, each with its own memory and cpu]

UPC
[Figure: three threads, each with its own memory and cpu]

     upc_forall(i=0;i<32;i++;affinity)
       a[i]=a[i]*2;

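The effect of the affinity field can be illustrated without a UPC compiler. The sketch below is a serial model in plain C, assuming the affinity expression is the integer loop index `i`, in which case UPC assigns iteration `i` to the thread for which `i % THREADS == MYTHREAD`. The function name `simulate_upc_forall` and the `owner` array are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Hypothetical serial model of
 *     upc_forall(i=0; i<n; i++; i) a[i] = a[i]*2;
 * Every simulated thread walks the whole index range but only executes the
 * iterations whose affinity matches it. owner[i] records which thread ran
 * iteration i. */
void simulate_upc_forall(int *a, int *owner, int n, int nthreads)
{
    assert(nthreads > 0);
    for (int mythread = 0; mythread < nthreads; mythread++) {
        for (int i = 0; i < n; i++) {
            if (i % nthreads == mythread) {  /* integer affinity test */
                a[i] = a[i] * 2;
                owner[i] = mythread;
            }
        }
    }
}
```

Each iteration is executed by exactly one thread, so every element is doubled once. In this model, with three threads, the iterations are dealt out cyclically: 0 to thread 0, 1 to thread 1, 2 to thread 2, 3 to thread 0, and so on.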
High Performance Fortran (HPF)
• Data Parallel programming model
• Single thread of control
• Arrays can be distributed and operated on in parallel
• Loosely synchronous
• Parallelism mainly from Fortran 90 array syntax, FORALL and 
  intrinsics.
• This model was popular on SIMD hardware (AMT DAP, Connection 
  Machines) but was extended to clusters, where the control thread is 
  replicated

HPF

[Figure: four processing elements (pe), each with its own memory, and a single controlling cpu with its own memory; array a is distributed across the pe memories]

     a(:)= a(:)*2

Differences

  User needs to distribute both work and data in parallel

  These models do this in different ways

  What does the model provide for / require of the user?


                            Work?       Data?
               MPI          No          No
               OpenMP       Yes         No
               HPF          No          Yes
               CAF          No          No
               UPC          Yes         Yes
Hello World!

     int main(void) {
     ...
     printf("Hello World!\n");
     ...

     Hello World!    <- serial

Hello World!

     #include <mpi.h>

     int main(void) {
     ...
     printf("Hello World!\n");
     ...

     Hello World!   <- rank 0
     Hello World!   <- rank 1
     Hello World!   <- rank 2
     Hello World!   <- rank 3

Hello World!

     #include <omp.h>

     int main(void) {
     ...
     printf("Hello World!\n");
     ...

     Hello World!       <- serial

Hello World!

     #include <omp.h>

     int main(void) {
     #pragma omp parallel
     {
     printf("Hello World!\n");
     ...

     Hello World!   <- thread 0
     Hello World!   <- thread 1
     Hello World!   <- thread 2
     Hello World!   <- thread 3

Hello World!

     program HPF
     ...
     write(*,*) 'Hello World!'
     ...

     Hello World!    <- serial

Hello World!

     #include <upc.h>

     int main(void) {
     ...
     printf("Hello World!\n");
     ...

     Hello World!   <- thread 0
     Hello World!   <- thread 1
     Hello World!   <- thread 2
     Hello World!   <- thread 3

Hello World!

     program CAF
     ...
     write(*,*) 'Hello World!'
     ...

     Hello World!   <- image 1
     Hello World!   <- image 2
     Hello World!   <- image 3
     Hello World!   <- image 4

Quite a few differences already ...

  Work and data distribution is also fundamentally different

  PGAS may not be a very useful categorisation ...
Fortran coarrays
Basic Features
Coarray Fortran
  "Coarrays were designed to answer the question:

  ‘What is the smallest change required to convert Fortran 
  into a robust and efficient parallel language?’

  The answer: a simple syntactic extension. 
  It looks and feels like Fortran and requires 
  Fortran programmers to learn only a few new rules."

                                   John Reid, 
                                   ISO Fortran Convener




Some History
• Introduced in current form by Numrich and Reid in 1998 as a 
  simple extension to Fortran 95 for parallel processing

• Many years of experience, mainly on Cray hardware

• A set of core features is expected to form part of the Fortran 
  2008 standard

• Additional features are expected to be published in a Technical 
  Report in due course.




How Does It Work?

• SPMD ‐ Single Program, Multiple Data
   • single program replicated a fixed number of times

• Each replication is called an image

• Images are executed asynchronously
   • execution path may differ from image to image
   • some situations cause images to synchronize

• Images access remote data using coarrays

• Normal rules of Fortran apply
What are coarrays?
• Arrays or scalars that can be accessed remotely
   • images can access data objects on any other image
• Additional Fortran syntax for coarrays
   • specifying a codimension declares a coarray

   real, dimension(10), codimension[*] :: x
   real :: x(10)[*]

   • these are equivalent declarations of an array x of size 10 on each image
   • x is now remotely accessible
   • coarrays have the same size on each image!

Accessing coarrays

integer :: a(4)[*], b(4)[*] !declare coarrays
b(:) = a(:)[n]              ! copy

• integer arrays a and b declared to be size 4 on all images
• copy array a from remote image n into local array b
• () for local access        [] for remote access
• e.g. for two images and n = 2:
                      image 1          image 2

     a                1  2  3  4       2  9  3  7
     b (before)       5  6  7  8       10 11 12 13
     b (after)        2  9  3  7       2  9  3  7

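The remote read on this slide can be mimicked in ordinary C to make the data movement explicit. This is a serial model only, not a coarray implementation: the rows of `a` and `b` stand in for the local arrays on each image, and `get_from_image` is a hypothetical name introduced for illustration.

```c
#include <assert.h>
#include <string.h>

#define N 4  /* local array size, as on the slide */

/* Hypothetical serial model of every image executing  b(:) = a(:)[n].
 * a[k] and b[k] hold the local arrays of image k+1 (images are numbered
 * from 1). Each image copies the a of image n into its own b. */
void get_from_image(int a[][N], int b[][N], int num_images, int n)
{
    assert(n >= 1 && n <= num_images);
    for (int img = 0; img < num_images; img++) {
        memcpy(b[img], a[n - 1], N * sizeof(int));
    }
}
```

With the slide's values for `a` and `n = 2`, both rows of `b` end up holding 2 9 3 7, while `a` is left unchanged on both images.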
Synchronisation
• Be careful when updating coarrays:
   • if we get remote data, was it valid?
   • could another image send us data and overwrite 
     something we have not yet used?
   • how do we know that data sent to us has arrived?
• Fortran provides intrinsic synchronisation statements
• For example, a barrier that synchronises all images:
      sync all

• Do not make assumptions about execution timing on images
   • unless the code is executed after synchronisation
   • note that there is an implicit synchronisation at program start
Retrieving information about images
• Two intrinsics provide index of this image and number of 
  images
     this_image()         (image indexes start at 1)
     num_images()

   integer :: image
   real :: x[*]
   if(this_image() == 1) then
     read *,x                     ! image 1 reads the value
     do image = 2,num_images()
       x[image] = x               ! copy it to each other image
     end do
   end if
   sync all                       ! all images wait here

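The broadcast pattern above, where image 1 reads a value and writes it into the coarray on every other image, can be sketched as a serial model in C. The array `x` stands in for the scalar coarray, with `x[k]` being the copy on image k+1. The name `broadcast_from_image1` and the `value` parameter, which replaces the `read` statement, are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical serial model of the slide's broadcast.
 * x[k] is the copy of the scalar coarray x held by image k+1. */
void broadcast_from_image1(double *x, int num_images, double value)
{
    assert(num_images >= 1);
    x[0] = value;                 /* models: read *,x on image 1 */
    for (int image = 2; image <= num_images; image++) {
        x[image - 1] = x[0];      /* models: x[image] = x */
    }
    /* In the real program, sync all follows so that no image uses x
     * before image 1 has finished writing it. */
}
```

In the coarray version the loop runs only on image 1, and the other images do nothing until they reach `sync all`.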

				