    Co-arrays to be included in the Fortran 2008 Standard
    John Reid, ISO Fortran Convener

The ISO Fortran Committee has decided to
include co-arrays in the next revision of the
Standard.
Aim of this talk: introduce co-arrays and
explain why we believe that they will lead
to easier development of parallel programs,
faster execution times, and better
maintainability.
No knowledge of Fortran 2003 is needed. I
will explain the Fortran 95 features used.

           Kyoto University, 30 October, 2006
           Tokyo University, 31 October, 2006

    Summary of co-array model
SPMD – Single Program, Multiple Data
Replicated to a number of images
Images have indices 1, 2, ...
Number of images fixed during execution
Each image has its own set of local variables
Images execute asynchronously except
when explicitly synchronized: sync all,
sync team, notify, query,...
Variables declared as co-arrays are
accessible on another image through second
set of array subscripts, delimited by [ ] and
mapped to image indices by the usual rule
Intrinsics: this_image, num_images,...
collectives such as co_sum
Critical construct

       Examples of co-array syntax

real :: r[*]    ! Scalar co-array
real :: x(n)[*] ! Array co-array
! Co-arrays always have assumed
! co-size (equal to number of images)

real :: t    ! Local scalar
integer :: p ! Local scalar

t = r[p]
x(:) = x(:)[p]
! Reference without [] is to local part
x(:)[p] = r
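
The fragments above can be put together into a complete program. This is a minimal sketch, assuming a compiler with coarray support enabled; the size n = 4, the local array y and the choice of reading from the last image are illustrative assumptions, not part of the talk.

```fortran
program coarray_syntax_demo
  implicit none
  integer, parameter :: n = 4   ! illustrative size (assumption)
  real :: r[*]                  ! scalar co-array
  real :: x(n)[*]               ! array co-array
  real :: t                     ! local scalar
  real :: y(n)                  ! local array that receives remote data
  integer :: p                  ! image index to read from

  ! Each image fills its own local part (no [] needed).
  r = real(this_image())
  x(:) = 10.0*real(this_image())

  ! Make sure every image has stored its values before anyone reads them.
  sync all

  p = num_images()              ! read from the last image
  t = r[p]                      ! remote read of a scalar
  y(:) = x(:)[p]                ! remote read of a whole array

  print '(a,i0,a,f0.1,a,f0.1)', 'image ', this_image(), &
        ': t = ', t, ', y(1) = ', y(1)
end program coarray_syntax_demo
```

The remote data is copied into the separate local array y rather than back into x, so that no image overwrites a co-array that another image may be reading at the same time.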

            Implementation model
Usually each image resides on one processor.
However, several images may share a processor
(e.g. for debugging) and one image may execute
on multiple processors (e.g. with OpenMP).
A co-array has the same set of bounds on all
images, so the compiler can arrange that it
occupies the same set of addresses within each
image.
On a shared-memory machine, a co-array can be
implemented as a single large array.
On any machine, a co-array may be implemented
so that each image can calculate the memory
address of an element on another image.


              Synchronization
With a few exceptions, the images execute
asynchronously. If syncs are needed, the user
supplies them explicitly.

! Barrier on all images
     sync all
! Barrier on the images of a team
     sync team (team)
! Check into a barrier, but do not wait
     notify (image-set)
! Wait for others to check into barrier
     query (image-set)

For example, to read data on image 1 and get it to
other images:

 if(this_image()==1) read(*,*)p
 sync all
 p = p[1]
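
As a self-contained sketch of the same pattern, the program below has image 1 set a value that every other image then fetches. The fixed value 42 stands in for the read so that the example needs no input, and the copy is guarded so that image 1 does not assign to itself; both are assumptions made for illustration.

```fortran
program broadcast_from_image_1
  implicit none
  integer :: p[*]               ! one copy of p on every image

  ! Only image 1 obtains the value (a read(*,*) p would go here).
  if (this_image() == 1) p = 42

  ! No image may fetch p[1] before image 1 has defined it.
  sync all

  ! Every other image copies the value from image 1.
  if (this_image() /= 1) p = p[1]

  print '(a,i0,a,i0)', 'image ', this_image(), ' has p = ', p
end program broadcast_from_image_1
```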

                Critical sections
Exceptionally, it may be necessary to limit
execution to one image at a time:
     critical
        p[6] = p[6] + 1
     end critical
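
A complete version of this idea might look as follows. It is a sketch only: the counter lives on image 1 rather than image 6 so that the program also runs with fewer than six images, and the name counter is hypothetical.

```fortran
program critical_counter
  implicit none
  integer :: counter[*]

  counter = 0                   ! each image clears its local copy
  sync all                      ! image 1 must be cleared before updates start

  ! Only one image at a time executes this block, so the
  ! read-modify-write of counter[1] cannot be interleaved.
  critical
    counter[1] = counter[1] + 1
  end critical

  sync all                      ! wait until every image has made its update
  if (this_image() == 1) print '(a,i0)', 'counter = ', counter
end program critical_counter
```

After the final sync all, the counter on image 1 should equal num_images(), since each image increments it exactly once.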

             Collective subroutines

Collective subroutines are intrinsic and involve synchronization.
All have optional argument team.

On every image, given the co-arrays
   real :: x[*], y(n)[*]
   real :: sum, sums(n)
call co_sum(x,sum)
    returns ∑ x[p] and
call co_sum(y(:),sums(:))
    returns ∑ y(:)[p].
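
To make the meaning of the scalar case concrete, the sum over images can be written out by hand with ordinary co-array references. This sketch deliberately avoids the two-argument co_sum form shown above, which is the draft syntax of the talk; the variable name total is an assumption.

```fortran
program manual_co_sum
  implicit none
  real :: x[*]                  ! one value per image
  real :: total                 ! local result, computed on every image
  integer :: p

  x = real(this_image())        ! image p contributes the value p
  sync all                      ! all contributions must be in place

  ! Hand-written equivalent of summing x[p] over all images p.
  total = 0.0
  do p = 1, num_images()
    total = total + x[p]
  end do

  print '(a,i0,a,f0.1)', 'image ', this_image(), ': total = ', total
end program manual_co_sum
```

Every image fetches every contribution here, which is simple but communication-heavy; an intrinsic collective leaves the processor free to use a more efficient scheme.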

co_all           True if all values are true
co_any           True if any value is true
co_count         Numbers of true elements
co_maxloc        Image indices of maximum values
co_maxval        Maximum values
co_product       Products of elements

              Dynamic co-arrays

Only dynamic form: the allocatable co-array.

All images synchronize at an allocate or
deallocate statement so that they can all perform
their allocations and deallocations in the same
order. The bounds must not vary between images.

Automatic co-arrays or co-array-valued functions
would require automatic synchronization, so are
not allowed.
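
A minimal sketch of an allocatable co-array, assuming a coarray-enabled compiler; the size n = 8 and the values stored are illustrative.

```fortran
program allocatable_coarray
  implicit none
  real, allocatable :: a(:)[:]  ! deferred shape and deferred co-shape
  integer :: n

  n = 8                         ! must be the same on every image
  allocate(a(n)[*])             ! all images synchronize here

  a = real(this_image())
  sync all                      ! values must be stored before remote reads

  if (this_image() == 1) then
    print '(a,f0.1)', 'a(1) on the last image = ', a(1)[num_images()]
  end if

  deallocate(a)                 ! all images synchronize here as well
end program allocatable_coarray
```

Because the deallocate statement synchronizes the images, the last image cannot free its part of a while image 1 is still reading from it.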

             Co-Arrays and SAVE

Unless allocatable or a dummy argument, a
co-array must be given the SAVE attribute.

This is to avoid the need for synchronization
when co-arrays go out of scope on return from a
procedure.

                 Procedures
A subobject of a co-array, without [ ], may be
passed to a co-array dummy argument.
      The ordinary rules of Fortran 95 apply to
      the local part; the co-rank and co-bounds
      are defined afresh.
      The interface must be explicit.
      No copy-in copy-out.
The rules for resolving generic procedure
references remain unchanged.
No co-array syntax is permitted in pure
procedures.

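The following sketch shows a co-array passed to a procedure whose dummy argument is itself a co-array. The module and procedure names (fetch_mod, fetch_first) are hypothetical; placing the subroutine in a module is one way to provide the explicit interface.

```fortran
module fetch_mod
  implicit none
contains
  ! The dummy argument x is a co-array; its co-bounds are
  ! declared afresh here, independently of the actual argument.
  subroutine fetch_first(x, from, val)
    real, intent(in)    :: x(:)[*]
    integer, intent(in) :: from     ! image to read from
    real, intent(out)   :: val
    val = x(1)[from]
  end subroutine fetch_first
end module fetch_mod

program coarray_dummy_demo
  use fetch_mod                 ! makes the interface explicit
  implicit none
  real :: a(4)[*]
  real :: v

  a = real(this_image())
  sync all                      ! all images have defined a

  call fetch_first(a, num_images(), v)
  print '(a,i0,a,f0.1)', 'image ', this_image(), ' fetched ', v
end program coarray_dummy_demo
```
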
             Structure components
A co-array may be of a derived type with
allocatable or pointer components.
Pointers must have targets in their own image:
 q => z[i]%p      ! Not allowed
 allocate(z[i]%p) ! Not allowed
No automatic synchronization. Each image works
independently.
Provides a simple but powerful mechanism for
cases where the size varies from image to image,
avoiding loss of optimization.
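
As an illustration of sizes that vary from image to image, the sketch below gives each image a component whose length equals its image index. The type name bucket and the variable names are assumptions; only the local allocate(z%p(...)) form is used, in line with the restriction above.

```fortran
program ragged_components
  implicit none
  type :: bucket
    real, allocatable :: p(:)   ! size may differ on each image
  end type bucket
  type(bucket) :: z[*]
  integer :: me, last
  real :: v

  me = this_image()

  ! Local allocation: no [] and no automatic synchronization.
  allocate(z%p(me))
  z%p = real(me)

  ! Explicit sync so that no image looks at a component
  ! before its owner has allocated and filled it.
  sync all

  last = num_images()
  v = z[last]%p(last)           ! last element held by the last image
  print '(a,i0,a,f0.1)', 'image ', me, ' read ', v
end program ragged_components
```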

                Input/output
Syntax to allow teams of images to access a
single file. Allows local buffering.
To open for a team:
There is an implied
   sync team (team)
and the unit must not be opened on other images.
The only cases:
sequential write: While an image is writing a
  record, the processor blocks other images.
  Thus each record comes from a single image.
direct access: Up to the programmer to
  synchronize access to a single record by more
  than one image.

                Optimization
Most of the time, the compiler can optimize as if
the image is on its own, using its temporary
storage such as cache, registers, etc.
There is no coherency requirement except on
synchronization.
The compiler also has scope to optimize communication.

           Comparison with MPI (i)
MPI is the de-facto standard but is awkward to
program. Here is an example due to Jef Dawson.
With co-arrays, to send the first m elements of an
array from one image to another:
real :: a(n)[*]
if ( me.eq.2 ) a(1:m)=a(1:m)[1]
sync all
and with MPI:
real :: a(n)
call mpi_comm_rank(mpi_comm_world, &
                    myrank, errcode)
if (myrank.eq.0) call mpi_send        &
       (a,m,mpi_float,1,tag1, &
        mpi_comm_world,errcode)
if (myrank.eq.1) call mpi_recv        &
       (a,m,mpi_float,0,tag1, &
        mpi_comm_world,status,errcode)

           Comparison with MPI (ii)
Experience on the Cray vector computers with the
Cray compiler suggests that there is a
performance advantage as the number of
processes increases.
For example, Dawson (2004) reports a speed-up of
60 on 64 processors of the Cray X1 for a stencil
update code, compared with 35 for MPI.

Dawson, Jef (2004). Co-array Fortran for
productivity and performance. In Army HPC
Research Center Bulletin, 14, 4.

      Advantages of co-arrays
Easy to write code – the compiler looks
after the communication
References to local data are obvious as such
Easy to maintain code – more concise than
MPI and easy to see what is happening
Integrated with Fortran – type checking,
type conversion on assignment, ...
The compiler can optimize communication
Local optimizations still available
Does not make severe demands on the
compiler, e.g. for coherency.

               Further reading
Numrich, Robert W. and Reid, John (2005).
Co-arrays in the next Fortran Standard.
ACM Fortran Forum, 24, 2, 4-17.
Also N1642.pdf in

