            High Performance Computing
            With Java

            Mark Baker
            University of Portsmouth


            9th HPDC, Pittsburgh, USA – 1st August 2000
            http://www.dcs.port.ac.uk/~mab/Talks/

           Contents
          Programming Paradigms
          Java for High Performance Computing
          The Java Grande Forum
          Java and Parallelism
          Existing Java Frameworks
          RMI and Object Serialization
          mpiJava – A Java MPI-like interface
          MPJ – A pure Java environment
          Summary and Conclusions

           Motivation
          The limitations of sequential computers
           mean that, to achieve high
           performance, applications need to
           exploit parallel or distributed systems.
          The obvious problem is therefore how
           to use these distributed resources in an
           efficient and effective manner.


           Motivation
          To exploit a parallel or distributed platform
           mechanisms are required to move data back
           and forth between the processors.
          There are many different ways of doing this,
           but generally it is via some means of
           message passing.
          This message passing can range from being
           hidden from the user (as in distributed
           shared memory) to explicit message
           passing (via library calls) within the application.

           Programming Paradigms
      Two application programming paradigms for
       distributed-memory systems:
          Shared memory:
               Sequential programs with a global view of memory.
               Supported by:
                       Hardware (KSR AllCache and Stanford DASH),
                      Operating System (IVY/Munin/Midway/…)
                      High-level Software directives (SHMEM/OpenMP/HPF).
          Message passing:
               Low-level protocols (Active/Fast messages/VIA).
               Application-level protocols (MPI/PVM) – more later!!


           Programming Paradigms
          Obviously the easiest programs to write
           are sequential ones!
          One would think that Distributed-Shared-
           Memory (DSM) would be the
           programming paradigm of choice for
           parallel application developers!
          It is not!! – the vast majority of parallel
           applications use the message passing
           programming paradigm.

           Programming Paradigms
          Why not DSM!
              H/W solutions are complex and expensive.
              O/S solutions currently lack performance
               and scalability for a whole variety of reasons
               – may be changing though!
              Software solutions require a large
               intellectual and development effort
               (compilers/RTL) – still very difficult to
               map/solve irregular problems efficiently.

           Programming Paradigms
          Why Message Passing?
              Originally considered difficult for a number of
               reasons, including:
                   Portability – many different APIs;
                   Readability…;
                   Formalism (deadlock/race-conditions/etc);
                   An order of magnitude more difficult to produce
                    efficient code than the sequential equivalent!;
                   Efficient coding of large applications reckoned to
                    take a considerable fraction of the life of a parallel
                    machine.

           Programming Paradigms
          Why Message Passing?
              Even naïve attempts often produce faster and
               more efficient application codes than the
               equivalent DSM efforts.
              Introduction of the de facto Message Passing
               Interface (MPI) standard – more words later.
              Standardisation has meant that codes became
               portable, and that libraries, tools (debuggers,
               profilers, etc.), and other utilities could be
               developed with confidence and less risk.

           Message Passing Programs
          To write parallel programs, scientific
           programmers generally write applications in a
           high-level language such as C or Fortran,
           and include, by linking, a message passing
           library.
          The program is:
              Written as source code;
              Compiled and linked;
              Executed by running a script file
               that starts the application on a set of named
               computers.

           Recap.
          So we have established:
              That we need a distributed-memory platform to
               achieve the high performance capabilities that
               applications need.
              We need a software infrastructure to pass data back
               and forth between remote processors.
              The most common way to get high performance is
               to develop message passing applications.
              Now onto Java, and where that fits into our story…



           High Performance Java
          Sounds like an oxymoron – High Performance
           and Java!!
          A great deal of interest in the idea that Java
           may be a good language for scientific and
           engineering computation, and in particular
           for parallel computing.
          It is claimed that Java is simple, efficient and
           platform-neutral – a natural language for
           network programming!!

          Java: Key Technical Ideas
         A Better Language:
             Simplicity and C/C++ compatibility promote fluency;
             GC and Threads allow software components;
             Platform independence saves time;
             Strong typing catches errors up front;
              Declared exceptions force coverage in code.
          Scalable Applications:
             Threads for parallel speedup;
             Dynamic linking allows simple apps to grow;
             Range of implementations from JavaCard to
              HotSpot.

           Parallel Programming
          These factors make Java potentially attractive
           to scientific programmers hoping to harness the
           collective computational power of Clusters of
           PCs/workstations, or even of the Internet.
          A basic prerequisite for parallel programming is
           a good communication API.
          Java comes with several:
              Notably an easy-to-use interface to BSD sockets;
              The Remote Method Invocation (RMI) mechanism.
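          As a concrete reminder, a minimal sketch of Java's socket interface
          (illustrative only – the port number and message are made up):

   import java.io.*;
   import java.net.*;

   // Run once with argument "server", then once with no arguments.
   class SocketDemo {
       public static void main(String[] args) throws IOException {
           if (args.length > 0 && args[0].equals("server")) {
               ServerSocket ss = new ServerSocket(5000);   // listen on port 5000
               Socket s = ss.accept();                     // wait for the client
               BufferedReader in = new BufferedReader(
                   new InputStreamReader(s.getInputStream()));
               System.out.println("got: " + in.readLine());
               s.close(); ss.close();
           } else {
               Socket s = new Socket("localhost", 5000);   // connect to server
               PrintWriter out = new PrintWriter(s.getOutputStream(), true);
               out.println("hello");                       // send one line
               s.close();
           }
       }
   }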


          Parallel Programming
         Interesting as these are, it is questionable
          whether parallel programmers will find
          them especially convenient.
         Sockets and RPC have existed as long as
          parallel computing, and neither of them
          has been popular in that field.



           Missing From Java
          These communication models are
           optimized for Client/Server programming,
           whereas the parallel computing world is
           mainly concerned with "symmetric"
           communication, occurring in groups of
           interacting peers (SPMD).
          So, Java is missing a user-level message
           passing API (like MPI/PVM) – more later.

           Problems with Java
          High Performance application developers
           have found that Java has some critical
           areas that need addressing; these
           include:
              Improved floating-point;
              Complex arithmetic;
              Multidimensional arrays;
              Lightweight classes;
              Operator overloading.

           The Java Grande Forum
          The aforementioned aspects of Java mean
           that there are some problems that hinder its
           use for large applications.
          Java Grande Forum was created to make Java
           a better platform for Grande applications-
           http://www.JavaGrande.org.
          Currently two working groups exist:
              Numeric – complex and FP arithmetic, multi-
               dimensional arrays, operator overloading, etc.
              Concurrency/Applications – performance of RMI and
               object serialization, benchmarking, etc.

           The Goal of Java Grande Forum?
          Java has the potential to be a better environment
           for “Grande application development” than
           previous languages such as Fortran and C++.
          The Forum's goal is to develop community
           consensus and recommendations for either
           changes to Java or establishment of standards
           (frameworks) for “Grande” libraries and services.
          These language changes or frameworks are
           designed to realize the “best ever Grande
           programming environment”.

           Forum Goals
          Essential changes to Java Language:
              In particular, remove the unrealistic goal that one
               could and should get the same answer on every machine;
              Also things like the use of full precision and aggressive
               compiler optimizations.
          Allow operator overloading – which has been
           abused but is natural (and essential) in
           scientific computing.
          Lightweight objects needed for complex types.


           Forum Goals
          Matrices, rounding, interval arithmetic,
           exceptions (traps vs. flags), subscript checking,
           the need for better garbage collection aimed at
           scientific contiguous data structures, the role of
           compilers versus language.
          Most important in the near term – encourage
           Sun to make a few key changes in Java to
           allow it to be a complete, efficient Grande
           programming language.


           Critical Needs for Numerics
          Numerics: Java as a language for
           mathematics led by Ron Boisvert and
           Roldan Pozo from NIST.
           http://math.nist.gov/javanumerics/
          Changes in Java's controversial handling of
           floating point arithmetic, which currently
           has the goal of reproducible results, leading
           to non-optimal accuracy.

           Critical Needs for Numerics
          Lightweight classes and operator
           overloading – enables implementation of
           complex as a class.
          “Fortran rectangular multi-dimensional
           arrays” – Java naturally has “arrays of
           arrays” (see the sketch after this list).
          High quality math libraries with agreed
           interfaces – FFT, matrices and linear
           algebra, transcendental functions.
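          To make the “arrays of arrays” point concrete, a minimal sketch (not
          from the talk – names are illustrative): a Java “2-D” array is an
          array of row objects, so rows may be ragged or aliased, and a
          Fortran-style rectangular array must be simulated, e.g. flattened:

   public class Ragged {
       public static void main(String[] args) {
           double[][] a = new double[3][];   // an array of row references
           a[0] = new double[4];
           a[1] = new double[2];             // rows can differ in length
           a[2] = a[0];                      // rows can even alias each other

           // Fortran-style alternative: flatten to 1-D, index manually.
           int n = 4;                        // row length
           double[] b = new double[3 * n];
           b[2 * n + 1] = 42.0;              // element (2,1) in row-major order
           System.out.println(a[1].length + " " + b[2 * n + 1]);
       }
   }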

           Other Needs of Java
          Distributed and Parallel Computing led by
           Dennis Gannon and Denis Caromel (INRIA,
           France).
          Performance of RMI (attractive Java distributed
           object model – “remote method invocation”).
          Performance of Java runtime (the virtual
           machine VM) with lots of threads, I/O, memory
           use.
          Parallel computing interfaces including Java
           MPI-like bindings.

          Java and Parallelism…
           There are several forms of parallelism:
          1) Fine grain functional parallelism as in the overlap
             of communication and computation;
          2) Coarse grain distributed objects;
          3) Fine grain data parallelism which is historically
             either implemented with high level language (HPF)
             or explicit user defined message passing (MPI) –
             this gives "massive parallelism" as in parallelism
             over grid points or particles in scientific computing.



           Java and Parallelism…
          In a nutshell, Java is better than previous
           languages for 1) and 2), and no worse for 3):
              Automatically provides “task parallelism”,
               which needs to be added in painful fashion
               for Fortran;
              The Web integration of Java gives it
               excellent “network” classes and support for
               message passing.



          Java and Parallelism…
         Thus a “Java plus message passing” form of
          parallel computing is actually somewhat easier
          than in Fortran or C.
             Java has been integrated with PVM and MPI,
              using either pure Java or Java wrappers
              to existing implementations.
         Coarse grain parallelism very natural in Java
          with the use of RMI.


           Java and Parallelism…
          “Data parallel” language features are NOT in
           Java and have to be added, extending ideas
           from HPF/HPC++.
          Java has built in “threads” and a given Java
           program can run multiple threads at a time:
              In Web use, this allows one to process an image in one
               thread, an HTML page in another, etc.
              Threads can be used to do more general parallel
               computing but only on shared memory computers.
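          A minimal sketch of thread-based task parallelism (illustrative only):

   // Each Worker runs in its own thread; on a shared memory machine the
   // threads can execute on different processors.
   class Worker extends Thread {
       private final int id;
       Worker(int id) { this.id = id; }
       public void run() { System.out.println("worker " + id + " running"); }

       public static void main(String[] args) throws InterruptedException {
           Worker[] w = new Worker[4];
           for (int i = 0; i < w.length; i++) {
               w[i] = new Worker(i);
               w[i].start();                 // launch the thread
           }
           for (int i = 0; i < w.length; i++)
               w[i].join();                  // wait for all workers to finish
       }
   }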


           Recap
          We have identified that Java has
           potential as a natural language
           with which to develop high performance
           applications.
          We have highlighted a number of current
           deficiencies (numerics and lack of a high-
           level message passing interface).
          We now review some Java message
           passing frameworks.

           Java Based Frameworks
          Use Java as wrapper for existing frameworks;
              (mpiJava, Java/DSM, JavaPVM).
          Use pure Java libraries:
              (MPJ, DOGMA, JPVM, JavaNOW).
          Extend Java language with new keywords:
              Use a pre-processor or own compiler to create
               Java (byte) code – HPJava, Manta, JavaParty,
               Titanium.
          Web oriented and use Java applets to execute
           parallel tasks (WebFlow, IceT, Javelin).
           Use Java as wrapper for existing
           frameworks.
          JavaMPI: U. of Westminster
              Java wrapper to MPI;
              Wrappers are automatically generated from
               the C MPI header using the Java-to-C interface
               generator (JCI);
              Close to the C binding, not object-oriented.
          JavaPVM (jPVM): Georgia Tech.
              Java wrapper to PVM.

           Use Java as wrapper for existing
           frameworks.
          Java/DSM: Rice U.
              Heterogeneous computing system.
              Implements a JVM on top of a TreadMarks DSM
               system.
              One JVM on each machine – all objects are allocated
               in the shared memory region.
              Provides transparency: Java/DSM combination hides
               the hardware differences from the programmer.
              Since communication is handled by the underlying
               DSM, no explicit communication is necessary.


           Use pure Java libraries(I)
          JPVM: U. of Virginia
              A pure Java implementation of PVM.
              Based on communication over TCP sockets.
              Performance is very poor compared to JavaPVM.
          jmpi: Baskent U.
              A pure Java implementation of MPI built on top of
               JPVM.
              Due to the additional wrapper layer over JPVM
               routines, its performance is poor compared to JPVM
               (in increasing overhead: JavaPVM < JPVM < jmpi).

           Use pure Java libraries(II)
          MPIJ: Brigham Young U.
              A pure Java based subset of MPI developed
               as part of the Distributed Object Group
               Meta-computing Architecture (DOGMA).
              Hard to use…




          Use pure Java libraries(III)
          JavaNOW: Illinois Institute of Technology
              Shared memory based system and
               experimental message passing framework.
              Creates a virtual parallel machine like PVM.
              Provides shared memory similar to Linda.
              Currently available as standalone software
               and must be used with a remote (or secure)
               shell tool in order to run on a network of
               workstations.

           Extend Java Language
          Use a pre-processor to create Java code.
          Own compiler to create Java byte code or
           executable code, but lose the portability of
           Java.
          Manta: Vrije University
              Compiler-based high-performance Java system.
              Uses native compiler for aggressive optimisations.
              Has optimised RMI protocol (Manta RMI).



          Extend Java Language
         Titanium: UC Berkeley
             Java based language for high-performance parallel
              scientific computing.
             Titanium compiler translates Titanium into C.
             Extends Java with additional features like:
                  Classes which behave like existing C structs;
                  Multidimensional arrays;
                  An explicitly parallel SPMD model of computation with a
                   global address space;
                  A mechanism for programmer to control memory
                   management.


          Extend Java Language
         JavaParty: University of Karlsruhe
             Provides a mechanism for parallel
              programming on distributed memory
              machines.
             Compiler generates the appropriate Java
              code plus RMI hooks.
              The remote keyword is used to identify
               which objects can be called remotely.


           Web oriented
          IceT: Emory University
              Enables users to share JVMs across a network.
              A user can upload a class to another virtual machine using a
               PVM-like interface.
              By explicitly calling send and receive statements, work can be
               distributed among multiple JVMs.
          Javelin: UC Santa Barbara
              Internet-based parallel computing using Java by running Java
               applets in web browsers.
              Communication latencies are high since web browsers use
               RMI over TCP/IP, typically over slow Ethernets.



           Communications in Java
          We noted earlier that in Java we can
           communicate between processes/objects
           with either RMI or Sockets.
          In the next section we briefly discuss RMI
           (and naturally object serialization) and its
           performance.



          RMI and Object Serialization
         RMI
             Provides easy access to objects on remote machines.
             Designed for Client/Server applications over unstable
              and slow networks.
             Fast remote method invocations with low latency
              and high bandwidth are required for HPC.
         Object Serialization
             Provides the ability to read or write a whole object
              to and from a raw byte stream.
             Essential feature needed by RMI implementation
              when method arguments are passed by copy.

           RMI
          An RMI call carries baggage in the form of
           descriptors and data to support remote method
           invocation.
          Objects need to be registered with the RMI
           daemon before they can have their methods
           invoked.
          RMI has reasonable performance for the
           initiation activities of Java applications, but is
           slow compared to Java Sockets for inter-
           process communications.
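          For reference, a minimal RMI sketch (illustrative only – the Adder
          service is made up; it assumes rmiregistry is running and, for
          JDK 1.2-era RMI, stubs generated with rmic):

   import java.rmi.*;
   import java.rmi.server.UnicastRemoteObject;

   // The remote interface: methods callable from another JVM.
   interface Adder extends Remote {
       int add(int a, int b) throws RemoteException;
   }

   class AdderImpl extends UnicastRemoteObject implements Adder {
       AdderImpl() throws RemoteException { super(); }
       public int add(int a, int b) { return a + b; }
   }

   class AdderServer {
       public static void main(String[] args) throws Exception {
           // Register the object so clients can invoke its methods;
           // each remote call then marshals its arguments by serialization.
           Naming.rebind("rmi://localhost/Adder", new AdderImpl());
       }
   }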

           Problems with Object Serialization
          Does not handle float and double types efficiently
           (see the sketch after this list):
              The type cast, which is implemented in the JNI, requires various
               time consuming operations for check-pointing and state
               recovery.
              A float array invokes the above mentioned JNI routine for every
               single array element.
          Costly encoding of type information:
              For every type of serialized object, all fields of the type are
               described verbosely.
          Object creation takes too long:
              Object output and input should be overlapped to reduce
               latency.
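          A minimal sketch of the float-array case (illustrative only):

   import java.io.*;

   // Serialize and unserialize a float array through object streams – every
   // element passes through the generic serialization machinery, which is
   // where the per-element cost above arises.
   public class SerDemo {
       public static void main(String[] args) throws Exception {
           float[] data = new float[1024];
           ByteArrayOutputStream buf = new ByteArrayOutputStream();
           ObjectOutputStream out = new ObjectOutputStream(buf);
           out.writeObject(data);            // serialize: type info + elements
           out.flush();
           ObjectInputStream in = new ObjectInputStream(
               new ByteArrayInputStream(buf.toByteArray()));
           float[] copy = (float[]) in.readObject();   // unserialize
           System.out.println(buf.size() + " bytes for " + copy.length + " floats");
       }
   }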


            Costs of Serialization
                        byte            float

         t_ser        0.043 s          2.1 s
         t_unser      0.027 s          1.4 s
         t_com        0.062 s          0.25 s     (non-shared)
         t_com        0.008 s          0.038 s    (shared)

          Typical Costs of Serialization
         The costs of serializing and unserializing an
          individual float are one to two orders
          of magnitude greater than the cost of
          communicating it!




           Object serialization comments
          Relatively easy to get dramatic improvements
           in performance by customising the
           implementation (though it is then non-standard).
          Performance hit is due to marshalling and
           unmarshalling objects.
          Naïve implementation too slow for bulk data
           transfer.
          Optimizations should bring asymptotic
           performance in line with C/Fortran MPI.

           Java Message Passing
          Mentioned that message passing was the
           most popular parallel programming
           paradigm.
          RMI/Sockets best for Client/Server
           computing paradigm (numerous
           reasons!)
          MPI is the most popular API for user-level
           message passing in parallel applications.
          What follows is a brief overview of MPI.
           Message Passing Programs
          Usually the computers are identified by IP
           addresses and background processes manage
           the creation of remote processes.
          Generally, applications are based on the SPMD
           programming model.
          Each process is a copy of the same executable,
           and program behaviour depends on the
           identifier of the process.
          Processes exchange data by sending messages.

          Message Passing
         Some of the early examples of message
          passing systems include Express, NX,
          PVM, P4, and Tiny/Chimp…
         In reaction to the wide variety of non-
          standard message passing systems
          around at that time, the MPI Forum was
          formed in 1992 as a consortium of about 60
          people from universities, government
          laboratories, and industry.

           Message Passing Interface (MPI)
          MPI Version 1.0 was approved as a de
           jure standard interface for message-
           passing parallel applications in May 1994.
          The Forum continued to meet and in April
           1997, the MPI-2 standard was published.
          MPI-2 includes additional support for
           dynamic process management, one-sided
           communication, and parallel I/O.

           Message Passing Interface (MPI)
          Many implementations have been developed.
          Some implementations, such as MPICH
           (ANL/MSU), are freely available.
          Others are commercial products optimized for
           a particular manufacturer's systems, such as
           Sun or SGI machines.
          Generally, each MPI implementation is built
           over faster, less functional low-level
           interfaces, such as Active Messages, BSD
           sockets, or the SHMEM interface.

          MPI I
         The first release of the MPI standard specifies
          support for collective operations, and non-
          blocking sends and receives.
         Collective operations in MPI are those that have
          either more than one sender, or more than one
          receiver.
         Collective calls provide a more convenient
          interface for the application programmer and
          are often implemented using special, efficient,
          algorithms.

           MPI I
          MPI supports broadcast, scatter, gather,
           reduce and variations of these such as
           all-to-all, all-reduce, and all-gather.
          MPI also implements a barrier
           synchronization, which allows all
           processes to come to a common point
           before proceeding.
           MPI I specifies standards for blocking and
            non-blocking sends and receives.
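           A sketch of these calls using the mpiJava API described later in
           this talk (signatures follow the MPI 2 C++ binding; illustrative
           only):

   import mpi.*;

   class Collectives {
       static public void main(String[] args) {
           MPI.Init(args);
           int rank = MPI.COMM_WORLD.Rank();

           int[] n = new int[1];
           if (rank == 0) n[0] = 42;
           MPI.COMM_WORLD.Bcast(n, 0, 1, MPI.INT, 0);   // root 0 broadcasts

           int[] sum = new int[1];                      // sum-reduce onto root
           MPI.COMM_WORLD.Reduce(n, 0, sum, 0, 1, MPI.INT, MPI.SUM, 0);

           MPI.COMM_WORLD.Barrier();                    // common point for all
           if (rank == 0) System.out.println("sum = " + sum[0]);
           MPI.Finalize();
       }
   }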

           MPI II
          The second release of the MPI standard, MPI
           II, specifies additional standards for one-sided
           communications, similar to writes to shared
           distributed-memory locations.
          It also specifies:
              A standard way to create MPI processes;
              Support for heterogeneous platforms;
              Extended collective operations;
              I/O operations;
              Multithreading on individual computers.


           Projects: MPI and Java
          mpiJava (Syracuse/FSU)
          JavaMPI (Westminster)
          JMPI (MPI Software Technology)
          MPIJ (Brigham Young)
           jmpi (Baskent)




          Standardization?
         Currently all implementations of MPI for
          Java have different APIs.
         An “official” Java binding for MPI
          (complementing Fortran, C, C++ bindings)
          would help.
         Position paper and draft API: Carpenter,
          Getov, Judd, Skjellum and Fox, 1998.


          Java Grande Forum
         Level of interest in message-passing for Java
          healthy, but not enough to expect MPI forum
          to reconvene.
         More promising to work within the Java
          Grande Forum – Message-Passing Working
          Group formed (as a subset of the existing
          Concurrency and Applications working group).
         To avoid conflicts with MPIF, Java effort
          renamed to MPJ.

           MPJ
          Group of enthusiasts, informally chaired.
           Meetings in San Francisco (Java '99 and
            May '00), Syracuse, and Portland (SC '99).
          Regular attendance by members of Sun
           HPC group, amongst others.




          The mpiJava wrapper
          Implements a Java API for MPI suggested
           in late '97.
         Builds on work on Java wrappers for MPI
          started at NPAC about a year earlier.
         People: Bryan Carpenter, Yuh-Jye Chang,
          Xinying Li, Sung Hoon Ko, Guansong
          Zhang, Sang Lim and Mark Baker.


          mpiJava features.
         Fully featured Java interface to MPI 1.1
         Object-oriented API based on MPI 2
          standard C++ interface
         Initial implementation through JNI to
          native MPI
         Comprehensive test suite translated from
          IBM MPI suite
         Available for Solaris, Windows NT and
          other platforms
          Class hierarchy
       Package mpi:

           MPI
           Group
           Comm
               Intracomm
                   Cartcomm
                   Graphcomm
               Intercomm
           Datatype
           Status
           Request
               Prequest
          Basic Datatypes
                   MPI Datatype   Java Datatype
                   MPI.BYTE       byte
                   MPI.CHAR       char
                   MPI.SHORT      short
                   MPI.BOOLEAN    boolean
                   MPI.INT        int
                   MPI.LONG       long
                   MPI.FLOAT      float
                   MPI.DOUBLE     double
                    MPI.OBJECT     Object



           Minimal mpiJava program

   import mpi.*;

   class Hello {
       static public void main(String[] args) {
           MPI.Init(args);

           int myrank = MPI.COMM_WORLD.Rank();
           if (myrank == 0) {
               char[] message = "Hello, there".toCharArray();
               MPI.COMM_WORLD.Send(message, 0, message.length, MPI.CHAR, 1, 99);
           }
           else {
               char[] message = new char[20];
               MPI.COMM_WORLD.Recv(message, 0, 20, MPI.CHAR, 0, 99);
               System.out.println("received:" + new String(message) + ":");
           }
           MPI.Finalize();
       }
   }


          mpiJava implementation issues
         mpiJava is currently implemented as a
          Java interface to an underlying MPI
          implementation – such as MPICH or some
          other native MPI implementation.
         The interface between mpiJava and the
          underlying MPI implementation is via the
          Java Native Interface (JNI).


          mpiJava - Software Layers

                      MPIprog.java
                      import mpi.*;

                      JNI C Interface

                      Native Library (MPI)




           mpiJava implementation issues
          Interfacing Java to MPI is not always trivial –
           e.g., there are low-level conflicts between the
           Java runtime and interrupts in MPI.
          Situation improving as JDK matures - 1.2
          Now reliable on Solaris MPI (SunHPC, MPICH),
           shared memory, NT (WMPI).
          Linux - Blackdown JDK 1.2 beta just out and
           seems OK - other ports in progress.


            mpiJava - Test Machines

        Processor                  Memory           OS            Interconnect
        Dual PII 200 MHz           128 MB           NT 4 (SP3)    10 Mbps Ethernet
        Dual UltraSparc 200 MHz    256 MB           Solaris 2.5   10 Mbps Ethernet
        450 MHz PII & 100 MHz P5   256 MB & 64 MB   Linux 2.X     100 Mbps Ethernet
          The PingPong Program
      Messages of increasing size are sent back and forth
       between processes.
      This benchmark is based on standard blocking
       MPI_Send()/MPI_Recv().
      PingPong provides information about latency and
       uni-directional bandwidth.
      To ensure that anomalies in message timings do not
       occur, the PingPong is repeated for each message
       length.
       See  http://www.dcs.port.ac.uk/~mab/TOPIC
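      A sketch of the PingPong pattern in mpiJava (illustrative, not the
      actual benchmark code; run with exactly two processes):

   import mpi.*;

   class PingPong {
       static final int REPS = 100, TAG = 1;
       static public void main(String[] args) {
           MPI.Init(args);
           int rank = MPI.COMM_WORLD.Rank();
           for (int len = 1; len <= 1 << 20; len <<= 1) {
               byte[] buf = new byte[len];
               long t0 = System.currentTimeMillis();
               for (int r = 0; r < REPS; r++) {
                   if (rank == 0) {          // ping: send, then await the echo
                       MPI.COMM_WORLD.Send(buf, 0, len, MPI.BYTE, 1, TAG);
                       MPI.COMM_WORLD.Recv(buf, 0, len, MPI.BYTE, 1, TAG);
                   } else {                  // pong: echo the message back
                       MPI.COMM_WORLD.Recv(buf, 0, len, MPI.BYTE, 0, TAG);
                       MPI.COMM_WORLD.Send(buf, 0, len, MPI.BYTE, 0, TAG);
                   }
               }
               double us = (System.currentTimeMillis() - t0) * 1000.0 / (2 * REPS);
               if (rank == 0)
                   System.out.println(len + " bytes: " + us + " us one-way");
           }
           MPI.Finalize();
       }
   }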

          mpiJava performance

          Wsock      WMPI-C    WMPI-J    MPICH-C   MPICH-J   Linux-C   Linux-J
    SM    144.8 μs    67.2 μs  161.4 μs  148.7 μs  374.6 μs  132.8 μs  405.4 μs
    DM    244.9 μs   623.3 μs  689.7 μs  679.1 μs  961.2 μs  264.0 μs  527.1 μs

    (Ping-pong latency; SM = shared memory, DM = distributed memory)




    [Figure: Bandwidth (MBytes/s, log scale) versus message length (1 byte to
    1 MB), mpiJava performance in DM (distributed memory) mode. Curves: WMPI-C,
    WMPI-J, WSock, Solaris-C, Solaris-J, Linux-C, Linux-J, Java; the 10 Mbps
    and 100 Mbps Ethernet limits are marked.]
          mpiJava - CFD: inviscid flow




          mpiJava - Q-state Potts model




          Some Conclusions
         mpiJava provides a fully functional
          Java interface to MPI.
         mpiJava does not impose a huge
          overhead on top of MPI in DM mode.
         Discovered the limitations of JNI –
          the MPICH signal conflicts seem to
          have been solved in JDK 1.2.
         Careful programming needed with JNI.
          Some Conclusions
         Tests show that the additional latencies
          are due to the relatively poor
          performance of the JVM rather than
          traversing more software layers.
         mpiJava is also providing a popular
          parallel programming teaching tool.



          MPJ
         The Message-Passing Working Group of
          the Java Grande Forum has been
          working on a common messaging API.
         An initial draft for a specification was
          distributed at Supercomputing '98.
         The present API is now called MPJ.
         Version 1.3 of mpiJava will implement
          the new API.
          Problems with mpiJava
         mpiJava wrappers rely on the availability of a
          platform-dependent native MPI implementation
          for the target computer, which has some disadvantages:
             2-stage installation:
                  Get/build native MPI then install/match the Java wrappers.
                  Tedious/off-putting to new users.
             Problems – conflicts between the JVM and the native
              MPI runtime behaviour.
                  mpiJava now runs on various combinations of JVM and
                   MPI implementation.
             Strategy simply conflicts with the ethos of Java –
              write-once-run-anywhere.

          MPJ
         An MPJ reference implementation could be
          implemented as:
             Java wrappers to a native MPI implementation,
             Pure Java,
             Principally in Java – native methods to optimize
              operations (like marshalling arrays of primitive
              elements) that are difficult to do efficiently in Java.
         We are aiming at pure Java to provide an
          implementation of MPJ that is maximally
          portable and that hopefully requires the
          minimum amount of support effort.
          Benefits of a pure Java
          implementation of MPJ
         Highly portable – assumes only a Java
          development environment.
         Performance: moderate – may need JNI
          inserts for marshalling arrays.
         Network speed limited by Java sockets.
         Good for education/evaluation.
         Vendors provide wrappers to native MPI
          for ultimate performance?
          MPJ
         We envisage that a user will download a jar-file
          of MPJ library classes onto machines that may
          host parallel jobs, and install a daemon on those
          machines – technically by registering an
          activatable object with an rmid daemon.
         Parallel Java codes are compiled on one host.
         An mpjrun program invoked on that host
          transparently loads the user's class files into
          JVMs created on remote hosts by the MPJ
          daemons, and the parallel job starts.

           Design Criteria
           for an MPJ Environment
         Need an infrastructure to support
          groups of distributed processes:
             Resource discovery;
             Communications (not discussed here);
             Handling failure and fault tolerance;
             Spawn processes on hosts;
             …


           Jini as an Infrastructure
          Jini is the name for a distributed computing
           environment that can offer “network plug and
           play”.
          A device or a software service can be
           connected to a network and announce its
           presence.
          Clients that wish to use such a service can then
           locate it and call it to perform tasks.
          It seems that Jini will provide the appropriate
           infrastructure for MPJ.

          Jini Technologies




          Jini Architecture




          A Jini Community




           Lookup service
          Repository of available services.
          Stores each service as an extensible set of
           Java objects.
          Service objects may be downloaded to clients
           as required.
          May be federated with other lookup services.
          Lookup service interface provides:
              Registration, Access, Search, Removal.
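          A minimal client-side sketch using the Jini 1.x APIs (illustrative
          only – MPJDaemon is a hypothetical service interface):

   import net.jini.core.lookup.ServiceRegistrar;
   import net.jini.core.lookup.ServiceTemplate;
   import net.jini.discovery.DiscoveryEvent;
   import net.jini.discovery.DiscoveryListener;
   import net.jini.discovery.LookupDiscovery;

   interface MPJDaemon extends java.rmi.Remote { }   // hypothetical service type

   class FindDaemon implements DiscoveryListener {
       public void discovered(DiscoveryEvent ev) {
           try {
               ServiceRegistrar reg = ev.getRegistrars()[0];
               ServiceTemplate tmpl = new ServiceTemplate(
                   null, new Class[] { MPJDaemon.class }, null);
               // The matching service proxy object is downloaded to the client.
               MPJDaemon daemon = (MPJDaemon) reg.lookup(tmpl);
           } catch (Exception e) { e.printStackTrace(); }
       }
       public void discarded(DiscoveryEvent ev) { }

       public static void main(String[] args) throws Exception {
           LookupDiscovery disco = new LookupDiscovery(LookupDiscovery.ALL_GROUPS);
           disco.addDiscoveryListener(new FindDaemon());
           Thread.sleep(10000);              // wait for discovery callbacks
       }
   }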



          Discovery and Join
         Allow a Jini service (hardware or
          software) to:
             Find and join a group of Jini services;
             Advertise capabilities/services;
             Provide required software and attributes.




          Registering a Service

    [Diagram: a server registers its service with the lookup service – the
    server sends a discovery request, the lookup service returns its registrar,
    and the server uploads its service proxy object in a service registration.]
          Jini Lookup




          Acquiring compute slaves
          through Jini
    [Diagram: "mpjrun …" issued on one host contacts MPJ daemons, located
    through Jini, on the other hosts; each daemon spawns an MPJ process.]
           Leasing in Jini
          Protocol for managing resources using a
           renewable, duration-based model.
          Contract between objects.
          Provides a method of managing
           resources in an environment where
           network failures can, and do, occur.
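          A sketch of the client side of leasing (assuming the
          net.jini.core.lease.Lease interface; illustrative only):

   import net.jini.core.lease.Lease;

   // The client keeps a remote resource alive by renewing its lease; if the
   // client dies or the network fails, renewals stop and the grantor reclaims
   // the resource once the lease expires.
   class LeaseKeeper extends Thread {
       private final Lease lease;
       LeaseKeeper(Lease lease) { this.lease = lease; }
       public void run() {
           try {
               while (!isInterrupted()) {
                   lease.renew(60 * 1000);   // request another 60 seconds
                   Thread.sleep(30 * 1000);  // renew well before expiry
               }
           } catch (Exception e) {
               // Lease denied or lost: the resource will be reclaimed.
           }
       }
   }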



           Distributed events in Jini
          Enables Java event model to work in a
           distributed network.
          Register interest, receive notification.
          Allows for use of event managers.
          Can use numerous distributed delivery
           models:
              Push, pull, filter...


           Transactions in Jini
          Designed for distributed object
           coordination: lightweight, object-
           oriented.
          Supports nested transactions.
          Supports various levels of ACID
           properties (Atomicity, Consistency,
           Isolation, Durability).
          Implemented in Transaction Manager
           service.
           MPJ - Implementation
          In the short-to-medium-term – before Jini
           software is widely installed – we might have
           to provide a “lite” version of MPJ that is
           unbundled from Jini.
          Designing for the Jini architecture should,
           nevertheless, have a beneficial influence on
           overall robustness and maintainability.
          Use of Jini implies use of RMI for various
           management functions.

           MPJ – Implementation
          The role of the MPJ daemons and their
           associated infrastructure is to provide an
           environment consisting of a group of
           processes with the user-code loaded and
           running in a reliable way.
          The process group is reliable in the
           sense that no partial failures should be
           visible to higher levels of the MPJ
           implementation or the user code.

          MPJ – Implementation
         We will use Jini leasing to provide fault
          tolerance – clearly no software
          technology can guarantee the absence of
          total failures, where the whole MPJ job
          dies at essentially the same time.




          MPJ – Implementation
         If any slave dies, a client generates a Jini
          distributed event, MPIAbort() – all
          slaves are notified and all processes are
          killed.
         In case of other failures (network failure,
          death of client, death of controlling
          daemon, …) client leases on slaves
          expire in a fixed time, and processes are
          killed.
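         A sketch of what a slave-side listener might look like (illustrative,
         not the actual MPJ code):

   import java.rmi.RemoteException;
   import java.rmi.server.UnicastRemoteObject;
   import net.jini.core.event.RemoteEvent;
   import net.jini.core.event.RemoteEventListener;
   import net.jini.core.event.UnknownEventException;

   // Exported via RMI so the client can notify it remotely; on receiving the
   // abort event, the slave kills its local processes.
   class AbortListener extends UnicastRemoteObject implements RemoteEventListener {
       AbortListener() throws RemoteException { super(); }
       public void notify(RemoteEvent ev)
               throws UnknownEventException, RemoteException {
           System.exit(1);
       }
   }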

           MPJ – Implementation
          Checkpointing and restarting of
           interrupted jobs may be quite useful.
          Checkpointing would not happen without
           explicit invocation in the user-level code,
           nor would restarting happen
           automatically.
          A serialized object can be saved to disk
           and restarted.
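          A minimal sketch of the serialization-based checkpoint idea
          (illustrative only):

   import java.io.*;

   class Checkpoint {
       // Save any serializable application state to disk...
       static void save(Serializable state, String file) throws IOException {
           ObjectOutputStream out =
               new ObjectOutputStream(new FileOutputStream(file));
           out.writeObject(state);
           out.close();
       }
       // ...and restore it later; in MPJ both steps would still require
       // explicit invocation from the user-level code.
       static Object restore(String file)
               throws IOException, ClassNotFoundException {
           ObjectInputStream in =
               new ObjectInputStream(new FileInputStream(file));
           Object state = in.readObject();
           in.close();
           return state;
       }
   }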

           MPJ - Implementation
          Once a reliable cocoon of user processes has
           been created through negotiation with the
           daemons, we have to establish connectivity.
          In the reference implementation this will be
           based on Java sockets.
          Recently there has been interest in producing
           Java bindings to VIA - eventually this may
           provide a better platform on which to
           implement MPI, but for now sockets are the
           only realistic, portable option.

    [Diagram: "mpjrun myproggy –np 4" issued on the host contacts the MPJ
    daemon (supported by rmid and an http server), which starts slaves 1-4.]
           MPJ – Future Work
          On-going effort (proposal + volunteer help).
          Collaboration to define exact MPJ interface
           – consisting of other Java MP system
           developers.
          Work at the moment is based around the
           development of the low-level MPJ device
           and exploring the functionality of Jini.
          Looking deeply at security!

           Summary and Conclusions
          Using Java as a language for high performance
           applications has been shown to be possible,
           and it is in fact being used.
          It is possible to use Java for parallel
           applications, from fine-grain ones to coarse-
           grain meta-problems.
          On going efforts within Java Grande to improve
           Java for scientific and engineering applications:
              Implementation issues (numerics);
              Semantics (serialization, concurrency).


          Summary and Conclusions
         MPJ – a unified effort to produce a
          “message passing” interface with MPI-like
          functionality for Java.
         Jini can be used as the infrastructure to
          produce a runtime environment for MPJ.
         Java is being increasingly used to develop
          high performance applications.


          Summary and Conclusions
         Java is, however, restricted: until the
          issues being addressed by the Java
          Grande Forum are adopted
          by Sun, its usage for high performance
          applications will remain restricted.





				