Linking Programming Models between Grids, Web 2.0 and Multicore

Distributed Programming Abstractions Workshop, NeSC, Edinburgh UK
May 31 2007


Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401
gcf@indiana.edu
http://www.infomall.org

                          Points in Talk I
    All parallel programming projects more or less fail
   All distributed programming projects report success
     • There are several hundred in Grid workflow area alone
   Few constraints on distributed programming
   Composition (in distributed computing) v decomposition (in parallel
    computing)
   There is not much difference between distributed programming and a key
    paradigm of parallel computing (functional parallelism)
    Pervasive use of 64 core chips in the future will often require one to build a
     Grid on a chip, i.e. to execute a traditional distributed application on a chip
   XML is a pretty dubious syntax for expressing programs
   Web 2.0 is pretty scruffy but there are some large companies and many users
    behind it.
   Web 2.0 and Grids will converge and features of both will survive or disappear
    in merged environment
   Web 2.0 has a more plausible approach to distributed programming than Web
    Services/Grids
    Dominant Distributed Programming models will support Multicore, Web 2.0
     and Grids

                        Some More Points
    Services could be a universal abstraction in parallel and distributed computing
     •   Whereas objects could not be universal, so perhaps we should move away from their use
    Gateways/Portals (Portlets, Widgets, Gadgets) are the natural user (application usage)
     interface to a collection of services
    Important data abstractions: SQL, WFS, RSS feeds
   Divide Parallel Programming Run-time (matching application structure) into 3 or 4
    Broad classes
    Inter-entity communication time is characteristic of the different programming models
     •   1-5 µs for MPI/thread switching, about 25 µs for services inside a chip, and 1-1000
         milliseconds for services on the Grid
   Multicore Commodity Programming Model
     •   A “Marine Corps” of experts writes libraries in “HLA++”, MPI or dynamic threads (internally one
         microsecond latency) expressed as services
     •   Services composed/mashed up by “millions” of users
   Many composition (coordination) or mashup approaches
     •   Functional (cf. Google Map Reduce for data transformations)
     •   Dataflow
     •   Workflow
     •   Visual
     •   Script
    The difficulty of making effective use of multicore chips will be so great that it will be the
     main driver of new programming environments
    Microsoft CCR/DSS is a good example of unification of parallel and distributed
     computing
                    Some Details
   See http://www.slideshare.net/Foxsden or, more conventionally:
   Web 2.0 and Grid Tutorial
    • http://grids.ucs.indiana.edu/ptliupages/presentations/CTSpartIMay21-07.ppt
    • http://grids.ucs.indiana.edu/ptliupages/presentations/Web20Tutorial_CTS.ppt
   Multicore and Parallel Computing Tutorial
    • http://grids.ucs.indiana.edu/ptliupages/presentations/PC2007/index.html
   “Web 2.0” citation site
    http://www.connotea.org/user/crmc
        Web 2.0 and Web Services I
   Web Services have clearly defined protocols (SOAP) and a well defined
    mechanism (WSDL) to define service interfaces
     • There is good .NET and Java support
     • The so-called WS-* (WS-Nightmare) specifications provide a rich
        sophisticated but complicated standard set of capabilities for security,
        fault tolerance, meta-data, discovery, notification etc.
   “Narrow Grids” build on Web Services and provide a robust managed
    environment with growing adoption in Enterprise systems and distributed
    science (e-Science)
    We can use the term Grids strictly for Narrow Grids that are collections of
     Web Services (or even more strictly OGSA Grids), or loosely call any collection
     of services a “Broad Grid”, which is actually quite often done
    Web 2.0 supports a similar architecture to Web Services but has developed in
     a more chaotic yet remarkably successful fashion, with a service architecture
     using a variety of protocols including those of Web and Grid services
      • Over 400 Interfaces defined at http://www.programmableweb.com/apis
    One can easily combine SOAP (Web Service) based services/systems with plain
     HTTP messages, but the “lowest common denominator” argument suggests the
     additional structure/complexity of SOAP will not easily survive (see the sketch below)
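To make the contrast concrete, here is a minimal sketch (not from the talk; the endpoints, operation name and parameters are hypothetical) of the same logical request issued as a bare HTTP/REST call and as a SOAP envelope, using Python's requests library:

```python
# Illustrative sketch only: the endpoints and operation names are hypothetical.
# It contrasts a bare HTTP/REST call with the extra envelope structure SOAP
# requires for the same logical request.
import requests

# REST, the "lowest common denominator": the request is just a URL plus parameters
rest_reply = requests.get(
    "https://api.example.org/geocode",          # hypothetical service
    params={"q": "Bloomington, IN"},
)
print(rest_reply.json())

# SOAP: the same request wrapped in an XML envelope and posted to one endpoint
soap_body = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <Geocode xmlns="http://example.org/geo">
      <Query>Bloomington, IN</Query>
    </Geocode>
  </soap:Body>
</soap:Envelope>"""
soap_reply = requests.post(
    "https://api.example.org/soap",             # hypothetical service
    data=soap_body,
    headers={"Content-Type": "application/soap+xml"},
)
print(soap_reply.text)
```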
      Web 2.0 and Web Services II
   Web 2.0 also has many well known capabilities with Google
    Maps and Amazon Compute/Storage services of clear general
    relevance
   There are also Web 2.0 services supporting novel collaboration
    modes and user interaction with the web as seen in social
    networking sites and portals such as: MySpace, YouTube,
    Connotea, Slideshare ….
   I once thought Web Services were inevitable but this is no longer
    clear to me
    Web services are complicated, slow and non-functional
      • WS-Security is unnecessarily slow and pedantic
        (canonicalization of XML)
      • WS-RM (Reliable Messaging) seems to have poor adoption
        and doesn’t work well in collaboration
      • WSDM (distributed management) specifies a lot
   There are de facto standards like Google Maps and powerful
    suppliers like Google which “define the rules”
     Attack of the Killer Multicores
   Today commodity Intel systems are sold with 8 cores spread over
    two processors
    Specialized chips such as GPUs and the IBM Cell processor have
     substantially more cores
    Moore’s Law will now be satisfied by (and will imply) an
     exponentially increasing number of cores, doubling every 1.5-3
     years
      • Modest increase in clock speed
      • Intel has already prototyped an 80 core server chip, ready in
        2011?
   Huge activity in parallel computing programming (recycled from
    the past?)
     • Some programming models and application styles similar to
       Grids
   We will have a Grid on a chip …………….
    Grids meet Multicore Systems
   The expected rapid growth in the number of cores per chip has
    important implications for Grids
    With 16-128 cores on a single commodity system 5 years from
     now, one will both be able to build a Grid-like application on a
     chip and indeed must build such an application to get the
     Moore’s law performance increase
      • Otherwise you will “waste” cores …..
    One will not want to reprogram as one moves an application
     from a 64 node cluster or transcontinental implementation to a
     single chip Grid
   However multicore chips have a very different architecture from
    Grids
     • Shared not Distributed Memory
     • Latencies measured in microseconds not milliseconds
    Thus Grid and multicore technologies will need to “converge”
     and the converged technology model will have different
     requirements from current Grid assumptions
    Grid versus Multicore Applications
   It seems likely that future multicore applications will
    involve a loosely coupled mix of multiple modules that
    fall into three classes
     • Data access/query/store
     • Analysis and/or simulation
     • User visualization and interaction
    This is precisely the mix that Grids support but Grids of
     course involve distributed modules
    Grids and Web 2.0 use service oriented architectures to
     describe systems at the module level – is this an appropriate
     model for multicore programming?
   Where do multicore systems get their data from?
   RMS: Recognition Mining Synthesis

   Recognition (What is …?):  build a Model
   Mining (Is it …?):         find a model instance
   Synthesis (What if …?):    create a model instance

   Today:    model-less recognition; real-time streaming and transactions on
             static, structured datasets; very limited realism in synthesis
   Tomorrow: model-based multimodal recognition; real-time analytics on
             dynamic, unstructured, multimodal datasets; photo-realism and
             physics-based animation

   Intel has probably the most sophisticated analysis of future “killer”
   multicore applications – they are “just” standard Grid and parallel computing
   (Pradeep K. Dubey, pradeep.dubey@intel.com)
          Recognition                     Mining                          Synthesis
          What is a tumor?         Is there a tumor here?        What if the tumor progresses?

   It is all about dealing efficiently with complex multimodal datasets

   Images courtesy: http://splweb.bwh.harvard.edu:8000/pages/images_movies.html
   (Pradeep K. Dubey, pradeep.dubey@intel.com)
Intel’s Application Stack
    Role of Data in Grid/Multicore I
   One typically is told to place compute (analysis) at the
    data but most of the computing power is in multicore
    clients on the edge
   These multicore clients can get data from the internet
    i.e. distributed sources
     • This could be data of personal interest to the client, used by the client to
       help the user interact with the world
    • It could be cached or copied
    • It could be a standalone calculation or part of a distributed
      coordinated computation (SETI@Home)
    Or they could get data from a set of local sensors (video-cams and
     environmental sensors), naturally stored on the client or locally to the client
     Role of Data in Grid/Multicore II
    Note that as you increase the sophistication of data
     analysis, you increase the ratio of compute to I/O
     (a back-of-envelope sketch follows at the end of this slide)
     • A typical modern datamining approach like the Support Vector
       Machine is sophisticated (dense) matrix algebra and not just
       text matching
    • http://grids.ucs.indiana.edu/ptliupages/presentations/PC2007/PC07BYOPA.ppt

    The time complexity of sophisticated data analysis will
     make it more attractive to fetch data from the Internet
     and cache/store it on the client
    • It will also help with memory bandwidth problems in
      multicore chips
   In this vision, the Grid “just” acts as a source of data
    and the Grid application runs locally
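A back-of-envelope sketch (my illustration, not from the slides; the dataset sizes are hypothetical) of why dense matrix algebra such as the kernel computation in SVM training raises the compute-to-I/O ratio:

```python
# Back-of-envelope sketch (not from the talk): building a kernel (Gram) matrix,
# as in SVM training, reads O(n*d) data but performs O(n^2 * d) arithmetic, so
# the compute-to-I/O ratio grows with the dataset size n.
import numpy as np

n, d = 5_000, 100                        # hypothetical dataset size
X = np.random.rand(n, d)

bytes_read = X.nbytes                    # data fetched from disk or network
flops      = 2 * n * n * d               # multiply-adds for K = X @ X.T

K = X @ X.T                              # dense matrix algebra, not text matching
print(f"compute/I-O ratio ~ {flops / bytes_read:.0f} flops per byte")
```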
 Multicore Programming Paradigms
• At a very high level, there are three or four broad classes of
  parallelism
• Coarse grain functional parallelism typified by workflow
  and often used to build composite “metaproblems” whose
  parts are also parallel
   – “Compute-File”, Database/Sensor, Community, Service, Pleasingly
     Parallel (Master-worker) are sub-classes
• Large Scale loosely synchronous data parallelism where
  dynamic irregular work has clear synchronization points as
  in most large scale scientific and engineering problems
• Fine grain (asynchronous) thread parallelism as used in
  search algorithms which are often data parallel (over
  choices) but don’t have universal synchronization points
• Discrete Event Simulations are either a fourth class or a
  variant of thread parallelism
      Data Parallel Time Dependence
• A simple form of data parallel application is synchronous, with all elements
  of the application space being evolved with essentially the same instructions
• Such applications are suitable for SIMD computers and run well on vector
  supercomputers (and GPUs but these are more general than just
  synchronous)
• However synchronous applications also run fine on MIMD machines
• SIMD CM-2 evolved to MIMD CM-5 with same data parallel language
  CMFortran
• The iterative solutions to Laplace’s equation are synchronous, as are many full
  matrix algorithms (see the sketch below)
  [Figure: synchronous evolution – identical evolution algorithms applied across
   Application Space at successive Application Times t0 … t4]
• Synchronization on MIMD machines is accomplished by messaging; it is
  automatic on SIMD machines!
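As a concrete illustration of this synchronous style (a minimal sketch, not from the slides), Jacobi iteration for Laplace's equation updates every interior point with the identical rule at each sweep:

```python
# Minimal sketch (not from the talk): Jacobi iteration for Laplace's equation.
# Every interior point is updated with the same rule each sweep, which is what
# makes the application "synchronous" in the sense of this slide.
import numpy as np

n = 64
u = np.zeros((n, n))
u[0, :] = 1.0                     # fixed boundary condition on one edge

for step in range(1000):
    # identical update applied across the whole application space at once
    u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                            u[1:-1, :-2] + u[1:-1, 2:])
```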
     Local Messaging for Synchronization
•   MPI_SENDRECV is the typical primitive (see the sketch below)
•   Processors do a send followed by a receive, or a receive followed by a send
•   In two stages (needed to avoid race conditions), one has a complete left shift
•   Often followed by an equivalent right shift, to get a complete exchange
•   This logic guarantees correctly updated data is sent to processors that have their data at the same
    simulation time




  [Figure: 8 processors advancing through alternating Compute and Communication
   phases in application/processor time, across Application Space]
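A minimal sketch of the left-shift pattern described above, assuming mpi4py is available (illustrative only, not the original code):

```python
# Minimal sketch (using mpi4py, not the original code): a left shift done with
# MPI Sendrecv, the typical primitive named on the slide.  Run with e.g.
#   mpiexec -n 8 python shift.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

left  = (rank - 1) % size
right = (rank + 1) % size

local_data = {"owner": rank, "t": 0}          # stand-in for boundary data

# send to the left neighbour while receiving from the right neighbour;
# sendrecv pairs the two stages so there is no race or deadlock
from_right = comm.sendrecv(local_data, dest=left, source=right)
print(f"rank {rank} now holds data from rank {from_right['owner']}")
```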
      Loosely Synchronous Applications
• This is the most common style in large scale science and engineering:
  one has the traditional data parallelism but now each data point has,
  in general, a different update
    – Comes from heterogeneity in problems that would be
      synchronous if homogeneous
• Time steps are typically uniform but sometimes one needs to support variable time
  steps across the application space – however one ensures small time steps are
  Δt = (t1 - t0)/n for an integer n, so subspaces with finer time steps do synchronize
  with the full domain
  [Figure: as the synchronous case, but with distinct evolution algorithms for
   each data point in each processor]
  • The time synchronization via messaging is still valid
  • However one no longer load balances (ensures each processor does equal
    work in each time step) by putting an equal number of points in each processor
  • Load balancing, although NP complete, is in practice surprisingly easy
               MPI Futures?
• MPI likely to become more important as
  multicore systems become more common
• One should use MPI when MPI is needed, and use other
  messaging for other cases (such as linking
  services) where different features/performance are
  appropriate
• MPI has too many primitives, which will
  handicap broad implementation/adoption
• Perhaps only have one collective primitive, as in
  CCR, which allows general collective operations
  to be built by the user (see the sketch below)
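A sketch of the idea of user-built collectives (my illustration, assuming mpi4py): a global sum assembled on every rank purely from the point-to-point sendrecv primitive:

```python
# Minimal sketch (mpi4py, not from the talk): building a "collective" - here a
# global sum available on every rank - purely from the point-to-point sendrecv
# primitive, in the spirit of letting users compose their own collectives.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

left, right = (rank - 1) % size, (rank + 1) % size

total   = float(rank)          # each rank contributes its own value
sending = float(rank)

# pass partial contributions around the ring; after size-1 steps every rank
# has accumulated every other rank's value
for _ in range(size - 1):
    received = comm.sendrecv(sending, dest=right, source=left)
    total   += received
    sending  = received

print(f"rank {rank}: user-built allreduce sum = {total}")
```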
   Fine Grain Dynamic Applications
• Here there is no natural universal ‘time’ as there is in science
  algorithms, where an iteration number or Mother Nature’s time
  gives global synchronization
• Loose (zero) coupling or special features of the application are
  needed for successful parallelization
• In computer chess, the minimax scores at parent nodes provide
  multiple dynamic synchronization points
  [Figure: Application Time vs. Application Space for a fine grain dynamic
   application]
 Computer Chess
• Thread level parallelism unlike
  position evaluation parallelism
  used in other systems
• Competed with poor reliability
  and results in 1987 and 1988
  ACM Computer Chess
  Championships




  [Figure: chess game tree explored to increasing search depth]
         Discrete Event Simulations
• These are familiar in military and circuit (system) simulations
  when one uses macroscopic approximations
   – Also probably paradigm of most multiplayer Internet games/worlds
• Note Nature is perhaps synchronous when viewed quantum
  mechanically in terms of uniform fundamental elements (quarks
  and gluons etc.)
• It is loosely synchronous when considered in terms of particles
  and mesh points
• It is asynchronous when viewed in terms of tanks, people, arrows etc.
  [Figure: Battle of Hastings]
• Circuit simulations can be done loosely synchronously, but this is
  inefficient as many elements are inactive
               Programming Models
• The three major models are supported by HPCS languages which
  are very interesting but too monolithic
• So the Fine grain thread parallelism and Large Scale loosely
  synchronous data parallelism styles are distinctive to parallel
  computing while
• Coarse grain functional parallelism of multicore overlaps with
  workflows from Grids and Mashups from Web 2.0
• It seems plausible that a more uniform approach will evolve for the coarse
  grain case, although this is the least constrained of the programming
  styles as typically latency issues are not critical
   – Multicore would have the strongest performance constraints
   – Web 2.0 and Multicore have the most important usability constraints
• A possible model for broad use of multicores is that the difficult
  parallel algorithms are coded as libraries (Fine grain thread
  parallelism and Large Scale loosely synchronous data parallelism
  styles) while the general user composes with visual interfaces,
  scripting and systems like Google MapReduce
            Google MapReduce
Simplified Data Processing on Large Clusters
• http://labs.google.com/papers/mapreduce.html
• This is a dataflow model between services where services can do useful
  document oriented data parallel applications including reductions
• The decomposition of services onto cluster engines is automated
• The large I/O requirements of datasets change the efficiency analysis in favor of
  dataflow
• Services (count words in the example) can obviously be extended to general
  parallel applications (see the sketch below)
• There are many alternative languages for expressing dataflow and/or parallel
  operations, and indeed one should support multiple languages in the spirit
  of services
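A minimal sketch of the word-count example in the MapReduce style (not Google's implementation; the sequential driver here merely stands in for the cluster runtime):

```python
# Minimal sketch (not Google's code): word count expressed as user-supplied
# map and reduce functions, with a trivial sequential driver standing in for
# the cluster runtime that MapReduce would provide.
from collections import defaultdict

def map_fn(doc_name, text):
    """Emit (word, 1) for every word in a document."""
    for word in text.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    """Combine all counts emitted for one word."""
    return word, sum(counts)

def run_mapreduce(documents):
    grouped = defaultdict(list)
    for name, text in documents.items():          # map phase
        for key, value in map_fn(name, text):
            grouped[key].append(value)
    return dict(reduce_fn(k, v) for k, v in grouped.items())   # reduce phase

print(run_mapreduce({"d1": "the cat sat", "d2": "the cat ran"}))
# {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```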




          Programming Models
• The services and objects in distributed
  computing are usually “natural” (come from
  application) whereas parts connected by MPI (or
  created by parallelizing compiler) come from
  “artificial” decompositions and not naturally
  considered services
• Services in multicore (parallel computing) are the original
  modules before decomposition, and it is
  these modules that coarse grain functional
  parallelism addresses
• Most of “difficult” issues in parallel computing
  concern treatment of decomposition
Parallel Software Paradigms: Top Level
• In the conventional two-level Grid/Web Service
  programming model, one programs each
  individual service and then separately programs
  their interaction
  – This is Grid-aware Services programming model
  – SAGA supports Grid-aware programs?
• This is generalized to multicore with “Marine
  Corps” programming services for “difficult”
  cases
  – Loosely Synchronous
  – Fine Grain threading
  – Discrete Event Simulation
      The Marine Corps Lack of
Programming Paradigm Library Model
• One could assume that parallel computing is “just too hard
  for real people” and assume that we use a Marine Corps of
  programmers to build as libraries excellent parallel
  implementations of “all” core capabilities
   – e.g. the primitives identified in the Intel application
     analysis
   – e.g. the primitives supported in Google MapReduce, HPF,
     PeakStream, Microsoft Data Parallel .NET etc.
• These primitives are orchestrated (linked together) by
  overall frameworks such as workflow or mashups
• The Marine Corps probably is content with efficient rather
  than easy to use programming models
  Component Parallel and Program Parallel
• Component parallel paradigm is where one explicitly programs
  the different parts of a parallel application with the linkage either
  specified externally as in workflow or in components themselves
  as in most other component parallel approaches
   – In Grids, components are natural
   – In Parallel computing, components are produced by decomposition
• In the program parallel paradigm, one writes a single program to
  describe the whole application and some combination of compiler
  and runtime breaks up the program into the multiple parts that
  execute in parallel
• Note that a program parallel approach will often call a built in
  runtime library written in component parallel fashion
   – A parallelizing compiler could call an MPI library routine
• One could perhaps better call “Program Parallel” “Implicitly
  Parallel” and “Component Parallel” “Explicitly Parallel” (a small
  contrast is sketched below)
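An illustrative contrast (my sketch, not from the talk) of the two styles on a trivial task, with NumPy standing in for the implicit, whole-data-structure style and multiprocessing for the explicit, component parallel style:

```python
# Illustrative sketch (not from the talk) of the two styles, using a trivial
# "multiply every element by 2" task.
import numpy as np
from multiprocessing import Pool

data = np.arange(1_000_000, dtype=np.float64)

# "Program parallel" / implicit style: the user writes one whole-array
# expression and leaves any decomposition to the library, compiler or runtime.
result_implicit = 2.0 * data

# "Component parallel" / explicit style: the user decomposes the data into
# parts and explicitly programs the workers that handle each part.
def worker(chunk):
    return 2.0 * chunk

if __name__ == "__main__":
    chunks = np.array_split(data, 4)
    with Pool(processes=4) as pool:
        result_explicit = np.concatenate(pool.map(worker, chunks))
    assert np.allclose(result_implicit, result_explicit)
```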
  Component Parallel and Program Parallel
• Program Parallel approaches include
   – Data structure parallel as in Google MapReduce, HPF (High
     Performance Fortran), HPCS (High-Productivity Computing
     Systems) or “SIMD” co-processor languages (PeakStream,
     ClearSpeed and Microsoft Data Parallel .NET)
   – Parallelizing compilers including OpenMP annotation
   – Note OpenMP and HPF have failed in some sense for large scale
     parallel computing (writing algorithm in standard sequential
     languages throws away information needed for parallelization)
• Component Parallel approaches include
   – MPI (and related systems like PVM) parallel message passing
   – PGAS (Partitioned Global Address Space: CAF, UPC, Titanium,
     HPJava)
   – C++ futures and active objects
   – CSP … Microsoft CCR and DSS
   – Workflow and Mashups
   – Discrete Event Simulation
            Why people like MPI!
• Jason J. Beech-Brandt and Andrew A. Johnson, at AHPCRC
  Minneapolis
• BenchC is an unstructured finite element CFD solver
• Looked at OpenMP on shared memory Altix, with some effort to optimize
• Optimized UPC on several machines
• MPI always good but other approaches erratic
• Other studies reach similar conclusions?
  [Figure: BenchC performance after optimization of UPC, on Altix and cluster
   machines]
    Web 2.0 Systems are Portals, Services, Resources
   Captures the incredible development of interactive
    Web sites enabling people to create and collaborate




                      The world does itself in large numbers!
                Mashups v Workflow?
   Mashup Tools are reviewed at http://blogs.zdnet.com/Hinchcliffe/?p=63
   Workflow Tools are reviewed by Gannon and Fox
    http://grids.ucs.indiana.edu/ptliupages/publications/Workflow-overview.pdf
   Both include scripting in PHP, Python, sh etc. as both implement
    distributed programming at the level of services
   Mashups use all types of service interfaces and do not have the
    potential robustness (security) of the Grid service approach
   Typically “pure” HTTP (REST)
                    Web 2.0 APIs
   http://www.programmableweb.com/apis has (May 14 2007) 431 Web 2.0
    APIs, with Google Maps the most often used in Mashups
   This site acts as a “UDDI” for Web 2.0
The List of Web 2.0 APIs
   Each entry gives the API and its features
   Divided into broad categories
   Only a few used a lot (42 APIs used in more than 10 mashups)
   RSS feed of new APIs
   Amazon S3 growing in popularity
APIs/Mashups per Protocol Distribution
  [Figure: bar chart of number of APIs and number of Mashups per protocol
   (REST, SOAP, XML-RPC, combinations, JS, Other), with individual APIs such as
   Google Maps, Virtual Earth, Flickr, Amazon ECS, eBay, del.icio.us, Yahoo!
   search/geocoding/images/local, technorati, 411sync, trynt, Netvibes, live.com,
   Amazon S3 and YouTube labelled]
   4 more Mashups each day, for a total of 1906 on April 17 2007 (4.0 a day
    over the last month)
   Note ClearForest runs Semantic Web Services Mashup competitions (not
    workflow competitions)
   Some Mashup types: aggregators, search aggregators, visualizers, mobile,
    maps, games
   Growing number of commercial Mashup Tools
    Implication for Grid Technology
      of Multicore and Web 2.0 I
   Web 2.0 and Grids are addressing a similar application
    class although Web 2.0 has focused on user interactions
     • So technology has similar requirements
   Multicore differs significantly from Grids in
    component location and this seems particularly
    significant for data
     • Not clear therefore how similar applications will be
     • Intel RMS multicore application class pretty similar
       to Grids
    Multicore has more stringent software requirements
     than Grids as the latter has intrinsic network overhead
    Implication for Grid Technology
      of Multicore and Web 2.0 II
    Multicore chips require low overhead protocols to
     exploit their low latency; that suggests simplicity
    • We need to simplify MPI AND Grids!
   Web 2.0 chooses simplicity (REST rather than SOAP)
    to lower barrier to everyone participating
    Web 2.0 and Multicore tend to use traditional (possibly
     visual) scripting languages for the equivalent of workflow,
     whereas Grids use a visual interface with the backend recorded
     in BPEL
     • Google MapReduce illustrates a popular Web 2.0
       and Multicore approach to dataflow
    Implication for Grid Technology
      of Multicore and Web 2.0 III
    Web 2.0 and Grids both use SOA (Service Oriented
     Architecture)
      • Seems likely that Multicore will also adopt SOA, although a more
        conventional object oriented approach is also possible
     • Services should help multicore applications integrate
       modules from different sources
     • Multicore will use fine grain objects but coarse grain
       services
   “System of Systems”: Grids, Web 2.0 and Multicore are likely
    to build systems hierarchically out of smaller systems
     • We need to support Grids of Grids, Webs of Grids, Grids
       of Multicores etc. i.e. systems of systems of all sorts
          The Ten areas covered by the 60 core WS-*
                        Specifications
WS-* Specification Area           Typical Grid/Web Service Examples
1: Core Service Model             XML, WSDL, SOAP
2: Service Internet               WS-Addressing, WS-MessageDelivery; Reliable
                                  Messaging WS-RM; Efficient Messaging MTOM
3: Notification                   WS-Notification, WS-Eventing (Publish-
                                  Subscribe)
4: Workflow and Transactions      BPEL, WS-Choreography, WS-Coordination
5: Security                       WS-Security, WS-Trust, WS-Federation, SAML,
                                  WS-SecureConversation
6: Service Discovery              UDDI, WS-Discovery
7: System Metadata and State      WSRF, WS-MetadataExchange, WS-Context
8: Management                     WSDM, WS-Management, WS-Transfer
9: Policy and Agreements          WS-Policy, WS-Agreement
10: Portals and User Interfaces   WSRP (Remote Portlets)
                        WS-* Areas and Web 2.0
WS-* Specification Area        Web 2.0 Approach
1: Core Service Model          XML becomes optional but still useful
                               SOAP becomes JSON RSS ATOM
                               WSDL becomes REST with API as GET PUT etc.
                               Axis becomes XmlHttpRequest
2: Service Internet            No special QoS. Use JMS or equivalent?
3: Notification                 Hard with HTTP without polling – JMS perhaps?
4: Workflow and Transactions   Mashups, Google MapReduce
(no Transactions in Web 2.0)   Scripting with PHP JavaScript ….
5: Security                    SSL, HTTP Authentication/Authorization,
                               OpenID is Web 2.0 Single Sign on
6: Service Discovery           http://www.programmableweb.com
7: System Metadata and State   Processed by application – no system state –
                               Microformats are a universal metadata approach
8: Management==Interaction     WS-Transfer style Protocols GET PUT etc.
9: Policy and Agreements       Service dependent. Processed by application
10: Portals and User Interfaces Start Pages, AJAX and Widgets (Netvibes), Gadgets
                        WS-* Areas and Multicore
WS-* Specification Area           Multicore Approach
1: Core Service Model             Fine grain Java C# C++ Objects and coarse grain
                                  services as in DSS. Information passed explicitly
                                  or by handles. MPI needs to be updated to handle
                                  non scientific applications as in CCR
2: Service Internet               Not so important intrachip
3: Notification                   Publish-Subscribe for events and Interrupts
4: Workflow and Transactions      Many approaches; scripting languages popular
5: Security                       Not so important intrachip
6: Service Discovery              Use libraries
7: System Metadata and State      Environment Variables
8: Management == Interaction      Interaction between objects key issue in parallel
                                  programming trading off efficiency versus
                                  performance
9: Policy and Agreements          Handled by application
10: Portals and User Interfaces   Web 2.0 technology popular
CCR as an example of a Cross Paradigm
             Run Time
• Naturally supports fine grain thread switching and message
  passing, with around 4 microsecond latency for 4 threads
  switching to 4 others on an AMD PC with C#. Threads are
  spawned – no rendezvous
• Has around 50 microsecond latency for coarse
  grain service interactions with DSS extension
  which supports Web 2.0 style messaging
• MPI Collectives – Shift and Exchange vary from
  10 to 20 microsecond latency in rendezvous mode
• Not as good as the best MPIs, but it is managed code and
  supports Grids, Web 2.0 and Parallel Computing
                     Microsoft CCR
• Supports exchange of messages between threads using named
  ports
• FromHandler: Spawn threads without reading ports
• Receive: Each handler reads one item from a single port
• MultipleItemReceive: Each handler reads a prescribed number of
  items of a given type from a given port. Note items in a port can
  be general structures but all must have same type.
• MultiplePortReceive: Each handler reads one item of a given
  type from multiple ports.
• JoinedReceive: Each handler reads one item from each of two
  ports. The items can be of different type.
• Choice: Execute a choice of two or more port-handler pairings
• Interleave: Consists of a set of arbiters (port – handler pairs) of 3
  types that are Concurrent, Exclusive or Teardown (called at end
  for clean up). Concurrent arbiters are run concurrently but
  exclusive handlers are not (they run one at a time)
• http://msdn.microsoft.com/robotics/
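As a rough analogy only (the real CCR API is C#; this Python sketch just mimics the port/handler idea with a queue and threads):

```python
# Rough analogy only (not the actual C# CCR API): a "port" modelled as a queue
# and a "Receive" arbiter as a thread that spawns a handler per arriving item.
import queue
import threading

port = queue.Queue()              # stands in for a CCR Port<T>

def receive(port, handler):
    """Run handler on every item posted to the port, each in its own thread."""
    def arbiter():
        while True:
            item = port.get()
            if item is None:      # sentinel used to stop this toy example
                break
            threading.Thread(target=handler, args=(item,)).start()
    threading.Thread(target=arbiter).start()

receive(port, lambda msg: print("handled:", msg))

port.put("first message")
port.put("second message")
port.put(None)                    # shut the arbiter down
```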
  [Figure: Latency/Overhead in microseconds (0-25) vs. number of stages
   (up to 10 million) for Rendezvous Shift, Rendezvous exchange as two shifts,
   and Rendezvous exchange customized for MPI]
Overhead (latency) of AMD 4-core PC with 4 execution threads on MPI style
Rendezvous Messaging for Shift and Exchange, implemented either as two shifts or
as a custom CCR pattern. Compute time is 10 seconds divided by number of stages.
  [Figure: Latency/Overhead in microseconds (0-90) vs. number of stages
   (up to 1 million) for Rendezvous Shift, Rendezvous exchange as two shifts,
   and Rendezvous exchange customized for MPI]
Overhead (latency) of INTEL 8-core PC with 8 execution threads on MPI style
Rendezvous Messaging for Shift and Exchange, implemented either as two shifts or
as a custom CCR pattern. Compute time is 15 seconds divided by number of stages.
DSS Service Measurements
  [Figure: average run time in microseconds (0-350) vs. number of round trips
   (1 to 10,000)]
Timing of HP Opteron Multicore as a function of number of simultaneous two-way
service messages processed (November 2006 DSS Release).
CGL measurements of Axis 2 show about 500 microseconds – DSS is 10 times better.

								