SMiLE Shared Memory Programming
    Wolfgang Karl & Martin Schulz
    Lehrstuhl für Rechnertechnik und Rechnerorganisation
    Technische Universität München

    1st SCI Summer School
    October 2nd – 4th, 2000, Trinity College Dublin
SMiLE software layers

[Diagram: the SMiLE software stack. Target applications / test suites use
true shared memory programming. High-level SMiLE layers: SISAL on MuSE,
SISCI PVM, the SPMD-style model, and a TreadMarks-compatible API.
Low-level SMiLE layers: the NT protocol stack (NDIS driver), AM 2.0,
SS-lib, and CML on top of SCI-Messaging, plus the SCI-VM library on top of
SCI-VM. Below the user/kernel boundary sit the SCI drivers & SISCI API and
the SCI-VM kernel part; raw SCI programming accesses them directly. At the
bottom: the SCI hardware (SMiLE & Dolphin adapters, HW-Monitor).]
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                                             2
Overview

  SCI hardware DSM principle
  Using raw SCI via SISCI
  SCI-VM: A global virtual memory for SCI
  Shared Memory Programming Models
  A sample model: SPMD
  Outlook on the Lab-Session

Martin Schulz, SCI Summer School, Oct 2-4, Dublin                     3
SMiLE Shared Memory Programming

  → SCI hardware DSM principle
    Using raw SCI via SISCI
    SCI-VM: A global virtual memory for SCI
    Shared Memory Programming Models
    A sample model: SPMD
    Outlook on the Lab-Session
SCI-based PC clusters

[Diagram: PCs with PCI-SCI adapters connected by the SCI interconnect,
sharing a global address space.]

  Bus-like services
    Read-, write-, and synchronization transactions
  Hardware-supported DSM
  Global address space to access any physical memory
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                         5
Setup of communication

[Diagram: a physical memory segment on one node of an SCI ringlet is
exported into the SCI physical address space and mapped into the virtual
address space of a process on a remote node.]

  Export of physical memory
    Into the SCI physical address space
  Mapping into virtual memory
  User-level communication
    Read/write to mapped segment
    NUMA characteristics
  High performance
    No protocol overhead
    No OS influence
    Latency: < 2.0 µs
    Bandwidth: > 80 MB/s
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                                              6
SCI remote memory mappings

[Diagram: processes on nodes A and B each have a virtual address space,
translated by the CPU MMUs to local physical memory and to PCI addresses.
The SCI bridge ATTs translate PCI addresses into the global SCI physical
address space, so a virtual mapping on one node can reach physical memory
on the other.]
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                                                7
Address Translation Table (ATT)

  Present on each SCI adapter
    Controls outbound communication
  4096 entries available
    Each entry controls between 4 KB and 512 KB
    256 MB total memory → 64 KB per entry
  Many parameters per ATT entry, e.g.
    Prefetching & buffering
    Enable atomic fetch & inc
    Control ordering
Martin Schulz, SCI Summer School, Oct 2-4, Dublin         8
Some more details

  Inbound communication
    No remapping
    Offset part of SCI physical addr. = physical addr.
    Basic protection mechanism using an access window
  Outstanding transactions
    Up to 16 write streams
    Gather write operations to consecutive addresses
  Read/write ordering
    Complete write transactions before read transactions
Martin Schulz, SCI Summer School, Oct 2-4, Dublin           9
SMiLE Shared Memory Programming

    SCI hardware DSM principle
  → Using raw SCI via SISCI
    SCI-VM: A global virtual memory for SCI
    Shared Memory Programming Models
    A sample model: SPMD
    Outlook on the Lab-Session
The SISCI API

  Standard low-level API for SCI-based systems
    Developed within the SISCI Project
      Standard Software Infrastructure for SCI
    Implemented by Dolphin ICS (for many platforms)
  Goal: Provide the full SCI functionality
    Secure abstraction of user-level communication
    Based on the IRM (Interconnect Resource Mng.)
      Global resource management
      Fault tolerance & configuration mechanisms
    Comprehensive user-level API
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                 12
SISCI in the overall infrastructure

[Same software-layer diagram as above, now with the SISCI API / Library
highlighted as the interface between the low-level SMiLE layers and the
SCI drivers.]
 Martin Schulz, SCI Summer School, Oct 2-4, Dublin                                            13
Use of SISCI API

  Provides raw shared memory functionalities
    Based on individual shared memory segments
    No task and/or connection management
    High complexity due to low-level character
  Application area 1: Middleware
    E.g. base for efficient message passing layers
    Hide the low-level character of SISCI
  Application area 2: Special purpose applications
    Directly use the raw performance of SCI
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                        14
SISCI availability

  SISCI API directly available from Dolphin
    http://www.dolphinics.no/
    Multiplatform support
      Windows NT 4.0 / 2000
      Linux (currently for 2.2.x kernels)
      Solaris 2.5.1/7/8 (Sparc) & 2.6/7 (x86)
      Lynx OS 3.01 (x86 & PPC), VxWorks
      Tru64 (first version available)

  Documentation also online
    Complete user-library specification
    Sample codes
 Martin Schulz, SCI Summer School, Oct 2-4, Dublin                  15
Basic SISCI functionality

  Main functionality:
    Basic shared segment management
      Creation of segments
      Sharing/Identification of segments
      Mapping of segments
  Support for memory transfer
    Sequence control for data integrity (see the sketch below)
    Optimized SCI memcpy version
  Direct abstraction of the SCI principle
Martin Schulz, SCI Summer School, Oct 2-4, Dublin            16
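
The "sequence control" above is what makes raw remote stores safe: a block
of stores is bracketed by a sequence, so transmission errors can be
detected and the whole block retried. A minimal sketch, assuming the
sequence calls from Dolphin's SISCI headers (error handling abbreviated,
exact signatures vary between SISCI versions):

    #include <stddef.h>
    #include "sisci_api.h"

    /* Copy 'len' ints into a mapped remote segment and verify the
       transfer. 'map' is the sci_map_t from SCIMapRemoteSegment. */
    void checked_copy(sci_map_t map, volatile int *dst,
                      const int *src, size_t len)
    {
        sci_sequence_t seq;
        sci_error_t err;

        SCICreateMapSequence(map, &seq, NO_FLAGS, &err);
        do {
            SCIStartSequence(seq, NO_FLAGS, &err);
            for (size_t i = 0; i < len; i++)
                dst[i] = src[i];                  /* raw remote stores */
        } while (SCICheckSequence(seq, NO_FLAGS, &err) != SCI_SEQ_OK);
        SCIRemoveSequence(seq, NO_FLAGS, &err);
    }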
Advanced SISCI functionality

  Other functionality
    DMA transfers
    SCI-based interrupts
    Store-barriers (flush sketch below)
  Implicit resource management
  Additional library in SISCI kit
    sisci_demolib
    Itself based on SISCI
    Easier node/adapter identification
Martin Schulz, SCI Summer School, Oct 2-4, Dublin             17
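
The store-barriers above address the write gathering described earlier: a
store into a mapped remote segment may linger in the adapter's stream
buffers. A minimal sketch of flushing them, assuming Dolphin's SCIFlush
call on a sequence created as in the previous example (flags and exact
signature are version-dependent assumptions):

    #include "sisci_api.h"

    /* Assumed from earlier setup: a pointer into a mapped remote segment
       and a sequence created on that mapping with SCICreateMapSequence. */
    extern volatile int *remote;
    extern sci_sequence_t sequence;

    void post_flag(void)
    {
        remote[0] = 42;               /* store may sit in a write stream */
        SCIFlush(sequence, NO_FLAGS); /* drain the adapter's buffers     */
    }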
Allocating and Mapping of Segments

[Diagram: node A maps a physical memory segment into its virtual address
space; node B reaches the same segment via PCI and virtual addresses.
Call sequences:]

  Node A (exporter):          Node B (importer):
    SCICreateSegment            SCIConnectSegment
    SCIPrepareSegment           SCIMapRemoteSegment
    SCIMapLocalSegment
    SCISetSegmentAvailable
          ... data transfer ...
    SCIUnMapSegment             SCIUnMapSegment
    SCIRemoveSegment            SCIDisconnectSegment

Martin Schulz, SCI Summer School, Oct 2-4, Dublin                18
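
For the lab session, the call sequence above translates into roughly the
following C code. This is a condensed sketch based on the documented SISCI
calls: the adapter number, segment id, and callback arguments are
placeholders, exact signatures vary between SISCI versions, all error
checking is omitted, and sd is the descriptor obtained earlier via
SCIInitialize and SCIOpen:

    #include "sisci_api.h"

    #define ADAPTER_NO 0    /* placeholder: local adapter number       */
    #define SEGMENT_ID 42   /* placeholder: id agreed on by both sides */
    #define SEG_SIZE   8192

    /* Node A (exporter): create, map, and publish a local segment. */
    void *export_segment(sci_desc_t sd, sci_local_segment_t *seg,
                         sci_map_t *map)
    {
        sci_error_t err;
        void *addr;

        SCICreateSegment(sd, seg, SEGMENT_ID, SEG_SIZE,
                         NO_CALLBACK, NULL, NO_FLAGS, &err);
        SCIPrepareSegment(*seg, ADAPTER_NO, NO_FLAGS, &err);
        addr = SCIMapLocalSegment(*seg, map, 0, SEG_SIZE,
                                  NULL, NO_FLAGS, &err);
        SCISetSegmentAvailable(*seg, ADAPTER_NO, NO_FLAGS, &err);
        return addr;
    }

    /* Node B (importer): connect to the remote segment and map it. */
    volatile void *import_segment(sci_desc_t sd, unsigned int remoteNode,
                                  sci_remote_segment_t *seg, sci_map_t *map)
    {
        sci_error_t err;

        SCIConnectSegment(sd, seg, remoteNode, SEGMENT_ID, ADAPTER_NO,
                          NO_CALLBACK, NULL, SCI_INFINITE_TIMEOUT,
                          NO_FLAGS, &err);
        return SCIMapRemoteSegment(*seg, map, 0, SEG_SIZE,
                                   NULL, NO_FLAGS, &err);
    }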
SISCI Deficits

  SISCI based on individual segments
    Individual segments with own address space
    Placed on a single node
    No global virtual memory
  No task/thread control, minimal synchronization
    No full programming environment
    Has to be rebuilt over and over again
  No cluster-global abstraction
    Configuration and ID management missing
Martin Schulz, SCI Summer School, Oct 2-4, Dublin        19
SMiLE Shared Memory Programming

    SCI hardware DSM principle
    Using raw SCI via SISCI
  → SCI-VM: A global virtual memory for SCI
    Shared Memory Programming Models
    A sample model: SPMD
    Outlook on the Lab-Session
SCI support for shared memory

  SCI provides inter-process shared memory
    Available through SISCI
  Shared memory support not sufficient for true shared memory programming
    No support for virtual memory
    Each segment placed on a single node only
  Intra-process shared memory required
    Based on remote memory operations
    Software DSM support necessary
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                      22
Shared memory for clusters

  Goal: SCI Virtual Memory
    Flexible, general global memory abstraction
    One global virtual address space across nodes
    Total transparency for the user
    Direct utilization of SCI HW-DSM
  Hybrid DSM based system
    HW-DSM for implicit communication
    SW-DSM component for system management
  Platform: Linux & Windows NT
Martin Schulz, SCI Summer School, Oct 2-4, Dublin            23
SCI-VM design

  Global process abstraction
    A team on each node hosts the threads
    Each team needs an identical virtual memory view
  Virtual memory distributed at page granularity
    Map local pages conventionally via MMU
    Map remote pages with the help of SCI
  Extension of the virtual memory concept
    Use memory resources across nodes transparently
    Modified VM-Manager necessary
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                    24
Design of the SCI-VM

[Diagram: global process abstraction with SCI Virtual Memory. Teams on
node A and node B share one virtual address space (pages 1-8). Pages 4
and 6 reside in physical memory on A, page 1 on B; each node maps local
pages via its MMU and remote pages via PCI addresses and the SCI physical
address space.]
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                                                            25
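
Conceptually, each page of the global address space in the figure is
resolved with one decision. The sketch below is not SCI-VM code; every
helper in it is a hypothetical placeholder for the VMM and SCI driver
work described on the previous slides:

    /* Conceptual sketch of page-granular placement; all helpers are
       hypothetical placeholders, not SCI-VM internals. */
    typedef unsigned long vaddr_t;

    extern int  page_home(vaddr_t page);      /* distribution policy  */
    extern void map_local_page(vaddr_t page); /* regular MMU mapping  */
    extern void map_remote_page(vaddr_t page, /* PCI/ATT mapping      */
                                int home);    /*   via SCI            */

    void establish_mapping(vaddr_t page, int my_node)
    {
        if (page_home(page) == my_node)
            map_local_page(page);      /* e.g. pages 4 and 6 on node A */
        else
            map_remote_page(page, page_home(page)); /* e.g. page 1     */
    }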
Implementation challenges

  Integration into virtual memory management
    Virtual memory mappings at page granularity
    Utilization of paged memory
  Integration into SCI driver stack
    Enable fast, fine-grain mappings of remote pages
  Caching of remote memory
    SCI over PCI does not support cache consistency
    Caching necessary for read performance
    Solution: caching & relaxing consistency
    Very similar to LRC mechanisms
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                       26
Implementation design

  Modular implementation
    Extensibility and flexibility
  VMM extension
    Manage MMU mappings directly
    Kernel driver required
  Synchronization operations
    Based on SCI atomic transactions
  Consistency enforcing mechanisms
    Often in coordination with synchronization points
    Flush buffers and caches
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                      27
PCI/SCI & OS Problems

  Missing transparency of the hardware
    SCI adapter and PCI host bridge create errors
    Shmem model expects full correctness
  Integration
    SCI integration done by IRM extension
    Missing OS integration leads to instabilities
    Real OS integration for Linux in the works
  Locality problems
    Often the cause of bad performance
    An easy way to apply locality optimizations is needed
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                  28
Implementation status

  SCI-VM prototype running
    Windows NT & Linux
    Static version (no remapping)
    Mostly user-level based
    VMM extensions for both OSs
  Current work
    Experiments with more applications
      Includes comparison with SW-DSM systems
    Adding dynamic version of SCI-VM
      Real extension of virtual memory management
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                   29
SMiLE Shared Memory Programming

    SCI hardware DSM principle
    Using raw SCI via SISCI
    SCI-VM: A global virtual memory for SCI
  → Shared Memory Programming Models
    A sample model: SPMD
    Outlook on the Lab-Session
Multi-Programming Model support

  Abundance of shared memory models
    In contrast to message passing (MPI & PVM)
    Few standardization approaches
      OpenMP, HPF, POSIX threads
    Many DSM systems with own API (historically)
  Negative impact on portability
    Frequent retargeting necessary
  Required: Multi-Programming Model support
    One system with many faces
    Minimize work for support of additional models
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                  32
Flexibility of HAMSTER

  HAMSTER: Hybrid-dsm based Adaptive and Modular Shared memory archiTEctuRe
  Goal:
    Support for arbitrary Shmem models
      Standard models -> portability of applications
      Domain-specific models
    SCI-VM as efficient DSM core for clusters
  Export of various shared memory services
    Strongly modularized
    Many parameters for easy tuning
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                      33
HAMSTER in the infrastructure

[Same software-layer diagram as above; HAMSTER (Hybrid DSM based Adaptive
and Modular Shared memory archiTEctuRe) comprises the high-level
programming models (SISAL on MuSE, SISCI PVM, the SPMD-style model, the
TreadMarks-compatible API) together with the SCI-VM library they build on.]
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                                            34
HAMSTER framework

[Diagram: a shared memory application runs on an arbitrary shared memory
model. The model is built from the HAMSTER modules (cluster control, task
management, synchronization management, consistency management, memory
management), with VI-like communication access through the HW-DSM. These
rest on SCI-VM, the hybrid DSM for SCI clusters, which runs on a standalone
OS (Linux & WinNT) and the NIC driver, on a cluster built of commodity PC
hardware with a SAN offering HW-DSM.]
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                                   35
HAMSTER status

  Prototype running
    Based on the current SCI-VM
    Most modules more or less completed
  Available programming models
    SPMD (incl. SMP version), Jia-Jia
    TreadMarks™, ANL macros (for SPLASH-2)
    POSIX thread API (Linux), Win32 threads (WinNT)
  Current work
    Further programming models (for comparison)
    Evaluation using various applications
Martin Schulz, SCI Summer School, Oct 2-4, Dublin        36
SMiLE Shared Memory Programming

    SCI hardware DSM principle
    Using raw SCI via SISCI
    SCI-VM: A global virtual memory for SCI
    Shared Memory Programming Models
  → A sample model: SPMD
    Outlook on the Lab-Session
What is SPMD ?

  Single Program Multiple Data
    The same code is executed on each node
    Each node works on separate data
  Characteristics of SPMD
    Static task model (no dynamic spawn)
      Fixed assignment of node IDs
    Main synchronization construct: barriers
    All data stored in a globally accessible memory
    Release consistency
Martin Schulz, SCI Summer School, Oct 2-4, Dublin             39
SPMD model

  Task / Node management:   spmd_getNodeNumber, spmd_getNodeCount
  Memory management:        spmd_alloc, spmd_allocOpt
  Barrier:                  spmd_allocBarrier, spmd_barrier
  Locks:                    spmd_allocLock, spmd_lock, spmd_unlock
  Synchronization routines: spmd_sync, atomic counters
  Misc:                     spmd_getGlobalMem, spmd_distInitialData

  (Core routines used today are highlighted on the original slide.)
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                             40
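
As a quick illustration of the lock routines listed above: the names come
from the slide, while the exact signatures are assumptions modeled on the
spmd_allocBarrier/spmd_barrier pair used in the SOR code later on:

    /* Guard a shared counter with an SPMD lock (signatures assumed). */
    int  lock    = spmd_allocLock();                /* cluster-wide lock */
    int *counter = (int *) spmd_alloc(sizeof(int)); /* shared memory     */

    spmd_lock(lock);    /* enter critical section                        */
    (*counter)++;       /* update shared data safely                     */
    spmd_unlock(lock);  /* release; updates become visible (release
                           consistency)                                  */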
Sample SPMD application

  Successive Over-Relaxation (SOR)
    Iterative method for solving PDEs
    Advantage: Straightforward parallelization
  Main concepts of SOR
    Dense matrix
    For each iteration
      Go through whole matrix
      Update points:

  U(i,j) = ( U(i-1,j) + U(i+1,j) + U(i,j-1) + U(i,j+1) ) / 4
Martin Schulz, SCI Summer School, Oct 2-4, Dublin               41
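
In code, this update rule is a four-point stencil. Below is a sketch of
one sweep over the rows a node owns, assuming a dense N x N matrix stored
row-major (matching the matr array allocated in the SPMD code on the next
slide); the in-place update is the Gauss-Seidel-style sweep that SOR
builds on:

    /* One stencil sweep over rows start_row..end_row of the N x N grid.
       The outermost rows and columns keep their (boundary) values. */
    void sweep(float *u, int N, int start_row, int end_row)
    {
        for (int i = start_row; i < end_row; i++) {
            if (i == 0 || i == N - 1) continue;  /* skip boundary rows */
            for (int j = 1; j < N - 1; j++)
                u[i*N + j] = 0.25f * (u[(i-1)*N + j] + u[(i+1)*N + j] +
                                      u[i*N + (j-1)] + u[i*N + (j+1)]);
        }
    }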
Task distribution

[Diagram: a dense matrix with a fixed boundary is split row-wise across
the nodes (e.g. 2 nodes). The local boundaries form an overlap area that
requires communication and offers a possibility for locality
optimization.]
 Martin Schulz, SCI Summer School, Oct 2-4, Dublin                   42
SPMD code for SOR application
int count  = spmd_getNodeCount();
int number = spmd_getNodeNumber();
int start_row = (N / count) * number;
int end_row   = (N / count) * (number + 1);
int barrier = spmd_allocBarrier();
float* matr = (float*) spmd_alloc(size);

for (int i = 0; i < iter; i++) {      /* do iter iterations */
    /* work on matrix rows start_row .. end_row */
    spmd_barrier(barrier);
}
if (number == 0)
    /* print the result */;
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                43
Configuration & Startup

  Configuration file ~/.hamster
    Placed in home dir.
    SMP description (for certain models):

      # Name    ID   CPUs
      smile1     4    2
      smile2    68    2

  Startup
    Install SCI-VM driver and IRM extension
    Application execution
      Shell on each node to run on
      Execute program with identical parameters
    Output on the individual shells
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                            44
SMiLE Shared Memory Programming

    SCI hardware DSM principle
    Using raw SCI via SISCI
    SCI-VM: A global virtual memory for SCI
    Shared Memory Programming Models
    A sample model: SPMD
  → Outlook on the Lab-Session
A warning for the Lab-Session

  SCI-VM is still a prototype
    First time out of the lab
    Remaining problems
      Transparency of PCI bridges
      OS integration

  Hang-ups may occur – in this case:
    Remain calm and do not panic
    Follow the instructions of the crew
    Emergency exits on both sides
    Most of the time: a restart/reboot will do the trick
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                 47
Outline

  Lab setup
  Experiments with the SISCI API
    SISCI sample code
    Add segment management code
  Experiments with the SPMD model
    SCI-VM / HAMSTER installation
    Given: seq. SOR application
    Goal: Parallelization
Martin Schulz, SCI Summer School, Oct 2-4, Dublin                  48
Before the Lab-Session

  http://smile.in.tum.de/
  http://hamster.in.tum.de/
Martin Schulz, SCI Summer School, Oct 2-4, Dublin      49

								