Parallel Processing Architecture Overview

Subject Code: 433-498

Rajkumar Buyya
Grid Computing and Distributed Systems (GRIDS) Lab.
The University of Melbourne
Melbourne, Australia
www.gridbus.org
Overview of the Talk

- Why Parallel Processing?
- Parallel Hardware
- Parallel Operating Systems
- Parallel Programming Paradigms
- Grand Challenges
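Computing Elements

[Figure: layered view of a multi-processor computing system. From top to bottom: Applications; Programming paradigms; Threads interface; Microkernel (operating system); Multi-processor computing system made of processors P ... P (hardware). Legend: P = processor, with further symbols for thread and process.]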
Two Eras of Computing

[Figure: timeline from 1940 to 2030 comparing the Sequential Era and the Parallel Era. Each era progresses through architectures, system software/compilers, applications, and problem solving environments (P.S.Es); each of these passes through R&D, commercialization, and commodity phases.]
History of Parallel Processing

- PP can be traced to a tablet dated around 100 BC.
  - The tablet has three calculating positions.
  - We can infer that the multiple positions were meant to provide reliability and/or speed.

Motivating Factors

- Just as we learned to fly not by constructing a machine that flaps its wings like a bird, but by applying the aerodynamic principles demonstrated by nature...
- ...we modeled PP after biological species.
Motivating Factors

- The aggregate speed with which neurons carry out complex calculations, even though each neuron's individual response is slow (milliseconds), demonstrates the feasibility of PP.
Why Parallel Processing?

- Computation requirements are ever increasing: visualization, distributed databases, simulations, scientific prediction (earthquake), etc.
- Sequential architectures are reaching physical limits (speed of light, thermodynamics).
Human Architecture! Growth Performance

[Figure: growth plotted against age (5 to 45 and beyond), with a phase labelled "Vertical" followed by a phase labelled "Horizontal".]
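Computational Power Improvement

[Figure: computational power improvement (C.P.I.) plotted against the number of processors (1, 2, ...), with one curve for a uniprocessor and one for a multiprocessor.]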
Why Parallel Processing?

- The technology of PP is mature and can be exploited commercially; there is significant R&D work on the development of tools and environments.
- Significant developments in networking technology are paving the way for heterogeneous computing.
Why Parallel Processing?

- Hardware improvements like pipelining, superscalar execution, etc., are non-scalable and require sophisticated compiler technology.
- Vector processing works well for certain kinds of problems.
Parallel Program has & needs ...

- Multiple "processes" active simultaneously, solving a given problem, generally on multiple processors.
- Communication and synchronization of its processes (this forms the core of parallel programming efforts).
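
The two needs above can be sketched in a few lines of Python. This is a minimal illustration, not taken from the slides: `parallel_sum`, the strided chunking, and the use of threads in place of "processes" are all assumptions made for the example. In CPython the threads show the structure of a parallel program rather than a real speedup.

```python
import threading

def parallel_sum(values, workers=4):
    """Sum `values` with several workers that share one accumulator."""
    total = 0
    lock = threading.Lock()              # synchronization primitive

    def work(chunk):
        nonlocal total
        partial = sum(chunk)             # independent computation per worker
        with lock:                       # communication through shared state
            total += partial

    # Give each worker a strided slice of the input (hypothetical split).
    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                         # wait until every worker is done
    return total
```

Calling `parallel_sum(list(range(101)))` adds the partial sums of four workers into one result; the lock is what keeps the concurrent updates consistent.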
Processing Elements Architecture

Processing Elements

- Simple classification by Flynn (by number of instruction and data streams):
  - SISD - conventional
  - SIMD - data parallel, vector computing
  - MISD - systolic arrays
  - MIMD - very general, multiple approaches
- The current focus is on the MIMD model, using general-purpose processors (no shared memory).
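
The classification depends only on whether there is one or more than one instruction stream and data stream. A small sketch of that rule follows; the function name `flynn_class` is an assumption for illustration.

```python
def flynn_class(instruction_streams, data_streams):
    """Return the Flynn class for the given numbers of streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    i = "S" if instruction_streams == 1 else "M"   # single or multiple instruction
    d = "S" if data_streams == 1 else "M"          # single or multiple data
    return f"{i}I{d}D"
```

For example, one instruction stream over eight data streams gives `flynn_class(1, 8) == "SIMD"`.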
SISD: A Conventional Computer

[Figure: data input flows into a single processor, which is driven by one instruction stream and produces the data output.]

- Speed is limited by the rate at which the computer can transfer information internally.
- Ex: PC, Macintosh, workstations
The MISD Architecture

[Figure: a single data input stream passes through processors A, B and C, each driven by its own instruction stream (A, B, C), producing a single data output stream.]

- More of an intellectual exercise than a practical configuration. A few have been built, but none is commercially available.
SIMD Architecture

[Figure: a single instruction stream drives processors A, B and C; each processor reads its own data input stream (A, B, C) and writes its own data output stream (A, B, C).]

Example operation: Ci <= Ai * Bi

- Ex: CRAY machine vector processing, Thinking machine cm*, Intel MMX (multimedia support)
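
The operation Ci <= Ai * Bi can be modeled as one instruction applied to every pair of data elements. The sketch below is only a model: `simd_apply` is a hypothetical helper, and plain Python runs the lanes one after another, whereas SIMD hardware executes them in lockstep.

```python
import operator

def simd_apply(instruction, a, b):
    """Apply a single instruction to each pair of elements (one lane each)."""
    if len(a) != len(b):
        raise ValueError("data streams must have the same length")
    return [instruction(x, y) for x, y in zip(a, b)]

A = [1, 2, 3]
B = [4, 5, 6]
C = simd_apply(operator.mul, A, B)   # C[i] = A[i] * B[i] for every i
```

The same `instruction` argument is shared by all lanes, which is the defining property of the single instruction stream.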
MIMD Architecture

[Figure: processors A, B and C each have their own instruction stream, data input stream and data output stream.]

- Unlike SISD and MISD machines, a MIMD computer works asynchronously.
  - Shared memory (tightly coupled) MIMD
  - Distributed memory (loosely coupled) MIMD
Shared Memory MIMD machine

[Figure: processors A, B and C each connect through a memory bus to a single global memory system.]

- Communication: the source PE writes data to global memory (GM) and the destination PE retrieves it.
- Easy to build; conventional SISD operating systems can easily be ported.
- Limitation: reliability and expandability. The failure of a memory component or of any processor affects the whole system.
- Increasing the number of processors leads to memory contention.
- Ex.: Silicon Graphics supercomputers....
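
The communication pattern described above (source PE writes to global memory, destination PE reads it back) can be sketched with two threads sharing one dictionary. The names `global_memory`, `slot0` and the use of an event for signalling are assumptions made for this illustration.

```python
import threading

global_memory = {}                  # stands in for the global memory system
data_ready = threading.Event()      # signals that the slot has been written
received = []

def source_pe():
    global_memory["slot0"] = 42     # source PE writes data to global memory
    data_ready.set()

def destination_pe():
    data_ready.wait()               # block until the source has written
    received.append(global_memory["slot0"])   # destination PE retrieves it

threads = [threading.Thread(target=destination_pe),
           threading.Thread(target=source_pe)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No data is sent explicitly between the two PEs; the shared memory itself is the communication medium, which is also why it becomes a point of contention as processors are added.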
Distributed Memory MIMD

[Figure: processors A, B and C each connect through a memory bus to their own memory system (A, B, C); IPC channels link the processors.]

- Communication: IPC over a high-speed network.
- The network can be configured as a tree, mesh, cube, etc.
- Unlike shared memory MIMD, it is:
  - easily/readily expandable
  - highly reliable (a CPU failure does not affect the whole system)
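
In the distributed model each node keeps its data private and cooperates only by exchanging messages. The following is a minimal sketch under stated assumptions: threads stand in for separate machines, `queue.Queue` objects stand in for IPC channels, and the linear A -> B -> C topology is chosen only for brevity.

```python
import queue
import threading

def node(name, inbox, outbox, results):
    local_memory = {}                          # private to this node
    local_memory["value"] = inbox.get() + 1    # receive a message, then compute
    if outbox is not None:
        outbox.put(local_memory["value"])      # forward to the next node
    results[name] = local_memory["value"]      # recorded only for inspection

chan_a, chan_b, chan_c = queue.Queue(), queue.Queue(), queue.Queue()
results = {}
nodes = [
    threading.Thread(target=node, args=("A", chan_a, chan_b, results)),
    threading.Thread(target=node, args=("B", chan_b, chan_c, results)),
    threading.Thread(target=node, args=("C", chan_c, None, results)),
]
for t in nodes:
    t.start()
chan_a.put(0)                                  # inject the first message
for t in nodes:
    t.join()
```

Adding a node means adding one more channel and one more worker, which mirrors the expandability point made above.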
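Laws of caution.....

- The speed of computers is proportional to the square of their cost,
  i.e. $\text{cost} = \sqrt{\text{speed}}$ (equivalently, $\text{speed} = \text{cost}^2$).
  [Figure: cost (C) plotted against speed (S).]
- The speedup achieved by a parallel computer increases as the logarithm of the number of processors:
  $\text{speedup} = \log_2(\text{no. of processors})$
  [Figure: speedup (S) plotted against number of processors (P).]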
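
The two laws, commonly known as Grosch's law and Minsky's conjecture, are easy to evaluate numerically. In this sketch the function names and the proportionality constant `k` are assumptions for illustration; the slides give only the functional forms.

```python
import math

def grosch_speed(cost, k=1.0):
    """Speed predicted by the first law: speed = k * cost**2."""
    return k * cost ** 2

def log_speedup(processors):
    """Speedup predicted by the second law: log2(number of processors)."""
    if processors < 1:
        raise ValueError("need at least one processor")
    return math.log2(processors)
```

Under these formulas, doubling the cost quadruples the predicted speed, while going from 2 to 1024 processors raises the predicted speedup only from 1 to 10.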
Caution....

- Very fast development in PP and related areas has blurred concept boundaries, causing a lot of terminological confusion: concurrent computing/programming, parallel computing/processing, multiprocessing, distributed computing, etc.

It’s hard to imagine a field that changes as rapidly as computing.

Caution....

- Computer Science is an immature science (lack of standard taxonomy and terminology).

Caution....

- Even well-defined distinctions like shared memory and distributed memory are merging due to new advances in technology.
- Good environments for development and debugging are yet to emerge.

Caution....

- There are no strict delimiters for contributors to the area of parallel processing: CA, OS, HLLs, databases, and computer networks all have a role to play.
- This makes it a hot topic of research.
Operating Systems for High Performance Computing

Types of Parallel Systems

- Shared Memory Parallel
  - Smallest extension to existing systems
  - Program conversion is incremental
- Distributed Memory Parallel
  - Completely new systems
  - Programs must be reconstructed
- Clusters
  - A slow-communication form of distributed memory parallel systems
Operating Systems for PP

- MPP systems with thousands of processors require an OS radically different from current ones.
- Every CPU needs an OS:
  - to manage its resources
  - to hide its details
- Traditional systems are heavy and complex, and are not suitable for MPP.
Operating System Models

- A framework that unifies the features, services and tasks performed.
- Three approaches to building an OS:
  - Monolithic OS
  - Layered OS
  - Microkernel-based OS
    - Client-server OS
    - Suitable for MPP systems
- Simplicity, flexibility and high performance are crucial for an OS.
Monolithic Operating System

[Figure: application programs run in user mode; below the user/kernel boundary, a single block of system services runs in kernel mode directly on the hardware.]

- Better application performance
- Difficult to extend
- Ex: MS-DOS
Layered OS

[Figure: application programs run in user mode; in kernel mode, system services sit on top of memory and I/O device management, which sits on top of process scheduling, which sits on the hardware.]

- Easier to enhance
- Each layer of code accesses the interface of the layer below it
- Lower application performance
- Ex: UNIX
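
The layering rule, where each layer calls only the interface of the layer directly beneath it, can be sketched as a chain of objects. The class and method names below are hypothetical and simply follow the layer names in the figure.

```python
class Hardware:
    def execute(self, op):
        return f"hw:{op}"

class ProcessScheduling:
    def __init__(self, below):
        self._below = below                     # only the hardware is visible
    def run(self, op):
        return self._below.execute(f"sched({op})")

class MemoryAndIO:
    def __init__(self, below):
        self._below = below                     # only scheduling is visible
    def access(self, op):
        return self._below.run(f"mem_io({op})")

class SystemServices:
    def __init__(self, below):
        self._below = below                     # only memory and I/O is visible
    def call(self, op):
        return self._below.access(f"svc({op})")

stack = SystemServices(MemoryAndIO(ProcessScheduling(Hardware())))
result = stack.call("read")
```

A request has to cross every layer on its way down, which is one way to read the performance cost noted above; in exchange, a layer can be replaced without touching the ones above it as long as its interface stays the same.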
Traditional OS

[Figure: application programs run in user mode; the whole OS runs in kernel mode on top of the hardware. The OS block carries the label "OS Designer".]
New trend in OS design

[Figure: application programs and servers run in user mode; only the microkernel runs in kernel mode, on top of the hardware.]
Microkernel/Client Server OS (for MPP Systems)

[Figure: a client application, thread library, file server, network server and display server run in user mode; they exchange send and reply messages through the microkernel, which runs in kernel mode on the hardware.]

- A tiny OS kernel providing the basic primitives (process, memory, IPC)
- Traditional services become subsystems
- Application performance competitive with a monolithic OS
- OS = Microkernel + User Subsystems
- Ex: Mach, PARAS, Chorus, etc.
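
The send/reply structure can be sketched as a kernel that does nothing except route messages to user-level servers. Everything named here (`Microkernel`, `register`, `send`, the two server functions, the example path) is hypothetical and is not the API of Mach, PARAS or Chorus.

```python
class Microkernel:
    """Models only the IPC primitive: route a request, return the reply."""
    def __init__(self):
        self._servers = {}

    def register(self, name, handler):
        self._servers[name] = handler           # a user-mode subsystem

    def send(self, server, message):
        if server not in self._servers:
            raise KeyError(f"no such server: {server}")
        return self._servers[server](message)   # reply goes back to the client

def file_server(message):
    return f"file contents of {message['path']}"

def display_server(message):
    return f"displayed {message['text']!r}"

kernel = Microkernel()
kernel.register("file", file_server)
kernel.register("display", display_server)
reply = kernel.send("file", {"path": "/tmp/demo.txt"})
```

New services are added by registering another server, without changing the kernel, which reflects the "OS = Microkernel + User Subsystems" decomposition.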
Few Popular Microkernel Systems

- MACH, CMU
- PARAS, C-DAC
- Chorus
- QNX
- (Windows)

						