CIS669
Distributed and Parallel Processing

Spring 2002
Professor Yuan Shi
Distributed Processing

Transaction-oriented
Geographically dispersed locations
I/O-intensive
Database-centric
Parallel Processing

Non-transactional, single-goal computing
Compute-intensive and/or data-intensive
May or may not involve databases
Is There a Real Difference?

Not in terms of functionality and resource-use intensity.
For transactional systems, there are OLAP (Online Analytical Processing) and data mining tools that are compute-intensive and single-goal-oriented.
For parallel processing, many scientific/engineering applications need to interact with databases to make more accurate calculations.
Parallelism and Programming Difficulties

For distributed processing, parallelism is given and usually cannot easily be changed. Programming is relatively easy.
For parallel processing, the programmer defines parallelism by partitioning the serial program(s). Parallel programming is in general more difficult than programming transaction applications.
This Picture Is Changing…

Industrial-strength distributed applications are evolving to become more parallel-like.
Lab-based parallel applications are blending into industrial-strength applications by incorporating transactions.
Why Clusters (the Textbook)?

We have tried all the others: vector, dataflow, NUMA, hypercube, 3D-torus, etc.
Parallel programming does not get easier with any configuration.
Clusters promise the most cost/performance potential.
Types of Parallelism (Flynn, 1972*)

1. SIMD (Single Instruction, Multiple Data)
2. MIMD (Multiple Instruction, Multiple Data)
3. MISD (Pipeline)

* Flynn, M., "Some Computer Organizations and Their Effectiveness," IEEE Trans. Comput., Vol. C-21, pp. 94, 1972.
** Other taxonomies categorize parallel machines (see http://csep1.phy.ornl.gov/ca/node11.html).
SIMD

[Diagram: one instruction stream I applied to data items D1-D4 on four processors at once.]
Tseq = 4, Tpar = 1
Sp = Tseq/Tpar = 4 = P

MIMD

[Diagram: four independent instruction streams I1-I4, each applied to its own data item D1-D4 in parallel.]
Tseq = 4, Tpar = 1
Sp = Tseq/Tpar = 4 = P

Pipeline (MISD)

[Diagram: data items D4, D3, D2, D1 streaming through a four-stage pipeline I1 -> I2 -> I3 -> I4.]
Tseq = 4 x 4 = 16
Tpar = 4 + 3 = 7
Sp = Tseq/Tpar ~= 2.3
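The speedup numbers above come from simple counting. A minimal sketch (plain Python, function names hypothetical) reproduces them:

```python
def simd_speedup(p):
    # p data items, one instruction stream: p processors finish in 1 step
    # versus p serial steps, so Sp = p = P.
    t_seq, t_par = p, 1
    return t_seq / t_par

def pipeline_speedup(stages, items):
    # Serial: every item passes through every stage, one item at a time.
    t_seq = stages * items
    # Pipelined: the first item takes `stages` steps to drain the pipe;
    # each later item emerges one step after the previous one.
    t_par = stages + (items - 1)
    return t_seq / t_par

print(simd_speedup(4))                    # 4.0, matching Sp = 4 = P
print(round(pipeline_speedup(4, 4), 1))   # 2.3, matching Sp = 16/7
```

The same counting shows why a pipeline's speedup approaches the number of stages only when the data supply is long: pipeline_speedup(4, 1000) is about 3.99.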
Machines That Can Work in Parallel

Cray: X-MP, Y-MP, T3D
TMC: CM-1 through CM-5
Kendall Square Research: KSR-1
SGI: Power Challenge, Origin
IBM: 3090, SP2, …
PCs
History

Single CPU: smaller size -> faster speed (Cray; remember Moore's Law?)
Multi-CPU: shared memory or no shared memory?
The war between Big Iron and Many Irons: Cray against TMC.
Result: all lost. Clusters won by survival.
State of the Art

Symmetric multiprocessing is still the only practical industrial approach. Vendors include HP, Sun, SGI, IBM, Compaq/Tandem, Stratus.
Special-purpose, small-scale multiprocessors: CISCO routers, SSL processors, MPEG decoders, etc.
Special-purpose massively parallel processors are designed for special types of applications, such as human genome classification, nuclear accelerator simulation, fluid-dynamics simulations, etc.
Hardware Technology Advances*

[Chart omitted.]

* Credit: Gordon Bell
Everything Cyberizable Will Be in Cyberspace, Covered by a Hierarchy of Computers!*

[Diagram: a hierarchy spanning world, continent, region/intranet, campus buildings, physical networks, homes, cars, and the body.]

* Credit: Gordon Bell
Distributed Programming Tools

• C/C++ with TCP/IP
• Perl with TCP/IP
• Java
• CORBA
• ASP
• .NET
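All of the socket-based tools above share the same client/server pattern: bind, listen, accept on one side; connect, send, receive on the other. A minimal sketch of that pattern (in Python rather than C or Perl, with one server thread echoing a single request on localhost) looks like this:

```python
import socket
import threading

def serve_once(server):
    # Accept one connection and echo the request back, uppercased.
    conn, _ = server.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

# Bind to an ephemeral localhost port (port 0 lets the OS choose one).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

t = threading.Thread(target=serve_once, args=(server,))
t.start()

# Client side: connect, send a message, read the reply.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

t.join()
server.close()
print(reply.decode())  # HELLO
```

The C version differs only in boilerplate (`socket()`, `bind()`, `listen()`, `accept()`, `connect()` from the BSD sockets API); the control flow is identical.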
Parallel Programming Tools
PVM
MPI
Synergy
Others (proprietary hardware)
    Semester Outline
Parallel programming
Architecture and performance evaluation
Distributed programming
Architecture and performance evaluation
Project selection
Project implementation
Presentation
Parallel Programming Difficulties

Program partitioning and allocation
Data partitioning and allocation
Program (process) synchronization
Data access mutual exclusion
Dependencies
Process(or) failures
Scalability…
Meeting the Challenge

Use the Stateless Parallel Processing (SPP) principle (U.S. Patent #5,517,656, May 1996).

Advantages:
• High performance: automatic formation of SIMD, MIMD, and MISD clusters at runtime.
• Adding or removing processors at runtime allows for ultimate scalability.
• It is the ONLY multiprocessor architecture designed with fault tolerance in mind.
• Ease of programming: no mutual exclusion problems; automatic tools are possible.
Stateless Parallel Processing

A stateless program is any program whose execution does not hard-wire, and does not incur side effects on, ANY global information.
Non-stateless program example: all PVM/MPI programs, since they create processes with IDs (global information).
Why Stateless Programs?

A stateless program can execute on any processor. This allows dynamic formation of SIMD, MIMD, and MISD clusters at runtime.
Only stateless programs can promise the ultimate scalability (adding a processor on the fly) and fault tolerance (losing a processor on the fly).
Stateless Parallel Processor

[Diagram: six processors, all connected to a central high-speed switch and linked to one another by a unidirectional ring.]
Operations of a Stateless Parallel Processor

The shared disk stores ALL stateless programs.
The unidirectional ring carries control tuples of two types: read and exclusive read. Read tuples drop off the ring after one rotation. Exclusive-read tuples drop off the ring after being consumed.
Each processor can execute ANY stateless program from the shared disk.
Control tuples carry data locations to allow direct data access via the high-speed switch.
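The two tuple disciplines above can be modeled in a few lines. This is a toy sketch, not the patented design: the ring is a loop over processor IDs, and `Ring`, `circulate`, and the consumer callbacks are all hypothetical names.

```python
from collections import namedtuple

# A control tuple carries a key, a data location, and its read discipline.
ControlTuple = namedtuple("ControlTuple", ["key", "data_location", "exclusive"])

class Ring:
    """Toy unidirectional ring: a tuple visits each processor in order."""
    def __init__(self, n_processors):
        self.n = n_processors

    def circulate(self, t, consumer):
        """Pass tuple t around once; consumer(t, pid) returns True to take it."""
        for pid in range(self.n):
            if consumer(t, pid):
                if t.exclusive:
                    return pid   # exclusive-read: drops off once consumed
                # plain read: stays on the ring for the remaining processors
        return None              # read tuple drops off after one full rotation

ring = Ring(4)

# A read tuple is seen by every processor during its single rotation.
readers = []
read_t = ControlTuple("A", "disk:/blk7", exclusive=False)
ring.circulate(read_t, lambda t, pid: readers.append(pid) or True)
print(readers)   # [0, 1, 2, 3]

# An exclusive-read tuple is consumed by the first willing processor.
er_t = ControlTuple("B", "disk:/blk9", exclusive=True)
winner = ring.circulate(er_t, lambda t, pid: True)
print(winner)    # 0
```

Note that only the small control tuples circulate; the bulk data stays put, reachable at `data_location` through the high-speed switch.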
How Does a Stateless System Start?

An initialization program sends the initial ER tuple(s) onto the ring.
These fire up all dependent programs on multiple processors (MIMD).
Newly generated tuples fire up more programs.
A SIMD cluster forms when a stateless program can accept multiple tuple values (MD).
MISD (pipeline) forms when multiple processors form a chain of dependency with a sufficient data supply.
How Do You Get Your Hands on an SPP?

Synergy. Synergy is a close approximation of SPP. It uses a tuple space to replace the unidirectional ring (same function, but slower). Multiple tuple spaces are used to simulate the high-speed switch.
Note: the absence of the high-speed switch costs a great deal in performance.
Next: Parallel Program Performance Analysis

Next week: no lecture.
Homework 1 (due 2/4/02; submit a .doc file to shi@cis.temple.edu with subject: 669 HW1)
Reading: textbook chapters 1-4.
Problems:
   1. What is the most likely performance bottleneck of an SPP machine? Explain.
   2. Why the unidirectional ring? Explain.
   3. Is it possible to build an SPP system using a cluster of PCs? How? What would you propose to make Synergy a true SPP system? Justify.
   4. Compare SMP (symmetric multiprocessor) with SPP. Explain pros and cons. Are they compatible?
   5. Compare SPP with massively parallel processors. Explain pros and cons. Restrict the discussion to the architecture level.
   6. Design a stateless matrix multiplication system. How many programs do you need? Explain. How many forms of parallelism can you find?
