Parallel Processing Architecture Overview
Parallel Processing:
Architecture Overview
Subject Code: 433-498
WW Grid
Rajkumar Buyya
Grid Computing and Distributed Systems (GRIDS) Lab.
The University of Melbourne
Melbourne, Australia
www.gridbus.org
Overview of the Talk
Why Parallel Processing ?
Parallel Hardware
Parallel Operating Systems
Parallel Programming Paradigms
Grand Challenges
Computing Elements
[Figure: layered view of a multi-processor computing system. From top to bottom: Applications; Programming paradigms; Threads interface; Microkernel operating system; Hardware (processors P ... P). Legend: processor, thread, process.]
Two Eras of Computing
[Figure: timeline from 1940 to 2030 showing the Sequential Era and the Parallel Era. Each era develops through Architectures, System Software/Compiler, Applications, and P.S.Es, and moves through R&D, commercialization, and commodity phases.]
History of Parallel Processing
Parallel processing (PP) can be traced to a tablet dated around 100 BC.
The tablet has 3 calculating positions.
We can infer that the multiple positions served reliability and/or speed.
Motivating factors
Just as we learned to fly not by constructing a machine that flaps its
wings like a bird, but by applying the aerodynamic principles
demonstrated by nature, we modeled PP after biological species.
Motivating Factors
The aggregate speed with which neurons carry out complex calculations,
even though each individual neuron responds slowly (ms),
demonstrates the feasibility of PP.
Why Parallel Processing?
Computation requirements are ever increasing: visualization,
distributed databases, simulations, scientific prediction
(earthquakes), etc.
Sequential architectures are reaching physical limits
(speed of light, thermodynamics).
Human Architecture! Growth Performance
[Figure: human growth plotted against age (5 to 45 and beyond), showing vertical growth and horizontal growth.]
Computational Power Improvement
[Figure: C.P.I. plotted against the number of processors (1, 2, ...) for multiprocessor and uniprocessor systems.]
Why Parallel Processing?
The technology of PP is mature and can be exploited commercially;
there is significant R&D work on the development of tools and
environments.
Significant developments in networking technology are paving the
way for heterogeneous computing.
Why Parallel Processing?
Hardware improvements such as pipelining and superscalar execution
are not scalable and require sophisticated compiler technology.
Vector processing works well only for certain kinds of problems.
A Parallel Program has and needs ...
Multiple "processes" active simultaneously, solving a given problem,
generally on multiple processors.
Communication and synchronization among its processes (this forms
the core of parallel programming efforts).
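As a minimal sketch of these two ingredients, the following Python example splits a sum across several workers. It uses standard-library threads as stand-ins for the "processes" above; the names `partial_sum` and `parallel_sum` and the default of four workers are illustrative assumptions, not part of the original material.

```python
import threading
import queue


def partial_sum(chunk, results):
    """Worker: compute a local result, then communicate it via the queue."""
    results.put(sum(chunk))


def parallel_sum(data, n_workers=4):
    """Split `data` into chunks, sum each chunk in its own worker, combine."""
    results = queue.Queue()  # communication channel between workers and caller
    size = max(1, (len(data) + n_workers - 1) // n_workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    workers = [threading.Thread(target=partial_sum, args=(c, results))
               for c in chunks]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # synchronization: wait until every worker has finished
    total = 0
    while not results.empty():
        total += results.get()
    return total
```

In CPython this illustrates the structure (decompose, communicate, synchronize) rather than a real speedup, since threads do not normally run CPU-bound Python code in parallel.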
Processing Elements
Architecture
Processing Elements
Simple classification by Flynn:
(No. of instruction and data streams)
SISD - conventional
SIMD - data parallel, vector computing
MISD - systolic arrays
MIMD - very general, multiple approaches.
Current focus is on MIMD model, using
general purpose processors.
(No shared memory)
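Flynn's classification depends only on whether there is one or more than one instruction stream and data stream. A small sketch of that rule, where the function name `flynn_class` is a hypothetical helper chosen for illustration:

```python
def flynn_class(instruction_streams, data_streams):
    """Return Flynn's class name for the given stream counts (each >= 1)."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"  # Single or Multiple instruction
    d = "S" if data_streams == 1 else "M"         # Single or Multiple data
    return f"{i}I{d}D"
```

For example, one instruction stream over eight data streams gives `flynn_class(1, 8)`, which falls in the SIMD class.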
SISD : A Conventional Computer
[Figure: a single processor receives instructions and a data input and produces a data output.]
Speed is limited by the rate at which the computer can
transfer information internally.
Ex: PC, Macintosh, workstations
The MISD Architecture
[Figure: instruction streams A, B, and C drive processors A, B, and C, which operate on one data input stream and produce one data output stream.]
More of an intellectual exercise than a practical
configuration. Few were built, and none is commercially available.
SIMD Architecture
[Figure: a single instruction stream drives processors A, B, and C; each processor has its own data input stream and data output stream (A, B, C).]
Ci <= Ai * Bi (the same operation applied to every element i)
Ex: CRAY machine vector processing, Thinking Machines CM*,
Intel MMX (multimedia support)
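The element-wise product above can be modeled in a few lines of Python. This is only a conceptual sketch: plain Python visits the elements one after another, whereas SIMD hardware applies the one instruction to many elements at once. The name `simd_multiply` and the sample vectors are assumptions made for illustration.

```python
from operator import mul


def simd_multiply(a, b):
    """Apply the single operation `mul` to every pair (a[i], b[i])."""
    if len(a) != len(b):
        raise ValueError("input streams must have the same length")
    return list(map(mul, a, b))


A = [1, 2, 3]
B = [4, 5, 6]
C = simd_multiply(A, B)  # C[i] = A[i] * B[i] for every i
```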
MIMD Architecture
[Figure: instruction streams A, B, and C drive processors A, B, and C; each processor has its own data input stream and data output stream (A, B, C).]
Unlike SISD and MISD machines, a MIMD computer works asynchronously.
Shared memory (tightly coupled) MIMD
Distributed memory (loosely coupled) MIMD
Shared Memory MIMD machine
[Figure: processors A, B, and C each connect through a memory bus to a single global memory system.]
Comm: the source PE writes data to global memory (GM) and the destination PE retrieves it.
Easy to build; conventional SISD OSes can easily be ported.
Limitation: reliability and expandability. A failure of a memory component or
of any processor affects the whole system.
Increasing the number of processors leads to memory contention.
Ex.: Silicon Graphics supercomputers
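The write-then-retrieve pattern can be sketched with two Python threads that share one dictionary. The dictionary `global_memory`, the key `"x"`, the value 42, and the use of `threading.Event` to signal readiness are all assumptions made for this illustration.

```python
import threading

global_memory = {}              # stands in for the global memory (GM)
data_ready = threading.Event()  # signals that the value has been written
received = []


def source_pe():
    global_memory["x"] = 42     # source PE writes data to GM
    data_ready.set()


def destination_pe():
    data_ready.wait()           # block until the source has written
    received.append(global_memory["x"])  # destination PE retrieves it


threads = [threading.Thread(target=destination_pe),
           threading.Thread(target=source_pe)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Both PEs touch the same memory, which keeps communication simple but is also why contention grows as more processors are added.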
Distributed Memory MIMD
[Figure: processors A, B, and C each connect through a memory bus to their own memory system (A, B, C); the processors are linked to each other by IPC channels.]
Communication: IPC over a high-speed network.
The network can be configured as a tree, mesh, cube, etc.
Unlike shared memory MIMD:
easily/readily expandable
highly reliable (a CPU failure does not affect the whole system)
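In the distributed memory model, nodes never read each other's memory; data moves only as messages over a channel. A minimal sketch, using a `queue.Queue` as a stand-in for one IPC channel and threads as stand-ins for nodes (the names `node`, `channel_ab`, `mem_a`, and `mem_b` are hypothetical):

```python
import threading
import queue


def node(name, local_memory, inbox, outbox, log):
    """A node with private memory; it interacts only through channels."""
    if outbox is not None:
        outbox.put((name, local_memory["value"]))  # send over the IPC channel
    if inbox is not None:
        sender, value = inbox.get()                # receive (blocks until sent)
        local_memory["from_" + sender] = value
        log.append((name, sender, value))


channel_ab = queue.Queue()  # IPC channel from node A to node B
log = []
mem_a = {"value": 7}        # memory system A (private to node A)
mem_b = {"value": 9}        # memory system B (private to node B)

a = threading.Thread(target=node, args=("A", mem_a, None, channel_ab, log))
b = threading.Thread(target=node, args=("B", mem_b, channel_ab, None, log))
b.start()
a.start()
a.join()
b.join()
```

Node B ends up with a copy of A's value in its own memory, while A's memory is untouched by B.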
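Laws of caution.....
Speed of computers is proportional to the square of their cost,
i.e. $\text{cost} = \sqrt{\text{speed}}$ (equivalently, $\text{speed} = \text{cost}^2$).
[Figure: cost (C) plotted against speed (S).]
Speedup by a parallel computer increases as the
logarithm of the number of processors:
$\text{speedup} = \log_2(\text{no. of processors})$
[Figure: speedup (S) plotted against the number of processors (P).]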
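To get a feel for the two laws of caution, they can be evaluated numerically. The sketch below takes the proportionality constant in the first law to be 1, which is an assumption made only to keep the arithmetic simple; the function names are likewise illustrative.

```python
import math


def speed_from_cost(cost):
    """First law: speed is proportional to cost squared (constant taken as 1)."""
    return cost ** 2


def log_speedup(processors):
    """Second law: speedup = log2(number of processors)."""
    if processors < 1:
        raise ValueError("need at least one processor")
    return math.log2(processors)
```

Under these formulas, doubling the cost quadruples the speed, while going from 2 to 1024 processors raises the predicted speedup only from 1 to 10.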
Caution....
Very fast development in PP and related areas
has blurred concept boundaries, causing a lot
of terminological confusion: concurrent
computing/programming, parallel computing/
processing, multiprocessing, distributed
computing, etc.
It's hard to imagine a field
that changes as rapidly as
computing.
Caution....
Computer science is an immature science
(it lacks a standard taxonomy and terminology).
Caution....
Even well-defined distinctions like
shared memory and distributed
memory are merging due to new
advances in technology.
Good environments for development
and debugging are yet to emerge.
Caution....
There are no strict boundaries for
contributors to the area of parallel
processing: CA, OS, HLLs, databases, and
computer networks all have a role to
play.
This makes it a hot topic of research.
Operating Systems for
High Performance
Computing
Types of Parallel Systems
Shared Memory Parallel
Smallest extension to existing systems
Program conversion is incremental
Distributed Memory Parallel
Completely new systems
Programs must be reconstructed
Clusters
A form of distributed memory system with slow communication
Operating Systems for PP
MPP systems with thousands of
processors require an OS radically
different from current ones.
Every CPU needs an OS:
to manage its resources
to hide its details
Traditional systems are heavy,
complex, and not suitable for MPP.
Operating System Models
A framework that unifies the features,
services, and tasks performed
Three approaches to building an OS:
Monolithic OS
Layered OS
Microkernel-based OS (client-server OS; suitable for MPP systems)
Simplicity, flexibility, and high
performance are crucial for an OS.
Monolithic Operating
System
[Figure: application programs run in user mode; system services run in kernel mode directly on top of the hardware.]
Better application performance
Difficult to extend
Ex: MS-DOS
Layered OS
[Figure: application programs run in user mode; in kernel mode, the layers from top to bottom are system services, memory and I/O device management, and process scheduling, on top of the hardware.]
Easier to enhance
Each layer of code accesses the interface of the layer below it
Lower application performance
Ex: UNIX
Traditional OS
[Figure: application programs run in user mode; the OS runs in kernel mode on top of the hardware. The OS layer is labeled "OS Designer".]
New trend in OS design
[Figure: application programs and servers run in user mode; only the microkernel runs in kernel mode, on top of the hardware.]
Microkernel/Client Server OS
(for MPP Systems)
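[Figure: in user mode, a client application with its thread library, a file server, a network server, and a display server; in kernel mode, the microkernel on top of the hardware. Send and reply messages pass through the microkernel.]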
Tiny OS kernel providing basic primitives (process, memory, IPC)
Traditional services become subsystems
Application performance competitive with a monolithic OS
OS = Microkernel + User Subsystems
Ex: Mach, PARAS, Chorus, etc.
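The send/reply structure can be sketched in Python as a kernel that does nothing except route messages to registered servers. This is a toy model under stated assumptions: the classes `Microkernel` and `FileServer`, the message tuples, and the path `/tmp/a` are all hypothetical, and an ordinary function call stands in for real IPC between protection domains.

```python
class Microkernel:
    """Tiny kernel: only registers servers and routes messages (IPC)."""

    def __init__(self):
        self._servers = {}

    def register(self, name, handler):
        self._servers[name] = handler

    def send(self, server, message):
        """Deliver `message` to `server` and return its reply."""
        if server not in self._servers:
            raise KeyError(f"no such server: {server}")
        return self._servers[server](message)


class FileServer:
    """User-mode subsystem providing a traditional OS service."""

    def __init__(self):
        self._files = {}

    def handle(self, message):
        op, path, *rest = message
        if op == "write":
            self._files[path] = rest[0]
            return "ok"
        if op == "read":
            return self._files.get(path)
        return "error: unknown operation"


kernel = Microkernel()
kernel.register("file", FileServer().handle)
kernel.send("file", ("write", "/tmp/a", "hello"))  # client request
reply = kernel.send("file", ("read", "/tmp/a"))    # reply comes back via kernel
```

The file service lives entirely outside the kernel class, so it could be replaced or extended without touching the kernel, which is the flexibility the microkernel approach aims for.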
Few Popular Microkernel Systems
Mach (CMU)
PARAS (C-DAC)
Chorus
QNX
(Windows)