COMPUTER ARCHITECTURE EVALUATION FOR STRUCTURAL DYNAMICS COMPUTATIONS

Final Technical Report
Project Summary

Principal Investigator: Dr. Hilda M. Standley
Department of Computer Science

February 10, 1986 - August 7, 1989

The University of Toledo
Toledo, Ohio 43606

NASA Lewis Research Center
Grant Number: NAG 3-699

ABSTRACT
The intent of the proposed effort is the examination of the impact of the elements of parallel architectures on the performance realized in a parallel computation. To this end, three major projects are developed: a language for the expression of high-level parallelism, a statistical technique for the synthesis of multicomputer interconnection networks based upon performance prediction, and a queueing model for the analysis of shared memory hierarchies.

INTRODUCTION
Parallel computer architectures, both commercial and theoretical, are proliferating as the speed advantages of parallel computation are recognized. New architecture designs may be classified fundamentally as "traditional," in which an old design is given a slight modification, or "radical," in which the standard approach is discarded in favor of an entirely new design. The fact remains that, because of the lack of a well developed, disciplined design approach, computer architecture design today is very much trial and error. A design is produced and then evaluated to determine how good it is.

This study considers a variety of parallel architectures and selects two architectural elements having profound impact on performance, one from each of two diverse classes. Models are developed by which the performance of these architectures may be predicted. In one case, the performance of interconnection networks, described by their graphical properties, may be predicted through a statistical analysis of data collected about existing networks. In the other case, the performance of a shared memory multiprocessor with a memory hierarchy component is modeled analytically.

Applicable to all large-grained parallel computation, a high-level parallel programming language, EASY-FLOW, is developed to assist with the expression of parallel tasks in the context of traditional programming languages.

A HIGH-LEVEL PARALLEL LANGUAGE
Software for multiple computers offering parallel computation must provide the level of parallelism specific to the target architecture. In doing so it designates a point on the spectrum that runs from the high-level task model to low-level bit operations. The effort in this project is directed toward the former, for the message passing multicomputer architecture. A high-level parallel language is developed based upon the data flow model of data-dependency directed execution. It incorporates the three fundamental schemata of control: sequencing, branching, and looping.

Data flow computing is based upon the notion that the execution of a computation may be initiated by the availability of data, instead of by a sequence determined from the "flow of control." Data values "flow" between computations, triggering executions which consume input data and produce output data as results. Results that are produced at one computation may be consumed at a subsequent computation, establishing a data dependency between the two computations. Computations that are data dependent are constrained to execute in sequence. Other computations not so constrained may be executed in parallel.
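
The firing rule described above can be illustrated with a small sketch. It is not the EASY-FLOW implementation. The function name firing_waves and the dictionary representation of computations (sets of consumed and produced data names) are assumptions made for illustration. Each "wave" collects the computations whose input data are all available. Members of a wave are therefore not data dependent on one another and may execute in parallel, while successive waves execute in sequence.

```python
from typing import Dict, List, Set


def firing_waves(inputs: Dict[str, Set[str]],
                 outputs: Dict[str, Set[str]],
                 initial: Set[str]) -> List[List[str]]:
    """Group computations into waves of data-driven execution (hypothetical sketch).

    inputs[c]  -- names of the data values computation c consumes
    outputs[c] -- names of the data values computation c produces
    initial    -- data values available before execution starts
    """
    available = set(initial)
    pending = set(inputs)
    waves: List[List[str]] = []
    while pending:
        # Every computation whose inputs are all available may fire now, in parallel.
        ready = sorted(c for c in pending if inputs[c] <= available)
        if not ready:
            raise ValueError("unsatisfiable data dependencies: " + ", ".join(sorted(pending)))
        waves.append(ready)
        # Results "flow" onward only after the whole wave has fired.
        for c in ready:
            available |= outputs[c]
        pending -= set(ready)
    return waves
```

For example, if computations a and b both consume x, and c consumes the results of a and b, then a and b form the first wave and c forms the second.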

The objectives of this language design project are to: (1) develop a language that requires little retraining of conventional programmers, (2) provide for the reuse of existing software libraries, and (3) expose potential parallelism, both implicitly and explicitly, at varying levels of procedural computation. To this end the EASY-FLOW language is developed.

The basic unit of computation in EASY-FLOW is the atomic unit (atomic since it has no substructure). An atomic unit is a subprogram supplied by the programmer and written in a conventional high-level language (e.g. FORTRAN, C). The notation provided by EASY-FLOW gives a superstructure located conceptually above the subprograms and relates them by explicitly expressed data dependencies. Units other than atomic units may have substructure consisting of other units related by data dependencies.

The EASY-FLOW notation provides information which may be used in scheduling the execution of units. Units which are not constrained by data dependencies may be scheduled to execute in parallel or overlapping in time. The data dependencies are made clear by the "single assignment" rule: any name in an EASY-FLOW program is associated with only one value throughout execution. As an exception to this rule, the looping construct allows for the convenient update of a name used in iteration, but this may be done only in specifically isolated instances which are clearly marked in the program.

While the EASY-FLOW statements allow for the scheduling of units or tasks, the atomic units provide for the computation specified in the program. Data values are passed as parameters by assigning them to actual parameter variables to be used in a subprogram call. Upon returning from the call, assignments are made from the returning parameters to EASY-FLOW variable names, thus shielding the EASY-FLOW variables from alteration within the subprogram.

An EASY-FLOW compiler has been written that produces sequential FORTRAN code (in order to determine feasibility) for use with FORTRAN subprograms. The data flow graph produced by the compiler is made sequential through application of a topological sort. A compiler to produce parallel FORTRAN code for a Transputer system is currently in progress.
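
The single assignment rule and the topological sort used by the compiler can be sketched as follows. This is a hypothetical illustration, not the EASY-FLOW compiler. By assumption, each unit is represented as a name with the sets of names it consumes and produces, unit names are unique, and the loop-update exception is ignored. Kahn's algorithm is used here as one standard way of performing the topological sort.

```python
from collections import deque
from typing import Dict, List, Sequence, Set, Tuple

# (unit name, names consumed, names produced) -- a hypothetical representation
Unit = Tuple[str, Set[str], Set[str]]


def sequentialize(units: Sequence[Unit]) -> List[str]:
    """Check single assignment, then order units so every producer precedes its consumers."""
    # Single assignment rule: each name may be produced by at most one unit.
    producer: Dict[str, str] = {}
    for name, _, produced in units:
        for var in produced:
            if var in producer:
                raise ValueError(f"{var!r} assigned by both {producer[var]!r} and {name!r}")
            producer[var] = name

    # Data-dependency edges: producer of a name -> each unit that consumes it.
    succ: Dict[str, Set[str]] = {name: set() for name, _, _ in units}
    indeg: Dict[str, int] = {name: 0 for name, _, _ in units}
    for name, consumed, _ in units:
        for var in consumed:
            src = producer.get(var)  # None means an external input
            if src is not None and name not in succ[src]:
                succ[src].add(name)
                indeg[name] += 1

    # Kahn's algorithm; program order breaks ties so the result is deterministic.
    position = {name: i for i, (name, _, _) in enumerate(units)}
    ready = deque(name for name, _, _ in units if indeg[name] == 0)
    order: List[str] = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in sorted(succ[u], key=position.__getitem__):
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(units):
        raise ValueError("cyclic data dependencies")
    return order
```

Units written out of dependency order are still sequenced correctly. For example, a "report" unit that consumes a value produced by a later-listed "stats" unit is placed after it.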

MULTICOMPUTER NETWORK SYNTHESIS
Inter-task communication in a large multiprocess computation may dominate processing time and determine in large part the performance realized. In a multicomputer processing system, the interconnection network provides the pathways for messages passed between processing elements, linking tasks residing on separate processors. An interconnection network that closely fits the pattern of interprocess communication will clearly assist in alleviating the communications overhead. The alternative is that the communications requirements of the application must be mapped to a dissimilar interconnection network. This may cause delays, because single communication edges are mapped to multiple physical links or multiple edges are mapped to single paths passing through processing elements, resulting in bottlenecks.
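
The two sources of delay named above correspond to what are commonly called dilation and congestion. Dilation is a communication edge stretched over several physical links. Congestion is several communication edges sharing one link. The sketch below measures both for a task graph placed one task per processor on a given network. It assumes shortest-path routing, and all names in it are hypothetical.

```python
from collections import Counter, deque
from typing import Dict, List, Sequence, Set, Tuple

Network = Dict[int, Set[int]]  # processor -> directly linked processors


def shortest_route(net: Network, src: int, dst: int) -> List[int]:
    """Breadth-first shortest path; assumes dst is reachable from src."""
    parent = {src: src}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in sorted(net[u]):  # sorted so the chosen route is deterministic
            if v not in parent:
                parent[v] = u
                queue.append(v)
    route = [dst]
    while route[-1] != src:
        route.append(parent[route[-1]])
    return route[::-1]


def dilation_and_congestion(task_edges: Sequence[Tuple[str, str]],
                            placement: Dict[str, int],
                            net: Network) -> Tuple[int, int]:
    """dilation   = longest route (in physical links) used by any communication edge
    congestion = largest number of communication edges routed over one link"""
    link_load: Counter = Counter()
    dilation = 0
    for a, b in task_edges:
        route = shortest_route(net, placement[a], placement[b])
        dilation = max(dilation, len(route) - 1)
        for u, v in zip(route, route[1:]):
            link_load[frozenset((u, v))] += 1  # links are undirected
    return dilation, max(link_load.values(), default=0)
```

Consider one task that talks to three others. Placed on a four-processor ring, this gives dilation 2 and congestion 2. Placed on a four-processor star, whose shape matches the communication pattern, it gives dilation 1 and congestion 1.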

Previous interconnection network designs have incorporated a regular network structure which matches to a degree the pattern of intertask communications. The selection of the network elements has been an intuitive decision based upon the experience of the network designer. As an alternative, this project examines the use of statistical modeling and optimization techniques in the synthesis of interconnection networks. This approach represents a way to compare diverse interconnection networks. It also allows the synthesis of other, perhaps hybrid, networks that may offer better performance, by selecting the best elements of existing designs.

A multidimensional solution space of existing networks is constructed by considering the performance (the dependent variable) along with both quantitative and qualitative characteristics of graphs (the independent variables). Such characteristics may include graph size, degree, girth, radius, diameter, node-connectivity, edge-connectivity, minimum dominating set size, and the maximum number of prime node and edge cutsets. Network performance may be described by the average message delay or by the ratio of average message completion rate to network connection cost. By using the method of stepwise linear regression, a polynomial surface is developed in the solution space. Optimization techniques such as response surface methodology or steepest ascent path may then be used to optimize the performance variable from the polynomial surface.
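
The sketch below shows how some of the independent variables might be obtained for one network. It computes graph size, edge count, maximum degree, diameter, radius, and girth from an adjacency list using breadth-first search. The function names and the choice of characteristics are assumptions. The connectivity, cutset, and dominating-set measures listed above are omitted because they need more involved algorithms.

```python
from collections import deque
from math import inf
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected adjacency sets


def bfs_distances(g: Graph, s: int) -> Dict[int, int]:
    """Hop distances from s to every reachable node."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist


def girth(g: Graph) -> float:
    """Length of the shortest cycle (inf for a forest), via BFS from every node."""
    best = inf
    for s in g:
        dist, parent = {s: 0}, {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in g[u]:
                if v not in dist:
                    dist[v], parent[v] = dist[u] + 1, u
                    q.append(v)
                elif parent[u] != v:  # non-tree edge closes a cycle through s
                    best = min(best, dist[u] + dist[v] + 1)
    return best


def characteristics(g: Graph) -> Dict[str, float]:
    """A few independent variables for one network (hypothetical selection)."""
    ecc = []
    for s in g:
        d = bfs_distances(g, s)
        ecc.append(max(d.values()) if len(d) == len(g) else inf)  # inf if disconnected
    return {
        "size": len(g),
        "edges": sum(len(nbrs) for nbrs in g.values()) // 2,
        "max_degree": max(len(nbrs) for nbrs in g.values()),
        "diameter": max(ecc),
        "radius": min(ecc),
        "girth": girth(g),
    }
```

The results match the known values for standard networks. For a three-dimensional hypercube the sketch yields 12 edges, degree 3, diameter 3, radius 3, and girth 4. For a six-node ring it yields girth 6 and diameter 3.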

Screening of the relatively large number of independent variables may eliminate those that contribute little to the value of the dependent variable.
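
One simple way to carry out such screening is forward stepwise selection, sketched below in plain Python. Two details are assumptions for illustration: the threshold min_gain, and the rule of stopping when the reduction in residual sum of squares falls below that fraction of the total sum of squares. The report does not state the exact stepwise criterion used.

```python
from math import inf
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sse(X: Sequence[Sequence[float]], y: Sequence[float], cols: List[int]) -> float:
    """Residual sum of squares of the least-squares fit y ~ intercept + chosen columns."""
    rows = [[1.0] + [xr[c] for c in cols] for xr in X]
    k = len(cols) + 1
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    try:
        beta = solve(ata, aty)
    except ValueError:  # collinear candidate: treat it as useless
        return inf
    return sum((yi - sum(bj * v for bj, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def forward_select(X: Sequence[Sequence[float]], y: Sequence[float],
                   min_gain: float = 0.01) -> List[int]:
    """Repeatedly add the variable that most reduces the residual SSE; stop when the
    reduction is smaller than min_gain times the total sum of squares."""
    total = sse(X, y, [])
    if total == 0:
        return []
    chosen: List[int] = []
    remaining = list(range(len(X[0])))
    current = total
    while remaining:
        best = min(remaining, key=lambda c: sse(X, y, chosen + [c]))
        new = sse(X, y, chosen + [best])
        if (current - new) / total < min_gain:
            break
        chosen.append(best)
        remaining.remove(best)
        current = new
    return chosen
```

Consider synthetic data in which the dependent variable is built from columns 0 and 2 only. On such data the procedure retains those two columns and screens out column 1.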

An optimization technique is used to determine the local or global points of "optimum" interconnection network performance. An "optimum" point is an indication of an "ideal" network, which need not have realistically-valued independent variables. The gradient vector at an "optimum" point, together with the corresponding values of the various independent variables, indicates general trends or direction(s) of greatest increase in the value of the dependent (performance) variable.
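
The following sketch follows a path of steepest ascent on a quadratic surface of the kind produced by the regression step. It also ranks the independent variables by the magnitude of their gradient components at a chosen point. The coefficients in the usage example are hypothetical. The fixed step size assumes a concave surface. On a surface with no maximum the iteration simply stops after max_iter steps.

```python
from math import sqrt
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def quad_value(b0: float, b: Sequence[float], B: Matrix, x: Sequence[float]) -> float:
    """f(x) = b0 + sum_i b_i x_i + sum_ij B_ij x_i x_j, with B symmetric."""
    n = len(x)
    linear = sum(b[i] * x[i] for i in range(n))
    quadratic = sum(B[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return b0 + linear + quadratic


def quad_gradient(b: Sequence[float], B: Matrix, x: Sequence[float]) -> List[float]:
    """Gradient b + 2Bx (this form relies on B being symmetric)."""
    n = len(x)
    return [b[i] + 2 * sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]


def rank_by_gradient(b: Sequence[float], B: Matrix, x: Sequence[float]) -> List[int]:
    """Indices of the independent variables, most influential first at point x."""
    g = quad_gradient(b, B, x)
    return sorted(range(len(g)), key=lambda i: -abs(g[i]))


def steepest_ascent(b: Sequence[float], B: Matrix, x0: Sequence[float],
                    step: float = 0.1, tol: float = 1e-8,
                    max_iter: int = 10_000) -> List[float]:
    """Move along the gradient until it (nearly) vanishes or max_iter is reached."""
    x = list(x0)
    for _ in range(max_iter):
        g = quad_gradient(b, B, x)
        if sqrt(sum(gi * gi for gi in g)) < tol:
            break
        x = [xi + step * gi for xi, gi in zip(x, g)]
    return x
```

Take the hypothetical surface f = 10 + 4x1 + 2x2 - x1^2 - x2^2. The path starting at the origin converges to (2, 1), where f = 15. At the origin the gradient is (4, 2), so the first variable ranks ahead of the second.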

The optimization process produces a ranking of desirable characteristics and their levels. "Optimal" network synthesis will not follow directly from this. The ranking will assist the designer in the design process with information indicating suitable, perhaps unconventional, directions in the choice of network elements.

QUEUEING MODEL FOR SHARED MEMORY HIERARCHIES
Interference between processors issuing requests to a shared memory may be a major factor in limiting performance in a shared memory multiprocessor system. Simultaneous requests to a single memory module cannot be serviced simultaneously. Only one request may be served, requiring the others to wait under some queueing scheme. Memory requests waiting in a queue translate to processors blocked from computation and a consequential degradation in achieved performance. The queueing model presented is one for a hierarchy of memory modules. A hierarchy represents a realistic view of shared memory organization, with relatively small, high speed memories at the direct access level and larger, slower response memories organized at more remote levels of access.

An analytical model is developed, based upon a general queueing model. The mean waiting time for a request from a processor to be served at a memory module is calculated. It includes the time spent in a queue awaiting service and the time required for the memory module to retrieve the data. Queueing delay is based on an estimate of the expected queue length and the average service time for a memory module access. From this, the number of busy memories is computed and used as the measure of system performance. Analytic results are compared with simulated results for several systems with differing numbers of processors and memory modules, and the correlation is found to be high.
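
The model in this project handles a memory hierarchy and is not reproduced here. As a much simpler stand-in, the sketch below compares an analytic value and a simulated value of the same performance measure, the number of busy memories. It assumes a single-level system in which each of p processors requests one of m modules uniformly at random each cycle. Requests to the same module are served one at a time, and blocked requests are not carried over to the next cycle. Under these assumptions the expected number of busy modules is m(1 - (1 - 1/m)^p). The function names are hypothetical.

```python
import random


def busy_modules_analytic(p: int, m: int) -> float:
    """Expected number of distinct modules requested in one cycle when each of
    p processors picks one of m modules independently and uniformly."""
    return m * (1 - (1 - 1 / m) ** p)


def busy_modules_simulated(p: int, m: int, cycles: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same quantity: a module is busy in a cycle if at
    least one processor requested it; conflicting requests are serialized, so each
    module serves at most one request per cycle."""
    rng = random.Random(seed)
    busy_total = 0
    for _ in range(cycles):
        requested = {rng.randrange(m) for _ in range(p)}
        busy_total += len(requested)
    return busy_total / cycles
```

With four processors and four modules, the analytic value is 4(1 - 0.75^4) = 2.734375 busy modules, and the simulated estimate lands close to it. That is well below the four modules a conflict-free system would keep busy.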

APPENDIX A--EASY-FLOW GRAMMAR
Modified 2/1/89

1) 2)
unit : input : output : endunit

Note: input and output are important enough that it was decided to require them even if no explicit I/O is called for.

Semantics: Make note of the unit id. Record declarations in the symbol table (associate them with this unit). Record the input list and output list associated with this unit.

3) 4) 5) 6)
declare : | nil , | nil | |
<... list> | nil
| <... unit> | <... call> | outof:

7) 8)
into : then else

Note: <...> is treated as a subprogram call for now (see grammar). "Unitsets" are used here because a unit would require another input/output pair, and this is already provided by the enclosing unit.

9)
iter do reassign

Semantics: Process boolean expression.

10) 11) 12)
<... unit>
same as above (see <...>).
distribute = | nil

Note: <...> may be nil.

13)
| nil

Note: <...> may be nil.

14) 15)
<... id> =>
subprogram ( ) | nil
<... parameters>
<... parameters>

18)
real | <... list> / <... list> [ <... list> | nil
<... list>

Note: The nil above allows <... id> <... list>.

19) 20) 21)
declare :
Note: This is OK to emphasize dimension | nil.

22) 23) 24) 25) 26) 27) 28)
<... list> ::= | nil
<... list> ::= svariable <... list> | <... list> | nil
( ) | nil
subscript <... list> | subscript <... list> | nil
<... id>

29) 30) 31)
<... list> ::= ( ) | nil
dimension <... list> | dimension <... list> | nil

Note: Three kinds of variable list are provided for:
1. Used in declarations; these allow only constant dimensions.
2. Used in input/output lists in units.
3. Other places allow any subscripts.
Note: svariable is any unsubscripted id (or simple variable).

APPENDIX B--BIBLIOGRAPHIC REFERENCES
(Copies attached.)

"A General Model for Memory Interference in a Multiprocessor System with Memory Hierarchy," Badie A. Taha and Hilda M. Standley, 1989 International Conference on Parallel Processing, pp. I-225--I-232, August 8-12, 1989.

"Adapting High-Level Language Programs for Parallel Processing Using Data Flow," Lewis Structures Technology--1988, NASA Conference Publication 3003, Vol. 1, pp. 103--111, May 24-25, 1988.

"Modeling and Synthesis of Multicomputer Interconnection Networks," Hilda M. Standley and D. Steve Auxter, Technical Report, Dept. of Computer Science and Engineering, 1988.

"Multiprocessor Architecture: Synthesis and Evaluation," NASA Langley Workshop on Computational Mechanics, November 1987.

"A Very High Level Language for Large-Grained Data Flow," 1987 ACM Fifteenth Annual Computer Science Conference, pp. 191--195, February 1987.