Parallel Performance Mapping,
Diagnosis, and Data Mining
Allen D. Malony, Sameer Shende, Li Li, Kevin Huck
{malony,sameer,lili,khuck}@cs.uoregon.edu
Department of Computer and Information Science
Performance Research Laboratory
University of Oregon
Research Motivation
Tools for performance problem solving
Empirical-based performance optimization process
Performance technology concerns
[Diagram: Performance Tuning (hypotheses), Performance Diagnosis (properties), Performance Experimentation (characterization), Performance Observation]
Performance Technology: experiment management, performance storage
Performance Technology: instrumentation, measurement, analysis, visualization
ParCo 2005 Parallel Performance Mapping, Diagnosis, and Data Mining 2
Challenges in Performance Problem Solving
How to make the process more effective (productive)?
Process may depend on scale of parallel system
What are the important events and performance metrics?
Tied to application structure and computational model
Tied to application domain and algorithms
Process and tools can/must be more application-aware
Tools have poor support for application-specific aspects
What are the significant issues that will affect the
technology used to support the process?
Enhance application development and benchmarking
New paradigm in performance process and technology
Large Scale Performance Problem Solving
How does our view of this process change when we
consider very large-scale parallel systems?
What are the significant issues that will affect the
technology used to support the process?
Parallel performance observation is clearly needed
In general, there is the concern for intrusion
Seen as a tradeoff with performance diagnosis accuracy
Scaling complicates observation and analysis
Performance data size becomes a concern
Analysis complexity increases
Nature of application development may change
Role of Intelligence, Automation, and Knowledge
Scale forces the process to become more intelligent
Even with intelligent and application-specific tools,
deciding what to analyze is difficult and intractable
More automation and knowledge-based decision making
Build automatic/autonomic capabilities into the tools
Support broader experimentation methods and refinement
Access and correlate data from several sources
Automate performance data analysis / mining / learning
Include predictive features and experiment refinement
Knowledge-driven adaptation and optimization guidance
Will allow scalability issues to be addressed in context
Outline of Talk
Performance problem solving
Scalability, productivity, and performance technology
Application-specific and autonomic performance tools
TAU parallel performance system (Bernd said “No!”)
Parallel performance mapping
Performance data management and data mining
Performance Data Management Framework (PerfDMF)
PerfExplorer
Model-based parallel performance diagnosis
Poirot and Hercule
Conclusions
TAU Performance System
[Figure: TAU Performance System diagram; surviving label: event selection]
Semantics-Based Performance Mapping
Associate performance measurements with high-level semantic abstractions
Need mapping support in the performance measurement system to assign data correctly
Hypothetical Mapping Example
Particles distributed on surfaces of a cube
Particle* P[MAX]; /* Array of particles */
int GenerateParticles() {
    /* distribute particles over all faces of the cube */
    int last = 0;
    for (int face = 0; face < 6; face++) {
        /* particles on this face */
        int particles_on_this_face = num(face);
        for (int i = last; i < last + particles_on_this_face; i++) {
            /* particle properties are a function of face */
            P[i] = ... f(face);
            ...
        }
        last += particles_on_this_face;
    }
    return last; /* total number of particles generated */
}
Hypothetical Mapping Example (continued)
int ProcessParticle(Particle *p) {
    /* perform some computation on p */
}
int main() {
    GenerateParticles();
    /* create a list of particles (the work packets) */
    for (int i = 0; i < N; i++)
        /* the engine iterates over the list */
        ProcessParticle(P[i]);
}
How much time (flops) spent processing face i particles?
What is the distribution of performance among faces?
How is this determined if execution is parallel?
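One way to approach the first two questions is to tag each particle with the face that generated it and charge the cost of ProcessParticle to that face instead of to the routine. The sketch below is only an illustration of that idea and is not TAU's mapping API: the face field, FaceProfile, and process_all are hypothetical names, the body of ProcessParticle is a placeholder, and wall-clock time from std::chrono stands in for whatever metric (time, flops) the measurement system records.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical particle: the face index is the semantic attribute we map to.
struct Particle {
    int face;      // which cube face (0..5) generated this particle
    double value;  // placeholder for the particle state
};

// Per-face accumulators: the view a routine-level profile cannot provide.
struct FaceProfile {
    std::array<double, 6> seconds{};     // time charged to each face
    std::array<std::size_t, 6> calls{};  // particles processed per face
};

// Stand-in for the real computation on one particle (assumption).
inline void ProcessParticle(Particle& p) {
    p.value = p.value * 0.5 + p.face;
}

// Process every particle, charging each call to the particle's face.
inline FaceProfile process_all(std::vector<Particle>& particles) {
    using clock = std::chrono::steady_clock;
    FaceProfile prof;
    for (Particle& p : particles) {
        assert(p.face >= 0 && p.face < 6);
        const auto start = clock::now();
        ProcessParticle(p);
        const auto stop = clock::now();
        prof.seconds[p.face] +=
            std::chrono::duration<double>(stop - start).count();
        prof.calls[p.face] += 1;
    }
    return prof;
}
```

In a parallel run, each process could keep its own FaceProfile and the per-face arrays could then be combined across processes; that reduction step is an assumption and is not shown here.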
No Performance Mapping versus Mapping
Typical performance tools report performance with respect to routines
Does not provide support for mapping
TAU's performance mapping can observe performance with respect to scientist's programming and problem abstractions
[Figures: TAU (no mapping); TAU (w/ mapping)]
Performance Mapping Approaches
ParaMap (Miller and Irvin)
Low-level performance to high-level source constructs
Noun-Verb (NV) model to describe the mapping
noun is a program entity
verb represents an action performed on a noun
sentences (nouns and verbs) map to other sentences
Mappings: static, dynamic, set of active sentences (SAS)
Semantic Entities / Attributes / Associations (SEAA)
Entities defined at any level of abstraction (user-level)
Attribute entity with semantic information
Entity-to-entity associations
Target measurement layer and asynchronous operation
SEAA Implementation
Two association types (implemented in TAU API)
Embedded – extends associated
object to store performance
measurement entity
External – creates an external
look-up table using address of …
object as key to locate performance
measurement entity
Implemented in TAU API
Applied to performance measurement problems
callpath/phase profiling, C++ templates, …
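The difference between the two association types can be sketched in a few lines. This is a hypothetical illustration, not TAU's implementation: Timer, TaskEmbedded, TaskPlain, ExternalMap, and charge are invented names. The embedded form adds a field to the object itself, while the external form leaves the object untouched and looks up the measurement entity by the object's address.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical performance measurement entity: a named accumulator.
struct Timer {
    std::string name;
    double total = 0.0;
    long count = 0;
    void add(double dt) { total += dt; ++count; }
};

// Embedded association: the object is extended with a pointer to its timer.
struct TaskEmbedded {
    std::string name;
    Timer* timer = nullptr;  // extra field added only for measurement
};

// External association: the object is unchanged ...
struct TaskPlain {
    std::string name;
};

// ... and a look-up table keyed by object address locates the timer.
class ExternalMap {
public:
    void link(const void* object, Timer* timer) { table_[object] = timer; }
    Timer* find(const void* object) const {
        auto it = table_.find(object);
        return it == table_.end() ? nullptr : it->second;
    }
private:
    std::unordered_map<const void*, Timer*> table_;
};

// Charge a measured duration to whichever timer the object maps to.
inline void charge(const TaskEmbedded& t, double dt) {
    assert(t.timer != nullptr);
    t.timer->add(dt);
}
inline void charge(const ExternalMap& map, const TaskPlain& t, double dt) {
    Timer* timer = map.find(&t);
    assert(timer != nullptr);
    timer->add(dt);
}
```

The embedded form avoids the table look-up but requires changing the object's definition; the external form works when the object cannot be modified, at the cost of a hash look-up per measurement.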
Uintah Problem Solving Environment (PSE)
Uintah component architecture for Utah C-SAFE project
Application programmers provide:
description of computation (tasks and variables)
code to perform task on single “patch” (sub-region of space)
Components for scheduling, partitioning, load balance, …
Uintah Computational Framework (UCF)
Execution model based on software (macro) dataflow
computations expressed as directed acyclic graphs of tasks
input/outputs specified for each patch in a structured grid
Abstraction of global single-assignment memory
Task graph gets mapped to processing resources
Communication schedule approximates the global optimum
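The dataflow execution model can be illustrated with a small task graph in which a task becomes ready once every task it depends on has finished. The sketch below is not Uintah's scheduler: TaskGraph and dataflow_order are hypothetical names, it ignores patches and processor assignment, and it uses Kahn's topological-sort algorithm simply to produce one dependency-respecting execution order.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical task graph: nodes are named tasks, edges are data dependencies.
struct TaskGraph {
    std::vector<std::string> names;
    std::vector<std::vector<std::size_t>> consumers;  // producer -> consumers

    std::size_t add_task(const std::string& name) {
        names.push_back(name);
        consumers.emplace_back();
        return names.size() - 1;
    }
    // The consumer needs an output of the producer before it can run.
    void add_dependency(std::size_t producer, std::size_t consumer) {
        assert(producer < names.size() && consumer < names.size());
        consumers[producer].push_back(consumer);
    }
};

// Kahn's algorithm: returns one valid execution order,
// or an empty vector if the graph contains a cycle.
inline std::vector<std::size_t> dataflow_order(const TaskGraph& g) {
    const std::size_t n = g.names.size();
    std::vector<std::size_t> pending(n, 0);  // unmet inputs per task
    for (const auto& outs : g.consumers)
        for (std::size_t c : outs) ++pending[c];

    std::queue<std::size_t> ready;
    for (std::size_t t = 0; t < n; ++t)
        if (pending[t] == 0) ready.push(t);

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t t = ready.front();
        ready.pop();
        order.push_back(t);
        for (std::size_t c : g.consumers[t])
            if (--pending[c] == 0) ready.push(c);  // all inputs now available
    }
    if (order.size() != n) order.clear();  // a cycle left some tasks unready
    return order;
}
```

In a parallel scheduler, all tasks sitting in the ready queue at the same time are candidates for concurrent execution; this sketch serializes them for simplicity.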
Uintah Task Graph (Material Point Method)
Diagram of named tasks
(ovals) and data (edges)
Imminent computation
Dataflow-constrained
MPM
Newtonian material point
motion time step
Solid: values defined at
material point (particle)
Dashed: values defined at
vertex (grid)
Prime (’): values updated
during time step
Task Execution in Uintah Parallel Scheduler
Profile methods and functions in scheduler and in MPI library
Task execution time dominates (what task?)
MPI communication overheads (where?)
Need to map performance data!
[Figure: task execution time distribution]
Mapping Instrumentation in UCF (example)
Use TAU performance mapping API
void MPIScheduler::execute(const ProcessorGroup * pc,
                           DataWarehouseP & old_dw,
                           DataWarehouseP & dw ) {
    ...
    TAU_MAPPING_CREATE(
        task->getName(), "[MPIScheduler::execute()]",
        (TauGroup_t)(void*)task->getName(), task->getName(), 0);
    ...
    TAU_MAPPING_OBJECT(tautimer)
    TAU_MAPPING_LINK(tautimer,(TauGroup_t)(void*)task->getName());
    // EXTERNAL ASSOCIATION
    ...
    TAU_MAPPING_PROFILE_TIMER(doitprofiler, tautimer, 0)
    TAU_MAPPING_PROFILE_START(doitprofiler,0);
    task->doit(pc);
    TAU_MAPPING_PROFILE_STOP(0);
    ...
}
Task Performance Mapping (Profile)
Mapped task performance across processes
Performance mapping for different tasks
Work Packet – to – Task Mapping (Trace)
Work packet computation events colored by task type
Distinct phases of computation can be identified based on task
Comparing Uintah Traces for Scalability Analysis
[Figure: Uintah trace displays for 8 processes and 32 processes]
Important Questions for Application Developers
How does performance vary with different compilers?
Is poor performance correlated with certain OS features?
Has a recent change caused unanticipated performance?
How does performance vary with MPI variants?
Why is one application version faster than another?
What is the reason for the observed scaling behavior?
Did two runs exhibit similar performance?
How are performance data related to application events?
Which machines will run my code the fastest and why?
Which benchmarks predict my code performance best?
Performance Problem Solving Goals
Answer questions at multiple levels of interest
Data from low-level measurements and simulations
used to predict application performance
High-level performance data spanning dimensions
machine, applications, code revisions, data sets
examine broad performance trends
Discover general correlations between application performance
and features of their external environment
Develop methods to predict application performance from
lower-level metrics
Discover performance correlations between a small set
of benchmarks and a collection of applications that
represent a typical workload for a given system
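As one concrete reading of the last goal, a benchmark could be judged by how strongly its run times across a set of machines correlate with an application's run times on the same machines. The sketch below uses the Pearson correlation coefficient for this; the function names, and the choice of Pearson correlation as the measure, are assumptions made for illustration and are not taken from the talk.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation between two equally long series,
// e.g. benchmark time and application time on the same machines.
inline double pearson(const std::vector<double>& x,
                      const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mx;
        const double dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    assert(sxx > 0.0 && syy > 0.0);  // a constant series has no correlation
    return sxy / std::sqrt(sxx * syy);
}

// Index of the benchmark whose times correlate most strongly
// (in absolute value) with the application's times.
inline std::size_t best_predictor(
        const std::vector<std::vector<double>>& benchmarks,
        const std::vector<double>& application) {
    assert(!benchmarks.empty());
    std::size_t best = 0;
    double best_r = -1.0;
    for (std::size_t b = 0; b < benchmarks.size(); ++b) {
        const double r = std::fabs(pearson(benchmarks[b], application));
        if (r > best_r) { best_r = r; best = b; }
    }
    return best;
}
```

A high correlation only says that the benchmark ranks the machines similarly to the application; it does not by itself give a predictive model of absolute run time.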
Empirical-Based Performance Optimization
[Diagram: Process: Performance Tuning (hypotheses), Performance Diagnosis (properties), Performance Experimentation (characterization), Performance Observation (observability requirements?)]
Experiment management: experiment schemas, experiment trials
Performance Data Management Framework
ICPP 2005 paper
PerfExplorer (K. Huck, Ph.D. student, UO)
Performance knowledge discovery framework
Use the existing TAU infrastructure
TAU instrumentation data, PerfDMF
Client-server based system architecture
Data mining analysis applied to parallel performance data
comparative, clustering, correlation, dimension reduction, ...
Technology integration
Relational Database Management Systems (RDBMS)
Java API and toolkit
R-project / Omegahat statistical analysis
WEKA data mining package
Web-based client
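To make the clustering analysis concrete, the sketch below groups processes by their performance vectors using k-means (Lloyd's algorithm). It is a minimal stand-alone illustration and not PerfExplorer's code, which relies on the R and WEKA packages listed above. Point and kmeans are hypothetical names, and seeding the centroids with the first k points is an assumption chosen so that the result is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One process or thread: one value per metric or event (assumed layout).
using Point = std::vector<double>;

inline double squared_distance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

// Lloyd's algorithm. Returns the cluster label of each input point.
inline std::vector<std::size_t> kmeans(const std::vector<Point>& data,
                                       std::size_t k, int max_iters = 100) {
    assert(k >= 1 && k <= data.size());
    const std::size_t dim = data[0].size();
    // Assumption: seed the centroids with the first k points.
    std::vector<Point> centroids(
        data.begin(), data.begin() + static_cast<std::ptrdiff_t>(k));
    std::vector<std::size_t> label(data.size(), 0);

    for (int iter = 0; iter < max_iters; ++iter) {
        bool changed = (iter == 0);  // always perform one update step
        // Assignment step: attach each point to its nearest centroid.
        for (std::size_t i = 0; i < data.size(); ++i) {
            std::size_t best = 0;
            double best_d = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < k; ++c) {
                const double d = squared_distance(data[i], centroids[c]);
                if (d < best_d) { best_d = d; best = c; }
            }
            if (label[i] != best) { label[i] = best; changed = true; }
        }
        if (!changed) break;  // converged: no label moved
        // Update step: move each centroid to the mean of its members.
        std::vector<Point> sum(k, Point(dim, 0.0));
        std::vector<std::size_t> count(k, 0);
        for (std::size_t i = 0; i < data.size(); ++i) {
            for (std::size_t j = 0; j < dim; ++j) sum[label[i]][j] += data[i][j];
            ++count[label[i]];
        }
        for (std::size_t c = 0; c < k; ++c) {
            if (count[c] == 0) continue;  // empty cluster keeps its centroid
            for (std::size_t j = 0; j < dim; ++j)
                centroids[c][j] = sum[c][j] / static_cast<double>(count[c]);
        }
    }
    return label;
}
```

Processes that land in the same cluster behave similarly under the chosen metrics, so a large run can be summarized by a handful of representative behaviors instead of one profile per process.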
PerfExplorer Architecture
SC’05 paper
PerfExplorer Client GUI
Hierarchical and K-means Clustering (sPPM)
Miranda Clustering on 16K Processors
Parallel Performance Diagnosis
Performance tuning process
Process to find and report performance problems
Performance diagnosis: detect and explain problems
Performance optimization: performance problem repair
Experts approach systematically and use experience
Hard to formulate and automate expertise
Performance optimization is fundamentally hard
Focus on the performance diagnosis problem
Characterize diagnosis processes
How it integrates with performance experimentation
Understand the knowledge engineering
Parallel Performance Diagnosis Architecture
Performance Diagnosis System Architecture
Problems in Existing Diagnosis Approaches
Low-level abstraction of properties/metrics
Independent of program semantics
Relate to component structure
not algorithmic structure or parallelism model
Insufficient explanation power
Hard to interpret in the context of program semantics
Performance behavior not tied to operational parallelism
Low applicability and adaptability
Difficult to apply in different contexts
Hard to adapt to new requirements
Poirot Project
Lack of a formal theory of diagnosis processes
Compare and analyze performance diagnosis systems
Use theory to create system that is automated / adaptable
Poirot performance diagnosis (theory, architecture)
Survey of diagnosis methods / strategies in tools
Heuristic classification approach (match to characteristics)
Heuristic search approach (based on problem knowledge)
Problems
Descriptive results do not explain with respect to context
users must reason about high-level causes
Performance experimentation not guided by diagnosis
Lacks automation
Model-Based Approach
Knowledge-based performance diagnosis
Capture knowledge about performance problems
Capture knowledge about how to detect and explain them
Where does the knowledge come from?
Extract from parallel computational models
Structural and operational characteristics
Associate computational models with performance
Do parallel computational models help in diagnosis?
Enables better understanding of problems
Enables more specific experimentation
Enables more effective hypothesis testing and search
Implications for Performance Diagnosis
Models benefit performance diagnosis
Base instrumentation on program semantics
Capture performance-critical features
Enable explanations close to user’s understanding
of computation operation
of performance behavior
Reuse performance analysis expertise
on the commonly-used models
Model examples
Master-worker model
Pipeline
Divide-and-conquer
Domain decomposition
Phase-based
Compositional
Hercule Project
Goals of automation, adaptability, validation
Approach
Make use of model knowledge to diagnose performance
Start with commonly-used computational models
Engineering model knowledge
Integrate model knowledge with performance
measurement system
Build a cause inference system
define “causes” at parallelism level
build causality relation
between the low-level “effects” and the “causes”
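A cause inference system of this kind can be pictured as a set of prioritized rules, each linking measured low-level effects to a cause stated at the parallelism level. The sketch below is a hypothetical illustration for the master-worker model and is not Hercule's rule base: the Observations fields, the pairing of evidence with causes, and every threshold are assumptions; only the four cause names are taken from the talk.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical low-level "effects" measured for one master-worker run.
struct Observations {
    double speedup;                // measured speedup
    int workers;                   // number of worker processes
    double init_final_fraction;    // share of run time in init/finalization
    double task_seconds;           // mean compute time per task
    double message_seconds;        // mean communication time per task
    double busy_queue_fraction;    // share of intervals with a long master queue
    double worker_time_imbalance;  // (max - min) / mean of worker busy time
};

// One causality relation: evidence over the effects implies a cause.
struct Rule {
    int priority;  // lower number is examined first
    std::string cause;
    std::function<bool(const Observations&)> holds;
};

// Illustrative rule set; every threshold below is an assumption.
inline std::vector<Rule> master_worker_rules() {
    return {
        {3, "Master-being-bottleneck",
         [](const Observations& o) { return o.busy_queue_fraction > 0.5; }},
        {1, "Insufficient-parallelism",
         [](const Observations& o) { return o.init_final_fraction > 0.2; }},
        {4, "Some workers noticeably inefficient",
         [](const Observations& o) { return o.worker_time_imbalance > 0.3; }},
        {2, "Fine-granularity",
         [](const Observations& o) { return o.message_seconds > o.task_seconds; }},
    };
}

// Returns the causes whose evidence holds, in priority order.
// Several causes may coexist. An empty result means no low-speedup symptom
// was seen or no rule matched.
inline std::vector<std::string> diagnose(const Observations& o,
                                         std::vector<Rule> rules,
                                         double target_efficiency = 0.5) {
    assert(o.workers > 0);
    std::vector<std::string> causes;
    if (o.speedup >= target_efficiency * o.workers) return causes;
    std::sort(rules.begin(), rules.end(),
              [](const Rule& a, const Rule& b) { return a.priority < b.priority; });
    for (const Rule& r : rules)
        if (r.holds(o)) causes.push_back(r.cause);
    return causes;
}
```

Keeping the rules as data rather than hard-coded branches is what would let a different computational model, such as pipeline or divide-and-conquer, supply its own rule set to the same inference loop.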
Master-Worker Parallel Computation Model
Performance Diagnosis Inference Tree (MW)
[Figure: inference tree for master-worker diagnosis; node labels follow]
Symptom: Low speedup
Hypotheses (numbered by priority):
1. Insufficient-parallelism
2. Fine-granularity
3. Master-being-bottleneck
4. Some workers noticeably inefficient
Observations:
Large amount of messages exchanged every time
Master-assign-task time significant
Number of requests in master queue > Κ1 in some time intervals
The workers waited quite a while in master queue in some time intervals
Initialization or finalization time significant
Waiting a long time for master assigning each individual task
Time imbalance
Such intervals > Κ2
Such intervals < Κ2
Causes:
Worker number saturation
Worker starvation
Legend: node types are observation, hypotheses, and causes; number = priority; Κi = threshold; + = coexistence
Knowledge Engineering - Abstract Event (MW)
Use CLIPS expert system building tool
Diagnosis Results Output (MW)
Experimental Diagnosis Results (MW)
Concluding Discussion
Performance tools must be used effectively
More intelligent performance systems for productive use
Evolve to application-specific performance technology
Deal with scale by “full range” performance exploration
Autonomic and integrated tools
Knowledge-based and knowledge-driven process
Performance observation methods do not necessarily
need to change in a fundamental sense
More automatically controlled and efficiently used
Support model-driven performance diagnosis
Develop next-generation tools and deliver to community
Support Acknowledgements
Department of Energy (DOE)
Office of Science contracts
University of Utah ASCI Level 1
sub-contract
ASC/NNSA Level 3 contract
NSF
High-End Computing Grant
Research Centre Juelich
John von Neumann Institute
Dr. Bernd Mohr
Los Alamos National Laboratory