          Parallel Performance Mapping, Diagnosis, and Data Mining


Allen D. Malony, Sameer Shende, Li Li, Kevin Huck
         {malony,sameer,lili,khuck}@cs.uoregon.edu


  Department of Computer and Information Science
         Performance Research Laboratory
               University of Oregon
  Research Motivation
     Tools for performance problem solving
            Empirical-based performance optimization process
            Performance technology concerns
      [Diagram: the empirical-based performance optimization cycle. Performance
       Observation leads (via characterization) to Performance Experimentation,
       (via properties) to Performance Diagnosis, and (via hypotheses) to
       Performance Tuning. The cycle is supported on one side by performance
       technology for experiment management and performance storage, and on the
       other by instrumentation, measurement, analysis, and visualization.]
ParCo 2005           Parallel Performance Mapping, Diagnosis, and Data Mining                 2
  Challenges in Performance Problem Solving
     How to make the process more effective (productive)?
     Process may depend on scale of parallel system
     What are the important events and performance metrics?
            Tied to application structure and computational model
            Tied to application domain and algorithms
     Process and tools can/must be more application-aware
            Tools have poor support for application-specific aspects
     What are the significant issues that will affect the
      technology used to support the process?
     Enhance application development and benchmarking
     New paradigm in performance process and technology
ParCo 2005            Parallel Performance Mapping, Diagnosis, and Data Mining   3
  Large Scale Performance Problem Solving
     How does our view of this process change when we
      consider very large-scale parallel systems?
     What are the significant issues that will affect the
      technology used to support the process?
     Parallel performance observation is clearly needed
     In general, there is a concern about measurement intrusion
            Seen as a tradeoff with performance diagnosis accuracy
     Scaling complicates observation and analysis
            Performance data size becomes a concern
            Analysis complexity increases
     Nature of application development may change
ParCo 2005           Parallel Performance Mapping, Diagnosis, and Data Mining   4
  Role of Intelligence, Automation, and Knowledge
     Scale forces the process to become more intelligent
     Even with intelligent and application-specific tools, deciding
      what to analyze is difficult and can be intractable
     More automation and knowledge-based decision making
     Build automatic/autonomic capabilities into the tools
            Support broader experimentation methods and refinement
            Access and correlate data from several sources
            Automate performance data analysis / mining / learning
            Include predictive features and experiment refinement
     Knowledge-driven adaptation and optimization guidance
     Will allow scalability issues to be addressed in context
ParCo 2005           Parallel Performance Mapping, Diagnosis, and Data Mining   5
  Outline of Talk
     Performance problem solving
            Scalability, productivity, and performance technology
            Application-specific and autonomic performance tools
     TAU parallel performance system (Bernd said “No!”)
     Parallel performance mapping
     Performance data management and data mining
            Performance Data Management Framework (PerfDMF)
            PerfExplorer
     Model-based parallel performance diagnosis
            Poirot and Hercule
     Conclusions
ParCo 2005           Parallel Performance Mapping, Diagnosis, and Data Mining   6
    TAU Performance System


   [Diagram: TAU performance system architecture; event selection]




ParCo 2005   Parallel Performance Mapping, Diagnosis, and Data Mining   7
  Semantics-Based Performance Mapping
     Associate performance measurements with high-level semantic abstractions
     Need mapping support in the performance measurement system to assign
      data correctly
ParCo 2005       Parallel Performance Mapping, Diagnosis, and Data Mining   8
  Hypothetical Mapping Example
     Particles distributed on surfaces of a cube
  Particle* P[MAX]; /* Array of particles */
  int GenerateParticles() {
    /* distribute particles over all faces of the cube */
    for (int face=0, last=0; face < 6; face++){
      /* particles on this face */
      int particles_on_this_face = num(face);
      for (int i=last; i < last + particles_on_this_face; i++) {
        /* particle properties are a function of face */
        P[i] = ... f(face);
        ...
      }
      last+= particles_on_this_face;
    }
  }
ParCo 2005       Parallel Performance Mapping, Diagnosis, and Data Mining   9
  Hypothetical Mapping Example (continued)
  int ProcessParticle(Particle *p) {
    /* perform some computation on p */
  }

  int main() {
    /* create a list of particles (the "work packets" in the diagram) */
    GenerateParticles();
    /* processing "engine": iterate over the list */
    for (int i = 0; i < N; i++)
      ProcessParticle(P[i]);
  }

     How much time (or how many flops) is spent processing the particles on face i?
     What is the distribution of performance among faces?
     How is this determined if execution is parallel? (see the mapping sketch below)
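
     One way to answer these questions is to map measurements to faces. Below is
     a minimal sketch (not code from the original application) of how the
     per-face association might be expressed with the TAU mapping macros shown
     later in the Uintah example; faceName(), faceOf(), and the exact macro
     arguments here are illustrative assumptions.

  #include <TAU.h>                   /* TAU measurement API */

  const char* faceName(int face);    /* hypothetical: returns "face 0" .. "face 5" */
  int faceOf(Particle* p);           /* hypothetical: face a particle belongs to   */

  int main() {
    GenerateParticles();
    /* create one mapping entity per face, keyed by the face-name pointer */
    for (int face = 0; face < 6; face++)
      TAU_MAPPING_CREATE(faceName(face), "[ProcessParticle]",
                         (TauGroup_t)(void*)faceName(face), faceName(face), 0);
    for (int i = 0; i < N; i++) {
      /* look up the timer associated with this particle's face and charge
         the ProcessParticle() time to that face rather than to the routine */
      TAU_MAPPING_OBJECT(facetimer)
      TAU_MAPPING_LINK(facetimer, (TauGroup_t)(void*)faceName(faceOf(P[i])));
      TAU_MAPPING_PROFILE_TIMER(faceprofiler, facetimer, 0)
      TAU_MAPPING_PROFILE_START(faceprofiler, 0);
      ProcessParticle(P[i]);
      TAU_MAPPING_PROFILE_STOP(0);
    }
  }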
ParCo 2005       Parallel Performance Mapping, Diagnosis, and Data Mining               10
  No Performance Mapping versus Mapping
     Typical performance tools report performance with respect to routines and
      do not provide support for mapping
     TAU's performance mapping can observe performance with respect to the
      scientist's programming and problem abstractions
              [Profile screenshots: TAU (no mapping) vs. TAU (w/ mapping)]
ParCo 2005      Parallel Performance Mapping, Diagnosis, and Data Mining             11
  Performance Mapping Approaches
     ParaMap (Miller and Irvin)
            Low-level performance to high-level source constructs
            Noun-Verb (NV) model to describe the mapping
              a noun is a program entity
              a verb represents an action performed on a noun
              sentences (noun plus verb) map to other sentences
            Mappings: static, dynamic, set of active sentences (SAS)
     Semantic Entities / Abstractions / Associations (SEAA)
            Entities defined at any level of abstraction (user-level)
            Attribute entity with semantic information
            Entity-to-entity associations
            Target measurement layer and asynchronous operation
ParCo 2005            Parallel Performance Mapping, Diagnosis, and Data Mining   12
  SEAA Implementation
     Two association types (implemented in TAU API)
            Embedded – extends the associated object to store the performance
             measurement entity
            External – creates an external look-up table, using the address of
             the object as the key to locate the performance measurement entity
             (a minimal sketch of this form follows below)
     Implemented in TAU API
     Applied to performance measurement problems
            callpath/phase profiling, C++ templates, …
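
     The external form can be pictured as a map from object addresses to
     measurement entities. A minimal generic C++ sketch of that idea (not TAU's
     internal implementation; TimerHandle and the function names are assumptions):

  #include <unordered_map>

  struct TimerHandle;   /* opaque performance measurement entity (assumed) */

  /* External association: keyed by the address of the user-level object, so
     the object itself is left unmodified (the embedded form would instead
     store the handle inside the object). */
  static std::unordered_map<const void*, TimerHandle*> external_map;

  void associate(const void* object, TimerHandle* timer) {
    external_map[object] = timer;
  }

  TimerHandle* lookup(const void* object) {
    auto it = external_map.find(object);
    return (it == external_map.end()) ? nullptr : it->second;
  }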
ParCo 2005           Parallel Performance Mapping, Diagnosis, and Data Mining       13
  Uintah Problem Solving Environment (PSE)
     Uintah component architecture for Utah C-SAFE project
            Application programmers provide:
              description of computation (tasks and variables)
              code to perform task on single “patch” (sub-region of space)

            Components for scheduling, partitioning, load balance, …
     Uintah Computational Framework (UCF)
            Execution model based on software (macro) dataflow
              computations expressed as directed acyclic graphs of tasks
              inputs/outputs specified for each patch in a structured grid

            Abstraction of global single-assignment memory
            Task graph gets mapped to processing resources
            Communications schedule approximates global optimal
ParCo 2005            Parallel Performance Mapping, Diagnosis, and Data Mining   14
  Uintah Task Graph (Material Point Method)
     Diagram of named tasks (ovals) and data (edges)
     Imminent computation
            Dataflow-constrained
     MPM
            Newtonian material point motion time step
            Solid: values defined at material point (particle)
            Dashed: values defined at vertex (grid)
            Prime (’): values updated during time step
ParCo 2005           Parallel Performance Mapping, Diagnosis, and Data Mining   15
  Task Execution in Uintah Parallel Scheduler
     Profile methods and functions in the scheduler and in the MPI library
     [Profile screenshots: task execution time dominates (what task?); task
      execution time distribution; MPI communication overheads (where?)]
     Need to map performance data!
ParCo 2005        Parallel Performance Mapping, Diagnosis, and Data Mining                       16
  Mapping Instrumentation in UCF (example)
     Use TAU performance mapping API
  void MPIScheduler::execute(const ProcessorGroup * pc,
                             DataWarehouseP    & old_dw,
                             DataWarehouseP    & dw ) {
    ...
    // create a mapping entity named after the task, keyed by the task-name pointer
    TAU_MAPPING_CREATE(
      task->getName(), "[MPIScheduler::execute()]",
      (TauGroup_t)(void*)task->getName(), task->getName(), 0);
    ...
    // look up the timer linked to this task's mapping entity
    TAU_MAPPING_OBJECT(tautimer)
    TAU_MAPPING_LINK(tautimer,(TauGroup_t)(void*)task->getName());
    // EXTERNAL ASSOCIATION
    ...
    // time the task body and charge it to the mapped task timer
    TAU_MAPPING_PROFILE_TIMER(doitprofiler, tautimer, 0)
    TAU_MAPPING_PROFILE_START(doitprofiler,0);
    task->doit(pc);
    TAU_MAPPING_PROFILE_STOP(0);
    ...
  }

ParCo 2005        Parallel Performance Mapping, Diagnosis, and Data Mining   17
  Task Performance Mapping (Profile)




   [Profile screenshots: mapped task performance across processes; performance
    mapping for different tasks]
ParCo 2005   Parallel Performance Mapping, Diagnosis, and Data Mining                      18
  Work Packet – to – Task Mapping (Trace)
   [Trace screenshot: work packet computation events colored by task type]
              Distinct phases of computation can be identified based on task
ParCo 2005       Parallel Performance Mapping, Diagnosis, and Data Mining                    19
  Comparing Uintah Traces for Scalability Analysis

   [Trace screenshots comparing an 8-process run and a 32-process run]
ParCo 2005         Parallel Performance Mapping, Diagnosis, and Data Mining                    20
  Important Questions for Application Developers
     How does performance vary with different compilers?
     Is poor performance correlated with certain OS features?
     Has a recent change caused unanticipated performance?
     How does performance vary with MPI variants?
     Why is one application version faster than another?
     What is the reason for the observed scaling behavior?
     Did two runs exhibit similar performance?
     How are performance data related to application events?
     Which machines will run my code the fastest and why?
     Which benchmarks predict my code performance best?

ParCo 2005       Parallel Performance Mapping, Diagnosis, and Data Mining   21
  Performance Problem Solving Goals
     Answer questions at multiple levels of interest
            Data from low-level measurements and simulations
              used to predict application performance
            High-level performance data spanning dimensions
              machine, applications, code revisions, data sets
              examine broad performance trends
     Discover general correlations between application performance and
      features of the external environment
     Develop methods to predict application performance from
      lower-level metrics
     Discover performance correlations between a small set of benchmarks and
      a collection of applications that represent a typical workload for a
      given system (a minimal correlation sketch follows)
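
     As a concrete illustration of the correlation goal, one could compute a
     simple Pearson correlation between a benchmark metric and application
     runtimes measured across the same set of machines. A minimal sketch (this
     is not PerfExplorer code; PerfExplorer drives this kind of analysis through
     its statistics and data mining back ends):

  #include <cmath>
  #include <vector>

  /* Pearson correlation coefficient between two equal-length samples, e.g. a
     benchmark metric and application runtime on the same machines.
     Assumes at least two points and nonzero variance in both samples. */
  double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    const size_t n = x.size();
    double mx = 0.0, my = 0.0;
    for (size_t i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
    mx /= n; my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (size_t i = 0; i < n; i++) {
      sxy += (x[i] - mx) * (y[i] - my);
      sxx += (x[i] - mx) * (x[i] - mx);
      syy += (y[i] - my) * (y[i] - my);
    }
    return sxy / std::sqrt(sxx * syy);
  }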
ParCo 2005             Parallel Performance Mapping, Diagnosis, and Data Mining   22
  Empirical-Based Performance Optimization
      Experiment management process
      [Diagram: experiment schemas and experiment trials drive the performance
       observation, experimentation, diagnosis, and tuning cycle; observability
       requirements feed back into experiment management]
ParCo 2005            Parallel Performance Mapping, Diagnosis, and Data Mining       23
  Performance Data Management Framework




     ICPP 2005 paper
ParCo 2005      Parallel Performance Mapping, Diagnosis, and Data Mining   24
  PerfExplorer (K. Huck, Ph.D. student, UO)
     Performance knowledge discovery framework
            Use the existing TAU infrastructure
              TAU instrumentation data, PerfDMF
            Client-server based system architecture
            Data mining analysis applied to parallel performance data
              comparative, clustering, correlation, dimension reduction, ...
              (a minimal clustering sketch follows below)
     Technology integration
            Relational Database Management Systems (RDBMS)
            Java API and toolkit
            R-project / Omegahat statistical analysis
            WEKA data mining package
            Web-based client
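
     For illustration, the clustering idea can be sketched as k-means over
     per-process profile vectors (each vector holding, say, exclusive time per
     event). This is a minimal generic sketch under those assumptions, not
     PerfExplorer's implementation (which uses the WEKA and R packages):

  #include <limits>
  #include <vector>

  using Vec = std::vector<double>;          // one profile vector per process

  static double dist2(const Vec& a, const Vec& b) {
    double d = 0.0;
    for (size_t i = 0; i < a.size(); i++) d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
  }

  /* Lloyd's k-means: assign each process profile to one of k clusters.
     Naive initialization uses the first k profiles (assumes profiles.size() >= k). */
  std::vector<int> kmeans(const std::vector<Vec>& profiles, int k, int iters = 50) {
    std::vector<Vec> centers(profiles.begin(), profiles.begin() + k);
    std::vector<int> label(profiles.size(), 0);
    for (int it = 0; it < iters; it++) {
      // assignment step: attach each profile to its nearest center
      for (size_t p = 0; p < profiles.size(); p++) {
        double best = std::numeric_limits<double>::max();
        for (int c = 0; c < k; c++) {
          double d = dist2(profiles[p], centers[c]);
          if (d < best) { best = d; label[p] = c; }
        }
      }
      // update step: recompute each center as the mean of its members
      std::vector<Vec> sum(k, Vec(profiles[0].size(), 0.0));
      std::vector<int> count(k, 0);
      for (size_t p = 0; p < profiles.size(); p++) {
        count[label[p]]++;
        for (size_t i = 0; i < profiles[p].size(); i++)
          sum[label[p]][i] += profiles[p][i];
      }
      for (int c = 0; c < k; c++)
        if (count[c] > 0)
          for (size_t i = 0; i < sum[c].size(); i++)
            centers[c][i] = sum[c][i] / count[c];
    }
    return label;
  }

     The sPPM and Miranda slides that follow show the result of this kind of
     analysis, where clusters of processes with similar behavior become visible.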

ParCo 2005            Parallel Performance Mapping, Diagnosis, and Data Mining   25
  PerfExplorer Architecture




     SC’05 paper
ParCo 2005      Parallel Performance Mapping, Diagnosis, and Data Mining   26
  PerfExplorer Client GUI




ParCo 2005   Parallel Performance Mapping, Diagnosis, and Data Mining   27
  Hierarchical and K-means Clustering (sPPM)




ParCo 2005   Parallel Performance Mapping, Diagnosis, and Data Mining   28
  Miranda Clustering on 16K Processors




ParCo 2005   Parallel Performance Mapping, Diagnosis, and Data Mining   29
  Parallel Performance Diagnosis
     Performance tuning process
            Process to find and report performance problems
            Performance diagnosis: detect and explain problems
            Performance optimization: performance problem repair
     Experts approach diagnosis systematically and use experience
            Hard to formulate and automate expertise
            Performance optimization is fundamentally hard
     Focus on the performance diagnosis problem
            Characterize diagnosis processes
            How it integrates with performance experimentation
            Understand the knowledge engineering
ParCo 2005           Parallel Performance Mapping, Diagnosis, and Data Mining   30
  Parallel Performance Diagnosis Architecture




ParCo 2005   Parallel Performance Mapping, Diagnosis, and Data Mining   31
  Performance Diagnosis System Architecture




ParCo 2005   Parallel Performance Mapping, Diagnosis, and Data Mining   32
  Problems in Existing Diagnosis Approaches
     Low-level abstraction of properties/metrics
            Independent of program semantics
            Relate to component structure,
              not algorithmic structure or parallelism model
     Insufficient explanation power
            Hard to interpret in the context of program semantics
            Performance behavior not tied to operational parallelism
     Low applicability and adaptability
            Difficult to apply in different contexts
            Hard to adapt to new requirements


ParCo 2005              Parallel Performance Mapping, Diagnosis, and Data Mining   33
  Poirot Project
     Lack of a formal theory of diagnosis processes
            Compare and analyze performance diagnosis systems
            Use theory to create system that is automated / adaptable
     Poirot performance diagnosis (theory, architecture)
            Survey of diagnosis methods / strategies in tools
            Heuristic classification approach (match to characteristics)
            Heuristic search approach (based on problem knowledge)
     Problems
            Descriptive results do not explain with respect to context
              users must reason about high-level causes
            Performance experimentation not guided by diagnosis
              lacks automation
ParCo 2005              Parallel Performance Mapping, Diagnosis, and Data Mining   34
  Model-Based Approach
     Knowledge-based performance diagnosis
            Capture knowledge about performance problems
            Capture knowledge about how to detect and explain them
     Where does the knowledge come from?
            Extract from parallel computational models
              Structural and operational characteristics
            Associate computational models with performance
     Do parallel computational models help in diagnosis?
            Enables better understanding of problems
            Enables more specific experimentation
            Enables more effective hypothesis testing and search

ParCo 2005            Parallel Performance Mapping, Diagnosis, and Data Mining   35
  Implications for Performance Diagnosis
     Models benefit performance diagnosis
            Base instrumentation on program semantics
            Capture performance-critical features
            Enable explanations close to user’s understanding
              of computation operation
              of performance behavior

            Reuse performance analysis expertise on the commonly-used models
     Model examples
            Master-worker model, pipeline, divide-and-conquer,
             domain decomposition, phase-based, compositional
ParCo 2005             Parallel Performance Mapping, Diagnosis, and Data Mining      36
  Hercule Project




     Goals of automation, adaptability, validation
ParCo 2005       Parallel Performance Mapping, Diagnosis, and Data Mining   37
  Approach
     Make use of model knowledge to diagnose performance
            Start with commonly-used computational models
            Engineering model knowledge
            Integrate model knowledge with performance
             measurement system
            Build a cause inference system
              define “causes” at the parallelism level
              build causality relations between the low-level “effects”
               and the “causes”




ParCo 2005              Parallel Performance Mapping, Diagnosis, and Data Mining   38
  Master-Worker Parallel Computation Model
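
     The slide's figure (not reproduced here) depicts the master-worker model:
     a master process hands out independent work units to worker processes on
     request. As a minimal generic illustration of the pattern Hercule reasons
     about (an MPI sketch written for this text, not code from Hercule or from
     the talk; the task contents and compute() are placeholders):

  #include <mpi.h>

  static double compute(int task) { return task * 0.5; }   /* placeholder work */

  int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int ntasks = 1000;
    const int TAG_WORK = 1, TAG_STOP = 2;

    if (rank == 0) {                 /* master: hand out tasks on request */
      int next = 0, stopped = 0, req;
      MPI_Status st;
      while (stopped < size - 1) {
        /* wait for any worker to request work; this request queue is where
           the master can become a bottleneck as the worker count grows */
        MPI_Recv(&req, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
        if (next < ntasks) {
          MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
          next++;
        } else {
          MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
          stopped++;
        }
      }
    } else {                         /* worker: request, receive, compute */
      int me = rank, task;
      MPI_Status st;
      for (;;) {
        MPI_Send(&me, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);    /* ask for work */
        MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == TAG_STOP) break;                   /* no more work */
        compute(task);
      }
    }
    MPI_Finalize();
    return 0;
  }

     The inference tree on the next slide names the performance problems this
     pattern is prone to: a saturated master, fine task granularity, and worker
     starvation or imbalance.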




ParCo 2005   Parallel Performance Mapping, Diagnosis, and Data Mining   39
  Performance Diagnosis Inference Tree (MW)
   [Inference tree, reconstructed from the slide diagram. The observed symptom
    is low speedup; candidate causes are numbered by priority, Ki denotes a
    threshold, and "+" marks conditions that must coexist.]
     1. Insufficient parallelism: master-assign-task time is significant, or
        initialization/finalization time is significant.
     2. Fine granularity: a large amount of message data is exchanged on every
        task assignment.
     3. Master being a bottleneck: the number of requests in the master queue
        exceeds K1 in some time intervals; when such intervals number more than
        K2 the inferred cause is worker-number saturation, and when fewer than
        K2 (together with workers waiting a long time for the master to assign
        each individual task) the inferred cause is worker starvation.
     4. Some workers noticeably inefficient: the workers waited quite a while in
        the master queue in some time intervals, together with time imbalance
        across workers.
    Legend: observations support hypotheses, which point to causes; number =
    priority, Ki = threshold, + = coexistence.
    (A threshold-check sketch follows.)
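
   As a rough illustration of how such a tree might be evaluated automatically
   (this is not Hercule's rule base, which is written in CLIPS as shown on the
   next slide; the observation fields and threshold values here are assumptions):

  #include <cstdio>

  /* Hypothetical observations extracted from profiles/traces of a
     master-worker run. */
  struct MWObservations {
    double msg_bytes_per_task;     /* data exchanged per task assignment      */
    int    intervals_long_queue;   /* #intervals with > K1 requests queued    */
    double assign_time_fraction;   /* master time spent assigning tasks       */
    double init_final_fraction;    /* master initialization/finalization time */
    double worker_wait_fraction;   /* worker time waiting on the master       */
    double worker_imbalance;       /* max/mean worker busy-time ratio         */
  };

  /* Illustrative thresholds standing in for the Ki values in the tree. */
  const double K_MSG = 1.0e6, K_ASSIGN = 0.2, K_INIT = 0.1, K_WAIT = 0.3, K_IMB = 1.5;
  const int    K2 = 10;

  void diagnoseLowSpeedup(const MWObservations& o) {
    if (o.assign_time_fraction > K_ASSIGN || o.init_final_fraction > K_INIT)
      std::printf("1. insufficient parallelism\n");
    if (o.msg_bytes_per_task > K_MSG)
      std::printf("2. fine granularity\n");
    if (o.intervals_long_queue > 0)
      std::printf("3. master bottleneck (%s)\n",
                  o.intervals_long_queue > K2 ? "worker-number saturation"
                                              : "worker starvation");
    if (o.worker_wait_fraction > K_WAIT && o.worker_imbalance > K_IMB)
      std::printf("4. some workers noticeably inefficient\n");
  }

  int main() {
    MWObservations o = {2.5e6, 12, 0.05, 0.02, 0.4, 1.8};   /* made-up numbers */
    diagnoseLowSpeedup(o);
    return 0;
  }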
ParCo 2005                   Parallel Performance Mapping, Diagnosis, and Data Mining                                         40
  Knowledge Engineering - Abstract Event (MW)
                                   Use CLIPS expert system building tool




ParCo 2005   Parallel Performance Mapping, Diagnosis, and Data Mining   41
  Diagnosis Results Output (MW)








ParCo 2005   Parallel Performance Mapping, Diagnosis, and Data Mining   42
  Experimental Diagnosis Results (MW)




ParCo 2005   Parallel Performance Mapping, Diagnosis, and Data Mining   43
  Concluding Discussion
   Performance tools must be used effectively
   More intelligent performance systems for productive use
            Evolve to application-specific performance technology
            Deal with scale by “full range” performance exploration
            Autonomic and integrated tools
            Knowledge-based and knowledge-driven process
     Performance observation methods do not necessarily
      need to change in a fundamental sense
            More automatically controlled and efficiently used
   Support model-driven performance diagnosis
   Develop next-generation tools and deliver to community

ParCo 2005            Parallel Performance Mapping, Diagnosis, and Data Mining   44
  Support Acknowledgements
    Department of Energy (DOE)
      Office of Science contracts
      University of Utah ASCI Level 1 sub-contract
      ASC/NNSA Level 3 contract
    NSF
             High-End Computing Grant




    Research Centre Juelich
       John von Neumann Institute
       Dr. Bernd Mohr
    Los Alamos National Laboratory

ParCo 2005           Parallel Performance Mapping, Diagnosis, and Data Mining                                                    45

				