Hypergraph-based Dynamic Load Balancing
for Adaptive Scientific Computations

Ümit V. Çatalyürek (1), Erik G. Boman (2), Karen D. Devine (2),
Doruk Bozdağ (1), Robert Heaphy (2), Lee Ann Riesen (2)

(1) The Ohio State University
(2) Sandia National Laboratories

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National
Nuclear Security Administration under contract DE-AC04-94AL85000.
Partitioning and Load Balancing
 Goal: assign data to processors to
    minimize application runtime
    maximize utilization of computing resources
 Metrics:
    minimize processor idle time (balance workloads)
    keep inter-processor communication costs low
 Impacts performance of a wide range of simulations
    Linear solvers & preconditioners (Ax = b)
    Contact detection
    Particle simulations
    Adaptive mesh refinement
Dynamic Load Balancing/Repartitioning
    Applications with workload or locality that changes during simulation require
     dynamic load balancing (a.k.a. repartitioning)
       Adaptive mesh refinement
       Particle methods
       Contact detection
    Repartitioning has additional cost:
       Moving data from old to new decomposition

 $T_{\mathrm{execution}} = \#\mathrm{iter} \times (T_{\mathrm{computation}} + T_{\mathrm{communication}}) + T_{\mathrm{repart}} + T_{\mathrm{migration}}$
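
The cost model above can be evaluated directly to see when repartitioning pays off. The sketch below is only an illustration: the function name and all timing values are hypothetical, not measurements from this work.

```python
def total_execution_time(n_iter, t_comp, t_comm, t_repart, t_migration):
    """Execution time for n_iter iterations plus one repartitioning step.

    Follows: #iter x (computation + communication) + repart + migration.
    """
    return n_iter * (t_comp + t_comm) + t_repart + t_migration


# Hypothetical timings in seconds (illustrative only, not measured).
# Option A: keep the old, unbalanced decomposition (no repartitioning cost).
keep_old = total_execution_time(n_iter=100, t_comp=1.0, t_comm=0.5,
                                t_repart=0.0, t_migration=0.0)

# Option B: repartition first, pay for it, then run faster iterations.
rebalance = total_execution_time(n_iter=100, t_comp=0.75, t_comm=0.25,
                                 t_repart=5.0, t_migration=10.0)

print(keep_old, rebalance)
```

With few iterations between repartitions, the one-time repartitioning and migration terms dominate; with many iterations, the per-iteration terms do.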
Roadmap

 Motivation
 Background
    Common Repartitioning Techniques
    Graph and Hypergraph Approaches
 New Hypergraph Model for Dynamic Load Balancing
 Parallel Multilevel Hypergraph Partitioning with Fixed
  Vertices
 Experimental Results
 Conclusion
Common Repartitioning Approaches
  Scratch Repartitioning
  Incremental geometric methods
     Recursive Coordinate Bisection (Berger & Bokhari, 1987)
     Space-Filling Curves (Warren & Salmon, 1993; Pilkington & Baden, 1994;
      Patra & Oden, 1995)
     Require geometric coordinates
     Do not explicitly control communication costs
  Graph-based methods
     Diffusion (Cybenko, 1989; Ou & Ranka, 1992; Hu & Blake, 1995)
     Multi-level adaptive methods (Walshaw, Cross & Everett, 1997;
      Schloegel, Karypis & Kumar, 1997; Schloegel, Karypis & Kumar, 2000;
      Aykanat et al. 2007)
  Our approach: Hypergraph-based repartitioning
Graph and Hypergraph Partitioning
              Graphs                                  Hypergraphs
Community     Load balancing                          VLSI,
              (highly successful for PDE problems)    recently computational science
Model         Vertices = computation/data             Vertices = computation/data
              Edge = relationship between             Edge = dependency on data elements
              computation/data (bi-directional)       (multi-way)
Goal          Evenly distribute vertex weight while   Evenly distribute vertex weight while
              minimizing weight of cut edges          minimizing cut size
Algorithms    Kernighan, Lin, Simon, Hendrickson,     Kernighan, Schweikert, Fiduccia,
              Leland, Kumar, Karypis, et al.          Mattheyses, Sanchis, Alpert, Kahng,
                                                      Hauck, Borriello, Çatalyürek, Aykanat,
                                                      Karypis, et al.
Serial        Chaco (SNL), Jostle (U. Greenwich),     hMETIS (Karypis), PaToH (Çatalyürek),
Partitioner   METIS (U. Minn.), Party (U. Paderborn), Mondriaan (Bisseling)
              Scotch (U. Bordeaux)
Parallel      ParMETIS (U. Minn.),                    Zoltan PHG (Sandia), Parkway
Partitioner   PJostle (U. Greenwich)                  (Trifunovic)
Impact of Hypergraph Models
(Where Graph is not Sufficient)
  Greater expressiveness ⇒ greater applicability
     Structurally non-symmetric systems
         circuits, biology
     Rectangular systems
         linear programming, least-squares methods
     Non-homogeneous, highly connected topologies
         circuits, nanotechnology, databases
     Multiple models for different granularity partitioning
         Owner compute, fine-grain, checkerboard/cartesian, Mondriaan
  Accurate communication model ⇒ lower application communication costs
[Figure: Mondriaan partitioning. Courtesy of Rob Bisseling.]
[Figure: hypergraph with vertices Vh, Vi, Vj, Vk, Vl, Vm and nets nh, ni, nk, nl, nm distributed among parts (labels P1, P3, P4).]
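
A small example of why the hypergraph model counts communication more accurately. This is a sketch with a made-up four-vertex instance (all names and values are hypothetical): vertex 0 lives on part 0, and its data is needed by vertices 1, 2 and 3, which all live on part 1. The data only has to be sent to part 1 once, but the graph edge cut counts it three times.

```python
# Hypothetical assignment: vertex id -> part id.
part = {0: 0, 1: 1, 2: 1, 3: 1}

# Graph model: one pairwise edge per dependency on vertex 0.
graph_edges = [(0, 1), (0, 2), (0, 3)]

# Hypergraph model: one net holding vertex 0 and everything that needs it.
net = [0, 1, 2, 3]

# Graph metric: number of edges whose endpoints are on different parts.
edge_cut = sum(1 for u, v in graph_edges if part[u] != part[v])

# Hypergraph metric: (number of parts the net touches) - 1.
connectivity_minus_one = len({part[v] for v in net}) - 1

print("graph edge cut:", edge_cut)
print("hypergraph cut:", connectivity_minus_one)
```

The hypergraph value matches the single message that has to carry vertex 0's data to part 1; the graph value overestimates it.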
Hypergraph Repartitioning Model
  Augment application hypergraph to represent both application and
   current processor assignments
  Minimize combined cost of application communication and data migration
Hypergraph Model
                    H = (V, E) is a hypergraph
                    P = {P1, P2, ..., Pk} is a k-way partition
                    $\lambda_i$ : number of parts that edge $e_i$ connects
                    $c_i$ : cost (weight) of edge $e_i$

                    $\mathrm{cut}(P) = \sum_{e_i \in E} (\lambda_i - 1)\, c_i$

                    cut(P) = total communication volume
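
The cut metric, cut(P) = sum over edges e_i of (lambda_i - 1) * c_i, is short enough to sketch in code. The function name and the tiny test instance below are hypothetical and only illustrate the formula; they are not part of any partitioner's API.

```python
def connectivity_cut(hyperedges, costs, part):
    """Sum of c_i * (lambda_i - 1) over all hyperedges.

    hyperedges: list of vertex-id lists (the pins of each edge e_i)
    costs:      list of edge costs c_i, same order as hyperedges
    part:       dict mapping vertex id -> part id
    """
    total = 0
    for pins, cost in zip(hyperedges, costs):
        lam = len({part[v] for v in pins})  # number of parts e_i connects
        total += cost * (lam - 1)
    return total


# Hypothetical 5-vertex, 3-edge instance split into 3 parts.
hyperedges = [[0, 1, 2], [2, 3], [0, 3, 4]]
costs = [1, 2, 1]
part = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}

cut = connectivity_cut(hyperedges, costs, part)
print(cut)
```

Here the first edge touches two parts, the second stays inside one part, and the third touches three parts, so the contributions are 1, 0 and 2.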
Hypergraph Repartitioning
 Start with application hypergraph
 Add
   one partition vertex for each partition
   migration edges connecting application
    vertices to their partition vertices
 Weight the hyperedges:
   Migration edge weight =
    size of application objects (migration
    size)
   Application edge weight =
    size of communication elements
 Scale application edge weights by α ≈
  number of application communications
  between repartitions (#iter)
 Perform hypergraph partitioning with
  partition vertices “fixed”
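
The construction steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not Zoltan's implementation: the function names, the data layout, and the choice to number partition vertices after the application vertices are all hypothetical. The scaling factor is called `alpha` here.

```python
def build_repartitioning_hypergraph(app_edges, comm_sizes, obj_sizes,
                                    old_part, k, alpha):
    """Augment an application hypergraph with partition vertices.

    Application vertices are 0..n-1; partition vertex for part p is n + p
    (an assumed numbering). Returns (edges, weights, fixed), where fixed
    maps each partition vertex to the part it must stay in.
    """
    n = len(obj_sizes)
    edges, weights = [], []
    # Application edges: communication size scaled by alpha.
    for pins, size in zip(app_edges, comm_sizes):
        edges.append(list(pins))
        weights.append(alpha * size)
    # Migration edges: connect each vertex to its current partition vertex.
    for v in range(n):
        edges.append([v, n + old_part[v]])
        weights.append(obj_sizes[v])
    fixed = {n + p: p for p in range(k)}
    return edges, weights, fixed


def cut_cost(edges, weights, part):
    """Connectivity-1 cut: sum of weight * (parts touched - 1)."""
    return sum(w * (len({part[v] for v in pins}) - 1)
               for pins, w in zip(edges, weights))


# Hypothetical instance: 3 objects, 2 parts, alpha = 10.
edges, weights, fixed = build_repartitioning_hypergraph(
    app_edges=[[0, 1], [1, 2]], comm_sizes=[1, 1],
    obj_sizes=[4, 5, 6], old_part=[0, 0, 1], k=2, alpha=10)

# Candidate new assignment: object 1 moves from part 0 to part 1.
new_part = {0: 0, 1: 1, 2: 1}
new_part.update(fixed)  # partition vertices stay where they are fixed
cost = cut_cost(edges, weights, new_part)
print(cost)
```

In this instance the cost is 15: one cut application edge (10 after scaling) plus the migration edge of the moved object (size 5). A single cut value therefore captures both communication and migration.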
Implementation of Hypergraph Repartitioning
   Implemented in Zoltan toolkit
   Zoltan constructs augmented hypergraph
   Zoltan uses its existing parallel hypergraph partitioner…
     Parallel multilevel algorithm with recursive bisection
      (Bui & Jones, 1993; Hendrickson & Leland, 1993; Karypis and Kumar, 1995)




   … with added capability for handling “fixed vertices.”
Handling Fixed Vertices
  In coarsening:
     Vertices that are fixed to different partitions cannot merge with
       each other
     Coarse vertices are fixed if they contain fixed fine vertices
  In coarse partitioning:
     Fixed vertices are assigned to their respective partitions
  In refinement:
     Fixed vertices cannot move out of their partition
  In recursive bisection:
     Each fixed vertex is assigned to the side of the bisection whose
       set of partitions contains its fixed partition
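
The rules above can be expressed as a few small predicates. This is a sketch under stated assumptions, not Zoltan's code: the `FREE` marker, the function names, and the midpoint split of the partition range in recursive bisection are all hypothetical choices made for illustration.

```python
FREE = -1  # assumed marker for a vertex that is not fixed to any partition


def can_merge(fixed_u, fixed_v):
    """Coarsening: vertices fixed to different partitions must not merge."""
    return fixed_u == FREE or fixed_v == FREE or fixed_u == fixed_v


def coarse_fixed(fixed_u, fixed_v):
    """A coarse vertex is fixed if either constituent is fixed.

    Only call this for pairs that passed can_merge.
    """
    return fixed_u if fixed_u != FREE else fixed_v


def bisection_side(fixed_part, lo, hi):
    """Recursive bisection over partitions lo..hi (inclusive).

    Returns 0 or 1 for the half that contains fixed_part, or FREE for an
    unfixed vertex. Splitting at the midpoint is an assumption.
    """
    if fixed_part == FREE:
        return FREE
    mid = (lo + hi) // 2
    return 0 if fixed_part <= mid else 1


# A vertex fixed to partition 2 of 0..3 goes to the upper half first,
# then to the lower half of the range 2..3.
first = bisection_side(2, 0, 3)
second = bisection_side(2, 2, 3)
print(first, second)
```

Refinement needs no extra logic beyond skipping any vertex whose fixed value is not `FREE` when selecting moves.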
Experimental Results
    Experiments on
        BMI-RI cluster
            64 compute nodes connected with Infiniband
            Dual 2.4 GHz AMD Opteron processors with 8 GB RAM
    Zoltan v2.1 hypergraph partitioner & ParMETIS v3.1
     graph partitioner
    Test problems:
        2DLipid: 4K x 4K; 5.6M nonzeros
        Xyce ASIC Stripped: 680K x 680K; 2.3M nonzeros
        Cage15 Electrophoresis: 5.1M x 5.1M; 99M nonzeros
[Figures: Xyce ASIC Stripped; Cage Electrophoresis]
[Figure: Dynamic Weights: Communication Volume. Panels: 2DLipidFMat, Cage14, Xyce_680K. Axis: Total Communication Volume, split into application communication volume and migration communication volume. Bars for alpha=80 and alpha=800: Hypergraph Repart, Graph Repart, Static Hypergraph, Static Graph.]
[Figure: Dynamic Weights: Partitioning Time. Panels: 2DLipidFMat, Cage14, Xyce_680K. Axis: Partitioning Time (secs). Bars for alpha=80 and alpha=800: Hypergraph Repart, Graph Repart, Static Hypergraph, Static Graph.]
[Figure: Total Communication Volume. Panels: 2DLipidFMat, Cage14. Bars for alpha=80 and alpha=800: Hypergraph Repart, Graph Repart, Static Hypergraph, Static Graph.]
                                                                                                                                                                                                                                       8 .E +0 7
                                                                                                                                                                                                                                                           1 .E +0 8
                                                                                                                                                                                                                                                                                1 .E +0 8




                  Graph
                  Repart                                                                                                                Hypergraph
                                                                                                                                            Repart
                 Static




alpha=800
            Hypergraph                                                                                                                        Graph
                                                                                                                                              Repart

            Static Graph                                                                                                                     Static
                                                                                                                            alpha=80
                                                                                                                                                                                                                                                               Xyce_680K




                                                                                                                                        Hypergraph


                                                                                                                                        Static Graph




                                                                                                                                        Hypergraph
                                                                                                                                            Repart

                                                                                                                                              Graph
              Volume
                                         Volume




                                                                                                                                              Repart
                                         Migration
                                                                                                                                                                                                                                                                                            Dynamic Graph: Communication Volume




              Application




                                                                                                                                             Static
                                                                                                                            alpha=800




                                                                                                                                        Hypergraph


                                                                                                                                        Static Graph
              Commmunication
                                         Commmunication
                                                                                                  Partitioning Time (secs)




                                                                                              0
                                                                                                   1
                                                                                                          2
                                                                                                              3
                                                                                                                   4
                                                                                                                            5
                                                                                                                                  6
                                                                                                                                                7
                                                                               Hypergraph
                                                                                   Repart


                                                                                     Graph
                                                                                    Repart


                                                                                    Static




                                                                   alpha=80
                                                                                                                                      2DLipidFMat
                                                                               Hypergraph


                                                                               Static Graph
                               Partitioning Time (secs)




                           0
                                 5
                                     10
                                          15
                                               20
                                                    25
                                                              30
            Hypergraph
                Repart
                                                                               Hypergraph
                                                                                   Repart
                  Graph




                                                     Cage14
                 Repart
                                                                                     Graph
                                                                                    Repart
                 Static




alpha=80
            Hypergraph
                                                                   alpha=800        Static
                                                                               Hypergraph
            Static Graph

                                                                               Static Graph


                                                                                                  Partitioning Time (secs)
                                                                                              0
                                                                                                  2
                                                                                                      4
                                                                                                          6
                                                                                                              8
                                                                                                                  10
                                                                                                                       12
                                                                                                                            14
                                                                                                                                 16
                                                                                                                                      18
                                                                                                                                                20




            Hypergraph
                Repart
                                                                               Hypergraph
                                                                                   Repart
                  Graph
                 Repart
                                                                                     Graph
                                                                                    Repart
                 Static




alpha=800
            Hypergraph
                                                                                                                                      Xyce_680K




                                                                                    Static
                                                                   alpha=80




                                                                               Hypergraph
            Static Graph

                                                                               Static Graph
                                                                                                                                                     Dynamic Graph: Partitioning Time




                                                                               Hypergraph
                                                                                   Repart


                                                                                     Graph
                                                                                    Repart


                                                                                    Static
                                                                   alpha=800




                                                                               Hypergraph


                                                                               Static Graph
Conclusion
  A novel hypergraph model for dynamic load balancing
     Single hypergraph that incorporates both communication volume
      in the application and data migration cost
     Performs better than, or comparably to, graph-based dynamic load
      balancing
  A parallel dynamic load balancing tool
     A must for real, large-scale parallel problems
     Scales similarly to graph-based tools


  Future Work:
     There is always room for improvement: speed and/or quality
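
The "single hypergraph" idea in the conclusion can be illustrated with a small sketch. It is not the authors' implementation. It assumes that alpha scales the cost of the application's communication nets, that each vertex is tied to a fixed vertex representing its old part by a migration net whose cost is the vertex's data size, and that the cut is measured with the connectivity-1 metric. All function and variable names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_repartition_hypergraph(
    nets: Sequence[Sequence[int]],
    net_costs: Sequence[int],
    vertex_sizes: Sequence[int],
    old_part: Sequence[int],
    k: int,
    alpha: int,
) -> Tuple[List[List[int]], List[int], Dict[int, int]]:
    """Augment a hypergraph with k fixed part vertices and migration nets."""
    n = len(vertex_sizes)
    # Communication nets: same pins, cost scaled by alpha (assumed scaling).
    aug_nets = [list(net) for net in nets]
    aug_costs = [alpha * c for c in net_costs]
    # Migration nets: vertex v plus the fixed vertex of its old part.
    for v in range(n):
        aug_nets.append([v, n + old_part[v]])
        aug_costs.append(vertex_sizes[v])
    # Fixed vertices n..n+k-1 must stay in parts 0..k-1.
    fixed = {n + p: p for p in range(k)}
    return aug_nets, aug_costs, fixed


def connectivity_cut(
    nets: Sequence[Sequence[int]], costs: Sequence[int], part: Sequence[int]
) -> int:
    """Sum of cost * (number of parts a net spans - 1) over all nets."""
    return sum(
        cost * (len({part[v] for v in net}) - 1)
        for net, cost in zip(nets, costs)
    )


# Toy example: a chain of 4 vertices, 2 parts, vertex 2 moves from part 0 to 1.
nets = [[0, 1], [1, 2], [2, 3]]
net_costs = [1, 1, 1]
vertex_sizes = [5, 5, 5, 5]
old_part = [0, 0, 0, 1]
new_part = [0, 0, 1, 1]
k, alpha = 2, 10

aug_nets, aug_costs, fixed = build_repartition_hypergraph(
    nets, net_costs, vertex_sizes, old_part, k, alpha
)
# Extend the new partition with the fixed vertices' parts.
full_part = new_part + [fixed[len(new_part) + p] for p in range(k)]
total = connectivity_cut(aug_nets, aug_costs, full_part)
print(total)  # alpha * 1 cut net + 5 units migrated = 15
```

Under these assumptions a single cut value accounts for both terms: a cut communication net contributes alpha times its cost, and a cut migration net contributes the size of the data that has to move. A larger alpha therefore makes the partitioner favor low application communication over low migration.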

								