					Toward an automatic parallel tool for
   solving systems of nonlinear equations

             Antonio M. Vidal
              Jesús Peinado

Departamento de Sistemas Informáticos y Computación
Universidad Politécnica de Valencia
Solving Systems of Nonlinear Equations
   Given $F : \mathbb{R}^n \to \mathbb{R}^n$, find $x^* \in \mathbb{R}^n$ such that $F(x^*) = 0$

  Newton’s iteration:            $x_+ = x_c - J(x_c)^{-1} F(x_c)$
  Newton’s Algorithm
                Choose $x^{(0)}$
                Evaluate $F(x^{(0)})$
                While $\|F(x^{(i)})\| > \text{bound}$
                   Compute Jacobian matrix $J(x^{(i)})$
                   Solve $J(x^{(i)}) s^{(i)} = -F(x^{(i)})$
                   $x^{(i+1)} = x^{(i)} + s^{(i)}$
                   Evaluate $F(x^{(i+1)})$
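
A minimal sequential sketch of the Newton loop above, in Python. It is not the authors' parallel implementation: the dense solver `solve_linear`, the helper `norm2`, the `max_iter` guard and the 2x2 test system `F_demo`/`J_demo` are illustrative assumptions; only the loop structure comes from the slide.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("singular Jacobian")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - acc) / M[i][i]
    return x


def norm2(v):
    """Euclidean norm of a vector given as a list."""
    return sum(t * t for t in v) ** 0.5


def newton(F, J, x0, bound=1e-10, max_iter=50):
    """Newton's algorithm: solve J(x) s = -F(x), then x <- x + s."""
    x = list(x0)
    fx = F(x)
    iterations = 0
    while norm2(fx) > bound and iterations < max_iter:
        s = solve_linear(J(x), [-t for t in fx])
        x = [xi + si for xi, si in zip(x, s)]
        fx = F(x)
        iterations += 1
    return x, iterations


# Hypothetical test problem: x^2 + y^2 = 4 and x*y = 1.
def F_demo(x):
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] * x[1] - 1.0]


def J_demo(x):
    return [[2.0 * x[0], 2.0 * x[1]], [x[1], x[0]]]


root, its = newton(F_demo, J_demo, [2.0, 0.5])
```

Each pass through the loop pays for one Jacobian evaluation, one dense solve and one function evaluation, which is the per-iteration cost the later slides try to reduce.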
     Methods to solve Nonlinear Systems
• Newton’s Methods: solve the linear system with
  a direct method (LU, Cholesky, ...). Several approaches:
  Newton, Shamanskii, Chord, ...

• Quasi-Newton Methods: approximate the Jacobian
  matrix (Broyden's method, BFGS, ...)
                    $B(x_c) \approx J(x_c)$
                    $B(x_+) = B(x_c) + u v^T$

• Inexact Newton Methods: solve the linear system
  with an iterative method (GMRES, Conjugate Gradient, ...)
             $\|J(x_k) s_k + F(x_k)\|_2 \le \eta_k \|F(x_k)\|_2$
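
The quasi-Newton bullet gives only the rank-one form of the update. The sketch below fills in one common choice, Broyden's "good" update, as an assumption: v = s and u = (y - B s) / (s^T s), with s = x+ - xc and y = F(x+) - F(xc). The function name and the numbers are hypothetical.

```python
def broyden_update(B, s, y):
    """Return B + u v^T with v = s and u = (y - B s) / (s^T s)."""
    n = len(s)
    Bs = [sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
    ss = sum(t * t for t in s)
    u = [(y[i] - Bs[i]) / ss for i in range(n)]
    # Rank-one correction: entry (i, j) changes by u[i] * s[j].
    return [[B[i][j] + u[i] * s[j] for j in range(n)] for i in range(n)]


# Hypothetical data: current approximation, step and change in F.
B_c = [[2.0, 0.0], [0.0, 1.0]]
s = [1.0, 2.0]
y = [3.0, 1.0]

B_plus = broyden_update(B_c, s, y)

# The updated matrix satisfies the secant equation B_plus s = y.
secant_residual = [
    sum(B_plus[i][j] * s[j] for j in range(2)) - y[i] for i in range(2)
]
```

The update costs O(n^2) operations and needs no new Jacobian evaluation, which is why the quasi-Newton family is attractive when the Jacobian is expensive.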
  Difficulties in the solution of Nonlinear
    Systems by a non-expert Scientist
• Several methods
• Slow convergence
• A lot of trials are needed to obtain the optimum
• If parallelization is tried, the possibilities increase
  dramatically: shared memory, distributed
  memory, message-passing environments,
  computational kernels, several parallel
  numerical libraries, …
• No help is provided by libraries to solve a
  nonlinear system
• Goal: a software tool that automatically
  gets the best out of a sequential or parallel
  machine when solving a nonlinear system,
  for every problem and transparently to
  the user
                   Work done
• A set of parallel algorithms has been implemented:
  Newton’s, Quasi-Newton and Inexact Newton algorithms
  for symmetric and nonsymmetric Jacobian matrices
• Implementations are independent of the problem
• They have been tested with several problems of different
• They have been developed by using the support and the
  philosophy of ScaLAPACK
• They can be seen as a part of a more general
  environment related to software for message passing
• Example of distribution for solving a linear system with J Jacobian
  Matrix and F problem function
• Programming Model: SPMD.
• Interconnection network: Logical Mesh
• Two-dimensional distribution of data: block cyclic
       Software environment
USER
  Automatic Parallel Tool
  Numerical Parallel Algorithms

Global level:
  ScaLAPACK   Scalable Linear Algebra Package
  PBLAS       Parallel BLAS

Local level:
  LAPACK      Linear Algebra Package
  BLAS        Basic Linear Algebra Subroutines
  BLACS       Basic Linear Algebra Communication Subroutines
              (MPI, PVM, ...)

Also used:
  MINPACK package
  CERFACS iterative solvers: CG, GMRES
  Other packages...
      Developing a systematic approach
                   How to choose the best method?

•    Specification of the problem data
    i. Starting point.
    ii. Function F.
    iii. Jacobian Matrix J.
    iv. Structure of the Jacobian Matrix (dense, sparse, band, …)
    v. Required precision.
    vi. Use of chaotic techniques.
    vii. Possibilities of parallelization (function, Jacobian Matrix, …).

•    Sometimes only the function is known:
       exploring with a minimal, simple algorithm (Newton + finite
    differences + sequential approach) can be worthwhile
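
When only F is available, the Jacobian can be approximated column by column. The forward-difference sketch below is one standard way to do it; the function name `fd_jacobian`, the step-size rule and the test function `F_demo` are assumptions made for illustration.

```python
def fd_jacobian(F, x, h=1e-7):
    """Approximate the Jacobian of F at x by forward differences."""
    n = len(x)
    f0 = F(x)
    m = len(f0)
    Jm = [[0.0] * n for _ in range(m)]
    for j in range(n):
        # Scale the step with the size of the component being perturbed.
        step = h * max(1.0, abs(x[j]))
        xp = list(x)
        xp[j] += step
        fj = F(xp)
        for i in range(m):
            Jm[i][j] = (fj[i] - f0[i]) / step
    return Jm


# Hypothetical test function: x^2 + y^2 - 4 and x*y - 1.
def F_demo(x):
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] * x[1] - 1.0]


J_approx = fd_jacobian(F_demo, [2.0, 0.5])
# The analytical Jacobian at (2, 0.5) is [[4, 1], [0.5, 2]].
```

Each column needs one extra evaluation of F, so the approximate Jacobian costs about n function evaluations.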
The methodology (1).
General scheme
Developing a systematic approach
     Method              flops
     Newton              $C_C = k_N (C_E + C_J + \frac{2}{3} n^3)$
     Shamanskii          $C_C = k_S (C_J + \frac{2}{3} n^3 + m (C_E + 2n^2))$
     Chord               $C_C = C_J + \frac{2}{3} n^3 + k_C (C_E + 2n^2)$
     Newton-Cholesky     $C_C = k_{NCH} (C_E + C_J + \frac{n^3}{3})$
     Broyden             $C_C = C_E + C_J + \frac{4}{3} n^3 + k_B (C_E + 29n^2)$
     BFGS                $C_C = C_E + C_J + \frac{n^3}{3} + k_{BF} (2n^2 + C_E) + m (C_J + \frac{n^3}{3}) + (k_{BF} - m)(n^2)$
     Newton-GMRES        $C_C = C_E + k_{NG} (C_E + C_J + k_G 2n^2 m + C_E)$
     Newton-CG           $C_C = C_E + k_{NCG} (C_J + k_{CG} n^2 + C_E)$

  $C_E$ = function evaluation cost; $C_J$ = Jacobian matrix evaluation cost
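
The cost table can be used to compare methods before running anything. The sketch below evaluates the Newton, Shamanskii and Chord rows, assuming the terms in each row combine additively and that the cubic term is (2/3) n^3 for an LU factorization. The iteration counts and the costs C_E = n^2 and C_J = n^3 are hypothetical inputs, not measured values.

```python
def cost_newton(n, k, CE, CJ):
    """k iterations, each with F, J and an LU factorization."""
    return k * (CE + CJ + 2.0 * n ** 3 / 3.0)


def cost_shamanskii(n, k, m, CE, CJ):
    """k outer iterations; each factorization is reused for m inner steps."""
    return k * (CJ + 2.0 * n ** 3 / 3.0 + m * (CE + 2.0 * n ** 2))


def cost_chord(n, k, CE, CJ):
    """One Jacobian and factorization, then k cheap iterations."""
    return CJ + 2.0 * n ** 3 / 3.0 + k * (CE + 2.0 * n ** 2)


# Hypothetical problem: n = 100, F costs n^2 flops, J costs n^3 flops.
n = 100
CE = float(n ** 2)
CJ = float(n ** 3)

# Hypothetical iteration counts: Chord needs more, cheaper iterations.
costs = {
    "Newton": cost_newton(n, 5, CE, CJ),
    "Shamanskii": cost_shamanskii(n, 3, 3, CE, CJ),
    "Chord": cost_chord(n, 20, CE, CJ),
}
best = min(costs, key=costs.get)
```

With these inputs Chord comes out cheapest even though it takes four times as many iterations as Newton, because it pays for the Jacobian and the factorization only once.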
    Developing a systematic approach
•   Function and Jacobian Matrix characterize the nonlinear system
•   It is important to know features of both: sparse or dense, how to compute
    (sequential or parallel), structure,…
•   It is useful to classify the problems according to their cost, especially to
    identify the best method or to avoid the worst method, and to decide what must
    be parallelized

                                              Cost of J
                                  $O(n)$  $O(n^2)$  $O(n^3)$  $O(n^4)$  $>O(n^4)$
     Cost of F      $O(n)$        P11     P12       P13       P14       P1+
                    $O(n^2)$      P21     P22       P23       P24       P2+
                    $O(n^3)$      P31     P32       P33       P34       P3+
                    $O(n^4)$      P41     P42       P43       P44       P4+
                    $>O(n^4)$     P+1     P+2       P+3       P+4       P++
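
The classification can be expressed as a small lookup. In this sketch the first index is taken to be the cost order of F and the second that of J, which is how the worked examples later in the slides use the labels; the function names are hypothetical.

```python
def cost_class(exponent):
    """Map the exponent p of an O(n^p) cost to its table index."""
    if exponent < 1:
        raise ValueError("the table starts at O(n)")
    # Anything above O(n^4) falls in the '+' row or column.
    return str(exponent) if exponent <= 4 else "+"


def classify(f_exponent, j_exponent):
    """Return the problem class P_ij for the given cost exponents."""
    return "P" + cost_class(f_exponent) + cost_class(j_exponent)


# F and J both O(n^3); the same F with an O(n^4) Jacobian; both O(n^2).
labels = [classify(3, 3), classify(3, 4), classify(2, 2)]
```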
  Developing a systematic approach
• Once the best sequential option has been
  selected the process can be finalized
• If the best parallel algorithm is required the
  following items must be analyzed:
   – Computer architecture: ($t_f$, $\tau$, $\beta$)
   – Programming environments: PVM/MPI….
   – Data distribution to obtain the best
   – Cost of the parallel algorithms
 Developing a systematic approach

Data Distribution
 It depends on the parallel environment. In the case of ScaLAPACK:
   Block-cyclic distribution: optimize the block size and the size of
   the mesh
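
To make the block-cyclic layout concrete, the sketch below maps a global matrix entry to the process that owns it and to its local position. It assumes 0-based indices and that the first block lives on process (0, 0); the function names are hypothetical and are not ScaLAPACK routines.

```python
def block_cyclic_owner(g, nb, nprocs):
    """Return (process, local_index) for the 0-based global index g."""
    block = g // nb                 # which block the index falls in
    proc = block % nprocs           # blocks are dealt out cyclically
    local_block = block // nprocs   # blocks this process received earlier
    return proc, local_block * nb + g % nb


def owner_2d(i, j, mb, nb, prow, pcol):
    """Owner coordinates and local indices of entry (i, j) on a prow x pcol mesh."""
    pr, li = block_cyclic_owner(i, mb, prow)
    pc, lj = block_cyclic_owner(j, nb, pcol)
    return (pr, pc), (li, lj)


# Hypothetical case: 8x8 matrix, 2x2 blocks, 2x2 process mesh.
location = owner_2d(5, 2, 2, 2, 2, 2)

# Count how many entries each process owns to check the load balance.
counts = {}
for i in range(8):
    for j in range(8):
        proc, _ = owner_2d(i, j, 2, 2, 2, 2)
        counts[proc] = counts.get(proc, 0) + 1
```

Small blocks improve load balance, while large blocks favour the local BLAS kernels, which is why both the block size and the mesh shape are worth tuning.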
Parallelization chances
   Function evaluation and/or Computing the Jacobian matrix.
                  Parallelize the most expensive operation!

Cost of the parallel algorithms
   Use the table of parallel costs with the parameters of the parallel
      machine: ($t_f$, $\tau$, $\beta$)
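
The slides do not define the three machine parameters. The sketch below assumes the usual reading: time per floating-point operation, start-up time (latency) per message, and transfer time per word. Under that assumption a parallel cost is a weighted sum of flops, messages and words; every number here is hypothetical.

```python
def parallel_time(flops, messages, words, t_f, tau, beta):
    """Estimated time: arithmetic + message start-ups + data transfer."""
    return flops * t_f + messages * tau + words * beta


# Hypothetical machine parameters, in seconds.
t_f, tau, beta = 1e-9, 1e-5, 1e-8

# Hypothetical workload: an LU factorization with n = 1000.
n = 1000
seq_flops = 2.0 * n ** 3 / 3.0
t_seq = parallel_time(seq_flops, 0, 0, t_f, tau, beta)

# Same work split over 4 processes, with assumed communication volumes.
t_par = parallel_time(seq_flops / 4, 3000, 1.5e6, t_f, tau, beta)
speedup = t_seq / t_par
```

Plugging measured parameters and the communication counts of each algorithm into such a model lets the candidates be ranked without running them.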
Developing a systematic approach
     Final decision for choosing the method
               Cost < $O(n^3)$ => 0; Cost >= $O(n^3)$ => 1

CE     CJ                         Advisable

0      0    Choose according to the speed of convergence. If it is
            slow, choose Newton or Newton-GMRES
0      1    Avoid computing the Jacobian matrix. Choose Broyden
            or use finite differences
1      0    Newton or Newton-GMRES is adequate. Avoid evaluating
            the function
1      1    Try to do a small number of iterations. Use Broyden to
            avoid computing the Jacobian matrix
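
The decision table above can be encoded directly. This is a sketch of how a tool might look it up, not the authors' implementation; the names `METHOD_ADVICE`, `cost_flag` and `advise_method` are hypothetical, and the advice strings paraphrase the table.

```python
# Keys are (CE flag, CJ flag): 0 means cost below O(n^3), 1 means O(n^3) or more.
METHOD_ADVICE = {
    (0, 0): "Choose by speed of convergence; if slow, Newton or Newton-GMRES",
    (0, 1): "Avoid computing the Jacobian; Broyden or finite differences",
    (1, 0): "Newton or Newton-GMRES; avoid evaluating the function",
    (1, 1): "Few iterations; Broyden to avoid computing the Jacobian",
}


def cost_flag(exponent):
    """Return 1 when the cost is O(n^3) or higher, otherwise 0."""
    return 1 if exponent >= 3 else 0


def advise_method(f_exponent, j_exponent):
    """Look up the advice for given cost exponents of F and J."""
    return METHOD_ADVICE[(cost_flag(f_exponent), cost_flag(j_exponent))]


# Cheap function (O(n^2)) with an expensive Jacobian (O(n^3)).
advice = advise_method(2, 3)
```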
 Developing a systematic approach
           Final decision for parallelization
     No chance of parallelization => 0; Chance of parallelization => 1

Fun  Jac.                                Advisable
 0    0    Try to do few iterations. Use Broyden or Chord to avoid
           computing the Jacobian matrix
 0    1    Newton or Newton-GMRES is adequate. Do few iterations
           and avoid evaluating the function
 1    0    Compute the Jacobian matrix only a few times. Use Broyden or
           Chord if possible.
 1    1    Choose according to the speed of convergence. Newton or
           Newton-GMRES is adequate
       Developing a systematic approach
 Finish or feedback:
 IF selected method is convenient
     THEN finish
     ELSE feedback

Sometimes bad results are obtained due to:
  – No convergence.
  – High computational cost
  – Unsatisfactory parallelization.
   The methodology (12).
Scheme of the guided process
                How does it work?
•Inverse Toeplitz Symmetric Eigenvalue Problem
•Well known problem: Starting point, function, analytical Jacobian
matrix or finite difference approach, …
•Kind of problem
   $F \in O(n^3)$, $J \in O(n^3)$  =>  P33 (analytical Jacobian)
   $F \in O(n^3)$, $J \in O(n^4)$  =>  P34 (finite-difference Jacobian)

• The cost of the Jacobian matrix is high: avoid computing it. Use
  Chord or Broyden.
• High chance of parallelization, even if finite difference
  is used.
• If speed of convergence is slow use Broyden but
  insert some Newton iterations.
                 How does it work?
•Leakage minimization in a network of water
•Well known problem: Starting point, function, analytical Jacobian
matrix or finite difference approach, …
•Jacobian matrix: symmetric, positive definite
•Kind of problem

                      $F \in O(n^2)$, $J \in O(n^2)$  =>  P22

• Avoid methods with a high cost per iteration, like Newton-
• Computation of F and J can be parallelized.
• Use Newton-CG (to speed-up convergence) or BFGS
• Part of this work has been done in the Ph.D. Thesis of
  J.Peinado: “Resolución Paralela de Sistemas de
  Ecuaciones no Lineales”. Univ.Politécnica de Valencia.
  Sept. 2003
• All specifications and parallel algorithms have been
• The implementation stage of the automatic parallel tool starts
  in January 2004 within the framework of a CICYT Project:
  “Desarrollo y optimización de código paralelo para
  sistemas de Audio 3D”. TIC2003-08230-C02-02
