# Sparse Matrix Methods

• Day 1: Overview

• Day 2: Direct methods

• Day 3: Iterative methods
  • Parallel conjugate gradient and graph partitioning
  • Preconditioning methods and graph coloring
  • Domain decomposition and multigrid
  • Krylov subspace methods for other problems
  • Complexity of iterative and direct methods
## SuperLU-dist: iterative refinement to improve the solution

Iterate:
• r = b – A·x
• backerr = max_i ( |r_i| / (|A|·|x| + |b|)_i )
• if backerr < ε or backerr > lasterr/2 then stop iterating
• solve L·U·dx = r
• x = x + dx
• lasterr = backerr
• repeat

Usually 0–3 steps are enough
(Matlab demo: iterative refinement)
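As a sketch, the refinement loop above in Python on a small dense system; `refine`, `matvec`, and the test stopping values are illustrative names, and `lu_solve` is a hypothetical stand-in for the forward/back substitution with the computed (inexact) L and U factors:

```python
# Iterative refinement, following the loop above. matvec/refine are
# illustrative; lu_solve stands in for the sparse triangular solves
# with the (inexact) LU factors.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def refine(A, b, lu_solve, eps=1e-12, max_steps=10):
    x = lu_solve(b)                               # initial solve with the factors
    lasterr = float("inf")
    for _ in range(max_steps):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]   # r = b - A*x
        scale = [sum(abs(a) * abs(xi) for a, xi in zip(row, x)) + abs(bi)
                 for row, bi in zip(A, b)]                 # (|A|*|x| + |b|)_i
        backerr = max(abs(ri) / si for ri, si in zip(r, scale))
        if backerr < eps or backerr > lasterr / 2:         # stop test from the slide
            return x
        dx = lu_solve(r)                                   # solve L*U*dx = r
        x = [xi + di for xi, di in zip(x, dx)]             # x = x + dx
        lasterr = backerr
    return x
```

With factors of a slightly perturbed matrix standing in for an inexact sparse factorization, a few steps recover the solution of the true system to near machine precision.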
## Convergence analysis of iterative refinement

Let C = I – A(LU)⁻¹    [ so A = (I – C)·(LU) ]

x1 = (LU)⁻¹b
r1 = b – Ax1 = (I – A(LU)⁻¹)b = Cb
dx1 = (LU)⁻¹r1 = (LU)⁻¹Cb
x2 = x1 + dx1 = (LU)⁻¹(I + C)b
r2 = b – Ax2 = (I – (I – C)·(I + C))b = C²b
...
In general, rk = b – Axk = Cᵏb

Thus rk → 0 if |largest eigenvalue of C| < 1.
## The Landscape of Sparse Ax=b Solvers

|                             | Direct: A = LU | Iterative: y′ = Ay |
|-----------------------------|----------------|--------------------|
| Nonsymmetric                | Pivoting LU    | GMRES, QMR, …      |
| Symmetric positive definite | Cholesky       | Conjugate gradient |

Direct methods are more robust; iterative methods need less storage; nonsymmetric solvers are more general.
## Conjugate gradient iteration

x0 = 0,  r0 = b,  p0 = r0
for k = 1, 2, 3, . . .
    αk = (rk-1ᵀ rk-1) / (pk-1ᵀ A pk-1)   step length
    xk = xk-1 + αk·pk-1                  approx solution
    rk = rk-1 – αk·A·pk-1                residual
    βk = (rkᵀ rk) / (rk-1ᵀ rk-1)         improvement
    pk = rk + βk·pk-1                    search direction

• One matrix-vector multiplication per iteration
• Two vector dot products per iteration
• Four n-vectors of working storage
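A direct transcription of the iteration above, as a sketch on a small dense SPD system (pure-Python lists stand in for sparse storage; `cg`, `matvec`, and `dot` are illustrative names):

```python
# CG iteration transcribed from the pseudocode above: one matrix-vector
# product and two dot products per iteration, a handful of n-vectors of state.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cg(A, b, tol=1e-10, max_iter=100):
    n = len(b)
    x = [0.0] * n                                  # x0 = 0
    r = b[:]                                       # r0 = b
    p = r[:]                                       # p0 = r0
    rr = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)                    # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]      # approx solution
        r = [ri - alpha * api for ri, api in zip(r, Ap)]   # residual
        rr_new = dot(r, r)
        if rr_new ** 0.5 < tol:
            break
        beta = rr_new / rr                         # improvement
        p = [ri + beta * pi for ri, pi in zip(r, p)]       # search direction
        rr = rr_new
    return x
```

On a 2×2 SPD system the iteration terminates after two steps, matching the exact-arithmetic bound below.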
• Eigenvalues: Au = λu,  eigenvalues { λ1, λ2, . . ., λn }

• Cayley-Hamilton theorem:
    (A – λ1I)·(A – λ2I) · · · (A – λnI) = 0
  Therefore Σ_{i=0..n} ci·Aⁱ = 0 for some coefficients ci,
  so A⁻¹ = Σ_{i=1..n} (–ci/c0)·Aⁱ⁻¹

• Krylov subspace: therefore if Ax = b, then x = A⁻¹b and
  x ∈ span(b, Ab, A²b, . . ., Aⁿ⁻¹b) = Kn(A, b)

• Krylov subspace: Ki(A, b) = span(b, Ab, A²b, . . ., Aⁱ⁻¹b)

  for i = 1, 2, 3, . . .
      find xi ∈ Ki(A, b)
      such that ri = (Axi – b) ⊥ Ki(A, b)

• Notice ri ∈ Ki+1(A, b), so ri ⊥ rj for all j < i

• Similarly, the “directions” are A-orthogonal:
    (xi – xi-1)ᵀ·A·(xj – xj-1) = 0

• The magic: short recurrences. . .
  A is symmetric => can get next residual and direction
  from the previous one, without saving them all.

## Conjugate gradient: convergence

• In exact arithmetic, CG converges in n steps (completely unrealistic!!)

• Accuracy after k steps of CG is related to:
  • consider polynomials of degree k that are equal to 1 at 0.
  • how small can such a polynomial be at all the eigenvalues of A?

• Thus, eigenvalues close together are good.

• Condition number: κ(A) = ‖A‖₂ ‖A⁻¹‖₂ = λmax(A) / λmin(A)

• Residual is reduced by a constant factor by O(κ(A)^{1/2}) iterations of CG.
(Matlab demo: CG on grid5(15) and bcsstk08; n steps of CG on bcsstk08)

• Lay out matrix and vectors by rows

• Hard part is the matrix-vector product y = A*x
  [figure: A partitioned by rows among processors P0–P3, with the matching pieces of x and y]

• Algorithm:
  Each processor j:
      Compute y(j) = A(j,:)*x

• May send more of x than needed

• Partition / reorder matrix to reduce communication
(Matlab demo: 2-way partition of eppstein mesh; 8-way dice of eppstein mesh)
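A serial simulation of the row layout above: each "processor" owns a contiguous block of rows of A and the matching block of y, and reads the entries of x its rows touch. The processor count and block split are illustrative choices:

```python
# Simulated row-partitioned matvec y = A*x: processor j computes
# y(j) = A(j,:)*x for its own block of rows. (Serial stand-in for the
# parallel algorithm; nprocs and the contiguous split are illustrative.)

def parallel_matvec(A, x, nprocs=2):
    n = len(A)
    y = [0.0] * n
    rows_per = (n + nprocs - 1) // nprocs          # contiguous block of rows each
    for j in range(nprocs):                        # "each processor j"
        for i in range(j * rows_per, min((j + 1) * rows_per, n)):
            # only the x entries matching nonzero columns of row i
            # are truly needed -- that is the communication to minimize
            y[i] = sum(A[i][k] * x[k] for k in range(n))
    return y
```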
## Preconditioners

• Suppose you had a matrix B such that:
  1. condition number κ(B⁻¹A) is small
  2. By = z is easy to solve

• Then you could solve (B⁻¹A)x = B⁻¹b instead of Ax = b

• B = A is great for (1), not for (2)
• B = I is great for (2), not for (1)
• Domain-specific approximations sometimes work
• B = diagonal of A sometimes works

• Or, bring back the direct methods technology. . .
(Matlab demo: bcsstk08 with diagonal preconditioner)
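A small check of why B = diag(A) can work: for a badly scaled SPD matrix, diagonal scaling shrinks the eigenvalue ratio λmax/λmin. The 2×2 matrix and the `eig2` helper are made up for illustration:

```python
# Why B = diag(A) sometimes works: diagonal scaling can shrink the
# condition number. The matrix and eig2 helper are illustrative.

def eig2(M):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    t = M[0][0] + M[1][1]                        # trace
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]    # determinant
    disc = (t * t / 4.0 - d) ** 0.5
    return t / 2.0 - disc, t / 2.0 + disc

A = [[100.0, 1.0],
     [1.0, 1.0]]                                 # SPD but badly scaled

lo, hi = eig2(A)
kappa_A = hi / lo                                # roughly 101

# eigenvalues of B^-1 A (B = diag(A)); same as those of B^-1/2 A B^-1/2
BinvA = [[A[i][j] / A[i][i] for j in range(2)] for i in range(2)]
lo_p, hi_p = eig2(BinvA)
kappa_precond = hi_p / lo_p                      # roughly 1.2
```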
## Incomplete Cholesky factorization (IC, ILU)

[figure: A ≈ Rᵀ × R]

• Compute factors of A by Gaussian elimination, but ignore fill

• Preconditioner B = RᵀR ≈ A, not formed explicitly

• Compute B⁻¹z by triangular solves (in time nnz(A))

• Total storage is O(nnz(A)), static data structure

• Either symmetric (IC) or nonsymmetric (ILU)
(Matlab demo: bcsstk08 with IC preconditioner)
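A sketch of the no-fill variant IC(0): run the ordinary Cholesky recurrence, but keep an entry of the factor only where A itself is nonzero. Dense lists stand in for the static sparse data structure; on a matrix that produces no fill anyway (e.g. tridiagonal), this reproduces the exact factor:

```python
# No-fill incomplete Cholesky IC(0): ordinary Cholesky recurrence, but fill
# outside the sparsity pattern of A is simply dropped. (Dense lists stand in
# for the static O(nnz(A)) data structure.)

def ichol0(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = s ** 0.5
        for i in range(j + 1, n):
            if A[i][j] == 0.0:
                continue                  # ignore fill: this position stays zero
            L[i][j] = (A[i][j]
                       - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L
```

The preconditioner B = L·Lᵀ is then applied through two triangular solves, never formed explicitly.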
## Incomplete Cholesky and ILU: Variants

• Allow one or more “levels of fill”
  • unpredictable storage requirements
  [figure: fill levels on a 4-vertex graph]

• Allow fill whose magnitude exceeds a “drop tolerance”
  • may get better approximate factors than levels of fill
  • unpredictable storage requirements
  • choice of tolerance is ad hoc

• Partial pivoting (for nonsymmetric A)

• “Modified ILU” (MIC): add dropped fill to the diagonal of U or R
  • A and RᵀR have the same row sums
  • good in some PDE contexts
## Incomplete Cholesky and ILU: Issues

• Choice of parameters
  • good: smooth transition from iterative to direct methods
  • tradeoff: time per iteration (more fill => more time)
    vs. number of iterations (more fill => fewer iterations)

• Effectiveness
  • condition number usually improves (only) by a constant factor
    (except MIC for some problems from PDEs)
  • still, often good when tuned for a particular class of problems

• Parallelism
  • Triangular solves are not very parallel
  • Reordering for parallel triangular solve by graph coloring
(Matlab demo: 2-coloring of grid5(15))
## Sparse approximate inverses

[figure: sparsity of A and of the sparse approximate inverse B⁻¹]

• Compute B⁻¹ ≈ A⁻¹ explicitly

• Minimize ‖B⁻¹A – I‖F    (in parallel, by columns)

• Variants: factored form of B⁻¹, more fill, . . .
• Good: very parallel
## Support graph preconditioners: example    [Vaidya]

[figure: graph G(A) and its spanning tree G(B)]

• A is symmetric positive definite with negative off-diagonal nonzeros
• B is a maximum-weight spanning tree for A
  (with diagonal modified to preserve row sums)
• factor B in O(n) space and O(n) time
• applying the preconditioner costs O(n) time per iteration
## Support graph preconditioners: example

[figure: each edge of G(A) supported by a path in G(B)]

• support each edge of A by a path in B
• dilation(A edge) = length of supporting path in B
• congestion(B edge) = # of supported A edges
• p = max congestion,  q = max dilation
• condition number κ(B⁻¹A) bounded by p·q (at most O(n²))
## Support graph preconditioners: example

[figure: G(B) with a few extra edges added]

• can improve congestion and dilation by adding a few strategically chosen edges to B
• cost of factor+solve is O(n^1.75), or O(n^1.2) if A is planar
• in recent experiments [Chen & Toledo], often better than drop-tolerance MIC for 2D problems, but not for 3D
## Domain decomposition (introduction)

    A = [ B    0    E
          0    C    F
          Eᵀ   Fᵀ   G ]

• Partition the problem (e.g. the mesh) into subdomains
• Use solvers for the subdomains B⁻¹ and C⁻¹ to precondition an iterative solver on the interface

• Interface system is the Schur complement:
    S = G – EᵀB⁻¹E – FᵀC⁻¹F

• Parallelizes naturally by subdomains
(Matlab demo: grid and matrix structure for overlapping 2-way partition of eppstein)
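With 1×1 "subdomains" the block algebra above can be checked directly; all the numbers below are made up for illustration:

```python
# Schur complement with scalar (1x1) blocks, so S = G - E^T B^-1 E - F^T C^-1 F
# can be verified by hand. All values are illustrative.

B, C, G = 4.0, 5.0, 6.0        # subdomain blocks and interface block
E, F = 1.0, 2.0                # subdomain-interface coupling

S = G - E * (1.0 / B) * E - F * (1.0 / C) * F     # interface (Schur) system

# solve A*[xB, xC, xG] = [bB, bC, bG] by eliminating the subdomains first
bB, bC, bG = 1.0, 1.0, 1.0
xG = (bG - E * (1.0 / B) * bB - F * (1.0 / C) * bC) / S   # interface solve
xB = (bB - E * xG) / B                                     # back-substitute into
xC = (bC - F * xG) / C                                     # the subdomains
```

In practice the interface solve is done iteratively, with the subdomain solves B⁻¹ and C⁻¹ applied as the preconditioner; this tiny example only checks the block elimination.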
## Multigrid (introduction)

• For a PDE on a fine mesh, precondition using a solution on
a coarser mesh
• Use idea recursively on hierarchy of meshes
• Solves the model problem (Poisson’s eqn) in linear time!
• Often useful when hierarchy of meshes can be built
• Hard to parallelize coarse meshes well

• This is just the intuition – lots of theory and technology
## Other Krylov subspace methods

• Nonsymmetric linear systems:
  • GMRES:
      for i = 1, 2, 3, . . .
          find xi ∈ Ki(A, b) such that ri = (Axi – b) ⊥ Ki(A, b)
    But, no short recurrence => save old vectors => lots more space

  • BiCGStab, QMR, etc.:
    Two spaces Ki(A, b) and Ki(Aᵀ, b) w/ mutually orthogonal bases
    Short recurrences => O(n) space, but less robust

  • Convergence and preconditioning more delicate than CG
  • Active area of current research

• Eigenvalues: Lanczos (symmetric), Arnoldi (nonsymmetric)
## The Landscape of Sparse Ax=b Solvers

|                             | Direct: A = LU | Iterative: y′ = Ay |
|-----------------------------|----------------|--------------------|
| Nonsymmetric                | Pivoting LU    | GMRES, QMR, …      |
| Symmetric positive definite | Cholesky       | Conjugate gradient |

Direct methods are more robust; iterative methods need less storage; nonsymmetric solvers are more general.
## Complexity of direct methods

Time and space to solve any problem on any well-shaped finite element mesh
(a 2D mesh has side n^{1/2}; a 3D mesh has side n^{1/3}):

|               | 2D         | 3D         |
|---------------|------------|------------|
| Space (fill)  | O(n log n) | O(n^{4/3}) |
| Time (flops)  | O(n^{3/2}) | O(n^2)     |
## Complexity of linear solvers

Time to solve the model problem (Poisson’s equation) on a regular mesh
(2D mesh of side n^{1/2}, 3D mesh of side n^{1/3}):

|                       | 2D        | 3D        |
|-----------------------|-----------|-----------|
| Sparse Cholesky       | O(n^1.5)  | O(n^2)    |
| CG, exact arithmetic  | O(n^2)    | O(n^2)    |
| CG, no precond.       | O(n^1.5)  | O(n^1.33) |
| CG, modified IC       | O(n^1.25) | O(n^1.17) |
| CG, support trees     | O(n^1.20) | O(n^1.75) |
| Multigrid             | O(n)      | O(n)      |
