# High performance implementations of the simplex method for linear programming

Julian Hall
School of Mathematics, University of Edinburgh
APO - IRIT - ENSEEIHT

8th April 2008

Overview

• Review of the simplex method for linear programming
• Issues in high performance implementations
◦ Hyper-sparsity (1998–2005)
◦ Parallelism (1992–1997; 2004–date)
◦ Block-angular LP problems (2007–date)
◦ LP problems in sparse reconstruction (2007–date)

Simplex methods for LP problems

    minimize    f = c^T x
    subject to  Ax = b,  x ≥ 0

where x ∈ R^n and b ∈ R^m.

• If A is dense, use the simplex method with dense matrix algebra
• If A is sparse but unstructured, use a sparsity-exploiting revised simplex solver
• If A has block angular form

      [ A10  A11  A12  ...  A1r ]        [ A01               ]
      [      H1                 ]        [ A11  H1           ]
      [           H2            ]   or   [ A21       H2      ]
      [                ...      ]        [  .           ...  ]
      [                     Hr  ]        [ Ar1            Hr ]

  consider methods that exploit this structure.

Standard simplex method

          N        B      RHS
    1
    ⋮     N̂        I       b̂
    m
    0    ŝ_N^T    0^T     −f̂

In each iteration:

• Increase x_q from zero
• Use the pivotal column q of N̂ and b̂ to find the pivotal row p
• Exchange indices p and q between B and N
• Update the tableau corresponding to this basis change
  ◦ Scale row p by N̂_pq
  ◦ Eliminate N̂_iq, i = 0, ..., m, i ≠ p

Revised simplex method

• Maintains a representation of B^{-1} (rather than N̂)
• Pivotal column is

      â_q = B^{-1} a_q,  where a_q is column q of A

• Pivotal row is

      â_p^T = π_p^T N,  where π_p^T = e_p^T B^{-1}

• Representation of B^{-1} is updated by exploiting

      B^{-1} := ( I − (â_q − e_p) e_p^T / â_pq ) B^{-1}

• Periodically B^{-1} is formed from scratch

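The product-form update can be checked numerically: applying the factor (I − (â_q − e_p)e_p^T/â_pq) after the old B^{-1} agrees with inverting the updated basis directly, and applying it to a vector costs only O(m). A minimal NumPy sketch with a small toy basis (the matrices here are illustrative stand-ins, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5
B = np.eye(m) + 0.1 * rng.standard_normal((m, m))  # toy basis matrix
v = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
a_q = B @ v            # entering column, chosen so that â_q = v exactly
p = 2                  # leaving (pivotal) row

a_hat_q = np.linalg.solve(B, a_q)   # pivotal column â_q = B^{-1} a_q

def apply_updated_B_inverse(r):
    """Apply the updated B^{-1} to r without re-inverting, via
    B^{-1} := (I - (â_q - e_p) e_p^T / â_pq) B^{-1}."""
    x = np.linalg.solve(B, r)       # stand-in for operating with the old B^{-1}
    e_p = np.zeros(m); e_p[p] = 1.0
    return x - (x[p] / a_hat_q[p]) * (a_hat_q - e_p)

# Check against re-inverting the updated basis (column p replaced by a_q)
B_new = B.copy(); B_new[:, p] = a_q
r = rng.standard_normal(m)
assert np.allclose(apply_updated_B_inverse(r), np.linalg.solve(B_new, r))
```

The update is the Sherman-Morrison formula applied to B with its p-th column replaced by a_q, which is why no explicit inverse is ever formed.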
Basis matrix inversion: Tomlin INVERT (1972)

Triangularisation:

• Active row/column singleton entries are identified as pivots
  until all active rows/columns have at least two active nonzeros
• Corresponds to Markowitz pivot selection so long as the count is zero
• Yields an approximate Tarjan form at much lower cost

Factorization:

• Gaussian elimination is applied to the residual bump

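The triangularisation phase amounts to repeatedly pivoting on active singletons. A toy Python sketch over a dense 0/1 pattern (illustrative only; a production INVERT maintains row/column counts incrementally rather than rescanning):

```python
def triangularise(pattern):
    """Tomlin-style triangularisation on a 0/1 pattern (list of lists).
    Pivot on active row/column singletons until none remain; the
    leftover "bump" is what Gaussian elimination must factorise."""
    m = len(pattern)
    active_rows = set(range(m))
    active_cols = set(range(m))
    pivots = []
    changed = True
    while changed:
        changed = False
        for i in list(active_rows):
            cols = [j for j in active_cols if pattern[i][j]]
            if len(cols) == 1:                 # row singleton found
                pivots.append((i, cols[0]))
                active_rows.discard(i); active_cols.discard(cols[0])
                changed = True
        for j in list(active_cols):
            rows = [i for i in active_rows if pattern[i][j]]
            if len(rows) == 1:                 # column singleton found
                pivots.append((rows[0], j))
                active_rows.discard(rows[0]); active_cols.discard(j)
                changed = True
    return pivots, active_rows, active_cols    # residual bump = active sets

# A lower-triangular pattern: triangularisation finds every pivot, empty bump
P = [[1, 0, 0],
     [1, 1, 0],
     [1, 1, 1]]
pivots, bump_rows, bump_cols = triangularise(P)
assert pivots == [(0, 0), (1, 1), (2, 2)]
assert not bump_rows and not bump_cols
```

On a pattern with no singletons the loop stops immediately and the whole matrix is left as the bump.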
Relative time for triangularisation and factorization

[Scatter plot of log10(Triang/Factor time) against log10(Basis dimension),
ranging from "Triang 100 times slower" to "Factor 100 times slower"]

• Triangularisation can be hugely dominant for hyper-sparse LP problems
• Factorization can be hugely dominant for other LPs

Hyper-sparsity in the revised simplex method

• Major computational cost each iteration is forming B^{-1}r_F, r_B^T B^{-1} and r_π^T N
• Vectors r_F, r_B are always sparse
• Vector r_π may be sparse
• If the results of these operations are (usually) sparse then the LP is said to be hyper-sparse
• LP structure means B^{-1} is sparse
• Simplex method typically better than interior point methods for hyper-sparse LP problems but frequently not for other LPs

[Scatter plot of log10(IPM/simplex time) against log10(Basis dimension),
ranging from "Simplex 10 times faster" to "IPM 10 times faster"]

Representing B^{-1}

• Following INVERT, B^{-1} is represented by the eta file {p_k, μ_k, η_k}, k = 1, ..., K
• Derived directly from the results of Gaussian elimination
  ◦ Only retain the nontrivial columns of the LU decomposition of B
  ◦ The pivots μ_k are in rows p_k
  ◦ η_k are the eta vectors
  ◦ K ≈ 2m is common
• Operating with the eta file ≡ column-wise forward/backward substitution

Exploiting hyper-sparsity

• Traditional technique for solving Bx = r by transforming r into x:

      do k = 1, K
        r_pk := r_pk / μ_k
        r := r − r_pk η_k
      end do

• When r is sparse, skip η_k if r_pk is zero:

      do k = 1, K
        if (r_pk .ne. 0) then
          r_pk := r_pk / μ_k
          r := r − r_pk η_k
        end if
      end do

• When x is sparse, the dominant cost is the test for zero
• Efficient identification of the vectors η_k to be applied
  ◦ Gilbert and Peierls (1988)
  ◦ Hall and McKinnon (1998–2005)

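The two loops above translate directly. A Python sketch, with the eta file as a list of (p_k, μ_k, η_k) triples and each η_k stored as a sparse {row: value} dict (the tiny example matrix is illustrative):

```python
# Eta file for a 3x3 unit lower triangular B with B[2][0] = 0.5:
# each entry is (p_k, mu_k, eta_k), eta_k holding the off-pivot nonzeros.
eta_file = [(0, 1.0, {2: 0.5}), (1, 1.0, {}), (2, 1.0, {})]

def ftran_dense(r, etas):
    """Traditional forward substitution: every eta is applied."""
    for p, mu, eta in etas:
        r[p] /= mu
        for i, v in eta.items():
            r[i] -= r[p] * v
    return r

def ftran_skip(r, etas):
    """Hyper-sparse variant: skip eta k whenever r[p_k] is zero."""
    for p, mu, eta in etas:
        if r[p] != 0.0:
            r[p] /= mu
            for i, v in eta.items():
                r[i] -= r[p] * v
    return r

# A sparse right-hand side touches only the etas its nonzeros reach
assert ftran_skip([0.0, 1.0, 0.0], eta_file) == [0.0, 1.0, 0.0]
assert ftran_dense([2.0, 0.0, 1.0], eta_file) == [2.0, 0.0, 0.0]
```

When r is sparse the skip variant performs almost no arithmetic, which is exactly why the zero test itself then dominates.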
Hyper-sparse FTRAN

• During each INVERT
  ◦ For each row i, record the indices k of the etas for which p_k = i
  ◦ At most two indices for each row

• During each FTRAN
  Let E be the indices of etas with pivots corresponding to nonzeros in r
  1. Scan E to determine the index k of the next eta to be applied
     If k is undefined then FTRAN is complete
  2. Apply eta k
     If a nonzero is created in row i, add to E the indices of etas with pivots in row i
     Go to 1

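The index-list scheme above can be sketched as follows (illustrative Python; a real code keeps E in a structure that makes the "scan for the next k" cheap, rather than rebuilding a candidate list each pass):

```python
# Eta file as before; etas_by_row[i] lists the etas k with pivot row p_k = i
eta_file = [(0, 1.0, {2: 0.5}), (1, 1.0, {}), (2, 1.0, {})]
etas_by_row = {0: [0], 1: [1], 2: [2]}

def hyper_sparse_ftran(r, etas, by_row):
    """FTRAN driven by an index list E: only etas whose pivot rows hold
    nonzeros of r are ever considered; fill-in adds new candidates."""
    E = set()
    for i in r:                              # r is a dict {row: value}
        E.update(by_row.get(i, ()))
    applied = -1
    while True:
        candidates = [k for k in E if k > applied]
        if not candidates:                   # next eta undefined: done
            return r
        k = min(candidates)                  # etas apply in file order
        applied = k
        p, mu, eta = etas[k]
        if r.get(p, 0.0) == 0.0:             # cancelled to zero: skip
            continue
        r[p] /= mu                           # apply eta k
        for i, v in eta.items():
            if i not in r:                   # nonzero created in row i
                E.update(by_row.get(i, ()))
            r[i] = r.get(i, 0.0) - r[p] * v

# One nonzero in r: eta 1 is never even inspected
assert hyper_sparse_ftran({0: 2.0}, eta_file, etas_by_row) == {0: 2.0, 2: -1.0}
```

The point of the scheme is that work is proportional to the nonzeros actually encountered, never to K.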
Hyper-sparse BTRAN

• Traditional technique for solving B^T x = r by transforming r into x:

      do k = K, 1, −1
        r_pk := (r_pk − r^T η_k) / μ_k
      end do

• When x is sparse, most of the products r^T η_k are zero
• No way to exploit hyper-sparsity properly with a "column-wise" eta file
• After INVERT: form a "row-wise" copy of the eta file
• Pass the row-wise eta file to the hyper-sparse forward solution code

Exploiting hyper-sparsity in the revised simplex method

• Need to exploit hyper-sparsity wherever possible
  ◦ When operating with B^{-1} after UPDATE
  ◦ When selecting the pivotal column and pivotal row
  ◦ When factorising the bump in INVERT
  A form of Amdahl's law applies

• Limitations
  ◦ Not all results of B^{-1}r_F, r_B^T B^{-1} and r_π^T N may be sparse
  ◦ B^{-1} may have a small number of dense rows or columns

• Why not use Gilbert and Peierls?
  ◦ For: guaranteed to identify the vectors η_k at a cost proportional to the arithmetic operations
  ◦ Against: futile if the result of B^{-1}r_F or r_B^T B^{-1} is not sparse
  ◦ Against: pessimistic when numerical cancellation is considerable
  ◦ Against: results of experiments with LP basis matrices

Speedup in total solution time and computational components

Problem      Dimension   Solution   B^{-1}r_F   r_B^T B^{-1}   r_π^T N
80bau3b           2262       3.34        5.13           3.51      6.06
fit2p             3000       1.75        1.30          12.22     13.47
stocfor3         16675       1.85        1.14           7.26      7.61
dcp2             32388       5.32        8.24           6.21      6.20
ken-11           14694      22.84       98.04          27.22     66.36
ken-13           28632      12.12      104.09          12.87     17.60
ken-18          105127      15.27      263.94          13.91     19.92
pds-06            9881      17.48       24.07          21.58     28.18
pds-10           16558      10.36       11.24          16.60     17.55
pds-20           33874      10.35        5.96          14.33     15.40

• Hyper-sparsity has been the major contribution to the revised simplex method over the past decade
• Revised simplex method now out-performs IPM for important classes of LP problems
• My revised simplex code is the underlying solver in several long-term industrial applications

Parallelising the simplex method

Data parallel standard simplex method

• Good parallel efficiency achieved
• Totally uncompetitive with the serial revised simplex method without prohibitive resources

Data parallel revised simplex method

• Only immediate parallelism is in forming r_π^T N
• When n ≫ m, forming r_π^T N dominates the cost of iterations: significant speed-up achieved
  Bixby and Martin (2000)
• One "fully" parallel implementation: no speed-up achieved
  Shu (1995)


Overlap computational components for different iterations

• Wunderling (1996)
  "Fully" parallel for only two processors: good results only when n ≫ m
• ASYNPLEX: Hall and McKinnon (1995)
  Fully parallel (inefficient) revised simplex variant: speed-up of 5 but numerically unstable
• PARSMI: Hall and McKinnon (1996)
  Fully parallel revised simplex variant: speed-up of 2 but numerically unstable
• SYNPLEX: Hall (2005)
  "Fully" parallel revised simplex variant: speed-up of 3 and numerically stable

All data and task parallel implementations are compromised by serial inversion of B

SYNPLEX performance

[Gantt chart of processor activity (processors 0–7) against time, showing
the components: form B^{-1}r_F, form r_B^T B^{-1}, form r_π^T N, and invert B]

Parallel basis matrix inversion

• Both phases of Tomlin INVERT can dominate the computational cost
• Depends on the nature of the LP
• Parallel bump factorisation (important for smaller, non hyper-sparse LPs)
  ◦ Consider using existing codes such as MUMPS or SuperLU
• Parallel triangularisation (important for larger, hyper-sparse LPs)
  ◦ Of greater interest: Hall (2006–date)

Parallel triangularisation

Each processor p = 1, ..., N has its own row set R_p and column set C_p

• Initialisation: determine row and column count data for R_p and C_p
• Major iteration: repeat:
  ◦ Minor iteration: identify row (column) singletons in R_p (C_p) until it is "wise" to stop
  ◦ Broadcast pivot indices to all other processors
  ◦ Update row (column) count data on each processor using the pivot indices from all other processors
• Until there are no row or column singletons

Communication cost of O(m log N) for computation cost O(τ)

Perform minor iterations in parallel and update in parallel

Parallel triangularisation: Example

[Worked example on a small matrix distributed over four processors.
Iterations 1–4 show singleton rows and columns being identified from the
row and column counts, the chosen pivots being broadcast, and the counts
being updated after each major iteration until no singletons remain]

SunFire E15K implementation: speedup

• For model pds-100
• 156243 rows
• 7485 logicals
• Bump dimension is 1655
• 10% of pivots per major iteration

                     Processors
  Pivots      2      4      8     16     32
     90%    2.1    3.9    6.1    7.5    5.8
     99%    2.0    3.7    5.6    6.4    4.6
    100%    1.9    3.7    5.3    5.6    3.8

SunFire E15K implementation: speedup

                            Processors
Model             Rows      2      4      8     16     32
ken-13           28632    1.2    1.6    1.6    1.2    0.6
mod2             35664    1.1    1.7    1.6    1.1    0.5
stormG2-125      66185    1.2    1.8    2.4    2.4    1.7
deteq27          68672    1.1    1.8    2.3    2.3    1.6
pds-40           66844    0.8    1.4    1.7    1.5    0.8
ken-18          105127    1.3    2.7    4.3    4.6    3.4
pds-80          129181    1.8    3.5    5.3    5.1    3.1
pds-100         156243    1.9    3.7    5.3    5.6    3.8
sgpf5y6         246077    1.6    3.8    7.0    8.3    8.0
watson 2        352013    1.7    3.3    5.8    7.4    6.3

• Good parallel performance is aided by cache effects
• Parallel performance limited by
  ◦ Communication costs
  ◦ Overheads of update loop parallelisation

Parallel solution of row-linked block angular LP problems

    minimize    f = c^T x
    subject to  Ax = b,  x ≥ 0

        [ A10  A11  A12  ...  A1r ]
        [      H1                 ]
    A = [           H2            ]
        [                ...      ]
        [                     Hr  ]

• The linking rows are [ A10  A11  A12  ...  A1r ]
• The master columns are

      [ A10 ]
      [  0  ]

pds-02: 3518 rows, 7535 columns, 11 diagonal blocks

Source: Patient Distribution System, Carolan et al. (1990)

Diagonal block of pds-02

dcp1: 4950 rows, 3007 columns, 87 diagonal blocks

Source: Industrial, Hall (1997)

Methods for row-linked block angular LP problems

    minimize    f = c^T x
    subject to  Ax = b,  x ≥ 0

        [ A10  A11  A12  ...  A1r ]
        [      H1                 ]
    A = [           H2            ]
        [                ...      ]
        [                     Hr  ]

• Kaul's algorithm (1965) performs simplex iterations exploiting the block angular structure
• Dense arithmetic restricted to "small" matrices
• Scope for a parallel solver of practical value
  Particularly for problems with dense blocks

Kaul's algorithm: motivation

• For A in block angular form

      [ A10  A11  A12  ...  A1r ]
      [      H1                 ]
      [           H2            ]
      [                ...      ]
      [                     Hr  ]

• Let A1 = [ A11  A12  ...  A1r ] and

          [ H1              ]
      H = [      H2         ]
          [           ...   ]
          [              Hr ]

• So A = [ A10  A1 ]
         [  0   H  ]

Kaul's algorithm: motivation (continued)

• When A = [ A10  A1 ]  the basis matrix B has the structure B = [ D  E ]
           [  0   H  ]                                           [ R  T ]

• T has nonsingular diagonal blocks Ti, for i = 1, ..., r
• The standard simplex tableau can be represented as

      Z = B^{-1} A
        = [ D  E ]^{-1} [ A10  A1 ]
          [ R  T ]      [  0   H  ]
        = [ Z11  Z12 ]
          [ Z21  Z22 ]
        = [ Z1 ]
          [ Z2 ]

• Kaul's algorithm performs simplex iterations by maintaining Z1 explicitly and Z2 implicitly
• Requires V = −T^{-1}R and T^{-1}, given by Ti^{-1}, for i = 1, ..., r
  ◦ Matrices Z1 and V have one "small" dimension
  ◦ Each Ti^{-1} is a "small" square matrix
  ◦ Updating each of Z1, V and Ti^{-1} requires a rank-one outer-product operation

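The rank-one outer-product update is the same kernel as the standard simplex tableau update, and it is what makes the column partition of the next slide parallelisable: each processor can update its own columns of Z1 and V independently. A NumPy sketch of the generic operation (the matrices are random stand-ins):

```python
import numpy as np

def rank_one_update(M, u, v, alpha):
    """M := M + alpha * u v^T, the outer-product kernel each processor
    applies to its own columns of Z1, V or its blocks Ti^{-1}."""
    return M + alpha * np.outer(u, v)

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 6))
u = rng.standard_normal(4)
v = rng.standard_normal(6)

# Updating column slices independently gives the same result as one
# global update, so the columns can live on different processors.
whole = rank_one_update(M, u, v, 0.5)
parts = np.hstack([rank_one_update(M[:, :3], u, v[:3], 0.5),
                   rank_one_update(M[:, 3:], u, v[3:], 0.5)])
assert np.allclose(whole, parts)
```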
Kaul's algorithm: parallel implementation

Implemented on UoE's new 512 processor cluster eddie using p = 2m + 1 processors

Boduroglu, Hall and Hogg (2007–date)

• Processor 0 performs operations with Λ = T^{-1}
• Processors 1, ..., 2m perform operations with V and Z1

[Diagram: the blocks of Λ = T^{-1} held on processor 0, and the columns of
V and Z1 partitioned among processors 1, ..., 2m]

Results: pds-05

Processors      Time    Speed-up
       1+2    1207.8         1.0
       1+4     594.6         2.0
       1+6     570.8         2.1
       1+8     446.4         2.7
      1+12     337.9         3.6
      1+14     288.3         4.2
      1+16     570.3         2.1

Observations:

• Code now exploits sparsity in outer-products: speed-up reduced!
• pds problems are too sparse
• Need to solve other problems: dcp1
• First resolve numerical instabilities

LP problems in sparse reconstruction

• Under-determined system from signal processing, notably compressed sensing

  Ax = b   where   x ∈ ℝⁿ and b ∈ ℝᵐ   for   m < n (possibly m ≪ n)

• A is a full matrix
• Seek the sparsest solution

  minimize ‖x‖₀   subject to   Ax = b

• May be found by solving

  minimize ‖x‖₁   subject to   Ax = b

• A may not be known explicitly
• Ax and Aᵀy may be formed in as little as O(n) operations

36
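The ℓ₁ problem above is itself a linear program: splitting x = u − v with u, v ≥ 0 gives ‖x‖₁ = eᵀ(u + v) with the equality constraints A(u − v) = b. A minimal sketch of this reformulation, using SciPy's HiGHS-backed `linprog` rather than the dual revised simplex code discussed in these slides; the random A and sparse x₀ are illustrative assumptions, not data from the talk:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 8, 16
A = rng.standard_normal((m, n))          # stand-in sensing matrix
x0 = np.zeros(n)
x0[[3, 11]] = [2.0, -1.5]                # sparse ground truth
b = A @ x0

# min ||x||_1  s.t.  Ax = b, rewritten via x = u - v, u, v >= 0:
#   min 1^T u + 1^T v   s.t.   [A  -A] [u; v] = b
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b,
              bounds=(0, None), method="highs")
x = res.x[:n] - res.x[n:]                # recovered solution
```

The solution is feasible (Ax = b) and its ℓ₁ norm is no larger than that of x₀, since x₀ is itself feasible for the LP.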
Solution using the simplex method

• Optimality conditions: Lagrange multipliers y and s exist such that

  Ax = b,   2Aᵀy + s = 0,   −2e ≤ s ≤ 2e,   x_j s_j = 0 ∀j

• Optimal solution has only k ≪ m nonzero components: highly degenerate
• y = 0 ⇒ s = 0 is dual feasible
• Solve the problems using the dual revised simplex method
• Maintains dual feasibility whilst seeking primal feasibility
• Same computational components as the (primal) revised simplex method
• If m ≪ n then cost of r_πᵀN dominates
  ◦ If operations with Aᵀ are fast then form Aᵀr_π
  ◦ Otherwise, scope for exploiting parallelism
• If A is not available explicitly, compute columns when required as Ae_q

37
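Forming a column on demand as Ae_q just means applying the black-box operator to the q-th unit vector. A toy sketch of this idea; the explicit matrix M stands in for the fast implicit operator purely so the result can be checked, and all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 12
M = rng.standard_normal((m, n))     # hidden behind the operator below

def apply_A(v):
    """Black-box matrix-vector product; in compressed sensing this
    might be, e.g., a subsampled transform costing far less than O(mn)."""
    return M @ v

def column(q):
    """Form column q of A as A e_q, without storing A explicitly."""
    e_q = np.zeros(n)
    e_q[q] = 1.0
    return apply_A(e_q)
```

Pricing products such as Aᵀr_π can be handled the same way with the adjoint operator.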
Conclusions

• Hyper-sparsity (1998–2003)
◦ Successful
◦ Maybe re-visit Gilbert and Peierls
◦ Slavova’s work may be valuable for parallel hyper-sparse simplex
• Parallelism (1992–1997; 2004–date)
◦ Relatively unsuccessful: but an irresistible challenge!
◦ Parallel triangularisation offers renewed hope
• Block-angular LP problems (2007–date)
◦ Modest success so far
◦ Parallel sparse revised simplex for row-linked block diagonal LPs
• LP problems in sparse reconstruction (2007–date)
◦ Too soon to tell!

38

```