Parallelization
of
SHAKE
Yael Weinbach1 and Ron Elber1,2
1Hebrew University
2University of Texas
Weinbach
and
Elber,
Journal
of
Computational
Physics,
2005
1
Parallelization
is
the
most
effective
and
promising
approach
today
for
extending
the
lengths
of
molecular
simulations
• More
cores
on
CPU
(~100
is
on
the
horizon),
cores
do
not
improve
much
in
speed.
(with
the
exception
of
a
recent
article
from
NYT)
• Heterogeneous
environment:
Special
purpose
machines,
shared
memory
cores,
distributed
memory
CPUs,
GPUs
• Parallelization
of
MD
codes
exist
on
multi‐core
systems:
GROMAX,
NAMD,
DESMOND,
AMBER,
MOIL
,
‐‐
Frequently
no
SHAKE
parallelization
beyond
trivial
decomposable
systems.
2
Elements
of
parallelization
1. Fraction
of
serial
code
2. Load
balancing
3. Communication
between
PEs
4. Local
vs
global
memory
1
2
3
PTime = S + T / P + C + L = S + T
Recent
questions
• Shared
memory
vs
distributed
memory?
• Heterogeneous
calculations?
• Special
purpose
hardware?
• Code
uniformity?
• Single,
double,
non‐float?
4
The
SHAKE
algorithm
JOURNAL
OF
COMPUTATIONAL.
PHYSICS
23,
321‐341
(1977)
Numerical
integration
of
the
Cartesian
Equations
of
Motion
of
a
System
with
Constraints:
Molecular
Dynamics
of
n‐Alkanes
JEAN‐PAUL
RYCKAERT,
GIOVANNI
CICCOTTI,
AND
HERMAN
J.
C.
BERENDSEN
5
Molecular
dynamics
with
constraints
6
SHAKE
iterations
7
The
constraint
matrix
Approx.
Symm
matrix
Build
every
time
step
Similar
approach
applies
for
the
velocity
(linear
constraints)
Related
work:
Berk
Hess
JCTC,
2008
8
Matrix
SHAKE
• Need
to
determine
the
Lagrange’s
multipliers.
λl
• In
principle
can
use
any
parallel
linear
solver.
• Every
iteration
requires
the
application
of
a
linear
solver
• Every
iteration
requires
re‐building
the
constraint
matrix
and
communication
• SHAKE
is
solution
by
relaxation.
Inherently
serial.
(k) Δt2
All ' = ∇σ l (k)
X = Xn+1
M −1 ∇σ l ' X = Xn
2
Aλ = σ
9
Storage
of
a
sparse
matrix
10
Exploit
properties
of
the
constraint
matrix
for
more
efficient
calculation
and
parallelization
1.
The
symmetric
constraint
matrix
is
non‐negative
A = ∑ bb t
non negative matrix will satisfy for all x
x t Ax ≥ 0
( )( )
because x t ∑ bb t x = ∑ x t b b t x ≥ 0
For non-negative A can find solution
of the linear equation Aλ = σ with
conjugate gradient 11
Conjugate
gradient
(CG)
solution
of
linear
problems
For Aλ = σ
Write a quadratic minimization problem
1 t
F = λ Aλ − λ σ t
2
works for non-negative A
converged in L steps (not so great)
benefits from good initial guess (previous λ )
And diagonal preconditioner A ( )−1
D
12
Conjugate
Gradient
CG
13
Every
time
step
needs
to
compute
A.
Is
there
a
way
around
it?
• A
is
a
constant
(time
independent)
if
we
constrained
bond
length
and
bond
angles.
14
Constant
constraint
matrix
• For
small
systems
(water,
amide
plane)
matrix
can
be
inverted
and
stored
(problem
solved,
implementation
by
us
and
recent
implementation
by
Pande).
• Large
system
CG
iterations
are
necessary.
Requires
communication
of
λ.
Still
better
than
most
algorithms.
15
P‐
Considerations
(I)
• Load
balancing
not
trivial
(we
used
list
parallelization,
vs
spatial
decomposition)
w
w
w
w
w
w
w
w
w
w
16
P‐Considerations
II
• Minimize
communication
(do
not
use
bond
relaxation,
use
matrix
shake).
Exploit
properties
of
matrix
shake:
– General
matrix:
asymmetric,
need
to
be
re‐computed
every
iteration.
–
Approximate
(symmetric)
matrix:
computed
once
every
time
step.
Number
of
iterations
per
step
increases
slightly
– Matrix
for
bond
and
angles
constraints:
Computed
only
once
in
the
simulations.
Slightly
larger
number
of
steps.
17
Number
of
iterations
required
by
linear
solvers
during
bond
(angle)
SHAKE
on
myoglobin
• Relaxation
~30
• Asymmetric
matrix
~3
• Symmetric
matrix
~5
• Constant
matrix
~7
18
Numerical
implementation
on
Tera
19
Membrane
system
20
Shake
parallelization
• Necessary
to
exploit
an
ever
growing
use
of
parallel
system
in
Molecular
Dynamics
• Matrix
(symmetric
and
constant)
seems
the
most
appropriate
for
future
application
(reducing
communication)
• Difficult
to
load
balance
(unsolved
problem).
Common
work
around
splitting
the
system
to
decouple
constraints.
21
22
Numerical
requirement
for
a
step
in
CG
23
• J.
Chem.
Theory
Comput.
2010,
6,
434–437
• Constant
Constraint
Matrix
Approximation:
A
Robust,
• Parallelizable
Constraint
Method
for
Molecular
• Simulations
• Peter
Eastman*,†
and
Vijay
S.
Pande‡
• Department
of
Bioengineering
and
Department
of
Chemistry,
Stanford
UniVersity,
• Stanford,
California
94305
• Received
June
16,
2009
• Abstract:
We
introduce
a
new
algorithm,
the
constant
constraint
matrix
approximation
(CCMA),
• for
constraining
distances
in
molecular
simulations.
It
combines
the
best
features
of
many
existing
• algorithms
while
avoiding
their
defects:
it
is
fast
and
stable,
can
be
applied
to
arbitrary
constraint
• topologies,
and
can
be
efficiently
implemented
on
modern
parallel
architectures.
We
test
it
on
• a
protein
with
bond
length
and
limited
angle
constraints
and
find
that
it
requires
less
than
one
sixth
• as
many
iterations
as
SHAKE
to
converge.
24
Pande…
• If
the
angle
is
not
constrained,
the
element
will
vary
with
• time,
but
usually
not
very
much.
Molecular
force
fields
• typically
include
a
harmonic
force
term
for
each
angle
that
• restricts
its
motion
to
a
narrow
range.
This
suggests
that
if
• we
construct
and
invert
J
once
at
the
start
of
the
simulation,
we
can
reuse
it
for
every
time
step
and
it
will
continue
to
be
a
good
approximation.
(We
note
in
passing
that
this
same
observation
was
made
by
Weinbach
and
Elber,
but
they
did
not
pursue
it
further
or
develop
an
algorithm
based
on
it.9)
25
Elements
of
parallelization
1. Fraction
of
serial
code
Processing
2. Load
balancing
Element
3. Communication
between
PEs
4. Local
vs
global
memory
mem
26