Embed
Email

Parallelization of SHAKE

Document Sample

Shared by: qinmei liao
Categories
Tags
Stats
views:
0
posted:
11/16/2011
language:
English
pages:
26
Parallelization
of
SHAKE




Yael Weinbach1 and Ron Elber1,2

1Hebrew University



2University of Texas









Weinbach
and
Elber,
Journal
of
Computational
Physics,
2005






1


Parallelization
is
the
most
effective
and
promising


approach
today
for
extending
the
lengths
of
molecular


simulations


•  More
cores
on
CPU
(~100
is
on
the
horizon),
cores
do


not
improve
much
in
speed.
(with
the
exception
of
a


recent
article
from
NYT)




•  Heterogeneous
environment:
Special
purpose


machines,
shared
memory
cores,
distributed
memory


CPUs,
GPUs




•  Parallelization
of
MD
codes
exist
on
multi‐core


systems:
GROMAX,
NAMD,
DESMOND,
AMBER,
MOIL
,


‐‐
Frequently
no
SHAKE
parallelization
beyond
trivial


decomposable
systems.




2


Elements
of
parallelization




1.  Fraction
of
serial
code


2.  Load
balancing


3.  Communication
between
PEs


4.  Local
vs
global
memory








1
 2
 3










PTime = S + T / P + C + L = S + T

Recent
questions


•  Shared
memory
vs
distributed
memory?




•  Heterogeneous
calculations?




•  Special
purpose
hardware?




•  Code
uniformity?




•  Single,
double,
non‐float?




4


The
SHAKE
algorithm






JOURNAL
OF
COMPUTATIONAL.
PHYSICS
23,
321‐341
(1977)


Numerical
integration
of
the
Cartesian
Equations
of
Motion


of
a
System
with
Constraints:
Molecular
Dynamics
of
n‐Alkanes


JEAN‐PAUL
RYCKAERT,
GIOVANNI
CICCOTTI,
AND


HERMAN
J.
C.
BERENDSEN










5


Molecular
dynamics
with
constraints










6


SHAKE
iterations










7


The
constraint
matrix










Approx.
Symm
matrix


Build
every
time
step




Similar
approach
applies
for
the
velocity
(linear
constraints)








Related
work:

Berk
Hess
JCTC,
2008


8


Matrix
SHAKE



• Need
to
determine
the
Lagrange’s
multipliers.
 λl

• In
principle
can
use
any
parallel
linear
solver.




• Every
iteration
requires
the
application
of
a
linear
solver




• Every
iteration
requires
re‐building
the
constraint
matrix
and
communication




• SHAKE
is
solution
by
relaxation.
Inherently
serial.











(k) Δt2

All ' = ∇σ l (k)

X = Xn+1

M −1 ∇σ l ' X = Xn

2

Aλ = σ

9


Storage
of
a
sparse
matrix










10


Exploit
properties
of
the
constraint


matrix
for
more
efficient
calculation


and
parallelization


1.
The
symmetric
constraint
matrix
is
non‐negative


A = ∑ bb t

non negative matrix will satisfy for all x

x t Ax ≥ 0

( )( )

because x t ∑ bb t x = ∑ x t b b t x ≥ 0

For non-negative A can find solution

of the linear equation Aλ = σ with

conjugate gradient 11


Conjugate
gradient
(CG)
solution
of


linear
problems


For Aλ = σ

Write a quadratic minimization problem

1 t

F = λ Aλ − λ σ t



2

works for non-negative A

converged in L steps (not so great)

benefits from good initial guess (previous λ )

And diagonal preconditioner A ( )−1

D

12


Conjugate
Gradient
CG










13


Every
time
step
needs
to
compute
A.
Is


there
a
way
around
it?


•  A
is
a
constant
(time
independent)
if
we


constrained
bond
length
and
bond
angles.










14


Constant
constraint
matrix


•  For
small
systems
(water,
amide
plane)
matrix


can
be
inverted
and
stored
(problem
solved,


implementation
by
us
and
recent


implementation
by
Pande).




•  Large
system
CG
iterations
are
necessary.


Requires
communication
of
λ.
Still
better
than


most
algorithms.




15


P‐
Considerations
(I)


•  Load
balancing
not
trivial
(we
used
list


parallelization,
vs
spatial
decomposition)






w
 w
 w


w


w
 w


w
 w


w


w










16


P‐Considerations
II


•  Minimize
communication
(do
not
use
bond


relaxation,
use
matrix
shake).
Exploit
properties


of
matrix
shake:


–  General
matrix:
asymmetric,
need
to
be
re‐computed


every
iteration.



–  
Approximate
(symmetric)
matrix:
computed
once


every
time
step.
Number
of
iterations
per
step


increases
slightly


–  Matrix
for
bond
and
angles
constraints:
Computed


only
once
in
the
simulations.
Slightly
larger
number
of


steps.




17


Number
of
iterations
required
by
linear
solvers
during


bond
(angle)
SHAKE
on
myoglobin




•  Relaxation
~30





•  Asymmetric
matrix
~3




•  Symmetric
matrix
~5




•  Constant
matrix
~7




18


Numerical
implementation
on
Tera










19


Membrane
system










20


Shake
parallelization


•  Necessary
to
exploit
an
ever
growing
use
of


parallel
system
in
Molecular
Dynamics


•  Matrix
(symmetric
and
constant)
seems
the


most
appropriate
for
future
application


(reducing
communication)


•  Difficult
to
load
balance
(unsolved
problem).


Common
work
around
splitting
the
system
to


decouple
constraints.




21


22


Numerical
requirement
for
a
step
in


CG










23


•  J.
Chem.
Theory
Comput.
2010,
6,
434–437


•  Constant
Constraint
Matrix
Approximation:
A
Robust,


•  Parallelizable
Constraint
Method
for
Molecular


•  Simulations


•  Peter
Eastman*,†
and
Vijay
S.
Pande‡


•  Department
of
Bioengineering
and
Department
of
Chemistry,
Stanford
UniVersity,


•  Stanford,
California
94305


•  Received
June
16,
2009


•  Abstract:
We
introduce
a
new
algorithm,
the
constant
constraint
matrix
approximation
(CCMA),


•  for
constraining
distances
in
molecular
simulations.
It
combines
the
best
features
of
many
existing


•  algorithms
while
avoiding
their
defects:
it
is
fast
and
stable,
can
be
applied
to
arbitrary
constraint


•  topologies,
and
can
be
efficiently
implemented
on
modern
parallel
architectures.
We
test
it
on


•  a
protein
with
bond
length
and
limited
angle
constraints
and
find
that
it
requires
less
than
one
sixth


•  as
many
iterations
as
SHAKE
to
converge.






24


Pande…


•  If
the
angle
is
not
constrained,
the
element
will
vary
with


•  time,
but
usually
not
very
much.
Molecular
force
fields


•  typically
include
a
harmonic
force
term
for
each
angle
that


•  restricts
its
motion
to
a
narrow
range.
This
suggests
that
if


•  we
construct
and
invert
J
once
at
the
start
of
the


simulation,
we
can
reuse
it
for
every
time
step
and
it
will


continue
to
be
a
good
approximation.
(We
note
in
passing


that
this
same
observation
was
made
by
Weinbach
and


Elber,
but
they
did
not
pursue
it
further
or
develop
an


algorithm
based
on
it.9)










25


Elements
of
parallelization


1.  Fraction
of
serial
code


Processing
 2.  Load
balancing


Element
 3.  Communication
between
PEs


4.  Local
vs
global
memory








mem










26




Related docs
Other docs by qinmei liao
Arrival RSE Financial Year
Views: 0  |  Downloads: 0
Take chill pill Workshop GO KART RACING
Views: 0  |  Downloads: 0
Abe cough with sputum
Views: 2  |  Downloads: 0
SDPI Healthy Heart Project
Views: 2  |  Downloads: 0
Alternative Trade Adjustment Assistance ATAA
Views: 0  |  Downloads: 0
Improving the Bjorken estimate PHENIX
Views: 0  |  Downloads: 0
Teacher Erase Color Rhyme
Views: 1  |  Downloads: 0
Estimates of District Domestic Product
Views: 4  |  Downloads: 0
By registering with docstoc.com you agree to our
privacy policy

You are almost ready to download!

You are almost ready to download!