
CHARMM-G, a GPU-based MD Simulation Code with PME and Reaction Force Field for Studying Large Membrane Regions
Narayan Ganesan1, Sandeep Patel2, and Michela Taufer1
Computer and Info. Sciences Dept.1
Chemistry and Biochemistry Dept.2
University of Delaware

Outline
Overview of forces in molecular dynamics
Data structures and methodology
PME for long distance electrostatic interactions
Steps involved in PME calculations
Performance and profiling of large membranes
Related work and conclusions

Classical Forces
Bonded: bonds, angles, dihedrals
Non-bonded: Van der Waals; electrostatic (Reaction Field (RF) or PME)

Bond Interactions
Bond forces: act only within pairs of bonded atoms
Angle forces: act only within a triad of atoms
Torsion or dihedral forces: act only within a quartet of atoms

Non-bond Interactions: Van der Waals Potential
Van der Waals or Lennard-Jones potential: decays rapidly with distance

$E_{\mathrm{VDW}} = 4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$

A cutoff of ~10 A accurately captures the effect of the Van der Waals potential (see the sketch below)
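As a concrete illustration of the cutoff, here is a minimal CUDA sketch of a cutoff-limited Lennard-Jones pair evaluation; eps and sigma are assumed to be pre-combined pair parameters, and the names are illustrative, not the actual CHARMM-G kernel:

// Minimal sketch: Lennard-Jones energy and force magnitude with a cutoff.
__device__ float lj_pair(float r2, float eps, float sigma,
                         float cutoff2, float *force_over_r)
{
    if (r2 >= cutoff2) { *force_over_r = 0.0f; return 0.0f; }
    float inv_r2 = 1.0f / r2;
    float s2  = sigma * sigma * inv_r2;   // (sigma/r)^2
    float s6  = s2 * s2 * s2;             // (sigma/r)^6
    float s12 = s6 * s6;                  // (sigma/r)^12
    *force_over_r = 24.0f * eps * (2.0f * s12 - s6) * inv_r2;  // |F|/r
    return 4.0f * eps * (s12 - s6);       // E_VDW
}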

Non-Bond Interactions: Electrostatic Potential
Coulomb potential and force (the force follows an inverse square law):

$U_{\mathrm{coulomb}} = \frac{1}{4\pi\epsilon_0}\frac{q_1 q_2}{|r_1 - r_2|}$

$F_{\mathrm{coulomb}} = \frac{1}{4\pi\epsilon_0}\frac{q_1 q_2}{|r_1 - r_2|^2}$

The potential decays as 1/r with distance
Since 1/r decays rather slowly, the potential can act over long distances
Choosing a cutoff for the electrostatic force/potential causes computational errors and inaccuracy
Our solutions to sum long-distance electrostatic forces:
Reaction Force Field (RF)
Ewald summation / Particle Mesh Ewald (PME)

GPU Implementation: Data Structures
A single thread is assigned to each atom
For each atom a set of lists is maintained (sketched in code below):
Bond list stores the bonds the atom belongs to
Angle list stores the angles the atom belongs to
Dihedral list stores the dihedrals the atom belongs to
Nonbond list stores non-bond interactions with atoms within the cutoff
Example: nonbond list for q5 is (q2, r2), (q6, r6), (q8, r8), (q9, r9)

[Figure: 3x3 grid of charges q1-q9 illustrating the nonbond list of q5]
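A minimal sketch of the per-atom lists described above; the maximum list sizes and field names are illustrative assumptions, not values from the paper:

// Per-atom interaction lists, one thread per atom.
#define MAX_BONDS      4
#define MAX_ANGLES     6
#define MAX_DIHEDRALS  8
#define MAX_NONBOND  512

struct AtomLists {
    int nBonds,     bondIdx[MAX_BONDS];         // bonds this atom belongs to
    int nAngles,    angleIdx[MAX_ANGLES];       // angles this atom belongs to
    int nDihedrals, dihedralIdx[MAX_DIHEDRALS]; // dihedrals this atom belongs to
    int nNonbond,   nonbondIdx[MAX_NONBOND];    // neighbors within the cutoff
};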

MD Simulation
MD simulations are iterative executions of MD steps (see the sketch below)
Each iteration computes forces on each particle due to:
Bonds: Bond List
Angles: Angle List
Dihedrals: Dihedral List
Electrostatic and Van der Waals: Nonbond List
Bond, angle, and dihedral lists are unchanged for each atom throughout the simulation
The nonbond list is updated based on a cutoff buffer
If Ewald summation is used, an additional component is added:
Long distance interaction using the PME method
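A structural sketch of one MD iteration on the GPU; the kernel names are hypothetical placeholders for the kernels described above, not the actual CHARMM-G code:

struct Atom;   // positions, velocities, charges, per-atom lists

__global__ void bondedForcesKernel(Atom *atoms);   // bonds, angles, dihedrals
__global__ void nonbondForcesKernel(Atom *atoms);  // VDW + short-range electrostatics
__global__ void integrateKernel(Atom *atoms, float dt);

void mdStep(Atom *d_atoms, int nBlocks, int nThreads, float dt)
{
    bondedForcesKernel<<<nBlocks, nThreads>>>(d_atoms);
    nonbondForcesKernel<<<nBlocks, nThreads>>>(d_atoms);
    // if Ewald summation is used, add the long-range PME forces here
    integrateKernel<<<nBlocks, nThreads>>>(d_atoms, dt);
    // periodically: rebuild the nonbond list using the cutoff buffer
}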

Ways to Update the Nonbond List
Global neighbor list:
Each thread can iterate through the global list of atoms to build the nonbond list
Cell-based neighbor list:
Divide the domain into equal cells of size = cutoff
Search only in the current cell and the adjacent cells for neighboring atoms
There are 26 adjacent cells and 1 current cell in 3 dimensions
The cell-based list is computationally very efficient but also needs regular cell updates (see the cell-index sketch below)
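A minimal sketch of the cell machinery: mapping an atom position to its cell and visiting the 27 surrounding cells; the names and cell layout are illustrative assumptions:

__device__ int cellIndex(float3 pos, float3 boxMin, float cellSize, int3 nCells)
{
    int cx = (int)((pos.x - boxMin.x) / cellSize);
    int cy = (int)((pos.y - boxMin.y) / cellSize);
    int cz = (int)((pos.z - boxMin.z) / cellSize);
    return (cz * nCells.y + cy) * nCells.x + cx;
}

// Visit the current cell plus its 26 neighbors (with periodic wrapping).
__device__ void visitNeighborCells(int cx, int cy, int cz, int3 nCells)
{
    for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                int nx = (cx + dx + nCells.x) % nCells.x;
                int ny = (cy + dy + nCells.y) % nCells.y;
                int nz = (cz + dz + nCells.z) % nCells.z;
                int neighborCell = (nz * nCells.y + ny) * nCells.x + nx;
                (void)neighborCell;   // in the real kernel: scan this cell's atom list
            }
}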

Cell Updates
A single thread manages a single cell or a set of cells
Each cell is managed through a list of the atoms in the cell, called the CellList
When an atom i moves from Cell A to Cell B, the thread responsible for Cell A updates the list of Cell B via thread-safe integer atomic intrinsics (see the sketch below)
Invalid atoms are removed from the cell lists by the CellClean kernel
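A minimal sketch of the thread-safe update; the CellList layout and maximum size are assumptions:

#define MAX_ATOMS_PER_CELL 64

struct CellList {
    int count;                        // number of valid entries
    int atomIdx[MAX_ATOMS_PER_CELL];  // indices of the atoms in this cell
};

// Append an atom to the destination cell's list with an integer atomic.
__device__ void appendToCell(CellList *cells, int destCell, int atom)
{
    int slot = atomicAdd(&cells[destCell].count, 1);  // returns the old count
    if (slot < MAX_ATOMS_PER_CELL)
        cells[destCell].atomIdx[slot] = atom;
}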

Periodic Boundary Condition

[Figure: central simulation cell with charges q1-q9 surrounded by its periodic images; the region of influence of an atom extends into the images of the cell of interest with edge vectors ax, ay; a minimum-image sketch follows below]
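A minimal sketch of the minimum-image convention for an orthorhombic periodic box; box holds the edge lengths and the names are illustrative:

// Wrap a displacement vector d into the primary image of the periodic box.
__device__ float3 minimumImage(float3 d, float3 box)
{
    d.x -= box.x * rintf(d.x / box.x);
    d.y -= box.y * rintf(d.y / box.y);
    d.z -= box.z * rintf(d.z / box.z);
    return d;
}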

Reaction Force Field
Any molecule is surrounded by a spherical cavity of finite radius
Within the radius, electrostatic interactions are calculated explicitly
Outside the cavity, the system is treated as a dielectric continuum
This model allows the replacement of the infinite Coulomb sum by a finite sum: the Coulomb potential plus the reaction field correction

$U_c = \frac{1}{4\pi\epsilon_0}\sum_{i<j} q_i q_j \left[\frac{1}{r_{ij}} + \frac{B_0\, r_{ij}^2}{2 R_c^3}\right]$

where the second term is the reaction field correction and Rc is the radius of the cavity (see the pair-energy sketch below)
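A minimal sketch of the reaction-field pair energy inside the cavity; coulombConst stands for 1/(4*pi*eps0) in the code's unit system, and the names are assumptions:

__device__ float rf_pair_energy(float qi, float qj, float r,
                                float B0, float Rc, float coulombConst)
{
    if (r >= Rc) return 0.0f;   // outside the cavity: treated as a continuum
    float rc3 = Rc * Rc * Rc;
    return coulombConst * qi * qj * (1.0f / r + B0 * r * r / (2.0f * rc3));
}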

Ewald Summation Method (I)
Proposed by Paul Peter Ewald in 1921 for crystallographic systems
Has found applications in molecular, astrophysical, and crystallographic systems
Used to sum inverse-distance potentials over long distances efficiently, e.g., gravity and the Coulomb potential
Started to be used for numerical simulations in the late 1970s
O(N log N) instead of O(N^2)

Ewald Summation Method
Three contributions to the total energy, depending on the distance of the interaction:
Direct space (Edir)
Reciprocal space (Erec)
Self energy (Eself)

Ewald Summation Method (II)
Divide interactions into short range (direct space) and long range (reciprocal space)

Short range, computed in direct space using the nonbond list:

$E_{dir} = \frac{1}{4\pi\epsilon_0}\sum_{i<j}\frac{q_i q_j\,\mathrm{erfc}(\beta |r_i - r_j|)}{|r_i - r_j|}$

Long range, computed in Fourier (reciprocal) space:

$E_{rec} = \frac{1}{2\pi V}\sum_{m \neq 0}\frac{\exp(-\pi^2 m^2/\beta^2)}{m^2}\, S(m)\, S(-m)$

V: volume of the simulation region; S(m): structure factor (the direct-space pair term is sketched below)
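A minimal sketch of the direct-space pair term accumulated over the nonbond list; beta is the Ewald screening parameter, coulombConst stands for 1/(4*pi*eps0), and the names are assumptions:

__device__ float ewald_direct_pair(float qi, float qj, float r,
                                   float beta, float coulombConst)
{
    // erfcf is the single-precision complementary error function
    return coulombConst * qi * qj * erfcf(beta * r) / r;
}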

Steps in SPME
1. Put the charges on the grid
2. FFT of the charge grid
3. Multiply with the structure constants; the convolution yields the potential at the grid points, which have to be summed
4. FFT back
5. Compute the force on atom i by calculating $-\partial U/\partial r_i$

Charge Spreading
Each charge is spread over 4x4x4 = 64 grid points in 3-D
Grid spacing of 1 A, with spreading by a cardinal B-spline of order 4 (Essmann et al., J. Chem. Phys. 1995)
Create a 3-dimensional charge matrix Q: a mesh-based charge density
Approximation by a sum of charges at each grid point
Multiple charges can influence a single lattice point

$Q(k_1, k_2, k_3) = \sum_{i=1}^{n} q_i\, M_4(x_i - k_1)\, M_4(y_i - k_2)\, M_4(z_i - k_3)$

xi, yi, zi: position of the i-th charge; k1, k2, k3: indices of the lattice point (M4 is sketched below)
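A minimal sketch of the order-4 cardinal B-spline M4 used in the sum above (the standard piecewise cubic, nonzero on 0 <= u <= 4):

__device__ float M4(float u)
{
    if (u <= 0.0f || u >= 4.0f) return 0.0f;
    if (u < 1.0f) return u * u * u / 6.0f;
    if (u < 2.0f) return (-3.0f*u*u*u + 12.0f*u*u - 12.0f*u + 4.0f) / 6.0f;
    if (u < 3.0f) return ( 3.0f*u*u*u - 24.0f*u*u + 60.0f*u - 44.0f) / 6.0f;
    float v = 4.0f - u;
    return v * v * v / 6.0f;
}

// Contribution of charge q_i to lattice point (k1, k2, k3):
//   q_i * M4(x_i - k1) * M4(y_i - k2) * M4(z_i - k3)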

Cardinal B-Spline of Order 4
The B-spline has a region of influence of 4 units
Each unit = 1 A
During charge spreading the B-spline has an impact on the neighboring 4x4x4 cells in 3 dimensions

CPU vs. GPU Charge Spreading
Charge spreading by a cardinal B-spline of order 4 over the unit-cell charges:
CPU implementation is straightforward
Cost: Natoms x 4 x 4 x 4 grid updates
GPU implementation is hard to parallelize
Can lead to race conditions: needs floating-point atomic writes
The current version of CUDA supports atomic writes for integers only
Charges need to be converted to fixed point in order to use this functionality (see the sketch below)
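A minimal sketch of the fixed-point conversion used to emulate floating-point accumulation with integer atomics; the scale factor is an illustrative assumption:

#define FIXED_POINT_SCALE (1 << 20)   // number of fractional bits kept

// Accumulate a charge contribution into the fixed-point grid atomically.
__device__ void atomicAddCharge(int *gridFixed, int gridIdx, float charge)
{
    int fixedVal = (int)rintf(charge * FIXED_POINT_SCALE);
    atomicAdd(&gridFixed[gridIdx], fixedVal);
}

// After spreading: gridFloat[i] = (float)gridFixed[i] / FIXED_POINT_SCALE;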

CPU vs. GPU Charge Spreading
CPU: spreading of charges
GPU: gathering of charges by a cardinal B-spline of order 4, with each thread assigned to a lattice point
Charge spreading on the GPU can be parallelized easily over the grid points instead of the atoms
Each thread works on a single grid point or a set of grid points
Needs O(ax*ay*az) threads, with each thread parsing through all the atoms within its 4x4x4 neighborhood: O(N) overall (a gather kernel is sketched below)
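A minimal sketch of the gather-style kernel, one thread per lattice point, reusing the M4 spline sketched earlier; the neighbor-list layout and names are assumptions:

__global__ void gatherChargesKernel(const float4 *atoms,      // x, y, z in grid units, w = charge
                                    const int *neighborList,  // per-point atom indices
                                    const int *neighborCount,
                                    int maxNeighbors,
                                    float *chargeGrid,
                                    int3 nGrid)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nGrid.x * nGrid.y * nGrid.z) return;

    // Lattice coordinates of this grid point (grid spacing of 1 A)
    int k1 = p % nGrid.x;
    int k2 = (p / nGrid.x) % nGrid.y;
    int k3 = p / (nGrid.x * nGrid.y);

    float q = 0.0f;
    for (int n = 0; n < neighborCount[p]; ++n) {
        float4 a = atoms[neighborList[p * maxNeighbors + n]];
        q += a.w * M4(a.x - k1) * M4(a.y - k2) * M4(a.z - k3);
    }
    chargeGrid[p] = q;   // this thread owns the point, so no atomics are needed
}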

GPU Charge Spreading (I)
Each lattice point maintains a list of atoms within its 4x4x4 neighborhood for charge gathering
The effect of charges 1, 2, 3 is gathered at the lattice point
Neighbor list of the point: (q1, r1), (q2, r2), (q3, r3)

GPU Charge Spreading (II)
When a charge moves, several lattice points need to be updated:
The charge is added to the neighbor lists of the lattice points in dark gray
The charge is removed from the neighbor lists of the lattice points in light gray
Lattice points in white are not affected
Since there are equal numbers of light gray and dark gray lattice points, the threads for the light gray lattice points update the lists of the dark gray lattice points in a 1-to-1 mapping

GPU Charge Spreading (III)
When a single lattice point is updated by multiple threads, thread-safe integer atomic intrinsics are used to update the cell lists

Fast Fourier Transform
CUFFT provides library functions to compute the FFT and the inverse FFT (see the sketch below)
The 3D FFT is implemented as a series of 1D FFTs and transpositions
cufftExec calls can be optimized by choosing proper FFT dimensions (powers of 2)
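A minimal sketch of the CUFFT calls for the 3D transform of the charge grid; NX, NY, NZ are the grid dimensions, and the surrounding setup is omitted:

#include <cufft.h>

void fftChargeGrid(cufftComplex *d_grid, int NX, int NY, int NZ)
{
    cufftHandle plan;
    cufftPlan3d(&plan, NX, NY, NZ, CUFFT_C2C);

    // Forward FFT of the charge grid (in place)
    cufftExecC2C(plan, d_grid, d_grid, CUFFT_FORWARD);

    // ... multiply by the reciprocal-space structure constants here ...

    // Inverse FFT back to real space (CUFFT leaves the result unnormalized)
    cufftExecC2C(plan, d_grid, d_grid, CUFFT_INVERSE);

    cufftDestroy(plan);
}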

Scientific Challenge
About one-third of the human genome codes for membrane-bound proteins
Pharmaceuticals target membrane-bound protein receptors, e.g., G-protein coupled receptors
These systems are important to human health and to the understanding of dysfunction
State-of-the-art simulations only consider small regions (or patches) of physiological membranes
The heterogeneity of the membrane spans length scales much larger than those included in these smaller model systems
Our goal: apply large-scale GPU-enabled computations to the study of large membrane regions

DMPC
DiMyristoylPhosphatidylCholine (DMPC) lipid bilayers
Small system: 17,004 atoms, 46.8 A x 46.8 A x 76.0 A
Large system: 68,484 atoms, 93.6 A x 93.6 A x 152.0 A
Explicit solvent, i.e., water
[Figure: large DMPC membrane system in explicit water, roughly 92 A x 92 A x 152 A]

Performance
Small membrane (17,004 atoms)
Large membrane (68,484 atoms)
Case studies: with global neighbor list and RF (I), with cell-based list and RF (II), with neighbor list

Kernel Profiling (I)


Large membrane, RF method
Global neighbor list

Cell-based neighbor list

Kernel Profiling (II)


Large membrane, PME method
Global neighbor list

Cell-based neighbor list

Related Work
Other MD codes including the PME method:
M. J. Harvey and G. De Fabritiis, J. Chem. Theory and Comp., 2009
Our implementation is different in terms of:
Charge spreading algorithm
Force field methods, including RF

Conclusions and Future Work
CHARMM-G is a flexible MD code based on the CHARMM force field, integrating:
Ewald summation
Reaction force field
The code supports explicit solvent representations and enables fast simulations of large membrane regions
Improvements of the CUDA FFT will further improve the performance presented in the paper
Future work includes:
Code optimizations and parallelization across multiple GPUs
Scientific characterization of large membranes

Acknowledgements
GCL Members:
Trilce Estrada
Abel Licon
Lifan Xu
Boyu Zhang
Narayan Ganesan
Philip Saponaro
Maria Ruiz
Michela Taufer
Collaborators:
Sandeep Patel, Brad A. Bauer, Joseph E. Davis (Dept. of Chemistry, UD)
Related work:
Bauer et al., JCC 2010 (In Press)
Davis et al., BICoB 2009
Sponsors:
[Photo: GCL members in Spring 2010]
More questions: taufer@udel.edu

