
CHARMM-G, a GPU-based MD Simulation Code with PME and Reaction Force Field for Studying Large Membrane Regions
Narayan Ganesan1, Sandeep Patel2, and Michela Taufer1
Computer and Info. Sciences Dept.1
Chemistry and Biochemistry Dept.2
University of Delaware

Outline
Overview of forces in molecular dynamics
Data structures and methodology
PME for long distance electrostatic interactions
Steps involved in PME calculations
Performance and profiling of large membranes
Related work and conclusions

Classical Forces
Bonded: bonds, angles, dihedrals
Non-bonded: Van der Waals; electrostatic (Reaction Field (RF) or PME)

Bond Interactions
Bond forces: act only within pairs of bonded atoms
Angle forces: act only within a triad of atoms
Torsion or dihedral forces: act only within a quartet of atoms

Non-bond Interactions: Van der Waals Potential
Van der Waals or Lennard-Jones potential: decays rapidly with distance

$E_{\mathrm{VDW}} = 4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$

A cutoff of ~10 A accurately captures the effect of the Van der Waals potential (see the sketch below)
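As a concrete illustration of the cutoff, here is a minimal CUDA sketch of a cutoff-limited Lennard-Jones pair evaluation; eps and sigma are assumed to be pre-combined pair parameters, and the names are illustrative, not the actual CHARMM-G kernel:

// Minimal sketch: Lennard-Jones energy and force magnitude with a cutoff.
__device__ float lj_pair(float r2, float eps, float sigma,
                         float cutoff2, float *force_over_r)
{
    if (r2 >= cutoff2) { *force_over_r = 0.0f; return 0.0f; }
    float inv_r2 = 1.0f / r2;
    float s2  = sigma * sigma * inv_r2;   // (sigma/r)^2
    float s6  = s2 * s2 * s2;             // (sigma/r)^6
    float s12 = s6 * s6;                  // (sigma/r)^12
    *force_over_r = 24.0f * eps * (2.0f * s12 - s6) * inv_r2;  // |F|/r
    return 4.0f * eps * (s12 - s6);       // E_VDW
}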

Non-Bond Interactions: Electrostatic Potential
Coulomb potential and force (the force follows an inverse square law):

$U_{\mathrm{coulomb}} = \frac{1}{4\pi\epsilon_0}\frac{q_1 q_2}{|r_1 - r_2|}$

$F_{\mathrm{coulomb}} = \frac{1}{4\pi\epsilon_0}\frac{q_1 q_2}{|r_1 - r_2|^2}$

The potential decays as 1/r with distance
Since 1/r decays rather slowly, the potential can act over long distances
Choosing a cutoff for the electrostatic force/potential causes computational errors and inaccuracy
Our solutions to sum long-distance electrostatic forces:
Reaction Force Field (RF)
Ewald summation / Particle Mesh Ewald (PME)

GPU Implementation: Data Structures
A single thread is assigned to each atom
For each atom a set of lists is maintained (sketched in code below):
Bond list stores the bonds the atom belongs to
Angle list stores the angles the atom belongs to
Dihedral list stores the dihedrals the atom belongs to
Nonbond list stores non-bond interactions with atoms within the cutoff
Example: nonbond list for q5 is (q2, r2), (q6, r6), (q8, r8), (q9, r9)

[Figure: 3x3 grid of charges q1-q9 illustrating the nonbond list of q5]
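A minimal sketch of the per-atom lists described above; the maximum list sizes and field names are illustrative assumptions, not values from the paper:

// Per-atom interaction lists, one thread per atom.
#define MAX_BONDS      4
#define MAX_ANGLES     6
#define MAX_DIHEDRALS  8
#define MAX_NONBOND  512

struct AtomLists {
    int nBonds,     bondIdx[MAX_BONDS];         // bonds this atom belongs to
    int nAngles,    angleIdx[MAX_ANGLES];       // angles this atom belongs to
    int nDihedrals, dihedralIdx[MAX_DIHEDRALS]; // dihedrals this atom belongs to
    int nNonbond,   nonbondIdx[MAX_NONBOND];    // neighbors within the cutoff
};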

MD Simulation
MD simulations are iterative executions of MD steps (see the sketch below)
Each iteration computes forces on each particle due to:
Bonds: Bond List
Angles: Angle List
Dihedrals: Dihedral List
Electrostatic and Van der Waals: Nonbond List
Bond, angle, and dihedral lists are unchanged for each atom throughout the simulation
The nonbond list is updated based on a cutoff buffer
If Ewald summation is used, an additional component is added:
Long distance interaction using the PME method
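A structural sketch of one MD iteration on the GPU; the kernel names are hypothetical placeholders for the kernels described above, not the actual CHARMM-G code:

struct Atom;   // positions, velocities, charges, per-atom lists

__global__ void bondedForcesKernel(Atom *atoms);   // bonds, angles, dihedrals
__global__ void nonbondForcesKernel(Atom *atoms);  // VDW + short-range electrostatics
__global__ void integrateKernel(Atom *atoms, float dt);

void mdStep(Atom *d_atoms, int nBlocks, int nThreads, float dt)
{
    bondedForcesKernel<<<nBlocks, nThreads>>>(d_atoms);
    nonbondForcesKernel<<<nBlocks, nThreads>>>(d_atoms);
    // if Ewald summation is used, add the long-range PME forces here
    integrateKernel<<<nBlocks, nThreads>>>(d_atoms, dt);
    // periodically: rebuild the nonbond list using the cutoff buffer
}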

Ways to Update the Nonbond List
Global neighbor list:
Each thread can iterate through the global list of atoms to build the nonbond list
Cell-based neighbor list:
Divide the domain into equal cells of size = cutoff
Search only in the current cell and the adjacent cells for neighboring atoms
There are 26 adjacent cells and 1 current cell in 3 dimensions
The cell-based list is computationally very efficient but also needs regular cell updates (see the cell-index sketch below)
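A minimal sketch of the cell machinery: mapping an atom position to its cell and visiting the 27 surrounding cells; the names and cell layout are illustrative assumptions:

__device__ int cellIndex(float3 pos, float3 boxMin, float cellSize, int3 nCells)
{
    int cx = (int)((pos.x - boxMin.x) / cellSize);
    int cy = (int)((pos.y - boxMin.y) / cellSize);
    int cz = (int)((pos.z - boxMin.z) / cellSize);
    return (cz * nCells.y + cy) * nCells.x + cx;
}

// Visit the current cell plus its 26 neighbors (with periodic wrapping).
__device__ void visitNeighborCells(int cx, int cy, int cz, int3 nCells)
{
    for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                int nx = (cx + dx + nCells.x) % nCells.x;
                int ny = (cy + dy + nCells.y) % nCells.y;
                int nz = (cz + dz + nCells.z) % nCells.z;
                int neighborCell = (nz * nCells.y + ny) * nCells.x + nx;
                (void)neighborCell;   // in the real kernel: scan this cell's atom list
            }
}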

Cell Updates
A single thread manages a single cell or a set of cells
Each cell is managed through a list of the atoms in the cell, called the CellList
When an atom i moves from Cell A to Cell B, the thread responsible for Cell A updates the list of Cell B via thread-safe integer atomic intrinsics (see the sketch below)
Invalid atoms are removed from the cell lists by the CellClean kernel
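A minimal sketch of the thread-safe update; the CellList layout and maximum size are assumptions:

#define MAX_ATOMS_PER_CELL 64

struct CellList {
    int count;                        // number of valid entries
    int atomIdx[MAX_ATOMS_PER_CELL];  // indices of the atoms in this cell
};

// Append an atom to the destination cell's list with an integer atomic.
__device__ void appendToCell(CellList *cells, int destCell, int atom)
{
    int slot = atomicAdd(&cells[destCell].count, 1);  // returns the old count
    if (slot < MAX_ATOMS_PER_CELL)
        cells[destCell].atomIdx[slot] = atom;
}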

Periodic Boundary Condition

[Figure: central simulation cell with charges q1-q9 surrounded by its periodic images; the region of influence of an atom extends into the images of the cell of interest with edge vectors ax, ay; a minimum-image sketch follows below]
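A minimal sketch of the minimum-image convention for an orthorhombic periodic box; box holds the edge lengths and the names are illustrative:

// Wrap a displacement vector d into the primary image of the periodic box.
__device__ float3 minimumImage(float3 d, float3 box)
{
    d.x -= box.x * rintf(d.x / box.x);
    d.y -= box.y * rintf(d.y / box.y);
    d.z -= box.z * rintf(d.z / box.z);
    return d;
}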

Reaction Force Field
Any molecule is surrounded by a spherical cavity of finite radius
Within the radius, electrostatic interactions are calculated explicitly
Outside the cavity, the system is treated as a dielectric continuum
This model allows the replacement of the infinite Coulomb sum by a finite sum: the Coulomb potential plus the reaction field correction

$U_c = \frac{1}{4\pi\epsilon_0}\sum_{i<j} q_i q_j \left[\frac{1}{r_{ij}} + \frac{B_0\, r_{ij}^2}{2 R_c^3}\right]$

where the second term is the reaction field correction and Rc is the radius of the cavity (see the pair-energy sketch below)
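A minimal sketch of the reaction-field pair energy inside the cavity; coulombConst stands for 1/(4*pi*eps0) in the code's unit system, and the names are assumptions:

__device__ float rf_pair_energy(float qi, float qj, float r,
                                float B0, float Rc, float coulombConst)
{
    if (r >= Rc) return 0.0f;   // outside the cavity: treated as a continuum
    float rc3 = Rc * Rc * Rc;
    return coulombConst * qi * qj * (1.0f / r + B0 * r * r / (2.0f * rc3));
}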

Ewald Summation Method (I)
Proposed by Paul Peter Ewald in 1921 for crystallographic systems
Has found applications in molecular, astrophysical, and crystallographic systems
Used to sum inverse-distance potentials over long distances efficiently, e.g., gravity and the Coulomb potential
Started to be used for numerical simulations in the late 1970s
O(N log N) instead of O(N^2)

Ewald Summation Method
Three contributions to the total energy, depending on the distance of the interaction:
Direct space (Edir)
Reciprocal space (Erec)
Self energy (Eself)

Ewald Summation Method (II)
Divide interactions into short range (direct space) and long range (reciprocal space)

Short range, computed in direct space using the nonbond list:

$E_{dir} = \frac{1}{4\pi\epsilon_0}\sum_{i<j}\frac{q_i q_j\,\mathrm{erfc}(\beta |r_i - r_j|)}{|r_i - r_j|}$

Long range, computed in Fourier (reciprocal) space:

$E_{rec} = \frac{1}{2\pi V}\sum_{m \neq 0}\frac{\exp(-\pi^2 m^2/\beta^2)}{m^2}\, S(m)\, S(-m)$

V: volume of the simulation region; S(m): structure factor (the direct-space pair term is sketched below)
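A minimal sketch of the direct-space pair term accumulated over the nonbond list; beta is the Ewald screening parameter, coulombConst stands for 1/(4*pi*eps0), and the names are assumptions:

__device__ float ewald_direct_pair(float qi, float qj, float r,
                                   float beta, float coulombConst)
{
    // erfcf is the single-precision complementary error function
    return coulombConst * qi * qj * erfcf(beta * r) / r;
}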

Steps in SPME
1. Put the charges on the grid
2. FFT of the charge grid
3. Multiply with the structure constants; the convolution yields the potential at the grid points, which have to be summed
4. FFT back
5. Compute the force on atom i by calculating $-\partial U/\partial r_i$

Charge Spreading
Each charge is spread over 4x4x4 = 64 grid points in 3-D
Grid spacing of 1 A, with spreading by a cardinal B-spline of order 4 (Essmann et al., J. Chem. Phys. 1995)
Create a 3-dimensional charge matrix Q: a mesh-based charge density
Approximation by a sum of charges at each grid point
Multiple charges can influence a single lattice point

$Q(k_1, k_2, k_3) = \sum_{i=1}^{n} q_i\, M_4(x_i - k_1)\, M_4(y_i - k_2)\, M_4(z_i - k_3)$

xi, yi, zi: position of the i-th charge; k1, k2, k3: indices of the lattice point (M4 is sketched below)
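A minimal sketch of the order-4 cardinal B-spline M4 used in the sum above (the standard piecewise cubic, nonzero on 0 <= u <= 4):

__device__ float M4(float u)
{
    if (u <= 0.0f || u >= 4.0f) return 0.0f;
    if (u < 1.0f) return u * u * u / 6.0f;
    if (u < 2.0f) return (-3.0f*u*u*u + 12.0f*u*u - 12.0f*u + 4.0f) / 6.0f;
    if (u < 3.0f) return ( 3.0f*u*u*u - 24.0f*u*u + 60.0f*u - 44.0f) / 6.0f;
    float v = 4.0f - u;
    return v * v * v / 6.0f;
}

// Contribution of charge q_i to lattice point (k1, k2, k3):
//   q_i * M4(x_i - k1) * M4(y_i - k2) * M4(z_i - k3)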

Cardinal B-Spline of Order 4
The B-spline has a region of influence of 4 units
Each unit = 1 A
During charge spreading the B-spline has an impact on the neighboring 4x4x4 cells in 3 dimensions

CPU vs. GPU Charge Spreading
Charge spreading by a cardinal B-spline of order 4 over the unit-cell charges:
CPU implementation is straightforward
Cost: Natoms x 4 x 4 x 4 grid updates
GPU implementation is hard to parallelize
Can lead to race conditions: needs floating-point atomic writes
The current version of CUDA supports atomic writes for integers only
Charges need to be converted to fixed point in order to use this functionality (see the sketch below)
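A minimal sketch of the fixed-point conversion used to emulate floating-point accumulation with integer atomics; the scale factor is an illustrative assumption:

#define FIXED_POINT_SCALE (1 << 20)   // number of fractional bits kept

// Accumulate a charge contribution into the fixed-point grid atomically.
__device__ void atomicAddCharge(int *gridFixed, int gridIdx, float charge)
{
    int fixedVal = (int)rintf(charge * FIXED_POINT_SCALE);
    atomicAdd(&gridFixed[gridIdx], fixedVal);
}

// After spreading: gridFloat[i] = (float)gridFixed[i] / FIXED_POINT_SCALE;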

CPU vs. GPU Charge Spreading
CPU: spreading of charges
GPU: gathering of charges by a cardinal B-spline of order 4, with each thread assigned to a lattice point
Charge spreading on the GPU can be parallelized easily over the grid points instead of the atoms
Each thread works on a single grid point or a set of grid points
Needs O(ax*ay*az) threads, with each thread parsing through all the atoms within its 4x4x4 neighborhood: O(N) overall (a gather kernel is sketched below)
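A minimal sketch of the gather-style kernel, one thread per lattice point, reusing the M4 spline sketched earlier; the neighbor-list layout and names are assumptions:

__global__ void gatherChargesKernel(const float4 *atoms,      // x, y, z in grid units, w = charge
                                    const int *neighborList,  // per-point atom indices
                                    const int *neighborCount,
                                    int maxNeighbors,
                                    float *chargeGrid,
                                    int3 nGrid)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nGrid.x * nGrid.y * nGrid.z) return;

    // Lattice coordinates of this grid point (grid spacing of 1 A)
    int k1 = p % nGrid.x;
    int k2 = (p / nGrid.x) % nGrid.y;
    int k3 = p / (nGrid.x * nGrid.y);

    float q = 0.0f;
    for (int n = 0; n < neighborCount[p]; ++n) {
        float4 a = atoms[neighborList[p * maxNeighbors + n]];
        q += a.w * M4(a.x - k1) * M4(a.y - k2) * M4(a.z - k3);
    }
    chargeGrid[p] = q;   // this thread owns the point, so no atomics are needed
}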

GPU Charge Spreading (I)
Each lattice point maintains a list of atoms within its 4x4x4 neighborhood for charge gathering
The effect of charges 1, 2, 3 is gathered at the lattice point
Neighbor list of the point: (q1, r1), (q2, r2), (q3, r3)

GPU Charge Spreading (II)
When a charge moves, several lattice points need to be updated:
The charge is added to the neighbor lists of the lattice points in dark gray
The charge is removed from the neighbor lists of the lattice points in light gray
Lattice points in white are not affected
Since there are equal numbers of light gray and dark gray lattice points, the threads for the light gray lattice points update the lists of the dark gray lattice points in a 1-to-1 mapping

GPU Charge Spreading (III)
When a single lattice point is updated by multiple threads, thread-safe integer atomic intrinsics are used to update the cell lists

Fast Fourier Transform
CUFFT provides library functions to compute the FFT and the inverse FFT (see the sketch below)
The 3D FFT is implemented as a series of 1D FFTs and transpositions
cufftExec calls can be optimized by choosing proper FFT dimensions (powers of 2)
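A minimal sketch of the CUFFT calls for the 3D transform of the charge grid; NX, NY, NZ are the grid dimensions, and the surrounding setup is omitted:

#include <cufft.h>

void fftChargeGrid(cufftComplex *d_grid, int NX, int NY, int NZ)
{
    cufftHandle plan;
    cufftPlan3d(&plan, NX, NY, NZ, CUFFT_C2C);

    // Forward FFT of the charge grid (in place)
    cufftExecC2C(plan, d_grid, d_grid, CUFFT_FORWARD);

    // ... multiply by the reciprocal-space structure constants here ...

    // Inverse FFT back to real space (CUFFT leaves the result unnormalized)
    cufftExecC2C(plan, d_grid, d_grid, CUFFT_INVERSE);

    cufftDestroy(plan);
}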

Scientific Challenge
About one-third of the human genome codes for membrane-bound proteins
Pharmaceuticals target membrane-bound protein receptors, e.g., G-protein coupled receptors
These systems are important to human health and to the understanding of dysfunction
State-of-the-art simulations only consider small regions (or patches) of physiological membranes
The heterogeneity of the membrane spans length scales much larger than those included in these smaller model systems
Our goal: apply large-scale GPU-enabled computations to the study of large membrane regions

DMPC
DiMyristoylPhosphatidylCholine (DMPC) lipid bilayers
Small system: 17,004 atoms, 46.8 A x 46.8 A x 76.0 A
Large system: 68,484 atoms, 93.6 A x 93.6 A x 152.0 A
Explicit solvent, i.e., water
[Figure: large DMPC membrane system in explicit water, roughly 92 A x 92 A x 152 A]

Performance
Small membrane (17,004 atoms)
Large membrane (68,484 atoms)
Case studies: with global neighbor list and RF (I), with cell-based list and RF (II), with neighbor list

Kernel Profiling (I)


Large membrane, RF method
Global neighbor list

Cell-based neighbor list

Kernel Profiling (II)


Large membrane, PME method
Global neighbor list

Cell-based neighbor list

Related Work
Other MD codes including the PME method:
M. J. Harvey and G. De Fabritiis, J. Chem. Theory and Comp., 2009
Our implementation is different in terms of:
Charge spreading algorithm
Force field methods, including RF

Conclusions and Future Work
CHARMM-G is a flexible MD code based on the CHARMM force field, integrating:
Ewald summation
Reaction force field
The code supports explicit solvent representations and enables fast simulations of large membrane regions
Improvements of the CUDA FFT will further improve the performance presented in the paper
Future work includes:
Code optimizations and parallelization across multiple GPUs
Scientific characterization of large membranes

Acknowledgements
GCL Members:
Trilce Estrada
Abel Licon
Lifan Xu
Boyu Zhang
Narayan Ganesan
Philip Saponaro
Maria Ruiz
Michela Taufer
Collaborators:
Sandeep Patel, Brad A. Bauer, Joseph E. Davis (Dept. of Chemistry, UD)
Related work:
Bauer et al., JCC 2010 (In Press)
Davis et al., BICoB 2009
Sponsors:
[Photo: GCL members in Spring 2010]
More questions: taufer@udel.edu

