One of the principal tools in the theoretical study of biological molecules is the method of
molecular dynamics (MD) simulations. This computational method calculates the
time-dependent behavior of a molecular system. MD simulations have provided detailed
information on the fluctuations and conformational changes of proteins and nucleic
acids. These methods are now routinely used to investigate the structure, dynamics and
thermodynamics of biological molecules and their complexes. They are also used in the
determination of structures from X-ray crystallography and from NMR experiments.
Biological molecules exhibit motions over a wide range of length and time scales:
• Local Motions (0.01 to 5 Å, $10^{-15}$ to $10^{-1}$ s)
o Atomic fluctuations
o Sidechain motions
o Loop motions
• Rigid Body Motions (1 to 10 Å, $10^{-9}$ to 1 s)
o Helix motions
o Domain motions (hinge bending)
o Subunit motions
• Large-Scale Motions (> 5 Å, $10^{-7}$ to $10^{4}$ s)
o Helix-coil transitions
o Dissociation/Association
o Folding and Unfolding
This course provides an overview of the theoretical foundations of classical molecular dynamics
simulations, discusses some practical aspects of the method, and presents several specific
applications within the framework of the CHARMM program. Although the applications are
presented in the framework of CHARMM, the concepts are general and are applied by a
number of different molecular dynamics simulation programs. The CHARMM program is a
research program developed at Harvard University for the energy minimization and dynamics
simulation of proteins, nucleic acids and lipids in vacuum, solution or crystal environments
(Harvard CHARMM Web Page http://yuri.harvard.edu/).
Section I of this course will focus on the fundamental theory followed by a brief
discussion of classical mechanics. In section II, the potential energy function and some
related topics will be presented. Section III will discuss some practical aspects of
molecular dynamics simulations and some basic analysis. The remaining sections will
present the CHARMM program and provide some tutorials to introduce the user to the
program. This course will concentrate on the classical simulation methods (i.e., the most
common) that have contributed significantly to our understanding of biological systems.
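Molecular dynamics simulations permit the study of complex, dynamic processes that occur in
biological systems. These include, for example,
• Protein stability
• Conformational changes
• Protein folding
• Molecular recognition: proteins, DNA, membranes, complexes
• Ion transport in biological systems
and provide the means to carry out the following studies,
• Drug design
• Structure determination: X-ray and NMR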
Historical Background
The molecular dynamics method was first introduced by Alder and Wainwright in the
late 1950's (Alder and Wainwright, 1957,1959) to study the interactions of hard spheres.
Many important insights concerning the behavior of simple liquids emerged from their
studies. The next major advance was in 1964, when Rahman carried out the first
simulation using a realistic potential for liquid argon (Rahman, 1964). The first molecular
dynamics simulation of a realistic system was done by Rahman and Stillinger in their
simulation of liquid water in 1974 (Stillinger and Rahman, 1974). The first protein
simulations appeared in 1977 with the simulation of the bovine pancreatic trypsin
inhibitor (BPTI) (McCammon, et al., 1977). Today in the literature, one routinely finds
molecular dynamics simulations of solvated proteins, protein-DNA complexes as well as
lipid systems addressing a variety of issues including the thermodynamics of ligand
binding and the folding of small proteins. The number of simulation techniques has
greatly expanded; there now exist many specialized techniques for particular problems,
including mixed quantum mechanical-classical simulations, which are being employed to
study enzymatic reactions in the context of the full protein. Molecular dynamics
simulation techniques are also widely used in experimental procedures such as X-ray
crystallography and NMR structure determination.
References
Alder, B. J. and Wainwright, T. E. J. Chem. Phys. 27, 1208 (1957)
Rahman, A. Phys. Rev. A136, 405 (1964)
McCammon, J. A., Gelin, B. R., and Karplus, M. Nature (Lond.) 267, 585 (1977)
3. STATISTICAL MECHANICS
INTRODUCTION TO STATISTICAL MECHANICS:
Reference Textbooks on Statistical Mechanics
R. E. Wilde and S. Singh, Statistical Mechanics, Fundamentals
and Modern Applications (John Wiley & Sons, Inc, New York,
1998)
Statistical mechanics is the branch of physical sciences that studies macroscopic systems
from a molecular point of view. The goal is to understand and to predict macroscopic
phenomena from the properties of the individual molecules making up the system. The
system could range from a collection of solvent molecules to a solvated protein-DNA
complex. In order to connect the macroscopic system to the microscopic system,
time-independent statistical averages are often introduced. We start this discussion by
introducing a few definitions.
Definitions
The thermodynamic state of a system is usually defined by a small set of parameters, for
example, the temperature, T, the pressure, P, and the number of particles, N. Other
thermodynamic properties may be derived from the equations of state and other
fundamental thermodynamic equations.
The mechanical or microscopic state of a system is defined by the atomic positions, q, and
momenta, p; these can also be considered as coordinates in a multidimensional space
called phase space. For a system of N particles, this space has 6N dimensions. A single
point in phase space, denoted by Γ , describes the state of the system. An ensemble is a
collection of points in phase space satisfying the conditions of a particular
thermodynamic state. A molecular dynamics simulation generates a sequence of points
in phase space as a function of time; these points belong to the same ensemble, and they
correspond to the different conformations of the system and their respective momenta.
Several different ensembles are described below.
An ensemble is a collection of all possible systems which have different microscopic
states but have an identical macroscopic or thermodynamic state.
There exist different ensembles with different characteristics.
• Microcanonical ensemble (NVE): The thermodynamic state is
characterized by a fixed number of atoms, N, a fixed volume, V, and
a fixed energy, E. This corresponds to an isolated system.
• Canonical ensemble (NVT): This is a collection of all systems
whose thermodynamic state is characterized by a fixed number of
atoms, N, a fixed volume, V, and a fixed temperature, T.
• Isobaric-isothermal ensemble (NPT): This ensemble is
characterized by a fixed number of atoms, N, a fixed pressure, P,
and a fixed temperature, T.
• Grand canonical ensemble (µVT): The thermodynamic state
for this ensemble is characterized by a fixed chemical potential, µ,
a fixed volume, V, and a fixed temperature, T.
CALCULATING AVERAGES FROM A MOLECULAR DYNAMICS SIMULATION
An experiment is usually made on a macroscopic sample that contains an extremely
large number of atoms or molecules sampling an enormous number of conformations. In
statistical mechanics, averages corresponding to experimental observables are defined in
terms of ensemble averages; one justification for this is that there has been good
agreement with experiment. An ensemble average is an average taken over a large number
of replicas of the system considered simultaneously.
In statistical mechanics, average values are defined as ensemble averages.
The ensemble average is given by
$$\langle A \rangle_{ensemble} = \iint dp^N \, dr^N \, A(p^N, r^N) \, \rho(p^N, r^N)$$
where $A(p^N, r^N)$
is the observable of interest and it is expressed as a function of the momenta, p, and the
positions, r, of the system. The integration is over all possible values of r and p.
The probability density of the ensemble is given by
$$\rho(p^N, r^N) = \frac{1}{Q} \exp\left[ -H(p^N, r^N) / k_B T \right]$$
where H is the Hamiltonian, T is the temperature, $k_B$ is Boltzmann’s constant and Q is
the partition function
$$Q = \iint dp^N \, dr^N \exp\left[ -H(p^N, r^N) / k_B T \right]$$
Another way, as done in an MD simulation, is to determine a time average of A, which
is expressed as
$$\langle A \rangle_{time} = \lim_{\tau \to \infty} \frac{1}{\tau} \int_{0}^{\tau} A(p^N(t), r^N(t)) \, dt \approx \frac{1}{M} \sum_{t=1}^{M} A(p^N, r^N)$$
where τ is the simulation time, M is the number of time steps in the simulation and
$A(p^N, r^N)$ is the instantaneous value of A.
The dilemma appears to be that one can calculate time averages by molecular dynamics
simulation, but the experimental observables are assumed to be ensemble averages.
Resolving this leads us to one of the most fundamental axioms of statistical mechanics,
the ergodic hypothesis, which states that the time average equals the ensemble average.
The Ergodic hypothesis states
Ensemble average = Time average
The basic idea is that if one allows the system to evolve in time indefinitely, that system
will eventually pass through all possible states. One goal, therefore, of a molecular
dynamics simulation is to generate enough representative conformations such that this
equality is satisfied. If this is the case, experimentally relevant information concerning
structural, dynamic and thermodynamic properties may then be calculated using a
feasible amount of computer resources. Because the simulations are of fixed duration,
one must be certain to sample a sufficient amount of phase space.
Some examples of time averages:
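AVERAGE POTENTIAL ENERGY
$$\langle V \rangle = \frac{1}{M} \sum_{i=1}^{M} V_i$$
where M is the number of configurations in the molecular dynamics trajectory and $V_i$ is
the potential energy of each configuration.
AVERAGE KINETIC ENERGY
$$\langle K \rangle = \frac{1}{M} \sum_{j=1}^{M} \left\{ \sum_{i=1}^{N} \frac{m_i}{2} \, v_i \cdot v_i \right\}_j$$
where M is the number of configurations in the simulation, N is the number of atoms in
the system, $m_i$ is the mass of particle i and $v_i$ is the velocity of particle i.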
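As a minimal sketch of these two time averages, assuming the trajectory is already available as plain Python lists (the names `potential_energies`, `masses` and `velocity_frames` are illustrative and do not correspond to any CHARMM output format):

```python
def average_potential_energy(potential_energies):
    """<V> = (1/M) * sum of V_i over the M stored configurations."""
    return sum(potential_energies) / len(potential_energies)


def kinetic_energy(masses, velocities):
    """Instantaneous K = sum_i (m_i / 2) * (v_i . v_i) for one configuration."""
    return sum(0.5 * m * sum(c * c for c in v) for m, v in zip(masses, velocities))


def average_kinetic_energy(masses, velocity_frames):
    """<K> = (1/M) * sum of K_j over the M stored configurations."""
    total = sum(kinetic_energy(masses, frame) for frame in velocity_frames)
    return total / len(velocity_frames)


# Hypothetical two-atom, two-frame trajectory in arbitrary consistent units.
masses = [1.0, 2.0]
velocity_frames = [
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],  # K = 0.5 + 1.0 = 1.5
    [(0.0, 2.0, 0.0), (0.0, 0.0, 1.0)],  # K = 2.0 + 1.0 = 3.0
]
potential_energies = [-10.0, -12.0]

mean_v = average_potential_energy(potential_energies)     # -11.0
mean_k = average_kinetic_energy(masses, velocity_frames)  # 2.25
```

In practice M is the number of saved frames, and the averages are only meaningful once enough of phase space has been sampled.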
4. CLASSICAL MECHANICS
The molecular dynamics simulation method is based on Newton’s second law or the
equation of motion, F=ma, where F is the force exerted on the particle, m is its mass and
a is its acceleration. From a knowledge of the force on each atom, it is possible to
determine the acceleration of each atom in the system. Integration of the equations of
motion then yields a trajectory that describes the positions, velocities and accelerations of
the particles as they vary with time. From this trajectory, the average values of properties
can be determined. The method is deterministic; once the positions and velocities of each
atom are known, the state of the system can be predicted at any time in the future or the
past. Molecular dynamics simulations can be time consuming and computationally
expensive. However, computers are getting faster and cheaper. Simulations of solvated
proteins are routinely calculated up to the nanosecond time scale; however, simulations
into the millisecond regime have been reported.
Newton’s equation of motion is given by
$$F_i = m_i a_i$$
where $F_i$ is the force exerted on particle i, $m_i$ is the mass of particle i and $a_i$ is the
acceleration of particle i. The force can also be expressed as the gradient of the potential
energy,
$$F_i = -\nabla_i V$$
Combining these two equations yields
$$-\frac{dV}{dr_i} = m_i \frac{d^2 r_i}{dt^2}$$
where V is the potential energy of the system. Taking the simple case where the
acceleration is constant,
$$a = \frac{dv}{dt}$$
we obtain an expression for the velocity after integration
$$v = a t + v_0$$
and since
$$v = \frac{dx}{dt}$$
we can once again integrate, substituting the expression for the velocity, to obtain the
following relation, which gives the value of x at time t as a function of the acceleration, a,
the initial position, $x_0$, and the initial velocity, $v_0$:
$$x = \tfrac{1}{2} a t^2 + v_0 t + x_0$$
The acceleration is given as the derivative of the potential energy with respect to the
position, r,
$$a = -\frac{1}{m} \frac{dV}{dr}$$
Therefore, to calculate a trajectory, one only needs the initial positions of the atoms, an
initial distribution of velocities and the acceleration, which is determined by the gradient
of the potential energy function. The equations of motion are deterministic, i.e., the
positions and the velocities at time zero determine the positions and velocities at all other
times, t. The initial positions can be obtained from experimental structures, such as the
X-ray crystal structure of the protein or the solution structure determined by NMR
spectroscopy.
The initial velocities are usually drawn from a random distribution, with the magnitudes
conforming to the required temperature and corrected so that there is no overall
momentum, i.e.,
$$P = \sum_{i=1}^{N} m_i v_i = 0$$
The temperature can be calculated from the velocities using the relation
$$T = \frac{1}{3 N k_B} \sum_{i=1}^{N} m_i |v_i|^2$$
where N is the number of atoms in the system and $k_B$ is Boltzmann’s constant.
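The velocity setup described above can be sketched as follows. This is an illustrative sketch, not CHARMM's procedure: it assumes Gaussian velocity components, masses in amu, velocities in Å/ps, and the function names are hypothetical. The temperature uses the equipartition relation with 3N degrees of freedom, T = Σ m_i |v_i|² / (3 N k_B).

```python
import math
import random

K_B = 0.0019872041          # Boltzmann constant in kcal/(mol K)
KCAL_TO_AMU_A2_PS2 = 418.4  # 1 kcal/mol expressed in amu * (Angstrom/ps)^2


def instantaneous_temperature(masses, vels):
    """T = (1 / (3 N k_B)) * sum_i m_i |v_i|^2 (equipartition, 3N degrees of freedom)."""
    twice_ke = sum(m * sum(c * c for c in v) for m, v in zip(masses, vels))
    return twice_ke / (3 * len(masses) * K_B * KCAL_TO_AMU_A2_PS2)


def initial_velocities(masses, temperature, seed=0):
    """Draw Gaussian velocities, remove the net momentum, rescale to the target T."""
    rng = random.Random(seed)
    kt = K_B * temperature * KCAL_TO_AMU_A2_PS2  # k_B T in amu * (Angstrom/ps)^2
    vels = [[rng.gauss(0.0, math.sqrt(kt / m)) for _ in range(3)] for m in masses]

    # Subtract the centre-of-mass velocity so that P = sum_i m_i v_i = 0.
    total_mass = sum(masses)
    for axis in range(3):
        v_com = sum(m * v[axis] for m, v in zip(masses, vels)) / total_mass
        for v in vels:
            v[axis] -= v_com

    # Uniform rescaling keeps P = 0 and sets the temperature exactly.
    scale = math.sqrt(temperature / instantaneous_temperature(masses, vels))
    return [[scale * c for c in v] for v in vels]


# Hypothetical 100-atom system (C, H, H, O masses repeated) at 300 K.
masses = [12.011, 1.008, 1.008, 15.999] * 25
vels = initial_velocities(masses, 300.0)
temperature = instantaneous_temperature(masses, vels)
momentum = [sum(m * v[axis] for m, v in zip(masses, vels)) for axis in range(3)]
```

The final rescaling is a design choice: without it, the temperature of a finite random sample only approximates the target value.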
INTEGRATION ALGORITHMS
The potential energy is a function of the 3N atomic coordinates of all the atoms in the
system. Because this function is complicated, the equations of motion have no analytical
solution; they must be solved numerically.
Numerous numerical algorithms have been developed for integrating the equations of
motion. We list several here.
• Verlet algorithm
• Leapfrog algorithm
• Velocity Verlet
• Beeman’s algorithm
Important: In choosing which algorithm to use, one should consider the following
criteria:
• The algorithm should conserve energy and momentum.
• It should be computationally efficient
• It should permit a long time step for integration.
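All the integration algorithms assume the positions, velocities and accelerations can be
approximated by a Taylor series expansion:
$$r(t + \delta t) = r(t) + v(t) \, \delta t + \tfrac{1}{2} a(t) \, \delta t^2 + \cdots$$
$$v(t + \delta t) = v(t) + a(t) \, \delta t + \tfrac{1}{2} b(t) \, \delta t^2 + \cdots$$
$$a(t + \delta t) = a(t) + b(t) \, \delta t + \cdots$$
where r is the position, v is the velocity (the first derivative with respect to time), a is
the acceleration (the second derivative with respect to time), b is the third derivative, etc.
To derive the Verlet algorithm one can write
$$r(t + \delta t) = r(t) + v(t) \, \delta t + \tfrac{1}{2} a(t) \, \delta t^2$$
$$r(t - \delta t) = r(t) - v(t) \, \delta t + \tfrac{1}{2} a(t) \, \delta t^2$$
Summing these two equations, one obtains
$$r(t + \delta t) = 2 r(t) - r(t - \delta t) + a(t) \, \delta t^2$$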
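The Verlet update, r(t+δt) = 2r(t) − r(t−δt) + a(t)δt², can be tried on a system with a known solution. The sketch below assumes a one-dimensional, unit-mass harmonic oscillator (a(x) = −x) as a stand-in for a real force field; the function name and the bootstrapping of the first step are illustrative choices.

```python
import math


def verlet(accel, x0, v0, dt, n_steps):
    """Position Verlet: r(t+dt) = 2 r(t) - r(t-dt) + a(t) dt^2.

    The first step is bootstrapped from a Taylor expansion, because
    r(-dt) is not known at the start of the simulation.
    """
    xs = [x0, x0 + v0 * dt + 0.5 * accel(x0) * dt * dt]
    for _ in range(n_steps - 1):
        x_prev, x = xs[-2], xs[-1]
        xs.append(2.0 * x - x_prev + accel(x) * dt * dt)
    return xs  # positions at t = 0, dt, ..., n_steps * dt


# Hypothetical test system: for x0 = 1, v0 = 0 the exact solution is x(t) = cos(t).
dt = 0.01
n_steps = 1000
xs = verlet(lambda x: -x, x0=1.0, v0=0.0, dt=dt, n_steps=n_steps)
error = abs(xs[-1] - math.cos(n_steps * dt))
```

Note that no velocities appear in the update; if they are needed (for example for the kinetic energy), they have to be estimated separately from the positions.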
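THE LEAPFROG ALGORITHM
$$r(t + \delta t) = r(t) + v(t + \tfrac{1}{2} \delta t) \, \delta t$$
$$v(t + \tfrac{1}{2} \delta t) = v(t - \tfrac{1}{2} \delta t) + a(t) \, \delta t$$
THE VELOCITY VERLET ALGORITHM
$$r(t + \delta t) = r(t) + v(t) \, \delta t + \tfrac{1}{2} a(t) \, \delta t^2$$
$$v(t + \delta t) = v(t) + \tfrac{1}{2} \left[ a(t) + a(t + \delta t) \right] \delta t$$
This algorithm yields positions, velocities and accelerations at the same time t. There is no
compromise on precision.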
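A minimal sketch of the velocity Verlet update, r(t+δt) = r(t) + v(t)δt + ½a(t)δt² followed by v(t+δt) = v(t) + ½[a(t) + a(t+δt)]δt, again assuming a one-dimensional unit-mass harmonic oscillator as an illustrative test system. Because positions and velocities are available at the same time, the total energy can be monitored directly.

```python
def velocity_verlet(accel, x0, v0, dt, n_steps):
    """Return positions and velocities at t = 0, dt, ..., n_steps * dt."""
    x, v, a = x0, v0, accel(x0)
    xs, vs = [x], [v]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt  # r(t+dt)
        a_new = accel(x)                    # a(t+dt) from the new position
        v = v + 0.5 * (a + a_new) * dt      # v(t+dt)
        a = a_new
        xs.append(x)
        vs.append(v)
    return xs, vs


# Hypothetical test system: unit-mass harmonic oscillator, a(x) = -x, E = 0.5.
xs, vs = velocity_verlet(lambda x: -x, x0=1.0, v0=0.0, dt=0.01, n_steps=10000)
energies = [0.5 * v * v + 0.5 * x * x for x, v in zip(xs, vs)]
drift = max(energies) - min(energies)
```

For this system the energy oscillates within a narrow band rather than drifting, which is the behaviour one looks for when checking the first criterion listed for integration algorithms (conservation of energy).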
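BEEMAN’S ALGORITHM
This algorithm is closely related to the Verlet algorithm:
$$r(t + \delta t) = r(t) + v(t) \, \delta t + \tfrac{2}{3} a(t) \, \delta t^2 - \tfrac{1}{6} a(t - \delta t) \, \delta t^2$$
$$v(t + \delta t) = v(t) + \tfrac{1}{3} a(t + \delta t) \, \delta t + \tfrac{5}{6} a(t) \, \delta t - \tfrac{1}{6} a(t - \delta t) \, \delta t$$
The advantage of this algorithm is that it provides a more accurate expression for the
velocities and better energy conservation. The disadvantage is that the more complex
expressions make the calculation more expensive.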
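Beeman's scheme needs the acceleration from the previous step, so it has to be started somehow. The sketch below uses the standard Beeman update (positions from a(t) and a(t−δt), velocities from a(t+δt), a(t) and a(t−δt)) and, as a labeled assumption, approximates a(−δt) by a(0) for the first step. The test system is a one-dimensional unit-mass harmonic oscillator.

```python
import math


def beeman(accel, x0, v0, dt, n_steps):
    """Integrate one coordinate with Beeman's algorithm."""
    x, v = x0, v0
    a = accel(x)
    a_prev = a  # assumption: a(-dt) is approximated by a(0) to start the scheme
    xs, vs = [x], [v]
    for _ in range(n_steps):
        # r(t+dt) = r + v dt + (2/3) a dt^2 - (1/6) a_prev dt^2
        x = x + v * dt + (2.0 / 3.0) * a * dt * dt - (1.0 / 6.0) * a_prev * dt * dt
        a_new = accel(x)
        # v(t+dt) = v + (1/3) a_new dt + (5/6) a dt - (1/6) a_prev dt
        v = v + ((1.0 / 3.0) * a_new + (5.0 / 6.0) * a - (1.0 / 6.0) * a_prev) * dt
        a_prev, a = a, a_new
        xs.append(x)
        vs.append(v)
    return xs, vs


# Hypothetical test system: a(x) = -x, exact solution x = cos(t), v = -sin(t).
xs, vs = beeman(lambda x: -x, x0=1.0, v0=0.0, dt=0.01, n_steps=1000)
position_error = abs(xs[-1] - math.cos(10.0))
velocity_error = abs(vs[-1] + math.sin(10.0))
```

The extra stored acceleration is the price paid for the more accurate velocities.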
Use of Molecular Dynamics Simulation
Kinetics and irreversible processes
• Chemical reaction kinetics (with QM)
• Conformational changes, allosteric mechanisms
• Protein folding
Equilibrium ensemble sampling
• Flexibility
• Thermodynamics (free energy changes, binding)
Modeling tool
• Structure prediction / modeling
• Solvent effects
• NMR/crystallography (refinement)
• Electron microscopy (flexible fitting)
Why use molecular dynamics?
• MD is a sampling method, but there are other sampling methods, such as Monte Carlo
(MC). So why use MD?
• MD gives you DYNAMICS. Other methods can give you the ensemble (a smeared
picture), but MD gives you a movie.
• Dynamics are important because biological systems are compartmentalized and are
FAR FROM EQUILIBRIUM.
• From a small molecule’s standpoint, it doesn’t matter what the list of potential
structures of a protein is. Instead, the molecule cares about the protein’s structures over
the time it can diffusively sample (about a nanosecond). And can the molecule influence
the dynamics during its contact time?
• Consider highway traffic at rush hour, at midday and at 2 in the morning. The
average (ensemble) picture of the three doesn’t help the poor frog trying to get
across the highway.
Atomic Detail Computer Simulation
Model System → Molecular Mechanics Potential Energy Surface → Exploration by Simulation
(figure © Jeremy Smith)
Bonded Interactions: Stretching
$E_{str}$ represents the energy required to stretch or compress a covalent bond. A bond can
be thought of as a spring having its own equilibrium length, $r_0$, and the energy required
to stretch or compress it can be approximated by the Hookean potential for an ideal
spring:
$$E_{str} = \tfrac{1}{2} k_{s,ij} (r_{ij} - r_0)^2$$
Bonded Interactions: Bending
$E_{bend}$ is the energy required to bend a bond from its equilibrium angle, $\theta_0$. Again
this system can be modeled by a spring, and the energy is given by the Hookean potential
with respect to angle:
$$E_{bend} = \tfrac{1}{2} k_{b,ijk} (\theta_{ijk} - \theta_0)^2$$
Bonded Interactions: Improper Torsion
$E_{improper}$ is the energy required to deform a planar group of atoms (i, j, k, l) from its
equilibrium angle, $\omega_0$, usually equal to zero. Again this system can be modeled by a
spring, and the energy is given by the Hookean potential with respect to planar angle:
$$E_{improper} = \tfrac{1}{2} k_{o,ijkl} (\omega_{ijkl} - \omega_0)^2$$
(figure © Thomas W. Shattuck)
Bonded Interactions: Torsion
$$E_{tor} = \tfrac{1}{2} k_{tor,1} (1 - \cos \phi) + \tfrac{1}{2} k_{tor,2} (1 - \cos 2\phi) + \tfrac{1}{2} k_{tor,3} (1 - \cos 3\phi)$$
The three terms account, respectively, for asymmetry (butane), 2-fold groups (e.g. COO-)
and standard tetrahedral torsions.
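Non-Bonded Interactions: van der Waals
$E_{vdW}$ is the steric exclusion and long-range attraction energy (QM origins). It is
commonly written in the Lennard-Jones 6-12 form:
$$E_{vdW} = 4 \varepsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right]$$
(figure © Thomas W. Shattuck)
Non-Bonded Interactions: Coulomb
$E_{qq}$ is the Coulomb potential function for electrostatic interactions of charges:
$$E_{qq} = \frac{q_i q_j}{4 \pi \varepsilon_0 \varepsilon_r r_{ij}}$$
(figure © Thomas W. Shattuck)
Newton’s Law
The force on each atom follows from the total steric energy, $F_i = m_i a_i = -\nabla_i E_{steric}$, where
$$E_{steric} = E_{str} + E_{bend} + E_{improper} + E_{tor} + E_{vdW} + E_{qq}$$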
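To make the sum of terms concrete, here is a small sketch that evaluates each contribution for a single set of internal coordinates. The bonded terms follow the Hookean and cosine forms given on the slides; the Lennard-Jones 6-12 form for the van der Waals term, the Coulomb prefactor (about 332.06 kcal·Å/(mol·e²)), all parameter names and all numerical values are assumptions chosen for illustration, not parameters of any real force field.

```python
import math

COULOMB_CONSTANT = 332.06  # approx. e^2 / (4 pi eps0) in kcal*Angstrom/(mol*e^2)


def stretch_energy(k_s, r, r0):
    return 0.5 * k_s * (r - r0) ** 2


def bend_energy(k_b, theta, theta0):
    return 0.5 * k_b * (theta - theta0) ** 2


def improper_energy(k_o, omega, omega0=0.0):
    return 0.5 * k_o * (omega - omega0) ** 2


def torsion_energy(k1, k2, k3, phi):
    return (0.5 * k1 * (1 - math.cos(phi))
            + 0.5 * k2 * (1 - math.cos(2 * phi))
            + 0.5 * k3 * (1 - math.cos(3 * phi)))


def vdw_energy(epsilon, sigma, r):
    """Lennard-Jones 6-12 form (an assumed functional form for E_vdW)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb_energy(q_i, q_j, r, eps_r=1.0):
    return COULOMB_CONSTANT * q_i * q_j / (eps_r * r)


# Hypothetical coordinates and parameters (angles in radians, lengths in Angstrom).
terms = {
    "stretch": stretch_energy(k_s=300.0, r=1.6, r0=1.5),
    "bend": bend_energy(k_b=50.0, theta=math.radians(112.0), theta0=math.radians(109.5)),
    "improper": improper_energy(k_o=20.0, omega=math.radians(5.0)),
    "torsion": torsion_energy(k1=0.2, k2=0.3, k3=0.5, phi=math.radians(60.0)),
    "vdw": vdw_energy(epsilon=0.1, sigma=3.4, r=4.0),
    "coulomb": coulomb_energy(q_i=0.4, q_j=-0.4, r=4.0),
}
e_steric = sum(terms.values())
```

In a real molecule each term is itself a sum over all bonds, angles, torsions and non-bonded pairs; the dictionary above stands for one representative of each.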
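Verlet’s Numeric Integration Method
Taylor expansion: $r(t \pm \delta t) = r(t) \pm v(t) \, \delta t + \tfrac{1}{2} a(t) \, \delta t^2 \pm \cdots$
Verlet’s Method: $r(t + \delta t) = 2 r(t) - r(t - \delta t) + a(t) \, \delta t^2$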
Timescale Limitations
• Protein folding - milliseconds/seconds ($10^{-3}$ to 1 s)
• Ligand binding - micro/milliseconds ($10^{-6}$ to $10^{-3}$ s)
• Enzyme catalysis - micro/milliseconds ($10^{-6}$ to $10^{-3}$ s)
• Conformational transitions - pico/nanoseconds ($10^{-12}$ to $10^{-9}$ s)
• Collective vibrations - 1 picosecond ($10^{-12}$ s)
• Bond vibrations - 1 femtosecond ($10^{-15}$ s)
Molecular dynamics:
• Integration timestep - 1 fs, set by the fastest varying force.
• Accessible timescale - about 10 nanoseconds.
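The gap between the 1 fs timestep and the timescales listed above is easiest to appreciate as a number of integration steps. The following sketch is simple arithmetic; the process labels and representative times are illustrative picks from the ranges on the slide.

```python
def steps_required(simulated_time_s, timestep_s=1e-15):
    """Number of integration steps needed to cover a given simulated time."""
    return round(simulated_time_s / timestep_s)


# Representative times in seconds (illustrative picks from the ranges above).
processes = {
    "bond vibration (1 fs)": 1e-15,
    "collective vibration (1 ps)": 1e-12,
    "conformational transition (1 ns)": 1e-9,
    "accessible MD timescale (10 ns)": 10e-9,
    "ligand binding (1 microsecond)": 1e-6,
    "protein folding (1 ms)": 1e-3,
}
steps = {name: steps_required(t) for name, t in processes.items()}
```

With a 1 fs timestep, 10 ns corresponds to ten million force evaluations, while a millisecond folding event would need on the order of 10^12, which is why methods that lengthen the timestep or bias the sampling are of interest.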
Cutting Corners
SHAKE, Schlick 12.5
MTS, Schlick 13.3
PME, Schlick 9.4
Input files for MD simulation
• A starting structure (.pdb)
• A description of structural connections and atom types (.psf)
• A force field (.par) for the atom types (charmm, gromacs, amber, etc.)
• An input script for the MD program (.conf)
And
• Diagonal elements give an average property of an atom (e.g., the RMSD of a residue).
RMSD from MD can be compared to B-factors from X-ray crystallography, H/D protection
factors from NMR data, and order parameters from NMR, EPR, or fluorescence.
RMSD
MD energies: kinetic and potential
Biased MD: jumping barriers, exploring conformations
MD simulations are generally bad at sampling phase space. To cover configurational
phase space, one needs to be able to do two things well: explore canyons and jump over
energy barriers. Biased MD algorithms have been devised to overcome these deficiencies.
• Explore canyons - accelerated collective motions (ACM), T4 lysozyme example.
http://cmm.info.nih.gov/intro_simulation
MOLECULAR DYNAMICS
From Wikipedia, the free encyclopedia
• 1 Areas of Application
• 2 Design Constraints
o 2.1 Microcanonical ensemble (NVE)
o 2.2 Canonical ensemble (NVT)
o 2.3 Isothermal-Isobaric (NPT) ensemble
o 2.4 Generalized ensembles
• 3 Potentials in MD simulations
o 3.1 Empirical potentials
o 3.2 Pair potentials vs. many-body potentials
o 3.3 Semi-empirical potentials
o 3.4 Polarizable potentials
o 3.5 Ab-initio methods
o 3.6 Hybrid QM/MM
o 3.7 Coarse-graining and reduced representations
• 4 Examples of applications
• 5 Molecular dynamics algorithms
o 5.1 Integrators
o 5.2 Short-range interaction algorithms
o 5.3 Long-range interaction algorithms
o 5.4 Parallelization strategies
• 6 Major software for MD simulations
• 7 Related software
• 8 Specialized hardware for MD simulations
• 9 See also
• 10 References
o 10.1 General references
• 11 External links
AREAS OF APPLICATION
There is a significant difference between the focus and methods used by chemists and
physicists, and this is reflected in differences in the jargon used by the different fields. In
chemistry and biophysics, the interaction between the particles is either described by a
"force field" (classical MD), a quantum chemical model, or a mix between the two. These
terms are not used in physics, where the interactions are usually described by the name of
the theory or approximation being used and called the potential energy, or just the
"potential".
MD can also be seen as a special case of the discrete element method (DEM) in which the
particles have spherical shape (e.g. with the size of their van der Waals radii.) Some
authors in the DEM community employ the term MD rather loosely, even when their
simulations do not model actual molecules.
DESIGN CONSTRAINTS
During a classical MD simulation, the most CPU intensive task is the evaluation of the
potential (force field) as a function of the particles' internal coordinates. Within that energy
evaluation, the most expensive part is the non-bonded or non-covalent one. In Big O
notation, common molecular dynamics simulations scale as $O(n^2)$ if all pair-wise
electrostatic and van der Waals interactions must be accounted for explicitly. This
computational cost can be reduced by employing electrostatics methods such as Particle
Mesh Ewald ($O(n \log n)$), P3M or good spherical cutoff techniques ($O(n)$).
Another factor that impacts total CPU time required by a simulation is the size of the
integration timestep. This is the time length between evaluations of the potential. The
timestep must be chosen small enough to avoid discretization errors (i.e. smaller than the
period of the fastest vibration in the system). Typical timesteps for classical MD are on the
order of 1 femtosecond ($10^{-15}$ s). This value may be extended by using algorithms such as
SHAKE, which constrain the bonds involving the fastest atoms (e.g. hydrogens). Multiple
time scale methods have also been developed, which allow for extended times between
updates of slower long-range forces.[6][7][8]
For simulating molecules in a solvent, a choice should be made between explicit solvent
and implicit solvent. Interactions with explicit solvent particles (such as the TIP3P, SPC/E
and SPC-f water models) must be evaluated by the force field, while implicit solvents use a
mean-field approach. Using an explicit solvent is computationally expensive, requiring
inclusion of roughly ten times more particles in the simulation. But the granularity and
viscosity of explicit solvent are essential to reproduce certain properties of the solute
molecules. This is especially important to reproduce kinetics.
In all kinds of molecular dynamics simulations, the simulation box size must be large
enough to avoid boundary condition artifacts. Boundary conditions are often treated by
choosing fixed values at the edges (which may cause artifacts), or by employing periodic
boundary conditions in which one side of the simulation loops back to the opposite side,
mimicking a bulk phase.
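Periodic boundary conditions are commonly implemented with the minimum image convention: each particle interacts with the nearest periodic copy of every other particle, and particles leaving the box re-enter from the opposite side. The sketch below assumes a cubic box of edge `box_length`; the function names are illustrative.

```python
import math


def minimum_image(delta, box_length):
    """Map one component of a separation vector onto its nearest periodic image."""
    return delta - box_length * round(delta / box_length)


def periodic_distance(a, b, box_length):
    """Distance between points a and b under the minimum image convention."""
    return math.sqrt(sum(minimum_image(x - y, box_length) ** 2 for x, y in zip(a, b)))


def wrap(position, box_length):
    """Put a particle that has left the box back in through the opposite side."""
    return [c % box_length for c in position]


# Two particles near opposite faces of a 10 Angstrom box are actually 1 Angstrom apart.
d = periodic_distance((0.5, 0.0, 0.0), (9.5, 0.0, 0.0), box_length=10.0)
wrapped = wrap([10.5, -0.5, 3.0], box_length=10.0)
```

For the convention to be consistent, any interaction cutoff has to be no larger than half the box length, which is one reason the box must be large enough.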
MICROCANONICAL ENSEMBLE (NVE)
In the microcanonical, or NVE ensemble, the system is isolated from changes in moles
(N), volume (V) and energy (E). It corresponds to an adiabatic process with no heat
exchange. A microcanonical molecular dynamics trajectory may be seen as an exchange of
potential and kinetic energy, with total energy being conserved. For a system of N particles
with coordinates X and velocities V, the following pair of first order differential equations
may be written in Newton's notation as
$$F(X) = -\nabla U(X) = M \dot{V}(t)$$
$$V(t) = \dot{X}(t)$$
where M denotes the particle masses.
The potential energy function U(X) of the system is a function of the particle coordinates
X. It is referred to simply as the "potential" in Physics, or the "force field" in Chemistry.
The first equation comes from Newton's laws; the force F acting on each particle in the
system can be calculated as the negative gradient of U(X).
For every timestep, each particle's position X and velocity V may be integrated with a
symplectic method such as Verlet. The time evolution of X and V is called a trajectory.
Given the initial positions (e.g. from theoretical knowledge) and velocities (e.g.
randomized Gaussian), we can calculate all future (or past) positions and velocities.
A temperature-related phenomenon arises due to the small number of atoms that are used
in MD simulations. For example, consider simulating the growth of a copper film starting
with a substrate containing 500 atoms and a deposition energy of 100 eV. In the real world,
the 100 eV from the deposited atom would rapidly be transported through and shared
among a large number of atoms ($10^{10}$ or more) with no big change in temperature. When
there are only 500 atoms, however, the substrate is almost immediately vaporized by the
deposition. Something similar happens in biophysical simulations. The temperature of the
system in NVE is naturally raised when macromolecules such as proteins undergo
exothermic conformational changes and binding.
CANONICAL ENSEMBLE (NVT)
In the canonical ensemble, moles (N), volume (V) and temperature (T) are conserved. It is
also sometimes called constant temperature molecular dynamics (CTMD). In NVT, the
energy of endothermic and exothermic processes is exchanged with a thermostat.
A variety of thermostat methods is available to add and remove energy from the
boundaries of an MD system in a more or less realistic way, approximating the canonical
ensemble. Popular techniques to control temperature include velocity rescaling, the Nosé-
Hoover thermostat, Nosé-Hoover chains, the Berendsen thermostat and Langevin
dynamics. Note that the Berendsen thermostat might introduce the flying ice cube effect,
which leads to unphysical translations and rotations of the simulated system.
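As an illustration of how a thermostat exchanges energy with the system, the Berendsen (weak-coupling) scheme rescales all velocities at each step by λ = sqrt(1 + (Δt/τ)(T0/T − 1)), where T is the instantaneous temperature, T0 the target and τ a coupling time. The sketch below is a toy model: it assumes the temperature changes only through the thermostat, and the parameter values are hypothetical.

```python
import math


def berendsen_scale(t_current, t_target, dt, tau):
    """Velocity scaling factor lambda = sqrt(1 + (dt / tau) * (T0 / T - 1))."""
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_current - 1.0))


# Toy relaxation: the temperature responds only to the thermostat.
temperature = 350.0  # K, hypothetical starting temperature
history = [temperature]
for _ in range(2000):
    lam = berendsen_scale(temperature, t_target=300.0, dt=0.001, tau=0.1)
    temperature *= lam * lam  # kinetic energy, and hence T, scales with lambda^2
    history.append(temperature)
```

The temperature decays exponentially toward the target with time constant τ. Because the scheme suppresses kinetic energy fluctuations, it only approximates the canonical ensemble.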
ISOTHERMAL-ISOBARIC (NPT) ENSEMBLE
In the isothermal-isobaric ensemble, moles (N), pressure (P) and temperature (T) are
conserved. In addition to a thermostat, a barostat is needed. It corresponds most closely to
laboratory conditions with a flask open to ambient temperature and pressure.
GENERALIZED ENSEMBLES
The replica exchange method is a generalized ensemble. It was originally created to deal
with the slow dynamics of disordered spin systems. It is also called parallel tempering. The
replica exchange MD (REMD) formulation [9] tries to overcome the multiple-minima
problem by exchanging the temperature of non-interacting replicas of the system running at
several temperatures.
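In temperature replica exchange, a swap between replicas i and j is usually accepted with the Metropolis probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/(k_B T) and E is the potential energy of each replica. A minimal sketch, assuming energies in kcal/mol and using illustrative function names:

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def swap_probability(e_i, t_i, e_j, t_j):
    """Metropolis acceptance min(1, exp[(beta_i - beta_j) * (E_i - E_j)])."""
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    exponent = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


def attempt_swap(e_i, t_i, e_j, t_j, rng):
    """Return True if the two replicas should exchange temperatures."""
    return rng.random() < swap_probability(e_i, t_i, e_j, t_j)


# Hypothetical energies: the cold replica (300 K) has the higher energy,
# so handing its configuration to the hot replica is always accepted.
p_downhill = swap_probability(e_i=-100.0, t_i=300.0, e_j=-105.0, t_j=320.0)
# Reverse situation: accepted only part of the time.
p_uphill = swap_probability(e_i=-105.0, t_i=300.0, e_j=-100.0, t_j=320.0)
accepted = attempt_swap(-100.0, 300.0, -105.0, 320.0, random.Random(0))
```

Neighbouring temperatures are chosen close enough that their energy distributions overlap; otherwise the acceptance probability becomes too small for replicas to move between temperatures.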
POTENTIALS IN MD SIMULATIONS
The reduction from a fully quantum description to a classical potential entails two main
approximations. The first one is the Born-Oppenheimer approximation, which states that
the dynamics of electrons is so fast that they can be considered to react instantaneously to
the motion of their nuclei. As a consequence, they may be treated separately. The second
one treats the nuclei, which are much heavier than electrons, as point particles that follow
classical Newtonian dynamics. In classical molecular dynamics the effect of the electrons is
approximated as a single potential energy surface, usually representing the ground state.
When finer levels of detail are required, potentials based on quantum mechanics are used;
some techniques attempt to create hybrid classical/quantum potentials where the bulk of the
system is treated classically but a small region is treated as a quantum system, usually
undergoing a chemical transformation.
EMPIRICAL POTENTIALS
Empirical potentials used in chemistry are frequently called force fields, while those used
in materials physics are called just empirical or analytical potentials.
Most force fields in chemistry are empirical and consist of a summation of bonded forces
associated with chemical bonds, bond angles, and bond dihedrals, and non-bonded forces
associated with van der Waals forces and electrostatic charge. Empirical potentials
represent quantum-mechanical effects in a limited way through ad-hoc functional
approximations. These potentials contain free parameters such as atomic charge, van der
Waals parameters reflecting estimates of atomic radius, and equilibrium bond length, angle,
and dihedral; these are obtained by fitting against detailed electronic calculations (quantum
chemical simulations) or experimental physical properties such as elastic constants, lattice
parameters and spectroscopic measurements.
Because of the non-local nature of non-bonded interactions, they involve at least weak
interactions between all particles in the system. Their calculation is normally the bottleneck
in the speed of MD simulations. To lower the computational cost, force fields employ
numerical approximations such as shifted cutoff radii, reaction field algorithms, particle
mesh Ewald summation, or the newer Particle-Particle Particle Mesh (P3M).
Chemistry force fields commonly employ preset bonding arrangements (an exception
being ab-initio dynamics), and thus are unable to model the process of chemical bond
breaking and reactions explicitly. On the other hand, many of the potentials used in
physics, such as those based on the bond order formalism, can describe several different
coordinations of a system and bond breaking. Examples of such potentials include the
Brenner potential [10] for hydrocarbons and its further developments for the C-Si-H and
C-O-H systems. The ReaxFF potential [11] can be considered a fully reactive hybrid between
bond order potentials and chemistry force fields.
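PAIR POTENTIALS VS. MANY-BODY POTENTIALS
The potential functions representing the non-bonded energy are formulated as a sum over
interactions between the particles of the system. The simplest choice, employed in many
popular force fields, is the "pair potential", in which the total potential energy can be
calculated from the sum of energy contributions between pairs of atoms. An example of
such a pair potential is the non-bonded Lennard-Jones potential (also known as the 6-12
potential), used for calculating van der Waals forces:
$$U(r) = 4 \varepsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right]$$
where ε is the depth of the potential well and σ is the distance at which the potential is zero.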
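The idea of a pair potential, a total energy built from a sum over all pairs of atoms, can be sketched with the Lennard-Jones form U(r) = 4ε[(σ/r)^12 − (σ/r)^6]. The double loop over pairs is what gives the O(n²) cost when no cutoff is used. Reduced units (ε = σ = 1), the optional `cutoff` argument and the function names are illustrative assumptions.

```python
import math
from itertools import combinations


def lennard_jones(r, epsilon, sigma):
    """U(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def total_pair_energy(positions, epsilon, sigma, cutoff=None):
    """Sum the pair potential over all distinct pairs (n(n-1)/2 terms)."""
    total = 0.0
    for a, b in combinations(positions, 2):
        r = math.dist(a, b)
        if cutoff is None or r < cutoff:
            total += lennard_jones(r, epsilon, sigma)
    return total


# Three atoms on an equilateral triangle with side r_min = 2^(1/6) sigma,
# where each pair sits at the bottom of the well (U = -epsilon).
r_min = 2.0 ** (1.0 / 6.0)
triangle = [
    (0.0, 0.0, 0.0),
    (r_min, 0.0, 0.0),
    (0.5 * r_min, 0.5 * math.sqrt(3.0) * r_min, 0.0),
]
energy = total_pair_energy(triangle, epsilon=1.0, sigma=1.0)
```

The plain truncation used here introduces a small energy discontinuity at the cutoff distance; shifted cutoffs are one way of removing it.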
Another example is the Born (ionic) model of the ionic lattice. The first term in the next
equation is Coulomb's law for a pair of ions, the second term is the short-range repulsion
explained by Pauli's exclusion principle and the final term is the dispersion interaction
term. Usually, a simulation only includes the dipolar term, although sometimes the
quadrupolar term is included as well.
$$U_{ij}(r_{ij}) = \frac{z_i z_j}{4 \pi \epsilon_0} \frac{1}{r_{ij}} + A_l \exp\left( \frac{-r_{ij}}{p_l} \right) + C_l r_{ij}^{-n_l} + \cdots$$
In many-body potentials, the potential energy includes the effects of three or more
particles interacting with each other. In simulations with pairwise potentials, global
interactions in the system also exist, but they occur only through pairwise terms. In many-
body potentials, the potential energy cannot be found by a sum over pairs of atoms, as these
interactions are calculated explicitly as a combination of higher-order terms. In the
statistical view, the dependency between the variables cannot in general be expressed using
only pairwise products of the degrees of freedom. For example, the Tersoff potential[12],
which was originally used to simulate carbon, silicon and germanium and has since been
used for a wide range of other materials, involves a sum over groups of three atoms, with
the angles between the atoms being an important factor in the potential. Other examples are
the embedded-atom method (EAM)[13] and the Tight-Binding Second Moment
Approximation (TBSMA) potentials[14], where the electron density of states in the region of
an atom is calculated from a sum of contributions from surrounding atoms, and the
potential energy contribution is then a function of this sum.
SEMI-EMPIRICAL POTENTIALS
Semi-empirical potentials make use of the matrix representation from quantum mechanics.
However, the values of the matrix elements are found through empirical formulae that
estimate the degree of overlap of specific atomic orbitals. The matrix is then diagonalized
to determine the occupancy of the different atomic orbitals, and empirical formulae are
used once again to determine the energy contributions of the orbitals.
POLARIZABLE POTENTIALS
Most classical force fields implicitly include the effect of polarizability, e.g. by scaling up
the partial charges obtained from quantum chemical calculations. These partial charges are
stationary with respect to the mass of the atom. But molecular dynamics simulations can
explicitly model polarizability with the introduction of induced dipoles through different
methods, such as Drude particles or fluctuating charges. This allows for a dynamic
redistribution of charge between atoms which responds to the local chemical environment.
For many years, polarizable MD simulations have been touted as the next generation. For
homogeneous liquids such as water, increased accuracy has been achieved through the
inclusion of polarizability.[15] Some promising results have also been achieved for proteins.[16]
However, it is still uncertain how to best approximate polarizability in a simulation.
AB-INITIO METHODS
In classical molecular dynamics, a single potential energy surface (usually the ground
state) is represented in the force field. This is a consequence of the Born-Oppenheimer
approximation. When excited states, chemical reactions or a more accurate representation
are needed, electronic behavior can be obtained from first principles by using a quantum
mechanical method, such as Density Functional Theory. This is known as Ab Initio
Molecular Dynamics (AIMD). Due to the cost of treating the electronic degrees of
freedom, the computational cost of these simulations is much higher than that of classical
molecular dynamics. This implies that AIMD is limited to smaller systems and shorter
periods of time.
The most important advantage of hybrid QM/MM methods is speed. The cost of doing
classical molecular dynamics (MM) in the most straightforward case scales as O(N^2), where
N is the number of atoms in the system. This is mainly due to the electrostatic interaction
term (every particle interacts with every other particle). However, the use of a cutoff
radius, periodic pair-list updates and, more recently, variations of the particle-mesh
Ewald (PME) method has reduced this to between O(N) and O(N^2). In other words, if a system
with twice as many atoms is simulated, it takes between two and four times as much
computing power. On the other hand, the simplest ab initio calculations typically scale as
O(N^3) or worse (Restricted Hartree-Fock calculations have been suggested to scale as
~O(N^2.7)). To overcome this limitation, a small part of the system is treated
quantum-mechanically (typically the active site of an enzyme) and the remaining system is
treated classically.
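The scaling argument can be checked with a few lines of arithmetic. The sketch below assumes an idealized cost model in which the cost is exactly proportional to N raised to a fixed exponent; real programs have prefactors and lower-order terms that this ignores. The function name and the labels are illustrative only.

```python
def relative_cost(size_factor, exponent):
    """Cost multiplier when the atom count grows by size_factor,
    for a method whose cost is assumed to scale as N**exponent."""
    return size_factor ** exponent


# Doubling the number of atoms under the scaling exponents quoted in the text.
for label, exponent in [
    ("MM, linear limit", 1.0),
    ("MM, all pairs", 2.0),
    ("RHF (suggested)", 2.7),
    ("ab initio, cubic", 3.0),
]:
    print(f"{label:18s} x{relative_cost(2, exponent):.1f}")
```

Under these assumptions, doubling the system costs two to four times as much for the classical part, roughly 6.5 times as much for the suggested Hartree-Fock exponent, and eight times as much for cubic scaling. This is why restricting the quantum region to a small part of the system pays off.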
In more sophisticated implementations, QM/MM methods exist to treat both light nuclei
susceptible to quantum effects (such as hydrogens) and electronic states. This allows
generation of hydrogen wave-functions (similar to electronic wave-functions). This
methodology has been useful in investigating phenomena such as hydrogen tunneling. One
example where QM/MM methods have provided new discoveries is the calculation of
hydride transfer in the enzyme liver alcohol dehydrogenase. In this case, tunneling is
important for the hydrogen, as it determines the reaction rate.[17]
COARSE-GRAINING AND REDUCED REPRESENTATIONS
At the other end of the detail scale are coarse-grained and lattice models. Instead of
explicitly representing every atom of the system, one uses "pseudo-atoms" to represent
groups of atoms. MD simulations on very large systems may require such large computer
resources that they cannot easily be studied by traditional all-atom methods. Similarly,
simulations of processes on long timescales (beyond about 1 microsecond) are
prohibitively expensive, because they require so many timesteps. In these cases, one can
sometimes tackle the problem by using reduced representations, which are also called
coarse-grained models.
Examples of coarse-graining (CG) methods are discontinuous molecular dynamics
(CG-DMD)[18][19] and Go-models[20]. Coarse-graining is sometimes done by taking larger
pseudo-atoms. Such united atom approximations have been used in MD simulations of
biological membranes. The aliphatic tails of lipids are represented by a few pseudo-atoms,
each gathering 2 to 4 methylene groups.
• Protein folding studies are often carried out using one (or a few) pseudo-atoms per
amino acid;
• DNA supercoiling has been investigated using 1-3 pseudo-atoms per basepair, and at
even lower resolution;
• Packaging of double-helical DNA into bacteriophage has been investigated with models
where one pseudo-atom represents one turn (about 10 basepairs) of the double helix;
• RNA structure in the ribosome and other large systems has been modeled with one
pseudo-atom per nucleotide.
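A minimal sketch of how atoms might be mapped onto pseudo-atoms is given below. It assumes that each pseudo-atom is placed at the center of mass of its group and carries the group's total mass; this is one common choice, not the only one. The function name, the data layout and the example coordinates are hypothetical.

```python
def coarse_grain(atoms, group_of):
    """Map atoms onto pseudo-atoms placed at each group's center of mass.

    atoms    : list of (mass, (x, y, z)) tuples
    group_of : list giving, for each atom, the label of its pseudo-atom
    Returns {label: (total_mass, (x, y, z))}.
    """
    sums = {}
    for (mass, pos), label in zip(atoms, group_of):
        total, weighted = sums.get(label, (0.0, [0.0, 0.0, 0.0]))
        weighted = [w + mass * p for w, p in zip(weighted, pos)]
        sums[label] = (total + mass, weighted)
    return {
        label: (total, tuple(w / total for w in weighted))
        for label, (total, weighted) in sums.items()
    }


# Four atoms reduced to two pseudo-atoms "A" and "B" (made-up coordinates).
atoms = [
    (12.0, (0.0, 0.0, 0.0)),
    (1.0, (1.0, 0.0, 0.0)),
    (1.0, (-1.0, 0.0, 0.0)),
    (14.0, (2.0, 0.0, 0.0)),
]
groups = ["A", "A", "A", "B"]
cg = coarse_grain(atoms, groups)
```

The same function covers any of the resolutions listed above, because only the `group_of` labels change: one label per amino acid, per nucleotide, or per helical turn. The interaction parameters of the pseudo-atoms still have to be derived separately.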
The simplest form of coarse-graining is the "united atom" (sometimes called "extended
atom") representation, which was used in most early MD simulations of proteins, lipids and
nucleic acids. For example, instead of treating all four atoms of a CH3 methyl group
explicitly (or all three atoms of a CH2 methylene group), one represents the whole group
with a single pseudo-atom. This pseudo-atom must, of course, be properly parameterized so
that its van der Waals interactions with other groups have the proper distance-dependence.
Similar considerations apply to the bonds, angles, and torsions in which the pseudo-atom
participates. In this kind of united atom representation, one typically eliminates all
explicit hydrogen atoms except those that have the capability to participate in hydrogen
bonds ("polar hydrogens"). An example of this is the CHARMM 19 force field.
The polar hydrogens are usually retained in the model, because proper treatment of
hydrogen bonds requires a reasonably accurate description of the directionality and the
electrostatic interactions between the donor and acceptor groups. A hydroxyl group, for
example, can be both a hydrogen bond donor and a hydrogen bond acceptor, and it would
be impossible to treat this with a single OH pseudo-atom. Note that about half the atoms in
a protein or nucleic acid are nonpolar hydrogens, so the use of united atoms can provide a
substantial savings in computer time.
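The united atom reduction can be sketched as follows. The example makes a deliberately crude assumption, namely that a hydrogen counts as nonpolar exactly when it is bonded to carbon, and it only merges masses and labels. It does not assign any force field parameters. The function name and the methanol data are illustrative.

```python
def to_united_atom(atoms, bonds):
    """Merge hydrogens bonded to carbon into that carbon.

    atoms : dict mapping atom id -> (element, mass)
    bonds : list of (id, id) pairs
    Returns dict mapping remaining atom id -> (label, mass).
    Assumption: a hydrogen is "nonpolar" exactly when it is bonded to carbon.
    """
    merged_into = {}
    for a, b in bonds:
        for h, heavy in ((a, b), (b, a)):
            if atoms[h][0] == "H" and atoms[heavy][0] == "C":
                merged_into[h] = heavy
    united = {}
    for atom_id, (element, mass) in atoms.items():
        if atom_id in merged_into:
            continue  # this hydrogen is absorbed by its carbon
        hydrogens = [h for h, c in merged_into.items() if c == atom_id]
        label = element + ("H%d" % len(hydrogens) if hydrogens else "")
        total_mass = mass + sum(atoms[h][1] for h in hydrogens)
        united[atom_id] = (label, total_mass)
    return united


# Methanol, CH3-OH: three nonpolar hydrogens on carbon, one polar hydrogen on oxygen.
methanol = {
    1: ("C", 12.011), 2: ("H", 1.008), 3: ("H", 1.008),
    4: ("H", 1.008), 5: ("O", 15.999), 6: ("H", 1.008),
}
methanol_bonds = [(1, 2), (1, 3), (1, 4), (1, 5), (5, 6)]
united = to_united_atom(methanol, methanol_bonds)
```

For this small molecule, six explicit atoms become three interaction sites: a CH3 pseudo-atom, the oxygen, and the polar hydroxyl hydrogen, which is kept so that hydrogen bonding can still be described.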
• MD is the standard method to treat collision cascades in the heat spike regime, i.e. the
effects that energetic neutron and ion irradiation have on solids and solid surfaces.[22][23]
The following two biophysical examples are not run-of-the-mill MD simulations. They
illustrate notable efforts to produce simulations of a system of very large size (a complete
virus) and very long simulation times (500 microseconds):
• MD simulation of the complete satellite tobacco mosaic virus (STMV) (2006, Size: 1
million atoms, Simulation time: 50 ns, program: NAMD) This virus is a small, icosahedral
plant virus which worsens the symptoms of infection by Tobacco Mosaic Virus (TMV).
Molecular dynamics simulations were used to probe the mechanisms of viral assembly. The
entire STMV particle consists of 60 identical copies of a single protein that make up the viral
capsid (coating), and a 1063 nucleotide single stranded RNA genome. One key finding is that
the capsid is very unstable when there is no RNA inside. The simulation would take a single
2006 desktop computer around 35 years to complete. It was therefore run on many processors in
parallel, with continuous communication between them.[24]
• Folding Simulations of the Villin Headpiece in All-Atom Detail (2006, Size: 20,000
atoms; Simulation time: 500 µs = 500,000 ns, Program: folding@home) This simulation was
run on 200,000 CPUs of participating personal computers around the world. These
computers had the folding@home program installed; folding@home is a large-scale distributed
computing effort coordinated by Vijay Pande at Stanford University. The kinetic properties of
the Villin Headpiece protein were probed by using many independent, short trajectories run by
CPUs without continuous real-time communication. One technique employed was the Pfold value
analysis, which measures the probability of folding before unfolding of a specific starting
conformation. Pfold gives information about transition state structures and an ordering of
conformations along the folding pathway. Each trajectory in a Pfold calculation can be
relatively short, but many independent trajectories are needed.[25]
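The Pfold idea can be sketched in a few lines. The example assumes that each short trajectory has already been reduced to a time series of one order parameter (here imagined as the RMSD to the native structure) and that fixed thresholds define the folded and unfolded states. The thresholds, function names and sample data are hypothetical, and trajectories that reach neither state are simply discarded, which is one possible convention.

```python
def first_outcome(trajectory, folded_below, unfolded_above):
    """Return 'folded' or 'unfolded' for whichever state is reached first,
    or None if the trajectory ends before reaching either."""
    for value in trajectory:
        if value <= folded_below:
            return "folded"
        if value >= unfolded_above:
            return "unfolded"
    return None


def pfold(trajectories, folded_below, unfolded_above):
    """Fraction of committed trajectories that fold before they unfold."""
    outcomes = [first_outcome(t, folded_below, unfolded_above) for t in trajectories]
    committed = [o for o in outcomes if o is not None]
    if not committed:
        raise ValueError("no trajectory reached either state")
    return committed.count("folded") / len(committed)


# Four made-up trajectories started from the same conformation.
sample = [
    [5.0, 4.0, 2.0, 1.5],   # reaches the folded state
    [5.0, 6.0, 9.0, 10.0],  # reaches the unfolded state
    [5.0, 4.5, 3.0, 1.0],   # reaches the folded state
    [5.0, 5.5, 5.0, 5.2],   # reaches neither state, so it is ignored
]
p = pfold(sample, folded_below=2.0, unfolded_above=8.0)
```

A starting conformation with a value near 0.5 is equally likely to fold or unfold, which is why such conformations are associated with the transition state. Because each trajectory only has to run until it commits to one side, the calculation suits many loosely coupled processors.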
• Verlet-Stoermer integration
• Runge-Kutta integration
• Beeman's algorithm
• Gear predictor-corrector
• Constraint algorithms (for constrained systems)
• Symplectic integrator
• Cell lists
• Verlet list
• Bonded interactions
• Ewald summation
• Particle Mesh Ewald (PME)
• Particle-Particle Particle-Mesh (P3M)
• Reaction Field Method
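To give a flavor of the integrators listed above, the sketch below implements the velocity form of the Verlet scheme for a single particle in one dimension. It is a toy example, assuming a harmonic force with unit mass and unit force constant in arbitrary units. It is not the implementation used by CHARMM or by any other program, and all names are hypothetical.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance one particle in 1D; returns the list of (x, v) after each step."""
    a = force(x) / mass
    path = []
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity from the averaged acceleration
        a = a_new
        path.append((x, v))
    return path


def harmonic_force(x):
    """Restoring force of a spring with unit force constant."""
    return -x


# Start at x = 1 with zero velocity and integrate to t = 10.
path = velocity_verlet(1.0, 0.0, harmonic_force, mass=1.0, dt=0.01, n_steps=1000)
x_end, v_end = path[-1]
energy_end = 0.5 * v_end ** 2 + 0.5 * x_end ** 2
```

For this system the exact trajectory is x(t) = cos(t) and the total energy is 0.5. Comparing `x_end` with `math.cos(10.0)` and `energy_end` with 0.5 shows how well the scheme tracks both with a small timestep.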