John von Neumann Institute for Computing NIC Classical Molecular Dynamics Godehard Sutmann published in Quantum Simulations of Complex Many-Body Systems: From Theory to Algorithms,Lecture Notes, J.Grotendorst,D.Marx,A.Muramatsu(Eds.), John von Neumann Institute for Computing,Julich, NIC Series,Vol.10,1SBN3-00-009057-6,pp.211-254,2002. C2002 by John von Neumann Institute for Computing Permission to make digital or hard copies of portions of this work for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.To copy otherwise requires prior specific permission by the publisher mentioned above. http://www.fz-juelich.de/nic-series/volume10
John von Neumann Institute for Computing Classical Molecular Dynamics Godehard Sutmann published in Quantum Simulations of Complex Many-Body Systems: From Theory to Algorithms, Lecture Notes, J. Grotendorst, D. Marx, A. Muramatsu (Eds.), John von Neumann Institute for Computing, Julich, ¨ NIC Series, Vol. 10, ISBN 3-00-009057-6, pp. 211-254, 2002. c 2002 by John von Neumann Institute for Computing Permission to make digital or hard copies of portions of this work for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise requires prior specific permission by the publisher mentioned above. http://www.fz-juelich.de/nic-series/volume10
Classical Molecular Dynamics Godehard Sutmann John von Neumann Institute for Computing Central Institute for Applied Mathematics Research Centre Julich,52425 Julich,Germany E-mail:g.sutmann@fz-juelich.de An introduction to classical molecular dynamics simulation is presented.In addition to some historical notes,an overview is given over particle models,integrators and different ensemble techniques.In the end,methods are presented for parallisation of short range interaction po- tentials.The efficiency and scalability of the algorithms on massively parallel computers is discussed with an extended version of Amdahl's law. 1 Introduction Computer simulation methods have become a very powerful tool to attack the many-body problem in statistical physics.physical chemistry and biophysics.Although the theoretical description of complex systems in the framework of statistical physics is rather well de- veloped and the experimental techniques for detailed microscopic information are rather sophisticated,it is often only possible to study specific aspects of those systems in great de- tail via the simulation.On the other hand,simulations need specific input parameters that characterize the system in question,and which come either from theoretical considerations or are provided by experimental data.Having characterized a physical system in terms of model parameters.simulations are often used both to solve theoretical models beyond certain approximations and to provide a hint to experimentalists for further investigations. In the case of big experimental facilities it is even often required to prove the potential outcome of an experiment by computer simulations.In that way one can say that the field of computer simulations has developed into a very important branch of science,which on the one hand helps theorists and experimentalists to go beyond their inherent limitations and on the other hand is a scientific field on its own. The traditional simulation methods for many-body systems can be divided into two classes of stochastic and deterministic simulations.which are largely covered by the Monte Carlo(MC)method and the molecular dynamics(MD)method,respectively.Monte Carlo simulations probe the configuration space by trial moves of particles.Within the so-called Metropolis algorithm,the energy change from step n to n+1 is used as a trigger to accept or reject the new configuration.Paths towards lower energy are always accepted,those to higher energy are accepted with a probability governed by Boltzmann statistics.In that way,properties of the system can be calculated by averaging over all Monte Carlo moves (where one move means that every degree of freedom is probed once on average). By contrast,MD methods are governed by the system's Hamiltonian and consequently Hamilton's equations of motion OH OH p=- qi= Opi (1) dgi 211
Classical Molecular Dynamics Godehard Sutmann John von Neumann Institute for Computing Central Institute for Applied Mathematics Research Centre Julich, ¨ 52425 Julich, ¨ Germany E-mail: g.sutmann@fz-juelich.de An introduction to classical molecular dynamics simulation is presented. In addition to some historical notes, an overview is given over particle models, integrators and different ensemble techniques. In the end, methods are presented for parallisation of short range interaction potentials. The efficiency and scalability of the algorithms on massively parallel computers is discussed with an extended version of Amdahl’s law. 1 Introduction Computer simulation methods have become a very powerful tool to attack the many-body problem in statistical physics, physical chemistry and biophysics. Although the theoretical description of complex systems in the framework of statistical physics is rather well developed and the experimental techniques for detailed microscopic information are rather sophisticated, it is often only possible to study specific aspects of those systems in great detail via the simulation. On the other hand, simulations need specific input parameters that characterize the system in question, and which come either from theoretical considerations or are provided by experimental data. Having characterized a physical system in terms of model parameters, simulations are often used both to solve theoretical models beyond certain approximations and to provide a hint to experimentalists for further investigations. In the case of big experimental facilities it is even often required to prove the potential outcome of an experiment by computer simulations. In that way one can say that the field of computer simulations has developed into a very important branch of science, which on the one hand helps theorists and experimentalists to go beyond their inherent limitations and on the other hand is a scientific field on its own. The traditional simulation methods for many-body systems can be divided into two classes of stochastic and deterministic simulations, which are largely covered by the Monte Carlo (MC) method and the molecular dynamics (MD) method, respectively. Monte Carlo simulations probe the configuration space by trial moves of particles. Within the so-called Metropolis algorithm, the energy change from step n to n + 1 is used as a trigger to accept or reject the new configuration. Paths towards lower energy are always accepted, those to higher energy are accepted with a probability governed by Boltzmann statistics. In that way, properties of the system can be calculated by averaging over all Monte Carlo moves (where one move means that every degree of freedom is probed once on average). By contrast, MD methods are governed by the system’s Hamiltonian and consequently Hamilton’s equations of motion p˙i = − ∂H ∂qi , q˙i = ∂H ∂pi (1) 211
are integrated to move particles to new positions and to get new velocities at these new positions.This is an advantage of MD simulations with respect to MC,since not only is the configuration space probed but the whole phase space which gives additional information about the dynamics of the system.Both methods are complementary in nature but they lead to the same averages of static quantities,given that the system under consideration is ergodic and the same statistical ensemble is used. Although there are different methods to obtain information about complex systems, particle simulations always require a model for the interaction between system constitu- ants.This model has to be tested against experimental results,i.e.it should reproduce or approximate experimental findings like distribution functions or phase diagrams,and theoretical constraints,i.e.it should obey certain fundamental or limiting laws like energy conservation. Concerning MD simulations the ingredients for a program are basically threefold: (i)As already mentioned,a model for the interaction between system constituants (atoms, molecules.surfaces etc.)is needed.Often,it is assumed that particles interact only pair- wise,which is exact e.g.for particles with fixed partial charges.This assumption greatly reduces the computational effort and the work to implement the model into the program. (ii)An integrator is needed,which propagates particle positions and velocities from time t to t ot.It is a finite difference scheme which moves trajectories discretely in time.The time step ot has properly to be chosen to guarantee stability of the integrator,i.e.there should be no drift in the system's energy (iii)A statistical ensemble has to be chosen,where thermodynamic quantities like pressure, temperature or the number of particles are controlled.The natural choice of an ensemble in MD simulations is the microcanonical ensemble (NVE),since the system's Hamiltonian without external potentials is a conserved quantity.Nevertheless,there are extensions to the Hamiltonian which also allow to simulate different statistical ensembles. These steps essentially define an MD simulation.Having this tool at hand,it is possible to obtain exact results within numerical precision.Results are only correct with respect to the model which enters into the simulation and they have to be tested against theoretical predictions and experimental findings.If the simulation results differ from the real system properties or are incompatible with solid theoretical manifestations,the model has to be refined.This procedure can be understood as an adaptive refinement which leads in the end to an approximation of a model of the real world at least for certain properties.The model itself may be constructed from plausible considerations,where parameters are cho- sen from neutron diffraction or NMR measurements.It may also result from first principle investigations,like quantum ab initio calculations.Although the electronic distribution of the particles is calculated very accurately,this type of model building contains also some approximations,since many-body interactions are mostly neglected(this would increase the parameter space in the model calculation enormously).However,it often provides a good starting point for a realistic model. An important issue of simulation studies is the accessible time-and length-scale cov- erable by microscopic simulations.Fig.1 shows a schematic representation for different types of simulations in a length-time-diagram.It is clear that the more detailed a simu- lation technique operates,the smaller is the accessibility of long times and large length scales.Therefore quantum simulations,where fast motions of electrons are taken into account,are located in the lower left corner of the diagram and typical length and time 212
are integrated to move particles to new positions and to get new velocities at these new positions. This is an advantage of MD simulations with respect to MC, since not only is the configuration space probed but the whole phase space which gives additional information about the dynamics of the system. Both methods are complementary in nature but they lead to the same averages of static quantities, given that the system under consideration is ergodic and the same statistical ensemble is used. Although there are different methods to obtain information about complex systems, particle simulations always require a model for the interaction between system constituants. This model has to be tested against experimental results, i.e. it should reproduce or approximate experimental findings like distribution functions or phase diagrams, and theoretical constraints, i.e. it should obey certain fundamental or limiting laws like energy conservation. Concerning MD simulations the ingredients for a program are basically threefold: (i) As already mentioned, a model for the interaction between system constituants (atoms, molecules, surfaces etc.) is needed. Often, it is assumed that particles interact only pairwise, which is exact e.g. for particles with fixed partial charges. This assumption greatly reduces the computational effort and the work to implement the model into the program. (ii) An integrator is needed, which propagates particle positions and velocities from time t to t + δt. It is a finite difference scheme which moves trajectories discretely in time. The time step δt has properly to be chosen to guarantee stability of the integrator, i.e. there should be no drift in the system’s energy. (iii) A statistical ensemble has to be chosen, where thermodynamic quantitieslike pressure, temperature or the number of particles are controlled. The natural choice of an ensemble in MD simulations is the microcanonical ensemble (NVE), since the system’s Hamiltonian without external potentials is a conserved quantity. Nevertheless, there are extensions to the Hamiltonian which also allow to simulate different statistical ensembles. These steps essentially define an MD simulation. Having this tool at hand, it is possible to obtain exact results within numerical precision. Results are only correct with respect to the model which enters into the simulation and they have to be tested against theoretical predictions and experimental findings. If the simulation results differ from the real system properties or are incompatible with solid theoretical manifestations, the model has to be refined. This procedure can be understood as an adaptive refinement which leads in the end to an approximation of a model of the real world at least for certain properties. The model itself may be constructed from plausible considerations, where parameters are chosen from neutron diffraction or NMR measurements. It may also result from first principle investigations, like quantum ab initio calculations. Although the electronic distribution of the particles is calculated very accurately, this type of model building contains also some approximations, since many-body interactions are mostly neglected (this would increase the parameter space in the model calculation enormously). However, it often provides a good starting point for a realistic model. An important issue of simulation studies is the accessible time- and length-scale coverable by microscopic simulations. Fig.1 shows a schematic representation for different types of simulations in a length-time-diagram. It is clear that the more detailed a simulation technique operates, the smaller is the accessibility of long times and large length scales. Therefore quantum simulations, where fast motions of electrons are taken into account, are located in the lower left corner of the diagram and typical length and time 212
10000 HD ● 1000 100 BD 10 MD QS fs ps ns us ms time Figure 1.Schematic comparison of time-and length-scales,accessible to different types of simulation techniques(quantum simulations(QM),molecular dynamics (MD),Brow- nian dynamics (BD)and hydrodynamics/fluid dynamics(HD)).The black dots mark the longest(≈1s)and the biggest(N>5×109,L≈0.4 m molecular dynamics simulations by Duan Kollman'and Roth2 respectively.) scales are of order of A and ps.Classical molecular dynamics approximates electronic distributions in a rather coarse-grained fashion by putting either fixed partial charges on interaction sites or by adding an approximate model for polarization effects.In both cases, the time scale of the system is not dominated by the motion of electrons,but the time of intermolecular collision events.rotational motions or intramolecular vibrations,which are orders of magnitude slower than those of electron motions.Consequently,the time step of integration is larger and trajectory lengths are of order ns and accessible lengths of order 10-100 A.If one considers tracer particles in a solvent medium,where one is not inter- ested in a detailed description of the solvent,one can apply Brownian dynamics,where the effect of the solvent is hidden in average quantities.Since collision times between tracer particles is very long,one may apply larger timesteps.Furthermore,since the solvent is not simulated explicitly,the lengthscales may be increased considerably.Finally,if one is in- terested not in a microscopic picture of the simulated system but in macroscopic quantities the concepts of hydrodynamics may be applied,where the system properties are hidden in effective numbers,e.g.density,viscosity,sound velocity. It is clear that the performance of particle dynamics simulations strongly depends on the computer facilities at hand.The first studies using MD simulation techniques were performed in 1957 by B.J.Alder and T.E.Wainright3 who simulated the phase transition 213
fs ps ns µs ms 1 10 100 1000 10000 QS MD BD QS MD BD HD length [ Å] time Figure 1. Schematic comparison of time- and length-scales, accessible to different types of simulation techniques (quantum simulations (QM), molecular dynamics (MD), Brownian dynamics (BD) and hydrodynamics/fluid dynamics (HD)). The black dots mark the longest (≈ 1 µs) and the biggest (N > 5 × 109 , L ≈ 0.4µm molecular dynamics simulations by Duan & Kollman1 and Roth2 respectively.) scales are of order of A˚ and ps. Classical molecular dynamics approximates electronic distributions in a rather coarse-grained fashion by putting either fixed partial charges on interaction sites or by adding an approximate model for polarization effects. In both cases, the time scale of the system is not dominated by the motion of electrons, but the time of intermolecular collision events, rotational motions or intramolecular vibrations, which are orders of magnitude slower than those of electron motions. Consequently, the time step of integration is larger and trajectory lengths are of order ns and accessible lengths of order 10 − 100 A. ˚ If one considers tracer particles in a solvent medium, where one is not interested in a detailed description of the solvent, one can apply Brownian dynamics, where the effect of the solvent is hidden in average quantities. Since collision times between tracer particles is very long, one may apply larger timesteps. Furthermore,since the solvent is not simulated explicitly, the lengthscales may be increased considerably. Finally, if one is interested not in a microscopic picture of the simulated system but in macroscopic quantities, the concepts of hydrodynamics may be applied, where the system properties are hidden in effective numbers, e.g. density, viscosity, sound velocity. It is clear that the performance of particle dynamics simulations strongly depends on the computer facilities at hand. The first studies using MD simulation techniques were performed in 1957 by B. J. Alder and T. E. Wainright3 who simulated the phase transition 213
of a system of hard spheres.The general method,however,was presented two years later" In this early simulation.which was run an an IBM-704,up to 500 particles could be simu- lated,for which 500 collisions per hour could be calculated.Taking into account 200000 collisions for a production run,these simulations lasted for more than two weeks.The propagation of hard spheres in a simulation is determined by the collision events between two particles.Therefore,the propagation is not based on an integration of the equations of motion.but rather the calculation of the time of the next collision.which results in a variable time step in the calculations The first MD simulation which was applied to atoms interacting via a continuous po- tential was performed by A.Rahman in 1964.In this case,a model system for Argon was simulated and not only binary collisions were taken into account but the interactions were modeled by a Lennard-Jones potential and the equations of motion were integrated with a finite difference scheme.This work may be considered as seminal for dynamical calcula- tions.It was the first work where an exact method (within numerical precision)was used to calculate dynamical quantities like autocorrelation functions and transport coefficients like the diffusion coefficient for a realistic system.Also more involved topics like the dy- namic van Hove function and non-Gaussian corrections to diffusion were evaluated.The calculations were performed for 864 particles on a CDC 3600,where the propagation of all particles for one time step took 45 s.The calculation of 50000 timesteps then took more than three weeks!a With the development of faster and bigger massively parallel architectures the accessi- ble time and length scales are increasing.In the case of classical MD simulations it was demonstrated by J.Roth in 1999 on the CRAY T3E-1200 in Juilich that it is possible to simulate more than 5 x 109 particles,corresponding to a length scale of several 1000 A. This was possible with the highly memory optimised MD program IMD5.2,which used the 512 nodes with 256 MB memory each,quite efficiently.However,the limits of such a demonstration became rather obvious,since for a usual production run of 10000 time steps a simulation time of a quarter of a year would be required(given that the whole machine is dedicated to one user).In another demonstration run Y.Duan and P.A.Kollman extended the time scale of an all atom MD simulation to 1 us,where they simulated the folding process of the subdomain HP-36 from the villin headpiece.1.The protein was modelled with a 596 interaction site model dissolved in a system of 3000 water molecules.Using a timestep of integration of 2 x 10-15s,the program was run for 5 x 108 steps.In order to perform this type of calculation,it was necessary to run the program several months on a CRAY T3D and CRAY T3E with 256 processors.It is clear that such kind of simulation is exceptional due to the large amount of computer resources needed,but is nonetheless a kind of milestone pointing to future simulation practices. Classical molecular dynamics methods are nowadays applied to a huge class of prob- lems,e.g.properties of liquids,defects in solids,fracture,surface properties,friction, molecular clusters,polyelectrolytes and biomolecules.Due to the large area of applica- bility,simulation codes for molecular dynamics were developed by many groups.On the internet homepage of the Collaborative Computational Project No.5(CCP5)?a lot of com- puter codes are assembled for condensed phase dynamics.During the last years several programs were designed for parallel computers.Among them,which are partly avail- "On a standard PC this calculation may be done within one hour nowadays! 214
of a system of hard spheres. The general method, however, was presented two years later4 . In this early simulation, which was run an an IBM-704, up to 500 particles could be simulated, for which 500 collisions per hour could be calculated. Taking into account 200000 collisions for a production run, these simulations lasted for more than two weeks. The propagation of hard spheres in a simulation is determined by the collision events between two particles. Therefore, the propagation is not based on an integration of the equations of motion, but rather the calculation of the time of the next collision, which results in a variable time step in the calculations. The first MD simulation which was applied to atoms interacting via a continuous potential was performed by A. Rahman in 1964. In this case, a model system for Argon was simulated and not only binary collisions were taken into account but the interactions were modeled by a Lennard-Jones potential and the equations of motion were integrated with a finite difference scheme. This work may be considered as seminal for dynamical calculations. It was the first work where an exact method (within numerical precision) was used to calculate dynamical quantities like autocorrelation functions and transport coefficients like the diffusion coefficient for a realistic system. Also more involved topics like the dynamic van Hove function and non-Gaussian corrections to diffusion were evaluated. The calculations were performed for 864 particles on a CDC 3600, where the propagation of all particles for one time step took ≈ 45 s. The calculation of 50000 timesteps then took more than three weeks! a With the development of faster and bigger massively parallel architectures the accessible time and length scales are increasing. In the case of classical MD simulations it was demonstrated by J. Roth in 1999 on the CRAY T3E-1200 in Julich ¨ that it is possible to simulate more than 5 × 109 particles, corresponding to a length scale of several 1000 A. ˚ This was possible with the highly memory optimised MD program IMD5, 2 , which used the 512 nodes with 256 MB memory each, quite efficiently. However, the limits of such a demonstration became rather obvious, since for a usual production run of 10000 time steps a simulation time of a quarter of a year would be required (given that the whole machine is dedicated to one user). In another demonstration run Y. Duan and P. A. Kollman extended the time scale of an all atom MD simulation to 1 µs, where they simulated the folding process of the subdomain HP-36 from the villin headpiece6, 1 . The protein was modelled with a 596 interaction site model dissolved in a system of 3000 water molecules. Using a timestep of integration of 2 × 10−15s, the program was run for 5 × 108 steps. In order to perform this type of calculation, it was necessary to run the program several months on a CRAY T3D and CRAY T3E with 256 processors. It is clear that such kind of simulation is exceptional due to the large amount of computer resources needed, but is nonetheless a kind of milestone pointing to future simulation practices. Classical molecular dynamics methods are nowadays applied to a huge class of problems, e.g. properties of liquids, defects in solids, fracture, surface properties, friction, molecular clusters, polyelectrolytes and biomolecules. Due to the large area of applicability, simulation codes for molecular dynamics were developed by many groups. On the internet homepage of the Collaborative Computational Project No.5 (CCP5)7 a lot of computer codes are assembled for condensed phase dynamics. During the last years several programs were designed for parallel computers. Among them, which are partly availaOn a standard PC this calculation may be done within one hour nowadays! 214