MD:The Limits and Beyond 23 1.0 0.20 0.5 0.10 cumulative contribution 0.000-51015 20 253035 4045 5000 eigenvector index m (a)Eigenvector magnitudes in descending order (left scale)and cumulative contribution to total fluctuation (right scale) 0.150 0.100 (wu) 0.050 品 0.000- -0.050 -0.100 -0.1500.30 -0.20-0.10’ 0.00 010 0.200.30 eigenvalue 20(nm) m (b)Fluctuation in the plane of the 20th and 50theigenvector,showing virtually complete sampling of independent gaussian distributions. Fig.4.(a)Eigenvalues and distributions for the a-carbon atoms in the 56-residue BI-domain of streptococcal protein G,from a 1 ns MD simulation in water(courtesy of Bert de Groot,Groningen)
24 Berendsen the total fluctuation.After the,say,10th eigenvector the distribution of the fluctuation is rather gaussian and uninteresting;such motions could be con- sidered as near-constraints'and good candidates for reduction of complexity (see fig.4). With the very nonlinear force field expressed in all coordinates,it seems intractable to reformulate the potential energy surface as a function of the essential degrees of freedom,while treating the other d.o.f.as full constraints. What is possible is to drive the dynamics in a space of a few(2 or 3)degrees of freedom [139,140]so as to expand the sampled volume.This is done by preventing a MD step to decrease the distance in that space with respect to a given reference point.In this way the borders of the accessible region are probed and a much more effective sampling is obtained than with normal MD;in the case of a cyclic polypeptide in solution the full conformational space could be sampled with this method [140].The method is useful to characterize different type of motions,e.g.to distinguish between functional motions and motions that relate to instability and early unfolding processes [141. If proteins are so full of internal 'near-constraints',we may take the anal- ysis even a step further and investigate whether the protein is built from building blocks that can be approximated as rigid bodies.If there are n rigid building blocks,there are at most 6(n-1)internal degrees of freedom,most of which are likely to be additionally constrained.Recently Hayward [142,143] has devised an automatic procedure that detects rigid bodies and character- izes the mutual motion of each pair,given at least two different conformations of the protein.These conformations can either be obtained from X-ray data, or from an essential dynamics analysis of a MD simulation. The analysis (in the case of two structures)starts by a translational- rotational fit of the two structures and constructing the displacement vectors of all backbone atoms.Considering these as samples of a vector field,the curl of that vector field is computed,as sampled by each aminoacid.Thus a collection of rotation vectors is obtained.If a rigid body exists,the rotation vectors of all aminoacids in that body are equal,and different from rotation vectors in other rigid bodies.A standard cluster analysis on the rotation vectors,using the ratio of external to internal motion as a discrimination criterion,is then carried out.This yields a subdivision of the protein in semi- rigid bodies (if they exist)and identifies the links between them.The type of motion of one rigid body with respect to another is then analysed in terms of a unique axis,and such (hinge bending)motions can be objectively characterized as closing or twisting motions (Fig.5). Interestingly,there are many proteins with two domains that show a very clear hinge-bending motion with an obvious functional significance.Such do- mains have often been reported in the literature,but were never detected on an automated basis
MD:The Limits and Beyond 25 Fig.5.Rigid-body analysis of citrate synthase,using two X-ray structures (after Hayward and Berendsen,Proteins 30(1998)144).The decomposition of the protein into two domains(dark gray and white)and two interconnecting regions(light gray) is shown,together with the hinge axis for the closing/opening motion between them. Mesoscopic Dynamics After suitable collective degrees of freedom have been defined,their equations of motion can be systematically derived by averaging over equilibrium dynamics of the other degrees of freedom.There is a hierarchy of methods of the generalised Langevin dynamics type,with details that depend on the kind of averaging.Coarse-graining in time leads first to neglect of the time-dependent details of the stochastic contribution and then to the neglegt of inertial terms,resulting in Brownian dynamics. Methods of this type are still under development. When a system contains gradients that are small over atomic distances,a big leap in time and space can be taken by describing the collective behaviour of species s (as a molecule or a collection of atoms in a molecule)that interact with neighbours through a simple hamiltonian(e.g.representing a gaussian chain)and feel the environment through a mean field defined by the local chemical potential us(r).The variables are the space-and time-dependent deusities of the specics ps.What in fact is done,is averaging locally over
26 Berendsen nearby space and temporally over short times.The resulting method rests on the classical density functional formalism of Ginzburg and Landau [144];its dynamical form 145,149 is called mesoscopic dynamics. Fig.6.Snapshot from a dynamic density functional simulation of the self- organisation of the block copolymer PL64 (containing 30 propylene oxide and 26 ethylene oxide units:(EO)13(PO)30(EO)13)in 70%aqueous solution.The simula- tion was carried out during 6250 time steps on a 64 x 64 x 64 grid (courtesy of B.A.C.van Vlimmeren and J.G.E.M.Fraaije,Groningen). If there are no reactions,the conservation of the total quantity of each species dictates that the time dependence of ps is given by minus the diver- gence of the flux ps(vs),where (Us)is the drift velocity of the species s.The latter is proportional to the average force acting locally on species s,which is the thermodynamic force,equal to minus the gradient of the thermody- namic potential.In the local coupling approximation the mobility appears as a proportionality constant M.For spontaneous processes near equilibrium it is important that a noise term n(t)is retained [146.Thus dynamic equations of the form ∂p =MV·pVu+n Ot are obtained (see [145,147,149]for details).The chemical potentials are derived from the functional derivative of the total free energy F,which itself is a functional of the density distribution:
MD:The Limits and Beyond 27 6 hs=δps Fpl, The method has severe limitations for systems where gradients on near- atomic scale are important (as in the protein folding process or in bilayer membranes that contain only two molecules in a separated phase),but is extremely powerful for (co)polymer mixtures and solutions [147,148,149]. As an example Fig.6 gives a snapshot in the process of self-organisation of a polypropylene oxide-ethylene oxide copolymer PL64 in aqueous solution on its way from a completely homogeneous initial distribution to a hexagonal structure. 4 Conclusions In this contribution I have tried to give a,necessarily subjective,review of the history,state-of-the-art,and future development of molecular dynamics simulations.The method is mature,but has not yet realized the full potential of the best algorithms,long-range interactions,and transferable,polarisable force fields.The near future will undoubtedly see the proper and efficient incorporation of quantum methods where needed,and an explosion in the simulation of reactive events,including enzyme reactions,is expected. For the simulation of long-time and large-scale events a systematic hier- archy of models is in principle available,ranging from atomic detail to fluid dynamics,but there are many gaps to be filled and some of this territory is still unknown.On a mesoscopic scale promising methods have been devel- oped,and I am optimistic that the gap between the atomic and mesoscopic scale will be filled in.But my optimism ends with the question whether one will ever be able to fold a protein reliably by ab initio simulation,starting from an aminoacid sequence alone and using only physical principles without resorting to database knowledge.For the most difficult questions one must be practical and exploit all available information,whether derived from theory or from experimental data in the broadest sense. Acknowledgements My initiation in the field of MD rests with the late Anees Rahman,whose insight and integrity has stimulated many of us,but who died far too young. Many CECAM activities would not have been realized without the stimulus provided by Carl Moser.The work in our laboratory has involved many col- laborators over the years,all of whom I thank for their contributions,but with special thanks to the seniors'Wilfred van Gunsteren and Hans Fraaije. I have quoted work of Andrea Amadei,Bert de Groot,Steven Hayward,Berk Hess,Marc Lensink,Siewert-Jan Marrink,Janez Mavri,David van der Spoel, and Bernard van Vlimincren:thank you all