The model we used in this work is a three-dimensional lattice polymer.
Monomers that are connected along the chain are constrained to be
nearest-neighbors on the lattice, and only one monomer is allowed per
site. (This is the excluded volume condition.) The chain is then a
self-avoiding walk on the lattice. The polymers are all 27 monomers
long. The maximally compact state is a
cube (see Fig.
). Although it is not feasible to
enumerate all configurations of a 27 monomer chain, it is easy to
enumerate all the compact cubes, of which there are
. If we
choose a potential that favors the formation of contacts, then the
minimum energy conformation will usually be a compact cube. Selecting
such a potential enables us to determine the native structure of a
given sequence by enumeration of the cubes, since for this simple
model the native state is the lowest energy conformation. In addition,
the degeneracy of the lowest energy state can be determined. Since we
are interested in proteinlike polymers which have a ``single'' native
state,[21] we will choose sequences with a nondegenerate
ground state, i.e., those with only one lowest energy conformation.
We want a potential that will favor compact states and cause the chain to fold. The dominant force in protein folding is the hydrophobic effect.[22] This force is a many-body interaction between the hydrophobic side chains and the solvent (water). The main effect is to cause the chain to collapse and create a hydrophobic core. In our simulations we model this effect by using an attractive potential to collapse the chain. This potential favors the formation of contacts between any two monomers. However, we do not want a homopolymer, so the interaction energy is dependent on whether the two monomers in contact are of the same type or not. The potential is given explicitly by
where the sum is over all nearest neighbor pairs on the lattice,
excluding covalently linked pairs. The type of monomer i is
which we will denote with A and B.
is the
interaction matrix given by
is the energy for a contact between monomers of the same type,
and
is for contacts between unlike monomers. To collapse the
chain, we pick both energies to be negative with
, favoring
contacts between monomers of the same type.
We now need to specify the dynamics of the model. For lattice systems there is no unique way to do so, and it has been shown that different move sets may give very different kinetic behavior. In a study of homopolymer folding kinetics, Chan and Dill[10, 11] showed that various kinetic quantities, like the collapse time and the mean first passage time, will depend on the type of moves allowed. Therefore, we wish to choose a set of moves that will give dynamics that are as realistic as possible. Care must be taken in analyzing the results of these simulations. In particular, one must not try to extract too much detail from these types of simulations. The right questions need to be asked. For example, we will look at how folding and collapse time varies with sequence. This is a generic question and the behavior should be universal to all reasonable move sets. A question which would be more difficult (or perhaps not even valid) for this simulation to answer would concern specific details of the folding pathway, for instance, the role of secondary structure formation in folding. One would imagine that depending on the move set used one would find very different answers to this question. To answer such questions, more realistic models with clearly defined dynamics are necessary.
The move set used consists of local moves which preserve the covalent
links and keep each lattice site either singly occupied or empty. This
set was developed some time ago to study the dynamics of
polymers.[23, 24, 25] The allowed moves
consist of end moves in which the ends of the chain move to an empty
adjacent site, corner moves which flip a single monomer and crankshaft
moves which move two monomers simultaneously (see
Fig.
). Studies involving this move set have shown
that it closely reproduces the relaxation dynamics of the Rouse
model.[25, 26] More complex moves are possible where
more than two monomers are moved simultaneously. They have the
advantage of allowing concerted motion of structural elements (like
helices). One must take into account the different rates of these
more complex moves; i.e., a 5 monomer motion should occur more
slowly than the flipping of a single monomer. If this is not taken
into account, the time scales will be distorted since different moves,
all of which can be performed in one iteration of the algorithm, will
have different ``physical'' times. To correct for this, one can assign
a different probability for each of the
moves.[4, 3] Since our chains are relatively
short, we use only the simple one and two monomer moves. We do not
believe that including the possibility of concerted motions of large
subsections of the polymer will change the answers to the questions
asked here.
There is one important comment to be made about this set of
moves--they are not ergodic. In particular, it is not possible to
reach the configuration in Fig.
from an extended
chain.[27] The question is whether the nonergodicity of
our moves will create a problem. The simple answer to this problem is
that we are interested in the kinetics of folding and as long as the
native state is accessible, there should no problems due to the
existence of inaccessible states. In particular, we can view these
states as being irrelevant in the same way that highly unlikely states
are irrelevant for real proteins. The chain in Fig.
actually forms a knot. We know that real proteins do not have
``tight'' knots,[28, 29] and it is highly unlikely that,
in folding, a protein passes through a knotted state. Strictly
speaking, it is not impossible for a protein to become knotted; it is
just unlikely. Due to the constraints of the lattice, the knotted
state is now inaccessible rather than unlikely; however, its existence
will have no effect on the folding properties. One may argue that
these inaccessible states may still affect thermodynamic calculations.
In particular, what will be the effect of the fact that our moves
restrict us to a ergodic subspace of the full phase (conformation)
space of the system? That will depend on the relative sizes of the
excluded space. In practice, for small chains, the errors introduced
by the nonergodicity of this move set is smaller than the statistical
error. Comparison with other ergodic algorithms show no change in the
results.[25]
A move is made using the Metropolis Monte Carlo algorithm.[30] A monomer is selected at random. If it is an end monomer, then one of the neighboring lattice point is also selected at random. If it is not an end piece, then it can do either a corner move or a crankshaft move, depending on the position of its neighbors along the chain. In the case of the crankshaft, the possible direction is also selected at random. If the move selected would violate the excluded volume constraint by moving the monomer to an occupied site, the old configuration is counted once more in averaging (i.e., a step is considered to have elapsed), and a new monomer is picked. If a move is possible, that is, if the lattice site is empty, then the energy of the new conformation is calculated and compared to the original energy. If the energy decreases, then the move is accepted unconditionally. If the energy increases, then the move is accepted with the usual Boltzmann probability:
where
and
are the new and old energies,
respectively, and T is the temperature. Note that we have chosen
units such that the Boltzmann's constant is unity (
).
Whether a move is accepted or not, one unit of time (a Monte Carlo
step) is considered to have elapsed.
There is no simple and direct connection between Monte Carlo steps and the physically relevant times scales of the system. One important result of this work is that we will determine the mapping between the physically important time scales such as the folding time and the computation time scales (Monte Carlo steps). This will be useful in later works in which we will study the thermodynamics of these systems where it will be necessary to know how long it takes for systems to reach equilibrium and explore conformation space. Once again, the precise connection between physical time (like the folding time) and Monte Carlo steps will depend on the details of the move set used. However, we expect that as long as the moves are chosen with care, that is, one attempts to make it as physically realistic as possible, then the qualitative features will remain the same. For example, the behavior of folding time as a function of temperature will look qualitatively the same although the exact folding time (number of steps) will vary.