In this work we studied six sequences, all 27 monomers long. Four were
selected by the following procedure. First a sequence was generated at
random with the appropriate ratio of monomer types. We then enumerated
all cubes calculating the number of minimum energy states. Sequences
with degenerate minimum energy states were rejected. From the
remaining nondegenerate sequences we picked four which had a spread
in energy of the native states (-82 to -76). One sequence was
obtained by changing a single monomer in one of the original four;
i.e., it is a single-site mutation, which lowered the ground
state energy, from -82 to -84. The last sequence was taken from a
paper by Shakhnovich[15] which gave a method for
finding optimal sequences; it also has a ground state energy of -84.
Table
shows the various sequences along with some data
for each. All six sequences have the same ratio of monomer types,
roughly 50:50 (14:13, actually). For these simulations the value for
the contact energies are -3 for monomers of the same type and -1
for unlike monomers. Again, the units used are such that
.
The maximally compact conformation of a 27 monomer chain is the
cube. There are 28 noncovalent or
topological contacts in this cube, so the lowest possible energy is
-84. Two of the sequences have this energy for their minimum
conformation. This corresponds to a completely ``unfrustrated''
ground state. By ``unfrustrated,'' we mean that the ground state has
no topological contacts between monomers of different types. The other
three sequences have ground state energies higher than -84 and,
consequently, their ground states have at least one bad topological
contact and consequently are frustrated energetically.
Using the Monte Carlo moves described above, we simulated the folding
of these sequences and examined how the folding behavior depends on
temperature and differs from sequence to sequence. For each sequence
we started with a random, completely unfolded, initial conformation.
Here, completely unfolded means that there are no contacts between any
of the monomers. A temperature was selected, and the sequence was
allowed to fold. Once the sequence found its folded state (which we
detected by monitoring the energy), we stopped the simulations. The
simulations were also stopped if the sequence did not find its folded
state within
steps. For the simulations in this
work,
Monte Carlo steps. This
maximum time was picked both so that it was longer than the typical
folding time of most sequences and to minimize the actual computer
time used in the simulations. It is important to realize that any
times longer than
are undefined. Ideally, we
would want to pick
to be longer than any
interesting and relevant physical or biological time scale. Since
there is no simple connection between Monte Carlo steps and physical
time, we cannot directly determine
. We chose a
first value for
which seemed reasonable and then
made sure that it was much longer than the folding time for the
various sequences.
At each temperature we ran many simulations, each with a different
random initial condition (always unfolded). We then calculated an
average folding time (
) from these runs. This time is
the mean first passage time from the set of unfolded initial states to
the folded state. Figure
shows
as a
function of temperature. Once again, the units of temperature and
energy have been chosen so that
. We ran anywhere from 10
to 600 simulations at each temperature and calculated the average
folding time. If the folded state was not found within
steps, we averaged in
for that run.
The error bars are the standard deviation of the mean given by
, where
is the standard deviation of the
distribution of folding times and N is the number of runs at that
temperature. It is important to note that since we average in
when the chain does not fold, the error bars are
not as meaningful at temperatures where the folding time approaches
and may be much larger at these temperatures. In
particular, at high and low temperature the points equal
with zero error. That is simply due to the fact that at
those temperatures the simulations never found the folded state.
Figure
shows the fraction of times the folded state
was found as a function of temperature. It has a maximum plateau over
the same temperature range that
has a minimum plateau.
At temperatures where the chain folds rapidly, it also finds its
native state 100% of the time. At temperatures where the simulations
did not find the folded state all the time, the
shown
in Fig.
is a lower bound to the actual mean first
passage time. Figure
shows the distribution of
folding times for three temperatures. At the temperature of fastest
folding (T=1.58) the distribution of times is narrowest. For
temperatures above and below this the distribution becomes quite
broad. All three histograms appear roughly Poissonian. The standard
deviations are approximately equal to the means.
We observe three different temperature regions, similar to those found
in two-dimensional lattice simulations.[9, 11] Above a
temperature of
and below
, the chains did not fold
within
steps. Between these temperatures the
folding time drops rapidly to approximately
-
. The fraction of runs that find the
folded state increases sharply from 0 to 1 in this temperature range.
In the next plot (Fig.
) the folding time is
plotted along with the chain compaction time. The compaction time is
simply the number of steps it takes for an unfolded state to reach a
maximally compact cube. In addition, we also show a time to reach a
nearly compact state, which we define to be a conformation with 25
(out of 28) contacts. The behavior of the compaction time as a
function of temperature is similar to that of the folding time, but
chains compact much faster than they fold. This behavior is similar to
what is believed to occur in real proteins in which the chain first
folds rapidly to a compact state and then rearranges itself to the
native structure.
Above a temperature of approximately 5, the compaction time
approaches
. Above this temperature the
free-energy is dominated by the entropic term which favors non compact
conformations which are far more numerous than compact ones. In the
range from 5-2, we observe the following interesting behavior.
The chains compact fairly rapidly, but the folding time is still quite
long. In particular, at about
the chains compacted easily
but never folded within
steps. We can draw a
parallel between this state and the molten globule state of
proteins.[31] At this temperature range, the temperature
is low enough so that the potential, which favors contacts, drives the
chain to a compact conformation, but the temperature is still too high
for the potential to drive the chain to the native state. In this
range we can imagine the chain is fluctuating about various compact
states, ``randomly'' searching for the native state. This is like the
often-discussed Levinthal paradox in which it is argued that a protein
could not find its folded structure by random search. If the
temperature is high enough there is no strong driving force that
favors the native state. When the temperature is low enough, the
chain is no longer randomly searching compact conformations but is
driven to the folded state. This is, of course, the well-known
resolution to the paradox. At the appropriate temperatures proteins
do not randomly search for their native state but are directed to it
by the shape of the free energy surface.
At still lower temperatures both the folding and compaction time start
to increase again. At temperatures slightly less than 1 the folding
time reaches
again and at a temperature of
roughly .63 the compaction time approaches
. At
low temperatures the system is beginning to slow down, kinetically,
and is now getting trapped in local meta-stable states. Even
though we expect, at these low temperatures, the free energy to have a
very pronounced minimum at the native state, the system is unable to
reach it within a reasonable time. This region is often referred to as
the glass phase. We can define a temperature at which the system
undergoes a glass transition characterized by the slowing down of
various times, such as the folding and compaction times. The
autocorrelation time of the system would also increase in this
temperature region, indicating that the chain was locally trapped. We
define the glass transition temperature (
) as the temperature at
which the folding time is half way between
and
(where
is the fastest
folding time observed). Using this definition we get
.
Note that the definition of glass temperature is not the usual
thermodynamic definition of temperature. It is not determined by the
inverse of the derivative of entropy with respect to energy (
) at the point where the entropy
``vanishes.''[32] One difficulty with this definition is
its relation to the kinetics of the system. The idea is that as a
system is taken out of equilibrium, then time it takes to relax back
will increase as the temperature gets closer to
. To avoid this
kind of assumption, we have given a kinetic definition for
in
which we will explicitly look for a slowing down of the system. We
would expect the precise value of
to depend on the moves used
and therefore we will not focus on the exact value but on the relative
value. In particular later we will compare
to another important
temperature, the folding temperature (
), and we will discover a
key relation between the two. Additionally
depends on the value
of
; that is, it will depend on how long we run
our simulations. This is a subtle but extremely important fact to
remember when studying finite sized systems. When talking about
glasslike behavior of a finite system the notion of glasslike depends
on the time scale you are looking at. If you wait long enough the
chain will always find the native state. To speak of a physically
meaningful glass transition one must define the physical time scales
of interest. The time scales of importance here are related to the
minimum folding time of a good folding sequence. We want to examine
our system on a time scale that is reasonable greater then the minimum
folding time. For our simple system there is no obvious greater time
to pick. For real proteins the life time of the organism would be a
reasonable choice, since proteins need to fold on a time scale much
shorter than this to be useful. We picked a time that was roughly two
orders of magnitude greater than the folding time. Since this is
somewhat arbitrary, we investigated how
changed as
is varied. Figure
shows the results of
several simulations in which
was varied from one
fourth the usual value (
) to almost twice this
value. We see that although
decreases with increasing
it does so quite slowly. The difference between
the last two is only about 4%; therefore,
is not too sensitive
to the precise value of
.
The glass temperature just defined is related to the folding of the
chains. One can also define a glass temperature that has to do with
the slowdown of compaction. This would be the temperature at which the
time it takes the chain to form a cube (28 contacts) is half way
between the maximum and minimum times. Call this temperature
. Examining figure
we see that
is less than
. (It is approximately equal to 0.7.) One could
also consider
the glass temperature for forming 25 contacts
(which is lower still). In general the transition temperature will be
a function of some parameter,
, which is a measure of the
compactness of the chain and/or similarity to the native state.
Bryngelson and Wolynes[18] first calculated
in their random energy model.
The two regions in which the chain fails to fold are qualitatively very different. At high temperatures the energy differences between conformations becomes negligible so all conformations have roughly equal probabilities. The chain is randomly exploring the conformation space. It takes a long time to find the native state by random search due to the vast number of conformations. The free energy is dominated by the entropic term so the native state is no longer the global minimum. At low temperatures the energy differences between states becomes important and the folded state is the global minimum free energy. The problem now is that the barriers between states are too high and at low temperatures there is a very small probability for crossing them. For compact conformations many moves will involve the breaking of contacts which at low temperatures becomes unlikely. In particular, moves that break more than one contact are much less probable that those that break only one. Instead of a random search the chain is now forced in to a very narrow kinetic pathway consisting of those steps with very small free energy barriers. The chain gets trapped in the many local minima.
If we were willing to wait long enough the system would eventually fold. Since our system is finite, the system always has a finite nonzero probability to find the native state. The same is true for the high-temperature case: If we wait long enough, the chain will eventually find the folded state. However, one must remember that at those temperatures the folded state is not the free energy minimum and is therefore not stable. For example, consider the following two temperatures: 2.24 and 1.12. The folding time for these two temperatures is roughly the same. At the higher temperature (as we will see shortly) the chain spends almost zero time in the folded state (less than 0.04%). At the lower temperature the chain spends roughly 77% of the time in the folded state. When we speak of folding time, this is simply the time it takes the chain to find the native conformation. There is another important factor here: Namely, is the folded state stable thermodynamically? We will address this issue at the end of this paper, where we see that it is not enough that a chain find its native state in a short time, but it must do so at temperatures where the native state is thermodynamically stable.
Let us return to the question of how long is too long to wait for a sequence to fold. Too long is in general determined by other time scales in the system. For proteins, there are a number of biologically relevant time scales, the lifespan of the organism for example. Proteins that do not fold fast enough on this time scale can be considered not to fold at all. Since we are studying a simple artificial system there is no a priori time scale to pick, other than limits on the simulation (computer) time. One of the problems with Monte Carlo dynamics is that there is no easy way to ``calibrate'' them, that is to make a connection between Monte Carlo steps and ``real'' physical or biological time. What we have done here is to define the relevant time scale as the folding time (or the compaction time) and make sure we ran simulations for long enough that we could see the the variation of folding time as a function of temperature.
We have examined the folding (and compaction) time as a function of
temperature for one sequence. We now would like to see how this
function varies from sequence to sequence. Figure
show a plot of the folding time and compaction time versus temperature
for several sequences. From this figure we notice two very interesting
features of this model. First the compaction time is sequence
independent. All six sequences have roughly the same compaction time
for temperatures above
. In contrast the folding time is highly
sequence dependent. At a temperature of roughly 1.6 there is a
difference of nearly an order of magnitude between the fastest and
slowest folding sequence. The folding time is also roughly correlated
with the energy of the folded state. The lower the energy, the faster
the folding time. However, the relation between the energy of the
native state and the folding time is not a simple one. For example the
two sequences with the lowest energy folded states (sequence 002 and
004, see Table
) have different folding times. The
difference is slight but sequence 004 has a consistently faster
folding time at all temperatures. Sequence 005, which is a single
monomer mutation of 004, has a higher ground state energy (-82),
but its folding times are very close to those of the lower energy
sequence 002. There also appears to be a fairly large difference
between the sequences that have energies below -81 and those that
have energies above.
Note that the collapse time is always much faster than the folding
time for all sequences. Even sequences that fold slowly collapse as
rapidly as the fast-folding ones. This sequence-independent property
of the collapse time is often referred to as a self-averaging
property; it does not depend on the specifics of the sequence but
rather on the general character of the ensemble of sequences. It is
important to remember here that we are choosing a restricted ensemble
of sequences though, namely the subset of sequences with a particular
ratio of monomer types (a ratio of 14:13). Sequences that contain a
different composition of monomers may have different collapse times
than the sequences used here. The folding time is not self-averaging;
i.e., it depends on the sequence. So we can view the kinetics of
folding as a two-stage process. The first involves a rapid collapse of
the polymer. The nature of this collapse is sequence independent. We
can picture the polymer in this collapsed state as fluctuating about
various compact cube states. This picture has been advanced previously
by others.[33] The next step is a medium-to-slow event in
which the polymer searches for the minimum energy state among the
compact states. The time it takes for the polymer to find its minimum
state depends on the specifics of the sequence. The two-phase collapse
with two distinct time scales has also been observed for real
proteins.[34, 35, 36] The first phase is a
rapid collapse in which a hydrophobic core is formed. We should
expect this collapse to be independent of the specific sequence,
depending on the ratio of hydrophobic to hydrophilic monomers. This
collapsed state then undergoes rearrangement to the folded structure
of the specific sequence. The collapse time below the glass
temperature loses its self-averaging property. Examining
Fig.
we see that below
the
collapse time is no longer sequence independent. In the glass region
we would expect the kinetics to depend on the details of the energy
surfaces and these details will be sequence dependence. This is
expected of a system exhibiting glassy behavior. Note how this
contrasts with the high temperature limit, where the collapse times
remain sequence independent even as they approach
.
At this point one may be tempted to conclude that we have two types of
sequences: Fast folders and slow folders. However this is not the
case. In reality what we have are sequences that fold and sequences
that do not. In order to see this we need to look at the
thermodynamics of these systems. In particular we need to look at the
thermodynamic stability of the native state as a function of
temperature. To do so, we performed a series of thermodynamic runs
using the same Monte Carlo algorithm described above. The system was
equilibrated by first running it for 100 million steps, which is on
the order of the folding time for most of the sequences. Care must be
taken at low temperatures since near the glass transition the system
will slow down; i.e., the auto-correlation time will diverge. We
looked at temperatures above
to avoid this problem. We calculate
the
following thermodynamic quantity:
where
is the energy of the native state and Z is the
partition function. This quantity is the probability that the system
is in the native state; that is, it is folded. We define the folding
temperature as the temperature at which
; that
is when the folded state is half occupied. Note that
is a sufficient condition that the native state be the
global minimum of the free energy. Four sequences were used, each with
a different minimum energy ranging from -84 to -76. Several
simulations were run at a number of different temperatures.
Figure
shows the results. In addition to running
simulations at different temperatures, we also used the Monte Carlo
histogram method[37] to calculate the function
at temperatures other then the simulation temperature.
Using histograms collected from simulations run at T=1.58, we were
able to calculate
for all temperatures, extrapolating
into the glass region. These lines are plotted along with the points
calculated from the standard Monte Carlo runs.
The folding temperature,
, varies with the value of the
minimum-energy state. The lower the minimum energy the higher the
folding temperature. More importantly we see that the two lower energy
sequences have folding temperatures above the glass temperature,
, while the others have
. At
the lowest energy
sequence (which also folds the fastest) is 90% in the native state.
The highest energy sequence has a native state population of only
15%. At temperatures at which the folded state of the high-energy
sequence is thermodynamically stable this state is not kinetically
accessible. Therefore we would say this sequence does not fold. In
order for a sequence to be foldable it must meet two conditions.
First, it must have a reasonably fast folding time, where by
reasonable we mean on the relevant (biological) time scales, and
second, the folding temperature must be above the glass temperature.
The analogous situation would be a polypeptide which had a folding
temperature below the freezing point of the solvent. Such a protein
would not be considered foldable.