next up previous
Next: CONCLUSIONS Up: Folding kinetics of proteinlike Previous: MODEL AND METHODS

RESULTS AND DISCUSSION

In this work we studied six sequences, all 27 monomers long. Four were selected by the following procedure. First a sequence was generated at random with the appropriate ratio of monomer types. We then enumerated all cubes calculating the number of minimum energy states. Sequences with degenerate minimum energy states were rejected. From the remaining nondegenerate sequences we picked four which had a spread in energy of the native states (-82 to -76). One sequence was obtained by changing a single monomer in one of the original four; i.e., it is a single-site mutation, which lowered the ground state energy, from -82 to -84. The last sequence was taken from a paper by Shakhnovich[15] which gave a method for finding optimal sequences; it also has a ground state energy of -84. Table gif shows the various sequences along with some data for each. All six sequences have the same ratio of monomer types, roughly 50:50 (14:13, actually). For these simulations the value for the contact energies are -3 for monomers of the same type and -1 for unlike monomers. Again, the units used are such that tex2html_wrap_inline438 . The maximally compact conformation of a 27 monomer chain is the tex2html_wrap_inline372 cube. There are 28 noncovalent or topological contacts in this cube, so the lowest possible energy is -84. Two of the sequences have this energy for their minimum conformation. This corresponds to a completely ``unfrustrated'' ground state. By ``unfrustrated,'' we mean that the ground state has no topological contacts between monomers of different types. The other three sequences have ground state energies higher than -84 and, consequently, their ground states have at least one bad topological contact and consequently are frustrated energetically.

Using the Monte Carlo moves described above, we simulated the folding of these sequences and examined how the folding behavior depends on temperature and differs from sequence to sequence. For each sequence we started with a random, completely unfolded, initial conformation. Here, completely unfolded means that there are no contacts between any of the monomers. A temperature was selected, and the sequence was allowed to fold. Once the sequence found its folded state (which we detected by monitoring the energy), we stopped the simulations. The simulations were also stopped if the sequence did not find its folded state within tex2html_wrap_inline446 steps. For the simulations in this work, tex2html_wrap_inline448 Monte Carlo steps. This maximum time was picked both so that it was longer than the typical folding time of most sequences and to minimize the actual computer time used in the simulations. It is important to realize that any times longer than tex2html_wrap_inline446 are undefined. Ideally, we would want to pick tex2html_wrap_inline446 to be longer than any interesting and relevant physical or biological time scale. Since there is no simple connection between Monte Carlo steps and physical time, we cannot directly determine tex2html_wrap_inline446 . We chose a first value for tex2html_wrap_inline446 which seemed reasonable and then made sure that it was much longer than the folding time for the various sequences.

At each temperature we ran many simulations, each with a different random initial condition (always unfolded). We then calculated an average folding time ( tex2html_wrap_inline458 ) from these runs. This time is the mean first passage time from the set of unfolded initial states to the folded state. Figure gif shows tex2html_wrap_inline458 as a function of temperature. Once again, the units of temperature and energy have been chosen so that tex2html_wrap_inline438 . We ran anywhere from 10 to 600 simulations at each temperature and calculated the average folding time. If the folded state was not found within tex2html_wrap_inline446 steps, we averaged in tex2html_wrap_inline446 for that run. The error bars are the standard deviation of the mean given by tex2html_wrap_inline468 , where tex2html_wrap_inline470 is the standard deviation of the distribution of folding times and N is the number of runs at that temperature. It is important to note that since we average in tex2html_wrap_inline446 when the chain does not fold, the error bars are not as meaningful at temperatures where the folding time approaches tex2html_wrap_inline446 and may be much larger at these temperatures. In particular, at high and low temperature the points equal tex2html_wrap_inline446 with zero error. That is simply due to the fact that at those temperatures the simulations never found the folded state. Figure gif shows the fraction of times the folded state was found as a function of temperature. It has a maximum plateau over the same temperature range that tex2html_wrap_inline458 has a minimum plateau. At temperatures where the chain folds rapidly, it also finds its native state 100% of the time. At temperatures where the simulations did not find the folded state all the time, the tex2html_wrap_inline458 shown in Fig. gif is a lower bound to the actual mean first passage time. Figure gif shows the distribution of folding times for three temperatures. At the temperature of fastest folding (T=1.58) the distribution of times is narrowest. For temperatures above and below this the distribution becomes quite broad. All three histograms appear roughly Poissonian. The standard deviations are approximately equal to the means.

We observe three different temperature regions, similar to those found in two-dimensional lattice simulations.[9, 11] Above a temperature of tex2html_wrap_inline486 and below tex2html_wrap_inline488 , the chains did not fold within tex2html_wrap_inline446 steps. Between these temperatures the folding time drops rapidly to approximately tex2html_wrap_inline492 - tex2html_wrap_inline494 . The fraction of runs that find the folded state increases sharply from 0 to 1 in this temperature range. In the next plot (Fig. gif) the folding time is plotted along with the chain compaction time. The compaction time is simply the number of steps it takes for an unfolded state to reach a maximally compact cube. In addition, we also show a time to reach a nearly compact state, which we define to be a conformation with 25 (out of 28) contacts. The behavior of the compaction time as a function of temperature is similar to that of the folding time, but chains compact much faster than they fold. This behavior is similar to what is believed to occur in real proteins in which the chain first folds rapidly to a compact state and then rearranges itself to the native structure.

Above a temperature of approximately 5, the compaction time approaches tex2html_wrap_inline446 . Above this temperature the free-energy is dominated by the entropic term which favors non compact conformations which are far more numerous than compact ones. In the range from 5-2, we observe the following interesting behavior. The chains compact fairly rapidly, but the folding time is still quite long. In particular, at about tex2html_wrap_inline504 the chains compacted easily but never folded within tex2html_wrap_inline446 steps. We can draw a parallel between this state and the molten globule state of proteins.[31] At this temperature range, the temperature is low enough so that the potential, which favors contacts, drives the chain to a compact conformation, but the temperature is still too high for the potential to drive the chain to the native state. In this range we can imagine the chain is fluctuating about various compact states, ``randomly'' searching for the native state. This is like the often-discussed Levinthal paradox in which it is argued that a protein could not find its folded structure by random search. If the temperature is high enough there is no strong driving force that favors the native state. When the temperature is low enough, the chain is no longer randomly searching compact conformations but is driven to the folded state. This is, of course, the well-known resolution to the paradox. At the appropriate temperatures proteins do not randomly search for their native state but are directed to it by the shape of the free energy surface.

At still lower temperatures both the folding and compaction time start to increase again. At temperatures slightly less than 1 the folding time reaches tex2html_wrap_inline446 again and at a temperature of roughly .63 the compaction time approaches tex2html_wrap_inline446 . At low temperatures the system is beginning to slow down, kinetically, and is now getting trapped in local meta-stable states. Even though we expect, at these low temperatures, the free energy to have a very pronounced minimum at the native state, the system is unable to reach it within a reasonable time. This region is often referred to as the glass phase. We can define a temperature at which the system undergoes a glass transition characterized by the slowing down of various times, such as the folding and compaction times. The autocorrelation time of the system would also increase in this temperature region, indicating that the chain was locally trapped. We define the glass transition temperature ( tex2html_wrap_inline374 ) as the temperature at which the folding time is half way between tex2html_wrap_inline446 and tex2html_wrap_inline518 (where tex2html_wrap_inline518 is the fastest folding time observed). Using this definition we get tex2html_wrap_inline522 . Note that the definition of glass temperature is not the usual thermodynamic definition of temperature. It is not determined by the inverse of the derivative of entropy with respect to energy ( tex2html_wrap_inline524 ) at the point where the entropy ``vanishes.''[32] One difficulty with this definition is its relation to the kinetics of the system. The idea is that as a system is taken out of equilibrium, then time it takes to relax back will increase as the temperature gets closer to tex2html_wrap_inline374 . To avoid this kind of assumption, we have given a kinetic definition for tex2html_wrap_inline374 in which we will explicitly look for a slowing down of the system. We would expect the precise value of tex2html_wrap_inline374 to depend on the moves used and therefore we will not focus on the exact value but on the relative value. In particular later we will compare tex2html_wrap_inline374 to another important temperature, the folding temperature ( tex2html_wrap_inline376 ), and we will discover a key relation between the two. Additionally tex2html_wrap_inline374 depends on the value of tex2html_wrap_inline446 ; that is, it will depend on how long we run our simulations. This is a subtle but extremely important fact to remember when studying finite sized systems. When talking about glasslike behavior of a finite system the notion of glasslike depends on the time scale you are looking at. If you wait long enough the chain will always find the native state. To speak of a physically meaningful glass transition one must define the physical time scales of interest. The time scales of importance here are related to the minimum folding time of a good folding sequence. We want to examine our system on a time scale that is reasonable greater then the minimum folding time. For our simple system there is no obvious greater time to pick. For real proteins the life time of the organism would be a reasonable choice, since proteins need to fold on a time scale much shorter than this to be useful. We picked a time that was roughly two orders of magnitude greater than the folding time. Since this is somewhat arbitrary, we investigated how tex2html_wrap_inline374 changed as tex2html_wrap_inline446 is varied. Figure gif shows the results of several simulations in which tex2html_wrap_inline446 was varied from one fourth the usual value ( tex2html_wrap_inline546 ) to almost twice this value. We see that although tex2html_wrap_inline374 decreases with increasing tex2html_wrap_inline446 it does so quite slowly. The difference between the last two is only about 4%; therefore, tex2html_wrap_inline374 is not too sensitive to the precise value of tex2html_wrap_inline446 .

The glass temperature just defined is related to the folding of the chains. One can also define a glass temperature that has to do with the slowdown of compaction. This would be the temperature at which the time it takes the chain to form a cube (28 contacts) is half way between the maximum and minimum times. Call this temperature tex2html_wrap_inline556 . Examining figure gif we see that tex2html_wrap_inline556 is less than tex2html_wrap_inline374 . (It is approximately equal to 0.7.) One could also consider tex2html_wrap_inline564 the glass temperature for forming 25 contacts (which is lower still). In general the transition temperature will be a function of some parameter, tex2html_wrap_inline566 , which is a measure of the compactness of the chain and/or similarity to the native state. Bryngelson and Wolynes[18] first calculated tex2html_wrap_inline568 in their random energy model.

The two regions in which the chain fails to fold are qualitatively very different. At high temperatures the energy differences between conformations becomes negligible so all conformations have roughly equal probabilities. The chain is randomly exploring the conformation space. It takes a long time to find the native state by random search due to the vast number of conformations. The free energy is dominated by the entropic term so the native state is no longer the global minimum. At low temperatures the energy differences between states becomes important and the folded state is the global minimum free energy. The problem now is that the barriers between states are too high and at low temperatures there is a very small probability for crossing them. For compact conformations many moves will involve the breaking of contacts which at low temperatures becomes unlikely. In particular, moves that break more than one contact are much less probable that those that break only one. Instead of a random search the chain is now forced in to a very narrow kinetic pathway consisting of those steps with very small free energy barriers. The chain gets trapped in the many local minima.

If we were willing to wait long enough the system would eventually fold. Since our system is finite, the system always has a finite nonzero probability to find the native state. The same is true for the high-temperature case: If we wait long enough, the chain will eventually find the folded state. However, one must remember that at those temperatures the folded state is not the free energy minimum and is therefore not stable. For example, consider the following two temperatures: 2.24 and 1.12. The folding time for these two temperatures is roughly the same. At the higher temperature (as we will see shortly) the chain spends almost zero time in the folded state (less than 0.04%). At the lower temperature the chain spends roughly 77% of the time in the folded state. When we speak of folding time, this is simply the time it takes the chain to find the native conformation. There is another important factor here: Namely, is the folded state stable thermodynamically? We will address this issue at the end of this paper, where we see that it is not enough that a chain find its native state in a short time, but it must do so at temperatures where the native state is thermodynamically stable.

Let us return to the question of how long is too long to wait for a sequence to fold. Too long is in general determined by other time scales in the system. For proteins, there are a number of biologically relevant time scales, the lifespan of the organism for example. Proteins that do not fold fast enough on this time scale can be considered not to fold at all. Since we are studying a simple artificial system there is no a priori time scale to pick, other than limits on the simulation (computer) time. One of the problems with Monte Carlo dynamics is that there is no easy way to ``calibrate'' them, that is to make a connection between Monte Carlo steps and ``real'' physical or biological time. What we have done here is to define the relevant time scale as the folding time (or the compaction time) and make sure we ran simulations for long enough that we could see the the variation of folding time as a function of temperature.

We have examined the folding (and compaction) time as a function of temperature for one sequence. We now would like to see how this function varies from sequence to sequence. Figure gif show a plot of the folding time and compaction time versus temperature for several sequences. From this figure we notice two very interesting features of this model. First the compaction time is sequence independent. All six sequences have roughly the same compaction time for temperatures above tex2html_wrap_inline374 . In contrast the folding time is highly sequence dependent. At a temperature of roughly 1.6 there is a difference of nearly an order of magnitude between the fastest and slowest folding sequence. The folding time is also roughly correlated with the energy of the folded state. The lower the energy, the faster the folding time. However, the relation between the energy of the native state and the folding time is not a simple one. For example the two sequences with the lowest energy folded states (sequence 002 and 004, see Table gif) have different folding times. The difference is slight but sequence 004 has a consistently faster folding time at all temperatures. Sequence 005, which is a single monomer mutation of 004, has a higher ground state energy (-82), but its folding times are very close to those of the lower energy sequence 002. There also appears to be a fairly large difference between the sequences that have energies below -81 and those that have energies above.

Note that the collapse time is always much faster than the folding time for all sequences. Even sequences that fold slowly collapse as rapidly as the fast-folding ones. This sequence-independent property of the collapse time is often referred to as a self-averaging property; it does not depend on the specifics of the sequence but rather on the general character of the ensemble of sequences. It is important to remember here that we are choosing a restricted ensemble of sequences though, namely the subset of sequences with a particular ratio of monomer types (a ratio of 14:13). Sequences that contain a different composition of monomers may have different collapse times than the sequences used here. The folding time is not self-averaging; i.e., it depends on the sequence. So we can view the kinetics of folding as a two-stage process. The first involves a rapid collapse of the polymer. The nature of this collapse is sequence independent. We can picture the polymer in this collapsed state as fluctuating about various compact cube states. This picture has been advanced previously by others.[33] The next step is a medium-to-slow event in which the polymer searches for the minimum energy state among the compact states. The time it takes for the polymer to find its minimum state depends on the specifics of the sequence. The two-phase collapse with two distinct time scales has also been observed for real proteins.[34, 35, 36] The first phase is a rapid collapse in which a hydrophobic core is formed. We should expect this collapse to be independent of the specific sequence, depending on the ratio of hydrophobic to hydrophilic monomers. This collapsed state then undergoes rearrangement to the folded structure of the specific sequence. The collapse time below the glass temperature loses its self-averaging property. Examining Fig. gif we see that below tex2html_wrap_inline522 the collapse time is no longer sequence independent. In the glass region we would expect the kinetics to depend on the details of the energy surfaces and these details will be sequence dependence. This is expected of a system exhibiting glassy behavior. Note how this contrasts with the high temperature limit, where the collapse times remain sequence independent even as they approach tex2html_wrap_inline446 .

At this point one may be tempted to conclude that we have two types of sequences: Fast folders and slow folders. However this is not the case. In reality what we have are sequences that fold and sequences that do not. In order to see this we need to look at the thermodynamics of these systems. In particular we need to look at the thermodynamic stability of the native state as a function of temperature. To do so, we performed a series of thermodynamic runs using the same Monte Carlo algorithm described above. The system was equilibrated by first running it for 100 million steps, which is on the order of the folding time for most of the sequences. Care must be taken at low temperatures since near the glass transition the system will slow down; i.e., the auto-correlation time will diverge. We looked at temperatures above tex2html_wrap_inline374 to avoid this problem. We calculate the following thermodynamic quantity:

equation138

where tex2html_wrap_inline600 is the energy of the native state and Z is the partition function. This quantity is the probability that the system is in the native state; that is, it is folded. We define the folding temperature as the temperature at which tex2html_wrap_inline604 ; that is when the folded state is half occupied. Note that tex2html_wrap_inline606 is a sufficient condition that the native state be the global minimum of the free energy. Four sequences were used, each with a different minimum energy ranging from -84 to -76. Several simulations were run at a number of different temperatures. Figure gif shows the results. In addition to running simulations at different temperatures, we also used the Monte Carlo histogram method[37] to calculate the function tex2html_wrap_inline612 at temperatures other then the simulation temperature. Using histograms collected from simulations run at T=1.58, we were able to calculate tex2html_wrap_inline616 for all temperatures, extrapolating into the glass region. These lines are plotted along with the points calculated from the standard Monte Carlo runs.

The folding temperature, tex2html_wrap_inline376 , varies with the value of the minimum-energy state. The lower the minimum energy the higher the folding temperature. More importantly we see that the two lower energy sequences have folding temperatures above the glass temperature, tex2html_wrap_inline374 , while the others have tex2html_wrap_inline380 . At tex2html_wrap_inline374 the lowest energy sequence (which also folds the fastest) is 90% in the native state. The highest energy sequence has a native state population of only 15%. At temperatures at which the folded state of the high-energy sequence is thermodynamically stable this state is not kinetically accessible. Therefore we would say this sequence does not fold. In order for a sequence to be foldable it must meet two conditions. First, it must have a reasonably fast folding time, where by reasonable we mean on the relevant (biological) time scales, and second, the folding temperature must be above the glass temperature. The analogous situation would be a polypeptide which had a folding temperature below the freezing point of the solvent. Such a protein would not be considered foldable.


next up previous
Next: CONCLUSIONS Up: Folding kinetics of proteinlike Previous: MODEL AND METHODS

nsocci@ucsd.edu