It has been known for some time that for many proteins the information necessary to specify the native structure is contained within the amino acid sequence. There has been a tremendous amount of research aimed at deciphering this code and determining the final structure from the sequence. Solving this problem is of paramount importance; however, simply knowing how to map sequences to structures would leave many interesting questions unanswered. How do proteins fold to their native structure and, more specifically, how do they manage to fold so quickly? What are the key factors that determine whether or not a given sequence will fold and what the folding time will be? One may argue that it might be necessary to solve these problems before it will be possible to solve the folding problem (i.e., predicting structure from sequence).
A great deal of work (both experimental and theoretical) has been done on the kinetics of protein folding. One extremely useful theoretical technique is to study simple heteropolymer models. The idea is to reduce the complex system of proteins in solution to its bare essentials, leaving only the key features. The advantage of studying these simpler models is that an in-depth analysis (sometimes even an exhaustive one) can be performed, yielding detailed answers and information. This information should, in turn, provide insights into real proteins.
One class of model that is often used in theoretical polymer work is the lattice model, where the monomers are constrained to lie on lattice sites. Excluded volume is included by allowing only one monomer per site. To study dynamics, the Monte Carlo algorithm with a variety of move sets is used. Some of the earliest work using lattice models on proteins was done by Go and others[1, 2] using two- and three-dimensional lattices to examine the folding process. However, the interaction potential they used was somewhat unusual. The native state was explicitly built into the potential. The energy of any given conformation was determined by counting the number of native contacts, i.e., contacts found in the native structure. An attractive contribution to the energy was added for each native contact formed. This potential is somewhat unphysical, depending on an a priori knowledge of the native structure. Although much of this early work on lattice models was on simple cubic lattices, Skolnick and others[3, 4, 5, 6, 7] have used more complex lattices which are able to more faithfully represent the structure of actual proteins. Using these lattices they are able to model real protein structures (e.g., secondary structure) and study the dynamics of folding and the formation of these structures. However, with increasing complexity it becomes more difficult to study these models in great detail.
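The native-contact potential described above can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the implementation used in Refs. [1, 2]: the function names, the representation of a conformation as a list of integer coordinate triples, and the contact energy `epsilon` are all introduced here for illustration only.

```python
from itertools import combinations


def contacts(conf):
    """Return the set of non-bonded nearest-neighbour pairs (i, j), i < j.

    `conf` is a list of integer lattice coordinates, one per monomer,
    in chain order (an assumed representation).
    """
    pairs = set()
    for i, j in combinations(range(len(conf)), 2):
        if j - i > 1:  # skip monomers bonded along the chain
            dist = sum(abs(a - b) for a, b in zip(conf[i], conf[j]))
            if dist == 1:  # nearest neighbours on the lattice
                pairs.add((i, j))
    return pairs


def is_self_avoiding(conf):
    """Excluded volume: at most one monomer per lattice site."""
    return len(set(conf)) == len(conf)


def go_energy(conf, native_contacts, epsilon=1.0):
    """Energy = -epsilon times the number of native contacts present.

    `epsilon` is a hypothetical energy scale; only contacts that also
    occur in the native structure contribute.
    """
    return -epsilon * len(contacts(conf) & native_contacts)


# Toy example: a four-monomer chain whose native state is a closed square.
native = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
extended = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
native_contacts = contacts(native)
```

Because the energy depends only on the overlap with `native_contacts`, the native structure is the ground state by construction, which is exactly the a priori knowledge that makes this potential somewhat unphysical.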
Rather than trying to model real proteins exactly, some have opted for
simpler models which permit a more thorough analysis. Chan and
Dill[8, 9, 10, 11] have used a two-dimensional
square lattice model with two monomer types (a polar monomer,
P, and a hydrophobic one, H). The potential used models
the hydrophobic interaction: the energy is proportional to the
number of hydrophobic (HH) contacts. They studied short chains,
which allowed them to do exhaustive enumeration to measure a variety
of properties (both static and dynamic). For dynamics they used both
Monte Carlo[9] and transfer matrix
methods.[10, 11] By using short polymers, they were able
to construct the full transfer matrix (this matrix determines the
probability of one state transforming to another) and use it to solve
exactly for the dynamics of the system. Although their model is
simpler than an actual protein, it has yielded a wealth of interesting
information and provided valuable insight into proteins and
heteropolymers. Their models show a two-phase process similar to that
found in proteins. There is a rapid collapse to compact states,
followed by slower reconfiguring of the chains to the native
structure. Fiebig and Dill[12] show that simple searching
strategies, such as the formation of opportunistic hydrophobic
contacts, can lead to the globally optimal conformation (native
state), suggesting a possible mechanism for folding. Shakhnovich and
others[13, 14] have studied the folding of random
heteropolymers (the interaction between monomers is picked from a
random distribution) on the three-dimensional simple cubic lattice.
They examined 27-monomer polymers using Monte Carlo dynamics and also
found a two-stage collapse process in folding. They found that by
examining an overlap function, which measures how low-energy
conformations differ, they could distinguish foldable from
nonfoldable sequences. From examination of many
different sequences, they conclude that the existence of a pronounced
energy gap between the native state and the remaining conformations
distinguishes sequences that fold well.[15] To examine how the
specific form of the interaction affects the dynamics of folding,
Camacho and Thirumalai[16] looked at two-dimensional
lattice systems. They studied the kinetics of three different types
of interaction potentials. They found two transition temperatures: a
collapse temperature at which the chain forms a compact structure and
a folding temperature at which the native structure is formed. They
found three stages in the transition from open coil to native
structure.
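For short chains, the exhaustive enumeration mentioned above is easy to sketch. The example below is an illustration on the two-dimensional square lattice with an HH contact energy; it is not the code of Refs. [8-11]. The function names, the energy scale `epsilon`, and the choice to fix the first bond along +x (which removes rotated copies but keeps mirror images) are assumptions made here.

```python
def enumerate_walks(n):
    """Yield every self-avoiding walk of n monomers on the square lattice.

    The first bond is fixed along +x, so rotated copies are not generated;
    mirror images are still counted separately.
    """
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))

    def extend(walk, occupied):
        if len(walk) == n:
            yield tuple(walk)
            return
        x, y = walk[-1]
        for dx, dy in steps:
            site = (x + dx, y + dy)
            if site not in occupied:  # excluded volume
                walk.append(site)
                occupied.add(site)
                yield from extend(walk, occupied)
                walk.pop()
                occupied.remove(site)

    if n == 1:
        yield ((0, 0),)
        return
    start = [(0, 0), (1, 0)]
    yield from extend(start, set(start))


def hh_contacts(walk, seq):
    """Count non-bonded nearest-neighbour pairs in which both monomers are H."""
    count = 0
    for i in range(len(walk)):
        for j in range(i + 2, len(walk)):
            if seq[i] == "H" and seq[j] == "H":
                dist = abs(walk[i][0] - walk[j][0]) + abs(walk[i][1] - walk[j][1])
                if dist == 1:
                    count += 1
    return count


def ground_states(seq, epsilon=1.0):
    """Return (minimum energy, list of conformations that attain it).

    The energy of a conformation is -epsilon times its number of HH
    contacts; `epsilon` is a hypothetical energy scale.
    """
    best_energy, best = 0.0, []
    for walk in enumerate_walks(len(seq)):
        energy = -epsilon * hh_contacts(walk, seq)
        if energy < best_energy:
            best_energy, best = energy, [walk]
        elif energy == best_energy:
            best.append(walk)
    return best_energy, best
```

Calling `ground_states("HPPH")` enumerates all conformations of a four-monomer chain and returns the lowest energy together with its degeneracy. The number of walks grows exponentially with chain length, which is why this approach is restricted to short chains.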
In this work we will continue using the three-dimensional simple cubic
lattice model. The polymers will be 27 monomers long and consist of
two monomer types. Monte Carlo dynamics will be used to study the
collapse and folding kinetics. The chains are too long for exhaustive
enumeration of all conformations but are short enough to permit
exhaustive enumeration of all maximally compact configurations. This
information will be used to determine the minimum energy structure
(native state) which will allow us to measure the folding time from
extended conformations. We will examine several different sequences
and measure collapse and folding time as a function of temperature and
sequence. One question to be addressed is which kinetic quantities are
sequence dependent and which are sequence independent (self-averaging).
In addition, we will examine how the glass
transition affects the ability of a sequence to fold. A major goal is
to define, as precisely as possible, various physically important
quantities. Of particular importance will be the determination of the
relevant time scales. One problem with Monte Carlo dynamics
simulations is the relation between Monte Carlo steps and physical
time. There is no simple connection; in fact, the precise relation may
depend on the move set.[10, 11] To circumvent this
problem, we will relate Monte Carlo steps to physical time by looking
for the natural time scales in the problem, such as the collapse and
the folding time. Using these time scales, we will then be able to
define the glass transition temperature ($T_g$) of this model. In the
past others have speculated that the relation between the folding
temperature ($T_f$) and the glass temperature ($T_g$) would play an
important role in protein folding. Bryngelson and
Wolynes[17, 18] have proposed that in order
for a chain to fold, the folding transition must occur before the
glass transition of the system, and the optimal folding temperature
would lie between $T_f$ and $T_g$. Specifically, Wolynes and others
state that to optimize folding potentials for structure prediction,
one should maximize the ratio of the folding temperature to the glass
temperature ($T_f/T_g$).[19, 20] To calculate the
glass transition, they used a random-energy-model-like assumption;
i.e., for each given value of the degree of folding, the
energies of the different conformations are independent random
variables. In our work we will give a direct kinetic definition of the
glass temperature that does not rely on this assumption, and show
explicitly that the relative values of $T_f$ and $T_g$ will determine
the folding properties of a given sequence.
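The kind of simulation outlined above (Monte Carlo dynamics of a two-letter chain on the simple cubic lattice, with a folding time measured in Monte Carlo steps) can be sketched as follows. This is only an illustration under assumptions that the text does not specify: the move set (end moves and corner flips), the contact-energy table `table`, the monomer labels, units in which Boltzmann's constant is 1, and all function names are hypothetical choices made here.

```python
import math
import random

UNIT_STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def adjacent(a, b):
    """True if two lattice sites are nearest neighbours."""
    return sum(abs(x - y) for x, y in zip(a, b)) == 1


def contact_energy(conf, seq, table):
    """Sum table[(type_i, type_j)] over non-bonded nearest-neighbour pairs."""
    energy = 0.0
    for i in range(len(conf)):
        for j in range(i + 2, len(conf)):
            if adjacent(conf[i], conf[j]):
                energy += table[tuple(sorted((seq[i], seq[j])))]
    return energy


def propose_move(conf, rng):
    """Return a trial conformation, or None if the move is blocked."""
    n = len(conf)
    i = rng.randrange(n)
    if i == 0 or i == n - 1:
        # End move: place the end monomer on a random neighbour of its bonded partner.
        anchor = conf[1] if i == 0 else conf[n - 2]
        step = rng.choice(UNIT_STEPS)
        new_site = tuple(a + s for a, s in zip(anchor, step))
    else:
        # Corner flip: move monomer i to the opposite corner of the square
        # spanned by its two chain neighbours. For a straight segment this
        # returns the current (occupied) site, so the move is rejected below.
        new_site = tuple(p + q - c for p, q, c in zip(conf[i - 1], conf[i + 1], conf[i]))
    if new_site in conf:  # excluded volume, also catches null moves
        return None
    return conf[:i] + [new_site] + conf[i + 1:]


def metropolis_step(conf, seq, table, temperature, rng):
    """One attempted move, accepted with the Metropolis criterion (k_B = 1)."""
    trial = propose_move(conf, rng)
    if trial is None:
        return conf
    delta = contact_energy(trial, seq, table) - contact_energy(conf, seq, table)
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return trial
    return conf


def first_passage_time(conf, seq, table, temperature, reached, max_steps, rng):
    """Number of attempted moves until reached(conf) is true, or None on timeout."""
    for step in range(max_steps + 1):
        if reached(conf):
            return step
        conf = metropolis_step(conf, seq, table, temperature, rng)
    return None
```

Here `reached` is any user-supplied predicate, for example a test for the native structure or for a compactness threshold, so the same loop can give a folding time or a collapse time. Averaging `first_passage_time` over many runs from extended starting conformations, at several temperatures, yields times in Monte Carlo steps; as noted above, their relation to physical time depends on the move set.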