PROTEIN FOLDING: HP Lattice Model


By: Mohammad Montazeri Supervisor: Professor Frank Pinski

Department of Physics, University of Cincinnati, May 2007

Table of Contents:

1. Introduction
2. Protein Folding
3. Mechanism of Protein Folding
4. H-P Lattice Model
5. Our Problem
6. Conclusion
References


1. Introduction

The word protein comes from the Greek πρώτα ("prota"), meaning "of primary importance"; these molecules were first described and named by Jöns Jakob Berzelius in 1838. However, the central role of proteins in living organisms was not fully appreciated until 1926, when James B. Sumner showed that the enzyme urease was a protein. The first protein to be sequenced was insulin, by Frederick Sanger, who won the Nobel Prize for this achievement in 1958. The first protein structures to be solved included hemoglobin and myoglobin, by Max Perutz and Sir John Cowdery Kendrew, respectively, in 1958. Both three-dimensional structures were first determined by x-ray diffraction analysis; the structures of myoglobin and hemoglobin won the 1962 Nobel Prize in Chemistry for their discoverers.

Proteins are large organic compounds made of amino acids arranged in a linear chain and joined together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues. The sequence of amino acids in a protein is defined by a gene and encoded in the genetic code. Although this genetic code specifies 20 "standard" amino acids, the residues in a protein are often chemically altered in post-translational modification: either before the protein can function in the cell, or as part of control mechanisms. Proteins can also work together to achieve a particular function, and they often associate to form stable complexes.


Proteins are essential parts of all living organisms and participate in every process within cells. Many proteins are enzymes that catalyze biochemical reactions, and are vital to metabolism. Other proteins have structural or mechanical functions, such as the proteins in the cytoskeleton, which forms a system of scaffolding that maintains cell shape. Proteins are also important in cell signaling, immune responses, cell adhesion, and the cell cycle. Protein is also a necessary component in our diet, since animals cannot synthesize all the amino acids and must obtain essential amino acids from food.

Functional proteins range in size from about 40 to 50 amino acids up to several thousand residues. The human body makes at least 50,000 different types of proteins, each with a different function.

Depending on the scale at which a protein is examined, its structure can be described at four levels:

I. Primary Structure: The primary structure is simply the sequence of amino acids in the protein (see figure 1). This structure forms the backbone of the protein's shape.

II. Secondary Structure: The secondary structure consists of regularly repeating local structures stabilized by hydrogen bonds. The most common examples are the alpha helix and beta sheet (see figure 2). Because secondary structures are local, many regions of different secondary structure can be present in the same protein molecule.


III. Tertiary Structure or Folded State: The folded state is the overall shape of a single protein molecule; the spatial relationship of the secondary structures to one another (see figure 3). Tertiary structure is generally stabilized by non-local interactions, most commonly the formation of a hydrophobic core, but also through salt bridges, hydrogen bonds, disulfide bonds, and even post-translational modifications. The term "tertiary structure" is often used as synonymous with the term fold.

IV. Quaternary Structure: Quaternary structure is the shape or structure that results from the interaction of more than one protein molecule, usually called protein subunits in this context, which function as part of the larger assembly or protein complex.

In the next section we focus on the process by which the folded structure forms, namely protein folding.


Figure 1: Primary Structure

Figure 2: Alpha Helices in Secondary Structure


Figure 3: Folded Structure (Simulation Results)


2. Protein Folding

Protein folding is the physical process by which a protein folds into its characteristic three-dimensional structure. Each protein begins as a polypeptide, translated from a sequence of mRNA as a linear chain of amino acids. This polypeptide lacks any developed three-dimensional structure. However, each amino acid in the chain can be thought of as having certain 'gross' chemical features. These may be hydrophobic, hydrophilic, or electrically charged, for example. The amino acids interact with each other and with their surroundings in the cell to produce a well-defined three-dimensional shape, the folded protein, known as the native state. The resulting three-dimensional structure is determined by the sequence of the amino acids.

Determining the three-dimensional structure of a protein experimentally is often very difficult and expensive, whereas the sequence of the protein is often known. Scientists have therefore tried to use different biophysical techniques to fold a protein manually, that is, to predict the complete structure of the protein from its sequence.

In certain solutions and under some conditions, proteins will not fold into their biologically "functional" forms. Temperatures above the range that cells tend to live in will cause proteins to unfold or "denature" (this is why boiling makes the white of an egg opaque). High concentrations of solutes and extremes of pH can do the same. A fully denatured protein lacks both tertiary and secondary structure, and exists as a so-called random coil. Cells sometimes protect their proteins against the denaturing influence of heat with enzymes known as chaperones or heat shock proteins, which assist other proteins both in folding and in remaining folded. Some proteins never fold in cells at all except with the assistance of chaperone molecules, which either isolate individual proteins so that their folding is not interrupted by interactions with other proteins, or help to unfold misfolded proteins, giving them a second chance to refold properly.

For many proteins the correct three-dimensional structure is essential for the protein to function correctly. Thus, failure to fold usually produces inactive proteins with different properties. Several diseases, such as Alzheimer's disease, cystic fibrosis and BSE, are believed to result from the accumulation of misfolded proteins. These diseases are associated with the aggregation of misfolded proteins into insoluble plaques; it is not known whether the plaques are the cause or merely a symptom of illness.


3. Mechanism of Protein Folding

The mechanism of protein folding is not well understood. It is generally accepted that the folding process is dominated by hydrophobic residues and their interaction with the solvent and other residues.

Hydrophobic molecules tend to be non-polar and thus prefer other neutral molecules and nonpolar solvents. Hydrophobic molecules in water often cluster together. Water on hydrophobic surfaces will exhibit a high contact angle. Examples of hydrophobic molecules include the alkanes, oils, fats, and greasy substances in general. Hydrophobic materials are used for oil removal from water, the management of oil spills, and chemical separation processes to remove non-polar from polar compounds.

On the other hand, a hydrophilic molecule or portion of a molecule is one that is typically charge-polarized and capable of hydrogen bonding, enabling it to dissolve more readily in water than in oil or other hydrophobic solvents. Hydrophilic and hydrophobic molecules are also known as polar molecules and nonpolar molecules, respectively. Soap has a hydrophilic head and a hydrophobic tail, which allows it to dissolve in both water and oil and therefore to clean a surface.

In a protein sequence, some residues are hydrophobic and some are hydrophilic. When a protein is in a solvent (usually water, whose molecules are polar), it deforms so as to minimize contact between its hydrophobic residues and water molecules or, equivalently, to maximize contact among its own hydrophobic residues. This is understood to be the dominant interaction in the folding process.

One can define a free energy for the protein that reflects the dominance of the hydrophobic interaction in the folding process. In the native state, where the free energy is minimized, the number of hydrophobic contacts is maximized. As an unfolded protein begins to fold, it passes through various geometrical conformations, each corresponding to an energy level. Hence one can define an energy landscape (see figure 4), in which each point corresponds to one geometrical conformation of the protein. In this view, the folding process is a path on the energy landscape toward the native state, which is located at the lowest level of the landscape.
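A minimal way to write this down, assuming the contact-counting energy of the HP model described in section 4 and ignoring entropic contributions, is sketched below. The symbols $c$ (a conformation), $N_{HH}(c)$ (its number of hydrophobic contacts) and $\varepsilon$ (an energy scale per contact) are notation chosen here for illustration, not taken from the original text.

```latex
E(c) = -\varepsilon \, N_{HH}(c), \qquad \varepsilon > 0,
\qquad
c_{\text{native}} = \arg\min_{c} E(c) = \arg\max_{c} N_{HH}(c)
```

Under this assumption, minimizing the energy and maximizing the number of hydrophobic contacts are the same problem, which is the sense in which the native state sits at the lowest point of the landscape.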

The energy landscape theory was formulated by Joseph Bryngelson and Peter Wolynes in the late 1980s and early 1990s. This approach introduced the principle of minimal frustration, which asserts that evolution has selected the amino acid sequences of natural proteins so that interactions between side chains largely favor the molecule's acquisition of the folded state. Interactions that do not favor folding are selected against, although some residual frustration is expected to exist. A consequence of these evolutionarily selected sequences is that proteins are generally thought to have globally "funneled energy landscapes" (coined by José Onuchic) that are largely directed towards the native state (see figure 5). This "folding funnel" landscape allows the protein to fold to the native state through any of a large number of pathways and intermediates, rather than being restricted to a single mechanism. The theory is supported by computational simulations of model proteins and has been used to improve methods for protein structure prediction and design.

For most sequences, the energy landscape is not a smooth funnel but is rough (see figure 6). In this case, if the protein follows the path of decreasing energy, it may become trapped in a region that is not the native state but a local minimum. Such a region is called a kinetic trap. To escape, the protein must climb out of the local minimum; in other words, it must unfold enough to get past the kinetic trap.

These minima can be so deep that even thermal fluctuations cannot help the protein continue toward its native state. Kinetic traps could therefore be one of the reasons for protein misfolding or for delays in the folding process. It turns out that on its way to the native state, the protein makes a large number of mistakes (it follows wrong paths). These mistakes arise not only from kinetic traps but also from phenomena such as uphill steps, thermal motion, and revisiting earlier conformations.

In general, the folding process is understood as follows: in the very first steps, the unfolded protein collapses in a very short time to a compact conformation with a hydrophobic core, and it then moves toward the native state through internal rearrangements involving both folding and unfolding steps. Most of the mistakes mentioned above are made after the collapse, during this internal rearrangement.


Figure 4: Energy Landscape

Figure 5: Funnel-Like Energy Landscape

Figure 6: Roughness on Energy Landscape and Kinetic Traps


4. H-P Lattice Model

The hydrophobic-polar (HP) protein folding model is a highly simplified model for examining protein folds in space. First proposed by Dill in 1985, it is motivated by the observation that hydrophobic interactions between amino acid residues are the driving force for proteins folding into their native state. All amino acid types are classified as either hydrophobic (H) or polar (P), and the folding of a protein sequence is defined as a self-avoiding walk on a 2D or 3D lattice. The HP model imitates the hydrophobic effect by assigning a negative (favorable) weight to interactions between adjacent, non-covalently bound H residues. Conformations that have minimum energy are assumed to be the native state (see figure 7).
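The contact rule described above can be sketched in a few lines of Python. This is only an illustration on a 2D square lattice, not the project's Mathematica code; the function names `hh_contacts` and `hp_energy` and the weight `epsilon` are hypothetical names introduced here.

```python
def hh_contacts(seq, coords):
    """Count H-H contacts: pairs of H residues on adjacent lattice sites
    that are not consecutive along the chain (not covalently bound)."""
    if len(seq) != len(coords):
        raise ValueError("seq and coords must have the same length")
    index_at = {site: i for i, site in enumerate(coords)}
    if len(index_at) != len(coords):
        raise ValueError("walk is not self-avoiding")
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        # Look only to the right and upward so each pair is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts


def hp_energy(seq, coords, epsilon=1.0):
    """Energy of a conformation: -epsilon per H-H contact (assumed weight)."""
    return -epsilon * hh_contacts(seq, coords)


# Four H residues folded around a unit square: residues 0 and 3 touch.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hh_contacts("HHHH", square), hp_energy("HHHH", square))
```

A straight chain gives zero contacts, since every pair of adjacent sites is then also adjacent along the chain.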

The HP model can be expressed in both two and three dimensions, generally with square lattices, although triangular lattices have been used as well.

Randomized search algorithms are often used to tackle the HP folding problem. These include stochastic and evolutionary algorithms such as the Monte Carlo method, genetic algorithms, and ant colony optimization. While no method has been able to calculate the experimentally determined minimum-energy state for long protein sequences, the most advanced methods today are able to come close.
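As an illustration of the Monte Carlo idea mentioned above (not the method used in this project), the standard Metropolis acceptance rule can be sketched as follows. The function name `metropolis_accept` is hypothetical, and the temperature is assumed to be given in units where the Boltzmann constant is 1.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng=random):
    """Decide whether to accept a trial move that changes the energy by delta_e.

    Downhill or neutral moves are always accepted; uphill moves are accepted
    with probability exp(-delta_e / temperature).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


# A move that gains one H-H contact (delta_e = -1) is always accepted.
print(metropolis_accept(-1.0, temperature=0.5))
```

Because uphill moves are sometimes accepted, a search of this kind can in principle climb out of a local minimum, which plain downhill search cannot do.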

Even though the HP model abstracts away many of the details of protein folding, it can describe the general behavior of protein folding accurately.


Figure 7: HP Lattice Model on a Square Lattice


5. Our Problem

In this project, we developed code to find the native state of an arbitrary sequence using the HP lattice model. Given a specific sequence, the code generates a large number of conformations using a self-avoiding random walk algorithm. The number of steps in the walk is equal to the length of the protein. The code then counts the number of HH contacts in each generated conformation. Searching among the conformations, the code identifies the one with the maximum number of HH contacts as the native state of the sequence.

The algorithm of our code is as follows:

Start
1. Take the given sequence.
2. Using a self-avoiding random walk, generate many conformations.
3. Count the number of HH contacts in each conformation.
4. Find the conformation with the greatest number of HH contacts; this is the native state.
End
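The procedure above can be sketched in Python as follows. This is a hedged illustration, not a transcription of the Mathematica code: the names `random_self_avoiding_walk`, `count_hh` and `search_native_state` are hypothetical, the lattice is assumed to be the 2D square lattice, and the walk is grown by picking uniformly among free neighboring sites and discarding attempts that get trapped. That growth rule is an assumption, and it does not sample all self-avoiding walks with equal probability.

```python
import random

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def random_self_avoiding_walk(n, rng):
    """Grow one n-site walk on the square lattice; return None if it gets trapped."""
    coords = [(0, 0)]
    occupied = {(0, 0)}
    while len(coords) < n:
        x, y = coords[-1]
        free = [(x + dx, y + dy) for dx, dy in MOVES
                if (x + dx, y + dy) not in occupied]
        if not free:
            return None  # dead end: discard this attempt
        step = rng.choice(free)
        coords.append(step)
        occupied.add(step)
    return coords


def count_hh(seq, coords):
    """Number of H-H pairs on adjacent sites that are not chain neighbors."""
    index_at = {site: i for i, site in enumerate(coords)}
    total = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for neighbor in ((x + 1, y), (x, y + 1)):  # count each pair once
            j = index_at.get(neighbor)
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                total += 1
    return total


def search_native_state(seq, tries, seed=0):
    """Sample `tries` walks and keep the one with the most H-H contacts."""
    rng = random.Random(seed)
    best_coords, best_contacts = None, -1
    for _ in range(tries):
        coords = random_self_avoiding_walk(len(seq), rng)
        if coords is None:
            continue
        contacts = count_hh(seq, coords)
        if contacts > best_contacts:
            best_coords, best_contacts = coords, contacts
    return best_coords, best_contacts


coords, contacts = search_native_state("PHPHHPHHHHHHH", tries=10000)
print(contacts, coords)
```

Because the search is random, the best conformation found is only a candidate for the native state; more tries make it more likely, but not certain, that the true maximum has been sampled.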


The specific sequence we used to check our code has 13 residues: “P-H-P-H-H-P-H-H-H-H-H-H-H”. After examining 10,000 conformations for this sequence and finding the number of HH contacts for each, we produced the graph shown in figure 8. In this graph, the horizontal axis is the serial number of the conformation and the vertical axis is the number of HH contacts in that conformation. Among these 10,000 conformations, two have 6 HH contacts. Noting that the degeneracy decreases dramatically as the number of HH contacts increases (that is, as the energy decreases), we reasonably conclude that these two conformations represent the native state. It turns out that both are the same conformation, generated twice by the self-avoiding random walk code. This is to be expected, since the code does not prevent the same conformation from being generated more than once. The predicted conformation of the native state is illustrated in figure 9. The conformation is in agreement with the reference.
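The degeneracy argument and the duplicate check described here can be sketched in Python as below. The helper names `degeneracy_by_contacts` and `distinct_best` are hypothetical, and the toy data are made up for illustration only; they are not results from the project. Only exact duplicates are merged, so copies related by rotation or reflection would still count as distinct.

```python
from collections import Counter


def degeneracy_by_contacts(contact_counts):
    """Map each HH-contact number to how many sampled conformations have it."""
    return Counter(contact_counts)


def distinct_best(conformations, contact_counts):
    """Return the maximum contact count and the distinct conformations reaching it."""
    best = max(contact_counts)
    winners = {tuple(c) for c, n in zip(conformations, contact_counts) if n == best}
    return best, sorted(winners)


# Toy data: three sampled conformations, the first and last identical.
toy_conformations = [
    [(0, 0), (1, 0), (1, 1), (0, 1)],
    [(0, 0), (1, 0), (2, 0), (3, 0)],
    [(0, 0), (1, 0), (1, 1), (0, 1)],
]
toy_counts = [1, 0, 1]

print(degeneracy_by_contacts(toy_counts))
print(distinct_best(toy_conformations, toy_counts))
```

In this toy case two samples reach the maximum, but they collapse to a single distinct conformation, mirroring the situation described for the 13-residue test sequence.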


Figure 8: Number of HH Contacts vs Conformations for Our Test Sequence

Figure 9: The Native State with 6 HH Contacts

Figure 10 shows the Mathematica code we developed. In this code, the number of conformations to be generated is set by ‘try’, the length of the sequence by ‘dim’, and the sequence of H and P residues by ‘seq’. At the end of a run, two outputs are produced: ‘pos[[conf, x, y]]’ and ‘h[[conf]]’. The first stores the positions of the residues for each conformation, and the second stores the number of HH contacts for each conformation. Using ‘h[[conf]]’, the conformation with the maximum number of HH contacts can be found; looking it up in ‘pos[[conf, x, y]]’ then gives the conformation of the native state.


try = 10000; dim = 13; seq = {P, H, P, H, H, P, H, H, H, H, H, H, H