4. Structure of biological macromolecules. Proteins and nucleic acids. L. Kurunczi and T. I. Oprea

Author: Lora Harper
4.1. Structural features of the proteins.

The Dutch chemist Gerardus J. Mulder introduced the term protein in 1838 to designate the most abundant class of biomolecules in cells. The term was derived from the Greek word proteios, meaning of first rank of importance.1 Proteins are macromolecules with molecular weights between 4,000 and several millions (in the case of protein complexes); they are polymers formed mainly by covalently linked amino acids. The repeating link of the polymer main chain is the peptide bond: –C(=O)–N(H)–. The chemical nature, number, and sequential order of the amino acids in a protein chain determine the distinctive structure, the characteristic chemical behavior and, partly, the biological function of each protein. This is why understanding the complex biological role of proteins requires preliminary knowledge of the chemical properties of amino acids.

4.1.1. General properties of the amino acids in relation with the structure of proteins.

Amino acids are small molecules, with mean molecular weights around 100, containing an amino and a carboxyl group. In the amino acids of interest here, both groups are bonded to an organic group R through an intermediary C(H) atom (the asymmetric α-carbon). Until recently, only 20 such genetically encoded amino acids (see 4.2.4 below) were known to occur in proteins; they are called the standard amino acids, or sometimes the magic twenty. Their chemical constitutions are presented in figure 4.1. With one exception, proline, each of these amino acids possesses a free carboxyl and a free amino group, the differences residing in the R groups (side chains). The conventional three- and one-letter abbreviations often used for the magic twenty are also presented in figure 4.1. More recently, two other genetically encoded amino acids were identified.
In 1986 the nonstandard amino acid selenocysteine, directly specified by the genetic code, was added to the magic twenty.2 Selenocysteine is found in many proteins (selenoproteins), all of which have redox activities. Among the most widespread selenoproteins in mammals are the enzymes thioredoxin reductase and glutathione peroxidase. In 2002 selenocysteine was joined by pyrrolysine, which is encoded in the genetic code of certain Archaea and eubacteria.3 The 21st and 22nd genetically encoded amino acids are also presented in the last row of figure 4.1 (without conventional abbreviations).

[Figure 4.1: structural formulas of L-Glycine (Gly, G), L-Alanine (Ala, A), L-Valine (Val, V), L-Leucine (Leu, L), L-Isoleucine (Ile, I), L-Phenylalanine (Phe, F), L-Tyrosine (Tyr, Y), L-Tryptophan (Trp, W), L-Methionine (Met, M), L-Cysteine (Cys, C), L-Serine (Ser, S), L-Threonine (Thr, T), L-Arginine (Arg, R), L-Lysine (Lys, K), L-Histidine (His, H), L-Proline (Pro, P), L-Glutamic acid (Glu, E), L-Aspartic acid (Asp, D), L-Glutamine (Gln, Q), L-Asparagine (Asn, N), L-Selenocysteine and L-Pyrrolysine (X = Me, NH2, OH).]

Figure 4.1. Structure of the genetically encoded amino acids encountered in proteins.
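The three- and one-letter abbreviations of figure 4.1 work as a simple lookup table. The following minimal Python sketch converts a hyphen-separated three-letter sequence into one-letter codes; the function name `to_one_letter` and the input format are illustrative assumptions, not a convention taken from the text.

```python
# Three-letter -> one-letter codes, as listed in figure 4.1.
THREE_TO_ONE = {
    "Gly": "G", "Ala": "A", "Val": "V", "Leu": "L", "Ile": "I",
    "Phe": "F", "Tyr": "Y", "Trp": "W", "Met": "M", "Cys": "C",
    "Ser": "S", "Thr": "T", "Arg": "R", "Lys": "K", "His": "H",
    "Pro": "P", "Glu": "E", "Asp": "D", "Gln": "Q", "Asn": "N",
}


def to_one_letter(peptide: str) -> str:
    """Convert a hyphen-separated three-letter sequence to one-letter codes.

    Hypothetical helper: the hyphen-separated input format is an assumption.
    """
    return "".join(THREE_TO_ONE[res.strip()] for res in peptide.split("-"))


# A pentapeptide written from the N-terminal to the C-terminal residue.
print(to_one_letter("Tyr-Gly-Gly-Phe-Leu"))  # YGGFL
```

Selenocysteine and pyrrolysine are left out of the dictionary because figure 4.1 gives no conventional abbreviation for them.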

The molecular complexity of proteins calls for a stepwise description of their structure at different levels: the primary, secondary, tertiary and quaternary structures. Because each level is defined by the “interactions” needed to realize it, this characterization is useful for understanding the structure and the physical-chemical and biological properties of proteins. The four levels also influence, interdependently, a protein’s biological function. The first three structural levels can exist in molecules formed of a single polypeptide chain, whereas the fourth involves multi-chain protein molecules.

Briefly, the primary structure refers to the amino acid sequence of the polypeptide chain and to the covalent bonds between the amino acids. It closely resembles the structure of usual organic compounds and characterizes relatively small molecules, like peptides. As the number of amino acids in the chain grows, new types of “bonds” appear and the existing ones gain new attributes. For example, hydrogen bonds (among other interactions) determine the secondary structure, which refers to the local, regular arrangement of the sequence. This structure imposes certain conformations: the α-helix, the β-pleated sheet, etc. Between sections of different α-helices or β-sheets new types of interactions appear, mainly involving the side chains (hydrophobic interactions, salt bridges, etc.), and these determine the tertiary structure. The tertiary structure refers to the “packing” in space of the secondary structural elements into compact units named protein domains. Interactions between different protein molecules, which lead to the formation of characteristic aggregates in space, define the quaternary structure. These interactions involve complex relations among the surfaces of the side chains and certain portions of the peptide backbone or of the peptide subunits.
As mentioned, many types of interaction forces are involved in the assembly of proteins. The covalent peptide bonds determine the primary structure, while the disulfide bonds (–S–S– between two remote cysteinyl groups, vide infra) intervene in the tertiary and quaternary structure. The peptide linkages are stiff (partial double bond character), unlike the other participating covalent bonds, which are flexible (single bonds). The intra- and inter-chain hydrogen bonds are characteristic of the secondary structure. The hydrophobic and electrostatic interactions among the R side chains mostly determine the tertiary and quaternary structures. A peptide usually adopts a secondary and tertiary structure, while proteins adopt a tertiary and a quaternary one. As a rule, in the peptide bond between two consecutive amino acid residues the O(=C) and H(–N) atoms are in trans position. The consecutive residues of a polypeptide (protein) are numbered from the N-terminal (–NH3+ under physiological conditions) towards the C-terminal (–COO- under the same conditions).

With one exception, glycine, all the standard amino acids are optically active, because their molecules contain at least one asymmetric carbon atom. When analyzed at pH 7.0, some of the α-amino acids isolated from proteins are dextrorotatory (Ala, Ile, Glu, etc.), others levorotatory (Trp, Leu, Phe). However, all these “natural” amino acids (obtained by hydrolysis of proteins under mild conditions) belong, at the α-carbon, to the L stereochemical series (see Figure 4.2). Thus only L-α-amino acids are present in protein molecules, but a number of D-amino acids are found in cells in other chemical forms. For example, they occur in the cell walls of some microorganisms, or form part of the structure of some peptide antibiotics, like gramicidin and actinomycin D.

Figure 4.2. Chirality of the amino acids.

Cysteine is a fairly polar amino acid, due to the sulfhydryl (–SH) group in its molecule. Often two remote cysteinyl residues in proteins form a covalent bond by oxidation of the SH groups (formation of the disulfide, –S–S–, bond). The presence of this type of bond can be detected by the weak, but characteristic, UV absorption at 240 nm. These disulfide bonds are important in the stabilization of certain secondary or tertiary structures (vide supra). The formation of –S–S– bonds is also a very useful technique, often used in the determination of the quaternary structure of proteins (through mutations of the cysteine residues in different positions in a cloned protein).

Table 4.1. Some characteristics of the 20 magic amino acids.

Amino acid   pKa (α-COOH)   pKa (α-NH2)   pKa (in R)   π (for R)   Character of R
Ala (A)      2.35           9.69          -            0.31        Nonpolar
Arg (R)      2.17           9.04          12.48        -1.01       Hydrophilic (+)
Asn (N)      2.02           8.80          -            -0.60       Polar enough
Asp (D)      2.09           9.82          3.86         -2.57       Hydrophilic (-)
Cys (C)      1.71           10.78         8.33         1.54        Polar
Gln (Q)      2.17           9.13          -            -0.22       Polar enough
Glu (E)      2.19           9.67          4.25         -2.29       Hydrophilic (-)
Gly (G)      2.34           9.60          -            0.00        -
His (H)      1.82           9.17          6.0          0.13        Only 10 % (+)
Ile (I)      2.36           9.68          -            1.80        Nonpolar
Leu (L)      2.36           9.60          -            1.70        Nonpolar
Lys (K)      2.18           8.95          10.53        -0.99       Hydrophilic (+)
Met (M)      2.28           9.21          -            1.23        Nonpolar
Phe (F)      1.83           9.13          -            1.79        Nonpolar
Pro (P)      1.99           10.60         -            0.72        Nonpolar
Ser (S)      2.21           9.15          -            -0.04       H-bonding
Thr (T)      2.63           10.43         -            0.26        H-bonding
Trp (W)      2.38           9.39          -            2.25        Nonpolar
Tyr (Y)      2.20           9.11          10.07        0.96        H-bonding
Val (V)      2.32           9.62          -            1.22        Nonpolar

All the peptides and proteins have as fundamental “skeletal” units α-amino acids linked by peptide bonds –C(=O)–N(H)–, formed between the α–NH2 group of one amino acid and the α–COOH group of the neighboring amino acid in the chain.
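As a side illustration of the side chain pKa values in Table 4.1, the entry “Only 10 % (+)” for His can be checked with the Henderson-Hasselbalch relation, which gives the protonated fraction of an ionizable group as 1 / (1 + 10^(pH - pKa)). The sketch below is a minimal example under the assumption that each side chain titrates independently with the tabulated pKa; the function name and the choice of pH 7.0 are illustrative.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Protonated fraction of an ionizable group (Henderson-Hasselbalch).

    Assumes an isolated group with the given pKa; neighboring charges in a
    real protein can shift the effective pKa.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Side chain pKa values copied from the "pKa (in R)" column of Table 4.1.
SIDE_CHAIN_PKA = {
    "Asp": 3.86, "Glu": 4.25, "His": 6.0, "Cys": 8.33,
    "Tyr": 10.07, "Lys": 10.53, "Arg": 12.48,
}

for name, pka in SIDE_CHAIN_PKA.items():
    percent = 100.0 * fraction_protonated(pka, 7.0)
    print(f"{name}: {percent:.1f} % protonated at pH 7.0")
```

For His (pKa 6.0) the protonated, positively charged fraction at pH 7.0 is 1/11, i.e. about 9 %, in line with the rounded value in the table, while Lys and Arg remain almost fully protonated and Asp and Glu almost fully deprotonated.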
The amino acids∗ thus all contribute the same common fragment, the amide bond, to the formation of the peptide chain. The polarity of the side chains R of the incorporated amino acids is decisive for the stabilization of the tertiary and quaternary structure of proteins. Therefore Table 4.1 presents the pKa values (at 25 °C) for the –COOH and –NH2 groups (bonded to the α-carbon) and for those side chains R that possess a polar group which can exist in protonated and deprotonated states. The same table also contains the lipophilicity contribution π of the side chains of the 20 coded amino acids4 (obtained from the partition of the N-acetyl-C-amide derivatives in the water / octanol system at pH 7.1) and the polarity of R at physiological pH.

∗ Exceptions are Pro and 4-hydroxy-Pro (a modified magic amino acid, present in collagen beside 5-hydroxy-lysine).

4.1.2. Structure and conformation for the peptide units.

The steric structure of the peptide chain is characterized by the rotational angles Φ (C–N–Cα–C) and Ψ (N–Cα–C–N) presented in figure 4.3. Because the (O=)C–N bond has partial double-bond character, the N–Cα and Cα–C bonds, to which these angles refer, are the only bonds with free rotation in the polypeptide main chain.

[Figure 4.3: a polypeptide main chain segment with the rotation angles Φ′, Ψ′ and Φ″, Ψ″ marked around the α-carbons bearing R′ and R″.]

Figure 4.3. Rotational freedom about the single bonds in a polypeptide main chain.

Quantum-chemical calculations for dipeptides showed that the energy minima with respect to the Φ and Ψ angles are determined mainly by the side chain R, and are fairly independent of the other residues. Calculations applied to tripeptides revealed an additional minimum, determined by the formation of a hydrogen bond. This is the so-called “U-turn” [U10], made up of 10 atoms and “closed” by a hydrogen bond (see figure 4.4).
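Since Φ and Ψ are defined above as dihedral angles over four consecutive backbone atoms (C–N–Cα–C and N–Cα–C–N), they can be computed directly from atomic coordinates. The following sketch shows one common way to do this in plain Python. The function name and the tuple-based coordinate format are assumptions made for illustration; the sign convention is the usual one in which the eclipsed (synperiplanar) arrangement gives 0° and the trans arrangement gives ±180°.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points given as (x, y, z).

    For phi pass the coordinates of C(i-1), N(i), CA(i), C(i);
    for psi pass N(i), CA(i), C(i), N(i+1).
    """
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)          # points from the second atom back to the first
    b1 = sub(p2, p1)          # central bond
    b2 = sub(p3, p2)          # points from the third atom to the fourth
    norm = math.sqrt(dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)

    # Components of the outer bonds perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * c for c in b1))
    w = sub(b2, tuple(dot(b2, b1) * c for c in b1))

    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not taken from a real structure): eclipsed and +90 degrees.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

The same function applies unchanged to the side chain angles χ, which are dihedral angles over four side chain atoms.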

Figure 4.4. The “U-turn” formed by 10 atoms.

Calculations of the energy of the system that also consider the interactions between non-bonded atoms (for homo-polypeptides, isolated amino acid residues and amino acids from dipeptides) revealed five skeletal conformational energy minima.

Table 4.2. The (rounded) values of Φ and Ψ angles for some secondary structures

Structure   Φ (degrees)   Ψ (degrees)
sp          0             0
Rα          -60           -50
β           -150          150
U10         -60           -30
Lα          60            50


Of these, only four are retained on the basis of their possibilities for hydrogen-bond formation (see Table 4.2 above, where the exactly synperiplanar (sp) or eclipsed conformation is taken as Φ = 0 and Ψ = 0). The minima correspond to the α-helix structure (right-handed, Rα, or left-handed, Lα), the β-sheet structure, or a U-turn. They are presented in Table 4.2 (rounded values) and Figure 4.5.
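The four retained minima of Table 4.2 can serve as reference points for labeling a measured (Φ, Ψ) pair. The sketch below assigns a pair to the nearest minimum, measuring angular differences with the ±180° wrap-around taken into account. The nearest-point criterion and the names `MINIMA` and `nearest_minimum` are illustrative assumptions; the text does not prescribe a classification procedure, and real assignments rely on hydrogen-bond patterns as well.

```python
import math

# Rounded (phi, psi) values of the four retained minima from Table 4.2.
MINIMA = {
    "R-alpha": (-60.0, -50.0),
    "beta": (-150.0, 150.0),
    "U10": (-60.0, -30.0),
    "L-alpha": (60.0, 50.0),
}


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_minimum(phi: float, psi: float) -> str:
    """Label of the Table 4.2 minimum closest to (phi, psi).

    Hypothetical helper: a plain nearest-point rule, not a validated
    secondary structure assignment.
    """
    def distance(item):
        ref_phi, ref_psi = item[1]
        return math.hypot(angular_difference(phi, ref_phi),
                          angular_difference(psi, ref_psi))

    return min(MINIMA.items(), key=distance)[0]


print(nearest_minimum(-57.0, -47.0))    # R-alpha
print(nearest_minimum(-170.0, -175.0))  # beta (psi wraps around +/-180)
```

Because the Rα and U10 minima differ by only 20° in Ψ, a pair lying between them is assigned with little confidence by such a rule.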

Figure 4.5. Newman projections for the stable conformations of the polypeptide skeleton.

The side chain conformations are characterized by the dihedral angles (see Figure 4.6) χ1 (N–Cα–Cβ–Cγ), χ2 (Cα–Cβ–Cγ–Cδ), χ3, etc. Conformational energy calculations indicated three minima around the “normal” values: χ = +60°, 180° and -60°. In polypeptides, certain values of the χ angles are often forbidden because of the steric repulsions that appear between different R side chains in mutual van der Waals contact (especially in the tertiary and quaternary structure).

Figure 4.6. Parameters defining the side chain conformations.

There are many biologically important amino acid polymers with a limited number of residues. Many peptides (containing 2 – 30 amino acid residues), sometimes with closed cyclic chains, have significant biological functions. Table 4.3 presents some natural peptides produced by different organisms.

Table 4.3. Examples of natural peptides

Name             Structure
Leu-enkephalin   Tyr-Gly-Gly-Phe-Leu
Met-enkephalin   Tyr-Gly-Gly-Phe-Met
Angiotensin II   Asp-Arg-Val-Tyr-Ile-His-Pro-Phe
Vasopressin      Cys-Tyr-Phe-Gln-Asn-Cys-Pro-Arg-Gly
Oxytocin         Cys-Tyr-Ile-Gln-Asn-Cys-Pro-Leu-Gly
Gramicidin S     D-Phe-L-Pro-L-Val-L-Orn-L-Leu
                 L-Leu-L-Orn-L-Val-L-Pro-D-Phe

4.1.3. The conformations of polypeptides and proteins.

Knowing the primary structure, as already pointed out, means knowing the number and the sequence of the amino acids of a protein. The recognition of the conserved regions of the primary structure constitutes the basis of primary sequence alignment, of secondary structure prediction and, respectively, of tertiary structure estimation (threading, folding, homology modeling).

The secondary structure involves certain types of hydrogen bonds, and is a consequence of the characteristics of the peptide bond. The analysis of the specific features of these intra-molecular hydrogen bonds leads to the conclusion that the most important structures are the α-helices (hydrogen bonds between amino acids close together in the chain) and the β-pleated sheets (hydrogen bonds between amino acids farther apart in the chain). The names α and β are due to Pauling and Corey.5

The Ramachandran plot shows the permitted and non-permitted conformations of polypeptide chains. As mentioned (see 4.1.2 and Figure 4.3), every third skeletal bond of a polypeptide chain, i.e. the (O=)C–N bond, has partial double bond character and does not allow free rotation. The other bonds of the “central” skeleton, namely N–Cα and Cα–C, are theoretically free to rotate, because they are single bonds. However, if the side chain R attached to the Cα carbon is large enough, it hinders complete rotation around these bonds. Moreover, if during the rotation the H(–N) and O(=C) atoms arrive in a position corresponding to a zero angle between the N–H and C=O vectors, the steric repulsion between the H(–N) and O(=C) atoms of two adjacent amino acids creates an energy barrier that forbids the corresponding Φ and Ψ angles. Based on theoretical considerations, and also on peptide models, Ramachandran constructed a plot of the permitted and non-permitted values of the Φ and Ψ angles around the single bonds N–Cα and Cα–C.

Figure 4.7. Ramachandran plots (Ψ versus Φ, in degrees). Left: sterically permitted Φ – Ψ angle pairs (usual right-handed α-helices, Rα, β-strands, and left-handed α-helices, Lα); disallowed, generously allowed, favorable, and most favorable regions are indicated by progressively darker shading. Right: glycine and non-glycine Ψ, Φ pairs are designated by triangles and squares for the experimental structure of a dehydrogenase enzyme.

As already shown in Table 4.2, the number of energetically stable conformations around these bonds is limited. For example, the α-helix is an accepted conformation in the Ramachandran plot, because all the successive peptide bonds of an α-helical chain have rotation angles in a permitted region. The most stable of these conformations is the right-handed α-helix, Rα, to which the pair Φ = -57° and Ψ = -47° can be attributed. As can be observed, a significant increase of the Φ angle without a modification of the Ψ value leads out of the accepted region. Likewise, if the Ψ angle rises or diminishes considerably without an adequate change of the Φ angle, the Φ – Ψ pair does not represent a real conformation. In this way one can foresee, for any Φ – Ψ pair, whether the corresponding conformation exists. All the conformations occurring naturally in proteins are allowed by the Ramachandran plot (see Figure 4.7).

In the α-helix conformation the basic structure is a helical spiral with approximately 3.6 amino acids per turn. The side chain R groups lie on the outside of the relatively tight helix formed by the polypeptide skeleton (see figure 4.8).

Figure 4.8. α-Helix: lateral view and top view along the long axis. The H-bonds are also represented.

In this rod-like structure, one helix turn comprises 3.6 amino acid residues and rises about 5.4 Å along the helix axis, as established from the X-ray structure of natural α-keratins. The diameter of the cylindrical surface containing the Cα carbons of the helix is 10.1 Å. Such an α-helix involves the formation of “intramolecular” (intrachain) hydrogen bonds between two consecutive turns, approximately parallel to the long axis. These are formed between the H attached to the electronegative nitrogen of the amide group and the carbonyl O of the fourth consecutive amino acid in the chain. The helix can be formed either by L-amino acids or by D-amino acids, but never by a chain of mixed enantiomers (L and D). With the L-amino acids occurring in natural proteins, both the right-handed (Rα) and the left-handed (Lα) forms of the helix are allowed, but the first is significantly more stable. For such a helix Φ ≈ -60º and Ψ ≈ -50º, and the number of amino acid residues ranges between 4 and 80. On average, a typical α-helix contains 10 amino acids and has a length of 15 Å.

The dipole moment of the helix is an important factor in the stabilization of the protein structure. Every amino acid residue contributes about 3 D to the overall moment (10 residues means ≈ 30 D). For example, in the case of barnase (a bacterial ribonuclease with an amino terminal helix, α1), a C-terminal His mutation (this residue possessing a positive charge, see Table 4.1) increases the melting point by nearly 5 ºC, as determined from the reversible thermal denaturation.6 This means a 2 kcal/mole stabilization of the protein. The enhanced stability is attributed to the ion (His) – dipole (helix α1) interaction, which appears due to the spatial proximity of these structural units (see figure 4.9).
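The helix parameters quoted above (3.6 residues and 5.4 Å per turn, about 3 D per residue) fix the rise and rotation per residue and allow a quick consistency check of the “10 residues, 15 Å, ≈ 30 D” figures. The sketch below does only this arithmetic; the function and key names are illustrative, and the additive dipole estimate is the simple per-residue approximation used in the text, not a rigorous electrostatic calculation.

```python
RESIDUES_PER_TURN = 3.6          # from the text
PITCH_ANGSTROM = 5.4             # rise per turn along the helix axis
DIPOLE_PER_RESIDUE_DEBYE = 3.0   # approximate contribution per residue


def helix_geometry(n_residues: int) -> dict:
    """Idealized alpha-helix dimensions for a given number of residues."""
    rise_per_residue = PITCH_ANGSTROM / RESIDUES_PER_TURN   # about 1.5 A
    rotation_per_residue = 360.0 / RESIDUES_PER_TURN        # about 100 degrees
    return {
        "rise_per_residue_A": rise_per_residue,
        "rotation_per_residue_deg": rotation_per_residue,
        "length_A": n_residues * rise_per_residue,
        "turns": n_residues / RESIDUES_PER_TURN,
        "dipole_D": n_residues * DIPOLE_PER_RESIDUE_DEBYE,
    }


for key, value in helix_geometry(10).items():
    print(f"{key}: {value:.2f}")
```

For a typical helix of 10 residues this gives a length of about 15 Å, slightly less than three turns, and a dipole moment of about 30 D, in agreement with the values stated in the text.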

Figure 4.9. The structure of barnase represented by “graphical conventions”: the coiled ribbons symbolize α-helices, the flat, slightly twisted strips are β-strands, and the loops between them appear as thinner, disordered wires. As can be seen, the C terminus is situated near the helix labeled α1.

It was determined that the preference for α-helix formation is pronounced for A, E, L, M, F, and lower for P, G, Y, S. If four helix-favorable amino acids appear consecutively in the primary sequence (interrupted by at most one G), an α-helix is initiated, which continues until two consecutive unfavorable residues are present or a Pro residue is encountered. Table 4.4 presents the probability of finding each of the 20 magic amino acids in one of the three secondary structure forms (α-helix, β-sheet or random coil), based on crystallographic (X-ray) data: tens of proteins comprising 2437 residues with known secondary structure were analyzed.7

Table 4.4. The secondary structure tendencies for the 20 magic amino acids.

     α     β     Coil        α     β     Coil        α     β     Coil        α     β     Coil
G    0.19  0.15  0.66   S    0.28  0.15  0.57   D    0.35  0.14  0.51   K    0.38  0.11  0.51
A    0.52  0.16  0.32   T    0.29  0.24  0.48   N    0.26  0.12  0.62   H    0.45  0.12  0.43
V    0.41  0.28  0.31   Y    0.22  0.24  0.54   E    0.55  0.04  0.41   F    0.40  0.21  0.39
L    0.48  0.21  0.31   C    0.28  0.22  0.50   Q    0.42  0.25  0.33   W    0.41  0.20  0.39
I    0.36  0.29  0.35   M    0.48  0.29  0.28   R    0.28  0.17  0.55   P    0.21  0.17  0.62
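The nucleation and termination rule stated before Table 4.4 can be written down as a small scanning procedure. The sketch below is one possible literal reading of that rule, under stated assumptions: the favorable set is A, E, L, M, F; the unfavorable set is P, G, Y, S; a nucleus is four favorable residues with at most one interposed G; and the helix is extended until a Pro or two consecutive unfavorable residues are met. The function names are hypothetical, and the rule is a rough heuristic rather than a reliable predictor.

```python
FAVORABLE = set("AELMF")     # pronounced helix preference (from the text)
UNFAVORABLE = set("PGYS")    # low helix preference (from the text)


def _nucleus_end(seq: str, start: int):
    """Index just past a helix nucleus beginning at `start`, or None.

    A nucleus is four favorable residues, interrupted by at most one Gly.
    """
    favorable_seen = 0
    gly_used = False
    pos = start
    while pos < len(seq):
        aa = seq[pos]
        if aa in FAVORABLE:
            favorable_seen += 1
            if favorable_seen == 4:
                return pos + 1
        elif aa == "G" and not gly_used and favorable_seen > 0:
            gly_used = True
        else:
            return None
        pos += 1
    return None


def predict_helices(seq: str):
    """Return (start, end) index pairs (end exclusive) of predicted helices."""
    helices = []
    i = 0
    while i < len(seq):
        end = _nucleus_end(seq, i)
        if end is None:
            i += 1
            continue
        # Extend until Pro, or until two consecutive unfavorable residues.
        while end < len(seq):
            if seq[end] == "P":
                break
            if (seq[end] in UNFAVORABLE and end + 1 < len(seq)
                    and seq[end + 1] in UNFAVORABLE):
                break
            end += 1
        helices.append((i, end))
        i = end
    return helices


# Hypothetical test sequence, not taken from a real protein.
print(predict_helices("GSAELMKKVGSAA"))  # [(2, 9)]
```

In the example the nucleus AELM starts the helix at index 2, the helix is extended through KKV, and it stops before the consecutive unfavorable residues G and S.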

Recent studies8 have shown that the formation of an α-helix is determined not only by the amino acids that actually build this secondary structural motif. An important role is also played by the interactions between the 4 – 5 neighboring amino acids of the sequences that flank the helix (labeled NI, NII, NIII, … on the N terminal side, and CI, CII, CIII, CIV, … on the C terminal side) and the first amino acids participating in the helix itself (Nhead, N1, N2, N3 and Chead, C1, C2, C3, using the same convention):

··· NIII, NII, NI, Nhead, N1, N2, N3, ···, C3, C2, C1, Chead, CI, CII, CIII, CIV, ···

The types of interactions which lead to the formation of the helix are: the hydrophobic ones (NIII, NII, NI with N3/N4, and CII, CIII, CIV with C3); the hydrogen bonds between the side chain of Nhead and the amido group of N3 (at the N terminal); and the hydrogen bonds between amide hydrogen and carbonyl O (at the C terminal: between the amino acids of the helix and the loop after Chead, and also between the amide O of Chead and the amide H of C3 + C4, the last one being three-centered). C1 is usually Gly or Pro. Besides the hydrogen bonds presented in Figure 4.8, interactions between the carbonyl O of the polypeptide chain and the different functional groups of the side chains are often important in the stabilization of the α-helix. Such interactions are presented in Table 4.5, where O symbolizes the oxygen of the carbonyl group.6

Table 4.5. Different types of side chain interactions in peptide helices

Charge – aromatic H bond – charge H bond Nonpolar interactions + O Phe (F) – His (H) Gln (Q) – Asp (D) Gln (Q) – Asp (D) Leu (L) – Ile (I) Trp (W) – His+ (H) GluO (E) – Lys+ (K) Gln (Q) – GluO (E) Leu (L) – Leu (L) His+ (H) – AspO (D) Leu (L) – Val (V)

The facts presented in the last three paragraphs demonstrate that the formation of this type of structure (α-helix) is determined to a large extent by the information in the local sequence of the protein. In vitro kinetic measurements have shown that the folding of an α-helix of average length requires about 0.2 microseconds. Figure 4.10 presents such a helix of average size, together with its side chains. For the interactions of these helices with the “environment” (the solvent, other secondary structural elements of the protein, or even other proteins or biomolecules), the character of the side chains is of great importance (see Table 4.1).

Figure 4.10. An α-helix from alcohol dehydrogenase (residue labels A, L, G, N, M, A, A, A, F, S, L), presented together with its side chains. The helix long axis is parallel (left) or perpendicular (right) to the plane of the paper.

If the side chains of a helix are predominantly nonpolar, there is a great chance of finding this part of the polypeptide buried in the interior of a protein, without contact with the polar solvent. If the side chains of a helix are preponderantly polar, this unit can be exposed to the polar medium (the solvent, for example). The rule is mainly valid for globular proteins; in integral membrane proteins it is often reversed. In that case the exterior part of the protein molecule, in contact with the nonpolar lipid bilayer of the membrane, comprises mainly hydrophobic side chains. In the interior of such proteins, channels lined with polar, often electrically charged, side chains can be formed (ion channels). In the sequence from alcohol dehydrogenase depicted in figure 4.10 above, the polar and nonpolar side chains are arranged in space so that one half of the helix “contact surface” is polar (the left side of each picture) and the other half is nonpolar (right side; see for example the phenyl ring of Phe).

The structure of biologically active proteins does not consist exclusively of α-helices. Some of them, however, contain a fairly high percentage of this structure. For example, 75 % of myoglobin is formed by α-helices. The protein consists of a single polypeptide chain containing 153 amino acid residues, and a heme group (an iron-containing biomolecule, see paragraph 4.1.5 below). The polypeptide chain forms eight right-handed α-helices and five interposed nonhelical regions. Two randomly arranged loops can also be found at the N-terminus and the C-terminus. Conventionally, α-helices are represented by spiraled ribbons, as in figure 4.11.
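The observation that one face of a helix can be polar and the other nonpolar can be quantified. One widely used measure, which is not part of the original text and is introduced here only as an illustration, is the hydrophobic moment: each residue contributes a vector whose length is its hydrophobicity and whose direction advances by 100° per residue (360° / 3.6) around the helix axis, and the magnitude of the vector sum is large when hydrophobic residues cluster on one face. The sketch below uses the π values of Table 4.1 as the hydrophobicity scale, which is an assumption; the test sequences are hypothetical.

```python
import cmath
import math

# Side chain lipophilicity contributions (pi) copied from Table 4.1.
PI_VALUES = {
    "A": 0.31, "R": -1.01, "N": -0.60, "D": -2.57, "C": 1.54,
    "Q": -0.22, "E": -2.29, "G": 0.00, "H": 0.13, "I": 1.80,
    "L": 1.70, "K": -0.99, "M": 1.23, "F": 1.79, "P": 0.72,
    "S": -0.04, "T": 0.26, "W": 2.25, "Y": 0.96, "V": 1.22,
}


def hydrophobic_moment(seq: str, delta_deg: float = 100.0) -> float:
    """Magnitude of the summed hydrophobicity vectors around the helix axis.

    delta_deg is the rotation per residue (100 degrees for an ideal
    alpha-helix with 3.6 residues per turn).
    """
    total = sum(
        PI_VALUES[aa] * cmath.exp(1j * math.radians(delta_deg * k))
        for k, aa in enumerate(seq)
    )
    return abs(total)


uniform = "A" * 18                       # same residue on every face
two_faced = ("LKKLLKK" * 3)[:18]         # Leu on one face, Lys on the other
print(f"{hydrophobic_moment(uniform):.2f}")
print(f"{hydrophobic_moment(two_faced):.2f}")
```

A uniform 18-residue helix covers exactly five turns, so its contributions cancel and the moment is essentially zero, whereas the two-faced sequence gives a large moment because its Leu residues all point to the same side.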

Figure 4.11. Several α-helices from acetylcholinesterase: they have different lengths and different spatial orientations, and are represented by conventional notation.

Rarely, a polypeptide can also form a “tighter” helix consisting of triangular turns of three amino acids, called the 3₁₀-helix. Such a turn can be represented by the spatial arrangement of figure 4.4, the geometrical characteristics being given in Table 4.2. In this case the (>N-)H···O(=CC=)O···H(-NN-)H protons with the H3O+ and HO- containing solution (pH = 3 ÷ 8). The order of folding is inversely proportional to the rate of proton exchange in the native protein.15

A certain kinetic pathway (as in the transition state theory of chemical reaction kinetics) seems to be necessary for the folding. In principle, with four minima per amino acid residue (as a function of the Φ, Ψ angles), 4^N conformations are possible for a polypeptide formed by N residues. The time necessary for “accidentally finding” the absolute energy minimum would in any case be astronomical: 4^N ≈ 10^(0.6·N), and for N = 100 (a small protein) we have 10^60·τ (τ being the time necessary for one conformational transition) → ∞. It may be that nature selects sequences that have the capacity to fold rapidly toward the thermodynamically most stable conformation.16 Nevertheless, simple rules have proved to be operative in protein folding prediction, in spite of the complexity mentioned.17 The topology of native proteins seems to determine the principal characteristics of the free energy landscape for folding. The folding transition state structures of proteins with similar native structures are similar. The topology dependence of the folding rates was proved by comparing these rates with the relative contact order (a linear dependence was found). The relative contact order of a native structure is the mean separation (distance) along the sequence between amino acids in physical contact in the folded protein, divided by the length of the protein.
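Both quantitative statements of this paragraph are easy to reproduce. The sketch below first evaluates the size of the conformational space, 4^N, as a power of ten, and then computes the relative contact order exactly as defined above: the mean sequence separation of residue pairs in contact, divided by the chain length. The function names are illustrative and the two contact lists are hypothetical toy data, not contacts extracted from a real structure.

```python
import math


def log10_conformations(n_residues: int, states_per_residue: int = 4) -> float:
    """Base-10 logarithm of states_per_residue ** n_residues."""
    return n_residues * math.log10(states_per_residue)


def relative_contact_order(contacts, length: int) -> float:
    """Mean sequence separation of contacting residue pairs / chain length.

    `contacts` is a list of (i, j) residue index pairs that are in
    physical contact in the folded structure.
    """
    separations = [abs(j - i) for i, j in contacts]
    return sum(separations) / len(separations) / length


# 4^100 is about 10^60, as stated in the text (log10(4) is about 0.6).
print(f"4^100 = 10^{log10_conformations(100):.1f}")

# Toy 20-residue chains: only local contacts versus only long-range contacts.
local_contacts = [(0, 4), (1, 5), (2, 6)]
long_range_contacts = [(0, 19), (1, 18), (2, 17)]
print(relative_contact_order(local_contacts, 20))
print(relative_contact_order(long_range_contacts, 20))
```

Contacts between residues close together in the sequence give a low relative contact order (0.2 in the toy example), while contacts between residues far apart in the sequence give a high one (about 0.85).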
Evidently, for example, an α-helix has a low contact order and a β-sheet a high contact order. In vivo the folding process is even more complicated. Many proteins possess an amino-terminal propeptide, essential for correct folding, which is excised from the mature protein.18 Moreover, the folding of a protein into its native structure in the cell seems to be assisted by specialized proteins, which contribute to this process.19 These proteins are called molecular “chaperones” (after the companions of young ladies) and possess the property of recognizing, and interacting with, hydrophobic surface sequences of proteins that are exposed to the solvent in non-native states. They do not recognize those nonpolar surfaces that are in contact with the environment in the correctly folded form. Thus the chaperones do not interact with specific sequences only, and so are able to hinder incorrect intra- and intermolecular folding. This last property is very important, given the high concentration of protein species in the cell and the great danger of aggregation under these conditions.

The functioning of the two chaperone proteins DnaK and GroEL (heat shock proteins) in the cytosol of Escherichia coli can be summarized as follows.20 DnaK (together with its partner chaperone DnaJ) shields the hydrophobic surfaces of nascent polypeptides (by binding to them) until at least a complete protein domain is available for folding. GroEL forms a large, cylinder-like double ring complex with an internal cavity which can trap the synthesized protein (or protein domain) to assist its correct folding. Before this sequestration of the substrate protein, the DnaK – DnaJ ensemble is released through the action of a regulator protein (GrpE). GroEL also possesses a regulator protein (GroES), which cycles between the two ends of the cylinder (accompanied by ATP hydrolysis) during multiple rounds of substrate protein release and rebinding to the interior surface of the chaperonin GroEL. Once folded, the substrate protein has lost its affinity for GroEL and leaves the cylinder cavity. The major rate limiting transition state of protein folding remains unchanged in the presence of chaperones. The duties of chaperones do not end once the native state has been reached. They are able to interact with their former substrate proteins when these expose appropriate binding sites due to accidental unfolding. Thus chaperones might prevent harmful protein aggregation.
Calnexin (unrelated to the above heat shock proteins), a membrane-bound protein, is also considered a molecular chaperone. One of the established functions of this protein is to retain incorrectly or incompletely folded proteins, thus acting as a quality controller of the endoplasmic reticulum.21

4.2. Structural features of nucleic acids.

Deoxyribonucleic acid (DNA) was first isolated by Miescher (from white blood cells and then from salmon sperm) around 1870; he published a series of remarkable studies concerning the structure of the isolated product.1 He called the phosphate-rich substance “nuclein” due to its presence in the cell nucleus. When the acidic character of the substance was discovered, the name was changed to “nucleic acid”. Research on these biomolecules has shown that they are, like proteins, biopolymers. DNA and RNA (ribonucleic acid) are chain-like macromolecules with the property of storing and transferring genetic information. Although the chemistry of these substances was thoroughly studied after their discovery, a period of 75 years elapsed before the biological significance of DNA and RNA was realized. In 1944 Avery suggested that DNA is the genetic material. The involvement of RNA in protein synthesis was established only in 1957, though it had already been identified as the genetic material of some viruses.

The nucleic acids are major components of cells, representing 5 – 15 % of their dry mass. They are also found in viruses, which are infectious nucleic acid - protein complexes, capable of self-replication in the host cell. Unlike plants and animals, a virus possesses either DNA or RNA, never both. Although the nucleic acids owe their name to the fact that they were isolated from cell nuclei, they are also found in other cellular compartments (e.g. mitochondria).

The monomer unit of a nucleic acid is called a nucleotide, and the polymer is a polynucleotide. The monomer of DNA is a deoxyribonucleotide, and that of RNA a ribonucleotide. A nucleotide can be hydrolyzed to a heterocyclic nitrogenous base, a pentose, and phosphoric acid. In what follows, the structure of nucleic acids at different levels of complexity will be presented.

4.2.1. Nitrogenous bases.

The nucleic acids contain two classes of nitrogenous bases: purines and pyrimidines. The two purines of DNA and RNA are adenine (A) and guanine (G); among the pyrimidines, cytosine (C) is encountered in both nucleic acids. Uracil (U) occurs only in RNA, replacing thymine (T), which is found in DNA (see figure 4.19 for the chemical structures of these heterocycles).
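The classification just given maps directly onto a small lookup. The sketch below labels each one-letter base code as a purine or a pyrimidine and rewrites a DNA base sequence in the RNA alphabet by replacing T with U. The function names are illustrative assumptions, and the second function only swaps the alphabet; it does not model transcription, which also involves base complementarity.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def base_class(base: str) -> str:
    """Return 'purine' or 'pyrimidine' for a one-letter base code."""
    base = base.upper()
    if base in PURINES:
        return "purine"
    if base in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"unknown base: {base!r}")


def dna_to_rna_alphabet(seq: str) -> str:
    """Rewrite a DNA base sequence with uracil in place of thymine."""
    return seq.upper().replace("T", "U")


print(base_class("G"))                 # purine
print(dna_to_rna_alphabet("GATTACA"))  # GAUUACA
```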

[Figure 4.19: structural formulas, with ring atom numbering, of purine, adenine (A), guanine (G), pyrimidine, cytosine (C), thymine (T) and uracil (U).]

Figure 4.19. The structures of purine and pyrimidine nitrogenous bases.

Because of tautomerism, a pH dependent equilibrium exists between the keto (lactam) and enol (lactim) forms of these bases. The tautomeric forms of uracil are presented in figure 4.20. At physiological pH the lactam form dominates; it is responsible for the H-bonds formed between the base pairs in the native DNA molecule.

Figure 4.20. The tautomeric forms of uracil (lactam, lactim, and double lactim).

Other bases, called modified or rare bases, are also known; they appear in small quantities in the structure of polynucleotides. For example, transfer RNA (tRNA) contains a significant percentage (up to 10 %) of rare bases (figure 4.21). The most frequently encountered modified purines are the methylated ones, but table 4.6, which presents a series of rare bases, also shows acetyl, isopentenyl, or hydroxyl substituent groups.

Figure 4.21. Structures of some rare bases that are components of the nucleic acids: N6-methyladenine, 1-methylguanine, 4-thiouracil, and 5-methylcytosine.

Table 4.6. Modified pyrimidine and purine bases

Pyrimidine                                        Purine
5,6-Dihydrouracil       2-Thiouracil              1-Methyladenine         7-Methylguanine
1-Methyluracil          N4-Acetylcytosine         2-Methyladenine         N2-Methylguanine
3-Methyluracil          3-Methylcytosine          7-Methyladenine         N2,N2-Dimethylguanine
5-Hydroxymethyluracil   5-Hydroxymethylcytosine   N6,N6-Dimethyladenine   6-Oxypurine

5-Methylcytosine is a common component of plant and animal DNA. 5-Hydroxymethylcytosine appears instead of cytosine in the T-even bacteriophages of Escherichia coli.

4.2.2. Nucleosides. Nucleosides are compounds in which a nitrogenous base (purine or pyrimidine) is covalently bonded to D-ribofuranose (ribonucleosides) or to 2-deoxy-D-ribofuranose (deoxyribonucleosides) through an N-β-glycosidic linkage (see figure 4.22). This bond involves the hemiacetal group at C-1’ of the pentose and the N-9 nitrogen atom of a purine, or N-1 of a pyrimidine.

Figure 4.22. A ribonucleoside (adenosine) and a deoxyribonucleoside (deoxythymidine).

A nucleoside (pseudouridine) in which C-1’ of ribose is linked to C-5 of uracil was discovered in tRNA.

4.2.3. Nucleotides. Nucleotides are phosphate esters of nucleosides. Several classes of nucleotides exist, since the phosphate can be attached at the 2’-, 3’-, or 5’-carbon of a ribonucleotide, or at the 3’- or 5’-carbon of a deoxyribonucleotide; the naturally occurring nucleotides, however, are commonly 5’-monophosphates (see figure 4.23).

Figure 4.23. Examples of ribo- and deoxyribonucleotide structures: riboadenosine-5’-phosphate and deoxyriboguanosine-5’-phosphate.

All the ribonucleosides and deoxyribonucleosides exist in cells not only as monophosphate derivatives, but also as 5’-di- and 5’-triphosphates, i.e. nucleoside esters of pyrophosphoric and triphosphoric acid (see figure 4.24). Thus there are three series of 5’-phosphorylated nucleosides; for adenosine these are adenosine monophosphate (AMP), adenosine diphosphate (ADP), and adenosine triphosphate (ATP). The phosphate groups of these compounds are labeled α, β, and γ, starting from the one attached to the pentose. The nucleoside 5’-diphosphoric and 5’-triphosphoric acids are relatively strong acids and release three and four protons, respectively, from their phosphate groups. The phosphate groups of these acids form complexes with the divalent ions Mg2+ and Ca2+. Because of the relatively high Mg2+ concentration in the cytoplasm, nucleoside 5’-di- and 5’-triphosphates occur in healthy cells as magnesium complexes. The hydrolysis of the γ phosphate group of the triphosphate derivative is an important source of chemical energy, which biological systems use to perform work.

Figure 4.24. The structure of nucleoside 5’-mono-, 5’-di-, and 5’-triphosphates (AMP, ADP, ATP; the phosphate groups are labeled α, β, γ). The derivatives of adenosine are presented as an example. In the corresponding deoxyribonucleoside phosphates the pentose is deoxyribose.

Besides the nucleoside 5’-phosphates, some cyclic phosphate derivatives are important compounds in the biochemistry of living organisms. Cyclic ribonucleoside 2’,3’-phosphates (see figure 4.25) are intermediates of the RNA hydrolysis catalyzed by ribonuclease enzymes (the final products are 3’-phosphates). Two cyclic nucleotides play a crucial role in the biochemical action of some hormones: cyclic adenosine 3’,5’-monophosphate (cAMP) and cyclic guanosine 3’,5’-monophosphate (cGMP), both shown in figure 4.25.

Figure 4.25. Biologically important cyclic adenosine phosphates (cyclic 2’,3’-adenosine monophosphate and cAMP) and cGMP.

cAMP is formed in eukaryotic cells from ATP by the action of a cell membrane enzyme, adenylate cyclase. This enzyme is the catalytic unit associated with certain G-protein-coupled receptors, which are stimulated by hormones and neurotransmitters from the blood. The cAMP that results from this stimulation is called a second messenger, because it triggers a cascade of cellular events. This is how chemical signals are transmitted from outside the cell to its interior. In visual excitation, cGMP is a key transmitter molecule that controls the sodium channels in the plasma membrane.

4.2.4. Polynucleotide structure of the nucleic acids. Covalently bonded deoxyribonucleotide chains form deoxyribonucleic acid (DNA), and similar ribonucleotide chains form ribonucleic acid (RNA). DNA and RNA possess a series of similar physical and chemical properties, because in both molecules the successive nucleotides are covalently joined by phosphodiester bridges between the 5’-hydroxyl group of one pentose and the 3’-hydroxyl group of the neighboring nucleotide (see figure 4.26 below). Thus the backbone of DNA and RNA consists of alternating phosphate and pentose groups. The pyrimidine and purine bases of the nucleotide units are not part of the main polymer chain. They form side chains, in the manner already encountered for polypeptides with their distinct amino acid side chains.

As with proteins, the linear polymer structure of a nucleic acid has two ends, here called the 5’-terminus and the 3’-terminus. As can be seen in the tetranucleotide of figure 4.26, the ribose at the 5’-terminus has a free 5’-hydroxyl group (not involved in a phosphodiester linkage), and similarly the 3’-terminus ends in a free C-3’ – OH group. Nevertheless, in cellular polynucleotides the terminal C-5’ is quite often involved in a monophosphate ester linkage. The tetranucleotide of figure 4.26 can be drawn in abbreviated form as illustrated in figure 4.27. Also, as in the case of proteins, polynucleotides can be written in an abbreviated notation based on the one-letter codes of the nitrogenous bases (see figure 4.19 for these codes). For the above tetranucleotide one adopts either the notation 5’-ApCpGpT-3’ or, more briefly, 5’-ACGT-3’. These abbreviations highlight the sequence and the nature of the nitrogenous bases in the molecule. If the 5’-hydroxyl is esterified with phosphoric acid, the same tetranucleotide can be symbolized as 5’-pApCpGpT-3’, or 5’-pACGT-3’. In database systems containing genetic sequences the corresponding sequence is written simply as ACGT.
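The abbreviated notations described above follow a simple rule, which the short Python sketch below tries to make explicit. The function name `abbreviated` and its two flags are hypothetical, introduced only for this illustration; the sketch assumes the sequence is given 5’ → 3’ in one-letter base codes.

```python
def abbreviated(sequence, five_prime_phosphate=False, explicit_p=True):
    """Write a base sequence (given 5'->3') in abbreviated polynucleotide notation.

    explicit_p=True places a 'p' for every internal phosphodiester bridge
    (ApCpGpT); explicit_p=False gives the short form (ACGT).
    five_prime_phosphate=True adds the leading 'p' of a 5'-phosphate ester.
    """
    seq = sequence.upper()
    if not seq or any(base not in "ACGTU" for base in seq):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T, U")
    joiner = "p" if explicit_p else ""
    prefix = "p" if five_prime_phosphate else ""
    return "5'-" + prefix + joiner.join(seq) + "-3'"


print(abbreviated("ACGT"))
print(abbreviated("ACGT", explicit_p=False))
print(abbreviated("ACGT", five_prime_phosphate=True))
print(abbreviated("ACGT", five_prime_phosphate=True, explicit_p=False))
```

The four calls reproduce, in order, the four notations quoted in the text for the tetranucleotide.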

Figure 4.26. A tetranucleotide with the covalent phosphodiester linkages between the 3’- and 5’-hydroxyl groups of the pentoses (bases drawn, from the 5’-terminus to the 3’-terminus: adenine, cytosine, guanine, uracil).

Figure 4.27. Abbreviated structure notation for the tetranucleotide (see preceding figure).

The importance of this type of abbreviated notation is evident when referring to codons, the groups of three nucleotides that determine the amino acid sequence of a nascent polypeptide chain. This is the genetic code; codons are written in the direction 5’-terminus → 3’-terminus of the mRNA (see below), which is the direction in which they are read. Thus, for example, ACG specifies one amino acid (Thr), whereas GCA specifies another (Ala). These are the coding triplets. The DNA segment that codes for a whole polypeptide chain is called a gene. Briefly stated, the message of a gene (DNA) is first transcribed into a messenger RNA (mRNA, see 4.2.6.1). The messenger RNA is a complementary copy of one of the two strands (see next section) of the gene, called the template or copy strand. Next, the information of the mRNA is translated into protein structure, using transfer RNA (tRNA, see 4.2.6.3) as “vehicle” and “activator” for the amino acids that are inserted into the protein. There are 61 triplet codons that encode the magic twenty amino acids, and three triplets (UAA, UAG, and UGA) serve as termination signals for protein synthesis. Evidently, more than one triplet can stand for a given amino acid. For example, the above-mentioned Thr is encoded not only by the ACG triplet, but also by ACU, ACC, and ACA. The triplets GCU, GCC, GCA, and GCG represent alanine in the genetic code.

It is remarkable that the two recently discovered genetically encoded amino acids (selenocysteine and pyrrolysine) are specified by two termination codons: the first by UGA, and the second by UAG.2, 3 Something must therefore tell the translational machinery of the cell to continue translation instead of terminating the synthesis. In the case of selenocysteine, special signals in mRNAs tag the subset of UGA stop codons whose meaning is to be redefined (mRNA context-dependent redefinition of stop codons). This tag is a stem-loop structure (for its significance see 4.2.6.1 below), which can be close to the UGA codon, as in bacteria, or distant, in the 3’-untranslated region, as in eukaryotic mRNAs. The selenocysteine-specific tRNA that reads such a codon is first charged with serine; the serine is then enzymatically modified to form selenocysteine, which is afterwards inserted into the nascent protein. For pyrrolysine the exact mechanism of the termination codon redefinition is not yet clear: it may be a “permanent” reassignment of the UAG signal in the organisms in which pyrrolysine was detected, or only a subset of such stop codons, accompanied by specific tags, may be reinterpreted.22

4.2.5. DNA structure

4.2.5.1. Primary structure of DNA. DNA molecules belonging to different cells and viruses differ from one another in the ratio of the four types of nucleotide monomers, in the nucleotide sequence, and in their molecular mass. Besides the four major nitrogenous bases (A, G, T, C), small quantities of methyl derivatives of these bases can be found in certain DNA types, particularly those of viral origin.
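As an aside on the codon assignments quoted in section 4.2.4, the following Python sketch builds the standard genetic code and checks the numbers given there (61 coding triplets, three termination triplets, four codons each for Thr and Ala). It is only an illustration: the compact 64-letter string is the conventional way of listing the standard code in U, C, A, G order, the helper names are hypothetical, and the context-dependent redefinition of UGA and UAG described above is deliberately not modeled.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes in the conventional U, C, A, G table order;
# '*' marks a termination codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the third base fastest, matching the table order above.
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return all codons (5'->3') assigned to a one-letter amino acid code."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def translate(mrna):
    """Translate an mRNA written 5'->3', stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


print(codons_for("T"))          # threonine codons
print(codons_for("*"))          # termination codons
print(translate("ACGGCAUAA"))   # Thr, Ala, then stop
```

Reading the same three letters in the opposite direction changes the meaning, which is why the 5’ → 3’ convention matters: ACG is looked up as Thr, GCA as Ala.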
In diploid eukaryotic cells, almost all the DNA is in the nucleus, combined through ionic bonds with basic proteins called histones. Besides nuclear DNA, small quantities of DNA are also found in the mitochondria of these cells. This mitochondrial DNA has a molecular mass of the order of 10 million. Likewise, chloroplasts contain a distinct type of small DNA molecule. Many viruses contain DNA, with molecular masses between 2 million and 100 million, depending on the virus species. Although the base sequences are not always known, it was established that in deoxyribonucleic acids adenine and thymine occur in equal proportions, as do guanine and cytosine. Thus the experimentally determined ratios A / T and G / C are very close to one; only the ratio (A + T)/(G + C) varies from one species to another.

4.2.5.2. Tertiary structure of DNA: the double helix model. DNA conformations. In 1953 James D. Watson and Francis H. C. Crick23 postulated, on the basis of X-ray diffraction results obtained by Rosalind Franklin and M. Wilkins (Nobel Prize 1962), a three-dimensional model of the DNA structure. This model explained the observed physical and chemical properties of DNA. Watson and Crick proposed for DNA a structure formed by two polynucleotide strands wound together around a common axis in a right-handed double helix. The two chains are antiparallel: the 3’,5’-phosphodiester linkages run in opposite directions in the two strands. Figure 4.28 represents the path of each chain in space by a ribbon.

Figure 4.28. Double-helical model of DNA, showing the minor groove (mG), the major groove (MG), the 34 Å repeat along the axis, and the 10 Å radius. The two ribbons represent the phosphate (P) and deoxyribose (D) groups. A – T and G – C symbolize the adenine – thymine and guanine – cytosine nitrogenous base pairs.

The purine and pyrimidine bases are oriented toward the interior of the double helix, with their planes approximately parallel to one another and perpendicular to the helix axis. This position is maintained by the hydrogen bonds formed between the base pairs of the two strands. The hydrogen bonds are formed specifically between the pairs A – T and G – C (see also figure 4.29).

Figure 4.29. The base pairs A – T and G – C from a double helix DNA, with the two (first pair) or three (second pair) stabilizing H-bonds. MG means major, and mG minor groove.

Thus the two chains of DNA are complementary, because every base in one strand corresponds to its hydrogen-bonded partner in the other strand. The purine pair A – G (not permitted in DNA) is too bulky: there is no space for it in the interior of the double helix, whose width is 20 (10 + 10) Å. In the hypothetical pair C – T (not encountered), the bases would be too far apart to form stable hydrogen bonds. The allowed pairs (A – T and G – C) span the same distance between their β-glycosidic linkages, and also form stronger and more stable hydrogen bonds than the other pairs (A – G and C – T).

The double helix presents two grooves: the major groove (MG) and the minor groove (mG). These are highlighted in figures 4.28 and 4.29. They are the “headquarters” of molecular interactions (with proteins or small molecules). To explain the periodicities of 3.4 Å and 34 Å observed by X-ray diffraction, Watson and Crick hypothesized that the bases are stacked 3.4 Å apart, measured from the center of one to the center of the next, and that every turn of the double helix contains ten nucleotides per strand, which corresponds to the distance of 34 Å. As already mentioned, the radius of the helix is 10 Å.

The two antiparallel strands of the DNA double helix are not identical in the sequence and composition of their nitrogenous bases, but are complementary to each other (see figure 4.30). The complementary nature of the two strands suggested a mode of replication to Watson and Crick: replication occurs by the synthesis of complementary strands on each of the two parent strands. The original chains serve as “stencils” that specify the base sequences of the newly synthesized complementary strands. The final result is the formation of two “daughter” double-helix molecules, identical with the “parental” DNA.

5’ G–C–C–T–A–G–A–A 3’
3’ C–G–G–A–T–C–T–T 5’

Figure 4.30. Complementary antiparallel DNA strands.

Molecules from the environment of the double helix make contact in the MG with the “upper” groups of the base pairs (O, NH2 for both T – A and G – C), and in the mG with the “lower” groups of the base pairs (O for T – A, and O, NH2 for G – C). The relatively hydrophobic bases are directed toward the interior of the double helix and are thus protected from contact with the water molecules of the environment. The pentose groups and the ionized phosphate groups lie on the exterior of the double helix (figure 4.31 below).

The term DNA conformation refers to the spatial arrangement of the DNA molecule, comprising the secondary and tertiary structure. The in vivo secondary structure is the folded double helix (except for some small bacteriophage DNAs). The tertiary structure is highly flexible, so a single, rigid conformation cannot be identified; the conformation appears instead to be a statistical property of populations of many DNA molecules. Differences are also observed between the in vivo and in vitro structures. In vitro, the DNA double-helix filaments undergo structural alterations following the loss of water molecules, depending on the electrolyte content of the medium.
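The complementarity and antiparallel orientation illustrated in figure 4.30 can be expressed in a few lines of Python. The sketch below is an illustration only, with hypothetical function names: it derives the partner strand of the upper strand of the figure and counts the bases over both strands, which shows why pairing alone forces the A / T and G / C ratios of a duplex to equal one.

```python
# Watson-Crick partners in DNA, as in figure 4.29.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand_5to3):
    """Return the partner strand, also written 5'->3'.

    Because the strands are antiparallel, the partner is obtained by
    complementing each base and reversing the order.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand_5to3))


def duplex_composition(strand_5to3):
    """Count each base over both strands of the duplex."""
    both = strand_5to3 + complement_strand(strand_5to3)
    return {base: both.count(base) for base in "ATGC"}


top = "GCCTAGAA"                    # upper strand of figure 4.30, 5'->3'
bottom = complement_strand(top)     # lower strand, written 5'->3'
counts = duplex_composition(top)

print(bottom)        # lower strand read 5'->3'
print(bottom[::-1])  # the same strand read 3'->5', as drawn in the figure
print(counts["A"] / counts["T"], counts["G"] / counts["C"])
```

Applying `complement_strand` to each parent strand in turn is also a minimal picture of the replication scheme described above: each parent strand specifies a new partner, giving two duplexes identical to the original.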


Figure 4.31. The disposition of the nonpolar nucleotide bases: with the H-bonds toward the interior of the double helix. The polar deoxyribose and phosphate groups are oriented toward the exterior, in contact with the environment.

The values of the dihedral angles about the flexible bonds of the chain (backbone) determine the secondary structure that is adopted. The relevant angles are χCN, which determines the orientation of the base plane relative to the β-glycosidic linkage; the τ angles, which characterize the structure of the pentose ring; and the φ angles, which correspond to the bonds C – O, O – P, and P – O (figure 4.32). The pentose ring is not planar: in the endo structure the C3’ atom lies on the same side as C5’, while in the exo structure C3’ lies on the side opposite to C5’ (see also figure 4.32). Conformational energy calculations have shown that these structures (among others) are energetically stable. The non-planarity of the ring is responsible for major differences in the helix structure. Quantum-chemical calculations led to the conclusion that the most flexible part of the molecule is located at the φP-O3’ and φO5’-P angles. According to these calculations, two main minima are identified for each dihedral angle φ, largely independently of the pentose structure (see Table 4.7). A great number of possible conformations results.

Figure 4.32. Notations of the dihedral angles responsible for the DNA chain flexibility (left), and the most stable conformations of the pentose cycle (right): C3’-endo (C3’-endo - C2’-exo) and C3’-exo (C3’-exo - C2’-endo).


Table 4.7. Angles corresponding to energetic minima for the DNA chain

Angle        Defining atoms        Minima
φC4’-C5’     C3’-C4’-C5’-O5’       60º      180º
φC5’-O5’     C4’-C5’-O5’-P         180º     180º
φO5’-P       C5’-O5’-P-O3’         60º      -60º
φP-O3’       O5’-P-O3’-C3’         60º      -60º
φO3’-C3’     P-O3’-C3’-C4’         180º     180º

The other major determinant of the backbone conformation is the rotation angle around the glycosidic bond, χCN (i.e. O1’-C1’-N1-C6 in pyrimidine derivatives, and O1’-C1’-N9-C8 in purines). For χCN = 0º the conformation is considered cis. Energy calculations find three main minima: χCN = 180º (anti), χCN = +90º (high anti), and χCN = -90º (syn). In both anti conformations the base ring lies on the side opposite to the pentose ring, and in the syn conformation approximately above the pentose ring.

A distinct characteristic of the double helix is the orientation of the opposed H-bonded base pairs relative to the helix axis. This orientation is determined by three angles: twist (between successive base pairs), roll, and tilt (see figure 4.33).

Figure 4.33. Characterization of the base pair plane orientation by three angles (twist, roll, and tilt, defined relative to the double helix axis).

Three DNA helix forms have been identified, called A, B, and Z. The A and B forms are right-handed helices, and the Z form is left-handed. In the B form the bases are approximately perpendicular to the helix axis (tilt of 2º); this inclination is about 20º for the A form and 6º for Z. Some geometrical characteristics of the three forms are presented in Table 4.8.

Table 4.8. Structural characteristics of DNA conformations

Form   Helix   Base/turn   Rise/base   Helix diameter   Twist
B      R       10          3.34 Å      ~20 Å            36º
A      R       11          2.65 Å      ~23 Å            33º
Z      L       12          3.71 Å      ~18 Å

(R = right-handed, L = left-handed.)

DNA in solution prefers the B form, whereas the dehydrated form found at high Na+ and K+ concentrations is A. Viral RNA molecules also adopt the A structure, because the 2’-hydroxyl substituents of ribose sterically hinder the formation of the B form. The Z form appears in alternating purine – pyrimidine sequences (such as GCGCGCGC).
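The columns of Table 4.8 are not independent: the average twist follows from the number of bases per turn, and the pitch (the length of one turn along the axis) from the number of bases per turn and the rise per base. The Python sketch below illustrates these two relations with the tabulated values; the function names are hypothetical, and the twist is computed as a magnitude only, without the sign that distinguishes right- from left-handed helices.

```python
# Values copied from Table 4.8: (bases per turn, rise per base in angstroms).
HELIX_FORMS = {"B": (10, 3.34), "A": (11, 2.65), "Z": (12, 3.71)}


def twist_per_base(bases_per_turn):
    """Average rotation between successive base pairs, in degrees."""
    return 360.0 / bases_per_turn


def pitch(bases_per_turn, rise):
    """Length of one complete helix turn along the axis, in angstroms."""
    return bases_per_turn * rise


def contour_length(n_base_pairs, rise):
    """Axial length of a duplex of n base pairs, in angstroms."""
    return n_base_pairs * rise


for form, (n, rise) in HELIX_FORMS.items():
    print(form, round(twist_per_base(n), 1), "deg,", round(pitch(n, rise), 2), "A per turn")
```

For the B form the computed pitch is close to the 34 Å repeat quoted for the Watson and Crick model, and the computed twists for B and A agree with the tabulated 36º and 33º after rounding.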


Several types of interactions are involved in the stabilization of the DNA conformation; they are described below.

The ionizable groups of the bases in DNA accept protons at pH 4 – 5; the enol groups ionize at pH 9; and at pH greater than 3, every phosphate group carries a single negative charge. Owing to these properties, in the physiological pH domain (5 – 9) DNA behaves as an anionic polyelectrolyte. Electrostatic interaction forces therefore determine the stability of the double helix and, at low ionic concentrations, its unwinding. Multivalent cations and polyanions enhance the stability of DNA toward denaturants.

Interactions with the hydrophilic environment cause the nucleotide bases to stack in parallel planes. The consecutive base planes, oriented perpendicular to the helix axis, interact with each other. These weak interactions become significant when the distance and the geometrical arrangement between two superposed base rings are optimal. These are the stacking forces.

The hydrogen bonds are perhaps the most important forces involved in the mechanisms through which DNA exercises its biological function: DNA replication (the conservation of genetic information) and mRNA synthesis (transcription). The double helix is also maintained by hydrogen bonding between the bases of the two DNA strands. According to ab initio calculations, the energetically most stable pair is G – C (see figure 4.29), with -23.8 kcal/mole, while the A – T pair corresponds to -11.8 kcal/mole. Geometrically the A – T pair is planar, but in G – C the hydrogen atoms of the amino group are out of plane, the whole structure being bulged and twisted in a helical manner.

The hydrophobic and hydrophilic interactions as a whole determine the conformation and solubility of DNA, owing especially to the phosphate groups oriented toward the aqueous environment and to the base pairs situated in the core of the double helix.
In this respect the twist angles and the distances between the base pairs are important, because together they protect the nonpolar faces of the base planes (which carry no H-bond-forming groups) from contact with the polar medium.

4.2.5.3. DNA types. DNA subunits. The axis of the DNA double helix is linear only in the ideal case. In fact it bends and twists, depending on the local base sequence and on specific interactions with proteins or other ligands. Computer modeling in particular has suggested two conclusions. (i) Through small periodic modifications of the local chain conformation, DNA can wrap around a histone protein. (ii) Insertion of A-form DNA at the end of a B-DNA helix can induce chain bending without perturbing the base arrangement. These findings suggested that certain base pairs might adopt non-parallel orientations and act as modifiers of the direction of the double helix axis. Inserted at regular intervals in the double helix, such non-parallel bases might facilitate unidirectional bending of the chain, without destroying the hydrophobic interactions and the hydrogen bonds between the bases.

Viral DNA molecules from bacteriophages can be obtained in native, homogeneous form, and serve as prototypes for the much larger DNA molecules of the eukaryotic cell. Special attention has been given to bacteriophages λ and ΦX 174, which infect Escherichia coli. Viral deoxyribonucleic acids are in many cases linear duplexes; an exception is the DNA of ΦX 174, which has a circular single-stranded structure. Some structural properties of viral DNA indicate that it is either circular from the outset or circularizes before replication proceeds. This concept is called “the rule of circular structure”. The term “circular” refers here not to the geometric form, but to the continuity of the DNA chain. For example, the linear double-stranded DNA of bacteriophage λ carries at the 5’-terminus of each strand (beyond the final paired base of the 3’-terminus) cohesive ends: single-stranded extensions of 12 bases with mutually complementary sequences. These can produce an open-circle DNA when the linear molecule adopts a circular shape. Subsequently DNA ligase (the catalyst of phosphodiester bond formation) converts this molecule into a covalently closed circle.

Certain linear viral DNA molecules present repetitive terminal sequences: the base sequence at the 3’-terminus repeats the base sequence at the 5’-terminus. Such DNA does not consist of molecules with a single base sequence, but of a population of molecules with circularly permuted sequences. Figure 4.34 illustrates the circular permutation for three linear DNA duplexes (the sequences are numbered).

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10
1' - 2' - 3' - 4' - 5' - 6' - 7' - 8' - 9' - 10'

3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 1 - 2
3' - 4' - 5' - 6' - 7' - 8' - 9' - 10' - 1' - 2'

7 - 8 - 9 - 10 - 1 - 2 - 3 - 4 - 5 - 6
7' - 8' - 9' - 10' - 1' - 2' - 3' - 4' - 5' - 6'

Figure 4.34. Circular permutation of viral DNA.

Another characteristic property of viral DNA molecules is their content of modified bases. Thus the DNA of the T-even bacteriophages of E. coli contains 5-hydroxymethylcytosine instead of cytosine, and the hydroxyl group is often glycosylated. These modifications protect the viral DNA against the action of the endonucleases of the host cell.

In the physiological pH domain DNA is subject to opposing forces: (i) the repulsions between the ionized phosphate groups, which tend to stretch and elongate the molecule, and (ii) the attractive interactions between the bases, which tend to form compact segments of helical structure.
The reduction of these interactions among the bases (for example by raising the temperature) leads finally to the protonated form of the basic groups, and thus to a further decrease of the attractive interactions. The molecule tends to stretch, and the former double helix structure is destroyed. Likewise, alkaline titration destroys the helical conformation through ionization of the enol OH groups, which leads to a maximal elongation of the DNA chains.

In several cases the native DNA may also be single-stranded. For example, phage ΦX 174 has single-stranded circular (continuous) DNA. Circular duplex DNA molecules isolated in native form from viruses, bacteria, and mitochondria are often supercoiled. This supercoiling is caused by a higher base pair content per unit length than is normal for a double helix. As a result, the whole circular DNA structure compensates for this torsional stress by twisting in the opposite sense. Such supercoiled DNA molecules can be relaxed by the appearance of a temporary gap in one of the strands. This is afterwards “repaired” by the insertion of certain chemical agents between the base pairs, or even by protein binding.

Much effort has been spent on modeling the local characteristics of supercoiled DNA.24 Numerous influential factors have been identified, such as the global fold of the helix axis, the twist angle (see figure 4.33), the number and position of localized bent segments, etc. Because the changes in base pair twist angle caused by supercoiling can affect protein binding, DNA segments that are identical in the relaxed state have different protein affinities along the supercoiled double helix. The global form of the supercoiled polymer is also strongly influenced by localized bending. The study of supercoiling is therefore highly important for elucidating and explaining the biochemical functions of DNA.

4.2.6. Structure of ribonucleic acids. Unlike DNA, ribonucleic acid occurs as distinct cellular molecular species. Three cytoplasmic RNAs are known: messenger RNA (mRNA), ribosomal RNA (rRNA), and transfer RNA (tRNA). All three RNA types are single-stranded polyribonucleotide chains, which differ in molecular mass and sedimentation coefficient, S (see Table 4.9). Every RNA type exists in multiple molecular forms: rRNA in at least three major forms, tRNA in about sixty forms, and mRNA in thousands of distinct forms. Most cells contain two to eight times more RNA than DNA.

Table 4.9. Different RNA types and some of their characteristics.

Type   S        Molecular mass        Number of nucleotide residues   % of total RNA
mRNA   6 – 25   25,000 – 1,000,000    75 – 3,000                      2
tRNA   ~4       23,000 – 30,000       75 – 90                         16
rRNA   5        35,000                100                             82 (all rRNA)
       16       ~ 550,000             ~ 1,500
       23       1,100,000             ~ 3,000

The base ratios of RNA vary: in certain viral RNA molecules the A / U ratio is close to one while the G / C ratio is not; in E. coli and yeast rRNA neither of these values is close to one; in E. coli tRNA both ratios are close to one, while in yeast tRNA the two values differ. The limited size and the differing base contents and ratios of rRNA and tRNA suggest that they are formed by copying (transcription of) only a portion of the chromosomal DNA. There is a great structural variety of RNAs. Their structure arises from complementary base pairing between different portions of the chain, usually involves cations (Mg2+), and depends on other types of interactions as well.25 Thermodynamic analysis of RNA unfolding suggests three main strategies for obtaining a very stable and compact folded structure: (i) hydrogen bonds between irregular complementary surfaces in the tertiary structure of tRNA; (ii) binding of mono- and divalent cations at specific sites in rRNA; (iii) pseudoknot folds in mRNA fragments.26

4.2.6.1. Messenger RNA. Messenger ribonucleic acid contains the four major bases A, G, C, and U. It is synthesized in the nucleus by the process of transcription, in which the base sequence of a DNA strand is enzymatically copied into the chain of mRNA. A small quantity is also synthesized in mitochondria. The base sequence of mRNA is complementary to that of the DNA strand that is transcribed. Messenger RNA from eukaryotic cells is characterized by the presence at the 3’-terminus of a long (poly A) tail containing nearly 200 adenyl nucleotides. This tail protects the mRNA from degradation, and is also involved in its transport from the nucleus to the cytoplasm. mRNA molecules are heterogeneous in size, because the transcribed genes differ greatly in length. Since the codon of an amino acid consists of three nucleotides, the synthesis of a medium-sized protein (500 amino acids) requires 1500 nucleotides (molecular mass about 500,000 daltons).

The double helix also appears as a secondary structure element in mRNA, even if not all the juxtaposed bases are complementary (see figure 4.35). Nucleotides that are remote from each other along the single strand form the double-helical portions of the mRNA structure. Generally, mRNA molecules consist of a mixture of double-helical segments, random coils, loops that close a double-helical portion, and stems that appear like swellings representing non-complementary portions of the double helix formed by the single strand.

Figure 4.35. Some structural elements of mRNA (a stem and a loop of m ≥ 3 bases).

4.2.6.2. Ribosomal RNA. Ribosomal ribonucleic acid represents 65 % of the mass of ribosomes. rRNA is obtained from E. coli ribosomes as linear single-stranded molecules in three forms, which differ in sequence and base ratio. In eukaryotic cells there are four types of rRNA, named after their sedimentation coefficients: 5 S, 7 S, 18 S, and 28 S. rRNA plays a role in polypeptide synthesis. Some of the bases in rRNA are methylated.

4.2.6.3. Transfer RNA. Transfer ribonucleic acids are relatively small molecules that function as specific carriers of amino acids in the protein biosynthesis that takes place on the ribosomes. Each of the magic twenty amino acids has at least one corresponding tRNA. For example, E. coli cells contain five different tRNAs for the transfer of leucine. In eukaryotic cells, different mitochondrial and cytoplasmic tRNAs correspond to a given amino acid. tRNA contains an appreciable number of rare bases, which may represent 10 % of the total base content. Moreover, some unusual nucleotides also appear in tRNA, such as the pseudouridylate and ribothymidylate depicted in figure 4.36.

Figure 4.36. Two unusual nucleotides in tRNA: the pseudouridylate ion and the ribothymidylate ion.

In pseudouridylic acid the β-glycosidic linkage is at position 5 of uracil, and not at position 1 as in the usual nucleotide. Ribothymidylic acid is unusual because thymine normally occurs only in DNA and not in RNA molecules. The famous cloverleaf structure of tRNA is depicted in figure 4.37.

[Figure 4.37: cloverleaf diagram of the nucleotide sequence, running from the 5’-terminal pG to the 3’-terminal C–C–A, with the anticodon loop labeled.]

Figure 4.37. A tRNA example: the nucleotide sequence of alanine tRNA from yeast. The rare base codes are: Ψ = pseudouridine, I = inosine (hypoxanthine), T = ribothymidine, hU = 5,6-dihydrouridine, m’I = 1-methylinosine, m’G = 1-methylguanosine, m2G = N2-dimethylguanosine.

All tRNAs possess a guanylate terminus at one end of the polymer chain (5’) and a C–C–A terminal sequence at the other end (3’). The 3’-terminal adenylate nucleotide is the site of specific amino acid attachment, by enzymatic acylation. This amino acid is transferred by enzymatic catalysis to the nascent polypeptide chain. The specificity is assured by the anticodon loop. In the figure this loop contains the three-nucleotide sequence I–G–C (specific for alanine), which forms hydrogen bonds with the corresponding codon within the structure of mRNA (5’-GCC-3’ for alanine, which reads CCG when written 3’→5’ to align with the anticodon). As a consequence, alanine is incorporated into its genetically designated positions during polypeptide synthesis. As can be seen, double-helical portions and loops form the cloverleaf model. The model, based on maximum intramolecular hydrogen bonding, is applicable to all tRNAs. This is not unexpected, because they all have the same general function in protein synthesis (apart from their individual specificities). The base sequence of the anticodon loop is usually: 5’ – pyrimidine – pyrimidine – anticodon – modified purine – variable base – 3’.

The real three-dimensional structure of yeast phenylalanyl tRNA is a compact L-shaped molecule.27 It contains two double-helix segments, each with around 10 base pairs, corresponding to one helical turn. Two loops form the corner of the L. The distance between the 3’ (CCA) end of the L and the anticodon loop (at the other end of the L) is about 70 Å (see figure 4.38). The CCA end and the helical region change their conformation upon amino acid attachment.
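The codon recognition by the anticodon loop described above can be illustrated with a short Python sketch. The pairing tables encode the standard Watson-Crick pairs together with Crick's wobble rules for the first anticodon base (including inosine). These rules are an assumption brought in for illustration and are not stated in the text, and the function names are hypothetical.

```python
from itertools import product

# Watson-Crick partners, used for anticodon positions 2 and 3 (read 5'->3').
STRICT = {"A": {"U"}, "U": {"A"}, "G": {"C"}, "C": {"G"}}

# Wobble partners for anticodon position 1 (Crick's wobble rules; assumed here).
WOBBLE = {
    "A": {"U"},
    "C": {"G"},
    "G": {"C", "U"},
    "U": {"A", "G"},
    "I": {"U", "C", "A"},  # inosine
}


def pairs_with(anticodon: str, codon: str) -> bool:
    """True if a 5'->3' anticodon can read a 5'->3' codon.

    Pairing is antiparallel: anticodon base 3 faces codon base 1,
    base 2 faces base 2, and base 1 (the wobble base) faces codon base 3.
    """
    a1, a2, a3 = anticodon.upper()
    c1, c2, c3 = codon.upper()
    return (
        c1 in STRICT.get(a3, set())
        and c2 in STRICT.get(a2, set())
        and c3 in WOBBLE.get(a1, set())
    )


def readable_codons(anticodon: str) -> list[str]:
    """All codons (5'->3') that the given anticodon can read under these rules."""
    codons = ("".join(p) for p in product("UCAG", repeat=3))
    return [c for c in codons if pairs_with(anticodon, c)]


if __name__ == "__main__":
    print(readable_codons("IGC"))  # ['GCU', 'GCC', 'GCA']
```

Under these assumed rules the anticodon I–G–C reads GCU, GCC and GCA, all of which are alanine codons in the standard genetic code.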

Figure 4.38. The structure of tRNA (phenylalanyl) presented as a ribbon. Labels in the figure: 5’-terminus, 3’-terminus (CCA), double helix, loops, anticodon.

References

1. Armstrong, F. B., Biochemistry, Third Edition, Oxford University Press, Oxford, 1989
2. Chambers, I., Frampton, J., Goldfarb, P., Affara, N., McBain, W. and Harrison, P.R., EMBO J., 5, 1221 – 1227 (1986); Zinoni, F., Birkmann, A., Stadtman, T.C. and Böck, A., Proc. Natl. Acad. Sci. USA, 83, 4650 – 4654 (1986)
3. Hao, B., Gong, W., Ferguson, T.K., James, C.N., Krzycki, J.A. and Chan, M.K., Science, 296, 1462 – 1466 (2002); Srinivasan, G., James, C.N. and Krzycki, J.A., Science, 296, 1459 – 1462 (2002)
4. Van de Waterbeemd, H. and Mannhold, R., “Lipophilicity Descriptors for Structure – Property Correlation Studies: Overview of Experimental and Theoretical Methods and Benchmark of logP Calculations”, in Lipophilicity in Drug Action and Toxicology, Pliška, V., Testa, B. and van de Waterbeemd, Editors, VCH, Weinheim, 1996, pp. 401 – 418 [vol. 4 of Methods and Principles in Medicinal Chemistry, edited by Mannhold, R., Kubinyi, H. and Timmerman, H.]
5. Pauling, L., Corey, R. B. and Hayward, R., Scientific American, 191, 51 – 59 (1954)
6. Sancho, J., Serraneo, L. and Fersht, A.R., Biochemistry, 31, 2253 – 2258 (1992); Fersht, A.R. and Serraneo, L., Curr. Opin. Struct. Biol., 3, 75 – 83 (1993)
7. Chou, P.Y. and Fasman, G. D., J. Mol. Biol., 74, 263 – 281 (1973)
8. Baldwin, R. L. and Rose, G. D., TIBS, 24, 26 – 33 (1999); Baldwin, R. L. and Rose, G. D., TIBS, 24, 77 – 83 (1999)
9. Minor, Jr., D. L. and Klim, P.S., Nature, 380, 730 – 734 (1996)
10. Minor, Jr., D. L. and Klim, P.S., Nature, 371, 264 – 267 (1994)
11. Muñoz, V., Thompson, P.A., Hofrichter, J. and Eaton, W.A., Nature, 390, 196 – 199 (1999)
12. Efimov, A. V., Structure, 2, 999 – 1002 (1994)
13. Thomas, P.J., Qu, B.H. and Redersen, P.L., TIBS, 20, 456 – 459 (1995); Booth, B. R., Sunde, M., Bellotti, V., Robinson, C.V., Hutchinson, W.L., Fraser, P.E., Hawkins, P.N., Dobson, C.M., Radford, S.E. and Pepis, M.B., Nature, 385, 787 – 793 (1997)
14. Dobson, C.M., Evans, P.A. and Radford, S.E., TIBS, 19, 31 – 37 (1994)
15. Woodward, C., TIBS, 18, 359 – 360 (1993)
16. Baldwin, R.L., “Protein folding. Matching speed and stability.”, Nature, 369, 183 – 184 (1994)
17. Backer, D., Nature, 405, 39 – 42 (2000)
18. Schinde, U. and Inouye, M., TIBS, 18, 442 – 446 (1993)
19. Georgeopoulos, C., TIBS, 18, 295 – 299 (1992)
20. Martin, J. and Hartl, F.U., Structure, 1, 161 – 164 (1993)
21. Bergeron, J.J.M., Brenner, M., Thomas, D.Y. and Williams, D.B., TIBS, 19, 124 – 128 (1994)
22. Atkins, J.F. and Gesteland, R., Science, 296, 1409 – 1410 (2002)
23. Watson, J.D. and Crick, F.H.C., Nature, 171, 737 – 738 (1953)
24. Yang, Y., Wescott, T.P., Pedersen, S.C., Tobias, I., and Olson, W.K., TIBS, 20, 319 – 319 (1995)
25. Doudna, J.A., Nature, 388, 830 – 831 (1997)
26. Draper, D.E., TIBS, 21, 145 – 149 (1996)
27. Shi, H., Moore, P.B., RNA, 6, 1091 – 1105 (2000)
