II. - Nucleic acids, DNA, and RNA Structure and Representation
Gérald Monard
Structure et réactivité des biopolymères (Biopolymer structure and reactivity)
Master Chimie 2ème année (Master's in Chemistry, second year)


Contents
1 Nucleic acids
1.1 Nucleotides, nucleosides, and Bases
1.2 Chemical structures of DNA and RNA
1.3 Expression and Transmission of Genetic Information
1.4 RNA Synthesis: Transcription
1.5 Protein synthesis: Translation
1.6 The Watson-Crick Structure: B-DNA
1.7 Other nucleic acid helices

1 Nucleic acids

1.1 Nucleotides, nucleosides, and Bases

Nucleotides and their derivatives are biological substances that participate in nearly all biochemical processes:
• They form the monomeric units of nucleic acids and thereby play central roles in both the storage and the expression of genetic information.
• Nucleoside triphosphates, in particular ATP, are the “energy-rich” end products of the majority of energy-releasing pathways and the substances whose utilization drives most energy-requiring processes.
• Most metabolic pathways are regulated, at least in part, by the levels of nucleotides such as ATP and ADP.
• Nucleotide derivatives, such as nicotinamide adenine dinucleotide (NAD), its phosphate (NADP), flavin adenine dinucleotide (FAD), and coenzyme A, are required participants in many enzymatic reactions.


Figure 1: Chemical structures of ADP and ATP.

Figure 2: The structures and reaction of nicotinamide adenine dinucleotide (NAD+) and nicotinamide adenine dinucleotide phosphate (NADP+). Their reduced forms are NADH and NADPH, respectively.


Figure 3: The molecular formula and reactions of the coenzyme flavin adenine dinucleotide (FAD). FAD may be half-reduced to the stable radical FADH• or fully reduced to FADH2.


Figure 4: Chemical structure of acetyl-CoA.

Figure 5: Map of the major metabolic pathways in a typical cell.


Nucleotides are phosphate esters of a five-carbon sugar (a pentose) in which a nitrogenous base is covalently linked to C1’ of the sugar residue.

Figure 6: Chemical structures of ribonucleotides and deoxyribonucleotides.

In ribonucleotides, the monomeric units of RNA, the pentose is D-ribose, whereas in deoxyribonucleotides (or just deoxynucleotides), the monomeric units of DNA, the pentose is 2’-deoxy-D-ribose (note that the “primed” numbers refer to the atoms of the ribose residue; “unprimed” numbers refer to the nitrogenous base). The phosphate group may be bonded to C5’ of the pentose to form a 5’-nucleotide or to its C3’ to form a 3’-nucleotide. If the phosphate group is absent, the compound is known as a nucleoside. Note that nucleotide phosphate groups are doubly ionized at physiological pH; that is, nucleotides are moderately strong acids.


The nitrogenous bases are planar, aromatic, heterocyclic molecules which, for the most part, are derivatives of either purine or pyrimidine.

Figure 7: Chemical structures of purines and pyrimidines.


Figure 8: Names and Abbreviations of Nucleic Acid Bases, Nucleosides, and Nucleotides.

The major purine components of nucleic acids are adenine and guanine residues; the major pyrimidine residues are those of cytosine, uracil (which occurs mainly in RNA), and thymine (5-methyluracil, which occurs mainly in DNA). The purines form glycosidic bonds to ribose via their N9 atoms, whereas pyrimidines do so through their N1 atoms.


1.2 Chemical structures of DNA and RNA

Nucleic acids are, with few exceptions, linear polymers of nucleotides whose phosphate groups bridge the 3’ and 5’ positions of successive sugar residues.

Figure 9: Chemical structure of a nucleic acid.


The phosphates of these polynucleotides, the phosphodiester groups, are acidic, so that, at physiological pH, nucleic acids are polyanions. Polynucleotides have directionality; that is, each has a 3’ end (the end whose C3’ atom is not linked to a neighboring nucleotide) and a 5’ end (the end whose C5’ atom is not linked to a neighboring nucleotide).

DNA (deoxyribonucleic acid) consists of two strands of linked nucleotides, each of which is composed of a deoxyribose sugar residue, a phosphoryl group, and one of four bases: adenine (A), thymine (T), guanine (G), or cytosine (C). Each DNA base is hydrogen bonded to a base on the opposite strand to form an entity known as a base pair. However, A can only hydrogen bond with T, and G with C, so that the two strands are complementary; that is, the sequence of one strand implies the sequence of the other.
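The statement that one strand implies the other can be illustrated with a minimal sketch. It assumes sequences are written as plain strings of A, T, G, and C in the 5’ to 3’ direction and that the two strands are antiparallel (as described in section 1.6); the names `DNA_PAIRS` and `complementary_strand` are hypothetical and chosen only for this illustration.

```python
# Base-pairing rules stated in the text: A pairs with T, G pairs with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand_5_to_3: str) -> str:
    """Return the complementary DNA strand, also written 5' -> 3'."""
    # Pair every base with its partner on the opposite strand.
    paired = [DNA_PAIRS[base] for base in strand_5_to_3.upper()]
    # The partner strand runs in the opposite direction (assumption:
    # antiparallel strands), so reverse it to keep the 5' -> 3' convention.
    return "".join(reversed(paired))


print(complementary_strand("ATGCCG"))  # CGGCAT
```

Applying the function twice returns the original sequence, which is one way to check that the pairing table is symmetric.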


Figure 10: Double-stranded DNA. The two polynucleotide chains associate through complementary base pairing. A pairs with T, and G pairs with C by forming specific hydrogen bonds.

Chargaff’s rules: DNA has equal numbers of adenine and thymine residues (A=T) and equal numbers of guanine and cytosine residues (G=C). These rules do not apply to RNA, which is generally single stranded.
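Chargaff’s rules follow directly from base pairing, which a short sketch can make concrete. It counts the bases over both strands of a duplex built from one arbitrary example strand; the sequence and the function name `duplex_base_counts` are assumptions made for illustration, not data from the course.

```python
from collections import Counter

# Base-pairing rules stated in the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex_base_counts(strand: str) -> Counter:
    """Count bases over both strands of the duplex implied by one strand."""
    strand = strand.upper()
    # Every base on the given strand has exactly one partner on the other.
    partner = "".join(PAIRS[base] for base in strand)
    return Counter(strand + partner)


example = "ATGCCGTTA"  # arbitrary illustrative sequence
single = Counter(example)
duplex = duplex_base_counts(example)

# A single strand need not obey the rules; the duplex always does.
print(single["A"], single["T"])   # 2 3
print(duplex["A"], duplex["T"])   # 5 5
print(duplex["G"], duplex["C"])   # 4 4
```

The single-strand counts show why the rules are not expected to hold for a single-stranded nucleic acid.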


1.3 Expression and Transmission of Genetic Information

The division of a cell must be accompanied by the replication of its DNA. In this enzymatically mediated process, each DNA strand acts as a template for the formation of its complementary strand. Consequently, every progeny cell contains a complete DNA molecule (or set of DNA molecules), each of which consists of one parental strand and one daughter strand.

Figure 11: Schematic diagram of DNA replication. Each strand of parental DNA (red) acts as a template for the synthesis of a complementary daughter strand (green). Consequently, the resulting double-stranded molecules are identical.


The expression of genetic information is a two-stage process.

Figure 12: The central dogma of molecular biology. Solid arrows indicate the types of genetic information transfers that occur in all cells. Special transfers, indicated by the dashed arrows, are known to occur in RNA viruses and in some plants.

In the first stage, which is termed transcription, a DNA strand serves as a template for the synthesis of a complementary strand of ribonucleic acid (RNA). This nucleic acid, which is generally single stranded, differs chemically from DNA only in that it has ribose sugar residues in place of DNA’s deoxyribose and uracil (U) replacing DNA’s thymine base.

In the second stage of genetic expression, which is known as translation, ribosomes enzymatically link together amino acids to form proteins. The order in which the amino acids are linked together is prescribed by the RNA’s sequence of bases. Consequently, since proteins are self-assembling, the genetic information encoded by DNA serves, through the intermediacy of RNA, to specify protein structure and function.


1.4 RNA Synthesis: Transcription

Figure 13: Gene expression. One strand of DNA directs the synthesis of RNA, a process known as transcription. The base sequence of the transcribed RNA is complementary to that of the DNA strand. The RNAs known as messenger RNAs (mRNAs) are translated when molecules of transfer RNA (tRNA) align with the mRNA via complementary base pairing between segments of three consecutive nucleotides known as codons. Each type of tRNA carries a specific amino acid. These amino acids are covalently joined by the ribosome to form a polypeptide. Thus, the sequence of bases in DNA specifies the sequence of amino acids in a protein.


The enzyme that synthesizes RNA is named RNA polymerase. It catalyzes the DNA-directed coupling of the nucleoside triphosphates (NTPs) adenosine triphosphate (ATP), cytidine triphosphate (CTP), guanosine triphosphate (GTP), and uridine triphosphate (UTP) in a reaction that releases pyrophosphate ion ($\mathrm{P_2O_7^{4-}}$):

$$(\mathrm{RNA})_{n\ \mathrm{residues}} + \mathrm{NTP} \longrightarrow (\mathrm{RNA})_{n+1\ \mathrm{residues}} + \mathrm{P_2O_7^{4-}}$$

Figure 14: Action of RNA polymerases. These enzymes assemble incoming ribonucleoside triphosphates on templates consisting of single-stranded segments of DNA such that the growing strand is elongated in the 5’ to 3’ direction.


RNA synthesis proceeds in a stepwise manner in the 5’→3’ direction, that is, the incoming nucleotide is appended to the free 3’-OH group of the growing RNA chain. The DNA template strand contains control sites consisting of specific base sequences that specify both the site at which RNA polymerase initiates transcription and the rate at which RNA polymerase initiates transcription at this site. For the RNAs that encode proteins, which are named messenger RNAs (mRNAs), these control sites precede the initiation site. The rate at which a cell synthesizes a given protein, or even whether the protein is synthesized at all, is mainly governed by the rate at which the synthesis of the corresponding mRNA is initiated.
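The stepwise 5’→3’ elongation described above can be mimicked with a small sketch. It assumes the template is supplied as a string read in its 3’→5’ direction and ignores control sites, initiation, and termination entirely; `TEMPLATE_TO_RNA` and `transcribe` are hypothetical names used only here.

```python
# RNA base incorporated opposite each DNA template base
# (uracil replaces thymine in RNA).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5' -> 3') complementary to a DNA template read 3' -> 5'."""
    rna = []
    for base in template_3_to_5.upper():
        # Each incoming nucleotide is appended to the 3' end of the chain,
        # so the RNA grows in the 5' -> 3' direction.
        rna.append(TEMPLATE_TO_RNA[base])
    return "".join(rna)


print(transcribe("TACGGCATT"))  # AUGCCGUAA
```

Each pass through the loop corresponds to one application of the polymerase reaction, extending the chain from n to n+1 residues.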

1.5 Protein synthesis: Translation

Polypeptides are synthesized under the direction of the corresponding mRNA by ribosomes. mRNAs are essentially a series of consecutive 3-nucleotide segments known as codons, each of which specifies a particular amino acid. However, codons do not bind amino acids. Rather, on the ribosome, they specifically bind molecules of transfer RNA (tRNA) that are each covalently linked to the corresponding amino acid.


Figure 15: Transfer RNA (tRNA) drawn in its “cloverleaf” form. Its covalently linked amino acid residue forms an aminoacyl-tRNA (top), and its anticodon (bottom), a trinucleotide segment, base pairs with the complementary codon on mRNA during translation.

A tRNA typically consists of ~76 nucleotides and contains a trinucleotide sequence, its anticodon, which is complementary to the codon specifying its appended amino acid. An amino acid is covalently linked to the 3’ end of its corresponding tRNA to form an aminoacyl-tRNA through the action of an enzyme that specifically recognizes both the tRNA and the amino acid.
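Codon and anticodon complementarity can be sketched in a few lines. The example assumes strict Watson-Crick pairing between RNA bases and antiparallel alignment of codon and anticodon, both written 5’→3’; the function name `anticodon_for` is hypothetical.

```python
# Watson-Crick pairing between RNA bases (assumption: no other pairings).
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the anticodon (5' -> 3') complementary to an mRNA codon (5' -> 3')."""
    codon = codon.upper()
    if len(codon) != 3:
        raise ValueError("a codon has exactly three nucleotides")
    # Reverse first: the anticodon pairs with the codon in antiparallel fashion.
    return "".join(RNA_PAIRS[base] for base in reversed(codon))


print(anticodon_for("AUG"))  # CAU
```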


Figure 16: Schematic diagram of translation. The ribosome binds an mRNA and two tRNAs and facilitates their specific association through consecutive codon–anticodon interactions.


During translation, the mRNA is passed through the ribosome such that each codon, in turn, binds its corresponding aminoacyl-tRNA. As this occurs, the ribosome transfers the amino acid residue on the tRNA to the C-terminal end of the growing polypeptide chain. Hence, the polypeptide grows from its N-terminus to its C-terminus.


Figure 17: The ribosomal reaction forming a peptide bond. The amino group of the aminoacyl-tRNA in the ribosomal A site nucleophilically displaces the tRNA of the peptidyl-tRNA ester in the ribosomal P site, thereby forming a new peptide bond and transferring the growing polypeptide to the A site tRNA.


The Genetic Code: The correspondence between the sequence of bases in a codon and the amino acid residue it specifies is known as the genetic code. There are four possible bases (U, C, A, and G) that can occupy each of the three positions in a codon, hence $4^3 = 64$ possible codons. Of these codons, 61 specify amino acids (of which there are only 20) and the remaining three (UAA, UAG, and UGA) are Stop codons that instruct the ribosome to cease polypeptide synthesis and release the resulting polypeptide.
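The codon arithmetic above can be checked by enumerating the codons explicitly. The sketch below builds the standard genetic code from a compact 64-letter string ordered by the bases U, C, A, G; the one-letter amino acid symbols, the `*` marker for Stop, and the names `GENETIC_CODE` and `translate` are conventions assumed for this illustration. The reading frame is taken to start at the first nucleotide, and start-codon selection is not modelled.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order, one-letter amino acid symbols,
# "*" marks a Stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# All 4^3 ordered triplets of bases.
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))

STOP_CODONS = sorted(c for c, aa in GENETIC_CODE.items() if aa == "*")
SENSE_CODONS = [c for c, aa in GENETIC_CODE.items() if aa != "*"]


def translate(mrna: str) -> str:
    """Translate an mRNA (5' -> 3') codon by codon until a Stop codon."""
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        residue = GENETIC_CODE[mrna[i:i + 3].upper()]
        if residue == "*":
            break  # Stop codon: synthesis ceases
        residues.append(residue)  # appended at the C-terminal end
    return "".join(residues)


print(len(CODONS), len(SENSE_CODONS), STOP_CODONS)
# 64 61 ['UAA', 'UAG', 'UGA']
print(translate("AUGCCGUAA"))  # MP
```

Since 61 sense codons map onto only 20 amino acids, most amino acids are specified by more than one codon.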


Figure 18: The standard genetic code.

1.6 The Watson-Crick Structure: B-DNA

The structure of DNA was determined by Watson and Crick in 1953. Fibers of DNA assume the so-called B conformation when the counterion is an alkali metal ion such as Na+ and the relative humidity is > 92%. B-DNA is regarded as the native (biologically functional) form of DNA.


Figure 19: Three-dimensional structure of B-DNA (left) and the Watson-Crick base pairs (right).

The Watson-Crick structure of B-DNA has the following major features:
• It consists of two polynucleotide strands that wind about a common axis with a right-handed twist to form an ~20 Å diameter double helix. The two strands are antiparallel (run in opposite directions) and wrap around each other such that they cannot be separated without unwinding the helix. The bases occupy the core of the helix and the sugar-phosphate chains are coiled about its periphery, thereby minimizing the repulsions between charged phosphate groups.
• The planes of the bases are nearly perpendicular to the helix axis. Each base is hydrogen bonded to a base on the opposite strand to form a planar base pair. It is these hydrogen bonding interactions, a phenomenon known as base pairing, that result in the specific association of the two chains of the double helix.
• The “ideal” B-DNA helix has 10 base pairs (bp) per turn (a helical twist of 36° per bp) and, since the aromatic bases have van der Waals thicknesses of 3.4 Å and are partially stacked on each other (base stacking), the helix has a pitch of 34 Å.

B-DNA has two deep exterior grooves that wind between its sugar-phosphate chains as a consequence of the helix axis passing through the approximate center of each base pair. However, the grooves are of unequal size. The minor groove exposes that edge of a base pair from which its C1’ atoms extend, whereas the major groove exposes the opposite edge of each base pair.

Although B-DNA is, by far, the most prevalent form of DNA in the cell, double helical DNAs and RNAs can assume several distinct structures.
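The helical parameters quoted for ideal B-DNA are linked by simple arithmetic, sketched below. The constants come from the text (10 bp per turn, 3.4 Å per stacked base pair); the function names and the 150 bp example length are hypothetical, and real DNA deviates from these ideal values.

```python
BP_PER_TURN = 10      # base pairs per helical turn (ideal B-DNA)
RISE_PER_BP = 3.4     # axial rise per base pair, in angstroms


def helical_twist_deg(bp_per_turn: float) -> float:
    """Rotation about the helix axis per base pair, in degrees."""
    return 360.0 / bp_per_turn


def pitch_angstrom(bp_per_turn: float, rise_per_bp: float) -> float:
    """Axial length of one complete helical turn, in angstroms."""
    return bp_per_turn * rise_per_bp


def helix_length_angstrom(n_bp: int, rise_per_bp: float) -> float:
    """Axial length of an ideal helix of n_bp base pairs, in angstroms."""
    return n_bp * rise_per_bp


print(f"twist = {helical_twist_deg(BP_PER_TURN):.1f} degrees per bp")
print(f"pitch = {pitch_angstrom(BP_PER_TURN, RISE_PER_BP):.1f} angstroms")
print(f"150 bp = {helix_length_angstrom(150, RISE_PER_BP):.1f} angstroms")
```

The same functions can be reused with other helical parameters, such as those tabulated for A- and Z-DNA, once those values are known.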


1.7 Other nucleic acid helices

Double helical DNA has three major helical forms: B-DNA, A-DNA, and Z-DNA.

Figure 20: Structures of A-, B-, and Z-DNAs.


Figure 21: Structural features of ideal A-, B-, and Z-DNAs.
