Nucleic Acid Biodiversity: Rewriting the Information in DNA and RNA

Tamara L. Horton* and Laura F. Landweber†
Departments of Ecology and Evolutionary Biology† & Molecular Biology*, Princeton University, Princeton NJ 08544 USA

From Encyclopedia of Biodiversity. S. Levin, ed. Academic Press (2000) 4: 415-426.

I. Gene Scrambling
   A. Ciliates and Nuclear Dualism
   B. DNA processing during Macronuclear Formation
   C. Scrambled Genes and the Unscrambling Process
II. RNA Editing
   A. Kinetoplastids: Cryptogenes to proteins by U insertion/deletion
   B. Myxomycetes: Editing by C, U, dinucleotide insertion and C → U conversion
   C. Plant organelles: C → U and U → C editing restores conserved amino acids
   D. Mitochondrial tRNA editing: Acceptor stems and anticodons
   E. Human Apo B and NF1: Shorter peptides by editing
   F. A → I deamination: One gene, many peptides
   G. Paramyxoviruses and Ebola Virus: Polymerase stutter unites reading frames

GLOSSARY
DNA: linear polymer of nucleotides encoding information for a cell.
eukaryote: an organism whose cell or cells have a membrane-bound nucleus.
mitochondrion: a eukaryotic cellular organelle which is used for cellular respiration and energy production.
nucleus: compartment of a cell where DNA is stored on chromosomes.
protein: a three-dimensional macromolecule constructed of amino acids which is formed based on an RNA sequence.
protist: a member of a diverse collection of eukaryotes, defined only by their exclusion from the groups plants, animals, and fungi.
RNA: a linear polymer of nucleotides which is transcribed from DNA.

DNA is often described as a “blueprint for life,” implying that knowledge of the primary sequence of nucleic acids in a genome can give biologists a complete picture of the organism built from these plans. However, neither life nor DNA is that simple. In reality, the sequence of nucleotides in a genome is often just a starting point for the construction of DNA genes and RNA transcripts, which undergo many alterations before organisms actually use the information to fashion their building materials. Two fascinating modes of nucleic acid sequence modification are gene scrambling and RNA editing. Gene scrambling is the rearrangement of DNA segments between a transcriptionally active copy and an archived germline copy. Some genes are broken into more than fifty unordered fragments along a germline chromosome, and then unshuffled during formation of an active somatic chromosome. While gene scrambling assembles existing DNA information into a new order, RNA editing of transcripts creates completely novel sequences which are not even found in the genome. Processing of edited RNAs alters the transcript length and nucleotide identity, and the transformations can render the expressed RNA form unrecognizably different from the DNA molecule of its origin. Although gene scrambling and RNA editing complicate our analysis of genomes, understanding the methods by which organisms achieve these genetic revisions gives us an appreciation for the diversity of ways by which nucleic acids can store and recombine information.

I. Gene Scrambling

Gene scrambling occurs when coding segments of DNA are mixed in a randomly or non-randomly shuffled order along a chromosome. Before the stored information can be expressed by the organism, the pieces of the gene must be cut apart and reassembled in the proper sequential arrangement. These stunning acrobatics are performed in the genomes of spirotrichous ciliates, protists which have many unique and mysterious features.
However, the fundamental lessons learned from spirotrichs about the flexibility of nucleic acid storage mechanisms are universally important.

I-A. Ciliates and Nuclear Dualism

Ciliates are unicellular protists closely related to the “eukaryotic crown taxa,” meaning that on most phylogenetic trees, they diverge as one lineage near the neighboring cluster of plants, animals, and fungi. The ciliates themselves are a diverse monophyletic group, with certain ciliates estimated to be as evolutionarily distant from one another as corn from rats. All ciliates share two features: a coating of cilia on their cell surfaces and two types of nuclei within single cells. The two types of nuclei in each ciliate cytoplasm differ in size; they are called the micronucleus and the macronucleus. The tiny germline micronucleus is transcriptionally inert, and functions solely in sexual exchange. In contrast, the large somatic macronucleus is responsible for gene expression, but its contents are only transmitted to clonal offspring.

Ciliates reproduce asexually by fission, but are capable of exchanging genetic information in a sexual manner independent of reproduction. Conjugation between ciliates leads to an exchange of haploid micronuclei, which fuse to form a zygotic nucleus (see Figure 1). The bi-parentally-created zygotic nuclei in each mating partner form new micronuclei and macronuclei as the old macronuclei are destroyed.

I-B. DNA processing during Macronuclear Formation

The micronuclear and macronuclear genomes do not have the same chromosomal structure. As the new macronucleus is formed, diploid micronuclear chromosomes are polytenized and reproducibly broken at certain sites, portions of DNA are deleted, chromosomes are rejoined, and new chromosome ends are healed by telomerase action. DNA in the new, shortened chromosomes is differentially amplified, so that copy numbers of individual chromosomes in Tetrahymena thermophila range from 45 to 9000.
Spirotrichous ciliates tend to have higher copy numbers of chromosomes than the oligohymenophoreans like Tetrahymena, and the spirotrich Oxytricha nova has 100,000 copies of a particular chromosome in the macronucleus! Disparities in DNA copy number relate to levels of gene expression, as the most highly amplified DNAs encode the highly expressed ribosomal RNAs, and the quantities of mRNA synthesis and cognate protein secretion from Euplotes raikovi pheromone genes directly correlate with the genes’ individual macronuclear copy numbers.

Coding sequences in micronuclear genes are called MDSs, for macronuclear destined sequences (or segments), whereas IESs are internal eliminated (or excised) sequences. MDSs and IESs are superficially analogous to exons and introns, respectively, although the latter two terms only refer to a particular type of RNA processing and do not apply to this unrelated restructuring of DNA. The amount of DNA eliminated during macronuclear formation varies extensively across diverse ciliates: the oligohymenophorean Tetrahymena thermophila eliminates approximately 15% of its micronuclear genome, while spirotrichous ciliates like Stylonychia and Oxytricha eliminate up to 98% of their micronuclear genomes to create macronuclear chromosomes that bear single genes and very little non-coding sequence.
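
The bookkeeping behind MDS retention and IES elimination can be sketched in a few lines of Python. This is only an illustration of the outcome, not of the cellular mechanism: the `Segment` class, the function names, and the toy sequences are all hypothetical, and the sort step simply stands in for the reordering that scrambled genes (introduced at the start of Section I) require.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    kind: str       # "MDS" (retained) or "IES" (eliminated)
    seq: str        # toy sequence, invented for illustration
    index: int = 0  # position of an MDS in the macronuclear gene; unused for IESs

def assemble_macronuclear(segments):
    """Drop every IES, put the MDSs in macronuclear order, and join them."""
    mds = [s for s in segments if s.kind == "MDS"]
    ordered = sorted(mds, key=lambda s: s.index)
    return "".join(s.seq for s in ordered)

def fraction_eliminated(segments):
    """Fraction of the micronuclear locus (by length) that lies in IESs."""
    total = sum(len(s.seq) for s in segments)
    ies = sum(len(s.seq) for s in segments if s.kind == "IES")
    return ies / total

# Hypothetical micronuclear locus: three MDSs stored in the order 1-3-2,
# separated by two IESs (lower case).
micronuclear = [
    Segment("MDS", "ATG", 1),
    Segment("IES", "tttt"),
    Segment("MDS", "GGA", 3),
    Segment("IES", "aa"),
    Segment("MDS", "CCC", 2),
]

macronuclear = assemble_macronuclear(micronuclear)
fraction = fraction_eliminated(micronuclear)
print(macronuclear)  # ATGCCCGGA
print(fraction)      # 0.4
```

In this toy locus 6 of 15 nucleotides are eliminated; the same calculation applied to a real spirotrich genome would approach the 98% figure quoted above.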

I-C. Scrambled Genes and the Unscrambling Process

The drastic genome rearrangements of spirotrichous ciliates are not confined to the extreme quantity of DNA deletions. The protein-coding MDSs in Oxytricha and Stylonychia species are sometimes disordered relative to their final position in the macronuclear copy. For example, Prescott and his colleagues found that in O. nova, the micronuclear copy of three genes (Actin I, α telomere binding protein, and DNA polymerase α) must be reordered while intervening DNA sequences are removed in order to construct functional macronuclear genes (Greslin et al. 1989; Hoffman & Prescott 1996; Mitcham et al. 1992). In Oxytricha nova’s micronuclear genome, the MDSs destined to construct the gene for α telomere binding protein (α TP) are arranged in the cryptic order 1-3-5-7-9-11-2-4-6-8-10-12-13-14 relative to their conventional order in the macronucleus of 1-2-3-4-5-6-7-8-9-10-11-12-13-14. Most impressively, the gene encoding DNA polymerase α (DNA pol α) in O. trifallax and S. lemnae is apparently scrambled in 50 or more pieces in their germline nuclei (see Figure 2).

Homologous recombination at MDS boundaries probably helps guide the unscrambling process. A segment of DNA sequence at the junction between a particular MDS and its downstream IES usually matches the sequence at the junction of the next MDS and its upstream IES, leading to the correct ligation of the two MDSs over a distance. However, the presence of shared repeat regions as short as an average of 4 base pairs for non-scrambled MDSs and 9 base pairs for scrambled MDSs suggests that although these recombination guides may be necessary, they are certainly not sufficient to guide accurate assembly of the genes. Hence it is more likely that the repeats satisfy a structural requirement for MDS joining, rather than perform any role in substrate recognition.
Otherwise, incorrectly spliced products of promiscuous recombination would dominate genomes, since two to four base pair repeats occur many thousands of times throughout the micronucleus. This incorrect hybridization could be a driving force in the production of newly scrambled patterns in evolution. Nonetheless, if this sort of ambiguous unscrambling actually occurs during macronuclear development, then only unscrambled molecules which contain both 5' and 3' telomere addition sequences are selectively retained in the macronucleus, ensuring that most haphazardly-ordered genes would be lost.

The pattern of MDSs in the micronuclear genome provides a strong clue to both the ciliates’ orchestration of the unscrambling process and the mutational events which led to the scrambling in each gene sequence. For example, in the previously mentioned gene encoding Oxytricha nova’s α telomere binding protein, the micronuclear order of MDSs (1-3-5-7-9-11-2-4-6-8-10-12-13-14) predicts a spiral mechanism in the unscrambling path to link odd and even segments in order (see Figure 3). In contrast, DNA polymerase α has at least 44 MDSs in O. nova and 51 in O. trifallax, scrambled in a nonrandom order with an inversion in the middle; some MDSs are located at least several kb away from the main gene in an unmapped fragment. A hairpin structure could resolve MDSs during unscrambling (see Figure 4). Comparison of the O. nova sequence with that of O. trifallax allows precise predictions for the origin of new scrambled segments (see Figures 4 and 5). Micronuclear junction sequences promote pairing between each MDS and the non-coding IES on the opposite side of a hairpin (see Figure 4). Any mutation which leads to an increased stabilization of this hairpin during macronuclear chromosome formation would also increase the chance of
homologous recombination between correct segments within the micronuclear genome. Consequently, selection would favor the appearance of more scrambled MDS segments in such a non-randomly scrambled gene, since each additional MDS adds more paired junction sequences to stabilize the hairpin necessary for unscrambling the already existing MDSs. This explanation is consistent with the additional MDSs in the DNA pol α gene in O. trifallax. The arrangement of MDSs 2, 6, and 10 in O. nova could have given rise to the arrangement of eight new MDSs in O. trifallax (see Figure 4) by multiple crossovers in the germline micronucleus. Thus, the appearance of an inversion leads to the introduction of new MDSs in a nonrandomly scrambled pattern.

Gene scrambling in ciliates may have evolved as a product of an increased capacity for homologous recombination. Why it has been detected in only a restricted group of ciliates might reflect their increased levels of recombination, which both generate the scrambled arrangements and drive the subsequent process of unscrambling them. Although it is difficult to propose an adaptive argument for the presence and maintenance of such a complicated gene-decoding procedure, the forces that have led to ciliate nuclear dualism might be at the root of its origin. For example, Yao and Doerder have proposed that the different roles required of the micronucleus and macronucleus, as well as intragenomic competition, may have led to disruptive selection acting on their genetic organization and chromatin structure. The accommodation of both types of nuclei within a single cytoplasm may promote or at least permit profound differences to arise that distinguish active genes from transmitted genes. Likewise, the gene scrambling present in certain spirotrichous ciliates may be a profound exaggeration of this solution to problems presented by ciliate life.

II. RNA Editing

"RNA editing" is the alteration of RNA sequences by base modifications, substitutions, insertions, and deletions. The process of rewriting RNA transcripts by editing produces major effects, even adding over half of the nucleotides in some mitochondrial transcripts of kinetoplastid protozoa. Yet the impact of editing can be large even in cases where the physical extent of editing is small. For instance, in human apolipoprotein B transcripts, replacement of a single cytidine (C) by uridine (U) results in the conversion of a glutamine codon to a stop codon; the early termination of apo B translation shortens the resultant polypeptide by one half and removes a functional domain. Since the initial discovery of RNA editing as extensive U insertions and deletions in trypanosome mitochondrial mRNAs, many additional and apparently unrelated examples of editing have been found in organisms ranging from Ebola virus to humans. Substitution/modification editing exists in certain nuclear and organellar RNAs among a diverse set of eukaryotes; however, insertion/deletion editing has only been found in mitochondrial RNAs of two protist groups (see Table 1).

Upon first inspection, the use of RNA editing in gene expression seems inefficient. Why not encode genes in their final edited form, rather than require an additional revision step? However, organisms exploit the ability to edit RNA in amazingly clever ways. RNA editing allows the persistence of “deleterious” mutations in DNA genomes: new point mutations and frameshifts may persist if they are repaired at the RNA level, and DNA copies of genes and control sequences in crowded genomes may overlap yet generate two or more discrete RNA sequences through editing. RNA
editing also offers an array of post-transcriptional modes of genetic regulation, through the formation of start and stop codons, intron splice sites, and open reading frames. Editing even permits single genes to produce multiple peptides, allowing combinatorial protein diversity. The following sections review several models which explain our current understanding of the molecular and evolutionary basis of numerous forms of RNA editing.

II-A. Kinetoplastids: Cryptogenes to proteins by U insertion/deletion

Kinetoplastids are a group of unicellular protists which include the pernicious trypanosome and leishmania parasites responsible for deadly human diseases such as African sleeping sickness and Chagas disease. Translation of mitochondrial mRNAs of kinetoplastids is impossible without massive RNA editing of the transcripts by insertion and deletion of uridines (U’s). This editing drastically rewrites the coding regions of the transcripts by introducing and fixing frameshifts as well as creating stop and start codons. In some kinetoplastid genes, editing creates over 90% of the amino acid codons. Since the DNA copies of many genes are barely recognizable as sources of their cognate mRNAs, they have been named “cryptogenes.”

Though all kinetoplastids display specific, reproducible editing patterns in certain mitochondrial mRNAs, comparison of the editing patterns in a particular gene transcript across a variety of species reveals a trend in the extent of RNA editing within each species. The earlier-diverging kinetoplastids exhibit more editing within the cytochrome c oxidase III transcript than their later-diverging relatives. This implies that this type of RNA editing is ancient within this lineage, and that its use has decreased over evolutionary time. However, the lack of U insertion/deletion editing in any other type of organism suggests that it arose specifically within the kinetoplastid lineage.
Cavalier-Smith has proposed that glycosomes, special organelles found only in kinetoplastids, may provide a clue to the origin of editing in these protists. Glycosomes permit an efficient anaerobic lifestyle, during which RNA editing may have become fixed by drift in the seldom-used mitochondria of the early kinetoplastids.

Guide RNAs (gRNAs) provide the specificity of the kinetoplastid editing mechanism. gRNAs are tiny RNA transcripts which guide the editing machinery through base-pairing with the mRNAs in edited regions. The gRNAs are complementary to small segments of the fully edited mRNAs, and thus serve as mini-templates to guide the addition and deletion of uridines from the pre-edited mRNA as they form a locally double-stranded RNA helix. For instance, an unpaired A in a gRNA signals for an editing event to insert a U into the mRNA. Complete editing of a gene requires a set of many overlapping gRNAs. The affiliated set of overlapping gRNAs sequentially edits the mRNA from its 3' end to its 5' end. During editing, a complex of proteins called the "editosome" catalyzes the insertion and deletion of uridines until the length of each paired gRNA/mRNA region is maximized. The extent of G·U wobble pairs and conventional A-U and G-C pairs of each gRNA/mRNA pair controls the cascade of editing. Each subsequent gRNA binds the mRNA more stably than the last by possessing more Watson-Crick base pairs (A-U and G-C) in the so-called “anchor region” on one end of the gRNA. These Watson-Crick base pairs displace the less stable G·U base pairs on the
opposite end of the previous gRNA to dislodge the upstream gRNA and lead to an overall 3’ to 5’ directionality of editing. (See Figure 6.) Guide RNAs are encoded in the kinetoplastid mitochondrial genome. Kinetoplastids are named after the unusual structure of their mitochondrial genomes, which consist of a "kinetoplast," a network of DNA inside a single, large mitochondrion located at the base of the flagellum. In one kinetoplastid subdivision, the trypanosomatids, the kinetoplast consists of a few identical maxicircles of 20-40 kb intertwined with thousands of heterogeneous minicircles of 0.5-3 kb. The maxicircles encode the mitochondrial genes and cryptogenes for respiratory proteins, ribosomal proteins, and rRNA. The only known role of the minicircles is to encode gRNAs. Bodonids, the other group of kinetoplastids, share the presence of a DNA network with the trypanosomatids, but have a different structure for their minicircle homologues. In Bodo caudatus, the “minicircles” are non-catenated 1.4 kb circles. Trypanoplasma borreli has minicircle-like structures of 170 and 200 kb, with tandem one kb repeats encoding most guide RNAs. How did the gRNAs and minicircles come to exist as they do today? A product of mitochondrial DNA recombination provides a plausible explanation. Lunt and Hyman (1997) have identified a “minicircle” of DNA excised from the major mitochondrial genome circle in a nematode. Similar intramolecular recombination within a kinetoplastid maxicircle might have led to the formation of minicircles encoding tiny portions of mRNAs, while other unrecombined maxicircles still contained complete functional copies of the mRNA. If the minicircle copy of a gene fragment acquired a promoter that allowed transcription of an antisense RNA, we propose that such a gene encoding complementary RNA could have given rise to a proto-gRNA. 
If the protein equipment responsible for catalyzing RNA insertions and deletions simultaneously arose or was recruited from another function to serve in RNA editing, mutations in the mitochondrial mRNA could have been repaired by editing. gRNA-containing minicircles would be selectively retained in the genome, ensuring their survival.

II-B. Myxomycetes: The slimes, they are a-changin’!

Physarum polycephalum is a myxomycete, or plasmodial slime mold. It takes on many shapes and sizes throughout its life, morphing from microscopic amoeba to a multinucleate syncytium which can be as large as several feet across, then forming millimeter-scale delicate, mushroom-like fruiting bodies. Physarum’s mitochondrial transcripts for almost all messenger and structural RNAs require several types of RNA editing to create functional products. Editing positions are sprinkled regularly throughout the entire length of each edited transcript, spaced an average of 27 bases apart, but never closer than nine nucleotides. The majority of editing events are insertions of cytosine, but there are also a small number of specific and reproducible insertions of uridines and dinucleotides into sequences. Though there appears to be no consensus sequence that defines the hundreds of insertion sites, they usually follow a purine/pyrimidine dinucleotide and show significant preference for third codon position. Visomirski-Robic and Gott (1997) have found that under conditions of stalled RNA polymerization, sites only 14-22 nucleotides from the 3' end of the RNA have correct insertion editing, suggesting that the editing reactions closely follow transcription and may even be coupled to it. The tight association of transcription and editing yields a high editing
efficiency; un-edited or partially edited transcripts are rarely detected. Physarum insertional editing must occur within a narrow window of possibility: if editing is restricted in vitro by low concentrations of CTP, and CTP is subsequently restored, then sites that were "missed" by the editing process never get edited. Myxomycete insertional editing seems distinct from the uridine insertion/deletion editing found in the kinetoplastids. This is supported by the dissimilar pattern of edited sites, the different identity of nucleotides involved, and Physarum’s apparent lack of gRNA-like template molecules. Though these differences alone set myxomycete editing apart, another remarkable feature of myxomycete editing makes it unique among all other editing systems. Myxomycetes are the only organisms known to combine multiple types of editing within the same transcript. For example, Physarum polycephalum’s 1.5 kb transcript encoding cytochrome c oxidase subunit I (coI) undergoes 59 C insertions, a single U insertion, three dinucleotide insertions, and astoundingly, four sites of C → U base conversion! (See Figure 7.) C-insertion has been separated from C → U conversion in isolated mitochondria, implying separate mechanisms or components. We have found that insertional and base conversion editing have distinct evolutionary histories. The coI gene of Stemonitis flavogenita, another myxomycete, shares all three types of insertional editing with Physarum polycephalum, but lacks the C → U conversion editing in this transcript. Even the three types of insertional editing did not all evolve simultaneously, as U insertional editing is found in all myxomycetes to date, while C insertion and dinucleotide insertion are found only in some slime molds. The sites of C → U editing in Physarum are all in first or second codon position, similar to the case in plant organelle editing (see below). 
Also similar is that the process of editing restores the conserved peptide sequence in this region, so that the unedited Stemonitis transcript codes for the same amino acids at these positions as the edited Physarum transcript. By contrast, the role of the inserted nucleotides, usually in third codon positions, may be primarily to restore the correct reading frame, since they rarely change crucial coding information.

II-C. Plant organelles: C → U and U → C editing restores conserved amino acids

Plant mitochondria employ rampant RNA editing. In nearly every mitochondrial mRNA, RNA editing converts many cytidines to uridines, and some mitochondrial mRNAs have uridine to cytidine conversions as well. Several plant chloroplast RNAs also show similar editing patterns. Though the mechanism for plant organellar editing is not fully understood, when CTP residues of in vitro transcripts were labeled on their α-phosphate and on their cytosine base, the labels were retained after editing. This means that the C → U changes occur through a deamination of cytidine, rather than a base or nucleotide substitution. Plants with different nuclear genotypes exhibit different degrees of editing of transcripts from identical mitochondrial genomes, indicating that at least part of the editing machinery is nuclear-encoded. Though no consensus sequence for editing site recognition has been determined, recent analysis (Bock et al. 1997; Williams et al. 1998) of natural mutants and creation of mutant sequences indicate that the specificity for editing sites in both organelles lies in local upstream primary sequence, rather than in the predicted folded conformation of an edited region of RNA.

RNA editing is present in both the mitochondrial and chloroplast genomes of all land plants, but absent in green algae and the liverwort Marchantia polymorpha. This
distribution suggests that the plant editing systems share common components, and may have arisen simultaneously in both organelles or have been transferred from one to the other. However, though the editing seems to appear suddenly in the land plant lineage, the degree of mitochondrial or chloroplast editing does not correlate with phylogenetic position, indicating multiple evolutionary losses and gains of editing at particular gene sites.

Plants which possess the ability to perform base conversion may exploit the conversions as a mechanism for repairing sequences disturbed by mutational drift. The observation that most of the edited sites in plant genes are in first or second codon positions supports this hypothesis, as editing produces non-synonymous changes in amino acids. Furthermore, Malek and colleagues (Malek et al. 1996) found that the level of mitochondrial editing in plants correlates with G/C content. Indeed, Lu and associates (1998) determined that editing "reverses" all non-synonymous U → C substitutions at the RNA level in coI genes from eight gymnosperm species, eliminating almost all variation in the predicted protein sequences. Editing can regulate translation or enzyme activity; it creates stop and start codons and proteins of varying function. However, the pressure toward loss of editing appears to be greater than any selective advantage it confers, since Shield and Wolfe’s (1997) comparison of edited sites over many plant species reveals a high rate of mutation of the genes toward the "edited" (uridine) nucleotide.

II-D. Mitochondrial tRNA editing: Acceptor stems and anticodons

Mitochondrial genomes of some organisms seem pressed to economize space. "Junk" DNA is held to a minimum, and genes immediately abut their neighbors. Some tRNA genes overlap their 5' and 3' extremities. Base-pairing of a tRNA’s 5’ and 3’ ends is essential to form an acceptor stem for aminoacylation.
In six described cases of tRNA editing, mitochondria repair incompletely paired tRNA acceptor stems by exchanging mismatched bases on one side of the acceptor stem for ones that complement those bases present on the opposing stem. Thus editing and the presence of an internal acceptor stem template allow the genome to repair mutations that may occur in response to selection for either nucleotide compositional bias or sequence compression. In land snails, squids, and chickens, the overlapping ends of certain tRNAs disturb the crucial base-pairing of the acceptor stems. RNA editing activity restores complementarity to these tRNAs by replacing mismatched 3’ guanosine (G), uridine (U), and cytidine (C) nucleotides with adenosines (A). In a platypus tRNA, there are three exchanged nucleotides (U → A, U → C, and A → C) in the 3’ half of the stem to complement the 5’ half of the acceptor stem.

There is also editing of mitochondrial tRNA acceptor stems in the amoeboid protist Acanthamoeba and the fungus Spizellomyces, using the second half of the stem as guides. However, the editing mechanism in these two organisms may be unrelated to the tRNA editing described in animals, since it occurs on the 5’ half of the stem. Editing of the Acanthamoeba and Spizellomyces tRNAs exchanges uridines, adenosines, and cytidines in the first three 5’ nucleotides for purines complementary to a corresponding pyrimidine on the 3’ template side of the stem.

In opossums and other marsupials, the DNA encoding mitochondrial tRNAAsp has the wrong anticodon. Though the rest of the tRNA has a canonical tRNAAsp sequence, the anticodon, GCC, will pair with glycine codons, rather than aspartic acid codons.
About half of the tRNAAsp transcripts are not edited and consequently are aminoacylated with glycine. The remainder are edited by a C → U change that restores the GUC aspartic acid anticodon. These are charged correctly with aspartic acid, since this anticodon is used as a determinant of tRNA charging. Surprisingly, the standard tRNAGly (with a UCC anticodon) is present in the genome as well, and could theoretically recognize all four GGN glycine codons by wobble. However, a mutation just outside of the anticodon region of this tRNAGly renders it incapable of recognizing the two glycine codons which the mutated tRNAAsp can bind.

Boerner and Pääbo (1996) suggest that marsupial mitochondrial editing became fixed through two mutational steps. First, a mutation occurred in the anticodon for tRNAAsp, transforming it into a functional tRNAGly. Genomes with this mutation were still viable, since the altered tRNA could regain its necessary function as tRNAAsp through editing. A subsequent mutation in the original tRNAGly was not lost because of redundancy in tRNAGly activity: a mutation outside the anticodon of the original tRNAGly eventually left it unable to recognize the two glycine codons that the newly mutated tRNAAsp/Gly could translate. After this second mutation in the original tRNAGly, backmutation of the tRNAGly/Asp anticodon (to form simply a tRNAAsp anticodon) would be deleterious, since the mutant tRNAGly/Asp serves double-duty by pairing with both aspartic acid and glycine codons. Thus, RNA editing permitted a deleterious change in a genome, and then became fixed when a second mutation made editing a requirement for expression of mitochondrial proteins.

II-E. Human Apo B and NF1: Shorter peptides by editing

Apolipoprotein B is present in two different forms in both humans and rodents.
The long form, apo B100, is part of very low density lipoprotein (VLDL) particles which have a role in cholesterol metabolism, while the short form, apo B48, contributes to chylomicrons that transport dietary lipids. The two forms of the apo B protein are actually encoded by a single gene which undergoes tissue-specific C → U RNA editing at nucleotide position 6666. The editing converts an encoded glutamine codon into a stop codon to create the short form of the peptide.

A complex of proteins edits the apo B transcript. The main catalytic peptide is apo B RNA editing cytidine deaminase subunit 1 (APOBEC-1). APOBEC-1 expression varies throughout development, and apo B editing levels vary correspondingly. Specificity of the cytidine deamination is determined by the primary sequence of the apo B RNA transcript, particularly an eleven base "mooring sequence," located just four nucleotides downstream of the edited site. In addition to this required mooring sequence, three other "efficiency" elements are found in the 140 base region surrounding the edited cytidine. These upstream and downstream elements increase the effectiveness of the editing reaction.

APOBEC-1 is part of a family of cytidine deaminases. The family is split into two groups of larger and smaller deaminases with various structural features in common. APOBEC-1 is categorized as a member of the group of larger deaminases, as is E. coli cytidine deaminase (ECCDA). The two enzymes are of approximately the same size, form homodimers, and share such structural features as the carboxy-terminal core domain, which is absent in the smaller, homotetramer-forming deaminases. ECCDA catalyzes the deamination of single cytidine nucleosides as part of bacterial biosynthetic

10

pathways. Comparison of the APOBEC-1 primary sequence with the ECCDA crystal structure predicts the presence of an additional hollow space within the APOBEC-1 structure. Navaratnam and associates (1998) noted that this cavity is just the correct size and shape to accommodate an RNA transcript, which suggests how this type of editing might occur. Cytidine deamination may even play a role in human tumorogenesis. For example, an imperfect APOBEC-1 mooring sequence and efficiency elements are present within the coding region of the neurofibromatosis type I (NF1) tumor suppressor gene. Despite slight differences in position and sequence context, normal individuals exhibit very low levels of editing. In comparison, tumor tissues from neurofibromatosis type I affected individuals showed more than eight times the normal quantity of edited transcript. The NF1 gene encodes the neurofibromin protein, which is a putative homologue of yeast proteins in the ras signal transduction pathways. The proposed GTPase activating domain of the neurofibromin lies just downstream of the NF1 editing site, and indeed C → U RNA editing at the site transforms an arginine codon into a premature translation stop upstream of the domain. Thus the editing of the NF1 transcript most likely cripples the tumor suppressing activity of neurofibromin. II-F. A → I Deamination: One gene, many peptides Deamination of another type alters additional RNA sequences of humans and rodents. Removal of adenosine’s amino group yields inosine, a nucleotide that the translation machinery reads as guanosine. Several such A → I transitions cause predicted amino acid changes in glutamate and serotonin receptor subunits in the human and rodent brain. Teleost fish also share one of these A → I editing sites in their glutamate receptors. These editing events adjust the calcium channel permeability and the speed of the desensitization response in glutamate receptors. 
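The codon-level consequences of the two deamination reactions described so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the standard genetic code is assumed, and the example codons (CAA for glutamine, CGA for arginine, CAG for an A → I target) are chosen because they are consistent with the changes described in the text, not because they are taken from it.

```python
# Standard genetic code in the RNA alphabet; '*' marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def deaminate(codon, position):
    """Return the codon after C -> U or A -> I deamination at a 0-based position."""
    products = {"C": "U", "A": "I"}
    base = codon[position]
    if base not in products:
        raise ValueError(f"{base} is not a deamination substrate")
    return codon[:position] + products[base] + codon[position + 1:]


def read_codon(codon):
    """Translate one codon, reading inosine (I) as guanosine (G)."""
    return CODON_TABLE[codon.replace("I", "G")]


# C -> U at the first position: glutamine (CAA) or arginine (CGA) becomes a stop.
gln_edited = deaminate("CAA", 0)
arg_edited = deaminate("CGA", 0)

# A -> I at the second position: the codon is read as if it contained G.
a_to_i_edited = deaminate("CAG", 1)

for before, after in [("CAA", gln_edited), ("CGA", arg_edited), ("CAG", a_to_i_edited)]:
    print(before, read_codon(before), "->", after, read_codon(after))
```

The sketch treats editing as a change to a single codon in isolation; it says nothing about how the editing enzymes select their sites, which depends on the sequence and structural context discussed in the text.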
In the serotonin receptors, protein loops encoded by the edited region of the RNA interact with G-proteins to turn on signaling pathways. The editing, along with alternative splicing, allows the exquisite fine-tuning of neural responses by increasing the combinatorial protein diversity available from a single transcript. Post-transcriptional regulation demands less time and energy than transcription of multiple gene sequences.

Enzymes of the ADAR (adenosine deaminase acting on RNA) family are the candidates for A → I editing activity, based on in vitro experiments. At least three different ADAR enzymes are expressed differentially in human tissues. Their specific targets have not been definitively resolved in vivo, though they display discrete editing abilities at various mRNA sites in vitro. The double-stranded RNA regions required for ADAR action are elegantly provided by pairing of exonic sequences with intronic sequences in the pre-mRNAs. Neither mature mRNAs whose introns have been removed by splicing nor pre-mRNAs with mutations disturbing intron/exon complementarity act as substrates for editing in transfected cell lines.

Mice genetically disabled for A → I editing at a single site demonstrate the importance of RNA editing in glutamate receptors. Brusa and colleagues (1995) removed an intron region crucial for creating a double-stranded editing substrate from the mouse GluR-B gene. Heterozygotes for this editing-disabled allele displayed severe epileptic seizures and premature death within 3 weeks of birth. The decrease in edited mRNA levels led to a five-fold increase in the calcium flow into their neurons, and serious damage to their nervous systems.

Vertebrates are not alone in using the A → I editing activity to generate diversity in nervous system components. Recent work (Petschek et al. 1996; Smith et al. 1996) reveals that two Drosophila mRNAs have apparent A → G substitutions between their genomic and RNA copies. The 4f-rnp gene, encoding a putative RNA-binding protein, is expressed in several alternatively spliced variants during fruitfly development. The adult form, expressed heavily in the central nervous system, has probable deamination editing of 263 adenosine sites. Another Drosophila gene, which encodes a subunit of a calcium channel protein, has seven putative A → I editing sites, five of which alter codon identity, in addition to many alternatively spliced forms.

Scott (1995) suggests that ADARs might have evolved for an anti-viral function, destabilizing RNA genomes by modification and subsequent unwinding. Interestingly, a devious viral genome co-opted this function, and uses host adenosine deaminases to its own benefit. Hepatitis Delta Virus (HDV), a single-stranded RNA subvirus of Hepatitis B, has only one known gene product, the delta antigen. The short form of delta antigen stimulates replication of the genome, while the long form of the antigen suppresses replication and is necessary for packaging HDV. The short and long forms of the delta antigen differ only by a single A → I base change of the negative strand, or antigenome. The base conversion replaces a stop codon at the end of the shorter peptide’s mRNA with a tryptophan codon to allow read-through of the longer peptide. In vitro, ADAR1 is capable of inducing this A → I change on the antigenome of HDV, suggesting that either this or a related enzyme edits HDV in vivo. In exquisite control of the reaction, the long form delta antigen acts as a repressor of editing in a negative feedback loop.
The repression prevents accumulation of unduly high levels of the long, edited antigen, which would lower the level of viral replication.

II-G. Paramyxoviruses and Ebola Virus: Polymerase stutter unites reading frames

Two other types of viruses also capitalize upon RNA editing processes as a method of increasing information storage while under pressure to constrain genome size. The RNA genomes of the Paramyxoviruses have several overlapping genes. Some of these genes are activated for expression by ribosomal choice, while others become available by co-transcriptional RNA editing. The polymerase may slip as it travels through a purine run, adding from one to six extra guanosines, depending upon the sequence present in the particular virus species. The stuttering polymerase thereby fuses coding regions together to make extended versions of genes. Many viruses use an apparently similar mechanism to add poly A tails to mRNA transcripts. When the RNA polymerase reaches the U-rich regions, it adds additional non-templated adenines to create the tails.

In Paramyxoviruses, RNA editing activity may be related to a process that corrects genome length, maintaining length in multiples of six nucleotides. The substrate for the RNA polymerase is the RNA genome complexed with the capsid, in hexamer-length segments. Hausmann and colleagues (1996) note that if the genome length is not a multiple of six, the polymerase inserts or deletes guanosines or adenines in the same region in which the RNA editing occurs.
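The effect of polymerase stutter on the reading frame can be illustrated with a toy transcript. The sketch below is hypothetical throughout: the sequence is invented so that a stop codon follows the purine run in the unedited frame, the function names are not from any library, and the standard genetic code is assumed. It shows only the frame arithmetic (one inserted G shifts the downstream frame, three do not) and the multiple-of-six length check mentioned above.

```python
# Standard genetic code in the RNA alphabet; '*' marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(rna):
    """Translate from the first base until a stop codon or the end of the sequence."""
    peptide = []
    for i in range(0, len(rna) - 2, 3):
        residue = CODON_TABLE[rna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


def stutter(rna, run_end, extra_g):
    """Insert extra_g non-templated G residues after 0-based index run_end."""
    return rna[:run_end + 1] + "G" * extra_g + rna[run_end + 1:]


def is_multiple_of_six(length):
    """Length check corresponding to the hexamer packaging described in the text."""
    return length % 6 == 0


# Invented transcript: a purine run (AAAGGG) ends at index 8, followed by a stop.
unedited = "AUGAAAGGGUGACCUCAUAAG"
plus_one = stutter(unedited, run_end=8, extra_g=1)    # frame shifts by +1
plus_three = stutter(unedited, run_end=8, extra_g=3)  # frame is preserved

print(translate(unedited))    # short product, ends at the in-frame stop
print(translate(plus_one))    # extended product read from the shifted frame
print(translate(plus_three))  # one extra glycine, same downstream frame
```

In this toy case the single inserted G lets translation continue past the original stop into the overlapping frame, which is the sense in which the stutter "fuses" coding regions.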


Another RNA virus, the infamous Ebola virus, uses a comparable mechanism to insert adenines into a sequence to unite two reading frames. In this case, the sequence of the site where adenine addition occurs appears similar to viral poly A addition sites. In contrast to the Paramyxovirus editing, the additions occur even when the RNA is produced by a non-native polymerase, and thus must be intrinsic to the template sequence or structure. T7 transcription of the Ebola mRNA in vitro results in edited product, although in a lower quantity than when it is transcribed by Ebola’s RNA polymerase.

Thus, once considered a molecular anomaly unique to protists, RNA editing is now recognized as a vital part of gene expression in a wide range of eukaryotes and their viruses. Organisms can alter RNA sequences through many different mechanisms, by recruiting enzymes such as deaminases, nucleases, ligases, and special polymerases to specific RNA editing tasks. Editing clearly arose multiple times in evolutionary history, in the various forms of both insertional and substitutional editing, all of which profoundly affect the expression of RNAs, the products they form, and the biological processes in which they participate. RNA editing is significant in cancers, cholesterol regulation, and neural function in vertebrates, as well as in the de novo creation of coding sequences from obscured mitochondrial genes.

Furthermore, the impact of editing on the genomes which employ it is astounding. Genomes that use editing may become increasingly lenient towards persistence of deleterious mutations. Large amounts of genetic information and control devices may be confined to small spaces. Lastly, RNA editing, like gene scrambling, establishes a device for generating combinatorial sequence diversity.

Acknowledgements: We thank Laura A. Katz and Andrew L. Goodman for helpful discussions. TLH is supported by a National Defense Science and Engineering Graduate Fellowship.


Bibliography

Arts, JG & R Benne (1996). “Mechanism and evolution of RNA editing in kinetoplastida.” Biochimica et Biophysica Acta 1307: 39-54.
Bass, BL (1993). “RNA editing: new uses for old players in the RNA world.” pp 383-418 in The RNA World, RF Gesteland & JF Atkins, eds. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
Börner, GV, S Yokobori, M Mörl, M Dörner & S Pääbo (1997). “RNA editing in metazoan mitochondria: staying fit without sex.” FEBS Letters 409: 320-324.
Chan, L (1995). “Apolipoprotein B messenger RNA editing: an update.” Biochimie 77: 75-78.
Covello, PS & MW Gray (1993). “On the evolution of RNA editing.” Trends in Genetics 9(8): 265-268.
Greslin, AF, DM Prescott, Y Oka, SH Loukin & JC Chappell (1989). “Reordering of nine exons is necessary to form a functional actin gene in Oxytricha nova.” Proc Natl Acad Sci U S A 86: 6264-6268.
Grosjean, H & R Benne, eds (1998). Modification and Editing of RNA. ASM Press, Washington, DC.
Hoffman, DC & DM Prescott (1996). “The germline gene encoding DNA polymerase alpha in the hypotrichous ciliate Oxytricha nova is extremely scrambled.” Nucleic Acids Res 24(17): 3337-40.
Landweber, LF, T-C Kuo & EA Curtis (2000). “Evolution and assembly of an extremely scrambled gene.” Proc Natl Acad Sci U S A 97: 3298-3303.
Mitcham, JL, AJ Lynn & DM Prescott (1992). “Analysis of a scrambled gene: the gene encoding alpha-telomere-binding protein in Oxytricha nova.” Genes Dev 6(5): 788-800.
Prescott, DM (1994). “The DNA of ciliated protozoa.” Microbiol Rev 58(2): 233-67.
Schuster, W, R Hiesel & A Brennicke (1993). “RNA editing in plant mitochondria.” Cell Biology 4: 279-284.
Seeburg, PH, M Higuchi & R Sprengel (1998). “RNA editing of brain glutamate receptor channels: mechanism and physiology.” Brain Research Reviews 26: 217-229.
Smith, HC, JM Gott & MR Hanson (1997). “A guide to RNA editing.” RNA 3: 1105-1123.


Figure 1: Ciliate Sex. The micronuclei of conjugating ciliates undergo meiosis, exchange, and fusion to form new genetic combinations. Between steps 1 and 2, the ciliates conjugate. In the transition from 2 to 3, the micronuclei have undergone meiosis to form haploid micronuclei, while the old macronuclei have been destroyed. In step 4, the haploid micronuclei are exchanged, and in step 5 they fuse. By step 6, two unique diploid micronuclei are formed with genetic material from both parents. At step 7, a new macronucleus is formed from each new micronucleus.


Figure 2: Overview of gene unscrambling. Dispersed coding MDSs 1-7 reassemble during macronuclear development to form the functional gene copy (top), complete with telomere addition to mark and protect both ends of the gene.
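The reassembly summarized in Figure 2 can be mimicked with a toy data structure. The following Python sketch is an illustration only: the three segments, their order, and the function names are invented, and the segment numbers are simply given as input, whereas in the cell the correct order must be found by the unscrambling machinery. Removal of intervening non-coding DNA and telomere addition are not modelled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def unscramble(segments):
    """Join MDSs in numerical order, flipping any that lie on the inverted strand.

    segments: list of (mds_number, sequence, inverted) tuples in germline order.
    """
    ordered = sorted(segments, key=lambda segment: segment[0])
    return "".join(
        reverse_complement(sequence) if inverted else sequence
        for _, sequence, inverted in ordered
    )


# Invented germline arrangement: order 1-3-2, with MDS 2 on the inverted strand.
germline = [
    (1, "ATG", False),
    (3, "AGTAA", False),
    (2, "TGGC", True),
]

print(unscramble(germline))
```

Sorting by segment number stands in for the ordering step, and the reverse complement stands in for restoring an inverted segment to the coding orientation.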


Figure 3: Spiral model for unscrambling in α-TP (adapted from Mitcham et al. 1992).


Figure 4: Model for scrambling of DNA pol α. Vertical lines indicate recombination junctions between scrambled MDSs, guided by direct repeats. MDS 1 contains the start of the gene. MDS 10 in O. nova can also give rise to three new MDSs (13–15) in O. trifallax, one scrambled on the inverted strand, by two spontaneous intramolecular recombination events (×’s) in the folded orientation shown. O. nova MDS 6 can give rise to O. trifallax MDSs 7–9 (MDS 8, shaded, is only 6 bp and was not identified in Hoffman and Prescott 1997). O. trifallax non-scrambled MDSs 2 and 3 could be generated by the insertion of an IES in O. nova MDS 2 (similar to a model suggested by M. DuBois in Hoffman and Prescott 1997).


Figure 5: Proposed model for the origin of a scrambled gene. Left: birth of a scrambled gene from a non-scrambled gene by a double recombination with an IES or any noncoding DNA (new MDS order 1-3-2 with an inversion between MDSs 3 and 2). Middle: generation of a scrambled gene with a non-random MDS order, from a non-scrambled gene with an inversion between two MDSs. Right: creation of new scrambled MDSs in a scrambled gene containing an inversion. Inversions may dramatically increase the production of scrambled MDSs, by stabilizing the folded conformation that allows reciprocal recombination across the inversion.


Table 1: RNA editing in eukaryotes and viruses.

ORGANISM                 GENOME        CLASS OF RNA      FORM OF EDITING
kinetoplastids           mitochondria  mRNA              U insertion, U deletion
Myxomycetes              mitochondria  mRNA, rRNA, tRNA  C insertion, U insertion, mixed dinucleotide insertion, C → U
plants                   mitochondria  mRNA, rRNA, tRNA  C → U, U → C
plants                   chloroplast   mRNA              C → U, U → C
humans, rodents          nucleus       mRNA              C → U
humans, rodents, fish    nucleus       mRNA              A → I
humans, rodents          nucleus       mRNA              U → C
humans                   nucleus       mRNA              U → A
Drosophila               nucleus       mRNA              A → I
marsupials               mitochondria  tRNA              C → U
monotremes               mitochondria  tRNA              U → A, U → C, A → C
land snails              mitochondria  tRNA              C → A, U → A, G → A
Loligo (squid)           mitochondria  tRNA              G → A
chicken                  mitochondria  tRNA              G → A
Acanthamoeba             mitochondria  tRNA              U → A, U → G, A → G
Spizellomyces (fungus)   mitochondria  tRNA              A → G, U → G, U → A, C → A
Hepatitis Delta virus    viral         mRNA              A → I
Paramyxoviruses          viral         mRNA              G insertion (by polymerase stutter)
Ebola Virus              viral         mRNA              A insertion (by polymerase stutter)

Figure 6: Editing of a gene region by four overlapping gRNAs. Thick lines in the mRNA are encoded in the mitochondrial DNA. Thin shaded lines are inserted U’s; the two asterisks are deleted U’s (Maslov and Simpson 1992). Thin lines in the gRNA’s are guide nucleotides (A or G) that pair with inserted U’s. Vertical lines indicate Watson-Crick base pairs; colons indicate G:U wobble base pairs, illustrating formation of well-paired ‘anchors’ between the 5' ends of gRNA’s and the corresponding region of the mRNA.


CUAUUUUUAGUCUGCAUCUAGCUGGUGUUUCUUCUAUGUUAGGUGCUAUCA
AUUUCAUUUGUACCAUUAAAAAUAUGCGUCUUAAAGGAUUAACAGGAGAAC
GUUUAUCUUUAUUUGUUUGGGCUGUAUUAGUAACUGUGAUUUUAUUAUUAC
UUUCACUGCCUGUCUUAGCAGGUGCUAUCACUAUGUUAUUAACUGAUCGUA
AUUUUAAUACAUCUUUUUUUGAUGCAACCGGUGGUGGAGAUCCUAUUUUAU
AUCAACAUUUGUUUUGGUUUUUUGGCCAUCCAGAAGUUUACAUUUUAAUUU
UACCUGGUUUUGGUAUCGUUUCUAUUAUUAUUCAAGCCUAUGCUAAUAAAG
CUAUUUUUGGUUAUUUAGGUAUGGUGUAUGCUAUGUUGUCUAUUGGUAUCU
UGGGUUUUAUAGUGUGGGCUCAUCAUAUGUAUACUGUAGGAUUGGAUGUGG
AUACUCGCGCUUAUUUCACCGCUGCUACUAUGAUCAUUGCUGUGCCAACCG

Figure 7: RNA Editing in Physarum polycephalum. Partial Physarum polycephalum mRNA sequence for the gene encoding cytochrome c oxidase subunit I, with underlined text indicating inserted nucleotides, and boxed text indicating C → U substitution events (Gott et al. 1993).
