Author: Allan Reeves
CHAPTER 4 GENETIC MARKERS – MORPHOLOGICAL, BIOCHEMICAL AND MOLECULAR MARKERS

A genetic marker is any visible character or otherwise assayable phenotype for which alleles at individual loci segregate in a Mendelian manner. Genetic markers can be used to study the genetics of organisms, including trees, at the level of single genes. The development of the discipline of genetics would not have been possible without genetic markers such as the visible characters in peas and Drosophila. Trees, unfortunately, do not have a large number of visible Mendelian characters (Chapter 3), and for many years this was a limitation in forest genetics research. It was not until the early 1970s that biochemical genetic markers such as terpenes and allozymes were developed for trees. These biochemical markers were applied to an array of research problems, most notably the study of amounts and patterns of genetic variation in natural populations of trees and the characterization of mating systems (Chapters 7-10). A major limitation of biochemical markers, however, is that there are only a small number of different marker loci; therefore, genetic information obtained from such markers may not be very representative of genes throughout the genome. The limitation in the number of markers was overcome beginning in the 1980s with the development of molecular or DNA-based genetic markers. Molecular markers have a wide variety of applications in both basic research and applied tree improvement programs. The goal of this chapter is to briefly describe the basic properties of genetic markers and to introduce some of their uses in forestry. Additional reading on genetic markers can be found in Adams et al. (1992), Mandal and Gibson (1998), Glaubitz and Moran (2000), and Jain and Minocha (2000).

USES AND CHARACTERISTICS OF GENETIC MARKERS

Before describing the many types of genetic markers available for use in forest trees, it is first appropriate to consider the various applications of genetic markers and the desired attributes for such applications. Genetic markers are used to study the genetics of natural and domesticated populations of trees and the forces that bring about change in these populations. Some of the more important applications of genetic markers include: (1) Describing mating systems, levels of inbreeding, and temporal and spatial patterns of genetic variation within stands (Chapter 7); (2) Describing geographic patterns of genetic variation (Chapter 8); (3) Inferring taxonomic and phylogenetic relationships among species (Chapter 9); (4) Evaluating the impacts of domestication practices, including forest management and tree improvement, on genetic diversity (Chapter 10); (5) Fingerprinting and germplasm identification in breeding and propagation populations (Chapters 16, 17, and 19); (6) Constructing genetic linkage maps (Chapter 18); and (7) Marker assisted breeding (Chapter 19).

We describe several different types of genetic markers in this chapter, each of which has different attributes that make it more or less desirable to use in certain applications. Some of the desirable attributes of a given type of genetic marker are that it be: (1) Inexpensive to develop and apply; (2) Unaffected by environmental and developmental variation; (3) Highly robust and repeatable across different tissue types and different laboratories; (4) Polymorphic, i.e. reveal high levels of allelic variability; and (5) Codominant in its expression.

MORPHOLOGICAL MARKERS

As discussed in Chapter 3, very few simple Mendelian morphological characters have been discovered in forest trees that could be used as genetic markers. Many of the identified morphological markers are mutations observed in seedlings, such as albino needles, dwarfing and other aberrations (Franklin, 1970; Sorensen, 1973) (Fig. 5.2). Such mutants have been used to estimate self-pollination rates (Chapter 7) in conifers. These markers, however, have limited application because morphological mutants occur rarely and often are highly detrimental or even lethal to the tree.

BIOCHEMICAL MARKERS

Monoterpenes

Monoterpenes are a subgroup of the terpenoid substances found in resins and essential oils of plants (Kozlowski and Pallardy, 1979). Although the metabolic functions of monoterpenes are not fully understood, they probably play an important role in resistance to attack by diseases and insects (Hanover, 1992). The concentrations of different monoterpenes, such as alpha-pinene, beta-pinene, myrcene, 3-carene and limonene, are determined by gas chromatography and are useful as genetic markers (Hanover, 1966a, b, 1992; Squillace, 1971; Strauss and Critchfield, 1982). Monoterpene genetic markers have been applied primarily to taxonomic and evolutionary studies (Chapter 9). However, they have also been used to a limited extent to estimate genetic patterns of geographic variation within species (Chapter 8). Although monoterpenes were the best available genetic markers for forest trees in the 1960s and early 1970s, they require specialized and expensive equipment for assay. In addition, there are relatively few monoterpene marker loci available and most express some form of dominance in their phenotypes. Dominant genetic markers have the disadvantage that dominant homozygous genotypes cannot be distinguished from heterozygotes carrying the dominant allele. Monoterpenes were gradually replaced by allozyme genetic markers because allozymes are less expensive to apply, are codominant in expression, and many more marker loci can be assayed.

Allozymes

Allozymes have been the most important type of genetic marker in forestry and are used in many species for many different applications (Conkle, 1981; Adams et al., 1992). Allozymes are allelic forms of enzymes that can be distinguished by a procedure called electrophoresis. The more general term, isozyme, refers to any variant form of an enzyme, whereas allozyme implies a genetic basis for the variant form.
Most allozyme genetic markers have been derived from enzymes of intermediary metabolism, such as enzymes in the glycolytic pathway; however, conceivably an allozyme genetic marker could be developed from any enzyme. Allozyme analysis is fairly easy to apply (Fig. 4.1) and standard protocols for its use in trees are available (Conkle et al., 1982; Cheliak and Pitel, 1984; Soltis and Soltis, 1989; Kephart, 1990). Crude protein extracts are isolated from almost any tissue type and then are separated on starch gels by applying an electrical current (i.e. electrophoresis). Isozymes in the protein extract migrate to different positions on the gel depending on the electrical charge and size of the isozyme. Isozymes with different amino acid composition generally have a different charge and/or size, so it is these genetic differences that are revealed as mobility differences on the gel. The location of an isozyme on a gel following electrophoresis is visualized by placing the gel in a solution that contains the enzyme substrate, appropriate cofactors and a dye. The colored bands on the gel are the products of the enzymatic reactions linked to the dye.

Before allozymes can be used for genetic studies, however, the Mendelian inheritance of allozymes must be established. The inheritance of allozymes can be determined by using crosses between trees similar to the types of crosses Mendel used with peas, e.g. hybrid crosses and test crosses. In conifers, however, the unique genetic system offered by the seed tissues is most often used. The conifer megagametophyte (i.e. nutritive tissue surrounding the embryo) is haploid and genetically identical to the egg cell, since they are products of the same meiotic event (Fig. 4.2). Therefore, a sample of seed megagametophytes from a single tree represents a population of maternal meioses. This is a very convenient system because segregation analysis can be performed by simply using open-pollinated seed from mother trees, which eliminates the need for controlled crosses to establish the Mendelian inheritance of allozyme variants (Fig. 4.2).
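
The segregation analysis described above can be illustrated with a small calculation. The sketch below is only an assumption-laden example, not a protocol from this chapter: it tests whether counts of fast and slow allozyme bands among megagametophytes of one mother tree are consistent with the expected 1:1 ratio, using a chi-square goodness-of-fit statistic with one degree of freedom. The function names and the band counts are hypothetical.

```python
# Chi-square test of 1:1 segregation among haploid megagametophytes.
# All counts used below are hypothetical illustration values.

CRITICAL_5PCT_1DF = 3.841  # chi-square critical value, 1 df, alpha = 0.05


def chi_square_1to1(n_fast: int, n_slow: int) -> float:
    """Goodness-of-fit statistic for an expected 1:1 ratio of two band classes."""
    total = n_fast + n_slow
    if total == 0:
        raise ValueError("need at least one scored megagametophyte")
    expected = total / 2
    return ((n_fast - expected) ** 2 + (n_slow - expected) ** 2) / expected


def fits_mendelian_1to1(n_fast: int, n_slow: int) -> bool:
    """True if the counts do not deviate significantly from 1:1 at the 5% level."""
    return chi_square_1to1(n_fast, n_slow) < CRITICAL_5PCT_1DF


if __name__ == "__main__":
    # 22 fast : 18 slow bands among 40 megagametophytes of one seed tree
    print(chi_square_1to1(22, 18), fits_mendelian_1to1(22, 18))
    # 32 fast : 8 slow bands, a strongly distorted ratio
    print(chi_square_1to1(32, 8), fits_mendelian_1to1(32, 8))
```

A non-significant statistic is consistent with a heterozygous mother tree segregating for two alleles at a single locus; a significant deviation would call for a different genetic interpretation of the bands.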
Assays have been developed for many enzymes and as many as 25-40 different allozyme loci can be detected depending on the tree species and tissue type. Because of their codominant expression and relatively high level of polymorphism, allozymes have been used extensively for estimating genetic variation in tree species (Chapters 7-8), to a lesser extent for evolutionary studies (Chapter 9), and for monitoring various gene conservation (Chapter 10) and tree improvement activities (Adams, 1983; Wheeler and Jech, 1992).

Other Protein Markers

Another type of protein-based genetic marker utilizes two-dimensional polyacrylamide gel electrophoresis (2-D PAGE). Unlike allozymes, where single known enzymes are assayed individually, the 2-D PAGE technique simultaneously reveals all enzymes and other proteins present in the sample preparation. The proteins are revealed as spots on gels and marker polymorphisms are detected as presence or absence of spots. This technique has been used most extensively for linkage mapping in Pinus pinaster, where protein polymorphisms have been assayed from both seed and needle tissues (Bahrman and Damerval, 1989; Gerber et al., 1993; Plomion et al., 1997). Although 2-D PAGE has the potential advantage that many marker loci can be assayed simultaneously on a single gel, assays are more difficult than in allozyme analyses, and the markers are often dominant in their expression.

MOLECULAR MARKERS

A vast array of DNA-based genetic markers has been discovered since 1980 and new marker types are developed every year. There are two general types of DNA markers:


Fig. 4.1. Allozyme analysis in forest trees: (a) Step 1, tissues for assay are prepared such as megagametophytes and embryos from conifer seeds; Step 2, tissues are homogenized in an extraction buffer and absorbed onto small filter paper wicks; Step 3, wicks are loaded onto gels and samples are electrophoresed; Step 4, gel slices are stained to reveal allozyme bands; and (b) Three general types of allozyme patterns are observed in heterozygotes carrying slow and fast migrating variants, depending on whether an enzyme is functional as a monomer (lane 1), dimer (lane 2) or multimer (lane 3). Note that in all three cases, both alleles are expressed (i.e. are codominant) in heterozygotes.


Fig. 4.2. The conifer megagametophyte and embryo genetic system: (a) In conifer seeds, the embryo is diploid (2N), while the megagametophyte is haploid (1N) and genetically identical to the egg gamete; and (b) Allozymes from an individual megagametophyte show the product of just one allele. Megagametophytes from a single seed tree, heterozygous for an allozyme locus, are expected to segregate in a 1:1 ratio of fast and slow allozyme bands (alleles), as shown here for six megagametophytes from seed of a heterozygous mother tree. (Photo courtesy of G. Dupper, Institute of Forest Genetics, Placerville, California).

(1) Those based on DNA-DNA hybridization; and (2) Those based on amplification of DNA sequences using the polymerase chain reaction (PCR). Important technical aspects of these two approaches are discussed in detail in the following sections. More comprehensive reviews of molecular markers in forestry are available (Neale and Williams, 1991; Neale and Harry, 1994; Echt, 1997), so only the marker types most often used in forest trees are discussed here.

DNA-DNA Hybridization: Restriction Fragment Length Polymorphism

Genetic marker systems based on DNA-DNA hybridization were developed in the 1970s. Eukaryotic genomes are very large and there was no simple way to observe genetic polymorphisms of individual genes or sequences. The property of complementary base pairing allowed for methods to be developed whereby small pieces of DNA could be used as probes to reveal polymorphisms in just the sequences homologous to the probe. The genetic system derived using this approach is called restriction fragment length polymorphism (RFLP). RFLP markers were the first DNA-based genetic markers developed (Botstein et al., 1980). A brief description of the RFLP procedure is shown in Fig. 4.3.

To begin, total cellular DNA is digested with a restriction endonuclease (Box 4.1), which reduces the genome to a large pool of restriction fragments of different sizes. Hundreds of restriction endonucleases have been discovered that cleave DNA at specific recognition sites of varying length and sequence. However, just a few of these enzymes (e.g. HindIII, EcoRI, BamHI) are routinely used because they generally provide the best size distribution of DNA fragments and are inexpensive. Restriction endonuclease recognition sites are found throughout the genome, both in coding and non-coding regions, and are a powerful way to sample DNA sequence variation in the genome. The restriction fragments are then separated by their size on an agarose gel by electrophoresis. It is possible to visualize DNA within such a gel by staining it with ethidium bromide; however, because there are typically so many restriction fragments of all possible sizes, discrete fragments cannot be seen. To overcome this problem, the fractionated DNA is transferred and chemically bound to a nylon membrane by a process called Southern blotting, named after its inventor E.M. Southern (1975). Specific DNA fragments are visualized by hybridizing the DNA fragments bound to the nylon membrane with a radioactively- or fluorescently-labeled DNA probe. A DNA probe is just a small piece of DNA used to reveal its complementary sequence in the DNA bound to the membrane. The DNA probing relies on complementary base pairing; both the DNA fragments on the nylon membrane and the probe are first denatured so that they are single stranded and available to pair with their complementary DNA sequence. DNA probes have been developed for genetic marker analyses of all three plant genomes: nDNA, cpDNA and mtDNA.
The chloroplast and mitochondrial genomes are relatively small, so it has been possible to digest these genomes with a restriction endonuclease and clone individual sections of these genomes using standard plasmid cloning techniques (Box 4.1). The cpDNA and mtDNA probes have been developed for a number of forest tree species by cloning DNA fragments from those genomes (Strauss et al., 1988; Lidholm and Gustafsson, 1991; Wakasugi et al., 1994a,b). Each of these cloned fragments contains several different genes; therefore, when one clone is used for a probe in RFLP analysis, it is possible to reveal genetic variation for a number of organelle-encoded genes at once.

Developing probes for RFLP analysis of nDNA is more problematic because of the large amount of repetitive DNA in the nuclear genome. Two types of probes are commonly used: genomic DNA (gDNA) probes and complementary DNA (cDNA) probes. Probes are isolated from DNA libraries, each of which is a large collection of cloned fragments resulting from a single cloning experiment. Both cDNA and gDNA probes are equally easy to use and reveal abundant genetic variation in trees (Devey et al., 1991; Liu and Furnier, 1993; Bradshaw et al., 1994; Byrne et al., 1994; Jermstad et al., 1994). The gDNA probe libraries, however, are much easier to construct than cDNA probe libraries because the difficult task of mRNA isolation is not required. The cDNA probes are derived from expressed genes because cDNA is derived from mRNA (Box 4.2), whereas gDNA probes generally are not; therefore, cDNA probes are often preferred for many applications of RFLP analysis in trees.

The genetic interpretation of RFLP banding patterns can be difficult, especially in conifers, whose large genomes often lead to large numbers of fragments revealed by a single probe. Examples illustrating the molecular basis of several RFLP patterns are shown in Fig. 4.4, as well as the Mendelian interpretations of these band patterns.


Fig. 4.3. Restriction fragment length polymorphism (RFLP) analysis. Step 1, isolation of DNA from tree tissues; Step 2, restriction enzyme digestion of DNA to cleave DNA into small fragments; Step 3, electrophoresis of DNA samples on agarose gels to separate DNA fragments by size; Step 4, transfer of DNA fragments from gel to nylon membrane by Southern blotting technique; Step 5, hybridization of nylon membrane with specific radioactively labeled DNA probe; Step 6, exposure of nylon membrane to x-ray film (autoradiography); and Step 7, autoradiogram showing RFLP bands.

Box 4.1. Restriction enzymes and DNA cloning.

Experimental methods to study the structure and expression of individual genes were first developed in the 1970s. Together these methods are known as recombinant DNA technologies. The essence of recombinant DNA technology is to isolate a specific fragment of DNA (often an entire gene) from the genome and then introduce the fragment into a foreign host genome (usually a bacterium or virus), where large quantities of the recombinant DNA fragment can be amplified and isolated. Recombinant DNA technology was made possible by the discovery of enzymes from bacteria that cleave DNA at specific locations. These enzymes are called restriction enzymes or restriction endonucleases, because their function in bacteria is to recognize, cleave and destroy foreign DNA invading the bacterial cell. One of the first restriction enzymes characterized was EcoRI, so named because it was isolated from E. coli. The number of restriction enzymes now available is in the hundreds. Most restriction enzymes recognize a specific DNA sequence at the location where they cleave the DNA. This is called a recognition sequence and for EcoRI the sequence is GAATTC (Fig. 1). This sequence is called a palindrome, because it is identical on both complementary DNA strands when read in the same direction (5' to 3' or 3' to 5'). Most restriction enzymes also cleave the DNA at a specific location within the recognition sequence. EcoRI cleaves between the G and A.

Fig. 1. Schematic representation of DNA cleaving by EcoRI.
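
The recognition and cleavage rules for EcoRI can be sketched in a few lines of code. This is a minimal illustration that assumes a linear, single-stranded representation of the top strand only; the function names and the test sequence are hypothetical and are not taken from any real genome or software package. It checks that GAATTC is palindromic in the sense used here (it equals its own reverse complement) and cuts a sequence between the G and the A of each site.

```python
# In-silico EcoRI digest of a short, made-up DNA sequence (top strand, 5' to 3').

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, also written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut a linear sequence cut_offset bases into every occurrence of site.

    For EcoRI, cut_offset=1 places the cut between the G and the A.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments


if __name__ == "__main__":
    print(reverse_complement("GAATTC") == "GAATTC")  # palindrome check
    print(digest("AAAGAATTCCCCGAATTCTT"))
```

Each internal fragment produced this way begins with AATTC on the top strand, which corresponds to the four-base AATT single-stranded overhang left by EcoRI on double-stranded DNA.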

Molecular biologists learned that restriction enzymes were not only useful tools for cleaving large DNA molecules into small fragments, but that these enzymes could also be used in the process to clone DNA. The simplest form of DNA cloning involves inserting a fragment of foreign DNA into an E. coli plasmid vector and then reintroducing the recombinant plasmid back into the bacterial host (called transformation). The bacteria can then be grown in culture to produce large quantities of the plasmid containing the foreign DNA fragment to be studied. The entire cloning process is shown in Fig. 2. In step 1, the plasmid DNA is isolated from the bacteria. In practice, this is rarely done by individual researchers because plasmid DNA can be purchased from suppliers. In step 2, the circular plasmid DNA is linearized by cleaving it with a restriction enzyme. Plasmid vectors have been genetically engineered to include a number of restriction sites that can be used for cloning. In step 3, the foreign DNA fragment to be cloned is cleaved with the same restriction enzyme as was used to cleave the plasmid vector. In step 4, the foreign DNA fragment to be cloned is isolated from the rest of the DNA, usually by electrophoresis and gel purification.


Fig. 2. Process of DNA cloning


At this point (step 5), it is possible to join the foreign DNA fragment with the linearized plasmid. The cleaving of both DNAs with EcoRI leaves a four-base (AATT) single-stranded end. These are called "sticky ends." Through complementary base pairing, the two DNA fragments join together perfectly. All that remains to reconstitute a circular, double-stranded DNA molecule is to form the phosphodiester bonds between the adjacent nucleotides that were separated by cleaving with the restriction enzyme. This is accomplished by the enzyme DNA ligase and the process is called a ligation reaction. Finally, the recombinant plasmid is re-introduced into the bacterial host (step 6) by transformation. Now the transformed E. coli can be grown in culture to produce large quantities of plasmid DNA. Cloning of DNA fragments makes it possible to study aspects of the DNA, such as determining its nucleotide sequence, which would not be possible without cloning because large quantities of DNA are generally needed for such assays.

Restriction fragment length polymorphism analysis has been applied to both chloroplast and mitochondrial genomes to study: (1) Phylogenetic relationships (Wagner et al., 1991, 1992; Tsumura et al., 1995) (Chapter 9); (2) Genetic variation within species (Ali et al., 1991; Strauss et al., 1993; Ponoy et al., 1994); and (3) Modes of organelle DNA inheritance (Chapter 3). Restriction fragment length polymorphism analysis of nuclear genomes of forest trees has not been as widely applied, due to the technical challenges of performing Southern blot analysis, especially with the large genomes of conifers. Nuclear DNA RFLP probes are available for a few conifer, Populus, and Eucalyptus species. These probes have been used to construct genetic linkage maps for a number of species as discussed in Chapter 18.

Box 4.2. Complementary DNA (cDNA) cloning.

The ability to obtain millions of copies of individual genes is fundamentally important for molecular genetics research. Molecular biologists developed a very clever method, called complementary DNA (cDNA) cloning, to obtain cloned copies of expressed genes. Complementary DNA cloning usually involves cloning many genes at once, which together form cDNA libraries (Fig. 1). In step 1, mRNA is isolated from one or more tissues. In trees, for example, mRNA has been isolated from xylem to obtain a cDNA library of genes expressed in xylem. The mRNA has a tail consisting of many A's (called a polyadenylated tail), which provides a convenient location to attach a primer. In step 2, a poly-T primer is annealed to the poly-A tail. In step 3, the enzyme reverse transcriptase is added, along with free A, T, C, and G nucleotides, to synthesize a DNA strand complementary to the mRNA strand. In step 4, an enzyme called RNase H is added, which digests away the original mRNA template, leaving just the newly synthesized single-stranded DNA copy.
In step 5, a DNA strand complementary to the first DNA strand is synthesized by the enzyme DNA polymerase I. Finally, in step 6, the double-stranded DNA molecule is inserted into a plasmid or viral vector using the enzyme DNA ligase (Box 4.1). Once the cDNA is put into the vector, the vector can be transformed into a bacterial host that can be cultured to produce large quantities of the cDNA.

Fig. 1. Steps in cDNA cloning.
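
The base-pairing logic behind steps 2 to 5 of cDNA synthesis can be sketched as follows. This is a toy illustration under simplifying assumptions: sequences are plain strings written 5' to 3', the mRNA shown is invented, and the function names are hypothetical. The first cDNA strand is the DNA reverse complement of the mRNA, so it begins with the poly-T primer region; the second strand restores the mRNA sequence in DNA letters.

```python
# Toy model of first- and second-strand cDNA synthesis (sequences written 5' to 3').

RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """DNA strand complementary to the mRNA, as made by reverse transcriptase."""
    return "".join(RNA_TO_DNA_PAIR[base] for base in reversed(mrna))


def second_strand_cdna(first_strand: str) -> str:
    """DNA strand complementary to the first cDNA strand."""
    return "".join(DNA_PAIR[base] for base in reversed(first_strand))


if __name__ == "__main__":
    mrna = "AUGGCCUAA" + "A" * 8          # made-up message with a short poly-A tail
    first = first_strand_cdna(mrna)
    second = second_strand_cdna(first)
    print(first)   # starts with the poly-T stretch that paired with the poly-A tail
    print(second)  # same sequence as the mRNA, with T in place of U
```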


Fig. 4.4. Molecular interpretation of RFLPs resulting from all possible homozygous and heterozygous combinations of three different alleles at a single marker locus: (a) Allele 1 is the wild-type, allele 2 has a mutation that creates a new EcoRI restriction site and allele 3 has an insertion of a segment of DNA within the wild-type fragment; and (b) RFLP band patterns of all possible homozygous and heterozygous combinations of alleles 1, 2 and 3. The homozygous 11 genotype has just one band whereas the homozygous 22 genotype has two bands because the third EcoRI site creates two fragments from what was once one. The homozygous 33 band is slightly larger than the homozygous 11 band, reflecting the insertion.
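
The band patterns summarized in Fig. 4.4 follow from a simple rule: a genotype shows the union of the fragments contributed by its two alleles. The sketch below works through that rule with hypothetical fragment sizes (in kilobases) chosen only for illustration, and it assumes the probe hybridizes to both fragments produced by the extra EcoRI site in allele 2. None of the sizes or names come from the figure itself.

```python
# Expected RFLP bands for every genotype at one locus with three alleles.
from itertools import combinations_with_replacement

# Hypothetical fragment sizes (kb) detected by the probe for each allele.
ALLELE_FRAGMENTS_KB = {
    1: (6.0,),       # wild type: one fragment
    2: (4.0, 2.0),   # extra restriction site splits the fragment in two
    3: (7.0,),       # insertion makes the fragment longer
}


def band_pattern(allele_a: int, allele_b: int) -> list[float]:
    """Bands seen for a diploid genotype, largest fragment first."""
    bands = set(ALLELE_FRAGMENTS_KB[allele_a]) | set(ALLELE_FRAGMENTS_KB[allele_b])
    return sorted(bands, reverse=True)


def all_patterns() -> dict[tuple[int, int], list[float]]:
    """Band patterns for all homozygous and heterozygous combinations."""
    alleles = sorted(ALLELE_FRAGMENTS_KB)
    return {g: band_pattern(*g) for g in combinations_with_replacement(alleles, 2)}


if __name__ == "__main__":
    for genotype, bands in all_patterns().items():
        print(genotype, bands)
```

With these assumed sizes all six genotypes give distinct band patterns, which is what makes such a marker codominant: each heterozygote can be told apart from both homozygotes.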

Molecular Markers Based on the Polymerase Chain Reaction

The polymerase chain reaction (PCR) was one of the most important biological discoveries of the 20th century and earned its inventor, Kary Mullis, a Nobel Prize (Mullis, 1990). Before PCR, the analysis of a specific DNA fragment generally required cloning of the fragment and amplification in a plasmid or comparable vector (Box 4.1). The polymerase chain reaction enables the production of a large amount of a specific DNA sequence without cloning, starting with just a few molecules of the target sequence. One advantage of PCR-based marker methods over DNA-DNA hybridization marker methods is that the latter require isolation of large quantities of DNA.

The polymerase chain reaction has three basic steps: (1) Denaturing of the double-stranded DNA template; (2) Annealing of a pair of primers to the region to be amplified; and (3) Synthesis of the complementary strands using a heat-resistant DNA polymerase called Taq polymerase (Fig. 4.5). The completed sequence of the three steps is called a cycle, and at the end of the first cycle two new double-stranded molecules arise from the original template. These molecules serve as templates during subsequent cycles so that a geometric amplification of molecules occurs. Specialized instruments, called thermocyclers, have been developed to carry out the PCR process. Discovery of the heat-resistant DNA polymerase was critical to the development of the automated procedure because, prior to this, fresh DNA polymerase had to be added each cycle because it was destroyed by high temperature during the denaturing step. The polymerase chain reaction is used not only for DNA marker technology but also for a variety of recombinant DNA assays. Many procedures in molecular biology that previously required cloning of DNA can now be performed by PCR.

Fig. 4.5. The polymerase chain reaction (PCR) is used to amplify specific segments of DNA from complex genomes. The PCR involves three basic steps: Step 1, denaturing the DNA template; Step 2, annealing primers; and Step 3, synthesizing the complementary DNA strand using a heat resistant DNA polymerase. The three steps are repeated for many cycles to produce large quantities of the specific DNA segment. Only two cycles are shown in this example.

Random Amplified Polymorphic DNA

Random amplified polymorphic DNA (RAPD) markers have been the most widely used molecular marker type in forest trees to date. They were the first of the PCR-based markers and were developed independently by Welsh and McClelland (1990) and Williams et al. (1990). The RAPD marker system is easy to apply because no prior DNA sequence information is needed for designing PCR primers, as is required for other PCR-based genetic marker systems. In the RAPD marker system (Fig. 4.6), a PCR reaction is conducted using a very small amount of template DNA (usually less than 10 nanograms) and a single RAPD primer. Primers are usually just 10 nucleotides long (10-mers) and are of random sequence. There are several thousand primers commercially available, all with a different 10-base sequence, which in theory will all amplify different regions of the target genome. Therefore, the RAPD marker system has the potential to randomly survey a large portion of the genome for the presence of polymorphisms. The small amount of DNA needed is a big advantage of the RAPD technique versus RFLPs, because marker analysis can be applied to haploid conifer megagametophytes as was discussed for allozyme markers (Fig. 4.2). A specific segment of the genomic DNA is amplified when the RAPD primer finds its complementary sequence at a location in the genome and then again at a second nearby location, but in the opposite orientation from the first priming site.
If both chromosomes of a homologous pair each have the forward and reverse priming sites (homozygous +/+), PCR amplification products of identical length are synthesized from both homologues and a RAPD band appears on a gel when the amplification product is electrophoresed (Fig. 4.6). Likewise, if both homologues are missing one or both of the priming sites (homozygous −/−), no amplification products are synthesized and no bands are seen on gels. If one homologue has both priming sites, but the other homologue is missing at least one (heterozygous +/−), then an amplification product results from the first homologue. The heterozygous (+/−) band pattern phenotype cannot be distinguished from the +/+ homozygote; therefore, RAPD markers are dominant, di-allelic (i.e. only two alleles (+ and -) are expressed at each locus), genetic markers. The + (band) and – (no band) phenotypes are distinguishable in haploid megagametophytes. Therefore, conifer trees heterozygous for a RAPD marker will segregate for + and – phenotypes in megagametophytes of their seeds, while only + phenotypes will be observed in megagametophytes of +/+ homozygotes. In this manner, +/− heterozygotes can be distinguished from +/+ homozygotes in conifer mother trees. Since a single RAPD primer can anneal to many locations in the genome, multiple loci are revealed by a single primer. Therefore, it is possible to obtain a large number of RAPD genetic markers in a short amount of time and at relatively low cost. Carlson et al. (1991) first demonstrated the use of RAPD markers in trees by showing the inheritance of RAPD markers in F1 families of Pseudotsuga menziesii and Picea glauca. In a subsequent paper, Tulsieram et al. (1992) used RAPD markers and megagametophyte segregation analysis to construct a partial genetic linkage map for Picea glauca. Random amplified polymorphic DNA markers have since been used for linkage mapping and marker analyses in dozens of tree species (Cervera et al., 2000). 
However, as the popularity of RAPD markers increased, difficulty in establishing marker repeatability across laboratories slowly manifested itself. Therefore, although RAPD markers are easy and quick to use, they have less overall value than the earlier allozyme and RFLP markers because of the problems with repeatability.
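
The dominance of RAPD markers, and the way haploid megagametophytes resolve it, can be made concrete with a short sketch. The genotype notation ('+/+', '+/-', '-/-', with an ASCII hyphen for the null allele) and the function names are assumptions made for this illustration only.

```python
# Band phenotypes of a dominant, di-allelic RAPD marker in diploid and haploid tissue.


def rapd_phenotype(genotype: str) -> str:
    """Phenotype of a diploid genotype such as '+/+', '+/-' or '-/-'."""
    alleles = genotype.split("/")
    # One amplifiable (+) homologue is enough to produce a band.
    return "band" if "+" in alleles else "no band"


def megagametophyte_phenotypes(mother_genotype: str) -> set[str]:
    """Phenotypes expected among haploid megagametophytes of one mother tree."""
    return {"band" if allele == "+" else "no band"
            for allele in mother_genotype.split("/")}


if __name__ == "__main__":
    for g in ("+/+", "+/-", "-/-"):
        print(g, rapd_phenotype(g), sorted(megagametophyte_phenotypes(g)))
```

In diploid tissue the '+/+' and '+/-' genotypes give the same phenotype, whereas the megagametophytes of a '+/-' mother tree show both phenotypes and those of a '+/+' tree show only bands, which is how the two genotypes are told apart in conifers.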

Fig. 4.6. The random amplified polymorphic DNA (RAPD) marker system involves a small number of steps and all are generally easy to apply in forest trees: Step 1, DNA is isolated; Step 2, DNA is amplified by PCR (Fig. 4.5) using single 10-mer primers; and Step 3, RAPD products are electrophoresed and bands are visualized by staining gels with ethidium bromide.

Amplified Fragment Length Polymorphism


Amplified fragment length polymorphism (AFLP) markers were introduced in the mid-1990s (Vos et al., 1995). They are like RAPDs in that many markers can be assayed quickly using PCR and they are generally dominant; however, AFLPs appear to be more repeatable than RAPDs. AFLP markers also are similar to RFLPs because they survey the genome for the presence of restriction fragment polymorphisms. The first report of the use of AFLPs in trees was by Cervera et al. (1996), who used this marker system to genetically map a disease resistance gene in Populus. Genetic linkage maps based on AFLPs have also been constructed in Eucalyptus globulus and E. tereticornis (Marques et al., 1998) and in Pinus taeda (Fig. 4.7; Remington et al., 1998).

Simple Sequence Repeat

Simple sequence repeat (SSR) markers were first developed for use in genetic mapping in humans (Litt and Luty, 1989; Weber and May, 1989), and are another name for microsatellite DNA (Chapter 2). Short, tandemly-repeated sequences of two, three or four nucleotides are found throughout the genome. For example, the dinucleotide repeat AC is commonly found in Pinus genomes. Since the number of tandem repeats at a locus can vary greatly, SSR markers tend to be amongst the most polymorphic genetic marker types. For example, one allele might have 10 copies of the AC tandem repeat, (AC)10, whereas another allele would have 11 copies, (AC)11, another 12 copies, (AC)12, and so forth. Simple sequence repeat genetic markers require a considerable investment to develop. Genomic DNA libraries rich in microsatellite sequences must be created and screened for clones containing SSR sequences (Ostrander et al., 1992). The DNA sequence of these clones must be determined (Box 4.3), because the unique sequence regions flanking the SSR are needed to design PCR primers to amplify SSR sequences from individual samples.
Once a pair of primers is developed to amplify the SSR region, it must be determined whether there is polymorphism for the SSR and whether band patterns on gels have simple genetic interpretations (Fig. 4.8).

Some of the first SSR markers developed in trees were from the chloroplast genome (Powell et al., 1995; Cato and Richardson, 1996; Vendramin et al., 1996). Development of these markers was made easier because the complete DNA sequence of the chloroplast genome of Pinus thunbergii was known (Wakasugi et al., 1994a,b). The SSR sequences were found by a computer search of the entire cpDNA sequence database (see Chapter 18 for a discussion of database searching and comparison of DNA sequences). Furthermore, since cpDNA sequences are highly conserved among related plant taxa, PCR primers designed from sequences flanking SSR sequences in P. thunbergii should easily amplify homologous sequences in other Pinus species. The cpDNA SSRs are highly polymorphic relative to other types of cpDNA markers and are useful for many types of studies. For example, because cpDNA is paternally inherited in conifers (Chapter 3), cpDNA markers are useful for determining male parentage of offspring (paternity analysis) (Chapter 7) and for following the dispersal of pollen in populations where the SSR genotypes of all male trees are known (Stoehr et al., 1998).

Nuclear DNA SSRs have been developed for several forest trees, including species of Pinus (Smith and Devey, 1994; Kostia et al., 1995; Echt et al., 1996; Echt and May-Marquardt, 1997; Pfeiffer et al., 1997; Fisher et al., 1998), Picea (van de Ven and McNicol, 1996), Quercus (Dow et al., 1995) and Populus (Dayanandan et al., 1998). Each of these studies describes the isolation and cloning of a small number of SSRs, their inheritance patterns, and their utility for related species.
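The idea that SSR alleles differ in the number of tandem repeat copies can be illustrated with a short Python sketch. It is only an illustration: the flanking sequences, the function names and the three alleles are invented here and do not come from any real Pinus locus.

```python
import re

# Hypothetical unique flanking sequences (invented for illustration only).
LEFT_FLANK = "GGTCATTCGA"
RIGHT_FLANK = "TTGCCGATAG"


def make_allele(copies: int, motif: str = "AC") -> str:
    """Build a toy SSR allele: left flank + tandem repeat + right flank."""
    return LEFT_FLANK + motif * copies + RIGHT_FLANK


def longest_tandem_repeat(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Three toy alleles corresponding to (AC)10, (AC)11 and (AC)12.
for copies in (10, 11, 12):
    allele = make_allele(copies)
    print(f"(AC){copies}: repeat count = {longest_tandem_repeat(allele, 'AC')}, "
          f"fragment length = {len(allele)} bp")
```

Each additional copy of the dinucleotide lengthens the amplified fragment by two base pairs, which is why alleles can be separated by size on a gel.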


Fig. 4.7. An autoradiogram of amplified fragment length polymorphisms (AFLP) of Pinus taeda megagametophyte DNA samples. The first and last lanes are DNA samples of known molecular weight that are used to estimate the molecular weights of the pine DNA samples. All other lanes represent DNA in a sample of 64 megagametophytes from a single Pinus taeda seed tree. Each of the horizontal bands represents a different genetic locus. Bands of identical migration on the gel that are segregating for the presence or absence of a band are from polymorphic loci; all other loci are monomorphic in this seed tree. (Photo courtesy of D. Remington, North Carolina State University, Raleigh).
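Scoring a gel such as the one in Fig. 4.7 amounts to recording each band as present or absent in every megagametophyte and then separating segregating bands from monomorphic ones. The Python sketch below shows one way this could be done. It assumes that a band segregating in haploid megagametophytes of a heterozygous seed tree is expected in a 1:1 presence-to-absence ratio and checks that expectation with a chi-square statistic. The band scores are hypothetical, not data from the figure.

```python
def classify_band(scores):
    """Classify one band from 0/1 calls across megagametophytes.

    Returns ("monomorphic", None) when every sample has the same call,
    otherwise ("segregating", chi_square) where chi_square tests the
    assumed 1:1 presence:absence ratio (1 degree of freedom).
    """
    present = sum(scores)
    absent = len(scores) - present
    if present == 0 or absent == 0:
        return "monomorphic", None
    expected = len(scores) / 2
    chi_square = ((present - expected) ** 2 + (absent - expected) ** 2) / expected
    return "segregating", chi_square


# Hypothetical scores for two bands in 64 megagametophytes.
band_a = [1] * 30 + [0] * 34   # present in 30 samples, absent in 34
band_b = [1] * 64              # present in every sample

for name, scores in (("band A", band_a), ("band B", band_b)):
    status, chi_square = classify_band(scores)
    print(name, status, chi_square)
```

A chi-square value below 3.84 (the 5% critical value for one degree of freedom) is consistent with 1:1 segregation.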


Expressed Sequence Tagged Polymorphisms

Expressed sequence tagged polymorphisms (ESTPs) are PCR-based genetic markers that are derived from expressed sequence tags (ESTs). Expressed sequence tags are partial cDNA sequences that have been obtained by automated DNA sequencing methods (Chapter 18); therefore, ESTPs are genetic markers for structural gene loci. The EST databases contain hundreds of thousands of entries from a variety of organisms, most notably Arabidopsis thaliana, rice, and maize in plants. In forest trees, there are EST databases for Pinus, Populus, and Eucalyptus. The ESTs are routinely compared to DNA sequence databases to determine their biochemical function. It is also a goal of most genome projects to place the ESTs onto genetic linkage maps. Expressed sequence tags can be genetically mapped by a variety of methods, all of which rely on detecting polymorphism for the ESTs, hence the name ESTPs for the genetic marker.

Box 4.3. Methods to determine the nucleotide sequence of a DNA fragment.

The ability to determine the nucleotide sequence of a gene or any piece of DNA is fundamentally important to understanding how genes work and the genetic variation found at the DNA sequence level. Chemical methods to determine the nucleotide sequence of a piece of DNA were first developed in the 1970s. In earlier years, it would take months to determine the sequence of just a small piece of DNA. In the 1990s, automated DNA sequencing technologies were developed that allowed individual labs to sequence millions of bases in a single day (Chapter 18). These technologies have enabled the determination of the entire DNA sequence of the genomes of humans, Drosophila, mice, chicken, Arabidopsis, and others. The first tree genome to be completely sequenced is Populus trichocarpa, and sequencing of others is likely to be undertaken in the near future.
Although DNA sequencing is currently done by automated methods, it is important to understand the basic chemistry of the manual methods that preceded automated technologies. There are several different manual methods for determining the nucleotide sequence of a DNA molecule. The most often used approach is the chain-termination method of Sanger et al. (1977). The DNA sequence is determined by synthesizing partial complementary strands of DNA. In this example, we wish to determine the unknown sequence of a small fragment of DNA whose actual sequence is ATGCATGC (Fig. 1).

In Step 1, a primer is annealed to a single-stranded DNA molecule. The DNA to be sequenced has generally been cloned and the single-stranded copy is derived from the recombinant plasmid-cloning vector (Box 4.1). The primer is complementary to the cloning site within the vector.

In Step 2, four different sequencing reactions are set up. To each reaction, a different dideoxy XTP (dd-XTP) is added (dd-ATP, dd-TTP, dd-GTP, dd-CTP). When DNA polymerase incorporates a dd-XTP into the complementary strand instead of a normal deoxy-XTP (d-XTP), the sequencing reaction is terminated because the dd-XTP lacks the 3' hydroxyl group needed to form a bond with the next d-XTP. The dd-XTPs are in low concentration in the reaction mixes, so it is only occasionally and randomly that a dd-XTP is added. In this example, two different chain-termination products are synthesized in each reaction, each corresponding to the base complementary to the dd-XTP in the reaction.

In Step 3, the reaction mixes are electrophoresed on polyacrylamide gels by loading one lane for each of the four reaction mixes. The DNA strands produced by chain-termination separate by size. The DNA sequence of the complementary strand is read from the bottom of the gel upwards; in this example, it is TACGTACG. Finally, the complement of this sequence is determined to obtain the sequence of the original single-stranded copy, which is ATGCATGC.
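The logic of the three steps can be traced in a few lines of Python. This is a simplified sketch that follows the box's own example: it ignores the primer length and strand orientation, treats each reaction as yielding every possible termination product, and uses function names invented for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_termination_products(template):
    """Return {dd-base: [fragment lengths]} for the four sequencing reactions.

    A fragment of length n ends in the lane for base X when the n-th base
    of the newly synthesized complementary strand is X.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    lanes = {base: [] for base in "ATGC"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the gel from the bottom (shortest fragment) upwards."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


template = "ATGCATGC"
lanes = chain_termination_products(template)
synthesized = read_gel(lanes)
recovered = "".join(COMPLEMENT[base] for base in synthesized)

print(lanes)        # two termination products per reaction
print(synthesized)  # sequence of the complementary strand
print(recovered)    # sequence of the original single-stranded copy
```

With the eight-base template of the example, each lane holds two fragments, the gel reads TACGTACG from the bottom up, and taking the complement recovers ATGCATGC.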

Fig. 1. Illustration of the steps required in determining the nucleotide sequence in a fragment of DNA.


Fig. 4.8. The simple sequence repeat (SSR) or microsatellite marker system requires significant time and cost to develop; however, it is fairly easy to apply once markers are developed: (a) Primers complementary to unique sequence regions flanking the SSRs are used to amplify the SSR sequence by PCR; allele 1 has seven copies of the ATC repeat and allele 2 has six copies of the ATC repeat; and (b) A simplified gel pattern of the two homozygous (11 and 22) and the heterozygous (12) SSR genotypes. Since all three genotypes are distinguishable, this is a codominant marker.
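The codominance shown in Fig. 4.8 can be checked with a small Python sketch that predicts the bands expected for each genotype. The repeat counts (seven and six copies of ATC) are taken from the figure; the combined length of the amplified flanking regions (100 bp) is a hypothetical value chosen only for illustration.

```python
FLANK_BP = 100                 # hypothetical total length of amplified flanks
MOTIF = "ATC"
COPIES = {"1": 7, "2": 6}      # repeat copies per allele, as in Fig. 4.8


def product_size(allele):
    """Length in base pairs of the PCR product for one allele."""
    return FLANK_BP + len(MOTIF) * COPIES[allele]


def gel_bands(genotype):
    """Distinct band sizes expected for a diploid genotype such as '12'."""
    return tuple(sorted({product_size(allele) for allele in genotype}, reverse=True))


patterns = {genotype: gel_bands(genotype) for genotype in ("11", "12", "22")}
is_codominant = len(set(patterns.values())) == len(patterns)

for genotype, bands in patterns.items():
    print(genotype, bands)
print("codominant:", is_codominant)
```

Because the heterozygote shows both bands while each homozygote shows only one, every genotype has a unique pattern. A dominant marker would instead give the same pattern for the heterozygote and one of the homozygotes.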

All approaches to ESTP detection require a pair of PCR primers to be designed from the EST sequence. The primers are then used to amplify genomic DNA fragments by PCR. The next step is to reveal polymorphism among amplification products and for this a variety of methods have been used. In Picea mariana, Perry and Bousquet (1998) were able to detect length variation among ESTP alleles when amplification products were analyzed on standard agarose gels. This is the fastest and simplest technique for revealing ESTP variation, although it seems unlikely that such length variation can be found for the majority of ESTs. Polymorphisms surely exist as nucleotide substitutions among alleles; however, these polymorphisms are more problematic to detect.


In Cryptomeria japonica (Tsumura et al., 1997) and in Pinus taeda (Harry et al., 1998), researchers have digested amplification products with restriction enzymes to reveal polymorphism. This approach is similar to RFLP analysis and the markers are sometimes called PCR-RFLPs or cleaved amplified polymorphic sequences (CAPS). More sensitive techniques for revealing polymorphisms are available, such as single-strand conformational polymorphism (SSCP) or denaturing gradient gel electrophoresis (DGGE) analysis (Fig. 4.9). Temesgen et al. (2000, 2001) have used the DGGE technique in Pinus taeda to develop a large number of codominant markers. Cato et al. (2000) developed a different method for mapping ESTs that is similar to the AFLP method, except that instead of using two random primers as in AFLP, one primer is complementary to the EST sequence.

An important advantage of ESTP markers over most of the other PCR-based genetic markers, such as RAPDs, AFLPs and SSRs, is that ESTPs most likely reveal variation in structural gene loci, whereas the others generally reveal variation in non-coding regions of the genome. Such structural gene markers are called genic markers, as opposed to non-genic markers. This distinction can be important for some applications of genetic marker analysis, such as candidate gene mapping and the discovery of single nucleotide polymorphisms (SNPs) within candidate genes. SNPs are the most recently developed marker type and are covered in detail in Chapter 18.

Fig. 4.9. Expressed sequence tagged polymorphism (ESTP) markers can be revealed by several different polymorphism detection methods. An EST from the same 12 individuals (lanes 1-4 and 7-10 are the grandparents of two Pinus taeda pedigrees and lanes 5-6 and 11-12 are the parent trees of these pedigrees) is analyzed using three approaches: (a) No polymorphism is revealed in the PCR amplification products separated on agarose gels; (b) Two different alleles were detected in this sample after the DNAs were digested with one restriction enzyme; and, (c) Multiple codominant alleles were revealed when ESTP amplification products were assayed by denaturing gradient gel electrophoresis (DGGE).
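The restriction-digest approach to revealing ESTP polymorphism can be sketched in Python as follows. The two amplicon sequences are invented for illustration and differ by a single nucleotide substitution. The enzyme is assumed to be EcoRI, which recognizes GAATTC and cuts after the first G. None of these details come from the studies cited above.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting sequence at every site.

    cut_offset is the position of the cut within the recognition site
    (1 means the cut falls after the first base, as for EcoRI).
    """
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(cut - start)
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(len(sequence) - start)
    return fragments


# Hypothetical amplification products of equal length from two alleles.
allele_a = "ATGGCTAGCTGAATTCGGATCCATGCAAGT"   # carries the GAATTC site
allele_b = "ATGGCTAGCTGAGTTCGGATCCATGCAAGT"   # A-to-G substitution removes it

print(len(allele_a), len(allele_b))   # same size before digestion
print(digest(allele_a))               # two fragments after digestion
print(digest(allele_b))               # remains uncut
```

The two products are the same length and would not be separated on an agarose gel before digestion, but after digestion one allele yields two fragments and the other stays intact, so the substitution becomes visible as a band difference.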

SUMMARY AND CONCLUSIONS

Genetic markers are essential for a large number of types of genetic investigations in forest trees. Genetic markers are often used with natural populations of forest trees to determine amounts and patterns of genetic variation, to understand mating systems and inbreeding, and to study taxonomic and phylogenetic relationships among species. Genetic markers are routinely used to monitor the efficiency of various tree improvement activities and are necessary for the construction of genetic maps and marker-aided breeding.

Morphological genetic markers are available in forest trees; however, the number of such markers is limited and they are often associated with a deleterious phenotype. Nevertheless, seedling mutant characters have been used to study mating systems in a number of conifer species.

Several types of biochemical markers exist and have been extremely valuable for genetic studies in trees. Monoterpenes were the first biochemical markers in trees and have been mostly used for taxonomic studies in pines. The small number of monoterpene markers and their dominant expression have limited their utility for other applications. Studies based on allozymes, the most widely used of all genetic markers to date, have contributed greatly to our knowledge of the population genetics of forest trees. Allozymes are fairly easy to learn how to assay and apply, and are highly polymorphic and codominant. They have been used to describe patterns of genetic variation both within and among forest tree populations and for estimating mating systems and gene flow. Allozymes are also very useful for genetic fingerprinting and paternity analysis.

Molecular, or DNA-based markers are the most recent to be developed and have many advantages over morphological and biochemical markers.
The primary advantages are: (1) there is potentially an unlimited number of molecular markers available; and (2) DNA markers are generally not affected by developmental differences or environmental influences. There are two general classes of molecular markers: those based on DNA-DNA hybridization and those based on the polymerase chain reaction (PCR). Restriction fragment length polymorphism (RFLP) markers rely on DNA-DNA hybridization and have been used for organelle genetic analysis and genetic linkage mapping in forest trees. The three PCR-based molecular marker types used widely in forest trees are: (1) random amplified polymorphic DNA (RAPD); (2) amplified fragment length polymorphisms (AFLP); and (3) simple sequence repeats (SSR). All three of these marker types generally reveal polymorphism in non-coding regions of genomes. The RAPDs and AFLPs are diallelic, dominant markers, whereas SSRs are codominant and multiallelic. A new PCR-based marker type called expressed sequence tag polymorphism (ESTP) potentially overcomes many of the limitations of the other PCR-based marker types. The ESTPs are codominant, multiallelic and reveal polymorphism within, or in sequences flanking, expressed structural genes. Therefore, ESTPs have significant potential for studies of adaptive genetic variation in forest trees.
