Chloroplast genome sequences from total DNA for plant identification

Plant Biotechnology Journal (2011) 9, pp. 328–333

doi: 10.1111/j.1467-7652.2010.00558.x

Catherine J. Nock1, Daniel L.E. Waters1, Mark A. Edwards1, Stirling G. Bowen1, Nicole Rice1,2, Giovanni M. Cordeiro1 and Robert J. Henry1,2,*,†

1 Centre for Plant Conservation Genetics, Southern Cross University, Lismore, NSW, Australia
2 Australian Plant DNA Bank, Lismore, NSW, Australia

Received 3 March 2010; revised 6 June 2010; accepted 8 June 2010.
*Correspondence (Tel +61 733460552; fax +61 733651188; email [email protected])
†Present address: Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, Qld 4072 Australia.
Sequence accession numbers: Genbank Accession GU592207–GU592211.

Summary

Chloroplast DNA sequence data are a versatile tool for plant identification or barcoding and establishing genetic relationships among plant species. Different chloroplast loci have been used at close and distant evolutionary distances in plants, and no single locus has been identified that can distinguish between all plant species. Advances in DNA sequencing technology are providing new cost-effective options for genome comparisons on a much larger scale. Universal PCR amplification of chloroplast sequences or isolation of pure chloroplast fractions is, however, nontrivial. We now propose the analysis of chloroplast genome sequences from massively parallel sequencing (MPS) of total DNA as a simple and cost-effective option for plant barcoding, and analysis of plant relationships to guide gene discovery for biotechnology. We present chloroplast genome sequences of five grass species derived from MPS of total DNA. These data accurately established the phylogenetic relationships between the species, correcting an apparent error in the published rice sequence. The chloroplast genome may be the elusive single-locus DNA barcode for plants.

Keywords: genome, chloroplast, DNA barcode, plant, massively parallel sequencing.

Introduction

Analysis of plant DNA for identification of plant species and genotypes has generally replaced earlier techniques based upon other biochemical markers. DNA analysis techniques have evolved, becoming increasingly discriminating and amenable to routine low-cost applications (Henry, 2008). However, a universal protocol that could be applied to any unknown sample has not been achieved. It has been difficult to define discriminating loci in the nuclear genome that could be analysed in all plants with a standard protocol.

Chloroplasts contain both highly conserved genes fundamental to plant life and more variable regions which are informative over broad time scales. When PCR is used to retrieve discrete chloroplast regions for phylogenetic analysis and barcoding, the target sequences need to be carefully selected. Sequences with relatively low mutation rates are required for higher level phylogenetic comparisons, while higher mutation rates are needed to discriminate among closely related species (Heinze, 2007). Although the mitochondrial gene CO1 has been widely adopted as an efficient DNA barcode for animal species (Hebert et al., 2003), no such single locus has been identified for plants (Rubinoff et al., 2006), and there has been considerable debate surrounding the selection of the most suitable loci (Chase and Fay, 2009; Kress and Erickson, 2008). Recently, a two-locus land plant barcode consisting of portions of the chloroplast genes rbcL and matK has been proposed (CBOL, 2009). The plant barcode was limited to two loci to reduce the cost and time involved in bidirectional Sanger sequencing. Primer universality and species discrimination, however, were suboptimal, especially for non-angiosperm land plants. The potential difficulties associated with targeting particular chloroplast regions for barcoding are circumvented by sequencing the chloroplast genome.

© 2010 The Authors Plant Biotechnology Journal © 2010 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd


Conventional approaches to chloroplast genome sequencing commonly involve purification or PCR amplification of the chloroplast genome prior to sequencing. More recently, massively parallel sequencing (MPS) has been used to capture sequence data from many individual multiplexed chloroplast PCR amplicons (Cronn et al., 2008; Parks et al., 2009). These approaches are, however, relatively time consuming. Non-purified (total) DNA extractions include chloroplast DNA which is sequenced during MPS runs and is treated as contaminating sequence for many applications. However, these sequences have utility for DNA plant barcoding and the exploration of plant relationships. Plant identification has many uses in enforcing intellectual property rights and in quality control in plant and food production and processing. Phylogenetically guided identification of close relatives has the potential to aid gene discovery for crop improvement, and access to chloroplast genome sequences will be of benefit for targeted chloroplast transformation (Daniell et al., 2005).

As the primary staple food for over half of the global population, rice is the world’s most important crop and is a model plant species for genetic studies. The wild relatives of cultivated rice provide a broad gene pool for improved food security as the human population expands in an uncertain climatic future. This gene pool will be accessed more efficiently by comparison of wild rice genome sequence data with the complete rice genome sequence (Goff et al., 2002; IRGSP, 2005).

The Illumina GA (http://www.illumina.com) generates short sequence reads of up to 100 base pairs (bp) which can be converted to contiguous DNA sequence data by reference-guided assembly using an existing genome sequence as a scaffold. This approach was used to retrieve chloroplast sequence from a single 36-bp paired-end run of cultivated rice Oryza sativa japonica (cultivar Nipponbare) and wild relatives Oryza meridionalis, Oryza australiensis and Potamophila parviflora from the tribe Oryzeae, and Microlaena stipoides from the Ehrharteae.

The aim of this study was to determine the extent to which chloroplast genome sequences can be recovered by MPS from total DNA. Re-sequenced cultivated rice was included as a control to test the accuracy of chloroplast sequence data recovered by reference-guided assembly using a previously published rice chloroplast genome of the same cultivar (Nipponbare) as a scaffold. Wild relatives of increasing evolutionary distance from rice were selected to test the utility of this approach in constructing chloroplast genomes with a more distant reference scaffold.

Results

The total aligned data set of five grass species and the rice reference was 134 551 bp in length. One of the two inverted repeat regions was excluded from subsequent phylogenetic analysis. The modified alignment was 113 749 bp in length. There were 109 665 constant positions. Of the 4084 variable positions, 935 were parsimony informative in-group substitutions. Phylogenetic analysis of the 113 749-bp alignment resulted in the construction of trees which conform to the accepted phylogeny (Guo and Song, 2005; Kellogg, 2009). Single optimal phylogenetic trees obtained by maximum likelihood (-lnL = 186669.13), maximum parsimony (MP) (5178 steps; consistency index CI = 1.00) and Bayesian analysis shared the same topology, with maximum bootstrap (100%) and posterior probability (1.00) for all nodes (Figure 1).

Reference-guided assembly of rice MPS re-sequence data against a complete chloroplast genome sequence of the same cultivar (Tang et al., 2004) generated a consensus sequence of identical length (134 551 bp) to the reference (Table 1). The re-sequenced sample differed from the reference by one base. A guanine ‘G’ was present at position 8128 of the reference, while an adenine ‘A’, confirmed by Sanger sequencing, was detected at this position in the re-sequenced sample (32 times coverage; 100% were A). Interestingly, an ‘A’ was also found in this position in the other four grass species included in this study and in a previously published Oryza nivara chloroplast genome sequence (Genbank accession AP006728.1). This suggests an error in the reference sequence and demonstrates the potential accuracy of this technique.

The number of assembly gaps increased with evolutionary distance from the reference (Table 1), ranging from zero in the control (O. sativa japonica), to one in O. meridionalis (an 8-bp gap in the psaJ-rpl33 intergenic spacer), 19 in O. australiensis (179 bp in total), 61 in P. parviflora (479 bp in total) and 263 gaps in M. stipoides (8230 bp in total). The majority of assembly gaps (>91%) were in non-coding regions as illustrated in an mVISTA alignment plot (Figure 2). In O. australiensis, the only intragenic gap was 6 bp in length in rbcL. In P. parviflora, six intragenic gaps in total were found in four genes: matK, rpoC2, rpl33 and ndhH. In M. stipoides, a total of 32 intragenic gaps were located in 13 genes, with the majority (61%) in accD and rpoC2. Assembly gaps within coding regions did not disrupt reading frames, with the exception of the accD region in M. stipoides. A start codon was also lacking, suggesting that the chloroplast accD gene is non-functional in this species, a finding that has been observed in other grasses (Diekmann et al., 2009).

Of 91 SNPs identified between the closely related species O. meridionalis and O. sativa japonica, 73 were located in the large single copy region and 16 were in the small single copy region. Only two were located in the inverted repeats (IR) (one in each IR at positions 90 581 and 124 575). The majority of SNPs (73%) were in noncoding regions, and within coding regions only synonymous SNPs were found. Multiple SNPs were identified in the genes matK (2), rpoC2 (3), rbcL (3) and ndhD (2).

Figure 1 Chloroplast genome phylogeny of Oryzeae. Phylogenetic tree of aligned chloroplast sequences derived from reference assembly of short-read sequences to cultivated rice Oryza sativa japonica. The same topology was obtained using maximum parsimony (MP), maximum likelihood (ML) and Bayesian analysis. Nodal support is shown above branches as ML/MP (% bootstrap)/Bayesian (posterior probability). Scale is substitutions per site.

Table 1 Summary statistics for reference-guided assembly of short-read massively parallel sequencing data

Species                  Genome size   Reads        Reads aligning to        Median     SNPs relative   Assembly    Assembly gaps    Genbank
                         (Mbp)                      chloroplast genome (%)   coverage   to reference    gaps (bp)   (% intragenic)   accession
Oryza sativa japonica    389*          9 559 560    4.99                     108        1               0           0.00             GU592207
Oryza meridionalis       493†          11 654 060   3.82                     112        91              8           0.00             GU592208
Oryza australiensis      965‡          25 890 716   11.63                    713        807             179         3.47             GU592209
Potamophila parviflora   unknown       43 652 474   7.42                     751        2319            479         8.37             GU592210
Microlaena stipoides     869§          23 151 138   2.77                     174        4198            8230        4.71             GU592211

Assembly of 36-bp paired-end reads from five grass species to a complete chloroplast genome sequence (O. sativa japonica, cv. Nipponbare; Genbank accession AY522330.1). Genome size is according to *IRGSP (2005), †Uozo et al. (1997), ‡Ammiraju et al. (2006), §derived from 2C-value in Murray et al. (2005) using the conversion factor 965 Mbp = 1 pg. The genome size of P. parviflora is unknown.

Figure 2 Alignment of grass chloroplast genome sequences. mVISTA-based sequence conservation plots comparing aligned sequences to an annotated rice chloroplast genome. Position in the genome (bp) is shown on the x-axis and sequence conservation on the y-axis. Small arrows above the alignment indicate genes and their orientation. Purple regions correspond to coding sequence. Pink regions correspond to non-coding sequence and ribosomal genes. Thick black arrows show the position of the inverted repeats.

Discussion

The chloroplast genome sequences of five grass species attained by reference-guided assembly of short-read sequences to a reference sequence from cultivated rice allowed construction of trees which conform to the accepted phylogeny. Evolutionary distance from the reference ranged from intravarietal to intertribal, with divergence times from recent to around 40 million years ago (Kellogg, 2009). The number of assembly gaps increased with increasing genetic distance from the reference. Gaps were located primarily in non-coding regions (Figure 2), suggesting that indels or highly divergent sequence prevented assembly of MPS reads in these regions, rather than incomplete sequence data, as median coverage exceeded 100 times.

Indels, located primarily in non-coding regions, are the most abundant chloroplast mutations and are often associated with repetitive elements such as simple sequence repeats, cpSSRs (Ebert and Peakall, 2009). Recovery of indels and repetitive elements is problematic in the assembly of contiguous sequence from short reads, and is of particular concern for population level studies (Harismendy et al., 2009; Kidd et al., 2010). The impact of missing data has been explored in the context of phylogenomics and it has been found that trees based on the sequence of many genes, supermatrices, are surprisingly robust to high levels of missing data (Delsuc et al., 2005). In one case, the integrity of the tree was maintained with 90% missing data (Philippe et al., 2004).

Our data demonstrate that MPS platforms have the capacity to sequence the chloroplast genome at over 100 times coverage in a single lane without purification (Table 1). Despite representing a small fraction of total DNA sequence, 0.04% in rice, the concentration of chloroplast sequence reads is high relative to nuclear sequence in total DNA preparations. Chloroplast genome sequences of over 100 species of land plants have been deposited in the public domain (Cui et al., 2006), and these are available as scaffolds for reference assembly of MPS data. Initial assembly of highly conserved chloroplast regions (such as the IR) for unidentified samples and species that have not been previously sequenced will allow selection of the most appropriate (closely related) published chloroplast genome for reference assembly.

Chloroplast sequence data has the capacity to accurately identify different species, a characteristic which is exploited in DNA barcoding (Kress et al., 2009; Lahaye et al., 2008). Recently, markedly increased phylogenetic resolution among 32 gymnosperm species in comparison with that using a 2-locus barcode was demonstrated using nearly complete chloroplast genome sequences derived from MPS of chloroplast amplicons (Parks et al., 2009). The increased resolution was attributed to increased matrix length and numbers of informative sites, leading the authors to conclude that plastome sequences have potential as plant DNA barcodes. An advantage with the approach presented in this study is that universal primers and PCR amplification were not required. Techniques deployed to enrich for chloroplast sequences, PCR and purification of DNA preparations with cesium chloride gradients, for example, may be unnecessary in future to capture plastome sequence data.

The cost per lane of Illumina GA sequencing is now less than US$1500 and in this study gave more than 100-fold coverage of the chloroplast genome (Table 1). Recent improvements in MPS have led to an increase in the length and number of sequence reads. For example, the Illumina HiSeq 2000 promises a full order of magnitude increase in depth of genome sequence over the Illumina GA. When used in conjunction with multiplexed sequencing approaches, MPS of chloroplast genomes is rapidly becoming a simple and cost-effective alternative to bidirectional Sanger sequencing of PCR-amplified partial chloroplast genes.

Current limitations to the use of chloroplast genome sequences derived through MPS of total DNA for plant barcoding include recovery of indels that may be necessary to discriminate between recently diverged species and unavailability of a reasonably close reference for some highly divergent taxa or less-studied groups. These problems are likely to diminish as ‘short-read’ sequences increase in length and more chloroplast genome sequences become available.

Another issue for consideration is the applicability of this approach for species with large genomes. In this study, coverage of the chloroplast genome was not correlated with genome size (Table 1). Assembly of the chloroplast genome for highly polyploid sugarcane (Saccharum) species, with genome sizes exceeding 10 Gbp, has been tested using MPS of total DNA (unpublished data). Although direct comparison was not possible as longer read lengths (75 bp) were generated, median coverage of the chloroplast genome from a single lane of Illumina MPS data exceeded 100 times.

The simplicity and efficiency of obtaining chloroplast genomes from total DNA is advantageous for the barcoding of plants. It obviates reliance on universal primers and decreases the risk of misidentification owing to the amplification of pseudogenes (Huang et al., 2005; Song et al., 2008). Furthermore, it provides access to many variable sites to improve resolution even among closely related species. Chloroplasts are haploid and non-recombining, so they act as a single locus. The chloroplast genome therefore has the potential to become the elusive universal single-locus plant barcode for plant species identification.

This approach provides a general strategy that may have wide applications for plant barcoding, species delimitation and to guide gene discovery by defining related species for plant biotechnology. This protocol is probably a good first option for plant identification in any forensic application and should have widespread application in the routine enforcement of the intellectual property rights of plant breeders.

Experimental procedures

DNA was extracted from leaf tissue using a Qiagen DNeasy kit. Approximately 3 µg of total DNA was sheared, polished and prepared following the manufacturer’s instructions (Illumina sample preparation protocol for paired-end sequencing) with the following modifications. Briefly, DNA was sheared using the adaptive focused acoustics method on a Covaris S2 device with the following settings: duty cycle 10%; intensity 5; cycles per burst 200 for 180 s at 6 °C. Ligation products were purified by agarose gel electrophoresis (2% agarose, 120 V for 120 min). A narrow size range of predominantly 200-bp fragments was excised from the gel, and the products isolated with a QIAquick Gel Extraction kit without heating. PCR products were further purified with a QIAquick PCR Purification Kit and quantified using a DNA 1000 chip on an Agilent BioAnalyzer 2100. Approximately 4 pmol per individual and 3 pmol of the PhiX control lane were sequenced for 36 × 2 cycles on an Illumina Genome Analyser (GAII) following the manufacturer’s instructions. Base calling was performed with Illumina software Pipeline 1.4 (Illumina, San Diego, CA, US).

Paired-end sequence reads were trimmed of low-quality data with a quality score limit of 0.01 and adaptor sequence in CLC Genomics Workbench 3.6.5 (http://www.clcbio.com) and reads of less than 15 base pairs (bp) in length were discarded. Trimmed short-read sequences were assembled using reference-guided assembly (read mapping) against a published rice chloroplast genome sequence (Genbank accession AY522330.1). Assembly was undertaken with CLC Genomics Workbench with the following short-read parameters: ungapped alignment; limit = 8; mismatch cost = 2. Match mode was random to allow for assembly of both inverted repeat regions and repetitive elements, and the conflict resolution mode was vote majority.

Consensus sequences were exported to Geneious 4.7 (http://www.genious.com) and aligned using Mauve (Darling et al., 2004). The best fitting nucleotide substitution models were selected using Modeltest and MrModeltest (Posada and Crandall, 1998). Aligned data were analysed under MP and maximum likelihood (ML) criteria using the TVM+G model (G = 0.1158) in PAUP* (http://www.paup.csit.fsu.edu). Gaps were treated as missing data. Heuristic searches were conducted with 20 random addition replicates and TBR branch swapping. M. stipoides was the outgroup in rooted trees with 1000 bootstrap replicates to evaluate nodal support. Bayesian phylogenetic analysis was conducted using MrBayes 3.1 (Ronquist and Huelsenbeck, 2004) using the GTR+I model (I = 0.73). Two independent runs of 1 × 10^6 Monte Carlo Markov Chains (MCMC) were performed following burn in of 1 × 10^5 MCMC, each starting with a different random tree. Nodal support for Bayesian consensus trees was evaluated by posterior probability distribution.

Consensus sequences were annotated using DOGMA, Dual Organellar Genome Annotator (Wyman et al., 2004) and manually adjusted as needed before submission to Genbank. Aligned sequences and annotations for O. sativa japonica were used to construct sequence conservation plots in the program mVISTA (Frazer et al., 2004).
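The read-mapping step in the experimental procedures (ungapped alignment to a reference, with conflicts resolved by majority vote) can be illustrated with a minimal sketch. This is not the CLC Genomics Workbench algorithm: the function name `call_consensus`, the `(start, sequence)` input format and the use of `N` for uncovered positions are assumptions made for illustration, and read placement is taken as given rather than computed.

```python
from collections import Counter


def call_consensus(reference, placed_reads, gap_char="N"):
    """Majority-vote consensus from ungapped read placements.

    reference    -- reference sequence as a string
    placed_reads -- iterable of (start, sequence) pairs; start is the 0-based
                    reference position of the first base of the read
    Returns (consensus, snps), where snps is a list of
    (1-based position, reference base, consensus base) tuples.
    """
    # One base counter per reference position (the "pileup").
    columns = [Counter() for _ in reference]
    for start, seq in placed_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                columns[pos][base] += 1

    consensus = []
    snps = []
    for pos, counts in enumerate(columns):
        if not counts:
            # No read covers this position: report it as an assembly gap.
            consensus.append(gap_char)
            continue
        base, _ = counts.most_common(1)[0]
        consensus.append(base)
        if base != reference[pos]:
            snps.append((pos + 1, reference[pos], base))
    return "".join(consensus), snps


if __name__ == "__main__":
    ref = "ACGTACGT"
    reads = [(0, "ACGA"), (1, "CGAA"), (2, "GAAC")]
    print(call_consensus(ref, reads))
```

In this toy input the three reads agree on an `A` where the reference has a `T`, so the position is reported as a SNP, and the last two reference positions have no coverage and are emitted as gaps. This mirrors, in miniature, how a single well-supported difference from the reference and uncovered divergent regions would show up in a consensus.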
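As a rough cross-check on the coverage figures in Table 1, the expected mean depth over the chloroplast reference can be estimated from the read count, the percentage of reads aligning and the 36-bp read length. The sketch below assumes that the "Reads" column counts individual 36-bp reads and ignores trimming, so it is only an approximation; the function names are hypothetical. It also compares the share of reads that align to the chloroplast with the chloroplast's share of genome length.

```python
READ_LENGTH_BP = 36            # read length used in this study
CHLOROPLAST_REF_BP = 134_551   # length of the rice chloroplast reference

# (species, nuclear genome size in Mbp or None, reads, % reads aligning,
#  reported median coverage), transcribed from Table 1
TABLE_1 = [
    ("Oryza sativa japonica", 389, 9_559_560, 4.99, 108),
    ("Oryza meridionalis", 493, 11_654_060, 3.82, 112),
    ("Oryza australiensis", 965, 25_890_716, 11.63, 713),
    ("Potamophila parviflora", None, 43_652_474, 7.42, 751),
    ("Microlaena stipoides", 869, 23_151_138, 2.77, 174),
]


def expected_mean_depth(reads, percent_aligned,
                        read_length=READ_LENGTH_BP,
                        target_bp=CHLOROPLAST_REF_BP):
    """Aligned bases divided by target length (naive mean depth)."""
    return reads * (percent_aligned / 100.0) * read_length / target_bp


def read_enrichment(percent_aligned, genome_mbp,
                    target_bp=CHLOROPLAST_REF_BP):
    """Share of reads aligning divided by the chloroplast's share of genome length."""
    size_fraction = target_bp / (genome_mbp * 1_000_000)
    return (percent_aligned / 100.0) / size_fraction


if __name__ == "__main__":
    for species, genome_mbp, reads, pct, median in TABLE_1:
        depth = expected_mean_depth(reads, pct)
        line = f"{species}: naive mean depth {depth:.0f}x, reported median {median}x"
        if genome_mbp is not None:
            line += f", read enrichment {read_enrichment(pct, genome_mbp):.0f}-fold"
        print(line)
```

For the rice control this naive estimate is about 128 times, against a reported median of 108 times, and the aligned-read share is roughly 144 times the chloroplast's share of genome length. The gap between estimate and report is in the direction expected if trimming and unevenly covered regions reduce the median.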
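The alignment summary used for the phylogenetic analysis (constant, variable and parsimony-informative positions, with gaps treated as missing data) can be sketched as follows. This is an illustrative implementation, not the PAUP* code: `classify_sites` is a hypothetical name, a site is counted as parsimony informative when at least two character states each occur in at least two sequences, and the in-group restriction applied in the study is not modelled.

```python
from collections import Counter


def classify_sites(alignment, missing="-N?"):
    """Count constant, variable and parsimony-informative alignment columns.

    alignment -- dict mapping taxon name to an aligned sequence (equal lengths)
    missing   -- characters treated as missing data and ignored per column
    Variable sites include the parsimony-informative ones.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    counts = {"constant": 0, "variable": 0, "parsimony_informative": 0}
    for column in zip(*seqs):
        # Tally observed states, skipping gaps and ambiguous calls.
        states = Counter(b.upper() for b in column if b.upper() not in missing)
        if len(states) <= 1:
            counts["constant"] += 1
            continue
        counts["variable"] += 1
        # Informative: at least two states, each seen in at least two taxa.
        if sum(1 for n in states.values() if n >= 2) >= 2:
            counts["parsimony_informative"] += 1
    return counts


if __name__ == "__main__":
    toy = {
        "taxon_a": "ACGTA-",
        "taxon_b": "ACGTC-",
        "taxon_c": "ACTT--",
        "taxon_d": "ACTACN",
    }
    print(classify_sites(toy))
```

The same bookkeeping underlies the figures reported in the Results, where 109 665 constant and 4084 variable positions sum to the 113 749-bp alignment.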

Acknowledgements

The authors acknowledge the assistance of Sally Norton from the Australian Tropical Crops and Forages Collection for supply of seed samples. In addition, the authors acknowledge the technical assistance provided by Laura Homer, Shabana Kasem, Asuka Kawamata and Linda Hammond from the Centre for Plant Conservation Genetics, Southern Cross University.

References

Ammiraju, J.S.S., Luo, M., Goicoechea, J.L., Wang, W., Kudrna, D., Mueller, C., Talag, J., Kim, H., Sisneros, N.B., Blackman, B., Fang, E., Tomkins, J.B., Darshan, B., MacKill, D., McCouch, S., Kurata, N., Lambert, G., Galbraith, D.W., Arumuganathan, K., Rao, K., Walling, J.G., Gill, N., Yu, Y., SanMiguel, P., Soderfund, C., Jackson, S. and Wing, R.A. (2006) The Oryza bacterial artificial chromosome library resource: construction and analysis of 12 deep-coverage large-insert BAC libraries that represent the 10 genome types of the genus Oryza. Genome Res., 16, 140–147.
CBOL (2009) A DNA barcode for land plants. Proc. Nat. Acad. Sci., 106, 12794–12797.
Chase, M. and Fay, M. (2009) Barcoding of plants and fungi. Science, 325, 682–683.
Cronn, R., Liston, A., Parks, M., Gernandt, D., Shen, R. and Mockler, T. (2008) Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology. Nucleic Acids Res., 36, e122.
Cui, L., Veeraraghavan, N., Richter, A., Wall, K., Jansen, R., Leebens-Mack, J., Makalowska, L. and dePamphilis, C. (2006) ChloroplastDB: the chloroplast genome database. Nucleic Acids Res., 34, 692–696.
Daniell, H., Kumar, S. and Dufourmantel, N. (2005) Breakthrough in chloroplast genetic engineering of agronomically important crops. Trends Biotechnol., 23, 238–245.
Darling, A., Mau, B., Blattner, F. and Perna, N. (2004) Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res., 14, 1394–1403.
Delsuc, F., Brinkmann, H. and Hervé, P. (2005) Phylogenomics and the reconstruction of the tree of life. Nat. Rev. Gen., 6, 361–375.
Diekmann, K., Hodkinson, T.R., Wolfe, K.H., van den Bekerom, R., Dix, P.J. and Barth, S. (2009) Complete chloroplast genome sequence of a major allogamous forage species, perennial ryegrass (Lolium perenne L.). DNA Res., 16, 165–176.
Ebert, D. and Peakall, R. (2009) Chloroplast simple sequence repeats (cpSSRs): technical resources and recommendations for expanding cpSSR discovery and applications to a wide array of plant species. Mol. Ecol. Resour., 9, 673–690.
Frazer, K.A., Pachter, L., Poliakov, A., Rubin, E.M. and Dubchak, I. (2004) VISTA: computation tools for comparative genomics. Nucleic Acids Res., 32, Web Server issue: W273–W279.
Goff, S., Ricke, D., Lan, T.-H., Presting, G., Wang, R., Dunn, M., Glazebrook, J., Sessions, A., Oeller, P., Varma, H., Hadley, D., Hutchinson, D., Martin, C., Katagiri, F., Lange, B.M., Moughamer, T., Xia, Y., Budworth, P., Zhong, J.P., Miguel, T., Paszkowski, U., Zhang, S.P., Colbert, M., Sun, W.L., Chen, L.L., Cooper, B., Park, S., Wood, T.C., Mao, L., Quail, P., Wing, R., Dean, R., Yu, Y.S., Zharkikh, A., Shen, R., Sahasrabudhe, S., Thomas, A., Cannings, R., Gutin, A., Pruss, D., Reid, J., Tavtigian, S., Mitchell, J., Eldredge, G., Scholl, T., Miller, R.M., Bhatnagar, S., Adey, N., Rubano, T., Tusneem, N., Robinson, R., Feldhaus, J., Macalma, T., Oliphant, A. and Briggs, S. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science, 296, 92–100.
Guo, Y.-L. and Song, G. (2005) Molecular phylogeny of Oryzeae (Poaceae) based on DNA sequences from chloroplast, mitochondrial, and nuclear genomes. Am. J. Bot., 92, 1548–1558.
Harismendy, O., Ng, P.C., Strausberg, R.L., Wang, X., Stockwell, T.B., Beeson, K.Y., Schork, N.J., Murray, S.S., Topol, E.J., Levy, S. and Frazer, K.A. (2009) Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biol., 10: R32, doi:10.1186/gb-2009-10-3-r32.
Hebert, P.D.N., Cywinska, A., Ball, S.L. and deWaard, J.R. (2003) Biological identifications through DNA barcodes. Proc. R. Soc. Lond., B, Biol. Sci., 270, 313–321.
Heinze, B. (2007) A database of PCR primers for the chloroplast genomes of higher plants. Plant Methods, 3: 4, doi:10.1186/1746-4811-3-4.
Henry, R.J. (2008) Future prospects for plant genotyping. In Plant Genotyping II: SNP Technology. (Henry, R.J., ed), pp. 272–280, Wallingford: CABI.
Huang, C., Grünheit, N., Ahmadinejab, N., Timmis, J. and Martin, W. (2005) Mutational decay and age of chloroplast and mitochondrial genomes transferred recently to angiosperm nuclear chromosomes. Plant Physiol., 138, 1723–1733.
IRGSP (2005) The map-based sequence of the rice genome. Nature, 436, 793–800.
Kellogg, E. (2009) The evolutionary history of Ehrhartoideae, Oryzeae, and Oryza. Rice, 2, 1–14.
Kidd, J.M., Sampas, N., Antonacci, T., Graves, T., Fulton, R., Hayden, H.S., Alkan, C., Malig, M., Ventura, M., Giannuzzi, G., Kallicki, J., Anderson, P., Tsalenko, A., Yamada, N.A., Tsang, P. and Kaul, R. (2010) Characterization of missing human genome sequences and copy-number polymorphic insertions. Nat. Methods, 7, 365–372.
Kress, W. and Erickson, D. (2008) DNA barcodes: genes, genomics and bioinformatics. Proc. Nat. Acad. Sci., 105, 2761–2792.
Kress, W., Erickson, D., Jones, F., Swenson, N., Perez, R., Sanjur, O. and Bermingham, E. (2009) Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proc. Nat. Acad. Sci., 106, 18621–18626.
Lahaye, R., van der Bank, M., Bogarin, D., Warner, J., Pupulin, F., Gigot, G., Maurin, O., Dutholt, S., Barraclough, T. and Savolainen, V. (2008) DNA barcoding the floras of biodiversity hotspots. Proc. Nat. Acad. Sci., 105, 2923–2928.
Murray, B.G., de Lange, P.J. and Ferguson, A.R. (2005) Nuclear DNA variation, chromosome numbers and polyploidy in the endemic and indigenous grass flora of New Zealand. Ann. Bot., 96, 1293–1305.
Parks, M., Cronn, R. and Liston, A. (2009) Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes. BMC Biol., 7: 84, doi:10.1186/1741-7007-7-84.
Philippe, H., Snell, E., Bapteste, E., Lopez, P., Holland, P. and Casane, D. (2004) Phylogenomics of eukaryotes: impact of missing data on large alignments. Mol. Biol. Evol., 21, 1740–1752.
Posada, D. and Crandall, K.A. (1998) MODELTEST: testing the model of DNA substitution. Bioinformatics, 14, 817–818.
Ronquist, F. and Huelsenbeck, J.P. (2004) MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics, 19, 1572–1574.
Rubinoff, D., Cameron, S. and Will, K. (2006) Are plant DNA barcodes a search for the Holy Grail? Trends Ecol. Evol., 21, 1–2.
Song, H., Buhay, J., Whitting, M. and Crandall, K. (2008) Many species in one: DNA barcoding overestimates the number of species when nuclear mitochondrial pseudogenes are coamplified. Proc. Nat. Acad. Sci., 105, 13486–13491.
Tang, J., Xia, H., Cao, M., Zhang, X., Zeng, W., Hu, S., Tong, W., Wang, J., Wang, J., Yu, J., Yang, H. and Zhu, L. (2004) A comparison of rice chloroplast genomes. Plant Physiol., 135, 412–420.
Uozo, S., Ikehashi, H., Ohmido, N., Ohtsubo, H., Ohtsubo, E. and Fukui, K. (1997) Repetitive sequences: cause for variation in genome size and chromosome morphology in the genus Oryza. Plant Mol. Biol., 35, 791–799.
Wyman, S.K., Jansen, R.K. and Boore, J.L. (2004) Automatic annotation of organellar genomes with DOGMA. Bioinformatics, 20, 3252–3255.
