Eleanor A S Adamson BAppSc (Hons) Biogeoscience Faculty of Science and Technology Queensland University of Technology Brisbane, Australia

Influence of historical landscapes, drainage evolution and ecological traits on patterns of genetic diversity in Southeast Asian freshwater snakehead ...
Author: Hugh Harrell
2 downloads 0 Views 6MB Size
Influence of historical landscapes, drainage evolution and ecological traits on patterns of genetic diversity in Southeast Asian freshwater snakehead fishes Eleanor A S Adamson BAppSc (Hons)

Biogeoscience Faculty of Science and Technology Queensland University of Technology Brisbane, Australia

This dissertation is submitted in fulfilment of requirements for the degree of

Doctor of Philosophy 2010

KEYWORDS Admixture analysis, Air-breathing fish, Clades, Channidae, Chao Phraya River, Chronogram, Cytochrome b, Bayesian clustering, Drainage evolution, Ecology, Freshwater fish, Fossil calibration, Genetic diversity, Historical biogeography, Khorat Plateau, MedianJoining network, Mekong River, Microsatellite DNA, Molecular evolution, Phylogeny, Phylogeography, Population genetics, RAG1, Ribosomal RNA 16S, RP1, Siam River, Southeast Asia, Snakehead fishes, Sunda Shelf, S7 intron, Tonle Sap Great Lake.

i

ABSTRACT Snakehead fishes in the family Channidae are obligate freshwater fishes represented by two extant genera, the African Parachannna and the Asian Channa. These species prefer still or slow flowing water bodies, where they are top predators that exercise high levels of parental care, have the ability to breathe air, can tolerate poor water quality, and interestingly, can aestivate or traverse terrestrial habitat in response to seasonal changes in freshwater habitat availability. These attributes suggest that snakehead fishes may possess high dispersal potential, irrespective of the terrestrial barriers that would otherwise constrain the distribution of most freshwater fishes. A number of biogeographical hypotheses have been developed to account for the modern distributions of snakehead fishes across two continents, including ancient vicariance during Gondwanan break-up, or recent colonisation tracking the formation of suitable climatic conditions. Taxonomic uncertainty also surrounds some members of the Channa genus, as geographical distributions for some taxa across southern and Southeast (SE) Asia are very large, and in one case is highly disjunct. The current study adopted a molecular genetics approach to gain an understanding of the evolution of this group of fishes, and in particular how the phylogeography of two Asian species may have been influenced by contemporary versus historical levels of dispersal and vicariance. First, a molecular phylogeny was constructed based on multiple DNA loci and calibrated with fossil evidence to provide a dated chronology of divergence events among extant species, and also within species with widespread geographical distributions. The data provide strong evidence that trans-continental distribution of the Channidae arose as a result of dispersal out of Asia and into Africa in the mid–Eocene. Among Asian Channa, deep divergence among lineages indicates that the Oligocene-Miocene boundary was a time of significant species radiation, potentially associated with historical changes in climate and drainage geomorphology. Mid-Miocene divergence among lineages suggests that a taxonomic revision is warranted for two taxa. Deep intra-specific divergence (~8Mya) was also detected between C. striata lineages that occur sympatrically in the Mekong River Basin. The study then examined the phylogeography and population structure of two major taxa, Channa striata (the chevron snakehead) and the C. micropeltes (the giant snakehead),

iii

across SE Asia. Species specific microsatellite loci were developed and used in addition to a mitochondrial DNA marker (Cyt b) to screen neutral genetic variation within and among wild populations. C. striata individuals were sampled across SE Asia (n=988), with the major focus being the Mekong Basin, which is the largest drainage basin in the region. The distributions of two divergent lineages were identified and admixture analysis showed that where they cooccur they are interbreeding, indicating that after long periods of evolution in isolation, divergence has not resulted in reproductive isolation. One lineage is predominantly confined to upland areas of northern Lao PDR to the north of the Khorat Plateau, while the other, which is more closely related to individuals from southern India, has a widespread distribution across mainland SE Asian and Sumatra. The phylogeographical pattern recovered is associated with past river networks, and high diversity and divergence among all populations sampled reveal that contemporary dispersal is very low for this taxon, even where populations occur in contiguous freshwater habitats. C. micropeltes (n=280) were also sampled from across the Mekong River Basin, focusing on the lower basin where it constitutes an important wild fishery resource. In comparison with C. striata, allelic diversity and genetic divergence among populations were extremely low, suggesting very recent colonisation of the greater Mekong region. Populations were significantly structured into at least three discrete populations in the lower Mekong. Results of this study have implications for establishing effective conservation plans for managing both species, that represent economically important wild fishery resources for the region. For C. micropeltes, it is likely that a single fisheries stock in the Tonle Sap Great Lake is being exploited by multiple fisheries operations, and future management initiatives for this species in this region will need to account for this. For C. striata, conservation of natural levels of genetic variation will require management initiatives designed to promote population persistence at very localised spatial scales, as the high level of population structuring uncovered for this species indicates that significant unique diversity is present at this fine spatial scale.

iv

Table of Contents Keywords Abstract

i iii

List of tables

viii

List of figures

ix

List of plates

x

Statement of original authorship

xi

Acknowledgements Chapter 1. General Introduction DISTRIBUTION PATTERNS OF WILD ORGANISMS BIOGEOGRAPHY FRESHWATER DRAINAGE EVOLUTION IN SE ASIA PHYLOGEOGRAPHY SIGNIFICANCE FOR MANAGEMENT AND CONSERVATION SNAKEHEAD FISHES CURRENT STUDY

xii 1 3 7 9 11 14 16

Chapter 2. Systematic investigation of the Asian Snakeheads: Channa (Scopolli) 17 INTRODUCTION SYSTEMATICS OF ASIAN SNAKEHEADS, A MOLECULAR PERSPECTIVE HISTORY OF THE CHANNIDAE CONTEMPORARY CHANNID GEOGRAPHICAL DISTRIBUTION AIMS OF THIS CHAPTER METHODS SAMPLE COLLECTION DNA MARKER SELECTION DNA EXTRACTION, AMPLIFICATION, CLONING AND SEQUENCING OF PCR PRODUCTS ADDITIONAL DATA SEQUENCE ALIGNMENT AND GAP CODING GENETIC DIVERSITY AND GENE TREE RECONSTRUCTION SPECIES TREE INFERENCE AND ESTIMATION OF DIVERGENCE TIMES RESULTS CHANNA GENETIC VARIATION AND PHYLOGENETIC ANALYSES BASED ON MITOCHONDRIAL DNA CHANNA GENETIC VARIATION AND PHYLOGENETIC ANALYSIS BASED ON NUCLEAR DNA MULTI-LOCUS PHYLOGENY RECONSTRUCTION AND CHRONOGRAM ESTIMATION

19 20 22 23 24 25 25 28 29 30 31 32 34 36 36 41 45 v

DISCUSSION PATTERNS OF INTRA-SPECIFIC DIVERGENCE RELATIONSHIPS AMONG TAXA CHANNID DIVERGENCE TIMES CONCLUSION

49 49 52 53 55

Chapter 3. Patterns of genetic diversity and phylogeography of Channa striata in SE Asia 57 INTRODUCTION ECOLOGY UNDERSTANDING C. STRIATA DIVERSITY IN A REGIONAL CONTEXT AIMS OF THIS CHAPTER METHODS SAMPLE COLLECTION DNA MARKER SELECTION MOLECULAR TECHNIQUES - MITOCHONDRIAL DNA MOLECULAR TECHNIQUES - MICROSATELLITE DNA STATISTICAL ANALYSES RESULTS MTDNA DIVERSITY AND PHYLOGEOGRAPHY NUCLEAR DNA RESULTS DISCUSSION TWO C. STRIATA FORMS IN MAINLAND SE ASIA PHYLOGEOGRAPHY OF THE WIDESPREAD FORM FINE SCALE DIFFERENTIATION CONCLUSION

59 60 62 64 65 65 69 70 73 76 84 84 98 115 115 119 120 122

Chapter 4. Patterns of genetic diversity and phylogeography of Channa micropeltes in the Mekong River Basin 123 INTRODUCTION ECOLOGY UNDERSTANDING C. MICROPELTES DIVERSITY IN A REGIONAL CONTEXT AIMS OF THIS CHAPTER METHODS SAMPLE COLLECTION DNA MARKER SELECTION MOLECULAR TECHNIQUES – MITOCHONDRIAL DNA MOLECULAR TECHNIQUES – NUCLEAR DNA STATISTICAL ANALYSES

vi

125 126 128 130 131 131 133 133 134 135

RESULTS MTDNA DIVERSITY AND PHYLOGEOGRAPHY NUCLEAR DNA RESULTS DISCUSSION DIVERSITY AND PHYLOGEOGRAPHY STOCK STRUCTURE IN THE LOWER MEKONG BASIN DIFFERENCES IN CHANNA MEKONG PHYLOGEOGRAPHY CONCLUSION

Chapter 5. General Discussion HISTORICAL BIOGEOGRAPHY OF TROPICAL ASIAN FRESHWATER FISHES PHYLOGEOGRAPHY OF MEKONG FISHES MANAGING SNAKEHEADS IN THE MEKONG: A GENETIC PERSPECTIVE CONCLUSION

139 139 146 159 159 165 166 168

169 172 176 182 186

References

189

Appendices

219

APPENDIX 1. DNA EXTRACTION

221

APPENDIX 2. PCR CLEAN-UP AND SEQUENCING PROTOCOL

222

APPENDIX 3. CLONING PCR PRODUCT

224

APPENDIX 4. MULTILOCUS PHYLOGENIES

228

APPENDIX 5. PUBLISHED PAPER

229

APPENDIX 6. C. STRIATA TGGE

230

APPENDIX 7. MICROSATELLITE ISOLATION

234

APPENDIX 8. ADDITIONAL C. STRIATA MICROSATELLITE PRIMERS

235

APPENDIX 9. GELSCAN PROTOCOL

236

APPENDIX 10. C. STRIATA MICROSATELLITE FREQUENCIES

237

APPENDIX 11. C. MICROPELTES MICROSATELLITE FREQUENCIES

241

vii

List of Tables

TABLE 2.1. SAMPLING DETAILS

27

TABLE 2.2. PRIMERS USED TO AMPLIFY DNA REGIONS USED FOR CHANNA SPP. PHYLOGENETIC ANALYSIS 29 TABLE 2.3. PCR CONDITIONS USED TO AMPLIFY TARGET DNA FRAGMENTS

30

TABLE 2.4. DETAILS OF GENBANK™ SEQUENCES USED IN THE CHANNIDAE PHYLOGENETIC ANALYSIS

31

TABLE 2.5. DIVERGENCE TIMES ESTIMATED BY BEAST FROM A TWO GENE DATASET

47

TABLE 3.1. GEOGRAPHICAL CO-ORDINATES FOR C. STRIATA SAMPLE SITES

66

TABLE 3.2. C. STRIATA COLLECTION SITES AND SAMPLE SIZES

68

TABLE 3.3. MICROSATELLITE PRIMERS FOR C. STRIATA

74

TABLE 3.4. VARIABLE SITES FOR 70 HAPLOTYPES OF 570BPS OF CYT B FOR C. STRIATA

85

TABLE 3.5. C. STRIATA MITOCHONDRIAL CYT B HAPLOTYPE FREQUENCIES

86-87

TABLE 3.6. SUMMARY STATISTICS FOR C. STRIATA MTDNA

94

TABLE 3.7. POPULATION PAIR-WISE ST’S FOR C. STRIATA MTDNA DATA

95

TABLE 3.8. SUMMARY STATISTICS FOR C. STRIATA MICROSATELLITE LOCI AT ALL SITES

99-102

TABLE 3.9. RESULTS OF PAIR-WISE FSTS OF C. STRIATA MICROSATELLITE DATA

105

TABLE 3.10. RESULTS OF PAIR-WISE RSTS OF C. STRIATA MICROSATELLITE DATA

105

TABLE 4.1. SAMPLING DETAILS FOR C. MICROPELTES

131

TABLE 4.2. MICROSATELLITE PRIMERS FOR C. MICROPELTES

134

TABLE 4.3. VARIABLE SITES IN C. STRIATA MTDNA CYT B HAPLOTYPES

139

TABLE 4.4. MITOCHONDRIAL CYT B HAPLOTYPE FREQUENCIES FOR C. MICROPELTES

139

TABLE 4.5. SUMMARY STATISTICS FOR C. MICROPELTES MTDNA

141

TABLE 4.6. POPULATION PAIR-WISE ST ANALYSIS OF C. MICROPELTES MTDNA

143

TABLE 4.7. SUMMARY STATISTICS FOR C. MICROPELTES MICROSATELLITE DIVERSITY

149-150

TABLE 4.8. RESULTS OF PAIR-WISE FST ANALYSIS OF C. MICROPELTES MICROSATELLITE DATA

152

TABLE 4.9. RESULTS OF PAIR-WISE RST ANALYSIS OF C. MICROPELTES MICROSATELLITE DATA

152

viii

LIST OF FIGURES

FIGURE 1.1. MAP OF MAINLAND SE ASIA AND WESTERN INDONESIAN ARCHIPELAGO

5

FIGURE 1.2. MAINLAND SE ASIA

7

FIGURE 1.3. FRESHWATER DRAINAGES OF MAINLAND SE ASIA

8

FIGURE 2.1. NATURAL GEOGRAPHICAL DISTRIBUTION OF CHANNIDAE

23

FIGURE 2.2. SATURATION PLOTS FOR DNA FRAGMENTS

37

FIGURE 2.3. 16SRNA BAYESIAN CONSENSUS TREE

38

FIGURE 2.4. CYT B BAYESIAN CONSENSUS TREE

40

FIGURE 2.5. RAG1 BAYESIAN CONSENSUS TREE

42

FIGURE 2.6. BAYESIAN CONSENSUS TREE OF RP1 ALLELES

44

FIGURE 2.7. BAYESIAN CHANNA PHYLOGENY CONSENSUS TREE ESTIMATED FROM FOUR LOCI

46

FIGURE 2.8. BAYESIAN INFERENCE CHRONOGRAM

48

FIGURE 3.1. GLOBAL FISHERIES PRODUCTION FOR C. STRIATA 1950-2007

59

FIGURE 3.2. MAP OF SOUTHERN ASIA SHOWING BROAD SCALE SAMPLING SITES FOR C. STRIATA

67

FIGURE 3.3. MAP OF MAINLAND SE ASIA SHOWING FINE SCALE SAMPLING SITES FOR C. STRIATA

67

FIGURE 3.4. AGAROSE CHECK GEL SHOWING AMPLIFICATION OF CYT B MTDNA GENE FRAGMENT

70

FIGURE 3.5. BANDING PATTERNS OBSERVED FOR FOUR CYT B TGGE-OHA GELS

72

FIGURE 3.6. MICROSATELLITE GEL IMAGES

75

FIGURE 3.7. NEIGHBOUR JOINING TREE FOR C. STRIATA MTDNA CYT B HAPLOTYPES

88

FIGURE 3.8. MEDIAN-JOINING NETWORK OF C. STRIATA CYT B HAPLOTYPES

89

FIGURE 3.9. MAP AND MEDIAN-JOINING NETWORK SHOWING THREE CYTB MTDNA CLADES

91

FIGURE 3.10. MAP AND MEDIAN-JOINING NETWORK COLOURED BY SECTION OF DRAINAGE BASIN

91

FIGURE 3.11. MAP AND MEDIAN-JOINING NETWORK OF EA CLADE SHOWING STAR CLUSTER

93

FIGURE 3.12. ST PLOTTED AGAINST THE LOG OF GEOGRAPHIC DISTANCE FOR C. STRIATA

97

FIGURE 3.13. DISTRIBUTION OF MICROSATELLITE ALLELE FREQUENCIES FOR C. STRIATA

98

FIGURE 3.14. GRAPH OF MEASURES OF DIFFERENTIATION RANKED BY DEST

104

FIGURE 3.15. NJ TREE OF DEST MICROSATELLITE DIFFERENTIATION AMONG C. STRIATA SAMPLES

107

FIGURE 3.16. RESULTS OF FACTORIAL CORRESPONDENCE ANALYSIS

108

FIGURE 3.17. RESULTS OF BAYESIAN CLUSTER ANALYSIS

109

FIGURE 3.18. GRAPHS OF MEMBERSHIP COEFFICIENTS ESTIMATED FROM BAYESIAN CLUSTER ANALYSIS 110 FIGURE 3.19. ADMIXTURE LEVELS BETWEEN C. STRIATA FORMS ACROSS SE ASIA

112

FIGURE 3.20. ADMIXTURE ANALYSIS FOR C. STRIATA BASED ON BOTH MICROSATELLITES AND MTDNA

113 ix

FIGURE 3.21. THE DISTRIBUTION OF GENETIC DIVERSITY FOR C. STRIATA IN SE ASIA

114

FIGURE 4.1. NATURAL GEOGRAPHICAL RANGE OF C. MICROPELTES IN SE ASIA

128

FIGURE 4.2. MAP OF THE MEKONG RIVER BASIN SHOWING SAMPLE SITES FOR C. MICROPELTES

132

FIGURE 4.3. MICROSATELLITE GEL IMAGE OF LOCUS CS-3

135

FIGURE 4.4. MAP AND MEDIAN-JOINING NETWORK SHOWING C. MICROPELTES CYT B HAPLOTYPES

140

FIGURE 4.5. RESULTS OF THE POWER ANALYSIS FOR THE C. MICROPELTES CYT B DATA SET

142

FIGURE 4.6. RAW ST PLOTTED AGAINST LOG-DISTANCE FOR IN C. MICROPELTES MTDNA DATA

144

FIGURE 4.7. GRAPH OF CT VALUES OBTAINED IN SAMOVA ANALYSIS

144

FIGURE 4.8. ILLUSTRATION OF GEOGRAPHICAL EXTENT OF GROUPS DEFINED BY SAMOVA ANALYSIS

145

FIGURE 4.9. ALLELE FREQUENCIES OBSERVED AT THE THREE MICROSATELLITE LOCI

147

FIGURE 4.10. MEKONG BASIN MAP SHOWING C. MICROPELTES MICROSATELLITE ALLELE FREQUENCIES 148 FIGURE 4.11. AVERAGE ALLELIC RICHNESS FOR C. MICROPELTES MICROSATELLITE DATA

150

FIGURE 4.12. RESULTS OF THE POWER ANALYSIS FOR THE C. MICROPELTES MICROSATELLITE DATA SET

151

FIGURE 4.13. GRAPH OF PAIR-WISE MEASURES OF DIFFERENTIATION RANKED BY DEST

151

FIGURE 4.14. NJ TREE OF DEST MICROSATELLITE DIFFERENTIATION AMONG C. MICROPELTES SAMPLES

153

FIGURE 4.15. GRAPH OF FCT VALUES OBTAINED IN SAMOVA ANALYSIS

154

FIGURE 4.16. RAW FST PLOTTED AGAINST LOG-DISTANCE FOR C. MICROPELTES MICROSATELLITE DATA

154

FIGURE 4.17. RESULTS OF FACTORIAL CORRESPONDENCE ANALYSIS

156

FIGURE 4.18. RESULTS OF BAYESIAN CLUSTER ANALYSIS

157

FIGURE 4.19. GRAPHS OF CLUSTER MEMBERSHIP COEFFICIENTS

158

LIST OF PLATES

PLATE 2.1. C. GACHUA – THE DWARF SNAKEHEAD

25

PLATE 2.2. C. LUCIUS – THE SPLENDID SNAKEHEAD

26

PLATE 2.3. C. MICROPELTES – THE GIANT SNAKEHEAD

26

PLATE 2.4. C. STRIATA - THE STRIPED OR CHEVRON SNAKEHEAD

26

PLATE 2.5. C. SP “X” (UNIDENTIFIED)

27

PLATE 2.6. ADULT C. MARULIA GUARDING YOUNG

51

PLATE 4.1. PHENOTYPIC VARIATION IN CHANNA MICROPELTES ACROSS THE MEKONG RIVER BASIN

x

127

STATEMENT OF ORIGINAL AUTHORSHIP The work contained in this thesis has not been previously submitted to meet requirements for an award at this or any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.

Eleanor A S Adamson September 2010

xi

ACKNOWLEDGEMENTS Thank you to my supervisors, Professor Peter Mather and Dr David Hurwood. Thanks for your fantastic open door policy, faith, positivity, interest, tolerance, encouragement and patience. Thanks for introducing me to the Mekong and involving me in your research. Most of all, thanks for believing in me and offering me the chance to tackle such a great project, it’s been a wonderful experience. Thank you to all my international collaborators, who not only assisted with collection but also made me welcome in their home countries and revealed to me a side of SE Asia that I could never have experienced on my own. I will treasure those wonderful field trip memories forever. Special thanks to Chris Barlow, Boonsong, Nguyen Nguyen Du, Dr Nguyen Van Hao, Suchart Ingthamjitr, Estu Nugroho, Phuc Dinh Phan, Pom, Lieng Sopha, Naruepon Sukmasavin, Ubolratana Suntornata, Nao Thouk and the Mekong River Commission. Thank you to Vincent Chand, for technical assistance, laboratory support and friendship. Without your help I would never have isolated those microsats! Your expertise has been essential to my research, and I’ll be lost without you to nag! Thank you to the great bunch of NRS/Biogeoscience academics for fostering a great academic environment and sense of community; it has been a pleasure to be part of the ecology group. Special thanks to Andrew Baker, Susan Fuller, Tanya Scharashkin and Ian Williamson for ongoing support, friendship, advice and occasional wisdom. Thanks also to Dr David Strayer, whose insightful question at NABS2009 encouraged me to look more closely at the genetics of introduced populations. To my fellow postgrads, thanks for sharing my journey, for insightful discussions, silliness, help, and companionship over so many lunches, coffees (and beers) especially Bita Archangi, Terrence Dammannagoda, Litticia Bryant, Little Susan harveyii and Matt Krosch (Matt - sorry for the countless interruptions, my thesis is all the better from utilising your vast talent as a personal dictionary/thesaurus). Thank you to my Mum Peggy. Thanks for teaching me about the world, and fostering my interest and understanding of all things scientific. Thanks for your infinite patience in answering my multitude of childhood “how?” and “why?” questions. Thanks for providing me with a great education and the skill set to achieve my goals. Thanks for always xii

supporting me in the life choices I’ve made. And, thanks for all those cash injections when I’ve needed them, you’re the best! Lastly, to my partner Andrew: without your love and support this journey would have been a very different and much harder road to travel. Thanks for enduring the long hours I’ve spent away from home in the field, at conferences, in the lab, and in front of my computer, for tolerating my total absorption, procrastination, lack of income and general lack of sanity. During my PhD I received financial support from a QUT Postgraduate Research Award, with added funding from QUT’s Institute for Sustainable Resources. Additional laboratory costs were kindly absorbed by PM’s research funds. The draft thesis benefited greatly from comments by A Baker and S Fuller, and two excellent anonymous external reviewers.

xiii

Chapter 1 General Introduction

1

Introduction For a long time naturalists have recognised that landscape evolution has had a large influence on patterns of biodiversity and the distributions of different species and groups of species. Over the last 30 years, advances in genetic technologies and greater understanding of molecular evolution have allowed investigations into the relationship between geography and spatial patterns of genetic diversity. Such studies have provided new insights into the processes that have determined contemporary biogeographical patterns among groups of taxa. Furthermore, improvements in the theoretical understanding of population genetics and in analytical interpretation of genetic data have allowed investigations into the spatial distribution of genetic diversity at the intra-specific level (Avise 2009; Hickerson et al. 2010). This technology has been applied to a wide range of taxa, including plants, invertebrates and vertebrates in both aquatic and terrestrial environments (Beheregaray 2008). It has become evident that historical landscape evolution can have significant impacts on the way genetic diversity is distributed in natural populations at the intra-specific level. Patterns of genetic diversity, however, are not always easy to predict, especially when contemporary landscapes differ greatly from the configuration of past landscapes. It is now widely recognised that conservation plans must account for spatial patterns of genetic diversity if the long term goal is to ensure species and/or population persistence over evolutionary time. Thus, information on phylogenetic relationships at the interspecific and intra-specific level is relevant to addressing conservation goals as well as to provide insight into how landscape processes have shaped the evolution of regional biotas.

Distribution patterns of wild organisms Biogeography Biogeography considers the history of organisms in a geographical landscape (Morrone & Crisci 1995). The contemporary spatial distributions of organisms arise as a result of historical range expansions, extinctions, and speciation (Futuyma 1998). Underlying these events are the processes of dispersal (movement among sites, including colonisation of formerly unoccupied sites) and vicariance (division of an ancestral population resulting from cessation of dispersal among sites). Patterns of dispersal and vicariance are linked integrally to an organism’s ecology, and in particular to their intrinsic ability to disperse (vagility) (Kodandaramaiah 2009), especially where suitable habitat patches are subdivided across a natural geographical landscape. Historical changes in climate and geomorphology alter the spatial arrangement of suitable habitats, and hence drive changes in patterns of

3

Chapter 1. dispersal and vicariance (Wiens & Donoghue 2004; Woodruff 2010) . In fact, in a biogeographical sense, “ecology and history are indissolubly tied together” (Posadas et al. 2006, p398), as an organism’s ecology will inherently determine its response to historical processes that modify natural geographical landscapes. The relationship between ecology and history is clearly evident in the biogeography of contemporary tropical Asian biota. The flora and fauna of the Indian subcontinent has mixed origins, including representatives of relictual lineages of Gondwanan origin and more recent arrivals that evolved in the northern hemisphere (Karanth 2006). The ecology of some Gondwanan relictual Indian taxa has enabled them to disperse widely across southern and Southeast (SE) Asia in the time since continental plate collision with Eurasia. Examples of taxa that have dispersed “out-of-India” include a number of amphibians (Bossuyt & Milinkovitch 2001; Gower et al. 2002), lizards (Macey et al. 2000; Melville et al. 2009), and plants (Conti et al. 2002; Morley 2003). Similarly, after the India-Asia plate collision, taxa of Eurasian origin were able to disperse into India, for example some flowering plants (Bande & Prakash 1986) and gastropods (Kohler & Glaubrecht 2007). In contrast, other groups have not dispersed either into or out of the Indian subcontinent post-collision, for example burrowing frogs (Biju & Bossuyt 2003), and millipedes (Wesener & VandenSpiegel 2009). This is largely because, in spite of at least a 35Mya history of dry land connection with Eurasia (Ali & Aitchison 2008), the ecologies of these taxa have limited their ability to disperse and establish natural ranges that span Eurasia and the Indian subcontinent. The biogeography of SE Asian biota has been of particular interest since the mid 19th century, when Alfred Wallace first noted an unexpectedly large shift in species composition between the Indonesian islands of Bali and Lombok, signifying an abrupt change from Asian species to those with Australian affinities across only a 25km ocean gap (Wallace 1863). Wallace suggested that the modern distributions of species he observed reflected the geological history of the land masses, and indeed, all western islands once formed a continuous land mass, Sundaland (Figure 1.1), that was connected to mainland SE Asia until the Eocene (54-33Mya) (Hall 1998). These islands have continued to experience periodic dry land connections with mainland SE Asia and each other during past times of lowered sea level (Moss & Wilson 1998; Woodruff 2010), including most recently during the Pleistocene (Voris 2000; Woodruff & Turner 2009).

4

Introduction

South China Sea

Indian Ocean

0

500km

Sunda Shelf

Molengraaff Rivers

Wallace’s Line

Figure 1.1. Map of mainland SE Asia and western Indonesian archipelago showing current land mass (contemporary freshwater drainage lines (white) and the historical extent of Sundaland, including drowned Molengraaff rivers , and Wallace’s original biogeographical line (1863), figure modified from Voris (2000). Periods of dry land connection between Sundaic islands and the SE Asian mainland have allowed floral and faunal (biotic) exchange across the SE Asian region. Many taxa are now known to have dispersed among islands during the Miocene and Pliocene, including many groups of mammals, frogs and snakes (Harrison et al. 2006; Heaney 1984; Inger & Voris 2001; Ruedi & Fumagalli 1996). As the geographical distributions of more SE Asian taxa were examined, Wallace and later biogeographers revised Wallace’s (1863) original zoogeographic division, and various boundaries have been drawn based on the distribution of different faunal groups (Mayr 1944). These boundaries, in essence, demonstrate how different life history traits and dispersal capabilities among organisms produce different natural geographical ranges, even where extrinsic historical factors have been the same. For some SE Asian taxa, ecological traits have prevented widespread dispersal, even when marine regressions exposed dry land between islands on the Sunda Shelf. Dispersal of forest-dependent mammals, for example, may have been limited by the presence of

5

Chapter 1. large areas of open grasslands in the lowlands between closed forest habitats (Bird et al. 2005; Meijaard 2003), although the extent of grasslands and forest cover on the Sunda Shelf during the last glacial maximum remains controversial (Cannon et al. 2009; Woodruff 2010). In contrast, for members of other groups, such as some moths that are accomplished fliers, dispersal between island land masses has probably been ongoing, even during periods where marine transgressions have submerged land bridges on the Sunda Shelf (Beck et al. 2006). Within mainland SE Asia, ecological and geomorphological barriers are also known to have limited dispersal of some taxa, for example gibbons have been isolated either side of major rivers (Meijaard & Groves 2006; Thinh et al. 2010). These examples demonstrate that isolation of historical habitats may or may not limit dispersal, depending on the specific ecological traits of the taxa considered. Obligate freshwater fishes provide a unique model for biogeographical study because, by nature, their distributions are restricted by the physical boundaries of the environment they inhabit, i.e., freshwater drainages. Unlike terrestrial plant and animal taxa, with few exceptions, freshwater fishes are unable to disperse via winds, oceanic floating debris, birds, or other secondary agents (Briggs 2003b). For most freshwater fishes this ecological constraint restricts dispersal to historical periods of drainage connectivity, such as before geomorphological change involving tectonic or errosional processes shaped current drainage configurations, amalgamating or subdividing historical drainage networks, or when isolated drainages met in extended river basins on exposed continental shelves during periods of lowered sea levels. In southern and SE Asia, geomorphological and eustatic processes are known to have played a major role in reshaping freshwater networks throughout the Cenozoic, and in turn have influenced the distribution of freshwater taxa. SE Asia has among the highest freshwater diversity found anywhere in the world (Allan et al. 2005; Dudgeon 2000a), and different groups of taxa with varied ecologies are likely to have responded in different ways to changes in freshwater habitats associated with the evolution of contemporary drainage basins.

6

Introduction

Freshwater drainage evolution in SE Asia The precise history of freshwater drainage evolution across southern and SE Asia is so far relatively poorly documented, however a number of significant changes are known to have occurred recently, and in the early and late Tertiary. After the Indian-Asian plate collision, the up-thrust of the Himalayas at the northern margin of the Indian subcontinent led to the formation of large eastern flowing rivers (Brookfield 1998; Hall 1998). Major uplifts in the Himalayas occurred at the Eocene-Oligocene boundary ~38Mya, during the Oligocene ~30Mya, in the Late Miocene 8Mya, and in the mid Pliocene ~3Mya (DupontNivet et al. 2008; Pei et al. 2009; Rowley & Currie 2006; Zhisheng et al. 2001). This mountain building triggered large scale drainage captures (Brookfield 1998; Clark et al. 2004; Clift et al. 2006). Evolution of large eastern flowing rivers at this time may have promoted the eastward dispersal of Gondwanan freshwater taxa into SE Asia (Hall 1998). The Mekong River is now the longest and largest drainage in SE Asia (van Zalinge et al. 2003; Zakaria-Ismail 1994), and it is also the most zoogeographically diverse freshwater region in Asia (Allan et al. 2005; Kottelat 1985). The Mekong is thought to have assumed its current configuration as recently as the Pleistocene (Rainboth 1996a). From headwaters on the Tibetan Plateau, the Mekong flows over 4,350km to drain into the South China Sea. The majority of the River Basin (82%) is located on SE Asia’s Indo-Chinese peninsula, where the drainage encompasses Lao PDR, northeastern Thailand, Cambodia, and parts of Southern Vietnam (Figure 1.2).

Figure 1.2 Mainland SE Asia: (a) Mekong River Basin (yellow) and Chao Phraya Basin (orange), and (b) political boundaries of the Indo-Chinese Peninsula.

7

Chapter 1. During its formation, the Mekong drainage system probably incorporated many parts of other river networks that previously drained the Indo-Chinese peninsula, and in doing so accumulated many species that had previously evolved in isolation (Coates 2001; Rainboth 1996a). Although there is no comprehensive reconstruction of past drainage lines based on geological evidence, it is likely that the growing Mekong Basin captured the headwaters of the ancestral Siam/Yom River (now the Chao Phraya River that drains central Thailand) and also tributaries that drain the Khorat Plateau of northeastern Thailand (Brookfield 1998; Rainboth 1996b). Figure 1.3 shows one reconstruction of former drainage lines in comparison with the current course of the Mekong River.

Figure 1.3 Freshwater drainages of mainland SE Asia, (a) Reconstruction of historical drainage morphology modified from Rainboth (1996b, page 168), (b) Contemporary drainage lines, Mekong River bolded, and (c) four Mekong Freshwater ecoregions as defined by Abell et al. (2008), Tibetan headwater ecoregion (Upper Lancang) not shown. Mekong geomorphological changes have been ongoing. As recently as 0.6Mya, tectonic subsidence in the lower part of the drainage on the Lao-Cambodian border created an extensive series of rapids, the Khone Falls (Rainboth 1996a). Further subsidence of the Cambodian plate only 6Kya created the Tonle Sap Great Lake in central Cambodia, that is now the largest permanent freshwater body in SE Asia (Penny 2006; Rainboth 1996a). In the dry season the lake covers 2,520km2 and is mostly less than 3m deep, but in the wet season it expands to around 15,780km2, flooding savannah and deciduous forests as it receives floodwaters from the main Mekong channel (Kottelat 1985; Rainboth 1996a; Rainboth 1996b). Many freshwater taxa are shared among mainland SE Asian drainages and drainage sections (Ng & Rainboth 2001; Rainboth 1996b), and are believed to have assumed their contemporary distributions as historical drainage rearrangement enabled dispersal into 8

Introduction new drainage networks. In the Mekong Basin itself, five freshwater ecoregions have recently been defined based on broad ecological and evolutionary patterns in species distributions (Abell et al. 2008) (see Figure 1.3-c). In general, these regions correspond with sections of the Basin that are thought to have previously been isolated. In addition to large scale geomorphological change that has divided and united freshwater habitats in SE Asia, mainland and island freshwater habitats that are now isolated have also experienced extended periods of connectivity in drowned river basins on the Sunda Shelf (Molengraaff Rivers, see Figure 1.1) (Molengraaff 1921; Sathiamurthy & Voris 2006; Voris 2000) . Many groups of freshwater fishes have contemporary distributions across island and mainland drainages that reflect this widespread pattern of freshwater connectivity (Dodson et al. 1995; Kottelat 1985; Rainboth 1996b; Taki 1975; Yap 2002).

Phylogeography Ongoing processes of dispersal and vicariance, that over long periods of evolutionary time shape the historical biogeography of groups of taxa, also shape the distribution of lineages within taxa over shorter time scales. Information about patterns of variation of neutrally evolving genetic lineages within a single species can be used to trace the microevolutionary history of an organism, and can provide insights into how dispersal events, vicariance and demographic fluctuations have shaped the distribution of species in space and time (Avise 1998; Avise 2009; Avise et al. 1987). The magnitude and patterns of phylogenetic differentiation among individuals can be interpreted in relation to models of population genetics, for example Coalescent Theory (Donnelly & Tavaré 1995; Hudson 1990; Kingman 1982; Stephens & Donnelly 2000). Using Coalescent Theory, simulations of gene genealogies back in time based on current gene variation allow inferences to be made about the history of populations, including estimates of past population sizes, growth rates, and the time to most recent common ancestor among samples (Avise 1989; Kingman 2000; Kuhner 2009; Rand 1996). When geographical distributions of genetic variation are considered, further inferences can be made concerning the origin of historical population founders and about ongoing rates of migration among demes, providing insight into how past and contemporary patterns of gene flow have shaped natural geographical ranges and how they contribute to the persistence of extant populations. The study of genealogy in a geographical landscape is 9

Chapter 1. known as phylogeography. Avise at al. (1987) outlined five generalised classes of phylogeographical pattern that may be expected in contiguous and discontinuous populations under varying amounts of gene flow and historical barriers to dispersal. They range from total sub-division with complete reciprocal monophyly and large divergence among individual demes, to total homogeneity of closely related lineages across large undivided geographical ranges. Each pattern can be related to the likely evolutionary circumstance under which they arose, for example a number of deeply divergent lineages co-occurring across a large geographical range may indicate a history of recent secondary population admixture after a long period of evolution in isolated ranges, or alternatively may indicate the presence of previously unidentified sibling taxa that are reproductively isolated (Avise et al. 1987). Freshwater fish populations are often structured spatially in nature (Avise 1992; Gyllensten 1985; Ward et al. 1994). This is due to the nature of freshwater environments, that often possess many potential barriers to dispersal for aquatic taxa, for example drainage divides, waterfalls, rapids, lakes, and even dams (Crispo et al. 2006; Meffe & Vrijenhoek 1988; Monaghan et al. 2001; Yamamoto et al. 2004). When the history of regional drainage patterns has been stable, predictions can be made regarding the degree of phylogeographical structuring within and among drainages. Generally, highest phylogenetic divergence is expected among populations that inhabit independent, historically isolated drainage basins. Within drainage networks, gene exchange and genetic similarity is expected to be highest (and phylogenetic divergence lowest) when populations are geographically close and barriers to dispersal are absent or limited (Meffe & Vrijenhoek 1988). In addition, unidirectional stream flow can bias dispersal, leading to the accumulation of genetic diversity in downstream populations (Hernandez-Martich & Smith 1997). These phylogeographical predictions, based on a scenario of historically stable drainage geomorphology, are unlikely to be appropriate for describing phylogeographical patterns for taxa that have experienced recent large scale changes in freshwater habitat and habitat connectivity. In northeastern Australia and New Zealand, for example, where recent geomorphological change has re-shaped drainage basins, diversity and divergence among intra-specific freshwater fishes retains a signature of ancient freshwater connectivity (Burridge et al. 2007; Hurwood & Hughes 1998; Waters et al. 2001). In SE Asia, where historically recent drainage re-arrangement and eustasy have altered the probability

10

Introduction of freshwater connectivity among stream sections and drainage basins, some obligate freshwater taxa may also show patterns of phylogeographical structure that have evolved in ancestral river networks. Similarity among isolated rivers has already been noted for a few SE Asian freshwater fishes, although patterns vary with the taxa considered (Adamson et al. 2009; Hurwood et al. 2008; McConnell 2004; Rainboth 1996b). Adding further complexity to predictions of population structure, intrinsic ecological characteristics can play a significant role in determining the phylogeographical structuring of aquatic taxa in contiguous freshwater habitats (Bohonak 1999). Mode of dispersal (passive or active) (Avise 1992; Bohonak 1999), behavioural traits (such as territorial defence), life history traits (including discrete migration pathways, or breeding sites), and specific reproductive behaviours (Avise et al. 2002; Bohonak 1999; So et al. 2006a) can all influence phylogeography and population structure. In the Mekong River, discrete migration pathways are thought to maintain population structure in small carp (Hurwood et al. 2008), while breeding site affinity is thought to influence population structure in a large catfish species (So et al. 2006a; So et al. 2006b). The many freshwater taxa endemic to the recently evolved SE Asian river networks are likely to display a vast array of phylogeographical structures, influenced by regional history and unique ecological traits of target species.

Significance for management and conservation Information on the level of contemporary gene flow, population structuring and historical phylogeography for a species has important implications for conservation and management. Understanding the magnitude of gene flow among local populations and groups of populations helps determine the spatial limits of functionally independent population units, or ‘Management Units’ (MUs) (Palsbøll et al. 2007). MUs are the logical unit for population monitoring and demographic study, and their recognition is fundamental for short term management of sustainable populations (Moritz 1994). This is especially true for harvested populations, such as heavily fished species, because management for (maximum) sustainable harvesting must account for the level at which fisheries resources operate as independent self-recruiting units, including where multiple MUs are present in established fishery “stocks” (stock complexity), and conversely, where multiple fisheries exploit single MUs (Begg & Waldman 1999; Ovenden 1990; Stephenson 1999). 11

Chapter 1. At a higher level of hierarchical population structuring, patterns of phylogeographical structuring are important for defining Evolutionary Significant Units (ESUs) for long term conservation management (Moritz 1994) . ESUs may constitute one or more subpopulations (MUs), and each ESU represents a significant proportion of the total genetic diversity present within a species across its natural geographical range (McElhany et al. 2000). The idea behind ESUs is that, where a species is significantly structured into divergent sets of populations that experience substantial reproductive isolation, each individual unit contains distinct genetic variation including adaptive differences and as a consequence they represent unique evolutionary potentials (Moritz 1994; Palsbøll et al. 2007). Over the long term, one of the main goals for conservation of biological diversity is to maintain the evolutionary potential of species by conserving levels of genetic diversity and genetic variability within and among natural populations (Ryman 1991; Woodruff 2001). In SE Asia, where freshwaters have undergone significant large scale changes in recent history, phylogeographical patterns and the extent of ESUs for freshwater taxa are likely to be difficult to predict. Moreover, there is a high potential for cryptic diversity, as drainage amalgamation may have united divergent lineages and sibling species in single drainages. Effective management for conservation requires an understanding of how target organisms are structured genetically, involving information on the geographical scale at which independent populations, ESUs and species occur naturally. Genetic analysis of intra-specific fish populations can reveal the distribution of discrete, semi-independent, or panmictic populations (Carvalho & Hauser 1994; Ward 2000). Understanding the magnitude of intra-specific population differentiation and/or similarity and the regional processes that maintain it will be critical for managing freshwater fish stocks effectively in perturbed environments. It will also allow management decisions to incorporate knowledge about ESUs and MUs, and to assess the potential impacts of erecting barriers to migration, changing habitat or flow regimes, and to account for high levels of harvesting at different spatial scales (Carvalho & Hauser 1994). It is likely that due to historical geomorphological change and eustasy, SE Asian freshwater populations have undergone repeated episodes of vicariance and reconnection during recent times. This has probably led to many cases of secondary admixture, and in some extreme instances, the formation of sympatric sibling species (e.g., Henicorhynchus siamensis and H. lobatus distributions are sympatric across the greater Mekong 12

Introduction zoogeographic region) (Rainboth 1996a). The extent to which the distribution of individual taxa have been affected by changes in drainage structure is determined by specific life history traits (Rainboth 1996b). Rainboth (1996b) found that at the intra-specific level, species of the cyprinid genus Hypsibarbus with preferences for small upland streams and coarse substrates show a different history of drainage invasion compared with species with a preference for large river habitat. Synthesis of ecological data, phylogenetic reconstructions and geomorphological evidence can elucidate the phylogeography and population structure of other SE Asian freshwater fishes, providing insight into the regional processes that have shaped modern distributions of freshwater fish populations at the intra-specific level. Recent regional processes are likely to have had a huge impact on the distribution of intra-specific genetic diversity in freshwater fishes, and also on the scale at which local fish populations operate as independent demographic units. In SE Asia and in particular the greater Mekong region, fishes comprising a single taxon within a single drainage may represent divergent populations that have evolved independently in isolation for long periods of time and that have only recently recontacted. These populations may have developed life history traits specific to sub-drainages or river sections, including discrete migration pathways or discrete breeding grounds, and there may be little if any exchange between neighbouring stocks. Conversely but equally likely, taxa that evolved in past drainage configurations may have undergone historically recent expansions in range and population size when new drainage arrangements have allowed dispersal. Newly established stocks may now rely heavily on the inputs from migrants moving across the system to maintain local population sizes and levels of genetic diversity, or may have specific life history stages that tie individuals to specific localities or habitats for part of their life cycle. Thus, it is likely that the magnitude of recent geomorphological change and the frequency of drainage connection and vicariance in SE Asia have affected different fish taxa in a variety of ways and to different extents. Unlike historically stable biogeographical patterns and congruent patterns of intra-specific structuring among fishes in other world bioregions (e.g., Australia (Unmack 2001) and southern North America (Avise 1992; Bermingham & Avise 1986)), it is likely that SE Asian freshwater fishes could display a range of population structures not easily predicted simply from ancient geomorphology, or even patterns that are congruent among congeneric taxa.

13

Chapter 1.

Snakehead Fishes Asian snakehead fishes (Channa, Scopolli) are obligate freshwater fishes, with a natural distribution across southern and SE Asia including the Sunda Islands. They constitute a large proportion of the 2.5 million ton annual freshwater catch across the Mekong River Basin (Baran 2005) and command a high price on the Asian fish market (Lieng & van Zalinge unpublished; Wee 1982). There has been some taxonomic uncertainty surrounding members in the Channa genus (Courtenay & Williams 2004), as geographical distributions for some taxa across southern and SE Asia are vast, and in one case is very disjunct. Channa species possess a number of unusual ecological traits that make predictions of population structure and phylogeography from geomorphological data alone difficult. For instance, snakeheads have a number of adaptations that could facilitate overland dispersal, including modified areas of the respiratory system that allow individuals to breathe air for extended periods of time (Hughes & Munshi 1986), and the ability to propel themselves across terrestrial environments by flexing their bodies (“ipsilateral tail action”) (Sayer 2005). These characteristics may indicate the importance of overland dispersal for members of the genus, and suggest that natural geographical distributions may not be correlated totally with the evolution of watersheds in SE Asia as may be the case for most other obligate freshwater fish taxa. Channa are also known to tolerate poor water quality, can aestivate buried in mud during dry periods between flood events, and are traditionally known to be sedentary in nature (Wee 1982). Moreover, adults exercise high levels of parental care, “nest building” in mud and guarding young after hatching (Lee & Ng 1994; Wee 1982). These life history characteristics suggest that, despite the potential for high range dispersal, actual dispersal could be much more limited. In this case, large contemporary geographical distributions are unlikely to be the consequence of recent natural dispersal within and among freshwater habitats, but could instead result from ancient colonisation events or recent human mediated introductions. Different levels of contemporary and historical gene flow, associated with ancient range establishment, recent natural colonisation, human mediated range expansions, or in response to landscape changes, determine patterns of phylogeographical structuring (Avise 2000). For snakehead fishes, ecological information alone is insufficient to accurately predict the level of geographical population structuring that may be present among 14

Introduction populations, both among populations within newly formed river networks, and across populations now inhabiting isolated freshwater drainage basins. Knowledge of this nature is important, however, for devising effective conservation strategies in freshwater environments undergoing changes associated with human population (Carvalho & Hauser 1994; Ryman et al. 1995; Vrijenhoek 1998). In SE Asia, the most common and most economically important snakehead species are Channa striata (the chevron snakehead) and C. micropeltes (the giant snakehead). Both species are caught in traps, with hook and line, by draining ponds, and in rice-fish culture (Deap et al. 2003; Rothuis et al. 1998b), but large-scale fishing operations increasingly target the two species. In Cambodia, these species combined constitute 11% of the total freshwater catch (Baran 2005), however in targeted fishing operations, for example on the shores of the Cambodian Great Lake, they comprise majority of the catch (Campbell et al. 2006), and can representing as much as 43% of local catches (Rot unpublished). Although still relatively abundant in the Mekong Basin, Channa species have experienced recent population declines in other regions due to lack of management, e.g., Malaysia (Yusoff et al. 2006). Mekong Channa populations may also be vulnerable to decline, firstly as their reproductive cycle is linked closely with seasonal flood plain habitats that are increasingly exposed to modification (Lieng & van Zalinge unpublished), and secondly as populations are subject to unrestricted, high levels of harvesting of both juveniles and adults across the Mekong Basin. As Channa species represent an important cultural and economic resource (Lee & Ng 1994; Wee 1982), as well as comprising a significant proportion of available protein for many subsistence families (Coates 2002), the conservation of these fish in the wild and their maintenance as viable fishery resources is an important priority for the riparian countries of SE Asia. Ecological information currently available on Channa species cannot adequately predict the stock structure of species in the genus, but successful sustainable management will require knowledge of this nature. Phylogenetic and population genetic approaches that incorporate molecular and ecological data and regional history can elucidate contemporary stock structure for important snakehead species. This information will be relevant to the formulation of future management plans that can assist conservation of this economically important group of fishes.

15

Chapter 1.

Current study This thesis aimed to examine relationships among Asian snakeheads fishes in order to clarify levels of genetic divergence within and among currently recognised species, and to determine the phylogeographical structure of two economically important species across SE Asia. A molecular genetics approach was employed to date divergences and to assess levels of contemporary and historical gene flow. Patterns of molecular evolution were interpreted in a regional historical biogeographical context to provide insight into what factors are likely to have impacted on the macro- and micro-evolutionary history of Channa spp in SE Asia. Chapter 2 aimed to construct a fossil calibrated molecular phylogeny and used this information to propose a hypothesis for the evolutionary history of the Channidae, and to elucidate levels of molecular divergence among and within snakehead species in SE Asia. Chapter 3 aimed to determine the phylogeography and population structure of C. striata across SE Asia and examine the influence that historical landscape evolution and ecological traits have had on determining contemporary patterns of genetic diversity and divergence. Chapter 4 aimed to determine the population structure of C. micropeltes in the Lower Mekong Basin, where this species is an important component of the Cambodian freshwater fish catch. Chapter 5 aimed to place the historical biogeography and phylogeography of snakehead fishes in the broader context of other SE Asian endemic freshwater fish. Results were considered in relation to fisheries management for conservation and ongoing wild fishery production of snakeheads.

16

Chapter 2 Systematic investigation of the Asian snakeheads: Channa (Scopolli)

17

Phylogeny of the Channidae

INTRODUCTION The primary aim of systematic biology is to characterise the phylogenetic relationships among species and groups of species, and in doing so, to describe patterns of biological diversity in an historical and evolutionary context (Kullander 1999). A number of definitions have been proposed that address the concept of what actually constitutes a ‘species’ (Mallet 1995). For the purposes of this chapter, a species can be thought of as a functional unit in a community that incorporates genetic (and demographic) exchangeability and that is monophyletic (Kullander 1999). Species are one of the fundamental units in biology (de Queiroz 2007), and are important in understanding biodiversity and evolutionary processes. Therefore, understanding the patterns by which species are related in time and\or space allows a researcher to address questions about the history of lineages, and about contemporary distributions of species, groups of species and communities. Systematics includes, and is relevant to, taxonomy (the formal naming and description of taxa) but is broader in scope because it considers the relationships among taxa. The taxonomy of SE Asian freshwater biodiversity is in general, less than comprehensive. Across Asia currently, there are approximately 3,500 described species of freshwater fish, mostly tropical, but it is likely that many more await formal description and classification (Kullander 1999). In the Mekong Basin alone there are records for at least 1,200 - 1,500 fish species (Bao et al. 2001; Kottelat 2001; MRC 2003; Rainboth 1996a), but as groups of fishes undergo systematic revisions, routinely many new species are resolved (Rainboth 1996a). Unrecognised biodiversity is likely to be represented by species that exist in low numbers or that possess restricted ranges, small species that lack significant commercial value, and by morphologically cryptic species. Although in general, the occurrence of cryptic species in sympatry is likely to be rare for freshwater fish (Kotlík & Berrebi 2001), in SE Asia a recent history of large scale geomorphological upheaval may have promoted mixing of divergent lineages and sibling taxa via repeated isolation and reconnection of freshwater environments over evolutionary timescales (Kottelat 1985; Rainboth 1996a). A recent molecular study of SE Asian mudcarp (Henicorhynchus spp.) revealed deep divergence between species that are very similar morphologically, and also recognised highly divergent populations of a single Henicorhynchus “species” that co-exist in the same drainage (Hurwood et al. 2008). This outcome suggests that systematics at the genus and 19

Chapter 2. species levels for Henicorhynchus is not well understood. Recent molecular study of some Mekong cyprinids also recognised a lack of genetic differentiation in taxa that appeared morphologically distinct and are currently classified as different species (Cyclocheilichthys lagleri and C. repasson) (Hurwood , unpublished). These findings demonstrate that in the absence of comprehensive systematic knowledge, biodiversity is likely to go undetected and its significance for conservation management will be unrecognised.

Systematics of Asian snakeheads, a molecular perspective Asian snakeheads (Channa spp.) are of high economic significance and are common in markets across southern and SE Asia, and this may explain why they have received a moderate amount of systematic attention in comparison with many other Asian freshwater fish groups, (for examples see Abol-Munafi et al. 2007; Li et al. 2006; Vishwanath & Geetakumari 2009). Recent studies have recognised 26 (Li et al. 2006) or 27 (Ambak et al. 2006) species in the Channa genus, although this number may well change as the group is subject to ongoing taxonomic revision and reclassification (e.g., Alfred 1963; Musikasinthorn & Taki 2001; Tweedie 1950), and also as more complete ichthyological surveys lead to identification of new species with restricted ranges, (e.g., Musikasinthorn 1998; Zhang et al. 2002). To date, channid systematics has been focused primarily at the recognised species level or at higher taxonomic levels, ignoring the possibility for high levels of intra-specific differentiation. It has been suggested however, that at least five species in the genus may in fact represent “species complexes” (Courtenay & Williams 2004), namely for C. gachua, C. marulius, C. micropeltes, C. punctata, and C. striata. Two of these species (C. striata and C. micropeltes) are harvested from the wild in large numbers and also are now widely cultured across SE Asia. Given their economic importance, a better understanding of variation in these two species from a systematic standpoint needs to be developed. Molecular phylogenetics provides an approach for addressing systematic questions at the individual, species, and genus level. In fact, the main goal of phylogenetic analysis is to reconstruct the evolutionary history of a group of organisms (Hillis 1987). In a strict sense “phylogeny is the (absolute) history of species and populations” (Edwards 2009), although in practice, gene trees that are used to infer the history of DNA lineages are constructed as an approximation for species’ history. Regardless, molecular phylogenies can resolve important information about levels of divergence among taxa and patterns of species 20

Phylogeny of the Channidae divergence. This information is relevant to managers who seek to conserve wild populations, either for their intrinsic and/or for their commercial value (as is the case for many fish species or stocks). Furthermore, phylogenies can be calibrated with molecular rates of evolution, and fossil or geological data to provide a chronological estimation of divergence times for discrete lineages and hence the evolution of species. This opens the door to addressing a range of questions regarding modes and causes of evolution for the group in question. Answers to these questions have important implications for understanding contemporary species and populations, especially as anthropogenic impacts on climate, habitat availability, and population connectivity have the ability to rapidly and drastically alter natural environments. Asian snakeheads provide an interesting target for phylogenetic analysis for a number of reasons. First, there is an absence of clear systematic information at the species level that has led to speculation about presence of cryptic taxa. This can in part be addressed by phylogenetic reconstruction. Second, Channa spp. have unusual ecological traits (airbreathing / terrestrial locomotion) that imply that speciation in this group of fishes may not follow similar patterns as those observed for other freshwater fishes that have experienced long periods of isolation in discrete freshwater drainages. Third, members of the genus are commercially very important, and so good systematic knowledge concerning levels of variation within and among taxa has the potential to inform fishery managers and commercial producers. Finally, good fossil evidence is now available that can be used to calibrate a channid molecular phylogeny (Murray 2006; Murray & Thewissen 2008; Roe 1991), allowing biogeographical hypotheses proposed in past studies to be examined explicitly against new evidence. Previous studies have timed divergence of Asian snakeheads from their African counterparts from in the early Cretaceous (over 100Mya estimated from mitochondrial DNA divergence) (Li et al. 2006) to as recently as the late Miocene – early Pliocene (8-4Mya estimated from (incomplete) fossil records) (Bohme 2004).

21

Chapter 2.

History of the Channidae Fishes in the Family Channidae are classified into two extant genera, the African Parachannna and Asian Channa. They represent the only family in the sub-order Channoidei, one of 18 sub-orders that together make up the Order Perciformes, the most diversified vertebrate order (Nelson 1994). The oldest known Perciform fossil dates back to the late Cretaceous, with the fossil record providing good evidence that diversification in this Order occurred from the Paleocene (65Mya) (Patterson 1993). Most extant Perciform families are considered to be more recent, having evolved during the Eocene – Miocene (55 – 5Mya), although with ongoing and substantial species radiations also occurring over the last 5 million years (Arratia et al. 2004). The first fossil record of a channid appears in the early Eocene Kuldana Formation in Pakistan, where specimens described as belonging to an ancestral (now extinct) genus ‘Eochanna’ provide evidence for an early divergence from other Perciform families. This suggests that “ the Channiformes had already emerged as a distinct phylogenetic entity” some 50Mya (Roe 1991). This date was used recently to calibrate a higher order teleost phylogeny (Santini et al. 2009), but is also applicable to lower order chronogram estimations as it provides a lower boundary for the divergence of channids from other Perciformes. By the mid Eocene (40Mya) fossils that can be identified positively as Channa appear in deposits in Kashmir (Asia), in addition to another extinct channid, an Anchichanna (Murray & Thewissen 2009), suggesting that the Eocene was a period of diversification for the family. Around this period, faunal similarity has been noted between North African and Pakistani freshwater fish species assemblages, indicating likely Eocene exchange of freshwater taxa between these two regions. By the late Eocene (35Mya), the first Parachanna specimens appear in the Egyptian fossil record (Murray 2006), although this genus is likely to have diverged from the Channa lineage some time before this. Presence of Channa and Parachanna fossils provide clear evidence that the two genera had diverged from ancestral channids by at least 40Mya, providing a second fossil calibration point for reconstructing the evolutionary history of the family.

22

Phylogeny of the Channidae

Contemporary channid geographical distribution The African genus Parachannna is represented by three extant species with species ranges restricted to western Africa. Members of the Asian genus Channa have a wide natural distribution, from Iran in the west, across the Indian subcontinent including Sri Lanka, to SE Asia including; the Indonesian archipelago west of Huxley’s Line, and the Far East including China and Siberia (Wee 1982) (Figure 2.1).

Channa

Parachanna

Figure 2.1 Natural geographical distribution of Channidae genera Parachanna (Africa) and Channa (Asia). It is not clear exactly when Asian Channa expanded east to assume their modern SE Asian distribution. Their presence in western Europe and central Asia during the Miocene is well documented, but no fossils have been found in east Asia before the Pliocene (>5Mya), leading to the suggestion that Channa may only have populated eastern Asia after the onset of the SE Asian monsoon (Bohme 2004). In comparison with other primary freshwater fish taxa, a number of anomalies can be observed in the ranges of individual Channa species that may indicate a relatively recent large scale shift in species distributions. For example, unlike most obligate freshwater taxa, some members of the genus Channa possess extensive natural ranges that span thousands of kilometers across hundreds of river drainages. This suggests that they have an extraordinary capacity for natural dispersal. In contrast, a number of Channa species have very limited geographical ranges that include one or perhaps only a few river basins, suggesting that dispersal capacity for these species is quite poor. Investigating the phylogenetic relationships among Channa species can help to develop a better understanding of how these patterns may have evolved. 23

Chapter 2.

Aims of this chapter Specifically, the aim of this chapter was to construct gene trees from genetic sequence data obtained from a number of different gene loci that provide independent molecular estimates of the evolutionary relationships among Channa species. Ultimately, these molecular data were used to construct a phylogeny for Channa spp. calibrated in time to address the following specific questions:

1. LEVELS OF DIVERGENCE AMONG SAMPLED POPULATIONS 

What are the levels of divergence observed between different recognised species in the Asian genus Channa, and are the levels consistent with current taxonomic classifications for recognised species?



What are the levels of intra-specific variation within species currently recognised in the genus Channa?



Are currently recognised species monophyletic?

2. TEMPORAL SCALE OF SPECIATION EVENTS 

Over what time scales have Channa lineages diverged to form discrete species?



Can specific divergence times for species be related to known historical climatic or geomorphological changes?



What is the level of concordance between the timing of events estimated here and earlier biogeographical hypotheses for Channa spp?

24

Phylogeny of the Channidae

METHODS Sample collection Sampling of snakehead fish for phylogenetic reconstruction in the current study aimed to collect a range of species from the genus Channa, and where possible to obtain replicates of individual species across their natural distributions in SE Asia, in order to assess divergence both within and among species in the genus. Samples used for phylogenetic analyses were collected across the Mekong Basin, primarily from local markets. Where possible at the time of sampling, fish were positively identified to species level by local government fisheries scientists. At the point of collection, fin tissue was abscised from the caudal, pectoral, or dorsal fin and samples sealed individually in vials of 75% ethanol. Additional Channa samples (fin or muscle tissue) were supplied by collaborators in Vietnam, Malaysia, Indonesia and India. Plate 2.1-2.5 show five Channa species collected for the current study and sampling localities; see Table 2.1 for further information on collection. Two outgroup species were chosen to represent divergence at higher systematic levels. A member of the Eleotridae family (Mogurnda adspersa) was chosen to represent divergence within the Order Perciformes, and at a higher level a member of the Atheriniformes (Melanotaenia splendida ) was selected to represent divergence across the Percomorpha. M. splendida belongs to the Smegmamorpha, that is currently considered to be relatively closely related to the Channidae (Chen et al. 2003). Outgroup tissue samples were provided by D. Hurwood.

Plate 2.1. C. gachua – The dwarf snakehead,

= sampling locations.

25

Chapter 2.

Plate 2.2. C. lucius – The splendid snakehead,

Plate 2.3. C. micropeltes – The giant snakehead,

= sampling locations.

= sampling locations.

Plate 2.4. C. striata - The striped or chevron snakehead,

26

= sampling locations.

Phylogeny of the Channidae

Plate 2.5. C. sp “x” (unidentified – but similar to C. marulia - The bullseye snakehead), = sampling locations.

Table 2.1. Sampling details, see Table 3.1 for geographical co-ordinates of sample locations. Species C. gachua (Hamilton 1822)

Location and date of collection

Collected by

Plate

Nam Song R., Mekong Basin. Lao PDR. Aug 2006.

E. Adamson

1.

C. gachua

Songkhram R., Mekong Basin. Thailand. Nov 2005.

E. Adamson

1.

C. gachua

Mun R./Mekong R.confluence, Mekong Basin. Thailand. Nov 2005.

E. Adamson

1.

C. gachua

Buon Ma Thuot, Daklak Prov. Mekong Basin, Vietnam. Feb 2007.

C. lucius C. lucius (Cuvier 1831) C. micropeltes (Cuvier 1831) C. micropeltes C. diplogramma (Day 1865) C. striata (Bloch 1793)

Nguyen Trong Phuc 1.

Kratie, Mekong Basin. Cambodia. Apr 2007.

E. Adamson

2.

Riau, Sumatra. Indonesia. 2007.

Dr Estu Nugroho

2.

Mun R., Mekong Basin. Thailand. Nov 2005.

E. Adamson

3.

Tien Bien District, Mekong Delta. Vietnam. Feb 2007.

E. Adamson

3.

Meenachil R., Kaduthuruthy, Kerala. India. 2008.

-

-

Kanakkankadavu, Chalakkudy R. Basin, Kerala, India. 2008.

-

4.

C. striata

Sayaburi, Mekong Basin. Lao PDR. Aug 2006.

E. Adamson

4.

C. striata

Sayaburi, Mekong Basin. Lao PDR. Aug 2006.

E. Adamson

4.

C. striata

Nam Song R., Mekong Basin. Lao PDR. Aug 2006.

E. Adamson

4.

C. striata

Vientiane, Mekong River. Lao PDR. Aug 2006.

E. Adamson

4.

C. striata

Songkhram R., Mekong Basin. Thailand. Nov 2005.

E. Adamson

4.

C. striata

Chi R., Mekong Basin. Thailand. Nov 2005.

E. Adamson

4.

C. striata

Sekong R., Mekong Basin. Cambodia. Apr 2007.

E. Adamson

4.

C. striata

Gai Lai Province, Kontum, Mekong Basin. Vietnam. Feb 2007.

Phuc Dinh Phan

4.

C. striata

Tonle Sap Great Lake, Mekong Basin. Cambodia. Apr 2007

E. Adamson

4.

C. striata

Vinh Thuan Province, Mekong Delta. Vietnam. Feb 2007.

E. Adamson

4.

C. striata

Saraburi Province, Chao Phraya R. Basin. Thailand. 2007.

C. striata

Tanjung Karang. Malaysia. 2007.

C. striata

Lampung, Sumatra. 2007.

C. cf. maculata (Lacepède 1801)

Hanoi, Red R. Basin. Vietnam. 2007.

N. Sukmasavin

4.

Dr S Bhassu

4.

Dr E Nugroho

4.

Thuy Nguyen Gia

-

E. Adamson

5.

(sensu GENBANK)

C. sp “x”

Sekong R., Mekong Basin. Cambodia. April 2007.

C. sp “x”

Sekong R., Mekong Basin. Cambodia. April 2007.

Mogurnda adspersa

E. Adamson

4.

Barron R., QLD. Australia

Dr D. Hurwood

-

Johnstone R., QLD. Australia

Dr D. Hurwood

-

(Castelnau, 1878)

Melanotaenia splendida (Peters, 1866)

27

Chapter 2.

DNA Marker Selection Four DNA fragments were targeted for phylogenetic analysis in order to provide a range of DNA markers representing faster and slower evolving regions of the genome. The regions were chosen with the aim of generating a robust mulitlocus data set that could resolve divergence at both recent and ancient evolutionary time scales. Two regions of the maternally inherited mitochondrial (mtDNA) genome were amplified; corresponding with partial fragments of the 16S ribosomal RNA gene and the Cytochrome b gene, henceforth referred to as “16S”and “Cyt b”. The 16S gene represents a slowly evolving conserved region of mtDNA that typically exhibits levels of variation useful for answering phylogenetic questions among distantly related species (Meyer 1994). This region is not translated into protein, instead the structures of transcribed RNA product forms a functional unit involved in protein synthesis. This characteristic means that 16S is not a “coding region” organised into triplet codons corresponding with amino acids, and therefore mutations in this region can include insertion / deletion events (indels) that can produce changes to sequence lengths, as well as substitution mutational events. In contrast, Cyt b is a moderately fast evolving region of mtDNA that commonly shows intraspecific diversity, but this fragment also has utility for resolving higher level relationships (Farias et al. 2001; Johns & Avise 1998; Zardoya & Meyer 1996). The gene codes for a membrane protein involved in the respiration chain and energy transport, and as a coding region evolution is constrained by reading frame, resulting in conserved sequence lengths. Both 16S and Cyt b have been used commonly to infer phylogenies within and among genera, for example, 16S: (Li et al. 2008; Smith & Wheeler 2004; Smith & Wheeler 2006) Cyt b: (Farias et al. 2001; Planes et al. 2001; Slechtova et al. 2006; Zaki et al. 2008) Two regions of the nuclear genome (nDNA) were selected to provide independent (unlinked) co-inherited markers representing divergence in the nuclear genome. The first nDNA region chosen was a partial fragment of the single-copy Recombination Activation Gene-1 (RAG1). RAG1 is a highly conserved gene, with a substitution rate up to 12 times slower than mtDNA Cyt b (Quenouille et al. 2004). This region of the nuclear genome is popular for reconstructing teleost phylogenies (for example Holcroft 2004; Lopez et al. 2004; Rícan et al. 2008; Rüber et al. 2004). The second nuclear region selected was a faster evolving, non-coding region; the first intron (RP1) within the S7 Ribosomal Protein Gene. This region is also a popular nDNA marker used to resolve teleost phylogenetic relationships, (for examples see Lavoue et al. 28

Phylogeny of the Channidae 2003; Near & Cheng 2008), and as a non-coding single copy region without constraint on mutations in length or substitution, this fragment typically exhibits much higher levels of variation than the conserved coding RAG1 locus.

DNA extraction, amplification, cloning and sequencing of PCR products DNA was extracted from fin tissue following a standard salt extraction protocol (Miller et al. 1988). See Appendix 1 for further details on extraction method. Four DNA fragments were amplified, representing three partial gene regions and a single intron. Full details of primer sequences used are listed in Table 2.2. PCR conditions for each fragment are detailed in Table 2.3. Each PCR procedure included a negative control (no DNA template). After successful PCR amplification of target fragments, amplified products were purified before the template was sequenced in both directions on a capillary sequencer. See Appendix 2 for full details. Sequence data were edited manually to confirm all base pair assignments from chromatographs using BIOEDIT software (Hall 1999). After sequencing, some individuals were found to be heterozygotes at the RP1 locus. Of these individuals, a number had alleles that differed at multiple point substitutions or differed in allele length, making it impossible to distinguish individual alleles from the heterozygote sequence read. In these cases, an attempt was made to isolate individual alleles by cloning before re-sequencing cloned PCR product to obtain single allele sequence reads. For details of cloning protocols see Appendix 3.

Table 2.2. Primers used to amplify DNA regions used for Channa spp. phylogenetic analysis. Primer Sequence

DNA region

Reference

16sbr-L: 5’-CGC CTG TTT ATC AAA AAC AT-3’ 16sbr-H: 5’- CCG GTC TGA ACT CAGA TCA CGT-3’

mtDNA 16S

Palumbi et al. (1991)

GLUDG-L: 5’-TGA CTT GAA RAA CCA YCG TTG-3’ CB3-H: 5’-GGC AAA GAG AAA RTA TCA TTC-3’

mtDNA Glut & Cyt b

Palumbi et al. (1991)

RAG1F1: 5’-CTG AGC TGC AGT CAG TAC CAT AAG ATG T-3’ RAG1R1: 5’-CTG AGT CCT TGT GAG CTT CCA TRA AYT T-3’

nDNA RAG1

Lopez et al. (2004)

S7RPEX1F:5’-TGG CCT CTT CCT TGG CCG TC-3’ S7RPEX2R: 5’-AAC TCG TCT GGC TTT TCG CC-3’

nDNA RP1

Chow & Hazama (1998)

29

Chapter 2. Table 2.3. PCR conditions used to amplify target DNA fragments. DNA

Reaction Mix (25µL volume)

Cycling conditions

Product

region 16S

Glut & Cyt b

size 50-200ng genomic DNA 0.2µM of each primer 1µL 10mM dntps (Roche™) 2.5µL 10X PCR Reaction Buffer (Roche™) 1µL 25mM MgCl2 (Fisher™) 0.5Units of Taq DNA Polymerase (Roche™) 50-200ng genomic DNA 0.2µM of each primer 1µL 10mM dntps (Roche™) 2.5µL 10X PCR Reaction Buffer (Roche™) 1µL 25mM MgCl2 (Fisher™) 0.5Units of Taq DNA Polymerase (Roche™)

RAG1

50-200ng genomic DNA 0.2µM of each primer 1µL 10mM dntps (Roche™) 2.5µL 10X PCR Reaction Buffer (Roche™) 1µL 25mM MgCl2 (Fisher™) 0.5Units of Taq DNA Polymerase (Roche™)

RP1

50-200ng genomic DNA 0.2µM of each primer 1µL 10mM dntps (Roche™) 2.5µL 10X PCR Reaction Buffer (Roche™) 1µL 25mM MgCl2 (Fisher™) 0.5Units of Taq DNA Polymerase (Roche™)

o

94 C - 2min o 35 X 94 C - 15s o 50 C - 15s o 72 C - 30s o 72 C - 1mi o 15 C - hold o 94 C - 2min o 35 X 94 C - 15s o 50 C - 15s o 72 C - 30s o 72 C - 1min o 15 C - hold Annealing temp varied for individual taxon o 94 C - 2min o 40 X 94 C - 30s o 50-57 C-30s o 72 C - 40s o 72 C - 5mins o 15 C - hold Following Chow & Hazama (1998) o 95 C - 1mins o 35 X 95 C - 30s o 60 C - 1min o 72 C - 2min o 72 C - 10min o 15 C - hold

569-576 bps

832-834 bps (Cyt b: 809 bps)

1484 bps

626-808 bps

Additional data As the number of species obtained in the field was small, and in two cases unidentified at the species level, additional sequence information for four species was acquired from GENBANK™ (Benson et al. 1999). These samples were included initially to help classify

unidentified samples collected in the study, secondly to provide nodes that could be used as calibration points in chronogram estimation, thirdly to help disperse homoplasy across the tree (Heath et al. 2008) and to overcome possible long branch attraction during phylogenetic tree estimation (Graybeal 1998), and finally, to provide a broader sample of

30

Phylogeny of the Channidae taxa to increase the accuracy of phylogenetic reconstruction, especially as the number of characters (base pairs) used in the analysis was substantial (Hillis et al. 2003). Details of additional sequences added to the dataset are presented in Table 2.4. While GENBANK™ sequences covering the four regions used for phylogenetic estimation were not available for any of the four additional taxa chosen, data that were available was still included, as missing data in Bayesian phylogenetic reconstruction is reportedly not a problem when the overall number of characters assessed is large (Wiens & Moen 2008) although see Lemmon et al.,(2009). Table 2.4. Details of GENBANK™ sequences used in the Channidae phylogenetic analysis. Species

DNA

GENBANK ™

region

Accession

Reference

Number Parachanna obscura

16S

AY763726

Rüber et al. (2006)

(Günther, 1861) Parachanna obscura

RAG1

AY763788

Rüber et al. (2006)

Parachanna obscura

Cyt b

AY763772

Rüber et al. (2006)

Channa bleheri

16S

AY763724

Rüber et al. (2006)

Channa bleheri

RAG1

AY763786

Rüber et al. (2006)

Channa bleheri

Cyt b

AY763770

Rüber et al. (2006)

Channa marulia

16S

AY763725

Rüber et al. (2006)

Channa marulia

RAG1

AY763787

Rüber et al. (2006)

Channa marulia

Cyt b

AY763771

Rüber et al. (2006)

Channa maculata

Cyt b

AF479271

Direct submission Bai et al. (2002)

(Hamilton 1822)

(Hamilton 1822)

Sequence alignment and gap coding Coding regions (Cyt b and RAG1) were homologous in length and could be easily aligned by eye. The RNA gene fragment (16S) and the non-coding RP1 intron varied in length among taxa, and so more complex multiple alignment methods were investigated to provide robust alignments with gap inference for these fragments. M-COFFEE (Wallace et al. 2006), a meta-method that combines the outputs of several multiple alignment methods to estimate a consensus alignment, was used to align the fragments. Previous authors have 31

Chapter 2. found (e.g., Pei 2008), that M-COFFEE consistently outperforms all other alignment methods. Methods trialed here included CLUSTALW (Larkin et al. 2007), MAFFT (Katoh et al. 2005), MUSCLE (Edgar 2004a; Edgar 2004b), T-COFFEE (Notredame et al. 2000), and RCOFFEE (Moretti et al. 2008) (an alignment method especially for aligning RNA sequences

that here increased the overall number of indels inferred in 16S by 550%). All other methods tended to infer many more gaps, less parsimoniously informative gaps, and produced alignments that were overly complicated with large regions of no homology. Alignments were performed on the M-COFFEE online server (Moretti et al. 2007; Wallace et al. 2006). All alignments were further checked and adjusted by eye after automated multiple alignments had been performed. After multiple alignment, parsimoniously informative indels (gaps common to two or more individuals) were coded as binary presence/absence data following the “Simple Indel Coding” method of (Simmons & Ochoterena 2000), except in a single region of RP1 in C. gachua, where a simple tandem sequence repeat (microsatellite) was coded to reflect a stepwise model of mutation. Coding indels as binary characters allows phylogenetic estimation to incorporate gap information that would otherwise be overlooked, as raw gaps (‘-’) in sequence alignments are treated as “missing data” by programs that estimate phylogenies. Harnessing the parsimoniously informative signal from indel data by treating gaps as binary characters leads to improved accuracy in phylogenetic reconstruction (Dwivedi & Gadagkar 2009).

Genetic diversity and gene tree reconstruction Tamura and Nei’s pair-wise genetic distance (p-distance) (Tamura & Nei 1993) was calculated individually for each locus to examine divergence within and among species using DAMBE software (Xia & Xie 2001). p-distance is appropriate for comparing both intraspecific and interspecific levels of divergence up to the family level (Kartavtsev & Lee 2006). Tamura and Nei’s (1993) mutational model was chosen over more simple models of nucleotide substitution e.g., the Kimura 2-Parameter (Kimura 1980), because it accounts for variable substitution rates among sites, unequal nucleotide frequencies, as well different frequencies of transition and transversion events (Tamura & Nei 1993). These issues are more important when dealing with variation at the interspecifc level, where simpler methods tend to underestimate real divergence (Tamura & Nei 1993).

32

Phylogeny of the Channidae For each DNA locus, saturation was investigated by plotting the number of transition and transversion events observed between pairs of sequences against p-distance. As sequences diverge, the rate at which identical substitutions start to arise through multiple independent mutation events approaches the rate at which new differences arise, leading to an observed “saturation” of nucleotide substitutions in the DNA region (Kocher & Carleton 1997). As transition events are generally more frequent than transversions, saturation is likely to occur first in transitional substitutions, and for coding regions third base positions are likely to reach saturation before first and second positions, as many third base substitutions are synonymous and therefore escape selection for protein structure (Kocher & Carleton 1997). When there is a high occurrence of multiple substitutions (saturation) the true evolutionary rate, and hence true level of divergence , is hidden between sequences (Kocher & Carleton 1997). Saturation plots were drawn using DAMBE (Xia & Xie 2001).

Of the methods currently available for inferring phylogenetic relationship among genes, probabilistic methods, namely Bayesian and Maximum Likelihood approaches, are generally considered to be superior for tree estimation when compared with Parsimony / Neighbour Joining methods as they incorporate information on the level of divergence under an evolutionary model, as well as permitting tree topologies to be recovered that are not strictly bifurcating (Hall 2008). These methods have also been shown to be more accurate for phylogenetic inference when gap information is included (Dwivedi & Gadagkar 2009). Both methods were used here to construct gene trees for all four loci independently. This was firstly to investigate the resolution provided by each individual gene fragment, and secondly to identify incongruence between gene trees, that can arise if loci have independent evolutionary histories (Degnan & Rosenberg 2009). Bayesian inference of gene relationships was implemented using the program MRBAYES (Ronquist & Huelsenbeck 2003). A number of separate analyses were run including and excluding indels, and with different levels of partitioning among codons for Cyt b and RAG1. After initial short runs to confirm adequate settings, analysis used three runs of four chains, with all rates and partitions unlinked and allowed to vary across partitions. The ‘4by4’ nuclear model, an nst of 6, and an invariable sites model combined with gamma model were applied to nucleotide data. States in gap (binary) data partitions were allowed to vary according to the beta distribution. Final analyses were run for 12,000,000 generations, with trees sampled every 1,000 generations. The first 25% of samples were

33

Chapter 2. discarded as burn in. At the end of runs, plots of generation versus log probability of the data were examined to confirm adequate sampling from the posterior probability distribution. In addition, the potential scale reduction factors for each parameter were checked to confirm convergence. Results were summarised as consensus trees with posterior probabilities for clades. Majority rule consensus trees have recently been advocated as the best way for presenting agreement among trees, although they should not strictly be interpreted as phylogenies (Holder et al. 2008). Maximum Likelihood analysis was implemented in RAxML (Stamatakis 2006) via the online CIPRES PORTAL (www.phylo.org). Again, a number of initial runs were executed under different parameters, with and without codon partitioning for the coding genes. Gap characters were not included. Final analyses were conducted with the GTR Gamma Model with 10,000 multi-parametric bootstrap replicates. Majority rule 50% consensus trees for the RAxML output were constructed in the program MESQUITE (Maddison & Maddison 2009).

Species tree inference and estimation of divergence times Bayesian and ML estimation of the Channa species phylogeny were performed for two sets of combined DNA fragments. The first method concatenated all four loci following the “simultaneous analysis” approach (Nixon & Carpenter 1996). In cases where individuals were represented by more than a single nuclear allele, one allele was chosen at random to represent that individual in the concatenated analysis. The second approach discarded loci that were poorly resolved in the independent gene tree reconstruction to cut down possible “noise” in the data originating from conflicting poorly resolved relationships at the interspecific level, loosely following the “prior agreement” approach outlined by Bull et al. (1993). In all multi-locus analysis, loci were partitioned with rates and models of evolution allowed to vary across partitions. Analyses were performed in MRBAYES and RAxML following the methodology described above for independent gene tree estimation. Bayesian estimation of divergence times was implemented in BEAST (Drummond & Rambaut 2007). Both data sets used in the species tree reconstruction were analysed separately. Clades were defined for all major groups of taxa following the results of the multi-locus species tree analysis. Intra-specific clades were also defined where there was clear evidence for geographical isolation across the species range (i.e., Sumatra versus mainland Asia for C. lucius and C. striata) and also where deep intra-specific divergence 34

Phylogeny of the Channidae had already been determined from independent gene tree topologies. Data files were compiled with BEAUTi software (part of the BEAST package) using the following parameters: GT’ (General Time Reversible) nucleotide substitution model incorporating Gamma + invariant sites heterogeneity, a relaxed molecular clock with uncorrelated lognormal distribution (Drummond et al. 2006), and randomly generated starting trees with tree prior set to ‘Speciation – Yule Process’. Two fossil dates were entered as priors to provide a timeframe for other node estimates, with time values assigned lognormal distributions following advice from Ho (2007) and Ho and Phillips (2009). Assigning a non uniform probability distribution to calibration points (soft bounds) allows sequence data to correct for poor calibrations if conflict between sequence data and calibration priors arises (Yang & Rannala 2006). The node representing divergence between Channa and Parachanna was constrained to a minimum of 40Mya (offset=40, mean=1.5, SD = 1), representing the first appearance of distinct Channa spp. in the fossil record around this time (Murray 2006). The prior for a second node representing the divergence of the Channidae from other perciformes was set for 48Mya (offset = 48, mean = 1.71, SD = 1.4) (Roe 1991; Santini et al. 2009). For the first run, the analysis incorporated 10,000,000 generations with parameters logged every 1,000th generation. At the end of this run, output was examined and the operators adjusted following the program log to increase Effective Sample Sizes (ESSs) in future runs (for the full data set: branch rates window size increased to 2.0, scale factor for the clock rate parameter set to 0.7775, for the reduced data set: branch rates window size increased to 4.0, scale factor for the clock rate parameter set to 0.7722). The analysis was then re-run three times and the log files combined in the program LOGCOMBINERv1.5.1 (part of the BEAST package). The program TRACERv1.4.1 (Rambaut & Drummond 2007) was used to

assess the performance of the analysis by checking the convergence of each parameter (“trace”), viewing ESS scores, and used to read divergence times for each clade, including upper and lower 95% highest posterior density bounds (credible intervals).

35

Chapter 2.

RESULTS DNA sequences from a total of 32 individuals were used in the phylogenetic analysis, including two outgroup taxa Mogurnda adspersa and Melanotaenia splendida, one African Parachanna species P.obscura , and 29 Asian Channa representing 9 putative species. Six Channa species were represented by more than a single individual (C. gachua: n= 4, C. lucius: n= 2, C. maculata: n= 2, C. micropeltes: n= 2, C. striata: n= 14, C. unknown: n= 2).

Channa genetic variation and phylogenetic analyses based on mitochondrial DNA Two mtDNA regions encompassing partial sequences of two genes (16S and Cyt b) were sequenced for taxa in Table 2.1. Additional samples were included from GENBANK™ as outlined in Table 2.4.

16S RNA A total of 30 16S sequences from 11 putative species were used in the phylogenetic analysis. Ten sequences were incomplete, with 3-4% missing data for these sequences. Amplified fragment length varied among taxa but length was consistent within species, with the shortest sequences observed in C. striata (569 bps) and the longest sequence observed in M. splendida (576 bps). Sequences were aligned to a total length of 582bps that included 177 variable sites and 11 indel regions. One hundred and twenty-three sites were variable among Channa sequences, and only two indel regions were exclusive to the Channidae (Parachanna and Channa spp). Gap coding of parsimoniously informative indels produced eight binary characters. Within Channa spp the maximum p-distance was 0.139 (between C. bleheri and C. unknown, and C. gachua and C. unknown), rising to a maximum divergence of 0.145 between Channa (C. bleheri) and P. obscura sequences. Figure 2.2c illustrates the number of transitions and transversions identified in pair-wise 16S sequence comparisons. As can be seen from the ts curve, as p-distance between sequences increases above 0.14, the frequency of transitions begins to plateau, indicating that saturation is likely to be masking divergence between outgroup and ingroup taxa after this point.

36

Phylogeny of the Channidae

0.15

0.12

(a)

Cyt b

0.13

(b)

0.10

RAG1

ts (1st)

Transitions & Transversions

tv (1st) 0.10

0.08

0.08

0.06

0.05

0.04

0.03

0.02

0.00 0.00

0.00 0.00

ts (2nd) tv (2nd) ts (3rd) tv (3rd) ts (all) tv (all)

0.11

0.10

0.20

0.30

0.36

(c)

0.10 0.08

0.24

0.06

0.18

0.04

0.12

0.02

0.06

0.00 0.00

0.10

0.20

0.20

(d)

0.30

16S

0.10

0.00 0.00

RP1 ts tv

0.20

0.40

0.60

0.80

1.0

Pair-wise genetic distance (Tamura & Nei, 1993) Figure 2.2. Saturation plots for DNA fragments. Transitions (ts) and transversions (tv) plotted against pair-wise genetic distance (Tamura and Nei, 93). Protein coding regions: (a) mtDNA Cyt b and (b) nDNA RAG1 show mutation frequency split by codon position (1st, 2nd, and 3rd), data is summarised in curves. For non-protein coding regions: (c) mtDNA 16S RNA gene and (d) nDNA RP1 intron scatter and curves are presented. Bayesian analyses, including and excluding gaps, and ML analyses failed to resolve relationships among taxa based solely on 16S data, except at some of the shallowest nodes (see Figure 2.3). Topology of the 16S phylogeny did support sister relationships between C. diplogramme and C. micropeltes, C. marulia and C. sp “x”, and C. bleheri and C. gachua (Bayesian and ML bootstrap clade support values of 1.00/97, 0.93/99, and 1.00/96 respectively). The data also indicated divergence between three C. striata lineages, the first lineage represented by the single Indian sample, the second by samples across mainland SE Asia and Sumatra, and the third lineage was restricted to the mid-to-upper Mekong River Basin, where it occurs in sympatry with the second clade at one site (Sayaburi, North Western Lao PDR). Henceforth, these C. striata clades will be referred to as West Asia (WA), East Asia (EA) and Middle Mekong (MM). 16S data alone, however, was insufficient to resolve relationships between the three clades, or support plausible phylogenetic relationships at deeper nodes in the gene tree.

37

Chapter 2. M. splendida C. diplogramma (Kerala, India) 1.00/97 C. micropeltes (Northeastern Thailand) C. micropeltes (Southern Vietnam)

1.00/92

C. striata (Kerala, Southern India)

. C. striata (Songkhram, Northeastern

WA Thailand)

C. striata (Sayaburi, Northwestern Lao)

1.00/99

MM

C. striata (Vientiane, Northern Lao) 1.00/78 C. striata (Vang Vien, Northern Lao) C. striata (Stung Treng , Cambodia) C. striata (Lampung, Sumatra)

C. striata (Tanjung Karang, Northern Malaysia) 0.90/52

C. striata (Saraburi, Central Thailand) C. striata (Sayaburi, Northwestern Lao)

EA

0.58/--

C. striata (Si Sa Ket, Eastern Thailand) C. striata (Gai Lai, Central Highlands, Vietnam) 0.96/63

C. striata (Vinh Thuan, Southern Vietnam) . C.striata (Battambang, Western Cambodia) C.marulia (Rüber et al., 2006)

0.99/93 C. sp “x” (Stung Treng, Northern Cambodia) 1.00/100

C. sp “x” (Stung Treng, Northern Cambodia) M. adspersa

0.90/-C. cf. maculata (Hanoi) 1.00/65

C. lucius (Riau, Sumatra) C. lucius (Kratie, Cambodia) P. obscura (Rüber et al., 2006)

0.53/-0.91/--

C. bleheri (Rüber et al., 2006)

1.00/100

C. gachua (Songkhram, Northeastern Thailand)

1.00/96

C.gachua (Daklak Central Highlands, Vietnam) C. gachua (Kong Jeam, Eastern Thailand)

0.64/63

C. gachua (Vang Vien, Northern Lao)

0.1

Figure 2.3. Bayesian consensus phylogram for 16SRNA mtDNA data. Numbers indicate Bayesian posterior probabilities/Maximum likelihood percent bootstrap support. Coloured bars indicate Channa spp. groups, black bars indicate C. striata clades: West Asia (WA), MM (Middle Mekong), EA (East Asia). Note the large basal polytomy, representing poor resolution of phylogenetic relationships at intermediate and deep nodes. Note also that the outgroup taxa M. adspersa and P. obscura occupy internal positions in this tree, suggesting that 16S is not a suitable DNA marker for resolving relationships between the Perciformes included in this study. 38

Phylogeny of the Channidae

Cytochrome b Sequence data were obtained for the first 809 bases of the mtDNA Cyt b gene for 32 individuals comprising Melanotaenia, Mogurnda, and Parachanna outgroups as well as all nine putative Channa taxa. Six sequences had missing data of between 0.6 and 27.5% of total length (average for all six was 10%). A total of 361 variable Cyt b sites were identified. Within Channa spp. there were 313 variable sites, with a maximum p-distance between taxa of 0.246 between C. striata and C. micropeltes. Figure 2.2 shows the frequency of mutations in relation to p-distance for Cyt b. The frequency of third base transitions clearly plateaus as p-distance approaches 0.25, indicating that saturation has probably occurred for transitions in the 3rd codon position. To explore the influence of this saturation / homoplasy on phylogenetic signal, phylogenies were constructed for Cyt b under a range of data partitioning options. In all cases, models of evolution that did not partition bases among codon position recovered more resolved and better supported gene tree topologies. Figure 2.4 shows the relationships among taxa resolved in Cyt b phylogenetic analyses. The three sister relationships established in the 16S analyses were also supported by relationships resolved in the Cyt b phylogram. The Cyt b gene tree however also recovered relationships at deeper nodes within the Channidae, as well as at some shallow nodes that were unresolved in the 16S analysis. The three C. striata clades (WA, EA, and MM) can be clearly identified in Figure 2.3, and the relationship between them is now resolved, with WA clading with MM to form a sister group to the EA haplotypes. Relationships between C. gachua also show more resolution, with samples from Lao PDR and Thailand forming a monoplyletic group to the exclusion of C. gachua from the Vietnamese Central Highlands.

39

Chapter 2.

M. splendida M. adspersa P. obscura (Rüber et al., 2006)

1.00/100

C. cf. maculata (Hanoi) C. maculata (Bai et al., unpublished)

0.79/65

C. bleheri (Rüber et al., 2006)

0.99/66

1.00/100

C.gachua (Daklak Central Highlands, Vietnam)

1.00/100

C. gachua (Vang Vien, Northern Lao) C. gachua (Songkhram, Northeastern Thailand)

0.62/89 0.66/44

1.00/81

C. gachua (Kong Jeam, Eastern Thailand)

C. lucius (Riau, Sumatra)

1.00/100

C. lucius (Kratie, Cambodia) C. diplogramma (Kerala, India)

1.00/97 1.00/100 0.91/64

0.97/82

C. micropeltes (Northeastern Thailand) C. micropeltes (Southern Vietnam)

C.marulia (Rüber et al., 2006)

1.00/100 1.00/100

C. sp “x” (Stung Treng, Northern Cambodia) C. sp “x” (Stung Treng, Northern Cambodia) C. striata (Kerala, Southern India)

0.90/70

WA

. C. striata (Songkhram, Northeastern Thailand)

0.53/66 C. striata (Sayaburi, Northwestern Lao)

1.00/100

C. striata (Vientiane, Northern Lao)

MM

C. striata (Vang Vien, Northern Lao)

1.00/100

C. striata (Stung Treng , Cambodia) C. striata (Tanjung Karang, Northern Malaysia) C. striata (Lampung, Sumatra) C. striata (Saraburi, Central Thailand) . C. striata (Sayaburi, Northwestern Lao)

0.53/100

EA

. C.striata (Battambang, Western Cambodia) C. striata (Si Sa Ket, Eastern Thailand) C. striata (Gai Lai, Central Highlands, Vietnam)

0.90/66 0.1

1.00/95

C. striata (Vinh Thuan, Southern Vietnam)

Figure 2.4. Cyt b Bayesian consensus phylogram. Numbers indicate Bayesian posterior probabilities/Maximum likelihood percent bootstrap support. Coloured bars (right hand side) indicate Channa spp. groups, black bars indicate C. striata clades: West Asia (WA), MM (Middle Mekong), EA (East Asia).

40

Phylogeny of the Channidae

Channa genetic variation and phylogenetic analysis based on nuclear DNA RAG1 gene Initially, a 1484bp region of the RAG1 gene was amplified for taxa studied here. Problems however, with PCR results (multiple products) and poor sequence read lengths, meant that fragments were reduced to a 780bp region located in the middle of the gene. One shortened sequence remained incomplete, with 19% missing data. Sequences generated in this study were aligned with three GENBANK™ RAG1 channid sequences to create a data set with the three outgroup species (Melanotaenia, Mogurnda, and Parachanna) and eighteen Channa taxa representing the nine putative Channa spp. Across all taxa 231 base positions were variable, with 95 point mutations identified among Channa spp. Maximum p-distance among Channa was 0.069, found between C. gachua and C. lucius. The saturation plot (Figure 2.2b) showed some evidence that third base transitions may be approaching saturation at the highest divergence levels observed between ingroup and outgroup taxa (above 0.2). A range of phylogenetic analyses employing different partitioning among codon positions were undertaken to investigate the role of third base pair mutations in promoting / masking phylogenetic signal. Bayesian analysis with no codon partitioning resulted in better resolution of tree topologies (less polytomies), and better support for individual nodes, than models of evolution examining individual codon positions or the 3rd base separately. In the ML method, partitioning had no result on consensus topology, although slightly higher bootstraps were observed for nodes in partitioned analyses. The consensus phylogram of the nuclear RAG1 gene fragment is shown in Figure 2.5. While providing good support for deeper nodes in the gene tree, RAG1 sequences were unable to resolve intra-specific relationships for C. gachua and C. striata. Although individuals from the C. striata mtDNA MM clade do form a monophyletic group in the RAG gene tree, the three clades previously identified by mtDNA analysis were less evident in the RAG1 data. More significant however, was that topology at intermediate nodes in the RAG1 gene tree was not entirely congruent with relationships constructed from Cyt b sequence data. Specifically, three taxa, C. diplogramma, C. lucius, and C. micropeltes assume different positions with respect to other sampled taxa. While the close sister relationship between 41

Chapter 2. C. diplogramme and C. micropeltes is still evident, the two taxa now form a basal monophyletic group with C. lucius. This relationship is quite different to that observed for mtDNA lineages, where C. lucius showed closer relationship to C. maurlia, C. striata, and C. sp “x”. M. splendida M. adspersa P. obscura (Rüber et al., 2006) C. cf. maculata (Hanoi) 0.93/54

C. bleheri (Rüber et al., 2006) 1.00/100 C. gachua (Songkhram, Northeastern Thailand) C.gachua (Daklak Central Highlands, Vietnam) 1.00/100

0.86/67

C. gachua (Kong Jeam, Eastern Thailand)

0.93/72 C. gachua (Vang Vien, Northern Lao)

0.98/91 C.marulia (Rüber et al., 2006)

C. sp “x” (Stung Treng, Northern Cambodia) 0.83/--

C. striata (Kerala, Southern India)

WA

C. striata (Si Sa Ket, Eastern Thailand)

EA

1.00/100 0.78/--

C. striata (Lampung, Sumatra) C. striata (Songkhram, Northeastern Thailand)

MM

1.00/97 C. striata (Vientiane, Northern Lao) 1.00/100 C. lucius (Riau, Sumatra)

C. lucius (Kratie, Cambodia)

0.98/61

0.98/89 C. micropeltes (Northeastern Thailand) C. micropeltes (Southern Vietnam)

1.00/98 C. diplogramma (Kerala, India) 0.1

Figure 2.5. RAG1 Bayesian consensus tree. Numbers indicate Posterior probabilities/Maximum likelihood percent bootstrap support. ML bootstraps taken from tree where nucleotides were partitioned into 1st + 2nd versus 3rd base codon positions. Coloured bars (left hand side) indicate Channa spp. groups, black bars indicate C. striata clades: West Asia (WA), MM (Middle Mekong), EA (East Asia).

42

Phylogeny of the Channidae

RP1 intron The data set for the 1st intron (RP1) of the S7 ribosomal protein gene consisted of 35 individual sequences that were generated from 28 individuals. This data set included alleles sequenced from homozygotes, sequences derived from heterozygote genotypes (with variable sites coded following the IUPAC sequence alphabet for ambiguous bases), and sequences from heterozygote PCR that were cloned to produce single allele sequence reads in cases where length polymorphism prevented generation of combined allele sequence data. Data were generated for the two outgroups, Melanotaenia and Mogurnda, and for seven of the putative Channa species. Twelve sequences had some missing data (average for all twelve 5.3%). As expected for non-coding nDNA, the RP1 fragment proved to be highly variable. Sequence lengths ranged from 626bps (for the shortest C. lucius allele) to 808bps (for M. adspersa), and varied in length within individuals and among putative species as well as between species and genera. In C. gachua a microsatellite repeat contributed to allele length differences in the fragment. Multiple alignment inferred many indels, resulting in an aligned sequence length of 877bp. After alignment, a total of 62 parsimoniously informative gaps were coded as binary data for phylogenetic reconstruction. The aligned RP1 fragments had a total of 578 variable sites, with 441 positions variable within the Channa sequence alignment. The RP1 saturation plot (Figure 2.2d) illustrates the frequency of transitions and transversions as divergence increases across pair-wise comparisons. Both transition and transversion curves appear to be approaching a plateau at high pdistances, indicating that saturation is likely to be masking true divergence between more distantly related species in this study. At low levels of divergence however (p-distance less than 0.5), the RP1 fragment shows minimal saturation and therefore is likely to represent true divergence at the intra-specific level. The consensus phylogram of RP1 alleles constructed using both nucleotide and binary gap data is presented in Figure 2.6. Phylogenetic analysis of RP1 alone was only able to resolve intra-specific and very shallow interspecific relationships with any certainty (Figure 2.6). At a lower taxonomic level however, use of this marker was sucessful at recovering relationships among alleles. Firstly, the S7 gene tree reveals that C. gachua alleles from the Vietnamese highlands are different from all alleles found in Lao PDR and Thailand. For this species, the remainder of RP1 alleles fall into two shallow clades that occur in sympatry across northern Lao and eastern Thailand. This result supports the similarity of C. gachua 43

Chapter 2. across this region that had been indicated by monophyly of individuals in the Cyt b mtDNA phylogram.

M. splendida M. adspersa C. cf. maculata (Hanoi) C. sp “x” (Stung Treng, Northern Cambodia)

0.99/100

C. sp “x” (Stung Treng, Northern Cambodia) 0.66/68

C. sp “x” (Stung Treng, Northern Cambodia)

C. lucius (Riau, Sumatra) 0.96/100

C. lucius (Kratie, Cambodia)

C. lucius (Kratie, Cambodia) 0.69/80 C. micropeltes (Northeastern Thailand) C. micropeltes (Southern Vietnam)

0.96/98

C. diplogramma (Kerala, India) 0.99/100

C. diplogramma (Kerala, India) C.gachua (Daklak Central Highlands, Vietnam) 0.80/--

0.93/100

C. gachua (Vang Vien, Northern Lao) C. gachua (Songkhram, Northeastern Thailand)

0.98/70

C. gachua (Songkhram, Northeastern Thailand) C. gachua (Kong Jeam, Eastern Thailand)

0.71/--

C. gachua (Vang Vien, Northern Lao) C. striata (Kerala, Southern India)

i

C. striata (Tanjung Karang, Northern Malaysia)

0.92/98

WA EA

C. striata (Sayaburi, Northwestern Lao)

MM

C. striata (Sayaburi, Northwestern Lao) . C.striata (Battambang, Western Cambodia) 0.94/99

C. striata (Saraburi, Central Thailand) C. striata (Vinh Thuan, Southern Vietnam

b

MM

C. striata (Stung Treng , Cambodia)

0.84/-0.62/--

C. striata (Si Sa Ket, Eastern Thailand) C. striata (Lampung, Sumatra)

0.63/84

EA

EA

C. striata (Lampung, Sumatra) C. striata (Gai Lai, Central Highlands, Vietnam)

C. striata (Vang Vien, Northern Lao) 0.97/-0.87/-0.85/99

C. striata (Vientiane, Northern Lao) C. striata (Songkhram, Northeastern Thailand)

r

MM

C. striata (Songkhram, Northeastern Thailand)

0.1

Figure 2.6. Bayesian consensus tree of RP1 alleles. Numbers indicate posterior probabilities/Maximum likelihood percent bootstrap support. Left hand side: Patterned columns indicate Channa spp. groups, vertical striped bars indicate RP1 lineages: India (i), Broad (b), and restricted (r); black bars indicate C. striata clades: West Asia (WA), MM (Middle Mekong), EA (East Asia).

44

Phylogeny of the Channidae Secondly, the RP1 gene tree clearly indicates the presence of divergent nDNA lineages within C. striata, shown in Figure 2.6 by double horizontal stripes, and henceforth referred to as lineage i (India), lineage b (broad) and lineage r (restricted). The first, lineage (i), was present in the southern Indian sample. This lineage forms a closely related sister group to lineage b, that is represented by samples that are broadly distributed across SE Asia from Northwest Lao PDR to the island of Sumatra (average p-distance between lineages i and b = 0.0151). The third lineage, lineage r, is relatively divergent from other C. striata RP1 alleles (average p-distance versus lineages i and b = 0.0412 and 0.0418 respectively). This lineage was found in C. striata individuals restricted to three sites in northern Lao PDR and northeastern Thailand. Although heterozygotes were examined at this locus, no samples sequenced here were found to possess “inter-lineage” heterozygote genotypes. As with the C. striata Cyt b mtDNA data, two divergent nDNA lineages were found in the mid – upper Mekong River Basin of mainland SE Asia, and a third and different type was found in India / West Asia. Despite this apparent similarity, Cyt b and RP1 gene trees are not congruent with respect to both the relationship between lineages and the assortment of individual genotypes across lineages. Firstly, mtDNA inference of phylogenetic relationship clades the Indian haplotype (WA) as sister to the mid – upper Mekong clade (MM), whilst the Indian RP1 allele (lineage i) clades more closely with the broadly distributed lineage b. Secondly, within the SE Asian samples, genotypes from two individuals were found to be comprised of MM mtDNA yet lineage b nDNA (individuals sampled at Stung Treng, Cambodia, and Sayaburi, Lao PDR; see striped and black bars, Figure 2.6).

Multi-locus phylogeny reconstruction and chronogram estimation Two multi-locus phylogenies were constructed to estimate species relationships. The first was reconstructed from a concatenation of all four fragments that had been analysed independently above. For the second, loci that failed to resolve intra-specific relationships independently were discarded, and the two remaining loci (the protein coding regions Cyt b and RAG1) were concatenated and analysed together. Both methods inferred well resolved relationships among all taxa, and the topologies of the phylogenies constructed under each method were very similar. The phylogeny estimated from the four loci is shown in Figure 2.7. It differs slightly from the phylogeny estimated from coding loci alone, where

45

Chapter 2.

M. splendida M. adspersa P. obscura (Rüber et al., 2006)

1.00/100

C. cf. maculata (Hanoi) C. maculata (Bai et al., unpublished)

0.89/--

C. bleheri (Rüber et al., 2006) C.gachua (Daklak Central Highlands, Vietnam)

1.00/99

1.00/100 C. gachua (Songkhram, Northeastern Thailand) C. gachua (Kong Jeam, Eastern Thailand)

0.56/--

0.92/82

C. gachua (Vang Vien, Northern Lao) C.marulia (Rüber et al., 2006)

1.00/99 1.00/100

C. sp “x” (Stung Treng, Northern Cambodia) C. sp “x” (Stung Treng, Northern Cambodia)

0.86/92 C. striata (Kerala, Southern India)

1.00/85

C. striata (Sayaburi, Northwestern Lao)

1.00/60 1.00/100 1.00/99 0.60/56

0.96/74 0.66/--

C. striata (Stung Treng , Cambodia) C. striata (Vang Vien, Northern Lao) C. striata (Vientiane, Northern Lao)

. C. striata (Songkhram, Northeastern Thailand) C. striata (Tanjung Karang, Northern Malaysia) C. striata (Saraburi, Central Thailand) C. striata (Sayaburi, Northwestern Lao)

1.00/97 0.98/83

. C.striata (Battambang, Western Cambodia) C. striata (Si Sa Ket, Eastern Thailand) C. striata (Gai Lai, Central Highlands, Vietnam)

1.00/70

C. striata (Vinh Thuan, Southern Vietnam)

0.59/11

0.87/78

C. striata (Lampung, Sumatra)

1.00/100 1.00/100

C. micropeltes (Northeastern Thailand) C. micropeltes (Southern Vietnam) C. diplogramme (Kerala, India) C. lucius (Riau, Sumatra)

1.00/100 C. lucius (Kratie, Cambodia)

0.1

Figure 2.7. Bayesian Channa phylogeny consensus tree estimated from four loci. Numbers indicate posterior probabilities/Maximum likelihood percent bootstrap support.

46

Phylogeny of the Channidae C. lucius and C. micropeltes and C. diplogramme form a sister group to all other Channa spp. See Appendix 4 to compare topologies. Both multi-locus data sets were analysed separately to estimate divergence times among taxa. The times and credible intervals estimated were very similar for each data set, however lower ESSs and longer times to parameter convergence were observed for the four locus data set (data not shown). Divergence times, Upper and Lower 95% credible intervals, and ESS scores for the coding region only data set are presented in Table 2.5. Table 2.5. Divergence Times estimated by BEAST from a two gene dataset. For location of nodes see Figure 2.8. Estimates for nodes used as initial calibration points are indicated by *. Time since most recent common ancestor (tMCRA) and credible intervals expressed in millions of years. Node C1* C2* C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15

Mean tMRCA (Mya) 53.95 43.56 33.91 28.82 25.21 23.93 22.98 12.72 11.79 10.45 8.05 5.99 5.66 3.72 3.17

95% credible intervals (Lower- Upper) 48.15 – 63.68 40.18 - 48.59 24.99 - 42.31 20.02 - 37.19 14.5 - 35.70 15.10 -33.04 14.98 – 31.48 4.58 – 22.43 6.13 – 18.31 3.56 – 18.24 3.47 – 13.91 2.14 – 10.65 1.42 - 11.86 1.34 – 7.44 1.13 – 5.76

ESS 15197.9 21589.57 3826.74 2583.47 1390.26 1865.561 1580.2 684.97 1242.184 1511.16 511.1 579.22 1494.14 770.49 1695.53

The Bayesian chronogram estimated in BEAST is presented in Figure 2.8. Most divergence between Channa spp. is estimated to have occurred during the Oligocene and Miocene. The most ancient intra-specific divergence is evident in C. striata (Node C11). Credible intervals for this split overlaps with the interspecific divergence of C. diplogramme and C. micropeltes (C8), C. marulia and C. sp ”x” (C10) and C. bleheri and C. gachua (C9). This suggests that some lineages have undergone speciation more rapidly than others in the recent past, however as these nodes are quite distant from the calibration points at the root of the tree, these tMRCA estimates should be treated with caution.

47

Chapter 2.

70

60

50

40

10

20

30

0

Mya M. splendida M. adspersa P. obscura (Rüber et al., 2006)

C13 C1

C. lucius (Riau, Sumatra)

C5

C. lucius (Kratie, Cambodia)

C8

C. diplogramma (Kerala, India) C. micropeltes (Southern Vietnam)

C2

C. micropeltes (Northeastern Thailand) C. striata (Lampung, Sumatra)

C14

C. striata (Si Sa Ket, Eastern Thailand) C. striata (Gai Lai, Central Highlands, Vietnam) C. striata (Vinh Thuan, Southern Vietnam) . C.striata (Battambang, Western Cambodia)

C11

C3

C. striata (Sayaburi, Northwestern Lao) C. striata (Saraburi, Central Thailand) C. striata (Tanjung Karang, Northern Malaysia) C. striata (Kerala, Southern India)

C6

C. striata (Vang Vien, Northern Lao)

C12

C. striata (Songkhram, Northeastern Thailand) C. striata (Stung Treng , Cambodia) C. striata (Sayaburi, Northwestern Lao) C. striata (Vientiane, Northern Lao)

C4

C10

C.marulia (Rüber et al., 2006) C. sp “x” (Stung Treng, Northern Cambodia) C. sp “x” (Stung Treng, Northern Cambodia) C. cf. maculata (Hanoi)

C7

C. maculata (Bai et al., unpublished) C. bleheri (Rüber et al., 2006)

C9

C.gachua (Daklak Central Highlands, Vietnam)

C15

C. gachua (Vang Vien, Northern Lao)

Pliocene Pleistocene

Miocene

Oligocene

Eocene

C. gachua (Kong Jeam, Eastern Thailand)

Paleocene

Late Cretaceous

C. gachua (Songkhram, Northeastern Thailand)

Figure 2.8. Bayesian inference Chronogram from BEAST. Divergence times were estimated for nodes C1-C15. Open circles (C1 & C2) indicate nodes that were calibrated with fossil dates. Grey bars around nodes C1 – C15 represent 95% credible intervals for tMRCA estimates (as per Table 2.5). 48

Phylogeny of the Channidae

DISCUSSION Evidence from four DNA loci was combined here to reconstruct the evolutionary history of members of the Channa genus in SE Asia. Analysis of coding mitochondrial and nuclear loci, both independently and in combination, provided a generally well resolved phylogeny that reveals a history of divergence that spans the last 40 million years. Ribosomal and intron DNA sequences support intra-specific and close sister group relationships, but fail to resolve deeper relationships, reflecting the large divergence present within this ancient genus.

Patterns of intra-specific divergence In the phylogenetic reconstruction, five species were represented by more than a single individual collected from different geographical locations across the species’ natural ranges. The phylogeny assessed individuals belonging to at least three species that have formerly been identified as being potentially cryptic: C. striata, C. gachua, and C. micropeltes, as well as representatives of an unidentified taxa that was found to be monophyletic with C. marulia, another potential cryptic species. In all cases, taxa identified as the same species formed monophyletic groups, suggesting that none of the samples assessed here represent paraphyletic cryptic species. The chevron snakehead, C. striata, has an extensive natural distribution, where it cooccurs with C. gachua across the full extent of the Asian snakehead distribution. A total of 14 C. striata individuals were included in the phylogeny, representing samples from five major drainage basins located in southern India, mainland SE Asia, Peninsula Malaysia and the island of Sumatra (Indonesia). All samples formed a monophyletic group, but this species showed the highest level of intra-specific divergence of any taxa included in the analysis. This divergence was observed for both mtDNA and nDNA loci. Such congruence is unlikely to have resulted from random lineage sorting in a large population. Instead, the pattern of divergence indicates that a number of separate groups within C. striata have most likely been evolving in isolation for long periods of evolutionary time. Divergence between southern Indian and SE Asian groups is not unexpected, given the large geographical distance between these sites, however, divergent lineages were also detected in individuals that occurred in sympatry in the Mekong Basin (at Sayaburi, Northeastern Lao PDR). The majority of individuals genotyped that possessed Middle Mekong (MM) mtDNA haplotypes also possessed restricted (r) nDNA, however in two instances, 49

Chapter 2. individuals with MM mtDNA possessed broad (lineage b) nDNA genotypes (Figure 2.6). This indicates that genetic exchange is occurring between the two divergent groups in the Mekong Basin, and hence that the deep intra-specific divergence detected does not reflect reproductive isolation. Chapter 3 explores diversity and phylogeography of C. striata in more depth. Four C. gachua individuals were included in the phylogeny, collected from sites separated by over 1,000km of river distance in the Mekong River Basin (Plate 2.1). These individuals grouped closely together for all loci examined, and did not provide evidence for cryptic diversity in this species across the study area. The native range of C. gachua is recorded to span the complete geographical extent of Asian snakehead fish (Figure 2.1), from Pakistan, across India and Sri Lanka, across mainland SE Asia to islands in the Indonesian archipelago. Although no evidence for cryptic diversity was found in the Mekong Basin for C. gachua, it is quite possible that across its entire range there may be greater evidence for cryptic diversity. In their 2004 USGS Circular on snakehead fishes, Courtenay and Williams report that C. micropeltes has a “remarkably disjunctive distribution” (p94), where the majority of the natural range of the species is represented by rivers in mainland SE Asia, Peninsula Malaysia, Sumatra and Borneo, but that the species also occurs in an isolated pocket in southwestern India. Although the Indian “C. micropeltes” was originally described as Ophiocephalus diplogramme (Day 1865), Courtenay and Williams (2004) suggest that this is a misidentification, and that C. micropeltes was translocated to southwest India sometime before the 19th Century. Contrary to this hypothesis, the present study supports the original description of the Indian taxa as a distinct species. While the two taxa do form a monophyletic group in every gene tree reconstruction, divergence between C. diplogramme and the SE Asian C. micropeltes is larger than for any other intra-specific comparison here, suggesting that the two sister species have been evolving in isolation for a long period of evolutionary time that is unlikely to be explained by a recent introduction. Two Channa species collected for this study could not be identified to species level at the time of collection. The first, collected from northern Vietnam, was classified as C. cf maculata based on molecular comparison with DNA database records (Genbank™, Bai et al unpublished) and on records of species occurrence (Arthur & Te 2006). The second species, collected from Cambodia and referred to here as C. sp. “x” (Plate 2.5), remains unclassified. In all gene and multi-locus reconstructions the unknown species forms a sister group to C. 50

Phylogeny of the Channidae marulia, a species with a large mainland south Asian distribution. In the current study C. marulia is represented only by a DNA database sample collected from Bengal, north eastern India, over 2000km distant from the collection site for C. sp. “x” (Rüber pers comm. 2009). While C. sp. “x” and C. marulia consistently group together, considerably more divergence was present between the two taxa than was found within any recognised species groupings. In a previous channid phylogeny (Li et al. 2006), C. marulia formed a sister group to C. marulioides, but this species is not known to occur in Cambodia where C. sp. “x” was collected. The unknown species here may in fact represent evidence for a C. marulius “cryptic species complex” as suggested by Rainboth (1996a). The closest image found to the image taken at the time of collection (Plate 2.5) is shown below (Plate 2.6). Although this image also lacks formal identification, it was presented by Courtenay and Williams (2004) (among others) as a C. marulia, and superficially, it does closely resemble C. sp. “x” in the current study.

Plate 2.6. Adult C. marulia guarding young. Photographed by Ianaré Sévi. (Available from Wikimedia Commons http://en.wikipedia.org/wiki/File:Channa_marulius.jpg)

51

Chapter 2.

Relationships among taxa A number of robust interspecific relationships were identified from the analysis using independent and combined loci. C. gachua was consistently identified as the closest relative of C. bleheri. These are two of the smaller snakehead species and only grow to a maximum length of about 20cm. While C. gachua has an extensive natural distribution, C. bleheri is endemic to the Brahmaputra River Basin in northeastern India (Courtenay & Williams 2004), where presumably the two species currently occur in sympatry. This genetic affinity has been reported previously by other authors, based on three morphological traits (U-shaped Isthmus, single sensory pores arrangement, and presence of large cycloid scales on each side of the lower jaw (Vishwanath & Geetakumari 2009)), and mtDNA relationships (Li et al. 2006). As this group occupies a distal position in the phylogeny, this suggests that the discriminating traits identified by Vishwanth and Geetakumari (2009) may be recently derived. The C. bleheri + C. gachua clade consistently nested with C. maculata, a larger Channa species restricted historically to a limited natural distribution in southern China and northern Vietnam at the eastern extent of the larger range of C. gachua. Sister to these taxa is a clade composed of C. marulia, C. sp “x” and C. striata. Both C. marulia and C. striata possess extensive natural distributions across southern and SE Asia, and it is interesting that the two species also share a close genetic affinity. C. micropeltes, C. diplogramme and C. lucius clade together within the phylogeny resolved here. This result is not congruent with the topology obtained by Li et al., (2006), however, in both cases C. micropeltes and C. lucius lineages represent descendants of the most ancient divergence events within the Channa. Both species are characterised by a region of gular scales that are also present in African Parachanna, but are absent in the majority of the Asian species (Musikasinthorn & Taki 2001). This supports the hypothesis proposed by Li et al (2006) that this morphological trait is plesiomorphic, or an ancestral character state. Two other species, C. bankanensis and C. pleurophthalmus, are reported to share the gular scales character with C. lucius and C. micropeltes. All four species exhibit ranges restricted to SE Asia, with the former two restricted to islands in western Indonesia. In contrast, C. diplogramme is only recorded from southwestern India, and it is not yet documented if this species also possesses the gular scale character trait.

52

Phylogeny of the Channidae

Channid divergence times Divergence times estimated from nuclear and mitochondrial DNA data calibrated with fossil evidence show a clear history of channid diversification over the last 40 million years. Estimated divergence times vary greatly from previous estimates based on only single lines of evidence, i.e., incomplete fossil history (Bohme 2004), and single locus mtDNA evolutionary rates (Li et al. 2008). Current divergence time estimates do agree, however, with crude estimates based on general rates of Cyt b evolution, for example BEAST estimated the divergence time for Node C3 at 33.91Mya (24.99 – 42.31) (Figure 2.8, Table 2.5), compared with estimates based on p-distance that yielded 24.6Mya, 36.9Mya, and 49.2Mya with published molecular clock rates of 1%, 1.5%, and 2%, respectively (Bernardi et al. 2004; Johns & Avise 1998). The root of the fossil calibrated chronogram (Figure 2.8) is aligned to the established hypothesis that divergence and radiation of ancestral perciforms was initiated when the Indian and continental Asian Plates collided, placing India as the centre of origin for perciform families (Bagra et al. 2009; Briggs 2003a; Karanth 2006). The northern India / Pakistan region also probably represents the centre of origin for the Channidae family, as the earliest fossil records (~50Mya) have been uncovered here, dating from a time before the Himalayas were up-thrust (Roe 1991). The chronogram illustrates the estimated divergence times for members of the Asian Channidae. It provides evidence that the Channa genus underwent significant radiation during the Oligocene to early Miocene 34-20Mya. Globally, the mid-Oligocene was a time of major sea regression (Hall 1998). Islands in western Indonesia were connected to mainland SE Asia, and major river systems flowed southwards and eastwards from areas of high relief on the Tibetan Plateau, possibly driving dispersal of many taxa eastwards into SE Asia (Clark et al. 2004; Clift et al. 2006; Hall 1998). For channids, a landscape that provided potential for long distance hydrological connectivity may have presented the ideal opportunity to expand their ranges across southern Asia. As the Oligocene gave way to the Miocene around 24Mya, major drainage rearrangements occurred on the Tibetan Plateau, isolating the formerly connected headwaters of the Yangtze, Red, Mekong and Salween Rivers (which flow to China, Vietnam, mainland SE Asia, and Burma, respectively) (Clark et al. 2004; Clift et al. 2006). It is notable that this period of vast geomorphological change was also correlated with the period in which the Asian snakeheads appear to have undergone their primary speciation wave. Descendants of lineages that arose around this 53

Chapter 2. time are represented in taxa that today span the full geographical extent of the Asian snakehead distribution, suggesting significant dispersal has occurred since the Oligocene / Miocene divergence. Snakeheads prefer a wet climate, and the contemporary distributions of both Channa and Parachanna spp. are thought to be limited to areas with periods of high rainfall (>150mm in the wettest month) and warm temperatures (average mean temp 20OC in the wettest month) (Bohme 2004). The Miocene in SE Asia was generally characterised by a wet climate, culminating in the development of the East Asian Monsoon in the Late Miocene over 7Mya (An 2000; Morley 1998). The late Miocene also appears to represent a time when ancestral Channa were diverging further to become what can now be recognised as sister taxa (nodes C8, C9, and perhaps C10, Figure 2.8, Table 2.5). The onset of the monsoon system around this time brought cyclical climatic changes, alternating between dominance of dry-cold winters and warm-humid summers (An 2000; Morley 1998), resulting in associated repeated expansion and contraction cycles of rainforest assemblages (Morley 1998). This change perhaps may also have acted to isolate populations across ancestral species’ ranges in greater southern Asia, facilitating the formation of sister species pairs. The deep divergence among C. striata lineages also arose around this time. More recently, the Miocene / Pliocene boundary (5Mya) perhaps represents a time of channid population expansion, when contemporary taxa may have expanded their ranges. The most recent common ancestors for mainland and Sumatran individuals of C. striata and C. lucius are estimated to date to the Pliocene, although the land bridge connecting Sumatra to continental Asia was present most recently as little as 20Kya (Woodruff & Turner 2009). Some care should be taken in interpreting more recent divergence times, as nodes that lie distant from calibration points (for this data set C1 and C2) are hard to estimate accurately (Linder et al. 2005).

54

Phylogeny of the Channidae

Conclusion All species examined in the current study were monophyletic, and in the two cases where taxonomic confusion was present, the taxa in question (C. diplogramma and C. sp “x”) were found to be monophyletic with the recognised species with which they had previously been confused. Despite this monophyly, the large divergence (>10Mya) between these taxa suggests that taxonomic status as distinct species is probably warranted in each case, especially as this divergence is of a similar magnitude to divergence observed between taxa currently recognised as distinct (C. bleheri and C. gachua). The level of divergence uncovered among C. striata individuals, however, indicates that deep divergence has not necessarily resulted in the formation of reproductive isolation for all lineages in the genus. The combination of fossil evidence and molecular divergence presented here indicates that that the divergence between African and Asian snakeheads is most likely to have taken place soon after the Indian-Asian plate collision (Eocene), in contrast to previous biogeographical hypotheses that have suggested Gondwanan (early Cretaceous) (Li et al. 2006) or Miocene (Bohme 2004) divergence between genera. In Asia, the presence of multiple divergence events in the Miocene that reflect division between species with western and eastern distributions suggests that dispersal across southern Asia may have been limited during this time, possibly in association with aridifying climate and Himalayan upthrust

55

Chapter 3 Patterns of genetic diversity and phylogeography of Channa striata in SE Asia

57

Phylogeography of C. striata

INTRODUCTIONION Channa striata, commonly known in English as the ‘chevron snakehead’, ‘striped snakehead’, or ‘striped snakehead murrel’, is perhaps the most common species in the Asian snakehead genus. C. striata is locally abundant across a geographically wide natural distribution, with a natural range that encompasses most of southern and SE Asia, extending from Sri Lanka in the west to Borneo and Sumatra in the southeast (Courtenay & Williams 2004; Fishbase 2010). Growing up to 90cm, C. striata is also one of the largest snakeheads (Rainboth 1996a). Across its native range this species is harvested for food from the wild in vast numbers (Figure 3.1a). The species is popular for human consumption, and is generally marketed fresh or alive. In mainland SE Asia, C. striata is possibly the most common fish species in markets along the Mekong River, where it is among the most important species from wild capture and commands the highest prices from culture production (MRC Fisheries Program 1999; Naret et al. 2002). In the Vietnamese delta alone, C. striata accounts for 20-40% of

Thousands of tonnes

all household money spent on fish (Dey et al. 2005). 90 80

(a)

Data © FAO - Fisheries and Aquaculture Information and Statistics Service 20/02/2010

70 60

50 40 30 20

Thousands of tonnes

10

0 1950 16

1960

1970

(b)

1980

1990

2000

Data © FAO - Fisheries and Aquaculture Information and Statistics Service 20/02/2010

14 12 10

8 6 4

2 0 1950

(c) 1960

1970

1980

1990

© MRC

2000

Figure 3.1. Global fisheries production for C. striata 1950-2007. Source: Fisheries Statistics Data (FAO 2010). (a) Production from wild capture fisheries, (b) Production from aquaculture, (c) A Cambodian fish vendor sells C. striata at a local market. 59

Chapter 3. In the Mekong Basin C. striata are harvested in large numbers from flooded forests, rice fields, associated canals and other wetland areas in all seasons and at all stages of maturity (Ambak & Jalal 2006) (personal observation). The species is commonly caught using seines, gill-nets, traps and baited hooks (Nguyen et al. 2006; Rainboth 1996a). The majority of fishing operations are small scale family run enterprises, although larger operations are also present. C. striata production from aquaculture represents a rapidly expanding industry in the region (Figure 3.1b), with fish often grown at very high stocking densities (greater than 30/m2 for fish of average length 20cm) where they perform well on formulated feeds (Qin & Fast 1998). At present, almost all aquaculture production is established using wild caught juveniles (Ali 1999; Poulsen et al. 2008; So & Haing 2007). As demand for fish in Asia continues to grow (Delgado et al. 2003; Huang & Bouis 1996), it is likely that fishing pressure will increase on natural populations of C. striata, both as a capture fishery and as a source for cultured stocks.

Ecology C. striata are primarily carnivorous, and as they grow from fry to adults their diet is thought to shift from planktonic crustaceans, snails and worms towards larger prey items including fish, frogs, and small aquatic snakes (Lee & Ng 1994; Wee 1982). Individuals prefer standing or slow moving water up to 1m in depth (Courtenay & Williams 2004), and are often found in rice paddies, drainage canals, lakes and ponds. C. striata is nonmigratory (known in the Mekong Basin as “black fish”), but do undertake short lateral migrations to and from flood plain habitats (Poulsen et al. 2008). In fact, C. striata is one of the main ‘self-recruiting species’ indigenous to SE Asia, and is quick to colonise ephemeral freshwater habitats where they occur, including inundated rice fields and temporary flooded forest habitats (Amilhat et al. 2009a). Despite their capacity to disperse quickly at a local scale, the maximum dispersal distance reported for this species is only 3km (over a 2 year study), with average recorded dispersal distances in the order of only 500m (Amilhat & Lorenzen 2005), suggesting that movement is generally limited over individual lifetimes. C. striata are solitary except during breeding , and become reproductively mature at 23 years of age (total length >20 cm) (Kilambi 1986). Reproductive females produce an average of between 4326-9017 oocytes annually (Ali 1999; Kilambi 1986) , although in nature the maximum brood size is probably around 5000 (Parameswaran & Murugesan

60

Phylogeography of C. striata 1976). This is relatively low in comparison with many other freshwater fishes (Wee 1982). Individuals are “opportunistic breeders” (Poulsen et al. 2008), and are capable of multiple extended spawning under the right conditions, but breeding effort is commonly reserved for onset of the wet season, when fish move into floodplain habitat (Ali 1999). Across mainland SE Asia, pairs are thought to spawn up to twice a year during the flood season, and eggs are guarded by parents until they hatch (after around 24 hrs) (Ali 1999; Campbell et al. 2006; Courtenay & Williams 2004; Marimuthu & Haniffa 2007). Presence of suprabranchial chambers in adult C. striata enable it to breathe air and so remain out of water for extended periods of time, at least 28 hrs or even more provided humidity is sufficient to keep individuals moist (Hughes & Munshi 1986; Sayer 2005). Of all the snakehead species, C. striata is perhaps the most tolerant of poor water quality, but it is also capable of traversing terrestrial environments, or aestivating under hardened mud (Wee 1982). These characteristics enable individuals to respond to seasonal changes in habitat availability and quality by either moving short distances overland in search of new favoured aquatic habitat or by remaining dormant until local conditions improve. These are uncommon attributes in freshwater fish species. Unusual life history characteristics make the population structure of C. striata in SE Asia hard to predict. Populations of freshwater species commonly show population structure tightly-linked to the drainage basins they inhabit (Meffe & Vrijenhoek 1988), yet C. striata is also capable of some overland dispersal and is known to be a rapid coloniser, suggesting that unlike many freshwater species it may have potential to disperse widely among drainage basins. A previous study of C. striata genetic diversity in Thailand found differentiation among populations in the middle Mekong Basin and Chao Phraya Basin at a level expected among local races, yet noted close genetic similarity between geographically proximate C. striata populations inhabiting the upper regions of both drainages (Hara et al. 1998). This pattern suggests that neither geographical proximity nor drainage divides are likely to fully explain contemporary population structure of C. striata populations. Furthermore, spatial arrangement of genetic diversity in relatively sedentary species often conforms to a pattern of isolation by distance (Wright 1943), where genetic exchange is inversely proportional to geographical distance among sites, with higher rates of gene exchange (and hence similarity) apparent among neighbouring populations. For populations arranged in a linear habitat (such as along a large river system), this expectation can be refined to expectations of Kimura and Weis’ (1964) One-dimensional 61

Chapter 3. Stepping Stone model, or perhaps to Meffe and Vrijenhoek’s (1988) Stream Hierarchy model, where probability of gene exchange (dispersal) within a drainage is influenced directly by the level of freshwater connectivity. For C. striata in the Mekong, these predictions do not seem to be satisfied. Results of the phylogenetic analysis (Chapter 2) revealed that not only are two deeply divergent C. striata lineages present in sympatry in the Mekong, but ancestors of one lineage were more closely related to individuals from southern India, separated by over 2,000kms, than to other individuals collected at the same sampling site.

Understanding C. striata diversity in a regional context Considering that C. striata has an unusual ecology and the importance of this species to human populations, a number of environmental and anthropogenic factors may possibly have influenced contemporary population structure and genetic diversity patterns in C. striata populations in SE Asia. These can be broadly grouped into factors promoting population and/or range expansion, causing possible population declines, and those enhancing population connectivity or promoting admixture. C. striata are known to prefer slow moving shallow water habitat. The formation of the vast shallow Tonle Sap Great Lake in the Lower Mekong Basin during the mid Holocene (Penny 2006; Rainboth 1996a) was likely to have expanded available preferred habitat for the species, and hence could have promoted population expansions. It has also been suggested that recent expansion of rice growing by human populations across SE Asia with associated conversion of natural habitats to paddy fields may have benefited wild C. striata populations (Poulsen et al. 2008), leading to sustained large population sizes. Conversely, habitat loss and overexploitation of wild stocks have the potential to severely impact fish populations over very short time scales (Collares-Pereira & Cowx 2004; Kenchington 2003; Mullon et al. 2005). C. striata populations have been the focus of sustained and intensive harvesting in particular, over the last 50 years (Figure 3.1a), and this could have resulted in marked local declines in genetic diversity. Local declines in populations have been reported in some parts of the species’ natural range (Courtenay & Williams 2004), and these incidents may reflect a wider pattern of decline due to increased harvesting pressure.

62

Phylogeography of C. striata Finally, throughout the Pleistocene, sea levels fluctuated greatly, and large parts of the Sunda and Sahul shelves were repeatedly exposed, connecting freshwater drainages and forming extended river basins (Rainboth 1996b; Voris 2000; Woodruff & Turner 2009). In addition, geomorphological changes in mainland SE Asia are thought to have reshaped drainage lines, both isolating and connecting freshwater habitats (Rainboth 1996a). Expanded freshwater connections among what are now isolated drainage basins, that are known to have enhanced dispersal in other freshwater taxa (de Bruyn et al. 2004; Dodson et al. 1995; McConnell 2004) could have also promoted gene exchange and population admixture in C. striata. Dispersal of C. striata across southern Asia may also potentially have resulted from translocations by humans. Today C. striata is an important, high value food fish in India and in SE Asia. Translocation by humans is common for economically important freshwater taxa, and it is possible that some C. striata lineages may in part, owe their current distribution patterns to human populations moving individuals to new habitats across southern Asia for food or commercial reasons.

63

Chapter 3.

Aims of this chapter This chapter aimed to characterise the levels and patterns of genetic diversity in wild C. striata populations across SE Asia and to determine the pattern of population structure present in this species. Mitochondrial and nuclear microsatellite data were interpreted in a geographical, demographic and historical context to re-construct a model for the microevolutionary history of this species in the region. Specifically, the questions addressed at both local and regional scales include: 

How is genetic variation distributed across the study area?



How does divergence in mtDNA diversity compare with divergence and diversity in nDNA?



o

Do different loci indicate similar patterns of divergence?

o

Is this pattern reflected at the individual as well as at the gene level?

To what extent do divergent lineages occur in sympatry? o



What is the extent of hybridisation among lineages?

Is there any evidence for recent demographic changes that may have been associated with anthropogenic activities?



What does the spatial arrangement of genetic variation reveal about past and present patterns of gene flow, genetic exchange, and colonisation by the species?



How do observed patterns of phylogeography and population structure relate to the ecology and life history of C. striata?

64

Phylogeography of C. striata

METHODS Sample Collection Sampling of C. striata aimed to collect individuals from across the natural distribution of the species in SE Asia at a range of different hierarchical scales. At the finest spatial scale, multiple sites were sampled across the Mekong River Basin to assess variation, dispersal and population structure at the within-drainage basin level. At a higher level of the spatial hierarchy, sites were assessed in the Chao Phraya River drainage basin and Mekong River drainage basin to provide comparisons at the neighbouring drainage scale. Individuals from a third drainage in mainland SE Asia (at Tanjung Karang, Malaysia) were included to provide additional comparison at the regional scale. At the highest level of spatial hierarchy, a site from Sumatra in the Indonesian archipelago and a site from southern India were included to assess divergence/similarity across the full extent of the natural range of the species, spanning a marine barrier to dispersal (Sumatra) and large geographical distance (India) Figure 3.2 and 3.3. Across the Mekong Basin, C. striata samples were collected primarily from local markets. At the time of sampling, fish were positively identified to species level by local government fisheries scientists. All fish sampled were caught locally from the wild, usually within a 15km radius of the point of collection, and fish collected at a single site were assumed to be representative of local population variation at and around that site. Where possible, 30 – 50 individuals per site were sampled to allow for a robust analysis of variation at the local population level (Ruzzante 1998). Fin tissue was abscised from the caudal, pectoral, or dorsal fin and samples sealed individually in vials of 75% ethanol. Additional C. striata samples (fin or muscle tissue) were supplied by collaborators in Vietnam, Malaysia, Indonesia and India. Geographical co-ordinates for each site and details of sample collectors are presented in Table 3.1. Figure 3.2 and Figure 3.3 illustrate location of collection sites. Information for each location including river drainage basin membership and country are presented in Table 3.2. MtDNA and nDNA analyses utilised slightly different sets of samples (see Table 3.2 for details). Colours of collection sites in Figure 3.2 and Figure 3.3 indicate which sites correspond with the mtDNA and/or nDNA analyses. In preparation for genetic screening, total genomic DNA was extracted following Miller et al.’s (1998) standard salt extraction method (Appendix 1).

65

Chapter 3.

Table 3.1 Geographical co-ordinates for C. striata sample sites. Collaborators that assisted with sample collection are presented in the right hand column. See Table 3.2 for site code abbreviations and further sampling details. Collection site Kanakkankadavu Chiang Mai Saraburi Tanjung Karang Lampung Sayaburi Vang Vien Vientiane Songkhram Mukdahan Kong Jeam Burirum Kontum Buon Ma Thot Stung Treng Kratie Snoul Memot Kampong Cham Pha Ao Kampong Chhnang Pursat Battambang Takeo Tinh Bien Tram Chim Tan An Vinh Thuan Phung Hiep

66

Latitude and longitude

Collectors and sampling dates

O

O

Unknown. 2008

O

O

D. Hurwood & E. Adamson. Nov 2005.

O

O

N. Sukmasavin & Thai Dept. of Fisheries. 2007.

O

O

S. Bhassu. 2007.

O

O

E. Nugroho. 2007.

O

O

Boonsong, D. Hurwood & E. Adamson. Aug 2006.

O

O

Boonsong, D. Hurwood & E. Adamson. Aug 2006.

O

O

Boonsong, D. Hurwood & E. Adamson. Aug 2006.

O

O

U. Suntornata, D. Hurwood & E. Adamson. Nov 2005.

O

O

U. Suntornata, D. Hurwood & E. Adamson. Nov 2005.

O

O

U. Suntornata, D. Hurwood & E. Adamson. Nov 2005.

O

O

N. Sukmasavin & Thai Dept. of Fisheries. 2007.

O

O

Phuc Dinh Phan. 2007.

O

O

Phuc Dinh Phan. 2007.

O

O

S. Lieng, E. Adamson & Cambodian DoF. Apr 2007.

O

O

S. Lieng, E. Adamson & Cambodian DoF. Apr 2007.

O

O

S. Lieng & E. Adamson. Apr 2007.

O

O

S. Lieng & E. Adamson. Apr 2007.

O

O

O

O

S. Lieng & E. Adamson. Apr 2007.

O

O

E. Adamson & Cambodian DoF. Apr 2007.

O

O

E. Adamson & Cambodian DoF. Apr 2007.

O

O

O

O

Cambodian DoF. Apr 2007.

O

O

Nguyen Nguyen Du & E. Adamson. Feb 2007.

O

O

Nguyen Nguyen Du & E. Adamson. Feb 2007.

O

O

Nguyen Nguyen Du & E. Adamson. Feb 2007.

O

O

Nguyen Nguyen Du & E. Adamson. Feb 2007.

O

O

Nguyen Nguyen Du & E. Adamson. Feb 2007.

10 08.00’N 76 07.00’E 18 47.00’N 99 00.00’E 14 37.50’N 100 54.00’E 03 25.78’N 101 16.76’E 03 20.30’S 101 10.00’E 19 16.10’N 101 42.68’E 18 56.52’N 102 26.85’E 17 57.61’N 102 36.86’E 17 47.90’N 104 00.43’E 16 32.33’N 104 43.65’E 15 18.88’N 105 30.04’E 14 57.75’N 102 58.20’E 14 19.22’N 108 01.63’E 12 43.80’N 107 55.00’E 13 31.73’N 105 58.26’E 12 29.10’N 106 01.02’E 12 04.56’N 106 25.36’E 11 49.62’N 106 10.86’E 11 59.38’N 105 27.86’E 12 01.70’N 104 51.86’E 12 15.27’N 104 40.13’E 12 32.32’N 103 55.10’E 13 07.59’N 103 12.69’E 10 56.00’N 104 50.00’E 10 37.16’N 105 00.01’E 10 40.20’N 105 33.57’E 09 52.31’N 105 07.45’E 09 30.73’N 105 15.53’E 09 48.75’N 105 49.26’E

S. Lieng, E. Adamson & Cambodian DoF. Apr 2007.

T. Roth, E. Adamson & Cambodian DoF. Apr 2007.

Phylogeography of C. striata

See Figure 3.3

Figure 3.2. Map of Southern Asia showing broad scale sampling sites for C. striata. Stars indicate collection sites. Colour of star indicates type of genetic analysis performed; Yellow stars ( ) Cyt b mtDNA only, and Red stars ( ) both Cyt b and microsatellite DNA.

Figure 3.3. Map of mainland SE Asia showing fine scale sampling sites for C. striata in the Mekong River Drainage. Stars indicate collection sites. Colour of star indicates type of genetic analysis performed; Yellow stars ( ) Cyt b mtDNA only, Green star ( ) microsatellite DNA only, and Red stars ( ) both Cyt b and microsatellite DNA

67

Chapter 3.

Table 3.2. C. striata collection sites and sample sizes. See Figures 3.2 and 3.3, and Table 3.1 for geographical location of sampling sites.

Collection site

Site Code

n mtDNA analysis (total =988)

Kanakkankadavu Chiang Mai Saraburi

K CM CP

3 10 49

10 48

Tanjung Karang

TK

30

30

Malaysia

Lampung

LP

7

24

Indonesia

Sayaburi Vang Vien Vientiane Songkhram Mukdahan Kong Jeam Burirum Kontum Buon Ma Thot Stung Treng Kratie Snoul Memot Kampong Cham Pha Ao Kampong Chhnang Pursat Battam Bang Takeo Tinh Bien Tram Chim Tan An Vinh Thuan Phung Hiep

SB VV VT SM MD KJ NE GL LL ST KK SN ME KC PA KH PS BB TP TT TC TL VI PH

28 2 42 2 21 10 50 39 40 47 49 50 39 50 49 51 56 50 49 42 53 28 42

30 20 42 8 29 11 50 39 40 48 48 48 42 45 42

68

n microsatellite analysis (total = 654)

Country

River Drainage

India

Chalakkudy

Thailand

Chao Phraya Sungai Tengi Eastern Coastal

Lao PDR

Thailand Vietnam (Highlands)

Mekong Cambodia

Vietnam (Mekong Delta)

Phylogeography of C. striata

DNA marker selection Both mitochondrial DNA (mtDNA) and nuclear DNA (nDNA) diversity were assessed for this study. In each case, regions of the genomes were selected to address specific aims of the research, and species specific primers compatible with high through-put screening techniques were designed to target selected loci. A region of the Cytochrome b gene of the mtDNA genome was chosen to address genealogical relationships among individuals and populations. MtDNA is well suited for phylogeographic investigations, and is frequently used to address questions regarding population history, demography, and spatial patterns of genetic diversity and divergence (Avise 2009). MtDNA has a number of characteristics that make it especially useful for intra-specific phylogeographic analysis that have been discussed elsewhere in depth and that are now generally well recognised (Avise 1994, 2000, 2009; Wilson et al. 1985). Briefly, mtDNA is a haploid, maternally inherited, non-recombining, rapidly evolving locus with an effective population size one quarter that of nuclear loci. These characteristics mean that mtDNA provides an unbroken genealogical record of species’ and population histories. Under assumptions of neutrality and coalescent theory, mtDNA diversity can be analysed to infer historical demographic events including; population bottlenecks, range expansions and to map the micro-evolutionary history of a species in a spatial framework. The Cytochrome b gene region (Cyt b) was selected for screening here as this mtDNA region was found in a pilot study to contain suitable levels of variation to address intra-specific relationships among C. striata populations. To further investigate population processes, microsatellite regions of the nuclear genome were selected. Microsatellite loci are simple sequence repeats that are present in high frequencies across eukaryote genomes, that theoretically represent multiple unlinked Mendelian loci (Ellegren 2004; Zhang & Hewitt 2003). Most microsatellites are in noncoding regions of the genome and typically have much higher mutation rates than rates of nucleotide base substitution in mtDNA or nDNA protein coding regions, that generate high levels of allelic diversity at the intra-specific level relevant to addressing fine-scale ecological questions (Schlötterer 2000; Selkoe & Toonen 2006; Zhang & Hewitt 2003). Microsatellite data are used commonly to address questions regarding contemporary levels of population structure, gene flow, demographic history, interbreeding, and relatedness among individuals (Pearse & Crandall 2004; Selkoe & Toonen 2006).

69

Chapter 3.

Molecular techniques - Mitochondrial DNA PCR amplification Initial sequencing was undertaken for a number of C. striata individuals to confirm suitability of Cyt b as a variable marker for addressing phylogenetic and population genetic questions for this species. Several 834 base pair mtDNA fragments encompassing 25 bases of the Glutamate tRNA gene and 809 bases at the start of the Cyt b gene were amplified with primers GLUDG-L and CB3-H (Palumbi et al. 1991) prior to sequencing in both directions. For primer sequences and PCR conditions see Table 2.2 and Table 2.3, for sequencing protocol consult Appendix 2. Initial sequences were aligned by eye using BIOEDIT software (Hall 1999) and a region of partial homology identified from bases 174 to

193 of the Cyt b gene, from which a Channa specific primer was designed; Ch-CBi: 5’-CAT YAC CAC CGY CTT CTC AT-3’. Used in combination with CB3-H, this primer yielded a 657 base

pair internal fragment of the Cyt b gene that was suitable for mass screening with Temperature Gradient Gel Electrophoresis (TGGE). In preparation for screening, the internal Cyt b region was amplified for each sample as follows: 25µL PCR reaction volumes contained 50-200ng genomic DNA, 5pmol of each primer, 1µL of 10mM dntps (Roche™), 2.5µL of 10X PCR Reaction Buffer (Roche™), 1µL of 25mM MgCl2 (Fisher™) and 0.5units of Taq DNA Polymerase (Roche™). PCR cycling conditions began with denaturing at 94oC for 2mins, followed by 35 cycles of 15sec denaturation at 94oC, 15sec annealing at 50oC and 30sec extension at 72oC. A final extension of one minute was followed by a 15oC hold step to complete the reaction cycles. All PCR reactions included a negative control (one reaction with no DNA template). After PCR amplification, reaction success was checked for each sample using agarose gel electrophoresis. Figure 3.4 shows an example of a successful PCR check gel, see Appendix 2 for details. Loading wells bps 1357 1078 872 603 >300

M

S1

S2

S3 S4

S5

S6

-ve

Figure 3.4. Agarose check gel showing positive amplification of Cyt b mtDNA gene fragment. M: Molecular weight marker (Marker 9, Roche), S1-6: PCR product from successful amplification, -ve: negative control.

70

Phylogeography of C. striata

Mass screening using TGGE Variation in Cyt b was screened using Temperature Gradient Gel Electrophoresis in combination with Outgroup Heteroduplex Analysis (TGGE-OHA) (Campbell et al. 1995; Elphinstone & Baverstock 1997). This method provides a relatively low cost, high throughput means for screening variation at mtDNA loci. TGGE is a ‘genetic fingerprinting’ technique that can be applied to identify different DNA sequences (allelic variants) based on their relative electrophoretic mobility when partially melted (denatured) (Lessa & Applebaum 1993; Muyzer & Smalla 1998). OHA increases the sensitivity of TGGE, allowing for detection of differences as small as one base pair between allelic variants (Campbell et al. 1995). In OHA, PCR product from the target DNA fragment of each individual is annealed to PCR product from a single reference sample (outgroup), creating mismatched double stranded DNA fragments of the target DNA region (heteroduplexes). Every heteroduplex is characterised by a precise melting temperature that is determined by the specific nucleotide base mismatches between sample and reference DNA strand. TGGE on highresolution polyacrylamide gels can be used to discriminate between heteroduplexes based on these denaturation temperatures, allowing samples to be classified into haplotype groups based on the banding patterns unique to each haplotype heteroduplex. A C. striata sample collected from Tanjung Karang (Malaysia) was selected as the principal outgroup for Cyt b screening in this study. A second sample, collected from Vientiane (Lao PDR) was used as an alternative outgroup to rerun samples that had failed to produce banding patterns when the first outgroup was used. Samples were screened on a horizontal TGGE system (modelled from a DIAGEN (now QIAGEN) TGGE-System). Each heteroduplex reaction contained ~15ng sample PCR product and ~10ng reference PCR product. After optimisation of electrophoresis conditions, samples were run through 5% polyacrylamide gels at 300V for 3hrs over a perpendicular gradient of 40-60oC. DNA bands were visualised via silver staining. Examples of banding pattern and gel scoring are presented in Figure 3.5. For a detailed explanation of methods, including optimisation of the TGGE-OHA technique for screening Cyt b diversity in C. striata, please consult Appendix 6. Cyt b haplotype frequencies were scored individually for all sample populations. After all samples for each site had been sorted into haplotype classes, single individuals of each 71

Chapter 3. haplotype present for each site were sequenced for the Cyt b fragment using the Ch-CBi primer following the protocol outlined in Appendix 2. The DNA sequences of these individuals were assumed to be representative of all individuals of the same haplotype class at their site of collection. Sequences were aligned by eye using BIOEDIT software and all mutations checked against chromatograph sequencing output. Ends of sequences were discarded and, where this resulted in the loss of unique mutations, haplotype frequencies were pooled to reflect diversity in the shortened gene fragment.

Ref ? A1 A1 A1 A1 ? A1 A1 A1 A2 A1 A3 A2 A1 A2 A4 A3 A2 A1 A1 A1 A4 A1 ? A1 A2 A1

Ref B1 B1 B2 B2 B1 B2 B2 B3 B4 B5 B5 B6 B6 B6 B7 B7 B1 B8 B1 B2 B2 B2 B2 B9 B2 B2 B2

Ref C1 C2 C3 C1 C2 C2 C3 C2 C4 C2 C2 C2 C2 C2 C3 C2 C1 C3 C5 C2 C1 C2 C2 C6 C1 C7 C2

Ref D1 D1 D2 D3 D4 D5 D5 D1 D6 D6 D3 D3 D1 D1 D6 D6 D3 D1 D5 D5 D5 D1 D5 D5 D1 D5 D1

Figure 3.5. Banding patterns observed for four Cyt b TGGE-OHA gels. Letters indicate gel (A, B, C and D), numbers indicate allele classification based on scored banding pattern. The standard outgroup only homoduplex (Ref) is on the far left.

72

Phylogeography of C. striata

Molecular techniques - Microsatellite DNA Isolation of microsatellite loci, primer design and PCR amplification Nuclear DNA diversity was assessed by screening variation at eight microsatellite loci. While microsatellite loci are considered to be ubiquitous in most eukaryote genomes (Ellegren 2004), they are often located within non-coding DNA regions with high mutation rates, meaning that homology in microsatellite priming sites between species is generally low. This often necessitates isolation of taxon-specific microsatellite regions when examining new taxa (Zane et al. 2002). This proved to be the case for C. striata, as no previously published microsatellite primers were available for this, or any other member of the Channidae. Isolation of microsatellite DNA involved: 1) fragmenting C. striata genome, 2) inserting individual fragments into bacterial cells to create a genomic library, 3) screening the genomic library for specific tandem repeat sequences using radioactive probes, and 4) sequencing microsatellite DNA inserts identified in the screening process. See Archangi et al. (2009), Chand et al. 2005 and Appendix 7. Sequences that contained discrete microsatellite repeat regions with large (at least 40 base) flanking regions either side of the repeat were selected for primer design. The online program PRIMER3 (Rozen & Skaletsky 2000) was used to produce a list of primer pairs for amplifying each repeat region. From these pairs, primers were chosen that amplified fragment sizes around the 100-200bp size to facilitate short gel run times and ease of scoring against molecular size standards. Information about the eight primers developed here is presented in Table 3.3. Additional primers not used in this study are presented in Appendix 8. PCR reaction mix was optimised for each primer set by trialling multiple samples under a range of annealing temperatures and MgCl2 concentrations and visualising products on GELSCAN 3000 polyacrylamide gels. All reactions were 12µL in total and contained 5pmol of

each primer, 0.5mM dNTPS (Roche™), 1.2µL 10X PCR Reaction Buffer (Roche™), and 0.5 Units of Taq DNA Polymerase (Roche™). Reactions were performed in an EPPENDORF Mastercycler S under the following cycling conditions: 2min initial denaturation at 94oC; 35 cycles of 15s denaturation at 94oC, 15s annealing and 30s extension at 72oC; final extension ran for 1 min at 72oC. 73

Chapter 3. Table 3.3. Microsatellite primers for C. striata. Repeat motif is reported as observed in sequence used in primer design. Flanking region refers to the length in base pairs of the sequence within the priming region that is not composed of tandem repeats. Size range of PCR products is reported under Allele size, the corresponding number of tandem repeats is reported under Repeat range. For each primer set annealing temperature in degrees Celsius (Ta) and quantity (µL) of 25mM MgCl2 per 12µL were optimised for screening samples on the GELSCAN 3000 System. Primer Sequence

Cs-1 Cs-2 Cs-3 Cs-4 Cs-5 Cs-6 Cs-7 Cs-8

5’-GGC AGT GTT CCA CTC CAG TT-3’ 5’-CCG GGG ATC TTT TCA GTT TT-3’ 5’-GGT TAC ACT GCG GGT CAG AG-3’ 5’-GGA TGG GTC TAA CCT GCC TA-3’ 5’-TGC ACT GTT TCT GAC TAA ATG TG-3’ 5’-TGC CAA ACT AAA CCG ACT TTG-3’ 5’-TCG CAG TTT ATG TAC CGA CA-3’ 5’-CTC CAG GGG AAT TTA CAG CA-3’ 5’-AAA CCC AAA AGC CAC ACT TC-3’ 5’-TGA AAT AGA GCC TGT GAC TGA TG-3’ 5’-ACT TGA CAA AAC CTG CCA CA-3’ 5’-ACT TGT TCT TGG TAG ATG CCA CT-3’ 5’-CTG TGT GAA GCA GCG CAT TA-3’ 5’-GTC CAG TCT AGC AGG AGT AAC GA-3’ 5’-CTC CGA GGA TGT GTC TCT CC-3’ 5’-CTT CAT TTC TCC CCC ACC TT-3’

Repeat

Flanking

Alleles

Repeat

Ta

MgCl2

motif

region

size

range

GT(11)GG-GT(6)

138

162-198

12-30

50

0.1

TG(12)

89

103-131

7-21

56

0.4

TG(14)

83

95-151

6-34

55

0.4

CA(15)

128

146-202

9-37

53

0.4

CA(14)

125

137-197

6-36

59

0.1

AC(20)AT-AC(8)

92

122-172

15-40

59

0.15

GT(14)

122

142-176

10-27

59

0.1

GT(9)

121

133-193

6-36

59

0.15

Mass screening using real-time gel fragment analysis Variation in microsatellite allele sizes for each locus was assessed by high-density gel electrophoresis on the CORBETT Gel-Scan™ 3000 System (QIAGEN). This system utilises temperature stable high-density vertical polyacrylamide gels to separate single stranded DNA products based on their electrophoretic mobility (sequence length). As PCR fragments migrate through the gel, fluorescently tagged DNA is read by a laser to generate digital gel images. In this study primers were fluorescently tagged with HEX dye (Geneworks). Digital images were then scored using ONE-Dscan 2.05 software (SCANALYTICS) to determine length of alleles (genotype) for each sample. PCR product was denatured at 95oC in the presence of urea to create single stranded DNA prior to electrophoresis. This product was run through 18.5cm 5% polyacrylamide gels at 1500V to separate alleles based on size. To control for consistency in size scoring across gels, each gel was run with a minimum of 3 molecular size standards (Tamra350, APPLIED BIOSYSTEMS) and also included internal standards for each locus that were created by

74

Phylogeography of C. striata mixing PCR products representing a range of pre-determined allele sizes. Figure 3.6 illustrates digital image and gel scoring. Full screening protocol can be found in Appendix 9. After scoring, data for all loci were collated in spreadsheet format using Microsoft EXCEL , and then data checked with MICROCHECKER software Ver2.2.3 (Van Oosterhout et al.

2004) to identify possible genotyping errors. Genotyping errors result potentially from PCR amplification failure due to mutations at priming sites (null alleles) or preferential short allele amplification (large allele dropout), or from mis-scoring, for example scoring of stutter bands (shorter PCR product that is formed by polymerase slippage during PCR amplification) or typographic error during data collation.

(a)

(b)

m (c)

m

m

mr (d) allele

stutter band

heterozygote homozygote

Figure 3.6. Microsatellite Gel images. (a) raw digital image produced on the GELSCAN™ system, (b) area of gel with microsatellite allele banding pattern: m = molecular size standard, r = reference of pre-scored alleles, (c) microsatellite gel after scoring genotypes with ONE-Dscan software, (d) detail from (b) showing the genotypes of two individuals. All images are alleles of Cs-2 locus amplified for C. striata individuals sampled at Tan An, Mekong Delta, Vietnam.

75

Chapter 3.

Statistical Analyses Mitochondrial DNA analysis Diversity FINDMODEL (Tao et al. 2009) was used to identify the most appropriate model of

evolution for the dataset via the Akaike information criteria (Posada & Buckley 2004), and where possible, all subsequent analyses utilised the closest available evolutionary model. Two approaches were employed to describe the relationship between all C. striata Cyt b haplotypes. Firstly, neighbour joining (NJ) and Bayesian phylogenetic tree building methods were used to construct gene trees of all haplotypes. In both cases a C. marulia Cyt b sequence (AY763771: Ruber et al. 2006) was included to root the tree. The NJ method, that recovers the specific tree topology that minimises the sum of all branch lengths, was implemented in MEGA Version 4 (Tamura et al. 2007; Tamura et al. 2004) under the Maximum Composite Likelihood (MCL) model of evolution (Tamura & Nei 1993; Tamura et al. 2004), and tested with 1,000 bootstrap replicates. Bayesian inference of phylogeny was implemented in MrBAYES Version 3.1.2 (Ronquist & Huelsenbeck 2003). The analysis was run for 20,000,000 generations with a 4by4 nuclear model, nst of 6, and invariable sites with gamma evolutionary model settings. A second approach employed a maximum parsimony-median joining (MP-MJ) network to construct an estimate of the evolutionary relationships among haplotypes using NETWORK software Version 4.5.1. (Bandelt et al. 1999), with all characters weighted

equally. Among methods of network estimation, the MP-MJ method provides the best estimate of true genealogy, especially when node (internal) haplotypes are absent from the data set (Cassens et al. 2005). The star contraction algorithm (Forster et al. 2001) was applied to the network to identify star-like clusters that may be indicative of historical population expansions (Slatkin & Hudson 1991). Conservatively, the contraction radius (Δ) was set to 1 and the network only subjected to a single round of contraction. Only star-like clusters whose ancestral state was supported by a wide geographical distribution (Crandall & Templeton 1993; Fedorov et al. 2008) and which represented more than 10% of all samples were retained (Forster et al. 2001). To further elucidate the evolutionary divergence among major clades identified in tree and network analyses, Tamura-Nei (1993) corrected distance and Maximun Composite Likelihood net evolutionary divergence (Da)

76

Phylogeography of C. striata (Tamura & Nei 1993; Tamura et al. 2004) were calculated among clades, using the software DAMBE Version 5.5.1 (Xia & Xie 2001) and MEGA software respectively.

Three measures of diversity were calculated to describe mtDNA variation at each site, and also within drainages where multiple samples were available. This permitted comparisons of diversity estimates among population samples. The first measure, haplotype diversity (Hd), is the probability that two randomly chosen haplotypes within a sample will be different (Nei 1987). The second measure, nucleotide diversity () (Tajima 1983), is the mean number of pair-wise nucleotide differences between all individuals in the sample; this measure is more robust for describing variation where populations possess different sample sizes. Finally, Theta S (S) was also calculated (Watterson 1975). This measure is based on the number of segregating (polymorphic) sites among haplotypes in a sample, and is therefore useful for indicating which sites represent a mix of individuals with divergent haplotypes. All diversity measures were calculated using ARLEQUIN software Version 3.1 (Excoffier et al. 2005). Each sample and drainage (pooled sample) was also tested for deviation from mutation-drift and gene flow-drift equilibrium. Deviation from neutral expectations may be evidence for past demographic events such as population growth or the presence of population sub-structuring within a sample. Statistics for detecting such deviations can be divided into three classes (I, II, and III) based on the information they incorporate (RamosOnsins & Rozas 2002). Here, Tajima’s D (Tajima 1989), that compares the number of segregating sites to nucleotide diversity in a sample (Class I), was used to test for deviation from neutrality due to selection, population bottleneck, or admixture (Rand 1996). Fu’s FS (Fu 1997), that compares  estimated from nucleotide diversity with the expected number of haplotypes under Ewens’ (1972) distribution given the sample size (Class II) was used to detect past fluctuations in population size, and R2 (Ramos-Onsins & Rozas 2002), that uses information from the mismatch distribution of pair-wise differences (Class III) was also employed to examine demographic events. Tajima’s D and Fu’s FS were calculated in ARLEQUIN, and Rasmos-Onsins and Rosas’ R2 was calculated in DNASP Version 5 (Librado &

Rozas 2009). Significance values for all three tests were calculated using coalescent simulations implemented in DNASP (given θ, with 1,000 replicates for each simulation), and adjusted for family-wise error rate using the False Discovery Rate Procedure (Benjamini & Hochberg 1995; Verhoeven et al. 2005). In addition, FS and R2 were calculated for each clade independently to look for evidence of clade specific demographic 77

Chapter 3. expansion. Where R2 reflected a distribution expected under the population expansion model (Rogers & Harpending 1992), τ was calculated to investigate the length of time since population expansion [mutational units of time (τ) =2ut, where t is time (in generations) since population expansion given the mutation rate of the whole fragment u (Rogers & Harpending 1992)]. τ was calculated using DNASP software. Population structure To examine the level of differentiation among samples and to investigate spatial population structuring, pair-wise ST analysis (Excoffier et al. 1992), that partitions genetic variation within and among sites, was performed in ARLEQUIN. The analysis used Slatkin’s linearized measure of genetic distance (Slatkin 1991) with Tamura and Nei (1993) distance method, and significance of each pair-wise comparison was tested with a nonparametric permutation procedure (incorporating 10,000 iterations). Significance values were adjusted to account for family-wise error rate using the False Discovery Rate Procedure (FDR) (Benjamini & Hochberg 1995) (Benjamini & Hochberg 1995; Verhoeven et al. 2005), that accounts for the increased risk of type 1 error (finding a significant result by chance) when multiple statistical tests are performed (Verhoeven et al. 2005). Spatial analysis of molecular variance (SAMOVA) (Dupanloup et al. 2002) was undertaken to identify groups of samples that were most similar. In this analysis a simulated annealing approach was applied to maximise the proportion of genetic variation between groups of samples (CT), incorporating information on haplotype divergence and geographical proximity. Data were forced into k groups (where k = 2-28) to identify the combination of sites that resulted in the highest CT, and hence to identify the strongest signature of population structure (Dupanloup et al. 2002). Spatial data were entered as geographical co-ordinates (latitude and longitude) to account for the possible modes of dispersal for this species (overland or via permanent freshwater connection). To further test for significant spatial structuring in patterns of genetic differentiation, a Mantel’s test for isolation by distance (IBD) was conducted. Straight line distances for each pair-wise site comparison were log transformed prior to correlation with ST values. Calculations were performed in ARLEQUIN, with significance calculated from 10,000 permutations. Mantels tests were applied to the whole data set, and then to mainland SE Asian sites, excluding the southern Indian site (K) and the Sumatran site (LP). These sites are isolated from mainland SE Asian sites by extensive expanses of agricultural land (>1,500kms) and marine environment (Strait of Malacca) respectively, and so neither site is 78

Phylogeography of C. striata likely to contribute contemporary migrants to mainland SE Asian sites. Sites in the Lower Mekong Basin were also tested for IBD independently, as this represents a vast area of suitable habitat with no obvious barriers to dispersal, and so sites may be less influenced by potential in-stream barriers including rapids (i.e., the Khone Falls), that could mask IBD by establishing high genetic gaps over short geographical distances.

Nuclear DNA analysis Diversity Raw microsatellite data were summarised into allele frequencies for each locus at each site using the software CONVERT (Glaubitz 2004). This software was also used to write data files for subsequent analyses using other statistical software packages. To enable comparison of diversity across sites, allelic richness (A) was calculated for each locus at each site in the software program FSTAT Version 2.9.3.2 (Goudet 1995), that corrects for different sample sizes using rarefaction (Leberg 2002; Petit et al. 1998) (in this study the smallest sample was 5 for site SM at the Cs-8 locus). To test for presence of significant associations between alleles across microsatellite loci (i.e., linkage), the likelihood-ratio test of linkage disequilibrium (Slatkin & Excoffier 1996) was conducted for all pair-wise locus combinations for all sites using the EM algorithm, as implemented in ARLEQUIN with 10,000 permutations. An exact test was used to determine the statistical significance of the magnitude of linkage disequilibrium. Detection of significant linkage disequilibrium between loci may be evidence for physical proximity on a chromosome, or indicate that the sample is not representative of a population at Hardy-Weinberg Equilibrium (HWE). A sample may show HW disequilibrium if it constitutes a mix of populations (stocks), cryptic species, or individuals that are not mating at random. ARLEQUIN was also used to perform exact tests (Guo & Thompson 1992) to check data

for each locus for deviation from HWE expectations at each site. Deviation from HWE is important to detect as equilibrium is an inherent assumption for population genetic analysis, and deviation may have important biological implications, such as non-random mating within the sample. More generally, testing for deviation from population mutationdrift equilibrium can reveal historical demographic events. Two analyses were performed for each site to specifically investigate recent population demography. The first analysis assessed deviation from HWE against a distribution obtained from coalescent simulations 79

Chapter 3. under a microsatellite model of mutation using a Wilcoxon’s test (2-tailed). This analysis was performed with BOTTLENECK software Version 1.2.02 (Piry et al. 1999) under the two phase model (TPM) (incorporating 95% stepwise mutation and 5% infinite allele model) with variance of 12 and 100,000 iterations. Significance values were adjusted for linkage, exact HWE and Wilcoxon’s tests following the FDR procedure. Secondly, the Garza-Williamson Index (M: essentially the number of alleles divided by the allelic range) (Garza & Williamson 2001) was calculated in ARLEQUIN. This statistic, that can be averaged across loci, ranges from 0 to 1, with a value of 1 indicating historically stationary (stable) population size, and very small values indicating a recent reduction in population size (bottleneck). As a general rule, if M < 0.68, it can be assumed that there has been a recent reduction in population size (given a data set of seven or more loci) (Garza & Williamson 2001). Population structure Three pair-wise estimators of differentiation were calculated to assess levels of population sub-division among sample sites. FST and RST analogues (Slatkin 1991, 1995) were calculated in ARLEQUIN with 10,000 permutations, with significance values corrected using the FDR procedure. FST considers the number of alleles and their frequency, and is appropriate for describing differentiation when sub-populations are weakly structured, while RST estimates, that account for microsatellite size variation under the stepwise mutation model, outperform FST when structure is more pronounced (and migration lower) (Balloux & Goudet 2002). While both FST and RST analysis are used commonly to assess population differentiation, it has increasingly been recognised that both methods fail to recover the “true” magnitude of differentiation when diversity and/or differentiation is high (Balloux et al. 2000; Jost 2008). To avoid this bias, a third measure of differentiation, Jost’s Dest (estimator of actual differentiation) (Jost 2008) that avoids the statistical problems inherent in fixation indices, was calculated for each locus using SMOGD software (Crawford 2010), and data combined across all loci by taking the harmonic mean approximation proposed by Chao (2009)(http://www.ngcrawford.com/django/jost/). FST and RST values were plotted against Dest to assess the similarity of estimates qualitatively for each pair-wise comparison. Two visual methods were adopted to illustrate the magnitude and pattern of differentiation among samples. Firstly, Dest values were used to construct a population tree using the NJ method implemented in MEGA Version 4 to visualise the relationship among 80

Phylogeography of C. striata sites. An alternative approach, factorial correspondence analysis (FCA) (Benzécri 1973), was used to plot all individuals in p-dimensional hyperspace (where each allele defines a dimension) and to determine each sample’s centre of gravity (midpoint of all individuals) within the hyperspace. This was then displayed as plots of all individuals and of all populations in 3-dimensional space with axes corresponding with the three factors that account for the most variation within the data set (Jombart et al. 2009). FCA analysis was undertaken using GENETIX software Version 4.05.2 (Belkhir et al. 1996-2004). To examine the spatial distribution of genetic diversity quantitatively, the significance of the correlation between FST and straight line distance (log transformed) was tested using a Mantel’s test (10,000 permutations), performed in ARLEQUIN. To further test for patterns in the spatial arrangement of genetic similarity, genetic distances estimates among sites within the Mekong River Basin were mapped onto the stream sections that connect them, following the Stream Trees method proposed by Kalinowski et al (2008), that uses least squares (regression) analysis to fit genetic distance to river sections. Unlike straight tests for IBD, the Stream Trees model allows for individual sections in a drainage network to represent different degrees of genetic distance, allowing for the possible influence of instream barriers to create large genetic gaps over short geographical distances. Similar to the predictive Stream Hierarchy Model (Meffe & Vrijenhoek 1988), the Stream Trees model assumes that, in the absence of headwater exchange, large fluctuations in population size or differential life history traits, genetic difference will accumulate across a drainage network (Kalinowski et al. 2008). Regression was performed in the STREAM TREES software package (Kalinowski et al. 2008) using Dest distance estimates. Finally, two clustering methods were used to assign individuals into relatively homogenous groups. In the first method, individuals were assigned to populations using a Bayesian model-based clustering approach that classifies individuals into K groups, optimising HWE and LE for each group (Guillot et al. 2009; Pritchard et al. 2000). Clustering was applied to the full data set, and also independently across a reduced data set constituting a subset of sites where one site was not significantly differentiated from neighbouring sites, but showed strong evidence for HW and linkage disequilibrium. The analysis was performed at the smaller scale because in simulations, Bayesian clustering has been shown to only detect the uppermost hierarchical level of population structuring when multiple layers of sub-structuring are present (Evanno et al. 2005), and hence analysis of smaller scale (geographically localised) data sets may avoid including individuals from top 81

Chapter 3. tiers in the hierarchical structure, the presence of which would obscure detection of fine scale sub-structuring. Bayesian clustering was implemented using the program STRUCTURE Version 2.2 (Falush et al. 2003, 2007; Pritchard et al. 2000), with the majority of analyses performed on the CBSU bioHPC facility (http://cbsuapps.tc.cornell.edu/structure.aspx). Analyses assumed an admixture model, with a constant λ of 1.0, and were run for a MCMC chain length of 1,000,000 following a burn-in of 500,000 (these values were chosen after preliminary analysis indicated summary statistics (FST, α, likelihood, and log probability (LnP(D)) had stabilised after this run length). For analysis of the reduced data set, 20 replicates of each K from 2 to 2n (n=4) were performed. To reduce computational time for the full data set, only 10 replicates of each K from 2 to 2n (n=19) were performed. Following advice from Evanno et al (2005), to determine the optimal number of groups, LnP(D) for each K across all replicate runs were collated and the rate of change in LnP(D) assessed by plotting ΔK (Evanno et al. 2005) using EXCEL and SIGMAPLOT softwares. The analysis in which K corresponded with the highest peak in the distribution of ΔK was then considered to be the best estimate of population groupings, and the probabilities of assignment to this number of individual clusters were aligned across replicates using the program CLUMPP Version 1.1.2 (Jakobsson & Rosenberg 2007b), using the appropriate algorithm determined by calculation of D for each method as defined in the user manual (Jakobsson & Rosenberg 2007a) (The “Greedy” algorithm in each case). Results were visualised using DISTRUCT version 1.1 (Rosenberg 2004). The second clustering method was employed specifically to examine the extent of possible introgression between the two lineages earlier identified in the phylogenetic analysis (Chapter 2). In this approach, individual genotypes were assigned to one of two clusters (K =2 following the a priori assumption that introgression has occurred between two C. striata groups, previously referred to as the EA lineage and the MM lineage (see Figure 2.3). This analysis was implemented using FLOCK software (Duchesne & Turgeon 2009). The non-Bayesian approach adopted in FLOCK attempts to group contemporary admixed individuals “along their ancestral differentiation lines”, and has been shown to perform better than the Bayesian algorithm implemented in STRUCTURE when pure genotypes (non-introgressed individuals) are absent from the data set (Duchesne & Turgeon 2009). After all individuals have been assigned to ancestral clusters, it is possible re-allocate individuals following multilocus maximum likelihood (Paetkau et al. 1995). For 82

Phylogeography of C. striata each individual, log-likelihoods were estimated for membership to each of K reference clusters (K=2). The difference in log-likelihood allocation (LLOD score) was then calculated for each individual, by subtracting the likelihood of membership to reference 2 from the same value for reference 1. In this way, it follows that individuals with a high probability of membership to reference 1 and low probability of membership to reference 2 will have large positive LLOD scores, and conversely, a large negative LLOD score indicates a high probability of membership to reference 2. Individuals that have roughly equal likelihoods of membership will have LLOD scores close to zero, and are interpreted as admixed. Here, mean LLOD scores for each sample (representing the mean of differences in loglikelihood allocation to the reference group for each individual) were plotted to generate an admixture map for C. striata sample sites. This approach pinpoints sites that may represent hybridisation zones between the two lineages. In addition, LLOD scores where plotted for each individual that had also been genotyped for their mtDNA to examine the level of nuclear introgression between mitochondrial clades.

83

Chapter 3.

RESULTS MtDNA diversity and phylogeography A survey of 988 C. striata individuals collected from 28 sites identified a total of 70 unique haplotypes for the 570bp Cyt b mtDNA fragment. Among haplotypes, 96 sites were variable (Table 3.4), with 12 representing non-synonymous mutations. Frequencies of haplotypes for each site are listed in Table 3.5. Of the 70 haplotypes observed, 41 (accounting for 17.31% of all individuals sampled) were detected at only a single site (private haplotypes), and of these 23 haplotypes were detected in only a single individual (singletons).

Phylogeography All 70 haplotypes could be assigned to the three distinct lineages detected in the phylogenetic study (Figure 3.7). The West Asian lineage (WA) was represented by a single haplotype detected in all three individuals surveyed from southern India (haplotype 65). Five haplotypes were classified as belonging to the Middle Mekong lineage (MM) (haplotypes 66-70), that were found in 51 individuals from 6 sites in the Mid-upper Mekong River Basin (sites SB, VV, VT, SM, KJ and ST, Figure 3.3). The majority of individuals assessed for Cyt b (83.5%) possessed haplotypes from the East Asian lineage (EA) (haplotypes 1- 64), that were found across all eastern sampling sites except for SM and VV (n=2 for each). Within the EA group, all samples from Sumatra (site LP, haplotypes 60-64) and samples from Malaysia (site TK, haplotypes 49 and 50) formed distinct clades (see Figure 3.7). The relationship between haplotypes is also illustrated in Figure 3.8. The EA Cyt b clade was the most diverse, with a maximum of 16 mutations present among the 64 haplotypes. The other eastern clade, MM, was detected less often in sampling, and was less diverse, with a maximum of 5 mutations among MM haplotypes. The MM clade was highly divergent from the widespread EA clade, with a minimum of 39 base pair mutations present between haplotypes from each clade, representing a minimum divergence of 7.45% corrected distance (Tamura-Nei 1993), with net evolutionary divergence between groups of 0.071 (SE = 0.013)(MCL) . The WA haplotype is more closely related to the EA clade than to the MM clade, with minimum divergences of 3.44% (19 mutations) and 6.02% (31 mutations) corrected distances, respectively. This pattern of divergence between major clades indicates that the split between the common ancestors of the MM

84

Phylogeography of C. striata clade and other C. striata populations is likely to pre-date divergence between the WA and EA lineages, despite the current occurrence of MM and EA clades in sympatry at four sites in the Mekong River Basin (sites SB, VT, KJ and ST). Table 3.4. Variable sites for 70 haplotypes of 570 nucleotide bases of Cyt b for C. striata.

Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap Hap

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70

11111111111111112222222222222222222223333333333333333444444444444455555555555555555 233446789900112223334677790011123344455566778990112244555777889001455667888900112233444555667 239739283840925676792592512721703624713635817365079581725157238473251139025678917363547369289140 GTACATCTCGCCATAACAATTCCGTATCTCTTTCTTAGCTCAAACCCAATCCCCCCATATTTCAATGTCGTTCTATGCCGGCATAACCCCCCCATG ..................................................................................G............. ..........................................................G....................A..G............. .............................T...............T............G...........C......................... .............................................T............G.............................A....... .............................................T............G..................T..........A....... .............................................T............G.........................G...A....... .............................................T..........G.G.........................G...A....... ..............G..............................T............G.........................G...A....... .............................................T............G................A.................... .............................................T............G...............................T..... ...........T.................................T............G...............................T..... .............................................T............G...............................T..G.. .............................................T............G.................................T... .........................................G...T............G.......A.........................T... ........................................T....T............G..................................... .....................................A.......T............G..........................G.......... .....................................A....................G..................................... .....................................A....................G..C.................................. ..........................................................G..C.................................. ................................C.........................G..C.................................. ..........................................................G..C..................A............... .........................................................CG..C.................................. .........A................................................G.CC.................................. .........A................................................G..................................... ..........................................................G..................................... ..........................................................G......C.............................. ...................................C......................G..................................... .C........................................................G..................................... ..........................................................G....................................A ......................T...................................G..................................... .............C............................................G..................................... .......................................C..........T.......G..................................... ..........................................................G.........T........................... ..........................................................G.............A....................... ...............G..........................................G..................................... ........A.................................................G..................................... ...T......................................................G..................................... .................G........................................G..................................... ..................................C.......................G..................................... ..........................C...............................G..................................... .........A............................................T...G..................................... .........A.............A..............................T...G..................................... .........A...................................T........T...G..................................... .........A...................................T.....T..T...G..................................... .........A...................................T........T...G..............................T...... .........A...................................T........T...G.................A................... .........A...................................T........T...G.....................A............... .........Ac.........................T........T........T..tG........................C............ .........A........G.................T........T........T..tG........................C............ .........A...............G........C..........T........TT..G...............G.....A.A.....c....... .........A...............G........C..........T........TT..G...............G..................... .........A...............G........C..........T........TT..G..................................... .........A...............G........C..........T........TT..G..............C...................... .........A.......G.......G........C..........T........TT..G..................................... A........A.......G.......G........C..........T........TT..G..............C...................... .........A.......G.......G........C..........T........TT..G..........A...C...................... ..G......A...............G...................T........T...G..............C............c......... ..G......A...............G...................T........T...G...........................T......... .........A...C..........CG.........C.........TA.......T...G....G.................T.......c...... .........A...C..........CG.........C.........T..G.....T...G....G.................T.............. .........A...C..........CG.........C.........TA.G.....T...G....G.................T.............. ........TA..............C.............T.T....T........T...G....G.........C.....A.T.............. ........TA..............C.............T.T....T........T...G....G.........C.......T.............. .....C..T..........C.T..C........T.C....T.GG.T.C......T..........C...A........TAA.T........T.... .C...CT..AT.G...TC.C.T..C..T..CC...C...CT.GGTT.C.C..TTT..C.CC.TGGC.C..CC.C.....AA......T........ .C..GCTC.AT..T..TC.C.T..C..T..C....CG..CT.GGTT.c.C..TTT..C.CC.TGGC.C..CC.C.....AA.A....T......C. .C..GCT..AT.....TC.C.T..C..T..C....CG..CT.GGTT.C.C..TTT..C.CC.TGGC.C..CC.C.....AA......T......C. .C..GCT..AT.....TC.CAT..C..T..C....CG..CT.GGTT.C.C..TTT..C.CC.TGGC.C..CC.C.....AA......T......C. .C...CT..AT.....TC.C.T..C..TC.CC.T.C...CT.GGTT.C.C..TTT..C.CC.TGGC.C..CC.C.....AA......T......C.

85

Chapter 3. Table 3.5 C. striata mitochondrial Cyt b haplotype frequencies for total sample and for each site individually (n = 988) clade EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA

haplotype 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

86

site n

K 3

CM 10

CP 49

TK 30

LP 7

SB 28

VV 2

VT 42

SM 2

MD 21

KJ 10

NE 50

GL 39

LL 40

ST 47

KK 49

0.013

SN 50

ME 39

0.04

0.10

KC 50

PA 49

KH 51

PS 56

BB 50

0.02

0.02

0.10

0.005

TP 49

TT 42

TC 53

0.02

0.02

0.02

VL 28

0.09

0.001

0.02

0.008

0.08

0.007

0.02 0.06

0.08

0.02

0.001

0.02

0.027

0.67

0.04

0.001

0.02

0.009

0.23

0.001

0.03

0.007

0.02

0.04

0.004

0.10

0.02

0.001

0.04

0.02 0.06

0.010

0.02

0.08

0.02

0.00

0.005

0.02

0.001

0.04

0.19 0.07

0.03

0.001

0.02

0.004

0.08

0.076

0.20

0.28

0.36

0.20

0.006

0.18

0.05

0.06

0.05

0.14

0.10

0.06

0.02

0.10

0.02

0.021

0.23

0.005

0.16

0.11

0.003

0.02

0.04

0.001

0.02 0.14

0.20

0.08

0.51

0.67

0.48

0.56

0.58

0.76

0.39

0.61

0.62

0.41

0.021

0.77 0.50

0.001

0.02

0.036

0.61

0.001

0.02

0.003

0.03

0.001

0.02

0.02

0.02

0.001

0.02

0.002 0.079

0.02

0.00

0.007

0.328

PH 42

0.05 0.86

0.30

0.64

0.12

0.33

0.04

0.45

Phylogeography of C. striata Table 3.5 continued clade EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA WA MM MM MM MM MM

haplotype 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70

site n

K 3

CM 10

CP 49

TK 30

LP 7

SB 28

VV 2

VT 42

SM 2

MD 21

KJ 10

NE 50

GL 39

LL 40

ST 47

KK 49

0.002

SN 50

ME 39

0.02

0.03

0.004

KC 50

PA 49

KH 51

PS 56

BB 50

0.02

0.02

0.04

0.98

0.009

PH 42

0.02

0.02

0.02

0.29

0.31

0.02

0.32

0.013

0.46

0.012

0.70

0.001

0.10

0.002

0.14

0.02

0.04

0.001

0.10

0.032

0.65

0.029

0.97

0.001

0.03

0.004

0.08

0.012

0.10

0.22

0.013

0.12

0.05

0.08

0.02

0.002

0.05

0.001

0.02

0.040

0.12

0.007

0.04

0.002

0.06

0.02

0.29 0.02

0.05

0.001

0.03

0.002

0.29

0.001

0.14

0.002

0.29

0.001

0.14

0.001

0.14 1.00

0.005

0.10

0.001

0.50

0.50 0.07

0.69

0.009 0.001

VL 28

0.02

0.043

0.035

TC 53

0.04

0.001

0.003

TT 42

0.04

0.002 0.002

TP 49

0.19

0.30

0.02

0.50

0.50

87

Chapter 3.

35 30 37 36 39 38 41 31 34 26 33 29 27 32 1 2 3 40 28 22 20 23 21 19 18 17 16 4 15 14 12 13 11 10 5 -6 0.95 7 8 9

EA

25

97 --

73 --

-0.52

24 42 43 44 46 88 49 1.00 50 45 47 58 59 71 51 0.65 52 -- 53 0.97 54 55 56 57 97 64 0.86 63 61 99 1.00 62 60

Malaysia

Sumatra

WA

65 99 1.00

India

66 89 0.96

70 68 67 69

MM

0.02

Figure 3.7. Neighbour joining tree for C. striata mtDNA Cyt b haplotypes. Selected NJ bootstrap support values presented above the line, Bayesian posterior probabilities below the line. Three major clades indicated by grey vertical bars: EA = East Asian, WA = West Asian, and MM = Middle Mekong.

88

Phylogeography of C. striata

Figure 3.8. Median-Joining Network of C. striata Cyt b haplotypes. Open circles represents individual haplotypes, size of circle indicates relative frequencies (see Table 3.5 for absolute frequencies). Lines between haplotypes represent single mutational changes, where additional mutational changes have occurred small black circles represent hypothetical intermediate haplotypes not detected in sampling. Dotted lines show alternative connections. Three clades indentified from phylogenetic analysis: EA = East Asian, WA = West Asian, and MM = Middle Mekong.

89

Chapter 3. Figure 3.9 presents the geographical distribution of the three major C. striata Cyb b mtDNA clades, illustrating that the two eastern lineages, EA and MM, occur together in the upper -mid Mekong River Basin. The EA lineage is distributed across SE Asia, including the island of Sumatra. In Figure 3.10 the MJ network has been colour-coded by sections of drainage basins, to illustrate the diversity and to describe the genetic relationships found within and across four sample drainages in SE Asia. The single sites from Malaysian and Indonesian drainages (coloured blue and green) have unique sets of haplotypes. In both cases, haplotypes detected in these drainages grouped together (Figure 3.7) and these clades formed tips in the MJ network (Figure 3.8 and Figure 3.10). Tip haplotypes are generally considered under coalescent theory to be recently evolved (Castello & Templeton 1994; Crandall 1996), representing new genetic variants that have not had sufficient time to disperse widely across the distribution at large. Haplotypes found in Indonesia represent two tip lineages that are as closely related to each other as they are to an internal haplotype (haplotype 44; coloured purple and red) that was found in the northern Chao Phraya and the upper Mekong. Haplotype 44 was also the most closely related haplotype to Malaysian haplotypes. Haplotypes, such as haplotype 44, that have internal positions in a network are generally considered ancestral types (Castello & Templeton 1994; Crandall & Templeton 1993; Donnelly & Tavaré 1986), especially when their distributions are widespread (as in this case across two drainages). Therefore, it is likely that both the Malaysian and two Indonesian lineages (haplotypes 60-62 and 63-64) identified here represent descendants of separate divergence events from this ancestral EA haplotype. Along with haplotype 44, the Chao Phraya drainage in Thailand (coloured purple) has 6 unique haplotypes. Four of these haplotypes (haplotypes 45-48) appear to have evolved directly from the ancestral haplotype 44. Fifteen of the sixteen individuals with the other two unique Chao Phraya haplotypes (haplotypes 51 and 52) were detected in the Lower Chao Phraya at Saraburi (site CP). These haplotypes were most closely related to haplotype 53, that is found in the lower Mekong Drainage at sites in southern and western Cambodia, along with four other related haplotypes (haplotypes 54-57, yellow in Figure 3.10). All have distributions restricted to the lower Mekong south of Kratie (site KK on the main Mekong channel in central Cambodia above the Mekong – Tonle Sap confluence (Figure 3.3).

90

Phylogeography of C. striata

Figure 3.9. (a) Map of southern Asia showing geographical broad distributions of three Cytb mtDNA clades, and (b) Median-Joining network illustrating relationship between clades.

Figure 3.10. (a) Median-Joining network of C. striata Cyt b mtDNA haplotypes, with colours indicating section of drainage basin where haplotypes were detected, (b) Map of SE Asia with drainage basin sections indicated by colours.

91

Chapter 3. In total, 56 haplotypes were detected from sites in the Mekong River Basin. Sites in this drainage were divided into three groups in Figure 3.10 (red, yellow and orange) to reflect their genetic compositions. Upper Mekong sites (sites SB, VV, VT, and SM – coloured red) had the highest frequency of MM clade haplotypes, that represented 69% of all C. striata sampled in this river section, and comprised fifty-one of the fifty-four individuals from this clade sampled across the entire drainage. This Upper Mekong section was also the only region within the Mekong Basin where haplotype 44 was detected, as well as two other unique haplotypes (haplotypes 42 and 43 found exclusively at site SB, the most upstream Mekong Basin sampling site). Sayaburi (SB) had a much lower frequency of MM individuals (~7%, n = 28) when compared with the other three sites in the Upper Mekong group (which had MM frequencies of 100% (n=2), 97% (n=42) and 100% (n=2) for sites VV, VT and SM, respectively). The suite of haplotypes detected at Kontum (site GL) in the Vietnamese highlands represents a distinct component of the Mekong River Basin C. striata mtDNA diversity, and so this river section (coloured orange in Figure 3.10) appears to be independent from other downstream Mekong Basin sample sites. All haplotypes in GL belong to the EA Clade; however, they are not all closely related to each other within the clade. For example, haplotypes 58 and 59 (coloured orange to the right of Figure 3.10a) are more closely related to Upper Mekong (red), Chao Phraya (purple) and even to Malaysian (blue) haplotypes than to other GL types (haplotypes 7, 9, and 17 to the left of Figure 3.10a). In addition, four of five GL haploytpes are unique to this site, forming tips in the MJ network. The remaining 18 sites in the Middle and Lower Mekong River Basin were grouped together in Figure 3.10 (coloured yellow) because, at these sites, haplotypes formed starlike clusters in the MJ network. A total of twenty-one haplotypes were considered to belong to star-clusters (haplotypes 19-23 and 26-41), illustrated in Figure 3.11a. These haplotypes were found in a total of 632 individuals in the Middle and Lower Mekong River Basin, accounting for more than 81% of all individuals sampled from this region of the drainage (illustrated in Figure 3.11b). The occurrence of star-like patterns suggests that C. striata populations have undergone significant population size expansions in the relatively recent past (Forster et al. 2001; Slatkin & Hudson 1991). The geographical range of star haplotypes suggests that population size expansion may have been limited to populations in the Middle and Lower Basin (excluding site GL in the Vietnamese highlands), and the high frequency of star-haplotypes in this region suggests that new mutations retained

92

Phylogeography of C. striata during such an expansion may have contributed significantly to genetic diversity at these sites. Evidence for recent population expansion of the EA clade is supported by significant values for Ramos-Onsins and Rozas’ R2 (R2 = 0.0258, p = 0.030) and Fu’s FS (FS = -25.240, p ≤ 0.001) consistent with population growth. Tau (τ) = 0.963 for this clade, that, given a generation time of 2yrs (Ali 1999; Kilambi 1986) and a mutation rate for perciform Cyt b of 1-2% per million years (Cárdenas et al. 2005; Johns & Avise 1998), potentially indicates that the EA mtDNA clade expanded in the Late Pleistocene between 338-169 Kya. Site specific and drainage wide tests for deviation from mutation-drift and gene-flow drift equilibrium, however, did not generally support the hypothesis for population expansion, except for FS in the Mekong Basin (Table 3.6).

Figure 3.11, (a) Median-Joining Network of EA Cyt b mtDNA clade. Green circles show haplotypes which form star-like clusters that indicate recent population expansion, (b) Map of mainland SE Asian freshwater drainage lines. Geographical distribution of star clustered haplotypes is shaded green.

93

Chapter 3.

Table 3.6. Summary statistics for all sites individually, river drainages, and all data combined. Symbols denote significance (p ≤ 0.05) before (*) and after (†) FDR correction for multiple comparisons. Sites K, VV and SM have been excluded due to low sample sizes (≤3). Site

n

n haplotypes

Nucleotide diversity () 1.200 2.212 0.067 4.762 6.230 3.028 0.257 20.422 0.882 1.579 0.150 2.481 0.872 2.405 1.555 1.597 1.471 1.655 1.194 1.668 3.636 5.069 1.128 4.656 5.660 7.016 2.227

Theta (S)

Tajima’s D

Fu’s FS

4 4 2 5 4 4 2 4 3 5 2 4 5 9 9 9 6 11 7 8 13 7 8 4 5 57 7

Haplotype diversity (Hd) 0.533 0.526 0.067 0.905 0.680 0.489 0.257 0.822 0.516 0.511 0.050 0.610 0.508 0.727 0.971 0.638 0.422 0.737 0.556 0.598 0.797 0.671 0.737 0.563 0.678 0.841 0.655

2.121 1.346 0.252 4.081 10.536 9.751 0.278 15.553 0.447 2.602 0.702 9.736 1.346 2.902 3.075 4.242 2.916 4.445 1.306 2.902 5.158 3.846 3.526 3.341 3.951 11.403 1.483

-1.796* 1.647 -1.147* 0.894 -1.526* -2.415*† -0.133 1.524 1.739 -1.199 -1.716*† -2.571*† -0.899 -0.514 -1.550* -1.976* -1.491 -1.996*† -0.213 -1.275 -0.959 1.435 -2.086*† 1.320 1.391 -1.069 0.780

-0.497 3.6987 -1.212*† 0.2002 8.629 5.039 0.341 9.023 1.731 0.941 -0.150 4.191 -0.506 -0.478 -2.628 -2.046 0.108 -3.706* -1.234 -1.060 -1.532 4.127 -2.428 6.481 7.834 0.208

RasmosOnsins & Rozas’ R2 0.200 0.184 0.180 0.227 0.048* 0.132 0.129 0.232 0.220 0.072 0.156 0.138 0.074 0.090 0.058* 0.045* 0.058 0.065 0.115 0.064 0.076 0.164 0.065 0.176 0.163 0.042 0.137

CM CP TK LP SB VT MD KJ NE GL LL ST KK SN ME KC PA KH PS BB TP TT TC VI PH Mekong Chao Phraya Global

10 49 30 7 28 42 21 10 50 39 40 7 49 50 39 50 49 51 56 50 49 42 53 28 42 899 59 988

70

0.870

7.207

12.847

-1.222

-23.997*†

0.037

-24.053*†

Of the 378 pair-wise ST comparisons (Table 3.7), 355 revealed significant population differentiation after FRD correction. Samples from Peninsula Malaysia (site TK) and Sumatra (site LP) were differentiated from all other sampled sites, as was the southern Indian population (site K) when sites of low sample size (n = 2) were excluded. This broad scale differentiation is reflected in results of the Mantels test across all sites, that showed a significant association between genetic distance (ST) and geographical distance (log of straight line distance in kms, r=0.547 p ≤0.001).

94

Phylogeography of C. striata

Table 3.7. Population Pair-wise ST’s for C. striata mtDNA data. Above the line: ST estimates, below the line: Significance (α = 0.05) before (*) and after (†) False Discovery Rate Correction. Grey cells indicate comparisons that are not statistically significant. K K CM CP TK LP SB VV VT SM MD KJ NE GL LL ST KK SN ME KC PA KH PS BB TP TT TC VI PH

*† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *†

CM

CP

TK

LP

SB

VV

VT

SM MD

KJ

NE

GL

LL

ST

KK

SN

ME

KC

PA

KH

PS

BB

TP

TT

TC

VI

PH

0.95 0.90 0.99 0.85 0.75 0.94 0.92 0.94 0.99 0.41 0.96 0.94 0.99 0.89 0.96 0.90 0.93 0.93 0.94 0.93 0.95 0.93 0.85 0.80 0.95 0.83 0.79 0.23 0.86 0.66 0.12 0.96 0.94 0.96 0.87 0.26 0.79 0.69 0.92 0.56 0.79 0.48 0.63 0.64 0.64 0.64 0.73 0.60 0.36 0.27 0.69 0.33 0.25 *† 0.67 0.70 0.32 0.95 0.94 0.95 0.74 0.51 0.74 0.69 0.79 0.62 0.76 0.55 0.64 0.65 0.65 0.66 0.69 0.62 0.46 0.39 0.67 0.45 0.38 *† *† 0.89 0.47 0.99 0.96 0.99 0.98 0.56 0.91 0.85 0.98 0.76 0.90 0.73 0.83 0.82 0.83 0.82 0.87 0.81 0.64 0.57 0.86 0.65 0.54 *† *† *† 0.46 0.89 0.92 0.88 0.88 0.31 0.87 0.81 0.92 0.74 0.86 0.73 0.80 0.80 0.81 0.81 0.85 0.79 0.65 0.56 0.84 0.59 0.54 *† *† *† 0.84 0.88 0.84 0.44 0.21 0.50 0.54 0.55 0.37 0.45 0.33 0.37 0.40 0.39 0.42 0.46 0.37 0.29 0.26 0.43 0.26 0.25 *† *† *† *† *† 0.12 -0.34 0.99 0.45 0.98 0.96 0.99 0.94 0.98 0.95 0.96 0.96 0.97 0.96 0.97 0.96 0.92 0.89 0.97 0.90 0.88 *† *† *† *† *† 0.01 0.95 0.76 0.96 0.95 0.96 0.93 0.96 0.94 0.95 0.95 0.95 0.95 0.95 0.95 0.92 0.91 0.95 0.91 0.90 *† *† *† *† *† 0.99 0.43 0.98 0.96 0.99 0.94 0.98 0.94 0.96 0.96 0.95 0.96 0.97 0.96 0.92 0.88 0.97 0.89 0.87 *† *† *† *† *† *† *† *† 0.36 0.13 0.75 0.90 0.33 0.53 0.33 0.41 0.31 0.40 0.18 0.47 0.38 0.25 0.37 0.46 0.39 0.33 *† *† *† *† *† *† *† 0.49 0.51 0.54 0.36 0.48 0.40 0.40 0.44 0.44 0.44 0.48 0.43 0.34 0.32 0.47 0.28 0.30 *† *† *† *† *† *† *† *† *† *† 0.73 0.72 0.21 0.32 0.24 0.28 0.19 0.28 0.08 0.31 0.27 0.22 0.40 0.34 0.42 0.37 *† *† *† *† *† *† *† *† *† *† *† 0.81 0.57 0.68 0.53 0.59 0.61 0.62 0.61 0.68 0.60 0.44 0.46 0.64 0.49 0.42 *† *† *† *† *† *† *† *† *† *† *† *† 0.44 0.64 0.45 0.54 0.51 0.53 0.52 0.59 0.50 0.36 0.46 0.58 0.51 0.42 *† *† *† *† *† *† *† *† *† *† *† *† *† 0.05 0.06 0.07 0.06 0.07 0.10 0.05 0.06 0.09 0.29 0.12 0.27 0.26 *† *† *† *† *† *† *† *† *† *† *† *† *† *† 0.06 0.03 0.02 0.03 0.08 0.07 0.03 0.08 0.32 0.05 0.33 0.29 *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† 0.02 0.04 0.03 0.09 0.10 0.01 0.02 0.19 0.07 0.19 0.17 *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† 0.01 0.01 0.06 0.07 -0.01 0.05 0.25 0.02 0.24 0.22 *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† * -0.01 0.02 0.07 0.01 0.05 0.26 0.03 0.26 0.23 *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† 0.07 0.08 0.01 0.04 0.24 0.02 0.24 0.22 *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† 0.12 0.07 0.09 0.30 0.09 0.29 0.26 *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† 0.07 0.10 0.34 0.13 0.35 0.32 *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† 0.04 0.24 0.03 0.24 0.22 *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† 0.12 0.06 0.11 0.10 *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† 0.28 0.10 0.08 *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† 0.28 0.24 *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† 0.01 *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *† *†

95

Chapter 3. Excluding the two outlying sites, samples across mainland SE Asian retained a strong signature of IBD with straight line (overland) distance (r=0.627 p ≤0.001, Figure 3.12). The two populations from the Chao Phraya River drainage (sites CM and CP) were differentiated from each other (ST = 0.228, p ≤0.01); however, no significant ST was observed among CM and site SB in the Mekong Drainage, suggesting that across northern Thailand and northern Lao, patterns of differentiation do not reflect current drainage line boundary patterns. Although located in separate drainages, these sites are approximately 300km apart, less than the ~ 500km distance between the two Chao Phraya sampling sites. Significant differentiation was also observed across most sites in the Mekong basin. Comparisons between sites VV, VT, and SM, however, indicated no significant differentiation, revealing a cluster of genetically similar sample sites in the upper-mid Mekong. Spatial analysis of molecular variance (SAMOVA) also identified this group, because when k = 2 the three sites were partitioned from all other sites, resulting in the highest between group diversity values observed for any k value (CT = 0.90120, p < 0.00001). Interestingly, site KJ was not significantly differentiated from sites VV and SM in the ST analysis; however, site MD, that lies on the Mekong River between KJ and SM, was significantly differentiated from all other sites. This site had relatively low diversity compared with neighbouring sites (Table 3.6), and unlike the sites VV, VT, SM, and KJ, no individuals belonging to the MM clade were detected there. Sites at the headwaters of Mekong tributaries in the Vietnamese highlands were strongly differentiated from each other (ST = 0.806, p ≤0.001) and from all other Mekong sampled populations. Average significant ST’s between these and other Mekong sites were 0.671 (for GL) and 0.670 (for LL). Stung Treng (ST), which is the closet downstream site to the Vietnamese highlands, was highly differentiated from GL and LL (195kms, ST = 0.569, p ≤0.001 and 270kms, ST = 0.441, p ≤0.001 respectively), but least differentiated from its own nearest downstream neighbour (site KK: 115kms, ST = 0.046, p ≤0.002). This suggests that distance and direction of stream flow are not the only factors that influence patterns of differentiation among these sites. Stream connections between the Vietnamese highlands and the Mekong proper provide quite different freshwater habitat to the main Mekong channel between ST and KK, as the highland tributaries are predominantly clear water forested streams in comparison with the Mekong’s large, fast flowing current that carries a high silt load (Rainboth 1996a)

96

Phylogeography of C. striata

r = 0.627 P < 0.001

1.05

0.85

0.65

ST

0.45

0.25

0.05 1.5

1.7

1.9

2.1

2.3

2.5

2.7

2.9

3.1

3.3

-0.15

-0.35

Log of straight line geographic distance (kms)

Figure 3.12. ST plotted against the log of straight line (overland) geographic distance for C. striata mtDNA data. Trendline shows the general pattern of increasing genetic distance with greater geographical distance (IBD). In Cambodia (sites ST, KK, SN, ME, KC, PA, KH, PS, BB, TP), all sampling locations lie downstream from the largest complex of rapids in the Mekong Basin, the Khone Falls. Cambodian sites span the transition in the Mekong from a fast flowing upland river with deep pools and sand bars to a lowland meandering river with vast floodplains and numerous oxbows. Across Cambodia, 35 of 45 pair-wise comparisons showed significant ST values among sites, but values were relatively low (average of significant ST’s = 0.063). As the Mekong approaches southern Vietnam, it splits into a number of channels and takes the form of a high estuary, which means that although freshwater, sample locations in the Mekong delta (sites TT, TC, VI and PH) experience current and velocity changes associated with tidal effects at the river mouth. More differentiation was generally observed among these sites than among Cambodian sites (average significant ST among delta sites of 0.152), and these sites were also strongly differentiated from upstream Lower Basin sites (average significant ST between delta and Cambodian sites = 0.222). This suggests that genetic differentiation is not necessarily proportional to geographical distance across the Lower Mekong Basin. This hypothesis is supported by results of the Mantel’s correlation analysis, which although significant, was weak (r =0.196, p

Suggest Documents