The first genomic study on Lake Tanganyika sprat Stolothrissa tanganicae indicates a lack of population structure in this important fisheries target

Published: 22-11-2018 | Version 2 | DOI: 10.17632/hhd3mz3myd.2
Els De Keyzer,
Zoë De Corte,
Maarten Van Steenberge,
Joost Raeymaekers,
Federico Calboli,
Nikol Kmentová,
N'sibula Mulimbwa,
Massimiliano Virgilio,
Carl Vangestel,
Pascal Masilya Mulungula,
Filip Volckaert,
Maarten Vanhove


Clupeid fisheries in Lake Tanganyika (East Africa) provide food for millions of people in one of the world’s poorest regions. Due to climate change and overfishing, the clupeid stocks of Lake Tanganyika are declining. We investigate the population structure of Lake Tanganyika sprat Stolothrissa tanganicae to understand its evolutionary biology. This species is a pelagic clupeid endemic to Lake Tanganyika. We hypothesize that distinct stocks are present due to the large distance between the northern and southern ends of the lake (isolation by distance), limnological differences between the North and the South (adaptive evolution), or the long distinct history of three subbasins of the lake (historical discreteness). We performed a population genetic study on wild-caught Lake Tanganyika sprat through a combination of mitochondrial genotyping (96 individuals) and RAD sequencing (84 individuals). Samples were collected at five locations along the north-south axis of Lake Tanganyika. A haplotype network of the mtDNA data did not show any obvious phylogeographic structure, and pairwise FST was low. RAD sequencing yielded a panel of 12,008 SNPs, which showed low genetic differentiation (FST = 0.004; 95% CI: 0.004 - 0.005). PCA and pairwise FST did not suggest relevant genetic isolation across populations. These results show no evidence for the hypotheses of isolation by distance and of diversification during historical isolation. Since no outlier loci were detected in the RADseq data, adaptation to environmental differences between the North and the South of the lake seems unlikely. Our results show very weak geographical structuring of the stock and do not provide evidence for genetic adaptation to historical or environmental differences over the north-south axis. We speculate on the causes for the unexpected pattern.


Steps to reproduce

RAD library preparation

Six RAD libraries, each including 16 individually indexed specimens, were prepared according to the protocol described in Baird et al. [53] and Etter et al. [54]. Individual DNA samples were digested using the restriction enzyme SbfI-HF (NEB, cut site 5’-CCTGCA^GG-3’) and individually barcoded with P1 adapters ligated to the fragment’s overhanging end. The RAD libraries were sheared to a size of 350 base pairs (bp), and fragments between 200 and 700 bp were selected by gel size selection. A second, library-specific barcoded adapter (P2) was ligated to the DNA fragments for identification of the samples. RAD libraries were sequenced on an Illumina HiSeq1500 platform at the Medical Centre for Genetics of the University of Antwerp, Belgium.

Processing of RAD data

Overall read quality was assessed using the FastQC software v0.11.5 [55]. Raw sequence data were demultiplexed using the process_radtags module in Stacks v1.46 [56,57] (Catchen et al. 2011; Catchen et al. 2013), and reads characterized by ambiguous barcodes, ambiguous cut sites or low quality scores were discarded. PCR duplicates were removed via the clone_filter module and SNPs were called using the denovo_map pipeline, both implemented in Stacks. We screened a range of parameter combinations and selected a minimum coverage of ten reads per stack (m = 10) and a maximum of five base pair differences between stacks within (M = 5) and between (n = 5) individuals. This parameter setting allowed us to retain a sufficient number of orthologues at a considerable depth. Individuals with insufficient raw reads (< 0.8 million) and a high proportion of missing data (> 80%) were removed. A final round of filtering was performed using VCFtools v0.1.14 [58] in order to discard sites characterized by heterozygosity excess (p-value < 0.01), a minor allele frequency of less than 0.01, or more than 20% of missing data.
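
To make the final site-filtering step concrete, the following is a minimal Python sketch of two of the three criteria described above: discarding sites with a minor allele frequency below 0.01 or with more than 20% missing genotypes. It is not the VCFtools implementation used for this dataset. The function name `filter_sites`, the genotype encoding (alternate-allele counts 0/1/2, with `None` for a missing call) and the toy matrix are assumptions made for illustration only, and the heterozygosity-excess test is deliberately left out.

```python
def filter_sites(genotypes, min_maf=0.01, max_missing=0.20):
    """Return one boolean per site: True if the site passes both filters.

    genotypes: list of rows (one per individual); each entry is the number of
    alternate alleles (0, 1 or 2) or None for a missing call.
    This encoding is an assumption of the sketch, not the VCF format itself.
    """
    n_ind = len(genotypes)
    n_sites = len(genotypes[0]) if n_ind else 0
    keep = []
    for j in range(n_sites):
        # Collect the non-missing calls at site j
        calls = [row[j] for row in genotypes if row[j] is not None]
        missing_frac = 1.0 - len(calls) / n_ind
        if not calls:
            keep.append(False)  # no data at all for this site
            continue
        # Alternate-allele frequency among called (diploid) genotypes
        alt_freq = sum(calls) / (2.0 * len(calls))
        maf = min(alt_freq, 1.0 - alt_freq)
        keep.append(maf >= min_maf and missing_frac <= max_missing)
    return keep


# Toy data: 5 individuals x 3 sites (hypothetical values)
genotypes = [
    [0, 0, 1],
    [1, 0, None],
    [2, 0, None],
    [0, 0, 0],
    [1, 0, 2],
]
mask = filter_sites(genotypes)
print(mask)
```

In this toy matrix the first site is polymorphic and fully genotyped, the second is monomorphic (minor allele frequency of 0), and the third is missing in two of five individuals (40%), so only the first site passes.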