Dataset for "From metabarcoding to metaphylogeography: separating the wheat from the chaff"
Description
This dataset contains files with the relevant information on a metabarcoding dataset used as a case study to test the feasibility of performing phylogeographical studies on metabarcoding data, by Turon X, Antich A, Palacin C, Praebel K and Wangensteen OS.

Abstract of the manuscript:
1. Metabarcoding is by now a well-established method for biodiversity assessment. Metabarcoding datasets are usually used for α- and β-diversity estimates, that is, interspecies (or inter-MOTU) patterns, but they contain an enormous amount of intraspecies (intra-MOTU) information - so far untapped.
2. The use of COI amplicons is gaining momentum in metabarcoding studies targeting many eukaryote groups. COI has for a long time been the marker of choice in population genetics and phylogeography studies. Therefore, COI metabarcoding datasets can be used to study intraspecies patterns and phylogeographic features for hundreds of species simultaneously, opening a new field which we suggest to name metaphylogeography.
3. The main challenge for the implementation of this approach is the separation of erroneous sequences from true intra-MOTU variation. Here, we develop a cleaning protocol based on changes in entropy of the different codon positions of the COI sequence, together with co-occurrence patterns of sequences.
4. Using a dataset of community DNA from several benthic littoral communities in the Mediterranean and Atlantic seas, we first tested by simulation on a subset of sequences a two-step cleaning approach consisting of a denoising step followed by a minimal abundance filtering. The procedure was then applied to the whole dataset.
5. We obtained a total of 563 MOTUs that were usable for phylogeographic inference. We used semiquantitative rank data instead of read abundances to perform AMOVAs and haplotype networks. Genetic variability was mainly concentrated within samples, but with an important between-seas component as well. There were inter-group differences in the amount of variability between and within communities in each sea. For two species the results could be compared with traditional Sanger sequence data available for the same zones, giving similar patterns.
6. Our study shows that metabarcoding data can be used to infer intra- and interpopulation genetic variability of many species at a time, providing a new method with great potential for basic biogeography, connectivity and dispersal questions and for the more applied fields of conservation genetics, invasion genetics, and design of protected areas.
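
The abstract mentions a cleaning protocol based on the entropy of the different codon positions. As a rough illustration of that quantity only, and not the authors' implementation, the following Python sketch computes the mean Shannon entropy of alignment columns grouped by codon position. It assumes the sequences are already aligned to equal length and that the reading frame starts at the first base; the function names, the `frame_start` parameter and the toy sequences are hypothetical, and reads are not weighted by abundance.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the nucleotides in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


def entropy_by_codon_position(seqs, frame_start=0):
    """Mean column entropy for codon positions 1, 2 and 3.

    Assumption: `seqs` are aligned, equal-length strings and `frame_start`
    is the 0-based index of a first codon position (hypothetical parameter).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    sums = {1: 0.0, 2: 0.0, 3: 0.0}
    n_cols = {1: 0, 2: 0, 3: 0}
    for i in range(frame_start, length):
        pos = (i - frame_start) % 3 + 1  # codon position of column i
        sums[pos] += column_entropy([s[i] for s in seqs])
        n_cols[pos] += 1
    return {p: (sums[p] / n_cols[p] if n_cols[p] else 0.0) for p in sums}


# Toy alignment (made up for illustration): only the last base varies.
toy = ["ATGGCA", "ATGGCC", "ATGGCG", "ATGGCT"]
print(entropy_by_codon_position(toy))
```

In protein-coding markers such as COI, genuine variation tends to concentrate in third codon positions, so comparing these per-position values before and after a filtering step is one way to check whether noise is being removed. The thresholds and the co-occurrence criteria of the actual protocol are not reproduced here.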