Dataset: Resolving species assignment in public data refines biodiversity mapping of widespread amphibians in Asian ecozones
Description
Amphibians with wide distributions are often subject to misidentification and inconsistent taxonomic labeling in open-access databases and published studies, leading to confusion over species boundaries. This dataset supports a multifaceted species delimitation framework applied to three dicroglossid frog genera, Fejervarya, Hoplobatrachus, and Quasipaa, distributed across the Indomalayan and Palearctic ecozones. The dataset includes (a) a multilocus alignment (4,044 bp), (b) a concatenated supermatrix of genome-wide SNP data (~700,000 sites), and (c) 147 genome-wide loci from publicly available GBS, RNA-seq, and WGS/WCS datasets, yielding 155 loci (20 variants per split locus) from 517 taxon entries representing 14 recognised species across the two ecozones. These data were used to evaluate 18 competing topological hypotheses using Bayesian species delimitation, model selection, and iterative genomic refinement. Complementary data layers include ecological niche models (ENMs) and advertisement call recordings from key populations in East Asia. The dataset enables the identification of cryptic clades and provides ecological and acoustic evidence supporting taxonomic divergence within the Fejervarya kawamurai, F. multistriata, and F. limnocharis species complexes. Environmental variables such as precipitation of the driest month, elevation, and diurnal temperature range were identified as key predictors of distribution patterns. Additionally, analysis of advertisement calls validated the distinctiveness of sympatric clades within the Eastern Yangtze River Basin. The genomic and ecological data provided here support the correction of mislabeled GenBank records and offer a reusable framework for future taxonomic revision, comparative phylogeography, and amphibian biodiversity research.
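Model selection among competing delimitation hypotheses is commonly done by comparing log marginal likelihoods (lnML), such as those estimated by path sampling or nested sampling. The following is a minimal sketch, not the authors' pipeline, assuming one lnML value has already been obtained per model. It ranks the models and reports 2 ln(BF) of the best model over each alternative, binned on the Kass & Raftery (1995) scale. The function names, model names, and lnML values are hypothetical placeholders, not results from this dataset.

```python
from typing import Dict, List, Tuple


def evidence_label(two_ln_bf: float) -> str:
    """Map a 2 ln(BF) value to a Kass & Raftery (1995) evidence category."""
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"


def rank_models(log_ml: Dict[str, float]) -> List[Tuple[str, float, float, str]]:
    """Rank models by log marginal likelihood, highest first.

    Each row is (model, lnML, 2 ln BF of the best model over this model, label).
    The best model is compared with itself, so its 2 ln BF is 0.0.
    """
    if not log_ml:
        raise ValueError("need at least one model")
    ranked = sorted(log_ml.items(), key=lambda item: item[1], reverse=True)
    best_lnml = ranked[0][1]
    rows = []
    for name, lnml in ranked:
        two_ln_bf = 2.0 * (best_lnml - lnml)
        rows.append((name, lnml, two_ln_bf, evidence_label(two_ln_bf)))
    return rows


if __name__ == "__main__":
    # Hypothetical lnML values for three models (NOT results from this study).
    example = {"model_A": -10234.5, "model_B": -10240.0, "model_C": -10236.0}
    for row in rank_models(example):
        print(row)
```

With these placeholder values, model_A ranks first, model_C trails with 2 ln(BF) = 3.0 ("positive"), and model_B trails with 11.0 ("very strong"). Marginal likelihood estimates carry sampling error, so small 2 ln(BF) differences should be read against that uncertainty.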
Steps to reproduce
Overview: This dataset supports an integrative Bayesian species delimitation study of three genera of dicroglossid frogs (Fejervarya, Hoplobatrachus, and Quasipaa) in the Indomalayan and Palearctic ecozones. It includes multilocus and genome-wide sequence alignments, StarBEAST2 XML configuration files for divergence dating and species tree estimation, and ecological niche model (ENM) outputs generated with MaxEnt. These data were used to infer phylogenetic relationships, estimate divergence times under a relaxed log-normal molecular clock, and project species distributions across East and Southeast Asia.

Methods Summary:
1. Phylogenetic and species tree inference
- Data format: NEXUS (aligned)
- Tools: BEAST2 via the StarBEAST2 plugin (vX.X)
- Clock model: relaxed log-normal clock (RLC)
- Sampling: path sampling and nested sampling for model comparison
- Input: multilocus alignments + genome-wide SNP assignments (merged by species)
- Output: XML configuration files, posterior distributions, dated species trees
2. Ecological niche modeling
- Software: MaxEnt v3.4.X
- Species modeled: Fejervarya kawamurai and F. multistriata
- Environmental variables: 19 BIOCLIM layers + elevation (see folder)
- Outputs: projected distributions, response curves, jackknife tests

Disclaimer: Because the divergence dating, species tree, nested sampling, and MaxEnt outputs for all 18 species delimitation models are very large, only the input data for all modelling are provided here. Readers can request the output files from the author at dy.othman@gmail.com.
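Because only the input files are distributed, a quick sanity check before reusing a NEXUS alignment is to confirm that its declared NTAX and NCHAR match the actual matrix. The following is a minimal sketch under stated assumptions: it handles a single, non-interleaved MATRIX with whitespace-free taxon names. The functions `read_simple_nexus` and `check_alignment` and the example matrix are hypothetical, and files with interleaved blocks or quoted taxon names need a full parser (for example, Biopython's `Bio.Nexus` module).

```python
import re
from typing import Dict, Tuple


def read_simple_nexus(text: str) -> Tuple[int, int, Dict[str, str]]:
    """Return (NTAX, NCHAR, {taxon: sequence}) from a simple NEXUS alignment.

    Assumes one non-interleaved MATRIX with whitespace-free taxon names.
    """
    text = re.sub(r"\[.*?\]", "", text, flags=re.DOTALL)  # drop NEXUS [comments]
    ntax = re.search(r"\bntax\s*=\s*(\d+)", text, re.IGNORECASE)
    nchar = re.search(r"\bnchar\s*=\s*(\d+)", text, re.IGNORECASE)
    matrix = re.search(r"\bmatrix\b(.*?);", text, re.IGNORECASE | re.DOTALL)
    if not (ntax and nchar and matrix):
        raise ValueError("NTAX, NCHAR or MATRIX not found")
    seqs: Dict[str, str] = {}
    for line in matrix.group(1).splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines inside the MATRIX block
        name, seq = parts[0], "".join(parts[1:])
        if name in seqs:
            raise ValueError(f"duplicate taxon name: {name}")
        seqs[name] = seq
    return int(ntax.group(1)), int(nchar.group(1)), seqs


def check_alignment(text: str) -> None:
    """Raise ValueError if the matrix disagrees with the declared dimensions."""
    ntax, nchar, seqs = read_simple_nexus(text)
    if len(seqs) != ntax:
        raise ValueError(f"NTAX={ntax} but {len(seqs)} taxa in MATRIX")
    for name, seq in seqs.items():
        if len(seq) != nchar:
            raise ValueError(f"{name}: {len(seq)} sites, expected NCHAR={nchar}")


# Hypothetical toy alignment, not taken from this dataset.
EXAMPLE = """#NEXUS
BEGIN DATA;
  DIMENSIONS NTAX=3 NCHAR=8;
  FORMAT DATATYPE=DNA MISSING=? GAP=-;
  MATRIX
    taxon_1 ACGTACGT
    taxon_2 ACGTACG-
    taxon_3 ACGTAC??
  ;
END;
"""

if __name__ == "__main__":
    check_alignment(EXAMPLE)
    ntax, nchar, _ = read_simple_nexus(EXAMPLE)
    print(f"OK: {ntax} taxa x {nchar} sites")
```

A regex-based reader keeps the check dependency-free. The trade-off is that it deliberately rejects or misreads NEXUS features it does not model, rather than guessing.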
Institutions
- Nanjing Forestry University, Nanjing