Shotgun metagenomics data of sediment microbiome of unprotected arid-tropical natural wetlands in South Africa

Published: 7 September 2023 | Version 1 | DOI: 10.17632/mm25y745hz.1
Henry JO Ogola,


Sediment samples from 10 arid-tropical natural wetlands in Limpopo Province, South Africa, were collected in August 2021 to investigate microbial community structure, diversity, species richness, and functional metabolic potential. The sediment microbiome was shotgun sequenced on the high-throughput Illumina NextSeq 2000 platform. The raw fastq files have been deposited in the NCBI SRA database under BioProject ID PRJNA972844 and SRA experiment accession numbers SRX20358958 to SRX20358949. The accompanying tables are supplementary material that give a preliminary overview of the dataset: the relative abundance of enzyme gene families (Table 1) and pathways (Table 2) detected across the wetland sediments by shotgun sequencing.
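To retrieve the raw reads, it can help to expand the quoted accession range into individual identifiers. The following is a minimal sketch, assuming the range SRX20358958 to SRX20358949 is contiguous (the description implies this but does not state it); the helper name `expand_accessions` is hypothetical and not part of any NCBI tool.

```python
def expand_accessions(first, last):
    """Expand an accession range such as SRX...58 to SRX...49 into a sorted list.

    Assumption: every number between the two endpoints is a valid accession.
    """
    # Split each accession into its letter prefix and its numeric part.
    prefix = first.rstrip("0123456789")
    if not last.startswith(prefix):
        raise ValueError("accessions must share the same prefix")
    width = len(first) - len(prefix)
    start, end = sorted((int(first[len(prefix):]), int(last[len(prefix):])))
    # Re-attach the prefix, keeping the original digit width.
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


accessions = expand_accessions("SRX20358958", "SRX20358949")
for acc in accessions:
    print(acc)
```

Under the contiguity assumption this yields ten identifiers, one for each of the ten wetlands described above.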


Steps to reproduce

Sediment samples were collected from 10 unprotected natural wetlands spread across the arid-tropical region of Limpopo Province in northeastern South Africa. For each wetland, sediment was collected at three different sites from the surface of the bed substrate (0–10 cm deep) using a dredge sampler (Kajak, KC-Denmark). At each site, a multi-point mixed sampling method was used: five subsamples were collected randomly within an area of 2 m × 2 m and then mixed into one composite sediment sample for DNA extraction (20 g) and soil property measurements (300–400 g). For DNA extraction, approximately 20 g of sediment was collected in a centrifuge tube, immediately frozen in liquid N2, and stored at −80 °C.
DNA libraries were prepared using the Nextera XT® DNA Library Preparation Kit (Illumina Inc., San Diego, CA, United States) and IDT Unique Dual Indexes® Tagmentation Kit (Illumina Inc., San Diego, CA, United States) with a total DNA input of 1 ng. The constructed libraries were sequenced on an Illumina NextSeq 2000 platform (2 × 150 bp) at CosmosID Inc. (Germantown, MD, USA). The raw high-throughput sequencing data have been deposited in the NCBI Sequence Read Archive database under BioProject ID PRJNA972844 and SRA accession numbers SRX20358958 to SRX20358949. Sequencing generated between 23,618,694 and 42,062,226 paired-end reads per sample.
Shotgun sequences were analyzed using the bioBakery 3 platform, which includes updated sequence-level quality control and contaminant depletion guidelines (KneadData), MetaPhlAn v3.0 for taxonomic profiling, and HUMAnN v3.0 for functional profiling. The primary steps were initial removal of reads mapping to the human reference database and basic quality control using KneadData with default settings. The quality-filtered sequences were then used to profile the composition of the microbial communities (Bacteria, Archaea, and Eukaryotes) using MetaPhlAn v3.0 with default settings. Functional genes and pathways were annotated using HUMAnN v3.0 and its UniRef50, Pfam, and MetaCyc pathway databases, again from reads that had been trimmed and quality filtered with KneadData.
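The order of the three bioBakery steps can be sketched as a small Python helper that only assembles the command lines and does not run them. This is an illustrative outline and not the exact commands used for this dataset. The sample name, file names, database path, thread count, and the single cleaned FASTQ file passed to the profilers are all assumptions. The paired-input flags of KneadData are spelled differently across releases, so they should be checked against `kneaddata --help` for the installed version.

```python
def build_commands(sample, r1, r2, human_db, out_dir, threads=4):
    """Return argv lists for one sample in pipeline order; nothing is executed."""
    qc_dir = f"{out_dir}/kneaddata"
    # Hypothetical file: quality-filtered, human-depleted reads in one FASTQ.
    clean = f"{qc_dir}/{sample}_clean.fastq"

    # Step 1: quality control and removal of reads mapping to the human reference.
    # Assumption: --input1/--input2 as in recent KneadData releases.
    kneaddata = ["kneaddata", "--input1", r1, "--input2", r2,
                 "--reference-db", human_db, "--output", qc_dir,
                 "--threads", str(threads)]

    # Step 2: taxonomic profiling of the cleaned reads.
    metaphlan = ["metaphlan", clean, "--input_type", "fastq",
                 "--nproc", str(threads),
                 "-o", f"{out_dir}/{sample}_profile.tsv"]

    # Step 3: functional profiling (gene families and pathways).
    humann = ["humann", "--input", clean,
              "--output", f"{out_dir}/humann", "--threads", str(threads)]

    return [kneaddata, metaphlan, humann]


commands = build_commands("W01", "W01_R1.fastq", "W01_R2.fastq",
                          "human_db", "results")
for cmd in commands:
    print(" ".join(cmd))
```

Building the commands as lists keeps the step order explicit and lets each one be inspected before it is passed to something like `subprocess.run`.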


University of Venda School of Environmental Sciences, University of South Africa - Science Campus, Jaramogi Oginga Odinga University of Science and Technology


Aquatic Ecology


Department of Science and Innovation, South Africa