AERIAL Nasal Microbiome Taxonomy Table

Published: 7 April 2026 | Version 1 | DOI: 10.17632/vtr3m7tvzc.1
Contributors:
JOSE ANTONIO CAPARROS MARTIN, …, Stephen Stick

Description

This repository contains de-identified taxonomy count tables from nasal swab microbiome samples collected in the AERIAL birth cohort. These data support the associated manuscript examining associations between early-life nasal microbiota, respiratory viral infections, and wheezing-related outcomes during the first year of life. Taxonomic assignments were generated from full-length 16S rRNA gene sequencing. In accordance with participant consent and ethics requirements, the repository includes taxonomy data only from participants who consented to data sharing. No personally identifiable information is provided. Raw sequencing data are not publicly available because of ethical and privacy restrictions.

Steps to reproduce

Nasal swab samples were collected from infants enrolled in the AERIAL birth cohort, a sub-study nested within the ORIGINS cohort. Swabs were taken from both nostrils at scheduled asymptomatic visits (at approximately 3, 6, and 9 months of age) and during symptomatic respiratory episodes recorded using a purpose-designed smartphone app. At each sampling event, one swab was used for respiratory virus testing by qPCR and the second was preserved for microbiome analysis.

Microbial DNA was extracted from the nasal swabs using the QIAamp DNA kit (QIAGEN) with a bead-beating pre-processing step. Negative extraction controls were included, and bacterial load was quantified with a pan-bacterial TaqMan assay to help identify potential reagent contaminants in these low-biomass samples. The full-length 16S rRNA gene was amplified, libraries were prepared using the SMRTbell preparation kit 3.0, and sequencing was performed on a PacBio Sequel IIe platform.

Sequence reads were denoised into amplicon sequence variants (ASVs) and assigned taxonomy using DADA2 (v1.28.0) within the nf-core/ampliseq pipeline (v2.8.0), run with Nextflow (v25.04.2). Taxonomy was assigned with the naïve Bayesian classifier implemented in DADA2, using the Genome Taxonomy Database (GTDB, vR09-RS220) as the reference. Potential contaminants were identified and removed with the decontam R package (v1.20.0), guided by the negative controls and the bacterial load measurements.

Downstream statistical analyses were performed in R (v4.3.0) within RStudio (v2023.03.0). Microbiome community structure was analysed using PERMANOVA on Aitchison distances; principal component analysis (PCA) and diversity indices were calculated after centered log-ratio (CLR) transformation, using functions from the mixOmics package (v6.24.0). Community clustering was assessed using Dirichlet multinomial mixture (DMM) models, and differential abundance testing was performed with ANCOM-BC2.
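As a rough illustration of the community-structure workflow described above, the Python sketch below applies a centered log-ratio transform to each sample, computes Aitchison distances (Euclidean distances between CLR vectors) and runs a permutation-based PERMANOVA using the pseudo-F statistic. It is not the authors' R code: the pseudocount of 0.5 used to handle zero counts, the 999 permutations, the random seed, the toy count matrix and the group labels are all hypothetical choices made for illustration only.

```python
import math
import random


def clr(counts, pseudocount=0.5):
    """Centered log-ratio (CLR) transform of one sample's taxon counts.

    The pseudocount is an assumption (not from the original methods) to avoid log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]


def aitchison(a, b):
    """Aitchison distance: Euclidean distance between two CLR-transformed samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F statistic computed from a square distance matrix."""
    n = len(groups)
    labels = sorted(set(groups))
    k = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by n
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: the same quantity computed inside each group
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        pairs = sum(dist[p][q] ** 2 for a, p in enumerate(idx) for q in idx[a + 1:])
        ss_within += pairs / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permanova(dist, groups, permutations=999, seed=1):
    """Permutation test: shuffle group labels and compare pseudo-F values."""
    observed = pseudo_f(dist, groups)
    rng = random.Random(seed)  # hypothetical fixed seed for reproducibility
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # Count the observed labelling as one of the permutations
    p_value = (hits + 1) / (permutations + 1)
    return observed, p_value


# Hypothetical toy data (NOT from the AERIAL tables): rows = samples, columns = taxa
counts = [
    [120, 30, 5, 0],
    [110, 25, 8, 1],
    [130, 35, 4, 0],
    [10, 5, 90, 60],
    [12, 3, 85, 70],
    [8, 6, 95, 55],
]
groups = ["A", "A", "A", "B", "B", "B"]

clr_rows = [clr(row) for row in counts]
dist = [[aitchison(x, y) for y in clr_rows] for x in clr_rows]
f_stat, p = permanova(dist, groups)
print(f"pseudo-F = {f_stat:.2f}, permutation p = {p:.3f}")
```

With only six samples there are just 20 ways to split them into two groups of three, so even perfectly separated groups give a permutation p-value of about 0.1; toy examples this small cannot reach conventional significance thresholds.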
Additional analyses included correlation tests, linear and logistic regression, and generalized linear models, with false discovery rate (FDR) correction for multiple testing.

Because of ethics and consent restrictions, only de-identified taxonomy count tables for participants whose parents or caregivers consented to data sharing are included in the public repository. Raw sequencing files are not publicly available.
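The methods above mention false discovery rate correction without naming the procedure. A common choice, and the one assumed here, is the Benjamini-Hochberg procedure (the "BH" method of R's p.adjust). The minimal pure-Python sketch below shows how adjusted p-values are obtained; the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four tests
p_raw = [0.01, 0.04, 0.03, 0.20]
print([round(q, 4) for q in benjamini_hochberg(p_raw)])
# → [0.04, 0.0533, 0.0533, 0.2]
```

Each p-value is scaled by m/rank, and the running minimum taken from the largest p-value downwards keeps the adjusted values in the same order as the raw ones.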

Categories

Microbiology, Pediatrics, Respiratory Medicine, Bioinformatics, Microbiome