Data: Benchmarking batch correction methods for synthesizing imbalanced microbiome community profiles

Published: 24 May 2023 | Version 1 | DOI: 10.17632/5xrfg5dym6.1
Contributors:
Alicia Foxx

Description

Batch variation is unwanted variation that plagues syntheses of microbiome sequence data. Batch effect correction algorithms (BECAs) aim to remove batch effects, but most BECAs do not account for a common problem in which batches and covariates of interest are imbalanced (e.g., when classes do not appear in all batches or appear in uneven sample proportions). Here we tested five BECAs on eight seed microbiome studies, which are prone to severe batch effects due to variable seed handling practices. We compared the performance of zero-mean centering (ZMC), Ratio-A, ConQuR, PLSDA, and wPLSDA (developed for imbalanced batch-covariate designs). We also accounted for the sparsity and compositionality of microbiome data with zero imputation and the centered log-ratio (CLR) transformation. We found that 1) according to a redundancy analysis, no method reduced the variation explained by the unwanted covariate to zero; 2) ConQuR, Ratio-A, and ZMC removed batch effects according to a guided principal component analysis, which quantifies the magnitude of batch effects (δ = 0, p < 0.001); and 3) CLR and zero imputation improved ZMC's removal of batch effects and the variance explained by the wanted variable. These results call for careful application of BECAs and indicate that ZMC, Ratio-A, and ConQuR provide some improvement in remediating batch effects in batch-covariate imbalanced data. Continued development of BECAs is urgently required for successful batch correction in this use case.
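
To make two of the steps named above concrete, the following is a minimal Python sketch of a CLR transformation followed by zero-mean centering within each batch. It is an illustration only and not the code deposited with this dataset: the function names, the toy count matrix, the batch labels, and the pseudocount of 0.5 (used here as a simple stand-in for zero imputation, whose exact method is not stated in this record) are all assumptions.

```python
import numpy as np


def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples-by-taxa count matrix.

    The pseudocount is an assumed, simple substitute for zero imputation;
    it keeps the logarithm defined for zero counts.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log abundance (log of its geometric mean).
    return log_x - log_x.mean(axis=1, keepdims=True)


def zero_mean_center(values, batches):
    """Subtract, for every taxon, the mean of that taxon within each batch."""
    values = np.asarray(values, dtype=float)
    batches = np.asarray(batches)
    corrected = values.copy()
    for b in np.unique(batches):
        mask = batches == b
        corrected[mask] -= values[mask].mean(axis=0)
    return corrected


# Hypothetical toy data: 4 samples x 3 taxa, drawn from two batches.
counts = np.array([
    [10, 0, 5],
    [20, 3, 0],
    [0, 8, 2],
    [4, 6, 1],
])
batches = np.array(["A", "A", "B", "B"])

clr = clr_transform(counts)
corrected = zero_mean_center(clr, batches)
print(corrected.round(3))
```

Because this kind of centering forces every batch to the same per-taxon mean, it can also remove differences that come from the covariate of interest when classes are unevenly distributed across batches, which is the imbalance problem the description raises.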

Files

Steps to reproduce

The README file contains instructions for reproducing the analysis.

Institutions

Chicago Botanic Garden

Categories

Microbiome, Seed

Funding

Oak Ridge Institute for Science and Education

United States Department of Energy

Agricultural Research Service

Licence