Data: Benchmarking batch correction methods for synthesizing imbalanced microbiome community profiles
Description
Batch variation is unwanted variation that plagues syntheses of microbiome sequence data. Batch effect correction algorithms (BECAs) aim to remove batch effects, but most BECAs do not account for a common problem in which batch covariates of interest are imbalanced (e.g., when classes do not appear in all batches, or do not appear in even sample proportions across batches). Here we tested five BECAs on eight seed microbiome studies that are prone to severe batch effects because of variable seed-handling practices. We compared the performance of zero-mean centering (ZMC), Ratio-A, ConQuR, PLSDA, and wPLSDA (developed for imbalanced batch-covariate designs). We also accounted for the sparsity and compositionality of microbiome data with zero imputation and the centered log-ratio (CLR) transformation. We found that 1) according to redundancy analysis, no method reduced the variation explained by the unwanted covariate to zero; 2) ConQuR, Ratio-A, and ZMC removed batch effects as measured by guided principal component analysis, which quantifies the magnitude of batch effects (δ = 0, p < 0.001); and 3) CLR and zero imputation improved ZMC's removal of batch effects and the variance explained by the wanted variable. These results call for careful application of BECAs and indicate that ZMC, Ratio-A, and ConQuR provide some improvement in remediating batch effects in batch-covariate-imbalanced data. Continued development of BECAs is urgently needed for reliable batch correction in this use case.
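The following is a minimal NumPy sketch of three of the steps named above. It assumes a samples × taxa count matrix and one batch label per sample. Zeros are handled here by adding a fixed pseudocount, which is only a stand-in; the dataset's own zero-imputation method may differ. The sketch covers the CLR transformation, zero-mean centering of each taxon within each batch, and the guided-PCA δ statistic (variance of the first batch-guided principal component divided by that of the first unguided one). All function names are hypothetical, this is not the authors' code, and the permutation test behind the reported p-values is omitted.

```python
import numpy as np


def clr(counts: np.ndarray, pseudocount: float = 0.5) -> np.ndarray:
    """Centered log-ratio transform of a samples x taxa count matrix.

    Assumption: zeros are replaced by adding a fixed pseudocount, a simple
    stand-in for whatever zero-imputation method the dataset actually uses.
    """
    log_x = np.log(counts + pseudocount)
    # Subtract each sample's mean log abundance (the log of its geometric mean).
    return log_x - log_x.mean(axis=1, keepdims=True)


def zero_mean_center(x: np.ndarray, batch: np.ndarray) -> np.ndarray:
    """Zero-mean centering: subtract each taxon's mean within each batch."""
    out = x.astype(float).copy()
    for b in np.unique(batch):
        rows = batch == b
        out[rows] -= out[rows].mean(axis=0)
    return out


def gpca_delta(x: np.ndarray, batch: np.ndarray) -> float:
    """Guided-PCA delta: variance of the first batch-guided PC divided by the
    variance of the first unguided PC. Values near 1 suggest a strong batch effect."""
    xc = x - x.mean(axis=0)
    labels = np.unique(batch)
    # Batch indicator matrix Y (samples x batches).
    y = (batch[:, None] == labels[None, :]).astype(float)
    # Unguided PCA loadings from the SVD of the centered data.
    _, _, vt_u = np.linalg.svd(xc, full_matrices=False)
    # Guided PCA loadings from the SVD of Y^T X.
    _, s_g, vt_g = np.linalg.svd(y.T @ xc, full_matrices=False)
    # Convention of this sketch: if no between-batch mean differences remain,
    # there is no batch direction to guide the PCA, so report 0.
    if s_g[0] < 1e-10 * max(1.0, float(np.abs(xc).max())):
        return 0.0
    var_u = np.var(xc @ vt_u[0])
    var_g = np.var(xc @ vt_g[0])
    return float(var_g / var_u)
```

On simulated data with a strong multiplicative batch effect, δ computed on the CLR-transformed matrix should be close to 1. After within-batch centering, every batch mean is zero, so this sketch reports δ = 0. Because the first unguided component maximizes variance, δ always lies between 0 and 1.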
Steps to reproduce
The README file contains instructions for reproducing the analyses.
Funding
Oak Ridge Institute for Science and Education
United States Department of Energy
Agricultural Research Service