Population-level faecal metagenomic profiling as a tool to predict antimicrobial resistance in Enterobacterales isolates causing invasive infections: A study across three settings in Cambodia, Kenya, and the UK

Published: 12 December 2020 | Version 1 | DOI: 10.17632/sxn6sw4r57.1
Olga Tosas Auguet,
Rene Niehus,
Hyun Soon Gweon,
James Berkley,
Joseph Waichungo,
Tsi Njim,
Jonnathan Edgeworth,
Rahul Batra,
Kevin Chau,
Jeremy Swann,
Sarah Walker,
Tim Peto,
Derrick Crook,
Sarah Lamble,
Paul Turner,
Ben Cooper,
Nicole Stoesser


Supplementary files include the following output datasets:
1 - "Corrected Gene Counts" (CGCs)
2 - "AB_Matrix_1_or_2"
3 - "AMR-def"
4 - "AMR-all"
5 - "Dataset_For_Bayesian_Model"

Files to produce these datasets are also provided (see input files).

Supplementary files also comprise the results of the Bayesian modelling (see "Steps to reproduce"):
1 - Bayesian Model Comparisons
2 - Bayesian Model Predictions (best model versus null model)


Steps to reproduce

“R Code to produce output files.R” uses the following "INPUT" files:
1) CARD_read_count_specific.tsv
2) ResPipe_CARD-3.0.3.meta.tsv
3) CARD_lateral_coverage_specific.tsv
4) CARD_read_lengths_specific.tsv
5) Metagenomics_Metadata.csv
6) Infection_Data_For_Bayesian_Model.csv
7) bracken_combined_reads.tsv

From these it produces the corrected resistance gene counts, plus a matrix that links each resistance gene to each antibiotic based on the "Confers_Resistance_to_Antibiotic" relationship ontology term in CARD. In the matrix:
• “1” is entered where the gene is associated with clear experimental evidence of elevated MIC for that antibiotic but the "Confers_Resistance_to_Antibiotic" relationship ontology term is missing;
• “2” is entered where the gene is associated with demonstrably elevated MIC for that antibiotic and is known to confer or contribute to clinically relevant resistance to that antibiotic (the "Confers_Resistance_to_Antibiotic" relationship ontology term is present).

The code therefore produces the following “OUTPUT” files:
1) Corrected_Gene_Counts.csv (attached here as an xlsx file to allow for a description tab)
2) AB_Matrix_1_or_2.csv

These are in turn used to produce the final datasets:
3) AMR_DEF.csv (attached here as an xlsx file to allow for a description tab)
4) AMR_ALL.csv (attached here as an xlsx file to allow for a description tab)
5) Dataset_For_Bayesian_Model.csv (attached here as an xlsx file to allow for a description tab)

The code for the non-metric multidimensional scaling (NMDS) ordination, used in the supplementary methods to validate the pooling of DNA extracts in this study, is presented at the end of the R code file.
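
The 1/2 coding of the gene-antibiotic matrix described above can be illustrated with a minimal sketch. This is not the authors' R code: it is written in Python for illustration only, the gene and antibiotic names are placeholders, and the two boolean flags per record (evidence of elevated MIC, presence of the "Confers_Resistance_to_Antibiotic" term) are hypothetical inputs. The use of 0 for pairs with no recorded link is also an assumption; the description above only defines the values 1 and 2.

```python
"""Illustrative sketch of the 1/2 coding for a gene-by-antibiotic matrix."""

# Hypothetical records: (gene, antibiotic, elevated_mic_evidence, confers_term_present).
# Names and flags are placeholders, not values taken from the study data.
RECORDS = [
    ("gene_A", "ampicillin", True, True),
    ("gene_A", "ceftriaxone", True, False),
    ("gene_B", "ciprofloxacin", True, True),
]


def matrix_code(elevated_mic_evidence, confers_term_present):
    """Return the matrix entry for one gene-antibiotic pair."""
    if elevated_mic_evidence and confers_term_present:
        return 2  # elevated MIC and the ontology term is present
    if elevated_mic_evidence:
        return 1  # elevated MIC but the ontology term is missing
    return 0  # assumption: no recorded link between gene and antibiotic


def build_matrix(records):
    """Build a nested dict {gene: {antibiotic: code}} covering all pairs."""
    genes = sorted({gene for gene, _, _, _ in records})
    antibiotics = sorted({ab for _, ab, _, _ in records})
    matrix = {gene: {ab: 0 for ab in antibiotics} for gene in genes}
    for gene, ab, mic, term in records:
        # Keep the strongest code if a pair appears more than once.
        matrix[gene][ab] = max(matrix[gene][ab], matrix_code(mic, term))
    return matrix


if __name__ == "__main__":
    for gene, row in build_matrix(RECORDS).items():
        print(gene, row)
```

In this sketch, each row is a gene and each column an antibiotic, which mirrors the layout suggested by the file name AB_Matrix_1_or_2.csv; the actual file layout should be checked against the attached dataset.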
“BayesianModel.zip” provides the data and code to run the Bayesian analysis, as well as an overview of the main results, as follows:
1) The dataset for analysis (Dataset_For_Bayesian_Model.csv)
2) The R code to run the analysis (Bayesian_Model.R)
3) A summary of the main analysis results (Model Predictions.xlsx; Model_Comparisons.xlsx)

The raw sequence data reported in this study have been deposited in the European Nucleotide Archive under accession number PRJEB34871. The code to extract CARD data, including the relationship ontology terms required to generate the final datasets and analyses, plus any required input files, is available from the ResPipe GitLab repository (https://gitlab.com/hsgweon/ResPipe). This includes all commands and parameters run with TrimGalore, Kraken2, Bracken, BBMap and ResPipe (the bioinformatics pipeline).


University of Oxford