AB1 chromatograms used to validate the ConTAMPR method for detecting DNA contamination

Published: 2 July 2018 | Version 1 | DOI: 10.17632/fsjpwrnv4t.1


Contingency Table Analysis of Minority Peak Ranks (ConTAMPR) is a method to interrogate Sanger sequencing chromatograms for evidence of multiple substantially divergent DNA sequences, and to identify a contaminant statistically from a panel of candidates. It was originally developed and used to test for cross-contamination in PCR amplicons generated from samples of DNA extracted from bdelloid rotifers in the genus Adineta.

The 38 chromatograms in this dataset were generated during an experiment in which two rotifers of different species were deliberately placed in the same tube and had their DNA extracted together. A fragment of the mitochondrially encoded cytochrome c oxidase subunit I gene was amplified by PCR using the primers HCO1 and LCO1, and the amplicons were sequenced in both directions by Macrogen Europe on an ABI 3730xl DNA Analyzer (Applied Biosystems). Samples in the "1X" groups contained DNA from a single rotifer (either Adineta vaga AD008 or A. sp. AD006). Samples in the "2X" group contained DNA from two rotifers, one of each species. Biological replicates are indicated with numbers (01-06), and technical replicates with letters (a-c).

The chromatograms for "1X" samples contain some noise (small minority peaks, base call errors, etc.), but this noise is not systematic, and each of the two species has a characteristic and consistent mitochondrial haplotype. The chromatograms for "2X" samples also each show a majority haplotype corresponding to one of the two species, but the identity of this majority haplotype differs from sample to sample. The second species is revealed by a systematic pattern of minority fluorescence peaks at sites where the two species differ. To analyse this pattern, the minority fluorescence peak corresponding to the predicted contaminant haplotype is examined at each of these sites and assigned a height rank (2, 3 or 4).
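The ranking step can be illustrated with a minimal Python sketch. It is not the code used in the associated publication. It assumes that the four fluorescence peak heights at each site have already been extracted from the chromatogram (for example with an ABIF parser), and the function names, the example heights and the two short haplotypes are hypothetical. Ties are broken arbitrarily by base order, which is also an assumption.

```python
BASES = "ACGT"


def minority_rank(heights, base):
    """Return the height rank (1 = tallest, 4 = shortest) of `base` at one site.

    `heights` maps each of A, C, G, T to its peak height at that site.
    Ties are broken by the order of BASES (an arbitrary choice for this sketch).
    """
    ordered = sorted(BASES, key=lambda b: heights[b], reverse=True)
    return ordered.index(base) + 1


def tabulate_ranks(sites, majority, candidate):
    """Count how often the candidate's base is ranked 2, 3 or 4.

    Only sites where the candidate haplotype differs from the majority
    haplotype are considered. Sites where the candidate base happens to be
    the tallest peak (rank 1) are skipped in this sketch.
    """
    counts = {2: 0, 3: 0, 4: 0}
    for pos, heights in sites.items():
        if candidate[pos] == majority[pos]:
            continue  # uninformative site: both haplotypes share this base
        rank = minority_rank(heights, candidate[pos])
        if rank in counts:
            counts[rank] += 1
    return counts


# Hypothetical peak heights at four aligned sites (not taken from the dataset).
sites = {
    0: {"A": 900, "C": 40, "G": 120, "T": 15},
    1: {"A": 30, "C": 1100, "G": 10, "T": 210},
    2: {"A": 25, "C": 60, "G": 950, "T": 12},
    3: {"A": 20, "C": 15, "G": 30, "T": 1000},
}
majority = "ACGT"   # hypothetical majority haplotype
candidate = "GTAT"  # hypothetical candidate contaminant haplotype

rank_counts = tabulate_ranks(sites, majority, candidate)
print(rank_counts)
```

A candidate that is genuinely present should accumulate most of its counts at rank 2, whereas an absent candidate should be spread roughly evenly across ranks 2, 3 and 4.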
The contaminant can be identified statistically by tabulating these peak ranks and applying a chi-square contingency test. The contaminant haplotype shows a statistically significant excess of second-ranked peaks, relative both to a null expectation of equal rank frequencies and to other candidate contaminants. The method was validated using eight of the files in this dataset: LCO1_2X_6_8_05a; HCO1_2X_6_8_05a; LCO1_2X_6_8_03a; HCO1_2X_6_8_03a; LCO1_1X_6_01a; HCO1_1X_6_01a; LCO1_1X_8_02c; HCO1_1X_8_02c. Biological and technical replicates showed comparable patterns. For further details, please consult the associated publication.
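The two comparisons described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the analysis code from the associated publication, and the exact form of the test used there may differ. The rank counts are hypothetical, as are the function names. Because there are three rank categories, the degrees of freedom are always even, so the chi-square tail probability has a closed form and no statistics library is needed. The sketch also assumes that every expected count is non-zero.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of the chi-square distribution for even df.

    Uses the closed form exp(-x/2) * sum_{i<df/2} (x/2)**i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive, even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


def equal_expectation_test(counts):
    """Test rank counts (ranks 2, 3, 4) against equal expected frequencies."""
    expected = sum(counts) / len(counts)
    stat = sum((obs - expected) ** 2 / expected for obs in counts)
    df = len(counts) - 1
    return stat, df, chi2_sf_even_df(stat, df)


def contingency_test(table):
    """Chi-square test of independence; rows = candidates, columns = ranks."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf_even_df(stat, df)


# Hypothetical counts of ranks 2, 3 and 4 (not taken from the dataset).
suspected = [30, 6, 4]    # candidate thought to be the contaminant
alternative = [12, 15, 13]  # another candidate from the panel

gof_stat, gof_df, gof_p = equal_expectation_test(suspected)
ct_stat, ct_df, ct_p = contingency_test([suspected, alternative])
print(f"vs equal expectation: chi2={gof_stat:.2f}, df={gof_df}, p={gof_p:.2g}")
print(f"vs other candidate:   chi2={ct_stat:.2f}, df={ct_df}, p={ct_p:.2g}")
```

A small p-value in both comparisons, together with an excess of second-ranked peaks, would point to the suspected candidate as the contaminant.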


Steps to reproduce

Reference mtCO1 sequences for the two rotifer species involved in this experiment are available from GenBank: JX184001 (AD008) and KM043183 (AD006). Details of DNA extraction and PCR parameters are provided in the associated publication.
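One possible way to retrieve the two reference sequences programmatically is through the NCBI E-utilities `efetch` endpoint. The sketch below is a suggestion rather than part of the original protocol. The helper names `efetch_url` and `fetch_fasta` are hypothetical, and the download function is defined but not called, so nothing is fetched unless it is invoked explicitly.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Accession numbers as listed in this dataset description.
REFERENCES = {"AD008": "JX184001", "AD006": "KM043183"}


def efetch_url(accession):
    """Build an efetch URL requesting one nucleotide record as FASTA text."""
    query = urlencode(
        {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def fetch_fasta(accession, timeout=30):
    """Download the FASTA record for `accession` (requires network access)."""
    with urlopen(efetch_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


for clone, accession in REFERENCES.items():
    print(clone, efetch_url(accession))
```

Calling `fetch_fasta("JX184001")` would then return the AD008 reference as FASTA text, which can be aligned with the AD006 reference to locate the sites where the two haplotypes differ.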


Imperial College London


Natural Sciences, Genetics, Bioinformatics, Rotifera, DNA Sequencing, Polymerase Chain Reaction, mtDNA, Contamination