PCAWG Intra-Tumor Heterogeneity Simulations

Published: 31-03-2021 | Version 1 | DOI: 10.17632/by4gbgr9gd.1
Stefan Dentro


This resource contains the simulated samples that accompany the paper `Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes`, which is available at the URL in the Related Links section below. A full description of these data, together with the full author list and credit for their creation, is given in the supplementary information of that paper. The included files are described below.

PhylogicNDT500:
Note the difference between directories marked with and without "_all". Using files in the "_all" directories is not recommended, as they contain mutations below the detection limit of 3 variant reads.
* Copynumber: Segments/ directory
  - Subclonal copynumber is encoded via multiple reports of the same segment with a different CCF
* Variants: VCF/ directory
  - These follow PCAWG consensus basic read count annotations
* Purity/ploidy: PP_Table/ directory
* Truth clusters: Subclonal_Structure/ directory
* Truth assignments: Mut_Assign/ directory
* Truth mutations: Mafs_Real_Info/ directory

SimClone1000 - testing:
* Copynumber: *_segments.txt in each sample directory
  - There is no subclonal copynumber
* Variants: *.vcf
  - These follow PCAWG consensus basic read count annotations
* Purity/ploidy: Included purity_ploidy.txt file

SimClone1000 - truth:
* Every simulation has its own directory tree.
* Truth clusters (cluster position in CCF): [samplename]/simulated_0001/truth_tree/simulated_0001_subclonal_structure.txt
* Truth assignments: [samplename]/simulated_0001/truth_tree/simulated_0001_mutation_assignments.txt
* Truth mutations: [samplename]/simulated_0001/simulated_0001_0001/truth/simulated_0001_0001_multiplicity.txt
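
As an illustration of the SimClone1000 truth layout described above, the following minimal Python sketch builds the three truth file paths for one sample. The function name `truth_paths`, its parameters, and the example root directory are hypothetical and not part of the dataset; the sketch also assumes that the `simulated_0001` and `0001` components appear literally, as written in the path patterns above, which may not hold for every simulation.

```python
from pathlib import Path

# Assumption: these components are taken literally from the path
# patterns in the description; adjust them if a sample differs.
SIMULATION = "simulated_0001"
REPLICATE = "0001"


def truth_paths(root, samplename, simulation=SIMULATION, replicate=REPLICATE):
    """Return the expected truth file paths for one SimClone1000 sample.

    `root` is wherever the SimClone1000 truth archive was unpacked
    (hypothetical; choose your own location).
    """
    base = Path(root) / samplename / simulation
    tree = base / "truth_tree"
    rep = f"{simulation}_{replicate}"
    return {
        # Cluster positions in CCF
        "clusters": tree / f"{simulation}_subclonal_structure.txt",
        # Mutation-to-cluster assignments
        "assignments": tree / f"{simulation}_mutation_assignments.txt",
        # Per-mutation truth (multiplicity)
        "mutations": base / rep / "truth" / f"{rep}_multiplicity.txt",
    }


if __name__ == "__main__":
    # "simclone_truth" and "sample_A" are placeholder names.
    for kind, path in truth_paths("simclone_truth", "sample_A").items():
        status = "found" if path.exists() else "missing"
        print(f"{kind}: {path} ({status})")
```

Checking `path.exists()` before reading is a cheap way to confirm that the assumed naming matches the files actually unpacked on disk.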