Dataset for: Clonal dynamics of haematopoiesis across the human lifespan
Description
These are the datasets to support the manuscript: Clonal dynamics of haematopoiesis across the human lifespan

Emily Mitchell (1,2,3), Michael Spencer Chapman (1,#), Nicholas Williams (1,#), Kevin J Dawson (1,#), Nicole Mende (2), Emily F Calderbank (2), Hyunchul Jung (1), Thomas Mitchell (1), Tim Coorens (1), David H Spencer (4), Heather Machado (1), Henry Lee-Six (1), Megan Davies (5), Daniel Hayler (2), Margarete Fabre (1,2,3), Krishnaa Mahbubani (6,7), Fede Abascal (1), Alex Cagan (1), George Vassiliou (1,2,3), Joanna Baxter (3), Inigo Martincorena (1), Michael R Stratton (1), David Kent (8), Krishna Chatterjee (9), Kourosh Saeb Parsy (6,7), Anthony R Green (2,3), Jyoti Nangalia (1,2,3,*), Elisa Laurenti (2,3,*), Peter J Campbell (1,2,*).

# These authors contributed equally.
* These authors contributed equally.

Affiliations
(1) Wellcome Sanger Institute, Hinxton, CB10 1SA, UK.
(2) Wellcome-MRC Cambridge Stem Cell Institute, Cambridge Biomedical Campus, Cambridge, CB2 0AW, UK.
(3) Department of Haematology, University of Cambridge, Cambridge, CB2 2XY, UK.
(4) Department of Medicine, McDonnell Genome Institute, Washington University, St. Louis, MO, USA.
(5) Cambridge Molecular Diagnostics, Milton Road, Cambridge, CB4 0FW, UK.
(6) Department of Surgery, University of Cambridge, Cambridge, CB2 0QQ, UK.
(7) Cambridge Biorepository for Translational Medicine, NIHR Cambridge Biomedical Research Centre, University of Cambridge, Cambridge, CB2 2XY, UK.
(8) York Biomedical Research Institute, Department of Biology, University of York, York, YO10 5DD, UK.
(9) Wellcome Trust-MRC Institute of Metabolic Science, University of Cambridge, Cambridge, CB2 0QQ, UK.
Steps to reproduce
All scripts and some smaller data matrices are available on GitHub (https://github.com/emily-mitchell/normal_haematopoiesis). Raw sequencing data are available on EGA (accession number EGAD00001007851). The main data needed to reanalyse or reproduce the results presented are available here, on Mendeley Data. See below for a guide to what is available on Mendeley Data.

dNdS_input folder
Contains all raw input files for the dN/dS analysis.

Filtering_output_XXXX folders (one for each individual)
Contains four files:

a) annotated_mut_set_XXXX_01_standard_rho01
This is an R data object, which is loaded into an R workspace using load().
The genotype matrix used for MPBoot tree building is available in the matrix: filtered_muts$Genotype_shared_bin
The DNA strings used as input for MPBoot are available in the vector: filtered_muts$dna_strings
The annotated variant calls with tree node information are available in the matrix: filtered_muts$COMB_mats.tree.build$mat
The genotype matrix of mutation calls per sample is available in: filtered_muts$COMB_mats.tree.build$Genotype_bin
Information on whether each variant is an SNV or an indel is available in: filtered_muts$COMB_mats.tree.build$mat$Mut_type
A summary of the total numbers of shared and private SNVs and indels is available in: filtered_muts$summary

b) XXXX_sensitivity
This file contains information on the sensitivity of SNV and indel calls per sample.

c) tree_XXXX_01_standard_rho01.tree
The raw tree, with branch lengths equal to the number of mutations assigned (without adjustment for sequencing coverage).

metadata_matrix folder
Contains the file “Summary_cut.csv”, which records metadata on each sample in the dataset, including cell_type sorted, sequencing depth, sequencing_platform, SNV burdens, indel burdens and telomere length.
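As a rough illustration of how the R data object described above might be inspected, the sketch below loads one annotated_mut_set file into its own environment and collects the components named in this guide. The helper names load_filtered_muts and describe_filtered_muts are hypothetical and are not part of the published scripts. The sketch assumes that the file restores an object called filtered_muts and that COMB_mats.tree.build$mat behaves like a data frame with a Mut_type column, as the $ access in the guide suggests.

```r
# Hypothetical helper (not from the published scripts): load the R data
# object into a private environment so the global workspace is untouched.
load_filtered_muts <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the restored object names
  if (!"filtered_muts" %in% loaded) {
    stop("No object named 'filtered_muts' found in ", path)
  }
  env$filtered_muts
}

# Hypothetical helper: gather the components listed in the guide.
# Assumes 'mat' is a data frame (or similar) with a 'Mut_type' column.
describe_filtered_muts <- function(filtered_muts) {
  build <- filtered_muts$COMB_mats.tree.build
  list(
    genotype_shared_dim = dim(filtered_muts$Genotype_shared_bin),
    n_dna_strings       = length(filtered_muts$dna_strings),
    n_variants          = nrow(build$mat),
    mut_type_counts     = table(build$mat$Mut_type),
    summary             = filtered_muts$summary
  )
}

# Example call; replace XXXX with an individual's identifier.
# The relative path is an assumption about where the download was unpacked.
# fm <- load_filtered_muts(
#   "Filtering_output_XXXX/annotated_mut_set_XXXX_01_standard_rho01"
# )
# str(describe_filtered_muts(fm), max.level = 1)
```

Loading into a separate environment is a design choice rather than a requirement: calling load() directly, as the guide describes, works equally well but would overwrite any existing object named filtered_muts in the workspace.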