Selection for somatic escape variants in SERPINA1 in the liver of patients with alpha-1 anti-trypsin deficiency
Description
These are large files containing somatic mutations (single base substitutions and small indels) called from whole-genome and exome sequencing data. They are required to reproduce the analysis in the manuscript "Selection for somatic escape variants in SERPINA1 in the liver of patients with alpha-1 anti-trypsin deficiency". The files are to be used alongside the code and smaller data files available at: https://github.com/nataliabrz/positive_selection_in_A1AT_deficiency. The files 'exome_calls.csv' and 'wgs_calls.csv' should be placed in the 'data/calls' folder of the GitHub repository. The columns of each data frame follow the usual VCF conventions for genome position numbering and for describing the ref and alt entries. The 'sampleID' column encodes the laser capture microdissection ID. The 'cluster_id' column denotes the patient-specific branch of the phylogenetic tree to which the mutation was assigned (in order to avoid double-counting mutations shared between microdissections). The 'vaf' column contains the variant allele fraction of the mutation, 'Mut_Frags' contains the number of sequencing reads supporting the mutant allele, and 'coverage' contains the total sequencing depth at the site of the mutation. The 'rho' and 'qval' columns contain the overdispersion parameter (rho) and its associated q value for each mutation, computed by the beta-binomial overdispersion filtering algorithm as described in the methods. The genome build for the position coordinates is GRCh38.
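As a rough illustration of how the columns relate, the sketch below reads a calls table with Python's standard csv module, checks that 'vaf' agrees with 'Mut_Frags' divided by 'coverage', and keeps one row per mutation per branch. It is not taken from the repository. Several things in it are assumptions: the position column names 'chrom', 'pos', 'ref' and 'alt' are hypothetical stand-ins for the VCF-style columns; the relation vaf = Mut_Frags / coverage is implied by the column descriptions but not stated; and the three rows in TOY are invented purely so the example runs.

```python
import csv
import io

# Invented rows for illustration only; these are NOT values from the real files.
# 'chrom', 'pos', 'ref', 'alt' are assumed names for the VCF-style columns.
TOY = """sampleID,cluster_id,chrom,pos,ref,alt,vaf,Mut_Frags,coverage
lcm_a,patient1_2,chr1,100,C,T,0.25,10,40
lcm_b,patient1_2,chr1,100,C,T,0.5,15,30
lcm_b,patient1_5,chr2,200,G,GA,0.2,4,20
"""


def read_calls(handle):
    """Read a calls table and convert the numeric columns to numbers."""
    rows = []
    for row in csv.DictReader(handle):
        row["Mut_Frags"] = int(row["Mut_Frags"])
        row["coverage"] = int(row["coverage"])
        row["vaf"] = float(row["vaf"])
        rows.append(row)
    return rows


def vaf_mismatches(rows, tol=1e-3):
    """Return rows where vaf differs from Mut_Frags / coverage by more than tol.

    Assumes vaf is the plain ratio of the two columns.
    """
    return [
        r for r in rows
        if r["coverage"] > 0
        and abs(r["vaf"] - r["Mut_Frags"] / r["coverage"]) > tol
    ]


def unique_mutations(rows, key_columns):
    """Keep the first row seen for each distinct combination of key_columns."""
    seen = set()
    unique = []
    for r in rows:
        key = tuple(r[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


rows = read_calls(io.StringIO(TOY))
# One row per mutation per tree branch, so a mutation shared between
# microdissections is counted once.
key = ["cluster_id", "chrom", "pos", "ref", "alt"]
deduplicated = unique_mutations(rows, key)
print(len(rows), "calls,", len(deduplicated), "after collapsing shared mutations")
print(len(vaf_mismatches(rows)), "rows with inconsistent vaf")
```

To run this against the real data, replace io.StringIO(TOY) with an open file handle, for example open('data/calls/wgs_calls.csv', newline=''), and adjust the key columns to the actual header names in the file.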