Genotype data for a set of 163 worldwide populations

Published: 10-02-2020 | Version 3 | DOI: 10.17632/ckz9mtgrjj.3
George Busby


This is a combined dataset of genetic data on 2,643 individuals from 163 worldwide human populations. The genotypes were all generated on Illumina chips (550, 610, 660) for multiple different studies. The two main papers that this dataset was compiled for are: Hellenthal et al. 2014, A Genetic Atlas of Human Admixture History, Science; and Busby et al. 2015, The role of recent admixture in forming the contemporary West Eurasian genomic landscape, Current Biology. The data are in PLINK format, and the BusbyWorldwidePopulations.csv file outlines where the different datasets come from.

Note that because these two datasets were combined, not all populations are typed on the same set of SNPs. We have included genotype data on 523,443 SNPs, of which 441,038 are genotyped on at least 97.5% of individuals. Additional QC steps are therefore required to filter this set down to high-quality calls, depending on the subset of samples needed. Complete information about the populations used is available in the publications outlined in the associated paper. Note also that these same populations are available elsewhere; this dataset is the version compiled for the papers mentioned above.

UPDATE 11/11/2019: Thanks to some heroic work by Kristján Helgi Swerford Moore at DECODE, I have now updated the population and sample information to label the individuals more accurately and verbosely.
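As an illustration of the per-SNP QC step mentioned in the description, the following is a minimal Python sketch of a call-rate filter: it computes the fraction of individuals with a non-missing genotype at each SNP and keeps SNPs at or above a threshold (0.975 by default, matching the 97.5% figure quoted above). The function names, the in-memory matrix layout, and the toy values are assumptions made for illustration only; they are not part of the dataset or of the authors' pipeline, and the sketch does not read PLINK files.

```python
from typing import List, Optional, Sequence

# Assumed encoding for this sketch: 0, 1 or 2 copies of an allele,
# with None marking a missing call.
Genotype = Optional[int]


def snp_call_rates(genotypes: Sequence[Sequence[Genotype]]) -> List[float]:
    """Return the fraction of non-missing calls for each SNP (column).

    `genotypes` is a matrix with one row per individual and one column per SNP.
    """
    if not genotypes:
        return []
    n_individuals = len(genotypes)
    n_snps = len(genotypes[0])
    rates = []
    for j in range(n_snps):
        called = sum(1 for row in genotypes if row[j] is not None)
        rates.append(called / n_individuals)
    return rates


def keep_snps(
    genotypes: Sequence[Sequence[Genotype]], min_call_rate: float = 0.975
) -> List[int]:
    """Return the column indices of SNPs with call rate >= min_call_rate."""
    rates = snp_call_rates(genotypes)
    return [j for j, rate in enumerate(rates) if rate >= min_call_rate]


# Toy example: 4 individuals x 3 SNPs (hypothetical values, not from the dataset).
toy = [
    [0, 1, None],
    [1, 1, 2],
    [2, None, None],
    [0, 2, 1],
]
kept = keep_snps(toy, min_call_rate=0.75)
```

Because the call rate depends on which individuals are included, a filter like this should be re-run after subsetting the samples. In practice the same kind of filter is normally applied with PLINK itself, whose `--geno` option removes variants whose missing call rate exceeds a given value; the exact thresholds to use are a judgment call for each analysis.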