Reference COVID-19 clinical data for synthetic data generation with SASC

Published: 19 November 2021 | Version 1 | DOI: 10.17632/ptz6zhknyp.1
Andrea Zaliani


The reference COVID-19 dataset was obtained from the Clinical Practice Research Datalink (CPRD). This dataset is based on real anonymized primary care patient data extracted from the CPRD Aurum database. Patients typically presented in primary care with symptoms of COVID-19 (confirmed or suspected); control participants had a negative COVID-19 test result. For the purpose of this paper, we made use of the CPRD COVID-19 symptoms and risk factors synthetic dataset (Version 2021.04.001). The dataset contains information on sociodemographic and clinical risk factors from 03/12/2019 to 13/04/2021. The dataset is also publicly deposited on Zenodo.



Clinical Analysis, Clinical Data Collection