Synthetic Dataset for COVID-19 Back Tracing
Contact tracing involves the systematic collection and analysis of information on individuals who have had close contact with confirmed cases of an infectious disease. Contact tracing systems encompass various surveillance inputs that aid in predicting high-risk individuals. The inputs considered here are distance, overlap time, incubation time, visiting time lag, and facility size.
The generated data includes 1000 scenarios involving different sets of exposed individuals. To ensure diversity within the samples, random generation techniques were employed to assign each person an exposure scenario distinct from those of the others in the sample. Distances range from one foot to 97 feet, reflecting indoor meetings. Facility sizes are categorized into five groups: very small (1), small (2), medium (3), large (4), and very large (5). The overlap time ranges from one minute to 480 minutes, as does the visiting time lag. The incubation time is generated within the context of COVID-19 and ranges from one day to 14 days.
Distance: The spatial separation between a confirmed infected person and an exposed individual during contact, that is, the physical distance maintained between them.
Overlap Time: The duration of close proximity between an infected person and an exposed individual, that is, the amount of time spent in direct contact.
Visiting Time Lag: The time gap between the departure of a confirmed infected person from a location or facility and a subsequent visit by an individual to the same place. This factor accounts for the possibility of indirect contact with the infected person through touching a contaminated object or surface.
Incubation Time: The period elapsed from the moment of exposure to an infectious agent to the onset of noticeable symptoms in an infected individual.
Facility Size: The dimensions or capacity of the location or venue where the contact takes place, indicating the physical space available for interactions.
Steps to reproduce
Due to the lack of available surveillance data, we generated a synthetic dataset. The generated data includes 1000 scenarios involving different sets of exposed individuals. To ensure diversity within the samples, random generation techniques were employed to assign each person an exposure scenario distinct from those of the others in the sample.
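The generation procedure is described only at a high level, so the following is a minimal sketch of one way such data could be produced, not the authors' actual script. It assumes integer values drawn uniformly from the stated ranges, a fixed random seed, and hypothetical column names (`distance_ft`, `overlap_time_min`, and so on); none of these choices are stated in the description. Duplicate draws are rejected so that every scenario is distinct.

```python
import random

# Ranges come from the dataset description. Uniform integer sampling and the
# column names are assumptions made for this illustration.
RANGES = {
    "distance_ft": (1, 97),            # feet, indoor meetings
    "overlap_time_min": (1, 480),      # minutes
    "visiting_time_lag_min": (1, 480), # minutes
    "incubation_time_days": (1, 14),   # days
    "facility_size": (1, 5),           # 1 = very small ... 5 = very large
}


def generate_scenarios(n=1000, seed=0):
    """Return n distinct exposure scenarios as a list of dicts."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    seen = set()
    rows = []
    while len(rows) < n:
        row = tuple(rng.randint(lo, hi) for lo, hi in RANGES.values())
        if row in seen:
            continue  # reject duplicates so each scenario is distinct
        seen.add(row)
        rows.append(dict(zip(RANGES, row)))
    return rows


scenarios = generate_scenarios()
```

Each element of `scenarios` is a dictionary with one value per input parameter. Writing the list out with `csv.DictWriter` would give a tabular file with one scenario per row.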