Arbovirus clinical data, Brazil, 2013–2020
Description
This data set contains pre-processed data from SINAN covering all of Brazil from 2013 to 2020, with sociodemographic, clinical, and laboratory information on Dengue and Chikungunya patients, as well as on suspected patients who were confirmed to have neither illness. The data set is distributed as a single CSV file that can be filtered through the CLASSI_FIN attribute, which takes one of three values: "Dengue", "Chikungunya", or "Discarded/Inconclusive".
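As a minimal sketch of the filtering described above, the snippet below streams CSV rows and keeps only those with a given CLASSI_FIN value. It uses a tiny in-memory stand-in rather than the real file; the `ID` column and the helper name `filter_by_class` are hypothetical. It also assumes the class is stored as the text labels quoted above. If the published file stores numeric codes instead (the processing includes a categorical-to-numerical transformation), the label would need to be replaced by the corresponding code.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Iterator

Row = Dict[str, str]


def filter_by_class(rows: Iterable[Row], label: str,
                    column: str = "CLASSI_FIN") -> Iterator[Row]:
    """Yield only the rows whose class column equals `label`."""
    for row in rows:
        if row[column] == label:
            yield row


# Tiny stand-in for the real CSV; the ID column is hypothetical.
SAMPLE_CSV = """ID,CLASSI_FIN
1,Dengue
2,Chikungunya
3,Discarded/Inconclusive
4,Dengue
"""

# For the real file, replace io.StringIO(...) with
# open(path_to_csv, newline="", encoding="utf-8").
sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

dengue_rows = list(filter_by_class(sample_rows, "Dengue"))
class_counts = Counter(row["CLASSI_FIN"] for row in sample_rows)
```

Because `filter_by_class` is a generator, it can be applied directly to a `csv.DictReader` over the full file without loading every record into memory.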
Steps to reproduce
The data were collected from the Health Problem and Notification Information System (in Portuguese, Sistema de Informação de Agravos de Notificação, SINAN), which holds patient notification records for diseases on the national list of compulsory notification of diseases, injuries, and public health events, as is the case for Dengue and Chikungunya. The collected data contain notifications of Dengue and Chikungunya cases that occurred in Brazilian territory, comprising all 26 States and the Federal District (Brasília), between 2013 and 2020.
Data on Dengue patients contain clinical information (pre-existing symptoms and comorbidities), laboratory tests performed, and socio-demographic data for each patient. Data on Chikungunya cases, in contrast, contain only socio-demographic information. The exception is a very small number of records, about 100, that do have clinical and laboratory information: these cases were initially treated as suspected Dengue, and were therefore in the Dengue data set, and were only later confirmed as Chikungunya. No sensitive patient information is available.
First, the SINAN data from all States were unified, resulting in 13,421,230 notifications and 118 attributes. The records were grouped into three distinct groups, identified by the CLASSI_FIN attribute: "Dengue", "Chikungunya", and "Discarded/Inconclusive". Only notifications that were confirmed or discarded/inconclusive through laboratory tests were selected. After this step, the attribute used for the filter (CRITERIO) was removed, since it then contained only a single value. The TP_NOT attribute, which identifies the type of notification generated, likewise has the same value in all records, since all notifications are of the "Individual" type.
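The selection step and the check for single-valued attributes can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the CRITERIO and TP_NOT values (`"lab"`, `"clinical"`, `"individual"`) are hypothetical placeholders for whatever the SINAN dictionary uses, and the sketch drops every single-valued attribute, whereas the text explicitly states removal only for CRITERIO.

```python
from typing import Dict, Iterable, List, Set

Record = Dict[str, str]


def select_by_criterion(records: Iterable[Record], accepted: Set[str],
                        column: str = "CRITERIO") -> List[Record]:
    """Keep only records whose criterion value is in `accepted`."""
    return [r for r in records if r[column] in accepted]


def constant_columns(records: List[Record]) -> List[str]:
    """Return the columns that hold a single distinct value."""
    if not records:
        return []
    return [c for c in records[0]
            if len({r[c] for r in records}) == 1]


def drop_columns(records: List[Record],
                 columns: Iterable[str]) -> List[Record]:
    """Return copies of the records without the given columns."""
    to_drop = set(columns)
    return [{k: v for k, v in r.items() if k not in to_drop}
            for r in records]


# Hypothetical values; the real codes come from the data dictionary.
notifications = [
    {"CRITERIO": "lab", "TP_NOT": "individual", "CLASSI_FIN": "Dengue"},
    {"CRITERIO": "clinical", "TP_NOT": "individual", "CLASSI_FIN": "Dengue"},
    {"CRITERIO": "lab", "TP_NOT": "individual", "CLASSI_FIN": "Chikungunya"},
]

selected = select_by_criterion(notifications, {"lab"})
single_valued = constant_columns(selected)
reduced = drop_columns(selected, single_valued)
```

After the selection, CRITERIO necessarily holds one value, which is why it carries no further information and can be removed.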
Attributes that had more than 60% null data or that were not in the original data dictionary were also removed. Null fields in the remaining attributes were filled with the default value denoting “not informed” for each attribute, according to the dictionary. Categorical data were then transformed into numerical data. At the end of the process, the data set consisted of 4,307,513 records for Dengue, 325,000 records for Chikungunya, and 2,100,029 records for Discarded/Inconclusive.
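The null-handling and encoding steps might look roughly like the sketch below. All column names, the "not informed" codes (`"I"`, `"9"`), and the sorted-label integer encoding are assumptions made for illustration. The source does not say which encoding scheme was used, and the real defaults come from the SINAN data dictionary.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def is_null(value: Optional[str]) -> bool:
    """Treat None and empty strings as null."""
    return value is None or value == ""


def null_fraction(records: List[Record], column: str) -> float:
    """Fraction of records in which `column` is null."""
    return sum(is_null(r.get(column)) for r in records) / len(records)


def clean(records: List[Record], not_informed: Dict[str, str],
          max_null: float = 0.60) -> List[Dict[str, str]]:
    """Drop columns missing from the dictionary or with more than
    `max_null` nulls, then fill remaining nulls with the default."""
    kept = [c for c in records[0]
            if c in not_informed and null_fraction(records, c) <= max_null]
    return [{c: not_informed[c] if is_null(r.get(c)) else r[c]
             for c in kept}
            for r in records]


def encode(records: List[Dict[str, str]],
           column: str) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct label to an integer (sorted-label order)."""
    mapping = {label: code for code, label
               in enumerate(sorted({r[column] for r in records}))}
    return [mapping[r[column]] for r in records], mapping


# Hypothetical columns and default codes.
raw = [
    {"SEX": "F", "FEVER": "1", "MOSTLY_NULL": "", "NOT_IN_DICT": "x"},
    {"SEX": "M", "FEVER": "", "MOSTLY_NULL": "", "NOT_IN_DICT": "y"},
    {"SEX": "", "FEVER": "2", "MOSTLY_NULL": "", "NOT_IN_DICT": "z"},
    {"SEX": "F", "FEVER": "1", "MOSTLY_NULL": "1", "NOT_IN_DICT": "w"},
]
defaults = {"SEX": "I", "FEVER": "9", "MOSTLY_NULL": "9"}

cleaned = clean(raw, defaults)
sex_codes, sex_mapping = encode(cleaned, "SEX")
```

In this toy input, `MOSTLY_NULL` is null in 3 of 4 records (75%) and is dropped, `NOT_IN_DICT` is dropped because it has no dictionary entry, and the remaining nulls in `SEX` and `FEVER` receive their default codes before encoding.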