Generated Prediction Data of COVID-19's Daily Infections in Brazil
Description
Version 4 important changes:
- Added a compressed zip file "Evaluating different time-steps.zip" containing the performance evaluation for the 10, 20, 30, 40, and 50 time-step alternatives.
- The compressed zip file "Generated Prediction Data of COVID-19's Daily Infections in Brazil.zip" includes all the files and folders in this dataset, except for the evaluation of the time-step alternatives.

Dataset general description:
• This dataset reports 4195 recurrent neural network models, their settings, and their generated prediction csv files, graphs, and metadata files, for predicting COVID-19's daily infections in Brazil by training on limited raw data (30 and 40 time-steps). The code used was developed by the author and is located in the following online data repository: http://dx.doi.org/10.17632/yp4d95pk7n.2

Dataset content:
• Models, graphs, and csv prediction files:
1. Deterministic mode (DM): includes 1194 generated model files (30 time-steps), with their 2835 generated graphs and 2835 prediction files. This mode also includes 1976 generated model files (40 time-steps), with their 7301 generated graphs and 7301 prediction files.
2. Non-deterministic mode (NDM): includes 20 generated model files (30 time-steps), with their 53 generated graphs and 53 prediction files.
3. Technical validation mode (TVM): includes 1001 generated model files (30 time-steps), with 3619 generated graphs and 3619 prediction files for 349 models. These 349 models come from a sample of 358 models drawn from the 1001; the other 9 models in the sample did not achieve the accuracy threshold. This mode also includes all data of the control group (India).
4. 1 graph and 1 prediction file for each of DM and NDM, reporting evaluation till 2020-07-11.
• Settings and metadata for the above 3 categories:
1. Settings used during the training session, in json files (the file count in the technical validation settings folder ignores the accuracy threshold: 5370 files, compared with 3619 files in the zip file).
2. Metadata: training and prediction setup and accuracy, in csv files.

Raw data source used to train the models:
• The raw data [1] used for training the models is from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University: https://github.com/CSSEGISandData/COVID-19 (accessed 2020-07-20)
• The models were trained on these versions of the raw data (both accessed 2020-07-08):
1. till 2020-06-29: https://github.com/CSSEGISandData/COVID-19/raw/78d91b2dbc2a26eb2b2101fa499c6798aa22fca8/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
2. till 2020-06-13: https://github.com/CSSEGISandData/COVID-19/raw/02ea750a263f6d8b8945fdd3253b35d3fd9b1bee/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv

References:
1. Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis. 2020;20(5):533-534. doi: 10.1016/S1473-3099(20)30120-1
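The relationship between the cumulative raw data and the "daily infections" and "time-steps" mentioned in this description can be illustrated with a minimal sketch. It is not the author's code (that is in the repository linked in the description). It assumes the JHU CSSE global time-series layout (columns Province/State, Country/Region, Lat, Long, followed by one column per date), and it assumes that a "time-step" is one day in a sliding input window. The function names daily_new_cases and make_windows and the numbers in the sample are hypothetical and used only for illustration.

```python
import csv
import io


def daily_new_cases(csv_text, country):
    """Return (dates, daily_counts) for one country from a cumulative
    confirmed-cases CSV in the JHU CSSE global time-series layout."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    country_col = header.index("Country/Region")
    first_date_col = header.index("Long") + 1  # date columns follow "Long"
    dates = header[first_date_col:]

    # Sum all rows of the country (some countries are split by province).
    cumulative = [0] * len(dates)
    for row in reader:
        if row and row[country_col] == country:
            for i, value in enumerate(row[first_date_col:]):
                cumulative[i] += int(float(value))

    # Daily infections are differences of consecutive cumulative totals,
    # so the first date has no daily value.
    daily = [b - a for a, b in zip(cumulative, cumulative[1:])]
    return dates[1:], daily


def make_windows(series, time_steps):
    """Build (input_window, next_value) pairs with a sliding window.
    Assumption: each time-step is one day of daily infections."""
    return [
        (series[i:i + time_steps], series[i + time_steps])
        for i in range(len(series) - time_steps)
    ]


# Made-up sample in the assumed layout (not real counts).
sample = (
    "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20\n"
    ",Brazil,-14.235,-51.9253,0,2,5,9\n"
    ",India,21.0,78.0,0,1,1,4\n"
)
dates, daily = daily_new_cases(sample, "Brazil")
windows = make_windows(daily, 2)
```

With the real files, the text of one of the two pinned CSV versions listed above would be passed as csv_text, and time_steps would be set to 30 or 40.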
Steps to reproduce
The steps performed to reproduce the repeated determinism during the training session, in the period specified in the metadata files, are reported in each model's json settings file and in the csv and pkl files in the settings folder.
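The general idea of repeating a run from stored settings can be sketched as follows. This is only an illustration under stated assumptions: the structure of the dataset's json settings files is not described here, so the key name "seed" and the function run_from_settings are hypothetical, and Python's standard random generator stands in for a training session. Real neural network training usually also requires framework-specific seeding and deterministic operations, which this sketch does not cover.

```python
import json
import random


def run_from_settings(settings_json, n=5):
    """Produce a pseudo-random sequence driven only by stored settings.
    The "seed" key is a hypothetical name, not taken from the dataset."""
    settings = json.loads(settings_json)
    rng = random.Random(settings["seed"])  # isolated generator, no global state
    return [rng.random() for _ in range(n)]


# Two runs from the same stored settings give identical results.
stored = json.dumps({"seed": 1234})
first_run = run_from_settings(stored)
second_run = run_from_settings(stored)
```

Using a dedicated random.Random instance rather than the module-level generator keeps the run independent of any other code that draws random numbers.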