Multi-Diseases Dataset Composed of Medical Sensors and Text inputs: Simulated Dataset Accompanied with Codes

Published: 29-10-2020 | Version 1 | DOI: 10.17632/22d2kcr2yp.1
Omar Salman,
Mohammed I. Aal-Nouman,
Zahraa Taha,
Muntadher Q. Alsabah,
Yaseein Soubhi Hussein


This dataset package presents a simulated dataset for triaging and prioritizing patients into multiple emergency levels. Four types of data are presented: three signals (ECG, blood pressure, and SpO2) and text inputs. The ECG, blood pressure, and SpO2 signals were taken from the online library Physionet [1], which is considered the most reliable and relevant library in healthcare services and bioinformatics sciences. The library contains collections of databases and signals related to ECG, blood pressure, and SpO2 sensors. Here, however, simulated data accompanied by code are presented. The contributions of the presented dataset are as follows. (1) The dataset consists of vital features extracted from the signal records: QRS width, ST elevation, number of peaks, and cycle interval from the ECG signal; SpO2 level from the SpO2 signal; and the high (systolic) and low (diastolic) pressure values from the blood pressure signal. These features were extracted using our machine learning algorithms. In addition, new medical features were added as text inputs based on medical doctors' recommendations: chest pain, shortness of breath, palpitation, and whether the patient is at rest. All of these features are considered significant symptoms of many diseases, such as heart attack or stroke, sleep apnea, heart failure, arrhythmia, and chronic blood pressure diseases. Therefore, (2) the formulated dataset is considered in doctors' diagnostic procedures for identifying a patient's emergency level. (3) In the online library [1], the ECG, blood pressure, and SpO2 data are represented as signals; in contrast, we carried out the signal processing tasks and re-present the dataset as numeric values for the vital features in Excel sheets. (4) Based on our simulation outcomes, the dataset is reorganized and reformatted as a structured dataset, represented as tables that illustrate each patient's profile and type of disease. (5) The presented dataset is used in the evaluation of medical monitoring and healthcare provisioning systems [2]. Finally, (6) the simulation code for the feature extraction functions is provided.
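As a rough illustration of how one row of such a table could be held in code, the following Java sketch groups the features listed above into a single record. The class name, field names, the use of booleans for the text inputs, and the helper method are all assumptions made for this example; the dataset's actual column names, units, and encodings are not stated here and may differ.

```java
/**
 * Hypothetical container for one patient row. Field names and types are
 * assumptions for illustration; they are not the dataset's real column names.
 */
public class PatientRecord {
    // Features extracted from the ECG signal (units are not given in the source)
    public final double qrsWidth;
    public final double stElevation;
    public final int peaksNumber;
    public final double cycleInterval;
    // Feature extracted from the SpO2 signal
    public final double spo2Level;
    // Features extracted from the blood pressure signal
    public final double systolicPressure;
    public final double diastolicPressure;
    // Text inputs, modelled here as yes/no flags (an assumption)
    public final boolean chestPain;
    public final boolean shortnessOfBreath;
    public final boolean palpitation;
    public final boolean atRest;

    public PatientRecord(double qrsWidth, double stElevation, int peaksNumber,
                         double cycleInterval, double spo2Level,
                         double systolicPressure, double diastolicPressure,
                         boolean chestPain, boolean shortnessOfBreath,
                         boolean palpitation, boolean atRest) {
        this.qrsWidth = qrsWidth;
        this.stElevation = stElevation;
        this.peaksNumber = peaksNumber;
        this.cycleInterval = cycleInterval;
        this.spo2Level = spo2Level;
        this.systolicPressure = systolicPressure;
        this.diastolicPressure = diastolicPressure;
        this.chestPain = chestPain;
        this.shortnessOfBreath = shortnessOfBreath;
        this.palpitation = palpitation;
        this.atRest = atRest;
    }

    /** Counts how many of the three reported symptoms are present. */
    public int reportedSymptomCount() {
        int count = 0;
        if (chestPain) count++;
        if (shortnessOfBreath) count++;
        if (palpitation) count++;
        return count;
    }
}
```

Keeping the signal-derived values and the text inputs in one object mirrors the table layout described above, where each row combines both kinds of feature for a single patient.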


Steps to reproduce

The ECG, blood pressure, and SpO2 signals were collected from the online library Physionet [1]. Each signal has more than 2000 elements, and each element has two values: the first represents time and the second represents voltage. Each signal is therefore stored as an array with two columns (one per value) and one row per element, indexed from (0) to (n). A real-time data processing algorithm was designed to extract the required features. We implemented our algorithms in a simulation environment. Simulation setup: the software architecture for our algorithms is implemented in the Java programming language, and XAMPP was also used. Based on our simulation outcomes, the dataset is reorganized and reformatted as a structured dataset. The dataset is re-presented as tables that illustrate each patient's profile and type of disease.
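The two-column signal layout described above can be sketched in Java as follows. This is a minimal illustration, not the feature extraction code shipped with the dataset: the class name `SignalPeaks`, the simple threshold-based local-maximum rule, the threshold value, and the sample values are all assumptions chosen for the example. It shows how a peak count and a mean cycle interval could be derived from an array whose first column is time and whose second column is voltage.

```java
import java.util.ArrayList;
import java.util.List;

public class SignalPeaks {

    /**
     * Returns the row indices of local maxima whose voltage exceeds the threshold.
     * Each row of signal is {time, voltage}, matching the two-column layout.
     */
    public static List<Integer> findPeaks(double[][] signal, double threshold) {
        List<Integer> peaks = new ArrayList<>();
        for (int i = 1; i < signal.length - 1; i++) {
            double v = signal[i][1];
            // Strictly above the previous sample, not below the next one
            if (v > threshold && v > signal[i - 1][1] && v >= signal[i + 1][1]) {
                peaks.add(i);
            }
        }
        return peaks;
    }

    /**
     * Mean time between consecutive peaks, in the time units of the input.
     * Returns NaN when fewer than two peaks are available.
     */
    public static double meanCycleInterval(double[][] signal, List<Integer> peaks) {
        if (peaks.size() < 2) {
            return Double.NaN;
        }
        double first = signal[peaks.get(0)][0];
        double last = signal[peaks.get(peaks.size() - 1)][0];
        return (last - first) / (peaks.size() - 1);
    }

    public static void main(String[] args) {
        // Made-up samples for illustration only; not taken from Physionet
        double[][] signal = {
            {0.0, 0.1}, {0.1, 0.2}, {0.2, 1.0}, {0.3, 0.2}, {0.4, 0.1},
            {0.5, 0.1}, {0.6, 0.2}, {0.7, 1.1}, {0.8, 0.2}, {0.9, 0.1},
            {1.0, 0.1}, {1.1, 0.2}, {1.2, 0.9}, {1.3, 0.2}
        };
        List<Integer> peaks = findPeaks(signal, 0.5);
        System.out.println("Peaks number: " + peaks.size());
        System.out.println("Cycle interval: " + meanCycleInterval(signal, peaks));
    }
}
```

A fixed threshold is the simplest possible rule and is likely to be too crude for real ECG recordings with noise or baseline drift; it is used here only to make the relationship between the two columns and the derived features concrete.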