Generating a Heterogeneous Big Dataset for Healthcare and Telemedicine Research Based on ECG, SpO2, Blood Pressure Sensors, and Text Inputs: Dataset Classified, Analyzed, Organized, and Presented in Excel File Format
Description
This work presents a heterogeneous big dataset comprising four input types: electrocardiogram (ECG) signals, blood pressure signals, oxygen saturation (SpO2) signals, and text input. It extends our earlier dataset formulation presented in [1]. The signals were acquired from PhysioNet [2], a trusted medical dataset library. The dataset includes medical features from heterogeneous sources, both sensory and non-sensory. First, the ECG sensor signals provide QRS width, ST elevation, number of peaks, and cycle interval. Second, the SpO2 sensor signals provide the SpO2 level. Third, the blood pressure sensor signals provide high (systolic) and low (diastolic) values. Finally, the text input is non-sensory data, formulated from the procedures doctors follow when diagnosing chronic heart diseases. A Python software environment was used, and the simulated big data is presented along with analyses.
Steps to reproduce
The dataset combines four signal types: electrocardiogram (ECG) signals, blood pressure signals, oxygen saturation (SpO2) signals, and text input. It extends our earlier dataset formulation presented in [1], and the signals were acquired from PhysioNet [2], a trusted medical dataset library. The dataset contains eleven medical features from heterogeneous sources, extracted from the four signal types. First, the ECG sensor signals provide QRS width, ST elevation, number of peaks, and cycle interval. Second, the SpO2 sensor signals provide the SpO2 level. Third, the blood pressure sensor signals provide systolic and diastolic values. Finally, the text input is non-sensory data, formulated from the procedures doctors follow when diagnosing chronic heart diseases. Each signal has a different range of values, so the data processing algorithm was designed to determine each feature's probability based on that feature's range. Simulation setup: we implemented our algorithms in a simulation environment, with the software architecture written in the Python programming language. The dataset is reorganized and reformatted, using probabilities based on a limited range, into a structured dataset format, which increases the size of these databases. The dataset is represented as tables that describe each patient's profile and type of disease.
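The range-based step described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the feature names, the sampling bounds, the "normal" bounds, and the helper functions `simulate_records`, `in_range_fraction`, and `to_csv` are all hypothetical, chosen only to show how simulated values might be drawn per feature, scored against a range, and organized as a table.

```python
import csv
import io
import random

# Hypothetical sampling bounds and "normal" bounds for a few features.
# These numbers are illustrative assumptions, not values from the dataset.
FEATURES = {
    "qrs_width_ms": {"sample": (60.0, 200.0), "normal": (80.0, 120.0)},
    "spo2_percent": {"sample": (80.0, 100.0), "normal": (95.0, 100.0)},
    "systolic_mmHg": {"sample": (80.0, 200.0), "normal": (90.0, 120.0)},
    "diastolic_mmHg": {"sample": (40.0, 120.0), "normal": (60.0, 80.0)},
}


def simulate_records(n, seed=0):
    """Draw n simulated patient rows, one uniform value per feature."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    records = []
    for patient_id in range(1, n + 1):
        row = {"patient_id": patient_id}
        for name, spec in FEATURES.items():
            low, high = spec["sample"]
            row[name] = round(rng.uniform(low, high), 1)
        records.append(row)
    return records


def in_range_fraction(records, name):
    """Empirical fraction of rows whose value lies in the assumed normal range."""
    low, high = FEATURES[name]["normal"]
    hits = sum(1 for row in records if low <= row[name] <= high)
    return hits / len(records)


def to_csv(records):
    """Serialize the rows as CSV text, one row per simulated patient."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = simulate_records(1000, seed=42)
    for feature in FEATURES:
        print(feature, in_range_fraction(rows, feature))
    print(to_csv(rows[:3]))
```

The sketch writes CSV so that it depends only on the standard library. To produce an Excel file instead, one option is to load the rows into a pandas `DataFrame` and call `to_excel`, which requires an Excel writer engine such as openpyxl to be installed.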