Hybrid models based on genetic algorithm and deep learning algorithms for nutritional Anemia disease classification. Biomedical Signal Processing and Control, 63, 102231. https://doi.org/10.1016/j.bspc.2020.102231

Published: 18 October 2022 | Version 1 | DOI: 10.17632/dt89jydgnv.1
Contributors:
Serhat KILIÇARSLAN, Mete Celik, Safak Şahin

Description

The anemia dataset used in this study was obtained from the Faculty of Medicine, Tokat Gaziosmanpaşa University, Turkey. It contains the complete blood count test results of 15,300 patients, collected over the five-year interval between 2013 and 2018. Data from pregnant women, children, and patients with cancer were excluded from the study.

With the help of experts, noise was eliminated from the dataset, and parameters considered insignificant for the diagnosis of anemia were excluded. Some records had missing parameter values, and some had values outside the reference ranges of the parameters; specialist physicians marked the latter as noise in our study. Records with missing data or with parameter values outside the reference ranges were therefore removed from the dataset.

The Pearson correlation method was used to check whether the parameters are related to one another. The correlations between the parameters in the dataset were generally weak (ρ < 0.4) [59], so none of the parameters were excluded. Twenty-four features (Table 1) and 5 classes (Table 2) were used in the study. Because the parameters differ widely in scale, the data were linearly transformed with min-max normalization [30].

Of the 15,300 patients, 10,379 were female and 4,921 were male. The dataset comprises 1,019 (7%) patients with HGB-anemia, 4,182 (27%) patients with iron deficiency, 199 (1%) patients with B12 deficiency, 153 (1%) patients with folate deficiency, and 9,747 (64%) patients without anemia (Table 2).

The transferrin saturation in the dataset is given by the "SDTSD" feature, computed with Eq. (1), which was developed with the help of a specialist physician. Saturation is the ratio of serum iron to total serum iron:

$$\mathrm{SDTSD} = \frac{\mathrm{SD}}{\mathrm{TSD}} \qquad (1)$$

In Eq. (1), SD denotes serum iron and TSD denotes total serum iron.
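The preprocessing described above (removing records with missing or out-of-range values, deriving the saturation feature as serum iron divided by total serum iron, checking pairwise Pearson correlations against the 0.4 level, and min-max normalization) can be sketched roughly as follows. This is a minimal, stdlib-only illustration, not the authors' pipeline: it assumes each record is a dict keyed by feature name, the reference ranges in `REFERENCE_RANGES` are hypothetical placeholders (the physician-defined ranges are not given on this page), and the toy records are invented for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]

# HYPOTHETICAL reference ranges (lower, upper) for illustration only; the
# actual ranges were defined by specialist physicians and are not listed here.
REFERENCE_RANGES: Dict[str, Tuple[float, float]] = {
    "HGB": (5.0, 20.0),
    "SD": (10.0, 300.0),
    "TSD": (100.0, 600.0),
}


def in_range(rec: Record, name: str, lo: float, hi: float) -> bool:
    """True if the value is present and lies inside [lo, hi]."""
    value = rec.get(name)
    return value is not None and lo <= value <= hi


def clean(records: List[Record]) -> List[Record]:
    """Drop records with a missing value or a value outside its reference range."""
    return [
        rec for rec in records
        if all(in_range(rec, n, lo, hi) for n, (lo, hi) in REFERENCE_RANGES.items())
    ]


def add_saturation(records: List[Record]) -> List[Record]:
    """Derive the SDTSD feature as serum iron (SD) / total serum iron (TSD)."""
    return [{**rec, "SDTSD": rec["SD"] / rec["TSD"]} for rec in records]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined for a constant column (sx or sy == 0)


def strong_pairs(records: List[Record], names: List[str], threshold: float = 0.4):
    """Feature pairs whose |r| reaches the threshold, i.e. exclusion candidates."""
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson([rec[a] for rec in records], [rec[b] for rec in records])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


def min_max(records: List[Record], names: List[str]) -> List[Record]:
    """Linearly rescale each named column to [0, 1]."""
    out = [dict(rec) for rec in records]
    for name in names:
        col = [rec[name] for rec in records]
        lo, hi = min(col), max(col)
        span = hi - lo
        for rec in out:
            # A constant column has no spread; map it to 0.0.
            rec[name] = (rec[name] - lo) / span if span else 0.0
    return out


# Toy records (invented values) to exercise the steps above.
raw = [
    {"HGB": 13.5, "SD": 90.0, "TSD": 300.0},
    {"HGB": 9.8, "SD": 30.0, "TSD": 420.0},
    {"HGB": None, "SD": 80.0, "TSD": 310.0},   # missing value -> removed
    {"HGB": 12.1, "SD": 950.0, "TSD": 330.0},  # SD out of range -> removed
    {"HGB": 14.2, "SD": 120.0, "TSD": 280.0},
]
FEATURES = ["HGB", "SD", "TSD", "SDTSD"]

data = add_saturation(clean(raw))
print(len(data))  # 3
r = pearson([d["HGB"] for d in data], [d["SDTSD"] for d in data])
flagged = strong_pairs(data, FEATURES)
scaled = min_max(data, FEATURES)
```

On a three-record toy sample most pairs will exceed the 0.4 level simply because the sample is tiny; on the full 15,300-record dataset the study reports generally weak correlations. Min-max parameters (the per-column minimum and maximum) would normally be computed on the training split only and reused for test data.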

Institutions

Bandirma Onyedi Eylul Universitesi

Categories

Data Mining, Computer

Licence