SpectroPhon DBM for ML

Published: 16 July 2025 | Version 1 | DOI: 10.17632/bcp7snw4vr.1
Contributors: Apparaju Sreeharsha

Description

The dataset comprises three versions, generated for 200, 186, and 164 subjects respectively, all derived from the extrapolated SpectroPhon DBM dataset (10.17632/jt22782wjh.1). Each version includes weight-based and sweat-based features for machine learning classification tasks; the thresholds for the four-class weight-based classification model are explicitly defined in the code. The dataset also provides unseen input samples intended for UI testing with the Gradio GUI on Hugging Face Spaces.

Steps to reproduce

To enable effective classification of hydration status, the original SpectroPhon DBM dataset was enriched through feature engineering. Because the initial dataset did not include classification labels, additional features were derived using domain-specific knowledge and established physiological assumptions. This involved extrapolating weight and sweat measurements, including projected weight changes, weight loss rates, and salt concentration estimates over extended time intervals, so as to capture temporal trends relevant to hydration monitoring.

Three types of classification labels were generated. Two are based on weight change: a binary label indicating whether a subject lost 1.99% or more of body weight, and a four-class label distinguishing hydrated, mildly dehydrated, moderately dehydrated, and dehydrated states. Both were calculated by comparing body weight measurements taken before and after water intake and exercise against established percentage thresholds. The third label is based on sweat composition: it is derived from estimated sweat osmolality, itself calculated from extrapolated salt loss using adapted physiological formulas. Subjects with a sweat osmolality of 149 mmol/kg or higher were classified as dehydrated.

To support evaluation of model performance under varying data quality conditions, three versions of the dataset were created. The first retains all original data, including missing values and outliers. The second excludes only instances with missing data. The third is fully cleaned, with both missing entries and physiologically implausible values removed. Each version supports all three classification tasks and allows comparative assessment of model robustness across different levels of data integrity.
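The labelling and versioning steps described above can be illustrated with a minimal Python sketch. It is not the dataset's own code. The 1.99% and 149 mmol/kg cut-offs come from the description, but the function names, the row layout (a list of dictionaries), the sign convention for weight loss, and the `is_plausible` check are all assumptions made for illustration. The four-class cut points are not stated here, so they are left as a required argument to be taken from the dataset's code.

```python
import math
from bisect import bisect_right
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Cut-offs stated in the dataset description.
BINARY_WEIGHT_LOSS_PCT = 1.99     # % body weight lost
SWEAT_OSMOLALITY_CUTOFF = 149.0   # mmol/kg

FOUR_CLASS_NAMES = (
    "hydrated",
    "mildly dehydrated",
    "moderately dehydrated",
    "dehydrated",
)

Row = Dict[str, Optional[float]]  # hypothetical row layout


def weight_loss_pct(weight_before: float, weight_after: float) -> float:
    """Percentage of body weight lost (assumed convention: positive = loss)."""
    return (weight_before - weight_after) / weight_before * 100.0


def binary_weight_label(loss_pct: float) -> int:
    """1 if the subject lost 1.99% or more of body weight, else 0."""
    return int(loss_pct >= BINARY_WEIGHT_LOSS_PCT)


def four_class_weight_label(loss_pct: float, cut_points: Sequence[float]) -> str:
    """Map a weight loss percentage to one of four classes.

    cut_points must hold three ascending thresholds; the real values are
    defined in the dataset's code and are NOT reproduced here.
    """
    if len(cut_points) != 3 or list(cut_points) != sorted(cut_points):
        raise ValueError("cut_points must be three ascending thresholds")
    # bisect_right makes each threshold inclusive on its lower bound.
    return FOUR_CLASS_NAMES[bisect_right(cut_points, loss_pct)]


def sweat_label(osmolality_mmol_per_kg: float) -> int:
    """1 (dehydrated) if sweat osmolality is 149 mmol/kg or higher, else 0."""
    return int(osmolality_mmol_per_kg >= SWEAT_OSMOLALITY_CUTOFF)


def _is_missing(value: Optional[float]) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))


def build_versions(
    rows: Sequence[Row],
    is_plausible: Callable[[Row], bool],  # hypothetical plausibility check
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Return (raw, no_missing, cleaned) versions of the data."""
    raw = list(rows)
    no_missing = [r for r in raw if not any(_is_missing(v) for v in r.values())]
    cleaned = [r for r in no_missing if is_plausible(r)]
    return raw, no_missing, cleaned
```

For example, `binary_weight_label(weight_loss_pct(70.0, 68.0))` evaluates to `1`, since the loss is roughly 2.9% of body weight. Passing the plausibility rule in as a function keeps the sketch neutral about which physiological limits were actually applied when the third version was built.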

Institutions

  • Edge Hill University

Categories

Dehydration, Hydration, Applied Machine Learning
