Early Detection of Infant Common Illnesses (0–3 Years): Symptom-Based Dataset from Bangladesh
Description
The dataset consists of 2,553 survey responses collected from parents and primary caregivers of children under the age of three in Bangladesh. It provides demographic information (such as age and gender), symptom reports (including respiratory problems, diarrhea, fever, skin rashes, and ear infections), caregiver concerns, and final illness labels. The survey was designed to capture early markers of frequent infant illnesses and to reflect observations made in real-world care. The dataset aims to advance studies on the early detection, monitoring, and management of common childhood illnesses. It can serve as a basis for public health research, pediatric health studies, and decision support systems for both parents and healthcare professionals. It can also be applied to machine learning tasks, such as symptom-based illness prediction and pattern analysis of pediatric health records. By providing reliable and well-organized information, the dataset is intended to support timely diagnosis, intervention, and care, and to improve child health outcomes in Bangladesh.
Keywords: Infant health; childhood illnesses; symptom dataset; healthcare dataset; machine learning; ARI; diarrhea; fever; Bangladesh
Files
Steps to reproduce
Data Collection and Methodology
This dataset was generated through a structured survey of parents and primary caregivers of infants aged 0–3 years in Bangladesh. The survey was designed to capture demographic information, symptom reports (such as respiratory problems, diarrhea, fever, skin rashes, and ear infections), caregiver concerns, and final illness labels.
Methods and Protocols
- Survey Design: A structured questionnaire was developed based on common infant illnesses and caregiver-reported symptoms.
- Participant Recruitment: Participants were recruited from multiple regions in Bangladesh to ensure diverse representation.
- Data Collection: Surveys were conducted via in-person interviews and digital forms. Informed consent was obtained from all participants prior to data collection.
- Data Cleaning and Standardization: Responses were reviewed to remove inconsistencies, correct errors, and standardize entries for demographic and symptom data.
- Labeling: Final illness labels were assigned based on caregiver reports, medical diagnosis (where available), and observed symptom patterns.
- Data Structuring: The cleaned and labeled data were organized into a structured format (CSV/Excel), with columns representing demographics, symptoms, caregiver concerns, and illness labels.
- Validation: Quality checks were performed to confirm the completeness, accuracy, and consistency of the dataset.
- Software and Tools: Data cleaning and organization were performed using spreadsheet software (e.g., Microsoft Excel) and standard data processing workflows.
This methodology is intended to make the dataset reproducible and suitable for research on early detection, public health analysis, and machine-learning-based pediatric illness prediction.
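The cleaning, standardization, and validation steps described above could be sketched in Python roughly as follows. This is only an illustration, not the procedure the authors used (they report working in spreadsheet software). All column names (`age_months`, `gender`, `illness_label`, and the symptom columns), the yes/no encodings, and the 0–36 month age check are assumptions made for the example; the actual file headers and coding scheme may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; the real dataset's headers may differ.
SYMPTOM_COLUMNS = ["fever", "diarrhea", "respiratory_problem",
                   "skin_rash", "ear_infection"]
REQUIRED_COLUMNS = ["age_months", "gender", *SYMPTOM_COLUMNS, "illness_label"]

# Assumed spellings of yes/no answers in the raw survey responses.
YES_VALUES = {"yes", "y", "1", "true"}
NO_VALUES = {"no", "n", "0", "false"}


def standardize_row(row):
    """Return a cleaned copy of one response, or None if it fails a check."""
    cleaned = {key: (value or "").strip()
               for key, value in row.items() if key is not None}

    # Completeness: every required field must be present and non-empty.
    if any(not cleaned.get(col) for col in REQUIRED_COLUMNS):
        return None

    # Consistency: age must be an integer within 0-36 months (0-3 years).
    try:
        age = int(cleaned["age_months"])
    except ValueError:
        return None
    if not 0 <= age <= 36:
        return None
    cleaned["age_months"] = age

    # Standardization: lowercase categories, map symptoms to 0/1.
    cleaned["gender"] = cleaned["gender"].lower()
    cleaned["illness_label"] = cleaned["illness_label"].lower()
    for col in SYMPTOM_COLUMNS:
        answer = cleaned[col].lower()
        if answer in YES_VALUES:
            cleaned[col] = 1
        elif answer in NO_VALUES:
            cleaned[col] = 0
        else:
            return None  # unrecognized answer
    return cleaned


def clean_survey(csv_text):
    """Clean all rows of a CSV string; return (kept_rows, dropped_count)."""
    kept, dropped = [], 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        cleaned = standardize_row(row)
        if cleaned is None:
            dropped += 1
        else:
            kept.append(cleaned)
    return kept, dropped


# Made-up rows for illustration only; these are not records from the dataset.
SAMPLE = """age_months,gender,fever,diarrhea,respiratory_problem,skin_rash,ear_infection,illness_label
7,Female,Yes,no,YES,0,n,ARI
14,male,y,Yes,no,no,no,Diarrhea
48,male,yes,no,no,no,no,Fever
9,female,maybe,no,no,no,no,Fever
"""

if __name__ == "__main__":
    rows, dropped = clean_survey(SAMPLE)
    print(len(rows), "rows kept,", dropped, "dropped")
    print(Counter(r["illness_label"] for r in rows))
```

Dropping a row on any failed check is the simplest policy for a sketch; in practice, flagged responses would more likely be reviewed and corrected by hand, as the methodology describes.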
Institutions
- Daffodil International University