TyphoDx-BD: A Clinically Validated Dataset for Machine Learning-Based Typhoid and Rickettsial Disease Prediction in Bangladesh

Published: 28 July 2025 | Version 1 | DOI: 10.17632/m9pnvv2vpv.1

Description

This dataset contains detailed clinical records of 1,106 patients collected in Bangladesh between October 27, 2024 and January 2, 2025. Typhoid fever is a serious bacterial infection caused by Salmonella Typhi and is commonly spread through contaminated food and water. It remains a widespread health problem in many parts of Bangladesh, especially where sanitation and access to clean water are limited. The dataset includes results from the Widal test, which is commonly used to detect typhoid and paratyphoid fever, and from the Weil-Felix test, which is used to help identify rickettsial infections, a group of bacterial infections that often present with similar symptoms. The dataset has been validated by a certified medical professional and is suitable for epidemiological research, diagnostic modeling, and public health analysis. The dataset fields are:

1. Weil-Felix Test (Rickettsial Infections): OXK, OX2, OX19. Antigens used to detect antibodies against rickettsial infections (e.g., scrub typhus, epidemic typhus). Note: the Weil-Felix test is nonspecific and has largely been replaced by more specific serological and PCR tests in modern labs.
2. Widal Test (Typhoid/Paratyphoid Fever): TO (Typhi O), TH (Typhi H), AH (Paratyphi A H), BH (Paratyphi B H). Antigens used to measure antibody titers against:
   - Salmonella Typhi (TO, TH)
   - Salmonella Paratyphi A/B (AH, BH)
3. Typhoid Status:
   - Typhoid: overall diagnosis (e.g., Minimal/Negative/Positive).
   - Acute_typhoid (Yes/No): confirms active infection (likely based on clinical symptoms plus high titers).
   - Paratyphoid_A/B (Yes/No): specific to Salmonella Paratyphi A or B infections.
4. Rickettsial Suspicion: Rickettsia_Suspect (Yes/No). Flagged if Weil-Felix titers (OXK/OX2/OX19) or symptoms suggest rickettsiosis.
5. Additional Diagnostic Markers: M, A (Binary: 0/1). Likely represent:
   - M: IgM antibody presence (acute phase)
   - A: agglutination result (if qualitative)
6. Demographic & Privacy Fields: Gender (Male/Female); Age (Years); Encrypted_Name (de-identified for ethical compliance).

Dataset Validation:
Validated by: Dr. Prio Gopal Biswas (Medical Doctor)
Hospital: Saranjghola Upazila Health Complex, Bagerhat, Bangladesh
Data Collection Period: October 27, 2024 – January 2, 2025

Ethical Protocols and Statement: All procedures were conducted in alignment with the ethical guidelines of Daffodil International University (DIU) and in accordance with relevant national and institutional regulations. The Research Ethics Committee (REC) of the Faculty of Science and Information Technology (FSIT) at DIU granted ethical approval for this study under approval number REC-FSIT-2024-09-17, following a thorough review process. Written informed consent was obtained from all participants.
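The fields described above can be checked when the file is loaded. The following is a minimal sketch, not the authors' tooling. It assumes that the column names match the field names in the description (with Paratyphoid_A and Paratyphoid_B as separate columns), that the file is a CSV, and that titers are stored either as dilution strings such as "1:160" or as plain numbers. None of these details is confirmed by the description, so adjust them after inspecting the actual header. The sample row is made up for illustration and is not taken from the dataset.

```python
import csv
import io

# Column names ASSUMED from the field description; the published file may
# spell them differently.
TITER_COLUMNS = ["OXK", "OX2", "OX19", "TO", "TH", "AH", "BH"]
FLAG_COLUMNS = ["Acute_typhoid", "Paratyphoid_A", "Paratyphoid_B", "Rickettsia_Suspect"]
EXPECTED_COLUMNS = (
    TITER_COLUMNS + ["Typhoid"] + FLAG_COLUMNS
    + ["M", "A", "Gender", "Age", "Encrypted_Name"]
)


def parse_titer(value):
    """Return the dilution denominator: '1:160' -> 160, '80' -> 80.

    ASSUMPTION: titers are '1:N' strings or plain numbers. Blank cells
    give None; any other text (e.g. 'Nil') raises ValueError.
    """
    text = str(value).strip()
    if not text:
        return None
    if ":" in text:
        text = text.split(":", 1)[1]
    return int(float(text))


def missing_columns(header):
    """List the expected columns that are absent from a header."""
    return [name for name in EXPECTED_COLUMNS if name not in header]


def load_records(handle):
    """Read CSV rows into dicts, converting titers and Age to integers."""
    reader = csv.DictReader(handle)
    missing = missing_columns(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {missing}")
    records = []
    for row in reader:
        for column in TITER_COLUMNS:
            row[column] = parse_titer(row[column])
        row["Age"] = int(float(row["Age"]))
        records.append(row)
    return records


# Made-up sample row, for illustration only (not real patient data).
SAMPLE_CSV = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    + "1:80,1:40,1:20,1:160,1:320,1:20,1:20,Positive,Yes,No,No,No,1,0,Male,34,abc123\n"
)

records = load_records(io.StringIO(SAMPLE_CSV))
print(records[0]["TO"], records[0]["Acute_typhoid"])
```

Failing early on a missing column makes a mismatch between the description and the downloaded file visible before any analysis is run.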

Steps to reproduce

1. Access the Dataset: Download the raw or processed dataset containing 1,106 clinically validated patient records, including Widal and Weil-Felix test results, typhoid status, demographic data, and rickettsial indicators.
2. Understand the Features: Familiarize yourself with the serological markers (TO, TH, AH, BH, OXK, OX2, OX19), diagnostic flags (Typhoid, Acute_typhoid, Paratyphoid A/B, Rickettsia_Suspect), and demographic fields (Age, Gender).
3. Load and Prepare the Data: Use Python or R to load the processed dataset. The cleaned version contains no missing values and requires no further preprocessing.
4. Select an Application Use Case (choose your focus area):
   a. Disease Trend Analysis: Use time-based or demographic filters to observe infection trends.
   b. ML & AI Modeling: Train classification models (e.g., logistic regression, random forest, deep learning) for infection prediction.
   c. Statistical Analysis: Perform correlation or regression analysis to identify significant serological predictors.
   d. Diagnostic Evaluation: Assess the diagnostic performance of the Widal and Weil-Felix tests.
   e. Hospital Decision-Making: Simulate triage systems or decision-support tools using the labeled outcomes.
   f. Public Health Insights: Aggregate results to recommend WASH interventions in vulnerable areas.
5. Model or Analyze the Data: Apply your selected statistical or ML techniques, using tools such as pandas, scikit-learn, TensorFlow, or statsmodels.
6. Validate and Interpret Results: Validate predictions with standard metrics (accuracy, precision, AUC), and interpret the findings to support clinical or public health decision-making.
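As an illustration of the diagnostic evaluation and validation steps, the sketch below scores a single titer cutoff against a Yes/No label using standard confusion-matrix metrics. It is one possible approach under stated assumptions, not the authors' procedure. The 1:160 cutoff, the pairing of TO with Acute_typhoid, and the function names are hypothetical choices made for this example, and the six records are made up. They are not drawn from the dataset, so the numbers it prints say nothing about the real data.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for two equal-length lists of booleans."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, fp, tn, fn


def safe_ratio(numerator, denominator):
    """Divide, returning NaN instead of raising on an empty denominator."""
    return numerator / denominator if denominator else float("nan")


def evaluate_cutoff(records, titer_column, label_column, cutoff):
    """Score the rule 'titer >= cutoff' against a 'Yes'/'No' label column."""
    y_true = [row[label_column] == "Yes" for row in records]
    y_pred = [row[titer_column] >= cutoff for row in records]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_ratio(tp + tn, tp + fp + tn + fn),
        "precision": safe_ratio(tp, tp + fp),
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
    }


# Made-up records for illustration only; titers are dilution denominators.
toy_records = [
    {"TO": 160, "Acute_typhoid": "Yes"},
    {"TO": 320, "Acute_typhoid": "Yes"},
    {"TO": 80, "Acute_typhoid": "Yes"},
    {"TO": 40, "Acute_typhoid": "No"},
    {"TO": 160, "Acute_typhoid": "No"},
    {"TO": 20, "Acute_typhoid": "No"},
]

# Hypothetical cutoff of 1:160 on the TO titer.
metrics = evaluate_cutoff(toy_records, "TO", "Acute_typhoid", cutoff=160)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the description suggests that Acute_typhoid is likely based in part on high titers, scoring a titer rule against that label may be partly circular and could overstate performance. The same caution applies to classifiers trained on the titer columns. For larger analyses, sklearn.metrics in scikit-learn offers accuracy_score, precision_score, recall_score, and roc_auc_score.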

Institutions

  • Daffodil International University

Categories

Disease, Machine Learning, Fever, Asian Health, Typhoid Fever, Deep Learning, Clinical Health

Licence