EchoCardio-FMC-718: An Annotated Echocardiogram Report Dataset for Heart Disease Classification
Description
This dataset contains 718 structured Color Doppler Echocardiogram (Echo) reports collected from cardiac patients at the Diabetic Association Medical College, Faridpur, Bangladesh. The original Electronic Health Record (EHR) reports were generated as semi-structured Microsoft Word (.doc) documents by attending cardiologists. Each report has been parsed, structured into 55 clinically meaningful fields, validated through a reverse-engineering pipeline, and enriched with two layers of clinical labels.

Overview:
• 718 Color Doppler Echocardiogram reports from cardiac patients
• Source: Diabetic Association Medical College, Faridpur, Bangladesh
• Original format: semi-structured Microsoft Word (.doc) files by attending cardiologists
• Each report parsed into 55 clinically meaningful fields
• Two annotation layers: 11-class pathology label + 4-class severity label
• All reports manually de-identified; no PHI retained

Processing Pipeline:
• Step 1: MS Word COM automation extracts raw tables and narrative from each .doc file
• Step 2: Azure OpenAI (GPT-4o) populates the 55-field schema per report
• Step 3: Reverse-engineering validation checks field-by-field consistency
• Step 4: 11-class pathology annotation via priority keyword rules
• Step 5: 4-class severity mapping applied
• Step 6: Structured fields concatenated into NLP-ready free-text strings

Structured Fields (55 total):
• M-mode & 2-D measurements: EF, FS, IVST, LVIDd, LVIDs, LA, AO, RVGWT, MVA, and more
• Chamber & valve descriptions: LV, RV, MV, AV, TV, PV
• Structural findings: ASD, VSD, PDA, thrombus, vegetation, pericardium
• Color Flow Doppler observations across all four valves
• Free-text cardiologist Impression per report

Files Provided:
• Extracted Medical Reports.csv: 718 rows × 57 columns (55 clinical fields + Cardiac_Class + label)
• Extracted Data Free Form NLP Ready.csv: each report as a single patient_data free-text string + 4-class label; ready for BERT-style transformer input
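
To make Steps 4 and 6 of the pipeline more concrete, the following is a minimal sketch of how priority keyword rules and field concatenation might work. It is not the authors' implementation. The rule table (`EXAMPLE_RULES`), its class names and keywords, the function names, the `"name: value"` output format, and the sample record are all hypothetical placeholders; the dataset's actual 11 classes and rules are not listed in this description. Only the column names `Cardiac_Class` and `label` are taken from the file description above.

```python
import re
from typing import Mapping, Sequence, Tuple

# Hypothetical priority-ordered rules: (class name, keywords). Earlier entries win.
# These are placeholders, NOT the dataset's actual 11 pathology classes.
EXAMPLE_RULES: Sequence[Tuple[str, Sequence[str]]] = (
    ("Congenital", ("asd", "vsd", "pda")),
    ("Valvular", ("stenosis", "regurgitation")),
    ("Normal", ("normal study",)),
)


def classify_impression(impression: str, rules=EXAMPLE_RULES, default: str = "Other") -> str:
    """Return the first class (in priority order) with a keyword in the text."""
    text = impression.lower()
    for class_name, keywords in rules:
        for keyword in keywords:
            # Whole-word match so short abbreviations do not hit inside other words.
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                return class_name
    return default


def record_to_text(record: Mapping[str, object], skip=("Cardiac_Class", "label")) -> str:
    """Concatenate non-empty structured fields into one 'name: value' string."""
    parts = []
    for name, value in record.items():
        if name in skip or value is None:
            continue
        value_text = str(value).strip()
        if value_text:  # drop fields left blank in the structured report
            parts.append(f"{name}: {value_text}")
    return ". ".join(parts)


# Made-up record for illustration only; not taken from the dataset.
sample = {
    "EF": "62%",
    "LV": "Normal in size",
    "ASD": "",
    "Impression": "Small ASD with mild mitral regurgitation",
    "label": 2,
}
sample_text = record_to_text(sample)
sample_class = classify_impression(sample["Impression"])
```

Because the rules are checked in priority order, the sample impression, which mentions both an ASD and regurgitation, is assigned the first matching class in the placeholder table. Label columns are excluded from the concatenated string so the target does not leak into the model input.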
Institutions
- North South University, Dhaka Division, Dhaka