Dataset for Detection and Segmentation of the Radiographic Features of Pulmonary Edema
Description
Objectives: This dataset is intended for training and evaluating machine learning models that detect, segment, and analyze the radiological features associated with pulmonary edema in chest X-ray images.

Description: The dataset consists of chest X-rays extracted from the MIMIC database, collected at the Beth Israel Deaconess Medical Center. It comprises 1000 chest X-rays from 741 patients with features suggestive of edema, selected for manual annotation. The annotations cover radiological features commonly associated with pulmonary edema: cephalization, Kerley lines, peribronchial cuffing, pleural effusions, bat wings, and infiltrates. Each chest radiograph is also assigned one of four severity categories: "no edema", "vascular congestion", "interstitial edema", or "alveolar edema".

Annotation Method: The annotation was performed by a clinician with over 10 years of radiology experience, using both the frontal and lateral views of each chest X-ray study. Cephalization and Kerley lines were delineated using polylines, while the other features were delineated using binary masks. This approach was chosen to support accurate subsequent analyses and label assignments.

Features and Severity Labels: The dataset contains 4293 feature annotations in total:
1. Cephalization: 1656 annotations
2. Kerley lines: 609 annotations
3. Peribronchial cuffing: 30 annotations
4. Pleural effusions: 317 annotations
5. Bat wings: 1604 annotations
6. Infiltrates: 77 annotations

Severity labels are provided for 741 cases:
1. No edema: 21 cases
2. Vascular congestion: 74 cases
3. Interstitial edema: 51 cases
4. Alveolar edema: 595 cases

All features are represented as bounding boxes defined by their upper-left (x1; y1) and lower-right (x2; y2) corners. In addition, selected features are provided with masks encoded in Base64 format. A conversion script, "mask_converter.py", is provided to decode the encoded masks into NumPy arrays for analysis and deep learning applications.

Datasets:
1. Source dataset: Clinician-labeled chest X-rays, consolidated into a single spreadsheet, featuring only frontal-view images.
2. Processed dataset: Focuses on the lung area and excludes regions not used in clinician decision-making.
3. COCO dataset: Prepared in COCO format for training and testing, with subsets for each evaluated feature.
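
Example (mask decoding): The description says that masks are stored as Base64 strings and converted to NumPy arrays by "mask_converter.py". The sketch below only illustrates the general idea and is not the provided script. It assumes, purely for illustration, that the string encodes the raw row-major bytes of a uint8 mask whose height and width are known. The actual encoding used by the dataset may differ (for example, it may be compressed), so "mask_converter.py" remains the authoritative decoder. The function names decode_mask and encode_mask are hypothetical.

```python
import base64

import numpy as np


def decode_mask(encoded: str, height: int, width: int) -> np.ndarray:
    """Decode a Base64 string into a boolean mask of shape (height, width).

    Assumption (not confirmed by the dataset): the payload is the raw,
    row-major uint8 content of the mask, one byte per pixel.
    """
    raw = base64.b64decode(encoded)
    flat = np.frombuffer(raw, dtype=np.uint8)
    if flat.size != height * width:
        raise ValueError(
            f"expected {height * width} pixels, got {flat.size}"
        )
    return flat.reshape(height, width).astype(bool)


def encode_mask(mask: np.ndarray) -> str:
    """Inverse of decode_mask under the same assumption (used for the demo)."""
    as_bytes = np.ascontiguousarray(mask, dtype=np.uint8).tobytes()
    return base64.b64encode(as_bytes).decode("ascii")


# Round-trip a small synthetic mask to check the two helpers agree.
demo = np.zeros((4, 6), dtype=bool)
demo[1:3, 2:5] = True
roundtrip = decode_mask(encode_mask(demo), height=4, width=6)
```

Under these assumptions the decoded array can be passed directly to NumPy-based tooling, for example to compute the mask area with roundtrip.sum().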
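
Example (bounding boxes): Features are described by their upper-left (x1; y1) and lower-right (x2; y2) corners, whereas the COCO format stores a box as [x, y, width, height]. The sketch below shows one way to convert between the two when building custom annotation records. The category identifiers and the helper names are hypothetical and are not taken from the dataset files; the COCO dataset supplied here may use different identifiers and additional fields.

```python
from typing import Dict, List, Tuple

# Hypothetical category ids for the six annotated features.
CATEGORY_IDS: Dict[str, int] = {
    "cephalization": 1,
    "kerley_lines": 2,
    "peribronchial_cuffing": 3,
    "pleural_effusion": 4,
    "bat_wings": 5,
    "infiltrate": 6,
}


def corners_to_coco_bbox(x1: float, y1: float, x2: float, y2: float) -> List[float]:
    """Convert corner coordinates to a COCO-style [x, y, width, height] box."""
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    return [left, top, right - left, bottom - top]


def make_annotation(
    ann_id: int,
    image_id: int,
    feature: str,
    corners: Tuple[float, float, float, float],
) -> dict:
    """Build a minimal COCO-style annotation record for one feature."""
    bbox = corners_to_coco_bbox(*corners)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": CATEGORY_IDS[feature],
        "bbox": bbox,
        "area": bbox[2] * bbox[3],  # box area, not mask area
        "iscrowd": 0,
    }


annotation = make_annotation(1, 42, "bat_wings", (10, 20, 110, 70))
```

Sorting the corner coordinates makes the conversion tolerant of boxes whose corners were recorded in the opposite order.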