Mammogram Density Assessment Dataset

Published: 28 May 2024 | Version 5 | DOI: 10.17632/tdx3h2fn9v.5
Contributors:
Hamid Behravan

Description

GENERAL OVERVIEW

This dataset was compiled to address the limitations of current methods for assessing breast density in mammograms, in particular:
• Shortage of radiologists: there are not enough radiologists to efficiently analyze the large number of mammograms that screening requires.
• Subjectivity: radiologists' assessments of breast density vary, which leads to inconsistent results.
• Limitations of existing tools: current CAD tools for breast density estimation have limitations, such as working only with specific mammogram views and segmenting the breast inaccurately.

To address these challenges, the dataset extends the original mammogram images with:
• Binary masks of the breast area: expert-annotated masks that delineate the entire breast region in each mammogram, providing ground truth for segmentation methods.
• Binary masks of dense tissue: masks that identify the areas of dense tissue within each mammogram, for training and evaluating segmentation algorithms.

The dataset supports the development of deep-learning methods for automated breast density estimation. It can also serve as a benchmark for medical image segmentation methods that focus on breast tissue in mammograms.

DATA DESCRIPTION

This dataset provides segmentation masks of the dense tissue and of the breast area, together with area-based breast density percentages, for mammograms from the VinDr-Mammo public dataset [3]. All annotations were performed and validated by an expert radiologist.

Files: The data is provided in two compressed archives, ‘train.zip’ and ‘test.zip’.
• train.zip: This archive contains two sub-folders:
  - breast_masks: ground truth segmentation masks of the breast area, in JPG format.
  - dense_masks: ground truth segmentation masks of the dense tissue, in JPG format.
The segmentation masks measure 2800×3518 pixels.

File Lists: Two CSV files accompany the compressed archives:
• train.csv: information about the training set, in two columns:
  - Filename: the filenames of the training set images. The images themselves can be downloaded from the VinDr-Mammo dataset, https://physionet.org/content/vindr-mammo/1.0.0/.
  - Density: the ground truth continuous breast density value of each mammogram in the training set, intended for the breast density estimation task.
• test.csv: a single column, “Filename”, listing the filenames of the test set. No ground truth is provided for the test set: the ground truths are intentionally kept private for the Breast Density Kaggle Challenge (https://www.kaggle.com/competitions/breast-density-prediction) and will eventually be released publicly in the dataset repository.
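The description calls the density values "area-based" but does not give the formula. A minimal sketch, assuming the percentage is the dense-tissue area divided by the breast area, times 100, and that each JPG mask is an 8-bit grayscale image whose mask pixels are bright. JPEG compression blurs mask edges, so the sketch applies a threshold (127, an assumed value) instead of testing for exactly 255. The helpers binarize, density_percent and load_mask are hypothetical illustrations, not part of the dataset or its repository.

```python
import numpy as np


def binarize(mask: np.ndarray, threshold: int = 127) -> np.ndarray:
    """Turn a grayscale JPG mask into a boolean array.

    JPEG artifacts leave intermediate gray values along edges, so pixels
    above an assumed threshold (127) are treated as foreground.
    """
    return np.asarray(mask) > threshold


def density_percent(breast_mask: np.ndarray, dense_mask: np.ndarray,
                    threshold: int = 127) -> float:
    """Assumed area-based density: dense pixels / breast pixels * 100."""
    breast = binarize(breast_mask, threshold)
    # Count dense tissue only where it falls inside the breast region.
    dense = binarize(dense_mask, threshold) & breast
    breast_area = int(breast.sum())
    if breast_area == 0:
        raise ValueError("breast mask is empty")
    return 100.0 * int(dense.sum()) / breast_area


def load_mask(path: str) -> np.ndarray:
    """Read a mask JPG as an 8-bit grayscale array (requires Pillow)."""
    from PIL import Image  # local import: the functions above need only NumPy
    with Image.open(path) as img:
        return np.asarray(img.convert("L"))
```

For a matching pair of files from breast_masks and dense_masks, density_percent(load_mask(breast_path), load_mask(dense_path)) returns one percentage. Whether it reproduces the Density column exactly depends on how the published values were computed, so compare a few training rows before relying on it.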
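The two CSV files can be read with Python's standard csv module. In this sketch, parse_rows, read_split and mask_paths are hypothetical helpers. The column names Filename and Density come from the description. The assumption that each file in breast_masks and dense_masks shares its image's filename is unverified and should be checked against the extracted archive.

```python
import csv
from pathlib import Path
from typing import Iterable, List, Optional, Tuple


def parse_rows(lines: Iterable[str]) -> List[Tuple[str, Optional[float]]]:
    """Parse train.csv or test.csv content into (filename, density) pairs.

    test.csv has no Density column, so density is None for its rows.
    """
    rows = []
    for record in csv.DictReader(lines):
        raw = record.get("Density")
        density = float(raw) if raw not in (None, "") else None
        rows.append((record["Filename"], density))
    return rows


def read_split(csv_path: str) -> List[Tuple[str, Optional[float]]]:
    """Read one of the CSV files from disk."""
    with open(csv_path, newline="") as f:
        return parse_rows(f)


def mask_paths(root: str, filename: str) -> Tuple[Path, Path]:
    """Hypothetical layout: each mask shares its image's filename."""
    base = Path(root)
    return base / "breast_masks" / filename, base / "dense_masks" / filename
```

Keeping the parsing in parse_rows, separate from file access, lets the same code handle both splits and makes it easy to test on in-memory lines.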

Steps to reproduce

https://github.com/uefcancer/Mammography_dataset

Institutions

Itä-Suomen yliopisto (University of Eastern Finland), Kuopion yliopistollinen sairaala (Kuopio University Hospital)

Categories

Image Segmentation, Breast Imaging, Image Analysis (Medical Imaging), Breast Density, Deep Learning, Image Analysis

Funding

Finnish Innovation Fund - Sitra

29330000451