Mammogram Density Assessment Dataset
Description
GENERAL OVERVIEW
This dataset was compiled to address the limitations of current methods for breast density assessment in mammograms, in particular:
• Shortage of radiologists: There are not enough radiologists to efficiently analyze the large number of mammograms needed for screening.
• Subjectivity: Radiologist assessments of breast density can vary, leading to inconsistencies.
• Limitations of existing tools: Current CAD tools for breast density estimation are often restricted to specific mammogram views and have difficulty producing accurate segmentations.

The dataset addresses these challenges by supplementing the original mammogram images with:
• Binary masks of the breast area: These expert-annotated masks delineate the entire breast region in each mammogram, providing ground truth data for segmentation methods.
• Binary masks of dense tissue: These masks identify the areas of dense tissue within each mammogram, providing ground truth for training and evaluating segmentation algorithms.

The dataset facilitates the development of automated breast density estimation with deep learning. It also serves researchers developing and benchmarking medical image segmentation methods focused on breast tissue analysis in mammograms.

DATA DESCRIPTION
This dataset consists of segmentation masks for dense tissue and breast area, together with area-based breast density percentage values, for mammograms from the public VinDr-Mammo dataset, accessible from [3]. All annotations were performed and validated by an expert radiologist.

Files: The data is provided in two compressed archives, ‘train.zip’ and ‘test.zip’.
• train.zip: This archive contains two sub-folders:
- breast_masks: This sub-folder contains the ground truth segmentation masks for the breast area, in JPG format.
- dense_masks: This sub-folder contains the ground truth segmentation masks for the dense tissue, in JPG format.
The segmentation masks have dimensions of 2800×3518 pixels.

File Lists: Two CSV files are provided alongside the compressed archives:
• train.csv: This file contains information about the training set in two columns:
- Filename: This column contains the filenames of the training set images. The images themselves can be downloaded directly from the VinDr-Mammo dataset, https://physionet.org/content/vindr-mammo/1.0.0/.
- Density: This column provides the ground truth continuous breast density value for each mammogram in the training set, intended for the breast density estimation task.
• test.csv: This file contains a single column, “Filename”, listing the filenames of the test set. No ground truth information is provided for the test set.

The test set ground truths are intentionally kept private for the Breast Density Kaggle Challenge, https://www.kaggle.com/competitions/breast-density-prediction, but will eventually be made public in the dataset repository.
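The files described above could be used together roughly as follows. This is a minimal sketch, not the authors' procedure: it assumes that train.csv has a header row with the "Filename" and "Density" columns named above, and that an area-based density is the share of breast-mask pixels that are also marked in the dense-tissue mask. The function names, the binarization threshold of 127, and the toy values are hypothetical and chosen only for illustration. The sketch also makes no assumption about whether the Density column is stored as a percentage or a fraction.

```python
import csv
import io


def read_density_table(csv_file):
    """Map each filename to its density value from a train.csv-style file object."""
    reader = csv.DictReader(csv_file)
    return {row["Filename"]: float(row["Density"]) for row in reader}


def area_density_percent(breast_mask, dense_mask, threshold=127):
    """Percentage of breast-area pixels that are also marked as dense tissue.

    Masks are 2D sequences (rows of grayscale pixel values). The threshold is
    an assumption: JPG compression is lossy, so mask pixels may not be exactly
    0 or 255 and need to be binarized before counting.
    """
    breast_pixels = 0
    dense_pixels = 0
    for breast_row, dense_row in zip(breast_mask, dense_mask):
        for b, d in zip(breast_row, dense_row):
            if b > threshold:
                breast_pixels += 1
                # Count dense tissue only where it lies inside the breast area.
                if d > threshold:
                    dense_pixels += 1
    if breast_pixels == 0:
        raise ValueError("breast mask contains no foreground pixels")
    return 100.0 * dense_pixels / breast_pixels


# Toy stand-ins for train.csv and one mask pair (hypothetical names and values).
sample_csv = io.StringIO("Filename,Density\nexample_a.jpg,12.5\nexample_b.jpg,40\n")
densities = read_density_table(sample_csv)

breast = [[255, 255], [255, 0]]
dense = [[255, 0], [0, 255]]  # the dense pixel outside the breast area is ignored
print(densities["example_a.jpg"], area_density_percent(breast, dense))
```

With the real files, the table would be read with `open("train.csv", newline="")`, and each mask would first be decoded into a 2D grayscale array by an image library. For full-size 2800×3518 masks, a vectorized array implementation would be considerably faster than the pure-Python loops shown here.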
Steps to reproduce
https://github.com/uefcancer/Mammography_dataset
Funding
Finnish Innovation Fund - Sitra
29330000451