Datasets Comparison
Version 4
Mammogram Density Assessment Dataset
Description
This dataset consists of mammogram images together with corresponding segmentation masks for dense tissue and the breast area, annotated by an expert radiologist.
*Files*
train.zip: Comprises three sub-folders: 'images', 'breast_masks', and 'dense_masks'. The 'images' sub-folder holds the original images. The 'breast_masks' and 'dense_masks' sub-folders contain the ground truth segmentation masks for the breast area and the dense tissue, respectively. All images are in JPG format. Every mask has the same dimensions as its corresponding image.
test.zip: Contains the test set images in JPG format. No ground truth is provided for the test set.
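A minimal Python sketch of how the three sub-folders of train.zip might be paired up, using only the standard-library zipfile module. It assumes, since the description does not say so, that an image and its two masks share the same file stem and that entries may sit under a common top-level folder; the function names are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath

# Sub-folder names as listed for train.zip.
SUBFOLDERS = ("images", "breast_masks", "dense_masks")


def index_train_zip(zip_file):
    """Map each file stem to the archive paths of its image and masks.

    zip_file may be a path or a binary file object. Assumes (not stated in
    the dataset description) that an image and its masks share a file stem.
    """
    groups = {}
    with zipfile.ZipFile(zip_file) as zf:
        for name in zf.namelist():
            path = PurePosixPath(name)  # zip entries always use "/" separators
            if path.suffix.lower() not in (".jpg", ".jpeg"):
                continue  # skip directory entries and non-JPG files
            kind = path.parent.name  # e.g. "images" in "train/images/x.jpg"
            if kind in SUBFOLDERS:
                groups.setdefault(path.stem, {})[kind] = name
    return groups


def complete_triplets(groups):
    """Keep only stems that have an image, a breast mask and a dense mask."""
    return {
        stem: paths
        for stem, paths in groups.items()
        if all(kind in paths for kind in SUBFOLDERS)
    }
```

Any stem missing from the result of complete_triplets points to an image without both masks, or to a naming convention other than the one assumed here.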
*File lists*
train.csv: The training set file list has two columns. The first column is ‘Filename’, and the second column is ‘Density’, the ground truth for the breast density prediction task.
test.csv: The test set file list contains the filenames of the test set images.
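The two file lists can be read with Python's standard csv module. This sketch assumes that each CSV has a header row with the column names given above and that the Density values parse as numbers; the loader names are hypothetical.

```python
import csv


def load_train_csv(path):
    """Read train.csv into a {Filename: Density} dictionary."""
    # utf-8-sig also accepts files saved with a leading byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return {row["Filename"]: float(row["Density"]) for row in csv.DictReader(f)}


def load_test_csv(path):
    """Read the filenames listed in test.csv; it carries no ground truth."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return [row["Filename"] for row in csv.DictReader(f)]
```

Because the test set has no ground truth, load_test_csv returns only filenames.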
This dataset can be used for tasks such as segmentation and breast density estimation. The mammograms were sourced from the public VinDr-Mammo dataset, available at https://vindr.ai/datasets/mammo. We provide annotations, including both segmentation masks and density values, for this public dataset.
If you use this dataset in your research or for other purposes, please cite the following studies:
Gudhe, N.R., Behravan, H., Sudah, M. et al. Area-based breast percentage density estimation in mammograms using weight-adaptive multitask learning. Sci Rep 12, 12060 (2022). https://doi.org/10.1038/s41598-022-16141-2
Nguyen, H.T. et al. A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography (2022). https://doi.org/10.1101/2022.03.07.22272009
Institutions
Itä-Suomen yliopisto
Categories
Image Segmentation, Breast Imaging, Image Analysis (Medical Imaging), Breast Density, Deep Learning, Image Analysis
Funders
Finnish Innovation Fund - Sitra
29330000451
Licence
Creative Commons Attribution 4.0 International
Version 5
Mammogram Density Assessment Dataset
Description
GENERAL OVERVIEW
This dataset was compiled to address the limitations of current methods for breast density assessment in mammograms, especially the challenges of:
• Shortage of radiologists: There are not enough radiologists to efficiently analyze the large volume of mammograms generated by screening.
• Subjectivity: Radiologist assessments of breast density can vary, leading to inconsistencies.
• Limitations of existing tools: Current CAD tools for breast density estimation are often restricted to specific mammogram views and have difficulty producing accurate segmentations.
This dataset addresses these challenges by supplementing the original mammogram images with:
• Binary masks of the breast area: These expert-annotated masks precisely delineate the entire breast region in each mammogram, providing valuable ground truth data for segmentation methods.
• Binary masks of dense tissue: Similarly, these masks accurately identify areas of dense tissue within each mammogram, further enhancing the dataset's utility for training and evaluating segmentation algorithms.
The dataset facilitates the development of automated breast density estimation with deep learning. It also serves as a valuable tool for researchers developing and benchmarking medical image segmentation methods specifically focused on breast tissue analysis in mammograms.
DATA DESCRIPTION
This dataset consists of segmentation masks for dense tissue and the breast area, together with area-based breast density percentage values, for mammograms from the public VinDr-Mammo dataset [3]. All annotations were performed and validated by an expert radiologist.
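Area-based percentage density is conventionally the dense-tissue area divided by the breast area, times 100. The sketch below computes it from a pair of masks given as rows of 8-bit grayscale values. It is not confirmed that the Density column was derived exactly this way; the binarization threshold of 128 (needed because JPG compression blurs mask edges) and the restriction of dense pixels to the breast region are assumptions, and the function names are hypothetical.

```python
def binarize(mask, threshold=128):
    """Turn a grayscale mask (rows of 0-255 values) into rows of booleans.

    JPG compression leaves intermediate values near mask edges, so pixels are
    thresholded instead of compared to 255 exactly (threshold is an assumption).
    """
    return [[value >= threshold for value in row] for row in mask]


def percent_density(breast_mask, dense_mask, threshold=128):
    """Area-based percentage density: 100 * dense area / breast area."""
    breast = binarize(breast_mask, threshold)
    dense = binarize(dense_mask, threshold)
    if len(breast) != len(dense) or any(len(a) != len(b) for a, b in zip(breast, dense)):
        raise ValueError("masks must have the same dimensions")
    breast_area = sum(sum(row) for row in breast)
    # Count dense pixels only where they fall inside the breast region.
    dense_area = sum(
        b and d for brow, drow in zip(breast, dense) for b, d in zip(brow, drow)
    )
    if breast_area == 0:
        raise ValueError("breast mask is empty")
    return 100.0 * dense_area / breast_area
```

For full-size masks, NumPy arrays would be far faster than nested lists; the arithmetic is unchanged.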
Files:
The data is provided in two compressed archives, ‘train.zip’ and ‘test.zip’.
• train.zip: This archive contains two sub-folders:
- breast_masks: This sub-folder contains the ground truth segmentation masks for the breast area, in JPG format.
- dense_masks: This sub-folder contains the ground truth segmentation masks for the dense tissue, in JPG format.
The segmentation masks measure 2800×3518 pixels.
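One way to confirm that a downloaded mask has the stated size without a third-party imaging library is to read the frame header (SOFn segment) of the JPEG file. This is a minimal standard-library sketch; it assumes that 2800×3518 means width×height, which the description does not state, and the helper names are hypothetical.

```python
# SOFn markers carry the frame size; 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC) do not.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) from the SOFn segment of a JPEG byte string."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    i = 2
    while i < len(data) - 1:
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        # Skip the 0xFF byte(s) in front of the marker code.
        while i < len(data) and data[i] == 0xFF:
            i += 1
        if i >= len(data):
            break
        marker = data[i]
        i += 1
        if marker == 0x01 or 0xD0 <= marker <= 0xD8:
            continue  # standalone markers (TEM, RSTn, SOI) have no length field
        if marker in (0xD9, 0xDA):
            break  # end of image or start of scan: no frame header found
        if i + 2 > len(data):
            break
        length = int.from_bytes(data[i:i + 2], "big")  # includes these 2 bytes
        if marker in SOF_MARKERS:
            if i + 7 > len(data):
                break
            # Segment layout: length(2), precision(1), height(2), width(2), ...
            height = int.from_bytes(data[i + 3:i + 5], "big")
            width = int.from_bytes(data[i + 5:i + 7], "big")
            return width, height
        i += length
    raise ValueError("no SOF segment found")


def has_expected_size(path, expected=(2800, 3518)):
    """Check a mask file against the stated size, read as (width, height)."""
    with open(path, "rb") as f:
        return jpeg_size(f.read()) == expected
```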
File Lists:
Two CSV files are provided alongside the compressed archives:
• train.csv: This file contains information about the training set with two columns:
- Filename: This column contains the filenames of the training set images. These images can be directly downloaded from the VinDr-Mammo dataset, https://physionet.org/content/vindr-mammo/1.0.0/.
- Density: This column provides the ground truth continuous breast density value for each mammogram in the training set, intended for the breast density estimation task.
• test.csv: This file contains a single column, “Filename”, listing the filenames of the test set. No ground truth information is provided for the test set: the ground truths are intentionally kept private for the Breast Density Kaggle Challenge (https://www.kaggle.com/competitions/breast-density-prediction) but will eventually be made public in the dataset repository.
Steps to reproduce
https://github.com/uefcancer/Mammography_dataset
Institutions
Itä-Suomen yliopisto
Kuopion yliopistollinen sairaala
Categories
Image Segmentation, Breast Imaging, Image Analysis (Medical Imaging), Breast Density, Deep Learning, Image Analysis
Funders
Finnish Innovation Fund - Sitra
29330000451
Licence
Creative Commons Attribution 4.0 International