Histopathological image patches from colorectal cancer with three classes: tumor, stroma and other

Published: 3 April 2023 | Version 1 | DOI: 10.17632/37t2d6xmy2.1
Liisa Petäinen


This dataset includes 2770 image patches (224 x 224 px, 0.5 µm per pixel) from hematoxylin and eosin-stained tissue samples of colorectal cancer from 17 patients from the Central Finland Healthcare District. The image patches can be used, for example, as an external test set for a deep learning model. The dataset has three classes: tumor, stroma and other. Patches were tiled from annotated areas; the annotations were made by pathologist Juha P. Väyrynen. The class other includes debris, lymphocytes, mucus, normal epithelium and smooth muscle. Tiling was performed with a sliding-window procedure with an overlap of 64 pixels. Patches in which less than 75% of the area consisted of annotated pixels were discarded. All patches were color normalized with Macenko's method (see "A method for normalizing histology slides for quantitative analysis" by Macenko et al., 2009).
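The tiling rule described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes that "64 pixels overlapping" means neighbouring patches share 64 pixels (a step of 224 - 64 = 160 pixels), that the annotation is available as a boolean mask at the same resolution as the image, and that patches extending past the mask border are skipped. The function name `tile_coordinates` and its parameters are hypothetical.

```python
import numpy as np

PATCH = 224               # patch side in pixels (from the description)
OVERLAP = 64              # overlap between neighbouring patches (from the description)
STRIDE = PATCH - OVERLAP  # assumption: a 160-pixel step between patch origins
MIN_COVERAGE = 0.75       # patches below this annotated fraction are discarded


def tile_coordinates(mask, patch=PATCH, stride=STRIDE, min_coverage=MIN_COVERAGE):
    """Return (row, col) top-left corners of patches with enough annotated pixels.

    `mask` is a 2D boolean array in which True marks an annotated pixel.
    """
    mask = np.asarray(mask, dtype=bool)
    height, width = mask.shape
    kept = []
    for row in range(0, height - patch + 1, stride):
        for col in range(0, width - patch + 1, stride):
            # Fraction of pixels in this window that fall inside the annotation.
            coverage = mask[row:row + patch, col:col + patch].mean()
            if coverage >= min_coverage:
                kept.append((row, col))
    return kept


# Toy example: a 544 x 544 mask whose left 300 columns are annotated.
demo_mask = np.zeros((544, 544), dtype=bool)
demo_mask[:, :300] = True
print(tile_coordinates(demo_mask))
```

In the toy example only the windows starting at column 0 are kept. The windows starting at column 160 cover 140 annotated columns out of 224 (62.5%), which is below the 75% threshold.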


Steps to reproduce

This test set was produced from 17 whole slide images (WSIs) of primary colorectal cancers (stages I-IV). The cohort is described in detail in "Prognostic significance of spatial and density analysis of T lymphocytes in colorectal cancer" by Elomaa et al., 2022. The WSIs were scanned with a Hamamatsu NanoZoomer-XR scanner (Hamamatsu Photonics, Hamamatsu City, Japan).


Jyvaskylan Yliopisto, Keski Suomen Sairaanhoitopiiri


Pathology, Colon Cancer, Colorectal Cancer, Image Analysis (Medical Imaging), Digital Pathology, Deep Learning