Multi-Laboratory Hematoxylin and Eosin Staining Variance Unsupervised Machine Learning Dataset

Published: 12 September 2022 | Version 1 | DOI: 10.17632/b9dxsybhm9.1
Fabi Prezja, Ilkka Pölönen


We provide the generated dataset used for unsupervised machine learning in [1]. The data are in CSV format and contain all principal components and ground truth labels for each tissue type. The tissue type codes are C1 for kidney, C2 for skin, and C3 for colon; 'PC' denotes a principal component. Features were extracted independently for each tissue type; see the original design in [1] for the feature extraction specifications.

Reference: [1] Prezja, F.; Pölönen, I.; Äyrämö, S.; Ruusuvuori, P.; Kuopio, T. H&E Multi-Laboratory Staining Variance Exploration with Machine Learning. Appl. Sci. 2022, 12, 7511.
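
As a starting point for working with the CSV layout described above, the following minimal Python sketch separates principal-component columns from the ground truth label column using only the standard library. The exact column names in the published files are not stated here, so everything about the header is an assumption: the sketch treats any column whose name contains 'PC' as a principal component and expects a label column called "label" (a hypothetical name). The sample rows are placeholders for illustration, not values from the dataset.

```python
import csv
import io

# Tissue codes as given in the dataset description.
TISSUE_CODES = {"C1": "kidney", "C2": "skin", "C3": "colon"}


def split_columns(header, label_column="label"):
    """Return (principal-component column names, label column name).

    Assumptions (not confirmed by the dataset description):
    - principal-component columns contain the substring 'PC';
    - the ground truth column is named `label_column` (hypothetical).
    """
    if label_column not in header:
        raise KeyError(f"label column {label_column!r} not found in {header}")
    pc_cols = [name for name in header if "PC" in name]
    return pc_cols, label_column


def load_components(handle, label_column="label"):
    """Read a CSV file object into feature rows and a list of labels."""
    reader = csv.DictReader(handle)
    pc_cols, label_col = split_columns(reader.fieldnames, label_column)
    features, labels = [], []
    for row in reader:
        features.append([float(row[name]) for name in pc_cols])
        labels.append(row[label_col])
    return pc_cols, features, labels


# Placeholder content standing in for one tissue-type file.
# Column names and values are illustrative only, NOT taken from the dataset.
SAMPLE = "C1_PC1,C1_PC2,label\n0.5,-1.25,lab_a\n1.5,0.75,lab_b\n"

pc_cols, features, labels = load_components(io.StringIO(SAMPLE))
print(pc_cols, features, labels)
```

To use this on a downloaded file, pass an open file object instead of the in-memory sample and adjust `label_column` to match the actual header.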


Steps to reproduce

The exact specifications can be found in [1].


Jyvaskylan Yliopisto


Artificial Intelligence, Machine Learning, Unsupervised Learning, Histopathology