Diagnostic performance of artificial intelligence for histologic melanoma recognition compared to 18 international expert pathologists: Supplementary Material

Published: 08-02-2021 | Version 1 | DOI: 10.17632/j87c9jshxy.1
Eva Krieghoff-Henning,
Titus Brinker,
Max Schmitt,
Raymond Barnhill,
Helmut Beltraminelli,
Stephan Braun,
Richard Carr,
Maria-Teresa Fernandez-Figueras,
Gerardo Ferrara,
Sylvie Fraitag,
Raffaele Giannotti,
Mar Llamas Velasco,
Cornelia Müller,
Antonio Perasole,
Luis Requena,
Omar Sangueza,
Carlos Santonja,
Hans Starz,
Esmeralda Vale,
Wolfgang Weyers,
Achim Hekler,
Jakob Kather,
Stefan Fröhling,
Dieter Krahl,
Tim Holland-Letz,
Jochen Utikal,
Andrea Saggini,
Heinz Kutzner


The study was designed to compare the performance of classifiers based on image analysis by convolutional neural networks (CNNs) with that of 18 expert dermatopathologists in a binary classification task (melanoma versus nevus) for pigmented skin lesions.

Mendeley Supplementary Figure 1 shows a schematic representation of the testing approach. The CNNs were trained and evaluated by cross-testing. Each iteration consists of five folds; orange rectangles represent the folds used for testing and blue rectangles represent the folds used for training. In each iteration, a CNN is trained and then tested on the respective folds, and its performance is determined on the test fold. To calculate the overall performance, the sum of all five performances is taken. This procedure was repeated three times, and the results were combined to generate an ensemble.

Mendeley Supplementary Figure 2 depicts the whole slide image (WSI) analysis. Tissue sections on whole slide images (top) were divided into tiles (middle). The CNN (a pre-trained ResNeXt50) assigned a malignancy score to every individual tile (bottom). Red tiles were classified as melanoma, blue tiles as nevus. The scores of all tiles on one image were averaged to give a final malignancy score for the complete slide.

Supplementary Table 1 shows the characteristics of the pigmented skin lesions included in the test set. Supplementary Table 2 provides an analysis of the statistical differences between the performance of the pathologists and that of the CNN classifiers.
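The fold layout and the tile-to-slide averaging described above can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the random shuffling with a per-repeat seed, and the 0.5 decision threshold are all assumptions made for illustration, and the CNN training itself is left out entirely.

```python
import random
from statistics import mean


def make_folds(slide_ids, n_folds=5, seed=0):
    """Shuffle slide ids and deal them into n_folds disjoint folds.

    Assumption: folds are formed by simple random shuffling; the
    description does not say how slides were assigned to folds.
    """
    ids = list(slide_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::n_folds] for i in range(n_folds)]


def cross_test_splits(slide_ids, n_folds=5, n_repeats=3):
    """Yield (repeat, fold, train_ids, test_ids) for repeated cross-testing.

    In each repeat, every fold serves once as the test fold while the
    remaining folds are used for training.
    """
    for repeat in range(n_repeats):
        folds = make_folds(slide_ids, n_folds, seed=repeat)
        for k, test_ids in enumerate(folds):
            train_ids = [s for j, fold in enumerate(folds) if j != k for s in fold]
            yield repeat, k, train_ids, test_ids


def slide_score(tile_scores):
    """Average the per-tile malignancy scores into one slide-level score."""
    if not tile_scores:
        raise ValueError("slide has no tiles")
    return mean(tile_scores)


def classify_slide(tile_scores, threshold=0.5):
    """Label a slide from its averaged score.

    Assumption: scores lie in [0, 1] and 0.5 is the cut-off; the
    description does not state the threshold that was used.
    """
    return "melanoma" if slide_score(tile_scores) >= threshold else "nevus"
```

With five folds and three repeats, `cross_test_splits` yields 15 train/test splits, one per trained CNN. How the three repeats were combined into the ensemble is not specified in the description, so that step is deliberately omitted here.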