Image-based quantification of IHC slides in tissue microarray format
Description
This repository contains all the code required to reproduce the image-based quantification workflow used in this study. The workflow quantifies staining intensity in tissue microarray (TMA) cores by first isolating stained pixels with a calibrated color threshold and then measuring their grayscale (intensity) values. The pipeline is organized into four sequential stages:

1. Slide acquisition: Stained TMA slides are digitized at high resolution and stored as pyramidal whole-slide image files to support multiscale processing and visualization.
2. Core localization and extraction: TMA core positions are automatically identified and, where necessary, corrected. Using these coordinates, individual cores are extracted from the whole-slide image and saved as high-resolution cropped images for downstream analysis.
3. Quality control: Extracted cores are manually reviewed to identify and exclude samples with visual artifacts (e.g., tissue loss, folding, staining artifacts, or scanning defects) that could bias quantitative measurements. Only cores passing this quality control step are retained.
4. Threshold calibration and quantification: A color threshold is calibrated using a grid-search approach to optimally distinguish stained pixels, informed by comparisons between tumor and normal tissue samples. The calibrated threshold is then applied uniformly to all retained cores. For each core, grayscale intensity values are quantified over the threshold-selected pixel population.

All intermediate and final outputs (core position files, cropped core images, threshold parameters, and per-core intensity measurements) are saved to disk to enable inspection, reuse, and full reproducibility of the analysis.
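The measurement idea in stage 4 (select pixels with a color threshold, then summarize their grayscale values) can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the function names, the hue and saturation scaling to [0, 1], the BT.601 grayscale weights, and the use of the mean as the per-core summary are all assumptions made for illustration.

```python
import numpy as np

def rgb_to_hsv(rgb):
    """Convert an (H, W, 3) uint8 RGB image to float HSV, all channels in [0, 1]."""
    rgb = rgb.astype(np.float64) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    maxc = rgb.max(axis=-1)
    delta = maxc - rgb.min(axis=-1)
    # Guard against division by zero for gray and black pixels.
    safe_delta = np.where(delta > 0, delta, 1.0)
    safe_max = np.where(maxc > 0, maxc, 1.0)
    sat = np.where(maxc > 0, delta / safe_max, 0.0)
    # The hue formula depends on which channel holds the maximum.
    hue = np.select(
        [maxc == r, maxc == g],
        [((g - b) / safe_delta) % 6.0, (b - r) / safe_delta + 2.0],
        default=(r - g) / safe_delta + 4.0,
    )
    hue = np.where(delta > 0, hue / 6.0, 0.0)
    return np.stack([hue, sat, maxc], axis=-1)

def mean_stain_intensity(rgb, hue_range, sat_range):
    """Mean grayscale value over pixels inside the hue/saturation window.

    hue_range and sat_range are hypothetical (low, high) pairs in [0, 1].
    Returns NaN when the window selects no pixels.
    """
    hsv = rgb_to_hsv(rgb)
    mask = (
        (hsv[..., 0] >= hue_range[0]) & (hsv[..., 0] <= hue_range[1])
        & (hsv[..., 1] >= sat_range[0]) & (hsv[..., 1] <= sat_range[1])
    )
    if not mask.any():
        return float("nan")
    # Grayscale on the 0-255 scale using BT.601 luma weights (an assumption).
    gray = rgb.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    return float(gray[mask].mean())
```

With this convention a lower value means darker selected pixels. Whether the actual scripts report the mean, the median, or another statistic is not stated here, so the summary function should be checked against calibrate_and_measure.py.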
Steps to reproduce
image_extraction.py implements core position extraction and high-resolution core cropping from a whole-slide image. It detects candidate cores from a low-resolution slide view using a Hough Circle Transform (OpenCV), removes likely false positives using an empirical neighbor-distance rule, and then supports manual correction of remaining mistakes before saving numbered core crops.

grid_estimator.py contains helper logic for neighborhood-based filtering and for assigning row/column grid coordinates to detected cores.

labelme_interface.py supports the manual "adjust/remove markers" step by exporting detections to a LabelMe JSON file and importing edited markers back in.

calibrate_and_measure.py implements calibration and measurement. It samples a small set of core images from each class (tumor vs. normal), performs a grid search over hue/saturation bounds to maximize the mean difference between classes, then applies the selected threshold to the remaining samples and saves the resulting intensities. Calibration samples are excluded from downstream plots to avoid bias.

The notebooks (generate_splits.ipynb, plot_split.ipynb) are used to create analysis splits and to generate plots from saved intensities.
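The grid search described for calibrate_and_measure.py can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: each core is represented as a dict of precomputed per-pixel arrays (the keys "hue", "sat", and "gray" are hypothetical), the objective is taken to be the absolute difference between the class means of per-core mean intensity, and the candidate bounds come from user-supplied sorted grids.

```python
import itertools
import numpy as np

def core_intensity(core, h_lo, h_hi, s_lo, s_hi):
    """Mean grayscale value of the pixels inside the hue/saturation window."""
    mask = (
        (core["hue"] >= h_lo) & (core["hue"] <= h_hi)
        & (core["sat"] >= s_lo) & (core["sat"] <= s_hi)
    )
    return core["gray"][mask].mean() if mask.any() else np.nan

def grid_search(tumor, normal, hue_grid, sat_grid):
    """Try every (low, high) pair from the sorted grids and keep the window
    that maximizes |mean(tumor) - mean(normal)| (assumed objective)."""
    best_params, best_score = None, -np.inf
    for h_lo, h_hi in itertools.combinations(hue_grid, 2):
        for s_lo, s_hi in itertools.combinations(sat_grid, 2):
            t = [core_intensity(c, h_lo, h_hi, s_lo, s_hi) for c in tumor]
            n = [core_intensity(c, h_lo, h_hi, s_lo, s_hi) for c in normal]
            # Skip windows that select no pixels in at least one core.
            if np.isnan(t).any() or np.isnan(n).any():
                continue
            score = abs(np.mean(t) - np.mean(n))
            if score > best_score:
                best_params, best_score = (h_lo, h_hi, s_lo, s_hi), score
    return best_params, best_score
```

Only the calibration subset would be passed as `tumor` and `normal`. The selected window is then applied to the remaining cores, which keeps the calibration samples out of the downstream comparison, as described above.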