Quantitative Comparison and Reproducibility of Pathologist Scoring and Digital Image Analysis of Estrogen Receptor β2 Immunohistochemistry in Prostate Cancer
Digital imaging technology has advanced to the point where immunohistochemistry (IHC) assays can be scored automatically, often better than a pathologist could score them visually. This is significant because visual scoring has so far been the method of choice for quantifying IHC staining.
Major problems with visual scoring include a narrower range of data, human error, significant inter-observer variability, and output that is ordinal or quasi-continuous rather than the truly continuous variable data one would expect.
Digital image analysis offers a way to obtain better data. Algorithm parameters can be locked so that the data remain reproducible despite weak staining, which is related in a linear fashion to antigen concentration, and the output is a continuous variable.
Prior studies show that continuous data from digital imaging revealed IHC cut-offs for biomarkers that were of prognostic significance but were either missed or judged to be only weakly associated when visual scoring was used. Digital image analysis also allows experiments to be scaled up to high throughput, for example when tissue microarrays are used, which would be cumbersome and slow with visual scoring.
A further benefit is the observed high correlation between these two methods of analysis. Most of this type of research has been carried out on breast cancer tissue using human epidermal growth factor receptor, estrogen receptor, and progesterone receptor, but esophageal, colorectal, ovarian, and prostate cancer (PCa) tissue has been studied with the same results.
Pathologists' visual scoring typically uses a simple ordinal scale, such as negative “0”, weak “1+”, medium “2+”, and strong “3+” positive staining. More complex systems have also come into use; these yield quasi-continuous variable data by, for instance, multiplying the estimated tissue area at a given ordinal intensity by the ordinal value itself.
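As a rough illustration of how an area-weighted ordinal scale becomes quasi-continuous, the sketch below multiplies the percentage of tissue at each intensity by that intensity and sums the products. This is an H-score-style calculation chosen for illustration; the summary does not say which scoring system the study used, and the function name and example percentages are hypothetical.

```python
def area_weighted_score(percent_area_by_intensity):
    """Sum of (ordinal intensity x percent of tissue area at that intensity).

    Assumes intensities are the ordinal values 0-3 and that the areas are
    percentages adding up to at most 100. The result ranges from 0 to 300.
    """
    total_area = 0.0
    score = 0.0
    for intensity, percent in percent_area_by_intensity.items():
        if intensity not in (0, 1, 2, 3):
            raise ValueError(f"unexpected intensity: {intensity}")
        if percent < 0:
            raise ValueError("percent area cannot be negative")
        total_area += percent
        score += intensity * percent
    if total_area > 100.0 + 1e-9:
        raise ValueError("percent areas add up to more than 100")
    return score


# Hypothetical TMA spot: 40% negative, 30% weak, 20% medium, 10% strong.
spot = {0: 40.0, 1: 30.0, 2: 20.0, 3: 10.0}
print(area_weighted_score(spot))
```

Because the area estimates are themselves made by eye, usually in coarse steps, the result takes many possible values but is still not a true continuous measurement, which is why such scores are described as quasi-continuous.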
In breast cancer research, the two systems have been compared quite extensively with respect to their correlation and reproducibility. The present study carries out a similar comparison in PCa tissue. Since few biomarkers with prognostic value are in routine clinical use in this area, digital methods for assessing large numbers of IHC assays could be of great value in ranking the utility of various protein biomarkers for evaluating the aggressiveness of a tumor.
One such assay is for the estrogen receptor β2 (ERβ2), which is thought to promote metastasis in PCa and to have prognostic value for tumor progression.
The study set out to measure how well digital image analysis correlates with a pathologist's semi-quantitative visual scoring, how reproducible each method is, and how the digital scores relate to disease-specific survival statistics. The method used large PCa tissue microarray (TMA) slides stained for ERβ2.
Prostate cancer TMAs were subjected to digital imaging and then the images were scored either visually or by digital analysis for ERβ2 staining in the tumor epithelium. The images for visual scoring were created by scanning the stained slides using an automated TissueFAXS microscope (TissueGnostics GmbH) and then were reviewed with the help of an online web gallery.
The ERβ2 staining was scored for each individual TMA spot by a pathologist who had been blinded to the clinical features. The images were analyzed twice independently to assess the reproducibility of visual scoring. The image analysis data were then examined for any significant association with recurrence-free survival and with disease-specific survival after radical prostatectomy.
The current experiment demonstrated a weak to moderate Spearman correlation between the two systems of scoring of tumor cell nuclei in two independent runs, namely 0.42 on Analysis Run A and 0.41 on Analysis Run B. However, there was a moderate to strong correlation between them when it came to tumor cytoplasm at 0.70 and 0.69 for Analysis Run A and B respectively.
Reproducibility of visual scoring for individual TMA spots was high across the two runs, with Spearman correlations of 0.84 and 0.83 for nuclei and cytoplasm respectively. For digital image analysis it was higher still, at 0.99 for both nuclei and cytoplasm.
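For readers unfamiliar with the statistic, the sketch below shows one way a Spearman correlation between two sets of per-spot scores can be computed: rank each set (averaging the ranks of tied values, which matters for ordinal visual scores) and then take the Pearson correlation of the ranks. It is a minimal illustration, not the study's analysis code, and the example scores are invented.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented example: ordinal visual scores vs. continuous digital scores
# for the same six TMA spots.
visual = [0, 1, 1, 2, 3, 3]
digital = [0.05, 0.20, 0.15, 0.40, 0.90, 0.70]
print(round(spearman(visual, digital), 3))
```

The same function applies to both comparisons reported here: visual against digital scores within a run, and run A against run B for a single method.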
ERβ2 staining was significantly associated with the risk of prostate cancer-specific mortality (PCSM) when quantified by cytoplasmic digital image analysis (HR 2.16, 95 % CI 1.02–4.57, p = 0.045). For nuclear image analysis, quantitative ERβ2 staining had an HR of 2.67 (95 % CI 1.20–5.96, p = 0.016), while for total malignant epithelial area analysis the HR was 5.10 (95 % CI 1.70–15.34, p = 0.004).
Once the clinical and pathological factors were adjusted for, the only significant association was for total malignant epithelial area ERβ2 staining (HR 4.08, 95 % CI 1.37–12.15, p = 0.012).
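The reported hazard ratios, confidence intervals, and p-values can be checked against one another with a little arithmetic. The sketch below assumes the intervals are Wald-type, that is, symmetric around ln(HR) on the log scale, which is common for Cox models but is not stated in this summary. Under that assumption the standard error is recovered from the width of the interval and a two-sided p-value follows from the normal distribution. The function name is hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def wald_p_from_hr_ci(hr, ci_low, ci_high):
    """Return (z, two-sided p) implied by an HR and its 95% CI.

    Assumes the CI is exp(ln(HR) +/- Z_95 * SE), i.e. a Wald interval.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p


# Values as reported in the text above.
reported = {
    "cytoplasm": (2.16, 1.02, 4.57, 0.045),
    "nuclei": (2.67, 1.20, 5.96, 0.016),
    "total epithelial area": (5.10, 1.70, 15.34, 0.004),
    "total area, adjusted": (4.08, 1.37, 12.15, 0.012),
}
for label, (hr, lo, hi, p_reported) in reported.items():
    z, p = wald_p_from_hr_ci(hr, lo, hi)
    print(f"{label}: z = {z:.2f}, recomputed p = {p:.3f}, reported p = {p_reported}")
```

Under the Wald assumption, the recomputed p-values come out close to the reported ones, with small differences attributable to rounding of the published HRs and interval limits.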
The present experiment shows that digital image analysis for immunohistochemical quantification is more reproducible than a pathologist’s visual scoring of prostate cancer tissue, which may indicate that it should be preferred, particularly in studies with large numbers of samples.
Rizzardi AE, Zhang X, Vogel RI, et al. Quantitative comparison and reproducibility of pathologist scoring and digital image analysis of estrogen receptor β2 immunohistochemistry in prostate cancer. Diagn Pathol. 2016;11(1):63. Published 2016 Jul 11. doi:10.1186/s13000-016-0511-5