Validation of grading of non-invasive urothelial carcinoma by digital pathology for routine diagnosis

Colling, Richard; Colling, Hayleigh; Browning, Lisa; Verrill, Clare

doi:10.1186/s12885-021-08698-4

Research article
Open access
Published: 06 September 2021

Richard Colling (ORCID: orcid.org/0000-0001-6344-9081)^1,2, Hayleigh Colling^1, Lisa Browning^2,3 & Clare Verrill^1,2,3

BMC Cancer volume 21, Article number: 995 (2021)

Abstract

Background

Pathological grading of non-invasive urothelial carcinoma has a direct impact upon management. This study evaluates the reproducibility of grading these tumours on glass slides and digital pathology.

Methods

Forty-eight non-invasive urothelial bladder carcinomas were graded by three uropathologists on glass slides and on a digital platform using the 1973 WHO and 2004 ISUP/WHO systems.

Results

Consensus grades for glass and digital grading gave Cohen’s kappa scores of 0.78 (2004) and 0.82 (1973). Of 142 decisions made on the key therapeutic borderline of low grade versus high grade urothelial carcinoma (2004) by the three pathologists, 85% were in agreement. For the 1973 grading system, agreement overall was 90%.

Conclusions

Agreement on grading is similar for glass slide and digital screen assessment, or in some cases improved on digital, suggesting at least non-inferiority of digital pathology (DP) for grading of non-invasive urothelial carcinoma.

Background

Most bladder tumours are urothelial carcinomas and around 70–80% of these are either non-invasive or early-invasive (superficial) [1]. Risk stratification based on morphological grading by pathologists is clinically useful for determining prognosis and follow-up management, and histopathological grading therefore has a significant clinical impact for patients. Despite this, few data exist on the reproducibility of these grading systems, especially given the increasing popularity of, and transition to, digital pathology (DP) assessment of tumours [2, 3]. This is particularly important given that a number of potential pitfalls are already known in some areas of DP, where digital screen appearances can be challenging to identify or interpret. Most DP validation studies focus on overall diagnostic concordance rather than tumour grade specifically; however, grading of dysplasia and tumours is often identified as a source of discordance [4]. Recent reviews and guidelines highlight potential pitfalls of digitally grading atypia, including in urothelial cells [5,6,7]. This view has been supported by a number of validation studies that have identified grade discrepancies in the small number of urothelial carcinomas included [8,9,10,11]. The true extent of this problem in urological cancers, and how this relates to background intra- and inter-observer variation, is not known. The aim of this study is to evaluate the intra-observer and inter-observer variation in grading of non-invasive urothelial bladder carcinomas, comparing glass and digital reporting/assessment methodologies.

Methods

Fifty consecutive bladder cases, including transurethral resections and biopsies, of non-invasive papillary urothelial carcinomas were selected from the 2019 digital archive for a departmental audit. A formal sample size calculation was not performed; a small set of representative cases was selected, in line with routine laboratory validation studies. Cases were graded by three specialist uropathologists. All pathologists had at least 12 months' experience with DP, and the laboratory scans all routine paraffin-embedded histology slides. All cases were graded both on a digital screen and on traditional glass slides. Cases were graded twice on glass and once via DP, with a washout period of at least 2 weeks between sessions. Glass slides were missing for two cases, which were therefore excluded, leaving 48 cases. Slides were scanned with a ×40 objective using a Philips Ultra Fast Scanner and displayed using the Philips IMS on Google Chrome on a high-resolution digital screen (either an Eizo MX242W or a Dell U2715H) calibrated to a brightness of at least 270 cd/m², a gamma of 2.2, and a white point of 7500 K. Both ISUP/WHO 2004 and WHO 1973 systems were used for grading. Agreement was compared using linear weighted Cohen’s kappa and Fleiss’ kappa. A group consensus grade (by best of three votes) was also used for comparisons. All three pathologists were blinded to the original reports, each other’s grading, and the grades from their own previous assessment sessions (although access to these was potentially available). Cases that were given a diagnosis of papillary urothelial neoplasm of low malignant potential (PUNLMP) during the study were excluded from the statistical analysis.
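The linear weighted Cohen's kappa named above can be illustrated with a short Python sketch. It is not the authors' analysis code, which the article does not describe; the function name `linear_weighted_kappa` and the six example grades are hypothetical. Disagreement weights of |i - j| / (k - 1) are assumed, which is the usual definition of linear weighting, so on a three-grade scale a one-step difference counts half as much as a two-step difference.

```python
from collections import Counter


def linear_weighted_kappa(rater_a, rater_b, categories):
    """Linear weighted Cohen's kappa for two ordered gradings of the same cases.

    `categories` lists the grades from lowest to highest.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("gradings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {grade: i for i, grade in enumerate(categories)}
    n = len(rater_a)

    # Observed joint counts and the marginal counts of each grading.
    joint = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    # Weighted observed disagreement, with weight |i - j| / (k - 1).
    observed = sum(abs(i - j) / (k - 1) * c / n for (i, j), c in joint.items())
    # Weighted disagreement expected by chance from the marginals.
    expected = sum(
        abs(i - j) / (k - 1) * marg_a[i] * marg_b[j] / (n * n)
        for i in range(k)
        for j in range(k)
    )
    if expected == 0:
        raise ValueError("kappa is undefined when only one grade is ever used")
    return 1.0 - observed / expected


# Hypothetical WHO 1973 grades for six cases (illustration only, not study data).
glass = [1, 2, 2, 3, 2, 1]
digital = [1, 2, 3, 3, 2, 2]
print(round(linear_weighted_kappa(glass, digital, [1, 2, 3]), 3))  # 0.571
```

With only two categories, for example low versus high grade once PUNLMP cases are excluded, these weights reduce to ordinary unweighted Cohen's kappa.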

Results

The grades assigned to each case in the three separate grading sessions by each pathologist are given in Table 1. Examples of these are shown in Fig. 1. The kappa scores are summarised in Table 2.

Table 1 Grades assigned for each of the 48 cases in the study by each of the three pathologists. Grades were assigned on three occasions, separated by a washout period. Each of the grading sessions was completed on either digital pathology screen (once) or on glass slide (two separate sessions). Both the 1973 (grade 1, 2, 3) and 2004 (PUNLMP, low, high) WHO grading systems were applied for each case in all sessions. Pathologists were blinded throughout the study to the results of the original report, to the grades they gave at previous assessment sessions within the study, and to the grades given by the other pathologists in the study. Discrepancies between the digital reporting and first glass reporting are highlighted in red.

Table 2 Cohen’s and Fleiss’ kappa scores for grading of non-invasive urothelial carcinoma of the bladder. Individual pathologist grades and pathologist consensus grades are compared (top) between 1st and 2nd glass slide grading sessions and between glass slide (1st grading) and digital pathology grading. Agreement between all three pathologists (below) is given for both glass slide grading sessions and digital pathology grading. Both 1973 and 2004 grading systems are compared.

For the 2004 grading system, the number of cases in agreement between digital and first glass grading was 40/46 (87%) for pathologist A, 44/48 (92%) for pathologist B, and 37/48 (77%) for pathologist C, with 121/142 (85%) grades in agreement overall. Of the 21 discrepancies, 13 (62%) were cases deemed high grade on glass that were downgraded to low grade on digital, whereas the remaining eight (38%) were low grade on glass and deemed high grade on digital. A similar trend towards digital downgrading was seen in the 1973 grading system. Here, the number of cases in agreement was 39/46 (85%) for pathologist A, 45/48 (94%) for pathologist B, and 44/48 (92%) for pathologist C, with 128/142 (90%) in agreement overall. Of the 14 discrepancies, six (43%) were deemed grade 2 on glass, of which four (29%) were upgraded to grade 3 and two (14%) were downgraded to grade 1 on digital, and eight (57%) were deemed grade 3 on glass and downgraded to grade 2 on digital; 10 cases were therefore downgraded on digital (71% of the 14 cases).
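The pooled percentages above follow directly from the per-pathologist counts. The Python sketch below only re-adds the figures quoted in this section; the variable and function names are illustrative, and no data beyond those counts are assumed.

```python
# (cases in agreement, cases compared) between digital and first glass grading,
# as quoted in the text for pathologists A, B and C.
agreement_2004 = {"A": (40, 46), "B": (44, 48), "C": (37, 48)}
agreement_1973 = {"A": (39, 46), "B": (45, 48), "C": (44, 48)}


def pooled_agreement(per_pathologist):
    """Return (agreed, total, percentage) pooled over all pathologists."""
    agreed = sum(a for a, _ in per_pathologist.values())
    total = sum(t for _, t in per_pathologist.values())
    return agreed, total, 100.0 * agreed / total


def share(part, whole):
    """Percentage of `whole` made up by `part`, to one decimal place."""
    return round(100.0 * part / whole, 1)


for label, counts in (("2004", agreement_2004), ("1973", agreement_1973)):
    agreed, total, pct = pooled_agreement(counts)
    print(f"WHO {label}: {agreed}/{total} in agreement ({pct:.1f}%), "
          f"{total - agreed} discrepancies")

# Direction of the discrepancies as quoted in the text:
# 2004: 13 of 21 downgraded on digital; 1973: 10 of 14 downgraded on digital.
print(share(13, 21), share(10, 14))
```

Rounded to whole numbers, these give the 85% and 90% overall agreement and the 62% and 71% downgrading shares reported above.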

Discussion

The impact of grading on clinical management of patients with superficial bladder cancer is significant. The presence of high grade morphology will often up-stratify a tumour with patients sometimes offered mitomycin C, Bacille Calmette-Guérin (BCG) therapy, or even surgery. It is imperative then that pathologists can reliably grade these tumours, including using increasingly popular DP systems.

Grading of atypia/dysplasia is a requirement for in situ, non-invasive, and invasive neoplasms associated with tumours arising at many sites and thus is applicable beyond just urothelial neoplasia. Some authors have expressed concerns that grading using low power views on a digital screen may pose a risk of missing focal areas of higher-grade disease [5,6,7,8,9]. If this is the case, the issue would be more pertinent to tumours such as urothelial carcinoma (where grade is based on the highest grade area, even if small or focal) than to other tumours where grading may be based on overall appearance. However, it is arguable that the same may be true with traditional glass microscopy.

In this study, three pathologists who specialise in reporting urological specimens retrospectively graded a set of 48 non-invasive urothelial carcinomas on three separate occasions. Cases were graded twice on glass (to assess intra-pathologist consistency) and once on a digital screen as a comparison. All grading sessions were carried out with blinding and washout periods. Although a relatively small sample was used, this type of approach is in keeping with standard practice when validating new laboratory equipment, and larger sample sizes (over 50) for agreement studies in a very specific context do not usually improve the statistical analysis.

The agreement of all three pathologists for digital screen grading was moderate, with slightly better performance for the 2004 grading system, as might be expected for a system with fewer categories. This overall level of agreement is not unexpected and is in keeping with data that have been reported in bladder cancers before [12]. For example, a reproducibility study of grading Ta/T1 bladder cancers in 2014 found kappa scores for the agreement between seven pathologists ranging from 0.68 to 0.70 [12]. Similar problems are also reported at other tissue sites [13,14,15]. Specifically on DP systems, studies have identified high discrepancy rates in interpretation of urothelial biopsies when compared with glass slide interpretation [16], and grading of urothelial atypia is cited as a common problem [5,6,7,8]. In this study, however, no obvious trend in agreement of the three pathologists was seen on glass versus digital, suggesting that digital grading was as good as glass grading.
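Agreement among all three pathologists at once is the quantity that Fleiss' kappa summarises. The following Python sketch shows one standard way to compute it; it is an illustration under stated assumptions and not the authors' code. The function name `fleiss_kappa` and the four example cases are hypothetical, and every case is assumed to have been graded by the same number of pathologists.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for several raters grading the same cases.

    `ratings` holds one list per case, containing one grade per rater.
    """
    n_cases = len(ratings)
    if n_cases == 0:
        raise ValueError("at least one case is required")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number (>= 2) of grades")
    if any(grade not in categories for case in ratings for grade in case):
        raise ValueError("grade not listed in categories")

    counts = [Counter(case) for case in ratings]

    # Proportion of agreeing rater pairs within each case, then averaged.
    per_case = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Chance agreement from the overall share of each grade.
    shares = [
        sum(cnt[grade] for cnt in counts) / (n_cases * n_raters)
        for grade in categories
    ]
    p_expected = sum(s * s for s in shares)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one grade is ever used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical WHO 2004 grades from three pathologists (not study data).
cases = [["L", "L", "L"], ["L", "L", "H"], ["H", "H", "H"], ["L", "H", "H"]]
print(round(fleiss_kappa(cases, ["L", "H"]), 3))  # 0.333
```

Unlike the weighted Cohen's kappa used for pairwise comparisons, this statistic treats the grades as unordered categories, so a two-step disagreement is penalised no more than a one-step disagreement.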

Intra-observer agreement (agreement of pathologists with themselves) was generally better than inter-observer agreement (agreement between pathologists), regardless of modality. This is probably to be expected for subjective grading systems and so, as has been suggested by some authors, intra-observer agreement may be a more reliable indicator of the reproducibility of DP than inter-observer agreement [4, 17]. In keeping with that view, this study suggests that, overall, DP is non-inferior for grading non-invasive bladder cancer.

Consensus grades (the grade agreed by at least two pathologists) produced largely the highest kappa scores in the study, suggesting that double reporting may also be a useful and safe way of checking grading for potentially high stakes cases in routine practice. With DP, this is increasingly easy as cases can be electronically shared with colleagues at the click of a mouse.
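A best-of-three consensus grade of this kind can be sketched in a few lines of Python. The function name `consensus_grade` is hypothetical, and returning `None` when all three pathologists give different grades is an assumption made for illustration; the article does not state how such cases, if any occurred, were handled.

```python
from collections import Counter


def consensus_grade(grades):
    """Return the grade given by a strict majority of pathologists, else None."""
    if not grades:
        raise ValueError("at least one grade is required")
    grade, votes = Counter(grades).most_common(1)[0]
    # With three pathologists a strict majority means at least two votes.
    return grade if 2 * votes > len(grades) else None


# Hypothetical gradings of single cases (not study data).
print(consensus_grade(["low", "high", "low"]))  # low
print(consensus_grade([1, 2, 3]))               # None: no two pathologists agree
```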

As expected, most of the disagreements (on glass and digitally) were a difference of only one grade either way, and most differences would have no, or very little, impact on patient management. There were cases where all three pathologists agreed on the grade (both grading systems) for all grading sessions (Cases 1, 7, 8, 17, 24, 25, 32, 34, 37, 40, 43, 46, see Table 1), but this occurred in only 12 cases (25%), which tended to be low grade (2004), grade 2 (1973) cases; grade 2 is arguably a middle default grade.

Low grade versus high grade (WHO 2004) and grade 2 versus grade 3 (WHO 1973) are key therapeutic thresholds. In this study, the grades assigned on digital and glass assessment were in agreement in 85% (2004) and 90% (1973) of decisions, with a slight tendency to undergrade/downgrade on digital. Similar levels of agreement were found in a recent systematic review, which showed 92.4% agreement between digital and glass diagnosis, although overall diagnosis would probably have less potential for subjective inter-observer variation than grading [18]. The mild tendency to undergrade on digital could be explained by the observation that pathologists might be inclined to use a lower magnification digital view and miss areas of high grade tumour. Other possible explanations are difficulties with rendering of nuclear detail on digital images, poor focusing, the effect of file compression artefact, and the limited dynamic range of the whole slide image. It is also possible that this trend may not be reproduced in larger studies. Difficulty with diagnosis and grading of atypia/dysplasia on the digital microscope is nonetheless a recurrent theme in the literature and is a potential pitfall for the new digital pathologist [6]. The findings of this study reiterate the need to confirm borderline cases on both digital and glass, and to ask for second opinions/double reporting when in doubt.

Conclusions

In this study we have shown that agreement for grading non-invasive bladder tumours on glass slide and digital screen assessment is similar, or in some cases improved by digital reporting. The data suggest that digital reporting of grade in these tumours is at least non-inferior and we have outlined how others can adopt and validate similar techniques in their centres.

Availability of data and materials

All data generated or analysed during this study are included in this published article.

Abbreviations

BCG: Bacille Calmette-Guérin
DP: Digital pathology
H: High grade
ISUP: International Society of Urological Pathology
L: Low grade
PUNLMP: Papillary urothelial neoplasm of low malignant potential
WHO: World Health Organisation

References

1. Isharwal S, Konety B. Non-muscle invasive bladder cancer risk stratification. Indian J Urol. 2015;31(4):289–96. https://doi.org/10.4103/0970-1591.166445.
2. Jansen I, Lucas M, Savci-Heijink CD, Meijer SL, Marquering HA, de Bruin DM, et al. Histopathology: ditch the slides, because digital and 3D are on show. World J Urol. 2018;36(4):549–55. https://doi.org/10.1007/s00345-018-2202-1.
3. Jahn SW, Plass M, Moinfar F. Digital pathology: advantages, limitations and emerging perspectives. J Clin Med. 2020;9(11):3697. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7698715/.
4. Araújo ALD, Arboleda LPA, Palmier NR, Fonsêca JM, de Pauli Paglioni M, Gomes-Silva W, et al. The performance of digital microscopy for primary diagnosis in human pathology: a systematic review. Virchows Arch. 2019;474(3):269–87. https://doi.org/10.1007/s00428-018-02519-z.
5. Royal College of Pathologists. Best practice recommendations for implementing digital pathology January 2018. [cited 2020 June 11]; Available from: https://www.rcpath.org/uploads/assets/f465d1b3-797b-4297-b7fedc00b4d77e51/Best-practice-recommendations-for-implementing-digital-pathology.pdf.
6. Williams BJ, Treanor D. Practical guide to training and validation for primary diagnosis with digital pathology. J Clin Pathol. 2020;73(7):418–22. https://doi.org/10.1136/jclinpath-2019-206319.
7. Williams BJ, DaCosta P, Goacher E, Treanor D. A systematic analysis of discordant diagnoses in digital pathology compared with light microscopy. Arch Pathol Lab Med. 2017;141(12):1712–8. https://doi.org/10.5858/arpa.2016-0494-OA.
8. Snead DRJ, Tsang YW, Meskiri A, Kimani PK, Crossman R, Rajpoot NM, et al. Validation of digital pathology imaging for primary histopathological diagnosis. Histopathology. 2016;68(7):1063–72. https://doi.org/10.1111/his.12879.
9. Al-Janabi S, et al. Whole slide images for primary diagnostics of urinary system pathology: a feasibility study. J Renal Injury Prev. 2014;3(4):91–6. https://doi.org/10.12861/jrip.2014.26.
10. Mukhopadhyay S, Feldman MD, Abels E, Ashfaq R, Beltaifa S, Cacciabeve NG, et al. Whole slide imaging versus microscopy for primary diagnosis in surgical pathology: a multicenter blinded randomized noninferiority study of 1992 cases (pivotal study). Am J Surg Pathol. 2018;42(1):39–52. https://doi.org/10.1097/PAS.0000000000000948.
11. Babawale M, et al. Verification and validation of digital pathology (whole slide imaging) for primary histopathological diagnosis: all Wales experience. J Pathol Inform. 2021;12:4.
12. Mangrud OM, Waalen R, Gudlaugsson E, Dalen I, Tasdemir I, Janssen EAM, et al. Reproducibility and prognostic value of WHO1973 and WHO2004 grading systems in TaT1 urothelial carcinoma of the urinary bladder. PLoS One. 2014;9(1):e83192. https://doi.org/10.1371/journal.pone.0083192.
13. Montgomery E. Is there a way for pathologists to decrease Interobserver variability in the diagnosis of dysplasia? Arch Pathol Lab Med. 2005;129(2):174–6. https://doi.org/10.5858/2005-129-174-ITAWFP.
14. Gomes DS, Porto SS, Balabram D, Gobbi H. Inter-observer variability between general pathologists and a specialist in breast pathology in the diagnosis of lobular neoplasia, columnar cell lesions, atypical ductal hyperplasia and ductal carcinoma in situ of the breast. Diagn Pathol. 2014;9(1):121. https://doi.org/10.1186/1746-1596-9-121.
15. Turner JK, Williams GT, Morgan M, Wright M, Dolwani S. Interobserver agreement in the reporting of colorectal polyp pathology among bowel cancer screening pathologists in Wales. Histopathology. 2013;62(6):916–24. https://doi.org/10.1111/his.12110.
16. Borowsky AD, Glassy EF, Wallace WD, Kallichanda NS, Behling CA, Miller DV, et al. Digital whole slide imaging compared with light microscopy for primary diagnosis in surgical pathology: a multicenter, double-blinded, randomized study of 2045 cases. Arch Pathol Lab Med. 2020;144(10):1245–53. https://doi.org/10.5858/arpa.2019-0569-OA.
17. Pantanowitz L, Sinard JH, Henricks WH, Fatheree LA, Carter AB, Contis L, et al. Validating whole slide imaging for diagnostic purposes in pathology: guideline from the College of American Pathologists Pathology and Laboratory Quality Center. Arch Pathol Lab Med. 2013;137(12):1710–22. https://doi.org/10.5858/arpa.2013-0093-CP.
18. Goacher E, Randell R, Williams B, Treanor D. The diagnostic concordance of whole slide imaging and light microscopy: a systematic review. Arch Pathol Lab Med. 2017;141(1):151–61. https://doi.org/10.5858/arpa.2016-0025-RA.

Acknowledgements

The authors thank the Digital Pathology Steering Group at Oxford University Hospitals NHS Foundation Trust for overseeing the transition to digital pathology within the department and supporting this work. The authors also thank PathLAKE. Views expressed are those of the authors and not necessarily those of the PathLAKE Consortium members, the NHS, the NIHR, the Department of Health, Innovate UK or UKRI.

Funding

The study did not receive (or need) any direct funding. The work was supported (digital pathology equipment) by the PathLAKE Centre of Excellence for digital pathology and AI which is funded from the Data to Early Diagnosis and Precision Medicine strand of the government’s Industrial Strategy Challenge Fund, managed and delivered by Innovate UK on behalf of UK Research and Innovation (UKRI).

PathLAKE funding reference: 104689 / Application number: 18181.

RC, CV, and LB are part funded (salary) by PathLAKE.

CV, LB, and HC are part funded (salary) by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC). Funding is via the Molecular Diagnostics Theme.

HC is part funded (salary) by the Medical Research Council (MRC) Human Immunology Unit, University of Oxford.

RC, CV, and LB are part funded (salary) by the National Health Service (NHS).

Author information

Authors and Affiliations

1. Nuffield Department of Surgical Sciences, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DU, UK
Richard Colling, Hayleigh Colling & Clare Verrill
2. Department of Cellular Pathology, Oxford University Hospitals NHS Trust, John Radcliffe Hospital, Oxford, OX3 9DU, UK
Richard Colling, Lisa Browning & Clare Verrill
3. NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, Oxfordshire, UK
Lisa Browning & Clare Verrill

Contributions

LB, RC, CV contributed equally to the study design and all participated in the histological analysis of the cases. HC assisted in collating cases and recording data. RC was the major contributor to the writing of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Richard Colling.

Ethics declarations

Ethics approval and consent to participate

The study was registered in our department as an audit/evaluation of service on diagnostic material. No application for ethics review was submitted and no patient consent was sought as the authors deemed this as not applicable or needed in this study design, which was conducted for an evaluation of service & quality improvement. Permission to use the cases was provided by the hospital trust.

Consent for publication

Not applicable.

Competing interests

The authors and affiliated institutions are part of PathLAKE, one of the five AI Centres of Excellence funded by the UK Government. PathLAKE has received in-kind industry investment from Philips.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Colling, R., Colling, H., Browning, L. et al. Validation of grading of non-invasive urothelial carcinoma by digital pathology for routine diagnosis. BMC Cancer 21, 995 (2021). https://doi.org/10.1186/s12885-021-08698-4

Received: 16 February 2021
Accepted: 13 August 2021
Published: 06 September 2021
DOI: https://doi.org/10.1186/s12885-021-08698-4
