Validation of grading of non-invasive urothelial carcinoma by digital pathology for routine diagnosis
BMC Cancer volume 21, Article number: 995 (2021)
Pathological grading of non-invasive urothelial carcinoma has a direct impact upon management. This study evaluates the reproducibility of grading these tumours on glass slides and digital pathology.
Forty-eight non-invasive urothelial bladder carcinomas were graded by three uropathologists on glass and on a digital platform using the 1973 WHO and 2004 ISUP/WHO systems.
Consensus grades for glass and digital grading gave Cohen’s kappa scores of 0.78 (2004) and 0.82 (1973). Of 142 decisions made on the key therapeutic borderline of low grade versus high grade urothelial carcinoma (2004) by the three pathologists, 85% were in agreement. For the 1973 grading system, agreement overall was 90%.
Agreement on grading on glass slide and digital screen assessment is similar or in some cases improved, suggesting at least non-inferiority of DP for grading of non-invasive urothelial carcinoma.
Most bladder tumours are urothelial carcinomas and around 70–80% of these are either non-invasive or early-invasive (superficial). Risk stratification based on morphological grading by pathologists is clinically useful for determining prognosis and follow-up management, so histopathological grading has a significant clinical impact for patients. Despite this, few data exist on the reproducibility of these grading systems, especially given the increasing popularity of, and transition to, digital pathology (DP) assessment of tumours [2, 3]. This is particularly important given that a number of potential pitfalls are already known in some areas of DP, where digital screen appearances can be challenging to identify or interpret. Most DP validation studies focus on overall diagnostic concordance rather than tumour grade specifically; however, grading of dysplasia and tumours is often identified as a source of discordance. Recent reviews and guidelines highlight potential pitfalls of digitally grading atypia, including in urothelial cells [5,6,7]. This view has been supported by a number of validation studies that have identified grade discrepancies in the small number of urothelial carcinomas included [8,9,10,11]. The true extent of this problem in urological cancers, and how it relates to background intra- and inter-observer variation, is not known. The aim of this study is to evaluate the intra-observer and inter-observer variation in grading of non-invasive urothelial bladder carcinomas, comparing glass and digital reporting/assessment methodologies.
Fifty consecutive bladder cases, including transurethral resections and biopsies, of non-invasive papillary urothelial carcinomas were selected from the 2019 digital archive for a departmental audit. A formal sample size calculation was not performed; a small set of representative cases was selected, in line with routine laboratory validation studies. Cases were graded by three specialist uropathologists. All pathologists had at least 12 months' experience with DP, and the laboratory scans all routine paraffin-embedded histology slides. All cases were graded both on a digital screen and on traditional glass slides. Cases were graded twice on glass and once via DP, with a washout period of at least 2 weeks between sessions. Glass slides were missing for two cases, which were then excluded. Slides were scanned with a × 40 objective using a Philips Ultra Fast Scanner and displayed on a high-resolution (either an Eizo MX242W or a Dell U2715H), calibrated (to a brightness of at least 270 cd/m², gamma of 2.2, and white point at 7500 K) digital screen using the Philips IMS on Google Chrome. Both ISUP/WHO 2004 and WHO 1973 systems were used for grading. Agreement was compared using linear weighted Cohen’s kappa and Fleiss’ kappa. A group consensus grade (by best of three votes) was also used for comparisons. All three pathologists were blinded to the original reports, each other’s grading, and the grades from their own previous assessment sessions (although potential access was available). Cases that were given a diagnosis of papillary urothelial neoplasm of low malignant potential (PUNLMP) during the study were excluded from the statistical analysis.
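The linear weighted Cohen’s kappa named above can be sketched as follows. This is a minimal illustration of the standard formula, not the authors’ analysis code (the software used is not stated); the function name `linear_weighted_kappa` and the example grades are hypothetical.

```python
def linear_weighted_kappa(rater_a, rater_b, categories):
    """Linear weighted Cohen's kappa for two sets of grades.

    `categories` lists the possible grades in ascending order, e.g. [1, 2, 3].
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {grade: i for i, grade in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (first grade, second grade).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each set of grades.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Linear disagreement weight: proportional to the number of grade steps apart.
    def weight(i, j):
        return abs(i - j) / (k - 1)

    observed_disagreement = sum(
        weight(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    expected_disagreement = sum(
        weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    )
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when only one grade is ever used")
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical grades (not study data): one pathologist, glass versus digital.
glass = [1, 2, 2, 3, 3, 2]
digital = [1, 2, 3, 3, 2, 2]
print(round(linear_weighted_kappa(glass, digital, [1, 2, 3]), 2))
```

With only two categories (low grade versus high grade in the 2004 system), the linear weights reduce to 0 and 1, so the weighted statistic coincides with unweighted Cohen’s kappa.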
For the 2004 grading system, the number of cases in agreement between digital and first glass grading was 40/46 (87%) for pathologist A, 44/48 (92%) for pathologist B, and 37/48 (77%) for pathologist C, with 121/142 (85%) grades in agreement overall. Of the 21 discrepancies, 13 (62%) were cases deemed high grade on glass and downgraded to low grade on digital, whereas the remaining eight (38%) were deemed low grade on glass and high grade on digital. A similar trend towards digital downgrading was seen with the 1973 grading system. Here, the number of cases in agreement was 39/46 (85%) for pathologist A, 45/48 (94%) for pathologist B, and 44/48 (92%) for pathologist C, with 128/142 (90%) overall. Of the 14 discrepancies, six (43%) were deemed grade 2 on glass, with four (29%) upgraded to grade 3 and two (14%) downgraded to grade 1 on digital, and eight (57%) were deemed grade 3 on glass and downgraded to grade 2 on digital. Ten cases were therefore downgraded on digital (71% of the 14 discrepancies).
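The pooled percentages follow directly from the per-pathologist counts. The short check below reproduces that arithmetic from the figures reported above; the variable and function names are illustrative only.

```python
# (cases in agreement, cases graded) per pathologist, digital versus first glass read.
agreement_2004 = {"A": (40, 46), "B": (44, 48), "C": (37, 48)}
agreement_1973 = {"A": (39, 46), "B": (45, 48), "C": (44, 48)}


def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


def pooled(counts):
    """Sum numerators and denominators across pathologists."""
    agree = sum(a for a, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return agree, total, percent(agree, total)


print(pooled(agreement_2004))  # pooled agreement, 2004 system
print(pooled(agreement_1973))  # pooled agreement, 1973 system

# Direction of the 21 discrepancies in the 2004 system.
print(percent(13, 21), percent(8, 21))  # downgraded vs upgraded on digital
```

Pooling the raw counts, rather than averaging the three percentages, weights each grading decision equally even though pathologist A graded 46 cases and the others 48.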
The impact of grading on clinical management of patients with superficial bladder cancer is significant. The presence of high grade morphology will often up-stratify a tumour with patients sometimes offered mitomycin C, Bacille Calmette-Guérin (BCG) therapy, or even surgery. It is imperative then that pathologists can reliably grade these tumours, including using increasingly popular DP systems.
Grading of atypia/dysplasia is a requirement for in situ, non-invasive, and invasive neoplasms arising at many sites and is thus applicable beyond urothelial neoplasia. Some authors have expressed concerns that grading using low power views on a digital screen may pose a risk of missing focal areas of higher-grade disease [5,6,7,8,9]. If this is the case, the issue would be more pertinent to tumours such as urothelial carcinoma (graded on the highest grade area, even if small or focal) than to other tumours where grading may reflect an overall appearance. However, it is arguable that the same may be true with traditional glass microscopy.
In this study, three pathologists who specialise in reporting urological specimens retrospectively graded a set of 48 non-invasive urothelial carcinomas on three separate occasions. Cases were graded twice on glass (to assess intra-pathologist consistency) and once on a digital screen as a comparison. All grading sessions were carried out with blinding and washout periods. Although a relatively small sample was used, this approach is in keeping with standard practice when validating new laboratory equipment, and larger sample sizes (over 50) in agreement studies for a very specific context do not usually improve the statistical analysis.
Agreement between all three pathologists when grading on a digital screen was moderate, with slightly better performance for the 2004 grading system, as might be expected for a system with fewer categories. This overall level of agreement is not unexpected and is in keeping with data that have been reported in bladder cancers before. For example, a reproducibility study of grading Ta/T1 bladder cancers in 2014 found kappa scores for the agreement between seven pathologists ranging from 0.68 to 0.70. Similar problems are also reported at other tissue sites [13,14,15]. Specifically on DP systems, studies have identified high discrepancy rates in interpretation of urothelial biopsies when compared with glass slide interpretation, and grading urothelial atypia is cited as a common problem [5,6,7,8]. In this study, however, no obvious trend in agreement of the three pathologists was seen on glass versus digital, suggesting that digital grading was as good as glass grading.
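Agreement among three raters at once is commonly summarised with Fleiss’ kappa, the statistic named in the methods. The sketch below implements the textbook formula under the assumption that every case is graded by the same number of raters; it is not the authors’ code, and the function name and example grades are hypothetical.

```python
def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for several raters.

    `ratings` holds one list per case, containing each rater's grade.
    """
    n_cases = len(ratings)
    if n_cases == 0:
        raise ValueError("at least one case is required")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")

    # counts[i][j]: number of raters assigning case i to category j.
    counts = [[case.count(c) for c in categories] for case in ratings]

    # Per-case agreement: proportion of rater pairs that agree.
    per_case = [
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    mean_agreement = sum(per_case) / n_cases

    # Chance agreement from the overall share of each category.
    share = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(len(categories))
    ]
    chance_agreement = sum(p * p for p in share)
    if chance_agreement == 1:
        raise ValueError("kappa is undefined when only one grade is ever used")
    return (mean_agreement - chance_agreement) / (1 - chance_agreement)


# Hypothetical grades (not study data): three pathologists, four cases.
cases = [
    ["low", "low", "low"],
    ["high", "high", "high"],
    ["low", "low", "high"],
    ["high", "high", "low"],
]
print(round(fleiss_kappa(cases, ["low", "high"]), 2))
```

Unlike the weighted Cohen’s kappa, Fleiss’ kappa treats categories as unordered, so a grade 1 versus grade 3 disagreement counts the same as grade 2 versus grade 3.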
Intra-observer agreement (agreement of pathologists with themselves) was generally better than inter-observer agreement (agreement between pathologists), regardless of modality. This is probably to be expected for subjective grading systems and so, as has been suggested by some authors, intra-observer agreement may be a more reliable indicator of the reproducibility of DP than inter-observer agreement [4, 17]. In keeping with that view, this study suggests that, overall, DP is non-inferior for grading non-invasive bladder cancer.
Consensus grades (the grade agreed by at least two pathologists) produced largely the highest kappa scores in the study, suggesting that double reporting may also be a useful and safe way of checking grading for potentially high stakes cases in routine practice. With DP, this is increasingly easy as cases can be electronically shared with colleagues at the click of a mouse.
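A best-of-three consensus grade can be derived with a simple majority vote. The sketch below is illustrative only: the function name is hypothetical, and returning `None` when no grade has a majority (possible with three raters and three grades in the 1973 system) is an assumption, since the handling of such ties is not described.

```python
from collections import Counter


def consensus_grade(grades):
    """Return the grade given by a strict majority of raters, else None."""
    if not grades:
        raise ValueError("at least one grade is required")
    grade, votes = Counter(grades).most_common(1)[0]
    # Assumption: without a strict majority there is no consensus grade.
    return grade if votes > len(grades) / 2 else None


# Hypothetical examples (not study data).
print(consensus_grade(["low", "high", "low"]))  # two of three agree
print(consensus_grade([1, 2, 3]))               # no majority
```

Consensus grades computed separately for the glass and digital sessions can then be compared with the same kappa statistics used for individual pathologists.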
As expected, most of the disagreements (on glass and digitally) were a difference of only one grade either way, and most differences would have no, or very little, impact on patient management. There were cases where all three pathologists agreed on the grade (in both grading systems) for all grading sessions (Cases 1, 7, 8, 17, 24, 25, 32, 34, 37, 40, 43, 46, see Table 1), but there were only 12 such cases (25%) and they tended to be low grade, grade 2 cases, arguably a middle default grade.
Low grade versus high grade (WHO 2004) and grade 2 versus grade 3 (WHO 1973) are key therapeutic thresholds. In this study, in 85% (2004) and 90% (1973) of cases the grades given by pathologists were in agreement between digital and glass assessment, with a slight tendency to undergrade/downgrade on digital. Similar levels of agreement were found in a recent systematic review, which showed a 92.4% agreement between digital and glass diagnosis, but overall diagnosis would probably have less potential for subjective inter-observer variation than grading. The mild tendency to undergrade on digital could be explained by the observation that pathologists might be inclined to use a lower magnification digital view and miss areas of high grade tumour. Other possible explanations are difficulties with rendering of nuclear detail on digital images, poor focusing, the effect of file compression artefact, and the limited dynamic range of the whole slide image. It is also possible that this trend may not be reproduced in larger studies. Difficulty with diagnosis and grading of atypia/dysplasia on the digital microscope is nonetheless a recurrent theme in the literature and is a potential pitfall for the new digital pathologist. The need to confirm borderline cases on both digital and glass, and to ask for second opinions/double reporting when in doubt, is reiterated by the findings in this study.
In this study we have shown that agreement for grading non-invasive bladder tumours on glass slide and digital screen assessment is similar, or in some cases improved by digital reporting. The data suggest that digital reporting of grade in these tumours is at least non-inferior and we have outlined how others can adopt and validate similar techniques in their centres.
Availability of data and materials
All data generated or analysed during this study are included in this published article.
ISUP: International Society of Urological Pathology
PUNLMP: Papillary urothelial neoplasm of low malignant potential
WHO: World Health Organisation
Isharwal S, Konety B. Non-muscle invasive bladder cancer risk stratification. Indian J Urol. 2015;31(4):289–96. https://doi.org/10.4103/0970-1591.166445.
Jansen I, Lucas M, Savci-Heijink CD, Meijer SL, Marquering HA, de Bruin DM, et al. Histopathology: ditch the slides, because digital and 3D are on show. World J Urol. 2018;36(4):549–55. https://doi.org/10.1007/s00345-018-2202-1.
Jahn SW, Plass M, Moinfar F. Digital pathology: advantages, limitations and emerging perspectives. J Clin Med. 2020;9(11):3697. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7698715/.
Araújo ALD, Arboleda LPA, Palmier NR, Fonsêca JM, de Pauli Paglioni M, Gomes-Silva W, et al. The performance of digital microscopy for primary diagnosis in human pathology: a systematic review. Virchows Arch. 2019;474(3):269–87. https://doi.org/10.1007/s00428-018-02519-z.
Royal College of Pathologists. Best practice recommendations for implementing digital pathology January 2018. [cited 2020 June 11]; Available from: https://www.rcpath.org/uploads/assets/f465d1b3-797b-4297-b7fedc00b4d77e51/Best-practice-recommendations-for-implementing-digital-pathology.pdf.
Williams BJ, Treanor D. Practical guide to training and validation for primary diagnosis with digital pathology. J Clin Pathol. 2020;73(7):418–22. https://doi.org/10.1136/jclinpath-2019-206319.
Williams BJ, DaCosta P, Goacher E, Treanor D. A systematic analysis of discordant diagnoses in digital pathology compared with light microscopy. Arch Pathol Lab Med. 2017;141(12):1712–8. https://doi.org/10.5858/arpa.2016-0494-OA.
Snead DRJ, Tsang YW, Meskiri A, Kimani PK, Crossman R, Rajpoot NM, et al. Validation of digital pathology imaging for primary histopathological diagnosis. Histopathology. 2016;68(7):1063–72. https://doi.org/10.1111/his.12879.
Al-Janabi S, et al. Whole slide images for primary diagnostics of urinary system pathology: a feasibility study. J Renal Injury Prev. 2014;3(4):91–6. https://doi.org/10.12861/jrip.2014.26.
Mukhopadhyay S, Feldman MD, Abels E, Ashfaq R, Beltaifa S, Cacciabeve NG, et al. Whole slide imaging versus microscopy for primary diagnosis in surgical pathology: a multicenter blinded randomized noninferiority study of 1992 cases (pivotal study). Am J Surg Pathol. 2018;42(1):39–52. https://doi.org/10.1097/PAS.0000000000000948.
Babawale M, et al. Verification and validation of digital pathology (whole slide imaging) for primary histopathological diagnosis: all Wales experience. J Pathol Inform. 2021;12:4.
Mangrud OM, Waalen R, Gudlaugsson E, Dalen I, Tasdemir I, Janssen EAM, et al. Reproducibility and prognostic value of WHO1973 and WHO2004 grading systems in TaT1 urothelial carcinoma of the urinary bladder. PLoS One. 2014;9(1):e83192. https://doi.org/10.1371/journal.pone.0083192.
Montgomery E. Is there a way for pathologists to decrease Interobserver variability in the diagnosis of dysplasia? Arch Pathol Lab Med. 2005;129(2):174–6. https://doi.org/10.5858/2005-129-174-ITAWFP.
Gomes DS, Porto SS, Balabram D, Gobbi H. Inter-observer variability between general pathologists and a specialist in breast pathology in the diagnosis of lobular neoplasia, columnar cell lesions, atypical ductal hyperplasia and ductal carcinoma in situ of the breast. Diagn Pathol. 2014;9(1):121. https://doi.org/10.1186/1746-1596-9-121.
Turner JK, Williams GT, Morgan M, Wright M, Dolwani S. Interobserver agreement in the reporting of colorectal polyp pathology among bowel cancer screening pathologists in Wales. Histopathology. 2013;62(6):916–24. https://doi.org/10.1111/his.12110.
Borowsky AD, Glassy EF, Wallace WD, Kallichanda NS, Behling CA, Miller DV, et al. Digital whole slide imaging compared with light microscopy for primary diagnosis in surgical pathology: a multicenter, double-blinded, randomized study of 2045 cases. Arch Pathol Lab Med. 2020;144(10):1245–53. https://doi.org/10.5858/arpa.2019-0569-OA.
Pantanowitz L, Sinard JH, Henricks WH, Fatheree LA, Carter AB, Contis L, et al. Validating whole slide imaging for diagnostic purposes in pathology: guideline from the College of American Pathologists Pathology and Laboratory Quality Center. Arch Pathol Lab Med. 2013;137(12):1710–22. https://doi.org/10.5858/arpa.2013-0093-CP.
Goacher E, Randell R, Williams B, Treanor D. The diagnostic concordance of whole slide imaging and light microscopy: a systematic review. Arch Pathol Lab Med. 2017;141(1):151–61. https://doi.org/10.5858/arpa.2016-0025-RA.
The authors thank the Digital Pathology Steering Group at Oxford University Hospitals NHS Foundation Trust for overseeing the transition to digital pathology within the department and supporting this work. The authors also thank PathLAKE. Views expressed are those of the authors and not necessarily those of the PathLAKE Consortium members, the NHS, the NIHR, the Department of Health, Innovate UK or UKRI.
The study did not receive (or need) any direct funding. The work was supported (digital pathology equipment) by the PathLAKE Centre of Excellence for digital pathology and AI which is funded from the Data to Early Diagnosis and Precision Medicine strand of the government’s Industrial Strategy Challenge Fund, managed and delivered by Innovate UK on behalf of UK Research and Innovation (UKRI).
PathLAKE funding reference: 104689 / Application number: 18181.
RC, CV, and LB are part funded (salary) by PathLAKE.
CV, LB, and HC are part funded (salary) by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC). Funding is via the Molecular Diagnostics Theme.
HC is part funded (salary) by the Medical Research Council (MRC) Human Immunology Unit, University of Oxford.
RC, CV, and LB are part funded (salary) by the National Health Service (NHS).
Ethics approval and consent to participate
The study was registered in our department as an audit/evaluation of service on diagnostic material. No application for ethics review was submitted and no patient consent was sought as the authors deemed this as not applicable or needed in this study design, which was conducted for an evaluation of service & quality improvement. Permission to use the cases was provided by the hospital trust.
Consent for publication
The authors and affiliated institutions are part of PathLAKE, one of the UK Government’s funded 5 AI Centres of Excellence. PathLAKE has received in kind industry investment from Philips.
Cite this article
Colling, R., Colling, H., Browning, L. et al. Validation of grading of non-invasive urothelial carcinoma by digital pathology for routine diagnosis. BMC Cancer 21, 995 (2021). https://doi.org/10.1186/s12885-021-08698-4