Multi-vendor evaluation of artificial intelligence as an independent reader for double reading in breast cancer screening on 275,900 mammograms

Sharma, Nisha; Ng, Annie Y.; James, Jonathan J.; Khara, Galvin; Ambrózay, Éva; Austin, Christopher C.; Forrai, Gábor; Fox, Georgia; Glocker, Ben; Heindl, Andreas; Karpati, Edit; Rijken, Tobias M.; Venkataraman, Vignesh; Yearsley, Joseph E.; Kecskemethy, Peter D.

doi:10.1186/s12885-023-10890-7

Table 2 Performance of double reading with and without AI – by site and mammography equipment vendor

A) MK / IMS Giotto^b
Performance Metric	Historical double reading	Double reading (DR) with AI	Test outcome for DR with AI^a
On ten-year cohort
Recall rate	9.2% (9.0, 9.4)	7.8% (7.7, 8.0)	Superior 0.85 (0.85, 0.86)
CDR	7.7 per 1000 (7.1, 8.3)	7.6 per 1000 (7.0, 8.2)	Non-inferior 0.99 (0.98, 0.99)
Sensitivity	88.8% (86.2, 90.9)	87.5% (84.9, 89.7)	Non-inferior 0.99 (0.98, 0.99)
Specificity	94.7% (94.3, 95.0)	95.8% (95.4, 96.1)	Superior 1.01 (1.01, 1.01)
PPV	8.3% (8.1, 8.6)	9.6% (9.4, 9.9)	Superior 1.16 (1.14, 1.16)
On 2015-year cohort: with more complete IC data available
Recall rate	8.5% (8.0, 9.1)	7.5% (7.0, 8.0)	Superior 0.88 (0.85, 0.90)
CDR	7.5 per 1000 (6.0, 9.3)	7.4 per 1000 (5.9, 9.2)	Non-inferior 0.99 (0.96, 1.00)
Sensitivity	87.6% (79.2, 93.9)	86.5% (77.9, 92.1)	Non-inferior 0.99 (0.96, 1.00)
Specificity	95.8% (94.6, 96.7)	96.9% (95.8, 97.6)	Superior 1.01 (1.01, 1.02)
PPV	8.7% (7.0, 10.8)	9.8% (7.9, 12.1)	Superior 1.12 (1.09, 1.14)
B) NUH / GE^c
Performance Metric	Historical double reading	Double reading (DR) with AI	Test outcome for DR with AI^a
On ten-year cohort
Recall rate	2.8% (2.7, 2.9)	2.8% (2.7, 3.0)	Non-inferior 1.01 (0.99, 1.03)
CDR	8.8 per 1000 (8.1, 9.5)	8.6 per 1000 (7.9, 9.3)	Non-inferior 0.98 (0.96, 0.99)
Sensitivity	85.5% (82.7, 87.9)	83.5% (80.6, 86.1)	Non-inferior 0.98 (0.96, 0.99)
Specificity	97.9% (97.7, 98.1)	97.9% (97.7, 98.1)	Non-inferior 1.00 (0.9995, 1.00)
PPV	31.6% (29.5, 33.7)	30.4% (28.4, 32.5)	Non-inferior 0.96 (0.95, 0.98)
On 2015-year cohort: with more complete IC data available
Recall rate	2.8% (2.5, 3.2)	2.8% (2.5, 3.1)	Non-inferior 0.99 (0.95, 1.04)
CDR	8.0 per 1000 (6.5, 9.9)	7.9 per 1000 (6.4, 9.8)	Non-inferior 0.99 (0.96, 1.00)
Sensitivity	73.9% (65.4, 81.0)	73.1% (64.5, 80.3)	Non-inferior 0.99 (0.96, 1.00)
Specificity	98.0% (97.7, 98.3)	98.1% (97.8, 98.4)	Non-inferior 1.00 (0.9996, 1.00)
PPV	28.3% (23.6, 33.5)	28.2% (23.5, 33.5)	Non-inferior 1.00 (0.97, 1.01)
C) LTHT / Hologic^c
Performance Metric	Historical double reading	Double reading (DR) with AI	Test outcome for DR with AI^a
On ten-year cohort
Recall rate	5.1% (4.9, 5.3)	5.1% (4.9, 5.2)	Non-inferior 0.99 (0.98, 1.01)
CDR	8.2 per 1000 (7.6, 9.0)	8.0 per 1000 (7.4, 8.8)	Non-inferior 0.97 (0.96, 0.99)
Sensitivity	87.1% (84.2, 89.5)	84.8% (81.7, 87.4)	Non-inferior 0.97 (0.96, 0.99)
Specificity	95.9% (95.7, 96.2)	96.0% (95.7, 96.3)	Non-inferior 1.00 (0.9999, 1.00)
PPV	16.2% (15.0, 17.5)	15.9% (14.7, 17.2)	Non-inferior 0.98 (0.97, 1.00)
On 2015-year cohort: with more complete IC data available
Recall rate	4.3% (4.0, 4.7)	4.1% (3.8, 4.5)	Superior 0.95 (0.92, 0.99)
CDR	7.7 per 1000 (6.2, 9.5)	7.6 per 1000 (6.1, 9.4)	Non-inferior 0.99 (0.96, 1.00)
Sensitivity	88.2% (80.1, 93.3)	87.1% (78.8, 92.5)	Non-inferior 0.99 (0.96, 1.00)
Specificity	96.5% (96.0, 96.9)	96.6% (96.1, 97.1)	Non-inferior 1.00 (0.9998, 1.00)
PPV	17.7% (14.5, 21.5)	18.3% (15.0, 22.2)	Superior 1.04 (1.01, 1.05)
D) ULH / Siemens^c
Performance Metric	Historical double reading	Double reading (DR) with AI	Test outcome for DR with AI^a
On ten-year cohort
Recall rate	3.6% (3.5, 3.8)	3.6% (3.4, 3.7)	Superior 0.98 (0.96, 0.9981)
CDR	9.3 per 1000 (8.6, 10.1)	9.1 per 1000 (8.4, 9.9)	Non-inferior 0.97 (0.96, 0.99)
Sensitivity	85.6% (82.7, 88.1)	83.4% (80.4, 86.1)	Non-inferior 0.97 (0.96, 0.99)
Specificity	97.4% (97.0, 97.7)	97.5% (97.1, 97.8)	Non-inferior 1.00 (0.9994, 1.00)
PPV	25.7% (23.9, 27.6)	25.6% (23.7, 27.5)	Non-inferior 1.00 (0.98, 1.01)
On 2015-year cohort: with more complete IC data available
Recall rate	3.4% (3.1, 3.7)	3.4% (3.1, 3.7)	Non-inferior 0.98 (0.95, 1.02)
CDR	9.0 per 1000 (7.5, 10.7)	8.8 per 1000 (7.4, 10.5)	Non-inferior 0.98 (0.96, 1.00)
Sensitivity	77.6% (70.4, 83.4)	76.3% (69.0, 82.3)	Non-inferior 0.98 (0.96, 1.00)
Specificity	97.6% (97.1, 98.0)	97.7% (97.2, 98.1)	Non-inferior 1.00 (0.9988, 1.00)
PPV	26.2% (22.4, 30.4)	26.2% (22.3, 30.4)	Non-inferior 1.00 (0.97, 1.02)

95% confidence intervals are presented in parentheses
^aThe ratio of proportions (DR with AI relative to historical double reading) and its 95% confidence interval, used to assess non-inferiority and superiority, are presented
^bThe positive pool for CDR, sensitivity, and PPV includes screen-detected positives and two-year interval cancers (ICs), which are relevant for Hungary (HU)
^cThe positive pool for CDR, sensitivity, and PPV includes screen-detected positives and three-year interval cancers (ICs), which are relevant for the UK
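
The test outcomes in the last column can be read off the ratio and its confidence interval described in footnote a. The sketch below illustrates one plausible decision rule, not necessarily the authors' exact procedure: a metric is labelled superior when the whole interval lies on the favourable side of 1, and non-inferior when the unfavourable bound stays within a relative margin of 1. The names `RatioResult` and `classify` are hypothetical, and the margin of 0.10 is an illustrative assumption, since this table does not state the margin used. Recall rate is treated as "lower is better"; all other metrics as "higher is better".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatioResult:
    """One row of the table: ratio of proportions with its 95% CI."""
    metric: str
    ratio: float
    ci_low: float
    ci_high: float
    lower_is_better: bool = False  # True for recall rate


def classify(result: RatioResult, margin: float = 0.10) -> str:
    """Label a row from its CI. `margin` is an ASSUMED relative
    non-inferiority margin; the table itself does not state one."""
    if result.lower_is_better:
        # Favourable direction is below 1, so the upper bound decides.
        if result.ci_high < 1.0:
            return "Superior"
        if result.ci_high < 1.0 + margin:
            return "Non-inferior"
    else:
        # Favourable direction is above 1, so the lower bound decides.
        if result.ci_low > 1.0:
            return "Superior"
        if result.ci_low > 1.0 - margin:
            return "Non-inferior"
    return "Inconclusive"


# Ratios and intervals copied from the ten-year cohort rows of the table.
rows = [
    RatioResult("MK recall rate", 0.85, 0.85, 0.86, lower_is_better=True),
    RatioResult("MK PPV", 1.16, 1.14, 1.16),
    RatioResult("NUH recall rate", 1.01, 0.99, 1.03, lower_is_better=True),
    RatioResult("NUH PPV", 0.96, 0.95, 0.98),
    RatioResult("ULH recall rate", 0.98, 0.96, 0.9981, lower_is_better=True),
]

for row in rows:
    print(f"{row.metric}: {classify(row)} "
          f"{row.ratio} ({row.ci_low}, {row.ci_high})")
```

Under these assumptions the five sample rows reproduce the labels printed in the table. The point estimates can be approximated from the rounded percentages (for MK recall rate, 7.8 / 9.2 ≈ 0.85), but small discrepancies are expected elsewhere because the published ratios were presumably computed from unrounded counts. Rows whose bounds are reported as exactly 1.00 after rounding cannot be classified reliably from the printed values alone.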

ISSN: 1471-2407

BMC Cancer
