
Table 2 Performance of double reading with and without AI – by site and mammography equipment vendor

From: Multi-vendor evaluation of artificial intelligence as an independent reader for double reading in breast cancer screening on 275,900 mammograms

A) MK / IMS Giotto [b]

| Performance metric | Historical double reading | Double reading (DR) with AI | Test outcome for DR with AI | Ratio of proportions (95% CI) [a] |
|---|---|---|---|---|
| Ten-year cohort | | | | |
| Recall rate | 9.2% (9.0, 9.4) | 7.8% (7.7, 8.0) | Superior | 0.85 (0.85, 0.86) |
| CDR | 7.7 per 1000 (7.1, 8.3) | 7.6 per 1000 (7.0, 8.2) | Non-inferior | 0.99 (0.98, 0.99) |
| Sensitivity | 88.8% (86.2, 90.9) | 87.5% (84.9, 89.7) | Non-inferior | 0.99 (0.98, 0.99) |
| Specificity | 94.7% (94.3, 95.0) | 95.8% (95.4, 96.1) | Superior | 1.01 (1.01, 1.01) |
| PPV | 8.3% (8.1, 8.6) | 9.6% (9.4, 9.9) | Superior | 1.16 (1.14, 1.16) |
| 2015 cohort, with more complete IC data available | | | | |
| Recall rate | 8.5% (8.0, 9.1) | 7.5% (7.0, 8.0) | Superior | 0.88 (0.85, 0.90) |
| CDR | 7.5 per 1000 (6.0, 9.3) | 7.4 per 1000 (5.9, 9.2) | Non-inferior | 0.99 (0.96, 1.00) |
| Sensitivity | 87.6% (79.2, 93.9) | 86.5% (77.9, 92.1) | Non-inferior | 0.99 (0.96, 1.00) |
| Specificity | 95.8% (94.6, 96.7) | 96.9% (95.8, 97.6) | Superior | 1.01 (1.01, 1.02) |
| PPV | 8.7% (7.0, 10.8) | 9.8% (7.9, 12.1) | Superior | 1.12 (1.09, 1.14) |

B) NUH / GE [c]

| Performance metric | Historical double reading | Double reading (DR) with AI | Test outcome for DR with AI | Ratio of proportions (95% CI) [a] |
|---|---|---|---|---|
| Ten-year cohort | | | | |
| Recall rate | 2.8% (2.7, 2.9) | 2.8% (2.7, 3.0) | Non-inferior | 1.01 (0.99, 1.03) |
| CDR | 8.8 per 1000 (8.1, 9.5) | 8.6 per 1000 (7.9, 9.3) | Non-inferior | 0.98 (0.96, 0.99) |
| Sensitivity | 85.5% (82.7, 87.9) | 83.5% (80.6, 86.1) | Non-inferior | 0.98 (0.96, 0.99) |
| Specificity | 97.9% (97.7, 98.1) | 97.9% (97.7, 98.1) | Non-inferior | 1.00 (0.9995, 1.00) |
| PPV | 31.6% (29.5, 33.7) | 30.4% (28.4, 32.5) | Non-inferior | 0.96 (0.95, 0.98) |
| 2015 cohort, with more complete IC data available | | | | |
| Recall rate | 2.8% (2.5, 3.2) | 2.8% (2.5, 3.1) | Non-inferior | 0.99 (0.95, 1.04) |
| CDR | 8.0 per 1000 (6.5, 9.9) | 7.9 per 1000 (6.4, 9.8) | Non-inferior | 0.99 (0.96, 1.00) |
| Sensitivity | 73.9% (65.4, 81.0) | 73.1% (64.5, 80.3) | Non-inferior | 0.99 (0.96, 1.00) |
| Specificity | 98.0% (97.7, 98.3) | 98.1% (97.8, 98.4) | Non-inferior | 1.00 (0.9996, 1.00) |
| PPV | 28.3% (23.6, 33.5) | 28.2% (23.5, 33.5) | Non-inferior | 1.00 (0.97, 1.01) |

C) LTHT / Hologic [c]

| Performance metric | Historical double reading | Double reading (DR) with AI | Test outcome for DR with AI | Ratio of proportions (95% CI) [a] |
|---|---|---|---|---|
| Ten-year cohort | | | | |
| Recall rate | 5.1% (4.9, 5.3) | 5.1% (4.9, 5.2) | Non-inferior | 0.99 (0.98, 1.01) |
| CDR | 8.2 per 1000 (7.6, 9.0) | 8.0 per 1000 (7.4, 8.8) | Non-inferior | 0.97 (0.96, 0.99) |
| Sensitivity | 87.1% (84.2, 89.5) | 84.8% (81.7, 87.4) | Non-inferior | 0.97 (0.96, 0.99) |
| Specificity | 95.9% (95.7, 96.2) | 96.0% (95.7, 96.3) | Non-inferior | 1.00 (0.9999, 1.00) |
| PPV | 16.2% (15.0, 17.5) | 15.9% (14.7, 17.2) | Non-inferior | 0.98 (0.97, 1.00) |
| 2015 cohort, with more complete IC data available | | | | |
| Recall rate | 4.3% (4.0, 4.7) | 4.1% (3.8, 4.5) | Superior | 0.95 (0.92, 0.99) |
| CDR | 7.7 per 1000 (6.2, 9.5) | 7.6 per 1000 (6.1, 9.4) | Non-inferior | 0.99 (0.96, 1.00) |
| Sensitivity | 88.2% (80.1, 93.3) | 87.1% (78.8, 92.5) | Non-inferior | 0.99 (0.96, 1.00) |
| Specificity | 96.5% (96.0, 96.9) | 96.6% (96.1, 97.1) | Non-inferior | 1.00 (0.9998, 1.00) |
| PPV | 17.7% (14.5, 21.5) | 18.3% (15.0, 22.2) | Superior | 1.04 (1.01, 1.05) |

D) ULH / Siemens [c]

| Performance metric | Historical double reading | Double reading (DR) with AI | Test outcome for DR with AI | Ratio of proportions (95% CI) [a] |
|---|---|---|---|---|
| Ten-year cohort | | | | |
| Recall rate | 3.6% (3.5, 3.8) | 3.6% (3.4, 3.7) | Superior | 0.98 (0.96, 0.9981) |
| CDR | 9.3 per 1000 (8.6, 10.1) | 9.1 per 1000 (8.4, 9.9) | Non-inferior | 0.97 (0.96, 0.99) |
| Sensitivity | 85.6% (82.7, 88.1) | 83.4% (80.4, 86.1) | Non-inferior | 0.97 (0.96, 0.99) |
| Specificity | 97.4% (97.0, 97.7) | 97.5% (97.1, 97.8) | Non-inferior | 1.00 (0.9994, 1.00) |
| PPV | 25.7% (23.9, 27.6) | 25.6% (23.7, 27.5) | Non-inferior | 1.00 (0.98, 1.01) |
| 2015 cohort, with more complete IC data available | | | | |
| Recall rate | 3.4% (3.1, 3.7) | 3.4% (3.1, 3.7) | Non-inferior | 0.98 (0.95, 1.02) |
| CDR | 9.0 per 1000 (7.5, 10.7) | 8.8 per 1000 (7.4, 10.5) | Non-inferior | 0.98 (0.96, 1.00) |
| Sensitivity | 77.6% (70.4, 83.4) | 76.3% (69.0, 82.3) | Non-inferior | 0.98 (0.96, 1.00) |
| Specificity | 97.6% (97.1, 98.0) | 97.7% (97.2, 98.1) | Non-inferior | 1.00 (0.9988, 1.00) |
| PPV | 26.2% (22.4, 30.4) | 26.2% (22.3, 30.4) | Non-inferior | 1.00 (0.97, 1.02) |

95% confidence intervals are presented in parentheses. CDR, cancer detection rate; PPV, positive predictive value; IC, interval cancer.
[a] The ratio of proportions (DR with AI divided by historical double reading) and its 95% confidence interval, used to assess non-inferiority and superiority.
[b] The positive pool for CDR, sensitivity, and PPV includes screen-detected positives and two-year ICs, the relevant interval for Hungary (HU).
[c] The positive pool for CDR, sensitivity, and PPV includes screen-detected positives and three-year ICs, the relevant interval for the UK.
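A minimal Python sketch of how the reported quantities relate, under stated assumptions. It uses the standard confusion-matrix definitions of recall rate, CDR, sensitivity, specificity, and PPV, and computes the ratio as DR with AI divided by historical double reading, which matches the reported values (for example, 7.8% / 9.2% ≈ 0.85 for recall in panel A). The class and function names, the example counts, and the 10% non-inferiority margin are hypothetical; this table does not state the study's margin or its confidence-interval method, so the sketch only labels a given CI rather than computing one.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Hypothetical confusion-matrix counts for one reading strategy."""
    tp: int  # recalled and positive under the reference standard
    fp: int  # recalled but negative
    fn: int  # not recalled but positive, e.g. an interval cancer in the follow-up window
    tn: int  # not recalled and negative

    @property
    def n(self) -> int:
        """Total number of screens."""
        return self.tp + self.fp + self.fn + self.tn

    def recall_rate(self) -> float:
        return (self.tp + self.fp) / self.n

    def cdr_per_1000(self) -> float:
        return 1000 * self.tp / self.n

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)


def ratio_of_proportions(p_ai: float, p_historical: float) -> float:
    """Point estimate: DR with AI divided by historical double reading."""
    return p_ai / p_historical


def classify_outcome(lo: float, hi: float, higher_is_better: bool,
                     margin: float = 0.10) -> str:
    """Label a ratio's 95% CI (lo, hi) relative to 1.

    `margin` is a HYPOTHETICAL relative non-inferiority margin; the
    study's actual margin is not given in this table.
    """
    if higher_is_better:  # CDR, sensitivity, specificity, PPV
        if lo > 1.0:
            return "Superior"
        if lo > 1.0 - margin:
            return "Non-inferior"
    else:  # recall rate: a lower value is better
        if hi < 1.0:
            return "Superior"
        if hi < 1.0 + margin:
            return "Non-inferior"
    return "Non-inferiority not shown"


# Panel A (MK / IMS Giotto), ten-year cohort, values copied from the table
print(round(ratio_of_proportions(7.8, 9.2), 2))  # 0.85
print(classify_outcome(0.85, 0.86, higher_is_better=False))  # Superior (recall rate)
print(classify_outcome(0.98, 0.99, higher_is_better=True))  # Non-inferior (sensitivity)
```

Because recall rate = (TP + FP) / N and PPV = TP / (TP + FP), CDR per 1000 equals 1000 × recall rate × PPV. For the MK ten-year cohort, 9.2% × 8.3% ≈ 7.6 per 1000, consistent with the reported 7.7 per 1000 up to rounding. Recomputing ratios from the rounded percentages can likewise differ from the table in the second decimal (9.8 / 8.7 ≈ 1.13 versus the reported 1.12).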