A simple way to measure the burden of interval cancers in breast cancer screening
BMC Cancer volume 14, Article number: 782 (2014)
The sensitivity of a mammography program is normally evaluated by comparing the interval cancer rate to the expected breast cancer incidence without screening, i.e. the proportional interval cancer rate (PICR). The expected breast cancer incidence in the absence of screening is, however, difficult to estimate when a program has been running for some time. As an alternative to the PICR we propose the interval cancer ratio (ICR), ICR = IC/(IC + SD), where IC is the number of interval cancers and SD is the number of screen detected cancers. We validated this simple measure by comparing it with the traditionally used PICR.
We undertook a systematic review and included studies 1) covering a service screening program, 2) including women aged 50-69 years, 3) reporting observed data, 4) reporting interval cancers, women screened, or interval cancer rate, screen detected cases, or screen detection rate, and 5) reporting an estimated breast cancer incidence rate for the background population. This resulted in 5 papers describing 12 mammography screening programs.
Covering initial screens only, the ICR varied from 0.10 to 0.28 while the PICR varied from 0.22 to 0.51. For subsequent screens only, the ICR varied from 0.22 to 0.37 and the PICR from 0.28 to 0.51. There was a strong positive correlation between the ICR and the PICR for initial screens (r = 0.81), but less so for subsequent screens (r = 0.65).
This alternative measure seems to capture the burden of interval cancers just as well as the traditional PICR, without the need for the increasingly difficult estimation of background incidence, making it a more accessible tool for evaluating the performance of mammography screening programs.
Mammography screening is intended to reduce breast cancer mortality by detecting breast cancer cases at an earlier stage. A high sensitivity is needed for a mammography screening program to fulfil its purpose. This means the program should not have too many interval cancers, i.e. cancers that appear clinically after a negative screening result and before the next scheduled screen. A screening program in a population with a high breast cancer incidence can have a high interval cancer rate and still have as protective an effect on breast cancer mortality as a screening program with a low interval cancer rate running in a population with a low breast cancer incidence. The sensitivity of a mammography screening program is therefore normally evaluated by comparing the interval cancer rate to the expected breast cancer incidence without screening, i.e. the PICR [1]. In order to compare sensitivity across screening programs, the European guidelines provide acceptable and desirable values for this measure. However, over time the difficulties in estimating the expected background incidence make such comparisons increasingly unreliable.
The expected breast cancer incidence in the absence of screening, or background incidence, is difficult to approximate, as the introduction of a screening program makes it difficult to find an unscreened, comparable population group. As the breast cancer incidence has changed over time [2], it will, some years after the introduction of screening, no longer be meaningful to estimate the expected breast cancer incidence without screening from the breast cancer incidence prior to screening.
The aim of this article is to propose and validate an alternative performance indicator for the burden of interval cancers in an organized mammography screening program. We aim to validate this proposed measure by comparing it with the PICR from studies of service screening programs for women aged 50-69. Zorzi et al. [3] have previously proposed a substitute for the PICR for a given subsequent screening round. We propose to use the even simpler ICR = IC/(IC + SD), where IC is the number of interval cancers and SD is the number of screen detected cancers, and to use this measure also for the initial screening round.
We performed a PubMed search using Major MeSH terms, with the words “mammography” or “screening” required in the abstract where an abstract was available, in the title where no abstract was available, and finally in free text (see Additional file 1). The search was run in March 2012 and was limited to publications in English. It resulted in 3299 matches. Among these matches, relevant studies were identified in a two-step selection. First, two independent researchers, SBA and SHN, reviewed the titles and abstracts of the 3299 papers. This sorting resulted in 96 papers for further consideration. Second, we selected studies: 1) covering a service screening program, 2) including women aged 50-69 years, 3) reporting observed data (papers based on modeling only were excluded), 4) reporting number of screen detected cancers or screen detection rates and number of screened women and two of these: number of interval cancers, interval cancer rate or number of screened women and 5) reporting an estimated breast cancer incidence rate of the background population in the absence of screening. Where consensus was not obtained, a third researcher, EL, participated in the decision. This resulted in the inclusion of 5 papers [4–8] describing 12 different screening programs (Figure 1).
Screen detected cancers
A primary breast cancer found at a scheduled screening examination. Some centers allowed a so-called early recall (or intermediate mammography) prescribed for diagnostic reasons 1 year after the screening test. Cases detected at early recall are counted as screen detected (SD) cancers.
Interval cancers
A primary breast cancer diagnosed in a woman after a screening test negative for malignancy. The breast cancer should be diagnosed either before the next invitation to screening, or within a time period equal to the screening interval in case the woman has reached the upper age limit for screening or for other reasons does not receive more invitations.
Proportional interval cancer rate (PICR)
Interval cancer rate as a proportion of the underlying, expected, breast cancer incidence rate in the absence of screening: PICR = (interval cancer rate)/(expected background incidence rate). This is the classic epidemiological performance indicator [9] as used in the EU Guidelines [1].
Interval cancer ratio (ICR)
Interval cancers as a proportion of all cancers, i.e. interval cancers (IC) plus screen detected cancers (SD): ICR = IC/(IC + SD). This is the measure we propose as an alternative performance indicator.
From each paper we extracted information on the number of screened women, the number of screen detected cancers or the screen detection rate, the expected background annual incidence rate per 10,000, and the number of interval cancer cases. If not provided, we calculated interval cancer cases per 10,000 screen negative women (the number of women screened minus the number of screen detected cases). Finally we calculated the PICR and the ICR. In the Veneto Region study the interval cancers were identified by linkage to the regional hospital discharge records. For all other studies interval cancers were identified by linkage to the regional or national cancer registers, all of which are regarded as complete.
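The two indicators can be illustrated with a minimal Python sketch. It takes ICR = IC/(IC + SD) and PICR = (interval cancer rate)/(expected background incidence rate). The function names, the example counts, and the choice to scale the annual background rate by the length of the screening interval are assumptions made for illustration; they are not taken from the included studies.

```python
def interval_cancer_ratio(ic: int, sd: int) -> float:
    """ICR = IC / (IC + SD): interval cancers as a share of all cancers."""
    if ic < 0 or sd < 0:
        raise ValueError("counts must be non-negative")
    if ic + sd == 0:
        raise ValueError("at least one cancer is required")
    return ic / (ic + sd)


def proportional_interval_cancer_rate(
    ic: int,
    screened: int,
    sd: int,
    background_per_10000_per_year: float,
    interval_years: float = 2.0,
) -> float:
    """PICR = interval cancer rate / expected background incidence.

    Assumption: the interval cancer rate is expressed per 10,000
    screen-negative women over the whole screening interval, and the
    expected incidence is the annual background rate multiplied by the
    interval length in years.
    """
    screen_negative = screened - sd
    if screen_negative <= 0:
        raise ValueError("no screen-negative women")
    if background_per_10000_per_year <= 0 or interval_years <= 0:
        raise ValueError("background rate and interval must be positive")
    ic_rate = ic / screen_negative * 10_000
    expected = background_per_10000_per_year * interval_years
    return ic_rate / expected


# Hypothetical program: 100,000 women screened, 600 screen detected
# cancers, 200 interval cancers, background incidence 20 per 10,000
# per year, 2-year screening interval.
icr = interval_cancer_ratio(ic=200, sd=600)
picr = proportional_interval_cancer_rate(
    ic=200, screened=100_000, sd=600, background_per_10000_per_year=20.0
)
print(f"ICR = {icr:.2f}, PICR = {picr:.2f}")
```

With these made-up inputs the sketch gives an ICR of 0.25 and a PICR of about 0.50. Only the PICR depends on the background incidence estimate.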
Initial versus subsequent screens
The number of screen detected cases is higher in initial screens than in subsequent screens. Therefore the ICR will be lower in initial screens than in subsequent screens. When comparing interval cancer ratios one therefore has to distinguish between initial screens and subsequent screens. All studies had a screening interval of 2 years, except Marseille where the screening interval was 3 years.
Pearson’s correlation coefficient and the best-fit straight line were calculated using Microsoft Office Excel 2007.
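The same two quantities can be computed outside a spreadsheet. The following is a minimal sketch in plain Python of Pearson's correlation coefficient and an ordinary least-squares line. The function name and the data pairs are hypothetical, and regressing the PICR on the ICR (rather than the reverse) is an arbitrary choice made for the example.

```python
import math


def pearson_and_fit(xs, ys):
    """Return (r, slope, intercept) for paired observations.

    r is Pearson's correlation coefficient; slope and intercept
    describe the least-squares line y = slope * x + intercept.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical (ICR, PICR) pairs, one per screening program.
icr_values = [0.10, 0.15, 0.20, 0.28]
picr_values = [0.22, 0.30, 0.35, 0.51]
r, slope, intercept = pearson_and_fit(icr_values, picr_values)
print(f"r = {r:.2f}, PICR ~ {slope:.2f} * ICR + {intercept:.2f}")
```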
The ICR in studies of initial screens varied from 0.10 to 0.28 while the PICR varied from 0.22 to 0.51 in the same studies (Table 1). In studies of subsequent screens the ICR varied from 0.22 to 0.37 with the PICR varying from 0.28 to 0.61 (Table 2). Four studies reported on mixed initial and subsequent screens. The Italian study from the Veneto Region, with a majority of initial screens, had an ICR of 0.18 and a PICR of 0.29. The studies from Copenhagen, Denmark; Funen, Denmark; and Pirkanmaa, Finland, with a majority of subsequent screens, had an ICR of 0.25-0.34 and a PICR of 0.40-0.61.
All studies estimated the expected background incidence by the observed incidence just before the mammography screening program started. With the breast cancer incidence increasing over time [2], this estimated background incidence will increasingly underestimate the true background incidence.
The Norwegian NBCSP study estimated the background incidence by the observed incidence in women aged 50-69 years before screening started. This will underestimate the expected incidence, since the observed interval cancer rate derives from women who are on average two years older.
The Italian Veneto Region study is based on invasive cancers only, whereas all other studies are based on invasive cancers plus ductal carcinoma in situ (DCIS). Since DCIS is far more common among screen-detected cancers, calculations excluding DCIS will increase the ICR more than the PICR.
The correlation between ICR and PICR was r = 0.76 for initial screens (Figure 2), and r = 0.58 for subsequent screens (Figure 3).
When comparing PICRs across screening programs, differences can reflect true differences in interval cancer rates; differences in methods for estimating the expected background incidence; or differences in the time trend of breast cancer incidence. By using the ICR, instead of estimating the PICR, the uncertainty introduced by estimating the expected background incidence is avoided. Hence, the ICR is potentially a better performance indicator as no estimation is needed. The question is, however, whether this suggested simple performance indicator captures interval cancer burden as well as the old measure.
As seen in Figure 2 there is a high positive correlation (r = 0.76) between the two measures in initial screens. Outliers are Stockholm, Norway, Copenhagen, Marseille, Strasbourg and the Italian Veneto Region. Stockholm and Norway had quite extensive opportunistic screening before the service screening program started [7, 10]. One could therefore argue that the data from these locations did not represent 100% initial screens but were probably more in line with the Veneto Region program, which had 73% initial screens. Since the ICR will be higher for subsequent screens, it was not surprising that the Stockholm, Norway and Veneto Region programs had relatively high ICR for initial screens. The high ICR for the Veneto Region was also a consequence of including only invasive cancers.
The relationship between the ICR and PICR for studies with primarily subsequent screens (seen in Figure 3) showed a strong positive correlation (r = 0.58). Data from Turin and Florence are based on small numbers (25 and 28 interval cancers respectively), and excluding these two programs gave a stronger correlation (r = 0.68).
When the expected background incidence is calculated from the incidence of the general population, the population actually screened could have a different expected background incidence, especially if the attendance rate is low. Marseille had an attendance rate of 43% and a 3 year screening interval until 2001. Strasbourg had no active invitation for the first screen, implying that the incidence of the screened population could be different from that of the general population. If we excluded Marseille and Strasbourg from the comparison, we got a correlation of r = 0.73 for initial screens. If we excluded Turin, Florence, Marseille and Strasbourg for subsequent screens, we got a correlation of r = 0.73.
In randomized controlled trials (RCTs) the expected background incidence is the incidence found in the control group. PICR can therefore be calculated with great confidence in RCTs. We found information on interval cancers and screen detected cancers in both arms of the Gothenburg Breast Screening Trial [11] and the Swedish two-county trial [12]. We could only find information on the number of person years, and thereby the incidence, for the entire period; the incidence in the control arm therefore included one screening. Nor did we find information stratified into initial and subsequent screenings. The values of ICR and PICR are therefore not entirely comparable with the values in the studies included in this review. Based on the results from the Gothenburg Breast Screening Trial we calculated ICR = 0.21 and PICR = 0.20. From the results in the two-county trial we calculated ICR = 0.27 and PICR = 0.21. Although the results are not completely comparable, the ICR and PICR values from these two RCTs are very close to the line showing the connection between ICR and PICR for subsequent screenings.
The measure we propose will make it easier to compare interval cancer rates across screening programs, since an estimation of an expected background incidence is not needed. Especially when controlling for other differences between the programs, we see a high correlation between the PICR and the ICR. It is therefore possible to get a reasonable comparison of the burden of interval cancers across mammography screening programs by comparing the ICR instead of the PICR. It does, of course, not explain other, more in-depth, issues concerning the burden of interval cancers such as difference in tumor size or stage between screen detected and interval cancers.
Strengths & weaknesses
This study includes data from many mammography screening programs throughout Western Europe, which supports the potential for use of this simple measure in different settings. As indicated by the very limited number of studies available for this study, only a few programs actually estimate the PICR and thereby check whether the sensitivity follows the European guidelines. It is much simpler to calculate the ICR, and we therefore believe that reporting of program sensitivity would be much more common if the gold standard was to use the ICR. Using the ICR as a performance indicator instead of the PICR will facilitate comparisons between screening programs.
Some of the centers included in this study allow for early recall. We adopted the method from Törnberg et al. 2010 and calculated cases detected at early recall as screen detected cancers. Whether cases detected at early recall are counted as screen detected cancers or interval cancers, will have a very minor impact on our study as we are comparing PICR = IC/(expected background incidence) to ICR = IC/(IC + SD), which is equivalent to comparing 1/(expected background incidence) to 1/(IC + SD).
It is a strength that the ICR is not affected by uncertainties in the estimates of background incidence, and the ICR is therefore not subject to over-estimation of the burden of interval cancers caused by an under-estimated background incidence. It is, however, a weakness that, unlike the PICR, the ICR is affected by overdiagnosis: overdiagnosis increases the number of screen-detected cases, and this number is included in the denominator of the ICR. Reliable data on overdiagnosis have been reported from the programmes in Denmark and Florence, finding overdiagnosis to account for 1-5% of all incident breast cancers [13, 14]. Larger estimates of overdiagnosis have been reported in the literature, but they mainly reflect that the estimates are not adequately adjusted [15]. An overdiagnosis of 1-5% would change the size of the ICR only marginally and would therefore not be a major concern in the interpretation of ICRs. Comparing programs with huge differences in overdiagnosis will still favor the program with many overdiagnosed cases. It is a trade-off when choosing one measure instead of the other, but we argue that there are fewer uncertainties involved in calculating the ICR than in calculating the PICR.
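A rough sense of how much overdiagnosis of this size moves the ICR can be had from a small Python sketch. It assumes, purely for illustration, that the overdiagnosed cases amount to a given share of all cancers among screened women (IC + SD) and that all of them are screen detected; the function name and the counts are hypothetical.

```python
def icr_excluding_overdiagnosis(ic: int, sd: int, overdiagnosed_share: float) -> float:
    """ICR after removing overdiagnosed cases from the screen detected count.

    Assumption: overdiagnosed cases equal overdiagnosed_share * (IC + SD)
    and are all screen detected, so only the denominator shrinks.
    """
    if not 0 <= overdiagnosed_share < 1:
        raise ValueError("share must be in [0, 1)")
    total = ic + sd
    overdiagnosed = overdiagnosed_share * total
    if total == 0 or overdiagnosed > sd:
        raise ValueError("inconsistent counts")
    return ic / (total - overdiagnosed)


# Hypothetical program with 200 interval and 600 screen detected cancers.
observed = 200 / (200 + 600)
for share in (0.01, 0.05):
    adjusted = icr_excluding_overdiagnosis(200, 600, share)
    print(f"share {share:.0%}: ICR {observed:.3f} -> {adjusted:.3f}")
```

Under these assumptions the adjusted value equals the observed ICR divided by (1 - share), so a share of 5% raises an observed ICR of 0.25 to about 0.26.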
In this study we proposed and validated the ICR as an alternative measure for the burden of interval cancers. The proposed measure seems to capture the burden of interval cancers just as well as, or better than, the traditional PICR, as there is no need to estimate the background incidence. More studies are needed to further validate this proposed measure. It should be noted that the ICR should be seen in the context of other short-term performance indicators, and hence should not stand alone in the evaluation of screening performance.
1. Perry N, Broeders M, de Wolf C, Tornberg S, Holland R, von Karsa L: European guidelines for quality assurance in breast cancer screening and diagnosis. Fourth edition–summary document. Ann Oncol. 2008, 19 (4): 614-622.
2. Hery C, Ferlay J, Boniol M, Autier P: Changes in breast cancer incidence and mortality in middle-aged and elderly women in 28 countries with Caucasian majority populations. Ann Oncol. 2008, 19 (5): 1009-1018. 10.1093/annonc/mdm593.
3. Zorzi M, Guzzinati S, Puliti D, Paci E: A simple method to estimate the episode and programme sensitivity of breast cancer screening programmes. J Med Screen. 2010, 17 (3): 132-138. 10.1258/jms.2010.009060.
4. Hofvind S, Geller B, Vacek PM, Thoresen S, Skaane P: Using the European guidelines to evaluate the Norwegian breast cancer screening program. Eur J Epidemiol. 2007, 22 (7): 447-455. 10.1007/s10654-007-9137-y.
5. Mammography screening evaluation group HSCHC: Mammography screening for breast cancer in Copenhagen April 1991-March 1997. APMIS. 1998, 106 (suppl 83): 1-44.
6. Njor SH, Olsen AH, Bellstrom T, Dyreborg U, Bak M, Axelsson C, Graversen HP, Schwartz W, Lynge E: Mammography screening in the county of Fyn. November 1993-December 1999. APMIS Suppl. 2003, 110: 1-33.
7. Tornberg S, Kemetli L, Ascunce N, Hofvind S, Anttila A, Seradour B, Paci E, Guldenfels C, Azavedo E, Frigerio A, Rodrigues V, Ponti A: A pooled analysis of interval cancer rates in six European countries. Eur J Cancer Prev. 2010, 19 (2): 87-93. 10.1097/CEJ.0b013e32833548ed.
8. Vettorazzi M, Stocco C, Chirico A, Recanatini S, Saccon S, Mariotto R, Cinquetti S, Moretto T, Sartori P, Stomeo A, Ciatto S: Quality control of mammography screening in the Veneto Region. Evaluation of four programs at a local health unit level–analysis of the frequency and diagnostic pattern of interval cancers. Tumori. 2006, 92 (1): 1-5.
9. Day NE, Williams DR, Khaw KT: Breast cancer screening programmes: the development of a monitoring and evaluation system. Br J Cancer. 1989, 59 (6): 954-958. 10.1038/bjc.1989.203.
10. Lynge E, Braaten T, Njor SH, Olsen AH, Kumle M, Waaseth M, Lund E: Mammography activity in Norway 1983 to 2008. Acta Oncol. 2011, 50 (7): 1062-1067. 10.3109/0284186X.2011.599339.
11. Bjurstam N, Bjorneld L, Warwick J, Sala E, Duffy SW, Nystrom L, Walker N, Cahlin E, Eriksson O, Hafström LO, Lingaas H, Mattsson J, Persson S, Rudenstam CM, Salander H, Säve-Söderbergh J, Wahlin T: The Gothenburg breast screening trial. Cancer. 2003, 97 (10): 2387-2396. 10.1002/cncr.11361.
12. Tabar L, Duffy SW, Yen MF, Warwick J, Vitak B, Chen HH, Smith RA: All-cause mortality among breast cancer patients in a screening trial: support for breast cancer mortality as an end point. J Med Screen. 2002, 9 (4): 159-162. 10.1136/jms.9.4.159.
13. Njor SH, Olsen AH, Blichert-Toft M, Schwartz W, Vejborg I, Lynge E: Overdiagnosis in screening mammography in Denmark: population based cohort study. BMJ. 2013, 346: f1064. 10.1136/bmj.f1064.
14. Puliti D, Miccinesi G, Zappa M, Manneschi G, Crocetti E, Paci E: Balancing harms and benefits of service mammography screening programs: a cohort study. Breast Cancer Res. 2012, 14 (1): R9. 10.1186/bcr3090.
15. Puliti D, Duffy SW, Miccinesi G, de Koning H, Lynge E, Zappa M, Paci E, EUROSCREEN Working Group: Overdiagnosis in mammographic screening for breast cancer in Europe: a literature review. J Med Screen. 2012, 19 (suppl 1): 42-56.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2407/14/782/prepub
The authors wish to thank the following people who have provided data to this study: Levent Kemetli, Nieves Ascunce, Solveig Hofvind, Ahti Anttila, Brigitte Sèradour, Eugenio Paci, Cathrine Guldenfels, Edward Azavedo, Alfonso Frigerio, Vitor Rodrigues, Antonio Ponti.
The authors declare no conflict of interest.
SBA: Participated in the design of the study, did the literature search, reviewed the articles resulting from the literature search, drafted the manuscript. ST: Participated in the design of the study, critical revision of the manuscript, provided additional data. EL: Participated in the design of the study, reviewed articles when consensus was not reached between SBA and SHN. MVE-C: Decisions on data structure, critical revision of the manuscript. SHN: Conceived of the study and participated in its design and coordination, reviewed the articles, critical revision of the manuscript. All authors read and approved the final manuscript.