A discrete event simulation model [19, 20] was built to reproduce the natural history of BC for women invited to participate in the programme and the characteristics of the BCSPBC since its beginning in 1996 through 2011. The screening programme comprised multiple cohorts of women; a cohort was defined as a group of women invited to participate for the first time in the screening programme in a calendar year. Following the terminology used by Hoyle and Anderson the cohort starting screening in the current year is defined as the incident cohort, and those already undergoing screening from previous years are known as the prevalent cohorts [21]. These terms do not correspond to any disease state in this context but to screening eligibility. Our model reproduced the entire female population invited into the programme during the period 1996–2011. Multiple cohort models, also known as dynamic population models, are used to represent the whole women population that have been invited to the program during the study period whereas single cohort models are useful to conclude lifetime effects in a woman with specific characteristics.
A simplified diagram of the model is shown in Fig. 1 and the data used to implement it is specified in the additional data reported (see Additional file 1: Table S1). The simulation model was constructed using Arena simulation software (Version 14.0, Rockwell software, Milwaukee, WI). No ethical approval or consent was required as no experimental research on humans was involved in this study. However, the Ethics Committee for Clinical Research in Gipuzkoa Health Area evaluated and approved the study.
Simulated population
The exact number of women invited to participate in the programme for the first time and their ages at that time were available from the programme database. Exactly 414,041 life histories were created, one for every woman invited at least once into the BCSPBC from 1996 through 2011 (see Additional file 1: Table S2).
The age distribution of the invited women changed during the study period. From 1996 to 1998 during the programme implementation, the population consisted only of the incident cohorts aged 50 through 64 years. In subsequent years, instead, the incident cohort included those aged 50 to 51 years. Actually, the target population also included several prevalent cohorts, apart from the incident cohort. The extension of the target population from 50 to 64 years and then 50 to 69 years began in 2006, with women aged 65 years continuing in the programme until age 69.
Mortality from causes other than BC was randomly assigned, depending on the woman’s birth cohort, based on an empirical function. All-cause and BC-caused mortality data were obtained from the Basque mortality registry for the period 1986–2010 (see Additional file 1: Table S1). The high quality of the Basque cancer registry data has been demonstrated by Izarzugaza et al. [22]. The Basque Statistics Institute (EUSTAT) provided the population of Basque women by age and birth cohort. We applied an actuarial method that removes breast cancer as a cause of death to estimate the age at death from causes other than BC, by birth cohort [23].
Natural history
We modelled the natural history of BC using the approach adopted by Lee et al. [24]. Four main states of health were distinguished: (1) disease-free or undetectable BC; (2) asymptomatic BC that can be diagnosed by screening or preclinical phase; (3) symptomatic BC diagnosed clinically; and (4) death from BC.
The age distribution used to assign the onset of the preclinical phase was obtained from Rue et al. [25]. On the basis of BC incidence from Catalan cancer registries and a distribution of the sojourn time or duration in the pre-clinical state, those authors used a generalized linear model with a Poisson distribution and a polynomial parameterization for the variables of age and cohort for the estimation of BC incidence when no data was available [25]. Cohort effects enabled including upward breast cancer incidence trends in our model. We assumed, as did Lee et al., that the sojourn time of the pre-clinical phase follows an exponential distribution as based on results of clinical trials [24]. The mean value used by Lee et al. for women aged 50 years or more was 4 years.
One of the main assumptions in this model was that every woman who reached the clinical state would be diagnosed clinically at the beginning of this state. Thus, we used the age-specific distribution of BC detection stages observed in the cancer registries of the Basque Country in 1995, before the screening programme began for clinically detected BC (see Additional file 1: Table S3). In situ carcinomas are also included in the model as the lowest stage in which BC could be detected. On the basis of the work by Vilaprinyó et al. [26], we applied distributions of age- and stage-specific survival in women diagnosed either clinically or by screening. Thus, each diagnosed woman was assigned two ages at death and the minimum of these two ages determined the cause and age of death for each woman. Clinical practice and treatments used in detected breast cancers during the studied period followed European guidelines and were described in detail by Arrospide et al. [27].
Screening effects
The screening test for BCSPBC consisted of mammography with double projection (cranio-caudal and oblique lateral view) carried out biennially [28]. The total number of mammograms performed in the programme was determined by the number of invited women (including early recalls) and annual attendance rates, which were exactly known from the programme data base (see Additonal file 1: Table S2). Annual attendance rates were considered independent as correlation of the participation in first and repeated screening rounds was not available.
Sensitivity and specificity of the entire early detection programme, as well as distribution of breast cancer stage for cancers detected by screening, varied during the study period. Four phases were distinguished: (1) from 1996 to 1999, the implementation phase, when most of the women invited into the programme were incident; (2) from 2000 to 2005, the prevalence phase, when the percentage of women invited for the first time was much lower than the percentage of women invited for successive mammograms; (3) from 2006 to 2008, extension phase, when the programme was extended to women aged 65 to 69 years; (4) from 2009 to 2011, digital phase, when the switch to digital mammography occurred. Sensitivity and specificity of the screening programme for each of the defined phases was calculated on the basis of observed mammography results in the BCSPBC, together with the number of invited women and number of screening-detected breast cancers and observed interval cancers (see Additional file 1: Table S4). In the model, a positive or negative screening result was assigned based on the presence or absence of BC in each woman and the corresponding sensitivity and specificity of the programme.
Distributions of disease stages for screening-detected cases were also obtained for the different phases of BCSPBC (implementation, prevalence, extension and digital) with use of observed data from the screening programme (see Additional file 1: Table S3). This figures were in line with other stage distributions used in similar studies carried out in different countries [17, 24]. In each woman the same random number as the one used for clinical detection stage assignment was used to simulate the stage distribution for screening-detected cancers, in order to estimate the advance in detection stage due to screening.
Model validation
The model was run in the screened scenario for the whole female population invited at least once into the BCSPBC during the study period in order to reproduce the actual performance of the programme. Model calibration and validation procedure is fully described in the additional file (see Additional file 1). Three main parameters were calibrated: time between consecutive invitations, age distribution of preclinical phase onset and its mean duration. We obtained the best fitting parameters to include in the final model by following the seven-step approach for calibrating models suggested by Karnon et al. [29] and detailed in the additional file (see Additional file 1).
Screened versus unscreened scenario: outcomes analysis
When the model for the screened scenario was calibrated and validated the same model was run also for the unscreened scenario involving the same female population invited at least once into the BCSPBC during the study period. All the created entities were cloned to obtain two identical populations (screened and unscreened) in each run.
We first used a multi-cohort model that allowed the best approach to reproduce population dynamics in the Basque Country as well as the natural history of breast cancer. However, the assessment of the balance between benefits and harms from preventive programmes typically requires a long follow-up thus the key for its interpretation is to achieve the steady state that is defined as the time when each recently observed behaviour of the system will remain constant in the future [30]. As the first 15 years of the BCSPBC evaluated in this study were not enough to achieve this state, we cannot ensure that the differences between the two scenarios estimated in 2011 will remain in the future.
To further understand the effects of a programme that requires long-term follow-up, we reran the model using a single cohort of 50,000 women aged 50 years who were invited into the programme for the first time in 1996, assuming 100 % participation and life-time follow-up [31]. The inputs used to extrapolate the results from 2011 on were based on the same parameters as those that were used for 2011.
In order to include variability of population characteristics in the model, the multi-cohort model was run 1,000 times. Mean and standard deviations for the results of the 1,000 replications were calculated. The same outputs were obtained for both multi-cohort and single-cohort model. The multi-cohort model was used to estimate population-level effects, whereas individual benefits and harms were estimated with the same model for a single cohort.
The estimated age-specific BC incidence and mortality during the period 1996–2011, which was a scenario that fit the actual development of the screening programme, was compared with the simulated scenario without screening. Overdiagnosis is defined as the number of women who are diagnosed and treated for cases of breast cancer that never would have become symptomatic in the absence of screening. However, the operational definition in the literature used to estimate overdiagnosis is the difference between the number of BC cases detected with screening and the number without screening. In our case as steady state is not achieved, this definition to estimate overdiagnosis includes not only overdiagnosis but also screen detected BC cases that would be clinically detected in the future in absence of screening [32]. Therefore, overdiagnosis is overestimated when analysing population level results in a multi-cohort model that did not achieve the steady state. Accordingly, we will refer in this case to the "incidence increase" instead of "overdiagnosis".
In the case of a single cohort with lifetime follow-up, however the definition used to estimate the incidence increase matches exactly with overdiagnosed cases of BC. We first calculated the relative BC incidence increase (i.e., overdiagnosis) compared with the number of BC cases in a scenario without screening [33] and in a second approach, we estimated the fraction of overdiagnosed cases of BC identified by the screening programme.
In addition, the number of women with a false positive result who were referred to the reference hospital for additional tests based on the sensitivity and specificity data from the BCSPBC were also considered harms of the screening programme.
The probabilities of BC-related death in women detected in the screened and unscreened cohorts were analysed with Cox regression only for the single-cohort analysis. Survival time was measured from the beginning of the assigned clinical phase even for BC detected by screening in order to avoid lead time bias.