Economic evaluation of the breast cancer screening programme in the Basque Country: retrospective cost-effectiveness and budget impact analysis

Background Breast cancer (BC) screening in the Basque Country has shown a 20% reduction in the number of BC deaths and an acceptable level of overdiagnosis (4% of screen-detected BC). The aim of this study was to evaluate the breast cancer early detection programme in the Basque Country in terms of retrospective cost-effectiveness and budget impact from 1996 to 2011.
Methods A discrete event simulation model was built to reproduce the natural history of BC. We estimated, for lifetime follow-up, the total cost of BC (screening, diagnosis and treatment) and the quality-adjusted life years (QALYs) for women invited to participate in the evaluated programme during the 15-year period, both in the actual screening scenario and in a hypothetical unscreened scenario. An incremental cost-effectiveness ratio was calculated from aggregated costs. In addition, annual costs were considered for the budget impact analysis. Both population-level and single-cohort analyses were performed. A probabilistic sensitivity analysis was applied to assess the impact of parameter uncertainty.
Results The actual screening programme involved a cost of 1,127 million euros and provided 6.7 million QALYs over the lifetime of the target population, resulting in a gain of 8,666 QALYs for an additional cost of 36.4 million euros compared with the unscreened scenario. Thus, the incremental cost-effectiveness ratio was 4,214€/QALY. All the model runs in the probabilistic sensitivity analysis resulted in an incremental cost-effectiveness ratio lower than 10,000€/QALY. The screening programme increased the annual budget of the Basque Health Service by 5.2 million euros from the year 2000 onwards.
Conclusions The BC screening programme in the Basque Country proved to be cost-effective during the evaluated period and had an affordable budget impact.
These results confirm the epidemiological benefits related to the centralised screening system and support the continuation of the programme. Electronic supplementary material The online version of this article (doi:10.1186/s12885-016-2386-y) contains supplementary material, which is available to authorized users.

Country in 1995, before the screening programme began for clinically detected BC (Table S2). In situ carcinomas were considered the lowest stage at which BC could be detected. On the basis of the work by Vilaprinyó et al. (Vilaprinyo et al, 2009), we applied age- and stage-specific survival distributions to women diagnosed either clinically or by screening.
Mortality from causes other than BC was randomly assigned, depending on the woman's birth cohort, based on an empirical function. All-cause and BC-caused mortality data were obtained from the Basque mortality registry for the period 1986-2010 (Table S1). Data on the Basque female population by age and birth cohort were provided by the Basque Statistics Institute (EUSTAT). To estimate the age at death from causes other than BC, by birth cohort, we used the actuarial method that removes breast cancer as a cause of death, described by Vilaprinyo et al. (Vilaprinyo et al, 2008). Thus, each diagnosed woman was assigned two ages at death, one from BC and one from other causes, and the minimum of these two ages determined the cause and age of death.
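The competing-risks rule described above can be sketched in a few lines of Python. This is only an illustration: the function name, the use of None for a woman with no BC death time, the tie-breaking rule and the example ages are assumptions, not taken from the model.

```python
def resolve_death(age_bc_death, age_other_death):
    """Return (age_at_death, cause) as the earlier of two competing ages at death.

    age_bc_death may be None for a woman who is never assigned a BC death
    (hypothetical convention). Ties are attributed to other causes, which is
    an arbitrary choice made for this sketch.
    """
    if age_bc_death is not None and age_bc_death < age_other_death:
        return age_bc_death, "breast cancer"
    return age_other_death, "other causes"


# Illustrative ages only, not data from the study.
print(resolve_death(71.3, 84.0))   # BC death occurs first
print(resolve_death(90.2, 84.0))   # death from other causes occurs first
```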

Screening characteristics
The good quality of the programme database allowed us to calculate the exact number of women invited for the first time into the BCSPBC from 1996 through 2011: 414,041 women (Table S3). Their age distribution was also obtained from the programme database. From 1996 to 1998, during the programme implementation, the population consisted only of women invited for the first time, that is, cohorts aged 50 through 64 years. In subsequent years, only cohorts aged 50 to 51 years were invited for the first time, and the target population also included the cohorts that had previously been invited to participate in the programme. The extension of the target population from 50 to 64 years to 50 to 69 years began in 2006, with women aged 65 years continuing in the programme until age 69 (Sarriugarte, 2011).
The total number of mammograms performed in the programme was determined by the number of invited women (including early recalls) and the annual attendance rates, which were known exactly from the programme database (Table S3). Annual attendance rates were treated as independent because data on the correlation between participation in first and repeated screening rounds were not available.
Four phases were distinguished during the studied period because sensitivity, specificity and the stage distribution of screen-detected BC varied over time: (1) from 1996 to 1999, the implementation phase, when most of the women invited to the programme received their first invitation; (2) from 2000 to 2005, the prevalence phase, when the percentage of women invited for the first time was much lower than the percentage of women invited for successive mammograms; (3) from 2006 to 2008, the extension phase, when the programme was extended to women aged 65 to 69 years; (4) from 2009 to 2011, the digital phase, when the switch to digital mammography occurred.
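For readers implementing a similar model, the four phases listed above can be encoded as a simple year lookup. The sketch below uses only the year ranges stated in the text; the names PHASES and phase_for_year are hypothetical.

```python
# Year ranges as stated in the text; names are illustrative.
PHASES = [
    (1996, 1999, "implementation"),
    (2000, 2005, "prevalence"),
    (2006, 2008, "extension"),
    (2009, 2011, "digital"),
]


def phase_for_year(year):
    """Return the programme phase that a calendar year belongs to."""
    for first, last, name in PHASES:
        if first <= year <= last:
            return name
    raise ValueError(f"year {year} is outside the studied period 1996-2011")


print(phase_for_year(2007))
```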
Observed screening mammography results were used together with the number of invited women and number of screening-detected breast cancers and observed interval cancers to calculate sensitivity and specificity for each of the defined phases (Table S4).
In the model, a positive or negative screening result was assigned based on the woman's actual health status and the corresponding sensitivity and specificity of the programme.
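As a hedged illustration of the two steps above, the following Python sketch computes sensitivity and specificity from counts and then assigns a mammography result to a simulated woman. It assumes that interval cancers are counted as false negatives, which is a common convention but is not spelled out here; all function names and counts are hypothetical and do not come from Table S4.

```python
import random


def sensitivity(true_pos, false_neg):
    """Proportion of women with BC whose mammogram was positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of women without BC whose mammogram was negative."""
    return true_neg / (true_neg + false_pos)


def screening_result(has_preclinical_bc, sens, spec, rng):
    """Return True for a positive mammogram, given the true health status."""
    u = rng.random()
    if has_preclinical_bc:
        return u < sens    # true positive with probability sens
    return u >= spec       # false positive with probability 1 - spec


# Hypothetical counts for one phase (not study data).
sens = sensitivity(true_pos=80, false_neg=20)
spec = specificity(true_neg=9500, false_pos=500)
rng = random.Random(42)
print(sens, spec, screening_result(True, sens, spec, rng))
```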
Observed data were also analysed to obtain the distributions of disease stages for screening-detected cases in the different phases of the BCSPBC (implementation, prevalence, extension and digital) (Table S2). In addition, as two identical populations were created for the comparison of the screening and no-screening scenarios, the same random numbers were used to simulate the stage distribution for the clinically detected and the screening-detected cancers in the same woman, in order to estimate the advance in detection stage due to screening.
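The use of common random numbers described above can be sketched as follows: one uniform draw per woman is mapped through the cumulative stage distribution of each scenario, so the two stages differ only because the distributions differ. The stage labels other than in situ and both probability vectors are hypothetical placeholders, not the values in Table S2.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Hypothetical stage labels and probabilities, ordered from earliest to latest.
STAGES = ["in situ", "I", "II", "III", "IV"]
CLINICAL_PROBS = [0.05, 0.30, 0.40, 0.15, 0.10]
SCREEN_PROBS = [0.15, 0.50, 0.25, 0.07, 0.03]


def stage_index(u, probs):
    """Map a uniform draw u in [0, 1) to a stage index by inverse CDF."""
    cumulative = list(accumulate(probs))
    # Clamp guards against floating-point rounding in the last cumulative value.
    return min(bisect_right(cumulative, u), len(probs) - 1)


rng = random.Random(7)
u = rng.random()                       # one draw shared by both scenarios
clinical = STAGES[stage_index(u, CLINICAL_PROBS)]
screened = STAGES[stage_index(u, SCREEN_PROBS)]
print(u, clinical, screened)
```

With these placeholder distributions the screening stage is never later than the clinical stage for the same draw, because the screening cumulative distribution lies above the clinical one at every stage.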

Model calibration and validation
The model was run in the screened scenario for the whole female population invited at least once into the BCSPBC during the study period in order to reproduce the actual performance of the programme.
Three main parameters were calibrated: the time between consecutive invitations, the age distribution of preclinical phase onset and the mean duration of the preclinical phase. We obtained the best-fitting parameters to include in the final model by following the seven-step approach for calibrating models proposed by Karnon and Vanni (Karnon and Vanni, 2011).
Random search and grid search algorithms were combined, and 25 simulations were run for each candidate value. The goodness-of-fit measure applied to assess the difference between observed and estimated outcomes was the chi-square statistic. The overall chi-square statistic of each hypothesis was calculated as the sum of the chi-square statistics calculated for the analysed years. We assumed outcomes for each year to be independent and uncorrelated. Finally, we included in the model the parameter value for which the overall chi-square statistic was the lowest.
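A minimal sketch of this goodness-of-fit step is given below. It assumes the Pearson form of the statistic, with the model estimate in the denominator, which is not stated explicitly here; the function names and all numbers are hypothetical.

```python
def chi_square(observed, estimated):
    """Pearson chi-square between observed counts and model estimates."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, estimated))


def overall_chi_square(observed_by_year, estimated_by_year):
    """Sum of the yearly chi-square statistics (years treated as independent)."""
    return sum(
        chi_square(observed_by_year[year], estimated_by_year[year])
        for year in observed_by_year
    )


def best_candidate(observed_by_year, estimates_by_candidate):
    """Return the candidate parameter value with the lowest overall chi-square."""
    return min(
        estimates_by_candidate,
        key=lambda value: overall_chi_square(
            observed_by_year, estimates_by_candidate[value]
        ),
    )


# Hypothetical observed counts and model outputs for two candidate values.
observed = {1996: [100, 50], 1997: [120, 60]}
estimates = {
    2.00: {1996: [110, 55], 1997: [130, 65]},
    2.18: {1996: [101, 50], 1997: [119, 61]},
}
print(best_candidate(observed, estimates))
```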
First, we calibrated the time between invitations, considering that it was not influenced by other unobserved parameters. Initially, we used a random search algorithm with candidate values drawn from a normal distribution with a mean of 2 years and a standard deviation of 0.5. Based on these results, we continued with a grid search algorithm, running 25 simulations for each of 10 different values between 2.11 and 2.20. We included in the model the parameter value for which the overall chi-square statistic was the minimum: 2.18 years between consecutive invitations (Figure S2).
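The two-stage search described above could look roughly like the sketch below. The function toy_loss is a hypothetical stand-in for one run of the simulation model followed by the chi-square comparison, and it is built so that its minimum sits at 2.18 purely for illustration. Averaging the 25 runs, the number of random candidates and the 0.01 grid step are also assumptions.

```python
import random


def toy_loss(interval, rng):
    """Hypothetical stand-in for one model run: noisy loss, lowest near 2.18."""
    return (interval - 2.18) ** 2 + rng.gauss(0.0, 1e-5)


def mean_loss(interval, rng, n_runs=25):
    """Average the loss over 25 runs per candidate value (assumed aggregation)."""
    return sum(toy_loss(interval, rng) for _ in range(n_runs)) / n_runs


rng = random.Random(0)

# Stage 1: random search with candidates drawn from N(2, 0.5).
candidates = [rng.gauss(2.0, 0.5) for _ in range(50)]
candidates = [c for c in candidates if c > 0]          # discard impossible values
coarse_best = min(candidates, key=lambda c: mean_loss(c, rng))

# Stage 2: grid search over 10 values from 2.11 to 2.20 (step 0.01 assumed).
grid = [round(2.11 + 0.01 * i, 2) for i in range(10)]
fine_best = min(grid, key=lambda c: mean_loss(c, rng))

print(coarse_best, fine_best)
```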
Afterwards, we calibrated two factors jointly. The first was the relative risk (RR) for the incidence function. The second was a multiplier used to calibrate the mean duration of the preclinical state, whose prior estimate was 4.0. Thus, we calibrated the factor t to obtain a final mean preclinical state duration of 4t. We considered as target outputs the number of screening-detected cancers from 1996-2011, together with total cancer detection rates by age group (50- Figure S2), age-specific breast cancer incidence (Figure S3) or the number of women with a positive mammography result (Figure S4). We also confirmed that life expectancy for women from the general population and for women who died from BC was concordant with the observed data (Table S5).

Probabilistic Sensitivity Analysis
The probabilistic feature of the model is based on varying the main variables randomly at the same time [21]. Each variable is assigned a distribution covering the range of its possible values, and at the beginning of each simulation a random generator selects the value of each variable from the specified distribution. This makes it possible to examine the effect of joint uncertainty in the variables of the model. The distributions used for the main parameters varied in the probabilistic sensitivity analysis are detailed in Table S6.
Time between invitations was calibrated with the aim of reproducing the number of invitations carried out in the programme and the optimal value obtained was 2.18 years.
Therefore, a uniform distribution centred on 2.18 years and including the theoretical value of 2.00 years was used for this parameter. The same approach was applied to the mean duration of the preclinical state, for which a uniform distribution centred on the calibrated value (3.44) and including the theoretical value (4.00) was used.
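One way to build such a distribution is sketched below. It assumes the narrowest symmetric interval that is centred on the calibrated value and still reaches the theoretical value; the bounds actually used are those in Table S6 and may be wider. The helper name is hypothetical.

```python
import random


def symmetric_uniform_bounds(calibrated, theoretical):
    """Narrowest interval centred on `calibrated` that contains `theoretical`."""
    half_width = abs(calibrated - theoretical)
    return calibrated - half_width, calibrated + half_width


interval_bounds = symmetric_uniform_bounds(2.18, 2.00)   # roughly (2.00, 2.36)
duration_bounds = symmetric_uniform_bounds(3.44, 4.00)   # roughly (2.88, 4.00)

rng = random.Random(1)
interval_draw = rng.uniform(*interval_bounds)
duration_draw = rng.uniform(*duration_bounds)
print(interval_bounds, interval_draw)
print(duration_bounds, duration_draw)
```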
A Beta distribution was used for both sensitivity and specificity. Its parameters were based on the number of cases observed in the screening programme in the period 1996-2011: true positive and false negative results for sensitivity, and true negative and false positive results for specificity.
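A minimal sketch of this sampling step follows, assuming that the two counts are used directly as the Beta shape parameters (for sensitivity, true positives and false negatives). The exact parameterisation is given in Table S6, and the counts below are hypothetical.

```python
import random


def sample_proportion(successes, failures, rng):
    """Draw a proportion from Beta(successes, failures)."""
    return rng.betavariate(successes, failures)


rng = random.Random(2)
# Hypothetical counts: 800 true positives and 200 false negatives.
sens_draws = [sample_proportion(800, 200, rng) for _ in range(2000)]
mean_sens = sum(sens_draws) / len(sens_draws)
print(mean_sens)   # close to 800 / (800 + 200)
```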
Finally, a Dirichlet distribution was used for the distribution of detection stage in screen-detected cancers. The Dirichlet parameters were mainly the number of cases observed in the screening programme for each detection stage, depending on the period and the age at detection.
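For illustration, a Dirichlet draw can be obtained with the standard construction of normalising independent Gamma variates, using the stage counts as parameters. The counts below are hypothetical and the function name is an assumption.

```python
import random


def sample_dirichlet(counts, rng):
    """Draw stage probabilities from a Dirichlet with the given positive counts."""
    gammas = [rng.gammavariate(count, 1.0) for count in counts]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(3)
# Hypothetical numbers of screen-detected cases per stage for one period and age group.
stage_counts = [30, 120, 60, 15, 5]
stage_probs = sample_dirichlet(stage_counts, rng)
print(stage_probs)
```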
The cost-effectiveness plane displays the incremental cost (vertical axis) and incremental effectiveness (horizontal axis) results of 1,000 simulation runs. In addition, the acceptability curve represents the probability that breast cancer screening is cost-effective compared with no screening for varying threshold values of the cost-effectiveness ratio [21] (Figure S5). The ICER obtained in each of the 1,000 runs is compared against the different thresholds to calculate those probabilities.
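The acceptability curve can be sketched as below. Rather than comparing each ICER with the threshold directly, this sketch uses the net monetary benefit, threshold × ΔQALY − Δcost > 0, which gives the same answer whenever the QALY gain is positive and also handles runs with a negative gain. That substitution, the function name and the three example runs are assumptions for illustration.

```python
def acceptability_curve(delta_costs, delta_qalys, thresholds):
    """Share of runs in which screening is cost-effective at each threshold."""
    n_runs = len(delta_costs)
    curve = []
    for threshold in thresholds:
        favourable = sum(
            1
            for d_cost, d_qaly in zip(delta_costs, delta_qalys)
            if threshold * d_qaly - d_cost > 0   # positive net monetary benefit
        )
        curve.append(favourable / n_runs)
    return curve


# Three hypothetical runs (incremental cost in euros, incremental QALYs).
delta_costs = [30e6, 40e6, 36e6]
delta_qalys = [8000, 9000, 8500]
print(acceptability_curve(delta_costs, delta_qalys, [4000, 5000, 10000]))
```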
Variability in participation rates was not included in the main probabilistic sensitivity analysis: because the observed values were obtained from a sample of about 115,000 women each year, their variability was assumed to be very small. However, given the potential interest in the variation of this parameter, we ran the main single-cohort model for two further scenarios with lower participation rates: 50% and 30%.
Legend: Multi-cohort, 0% discount; Multi-cohort, 3% discount; Single-cohort, 0% discount; Single-cohort, 3% discount