The principal result of this study is that, after adapting the incidence and mortality rates, the Gail and Chen models were well calibrated to estimate the risk of invasive BC in a population of Spanish women who participated in a screening program, whereas the Barlow model significantly overestimated this risk. All three predictive models showed limited discrimination, despite having been used previously in the US to classify women into high- and low-risk groups [18]. In general, the Gail and Chen models performed well in subgroups of women defined by categories of risk factors.
The use of these models in our study reproduced the original results in terms of discrimination. In their original article, Chen et al. compared the discriminatory value of the Gail model against a new model that included breast density: the AUC for the 5-year prediction was 0.596 for the Gail model and 0.643 for the Chen model [11]. In general, it is considered that a prediction tool should have an AUC greater than 0.7 [22]. With adaptation to the population incidence and mortality rates, we obtained an AUC of 0.561 for the Gail model and 0.586 for the Chen model for the same 5-year period, and the confidence intervals of the AUC in our study contained the values of the original models. The original Barlow publication only reported the discriminatory value of the one-year predictive model, 0.624 [12]. In our study, this figure was 0.602, and the 95% CI (0.440, 0.765) also included the original AUC value.
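As an illustration of the discrimination measure discussed above, the following minimal sketch computes an AUC as the proportion of case/non-case pairs in which the case received the higher predicted risk, and checks whether a reported confidence interval contains a reference value. The function names and the risk values are hypothetical and are not taken from the study; only the interval (0.440, 0.765) and the value 0.624 come from the text.

```python
from itertools import product


def auc(case_scores, control_scores):
    """Share of (case, control) pairs where the case has the higher score.

    Ties count as half a concordant pair (Mann-Whitney formulation).
    """
    pairs = list(product(case_scores, control_scores))
    concordant = sum(
        1.0 if case > control else 0.5 if case == control else 0.0
        for case, control in pairs
    )
    return concordant / len(pairs)


def ci_contains(ci, value):
    """True if value lies inside the closed interval ci = (low, high)."""
    low, high = ci
    return low <= value <= high


# Hypothetical 5-year risk estimates (percent), for illustration only.
cases = [1.9, 1.2, 2.4, 0.9]
controls = [1.0, 1.5, 0.8, 1.1, 2.0]
example_auc = auc(cases, controls)

# Interval and original one-year Barlow AUC as reported in the text.
barlow_reproduced = ci_contains((0.440, 0.765), 0.624)
```

An AUC of 0.5 corresponds to no discrimination, which is why values around 0.56 to 0.60 are described here as limited.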
At the European level, the Gail model has been adapted to specific populations, for example in an Italian and a Spanish study [26, 27]. One important aspect of these studies is that they include relative risks of the risk factors adapted to their study population. Furthermore, they also modify the incidence of BC as well as mortality from other causes. The risk factors included and the methodology applied for the projection of risks at five years were exactly the same as those used in the original Gail model. Discrimination levels of the Italian and the Spanish adapted models were 0.590 and 0.544, respectively. In the Italian study, the AUC was similar to the 0.586 that Gail found in his study population, whereas in the Spanish study, the AUC was lower and similar to our estimate. Another article published in the US [28] showed that the use of relative risks specific to Hispanic and non-Hispanic populations slightly improved discrimination. In our study, the relative risks were not estimated using the study population due to small frequencies in some of the groups defined by risk factors. Although the original relative risks seem to work well for the Gail and Chen models, they may explain in part the lack of calibration of the Barlow model.
Other factors that may explain why the Barlow model did not perform well are differences in population characteristics, inclusion criteria, and projection time horizons. In contrast to our study sample, women included in the Barlow study were racially and ethnically diverse. The Barlow study sample included the incident cases detected by the first mammogram, and the model was developed as a short-term prediction model. Additionally, the model does not use BC incidence rates or mortality from other causes. All these factors may also explain why the Barlow model overestimates the risk of breast cancer in our population. A new model for assessing 5-year risk was developed later by the Breast Cancer Surveillance Consortium [29]; it would be interesting to assess it in a Spanish population in future studies.
Darabi et al. [30] evaluated the Gail model using data from a Swiss study and obtained an AUC of 0.548 (95% CI 0.527, 0.568). They also determined the improvement in prediction due to the incorporation of breast density and body mass index: the expanded model increased the AUC to 0.571 (0.545, 0.597). Our results show that the Chen and Barlow models, which also incorporate breast density, have slightly greater discriminatory power for prediction at five years than the Gail model.
We identified three published studies in which one of the studied models, the Gail model, was applied to the Spanish population. Pastor-Climente et al. [31] estimated the risk of developing BC in a 5-year period using the Gail model calculator available on the web, without adapting either incidence or mortality from other causes [32]. The sample included only women who had been diagnosed with BC. The study concluded that only 42% of women diagnosed with BC had a high risk, defined as 1.67% or greater [18]. Thus, the original Gail model showed low sensitivity, a characteristic required of any model used for decision-making in a screening context. Buron et al. [33], in a screening program context, assessed the utility of the original Gail model to predict BC in women with a prior positive mammogram. At five years, discrimination was low (AUC = 0.61) and, using the standard threshold of 1.67%, sensitivity and specificity were 46.2% and 72.1%, respectively, also too low for clinical decision-making. The third study, by Pastor Barriuso et al. [27], assessed the performance of the original and a recalibrated Gail model together with a new model fully developed by the authors. Consistent with our results, the recalibrated Gail model was well calibrated overall, although it tended to underestimate risk for women in low-risk quintiles and to overestimate it in high-risk quintiles. In our study, we observed concordance between expected and observed cases in the low-risk groups and a slight overestimation of risk in high-risk quintiles.
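The sensitivity and specificity figures quoted above depend on classifying women as high risk when their 5-year estimate reaches the 1.67% threshold. The sketch below shows that calculation under simple assumptions; the function name and the example data are hypothetical and do not reproduce any of the cited studies.

```python
HIGH_RISK_THRESHOLD = 1.67  # 5-year risk in percent, threshold cited in the text


def sensitivity_specificity(risks, had_cancer, threshold=HIGH_RISK_THRESHOLD):
    """Return (sensitivity, specificity) for a 'risk >= threshold' rule."""
    true_pos = false_neg = true_neg = false_pos = 0
    for risk, cancer in zip(risks, had_cancer):
        flagged = risk >= threshold
        if cancer and flagged:
            true_pos += 1
        elif cancer:
            false_neg += 1
        elif flagged:
            false_pos += 1
        else:
            true_neg += 1
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical cohort: estimated 5-year risk (percent) and observed outcome.
risks = [2.1, 1.2, 1.7, 0.9, 1.0, 1.9, 0.8, 1.3]
had_cancer = [True, True, True, False, False, False, False, False]
sens, spec = sensitivity_specificity(risks, had_cancer)
```

A sample that contains only diagnosed women, as in Pastor-Climente et al. [31], supports an estimate of sensitivity but not of specificity, because it has no non-cases.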
Breast density is a risk factor strongly associated with the risk of BC, as demonstrated in recent years in various studies [34, 35]. The Chen model was designed as an adaptation of the Gail model with the incorporation of breast density as a risk factor. If we compare the results obtained in our study, we see that the Chen model shows improved discrimination at five years over the Gail model, although in our sample the Chen model overestimates risk for women with high density. The Chen model used a quantitative measure of density, although it was then categorized into a variable with five categories, similar to the BI-RADS classification. Given the significant correlation between the BI-RADS and other quantitative measurement systems [36, 37], and the availability of the BI-RADS in our screening program, we considered using it as an approximation. Nevertheless, the inclusion of longitudinal measurements of breast density in the models could improve the risk estimates, as other authors have shown [38].
Another risk factor with important weight in these models is family history. The coefficient of the Barlow model, for pre-menopausal women, is similar to the Chen model’s coefficient for the variable “number of first-degree relatives with BC”. Nevertheless, the Barlow model for post-menopausal women has a lower coefficient. It is possible that part of the risk attributable to family history is explained by other variables, such as body mass index or surgical menopause, which are not included in the other models mentioned. The Gail model, on the other hand, gives a higher weight to family history in comparison to the Chen model. With the inclusion of breast density in the model, family history loses its impact in risk prediction.
One of the principal contributions of our study is the assessment of the risk models using incidence and mortality rates specific to birth cohorts in our geographic area. This procedure makes it possible to improve on the original Gail and Chen estimates, which were based on BC incidence rates and mortality rates from other causes obtained from a cross-sectional study. Given that BC incidence rates show an increasing trend, cross-sectional rates overestimate rates for past periods and underestimate those of future periods. As a result of using mortality rates by birth cohort, estimated survival in women over 50 in our study increased considerably in comparison with the US data of the original models. Therefore, a conclusion of our study was that, when local data for BC incidence and mortality from other causes were used, the Gail and Chen models provided unbiased estimates of the risk of developing BC in our population.
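To make concrete how BC incidence and mortality from other causes enter a Gail-type projection, the following simplified sketch accumulates absolute risk over consecutive age intervals with piecewise-constant hazards. It is an illustration under stated assumptions, not the implementation used in the study: the function name and all rates are hypothetical, and the baseline hazards are assumed to have been derived already (the original Gail model also scales population incidence by one minus the attributable risk, a step omitted here).

```python
import math


def absolute_risk(relative_risk, bc_hazards, other_hazards, interval_years=1.0):
    """Project the probability of developing BC over consecutive intervals.

    bc_hazards: baseline BC incidence hazard per year, one value per interval.
    other_hazards: hazard of death from other causes per year, same length.
    """
    event_free = 1.0  # probability of being alive and BC-free at interval start
    risk = 0.0
    for h_bc, h_other in zip(bc_hazards, other_hazards):
        h_woman = relative_risk * h_bc
        h_total = h_woman + h_other
        if h_total == 0.0:
            continue  # no event can occur in this interval
        p_any_event = 1.0 - math.exp(-h_total * interval_years)
        # Share of events in this interval that are BC diagnoses.
        risk += event_free * (h_woman / h_total) * p_any_event
        event_free *= 1.0 - p_any_event
    return risk


# Hypothetical annual rates, for illustration only (not from the study).
bc_rates = [0.002] * 5
cohort_mortality = [0.003] * 5           # lower, cohort-specific mortality
cross_sectional_mortality = [0.006] * 5  # higher, cross-sectional mortality
risk_cohort = absolute_risk(1.5, bc_rates, cohort_mortality)
risk_cross = absolute_risk(1.5, bc_rates, cross_sectional_mortality)
```

In this sketch, lower competing mortality leaves more women at risk in later intervals, so the projected BC risk is higher for the same incidence and relative risk.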
One limitation of this study is that the Girona and Tarragona Cancer Registries do not cover the population in the area studied. Although no differences in incidence rates were observed between Girona and Tarragona, two geographically separated areas of Catalonia, the study area may have had a lower incidence of BC. Nevertheless, a previous study found no differences in BC mortality between a geographical region that included the study population and the provinces of Girona and Tarragona [39].
Other limitations are related to the number of cancer cases and to missing values. As mentioned above, the small number of cancer cases precluded estimating specific relative risks, which have an impact on the performance of the models, along with the incidence and mortality rates. With respect to missing values, a sensitivity analysis with complete data showed that the calibration results were similar and discrimination slightly improved.
Finally, it is worth mentioning that the risk estimates are based only on the baseline characteristics reported at the first screening exam of the early detection program. Although the number of previous biopsies is an important risk factor, only a very small number of women reported having had biopsies before their first screening mammography. This is an important issue in these risk models, because the estimating equation assumes that the probability or the relative risk is maintained over time.