Study cohort and risk of breast cancer
Our model simulated 100,000 white women aged 40 years with no previous history of breast cancer. Each woman had an underlying risk of developing breast cancer based on a recent risk distribution estimated for US white females using a comprehensive set of genetic and other non-modifiable and modifiable breast cancer risk factors [9]. As the criteria for who is considered ‘high risk’ for screening purposes differ across guidelines, we conservatively classified women into three categories: (i) ‘true’ low risk, defined as those with an underlying risk of breast cancer less than 1.1 times the average risk in the population of 40-year-old women (that is, a relative risk (RR) lower than 1.1); (ii) ‘true’ high risk, defined as those with an RR of at least 1.1 but below 4; and (iii) ‘true’ very high risk, defined as those with an RR of 4 or higher. The RR threshold of 1.1 was chosen because it captures a broad range of factors known to increase the risk of breast cancer, including family history of breast cancer, reproductive risk factors, genetic variations and dense breasts on mammography [10]. Meanwhile, the RR threshold of 4 captures factors such as a history of chest radiation and atypical hyperplasia [11, 12]. With these RR thresholds, 1% of our hypothetical study cohort was classified as ‘true’ very high risk, 42% as ‘true’ high risk and the remaining 57% as ‘true’ low risk.
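The three-way classification above can be written as a small function. This is a minimal sketch: the function name is hypothetical, and treating the ‘high’ band as running from 1.1 up to, but not including, 4 is an assumption that follows from the definitions of the low and very high categories.

```python
def classify_true_risk(relative_risk: float) -> str:
    """Map an underlying relative risk (RR) to a 'true' risk category.

    RR is relative to the average risk among 40-year-old women.
    Thresholds (1.1 and 4) are the ones stated in the text.
    """
    if relative_risk < 1.1:
        return "low"
    if relative_risk < 4.0:
        return "high"
    return "very high"


if __name__ == "__main__":
    for rr in (0.8, 1.1, 2.5, 4.0):
        print(rr, classify_true_risk(rr))
```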
Screening strategies
We compared eight alternative screening strategies as shown in Fig. 1. The first two strategies involved no screening and annual screening for all women, respectively. The remaining six strategies were defined by combinations of risk prediction approaches (AI, PRS or family history) and screening frequencies among low-risk women aged 40–49 (no screening or biennial screening). We describe these strategies in detail below.
No screening
In strategy 1 (‘No screening’, hereafter), women were not screened at any age regardless of risk level.
Annual screening for all
In strategy 2 (‘Annual screening for all’, hereafter), all women (regardless of risk level) underwent annual mammography starting at age 40, similar to recommendations by ACOG and ACR.
Screening guided by AI
Strategies 3 and 4 involved risk stratification based on AI reading of an index mammogram. All women underwent an index mammogram at age 40, which was interpreted using AI to predict risk of breast cancer. This mammogram may or may not be part of standard screening services. Women predicted to have high risk (RR ≥ 1.1) underwent annual digital mammography starting at age 40. In strategy 3 (‘AI + no screening for low-risk’, hereafter), women predicted to have low risk were not screened, while in strategy 4 (‘AI + biennial screening for low-risk’, hereafter), they underwent biennial screening. This screening pattern continued until age 49. From age 50 onward, screening followed the existing USPSTF guideline as described below.
Screening guided by PRS
In strategies 5 (‘PRS + no screening for low-risk’, hereafter) and 6 (‘PRS + biennial screening for low-risk’, hereafter), screening pathways were the same as in strategies 3 and 4; however, risk stratification was performed using PRS instead of AI. All women underwent genetic testing at age 40 in which 76 single nucleotide polymorphisms (SNPs) known to be associated with breast cancer were genotyped [6].
Screening guided by family history
In strategies 7 and 8, screening was guided by family history (similar to existing recommendations by the USPSTF). For women aged between 40 and 49 years, the existing USPSTF recommendation to screen women without family history is only grade C (i.e., the net benefit of screening in this group is small) [11, 13]. Therefore, in strategy 7 (‘Family history + no screening for low-risk’, hereafter), women younger than age 50 without family history were not screened, while in strategy 8 (‘Family history + biennial screening for low-risk’, hereafter), they were screened biennially. The USPSTF guidelines indicate that women with family history may benefit from starting screening before age 50 [2] but do not specify the frequency of screening for these women. Given that most other screening guidelines recommend annual screening for high-risk women [2], we assumed that women with family history underwent annual mammography starting at age 40.
From age 50 onward, screening in strategies 3–8 followed existing USPSTF guidelines. Therefore, women without family history were screened biennially [11]. Furthermore, as the USPSTF does not specify screening frequency for those with family history, women with family history underwent annual mammography, as assumed for younger women with family history. In all strategies (except ‘No screening’), screening ceased at age 74.
The eight strategies thus differed in the proportion of women subjected to aggressive screening. ‘Annual screening for all’ was the most aggressive, as all women, including those at low risk, were screened annually starting at age 40. By contrast, in the remaining strategies, low-risk women younger than age 50 were either not screened or screened only biennially, while those aged 50 or older were screened biennially. While screening frequencies were the same in strategies 3, 5 and 7, and in strategies 4, 6 and 8, these strategies differed in their accuracy of risk prediction for women aged between 40 and 49, which in turn determined the proportion of women screened prior to age 50.
Model structure
We developed a hybrid decision tree/microsimulation model to estimate the costs and effectiveness of the eight screening strategies. The analysis was conducted from the health care system’s perspective. The cycle length was 1 year, and a lifetime horizon was used.
Figure 2 shows a simplified depiction of the model. The decision tree component of the model captured risk prediction and stratification at age 40 based on AI, PRS or family history. Women entering the model had an underlying ‘true’ low, high or very high risk of breast cancer. As risk factors associated with very high risk (RR ≥ 4) are likely known a priori, women with RR ≥ 4 did not require risk prediction and underwent annual screening regardless of screening strategy (except in the ‘No screening’ strategy). Depending on the risk-stratification strategy, AI, PRS or family history was used to predict the underlying risk for the remaining women; the extent to which the estimated risk category matched the underlying risk category was determined by the accuracy of each method (described below).
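The screening rules for the eight strategies, together with the very-high-risk rule just described, can be summarized as a lookup of the screening interval by strategy, age and risk status. This is an illustrative sketch, not the TreeAge implementation. The function and argument names are hypothetical, and two boundary choices are assumptions: age 50 is treated as the first year of the USPSTF-based schedule, and age 74 as the last screening age.

```python
from typing import Optional

# Interval in years for women predicted low risk at ages 40-49.
# None means no screening. Keys are strategy numbers 3-8 from the text.
LOW_RISK_INTERVAL_40_49 = {3: None, 4: 2, 5: None, 6: 2, 7: None, 8: 2}


def screening_interval(strategy: int, age: int, predicted_high: bool,
                       family_history: bool,
                       very_high: bool = False) -> Optional[int]:
    """Return the mammography interval in years, or None if not screened."""
    # Strategy 1 is 'No screening'; screening runs from age 40 to 74 (assumed inclusive).
    if strategy == 1 or not 40 <= age <= 74:
        return None
    # Strategy 2 screens everyone annually; women with RR >= 4 are always annual.
    if strategy == 2 or very_high:
        return 1
    # In strategies 7 and 8 the risk prediction is family history itself.
    if strategy in (7, 8):
        predicted_high = family_history
    if age < 50:
        return 1 if predicted_high else LOW_RISK_INTERVAL_40_49[strategy]
    # From age 50 (assumed boundary): annual with family history, else biennial.
    return 1 if family_history else 2


if __name__ == "__main__":
    print(screening_interval(3, 45, predicted_high=False, family_history=False))
    print(screening_interval(4, 45, predicted_high=False, family_history=False))
    print(screening_interval(5, 60, predicted_high=True, family_history=False))
```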
The microsimulation component, which was adapted from a previously published model [14], simulated the screening, diagnosis, disease progression and mortality from breast cancer. All women entering the microsimulation model had no tumor but could develop in situ or invasive cancer over time based on observed age-specific incidence rates; in situ cancer could further progress to invasive cancer. Invasive cancers were classified into local, regional and distant stages [14]. Women who underwent mammography screening were more likely to be diagnosed with in situ cancer. However, more aggressive mammography screening also resulted in more cancers being diagnosed in earlier (instead of more advanced) stages [14]. Women who developed invasive breast cancer faced risk of death from cancer or from other causes.
Model inputs
Inputs used in our model are presented in Table 1 and described below. Further details are provided in the Online Supplementary Materials.
Accuracy of risk prediction
The key determinant of costs and effectiveness of each screening strategy was the accuracy of risk prediction. Higher accuracy of risk prediction implied that fewer women with underlying high-risk were incorrectly predicted to be at low risk, resulting in timely diagnosis and treatment of cancer for high-risk women. It also meant that fewer low-risk women were incorrectly predicted to be at high risk, leading to reduction in screening and fewer false-positive diagnoses.
In our model, the accuracy of breast cancer risk prediction using AI and PRS was measured using the area under the receiver operating characteristic curve (AUC) obtained from published studies [6, 7]. As real-world clinical decisions will also likely utilize information on other demographic and personal risk factors (such as weight, family history and breast density) in addition to AI or PRS, we used AUC values for models that combined AI or PRS with these other risk factors. Using data from digital screening mammograms read by deep learning algorithms (AI), information on other demographic and personal risk factors and breast cancer outcomes from tumor registries, Yala et al. estimated an AUC of 0.71 for white females in the US [7]. We chose this study to obtain the AUC for AI owing to its large study sample of patients seen in the US (over 31,000 patients in the training dataset and over 3900 patients in the test set) [7]. Meanwhile, the AUC for PRS was obtained from Vachon et al., a recent, high-quality study that estimated the AUC for PRS combined with other risk factors in a large study sample primarily consisting of American women [6]. Vachon et al. estimated an AUC of 0.69 for a model that combined PRSs based on 76 SNPs with information from the Breast Cancer Surveillance Consortium (BCSC) five-year risk-prediction model [6]. We followed a previously published method to simulate distributions of RR estimated using AI or PRS from these AUC values [29, 30]. Women with an estimated RR of 1.1 or higher were then classified as high risk, while those with an estimated RR below 1.1 were classified as low risk. We note that, as the AUCs of both AI and PRS are below 1, not all ‘true’ high-risk women will be correctly classified as such.
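To illustrate how an AUC can be turned into a distribution of estimated RR, the sketch below uses one commonly used relationship, which is an assumption here and not necessarily the method of [29, 30]: if log RR is normally distributed with standard deviation σ and the disease is rare, then AUC = Φ(σ/√2), so σ = √2 · Φ⁻¹(AUC). All function names are hypothetical, and the population mean RR is fixed at 1 by construction.

```python
import math
import random
from statistics import NormalDist
from typing import List


def sigma_from_auc(auc: float) -> float:
    """SD of log RR implied by an AUC under the assumed log-normal model."""
    return math.sqrt(2.0) * NormalDist().inv_cdf(auc)


def auc_from_sigma(sigma: float) -> float:
    """Inverse of sigma_from_auc: AUC = Phi(sigma / sqrt(2))."""
    return NormalDist().cdf(sigma / math.sqrt(2.0))


def sample_estimated_rr(auc: float, n: int, seed: int = 0) -> List[float]:
    """Draw n estimated relative risks with population mean 1 (assumption)."""
    sigma = sigma_from_auc(auc)
    mu = -0.5 * sigma ** 2  # makes E[exp(X)] = 1 for X ~ N(mu, sigma^2)
    rng = random.Random(seed)
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]


if __name__ == "__main__":
    rr = sample_estimated_rr(auc=0.71, n=100_000)
    share_high = sum(r >= 1.1 for r in rr) / len(rr)
    print(f"sigma = {sigma_from_auc(0.71):.3f}")
    print(f"share with estimated RR >= 1.1: {share_high:.3f}")
```

Under this assumption, a higher AUC implies a wider spread of estimated RR. The sketch covers only the distribution of estimated RR, not how closely each woman's estimate tracks her underlying risk.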
In strategies that involved risk prediction based on family history, as women with an underlying low risk will not have a family history of breast cancer, all low-risk women will be correctly classified as such. Among high-risk women, we assumed that 37% will be correctly classified. This proportion was calculated as the share of US women with first-degree family history of breast cancer (16% [15, 16]) among high-risk women (43% of our study cohort).
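The classification probabilities for the family-history strategies follow from the two shares quoted above. In the sketch below, the constant and function names are hypothetical; the 43% figure matches the 42% ‘true’ high risk plus 1% ‘true’ very high risk reported for the cohort.

```python
# Shares taken from the text.
SHARE_WITH_FAMILY_HISTORY = 0.16  # US women with first-degree family history
SHARE_HIGH_RISK = 0.43            # high-risk women as a share of the cohort


def prob_classified_high(true_high_risk: bool) -> float:
    """Probability that family history flags a woman as high risk."""
    if not true_high_risk:
        # Low-risk women are assumed to have no family history.
        return 0.0
    return SHARE_WITH_FAMILY_HISTORY / SHARE_HIGH_RISK


if __name__ == "__main__":
    print(round(prob_classified_high(True), 2))   # 0.37
    print(prob_classified_high(False))            # 0.0
```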
Incidence and stage distribution of breast cancer and mortality risk
To estimate a woman’s likelihood of developing in situ or invasive breast cancer, we multiplied age-specific breast cancer incidence rates per 100,000 white women in the US [31] (adjusted for the increase in incidence rates due to mammography screening [14]) by the woman’s ‘true’ RR. The stage at cancer detection depended on screening frequency and sensitivity of mammography; the latter depended on patient age and was obtained from the published literature [20]. Women receiving more aggressive screening were diagnosed at earlier stages than those receiving less frequent screening. Stage distribution at diagnosis in the absence of screening was calculated based on the proportions of local, regional and distant cancers observed among white women aged below or above 50 years during 1975–1979 (when mammography screening was not widespread in the US) [17]. Meanwhile, stage distributions with annual or biennial screening were obtained from more recent estimates based on 1996–2012 Breast Cancer Surveillance Consortium data [18]. Patients diagnosed with invasive breast cancer faced a risk of breast cancer mortality for up to 20 years after diagnosis. This risk was specific to age and stage at diagnosis as well as estrogen-receptor (ER) and human epidermal growth factor 2 (HER2) status [32]. All women faced an age-specific risk of mortality from non-breast cancer causes, which was obtained by subtracting age-specific breast cancer mortality from the all-cause mortality in the 2017 US life tables [33].
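The two calculations described in this paragraph (scaling incidence by RR and deriving other-cause mortality) can be sketched as follows. Function names are hypothetical, and the numeric inputs in the usage lines are placeholders rather than values from the cited sources. Capping the probability at 1 and flooring mortality at 0 are defensive assumptions, not steps stated in the text.

```python
def annual_cancer_probability(rate_per_100k: float, relative_risk: float) -> float:
    """Individual annual probability of developing breast cancer.

    rate_per_100k: age-specific incidence per 100,000 women.
    relative_risk: the woman's 'true' RR.
    """
    return min(1.0, rate_per_100k / 100_000 * relative_risk)


def other_cause_mortality(all_cause: float, breast_cancer: float) -> float:
    """Age-specific mortality from causes other than breast cancer."""
    return max(0.0, all_cause - breast_cancer)


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print(annual_cancer_probability(rate_per_100k=150.0, relative_risk=2.0))
    print(other_cause_mortality(all_cause=0.0020, breast_cancer=0.0002))
```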
Costs
The cost of each strategy included the cost of risk prediction (index mammogram read by AI technology or genetic testing, as applicable), the cost of screening with digital mammography (if any), and the cost of breast cancer treatment determined by the stage at cancer diagnosis (treatment costs were lower for cancers detected at an earlier stage). The cost of the genetic test to determine PRSs was taken as the cost of the OncoArray test in US laboratories [22]. We assumed that patients underwent genetic counselling before and after the genetic test, and that each counselling session cost $44 [23]. While the cost of AI-based risk prediction in clinical practice is not yet available, calculations by the European Society of Radiology suggest fixed costs of €60,000 ($65,300) in addition to an annual cost of €20,000 ($21,770) for the software license [21]. Assuming equipment is amortized over 10 years, and with 8695 mammography facilities in the US [27] serving nearly two million women aged 40 years [28], the cost of AI reading of each mammogram amounts to ~$112. We varied the cost of AI reading per mammogram over a wide range (up to $500) in the sensitivity analyses.
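The structure of the per-mammogram AI cost calculation can be sketched as below. Straight-line amortization without discounting is an assumption, and the constant and function names are hypothetical. Because the text gives the number of 40-year-old women only approximately, it is left as a parameter, and the output for any particular value need not match the ~$112 reported.

```python
FIXED_COST_USD = 65_300       # one-off cost per facility (from the text)
ANNUAL_LICENSE_USD = 21_770   # yearly software license per facility (from the text)
AMORTIZATION_YEARS = 10       # from the text
FACILITIES = 8_695            # US mammography facilities (from the text)


def annual_cost_per_facility() -> float:
    """Yearly AI cost per facility with straight-line amortization (assumption)."""
    return FIXED_COST_USD / AMORTIZATION_YEARS + ANNUAL_LICENSE_USD


def cost_per_mammogram(women_per_year: float) -> float:
    """Total yearly AI cost across facilities divided by index mammograms read."""
    return annual_cost_per_facility() * FACILITIES / women_per_year


if __name__ == "__main__":
    print(annual_cost_per_facility())
    # Illustrative cohort sizes only; the exact count is not stated in the text.
    for n in (1_900_000, 2_000_000, 2_200_000):
        print(n, round(cost_per_mammogram(n), 2))
```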
The cost of a mammogram was obtained from the Centers for Medicare & Medicaid Services’ 2020 Physician Fee Schedule [24]. The cost of diagnostic work-up following a positive screening result and the cost of treatment of breast cancer were obtained from the published literature [19, 25]. All costs were estimated in 2020 US dollars and discounted at 3% per year [34].
Effectiveness
Effectiveness was measured in Quality Adjusted Life Years (QALYs), which capture a person’s life expectancy adjusted for health-related quality of life (utility). Screening entailed disutility of 0.006 QALYs for 1 week and diagnostic workup following a positive screening result involved disutility of 0.105 QALYs for 5 weeks [25]. Utilities were specific to patient age and stage of cancer [14]. For all cancer stages, utilities in the first year after breast cancer diagnosis were lower than in later years [14]. All utility values were discounted at 3% per year [34].
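Both costs and QALYs were discounted at 3% per year. A minimal sketch of that discounting, assuming values accrue once per annual cycle and that the first cycle (year 0) is undiscounted, is given below; the function names and the example stream are hypothetical.

```python
from typing import Iterable


def discount_factor(year: int, rate: float = 0.03) -> float:
    """Present-value weight for a value accrued in a given model year."""
    return 1.0 / (1.0 + rate) ** year


def discounted_total(values_by_year: Iterable[float], rate: float = 0.03) -> float:
    """Sum a yearly stream of costs or QALYs after discounting."""
    return sum(v * discount_factor(t, rate) for t, v in enumerate(values_by_year))


if __name__ == "__main__":
    # Illustrative stream: one QALY per year for three years.
    print(round(discounted_total([1.0, 1.0, 1.0]), 4))
```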
Cost effectiveness analysis
We estimated the total costs and QALYs of the eight strategies. A strategy was considered cost-effective relative to another strategy if the Incremental Cost Effectiveness Ratio (ICER), calculated as the difference in total costs between the two strategies divided by the difference in total QALYs, was lower than the conventional willingness-to-pay (WTP) threshold of $100,000 per QALY. Meanwhile, a strategy was dominated if it was both more costly and less effective than another strategy, or subject to extended dominance if it achieved fewer total QALYs than a more costly strategy at a higher incremental cost per QALY (i.e., its ICER relative to the next less costly strategy was higher than the ICER of a more effective strategy) [35].
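The ICER and the two dominance rules above can be sketched as a short routine that removes dominated and extended-dominated strategies and reports ICERs along the remaining frontier. This is an illustration of the standard procedure, not the TreeAge implementation; the function names and the strategies in the usage example are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def icer(cost_a: float, qaly_a: float, cost_b: float, qaly_b: float) -> float:
    """Incremental cost per QALY of strategy A relative to strategy B."""
    return (cost_a - cost_b) / (qaly_a - qaly_b)


def efficient_frontier(
    strategies: Dict[str, Tuple[float, float]]
) -> List[Tuple[str, Optional[float]]]:
    """Return non-dominated strategies by increasing cost with their ICERs.

    strategies maps a name to (total cost, total QALYs). The ICER of each
    strategy is relative to the previous one on the frontier (None for the first).
    """
    ordered = sorted(strategies.items(), key=lambda kv: (kv[1][0], -kv[1][1]))
    frontier: List[Tuple[str, float, float]] = []
    for name, (cost, qaly) in ordered:
        # Dominated: no more effective than a strategy costing the same or less.
        if frontier and qaly <= frontier[-1][2]:
            continue
        frontier.append((name, cost, qaly))
    i = 1
    while i < len(frontier) - 1:
        prev, cur, nxt = frontier[i - 1], frontier[i], frontier[i + 1]
        # Extended dominance: ICER vs the next cheaper strategy exceeds
        # the ICER of the next more effective strategy.
        if icer(cur[1], cur[2], prev[1], prev[2]) > icer(nxt[1], nxt[2], cur[1], cur[2]):
            del frontier[i]
            i = max(i - 1, 1)  # re-check the preceding strategy
        else:
            i += 1
    result: List[Tuple[str, Optional[float]]] = [(frontier[0][0], None)]
    for prev, cur in zip(frontier, frontier[1:]):
        result.append((cur[0], icer(cur[1], cur[2], prev[1], prev[2])))
    return result


def is_cost_effective(icer_value: float, wtp: float = 100_000) -> bool:
    """Compare an ICER with the willingness-to-pay threshold ($ per QALY)."""
    return icer_value < wtp


if __name__ == "__main__":
    # Hypothetical (cost, QALY) pairs for illustration only.
    demo = {"A": (0, 20.0), "B": (1000, 20.1), "C": (1500, 20.05),
            "E": (2000, 20.15), "D": (3000, 20.5)}
    for name, value in efficient_frontier(demo):
        print(name, value)
```

In the illustrative data, C is dominated by B, while E and then B are removed by extended dominance, leaving A and D on the frontier.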
In addition to the eight strategies examined in the main analysis, we conducted an augmented analysis that included four additional strategies. These additional strategies were similar to strategies 3–6 above, except that risk prediction was performed exclusively using AI or PRS, i.e., without considering demographic and personal risk factors. Thus, AUC values in these additional strategies were 0.69 for AI [7] and 0.63 for PRS [36] (instead of 0.71 and 0.69, respectively, in strategies 3–6).
Furthermore, we conducted several sensitivity analyses. First, we varied the values of key costs and utilities in one-way sensitivity analyses and addressed parameter uncertainty using probabilistic sensitivity analyses (PSA). Next, we varied the AUCs of AI and PRS to values 20% lower and 20% higher than those used in the main analysis.
We also examined the robustness of our findings to the choice of the RR threshold used to define estimated high risk. Following previous studies, we used alternative thresholds of 1.3 and 2 (instead of 1.1 used in the base case analysis) [37]. All analyses were performed using TreeAge Pro 2019 v2.1 [38].
Model validation
We assessed the validity of our model following the Assessment of the Validation Status of Health-Economic decision models (AdViSHE) tool [39] and guidelines of the International Society for Pharmacoeconomics and Outcomes Research [40]. First, we conducted trace analysis and compared the modelled lifetime cumulative breast cancer incidence and mortality with screening to recently observed proportions. Next, while cost-effectiveness of AI-based screening has not been examined previously, we cross-validated the estimated incremental costs, QALYs and false-positive rates (compared with no screening) against previous studies for the strategy where risk prediction is based on family history and those without family history are screened biennially starting at age 50.