Risk factors, prognostic factors, and nomograms for bone metastasis in patients with newly diagnosed infiltrating duct carcinoma of the breast: a population-based study

Background Breast cancer (BC) is the most common malignancy in women and the leading cause of death among female cancer patients; the most common pathological type of BC is infiltrating duct carcinoma (IDC). Some nomograms have been developed to predict bone metastasis (BM) in patients with breast cancer. However, there are no studies on diagnostic and prognostic nomograms for BM in newly diagnosed IDC patients.
Methods IDC patients with newly diagnosed BM from 2010 to 2016 in the Surveillance, Epidemiology and End Results (SEER) database were reviewed. Multivariate logistic regression analysis was used to identify risk factors for BM in patients with IDC. Univariate and multivariate Cox proportional hazards regression analyses were used to explore the prognostic factors of BM in patients with IDC. We then constructed nomograms to predict the risk and prognosis of BM for patients with IDC. The results were validated using bootstrap resampling and a retrospective cohort of 113 IDC patients with BM treated from 2015 to 2018 at the Affiliated Hospital of Chengde Medical University.
Results This study included 141,959 patients diagnosed with IDC in the SEER database, of whom 2383 had BM. The risk factors for BM in patients with IDC included sex, primary site, grade, T stage, N stage, liver metastasis, race, brain metastasis, breast cancer subtype, lung metastasis, insurance status, and marital status. The independent prognostic factors were brain metastases, race, grade, surgery, chemotherapy, age, liver metastases, breast cancer subtype, insurance status, and marital status. Calibration, receiver operating characteristic curve, and decision curve analyses showed that the nomogram for predicting the prognosis of IDC patients with BM performed well both internally and externally.
Conclusion These nomograms are expected to be a precise and personalized tool for predicting the risk and prognosis for BM in patients with IDC. This will help clinicians develop more rational and effective treatment strategies.


Background
Breast cancer (BC) is the most common malignancy and the leading cause of death among all female cancer patients [1,2]. Globally, there were approximately 2.1 million newly diagnosed female BC cases in 2018 [3]. Recently, with the advancement of early diagnosis and comprehensive treatment, the mortality rate of BC has gradually decreased, and distant metastasis has become the main cause of death for these patients [4,5]. It has been reported that the incidence of metastases in BC patients ranges from 20 to 30% [6]. More importantly, bone metastasis (BM) accounts for 50% of all distant metastases in these patients [7]. At present, most BC patients with BM receive palliative treatment [8]. Although some patients choose surgery, it is not suitable for patients with multiple metastases or poor overall health [9]. Some studies have shown that the median survival for patients with breast cancer and BM is only 24-36 months [10]. The TNM staging system is the most common tool used to predict the prognosis of cancer patients by assessing tumor size and location (T), regional lymph node metastasis (N), and distant metastasis (M) [11]. However, the TNM staging system does not sufficiently cover cancer biology or predict the outcome for all subtypes of BC [12]. In particular, the TNM staging system fails to quantify the risk for patients with distant metastatic malignancies. Therefore, an increasing number of cancer-related nomograms (statistical tools that estimate the probability of survival or of a specific outcome through a simple graphical representation) have been developed for predicting the prognosis of cancer patients [13]. Nomograms have a number of advantages over the traditional American Joint Committee on Cancer (AJCC) TNM staging system in predicting the prognosis of some malignant tumors, making them a good alternative.
It is well established that histological subtypes of breast cancer affect prognosis, and the most common pathological type of BC is infiltrating duct carcinoma (IDC) [14]. At present, no studies have focused on diagnostic and prognostic nomograms for BM in newly diagnosed IDC patients. Therefore, it is necessary to fully understand the epidemiological characteristics of IDC patients with BM to identify the risk and prognostic factors for BM. Well-developed clinical nomograms can be used to predict individual outcomes, which is beneficial to both patients and clinicians [15].
Thus, the aim of this study was to develop a predictive model by analyzing the data of the Surveillance, Epidemiology and End Results (SEER) database to determine the risk and prognosis for BM in patients with IDC.

Patients
We included patients with newly diagnosed IDC in the SEER database from 2010 to 2016 in our study. Exclusion criteria were as follows: (1) patients with two or more primary malignancies; (2) patients whose pathological type was not IDC; (3) patients missing important clinicopathological information, including laterality, primary tumor site, grade, TNM stage, estrogen receptor (ER) status, progesterone receptor (PR) status, or HER2 status. Finally, 141,959 patients diagnosed with IDC were included in the present study, of whom 2383 patients (1.68%) had BM, while 139,576 patients (98.32%) did not. In addition, we retrospectively collected data for IDC patients with BM from the Affiliated Hospital of Chengde Medical University (AHOCMU) between 2015 and 2018 as an external validation cohort for our research.

Data collection
The following variables were selected to identify the risk factors for BM in IDC patients: age at diagnosis, sex, race, tumor site, laterality, grade, T stage, N stage, liver metastasis, brain metastasis, lung metastasis, breast cancer subtype, ER status, PR status, HER2 status, insurance, and marital status. We also performed survival analyses to study the prognostic factors of IDC patients with BM. In addition to the above variables, treatment information, including surgery, radiotherapy, and chemotherapy, was included to study the prognostic factors. Patients with an overall survival (OS) of less than 1 month were excluded from the survival analyses. The primary endpoint of the survival analysis was OS, defined as the time from diagnosis to death from any cause or to the last follow-up. Risk of developing metastasis was defined as the risk of bone metastasis when the patient was first diagnosed with IDC of the breast. Survival prognosis was defined as the OS of the patient who was first diagnosed with IDC of the breast. Our study was approved by the Institutional Research Committee of AHOCMU.

Development of a diagnostic nomogram
All statistical analyses in our research were performed in R software (version 3.6.1). To identify the risk factors of BM in IDC patients, univariate analysis was performed. Continuous data were compared with independent t-tests, while the chi-square test or Fisher's exact test was used for categorical data. Variables with a P value < 0.05 in the univariate analysis were included in the multivariate logistic analysis to identify the risk factors for BM in IDC patients. Based on the independent risk factors, the rms package was used to build a nomogram and calculate the individual risk score.
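As a rough illustration of how a nomogram turns a fitted logistic model into a points scale, the sketch below uses made-up coefficients and an invented intercept; none of these numbers are estimates from this study, and the function names are hypothetical. It follows the common convention that the variable with the widest effect range spans 0 to 100 points, which may differ in detail from what the rms package produces.

```python
import math

# Hypothetical log-odds coefficients per category (reference level = 0.0).
# Illustrative values only, NOT the coefficients fitted in this study.
COEFS = {
    "liver_metastasis": {"no": 0.0, "yes": 2.0},
    "n_stage": {"N0": 0.0, "N1": 0.5, "N2": 0.8, "N3": 1.0},
}
INTERCEPT = -4.0  # hypothetical intercept


def points_per_unit(coefs):
    """Scale factor so the variable with the widest effect range spans 0-100 points."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    return 100.0 / widest


def total_points(patient, coefs):
    """Sum the points of each variable level, as read off the nomogram axes."""
    scale = points_per_unit(coefs)
    return sum(
        (coefs[var][level] - min(coefs[var].values())) * scale
        for var, level in patient.items()
    )


def predicted_risk(patient, coefs, intercept):
    """Map the linear predictor back to a probability with the logistic function."""
    lp = intercept + sum(coefs[var][level] for var, level in patient.items())
    return 1.0 / (1.0 + math.exp(-lp))


patient = {"liver_metastasis": "yes", "n_stage": "N1"}
print(total_points(patient, COEFS))               # total nomogram points
print(predicted_risk(patient, COEFS, INTERCEPT))  # predicted probability of BM
```

In this toy setup the total points are simply a rescaled linear predictor, so reading the probability from the total-points axis gives the same answer as evaluating the logistic model directly.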
Meanwhile, the receiver operating characteristic (ROC) curve was plotted, and the area under the curve (AUC) was used to show the discrimination of the nomogram. Moreover, a calibration curve and decision curve analyses (DCA) were performed to evaluate the nomogram [16].
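The two evaluation measures named above can be sketched in a few lines. This is a minimal illustration on toy data, assuming binary outcome labels (1 = BM present) and predicted probabilities; the function names are hypothetical, and the pairwise AUC loop is quadratic, so a rank-based routine would be preferable for a cohort of this size.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive case outranks a random
    negative case; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(probs, labels, threshold):
    """Decision-curve net benefit of acting on every patient whose predicted
    risk is at or above the threshold probability."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy data, not from the study.
probs = [0.9, 0.8, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0]
print(auc(probs, labels))
print(net_benefit(probs, labels, 0.25))
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the "treat all" and "treat none" strategies.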

Development of a prognostic nomogram
To identify the prognostic factors of IDC patients with BM, 2383 patients were included in the survival analyses. All BM patients were randomly divided into training (n = 1671) and validation (n = 712) cohorts at a ratio of 7:3. The allocation was completely randomized and was performed in R software. The best age cutoff values for OS were determined by X-tile software; patients were divided into high, middle, and low age groups. We then performed univariate Cox proportional hazards regression analysis to determine the OS-related variables. Afterward, significant variables in the univariate Cox proportional hazards regression analyses were incorporated into the multivariate Cox proportional hazards regression analyses to determine the independent prognostic factors for IDC patients with BM. Then, a nomogram based on the independent prognostic factors was established to predict the OS for IDC patients with BM. Additionally, time-dependent ROC curves at 1, 3, and 5 years were generated, and the corresponding time-dependent AUCs were used to show the discrimination of the nomogram. Calibration curves and DCA at 1, 3, and 5 years were established. To further verify that the nomogram performs well in an independent cohort, we validated the nomogram with data from the SEER validation cohort and the AHOCMU cohort. Time-dependent ROC curve, calibration curve, and DCA were also performed in the validation cohorts. In the present study, a two-sided P value < 0.05 was considered statistically significant.
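The way a Cox-based prognostic nomogram yields 1-, 3-, and 5-year survival probabilities can be sketched as follows. The baseline survival values and log hazard ratios below are invented for illustration and are not results of this study; the variable and function names are likewise hypothetical. The only relationship assumed is the standard Cox identity S(t | x) = S0(t)^exp(linear predictor).

```python
import math

# Hypothetical baseline survival S0(t) at 12, 36 and 60 months, and
# hypothetical log hazard ratios. Neither is taken from this study.
BASELINE_SURVIVAL = {12: 0.90, 36: 0.65, 60: 0.45}
LOG_HR = {"brain_metastasis": 0.8, "chemotherapy": -0.4}


def linear_predictor(patient):
    """Sum of log hazard ratios weighted by the patient's 0/1 indicators."""
    return sum(LOG_HR[name] * value for name, value in patient.items())


def survival_probability(patient, months):
    """Cox model prediction: S(t | x) = S0(t) ** exp(linear predictor)."""
    return BASELINE_SURVIVAL[months] ** math.exp(linear_predictor(patient))


reference = {"brain_metastasis": 0, "chemotherapy": 0}
with_brain_mets = {"brain_metastasis": 1, "chemotherapy": 0}
for months in (12, 36, 60):
    print(months,
          round(survival_probability(reference, months), 3),
          round(survival_probability(with_brain_mets, months), 3))
```

A positive log hazard ratio lowers the predicted survival at every time point and a negative one raises it, which is what the points axes of the nomogram encode graphically.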

Baseline characteristics of the study population
Based on our criteria, a total of 141,959 IDC patients from the SEER database were included, and an additional 113 IDC patients with BM were identified from the AHOCMU for this study. Additionally, 1671 patients were included in the training cohort and 712 patients were included in the validation cohort. As shown in Table 1

Risk factors for IDC patients with BM
As shown in Table 3, variables with a P value < 0.05 in the univariate analysis were included in the multivariate logistic regression analysis to determine the risk factors for BM in IDC patients. The results revealed that sex, primary site, grade, T stage, N stage, brain metastasis, lung metastasis, liver metastasis, breast cancer subtype, race, insurance status, and marital status were independent risk factors for BM in IDC patients (Table 4).

Diagnostic nomogram development and validation
A nomogram for predicting the risk of BM in IDC patients was established based on the independent predictors (Fig. 1). ROC analysis showed that the AUC of the nomogram reached 0.907, demonstrating good discriminative ability (Fig. 2a). The calibration curve showed high consistency between the observed and predicted results (Fig. 2b). In addition, the DCA indicated that the nomogram had good performance in clinical practice (Fig. 2c).

Prognostic factors for IDC patients with BM
In the training cohort, the univariate Cox proportional hazards regression analysis showed that age, race, primary site, grade, radiotherapy, surgery, chemotherapy, liver metastasis, lung metastasis, brain metastasis, breast cancer subtype, HER2 status, insurance status, and marital status were prognostic factors (all P < 0.05) (Table 5). The multivariate Cox proportional hazards regression analysis then identified ten independent prognostic factors for OS: age, race, grade, surgery, chemotherapy, brain metastasis, liver metastasis, breast cancer subtype, insurance status, and marital status (Table 5).

Prognostic nomogram development and validation
Based on the prognostic factors selected in the training cohort, a nomogram was established to predict the OS for IDC patients with BM (Fig. 3). ROC analysis showed that the AUCs of the nomogram for the 1-, 3-, and 5-year OS reached 0.775, 0.758, and 0.731 in the training cohort; 0.770, 0.773, and 0.753 in the internal validation cohort; and 0.756, 0.764, and 0.767 in the external validation cohort, respectively (Fig. 4a, b, c). The calibration curves of the nomogram showed strong agreement between actual observations and predictions (Fig. 5). Owing to data limitations, the 5-year OS calibration curve for the AHOCMU cohort could not be generated. The clinical application value of the nomogram was evaluated by DCA. As shown in Fig. 6, the nomogram shows a notable positive net benefit over a wide range of death risks, indicating good clinical utility in predicting the OS for IDC patients with BM. The external validation using the established nomogram in the AHOCMU cohort also demonstrated the high accuracy of the prediction model. Kaplan-Meier survival analysis was performed on the training cohort, internal validation cohort, and external validation cohort, and the results showed an obvious difference in survival between the high- and low-risk groups in each of the three cohorts (Fig. 7).
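For readers unfamiliar with the Kaplan-Meier estimate used for the risk-group comparison, a minimal sketch is given below. The follow-up times and event indicators are toy values, not patient data from this study, and the function name is hypothetical. In practice the estimate would be computed separately for the high- and low-risk groups and the two curves compared.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct observed death time.

    times  : follow-up in months
    events : 1 if death was observed, 0 if censored at last follow-up
    """
    survival = 1.0
    curve = []
    death_times = sorted(set(t for t, e in zip(times, events) if e == 1))
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk         # product-limit update
        curve.append((t, survival))
    return curve


# Toy follow-up data (months, event indicator), not from the study.
times = [5, 8, 8, 12, 20]
events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is why the estimate handles incomplete follow-up without discarding those patients.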

Discussion
Almost all deaths in patients with BC are caused by metastatic disease [4,5]. Common metastatic sites include bone, lung, liver, and brain, of which bone is the most common [17,18]. Unlike metastases to the lung, liver, and brain, BM is generally considered to be less fatal [19]. Nevertheless, once BC patients are diagnosed with BM, the OS decreases dramatically and the median life expectancy falls to 2-3 years [20,21]. IDC is the most common pathological type of BC; therefore, it is necessary to identify the risk and prognostic factors [23]. Other studies have also reported that involvement of more than four axillary lymph nodes at initial diagnosis, primary tumor size greater than 2 cm, estrogen receptor-positive and progesterone receptor-negative tumors, and younger age are risk factors for BM in BC patients [24,25]. This is similar to the results of our study. In our study, sex, primary site, grade, T stage, N stage, brain metastasis, lung metastasis, liver metastasis, breast cancer subtype, race, insurance status, and marital status were significant predictors for BM in IDC patients. Although Zhao et al. established a nomogram model based on gene expression to predict the risk of BM in BC patients, it is not suitable for wide clinical application and includes all types of BC, which is not conducive to individualized and accurate predictions [26]. To date, no practical model has been established to predict the risk and prognosis of BM in IDC patients.
To address this problem, we extracted, screened, and organized specific and relevant prognostic and risk factors of IDC patients with BM and established an intuitive and practical prediction model. This model is beneficial to both the clinician and the individual patient.
It is generally believed that IDC with metastases only to the bone has a better OS than IDC with both bone and visceral metastases [27]. Previous studies have also found that patients with BM alone had a median survival approximately two to three times that of patients with additional visceral metastases [28][29][30]. Lobbezoo DJ et al. compared the results of 815 patients with primary or recurrent metastatic BC and found that patients with visceral metastases and patients with multiple metastatic sites had a worse prognosis [31]. Our results showed that the presence of brain metastasis and liver metastasis had a significant negative impact on the OS, which is consistent with the above results. In addition, we found that the number of metastatic organ sites also had a significant effect on survival.

Fig. 3 Nomogram to predict the survival of IDC patients with BM. In the prognostic nomogram, values for the individual patient are located along the variable axes, and a line is drawn upward to the Points axis to determine the number of points assigned for each variable. The Total Points line is at the bottom of the nomogram, and the points for each variable are summed to give the total points. The accumulated total points can be used to predict the 1-, 3-, and 5-year survival rate of the patient

Fig. 4 ROC curves of the nomogram in predicting prognosis at the 1-, 3-, and 5-year points in the training cohort (a), internal validation cohort (b) and external validation cohort (c). The corresponding time-dependent AUCs were used to show the discrimination of the prognostic nomogram. The red, green, and blue lines represent the ROC curves for the prognostic nomogram in predicting the prognosis at the 1-, 3-, and 5-year points, respectively
Previous studies have shown that patients with four metastatic sites are 2.2 times more likely to die than patients with only one metastatic site [27]. We speculate that patients with only BM develop vital organ dysfunction later, so these patients have a higher survival rate than those with both bone and extraosseous metastases. According to previous research, the breast cancer subtype is an independent risk factor for the occurrence of metastasis, and the incidence of BM is highest in BC patients who are HR+/HER2− or HR+/HER2+ [23,32]. Our results show that patients with HR+/HER2− BC have a higher risk of BM, and that patients with Grade 2 BC are more likely to have BM than patients with Grade 3 or 4 BC, a finding that remains controversial. It is widely held that once a tumor has metastasized to one distant organ, metastasis to other organs may accelerate, which is consistent with our results [33]. According to our results, chemotherapy had a positive effect on prognosis. Contrary to what we expected, radiotherapy was not a relevant factor for prognosis. Unfortunately, we were unable to compare the effects of different chemotherapy regimens on survival rates because there was no detailed information on chemotherapy strategies in our data.

Fig. 5 Calibration curves of the prognostic nomogram. The X-axis represents the nomogram-predicted OS probability; the Y-axis represents the actual OS probability. Points along the 45-degree line indicate a perfectly calibrated model in which the predicted probabilities are identical to the actual outcomes. Vertical bars indicate 95% confidence intervals

To facilitate clinical work, we established two nomograms to predict the risk and prognosis for BM in IDC patients. Calibration, ROC curve, and DCA showed that the nomogram performs well, both internally and externally, in predicting the prognosis of IDC patients with BM.
These models show good predictive capability and credibility and can provide references for patient consultations, risk assessment, and clinical decision-making. To our knowledge, this is the first population-based model to predict the risk and prognosis of newly diagnosed BM in IDC patients. However, this study has some limitations. First, it is a retrospective study and only patients with complete information were included. Therefore, selection bias is likely to exist. Second, some patients with BM have no symptoms, causing the number of newly diagnosed patients with BM to be lower than the actual number. Third, we did not have specific information about systemic treatments, such as endocrine therapy or HER2-targeted therapy. Fourth, since the data in this study were from the SEER database, the nomogram we constructed may not be applicable to IDC patients worldwide.

Fig. 6 DCA of the nomogram for predicting the 1- (a), 3- (b) and 5-year (c) OS in the training cohort, the 1- (d), 3- (e) and 5-year (f) OS in the internal validation cohort and the 1- (g), 3- (h) and 5-year (i) OS in the external validation cohort. The x-axis is the threshold probability; the y-axis is the net benefit rate. The black horizontal line assumes that no patients die. The green oblique line assumes that all patients die. The red line represents the prognostic nomogram

Conclusion
These nomograms could be used as a supportive graphic tool in IDC to help clinicians assess the risk and prognosis of IDC with BM. Internal and external validation and application in an independent population demonstrated the satisfactory performance and clinical utility of this predictive model.

Fig. 7 Kaplan-Meier survival analysis of the signature for both the training cohort and the validation cohorts. The red curve represents the subgroup with the higher risk score, and the green curve represents the subgroup with the lower risk score. Patients with a high risk score demonstrated a worse OS than those with a low risk score in the training cohort (a, d), internal validation cohort (b, e) and external validation cohort (c, f), which suggests a strong ability to predict survival outcomes in patients with BM