A nomogram for predicting survival in patients with de novo metastatic breast cancer: a population-based study

Background: 5–10% of patients are diagnosed with metastatic breast cancer (MBC) at the initial diagnosis. This study aimed to develop a nomogram to predict the overall survival (OS) of these patients.
Methods: De novo MBC patients diagnosed in 2010–2016 were identified from the Surveillance, Epidemiology, and End Results (SEER) database. They were randomly divided into a training and a validation cohort with a ratio of 2:1. The best subsets of covariates were identified to develop a nomogram predicting OS based on the smallest Akaike Information Criterion (AIC) value in the multivariate Cox models. The discrimination and calibration of the nomogram were evaluated using the Concordance index, the area under the time-dependent receiver operating characteristic curve (AUC) and calibration curves.
Results: We included 7986 patients with de novo MBC. The median follow-up time was 36 months (range: 0–83 months). A total of 5324 patients were allocated to the training cohort and 2662 to the validation cohort. In the training cohort, age at diagnosis, race, marital status, differentiation grade, subtype, T stage, bone metastasis, brain metastasis, liver metastasis, lung metastasis, surgery and chemotherapy were selected to create the nomogram estimating the 1-, 3- and 5-year OS based on the smallest AIC value in the multivariate Cox models. The nomogram achieved a Concordance index of 0.723 (95% CI, 0.713–0.733) in the training cohort and 0.719 (95% CI, 0.705–0.734) in the validation cohort. AUC values of the nomogram indicated good specificity and sensitivity in the training and validation cohorts. Calibration curves showed a favorable consistency between the predicted and actual survival probabilities.
Conclusion: The developed nomogram reliably predicted OS in patients with de novo MBC and presented a favorable discrimination ability. While further validation is needed, this may be a useful tool in clinical practice.


Background
Breast cancer is the most common malignancy in females worldwide and ranks second among causes of tumor-related death in women [1,2]. Approximately 266,120 new cases of invasive breast cancer and 40,920 breast cancer deaths were expected to occur among US women in 2018 [1]. 5–10% of patients were diagnosed with metastatic breast cancer (MBC) at the initial diagnosis. Accurately estimating the prognosis of these patients helps greatly in clinical decision-making. However, most prognostic models were developed for early-stage breast cancer [3,4]. Thus, effective prediction models for de novo MBC patients are needed.
Breast cancer tends to be heterogeneous, characterized by diverse clinical, histopathologic and molecular features, including age at diagnosis, race, differentiation grade, molecular subtype, and site of metastasis. These characteristics were previously reported to be associated with survival of de novo MBC patients [5,6]. Chemotherapy and radiation therapy remain the mainstay for MBC patients. Primary tumor resection is not routinely recommended because MBC is considered an incurable disease [7,8]; it is only considered as a means of palliation. However, many retrospective analyses reported a survival benefit of primary tumor resection [9–12]. The factors mentioned above may interact, leading to distinct outcomes across individual patients.
A nomogram is a reliable and accurate visualization model utilizing risk factors identified in multivariate analysis; it is widely used for the prediction of survival in oncology [13,14]. In this study, we developed and validated a nomogram to predict the survival of de novo MBC patients, using a large cohort of well-characterized patients identified from the Surveillance, Epidemiology and End Results (SEER) database.

Patients
Data were obtained from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) program, which consists of 18 population-based cancer registries, for patients diagnosed between 2010 and 2016. SEER is an open-access resource for tumor-based demographic and pathological information, as well as treatment information and patient survival outcomes. SEER*Stat Version 8.3.4 (http://www.seer.cancer.gov/seerstat) was used to identify eligible patients.
Because the SEER database began collecting information on human epidermal growth factor receptor-2 (HER2) status and sites of distant metastasis in 2010, this year was used as the starting point. The inclusion criteria for MBC patients were as follows: female, year of diagnosis from 2010 to 2016, older than 18 years at diagnosis, breast cancer as the first and only malignant tumor diagnosis, histology of infiltrating duct and/or lobular carcinoma (IDC, ILC), and at least one distant site of de novo metastasis. Patients with unknown marital status, race, differentiation grade, T stage, N stage, site of metastasis, or follow-up information were excluded.
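The inclusion and exclusion criteria above can be read as a single eligibility predicate. The following Python sketch is illustrative only: the `Record` fields, the histology labels and the use of `None` for "unknown" are assumptions made for this example and do not correspond to actual SEER*Stat variable names or codes.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical, simplified patient record (not the SEER schema)."""
    sex: str
    year_of_diagnosis: int
    age: int
    first_and_only_primary: bool
    histology: str                          # e.g. "IDC", "ILC", "IDC+ILC"
    metastatic_sites: Optional[List[str]]   # None means unknown
    marital_status: Optional[str]
    race: Optional[str]
    grade: Optional[str]
    t_stage: Optional[str]
    n_stage: Optional[str]
    followup_months: Optional[int]


# Assumed labels for infiltrating duct and/or lobular carcinoma.
ELIGIBLE_HISTOLOGY = {"IDC", "ILC", "IDC+ILC"}


def is_eligible(r: Record) -> bool:
    """Apply the inclusion criteria, then exclude records with unknown fields."""
    required = (r.metastatic_sites, r.marital_status, r.race, r.grade,
                r.t_stage, r.n_stage, r.followup_months)
    if any(value is None for value in required):
        return False
    return (
        r.sex == "female"
        and 2010 <= r.year_of_diagnosis <= 2016
        and r.age > 18
        and r.first_and_only_primary
        and r.histology in ELIGIBLE_HISTOLOGY
        and len(r.metastatic_sites) >= 1    # at least one de novo distant site
    )


example = Record("female", 2012, 59, True, "IDC", ["bone"],
                 "married", "white", "III", "T2", "N1", 36)
print(is_eligible(example))                       # eligible record
print(is_eligible(replace(example, grade=None)))  # unknown grade is excluded
```

A follow-up time of 0 months is treated as known here, which matches the reported follow-up range starting at 0.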

Statistical analysis
Patient demographics, tumor characteristics and treatment information were compared using the chi-square test. Overall survival (OS) was defined as the time from breast cancer diagnosis to death from any cause. Patients in the initial cohort were allocated randomly into a training cohort and a validation cohort with a ratio of 2:1. The training cohort was used to develop the nomogram, while the validation cohort was used to validate the model. In the training cohort, the covariates included in the multivariate Cox proportional hazards models were identified by a backward stepwise method based on the smallest Akaike information criterion (AIC) value, which indicated the minimal loss of prognostic information [15,16].
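As a rough illustration of the two steps described above, the Python sketch below performs a random 2:1 split and a generic backward stepwise search that drops one covariate at a time while the AIC (2k minus twice the log-likelihood) keeps decreasing. It is not the procedure used in the study, which was run in R. The helper names, the seed and the toy log-likelihood gains are hypothetical, and the sketch assumes one parameter per covariate, whereas categorical covariates contribute one parameter per non-reference level.

```python
import random
from typing import Callable, FrozenSet, Iterable, List, Sequence, Tuple


def split_two_to_one(ids: Sequence[int], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly allocate ids to training and validation sets at a 2:1 ratio."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def backward_stepwise(candidates: Iterable[str],
                      aic_of: Callable[[FrozenSet[str]], float]) -> Tuple[FrozenSet[str], float]:
    """Start from the full model; drop covariates while that lowers the AIC."""
    current = frozenset(candidates)
    best = aic_of(current)
    while current:
        # AIC of every model obtained by removing exactly one covariate.
        trials = [(aic_of(current - {v}), v) for v in sorted(current)]
        trial_aic, dropped = min(trials)
        if trial_aic >= best:
            break  # no single removal improves the AIC
        current, best = current - {dropped}, trial_aic
    return current, best


# 7986 patients split 2:1 gives 5324 and 2662, as reported in the study.
train, valid = split_two_to_one(range(7986))
print(len(train), len(valid))

# Toy stand-in for fitting a Cox model: hypothetical log-likelihood gains.
TOY_GAIN = {"age": 5.0, "subtype": 8.0, "radiotherapy": 0.3}


def toy_aic(subset: FrozenSet[str]) -> float:
    return aic(sum(TOY_GAIN[v] for v in subset), len(subset))


selected, best_aic = backward_stepwise(TOY_GAIN, toy_aic)
print(sorted(selected), best_aic)
```

In practice `aic_of` would refit the Cox model on the training cohort for each candidate subset and return its AIC.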
The nomogram was developed from the independent risk factors using the "rms" R package. The predictive capacity of the nomogram was assessed using Harrell's C-index (the concordance statistic, or C-statistic) and the area under the time-dependent receiver operating characteristic curve (AUC), which estimate the probability of concordance between the observed and predicted OS. A bootstrapping method with 1000 resamples was used to generate the calibration curves for validation of the nomogram in the training cohort and in the validation cohort. The scores of each variable were calculated using the "nomogramEx" package in R. On the basis of the scores of each variable, the total score for each patient could be calculated.
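For readers unfamiliar with the concordance statistic, the following Python sketch computes Harrell's C-index directly from its definition: among comparable pairs (the patient with the shorter time had an observed death), it counts how often that patient also has the higher predicted risk. This is a simplified quadratic-time illustration, not the "rms" implementation; pairs with tied survival times are skipped, and the function name and sample data are made up for the example.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Fraction of comparable pairs in which the earlier death has the higher risk.

    events[i] is 1 for an observed death and 0 for a censored observation.
    Ties in predicted risk count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up data: survival in months, death indicator, predicted risk.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.1]
print(harrell_c_index(times, events, risk))  # 4 of 5 comparable pairs agree: 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.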
All analyses were performed with SPSS (version 24.0; SPSS, Inc., Chicago, IL) and R version 3.6.0 (http://www.r-project.org). Statistical significance was assumed at a two-sided p value of < 0.05.

Patient characteristics
We included 7986 patients with de novo MBC in the final analysis. The flowchart of the patient selection process is shown in Fig. 1. The median follow-up time was 36 months (range: 0–83 months). The median age at diagnosis was 59 years. Most of the patients (75.1%, 5999) were white. 50.3% (4018) of tumors were poorly differentiated or undifferentiated. HR+/HER2− was the most common (57.6%) subtype among MBC patients, followed by HR+/HER2+ (19.0%) and TNBC (triple negative breast cancer) (13.6%), while HR−/HER2+ was the least common (9.8%) subtype. The most common
The 7986 included patients were allocated randomly into the training cohort (N = 5324) and the validation cohort (N = 2662). The demographic, pathological and treatment information of the two cohorts is listed in Table 1. The distribution of these factors was balanced between the training and validation cohorts. The median OS of the training and validation cohorts was 38 months (interquartile range, 13–66 months) and 39 months (interquartile range, 12–68 months), respectively.

Nomogram construction
According to univariate analysis, age at diagnosis, race, marital status, differentiation grade, molecular subtype, T stage, bone metastasis, brain metastasis, liver metastasis, lung metastasis, surgery, radiotherapy and chemotherapy were associated with OS (p < 0.05, Table 2). The smallest AIC value occurred when we incorporated 12 factors into the multivariate Cox analysis: age at diagnosis, race, marital status, differentiation grade, molecular subtype, T stage, bone metastasis, brain metastasis, liver metastasis, lung metastasis, surgery and chemotherapy (AIC = 6606.9). Figure 2 shows the prediction of the 1-, 3- and 5-year OS probability in the nomogram. Every specific value of these factors was allocated a score on the points scale. By adding up these scores, the total score was calculated. The total score was used to estimate the 1-, 3- and 5-year survival probability for every individual patient.
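The point-scoring logic described above can be sketched in a few lines of Python. In a Cox-based nomogram, each level of a factor receives points proportional to its log hazard ratio, rescaled so that the factor with the widest effect spans 0 to 100; the total score then maps to a survival probability through S(t) = S0(t)^exp(linear predictor). All coefficients and baseline survival values below are hypothetical placeholders chosen for easy arithmetic, not estimates from this study, and only three of the twelve factors are shown.

```python
import math
from typing import Dict

# Hypothetical log hazard ratios (reference level = 0.0); NOT the study's estimates.
LOG_HR: Dict[str, Dict[str, float]] = {
    "subtype": {"HR+/HER2-": 0.0, "HR+/HER2+": -0.25, "HR-/HER2+": 0.25, "TNBC": 0.75},
    "brain_metastasis": {"no": 0.0, "yes": 0.5},
    "surgery": {"no": 0.0, "yes": -0.5},
}

# Hypothetical baseline survival S0(t) at 12, 36 and 60 months.
BASELINE_SURVIVAL: Dict[int, float] = {12: 0.8, 36: 0.5, 60: 0.3}

# The factor with the widest range of effects defines the 0-100 points scale.
MAX_RANGE = max(max(lv.values()) - min(lv.values()) for lv in LOG_HR.values())
# Sum of each factor's lowest coefficient, used to map points back to the
# linear predictor.
OFFSET = sum(min(lv.values()) for lv in LOG_HR.values())


def points_for(variable: str, level: str) -> float:
    """Points for one level: 0 for the most favorable level of that factor."""
    levels = LOG_HR[variable]
    return 100.0 * (levels[level] - min(levels.values())) / MAX_RANGE


def total_points(patient: Dict[str, str]) -> float:
    """Sum the points over all factors; the patient must specify every factor."""
    return sum(points_for(variable, patient[variable]) for variable in LOG_HR)


def survival_from_points(points: float, months: int) -> float:
    """Convert total points to a survival probability at the given time."""
    linear_predictor = points * MAX_RANGE / 100.0 + OFFSET
    return BASELINE_SURVIVAL[months] ** math.exp(linear_predictor)


patient = {"subtype": "TNBC", "brain_metastasis": "yes", "surgery": "no"}
score = total_points(patient)  # 100 + 50 + 50 = 200 points
print(score, round(survival_from_points(score, 36), 3))
```

Because the total score is a linear rescaling of the Cox linear predictor, reading survival off the points axis gives the same answer as evaluating the fitted model directly.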

Discussion
The survival of patients with de novo MBC is difficult to predict because of the lack of prediction models for these patients. In this study, we developed a nomogram to visualize survival of de novo MBC patients identified from the SEER database. This model was validated, and the performance was evaluated. Calibration plots showed an optimal agreement between the observed risks and   nomogram to predict the risk of developing relapsed disease [20]; studies by S. R. Li et al. and Z. C. Xiong et al. combined the de novo MBC patients and those with relapsed disease [18,19]. However, many studies have shown that women with de novo MBC represent a group that is distinct from that of women with relapsed breast cancer [21–23]. Patients with de novo MBC usually have better survival than those whose metastatic disease developed from regional disease. One hypothesis explaining the better outcome of de novo MBC compared with recurrent MBC is the use of adjuvant systemic therapy in patients with relapsed disease. Due to the selection of more resistant or aggressive clones during adjuvant therapy, the metastatic disease of recurrent MBC becomes more resistant to therapy. Thus, recurrent MBC patients should not be mixed together with de novo MBC patients. In our study, we only included de novo MBC patients; to our knowledge, this is the first nomogram to predict survival of patients with de novo MBC.
MBC is a heterogeneous disease. Many factors affect the prognosis and the therapeutic efficacy of drugs. The molecular subtype is a vital prognostic factor and serves as the cornerstone of treatment [6,24,25]. According to the expression status of ER, PR and HER2, breast cancer can be divided into four subtypes: HR+/HER2−, HR−/HER2+, HR+/HER2+ and TNBC, which is characterized by the absence of ER, PR and HER2. In our analysis, HR+/HER2− was the most common (57.6%) subtype among MBC patients, followed by HR+/HER2+ (19.0%) and TNBC (13.6%), while HR−/HER2+ was the least common (9.8%) subtype.
In the nomogram, molecular subtype played a major role in the scoring system. The TNBC subtype yielded the highest score, consistent with previous reports [5,24,26]. The site of distant metastasis was reported to be correlated with the survival of MBC patients. Patients with bone metastasis showed the best prognosis and those with brain metastasis showed the worst prognosis [5,27–29]. The score distribution of metastasis in the nomogram showed consistent results. It has also been reported that among MBC patients, molecular subtype correlated tightly with the preferred metastatic site [5,27,30]. Even among patients with metastasis to the same site, molecular subtype showed a significant prognostic role. Age at diagnosis, marital status and differentiation grade also had an impact on survival. In our analysis, we combined all these prognostic factors to construct the nomogram, in order to predict the survival of a specific patient with de novo MBC accurately and identify patients with favorable prognosis. Those at a low risk of mortality should be given aggressive multidisciplinary therapy.
MBC is considered incurable. Systemic therapy remains the mainstay of therapy [7]. Over the past 2 decades, survival of MBC patients has improved dramatically due to the development of targeted therapy and palliative care [31–33]. In our analysis, the 1-, 3-, and 5-year OS rates were 74.5, 45.3, and 28.2%, respectively. However, the prognostic role of primary tumor resection has not been determined. In this study, we found that MBC patients benefited from surgery of the primary tumor. This finding was in agreement with conclusions reported in other retrospective studies [9,34,35]. Because of the selection bias inherent in retrospective studies, a protective role of surgery could not be directly concluded. Prospective randomized clinical trials have investigated the role of primary tumor resection in MBC patients and reached contradictory conclusions [36,37]. These results indicated that primary tumor resection did improve the survival of a subset of patients, but it remains to be determined who should receive primary tumor resection and when the surgery should be performed.
This study has some limitations. Firstly, it was a retrospective study and was subject to all the inherent biases associated with this type of study design. Secondly, some prognostic factors were not included in the SEER database, including the number of metastatic lesions, use of endocrine therapy and use of targeted therapy. Thirdly, the nomogram in our study was validated in the same population, and such validation of model performance could be biased. Therefore, the predictive effect of the nomogram needs to be assessed carefully in other cohorts.

Conclusion
The developed nomogram reliably predicted OS in patients with de novo MBC and presented a favorable discrimination ability. Using this model, the role of primary tumor surgery and other significant prognostic factors in MBC patients could be estimated. This may help guide surgical decision making in clinical practice, although the findings require additional validation.

Availability of data and materials
These data were publicly available for use in accordance with a limited use agreement for SEER research data: Surveillance, Epidemiology, and End Results (SEER) Program (https://seer.cancer.gov) SEER*Stat Database.

Ethics approval and consent to participate
All procedures performed in studies involving human participants were in accordance with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. We signed the 'Surveillance, Epidemiology, and End Results Program Data Use Agreement' in accordance with the requirements for using the SEER database. Approval was waived by the local ethics committee, as SEER data are publicly available and de-identified.