A competing risk nomogram predicting cause-specific mortality in patients with lung adenosquamous carcinoma

Adenosquamous carcinoma (ASC) is an uncommon histological subtype of lung cancer. The purpose of this study was to assess the cumulative incidences of lung cancer-specific mortality (LC-SM) and other cause-specific mortality (OCSM) in lung ASC patients, and to construct a corresponding competing risk nomogram for LC-SM. Data on 2705 patients with first primary lung ASC histologically diagnosed between 2004 and 2015 were extracted from the Surveillance, Epidemiology, and End Results (SEER) database. The cumulative incidence function (CIF) was utilized to calculate the 3-year and 5-year probabilities of LC-SM and OCSM, and a competing risk model was built. Based on the model, we developed a competing risk nomogram to predict the 3-year and 5-year cumulative probabilities of LC-SM; the corresponding concordance indexes (C-indexes) and calibration curves were derived to assess model performance. To evaluate the clinical usefulness of the nomogram, decision curve analysis (DCA) was conducted. Furthermore, patients were categorized into three groups according to the tertile values of the nomogram-based scores, and their survival differences were assessed using CIF curves. The 3-year and 5-year cumulative mortalities were 49.6% and 55.8% for LC-SM and 8.2% and 11.8% for OCSM, respectively. In multivariate analysis, increasing age, male sex, no surgery, and advanced T, N and M stages were related to a significantly higher likelihood of LC-SM. The nomogram showed good calibration, and the 3-year and 5-year C-indexes for predicting the probabilities of LC-SM in the validation cohort were both 0.79, almost equal to those of the ten-fold cross validation. DCA demonstrated that using the nomogram yielded a higher net benefit when the threshold probabilities were within the ranges of 0.24–0.89 and 0.25–0.91 for 3-year and 5-year LC-SM, respectively. In both the training and validation cohorts, the high-risk group had the highest probabilities of LC-SM, followed by the medium-risk and low-risk groups (both P < 0.0001). The competing risk nomogram displayed excellent discrimination and calibration for predicting LC-SM. With the aid of this individualized predictive tool, clinicians can more expediently devise appropriate treatment protocols and follow-up schedules.


Background
Adenosquamous carcinoma (ASC), a rare histological subtype of lung cancer, accounts for less than 3% of all lung cancer cases [1][2][3][4]. In general, ASC is defined as a carcinoma with an adenocarcinoma (ADC) component and a squamous cell carcinoma (SCC) component each exceeding 10% of the entire tumour [5]. There is a substantial difference between ASC and other histological subtypes of lung cancer regarding clinicopathological characteristics. ASC patients are more likely to present with a larger tumour size, a higher frequency of lymphatic/pleural invasion, and a worse histological grade than ADC and SCC patients [3,6]. In terms of survival, ASC patients have an unfavourable prognosis compared to ADC and SCC patients [3,6,7].
The majority of ASC patients are diagnosed at an age over 60 years [3,6,7], and elderly patients tend to have a higher prevalence of comorbidities than younger patients [8]. In addition, the presence of comorbidities has been suggested to be a strong predictor of other cause-specific mortality (OCSM) in various cancers [9][10][11]. Considering the high risk of OCSM, it is essential to take OCSM into account when performing survival analysis for lung ASC. However, in the presence of competing events (such as OCSM), a traditional Cox proportional hazards model is no longer suitable: it ignores competing risks (deaths from other causes are simply treated as censored observations), which tends to overestimate the incidence of cancer-specific mortality [12]. In this context, the competing risk model is superior to the conventional Cox model because it takes competing events into consideration and can differentiate between the effects of therapy and risk factors on specific events [12,13]. To date, however, no studies have adopted a competing risk model to examine the factors influencing the prognosis of patients with lung ASC.
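The overestimation described above can be illustrated numerically. The following is a minimal sketch, not the analysis code of this study: it compares a nonparametric cumulative incidence estimate (Aalen-Johansen type) with the naive complement of a Kaplan-Meier estimate that censors competing deaths. The function names, the event coding (0 = censored, 1 = LC-SM, 2 = OCSM), and the toy data are assumptions made for illustration only.

```python
import numpy as np

def cumulative_incidence(time, event, cause, horizon):
    """Aalen-Johansen type estimate of P(T <= horizon, death from `cause`)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event)
    surv = 1.0  # probability of being free of ANY event just before time t
    cif = 0.0
    for t in np.unique(time[(event != 0) & (time <= horizon)]):
        at_risk = np.sum(time >= t)
        d_cause = np.sum((time == t) & (event == cause))
        d_any = np.sum((time == t) & (event != 0))
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_any / at_risk
    return cif

def naive_one_minus_km(time, event, cause, horizon):
    """1 - Kaplan-Meier, treating every other event type as censoring."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event)
    surv = 1.0
    for t in np.unique(time[(event == cause) & (time <= horizon)]):
        at_risk = np.sum(time >= t)
        d_cause = np.sum((time == t) & (event == cause))
        surv *= 1.0 - d_cause / at_risk
    return 1.0 - surv

# Hypothetical toy data: follow-up in years and event codes
# (0 = censored, 1 = LC-SM, 2 = OCSM).
time = [1, 2, 3, 4, 5, 6]
event = [2, 1, 2, 1, 0, 1]

cif_lcsm = cumulative_incidence(time, event, cause=1, horizon=4)
naive_lcsm = naive_one_minus_km(time, event, cause=1, horizon=4)
print(f"CIF of LC-SM at 4 years:   {cif_lcsm:.3f}")
print(f"Naive 1 - KM at 4 years:   {naive_lcsm:.3f}")
```

In this toy example the naive estimate is larger than the cumulative incidence, because patients who died of other causes are treated as if they could still die of lung cancer later.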
In addition, nomograms, which provide a visual display of a linear prediction model, can be used to calculate the individual risk probabilities of a clinical event based on the predictive variables in the graph [14]. Because of their usefulness, nomograms have been extensively applied in various cancers, such as non-small-cell lung cancer (NSCLC) [15], hepatocellular carcinoma [16], and breast cancer [17]. However, to our knowledge, there are no studies using a competing risk regression model to develop a nomogram to predict the survival of lung ASC patients.
Therefore, a competing risk analysis was performed to determine the predictive factors for lung cancer-specific mortality (LC-SM) in patients with lung ASC. We developed a nomogram to offer clinicians a quantitative means to assess the individual cumulative incidences of LC-SM to improve clinical decision making.

Data sources
Data on patients with first primary lung ASC histologically diagnosed between 2004 and 2015 were extracted from the Surveillance, Epidemiology, and End Results (SEER) registries (1975-2016 dataset). The study population comprised patients with the International Classification of Diseases for Oncology, Third Edition (ICD-O-3) site code C340-C349 and histological code 8560/3. The study time span was set from 2004 to 2015 on the basis of the first year of the American Joint Committee on Cancer (AJCC) 6th edition (2004+) and a minimum of one-year follow-up. Related demographic and clinicopathological variables were collected, including age, sex, race, marital status, tumour laterality, tumour site, histological grade, TNM stage, surgical treatment, survival time, and causes of death. We excluded patients with an unknown value for any of the variables mentioned above. Age was categorized as < 65 years and ≥ 65 years; marital status was divided into unmarried and married; histological grade was classified into grade I (well differentiated), grade II (moderately differentiated), grade III (poorly differentiated), and grade IV (undifferentiated); and the outcome status was categorized as alive, LC-SM and OCSM. Because "no radiation" and "unknown radiation" have been merged into a single "None/Unknown" category since 2016, and there was substantial heterogeneity in chemotherapy, we did not include radiation and chemotherapy as variables in the present study. The detailed screening process is displayed in Fig. 1.

Statistical analysis
Continuous variables with a normal distribution are expressed as the mean ± standard deviation (SD), and continuous variables with a skewed distribution are presented as the median (interquartile range, IQR). In the competing risk model, OCSM was regarded as a competing event for LC-SM. First, we computed the cumulative incidence function (CIF) for LC-SM and OCSM. Further subgroup analysis was carried out according to age, sex, race, marital status, tumour laterality, tumour site, histological grade, TNM stage, and surgery, and corresponding CIF curves were plotted for these variables. The significant differences in CIF values among subgroups were evaluated by Gray's test [18]. Second, for the purpose of developing a competing risk regression model for LC-SM, the data set was randomly split into a training cohort (2/3) and a validation cohort (1/3). In total, 1803 cases serving as the training cohort were employed for model development, and 902 cases serving as the validation cohort were used for model validation. Third, variables that were perceived as clinically relevant beforehand or considered significant in the univariate analysis (P < 0.1) were introduced into a stepwise competing risk regression model. Subsequently, the optimal regression model was fitted when incorporating the predictive variables selected by the stepwise regression procedure. Fourth, we calculated the subdistribution hazard ratio (SHR) of the included variables for LC-SM based on the multivariate competing risk model, and a nomogram on the basis of the coefficients from the model was developed. To evaluate the model performance, the concordance index (C-index) was utilized to estimate the predictive accuracy (discrimination), and calibration curves (agreement between the observed probability and predicted probability at a certain time point) were constructed to assess the calibration with the aid of the R package "riskRegression" [19]. 
We also performed ten-fold cross validation on the whole data set, which was randomly partitioned into ten equal-sized subsamples [20]. Finally, decision curve analysis (DCA) was conducted to assess the clinical usefulness and net benefit of the competing risk model [21]. To determine whether the nomogram could successfully distinguish high-risk from low-risk lung ASC patients, each patient's prediction score was derived according to the nomogram, and the patients were categorized into high-risk, medium-risk, and low-risk groups based on the tertile values of the risk scores. Subsequently, the corresponding CIF curves of the three groups were plotted for the training set and validation set, and the differences in CIF were assessed using Gray's test. All statistical analyses were carried out using R software version 3.5.2. A two-tailed P < 0.05 was considered statistically significant.

Results
The baseline characteristics of the whole study cohort are presented in Table 1. In general, a total of 2705 lung ASC patients were included. The median follow-up of the whole study cohort was 21 months (IQR: 8-52). In total, 1895 (70.1%) patients died throughout the whole follow-up period, of whom 1535 (81.0%) died due to lung cancer and 362 (19.0%) died due to non-lung cancer causes. The 3-year and 5-year cumulative incidences of LC-SM and OCSM by different clinicopathological characteristics are displayed in Table 1, and the corresponding CIF curves are presented in Fig. 2. Overall, the 3-year and 5-year LC-SM rates were 49.6% (CI: 47.7-51.5%) and 55.8% (CI: 53.8-57.8%), respectively, while the 3-year and 5-year OCSM rates were 8.2% (CI: 7.1-9.2%) and 11.8% (CI: 10.5-13.1%), respectively.
Subsequently, both univariate and multivariate competing risk models were adopted to evaluate the LC-SM of lung ASC patients. In univariate analysis, male sex, unmarried status, black race, main bronchus site, advanced TNM stage, advanced histological grade, and absence of surgical treatment were related to significantly higher incidences of LC-SM, whereas there were no significant differences for age and tumour laterality (Fig. 3). In multivariate analysis, age, sex, surgery, T stage, N stage, and M stage were independent predictive factors for LC-SM (Table 2). In detail, increasing age was associated with an increased probability of LC-SM. Male sex was related to a significantly higher likelihood of LC-SM (1. 26
A nomogram on the basis of the competing risk models was developed to calculate the 3-year and 5-year cumulative LC-SM probabilities (Fig. 4). For each patient, the values of the different variables are first located on the corresponding rows, and vertical lines are then drawn up to the "Points" row to obtain the corresponding scores. For instance, for a male patient, drawing a vertical line straight up to the "Points" row gives the score for sex. Similarly, this process is performed for the other variables. By adding up these scores, a total score is obtained and located on the "Total Points" row. Subsequently, a vertical line can be drawn straight down to read off the 3-year or 5-year cumulative death probabilities. For example, if the total score was 100, the corresponding 3-year and 5-year probabilities of LC-SM would be approximately 30% and 36%, respectively.
The calibration curves accompanied by C-indexes are displayed in Fig. 5. The calibration curves are close to the 45-degree diagonal line, indicating that the developed nomogram is well calibrated (good agreement between the observed mortality probability and the predicted mortality probability). Additionally, the 3-year and 5-year C-indexes for the nomogram predicting the probabilities of LC-SM were 0.83 (CI, 0.78-0.87) and 0.82 (CI, 0.73-0.90) for the training cohort, and 0.79 (CI, 0.75-0.84) and 0.79 (CI, 0.71-0.88) for the validation cohort, respectively, indicating good model discrimination. The ten-fold cross validation C-indexes are shown in Table 3. The adjusted 3-year and 5-year C-indexes were 0.81 (CI, 0.80-0.83) and 0.81 (CI, 0.80-0.83), respectively. Overall, the 3-year and 5-year C-indexes of the cross validation were almost equal to those of the training set and validation set, which indicated robust model performance.
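The way a nomogram of this kind converts a total score into a cumulative probability can be written out explicitly. The following is a sketch under stated assumptions: the fitted model is taken to be a proportional subdistribution hazards model with coefficient vector \beta and baseline cumulative incidence F_0(t), and the point-scaling convention shown is the one commonly used when drawing nomograms. The symbols are illustrative and are not values or formulas reported in this study.

```latex
% Subdistribution hazard of LC-SM for a patient with covariates x
\lambda(t \mid x) = \lambda_0(t)\,\exp\!\left(\beta^{\top} x\right)

% Predicted cumulative incidence of LC-SM at time t (t = 3 or 5 years)
F(t \mid x) = 1 - \bigl[\,1 - F_0(t)\,\bigr]^{\exp\left(\beta^{\top} x\right)}

% Points for variable j, scaled so that the most influential variable spans 0 to 100
\mathrm{points}_j(x_j) = 100 \cdot
  \frac{\beta_j x_j - \min_{u} \beta_j u}
       {\max_{k} \left( \max_{u} \beta_k u - \min_{u} \beta_k u \right)}
```

Under these assumptions the total score is an affine function of the linear predictor \beta^{\top} x, so each position on the "Total Points" row corresponds to exactly one predicted 3-year and one predicted 5-year probability.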
The outcomes of DCA are shown in Fig. 6a: the clinical net benefit gained from the competing risk model was higher than that in the hypothetical treat-none or treat-all scenarios when the threshold probabilities were within the ranges of 0.24-0.89 and 0.25-0.91 for 3-year and 5-year LC-SM, respectively. According to the tertile values (117.1 and 180.5) of the nomogram-based scores derived from the training cohort, the patients were categorized into high-risk, medium-risk, and low-risk groups in the training cohort and validation cohort. As displayed in Fig. 6b-c, the high-risk group had the highest probabilities of LC-SM, followed by the medium-risk group and the low-risk group in the training cohort and validation cohort (both P < 0.0001). Therefore, when using the nomogram as a predictive tool, clinicians could successfully discriminate among different risk groups.
Note: Adjusted C-indexes of the model were calculated using ten-fold cross validation. Abbreviations: ASC, adenosquamous carcinoma; CI, confidence interval; LC-SM, lung cancer-specific mortality
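The tertile-based grouping described above can be sketched in a few lines. The two cut-offs are the tertile values reported for the training cohort; the function name, the example scores, and the handling of scores that fall exactly on a cut-off (assigned here to the higher group) are assumptions made for illustration.

```python
import numpy as np

# Tertile cut-offs of the nomogram-based scores reported for the training cohort.
LOW_CUT, HIGH_CUT = 117.1, 180.5

def risk_group(total_points):
    """Map nomogram total points to 'low', 'medium' or 'high' risk."""
    # np.digitize gives 0 below LOW_CUT, 1 from LOW_CUT up to (not including)
    # HIGH_CUT, and 2 at or above HIGH_CUT. The boundary rule is an assumption.
    idx = np.digitize(total_points, [LOW_CUT, HIGH_CUT])
    return np.array(["low", "medium", "high"])[idx]

# Hypothetical total scores for three patients.
groups = risk_group([100.0, 150.0, 200.0]).tolist()
print(groups)
```

Applying the training-cohort cut-offs unchanged to the validation cohort, as the text describes, keeps the validation independent of any information derived from the validation data.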

Discussion
In the present study, a competing risk analysis was performed to investigate the predictive factors for LC-SM in patients with primary lung ASC from the SEER database. Among the patients who died, 1535 (81.0%) died from lung cancer and 362 (19.0%) died from non-lung cancer causes. The 5-year CIFs for LC-SM and OCSM were 55.8% and 11.8%, respectively. We constructed a nomogram, which functions as a simple and useful clinical tool, to predict the individual probabilities of LC-SM for lung ASC patients, and the nomogram was demonstrated to be clinically useful. With regard to LC-SM, T stage, N stage, and M stage were significant independent predictive factors, consistent with the eighth edition of the AJCC NSCLC staging system [22]. In addition, we identified other important predictors, namely age, sex, and surgery, which have been incorporated into our nomogram. These predictors of increased LC-SM, including advanced age, male sex, and absence of surgery, have been reported in other studies [23,24]. For example, a recent retrospective study investigating NSCLC based on the SEER database found that advanced age and male sex were related to decreased lung cancer-specific survival, and that any form of surgical resection conferred a decreased risk of LC-SM [23]. Zhou et al. analysed data from patients with radically resected stage I NSCLC in the SEER database and found that advanced age and male sex were correlated with a higher risk of cause-specific death [24]. As the AJCC staging system does not include important risk factors (including age, sex and surgery), the nomogram we developed was more discriminative and capable of providing a more accurate prognostic prediction for individual patients.
Fig. 6 a DCA. The x-axis represents the threshold probability, and the y-axis represents the net benefit. The black and red dotted oblique lines reflect the assumption that all patients die due to LC-SM, and the black horizontal dotted line reflects the assumption that no patients die due to LC-SM. The black and red solid lines represent the threshold probability range within which utilizing the nomogram to predict LC-SM gains more benefit than the hypothetical treat-all or treat-none scenarios. b-c CIF curves with the P-value of Gray's test for the training cohort and validation cohort. Abbreviations: LC-SM, lung cancer-specific mortality; CIF, cumulative incidence function; DCA, decision curve analysis
For the sake of patient counselling and clinical decision making, it is imperative to evaluate prognosis according to individual risk profiles. With the aid of a prognostic nomogram, clinicians can more expediently devise treatment protocols and follow-up strategies. Notably, competing risk nomograms have been developed for various cancers, such as nasopharyngeal carcinoma, breast cancer, gastrointestinal stromal tumours and melanoma [25][26][27][28]. However, as far as we know, this is the first study that constructed a competing risk nomogram based on a proportional subdistribution hazard model to predict the individual probabilities of LC-SM for lung ASC patients.
To assess the clinical usefulness of the nomogram, DCA was employed to determine whether nomogram-based decisions would yield a net benefit. Our findings showed that using the nomogram to predict LC-SM added more benefit than either the hypothetical treat-all or treat-none scenarios as long as the threshold probabilities were within the ranges of 0.24-0.89 and 0.25-0.91 for 3-year and 5-year LC-SM, respectively. In addition, with the assistance of the nomogram, clinicians could successfully discriminate among different risk groups, which supports better-informed clinical decisions. Therefore, the developed nomogram can be useful in clinical practice.
The major strengths of the present study are its large sample size and its use of a competing risk model for survival analysis. The SEER database, covering approximately 27.8% of the US population, offers a sufficiently large sample to investigate predictive factors and to develop a model-based prognostic nomogram. Moreover, findings derived from a population-based database are more generalizable and representative than those from single-centre studies [29]. In addition, the competing risk model takes competing events into consideration, which renders the results less biased. Notably, the variables presented in the nomogram can be easily collected from routine medical records, so clinicians can readily predict cumulative death probabilities for lung ASC patients.
There are several limitations in this study. First, some known prognostic variables, such as cigarette smoking, chemotherapy, radiation therapy, and comorbidities, were not incorporated into the model. Thus, the nomogram only functions as a reference tool for clinicians making clinical decisions, and further research incorporating these variables is warranted. Second, as the whole study population was from the US, the findings of the present study may not be generalizable to populations of other countries. Finally, although our model performs well in predicting the probabilities of LC-SM (with C-indexes of around 0.8), an external validation cohort is still necessary to further demonstrate the model accuracy.

Conclusions
In conclusion, to our knowledge, this is the first study using a competing risk model to evaluate the cumulative incidence of LC-SM for patients with lung ASC. We further developed a competing risk nomogram, which displayed excellent discrimination and calibration. With the aid of this individualized predictive tool, clinicians can more expediently devise appropriate treatment protocols and follow-up schedules.