A nomogram model to predict death rate among non-small cell lung cancer (NSCLC) patients with surgery in surveillance, epidemiology, and end results (SEER) database
BMC Cancer volume 20, Article number: 666 (2020)
This study aimed to establish a novel nomogram prognostic model to predict death probability for non-small cell lung cancer (NSCLC) patients who received surgery.
We collected data from the Surveillance, Epidemiology, and End Results (SEER) database of the National Cancer Institute in the United States. A nomogram prognostic model was constructed to predict mortality of NSCLC patients who received surgery.
A total of 44,880 NSCLC patients who received surgery from 2004 to 2014 were included in this study. Gender, ethnicity, tumor anatomic site, histologic subtype, tumor differentiation, clinical stage, tumor size, tumor extent, lymph node stage, number of examined lymph nodes, number of positive lymph nodes, and type of surgery showed significant associations with the lung cancer related death rate (P < 0.001). Patients who received chemotherapy and radiotherapy had a significantly higher lung cancer related death rate but a significantly lower non-cancer related mortality (P < 0.001). A nomogram model was established based on multivariate models of the training data set. In the validation cohort, the unadjusted C-index was 0.73 (95% CI, 0.72–0.74), 0.71 (95% CI, 0.66–0.75) and 0.69 (95% CI, 0.68–0.70) for lung cancer related death, other cancer related death and non-cancer related death, respectively.
A prognostic nomogram model was constructed to give information about the risk of death for NSCLC patients who received surgery.
Lung cancer ranks first in both morbidity and mortality in China and globally [1, 2]. Non-small cell lung cancer (NSCLC) accounts for about 75 to 80% of lung cancer cases; thus the treatment of NSCLC is an urgent health issue worldwide.
Radical surgery is required for patients with early stage NSCLC and for some patients with locally advanced NSCLC. Survival of NSCLC patients after surgery varies greatly, and previously reported prognostic factors include age, tumor size, number of metastatic lymph nodes, clinical stage, etc. [4,5,6]. However, the roles of other factors such as ethnicity, surgical method, primary tumor location, anatomic site, histological subtype, etc. remain controversial. Therefore, studies with larger samples and more rigorous statistical methods are still needed to address this problem.
Because some early stage NSCLC patients who receive radical surgery may have relatively long-term survival, other causes of death may occur among these patients. However, previous studies have mainly focused on prognostic factors for lung cancer related death, and studies considering non-cancer related death are inadequate.
To better evaluate the prognosis of resected NSCLC patients, and thereby to help optimize treatment strategies for these patients, we estimated the probabilities of lung cancer related, other cancer related, and non-cancer related death among patients in a population-based Surveillance, Epidemiology, and End Results (SEER) cohort using an innovative and validated nomogram model.
We collected data from the SEER database of the National Cancer Institute in the United States. The data were obtained using SEER*Stat. Data items and codes were documented by the North American Association of Central Cancer Registries (NAACCR). Primary cancer histology and site were coded according to the 3rd edition of the International Classification of Diseases for Oncology (ICD-O-3).
Patients with lung tumors (site codes C34.0–C34.9) diagnosed from 2004 to 2014 were included in this study. The following histologic codes were designated as NSCLC: 8010, 8012, 8013, 8014, 8015, 8020, 8021, 8022, 8031, 8032, 8046, 8050–8052, 8070–8078, 8140–8147, 8250–8255, 8260, 8310, 8323, 8430, 8480, 8481, 8482, 8490, 8560, and 8570–8575. Patients who did not receive radical surgery or who were aged 18 years or younger were excluded. In accordance with the requirements for using the SEER database, we obtained the data use agreement. Figure 1 displays the flow chart of the patient selection procedure in this study. The SEER database conducted the follow-up for all patients, and follow-up time, survival status and survival time were all recorded. Therefore, we could investigate the follow-up time and overall survival (OS) for these patients. Records with missing data that could not be used to assess survival status were excluded before statistical analysis.
Demographic and clinical variables used in the further analysis included age, gender, ethnicity, primary tumor location, anatomic site, histological subtype, tumor extent, differentiation, clinical stage, tumor size, lymph node involvement, examined lymph nodes (ELNs), positive lymph nodes (PLNs), chemotherapy and radiotherapy. Categorical variables were grouped for clinical reasons, and the decisions regarding grouping were made before data analysis. Means, medians and ranges were reported for continuous variables, as appropriate. Frequencies and proportions were reported for categorical variables.
The primary endpoint of this study was cause-specific survival. According to the cause-of-death (COD) code, we classified the cause of death into three groups: lung cancer related, other cancer related and non-cancer related. The cumulative incidence function (CIF) was used to illustrate the death rate. The CIF was compared across groups using Gray’s test. Fine and Gray competing risks proportional hazards regression was performed to predict five- and ten-year probabilities of the three causes of death. For nomogram construction, two thirds of the patients were randomly assigned to the training data set (n = 31,415) and one third to the validation data set (n = 13,465). We used restricted cubic splines with three knots at the 10, 50, and 90% empirical quantiles to model continuous variables. A model selection technique based on the Bayesian information criterion was employed to avoid overfitting when establishing the competing risk models (eTable S1).
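The cumulative incidence function described above can be illustrated with a minimal Python sketch. This is not the authors' code (the study used R with the cmprsk package); it is a plain nonparametric estimator in which, at each follow-up time, the probability of still being alive is multiplied by the fraction of at-risk patients who die from the cause of interest. The function name, the cause codes and the toy data are assumptions made only for this example.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Nonparametric cumulative incidence for one cause under competing risks.

    times  -- follow-up time per patient (e.g. months)
    causes -- 0 if censored, otherwise an integer cause-of-death code
    Returns a list of (time, CIF) pairs, one per distinct follow-up time.
    """
    records = sorted(zip(times, causes))
    n_at_risk = len(records)
    event_free = 1.0  # probability of being alive just before the current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths_any = deaths_interest = leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(records) and records[i][0] == t:
            cause = records[i][1]
            if cause != 0:
                deaths_any += 1
                if cause == cause_of_interest:
                    deaths_interest += 1
            leaving += 1
            i += 1
        # Alive just before t, times the cause-specific hazard at t.
        cif += event_free * deaths_interest / n_at_risk
        # Kaplan-Meier step for death from any cause.
        event_free *= 1.0 - deaths_any / n_at_risk
        n_at_risk -= leaving
        curve.append((t, cif))
    return curve


# Hypothetical toy data: 1 = lung cancer, 2 = other cancer,
# 3 = non-cancer death, 0 = censored.
times = [2, 3, 3, 5, 7, 8]
causes = [1, 2, 0, 1, 3, 0]
for code, label in [(1, "lung cancer"), (2, "other cancer"), (3, "non-cancer")]:
    final_time, final_cif = cumulative_incidence(times, causes, code)[-1]
    print(f"{label}: CIF at {final_time} months = {final_cif:.3f}")
```

A useful property of this estimator is that the three cause-specific curves add up to one minus the all-cause Kaplan-Meier survival, which is why the CIF is preferred over treating competing deaths as censored.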
The performance of the nomogram, comprising its discrimination and calibration, was tested using the validation data set. Discrimination is the ability of a model to separate subject outcomes and is indicated by Harrell’s C-index [14, 15]. Calibration, which compares predicted with actual survival, was evaluated with a calibration plot. We used the validation set to compare the probability of death predicted by the final reduced model with the observed 5- and 10-year cumulative incidence of death. The predictions should fall on a 45-degree diagonal line if the model is well calibrated. In addition, bootstrapping with 1000 resamples was used for internal validation of the developed model.
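To make the notion of discrimination concrete, the following Python sketch computes Harrell's C-index for right-censored data. It is a simplified illustration, not the procedure used in the study: it handles a single event type and does not include the adaptation for competing risks. The function name and the example values are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data with a single event type.

    times       -- follow-up time per subject
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a comparable pair needs an observed event at the earlier time
        for j in range(n):
            if times[i] < times[j]:  # subject j was still event-free when i failed
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical example: four patients, the third one censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.2, 0.4, 0.7]
print(harrell_c_index(times, events, risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.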
R software (version 3.3.3; http://www.r-project.org) was used for all statistical analyses. We used the R packages cmprsk, rms and mstate for modeling and developing the nomogram. All reported significance levels were two-sided, with statistical significance set at 0.05.
A total of 44,880 NSCLC patients who received surgery from 2004 to 2014 were included in this study. Most patients were diagnosed at stage I (62%), were Caucasian (83.5%) and received lobectomy (82.9%). The median age at diagnosis was 67 years. The median follow-up time was 31 months (IQR 12–61 months), and for patients who were still alive, the median follow-up time was 42 months (IQR 17–74 months). At last follow-up, the death rate was 41.9%: 12,958 patients (28.9%) had died from lung cancer, 510 (1.1%) from other cancers, and 5357 (11.9%) from non-cancer causes. The most frequent other cancer deaths resulted from miscellaneous malignant cancer (54.5%), brain and other nervous system cancers (6.9%) and pancreatic cancer (3.5%). The most frequent non-cancer deaths resulted from diseases of the heart (28.3%), chronic obstructive pulmonary disease and associated conditions (19.8%) and cerebrovascular diseases (5.8%) (Table 1).
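The crude death proportions reported above follow directly from the counts. A short Python check, using only the numbers given in this paragraph (variable names are chosen for illustration), is:

```python
total_patients = 44880
deaths = {"lung cancer": 12958, "other cancer": 510, "non-cancer": 5357}


def percent(count, total):
    """Share of the cohort, in percent, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


all_deaths = sum(deaths.values())
print("all causes:", all_deaths, percent(all_deaths, total_patients))
for cause, count in deaths.items():
    print(cause, count, percent(count, total_patients))
```

The three cause-specific counts sum to 18,825 deaths, which matches the overall death rate of 41.9%.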
Lung cancer related, other cancer related and non-cancer related death probabilities are shown in eFigures S1, S2, S3 and S4. Age at diagnosis, gender, ethnicity, anatomic site, histologic subtype, differentiation status, clinical stage, tumor size, tumor extent, examined lymph nodes and surgery type showed significant relationships with overall survival (P < 0.001) (eTable S2). Five- and 10-year lung cancer related death probability increased with age, stage, tumor size, tumor extent, lymph node stage and number of positive lymph nodes (P < 0.001). Male patients had a higher lung cancer related death rate than female patients (P < 0.001). Ethnicity, histologic subtype, anatomic site of lung cancer, examined lymph nodes, differentiation status and surgery type showed significant relationships with lung cancer related death probability (P < 0.001). Among NSCLC patients with surgery, those who received chemotherapy and radiotherapy had significantly higher lung cancer related mortality but significantly lower non-cancer related death rates (P < 0.001) (Table 2).
Nomogram prognostic model
A nomogram model was established based on multivariate models of the training data set. The 5- or 10-year death rate can be calculated with this nomogram prognostic model (Fig. 2). Schoenfeld-type residuals of a proportional subdistribution hazard model for lung cancer related deaths are shown in eFigure S5. In the validation cohort, the unadjusted C-index was 0.73 (95% CI, 0.72–0.74), 0.71 (95% CI, 0.66–0.75) and 0.69 (95% CI, 0.68–0.70) for lung cancer related death, other cancer related death and non-cancer related death, respectively. This indicates that the models discriminate reasonably well. Figure 3 illustrates the calibration plot of the CIF. Good agreement between predicted and actual outcomes was observed, as the points lie close to the 45-degree line.
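How a nomogram of this kind turns patient characteristics into a death probability can be sketched as follows. Under a proportional subdistribution hazards (Fine and Gray) model, the predicted cumulative incidence at time t is 1 − (1 − CIF0(t))^exp(lp), where CIF0(t) is the baseline cumulative incidence and lp is the linear predictor; the nomogram simply rescales each covariate's contribution to lp onto a points axis. The coefficients, the baseline value and the variable names below are hypothetical placeholders, not the fitted values from this study.

```python
import math

# Hypothetical coefficients for illustration only; NOT the fitted model.
COEFS = {"age_per_10_years": 0.20, "male": 0.15, "stage_II": 0.60, "stage_III": 1.00}
BASELINE_CIF_5Y = 0.15  # hypothetical 5-year baseline cumulative incidence


def linear_predictor(patient):
    """Sum of coefficient times covariate value; missing covariates count as 0."""
    return sum(beta * patient.get(name, 0.0) for name, beta in COEFS.items())


def predicted_cif(baseline_cif, lp):
    """Fine-Gray prediction: 1 - (1 - baseline CIF) ** exp(lp)."""
    return 1.0 - (1.0 - baseline_cif) ** math.exp(lp)


def nomogram_points(contribution, max_contribution):
    """Rescale one covariate's contribution so the largest one maps to 100 points."""
    return 100.0 * contribution / max_contribution


# Hypothetical patient: male, stage II, 10 years above the reference age.
patient = {"age_per_10_years": 1.0, "male": 1.0, "stage_II": 1.0}
lp = linear_predictor(patient)
print("linear predictor:", round(lp, 2))
print("predicted 5-year death probability:", round(predicted_cif(BASELINE_CIF_5Y, lp), 3))
print("points for stage II:", nomogram_points(COEFS["stage_II"], COEFS["stage_III"]))
```

Reading the printed nomogram by hand performs the same steps: add the points for each variable, then read the probability under the total points.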
To our knowledge, this is the largest population-based study establishing a novel nomogram prognostic model that predicts lung cancer related, other cancer related, and non-cancer related death rates for NSCLC patients who received surgery in the SEER database.
Recent studies showed that several factors, including tumor size, lymph node metastasis, clinical stage, age, etc., were associated with long-term survival for lung cancer patients with surgery. However, the results were heterogeneous because most studies evaluating the prognosis of NSCLC had relatively short follow-up and limited sample sizes. Therefore, larger samples with more validated and rigorous statistical methods were required. The population-based SEER database allows this issue to be assessed in a larger sample with long follow-up, which can help to reduce bias. In this study, we collected a large population of 44,880 resected NSCLC patients from the SEER database.
Moreover, to minimize bias, we used a novel and validated prognostic model. The nomogram has been considered a trustworthy method for generating more accurate predictions of prognosis [16,17,18]. The performance of a nomogram comprises both discrimination and calibration, which should be assessed using a validation data set. In our study, the unadjusted C-index was 0.73 (95% CI, 0.72–0.74), 0.71 (95% CI, 0.66–0.75) and 0.69 (95% CI, 0.68–0.70) for lung cancer related death, other cancer related death and non-cancer related death in the validation cohort, respectively. This indicates that the models discriminate reasonably well. In addition, our study showed good agreement between predicted and actual outcomes, as the points lie close to the 45-degree line.
Our study showed that 5- and 10-year lung cancer related death probability increased with age, stage, tumor size, tumor extent, lymph node involvement and number of positive lymph nodes, which was consistent with previous studies [3,4,5,6]. In our study, male patients had a higher lung cancer related death rate than female patients. Several studies have demonstrated that epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKIs) can noticeably improve survival of advanced NSCLC patients with EGFR mutations [19,20,21,22]. EGFR mutation is the most common gene mutation in Asian female lung adenocarcinoma patients; therefore, the prognosis of female lung cancer patients might be better. Our study showed that patients who received radiotherapy had a significantly higher lung cancer related death rate. Radiotherapy was generally given to patients with more advanced stage or mediastinal lymph node metastasis, and these patients may have had a poor prognosis to begin with. However, the appropriate timing and indications for radiotherapy still need further investigation.
Previous studies have mainly focused on lung cancer related survival for NSCLC patients, and studies concerned with other causes of death are limited. In the SEER database, data on survival status, survival months and cause-specific death classification were available, and deaths resulting from other cancers and non-cancer causes were also recorded. Therefore, we could calculate lung cancer related, other cancer related and non-cancer related death probabilities using these data. We divided the cause of death into lung cancer related, other cancer related and non-cancer related. In our study, the most frequent non-cancer deaths resulted from diseases of the heart, chronic obstructive pulmonary disease and associated conditions, and cerebrovascular diseases. Therefore, cardiac and respiratory complications during treatment require closer monitoring.
There were also some limitations in this study. First, some variables are not recorded in the SEER database, such as time to disease progression, specific chemotherapy regimens, etc. Second, we did not use the 7th or 8th AJCC staging system in this study. We selected patients in the SEER database from 2004 to 2014, and the 6th AJCC staging system was applied to all patients during this decade. The 7th AJCC staging system had not been widely used before 2010, and the 8th AJCC staging system was applied after 2017. Stage information from 2004 to 2010 could not be accessed when using the 7th or 8th AJCC staging system. Given the huge sample size, re-classification of patients was not feasible. However, there was no significant difference between stage I to stage III patients according to the different staging systems, so this had no significant impact on the study results.
A novel prognostic nomogram model using a large population based database was constructed to predict mortality for NSCLC patients who received surgery. This validated prognostic model may be helpful to give information about the risk of death for these patients.
Availability of data and materials
Data files were downloaded directly from the SEER website.
ICD-O: International Classification of Diseases for Oncology
LCC: Large cell carcinoma
NAACCR: North American Association of Central Cancer Registries
NSCLC: Non-small cell lung cancer
SEER: Surveillance, Epidemiology, and End Results
SCC: Squamous cell carcinoma
1. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2016. CA Cancer J Clin. 2016;66:7–30.
2. Chen W, Zheng R, Baade PD, et al. Cancer statistics in China, 2015. CA Cancer J Clin. 2016;66:115–32.
3. Wood DE. National Comprehensive Cancer Network: NCCN clinical practice guidelines in oncology: non-small cell lung cancer. Thorac Surg Clin. 2018;25(2):185.
4. Liang W, Zhang L, Jiang G, et al. Development and validation of a nomogram for predicting survival in patients with resected non-small-cell lung cancer. J Clin Oncol. 2015;33(8):861–9.
5. Won YW, Joo J, Yun T, et al. A nomogram to predict brain metastasis as the first relapse in curatively resected non-small cell lung cancer patients. Lung Cancer. 2015;88(2):201–7.
6. Zhang J, Gold KA, Lin HY, et al. Relationship between tumor size and survival in non-small cell lung cancer (NSCLC): an analysis of the surveillance, epidemiology, and end results (SEER) registry. J Thorac Oncol. 2015;10(4):682–90.
7. Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) Research Data (1973-2014), National Cancer Institute, DCCPS, Surveillance Research Program, Surveillance Systems Branch, released March 2017, based on the March 2017 submission. www.seer.cancer.gov. Accessed 23 March 2017.
8. Wingo PA, Jamison PM, Hiatt RA, et al. Building the infrastructure for nationwide cancer surveillance and control--a comparison between the National Program of Cancer Registries (NPCR) and the Surveillance, Epidemiology, and End Results (SEER) program (United States). Cancer Causes Control. 2003;14:175–93.
9. Surveillance, Epidemiology, and End Results Program. Data use agreement for the 1973-2014 SEER Research Data File. https://seer.cancer.gov/data/access.html#agreement. Accessed 23 March 2017.
10. Gray RJ. A class of k-sample tests for comparing the cumulative incidence of a competing risk. Ann Stat. 1988;16:1141–54.
11. Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94:496–509.
12. Harrell FE. Regression modeling strategies: general aspects of fitting regression models. New York: Springer; 2001.
13. Iasonos A, Schrag D, Raj GV, et al. How to build and interpret a nomogram for cancer prognosis. J Clin Oncol. 2008;26:1364–70.
14. Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15:361–87.
15. Wolbers M, Koller MT, Witteman JC, et al. Prognostic models with competing risks: methods and application to coronary risk prediction. Epidemiology. 2009;20:555–61.
16. Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15:361–87.
17. Han DS, Suh YS, Kong SH, et al. Nomogram predicting long-term survival after D2 gastrectomy for gastric cancer. J Clin Oncol. 2012;30:3834–40.
18. Karakiewicz PI, Briganti A, Chun FK, et al. Multi-institutional validation of a new renal cancer-specific survival nomogram. J Clin Oncol. 2007;25:1316–22.
19. Maemondo M, Inoue A, Kobayashi K, et al. Gefitinib or chemotherapy for non-small-cell lung cancer with mutated EGFR. N Engl J Med. 2010;362:2380–8.
20. Mitsudomi T, Morita S, Yatabe Y, et al. Gefitinib versus cisplatin plus docetaxel in patients with non-small-cell lung cancer harbouring mutations of the epidermal growth factor receptor (WJTOG3405): an open label, randomised phase 3 trial. Lancet Oncol. 2010;11:121–8.
21. Zhou C, Wu YL, Chen G, et al. Erlotinib versus chemotherapy as first-line treatment for patients with advanced EGFR mutation-positive non-small-cell lung cancer (OPTIMAL, CTONG-0802): a multicentre, open-label, randomised, phase 3 study. Lancet Oncol. 2011;12:735–42.
22. Rosell R, Carcereny E, Gervais R, et al. Erlotinib versus standard chemotherapy as first-line treatment for European patients with advanced EGFR mutation-positive non-small-cell lung cancer (EURTAC): a multicentre, open-label, randomised phase 3 trial. Lancet Oncol. 2012;13:239–46.
We acknowledge the SEER*Stat team for providing patients’ information.
This study was funded by Science Foundation of Peking University Cancer Hospital (18–02); Capital Clinical Characteristics and Application Research (Z181100001718104); Beijing Excellent Talent Cultivation Subsidy Young Backbone Individual Project (2018000021469G264). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Ethics approval and consent to participate
We signed the ‘Surveillance, Epidemiology, and End Results Program Data-Use Agreement’ in accordance with the requirements for using the SEER database. Therefore, we obtained permission to use the data and could download them from the SEER database.
Consent for publication
Each author satisfies the criteria for authorship. No individual person’s data was applicable in this manuscript.
The authors declared no potential conflicts of interest.
Proportional Subdistribution Hazards Models of Death Rate.
eTable S2. Prognostic factors for overall survival by multivariable Cox regression.
eFigure S1. Lung cancer related, other cancer related and non-cancer related death rates by (A) age, (B) gender, (C) race and (D) primary tumor location.
eFigure S2. Lung cancer related, other cancer related and non-cancer related death rates by (E) anatomic sites, (F) histology subtype, (G) differentiation and (H) clinical stage.
eFigure S3. Lung cancer related, other cancer related and non-cancer related death rates by (I) tumor size, (J) tumor extent, (K) lymph node involvement and (L) examined lymph nodes.
eFigure S4. Lung cancer related, other cancer related and non-cancer related death rates by (M) positive lymph nodes, (N) surgery, (O) chemotherapy and (P) radiotherapy.
eFigure S5. Schoenfeld-type residuals of a proportional subdistribution hazard model for lung cancer related deaths.
Cite this article
Jia, B., Zheng, Q., Wang, J. et al. A nomogram model to predict death rate among non-small cell lung cancer (NSCLC) patients with surgery in surveillance, epidemiology, and end results (SEER) database. BMC Cancer 20, 666 (2020). https://doi.org/10.1186/s12885-020-07147-y