Patient inclusion
Patients diagnosed with ECS between 2004 and 2015 were retrieved from the Surveillance, Epidemiology, and End Results (SEER) database (SEER*Stat version 8.3.8). For data collection, we restricted the primary site to C54.1 under the International Classification of Diseases for Oncology, third edition (ICD-O-3), and selected only malignant cancers with known age. In total, 99,177 records were collected.
The inclusion criteria were: (1) diagnosis of ECS between 2004 and 2015; (2) a histologic diagnosis of ECS (ICD-O-3: 8930 to 8999); (3) age of 18 years or older at diagnosis; and (4) regional lymph nodes resected and examined after surgery. Patients were excluded if they had inadequate information on race, tumor size, tumor extension, seventh-edition AJCC stage, or LNs (including examined LNs and positive LNs), or if information on survival months or cause of death was absent. Based on these criteria, a total of 715 patients were included; the data processing flowchart is presented in Fig. 1. The patients were then randomly assigned to the training cohort and the validation cohort at a ratio of 7:3.
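The random 7:3 assignment can be illustrated with a minimal sketch. The function name, the seed, and the use of integer patient identifiers are assumptions made for illustration only; the original sampling was presumably done in R or SPSS, and the exact cohort sizes depend on how the split is rounded (the study reports 216 patients in the validation cohort).

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2020):
    """Randomly partition patient IDs into training and validation sets.

    The seed is a hypothetical value; it only makes the split reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers 0..714 standing in for the 715 included patients.
train_ids, validation_ids = split_cohort(range(715))
```

Every patient lands in exactly one cohort, so the two lists are disjoint and together cover the full sample.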
Characteristics
The following clinical characteristics were collected from the SEER database: year of diagnosis, age, race, metastatic status, histologic grade, tumor size, cause of death, peritoneal cytology status, the seventh edition of the AJCC staging system, the total number of lymph nodes retrieved, the number of metastatic lymph nodes, survival time, and survival status. The original staging information for ECS in the SEER database follows the seventh edition of the AJCC staging system, which we converted to the 2009 FIGO staging system for this study. PLNN is the number of positive lymph nodes. LNR is the ratio of the number of positive LNs to the total number of resected LNs. LODDS is defined as the logarithm of the ratio of the number of positive LNs to the number of negative LNs.
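The three lymph node measures defined above can be computed as in the following sketch. Two details are assumptions that the text does not state: the natural logarithm is used, and a correction of 0.5 is added to both counts (a common convention that keeps LODDS finite when either count is zero). The function name is hypothetical.

```python
import math


def lymph_node_metrics(positive, examined, correction=0.5):
    """Return PLNN, LNR and LODDS for one patient.

    Assumptions (not stated in the text): natural log, and a 0.5
    correction added to numerator and denominator of the odds.
    """
    if examined <= 0 or not 0 <= positive <= examined:
        raise ValueError("need 0 <= positive <= examined and examined > 0")
    negative = examined - positive
    lnr = positive / examined
    lodds = math.log((positive + correction) / (negative + correction))
    return {"PLNN": positive, "LNR": lnr, "LODDS": lodds}


# Example: 1 positive node among 29 examined nodes.
example = lymph_node_metrics(positive=1, examined=29)
```

Unlike LNR, LODDS still separates patients with no positive nodes according to how many nodes were examined, because the negative count enters the denominator.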
The primary endpoint was overall survival (OS), calculated from the date of diagnosis to the date of death from any cause. Optimal cutoff values were determined using X-tile software. Based on these cutoff values, PLNN, LNR, and LODDS were converted into categorical variables. Tumor size was divided into ≤58 mm and >58 mm groups. PLNN was classified into two groups, namely PLNN1 (= 0) and PLNN2 (> 0). LNR was divided into two categories, namely LNR1 (≤ 0.03448276) and LNR2 (> 0.03448276). LODDS was divided into two subgroups, namely LODDS1 (LODDS ≤ −0.9199705) and LODDS2 (LODDS > −0.9199705).
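The dichotomization described above amounts to simple threshold comparisons. The sketch below uses the cutoff values reported in the text; the function and constant names are hypothetical.

```python
# Cutoffs as reported in the text (determined there with X-tile).
TUMOR_SIZE_CUTOFF_MM = 58
LNR_CUTOFF = 0.03448276
LODDS_CUTOFF = -0.9199705


def categorize(tumor_size_mm, plnn, lnr, lodds):
    """Map continuous measurements to the two-level groups used in the study."""
    return {
        "tumor_size": "<=58 mm" if tumor_size_mm <= TUMOR_SIZE_CUTOFF_MM else ">58 mm",
        "PLNN": "PLNN1" if plnn == 0 else "PLNN2",
        "LNR": "LNR1" if lnr <= LNR_CUTOFF else "LNR2",
        "LODDS": "LODDS1" if lodds <= LODDS_CUTOFF else "LODDS2",
    }


# Hypothetical patient: 60 mm tumor, 2 positive nodes, LNR 0.10, LODDS -0.5.
groups = categorize(tumor_size_mm=60, plnn=2, lnr=0.10, lodds=-0.5)
```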
We obtained approval to access the SEER database of the National Cancer Institute in the United States under reference number 20256-Nov2019.
Statistical analysis
Development of the model
Associations with OS were evaluated by univariable analysis using the Kaplan–Meier method, and the log-rank test was used to assess statistically significant differences among groups. To predict 1-, 3-, and 5-year OS, a multivariate Cox proportional hazards model was fitted, which included the predictors that were relevant in univariate analysis (P < 0.1) (Table 2). The nomogram was generated from the multivariate analysis using R software. We assessed the predictive performance of the nomogram using the concordance index (C-index), the area under the receiver operating characteristic (ROC) curve (AUC), the Akaike information criterion (AIC), and calibration plots (comparing the survival probability predicted by the nomogram with the value observed by Kaplan–Meier analysis). A smaller AIC value indicated a better model for predicting outcome. Backward stepwise selection was performed to determine independent covariates [12,13,14,15]. The variables entered into the model were age, tumor size, 2009 FIGO stage, LODDS, and peritoneal cytology. Variables were eliminated from the model if their removal improved the overall quality of the model (as measured by AIC). Additionally, based on the total nomogram score of each patient in the training cohort, all patients were divided into three prognostic groups (low-, intermediate-, and high-risk) with similar numbers of patients to establish a risk classification system. Kaplan–Meier curves and the log-rank test were used to illustrate and compare the OS of patients in the different risk groups.
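To make the C-index concrete, the following is a minimal pure-Python sketch of Harrell's concordance index for right-censored data. It is an illustration only and not the routine used in the study, which relied on R; the function name and the toy data are hypothetical. A pair of patients is comparable when the one with the shorter follow-up had an observed event, and it is concordant when that patient also has the higher predicted risk.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    times: follow-up times; events: 1 if death observed, 0 if censored;
    risk_scores: higher score means higher predicted risk.
    Pairs with tied follow-up times are skipped; tied scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier member of a pair must have an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: risk decreases as survival time increases, so ranking is perfect.
c_perfect = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect discrimination.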
Validation of the model
The nomogram was validated using the validation cohort of 216 patients. A bootstrap resampling method (1000 repetitions) was used for external validation to obtain relatively unbiased estimates. For each of the 1000 bootstrap samples, the model was refitted and tested against the observed sample to estimate the predictive accuracy and bias [6, 12, 13].
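The refit-and-test procedure described above resembles the usual bootstrap optimism correction, which can be sketched generically as follows. This is an assumption about the procedure rather than a reproduction of it: `fit` and `score` are hypothetical callables standing in for the Cox model fit and the accuracy measure (for example the C-index), and the toy data at the end exist only to show the call pattern.

```python
import random
from statistics import mean


def bootstrap_optimism(data, fit, score, n_boot=1000, seed=0):
    """Estimate optimism (bias) of an accuracy measure by bootstrap resampling.

    For each resample, the model is refitted on the resample and scored both
    on the resample and on the original data; the mean difference is the bias.
    """
    rng = random.Random(seed)
    apparent = score(fit(data), data)
    differences = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # sampling with replacement
        model = fit(sample)
        differences.append(score(model, sample) - score(model, data))
    bias = mean(differences)
    return {"apparent": apparent, "optimism": bias, "corrected": apparent - bias}


# Toy stand-ins (hypothetical): the "model" is the mean outcome, and the
# score is the negative mean squared error of that constant prediction.
def fit_mean(rows):
    return mean(y for _, y in rows)


def neg_mse(model, rows):
    return -mean((y - model) ** 2 for _, y in rows)


toy = [(x, 2.0 * x + (1 if x % 2 else -1)) for x in range(20)]
result = bootstrap_optimism(toy, fit_mean, neg_mse, n_boot=200)
```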
Additionally, decision curve analysis (DCA) was used to identify the threshold probability range of the nomogram, which was compared with that of the 2009 FIGO staging system. The predictive efficiency of PLNN, LNR, and LODDS was also compared using the C-index, AIC, and AUC [12,13,14,15].
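The quantity plotted in a decision curve is the net benefit at each threshold probability. The sketch below shows the standard net benefit formula under a simplifying assumption: the outcome is treated as binary at a fixed time horizon and censoring is ignored, whereas a survival DCA would estimate the event rates with Kaplan–Meier methods. Function names and the toy data are hypothetical.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    Net benefit = TP/n - FP/n * threshold / (1 - threshold).
    Simplification: binary outcome at a fixed horizon, no censoring.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcome)
    pairs = list(zip(predicted_risk, outcome))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y)
    false_pos = sum(1 for p, y in pairs if p >= threshold and not y)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcome, threshold):
    """Reference strategy in which every patient is treated."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data: the model separates events from non-events perfectly.
nb_model = net_benefit([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0], threshold=0.5)
nb_all = net_benefit_treat_all([1, 1, 0, 0], threshold=0.5)
```

A model is considered useful over the range of thresholds where its net benefit exceeds both the treat-all strategy and the treat-none strategy (net benefit of zero).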
Descriptive statistics are reported as mean ± standard deviation (SD) for continuous variables and as counts for categorical variables. A chi-square test was used for the analysis of all categorical data. The Kruskal–Wallis H test or Wilcoxon test was used for the analysis of continuous variables. Bonferroni-adjusted significance tests were applied for pairwise comparisons. The Kaplan–Meier method and the log-rank test were used to construct and compare the survival curves, respectively. Statistical analysis was carried out with SPSS (Statistical Package for the Social Sciences) for Windows, version 22, and R version 3.6.3 (http://www.r-project.org). A p-value < 0.1 was chosen as the criterion for removing a variable from the multivariate Cox proportional hazards model, and p < 0.05 was considered significant for all other tests.
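For readers unfamiliar with how the survival curves are constructed, the Kaplan–Meier product-limit estimator can be sketched in a few lines. This is an illustrative implementation with hypothetical names and toy data, not the SPSS or R routine used in the analysis.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times: follow-up times; events: 1 if death observed, 0 if censored.
    At each event time t, survival is multiplied by (1 - deaths / at_risk).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Toy data: four patients, the second one censored at time 2.
km_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Censored patients contribute to the risk set up to their last follow-up time but never lower the curve themselves.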