A modified subclassification to evaluate the survival of patients with N3 gastric cancer: an international database study

Background: The eighth TNM classification for gastric cancer categorizes N3 as N3a and N3b in the final pathologic stage. The cutoff for N3a/N3b is defined as 15 metastatic lymph nodes, but the rationale for this cutoff remains unclear. This study aimed to determine the optimal N3a/N3b cutoff and evaluate its prognostic significance.
Methods: An international database was constructed by combining data from patients with N3 gastric cancer and complete five-year follow-up data from the Surveillance, Epidemiology, and End Results program database (n = 1833) and the Fujian Medical University Union Hospital database (n = 920) (total n = 2753). A log-rank test was performed to determine the optimal N3a/N3b cutoff, and its prognostic significance was confirmed in a two-step multivariate analysis and compared to that of the eighth TNM.
Results: A cut-point analysis performed at each metastatic lymph node number identified the greatest survival difference between N3a and N3b at 13 metastatic lymph nodes (χ² = 157.671, P = 3.65 × 10^−36). In patients with 14–15 metastatic lymph nodes, prognoses were significantly worse than those in patients with 7–13 metastatic lymph nodes (P < 0.001) but similar to those in patients with > 15 metastatic lymph nodes (P = 0.078). Therefore, patients with 14–15 metastatic lymph nodes were incorporated into a modified N3b classification. In the two-step multivariate analysis, the eighth N3 classification fell out of the model, while the modified N3 classification remained intact (HR 1.51, P < 0.001). Further analyses demonstrated that the modified TNM classification had superior homogeneity, discriminatory ability, and gradient monotonicity compared to the eighth TNM classification.
Conclusions: For improved prognostic stratification, we recommend adjusting the cutoff for subclassification of N3 gastric cancer to 13 metastatic lymph nodes.
Electronic supplementary material The online version of this article (10.1186/s12885-018-5187-7) contains supplementary material, which is available to authorized users.


Background
Two major classification systems are used for gastric cancer staging: the Japanese Classification of Gastric Cancer (JCGC) and the Union for International Cancer Control/American Joint Committee on Cancer (UICC/AJCC) TNM classification system. Over recent decades, the accuracy of gastric cancer staging has steadily improved, and N3 staging has accordingly undergone several revisions. Based on the number of metastatic lymph nodes (MLNs), the fifth UICC/AJCC TNM classification defined N3 as > 15 MLNs [1]. In contrast, the early JCGC defined N3 as metastases to Group 3 lymph nodes (LNs), according to the location of the MLNs relative to that of the primary tumor [2,3]. In 2010, the 14th JCGC was unified with the seventh UICC/AJCC classification system, in which the definition of N3 was modified to ≥ 7 MLNs, and N3 was divided into N3a (7–15 MLNs) and N3b (> 15 MLNs). However, the N3 subclasses (N3a and N3b) were not distinguished in the final pathological stage [4,5].
The seventh UICC/AJCC classification for LN metastases has been described as reliable for predicting prognoses in gastric cancer in many studies. However, the N3 classification provided in this edition is controversial [6,7]. Sano et al. found that the prognoses of patients with stage N3a and N3b from 15 different countries were distinct [8]. Hence, the eighth TNM classification detailed N3 as N3a and N3b in the final pathological stage of the disease [9].
The cutoff used to distinguish N3a/N3b in the current TNM classification was derived from a German retrospective study that was performed 20 years ago [10]. The study sample was ethnically homogeneous and small; therefore, the rationale for adopting 15 MLNs as the cutoff between the two groups remains unclear. Determining an accurate and reasonable N stage is important when planning a treatment strategy, determining a prognosis, evaluating the results of treatment and exchanging information [11]. This study was designed to determine the optimal cutoff for distinguishing N3a/N3b and to evaluate its prognostic significance using a newly created combined international dataset.

Patients
An international gastric cancer dataset including Eastern and Western populations was established by combining the Surveillance, Epidemiology, and End Results (SEER) database (http://seer.cancer.gov/) with the Fujian Medical University Union Hospital (FMUUH) database. This study was a retrospective analysis of 1833 patients (SEER) plus 920 patients (FMUUH) with N3 gastric cancer who underwent a gastrectomy between January 1988 and December 2008 and between January 1995 and December 2011, respectively.
The clinicopathological features obtained for this study included age, gender, ethnicity, histology, tumor size, tumor-node-metastasis (TNM) stage, type of surgery, the number of LNs examined, and the number of MLNs. The patients were classified into age groups of < 65 and ≥ 65 years old based on the WHO definition of "elderly" [12]. Ethnicity was classified as White, Asian, Black, Hispanic and Native American. The tumors were divided by size into tumors that were ≤ 60 mm and > 60 mm in diameter. The type of surgery included proximal/distal gastrectomy and total gastrectomy. Tumor staging was determined based on the eighth edition of the UICC/AJCC TNM classification [9].
Patients in FMUUH were followed up every three months for two years following surgery and then every six months for the next 3-5 years. The majority of patients routinely underwent laboratory tests, chest radiography, abdominopelvic ultrasonography or computed tomography, and annual gastroscopy. Overall survival refers to the period from the day of the operation to the date of death or last follow-up (SEER: December 2013 and FMUUH: December 2016). All survivors were followed up for more than five years.
Statistical analysis
SPSS 18.0 (SPSS Inc., Chicago, IL) and STATA 12.0 statistical software were used to analyze the data. To compare the clinicopathological characteristics between patients in the SEER and FMUUH databases, the χ² test was used for categorical variables and the Mann-Whitney U test for unpaired continuous variables. To determine the best cutoff for distinguishing N3a/N3b, we evaluated prognostic stratification at each MLN count using the magnitude of the log-rank test χ² statistic [13]. The cutoff that provided the greatest actuarial survival difference between the resulting subgroups was verified with X-tile software [14]. Survival curves were constructed according to the Kaplan-Meier method, and a log-rank test was used to determine whether significant differences were present among survival curves. We used a two-step multivariate analysis to evaluate the validity of the modified N3 classification (mN3) [15]. In the first step, the prognostic factors identified in the univariate analysis were entered into the multivariate analysis together with the eighth N3 staging criteria but without the mN3 criteria; in the second step, the eighth N3 and mN3 staging criteria were included simultaneously. Finally, to compare the prognostic performance of the eighth TNM and modified TNM (mTNM) systems, we performed a likelihood ratio χ² test to evaluate homogeneity within the two TNM classifications [16]. The discriminatory ability and gradient monotonicity of the two classifications were estimated using a linear trend χ² test [16]. Additionally, the two TNM classifications were compared using the Akaike information criterion (AIC); a smaller AIC score indicates a model that is more appropriate for evaluating prognosis [16]. In all analyses, differences with P values < 0.05 were considered statistically significant.
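The cut-point scan described above can be illustrated with a minimal sketch. It is not the SPSS/STATA/X-tile workflow used in the study: the function names (`logrank_chi2`, `scan_cutoffs`) and the eight-patient toy dataset are hypothetical, and the code simply evaluates the standard two-group log-rank χ² statistic at each candidate cutoff, treating "MLN count ≤ cutoff" as the N3a-like group.

```python
from typing import Iterable, List, Sequence, Tuple


def logrank_chi2(times: Sequence[float], events: Sequence[int],
                 in_group1: Sequence[bool]) -> float:
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    observed1 = expected1 = variance = 0.0
    # Iterate over the distinct times at which at least one death occurred.
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i in range(len(times)) if times[i] >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_group1[i])
        dying = [i for i in at_risk if times[i] == t and events[i]]
        d = len(dying)
        d1 = sum(1 for i in dying if in_group1[i])
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:  # hypergeometric variance term
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (observed1 - expected1) ** 2 / variance if variance > 0 else 0.0


def scan_cutoffs(mln: Sequence[int], times: Sequence[float],
                 events: Sequence[int],
                 candidates: Iterable[int]) -> List[Tuple[int, float]]:
    """Return (cutoff, chi2) pairs; group 1 is 'MLN count <= cutoff'."""
    return [(c, logrank_chi2(times, events, [m <= c for m in mln]))
            for c in candidates]


# Hypothetical toy data (NOT study data): MLN counts, survival in months,
# and death indicators (1 = died, 0 = censored).
toy_mln = [8, 10, 12, 13, 14, 16, 18, 20]
toy_months = [40, 50, 60, 70, 5, 10, 15, 20]
toy_died = [1, 1, 1, 1, 1, 1, 1, 1]

scores = scan_cutoffs(toy_mln, toy_months, toy_died, range(8, 19))
best_cutoff, best_chi2 = max(scores, key=lambda pair: pair[1])
print(best_cutoff, round(best_chi2, 3))
```

Selecting the cutoff with the largest χ² is a maximally selected statistic, so the associated P value is optimistic unless corrected; this is one reason an independent check (X-tile in this study) is useful.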

Comparison of characteristics of patients in the SEER and FMUUH databases
In the SEER database, 1833 patients met the screening criteria. Data from the FMUUH gastric cancer database were obtained using the same methods, which yielded 920 patients (Additional file 1: Figure S1). In all, the two databases included 2753 N3 patients. The median follow-up period for all patients was 19 months (range 1-304 months). The median follow-up periods in patients in the SEER and FMUUH databases were 16 months (range 1-304 months) and 25 months (range 1-177 months), respectively.
The relevant patient and tumor characteristics obtained from the two databases and from the combined dataset are presented in Table 1. Significant differences were identified in mean age, gender frequency, the proportions of each ethnicity, tumor histology, median tumor size, pT and overall staging, the type of gastrectomy performed and the median number of LNs retrieved. No significant differences were found in the proportions of pT2 tumors, N3 stage, or stage pIIIA and pIIIB tumors. The median number of MLNs in all patients was 14 (range 7-90), with no significant difference between patients in the SEER and FMUUH databases (P = 0.469). The median number of LNs examined in all patients was 26. In patients in the SEER and FMUUH databases, the median number of LNs examined was 24 (range 16-90) and 30 (range 16-80), respectively (P < 0.001).

Two-step multivariate analysis of overall survival in N3 patients
A univariate analysis and a two-step multivariate analysis of the included N3 patients were performed to further evaluate the validity of the mN3 classification (Table 3). In the univariate analysis, age, ethnicity, tumor size, pT classification, and the eighth N3 and mN3 classifications were significantly correlated with survival. In the first step of the multivariate analysis, age, ethnicity, tumor size, pT and the eighth N3 classification were independent prognostic factors. However, when both the eighth N3 and the mN3 classifications were entered in the second step of the multivariate analysis, only the mN3 classification remained significant (HR 1.51, P < 0.001), while the eighth N3 classification was no longer significant (HR 1.13, P = 0.083).

Comparisons between the eighth TNM and mTNM classification systems
Using mN3 staging, we modified the eighth TNM system. Based on the eighth TNM system, there were 46, 84, 1313 and 1310 patients with stage IIB, IIIA, IIIB and IIIC gastric cancer, respectively, and the five-year survival rates in these patients were 50, 45, 27 and 15%, respectively. Based on the mTNM classification, there were 42, 72, 1094 and 1545 patients with stage mIIB, mIIIA, mIIIB and mIIIC disease, respectively, and the five-year survival rates in these patients were 55, 46, 29 and 15%, respectively. Figure 3 shows the overall survival curves of the included patients with N3 gastric cancer categorized according to the eighth TNM and mTNM classifications.
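As a small illustration of how the modified cutoff reassigns patients, the sketch below encodes only the N3a/N3b boundary (eighth edition: N3a = 7–15 MLNs; modified: N3a = 7–13 MLNs) and checks the stage counts reported above for internal consistency. The names `n3_subclass`, `EIGHTH_N3A_MAX` and `MODIFIED_N3A_MAX` are hypothetical, and the full T/N/M stage-grouping table is not reproduced here.

```python
EIGHTH_N3A_MAX = 15    # eighth TNM: N3a = 7-15 MLNs, N3b = >15 MLNs
MODIFIED_N3A_MAX = 13  # proposed mN3: N3a = 7-13 MLNs, N3b = >=14 MLNs


def n3_subclass(mln_count: int, n3a_max: int) -> str:
    """Assign N3a or N3b from the metastatic lymph node (MLN) count."""
    if mln_count < 7:
        raise ValueError("N3 disease requires at least 7 metastatic lymph nodes")
    return "N3a" if mln_count <= n3a_max else "N3b"


# Only patients with 14 or 15 MLNs change subclass under the modified cutoff.
for mln in (13, 14, 15, 16):
    print(mln, n3_subclass(mln, EIGHTH_N3A_MAX), n3_subclass(mln, MODIFIED_N3A_MAX))

# Stage counts as reported in the text (eighth TNM vs mTNM).
eighth_counts = {"IIB": 46, "IIIA": 84, "IIIB": 1313, "IIIC": 1310}
modified_counts = {"IIB": 42, "IIIA": 72, "IIIB": 1094, "IIIC": 1545}

# Net change per stage; upstaged patients leave IIB-IIIB and enter IIIC.
shift = {stage: modified_counts[stage] - eighth_counts[stage]
         for stage in eighth_counts}
print(shift, sum(eighth_counts.values()), sum(modified_counts.values()))
```

Both groupings account for all 2753 patients, and the net gain in stage IIIC equals the combined loss from the three lower stages, as expected when patients are only reclassified from N3a to N3b.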
The performance of the eighth TNM and the mTNM systems was evaluated using the linear trend χ², the likelihood ratio χ², and the AIC, as presented in Table 4. Homogeneity was better in the mTNM system than in the eighth TNM system (likelihood ratio χ² score, 102.796 vs 76.671), as were discriminatory ability and the monotonicity of the gradients (linear trend χ² score, 97.225 vs 74.252). Furthermore, the mTNM classification had a smaller AIC score (32,195.19 vs 32,351.28), indicating better prognostic stratification.
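For readers unfamiliar with the criterion, the AIC comparison can be sketched as follows. The helper `aic` uses the standard definition AIC = 2k − 2 ln L; for a Cox model, L would be the partial likelihood and k the number of estimated coefficients, neither of which is reported in the text, so the example compares only the published AIC scores. All variable names are hypothetical.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# AIC scores as reported in the text for the two staging systems.
reported_aic = {"eighth TNM": 32351.28, "mTNM": 32195.19}

# The model with the smaller AIC is preferred.
preferred = min(reported_aic, key=reported_aic.get)
delta = reported_aic["eighth TNM"] - reported_aic["mTNM"]
print(preferred, round(delta, 2))
```

Because both systems use the same number of stage categories here, the AIC difference mainly reflects the difference in model fit rather than a penalty for extra parameters.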

Discussion
Since the first TNM classification system for gastric cancer was applied in 1968 [17], the definition of N3 gastric cancer has been updated and revised several times. The third edition of the UICC/AJCC TNM system defined N3 as para-aortic or hepatoduodenal node metastasis [18]. The criterion was based on intraoperative clinical observations, which were associated with a high level of subjectivity; therefore, the fourth TNM classification dropped the N3 category and reclassified these patients as M1 [19]. Investigators later found that the number of MLNs is a good indicator of the extent of LN metastases [20,21]. Thus, based on the number of MLNs, the fifth and sixth TNM classifications defined N3 as > 15 MLNs [1,22]. In addition to the UICC/AJCC TNM system, the JCGC classification is another internationally authoritative classification for gastric cancer. The 13th JCGC system divided regional LN stations into three tiers based on the location of the MLNs relative to the primary tumor and classified N3 as metastases in Group 3 LNs [3]. Numerous studies have confirmed that basing N classification on the number of MLNs is superior to using anatomical N staging in terms of objectivity, feasibility, reproducibility and prognostic accuracy [23,24]. Hence, in the 14th JCGC, the anatomical-based N stage was altered to the numeric N stage used in the seventh UICC/AJCC classification [4]. In the seventh TNM system, N3 was defined as ≥ 7 MLNs and split into N3a and N3b based on the cutoff value used for N2/N3 in the fifth/sixth TNM system [5]. During the formulation of the seventh TNM classification, there were no convincing data showing that classifying cases as N3a and N3b had a significant impact on survival in Western patients. Accordingly, the N3 subgroup failed to be an individual determinant of the final TNM stage in this edition [8].
In all staging systems used in gastric cancer, N3 gastric cancer is recognized as an advanced gastric cancer with nodal metastases. Patients with N3 gastric cancer account for a large proportion of patients in China and the United States [25]. Accurate stratification of N3 patients would support individualized treatment strategies for patients with different disease statuses and could improve their prognoses. Although the seventh TNM classification is widely acknowledged and used, the validity of regarding N3 as a single category during stage grouping remains controversial [6,26]. Many researchers have suggested that N3b tumors are associated with worse outcomes than N3a tumors and that an N3 subclassification should therefore be used for final staging [27,28]. In addition, Komatsu et al. found that a positive LN ratio was useful for stratifying prognoses and evaluating the extent of local tumor clearance in patients with N3 gastric cancer [29]. However, one drawback of the LN ratio is that there are no standardized categories for this metric in the literature, which impedes its spread and application. Considering usability and reproducibility, the eighth TNM system still uses a numeric N staging system [9]. Moreover, based on a survival analysis of 25,411 patients with gastric cancer, the International Gastric Cancer Association separated N3 into N3a and N3b in their final pathologic staging analysis [8]. Our data also indicate that N3a and N3b may represent diseases of differing severity, as the two groups show significant prognostic differences. However, the overall survival rates of N3a and N3b patients in the current study differ from those in previous investigations. This may be due to differences in ethnic composition, the period of patient enrollment or the treatment outcomes of the two datasets. Currently, 15 MLNs is defined as the cutoff for N3a/N3b gastric cancer.
This cutoff was based on a retrospective study of 477 patients performed by the German Gastric Cancer Study Group [10]. This study, which was performed 20 years ago, has some limitations, including the fact that it was ethnically homogeneous and contained a small sample of patients with > 7 MLNs (219 patients). It is therefore unclear whether a cutoff of 15 MLNs is appropriate for both Western and Eastern patients with N3 gastric cancer. Although many studies have examined the cutoff for LN staging in gastric cancer, few have discussed the rationale for the existing N3a/N3b cutoff [30]. To our knowledge, this is the first study to specifically analyze and determine the optimal subclassification of patients with N3a/N3b gastric cancer. We developed an international dataset consisting of 2753 Western and Eastern patients with N3 gastric cancer and used the log-rank test to demonstrate that 13 MLNs is the optimal MLN cutoff value for distinguishing N3a/N3b gastric cancer. Accordingly, we proposed a modified N3 classification system and performed a two-step multivariate analysis to verify that the prognostic significance of this mN3 classification was superior to that of the eighth N3 classification system. Importantly, we show that homogeneity, discriminatory ability, and the monotonicity of gradients are better in the mTNM system than in the eighth edition system. We suggest that the superior prognostic assessments obtained using the mTNM system are attributable to its improved N3 staging accuracy. An ideal staging system should also be universally applicable. Gastric cancers differ in biological behavior, in the extent of surgery required for treatment and in the pathological diagnosis of examined LNs; therefore, there are remarkable differences in stage-specific outcomes between Eastern and Western patients [8,31].
A strength of the current study is the statistical power provided by merging large databases from two different countries. In this study, we found that there were significant differences between the SEER and FMUUH datasets in age, gender, ethnicity, histology, tumor size, T and overall staging, the type of surgery and the number of LNs examined. Thus, establishing an international database not only increases the numbers available for analysis but also enhances the representativeness of the mN3 staging system proposed in the present study [32]. In addition, only patients in whom > 15 LNs were retrieved and for whom complete five-year follow-up data were available were included, which supports the reliability of the results obtained in this study.
Some shortcomings of this study should be noted. Because incomplete information was available regarding adjuvant therapy in the SEER database, the effects of adjuvant treatment were not evaluated. In addition, the study cases were accrued over 20 years, during which time diagnostic methods and treatment strategies changed, and this also limits the strength of our observations.