
Improvements to the gastric cancer tumor-node-metastasis staging system based on computer-aided unsupervised clustering



The Union for International Cancer Control (UICC) tumor-node-metastasis (TNM) classification is a key gastric cancer prognosis system. This study aimed to create a new TNM system to provide a reference for the clinical diagnosis and treatment of gastric cancer.


Records of patients with gastric cancer were reviewed at The First Hospital of China Medical University and the Liaoning Cancer Hospital and Institute. Based on the patients’ prognostic data, computer-aided unsupervised clustering was performed over all possible TNM staging combinations to create a new staging division system.


The primary outcome measure was 5-year survival, analyzed according to TNM classification. Computer-aided unsupervised clustering over all TNM staging combinations was used to create TNM division criteria that were more consistent with clinical outcomes. Furthermore, unsupervised clustering of the number of lymph node metastases led to an N stage classification that differs from the existing N stage criteria, and unsupervised clustering of tumor size provided an additional reference for prognosis estimates.


Finally, we developed a TNM staging system based on the computer-aided unsupervised clustering method; this system was more in line with clinical prognosis data when compared with the 7th edition of UICC gastric cancer TNM classification.



In the past 3 decades, both the Japanese and Union for International Cancer Control (UICC) tumor-node-metastasis (TNM) classification systems for gastric cancer have undergone several major changes [1]. The biggest difference between the 2 systems was in the N stage division method [2]. In 2010, the UICC released the 7th edition of the TNM classification of gastric cancer, which uses the number of metastatic lymph nodes for N classification, and this standard has now been adopted by the Japanese TNM system [3]. However, the exact threshold values for division between the different N stages remain a critical issue.

In clinical practice, other independent clinical or pathological features can directly or indirectly predict patient survival [4,5,6,7,8,9]. For example, tumor size, although closely related to the T stage, remains an independent prognosticator in patients with gastric cancer. Therefore, the threshold tumor size and its effect on prognosis need to be evaluated to help clinicians determine patient prognosis more accurately.

Importantly, although TNM staging has been revised several times, in clinical practice, there is often a marked difference in the prognoses of patients with the same TNM stage, which might be owing to heterogeneity between patients of different ethnic backgrounds, the evolution of the biological behavior of gastric cancer, and other factors [10]. Moreover, among patients with a poor prognosis, there are those who achieve long-term survival. Therefore, a more accurate division of the TNM stages is needed to determine patient prognoses, comprehensive treatment planning, and other disease management aspects [11,12,13].

To resolve the problems mentioned above and develop a system for improved prognostic accuracy, we summarized information obtained from patients with gastric cancer who underwent treatment over the past 3 decades [14]. We conducted a precise enumeration of the optimal division points for clinical factors related to gastric cancer (e.g., age, tumor size, the number of lymph node metastases), and selected the optimal cut-off points. Data permutations were performed to obtain the final TNM staging system based on the principle of having smaller differences within groups and greater differences between groups. The postoperative 5-year overall survival rate was used as the comparison standard to account for the extensive duration of the study period. This study provided a reference for determining more scientific and accurate TNM stage division criteria, as well as threshold values for various factors that might influence gastric cancer prognosis.



We enrolled 2414 patients with histologically confirmed gastric cancer who underwent surgery at The First Hospital of China Medical University or the Liaoning Cancer Hospital and Institute. All patients had complete medical records available.

All patients were followed up by postal or telephone interviews. The last follow-up was conducted in December 2015, with a total follow-up rate of 91%. Clinical, surgical, and pathological findings, together with all follow-up data, were collected and recorded in the database.

The study protocol was approved by the Ethics Committee of The First Hospital of China Medical University and the Liaoning Cancer Hospital and Institute, and informed consent was obtained from all subjects. All methods were performed in accordance with the relevant guidelines and regulations.

Endpoints and follow-up

The primary endpoint was 5-year survival. Overall survival time was calculated from the date of surgery until the date of death or last follow-up contact. Data for patients who were still alive at the last follow-up were censored at that date. Follow-up assessments were conducted every 6 months for the first 5 postoperative years, and every 12 months thereafter until death.
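The survival-time definition above can be illustrated with a minimal sketch. The function name `overall_survival` and the conversion of days to months using an average month length are assumptions made for illustration; they are not taken from the study database.

```python
from datetime import date
from typing import Optional, Tuple


def overall_survival(surgery: date, last_contact: date,
                     death: Optional[date] = None) -> Tuple[float, bool]:
    """Return (survival time in months, event flag).

    The event flag is True when death was observed and False when the
    patient was alive at the last contact (a censored observation).
    """
    end = death if death is not None else last_contact
    months = (end - surgery).days / 30.4375  # assumed average month length
    return months, death is not None


# Death observed two years after surgery: event flag True.
print(overall_survival(date(2008, 3, 1), date(2010, 3, 1), death=date(2010, 3, 1)))
# Alive at the last contact in December 2015: censored, event flag False.
print(overall_survival(date(2010, 6, 15), date(2015, 12, 1)))
```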

Computer-aided unsupervised clustering method

An exhaustive enumeration was performed to determine the optimal division points for clinical factors related to gastric cancer (e.g., age, tumor size, the number of lymph node metastases), with each pass over all possible division points forming one cycle. Within each cycle, the log-rank test was used to derive the p-value between the 2 patient groups defined by each candidate division point. At the end of each cycle, the division point with the minimum p-value was selected as the optimal cut-off point.
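The cut-off search described above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' implementation: the function names `logrank_p` and `best_cutoff`, the `min_group` guard against very small groups, and the toy data are all assumptions. The log-rank statistic is computed in pure Python and converted to a p-value with the chi-square distribution with 1 degree of freedom.

```python
import math
from typing import Optional, Sequence, Tuple


def logrank_p(times_a: Sequence[float], events_a: Sequence[int],
              times_b: Sequence[float], events_b: Sequence[int]) -> float:
    """Two-group log-rank p-value (chi-square, 1 degree of freedom)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):  # distinct death times
        n = sum(1 for u, _, _ in data if u >= t)                # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)          # deaths at t
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance of the deaths in group A
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)


def best_cutoff(values: Sequence[float], times: Sequence[float],
                events: Sequence[int],
                min_group: int = 2) -> Tuple[Optional[float], float]:
    """Scan every distinct value c, split into (value < c) vs. (value >= c),
    and return the split with the smallest log-rank p-value."""
    best: Tuple[Optional[float], float] = (None, 1.0)
    for c in sorted(set(values))[1:]:
        low = [i for i, v in enumerate(values) if v < c]
        high = [i for i, v in enumerate(values) if v >= c]
        if len(low) < min_group or len(high) < min_group:
            continue  # assumed guard: skip splits that leave a tiny group
        p = logrank_p([times[i] for i in low], [events[i] for i in low],
                      [times[i] for i in high], [events[i] for i in high])
        if p < best[1]:
            best = (c, p)
    return best


# Toy data (hypothetical): tumor size in cm, survival in months, 1 = death.
sizes = [2, 3, 4, 4, 6, 7, 8, 9, 10, 11]
times = [60, 58, 55, 57, 20, 18, 15, 10, 8, 6]
events = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1]
print(best_cutoff(sizes, times, events))
```

Because the smallest p-value is picked from many candidate splits, the selected p-value is optimistic; a sketch like this would normally be paired with some form of validation on independent data.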

All combinations of the 5 T stages, 4 N stages, and 2 M stages in TNM gastric cancer staging were enumerated, i.e., a total of 5 × 4 × 2 = 40 groups. Log-rank test p-values between these groups were calculated; differences within stages were minimized and those between stages were maximized by combining groups with greater p-values (i.e., more similar survival) into a single unit, thereby obtaining the 7 optimal groups as the final TNM stages.
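One way to read the grouping step above is as a greedy agglomerative merge: start from the 40 T × N × M combinations and repeatedly join the pair of clusters with the largest pairwise p-value until 7 clusters remain. The sketch below makes several assumptions that are not in the article: the T stage labels, the function names, and in particular `toy_score`, which is a stand-in similarity computed from made-up survival rates and plays the role that the log-rank p-value plays in the study.

```python
from itertools import combinations, product
from typing import Callable, FrozenSet, List

T_STAGES = ["T1", "T2", "T3", "T4a", "T4b"]  # assumed labels for the 5 T stages
N_STAGES = ["N0", "N1", "N2", "N3"]
M_STAGES = ["M0", "M1"]

Cluster = FrozenSet[str]


def tnm_groups() -> List[str]:
    """All 5 x 4 x 2 = 40 T/N/M combinations, e.g. 'T1N0M0'."""
    return [t + n + m for t, n, m in product(T_STAGES, N_STAGES, M_STAGES)]


def merge_until(groups: List[str],
                p_value: Callable[[Cluster, Cluster], float],
                target: int) -> List[Cluster]:
    """Greedily merge the two clusters with the largest p-value
    (the most similar survival) until `target` clusters remain."""
    clusters: List[Cluster] = [frozenset([g]) for g in groups]
    while len(clusters) > target:
        a, b = max(combinations(clusters, 2), key=lambda pair: p_value(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


# Hypothetical 5-year survival rate per group, used only to drive the demo.
toy_rate = {
    t + n + m: max(0.0, 0.95 - 0.12 * ti - 0.10 * ni - 0.50 * mi)
    for (ti, t), (ni, n), (mi, m) in product(
        enumerate(T_STAGES), enumerate(N_STAGES), enumerate(M_STAGES))
}


def toy_score(a: Cluster, b: Cluster) -> float:
    """Stand-in for the log-rank p-value: close mean rates give a high score."""
    mean_a = sum(toy_rate[g] for g in a) / len(a)
    mean_b = sum(toy_rate[g] for g in b) / len(b)
    return 1.0 - abs(mean_a - mean_b)


stages = merge_until(tnm_groups(), toy_score, target=7)
for stage in sorted(stages, key=lambda c: -sum(toy_rate[g] for g in c) / len(c)):
    print(sorted(stage))
```

With real data, `p_value` would compare the pooled survival times of the two clusters with a log-rank test instead of comparing made-up rates.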

Statistical analyses

Kaplan-Meier survival curves were used to estimate 5-year overall survival. For univariate analyses, the prognostic factors of interest and the diagnosis period were covariates in the Cox regression model. Multivariate analyses were conducted using the Cox proportional hazards regression model to assess risk factors associated with survival. Two-sided p-values < 0.05 were considered statistically significant. Analyses were performed using SPSS software, version 23.0.
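For readers unfamiliar with the Kaplan-Meier estimate of 5-year overall survival, a minimal pure-Python sketch follows. The analyses in this study were run in SPSS; the function name `km_survival_at` and the toy follow-up data are assumptions for illustration only.

```python
from typing import Sequence


def km_survival_at(times: Sequence[float], events: Sequence[int],
                   horizon: float) -> float:
    """Kaplan-Meier estimate of the survival probability at `horizon`.

    times  -- follow-up time per patient (same unit as `horizon`)
    events -- 1 if death was observed, 0 if the observation was censored
    """
    survival = 1.0
    death_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
    return survival


# Toy follow-up in months (hypothetical): deaths at 12, 24, 48, and 80 months.
months = [12, 24, 30, 48, 60, 72, 80]
died = [1, 1, 0, 1, 0, 0, 1]
print(km_survival_at(months, died, horizon=60))  # (6/7) * (5/6) * (3/4) = 15/28
```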



Patient characteristics are shown in Table 1. The median age of patients at gastric cancer onset was 57 years, and there were significantly more male patients compared with female patients. In most patients, the gastric cancer was located in the lower portion of the stomach and presented at an advanced stage. Almost 50% of the patients underwent radical surgery, with the scope of lymph node resection being based on D2 surgery. The results of the multivariate analyses of factors associated with survival are shown in Table 2. After adjusting for 16 variables, patient survival was significantly associated with tumor size, tumor site, gross appearance, T stage, N stage, TNM stage, hepatic metastasis, and peritoneum metastasis. Factors such as the surgical extent and joint organ removal also affected prognoses. Adjuvant chemotherapy and the diagnosis period affected the 5-year overall survival rates.

Table 1 Characteristics of population from the three periods (n = 2414)
Table 2 HR for death in population (n = 2414): univariable and multivariable analysis

Computer-aided unsupervised clustering: tumor size

Patients’ tumor sizes and survival times were plotted on a scatter plot (Fig. 1). After calculations, 5 cm and 9 cm were chosen as the optimal cut-off points, and tumor size was defined as S1 (< 5 cm), S2 (5–8 cm), or S3 (≥ 9 cm), the division at which the differences between the groups were maximized (Fig. 2, p < 0.001).

Fig. 1

Scatter distribution of tumor size vs. survival time in patients with gastric cancer

Fig. 2

Survival curves according to tumor size in patients with gastric cancer
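The tumor size categories reported above can be written as a small lookup. The function name `size_category` is hypothetical, and assigning sizes between 8 cm and 9 cm to S2 is an assumption, since the reported ranges (5–8 cm and ≥ 9 cm) do not state where such values fall.

```python
def size_category(size_cm: float) -> str:
    """Map tumor size in cm to the clustered categories S1/S2/S3."""
    if size_cm < 5:
        return "S1"
    if size_cm < 9:  # assumption: 8 cm < size < 9 cm is treated as S2
        return "S2"
    return "S3"


for size in (3.5, 5.0, 8.0, 9.0, 12.0):
    print(size, size_category(size))
```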

Computer-aided unsupervised clustering: number of lymph node metastases

Patients’ numbers of lymph node metastases and survival times were plotted on a scatter plot (Fig. 3). After calculations, 0, 5, and 15 were chosen as the optimal cut-off points, and N stages were subdivided as N0 (n = 0), N1 (n = 1–4), N2 (n = 5–14), and N3 (n ≥ 15), the division at which the differences between the groups were maximized (Fig. 4, p < 0.001).

Fig. 3

Scatter distribution of the number of lymph node metastases vs. survival time in patients with gastric cancer

Fig. 4

Comparison of survival curves for the clustered N stage and the UICC N stage
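To make the difference between the two N stage schemes concrete, the sketch below maps a count of metastatic lymph nodes to both the clustered N stage reported here and the 7th edition UICC N stage. The UICC thresholds (N1: 1–2, N2: 3–6, N3: ≥ 7) come from the 7th edition definition rather than from this article, and the function names are hypothetical.

```python
from bisect import bisect_right

N_LABELS = ["N0", "N1", "N2", "N3"]
CLUSTERED_CUTS = [1, 5, 15]  # N0: 0, N1: 1-4, N2: 5-14, N3: >= 15 (this study)
UICC7_CUTS = [1, 3, 7]       # N0: 0, N1: 1-2, N2: 3-6,  N3: >= 7 (7th edition UICC)


def clustered_n_stage(positive_nodes: int) -> str:
    """N stage under the clustered cut-off points."""
    return N_LABELS[bisect_right(CLUSTERED_CUTS, positive_nodes)]


def uicc7_n_stage(positive_nodes: int) -> str:
    """N stage under the 7th edition UICC cut-off points."""
    return N_LABELS[bisect_right(UICC7_CUTS, positive_nodes)]


# A patient with 4 positive nodes is N2 under UICC but N1 after clustering.
for nodes in (0, 2, 4, 6, 10, 15):
    print(nodes, uicc7_n_stage(nodes), clustered_n_stage(nodes))
```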

Computer-aided unsupervised clustering: TNM stage

Based on patients’ prognostic data, the computer-aided unsupervised clustering method was applied to re-cluster patients with different TNM stages. The clustering results and the number of patients in each group after clustering are shown in Table 3, which we propose as the new TNM staging criteria. In the original 7th edition of the UICC gastric cancer TNM stages, there was an orderly arrangement of the different T, N, and M stages, which was disrupted after computer-aided unsupervised clustering.

Table 3 Comparison of the 7th UICC and the clustering TNM stage

Effect of TNM stage on prognosis predictions after unsupervised clustering

The significance of the differences between the various stages is shown in Table 4. When comparing each row, there was a significant difference between the classes in the clustered stages, making it superior to the UICC staging criteria. Survival rate curves for the 2 staging methods are shown in Fig. 5. Compared with the UICC stages (the “7th UICC TNM stage”), the computerized clustering method (the “clustering TNM stage”) resulted in a significant decrease in the differences between the groups within each stage, as well as for the different T and N stages (data not shown).

Table 4 Comparison of P values between each stage of UICC and the clustering TNM stage
Fig. 5

Comparison of survival curves of the clustered TNM stages and the UICC TNM stages

Because we performed clustering analysis on N stage in this study, the N stage of many patients was changed. We also introduced the clustering N stages of N0 (n = 0), N1 (n = 1–4), N2 (n = 5–14), and N3 (n ≥ 15) into the UICC TNM stage, which is “the UICC TNM stage based on the clustering N stage” in Fig. 6, and re-performed the unsupervised clustering for TNM stage, which is “the clustering TNM stage based on the clustering N stage”. Survival rate curves for the 2 different staging methods are shown in Fig. 6.

Fig. 6

Comparison of survival curves of the clustered TNM stages based on the clustered N stage and the UICC TNM stages based on the clustered N stage


In the past, when performing confirmatory or exploratory TNM staging improvements, differences in survival were always compared between stages defined by observer-determined divisions. Such methods could result in selection bias, thereby introducing problems in obtaining accurate staging for a particular patient population. In computer-aided unsupervised clustering, by contrast, patients are clustered in reverse, working from their survival data back to the stage groupings. This ensures the accuracy of the patient population for each stage, produces the least amount of heterogeneity between patients within a stage, and maximizes survival differences between stages. Regarding the degree of difference between the classes, although some p-values under the UICC and Japanese staging criteria are more significant than those under the cluster staging method, the cluster staging method shows a greater degree of difference between classes as a whole. Neither the UICC nor the Japanese criteria consider significant differences between groups within the classes. Rather, they take the groups with greater differences and divide them into separate classes. By analyzing the degree of difference between groups, the cluster staging method instead places the groups with the lowest degree of difference in the same class, thus creating a lesser degree of difference within classes, which is more in line with actual gastric cancer data.

After clustering the TNM stages, we found that there were more pre-IIIA stage patients compared with the UICC staging system, and there was a particularly significant increase in the number of patients with IA stage disease. This shows that in the past, judgments of a good prognosis may have been limited and pessimistic. Therefore, in some patients, prognosis might need to be revisited to formulate a more accurate and rational comprehensive treatment program. After clustering, the T1N1M0 and T1N2M0 patient classes were added to stage IA, which indicates that the invasion depth of gastric cancer might have a greater effect on patient prognosis compared with the extent of lymph node metastases. Furthermore, the adverse effects caused by lymph node metastases in these patients might be more easily controlled through comprehensive treatment.

By contrast, after clustering, there were significantly fewer patients with stage IV gastric cancer. This indicated that, for many patients, the prognosis might be more optimistic than previously considered. However, many of these patients were classified as having stage IIIC disease, which has a 5-year survival rate of < 10%.

Tumor size is directly related to invasion depth and is an independent prognosticator for gastric cancer. Although the existing gastric cancer staging systems do not take tumor size into consideration, we performed cluster analysis on tumor size based on survival data. The results revealed that in our database, 5 cm and 9 cm represented good tumor size threshold values. The adverse effects of a greater tumor size are caused by a greater invasion depth, more extensive lymph node metastases, and a greater possibility of distant metastases, although they might also be related to the need for a greater extent of gastric resection and the possibility of resection of adjacent organs. Furthermore, in the present study, the median tumor size was ~ 5 cm, indicating that significant improvements are needed in gastric cancer screening and early diagnosis. The majority of patients with gastric cancer are elderly and from rural areas, and the lack of timely and standardized treatment, in addition to poor compliance, remains a serious obstacle to intervention [15].

In 2010, the UICC and Japanese TNM staging systems came to an agreement on the divisions for N stage according to the number of lymph node metastases. In the present study, a cluster analysis of the number of lymph node metastases (0, 5, and 15 nodes), based on survival data, improved the distinction of patients’ prognoses compared with the existing classification systems. However, to maintain consistency with the existing UICC stages, when performing multivariate analysis, we did not use the cluster analysis division criteria for N stage and TNM stage analyses.

For cluster analysis according to age, 55 years was found to be the optimum age for distinguishing patients’ prognoses. Further subgroup analysis by sex revealed that in female patients, no critical age value separated prognoses with a significant difference, whereas in male patients, the critical age was 53 years. Therefore, in male patients aged > 53 years, there was a significant difference in prognosis compared with male patients aged < 53 years. The specific mechanism behind this prognostic difference remains unknown, but this phenomenon might provide clues regarding differences in the pathogenesis of gastric cancer between the sexes.

Because the present study was retrospective, the reliability of the data is inferior to that obtained in prospective clinical trials; therefore, appropriate TNM classification guidelines for gastric cancer, especially in the Chinese population, need to be studied further. Moreover, China is an expansive country where people from different areas have different economic circumstances and lifestyle habits, which has certain effects on the development, progression, and outcome of cancer. Most of the patients in the present study were from northeastern China; the cohort is therefore representative, to a certain extent, of patients with gastric cancer in northeastern China, but not of patients in all of China. In future studies, we will increase collaboration with hospitals in other regions to investigate staging methods more appropriate to Chinese patients and the biological behavior of gastric cancer in this population. Nevertheless, these findings provide a reference for the future improvement of gastric cancer TNM staging, accurate determination of gastric cancer prognoses, and improved implementation of more comprehensive treatments.


Compared with the existing TNM staging classification for gastric cancer, there was a greater difference between stage classes when using the computer-aided unsupervised clustering method. In addition, in the cluster staging method, groups with a lesser degree of difference were combined into the same class, thereby creating a staging system that is more in line with actual gastric cancer data. In summary, in Chinese patients with gastric cancer, the cluster staging method was preferable to the UICC or Japanese TNM classification for determining prognosis, with respect to the degree of difference both between classes and among groups within the classes.





UICC: Union for International Cancer Control


  1. Chae S, Lee A, Lee JH. The effectiveness of the new (7th) UICC N classification in the prognosis evaluation of gastric cancer patients: a comparative study between the 5th/6th and 7th UICC N classification. Gastric Cancer. 2011;14:166–71.

  2. Ramadori G, Triebel J. Nodal dissection for gastric cancer. N Engl J Med. 2008;359:2392; author reply 2393.

  3. Santiago JM, Sasako M, Osorio J. TNM-7th edition 2009 (UICC/AJCC) and Japanese classification 2010 in gastric cancer. Towards simplicity and standardisation in the management of gastric cancer. Cir Esp. 2011;89:275–81.

  4. Lu J, et al. Consideration of tumor size improves the accuracy of TNM predictions in patients with gastric cancer after curative gastrectomy. Surg Oncol. 2013;22:167–71.

  5. Luo Y, et al. Clinicopathologic characteristics and prognosis of Borrmann type IV gastric cancer: a meta-analysis. World J Surg Oncol. 2016;14:49.

  6. GASTRIC (Global Advanced/Adjuvant Stomach Tumor Research International Collaboration) Group, Paoletti X, Oba K, Burzykowski T, Michiels S, Ohashi Y, Pignon JP, Rougier P, Sakamoto J, Sargent D, Sasako M, van Cutsem E, Buyse M. Benefit of adjuvant chemotherapy for resectable gastric cancer: a meta-analysis. JAMA. 2010;303(17):1729–37.

  7. Fujita T. Gastric cancer. Lancet. 2009;374(9701):1593–4; author reply 1594–5.

  8. Brower V. Modified gastric cancer chemotherapy: more effective, less toxic. Lancet Oncol. 2015;16(16):e590.

  9. Cristescu R, Lee J, Nebozhyn M, Kim KM, Ting JC, Wong SS, Liu J, Yue YG, Wang J, Yu K, Ye XS, Do IG, Liu S, Gong L, Fu J, Jin JG, Choi MG, Sohn TS, Lee JH, Bae JM, Kim ST, Park SH, Sohn I, Jung SH, Tan P, Chen R, Hardwick J, Kang WK, Ayers M, Hongyue D, Reinhard C, Loboda A, Kim S, Aggarwal A. Molecular analysis of gastric cancer identifies subtypes associated with distinct clinical outcomes. Nat Med. 2015;21(5):449–56.

  10. Shah MA, Ajani JA. Gastric cancer--an enigmatic and heterogeneous disease. JAMA. 2010;303:1753–4.

  11. Markar SR, Wiggins T, Ni M, Steyerberg EW, Van Lanschot JJ, Sasako M, Hanna GB. Assessment of the quality of surgery within randomised controlled trials for the treatment of gastro-oesophageal cancer: a systematic review. Lancet Oncol. 2015;16(1):e23–31.

  12. Nishida T, Doi T. Improving prognosis after surgery for gastric cancer. Lancet Oncol. 2014;15(12):1290–2.

  13. Shen L, Shan YS, Hu HM, Price TJ, Sirohi B, Yeh KH, Yang YH, Sano T, Yang HK, Zhang X, Park SR, Fujii M, Kang YK, Chen LT. Management of gastric cancer in Asia: resource-stratified guidelines. Lancet Oncol. 2013;14(12):e535–47.

  14. Zhang H, et al. Survival trends in gastric cancer patients of Northeast China. World J Gastroenterol. 2011;17:3257–62.

  15. Herrero R, Parsonnet J, Greenberg ER. Prevention of gastric cancer. JAMA. 2014;312:1197–8.



This work was supported in part by the China National Natural Science Foundation (61402089, 61472069, 81402384 and 81572609) for the follow-up, data analysis and writing, the Fundamental Research Funds for the Central Universities (N141904001) for the data analysis, the Natural Science Foundation of Liaoning Province (2015020553) for the clinicopathological data collection, the China Postdoctoral Science Foundation (2016M591447) for the design of the study, and the Postdoctoral Science Foundation of Northeastern University (20160203) for the data analysis and writing.

Availability of data and materials

The datasets analysed during the current study are available from the corresponding author on reasonable request.

Author information




ZW, HX, and HG participated in the design of the study and drafting the article. ZX, YY, and ML participated in the design of the study, the statistical analysis and drafting the article. YJ, HX, and HZ participated in the design of the study and the statistical analysis. CL, HZ, JX, and PL participated in the design of the study, and revising the article. All the authors read and approved the final manuscript.

Corresponding authors

Correspondence to Hao Zhang or Junchang Xin or Hong Xu or Caigang Liu.

Ethics declarations

Ethics approval and consent to participate

The study was granted ethical approval by the Ethical Committee of China Medical University and the Liaoning Cancer Hospital and Institute, and all the patients provided written informed consent.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Wang, Z., Li, M., Xu, Z. et al. Improvements to the gastric cancer tumor-node-metastasis staging system based on computer-aided unsupervised clustering. BMC Cancer 18, 706 (2018).



  • Gastric cancer
  • Tumor-node-metastasis staging
  • Computer-aided unsupervised clustering method