The development and validation of oral cancer staging using administrative health data



Oral cancer is a major global health problem. The complexity of histological prognosticators in oral cancer makes it difficult to compare the benefits of different treatment regimens. The Taiwanese National Health database provides an opportunity to assess correlations between outcome and treatment protocols and to compare the effects of different treatment regimens. However, the absence of indices of disease severity is a critical problem. The aim of this study was to ascertain how accurately we could assess the severity of oral cancer at the time of initial diagnosis on the basis of variables in a national database.


In the cancer registry database of a medical center in Taiwan, we identified 1067 histologically confirmed cases of oral cancer (ICD9 codes 140, 141 and 143–145) that had been first diagnosed and subjected to initial treatment in this hospital. The clinical staging status was considered as the gold standard and we used concordance (C)-statistics to assess the model’s predictive performance. We added the predictors of treatment modality, cancer subsite, and age group to our models.


Our final overall model included treatment regimen, site, age, and two interaction terms: treatment regimen by age, and treatment regimen by site by age. In this model, the C-statistics were 0.82–0.84 in male subjects and 0.96–0.99 in female subjects. Of the models stratified by age, the model that included treatment regimen, site, and their interaction had the highest C-statistics, with values greater than 0.80 in male subjects and 0.90 in female subjects.


In this study, we found that adjusting for sex, age at first diagnosis, oral cancer subsite, and therapy regimen provided the best indicator of severity of oral cancer. Our findings provide a method for assessing cancer severity when information about staging is not available from a national health-related database.


Oral cancer is a major health problem, with a worldwide annual incidence of 274,300 cases and 128,000 deaths; two-thirds of this burden is in developing countries [1]. Despite considerable advances in diagnostic and therapeutic techniques, oral cancer continues to portend a poor prognosis. We surveyed available published reports and found that the effect of treatment regimen or other prognosis-related factors is often uncertain and controversial [2–5]. The complexity of histological prognosticators in oral cancer likely accounts for part of this uncertainty because it makes it difficult to compare the benefits of different treatment regimens; small samples are another limitation of previous studies [6–8].

The Taiwan National Health Insurance program, which has operated since 1995, enrolls almost 99% of the inhabitants of Taiwan and is contracted with 97% of hospitals and clinics throughout the nation [9]. It therefore provides an opportunity to assess correlations between outcome and treatment protocol and thus compare the effectiveness of different treatment regimens. However, the program is primarily concerned with the costs of medical services. In general, lack of information about disease severity is a critical problem when analyzing a population database. Anatomic site and disease stage are the most important tumor-related predictors of the prognosis of oral cancer after various treatment regimens [10–13]. The aim of this study was to determine how accurately the severity of oral cancer at the time of first diagnosis can be assessed on the basis of variables commonly available in national databases.



We used data from a cancer registry database of a medical center in Taiwan. We included all patients with oral cancer (ICD9 codes 140, 141, 143–145) who had been first diagnosed and undergone initial treatment in this hospital from 1 January 2002 to 31 December 2007. All 1067 of the oral cancer subjects included in the database had been histologically confirmed and staged according to the TNM staging system of the Union for International Cancer Control [14]. Most study subjects had squamous cell carcinoma (SCC; 971 cases, 91%); 577 of these (54.08% of all cases) were well differentiated and 290 (27.18% of all cases) moderately differentiated. The Institutional Review Board of Kaohsiung Medical University Hospital reviewed and approved our proposal for use of the database (KMUH-IRB-980174).

Data concerning sex, age at first diagnosis, oral cancer subsite (lip, tongue, gum, floor of the mouth, and other sites), clinical stage, and therapy regimen were collected from the database. We considered seven different treatment regimens in this study; all were based on surgery, radiotherapy, and chemotherapy, alone or in combination. Clinical stage was treated as the gold standard for classifying oral cancer severity, and we tried to reproduce it as accurately as possible from the available personal and medical intervention variables. We performed the χ² test to ascertain which individual variables were significantly associated with clinical stage. To assess the model’s predictive performance, we performed multivariate logistic regression analyses and used concordance (C) statistics. The logistic regression models included: (i) treatment modality (surgery only; radiation only; chemotherapy only; surgery and chemotherapy; surgery and radiation; radiation and chemotherapy; or surgery, radiation, and chemotherapy); (ii) cancer subsite (lip [140], tongue [141], gum [143], floor of mouth [144], and other [145]); (iii) age group (20–44 years, 45–64 years, and ≥65 years); and (iv) interactions between treatment and site.
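The categorical predictors listed above can be turned into model inputs in several ways. The following is a minimal sketch of one common choice, reference-cell (dummy) coding with treatment-by-site product terms. The function names, the level labels, and the choice of the first level as the reference category are assumptions made for illustration; the study does not state how its variables were coded.

```python
# Category levels as listed in the text. Using the first level of each
# list as the reference category is an assumption of this sketch.
TREATMENTS = [
    "surgery",
    "radiation",
    "chemotherapy",
    "surgery+chemotherapy",
    "surgery+radiation",
    "radiation+chemotherapy",
    "surgery+radiation+chemotherapy",
]
SITES = ["lip", "tongue", "gum", "floor of mouth", "other"]
AGE_GROUPS = ["20-44", "45-64", ">=65"]


def dummy(value, levels):
    """Reference-cell coding: one 0/1 indicator per non-reference level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [int(value == level) for level in levels[1:]]


def design_row(treatment, site, age_group, interaction=True):
    """Intercept, main effects, and optional treatment-by-site products."""
    t = dummy(treatment, TREATMENTS)
    s = dummy(site, SITES)
    a = dummy(age_group, AGE_GROUPS)
    row = [1] + t + s + a
    if interaction:
        # One product column per pair of non-reference levels (6 x 4 = 24).
        row += [ti * sj for ti in t for sj in s]
    return row


# Hypothetical patient: radiation only, tongue, aged 45-64 years.
row = design_row("radiation", "tongue", "45-64")
```

With seven treatments, five sites, and three age groups, each row has 1 + 6 + 4 + 2 = 13 main-effect columns, plus 24 interaction columns when `interaction=True`.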

A C-statistic of 1.0 represents perfect discrimination (perfect sensitivity and specificity), whereas a C-statistic of 0.5 represents discrimination no better than chance. The C-statistic is an accuracy measure that can be used for ordinal or nominal outcomes. In this study, the C-statistic measures how accurately the model discriminates between patients who were diagnosed at an early stage and those who were diagnosed at an advanced stage.
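For a binary outcome, the C-statistic can be read as the proportion of (advanced, early) patient pairs in which the advanced-stage patient receives the higher predicted probability, with ties counted as one half. The sketch below computes it by direct pair counting; the function name and the example values are illustrative and are not data from this study.

```python
def c_statistic(labels, scores):
    """Concordance statistic for a binary outcome.

    labels: 1 for advanced stage, 0 for early stage.
    scores: predicted probabilities (or any risk score) in the same order.
    """
    advanced = [s for y, s in zip(labels, scores) if y == 1]
    early = [s for y, s in zip(labels, scores) if y == 0]
    if not advanced or not early:
        raise ValueError("need at least one case in each outcome class")
    concordant = 0.0
    for a in advanced:
        for e in early:
            if a > e:
                concordant += 1.0   # pair ranked correctly
            elif a == e:
                concordant += 0.5   # tie counts as half
    return concordant / (len(advanced) * len(early))


# Illustrative values only: three of the four pairs are ranked correctly.
example = c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A model that assigns every patient the same score gives 0.5 under this definition, and a model that ranks every advanced-stage patient above every early-stage patient gives 1.0.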


More than 90% of our cases were male (995/1067). The mean age at first diagnosis was 51.58 years (standard deviation (SD) = 11.12); 51.08 years (SD = 10.67) in male subjects and 58.64 years (SD = 14.44) in female subjects. More than 50% of all cases were in the age group of 45–64 years at the time of diagnosis; 60% of male subjects were in this age group. About 27% of male subjects were diagnosed before the age of 45 years, compared with only 15% of female subjects. Relevant clinical variables at time of diagnosis are shown in Table 1. More than 50% of cases were first diagnosed at an advanced stage (III or IV), especially in men (>65%). Tongue and buccal mucosa were the dominant subsites of oral cancer in our study. About 30% of oral cancer in men originated in the tongue and 30% in the buccal mucosa; however, in women, the tongue (37.5%) was clearly the most common subsite. Surgery alone and chemotherapy alone were the two most commonly administered treatment regimens.

Table 1 Relevant clinical characteristics of patients with oral cancer

Tables 2 and 3 show the distribution of relevant factors in each sex according to clinical stage. In male patients, age, site, and treatment regimen were significantly associated with clinical stage (stage I vs II–IV and stage I–II vs III–IV). However, for clinical stages I–III versus IV, age was not a significant factor, whereas site and treatment were. In female patients, age was not a significant factor for any of these comparisons. Site was the only factor that was significantly associated with stage in all comparisons. Treatment regimen showed different patterns of association for different staging combinations; however, none of these were statistically significant because there were too few cases in any one category of treatment regimen. Tables 4 and 5 show the stepwise logistic regression models with which we examined the accuracy of the different predictors. In Model 1 of Table 4, only treatment regimens are considered; the C-statistics are 0.76 for all stage comparisons in male subjects and 0.83–0.85 in female subjects. Model 2 included only site; the C-statistics are 0.60–0.64 in male patients and 0.77–0.82 in female patients. Model 3 included treatment regimen and site; the C-statistics are 0.78–0.79 in male subjects and 0.91–0.96 in female subjects. Interactions between treatment regimens and sites are considered in Model 4; the C-statistics are 0.79–0.81 in male patients and 0.94–0.97 in female patients. Model 5 added age to Model 4; the C-statistics are 0.80–0.82 in male subjects and 0.96–0.99 in female subjects. The final model shown is Model 6, which included treatment regimen, site, age, and two interaction terms; namely, the interaction effect of treatment regimen/age and of treatment regimen/site/age. The C-statistics in Model 6 are 0.82–0.84 in male patients and 0.96–0.99 in female patients.
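The three stage comparisons (I vs II–IV, I–II vs III–IV, and I–III vs IV) and the χ² association between a factor and dichotomized stage can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the counts in the example are invented, and the study does not describe its software or exact test settings.

```python
STAGE_RANK = {"I": 1, "II": 2, "III": 3, "IV": 4}


def is_advanced(stage, highest_early_stage):
    """1 if stage lies above the cut point, else 0.

    highest_early_stage "I" gives I vs II-IV, "II" gives I-II vs III-IV,
    and "III" gives I-III vs IV.
    """
    return int(STAGE_RANK[stage] > STAGE_RANK[highest_early_stage])


def cross_tabulate(levels, stages, highest_early_stage):
    """Rows: factor levels in sorted order. Columns: [early, advanced]."""
    table = {}
    for level, stage in zip(levels, stages):
        counts = table.setdefault(level, [0, 0])
        counts[is_advanced(stage, highest_early_stage)] += 1
    return [table[level] for level in sorted(table)]


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                raise ValueError("empty row or column in contingency table")
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Invented counts: two factor levels by early/advanced stage.
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
```

For one degree of freedom, a statistic above 3.841 corresponds to p < 0.05. The χ² approximation becomes unreliable when expected cell counts are small, which is one reason sparse categories such as those described for female patients are hard to test.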
In Table 5, the models are stratified by age and the accuracy is evaluated using the predictors of treatment regimen and site. There are four models in this table: Model 1 considers treatment regimen only, Model 2 considers site only, Model 3 considers both treatment regimen and site, and Model 4 adds the interaction between the two factors. For each stratified group, Model 4 has the highest C-statistics, the values being greater than 0.80 in male patients and 0.90 in female patients. The accuracy tended to be better in older age groups, but the differences among age groups were not significant.
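The age-stratified analysis amounts to fitting a separate logistic regression within each age group. The sketch below does this with plain gradient ascent on the mean log-likelihood; it is a teaching illustration, not the authors' procedure, and the function names, learning rate, iteration count, and toy records are all assumptions.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights by gradient ascent; each row of X starts with 1.0."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, features):
    """Predicted probability of advanced stage; w[0] is the intercept."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], features))
    return sigmoid(z)


def fit_by_stratum(records):
    """records: iterable of (age_group, features, outcome) tuples."""
    groups = defaultdict(lambda: ([], []))
    for age_group, features, outcome in records:
        groups[age_group][0].append([1.0] + list(features))
        groups[age_group][1].append(outcome)
    return {group: fit_logistic(X, y) for group, (X, y) in groups.items()}


# Toy records with a single 0/1 feature; not data from the study.
toy = [
    ("45-64", [0], 0), ("45-64", [0], 0), ("45-64", [0], 1),
    ("45-64", [1], 1), ("45-64", [1], 1), ("45-64", [1], 0),
    ("20-44", [0], 0), ("20-44", [0], 1),
    ("20-44", [1], 0), ("20-44", [1], 1),
]
models = fit_by_stratum(toy)
```

In the 45–64 toy stratum the fitted probabilities approach the observed proportions of about 1/3 and 2/3. In very small strata, a category that contains only early-stage or only advanced-stage cases has no finite maximum-likelihood estimate, so a production analysis would normally rely on an established statistics package.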

Table 2 Distribution of relevant factors in male patients according to clinical stage
Table 3 Distribution of relevant factors in female patients according to clinical stage
Table 4 Staging accuracy according to logistic regression models evaluating the variables of treatment, site, and age
Table 5 Accuracy of each model according to logistic regression analysis of various combinations of predictors


Knowledge of the anatomy and disease staging is essential to optimal treatment planning [15]. Some anatomic sites, such as the superior gingivolabial sulcus, are linked with poor outcomes because of their rich lymphatic drainage and difficulty in evaluating the extent of local invasion, and therefore in selecting an appropriate management strategy [16]. Vascular and lymphatic networks, which vary between different anatomic sites, may influence tumor evolution and hence the outcome; thus, SCCs at the base rather than the oral part of the tongue have a higher rate of metastasis [17]. Cancer staging reflects both homogeneous survival data and important variations in disease characteristics that affect treatment options. Differentiation between stages I or II and stages III or IV of oral SCCs is most important for treatment planning, because early-stage tumors (stages I and II) typically require only single-modality therapy (mostly surgical resection), whereas stage III and IV tumors may require multimodality therapy with a combination of chemotherapy, radiation, and surgical resection. The appropriate therapeutic modalities depend on the site of origin of the primary tumor [18]. Population-based administrative data are an effective source of information about chronic disease or for cancer surveillance. However, the ways in which data can be extracted from such databases differ; in practice certain categories of clinical information may be unavailable.

This study provides a method for adjusting for cancer severity when staging information is not available. We found that the severity of oral cancer can be assessed based on sex, age at first diagnosis, oral cancer subsite, and therapy regimen with a C-statistic of up to 0.84 in male subjects and of at least 0.96 in female subjects. In Taiwan, oral cancer is a male-dominant cancer, the male:female ratio being 9:1 [19]. More than 70% of men with oral cancer have the habits of both chewing and smoking tobacco, whereas only approximately 10% of female patients have these habits [20]. Although some studies have failed to find an association between prognosis and smoking tobacco or consuming alcohol [21], most authors have reported higher mortality in smokers and alcohol drinkers [22, 23]. In a study from Taiwan [21], Lo et al. reported that areca quid chewing is also correlated with a poor prognosis. Smokers and alcohol drinkers seem to be at higher risk of developing second primary oral cancers than nonsmokers and nondrinkers; thus, they face worse outcomes [24, 25]. In our study, we found that the sex of the patient seemed to affect the choice of treatment plan: a higher proportion of male than female patients had undergone combined multimodality therapy, especially those with early-stage disease. This finding may be related to the sexes having different habits; it requires further study.

Previous studies have suggested that sex differences in oral cancer prognosis are attributable to a delay in seeking medical care and differences in rate of compliance with recommended treatment. Some studies have reported lower survival rates in female subjects [22, 26], whereas others have found no sex-based difference in prognosis [21, 27, 28]. Whether prognosis correlates with age is controversial: some authors report no relationship, whereas others have found that older patients have worse prognoses [22, 23]. Most researchers accept that disease staging has a crucial influence on outcome [21, 28–30].

This study has some limitations. Patients were included on the basis of a previous diagnosis of oral cancer. The training and expertise of the personnel who performed the pathological assessments are unknown; therefore, we are unable to determine the reliability of their findings. Measurement methods and diagnostic criteria also likely varied. However, because the database was maintained by a medical center, its data are expected to be reasonably accurate.


The main conclusion of this study is that adjusting for sex, age at first diagnosis, oral cancer subsite, and therapy regimen facilitates accurate assessment of the severity of oral cancer. Our findings provide a method for adjusting for cancer severity when staging information is not available from national health-related databases.


  1. Elango KJ, Anandkrishnan N, Suresh A, Iyer SK, Ramaiyer SK, Kuriakose MA: Mouth self-examination to improve oral cancer awareness and early detection in a high-risk population. Oral Oncol. 2011, 47 (7): 620-624. 10.1016/j.oraloncology.2011.05.001.

  2. McMahon J, O’Brien CJ, Pathak I, Hamill R, McNeil E, Hammersley N, Gardiner S, Junor E: Influence of condition of surgical margins on local recurrence and disease-specific survival in oral and oropharyngeal cancer. Br J Oral Maxillofac Surg. 2003, 41 (4): 224-231. 10.1016/S0266-4356(03)00119-0.

  3. Sutton DN, Brown JS, Rogers SN, Vaughan ED, Woolgar JA: The prognostic implications of the surgical margin in oral squamous cell carcinoma. Int J Oral Maxillofac Surg. 2003, 32 (1): 30-34. 10.1054/ijom.2002.0313.

  4. Yuen AP, Lam KY, Wei WI, Lam KY, Ho CM, Chow TL, Yuen WF: A comparison of the prognostic significance of tumor diameter, length, width, thickness, area, volume, and clinicopathological features of oral tongue carcinoma. Am J Surg. 2000, 180 (2): 139-143. 10.1016/S0002-9610(00)00433-5.

  5. Leemans CR, Tiwari R, Nauta JJ, van der Waal I, Snow GB: Recurrence at the primary site in head and neck cancer and the significance of neck lymph node metastases as a prognostic factor. Cancer. 1994, 73 (1): 187-190. 10.1002/1097-0142(19940101)73:1<187::AID-CNCR2820730132>3.0.CO;2-J.

  6. Razak AA, Saddki N, Naing NN, Abdullah N: Oral cancer survival among Malay patients in Hospital Universiti Sains Malaysia, Kelantan. Asian Pac J Cancer Prev. 2010, 11 (1): 187-191.

  7. Ogawa T, Matsuura K, Shiga K, Tateda M, Katagiri K, Kato K, Saijo S, Kobayashi T: Surgical treatment is recommended for advanced oral squamous cell carcinoma. Tohoku J Exp Med. 2010, 223 (1): 17-25.

  8. Shim SJ, Cha J, Koom WS, Kim GE, Lee CG, Choi EC, Keum KC: Clinical outcomes for T1-2N0-1 oral tongue cancer patients underwent surgery with and without postoperative radiotherapy. Radiat Oncol. 2010, 5: 43. 10.1186/1748-717X-5-43.

  9. Bureau of National Health Insurance DoH, Executive Yuan: National Health Insurance in Taiwan. 2012, Taiwan, –2013

  10. Kreppel M, Eich HT, Kubler A, Zoller JE, Scheer M: Prognostic value of the sixth edition of the UICC’s TNM classification and stage grouping for oral cancer. J Surg Oncol. 2010, 102 (5): 443-449. 10.1002/jso.21547.

  11. Carinci F, Pelucchi S, Farina A, Calearo C: A comparison between TNM and TANIS stage grouping for predicting prognosis of oral and oropharyngeal cancer. J Oral Maxillofac Surg. 1998, 56 (7): 832-836; discussion 836-837. 10.1016/S0278-2391(98)90007-6.

  12. Greene FL, Sobin LH: The staging of cancer: a retrospective and prospective appraisal. CA Cancer J Clin. 2008, 58 (3): 180-190. 10.3322/CA.2008.0001.

  13. van der Schroeff MP, de Jong RJ B: Staging and prognosis in head and neck cancer. Oral Oncol. 2009, 45 (4–5): 356-360.

  14. O’Sullivan B, Shah J: New TNM staging criteria for head and neck tumors. Semin Surg Oncol. 2003, 21 (1): 30-42. 10.1002/ssu.10019.

  15. Trotta BM, Pease CS, Rasamny JJ, Raghavan P, Mukherjee S: Oral cavity and oropharyngeal squamous cell cancer: key imaging findings for staging and treatment planning. Radiographics. 2011, 31 (2): 339-354. 10.1148/rg.312105107.

  16. Tiwari R: Squamous cell carcinoma of the superior gingivolabial sulcus. Oral Oncol. 2000, 36 (5): 461-465. 10.1016/S1368-8375(00)00036-1.

  17. Genden EM, Ferlito A, Bradley PJ, Rinaldo A, Scully C: Neck disease and distant metastases. Oral Oncol. 2003, 39 (3): 207-212. 10.1016/S1368-8375(02)00049-0.

  18. Forastiere AA, Ang KK, Brizel D, Brockstein BE, Burtness BA, Cmelak AJ, Colevas AD, Dunphy F, Eisele DW, Goepfert H, Hicks WL, Kies MS, Lydiatt WM, Maghami E, Martins R, McCaffrey T, Mittal BB, Pfister DG, Pinto HA, Posner MR, Ridge JA, Samant S, Schuller DE, Shah JP, Spencer S, Trotti A, Weber RS, Wolf GT, Worden F: Head and neck cancers. J Natl Compr Canc Netw. 2008, 6 (7): 646-695.

  19. Ho PS, Ko YC, Yang YH, Shieh TY, Tsai CC: The incidence of oropharyngeal cancer in Taiwan: an endemic betel quid chewing area. J Oral Pathol Med. 2002, 31 (4): 213-219. 10.1034/j.1600-0714.2002.310404.x.

  20. Yang YH, Lien YC, Ho PS, Chen CH, Chang JS, Cheng TC, Shieh TY: The effects of chewing areca/betel quid with and without cigarette smoking on oral submucous fibrosis and oral mucosal lesions. Oral Dis. 2005, 11 (2): 88-94. 10.1111/j.1601-0825.2004.01061.x.

  21. Lo WL, Kao SY, Chi LY, Wong YK, Chang RC: Outcomes of oral squamous cell carcinoma in Taiwan after surgical therapy: factors affecting survival. J Oral Maxillofac Surg. 2003, 61 (7): 751-758. 10.1016/S0278-2391(03)00149-6.

  22. Leite IC, Koifman S: Survival analysis in a sample of oral cancer patients at a reference hospital in Rio de Janeiro, Brazil. Oral Oncol. 1998, 34 (5): 347-352. 10.1016/S1368-8375(98)00019-0.

  23. Ribeiro KC, Kowalski LP, Latorre MR: Impact of comorbidity, symptoms, and patients’ characteristics on the prognosis of oral carcinomas. Arch Otolaryngol Head Neck Surg. 2000, 126 (9): 1079-1085. 10.1001/archotol.126.9.1079.

  24. Carvalho AL, Singh B, Spiro RH, Kowalski LP, Shah JP: Cancer of the oral cavity: a comparison between institutions in a developing and a developed nation. Head Neck. 2004, 26 (1): 31-38. 10.1002/hed.10354.

  25. Hall SF, Groome PA, Rothwell D: The impact of comorbidity on the survival of patients with squamous cell carcinoma of the head and neck. Head Neck. 2000, 22 (4): 317-322. 10.1002/1097-0347(200007)22:4<317::AID-HED1>3.0.CO;2-0.

  26. Massano J, Regateiro FS, Januario G, Ferreira A: Oral squamous cell carcinoma: review of prognostic and predictive factors. Oral Surg Oral Med Oral Pathol Oral Radiol Endod. 2006, 102 (1): 67-76. 10.1016/j.tripleo.2005.07.038.

  27. Oc P, Pillai G, Patel S, Fisher C, Archer D, Eccles S, Rhys-Evans P: Tumour thickness predicts cervical nodal metastases and survival in early oral tongue cancer. Oral Oncol. 2003, 39 (4): 386-390. 10.1016/S1368-8375(02)00142-2.

  28. Nguyen TV, Yueh B: Weight loss predicts mortality after recurrent oral cavity and oropharyngeal carcinomas. Cancer. 2002, 95 (3): 553-562. 10.1002/cncr.10711.

  29. Munoz Guerra MF, Naval Gias L, Campo FR, Perez JS: Marginal and segmental mandibulectomy in patients with oral cancer: a statistical analysis of 106 cases. J Oral Maxillofac Surg. 2003, 61 (11): 1289-1296. 10.1016/S0278-2391(03)00730-4.

  30. Gonzalez-Moles MA, Esteban F, Rodriguez-Archilla A, Ruiz-Avila I, Gonzalez-Moles S: Importance of tumour thickness measurement in prognosis of tongue cancer. Oral Oncol. 2002, 38 (4): 394-397. 10.1016/S1368-8375(01)00081-1.

Author information



Corresponding author

Correspondence to Ho Pei-Shan.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

PSH and LTC designed, conducted, and implemented the study and drafted the manuscript. CHC critically revised the original draft of the manuscript. YYH conceived the study, participated in its design and coordination, and helped to draft the manuscript. All authors have read and approved the final manuscript.

About this article

Cite this article

Li-Ting, C., Chung-Ho, C., Yi-Hsin, Y. et al. The development and validation of oral cancer staging using administrative health data. BMC Cancer 14, 380 (2014).


  • Oral cancer
  • Validation
  • National health database
  • Taiwan