The development and validation of oral cancer staging using administrative health data

Background: Oral cancer is a major global health problem. The complexity of histological prognosticators in oral cancer makes it difficult to compare the benefits of different treatment regimens. The Taiwanese National Health database provides an opportunity to assess correlations between outcome and treatment protocols and to compare the effects of different treatment regimens. However, the absence of indices of disease severity in this database is a critical problem. The aim of this study was to ascertain how accurately the severity of oral cancer at the time of initial diagnosis can be assessed on the basis of variables in a national database.

Methods: In the cancer registry database of a medical center in Taiwan, we identified 1067 histologically confirmed cases of oral cancer (ICD-9 codes 140, 141, and 143–145) that had been first diagnosed and initially treated in this hospital. Clinical stage was considered the gold standard, and we used concordance (C) statistics to assess the models' predictive performance. Treatment modality, cancer subsite, and age group were included as predictors in our models.

Results: Our final overall model included treatment regimen, site, age, and two interaction terms: one between treatment regimen and age, and one among treatment regimen, site, and age. In this model, the C-statistics were 0.82–0.84 in male subjects and 0.96–0.99 in female subjects. Among the models stratified by age, the model that included treatment regimen, site, and their interaction term had the highest C-statistics, exceeding 0.80 in male subjects and 0.90 in female subjects.

Conclusion: In this study, we found that adjusting for sex, age at first diagnosis, oral cancer subsite, and therapy regimen provided the best indicator of oral cancer severity. Our findings provide a method for assessing cancer severity when staging information is not available in a national health-related database.


Background
Oral cancer is a major health problem, with a worldwide annual incidence of 274,300 cases and 128,000 deaths; two-thirds of this burden is in developing countries [1]. Despite considerable advances in diagnostic and therapeutic techniques, the prognosis of oral cancer remains poor. Our survey of published reports found that the effects of treatment regimen and other prognosis-related factors are often uncertain and controversial [2][3][4][5]. This is likely partly because the complexity of histological prognosticators in oral cancer makes it difficult to compare the benefits of different treatment regimens; small sample sizes are another limitation of previous studies [6][7][8].
The Taiwan National Health Insurance program, which has operated since 1995, enrolls almost 99% of the inhabitants of Taiwan and is contracted with 97% of hospitals and clinics throughout the nation [9]. It therefore provides an opportunity to assess correlations between outcome and treatment protocol and thus to compare the effectiveness of different treatment regimens. However, the program is primarily concerned with the costs of medical services. In general, lack of information about disease severity is a critical problem when analyzing a population database. Anatomic site and disease stage are the most important tumor-related predictors of the prognosis of oral cancer after various treatment regimens [10][11][12][13]. The aim of this study was to assess how accurately the severity of oral cancer at the time of first diagnosis can be estimated from variables commonly available in national databases.

Database
We used data from the cancer registry database of a medical center in Taiwan. We included all patients with oral cancer (ICD-9 codes 140, 141, and 143-145) who had been first diagnosed and had undergone initial treatment in this hospital between 1 January 2002 and 31 December 2007. All 1067 oral cancer cases included in the database had been histologically confirmed and staged according to the TNM staging system of the Union for International Cancer Control [14]. Most study subjects had squamous cell carcinoma (SCC; 971 cases, 91%); 577 of these tumors (54.08% of all 1067 cases) were well differentiated and 290 (27.18%) were moderately differentiated. The Institutional Review Board of Kaohsiung Medical University Hospital reviewed and approved our proposal for use of the database (KMUH-IRB-980174).
Data on sex, age at first diagnosis, oral cancer subsite (lip, tongue, gum, floor of the mouth, and other sites), clinical stage, and therapy regimen were collected from the database. We considered seven treatment regimens, each based on a combination of surgery, radiotherapy, and chemotherapy. Clinical stage was taken as the gold standard for classifying oral cancer severity, and we tried to reproduce it as accurately as possible using the available personal and medical intervention variables. We performed the χ² test to ascertain which individual variables were significantly associated with clinical stage. To assess the models' predictive performance, we performed multivariate logistic regression analyses and calculated concordance (C) statistics. The logistic regression models included: (i) treatment modality (surgery only; radiation only; chemotherapy only; surgery and chemotherapy; surgery and radiation; radiation and chemotherapy; and surgery, radiation, and chemotherapy); (ii) oral cancer subsite; and (iii) age group.

A C-statistic of 1.0 represents perfect sensitivity and specificity, whereas a C-statistic of 0.5 represents discrimination no better than chance, that is, an essentially worthless test. The C-statistic is an accuracy measure that can be used for ordinal or nominal outcomes. In this study, the C-statistic measures how well a model discriminates between patients diagnosed at an early stage and those diagnosed at an advanced stage: it is the probability that a randomly selected advanced-stage patient has a higher predicted probability of advanced disease than a randomly selected early-stage patient.
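The C-statistic described above can be computed directly from its pairwise definition. The following is a minimal Python sketch, assuming each patient already has a model-predicted probability of advanced stage; the function name `c_statistic` and the example scores are hypothetical and are not taken from the study.

```python
from itertools import product


def c_statistic(scores, labels):
    """Concordance (C) statistic for a binary stage outcome.

    scores: model-predicted probabilities of advanced stage (any risk score works)
    labels: 1 = advanced stage (III-IV), 0 = early stage (I-II)
    Returns the proportion of (advanced, early) patient pairs in which the
    advanced-stage patient has the higher score; tied scores count as 0.5.
    """
    advanced = [s for s, y in zip(scores, labels) if y == 1]
    early = [s for s, y in zip(scores, labels) if y == 0]
    if not advanced or not early:
        raise ValueError("need at least one early- and one advanced-stage patient")
    concordant = 0.0
    for a, e in product(advanced, early):
        if a > e:
            concordant += 1.0  # concordant pair
        elif a == e:
            concordant += 0.5  # tie: half credit
    return concordant / (len(advanced) * len(early))


if __name__ == "__main__":
    # Hypothetical predicted probabilities and stages, not study data
    scores = [0.9, 0.8, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0]
    # 5 of the 6 (advanced, early) pairs are concordant
    print(round(c_statistic(scores, labels), 3))  # 0.833
```

For a binary outcome this pair count equals the area under the ROC curve. The quadratic loop over pairs is adequate for a cohort of about 1000 patients; for much larger data, a rank-based formula is the usual alternative.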

Results
More than 90% of our cases were male (995/1067). The mean age at first diagnosis was 51.58 years (standard deviation (SD) = 11.12): 51.08 years (SD = 10.67) in male subjects and 58.64 years (SD = 14.44) in female subjects. More than 50% of all cases were aged 45-65 years at diagnosis, including 60% of male subjects. About 27% of male subjects were diagnosed before the age of 45 years, compared with only 15% of female subjects. Relevant clinical variables at the time of diagnosis are shown in Table 1. More than 50% of cases were first diagnosed at an advanced stage (III or IV), especially men (>65%). The tongue and buccal mucosa were the dominant subsites: about 30% of oral cancers in men originated in the tongue and 30% in the buccal mucosa, whereas in women the tongue (37.5%) was clearly the most common subsite. Surgery alone and chemotherapy alone were the two most commonly administered treatment regimens.

Tables 2 and 3 show the distribution of relevant factors in each sex according to clinical stage. In male patients, age, site, and treatment regimen were significantly associated with clinical stage in the comparisons of stage I vs II-IV and stages I-II vs III-IV. However, for stages I-III vs IV, age was not a significant factor, whereas site and treatment were. In female patients, age was not a significant factor in any of these comparisons, and site was the only factor significantly associated with clinical stage in all three comparisons. Treatment regimen showed different patterns of association for different stage groupings; however, none of these associations were statistically significant because there were too few cases in each treatment category. Tables 4 and 5 show the stepwise logistic regression models with which we examined the accuracy of the different predictors.
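The stage comparisons above dichotomize TNM stage at three cut points (I vs II-IV, I-II vs III-IV, I-III vs IV) and test each factor's association with the resulting binary outcome. The dependency-free Python sketch below illustrates that procedure with Pearson's χ² test of independence. It is an assumption-laden illustration, not the authors' analysis code: the function names and example patients are hypothetical, and only the statistic and its degrees of freedom are computed.

```python
from collections import Counter


def chi_square_independence(rows, cols):
    """Pearson chi-square statistic and degrees of freedom for the
    contingency table formed by two paired lists of category labels."""
    n = len(rows)
    row_totals = Counter(rows)
    col_totals = Counter(cols)
    observed = Counter(zip(rows, cols))  # missing cells count as 0
    stat = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def dichotomize(stage, cut):
    """Return 1 if the TNM stage (1-4) lies above the cut point, else 0.
    cut=1: I vs II-IV; cut=2: I-II vs III-IV; cut=3: I-III vs IV."""
    return int(stage > cut)


if __name__ == "__main__":
    # Hypothetical (site, stage) pairs, not study data
    patients = [("tongue", 1), ("tongue", 2), ("tongue", 4), ("buccal", 3),
                ("buccal", 4), ("buccal", 4), ("gum", 2), ("gum", 4)]
    sites = [site for site, _ in patients]
    for cut in (1, 2, 3):
        stages = [dichotomize(stage, cut) for _, stage in patients]
        stat, df = chi_square_independence(sites, stages)
        print(f"cut after stage {cut}: chi2 = {stat:.3f}, df = {df}")
```

To convert the statistic into a p-value, one option is `scipy.stats.chi2.sf(stat, df)`. Alternatively, `scipy.stats.chi2_contingency` runs the whole test from a table of counts, but it applies Yates' continuity correction by default when the table has one degree of freedom. With sparse categories, such as treatment regimens among female patients, small expected counts make the χ² approximation unreliable.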
In Table 5, the models are stratified by age, and accuracy is evaluated using treatment regimen and site as predictors. The table contains four models: Model 1 includes treatment regimen; Model 2, site; Model 3, treatment regimen and site; and Model 4, treatment regimen, site, and their interaction term. For each age stratum, Model 4 had the highest C-statistics, exceeding 0.80 in male patients and 0.90 in female patients. Discrimination tended to be better in older age groups, but the differences among age groups were not significant.
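Models 1, 2, and 4 contain only categorical predictors (plus, in Model 4, their full interaction), so each is saturated in those predictors. Its maximum-likelihood fitted probability for a patient therefore equals the observed proportion of advanced-stage patients in that patient's category cell. The Python sketch below uses this property to compute age-stratified C-statistics without an iterative fit. It is a minimal illustration under that assumption, not the authors' code: the record fields (`age_group`, `treatment`, `site`, `advanced`), the model labels, and the function names are all hypothetical. Model 3 (treatment and site without interaction) is not saturated and would need an iterative logistic fit, for example with statsmodels' `Logit`, so it is omitted.

```python
from collections import defaultdict


def c_statistic(scores, labels):
    """Pairwise concordance between advanced (1) and early (0) stage; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each stratum needs both early- and advanced-stage patients")
    total = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return total / (len(pos) * len(neg))


def saturated_scores(records, key):
    """Fitted probabilities of a logistic model whose predictors are the
    categories in key(record) plus all their interactions. Such a model is
    saturated, so each fitted value is the observed proportion of
    advanced-stage patients in the patient's cell."""
    cells = defaultdict(lambda: [0, 0])  # cell -> [advanced count, total count]
    for rec in records:
        cell = cells[key(rec)]
        cell[0] += rec["advanced"]
        cell[1] += 1
    return [cells[key(rec)][0] / cells[key(rec)][1] for rec in records]


# Hypothetical model definitions mirroring Models 1, 2 and 4 in the text
MODELS = {
    "Model 1 (treatment)": lambda r: r["treatment"],
    "Model 2 (site)": lambda r: r["site"],
    "Model 4 (treatment x site)": lambda r: (r["treatment"], r["site"]),
}


def stratified_c_statistics(records, stratum_key):
    """Apparent (in-sample) C-statistic of each model within each stratum."""
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_key(rec)].append(rec)
    results = {}
    for stratum, recs in strata.items():
        labels = [r["advanced"] for r in recs]
        for name, key in MODELS.items():
            results[(stratum, name)] = c_statistic(saturated_scores(recs, key), labels)
    return results


if __name__ == "__main__":
    # Hypothetical patients, not study data
    patients = [
        {"age_group": "45-65", "treatment": "S", "site": "tongue", "advanced": 0},
        {"age_group": "45-65", "treatment": "S", "site": "buccal", "advanced": 0},
        {"age_group": "45-65", "treatment": "S+R+C", "site": "tongue", "advanced": 1},
        {"age_group": "45-65", "treatment": "S+R+C", "site": "buccal", "advanced": 1},
    ]
    for (stratum, model), c in stratified_c_statistics(patients, lambda r: r["age_group"]).items():
        print(stratum, model, c)
```

When every patient in a cell has the same stage, the logistic model has no finite maximum-likelihood estimate, but its fitted probabilities approach 0 or 1, which is what the cell proportion returns. Because the scores are fitted and evaluated on the same patients, these are apparent C-statistics. They tend to be optimistic when a model has many sparsely populated cells.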

Discussion
Knowledge of the anatomy and disease stage is essential to optimal treatment planning [15]. Some anatomic sites, such as the superior gingivolabial sulcus, are linked with poor outcomes because of their rich lymphatic drainage and the difficulty of evaluating the extent of local invasion, and therefore of selecting an appropriate management strategy [16]. Vascular and lymphatic networks, which vary between anatomic sites, may influence tumor evolution and hence outcome; for example, SCCs at the base of the tongue have a higher rate of metastasis than those in the oral tongue [17]. Cancer stages should define groups with homogeneous survival while reflecting important variations in disease characteristics that affect treatment options. Distinguishing stages I or II from stages III or IV of oral SCC is most important for treatment planning, because early-stage tumors (stages I and II) typically require only single-modality therapy (mostly surgical resection), whereas stage III and IV tumors may require multimodality therapy combining chemotherapy, radiation, and surgical resection. The appropriate therapeutic modalities also depend on the site of origin of the primary tumor [18]. Population-based administrative data are a valuable source of information for chronic disease research and cancer surveillance. However, the data that can be extracted from such databases vary, and in practice certain categories of clinical information may be unavailable.
This study provides a method for adjusting for cancer severity when staging information is not available. We found that the severity of oral cancer can be assessed on the basis of sex, age at first diagnosis, oral cancer subsite, and therapy regimen, with C-statistics of up to 0.84 in male subjects and 0.96 or higher in female subjects. In Taiwan, oral cancer is a male-dominant cancer, with a male:female ratio of 9:1 [19]. More than 70% of men with oral cancer have the habits of both chewing and smoking tobacco, whereas only approximately 10% of female patients have these habits [20]. Although some studies have failed to find an association between prognosis and tobacco smoking or alcohol consumption [21], most authors have reported higher mortality in smokers and alcohol drinkers [22,23]. In a study from Taiwan, Lo et al. reported that areca quid chewing is also correlated with a poor prognosis [21]. Smokers and alcohol drinkers seem to be at higher risk of developing second primary oral cancers than nonsmokers and nondrinkers, and thus face worse outcomes [24,25]. In our study, the sex of the patient seemed to affect the choice of treatment plan: a higher proportion of male than female patients had undergone combined multimodality therapy, especially those with early-stage disease. This finding may be related to differences in habits between the sexes and requires further study.
Previous studies have suggested that sex differences in oral cancer prognosis are attributable to delays in seeking medical care and differences in compliance with recommended treatment. Some studies have reported lower survival rates in female subjects [22,26], whereas others have found no sex-based difference in prognosis [21,27,28]. The relationship between prognosis and age is controversial: some authors report that the two are unrelated, whereas others have found that older patients have worse prognoses [22,23]. Most researchers accept that disease stage has a crucial influence on outcome [21,28-30].
This study has some limitations. Patients were included on the basis of a previous diagnosis of oral cancer. The training and expertise of the personnel who performed the pathological assessments are unknown; therefore, we are unable to determine the reliability of their findings. Measurement methods and diagnostic criteria were also likely to have varied. However, because the data came from the cancer registry of a medical center, they are likely to be reliable.

Conclusion
The main conclusion of this study is that adjusting for sex, age at first diagnosis, oral cancer subsite, and therapy regimen facilitates accurate assessment of the severity of oral cancer. Our findings provide a method for adjusting for cancer severity when staging information is not available from national health-related databases.