Oral tongue cancer gene expression profiling: Identification of novel potential prognosticators by oligonucleotide microarray analysis

Background The present study aimed to identify candidate genes with potential as prognostic markers in human oral tongue squamous cell carcinoma (SCC) by large-scale gene expression profiling. Methods The gene expression profiles of patients (n=37) with oral tongue SCC were analyzed using Affymetrix HG_U95Av2 high-density oligonucleotide arrays. Patients (n=20) for whom tumor and matched normal mucosa were available were grouped into stage (early vs. late) and nodal disease (node positive vs. node negative) subgroups, and genes differentially expressed in tumor vs. normal tissue and between the subgroups were identified. Three genes, GLUT3, HSAL2, and PACE4, were selected for their potential biological significance and evaluated by quantitative real-time RT-PCR in a larger cohort of 49 patients. Results Hierarchical clustering analyses failed to show significant segregation of patients. In patients (n=20) with available tumor and matched normal mucosa, 77 probe sets were found to be differentially expressed (P< 0.05) in the tongue tumor samples compared with their matched normal controls. Among the 45 overexpressed genes, MMP-1, encoding interstitial collagenase, showed the largest increase (average: 34.18-fold). Using a criterion of two-fold or greater as overexpression, 30.6%, 24.5% and 26.5% of patients showed high levels of GLUT3, HSAL2 and PACE4, respectively. Univariate analyses demonstrated that GLUT3 overexpression correlated with depth of invasion (P<0.0001), tumor size (P=0.024), pathological stage (P=0.009) and recurrence (P=0.038). HSAL2 was positively associated with depth of invasion (P=0.015) and advanced T stage (P=0.047). In survival studies, only GLUT3 showed prognostic value, for disease-free (P=0.049), relapse-free (P=0.002) and overall survival (P=0.003). PACE4 mRNA expression failed to show correlation with any of the relevant parameters.
Conclusion The characterization of genes identified to be significant predictors of prognosis by oligonucleotide microarray and further validation by real-time RT-PCR offers a powerful strategy for identification of novel targets for prognostication and treatment of oral tongue carcinoma.


Background
Cancer arising from the oral cavity accounts for approximately 1.6% of all cancers diagnosed in the United States, with an incidence of 22,000 new cases per year [1]. Despite advances in multimodality treatment, the overall prognosis for patients with oral squamous cell carcinoma (SCC) has remained unchanged over the past three decades. Furthermore, the variability in the clinical course of patients with oral SCC remains unexplained, and conventional clinicopathological parameters cannot fully account for it. Identification of novel prognostic factors may allow a rational selection of the most appropriate therapeutic options for individual patients. The cellular and molecular heterogeneity of oral SCC and the large number of genes potentially involved in oral carcinogenesis and progression emphasize the importance of studying multiple gene alterations on a global scale. Gene expression profiling with high-throughput technologies has proven to be a valuable tool for predicting outcome and progression in human malignancies, including head and neck cancer [2][3][4][5][6][7][8][9][10]. These technologies allow individual cancers to be classified and enhance our understanding of molecular cancer pathogenesis.
Oral cavity cancer arises from several distinct subsites, including the buccal mucosa, oral tongue, floor of mouth, gingiva, retromolar trigone and hard palate. Because these subsites differ in their biological and clinical behavior, the present study focused on one subsite: the oral tongue. This study used high-density oligonucleotide arrays to generate a molecular portrait of oral tongue SCC and to explore the correlations between gene expression patterns and clinically relevant parameters. We performed hierarchical clustering analysis, compared gene expression profiles between primary tumors and their matched normal mucosa, and compared patient groups defined by lymph node status and tumor stage to identify clinically significant genes. Data from the microarray analysis were then validated by real-time RT-PCR. The present study is the first to demonstrate the ability of gene expression profiling to predict clinical outcome in a single cancer subsite within the oral cavity.

Tumor Selection
Following guidelines established by the Institutional Review Board at Memorial Sloan-Kettering Cancer Center (MSKCC), fresh tissue samples were collected sequentially, after written informed consent was obtained, from 49 patients undergoing therapeutic surgical resection for SCC of the oral tongue at the Head and Neck Service, MSKCC, between January 28, 1998 and January 2, 2002. Post-operative adjuvant treatment was given to selected patients according to the institutional protocol. In each case, a portion of the tumor was resected near its advancing edge to avoid the necrotic center. After excision, the tissues were immediately snap-frozen and stored in liquid nitrogen until use. Histologically normal mucosa of the upper aerodigestive tract, resected 5 cm away from the tumor, was obtained in all cases and used as a control. Tumors were staged according to the AJCC/UICC TNM classification, 5th edition [11]. In this study, "node-positive cases" refers to the presence of positive cervical nodes on histological diagnosis after neck dissection, while patients who experienced no metastasis for at least 12 months post-operatively were scored as "node-negative cases." The clinical and pathological characteristics of all patients analyzed in the study are summarized in Table 1.

Oligonucleotide microarray analysis
Tumor and normal tissues from 37 of the 49 patients were used for the oligonucleotide microarray analysis. Twenty of these 37 patients (the TN paired set) had both primary tumor samples and matched normal mucosa available for analysis. Total RNA was extracted from the snap-frozen tissue samples of the 37 patients with TRIzol™ reagent (Gibco BRL) following the manufacturer's protocol and re-purified on RNeasy Mini spin columns (Qiagen). Five to 10 μg of total RNA was reverse transcribed in the presence of an oligo(dT)-T7 primer. The resulting cDNA was used as template for an in vitro transcription amplification reaction in the presence of biotinylated nucleotides. Fifteen μg of labeled cRNA was fragmented and then hybridized to Affymetrix HG_U95Av2 oligonucleotide arrays (Affymetrix, Santa Clara, CA). The arrays were scanned using a Hewlett Packard confocal laser scanner and analyzed using MicroArray Suite 5.0 (Affymetrix).

RNA preparation and real-time RT-PCR
Real-time RT-PCR of GLUT3, HSAL2, and PACE4 was performed on the larger cohort of 49 patients. Two μg of total RNA was reverse transcribed with MultiScribe™ Reverse Transcriptase (Applied Biosystems, Inc.). Gene-specific primers were designed using the Primer3 program. The PCR primer sequences (5'-3') were as follows: GLUT3 forward: TAGAAAGCCTGTTCCCCTCA, GLUT3 reverse: GTGGCGGGATTACTTCAAAA; HSAL2 forward: CCCTCCTATTTCAGCCTCCT, HSAL2 reverse: TCTTCAGTACCGGCACCTTC; PACE4 forward: CCTGTGTGACCCTCTGTCCT, PACE4 reverse: GGTTCATCCACGCACTTTTT. The primer set for 18S rRNA has been described previously [12]. Transcripts were quantified on the iCycler Detection System (Bio-Rad Laboratories) using SYBR Green detection. Relative quantification of each target gene against the reference (18S rRNA) was performed as described [13]. Unless otherwise stated, each assay included duplicate reactions for each sample and was repeated twice.
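The relative quantification method of [13] is not reproduced in this section. As a hedged illustration only, the sketch below assumes the widely used comparative Ct (2^(-ΔΔCt)) method, with 18S rRNA as the reference gene and each patient's matched normal mucosa as the calibrator. It also assumes roughly equal, near-100% amplification efficiencies for target and reference. The function names and Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def mean_ct(replicates: Sequence[float]) -> float:
    """Average replicate Ct values (e.g. duplicate wells from two runs)."""
    return mean(replicates)


def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Normalize the target-gene Ct to the reference gene (here 18S rRNA)."""
    return ct_target - ct_reference


def fold_change_ddct(ct_target_tumor: float, ct_ref_tumor: float,
                     ct_target_normal: float, ct_ref_normal: float) -> float:
    """Tumor expression relative to matched normal mucosa, as 2 ** -ddCt.

    Assumes both amplicons double every cycle (about 100% efficiency).
    """
    dd_ct = (delta_ct(ct_target_tumor, ct_ref_tumor)
             - delta_ct(ct_target_normal, ct_ref_normal))
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values for one patient: duplicate wells, assay repeated twice.
tumor_glut3 = mean_ct([24.1, 24.3, 24.2, 24.2])
tumor_18s = mean_ct([9.0, 9.1, 8.9, 9.0])
normal_glut3 = mean_ct([26.0, 26.2, 26.1, 26.1])
normal_18s = mean_ct([9.0, 9.0, 9.1, 8.9])

fc = fold_change_ddct(tumor_glut3, tumor_18s, normal_glut3, normal_18s)
print(f"GLUT3 tumor/normal fold change: {fc:.2f}")
```

A ratio of 2.0 or more would meet the two-fold overexpression cut-off used in the study. If the target and reference primers amplify with markedly different efficiencies, an efficiency-corrected model would be the safer choice.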

Statistical analysis
All correlation and outcome analyses were performed using the JMP statistical software package, version 4.0.0 (SAS Institute, Inc.). Disease-free survival was defined as the time from surgery to the first recurrence or death. Relapse-free survival was defined as the time from surgery to the first recurrence. Overall survival was defined as the time from surgery to death or last follow-up.
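The three endpoint definitions above translate directly into (time, event) pairs, which a Kaplan-Meier product-limit estimator then turns into survival curves. The sketch below is a hedged illustration, not the JMP procedure the study used. The censoring rules (a patient without the event is censored at death for relapse-free survival, otherwise at last follow-up), the month length, and the `Patient` record are assumptions introduced for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

DAYS_PER_MONTH = 30.4375  # average Gregorian month length (assumed unit)


@dataclass
class Patient:  # hypothetical record layout
    surgery: date
    last_follow_up: date
    recurrence: Optional[date] = None  # date of first recurrence, if any
    death: Optional[date] = None       # date of death, if any


def endpoint(p: Patient, kind: str) -> Tuple[float, bool]:
    """Return (months from surgery, event observed) for "DFS", "RFS" or "OS".

    Censoring is an assumption: without the event, a patient is censored at
    death (possible only for RFS) or otherwise at last follow-up.
    """
    if kind == "DFS":
        events = [d for d in (p.recurrence, p.death) if d is not None]
    elif kind == "RFS":
        events = [p.recurrence] if p.recurrence is not None else []
    elif kind == "OS":
        events = [p.death] if p.death is not None else []
    else:
        raise ValueError(f"unknown endpoint: {kind}")
    if events:
        end, observed = min(events), True
    else:
        end = p.death if p.death is not None else p.last_follow_up
        observed = False
    return (end - p.surgery).days / DAYS_PER_MONTH, observed


def kaplan_meier(data: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, survival) at each distinct event time."""
    data = sorted(data)
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, events, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            events += data[i][1]
            leaving += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # everyone observed at time t leaves the risk set
    return curve


cohort = [  # hypothetical patients
    Patient(date(1999, 3, 1), date(2003, 3, 1)),
    Patient(date(1999, 6, 1), date(2001, 1, 15),
            recurrence=date(2000, 6, 1), death=date(2001, 1, 15)),
    Patient(date(2000, 2, 1), date(2002, 8, 1), recurrence=date(2001, 2, 1)),
]
for t, s in kaplan_meier([endpoint(p, "OS") for p in cohort]):
    print(f"{t:.1f} months: S = {s:.2f}")
```

Keeping endpoint derivation separate from estimation means the same estimator serves all three endpoints; comparing curves between expression groups would additionally need a test such as the log-rank test, which is not shown.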

Gene expression patterns in oral tongue SCC
We analyzed gene expression profiles in 20 patients with oral tongue SCC by comparing primary tumor samples with their matched, morphologically normal mucosa. Among the 12,625 probe sets on the Affymetrix array, 77 showed a statistically significant difference (P < 0.05) between the tumors and their matched normal tissues. Sixty probe sets, representing 45 genes and 11 ESTs, were increased, and 17 probe sets, representing 9 genes and 8 ESTs, were decreased in tumors compared with normal controls. Table 2 lists the up-regulated and down-regulated genes along with their fold changes in tumor relative to normal tissue. These include genes involved in processes relevant to oncogenesis, such as cell proliferation, apoptosis, development, angiogenesis, invasion and metastasis, as well as genes that have not previously been implicated in oral tongue carcinogenesis. Among the 45 overexpressed genes, MMP-1, which encodes interstitial collagenase, showed the largest increase (average: 34.18-fold). MMP-7 and MMP-12 were also overexpressed. Matrix metalloproteinases (MMPs), a family of 23 human zinc-dependent extracellular endopeptidases that degrade extracellular matrix and basement membrane during tumor cell invasion, have been implicated in a number of human tumors, including head and neck SCC [14,15]. Consistent with this, enhanced MMP-1 expression has been associated with malignant progression as well as poor outcome in head and neck SCC [15][16][17]. Likewise, MMP-7 and MMP-12 have both been implicated in tumor aggressiveness in oral SCC [18]. Genes involved in epithelial development and differentiation, such as the cytokeratins KRT16 and KRT17, were also overexpressed. In their study of RNA from head and neck SCC and normal tissues, Villaret et al. (2000) found KRT6 and KRT16 to be the most commonly expressed genes [19].
Similarly, KRT16 has been found to be highly expressed in squamous cell carcinoma of the skin [20]. In addition, genes that play a role in angiogenesis, such as hypoxia-inducible factor 1α (HIF-1α) and platelet-derived endothelial cell growth factor (ECGF1), were also overexpressed. Several transcripts were significantly underexpressed or absent in tumors compared with matched normal tissues, including those encoding cell surface (CO-029), nuclear (ZAKI-4) and extracellular (hSBP) proteins.

Hierarchical clustering was performed on four sample sets: (1) the TN cluster set (n = 37); (2) the TN paired cluster set (n = 20); (3) the tumor samples by themselves (T, n = 31); and (4) the normal samples by themselves (N, n = 26). The data were clustered using the standard hierarchical method with Ward linkage, with the distance between samples derived from the Pearson correlation coefficient ρ as dist = (1 - ρ)/2. Before clustering, the data were filtered to remove genes that were scored Absent (A) by the MAS 5.0 software in 75% or more of the samples, as these are likely to be measuring noise in the system. To assess the robustness of the clustering results, a resampling method was used to create 1000 replica datasets by adding Gaussian noise to each data point. Each of these 1000 datasets was clustered individually, and a consensus tree was built from them. The number at each node of the tree indicates how often that subtree appears among the 1000 replica trees; the higher the number, the more robust the subtree. The samples from the TN cluster set clearly segregated the tumor from the normal samples (Figure 1A). Similarly, the TN paired set separated the tumor samples from their matched normal counterparts (Figure 1B). The two clusterings exhibit nearly identical patterns of gene expression changes. As shown in Table 2, the overexpressed genes included those involved in tumor invasion, epithelial development and angiogenesis.
In contrast, clustering analysis failed to show significant segregation of patients based on expression profiling in either the T or the N cluster set, possibly due to the heterogeneous nature of the samples as well as the relatively small number of samples in this study (Figure 2).

Figure 1: Hierarchical clustering of the gene expression data for (A) the TN cluster set, N = 37, and (B) the TN paired cluster set, N = 20. Approximately 12,625 genes were clustered using the method described in the text. The genes shown represent the top 80 genes that were up-regulated (red) and down-regulated (green) in the sample sets. Normal tissue (N) and tumor tissue (T) labels are followed by their corresponding case numbers.
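The clustering preprocessing described in this section (Absent-call filtering, the correlation distance dist = (1 - ρ)/2, and Gaussian-noise replicas) can be sketched in Python. This is an illustration under stated assumptions, not the pipeline actually used. Samples are columns of a gene-by-sample matrix of MAS 5.0 signals (presumably log-scaled). The noise level `sigma` is hypothetical because the study does not report it. The Ward clustering of each replica and the consensus-tree construction are not shown; a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` could perform the clustering step.

```python
import math
import random
from typing import Dict, Iterator, List, Sequence

Matrix = Dict[str, List[float]]  # gene -> one value per sample


def filter_absent(signal: Matrix, calls: Dict[str, Sequence[str]],
                  max_absent_fraction: float = 0.75) -> Matrix:
    """Drop genes called Absent ("A") in at least 75% of samples."""
    kept = {}
    for gene, values in signal.items():
        absent = sum(1 for call in calls[gene] if call == "A")
        if absent / len(values) < max_absent_fraction:
            kept[gene] = list(values)
    return kept


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sample_distances(matrix: Matrix) -> List[List[float]]:
    """Pairwise sample distances dist = (1 - r) / 2, which lie in [0, 1]."""
    genes = list(matrix)
    n = len(matrix[genes[0]])
    columns = [[matrix[g][s] for g in genes] for s in range(n)]
    return [[(1.0 - pearson(columns[i], columns[j])) / 2.0 for j in range(n)]
            for i in range(n)]


def noisy_replicas(matrix: Matrix, n_replicas: int = 1000, sigma: float = 0.1,
                   seed: int = 0) -> Iterator[Matrix]:
    """Yield copies of the data with independent Gaussian noise on every value.

    sigma is a hypothetical choice; the study does not state its noise level.
    """
    rng = random.Random(seed)
    for _ in range(n_replicas):
        yield {g: [v + rng.gauss(0.0, sigma) for v in vals]
               for g, vals in matrix.items()}


# Hypothetical two-sample example.
signal = {"g1": [1.0, 3.0], "g2": [2.0, 2.0], "g3": [5.0, 7.0]}
calls = {"g1": ["P", "P"], "g2": ["P", "P"], "g3": ["A", "A"]}
kept = filter_absent(signal, calls)  # g3 is dropped
dist = sample_distances(kept)        # the two samples are perfectly anti-correlated
```

Mapping ρ from [-1, 1] to a distance in [0, 1] places perfectly correlated samples at 0 and perfectly anti-correlated samples at 1. Note that Ward linkage is formally defined for Euclidean distances, so applying it to this correlation-based distance is a heuristic choice.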

Assessment of correlation with lymph node status, stage and outcome
We grouped patients with pathologic stage I and II disease into an early-stage category and patients with pathologic stage III and IV disease into a late-stage category. Through statistical regression analysis, we identified genes whose expression differed between tumor and normal mucosa and those whose expression was most different between the staging subgroups (Table 3A). The same analysis was performed to compare patients without cervical nodal metastasis (N0) with those with nodal disease (N1-N3) (Table 3B). We analyzed data from the patients (n = 20) for whom tumor and matched normal mucosa were available, rather than the larger group of patients (n = 37), in order to obtain a more meaningful comparison of gene expression changes between tumor and normal mucosa. We selected three genes, GLUT3, HSAL2 and PACE4, for further analysis and validation in a larger cohort of 49 patients; selection was based on known important roles in cellular functions and carcinogenesis and on the availability of antibodies. We employed two-step quantitative RT-PCR to validate, in all 49 cases, the expression changes identified by gene array analysis for the three selected genes. We defined the cut-off value for overexpression as two-fold or greater relative to matched normal controls. Using this criterion, 30.6%, 24.5% and 26.5% of patients expressed high levels of GLUT3, HSAL2 and PACE4, respectively. We assessed the prognostic significance of expression of the selected genes together with various clinicopathological parameters. Univariate analyses demonstrated that GLUT3 overexpression correlated with depth of invasion (P < 0.0001), tumor size (P = 0.024), pathological stage (P = 0.009) and recurrence (P = 0.038). HSAL2 expression was positively associated with depth of invasion (P = 0.015) and advanced T stage (P = 0.047). PACE4 expression failed to show correlation with any clinicopathological parameter.
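The grouping and thresholding steps in this paragraph can be sketched as follows, with caveats. The section names the test only as "statistical regression analysis (t test)", so the paired t statistic on per-patient log2 tumor/normal ratios below is an assumption; a p-value would come from Student's t distribution with n - 1 degrees of freedom, for example via SciPy's `scipy.stats.ttest_rel` applied to log values. Treating the "average" fold change as an arithmetic mean of per-patient ratios is likewise assumed. The stage grouping and the two-fold cut-off follow the text; all input values are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_log2_t(tumor: Sequence[float],
                  normal: Sequence[float]) -> Tuple[float, float, int]:
    """Compare one gene across matched tumor/normal pairs (assumed test).

    Returns (mean tumor/normal ratio, paired t statistic on log2 ratios, df).
    """
    if len(tumor) != len(normal) or len(tumor) < 2:
        raise ValueError("need at least two matched tumor/normal pairs")
    ratios = [t / n for t, n in zip(tumor, normal)]
    log_ratios = [math.log2(r) for r in ratios]
    # Division below fails if every log ratio is identical (zero spread).
    std_err = stdev(log_ratios) / math.sqrt(len(log_ratios))
    return mean(ratios), mean(log_ratios) / std_err, len(log_ratios) - 1


def stage_group(stage: str) -> str:
    """Pathologic stage I/II -> "early"; III/IV (including IVA-IVC) -> "late"."""
    s = stage.strip().upper()
    if s in ("I", "II"):
        return "early"
    if s.startswith(("III", "IV")):
        return "late"
    raise ValueError(f"unrecognized stage: {stage}")


def overexpression_summary(fold_changes: Sequence[float],
                           cutoff: float = 2.0) -> Tuple[int, float]:
    """Number and percentage of patients at or above the fold-change cut-off."""
    n_high = sum(1 for fc in fold_changes if fc >= cutoff)
    return n_high, 100.0 * n_high / len(fold_changes)


# Hypothetical array signals for one gene in four matched pairs.
ratio, t_stat, df = paired_log2_t([8.0, 4.0, 16.0, 8.0], [1.0, 1.0, 2.0, 2.0])

# 15 of 49 hypothetical patients at or above 2-fold gives 30.6% after rounding.
n_high, pct = overexpression_summary([2.5] * 15 + [1.0] * 34)
print(f"{n_high} patients ({pct:.1f}%) over-expressing")
```

As a consistency check on the reported figures, and assuming all 49 patients were evaluable for each gene, 30.6%, 24.5% and 26.5% correspond to 15, 12 and 13 patients, respectively.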
Table 4 depicts the univariate analysis of GLUT3, HSAL2 and PACE4 expression and various clinicopathologic parameters. In survival studies, only GLUT3 showed prognostic value, for disease-free survival (P = 0.049), relapse-free survival (P = 0.002) and overall survival (P = 0.003) (Figure 3). Multivariate analysis with Cox's proportional hazards model revealed that all parameters remained independent prognosticators in this group of patients.

Malignant cells show increased glucose uptake in vitro and in vivo [21,22]. This process is thought to be mediated by glucose transporters (GLUTs), whose expression and activity are regulated by oncogenes, growth factors and cytokines [23,24]. Studies of GLUT genes in human cancers have shown overexpression of GLUT1 and GLUT3 in cancers of various sites, including the head and neck [25][26][27][28]. Recent studies in laryngeal carcinoma demonstrated an association between GLUT3 protein levels and poorer outcome [29].

HSAL2 is a member of a gene family that encodes a group of putative transcription factors. Evidence from various studies suggests that the HSAL gene family is necessary for normal embryonic development and that genetic alterations in it can lead to human congenital defects and cancer [30,31]. HSAL2 is thought to act as a tumor suppressor gene in ovarian cancer [32]. The present study suggests a potential role for GLUT3 and HSAL2 in oral tongue SCC.

Proprotein convertases (PCs) are a family of serine endoproteases that play important roles in regulating cell function by converting proproteins into biologically active molecules such as neuropeptides and polypeptide hormones, protein tyrosine phosphatases, growth factors and their receptors, and enzymes including MMPs. Numerous members of the PC family have been associated with invasion and proliferation in various cancers, including head and neck, breast and lung cancers [33][34][35].
PCs are thought to activate certain substrates that may play a significant role in carcinogenesis. Among these substrates are MMPs, which are known to be involved in the degradation of extracellular matrix, a key process in the initiation of tumor microinvasion into the connective tissue. PACE4, a member of the PC family, activates membrane-type MMPs (MT-MMPs). Bassi et al. demonstrated that PACE4 expression results in enhanced susceptibility to carcinogenesis in vivo [35]. In the present study, PACE4 failed to show clinical significance when validated by real-time RT-PCR, possibly owing to the small sample size and the heterogeneous nature of the specimens.
There is a growing literature on the use of microarray technology to examine genome-wide gene expression changes associated with head and neck SCC development and to identify biomarkers related to response to therapy and clinical outcome [5][6][7][8][9][10]. However, the studies are discordant. Furthermore, the biomarkers … classified based on lymph node status and the presence of extracapsular spread. Among the genes shown to be associated with metastasis was MMP-9 [38].

Table 3 (caption fragment): … (N1-N3). B; Genes whose expression in tumor differed from that in normal mucosa and which were most different between the clinical stage and nodal disease subgroups were identified by statistical regression analysis (t test). The genes identified within the two subgroups are listed. Note that HSAL2 was identified in both subgroups. Only genes that were present in at least 5 samples were included.

[Figure: Kaplan-Meier plot showing overall survival]
Our findings show overexpression of numerous genes, including MMP-1 and KRT16, that other DNA microarray studies have previously implicated in head and neck carcinogenesis. Furthermore, we have identified and validated GLUT3 and HSAL2 as potential prognosticators in oral tongue SCC. Although the present study, like the others, is based on a relatively small number of patients, and the findings do not allow definitive conclusions regarding the biological importance of these genes, it is the first to use large-scale transcriptional profiling to predict survival outcome in oral tongue SCC.

Conclusion
The use of high-density oligonucleotide probe arrays to identify gene expression differences between oral tongue SCC and normal tissues provides a powerful means of decoding the molecular events involved in the genesis and progression of oral SCC. Although these initial findings will need to be validated against clinical parameters and outcome in larger patient cohorts, the characterization of genes identified as significant predictors by oligonucleotide microarray analysis may provide novel targets for the prognostication and treatment of oral cavity cancer. Finally, a large multi-institutional study of specimens with uniform characteristics, using independent techniques to verify gene expression at the RNA, DNA and protein levels, will be vital in reaching our ultimate goal of improving the care of head and neck cancer patients.