A key genomic signature associated with lymphovascular invasion in head and neck squamous cell carcinoma

Background Lymphovascular invasion (LOI), a key pathological feature of head and neck squamous cell carcinoma (HNSCC), is predictive of poor survival; however, the associated clinical characteristics and underlying molecular mechanisms remain largely unknown. Methods We performed weighted gene co-expression network analysis to construct gene co-expression networks and investigate the relationship between key modules and the LOI clinical phenotype. Functional enrichment and KEGG pathway analyses were performed with differentially expressed genes. A protein–protein interaction network was constructed using Cytoscape, and module analysis was performed using MCODE. Prognostic value, expression analysis, and survival analysis were conducted using hub genes; GEPIA and the Human Protein Atlas database were used to determine the mRNA and protein expression levels of hub genes, respectively. Multivariable Cox regression analysis was used to establish a prognostic risk formula and the areas under the receiver operating characteristic curve (AUCs) were used to evaluate prediction efficiency. Finally, potential small molecular agents that could target LOI were identified with DrugBank. Results Ten co-expression modules in two key modules (turquoise and pink) associated with LOI were identified. Functional enrichment and KEGG pathway analysis revealed that turquoise and pink modules played significant roles in HNSCC progression. Seven hub genes (CNFN, KIF18B, KIF23, PRC1, CCNA2, DEPDC1, and TTK) in the two modules were identified and validated by survival and expression analyses, and the following prognostic risk formula was established: [risk score = EXPDEPDC1 * 0.32636 + EXPCNFN * (− 0.07544)]. The low-risk group showed better overall survival than the high-risk group (P < 0.0001), and the AUCs for 1-, 3-, and 5-year overall survival were 0.582, 0.634, and 0.636, respectively. Eight small molecular agents, namely XL844, AT7519, AT9283, alvocidib, nelarabine, benzamidine, L-glutamine, and zinc, were identified as novel candidates for controlling LOI in HNSCC (P < 0.05). Conclusions The two-mRNA signature (CNFN and DEPDC1) could serve as an independent biomarker to predict LOI risk and provide new insights into the mechanisms underlying LOI in HNSCC. In addition, the small molecular agents appear promising for LOI treatment.


Background
Head and neck squamous cell carcinoma (HNSCC) is one of the most common cancers with high morbidity and mortality rates worldwide; > 90% of head and neck cancers are squamous cell carcinomas that arise in the oral cavity, oropharynx, and larynx [1]. Metastasis is the main cause of treatment failure and an important factor affecting prognosis [2]. Thus, elucidating the underlying genomic changes seems valuable for controlling lymph node metastases.
In case of HNSCC, advanced TNM stage, histological grade, and lymph node status, which are well-known major risk factors of metastatic disease and poor overall survival (OS) and disease-free survival, are poor prognostic indicators [3][4][5]. Lymphovascular invasion (LOI) has been associated with lymph node metastasis in HNSCC [6][7][8]. Thus, identification of effective molecular prognosticators of LOI should be a useful way to decrease the risk of metastasis in patients with HNSCC.
According to recent studies, the clinical characteristics of and parameters contributing to LOI remain uncertain. In fact, the incidence of LOI in patients with HNSCC is highly inconsistent, varying from 14 to 47% [9,10]. This considerable variation can be attributed to small sample sizes, distribution differences, and heterogeneity of HNSCC. Therefore, it is imperative to conduct clinical studies with large sample sizes to analyze the genomic and clinical characteristics of LOI. This should, consequently, facilitate the development of novel therapeutic targets, enhancing the survival of HNSCC patients with LOI.
The Cancer Genome Atlas (TCGA) has generated comprehensive, multidimensional maps of key genomic changes in several types of cancers, including HNSCC, and provided histopathological annotations and clinical survival data relevant to patients with HNSCC over a follow-up duration of 10 years. This has enabled the systematic evaluation of the relationship between LOI and gene signatures, providing clarity on key gene modules involved in LOI in patients with HNSCC. This has in turn provided us with comprehensive, systemic understanding of LOI not only at the genomic but also at the prognostic level.

Methods
Patient selection and data pre-processing Data pertaining to patients with HNSCC were downloaded from TCGA database. RNA expression profiles and clinical survival data of 500 patients were obtained (Table 1). Among them, clinical prognosis data of 339 patients were available. According to the threshold of |logFC| > 1 and P < 0.05, 2248 genes that met the criteria were identified as differentially expressed genes (DEGs) (Additional file 1: Table S1). The intersection of DEGs based on the NCBI Gene (Additional file 2: Table S2) and Online Mendelian Inheritance in Man (OMIM) (Additional file 3: Table S3) databases was performed using the Venn Diagram package in the R language.

Construction of the co-expression network
Based on mRNA expression data, the scale-free gene modules of co-expression were constructed using weighted gene co-expression network analysis (WGCNA) [11,12]. To ensure reliability of the coexpression network, hierarchical clustering was performed based on Euclidean distance, and two outlier samples were removed. Module-trait associations were considered to be important clinical characteristics between the clinical phenotype and module eigengenes (MEs). We analyzed the module-trait correlation and determined relevant modules, which were closely associated with the LOI clinical phenotype. An adequate softthreshold power that met the scale-free topology criterion was selected for transforming the former correlation matrix into an adjacency matrix, which was subsequently converted into a Topological Overlap Matrix (TOM) using the "TOM similarity" function in R. TOM-based dissimilarity was computed as measure distance, and an mRNA clustering dendrogram and module colors were obtained. In the clustering dendrogram, the minimum module size and cut height were separately set to 30 and 0.25, respectively. For key gene modules, gene significance and module membership indicated a positive correlation level between RNA expression profiles and the LOI clinical phenotype and between RNA expression profiles and clinical MEs.

Enrichment analysis of key co-expression modules
As per previously reported methodology [12,13], aberrantly expressed mRNAs in key gene modules were selected, and gene ontology (GO) function and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed. For the former, corresponding genes were classified into the biological process (BP) category, and for the latter, genes within key co-expression modules were used to detect the function of gene modules. P < 0.05 indicated statistical significance.

Protein-protein interaction (PPI) network analysis and hub gene identification
As previously reported [14,15], key gene co-expression modules were further explored to predict gene function correlation using the STRING database (confidence score > 0.9). Cytoscape was employed to screen significant gene pairs in the PPI network [16]. We further screened the PPI network modules by performing Molecular Complex Detection (MCODE) analysis. The criteria of MCODE were as follows: degree cutoff = 2, node cutoff = 0.2, maximum depth = 100, and k-score = 2. Finally, 24 genes were selected as hub genes and further analyzed using univariate survival analysis. Seven genes with significant prognostic differences were selected as characteristic genes, with P < 0.05.

Survival analysis of hub genes
According to the expression profiles of characteristic genes, Kaplan-Meier analysis was performed to explore prognostic differences; Cox proportional hazard ratio and 95% confidence interval were used for analysis. P < 0.05 indicated statistical significance. The least absolute shrinkage and selection operator (LASSO) model was then used to identify vital mRNAs from the prognostic hub genes. The LASSO method was utilized by the package "glmnet" in the R (version 3.5.1) software.

mRNA expression analysis
We used GEPIA (http://gepia.cancer-pku.cn/), a webbased tool that delivers fast and customizable functionalities based on TCGA and Genotype-Tissue Expression data, to analyze mRNA expression levels of the seven hub genes [17].

Immunohistochemistry analysis
To validate the protein expression levels of the seven hub genes, as per the method reported by Jian et al. [18], we used the Human Protein Atlas database (https://www.proteinatlas.org/) (HNSCC samples = 519, normal tissue samples = 44; scale bar = 200 μm). All captured images were manually annotated by certified pathologists.

Establishment of prognostic risk score formula
In light of the expression level of the hub genes and regression coefficients, a prognostic risk formula was established by multivariable Cox regression analysis. A risk score was calculated for each patient using this formula. All patients were consequently classified into a high-and low-risk group by utilizing the median risk score as the cutoff value. Next, the Kaplan-Meier survival curve was used to compare prognosis between the low-and high-risk groups. Moreover, a time-dependent receiver operating characteristic (ROC) curve was applied to assess diagnostic accuracy based on the risk score for 1-, 3-, and 5-year OS probability. P < 0.05 indicated statistical significance.

Identification of small molecular drugs
DrugBank is a comprehensive, high-quality, freely accessible, online database that combines quantitative drug data and target information [19]. The turquoise and pink modules in the PPI network were mapped onto the DrugBank database. |Connectivity score| > 2 was used as the cutoff value to identify small molecular drugs that could target HNSCC.

Statistical analysis
Univariate analysis was performed using SPSS 17.0 (SPSS Inc., Chicago, IL, USA). Cumulative survival time was calculated and analyzed by the Kaplan-Meier and log-rank test. Differences between the groups were tested by the chi-square or Fisher's exact test. P < 0.05 was considered statistically significant.

WGCNA and key module analysis
The initial quality was assessed using the average linkage method. Two outlier samples were removed after the clustering. The remaining 339 HNSCC and 44 normal tissue samples with clinical information pertaining to LOI were used for subsequent analyses. In total, 2601 genes showed the highest variance via the average linkage/hierarchical clustering method.
To establish a scale-free network, the scale-free index (Fig. 1a) and mean connectivity (Fig. 1b, c) were calculated. We found that when the power value of β = 7, the scale-free topology for the fitting index reached 0.85 (Fig. 1d). Different genes were subsequently grouped into modules according to the association of expression. Moreover, genes with similar expression patterns were placed into different modules via average linkage clustering. Finally, a total of 10 modules were identified (Fig. 2). On exploring the correlation between the MEs and LOI clinical phenotype, we found that 10 co-expression modules were correlated with the LOI clinical phenotype ) and were associated with cancer status, particularly turquoise and pink key modules (Fig. 3b). Then, scatter diagrams were constructed for correlation analyses between gene significance for LOI status and module membership in the turquoise (Fig. 3c) and pink ( Fig.  3d) modules, which revealed that genes in the two modules were significantly related with LOI status. The correlation and P values (Fig. 3c, d) indicated that the turquoise and pink modules showed high correlations with LOI status.

Enrichment analysis of key co-expression modules
To determine the function of genes in the key coexpression modules, GO function and KEGG pathway analyses were performed. GO function analysis showed that the turquoise module was associated with DNA replication, mitotic nuclear division, chromosome segregation, nuclear division, and DNA-dependent DNA replication, whereas KEGG pathway analysis indicated that it was associated with cell cycle, DNA replication, mismatch repair, and p53 signaling pathway (P < 0.05) (Fig. 4a, b). Similarly, GO function analysis indicated that the pink module was involved in not only squamous cell functions, such as epidermal cell differentiation, keratinocyte differentiation, skin development, epidermis development, and cornification (P < 0.05), but also regulation of protein secretion, for example, via the negative regulation of proteolysis, peptidase activity, and endopeptidase activity (P < 0.05) (Fig. 4c).
These results indicated that the turquoise and pink modules played a pivotal role in LOI in patients with HNSCC.

PPI network analysis and hub genes
To identify hub genes in the key modules, PPI network analysis was performed using the STRING database.
Connection threshold was used to define the hub genes; 89 genes, including the top five genes KIF18B, BUB1, BUB1B, KIF4A, and EXO1, in the turquoise module (connect threshold > 0.25) and 38 genes, including the top five genes KRT78, CNFN, SLURP1, PRSS27, and CRCT1, in the pink module (connect threshold > 0.10) were screened as candidate hub genes (Fig. 5, Additional file 4: Table S4, Additional file 5: Table S5). In addition, connect degree (> 6) was used to define the hub genes, which led to the identification of 24 hub genes (18 in the turquoise module and 6 in the pink module).

Prognostic value and expression analysis of hub genes
After excluding samples with no survival information or survival duration < 1 month, 339 HNSCC samples were used to evaluate the prognosis of the 24 hub genes. We found that HNSCC samples with LOI showed a poor clinical outcome than those without LOI (P < 0.05), indicating that LOI is a key histological characteristic in HNSCC (Fig. 6a). Univariate survival analysis was then performed using the R-package survival, and the results indicated that CNFN was associated good survival ( Fig.  6a) but KIF18B, KIF23, PRC1, CCNA2, DEPDC1, and TTK were associated with poor survival in HNSCC samples with LOI (P < 0.05; Fig. 6c-h).
To determine mRNA expression levels of the seven hub genes (CNFN, KIF18B, KIF23, PRC1, CCNA2, DEPDC1, and TTK), we used GEPIA and found that CNFN was significantly downregulated but KIF18B, KIF23, PRC1, CCNA2, DEPDC1, and TTK were significantly upregulated in HNSCC (P < 0.05; Fig. 6i). To assess protein expression levels of the seven hub genes, we performed protein expression analyses using the HPA database (Fig.  6j). The expression level of CNFN was low and thus could  (Fig. 6k).

Establishment of the prognostic risk score formula
Using the LASSO method and multivariable Cox regression analysis, two mRNAs (CNFN and DEPDC1) were identified as integrated prognostic biomarkers in patients with HNSCC. We then established a prognostic risk score formula based on the expression profiles of CNFN and DEPDC1 and their regression coefficients. The prognostic risk score formula was as follows: risk score = EXP DEPDC1 * 0.32636 + EXP CNFN * (− 0.07544). The risk score was calculated for all patients, classifying patients into the highrisk (n = 165) and low-risk (n = 165) group using the median risk score as the cutoff value (Additional file 6: Table  S6). The distribution of risk scores and survival status of patients are shown in Fig. 7a and b, respectively. We then assessed the prognostic value of the aforementioned formula using Kaplan-Meier analysis. Patients in the low-risk group showed better OS than those in the high-risk group (P < 0.001; Fig. 7c). Moreover, timedependent ROC analysis was utilized to evaluate the prognostic capacity of the formula. The areas under the ROC curve for 1-, 3-, and 5-year OS were 0.582, 0.634, and 0.636, respectively, implying that the integrated two-mRNA signature was much better at predicting the risk of LOI in patients with HNSCC (Fig. 7d).

Identification of small molecular agents
To determine which small molecular agents in the turquoise and pink modules could target LOI, we searched all drug-gene interactions in the DrugBank database. |Connectivity score| > 2 and P < 0.05 were used for screening; we found that five drug-module interactions (XL844, AT7519, AT9283, alvocidib, and nelarabine) in the turquoise module and three drug-module interactions (benzamidine, Lglutamine, and zinc) in the pink module could be used to  Table 2). To investigate the clinical application of the eight small molecular agents in head and neck cancer or solid tumor, we examined the clinical trial registration of these agents using ClinicalTrials.gov (https:// clinicaltrials.gov/ct2/home). Although a study on benzamidine remains to be conducted, three clinical trials of Lglutamine (NCT03015077, NCT02282839, NCT00006994) and zinc (NCT00036881, NCT03531190, NCT02868151)  Table S7). Moreover, XL844 (NCT00475917), AT7519 (NCT00390117, NCT02503709), AT9283 (NCT00443976, NCT00985868), alvocidib (NCT00080990), and nelarabine (NCT01376115) have been explored in the context of solid tumor/cancer. These findings indicated that XL844, AT7519, AT9283, alvocidib, nelarabine, benzamidine, L-glutamine, and zinc appear promising for treating LOI.

Discussion
Metastasis is the leading cause of treatment failure in patients with HNSCC [20]. Nodal metastatic disease is an independent factor for poor survival in HNSCC [21][22][23]. Several clinicopathological parameters have been associated with nodal metastasis, such as tumor size [9], tumor depth [24], tumor differentiation [25], histological grade [26], and LOI [4]. Herein we performed comprehensive, integrative genomic analyses of LOI in patients with HNSCC from the molecular to clinical and prognostic levels. We established a novel two-mRNA signature for predicting LOI risk in HNSCC. The survival curves indicated that the low-and high-risk groups stratified by the mRNA signature had a significant difference in prognoses. Time-dependent ROC analysis revealed that the mRNA signature had a high accuracy in predicting OS. Moreover, the small molecular agents, namely XL844, AT7519, AT9283, alvocidib, nelarabine, benzamidine, L-glutamine, and zinc, were identified as novel candidates for treating LOI.
With the application of sequencing techniques, genomic studies have transitioned from assessing aberrant expression levels of individual genes to systematically integrating omics data from cancer tissues. The molecular mechanisms underlying LOI remain unclear. TCGA database has been used by several studies to define the genomic landscape of HNSCC, providing us an opportunity to integrate genomics data and understand molecular changes associated with LOI. In the current study, we constructed a co-expression network module of HNSCC and found that the turquoise and pink modules were significantly associated with LOI. Functional enrichment analysis indicated that the key gene modules were involved in not only squamous cell functions, such as epidermal cell differentiation, keratinocyte differentiation, skin development, epidermis development, and cornification but also regulation of protein secretion, for instance, via the negative regulation of proteolysis, peptidase activity, and endopeptidase activity. Furthermore, the turquoise module was associated with DNA replication, mitotic nuclear division, nuclear division, and DNA-dependent DNA replication. KEGG pathway analysis validated that the turquoise module was associated with cell cycle, DNA replication, mismatch repair, and p53 signaling pathway, indicating the involvement of pertinent genes in LOI in patients with HNSCC.
Lymphatic vessels are remodeled by the tumor microenvironment, including cancer cells, mutations of oncogenic driver genes, and interactions between immune checkpoint signals and their receptors [27]. Herein we systematically analyzed the mRNA expression level of 339 HNSCC samples with LOI and 44 normal tissue samples; 2522 DEGs were identified. PPI network and module analyses showed that 18 genes in the turquoise module (e.g., KIF18B, BUB1, BUB1B, KIF4A, and EXO1) and six genes in the pink module (e.g., KRT78, CNFN, SLURP1, PRSS27, and CRCT1) were associated with LOI in HNSCC. However, the roles and mechanisms of these 24 genes in the metabolic and immune reprogramming of the tumor microenvironment demand further exploration.  Early diagnosis of LOI is pivotal, considering that timely treatment is of utmost importance in HNSCC patients with LOI [28,29]. Despite the development and application of magnetic resonance imaging and positron emission tomography-computed tomography to assess LOI in HNSCC, the detection rate of early-stage LOI remains low [30]. In this study, the hub genes in the key modules related to LOI were screened, and prognostic value and expression analyses showed that CNFN was downregulated and associated with good prognosis, whereas KIF18B, KIF23, PRC1, CCNA2, DEPDC1, and TTK were upregulated and associated with poor prognosis. The two-mRNA signature could stratify the risk of LOI and predict OS in patients with HNSCC; however, there are also some limitations. First, the two-mRNA signature needs to be further explored. Second, the prognostic value of the mRNA panel was not very satisfactory and thus demands additional investigation. Finally, the biological functions and mechanisms of the two mRNAs were not assessed in this study.
Although the targeted treatment for LOI is lacking and unreliable, DrugBank provides comprehensive molecular information pertaining to drugs and their targets for treating LOI. Based on interactions between the drugs and key modules, we found eight small molecular agents (benzamidine, L-glutamine, zinc, XL844, AT7519, AT9283, alvocidib, and nelarabine) that could target LOI. AT7519 and alvocidib, a cyclin-dependent kinase inhibitor, have been reported to target CDK1 and thus proposed to have anticancer effects [31][32][33][34][35]. XL844 is a specific inhibitor of checkpoint kinase-1 and -2 and prevents the formation of a normal mitotic spindle; it can reportedly effectively sensitize cancer cells to induce cell cycle arrest [36]. Clinical trial registration analyses of the eight small molecular agents indicated that they have been widely explored in head and neck cancer or solid tumor. These results indicated that they could be used for treating LOI in patients with HNSCC; however, their roles and mechanisms in the context of LOI require further exploration.

Conclusions
To summarize, we herein performed a comprehensive, integrative genomic analysis of LOI in patients with HNSCC and established a two-mRNA signature that could stratify the risk of LOI and predict OS. Finally, we report that benzamidine, L-glutamine, zinc, XL844, AT7519, AT9283, alvocidib, and nelarabine are novel candidate drugs for controlling LOI in HNSCC.