Identification and validation of a six-gene signature associated with glycolysis to predict the prognosis of patients with cervical cancer

Background Cervical cancer (CC) is one of the most common gynaecological cancers. The gene signature is believed to be reliable for predicting cancer patient survival. However, there is no relevant study on the relationship between the glycolysis-related gene (GRG) signature and overall survival (OS) of patients with CC. Methods We extracted the mRNA expression profiles of 306 tumour and 13 normal tissues from the University of California Santa Cruz (UCSC) Database. Then, we screened out differentially expressed glycolysis-related genes (DEGRGs) among these mRNAs. All patients were randomly divided into training cohort and validation cohort according to the ratio of 7: 3. Next, univariate and multivariate Cox regression analyses were carried out to select the GRG with predictive ability for the prognosis of the training cohort. Additionally, risk score model was constructed and validated it in the validation cohort. Results Six mRNAs were obtained that were associated with patient survival. The filtered mRNAs were classified into the protective type (GOT1) and the risk type (HSPA5, ANGPTL4, PFKM, IER3 and PFKFB4). Additionally, by constructing the prognostic risk score model, we found that the OS of the high-risk group was notably poorer, which showed good predictive ability both in training cohort and validation cohort. And the six-gene signature is a prognostic indicator independent of clinicopathological features. Through the verification of PCR, the results showed that compared with the normal cervial tissuses, the expression level of six mRNAs were significantly higher in the CC tissue, which was consistent with our findings. Conclusions We constructed a glycolysis-related six-gene signature to predict the prognosis of patients with CC using bioinformatics methods. We provide a thorough comprehension of the effect of glycolysis in patients with CC and provide new targets and ideas for individualized treatment. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-020-07598-3.


Background
Cervical cancer (CC) is one of the most common gynaecological cancers, accounting for the fourth cause of cancer-related death in women [1]. According to global cancer statistics, in 2018, there were nearly 570 thousand new cases of CC worldwide, with approximately 310 thousand deaths [1]. In recent years, with the launch of the human papillomavirus (HPV) vaccination programme, the incidence of CC in developed countries has significantly decreased, but in developing countries, it is still on the rise [2], and the age of onset tends to be younger. In addition, a large proportion of patients with CC are found to be in an advanced stage, and at this time, treatment options are extremely limited, and side effects are more serious, with a 5-year survival rate of less than 20% [3][4][5]. Moreover, patients with the same clinical stage and pathological type tend to adopt the same treatment, but the prognosis of patients is different, which is mainly due to the genetic heterogeneity of patients. Therefore, it is necessary to identify effective biomarkers to predict the prognosis of patients with CC.
Aerobic glycolysis is a special mechanism of tumour cell metabolism, also known as the Warburg effect [6], which plays an important role in promoting the growth and metastasis of various tumours, including CC. Some studies have found that HPV protein can promote the development of cancer through the Warburg effect, and the Warburg effect may also contribute to the enhancement of virus replication ability in the early stage of HPV infection [7]. In addition, total lesion glycolysis (TLG) is a measure of tumour metabolic activity, and some retrospective studies have found that TLG was significantly related to the recurrence-free survival (RFS) and overall survival (OS) of patients with CC [8][9][10]. Moreover, some glycolytic enzymes have also been proven to be related to the prognosis of CC. For example, hexokinase-2 (HK2), an enzyme that catalyses the conversion of glucose into glucose-6-phosphate, is overexpressed in a variety of cancers, has a promoting effect on the occurrence and development of CC, and is significantly related to the prognosis of patients [11]. Lactate dehydrogenase A (LDHA) and phosphofructokinase P (PFKP) were found to be significantly correlated with progression-free survival (PFS) and OS in patients with CC, and the expression level of LDHA in recurrent tumours was significantly higher than that in nonrecurrent tumours [12]. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is a classical glycolytic enzyme that has been reported to be significantly increased in CC [13]. However, single gene biomarkers cannot achieve a good prediction effect, and some studies have suggested that gene signatures may be a better choice for predicting patient outcomes.
By mining public databases, many relevant studies have found that the glycolysis-related gene (GRG) signature is closely associated with the prognosis of patients with cancer, such as lung adenocarcinoma [14], liver cancer [15], pancreatic ductal carcinoma [16] and endometrial carcinoma [17]. Nevertheless, there is no bioinformatics research on this in CC. Thus, in this study, we analysed the relationship between the GRG signature and CC through the University of California Santa Cruz (UCSC) Database, which helped us better assess the prognosis of patients and provided new insights for individualized treatment of patients with CC.

Acquisition of mRNA expression dataset
The workflow of the present study is displayed as Fig. 1 Table 2).

Identification and analysis of differentially expressed glycolysis-related genes (DEGRGs)
To obtain the cancer-related genes, we compared the expression levels of 16,208 mRNAs between CC and normal tissues using the limma package. A gene with false discovery rate (FDR) < 0.05 and |logFC| > 1 was defined as the DEG. Then, a list of GRGs (GO_ GLYCOLYTIC_PROCESS, HALLMARK_GLYCOLY-SIS, KEGG_GLYCOLYSIS_GLUCONEOGENESIS, RE ACTOME_GLYCOLYSIS, and BIOCARTA_GLY-COLYSIS_PATHWAY) was downloaded from the Molecular Signatures Database v5.1 (MSigDB). The Venn diagram was to identify the DEGRGs by combining GRGs and DEGs. Furthermore, to understand the potential function and related pathway of DEGR Gs, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were carried out by the "clusterprofiler" package.
Construction and validation of the prognostic risk score signature Previous researches indicated that GRGs have potential prognostic value for cancer patients, but the role in CC patients remains unclear. Therefore, the survival analyses of GRGs in CC patients were performed in the present study. First, according to the ratio of 7: 3, all CC patients were randomly divided into training cohort (193 patients) and validation cohort (80 patients) ( Table 1). In our research, the prognostic signature was developed in the training cohort and validated in the validation cohort. To identify the prognostic value of DEGRGs in CC patients, we performed univariate Cox regression analysis to confirm OS-related DEGRGs in training cohort. Then, based on the OS-related DEGRGs, a stepwise model selection by the Akaike information criterion (AIC) was used to avoid overfitting, and significant genes were incorporated into the multivariate Cox analysis to construct the GRG-based prognostic signature. By linear combination of the expression values of the selected genes weighted with the regression coefficients of the multivariate Cox regression analysis, the prognostic risk score model was established as follows: (β i is the coefficient of the gene; i in multivariate Cox analysis; G i represents the expression value of gene i; n is the number of genes in the signature). The time-dependent receiver operating characteristic (ROC) curve and area under the curve (AUC) value were used to evaluate the discrimination of the prognostic model. Additionally, the optimal cutoff value of risk score was determined by performing X-tile software, and patients were divided into low-and high-risk groups. Kaplan-Meier (K-M) survival curve and the log-rank test were adopted to compare the prognosis between two groups.
To further study the prognostic value of GRG signature in an independent cohort, the risk score of each patient in the validation cohort. Time-dependent ROC curve, AUC value, and K-M survival curve were used to show the prognostic ability of GRG signature in the validation cohort.

Development of a clinical-GRG nomogram for CC patients
As reported in previous studies, several clinical variables were confirmed as prognostic variables, such as AJCC, T classification, N classification and M classification. Therefore, to develop a comprehensive prognostic model for CC patients, we firstly studied the prognostic value of clinical data, including age, grade, ATCC stage and TNM classification in CC patients by univariate Cox analysis. Moreover, the OS-related clinical data and GRG signature were incorporated into the multivariate Cox analysis and the independent variables were used to develop a nomogram. Concordance-index(C-index) and calibration curve were used to evaluate the nomogram.

Specimens sources
Ten cases of CC and 10 cases of normal cervical tissue were obtained from the Department of gynecology and obstetrics, the First Affiliated Hospital of Wenzhou Medical University. Informed consent was provided by all patients. This study was approved by the ethics committee of the First Affiliated Hospital of Wenzhou Medical University. All tissues were pathologically diagnosed and stored in − 80°C for preservation.

Conducting quantitative real time (qRT)-PCR on tissues
Total RNA was refined from clinical specimens by TRI-zol® reagent (Vazyme, Nanjing, China), then, following the manufacturer's protocol, a PrimeScript RT reagent kit (Vazyme) was used to reverse transcription. qRT-PCR was performed to evaluate the DEGRGs expression in different specimens using SYBR-Green Premix (Vazyme) with specifc PCR primers shown in Supplementary Table 3. With GAPDH as the internal control, and the Ct method (2-ΔΔCt) was used to normalize the relative genes expression values.

Correlation analysis between immune cell infiltration and glycolysis
We adopted CIBERSORT [18], which is widely used to describe the immune cell composition of the gene expression profiles to explain the 22 immune cell subtypes. The global P value of each sample deconvolution was determined by CIBERSORT, and samples with CIBER-SORT P < 0.05 was selected for further study. Thus, we included a total of 184 patients from 193 patients. To Abbreviations: T Tumor, N Node (regional lymph node), M Metastasis determine the relationship between immune cell infiltration and glycolysis, we divided 184 patients into highand low-risk groups based on the prognostic risk score and compared the differences in immune cells between the two groups and displayed the results by heatmap and violin plot. Furthermore, we conducted univariate Cox regression, LASSO and multivariate Cox regression analyses to examine whether the risk score and immune cells can be used as independent factors to predict the prognosis of CC patients.

Results
Preliminary screening of glycolysis-related mRNA We obtained the mRNA expression profiles from the UCSC database, including 306 tumour samples and 13 normal samples. By comparing tumour and normal tissue samples, we screened 4772 differentially expressed mRNAs from 16,208 mRNAs. Then, Venn diagram software was applied to identify the DEGRGs in the 4772 differentially expressed mRNAs and 5 glycolytic gene sets, and the results showed that 96 DEGRGs were detected (Fig. 2).
To further analyse the relationship between these genes and glycolysis, we carried out GO analysis and KEGG pathway enrichment analysis on these genes. As a result, we found in GO analysis that the primary enriched terms were ADP metabolic process, pyruvate metabolic process and glycolytic process (Fig. 3a). For KEGG pathway enrichment analysis, glycolysis/gluconeogenesis, fructose and mannose metabolism and carbon metabolism were the most enriched. Based on these results, it has been shown that the genes we selected are indeed related to glycolysis (Fig. 3b).

Identification of DEGRGs associated with survival and visualization of the DEGRG status
We analysed 96 mRNAs selected above to screen out survival-related mRNAs in training cohort by univariate Cox regression analysis and obtained 19 mRNAs with P < 0.05 (Fig. 4a). Next, we performed a stepwise model selection by the AIC, and six mRNAs (HSPA5, ANGP TL4, PFKM, GOT1, IER3 and PFKFB4) were obtained. Finally, multivariate Cox analysis was carried out to further identify the relationship of selected mRNAs with CC patients' prognosis and obtained the coefficients. The filtered mRNAs were classified into the protective type (GOT1), whose HR < 1 was related to better prognosis, and the risk type (HSPA5, ANGPTL4, PFKM, IER3 and PFKFB4), whose HR > 1 was related to poor prognosis ( Table 2).
In addition, to examine the DEGRG status of the whole genome, we visualized the data by using Circos plots [19], which are shown in Fig. 4b. The outer layer includes chromosomes and 96 DEGRGs. The middle four layers, from the outside to the inside, are the average expression values of DEGRGs in normal tissues, the average expression values of DEGRGs in tumour tissues, the logFC of the difference analysis, and the 19 meaningful prognostic DEGRGs obtained by univariate Cox regression analysis in the training cohort. The inner layer is the protein-protein interaction (PPI) network of the DEGRGs. In addition, 0.4 was defined as the minimum required interaction score, and between 0.4 and 0.9 is displayed in blue, while greater than 0.9 is displayed in red.
Constructing and validating a six-mRNA signature to predict patient prognosis Using the linear combination of the expression value of the selected genes and the regression coefficient of multivariate Cox regression analysis, the following predictive risk scoring model was established: risk score = − 0.5224 × expression of GOT1 + 0.6886 × expression of HSPA5 + 0.1495 × expression of ANGPTL4 + 0.4280 × expression of PFKM + 0.3741 × expression of PFKFB4 + 0.2287 × expression of IER3. Based on the prognosis risk score, 193 patients were divided into high-and low-risk groups by using optimal risk score cutoff identified by X-tile as the boundary value (Fig. 5a) The outermost layer shows 96 DEGRGs and their location in the whole genome, the middle part from the outside to the inside shows the expression of these genes in normal and tumour tissues, and the 19 DEGRGs associated with prognosis obtained by univariate Cox regression analysis in the training cohort. The inner layer is the PPI network. The medium confidence of the minimum required interaction score is 0.4, < 0.9 is blue, and > 0.9 is red patients was presented (Fig. 5b). The K-M analysis showed that compared to the high-risk group, the OS in the lowrisk group was dramatically better (p < 0.0001; Fig. 5c). The AUCs for 1-, 3-and 5-year OS were 0.791, 0.731 and 0.782, respectively (Fig. 5d), suggesting that the six-mRNA signature has excellent diagnostic significance for prognosis prediction. Additionally, we generated a heatmap to exhibit the expression profiles of the six mRNAs, from which we can see that, compared with the low-risk group, the expression level of risk-type mRNAs (HSPA5, ANGPTL4, PFKM, IER3 and PFKFB4) of the high-risk group was apparently higher, while the expression level of the protective-type mRNA GOT1 was opposite (Fig. 5e). Next, we validated the predictive ability of the six-mRNA signature in the validation cohort. Eighty patients were divided into high-and low-risk groups by using the same optimal risk score cutoff which were used in the training cohort (Fig. 6a), and the respective survival status of 80 patients was presented (Fig. 6b). And the K-M analysis showed that the patients in the low-risk group had significantly better OS (P = 0.021) (Fig. 6c), which AUC values were 0.664, 0.635 and 0.661 for 1-, 3-and 5-year, respectively (Fig. 6d).   Table 3). Based on these results above, we constructed a nomogram prediction model combined the risk score with N classification to predict CC patients' OS (Fig. 7a), which C index is 0.83. The calibration plot showed that in the nomogram, the predicted values of OS at 1, 3 and 5 years for CC patients have a good correlation with the actual values ( Fig. 7b-d).  Next, stratified analysis was performed to further validate whether the six-mRNA signature can be used as an independent prognostic factor, according to the grade, AJCC, T classification and age. First, we classified patients into the grade 1-2 dataset and grade 3-4 dataset, and the patients in the datasets were divided into high-and lowrisk groups, respectively. As a result, we found that there were significant differences in OS between these two groups, and the OS of the low-risk group was obviously better (Fig. 8a). Next, we did the same thing for the other clinicopathological features, and we found that the OS of the high-risk group was significantly worse than that of the low-risk group regardless of the AJCC stage (stage I-II or stage III-IV) (Fig. 8b), T classification (T0 or T1) (Fig. 8c) and age (younger than 50 or older than 50) (Fig. 8d), further confirming the reliability of our analysis.

Validating the overexpression of the six mRNAs in CC tissues by qRT-PCR
We validated the expression level of six mRNAs in 10 CC tissues and 10 normal cervical tissues by using qRT-PCR. The results showed that compared with the normal cervial tissuses, the expression level of HSPA5, ANGP TL4, PFKM, GOT1, IER3 and PFKFB4 were significantly higher in the CC tissue (Fig. 9), making the bioinformatics analysis result much more reliable and precise.

The relationship between immune cell infiltration and glycolysis
We analysed a total of 184 patients' immune cell composition from their gene expression profiles. Additionally, we divided these patients into two groups, and we can see that the expression levels of CD8 T cells, regulatory T cells (Tregs) and resting mast cells were significantly higher in the low-risk group compared to the high-risk group; however, in the high-risk group, the expression levels of neutrophils, M0 macrophages, activated mast cells and resting CD4 memory T cells were dramatically higher (Fig. 10).
In addition, we analyzed the relationship between immune cells and prognosis of CC patients. In the univariate Cox regression analysis, CD8 T cells, resting CD4 memory T cells, M0 macrophages, M2 macrophages, resting mast cells, activated mast cells and Neutrophils all showed significant differences with p < 0.05. Among these immune cells, the HR of CD8 T cells, M2 macrophages and resting mast cells were less than 1, which associated with better prognosis. In the next multivariate Cox regression analysis, only activated mast cells showed significant differences with p < 0.05. And both in univariate or multivariate Cox regression analysis, the risk score showed significant prognostic value (p < 0.001) ( Table 4).

Discussion
In recent years, increasing attention has been paid to the relationship between energy metabolism and tumours, among which the Warburg effect is a notable feature of the energy metabolism of tumour cells. Despite the low efficiency of glycolysis, tumour cells are still active in glycolysis, which can produce more energy and various metabolites in a short time so that tumour cells can  [6]. More importantly, the rapid growth of tumours is often accompanied by hypoxia, and hypoxia inducible factor (HIF), as the key regulator of glycolysis, can solve the nutritional needs of tumour cells by inducing the expression of glucose and amino acid transporters (glucose transporter type 1 (GLUT1)) and L-type amino acid transporter 1 (LAT1) to further promote the progression of cancer [20,21]. There have been many studies on the relationship between glycolysis and tumours, but the relationship between GRGs and the prognosis of patients with CC is still very limited. Because a single gene biomarker cannot provide a strong prediction effect, a more accurate and reliable gene signature was used to predict the clinical outcomes of patients. For instance, a glycolysis-related nine-gene signature was used to predict the survival of patients with endometrial cancer [17]. A tumour microenvironmentrelated nine-gene signature was used to predict overall survival with ovarian cancer [22]. Moreover, an autophagy-related five-gene signature was developed for  prognosis prediction in patients with prostate cancer [23]. In this study, we used the GRG signature for the first time to predict the survival of patients with CC and obtained a good prediction effect. First, we screened out the differentially expressed glycolytic mRNAs from the CC dataset in the UCSC database and selected six DEGRGs with predictive ability for the prognosis of patients with CC in the training cohort through univariate and multivariate Cox regression analysis (P < 0.05). Subsequently, by constructing the prognostic risk score model, we found that there was a significant difference in OS between the high-and lowrisk groups, and the OS of the high-risk group was evidently worse. Furthermore, the risk score model was validated in the validation cohort with good predictive ability. Finally, further verification was done through PCR, we found that the expression of these six genes in tumor tissue was indeed significantly higher than that in normal tissue.
From our analysis, we selected six DEGRGs (HSPA5, ANGPTL4, PFKM, GOT1, IER3 and PFKFB4). Heat shock protein A5 (HSPA5, also konwn as GRP78), a member of the HSP70 family, has been found to play an important role in cancer malignancy and anti-tumor therapy. And it has been reported that the high expression of HSPA5 can protect cancer cells from immune surveillance, and inhibition of its expression can promote cell apoptosis and inhibit tumor growth [24,25]. Angiopoietin-like4 (ANGP TL4) is highly expressed in various tumors, which is closely related to tumor growth and metastasis, such as hepatocellular carcinoma [26], breast cancer [27], head and neck squamous cell carcinoma [28] and melanoma [29]. And ANGPTL4 participates in the construction of GRG signature in lung adenocarcinoma to predict the survival and metastasis of patients [14]. In addition, previous studies have found that ANGPTL4 is significantly associated with the susceptibility of CC, is a potential risk factor [30]. Muscular phosphofructokinase (PFKM), a member of the phosphofructokinase (PFK) family, can promote the growth of muscle-infiltrating bladder cancer [31] and can be used as a new breast cancer gene [32], and its expression level can distinguish normal tissues from CC tissues [33]. 6-Phosphofructo-2-kinase/fructose-2, 6-biphosphatase 4 (PFKFB4), is a key kinase in Warburg pathway [34], has been found to be associated with a variety of cancers, including breast cancer [35], prostate cancer [36] and glioblastoma [37], promoting the progression and metastasis of cancer, and may become an effective molecular target of anti-tumor drugs. It was found that the expression of immediate early response gene 3 (IER3) was increased in advanced cancer [38,39], but some studies have found that IER3 can also promote tumour cell apoptosis and has anti-tumour activity, such as lower expression in CC tissues [40], and increasing the expression of IER3 can enhance the sensitivity of CC cells to radiotherapy [41]. In contrast, in our study, we found that IER3 exists as a risk factor. Moreover, glutamate oxaloacetate transaminase 1 (GOT1), is a gene that encodes cytoplasmic aspartate aminotransferase and a key aspartate-producing protein. Glutamine metabolism is essential for the proliferation of cancer cells in addition to glucose. Glutamine derived glutamic acid is used to generate nonessential amino acids by GOT1 in high proliferative cells, which plays an important role in cell proliferation [42,43]. Moreover, HIFɑ can inhibit the proliferation of tumor cells by inhibiting the synthesis of aspartate [44]; however, in our study, we found that GOT1 was a protective factor, with higher expression in the low-risk group than in the high-risk group. Hence, further studies are needed to explore the relationship between these two genes (IER3 and GOT1) and the prognosis of patients with CC.
Interestingly, we also found some divergences in immune cell infiltration between the high-and low-risk groups, suggesting that glycolysis may be correlated with tumour immunity. In addition, several previous studies have elucidated the relationship between glycolysis and immunity. Excess glycolysis will lead to acidic tumor microenvironment, affect the infiltration of immune cells in varying degrees, and contribute to the survival of cancer cells [45]. Cascone et al. found that in patients with melanoma and NSCLC, high glycolytic activity will be accompanied by poor immune cell infiltration, such as cytoxic T cells, T helper cells, memory T cells, macrophases, or natural killer (NK) cells reduction [46]. Li et al. found that high glycolysis group had a higher immunescore and higher TIL percentage in breast cancer, however, the immune cells with anti-tumor effect were not enriched, and concluded that high glycolysis was associated with immunosuppression of tumor microenvironment [47]. In our study, the expression levels of CD8 T cells and resting mast cells were significantly higher in the low-risk group. CD8 T cells, to our knowledge, play a pivotal role in the control of tumour cell growth, and their relationship with the prognosis of patients with CC has also been confirmed in a previous study [48,49]. For mast cells, some studies have found that mast cell infiltration in CC indicates poor clinical prognosis [49][50][51][52], which was consistent with our findings that resting mast cells were highly expressed in the low-risk group and that activated mast cells were highly expressed in the high-risk group. These findings are consistent with previous studies and to some extent explain why patients in the low-risk group have better survival outcomes.
It is undeniable that this study does have some defects. On the one hand, we obtained clinical gene information for only 273 cases of CC, and the sample size was not large enough. On the other hand, it would be better to have an external validation cohort to verify the accuracy of the prediction model rather than the internal validation cohort. However, we verified these six DEGRGs between tumor and normal tissues, which to some extent compensated for this defect.
In summary, we first identified the relationship between the GRG signature and the prognosis of patients with CC using bioinformatics methods and found that patients in the high-risk group had significantly lower OS than those in the low-risk group. Furthermore, we found some relationships between the infiltration of immune cells and glycolysis.

Conclusion
In brief, we constructed a six-gene signature (HSPA5, ANGPTL4, PFKM, GOT1, IER3 and PFKFB4) to predict the prognosis of patients with CC, which was also a prognostic factor independent of clinicopathological features. These findings will provide us with new insights into the role of glycolysis in CC, guiding individualized treatment for patients with CC.