Development of a prognostic signature for esophageal cancer based on nine immune related genes

Background Function of the immune system is correlated with the prognosis of the tumor. The effect of immune microenvironment on esophageal cancer (EC) development has not been fully investigated. Methods This study aimed to explore a prognostic model based on immune-related genes (IRGs) for EC. We obtained the RNA-seq dataset and clinical information of EC from the Cancer Genome Atlas (TCGA). Results We identified 247 upregulated IRGs and 56 downregulated IRGs. Pathway analysis revealed that the most differentially expressed IRGs were enriched in Cytokine-cytokine receptor interaction. We further screened 13 survival-related IRGs and constructed regulatory networks involving related transcription factors (TFs). Finally, a prognostic model was constructed with 9 IRGs (HSPA6, S100A12, CACYBP, NOS2, DKK1, OSM, STC2, NGPTL3 and NR2F2) by multivariate Cox regression analysis. The patients were classified into two subgroups with different outcomes. When adjusted with clinical factors, this model was verified as an independent predictor, which performed accurately in prognostic prediction. Next, M0 and M2 macrophages and activated mast cells were significantly enriched in high-risk group, while CD8 T cells and regulatory T cells (Tregs) were significantly enriched in low-risk group. Conclusions Prognosis related IRGs were identified and a prognostic signature for esophageal cancer based on nine IRGs was developed. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-021-07813-9.


Background
Esophageal cancer (EC) is the eighth commonest cancer worldwide. The National Cancer Institute estimated 16, 910 new cases and 15,910 deaths from esophageal cancer in the United States in 2016 [1]. Its incidence has risen by more than six times (1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008) [2]. The overall five-year survival of EC and that after esophagectomy are still poor, although great improvements have been made in treatment [3]. Squamous cell carcinoma is the most common histological type of EC [4]. Tobacco, alcohol, and malnutrition are the most associated risk factors in the development of EC [5]. Once diagnosed, EC must be accurately staged prior to the initiation of treatment. TNM (tumor, lymph node, metastasis) is a staging system based on the status of tumor invasion, lymph node, and metastasis [6]. Early-stage EC is usually treated with endoscopic surgery, advanced EC with surgery with or without chemoradiation [7].
Certain specific genes and biomarkers are needed to predict the patient's therapeutic response and increase their survival [3]. Immune responses is critical in the tumor microenvironment. Tumor cells with genomic alterations can produce new antigens that can be recognized by the immune cells [8]. Expression of IRGs can serve as efficient biomarkers. Previous research have explored the IRGs-based prognostic features in patients with non-squamous non-small cell lung cancer [9] and papillary thyroid carcinoma [10]. However, prognostic models based on IRGs for EC remain to be elucidated.
This study investigated the clinical significance of a prognostic model based on immunogenomics.

Identification of differentially expressed genes (DEGs)
DEGs between EC and normal tissues were identified via R software (version: × 64 3.2.1) and package Limma. The p value was adjusted into the false discovery rate (FDR). A value of FDR less than 0.05 and |log2(FC)| higher than 1 were considered significant.

Identification of immune-related genes (IRGs)
DEGs overlapped with immune-related genes were obtained as the differentially expressed IRGs. Based on these IRGs, Gene Ontology (GO) [15] and Kyoto Encyclopedia of Genes and Genomes (KEGG) [16] analyses were performed with the clusterprofiler R package to explore the underlying mechanisms of these IRGs.

Identification of prognosis-related IRGs and construction of regulatory network
Prognosis-related IRGs were identified using univariate COX regression analysis. We analyzed these prognosisrelated IRGs using the package R. Then, we investigated the interaction of these IRGs and differentially expressed TFs with a threshold of P < 0.05. Coefficient > 0.3 was considered as positive regulation, otherwise as negative regulation. Subsequently, we constructed a regulatory network with relevant TFs and prognosis-related IRGs by using cytoscape software 3.7.1 [17].

Construction of a prognostic model in EC based on IRGs
We constructed a prognostic model based on the results of a multivariate Cox regression analysis. Based on the median risk score, EC patients were divided into highrisk and low-risk groups. The performance of prognostic model was validated by survival analysis between groups with thresholds of p < 0.05 using the survival and survminer package of R. Receiver operating characteristic (ROC) analysis was performed via the survivalROC package, and the area under curve (AUC) was calculated to evaluate the efficiency of the model in predicting disease onset [18]. At the same time, we collated the patient's clinical information and deleted the incomplete information. Finally, a total of 115 patients' clinical information (Supplementary Table 1) were used for univariate and multiple regression analysis to determine whether the riskscore may become an independent predictor of ESCC. Association between IRG expression and clinical parameters was tested using independent ttests, and p < 0.05 were considered statistically significant. Clinical survival analysis in subgroups was also conducted, and p < 0.05 was considered statistically significant.

Verification of the prognosis-related IRGs in this model
We used the online software Oncomine (https://www. oncomine.org) to verify the IRGs. For screening, we set the following criteria: 1 "Gene: IRGs in this model"; 2 "Analysis Type: Cancer vs. Normal Analysis"; 3 "Cancer Type: Esophageal Cancer"; 4" Clinical Outcome: Survival Status "; 5 "Data Type: mRNA". Based on the specific binding between antibodies and antigens, immunohistochemistry can reveal the relative distribution and abundance of proteins. Using The Human Protein Atlas (THPA) (https://www.proteinatlas.org) [19], we observed the differences in key gene expression between normal and EC tissues.

Building a predictive nomogram
To investigate the possibility of EC 1-OS and 3-OS, we established nomograms by including all independent prognostic factors identified by multivariate Cox regression analysis. The effectiveness of the nomogram was evaluated by discrimination and calibration. Finally, we plotted the curve of the nomogram by package rms of R to observe the relationship between the predicted rate of nomogram and the observed rate.

Functional enrichment analysis
We used Gene Set Enrichment Analysis (GSEA) [20] to identify consistent differences between high-risk and low-risk groups and the associated biological processes. In screening the gene list of KEGG, p < 0.05 was considered statistically significant.

Differential expression of tumor-infiltrating immune cells between high-risk and low-risk groups
Status of immune infiltration in EC patients was achieved from the dataset of CIBERSORT. Subsequently, we tested the abundance of immune cells, and its difference between high-risk and low-risk groups by using two-sample T-test.

DEGs between EC and normal samples
The RNAseq tertiary data set of EC from TCGA included the biological information of 11 normal tissue and 160 EC samples. We identified 4094 DEGs, including 3272 upregulated DEGs and 822 downregulated DEGs. (Fig. 1a).

Identification of IRGs
By overlapping the immune-related genes and DEGs of EC, we identified 247 upregulated and 56 downregulated IRGs, as shown in Fig. 1b. Figure 2 shows the results of functional enrichment analysis. GO analysis (Fig. 2a) demonstrated that these IRGs were most involved in leukocyte migration in Biological Process (BP), vesicle lumen in Cellular Component (CC) and receptor ligand activity in Molecular Function (MF). KEGG analysis indicated that these genes were most involved in the interaction of cytokines with cytokine receptors. (Fig. 2b).

Survival analysis and construction of regulatory network
A total of 13 survival-associated IRGs were identified after integrating clinical information from TCGA via univariate COX regression, as shown in Fig. 3. After examining the expression of 318 transcription factors (TF), we found 61 with differential expressions between EC and normal samples, as shown in Fig. 4a, b. Finally a regulatory network was constructed using these survivalassociated IRGs with differently expressed TFs (Fig. 4c).

Construction of a prognostic model based on prognosisrelated IRGs and external validation
We constructed a prognostic model with nine prognostic IRGs based on the results of multivariate Cox regression analysis ( Table 1). The formula was as follows: Risk score = expression level of HSPA6*0.006713979 + S100A12*0.003828117 + CACYBP*0.042341765 + NOS2*0.02490294 + DKK1*0.015602891 + OSM*0.207589957 + STC2*0.075574581 + ANGP TL3*0.645334283 + NR2F2*0.015710952. We further explored the protein expression of these nine prognosisrelated IRGs in THPA (Fig. 5). Consistent with our results, THPA database showed that HSPA6, S100A12, CACYBP, NOS2, and STC2 in EC tissues were upregulated, and ANGPTL3 was down-regulated compared with those in normal tissues. However, we did not find expression of DKK1, OSM and NR2F2 proteins in the database.

Validation of the prognosis-related IRGs in the Oncomine database
We validated the reliability of the prognosis-related IRGs by using Oncomine. The databases showed that the IRGs were differentially expressed in EC and normal tissues. As shown in Supplementary Fig. 1, HSPA6, S100A12, CACYBP, NOS2, DKK1, OSM and STC2 were up-regulated, and ANGPTL3 and NR2F2 were downregulated in EC tissues compared with those in normal tissues. We found that the results were almost consistent with our predictions.

Validation of the prognostic capacity of the model
Patients were separated into the high-risk group and the low-risk group based on the median risk score (Fig. 6 ac). Survival analysis showed that the survival rate in the high-risk group was remarkably lower than those in the low-risk group (p==2.366e− 06, Fig. 6d). The area under  curve (AUC) of the receiver operating characteristic (ROC) curve was 0.826 (Fig. 6e). Compared with clinical factors (including age, gender, grade, stage and TMN), this signature showed a greater performance in predicting the prognosis of EC. At the same time, univariate and multiple regression analysis (Fig. 7a, b) showed that when other clinical parameters were adjusted, the prognostic signature may become an independent predictor. The clinical significance of included genes was also explored in this study ( Fig. 8a-j). In order to assess the prognostic capacity of the model, we conducted a stratified analysis of clinical factors. Interestingly, we found that nearly the high-risk patients in subgroups of age ≤ 65 (Fig. 9a), male (Fig. 9b), G1 & G2 (Fig. 9c), stage III & IV (Fig. 9d), T-3-4( Fig. 9e), MO (Fig. 9f), N1-3 (Fig. 9g) and EAC (Fig. 9h) were inclined to unfavorable overall survival.

Construction and validation of predictive nomogram
Using a number of independent prognostic factors (including age, gender, grade, stage, TMN, histology, and

Identification of related biological processes and pathways
We employed risk score to classify the entire data set and determine the related pathways with these nine genes by using the Java software GSEA. The results showed that "one carbon pool by folate", "proteasome", "spliceosome" and "RNA degradation" were more abundant in the high-risk group than in the low-risk group. This suggests that in high-risk patients, the nine genes were most involved in pathways of protein degradation, RNA degradation and splicing. That is to say, patients with protein degradation, RNA degradation and splicing effects were more inclined to a poor prognosis (Fig. 11).

Difference of tumor-infiltrating immune cells between the two risk groups
To explore the relationship between the present IRGbased prognostic signature and tumor immune microenvironment, we compared the infiltration of immune cells in different risk groups as defined by the present IRG-based prognostic signature. The results showed that Macrophages M0, Macrophages M2 and activated mast cells were significantly enriched in high-risk group, while CD8 T cells and regulatory T cells (Tregs) were significantly enriched in the low-risk group (Fig. 12) Fig. 2).

Discussion
Esophageal cancer has a large number of new cases every year, and it has historically been regarded as an uncontrollable disease process. The etiology of esophageal cancer may be multifactorial, but part of it is due to the unique manifestation of this cancer [21]. At present, for the treatment of esophageal cancer, attention has shifted to the development of immunotherapy with novel immune biomarkers [22]. Somatic cells acquire malignancy through genetic alterations. Cancer cells usually evade the recognition of the immune system and develop into clinically meaningful masses [23]. Compared with conventional therapies, cancer immunotherapy shows long-lasting response with fewer adverse reactions [24]. This provides a new option for the treatment of EC. The prognostic model for EC has been continuously updated [25][26][27]. In this study, we identified 247 upregulated and 56 down-regulated IRGs in EC and screened out survival-related IRGs. Based on these data, we established a prognostic model that divided EC patients into high-risk and low-risk groups. This model showed a good predictive performance (AUC 0.826). The model was also an independent prognostic indicator by multivariate analysis incorporating other clinical factors. KEGG analysis indicated that the main pathway was enriched in cytokine-cytokine receptor interaction. Many biological processes are regulated by cytokines, including cell growth, differentiation, immunity, inflammation, and metabolism [28]. Tumor progression can be promoted by cytokines that affect the tumor microenvironment and directly act on cancer cells [29]. Moreover, cytokines participate in the immune response of cytotoxic T lymphocytes (CTLs) by modulating the differentiation of Th1 and Th2 cells [30]. Kita Y et al. found that STC2 may be involved in lymph node metastasis, making it a potential prognostic marker for patients with EC  [31]. Studies also demonstrated that STC2 may play an important role in ESCC tumorigenesis [32]. Abnormal expression of DKK1, which is regulated by DKK1-CKAP4 pathway, predicts the poor prognosis of esophageal squamous cell carcinoma (ESCC) [33]. These results are consistent with our findings. CacyBP regulates cell proliferation, tumorigenesis, differentiation or gene expression [34]. In colon cancer, CacyBP can promote the growth of cancer cells by enhancing the ubiquitinmediated degradation of p27kip1 [35]. In addition,    [36,37]. In our prognostic model, the IRGs showing prognostic values included HSPA6, S100A12, CACYBP, NOS2, DKK1, OSM, STC2, ANGPTL3 and NR2F2. Among the, HSPA6 may be associated with early recurrence of HCC [38]. In ESCC, S100A12 is downregulated at the protein level [39]. In Barrett's esophagus and related adenocarcinoma, expression of inducible nitric oxide synthase (NOS-2) is increased, and NOS-2 also plays a role in inflammation and epithelial cell growth [40]. OSM has been identified as an inhibitor of tumor cell growth in a variety of cancers, including melanoma, ovarian cancer, and glioblastoma carcinomas [41][42][43]. The splice variant of oncostatin M receptor β is overexpressed in human esophageal squamous cell carcinoma [44]. Angiopoietinlike protein 3(ANGPTL3) is indicative of EC prognosis [45]. NR2F2 is involved in the progression of prostate adenocarcinoma [46], and NR2F2 expression is a prognostic factor for breast neoplasms [47]. High expression of NR2F2 in certain gastric and esophageal adenocarcinomas is associated with abnormal expression of cadherin 11, suggesting that the NR2F2-related embryonic pathways in these tumors are reactivated [48]. Proteasome dysregulation is implicated in the development of many types of cancer [49]. The proteasome is involved in cell cycle and transcription, two processes indispensable for cancer development [50]. The spliceosome catalyzes pre-mRNA splicing, a key regulatory step in gene expression [51,52]. Mutations in genes encoding splice proteins are frequently found in cancer [53]. Small molecule inhibitors that target splice components can be used to create anti-cancer drugs [52]. RNA degradation is a key posttranscriptional regulatory checkpoint to maintain proper functions of organisms. Ribonuclease, a key enzyme responsible for RNA stability, can be used alone for RNA degradation, and can bind to other proteins in the RNA degradation complex [54].
Previous immunotherapies mainly rely on T cells in tumor immune defense [55,56]. In the present research, the abundance of CD8 T cells and regulatory T cells in the low-risk group increased. T cells are critical in host defense against cancer [57]. The value of CD8 T cells for cancer prognosis has been assessed [58][59][60][61][62]. In addition, CD8 T cells also play a role in the progression of EC [63,64].
Tregs are divided into two major subpopulations: thymus-derived Tregs (nTregs) and inducible Tregs (iTregs) [65]. Tregs show significant versatility in their inhibitory mechanisms by releasing cytokines to directly inhibit signal transduction of effector T cells [66]. Tregs can also inhibit and kill B cells by inducing programmed cell death [67]. Indeed, Treg infiltration into the tumor has been negatively correlated to OS in a majority of human solid tumors [68,69]. However, this correlation is highly variable, depending on the tumor type [70]. In cancers that share a common feature of prominent chronic inflammation, such as colon, breast, bladder or head and neck cancers, intra-tumor accumulations of Treg appear to associate with favorable prognosis and improved OS [71][72][73] .This association has been explained by the capability of Treg to suppress "tumor promoting inflammation" (TPI). Moreover, previous study found that regulatory T cells are positively Fig. 10 Nomogram predicting overall survival for EC patients. a For each patient, several lines are drawn upward to determine the points received from the predictors in the nomogram. The sum of these points is on the "total point" axis. Then a line is drawn downward to determine the possibility of 1-and 3-year overall survival of EC. b, c The calibration plot for internal validation of the nomogram. The Y-axis represents actual survival, and the X-axis represents nomogram-predicted survival correlated with locoregional control may be through down-regulation of harmful inflammatory reaction, which could favor tumor progression in head and neck squamous cell carcinoma [71]. So it can be explained that why the abundance of regulatory T cells (Treg) in the low-risk group was higher than in the high-risk group in our finding. In high-risk group, we found that macrophages M0, M2 and activated mast cells were also significantly enriched. Tumor-associated macrophages are the most abundant cancer immune cells. Studies have found that the transcription factor forkhead box protein O1 (FOXO1) can promote the polarization of macrophages M0 to M2 and the recruitment of macrophages M2 in ESCC through transcriptional regulation [74]. Macrophage M2 can be transformed into macrophage M1, and can promote the proliferation, migration and ring-forming ability of lymphatic endothelial cells associated with EC [75]. In addition, macrophage M2 can promote the migration and invasion of ESCC cells, enhance the epithelial-mesenchymal transition process, and promote tumor progression, resulting in poor prognosis for ESCC patients [76]. Tissue kallikrein (TK1), which is highly expressed in activated mast cells, can participate in the formation of mitogenic kinin, which can stimulate the proliferation of tumor cells and enhance metastasis by increasing vascular permeability [77]. All these researches above can support the finding of our study.
It is the first time that a prognostic nomogram is developed with nine immune related genes. This nomogram can be routinely applied and is cost-effective in practice, as it does not need whole-genome sequencing for EC patients. When combined with clinical parameters like TNM stage, the nomogram can show a greater prognostic performance.
Although we constructed a novel nine-gene prognostic signature in esophageal cancer, several limitations of this study should also be acknowledged. Firstly, our prognostic signature was only based on the data from TCGA database, which is not validated in other databases or other centers across different populations. The performance of this prognostic signature might be more reliable if validation is performed with independent external data sets with long-term follow up. Secondly, this study only preliminary proposed a prognostic model and the validity of the five-gene signature model needs to be further verified by clinical trials. Our study was designed on the basis of a retrospective analysis and prospective research should be performed to verify the outcomes. Thirdly, the mechanisms underlying the nine immune-related genes in the prognosis prediction of esophageal cancer needed to be investigated through in vitro and in vivo experiments.

Conclusions
We identified the IRGs associated with the prognosis of EC and developed an IRGs-based prognostic signature that stratify EC patients into two subgroups with statistically different survival outcomes.
Additional file 2: Supplementary Fig. 2. Differences between the immune-related prognostic index and infiltration abundances of other important types of immune cells.