Development and validation of a five-immune gene prognostic risk model in colon cancer
BMC Cancer volume 20, Article number: 395 (2020)
Colon cancer is a common and highly malignant cancer. Its morbidity is rapidly increasing, and its prognosis is poor. Currently, immunotherapy is a rapidly developing therapeutic modality of colon cancer. This study aimed to construct a prognostic risk model based on immune genes for the early diagnosis and accurate prognostic prediction of colon cancer.
Transcriptomic data and clinical data were downloaded from The Cancer Genome Atlas database. Immune genes were obtained from the ImmPort database. Differentially expressed (DE) immune genes between 473 colon cancer and 41 adjacent normal tissues were identified. The entire cohort was randomly divided into the training and testing cohort. The training cohort was used to construct the prognostic model. The testing and entire cohorts were used to validate the model. The clinical utility of the model and its correlation with immune cell infiltration were analyzed.
A total of 333 DE immune genes (176 up-regulated and 157 down-regulated) were detected. We developed and validated a five-immune gene model of colon cancer, including LBP, TFR2, UCN, UTS2, and MC1R. This model was approved to be an independent prognostic variable, which was more accurate than age and the pathological stage for predicting overall survival at five years. Besides, as the risk score increased, the content of CD8+ T cells in colon cancer was decreased.
We developed and validated a five-immune gene model of colon cancer, including LBP, TFR2, UCN, UTS2, and MC1R. This model could be used as an instrumental variable in the prognosis prediction of colon cancer.
Colon cancer is the third most common type of malignant tumor, which affects millions of people worldwide . Despite significant advances that have been made for the treatment of colon cancer, its morbidity is rapidly increasing and its 5-year survival rate is low [2, 3]. Accordingly, to better the prognosis of colon cancer patients, it is essential and urgent to identify new indicators for the prognosis evaluation and targeted therapy of colon cancer.
The treatment of colon cancer has evolved to include not only the traditional methods of surgery, chemotherapy, and radiotherapy, but the rapidly developing immunotherapy . It was also found that reduced immune cytotoxicity  and lack of T-cell infiltration  predict adverse outcomes in patients with colorectal carcinoma. Although immunotherapy has been reported to be effective in colon cancer with microsatellite instability , in contrast to other tumor types, inhibitors of PD-1/−L1 or CTLA 4 have not yet shown relevant efficacy in unselected colorectal cancer . Also, because of the high heterogeneity of colon cancer , the prognosis may be considerably different between patients with similar clinical characteristics. Thus, it is essential to identify a multiple molecular model reflecting the sensitivity of patients to immunotherapy so that personalized treatments for colon cancer can be achieved.
In recent years, the development of high-throughput gene detection technology provides molecular markers for prognosis prediction and personalized treatment of colon cancer [9, 10]. However, as we know, none of these signatures were constructed based on multiple immune genes. Therefore, in the present study, we develop and validate a reliable prognostic model of colon cancer using differentially expressed (DE) immune genes, and verified the clinical utility of this model in colon cancer patients.
Transcriptomic data and clinical data were downloaded from The Cancer Genome Atlas (TCGA) database. Immune genes and Immune infiltrate data were downloaded from the ImmPort database (www.immport.org) and Tumor Immune Estimation Resource (TIMER) (http://cistrome.org/TIMER) , respectively.
Identification of DE genes
The Wilcoxon signed-rank test was used to conduct differential analysis. Benjamini and Hochberg’s algorithm was applied to control the false discovery rate (FDR). Log2(fold change [FC]) > 1 and FDR < 0.05 were set as the cut-offs. Pheatmap package and gplots package was used to make heatmap and volcano map.
Identification of DE immune genes
Based on the identified DE genes and immune gene list, the DE immune genes were detected using R software (v3.5.3). The pheatmap package and gplots package was used to make heatmap and volcano map.
Function and pathway analysis of DE immune genes
The org.Hs.eg.db package and clusterProfiler package was used to conduct gene ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. GO terms and KEGG terms were identified as significantly enriched when p.adjust < 0.05.
Construction of the prognostic risk model
Based on DE immune genes in the training cohort, univariate analysis was performed to identify significant DE immune genes when p < 0.05. Then, Lasso regression was performed to eliminate genes that might overfit the model. Lastly, we applied multivariate analysis to identify the optimal prognostic immune genes for the model. The risk score was calculated based on a linear combination of the Cox coefficient and gene expression. The following calculation formula was used for the analysis:
N, Expi, and Coei represented gene number, level of gene expression, and coefficient value, respectively. The median was set as the cutoff value to divided all colon cancer patients into high-risk and low-risk groups. A high-risk score shows poor survival for colon cancer patients. Survival package and survminer package were used to conduct survival analysis. Time-dependent receiver operating characteristic (ROC) analysis for overall survival (OS) was used to evaluate the accuracy of the prognostic model. The survivalROC package was used to conduct a ROC analysis. An area under the ROC (AUC) > 0.60 was treated as an acceptable prediction value, and an AUC > 0.75 was considered as excellent for predictions [12, 13]. Risk score distribution plots, survival status scatter plots, and heatmap between the low-risk and high-risk groups were also applied to evaluate the model.
Validation of the prognostic risk model
We used the testing cohort and the entire TCGA cohort to verify the accuracy of the prognostic risk model. Survival analysis and time-dependent ROC analysis were used to validate the model. Risk score distribution plots, survival status scatter plots, and heatmap was also used to evaluate the model.
Independent prognostic value of the model in the entire cohort
To assess the prognostic value of the immune gene risk model, we applied both univariate and multivariate analyses of prognostic factors using Cox proportional hazards regression. Age, pathological stage, T, M, and N were treated as continuous variables. Gender was coded as female (0) or male (1). Factors in which p < 0.05 based on both univariate and multivariate analyses were identified as independent prognostic variables.
Clinical utility of the model
To evaluate the prediction ability of the model in colon cancer patients, we assessed the relationships between our model (level of risk genes and the risk score) and the clinical features (age, gender, pathological stage, T, M, and N) in the entire cohort. Patients were separately divided into two groups according to age (> = 70 and < 70 years old), gender (female and male), stage (stage I&II and stage III&IV), T (T1–2 and T3–4), M (M0 and M1), and N (N0 and N1–3). Differences between the two groups were assessed with independent t-tests.
Correlation between the model and immune cell infiltration
To understand whether the model could reflect the status of the tumor immune microenvironment in colon cancer patients, we evaluated the correlation between the risk score of the model and immune cell infiltration in the entire TCGA cohort. Pearson correlation coefficient test was used to estimate the relationship between the risk score of the model and the content of different types of immune cells.
Gene expression profile of 514 samples (473 tumors and 41 adjacent normals) for colon cancer patients was obtained from TCGA database. Twenty-four tumor samples coming from the same patients with other samples and thirty-one tumor samples with short follow-up time (< 30 days) were deleted. The clinical data of the remaining 418 tumor samples were shown in Table S1. Then these 418 tumor samples in the entire cohort were randomly classified into two groups, including a training cohort (n = 209) and a testing cohort (n = 209) (Table S2). The training cohort was applied to develop the prognosis risk model, and the testing cohort and the entire cohort were used to validate the model. The workflow of our study was illustrated in Fig. 1.
Identification of DE genes
3120 DE genes (1899 up-regulated and 1221 down-regulated) were recognized in colon cancer tissues compared with normal tissues. The DE genes were evaluated by the heatmap, as shown in Fig. 2a. The distribution of all DE genes according to the two dimensions of -log10FDR and log2FC was represented by a volcano map in Fig. 2b.
Identification of the DE immune genes
A total of 1811 immune genes were downloaded from the ImmPort database. Based on DE genes and immune-related genes, 333 DE immune genes (176 up-regulated and 157 down-regulated) were detected. The top 10 up-regulated and down-regulated DE immune genes were shown in Table S3. The heatmap and volcano map of DE immune genes were depicted in Fig. 2c and d.
Functional enrichment analysis of the DE immune genes
In total, 1775 GO terms, including 1690 biological process terms, 18 cellular component terms, and 67 molecular function terms were identified as significantly enriched. Likewise, 40 significantly enriched KEGG terms were detected. The top ten function and pathway terms were shown in Fig. 2e-h.
Construction of a five-immune gene prognostic risk model
Based on the training cohort, we screened seven immune genes that were possible prognostic genes using univariate Cox analysis (Fig. 3a). Then, we used Lasso regression to get six-candidate prognostic immune genes (Fig. 3b and c). Finally, we used multivariate Cox analysis to acquire five optimal immune genes, including lipopolysaccharide binding protein (LBP), transferrin receptor protein 2 (TFR2), urocortin (UCN), urotensin-II (UTS2), and melanocortin 1 receptor (MC1R) (Fig. 3d). All these five immune genes were high hazard genes, which were up-regulated DE genes (Fig. 3e). The formula for the risk score model was as follows: risk score = (0.4248 × expression value of LBP) + (0.2467 × expression value of TFR2) + (0.4666 × expression value of UCN) + (0.4139 × expression value of UTS2) + (0.209 × expression value of MC1R).
Based on the median risk score, all colon cancer patients were divided into a high-risk group (n = 104) and a low-risk group (n = 105). The time-dependent ROC analysis for OS was significantly different between the two risk groups (p = 7.407e-05) (Fig. 3f). The median survival time of the low-risk group was more than ten years, while that of the high-risk group was less than five years. Also, the 3- and 5-year survival rates of the low-risk group were 91 and 91%, respectively, whereas the corresponding rates in the high-risk group were 72 and 35%. The AUC values for the five-immune gene prognostic model at three and five years of OS was 0.756 and 0.881, respectively (Fig. 3g). The risk score distribution plot, survival status plot, and the heatmap between two groups was shown in Fig. 3h-j.
Validation of the risk model
To validate the accuracy of the risk model, we analyzed the model in the testing cohort and the entire TCGA cohort. The risk score of each patient in the testing and the entire cohort was calculated and then divided into two groups based on the median. In the testing cohort, 104 patients and 105 patients were categorized as high-risk and low-risk groups, respectively. Similarly, in the entire cohort, 209 patients and 209 patients were divided into high-risk and low-risk groups, respectively. There were significant differences in survival curves between the two risk groups in the testing cohort and the entire cohort (p < 0.05) (Fig. 4a-b). The AUC values at 3- and 5-year were 0.603 and 0.582 in the testing cohort, respectively (Fig. 4c). The AUC values at 3- and 5-year were 0.663 and 0.713 in the entire cohort, respectively (Fig. 4d). The risk score distribution plot, survival status plot, and heatmap of risk gene expression in the two cohorts were presented in Fig. 4e-j. All risk gene level was higher in the high-risk group than that in the low-risk group, which showed that the risk model could accurately predict the prognosis of colon cancer patients.
Independent prognostic value of the risk model
Both the univariate analysis and the multivariate analysis revealed that age, the pathological stage, and the risk score were related to OS in the entire cohort (p < 0.05) (Fig. 5a-b). These results indicated that age, the pathological stage, and the prognostic risk model could be used independently to predict the prognosis of colon cancer patients. We then further compared these variables and found that the risk score was more accurate than the pathological stage and age in predicting OS at five years. The AUCs at five years for the risk score, age, and the pathological stage were 0.713, 0.634 and 0.678, respectively (Fig. 5c).
Clinical utility of the model
When the values of LBP increased, the T category of colon cancer patients increased in the entire cohort (p < 0.05) (Fig. 5d). Similarly, as the values of MC1R increased, the value of the pathological stage and N category increased (p < 0.05) (Fig. 5e-f). These results proved that the immune gene expression of the model is related to the development of colon cancer.
Correlation between the risk model and immune cell infiltration
The risk score was negatively correlated with the content of CD8+ T cells in colon cancer tissues in the entire cohort (p < 0.05) (Fig. 6). The result demonstrated that the immune gene model might reflect the status of the tumor immune microenvironment in colon cancer patients.
Colon cancer is one of the most common carcinomas worldwide, responsible for about 1,100,000 new cases and 550,000 deaths in 2018 . Several studies have reported the role of the immune gene in the initiation and progression of carcinoma [15, 16]. In the current study, we established and validated a prognostic model based on five DE immune genes, which could be used as an independent prognostic variable. We found that this model could provide more accurate predictive value than the pathological stage and age in predicting OS at five years. Additionally, the immune gene model could reflect the tumor immune microenvironment according to the correlation analysis between the model and immune cell infiltration. Besides, we conducted an enrichment analysis of function and pathway of the DE immune genes, which might provide a reference for further basic research in colon cancer.
In the current study, we developed a prognostic risk model based on five DE immune genes, named LBP, TFR2, UCN, UTS2, and MC1R. Firstly, this model was constructed by five DE immune genes between colon cancer and normal tissues. These DE immune genes might reflect the progression of colon cancer, which could contribute to the early diagnosis of colon cancer. Secondly, multiple algorithms were applied for model selection, and the prediction value of the model had also been confirmed, which proved the accuracy and dependability of the prognostic model. Besides, these DE immune genes may have great promise to be novel molecular targets in immunotherapy. LBP, as a pattern recognition protein, can activate the cell to produce cytokines when faced with various microbial ligands . Serum LBP was also proved to be a useful prognostic parameter for breast cancer patients after radiation therapy . TFR2, which play a crucial role in the regulation of iron homeostasis, was found high expression in human colon cancer cell [19, 20]. UCNs are corticotropin-releasing factor-related peptides, regulating gastrointestinal motor and visceral pain during stress . UTS2 was recently used as a new drug target towards colon cancer cells . Individuals carrying MC1R variants are associated with a higher risk of melanoma, and MC1R had been used as an intervention target for melanoma .
To assess the prediction capability of the model, we analyzed several clinical variables as well as the risk score. Age, the pathological stage, and the risk score were identified as independent prognostic variables. Age is a prominent risk factor for multiple tumors including colorectal cancer , which was in line with the results predicted by the model. Further comparison showed that the predicted value of the model is better than age and the pathological stage. Thus, our model showed a high prediction ability. To evaluate the clinical applicability of the model, we analyzed the relationships between factors in the model and certain clinical variables. We found that higher gene expression of the immune genes in the model was highly correlated with higher pathological stage, which, on the other hand, verifies the reliability of our prognostic model. The previous study has reported that immune infiltration is vital in response to treatment and prognosis of colon cancer . Galon et al.  reported that individual immune cell markers have prognostic impacts on patients who have colon cancer. It was reported that the inhibition of CD8+ T cells was associated with enhanced tumor progression, and mesenchymal stromal cells PD-L1 could promote colon cancer by inhibiting the antitumor immune responses of CD8+ T cell . In the present study, we discovered that the risk score was negatively related to the infiltration of CD8+ T cells. These results might also confirm that our model was reliable in predicting the prognosis of colon cancer.
In the present study, we conducted an enrichment analysis of the DE immune genes. Leukocyte migration, cell chemotaxis, extracellular matrix, and receptor regulatory activity were enriched GO terms. Solid tumor sample from TCGA comprises both tumor and other cells, among which immune cell play vital roles in the development of the tumor. Chu et al.  reported that leukocyte migration is a natural process, from blood to tissue through the vascular barrier, to deal with the invasion of pathogens, which might reflect the status of tumor microenvironment. The distinct biochemical and biophysical properties of extracellular matrix can influence cell phenotype, and the dysregulation of extracellular matrix dynamics leads to the development of cancer . KEGG analysis found that cytokine-cytokine receptor interaction, chemokine signaling pathway, and MAPK signaling pathway were significant pathways. Cytokine-cytokine receptor interaction was related to the viability of colon cancer cell lines . Besides, previous studies have demonstrated that abnormally activated MAPK pathway is closely associated with growth and metastasis of colon cancer [30, 31]. These significantly enriched functions and pathways may provide a reference for further basic experiments.
The current study has several advantages. Firstly, we constructed a prognostic model of colon cancer based on DE immune genes for the first time. Secondly, we created the model using various statistical methods and validated the model using the testing cohort and the entire cohort. Thus, the prognostic risk model for colon cancer patients was accurate and reliable. Thirdly, the risk score model could be used as an independent prognostic index, which was more accurate than the pathological stage and age in predicting OS. Finally, our model could also be used to predict immune cell infiltration in the progression of colon cancer.
The present study has limitations. Firstly, we developed the prognostic risk model based on public databases, which was not verified by prospective clinical trials. Additionally, the underlying mechanisms of how the detected DE immune genes impact the progress of colon cancer require further study by basic experiments.
We developed and validated a five-immune gene model of colon cancer, including LBP, TFR2, UCN, UTS2, and MC1R. This model could be used as an instrumental variable in the prognosis prediction of colon cancer.
Availability of data and materials
All analyzed data are included in this published article and its supplementary information file. The original data are available upon reasonable request to the corresponding author.
The Cancer Genome Atlas
False discovery rate
Kyoto Encyclopedia of Genes and Genomes
Receiver operating characteristic
Area under the curve
Jemal A, Siegel R, Ward E, Hao Y, Xu J, Thun MJ. Cancer statistics, 2009. CA Cancer J Clin. 2009;59:225–49.
Center MM, Jemal A, Smith RA, Ward E. Worldwide variations in colorectal cancer. CA Cancer J Clin. 2009;59:366–78.
Edwards BK, Ward E, Kohler BA, Eheman C, Zauber AG, Anderson RN, Jemal A, Schymura MJ, Lansdorp-Vogelaar I, Seeff LC, et al. Annual report to the nation on the status of cancer, 1975-2006, featuring colorectal cancer trends and impact of interventions (risk factors, screening, and treatment) to reduce future rates. Cancer. 2010;116:544–73.
Jacome AA, Eng C. Role of immune checkpoint inhibitors in the treatment of colorectal cancer: focus on nivolumab. Expert Opin Biol Ther. 2019;19:1247–63.
Mlecnik B, Bindea G, Kirilovsky A, Angell HK, Obenauf AC, Tosolini M, Church SE, Maby P, Vasaturo A, Angelova M, et al. The tumor microenvironment and Immunoscore are critical determinants of dissemination to distant metastasis. Sci Transl Med. 2016;8:327ra26.
Galon J, Costes A, Sanchez-Cabo F, Kirilovsky A, Mlecnik B, Lagorce-Pages C, Tosolini M, Camus M, Berger A, Wind P, et al. Type, density, and location of immune cells within human colorectal tumors predict clinical outcome. Science. 2006;313:1960–4.
Stein A, Folprecht G. Immunotherapy of Colon Cancer. Oncol Res Treat. 2018;41:282–5.
Wang H, Wang X, Xu L, Zhang J, Cao H. A molecular sub-cluster of colon cancer cells with low VDR expression is sensitive to chemotherapy, BRAF inhibitors and PI3K-mTOR inhibitors treatment. Aging (Albany NY). 2019;11:8587–603.
Kohne CH. Successes and limitations of targeted cancer therapy in colon cancer. Prog Tumor Res. 2014;41:36–50.
Yang C, Zhang Y, Xu X, Li W. Molecular subtypes based on DNA methylation predict prognosis in colon adenocarcinoma patients. Aging (Albany NY). 2019;11:11880–92.
Li B, Severson E, Pignon JC, Zhao H, Li T, Novak J, Jiang P, Shen H, Aster JC, Rodig S, et al. Comprehensive analyses of tumor immunity: implications for cancer immunotherapy. Genome Biol. 2016;17:174.
Wan B, Liu B, Huang Y, Yu G, Lv C. Prognostic value of immune-related genes in clear cell renal cell carcinoma. Aging (Albany NY). 2019;11:11474–89.
Han ME, Kim JY, Kim GH, Park SY, Kim YH, Oh SO. SAC3D1: a novel prognostic marker in hepatocellular carcinoma. Sci Rep. 2018;8:15608.
Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68:394–424.
Song Q, Shang J, Yang Z, Zhang L, Zhang C, Chen J, Wu X. Identification of an immune signature predicting prognosis risk of patients in lung adenocarcinoma. J Transl Med. 2019;17:70.
Zheng S, Luo X, Dong C, Zheng D, Xie J, Zhuge L, Sun Y, Chen H. A B7-CD28 family based signature demonstrates significantly different prognoses and tumor immune landscapes in lung adenocarcinoma. Int J Cancer. 2018;143:2592–601.
Finberg RW, Re F, Popova L, Golenbock DT, Kurt-Jones EA. Cell activation by toll-like receptors: role of LBP and CD14. J Endotoxin Res. 2004;10:413–8.
Chalubinska-Fendler J, Graczyk L, Piotrowski G, Wyka K, Nowicka Z, Tomasik B, Fijuth J, Kozono D, Fendler W. Lipopolysaccharide-binding protein is an early biomarker of cardiac function after radiation therapy for breast Cancer. Int J Radiat Oncol Biol Phys. 2019;104:1074–83.
Calzolari A, Deaglio S, Maldi E, Cassoni P, Malavasi F, Testa U. TfR2 expression in human colon carcinomas. Blood Cells Mol Dis. 2009;43:243–9.
Calzolari A, Oliviero I, Deaglio S, Mariani G, Biffoni M, Sposi NM, Malavasi F, Peschle C, Testa U. Transferrin receptor 2 is frequently expressed in human cancer cell lines. Blood Cells Mol Dis. 2007;39:82–91.
Martinez V, Wang L, Million M, Rivier J, Tache Y. Urocortins and the regulation of gastrointestinal motor function and visceral pain. Peptides. 2004;25:1733–44.
Zappavigna S, Abate M, Cossu AM, Lusa S, Campani V, Scotti L, Luce A, Yousif AM, Merlino F, Grieco P, et al. Urotensin-II-targeted liposomes as a new drug delivery system towards prostate and Colon Cancer cells. J Oncol. 2019;2019:9293560.
Chen S, Zhu B, Yin C, Liu W, Han C, Chen B, Liu T, Li X, Chen X, Li C, et al. Palmitoylation-dependent activation of MC1R prevents melanomagenesis. Nature. 2017;549:399–403.
Wang T, Maden SK, Luebeck GE, Li CI, Newcomb PA, Ulrich CM, Joo JE, Buchanan DD, Milne RL, Southey MC, et al. Dysfunctional epigenetic aging of the normal colon and colorectal cancer risk. Clin Epigenetics. 2020;12:5.
Peng D, Wang L, Li H, Cai C, Tan Y, Xu B, Le H. An immune infiltration signature to predict the overall survival of patients with colon cancer. IUBMB Life. 2019;71:1760–70.
O'Malley G, Treacy O, Lynch K, Naicker SD, Leonard NA, Lohan P, Dunne PD, Ritter T, Egan LJ, Ryan AE. Stromal cell PD-L1 inhibits CD8(+) T-cell antitumor immune responses and promotes Colon Cancer. Cancer Immunol Res. 2018;6:1426–41.
Chu D, Dong X, Shi X, Zhang C, Wang Z. Neutrophil-based drug delivery systems. Adv Mater. 2018;30:e1706245.
Walker C, Mojares E, Del Rio Hernandez A. Role of extracellular matrix in development and Cancer progression. Int J Mol Sci. 2018;19:3028.
Shi YJ, Zhao QQ, Liu XS, Dong SH, JF E, Li X, Liu C, Wang H. Toll-like receptor 4 regulates spontaneous intestinal tumorigenesis by up-regulating IL-6 and GM-CSF. J Cell Mol Med. 2020;24:385–97.
Xu L, Zhang Y, Wang H, Zhang G, Ding Y, Zhao L. Tumor suppressor miR-1 restrains epithelial-mesenchymal transition and metastasis of colorectal carcinoma via the MAPK and PI3K/AKT pathway. J Transl Med. 2014;12:244.
Wang H, An H, Wang B, Liao Q, Li W, Jin X, Cui S, Zhang Y, Ding Y, Zhao L. miR-133a represses tumour growth and metastasis in colorectal cancer by targeting LIM and SH3 protein 1 and inhibiting the MAPK pathway. Eur J Cancer. 2013;49:3924–35.
This work was not funded by any grant.
Ethics approval and consent to participate
Not applicable. All data in this study are publicly available.
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Chen, H., Luo, J. & Guo, J. Development and validation of a five-immune gene prognostic risk model in colon cancer. BMC Cancer 20, 395 (2020). https://doi.org/10.1186/s12885-020-06799-0