Comprehensive analysis of ceRNA networks reveals prognostic lncRNAs related to immune infiltration in colorectal cancer

Background Competing endogenous RNA (ceRNA) represents a class of RNAs (e.g., long noncoding RNAs [lncRNAs]) with microRNA (miRNA) binding sites, which can competitively bind miRNA and inhibit its regulation of target genes. Increasing evidence has underscored the involvement of dysregulated ceRNA networks in the occurrence and progression of colorectal cancer (CRC). The purpose of this study was to construct a ceRNA network related to the prognosis of CRC and further explore the potential mechanisms that affect this prognosis. Methods RNA-Seq and miRNA-Seq data from The Cancer Genome Atlas (TCGA) were used to identify differentially expressed lncRNAs (DElncRNAs), microRNAs (DEmiRNAs), and mRNAs (DEmRNAs), and a prognosis-related ceRNA network was constructed based on DElncRNA survival analysis. Subsequently, pathway enrichment, Pearson correlation, and Gene Set Enrichment Analysis (GSEA) were performed to determine the function of the genes in the ceRNA network. Gene Expression Profiling Interactive Analysis (GEPIA) and immunohistochemistry (IHC) were also used to validate differential gene expression. Finally, the correlation between lncRNA and immune cell infiltration in the tumor microenvironment was evaluated based on the CIBERSORT algorithm. Results A prognostic ceRNA network was constructed with eleven key survival-related DElncRNAs (MIR4435-2HG, NKILA, AFAP1-AS1, ELFN1-AS1, AC005520.2, AC245884.8, AL354836.1, AL355987.4, AL591845.1, LINC02038, and AC104823.1), 54 DEmiRNAs, and 308 DEmRNAs. The MIR4435-2HG- and ELFN1-AS1-associated ceRNA subnetworks affected and regulated the expression of the COL5A2, LOX, OSBPL3, PLAU, VCAN, SRM, and E2F1 target genes and were found to be related to prognosis and tumor-infiltrating immune cell types. Conclusions MIR4435-2HG and ELFN1-AS1 are associated with prognosis and tumor-infiltrating immune cell types and could represent potential prognostic biomarkers or therapeutic targets in colorectal carcinoma. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-021-07995-2.


Background
Colorectal cancer (CRC), the third most common cancer and the second leading cause of cancer mortality worldwide, is caused by the accumulation of long-term genetic mutations and epigenetic alterations [1]. Although recent interventional and therapeutic advances have greatly improved the overall survival (OS) of CRC, patients with advanced CRC still show poor prognosis due to the high frequency of recurrence and metastasis [2]. Investigation of the molecular mechanisms underlying CRC, from initiation to metastasis, is urgently needed to improve its early diagnosis, treatment, and prognosis. Long noncoding RNAs (lncRNAs) are noncoding transcripts that range from 200 to 100,000 nucleotides in length and lack protein-coding capacity. They function as signals, decoys, guides, and scaffolds in gene regulatory networks [3]. The regulatory mechanism of lncRNA depends on its subcellular localization. Cytoplasmic lncRNAs primarily participate in post-transcriptional and translational regulation by acting as microRNA (miRNA) sponges and modulating mRNA stability. An increasing number of lncRNAs have been found to participate in CRC initiation and progression by functioning as competing endogenous RNAs (ceRNAs) [4][5][6]. Specifically, lncRNAs derepress target gene expression by competitively binding to shared sequences of miRNAs [7,8]. Recent studies showed that ceRNA crosstalk was involved in various biological processes, including CRC cell proliferation, invasion, epithelial-to-mesenchymal transition (EMT), metastasis, and immune infiltration [9][10][11][12][13].
In the present study, we acquired the expression profiles and clinical information for colon adenocarcinoma (COAD) and rectal adenocarcinoma (READ) from The Cancer Genome Atlas (TCGA). Differentially expressed mRNAs (DEmRNAs), lncRNAs (DElncRNAs), and miR-NAs (DEmiRNAs) were then identified by comparing tumor samples and normal samples. We performed survival analysis and predicted RNA interaction pairs to construct a prognosis-related ceRNA network. The protein-protein interaction (PPI) network was embodied in Cytoscape to identify module clusters and hub genes. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis were performed to explore the biological functions of the mRNAs in the ceRNA and module network. In addition, we analyzed the correlations between key lncRNAs and the clinical features of CRC patients. Co-expression analysis between lncRNAs and target genes was performed to verify the ceRNA hypothesis. Eventually, we established ceRNA regulatory subnetworks based on the above results. GEPIA and immunohistochemistry were used to validate the differential expression of genes, whereas GSEA was used to predict the underlying functions of the genes in the subnetwork. The CIBERSORT algorithm was used to estimate the proportion of immune cells for further correlation analysis (Fig. 1).

Differential expression analysis of RNAs
To screen differentially expressed RNAs (DERNAs) between tumor samples and normal samples, we used the "edgeR" package in R software to standardize and analyze the downloaded data. DEmRNAs, DElncRNAs, and DEmiRNAs were identified with the criteria of |log2 fold change (FC)| > 1.0 and adjusted P value < 0.05. Heatmap clustering and volcano plot were generated using R package "edgeR" to reveal the differential expressions of the DEmRNAs, DElncRNAs, and DEmiRNAs.

Survival analysis and ceRNA network construction
RNA expression profiles and related clinical information of the CRC patients were acquired from TCGA database. To identify lncRNAs significantly associated with patient prognosis, we used the R package "survival" to perform Kaplan-Meier curve analysis of DElncRNAs. DElncRNAs with P < 0.05 were considered to be key lncRNAs for construction of the next network. Based on the hypothesis that lncRNA acts as a miRNA sponge to regulate mRNA expression [7], we selected key lncRNAs and lncRNA-miRNA-mRNA interaction pairs to build the ceRNA regulatory network. Ultimately, we combined specific lncRNA-miRNA and miRNA-mRNA pairs to establish a prognosis-related network. The established ceRNA network was visualized using Cytoscape v3.6.0 software [18]. Univariate and multivariate Cox regression analyses were performed to identify the independent prognostic indicators for overall survival (OS).

Construction of PPI network
To better understand the interactions of DEmRNAs in the ceRNA network, we constructed a PPI network using the Search Tool for the Retrieval of Interacting Genes (STRING; https://string-db.org/) [19] database. The median confidence score (0.4) was used to filter protein interactions. Cytoscape 3.6.0 was used to embody the constructed PPI networks. In addition, we used the plugin cytoHubba to identify hub genes in the PPI network according to degree of connectivity. The top 30 hub genes were identified through the Degree ranking method. The MCODE plugin was used to analyze the PPI network and select the top module for subnetwork analysis.

Functional annotation analysis
To explore the potential biological functions and pathways of mRNAs in the ceRNA and module network, we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis through the R package "ClusterProfiler". GO terms and KEGG pathways with P < 0.05 were considered significantly enriched.

Clinical feature analysis of key lncRNAs
Clinical information, including age, sex, tumor stage, and TNM stage, was extracted from TCGA database. Based on the prognosis-related ceRNA network, we evaluated the associations between key lncRNAs and the clinical features of CRC patients using the R package "ggpubr".

Correlation analysis of lncRNAs and mRNAs
We performed Pearson correlation analysis and selected the positive correlations between the lncRNAs and mRNAs in ceRNA network. Pearson correlation coefficient > 0.4 and P < 0.001 were used as inclusion criteria. Based on the above co-expression relationships, we constructed a ceRNA subnetwork to investigate the potential mechanisms.

Validation of gene expression
Gene Expression Profiling Interactive Analysis (GEPIA; http://gepia.cancer-pku.cn/) is an online analysis tool based on TCGA and the Genotype-Tissue Expression (GTEx) dataset [20]. Differential expression boxplot in GEPIA was used to compare the expression levels of genes between tumor samples and normal samples. The Human Protein Atlas (THPA; https://www.proteinatlas.org) is a database that provides immunohistochemistry-based maps of human protein profiles for cancer tissues and normal tissues and cells [21]. We obtained immunohistochemistry images and data from this database to validate gene expression at the protein level. Staining intensity was evaluated as follows: no staining, 0; low staining, 1; medium staining, 2; high staining, 3. Staining quantity was scored as follows: none, 0; < 25%, 1; 25-50%, 2; 51-75%, 3; > 75%, 4. The total staining score was calculated by multiplying these two scores. Differences in the total staining scores between normal and CRC tissues were evaluated using the unpaired t-test. P < 0.05 was considered statistically significant. Fig. 4 The protein-protein interaction (PPI) network of differentially expressed genes in the prognosis-related ceRNA network. Each node corresponds to a protein-coding gene, and the size of the node represents the connection degree

Gene set enrichment analysis
To explore the biological pathways of genes in a subnetwork, Gene Set Enrichment Analysis (GSEA) analysis was performed based on TCGA dataset. According to the cutoff value, the samples were divided into high and low expression phenotypes. The gene set permutations were performed 1000 times per analysis. Gene sets with |normalized enrichment score (NES)| > 1 and P < 0.05 were considered significantly enriched.

Evaluation of immune cell infiltration
Normalized gene expression data were uploaded to TIMER2.0 (http://timer.cistrome.org/) [22], a website that provides comprehensive evaluation and analysis of tumor-infiltrating immune cells, to estimate the relative abundance of 22 immune cell types based on the CIBER-SORT algorithm. To evaluate the correlation between lncRNA and immune cell infiltration in the tumor microenvironment, CRC samples from the TCGA database were divided into high and low lncRNA expression groups. The Wilcoxon rank-sum test was performed to compare the differences in immune cell infiltration between the high and low lncRNA expression groups.

Differentially expressed lncRNAs, miRNAs, and mRNAs
We obtained the expression profiles of 19,645 mRNAs, 14,166 lncRNAs, and 1881 miRNAs from TCGA

Prediction of lncRNA-miRNA and miRNA-mRNA interactions
We predicted lncRNA-miRNA interaction pairs based on DElncRNAs using the online tool DIANA-LncBase version 3 and found that 153 of the 217 DElncRNAs might target 199 of the 376 DEmiRNAs. Subsequently, miRTarBase, miRDB, and TargetScan databases were used to identify miRNA-mRNA interaction pairs. Target mRNAs that were recognized by all three databases were selected as candidate genes. The results showed that 184 of the 199 specific DEmiRNAs might target 3475 candidate genes. After comparing the 3475 candidate genes with 2480 DEmRNAs, we obtained 455 overlapping genes and 1148 miRNA-mRNA interaction pairs.

Construction of prognosis-related ceRNA network
A ceRNA network was constructed using 2489 lncRNA-miRNA pairs and 1148 miRNA-mRNA pairs using Cytoscape software. The association of 153 lncRNAs in the ceRNA network with patient prognosis was analyzed using Kaplan-Meier survival curves. The results showed that 11 of the 153 lncRNAs were significantly associated with OS. The 11 prognosis-related DElncRNAs were considered key lncRNAs. LINC02038 and AC104823.1 were positively correlated with OS, whereas MIR4435-2HG, NKILA, AFAP1-AS1, ELFN1-AS1, AC005520.2, AC245884.8, AL354836.1, AL355987.4, and AL591845.1 were negatively correlated with OS (Fig. 3). Ultimately, we constructed a survival-related ceRNA network with 11 key lncRNAs, 54 DEmiRNAs, and 308 DEmRNAs. This network contained 373 nodes and 739 edges (Supplementary Figure S2). A total of 11 lncRNAs and 308 mRNAs in the ceRNA network were included in the univariate cox regression analysis. Six lncRNAs and 28 mRNAs with P < 0.05 were subjected to multivariate COX regression analysis. The lncRNA ELFN1-AS1 and the SPTBN2, CCNF, and VEGFA mRNAs were identified as independent prognostic factors for OS in CRC patients (Supplementary Table S3).

PPI network and module analysis
The mRNAs in the ceRNA network were used to construct a PPI network based on the STRING database. The PPI network was visualized by Cytoscape and included 265 nodes and 1109 edges (Fig. 4). The cyto-Hubba plugin was used to identify the top 30 hub genes with the highest degrees (Table 1), of which 7 genes (VEGFA, BIRC5, CDC25A, KPNA2, DLGAP5, CCNE1, and CCNF) were significantly related to OS using Kaplan-Meier analysis (Fig. 5). The MCODE plugin was used in the PPI network to identify the top module, which contained 31 nodes and 341 PPIs (Fig. 6a).

GO and KEGG enrichment analysis of DEmRNAs in ceRNA
We performed GO and KEGG enrichment analysis on 308 mRNAs in the ceRNA network and 31 mRNAs in the module network with the R package "ClusterProfiler". GO functional enrichment analysis suggested that mRNAs in the ceRNA network were significantly enriched in "proximal promoter sequence-specific DNA binding", "transcription coregulator activity", and "RNA polymerase II proximal promoter sequence-specific DNA binding" (Fig. 7a). KEGG pathway analysis showed that ceRNA-related mRNAs were enriched in "microRNAs in cancer", "cell cycle", and "PI3K-Akt signaling pathway" (Fig. 7b). Enrichment analysis of mRNAs in the module network showed that the enriched GO terms included "microtubule binding", "cyclin-dependent protein serine/threonine kinase regulator activity", and "tubulin binding", whereas the most enriched KEGG pathways were "cell cycle", "cellular senescence", and "p53 signaling pathway" (Fig. 6b and Fig. 7c).

Validation of gene expression by GEPIA and immunohistochemistry
We used TCGA and GTEx datasets in GEPIA to further validate the differential expression of genes between tumor and normal samples. The expression levels of PLAU, OSBPL3, LOX, VCAN, E2F1, and SRM were upregulated in COAD and READ tissues compared with normal tissues (all P < 0.05, Fig. 10). However, the expression of COL5A2 showed no differential expression between tumor and normal tissues. Immunohistochemistry results were obtained from the THPA database, and the staining scores were quantified to verify the protein expression levels of the genes. The staining scores for SRM (P = 0.0015), OSBPL3 (P = 0.0263), and VCAN (P = 0.0229) were higher in the CRC tissue than normal tissue, whereas PLAU (P = 0.0169) showed the opposite pattern. There was no significant difference between the E2F1 staining scores (P = 0.6213) for CRC and normal tissue (Fig. 11).

GSEA-predicted biological functions of genes
We used GSEA analysis to predict the enriched KEGG pathways between high and low expressions of genes in a subnetwork. PLAU, encoding a secreted serine protease, was positively correlated with "extracellular matrix", "angiogenesis", and "serine hydrolase activity". LOX was significantly enriched in "extracellular matrix", "cell adhesion molecule binding", and "angiogenesis". VCAN belongs to the aggrecan/versican proteoglycan family, which was positively correlated with "extracellular matrix component", "cell adhesion molecule binding", and "proteoglycan metabolic process". COL5A2 encodes one alpha chain of fibrillar collagen trimers. The high expression of COL5A2 was correlated with "extracellular matrix", "collagen trimer", and "collagen fibril organization", whereas E2F1 was negatively correlated with "biological adhesion", "regulation of cell adhesion", and "cell motility" (Fig. 12).

Tumor-infiltrating immune cells in CRC
A cohort of 568 CRC patients was divided into high and low expression groups based on lncRNA expression. The results of the Wilcoxon rank-sum test revealed that plasma B cells (P = 0.001) and CD4+ memory resting T cells (P < 0.001) were greater in patients with low MIR4435-2HG expression, while follicular helper T cells (P = 0.037) and neutrophils (P < 0.001) were greater in patients with high MIR4435-2HG expression (Fig. 13a). The high ELFN1-AS1 expression group had significantly higher numbers of plasma B cells (P = 0.039), CD4+ memory-activated T cells (P = 0.006), gamma delta T cells (P = 0.007), monocytes (P = 0.047), and resting myeloid dendritic cells (P = 0.031). The low ELFN1-AS1 expression group had higher numbers of CD8+ T cells (P = 0.001) and neutrophils (P = 0.024) (Fig. 13b). Correlation analysis was performed to determine the coexpression patterns among 22 tumor-infiltrating immune cell types (Supplementary Figure S4).

Discussion
Colorectal cancer (CRC) is one of the leading causes of cancer-related death worldwide. Increasing evidence has shown that dysregulated lncRNAs or lncRNA-mediated ceRNA networks are involved in the progression and prognosis of the CRC. LncRNAs suppress miRNAinduced translational inhibition of gene expression by competitively binding to miRNA response elements. In recent years, they have gained wide attention due to their important roles in tumor progression, patient prognosis, and immune infiltration [7,13,23]. The lncRNA  H19/miR-29b-3p/PGRN axis promotes EMT in CRC and counts a great deal for seeking appropriate diagnostic biomarkers and therapeutic targets for CRC [11]. The oncogenic HOTAIR/miR-331-3p/HER2 axis is associated with tumor progression and poor prognosis in gastric cancer [24]. In addition, the RP11-1094 M14.8/miR-1269a/CXCL9 axis is a potential prognostic marker for gastric cancer with different degrees of immune cell infiltration [13]. Thus, identifying lncRNAs and their associated ceRNA networks as prognostic factors may provide a foundation for the development of early diagnostic biomarkers and therapeutic targets in CRC.
In this study, we analyzed TCGA sequence data from CRC patients and constructed a prognostic ceRNA network to explore the potential function of lncRNAs and their related genes. MIR4435-2HG and ELFN1-AS1 were identified as potential prognostic biomarkers and therapeutic targets in CRC based on survival and clinical feature analysis. MIR4435-2HG promotes CRC growth and metastasis through the miR-206/YAP1 axis and predicts poor prognosis in CRC patients [25][26][27]. ELFN1-AS1 is elevated in CRC tissues, and its upregulation promotes CRC cell proliferation and migration and inhibits apoptosis by competing with the endogenous miR-4644/ TRIM44 axis [28,29]. While downregulation of ELFN1-AS1 suppresses the activation of extracellular signalregulated protein kinase (ERK), decreases vimentin expression, and increases E-cadherin expression.
Based on these ceRNA networks and lncRNA-mRNA co-expression relationships, we obtained two novel ceRNA regulatory subnetworks for MIR4435-2HG and ELFN1-AS1. According to the univariate and multivariate Cox analysis, ELFN1-AS1 was identified as an independent prognostic factor for CRC patients. MIR4435-2HG positively regulated PLAU, VCAN, LOX, COL5A2, and OSBPL3 mRNA expression by acting as a miRNA sponge. PLAU upregulation in primary tumor tissues is positively correlated with distant metastasis and indicates poor patient prognosis in CRC [30]. MiR-193a inhibits CRC cell growth and progression partly by downregulating the expression of target gene PLAU, which is consistent with our prediction of PLAU/miR-193a interaction [31]. VCAN is overexpressed in various human malignancies, including CRC [32][33][34][35][36][37], and CRC patients with high VCAN expression have worse disease outcomes than those with lower expression [38]. Upregulation of LOX expression is correlated with the progression and metastasis of various solid tumors [39]. There have been contradictory reports regarding the role of LOX in CRC. Csiszar et al. [40] reported that reduced LOX expression in colon tumor tissue supports the function of LOX as a tumor suppressor gene. However, recent studies found that LOX expression was significantly increased in tumor tissue and correlated with poor clinical outcomes of CRC patients [41,42]. COL5A2 is upregulated in CRC tissue compared to normal tissue, suggesting that COL5A2 is involved in CRC carcinogenesis [43]; however, there are no reports about the role of OSBPL3 in CRC. E2F1 and SRM were predicted to be co-expressed genes involved in the ELFN1-AS1 regulatory network. Several studies have demonstrated that E2F1 is an oncogene involved in CRC  [44,45]. However, there is also evidence to support E2F1 as a tumor suppressor gene involved in the induction of apoptosis in CRC [46,47]. SRM encodes one of the key enzymes in the polyamine biosynthesis pathway. Polyamines are essential for cell proliferation and growth, and high expression of polyamines has been linked to several tumors, including CRC [48][49][50][51][52]. However, the direct correlation between the expression of SRM and human tumorigenesis remains unclear.
Subsequently, we performed GSEA analysis to explore the enriched biological pathways of the coexpressed genes. Most were enriched in the "extracellular matrix component," "angiogenesis," and "cell adhesion molecule binding," which correlated with cancer pathogenesis. The extracellular matrix (ECM) is the main component of the tumor microenvironment and is composed of proteins, glycoproteins, proteoglycans, and polysaccharides [53]. Disorders in the ECM structure and composition promote cellular transformation and metastasis during cancer progression [54]. Angiogenesis is the formation of new blood vessels from existing vasculature, an important event in tumor growth, invasion into surrounding tissue, and distant metastasis [55]. Inhibition of angiogenesis has been a valuable therapeutic strategy for many solid tumors. Cell adhesion molecules (CAMs) play an important role in the interaction of cells with other cells and their surrounding intercellular environment. The role of CAMs in cancer involves angiogenesis, disruption of tissue architecture, and loss of contact with other cells and the ECM [56,57]. Tumor-infiltrating immune cells (TIICs) include T cells, dendritic cells (DCs), B cells, macrophages, neutrophils, mast cells, and natural killer (NK) cells. They are major components of the complex microenvironment that regulates tumor progression and predicts prognosis in CRC [58][59][60]. To further explore the immune infiltration-related mechanism of how ceRNA regulates CRC prognosis, the estimated proportions of 22 tumor-infiltrating immune cell types in CRC were evaluated by CIBERSORT analysis based on MIR4435-2HG and ELFN1-AS1 expression levels. Our results revealed that nine Fig. 11 Immunohistochemical analysis of protein expression levels of genes based on The Human Protein Atlas (THPA) database. a Immunohistochemical results for normal tissues. b Immunohistochemical results for CRC tissues. c Plot diagram of staining scores between CRC and normal tissues immune cell types, including CD4+ memory resting T cells, CD4+ memory activated T cells, CD8+ T cells, gamma delta T cells, follicular helper T cells, plasma B cells, neutrophils, monocytes, and resting myeloid DCs, showed statistical differences between the high and low expression groups. Several recent studies reported that the type, density, and location of tumor-infiltrating immune cells are associated with favorable outcomes in CRC [58,61]. The densities of CD4+ and CD8 + memory T cells were negatively correlated with local tumor invasion based on the T stage of the TNM classification [59]. DCs are potent antigen-presenting cells that play an important role in initiating, programming, and regulating antitumor immune responses. DCbased immunotherapy has been used as anti-cancer immunotherapy to treat various cancers, including CRC [62,63]. The peripheral neutrophil-tolymphocyte ratio (NLR) and lymphocyte-tomonocyte ratio (LMR) are prognostic factors for CRC patients [64,65]. The association of MIR4435-2HG and ELFN1-AS1 lncRNA expression with tumor-infiltrating immune cell types supports their role as valuable prognostic markers and immunotherapeutic targets in CRC.
Nevertheless, there are still some limitations to the present study. First, the data used in this study were mainly from Western countries and lacked data from the Asian population. Second, the number of normal cases was relatively small, and a larger number of paired samples are needed for further analysis. Third, the multiple sources of the online verification data need to be further confirmed by homogeneous research samples and systematic experimental studies. For example, although the immunohistochemistry data were extracted from The Human Protein Atlas (THPA) to verify the upregulation of the identified genes in CRC at the protein level, the exact molecular mechanisms involving these hub genes have not been experimentally investigated in the present study. In addition, we observed opposing PLAU expression patterns at the mRNA and protein levels ( Fig. 10 and Fig. 11), which is inconsistent with previous reports [66,67]. Taken together, further experimental studies based on a larger sample size are needed to verify our hypotheses.

Conclusions
The regulatory networks of MIR4435-2HG and ELFN1-AS1 are considered to be closely associated with CRC prognosis. Although further functional experiments are required, our findings contribute to the development of MIR4435-2HG and ELFN1-AS1 as potential prognostic biomarkers and therapeutic targets in CRC.