Comprehensive Analysis of lincRNA-Related ceRNA Network in Glioblastoma

Long intergenic non-coding RNAs (lincRNAs) are capable of regulating several tumours, while competitive endogenous RNA (ceRNA) networks are of great signicance in revealing the biological mechanism of tumours. Currently, there is a dearth of studies on the ceRNA network of lincRNAs in glioblastoma (GBM), which we aimed to assess in the present study. We obtained GBM and normal brain tissue samples from TCGA, GTEx, and GEO databases, and performed weighted gene co-expression network analysis and differential expression analysis on all lincRNA and mRNA data. Subsequently, we predicted the interaction between lincRNAs, miRNAs, and target mRNAs. Univariate and multivariate Cox regression analyses were performed on the mRNAs using CGGA data, and a Cox proportional hazards regression model was constructed. National Center Biotechnology Gene Expression Omnibus (GEO), Chinese Glioma Genome Atlas (CGGA), and Genotype Tissue Expression (GTEx) databases. We used weighted gene co-expression network analysis (WGCNA) and differential analysis to screen key lincRNAs, construct lincRNA-related ceRNA networks in GBM, and obtain a prognostic model. We believe this study provides a basis for further studies on the molecular mechanism of GBM and exploration of new therapeutic targets.


Conclusion
We identi ed four lincRNAs that have the most research values for GBM and obtained the ceRNA network. Our research is expected to facilitate in-depth understanding and study of the molecular mechanism of GBM, and provide new insights into targeted therapy and prognosis of the tumour. Background Glioblastoma (GBM) is the most aggressive and destructive primary malignant central nervous system tumour [1]. At present, the most established treatment protocol for this tumour involves curative surgery and adjuvant radiotherapy combined with temozolomide; however, its prognosis is still very poor, and the average survival time of patients is approximately 2 years [2]. Moreover, various treatment methods affect normal brain tissue, and the quality of life of patients does not signi cantly improve afterwards.
Therefore, an in-depth study of the molecular mechanism of GBM occurrence and development, while exploring possible therapeutic targets, may prove bene cial for the diagnosis and treatment of the tumour.
Although approximately 90% of genes in the human genome can be transcribed, only approximately 2% of the genes are protein coding, and non-coding RNAs account for most of the remainder [3]. Long noncoding RNAs (lncRNAs) contain >200 nucleotides and have no protein coding function. They are mainly transcribed from different regions of the genome by RNA polymerase II, and have been shown to be closely related to cancer [4,5]. The tissue speci city of lncRNA expression has been reported to be higher than that of mRNA expression, which applies even in pathological conditions such as cancer [6]. Long intergenic non-coding RNA (lincRNA) is a type of lncRNA that does not overlap with exons of proteincoding genes and other non-lincRNA genes, and participates in many important biological processes [7,8]. Cancer-related lincRNAs may be targeted to provide new ways for cancer diagnosis and treatment [9].
Linc-ROR is a tumour promoter, which mainly participates in tumour cell proliferation, apoptosis, invasion, angiogenesis, and cancer stem cell generation by regulating target genes [10]. NEAT1 is a p53-regulated lincRNA, which plays a key role in cancer occurrence [11]. LincRNA-p21 plays an important role in regulating TAM function in tumour microenvironments, and can be used as a new cancer treatment target for macrophage in ltration [12]. In general, molecular mechanisms related to lincRNA in tumours have important research value.
A competitive endogenous RNA (ceRNA) network is the interaction between lncRNAs, miRNAs, and mRNAs that constitutes a complex regulatory network systems, which has extensive functions in the human genome and plays a signi cant role in cancer [13]. It has been reported that linc-ROR can be used as the ceRNA of miR-145 to regulate the proliferation, invasion, and tumourigenicity of pancreatic cancer cells [14]. Further, a study has demonstrated that lincRNA-HOTAIR is highly expressed in a variety of tumour tissues and cells, and is associated with tumour metastasis and poor prognosis. Additionally, its related ceRNA network has been shown to play a role in the progression of kidney cancer [15]. However, to our knowledge, there is currently no study on ceRNA network related to lincRNA in GBM. Therefore, in this study, we comprehensively analysed the GBM data retrieved from The Cancer Genome Atlas (TCGA), National Center for Biotechnology Information (NCBI), Gene Expression Omnibus (GEO), Chinese Glioma Genome Atlas (CGGA), and Genotype Tissue Expression (GTEx) databases. We used weighted gene coexpression network analysis (WGCNA) and differential analysis to screen key lincRNAs, construct lincRNA-related ceRNA networks in GBM, and obtain a prognostic model. We believe this study provides a basis for further studies on the molecular mechanism of GBM and exploration of new therapeutic targets.

Data acquisition and pre-processing
We obtained RNA-Seq data of 165 GBM samples and 1,029 normal brain tissue samples from the TCGA and GTEx databases, respectively, using the UCSC Xena Browser (https://xena.ucsc.edu/). Further, the mRNA-seq and clinical data of 693 glioma samples were retrieved from CGGA (http://cgga.org.cn/), from which 248 GBM samples were selected for this study. Among the selected samples, 236 patients had survival information, while 198 had all clinical information. Additionally, we retrieved GSE50161 (comprising 34 GBM and 13 normal samples; platform: Affymetrix-GPL570) and GSE134783 (comprising 71 GBM samples; platform: Affymetrix-GPL570) from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). The IDs of all samples were transformed into gene symbols based on GENCODE (https://www.gencodegenes.org/human/). The RNA-seq data fetched from TCGA and GTEx were collated and merged, and the expression matrices of mRNA and lincRNA were obtained. We also combined and normalised the GEO data and obtained the mRNA expression matrix.

Construction of weighted gene co-expression network
Using the combined data of TCGA and GTEx, a weighted gene co-expression network was constructed for lincRNA and mRNA. Based on the mRNA of the GEO combined data, a weighted gene co-expression network was constructed. WGCNA R software package [16] was used to build a co-expression network to mine key modules related to GBM. First, a hierarchical cluster analysis was performed to check the heterogeneity of the samples. Subsequently, according to the scale-free topological standard, we chose the appropriate soft threshold power (β) to construct a weighted adjacency matrix and convert the adjacency relationship into a topological overlap matrix. Finally, we obtained the gene module, estimated the relationship between the module and trait, determined the module related to GBM, and showed the correlation between GBM and key gene modules, using a scatter plot.

Screening of differentially expressed genes
We used the Limma R package [17] to screen the differentially expressed lincRNA (DElincRNA) between the GBM patients and control group from the combined data of TCGA and GTEx.
Differential expression analysis was performed on the mRNA data in the TCGA and GTEx combined data and GEO combined data, and the two selected differential genes were combined to obtain differentially expressed mRNA (DEmRNA)(P <0.05, and |log2 FC| ≥ 1).

Gene function and pathway enrichment analysis
We used the clusterPro ler R package [18] for Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. GO is used to describe gene function, and KEGG is used to obtain possible pathways. Further, the GOplot R package [19] was used to visualise the GO term or KEGG approach (adjusted p<0.05).

Construction of Cox regression model
Univariate COX regression analysis was performed on the CGGA data, using the Survival R software package, to evaluate the effect of mRNA expression on the survival time of GBM patients. Additionally, we performed multivariate COX regression analysis, constructed a multivariate Cox regression model, and identi ed the corresponding coe cients of GBM prognostic features. Further, we calculated the risk score to predict survival time, dividing the samples into high-risk and low-risk groups, with the median as the critical value. The "predict ()" function was used to calculate the risk score: risk score = h 0 (t)*exp(β 1 x 1 +β 2 x 2 +…+β n x n ). The correlation between the prognostic characteristics of patients and overall survival rate was then calculated through univariate and multivariate Cox regression analyses of clinical factors related to overall survival. Finally, the R package survivalROC was used to draw the receiver operating characteristic curve and calculate the area under the curve (AUC).
ceRNA network in GBM According to the mRNA obtained by multivariate Cox regression analysis, we screened the corresponding miRNA and lincRNA and reconstructed the ceRNA network.
Gene Set Enrichment Analysis (GSEA) GSEA software was used to analyse the RNA-seq data of lincRNA retrieved from the CGGA database. According to the median of lincRNA expression, the samples were divided into high and low expression groups. Statistical signi cance was set at p<0.05. A set of genes "c2. cp. kegg. v7.2. symbols. gmt" was retrieved from the Molecular Signature Database (MSigDB, http://software.broadinstitute.org/gsea/msigdb/index.jsp) and selected as the reference gene set.

WGCNA identi es key modules
We combined GTEx and TCGA data to construct a co-expression network for all lincRNAs, using the R package 'WGCNA', and con rmed that the β value in the network was 4 ( Figure 1A). Further, the dynamic tree cutting method was used to generate co-expression modules, and the closely related modules were merged into larger modules. Finally, 19 modules were generated in the lincRNA co-expression network ( Figure 1B). The module eigengenes (MEs) of the turquoise module had the strongest correlation with GBM traits ( Figure 1C). Figure 1D shows the correlation between gene signi cance and module membership in the turquoise module, which was considered a key module containing 2,206 lincRNAs.
We also constructed the co-expression networks of all mRNAs for GTEx and TCGA combined data and GEO combined data, comprising 20,270 and 14,807 mRNAs, respectively. In the mRNA co-expression networks of the two sets of data, β values were 8 and 4 ( Figure 2A and Figure 3A), and 22 and 21 modules were generated ( Figure 2B and Figure 3B), respectively. In the combined GTEx and TCGA data, the MEs of the blue and dark green modules have the strongest correlations with the tumour and normal traits, respectively ( Figure 2C). The blue and dark green modules were considered key modules, comprising 7,736 mRNAs. In the GEO data, Figure 3C shows that the MEs of the blue and turquoise modules (comprising 8,022 mRNAs) have the strongest correlations with the tumour and normal traits, respectively. Figures 2D-E and 3D-E show the correlation between gene signi cance and module membership.

Identi cation of differentially expressed lincRNAs (DE lincRNAs) and mRNAs (DEmRNAs)
We identi ed DElincRNAs and DEmRNAs from the GTEx and TCGA combined data, including 163 upregulated lincRNAs, 176 down-regulated lincRNAs, 2,953 up-regulated mRNAs, and 2,932 down-regulated mRNAs. From the GEO data, 2,434 up-regulated and 1,995 down-regulated mRNAs were identi ed. The two groups of mRNAs were then comprehensively analysed to obtain overlapping 1,040 down-regulated mRNAs and 1,358 up-regulated mRNAs. Finally, 339 DElincRNAs and 2,398 DEmRNAs were identi ed.

Preliminary construction of ceRNA network
Through WGCNA analysis of lincRNAs, we obtained 2,206 lincRNAs in the turquoise module and then integrated the analysis with 339 DElincRNAs to obtain 251 overlapping lincRNAs. Further, using the miRcode online tool, 251 lincRNAs were used to predict miRNAs. Additionally, using the miRDB, TargetScan, and miRTarBase datasets, the predicted miRNAs were used to obtain the corresponding mRNAs. By analysing the predicted mRNAs and DEmRNAs, we obtained 111 mRNAs (24 down-regulated and 87 up-regulated mRNAs). The predicted miRNAs and 251 lincRNAs were analysed using the 111 mRNAs, and 25 lincRNAs and 30 miRNAs were nally obtained. Subsequently, we used Cytoscape version 3.7.2 software to build a lincRNA-miRNA-mRNA ceRNA network (Figure 4).

Functional Enrichment Analysis
A functional enrichment analysis was performed on the 111 mRNAs obtained previously. The biological processes of enrichment were mainly cell cycle, epithelial cell proliferation, ossi cation, and sex differentiation ( Figure 5A). The cellular components were concentrated in the transcription factor, protein kinase, and serine/threonine protein kinase complexes (Figure 5 B). Enriched molecular function was mainly involved in DNA-binding transcription activator activity, RNA polymerase II-speci c core promoter binding, and HMG box domain binding (Figure 5 C). KEGG pathway analysis showed that the genes were associated with cellular senescence, cell cycle, multiple cancers, miRNA in cancer, and TGF-beta signalling pathway (Figure 5 D).

Construction of prognostic models
Among the CGGA GBM samples, complete survival information was available for 236 samples, and all clinical information was available for 198 samples. Univariate Cox regression analysis was performed on the 111 mRNAs previously obtained, and 27 mRNAs related to survival time were obtained (p<0.05) ( Figure 6A). Multivariate Cox analysis was performed using 27 mRNAs, and a Cox proportional hazards regression model of GBM patients containing 13 mRNAs was constructed ( Figure 6B). Based on the median risk score, all patients were divided into two groups (high-risk and low-risk groups). Figure 6C shows the survival status, survival time, and mRNA expression levels of patients. Survival analysis showed that patients in the low-risk group survived longer than those in the high-risk group (p<0.001) ( Figure 6D). The univariate Cox regression analysis showed that recurrence (p=0.004), isocitrate dehydrogenase (IDH) mutation (p=0.009), 1p19q codeletion (p=0.014), and risk score (p=0.001) were predictors ( Figure 6E). Moreover, multivariate Cox regression analysis con rmed that recurrence (p=0.002) and risk score (p<0.001) were independent risk factors ( Figure 6F). The risk score and recurrence AUCs of the nomogram were 0.668 and 0.663, respectively ( Figure 6G).

ceRNA network in GBM
The 13 mRNAs obtained through the previous multivariate COX regression analysis, as well as the 14 miRNAs and 23 lincRNAs interacting with them, were used to construct the nal ceRNA network ( Figure  7).

GSEA reveals the close relationship between lincRNA and GBM
Using | log2FC | ≥2 as a condition to screen 23 kinds of lincRNA, we obtained seven of them: MALA1, MEG3, NEAT1, MIR7-3HG, FAM95B1, EPB41L4A-AS1, and AC125494.1. Among them, extensive research has been conducted on MALAT1 and NEAT1. Thus, we selected MEG3, MIR7-3HG, FAM95B1, and EPB41L4A-AS1 in 248 CGGA samples for GSEA. As shown in Figure 8, the GSEA results revealed that these lincRNAs were closely related to cancer. MEG3 showed potential effects on prostate cancer, colorectal cancer, and cancer pathways ( Figure 8A), MIR7-3HG demonstrated potential role in small cell lung cancer, prostate cancer, and cancer pathways ( Figure 8B), FAM95B1 was closely related to endometrial cancer, non-small cell lung cancer, renal cell carcinoma, and MTOR signalling pathway ( Figure 8C), while EPB41L4A-AS1 was associated with small cell lung cancer, prostate cancer, and JAK/STAT signalling pathway ( Figure 8D). AC125494.1 is not found in the CGGA data.

Discussion
GBM, which is the most common type of glioma, accounts for about 15% of all brain tumours. It has been reported that there are about three GBM patients per 100,000 people, and the average survival time of patients is only approximately 12-18 months. The 5-year and 10-year survival rates are approximately 5.5% and 2.9%, respectively [20,21]. The low recovery rate and poor survival time of GBM patients may be related to the lack of e cient therapeutic targets. Therefore, it is of great signi cance to nd new biomolecular markers and therapeutic targets of GBM. Currently, many scholars and research institutions focus on the role of non-coding RNAs in tumours, and several molecules have been signi cantly associated. Studies have shown that many lincRNAs can act as oncogenes or tumour suppressor genes in cancer. In recent years, ceRNA network-related research has gained intense attention. In a ceRNA network, lincRNAs can competitively bind miRNAs to regulate the expression of target mRNAs. Additionally, several studies have shown that ceRNA network is closely related to the occurrence and development of cancer and may be valuable in predicting the prognosis of patients.
In this study, we used WGCNA to nd key modules related to GBM and combined differentially expressed genes for analysis. Based on the lincRNA-miRNA-mRNA interaction, a ceRNA network of GBM patients was preliminarily constructed, and 111 mRNAs in the network were functionally enriched. The results showed that these mRNAs have potential roles in cell cycle, cell proliferation, various cancers, miRNA in cancer, DNA-binding transcriptional activator activity, TGF-β signalling pathway, transcription factor complex, RNA polymerase II speci city, etc. Further, univariate Cox regression and multivariate Cox regression analyses were used to construct a Cox proportional hazards regression model with 13 key genes: CDK6, FAM84B, FBLIM1, FJX1, GNB5, HOXA3, MLEC, PDXK, SOX11, SPRY4, TBPL1, TRIB2, and WEE1. We calculated the risk score of patients, and veri ed that the prognostic model is highly accurate. In addition, the nomogram based on the risk score has the highest AUC value. Finally, 13 mRNAs, 14 miRNAs, and 23 lincRNAs were used to construct a ceRNA network. We selected four lincRNAs, including MEG3, MIR7-3HG, FAM95B1, and EPB41L4A-AS1, and predicted their functions using the GSEA software. Our previous study showed that these four lincRNAs were signi cantly under-expressed in GBM. These lincRNAs were shown to be closely related to a variety of cancers. Among them, MEG3 and MIR7-3HG are related to cancer pathways, while these two and EPB41L4A-AS1 all regulate the JAK/STAT signalling pathway.
MALAT1 and NEAT1 have been widely studied non-coding RNAs, and some studies have demonstrated that they have clear correlations with various cancers such as hepatocellular carcinoma and lung cancer [22][23][24][25]. It has also been reported that MALAT1 can be used as ceRNA of miR-199a to promote the expression of ZHX1, which in turn can regulate the proliferation of GBM cells [26]. The ceRNA effect between NEAT1 and miR-194-5p is related to the angiogenesis of glioma [27].
MEG3, also known as gene trap locus 2, is an imprinted gene located at 14q32 [28]. It is a new type of tumour suppressor that plays a role in various tumours, such as ovarian and bladder cancers [29,30]. Studies have shown that MEG3 is under-expressed in gliomas, and its over-expression has a signi cant inhibitory effect on the proliferation and migration of glioma cells, while promoting its apoptosis and autophagy, and inhibiting the PI3K/AKT/mTOR signalling pathway [31,32]. MEG3 can act as a ceRNA for miR-19a, thereby exerting an inhibitory effect on glioma [33]. The present study also predicted that MEG3 is potentially valuable in multiple tumours and cancer pathways.
According to previous reports, MIR7-3HG is associated with tumour progression and is highly expressed in endometrial cancer. Bioinformatics analysis by Wang et al. showed that its high expression in breast cancer is signi cantly related to the survival time of patients [34,35]. It has also been reported that this gene can be used as ceRNA to up-regulate the expression of PEG10 by sponged miR-27a-3p, and thus plays a role in retinoblastoma [36]. EPB41L4A-AS1, located in the 5q22.2 region of the human genome, is an induced gene of p53 and pGC-1α, can regulate glycolysis and glutamine metabolism, and plays an important role in cancer metabolic reprogramming [37]. It is highly expressed in colorectal cancer tissues and is involved in the proliferation, invasion, and migration of colorectal cancer cells [38]. Additionally, FAM95B1 has been shown to be associated with thyroid cancer [39]. Limitations of our study include the lack of cell or animal experiments and its retrospective nature. Subsequently, we hope to verify our results by conducting cell-based experiments.

Conclusion
Conclusively, in this study, we constructed the lincRNA-miRNA-mRNA ceRNA network of GBM, which may be involved in its molecular regulation, and identi ed four lincRNAs with potential roles in the tumour. To our knowledge, three of these lincRNAs, namely MIR7-3HG, FAM95B1, and EPB41L4A-AS1, are novel potential therapeutic targets for GBM, as there are no previous related studies. Further, we created a prognostic model containing 13 genes, which may serve as a reliable prognostic indicator of GBM. Limitations of our study include the lack of cell or animal experiments and its retrospective nature.
Subsequently, we hope to verify our results by conducting cell-based experiments. However, the present study is expected to facilitate in-depth understanding and study the molecular mechanism of GBM, and provide new insights into targeted therapy and prognosis of tumours.     The ceRNA network of lincRNA, miRNA and mRNA. Notes: Red hexagon denotes lincRNA, green triangle represents miRNA, and cyan ellipse represents mRNA.  The construction of a ceRNA network including 13 mRNAs, 14 miRNAs and 23 lincRNAs. Notes: Red hexagon denotes lincRNA, green triangle represents miRNA, and cyan ellipse represents mRNA.