Microarray data analysis on gene and miRNA expression to identify biomarkers in non-small cell lung cancer

The aim of this study was to gain further investigation of non-small cell lung cancer (NSCLC) tumorigenesis and identify biomarkers for clinical management of patients through comprehensive bioinformatics analysis. miRNA and mRNA microarray datasets were downloaded from GEO (Gene Expression Omnibus) database under the accession number GSE102286 and GSE101929, respectively. Genes and miRNAs with differential expression were identified in NSCLC samples compared with controls, respectively. The interaction between differentially expressed genes (DEGs) and differentially expressed miRNAs (DEmiRs) was predicted, followed by functional enrichment analysis, and construction of miRNA-gene regulatory network, protein-protein interaction (PPI) network, and competing endogenous RNA (ceRNA) network. Through comprehensive bioinformatics analysis, we anticipate to find novel therapeutic targets and biomarkers for NSCLC. A total of 123 DEmiRs (5 up- and 118 down-regulated miRNAs) and 924 DEGs (309 up- and 615 down-regulated genes) were identified. These genes and miRNAs were significantly involved in different pathways including adherens junction, relaxin signaling pathway, and axon guidance. Furthermore, hsa-miR-9-5p, has-miR-196a-5p and hsa-miR-31-5p, as well as hsa-miR-1, hsa-miR-218-5p and hsa-miR-135a-5p were shown to have higher degree in the miRNA-gene regulatory network and ceRNA network, respectively. Furthermore, BIRC5 and FGF2, as well as RTKN2 and SLIT3 were hubs in the PPI network and ceRNA network, respectively. Several pathways (adherens junction, relaxin signaling pathway, and axon guidance) miRNAs (hsa-miR-9-5p, has-miR-196a-5p, hsa-miR-31-5p, hsa-miR-1, hsa-miR-218-5p and hsa-miR-135a-5p) and genes (BIRC5, FGF2, RTKN2 and SLIT3) may play important roles in the pathogenesis of NSCLC.


Background
Lung cancer presents the major cause of cancer deaths in the past few decades and has been a major public health problem [1]. Histologically, non-small cell lung cancer (NSCLC) accounts for more than 80% of lung malignancy cases, with an overall 5-year survival rate below 15% [2]. More than 70% of these cases are diagnosed with locally advanced or metastatic disease and the prognosis for NSCLC patients remains poor [3]. Therefore, it places a high priority on elucidating the molecular mechanisms of NSCLC pathogenesis and identify diagnostic or predictive biomarkers.
Research has revealed significant interactions between gene alterations and tumorigenesis and tumor progression of many types of cancers [4]. CD44 was shown to be overexpressed in NSCLC and involved in the occurrence and migration of NSCLC [5]. The genes L1TD1 (LINE-1 Type Transposase Domain Containing 1) and SPAG6 (Sperm Associated Antigen 6) are tumor-specifically methylated in NSCLC [6]. Recently, Morris et al. reported that FPR1 mRNA levels in whole blood predicts both NSCLC and small cell lung cancer [7]. MicroRNAs (miRNAs) are a large group of small non-coding RNAs of 20-24 nucleotides that are involved in the fine-tuning of various biological processes. They bind to multiple target mRNAs typically in the 3′-untranslated region (3′-UTR) and govern gene expression at the post-transcriptional level [8]. A possible tumor suppressor role for hsa-miR-30d in progression of NSCLC was shown in a recent study [9]. Yang et al. demonstrated that miR-598 suppressed the invasion and migration in NSCLC as a tumor suppressor through negatively regulate Derlin-1 (DERL1) and epithelial-mesenchymal transition (EMT) [10]. However, the roles of genes and miRNAs in NSCLCs are still not well understood [11].
Expression profiling with high-throughput microarrays has become a mature and widely used technology to obtain more global views on cancer genes and to identify novel cancer biomarkers [12]. In this study, we reported an integrated analyses of miRNAs and gene expression by reanalyzing public datasets from GSE102286 and GSE101929. Differentially expressed genes (DEGs) and differentially expressed miRNAs (DEmiRs) were identified in NSCLC samples compared with controls, respectively. The interaction between DEmiRs and DEGs was predicted, followed by functional enrichment analysis, and construction of miRNA-gene regulatory network, protein-protein interaction (PPI) network, and competing endogenous RNA (ceRNA) network. Through comprehensive bioinformatics analysis, we anticipate to find novel therapeutic targets and biomarkers for NSCLC.

Source of the microarray data
The public Gene Expression Omnibus (GEO) repository at http://www.ncbi.nlm.nih.gov/geo/ serve as a public data archive that freely disseminates high-throughput functional genomic data [13]. In this study, miRNA and mRNA microarray datasets were downloaded from the GEO database under the accession number GSE102286 and GSE101929,  Patients that underwent curative NSCLC surgery between 1998 and 2014 were enrolled. The mRNA and miRNA cohorts included matched tumor and normal pairs. The mRNA cohort included 25 matched pairs and 7 unmatched pairs and the miRNA cohort included 93 matched pairs and two unmatched pairs.

Data preprocessing and DEmiRs and DEGs screening
The miRNA raw data (RCC files) were received and read into the R statistical environment, followed by data preprocessing using the NanoStringNorm package [14] (v1.2.1, https://cran.r-project.org/web/packages/NanoStringNorm/ ), including background correction, normalization, and concentration prediction. The results of non-human miRNA probes on the chip were removed, and the remnant was used as the final miRNA expression value.
In addition, the mRNA raw data (CEL files) and the platform annotation file were downloaded. The affy package [2] was used for data preprocessing, including background correction, normalization, and expression level calculation. In accordance with the platform annotation  information, the probes were converted into gene symbols. For multiple probes corresponding to the same gene symbol, the probe average was taken as the final expression value of gene. The expression matrix was divided into tumor group and normal group, and DEmiRs and DEGs were screened separately. Non-paired t-test provided by limma [15] were used to calculate the P-values of significant gene expression differences. Benjamini-Hochberg (BH)-corrected P-values < 0.05 and |log 2 fold change (FC)| > 1 were chosen as the threshold for the identification of significant DEmiRs and DEGs.
Identification of DEmiR-DEGs relationship pairs miRWalk 2.0 tools [16] was applied to predict the target genes of the top 10 up-and down-regulated miRNAs with high FC value. The miRNA-target gene regulatory data from at least 5 of the 7commonly used databases (miR-Walk, miRanda, miRDB, miRMap, miRNAMap, RNA22, Targetscan) were collected. The obtained miRNA-target relationship was matched with DEGs obtained by microarray analysis to obtain the interaction between DEmiRs and DEGs (DEmiR-DEGs relationship pairs).

Pathway enrichment analysis of miRNAs and genes
The Kyoto Encyclopedia of Genes and Genomes (KEGG) [17] function enrichment analysis of miRNAs involved in the DEmiR-DEG interaction was carried out using R clusterprofiler package [18], and the pathways with P-value < 0.05 and count ≥2 were significantly enriched.
KEGG [17] pathway enrichment analysis was also conducted on the up-and down-regulated genes involved in the DEmiR-DEG relationship pairs. The commonly used DAVID [7] (version6.8, https://davidd.ncifcrf.gov/) was applied for the function enrichment analysis with gene count≥2 and P-value < 0.05 as the thresholds.

Construction of miRNA-gene regulatory network and module analysis
Based on the interaction information of the DEmiR-DEG, the construction of miRNA-gene regulatory network was performed using the Cytoscape software (vesrion: 3.2.0) [19]. The topological properties of network nodes were analyzed, such as the centrality of nodes.

PPI network of the DEGs involved in DEMIR-DEG
The STRING database [20] (version: 10.0) was used to predict the PPI pairs of proteins encoded by DEGs involved in the DEmiR-DEG. The input gene set was DEGs and the species was Homo sapiens. The parameter of PPI score was set as 0.4 (indicating medium confidence) [21]. A PPI network involved in DEmiR-DEG was constructed by Cytoscape software (version: 3.2.0) [19].

Construction of ceRNA network
ceRNA can act as decoys for miRNA binding and form complex regulatory networks based on an "miRNA response element (MRE) language" that regulate the abundance of any MRE-containing RNA transcripts such as long noncoding RNAs (lncRNAs) [22]. We used the miRNA-lncRNA interaction pairs in the starBase database (http://starbase.sysu.edu.cn/) [23] to screen the lncRNAs that have interaction with the DEMIR. The screening criteria for screening miRNA-lncRNA regulatory relationship pairs was medium stringency ≥2 and number of cancer types ≥1. Baes on the predicted miRNA-lncRNA relationship pairs and DEmiR-DEG (miRNA-mRNA) regulatory relationships obtained above, the lncRNA-miRNA-mRNA network (ceRNA network) was constructed using Cytoscape software.

Identification of DEGs and DEmiRs
After data preprocessing, a total of 924 DEGs were obtained, of which 309 genes were significantly up-regulated and 615 were significantly down-regulated. The principal component analysis (PCA) and volcano plots of DEGs were shown in Fig. 1a and b. Additionally, 123 DEmiRs were identified, among which 5 miRNAs were significantly up-regulated and 118 miRNAs were significantly downregulated. PCA and volcano plots of those DEmiRs were shown in Fig. 1c and d. Then, by searching the miRWalk2.0 database, we found target genes for 18 mature miRNAs. Finally, we obtained 715 pairs of DEmiR-DEG, including 6 up-and 12 down-regulated miRNAs, as well as 101 up-and 245 down-regulated genes.

Construction of PPI network
The PPI network of genes involved in the DEmiR-DEG interaction was constructed as Fig. 4, including 860

ceRNA network construction
The lncRNAs were predicted for all 18 miRNAs in the regulatory network, resulting in 9 miRNA-lncRNA relationships, including 5 miRNAs and 7 lncRNAs. We Fig. 4 Protein-protein interaction network. The red circle indicates the up-regulated gene, the green prism represents the down-regulated gene, and the node size represents the degree value. The greater the degree value, the larger the node compared these 5 miRNAs in the regulation relationship of DEmiR-DEGs and integrated with miRNA-lncRNAs interaction. A total of 214 relationships were obtained, including 3 up-regulated miRNAs (hsa-miR-9-5p, has-miR-196a-5p, and hsa-miR-31-5p), 2 down-regulated miRNAs (hsa-miR-135a-5p and hsa-miR-1), 41 up-regulated genes, 124 downregulated miRNAs, and 7 lncRNAs. The ceRNA network was shown in Fig. 5. The genes RTKN2 and SLIT3 had relative higher degree of 4 and 3, respectively. Both of RTKN2 and SLIT3 can be regulated by hsa-miR-1, hsa-miR-9-5p, and hsa-miR-135a-5p.

Discussion
In this study, we integrated mRNA expression profile and miRNA expression profile to evaluate changes of gene and miRNA expression in NSCLC. A total of 123 DEmiRs (5 up-and 118 down-regulated miRNAs) and 924 DEGs (309 up-and 615 down-regulated genes) were found. These genes and miRNAs were significantly involved in different pathways including relaxin signaling pathway, adherens junction, and axon guidance. Hsa-miR-9-5p, has-miR-196a-5p and hsa-miR-31-5p, as well as hsa-miR-1, hsa-miR-218-5p and hsa-miR-135a-5p were shown to have higher degree in the miRNA-gene regulatory network and ceRNA network, respectively. Furthermore, BIRC5 and FGF2, as well as RTKN2 and SLIT3were hubs in the PPI network and ceRNA network, respectively.
Evidence suggests that relaxin, a peptide hormone, can promote invasiveness of breast cancer cell lines through increasing the expression of matrix metalloproteinases (MMPs) [24,25]. A study has shown that mouse relaxin Fig. 5 Competing endogenous RNA (ceRNA) network. The red circle represents the up-regulated gene, green prism represents the downregulated gene, yellow triangle represents the up-regulated miRNA, blue arrow represents the down-regulated miRNA, and the white square is the lncRNA gene expression in the lungs can result in enhanced MMP activity [26]. However, there are few direct evidence targeting the role of relaxin in the progression of lung cancer. In this study, we found that the DEGs and DEmiR were associated with relaxin signaling pathway. Disruption of adherens junction has been demonstrated to play a causal role in cancer initiation and progression [27]. In addition, Notch1 destabilizates the adherens junction complex and influence NSCLC cell proliferation [28]. Cancer cells secrete axon guidance molecules control the invasion and migration of cancer [29]. In this study, we found that DEGs and DEmiRs were also involved in the adherens junction and axon guidance. Taken together, we suggested that relaxin signaling pathway, adherens junction, and axon guidance may play important roles in the pathogenesis of NSCLC.
Furthermore, BIRC5 and FGF2, as well as RTKN2 and SLIT3 were hubs in the PPI network and ceRNA network. Previous study showed that BIRC5 was up-regulated in NSCLC [36]. FGF2-mediated autocrine signaling is activated in NSCLC cell lines [37]. Epigenetic inactivation of SLIT3 has been found in human cancers [38]. However, there is no evidence regarding to the role of RTKN2 in lung cancer. Nevertheless, we suggested that BIRC5, FGF2, RTKN2 and SLIT3 may be important for lung cancer; but experiment verification should be performed in future.

Conclusion
In conclusion, the present study has demonstrated the potentially critical roles of several pathways (adherens junction, relaxin signaling pathway and axon guidance) miRNAs (hsa-miR-9-5p, has-miR-196a-5p, hsa-miR-31-5p, hsa-miR-1, hsa-miR-218-5p, and hsa-miR-135a-5p) and genes (BIRC5, FGF2, RTKN2 and SLIT3) in the progression of NSCLC. The miRNAs and genes may be favorable biomarkers for patients with this cancer. However, there were still some limitations. Firstly, not all mRNA samples were from miRNA samples, which will reduce the statistical power for examining the relationship between miRNA and mRNA. Secondly, this findings were identified in one cohort, which should be confirmed in other cohorts. And hence future validations with the miRNA and mRNA samples from the same patients should be conducted.