A genetic variant in long non-coding RNA MALAT1 associated with survival outcome among patients with advanced lung adenocarcinoma: a survival cohort analysis

Recent studies have demonstrated that the long non-coding RNA (lncRNA) metastasis associated lung adenocarcinoma transcript 1 (MALAT1) may participate in the development and progression of lung cancer. In this study, we hypothesized that genetic variants of this lncRNA may affect the prognosis of lung cancer patients. We conducted a follow-up study of 538 patients with non-small cell lung carcinoma (NSCLC), including 140 early-stage (stage I and II) and 398 advanced-stage (stage III and IV) patients. The genetic variant rs3200401 in MALAT1 was genotyped in this population using the TaqMan assay, and its association with overall survival was analyzed. Among the advanced lung adenocarcinoma patients, subjects carrying the rs3200401 CT and CT + TT genotypes had significantly longer median survival time (MST = 29.9 and 28.9 vs. 19.3 months, log-rank P = 0.019 and 0.024, respectively) and decreased death risks [crude HR (95% CI) = 0.65 (0.43–0.98) and 0.64 (0.44–0.95), P = 0.040 and 0.025, respectively] compared with subjects with the MALAT1 rs3200401 CC genotype. However, the beneficial effect of rs3200401 was not seen among early NSCLC or advanced lung squamous cell carcinoma patients. We further examined the TCGA data and found that higher expression of MALAT1 was associated with metastasis of advanced lung adenocarcinoma but not of lung squamous cell carcinoma. The rs3200401 T allele located on the lncRNA MALAT1 was associated with better survival for advanced lung adenocarcinoma patients and may offer a novel prognostic biomarker for this patient subgroup. However, these results need to be validated in larger populations of lung cancer patients, and the biological function of this variant warrants further investigation.


Background
Lung cancer is the most common cancer and one of the leading causes of cancer-related death worldwide, and it contributed 13.0% of new cancer cases diagnosed in 2012 [1]. Non-small cell lung cancer (NSCLC) represents approximately 85 to 90% of all diagnosed lung cancers, and lung adenocarcinoma is the most common histological subtype. The 5-year survival rate of advanced NSCLC is less than 5% based on the Surveillance, Epidemiology, and End Results (SEER) Program data [2].
Long non-coding RNAs are transcripts longer than 200 nucleotides that do not code for proteins but regulate gene expression and protein synthesis [3]. Several lncRNAs have been reported to be associated with tumorigenesis, tumor progression, and tumor metastasis [4][5][6][7]. In general, most lncRNAs are expressed at low levels. Metastasis associated lung adenocarcinoma transcript 1 (MALAT1), also known as noncoding nuclear-enriched abundant transcript 2 (NEAT2), is one of the most abundant and highly conserved lncRNAs, indicating its potential functional importance [8]. MALAT1 is broadly expressed in normal human tissues and overexpressed in numerous cancers, including NSCLC [9]. It has been proposed that MALAT1 can regulate gene expression and alternative splicing. MALAT1 is localized in nuclear speckles at the SRSF2 splicing domain in several cell lines and can interact with the pre-mRNA splicing factor SF2/ASF and CC3 antigen [10][11][12][13][14]. In vitro studies revealed that MALAT1 can regulate cell proliferation, migration, and vessel growth [15][16][17].
MALAT1 was originally identified as a marker for predicting metastasis and prognosis of early-stage NSCLC patients [9,18]. Many studies have also indicated that MALAT1 is a negative prognostic factor in other cancer types or diseases, such as glioma, pancreatic cancer, and colorectal cancer [19][20][21][22][23][24][25]. However, the associations between genetic polymorphisms of lncRNA MALAT1 and lung cancer prognosis have rarely been investigated.
In this study, we genotyped the single nucleotide polymorphism (SNP) rs3200401 located in lncRNA MALAT1 and aimed to investigate its association with the survival outcome of 398 advanced NSCLC patients.

Study patients
The study enrolled 538 patients who were diagnosed with NSCLC and treated at the Department of Oncology at Wuhan Iron and Steel (Group) Corporation Staff-Worker Hospital between January 2003 and December 2012. Patients who were still alive on December 31, 2013 (132 patients) were treated as censored, and the survival time for each patient was calculated from the date of confirmed lung cancer diagnosis until the date of death or the last follow-up. Demographic data, lifestyle risk factors (e.g. smoking status, drinking), medical history, and clinical features were gathered by interview or from the patients' medical records. A large proportion of the study patients were included in our previous study [26].

SNP selection and genotyping
The NCBI dbSNP database was used to select the SNP of MALAT1 (https://www.ncbi.nlm.nih.gov/snp/). According to dbSNP, there are 16 SNPs located on the MALAT1 gene with MAF > 0.01; however, only rs3200401 had MAF > 0.10 in all of the 1000 Genomes, the NHLBI "Grand Opportunity" Exome Sequencing Project (GO-ESP), and the Exome Aggregation Consortium (ExAC) projects (https://www.ncbi.nlm.nih.gov/variation/view/) (Additional file 1: Table S1). Thus, to achieve adequate statistical power, the SNP rs3200401 was investigated in the present study, while the other SNPs with MAF < 0.10 were not selected. Genomic DNA samples were extracted from blood cells using the Gentra Puregene Blood Kit (QIAGEN, Hilden, Germany) following the manufacturer's instructions. The MALAT1 polymorphism, rs3200401 C > T, was genotyped by TaqMan assay among all study subjects using the ABI 7900HT Sequence Detection System (Applied Biosystems, Waltham, Massachusetts, USA), and each sample was analyzed in duplicate. The primers and probes were purchased from Life Technologies (Catalog No. C___3246069_10). The genotyping call rate was 98.1% and the concordance rate was 100%.
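The selection rule described above (keep a SNP only if its MAF exceeds 0.10 in every reference project) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name, the SNP labels, and all MAF values in the example are hypothetical placeholders and are not taken from dbSNP or Table S1.

```python
# Minimal sketch of the MAF-based selection rule. All SNP names and MAF
# values below are hypothetical placeholders, not real dbSNP data.

MAF_THRESHOLD = 0.10  # threshold stated in the text


def snps_common_in_all_projects(maf_by_snp, threshold=MAF_THRESHOLD):
    """Return SNP ids whose MAF exceeds `threshold` in every listed project.

    `maf_by_snp` maps a SNP id to a dict of {project name: MAF}.
    SNPs with no MAF entries at all are excluded.
    """
    selected = []
    for snp_id, mafs in maf_by_snp.items():
        if mafs and all(maf > threshold for maf in mafs.values()):
            selected.append(snp_id)
    return selected


# Hypothetical input: three made-up SNPs across the three projects.
example_mafs = {
    "snp_a": {"1000Genomes": 0.15, "GO-ESP": 0.12, "ExAC": 0.18},
    "snp_b": {"1000Genomes": 0.12, "GO-ESP": 0.04, "ExAC": 0.11},
    "snp_c": {"1000Genomes": 0.02, "GO-ESP": 0.03, "ExAC": 0.02},
}
selected_snps = snps_common_in_all_projects(example_mafs)
```

Requiring the threshold in every project, rather than in any one of them, mirrors the wording "in all" in the text and keeps only variants that are common regardless of the reference panel.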

Statistical analysis
Kaplan-Meier analysis and the log-rank test were used to assess the associations between survival time and demographic characteristics, clinical features, and MALAT1 rs3200401 genotypes. We used a dominant model to assess the association between SNP rs3200401 genotypes and the survival outcome of early and advanced NSCLC patients separately. Multivariate Cox regression models, adjusted for age, sex, smoking status, histology, TNM stage, and treatment by surgical resection, chemotherapy, and radiotherapy, were used to estimate the adjusted hazard ratios (HRs) and 95% CIs for the effect of MALAT1 rs3200401 genotypes on death risk for NSCLC patients. All data analyses were performed in SPSS software (version 22, IBM SPSS Statistics, IBM Corporation, Chicago, IL). Power analysis was performed using the Power and Sample Size version 13.2 application in SAS (version 9.4, SAS Institute Inc., Cary, NC).
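To make the dominant-model coding and the Kaplan-Meier median survival time concrete, the following is a minimal sketch in plain Python. It is not the SPSS analysis used in the study: the function names and the toy follow-up data are hypothetical, and the median is taken as the first event time at which the estimated survival falls to 0.5 or below.

```python
from collections import Counter
from fractions import Fraction


def dominant_code(genotype):
    """Dominant model for rs3200401 C > T: CC -> 0, CT or TT -> 1."""
    if genotype not in ("CC", "CT", "TT"):
        raise ValueError(f"unexpected genotype: {genotype!r}")
    return 0 if genotype == "CC" else 1


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, survival) pairs.

    `times` are follow-up times; `events` are 1 for death, 0 for censored.
    Exact fractions are used so that comparisons with 1/2 are not affected
    by floating-point rounding.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # deaths plus censorings at each time
    at_risk = len(times)
    survival = Fraction(1)
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= Fraction(at_risk - d, at_risk)
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def median_survival(curve):
    """First event time with survival <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= Fraction(1, 2):
            return t
    return None


# Hypothetical toy cohort (months, event flag); not study data.
toy_times = [5, 8, 12, 12, 20, 25]
toy_events = [1, 1, 0, 1, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
toy_mst = median_survival(toy_curve)
```

In practice the curve would be computed separately for the two groups defined by `dominant_code` and the groups compared with a log-rank test, which is omitted here for brevity.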

Patient characteristics
The demographic information and clinical features of the NSCLC patients are presented in Table 1. Among these patients, 78 early-stage and 328 advanced NSCLC patients were confirmed to have died of lung cancer by the date of the last follow-up. There were 450 males (84%) and 88 females (16%), with a median age of 66.5 years (range, . The log-rank test and univariate Cox regression showed that NSCLC patients with age > 65, advanced stage (stage III or IV), or without surgical treatment had shorter median survival time (MST) and higher death risks than their counterparts (all log-rank P < 0.05). However, smoking, chemotherapy, and radiotherapy had no significant effects on the MST and death risk of these NSCLC patients. The statistical power analysis showed that, for the SNP rs3200401 (MAF = 0.187) analyzed in this study, there was a statistical power of 0.913 to detect an association with HR = 1.4 using 398 subjects in the survival analysis.
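For readers who want a rough feel for how such a power figure depends on the number of deaths, the carrier frequency, and the target HR, the sketch below uses Schoenfeld's normal approximation for a two-group comparison. This is an assumption made for illustration: it is not the SAS procedure the authors used, it further assumes Hardy-Weinberg proportions to convert the MAF into a carrier fraction, and it is not expected to reproduce the reported value of 0.913. The function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def carrier_fraction(maf):
    """Expected fraction of minor-allele carriers (CT + TT),
    assuming Hardy-Weinberg proportions (an assumption, not from the text)."""
    if not 0 < maf < 1:
        raise ValueError("maf must be between 0 and 1")
    return 1 - (1 - maf) ** 2


def schoenfeld_power(events, exposed_fraction, hazard_ratio, alpha=0.05):
    """Approximate power of a two-sided test of HR = 1 for two groups.

    Schoenfeld approximation:
        power = Phi( sqrt(D * p * (1 - p)) * |ln HR| - z_{1 - alpha/2} )
    where D is the number of deaths and p the fraction in one group.
    """
    if events <= 0 or hazard_ratio <= 0 or not 0 < exposed_fraction < 1:
        raise ValueError("invalid input")
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    signal = sqrt(events * exposed_fraction * (1 - exposed_fraction))
    signal *= abs(log(hazard_ratio))
    return normal.cdf(signal - z_alpha)


# Inputs taken from the text: MAF = 0.187, HR = 1.4, and 328 deaths among
# the advanced NSCLC patients. Using the death count rather than the 398
# subjects is a choice made for this sketch.
p_carrier = carrier_fraction(0.187)
approx_power = schoenfeld_power(events=328, exposed_fraction=p_carrier,
                                hazard_ratio=1.4)
```

The approximation makes explicit that power is driven by the number of deaths rather than the number of enrolled subjects, which is why heavily censored cohorts lose power.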

Association of genetic variant rs3200401 with survival outcome among NSCLC patients
The associations between lncRNA MALAT1 rs3200401 genotypes and the survival outcome of early-stage NSCLC patients are shown in Table 2. No significant associations were found between rs3200401 genotypes and the MST or death risk of early NSCLC patients, in either adenocarcinoma or squamous cell carcinoma (Table 2).
We further analyzed the associations of MALAT1 rs3200401 with survival outcomes among advanced patients with lung adenocarcinoma and squamous cell carcinoma separately (Table 3). Compared to advanced  (Fig. 1a). However, this effect was not seen among advanced lung squamous cell carcinoma patients (Fig. 1b).

Stratified analysis for advanced NSCLC patients
We then stratified the advanced lung adenocarcinoma and squamous cell carcinoma patients by age, sex, smoking, TNM stage, surgical operation, and chemotherapy or radiotherapy (Table 4). We found that the rs3200401 CT + TT genotype was associated with a decreased death risk with borderline significance among lung adenocarcinoma patients with age

Discussion
In this follow-up, case-only survival analysis, we investigated the association of the genetic variant rs3200401 in lncRNA MALAT1 with the survival outcome of NSCLC patients. We found that among advanced lung adenocarcinoma patients, those carrying the MALAT1 rs3200401 CT or CT + TT genotypes had significantly longer survival time and decreased death risks than those carrying the rs3200401 CC genotype. This finding suggests that the rs3200401 C > T variant of MALAT1 might be a potential prognostic biomarker for predicting the survival of advanced lung adenocarcinoma patients.
Metastasis is the major cause of death from lung cancer [27], and MALAT1 was significantly associated with metastasis in early-stage NSCLC patients [9]. In vitro cell studies and in vivo animal studies have revealed that the expression level of lncRNA MALAT1 is related to cell migration potential and tumor growth. Increased MALAT1 expression in tumor tissues of NSCLC patients is associated with unfavorable overall survival [19], and high expression of MALAT1 in tumor tissues was also found to be associated with an increased risk of metastasis and poor overall survival in colorectal cancer [21], pancreatic cancer [22], glioma [23], and clear cell renal cell carcinoma [24]. MALAT1 can induce metastasis through various mechanisms. In vitro siRNA-mediated MALAT1 silencing impaired lung cancer cell motility by altering the expression levels of cell motility-related genes, such as HMMR at the pre-mRNA transcriptional level and CTHRC1, CCT4 and ROD1 at the post-transcriptional level [15]. Inhibition of MALAT1 was seen to have an anti-proliferative effect and to control the phenotypic switch in endothelial cells, indicating that MALAT1 may regulate angiogenesis and thereby contribute to metastasis [17,28]. Further studies suggested that upregulated MALAT1 can induce an epithelial-to-mesenchymal transition (EMT) and bladder cancer cell migration [29] and promote brain metastasis [30].
Although MALAT1 expression in lung cancer tissue was reported to be associated with poor prognosis in lung squamous cell carcinoma [19], our findings indicated that the SNP rs3200401 was not associated with the survival outcome of lung squamous cell carcinoma patients. Ji et al. found that the association of MALAT1 with metastasis of NSCLC differed among histological subtypes: MALAT1 expression in metastatic lung adenocarcinoma was several fold higher than in non-metastatic adenocarcinoma, but no significant differences were found between metastatic and non-metastatic lung squamous cell carcinoma patients [9]. We obtained the clinical data and MALAT1 gene expression data from The Cancer Genome Atlas (TCGA) using cBioPortal [31]. A total of 654 white patients with primary lung cancer (359 lung adenocarcinoma and 295 lung squamous cell carcinoma) and available MALAT1 expression data were included. Among them, 117 patients were diagnosed at an advanced stage. The demographic data for these advanced NSCLC patients are presented in Additional file 1: Table S2. We found significantly higher MALAT1 expression levels in lung adenocarcinoma tissues than in lung squamous cell carcinoma tissues (P < 0.001, Fig. 2a). Among the 75 patients with advanced lung adenocarcinoma, higher expression levels of MALAT1 were seen in tissues from M1 or Mx patients than in those from M0 patients (P = 0.049). However, this difference was not seen in the 42 patients with advanced lung squamous cell carcinoma (Fig. 2b). Considering the different characteristics of these two major subtypes of NSCLC, the lncRNA MALAT1 may induce tumor metastasis through different mechanisms in adenocarcinoma and squamous cell carcinoma.
Regarding the biological function of lncRNA MALAT1, it was found to localize to nuclear speckles and interact with serine/arginine-rich (SR) proteins such as serine/arginine-rich splicing factor 1 (SRSF1), SRSF2 (SC35) and RNA-binding protein with serine-rich domain 1 (RNPS1), which control alternative splicing of pre-mRNA, a transcriptional level regulation of gene expression [10,13][32][33][34][35]. Numerous studies have reported that genetic polymorphisms in certain genes may affect susceptibility to lung cancer [36,37], sensitivity to chemotherapy or radiotherapy [38][39][40], and length of survival or prognosis [41][42][43][44]. Liu et al. found a borderline significant association between rs619586 in MALAT1 and decreased hepatocellular carcinoma risk [45]. Another study by Gong et al. did not find any association between rs619586 genotype and lung cancer risk, but patients with the rs619586 A allele were more likely to respond to platinum-based chemotherapy [40]. In this study, we did not investigate rs619586 of MALAT1 because the dbSNP database suggested that it is a low-frequency SNP. Further studies in larger populations could investigate the low-frequency SNPs of MALAT1 with higher detection power, and the biological functions of the positive variants of MALAT1 also warrant further in-depth investigation.
The interaction between a lncRNA and other molecules is probably determined by its structure rather than by its sequence. A polymorphism within a lncRNA sequence may exert its function through alternative splicing of the transcript or a change in lncRNA secondary structure, resulting in gain or loss of function [46]. The SNP rs3200401 C > T variant is located in region M of MALAT1 (6008-7011 nt), which is one of the binding sites for SRSF2 [34]. We used the lncRNASNP database to predict potential functions of this SNP [47], such as structure change and miRNA-lncRNA interaction. We found that the C > T variation of rs3200401 caused a 1.62 kcal/mol change in minimal free energy (MFE, ΔG), which may alter structural features of MALAT1 (Additional file 2: Figure S1), resulting in a weakened interaction between MALAT1 and its binding protein SRSF2. MALAT1 can modulate phosphorylation of SRSF2, interact with SR proteins as a "molecular sponge" and influence their stability, and regulate the alternative splicing of pre-mRNAs [10,13]. SRSF2 and phosphorylated SRSF2 were reported to correlate with aggressive features of lung adenocarcinoma but not of lung squamous cell carcinoma [48]. It is biologically plausible that the SNP rs3200401 C > T variant may cause loss of MALAT1-SRSF2 binding, down-regulate phosphorylation of SRSF2, change the alternative splicing of pre-mRNAs, and then alter the expression levels of metastasis-associated genes. These effects may result in less aggressive features and better survival for lung adenocarcinoma but not for squamous cell carcinoma patients.
Although this is the first study to describe a lncRNA SNP and lung adenocarcinoma survival, some limitations should be taken into consideration. First, for practical reasons, this study used a single-institution cohort to investigate the association between the MALAT1 variant and the survival outcome of lung cancer patients. It would be ideal to have a multi-center replication to validate our findings. Without such replication, our findings should be considered preliminary. Second, to obtain adequate statistical power with this moderately sized cohort of lung cancer patients, we chose only the SNP with MAF > 0.1 (rs3200401) for investigation in the present study. The other low-frequency SNPs in MALAT1 also need to be investigated in larger populations. Finally, the TCGA database unfortunately contains no rs3200401 genotype data, only the expression of MALAT1 in lung tumor tissues. To provide a clue for further interpretation of the biological function of rs3200401, we still analyzed the TCGA data and found that a higher MALAT1 expression level was associated with a worse survival outcome among advanced lung adenocarcinoma patients. However, the biological function of this SNP and its effect on MALAT1 expression level need to be investigated in depth by further biological studies.
Fig. 2 a. Expression levels of MALAT1 in lung adenocarcinoma and squamous cell carcinoma tissues; b. Differences in tissues from M1 or Mx versus M0 patients

Conclusions
In conclusion, this study revealed that the rs3200401 T allele located on lncRNA MALAT1 was associated with better survival of advanced lung adenocarcinoma patients, while this effect was not seen among lung squamous cell carcinoma patients. The protective effect of the rs3200401 T allele may arise because it can influence the secondary structure of lncRNA MALAT1 or the interaction between MALAT1 and SR proteins, thereby altering the expression levels of metastasis-associated genes. The MALAT1 rs3200401 T allele may serve as a novel biomarker for predicting clinical outcomes of lung adenocarcinoma. Further large population-based survival analyses and mechanistic studies are required to confirm our findings.

Additional file
Additional file 1: Table S1. The SNPs located on the lncRNA MALAT1 gene (data source: the dbSNP database). Table S2. Demographic and clinical characteristics of 117 advanced NSCLC patients from TCGA. (DOCX 24.8 kb)
Additional file 2: Figure S1. Predicted secondary structures of lncRNA MALAT1. The C > T variation of rs3200401 caused a 1.62 kcal/mol change in minimal free energy (MFE, ΔG), which may alter structural features of MALAT1, resulting in a weakened interaction between MALAT1 and its binding protein SRSF2. (TIF 215 kb)

Abbreviations
lncRNA: Long non-coding RNA; MALAT1: Metastasis associated lung adenocarcinoma transcript 1; NSCLC: Non-small cell lung carcinoma; SNP: Single nucleotide polymorphism