A TMEFF2-regulated cell cycle derived gene signature is prognostic of recurrence risk in prostate cancer

Background The clinical behavior of prostate cancer (PCa) is variable, and while the majority of cases remain indolent, 10% of patients progress to deadly forms of the disease. Current clinical predictors used at the time of diagnosis have limitations to accurately establish progression risk. Here we describe the development of a tumor suppressor regulated, cell-cycle gene expression based prognostic signature for PCa, and validate its independent contribution to risk stratification in several radical prostatectomy (RP) patient cohorts. Methods We used RNA interference experiments in PCa cell lines to identify a gene expression based gene signature associated with Tmeff2, an androgen regulated, tumor suppressor gene whose expression shows remarkable heterogeneity in PCa. Gene expression was confirmed by qRT-PCR. Correlation of the signature with disease outcome (time to recurrence) was retrospectively evaluated in four geographically different cohorts of patients that underwent RP (834 samples), using multivariate logistical regression analysis. Multivariate analyses were adjusted for standard clinicopathological variables. Performance of the signature was compared to previously described gene expression based signatures using the SigCheck software. Results Low levels of TMEFF2 mRNA significantly (p < 0.0001) correlated with reduced disease-free survival (DFS) in patients from the Memorial Sloan Kettering Cancer Center (MSKCC) dataset. We identified a panel of 11 TMEFF2 regulated cell cycle related genes (TMCC11), with strong prognostic value. TMCC11 expression was significantly associated with time to recurrence after prostatectomy in four geographically different patient cohorts (2.9 ≤ HR ≥ 4.1; p ≤ 0.002), served as an independent indicator of poor prognosis in the four RP cohorts (1.96 ≤ HR ≥ 4.28; p ≤ 0.032) and improved the prognostic value of standard clinicopathological markers. The prognostic ability of TMCC11 panel exceeded previously published oncogenic gene signatures (p = 0.00017). Conclusions This study provides evidence that the TMCC11 gene signature is a robust independent prognostic marker for PCa, reveals the value of using highly heterogeneously expressed genes, like Tmeff2, as guides to discover prognostic indicators, and suggests the possibility that low Tmeff2 expression marks a distinct subclass of PCa. Electronic supplementary material The online version of this article (10.1186/s12885-019-5592-6) contains supplementary material, which is available to authorized users.


Background
Cancer of the prostate (PCa) is the second leading cause of cancer death in male Americans. The clinical behavior of PCa is variable, and while the majority of PCa cases remain indolent, 10% of patients progress with aggressive metastatic disease and subsequent emergence of therapy-resistant PCa [1,2]. In current practice, clinical variables including Gleason score, tumor stage, and PSA levels are used at the time of diagnosis to predict disease outcome [3,4]. However, these prognostic factors have limitations, resulting in significant rates of overtreatment, with associated comorbidities [5][6][7], and undertreatment, leading to disease progression and increased risk of PCa-specific mortality [8][9][10].
The clinical heterogeneity of PCa reflects, in part, a remarkable genomic heterogeneity [11][12][13][14][15][16][17][18]. This suggests that disease stratification based on molecular features may be of prognostic value beyond standard clinicopathological variables, and aid in the clinical management of the disease, as is the case for other cancers, i.e. breast [19][20][21]. Currently, several tissue-based molecular tests offer prognostic information for patients with PCa either before or after treatment. These are based on general features of malignancy, such as the Prolaris test (initially described by Cuzick et al. [22]), which incorporates information from 31 cell-cycle related genes, or on molecular features more specific for PCa (Decipher, Oncotype DX, ProMark, and Con-firmMDx tests [23][24][25][26][27]). In addition, recent work has outlined the existence of several molecular subtypes of PCa [28][29][30][31]. Notably, in one of these studies, the molecular subtypes were defined by specific driver mutations or gene fusions that are essentially mutually exclusive and that are able to categorize up to 74% of the analyzed tumors [32]. If shown to correlate with clinical behavior, these molecular subtypes could prove critical for the management and treatment of the disease. However, currently their prognostic value is not fully established, and a significant fraction of primary prostate cancers in the study could not be categorized within these molecular subsets, suggesting the existence of additional relevant molecular alterations.
High levels of variability in gene expression between tumors can be useful in identifying prostate and other cancers' risk genes [33]. We hypothesized that molecular subtypes of primary prostate cancers may exist that have gene expression patterns associated with changes in the expression of these highly variable genes. A recent report lists TMEFF2 as one of the top 100 mRNA transcripts with the highest levels of inter-tumor variability in primary PCa tissues [34]. TMEFF2 is an androgen regulated transmembrane protein mainly restricted to brain and prostate. Our studies in PCa demonstrate a role of TMEFF2 as a tumor suppressor [35][36][37][38]. Furthermore, studies using limited numbers of clinical samples, reveal changes in the expression of Tmeff2 with disease stage in PCa [39,40] and gliomas [41], supporting an important role of Tmeff2 in these diseases.
We have investigated the expression pattern of TMEFF2 in human prostate tissues and explored the potential of a TMEFF2 associated gene signature as a biomarker for disease prognosis. We report that low TMEFF2 mRNA expression is associated with decreased disease free survival (DFS) in the MSKCC PCa dataset. Using transcriptional profiling of cell lines and publically available PCa clinical data, we have identified a low TMEFF2 driven gene signature associated with poor clinical outcome, comprised of cell cycle related genes. This study not only provides new insights into the clinical relevance of Tmeff2 in cancer, but also specifies a group of cell cycle related genes as prognostic and potential therapeutic targets.

TMEFF2 expression data
TMEFF2 mRNA expression in benign and malignant samples of PCa was interrogated using Oncomine Compendium of Expression Array data [42] [46]).

Validation cohorts
Four prostate cancer cohorts were used in this study to establish the prognostic value of the TMCC11 signature: MSKCC [46] (GSE21032); Cambridge [34] (GSE70768) and Stockholm [34] (GSE70769), are microarray datasets, and the TCGA PRAD (https://gdc.cancer.gov), a RNA sequencing cohort. Cancer samples for all cohorts were from RP specimens. Biochemical recurrence (MSCKK, Cambridge and Stockholm) or recurrence/progression (TCGA-PRAD) was the follow-up endpoint. Clinical, histopathological data and summary of the cohorts are listed in Table 1 and Additional file 1: Table S1.

RNA extraction and RNA-Seq
LNCaP cell expressing sh_TMEFF2 or the sh_scramble control were grown for 14 days after transduction and then 24 h in hormone-depleted media before stimulation with 10 nM DHT (or ethanol as vehicle control) for 24 h prior to harvesting for RNA extraction. Three biological replicates per sample were used. Total RNA was extracted with RNeasy mini kit (Qiagen, Waltham, MA) and cDNA was synthesized with SuperScript III First-Strand synthesis system (Life Technologies Inc., Carlsbad, CA). RNA integrity and quantity was assessed using the Agilent Bioanalyzer (Agilent Technologies, Santa Clara, CA). Raw 75 bp paired-end sequences were generated from an Illumina NextSeq 500 sequencer (Illumina, San Diego, CA). Sequenced reads first underwent quality control with the FASTQC tool and then aligned to a contaminant genome to filter out reads which align to human ribosomal RNA, poly-A, poly-C, phiX virus or mitochondrial DNA sequence. The filtered reads were trimmed using Trimmomatic [47], as well as read clipping based on quality over a sliding window, retaining reads with a minimum length of 15 bp. Trimmed, filtered reads were pseudoaligned to the GRCh38 human reference transcriptome using kallisto version 0.42.3 [48], with enabled bias correction and 50 bootstrapping rounds. Expression values for 173,259 unique transcripts were measured and transcripts with an average of 5 count per million (CPM) or less across all samples were removed from further analysis. To perform

TMCC11 signature selection process
From the initial group of 25 genes nuclear genes selected as significantly upregulated (Log2 fold change ≥1.8, ≤3.1; FDR < 0.05) by DHT in the LNCaP-TMEFF2 knockdown cells, we chose the 21 top-ranking upregulated genes (Log2 fold change ≥2.0) (Additional file 1: Figure S3). We interrogated this 21 gene subset in the MSKCC dataset (n = 150) in cBioPortal [51,52] and selected those genes (n = 11; TMCC11) whose expression was upregulated in at least 4 of those patients with low TMEFF2 mRNA expression, and that maintain a strong functional association as demonstrated using STRING [53] and IPA pathway analyses (Additional file 1: Figure S4). Two other signatures were used for SigCheck analysis. TMCC13 is a modified TMCC11 signature including two additional genes, E2F7 and GSG2 (from the TMEFF2 21 top-ranking upregulated genes; Additional file 1: Figure  S3), selected based on their individual prognostic values and lack of overlap with genes from the Cuzick [22] signature. TMCC3 consist of the CDC45, NCAPG and CLSPN genes and was selected from TMCC11 as the optimal subset in predicting time to BCR in the Stockholm dataset. For this purpose, the dependence of time to BCR on the signature gene expression was modeled using GLM cox regression, and the search for the best subset relied on elastic net regularization, a standard features selection procedure implemented in the R package glmnet.

TMCC11 signature score development
Patients were divided in two categories (high and low) based on the TMCC11 gene signature, by calculating the mean expression over all the genes in the signature for each sample. The distribution for the population was calculated, and samples were included in the high group when their mean fell within the upper tertile (above the 67th percentile) and in the low group when below the 67th percentile.

Databases and statistics
Databases/platform used during this study: cBioportal [51,52], Oncomine [42], the R2 genomic analysis and visualization platform (http://r2.amc.nl); the STRING database [53]; and SurvExpress [54]. The parameters used are referenced in the corresponding figure legends if applicable. For publicly available microarray or RNA-Seq expression data sets, the normalized expression data was downloaded from the Oncomine, cBioportal or R2 databases. Hierarchical clustering of the TMCC11 signature genes (Euclidean distance with average linkage on zscore transformed expression values) on samples from the MSKCC dataset was performed in R2.
Data analysis were performed by non-parametric Wilcoxon multiple comparison test or Student t-test as indicated in figure legends. Statistical significance was defined as P < 0.05 unless otherwise stated. Time-to-event outcomes were evaluated using Kaplan-Meyer analysis and survival-time differences were compared using the log-rank test. Uni-, multi-variate and C-statistics were used to assess the independent effect of biomarker status on clinical outcome. Univariate hazard ratios and p-values were obtained using the Cox proportional hazard model. Multivariate analysis was performed using the Cox proportional hazard model. A stepwise model selection procedure coupled with Cox proportional hazard model was used to define the final model. The Harrell's method was used to compute the concordance statistics. Covariates included in the multivariate models were: biopsy and/or surgical gleason score, PSA, pathological T-stage, positive surgical margins and/or extracapsular extension. Covariates were adjusted as follow: Gleason -High (≥4 + 3): Low (≤3 + 4); PSA -High (≥10):Low(< 10); Path Stage -High(≥T3):Low(≤T2); Positive surgical margins -Y:N; Extracapsular extension (ECE) -Y:N. These analyses were conducted using SAS 9.4 and a p-value of less than 0.05 or 0.01 if indicated, was deemed statistically significant.

Gene signature analysis with SigCheck
We analyzed the prognostic potential and specificity of the TMCC11 signature using the Bioconductor package SigCheck [55]. This software allows comparison of a gene signature prognostic performance against random and known gene signatures. In a first analysis, we compared the prognostic power of the TMCC11 gene signature and 253 oncogenic signatures available from the literature. The prognostic power of a gene signature was quantified by the log-rank test p-value for the difference between the time to BCR in high versus low risk groups according to overall signature gene expression. Mean expression over all the genes in the signature for each sample was computed, and high versus low expression was considered as over or below the 67th percentile respectively. Log-rank P-values for each signature were computed using the Stockholm ( [34], GSE70769), Cambridge ( [34], GSE70768) and MSKCC ( [46], GSE21034) datasets downloaded from the GEO website. In a second analysis, we comparatively assessed the superiority of the TMCC11 and the other 253 oncogenic signatures against randomly constructed predictors. For each signature under study, 10,000 signatures of the same number of genes were selected at random and for each log-rank p-value scores of their predictive power were computed as described above. A bootstrap p-value was then determined as the proportion of random gene signatures scoring better than the original gene signature. Stockholm, Cambridge and MSKCC datasets were also used for this analysis. The code for the analysis is available upon request. See Additonal file 1 for supplementary Methods.

Results
Low expression of TMEFF2 is associated with advanced disease and is prognostic of clinical outcome The previously described cell growth inhibitory function of TMEFF2 in PCa [35][36][37] led us to determine the relationship of Tmeff2 expression alterations to the clinicopathologic features of PCa. We first analyzed tumor associated changes in TMEFF2 expression by immunohistochemistry in PCa tissues (Additional file 1: Figure S1A). TMEFF2 protein expression was higher in patients with localized disease as compared to non-tumor samples (not shown). However, when patients were stratified by tumor stage, TMEFF2 expression was significantly decreased in more advanced pathological stages (Additional file 1: Figure S1B).
We then used Oncomine [42] to examine alterations of TMEFF2 mRNA expression in publically available samples from PCa patients. Expression of TMEFF2 mRNA is significantly increased in the primary tumors of patients with PCa when compared to normal tissue, in multiple independent datasets (Fig. 1a). However, in samples from metastases and castration resistant prostate cancer (CRPC), the levels of TMEFF2 mRNA are either unchanged or decreased compared to normal prostate, and significantly decreased (P < 0.05) when compared to primary tumors (Fig. 1a). These data suggest a negative correlation between TMEFF2 mRNA expression and progression to the advanced stages of PCa.
Based on these observations, we analyzed the prognostic value of TMEFF2 mRNA expression in the MSKCC dataset ( [46]; Table 1), a publically available human PCa dataset with clinical outcome data. Kaplan-Meier analysis demonstrated a significant (p < 0.0001) correlation between TMEFF2 levels and disease progression (assessed by biochemical recurrence, BCR). Patients with the lowest TMEFF2 mRNA expression had faster BCR (20 vs. 110 months; Fig. 1b). These findings underscore the clinical significance of Tmeff2 in cancer.
TMEFF2 silencing in the LNCaP cell line increases androgen-driven expression of a group of cell-cycle related genes TMEFF2 is one of the top 100 mRNA transcripts with the highest levels of inter-tumor variability in patient samples from several publically available datasets ( [34] and Additional file 1: Table S1). Such heterogeneity and the fact that low TMEFF2 mRNA expression correlates with advanced disease, suggest that it may define a molecular signature with prognostic value. To begin understanding the molecular consequences of decreased TMEFF2 expression and its potential to define a prognostic gene signature, we conducted TMEFF2-targeted RNA interference experiments. Using shRNA, we silenced expression of TMEFF2 in LNCaP cells (Fig. 2a and Additional file 1: Figures S2A and S2B), a PCa cell line that expresses high levels of TMEFF2 mRNA and protein. Using RNA-Seq, we identified a group of 25 nuclear genes that were moderately but significantly upregulated by DHT in the context of TMEFF2 silencing (Log2 fold change ≥1.8, ≤3.1; FDR < 0.05), as compared to control cells (transduced with scramble shRNA; Additional file 1: Figure S2C). STRING pathway analysis [53] suggests that most of these genes are functionally associated (Additional file 1: Figure S2D) and belong to the DNA replication and cell cycle gene ontology categories. All together these results suggest that TMEFF2 silencing alters expression of androgen receptor (AR) targets, andthat previously reported TMEFF2 effects on growth [37] may be driven, in part, by TMEFF2-modulated AR-mediated expression of genes involved in cell cycle related processes (Additional file 1: Supplementary Discussion).
Out of the initial group of genes, we selected 11 (see Methods and Additional file 1: Figsure S3A and S3B) referred to as the "TMEFF2 modulated cell cycle 11 (TMCC11)" gene signature. qRT-PCR analysis in LNCaP cells confirmed that DHT mediated induction of the TMCC11 genes was significantly increased in LNCaP cells in which TMEFF2 expression was low compared to control cells (Fig. 2b). High expression of these genes with low TMEFF2 expression was also seen in patients' samples from the MSKCC dataset (Additional file 1: Figure S3C). Clustering analysis of the TMCC11 signature genes in the MSKCC dataset indicate that expression of these genes is highly correlated (Fig. 2c). These 11 genes are all tightly related to cell-cycle and DNA replication and repair processes (Additional file 1: Figure S3B). Moreover, silencing of TMEFF2 in PCa cells affects cell cycle progression (Additional file 1: Figure S4) supporting the role of TMEFF2 in modulating expression of cell-cycle related genes (see also Additional file 1: Supplementary Discussion).
In clinical samples from the Grasso [45] and MSKCC [46] datasets, the expression of the individual genes from the TMCC11 signature is significantly increased in CRPC and metastatic disease samples when compared to normal tissue, and inversely correlated with the A B Fig. 1 Low expression of TMEFF2 mRNA is associated with advanced disease and clinical outcome. a Scatter plot showing TMEFF2 mRNA expression levels in normal, primary and metastatic/CRPC tissue from patients from different prostate cancer cohorts. Expression levels were obtained from Oncomine and compared using a Wilcoxon multiple comparison test. b Kaplan-Meier analysis of biochemical-relapse free survival for patients from the MSKCC prostate cohort stratified by TMEFF2 mRNA expression (n = 11; lowest expression in the cohort) vs. the rest of the patients. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001 expression of TMEFF2 in the same samples (Additional file 1: Figure S5A and S5B). In addition, mRNA coexpression analysis using the PCa MSKCC and PRAD TCGA datasets indicates that these genes are significantly co-expressed (Additional file 1: Figure S6).
The TMEFF2-modulated gene signature is an independent marker of recurrence after prostatectomy in multiple clinical datasets Based on the results suggesting that loss of TMEFF2 often predates aggressive/metastatic disease, we postulated that the TMEFF2-modulated TMCC11 gene signature could have prognostic value. We evaluated this hypothesis using BCR as the clinical endpoint in the PCa MSKCC dataset [46] (Table 1 and Additional file 1: Table S2 and Figure S7 provide information on the samples). The MSKCC dataset includes a number of prostatectomy samples from patients with wide range of times to BCR as measured by increased levels of PSA. Individually, increased expression of each of the genes comprising TMCC11 was statistically significant (P < 0.01) in predicting BCR (Additional file 1: Table S3; for CLSPN p = 0.0137). In Kaplan-Meier analyses, high expression of the TMCC11 signature was associated with a median time to progression of 55.39 months vs. greater than 150 months for patients with low expression of TMCC11 (log-rank P value = 1.11e-05; Fig. 3a). These results indicate that the TMCC11 signature is a powerful predictor of aggressive PCa, segregating the   a Western Blot analysis to determine knockdown of TMEFF2 in LNCaP cells using three different TMEFF2 targeted shRNAs. Only sh_TMEFF2-1 and sh_TMEFF2-2 appreciably silenced TMEFF2 expression. Note that Tmeff2 is an androgen-regulated gene. Representative blot from > 3 repeats. b qRT-PCR data in the LNCaP-sh_TMEFF2 cells confirming increased expression in response to androgen stimulation of the cell cycle genes selected for the TMCC11 signature. Data is the average of 3 independent repeats and was analyzed using T-test. Error bars correspond to s.e.m. c Clustering analysis of TMCC11 signature genes in the MSKCC cohort. Each column corresponds to an individual patient. The status of some clinicopathological variables for each sample has been included in the figure at the top of the heatmap. *P < 0.05; **P < 0.01 tumors into high and low-risk groups based on time to BCR. We obtained similar results using the SurvExpress [54] database for analysis (Additional file 1: Figure S8).
In Cox regression analyses, TMCC11 was a significant prognostic variable (p < 0.001) with a hazard ratio (HR) of 4.1 ( Table 2). In multivariate analysis, and a model constructed using a forward stepwise selection process coupled with Cox proportional hazard, TMCC11 remained a significant prognostic variable with a HR of 2.27 and 2.35 respectively ( Table 2). The final model also selected pathological T-score and surgical Gleason score as significant predictors of BCR (Table 2).
We validated the prognostic findings in additional independent publically available datasets (see Table 1, Additional file 1: Table S2 and Fig. S7 for descriptions). Kaplan-Meier analysis of relapse free survival demonstrated that TMCC11 was a significant (log-rank p = 5.75e-04, p = 1.52e-04 and p = P = 1.01e-07) predictor of outcome in the Cambridge (CAM; n = 112; [34]), Stockholm (STO; n = 92; [34]) and PRAD TCGA (n = 490) cohorts, segregating patients with better/worse prognosis based on disease recurrence data over 60, 100 and 180 months respectively (Fig. 3b-d). Results using multivariate Cox regression analysis including expression level of the TMCC11 signature and several clinical variables, demonstrate that the TMCC11 signature is an independent predictor of recurrence after prostatectomy in these datasets (Table 2). Taken together, these data suggest that the TMCC11 signature is prognostic for risk of disease recurrence after radical prostatectomy, and has an added benefit in the context of standard clinical variables in several independent datasets.
The prognostic value of the TMCC11 signature was further evident using C-statistics (Additional file 1: Table S4). The TMCC11 signature was a significant predictor across all datasets. In the TCGA-PRAD, it performed better (C-index, 0.64; confidence interval, 0.58-0.70; p < 0.001) than Gleason (C-index, 0.62; confidence interval, 0.58-0.67; p < 0.001) or pathological score (C-index, 0.61; confidence interval, 0.57-0.66; p < 0.001). Moreover, in all the datasets, the TMCC11 signature significantly improved prognostic ability when combined with other A B C D  Table S4). The persistence of the interaction terms as significant effects proves that the TMCC11 predictive effectiveness might vary with the levels of the other clinical variables. In selected patients from the MSKCC and TCGA-PRAD datasets with high pathological T (≥ T3) or Gleason (≥ 4 + 3) scores, high TMCC11 significantly stratified men at risk for disease recurrence/progression (Additional file 1: Figures S9 and S10). TMCC11 provides prognostic information in high-risk patients beyond that provided by established clinicopathological prognostic features as demonstrated using multivariate analysis (Additional file 1: Tables in Figures S9 and S10). These results suggest that TMCC11 has prognostic value in men with high-grade tumors, after RP. TMCC11 failed to stratify patients with low surgical Gleason score, however, preliminary data using the MSKCC [46] and Stockholm [34] datasets indicate that TMCC11 can stratify patients presenting with low biopsy Gleason score, suggesting that the signature may be informative for PCa management after a positive biopsy (Additional file 1: Figure S11).

Prognostic assessment of the TMCC11 gene signature
Several gene signatures have prognostic capabilities in PCa. We therefore conducted additional tests to determine the value of the TMCC11 signature when compared to other signatures, using the Bioconductor package SigCheck [55]. This software allows comparison of a gene signature's prognostic performance against random and known gene signatures. Initially, we analyzed the prognostic power (based on time to recurrence) of TMCC11 and other previously identified oncogenic signatures: 6 signatures for PCa [22,25,34,[56][57][58], 189 oncogenic signatures from multiple cancer types in MSigDB, and 48 breast oncogenic signatures (compiled in [59]) (n = 243, Table 3 and Additional file 1: Table S5). TMCC11 outperformed most signatures (Additional file 1: Table S5). Considering just the 6 PCa gene signatures, only the Cuzick (n = 31) signature achieved comparable performance to the TMCC11 across the three datasets for identifying patients with shorter time to biochemical relapse, and the performance depended on the dataset utilized (Table 3). Of note, 5 genes within the Cuzick set overlap with the TMCC11 set. We obtained similar results using   Table S5). TMCC13 is a modified form of TMCC11 including two additional genes, E2F7 and GSG2, while TMCC3 consisted of only 3 genes from the TMCC11 signature that do not overlap with the Cuzick signature. These results underscore the independent prognostic value of the genes included in the TMCC11 signature. We then analyzed the performance of the oncogenic signatures against 10,000 signatures consisting of the same number of genes (for the specified signature) selected at random (Tables 3 and Additional file 1: Table S6). The TMCC11 signature performed in the 97th and 99th percentiles, with only 3, 1.2 and 0.18% of the random signatures demonstrating an equal or smaller p-value (empirical p-values of p = 0.0305, p = 0.012 and p = 0.0018) in predicting relapse in the Stockholm, Cambridge and MSKCC datasets respectively. Considering the PCa signatures, only the Cuzick (n = 31) signature achieved comparable performance to the TMCC11 across the three datasets (Table 3). TMCC11, TMCC13 and TMCC3 outperformed most of the oncogenic signatures described above (n = 243), when tested against random signatures (Additional file 1: Table S6).

Discussion
Here, we have identified an 11-gene prognostic signature (TMCC11) for PCa progression consisting of genes associated with cell-cycle and DNA damage response. The prognostic value of this signature was confirmed on several publically available cohorts totaling 834 samples from geographically different cohorts of patients that underwent RP. TMCC11 is an independent predictor of biochemical recurrence after RP and added significant prognostic value to standard clinicopathological variables. In multivariate analysis TMCC11 was the only variable consistently predictive of disease recurrence in all of the datasets, and it significantly increased risk prediction over other clinical variables and when combined with other variables (Table 2 and Additional file 1: Table S4). Moreover, in subsets of patients with high Gleason or pathological scores, the TMCC11 signature provided a statistically significant stratification of patients identifying high and low risk groups for disease recurrence, and preliminary data suggests that TMCC11 can stratify patients that present with low biopsy or pre-operative Gleason scores. All together, these results suggest that TMCC11 may provide relevant prognostic information in several clinical scenarios and have an impact not only on the decision of whether to provide adjuvant therapy after RP, but also on treatment management after a positive biopsy.
Genomic and transcriptomic analyses have provided insight into the complexity of prostate tumors and the existence of molecular subtypes. However, the clinical applicability of these classifications has been thwarted, due in part to the highly heterogeneous nature of PCa and the difficulty of identifying additional relevant alterations that occur at low frequencies [11][12][13][14][15][16][17][18] [60]. We hypothesized that heterogeneously expressed genes can expose unidentified molecular subclasses of PCa and/or identify translationally relevant gene sets. Expression of Tmeff2, an androgen regulated gene, is highly variable across several different PCa datasets ( [34], Additional file 1: Table S1). Low TMEFF2 mRNA expression significantly associated with shorter time to post-RP BCR. Although the prognostic value of low TMEFF2 mRNA levels is uncertain, low TMEFF2 mRNA correlates with: 1) increased androgen response of the cell cycle genes that define the TMCC11 signature in cell lines; and 2) increased mRNA levels of the same genes in samples from clinical datasets (see also Additional file 1: Supplementary Discussion). Interestingly, SPINK1 also demonstrates highly variable expression across the same datasets (Additional file 1: Table S1). SPINK1 is an androgen-regulated gene highly overexpressed in approximately 10% of PCa cases [61][62][63]. While the prognostic role of SPINK1 for PCa is unclear [64], it has been suggested that pathways downstream of SPINK1 may have translational and prognostic significance [64,65]. These observations hint to highly variably expressed genes as a potential source of information with translational value. Currently several tissue-based genomic biomarkers offer prognostic information for patients with PCa either before or after treatment [23]. The Decipher™ [24], Oncotype DX® [25] and Prolaris® [22] are commercially available panels based on measurement of gene expression changes at the RNA level. The Prolaris® panel, based on the set described in Cuzick [22], examines the expression of 31 genes involved in cell cycle progression and 5 out of the 11 genes in TMCC11 are common to this panel. We observed a similar prognostic performance for the Cuzick [22] and the TMCC11 signatures when compared against random size-matched signatures. In addition, the prognostic power (based on p-value) of our signature vs. Cuzick [22] was dependent on the dataset utilized, but they were similarly informative and both behaved as strong risk predictors. While these comparisons need to be verified in independent studies, TMCC11 represents a smaller and more focused distinct gene set with potentially added value in specific patient subsets. The smaller size of the TMCC11 signature (11 genes vs. 31 of Cuzick [22]) is an advantage in clinical use since smaller signatures are more amenable to testing with reduced RNA quantities (i.e. biopsy samples) or even assayed with immunohistochemistry. In addition, TMCC3, a signature consisting of three genes selected from the TMCC11 signature, that does not overlap with the Cuzick gene set, demonstrated excellent prognostic ability in SigCheck analysis. This suggests that subsets of the TMCC11 genes can be of prognostic value. Finally, the fact that our studies have independently led to the identification of a cell-cycle based signature validates the results and points to the value of using cell cycle genes as prognostic markers in PCa. See Additonal file 1 for a supplementary Discussion.

Conclusions
Using an unconventional approach, we have identified an 11-gene signature consisting of functionally related nuclear genes with roles in DNA replication/ repair and/or cell cycle that can improve accuracy of prognosis in patients with PCa after RP in the context of current clinicopathological variables. Prognostic gene signatures containing, or based on, cell cycle gene expression changes have been identified using other approaches and different sample types. This observation not only validates our results, but also suggests that heterogeneity may lead to similar cellular consequences, providing cell cycle based signatures with rather global prognostic values. The TMCC11 signature requires further validation in multi-institutional cohorts and clinical trials. In addition, the ability of TMCC11 to provide prognostic information using biopsy samples needs to be further explored.

Additional file
Additional file 1: A TMEFF2-regulated cell cycle derived gene signature is prognostic of recurrence risk in prostate cancer. Figure S1. Expression of TMEFF2 protein in prostate cancer. Figure S2. Androgen-induction of nuclear genes is affected by TMEFF2 silencing. Figure S3. Selection of the TMEFF2 modulated cell cycle (TMCC11) gene subset. Figure S4. Effect of TMEFF2 silencing on cell cycle progression. Figure S5. The TMCC11 signature genes are highly expressed in metastatic prostate cancer and clinical CRPC. Figure S6. The genes in the TMCC11 signature are significantly co-expressed. Figure S7. Distribution of the TMCC11 signature score in patients from the different datasets used in this study. Figure S8. High expression of TMCC11 correlates with poor prognosis in the MSKCC dataset using the SurvExpress platform for analysis. Figure S9. High TMCC11 expression correlates with decreased disease-free survival in subsets of patients with high pathological or surgical Gleason score in the MSKCC dataset. Figure S10. High TMCC11 expression correlates with decreased disease-free survival in subsets of patients with high pathological or surgical Gleason score in the PRAD-TCGA dataset. Figure S11. TMCC11 stratifies patients presenting with low biopsy or pre-operative Gleason score. Supplementary Methods. Supplementary Discussion. Supplementary References. Table S1. List of the 100 most variable expressed genes in 5 different datasets. Table S2. Overview of clinical datasets used in this study with expression data. Table S3. Summary of Kaplan-Meier analysis for DFS of the individual 11 genes corresponding to the TMCC11 signature. Table S4. C-statistical analysis for time to BCR comparing the performance of TMCC11 alone or in combination with other clinical variables. Table S5. Performance of multiple oncogenic signatures on predicting relapse. Table S6. Comparison of the prognostic potential for relapse of multiple oncogenic signatures against random sets of genes.