Gene expression signatures of neuroendocrine prostate cancer and primary small cell prostatic carcinoma
BMC Cancer volume 17, Article number: 759 (2017)
Neuroendocrine prostate cancer (NEPC) may be rising in prevalence as patients with advanced prostate cancer potentially develop resistance to contemporary anti-androgen treatment through a neuroendocrine phenotype. While prior studies comparing NEPC and prostatic adenocarcinoma have identified important candidates for targeted therapy, most have relied on few NEPC patients due to disease rarity, resulting in thousands of differentially expressed genes collectively and offering an opportunity for meta-analysis. Moreover, past studies have focused on prototypical NEPC samples with classic immunohistochemistry profiles, whereas there is increasing recognition of atypical phenotypes. In the primary setting, small cell prostatic carcinoma (SCPC) is frequently admixed with adenocarcinomas that may be clonally related, and a minority of SCPCs express markers typical of prostatic adenocarcinoma while rare cases do not express neuroendocrine markers. We derived a meta-signature of prototypical high-grade NEPC, then applied it to develop a classifier of primary SCPC incorporating disease heterogeneity.
Prototypical NEPC samples from 15 patients across 6 frozen tissue microarray datasets were assessed for genes with consistent outlier expression relative to adenocarcinomas. Resulting genes were used to determine subgroups of primary SCPCs (N=16) and high-grade adenocarcinomas (N=16) profiled by exon arrays using formalin-fixed paraffin-embedded (FFPE) material from our institutional archives. A subgroup classifier was developed using differential expression for feature selection, and applied to radical prostatectomy cohorts.
Sixty-nine and 375 genes demonstrated consistent outlier expression in at least 80% and 60% of NEPC patients, respectively, with close resemblance in expression between NEPC and small cell lung cancer. Clustering by these genes generated 3 subgroups among primary samples from our institution. Nearest centroid classification based on the predominant phenotype from each subgroup (9 prototypical SCPCs, 9 prototypical adenocarcinomas, and 4 atypical SCPCs) achieved a 4.5% error rate by leave-one-out cross-validation. The classifier identified SCPC-like expression in 40% (2/5) of mixed adenocarcinomas and 0.3-0.6% of adenocarcinomas from prospective (4/2293) and retrospective (2/355) radical prostatectomy cohorts, where both SCPC-like retrospective cases subsequently developed metastases.
Meta-analysis generates a robust signature of prototypical high-grade NEPC, and may facilitate development of a primary SCPC classifier based on FFPE material with potential prognostic implications.
Neuroendocrine prostate cancer (NEPC) is a rare aggressive variant of prostate cancer comprising a spectrum of diseases emerging in different clinical settings, from de novo primary small cell prostatic carcinoma (SCPC) to treatment-related metastatic NEPC. The 2016 WHO classification of NEPC consists of adenocarcinoma with neuroendocrine differentiation (Ad+NED), well-differentiated neuroendocrine tumor, small cell neuroendocrine carcinoma (synonymous with SCPC), and large cell neuroendocrine carcinoma (LCNEC), of which the last two are particularly aggressive and referred to in this paper as high-grade NEPC. Prevalence of NEPC is anticipated to rise as patients with metastatic prostate cancer receive newer anti-androgen treatments and potentially develop resistance through a neuroendocrine phenotype.
Molecular characteristics associated with high-grade NEPC include absence of androgen receptor (AR) signaling, RB loss combined with p53 dysfunction, and reduced REST activity together with up-regulation of neuroendocrine genes [3, 4]. Diagnosis is often supported through immunohistochemistry (IHC) of corresponding proteins, with high-grade NEPC exhibiting the prototypical profile of negative AR, high Ki-67, and positive neuroendocrine markers. In the primary setting, however, IHC studies have demonstrated PSA positivity in 17-20% of SCPC and retention of other markers associated with adenocarcinomas in up to 25%, while panels of neuroendocrine markers can be entirely negative in up to 12% [5, 6]. In the metastatic setting, intermediate NEPC-like characteristics have been observed among some adenocarcinomas progressing to androgen-independence [7, 8]. Although prognostic implications of atypical features have not been formally established, rare hybrid tumors with aggressive progression have been described [9, 10].
Diagnostically, NEPC may be challenging to distinguish histologically from poorly differentiated high-grade adenocarcinoma; however, prompt recognition is important since NEPC is relatively resistant to anti-androgen treatment but initially sensitive to platinum-based chemotherapy. Comparisons of NEPC and adenocarcinomas have led to candidates for diagnostic markers or targeted therapy, such as AURKA. Studies have generally been based on few NEPC patients with classic immunophenotype and have resulted in at least 8 lists with thousands of differentially expressed genes collectively [4, 8, 11,12,13,14,15], suggesting a potential opportunity for meta-analysis. Alternatively, larger populations of NEPC tumors might be profiled by leveraging archived formalin-fixed paraffin-embedded (FFPE) diagnostic samples. Improved technology has demonstrated gene expression concordance between FFPE and fresh frozen tissue despite RNA degradation in FFPE, with an ability to detect molecular subtypes of prognostic and predictive importance [16, 17].
In this study, we first compared and assessed published NEPC gene expression studies on the level of differentially expressed gene-lists, cohort details, and gene expression signatures. Using a meta-analysis approach, we consolidated common patterns of prototypical high-grade NEPC, specifically identifying genes with consistent outlier expression among SCPC and LCNEC samples of classic immunophenotype across 6 frozen tissue microarray datasets, yielding a 69-gene model with almost indistinguishable behavior between high-grade NEPC and small cell lung cancer (SCLC). We next analyzed an FFPE exon array dataset from our institution (JHU-FFPE) profiling 16 primary SCPCs and 16 adenocarcinomas (predominantly Gleason 9), notable for inclusion of mixed cases, AR-positive SCPCs, PSA-positive SCPCs, and NE-marker negative SCPCs. Based on meta-analysis genes, we identified 3 subgroups (labeled prototypical SCPC, prototypical adenocarcinoma, and atypical SCPC) and developed a LIMMA-based 3-centroid-classifier. Although we lacked a validation set, the classifier achieved a 4.5% estimated error rate on leave-one-out cross-validation and detected SCPC-expression in 40% (2/5) of mixed adenocarcinomas and 0.3-0.6% of adenocarcinomas from radical prostatectomy (RP) cohorts, with a possible enrichment for adverse events.
NEPC gene-lists in the literature
We searched the literature for published gene-lists of differentially expressed genes between NEPC and prostatic adenocarcinoma based on expression profiling of patient tumor samples or patient-derived xenografts (Table 1) [4, 8, 11,12,13,14,15]. To compare gene-lists and identify common genes, we updated gene names and probe assignments with current HGNC symbols, and where possible, resolved un-annotated probes and non-standard transcripts through BLAT alignment of underlying sequences to hg19. For rough statistical assessment of similarity, we evaluated pair-wise overlaps of gene-sets via Fisher exact test, with a presumptive background of ~20000 genes.
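The pair-wise overlap test described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a one-sided (enrichment) Fisher exact test, which the text does not specify, and the function names and example list sizes are hypothetical. The p-value is the upper tail of the hypergeometric distribution, computed in log space so that a background of ~20000 genes does not overflow.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def overlap_pvalue(size_a, size_b, overlap, background=20000):
    """One-sided Fisher exact p-value, P(shared genes >= overlap), for two
    gene-sets of the given sizes drawn from `background` genes.
    The one-sided alternative is an assumption of this sketch."""
    log_total = log_comb(background, size_b)
    p = 0.0
    for k in range(overlap, min(size_a, size_b) + 1):
        if size_b - k > background - size_a:
            continue  # impossible table: too few genes outside set A
        p += exp(log_comb(size_a, k)
                 + log_comb(background - size_a, size_b - k)
                 - log_total)
    return min(p, 1.0)


# Hypothetical example: lists of 300 and 500 genes sharing 40 genes.
# About 7.5 shared genes would be expected by chance.
p_example = overlap_pvalue(300, 500, 40)
```

The result depends directly on the presumed background size, which is why the text describes this as a rough assessment only.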
Bioinformatic processing and analysis
We collected various datasets for meta-analysis and ancillary tests (Table 2). Microarrays were processed by RMA-based pipelines to arrive at absolute log-intensities. Gene signatures of AR signaling (ARS) (“Hieronymus up” genes), neuronal phenotype (Lapuk), and cell cycle progression (CCP) (Cuzick) were scored by average expression. LIMMA and DAVID/PANTHER were used for differential expression and gene-ontology analyses, respectively. Details are provided in Additional file 1.
For an NEPC sample, a gene was considered an outlier if its expression differed from the mean of the dataset’s adenocarcinoma cohort by more than 2 standard deviations and a log2 fold-change of 1. For adenocarcinomas, this definition was applied after first removing the evaluated sample from the adenocarcinoma cohort, although this was not possible for the smallest dataset. For each gene, the number of NEPC (or adenocarcinoma) samples with outlier up-expression or down-expression was tabulated and further summarized by patient, using fractional counts for multiple samples from the same patient. Genes with outlier status in the same direction in N or more NEPC patients were referred to as meta-N genes. NEPC and adenocarcinoma centroids were similarly calculated on the patient level through fractional weights, and used for correlation-based scoring and classification.
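The outlier definition and patient-level tabulation can be illustrated with a short sketch for a single gene in a single dataset. It makes several assumptions: both thresholds are applied as strict inequalities, expression values are on a log2 scale, and all function names and example values are hypothetical. Counts from several datasets would be summed before applying the meta-N threshold.

```python
from collections import defaultdict
from statistics import mean, stdev


def outlier_direction(value, reference, sd_cut=2.0, lfc_cut=1.0):
    """Return +1 (up), -1 (down), or 0 for one log2 expression value versus a
    reference adenocarcinoma cohort. Strict inequalities are an assumption."""
    diff = value - mean(reference)
    if abs(diff) > lfc_cut and abs(diff) > sd_cut * stdev(reference):
        return 1 if diff > 0 else -1
    return 0


def adeno_outlier_direction(i, adeno_values):
    """Evaluate adenocarcinoma sample i after removing it from its own cohort.
    Needs at least 3 adenocarcinomas so that 2 remain for the reference."""
    rest = adeno_values[:i] + adeno_values[i + 1:]
    return outlier_direction(adeno_values[i], rest)


def patient_outlier_counts(nepc_samples, adeno_values):
    """nepc_samples: list of (patient_id, log2 expression) for one gene.
    Each patient contributes a total weight of 1, split across samples."""
    calls = defaultdict(list)
    for patient, value in nepc_samples:
        calls[patient].append(outlier_direction(value, adeno_values))
    up = sum(c.count(1) / len(c) for c in calls.values())
    down = sum(c.count(-1) / len(c) for c in calls.values())
    return up, down


def is_meta_n(up, down, n):
    """A meta-N gene is an outlier in the same direction in >= n patients."""
    return max(up, down) >= n


# Hypothetical values: patient P1 has two samples, only one of them an outlier.
adeno = [8.0, 8.2, 7.9, 8.1, 8.3]
nepc = [("P1", 10.5), ("P1", 8.4), ("P2", 11.0)]
up_count, down_count = patient_outlier_counts(nepc, adeno)  # 1.5 up, 0.0 down
```

Fractional counting is what allows non-integer patient tallies, since a patient whose samples disagree contributes only part of a count.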
JHU-FFPE patient sample selection
Thirty-three FFPE samples (Table 3), diagnosed as 16 SCPCs, 16 high-grade adenocarcinomas (majority Gleason 9), and 1 adenocarcinoma with neuroendocrine differentiation, including 4 matched pairs from mixed tumors, were retrieved from surgical pathology and consultation files of Johns Hopkins Hospital from 1999-2013 after IRB approval and successfully processed for gene expression profiling with Human Exon 1.0 ST GeneChips (Affymetrix), as described in a previous study using 22 of these samples. Diagnoses were in accordance with recently proposed morphologic criteria of neuroendocrine differentiation in prostate cancer. A tissue microarray (TMA) containing 11 of the 33 samples with IHC of Rb1 and cyclin D1 was described previously, and additional IHC was performed for the prostate-related markers PSA (Ventana), AR (Ventana SP107), and Nkx3.1 (Biocare), and the neuroendocrine markers chromogranin A (Ventana LK2H10), synaptophysin (Novocastra 27G12), and CD56 (Cell Marque 123C3.D5) [1, 22].
LIMMA-based centroid models
For binary classification based on training subgroups A and B, LIMMA was used for feature selection (differentially expressed genes between A and B with adjusted p-values < 0.05), and a nearest centroid model based on A and B was developed. For ternary classification based on training subgroups A, B, and C, feature selection consisted of differentially expressed genes common to 2 or more LIMMA comparisons (A versus B, A versus C, and B versus C), and a nearest centroid model based on A, B, and C was developed. Leave-one-out cross-validation (LOOCV) with mixed pairs removed together was used to evaluate models, starting from new feature selection upon each removal.
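The evaluation scheme above can be sketched in a few lines. This is an illustration under stated assumptions rather than the study's implementation: LIMMA is not available in Python, so `select_features` is a deliberately crude stand-in (a threshold on differences between class means, not a moderated t-test with adjusted p-values), and squared Euclidean distance to each centroid is assumed as the nearness measure. What the sketch does preserve is the structure: features are re-selected on every fold, genes must be shared by 2 or more pair-wise comparisons in the ternary case, and samples from the same mixed tumor are removed together.

```python
from itertools import combinations
from statistics import mean


def centroids_by_label(samples):
    """Gene-wise mean expression vector for each training label."""
    by_label = {}
    for s in samples:
        by_label.setdefault(s["label"], []).append(s["expr"])
    return {lab: [mean(col) for col in zip(*rows)]
            for lab, rows in by_label.items()}


def select_features(cents, min_gap=1.0):
    """Stand-in for LIMMA feature selection (NOT a moderated t-test).
    Binary case: genes differing in the single comparison.
    Ternary case: genes differing in 2 or more pair-wise comparisons."""
    pairs = list(combinations(sorted(cents), 2))
    need = 1 if len(pairs) == 1 else 2
    n_genes = len(next(iter(cents.values())))
    return [g for g in range(n_genes)
            if sum(abs(cents[a][g] - cents[b][g]) > min_gap
                   for a, b in pairs) >= need]


def predict(cents, genes, expr):
    """Assign the label of the nearest centroid over the selected genes."""
    def dist(lab):
        return sum((expr[g] - cents[lab][g]) ** 2 for g in genes)
    return min(sorted(cents), key=dist)


def loocv_error(samples):
    """Leave-one-out error; every sample sharing the held-out sample's
    `group` (for example its mixed-tumor partner) is dropped from training."""
    errors = 0
    for held in samples:
        train = [s for s in samples if s["group"] != held["group"]]
        cents = centroids_by_label(train)
        genes = select_features(cents)  # re-selected on each fold
        errors += predict(cents, genes, held["expr"]) != held["label"]
    return errors / len(samples)


# Hypothetical toy data: 3 genes, one mixed tumor ("mix1") with two components.
samples = [
    {"group": "p1", "label": "SCPC", "expr": [10.0, 2.0, 5.0]},
    {"group": "p2", "label": "SCPC", "expr": [11.0, 1.0, 5.0]},
    {"group": "mix1", "label": "SCPC", "expr": [10.5, 2.0, 5.2]},
    {"group": "mix1", "label": "adeno", "expr": [2.0, 10.5, 5.0]},
    {"group": "p3", "label": "adeno", "expr": [2.0, 10.0, 5.0]},
    {"group": "p4", "label": "adeno", "expr": [1.0, 11.0, 5.1]},
]
toy_error = loocv_error(samples)
```

Repeating feature selection inside each fold matters because selecting genes once on the full dataset would leak information from the held-out sample into the model.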
Expression profiles (N=3428) of adenocarcinomas from RP specimens were retrieved from the Decipher GRID® prostate cancer database, consisting of high risk cases from clinical use of the Decipher test (NCT02609269; Prospective cohort) or from retrospective institutional studies with outcomes data (JHU-RP and Mayo cohorts) [24,25,26]. Specimen selection, RNA extraction, and Human Exon 1.0 ST Array hybridization were done in a Clinical Laboratory Improvement Amendments (CLIA/CAP/NYS)-certified laboratory facility (GenomeDx Biosciences, San Diego, CA, USA) as previously described. Normalization was performed using Single Channel Array Normalization (SCAN).
Literature NEPC gene-lists comprise thousands of genes with significant overlap but no universal genes despite a common NEPC immunophenotype and common gene signature patterns
We identified 8 gene-lists from the literature comparing gene expression of NEPCs and prostatic adenocarcinomas, based on a collective total of 29 and 114 unique patients respectively (Table 1) [4, 8, 11,12,13,14,15]. Cohort definitions varied slightly between studies, specifically regarding treatment of adenocarcinomas with NE differentiation, which were grouped with NEPCs in WCMC mCRPC, with adenocarcinomas in MDA xeno, and variably with either cohort depending on IHC status in UW mCRPC (grouped with NEPC when synaptophysin and chromogranin both positive). NEPC cohorts thus contained significant proportions of adenocarcinomas with NE differentiation for WCMC mCRPC and UW mCRPC (46% and 50% of NEPCs respectively), but otherwise consisted exclusively of SCPCs and one rare LCNEC for most gene-lists (6 of 8). IHC of annotated SCPCs and the LCNEC, when provided, was always negative for PSA (17/17 patients) and AR (10/10), always positive for synaptophysin (17/17), and usually positive for chromogranin (9/15). Thus most gene-lists, in particular the 6 of 8 based on SCPCs / LCNEC, corresponded to a classic NEPC immunophenotype and notably lacked AR-positive or PSA-positive SCPCs, which have been reported in 17-20% of primary SCPCs [5, 6].
Collectively, the 8 gene-lists consisted of 1782 up-genes and 1785 down-genes with increased and decreased expression in NEPC, respectively, including 433 (24%) and 235 (13%) common to multiple lists, although some studies were not entirely independent (Additional file 2: Table S1). No genes were common to all lists; the most frequent were 9 largely neuronal up-genes in 5/8 lists (BSN, CRMP1, GPRIN1, INA, MAST1, MYT1, RAB3C, SNAP25, UNC13A) and 5 largely androgen-related down-genes in 4/8 lists (CYP1B1, KLK2, KLK3, STEAP1, TRPV6). Gene-lists demonstrated pair-wise similarity, often related to cohort or statistical details (Additional file 2: Table S2); the study with greatest statistical power (WCMC mCRPC) generated the largest list (>2000 genes) and overlapped most with other gene-lists, while comparisons of metastatic NEPC versus primary adenocarcinoma (WCMC 2011, VPC 2012) resulted in enrichment of metastasis-associated genes (Additional file 2: Table S3).
We obtained available NEPC gene expression data corresponding to 5 of the 8 gene-lists, 3 more studies with known SCPCs (including an FFPE dataset from our institution), and 1 study (SU2C) with rare NEPCs consisting mostly (80%) of adenocarcinomas with NE differentiation (Table 2). Gene signature scores were used to assess samples (Fig. 1), similar to a recent study. Annotated SCPCs (and the LCNEC) from frozen tissue datasets almost always demonstrated a prototypical pattern of low ARS, high neuronal phenotype, and high CCP scores, in accordance with a classic NEPC immunophenotype. In xenograft and frozen tissue primary datasets, ARS and neuronal phenotype scores completely separated SCPCs / LCNEC from adenocarcinomas (AUC 100%). Annotated adenocarcinomas with NE differentiation generally demonstrated gene signature scores similar to adenocarcinomas, except possibly with slightly elevated neuronal phenotype scores. A few NEPCs from WCMC and SU2C also demonstrated gene signature scores similar to adenocarcinomas and possibly represented adenocarcinomas with NE differentiation; however, the specific NEPC subtype was not provided in annotations of these datasets.
Outlier-based meta-analysis identifies NEPC expression patterns on the patient level
We produced a meta-analysis signature of prototypical high-grade NEPC (omitting adenocarcinomas with NE differentiation) by utilizing 6 frozen tissue microarray datasets profiling 23 NEPC samples (from 15 patients) with SCPC or LCNEC morphology, classic immunophenotype (when provided), and low ARS and high neuronal phenotype scores (Table 2, Additional file 2: Table S4) [12,13,14, 21, 28, 29]. These datasets largely contained NEPCs and adenocarcinomas from similar clinical stages, ideally reducing confounding effects; known adenocarcinomas with NE differentiation were considered separately. RNA-seq datasets were excluded from meta-analysis as it was not possible to separate adenocarcinomas with NE differentiation from the NEPC cohorts based on available annotations. The FFPE dataset, which is analyzed in detail in a later section, was excluded due to attenuated expression and cohort heterogeneity. We compiled the meta-12 (Table 4) and meta-9 (Additional file 2: Table S5) gene-sets, comprising 69 and 375 genes with consistent outlier status in at least 80% (12/15) and 60% (9/15) of high-grade NEPC patients, respectively. Meta-12 genes, which required agreement between NEPCs from at least 4 datasets due to cohort sizes, were enriched for “generation of neurons” (adj p=2.6e-6 in up-genes) and “androgen receptor signaling” (adj p=3.8e-3 in down-genes) but not cell cycle. Rather, “cell division” became the most enriched gene-ontology term among meta-9 up-genes (adj p=2.6e-6), partly due to cell-cycle genes meeting outlier criteria in primary but not necessarily metastatic NEPC. Most meta genes appeared in the literature: 90% of meta-12 including AR, ASCL1, SRRM4, and CCND1, and 78% of meta-9 including PEG10, REST, EZH2, CHGA, and RB1, as expected since published NEPC gene-lists (Additional file 2: Table S1) used 9 of the NEPC patients.
However, outlier analysis potentially missed genes with modest fold-changes or large variability such as HIST1H4C, which was an outlier in 55% of NEPC patients but increased to 92% under relaxed criteria. Metastatic CRPC NEPC samples demonstrated the least outlier agreement overall, while rare adenocarcinomas had NEPC-like outlier behavior and were often associated with notable features (Additional file 3: Figure S1).
We next examined genes not present on all microarrays but still demonstrating consistent outlier expression. The most prevalent was CCEPR, overexpressed in 11.5/13 (88%) NEPC patients. This sparsely studied long non-coding RNA did not appear in probe annotation files or GENCODE (v25), but was targeted by probes A_32_P216820 (Agilent), 228679_at (Affymetrix), and 3290641 (Affymetrix exon) based on BLAT; one NEPC gene-list included 228679_at without gene annotation. The genomic location of CCEPR almost overlapped with the meta-9 up-gene PHYHIPL from the opposite strand, and these genes were highly correlated in meta-analysis datasets (r=0.70-0.93). PHYHIPL probe-set 226623_at moreover had the top co-expression similarity score (3.2e-138) to CCEPR probe-set 228679_at under Multi-Experiment Matrix analysis based on hundreds of Affymetrix datasets.
Meta-12 genes were derived from conceptually similar criteria underlying the recent integrated NEPC classifier. We adopted further modifications, including nearest centroid scoring and equal weighting of patients, whereas the integrated classifier relied on a single centroid (NEPC) and utilized equal weighting of samples, with significant influence from one patient providing almost half of NEPC samples (6/13) with highly similar expression profiles. The classifiers were similarly sized (69 versus 70 genes; 11 shared), highly correlated across NEPC mCRPC datasets (UM 0.73, SU2C 0.87, WCMC 0.90), and produced identical classifications of SU2C, but disagreed on rare respective discovery samples (2 WCMC NEPCs and 2 UM adenocarcinomas). Both classifiers were based on NEPCs with below average ARS scores (WCMC initially included one NEPC with elevated ARS, which was excluded before derivation of the final classifier). Nearest centroid classification relative to meta-12 centroids (Table 4) yielded sensitivities and specificities of 91% and 100% on training samples (AUC 100% for correlation difference), and 60-80% and 94-100% in non-training NEPC datasets (Additional file 3: Figure S2). In non-prostate datasets, SCLC had the most similar profiles to NEPC, followed by CNS samples (Fig. 2); rare cell lines from other sites, including gastric small cell carcinomas, also resembled NEPC. In JHU-FFPE, meta-12 centroid profiles appeared to generate two main clusters, with the predominantly adenocarcinoma cluster containing 5 SCPCs. These SCPCs are further characterized in the next section.
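To make the difference between patient weighting and sample weighting concrete, the following sketch builds patient-weighted centroids and scores a new profile by its correlation difference. Pearson correlation and a zero threshold on the difference are assumptions of this sketch, as are all names and toy values; the text specifies only correlation-based scoring and nearest centroid classification.

```python
from collections import Counter
from math import sqrt


def weighted_centroid(samples):
    """samples: list of (patient_id, expression vector). Every patient gets
    the same total weight, split equally among that patient's samples."""
    per_patient = Counter(patient for patient, _ in samples)
    n_patients = len(per_patient)
    cent = [0.0] * len(samples[0][1])
    for patient, expr in samples:
        w = 1.0 / (per_patient[patient] * n_patients)
        for g, x in enumerate(expr):
            cent[g] += w * x
    return cent


def pearson(x, y):
    """Pearson correlation of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_difference(expr, nepc_centroid, adeno_centroid):
    """Positive values mean the profile is closer to the NEPC centroid."""
    return pearson(expr, nepc_centroid) - pearson(expr, adeno_centroid)


def classify(expr, nepc_centroid, adeno_centroid):
    diff = correlation_difference(expr, nepc_centroid, adeno_centroid)
    return "NEPC" if diff > 0 else "adenocarcinoma"


# Hypothetical 4-gene profiles; patient P1 contributes two identical samples.
nepc = [("P1", [9.0, 1.0, 8.0, 2.0]),
        ("P1", [9.0, 1.0, 8.0, 2.0]),
        ("P2", [7.0, 3.0, 6.0, 4.0])]
adeno = [("A1", [2.0, 9.0, 3.0, 8.0]),
         ("A2", [3.0, 8.0, 2.0, 9.0])]
nepc_centroid = weighted_centroid(nepc)   # P1 and P2 count equally
adeno_centroid = weighted_centroid(adeno)
call = classify([8.0, 2.0, 9.0, 1.0], nepc_centroid, adeno_centroid)
```

With sample weighting, the duplicated P1 profile would pull the NEPC centroid toward that one patient; patient weighting avoids this, which is the concern raised above for a cohort in which one patient supplied 6 of 13 NEPC samples.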
JHU-FFPE demonstrates heterogeneity of primary SCPC with associated gene expression patterns relative to signatures and meta-9 genes
We used exon arrays to profile FFPE material of 16 primary SCPCs, 16 high-grade adenocarcinomas, and 1 adenocarcinoma with NE differentiation from our institutional archives (JHU-FFPE) (Table 3), intended to represent the natural heterogeneity of primary SCPC. Primary SCPC is known to frequently co-occur with adenocarcinoma (43% in the largest published series), typically of high Gleason grade (> 8 in 85% of cases). In JHU-FFPE, 10/16 (62.5%) SCPCs were mixed with adenocarcinomas, mostly of primary Gleason pattern 5 (80%), although only 4 fully matched pairs were available for gene expression profiling. Overall, JHU-FFPE adenocarcinomas were predominantly Gleason grade 9 (88%) by design, and most had primary Gleason pattern 4 (56%).
Primary SCPC is also known to infrequently retain expression of adenocarcinoma markers (AR 17%; PSA 17-19%) or lack expression across neuroendocrine panels (12%) [5, 6]. Among SCPC samples from JHU-FFPE with available IHC status, 2/9 (22%) expressed AR robustly, 3/9 (33%) expressed AR weakly, 1/12 (9%) expressed PSA, and 1/9 (11%) had joint negativity of synaptophysin, chromogranin, and CD56 (Table 5). SCPCs with robust AR IHC (mixed 57912_S and pure 56107) exhibited unusual hybrid IHC profiles with uniform positivity of some androgen-related (AR, Nkx3.1) and neuroendocrine (synaptophysin, CD56) markers, and negativity of others (PSA and chromogranin) (Fig. 3). On the gene expression level, ARS scores were retained at levels similar to adenocarcinomas (fold-change > -0.5 and z-score > -1 relative to adenocarcinomas) in 5/16 (31%) SCPCs (Fig. 1), corresponding to the SCPCs clustering with adenocarcinomas in the meta-12 centroid profiles (Fig. 2), including both pure and mixed cases, and comprised of the SCPCs with robustly positive AR IHC (57912_S, 56107) and SCPCs with unknown AR status (56057, 57914, 57915). The robust AR-positive SCPCs both had elevated KLK3 expression despite absence of the PSA protein product on IHC. In other public datasets, annotated SCPCs with similarly retained ARS scores were rare, if present at all (Additional file 3: Figure S3).
Hierarchical clustering relative to meta-9 genes generated 3 main subgroups, labeled “prototypical” adenocarcinomas, “prototypical” SCPCs, and “atypical” SCPCs, which generally corresponded to pure adenocarcinomas, SCPCs with reduced ARS, and SCPCs with retained ARS, respectively (Fig. 4). The exceptions were one SCPC outlier with retained ARS (57914) that clustered with prototypical adenocarcinomas, one pure adenocarcinoma outlier (57634), described previously in a case report for its unusually aggressive clinical progression, that clustered with prototypical SCPCs, and heterogeneous behavior of mixed adenocarcinomas. Highly similar hierarchical clusters were generated using the collective genes of the ARS, CCP, and neuronal phenotype signatures, of which 38% (49/128 genes) overlapped with meta-9 genes. By contrast, hierarchical clustering relative to meta-12 genes (noted previously to lack enrichment for cell cycle) failed to produce the subgroup of SCPCs with retained ARS.
The pure adenocarcinoma outlier (57634), which behaved similarly to prototypical SCPCs under meta-9 and also meta-12, clustered adjacent to the SCPC with joint neuroendocrine marker negativity (56322). Both samples were characterized by low ARS, non-elevated neuronal phenotype, and high CCP scores relative to adenocarcinomas (Fig. 5). We queried for the first 2 joint conditions in other datasets (relaxing the CCP constraint initially), specifically searching for outlier ARS scores (fold-change < -1, z-score < -2) and non-elevated neuronal phenotype scores (fold-change < 0.5, z-score < 1), with slightly relaxed ARS criteria (fold-change < -0.75, z-score < -1.5) for JHU-FFPE and WCMC CRPC due to attenuated expression. We identified 20 such clinical samples from 18 patients across metastatic datasets (Fig. 5). RAB3B, up-regulated in prostate cancer through AR, was the top-most jointly differentially expressed gene in this subgroup, with reduced expression relative to either NEPCs or adenocarcinomas (Additional file 3: Figure S4). CCP levels varied widely among these samples. High levels occurred across multiple datasets and included UM WA46, which was noted to have morphologic features of prostate cancer with NE differentiation. Low levels potentially reflected response to treatment, as demonstrated in a previous study where ARS and CCP decreased in every patient after ADT (Additional file 3: Figure S5). This variation in CCP may partially explain the discordance between a recent report of negative correlation between AR signaling and proliferation signatures in metastatic CRPC versus earlier analysis reporting positive correlation between AR and E2F1 [7, 35].
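The query for low-ARS, non-neuronal samples can be written as a small filter over signature scores. The sketch below assumes that scores are averages of log2 expression (so that fold-change is a difference of scores), that each pair of fold-change and z-score conditions must hold jointly, and that both are measured against the dataset's adenocarcinoma cohort. The gene names in the example are taken from genes mentioned in this paper purely for illustration and are not the actual signature definitions; function names and values are hypothetical.

```python
from statistics import mean, stdev


def signature_score(expr, genes):
    """Average log2 expression over signature genes present in the sample."""
    return mean(expr[g] for g in genes if g in expr)


def relative_to_adeno(score, adeno_scores):
    """Return (log2 fold-change, z-score) versus the adenocarcinoma cohort."""
    mu, sd = mean(adeno_scores), stdev(adeno_scores)
    return score - mu, (score - mu) / sd


def low_ars_non_neuronal(ars, neuro, adeno_ars, adeno_neuro,
                         ars_fc=-1.0, ars_z=-2.0, neuro_fc=0.5, neuro_z=1.0):
    """True if ARS is an outlier (low) while the neuronal score is not
    elevated. Requiring all four conditions jointly is an assumption."""
    fc_a, z_a = relative_to_adeno(ars, adeno_ars)
    fc_n, z_n = relative_to_adeno(neuro, adeno_neuro)
    return fc_a < ars_fc and z_a < ars_z and fc_n < neuro_fc and z_n < neuro_z


# Illustrative only: three androgen-related genes, not the full ARS signature.
sample = {"KLK2": 8.0, "KLK3": 9.0, "STEAP1": 8.5}
ars_score = signature_score(sample, ["KLK2", "KLK3", "STEAP1"])  # 8.5

adeno_ars = [10.0, 10.4, 9.6, 10.2, 9.8]
adeno_neuro = [5.0, 5.5, 4.5, 5.2, 4.8]
flag_default = low_ars_non_neuronal(ars_score, 5.1, adeno_ars, adeno_neuro)

# Relaxed ARS thresholds, as used for datasets with attenuated expression.
flag_relaxed = low_ars_non_neuronal(9.2, 5.1, adeno_ars, adeno_neuro,
                                    ars_fc=-0.75, ars_z=-1.5)
```

Pairing a fold-change cut with a z-score cut guards against two failure modes: small absolute shifts in tightly clustered cohorts, and large shifts that are unremarkable in highly variable cohorts.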
Mixed adenocarcinomas were distributed among all 3 meta-9 clustering subgroups, possibly associated with degree of clonal relation with SCPCs. Clonal genomic alterations shared by components of a mixed tumor have been observed in key SCPC genes such as TP53, and are capable of driving gene expression changes despite maintenance of morphology; for instance, gene expression changes intermediate to SCPC were recently reported in a xenograft model of transdifferentiation derived from a primary prostatic adenocarcinoma with bi-allelic alterations in TP53, RB1, and PTEN [12, 36]. On the other hand, mixed tumors are also susceptible to improper sampling, especially when components are intermingled. One mixed adenocarcinoma (56104_A), which clustered adjacent to its SCPC component (56104_S), was suspicious for such contamination. Unusually, it had the highest CCP score among JHU-FFPE adenocarcinomas (and #6 overall versus #2 for 56104_S) despite having the lowest Gleason grade (3+4), and one of the highest neuroendocrine phenotype scores (#3 overall versus #1 for 56104_S), including elevated expression levels of genes underlying chromogranin, synaptophysin, and CD56 despite IHC negativity. On one TMA core of the mixed tumor, an adenocarcinoma gland appeared upon deeper cuts of the SCPC component, demonstrating their close proximity (Additional file 3: Figure S6). We also considered whether the mixed SCPC outlier (57914) might similarly be contaminated with adenocarcinoma, but had no evidence other than the remote possibility gleaned from its diagnostic report, which noted areas of merging with Gleason grade 5+5 prostatic adenocarcinoma.
Meta-9 derived subgroups yield a differential expression based classifier for prototypical and atypical SCPC in the primary setting
Comparison of SCPC and adenocarcinomas from JHU-FFPE produced 385 differentially expressed genes by LIMMA (111 up, 274 down) (Additional file 3: Figure S6), including 124 (32%) from literature NEPC gene lists. Down-genes included numerous prostate specific genes (e.g., KLK3, NKX3-1) and the known NEPC-related genes CCND1 and REST. Up-genes were enriched for “cell cycle” (adj p=7.8e-10) but included only 1 neuronal phenotype gene despite presence of the neuronal gene repressor REST among the down-genes. We explored the exon array’s ability to detect known truncated splice variants associated with reduced REST activity, given that probe-set 2728423 targeted the 50-62bp cryptic exon found in neuroblastoma (hREST-N62), small cell lung cancer (sREST), and presumably NEPC [14, 37, 38]. There was no evidence of cryptic exon use in JHU-FFPE; however, we could not rule out poor probe-set performance (Additional file 3: Figure S7). Differential expression increased substantially by reducing cohort heterogeneity (e.g., 5.8-fold to 2235 genes by removing SCPCs with retained ARS). Nearest centroid classification, based on SCPC versus adenocarcinoma with LIMMA feature selection, reflected this known heterogeneity and achieved an estimated error rate of 25% (8/32) under LOOCV, with incorrect predictions of cases highlighted by meta-9 clustering: the 5 SCPCs with retained ARS, the 2 mixed adenocarcinomas clustering with SCPCs, and the pure adenocarcinoma outlier.
We constructed a new set of cohorts based on meta-9 clusters. We selected 9 prototypical SCPCs and 9 prototypical adenocarcinomas by excluding non-standard samples: specifically mixed adenocarcinomas, the outlier adenocarcinoma, adenocarcinomas associated with NE differentiation, SCPCs with robust AR positive IHC or retained ARS, and samples archived over 10 years in FFPE. We then selected the 4 atypical SCPCs with retained ARS, excluding the outlier 57914. LIMMA produced 1624 differentially expressed genes between prototypical categories, 118 between atypical SCPC and prototypical adenocarcinoma, and 115 between atypical and prototypical SCPC (Additional file 3: Figure S7). Most differentially expressed genes involving atypical SCPC were already differentially expressed between prototypical categories (79/118 and 97/115 genes; p=1.7e-63 and 4.8e-95), with greatest enrichment for “cell cycle phase” (p=1.9e-28) and including known NEPC-related epigenetic genes (EZH2, DNMT1, HIST1H4C). Thus, atypical SCPCs demonstrated a hybrid or intermediate phenotype.
Nearest centroid classification based on the 3 newly defined cohorts and genes common to ≥ 2 pair-wise LIMMA comparisons (Table 6) achieved an estimated error rate of 4.5% (1/22), with incorrect prediction of the atypical SCPC training sample 56107 (although it was correctly classified before LOOCV). On remaining non-training samples, 4/10 classified discordantly with diagnoses: the meta-9 outliers (57914, 57634) and 2/5 mixed adenocarcinomas (56321_A as atypical SCPC, 56104_A as prototypical SCPC; also under models derived after excluding their matched SCPC from training). Behavior of mixed adenocarcinomas, especially considering biopsies, may thus potentially be prognostic of an underlying undetected SCPC component in a subset of cases presumably enriched for mixed tumors with shared clonal driver alterations. On the other hand, 56104_A may have contained an admixed population of SCPC cells as discussed earlier, and if so, it is possible its true adenocarcinoma component might no longer be prognostic.
We transferred the 3-centroid classifier to the GenomeDx GRID® by reformulating centroids under SCAN (Table 6), a single-sample normalization method compatible with routine clinical lab environments although susceptible to batch effects. JHU-FFPE samples were handled relatively uniformly, yet demonstrated notable effects based on RNA processing date (Additional file 3: Figure S8); nevertheless SCAN (compared to RMA) empirically produced identical classification of JHU-FFPE, suggesting robustness. We applied the 3-centroid model to selected GRID® adenocarcinoma cohorts, and found that 2 Prospective (0.09%), no JHU-RP (0%), and 2 Mayo (0.3%) samples classified as prototypical SCPC, and 4 Prospective (0.17%), 2 JHU-RP (0.6%), and 10 Mayo (1.3%) samples classified as atypical SCPC (Fig. 6). Both JHU-RP samples with atypical SCPC classification were part of a distinct cluster of 4 samples featuring the highest CCP and 3 lowest ARS scores among JHU-RP, and all 4 subsequently developed metastases. Mayo had greater proportions classifying as SCPC but included suspected false positives far from training samples with low correlations to all 3 centroids. In the earlier meta-12 analysis (Fig. 2), the Mayo-FFPE dataset similarly exhibited multiple samples with low correlations to both meta-12 centroids. Mayo samples overall also had weaker correlations to the adenocarcinoma centroid (r=0.82) versus samples from Prospective (r=0.92) or JHU-RP (r=0.89).
We note that JHU-FFPE samples had variable archive ages (Table 3), which potentially affected expression, as discussed further in the next section. SCPCs and adenocarcinomas were at least relatively balanced (mean 3.6 and 3.0 years after removing the oldest sample), ideally minimizing differential bias. By contrast, cohorts demonstrated a few notable differences in tissue sources; for example, pure adenocarcinomas were all biopsies. This potentially affected both expression and differential expression; however, we found no significant differences by LIMMA between biopsies and TURPs (the most common sources), either when restricted to SCPCs or among all samples.
FFPE introduces an extra source of variability to the JHU-FFPE dataset
Principal components analysis of JHU-FFPE, considering all genes for an unsupervised approach, demonstrated rough separation of phenotypes, intermediate behavior of mixed adenocarcinomas, and discordant behavior of the meta-9 outlier samples (57914, 57634) (Fig. 7). Of all 33 principal components, the second (PC2) best separated phenotypes (AUC 86.3%) and had the greatest magnitude correlations to each of CCP (r = -0.88), ARS (r=0.69), and neuronal phenotype scores (r=-0.54), with higher correlation to the difference of ARS and CCP (r=0.93). Indeed, under GSEAPreranked applied to the PC2 gene coefficients, NELSON-RESPONSE-TO-ANDROGEN-UP was the #2 most up-regulated gene-set (out of 3739 gene-sets from the Molecular Signatures Database curated collection C2 after size filters), while the top down-regulated gene-sets were largely cell cycle related (ROSTY-CERVICAL-PROLIFERATION-CLUSTER was #1, REACTOME-CELL-CYCLE was the top Reactome pathway at #28, and KEGG-CELL-CYCLE was the top KEGG pathway at #82). By contrast, in principal component analyses of the 4 frozen tissue primary or xenograft NEPC datasets, the first principal component (PC1) always separated NEPCs from adenocarcinomas (AUC 100%) (Additional file 3: Figure S9), and moreover always had the greatest magnitude correlations to ARS (r=-0.76 to -0.98), neuronal phenotype (r=0.87 to 0.98), and CCP scores (r=0.57 to 0.93), with the exception of CCP in 1/4 datasets.
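As an illustration of the kind of analysis described above, the sketch below computes principal component scores from a samples-by-genes matrix and asks which component best separates two phenotypes by AUC. It is a generic NumPy sketch under stated assumptions (SVD-based PCA on gene-centered data, AUC from pairwise rank comparisons), not the authors' pipeline; all function names are hypothetical.

```python
import numpy as np

def pca_scores(X):
    """PCA via SVD. X is a samples x genes matrix.

    Returns per-sample scores, gene coefficients (loadings), and the
    fraction of variance explained by each component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S                          # sample coordinates on each PC
    explained = S**2 / np.sum(S**2)
    return scores, Vt, explained

def auc(values, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as one half."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = values[labels], values[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def separation_auc(values, labels):
    """The sign of a principal component is arbitrary, so report max(AUC, 1 - AUC)."""
    a = auc(values, labels)
    return max(a, 1.0 - a)

def best_separating_component(scores, labels):
    """Index and AUC of the component that best separates the two classes."""
    aucs = [separation_auc(scores[:, k], labels) for k in range(scores.shape[1])]
    k = int(np.argmax(aucs))
    return k, aucs[k]
```

Correlating a component with a signature score (such as ARS or CCP) would then be a single call of the form `np.corrcoef(scores[:, k], signature)[0, 1]`, where `signature` is a hypothetical per-sample score vector.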
Thus, in JHU-FFPE the first principal component (PC1, representing the direction of greatest variability) appeared to capture a different source of variability. While still demonstrating moderate correlations to ARS (r=-0.62) and neuronal phenotype (r=0.46), and to a lesser degree to CCP (r=-0.20), PC1 did not separate phenotypes well (AUC 61.7%), and its greatest magnitudes were notably from the SCPCs of oldest FFPE age (54674, 56321_S), both archived 14-16 years (versus 0-6 years for other SCPCs). There was moderate correlation between PC1 and archive age (r=0.50), and PC1 modestly differentiated older archived samples (> 3 years in FFPE) from newer samples (p=0.04). We also tested whether PC1 was associated with sample type (biopsies versus TURPs) but did not find evidence for this (p=0.48). We applied GSEAPreranked to better characterize the source of variability captured by PC1. The most down-regulated gene-sets were related to RNA translation (REACTOME-SRP-DEPENDENT-COTRANSLATIONAL-TARGETING-TO-MEMBRANE was #1, while KEGG-RIBOSOME was the top KEGG pathway at #5). Eighteen of the top 100 gene coefficients by magnitude corresponded to ribosomal protein subunits, with PC1 highly anti-correlated to their average gene expression (r=-0.93). These genes included RPL19, which has been used previously in FFPE gene expression analysis to normalize sample input. Up-regulated gene-sets were considerably rarer (47 versus 2104 with nominal p < 0.01) and included epigenetic-related gene-sets (e.g., KONDO-PROSTATE-CANCER-WITH-H3K27ME3 was #3).
Given the possible influence of the variable archive ages in JHU-FFPE on gene expression, we investigated the performance of individual genes. Since the exon array contained probe-sets for almost every exon of a gene, probe-sets targeting the same transcript should ideally behave concordantly. We therefore defined correlation strength (CS) as the average correlation between probe-sets targeting the same gene, restricted here to genes with 10 or more probe-sets. CS was considerably weaker in FFPE datasets than in a frozen tissue dataset, with the decline related to archive age and presumably to RNA degradation (Additional file 3: Figure S10). In JHU-FFPE, CS was lower for neuronal phenotype genes (mean 0.24) than for cell cycle progression genes (0.36) or AR-signaling genes (0.56), consistent with the relative paucity of neuronal genes in differential expression analysis. For instance, CHGA had relatively weak CS compared with frozen tissue (CS=0.31 versus 0.77), while androgen-related genes (KLK3, KLK2, ACPP) had the highest CS (0.86-0.88) and standard deviations (1.90-1.95) (Additional file 3: Figure S11). Accuracy in FFPE has been reported to improve upon using each gene’s most variable probe-set. Compatible with this, CS increased on average by 0.14 in JHU-FFPE upon restricting to each gene’s 5 most variable probe-sets, likely through exclusion of weakly binding, oversaturated, or unused alternative exon probe-sets. We also investigated expression in JHU-FFPE of the gene CCEPR, elevated in 88% of NEPCs in the meta-analysis. CS could not be computed since only one exon probe-set (3290641) targeted CCEPR. This probe-set did not differentiate between phenotypes (nominal p=0.54, compared with p=0.05 for its neighbor PHYHIPL), had relatively narrow dynamic range, and lost correlation to PHYHIPL (r=0.17 versus 0.71 in NIH Roadmap data), suggesting poor performance in FFPE.
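The correlation strength (CS) metric can be sketched as follows. This is one plausible reading of the definition, assuming the "average correlation" is the mean pairwise Pearson correlation across samples among a gene's probe-sets; the function name and parameters are hypothetical.

```python
import numpy as np

def correlation_strength(probe_expr, min_probesets=10, top_k=None):
    """Mean pairwise Pearson correlation among probe-sets of one gene.

    probe_expr    : probe-sets x samples array of expression values
    min_probesets : genes with fewer probe-sets are skipped (returns None)
    top_k         : if given, keep only the top_k most variable probe-sets
                    before computing CS
    Note: a probe-set that is constant across samples has an undefined
    correlation and would propagate NaN; such rows are assumed to have been
    filtered out beforehand.
    """
    probe_expr = np.asarray(probe_expr, dtype=float)
    n = probe_expr.shape[0]
    if n < min_probesets:
        return None
    if top_k is not None and top_k < n:
        variances = probe_expr.var(axis=1)
        keep = np.argsort(variances)[-top_k:]   # indices of highest variance
        probe_expr = probe_expr[keep]
    r = np.corrcoef(probe_expr)                 # probe-set x probe-set matrix
    upper = np.triu_indices_from(r, k=1)        # each pair counted once
    return float(r[upper].mean())
```

Calling the function with `top_k=5` corresponds to the restriction to each gene's 5 most variable probe-sets described above. Low-variance probe-sets that track the gene poorly are dropped, which tends to raise CS.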
We utilized an outlier-based meta-analysis approach to study prototypical high-grade NEPC across multiple frozen tissue datasets, although more sophisticated methods have also been described. We believe meta-12 centroids may provide a useful tool to assess for prototypical high-grade NEPC status given high-quality frozen tissue expression data. However, we also found evidence of highly similar meta-12 centroid correlation profiles between prototypical high-grade NEPC and small cell carcinomas from lung and possibly other sites, reflecting the challenge of determining site of origin in small cell carcinoma of unknown primary. Although we did not validate individual genes in this study (e.g., via PCR or RNA in-situ hybridization), we believe meta-12 genes are strong candidates for potential diagnostic markers, either through RNA or protein. In a previous study, we found that cyclin D1 performed effectively as a negative IHC marker of SCPC, and further evaluation of selected meta-12 genes, both up- and down-regulated, is currently underway.
We provided one of the largest gene expression datasets to date of primary SCPC and high-grade adenocarcinoma, albeit in FFPE, including significant proportions of mixed SCPCs (63%), slightly above estimates from the literature (40-50%), and SCPCs with preserved AR signaling (31%), slightly above reported frequencies of AR-positive or PSA-positive SCPC (17-20%) [5, 6]. Based on meta-signature-derived subgroups of this dataset, we developed a nearest 3-centroid classifier for primary samples profiled by exon array. One adenocarcinoma, with highly aggressive metastatic progression described in a previous case report, was classified as prototypical SCPC. Two mixed adenocarcinomas (40%) were additionally classified as SCPC (1 prototypical, 1 atypical), suggesting that mixed cases might be enriched for SCPC signatures in their adenocarcinoma components, due perhaps to shared clonal origins although possibly false positives from admixture. The classifier may thus provide utility for detection of mixed cases in the biopsy setting, where only the adenocarcinoma component might get sampled.
Rare adenocarcinomas among GRID® cohorts were also classified as SCPCs under the 3-centroid model, similar to the behavior of the JHU-FFPE outlier or unusual mixed adenocarcinomas. Percentages of such GRID® cases (0.3-0.6% combining prototypical and atypical classifications, excluding Mayo due to suspected false positives) were generally below the presumptive frequency of SCPC (often reported as 0.5-2%), roughly in line with expectations given that GRID® cohorts consisted of RP adenocarcinomas and inherently excluded SCPCs. We suspect these cases may correspond to diagnostically challenging poorly differentiated tumors, misdiagnosed samples, mixed adenocarcinomas, or fortuitously sampled occult SCPC components; however, further investigation is necessary. Cases were also too scarce for meaningful Kaplan-Meier analysis, although the 2 JHU-RP cases with atypical SCPC classification belonged to a cluster of 4 cases that all subsequently developed metastases. Thus, we speculate the classifier may detect unusually aggressive cases and potentially have prognostic relevance.
One main limitation of our study was the lack of an independent validation set of primary SCPCs to test the 3-centroid classifier. In contrast to the multiple large GRID® adenocarcinoma cohorts, few SCPCs have been profiled on the GRID®, due to rarity of diagnosis and also scarcity of tissue, given that SCPC patients have traditionally been treated with systemic therapy (usually after biopsy-based diagnosis) and not with RP. Moreover, patients found to have unexpected SCPC upon RP would typically have little need for prognostic clinical RNA expression testing on the GRID®. Consequently we were not aware of other exon array datasets with annotated SCPCs. However, it was at least encouraging that the 2 JHU-FFPE SCPCs excluded from training due to old archive age were indeed classified as prototypical SCPCs despite their outlier PCA trends.
Another limitation of the classifier was its derivation from relatively few atypical SCPCs, indicating a need for more samples to definitively establish whether cases such as 57912_S, with a uniform hybrid IHC pattern and small cell morphology, are indeed a true subcategory with common underlying genomic properties. Similarly, the pattern of low ARS without neuronal over-expression may deserve a separate category in the primary or metastatic setting, but also requires more examples. Such non-standard cases, often manifesting as hybrid or unusual IHC profiles, can be puzzling for pathologists to evaluate. The ultimate clinical question will be whether these potential expression-based subtypes have prognostic relevance or predict response to therapy. Anecdotally, the outlier adenocarcinoma in our JHU-FFPE dataset with low ARS and non-elevated neuronal expression had unusually aggressive metastatic progression described in a case report. We did not have access to outcome data of the atypical hybrid SCPCs in our dataset and were not aware of other hybrid SCPCs in the literature; however, rare adenocarcinoma cases with aggressive progression and hybrid IHC co-expression of AR and chromogranin have been reported [9, 10]. There is also increasing evidence for lineage plasticity between adenocarcinoma and neuroendocrine phenotypes in metastatic prostate cancer, induced upon anti-androgen therapy and partially reversed through epigenetic interventions such as EZH2 inhibition [43,44,45]. Our atypical hybrid SCPCs, as well as the outlier adenocarcinoma, overexpressed epigenetic genes including EZH2. We hope increased recognition of these unusual phenotypes will lead to larger collections of cases and eventual clarity on their clinical relevance.
Meta-analysis generates a robust signature of prototypical high-grade NEPC, with close resemblance to small cell lung cancer. Atypical NEPC potentially includes a hybrid subcategory exhibiting preserved AR-signaling and a non-neuronal subcategory with AR loss and high proliferation but without expression of neuroendocrine markers that may overlap with adenocarcinomas. In the primary setting, FFPE material may be used to generate a classifier of SCPC incorporating disease heterogeneity, with potential prognostic implications. However, further testing with a proper validation set is required.
- AdCa (or Ad): Adenocarcinoma
- ARS: AR-signaling (gene signature)
- CCP: Cell cycle progression (gene signature)
- CRPC: Castration-resistant prostate cancer
- LCNEC: Large cell neuroendocrine carcinoma
- NEPC: Neuroendocrine prostate cancer
- SCLC: Small cell lung cancer
- SCPC: Small cell prostatic carcinoma
Epstein JI, et al. Proposed morphologic classification of prostate cancer with neuroendocrine differentiation. Am J Surg Pathol. 2014;38(6):756–67.
Wang HT, et al. Neuroendocrine prostate cancer (NEPC) progressing from conventional prostatic adenocarcinoma: factors associated with time to development of NEPC and survival from NEPC diagnosis-a systematic review and pooled analysis. J Clin Oncol. 2014;32(30):3383–90.
Tan HL, et al. Rb loss is characteristic of prostatic small cell neuroendocrine carcinoma. Clin Cancer Res. 2014;20(4):890–903.
Lapuk AV, et al. From sequence to molecular pathology, and a mechanism driving the neuroendocrine phenotype in prostate cancer. J Pathol. 2012;227(3):286–97.
Yao JL, et al. Small cell carcinoma of the prostate: an immunohistochemical study. Am J Surg Pathol. 2006;30(6):705–12.
Wang W, Epstein JI. Small cell carcinoma of the prostate. A morphologic and immunohistochemical study of 95 cases. Am J Surg Pathol. 2008;32(1):65–71.
Kumar A, et al. Substantial interindividual and limited intraindividual genomic diversity among tumors from men with metastatic prostate cancer. Nat Med. 2016;22(4):369–78.
Beltran H, et al. Divergent clonal evolution of castration-resistant neuroendocrine prostate cancer. Nat Med. 2016;22(3):298–305.
Roudier MP, et al. Metastatic conventional prostatic adenocarcinoma with diffuse chromogranin A and androgen receptor positivity. J Clin Pathol. 2004;57(3):321–3.
Wu C, et al. Integrated genome and transcriptome sequencing identifies a novel form of hybrid and aggressive prostate cancer. J Pathol. 2012;227(1):53–61.
Beltran H, et al. Molecular characterization of neuroendocrine prostate cancer and identification of new drug targets. Cancer Discov. 2011;1(6):487–95.
Lin D, et al. High fidelity patient-derived xenografts for accelerating prostate cancer discovery and drug development. Cancer Res. 2014;74(4):1272–83.
Tzelepi V, et al. Modeling a lethal prostate cancer variant with small-cell carcinoma features. Clin Cancer Res. 2012;18(3):666–77.
Zhang X, et al. SRRM4 expression and the loss of REST activity may promote the emergence of the neuroendocrine phenotype in castration-resistant prostate cancer. Clin Cancer Res. 2015;21(20):4698–708.
Hansel DE, et al. Shared TP53 gene mutation in morphologically and phenotypically distinct concurrent primary small cell neuroendocrine carcinoma and adenocarcinoma of the prostate. Prostate. 2009;69(6):603–9.
Gravendeel LA, et al. Gene expression profiles of gliomas in formalin-fixed paraffin-embedded material. Br J Cancer. 2012;106(3):538–45.
Abdueva D, et al. Quantitative expression profiling in formalin-fixed paraffin-embedded samples by Affymetrix microarrays. J Mol Diagn. 2010;12(4):409–17.
Hieronymus H, et al. Gene expression signature-based chemical genomic prediction identifies a novel class of HSP90 pathway modulators. Cancer Cell. 2006;10(4):321–30.
Cuzick J, et al. Prognostic value of an RNA expression signature derived from cell cycle proliferation genes in patients with prostate cancer: a retrospective study. Lancet Oncol. 2011;12(3):245–55.
Ritchie ME, et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.
Tsai H, et al. Cyclin D1 loss distinguishes prostatic small-cell carcinoma from most prostatic adenocarcinomas. Clin Cancer Res. 2015;21(24):5619–29.
Travis WD. Update on small cell carcinoma and its differentiation from squamous cell carcinoma and other non-small cell carcinomas. Mod Pathol. 2012;25 Suppl 1:S18–30.
Dalela D, et al. Contemporary role of the Decipher® test in prostate cancer management: current practice and future perspectives. Rev Urol. 2016;18(1):1–9.
Karnes RJ, et al. Validation of a genomic classifier that predicts metastasis following radical prostatectomy in an at risk patient population. J Urol. 2013;190(6):2047–53.
Erho N, et al. Discovery and validation of a prostate cancer genomic classifier that predicts early metastasis following radical prostatectomy. PLoS One. 2013;8(6):e66855.
Ross AE, et al. Tissue-based genomics augments post-prostatectomy risk stratification in a natural history cohort of intermediate- and high-risk men. Eur Urol. 2016;69(1):157–65.
Glass AG, et al. Validation of a genomic classifier for predicting post-prostatectomy recurrence in a community based health care setting. J Urol. 2016;195(6):1748–53.
Grasso CS, et al. The mutational landscape of lethal castration-resistant prostate cancer. Nature. 2012;487(7406):239–43.
Sircar K, et al. Mitosis phase enrichment with identification of mitotic centromere-associated kinesin as a therapeutic target in castration-resistant prostate cancer. PLoS One. 2012;7(2):e31259.
Yang M, et al. Long noncoding RNA CCHE1 promotes cervical cancer cell proliferation via upregulating PCNA. Tumour Biol. 2015;36(10):7615–22.
Adler P, et al. Mining for coexpression across hundreds of datasets using novel rank aggregation and visualization methods. Genome Biol. 2009;10(12):R139.
Haffner MC, et al. Diagnostic challenges of clonal heterogeneity in prostate cancer. J Clin Oncol. 2015;33(7):e38–40.
Tan PY, et al. Integration of regulatory networks by NKX3-1 promotes androgen-dependent prostate cancer survival. Mol Cell Biol. 2012;32(2):399–414.
Rajan P, et al. Next-generation sequencing of advanced prostate cancer treated with androgen-deprivation therapy. Eur Urol. 2014;66(1):32–9.
Sharma A, et al. The retinoblastoma tumor suppressor controls androgen signaling and human prostate cancer progression. J Clin Invest. 2010;120(12):4478–92.
Akamatsu S, et al. The placental gene PEG10 promotes progression of neuroendocrine prostate cancer. Cell Rep. 2015;12(6):922–36.
Palm K, Metsis M, Timmusk T. Neuron-specific splicing of zinc finger transcription factor REST/NRSF/XBR is frequent in neuroblastomas and conserved in human, mouse and rat. Brain Res Mol Brain Res. 1999;72(1):30–9.
Shimojo M, et al. The small cell lung cancer-specific isoform of RE1-silencing transcription factor (REST) is regulated by neural-specific Ser/Arg repeat-related protein of 100 kDa (nSR100). Mol Cancer Res. 2013;11(10):1258–68.
Greytak SR, et al. Accuracy of molecular data generated with FFPE biospecimens: lessons from the literature. Cancer Res. 2015;75(8):1541–7.
Yang W, et al. Direct quantification of gene expression in homogenates of formalin-fixed, paraffin-embedded tissues. Biotechniques. 2006;40(4):481–6.
Hughey JJ, Butte AJ. Robust meta-analysis of gene expression using the elastic net. Nucleic Acids Res. 2015;43(12):e79.
Helpap B, Kollermann J, Oehler U. Neuroendocrine differentiation in prostatic carcinomas: histogenesis, biology, clinical relevance, and future therapeutical perspectives. Urol Int. 1999;62(3):133–8.
Kleb B, et al. Differentially methylated genes and androgen receptor re-expression in small cell prostate carcinomas. Epigenetics. 2016;11(3):184–93.
Ku SY, et al. Rb1 and Trp53 cooperate to suppress prostate cancer lineage plasticity, metastasis, and antiandrogen resistance. Science. 2017;355(6320):78–83.
Mu P, et al. SOX2 promotes lineage plasticity and antiandrogen resistance in TP53- and RB1-deficient prostate cancer. Science. 2017;355(6320):84–8.
The authors would like to thank Angelo M. De Marzo for contributing several cases to the study, Luigi Marchionni for comments and advice, and the reviewers for valuable suggestions.
Funding was provided in part by NIH/NCI Prostate SPORE P50CA58236 (TLL). None of the funding bodies had any part in the design of the study and collection, analysis, and interpretation of data, or in writing the manuscript.
Availability of data and materials
Gene expression datasets used in this study are available from the Gene Expression Omnibus database GEO (microarrays), cBioPortal (RNA-seq), or the corresponding author on reasonable request. Inquiries regarding the GRID Prospective dataset can be directed to authors from GenomeDx.
Ethics approval and consent to participate
Informed consent to use the tissue samples in this study was waived by the Johns Hopkins School of Medicine Institutional Review Board.
Consent for publication
Competing interests
JL, MA, NE and ED are employees of GenomeDx Biosciences; TLL has received research funding from GenomeDx. HKT declares no competing interests.
Additional file 1: Additional methods on bioinformatic processing and analysis, and additional legends. (DOCX 49 kb)
Additional file 2: Additional tables on NEPC gene-lists, meta-9 genes, and LIMMA comparisons. (XLSX 266 kb)
Additional file 3: Additional figures on meta-12 scores, AR signaling versus AR / CCP / RAB3B, mixed tumors, REST exons, batch effects, principal components, and correlation strengths. (PDF 13484 kb)
Cite this article
Tsai, H.K., Lehrer, J., Alshalalfa, M. et al. Gene expression signatures of neuroendocrine prostate cancer and primary small cell prostatic carcinoma. BMC Cancer 17, 759 (2017). https://doi.org/10.1186/s12885-017-3729-z