In this study we used unsupervised hierarchical clustering and a gene expression pattern combination approach to detect robust molecular clusters that classify RCC into three molecular groups with distinct prognostic value. In many previous studies, RCC cases were either preselected or expression data were linked to pathologic and clinical criteria for further analysis [15–26]. Potential markers may therefore have represented surrogate traits that merely confirmed known phenotypes on a molecular basis. To our knowledge, gene expression patterns that are independent of pathological or single molecular parameters and point to general RCC biology have not been described to date.
Classification system databases suitable for comprehensive gene expression clustering
To identify common RCC gene expression signatures, we searched for large gene sets using the classification systems INGENUITY (http://www.ingenuity.com/), KEGG (http://www.genome.jp/kegg/) and PANTHER (http://www.pantherdb.org). Our gene expression analyses demonstrated that more than 150 genes are required to obtain major and clearly distinguishable gene clusters. In contrast to Ingenuity and KEGG, PANTHER is able to integrate several hundred genes into “superior pathways”. Distinct major group patterns were obtained only when clustering the four dominating processes; the number of genes in the remaining pathways was too low for cluster analysis. It was not our intention to analyze specific pathways within RCC. Rather, we used this platform to visualize sets of gene expression clusters that are differentially regulated among RCC. There was no notable association between any of the RCC groups and any of the four pathways. The 92 genes extracted from the four matrices (pathways) were distributed fairly evenly over the three RCC groups, suggesting partial involvement of all four pathways in each group. In our opinion, the clustering results obtained with five randomly selected sets of about 700 genes clearly indicate that we could have used any arbitrarily chosen gene list for clustering, independent of any pathway.
Unsupervised versus supervised clustering
Supervised analyses of gene expression patterns in tumors commonly use algorithms that are linked to known clinical parameters such as tumor subtype, metastatic versus non-metastatic disease or treated versus untreated status. Consequently, the number of expected clusters is already known. Because we aimed to identify unbiased gene expression patterns, we chose unsupervised analysis, for which the resulting number of clusters is unknown. To circumvent this problem, we combined the strongest gene expression patterns into a new matrix and re-clustered them in a second clustering step, importantly against the same tumor cohort (Additional file 2: Figure S1 A-D and Figure 1). Re-clustering with randomly selected genes demonstrated that the three tumor groups remained stable (Additional file 4: Figure S2). We therefore believe that our two-times-two-way unsupervised clustering method is an alternative strategy to re-classify tumor types independent of TNM criteria. We cannot rule out that additional groups exist, which may appear if more samples are included in the analysis.
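The stability check described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the data are simulated, and the Euclidean distance, average linkage, subset size and the Rand index as agreement measure are all assumptions chosen for illustration. The sketch clusters samples on all genes, re-clusters them on random gene subsets, and reports how well the groupings agree.

```python
import random
from itertools import combinations

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def average_linkage(profiles, k):
    """Agglomerative clustering of sample profiles into k groups."""
    clusters = [[i] for i in range(len(profiles))]
    dist = {(i, j): euclidean(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)}

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the two closest clusters (a < b, so deleting b is safe).
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(profiles)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def rand_index(l1, l2):
    """Fraction of sample pairs on which two groupings agree."""
    pairs = list(combinations(range(len(l1)), 2))
    agree = sum((l1[i] == l1[j]) == (l2[i] == l2[j]) for i, j in pairs)
    return agree / len(pairs)

def simulate(n_genes=300, group_sizes=(4, 4, 4), seed=0):
    """Simulated genes x samples matrix with three sample groups (hypothetical data)."""
    rng = random.Random(seed)
    truth = [g for g, size in enumerate(group_sizes) for _ in range(size)]
    matrix = []
    for _ in range(n_genes):
        offsets = [rng.gauss(0, 2) for _ in group_sizes]  # group-specific level
        matrix.append([offsets[g] + rng.gauss(0, 0.3) for g in truth])
    return matrix, truth

def cluster_samples(matrix, gene_idx, k=3):
    """Cluster samples using only the genes listed in gene_idx."""
    n_samples = len(matrix[0])
    profiles = [[matrix[g][s] for g in gene_idx] for s in range(n_samples)]
    return average_linkage(profiles, k)

matrix, truth = simulate()
rng = random.Random(1)
reference = cluster_samples(matrix, range(len(matrix)))

# Re-cluster on five random gene subsets and compare with the reference grouping.
scores = []
for _ in range(5):
    subset = rng.sample(range(len(matrix)), 100)
    scores.append(rand_index(reference, cluster_samples(matrix, subset)))
print(scores)
```

A Rand index close to 1 for every subset indicates that the sample groups do not depend on the particular genes chosen. On real expression data the number of groups `k` would itself be uncertain, which is the problem the second clustering step is meant to address.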
Molecular signatures strictly separate RCC tissue from RCC cell lines
Surprisingly, the expression signature obtained from the renal cancer cell lines was clearly distinguishable from those derived from renal cancer tissues. The individual cell line expression profiles were all similar to each other, independent of their respective primary tumors. This general finding may mainly be caused by culture conditions, the artificial environment and the two-dimensional structure of cell culture layers. We therefore believe that expression profiling of cell lines would never lead to the detection of common renal cancer tissue-specific signatures. This also raises concerns about whether novel diagnostic and therapeutic strategies can be discovered using in vitro systems only.
Molecular signatures do not coincide with pathologic criteria
In contrast to the cell lines, which formed a separate group, RCC metastases and primary RCC split into group A, B or C irrespective of tumor subtype, stage, differentiation grade or sarcomatoid differentiation. Because RCC group A contains almost exclusively ccRCC, the clustering results might appear to correlate with histological subtype. However, these ccRCC were of different tumor stages and grades. The same is true for the tumors in groups B and C, which comprised ccRCC, pRCC and chRCC with different pathologic parameters. Furthermore, our molecular classification allows additional refinement of tumor staging and grading. Organ-confined RCC, particularly pT1 tumors, which are generally considered to have a good prognosis, can be further subdivided into group A (good), B (worse) or C (worst), which may also have predictive value. Although ccRCC, pRCC and chRCC have different morphological backgrounds, the combined appearance of the three histological subtypes across different clusters suggests molecular and functional similarities.
The three RCC output signatures are not influenced by the VHL/HIF axis
Based on the results of a series of previous VHL mutation analyses, it is widely accepted that loss of function of pVHL is a major contributor to the development of ccRCC. The inactivation of pVHL leads to HIF-α stabilization and, hence, to the upregulation of a number of genes involved in RCC progression (e.g. VEGFA, PDGF, TGF, CXCR4, CA9) [44–46]. We therefore expected to detect gene expression patterns connected to HIF signaling pathways. However, the gene expression patterns showed no notable link between HIF-regulated pathways and any of the RCC subgroups. This finding is in line with the results of a recent study in which VHL wild-type tumors, HIF-1α and HIF-2α overexpressing tumors, as well as HIF-2α-only overexpressing tumors were found in both ccRCC clusters. We also examined the VHL mutation status in all analyzed ccRCC and identified gene sequence alterations in the majority of the tumors. A recent study demonstrated that the thermodynamic stability and functionality of pVHL depend on the location and type of mutation. As the frequencies and types of VHL mutations were similar in all three RCC groups, it was not surprising that the gene expression patterns were associated neither with the VHL mutation status nor with any HIF-driven pathways (data not shown). Our data strongly suggest the existence of pVHL-independent mechanisms that result in distinct gene expression outputs reflecting common biologic pathways in renal cell cancer.
RCC gene expression signatures are not directly linked to copy number alterations
Our integrative approach, which combined SNP and microarray data, revealed no direct correlation between the signatures of CNA-affected genes analyzed in 45 RCC and the three RCC groups. Only one of the 92 cluster-forming genes (ITGAL; see Additional file 3: Table S2) belonged to the 769 genes residing within the 126 CNAs found in our RCC set. Moreover, hierarchical clustering of both CNA-affected and non-affected genes demonstrated that the three RCC gene expression patterns are not directly influenced by copy number alterations. This finding is in line with a recent study that also found many discrepancies between CNA and gene expression. The authors suggest that, owing to selective pressure, the expression of many “driver” genes is less correlated with copy number than that of “passenger” genes. In addition, multiple other mechanisms exist to up- or down-regulate a gene.
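To put the overlap of one gene into perspective, the following sketch computes how many of the 92 cluster-forming genes would be expected to fall within CNA regions if they were drawn at random. The counts 92, 769 and 1 are taken from the text; the size of the gene universe (`UNIVERSE`) is a purely hypothetical placeholder, since the number of genes assessed is not stated here, and the result changes with that assumption.

```python
from math import comb

def hypergeom_pmf(k, universe, marked, drawn):
    """P(exactly k marked genes among `drawn` genes sampled without replacement)."""
    return comb(marked, k) * comb(universe - marked, drawn - k) / comb(universe, drawn)

CLUSTER_GENES = 92   # cluster-forming genes (from the text)
CNA_GENES = 769      # genes residing within CNAs (from the text)
OBSERVED = 1         # observed overlap, ITGAL (from the text)
UNIVERSE = 20000     # ASSUMPTION: hypothetical number of genes assessed

# Expected overlap under random sampling of the cluster-forming genes.
expected = CLUSTER_GENES * CNA_GENES / UNIVERSE

# Probability of seeing at most the observed overlap by chance.
p_at_most = sum(hypergeom_pmf(k, UNIVERSE, CNA_GENES, CLUSTER_GENES)
                for k in range(OBSERVED + 1))

print(f"expected overlap: {expected:.2f}")
print(f"P(overlap <= {OBSERVED}): {p_at_most:.3f}")
```

Replacing `UNIVERSE` with the actual number of genes represented on both platforms would be required before drawing any conclusion from these numbers.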
RCC is not caused by alteration of single genes and pathways
It is remarkable that each of the three group-specific gene expression patterns remained stable although the type and frequency of CNAs differed widely within the tumor cohort, ranging from none to 18 altered genomic regions in single tumors. We postulate that each of these RCC must have developed individual mechanisms in addition to CNAs (e.g. mutations, methylation, transcriptional and translational modifications), which together regulate molecular components so that the tumor reaches one of the three tumor groups. Our hypothesis is supported by a recent study showing that low CNA rates in tumors are related to increased levels of global DNA methylation and vice versa.
In contrast to previous approaches, we combined several RCC subtypes for unsupervised hierarchical clustering together with LDA, entirely unbiased by clinico-pathologic parameters. Our results demonstrate that RCC group formation patterns remained very similar across various gene sets, arguing that a substantial number of genes participate in the molecular definition of an RCC group. It is therefore not surprising that more than one third of human genes have already been identified as cancer-relevant and that many of them have been claimed as potential biomarkers. Consequently, we believe that many molecular pathways in a tumor must be directly or indirectly affected to eventually reach one of the three output signatures.
Characterization of the three RCC groups at the protein level
By subsequently performing TMA analysis on a second, larger cohort of RCC, we also validated our results at the protein level. To find appropriate markers, we tested several antibodies directed against proteins whose genes were clearly upregulated in one of the groups. Among 10 candidates tested, only MSH6, a DNA mismatch repair enzyme, and DEK, a chromatin- and RNA-associated protein mutated or overexpressed in certain cancers, showed reliable immunostaining results. The third protein, CD34, was identified indirectly by retrospective histological analysis of the tumors after the clustering analyses (Figure 1). We found increased microvessel density in group A in RCC samples that had been selected randomly without knowledge of any specific pathological features (with the exception of stage and grade). Although not expressed in RCC cells, this endothelial marker is well suited to distinguish group A morphologically from groups B and C. Our effort to select suitable protein markers for the RCC groups revealed strong differences between the expression signatures at the RNA and protein levels. Further protein analyses are needed to identify additional markers or marker combinations with both prognostic and predictive value.