  • Research article
  • Open access

Molecular subtyping improves prognostication of Stage 2 colorectal cancer


Background

Post-surgical staging is the mainstay of prognostic stratification for colorectal cancer (CRC). Here, we compare TNM staging to consensus molecular subtyping (CMS) and assess the value of subtyping in addition to stratification by TNM.

Methods

Three hundred and eight treatment-naïve colorectal tumours were accessed from our institutional tissue bank. CMS typing was carried out using tumour gene-expression data. Post-surgical TNM-staging and CMS were analysed with respect to clinicopathologic variables and patient outcome.

Results

CMS was not associated with survival independently of age and sex, whereas TNM stage significantly explained mortality. Applying CMS within TNM stages revealed a prognostic effect in stage 2 tumours: patients with CMS3 tumours had significantly lower overall survival (P = 0.006). Stage 2 patients with a good prognosis showed immune activation and up-regulation of tumour suppressor genes.

Conclusions

Although stratification using CMS does not outperform TNM staging as a prognostic indicator, gene-expression-based subtyping shows promise for improved prognostication in stage 2 CRC.


Background

Colorectal cancer (CRC) is a highly heterogeneous disease with varied outcomes, therapeutic responses and clinical behaviour, even among tumours assigned the same clinical stage and grade. Recent attention has focused on the molecular mechanisms underlying CRC and on developing novel approaches to stratify tumours into subgroups with clinical utility for improved prognostication and targeted therapy [1].

Previous molecular classification systems have relied on combinations of molecular features to classify CRC into subgroups, including BRAF, KRAS and TP53 mutation status, microsatellite instability, CpG island methylator phenotype, somatic copy number alterations, and activation of molecular pathways such as WNT and MYC [2,3,4,5]. Associations with clinical and histological features and with outcome have been reported for these classification systems, and many show overlapping characteristics. However, discrepancies exist between the systems, and they have not been widely adopted in the clinical setting for CRC because of technological limitations, the cost of the necessary laboratory tests, and uncertainty about the impact of tumour heterogeneity on results [6, 7].

The advent of high-throughput sequencing has ushered in a new era of subtyping studies, and in 2015 the Colorectal Cancer Subtyping Consortium published a novel classification system based solely on gene-expression data that drew largely on six previously published molecular classification systems [8]. The resulting classifier assigns CRC to one of four consensus molecular subtypes (CMS). The value of CMS for prognostication and precision medicine has been highlighted in several recent publications [9]. However, in the absence of targeted therapy regimens for primary CRC, the value of stratifying tumours using CMS as a prognostic tool has yet to be evaluated.

To assess the utility of CMS as a prognostic tool for CRC, we evaluated molecular subtyping in relation to clinicopathologic features and patient outcome in a large single-institution cohort of treatment-naïve CRC tumours, and compared the findings with those of standard histopathological (TNM) staging.


Methods

Colorectal cancer (CRC) tissue samples banked at the Cancer Society Tissue Bank (University of Otago, Christchurch, New Zealand) with informed written consent were used in this study. Patient data, including staging, recurrence, metastases, treatment and histology, were collected retrospectively from patient medical records. Patients with hereditary CRC and patients who had received pre-operative chemotherapy or radiotherapy were excluded, resulting in a cohort of 308 patients. This study was undertaken with ethical approval from the University of Otago Human Ethics Committee (ethics approval number: H16/037).

RNA extraction

Tumour core samples were dissected from surgical specimens, immediately frozen in liquid nitrogen and initially stored at −80 °C. RNA extraction was carried out as detailed previously [10]. Briefly, following tissue disruption using a Retsch Mixer Mill, RNA was extracted from < 20 mg of tissue using the RNeasy Plus Mini Kit (Qiagen, Hilden, Germany), including DNase treatment. Purified RNA was quantified using a NanoDrop 2000c spectrophotometer (Thermo Scientific, Asheville, NC, USA) and stored at −80 °C.

RNA sequencing

RNA sequencing was performed on the Illumina HiSeq 2500 V4 platform (Illumina, San Diego, CA, USA) to produce 125 bp paired-end reads, as previously described [10]. In brief, libraries were prepared using Illumina TruSeq V2 reagents, including ribosomal RNA depletion with RiboZero Gold, and were sequenced on 3 × 5 lanes of an Illumina HiSeq 2500 instrument.

RNA sequencing data processing

Low-quality read segments, remnant adaptor sequences and very short reads were removed using fastq-mcf from ea-utils (v1.1.2.779 [11]) and SolexaQA++ with default parameters (v3.1.7.1 [12]). Salmon (v0.11.2 [13]) was used to quantify transcript expression against GRCh38 (Ensembl release 93). Gene-level transcripts-per-million (TPM) values were derived using the tximport package (v1.6.0 [13]). The data processing protocols can be accessed at
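For readers less familiar with the unit: TPM rescales length-normalised read rates so that each sample sums to one million, and gene-level values can be obtained by summing transcript TPMs within each gene, which is how tximport aggregates abundances. The sketch below illustrates that arithmetic only. In the study, counts and effective lengths came from Salmon; the transcript names, counts, lengths and gene mapping here are hypothetical.

```python
from collections import defaultdict


def tpm(counts, eff_lengths):
    """Transcripts per million from read counts and effective lengths."""
    # Reads per base of effective transcript length.
    rates = {tx: counts[tx] / eff_lengths[tx] for tx in counts}
    total = sum(rates.values())
    # Rescale so that all transcripts in the sample sum to one million.
    return {tx: 1e6 * rate / total for tx, rate in rates.items()}


def gene_level_tpm(tx_tpm, tx2gene):
    """Aggregate transcript TPMs to genes by summation."""
    genes = defaultdict(float)
    for tx, value in tx_tpm.items():
        genes[tx2gene[tx]] += value
    return dict(genes)


# Hypothetical toy sample: three transcripts from two genes.
counts = {"tx1": 100.0, "tx2": 300.0, "tx3": 600.0}
eff_lengths = {"tx1": 1000.0, "tx2": 1500.0, "tx3": 2000.0}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_tpm = gene_level_tpm(tpm(counts, eff_lengths), tx2gene)
print(gene_tpm)
```

Differential-expression tools such as edgeR work from estimated counts rather than TPMs, which is why count tables, not TPM values, feed the differential expression step.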

Consensus molecular subtyping

The Single Sample Predictor (SSP) method available in the CMSclassifier R package (v1.0.0; Synapse: syn4961785) [8] was used to classify samples into consensus molecular subtypes based on the derived gene-level TPM values. Similarity between gene-expression profiles is calculated as Pearson's correlation of log2-scaled values, and a sample is considered similar to a centroid if the correlation is at least 0·15. For a sample to be classified, its correlation with the most similar centroid must exceed its correlation with the second most similar centroid by at least 0·06. These are the default values in the SSP method. The data processing protocols can be accessed at
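The two SSP thresholds amount to a nearest-centroid rule with a rejection option. A minimal sketch of that decision rule follows, assuming the sample and the centroids are already log2-scaled vectors over the same genes. It does not reproduce the CMSclassifier centroids or gene set; the four toy centroids below are hypothetical.

```python
import numpy as np


def classify_ssp(sample, centroids, min_cor=0.15, min_delta=0.06):
    """Nearest-centroid call by Pearson correlation, or None if unclassified.

    A sample is assigned to its best-correlated centroid only if that
    correlation is at least min_cor and exceeds the runner-up by at least
    min_delta; otherwise it is left unclassified.
    """
    cors = {name: np.corrcoef(sample, centroid)[0, 1]
            for name, centroid in centroids.items()}
    ranked = sorted(cors.items(), key=lambda item: item[1], reverse=True)
    (best_name, best_cor), (_, second_cor) = ranked[0], ranked[1]
    if best_cor < min_cor or best_cor - second_cor < min_delta:
        return None
    return best_name


# Hypothetical centroids over six genes (not the real CMS centroids).
centroids = {
    "CMS1": np.array([2.0, 1.0, 0.0, -1.0, -2.0, 0.0]),
    "CMS2": np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 0.0]),
    "CMS3": np.array([0.0, 2.0, -2.0, 0.0, 1.0, -1.0]),
    "CMS4": np.array([1.0, -1.0, 2.0, -2.0, 0.0, 0.0]),
}

clear_case = np.array([1.8, 1.1, 0.2, -0.9, -2.1, 0.1])
ambiguous = centroids["CMS1"] + centroids["CMS4"]  # equally close to CMS1 and CMS4

print(classify_ssp(clear_case, centroids))  # CMS1
print(classify_ssp(ambiguous, centroids))   # None (unclassified)
```

The rejection rule is what produces unclassified tumours: samples whose profiles sit between two subtypes, or correlate weakly with all of them, receive no call.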

Differential gene expression analysis

Using the count tables from the RNA-sequencing data processing together with the subtype assignments from the CMS classification step, we ran a differential gene-expression analysis for all genes. Each sample within a given subtype was treated as a replicate of that subtype, and the edgeR package (v3.20.7 [14]) was used to compare each subtype with each of the other subtypes. Up- and down-regulated genes were extracted using a Benjamini–Hochberg false-discovery-rate [15] adjusted P-value < 0·05 and a log2 fold-change greater than or less than zero, respectively. Only genes that were differentially expressed in all comparisons of a subtype against each of the other subtypes were considered. The data processing protocols can be accessed at
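The selection rule can be sketched as follows: adjust the P-values within each pairwise comparison using the Benjamini–Hochberg procedure, then keep only genes that pass the thresholds in every comparison of a subtype against the others. This illustrates the filtering logic only. In the study the per-comparison statistics came from edgeR, whereas the fold-changes and P-values below are hypothetical. The `bh_adjust` helper is intended to match R's `p.adjust(p, method = "BH")`.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values (step-up false-discovery rate)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # p_(i) * m / i for the sorted P-values.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest P-value down keeps the adjustment
    # monotone; values are capped at 1.
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


def consistently_up(comparisons, alpha=0.05):
    """Genes up-regulated (FDR < alpha, log2FC > 0) in every comparison.

    comparisons: one dict per 'subtype vs other subtype' contrast, mapping
    gene -> (log2 fold-change, raw P-value).
    """
    selected = None
    for comp in comparisons:
        genes = list(comp)
        padj = bh_adjust([comp[g][1] for g in genes])
        up = {g for g, q in zip(genes, padj) if q < alpha and comp[g][0] > 0}
        selected = up if selected is None else selected & up
    return selected if selected is not None else set()


# Hypothetical results for one subtype against each of three other subtypes.
vs_a = {"G1": (2.0, 0.001), "G2": (1.5, 0.002), "G3": (-1.0, 0.001), "G4": (0.3, 0.6)}
vs_b = {"G1": (1.2, 0.01), "G2": (0.8, 0.2), "G3": (-0.5, 0.03), "G4": (0.1, 0.9)}
vs_c = {"G1": (0.9, 0.004), "G2": (1.1, 0.001), "G3": (0.2, 0.5), "G4": (-0.2, 0.7)}

print(consistently_up([vs_a, vs_b, vs_c]))  # {'G1'}
```

The down-regulated list follows the same logic with the fold-change condition reversed. In practice the adjustment runs over all genes tested in a comparison, not the four shown here.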

Enrichment analysis

For each CMS subtype and direction of change (up- or down-regulation), the top 500 differentially expressed genes were input to the clusterProfiler package (v3.6.0 [16]) for term enrichment analysis. Terms with an FDR-adjusted P-value < 0·1 for enrichment of the gene set were retained. The biological categories and corresponding gene sets used in the analysis were extracted from MSigDB [17] (version 6.1), restricted to the following categories: KEGG, REACTOME, BIOCARTA, PID, HALLMARK GENES, and Gene Ontology (GO) biological processes. The data processing protocols can be accessed at
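clusterProfiler's over-representation analysis scores each gene set with a one-sided hypergeometric test. The sketch below computes that P-value directly, assuming over-representation analysis (rather than GSEA) was the mode used and a fixed background gene universe; the counts in the usage line are hypothetical. The per-term P-values would then be FDR-adjusted across all tested terms before the < 0·1 cutoff is applied.

```python
from math import comb


def overrepresentation_p(k, n_list, n_set, n_universe):
    """One-sided hypergeometric P-value, P(X >= k), for gene-set over-representation.

    k:          input genes that are members of the gene set
    n_list:     size of the input gene list (e.g. the top 500 genes)
    n_set:      number of background genes in the gene set
    n_universe: number of background genes
    """
    total = comb(n_universe, n_list)
    upper = min(n_list, n_set)
    # Exact integer arithmetic for the upper tail, divided once at the end.
    tail = sum(comb(n_set, i) * comb(n_universe - n_set, n_list - i)
               for i in range(k, upper + 1))
    return tail / total


# Hypothetical numbers: 15 of the top 500 genes fall in a 200-gene pathway,
# against a background of 20,000 genes (about 5 overlaps expected by chance).
print(overrepresentation_p(15, 500, 200, 20000))
```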

Statistical analysis

Associations between classifiers and clinicopathological variables were assessed using chi-squared tests, or Fisher's exact test if any expected cell count was less than one or fewer than 80% of cells had expected counts of at least five. For tables larger than two by two, P-values were computed by Monte Carlo simulation. Kaplan-Meier survival curves were estimated for overall survival (OS) and progression-free survival (PFS), with survival estimates reported at 5 and 10 years. Associations of classifiers with OS and PFS were assessed using Cox proportional hazards models with and without clinicopathological covariates. The significance of multilevel factors was assessed with chi-squared tests in an analysis of deviance. Examination of residual plots indicated that the modelling assumptions were met. All statistical tests were two-sided with significance set at P < 0·05, and all analyses were performed in R 3.4.3 (R Foundation for Statistical Computing, Vienna, Austria).
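The Kaplan-Meier curves, including the 5- and 10-year survival proportions (60 and 120 months), come from the product-limit estimator. A minimal sketch in Python follows, without confidence intervals; the follow-up times and event indicators are hypothetical, not patient data.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times:  follow-up time per patient (e.g. months)
    events: 1 if the event (death or progression) was observed, 0 if censored
    Returns a list of (event_time, survival_probability) steps.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool all patients whose follow-up ends at the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Read the step function: the last estimate at or before time t."""
    estimate = 1.0
    for time, value in curve:
        if time > t:
            break
        estimate = value
    return estimate


# Hypothetical follow-up (months) and event indicators for eight patients.
times = [6, 12, 12, 20, 30, 45, 60, 70]
events = [1, 1, 0, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
print(survival_at(curve, 60))  # 5-year estimate, about 0.4
```

Censored patients contribute to the risk set up to their last follow-up and are then removed without changing the estimate, which is what allows patients with short follow-up to be included.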

Results

Patient cohort

Colorectal cancers from 308 patients were included (median age 73·7 years; range, 28–91 years). One hundred and sixty-three patients were female and 145 were male; 296 patients were of European descent, three were Asian and nine were Māori. Right-sided colon tumours were the most common (55%), followed by left-sided colon tumours (27%) and rectal tumours (18%). Median follow-up was 50 months (range, 0·3–172 months). More detailed patient demographics are shown in Additional file 1: Table S1.

Association of TNM staging with clinical variables

Post-surgical TNM staging based on pathological examination stratified the cohort as follows: 53 patients with stage 1, 128 with stage 2, 105 with stage 3 and 22 with stage 4 disease. Analysis of associations between post-surgical stage and clinicopathological variables (Tables 1 and 2) showed that increasing TNM stage was significantly associated with lymph-node positivity and with subsequent development of metastasis, which was largely attributable to liver metastases; there was no association with local recurrence.

Table 1 Tumour recurrence, metastasis and lymph-node invasion by post-operative stage and by Consensus Molecular Subtype
Table 2 Histological characteristics of colorectal cancer tumours by post-operative stage and by Consensus Molecular Subtype

CMS subtypes and clinical variables

Of the 308 patients, 60 were classified as CMS1 (19%), 145 as CMS2 (47%), 38 as CMS3 (12%) and 17 as CMS4 (6%); the remaining 48 tumours (16%) were unclassified (Additional file 1: Table S2). Univariate analysis of patient demographic and clinical variables showed that CMS1 tumours were more likely to be right-sided, to occur in female patients and to be poorly differentiated, included a high proportion of tumours with mucinous histology, and were less likely to be seen in younger patients. CMS2 tumours made up nearly half of the cohort, were predominantly left-sided and found in male patients, and were negatively associated with mucinous histology. CMS4 tumours were associated with younger age, advanced TNM stage and lymph-node positivity. There was no significant difference in local recurrence rates between subtypes, but CMS2 and CMS4 were associated with higher rates of distant metastases, an association attributable to liver metastases. A detailed breakdown of associations with CMS subtypes is given in Tables 1, 2 and 3.

Table 3 Patient and tumour characteristics by Consensus Molecular Subtype

Survival analysis

The median follow-up period was 50 months (range, 0·3–172 months), with a median survival of 82 months (95% CI 71·8–110·5). Survival curves and proportions surviving at 5 and 10 years are shown in Fig. 1. Among classified samples, both progression-free survival (PFS, P = 0·039) and overall survival (OS, P = 0·036) were associated with CMS subtype. These associations were largely due to the difference between CMS4 and the other subtypes; the hazard ratios for CMS4 relative to all other classified samples were 2·28 (95% CI 1·28–4·05, P = 0·005) for PFS and 2·29 (95% CI 1·26–4·18, P = 0·007) for OS. However, after adjusting for age and sex, there was no significant association between CMS subtype and OS (P = 0·11) or PFS (P = 0·12).
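Reported hazard ratios, confidence intervals and P-values can be cross-checked against one another. Assuming Wald-type 95% intervals computed on the log-hazard scale (the usual output of a Cox model, although the method is not stated explicitly here), the sketch below recovers the standard error, z statistic and two-sided P-value implied by the CMS4 estimates.

```python
import math


def wald_check(hr, lower, upper, z_crit=1.959964):
    """SE of log(HR), z statistic and two-sided P-value implied by a 95% CI.

    Assumes the interval is exp(log(HR) +/- z_crit * SE), i.e. symmetric on
    the log-hazard scale, as produced by a standard Cox model fit.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided normal tail area: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# CMS4 versus all other classified samples, as reported in the text.
for outcome, hr, lo, hi in [("PFS", 2.28, 1.28, 4.05), ("OS", 2.29, 1.26, 4.18)]:
    se, z, p = wald_check(hr, lo, hi)
    print(f"{outcome}: SE={se:.3f}, z={z:.2f}, P={p:.4f}")
```

With the reported intervals, the recovered P-values are about 0.005 for PFS and 0.007 for OS, in line with the stated values; a clear mismatch would point to a transcription error or a different interval method.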

Fig. 1

(a) Progression-free and (b) overall survival by consensus molecular subtype (CMS), and (c) progression-free and (d) overall survival by TNM stage. Kaplan-Meier survival curves with estimates and 95% confidence intervals for survival probabilities at 5 and 10 years.

Analysis of 10-year overall survival by TNM stage, adjusted for age and sex, showed little difference in outcome between stage 1 and stage 2 tumours. There was some evidence that stage 3 was associated with increased mortality, and clear evidence that stage 4 was associated with increased mortality (OR = 2·8, 95% CI 1·6–5·0, P < 0·0005).

Considering all samples, older and male patients were at greater risk of poorer outcomes from CRC (Table 4). Rectal cancers and cancers with lymph-node involvement, local recurrence or post-operative metastases carried significantly greater risk, whereas tumour side did not significantly affect risk. After adjustment for all other covariates, lymph-node involvement, local recurrence and post-operative metastases remained independent risk factors, but rectal site did not. When both TNM stage and CMS were included in survival models, TNM stage significantly explained mortality independently of age and sex, whereas CMS subtype did not. From this we conclude that CMS stratification does not perform as well as TNM staging as an independent prognostic indicator in our cohort.

Table 4 Hazard ratios for risk factors associated with mortality in colorectal cancer

Of the 308 patients, 63 relapsed within the follow-up period, with local recurrence, distant metastases or both. There was no significant difference in median survival after relapse, which was 16·5, 12·4, 33·9 and 4·6 months for CMS1, CMS2, CMS3 and CMS4 tumours, respectively (P = 0·187).

Prognostic effect of CMS in CRC stratified by TNM stage

Only 17 patients with stage 4 cancer were classified by CMS and, coincidentally, there were also 17 CMS4 patients. These numbers were insufficient to draw robust conclusions for stage 4 or CMS4 when cross-tabulated, so these groups were omitted from this analysis. Differential survival by stage (1 to 3) and CMS (1 to 3) was identified by analysis of deviance on a Cox proportional hazards model with an interaction between TNM stage and CMS (P = 0.048). Including covariates for age (dichotomised at 80 years), sex, tumour site and side increased the significance of this effect (P = 0.022). To assess the magnitude of the differences between CMS subtypes within each stage, survival analysis was performed on the data stratified by TNM stage. Median survival times were calculated, and Cox proportional hazards models were fitted with and without covariates for age, sex, site and side (Table 5). CMS was associated with a significant difference in survival among stage 1 tumours, but this was explained by the covariates. For stage 2 tumours, there was a suggestion that CMS3 had worse survival than CMS1 and CMS2, and this difference was statistically significant after adjusting for age, sex, site and side. There was no evidence that outcome differed by CMS subtype for stage 3 or 4.
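The interaction test is a likelihood-ratio (deviance) comparison between Cox models with and without the stage-by-CMS interaction. With three stages and three subtypes the interaction adds (3 − 1) × (3 − 1) = 4 parameters, assuming every stage-by-subtype cell contributes events. The sketch below shows the arithmetic with hypothetical partial log-likelihoods (in practice these would come from the fitted models); to stay dependency-free it uses the closed-form chi-squared tail that holds only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-squared probability P(X > x) for even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def interaction_lrt(loglik_main, loglik_full, levels_a, levels_b):
    """Likelihood-ratio test for adding an A x B interaction to a main-effects model."""
    statistic = 2.0 * (loglik_full - loglik_main)
    df = (levels_a - 1) * (levels_b - 1)
    return statistic, df, chi2_sf_even_df(statistic, df)


# Hypothetical partial log-likelihoods for stage (1-3) x CMS (1-3) models.
stat, df, p = interaction_lrt(-650.0, -645.0, levels_a=3, levels_b=3)
print(stat, df, round(p, 4))
```

For odd degrees of freedom a general chi-squared tail function (for example from a statistics library) would be needed instead of the closed form.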

Table 5 Survival time by CMS stratified by TNM stage

Differential gene expression and gene-set enrichment analysis in stage 2 tumours

To identify genes potentially associated with survival in Stage 2 tumours, we further analysed genes differentially expressed between Stage 2 patients who died and those who were alive at the end of the follow-up period. Up-regulated genes strongly associated with survival in this patient group included immune-cell-related genes, in particular genes coding for B-cell markers, and several known (LRRC4, PKNOX2, FEZF2) and putative (MYO18B, NCAM1 and SCN4B) tumour suppressor genes (Additional file 2: Table S3). Genes significantly up-regulated in patients with poor survival included pro-inflammatory genes (IL17REL, RETNLB) and genes previously associated with progression and poor outcome in CRC (ERBB2, TBL1XR1, TAPBP, CPS1, AGR2) (Additional file 3: Table S4).

To compare biological pathways and processes potentially associated with survival in this subgroup of patients, we used the differentially expressed genes (DEGs) as input to a range of gene ontology tools (Additional file 1: Figure S1). Up-regulated pathways associated with survival were predominantly immune pathways, including B-cell activation, IL-12 and PD-1 signalling, and T-cell activation. In addition, glutamatergic signalling was enriched in Stage 2 patients still alive at the end of follow-up (Additional file 4: Table S5). Gene-set enrichment analysis (GSEA) showed enrichment of pathways involved in metabolic regulation in Stage 2 tumours, consistent with the association of CMS3 (the metabolic subtype) with poor survival, as well as up-regulation of processes involved in protein and nucleic acid synthesis (Additional file 5: Table S6).

Discussion

Recent advances in gene-expression analysis have culminated in the publication of a consensus molecular subtyping (CMS) system by the Colorectal Cancer Subtyping Consortium (CRCSC) that assigns CRC to one of four subtypes based on transcriptional profiling. The CRCSC study reported an association between CMS4 and worse patient outcome, and between CMS1 and survival after relapse. Although many subsequent reports mention the prognostic potential of CMS, no study to date has validated the prognostic impact of the subtyping system against the routinely used TNM staging for primary CRC.

In a large, single-institution cohort of chemotherapy-naïve, surgically treated colorectal cancers, we have shown that traditional TNM staging outperforms molecular subtyping in prognostication of CRC. Post-surgical staging of this cohort was carried out according to UICC guidelines, and the stage distribution was similar to that expected of a treatment-naïve cohort. Stage showed few associations with clinical variables beyond the parameters used to assign stage, namely lymph-node involvement and distant metastases. As previously reported for other cohorts [18, 19], the association of increasing stage with metastasis was largely driven by liver metastases.

In addition to histological staging, we carried out consensus molecular subtyping (CMS) based on RNA-sequencing-derived gene-expression profiles from tumour tissue. Stratification into CMS yielded proportions of CMS1, CMS3 and unclassified tumours similar to those described by the CRCSC [8]. Our cohort contained a considerably greater proportion of CMS2 tumours (47%, compared with 37% reported by the CRCSC) and fewer CMS4 tumours (6%, compared with 23%). The difference in CMS proportions may be accounted for, at least in part, by the inclusion criterion of surgery with curative intent and the exclusion of patients who received neoadjuvant chemo- or radiotherapy in this study. These criteria may have excluded many advanced-stage tumours, which have been shown to be associated with CMS4 [8]. Moreover, Trumpi et al. recently reported that neoadjuvant therapy induces a mesenchymal phenotype in residual tumour cells and, as such, may increase the number of tumours classified as CMS4 [20]. Intra-tumoural heterogeneity may also affect the classification of CMS4 tumours, as the EMT-associated genes seen in CMS4 tumours may reflect up-regulated genes derived from fibroblasts and mesenchymal cells in the stroma rather than from the tumour cells themselves [9, 10, 21, 22]. Several studies have also suggested that the location and number of tumour biopsies can undermine the accuracy of CMS [23,24,25]; a limitation of this study is therefore the use of a single tumour sample for gene-expression profiling.

Stratification into CMS showed associations with clinicopathological variables similar to those previously reported by the CRCSC and other studies. CMS1 tumours were associated with right-sided location, female sex, node negativity and poor differentiation, included a high proportion of tumours with mucinous histology, and were less likely to be seen in patients under 60 years of age. CMS2 tumours made up nearly half of our cohort, were predominantly left-sided and found in male patients, and were negatively associated with mucinous histology. CMS3 tumours were associated with lower TNM stage. CMS4 tumours were associated with younger age and rectal location, and presented at an advanced TNM stage with lymph-node positivity. Indeed, the rate of lymph-node positivity increased from CMS1 through CMS2 and CMS3 to CMS4.

The established association between increasing tumour stage and poorer outcome was recapitulated in our cohort for both progression-free and overall survival. While post-surgical staging is the mainstay of prognostication in most clinical centres, the potential for refining prognostication using molecular features has been widely investigated, and combinations of clinical and molecular markers have shown links with prognosis in CRC. For example, BRAF mutations have been associated with poorer outcome [26], but the effects of these mutations may be mitigated in MSI tumours [27]. KRAS mutations are also associated with poorer outcome, but this association is stronger in distal than in proximal tumours [28].

The original study by Guinney et al. describing consensus molecular subtyping showed an association between CMS4 and poor overall survival, and between CMS1 and survival after relapse [8]. Several studies have combined CMS with other molecular features in order to refine prognostic groups, describing poorer outcomes in BRAF-mutated CMS1 MSS tumours and KRAS-mutated CMS2/3 MSS tumours [29], and favourable outcomes in CMS1 MSI tumours [30]. Although many subsequent publications have emphasised the prognostic importance of CMS, the utility of CMS as a stand-alone prognostic tool in the clinical setting has not been investigated in an independent cohort. In our cohort, survival analysis showed an association between CMS and both progression-free and overall survival, largely due to the difference between CMS4 and the other CMS classes. However, after adjusting for age and sex, CMS4 was not an independent prognostic marker for survival. Including both TNM stage and CMS in models of overall survival showed that TNM stage significantly explains mortality independently of age and sex, whereas CMS does not. A potential limitation of the study is the relatively low number of CMS4 tumours, as discussed above; moreover, almost half of the tumours in our cohort are CMS2, and this imbalance may reduce the power of our study to detect effects specific to CMS1, 3 and 4.

Clinical management of CRC is usually based on histological staging: stage 1 tumours are managed conservatively with surgery alone, whereas tumours with nodal or distant metastases (stages 3 and 4) are usually treated with adjuvant chemotherapy. Stage 2 tumours remain a conundrum in terms of prognostication, as approximately 20% of patients with Stage 2 CRC die from the disease [31]. Various factors, including acute presentation with obstruction or perforation, histological factors such as perineural and perivascular invasion, and high grade, have been used as markers of poor prognosis and hence as indications for adjuvant postoperative chemotherapy. Further stratification using molecular markers such as BRAF and KRAS mutations and MSI status [32] has been investigated with regard to prognostic potential in this tumour group, but has not been widely adopted to direct clinical management.

Molecular subtyping is a cornerstone of precision medicine in cancer treatment, and the mutation status of genes in the EGFR pathway, including RAS genes, PIK3CA, PTEN and BRAF, has been shown to predict response to EGFR blockade therapy in CRC [33]. MSI status and the tumour microenvironment, in particular the amount and type of tumour-infiltrating lymphocytes, have more recently been proposed as predictors of response to immunotherapy [34]. To date, although CMS1 tumours encompass a large proportion of MSI-positive CRC, no targeted treatment options based solely on CMS have been proposed. In metastatic CRC, CMS appears to be associated with survival in clinical trials of patients with KRAS wild-type tumours treated with anti-EGFR or VEGF inhibitors [35, 36]. However, in primary CRC and outside the clinical trial setting, improved stratification of CRC using CMS has not yet been demonstrated. In this study, we observed that subtyping TNM-stratified tumours into CMS could improve prognostication for Stage 2 CRC: CMS3 tumours had significantly lower overall survival than other molecular subtypes. This demonstrates, for the first time, the potential utility of CMS in improving prognostication of CRC in combination with existing methods.

Comparison of gene expression between Stage 2 patients who died and those who were alive at the end of the follow-up period identified significant up-regulation of immune-related genes and biological processes, and of tumour suppressor genes, in patients who survived. The importance of the immune microenvironment in tumour progression has been demonstrated in solid tumours and has been linked to outcome in CRC. Our findings suggest that immune signatures may identify Stage 2 patients with a good prognosis, for whom surgery alone may suffice, and conversely that a CMS3 signature may identify patients who would benefit from adjuvant chemotherapy or increased surveillance. Further evaluation of the gene-expression signatures identified in this study, and prospective validation using a more clinically accessible platform such as a gene-panel test, will be necessary to confirm these findings.

Conclusions

Although stratification using CMS does not outperform TNM staging as a prognostic indicator in our cohort, it currently represents the best description of tumour heterogeneity in colorectal cancer at the level of gene expression, and it shows promise for the future advancement of precision medicine. Our findings also suggest that CMS could help refine prognostication in clinically heterogeneous Stage 2 colorectal cancer.

Availability of data and materials

All sequencing data will be archived in the Sequence Read Archive upon acceptance of the manuscript for publication. Additional information (bioinformatics code and limited patient metadata) will be provided upon reasonable request to the corresponding author.


  1. Wang W, Kandimalla R, Huang H, Zhu L, Li Y, Gao F, et al. Molecular subtyping of colorectal cancer: recent progress, new challenges and emerging opportunities. Semin Cancer Biol. 2018;55:37–52.

    Article  CAS  Google Scholar 

  2. Jass JR. Classification of colorectal cancer based on correlation of clinical, morphological and molecular features. Histopathology. 2007;50(1):113–30.

    Article  CAS  Google Scholar 

  3. Leggett B, Whitehall V. Role of the serrated pathway in colorectal cancer pathogenesis. Gastroenterology. 2010;138(6):2088–100.

    Article  CAS  Google Scholar 

  4. Network CGA. Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012;487(7407):330–7.

    Article  Google Scholar 

  5. Domingo E, Ramamoorthy R, Oukrif D, Rosmarin D, Presz M, Wang H, et al. Use of multivariate analysis to suggest a new molecular classification of colorectal cancer. J Pathol. 2013;229(3):441–8.

    Article  CAS  Google Scholar 

  6. Alwers E, Jia M, Kloor M, Blaker H, Brenner H, Hoffmeister M. Associations Between Molecular Classifications of Colorectal Cancer and Patient Survival: A Systematic Review. Clin Gastroenterol Hepatol. 2018;17(3):402–10.

    Article  Google Scholar 

  7. Roseweir AK, McMillan DC, Horgan PG, Edwards J. Colorectal cancer subtypes: translation to routine clinical pathology. Cancer Treat Rev. 2017;57:1–7.

    Article  Google Scholar 

  8. Guinney J, Dienstmann R, Wang X, de Reyniès A, Schlicker A, Soneson C, et al. The consensus molecular subtypes of colorectal cancer. Nat Med. 2015;21(11):1350–6.

    Article  CAS  Google Scholar 

  9. Rodriguez-Salas N, Dominguez G, Barderas R, Mendiola M, Garcia-Albeniz X, Maurel J, et al. Clinical relevance of colorectal cancer molecular subtypes. Crit Rev Oncol Hematol. 2017;109:9–19.

    Article  Google Scholar 

  10. Purcell RV, Visnovska M, Biggs PJ, Schmeier S, Frizelle FA. Distinct gut microbiome patterns associate with consensus molecular subtypes of colorectal cancer. Sci Rep. 2017;7(1):11590.

    Article  Google Scholar 

  11. Aronesty E. ea-utils: “Command-line tools for processing biological sequencing data”; 2011.

    Google Scholar 

  12. Cox MP, Peterson DA, Biggs PJ. SolexaQA: at-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinfosrmatics. 2010;11:485.

    Article  Google Scholar 

  13. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017;14(4):417–9.

    Article  CAS  Google Scholar 

  14. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40.

    Article  CAS  Google Scholar 

  15. Benjamini Y, Drai D, Elmer G, Kafkafi N, Golani I. Controlling the false discovery rate in behavior genetics research. Behav Brain Res. 2001;125(1–2):279–84.

    Article  CAS  Google Scholar 

  16. Yu G, Wang LG, Han Y, He QY. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16(5):284–7.

    Article  CAS  Google Scholar 

  17. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50.

    Article  CAS  Google Scholar 

  18. Riihimaki M, Hemminki A, Sundquist J, Hemminki K. Patterns of metastasis in colon and rectal cancer. Sci Rep. 2016;6:29765.

    Article  Google Scholar 

  19. Augestad KM, Bakaki PM, Rose J, Crawshaw BP, Lindsetmo RO, Dorum LM, et al. Metastatic spread pattern after curative colorectal cancer surgery. A retrospective, longitudinal analysis. Cancer Epidemiol. 2015;39(5):734–44.

    Article  CAS  Google Scholar 

  20. Trumpi K, Ubink I, Trinh A, Djafarihamedani M, Jongen JM, Govaert KM, et al. Neoadjuvant chemotherapy affects molecular classification of colorectal tumors. Oncogenesis. 2017;6(7):e357.

    Article  CAS  Google Scholar 

  21. Salazar R, Roepman P, Capella G, Moreno V, Simon I, Dreezen C, et al. Gene expression signature to improve prognosis prediction of stage II and III colorectal cancer. J Clin Oncol. 2011;29(1):17–24.

    Article  Google Scholar 

  22. Li L, Li W. Epithelial-mesenchymal transition in human cancer: comprehensive reprogramming of metabolism, epigenetics, and differentiation. Pharmacol Ther. 2015;150:33–46.

    Article  CAS  Google Scholar 

  23. Li H, Courtois ET, Sengupta D, Tan Y, Chen KH, Goh JJL, et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat Genet. 2017;49(5):708–18.

    Article  CAS  Google Scholar 

  24. Arnadottir SS, Jeppesen M, Lamy P, Bramsen JB, Nordentoft I, Knudsen M, et al. Characterization of genetic intratumor heterogeneity in colorectal cancer and matching patient-derived spheroid cultures. Mol Oncol. 2018;12(1):132–47.

    Article  CAS  Google Scholar 

  25. Dunne PD, McArt DG, Bradley CA, O'Reilly PG, Barrett HL, Cummins R, et al. Challenging the Cancer molecular stratification dogma: Intratumoral heterogeneity undermines consensus molecular subtypes and potential diagnostic value in colorectal Cancer. Clin Cancer Res. 2016;22(16):4095–104.

    Article  CAS  Google Scholar 

  26. Ogino S, Nosho K, Kirkner GJ, Kawasaki T, Meyerhardt JA, Loda M, et al. CpG island methylator phenotype, microsatellite instability, BRAF mutation and clinical outcome in colon cancer. Gut. 2009;58(1):90–6.

  27. Marzouk O, Schofield J. Review of histopathological and molecular prognostic features in colorectal cancer. Cancers (Basel). 2011;3(2):2767–810.

  28. Sinicrope FA, Mahoney MR, Yoon HH, Smyrk TC, Thibodeau SN, Goldberg RM, et al. Analysis of molecular markers by anatomic tumor site in stage III Colon carcinomas from adjuvant chemotherapy trial NCCTG N0147 (Alliance). Clin Cancer Res. 2015;21(23):5294–304.

  29. Smeby J, Sveen A, Merok MA, Danielsen SA, Eilertsen IA, Guren MG, et al. CMS-dependent prognostic impact of KRAS and BRAFV600E mutations in primary colorectal cancer. Ann Oncol. 2018;29(5):1227–34.

  30. Sveen A, Johannessen B, Tengs T, Danielsen SA, Eilertsen IA, Lind GE, et al. Multilevel genomics of colorectal cancers with microsatellite instability-clinical impact of JAK1 mutations and consensus molecular subtype 1. Genome Med. 2017;9(1):46.

  31. Compton CC. Optimal pathologic staging: defining stage II disease. Clin Cancer Res. 2007;13(22 Pt 2):6862s–70s.

  32. Dienstmann R, Mason MJ, Sinicrope FA, Phipps AI, Tejpar S, Nesbakken A, et al. Prediction of overall survival in stage II and III colon cancer beyond TNM system: a retrospective, pooled biomarker study. Ann Oncol. 2017;28(5):1023–31.

  33. Sepulveda AR, Hamilton SR, Allegra CJ, Grody W, Cushman-Vokoun AM, Funkhouser WK, et al. Molecular biomarkers for the evaluation of colorectal Cancer: guideline from the American Society for Clinical Pathology, College of American Pathologists, Association for Molecular Pathology, and American Society of Clinical Oncology. Arch Pathol Lab Med. 2017;141(5):625–57.

  34. Xiao Y, Freeman GJ. The microsatellite instable subset of colorectal cancer is a particularly good candidate for checkpoint blockade immunotherapy. Cancer Discov. 2015;5(1):16–8.

  35. Lenz HJ, Ou FS, Venook AP, Hochster HS, Niedzwiecki D, Goldberg RM, et al. Impact of Consensus Molecular Subtype on Survival in Patients With Metastatic Colorectal Cancer: Results From CALGB/SWOG 80405 (Alliance). J Clin Oncol. 2019.

  36. Mooi JK, Wirapati P, Asher R, Lee CK, Savas P, Price TJ, et al. The prognostic impact of consensus molecular subtypes (CMS) and its predictive effects for bevacizumab benefit in metastatic colorectal cancer: molecular analysis of the AGITG MAX clinical trial. Ann Oncol. 2018;29(11):2240–6.

Acknowledgements

The authors would like to thank Helen Morrin at the Cancer Society Tissue Bank, Christchurch, and the patients involved for generously participating in this study.

Funding

Maurice and Phyllis Paykel Trust.

Gut Cancer Foundation (NZ), with support from the Hugh Green Foundation.

Colorectal Surgical Society of Australia and New Zealand (CSSANZ).

The Health Research Council of New Zealand.

The funding bodies had no role in the design of the study or collection, analysis, and interpretation of data or in writing the manuscript.

Author information

Contributions

RP carried out sequencing preparation of tumour samples and was a major contributor to study design and manuscript writing. SS carried out bioinformatics analysis and preparation of figures, and contributed to manuscript writing. YL collated patient data and was involved in statistical analysis. JP was involved in study design, bioinformatics and data analysis, and manuscript preparation. FF was involved in study design and clinical aspects of the study. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Rachel V. Purcell.

Ethics declarations

Ethics approval and consent to participate

Ethical approval was granted by the University of Otago Human Ethics Committee (ethics approval number: H16/037). Study participants gave informed written consent, and the study was performed in accordance with the Declaration of Helsinki.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1:

Table S1. Patient demographics and clinical characteristics. Table summarizing clinical and patient characteristics of the cohort analysed in the study. Table S2. Comparison of proportions of Consensus Molecular Subtypes (CMS) and associated gene signatures of colorectal cancers (CRC) from the Colorectal Cancer Subtyping Consortium [8] and the current New Zealand CRC Predict study. Table summarizing the percentage of the study cohort assigned to each CMS, and the associated gene signatures, compared to the original CMS publication. Figure S1. Overview of enriched biological terms per subtype for differentially up-regulated genes. Illustration of the main molecular signatures associated with each CMS in our cohort.

Additional file 2:

Table S3. Differentially upregulated genes associated with increased survival in Stage 2 colorectal tumours

Additional file 3:

Table S4. Differentially upregulated genes associated with decreased survival in Stage 2 colorectal tumours

Additional file 4:

Table S5. Differentially upregulated biologic pathways associated with increased survival in Stage 2 colorectal tumours

Additional file 5:

Table S6. Differentially upregulated biologic pathways associated with decreased survival in Stage 2 colorectal tumours

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Purcell, R.V., Schmeier, S., Lau, Y.C. et al. Molecular subtyping improves prognostication of Stage 2 colorectal cancer. BMC Cancer 19, 1155 (2019).
