Although several breast cancer prognostic signatures have been published, the study described here is notable for several reasons. The use of untreated patients for the training and test sample sets permits unequivocal identification of prognostic genes that are not confounded with response genes, thereby providing insight into pathway-directed therapies and opportunities for basic research. The prognostic signature does not contain ER, ER-responsive genes or HER2 and therefore addresses the expressed concern that expression signatures should provide information independent of these valuable and routinely tested IHC markers. In addition, we have shown that the signature provides information beyond that of the commonly used Ki-67 proliferation marker. This signature is expected to be generalizable given the consistent results observed in the geographically diverse sample sets. Our results further suggest that the prognostic score from untreated patients retains its prognostic value in tamoxifen-treated patients. The relatively small number of genes in the described signature will facilitate follow-up functional studies of their mechanistic role in distant metastasis. Finally, the relatively small number of genes in this prognostic signature, which does not depend on a complex algorithm, coupled with the widespread use of fixed tissue and familiarity with RT-PCR, should facilitate the broader transfer of these types of analyses to multiple testing laboratories as well as the submission of in vitro diagnostic products to regulatory agencies.
We selected genes from 3 previously reported prognostic gene signatures plus ER-related genes and analyzed the expression of 197 genes in a training set of non-systemically treated, N-, T1/T2 (≤ 3 cm), ER+ breast cancer patients. A subset of 14 genes, found to be prognostic for breast cancer, was used to generate a metastasis score (MS) that quantifies risk for individuals over different time frames and dichotomizes samples into high- and low-risk groups. Following initial selection and analysis within the training set, we validated the expression signature on an independent sample set using exactly the dichotomizing cutoff derived from the training set. Performance characteristics of the signature in the training and validation sets were similar. Univariate and multivariate hazard ratios for predicting distant metastasis-free survival (DMFS) were 4.34 and 3.16 in the training set and 4.71 and 4.02 in the validation set, respectively. In multivariate analysis, only the metastasis score remained significant. The 14-gene prognostic signature also predicts overall survival, with univariate and multivariate hazard ratios of 2.48 and 2.00 in the training set and 2.26 and 1.97 in the validation set, respectively. When predictive accuracy was compared with that of the commonly used Adjuvant! Online, the areas under the ROC curves were slightly higher for the 14-gene signature classification than for the Adjuvant! classification, indicating that the MS may provide additional diagnostic value.
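The scoring and dichotomization described above can be illustrated with a minimal sketch. It assumes the MS is an equally weighted mean of normalized expression values, in line with the equal weighting of genes noted elsewhere in this discussion. The exact formula, the normalization and the cutoff value are not given in this section, so the function names, the example expression values and the cutoff below are all hypothetical. The gene list contains only some of the signature genes named in this discussion, not the full 14.

```python
from statistics import mean


def metastasis_score(expression, genes):
    """Equally weighted score: mean normalized expression over the given genes.

    Assumption: equal weights and a simple mean; the published formula and
    normalization are not stated in this section.
    """
    missing = [g for g in genes if g not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return mean(expression[g] for g in genes)


def risk_group(score, cutoff):
    """Dichotomize a score with a single cutoff fixed on the training set."""
    return "high" if score > cutoff else "low"


# Hypothetical example: a few of the signature genes named in this discussion,
# with made-up normalized expression values and a made-up cutoff.
example_genes = ["BUB1", "CCNB1", "MYBL2", "CENPA"]
example_expression = {"BUB1": 0.8, "CCNB1": 1.2, "MYBL2": 0.4, "CENPA": 1.6}
HYPOTHETICAL_CUTOFF = 0.5

score = metastasis_score(example_expression, example_genes)
print(risk_group(score, HYPOTHETICAL_CUTOFF))
```

The point of the sketch is that the cutoff is chosen once on the training set and then applied unchanged to validation samples, which is what makes the validation independent.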
We asked whether the signature developed in patients without systemic therapy would also be predictive in tamoxifen-treated patients. In a study of a small number of tamoxifen-treated women, the signature distinguished two risk groups using the same single cutpoint as for untreated patients, but the results only trended toward significance, most likely because of the small sample size. Since tamoxifen treatment reduces distant recurrence by only approximately 30%, larger data sets will be required to discern the prognostic nature of the signature in women who do and do not respond to tamoxifen.
Several investigators [35, 36] have queried whether molecular expression scores provide information distinct from that of the single or composite pathological prognostic tests already routinely performed. As an example, proliferation status determined by the Ki-67 labeling index (LI) has been reported in numerous individual studies, as well as in a meta-analysis, to be a prognostic factor for recurrence-free and disease-specific survival [37–44]. We tested Ki-67 LI because of the strength of the reports in the literature and its availability in the training set. Ki-67 LI was predictive for recurrent disease; however, after adjustment for the metastatic expression signature, this often-used marker lost significance. As in two previous reports (Potemski et al. and Tan et al.), we did not find a strong correlation between the full range of Ki-67 LI staining and the mRNA levels of this gene.
The 14 upregulated genes represent a unique signature and do not fully overlap with any of the original 3 signatures from which the genes were selected. Three proliferation genes (BUB1, CCNB1 and MYBL2) highlighted in Whitfield et al. appear in the 14-gene signature described here, but only MYBL2 overlaps with the p53 status signature recently reported by Miller et al. Even though TP53 has not appeared in lists of proliferation genes, network analysis of the genes in the proliferation signature described here is suggestive of its involvement in the network (see Additional file 11). The signature lacks the ER and PgR genes. The absence of these hormone receptors is not unexpected given that these genes have been reported to be only weakly prognostic in untreated patients. The majority of the genes in the signature are involved in processes associated with tumor growth, such as DNA replication (BUB1, CCNB1, CENPA, ORC6L, RFC4, TK1), cell cycle control (BUB1, CCNB1, MYBL2, ORC6L, PKMYT1, RACGAP1), cellular assembly and organization (BUB1, CCNB1, CENPA, DIAPH3), and ubiquitination (UBE2S). Many of the genes in the signature have been implicated in cancers. The known and inferred roles of these genes in cell proliferation are consistent with their contribution to the disease process. While the 14-gene tumor expression profile reported here has practical importance in classifying distant metastasis as an outcome in patients with operable, invasive breast cancer, the identification of prognostically relevant gene pathways also has ramifications for future targeted therapy, with applications to conventional cytotoxic drugs and novel experimental therapies [47–49].
The sample population and the experimental approaches we employed differ in some respects from those of previously reported studies. First, the signature was developed and validated on FFPE samples from non-systemically treated breast cancer patients to capture solely prognostic information without confounding by genes that may play a role in recurrence and/or response to treatment. In contrast, Oncotype DX was trained on tamoxifen-treated patient samples, which may have contributed to the identification of ER and PgR as important markers. As discussed by Hayes, ER and ER-related genes are known to be positive predictors of response to endocrine therapy but only weakly prognostic. Second, our study population has a broad age distribution covering both pre- and postmenopausal women and is representative of a typical breast cancer patient population. In comparison, the MammaPrint signature [1, 2] was developed using samples from primarily younger women, and the Oncotype DX signature was developed using clinical trial samples. Third, the small number and equal weighting of the genes in the signature permit more focused follow-up mechanistic studies. Fourth, the long duration of follow-up in the validation set allows quantification of risk over different time frames as well as categorization of risk into different groups. This is important because individuals differ substantially in their risk tolerance and in the time horizon that concerns them. Fifth, the signature was developed on FFPE samples and expression analysis was performed using RT-PCR. This sample type enables analysis of archived sections that have extended outcome data as well as present-day specimens that are routinely processed in a similar manner. Gene signatures developed on frozen tissues (for example, the MammaPrint and wound response signatures) would require a change in present sample collection and storage.
Finally, clinical data reported by Esteva suggest that a multigene expression profile assay trained on tamoxifen-treated samples may not necessarily classify the risk of recurrent disease in patients with N(-) breast cancer who do not receive adjuvant tamoxifen or chemotherapy. The 14-gene prognostic signature reported here was developed on untreated patient samples. As suggested by one of the referees, one potential implication of the current study is that the 14-gene expression signature may identify a low-risk group of patients with hormone receptor-positive breast cancer whose predicted absolute survival benefit from systemic adjuvant therapy is so low that a woman, armed with this prognostic information, may prefer to avoid the occasionally troublesome side effects of endocrine therapy.
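The argument about a low absolute benefit in low-risk patients can be made concrete with a short worked calculation. The sketch below assumes that the approximately 30% relative reduction in distant recurrence with tamoxifen, cited earlier in this discussion, applies uniformly across risk groups, which is a simplification. The baseline risks are hypothetical values chosen for illustration and are not results from this study.

```python
def absolute_risk_reduction(baseline_risk, relative_reduction=0.30):
    """Absolute reduction in risk = baseline risk x relative risk reduction.

    Assumption: the relative reduction (about 30% for tamoxifen, as cited in
    this discussion) is constant regardless of baseline risk.
    """
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a probability between 0 and 1")
    if not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("relative_reduction must be between 0 and 1")
    return baseline_risk * relative_reduction


# Hypothetical baseline risks of distant recurrence for two risk groups.
hypothetical_baselines = {"low-risk": 0.05, "high-risk": 0.30}

for group, risk in hypothetical_baselines.items():
    benefit = absolute_risk_reduction(risk)
    print(f"{group}: baseline {risk:.0%}, absolute reduction {benefit:.1%}")
```

Under these illustrative numbers, the same relative reduction yields a much smaller absolute reduction for the low-risk group, which is the basis for weighing benefit against side effects.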
The reported study has limitations. In order to identify a cohort of non-systemically treated patients, it was necessary to assemble samples from patients diagnosed before tamoxifen became a routine treatment option. As a result, the samples in this cohort may not be representative of ER+ breast cancer patients today. In this study, we used a retrospective population-based cohort study design. While a cohort study is expected to have fewer hidden confounders and biases than a case-control study, we cannot exclude the presence of masked bias. Further, population-based cohorts are less uniform than patient groups drawn from the controlled setting of clinical trials. On the other hand, such studies are likely to be more representative of the community setting in which the molecular prognostic assay would be applied.