Comparative performances of prognostic indexes for breast cancer patients presenting with brain metastases

Background Several prognostic indexes (PI) have been developed in the brain metastases (BM) setting to help physicians tailor treatment options and stratify patients enrolled in clinical studies. The aim of our study was to compare the clinical relevance of the major PI for breast cancer BM. Methods Clinical and biological data of 250 breast cancer patients diagnosed with BM at two institutions between 1995 and 2010 were retrospectively reviewed. The prognostic value and accuracy of recursive partitioning analysis (RPA), graded prognostic assessment (GPA), basic score for BM (BS-BM), breast RPA, breast GPA, Le Scodan’s score and a clinico-biological score developed in a phase I study (P1PS) were assessed using Cox regression models. PI comparison was performed using Harrell’s concordance index. Results After a median follow-up of 4.5 years, median overall survival (OS) from BM diagnosis was 8.9 months (95% CI, 6.9–10.3 months). All PI were significantly associated with OS. Harrell’s concordance indexes favored BS-BM and RPA. In multivariate analysis, the RPA, Le Scodan’s score and GPA were found to be the best independent predictors of OS. In multivariate analysis restricted to the 159 patients with known LDH and proteinemia, RPA 2 and 3, Le Scodan’s score 3 and P1PS 2/3 were associated with worse survival. RPA was the most accurate score for identifying patients with long (more than 12 months) and short (less than 3 months) life expectancy. Conclusions RPA seems to be the most useful score and performs better than the new PI for breast cancer BM.


Background
The Recursive Partitioning Analysis (RPA) [1] was the first prognostic score developed in the brain metastases (BM) setting. This classification was created in 1997 by the Radiation Therapy Oncology Group after analysis of the relative contributions of pretreatment variables to the survival of patients with BM. Since then, several scores and prognostic indexes (PI), such as the Graded Prognostic Assessment (GPA) [2], the Basic Score for BM (BS-BM) [3], the Phase 1 Prognostic Score (P1PS) [4], the Rotterdam score [5], the Score Index for Radiosurgery (SIR) [6] and Rades's score [7], have been developed both to help physicians tailor treatment options to patient prognosis and to stratify patients enrolled in clinical studies. However, it has been demonstrated that the prognostic value of these scoring systems differs according to the primary tumor site [8], which raises the question of the usefulness of a breast-specific score.
Breast cancer is the second most common cause of BM, after lung cancer. It is a heterogeneous disease whose metastatic pattern and survival vary with the expression of biological markers such as hormone receptor (HR) status and human epidermal growth factor receptor-2 (HER2) overexpression. While the incidence of BM from breast cancer has increased over the past decade, especially in the subgroup of HER2-overexpressing tumors, several studies have shown that biological subtypes influence survival, even after BM diagnosis. In a series of 223 breast cancer patients irradiated for BM, Dawood et al. showed that HER2-positive status was an independent favorable prognostic factor [9]. In contrast, triple negative disease seems to be associated with a worse prognosis [10,11]. These results have prompted the development of specific prognostic scores for BM from breast cancer, some of which take tumor phenotypic characteristics into account [12,13] while others do not [14]. Given the number of scoring systems that have been devised for clinical use, the aim of our study was to compare the clinical relevance of the major existing prognostic scores in a cohort of breast cancer patients with BM and known HER2 and HR status.

Study population
Medical records of breast cancer patients with BM were retrospectively extracted from the databases of two French cancer centers. Patients were accrued over a 15-year period, between 1995 and 2010. Inclusion criteria were as follows: histologically proven breast carcinoma, intradural BM detected by contrast-enhanced cerebral computed tomography or magnetic resonance imaging, and known HR and HER2 status. The tumor was considered HR positive when more than 10% of cells were labeled in immunohistochemistry (IHC) or when the concentrations of estrogen and progesterone receptors were above 10 ng/ml and 50 ng/ml, respectively, using the radioligand binding method. The tumor was considered HER2 positive if the primary tumor was scored 3+ by IHC or if the HER2 gene was amplified by fluorescence in situ hybridization (FISH). If the tumor was scored 2+ by IHC, it was re-analyzed using FISH. Patients with a history of another primary carcinoma or with leptomeningeal carcinomatosis were excluded. In addition, brain MRI was performed in all patients presenting with 1 to 3 BM on the baseline CT scan. Clinical data and, when available, biological parameters were extracted in order to score patients using the RPA [1], the GPA [2], the BS-BM [3], the P1PS [4], the Breast-GPA [12], the Breast-RPA [14] and Le Scodan's score [13], whose constituent parameters are detailed in Table 1. Ethical approval, as well as permission to create, complete and access the comprehensive database used in this study, was provided by the local research ethics committee of the Val d'Aurelle Cancer Institute. Due to the retrospective, non-interventional nature of this study, no consent was requested by the local research ethics committee.
Survival rates were estimated using the Kaplan-Meier method [15] and presented with their 95% confidence intervals (95% CIs). The median length of follow-up was estimated using a reverse Kaplan-Meier method and presented with 95% CIs. Pairwise comparisons of subgroups were performed for each score.
Survival curves were drawn and the logrank test was performed to assess differences between groups. Harrell's concordance index (C index) was used to assess the discriminating ability of the different PIs [16]. To investigate prognostic factors, multivariate analyses were carried out using Cox's proportional hazards regression model with a stepwise selection procedure [17,18]. Hazard ratios (HR) with 95% CIs are presented to quantify differences in risk. All p values reported are two-sided, and the significance level was set at 5% (p < 0.05). Statistical analysis was performed using the STATA 11 software (Stata Corporation, College Station, TX).
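As an illustration of how the C index compares scores, the following minimal Python sketch computes Harrell's concordance index for one PI. It is not the STATA routine used in the study: the function name, the handling of tied survival times (such pairs are simply skipped) and the toy data are assumptions made for the example.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       follow-up times (e.g. months from BM diagnosis)
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: higher value = worse predicted prognosis (e.g. RPA class)

    Simplification (assumption): pairs with identical times are skipped.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is usable only if the shorter time is an observed death.
        if times[a] == times[b] or not events[a]:
            continue
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # the patient who died first had the worse score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied scores count as half-concordant
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data: five patients scored with a three-class index.
toy_times = [2, 5, 9, 14, 20]
toy_events = [1, 1, 0, 1, 0]
toy_classes = [3, 2, 2, 1, 1]
print(harrell_c_index(toy_times, toy_events, toy_classes))  # 0.875
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking of patients by survival, which is why indexes with few classes (many tied scores) tend to be pulled towards 0.5.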

Patient characteristics
A total of 250 patients were included in this analysis. Patient characteristics are detailed in Table 2. At the time of BM diagnosis, the median age was 55 years (range 25-85), and 74% of patients had a good performance status (80-100). The brain was the first metastatic site in about one third of patients (34%), and the only site of metastatic disease in 12% of patients. Of the 250 patients, 44% had a primary tumor that over-expressed HER2, while 26% were diagnosed with triple negative breast cancer (negative HR and HER2 status). A total of 47 patients (18.8%) underwent targeted local treatment, namely stereotactic radiotherapy or surgery. Whole brain radiation therapy (WBRT), used either as primary treatment or as adjuvant treatment after localized treatment, was given to 217 patients (86.8%). Fifteen patients received best supportive care only. After a median follow-up of 4.5 years, the median OS (MOS) was 8.9 months (95% CI, 6.9-10.3 months). The six-month, one-year and two-year overall survival rates were 61% (95% CI, 54-67%), 40% (95% CI, 34-46%) and 22% (95% CI, 17-27%), respectively. Table 3 lists the study population distribution as well as the MOS for each PI. Survival curves are depicted in Figure 1. All scores significantly discriminated patients for OS according to prognostic category (p < 0.001).
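To make explicit how a median OS and a median follow-up of this kind can be obtained, the sketch below assumes a standard Kaplan-Meier estimator for survival and, for follow-up, the reverse Kaplan-Meier approach named in the statistical methods (censoring is treated as the event). Function names and the toy data are hypothetical, and confidence intervals are not computed.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    curve = []
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def km_median(times, events):
    """First time at which estimated survival falls to 0.5 or below."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None  # median not reached


def median_follow_up(times, events):
    """Reverse Kaplan-Meier: swap the roles of death and censoring."""
    reversed_events = [0 if e else 1 for e in events]
    return km_median(times, reversed_events)


# Hypothetical toy cohort (months from BM diagnosis; 1 = death, 0 = censored).
toy_times = [2, 4, 6, 8, 10, 12]
toy_events = [1, 1, 0, 1, 0, 1]
print(km_median(toy_times, toy_events))         # 8
print(median_follow_up(toy_times, toy_events))  # 10
```

The reverse estimator explains why the median follow-up (4.5 years) can be much longer than the median OS (8.9 months): it describes how long patients would have been observed had they not died.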

Discussion
This comprehensive and simultaneous analysis of 7 prognostic scores was performed on a large, well-characterized and homogeneous population of 250 breast cancer patients with BM. This study examined three common scores, namely the RPA, the GPA, and the BS-BM, as well as four new scores incorporating biological or breast-specific parameters: the breast RPA, the breast GPA, Le Scodan's score, and the P1PS. With respect to other scoring systems, the Rotterdam score was not investigated since it uses, as a prognostic variable, the clinical response to steroid therapy prior to whole brain radiotherapy, which is subjective information not necessarily recorded in clinical notes [5]. Likewise, neither the volume of the largest BM nor the time between BM diagnosis and the beginning of radiotherapy was available to calculate the SIR [6] and Rades [7] scores, respectively.
Until recently, few studies have focused on BM prognostic scores in breast cancer. Yet, it has been demonstrated that the reliability and clinical relevance of these scores vary greatly depending on the type of primary tumor. Sperduto et al. found that, in a population of 4,259 patients including 642 with breast cancer, the GPA was unsuitable not only for breast tumors, but also for gastrointestinal, melanoma, and renal cell cancers [8]. Similarly, the widely used RPA index has some limitations in breast disease as it does not consider specific tumor markers, such as HR and HER2 status. Moreover, the description of extra-cerebral disease is probably not the best suited variable for this pathology, since the prognosis of women with bone metastases or locoregional recurrences differs from that of patients with liver or lung metastases. Recently, efforts have been made to improve the accuracy of previous classifications by taking breast cancer biomarkers into account. As such, the GPA score has been replaced by a score specific to breast cancer integrating the status of both HER2 and HR [12]. Likewise, Le Scodan's score, which includes the breast cancer molecular subtype and treatment parameters, has been proposed from a retrospective analysis of a selected population of patients presenting with advanced disease [13].
Overall, our results indicated that the different scores were able to discriminate the prognosis of patients, which is in keeping with the analysis of Nieder et al., who compared a variety of prognostic classifications from all published trials performed on more than 20 patients [19]. However, the new classifications failed to improve patient selection, with the Breast GPA and Breast RPA scores showing lower Harrell's concordance indexes than the original RPA score. The diversity of populations between studies might explain discrepancies in results and makes generalization difficult. Indeed, the patients analyzed in the Breast GPA pivotal study did not reflect daily clinical practice since 62% of patients presented with 1 to 3 BM, 35% had BM without extra-cranial metastases, 37% were aged less than 50 years, 57% had tumors overexpressing the HER2 receptor, and 68% of patients received targeted local treatments, which probably explains an impressively good survival (13.8 months). Regarding the Breast RPA pivotal study, 98% of the population was irradiated, which, in comparison with our study population, represents a selection bias related to the treatment received after BM diagnosis relative to general clinical practice [14]. Contrary to previous indexes, Le Scodan's score had an independent prognostic value in multiparametric analysis, emphasizing the importance of biological subtypes and blood parameters [13]. However, the drawback is that the definition of biological subtype varies depending on the author. Le Scodan et al. distinguished between the HER2-positive population treated with trastuzumab and triple negative breast cancer [13], while Sperduto et al. [12] and Niwinska et al. [14] distinguished between luminal A, B, HER2, and basal tumors.
In these last two studies, 77% and 50% of the HER2+ population were treated using anti-HER2 agents, respectively. It would have been interesting to integrate, as did Le Scodan, the anti-HER2 treatment in the biological subtype since there is increasing evidence that anti-HER2 treatments prolong survival of breast cancer patients with BM [9,10,11,20]. Biological parameters, such as lymphopenia for Le Scodan's score and LDH and proteinemia for the P1PS [4], have been shown to have an independent prognostic value on multiparametric analysis and thus warrant further evaluation. Evaluating subclinical disease activity and the impact on nutritional status may confer additional prognostic information.
One of the strengths of our study is that it reflects a routine clinical practice population, without selection based on performance status, number of metastases or treatment. This is essential to provide physicians with a clinical tool applicable to the whole patient population at the time of BM diagnosis. According to our analysis, the RPA score can still be considered the reference score for several reasons. Firstly, although Harrell's concordance indexes were quite similar for all PIs, the hazard ratio of the RPA was higher than those of other PIs in multivariate analysis. Our results were consistent with those reported by (i) Le Scodan et al. [21] and Mahmoud-Ahmed et al. [22], who confirmed the prognostic value of the RPA score in the setting of BM from breast cancer, and (ii) Viani et al., who found the RPA score to be superior to the BS-BM [23]. Secondly, one must keep in mind the primary goal of these classifications, which is to adapt treatment options to the individual patient's prognosis. We need to reduce the treatment burden for patients with short life expectancy and, conversely, to intensify therapeutic interventions for patients in whom an improvement in overall survival is expected. Hence, it is important to know how often the prognostic scores assign patients to inappropriate prognostic groups. Nieder et al. studied their ability to correctly classify patients with good prognosis (survival longer than 6 months from the diagnosis of BM) and patients with poor prognosis (survival shorter than 2 months from the diagnosis of BM) [24]. In our study, the MOS was 8.9 months and 40% of the population was alive at 1 year, so we decided to adapt the cut-offs used by Nieder to our study population, and set the boundaries at survival of less than 3 months and survival of more than 12 months.
In these circumstances, the RPA proved to be more accurate than the other scores in predicting survival, since 85% of patients classified as RPA 1 survived more than 12 months, and 62% of patients classified as RPA 3 survived less than 3 months. Furthermore, the RPA misclassified a smaller proportion of patients than the other scoring systems, as no patient classified as RPA 1 survived less than 3 months and only 3% of patients classified as RPA 3 survived more than 12 months.
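The proportions discussed above can be computed per score along the following lines. This is a minimal sketch under a simplifying assumption: every patient's survival status is known at 3 and 12 months, so censoring is ignored. The function name, the dictionary keys and the toy data are hypothetical and do not reproduce the study data.

```python
def classification_accuracy(survival_months, classes, best, worst,
                            short=3.0, long=12.0):
    """Share of best/worst-class patients with long/short survival.

    Assumption: survival is fully observed (no censoring before `long`).
    """
    def share(cls, predicate):
        group = [m for m, c in zip(survival_months, classes) if c == cls]
        if not group:
            return None  # no patient in this class
        return sum(1 for m in group if predicate(m)) / len(group)

    return {
        # Correct predictions
        "best_class_long_survival": share(best, lambda m: m > long),
        "worst_class_short_survival": share(worst, lambda m: m < short),
        # Misclassifications
        "best_class_short_survival": share(best, lambda m: m < short),
        "worst_class_long_survival": share(worst, lambda m: m > long),
    }


# Hypothetical toy data: four patients in class 1, four in class 3.
toy_survival = [15, 20, 8, 14, 1, 2, 5, 13]
toy_classes = [1, 1, 1, 1, 3, 3, 3, 3]
result = classification_accuracy(toy_survival, toy_classes, best=1, worst=3)
print(result["best_class_long_survival"])    # 0.75
print(result["worst_class_long_survival"])   # 0.25
```

Running the same function for each PI, with its own best and worst classes, gives directly comparable rates of correct and incorrect assignment.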
A particular weakness of some of the classification systems is the uneven distribution of patients between the different prognostic categories. Indeed, a score that identifies a subgroup with excellent prognosis in only a very small number of patients, a situation rarely seen in clinical practice, would have limited impact on therapeutic decision making in routine practice. This is one of the pitfalls of the GPA scoring, since the best-prognosis class (3.5-4) accounts for only 2.8% of our daily clinical practice population. Finally, an ideal prognostic score should be simple and easily usable in clinical practice. Our analysis at this stage differs from that of Sperduto et al. [2] insofar as we believe that the RPA score is more readily reproducible in practice thanks to the limited number of variables to be collected and its fewer prognostic classes.
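To illustrate how few variables the RPA requires, the sketch below encodes the three classes as they are commonly summarized from the original RTOG publication [1] (class 1: KPS of at least 70, age under 65 years, controlled primary tumor and no extracranial metastases; class 3: KPS below 70; class 2: all other patients). These criteria are an assumption stated here for the example; Table 1 remains the reference for the definitions used in this study, and the function and parameter names are hypothetical.

```python
def rpa_class(kps, age, primary_controlled, extracranial_metastases):
    """Return the RPA class (1, 2 or 3) from four pretreatment variables.

    Criteria assumed from the commonly cited RTOG RPA summary:
      class 3: KPS < 70
      class 1: KPS >= 70, age < 65, controlled primary, brain-only disease
      class 2: every other patient
    """
    if kps < 70:
        return 3
    if age < 65 and primary_controlled and not extracranial_metastases:
        return 1
    return 2


# Hypothetical patients.
print(rpa_class(kps=90, age=52, primary_controlled=True,
                extracranial_metastases=False))  # 1
print(rpa_class(kps=80, age=70, primary_controlled=True,
                extracranial_metastases=False))  # 2
print(rpa_class(kps=60, age=45, primary_controlled=False,
                extracranial_metastases=True))   # 3
```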
Nevertheless, due to its retrospective nature, our study has some limitations. First, in a retrospective analysis, it can be difficult to distinguish controlled from uncontrolled distant metastases. As this information is required for the Breast RPA prognostic index, the retrospective assessment of this factor could have led to the misclassification of some patients. Similarly, a retrospective evaluation of KPS appears less reliable than the evaluation of performance status using the ECOG classification, and could have led to some degree of misclassification.

Conclusion
The new PIs did not perform better than the original scores. Although tumor subtypes, HER2 expression, and blood parameters (LDH, proteinemia, lymphopenia) may have an interesting additional prognostic value, the RPA appears to be the most appropriate and simplest available tool to help clinicians select breast cancer patients with BM.