This comprehensive, simultaneous analysis of 7 prognostic scores was performed on a large, well-characterized, and homogeneous population of 250 breast cancer patients with BM. This study examined three common scores, namely the RPA, the GPA, and the BS-BM, as well as four new scores incorporating biological or breast-specific parameters: the breast RPA, the breast GPA, Le Scodan’s score, and the P1PS. Among other scoring systems, the Rotterdam score was not investigated because one of its prognostic variables is the clinical response to steroid therapy prior to panencephalic radiotherapy, which is subjective information not necessarily recorded in clinical observations. Similarly, neither the volume of the largest BM nor the time between BM diagnosis and the beginning of radiotherapy was available to calculate the SIR and Rades scores, respectively.
Until recently, few studies had focused on BM prognostic scores in breast cancer. Yet it has been demonstrated that the reliability and clinical relevance of these scores vary greatly depending on the type of primary tumor. Sperduto et al. found that, in a population of 4,259 patients including 642 with breast cancer, the GPA was poorly suited not only to breast tumors but also to gastrointestinal, melanoma, and renal cell cancers. Similarly, the widely used RPA index has some limitations in breast disease, as it does not consider specific tumor markers such as HR and HER2 status. Moreover, the description of extra-cerebral disease is probably not the best-suited variable for this pathology, since the prognosis of women with bone metastases or locoregional recurrences differs from that of patients with liver or lung metastases. Recently, efforts have been made to improve the accuracy of previous classifications by taking breast cancer biomarkers into account. As such, the GPA score has been replaced by a score specific to breast cancer that integrates both HER2 and HR status. Likewise, Le Scodan’s score, which includes the breast cancer molecular subtype and treatment parameters, has been proposed on the basis of a retrospective analysis of a selected population of patients presenting with advanced disease.
Overall, our results indicated that the different scores were able to discriminate the prognosis of patients, which is in keeping with the analysis of Nieder et al., who compared a variety of prognostic classifications from all published trials performed on more than 20 patients. However, the new classifications failed to improve patient selection, with the Breast GPA and Breast RPA scores showing lower Harrell’s concordance indexes than the original RPA score. The diversity of populations between studies might explain the discrepancies in results and makes generalization difficult. Indeed, the patients analyzed in the Breast GPA pivotal study did not reflect daily clinical practice, since 62% of patients presented with 1 to 3 BM, 35% had BM without extra-cranial metastases, 37% were aged less than 50 years, 57% had tumors overexpressing the HER2 receptor, and 68% received targeted local treatments, which probably explains the impressively good survival (13.8 months). In the Breast RPA pivotal study, 98% of the population was irradiated, which, in comparison with our study population, represents a selection bias related to the treatment received after BM diagnosis relative to general clinical practice. Contrary to the previous indexes, Le Scodan’s score had an independent prognostic value in multiparametric analysis, emphasizing the importance of biological subtypes and blood parameters. The drawback, however, is that the definition of biological subtype varies from one author to another. Le Scodan et al. distinguished between the HER2-positive population treated with trastuzumab and triple-negative breast cancer, while Sperduto et al. and Niwinska et al. distinguished between luminal A, luminal B, HER2, and basal tumors. In these last two studies, 77% and 50% of the HER2+ population, respectively, were treated with anti-HER2 agents.
It would have been interesting to incorporate anti-HER2 treatment into the biological subtype, as Le Scodan did, since there is increasing evidence that anti-HER2 treatments prolong the survival of breast cancer patients with BM [9–11, 20]. Biological parameters, such as lymphopenia for Le Scodan’s score and LDH and proteinemia for the P1PS, have been shown to have an independent prognostic value in multiparametric analysis and thus warrant further evaluation. Evaluating subclinical disease activity and the impact on nutritional status may confer additional prognostic information.
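For readers less familiar with the metric, Harrell’s concordance index used above to compare the scores can be sketched as follows. This is a minimal illustration, not the statistical procedure used in this study: the function name `harrell_c`, the simplified handling of tied survival times, and the four example patients are all assumptions made for the example.

```python
from itertools import combinations


def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times  : observed follow-up times (e.g. months from BM diagnosis)
    events : 1 if death was observed, 0 if the patient was censored
    risks  : prognostic class or score, higher meaning worse prognosis
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification (assumption): time-tied pairs are skipped
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter follow-up is censored: pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # worse class died earlier: concordant pair
        elif risks[a] == risks[b]:
            concordant += 0.5  # same class: counted as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented data: four hypothetical patients scored with a 3-class index.
months = [2, 5, 9, 14]
death_observed = [1, 1, 0, 1]
score_class = [3, 3, 2, 1]
c_index = harrell_c(months, death_observed, score_class)
```

A value of 0.5 corresponds to a score that orders patients no better than chance, and 1.0 to a score that orders every comparable pair correctly.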
One of the strengths of our study is that it reflects a routine clinical practice population, without selection based on performance status, number of metastases, or treatment. This is essential to provide physicians with a clinical tool applicable to the whole patient population at the time of BM diagnosis. According to our analysis, the RPA score can still be considered the reference score for several reasons. Firstly, although Harrell’s concordance indexes were quite similar for all PIs, the hazard ratio of the RPA was higher than those of the other PIs in multivariate analysis. Our results were consistent with those reported by (i) Le Scodan et al. and Mahmoud-Ahmed et al., who confirmed the prognostic value of the RPA score in the setting of BM from breast cancer, and (ii) Viani et al., who found the RPA score to be superior to the BS-BM score. Secondly, one must keep in mind the primary goal of these classifications, which is to adapt treatment options to the individual patient's prognosis. We need to mitigate the treatment burden for patients with a short life expectancy and, conversely, to intensify therapeutic interventions for patients in whom an improvement in overall survival is expected. Hence, it is important to know how often the prognostic scores wrongly assign patients to inappropriate prognosis groups. Nieder et al. studied their ability to correctly classify patients with good prognosis (MOS longer than 6 months from the diagnosis of BM) and patients with poor prognosis (MOS shorter than 2 months from the diagnosis of BM). In our study, the MOS was 8.9 months and 40% of the population was alive at 1 year, so we decided to adapt the cutoffs used by Nieder to our study population, setting the boundaries at a MOS of less than 3 months and a MOS of more than 12 months.
In these circumstances, the RPA proved to be more efficient than the other scores at predicting median survival, since 85% of patients classified as RPA 1 survived more than 12 months and 62% of patients classified as RPA 3 survived less than 3 months. Furthermore, the RPA misclassified a smaller proportion of patients than the other scoring systems, as no patient classified as RPA 1 survived less than 3 months and only 3% of patients classified as RPA 3 survived more than 12 months.
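The classification check described above can be illustrated with a short sketch. It uses the 3-month and 12-month boundaries stated in the text; the function name `classification_accuracy`, the assumption that every patient has complete follow-up (no censoring), and the ten example patients are hypothetical and are not the study data.

```python
def classification_accuracy(classes, survival_months, best, worst,
                            short=3.0, long=12.0):
    """Share of best- and worst-class patients who were correctly or
    wrongly categorized, using survival boundaries in months.

    Assumption: survival is fully observed for every patient.
    """
    best_surv = [s for c, s in zip(classes, survival_months) if c == best]
    worst_surv = [s for c, s in zip(classes, survival_months) if c == worst]

    def share(values, predicate):
        # Proportion of `values` satisfying `predicate`; NaN if the class is empty.
        if not values:
            return float("nan")
        return sum(1 for v in values if predicate(v)) / len(values)

    return {
        # Best class: expected to survive beyond the long boundary.
        "best_correct": share(best_surv, lambda s: s > long),
        "best_misclassified": share(best_surv, lambda s: s < short),
        # Worst class: expected to survive less than the short boundary.
        "worst_correct": share(worst_surv, lambda s: s < short),
        "worst_misclassified": share(worst_surv, lambda s: s > long),
    }


# Invented example: prognostic class (1 = best, 3 = worst) and survival in months.
example_classes = [1, 1, 1, 1, 2, 3, 3, 3, 3, 3]
example_survival = [14, 20, 13, 8, 6, 1, 2, 2.5, 5, 15]
result = classification_accuracy(example_classes, example_survival,
                                 best=1, worst=3)
```

Patients whose survival falls between the two boundaries count as neither correctly classified nor misclassified, which is why the two shares for a class need not sum to 1.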
A particular weakness of some of the classification systems is the uneven distribution of patients between the different prognostic categories. Indeed, a score that identifies a subgroup with excellent prognosis in only a very small number of patients, a situation rarely seen in clinical practice, would be of limited help in therapeutic decision making in routine practice. This is one of the pitfalls of the GPA scoring, since the best-prognosis class (3.5-4) accounts for only 2.8% of our daily clinical practice population. Finally, an ideal prognostic score should be simple and easy to use in clinical practice. On this point, our analysis differs from that of Sperduto et al. insofar as we believe that the RPA score is more readily reproducible in practice, thanks to the limited number of variables to be collected and its fewer prognostic classes.
Nevertheless, due to its retrospective nature, our study has some limitations. First, in a retrospective analysis, it can be difficult to distinguish controlled from uncontrolled distant metastases. As this information is required by the Breast RPA prognostic index, the retrospective assessment of this factor could have misclassified some patients. Similarly, a retrospective evaluation of KPS appears less reliable than the evaluation of Performance Status using the ECOG classification, and could have led to some degree of misclassification.