The neurocognitive function change criteria after whole-brain radiation therapy for brain metastasis, in reference to health-related quality of life changes: a prospective observation study

Background We sought to construct the optimal neurocognitive function (NCF) change criteria sensitive to health-related quality of life (HR-QOL) in patients who have undergone whole-brain radiation therapy (WBRT) for brain metastasis.
Methods We categorized the patients by the changes of NCF into groups of improvement versus deterioration if at least one domain showed changes that exceeded the cut-off while other domains remained stable. The remaining patients were categorized as stable, and the patients who showed both significant improvement and deterioration were categorized as ‘both.’ We examined the clinical meaning of NCF changes using the cut-off values 1.0, 1.5, and 2.0 SD based on the percentage of patients whose HR-QOL changes were ≥ 10 points.
Results Baseline, 4-month and 8-month data were available in 78, 41 (compliance: 85%), and 29 (81%) patients, respectively. At 4 months, improvement/stable/deterioration/both was seen in 15%/12%/41%/32% of the patients when 1.0 SD was used; 19%/22%/37%/22% with 1.5 SD, and 17%/37%/37%/9% with 2.0 SD. The HR-QOL scores on the QLQ-C30 functional scale were significantly worse in the deterioration group versus the others with 1.0 SD (p = 0.013) and 1.5 SD (p = 0.015). With 1.5 SD, the HR-QOL scores on the QLQ-BN20 were significantly better in the improvement group versus the others (p = 0.033). However, when ‘both’ was included in ‘improvement’ or ‘deterioration,’ no significant difference in HR-QOL was detected.
Conclusions The NCF cut-off of 1.5 SD and the exclusion of ‘both’ patients from the ‘deterioration’ and ‘improvement’ groups best reflect HR-QOL changes.


Background
Brain metastases are a common sequela of cancer, occurring in approximately 24-45% of all cancer patients [1]. Neurocognitive function (NCF) is considered to reflect the status of the brain tumor burden as well as the degree of the adverse radiation effect on the brain [2,3]. The combination of the Hopkins Verbal Learning Test-Revised (HVLT-R), the Trail-Making Test (TMT), and the Controlled Oral Word Association Test (COWAT) is a standardized neurocognitive battery proposed by the Response Assessment in Neuro-Oncology (RANO) Group [4].
However, NCF change score criteria are not uniform across studies [5], and the proportion of patients reported to suffer from neurocognitive deterioration after whole-brain radiation therapy (WBRT) has thus varied widely among studies, from 52 to 91.7% [6,7]. It is also difficult to interpret whether the deterioration at 3-4 months after radiation is clinically meaningful, because of numerous confounding factors. We therefore examined the validity of the SD-based cut-off values (1.0, 1.5, or 2.0 SD) that were employed in previous major trials [6,8].
We designed an observational study to address these issues, using serially collected NCF and health-related quality of life (HR-QOL) data in patients who underwent WBRT for brain metastases. Our objective was to clarify the appropriate NCF change criteria reflecting clinically meaningful changes in HR-QOL. This is the first study to use the Japanese version of the RANO-proposed battery for the assessment of NCF and HR-QOL.

Study population and treatment
Patients were eligible if they were ≥ 18 years old, had one or more brain metastases, and were scheduled to undergo WBRT. The recruitment of patients took place between April 2012 and March 2017. The exclusion criteria were a Karnofsky Performance Status (KPS) < 60 and severe neurological deficits that could interfere with the administration of the NCF and HR-QOL examinations. Written informed consent was obtained from all patients. This study was approved by the Institutional Review Board (Study #1449).

The assessment of neurocognitive function and health-related quality of life
The patients' NCF was assessed using the Japanese versions of the HVLT-R, TMT, and COWAT. The HVLT-R consists of three domains, i.e., Total Recall (TR), Delayed Recall (DR), and Delayed Recognition (DRec), which are related to immediate and learning memory, delayed memory, and recognition, respectively. Parts A and B of the TMT (TMT-A and TMT-B) assess an individual's processing speed and executive function, respectively. The COWAT assesses semantic fluency.
We converted the patients' raw individual test scores into standardized z-scores by using the means and standard deviations (SDs) of individually age- and gender-matched healthy controls [9,10]. We selected changes in the z-score of ≥ 1.0, 1.5, and 2.0 SD to examine how the differences in the cut-off value may affect the HR-QOL values [6,8,11].
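The standardization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `higher_is_better` flag (used here to flip the sign for timed tests such as the TMT so that a higher z-score always means better performance), and the control values in the example are assumptions introduced for this sketch.

```python
def z_score(raw, control_mean, control_sd, higher_is_better=True):
    """Standardize a raw test score against matched healthy controls."""
    if control_sd <= 0:
        raise ValueError("control_sd must be positive")
    z = (raw - control_mean) / control_sd
    # Assumption of this sketch: for timed tests a larger raw value is a
    # worse result, so the sign is flipped to keep "higher z = better".
    return z if higher_is_better else -z


def z_change(z_baseline, z_followup):
    """Change in z-score from baseline to follow-up (positive = better)."""
    return z_followup - z_baseline


# Hypothetical values, for illustration only.
z_base = z_score(raw=18, control_mean=24, control_sd=4)  # (18 - 24) / 4 = -1.5
z_4mo = z_score(raw=22, control_mean=24, control_sd=4)   # (22 - 24) / 4 = -0.5
print(z_change(z_base, z_4mo))                           # 1.0
```

A change of +1.0 in this example would count as an improvement under the 1.0 SD cut-off but as stable under 1.5 or 2.0 SD.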
The patients' HR-QOL was assessed with two measures: the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 (EORTC QLQ-C30) ver. 3.0 [12] and the Brain Cancer Module (QLQ-BN20) [13]. The QLQ-C30 is composed of one scale measuring an individual's global health status (GHS), five functional scales (physical, role, social, emotional, and cognitive functioning), and nine symptom scales (fatigue, nausea/vomiting, pain, dyspnea, insomnia, appetite loss, constipation, diarrhea, and financial difficulties). The QLQ-BN20 is a disease-specific module for brain cancer patients, and it consists of 11 symptom scales: future uncertainty, visual disorder, motor dysfunction, communication deficit, headaches, seizures, drowsiness, itchy skin, hair loss, weakness of legs, and bladder control. All raw scores were linearly transformed and scored from 0 to 100 according to the guidelines. A higher score on the GHS scale or the functioning scales indicates higher quality of life, whereas a higher score on the symptom scales indicates poorer quality of life.
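The linear transformation to a 0-100 scale can be sketched as below. This assumes the standard EORTC scoring procedure (raw score = mean of the answered items; item range 3 for four-point items and 6 for the seven-point global health items; functional scales reversed so that a higher score means better functioning). The function name and arguments are introduced for this sketch, and the handling of missing items is deliberately left out.

```python
def eortc_scale_score(item_responses, item_range, functional):
    """Transform item responses of one scale to a 0-100 score.

    item_responses: answered item values (1-4, or 1-7 for global health).
    item_range: max minus min possible item value (3 or 6).
    functional: True for functional scales, False for symptom scales
                and for the global health status scale.
    """
    raw = sum(item_responses) / len(item_responses)
    scaled = (raw - 1) / item_range
    if functional:
        # Reverse so that a higher score means better functioning.
        return (1 - scaled) * 100
    return scaled * 100


# Hypothetical responses, for illustration only.
print(eortc_scale_score([1, 1], item_range=3, functional=True))   # 100.0
print(eortc_scale_score([2, 3], item_range=3, functional=False))  # 50.0
```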
We used the patients' NCF test scores and HR-QOL questionnaire responses obtained at baseline, 4 months, and 8 months after WBRT in this study.

Statistical analyses
At baseline
The patients' NCF and HR-QOL scores are presented as mean scores. We compared the scores between groups based on age (< 65 years old vs. ≥ 65 years old), KPS (100-80 vs. 70-60), the number of brain metastases (1-4 vs. ≥ 5 or meningeal carcinomatosis), the Graded Prognostic Assessment (GPA) (4.0-2.0 vs. 1.5-1.0) [14], the number of examinations (1: baseline-only data were available vs. ≥ 2: the baseline plus 4-month data, or the baseline, 4-month, and 8-month data were available), the use of surgical resection for brain metastases, and the systemic therapies prior to WBRT. We used the Mann-Whitney test for the individual NCF domains and the HR-QOL scales.

At 4 months and 8 months
NCF and HR-QOL
Regarding the domain level, we assigned the changes in the score on each NCF domain over the time periods of the baseline to 4 months and the baseline to 8 months to one of three categories: 'improvement,' 'stable,' or 'deterioration.' Improvement and deterioration were defined as an increase or decrease in the score over the cut-off value (1.0, 1.5, or 2.0 SD). Other changes were defined as 'stable.' Regarding the patient level, we classified the patients into four categories: 'improvement,' 'deterioration,' 'stable,' and 'both.' 'Improvement' and 'deterioration' were defined as exhibiting significant improvement or deterioration in NCF on at least one domain while other domains remained stable. The category of 'stable' was defined as no significant changes in any domain, whereas the 'both' category included the patients who showed both significant improvement and deterioration, in different domains.
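The two-level categorization above can be sketched as follows. The function names, the use of ≥ at the cut-off boundary, and the z-score changes in the example are assumptions made for illustration and are not study data.

```python
def classify_domain(change, cutoff):
    """Label one domain's z-score change (follow-up minus baseline)."""
    if change >= cutoff:
        return "improvement"
    if change <= -cutoff:
        return "deterioration"
    return "stable"


def classify_patient(domain_changes, cutoff):
    """Combine the domain labels into one patient-level category."""
    labels = {classify_domain(c, cutoff) for c in domain_changes.values()}
    improved = "improvement" in labels
    worsened = "deterioration" in labels
    if improved and worsened:
        return "both"
    if improved:
        return "improvement"
    if worsened:
        return "deterioration"
    return "stable"


# Hypothetical z-score changes for one patient across the six domains.
changes = {"TR": -1.6, "DR": -0.4, "DRec": 0.2,
           "COWAT": 1.2, "TMT-A": 0.1, "TMT-B": -0.3}
for cutoff in (1.0, 1.5, 2.0):
    print(cutoff, classify_patient(changes, cutoff))
# 1.0 both
# 1.5 deterioration
# 2.0 stable
```

The same hypothetical patient falls into three different categories depending on the cut-off, which is why the choice of cut-off and the handling of 'both' patients matter for the reported percentages.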
Regarding the patients' HR-QOL, the ≥ 10-point changes of each scale at 4 months and 8 months were assigned to one of three categories: 'improvement,' 'stable,' or 'deterioration' for individual HR-QOL scales. The same categorization was done for the mean QLQ-C30 functional/symptom scales and the QLQ-BN20 scales.
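A minimal sketch of the ≥ 10-point rule is shown below. Because a higher score is better on the GHS and functional scales but worse on the symptom scales, the sketch takes a `higher_is_better` flag; that flag, the function name, and the example scores are assumptions introduced here for illustration.

```python
def classify_hrqol(baseline, followup, higher_is_better, threshold=10.0):
    """Label a change in a 0-100 HR-QOL scale score."""
    delta = followup - baseline
    if not higher_is_better:
        # On symptom scales a falling score is an improvement.
        delta = -delta
    if delta >= threshold:
        return "improvement"
    if delta <= -threshold:
        return "deterioration"
    return "stable"


# Hypothetical scores, for illustration only.
print(classify_hrqol(66.7, 50.0, higher_is_better=True))   # deterioration
print(classify_hrqol(33.3, 16.7, higher_is_better=False))  # improvement
print(classify_hrqol(50.0, 58.3, higher_is_better=True))   # stable
```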
The relationship of the changes in NCF and HR-QOL at 4 months
We compared the percentages of deterioration and improvement on HR-QOL scales at 4 months post-WBRT between groups at the patient level (deterioration vs. others, deterioration + both vs. others, improvement vs. others, and improvement + both vs. others) using the three cut-off values of 1.0, 1.5, and 2.0 SD. We used Fisher's exact test to examine the differences in the individual HR-QOL scales and the differences in the mean QLQ-C30 functional/symptom scales and QLQ-BN20 scales.
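For readers unfamiliar with the test, the group comparison above reduces to a 2 × 2 table (NCF group vs. others, HR-QOL deteriorated vs. not). The sketch below is a plain-Python two-sided Fisher's exact test that sums the probabilities of all tables no more likely than the observed one. It is an illustration only: the analyses in this study were run in SPSS, and the counts in the example are invented.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is at most as probable as the observed one;
    # the small tolerance guards against floating-point ties.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Invented counts: rows = NCF deterioration group vs. others,
# columns = HR-QOL deteriorated vs. not deteriorated.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```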
The factors related to the deteriorations in NCF and HR-QOL at 4 months
To examine the deterioration of NCF and that of HR-QOL from baseline to 4 months, we compared the percentage of deterioration on the NCF domains, GHS scale, and the mean QLQ-C30 functional/symptom scales and QLQ-BN20 scales between groups based on age (< 65 vs. ≥ 65 years old), GPA [...]
Overall survival was assessed by the Kaplan-Meier method. A p-value < 0.05 was considered significant. All statistical analyses were performed with SPSS ver. 25 software.

The patients' characteristics
The characteristics of the 78 consecutive patients at baseline and the 41 patients who underwent the examinations at 4 months are listed in Tables 1 and 2 (Figure S1).

The baseline NCF and HR-QOL data
The mean z-score was −1.46 for the TR, −1.75 for the DR, −1.07 for the DRec, −0.46 for the COWAT, −1.46 for the TMT-A, and −1.12 for the TMT-B. The mean scores for each HR-QOL scale are provided in Additional file 2: Table S1. Among the factors examined, only a poor KPS of 60-70 (vs. 80-100) was associated with significantly poor NCF, in two domains. Regarding the HR-QOL (26 scales in total), significantly poor scores were observed in the patients with a KPS of 60-70 on 16 scales; a GPA of 0-1.5 on 11 scales; age ≥ 65 years on five scales; baseline-only examination availability on three scales; surgery on three scales; systemic therapies prior to WBRT on three scales; and ≥ 5 metastases or meningeal carcinomatosis on two scales (Additional file 2: Table S1).

The relationship between the changes in NCF and HR-QOL at 4 months
The scales on which the 'NCF-deterioration group' was significantly worse than the other statuses were physical function with 2.0 SD as the cut-off, and social function, nausea and vomiting, and visual disorder with 1.0 SD. The scales on which the 'NCF-improvement group' was significantly better than the other statuses were social function and future uncertainty with 1.5 SD (Additional file 2: Table S2). Regarding the mean QLQ-C30/BN20 scales, the mean QLQ-C30 functional scale score was significantly worse in the NCF-deterioration group than the other statuses with 1.0 SD (p = 0.013) and 1.5 SD (p = 0.015), but not significantly different with 2.0 SD. The mean QLQ-BN20 scale score was significantly better in the NCF-improvement group than the others only with 1.5 SD (p = 0.033). However, when 'both' was included in either 'deterioration' or 'improvement,' no significant difference in HR-QOL was detected (Table 3).

The factors related to the deterioration of NCF and HR-QOL at 4 months
When the cut-off value of 1.5 SD was used, the significant factors for the deterioration of the patients' NCF at 4 months were (1) intracranial tumor control (non-PD vs. PD) in two domains, and (2) the availability of 8-month examination data in four domains. The significant factors for the deterioration of the patients' HR-QOL at 4 months were the availability of 8-month examination data in the mean QLQ-C30 functional scale scores, and the use of molecular targeted therapy after WBRT in the mean QLQ-BN20 scores (Table 4).

Discussion
The avoidance of WBRT is a current trend in some clinical situations [15], but WBRT is still important for most cases of brain metastases [16]. Recent prospective studies demonstrated that WBRT is more frequently associated with cognitive side effects than stereotactic radiosurgery (SRS) alone. However, the role of WBRT is to improve intracranial tumor control, and it is possible that NCF deteriorates because of intracranial tumor progression when WBRT is omitted [2]. In the JROSG 99-1 study by Aoyama et al., the patients assigned to the WBRT+SRS group demonstrated better short-term NCF than those assigned to the SRS-alone group.
On the other hand, in this study, intracranial tumor control was shown to be related to NCF deterioration at 4 months. Moreover, the reported percentage [...] [6]. In the MD Anderson Cancer Center trial in the same setting, 52% of the patients in the WBRT+SRS arm and 24% in the SRS-alone arm showed neurocognitive deterioration at 24 weeks after treatment [7]. In that trial, a 5-point drop in any domain of the HVLT-R was defined as deterioration. In the RTOG 0614 study comparing WBRT with or without memantine, the cut-off values for cognitive failure were 2 SD and the reliable change index [17]. The probability of cognitive function failure at 24 weeks was 53.8% in the memantine arm and 64.9% in the placebo arm (p = 0.01) [8].
There are two pitfalls in interpreting these data. One is that some patients in the deterioration group could have both deterioration in one or more domains and improvement in other domains. These 'both' patients could be included in an improvement group if the 'improvement on at least one domain' definition was applied. In a study that assessed the changes in NCF after SRS alone, van der Meer et al. used the cut-off of 1.5 SD and categorized their patients into four groups: improvement (14%), stable (67%), deterioration (14%), and both (6%) at 3 months after the initial SRS [11]. In the present study, as many as 32% of the patients (with 1.0 SD as the cut-off), 22% (with 1.5 SD), or 9% (with 2.0 SD) showed both deterioration and improvement at 4 months. When these 'both' patients are included in the deterioration group or the improvement group with the cut-off of 1.0 SD, the percentage of patients with deterioration and the percentage of those with improvement increase from 41 to 73% (41% + 32%) and from 15 to 47% (15% + 32%), respectively.
The other pitfall in the data interpretation is that the cut-off value is different for each study. Therefore, we tried to construct the NCF change criteria based on HR-QOL because the goal of the treatment of brain metastases is the maintenance or even improvement of the patient's HR-QOL, considering the palliative nature of the treatment. In addition, it is known that NCF and HR-QOL are related after WBRT for brain metastasis [18]. It should be noted that the assessment of 'clinically meaningful changes in HR-QOL status' is another topic for research, but we used the HR-QOL cut-off value of ≥ 10 points, which reflects a moderate change in the HR-QOL status in the QLQ-C30 and the QLQ-BN20 [19,20].
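The effect of pooling the 'both' patients can be checked with a few lines of arithmetic. The sketch below uses the 4-month percentages reported in this study for each cut-off; the dictionary layout and the function name are introduced here for illustration only.

```python
# 4-month percentages reported in this study, keyed by cut-off (SD).
reported_4_month = {
    1.0: {"improvement": 15, "stable": 12, "deterioration": 41, "both": 32},
    1.5: {"improvement": 19, "stable": 22, "deterioration": 37, "both": 22},
    2.0: {"improvement": 17, "stable": 37, "deterioration": 37, "both": 9},
}


def pooled(percentages):
    """Percentages obtained when 'both' is merged into either group."""
    return {
        "deterioration + both": percentages["deterioration"] + percentages["both"],
        "improvement + both": percentages["improvement"] + percentages["both"],
    }


for cutoff, pct in reported_4_month.items():
    print(cutoff, pooled(pct))
```

With the 1.0 SD cut-off this reproduces the 73% and 47% figures discussed above. The two pooled groups overlap, so the same patients are counted in both.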
In the present analyses, based on the patients' mean QLQ-C30 functional scale scores with 1.0 SD and 1.5 SD, a clinically meaningful deterioration of the patients' HR-QOL was observed significantly more frequently in the NCF deterioration group than in the NCF improvement, stable, and both-deterioration-and-improvement groups at 4 months. On the other hand, a clinically meaningful improvement of the patients' HR-QOL was observed significantly more frequently in the NCF improvement group than in the other three patient groups based on the mean of the QLQ-BN20 scores with 1.5 SD at 4 months. However, when the 'both' patients were included in the deterioration group or the improvement group, the statistical significance of all of the above findings was lost. This implies that patients who show both deterioration and improvement in different domains should be differentiated from patients who exhibit only deterioration or only improvement. Therefore, the use of 'at least one domain' for the definition of deterioration/improvement should be interpreted carefully, and the automatic inclusion of these 'both' patients in a deterioration group should be questioned. Our present findings demonstrate that the use of a stricter cut-off value (i.e., 2.0 SD) reduced the number of 'both' patients but provided less sensitivity, and it could not detect significant HR-QOL changes. We thus propose that the cut-off value of 1.5 SD and the exclusion of 'both' patients from the deterioration and improvement groups would most closely reflect the clinically meaningful changes in HR-QOL.
Our study has several limitations. First, although the examination compliance in our study population was better (85% at 4 months) than that reported in other studies (59-73% at 2-4 months) [6,8,11,21-23], there were patients who could not perform the follow-up examinations. Second, we were unable to perform a multivariate analysis that accounts for confounding factors related to cognitive dysfunction, such as opioids [24], chemotherapy [25], and surgery. However, in this patient group, we confirmed that the presence or absence of surgery or chemotherapy did not cause a significant difference in the NCF test results at baseline. Third, the prognoses of our study population were heterogeneous, since patients who had undergone WBRT postoperatively and patients with meningeal carcinomatosis were analyzed together. The prognosis of individuals with brain metastases has been prolonged, and it may be necessary to change the time points of the NCF and HR-QOL examinations according to patients' prognoses. Finally, due to the small sample size, we could not fully analyze the patients' NCF and HR-QOL after 8 months post-WBRT. It is necessary to verify our present findings with a larger number of patients over longer follow-up periods.

Conclusions
The use of 1.5 SD as the cut-off for NCF best reflected the patients' HR-QOL status. The inclusion of 'both' in either a 'deterioration' or 'improvement' group blurs the changes in NCF that reflect clinically meaningful changes in HR-QOL, and therefore patients showing both deterioration and improvement should not be included in either group.