Predictive value of baseline metabolic tumor volume in early-stage favorable Hodgkin Lymphoma – Data from the prospective, multicenter phase III HD16 trial

Background 18F-fluorodeoxyglucose (FDG) positron emission tomography (PET) plays an important role in the staging and response assessment of lymphoma patients. Our aim was to explore the predictive relevance of metabolic tumor volume (MTV) and total lesion glycolysis (TLG) in patients with early-stage Hodgkin lymphoma treated within the German Hodgkin Study Group HD16 trial. Methods 18F-FDG PET/CT images were available for MTV and TLG analysis in 107 cases from the HD16 trial. We calculated MTV and TLG using three different threshold methods (SUV4.0, SUV41% and SUV140%L), and then performed receiver-operating-characteristic analysis to assess the value of these parameters in predicting an adequate therapy response (PET negativity after 2 cycles of chemotherapy). Results All three threshold methods analyzed for MTV and TLG calculation showed a positive correlation with the PET response after 2 cycles of chemotherapy. The largest area under the curve (AUC) was observed using the fixed threshold of SUV4.0 for MTV calculation (AUC 0.69 [95% CI 0.55–0.83]) and for TLG calculation (AUC 0.69 [0.55–0.82]). The calculations of MTV and TLG with a relative threshold showed lower AUCs: with SUV140%L, AUCs of 0.66 [0.53–0.80] for MTV and 0.67 [0.54–0.81] for TLG were observed, while with SUV41%, AUCs of 0.61 [0.45–0.76] for MTV and 0.64 [0.49–0.80] for TLG were seen. Conclusions MTV and TLG have predictive value for response after two cycles of ABVD in early-stage Hodgkin lymphoma, particularly when the fixed threshold of SUV4.0 is used for MTV and TLG calculation. Trial registration ClinicalTrials.gov NCT00736320.

Since the introduction of 18F-fluorodeoxyglucose (FDG) positron emission tomography (PET) into the management of many oncological diseases, it has taken on a major role in the staging and response assessment of lymphoma patients [9][10][11][12]. It has been shown that radiotherapy can safely be omitted in patients with PET-negative residual tissue after effective first-line chemotherapy for advanced-stage Hodgkin lymphoma [13]. Furthermore, in advanced-stage Hodgkin lymphoma, chemotherapy can be reduced to a total of 4 cycles of eBEACOPP (bleomycin, etoposide, doxorubicin, cyclophosphamide, vincristine, procarbazine and prednisone in escalated doses) if the PET is negative after 2 cycles [14]. For patients with early-stage unfavorable Hodgkin lymphoma, the HD17 trial has shown that radiotherapy can be omitted for PET-negative patients after effective chemotherapy without any clinically relevant loss of efficacy [15]. However, in early-stage favorable Hodgkin lymphoma, three different randomized trials (H10, HD16 and RAPID) have shown that omitting radiotherapy in patients who are PET-negative after ABVD (doxorubicin, bleomycin, vinblastine and dacarbazine) chemotherapy is associated with a relevant loss of tumor control and an increased number of relapses [6,16,17]. As the role of PET for individual tailoring of treatment after 2 cycles of ABVD is limited in early-stage favorable Hodgkin lymphoma, additional prognostic factors are urgently needed. Accordingly, we analyzed metabolic tumor volume (MTV) and total lesion glycolysis (TLG) derived from PET at staging as potentially useful predictive factors in early-stage Hodgkin lymphoma.

Study cohort
From November 2009 through December 2015, the prospective, multicenter phase III trial HD16 recruited a total of 1,150 therapy-naive Hodgkin lymphoma patients, aged 18 to 75 years. HD16 included patients in clinical stage I or II without risk factors such as three or more involved nodal areas, large mediastinal mass (≥ 1/3 of the maximal thoracic diameter as measured on chest X-ray), extra-nodal disease or elevated erythrocyte sedimentation rate (≥ 50 mm/h for patients without B symptoms and ≥ 30 mm/h in case of B symptoms).
In HD16, individuals were randomly assigned either to standard combined-modality treatment, consisting of 2 cycles of ABVD followed by PET (PET-2) and consolidating radiotherapy irrespective of the PET-2 result, or to the experimental arm, in which irradiation was omitted in cases of PET negativity after 2 cycles of ABVD. In HD16, PET-2 was mandatory for all patients, while a staging PET before the start of treatment was not a mandatory part of the protocol. The PET scans were performed according to the respective national guidelines, which did not include SUV harmonization. Accordingly, our analysis set consisted of those 107 individuals with baseline PET (PET-0) images available to the central review panel for quantitative assessment (Fig. 1).
The HD16 trial was approved by the responsible ethics committees and was conducted according to the Declaration of Helsinki, in compliance with the Good Clinical Practice guidelines of the International Conference on Harmonization. All patients provided written informed consent before participation.

Image analysis
Baseline MTV and TLG were calculated in all baseline PET scans available for quantitative analyses, using the PET/CT Viewer in FIJI (ImageJ). First, maximal standardized uptake value (SUVmax) for the liver was obtained from a spherical 3-cm volume of interest (VOI) in the right liver lobe. Following that, SUVmax was estimated within all tumor sites with increased 18F-FDG uptake. Manual corrections were performed in cases where non-lymphoma tissue was included in the automatic calculation.
Within these MTVs, SUVmean was estimated and TLG was calculated as the sum of supra-threshold voxels of all lymphoma lesions multiplied by SUVmean within the respective MTV as follows: (1) sum of MTV41% multiplied by SUVmean (TLG41%); (2) sum of MTV4.0 multiplied by SUVmean (TLG4.0); (3) sum of MTV140%L multiplied by SUVmean (TLG140%L).
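The calculation described above can be illustrated with a minimal sketch. It is not the FIJI PET/CT Viewer workflow used in the study; the function names, the toy SUV array, the voxel volume and the liver reference value are all hypothetical. Reading SUV140%L as 140% of a liver reference SUV is an assumption of this sketch, as is applying the relative 41% threshold to the SUVmax of a single lesion mask.

```python
import numpy as np

def mtv_and_tlg(suv, lesion_mask, threshold, voxel_volume_ml):
    """Return (MTV in ml, TLG) for lesion voxels at or above the threshold."""
    selected = lesion_mask & (suv >= threshold)
    n_voxels = int(selected.sum())
    if n_voxels == 0:
        return 0.0, 0.0
    mtv = n_voxels * voxel_volume_ml        # volume of supra-threshold voxels
    suv_mean = float(suv[selected].mean())  # SUVmean within that volume
    return mtv, mtv * suv_mean              # TLG = MTV x SUVmean

def candidate_thresholds(suv, lesion_mask, liver_suv):
    """Three threshold definitions; the 140%L reading is an assumption."""
    suv_max = float(suv[lesion_mask].max())
    return {
        "SUV41%": 0.41 * suv_max,     # relative to lesion SUVmax
        "SUV4.0": 4.0,                # fixed threshold
        "SUV140%L": 1.40 * liver_suv, # relative to liver reference (assumed)
    }

# Hypothetical toy lesion: four voxels of 0.5 ml each.
suv = np.array([[[1.0, 3.0], [5.0, 10.0]]])
lesion_mask = np.ones_like(suv, dtype=bool)
voxel_volume_ml = 0.5
liver_suv = 2.0  # hypothetical liver reference value

results = {
    name: mtv_and_tlg(suv, lesion_mask, thr, voxel_volume_ml)
    for name, thr in candidate_thresholds(suv, lesion_mask, liver_suv).items()
}
for name, (mtv, tlg) in results.items():
    print(f"{name}: MTV = {mtv:.2f} ml, TLG = {tlg:.2f}")
```

For a patient with several lesions, the per-lesion values would be summed to obtain the whole-body MTV and TLG.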

Statistical evaluation
Patient characteristics and PET-2 response data were obtained from the study database. All data were analyzed descriptively. MTV and TLG distributions were visualized in histograms. The correlation between the different thresholding methods was assessed by Pearson product-moment correlation coefficients. Receiver operating characteristic (ROC) analysis was performed to evaluate baseline MTV and TLG as predictors of PET-2 response, using the liver as cutoff for PET positivity (Deauville score 4) [18,19]. Additionally, p-values resulting from logistic regressions on log-transformed data are reported to explore and quantify the predictive value of MTV and TLG for PET-2 positivity. All statistical computations were performed using SAS 9.4 (SAS Institute, Cary, NC, USA).
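As an illustration of the ROC and correlation steps, the following sketch computes an AUC as the proportion of correctly ordered positive/negative pairs (the Mann-Whitney formulation) and a Pearson coefficient between two thresholding methods. The study itself used SAS 9.4; this Python version is a stand-in, all values are invented toy data, and the confidence intervals and logistic regression p-values reported in the paper are not reproduced here.

```python
import numpy as np

def roc_auc(scores, positive):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative case count as half a pair.
    """
    scores = np.asarray(scores, dtype=float)
    positive = np.asarray(positive, dtype=bool)
    pos, neg = scores[positive], scores[~positive]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Hypothetical baseline MTV values (ml) from two thresholding methods
# and hypothetical PET-2 results (1 = positive, 0 = negative).
mtv_fixed = np.array([5.0, 12.0, 20.0, 33.0, 48.0, 60.0, 75.0, 110.0])
mtv_relative = np.array([4.0, 15.0, 18.0, 30.0, 50.0, 55.0, 80.0, 100.0])
pet2_positive = np.array([0, 0, 0, 1, 0, 0, 1, 1])

auc_raw = roc_auc(mtv_fixed, pet2_positive)
# The log transform is monotone, so it leaves the AUC unchanged.
auc_log = roc_auc(np.log(mtv_fixed), pet2_positive)

# Pearson product-moment correlation between the two methods.
r = np.corrcoef(mtv_fixed, mtv_relative)[0, 1]

print(f"AUC (raw) = {auc_raw:.3f}, AUC (log) = {auc_log:.3f}, r = {r:.3f}")
```

Because the AUC depends only on the ranking of the values, the log transformation matters for the logistic regression but not for the ROC analysis.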

Patients
Of the 1,139 patients from the intention-to-treat population of the HD16 study, 107 with available PET-0 were eligible for the present analysis. Characteristics of eligible and non-eligible patients are shown in Table 1. Among the 4 participating countries, the proportion of patients receiving a PET-0 scan was lowest in Germany. Other characteristics were similar in patients with and without PET-0. In PET-2, 16 (15%) of the patients examined were positive (Deauville Score 4) while 91 (85%) were negative (Deauville Score < 4).

MTV
Histograms of the MTV distributions are presented in

Effect of MTV on PET-2 positivity
The ROC curves for PET response after two cycles of ABVD were derived from MTV using different

Effect of TLG on PET-2 positivity
The ROC curves for PET response after two cycles of ABVD were derived from TLG using different thresholding methods, and are displayed in Fig. 5. AUC for TLG41%, TLG4.0 and TLG140%L were 0.64 (95% CI

Discussion
The following results emerge from our analysis of 107 patients with early-stage favorable Hodgkin lymphoma: All three methods used for calculating MTV and TLG show a moderate predictive impact with regard to PET response after 2 cycles of ABVD. Calculations of both MTV and TLG using SUV 4.0 as a fixed threshold showed a small advantage compared with the other methods used.
Various studies have indicated the prognostic potential of baseline MTV in Hodgkin lymphoma patients [19][20][21][22][23][24][25][26]. Akhtari and colleagues showed that MTV and TLG could help predict worse outcome in 267 patients with early-stage Hodgkin lymphoma who received standard combined-modality treatment. Furthermore, two distinct categories can be discerned from MTV or TLG: low and high disease burdens [20]. In a study of 59 patients with Hodgkin lymphoma treated with anthracycline-based chemotherapy, Kanoun and colleagues highlighted a possible division into two risk groups with regard to long-term success on the basis of the MTV and the metabolic signature [21]. Cottereau and colleagues showed baseline MTV to be a strong prognostic factor in 258 patients with early-stage Hodgkin lymphoma who received standard combined-modality treatment. This cohort could also be divided into two risk groups based on the MTV [22]. In another group of 127 patients, Song and colleagues showed that MTV can be a prognostic factor and can also usefully inform selection of the necessary therapy regimen [23]. In 65 patients with relapsed or refractory Hodgkin lymphoma, Moskowitz and colleagues found MTV to be a very strong prognostic factor and one that can also improve the predictive value of PET before autologous stem cell transplantation [24]. In 310 patients with advanced-stage Hodgkin lymphoma, Mettler and colleagues showed that the MTV can predict patient response after two cycles of eBEACOPP, regardless of the method used to determine it [25]. The receiver-operating-characteristic curves in their study did not point to any unique cut-offs, but indicated a wide range of possible cut-offs [25], as we have also observed in the present work. Analyzing a group of 140 DLBCL patients, Kim and colleagues showed that the metabolic tumor burden expressed as TLG can be a prognostic factor for survival after R-CHOP [26]. All these studies are in line with our finding that initial MTV and TLG are parameters of additional use for response prediction.
Here, it should be noted that there is as yet no standardized procedure for measuring MTV and TLG [27]. A variety of methods and software platforms are currently in use for MTV and TLG calculation. Algorithms with fixed, relative or adaptive threshold values are often encountered in this context [28]. Using a relative threshold of 41% of SUVmax for 106 patients with peripheral T-cell lymphoma, Cottereau and colleagues demonstrated that baseline MTV is a relevant risk factor [22]. Kanoun and colleagues showed that MTV calculation using a fixed threshold of SUV 2.5 gave a higher volume than a relative threshold of 41% of SUVmax [21]. In their cohort of 140 DLBCL patients, Kim and colleagues showed that, when a relative threshold is used, a TLG calculated with 50% of SUVmax has the highest prognostic accuracy [26]. This contrasts with our results, which indicate that a fixed threshold has a higher predictive value than a relative threshold. Furthermore, in a group of 121 patients, Tutino and coworkers observed that MTV calculation with a fixed threshold of SUV 4.0 is less dependent on the reviewer and more reproducible than calculation using a relative threshold of 41% of SUVmax [29]. This is in line with our observation that MTV and TLG using a fixed cut-off of 4.0 may be slightly superior in terms of predicting PET-2 positivity. Here we have observed that MTV4.0 works comparably well in a cohort in which no SUV standardization between participating PET centers has been performed.
Due to the limited number of survival events, our analyses were restricted to PET-2 positivity, and a further investigation of the influence on long-term efficacy in terms of progression-free survival is pending. However, as the prognostic influence of PET-2 on progression-free survival has been demonstrated for the HD16 trial [6], PET-2 could be regarded as a surrogate for longer-term efficacy. In order to further improve response prediction and risk-adapted individualization of therapy, additional risk factors are needed. Such risk factors might include, but need not be restricted to, MTV and TLG, possibly in combination with PET-2. New biomarkers such as thymus and activation-regulated chemokine or cell-free DNA would be well worth further investigation for individual tailoring of treatment in patients with Hodgkin lymphoma [30].

Conclusion
MTV and TLG show predictive value after two cycles of ABVD in early-stage favorable Hodgkin lymphoma. When determining MTV and TLG, due to higher reproducibility and a slight advantage over a relative threshold, we favor a fixed cut-off of SUV 4.0.
However, it remains to be shown whether these factors can have a useful impact on prognosis when applied in combination with PET-2 assessment and other biomarkers in early-stage Hodgkin lymphoma.