Disease-free survival as a surrogate endpoint for overall survival in adjuvant trials of pancreatic cancer: evidence from 20 randomized controlled trials

Background: We aimed to assess whether disease-free survival (DFS) could serve as a reliable surrogate endpoint for overall survival (OS) in adjuvant trials of pancreatic cancer. Methods: We systematically reviewed adjuvant randomized trials for non-metastatic pancreatic cancer after curative resection that reported a hazard ratio (HR) for DFS and OS. We assessed the correlation between the treatment effects (HRs) on DFS and OS, weighted by sample size or by the precision of the hazard ratio estimate, assuming fixed and random effects, and calculated the surrogate threshold effect (STE). We also performed sensitivity analyses and a leave-one-out cross-validation approach to evaluate the robustness of our findings. Results: After screening 450 relevant articles, we identified a total of 20 qualifying trials comprising 5170 patients for quantitative analysis. We noted a strong correlation between the treatment effects for DFS and OS, with a coefficient of determination of 0.82 in the random effect model, 0.82 in the fixed effect model, and 0.80 with sample size weighting; the robustness of this finding was further verified by the leave-one-out cross-validation approach. Sensitivity analyses with restriction to phase 3 trials, large trials, trials with mature follow-up periods, and trials with adjuvant therapy versus adjuvant therapy strengthened the correlation (0.75 to 0.88) between DFS and OS. The STE was 0.96 for DFS. Conclusions: DFS could be regarded as a surrogate endpoint for OS in adjuvant trials of pancreatic cancer. In future similar adjuvant trials, a hazard ratio for DFS of 0.96 or less would predict a treatment impact on OS.

meta-analysis to evaluate whether DFS could be used as a surrogate endpoint to measure the effect of the adjuvant therapy of pancreatic cancer.

Search strategy and data collection
In December 2018, we systematically searched Medline and Embase using the key words "pancreatic neoplasm", "chemotherapy", "radiotherapy", and "chemoradiotherapy", limited to "clinical trial", "controlled clinical trial" or "randomized controlled trial". We also searched the ClinicalTrials.gov and Cochrane Library databases, and manually searched the references of the included trials and the abstracts of two conference proceedings (the 2019 American Society of Clinical Oncology [ASCO] annual meeting and the European Society for Medical Oncology [ESMO] 2018 congress) to retrieve additional studies.
Inclusion criteria were randomized controlled trials of adjuvant treatment for non-metastatic pancreatic cancer after curative resection that reported hazard ratios (HRs) for OS and DFS in a full-text publication. For each trial, the following data were collected by two independent investigators (RCN and SQY): OS and DFS results, final publication year, trial conduct period, type of study (phase II or III), staging information, treatment arms, number of patients, primary endpoint, and median follow-up time.

Statistical analysis
This analysis is at the trial level throughout, with no individual patient-level data being incorporated.
We computed the correlation between the treatment effects (HRs) on DFS and OS through a linear regression model [27]. To account for differences between studies in size and in the precision of the HR estimates, we weighted the analysis proportionally to the study sample size or to the precision of the observed treatment effects, giving three weighting strategies (sample size, fixed effect, and random effect) [30]. The fixed effect meta-analysis presumes that a common treatment effect exists across all trials and uses the estimated inverse variance as weights, whereas the random effect meta-analysis permits the treatment effect to differ from trial to trial and incorporates the potential among-trial variation of effects into the weights. Following A'Hern et al. [31], we down-weighted the sample size if trials reported more than two treatment arms.
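The three weighting strategies described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trial values are hypothetical, the precision weights are computed from the OS log hazard ratios (the text does not say which endpoint supplies the precision), the between-trial variance uses the DerSimonian-Laird method-of-moments estimator (the text does not name an estimator), and the down-weighting of multi-arm trials is not implemented. The function names are ours, not the authors'.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile


def se_log_hr(ci_low, ci_high, z=Z_95):
    """Standard error of log(HR), recovered from a reported 95% CI."""
    return (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)


def dersimonian_laird_tau2(effects, variances):
    """Between-trial variance (tau^2) by the method of moments.

    Assumption: the source does not name its tau^2 estimator.
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (q - (len(effects) - 1)) / c)


def trial_weights(n_patients, effects, variances):
    """Return the three sets of regression weights, one value per trial."""
    tau2 = dersimonian_laird_tau2(effects, variances)
    return {
        "sample_size": [float(n) for n in n_patients],
        "fixed_effect": [1.0 / v for v in variances],
        "random_effect": [1.0 / (v + tau2) for v in variances],
    }


# Hypothetical trials: (patients, HR for OS, lower 95% CI, upper 95% CI).
trials = [(354, 0.76, 0.61, 0.95), (400, 0.88, 0.70, 1.10), (300, 1.02, 0.80, 1.30)]
log_hrs = [math.log(hr) for _, hr, _, _ in trials]
variances = [se_log_hr(lo, hi) ** 2 for _, _, lo, hi in trials]
weights = trial_weights([n for n, _, _, _ in trials], log_hrs, variances)
```

Because the random effect weight adds the same between-trial variance to every trial, it is never larger than the corresponding fixed effect weight, and it spreads influence more evenly across large and small trials.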
We calculated the weighted coefficient of determination (R²) to quantify the variation explained by the surrogate endpoint, with an R² value higher than 0.75 considered a strong correlation, higher than 0.5 good, higher than 0.25 moderate, and equal to or lower than 0.25 poor. To verify the robustness of our findings, we performed several sensitivity analyses that restricted the analyses to phase 3 trials, large trials (≥ 200 included patients), trials with mature follow-up periods (median follow-up ≥ 24 months), trials with adjuvant therapy versus observation, and trials with adjuvant therapy versus adjuvant therapy. We also calculated the surrogate threshold effect (STE), which was defined as the minimum treatment effect on the surrogate necessary to predict an OS benefit [32]. The upper limit of the confidence interval for the estimated surrogate treatment effect should fall below the STE to predict a non-zero effect on OS. For each meta-analysis, we applied an internal validation through leave-one-out analysis to evaluate the prediction accuracy of the surrogate model [33]. Each trial was left out once, and the surrogate model was built with the other trials. This model was then re-applied to the left-out trial, and a 95% prediction interval was calculated to compare the predicted and observed treatment effect on OS. We used R version 3.4.0 for all statistical analyses (http://www.r-project.org).
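As a rough illustration of the weighted R² and the leave-one-out validation, the following Python sketch fits a weighted least-squares line and re-predicts each left-out trial. It is not the authors' R code. It assumes the regression is run on the log hazard ratio scale, that each trial's variance is proportional to the inverse of its weight, and that a normal critical value of 1.96 is an acceptable stand-in for the t quantile with n - 2 degrees of freedom. All data values and function names are hypothetical.

```python
import math


def weighted_fit(x, y, w):
    """Weighted least-squares line y = intercept + slope * x, with weighted R^2."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Weighted residual variance on n - 2 degrees of freedom.
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    return {"intercept": intercept, "slope": slope, "r2": sxy ** 2 / (sxx * syy),
            "sigma2": rss / (len(x) - 2), "xbar": xbar, "sxx": sxx, "sw": sw}


def classify_r2(r2):
    """Strength categories as defined in the text."""
    if r2 > 0.75:
        return "strong"
    if r2 > 0.5:
        return "good"
    if r2 > 0.25:
        return "moderate"
    return "poor"


def prediction_interval(fit, x0, w0, crit=1.96):
    """Approximate 95% prediction interval for a new trial with weight w0."""
    pred = fit["intercept"] + fit["slope"] * x0
    var = fit["sigma2"] * (1.0 / w0 + 1.0 / fit["sw"]
                           + (x0 - fit["xbar"]) ** 2 / fit["sxx"])
    half = crit * math.sqrt(var)
    return pred - half, pred, pred + half


def leave_one_out(x, y, w, crit=1.96):
    """Refit without each trial in turn and check whether it is covered."""
    results = []
    for i in range(len(x)):
        keep = [j for j in range(len(x)) if j != i]
        fit = weighted_fit([x[j] for j in keep], [y[j] for j in keep],
                           [w[j] for j in keep])
        lo, pred, hi = prediction_interval(fit, x[i], w[i], crit)
        results.append((lo <= y[i] <= hi, lo, pred, hi))
    return results


# Hypothetical log(HR) values for DFS and OS, with sample-size weights.
log_hr_dfs = [-0.30, -0.20, -0.10, 0.00, 0.10]
log_hr_os = [-0.25, -0.18, -0.05, 0.02, 0.06]
n_patients = [100.0, 200.0, 150.0, 120.0, 180.0]
fit_all = weighted_fit(log_hr_dfs, log_hr_os, n_patients)
loo = leave_one_out(log_hr_dfs, log_hr_os, n_patients)
covered = sum(1 for inside, _, _, _ in loo if inside)
```

The count in `covered` plays the same role as the number of comparisons whose observed OS effect falls inside the prediction interval. With few trials, the normal critical value gives intervals that are somewhat too narrow, so a t quantile would be the more careful choice.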

Results
After the systematic literature review, we identified 20 qualifying trials (5 phase 2 trials and 15 phase [...] latter publication in the present study. The CONKO-001 trial was also first published in 2007 [16] and was updated in 2013 [7]. Overall, the 20 trials included 23 comparisons for quantitative analysis, among which nine comparisons reported improvement in OS, and eleven comparisons reported improvement in DFS (Table 2).
We first assessed the degree of association using the sample size weighting strategy, and observed that the correlation between the treatment effects on DFS and OS was strong (R² = 0.80, 95% CI: 0.49 to 0.99) (Figure 2). Additionally, weighting by precision, whether allowing for between-trial differences in treatment effect (random effect model) or assuming none (fixed effect model), slightly strengthened the degree of association (fixed effect: 0.82, 0.52 to 0.99; random effect: 0.82, 0.52 to 0.99). We then calculated an STE of 0.96, indicating that in a future adjuvant trial the upper limit of the confidence interval of the DFS HR would need to fall below 0.96 to predict an OS benefit with 95% confidence.
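One way to see how an STE can be obtained from a fitted surrogate model is sketched below in Python. It assumes the common definition in which the STE is the DFS hazard ratio at which the upper 95% prediction limit for the OS hazard ratio equals 1, works on the log scale, and treats the future trial as infinitely large unless a weight is given. The regression summary values are hypothetical and do not reproduce the 0.96 reported here. The root is found by simple bisection.

```python
import math


def upper_prediction_limit(x, fit, w_new=None, crit=1.96):
    """Upper prediction limit for log(HR_OS) at a DFS effect x = log(HR_DFS)."""
    pred = fit["intercept"] + fit["slope"] * x
    # w_new=None means an infinitely large future trial (no sampling-error term).
    new_trial_term = 0.0 if w_new is None else 1.0 / w_new
    var = fit["sigma2"] * (new_trial_term + 1.0 / fit["sw"]
                           + (x - fit["xbar"]) ** 2 / fit["sxx"])
    return pred + crit * math.sqrt(var)


def surrogate_threshold_effect(fit, w_new=None, crit=1.96, x_min=-3.0, iters=200):
    """HR for DFS at which the upper prediction limit for HR_OS equals 1."""
    lo, hi = x_min, 0.0
    f_lo = upper_prediction_limit(lo, fit, w_new, crit)
    f_hi = upper_prediction_limit(hi, fit, w_new, crit)
    if f_lo > 0.0 or f_hi < 0.0:
        raise ValueError("no threshold in the search range")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if upper_prediction_limit(mid, fit, w_new, crit) > 0.0:
            hi = mid  # limit still above log(1) = 0: need a stronger DFS effect
        else:
            lo = mid
    return math.exp(0.5 * (lo + hi))


# Hypothetical weighted-regression summary (log scale); not the values of this study.
example_fit = {"intercept": 0.02, "slope": 0.8, "sigma2": 0.5,
               "sw": 2000.0, "xbar": -0.1, "sxx": 20.0}
ste = surrogate_threshold_effect(example_fit)
```

A noisier regression (larger `sigma2`) widens the prediction band and pushes the threshold further below 1, which is why a tighter DFS and OS relationship yields an STE closer to 1.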
Given the potential heterogeneity of the included studies, we performed several sensitivity analyses (Table 3), and noted that restriction of the analysis to phase 3 trials would strengthen the correlation between DFS and OS (0.82 to 0.83). When we restricted the analyses to trials with adjuvant therapy versus observation, the degree of association between DFS and OS was not strong (0.68 to 0.73) (Figure 3A). Nonetheless, we recognized that adjuvant therapy versus adjuvant therapy rather than observation is now the standard design setting for pancreatic cancer; thus, we then restricted the analyses to trials with adjuvant therapy versus adjuvant therapy, and observed a very strong correlation between DFS and OS (0.89 to 0.93). Other sensitivity analyses that restricted the analyses to large trials and trials with mature follow-up periods also exhibited strong correlations between DFS and OS (0.80 to 0.87) (Figure 3B).
Finally, we performed a leave-one-out cross validation approach to assess the accuracy of DFS in predicting OS. We noted that the observed HR for OS fell between the limits of the 95% prediction intervals in 22 of 23 comparisons, indicating that the treatment effect on DFS is a reliable predictor of OS (Figure 4).

Discussion
The point at which a potential surrogate endpoint can be considered theoretically validated has been extensively discussed [41]. The correlation approach has been widely adopted to validate surrogate endpoints in locally advanced lung cancer [25], gastric cancer [26,42] and colorectal cancer [27]. In the present study, we included a total of 20 high-quality adjuvant randomized controlled trials to evaluate the surrogacy of DFS for OS in pancreatic cancer. Our findings demonstrated that the correlation between DFS and OS was strong (0.80 to 0.82), irrespective of the weighting strategy applied. Sensitivity analyses that were restricted to phase 3 trials, large trials, trials with mature follow-up periods, and trials with adjuvant therapy versus adjuvant therapy also yielded strong or very strong correlations (0.80 to 0.93) between DFS and OS. Therefore, we propose the use of DFS as a surrogate endpoint for OS in adjuvant trials of pancreatic cancer.
Although recent advances in adjuvant chemotherapy have translated into a substantial survival benefit for pancreatic cancer, a large number of treated patients still suffer relapse or metastasis; thus, new therapeutic strategies are urgently needed. Clinicians are now evaluating the therapeutic effect of more intensive adjuvant chemotherapy, adjuvant targeted therapy and immunotherapy in pancreatic cancer after curative resection. It is well recognized that OS is the standard endpoint for clinical trials; however, phase 3 trials that use OS as the endpoint are time-consuming, which delays the clinical application of new therapeutic strategies. Therefore, reliable surrogate endpoints for OS are urgently needed in adjuvant trials of pancreatic cancer. DFS is the most reasonable candidate and has been set as the primary endpoint in several phase 3 trials [7, 17-19, 23, 37]. A previous meta-analysis reported that the correlation between DFS and OS was not strong enough to support DFS as a reliable surrogate endpoint for OS in adjuvant trials of pancreatic cancer [28]; however, it included only 12 trials, one of which was conducted in the adjuvant setting for periampullary cancer rather than pancreatic cancer [29]. Therefore, in the present meta-analysis, we applied more rigorous criteria through three weighting strategies to address this issue. Our findings revealed that the degree of association between DFS and OS was strong, which was further verified through extensive sensitivity analyses and a leave-one-out validation approach. We believe that the robust correlation between DFS and OS in adjuvant therapy of pancreatic cancer is mainly attributable to the fact that pancreatic cancer is an aggressive tumor and that subsequent lines of therapy are limited if patients develop relapse or metastasis.
Given that adjuvant chemotherapy has shown a superior survival outcome to observation for pancreatic cancer, adjuvant chemotherapy, including gemcitabine-based or S-1-based regimens, rather than observation will be the control arm in future adjuvant trials. Interestingly, we found that the correlation between DFS and OS was not strong (0.68 to 0.73) with restriction to trials with adjuvant therapy versus observation; nonetheless, we noted a very strong correlation between DFS and OS when we restricted the analysis to trials with adjuvant therapy versus adjuvant therapy (0.89 to 0.93). Therefore, in future adjuvant trials of pancreatic cancer, DFS could serve as a robust surrogate endpoint for OS.
STE is an alternative measure for surrogate endpoint validation [32]. The closer the STE of a surrogate endpoint is to 1, the easier it is to predict an OS benefit. In the present meta-analysis, the STE was 0.96 for DFS, indicating that an adjuvant trial in pancreatic cancer producing a reduction of at least 4% in the hazard of disease recurrence or death could be expected to yield a statistically significant OS benefit.
Several limitations should be noted. First, the data for our analysis were extracted at the trial level rather than from individual patients; therefore, potential publication bias cannot be excluded. Second, the included trials spanned nearly three decades, and the ascertainment of DFS depends mainly on the imaging examinations and surveillance intervals, which may have changed considerably over time and among trials. Third, long-term follow-up was not available from all trials included in our analysis. Pancreatic cancer is a relatively aggressive malignancy with severe heterogeneity; thus, short follow-up in adjuvant trials results in fairly wide confidence intervals for the HRs of the treatment effects. In the sensitivity analysis, the correlation between DFS and OS remained strong (R² = 0.75) when we included trials with median follow-up > 24 months. Fourth, the trials included in our analysis comprised a wide range of therapeutic strategies, including adjuvant chemotherapy, radiation therapy, chemoradiotherapy, chemoimmunotherapy and targeted treatment. Although we performed sensitivity analyses to reduce the potential effect of this treatment heterogeneity, the results of our analysis should be interpreted with caution. Finally, because the present study was performed at the trial level rather than the individual level, we strongly encourage the authors of individual trials to share their data so that the results of our analysis can be further verified with individual-patient data.

Conclusions
In conclusion, our analysis suggested that DFS could serve as a reliable surrogate endpoint for OS in adjuvant trials of pancreatic cancer. In future similar adjuvant trials, a hazard ratio for DFS of 0.96 or less would predict a treatment impact on OS. However, these results should be further verified by individual-patient data analysis.

Correlation between treatment effects on DFS and OS (related to Table 3).

Leave-one-out cross-validation analysis of the prediction of OS by treatment effect on DFS: observed HR for OS for the left-out trial vs. predicted HR for OS and 95% prediction interval for the predicted HR for OS. To assess model accuracy, a leave-one-out cross-validation strategy was used: each unit of analysis was left out once, and the linear model was then constructed from scratch using the remaining data [33]. This model was then re-applied to the left-out study, and a 95% prediction interval based on the linear regression model was calculated to compare the predicted and observed treatment effect on OS. OS, overall survival; DFS, disease-free survival; HR, hazard ratio.