Modeling and simulation of maintenance treatment in first-line non-small cell lung cancer with external validation

Maintenance treatment (MTx) in responders following first-line treatment has been investigated and practiced for many cancers. Modeling and simulation may support interpretation of interim data and development decisions. We aimed to develop a modeling framework to simulate overall survival (OS) for MTx in NSCLC using tumor growth inhibition (TGI) data. TGI metrics were estimated using longitudinal tumor size data from two Phase III first-line NSCLC studies evaluating bevacizumab and erlotinib as MTx in 1632 patients. Baseline prognostic factors and TGI metric estimates were assessed in multivariate parametric models to predict OS. The OS model was externally validated by simulating a third independent NSCLC study (n = 253) based on interim TGI data (up to progression-free survival database lock). The third study evaluated pemetrexed + bevacizumab vs. bevacizumab alone as MTx. Time-to-tumor-growth (TTG) was the best TGI metric to predict OS. TTG, baseline tumor size, ECOG score, Asian ethnicity, age, and gender were significant covariates in the final OS model. The OS model was qualified by simulating OS distributions and hazard ratios (HR) in the two studies used for model-building. Simulations of the third independent study based on interim TGI data showed that pemetrexed + bevacizumab MTx was unlikely to significantly prolong OS vs. bevacizumab alone given the current sample size (predicted HR: 0.81; 95 % prediction interval: 0.59–1.09). Predicted median OS was 17.3 months and 14.7 months in the two arms, respectively. These simulations are consistent with the results of the final OS analysis published 2 years later (observed HR: 0.87; 95 % confidence interval: 0.63–1.21). Final observed median OS was 17.1 months and 13.2 months in the two arms, respectively, consistent with our predictions. A robust TGI-OS model was developed for MTx in NSCLC. TTG captures treatment effect.
The model successfully predicted the OS outcomes of an independent study based on interim TGI data and thus may facilitate trial simulation and interpretation of interim data. The model was built based on erlotinib data and externally validated using pemetrexed data, suggesting that TGI-OS models may be treatment-independent. The results supported the use of longitudinal tumor size and TTG as endpoints in early clinical oncology studies.


Background
There is still an unmet medical need in the treatment of non-small cell lung cancer (NSCLC) in both the first-line and recurrent settings. Maintenance treatment has been investigated in a number of trials in patients with disease control (i.e. without progressive disease) during first-line therapy, with the goal of prolonging time to disease progression (progression-free survival, PFS), improving quality of life and ultimately prolonging overall survival (OS) [1][2][3][4]. However, the risk-benefit ratio of maintenance therapy in NSCLC is still unclear, and several aspects of this strategy have raised considerable debate [2]. Therefore, models that could predict the clinical outcomes of maintenance therapy may be of great importance to practitioners and drug developers.
Modeling and simulation may provide quantitative support for interpretation of interim data and development decisions in oncology [5,6]. Tumor response of patients can be characterized using tumor growth inhibition (TGI) metrics, which are estimated based on modeling of longitudinal tumor size data. TGI metrics have been shown to predict treatment effect on OS in solid tumors and in multiple myeloma [5]. These TGI metrics include model-based estimates of change in tumor size from baseline at end of cycle 2 (e.g. week 6 or 8), tumor growth rate and time to tumor regrowth [5]. TGI metrics could be used as alternative endpoints [7] in early clinical studies to optimize drug dosing and to support clinical trial design for investigational anti-cancer treatments [5,6].
Although a few models linking OS with TGI metrics and prognostic factors have been published for NSCLC first-line [8][9][10] and second-line [8] therapies, there has been no investigation of TGI metrics and of their link to OS in the context of maintenance therapy to date. Furthermore, there is insufficient published external validation of such models. External validation is critical for assessing the treatment independence of the models and for favoring their acceptance [5]. Finally, the OS models are assumed to be disease-specific but treatment-independent. However, to date, there has been insufficient validation of the treatment-independence assumption.
Accumulation of valuable clinical data has made it possible to build and externally validate a TGI-OS model for maintenance therapy in NSCLC patients whose disease did not progress during first-line therapy. Erlotinib maintenance prolonged both PFS [11] and OS [12] in the SATURN trial. The addition of erlotinib to bevacizumab during maintenance therapy significantly prolonged PFS but not OS compared to the bevacizumab-only maintenance in the ATLAS trial [13]. The AVAPERL trial compared maintenance bevacizumab plus pemetrexed vs. bevacizumab alone and showed a significant prolongation of PFS [14] but not of OS [15] following bevacizumab plus pemetrexed compared to bevacizumab alone.
The objectives of this work were 1) to develop a model for OS after maintenance therapy in NSCLC based on erlotinib data from SATURN and ATLAS, and 2) to prospectively predict the probability of success of the AVAPERL study and perform an external validation by simulating the OS outcomes of the AVAPERL study (pemetrexed data) based on interim tumor size data (up to PFS database lock).

Trials and data
Data were collected from all patients enrolled in three studies evaluating maintenance treatment after first-line NSCLC therapy. In all studies, patients whose disease did not progress after four cycles of first-line treatment were randomized to maintenance treatment. Details of the studies can be found in the respective papers, in the introduction section and in Table 1. The studies complied with the Declaration of Helsinki and Good Clinical Practice guidelines, and were approved at all investigating centers by local ethics committees. All patients provided written informed consent for participation and publication of the data [11][12][13][14][15]. A separate ethics statement was not required for this analysis, as one has been provided in each of the three individual studies [11][12][13][14][15]. The SATURN trial compared maintenance erlotinib vs. placebo in patients whose disease did not progress after four cycles of platinum-based first-line chemotherapy [11,12]. The ATLAS trial compared maintenance erlotinib plus bevacizumab vs. bevacizumab alone in patients whose disease did not progress after four cycles of platinum-doublet chemotherapy in combination with bevacizumab [13]. The AVAPERL trial compared maintenance bevacizumab plus pemetrexed vs. bevacizumab alone in patients whose disease did not progress after four cycles of first-line chemotherapy of cisplatin plus pemetrexed in combination with bevacizumab [14,15].
The following baseline patient characteristics were tested as prognostic factors for OS based on SATURN and ATLAS data: age, gender, ethnicity, Eastern Cooperative Oncology Group (ECOG) score, smoking status, tumor size, and histology. In addition, study effects and response to first-line therapy were investigated. Interim AVAPERL data consisted only of longitudinal tumor size data collected by the time of PFS database lock (data cutoff: July 2011) and baseline patient characteristics.

Tumor growth inhibition metrics
The full TGI profile was modeled using equations adapted from previously published simplified TGI models [16] (Fig. 1) that were fit to data from evaluable patients using a nonlinear mixed-effect modeling (population) approach (NONMEM, version 7, FOCE algorithm with interaction) [17]. To be evaluable in this analysis, patients had to have at least one tumor size measurement after randomization to maintenance treatment. Tumor size was assessed as the sum of longest diameters of target lesions by Response Evaluation Criteria In Solid Tumors (RECIST) [18,19]. Shrinkage in model-parameter estimates was estimated as previously described [20]. Model fitting was assessed using standard goodness-of-fit plots.
Two patient-level TGI metrics were calculated based on individual posthoc parameter estimates: the time to tumor regrowth (TTG) [16], and the week 8 ECTS (early change in tumor size) that represented early tumor shrinkage and was calculated as the ratio of model-predicted tumor size at week 8 to baseline estimated by the model. Equations are displayed in Fig. 1. Only the TGI metrics during the maintenance phase were of interest and were calculated.
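The equations of Fig. 1 are not reproduced in the text. Purely as an illustration of how the two metrics can be derived from individual parameter estimates, the sketch below assumes one commonly cited simplified TGI form, y(t) = y0 · exp(KL·t − (KD/λ)·(1 − exp(−λ·t))), for which the tumor size nadir occurs at TTG = ln(KD/KL)/λ. This functional form, the parameter names (`y0`, `kl`, `kd`, `lam`) and the numerical values are assumptions made for this example; the adapted model used in this analysis (which also covers the first-line phase) may differ.

```python
import math

def tumor_size(t, y0, kl, kd, lam):
    """Assumed simplified TGI curve: growth rate kl, shrinkage rate kd decaying at rate lam."""
    return y0 * math.exp(kl * t - (kd / lam) * (1.0 - math.exp(-lam * t)))

def time_to_growth(kl, kd, lam):
    """Time of the nadir of the assumed curve; 0 if the tumor never shrinks (kd <= kl)."""
    if kd <= kl:
        return 0.0
    return math.log(kd / kl) / lam

def early_change(week, y0, kl, kd, lam):
    """ECTS as defined in the text: model-predicted size at `week` over model baseline."""
    return tumor_size(week, y0, kl, kd, lam) / tumor_size(0.0, y0, kl, kd, lam)

# Hypothetical individual parameters (time in weeks, size in cm); not study estimates
params = dict(y0=6.0, kl=0.005, kd=0.03, lam=0.05)
ttg = time_to_growth(params["kl"], params["kd"], params["lam"])
ects8 = early_change(8, **params)
```

Under these made-up values the curve shrinks first and regrows later, so the week 8 ratio is below 1 and TTG falls well after week 8, which is the kind of separation between early shrinkage and later regrowth that the two metrics are meant to capture.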

Overall survival model development
Data from SATURN and ATLAS were used to build the OS model. The impact of individual factors on OS was assessed by Kaplan-Meier and Cox regression analyses using the survfit and coxph functions, respectively, in R (version 2.15.0) [21]. The baseline patient prognostic factors together with the TGI metrics were tested to explain variability in OS.
A parametric survival regression model (using the survreg function in R version 2.15.0) was developed to describe the OS distribution. The probability density function that best described the observed survival times was selected among normal, lognormal, Weibull, logistic, log-logistic, and exponential based on the difference in Akaike information criterion (AIC) [22] between the alternative models.
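The AIC comparison can be sketched as follows. This is a minimal illustration of the criterion itself (AIC = 2k − 2·log-likelihood, lower is better), not the R workflow used in the analysis; the log-likelihood values and parameter counts are invented placeholders and do not come from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion for a fitted model."""
    return 2 * n_params - 2 * log_likelihood

def select_by_aic(fits):
    """fits: dict name -> (log_likelihood, n_params). Returns (best name, delta-AIC per model)."""
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    deltas = {name: score - scores[best] for name, score in scores.items()}
    return best, deltas

# Placeholder fits for three candidate distributions (values invented for illustration)
fits = {
    "candidate_A": (-1050.2, 2),
    "candidate_B": (-1042.7, 2),
    "candidate_C": (-1061.9, 1),
}
best, deltas = select_by_aic(fits)
```

Reporting the differences relative to the best model, rather than raw AIC values, makes it easier to see how strongly each alternative distribution is disfavored.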
A "full" model was built by including all significant covariates (baseline prognostic factors, TGI metrics) from the Cox univariate analysis with a significance level of p < 0.05 per the log-likelihood ratio test where the difference in −2*log-likelihood (score) between alternative models follows a χ 2 distribution. The score indicates the level of significance for the association between this covariate and OS: the higher the score, the more significantly this covariate is associated with OS. Then a backward stepwise elimination was carried out. At each elimination step, one covariate was removed from the model. If the reduced model (without this removed covariate) became significantly worse (p < 0.01), the removed covariate stayed in the model. The relative influence of each remaining covariate on the model was re-evaluated by deleting it from the reduced model on an individual basis with a significance level of p < 0.01. The backward elimination resulted in the final model, in which all covariates were significant.
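A backward elimination of this kind can be sketched as below. The sketch assumes each covariate contributes one degree of freedom, so the likelihood ratio test p-value is the 1-df χ² tail probability, and it uses a common variant in which the least significant covariate is tried first at each step; the exact ordering used in this analysis is not stated. The function `m2ll` (returning −2·log-likelihood for a refitted model) is a hypothetical callback, and the toy gains are made-up numbers.

```python
import math

def lrt_p_value(delta_m2ll):
    """P-value of a 1-df likelihood ratio test (chi-square upper tail)."""
    return math.erfc(math.sqrt(max(delta_m2ll, 0.0) / 2.0))

def backward_eliminate(covariates, m2ll, alpha=0.01):
    """Remove covariates one at a time while removal is not significant at `alpha`.

    `m2ll` maps a frozenset of covariate names to the -2*log-likelihood of the
    model refitted with exactly those covariates (supplied by the caller).
    """
    current = frozenset(covariates)
    while current:
        base = m2ll(current)
        # p-value for dropping each remaining covariate from the current model
        trials = {c: lrt_p_value(m2ll(current - {c}) - base) for c in current}
        weakest = max(sorted(trials), key=trials.get)
        if trials[weakest] < alpha:
            break  # every remaining covariate is significant
        current = current - {weakest}
    return current

# Toy refit: each covariate lowers -2LL by a fixed, made-up amount
gain = {"TTG": 150.0, "ECOG": 20.0, "smoking": 2.0}

def toy_m2ll(selected):
    return 1000.0 - sum(gain[c] for c in selected)

kept = backward_eliminate(gain, toy_m2ll)
```

In a real analysis `m2ll` would refit the parametric survival model for each covariate subset; the toy version only mimics the bookkeeping.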
The simulation performance of the model was evaluated using a posterior predictive check. OS distributions and hazard ratios (HR) in SATURN and ATLAS were simulated 1000 times. For each simulated study replicate, model parameters were sampled from the estimated mean values and the uncertainty in parameter estimates. Censoring was assumed to be 30 % as in the original data.
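The replicate loop of such a posterior predictive check can be sketched as follows. The sketch assumes a lognormal accelerated failure time model, log(T) = intercept + Σ β·covariate + σ·ε, and, as a simplification, samples each parameter from an independent normal distribution defined by its estimate and standard error (correlations between estimates are ignored). Censoring is left out here. The coefficient values, the covariate name `log_ttg` and the patient list are hypothetical.

```python
import math
import random
import statistics

def sample_parameters(estimates, std_errors, rng):
    """Draw one parameter set assuming independent normal uncertainty."""
    return {name: rng.gauss(estimates[name], std_errors[name]) for name in estimates}

def simulate_os(patients, params, rng):
    """Draw one survival time per patient from the assumed lognormal model."""
    times = []
    for covariates in patients:
        mu = params["intercept"] + sum(params[k] * v for k, v in covariates.items())
        times.append(math.exp(mu + params["sigma"] * rng.gauss(0.0, 1.0)))
    return times

def replicate_medians(patients, estimates, std_errors, n_rep, seed=1):
    """Median simulated OS for each study replicate, with resampled parameters."""
    rng = random.Random(seed)
    medians = []
    for _ in range(n_rep):
        params = sample_parameters(estimates, std_errors, rng)
        medians.append(statistics.median(simulate_os(patients, params, rng)))
    return medians

# Made-up coefficients (log-weeks scale) and patients, for illustration only
estimates = {"intercept": 4.0, "log_ttg": 0.3, "sigma": 0.8}
std_errors = {"intercept": 0.05, "log_ttg": 0.02, "sigma": 0.02}
patients = [{"log_ttg": math.log(w)} for w in (5, 10, 20, 40)] * 50
medians = replicate_medians(patients, estimates, std_errors, n_rep=200)
```

The spread of `medians` across replicates reflects both parameter uncertainty and sampling variability, which is what allows an observed value to be compared against a prediction interval.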

Simulations
OS in the AVAPERL study was simulated based on TGI metrics estimated using interim tumor size data to predict the likelihood of a successful OS outcome for AVAPERL and further assess the performance of the OS model (external validation). In order to calculate the prediction interval and make statistical inferences, the study was simulated multiple times (20,000) by sampling survival model parameters from their estimated uncertainty distribution. Patient survival times were drawn from the appropriate survival distribution defined by model parameters, baseline prognostic factors and TGI metric of AVAPERL patients. Censoring was simulated by sampling patient study duration, assumed to be independent of death. Patient survival times were censored assuming a uniform distribution of patient study duration from 50 to 140 weeks, which was consistent with the minimum and the maximum time period a patient stayed in the SATURN study without a death event. For each of the replicates, simulated data were analyzed by Kaplan-Meier estimation and Cox regression. Kaplan-Meier estimates of OS distributions and the HR used to compare the two arms were summarized by median and 95 % prediction interval (PI) across the replicates.
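The censoring and summary steps can be sketched as below. Each simulated death time is censored at an independent Uniform(50, 140) week study duration, a Kaplan-Meier median is computed per replicate, and replicates are summarized by their median and an empirical 95 % interval using a simple nearest-rank quantile. The death-time distribution in the usage loop is a made-up stand-in for the OS model, and the Cox regression step for the HR is not shown.

```python
import random

def apply_censoring(death_times, rng, low=50.0, high=140.0):
    """Censor each death time at an independent Uniform(low, high) study duration (weeks)."""
    observed = []
    for t in death_times:
        duration = rng.uniform(low, high)
        observed.append((min(t, duration), t <= duration))  # (time, death observed?)
    return observed

def km_median(observed):
    """Kaplan-Meier median: first event time with S(t) <= 0.5, or None if not reached."""
    at_risk = len(observed)
    surv = 1.0
    # Sort by time; at tied times, deaths are processed before censorings
    for time, event in sorted(observed, key=lambda r: (r[0], not r[1])):
        if event:
            surv *= 1.0 - 1.0 / at_risk
            if surv <= 0.5:
                return time
        at_risk -= 1
    return None

def prediction_interval(values, level=0.95):
    """Median and empirical lower/upper quantiles (nearest rank) across replicates."""
    s = sorted(values)

    def q(p):
        return s[min(len(s) - 1, max(0, round(p * (len(s) - 1))))]

    return q(0.5), q((1 - level) / 2), q((1 + level) / 2)

rng = random.Random(42)
medians = []
for _ in range(200):
    deaths = [rng.lognormvariate(4.3, 0.9) for _ in range(120)]  # made-up distribution
    med = km_median(apply_censoring(deaths, rng))
    if med is not None:
        medians.append(med)
center, lower, upper = prediction_interval(medians)
```

Replicates in which the Kaplan-Meier curve never drops to 0.5 return `None` and are skipped here; how such replicates should be handled is a design choice that the text does not specify.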

Data
Patients with at least one post-randomization tumor size measurement were included in this analysis. Overall 1534 patients were evaluable to estimate TGI metrics used for building the OS model: 837 (94 %) out of 889 patients from SATURN, and 697 (94 %) out of 743 patients from ATLAS. Interim AVAPERL data used as the external validation dataset were collected by the time of PFS database lock (data cutoff: July 2011) and included 231 evaluable patients out of 245 (94 %) randomized to maintenance treatment.

Tumor size model
The simplified TGI model adequately described the observed tumor size data, as shown by goodness-of-fit plots and individual fits (Additional file 1: Figure S1 and Additional file 2: Figures S2). Parameters were adequately estimated with small standard errors and shrinkage (Table 2), except that inter-individual variability could not be estimated on λ1 due to the limited number of observations during the first-line treatment phase. TGI metric estimates (TTG and week 8 ECTS) that were calculated from the TGI model parameters (Table 2) using equations displayed in Fig. 1 were highly variable: the range from 5th to Figure S2).

Overall survival model
In univariate Cox analysis (Table 3), TTG was the most significant covariate associated with OS (score 151.7) and much better than week 8 ECTS (score 45.1). The most significant baseline prognostic factors and patient characteristics were tumor size, gender, smoking status, Asian ethnicity and ECOG score (scores 8 to 50, p < 0.0001). OS also tended to be longer in erlotinib-treated patients and in the ATLAS trial compared to SATURN (p < 0.01). OS distribution by quartiles of TTG is shown in Fig. 2.
A lognormal distribution best described the OS distribution (lower AIC than the other distributions). All covariates that were significant in the Cox univariate analysis were included in the "full" model and underwent backward stepwise elimination. The final model included TTG and the following baseline prognostic factors: baseline tumor size, ECOG score (0 vs. >0), Asian ethnicity, age and gender. All parameters in the final OS model were estimated with good precision (Table 4). According to the model, a good prognosis is predicted for patients with longer TTG (treatment effect), small baseline tumor size, age below 55 years, Asian ethnicity, ECOG score 0 and for female patients.
The model was evaluated by simulating OS distributions in each of the study arms (Fig. 3) and the HR of treatment vs. control arm in SATURN and ATLAS (Fig. 4a and b). The observed HR (0.79 for SATURN and 0.93 for ATLAS) was within the 95 % PI predicted by the model (0.74-0.97 for SATURN and 0.70-1.00 for ATLAS).

Simulation
The final OS model was applied to prospectively predict the expected OS outcome of the AVAPERL study (external validation). The goal was to predict the likelihood of a successful OS outcome using interim tumor size data collected by the time of PFS database lock (data cutoff: July 2011). This dataset was not used for model-building (Table 1). Median OS was not yet reached at the time of data cutoff, and the immature OS data that were observed by the time of data cutoff were not used. Patients in the AVAPERL study had more favorable prognostic factors than those from SATURN and ATLAS, with a smaller proportion of ECOG score >0 (52 % vs. 66-69 %) and smaller baseline tumor size (5.2 cm vs. >6 cm) (Table 1). Simulations indicated that pemetrexed plus bevacizumab as maintenance treatment in AVAPERL was unlikely to demonstrate a significant OS prolongation vs. bevacizumab alone. The expected HR was 0.81 with a 95 % PI of 0.59-1.09 (62 % of events), which contained 1 (Fig. 4c). Predicted median OS was 17.3 and 14.7 months in the two arms, respectively. These prospective simulations were consistent with the results of the final OS analysis published recently [15]: the final observed HR was 0.87 with a 95 % confidence interval of 0.63-1.21 (58 % of events). The

Discussion
Maintenance treatment in responders after induction first-line treatment, without waiting for disease progression and start of a new line of therapy, is a therapeutic strategy investigated and used in several tumor types including adult and pediatric acute lymphocytic leukemia [23,24], follicular non-Hodgkin lymphoma [25,26], multiple myeloma [27], breast cancer [28], metastatic colorectal cancer [29,30], and advanced ovarian cancer [31][32][33]. Although well established for certain hematologic cancers, maintenance therapy has only recently become a treatment option for NSCLC [1][2][3]. The risk-benefit ratio of maintenance therapy in NSCLC is still unclear, and the thoracic oncology community has seen considerable debate over several aspects of this strategy [2]. Even when maintenance treatment prolongs PFS and possibly OS, it is unclear whether OS is prolonged compared to the classical paradigm of first-line followed by second-line treatment.
Model-based approaches are gaining momentum to optimize anti-cancer drug usage and development [6]. Estimates of TGI metrics from modeling of longitudinal tumor size data have been used to predict clinical outcomes and simulate clinical trials [5] in a variety of settings including first- and second-line treatment of NSCLC [8][9][10]. We present here an adaptation of the modeling framework for maintenance treatment in NSCLC. The framework was developed based on two erlotinib maintenance studies and assessed by simulating the outcome of an independent pemetrexed study. As observed in first-line treatment [9,10], an estimate of time to tumor regrowth (TTG) after start of maintenance treatment captured drug effect, i.e. an OS model incorporating TTG and baseline prognostic factors was able to simulate the erlotinib HR in SATURN and ATLAS. Baseline prognostic factors in the model are well-known prognostic factors for OS: a good prognosis for patients with small baseline tumor size, age below 55 years, Asian ethnicity, ECOG score 0 and for female patients. Smoking status and histology (squamous vs. non-squamous), which were significant prognostic factors in the univariate analysis, were not retained in the final multivariate model.
As previously discussed [16], the TGI model does not account for exposure to the treatment drugs and is not subjected to any simulation-based assessment (e.g. visual predictive check) because it is not meant to be used for simulation but only to estimate the TGI metrics to be used in the OS model. The TGI model could take other forms as well, such as a combination of exponential and/or linear models [8,34] or a simple spline function. Therefore, the fundamental assumption of constant exposure over time that was previously used [35] to derive this TGI model from the more complex exposure-driven model is irrelevant here, as the model is not used in simulations of response for alternative exposure. There is also no need to assess covariate effects on the TGI model parameters because the model is not used to simulate tumor sizes in new patients.
We performed a two-stage analysis, meaning that we first estimated TGI metrics and then developed the OS model, and we thereby ignored time-dependent hazard [36]. Additionally, simulations have shown that TTG was not confounded with OS [37,38]. In the OS model, the censoring model is meant to mimic the duration (treatment plus follow-up period) a patient stays in the study if no death event occurs. The distribution of this duration is defined per protocol by the maximum duration of the study and the patient inclusion rate. If a patient is predicted to die after their predicted duration in the study, this patient is censored. The distribution of study duration is Another limitation of our analysis is that patients needed to have at least two tumor size measurements in the maintenance phase to be evaluable in the TGI model because the TGI parameters were unidentifiable with only one tumor size measurement. The excluded patients, who died or dropped out of the study before the first tumor size measurement, may have had rapidly growing tumors. However, this may not have a significant impact on this analysis because 94 % of the patients were evaluable.
The model successfully simulated the OS outcomes of the pemetrexed maintenance study AVAPERL based on interim tumor size data collected by the time of PFS database lock before median OS was even reached. This is the first modeling framework for maintenance treatment and one of the few such frameworks validated in simulating an independent study with a drug with a different mechanism of action (pemetrexed) compared to the one used to develop the model (erlotinib), providing support to the hypothesis that TGI metrics capture drug effect independent of treatment [5]. This framework may be used to support design and interim analysis of upcoming maintenance studies and to help in the selection of patients most likely to benefit from maintenance treatment.

Conclusion
In conclusion, a robust TGI-OS model linking OS with TGI metrics and prognostic factors was developed for maintenance therapy following first-line NSCLC treatment. The model successfully predicted the OS outcomes of an independent study (AVAPERL) based on interim tumor size data (up to PFS database lock), indicating that the model may be used for trial simulation and facilitate interpretation of interim data and development decisions. The model was built based on erlotinib data and externally validated using pemetrexed data, suggesting that TGI-OS models may be treatment-independent. The results also supported the use of longitudinal tumor size and TTG as endpoints in early clinical oncology studies.