Time-dependent endpoints as predictors of overall survival in multiple myeloma
© Félix et al.; licensee BioMed Central Ltd. 2013
Received: 19 June 2012
Accepted: 7 March 2013
Published: 16 March 2013
Skip to main content
© Félix et al.; licensee BioMed Central Ltd. 2013
Received: 19 June 2012
Accepted: 7 March 2013
Published: 16 March 2013
Supporting health care sector decisions using time-dependent endpoints (TDEs) such as time to progression (TTP), progression-free survival (PFS), and event-free survival (EFS) remains controversial. This study estimated the quantitative relationship between median TDE and median overall survival (OS) in multiple myeloma (MM) patients.
Studies (excluding allogeneic transplantation) published from 1970 to 2011 were systematically searched (PubMed). The nonparametric Spearman’s rank correlation coefficient measured the association between median TDE and OS. The quantitative relationship between TDEs and OS was estimated with a two-step approach to a simultaneous Tobit model.
We identified 153 studies: 230 treatment arms, 22,696 patients and mean study duration of 3.8 years. Mean of median TDEs was 22.5 months and median OS was 39.1 months. Correlation coefficients of median TTP, PFS, and EFS with median OS were 0.51 (P = 0.003), 0.75 (P < 0.0001), and 0.84 (P < 0.0001), respectively. We estimate a 2.5 month (95% confidence interval, 1.7–3.2) increase in median OS for each additional month reported for median TDEs. There was no evidence that this relationship differed by type of surrogate.
TDEs predict OS in MM patients; this relationship may be valuable in clinical trial design, drug comparisons, and economic evaluation.
Multiple myeloma (MM) is the second most common hematologic malignancy, after non-Hodgkin lymphoma [1, 2]. In the US, it was estimated that over 20,500 new cases of MM and more than 10,600 deaths occurred in 2011 . Despite improved survival over the past decades, MM remains an incurable disease, with research focused on finding more effective treatments . Although improving overall survival (OS) has been the gold standard outcome for new anticancer treatments, large costly trials with long follow-up periods are required to document an impact on OS [5, 6]. Furthermore, OS can be influenced by trial design characteristics, such as crossover and sequential treatments [7, 8]. Therefore, surrogate endpoints that can be measured sooner and more frequently during the course of a clinical trial, are being used to provide an earlier indication of efficacy .
A surrogate endpoint is a measurement that can be substituted for the final endpoint (e.g., improvement in OS) to successfully measure the effect of an intervention . Common surrogate endpoints for OS used in clinical oncology trials include: response rate; time to disease progression (TTP); progression-free survival (PFS); and event-free survival (EFS) . For study conclusions to be valid, differences or changes observed in the surrogate endpoints must accurately reflect changes in the final endpoint . There is ongoing debate about the utilization of these time-dependent endpoints (TDEs) as intermediate endpoints for OS in clinical trials [12, 13], as well as their value to health authorities when assessing drug approvals and assessing costs of drug therapy [7, 14, 15].
In 1992, the US Food and Drug Administration (FDA) instituted the accelerated approval process to allow earlier marketing of drugs that treat serious, life-threatening diseases . Recently, the FDA ruled that both TTP and PFS are valid and clinically relevant TDEs that can be used in the accelerated approval process for MM agents . Although these endpoints are generally thought to be reliable in MM, their predictive value for OS is unknown. Our objective was to estimate a quantitative relationship between median TDEs and median OS from prospective published MM studies in order to address the question of what the expected median OS would be given the observed effect in the median TDE.
We systematically searched PubMed for articles published between January 1970 and February 2011 to identify experimental or observational prospective studies that assessed OS in MM using TTP, PFS, or EFS as a primary endpoint. Key search words included “myeloma” or “survival” or “progression”; see Additional file 1 for full search details. Retrospective studies were excluded, as were studies involving allogeneic transplantation, which generally target younger patients with a clinical status that differs greatly from the majority of MM patients. The TDEs considered were median TTP, PFS, and EFS. The final outcome measures considered were median OS, 12-month OS, 24-month OS, 36-month OS, or any survival data that described the proportion of patients alive or deceased and the duration of follow-up. Studies lacking surrogate endpoint outcomes or OS data were excluded; see Additional file 1 for the selection process scheme. The following variables were collected: authors, publication year, journal, study sample characteristics (period of analysis, median age, percentage of males, type and number of previous therapies), and study results (therapies used, median TDEs, median OS, 12-month OS, 24-month OS, and 36-month OS).
Using the median TDE as an independent observation to estimate median OS, with each study arm representing a single observation, presents a statistical challenge. Outcome measures such as TTP, PFS, and EFS differs on their exact definition, especially with the inclusion or exclusion of death as event . Concerning the estimation of the statistical model, this points to the presence of endogeneity of the main regressor of interest (TDE) and heterogeneity of observations. In the context of linear regression, an estimation based on an instrumental variables approach may be considered, when the endogeneity of regressors is suspected .
In addition, the data analyzed are sourced from a literature review of studies with different study designs, patient populations, and treatments. It is possible that the variance of the unexplained share (the residuals) differs among observations and that heteroskedasticity (non-constant variance) is present. In the presence of heteroskedasticity, the instrumental variables estimator yields consistent, but inefficient, estimates of the coefficients and an inconsistent estimate of the covariance matrix . One way to minimize inefficiency in the coefficient estimates is to weight each observation by the number of patients enrolled in the corresponding study arm and to use the Generalized Method of Moments (GMM) . Inconsistency of the standard error estimates may be corrected by using heteroskedasticity robust estimators of the variance–covariance matrix, such as the Huber/White sandwich estimator [20, 21].
The set of available exogenous variables (candidates to instruments) comprise the following: 12-months OS rate; proportion of females; median age; dummy variables characterizing patients by previous treatment; type of TDEs; and publication year.
Not all study arms included in the review reached the median OS by the last published follow-up; those that did not are censored observations. If, at the end of a given study, less than 50% of the patients at risk were alive, then the exact value of the dependent variable being modeled (median OS) is known. However, if the duration of the study is shorter than the number of months needed to have less than 50% of patients alive, then all that is known about the dependent variable is that it is higher than the study duration. This information is potentially relevant and was included in our analysis.
The regression of the endogenous variable on the excluded instruments and exogenous variables (Step 1) was weighted by the number of patients in each study arm and estimated using the GMM Cragg estimator . This estimator makes use of information provided by the excluded instruments (median age, percentage of males, and classification of patients by number of previous treatments) to increase efficiency in the presence of heteroskedasticity of unknown form.
Given that the censoring point varies with the study arm, a censored normal-weighted regression with the robust option  was estimated in Step 2, instead of a purely Tobit model.
Following the approach described in Smith and Blundell (1986) , variance of the estimators’ formulas was corrected for one endogenous regressor. With such corrections Smith and Blundell demonstrate that estimates for standard errors, t-statistics, and confidence intervals (CIs) are asymptotically valid and hence applicable in the context of the present analysis.
Evidence in favor of the use of the instrumental variable approach was generated by the Durbin-Wu-Hausman test for endogeneity of the TDE regressor [24–26]. The validity of the instruments used was confirmed by both a high degree of correlation between the instruments and the endogenous regressor and the orthogonality between the endogenous regressor and the error term (Hansen J-statistic) . The Breusch-Pagan test for heterogeneity  was used to check for the adequacy of the GMM estimator. The selection of the final functional form of the model was based on the RESET test , a test that checks the possible omission of relevant variables or, more specifically, validates the linearity assumption.
Alternative model specification was based on three criteria: validity of the instruments used; explanatory power of the instruments in the first-stage regression; and AIC (Akaike Information Criteria) in the second-stage (censored) regression.
The association between median TDE and median OS was quantified through Spearman’s rank correlation coefficient in a restricted subset of data consisting of trials only with simultaneously observed values for median TDE and median OS, and excluding those trials with unobserved median OS values [30, 31]. This complete-case analysis is known to result in loss of accuracy and precision when the data are not missing completely at random , but was assumed in order to avoid data imputation methods or assumptions about the distribution of the unobserved median OS values. The analysis was performed with the Stata Statistical Software (Release 11. College Station, TX: StataCorp LP).
Characteristics of the studies and patients included in the analysis
(n = 22,696)
Age in years, mean of medians ± standard deviationa
61.3 ± 7.2
Multiple myeloma, no. of patients (study arms)
Relapsed, refractory, or advanced
Mixed or not reportedb
Duration of the studies (years), mean ± standard deviation
3.8 ± 2.0
Mean of median time-dependent endpoint, months (range)
Time to progression
Median overall survival in months, mean (range)
Effect of median time-dependent endpoints (TDEs) on median overall survival (OS) estimated from a multivariate censored normal regression, by type of TDE
Median TDE, months
1.71 ; 3.20
− 0.42 ; 0.52
Median age, years
− 0.16 ; 1.34
Year of publication
− 0.39 ; 0.87
Relapsed, refractory, or advanced MM
− 0.66 ; 29.14
Mixed or not reported type MM
− 2.89 ; 16.78
TDE (time to progression)
− 2.21 ; 14.11
TDE (event-free survival)
− 11.42 ; 4.40
Treatment including Thal, Bort, or Len
− 17.89 ; 5.94
Effect of median time-dependent endpoints (TDEs) on median overall survival estimate based on alternative modeling specifications
With censored observations (n = 230)
Without censored observations (n = 163)
Base model (BM)a
1.56 ; 2.08
1.69 ; 2.50
BM without two perturbing observations
1.58 ; 2.11
1.73 ; 2.58
BM with adjustment for type of patient and type of TDEb
1.69 ; 3.04
1.52 ; 3.75
BM with adjustment for type of patientc, type of TDE and treatment including Bort, or Len, or Thal
1.71 ; 3.20
1.52 ; 3.71
Overall, the consistency of the estimated values for the effect of median TDE on median OS is evident. The maximum variation in the effect of median TDE on median OS is 34% (β = 2.45; 95% CI, 1.71–3.20) relative to the base model (β = 1.82; 95% CI, 1.56–2.08).
Results based on the sample with uncensored observations provide higher effect values, most likely related to the study design. In our sample, phase II and phase III randomized controlled trials (RCTs) represent 38.8% and 46.3%, respectively, of the study arms with censored observations, and 23.9% and 55.2%, respectively, of the study arms without censored observations. The results from the regression models including and excluding the two outliers are quite similar, indicating the small effect of these two observations in the analysis.
Additional details of the statistical tests performed to assess the validity of the modeling procedures can be found in the Additional file 1.
The detailed predicted median OS and associated 95% CIs based on the observed median TDE for each study arm included in our analysis are presented in the Additional file 1. In general, lower predicted median OS values in the study arms using TTP were found. TTP is the only TDE that does not include death. There was a higher proportion of study arms using TTP as the primary endpoint in the relapsed, refractory, or advanced MM population (46%) compared with 26% and 25% of the study arms evaluating PFS and EFS, respectively. In addition, 35.5%, 63.7%, and 58.0% of the observed median OS are contained in the 95% CI prediction for the TTP, PFS, and EFS, respectively. In approximately one-third of the TTP arms, median OS is under-observed relative to the 95% CI. In the PFS subset, 26.3% of arms reported median OS below the predicted 95% CI and 10% above it. In the EFS subset, 36% of arms report median OS below the predicted 95% CI and 6% above it.
Since 2007, results from 8 phase III trials investigating Thal combinations in patients with newly diagnosed MM not eligible for transplantation have been reported. With the exception of the studies by Ludwig et al.  and Morgan et al. , all other studies compare Thal added to melphalan plus prednisone (MPT) with melphalan plus prednisone (MP). The majority of median OS values reported (13/16) are within our 95% CI estimate for OS, confirming the value of TDE in predicting OS (Figure 2). Another practical implication from our method is that once median TTP/PFS/EFS is observed, we can derive an estimate for the median OS even if it has not been reported. The OS estimates can be used to inform decision makers on newer and promising MM treatment regimens, along with other relevant clinical parameters such as acceptable tolerability, favorable safety profile, and sustainable quality-of-life outcomes.
In Figure 3, we present the estimated median OS vs the observed OS in phase III RCTs using Bort- or Len-based regimens for the treatment of relapsed/refractory MM. The only treatment arm reporting median OS within the 95% CI from our model is the Len plus dexamethasone (Dex) arm from the study by Dimopoulos et al. . At first glance, these results suggest that our model is less suitable for predicting median OS in relapsed/refractory MM. However, it should be noted that the effect from subsequent salvage therapies is expected to be more pronounced, with a shorter median TDE, i.e. the faster progression occurs.
A recent review by Reece  identified several phase III RCTs of post-transplantation maintenance therapy with Thal, Len, and the proteasome inhibitor Bort. These studies differ substantially in the type of induction regimen, transplantation strategy (single vs tandem) and maintenance scheme (drug combinations, doses, and duration). However, this heterogeneity provided us with the opportunity to further evaluate our model. Figure 4 incorporates updated survival outcomes from these trials and presents estimations for the median OS. Trials identified in Reece  that did not report median intermediate TDE were not considered in this figure. The study by Lokhorst et al.  was the only one from this set to contribute with data to our regression model. At present, we can estimate an average absolute increase of 27 months, 17 months, and 47 months in median OS for post-transplantation maintenance therapy with Thal, Bort, and Len, respectively.
There is a sound body of evidence suggesting that TDEs such as PFS, TTP, and EFS are appropriate surrogate endpoints for OS in several types of cancer [31, 60–71]. However, some conflicting evidence [61, 72, 73] and some methodological , regulatory , and conceptual/practical  arguments fuel the ongoing discussion about surrogate endpoints in the cancer literature [76–78] and challenge the establishment of TDEs in oncology clinical development .
Our study is the first to highlight the value of TDEs in predicting OS in MM and to confirm the recommendations of the American Society of Hematology/US FDA Workshop on Clinical Endpoints in both newly diagnosed or relapsed/refractory MM . We focused our research on estimating the absolute effect of TDEs on OS rather than using a relative measure. We are aware that other assessments of potential surrogate endpoints require a two-step validation process, which involves: 1) establishing that the surrogate endpoint predicts the final endpoint accurately; 2) demonstrating that the effect of treatment on both the surrogate endpoint and the final endpoint is closely correlated . Our methodology, while inherently considering the two last criteria, follows a less formal  approach by using regression modeling methods to show that the effect on median OS is captured by the TDE (validation criterion 1) and that adding treatment to the linear predictor does not improve the prediction (validation criterion 2) (i.e. does not improve the fit), hence suggesting that the causal link between treatment and endpoint has been captured by the TDE predictor.
In order to assess our model’s ability to predict OS in different settings, we confronted our estimates with data from studies in relapsed/refractory MM led by Dimopoulos et al.  and Richardson et al.  (Figure 3). The study by Dimopoulos et al. comparing Len plus Dex with Dex alone reported a hazard ratio (HR) for progression of 0.31 (median 13.4 vs 4.6 months) , and for death of 0.71 (median 38.0 vs 31.6 months) . In the study by Richardson et al., which compared Bort to Dex, the reported HR for progression was 0.55 (median 6.2 vs 3.5 months)  and for death was 0.77 (median 29.8 vs 23.7 months) .
Estimates of median OS using our model suggest an HR for death of 0.34 for Len plus Dex vs Dex alone (median OS 33 vs 11 months), and 0.55 for Bort vs Dex (median OS 15 vs 9 months), assuming event times are exponentially distributed . In this case, the treatment effect on TTP would explain more than 90% of the treatment effect on OS for both Len plus Dex and for Bort.
It has been argued that OS is not a realistic endpoint in this setting , especially considering the ever increasing availability of new, effective drugs that can be used as salvage therapies [8, 85] which may mask the real survival differences between treatment arms. Statistical methods to correct for bias resulting from non-informative censoring (crossover and subsequent treatment options) in survival analysis are increasingly popular [86, 87]. In a recent paper by Ishak et al. , information from trials conducted by the Medical Research Council (United Kingdom) was used to calibrate survival regression analyses in order to reproduce survival estimates corrected for patient crossover in clinical trials. These authors present a median OS of 11.6 months (95% CI, 9.5–14.2) for patients with > 1 prior therapy randomized to Dex , which is similar to our estimate of 11.3 months. Furthermore, in a survival analysis adjusted for crossover in the APEX trial, Pacou et al. report an OS HR of 0.59 for Bort relative to Dex , which is also very similar to the value derived from our model (HR 0.55), suggesting that our model performs accurately in trials with substantial crossover. Nonetheless, caution is recommended for extrapolation outside the context of our sample because more mature data on more recent clinical trials and future research in this topic is clearly needed.
We provide a more straightforward way of calculating the expected effect of treatment on median OS (prior to the observation of mature OS data), by estimating an absolute rather than a relative measure for the quantitative relationship between the median TDE and median OS. This regression model recognizes the influence of subsequent therapies because it estimates a mean effect of median TDE on median OS using OS data published in the literature, which is uncorrected for the effect of non-randomized subsequent treatment options. We estimated an average increment of 2.45 months in median OS for each additional month of median TDE. As previously highlighted, these estimates are valuable to assess the expected impact of treatments on median OS, for example in trials of newly diagnosed MM where median OS may not be reached for several years.
Information on survival is essential for clinical trial design , accelerated approvals for new drugs , indirect drug comparisons, and economic considerations (e.g. formulary inclusion and other reimbursement decisions), particularly in the absence of head-to-head comparative clinical trials. Such information may help clinicians select the most suitable treatment options for MM patients.
Other studies examining the relationship between TDEs and OS have been reported in metastatic colorectal cancer (mCRC) [31, 62] and in metastatic breast cancer . In mCRC, there was a strong association between PFS and OS [31, 62], with similar correlation coefficients as obtained in our analysis of MM patients . In metastatic breast cancer, no particular endpoint was determined to be an adequate surrogate for OS . The different conclusions from studies in breast cancer, mCRC, and MM emphasize the fact that appropriate TDEs cannot be generalized in oncology, and their validity depends on tumor type.
The following caveats should be considered when interpreting our results. Although it seems reasonable to question the endogeneity of TDEs as an explanatory variable for OS, this issue has not been addressed in the MM literature. The methodology presented here attempts to solve the endogeneity problem, but its applicability depends on the availability of valid instruments.
In this analysis, TDEs include three distinct surrogate endpoints; TTP, PFS, and EFS. The estimated relationship between the TDE and OS represents the relationship between an “average” TDE and OS. Although no statistical differences have been found in modeled OS by the type of TDE, the value of the information is limited. Further studies are necessary, particularly to clarify the data from studies using TTP, both because of the competing risk estimation problems  and the arguments against the use of TTP . Testing could be performed by either modeling each of the subsamples or by including an interaction term between the TDE and type of surrogate endpoint marker in the regression. In the current analysis, no testing could be performed due to the sample size and need for additional (valid) instruments.
Our analysis includes therapies available over a period of 40 years that demonstrated a wide range of efficacy levels. We attempted to control these differences by using publication year as a covariate. In addition, our censored analysis omitted treatment arms with proportionally longer median OS and therefore may not reflect adequately the impact of newer, more effective therapies. Finally, the majority of the studies did not report whether data for OS included patients who were allowed to crossover between treatment arms. Study designs that include automatic treatment crossover can obscure differences in OS, due to the benefit achieved from subsequent treatments .
In conclusion, our analysis confirms the potential value of TDEs (TTP, PFS, and EFS) in predicting OS in patients with MM. Additional research is welcomed to refine this model or to identify alternative complementary statistical models. Until such models are available and validated, the quantitative relationship presented here may be of value in the design of clinical trials, indirect drug comparisons, and economic assessment of new MM drugs.
We thank Isabel Boaventura, MD, from Celgene Portugal for her helpful comments on the manuscript. Excerpta Medica provided editorial support in the preparation of this manuscript, funded by Celgene Corporation. Neither Excerpta Medica nor Celgene Corporation influenced the content of the manuscript, nor did the authors receive financial compensation of any kind for authorship of the manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.