Systematic literature review
A comprehensive systematic literature search was conducted on 31 March 2021 in the following databases via the Ovid platform: Medical Literature Analysis and Retrieval System Online (MEDLINE), Excerpta Medica database (EMBASE), and the Cochrane Central Register of Controlled Trials (Additional file 1: Tables S1–S3). Searches followed recommendations from the Cochrane Collaboration, the National Institute for Health and Care Excellence (NICE), and the Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen (IQWiG, Germany). Relevant conference proceedings from the previous 2 years and international clinical trial databases were also searched manually to identify additional eligible studies.
Eligible studies for the SLR enrolled adults (aged ≥18 years) with r/r iNHL after failure of two or more lines of therapy. For the purpose of this study, the analysis set was further restricted to patients with r/r FL, as discussed in detail below. Randomized controlled trials, non-randomized trials, observational studies, and registries were all eligible study designs. Eligible interventions were any therapy approved for treatment in the US or Europe, best supportive care, or placebo. As with study design, the scope for interventions was broad, including genetic therapies and therapies approved for other iNHL indications (e.g., ibrutinib is approved for marginal zone lymphoma and other iNHL, but not for FL). The full study eligibility criteria, defined in terms of population, interventions, comparisons, outcomes, and study design (PICOS), are outlined in Additional file 1: Table S4.
Two reviewers, working independently, screened all abstracts and conference proceedings identified in the searches against the selection criteria, except for the outcome criteria, which were assessed during full-text screening. Potentially eligible studies then underwent full-text screening by the same two reviewers, and those that met the inclusion criteria were selected for data extraction. Any disagreement between the two reviewers was resolved by a third reviewer. This process is summarized in the PRISMA [15] flow diagram (Fig. 1).
Data on study characteristics, interventions, patient characteristics, and outcomes for the final list of included studies were extracted independently by the two reviewers. Because individual patient data were not available for the time-to-event outcomes, the published Kaplan-Meier survival curves were digitized using the DigitizeIt software. The digitized curves were then used to generate pseudo-individual patient-level data by applying the Guyot algorithm together with the published numbers-at-risk tables [16]. Time-to-event data from the reconstructed survival curves were extracted by one reviewer and independently verified by the second.
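The core idea of the reconstruction step can be illustrated with a simplified, single-interval sketch in the spirit of the Guyot algorithm. This is not the implementation used in the study. The function name `reconstruct_interval`, the even spreading of censoring times, and the stopping rule are illustrative assumptions, and the published algorithm additionally handles multiple risk-table intervals and reported total event counts. Within one numbers-at-risk interval, the number of censored patients is guessed, events are inferred from each drop in digitized survival, and the censoring guess is adjusted until the reconstructed number at risk matches the published table.

```python
def reconstruct_interval(points, t_start, t_end, s_start, n_start, n_end, max_iter=100):
    """Simplified Guyot-style reconstruction for ONE numbers-at-risk interval (hypothetical helper).

    points         : digitized (time, survival) pairs with t_start < time <= t_end, sorted by time
    s_start        : Kaplan-Meier survival at t_start
    n_start, n_end : published numbers at risk at t_start and t_end
    Returns (event_times, censor_times) describing the pseudo-IPD in this interval.
    """
    s_end = points[-1][1] if points else s_start
    # Initial guess: survivors expected without censoring minus reported survivors
    n_cens = max(0, round(n_start * s_end / s_start - n_end))
    for _ in range(max_iter):
        # Assumption of this sketch: censored patients are spread evenly over the interval
        step = (t_end - t_start) / (n_cens + 1)
        censor_times = [t_start + (j + 1) * step for j in range(n_cens)]
        n_risk, s_last, prev_t, events = n_start, s_start, t_start, []
        for t, s in points:
            # Patients censored before this time point leave the risk set first
            n_risk -= sum(1 for c in censor_times if prev_t <= c < t)
            # Events implied by the drop in survival since the last event
            d = round(n_risk * (1 - s / s_last)) if s_last > 0 else 0
            d = max(0, min(d, n_risk))  # guard against digitization noise
            if d > 0:
                events.extend([t] * d)
                s_last = s
            n_risk -= d
            prev_t = t
        # Remaining censorings between the last digitized point and t_end
        n_risk -= sum(1 for c in censor_times if prev_t <= c < t_end)
        gap = n_risk - n_end  # positive: too few patients censored so far
        if gap == 0:
            break
        new_cens = max(0, n_cens + gap)
        if new_cens == n_cens:
            break  # cannot improve further (e.g. too many events with zero censoring)
        n_cens = new_cens
    return events, censor_times


# Illustrative (made-up) curve: 10 patients at risk, 6 reported at risk at t = 3
events, censored = reconstruct_interval([(1, 0.9), (2, 0.7875)], 0, 3, 1.0, 10, 6)
```

In this toy example the procedure recovers two events (at t = 1 and t = 2) and two censorings, which reproduces both the digitized survival drops and the reported number at risk at the end of the interval.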
Because both randomized and non-randomized studies were eligible for the evidence base, study quality was assessed using the Downs and Black checklist [17]. This well-established tool is applicable to all eligible study designs, which allowed a single assessment instrument to be used for all studies.
Study selection for inclusion in the analysis set was conducted in two steps. First, a feasibility assessment set was identified by restricting study populations to the scope of the present project. Studies including small lymphocytic lymphoma, lymphoplasmacytic lymphoma, MZL only, or transformed FL/MZL were removed unless subgroups excluding these patients were available. Studies restricted to Grade 1 and 2 FL were also excluded from analyses. One study explicitly included Grade 3b patients [18]; after further review, the few Grade 3b patients in that trial were judged to have a negligible impact on the outcomes of interest, so the study was retained in the analysis set. Studies examining CAR-T therapy were also removed, as CAR-T was not an available treatment modality at the time of analysis. Second, studies were further restricted based on the results of the feasibility assessment: the analysis set was limited to studies with at least 20 patients, because a few studies reporting on FL as a subgroup had very small samples (often fewer than 5 patients), which led to high levels of heterogeneity.
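The two-step selection can be expressed as a pair of filters. The sketch below is illustrative only: the `Study` record, its field names, the population labels, and the example studies are all hypothetical, and it encodes only the histology, CAR-T, and minimum-sample-size rules. The grade-based rules and the case-by-case judgment on Grade 3b patients are deliberately left out, since they were applied through reviewer judgment rather than a mechanical rule.

```python
from dataclasses import dataclass

# Hypothetical labels for the out-of-scope populations named in the text
OUT_OF_SCOPE = frozenset({"SLL", "LPL", "MZL", "transformed"})


@dataclass
class Study:
    name: str
    populations: frozenset       # populations enrolled in the study (hypothetical labels)
    fl_subgroup_available: bool  # are results reported for an in-scope FL subgroup?
    therapy_class: str
    n_fl: int                    # FL patients contributing to the analysis


def feasibility_step(study: Study) -> bool:
    """Step 1: restrict to the project scope (grade-based rules omitted here)."""
    if study.populations & OUT_OF_SCOPE and not study.fl_subgroup_available:
        return False
    return study.therapy_class != "CAR-T"


def analysis_step(study: Study, min_n: int = 20) -> bool:
    """Step 2: additionally drop studies with fewer than `min_n` FL patients."""
    return feasibility_step(study) and study.n_fl >= min_n


# Made-up studies used only to exercise the rules
studies = [
    Study("A", frozenset({"FL"}), False, "PI3K-delta inhibitor", 120),
    Study("B", frozenset({"FL", "MZL"}), True, "SoC", 45),
    Study("C", frozenset({"FL", "transformed"}), False, "SoC", 60),
    Study("D", frozenset({"FL"}), False, "CAR-T", 80),
    Study("E", frozenset({"FL", "SLL"}), True, "Lenalidomide + Rituximab", 4),
]
analysis_set = [s.name for s in studies if analysis_step(s)]
```

Here study C fails step 1 (mixed population without a usable FL subgroup), D fails step 1 (CAR-T), and E passes step 1 but fails step 2 (fewer than 20 FL patients), leaving A and B in the analysis set.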
Statistical analyses
A frequentist meta-analysis approach was used for the ORR, CR, PFS, and OS outcomes, and a Bayesian approach was additionally used in the meta-analysis of the digitized Kaplan-Meier curve data for the time-to-event outcomes. For the purpose of analysis, treatments identified in studies that met the inclusion criteria were grouped into the following categories: standard of care (SoC), PI3K-δ inhibitors, Lenalidomide + Rituximab, Bortezomib + Rituximab, Obinutuzumab + Bendamustine, ⁹⁰Y + Anti-CD20 combination, Autologous stem cell transplant (SCT), and Allogeneic SCT. The evidence base included data from three studies [7, 19, 20] with heterogeneous sampling of both treatments and patient populations. These were considered representative of typical care and are therefore referred to as representative cohorts. The most common treatments were anti-CD20 monoclonal antibodies, with or without chemotherapy [21, 22], and PI3K-δ inhibitors [23, 24, 25, 26].
All meta-analyses of single summary proportions were based on the dichotomous outcomes ORR and CR, and each outcome was analyzed using inverse-variance meta-analysis. The Freeman-Tukey double arcsine transformation was applied throughout to stabilize estimates for extreme proportions (near 0 or 1); this was deemed necessary because the data contained multiple observed proportions of 1. Analyses were stratified by the treatment categories outlined above. Both fixed- and random-effects models were fitted within strata, but random effects were not used to combine strata. Instead, the stratum-level results were combined as a weighted mean, with each stratum weighted by its relative sample size; the weights were normalized to sum to 1 so that the combined estimate is a proper weighted average. Heterogeneity within strata was assessed using the I² statistic.
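A minimal sketch of this stratified procedure is shown below, assuming (events, n) pairs per study. The helper names are hypothetical. The DerSimonian-Laird estimator for the between-study variance and the simple back-transformation p = sin²(t/2) are assumptions of this sketch, since the text does not specify either; Miller's back-transformation based on the harmonic mean sample size is a common, more exact alternative.

```python
import math


def ft_transform(x, n):
    """Freeman-Tukey double arcsine of x responders out of n, with its approximate variance."""
    t = math.asin(math.sqrt(x / (n + 1))) + math.asin(math.sqrt((x + 1) / (n + 1)))
    return t, 1.0 / (n + 0.5)


def pool_stratum(data, random_effects=True):
    """Inverse-variance pooling of (events, n) pairs within one treatment stratum.

    Returns (pooled proportion, I^2 in %, total sample size of the stratum).
    """
    ts, vs = zip(*(ft_transform(x, n) for x, n in data))
    w = [1.0 / v for v in vs]
    t_fixed = sum(wi * ti for wi, ti in zip(w, ts)) / sum(w)
    # Cochran's Q and I^2 quantify heterogeneity within the stratum
    q = sum(wi * (ti - t_fixed) ** 2 for wi, ti in zip(w, ts))
    df = len(data) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    t_pooled = t_fixed
    if random_effects and df > 0:
        # DerSimonian-Laird between-study variance (assumed estimator)
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
        w_re = [1.0 / (v + tau2) for v in vs]
        t_pooled = sum(wi * ti for wi, ti in zip(w_re, ts)) / sum(w_re)
    # Simple back-transform (assumption): the FT value is roughly 2*asin(sqrt(p))
    p = math.sin(t_pooled / 2) ** 2
    return p, i2, sum(n for _, n in data)


def combine_strata(strata):
    """Weighted mean of stratum results; weights are relative sample sizes summing to 1."""
    total = sum(n for _, _, n in strata)
    return sum(p * n / total for p, _, n in strata)


# Made-up strata: two studies with identical responses, then two very different ones
stratum_a = pool_stratum([(5, 10), (5, 10)])   # pooled p = 0.5, I^2 = 0
stratum_b = pool_stratum([(1, 20), (19, 20)])  # pooled p = 0.5, I^2 above 90%
overall = combine_strata([stratum_a, stratum_b])
```

The transform keeps studies with all-responders (x = n) on a finite scale with finite variance, which is why it handles observed proportions of 1 without continuity corrections.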
Meta-analyses of the digitized Kaplan-Meier survival curves for OS and PFS were conducted in both the frequentist and Bayesian frameworks. Bayesian analyses used non-informative prior distributions and were based on the methods for network meta-analysis of survival data developed by Ouwens et al. [27] and Jansen [28], which use a multidimensional treatment effect as an alternative to synthesizing constant hazard ratios. Specifically, the hazard functions of the interventions in each trial were modeled using known parametric survival functions or fractional polynomials. Given the non-comparative nature of this evidence base, a simplified version of the model introduced by Jansen was used for the meta-analyses of OS and PFS [28, 29].
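To make the fractional polynomial approach concrete, the sketch below evaluates a first-order fractional polynomial hazard, ln h(t) = β₀ + β₁·t^p with the convention t^0 = ln t, and the binomial likelihood of interval event counts derived from reconstructed pseudo-IPD. It covers a single arm with fixed parameters only; it is not the JAGS model used in the analysis, which places priors on the β parameters and includes study- and treatment-level terms. Treating the hazard as constant within each interval and evaluating it at the interval midpoint is an assumption of this sketch, and all function names are hypothetical.

```python
import math


def fp_term(t: float, power: float) -> float:
    """Fractional-polynomial basis: t**power, with the convention t**0 = log(t)."""
    return math.log(t) if power == 0 else t ** power


def log_hazard(t: float, beta0: float, beta1: float, power: float) -> float:
    """First-order FP model: ln h(t) = beta0 + beta1 * t^(power)."""
    return beta0 + beta1 * fp_term(t, power)


def interval_event_prob(t0, t1, beta0, beta1, power):
    """Probability of an event in (t0, t1] given at risk at t0.

    Assumption: hazard approximately constant in the interval, evaluated at its midpoint.
    """
    mid = 0.5 * (t0 + t1)
    return 1.0 - math.exp(-math.exp(log_hazard(mid, beta0, beta1, power)) * (t1 - t0))


def binomial_loglik(intervals, beta0, beta1, power):
    """Log-likelihood of (t0, t1, n_at_risk, n_events) tuples from pseudo-IPD."""
    ll = 0.0
    for t0, t1, n, r in intervals:
        q = interval_event_prob(t0, t1, beta0, beta1, power)
        ll += math.log(math.comb(n, r)) + r * math.log(q) + (n - r) * math.log1p(-q)
    return ll
```

With β₁ = 0 the model reduces to an exponential (constant hazard) model, and with power 0 it gives a Weibull-type hazard h(t) = e^β₀ · t^β₁; competing choices of power are what the DIC comparison is used to choose between.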
Of note, patients in the representative cohorts were followed from one line of therapy to the next; as a result, OS and PFS observations were not fully independent. However, restricting the analyses to patients in their third line of treatment was judged to be more detrimental than retaining repeated measures for some patients, so no such restriction was applied. Where the evidence permitted, analyses also included patients receiving a fourth or later line of treatment.
For Bayesian analyses, the deviance information criterion (DIC) was used to compare the goodness-of-fit of competing survival models [30]. A DIC difference of approximately 5 points was considered meaningful, and for survival models the hazard functions were also visually inspected for over-fitting [16]. Model parameters were estimated using Markov chain Monte Carlo (MCMC) methods implemented in the JAGS software package. The first 20,000 iterations of the sampler were discarded as burn-in, and inferences were based on an additional 40,000 iterations using two chains. For all analyses, model convergence was assessed using trace plots, density plots, and Gelman-Rubin-Brooks (shrink factor) plots [31].
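The quantities behind model comparison and convergence checking can be computed directly from posterior draws. The sketch below assumes post-burn-in samples are already available as plain lists; the function names and the 5-point default are illustrative. DIC is computed as D̄ + pD with pD = D̄ − D(θ̄). The potential scale reduction factor uses the basic Gelman-Rubin formula, and `shrink_factor_path` is only a simplified analogue of a Gelman-Rubin-Brooks plot (PSRF on growing prefixes of each chain), not a reproduction of any particular package's plotting routine.

```python
import statistics


def dic(deviance_samples, deviance_at_posterior_mean):
    """Return (DIC, pD), where pD = mean deviance - deviance at the posterior mean."""
    d_bar = statistics.fmean(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d


def prefer(dic_a, dic_b, threshold=5.0):
    """Return 'a' or 'b' if one model's DIC is lower by at least `threshold`, else None."""
    if dic_b - dic_a >= threshold:
        return "a"
    if dic_a - dic_b >= threshold:
        return "b"
    return None


def psrf(chains):
    """Basic Gelman-Rubin potential scale reduction factor for one parameter.

    chains: list of equal-length lists of post-burn-in draws (at least 2 chains).
    """
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    w = statistics.fmean(statistics.variance(c) for c in chains)  # within-chain variance
    b = n * statistics.variance(means)                            # between-chain variance
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5


def shrink_factor_path(chains, n_points=10):
    """Simplified shrink-factor trajectory: PSRF on growing prefixes of each chain."""
    n = len(chains[0])
    ends = sorted({max(2, n * k // n_points) for k in range(1, n_points + 1)})
    return [(e, psrf([c[:e] for c in chains])) for e in ends]
```

Values of the shrink factor close to 1 indicate that the two chains have mixed; a trajectory that settles near 1 as iterations accumulate is the pattern a Gelman-Rubin-Brooks plot is inspected for.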
The patient population in the primary analyses was restricted to patients with FL receiving therapies other than transplant because: a) transplant is a very different intervention from the others being studied; b) the SCT study populations tended to be significantly younger and healthier; and c) SCT studies appeared to be overrepresented in the evidence base. Furthermore, because these studies reported only on patients who survived to SCT, they were at risk of immortal time bias. The primary model also excluded off-label treatments for FL, as these were considered atypical. A second model included only the study cohorts that were representative of care. Two supplemental models included a) off-label treatments and b) only SCT studies. The feasibility of each model depended on data availability (Additional file 1: Table S5).