A systematic review of lenvatinib and sorafenib for treating progressive, locally advanced or metastatic, differentiated thyroid cancer after treatment with radioactive iodine

Background Treatment with radioactive iodine is effective for many patients with progressive, locally advanced or metastatic, differentiated thyroid cancer. However, some patients become refractory to treatment. These types of patients are considered to have radioactive iodine refractory differentiated thyroid cancer (RR-DTC). Methods We searched Embase, MEDLINE, PubMed and the Cochrane Library from January 1999 through January 2017. Reference lists of included studies and ongoing trial registries were also searched. Reports of randomized controlled trials (RCTs), prospective observational studies, and systematic reviews/indirect comparisons were eligible for inclusion. In the absence of direct clinical trial evidence comparing lenvatinib versus sorafenib, we assessed the feasibility of conducting an indirect comparison to obtain estimates of the relative efficacy and safety of these two treatments. Results Of 2364 citations, in total, 93 papers reporting on 2 RCTs (primary evidence), 9 observational studies and 13 evidence reviews (supporting evidence) were identified. Compared to placebo, RCT evidence demonstrated improvements with lenvatinib or sorafenib in median progression-free survival (PFS) and objective tumour response rate (ORR). Overall survival (OS) was confounded by high treatment crossover (≥75%) in both trials. Adverse events (AEs) were more common with lenvatinib or sorafenib than with placebo but the most common AEs associated with each drug differed. Primarily due to differences in the survival risk profiles of patients in the placebo arms of the RCTs, we considered it inappropriate to indirectly compare the effectiveness of lenvatinib versus sorafenib. ORR and AE findings for lenvatinib and sorafenib from the supporting evidence were broadly in line with RCT evidence. Health-related quality of life (HRQoL) data were limited. 
Conclusions Lenvatinib and sorafenib are more efficacious than placebo (a proxy for best supportive care) for treating RR-DTC. Uncertainty surrounds the extent of the impact on OS and HRQoL. Lenvatinib could not reliably be compared with sorafenib. Choice of treatment is therefore likely to depend on an individual patient’s circumstances.


Background
Thyroid cancer accounts for approximately 1% of all new malignancies in the United Kingdom (UK) [1] and approximately 3% of all new malignancies in the United States (US) [2]. Commonly asymptomatic and so often discovered incidentally [3], the most common type of thyroid cancer is differentiated thyroid cancer (DTC). A review of 2936 US patients registered with DTC found papillary carcinoma (PTC), follicular carcinoma (FTC) and Hürthle cell carcinoma to constitute 86, 10 and 4% of cases respectively [4]. Globally, DTC incidence is increasing [5]. In part, this increase has been attributed to improved diagnostic and detection techniques [6].
Surgery followed by daily oral medication (levothyroxine) to suppress blood thyroid stimulating hormone (TSH) levels is the mainstay of treatment for DTC [7][8][9][10]. Additional treatment in the form of radioactive iodine may be required for patients who develop local, regional or metastatic disease (5 to 20% of patients [7,9]). For most patients, radioactive iodine treatment is effective. However, 5 to 15% [4][11][12][13][14][15] of people with DTC develop radioactive iodine refractory differentiated thyroid cancer (RR-DTC), i.e. they are unable to safely tolerate treatment or they develop DTC that has become resistant to treatment.
For patients with RR-DTC, treatment options have been limited. Chemotherapy is rarely or never recommended by the authors of clinical guidelines [7][8][9][10] and thus, for many patients, best supportive care (BSC) has been the only treatment option. However, the authors of published clinical guidelines have noted the promise of targeted therapies including tyrosine kinase inhibitors (TKIs). Lenvatinib is the most recent TKI to be licensed for treating RR-DTC, receiving a licence in the US in February 2015 [16] and in the European Union (EU) in May 2015 [17]. The only other licensed TKI is sorafenib, which was licensed for the treatment of RR-DTC in the US in November 2013 [18] and in the EU in January 2015 [19]. The authors of the US National Comprehensive Cancer Network (NCCN) guidelines now recommend that lenvatinib and sorafenib should be considered for treating progressive and/or symptomatic RR-DTC [10]. The authors, however, caution against their use for patients with stable or slowly progressive indolent disease [10]. The authors of the American Thyroid Association (ATA) guidelines caution that patients who are candidates for TKI therapy "should be thoroughly counseled on the potential risks and benefits of this therapy as well as alternative therapeutic approaches including best supportive care" [7]. Important risks associated with lenvatinib highlighted by regulatory agencies [16,17] include: hypertension; cardiac dysfunction; arterial thromboembolic events; hepatotoxicity, renal failure or impairment; proteinuria; diarrhea; fistula formation and gastrointestinal perforation; QT interval prolongation; hypocalcemia; reversible posterior leukoencephalopathy syndrome; hemorrhagic events; impairment of TSH suppression/thyroid dysfunction; wound healing complications; and embryo-fetal toxicity. 
Important risks associated with sorafenib highlighted by regulatory agencies [18,19] include: dermatologic toxicities including severe skin adverse events (AEs) and hand-foot syndrome; hypertension; posterior reversible encephalopathy syndrome; hemorrhage (including lung hemorrhage, gastrointestinal hemorrhage and cerebral hemorrhage); arterial thrombosis (myocardial infarction); congestive heart failure; QT interval prolongation; squamous cell cancer of the skin; gastrointestinal perforation; symptomatic pancreatitis and increases in lipase and amylase; hypophosphatemia; renal dysfunction; interstitial lung disease-like events; drug-induced hepatitis; impairment of TSH suppression; and embryo-fetal toxicity.
While lenvatinib and sorafenib are available for treating RR-DTC in several countries, the extent to which they are available to patients has varied. For example, lenvatinib and sorafenib are available for all patients who require these treatments in Scotland via the National Health Service (NHS) [20,21]. However, prior to August 2018, they were only available for patients in special circumstances in the NHS in England. In order to be routinely used in the NHS in England, a positive recommendation from the National Institute for Health and Care Excellence (NICE) is required. We, the Liverpool Reviews and Implementation Group (LRiG), were commissioned, in our capacity as an independent Assessment Group, to provide an independent review of the clinical and cost effectiveness evidence as part of a NICE multiple technology appraisal (MTA). In this paper, we report our systematic review of the clinical effectiveness evidence for lenvatinib and sorafenib and discuss how the evidence has impacted on NICE recommendations for clinical practice.

Methods
Our systematic review protocol was registered with PROSPERO, the international prospective register of systematic reviews (registration number CRD42017055516). The review was conducted in accordance with the Centre for Reviews and Dissemination (CRD) published guidance on conducting systematic reviews in healthcare [22] and the review is reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [23].

Search methods for identification of studies
On 10 January 2017, four electronic databases (Embase (Ovid), MEDLINE (Ovid), PubMed and the Cochrane Library) were searched for studies published since 1 January 1999. On 16 May 2017, the clinicaltrials.gov website (a service of the US National Institutes of Health), the International Clinical Trials Registry Platform and the European Union Clinical Trials Register were searched for information on studies in progress. To identify relevant studies, a combination of index terms for the disease (e.g. thyroid neoplasms) and free text words (e.g. lenvatinib or Lenvima or E7080 or Sorafenib or Nexavar or bay439006) was employed. The database searches were limited to human research and English language studies. No other search restrictions were applied. The search strategies employed are provided in Additional file 1: Online Resource 1.
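As a rough illustration of how index terms and free text words can be combined, the sketch below joins the example terms quoted above into one boolean string. The function name `build_query` and the generic quoting and AND/OR syntax are assumptions made for illustration; they do not reproduce the database-specific strategies given in Additional file 1.

```python
def build_query(index_terms, free_text_terms):
    """Combine disease index terms and drug free-text terms into one boolean string.

    Hypothetical helper: generic AND/OR syntax only, not Ovid or PubMed syntax.
    """
    disease = " OR ".join(f'"{term}"' for term in index_terms)
    drugs = " OR ".join(f'"{term}"' for term in free_text_terms)
    # A record must match at least one disease term AND at least one drug term.
    return f"({disease}) AND ({drugs})"


# Example terms taken from the search description above.
query = build_query(
    ["thyroid neoplasms"],
    ["lenvatinib", "Lenvima", "E7080", "sorafenib", "Nexavar", "bay439006"],
)
```

Each database has its own field tags and truncation rules, so a string like this would need translating before use.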
Evidence submissions from the sponsors of lenvatinib [24] and sorafenib [25] that were submitted to NICE as part of the MTA process were considered for inclusion in our review. The lists of references from the company submissions and all relevant studies identified via the literature searches were cross-checked to identify any papers not identified by the electronic searches.

Study selection and data extraction
Randomized controlled trials (RCTs), prospective observational studies and systematic reviews/indirect comparisons (hereafter referred to as evidence reviews) of lenvatinib or sorafenib were selected for inclusion in the review. To be included, the population must have included adults with progressive, locally advanced or metastatic thyroid cancer refractory to radioactive iodine, of which at least a subgroup of patients had RR-DTC. A summary of the a priori inclusion and exclusion criteria is provided in Table 1.
Two reviewers independently screened all titles and abstracts (screening stage 1). Full-text articles of all potentially relevant citations identified during screening stage 1 were retrieved and assessed for eligibility based on the inclusion criteria (screening stage 2). Where necessary, any discrepancies or uncertainties were resolved by discussion or consultation with a third reviewer.
Two reviewers independently extracted and checked data by using a pre-tested data extraction form. Data were extracted relating to study design, patient characteristics and outcomes for RCTs and observational studies and the number and type of studies included, type of analysis conducted and the overall findings/conclusions for evidence reviews. For all study types, data reported in multiple publications were extracted and reported as a single study.

Quality assessment
The quality of included RCTs and evidence reviews was assessed according to the criteria set out in the Centre for Reviews and Dissemination's guidance for undertaking reviews in healthcare [22]. Two reviewers independently assessed the quality of these studies and, where necessary, disagreements were resolved by consultation with a third reviewer. In accordance with the protocol, quality assessment of the prospective observational studies was not conducted.

Data synthesis
Data from the included RCTs were considered to provide primary clinical effectiveness evidence. Data from observational studies and from evidence reviews were considered to provide supporting evidence.
For the RCTs, in addition to the primary published papers [26,27], data were extracted from other sources identified from the searches, as appropriate. In this paper, additional information for the SELECT trial was extracted from the company submission from Eisai Ltd. [24], the clinical study report (CSR) (unpublished), three conference abstracts [48][49][50] and the European public assessment report (EPAR) for lenvatinib [51]. For the DECISION trial, additional information was extracted from the company submission from Bayer HealthCare [25], an additional published paper with supplementary safety data [52], the CSR (unpublished), three conference abstracts [53][54][55] and the EPAR for sorafenib [56].
For one of the included prospective observational studies of sorafenib, known as UPCC-03305 [32], the majority of data were extracted from later conference reports of the same study [57][58][59] which reported baseline characteristics from a greater number of patients [58], efficacy data [59] and safety data [57].

Characteristics of included studies
Characteristics of randomized controlled trials (primary evidence) Both of the included RCTs [26,27] were phase III multicentre double-blind trials designed to compare the intervention of interest (lenvatinib or sorafenib) with placebo. Subjects were randomized 2:1 to the intervention and comparator arms of the SELECT trial (lenvatinib, n = 261; placebo, n = 131) [26] and 1:1 in the DECISION trial (sorafenib, n = 207; placebo, n = 210) [27]. Both trials permitted some concomitant therapies (such as TSH suppression) in both the intervention and placebo arms. Thus, the placebo arm in both trials could be considered to be equivalent to BSC. The types of concomitant therapies were broadly similar in both trials. However, a potentially important difference between the two trials was that palliative radiotherapy, which is commonly available as part of BSC in clinical practice, was only permitted in the DECISION trial, not the SELECT trial. Nonetheless, rates of palliative radiotherapy administered to patients in the DECISION trial were relatively low: 10.6% of patients treated with sorafenib and 21.4% of patients treated with placebo [25].
Patients were eligible to receive treatment (intervention or placebo) in both the SELECT and DECISION trials until disease progression [26,27]. In both trials, patients were then enrolled into open extension phases [24,25]. In the DECISION trial, patients who had progressed on sorafenib were permitted to continue to receive sorafenib until further disease progression and approximately a quarter (26.6%) of patients did so [53,54]. In both the SELECT and DECISION trials, patients in the placebo arms could cross over to the active treatment arm. Patient crossover on disease progression was high in both trials (SELECT: 87.8%, DECISION: 75%) [24,25]. In addition, in both trials, patients in either arm were also eligible to receive subsequent anti-cancer treatments that were not part of the trial protocols [24,25]. In the SELECT trial, at the primary data-cut, 15.7% of patients randomized to lenvatinib and 12.2% of patients randomized to placebo had received subsequent treatment, including treatment with another TKI (data from CSR). Of those who received subsequent treatment, 17.1% of patients in the lenvatinib arm received pazopanib and 14.6% received sorafenib (data from CSR). In the placebo arm, the respective proportions were 18.8 and 12.5% (data from CSR). In the DECISION trial, at the primary data-cut, 20.3% of patients randomized to sorafenib and 8.6% of patients randomized to placebo received subsequent treatments [27]. Information on the specific agents used during the DECISION trial follow-up period was not collected.
The median duration of follow-up at the primary data-cut was approximately 17 months in both trials [26,27]. OS results were also reported at a second and third data-cut in both trials [24,25]. At the third data-cut, the median length of follow-up was approximately 38 months in the SELECT trial [24] and 36 months in the sorafenib arm of the DECISION trial [25] (length of follow-up data have only been reported for the sorafenib arm of this trial).
The OS results from both trials were adjusted for treatment crossover using the Rank Preserving Structural Failure Time Model (RPSFTM) [60]. No adjustments were made, in either trial, to take into account subsequent anti-cancer treatment, as there is no recognised approach for making such adjustments.
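To make the crossover adjustment more concrete, the sketch below shows the time transformation at the core of a rank preserving structural failure time model: time spent on active treatment is rescaled by a factor exp(psi) to give an untreated-equivalent survival time. The function name, the example patient and the value of psi are hypothetical. In a full analysis psi is estimated from the trial data (typically by searching for the value that balances the counterfactual times across randomized arms), and censoring needs additional handling; neither step is shown here.

```python
import math


def counterfactual_time(time_off_treatment, time_on_treatment, psi):
    """Untreated-equivalent survival time: U = T_off + exp(psi) * T_on.

    A negative psi means that time on active treatment counts for less
    untreated time, i.e. treatment extends survival.
    """
    return time_off_treatment + math.exp(psi) * time_on_treatment


# Hypothetical placebo-arm patient: 10 months on placebo, then 20 months
# on active treatment after crossing over at progression.
psi = math.log(0.5)  # assumed value, for illustration only
adjusted = counterfactual_time(10.0, 20.0, psi)  # about 20 months instead of 30
```

A patient who never crosses over has no time on active treatment, so their observed survival time is left unchanged by the transformation.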
A key difference in eligibility between the two RCTs was that the SELECT trial permitted the enrolment of patients who had been previously treated with a TKI (including sorafenib) [26], whilst patients recruited to the DECISION trial were all TKI naïve [27]. Overall, 25.3% of patients in the lenvatinib arm and 20.6% of patients in the placebo arm of the SELECT trial had received prior treatment with a TKI [26]. Approximately three quarters of patients who received a TKI in the SELECT trial had previously been treated with sorafenib (77.2% in the lenvatinib arm and 77.8% in the placebo arm) [26].
Most of the observational studies were conducted in single countries (and often in single centres) in Europe [28,31,34,35], the US [33,58], and Asia [30,36]. However, there was one multi-centre international study of lenvatinib (Study 201) [29]. Where reported, patients were recruited prior to the commencement of the SELECT [26] and DECISION [27] trials; the exception was a Japanese study of lenvatinib (Study 208) [36] that began after recruitment to the SELECT trial had ended.
The median length of follow-up, as reported in the EPAR for lenvatinib [56], was longer in the observational studies of lenvatinib [29,36] than in the SELECT trial [24]: 40 months in Study 208 [56] and 51.6 months in Study 201 [56]. Conversely, where reported [28,34,35], the median length of follow-up in the observational studies of sorafenib was shorter for OS but longer for other outcomes than in the DECISION trial [25]: 19 months [34] to 25 months [35].
The number of patients included in the nine prospective observational studies varied from nine [30] to 58 [29]. In total, across all studies, 109 patients were treated with lenvatinib, of whom 83 had RR-DTC; 213 patients were treated with sorafenib, of whom 186 had RR-DTC. Other patients included in four of the studies [28,33,36,58] had anaplastic (n = 26) or medullary (n = 27) carcinoma. Participant characteristics were reported for all treated patients in each study and, where reported, median age ranged from 55 years [28] to 64 years [33]. Where reported, four studies included a majority of males [28,29,33,35] and three studies included a majority of females [31,34,58]. Only two studies explicitly stated that patients could have received a prior TKI [29,34] and, in these studies, the proportion of patients who did receive a prior TKI ranged from 11.8% [34] to 29.3% [29].
The earliest review, which presented evidence narratively, was published in 2013 [37] and the most recent reviews (from 2017) were the evidence submissions from the sponsors of lenvatinib [24] and sorafenib [25]. Both of the evidence submissions [24,25] included modified versions of the indirect comparisons of lenvatinib versus sorafenib originally conducted by Tremblay et al. 2016 [46]; the original results [46] were also reported in the Canadian Agency for Drugs and Technologies in Health (CADTH) submission for lenvatinib [39]. One other publication [42], included an indirect comparison of lenvatinib versus sorafenib. The two reviews that included only observational studies of sorafenib meta-analyzed the data from the studies they included [44,45].

Results from the included studies
Primary efficacy evidence
We have reported RCT evidence from the primary data-cuts of the SELECT and DECISION trials [26,27], with the exception of OS data, which are reported for the third data-cut [24,25]. The results for OS, PFS and ORR from the RCTs are summarized in Table 2.
For OS, no statistically significant differences between trial arms were found in either trial [24,25]. When OS results from both trials were adjusted for treatment crossover, the difference was reported to be statistically significant in the SELECT trial, favouring lenvatinib over placebo [24], but a similar finding was not reported in the DECISION trial for sorafenib versus placebo [25]. Compared to placebo, median PFS and ORR were improved with lenvatinib in the SELECT trial [26] and with sorafenib in the DECISION trial [27]. The difference in ORR between trial arms was particularly pronounced in the SELECT trial, at 63.2% (95% CI: 57.1 to 69.4%) [26]; the difference in ORR in the DECISION trial was 11.7% (95% CI: 7.0 to 16.5%). Differences between arms were reported to be statistically significant for PFS and ORR in both trials [26,27].
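For readers who want to see how a between-arm difference in ORR and its confidence interval can be computed, the sketch below applies a simple Wald (normal approximation) interval to two proportions. The arm sizes (261 and 131) are those reported for the SELECT trial; the responder counts (169 and 2) are assumptions used for illustration and are not stated in the text above, and the method used to derive the published interval is not described in this section.

```python
import math


def risk_difference_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Difference in proportions (arm A minus arm B) with a Wald interval."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se


# Assumed responder counts; arm sizes as reported for the SELECT trial.
diff, lower, upper = risk_difference_ci(169, 261, 2, 131)
```

With these assumed counts the result is close to the reported 63.2% (57.1 to 69.4%). A Wald interval can behave poorly when a proportion is near 0 or 1, so other interval methods may be preferable in practice.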
As some patients in the SELECT trial had previously received a TKI (including sorafenib), subgroup analyses were conducted to assess the effect of this previous treatment and the results have been reported for median PFS and ORR [26]. Median PFS was longer for patients treated with lenvatinib compared with placebo, irrespective of whether patients had received a TKI [26]. Median PFS for those previously treated was 15
Table 2 notes: Assessed by blinded independent review at primary data-cut. d Unlike the SELECT trial, patients who were unevaluable for response were excluded from the analyses in the DECISION trial. There were 18 (4.3%) patients who were excluded from the objective tumour response analyses in the DECISION trial, 9 (4.3%) patients in each arm [27]. Source: [26,27] with additional OS data from Eisai Ltd. 2017 [24] and Bayer HealthCare 2017 [25] and additional ORR data (95% CIs) from European public assessment report (EPAR) for lenvatinib [51] and EPAR for sorafenib [56]

Indirect comparison of lenvatinib versus sorafenib
In the absence of direct clinical trial evidence comparing treatment with lenvatinib versus treatment with sorafenib, we assessed the feasibility of conducting an indirect comparison to obtain estimates of the relative efficacy and safety of these two treatments. As both the SELECT and DECISION trials shared a common comparator (placebo), it is possible to construct a network. Indeed, indirect comparisons have been reported in evidence reviews [24,25,39,42,46]. However, the validity of an indirect comparison rests on three assumptions: (i) that trial and participant characteristics are sufficiently similar across the trials, (ii) that the survival risk profiles of the shared comparator arms are comparable, and (iii) that the proportional hazards assumption holds. We therefore tested whether all these assumptions were supported by the data.
In relation to (i), we found that there were a number of differences in trial and participant characteristics, which were most pronounced when comparing the placebo arms of the two trials, as highlighted in Table 3. In relation to (ii), from an examination of PFS data, it was also evident that the survival risk profiles of the shared comparator (the placebo arms) were not comparable (Fig. 2). In relation to (iii), we tested the validity of the proportional hazards (PH) assumption for OS, RPSFTM-adjusted OS and PFS against a non-linear (quadratic) counterfactual using an analysis of variance (ANOVA) test. With the exception of unadjusted OS data in the DECISION trial, we found the PH assumption was violated and thus the network of evidence was compromised for all efficacy outcomes. Therefore, we did not undertake an indirect comparison to compare the efficacy of lenvatinib versus sorafenib.
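One way a test of this kind can be set up is sketched below; the exact procedure used in the review is not described in this section, so the approach is an assumption. Under PH, the cumulative hazard in one arm is a constant multiple of the cumulative hazard in the other, so plotting one against the other should give a straight line through the origin. The sketch fits that straight line and a quadratic alternative by least squares and returns the nested-model F statistic. The function names and the input data are hypothetical.

```python
def _rss_line(x, y):
    """Residual sum of squares for y = b*x (straight line through the origin)."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))


def _rss_quadratic(x, y):
    """Residual sum of squares for y = b*x + c*x^2, via the 2x2 normal equations."""
    sxx = sum(xi ** 2 for xi in x)
    sx3 = sum(xi ** 3 for xi in x)
    sx4 = sum(xi ** 4 for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2y = sum(xi ** 2 * yi for xi, yi in zip(x, y))
    det = sxx * sx4 - sx3 * sx3
    b = (sxy * sx4 - sx3 * sx2y) / det
    c = (sxx * sx2y - sx3 * sxy) / det
    return sum((yi - b * xi - c * xi ** 2) ** 2 for xi, yi in zip(x, y))


def ph_f_statistic(cum_hazard_control, cum_hazard_treatment):
    """F statistic (1 and n-2 degrees of freedom) for quadratic versus linear fit."""
    n = len(cum_hazard_control)
    rss_line = _rss_line(cum_hazard_control, cum_hazard_treatment)
    rss_quad = _rss_quadratic(cum_hazard_control, cum_hazard_treatment)
    return (rss_line - rss_quad) / (rss_quad / (n - 2))


# Hypothetical cumulative hazards at ten common time points.
control = [0.1 * k for k in range(1, 11)]
noise = [0.01 if k % 2 else -0.01 for k in range(1, 11)]
proportional = [0.5 * h + e for h, e in zip(control, noise)]
curved = [0.5 * h + 0.8 * h ** 2 + e for h, e in zip(control, noise)]

f_proportional = ph_f_statistic(control, proportional)  # small: no evidence against PH
f_curved = ph_f_statistic(control, curved)              # large: curvature detected
```

A large F statistic indicates that the quadratic term explains curvature the straight line cannot, which counts against PH. Converting the statistic to a p-value requires the F distribution, for example `scipy.stats.f.sf` if SciPy is available.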

Supporting efficacy evidence
Efficacy findings from the observational studies [28-31, 33-36, 59], and meta-analyses conducted by the authors of two sorafenib reviews [44,45] are summarised in Table 4. Data were also extracted from the EPAR for sorafenib [56] for OS and ORR for one of the observational studies [33] and for ORR for another observational study [28]. This is because the published papers of these studies did not present these results separately for patients with RR-DTC.
Median OS reported in both observational studies of lenvatinib [29,36] was approximately 32 months, lower than the median OS estimates reported for both arms of the SELECT trial (lenvatinib: 41.6 months, placebo: 34.5 months) [24]. Similarly, median OS reported in three studies of sorafenib [33,35,59], which ranged from 23 months [33] to 34.5 months [35], was lower than median OS reported in either arm of the DECISION trial (sorafenib: 39.4 months, placebo: 42.8 months) [25]. Median OS could not be estimated in one other study of sorafenib, as it had not yet been reached [28].
Two published papers have reported efficacy results from indirect comparisons of lenvatinib with sorafenib [42,46] utilising data from the SELECT and DECISION trials [26,27]. There were no statistically significant differences in OS (whether RPSFTM-adjusted, or not) but in both papers, it was reported that PFS was significantly better with lenvatinib versus sorafenib (HR 0.36, 95% CI: 0.22 to 0.57) [42,46]. The results from a matched adjusted indirect comparison (MAIC) for OS and PFS were very similar to the unmatched results [46]. One of the published papers also included a comparison for ORR and found no statistically significant difference between lenvatinib and sorafenib (relative benefit 1.72, 95% CI: 0.15 to 19.40) [42].
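The arithmetic behind an anchored indirect comparison through a common comparator can be sketched as follows, using the adjusted indirect comparison approach often attributed to Bucher. This is not necessarily the method used in the published papers, and it does not check any of the assumptions that an indirect comparison relies on. The input hazard ratios and confidence limits are assumed values for illustration and are not taken from the text above; they should be checked against the trial publications before any use.

```python
import math


def log_hr_se(ci_lower, ci_upper, z):
    """Standard error of a log hazard ratio, recovered from its confidence limits."""
    return (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)


def bucher_indirect(hr_a_vs_c, se_a, hr_b_vs_c, se_b, z=1.96):
    """Indirect hazard ratio of A versus B through common comparator C."""
    log_hr = math.log(hr_a_vs_c) - math.log(hr_b_vs_c)
    # Variances add because the two trials are independent.
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    return math.exp(log_hr), math.exp(log_hr - z * se), math.exp(log_hr + z * se)


# Assumed inputs (illustrative): drug A vs placebo HR 0.21 with a 99% CI of
# 0.14 to 0.31; drug B vs placebo HR 0.59 with a 95% CI of 0.45 to 0.76.
se_a = log_hr_se(0.14, 0.31, z=2.576)
se_b = log_hr_se(0.45, 0.76, z=1.96)
hr, hr_lower, hr_upper = bucher_indirect(0.21, se_a, 0.59, se_b)
```

With these assumed inputs the point estimate is about 0.36, with an interval of roughly 0.24 to 0.53. The interval differs from the one quoted above, which is expected if the published analyses used a different statistical approach.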

Primary safety evidence
Safety evidence from the SELECT and DECISION trials is summarised in Table 5. The majority of AE data for the SELECT trial are taken from the Eisai Ltd. evidence submission [24] as, similar to the reporting in the DECISION trial [27], this reported treatment-emergent AEs, whereas the primary published paper mostly reported treatment-related AEs [26]. Treatment with either lenvatinib or sorafenib led to an increase in the incidence of AEs versus treatment with placebo [24,27]. Dose interruptions and reductions were very frequent for patients treated with either lenvatinib or sorafenib [26,27]. Fatal AEs were recorded for 7.7% of patients treated with lenvatinib and 4.6% of patients who received placebo in the SELECT trial [26]. Fatal AEs in the DECISION trial were recorded for 5.8% of patients treated with sorafenib and 2.9% of patients in the placebo arm [27]. The most frequently reported AEs occurring in around two-thirds of patients were, for lenvatinib, hypertension and diarrhoea [24] and, for sorafenib, hand-foot syndrome, diarrhoea and alopecia [27]. Hypertension was a very frequent Grade ≥ 3 AE reported with lenvatinib [24] and hand-foot syndrome was a frequent Grade ≥ 3 AE reported with sorafenib [27].
Table 3 notes: Source: [24,26], EPAR for lenvatinib [27,51] and appendix to Bayer HealthCare 2017 [25]. Text in bold relates to the most notable differences between placebo arms and shaded cells the most notable differences between trials in any arm
Analyses have been undertaken to determine the median time to onset of five AEs for patients treated with lenvatinib in the SELECT trial [48], and eight AEs for patients treated with sorafenib in the DECISION trial [52]. The results suggest that, when treated with either lenvatinib or sorafenib, most AEs typically occur early, with a decrease in incidence, prevalence and severity over time [48,52]. However, hypertension was a notable AE omitted from the analysis of lenvatinib data [48].
Table 4 (fragment): ORR, % 50-68 [2] 15-38.3 [7] 20.9 22; 95% CI 14.3-27.5 [6] 15-28 [7]
Table 4 notes: - = not applicable, CI Confidence interval, ORR Objective tumour response rate, OS Overall survival, PFS Progression-free survival. a An additional study reported that the median OS had not been met [28]. b One other study reported that the median PFS had not been met [28] and another reported mean PFS only (9.7 months) [30]; in this latter study sorafenib was studied at half the dose of all other studies and included only 9 patients. c No meta-analyses were identified. [x] denotes the number of studies from which data are derived
The incidences of any all-Grade and Grade ≥ 3 AEs for patients treated with lenvatinib were similar in patients who had received a prior TKI to those who had not [49,50]. The proportion of patients who had at least one lenvatinib dose reduction was also similar between these two subgroups [49,50].
Although there were differences in the incidences of some AEs across studies [28,29,33,35,36,44,45,57] and compared to the SELECT and DECISION trials [24,27], the most common types of AEs with both drugs were similar to those found in the RCTs. As with the RCT evidence [26,27], dose interruptions and reductions were very frequent for patients treated with either lenvatinib [29] or sorafenib [35,57].

Evidence for health-related quality of life with treatment
HRQoL data were only collected during the DECISION trial and the results were presented in a conference abstract [55] and in Bayer HealthCare's evidence submission to NICE [25]. Cancer-specific HRQoL was measured using the Functional Assessment of Cancer Therapy - General (FACT-G) questionnaire [62] and general health status was measured using the generic EuroQol five dimensions, three-level questionnaire (EQ-5D-3L) and the EQ-5D visual analogue scale (VAS) [63]. All questionnaires were self-administered at baseline and day 1 of every 28-day cycle until disease progression [55]. The overall questionnaire completion rate during the DECISION trial was reported to be > 96% [25].
At baseline, patients' HRQoL data were considered by the authors to be comparable to a normative adult cancer population [25,55]. However, at the first assessment (cycle 2, day 1), HRQoL scores (FACT-G, EQ-5D-3L and VAS) had deteriorated in the sorafenib arm [25,55]. Thereafter, the sorafenib arm scores remained similar to the scores recorded at the first assessment until disease progression [25,55]. Scores for the placebo arm remained very similar to the baseline scores at the first assessment and all subsequent assessments until disease progression [25,55]. Results from a mixed linear model showed that the FACT-G score was 3.45 points lower in the sorafenib arm than in the placebo arm (p = 0.0006) [25,55]. This is reported to represent a clinically meaningful difference between arms in favour of the placebo arm [25,55]. While the between-arm differences were statistically significant for both EQ-5D-3L and VAS (p < 0.0001), the treatment effects (−0.07 and −6.75, respectively) were reported to be of a small magnitude which did not reach the threshold considered to represent a clinically meaningful difference [25,55].

Discussion
The aim of this review was to compare the clinical effectiveness evidence for lenvatinib or sorafenib in relation to BSC and also to compare the effectiveness of both drugs with each other.
Trial results show that both drugs are more efficacious than placebo in terms of median PFS [26,27] and ORR [26,27] but also result in more AEs than placebo [24,27]. Placebo can be considered to be a proxy for BSC in both trials, even though concurrent use of palliative radiotherapy was not permitted for patients in the SELECT trial (data from CSR). Some of the most common types of AEs differ by drug, most notably hypertension being very common with lenvatinib [24] and hand-foot syndrome being very common with sorafenib [27]. We were unable to determine the true impact of lenvatinib and sorafenib on OS or how both drugs, particularly lenvatinib, impact upon HRQoL. This is because OS is confounded by treatment crossover in both trials [26,27] and HRQoL data are limited to reports of sorafenib from the DECISION trial [25,55].
It should however be noted that results for OS (except in the case of the DECISION trial), RPSFTM-adjusted OS and PFS described as statistically significant (or otherwise) should be interpreted with caution, since we found that, for these outcomes, the PH assumption was violated. It is therefore not possible to ascertain whether the HRs are overestimates or underestimates of the effect of the intervention versus placebo in either trial.
In conducting a feasibility assessment of performing indirect comparisons, we identified potential differences in trial and population characteristics at baseline. Since the PH assumption for OS and PFS data was also found to be violated, we considered that the validity of conducting an indirect comparison (matched or otherwise) using standard methods was questionable. Importantly, we also identified differences in the survival risk profiles of patients in the placebo arms of the trials. These differences may reflect known or unknown differences in trial and participant characteristics. The identification of these differences was our primary reason for considering an indirect comparison to be inappropriate. Of note, the CADTH has also considered the populations to be different, stating that the SELECT trial population had more aggressive disease as reflected by PFS in the placebo arms [39]. Furthermore, in its consideration of the evidence base during the MTA process, the NICE Appraisal Committee agreed that the Kaplan-Meier plots for PFS in the placebo arms of the trials were sufficiently different to suggest there were important differences limiting the robustness of the indirect comparisons [64].
NICE guidance is based on the recommendations of the NICE Appraisal Committee. The extent to which the findings from the SELECT and DECISION trials are generalizable to clinical practice was one of the key considerations for the NICE Appraisal Committee [64]. In clinical practice, patients are often not treated with lenvatinib or sorafenib unless their disease is symptomatic, or they have clinically significant progressive disease (e.g. obvious radiological or biochemical progression). Data published in the EPAR for sorafenib [56] indicate that approximately 20% of patients in the DECISION trial had been retrospectively defined as being symptomatic; the equivalent proportion in the SELECT trial was unknown. To be eligible for entry into the trials, patients were required to have had radiographic evidence of disease progression within the last 12 months (SELECT trial) or 14 months (DECISION trial) [26,27]. Arguably, these eligibility criteria suggest that patients had clinically significant disease that was likely to progress rapidly if left untreated. Indeed, clinical opinion presented to the NICE Appraisal Committee was that if patients were not yet symptomatic in the trials, it was likely they would soon become symptomatic [64]. The evidence from both trials, even though the trial populations appear to differ slightly, was therefore considered to be generalizable to clinical practice.
In the absence of results from reliable indirect comparisons, findings from observational studies provide important supporting evidence. The magnitude of effects in relation to OS, PFS and the incidence of some AEs in prospective observational studies [28-31, 33-36, 57, 59] and meta-analyses [44,45] differed from the RCT findings [24-27]. There are a number of reasons that could explain this. First, as with the RCTs, differences in unknown patient characteristics may be contributory factors. Second, the differing lengths of follow-up should be considered. Third, all of the prospective observational studies were relatively small, and so the results are more prone to being influenced by any outlying cases. However, while caution needs to be exercised in comparing results across studies of different study populations, the combined evidence from RCTs [26,27] and observational studies [28-31, 33-36, 59] suggests ORR may be higher for patients treated with lenvatinib than for patients treated with sorafenib. Evidence from observational studies [28-31, 33, 35, 36, 57] and meta-analyses [44,45] also shows that many common AEs reported with lenvatinib and sorafenib in the RCTs [26,27] are experienced by patients treated with these drugs in other study populations. The evidence shows that some AEs are very common to both lenvatinib and sorafenib (e.g. diarrhoea), whereas other AEs tend to be more drug specific (e.g. hypertension with lenvatinib and hand-foot syndrome with sorafenib) [28,29,33,35,36,44,45,57]. Therefore, the body of evidence taken as a whole supports the NCCN recommendation that "The decision of whether to use lenvatinib (preferred) or sorafenib should be individualized for each patient based on likelihood of response and comorbidities" [10].
No HRQoL data for lenvatinib are available from either the SELECT trial or the supporting observational studies [29,36]. Only the DECISION trial collected HRQoL data for patients treated with sorafenib, and then only until the end of treatment [25,55]. In the DECISION trial, "mild" reductions in HRQoL were reported for patients treated with sorafenib compared to those receiving the placebo [25,55]. Given the different objective tumour response rates and types of AEs reported in the studies of lenvatinib, HRQoL data for patients treated with lenvatinib would have been very informative. It is unclear whether, for patients treated with lenvatinib, obtaining an objective response to treatment is associated with improved HRQoL, or if they too would experience "mild" reductions in HRQoL. The exploration of HRQoL associated with treatment with both drugs is an area requiring further research.
Another area where further research is required relates to the sequential use of lenvatinib and sorafenib. Subgroup analysis results from the SELECT trial suggest that differences in PFS, ORR and AEs for lenvatinib versus placebo were similar regardless of whether or not a patient had previously been treated with a TKI [26,49,50]. However, no OS evidence has been reported for these subgroups. Furthermore, the number of patients in these subgroups, particularly in the placebo arm, is small. Importantly, there is no evidence for the efficacy or safety of treatment with sorafenib following treatment with lenvatinib.
The evidence presented in our review has been used as the basis for making recommendations for practice in England. Guidance was issued by NICE in August 2018 [64]. In drafting the guidance, the NICE Appraisal Committee considered the uncertainties identified in our review, alongside cost-effectiveness evidence and testimonies from clinical and patient experts. NICE guidance recommends the use of lenvatinib or sorafenib for treating RR-DTC if both drugs are provided at a discounted price [64]. However, NICE guidance also includes the restriction that lenvatinib or sorafenib are only available to patients who have not previously received treatment with a TKI or "if they have had to stop taking a TKI within 3 months of starting it because of toxicity (specifically, toxicity that cannot be managed by dose delay or dose modification)" [64]. The reason given for this restriction is that NICE considered that there is "not enough clinical evidence and no cost-effectiveness evidence to determine whether the treatments are effective when used sequentially" [64]. This restricted use of lenvatinib or sorafenib differs from the licensing [16-19] and also from the reimbursement approval received elsewhere in the UK [21].

Conclusions
It is not possible to reliably estimate the relative effectiveness of lenvatinib versus sorafenib for treating RR-DTC, but the evidence base clearly demonstrates improvements in PFS and ORR for these treatments when compared with placebo, a proxy for BSC. The improvements in PFS and ORR are, however, accompanied by an increased risk of AEs, whilst the effect on patients' OS and HRQoL remains uncertain. Given the slightly different safety profiles of lenvatinib and sorafenib, the evidence from our review supports clinical guideline recommendations that the choice of treatment should consider each patient's circumstances, including their need for a response to treatment and comorbidities.