A web application for the design of multiarm clinical trials
BMC Cancer volume 20, Article number: 80 (2020)
Abstract
Background
Multiarm designs provide an effective means of evaluating several treatments within the same clinical trial. Given the large number of treatments now available for testing in many disease areas, it has been argued that their utilisation should increase. However, for any given clinical trial there are numerous possible multiarm designs that could be used, and choosing between them can be a difficult task. This task is complicated further by a lack of available easy-to-use software for designing multiarm trials.
Results
To aid the wider implementation of multiarm clinical trial designs, we have developed a web application for sample size calculation when using a variety of popular multiple comparison corrections. Furthermore, the application supports sample size calculation to control several varieties of power, as well as the determination of optimised armwise allocation ratios. It is built using the Shiny package in the R programming language, is free to access on any device with an internet browser, and requires no programming knowledge to use. It incorporates a variety of features to make it easier to use, including help boxes and warning messages. Using design parameters motivated by a recently completed phase II oncology trial, we demonstrate that the application can effectively determine and evaluate complex multiarm trial designs.
Conclusions
The application provides the core information required by statisticians and clinicians to review the operating characteristics of a chosen multiarm clinical trial design. The range of designs supported by the application is broader than other currently available software solutions. Its primary limitation, particularly from a regulatory agency point of view, is its lack of validation. However, we present an approach to efficiently confirming its results via simulation.
Background
Drug development is becoming an increasingly expensive process, with the estimated average cost per approved new compound now standing at over $1 bn [1]. In no small part this is due to the high failure rate of clinical trials, in particular in phases II and III. This is particularly true in the field of oncology, where the likelihood of approval from phase I is only 5.1% [2]. Consequently, the clinical research community is constantly seeking new methods that may improve the efficiency of the drug development process.
One possible method, which has received substantial attention in recent years, is the idea to make use of multiarm designs that compare several experimental treatments to a shared control group. Several desirable, interrelated, features of such designs have now been described. For example, the number of patients on the control treatment is typically reduced compared to conducting separate two-arm trials, and simultaneously patients are more likely to be randomized to an experimental treatment, which may help with recruitment [3, 4]. Furthermore, the overall required sample size, for the same level of power, will typically be smaller than that which would be required if multiple two-arm trials were conducted [5]. Finally, multiarm designs offer a fair head-to-head comparison of experimental treatments in the same study [3, 4], and the cost of assessing a treatment in a multiarm trial is often around half of that for a separate two-arm trial [3].
Based upon these advantages, and their experiences of utilising such designs in several oncology trials, Parmar et al. [3] make a compelling case for the need for more multiarm designs to be used in clinical research. We are not aware of any systematic evidence on whether this has now permeated through to practice, but a simple search of PubMed Central suggests it may be the case: 859 articles have included the phrases “multiarm” and “clinical trial” since 2015, as opposed to just 273 in all years prior to this. Considering this result in combination with the findings of Baron et al. [6], who determined 17.9% of trials published in 2009 were multiarm, as well as the recent publication of a key guidance document on reporting results from multiarm trials [7], it is clear that there is now much interest within the trials community in such designs.
However, whilst there are numerous advantages of multiarm trials, it is important to recognise that determining a suitable design for a multiarm clinical trial can be a substantially more complex process than for a two-arm trial. In particular, a decision must be made on how to account for the multiple comparisons that will be made. Indeed, whether the final analysis should adjust for multiplicity has been a topic of much debate within the literature. In brief, presented arguments primarily revolve around the fact that failing to account for multiplicity can substantially increase the probability of committing a type-I error. Yet, if a series of two-arm trials were conducted, no adjustment would be made to the significance level used in each trial. For brevity, we will not repeat all further arguments on this issue here, and instead refer the reader to several key discussions on multiplicity [5, 8–18].
For the purposes of what follows in this article, the more important consideration is that when a multiple comparison correction (MCC) is to be used, one of a wide selection must actually be chosen (see, e.g., [19–21] for an overview). MCCs vary widely in their complexity, with Bonferroni’s correction often recommended because of its simplicity [7]. However, other MCCs often perform better in terms of the operating characteristics they impart, as Bonferroni’s correction is known to be conservative [10, 18, 20, 22]. A recent review found that amongst those multiarm trials that did adjust for multiplicity, 50% used one of the comparatively simple Bonferroni or Dunnett corrections [5]. Thus, there arguably remains the potential for increased efficiency gains to be made in multiarm trials, if more advanced MCCs can be employed.
Furthermore, regardless of whether a MCC is utilised, there are other complications that must also be addressed in multiarm trial design, including how to power the trial, and what the allocation ratio to each experimental arm relative to the control arm will be. Indeed, power is not a simple quantity in a multiarm trial, whilst the literature on how to choose the allocation ratios in an optimal manner is extensive (see, e.g., [23] for an overview), and deciding whether to specify allocation ratios absolutely, or whether they can be optimised to improve trial efficiency may not be an easy decision.
These considerations imply that user-friendly software for designing multiarm clinical trials would be a valuable tool in the trials community. It is unfortunate therefore that, as we discuss further later, little software is available to assist with such studies. For this reason, we have developed a web application for multiarm clinical trial design. We hope that the availability of this application will assist with the utilisation of more advanced multiarm designs in future clinical trials.
Implementation
The web application is written using the Shiny package [24] in the R programming language [25]. It is built using functions from the R package multiarm [26], which also makes the application available as a function for offline local use. A vignette is provided for multiarm that gives great detail on its formal statistical specifications. A less technical summary is provided here.
Design setting
It is assumed that outcomes X_{ik} will be accrued from patients i∈{1,…,n_{k}} on treatment arms k∈{0,…,K}, with arm k=0 corresponding to a shared control arm, and arms k∈{1,…,K} to several experimental arms. Later, we provide more information on the precise types of outcome that are currently supported by the web application. The hypotheses of interest are assumed to be H_{k}:τ_{k}≤0 for k∈{1,…,K}. Here, τ_{k} corresponds to a treatment effect for experimental arm k∈{1,…,K} relative to the control arm. Thus, we assume one-sided tests for superiority. Note that in the app, reference is also made to the global null hypothesis, H_{G}, which we define to be the scenario with τ_{1}=⋯=τ_{K}=0.
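To test hypothesis H_{k}, we assume that a Wald test statistic, z_{k}, is computed as
$$z_{k} = \frac{\hat{\tau}_{k}}{\sqrt{\text{Var}(\hat{\tau}_{k})}} = \hat{\tau}_{k}I_{k}^{1/2},$$
where \(\hat{\tau}_{k}\) is the estimate of τ_{k} and \(I_{k}=1/\text{Var}(\hat{\tau}_{k})\) is its information.
In what follows, we use the notation \(\boldsymbol {z}_{k}=(z_{1},\dots,z_{k})^{\top }\in \mathbb {R}^{k}\). With this, note that our app supports design in particular scenarios where \(\boldsymbol{Z}_{k}\), the random pre-trial value of \(\boldsymbol{z}_{k}\), has (at least asymptotically) a k-dimensional multivariate normal (MVN) distribution, with
$$\mathbb{E}(Z_{l}) = \tau_{l}I_{l}^{1/2},\qquad \text{Cov}(Z_{l},Z_{l}) = 1,\qquad \text{Cov}(Z_{l_{1}},Z_{l_{2}}) = I_{l_{1}}^{1/2}I_{l_{2}}^{1/2}\,\text{Cov}(\hat{\tau}_{l_{1}},\hat{\tau}_{l_{2}}),$$
for l∈{1,…,k} and l_{1}≠l_{2}, l_{1},l_{2}∈{1,…,k}.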
As is discussed further later, this includes normally distributed outcome variable scenarios and, for large sample sizes, other parametric distributions such as Bernoulli outcome data.
Ultimately, to test the hypotheses, \(\boldsymbol{z}_{K}\) is converted to a vector of p-values, p=(p_{1},…,p_{K})^{⊤}∈[0,1]^{K}, via p_{k}=1−Φ_{1}{z_{k},0,1}, for k∈{1,…,K}. Here, Φ_{n}{(a_{1},…,a_{n})^{⊤},λ,Σ} is the cumulative distribution function of an n-dimensional MVN distribution, with mean λ and covariance matrix Σ. Precisely,
$$\Phi_{n}\{(a_{1},\dots,a_{n})^{\top},\boldsymbol{\lambda},\Sigma\} = \int_{-\infty}^{a_{1}}\dots\int_{-\infty}^{a_{n}}\phi_{n}\{\boldsymbol{x},\boldsymbol{\lambda},\Sigma\}\,\mathrm{d}x_{n}\dots\mathrm{d}x_{1},$$
where ϕ_{n}{x,λ,Σ} is the probability density function of an n-dimensional MVN distribution with mean λ and covariance matrix Σ, evaluated at vector x=(x_{1},…,x_{n})^{⊤}.
Then, which null hypotheses are rejected is determined by comparing the p_{k} to a set of significance thresholds specified based on a chosen MCC, in combination with a nominated significance level α∈(0,1). Before we describe the currently supported MCCs however, we will first describe the operating characteristics that are currently evaluated by the app.
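As a minimal sketch of the conversion from test statistics to p-values described above, the following base R code uses hypothetical values for the z_{k} and for α (both are illustrative assumptions, not values from the app).

```r
# Hypothetical Wald statistics for K = 3 experimental arms (illustrative only)
z <- c(2.10, 0.40, 1.55)

# One-sided p-values, p_k = 1 - Phi_1(z_k, 0, 1)
p <- pnorm(z, lower.tail = FALSE)

# Unadjusted comparison against a nominated significance level
alpha <- 0.05
reject_unadjusted <- p <= alpha
```

How the p_{k} are compared to thresholds in practice depends on the chosen MCC; the unadjusted comparison here is only the simplest case.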
Operating characteristics
Our app returns a wide selection of statistical operating characteristics that may be of interest when choosing a multiarm trial design. Specifically, it can compute the following quantities for any nominated multiarm design and true set of treatment effects
The conjunctive power (P_{con}): The probability that all of the null hypotheses are rejected, irrespective of whether they are true or false.
The disjunctive power (P_{dis}): The probability that at least one of the null hypotheses is rejected, irrespective of whether they are true or false.
The marginal power for arm k∈{1,…,K} (P_{k}): The probability that H_{k} is rejected, irrespective of whether it is true or false.
The per-hypothesis error-rate (PHER): The expected value of the number of type-I errors divided by the number of hypotheses.
The a-generalised type-I familywise error-rate (FWER_{Ia}): The probability that at least a∈{1,…,K} type-I errors are made. Note that FWER_{I1} is the conventional familywise error-rate (FWER); the probability of making at least one type-I error.
The a-generalised type-II familywise error-rate (FWER_{IIa}): The probability that at least a∈{1,…,K} type-II errors are made.
The false discovery rate (FDR): The expected proportion of type-I errors amongst the rejected hypotheses.
The false non-discovery rate (FNDR): The expected proportion of type-II errors amongst the hypotheses that are not rejected.
The positive false discovery rate (pFDR): The expected proportion of type-I errors amongst the rejected hypotheses, conditional on at least one hypothesis being rejected.
The sensitivity (Sensitivity): The expected proportion of the number of correct rejections of the hypotheses to the number of false null hypotheses.
The specificity (Specificity): The expected proportion of the number of correctly not rejected hypotheses to the number of true null hypotheses.
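To make several of these definitions concrete, the sketch below estimates a subset of them from a matrix of rejection decisions. The function name opchar_from_rejections and the toy inputs are hypothetical; the app itself evaluates these quantities analytically rather than in this way.

```r
# reject:    logical matrix (replicates x K) of rejection decisions
# null_true: logical vector of length K, TRUE where H_k is true
opchar_from_rejections <- function(reject, null_true) {
  K <- ncol(reject)
  V <- rowSums(reject[, null_true, drop = FALSE])  # type-I errors per replicate
  R <- rowSums(reject)                             # rejections per replicate
  c(P_con = mean(R == K),                          # all hypotheses rejected
    P_dis = mean(R >= 1),                          # at least one rejected
    PHER  = mean(V) / K,
    FWER  = mean(V >= 1),
    FDR   = mean(ifelse(R > 0, V / R, 0)))         # V / R taken as 0 when R = 0
}

# Toy example: four replicate trials, K = 2, only H_2 true
reject <- rbind(c(TRUE, TRUE), c(TRUE, FALSE), c(FALSE, FALSE), c(FALSE, TRUE))
oc <- opchar_from_rejections(reject, null_true = c(FALSE, TRUE))
```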
Multiple comparison corrections
Per-hypothesis error-rate control
The simplest method for selecting the significance thresholds against which to compare the p_{k} is to compare each to the chosen significance level α. That is, to reject H_{k} for k∈{1,…,K} if p_{k}≤α. This controls the PHER to α.
A potential problem with this, however, can be that the statistical operating characteristics of the resulting design may not be desirable (e.g., in terms of FWER_{I1}). As discussed earlier, it is for this reason that we may wish to make use of a MCC. Currently, the web application supports the use of a variety of such MCCs, which aim to control either (a) the conventional familywise error-rate, FWER_{I1} (with these techniques subdivided into single-step, step-down, and step-up corrections) or (b) the FDR.
Single-step familywise error-rate control
These MCCs test each of the H_{k} against a common significance level, γ∈(0,1) say, rejecting H_{k} if p_{k}≤γ. The currently supported single-step corrections are
Bonferroni’s correction: This sets γ=α/K [27].
Sidak’s correction: This sets γ=1−(1−α)^{1/K} [28].
Dunnett’s correction: This sets γ=1−Φ_{1}{z_{D},0,1}, where z_{D} is the solution of the following equation
$$\alpha = 1 - \Phi_{K}\{(z_{D},\dots,z_{D})^{\top},\boldsymbol{0}_{K},\text{Cov}(\boldsymbol{Z}_{K})\}, $$with \(\boldsymbol {0}_{n}=(0,\dots,0)^{\top }\in \mathbb {R}^{n}\) an n-dimensional vector of zeroes [29].
Note that each of the above specify a γ such that the maximum probability of incorrectly rejecting at least one of the null hypotheses H_{k}, k∈{1,…,K}, over all possible values of \(\boldsymbol {\tau }\in \mathbb {R}^{K}\) is at most α. This is referred to as strong control of FWER_{I1}.
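The three single-step thresholds can be sketched in base R as follows. For Dunnett's correction we assume, purely for illustration, that all pairs of test statistics share a common correlation ρ=0.5 (as arises with equal allocation and equal variances); under that assumption the K-dimensional MVN probability reduces to a one-dimensional integral. The helper prob_all_below is hypothetical and is not part of the multiarm package.

```r
alpha <- 0.15
K     <- 2

gamma_bonferroni <- alpha / K
gamma_sidak      <- 1 - (1 - alpha)^(1 / K)

# P(Z_1 <= z, ..., Z_K <= z) for an equicorrelated MVN with correlation rho,
# using Z_k = sqrt(rho) * U + sqrt(1 - rho) * e_k with U, e_k independent N(0, 1)
prob_all_below <- function(z, K, rho) {
  integrate(function(u) pnorm((z + sqrt(rho) * u) / sqrt(1 - rho))^K * dnorm(u),
            lower = -Inf, upper = Inf, rel.tol = 1e-10)$value
}

# Solve alpha = 1 - Phi_K{(z_D, ..., z_D), 0, Cov(Z_K)} for z_D
z_D <- uniroot(function(z) 1 - prob_all_below(z, K, rho = 0.5) - alpha,
               interval = c(0, 5), tol = 1e-10)$root
gamma_dunnett <- pnorm(z_D, lower.tail = FALSE)
```

With positively correlated test statistics, the Dunnett threshold is less stringent than Sidak's, which in turn is less stringent than Bonferroni's.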
Step-down familywise error-rate control
Step-down MCCs work by ranking the p-values from smallest to largest. We will refer to these ranked p-values by p_{(1)},…,p_{(K)}, with associated hypotheses H_{(1)},…,H_{(K)}. The p_{(k)} are compared to a vector of significance levels γ=(γ_{1},…,γ_{K})∈(0,1)^{K}. Precisely, the minimal index k such that p_{(k)}>γ_{k} is identified, and then H_{(1)},…,H_{(k−1)} are rejected and H_{(k)},…,H_{(K)} are not rejected. If k=1 then we do not reject any of the null hypotheses, and if no such k exists then we reject all of the null hypotheses. The currently supported step-down corrections are
Holm-Bonferroni correction: This sets γ_{k}=α/(K+1−k) [30].
Holm-Sidak correction: This sets γ_{k}=1−(1−α)^{1/(K+1−k)}.
Step-down Dunnett correction: This can only currently be used when the \(\text {Cov}(Z_{k_{1}},Z_{k_{2}})\) are equal for all k_{1}≠k_{2}, k_{1},k_{2}∈{1,…,K}. In this case, it sets γ_{k}=1−Φ_{1}{z_{Dk},0,1}, where z_{Dk} is the solution to
$$\alpha = 1 - \Phi_{K+1-k}\{(z_{Dk},\dots,z_{Dk})^{\top},\boldsymbol{0}_{K+1-k},\text{Cov}(\boldsymbol{Z}_{K+1-k})\}. $$
Note that the above methods provide strong control of FWER_{I1}.
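A step-down procedure of this kind can be sketched in a few lines of base R. The function step_down and the example p-values are hypothetical; the result is compared against the Holm adjustment provided by p.adjust() in the stats package.

```r
# Generic step-down procedure: reject the hypotheses ranked before the first
# rank at which the ordered p-value exceeds its threshold
step_down <- function(p, gamma) {
  ord      <- order(p)
  exceeds  <- which(p[ord] > gamma)
  n_reject <- if (length(exceeds) == 0) length(p) else min(exceeds) - 1
  reject   <- logical(length(p))
  reject[ord[seq_len(n_reject)]] <- TRUE
  reject
}

alpha <- 0.05
p     <- c(0.030, 0.004, 0.020, 0.500)   # illustrative p-values
K     <- length(p)

gamma_holm  <- alpha / (K + 1 - seq_len(K))
reject_holm <- step_down(p, gamma_holm)

# Cross-check against the adjusted p-values from the stats package
reject_check <- p.adjust(p, method = "holm") <= alpha
```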
Step-up familywise error-rate control
Step-up MCCs also work by ranking the p-values from smallest to largest, and similarly utilise a vector of significance levels γ. However, here, the largest k such that p_{(k)}≤γ_{k} is identified. Then, the hypotheses H_{(1)},…,H_{(k)} are rejected, and H_{(k+1)},…,H_{(K)} are not rejected. Currently, one such correction is supported: Hochberg’s correction [31], which sets γ_{k}=α/(K+1−k). This method also provides strong control of FWER_{I1}.
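The step-up rule can be sketched analogously. The function step_up and the example p-values are hypothetical, and the result is again cross-checked against p.adjust() from the stats package.

```r
# Generic step-up procedure: reject every hypothesis up to and including the
# largest rank at which the ordered p-value meets its threshold
step_up <- function(p, gamma) {
  ord      <- order(p)
  meets    <- which(p[ord] <= gamma)
  n_reject <- if (length(meets) == 0) 0 else max(meets)
  reject   <- logical(length(p))
  reject[ord[seq_len(n_reject)]] <- TRUE
  reject
}

alpha <- 0.05
p     <- c(0.030, 0.004, 0.020, 0.045)   # illustrative p-values
K     <- length(p)

gamma_hochberg  <- alpha / (K + 1 - seq_len(K))
reject_hochberg <- step_up(p, gamma_hochberg)

# Cross-check against the adjusted p-values from the stats package
reject_check <- p.adjust(p, method = "hochberg") <= alpha
```

For these p-values the largest ordered p-value already meets its threshold, so every hypothesis is rejected, even though the second and third ordered p-values exceed their own thresholds.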
False discovery rate control
It may be of interest to instead control the FDR, which can offer a compromise between strict FWER_{I1} control and PHER control, especially when we expect a large proportion of the experimental treatments to be effective. Currently, two methods that will control the FDR to at most α over all possible \(\boldsymbol {\tau }\in \mathbb {R}^{K}\) are supported. They function in the same way as the stepup corrections discussed above, with
Benjamini-Hochberg correction: This sets γ_{k}=kα/K [32].
Benjamini-Yekutieli correction: This sets [33]:
$$\gamma_{k}=\frac{k\alpha}{K\left(1 + \frac{1}{2} + \dots + \frac{1}{K}\right)}.$$
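The two sets of FDR thresholds can be computed directly, as in the following sketch. The p-values and the helper step_up are hypothetical, and the decisions are cross-checked against the "BH" and "BY" methods of p.adjust() in the stats package.

```r
alpha <- 0.05
p     <- c(0.010, 0.040, 0.030, 0.200)   # illustrative p-values
K     <- length(p)
k     <- seq_len(K)

gamma_bh <- k * alpha / K
gamma_by <- k * alpha / (K * sum(1 / k))

# Step-up rule: reject up to the largest rank whose p-value meets its threshold
step_up <- function(p, gamma) {
  ord      <- order(p)
  meets    <- which(p[ord] <= gamma)
  n_reject <- if (length(meets) == 0) 0 else max(meets)
  reject   <- logical(length(p))
  reject[ord[seq_len(n_reject)]] <- TRUE
  reject
}

reject_bh <- step_up(p, gamma_bh)
reject_by <- step_up(p, gamma_by)
```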
Sample size determination
The sample size required by a design to control several types of power to a specified level 1−β, under certain specific scenarios, can be computed. Precisely, following for example [34], values for ‘interesting’ and ‘uninteresting’ treatment effects, \(\delta _{1}\in \mathbb {R}^{+}\) and δ_{0}∈(−∞,δ_{1}) respectively, are specified and the following definitions are made
The global alternative hypothesis, H_{A}, is given by τ_{1}=⋯=τ_{K}=δ_{1}.
The least favourable configuration for experimental arm k∈{1,…,K}, LFC_{k}, is given by τ_{k}=δ_{1}, τ_{1}=⋯=τ_{k−1}=τ_{k+1}=⋯=τ_{K}=δ_{0}.
Then, the following types of power can be controlled to level 1−β by designs determined using the app
The conjunctive power under H_{A}.
The disjunctive power under H_{A}.
The minimum marginal power under the respective LFC_{k}.
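For the simplest of these cases a closed form is available. The sketch below assumes normally distributed outcomes with known variances and a single-step correction with common threshold γ, so that the marginal power for arm k depends only on τ_{k}=δ_{1}. The function n0_marginal is hypothetical and is not the algorithm used by the app, which searches numerically and covers the other power types and MCCs.

```r
# Control-arm sample size giving marginal power 1 - beta for one arm, when
# H_k is rejected if p_k <= gamma and n_k = r * n_0
n0_marginal <- function(gamma, beta, delta1, sigma0, sigmak, r = 1) {
  (qnorm(1 - gamma) + qnorm(1 - beta))^2 * (sigma0^2 + sigmak^2 / r) / delta1^2
}

alpha  <- 0.15
beta   <- 0.2
K      <- 2
delta1 <- 0.15
gamma  <- alpha / K            # Bonferroni threshold, chosen for illustration

n0 <- n0_marginal(gamma, beta, delta1, sigma0 = 1, sigmak = 1)

# Check: recompute the marginal power implied by n0 under tau_k = delta1
info        <- 1 / (1 / n0 + 1 / n0)
power_check <- pnorm(qnorm(1 - gamma) - delta1 * sqrt(info), lower.tail = FALSE)
```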
Allocation ratios
One of the primary goals of the app is to aid the choice of values for n_{0},…,n_{K}. The app specifically supports the determination of values for these parameters by searching for a suitable n_{0} via a one-dimensional root solving algorithm, and then setting n_{k}=r_{k}n_{0}, r_{k}∈(0,∞), for k∈{1,…,K}. Here, r_{k} is the allocation ratio for experimental arm k relative to the control arm.
For this reason, the app also allows the allocation ratios to be specified in a variety of ways: they can be defined explicitly, or alternatively can be determined in an optimal manner. For this optimality problem, many possible optimality criteria have been defined, each with its own merits. We therefore refer the reader to Atkinson (2007) [23] for further details of optimal allocation in multiarm designs, and simply note here that in the web application the allocation ratios can currently be determined for three such criteria
A-optimality: Minimizes the trace of the inverse of the information matrix of the design. This results in the minimization of the average variance of the treatment effect estimates.
D-optimality: Maximizes the determinant of the information matrix of the design. This results in the minimization of the volume of the confidence ellipsoid for the treatment effect estimates.
E-optimality: Maximizes the minimum eigenvalue of the information matrix. This results in the minimization of the maximum variance of the treatment effect estimates.
The optimal allocation ratios are identified in the app using available closed-form solutions where possible (see [35] for a summary of these), otherwise non-linear programming is employed.
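As an illustration of the A-optimality criterion only, the sketch below minimises the average variance of the treatment effect estimates numerically. It assumes equal known variances across arms, a common allocation ratio r for all experimental arms, and a fixed total sample size; the function avg_variance is hypothetical. Under these assumptions the minimiser is r=1/√K, i.e., the control arm receives √K times as many patients as each experimental arm.

```r
K <- 3

# Average variance of the K treatment effect estimates when n_k = r * n_0
# and the total sample size N = n_0 * (1 + K * r) is held fixed
avg_variance <- function(r, K, N = 1, sigma = 1) {
  n0 <- N / (1 + K * r)
  sigma^2 * (1 / n0 + 1 / (r * n0))
}

r_A <- optimize(avg_variance, interval = c(0.01, 10), K = K, tol = 1e-8)$minimum
```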
Other design specifications
Finally, the web application also supports the following options
Plot production: Plots can be produced of (a) all of the operating characteristics listed earlier when τ_{1}=⋯=τ_{K}=θ, as well as (b) the P_{k} when τ_{k}=θ and τ_{l}=θ−(δ_{1}−δ_{0}) for l≠k. If these are selected for rendering, the quality of the plots, in terms of the number of values of θ used for line-graph production, can also be controlled.
Require \(n_{k}\in \mathbb {N}\) for k∈{0,…,K}: By default, the sample size determined for each arm will only be required to be a positive number. In practice, such values need to be integers. This can thus be enforced if desired, with the integer n_{k} specified by rounding up their determined continuous values.
Supported outcome variables
Normally distributed outcome variables
Currently, the app supports multiarm trial design for scenarios in which the outcome variables are assumed to be either normally or Bernoulli distributed.
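Precisely, for the normal case, it assumes that \(X_{ik}\sim N(\mu _{k},\sigma _{k}^{2})\), and that \(\sigma _{k}^{2}\) is known for k∈{0,…,K}. Then, for each k∈{1,…,K},
$$\tau_{k} = \mu_{k} - \mu_{0},\qquad \hat{\tau}_{k} = \frac{1}{n_{k}}\sum_{i=1}^{n_{k}}x_{ik} - \frac{1}{n_{0}}\sum_{i=1}^{n_{0}}x_{i0},\qquad I_{k} = \left(\frac{\sigma_{0}^{2}}{n_{0}} + \frac{\sigma_{k}^{2}}{n_{k}}\right)^{-1},$$
where x_{ik} is the realised value of X_{ik}, \(\hat{\tau}_{k}\) is the estimate of τ_{k}, and \(I_{k}=1/\text{Var}(\hat{\tau}_{k})\) is its information.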
Note that in this case, Z_{K} has a MVN distribution, and thus the operating characteristics can be computed exactly and efficiently using MVN integration [36]. Furthermore, the distribution of Z_{K} does not depend upon the values of the μ_{k}, k∈{0,…,K}. Consequently, these parameters play no part in the inputs or outputs of the app.
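The covariance matrix of the test statistics in the normal case can be sketched as follows, assuming each treatment effect estimate is the difference in sample means between an experimental arm and the shared control arm. The function cov_Z is hypothetical; it illustrates that the result depends only on the sample sizes and standard deviations, and not on the μ_{k}.

```r
# n, sigma: vectors of length K + 1, ordered as (control, arm 1, ..., arm K)
cov_Z <- function(n, sigma) {
  K       <- length(n) - 1
  var_tau <- sigma[1]^2 / n[1] + sigma[-1]^2 / n[-1]  # Var(tau_hat_k)
  cov_tau <- matrix(sigma[1]^2 / n[1], K, K)          # shared control contribution
  diag(cov_tau) <- var_tau
  cov_tau / sqrt(outer(var_tau, var_tau))             # standardise to unit variances
}

# Equal allocation and equal variances give pairwise correlations of 0.5
Sigma <- cov_Z(n = c(100, 100, 100), sigma = c(1, 1, 1))
```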
Bernoulli distributed outcome variables
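In this case, X_{ik}∼Bern(π_{k}) for response rates π_{k}, and for each k∈{1,…,K},
$$\tau_{k} = \pi_{k} - \pi_{0},\qquad \hat{\tau}_{k} = \frac{1}{n_{k}}\sum_{i=1}^{n_{k}}x_{ik} - \frac{1}{n_{0}}\sum_{i=1}^{n_{0}}x_{i0},\qquad I_{k} = \left\{\frac{\pi_{0}(1-\pi_{0})}{n_{0}} + \frac{\pi_{k}(1-\pi_{k})}{n_{k}}\right\}^{-1},$$
where \(I_{k}=1/\text{Var}(\hat{\tau}_{k})\) is the information for the estimate \(\hat{\tau}_{k}\) of τ_{k}.
Thus, a problem for design determination becomes that the I_{k} are dependent on the unknown response rates. In practice, this is handled at the analysis stage of a trial by setting
$$I_{k} = \left\{\frac{\hat{\pi}_{0}(1-\hat{\pi}_{0})}{n_{0}} + \frac{\hat{\pi}_{k}(1-\hat{\pi}_{k})}{n_{k}}\right\}^{-1},$$
for \(\hat {\pi }_{k} = \sum _{i=1}^{n_{k}}x_{ik}/n_{k}\), k∈{0,…,K}. This is the assumption made where required in the app. With this, \(\boldsymbol{Z}_{K}\) is only asymptotically MVN. Thus, in general it would be important to validate operating characteristics evaluated using MVN integration via simulation.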
In addition, note that the above problem also means that the operating characteristics under H_{G}, H_{A}, and the LFC_{k} are not unique without further restriction. Thus, to achieve uniqueness, the app requires a value be specified for π_{0} for use in the definition of these scenarios. Moreover, for this reason, the inputs and outputs of functions supporting Bernoulli outcomes make no reference to the τ_{k}, and work instead directly in terms of the π_{k}. Finally, note that this problem also means that to determine A-, D-, or E-optimised allocation ratios, a specific set of values for the π_{k} must be assumed.
In this case, we should also ensure that δ_{1}∈(0,1−π_{0}) and δ_{0}∈(−π_{0},δ_{1}), for the assumed value of π_{0}, since π_{k}∈[0,1] for k∈{1,…,K}.
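A minimal sketch of the Bernoulli case follows. It first checks that assumed values of π_{0}, δ_{1}, and δ_{0} keep the implied response rates within [0,1], and then computes a Wald statistic with the variance evaluated at the observed response rates. The function wald_bernoulli and the response counts are hypothetical.

```r
# Assumed design parameters (illustrative)
pi0    <- 0.3
delta1 <- 0.15
delta0 <- 0
stopifnot(pi0 + delta1 <= 1, pi0 + delta0 >= 0, delta0 < delta1)

# Wald statistic for arm k versus control, with plug-in variance estimate;
# x0, xk are response counts and n0, nk the corresponding sample sizes
wald_bernoulli <- function(x0, n0, xk, nk) {
  p0 <- x0 / n0
  pk <- xk / nk
  (pk - p0) / sqrt(p0 * (1 - p0) / n0 + pk * (1 - pk) / nk)
}

z <- wald_bernoulli(x0 = 30, n0 = 100, xk = 45, nk = 100)
p <- pnorm(z, lower.tail = FALSE)   # one-sided p-value
```

Because the variance is estimated from the data, the resulting statistic is only approximately standard normal under τ_{k}=0, which is why simulation-based checks are advisable for small samples.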
Results
Support
The web application is freely available from https://mjgrayling.shinyapps.io/multiarm/. The R code for the application can also be downloaded from https://github.com/mjg211/multiarm. Furthermore, as noted earlier, the app is built into the package multiarm [26], as the function gui(), for ease of use without internet access. The application has a simple interface, and has the capability to
Determine the sample size required in each arm in a specified multiarm clinical trial design scenario;
Summarise and plot the operating characteristics of the identified design;
Produce a report describing the chosen design scenario, the identified design, and a summary of its operating characteristics.
Inputs
The outputs (i.e., the identified design and its operating characteristics) are determined based upon the following set of user specified inputs (Fig. 1)
1. The number of experimental treatment arms, K.
2. The chosen multiple comparison correction (e.g., Dunnett’s correction).
3. The significance level, α.
4. The type of power to control (e.g., the conjunctive power under H_{A}).
5. The desired power, 1−β.
6. For Bernoulli distributed data, the control arm response rate π_{0}.
7. The interesting treatment effect, δ_{1}.
8. The uninteresting treatment effect, δ_{0}.
9. For normally distributed data, the standard deviations, σ_{0},…,σ_{K}. These are allocated by first selecting the type of standard deviations (e.g., that they are assumed to be equal across all arms), and then the actual values for the parameters.
10. The allocation ratios (e.g., A-optimal).
11. For Bernoulli distributed data, when searching for optimal allocation ratios, the response rates to assume in the search.
12. Whether the sample size in each arm should be required to be an integer.
13. Whether plots should be produced, and if so the plot quality.
Note that a Reset inputs button is provided to simplify returning the inputs to their default values. Once the inputs have been specified as desired, the outputs can be generated by clicking the Update outputs button.
Example
Here, we demonstrate specification of the input parameters (Fig. 1), and then subsequent output generation (Figs. 2, 3, and 4), for parameters motivated by a three-arm phase II randomized controlled trial of treatments for myelodysplastic syndrome patients, described in [37]. This trial compared, via a binary primary outcome, two experimental treatments with conventional azacitidine treatment. The trial was designed with α=0.15, β=0.2, δ_{1}=0.15, and π_{0}=0.3. For simplicity, we assume that the familiar Dunnett correction will be used, that δ_{0}=0, and that allocation will be equal across the arms (r_{1}=⋯=r_{K}=1). Finally, we assume it is the minimum marginal power that should be controlled.
Each input widget in Fig. 1 can be seen to have been allocated accordingly based on the description above, whilst we have additionally elected to produce plots (of medium quality), and to not require the armwise sample sizes to be integers. Note that in Fig. 1 we can see that the input widgets are supported by help boxes that can be opened by clicking on the small question marks beside them.
Figure 2 then depicts the output to the Design summary box once the user clicks on Update outputs. Specifically, a summary of the chosen inputs and the identified design is rendered. Furthermore, in Fig. 3 we can see the tables that provide the various statistical quantities under H_{G}, H_{A}, the LFC_{k}, as well as the various treatment effect scenarios that are considered for plot production.
Finally, in Fig. 4 the plots discussed earlier are shown. Observe that horizontal lines are added at the values α and 1−β, and vertical lines at the values δ_{1} and δ_{0}. Note that these plots are outputted in a manner to allow the user to zoom in on a particular subcomponent if desired.
In all, Figs. 2, 3, and 4 provide a set of outputs with a variety of features that should be anticipated given the chosen input parameters. Firstly, the specification that the allocation to all arms should be equal means that n_{0}=⋯=n_{K}. In addition, FWER_{I1} is equal to 0.15 under H_{G}, and the minimum marginal power is 0.8, as was desired. Moreover, the specification that r_{1}=⋯=r_{K} means that P_{con} and P_{dis} are equal for each of the LFC_{k}, and P_{1}=P_{2}.
Finally, as noted above, and as can be seen in Fig. 1, a Generate report button is provided that can produce a copy of the outputs in either PDF (.pdf), HTML (.html), or Word (.docx) format. The user can also nominate a name for this file in the Report filename input widget. This allows a record of designs to be stored, presented, and compared to other designs if required. A copy of the report, in PDF form, for the inputs shown in Fig. 1, is given as Additional file 1.
Comparison to other software solutions
In this section we discuss solutions that are available for designing multiarm trials in a range of popular trial design packages, using this to describe the advantages and disadvantages of our web application.
Firstly, we note that we are unaware of any other code for R that directly facilitates the design of a multiarm trial: in particular the CRAN Task View for Clinical Trial Design, Monitoring, and Analysis does not list any potential solution [38]. Nonetheless, a multiarm trial designed to achieve a particular level of marginal power, that controls either the PHER or the FWER via a single-step MCC, could be identified using one of the many functions available for designing two-arm trials (see, e.g., power.prop.test() from the stats package). However, one would not then be able to readily explore the resultant design’s operating characteristics. Similar statements hold for Stata [39] and SAS [40], with the power command and the PROC POWER procedure respectively enabling the determination and evaluation of two-arm trial designs, but neither directly supports multiarm trial design. Moreover, nQuery [41], to the best of our knowledge, does not appear to currently support the design of multiarm trials.
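As a sketch of the two-arm workaround mentioned above, power.prop.test() from the stats package can be called with a Bonferroni-adjusted significance level. The response rates below are taken from the example trial parameters (π_{0}=0.3, δ_{1}=0.15), and the choice of Bonferroni's correction is an assumption made for illustration. The function uses its own variance formula, so the result would not be expected to match the app's output exactly.

```r
alpha <- 0.15
K     <- 2

# Per-arm sample size for one experimental arm versus control, treating the
# comparison as a two-arm trial at significance level alpha / K
two_arm <- power.prop.test(p1 = 0.30, p2 = 0.45,
                           sig.level = alpha / K,
                           power = 0.80,
                           alternative = "one.sided")

n_per_arm <- ceiling(two_arm$n)
```

This gives a sample size for the desired marginal power, but, as noted, no direct view of quantities such as the conjunctive or disjunctive power of the resulting multiarm design.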
Direct solutions for certain types of multiarm trial are available in several other proprietary software packages: namely East [42], FACTS [43], and PASS [44]. Unfortunately, the cost of these packages may be prohibitive to many working within academia. Indeed, this was our primary motivation for developing the presented web application, and we are only able to comment precisely here on the available functionality in PASS, as we do not have access to either East or FACTS.
Firstly, we note that from the information provided online, the MULTIARM module for East facilitates the determination of a range of multiarm trial designs. So too does it support their comparison in terms of numerous operating characteristics, including the FWER and several varieties of power. It will also produce a selection of insightful plots, and it handles both continuous and binary outcome variables and supports eleven MCCs. Less information is available online about the precise support available in FACTS, but it is stated that its ‘Core’ functionality can handle scenarios with multiple treatment arms. In PASS, support is provided to design a multiarm trial with Bernoulli outcomes via formulae provided in Chow et al. (2008) [45]. Specifically, the Bonferroni correction is used to control the FWER to a specified level, and the sample size required to achieve a particular level of the minimum marginal power can be computed, under several allocation ratio scenarios. Furthermore, a report is ultimately generated on the calculations performed. PASS also supports similar calculations, using either Dunnett’s or the Kruskal-Wallis MCC, for a vast array of outcome types via simulation (including both Bernoulli and normally distributed outcomes). These calculations explicitly address the sample size required to control the conjunctive or disjunctive power, and allow for flexible assumptions about the allocation ratios.
Thus, a variety of multiarm trial designs can be determined using solutions other than our web application. However the cost of these packages may render them unsuitable, particularly in academic departments. This reveals arguably the greatest advantage of our web application: that it is provided under a license that makes it completely free to utilise and modify as a user sees fit. In addition, like the discussed proprietary solutions, our web application allows for calculations via a GUI that contains several features to make it easier to use, without compromising on the type of multiarm designs that can be determined. In fact, we would argue that our application supports a broader range of multiarm design scenarios than any other currently available solution.
We feel that there are only two principal limitations of our application. Firstly, MVN integration is utilised by the application in all instances to determine the statistical operating characteristics of potential multiarm designs. This makes the execution time for returning outputs with many possible input parameters fast. However, there is an unavoidable complexity in certain multiarm designs, which may make execution time long. This is particularly true of scenarios with K≥5. It can also be true of designs that utilise the more complex stepwise MCCs. It is for this reason that the web application places an upper cap in the inputs of K=5, and also returns a warning in scenarios for which a lengthy execution time would be anticipated. Nonetheless, users may have to wait several minutes in certain situations to identify their desired design. In contrast, proprietary solutions may exploit more efficient solutions to reduce execution time, with FACTS in particular noting its use of efficient low-level languages.
More significantly, it is crucial that all software for clinical trial design be validated. Each of the discussed proprietary solutions will almost certainly have gone through more rigorous testing than we are able to achieve. Specifically, it is challenging to validate our results because of the limited freely available software solutions for multiarm trials. We have compared the output of our application to that of PASS for a variety of supported input parameters, but output for many possible inputs remains difficult to corroborate because of a lack of equivalent available functionality. For this reason, we have carefully followed recommended good programming practices and perform all statistical calculations within the application by calling functions from the R package multiarm, in which the code has been modularised [26].
Furthermore, in this package we have created a function that simulates multiarm clinical trials that use a given design. This allows us to perform an additional check on our analytical computations. As an example, we demonstrate how to identify the example design discussed above, but under the assumption of normally distributed data with σ_{1}=⋯=σ_{K}=1:
> set.seed(1)
> design <- multiarm::des_ma(K = 2,
+   alpha = 0.15,
+   beta = 0.2,
+   delta1 = 0.15,
+   delta0 = 0,
+   sigma = c(1, 1, 1),
+   ratio = c(1, 1),
+   correction = "dunnett",
+   power = "marginal",
+   integer = TRUE)
Then, 100,000 replicate simulations of trials that utilise this design, under H_{G}, H_{A}, and the LFC_{k}, can be generated with:
> simulated <- multiarm::sim_ma(design)
Finally, the maximum absolute difference in the operating characteristics of this design, as determined analytically and via simulation, can be evaluated as:
> max(abs(simulated$sim - design$opchar))
[1] 0.002166331
Thus, the maximal difference is within what would be anticipated allowing for simulation error.
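As a rough guide to the size of simulation error with 100,000 replicates, the sketch below computes the Monte Carlo standard error of an estimated probability, sqrt(p(1 − p)/n), for a few illustrative values of p. The values of p are assumptions chosen for illustration; which operating characteristic gave the largest observed difference is not stated here, so the worst case p = 0.5 is used as the reference.

```r
n_sim <- 1e5                        # replicate simulations, as in the text
p     <- c(0.05, 0.15, 0.5, 0.8)    # illustrative true probabilities (assumed)

# Monte Carlo standard error of a simulated proportion at each p
mc_se <- sqrt(p * (1 - p) / n_sim)

# The worst case occurs at p = 0.5
max_se <- max(mc_se)

# Observed maximum absolute difference reported above, in units of the
# worst-case standard error
observed_diff <- 0.002166331
ratio <- observed_diff / max_se
```

The observed difference is between one and two worst-case standard errors, and it is a maximum taken over several operating characteristics, which is consistent with simulation noise alone.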
In Additional file 2, we demonstrate how we repeated the above for 1000 randomly generated combinations of possible input parameters, thus covering an extremely wide range of supported design scenarios. As above, the analytical operating characteristics returned by the web application in the Operating characteristics summary boxes were compared to those based on trial simulation, using 100,000 replicate simulations in each instance. Across all considered scenarios, the maximum absolute difference between the analytical and simulated operating characteristics was just 5×10^{−3}, which is again within what would be anticipated due to simulation error. Consequently, it does appear that our application is functioning as it should. However, it remains that the principal argument for not utilising our application would be to attain a stronger guarantee on the results.
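The logic of such an analytical-versus-simulated comparison can be sketched in base R for one simple case. This is not the code in Additional file 2, and it does not call the multiarm package: it assumes a Bonferroni correction, known common variance, equal allocation, and a hypothetical per-arm sample size of n = 450, and it compares the closed-form marginal power for one experimental arm with its simulated counterpart.

```r
set.seed(2020)

K     <- 2      # number of experimental arms
alpha <- 0.15   # familywise error-rate
delta <- 0.15   # treatment effect in experimental arm 1
sigma <- 1      # common standard deviation (assumed known)
n     <- 450    # hypothetical per-arm sample size (an assumption)
n_sim <- 1e5    # number of replicate trials

crit <- qnorm(1 - alpha / K)   # Bonferroni critical value

# Analytical marginal power for arm 1
analytic_power <- pnorm(delta * sqrt(n / (2 * sigma^2)) - crit)

# Simulate the arm-level sample means directly (valid for normal data)
se_mean   <- sigma / sqrt(n)
control   <- rnorm(n_sim, mean = 0,     sd = se_mean)
treatment <- rnorm(n_sim, mean = delta, sd = se_mean)

# Z-statistic for arm 1 versus control, and its empirical rejection rate
z1              <- (treatment - control) / (sigma * sqrt(2 / n))
simulated_power <- mean(z1 > crit)

# Absolute difference, to be judged against the Monte Carlo standard error
abs_diff <- abs(simulated_power - analytic_power)
mc_se    <- sqrt(analytic_power * (1 - analytic_power) / n_sim)
```

Repeating a comparison of this kind over many randomly drawn input combinations, and recording the largest value of abs_diff, mirrors the structure of the check described above.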
Conclusions
A possible barrier to previous calls for increased use of multiarm clinical trial designs is a lack of available easy-to-access, user-friendly software that facilitates associated sample size calculations. For this reason, we have created an online web application that supports multiarm trial design determination for a wide selection of possible input parameters. Its use requires no knowledge of statistical programming languages and is facilitated via a simple user interface. Furthermore, we have made the application available on the internet, so that it is readily accessible, and have also made it freely available for download for use without an internet connection. Like similar applications that have been released recently for phase I clinical trial design [46, 47], we hope that the availability of this application will assist with the design of future multiarm studies. As we have discussed, however, users should bear in mind the primary limitation of our application: that it is not validated. Therefore, alternative proprietary solutions may be needed if certain guarantees on outputs are required.
Finally, we note several possible avenues for future development of the web application. Firstly, numerous papers have now provided designs for adaptive multiarm trials (e.g., [48, 49]), and software for their determination in certain settings [50, 51]. Given the evident increase in interest in such designs [52], allowing for their determination would be a valuable extension to our application. In addition, our web application currently focuses on design for normally and Bernoulli distributed outcomes, but time-to-event outcomes are also commonly used in oncology. Permitting such calculations therefore likewise offers a valuable avenue for subsequent versions of the app.
Availability and requirements
Project name: Multiarm trial web application.
Project home page: https://mjgrayling.shinyapps.io/multiarm/.
Operating system(s): Platform independent.
Programming language: R.
Other requirements: Version 3.5.2 or later.
License: MIT.
Any restrictions to use by nonacademics: None.
Availability of data and materials
Access to the application online is available at https://mjgrayling.shinyapps.io/multiarm/. The R code for the application can be downloaded from https://github.com/mjg211/multiarm.
Abbreviations
MCC: Multiple comparison correction
MVN: Multivariate normal
References
JA DiMasi, HG Grabowski, RW Hansen, Innovation in the pharmaceutical industry: new estimates of R&D costs. J Health Econ. 47, 20–33 (2016).
Biotechnology Innovation Organization (BIO), Biomedtracker, AMPLION, Clinical development success rates 2006-2015 (2016).
MKB Parmar, J Carpenter, MR Sydes, More multiarm randomised trials of superiority are needed. Lancet. 384(9940), 283–4 (2014).
T Jaki, JMS Wason, Multiarm multistage trials can improve the efficiency of finding effective treatments for stroke: a case study. BMC Cardiovasc Disord. 18(1), 215 (2018).
JMS Wason, L Stecher, AP Mander, Correcting for multiple-testing in multiarm trials: is it necessary and is it done? Trials. 15, 364 (2014).
G Baron, E Perrodeau, I Boutron, P Ravaud, Reporting of analyses from randomized controlled trials with multiple arms: a systematic review. BMC Med. 11, 84 (2013).
E Juszczak, DG Altman, S Hopewell, K Schulz, Reporting of multiarm parallel-group randomized trials: extension of the CONSORT 2010 statement. JAMA. 321(16), 1610–20 (2019).
KJ Rothman, No adjustments are needed for multiple comparisons. Epidemiology. 1(1), 43–6 (1990).
RJ Cook, VT Farewell, Multiplicity considerations in the design and analysis of clinical trials. J R Stat Soc Ser A. 159(1), 93–110 (1996).
MA Proschan, MA Waclawiw, Practical guidelines for multiplicity adjustment in clinical trials. Control Clin Trials. 21(6), 527–39 (2000).
R Bender, S Lange, Adjusting for multiple testing - when and how? J Clin Epidemiol. 54(4), 343–9 (2001).
RJ Feise, Do multiple outcome measures require p-value adjustment? BMC Med Res Methodol. 2, 8 (2002).
MD Hughes, Multiplicity in clinical trials. Encycl Biostat. 5, 3446–51 (2005).
B Freidlin, EL Korn, R Gray, A Martin, Multiarm clinical trials of new agents: some design considerations. Clin Cancer Res. 14, 4368–71 (2008).
G Li, M Taljaard, ER Van den Heuvel, MAH Levine, DJ Cook, GA Wells, PJ Devereaux, L Thabane, An introduction to multiplicity issues in clinical trials: the what, why, when and how. Int J Epidemiol. 46(2), 746–55 (2016).
European Medicines Agency, Guideline on Multiplicity Issues in Clinical Trials (2017). https://www.ema.europa.eu/en/documents/scientific-guideline/draft-guideline-multiplicity-issues-clinical-trials_en.pdf. Accessed 17 Jan 2020.
US Food and Drug Administration, Multiple Endpoints in Clinical Trials Guidance for Industry (2017). https://www.fda.gov/regulatory-information/search-fda-guidance-documents/multiple-endpoints-clinical-trials-guidance-industry. Accessed 17 Jan 2020.
DR Howard, JM Brown, S Todd, WM Gregory, Recommendations on multiple testing adjustment in multiarm trials with a shared control group. Stat Methods Med Res. 27(5), 1513–30 (2018).
Y Hochberg, AC Tamhane, Multiple Comparison Procedures (Wiley, New York, 1987).
JC Hsu, Multiple Comparisons (Chapman & Hall, London, 1996).
F Bretz, T Hothorn, P Westfall, Multiple Comparisons using R (CRC Press, Boca Raton, 2010).
AJ Sankoh, RBS D’Agostino, MF Huque, Efficacy endpoint selection and multiplicity adjustment methods in clinical trials with inherent multiple endpoint issues. Stat Med. 22(20), 3133–50 (2003).
A Atkinson, A Donev, R Tobias, Optimum Experimental Designs, with SAS (Oxford University Press, Oxford, 2007).
W Chang, J Cheng, JJ Allaire, Y Xie, J McPherson, shiny: Web Application Framework for R (2019). https://CRAN.R-project.org/package=shiny. Accessed 17 Jan 2020.
R Core Team, R: a Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, 2018). https://www.R-project.org/. Accessed 17 Jan 2020.
MJ Grayling, multiarm: Design and analysis of fixed-sample multiarm clinical trials (2019). http://www.github.com/mjg211/multiarm/. Accessed 17 Jan 2020.
CE Bonferroni, Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze. 8:, 3–62 (1936).
Z Šidák, Rectangular confidence regions for the means of multivariate normal distributions. J Am Stat Assoc. 62(318), 626–33 (1967).
CW Dunnett, A multiple comparison procedure for comparing several treatments with a control. J Am Stat Assoc. 50(272), 1096–121 (1955).
S Holm, A simple sequentially rejective multiple test procedure. Scand J Stat. 6(2), 65–70 (1979).
Y Hochberg, A sharper Bonferroni procedure for multiple tests of significance. Biometrika. 75(4), 800–2 (1988).
Y Benjamini, Y Hochberg, Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. 57(1), 289–300 (1995).
Y Benjamini, D Yekutieli, The control of the false discovery rate in multiple testing under dependency. Annals Stat. 29(4), 1165–88 (2001).
J Wason, D Magirr, M Law, T Jaki, Some recommendations for multiarm multistage trials. Stat Methods Med Res. 25(2), 716–27 (2016).
O Sverdlov, WF Rosenberger, On recent advances in optimal allocation designs in clinical trials. J Stat Theory Pract. 7(4), 753–73 (2013).
A Genz, F Bretz, T Miwa, X Mi, F Leisch, F Scheipl, T Hothorn, mvtnorm: Multivariate normal and t distributions. R package version 1.0-10 (2019). http://CRAN.R-project.org/package=mvtnorm. Accessed 17 Jan 2020.
L Jacob, M Uvarova, S Boulet, I Begaj, S Chevret, Evaluation of a multiarm multistage Bayesian design for phase II drug selection trials - an example in hemato-oncology. BMC Med Res Methodol. 16, 67 (2016).
CRAN Task View: Clinical Trial Design, Monitoring, and Analysis. https://cran.r-project.org/web/views/ClinicalTrials.html. Accessed: 16 Oct 2019.
Stata. https://www.stata.com/. Accessed: 16 Oct 2019.
SAS. https://www.sas.com/en_gb/home.html. Accessed: 16 Oct 2019.
nQuery. https://www.statsols.com/nquery. Accessed: 16 Oct 2019.
East. https://www.cytel.com/software/east. Accessed: 04 May 2019.
FACTS. https://www.berryconsultants.com/software/. Accessed: 16 Oct 2019.
PASS. https://www.ncss.com/software/pass/. Accessed: 16 Oct 2019.
S Chow, H Wang, J Shao, Sample Size Calculations in Clinical Research (Chapman & Hall, Boca Raton, 2008).
GM Wheeler, MJ Sweeting, AP Mander, AplusB: A Web Application for Investigating A + B Designs for Phase I Cancer Clinical Trials. PLoS ONE. 11(7), e0159026 (2016).
NA Wages, GR Petroni, A web tool for designing and conducting phase I trials using the continual reassessment method. BMC Cancer. 18, 133 (2018).
D Magirr, T Jaki, J Whitehead, A generalized Dunnett test for multiarm multistage clinical studies with treatment selection. Biometrika. 99(2), 494–501 (2012).
J Wason, N Stallard, J Bowden, C Jennison, A multistage drop-the-losers design for multiarm clinical trials. Stat Methods Med Res. 26(1), 508–24 (2017).
FMS Barthel, P Royston, MKB Parmar, A menu-driven facility for sample-size calculation in novel multiarm, multistage randomized controlled trials with a time-to-event outcome. Stata J. 9(4), 505–23 (2009).
T Jaki, P Pallmann, D Magirr, The R package MAMS for designing multiarm multistage clinical trials. J Stat Softw. 88(4), 1–25 (2019).
M Dimairo, E Coates, P Pallmann, S Todd, SA Julious, T Jaki, J Wason, AP Mander, CJ Weir, F Koenig, MK Walton, K Biggs, J Nicholl, T Hamasaki, MA Proschan, JA Scott, Y Ando, D Hind, DG Altman, Development process of a consensus-driven CONSORT extension for randomised trials using an adaptive design. BMC Med. 16, 210 (2018).
Acknowledgements
Not applicable.
Funding
This work was supported by the Medical Research Council [grant number MC_UU_00002/6 to JMSW]. The funding body did not have any role in the design of this study, collection, analysis, and interpretation of data, nor in the writing of the manuscript.
Author information
Contributions
MJG and JMSW contributed to conception of the web application. MJG wrote the code for the web application. MJG and JMSW contributed to drafting and revising the manuscript. MJG and JMSW gave final approval of the manuscript submitted for publication.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1
PDF report. A copy of the PDF report generated by clicking the Generate report button in the web application, for the input parameters shown in Fig. 1.
Additional file 2
Analytical vs. simulated operating characteristics comparison. R code to replicate our comparison of the analytical operating characteristics returned by the web application against those based on simulation.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Grayling, M.J., Wason, J.M. A web application for the design of multiarm clinical trials. BMC Cancer 20, 80 (2020). https://doi.org/10.1186/s12885-020-6525-0
DOI: https://doi.org/10.1186/s12885-020-6525-0
Keywords
 False discovery rate
 Familywise error-rate
 Multiple comparisons
 Optimal design
 Power
 Sample size