Bias and inconsistency in the estimation of tumour mutation burden

Abstract

Background

Tumour mutation burden (TMB), defined as the number of somatic mutations per megabase within the sequenced region in the tumour sample, has been used as a biomarker for predicting response to immune therapy. Several studies have been conducted to assess the utility of TMB for various cancer types; however, methods to measure TMB have not been adequately evaluated. In this study, we identified two sources of bias in current methods to calculate TMB.

Methods

We used simulated data to quantify the two sources of bias and their effect on TMB calculation. We down-sampled sequencing reads from exome sequencing datasets from TCGA to evaluate the consistency of TMB estimation across different sequencing depths, and we analyzed data from ten cancer cohorts to investigate the relationship between inferred TMB and sequencing depth.

Results

We found that TMB, estimated by counting the number of somatic mutations above a threshold frequency (typically 0.05), is not robust to sequencing depth. Furthermore, we show that, because only mutations with an observed frequency greater than the threshold are considered, the observed mutant allele frequency provides a biased estimate of the true frequency. This can result in substantial over-estimation of the TMB, when the cancer sample includes a large number of somatic mutations at low frequencies, and exacerbates the lack of robustness of TMB to variation in sequencing depth and tumour purity.

Conclusion

Our results demonstrate that care needs to be taken in the estimation of TMB to ensure that results are unbiased and consistent across studies. We suggest that accurate and robust estimation of TMB could be achieved using statistical models that estimate the full mutant allele frequency spectrum.

Background

Immunotherapy is an evolving and promising cancer treatment that works by provoking the patient’s own immune response to target cancer cells [1]. Several studies have examined the effectiveness of immunotherapy drugs in cancer treatment [2–4]. Although many patients show a strong and durable response to immune checkpoint inhibitors (ICIs), some patients do not respond well to treatment [5, 6]. Therefore, there is a need for effective biomarkers to distinguish between patients who are more or less likely to benefit from this treatment. Several biomarkers have been proposed that correlate well with the response to immunotherapy in multiple cancer types, including tumour mutation burden (TMB), neoantigen burden, DNA mismatch repair deficiency, and high microsatellite instability [7–11].

TMB was initially introduced as a biomarker for ICIs in melanoma. Recently the FDA has approved pembrolizumab in all cancers with TMB greater than 10 mutations per megabase (MB) as assessed by the targeted FoundationOne CDx assay [12]. Several studies have argued that patients with high tumour mutation burden (TMB-H) are more likely to respond to checkpoint inhibitors because a higher number of mutations in a tumour correlates with an increase in the number of neoantigens that can be recognized by T cells [13–15]. However, some recent studies have shown that TMB-H fails to predict immune checkpoint blockade response in breast cancer, prostate cancer and glioma [16, 17]. Despite the low efficacy of TMB in these cancer types, significant correlations have been reported between TMB-H and response to ICIs in several other cancer types, such as melanoma, lung, and bladder cancers, where CD8 T-cell levels positively correlated with neoantigen load [16, 18]. TMB-H has been reported to be the most robust, effective and clinically verifiable biomarker in these cancer types [19].

Tumour mutation burden is calculated by counting the number of somatic mutations above a threshold frequency in data derived from whole genome sequencing, whole exome sequencing (WES) or panel sequencing and dividing by the size of the target region [20]. Although WES is frequently used to measure TMB in a research setting, it can be impractical for clinical use due to its higher cost and its low average coverage, which could result in missing rare somatic mutations. To overcome these issues some studies suggest using panel sequencing [21, 22]. The US FDA approved two cancer-related gene panels, FoundationOne CDx (F1CDx) and MSK-IMPACT [23, 24]. In addition, pan-cancer TMB panels that showed higher correlation with WES than other panels have been proposed [19, 25]. Several studies have suggested using various thresholds, filtering strategies and models to improve the robustness of TMB measurement for targeted-panel sequencing and to correct panel design biases in order to avoid overestimating TMB [26–29]. To validate the panel-based sequencing approach, the TMB derived from the targeted panel is compared against TMB measured from WES [30], which is considered the standard TMB measurement [31].

Although TMB is an effective biomarker in several cancer types and the efficacy of panel-based sequencing in TMB estimation has been shown in several studies as an alternative to WES for clinical use, there is still no standard method to determine the genes to include in the panel, the type of mutation or the cut-off to distinguish between high and low TMB values [32, 33]. Apart from the lack of a standardized method, which is essential in order to be able to compare the TMB values estimated from different gene panels, there are some factors that could influence TMB estimation regardless of the NGS platform used. These factors can be categorized into two groups: patient- and sample-specific factors, such as the site of biopsy, sample type and sample purity, and technical factors, such as sequencing depth and bioinformatics pipeline [34–39].

The first step in a TMB estimation pipeline is to detect somatic mutations. Many mutation calling tools have been introduced to call somatic single nucleotide variants and the performance of these tools has been evaluated extensively [40–42]. In particular, [43] showed that Mutect2, developed by the Broad Institute, EBCall [44], Virmid [45] and Strelka [46] are the most reliable tools, with similar performance. Several studies have also evaluated the effect of sequencing depth on the detection of somatic mutations by different mutation callers. In these studies, in particular [47], it has been shown that a sequencing depth ≥200X is sufficient for calling 95% of mutations with mutation frequency ≥20%; for mutations at lower frequencies, it has been recommended to increase the sequencing depth or improve the experimental method. Therefore, in the absence of sufficient coverage depth, detecting somatic variants with low frequency is still a major challenge. Insufficient sequencing depth could also affect TMB estimation by reducing the accuracy of mutation frequency estimates, as the mutation frequency is required in order to determine the number of mutations exceeding the threshold frequency.

In this study, we investigate statistical biases affecting TMB estimation. We explore the impact of these biases on TMB estimates using simulations with parameters informed by real cancer sequencing studies. We also investigate the relationship between inferred TMB and sequencing depth, both by down-sampling sequencing reads from exome sequencing datasets from TCGA and by assessing the relationships between inferred TMB and sequencing depth across TCGA cohorts. The relationship between sequencing depth and inferred TMB is likely to reflect both the power to detect somatic mutations [47] as well as bias in the TMB estimates, which is also a function of sequencing depth. We suggest that a statistical modelling approach that estimates the parameters of the entire mutation frequency spectrum, rather than counting mutations above a fixed threshold, is likely to provide a more robust means of estimating TMB.

Methods

Simulations

We used simulations to investigate the extent of the bias in the estimate of mutant allele frequencies resulting from neglecting mutations for which the empirical frequency was below a threshold, τ. The simulations consisted of 200 loci, each covered by 100 reads. True values of the mutant allele frequency were considered over the range shown in Fig. 1. For each true frequency, f, we obtained 1,000 random samples from a binomial random variable with parameter f and size 100, using R. Two estimates of the mutant allele frequency were then returned: one derived from all of the samples and a second following truncation (i.e. using only the samples in which the proportion of mutant alleles was at least τ).

Fig. 1

The estimated frequencies (the proportion of successes) versus the true frequencies (the success probability) for both binomial (blue) and 0.05-truncated binomial (red) random variables
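
The simulation described above was run in R. As a rough illustration only, the following Python sketch mimics the same idea with the standard library: it draws binomial samples at a given true frequency and compares the mean observed proportion over all samples with the mean over samples that pass the threshold. The function name, the seed and the list of true frequencies in the demo are assumptions made for this example and are not taken from the original analysis.

```python
import random


def simulate_estimates(f, depth=100, n_samples=1000, tau=0.05, seed=0):
    """Return (untruncated, truncated) mean observed mutant allele proportions.

    f         : true mutant allele frequency (binomial success probability)
    depth     : number of reads per sample (binomial size)
    n_samples : number of binomial samples drawn
    tau       : samples with observed proportion below tau are discarded
                when computing the truncated estimate
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(n_samples):
        # One binomial draw: number of mutant reads among `depth` reads.
        mutant_reads = sum(rng.random() < f for _ in range(depth))
        proportions.append(mutant_reads / depth)

    untruncated = sum(proportions) / len(proportions)
    kept = [p for p in proportions if p >= tau]
    # If no sample passes the threshold there is nothing to average.
    truncated = sum(kept) / len(kept) if kept else float("nan")
    return untruncated, truncated


if __name__ == "__main__":
    # Illustrative true frequencies (hypothetical values).
    for f in (0.01, 0.02, 0.05, 0.10, 0.20):
        all_est, trunc_est = simulate_estimates(f)
        print(f"f={f:.2f}  all samples={all_est:.4f}  truncated={trunc_est:.4f}")
```

Because the truncated estimate averages only the samples at or above τ, it can never fall below τ, which is why the discrepancy is expected to be largest for true frequencies well below the threshold.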

Theoretical derivation of the relative error in the TMB estimates

Here we provide a theoretical derivation of the extent of the bias in TMB corresponding to subclonal mutations with beta-distributed mutant allele frequency spectrum, with parameters α and β, as illustrated in Fig. 2. The true number of somatic mutations with frequency above τ is

$$ T=\left(1-{F}_B\left(\tau; \alpha, \beta \right)\right)\times S, $$
(1)

where \(F_B\) is the cumulative distribution function of a beta random variable and S is the total number of somatic mutations.

Fig. 2

A. Three beta distributions, with α and β parameters as shown in the legend, representing alternative mutant allele frequency spectra for subclonal mutations. B. Relative error in the estimated TMB contribution from subclonal mutations derived from the three distributions in A

Let \(d_i\) be the sequencing depth at a site i at which a somatic mutation has occurred. The number of reads carrying the mutant allele at site i is a beta-binomial random variable with size \(d_i\) and parameters α and β, and the expected number of somatic mutations with frequency ≥τ is:

$$ E=\sum \limits_{i=1}^S\left(1-{F}_{BB}\left({d}_i\tau; {d}_i,\alpha, \beta \right)\right), $$
(2)

where \(F_{BB}\) is the cumulative distribution function of the beta-binomial random variable. The relative error (depicted in Fig. 2) is (E − T)/T.
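
To make the derivation concrete, the sketch below evaluates Eq. 1, Eq. 2 and the relative error numerically using only the Python standard library. It is an illustration under stated assumptions rather than the code used for Fig. 2: the beta survival function is approximated with Simpson's rule (assuming τ > 0 and β ≥ 1), the beta-binomial tail follows Eq. 2 as written, that is P(X > d_i τ), and all function names are hypothetical.

```python
import math


def beta_survival(tau, a, b, steps=20000):
    """Approximate 1 - F_B(tau; a, b) with Simpson's rule on [tau, 1].

    Assumes tau > 0 and b >= 1 so the integrand is finite on the interval;
    `steps` must be even.
    """
    norm = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

    def pdf(x):
        return x ** (a - 1) * (1 - x) ** (b - 1) / norm

    h = (1 - tau) / steps
    total = pdf(tau) + pdf(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(tau + i * h)
    return total * h / 3


def betabinom_survival(k, n, a, b):
    """P(X > k) for a beta-binomial variable with size n and parameters a, b."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    total = 0.0
    for x in range(k + 1, n + 1):
        log_choose = (math.lgamma(n + 1) - math.lgamma(x + 1)
                      - math.lgamma(n - x + 1))
        log_beta = (math.lgamma(x + a) + math.lgamma(n - x + b)
                    - math.lgamma(n + a + b))
        total += math.exp(log_choose + log_beta - log_norm)
    return total


def relative_error(depths, tau, a, b):
    """(E - T) / T, with T from Eq. 1 and E from Eq. 2.

    depths : sequencing depth d_i at each of the S mutated sites
    """
    s = len(depths)
    true_count = beta_survival(tau, a, b) * s                      # Eq. 1
    expected = sum(
        # Small epsilon guards against floating-point error in d * tau.
        betabinom_survival(math.floor(d * tau + 1e-9), d, a, b)    # Eq. 2
        for d in depths
    )
    return (expected - true_count) / true_count


if __name__ == "__main__":
    # Hypothetical example: 1,000 mutated sites at depth 100, steep spectrum.
    print(relative_error([100] * 1000, tau=0.05, a=0.1, b=100))
```

A positive return value corresponds to over-estimation of the number of mutations above the threshold and a negative value to under-estimation.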

Down-sampling TCGA data and TMB calculation

Whole exome sequencing data (in BAM format) were downloaded for four lung adenocarcinoma (LUAD) TCGA samples (TCGA-55-8205, TCGA-78-7159, TCGA-78-7161 and TCGA-78-7162) from cBioPortal [48]. We used samtools to downsample each BAM file progressively, down to 50% of the original reads. We used Mutect2 (GATK4) with the default options to infer somatic mutations and their frequencies. To estimate TMB we determined the number of PASS somatic mutations with estimated frequencies above 0.05.
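
The counting step at the end of this pipeline can be sketched as follows. This is a minimal illustration, not the script used in the study: it assumes an uncompressed VCF in which the allele fraction is stored under the FORMAT key AF (as in Mutect2 output), it takes the largest allele fraction at multi-allelic sites, and the sample name and target size passed by the caller are hypothetical inputs.

```python
def count_pass_mutations(vcf_lines, tumour_sample, threshold=0.05):
    """Count PASS records whose tumour allele fraction exceeds `threshold`.

    vcf_lines     : iterable of VCF text lines (header and records)
    tumour_sample : name of the tumour sample column in the #CHROM header
    """
    count = 0
    sample_col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample_col = fields.index(tumour_sample)
            continue
        if sample_col is None:
            raise ValueError("#CHROM header line not found before records")
        if fields[6] != "PASS":
            continue  # keep only variants that passed all filters
        keys = fields[8].split(":")
        values = fields[sample_col].split(":")
        if "AF" not in keys:
            continue
        # Multi-allelic sites list one fraction per alternate allele.
        fractions = [float(x) for x in values[keys.index("AF")].split(",")
                     if x != "."]
        if fractions and max(fractions) > threshold:
            count += 1
    return count


def tmb_per_mb(n_mutations, target_size_mb):
    """Mutations per megabase of the sequenced target region."""
    return n_mutations / target_size_mb
```

A typical call would read the file with `open(path)` and pass the file object as `vcf_lines`; dividing by the target size follows the definition of TMB given in the Background.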

Correlation between sequencing depth and TMB

We analyzed paired tumour-normal whole-exome sequencing data from 4,850 TCGA samples from ten primary tumour types: bladder urothelial carcinoma (N=411 sample pairs), breast invasive carcinoma (N=1,043), colon adenocarcinoma (N=432), kidney renal clear cell carcinoma (N=338), brain lower grade glioma (N=511), lung adenocarcinoma (N=569), lung squamous cell carcinoma (N=496), ovarian serous cystadenocarcinoma (N=440), prostate adenocarcinoma (N=496) and skin cutaneous melanoma (N=104). The number of mapped reads from the tumour BAM for each donor was used as a proxy for sequencing depth. In order to observe how tumour heterogeneity affects the relationship between sequencing depth and TMB, we used TCGA samples from all cancer groups that are present within PCAWG data, for which the proportion of clonal mutations in each sample is known. These files were downloaded from ICGC [49] and access was granted through DACO-5661 and dbGAP Project 21959. In each cancer cohort we calculated the proportion of samples with high clonal fraction (i.e. the proportion of the samples with clonal fraction greater than 50%).
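
The two per-cohort summaries described here could be computed along the following lines. The text does not state which correlation coefficient was used, so the Spearman rank correlation in this sketch is an assumption, as are the function names; it is meant only to show the shape of the calculation.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(mapped_reads, tmb):
    """Rank correlation between a depth proxy and TMB (assumed statistic)."""
    return pearson(average_ranks(mapped_reads), average_ranks(tmb))


def high_clonal_proportion(clonal_fractions, cutoff=0.5):
    """Proportion of samples whose clonal fraction exceeds `cutoff`."""
    return sum(c > cutoff for c in clonal_fractions) / len(clonal_fractions)
```

Each function would be applied separately within a cancer cohort, with the number of mapped tumour reads as the depth proxy.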

Results

The proportion of reads corresponding to a somatic mutation is a biased estimator of mutation frequency

To calculate the TMB, a tumour sample is obtained and the genome (or a targeted subset of the genome, such as the exome or a gene panel) is sequenced, usually to a relatively high depth. In the first instance, we make the simplifying assumptions that all somatic mutations present in the sequence reads can be detected with perfect efficiency (i.e. we neglect the effects of sequencing and mapping errors and any other artefacts) and that a sample consisting only of cancer cells has been sequenced to a depth of N reads (constant across sites). We wish to obtain an estimate of the TMB, defined as the number of somatic mutations per megabase whose true frequencies are above a threshold, τ. This is estimated by counting the mutations with frequency above τ and dividing by the size of the target region. This approach requires the mutant allele frequency to be estimated (to determine whether it exceeds τ). The proportion of reads containing the mutant allele is used as an estimate of the true mutation frequency, f [50]; however, in this case the proportion of reads containing the mutant allele is a biased estimator of f. This bias results from the fact that the proportion is calculated only for sites at which at least one mutant allele is observed, whereas the true set of sites at which a somatic mutation has occurred is unknown (and may be much larger [47]). The number of mutant alleles observed among the sequence reads at a site is, therefore, a sample from a zero-truncated binomial random variable [51]. The expected value of the proportion of successes from a zero-truncated binomial random variable is \(\frac {f}{1-(1-f)^{N}}\), which exceeds f (Fig. S1).
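
The closed-form expectation quoted above can be checked by summing the binomial probabilities directly. The short sketch below does this in Python; the function names and the example values of f and N are chosen for illustration only.

```python
import math


def binom_pmf(k, n, f):
    """P(X = k) for X ~ Binomial(n, f)."""
    return math.comb(n, k) * f ** k * (1 - f) ** (n - k)


def zero_truncated_mean_proportion(f, n):
    """E[X / n | X >= 1], computed by direct summation over k = 1..n."""
    p_nonzero = 1 - binom_pmf(0, n, f)
    weighted = sum((k / n) * binom_pmf(k, n, f) for k in range(1, n + 1))
    return weighted / p_nonzero


def closed_form(f, n):
    """The expression f / (1 - (1 - f)^N) given in the text."""
    return f / (1 - (1 - f) ** n)


if __name__ == "__main__":
    n = 100  # illustrative sequencing depth
    for f in (0.005, 0.01, 0.02, 0.05, 0.2):
        print(f, zero_truncated_mean_proportion(f, n), closed_form(f, n))
```

The two columns printed by the demo should agree up to floating-point error, and both exceed f because the denominator is smaller than one.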

Fisher derived a maximum likelihood procedure to estimate the parameter f in a singly truncated binomial distribution [51]. The extent of the bias resulting from zero-truncation depends on the true (but unknown) frequency spectrum of the somatic mutations in the sample and is largest when there are many low-frequency somatic mutations. In practice, the bias is much larger than shown in Fig. S1, because in the calculation of TMB, typically only sites at which the observed proportion of reads containing the mutant allele is greater than the threshold τ (often set at 0.05) are considered. The number of reads with the alternative allele is therefore a sample from a τN-truncated binomial random variable and the upward bias in the estimate of the true mutation frequency has the potential to be substantial (Fig. 1).
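
For the zero-truncated case, the maximum likelihood estimate of f satisfies a simple moment equation: the mean number of mutant reads over the observed sites equals N f / (1 − (1 − f)^N). The sketch below solves this equation by bisection. It illustrates the principle under the assumptions of a common depth N and a common f across sites; it is not a reproduction of Fisher's original procedure, and the function name is hypothetical.

```python
import math


def zero_truncated_mle(mean_mutant_reads, depth, tol=1e-12):
    """Solve depth * f / (1 - (1 - f)^depth) = mean_mutant_reads for f.

    mean_mutant_reads : average mutant read count over sites with >= 1 mutant read
    depth             : common sequencing depth N at those sites
    """
    if not 1 < mean_mutant_reads < depth:
        raise ValueError("mean must lie strictly between 1 and depth")

    def expected_mean(f):
        # 1 - (1 - f)^depth, written with expm1/log1p for accuracy at small f.
        p_nonzero = -math.expm1(depth * math.log1p(-f))
        return depth * f / p_nonzero

    lo, hi = 1e-12, 1 - 1e-12
    # expected_mean is increasing in f, so bisection converges to the root.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_mean(mid) < mean_mutant_reads:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For example, if the true frequency is 0.02 at depth 100, the naive proportion (mean mutant reads divided by depth) exceeds 0.02, whereas solving the moment equation recovers the value used to generate the mean.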

Bias in TMB resulting from uncertainty in frequency estimates

Even if the complete set of sites at which a somatic mutation has occurred were known, so that the mutation frequencies were not affected by truncation bias, the number of mutations above the frequency threshold is likely to be a biased estimate of the TMB. This is because in addition to bias, the mutation frequency estimates include uncertainty. If we count the number of mutations with empirical proportions greater than τ and if the mutation frequency spectrum has a strongly negative slope at τ then the number of mutations with true frequency below τ but empirical proportion above τ (i.e. moving from left to right across the yellow line in Fig. 3, Fig. S2) may be much larger than the number passing the threshold in the other direction, resulting in over-estimation of TMB. As an example, Fig. S3 shows the points that cross the threshold to the left and right for a simulation using α=0.1 and β=100.

Fig. 3

Illustration of how uncertainty in mutation frequency estimates can lead to over-estimation of the number of mutations above the frequency threshold, even if the estimated frequencies are unbiased. The red and blue shaded areas correspond to mutations for which sampling error could cause them to cross the frequency threshold (i.e. the estimated frequencies of mutations with true frequencies in the red shaded area may be below the threshold due to sampling error, while the estimated frequencies of mutations in the blue shaded area may be above the threshold). Because the blue shaded area is much larger than the red area, the number of mutations that pass the threshold from left to right is likely to be much larger than the number of mutations that pass the threshold in the other direction, leading to over-estimation of the number of mutations above the threshold
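
The threshold-crossing argument can be explored with a small simulation in which every mutated site is known, so that truncation plays no role. The sketch below draws true frequencies from a beta distribution, samples reads binomially and counts crossings in each direction. The default depth, number of mutations and seed are assumptions made for this example; only α = 0.1 and β = 100 in the demo come from the text.

```python
import random


def threshold_crossings(alpha, beta, depth=100, n_mutations=20000,
                        tau=0.05, seed=0):
    """Count mutations whose observed proportion falls on the other side of tau.

    Returns a dict with the number of mutations truly above tau, observed
    above tau, and crossing the threshold upwards or downwards.
    """
    rng = random.Random(seed)
    counts = {"true_above": 0, "observed_above": 0,
              "crossed_up": 0, "crossed_down": 0}
    for _ in range(n_mutations):
        f = rng.betavariate(alpha, beta)  # true mutant allele frequency
        mutant_reads = sum(rng.random() < f for _ in range(depth))
        true_above = f > tau
        observed_above = mutant_reads / depth > tau
        counts["true_above"] += true_above
        counts["observed_above"] += observed_above
        # True frequency below tau, but sampling error pushes it above.
        counts["crossed_up"] += (not true_above) and observed_above
        # True frequency above tau, but sampling error pushes it below.
        counts["crossed_down"] += true_above and (not observed_above)
    return counts


if __name__ == "__main__":
    print(threshold_crossings(alpha=0.1, beta=100))
```

By construction, the observed count equals the true count plus upward crossings minus downward crossings, so any imbalance between the two directions translates directly into error in the number of mutations counted above the threshold.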

Models of exponential growth and largely neutral tumour evolution predict a large number of low-frequency variants [52] and simulations suggest that the two sources of bias introduced in this and the previous section can result in substantial bias in TMB estimated from cancer samples, with the impact on TMB depending on the shape of the mutant allele frequency spectrum (Fig. 2 and Fig. S3).

Relationship between coverage and TMB in 10 different cancer cohorts

The above demonstration assumes perfect power to identify the somatic mutations on the sequenced reads. In reality, many of the somatic mutations present on the sequenced reads may not be identified by somatic mutation calling pipelines. The power to detect a somatic mutation at a site will depend on the sequencing depth, and depth also influences the extent of the bias resulting from τN-truncation. The combination of these effects is likely to result in instability in the TMB estimate as the sampling depth is varied, potentially resulting in inconsistent results from different experimental protocols. To illustrate this potential instability we down-sampled the sequencing reads from real data (from TCGA) to 50% of their sequencing depths and implemented a pipeline to estimate TMB (with a frequency threshold of 0.05). The TMB estimated in this way was sensitive to sequencing depth following down-sampling and showed no evidence of having reached a plateau by the time the full sample depth was included (Fig. 4). To investigate the relationship between sequencing depth and TMB across real exome sequencing data we analyzed 4,850 TCGA samples from ten cancer types. In six of the cancer types there was a statistically significant positive correlation between sequencing depth and TMB, although the depth appears to explain only a small proportion of the variation in TMB (Fig. S5).

Fig. 4

Each TCGA dataset was downsampled to 50%. The plot shows the TMB of each downsampled dataset with respect to its corresponding original full dataset

Discussion

The number of somatic mutations observed in tumours has been studied extensively in recent years to understand its efficacy as a predictive biomarker in different cancer types as well as the factors that contribute to its variation between individuals and across cancer types [16, 17, 34–36]. To estimate TMB, mutation callers are used to identify somatic variants and to estimate their frequencies. The number of somatic mutations above a specified threshold frequency per megabase of the genomic region targeted in the experiment is then reported as the TMB. The performance of the mutation callers depends on sequencing depth and mutation frequency, and many mutations with low frequencies (≤10%) may be missed by mutation callers at moderate sequencing depths [47], potentially leading to underestimation of TMB. Unbiased estimation of TMB also requires unbiased frequency estimation. In this study, we report two sources of bias in TMB estimation that can lead to incorrect TMB estimates and inconsistency across studies.

The first source of bias results from misestimation of somatic mutation frequencies. The number of mutant alleles obtained when a genomic site is sequenced at some depth is a binomial random variable. However, if only sites at which a mutation is observed are considered, then this random variable is zero-truncated. In the early part of the last century Fisher showed that the proportion of successes obtained from a zero-truncated random variable is a biased estimate of the success probability [51]. This bias becomes more severe if the distribution is truncated at a frequency above zero, as is the case in the calculation of TMB. The second source of bias is due to the uncertainty in the estimated frequency. Even if the frequency estimate is unbiased, the number of somatic mutations with estimated frequency above a given threshold may be a biased estimate of the number of mutations with true frequencies above the threshold. This is because the number of somatic mutations with true frequencies to the left or right of the threshold may be very different, as illustrated in Fig. 3. The extent to which these two sources of bias affect TMB estimates depends on the shape of the variant allele frequency spectrum and can be substantial if there is a high proportion of low-frequency subclonal variants, with a steep slope in the variant allele frequency spectrum around the threshold (Fig. 2).

In the results based on simulated data the locations of all somatic mutations were known. In real data the somatic mutations must be inferred from mapped sequencing reads and they are not recovered with perfect efficiency, so that estimated TMB will be a function of both the power to detect somatic mutations as well as the biases, described above, that affect the number of mutations with observed frequency above the threshold. The down-sampling experiments we carried out were intended to assess the combined effects of these factors as a function of sequencing depth. We observed that the TMB is not consistent across different sequencing depths (Fig. 4). We also analyzed 10 TCGA cancer cohorts to assess the relationship between TMB and sequencing depth in real cancer samples. Our results showed that, in six out of the ten cancer types, TMB and sequencing depth are positively correlated (consistent with Fig. 4). The influence of tumour purity on mutation detection in mutation caller tools has been studied previously [53] and it is well known that the presence of normal cells in tumour samples can result in underestimation of tumour mutation allele frequencies [50], hence impacting on TMB estimation. A positive correlation has previously been reported between sample purity and TMB [36] and methods have been suggested to account for tumour purity in calculation of TMB, such as increasing sequencing depth and dividing variant allele frequency (VAF) by purity and increasing the threshold [54]. Given that the majority of samples we studied have high purity (above 50%), our results suggest that sequencing depth can have an impact on the TMB even in samples with high purity (Fig. S6).

Although many of the cancer types showed evidence of correlation between sequencing depth and TMB, this was not the case for bladder urothelial carcinoma (BLCA), lung adenocarcinoma (LUAD), prostate adenocarcinoma (PRAD) and skin cutaneous melanoma (SKCM). Using PCAWG data in which clonal and subclonal mutations have been distinguished, we found that the lack of correlation between TMB and coverage in some cancer types is likely due to the high clonal proportions in samples within these cancer types (Fig. S7). Because they occur at high frequencies (far in excess of the frequency thresholds used to define TMB), clonal mutations are easier to detect and contribute unambiguously to the count of mutations exceeding the frequency threshold. Unsurprisingly, more heterogeneous tumours (with high proportions of subclonal mutations) are more likely to be influenced by changes in sequencing depth. This is consistent with the observation of higher TMB in metastatic cancers, which has been suggested to result from bottlenecks in cell populations leading to increased proportions of clonal variants [55, 56]. Therefore, tumour heterogeneity may impact TMB estimates and may explain some of the variability in estimated TMB values across studies. Interestingly, tumour heterogeneity has also been suggested as a companion to TMB to achieve better performance in ICI response prediction [57].

Our study demonstrates that there can be substantial biases in TMB estimates when the mutational burden includes a large contribution from subclonal mutations. These biases result from lack of power to detect low-frequency variants as well as bias and uncertainty in estimated mutation frequencies. There are at least two ways in which this issue can be addressed. At higher sequencing depths the power to detect low-frequency variants increases. Given any TMB threshold it is possible to determine the sequencing depth that would be required to achieve higher power to recover somatic mutations at or above that threshold. Although the biases we describe here also decrease with increasing sequencing depth, it is less easy to determine the relationship between sequencing depth and the bias in TMB resulting from these effects, as they depend on the shape of the mutation frequency spectrum. An alternative approach, which may provide stable estimates of TMB even for lower sequencing depths, would be to use all of the data generated to estimate the shape of the variant allele frequency spectrum and, from this, to derive an estimate of the TMB. Although in principle this is possible, it will require the development of sophisticated statistical models that can account appropriately for all technical factors that can influence the probability with which somatic mutations are recovered and their observed frequency in tumour sequencing data.
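
As a simple illustration of the first approach, the sketch below finds the smallest depth at which a mutation of a given frequency yields at least a minimum number of mutant reads with a chosen probability, under a plain binomial sampling model. The requirement of a fixed minimum number of mutant reads is a deliberate simplification and an assumption of this example; real somatic callers apply more elaborate criteria, and the function names and default values are hypothetical.

```python
import math


def detection_power(depth, f, min_alt_reads):
    """P(X >= min_alt_reads) for X ~ Binomial(depth, f)."""
    below = sum(
        math.comb(depth, k) * f ** k * (1 - f) ** (depth - k)
        for k in range(min(min_alt_reads, depth + 1))
    )
    return 1 - below


def min_depth_for_power(f, min_alt_reads=3, power=0.95, max_depth=100000):
    """Smallest depth achieving the requested power, or None if not found.

    f             : mutant allele frequency of interest (e.g. the threshold tau)
    min_alt_reads : mutant reads assumed necessary to call the mutation
    power         : required detection probability
    """
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_power(depth, f, min_alt_reads) >= power:
            return depth
    return None


if __name__ == "__main__":
    for f in (0.05, 0.10, 0.20):
        print(f, min_depth_for_power(f))
```

Because the power increases with depth for a fixed read requirement, a linear search is sufficient here. Such a calculation addresses detection power only and says nothing about the frequency-estimation biases, which depend on the shape of the mutation frequency spectrum.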

Conclusion

We have examined two sources of bias that can affect current methodologies to estimate TMB. The impact of these biases depends on the mutant allele frequency spectrum and it can be substantial when the TMB includes a large contribution from subclonal mutations. The strength of these biases, as well as the power to detect subclonal mutations, varies with sequencing depth, resulting in the potential for inconsistency in TMB estimated using different sequencing depths. We show through an analysis of data from TCGA that there is a correlation between sequencing depth and estimated TMB, except in the case of tumours with large proportions of clonal variants. Overall, our findings caution that current methods to estimate TMB can be biased as well as inconsistent at different sequencing depths and we suggest that accurate and robust estimation of TMB could be achieved using statistical models to estimate parameters of the mutant allele frequency spectrum.

Availability of data and materials

The data that support the findings of this study are available from TCGA but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of TCGA.

Abbreviations

TMB: Tumour Mutation Burden

ICI: Immune Checkpoint Inhibitors

FDA: Food and Drug Administration

CD8: Cluster of differentiation 8

WES: Whole exome sequencing

NGS: Next generation sequencing

TCGA: The Cancer Genome Atlas

VAF: Variant allele frequency

BAM: Binary Alignment Map

ICGC: International Cancer Genome Consortium

PCAWG: PanCancer Analysis of Whole Genomes

BLCA: Bladder Urothelial Carcinoma

BRCA: Breast invasive carcinoma

COAD: Colon adenocarcinoma

KIRC: Kidney renal clear cell carcinoma

LGG: Brain Lower Grade Glioma

LUAD: Lung adenocarcinoma

LUSC: Lung squamous cell carcinoma

OV: Ovarian serous cystadenocarcinoma

PRAD: Prostate adenocarcinoma

SKCM: Skin Cutaneous Melanoma

References

  1. Esfahani K, Roudaia L, Buhlaiga N, Del Rincon S, Papneja N, Miller W. A review of cancer immunotherapy: from the past, to the present, to the future. Curr Oncol. 2020; 27(s2):87–97.

  2. Hargadon KM, Johnson CE, Williams CJ. Immune checkpoint blockade therapy for cancer: an overview of fda-approved immune checkpoint inhibitors. Int Immunopharmacol. 2018; 62:29–39.

  3. Vaddepally RK, Kharel P, Pandey R, Garje R, Chandra AB. Review of indications of fda-approved immune checkpoint inhibitors per nccn guidelines with the level of evidence. Cancers. 2020; 12(3):738.

  4. Petrelli F, Consoli F, Ghidini A, Perego G, Luciani A, Mercurio P, Berruti A, Grisanti S. Efficacy of immune checkpoint inhibitors in rare tumours: A systematic review. Front Immunol. 2021; 12:720748.

  5. Wolchok JD, Kluger H, Callahan MK, Postow MA, Rizvi NA, Lesokhin AM, Segal NH, Ariyan CE, Gordon R-A, Reed K, et al. Nivolumab plus ipilimumab in advanced melanoma. N Engl J Med. 2013; 369:122–33.

  6. Topalian SL, Hodi FS, Brahmer JR, Gettinger SN, Smith DC, McDermott DF, Powderly JD, Carvajal RD, Sosman JA, Atkins MB, et al. Safety, activity, and immune correlates of anti–pd-1 antibody in cancer. N Engl J Med. 2012; 366(26):2443–54.

  7. Onuma AE, Zhang H, Huang H, Williams TM, Noonan A, Tsung A. Immune checkpoint inhibitors in hepatocellular cancer: current understanding on mechanisms of resistance and biomarkers of response to treatment. Gene Expr. 2020; 20(1):53.

  8. Reck M, Rodríguez-Abreu D, Robinson AG, Hui R, Csőszi T, Fülöp A, Gottfried M, Peled N, Tafreshi A, Cuffe S, et al. Pembrolizumab versus chemotherapy for pd-l1–positive non–small-cell lung cancer. N Engl J Med. 2016; 375:1823–33.

  9. Alex F, Alfredo A. Promising predictors of checkpoint inhibitor response in nsclc. Expert Rev Anticancer Ther. 2020; 20(11):931–37.

  10. Patel SP, Kurzrock R. Pd-l1 expression as a predictive biomarker in cancer immunotherapy. Mol Cancer Ther. 2015; 14(4):847–56.

  11. Banna GL, Olivier T, Rundo F, Malapelle U, Fraggetta F, Libra M, Addeo A. The promise of digital biopsy for the prediction of tumor molecular features and clinical outcomes associated with immunotherapy. Front Med. 2019; 6:172.

  12. Marcus L, Fashoyin-Aje LA, Donoghue M, Yuan M, Rodriguez L, Gallagher PS, Philip R, Ghosh S, Theoret MR, Beaver JA, et al. Fda approval summary: pembrolizumab for the treatment of tumor mutational burden–high solid tumors. Clin Cancer Res. 2021; 27(17):4685–89.

  13. Riaz N, Morris L, Havel JJ, Makarov V, Desrichard A, Chan TA. The role of neoantigens in response to immune checkpoint blockade. Int Immunol. 2016; 28(8):411–19.

  14. Ott PA, Hu Z, Keskin DB, Shukla SA, Sun J, Bozym DJ, Zhang W, Luoma A, Giobbie-Hurder A, Peter L, et al. An immunogenic personal neoantigen vaccine for patients with melanoma. Nature. 2017; 547(7662):217–21.

  15. Cohen CJ, Gartner JJ, Horovitz-Fried M, Shamalov K, Trebska-McGowan K, Bliskovsky VV, Parkhurst MR, Ankri C, Prickett TD, Crystal JS, et al. Isolation of neoantigen-specific t cells from tumor and peripheral lymphocytes. J Clin Investig. 2015; 125(10):3981–91.

  16. McGrail D, Pilié P, Rashid N, Voorwerk L, Slagter M, Kok M, Jonasch E, Khasraw M, Heimberger A, Lim B, et al. High tumor mutation burden fails to predict immune checkpoint blockade response across all cancer types. Ann Oncol. 2021; 32(5):661–72.

  17. Addeo A, Friedlaender A, Banna GL, Weiss GJ. TMB or not TMB as a biomarker: That is the question. Crit Rev Oncol Hematol. 2021; 163:103374.

  18. Bai R, Lv Z, Xu D, Cui J. Predictive biomarkers for cancer immunotherapy with immune checkpoint inhibitors. Biomark Res. 2020; 8(1):1–17.

  19. Xu Z, Dai J, Wang D, Lu H, Dai H, Ye H, Gu J, Chen S, Huang B. Assessment of tumor mutation burden calculation from gene panel sequencing data. OncoTargets Ther. 2019; 12:3401.

  20. Sha D, Jin Z, Budczies J, Kluck K, Stenzinger A, Sinicrope FA. Tumor mutational burden as a predictive biomarker in solid tumors. Cancer Discov. 2020; 10(12):1808–25.

  21. Campesato LF, Barroso-Sousa R, Jimenez L, Correa BR, Sabbaga J, Hoff PM, Reis LF, Galante PAF, Camargo AA. Comprehensive cancer-gene panels can be used to estimate mutational load and predict clinical benefit to PD-1 blockade in clinical practice. Oncotarget. 2015; 6(33):34221.

  22. Johnson DB, Frampton GM, Rioth MJ, Yusko E, Xu Y, Guo X, Ennis RC, Fabrizio D, Chalmers ZR, Greenbowe J, et al. Targeted next generation sequencing identifies markers of response to PD-1 blockade. Cancer Immunol Res. 2016; 4(11):959–67.

  23. Allegretti M, Fabi A, Buglioni S, Martayan A, Conti L, Pescarmona E, Ciliberto G, Giacomini P. Tearing down the walls: FDA approves next generation sequencing (NGS) assays for actionable cancer genomic aberrations. J Exp Clin Cancer Res. 2018; 37(1):1–3.

  24. Samstein RM, Lee C-H, Shoushtari AN, Hellmann MD, Shen R, Janjigian YY, Barron DA, Zehir A, Jordan EJ, Omuro A, et al. Tumor mutational load predicts survival after immunotherapy across multiple cancer types. Nat Genet. 2019; 51(2):202–06.

  25. Pestinger V, Smith M, Sillo T, Findlay JM, Laes J-F, Martin G, Middleton G, Taniere P, Beggs AD. Use of an integrated pan-cancer oncology enrichment next-generation sequencing assay to measure tumour mutational burden and detect clinically actionable variants. Mol Diagn Ther. 2020; 24(3):339–49.

  26. Yao L, Fu Y, Mohiyuddin M, Lam HY. ecTMB: a robust method to estimate and classify tumor mutational burden. Sci Rep. 2020; 10(1):1–10.

  27. Chalmers ZR, Connelly CF, Fabrizio D, Gay L, Ali SM, Ennis R, Schrock A, Campbell B, Shlien A, Chmielecki J, et al. Analysis of 100,000 human cancer genomes reveals the landscape of tumor mutational burden. Genome Med. 2017; 9(1):1–14.

  28. Rizvi H, Sanchez-Vega F, La K, Chatila W, Jonsson P, Halpenny D, Plodkowski A, Long N, Sauter JL, Rekhtman N, et al. Molecular determinants of response to anti–programmed cell death (PD)-1 and anti–programmed death-ligand 1 (PD-L1) blockade in patients with non–small-cell lung cancer profiled with targeted next-generation sequencing. J Clin Oncol. 2018; 36(7):633.

  29. Büttner R, Longshore JW, López-Ríos F, Merkelbach-Bruse S, Normanno N, Rouleau E, Penault-Llorca F. Implementing TMB measurement in clinical practice: considerations on assay requirements. ESMO Open. 2019; 4(1):000442.

  30. Li Y, Luo Y. Optimizing the evaluation of gene-targeted panels for tumor mutational burden estimation. Sci Rep. 2021; 11(1):1–11.

  31. Zhou C, Chen S, Xu F, Wei J, Zhou X, Wu Z, Zhao L, Liu J, Guo W. Estimating tumor mutational burden across multiple cancer types using whole-exome sequencing. Ann Transl Med. 2021; 9(18):1437.

  32. Huang T, Chen X, Zhang H, Liang Y, Li L, Wei H, Sun W, Wang Y. Prognostic Role of Tumor Mutational Burden in Cancer Patients Treated With Immune Checkpoint Inhibitors: A Systematic Review and Meta-Analysis. Front Oncol. 2021; 11:706652.

  33. Fenizia F, Pasquale R, Abate RE, Lambiase M, Roma C, Bergantino F, Chaudhury R, Hyland F, Allen C, Normanno N. Challenges in bioinformatics approaches to tumor mutation burden analysis. Oncol Lett. 2021; 22(1):1–7.

  34. Meléndez B, Van Campenhout C, Rorive S, Remmelink M, Salmon I, D’Haene N. Methods of measurement for tumor mutational burden in tumor tissue. Transl Lung Cancer Res. 2018; 7(6):661.

  35. Schou Nørøxe D, Flynn A, Westmose Yde C, Østrup O, Cilius Nielsen F, Skjøth-Rasmussen J, Brennum J, Hamerlik P, Weischenfeldt J, Skovgaard Poulsen H, et al. Tumor mutational burden and purity adjustment before and after treatment with temozolomide in 27 paired samples of glioblastoma: a prospective study. Mol Oncol. 2022; 16(1):206–18.

  36. Tong J, Zhang X, Qu H, Yang Q, Duan J, Xu M. The positive correlation between tumor mutation burden and the purity of tumor samples in non–small cell lung cancer and colorectal cancer. J Clin Oncol. 2020; 38(15_suppl):e13683.

  37. Fancello L, Gandini S, Pelicci PG, Mazzarella L. Tumor mutational burden quantification from targeted gene panels: major advancements and challenges. J Immunother Cancer. 2019; 7(1):1–13.

  38. Stenzinger A, Allen JD, Maas J, Stewart MD, Merino DM, Wempe MM, Dietel M. Tumor mutational burden standardization initiatives: recommendations for consistent tumor mutational burden assessment in clinical samples to guide immunotherapy treatment decisions. Gene Chromosome Cancer. 2019; 58(8):578–88.

  39. Strickler JH, Hanks BA, Khasraw M. Tumor mutational burden as a predictor of immunotherapy response: is more always better? Clin Cancer Res. 2021; 27(5):1236–41.

  40. Beije N, Helmijr JC, Weerts MJ, Beaufort CM, Wiggin M, Marziali A, Verhoef C, Sleijfer S, Jansen MP, Martens JW. Somatic mutation detection using various targeted detection assays in paired samples of circulating tumor DNA, primary tumor and metastases from patients undergoing resection of colorectal liver metastases. Mol Oncol. 2016; 10(10):1575–84.

  41. Teer JK, Zhang Y, Chen L, Welsh EA, Cress WD, Eschrich SA, Berglund AE. Evaluating somatic tumor mutation detection without matched normal samples. Hum Genomics. 2017; 11(1):1–13.

  42. Wang Q, Jia P, Li F, Chen H, Ji H, Hucks D, Dahlman KB, Pao W, Zhao Z. Detecting somatic point mutations in cancer genome sequencing data: a comparison of mutation callers. Genome Med. 2013; 5(10):1–8.

  43. Krøigård AB, Thomassen M, Lænkholm A-V, Kruse TA, Larsen MJ. Evaluation of nine somatic variant callers for detection of somatic mutations in exome and targeted deep sequencing data. PLoS ONE. 2016; 11(3):0151664.

  44. Shiraishi Y, Sato Y, Chiba K, Okuno Y, Nagata Y, Yoshida K, Shiba N, Hayashi Y, Kume H, Homma Y, et al. An empirical Bayesian framework for somatic mutation detection from cancer genome sequencing data. Nucleic Acids Res. 2013; 41(7):89.

  45. Kim S, Jeong K, Bhutani K, Lee JH, Patel A, Scott E, Nam H, Lee H, Gleeson JG, Bafna V. Virmid: accurate detection of somatic mutations with sample impurity inference. Genome Biol. 2013; 14(8):1–17.

  46. Saunders CT, Wong WS, Swamy S, Becq J, Murray LJ, Cheetham RK. Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs. Bioinformatics. 2012; 28(14):1811–17.

  47. Chen Z, Yuan Y, Chen X, Chen J, Lin S, Li X, Du H. Systematic comparison of somatic variant calling performance among different sequencing depth and mutation frequency. Sci Rep. 2020; 10(1):1–9.

  48. cBioPortal. https://www.cbioportal.org. Accessed 20 Sep 2021.

  49. ICGC Data Portal. https://dcc.icgc.org. Accessed 28 Feb 2022.

  50. Mannakee BK, Gutenkunst RN. Batcave: calling somatic mutations with a tumor- and site-specific prior. NAR Genomics Bioinforma. 2020; 2(1):004.

  51. Fisher RA. The effect of methods of ascertainment upon the estimation of frequencies. Ann Eugenics. 1934; 6(1):13–25.

  52. Spencer DH, Tyagi M, Vallania F, Bredemeyer AJ, Pfeifer JD, Mitra RD, Duncavage EJ. Performance of common analysis methods for detecting low-frequency single nucleotide variants in targeted next-generation sequence data. J Mol Diagn. 2014; 16(1):75–88.

  53. Cheng J, He J, Wang S, Zhao Z, Yan H, Guan Q, Li J, Guo Z, Ao L. Biased influences of low tumor purity on mutation detection in cancer. Front Mol Biosci. 2020; 7:343.

  54. Fernandez EM, Eng K, Beg S, Beltran H, Faltas BM, Mosquera JM, Nanus DM, Pisapia DJ, Rao RA, Robinson BD, et al. Cancer-specific thresholds adjust for whole exome sequencing–based tumor mutational burden distribution. JCO Precis Oncol. 2019; 3:1–12.

  55. Papillon-Cavanagh S, Hopkins JF, Ramkissoon SH, Albacker LA, Walsh AM. Pan-cancer analysis of the effect of biopsy site on tumor mutational burden observations. Commun Med. 2021; 1(1):1–7.

  56. Schnidrig D, Turajlic S, Litchfield K. Tumour mutational burden: primary versus metastatic tissue creates systematic bias. Immuno-Oncol Technol. 2019; 4:8–14.

  57. Gao Y, Yang C, He N, Zhao G, Wang J, Yang Y. Integration of the Tumor Mutational Burden and Tumor Heterogeneity Identify an Immunological Subtype of Melanoma With Favorable Survival. Front Oncol. 2020; 10:2435.

Acknowledgements

The results published or shown here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.

Funding

The project was funded by Science Foundation Ireland (SFI), grant number 16/IA/4612.

Author information

Authors’ contributions

MAM performed the analysis and drafted the manuscript. CS initiated and supervised the research project. BOS performed simulations and estimated TMB in TCGA data. All authors edited and approved the final manuscript.

Authors’ information

All authors work at the School of Mathematical & Statistical Sciences, National University of Ireland, Galway.

Mohammad Adib Makrooni (first author): Post-doctoral researcher

Brian O’Sullivan: PhD student

Cathal Seoighe (corresponding author): Professor of Bioinformatics and principal investigator.

Corresponding author

Correspondence to Cathal Seoighe.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1

Supplementary material. Additional file 1 contains figures.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Makrooni, M.A., O’Sullivan, B. & Seoighe, C. Bias and inconsistency in the estimation of tumour mutation burden. BMC Cancer 22, 840 (2022). https://doi.org/10.1186/s12885-022-09897-3

  • DOI: https://doi.org/10.1186/s12885-022-09897-3

Keywords