A simple algebraic cancer equation: calculating how cancers may arise with normal mutation rates
© Calabrese and Shibata. 2010
Received: 19 February 2009
Accepted: 5 January 2010
Published: 5 January 2010
The purpose of this article is to present a relatively easy to understand cancer model where transformation occurs when the first cell, among many at risk within a colon, accumulates a set of driver mutations. The analysis of this model yields a simple algebraic equation, which takes as inputs the number of stem cells, mutation and division rates, and the number of driver mutations, and makes predictions about cancer epidemiology.
The equation $p = 1 - (1 - (1 - (1 - u)^{d})^{k})^{Nm}$ calculates the probability of cancer (p) and contains five parameters: the number of divisions (d), the number of stem cells (N × m), the number of critical rate-limiting pathway driver mutations (k), and the mutation rate (u). In this model progression to cancer "starts" at conception and mutations accumulate with cell division. Transformation occurs when a critical number of rate-limiting pathway mutations first accumulates within a single stem cell.
When applied to several colorectal cancer data sets, parameter values consistent with crypt stem cell biology and normal mutation rates were able to match the increase in cancer with aging, and the mutation frequencies found in cancer genomes. The equation can help explain how cancer risks may vary with age, height, germline mutations, and aspirin use. APC mutations may shorten pathways to cancer by effectively increasing the numbers of stem cells at risk.
The equation illustrates that age-related increases in cancer frequencies may result from relatively normal division and mutation rates. Although this equation does not encompass all of the known complexity of cancer, it may be useful, especially in a teaching setting, to help illustrate relationships between small and large cancer features.
The motivation for this article is to present an easy to understand equation that illustrates how cancers can arise within a lifetime from relatively normal mutation and division rates. Given the multiplicity and greater sophistication of many other cancer models, it is primarily presented as a teaching tool to demonstrate how cancers may result as mutations accumulate in stem cells under a very simple scenario. The goal is to illustrate to a wider audience (an average college graduate) that many numerical aspects of cancer biology may be described mathematically. (A short slide presentation summarizing the major points is provided as Additional file 1.)
A more formal analysis of this equation was previously published, which predicted that human colorectal cancers could arise with relatively normal mutation and division rates. The current presentation is a simpler, algebraic version that may be easier to understand and manipulate. As recently noted, algebraic methods are often more intuitive and easier to understand than differential equations. Since its publication, the mutational landscape of colorectal cancer genomes has been better characterized [3–5]. An interesting observation is that the mutation frequency in a cancer genome is less than one mutation per 100,000 bases, which is consistent with relatively normal mutation and division rates. These new experimental data motivate us to revisit how cancers may arise with normal division and mutation rates. Because all cells initially have normal mutation and division rates, it is possible to estimate the relative roles of old age and "bad luck" (i.e. a parsimonious pathway because functional changes are unnecessary during progression), versus a necessity of overcoming specific anti-cancer barriers during progression to cancer.
In the equation $p = bt^{k}$, the parameters are p (probability of cancer), b (a constant), t (age of individual), and k (the number of rate-limiting stages). The equation fits the epidemiology of colorectal cancer when k is 5 or 6.
This equation does not include many biological parameters, which are presumably incorporated into its constant "b". Intuitively, cancer incidence should increase with greater numbers of cells at risk, with greater numbers of cell divisions, and with higher mutation rates. Here we present a simple algebraic equation that relates small biological features (adult stem cells and their niches, tissue size, numbers of rate-limiting driver mutations, and mutation rates) with the epidemiology of colorectal cancer.
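As a rough illustration of how the biological parameters might fold into the constant b, the algebraic equation can be approximated for small mutation rates. This is a sketch under stated assumptions rather than part of the published analysis: it assumes $ud \ll 1$ and $Nm(ud)^{k} \ll 1$, and it introduces a hypothetical constant division rate r (divisions per year) so that $d = rt$.

```latex
\begin{align}
1 - (1 - u)^{d} &\approx ud, \\
p = 1 - \bigl(1 - (1 - (1 - u)^{d})^{k}\bigr)^{Nm} &\approx Nm\,(ud)^{k}, \\
d = rt \;\Rightarrow\; p &\approx \underbrace{Nm\,(ur)^{k}}_{b}\; t^{k}.
\end{align}
```

Under these assumptions the power-law form in t is recovered, with the numbers of stem cells, the mutation rate, and the division rate all absorbed into b.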
Parameter | Colorectal Cancer With Specific Gene Targets* | Colorectal Cancer With Pathway Gene Targets**
--- | --- | ---
rate-limiting mutations (k) | 5 driver gene mutations | 6 pathway mutations
number of crypts | ~15 million | ~15 million
stem cells per crypt | 40 | 8
target mutation rate | $1 \times 10^{-6}$ per gene per division | $3 \times 10^{-6}$ per pathway per division
divisions since birth | once every four days | once every four days
probability of cancer | |
Model Assumptions For An Average Repair Proficient Colorectal Cancer

Parameter | Model assumption | Comment
--- | --- | ---
driver mutations or rate-limiting stages (k) | the same for all cancers; transformation or growth does not occur until all driver mutations accumulate | value is unknown and is inferred to fit the epidemiology; growth is likely to precede transformation but most mutations likely accumulate in normal stem cells
number of crypts (m) | does not change during life | crypt number may vary between individuals
stem cells per crypt (N) | does not change during life | value is unknown (minimum of one per crypt), inferred to fit the epidemiology
mutation rate (u) | does not change during life | value may differ between genes but is ~$10^{-10}$ to $10^{-9}$ per base per division
stem cell divisions since birth (d) | constant division rate during life | value is unknown but may be as high as once per day
probability of cancer (p) | no significant time between transformation and diagnosis | data from cancer epidemiology; lag time may vary between patients
This epidemiology can be reconstructed with the equation and parameter values consistent with colon biology (Table 1). The number of crypts per colon is ~15 million. The mutation rate is set at $10^{-6}$ per division per gene. The division rate is set at one division every four days, as modeled in a recent analysis. The number of stem cells per crypt and the number of rate-limiting stages or mutations are uncertain.
Curve fitting with five rate-limiting mutations (k = 5) and 40 stem cells per crypt approximates the epidemiologic data (Fig 2B). However, recent cancer genome data suggest that functional or regulatory pathways rather than specific sets of genes are more relevant oncogenic targets because driver mutations are diverse [4, 5, 11, 12]. Mutation of several genes in a regulatory pathway may be functionally equivalent. If three genes are at risk in a pathway, then the probability of mutation (u) of any one of the three genes in a single division is $3 \times 10^{-6}$ instead of $1 \times 10^{-6}$ with a single gene target (i.e. the mutation target is 3,000 bases instead of 1,000 bases). Curve fitting with six rate-limiting pathway mutations (k = 6) and eight stem cells per crypt also approximates the epidemiologic data (Fig 2B and Table 1). The equation with six rate-limiting driver pathway mutations will be subsequently used for further analysis because of a better conceptual fit with the idea that regulatory pathways rather than specific single genes are altered in cancer.
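The two parameter sets described above can be evaluated directly. The following is a minimal sketch, not the authors' spreadsheet: the function names, the 365-day year, and counting divisions from birth are assumptions made here for illustration. `log1p` and `expm1` are used only to keep floating-point precision when the per-cell probability is tiny and the number of stem cells is large.

```python
import math

def cancer_probability(u, d, k, N, m):
    """p = 1 - (1 - (1 - (1 - u)**d)**k)**(N*m)."""
    per_target = 1.0 - (1.0 - u) ** d   # one target mutated within d divisions
    per_cell = per_target ** k          # all k targets mutated in one stem cell
    # Equivalent to 1 - (1 - per_cell)**(N*m), but numerically stable.
    return -math.expm1(N * m * math.log1p(-per_cell))

def divisions(age_years, days_per_division=4.0):
    # Assumption: divisions counted from birth, 365 days per year.
    return age_years * 365.0 / days_per_division

M_CRYPTS = 15e6  # ~15 million crypts per colon

for age in (40, 60, 80, 100):
    d = divisions(age)
    p_gene = cancer_probability(u=1e-6, d=d, k=5, N=40, m=M_CRYPTS)
    p_path = cancer_probability(u=3e-6, d=d, k=6, N=8, m=M_CRYPTS)
    print(f"age {age:3d}: gene targets p = {p_gene:.4f}, "
          f"pathway targets p = {p_path:.4f}")
```

With these inputs both parameter sets give a probability of a few percent by age 100, rising steeply with age.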
Most of the divisions to cancer likely occur in stem cells because the genealogy of a cancer cell starts at the zygote and ends at the present-day cancer genome (Fig 1). Phenotype varies along this genealogy, but a crypt stem cell phenotype occupies the longest interval because visible tumorigenesis before the age of 50 years is rare. The stem cell division rate is uncertain because human crypt stem cells have not been conclusively identified or characterized. In mice, crypt stem cell division rates were estimated at once per day, using a new potential stem cell marker, Lgr5. A human stem cell division rate of once every four days and the parameter values in Table 1 approximate the epidemiology and the observed mutation frequencies of colorectal cancers.
To further test the utility of the equation, we apply it to another data set. The equation predicts that the incidence of colorectal cancer will increase with the number of crypts. Colon lengths are difficult to measure because the organ is elastic, but taller individuals generally have longer colons. Taller individuals also appear to have higher risks for colon cancer. In one study, the relative risk of cancer between the tallest and shortest quintiles of individuals was 1.4 in men and 1.8 in women.
One can model these cancer frequency changes with about 16.7% fewer crypts in the shorter quintile and 16.7% more crypts in the taller quintile for men, and 28.6% fewer and 28.6% more crypts in women (Fig 2C). Colon lengths may vary over 2-fold, allowing for the variation predicted with the equation. This example indicates how a small biological feature (m or the number of crypts) is interrelated with cancer risks and can be indirectly measured from cancer epidemiology.
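The crypt-number adjustment can be checked numerically. This sketch assumes the six-pathway parameter set (u = 3 × 10^-6, k = 6, N = 8, m = 15 million) and an illustrative age of 70 years; the helper names are hypothetical. Because p is small, risk scales almost linearly with m, so the ratio of crypt numbers nearly equals the relative risk.

```python
import math

def cancer_probability(u, d, k, N, m):
    """p = 1 - (1 - (1 - (1 - u)**d)**k)**(N*m)."""
    per_cell = (1.0 - (1.0 - u) ** d) ** k
    return -math.expm1(N * m * math.log1p(-per_cell))

U, K, N, M = 3e-6, 6, 8, 15e6
d = 70 * 365 / 4  # assumed age of 70 years, one division every four days

def relative_risk(fraction):
    """Risk with (1 + fraction) * m crypts relative to (1 - fraction) * m."""
    tall = cancer_probability(U, d, K, N, M * (1 + fraction))
    short = cancer_probability(U, d, K, N, M * (1 - fraction))
    return tall / short

rr_men = relative_risk(0.167)
rr_women = relative_risk(0.286)
print(f"men: {rr_men:.2f}, women: {rr_women:.2f}")
```

The computed ratios come out close to the 1.4 and 1.8 relative risks quoted for men and women.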
The biological meaning of "half" a rate-limiting pathway mutation is unclear, and may indicate that the equation does not readily apply after the onset of visible tumorigenesis (see below). Alternatively, a parameter change that can decrease the incidence of a cancer subtype without changing k is a decrease in the size of the mutational pathway targets, which effectively lowers the mutation rate u. Progression to a particular cancer subtype may require a smaller subset of all possible mutational targets for a general type of cancer (Fig 4). Whereas u is $3 \times 10^{-6}$ for all colorectal cancers, localized or regional cancers and metastatic cancers appear to have smaller mutational targets, respectively $2.55 \times 10^{-6}$ and $2.2 \times 10^{-6}$ (Fig 2F). Instead of linear progression (Fig 4A), this modeling implies that metastatic cancers also require only six rate-limiting driver pathway mutations that confer both transformation and the ability to metastasize.
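To see how strongly the target size affects predicted incidence, the equation can be evaluated at the three values of u quoted above. This is only a sketch: the age of 70 years and the remaining parameters (k = 6, N = 8, m = 15 million, one division every four days) are assumptions carried over from the pathway parameter set. Since risk scales roughly as u to the power k, a modest reduction in target size produces a large drop in incidence.

```python
import math

def cancer_probability(u, d, k, N, m):
    """p = 1 - (1 - (1 - (1 - u)**d)**k)**(N*m)."""
    per_cell = (1.0 - (1.0 - u) ** d) ** k
    return -math.expm1(N * m * math.log1p(-per_cell))

K, N, M = 6, 8, 15e6
d = 70 * 365 / 4  # assumed age of 70 years

p_all = cancer_probability(3e-6, d, K, N, M)
# Incidence relative to the full target size (u = 3e-6).
ratio_255 = cancer_probability(2.55e-6, d, K, N, M) / p_all
ratio_220 = cancer_probability(2.2e-6, d, K, N, M) / p_all
print(f"u = 2.55e-6: {ratio_255:.2f} of all-cancer risk")
print(f"u = 2.20e-6: {ratio_220:.2f} of all-cancer risk")
```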
Whether the capability for metastasis is present at transformation or acquired after transformation, the geometry of the equation predicts minimal differences in numbers of mutations between a primary and its metastases because the interval before transformation is typically much greater than the interval after transformation (Fig 4B). For example, if transformation occurred at 78 years of age and a metastatic cancer is removed two years later, only 2.5% (2 of 80 years) of the cancer genealogy interval accumulated after transformation. A recent study also found few mutational differences between a metastatic tumor and its primary. On average 97% of the mutations found in the metastatic lesion were also detected in its primary.
Familial cancers are characterized by cancer at earlier ages and germline inactivation of one allele of an important tumor suppressor gene. For example, familial adenomatous polyposis (FAP) is characterized by heterozygous germline APC mutations, and APC somatic mutations are present in most sporadic colorectal cancers. Decreasing the number of rate-limiting pathway mutations from six (sporadic cancer value) to five recreates the earlier age onset of cancer in FAP (Fig 2F).
The number of crypt stem cells is difficult to measure directly because of the lack of specific or sensitive markers. Estimates of stem cell numbers per crypt range from one to forty in mice. Human crypt stem cell numbers are more uncertain as experimental manipulations are limited.
Niches modify mutation accumulation. Most early mutations are lost because most stem cell lineages become extinct during crypt clonal evolution. The niche serves as a crucible: early mutations in a cancer genealogy must also achieve fixation by occurring in the single stem cell that attains crypt clonal dominance. Because the niche population size is small, neutral mutations, or even mutations that confer a slight disadvantage, may become fixed by chance or drift rather than selection within a crypt. A requirement for both mutation and subsequent fixation (Fig 5C), or two hurdles with each rate-limiting stage (or "relatively rare event"), may help make cancer even rarer.
Transformation of a stem cell lineage later in life is contingent on its persistence earlier in life despite periodic threats of extinction during niche clonal evolution, which may help explain why APC mutations are found in nearly all colorectal cancers. Crypt stem cell survival depends on several signaling pathways. Wnt signaling appears necessary for crypt stem cell survival, and APC is a central regulator of the Wnt pathway [24, 25].
FAP individuals are born with normal appearing colon crypts but have heterozygous APC germline mutations. Certain APC mutations confer dominant-negative effects with up-regulation of Tcf-β-catenin-mediated transcription in experimental systems. Some heterozygous APC mutations appear to decrease cell mobility, which may enhance survival of the mutant stem cell relative to surrounding wild type stem cells that more readily migrate out of the niche. In this way, certain APC mutations may be more common in colorectal cancers because, when acquired earlier in life, they also favor persistence of the mutant stem cell through subsequent crypt clonal evolution cycles. Simplistically, APC mutations may favor progression with a minimum of divisions because fixation of subsequent driver mutations is less imperative (Fig 5D). Interestingly, APC may undergo sequential mutation and selection during progression.
Passenger methylation patterns in normal appearing FAP crypts are more diverse than in non-FAP crypts, consistent with enhanced stem cell survival. This enhanced stem cell survival effectively doubles the number of FAP niche stem cells and increases the average crypt stem cell clonal evolution interval from eight to 30 years. The doubling of FAP crypt stem cells increases the risk of cancer (Fig 2F), and this additional effect of certain APC mutations, along with one fewer rate-limiting mutation (k), better fits the observed incidence of FAP cancer with aging.
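The two FAP adjustments described above (one fewer rate-limiting mutation, and a doubling of effective stem cells per crypt) can be compared side by side. This sketch assumes the pathway parameter set for the sporadic case (u = 3 × 10^-6, k = 6, N = 8, m = 15 million), keeps u unchanged for FAP, and uses an illustrative age of 40 years; none of these choices is claimed to reproduce the published curve fit.

```python
import math

def cancer_probability(u, d, k, N, m):
    """p = 1 - (1 - (1 - (1 - u)**d)**k)**(N*m)."""
    per_cell = (1.0 - (1.0 - u) ** d) ** k
    return -math.expm1(N * m * math.log1p(-per_cell))

U, M = 3e-6, 15e6
d = 40 * 365 / 4  # assumed age of 40 years

p_sporadic = cancer_probability(U, d, k=6, N=8, m=M)   # six pathway mutations
p_k5 = cancer_probability(U, d, k=5, N=8, m=M)         # one germline hit
p_fap = cancer_probability(U, d, k=5, N=16, m=M)       # plus doubled stem cells

print(f"sporadic:          {p_sporadic:.5f}")
print(f"k = 5:             {p_k5:.5f}")
print(f"k = 5 and N = 16:  {p_fap:.5f}")
```

Removing one rate-limiting step raises the predicted risk far more than doubling N, which adds roughly a further factor of two.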
Conversely, inhibition of the Wnt-signaling pathway may effectively decrease niche stem cell numbers and reduce cancer. Non-steroidal anti-inflammatory drugs inhibit Wnt-signaling and down-regulate Tcf-β-catenin transcription [30, 31]. Aspirin use is associated with reduced colorectal cancer, with relative risks of about 0.8 compared to non-aspirin users. A 25% reduction in effective stem cell number (N) from 8 to 6 per crypt can account for the ~0.8 relative risk decrease with aspirin use (Fig 2G).
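The aspirin example reduces to comparing the equation at N = 8 and N = 6. The sketch below assumes the remaining pathway parameters (u = 3 × 10^-6, k = 6, m = 15 million, one division every four days) and an illustrative age of 70 years.

```python
import math

def cancer_probability(u, d, k, N, m):
    """p = 1 - (1 - (1 - (1 - u)**d)**k)**(N*m)."""
    per_cell = (1.0 - (1.0 - u) ** d) ** k
    return -math.expm1(N * m * math.log1p(-per_cell))

U, K, M = 3e-6, 6, 15e6
d = 70 * 365 / 4  # assumed age of 70 years

p_no_aspirin = cancer_probability(U, d, K, N=8, m=M)
p_aspirin = cancer_probability(U, d, K, N=6, m=M)
relative_risk = p_aspirin / p_no_aspirin
print(f"relative risk with N reduced from 8 to 6: {relative_risk:.2f}")
```

Because p is small, the relative risk is close to the ratio of stem cell numbers (6/8), in the neighborhood of the ~0.8 reported for aspirin users.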
More mutations will accumulate with increased mutation or cell division rates. Inflammatory bowel disease (IBD) is associated with increased cancer risks, which increase with the length and extent of disease. IBD was modeled with the equation with either a 10% increase in mutation or stem cell division rates (Fig 2H). The predicted effect is a ~1.8-fold increased relative risk of cancer. Stem cell proliferation or mutation rate changes appear to be equivalent with respect to cancer risks.
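The near-equivalence of a 10% increase in u and a 10% increase in d can be checked directly. This sketch again assumes the pathway parameter set (u = 3 × 10^-6, k = 6, N = 8, m = 15 million) and an illustrative age of 70 years. With six rate-limiting mutations, a 10% change compounds to roughly 1.1 to the sixth power.

```python
import math

def cancer_probability(u, d, k, N, m):
    """p = 1 - (1 - (1 - (1 - u)**d)**k)**(N*m)."""
    per_cell = (1.0 - (1.0 - u) ** d) ** k
    return -math.expm1(N * m * math.log1p(-per_cell))

U, K, N, M = 3e-6, 6, 8, 15e6
d = 70 * 365 / 4  # assumed age of 70 years

baseline = cancer_probability(U, d, K, N, M)
rr_mutation = cancer_probability(1.1 * U, d, K, N, M) / baseline
rr_division = cancer_probability(U, 1.1 * d, K, N, M) / baseline
print(f"10% higher mutation rate: relative risk {rr_mutation:.2f}")
print(f"10% higher division rate: relative risk {rr_division:.2f}")
```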
Potentially there are many more mutations secondary to copy number changes from chromosomal instability or CIN. However, early sequencing studies suggest that relatively few DNA breaks may underlie CIN. For example, fewer than one hundred somatically acquired breakpoint sequences per lung cancer cell line (~1 breakpoint per 10,000,000 bases) were detected with genome-wide massively parallel paired-end sequencing.
Logically the first cell that transforms requires fewer divisions than subsequent cells. Alternative but longer, less parsimonious pathways may not be observed simply because transformation cannot occur within a lifetime. A start from conception and decades in normal colon may also help explain why the numbers of divisions to cancer appear consistent with near normal division and mutation rates, because uncontrolled proliferation may be limited to the relatively short terminal neoplastic phase of a cancer genealogy. Decades in normal colon can also help explain why pathways to cancer almost always collect an APC mutation that may favor persistence during niche clonal evolution and lessen a fixation requirement for subsequent driver mutations.
Cancer modeling has a long history and it is possible to fit many models to cancer data. Such modeling is complicated because many parameter values are uncertain and likely to differ between individuals, populations, and through time. Ideally, experimentalists and modelers interact, but many cancer equations are incomprehensible to many students and experimentalists. The current equation incorporates some of the assumptions in other cancer models (see Table 2), but its algebraic format may be easier to understand and manipulate.
What "causes" cancer? This model examines whether colorectal cancers can arise within a lifetime from normal division and mutation rates, and without serial selection and clonal expansion (a parsimonious pathway). Whereas the accumulation of sufficient numbers of driver mutations might be highly unlikely with normal mutation rates, new experimental data illustrate that colorectal cancer mutation frequencies are relatively low and consistent with normal mutation and division rates. These new data constrain models because proliferation or mutation rates do not have to be and are not significantly altered during most of progression. Stem cells, which are the long-lived lineages that can accumulate mutations during progression, might seldom divide, but recent studies in mice suggest crypt stem cells are not quiescent but actively divide about once per day. An important distinction is that an individual gets cancer when the first cell and not the average cell accumulates a critical number of driver mutations.
Here we illustrate that mutation accumulation from normal cell replication can account for the low per cell transformation rates and low cancer genome mutation frequencies. Progression to cancer is complex and variable, but certain biological features are likely to be fundamentally important when averaged over many individuals and many years. These factors are the number of divisions (d), the number of stem cells (N × m), the number of critical rate-limiting driver pathway mutations (k), and the mutation rate (u). The probability that at least one stem cell accumulates the required number of driver mutations in an individual's lifetime is substantially greater than the probability that a typical stem cell acquires these mutations. Given a 5% risk of colorectal cancer by 100 years of age, only five cells in 100 individuals transform after 100 years. There are ~15 million crypts per colon and therefore at least ~15 million stem cells at risk for colorectal cancer in an individual. Therefore, only ~five of 1.5 billion crypt stem cell lineages transform within 100 years, or a single transformation event per ~30 billion crypt stem cell years (stem cell lineage transformation efficiency ~$3 \times 10^{-9}$). Chance and the enormous variation generated by replication errors in millions of stem cell lineages may be sufficient for the selection of low frequency cancer phenotypes within a lifetime.
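The back-of-the-envelope numbers in this paragraph can be reproduced in a few lines. The variable names are illustrative, and the calculation assumes the minimum of one stem cell lineage per crypt, as in the text.

```python
risk_by_100 = 0.05        # 5% lifetime risk by 100 years of age
individuals = 100
crypts_per_colon = 15e6   # at least one stem cell lineage per crypt
years = 100

transformed_cells = risk_by_100 * individuals          # about 5 cells
lineages = crypts_per_colon * individuals              # 1.5 billion lineages
stem_cell_years = lineages * years                     # 150 billion
years_per_event = stem_cell_years / transformed_cells  # about 30 billion
efficiency = transformed_cells / lineages              # about 3e-9

print(f"transformed cells:       {transformed_cells:.0f}")
print(f"lineages at risk:        {lineages:.2e}")
print(f"stem cell years / event: {years_per_event:.2e}")
print(f"transformation rate:     {efficiency:.1e}")
```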
A probabilistic description of cancer has several aspects consistent with cancer genome data, which show relatively low mutation frequencies, diverse combinations of mutations between different tumors, and a high proportion (>80%) of neutral passenger mutations [4, 5, 11, 12]. The equation models random mutation that starts from conception, and therefore the numbers and types of mutations in a cancer genome are highly dependent on what happens in normal colon (Fig 1). Mutations (predominantly passenger mutations) may arise as replication errors, and cancer results by chance from rare and diverse driver mutation combinations that confer a malignant phenotype in a single cell. Certain APC mutations may be common in colorectal cancer because they enhance stem cell survival during niche clonal evolution and shorten pathways by effectively increasing subsequent numbers of stem cells at risk. The similar base mutation spectrum in colorectal, pancreatic, and glial tumors is consistent with a common underlying mechanism such as replication errors.
A cancer model that includes epidemiology data needs an age parameter. Implicit in the equation is that progression starts at conception and most mutations accumulate in normal appearing colon (Fig 1). Once visible tumorigenesis occurs, this equation does not readily apply because it calculates the risk of the entire colon and does not model the adenoma-cancer sequence. However, progression to cancer may be dominated by its passage in normal colon because tumors before the age of 50 years are rare. The accumulation of somatic driver mutations in normal tissues is poorly documented, but mouse models demonstrate that many oncogenic mutations are also compatible with normal phenotypes, illustrating that some driver mutations can potentially arise earlier in life and persist in normal colon. Transformation of primary human cells has been engineered in vitro, but tumorigenesis in nude mice required the simultaneous combination of all three changes in a single cell.
This simple equation does not include copy number or epigenetic variations, or the very likely possibility that error and division rates may change during progression, and should be viewed as an exploratory or teaching tool. Many other quantitative models of cancer have been published, including a model of cancer genome data, but the algebraic format of this equation may be more familiar to students and can also be manipulated with an Excel spreadsheet (see Additional file 2). By this equation, cancer is "caused" by replication errors, a large number of cells at risk, and "bad luck", with cancer risks increased by stem cell divisions that normally occur with aging. The examples analyzed with the equation illustrate that subtle rather than dramatic cell changes are consistent with risk changes measured in large populations. From a broader perspective, its "integrative" nature relates how cancer incidence may depend on effective stem cell numbers, division and error rates, and numbers of required rate-limiting driver pathway mutations. Many progression pathways from the zygote are possible, but the shorter, parsimonious ways may allow cancers to appear within a lifetime.
The equation $p = 1 - (1 - (1 - (1 - u)^{d})^{k})^{Nm}$ illustrates that age-related increases in cancer frequencies may result from relatively normal division and mutation rates. Although this equation does not encompass all of the known complexity of cancer, it may be useful, especially in a teaching setting, to help illustrate relationships between small and large cancer features.
Supported by grants from the National Institutes of Health and the Norris Comprehensive Cancer Center.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.