Detection of cancer before distant metastasis

Background: To establish a distant metastasis (DM) cells must disseminate from the primary tumor and overcome a series of obstacles, the metastatic cascade. In this study we develop a mathematical model for this cascade to estimate the tumor size and the circulating tumor cell (CTC) load before the first metastasis has formed from a primary breast cancer tumor.
Methods: The metastatic cascade is described in discrete steps: 1. local tumor growth; 2. dissemination into circulation; 3. survival in circulation; 4. extravasation into tissue; and 5. growth into a metastasis. The model was built using data and relationships described in the literature to predict the relationship between tumor size and probability of distant metastasis for 38715 patients with surgically removed TXNXM0 primary breast cancer from the Netherlands Cancer Registry. The model was calibrated using primary tumor size, probability of distant metastasis and time to distant metastasis for 1489 patients with stage T1BNXM0 (25% of total patients with T1BNXM0). Validation of the model was done with data for all patients.
Results: From the time to distant metastasis of these 38715 breast cancer patients, we determined a tumor doubling time of 1.7 ± 0.9 months. Fitting the data for 25% of T1B patients estimates a metastatic efficiency of 1 metastasis formed per 60 million disseminated tumor cells. Validation of the model to data of patients in all T-stages shows good agreement between model and epidemiological data. To reduce the 5-year risk of distant metastasis for TXNXM0 from 9.2% to 1.0%, the primary tumor needs to be detected and removed before it reaches a diameter of 2.7 ± 1.6 mm. At this size, the model predicts that there will be 9 ± 6 CTC/L blood.
Conclusions: To reduce the rate of distant metastasis in surgically treated TXNXM0 breast cancer to 1%, imaging technology will need to be able to detect lesions of 2.7 mm in diameter or smaller. Before CTC detection can be applied in the early disease setting, sensitivity will need to be improved by at least 15-fold and combined with technology that minimizes false positives.


Background
The majority of deaths from cancer are due to distant disseminated disease rather than the primary tumor [1]. While metastases are often discovered years after surgical removal of the primary tumor, probably at least one metastasis was already present at the time of surgery. Understanding of the formation of distant metastasis (DM) is crucial for the reduction of the recurrence rate. For the successful colonization of a secondary site a cancer cell must complete a series of steps to become a clinically detectable lesion, baptized the metastatic cascade [2][3][4][5]. This cascade is an inefficient process, generating metastasis by sending large numbers of malignant cells into the circulation [6,7]. The number of cells disseminated and the efficiency of metastasis formation contribute to the probability that a metastasis has formed. Assays for enumeration of circulating tumor cells (CTC) in blood can provide the number of disseminated cells. The CTC are the new seeds of a tumor, and as such provide an opportunity to estimate the metastatic efficiency. Due to metastatic inefficiency, the presence of CTC does not imply that metastases already exist. Identification and enumeration of CTC at or before the diagnosis of the primary tumor may allow estimation of the probability of DM being present and thus help identify patients who will benefit from more intensive adjuvant therapy after surgical removal of the primary lesion. Here we developed a model for the metastatic cascade and estimated the sensitivity needed for imaging and CTC enumeration to detect a primary tumor before it has formed DM.

Distant metastasis statistics
The probability of DM and time to DM were determined from patients selected from the population-based Netherlands Cancer Registry (NCR, www.iknl.nl). Specially trained registrars access the patient files to gather data on patient and tumor characteristics and treatment for all malignancies in all hospitals in the Netherlands. Tumor size is determined by specialized pathologists. From the registry, we selected women who were diagnosed between 2003 and 2006 with pathological stage T1ANXM0 to T2NXM0 primary invasive breast cancer, were 25 years or older at the time of diagnosis, and had undergone mastectomy or breast-conserving surgery. Patients were excluded if any evidence of residual tumor was found after surgery. Patients were followed until at least five years after diagnosis and both occurrence and date of DM were registered. Time between diagnosis and occurrence of DM is reported as mean and standard deviation. The probability of a DM within the five years of follow-up and its 95% confidence interval were determined by Poisson statistics. The study protocol was approved by the NCR Privacy Council, and it was assured that all necessary patient consent had been obtained.
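As an illustration of the interval calculation described above, the following minimal sketch computes a 5-year DM probability with a 95% confidence interval from an event count. It assumes the normal approximation to a Poisson count; the exact procedure used for the registry data is not stated here. The function name is hypothetical, and the count of 3550 DM events among 38715 patients is taken from the Results as an assumption about how that figure should be read.

```python
import math

def poisson_rate_ci(events, patients, z=1.96):
    """Return (rate, lower, upper) for an event rate.

    Assumption: the event count is Poisson distributed and large enough
    that its standard deviation is well approximated by sqrt(events).
    """
    half_width = z * math.sqrt(events)
    rate = events / patients
    lower = (events - half_width) / patients
    upper = (events + half_width) / patients
    return rate, lower, upper

# Assumed inputs: 3550 DM events among 38715 included patients.
rate, lower, upper = poisson_rate_ci(3550, 38715)
print(f"5-year DM probability: {rate:.2%} ({lower:.2%} to {upper:.2%})")
```

For small event counts (for example a sparsely populated T-stage), an exact Poisson interval would be more appropriate than this approximation.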

Parameter fitting
A numerical model for the development and detection of DM was developed and tested in Matlab 2009a (Mathworks, Natick, MA). The two essential elements in the model are the number of cells disseminated into circulation within a certain time interval and the probability of metastatic success of each cell. The number of metastases formed is described by a binomial distribution, which is approximated by applying the Poisson limit theorem. The resulting cumulative distribution function was compared to a pseudo random number generated by Matlab. The time to DM in the model was defined as the time between removal of the primary tumor and the time when the first DM reaches 8 mm in diameter (T1B). The mean and standard deviation of the doubling time were determined from the time to DM. CTC concentration was fit to available literature values; patients with metastatic disease have 3.0 CTC/mL (5th-95th percentile: 0.02-417 CTC/mL) [8][9][10][11], and patients with early stage breast cancer have CTC at a mean concentration of 0.03 CTC/mL (range of estimates 0.01-0.05 CTC/mL) [12][13][14][15].
The product of dissemination rate and metastatic efficiency was fit to the probability of DM for 25% of patients with stage T1B, running 10,000 iterations and randomizing doubling time for each iteration. Stage T1B was selected for fitting because it is the smallest frequently discovered tumor, typically discovered when it reaches a diameter of 8 mm. After fitting the data on a subset of T1B patients we validated the model by comparing predicted and actual probability of DM for all patients grouped by T-stage (T1A-T2).
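The sampling step described above (Poisson limit of a binomial, compared against a pseudo random number) can be sketched as follows. This is a simplified illustration in Python, not the Matlab implementation: it draws only whether at least one metastasis has formed, it does not randomize the doubling time per iteration, and the input of 5 million disseminated cells is a hypothetical value chosen for the example.

```python
import math
import random

def has_metastasis(n_disseminated, efficiency, rng):
    """Draw whether at least one metastasis formed.

    With many cells and a tiny per-cell success probability, the binomial
    count is approximately Poisson with mean n_disseminated * efficiency.
    """
    expected = n_disseminated * efficiency        # Poisson mean
    p_at_least_one = 1.0 - math.exp(-expected)    # P(count >= 1)
    return rng.random() < p_at_least_one          # compare with a uniform draw

def simulate(n_disseminated, efficiency, iterations=10_000, seed=0):
    """Fraction of simulated patients with at least one metastasis."""
    rng = random.Random(seed)
    hits = sum(has_metastasis(n_disseminated, efficiency, rng)
               for _ in range(iterations))
    return hits / iterations

# Hypothetical input: 5 million cells disseminated before surgery,
# with the fitted metastatic efficiency of 1.7e-8 per cell.
fraction = simulate(5e6, 1.7e-8)
print(f"simulated probability of DM: {fraction:.3f}")
```

With a fixed number of disseminated cells the simulated fraction converges on 1 - exp(-expected), so the simulation only becomes necessary once inputs such as the doubling time vary between iterations.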

Model for formation of distant metastasis
The steps in the metastatic cascade are summarized in Figure 1: 1) a tumor grows locally; 2) cells disseminate from the primary tumor; 3) a fraction of the tumor cells survives in the circulation; 4) tumor cells arrest in the microcirculation of an organ and may extravasate into the surrounding tissue. The extravasated tumor cells can then 5A) survive as a single dormant cell, 5B) form a micro metastasis, or 5C) grow into a macro metastasis. While it is unknown what triggers a primary tumor to start shedding cells into the blood stream, this shedding starts well before the primary tumor is detectable by current imaging techniques [16][17][18].
We developed a mathematical model for the formation of metastasis using relationships described in literature for each of these steps. Steps 3, 4 and 5C together determine the probability that a disseminated cell forms a macro metastasis, i.e. the metastatic efficiency (γ_metastatic).
In the model the following conventions are applied: the number of cells is described with N, the dissemination rate with R, the diameter with D, the doubling time with DT and other constants with C. Subscripts are used to differentiate different N. The formation of distant metastasis is described with functions for the steps depicted in Figure 1; see Additional file 1: Supplemental S1 for a more elaborate derivation.
Figure 1 Steps in the metastatic cascade. Parameters related to each step are shown near the description of each step. Cells in tissue are indicated in light gray, cells in blood or in transition between blood vessel and tissue are indicated in black. Local growth (step/equation 1) is needed to provide sufficient disseminated cells. Steps 2-5 are typically referred to as the metastatic cascade (equations S3-S5, S8). Step 5 has multiple outcomes, with cells surviving, but not replicating (5A), briefly or slowly replicating (5B) or rapidly replicating (5C).
1. Local growth: In a comparison of functions typically used to describe tumor growth (exponential, Gompertz or logistic), the logistic function fit best [19] (equation 1). Gompertz and logistic functions have a slowing growth rate as the tumor reaches a maximum size N_max (N = number of tumor cells) at a certain time (t). N_max is typically chosen at 10¹² cells/1 kilogram [19]. We assume metastases grow according to equation 1. Changes in growth rate due to occurrence of growth enhancing mutations or due to chemo or hormonal therapy are not explicitly considered in any of the growth models.
2. Dissemination to circulation: The relationship between tumor diameter (D_mass) and the number of disseminated cells (N_diss) is assumed linear and is derived from murine data comparing CTC counts to the diameter of the primary tumor [20][21][22][23].
To derive the diameter of the lesion from the number of cells (N_mass) we assumed a spherical lesion, which disseminates cells into the bloodstream at a rate R_diss.
3,4,5C. Formation of a metastasis: The relationship between the number of cells injected into the circulation and the number of macroscopic metastases is linear [24][25][26], with the metastatic efficiency γ_metastatic as its slope. The total number of metastases formed is therefore equal to the metastatic efficiency times the total number of cells disseminated from the tumor. The rate of cell dissemination (R_diss) is measurable by detecting the number of CTC, while the metastatic efficiency (γ_metastatic) may be measurable either by genotyping these CTC or the primary tissue.
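The relationships described in steps 1 to 5C can be written out as a sketch in standard textbook forms. These are not the equations of the original derivation (for that, see Additional file 1: Supplemental S1); the symbols N_0 (initial cell number), r (early growth rate), V_cell (volume per cell) and C_diss (proportionality constant) are assumptions introduced here for illustration.

```latex
% Step 1: logistic growth (standard form); r = ln 2 / DT holds only while N << N_max
N(t) = \frac{N_{max}}{1 + \left(\frac{N_{max}}{N_0} - 1\right) e^{-r t}},
\qquad r = \frac{\ln 2}{DT}

% Diameter of a spherical lesion containing N_mass cells of volume V_cell each
D_{mass} = \left( \frac{6 \, N_{mass} \, V_{cell}}{\pi} \right)^{1/3}

% Step 2: dissemination rate proportional to diameter, accumulated over time
R_{diss}(t) = C_{diss} \, D_{mass}(t),
\qquad N_{diss}(t) = \int_0^{t} R_{diss}(\tau) \, d\tau

% Steps 3, 4, 5C: expected number of metastases
N_{met}(t) = \gamma_{metastatic} \, N_{diss}(t)
```

Under the Poisson approximation used for parameter fitting, the probability that at least one metastasis has formed by time t then follows as 1 - exp(-N_met(t)).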
Below we list the assumptions of the model with their impact:
1. Growth of the primary tumor is described by a logistic function. Studies that determine the growth functions do so on relatively large tumors; our estimated metastatic efficiency is effectively fit to the time-period just before tumor discovery.
5. The probability of forming a DM and the dissemination rate are not dependent on the growth rate of the primary tumor. Considering the high incidence of recurrence within the first 5 years compared to years 6-15, metastatic efficiency or dissemination rate probably are smaller for slower growing tumors. If, for example, 75% of recurrences are found in the first five years (doubling time < 3 months), 20% of recurrences in years 5-15 (doubling time 3-9 months), and 5% of recurrences in years 15-25 (doubling time 9-15 months), the metastatic efficiency/dissemination rate for a 0-3 month doubling time would be approximately 6-fold higher than for a 3-9 month doubling time and approximately 24-fold higher than for a 9-15 month doubling time.
6. Both the rate of dissemination and the probability of distant metastasis formation are independent of the cancer type. We need to make this assumption because (1) data on CTC concentrations versus cancer type is not available, and (2) data on cancer types for the patients in our data set is not available. In another study, the five year risk of recurrence for patients with triple negative breast cancer was estimated to be 2.6 fold higher than for patients with other breast cancers [37]. This implies that the product of dissemination rate and metastatic efficiency needs to be approximately 2.6 fold higher for patients with triple negative breast cancer.
7. Disseminated cells are not temporarily 'stored' in the bone marrow (tumor dormancy). In the model we assume the transit from primary tumor to metastatic site occurs within days. Temporary storage (dormancy) of cells in the bone marrow or at the metastatic site would result in a delay in the start of growth, and thus in an underestimation of the doubling time. For long delays the recurrence would likely be pushed outside the 5-year window, and for small delays the doubling time is marginally affected. For example, if a typical delay would be four months, the previous doubling time estimate of 1.7 months would become 1.5 months.
8. All primary tumors are removed once they reach a predefined (constant) diameter corresponding to the median size at each stage. This assumption has a negligible impact on growth rate due to the small range of sizes within each T-stage.
9. In case of DM at least one metastasis was formed prior to surgery. This assumption means that all metastases originate from the primary tumor, and implies no patients formed a second primary tumor. This assumption leads to an overestimation of the rate of recurrence, and thus an overestimation of the metastatic efficiency. For example, if 10% of DM were misclassified and actually a second primary, the risk of recurrence is reduced by 10% and the metastatic efficiency is reduced by 10%.
10. The probability of DM is defined as the probability that at least one metastasis was present at the time of surgery, continues to grow and is discovered once it has reached a size of 8 mm. Changing the size at which a tumor is discovered affects the estimated growth rate. For example, if the typical tumor is discovered at a size of 15 mm, the estimated doubling time is 1.5 months instead of 1.7 months.
11. For fitting of the model, the primary tumor is detected when it reaches a size of 8 mm (stage T1B). This assumption has negligible impact on growth rate due to the small range of sizes within each T-stage.
12. For validating the model, the primary is detected when it reaches the median size representative of each T-stage. This assumption has negligible impact on growth rate due to the small range of sizes within each T-stage.
13. Cardiac output is 5 L/minute. The cardiac output is used to convert the CTC concentration to the dissemination rate. A different cardiac output only affects the dissemination rate and metastatic efficiency. A cardiac output of for example 6 L/minute would reduce the estimated dissemination rate by 17%, but increase the estimated metastatic efficiency by 20%.
14. Dissemination rate is proportional to tumor diameter. Literature values in murine models [20,23,[38][39][40][41][42] suggest a linear relationship between dissemination rate and tumor diameter. However, most of these studies determine only three data points. The model does not fit the epidemiological data if we assume the dissemination rate to be proportional to the tumor surface area, or the number of cells in the tumor.

Distant metastasis statistics
Of 42318 patients matching our search criteria, 38715 (91%) patients were included, see Figure 2 for exclusion details. The probability of DM after surgery and the time to DM from stage T 1A -T 2 invasive breast carcinoma patients, without known metastases at time of diagnosis (N X M 0 ) was determined and shown in Figure 3. Three thousand five hundred and fifty patients ( Table 2, with more detailed information in Additional file 1: Table S1.

Tumor doubling time
From the time to DM of the 38715 breast cancer patients, we determined a doubling time (DT) of 1.7 ± 0.9 months. Human values for DT are estimated by fitting a growth model to imaging data reported in the literature, see Additional file 1: Table S2. The median DT is estimated at 5.7 months (range 2.0 to 11.2 months) in these reports.

Formation of metastases and dissemination rate
The probability of forming a metastasis is primarily determined by (1) the number of cells entering the circulation and (2) the probability that each of these cells forms a metastasis (γ_metastatic). The number of cells entering the circulation is a function of the dissemination rate (R_diss) and the elapsed time, which is affected by the tumor doubling time (DT). For a given DT, the probability to form a metastasis before surgery is determined by the product of γ_metastatic and R_diss. We derived γ_metastatic from this product by assuming that R_diss is directly related to the CTC concentration, as described in Additional file 1: Supplemental S2. The CTC concentration reported for primary breast cancer before surgery is 0.03 CTC/mL (range 0.01-0.05 CTC/mL [12][13][14][15]). We now find a dissemination rate for an 8 mm tumor of 280 CTC/h · g tumor (range 90-470 CTC/h · g tumor) and a metastatic efficiency of 1.7 · 10⁻⁸ metastases formed per disseminated cell (range 1.3 · 10⁻⁸ to 4.2 · 10⁻⁸), or approximately 60 million disseminated cells per formed macro metastasis. For comparison, the dissemination rate can be estimated from two human studies which determined the CTC concentration in the efferent vein of colorectal and renal cancer, Table 3, with a median estimate of 3,100 CTC/h · g tumor, and an estimated range of 90-78,000 CTC/h · g. Metastatic efficiency has not yet been estimated in humans. In murine models, the dissemination rate determined by various techniques spans a wide range of nearly 7 orders of magnitude, Table 2 and Additional file 1: Table S1. The median estimate is 1.0 · 10⁵ CTC/h · g (range 0.15 to 8.7 · 10⁶ CTC/h · g). Metastatic efficiency has been determined either from the number of macro metastases formed from injection of a known number of malignant cells, or by observing the individual probabilities in the metastatic cascade by means of intra-vital video microscopy (IVM, [43]).
Methods that determined the metastatic efficiency from injection of a known number of cells estimated γ_metastatic at 0.005% (range 0.0001-0.6%), Additional file 1: Table S3. From the IVM studies we find a comparable γ_metastatic of 0.011%, primarily caused by the low probability of extravasated cells to form a macro metastasis, Additional file 1: Supplemental S3.
CTC in a metastatic patient are present at a concentration of 3 CTC/mL of blood [11], however a 100-fold lower concentration is detected in patients before surgery. In our model the number of circulating tumor cells is linked to the total tumor size. While the total tumor size of all lesions is larger for a metastatic patient, the difference is not sufficient to cause such a high change in CTC concentration. To achieve the higher CTC concentration post-surgery, we increased the dissemination rate by 25-fold for all metastatic lesions. We could also achieve this CTC concentration by increasing the metastatic efficiency 10,000-fold. Either scenario, or a combination, is conceivable, since a cell that has completed the metastatic cascade has proven to be capable of dissemination into the circulation and of formation of a metastasis.

Sensitivity needed for radiographic imaging and CTC detection to detect a tumor before it gives rise to metastasis
The model was used to predict the technology needs for detection of tumors before metastasis can occur. The values used for the model are provided in Table 2. In Figure 4 an example is shown of a T1B breast tumor. Panel A shows the development of the total tumor mass and the tumor cell number per equation 1. The black line represents the case for which the tumor is surgically removed and the gray line the case for which the tumor is not removed. In panel B, the solid black line shows the maximum diameter of the tumor. This diameter is important for detection of a tumor by an imaging method. In this case, the T1B tumor is detected when it reaches 8 mm, 3.4 years after its inception, and is surgically removed. If an imaging system is employed to detect all lesions in a patient, it must be capable of identifying the smallest lesions. The dashed line in panel B shows the diameter of the smallest lesion at times multiple lesions exist. In Figure 4, the tumor has seeded a metastasis 2.8 years after initiation of the tumor. At the time of surgery this lesion has a diameter of 70 μm; undetectable by imaging. The total number of metastases is shown in a solid gray line on the secondary y-axis. The number of metastases is relatively stable from 3.4 years to ~5.5 years after surgery, but rapidly increases after 6 years because the metastasis has become sufficiently large to make formation of new metastases sufficiently probable. Panel C shows the CTC concentration in solid black and the probability of forming the first DM in solid gray on the secondary y-axis.
Table 2 Metastatic efficiency: model 1.7 · 10⁻⁸ (1.3 · 10⁻⁸ to 4.2 · 10⁻⁸); literature 7 · 10⁻⁵ (1 · 10⁻⁶ to 6 · 10⁻³). Literature values are the median of all estimates with the range of estimates in parenthesis. Detailed data for each publication is accessed through Additional file 1: Table S1.
To reduce the probability of DM from 9.2% in the patients included in our study to 1%, the tumor needs to be detected by the time it reaches 2.7 ± 1.6 mm, or when the CTC concentration is 9 ± 6 CTC/L whole blood.
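To give a feel for what a concentration of 9 CTC/L means for a blood-based assay, the sketch below estimates the chance of capturing at least one CTC in a given sample volume. It assumes that cells are Poisson distributed in the sample and that every cell in the tube is recovered and counted. The 7.5 mL sample volume is an assumed, commonly used tube size and is not taken from this study. Function names are hypothetical.

```python
import math

def detection_probability(ctc_per_litre, sample_ml):
    """Probability of at least one CTC in the sample (Poisson sampling,
    assuming perfect recovery of every cell present)."""
    expected = ctc_per_litre * sample_ml / 1000.0
    return 1.0 - math.exp(-expected)

def volume_for_probability(ctc_per_litre, target):
    """Sample volume in mL needed to reach the target detection probability."""
    return -math.log(1.0 - target) / ctc_per_litre * 1000.0

# 9 CTC/L is the model estimate; 7.5 mL is an assumed sample volume.
p = detection_probability(9, 7.5)
v = volume_for_probability(9, 0.95)
print(f"P(>=1 CTC in 7.5 mL) = {p:.1%}; volume for 95% detection = {v:.0f} mL")
```

Under these assumptions a single small blood draw would rarely contain a CTC at this concentration, which illustrates why a large gain in the volume interrogated or in assay sensitivity is needed.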

Discussion
The most effective therapy to treat breast cancer is to surgically remove the primary tumor before it has formed a distant metastasis (DM). Unfortunately the technology available to detect the presence of DM at the time of diagnosis cannot accurately make this determination and a large portion of patients receiving adjuvant therapy do not benefit from this therapy whereas others could have benefitted from adjuvant therapy they did not receive. To identify those patients at risk for DM the traditional TNM-classification has been complemented with differentiation grade, peri-tumor vascular invasion, estrogen, progesterone, Her2neu receptor expression and more recently through molecular characterization of the tumor [46][47][48][49][50][51]. Although improvement in the risk assessment helps to identify the patients that need additional therapy after surgical removal of the primary tumor, detection of the actual presence of tumor cells beyond the primary tumor is preferred. Indeed the presence of micrometastases in bone marrow [52,53] and tumor cells in blood [12][13][14]54] of breast cancer patients have been associated with an increased risk for disease recurrence, but have not become part of clinical practice partly because the current technology lacks sufficient sensitivity and specificity. The observations that CTC have been detected in patients years after a diagnosis and treatment of breast cancer with curative intent further challenges the technology to identify those CTC characteristics that predict imminent relapse [55,56].
To identify the basic requirements for detection of DM we have modeled the probability that a DM has been formed prior to surgery. Three key components of this probability are the tumor doubling time (DT), the rate of tumor cell dissemination (R_diss), and the probability of successful completion of the metastatic cascade (γ_metastatic). Rate of dissemination can be determined from the CTC concentration values reported in literature [8][9][10][11][12][13][14][15].
Here we combined literature values with clinical data from the NCR to obtain estimates for DT and γ_metastatic for patients. Using this model, we predicted the sensitivity needed for radiographic imaging and CTC enumeration for the detection of a primary tumor before DM formation has occurred.
The major assumptions in the model are: 1) Metastatic efficiency and dissemination rate are not dependent on doubling time. Considering the high incidence of recurrence within the first 5 years compared to years 6-15, metastatic efficiency or dissemination rate are probably smaller for slower growing tumors. If, for example, 75% of recurrences are found in the first five years (doubling time < 3 months), 20% of recurrences in years 5-15 (doubling time 3-9 months), and 5% of recurrences in years 15-25 (doubling time 9-15 months), the metastatic efficiency for 0-3 month doubling time would be approximately 6-fold higher than for 3-9 months, and approximately 24-fold higher than for 9-15 months. A shorter doubling time reduces the probability of DM; the tumor has a shorter time to form a metastasis before it is large enough to be discovered. However, a reduction of probability of DM by an x-fold shorter doubling time is negated by an x^(1/2)-fold higher metastatic efficiency. A 6-fold increase in metastatic efficiency and/or dissemination rate for a fourfold shorter doubling time would mean that the faster growing tumor has a higher probability of DM.
2) The rate of dissemination and metastatic efficiency are independent of cancer type. We need to make this assumption because 1.) We lack data on CTC concentrations versus cancer type, and 2.) We lack data on cancer types for the patients in our data set. In another study, the five year risk of recurrence for patients with triple negative breast cancer (11% of total) was estimated to be 2.6 fold higher than for patients with other breast cancers (89% of total) [37]. To assess the impact of a subtype with high risk of recurrence, we implemented a subgroup of 11% of patients with 2.6 fold higher product of metastatic efficiency and dissemination rate than the other 89%, while the average metastatic efficiency was held constant. The estimated detection limits did not change due to a higher metastatic efficiency nor to a higher dissemination rate.
3) The metastatic efficiency does not evolve over time.
While we expect that metastatic efficiency actually increases over time [16], we lack data describing such evolution. The high relative probability of distant metastasis formation just before tumor discovery implies that the estimated metastatic efficiency also applies to the period just before tumor discovery. To obtain a fit between the CTC data in early stage patients and in metastatic patients, we applied a single increase in the dissemination rate of 25-fold, or an increase in the metastatic efficiency of 10,000-fold. Our rationale was that the metastatic cell has become efficient at disseminating and/or metastasizing due to natural selection by the metastatic cascade and has thus become genetically more prone to formation of new metastases [57,58]. We recognize that it is equally feasible that such evolution occurs more gradually. 4) The transit from primary tumor to metastatic site is instant. Temporary storage of cells in the bone marrow, or temporary dormancy at the metastatic site, would result in a delay in the start of growth, and thus in an underestimation of the doubling time. For long delays the recurrence would likely be pushed outside the 5-year window, and for small delays the doubling time is marginally affected. For example, if we assume a typical delay of four months, the previous doubling time estimate of 1.7 months would become 1.5 months. 5) The probability of DM is defined as the probability that at least one metastasis was present at the time of surgery, continues to grow and is discovered once it has reached a size of 8 mm. Changing the size at which a tumor is discovered affects the estimated growth rate. For example, if the typical tumor is discovered at a size of 15 mm, the estimated doubling time is 1.5 months instead of 1.7 months.
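The subgroup scenario described under assumption 2 (11% of patients with a 2.6-fold higher product of metastatic efficiency and dissemination rate, with the population average held constant) can be illustrated with a short calculation. The sketch assumes a simple population-weighted mean; how the subgroup was actually implemented in the model is not specified beyond the description above, and the function name is hypothetical.

```python
def subgroup_multipliers(fraction_high, fold_high):
    """Scale factors for the low-risk and high-risk groups such that the
    population-weighted mean multiplier equals 1 (average held constant)."""
    low = 1.0 / (fraction_high * fold_high + (1.0 - fraction_high))
    high = low * fold_high
    return low, high

# 11% of patients with a 2.6-fold higher product, 89% with the baseline product.
low, high = subgroup_multipliers(0.11, 2.6)
print(f"low-risk multiplier: {low:.3f}, high-risk multiplier: {high:.3f}")
```

Holding the weighted mean at 1 means the remaining 89% of patients receive a product slightly below the single-population fit, while the high-risk group receives a correspondingly higher one.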
Data from the NCR were used to determine the probability for breast cancer DM by T-stage and the time between surgical intervention and DM. To obtain a patient group with minimal risk of DM, we included only patients with complete removal of the tumor after surgical resection, relatively small tumors (T1-T2) and no detectable metastasis (NXM0). The NCR recorded data for DM up to five years after surgical intervention. From the time to DM of 32 ± 18 months, we determined a DT of 1.7 ± 0.9 months for DM; threefold faster than the DT of 5.7 months (range 2.0-11.2) determined from primary tumor imaging data. A DM with a DT of 5.7 months would lead to discovery of a DM 9.5 years after initiation of the DM. Our 5-year (60 month) observation window is too short to observe tumors with a DT of 5.7 months. It is likely that our estimate of 1.7 months represents tumors with aggressive growth rates. Concurrently, the 5-year observation window may select for specific organs, because aggressive growth rates are more likely in organs that provide high levels of nutrients and tumor specific growth factors. Approximately three quarters of recurrences take place in the first five years [59]. With a 15-year observation window we expect to find a doubling time of 2.7 months. In addition, the literature value for DT of 5.7 months is determined on primary tumors, while the model fit DT of 1.7 months is determined on the DM. The DM may have a different DT than the primary lesion in the same patient due to natural selection in the metastatic cascade, differences in the tumor microenvironment or accumulation of growth enhancing mutations.
From murine studies, we conclude that dissemination rate is linearly dependent on the diameter of a lesion. For a diameter of 8 mm (typical T 1B ) we find a dissemination rate of 280 CTC/h · g tumor (range 90-470) when we fit the clinical data to our model. This is on the low end of the range of dissemination rates determined from the tumor efferent vein in human studies of 90-78,000 CTC/h · g tumor (Additional file 1: Table S3). Dissemination rates determined in murine models span a very wide range of 7 orders of magnitude (0.15-8,700,000 CTC/h · g tumor, Additional file 1: Table S2). While this variation may be caused by differences in the detection methods used or differences between cell lines, the variation between murine estimates makes comparison with our model futile.
Metastatic efficiency in our model is estimated at 1 metastasis per 60 million disseminated tumor cells. This is substantially less efficient than the murine model median estimate of 1 metastasis in 14,000 disseminated cells (range 1 in 170 to 1 in 1 million). The large difference in metastatic efficiency between murine model and human model may be attributed to many factors, including use of cell lines with high metastatic efficiency, the 2,000-fold difference in size between human and mouse and the immunodeficiency of most mouse models. A host specific (immune) response to tumor cells most likely reduces metastatic efficiency, and may reduce tumor growth of small lesions. Studies quantifying the impact of the host response on tumor growth are needed before inclusion in any model. Murine models suggest that disseminated cells have high survival in circulation and are efficient at extravasation, Additional file 1: Table S4. Survival of extravasated cells beyond 2 weeks is estimated between 4% and 50%. If these tumor cells continue to survive, this would leave a substantial number of dormant cells scattered throughout the body, up to a million cells in our model, Additional file 1: Supplemental S4. These cells may constitute a malignant time-bomb, since dormant cells may be reactivated at a later time [60]. In the shorter term, metastatic efficiency is limited primarily by the ability of a disseminated cell to grow in a new site (Table 4).
Based on murine studies in different organs (see Additional file 1) we expect the model to be applicable to other cancers. It should be noted that tumors with high metastatic efficiency, such as melanoma [63] or non-small cell lung cancer [64] will have substantially lower numbers of CTC. Similarly, colorectal CTC are captured in the hepatic microcirculation and are lower when detected in the peripheral circulation [9,45]. Determination of tumor size is more difficult for some tumor types such as prostate cancer, which will result in higher error margins in the model parameters.
To determine the probability of metastases in a patient, three parameters are relevant, the dissemination rate, the growth rate and the metastatic efficiency. The dissemination rate can be determined from the CTC concentration, the growth rate and metastatic efficiency can be estimated from the primary tumor or, alternatively, by genotyping captured CTC. This is supported by the observation that both CTC concentration and hormone receptor status from primary tissue information are independent prognostic data in multivariate analyses [65,66].
The model can be applied to estimate the probability of metastases as a function of primary tumor size. Figure 3 illustrates that the model reasonably predicts the probability of DM for stages T 1B to T 2 . The probability of DM grows slightly faster in the data than in the model, which may be caused by a slow increase in dissemination rate or metastatic efficiency over time. With current imaging technology, 94% of detected lesions have a size of 6 mm or more, with a specificity of 40% [67]. From the data of the NCR, we conclude that current clinical practice in the Netherlands has similar detection characteristics, with 95% of the tumors detected when the tumor is 5 mm or larger, with a median size of 17 mm. The larger probability of DM for T 1A than T 1B in the NCR data is unexpected and raises the question whether these small tumors are truly more aggressive, or whether the difficulty to detect tumors smaller than 5 mm has caused a sampling bias in the T 1A sample.
To implement CTC as a screening tool, the improved CTC detection will need to have a minimal impact on the screened patient and to have similar specificity to radiological imaging. We note that by definition, CTC enumeration will not detect benign lesions. On the other hand, CTC detection could have excellent sensitivity and specificity for malignant lesions if the malignancy of detected CTC is confirmed with for example whole genome comparative genome hybridization [68,69].

Conclusions
A model was developed to estimate tumor size and CTC concentration before distant metastases occur. To reduce the overall probability of DM from 9.2% to 1% the tumor needs to be detected by the time it reaches 2.7 ± 1.6 mm. Clinical proof of this estimate requires an improvement in imaging technology that allows routine diagnosis of tumors smaller than 2.7 mm without a decrease in specificity of tumor detection. Alternatively, to achieve a probability of DM of 1%, a tumor would need to be detected when the CTC concentration is 9 ± 6 CTC/L of whole blood. This requires at least a 15-fold improvement in the CTC detection limit. Subtypes of breast cancer with higher growth rate, higher metastatic efficiency, or higher dissemination rate would affect this estimate, requiring a smaller lesion, or a lower number of CTC, to be detected to achieve the overall probability of DM of 1%.

Additional file
Additional file 1: Detection of cancer before metastasis.

Competing interests
This work was supported by Veridex LLC. Prof. Leon WMM Terstappen is an inventor of several patents related to the CTC technology that have been assigned to Veridex LLC, he is presently a consultant for Veridex and receives research funding from Veridex LLC. All remaining authors have declared no competing interest.

Authors' contributions
FC and LT designed the study and drafted the manuscript. FC and SB performed the statistical analysis. FC, SB and LT performed the data analysis and data interpretation. All authors read and approved the final manuscript.