- Research article
- Open Access
Minimum follow-up time required for the estimation of statistical cure of cancer patients: verification using data from 42 cancer sites in the SEER database
BMC Cancer volume 5, Article number: 48 (2005)
The commonly used five-year survival rate is not adequate to represent statistical cure. In the present study, we established the minimum number of years of follow-up required to estimate the statistical cure rate, using a lognormal distribution of the survival times of patients who died of their cancer. We introduced the term threshold year: the follow-up time that covers almost all of the survival data of patients dying from the specific cancer, leaving only about 2.3% uncovered. This is close enough to cure from that specific cancer.
Data from the Surveillance, Epidemiology and End Results (SEER) database were used to test whether the survival times of cancer patients who died of their disease followed a lognormal distribution, using a minimum chi-square method. Patients diagnosed from 1973–1992 in the registries of Connecticut and Detroit were chosen so that a maximum of 27 years was allowed for follow-up to 1999. A total of 49 specific organ sites were tested, and the parameters of the lognormal distribution were estimated for each cancer site. The cancer-specific survival rates at the threshold years were compared with the longest available Kaplan-Meier survival estimates.
For 42 of the 49 cancer sites, the cancer-specific survival times of patients who died of their disease were verified to follow lognormal distributions, with different parameters for each site. The threshold years validated for statistical cure varied between cancer sites, from 2.6 years for pancreatic cancer to 25.2 years for cancer of the salivary gland. At the threshold year, the statistical cure rates estimated for 40 cancer sites matched the actuarial long-term survival rates estimated by the Kaplan-Meier method within six percentage points. For the two remaining sites, breast and thyroid, the threshold years were so long that the cancer-specific survival rates could not yet be obtained, because the SEER data do not provide sufficiently long follow-up.
The present study suggests that, for each cancer site, a certain threshold year must be reached before the statistical cure rate can be estimated. For some cancers, such as breast and thyroid, the 5- or 10-year survival rates do not adequately reflect statistical cure rates, which highlights the need for long-term follow-up of these patients.
The normal distribution is often used to describe the random variation of data in many scientific disciplines. However, some distributions are skewed, with low mean values and large variances. Such distributions may be exclusively positive, for example the duration of survival of patients with chronic leukemias, the incubation time of infectious diseases, and the abundance of biological species. These skewed distributions often fit the lognormal distribution [4–7]. A lognormal distribution is one whose random variable has a logarithm that follows a normal distribution. The lognormal distribution of the survival times of cancer patients who died of their disease has been tested and applied for various anatomic sites [8–21].
Strictly speaking, the commonly used term "survival rate", as in five- and ten-year survival rates, is inaccurate: Kaplan-Meier or life-table (actuarial) methods estimate the proportion (or fraction) of survivors, not a rate. In this study, "survival rate" therefore means the survival fraction expressed as a percentage, and "cure rate" means the cure fraction expressed as a percentage.
The statistical cure rate is an estimate based on statistical models [11, 15] and is difficult to obtain, especially for slowly proliferating tumors. Because of this limitation, oncologists usually discuss survival in terms of 5-year or 10-year survival rates. In certain rapidly growing tumors, the cancer-specific survival rate reaches a plateau within 5–10 years and approaches the statistical cure rate, which is the survival rate observed when there is no further risk of death from the disease. In the Kaplan-Meier method, a patient with residual cancer who died of causes other than the specific cancer is censored for cancer-specific survival. In the present study we applied the analysis to cancer-specific survival rates. Statistical cure is reached at the plateau of the Kaplan-Meier plot of the cancer-specific survival rate. For some rapidly proliferating cancers, such as pancreatic and stomach cancers, the plateau appears within 10 years. However, for some slowly proliferating cancers, such as thyroid and early breast cancers, the plateau does not appear even after decades. Hence the commonly used five-year survival rate is not adequate to represent the statistical cure rate.
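A minimal illustration of the definition, using only Python's standard library and arbitrary illustrative parameters (mean 3.0 and standard deviation 1.0 on the natural-log scale; these are not values from the study). Exponentiating normal variates yields positive, right-skewed values whose logarithms are normal again, and whose arithmetic mean exceeds the median.

```python
import math
import random
import statistics

def lognormal_sample(m, s, n, seed=0):
    """Draw n lognormal values as exp(N(m, s)); m and s are the mean and SD of ln(x)."""
    rng = random.Random(seed)
    return [math.exp(rng.gauss(m, s)) for _ in range(n)]

x = lognormal_sample(3.0, 1.0, 20000)
logs = [math.log(v) for v in x]

# The logarithms approximately recover the normal parameters (3.0 and 1.0).
print(round(statistics.fmean(logs), 2), round(statistics.stdev(logs), 2))
# Right skew: the arithmetic mean is pulled above the median, which is close to exp(3.0).
print(statistics.fmean(x) > statistics.median(x))
```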
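The censoring rule for cancer-specific survival can be made concrete with a small Kaplan-Meier sketch. It assumes each patient is a (time, status) pair with a hypothetical status coding: "cancer" for death from the specific cancer (an event), and "other" or "alive" for death from other causes or survival to last contact (both censored). The toy cohort is invented for illustration only.

```python
from typing import List, Tuple

def cancer_specific_km(records: List[Tuple[float, str]]) -> List[Tuple[float, float]]:
    """Kaplan-Meier cancer-specific survival curve.

    Only deaths from the specific cancer count as events; deaths from other
    causes and patients alive at last contact are censored at their time.
    Returns (event_time, survival) steps.
    """
    # Distinct times at which at least one death from the specific cancer occurred.
    event_times = sorted({t for t, status in records if status == "cancer"})
    surv = 1.0
    curve = []
    for t in event_times:
        # Patients censored at exactly t are still counted as at risk at t.
        at_risk = sum(1 for u, _ in records if u >= t)
        deaths = sum(1 for u, s in records if u == t and s == "cancer")
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical cohort: times in years, one death from another cause censored at 2.0.
toy = [(1.0, "cancer"), (2.0, "other"), (3.0, "cancer"), (4.0, "alive"), (5.0, "alive")]
curve = cancer_specific_km(toy)
```

The death from another cause at 2 years does not lower the curve; it only shrinks the risk set, so the second step is 0.8 × 2/3.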
In the present study, we established the minimum number of years of follow-up required to estimate the statistical cure rate, using a lognormal distribution of the survival times of those who died of their cancer and applying the result derived by Limpert et al. This minimum follow-up, the threshold year, was defined as τ = μ* × (σ*)², where μ* is the median and σ* is the multiplicative standard deviation of the lognormal distribution (see Additional file 1: Appendix). Follow-up to τ covers almost all of the survival data of patients dying from the specific cancer, leaving only about 2.3% uncovered, which is close enough to cure from that specific cancer.
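The threshold-year definition reduces to simple arithmetic on the fitted parameters. The sketch below assumes that M and S are the mean and standard deviation of the natural logarithm of survival time in years, so that μ* = e^M and σ* = e^S; the parameter values in the usage lines are hypothetical, not taken from Table 1. It also reports the fraction of the fitted distribution lying beyond τ.

```python
import math
from statistics import NormalDist

def threshold_year(m: float, s: float) -> float:
    """tau = mu* x (sigma*)^2, with mu* = exp(m) (median) and sigma* = exp(s).

    Equivalent to exp(m + 2 s): two log-scale standard deviations above the median.
    """
    median = math.exp(m)      # mu*
    mult_sd = math.exp(s)     # sigma*
    return median * mult_sd ** 2

def fraction_beyond(t: float, m: float, s: float) -> float:
    """Share of the lognormal distribution of cancer-death times lying after time t."""
    return 1.0 - NormalDist(m, s).cdf(math.log(t))

# Hypothetical parameters: median 2 years, multiplicative SD 1.8.
m, s = math.log(2.0), math.log(1.8)
tau = threshold_year(m, s)    # 2 * 1.8**2 = 6.48 years
print(round(tau, 2), round(fraction_beyond(tau, m, s), 4))
```

Whatever the parameters, the fraction beyond τ is the upper normal tail beyond two standard deviations, about 2.3%.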
We analysed the 1973–1999 database of the Surveillance, Epidemiology, and End Results (SEER) Program of the United States National Cancer Institute. Data from the registries of Connecticut and Detroit were used in this study, and the results were compared with those from the SEER-9 registries. The cancer sites were chosen according to the SEER codes. Information on primary site, vital status, cause of death, and survival time was used. Patients with an unknown or zero survival time, or an unknown cause of death, were excluded.
To test the lognormality of the survival times of cancer patients who died of the specific cancer, patients diagnosed from 1973–1992 in the registries of Connecticut and Detroit were chosen, so that a maximum of 27 years was allowed for follow-up to 1999. A total of 49 specific cancer sites were tested. For prostate, salivary gland, breast, and thyroid cancer, only patients diagnosed from 1973–1977 were chosen, to allow sufficiently long follow-up to 1999. For cancer sites with high proportions of older patients, such as lung and bronchus, colon, prostate, and breast, only patients younger than 60 years were included, because elderly patients are more likely to die of intercurrent diseases, which would distort the distribution of cancer-specific deaths relative to other deaths.
As an illustration for special cohorts of interest, breast cancer was further analyzed by accepted prognostic factors, namely stage, histologic type and tumor grade, to show that threshold years differ between cohorts.
The survival times of the cancer patients who died of their disease were tested for goodness of fit to a lognormal distribution using a minimum chi-square method. The class intervals of survival time were based on powers of 2 months: 0–2, >2–4, >4–8, >8–16 months, and so on. The values of M, the mean of log(survival time), and S, the standard deviation of log(survival time), were varied until the chi-square value reached its minimum, and these minimizing values were taken as the estimates of M and S. The null hypothesis, that there is no difference between the observed data distribution and the lognormal distribution, was rejected if P < 0.05.
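The selection rules above can be expressed as a simple record filter. The record fields (`registry`, `site`, `year_dx`, `age`, `survival_months`, `cause_known`) and the site labels are hypothetical names for illustration, not actual SEER variable names; only the registries, the diagnosis-year windows, the under-60 age restriction and the exclusions follow the text.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical site labels; the study used SEER site codes.
LONG_FOLLOW_UP_SITES = {"prostate", "salivary gland", "breast", "thyroid"}
AGE_RESTRICTED_SITES = {"lung and bronchus", "colon", "prostate", "breast"}
REGISTRIES = {"Connecticut", "Detroit"}

@dataclass
class Record:
    # Hypothetical fields, not SEER variable names.
    registry: str
    site: str
    year_dx: int                       # year of diagnosis
    age: int                           # age at diagnosis, years
    survival_months: Optional[float]   # None if unknown
    cause_known: bool                  # False if the patient died of an unknown cause

def eligible(r: Record) -> bool:
    """Apply the cohort rules described in the text to one patient record."""
    if r.registry not in REGISTRIES:
        return False
    if r.survival_months is None or r.survival_months == 0 or not r.cause_known:
        return False  # unknown or zero survival time, or unknown cause of death
    last_year = 1977 if r.site in LONG_FOLLOW_UP_SITES else 1992
    if not 1973 <= r.year_dx <= last_year:
        return False
    if r.site in AGE_RESTRICTED_SITES and r.age >= 60:
        return False  # elderly patients more often die of intercurrent disease
    return True
```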
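A minimal sketch of a minimum chi-square fit, assuming natural logarithms of survival time in months, the powers-of-2 class intervals described above (with the last interval treated as open-ended), and a plain grid search over M and S. The grid ranges and the synthetic data are illustrative choices, not the study's settings. The P-value step, which would typically use a chi-square distribution with k − 1 − 2 degrees of freedom for k classes because two parameters are fitted, is omitted to stay within the standard library.

```python
import math
import random
from statistics import NormalDist

def class_edges(max_months: float) -> list:
    """Class boundaries 0, 2, 4, 8, 16, ... months, extended past the longest time."""
    edges = [0.0, 2.0]
    while edges[-1] < max_months:
        edges.append(edges[-1] * 2)
    return edges

def observed_counts(times, edges):
    """Count deaths in each interval (edges[i], edges[i+1]]."""
    counts = [0] * (len(edges) - 1)
    for t in times:
        for i in range(len(counts)):
            if edges[i] < t <= edges[i + 1]:
                counts[i] += 1
                break
    return counts

def chi_square(counts, edges, m, s):
    """Pearson chi-square of observed counts against a lognormal fit with parameters M=m, S=s."""
    dist, n = NormalDist(m, s), sum(counts)
    stat = 0.0
    for i, obs in enumerate(counts):
        lo = dist.cdf(math.log(edges[i])) if edges[i] > 0 else 0.0
        # The last class is open-ended, so it receives the whole upper tail.
        hi = 1.0 if i == len(counts) - 1 else dist.cdf(math.log(edges[i + 1]))
        expected = max(n * (hi - lo), 1e-12)
        stat += (obs - expected) ** 2 / expected
    return stat

def fit_min_chi_square(times, m_grid, s_grid):
    """Return (chi_square, M, S) at the grid point with the smallest chi-square."""
    edges = class_edges(max(times))
    counts = observed_counts(times, edges)
    return min((chi_square(counts, edges, m, s), m, s) for m in m_grid for s in s_grid)

# Synthetic check: lognormal times with M = 3.0 and S = 1.0 (median about 20 months).
rng = random.Random(1)
times = [rng.lognormvariate(3.0, 1.0) for _ in range(2000)]
m_grid = [2.0 + 0.05 * i for i in range(41)]
s_grid = [0.5 + 0.05 * i for i in range(21)]
stat, m_hat, s_hat = fit_min_chi_square(times, m_grid, s_grid)
```

A grid search keeps the sketch transparent; a numerical optimizer could replace it without changing the criterion being minimized.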
Let τ be the threshold year at which the statistical cure rate can be estimated. The cancer-specific survival rate at τ can then be obtained by the Kaplan-Meier method with follow-up time equal to τ, and this rate was taken as the estimate of the statistical cure rate of the disease. It was compared with the long-term cancer-specific survival rate calculated by the Kaplan-Meier method using the actual long-term data available up to 1999 (Figures 1 and 2).
The present study verified that, for 42 of the 49 cancer sites in the SEER database, the survival times of cancer patients who died of their disease approximately followed lognormal distributions with different parameters. For these cancer sites, the number of patients and the values of S (multiplicative standard deviation), M (median), and P at minimum chi-square are listed in Table 1. All the P-values in Table 1 are above 0.10.
The seven sites that failed the test for lognormality were lip, oropharynx, rectosigmoid junction, rectum, testis, urinary bladder, and kidney & renal pelvis.
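Reading a Kaplan-Meier estimate at τ means evaluating a right-continuous step function. The sketch assumes the curve is given as sorted (time, survival) steps, as a Kaplan-Meier routine would return; the curve values and τ below are invented for illustration, and the six-percentage-point tolerance mirrors the agreement reported in the Results.

```python
from bisect import bisect_right

def survival_at(curve, t):
    """Value of a Kaplan-Meier step curve at time t.

    curve: (time, survival) pairs sorted by time; survival is 1.0 before the first step.
    """
    times = [u for u, _ in curve]
    i = bisect_right(times, t)   # number of steps at or before t
    return 1.0 if i == 0 else curve[i - 1][1]

def agrees(curve, tau, long_term_time, tolerance_points=6.0):
    """Compare the survival rate at tau with the latest available estimate (in percent)."""
    at_tau = 100 * survival_at(curve, tau)
    long_term = 100 * survival_at(curve, long_term_time)
    return abs(at_tau - long_term) <= tolerance_points, at_tau, long_term

# Hypothetical cancer-specific curve (years, survival fraction).
curve = [(0.5, 0.80), (1.0, 0.62), (2.0, 0.48), (3.0, 0.44), (6.0, 0.42), (15.0, 0.41)]
ok, at_tau, long_term = agrees(curve, tau=4.0, long_term_time=25.0)
```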
Estimation of statistical cure rate
The threshold years validated for statistical cure ranged from 2.6 years for pancreatic cancer to 25.2 years for cancer of the salivary gland. For the 42 cancer sites whose survival times approximately followed a lognormal distribution, the cancer-specific survival rates at the threshold year were obtained by the Kaplan-Meier method where follow-up allowed, and compared with the corresponding long-term survival rates with follow-up to 1999. For 40 of these sites, the statistical cure rates at the threshold year matched the actuarial long-term survival rates estimated by the Kaplan-Meier method within six percentage points. For the two remaining sites, breast and thyroid, the threshold years were so long that the cancer-specific survival rates could not be obtained, because the SEER data up to 1999 do not provide sufficiently long follow-up. Table 2 compares the cancer-specific survival rates at τ years and at long-term follow-up.
Breast cancer was further analyzed according to accepted prognostic factors. The threshold years for the SEER stages localized (age < 50), regional (age < 50) and distant were 30.8, 27.9, and 13.1 years, respectively. For the histologic types medullary, ductal combined with adenocarcinoma not otherwise specified (NOS), and lobular, the threshold years were 17.0, 30.6, and 46.0 years, respectively. The threshold years for breast tumor Grades I, II, and III+IV were 39.8, 26.3, and 20.7 years, respectively. The breast cancer-specific survival rates at their threshold years were 9% for distant stage, 68% for medullary histology, and 40% for Grade III+IV breast cancer; each was only one percentage point higher than the corresponding available long-term actuarial survival rate. For regional stage and Grade II breast cancer, the threshold years were 27.9 and 26.3 years, respectively. A few more years of follow-up are therefore needed to see the plateau, and their statistical cure rates are predicted to be close to 39% and 47%, respectively. The most recent SEER database now extends to the end of 2001; with 2 additional years of follow-up beyond the end of 1999, the cancer-specific survival rates were 39% and 44%, respectively. For cohorts with τ values longer than 27 years, the cancer-specific survival rates could not be obtained, because the SEER data to date do not provide sufficiently long follow-up.
Cancer-specific survival does not include deaths due to other causes, and cancer-specific rates also depend on the reliability with which the cause of death is assigned. Generally, cancer-specific death rates underestimate the mortality associated with a diagnosis of the specific cancer, because deaths from other causes are not counted. SEER is a set of geographically defined, population-based central cancer registries in the United States, operated by local non-profit organizations under contract to the National Cancer Institute (NCI). Registry data are submitted electronically to the NCI on a biannual basis, and the NCI makes the data available for analysis. The SEER Program is considered the standard for quality among cancer registries around the world. Quality control has been an integral part of SEER since its inception: every year, studies are conducted in SEER areas to evaluate the quality and completeness of the data being reported.
Gamel and Vogel have compared the lognormal distribution with other distributions, such as the Weibull and log-logit, and described its advantages.
The 1973–1992 data were used so that the data would not be outdated. Under a lognormal distribution of the survival times of patients who died of the specific cancer, only a very small proportion die in the tail of follow-up (Figure 1), so the short follow-up of the latest cohorts (only 7 years to 1999 for patients diagnosed in 1992) would not change the fitted lognormal distribution much. For cancer sites with threshold years longer than 24 years, the 1973–1977 data were used to allow sufficiently long follow-up to 1999.
For 42 sites in the SEER database, the survival times of cancer patients who died of their disease followed lognormal distributions with different parameters. For 40 cancer sites, the ultimate cure rate can be roughly estimated from the cancer-specific survival rate at τ years. These τ values are the minimum numbers of follow-up years required to estimate the cure rates, and they differ between cancer sites. For pancreatic cancer, with its typically short natural history, the cure rate can be estimated after only 2.6 years. Cancers with a longer natural history require longer follow-up, for example 36.2 years for breast cancer. These long periods refer to cancer-specific survival times; in reality, patients may die from intercurrent non-cancer causes before τ years. For thyroid cancer, the estimated threshold year was 134.1 years. It seems that for some slowly proliferating cancer types, the cure rate can never be estimated within the limit of a human lifetime.
We also found that the required minimum number of years of follow-up, τ, is nearly independent of the cure rate (coefficient of determination, R² = 0.10). Even for cancer sites with cure rates > 50%, the required follow-up time τ could be less than 10 years; conversely, for other cancer sites, the cure rates could be < 50% while the required follow-up times were > 10 years. This shows that 5- or 10-year survival rates are inadequate to reflect statistical cure rates.
If more patients die of other causes than of the specific cancer, the cause-specific survival time distribution will not be lognormal. Hence, not all cancer-specific survival time distributions are expected to follow lognormal distributions.
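For readers reproducing the comparison of τ with cure rate, the coefficient of determination of a simple least-squares line is the squared Pearson correlation, so R² = 0.10 means that about 10% of the variation is explained by a linear relationship. The sketch uses only the standard library; the inputs in the checks are placeholders, not the study's Table 2 values.

```python
from statistics import fmean

def r_squared(x, y):
    """Coefficient of determination for a least-squares line y ~ a + b*x (= Pearson r squared)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when x or y has zero variance")
    return sxy * sxy / (sxx * syy)
```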
Because the logarithm of survival time is normally distributed and τ lies two standard deviations above its mean on the log scale, the interval from 0 to τ years covers about 97.7% of the lognormally distributed survival times of cancer patients who died of their specific cancer. The cancer-specific survival rates estimated at τ years generally overestimate the long-term cure rates estimated by the Kaplan-Meier method slightly, but the differences are reasonably small, less than six percentage points, as verified empirically. Patients still need to be followed to τ years to know the actual value of the estimated cure rate.
For both rapidly and slowly proliferating cancers, we have shown that the statistical cure rates can be estimated before a stable plateau is reached in the Kaplan-Meier survival curve. It may take decades to see a stable plateau, and during this time many patients may be lost to follow-up or die of intercurrent diseases.
Gamel and Vogel used cause-specific survival and relative survival to determine actuarial survival in breast cancer patients from the SEER database, and found only minimal deviation between the two methods.
Ries et al. reported relative survival rates (RSR) of up to 20 years from the 9 registries of the SEER database. For cancers of the pancreas, esophagus and stomach, the RSR decreased only slightly beyond five years after diagnosis. This is consistent with the threshold years in the present study of 2.6 years for pancreas, 3.9 years for esophagus, and 5.8 years for stomach. Dickman et al. reported similar 10-year RSR results for the Finnish Cancer Registry, and Talback et al. showed similar results in two RSR graphs up to 30 years for pancreas and stomach for the Swedish Cancer Registry.
The threshold year of statistical cure for ovarian cancer was 10.4 years. This is consistent with the results of Ries et al. and Dickman et al. up to 10 years; after 10 years, Ries et al. reported only a slight decrease in RSR, and Talback et al. showed the same pattern in an RSR graph for ovarian cancer. Their results were also consistent with the present study for lung cancer, colon cancer and skin melanoma, which have threshold years of 9.0, 12.2 and 18.2 years, respectively. These consistencies show that the results obtained from two SEER registries in the present study are similar to those from 9 SEER registries and from the Finnish and Swedish Cancer Registries.
For prostate cancer, the threshold year for statistical cure was 24.6 years. In the article of Talback et al., the RSR curve began to level off about 24 years after diagnosis.
For breast cancer, the threshold year was 36.2 years; in the article of Talback et al., the RSR had not leveled off even 30 years after diagnosis. Leveling of the RSR in breast cancer was also not seen in two separate studies by Schairer et al. and Brenner and Hakulinen. Kerr et al. reported that the ratio of observed to expected mortality remained significantly greater than unity for at least 25 years after diagnosis and treatment, indicating a failure to demonstrate statistical cure of the disease over a median follow-up of 32 years.
The present study suggests that, for each cancer site, a certain threshold year must be reached before the statistical cure rate can be estimated. Although the commonly used 5- or 10-year survival rates may adequately reflect statistical cure rates for cancers with a short natural history, such as pancreatic cancer, this is not the case for many other cancers. This highlights the need for continued long-term cancer surveillance, especially for cancers with long natural histories such as thyroid cancer and early-stage, well-differentiated breast cancer. This study is relevant for public health and cancer control. Whether knowledge of the threshold year will affect decisions regarding therapy for cancers thought to have a good prognosis remains to be investigated.
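The coverage figure follows directly from the threshold-year definition. Writing M and S for the mean and standard deviation of the natural logarithm of survival time T, so that μ* = e^M and σ* = e^S, and Φ for the standard normal distribution function:

```latex
\tau = \mu^{*}(\sigma^{*})^{2} = e^{M}\,e^{2S} = e^{M+2S},
\qquad
P(T \le \tau) = P\!\left(\frac{\ln T - M}{S} \le \frac{(M + 2S) - M}{S}\right) = \Phi(2) \approx 0.97725 .
```

Thus about 2.3% of the fitted cancer-death times fall after τ, whatever the values of M and S.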
Feinleib M, McMahon B: Variation in the duration of survival of patients with the chronic leukemias. Blood. 1960, 15: 332-349.
Kondo K: The log-normal distribution of the incubation time of exogenous diseases. Jpn J Hum Genet. 1977, 21: 217-237.
Magurran AE: Ecological diversity and its measurement. 1988, London, Croom Helm
Aitchison J, Brown JAC: The Log-normal distribution. 1957, Cambridge, UK, Cambridge University Press
Crow EL, Shimizu K: Log-normal distributions: Theory and application. 1988, New York, NY, Dekker
Lee ET: Statistical Methods for Survival Data Analysis. 1992, New York, NY, Wiley
Johnson NL, Kotz S, Balakrishnan N: Continuous univariate distributions. 1994, New York, NY, Wiley
Mould RF, Hearnden T, Palmer M, White GC: Distribution of survival times of 12,000 head and neck cancer patients who died with their disease. Br J Cancer. 1976, 34: 180-190.
Boag JW: The presentation and analysis of the results of radiotherapy. Part I Introduction. Br J Radiol. 1948, 21: 128-138.
Boag JW: The presentation and analysis of the results of radiotherapy. Part II. Mathematical theory. Br J Radiol. 1948, 21: 189-203.
Boag JW: Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J R Stat Soc [Ser B]. 1949, 11: 15-53.
Berg JW: The distribution of cancer deaths in time: A survey test of the lognormal model. Br J Cancer. 1965, 19: 695-711.
Wood CAP, Boag JW: Researches on the radiotherapy of oral cancer. Medical Research Council Special Report Series No. 267, London: HMSO. 1950
Tivey H: The prognosis for survival in chronic granulocytic and lymphocytic leukemia. Am J Roentgenol. 1954, 72: 68-93.
Haybittle JL: The estimation of the proportion of patients cured after treatment for cancer of the breast. Br J Radiol. 1959, 32: 725-733.
Mould RF, Lederman M, Tai P, Wong JKM: Methodology to predict long-term survival from short-term data using Tobacco Cancer Risk and Absolute Cancer Cure models. Phys Med Biol. 2002, 47: 3893-3924. 10.1088/0031-9155/47/22/301.
Tai P, Yu E, Vinh-Hung V, Cserni G, Vlastos G: Survival of patients with metastatic breast cancer: twenty-year data from two SEER registries. BMC Cancer. 2004, 4: 60. 10.1186/1471-2407-4-60.
Tai P, Tonita J, Yu E, Skarsgard D: A 20-year follow-up study of the long-term survival of limited stage small cell lung cancer and an overview of prognostic and treatment factors. Int J Radiat Oncol Biol Phys. 2003, 56: 626-633. 10.1016/S0360-3016(03)00070-1.
Gamel JW, George SL, Edwards MJ, Seigler HF: The long-term clinical course of patients with cutaneous melanoma. Cancer. 2002, 95: 1286-1293. 10.1002/cncr.10813.
Gamel JW, McLean IW, McCurdy JB: Biologic distinctions between cure and time to death in 2892 patients with intraocular melanoma. Cancer. 1993, 71: 2299-2305.
Mould RF, Boag JW: A test of several parametric statistical models for estimating success rate in the treatment of carcinoma cervix uteri. Br J Cancer. 1975, 32: 529-550.
Cutler SJ, Axtell LM: Adjustments of long-term survival rates for deaths due to intercurrent disease. J Chronic Dis. 1969, 22: 485-491. 10.1016/0021-9681(69)90010-1.
Kaplan EL, Meier P: Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958, 53: 457-481.
Limpert E, Stahel WA, Abbt M: Log-normal distributions across the sciences: keys and clues. BioScience. 2001, 51: 341-352.
Surveillance, Epidemiology, and End Results (SEER) Program public-use CD-ROM (1973–1999). 2002, Bethesda, MD, National Cancer Institute (software, released April 2002)
Brown BW, Brauner C, Minnotte MC: Noncancer deaths in white adult cancer patients. J Natl Cancer Inst. 1993, 85: 979-987.
Gamel JW, Vogel RL: Comparison of parametric and non-parametric survival methods using simulated clinical data. Stat Med. 1997, 16: 1629-1643. 10.1002/(SICI)1097-0258(19970730)16:14<1629::AID-SIM594>3.0.CO;2-C.
Gamel JW, Vogel RL: Non-parametric comparison of relative versus cause-specific survival in Surveillance, Epidemiology and End Results (SEER) programme breast cancer patients. Stat Methods Med Res. 2001, 10: 339-352. 10.1191/096228001680678331.
Ries LAG, Eisner MP, Kosary CL, Hankey BF, Miller BA, Clegg L, Mariotto A, Feuer EJ, Edwards BK, (eds): SEER Cancer Statistics Review, 1975–2001, National Cancer Institute. 2004, Bethesda, MD, [http://seer.cancer.gov/csr/1975_2001/]
Dickman PW, Hakulinen T, Luostarinen T, Pukkala E, Sankila R, Soderman B, Teppo L: Survival of cancer patients in Finland 1955–1994. Acta Oncol. 1999, 38 (Suppl 12): 1-103.
Talback M, Stenbeck M, Rosen M, Barlow L, Glimelius B: Cancer survival in Sweden 1960–1998 – developments across four decades. Acta Oncol. 2003, 42: 637-659. 10.1080/02841860310013391.
Schairer C, Mink PJ, Carroll L, Devesa SS: Probabilities of death from breast cancer and other causes among female breast cancer patients. J Natl Cancer Inst. 2004, 96: 1311-1321.
Brenner H, Hakulinen T: Are patients diagnosed with breast cancer before age 50 years ever cured?. J Clin Oncol. 2004, 22: 432-438. 10.1200/JCO.2004.04.067.
Kerr GR, Kunkler IH, Langlands AO, Rodgert A: (In)curability of breast cancer: a 30-year report of a series of 3933 cases. The Breast. 1998, 7: 90-94. 10.1016/S0960-9776(98)90063-7.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2407/5/48/prepub
PT: Saskatchewan Cancer Agency Research Grant Award 2792.
GCs: János Bolyai Research Fellowship from the Hungarian Academy of Sciences.
The author(s) declare that they have no competing interests.
PT: Data analysis and writing of the manuscript.
EY, GCs, GV, MR, IK, VVH: Critical appraisal of the manuscript.
All authors read and approved the final manuscript.
Cite this article
Tai, P., Yu, E., Cserni, G. et al. Minimum follow-up time required for the estimation of statistical cure of cancer patients: verification using data from 42 cancer sites in the SEER database. BMC Cancer 5, 48 (2005). https://doi.org/10.1186/1471-2407-5-48
- Lognormal Distribution
- Cancer Site
- Specific Cancer
- Relative Survival Rate
- Swedish Cancer Registry