  • Research article
  • Open access

Global and regional estimates of cancer mortality and incidence by site: I. Application of regional cancer survival model to estimate cancer mortality distribution by site

Abstract

Background

The Global Burden of Disease 2000 (GBD 2000) study starts from an analysis of the overall mortality envelope in order to ensure that the cause-specific estimates add to the total all cause mortality by age and sex. For regions where information on the distribution of cancer deaths is not available, a site-specific survival model was developed to estimate the distribution of cancer deaths by site.

Methods

An age-period-cohort model of cancer survival was developed based on data from the Surveillance, Epidemiology, and End Results (SEER). The model was further adjusted for the level of economic development in each region. Combined with the available incidence data, cancer death distributions were estimated and the model estimates were validated against vital registration data from regions other than the United States.

Results

Comparison with cancer mortality distribution from vital registration confirmed the validity of this approach. The model also yielded the cancer mortality distribution which is consistent with the estimates based on regional cancer registries. There was a significant variation in relative interval survival across regions, in particular for cancers of bladder, breast, melanoma of the skin, prostate and haematological malignancies. Moderate variations were observed among cancers of colon, rectum, and uterus. Cancers with very poor prognosis such as liver, lung, and pancreas cancers showed very small variations across the regions.

Conclusions

The survival model presented here offers a new approach to the calculation of the distribution of deaths for areas where mortality data are either scarce or unavailable.


Background

As a part of the Global Burden of Disease 2000 (GBD 2000) project, the present study aims at estimating the total global and regional cancer mortality and incidence based on its detailed analysis of all-cause levels and cause of death distributions for 191 Member States of the World Health Organization (WHO) [1]. GBD 2000 requires age- and sex-specific incidence, duration and mortality as a minimum input to estimate the burden of each disease sequela by a composite measure of mortality and morbidity (i.e., disability-adjusted life years: DALYs).

Attempts have been made to quantify the global burden of cancer, and estimate site-specific cancer mortality and morbidity [2–6]. Such studies have repeatedly suggested that incidence and mortality from cancer are continuously increasing in many parts of the world. Despite an increasing trend of cancer incidence and mortality, data on survival and prognosis of incident cases from population-based cancer registries are limited in the majority of developing countries. One of the most credible sources of information is the International Agency for Research on Cancer (IARC), which has been coordinating and implementing the cancer registries in such regions [7].

While vital registration of causes of death and national cancer registries are perhaps the best source of data on cancer mortality, mortality data are still scarce, poor or even unavailable for some regions of the world. Innovative methods will thus continue to be needed to exploit available data. Estimating mortality from morbidity and, especially, morbidity from mortality was a common practice in the 1970s and 1980s [8, 9]. More recently, the continuous effort made by IARC has led to the Globocan 2000 estimates, which also used information on incidence and survival from various sources, including cancer registries, to estimate cancer deaths for the year 2000 [2, 6]. Still others have made use of vital statistics and cancer incidence data to predict the number of new cancer cases and deaths for the US in the subsequent year [10].

On the basis of available published information on age-, sex-, and site-specific cancer incidence and survival, we developed an algorithm to estimate region-specific overall cancer mortality, and site-specific survival, death distributions and incidence for the year 2000. This paper presents the first of the two consecutive reports which present the detailed methods and results of GBD 2000 estimates for mortality and incidence of cancer by site.

The particular feature of the GBD 2000 study is that the number of deaths by age and sex in each region provides an essential envelope which constrains individual disease and injury estimates of deaths, and that competing claims for the magnitude of deaths from various causes must be reconciled within this envelope [1]. Given the regional cancer mortality envelope by age and sex, the estimates of site-specific distributions of cancer mortality are necessary to disaggregate the estimated total cancer deaths by age and sex for each region.

For geographic disaggregation of the GBD 2000, the six WHO regions of the world have been further divided into 14 sub regions, based on levels of child (under five years) and adult (15–59 years) mortality for WHO Member States [1]. Five mortality strata were defined in terms of quintiles of the distribution of child and adult mortality (both sexes combined). Adult mortality was regressed on child mortality and the regression line used to divide countries with high child mortality into high adult mortality (stratum D) and very high adult mortality (stratum E). Stratum E includes the countries in sub-Saharan Africa where HIV/AIDS has had a very substantial impact.

When these mortality strata are applied to the six WHO regions, they produce 14 mortality subregions. For the purposes of burden of disease epidemiological analyses, 2 of these regions were further subdivided: EurB into EurB1 and EurB2 – the latter including the central Asian states; and WprB into WprB1 (mainly China), WprB2 (South East Asian countries) and WprB3 (Pacific Islands). Additionally, some Member States have been reclassified into subregions with similar epidemiological/geographic/ethnic patterns in order to maximise the epidemiological homogeneity of the subregions for the purposes of epidemiological analysis. The resulting 17 epidemiological subregions are listed in Table 1.

Table 1 Global Burden of Disease 2000 (GBD 2000) project: regions and sub regions

The approaches to estimating mortality distributions differed depending on the availability and quality of data on detailed causes of death. Direct estimates of the site-specific distributions of cancer mortality were possible for the regions where established vital registration records with high coverage and coding practice based on the International Statistical Classification of Diseases and Related Health Problems (ICD) are available, including countries in the A sub regions (AmrA, EurA and WprA) and countries in AmrB, EurB1, EurB2 and EurC [1]. For the other regions of the world (AfrD, AfrE, AmrD, EmrB, EmrD, SearB, SearD, WprB1, WprB2 and WprB3), we developed a site-specific model for relative interval survival adjusted for each region and applied it to the estimated regional incidence to calculate the mortality distribution by site for the year 2000. This model can also be used to estimate survival at different ages and average duration of cancer by site. In this paper, we present the detailed model as a key input to estimating the distribution of cancer deaths by site for the regions where few data are available.

Material and Methods

Data sources

Relative interval survival based on the US data

The primary data source used to develop the cancer survival model was the National Cancer Institute Surveillance, Epidemiology, and End Results (SEER) statistical program (SEER*Stat version 4.2). The SEER program is considered the standard for quality among cancer registries around the world, being the most authoritative source of information on cancer incidence and survival in the United States [11]. Relative interval survival (RIS) was directly obtained from the SEER database (1973–1997 Public-Use Data) within SEER*Stat for every age group, sex, and cancer site. Cancer sites for which survival was calculated were: mouth and pharynx (ICD-10 C00-C14), oesophagus (C15), stomach (C16), colon and rectum (C18-C21), liver (C22), pancreas (C25), trachea, bronchus and lung (C33-C34), melanoma of the skin (C43), female breast (C50), cervix uteri (C53), corpus uteri (C54-55), ovary (C56), prostate (C61), bladder (C67), lymphomas and multiple myeloma (C81-C90, C96), leukaemia (C91-C95), and other malignant neoplasms (balance of ICD-10 C00-C97).

Incidence data

We initially used the Globocan 2000 estimates of the International Agency for Research on Cancer (IARC) to apply the survival model for a region [6], assuming incidence rates to be constant over the years. We then estimated the region-specific number of new cases for 1986 to 2000 by applying these age-specific incidence rates to the annual population. We carefully examined the methods used to estimate country-specific incidence data in Globocan 2000, to ensure that for all the regions where we required incidence estimates, the Globocan estimates were based on cancer registry incidence data, and not modelled from mortality data using assumptions about survival (which would then result in circularity in our mortality estimation process for regions without good mortality data by cancer site).

Globocan 2000 estimates of cancer incidence by site for countries differ from those required for the GBD 2000 in two major respects: 1) Globocan 2000 estimates include Kaposi's sarcoma and non-Hodgkin lymphomas (NHL) caused by HIV/AIDS. The GBD 2000 includes these cases among AIDS sequelae and their burden is included with the HIV/AIDS burden [12–14]; and 2) Globocan 2000 estimates include cancers of unknown primary with cancers of other specified sites. The GBD 2000 attributes these ill-defined cancers back to specific sites as described below. Accordingly, Globocan 2000 incidence estimates by age, sex, site and country were adjusted for these differences. Firstly, unpublished data on the incidence of Kaposi's sarcoma for countries in Africa were provided by IARC and used to adjust incidence of other cancers to remove incidence of Kaposi's sarcoma. Secondly, relative risks of NHL from HIV [15–18] were estimated and, together with the UNAIDS prevalence estimates of HIV in each country of the African region, NHL attributable to HIV was also removed. Thirdly, incidence estimates for cancers of unknown primary site were redistributed among specific sites using the GBD 2000 algorithm [1]. The proportion of the others category (balance of all but skin cancers) in the Globocan 2000 corresponding to unknown primary sites was estimated from published data on the distribution of cancer incidence by site which included unknown primary as a specific category [19–27].

After adjusting the Globocan 2000 incidence estimates for each country as described above, these estimates were summed for the countries in each GBD 2000 region, resulting in estimated incidence distributions by site, age and sex for each region. Finally, the GBD 2000 uses the latest population estimates for the Member States of the World Health Organization prepared by the United Nations Population Division [28]. In order to obtain incidence from 1986 to 2000, we estimated the age-specific population by sex for each of these years, using growth rates also taken from the United Nations data.
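The case-count step described here (constant age-specific incidence rates applied to yearly populations for 1986 to 2000) can be sketched as follows. This is only an illustration: the function names, the use of a single constant compound growth rate to back-project the 2000 population, and all numbers are assumptions, not values or code from the study.

```python
def population_in_year(pop_2000, annual_growth, year):
    """Back-project the 2000 population to an earlier year.

    Assumption: one constant compound annual growth rate per age-sex group.
    """
    return pop_2000 / (1.0 + annual_growth) ** (2000 - year)


def new_cases_by_year(rate_per_100k, pop_2000, annual_growth):
    """Apply a constant age-specific incidence rate to each year's population."""
    return {
        year: rate_per_100k / 1e5 * population_in_year(pop_2000, annual_growth, year)
        for year in range(1986, 2001)
    }


# Hypothetical inputs, for illustration only.
cases = new_cases_by_year(rate_per_100k=50.0, pop_2000=2_000_000, annual_growth=0.02)
print(len(cases), round(cases[2000]), round(cases[1986]))
```

With a positive growth rate, the estimated number of new cases is smaller in earlier years even though the incidence rate itself is held constant.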

Multiplicative model of relative interval survival

In order to estimate cancer death distribution for regions where little cancer mortality data is available (AfrD, AfrE, AmrD, EmrB, EmrD, SearB, SearD, WprB1, WprB2 and WprB3 sub regions), we developed an age-period-cohort multiplicative model for the relative interval survival (RIS) for each site. To incorporate all three time dimensions, we have taken into account the relative survival for every 5-year age group from 0 up to 85+ years of age, and for calendar year for 15 years (1981 to 1995), and for time since cancer diagnosis (survival time for cohorts) from 1- up to 15-year survival. After obtaining the time-specific survival data, we have then further indexed all the age, time, and calendar year survival information to the first year interval survival for each sex, and cancer site. The first year of survival was chosen because, for most if not all cancer sites, it is the most critical year concerning cancer survival experience. After the first year of survival, the relative survival curve usually increases and then flattens smoothly. Indexing was done by dividing each of the time-specific RIS by the survival at 1-year interval.

The specification of relative interval survival (RIS α,t,τ) for age α, calendar year t across the interval (τ - 1) since diagnosis in years, separately for each cancer site, was of the form:

$$\mathrm{RIS}_{\alpha,t,\tau} = 1 - (1 - \mathrm{RIS}_{1})\, A_{\alpha}\, T_{t}\, Y_{\tau} \qquad (1)$$

where RIS 1 is the relative interval survival after 1 year for all ages, averaged across the calendar years 1973 to 1997; A α is the ratio of the relative probability of death after 1 year at age α to the relative probability of death after 1 year for all ages averaged across the calendar years 1973 to 1997; T t is the ratio of the relative probability of death after 1 year for all ages in calendar year t to the relative probability of death after 1 year for all ages from 1973 to 1997; and Y τ is the ratio of the relative probability of death after τ years for all ages to the relative probability of death after 1 year for all ages from 1973 to 1997.
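As a minimal sketch of equation (1), the function below combines a baseline first-year survival with the three indexed factors. The function and argument names are hypothetical, and the numerical inputs are made up for illustration; they are not entries from Tables 2 to 5.

```python
def relative_interval_survival(ris1, a_age, t_period, y_tau):
    """Equation (1): RIS = 1 - (1 - RIS_1) * A_alpha * T_t * Y_tau.

    ris1     -- relative interval survival after 1 year, all ages
    a_age    -- age factor A_alpha
    t_period -- period factor T_t (or a regional factor T_r)
    y_tau    -- time-since-diagnosis factor Y_tau
    Note: the sketch does not guard against a product that pushes the
    implied probability of death above 1.
    """
    return 1.0 - (1.0 - ris1) * a_age * t_period * y_tau


# Hypothetical factors: all equal to 1 reproduces the baseline survival.
print(relative_interval_survival(0.80, 1.0, 1.0, 1.0))
# Hypothetical factors for an older age group in a later interval.
print(relative_interval_survival(0.80, 1.2, 1.0, 0.5))
```

Because the factors multiply the probability of death rather than survival, a factor above 1 lowers survival and a factor below 1 raises it.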

To estimate survival for developing regions where little or no data are available, we replaced T t with the "equivalent" calendar year survival term T r in equation (1) for each region. T r is the ratio of the relative probability of death after 1 year for all ages in the relevant region to the relative probability of death after 1 year for all ages in the SEER data, averaged across the calendar years 1973 to 1997. In this way, we obtain a new calendar year survival term for the model.

Equivalent period survival terms were estimated for each region by examining the relationship between period survival terms and gross domestic product per capita (measured in international dollars adjusted for purchasing power parity) using the following data: 1) SEER survival data for the USA for the years 1973 to 1997 [11, 29]; 2) Connecticut survival data for the years 1950 and 1958 [30]; 3) survival data for the late 1980s from cancer registries in 5 developing countries [31]; 4) survival data for European countries [32]; 5) specific recent national estimates of cancer survival as published [24, 33]. Survivorship functions were estimated to derive regional relative survival from registry data by fitting a Weibull distribution function. To allow for a proportion who are cured and never die from the cancer, we modify the Weibull model as follows:

$$S(t) = k + (1 - k)\, \exp\!\left(-(\lambda t)^{\gamma}\right) \qquad (2)$$

where k is the proportion who never die from the cancer, λ is the location parameter (1/λ is the time at which 50% of those who will die have died) and γ is the shape parameter. The mean survival time for those who die is given by

$$\frac{1}{\lambda}\, \Gamma\!\left(1 + \frac{1}{\gamma}\right) \qquad (3)$$

where Γ denotes the gamma function. The analysis of survival data in developed regions suggested that the 10-year relative survival can be used as an estimate of the proportion who never die from the cancer. This is particularly useful when relative survival point estimates fluctuate significantly and plausible exact solutions could not be obtained, as in some developing regions. To assess the goodness-of-fit of the survival curve, we compared fitted 5-year survival with the observed survival and ensured good fits in all cases.
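The modified Weibull survivorship function of equation (2) and the mean survival time of those who die can be sketched as below. The mean uses the standard result for a Weibull distribution with survival exp(-(λt)^γ); the function names and parameter values are hypothetical, and the fitting procedure itself is not reproduced.

```python
import math


def cure_weibull_survival(t, k, lam, gamma):
    """Equation (2): S(t) = k + (1 - k) * exp(-(lam * t) ** gamma)."""
    return k + (1.0 - k) * math.exp(-((lam * t) ** gamma))


def mean_survival_of_those_who_die(lam, gamma):
    """Mean of the Weibull component: Gamma(1 + 1/gamma) / lam."""
    return math.gamma(1.0 + 1.0 / gamma) / lam


# Hypothetical parameters; k stands in for an observed 10-year relative survival.
k, lam, gamma = 0.40, 0.5, 1.0
for years in (0, 1, 5, 10):
    print(years, round(cure_weibull_survival(years, k, lam, gamma), 4))
print(mean_survival_of_those_who_die(lam, gamma))
```

As t grows, the fitted curve flattens towards k, which is why a long-term relative survival figure is a natural stand-in for the cured proportion.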

There are substantial variations in relative interval survival (all ages) among countries; these variations are even larger, and fluctuate substantially with age, when the age-sex specific survival estimates are examined. There is also a possibility of correlation among the observations within the same region. Thus we employed both linear and non-linear feasible generalised least squares (FGLS) by maximum likelihood estimation to accommodate heteroscedasticity and correlation among the observations, and chose the best-fitting model for each site [34]. We did not include region fixed-effects since the data are not available for all sub regions.

Model estimation of cancer death distribution

For the estimation of the number of deaths and cancer death distribution by site, we needed to estimate the number of individuals who survived up to 2000 by age and time of survival as well as their corresponding probability of death during this year. The number of surviving individuals at age α in 2000 was calculated by multiplying incidence at age α in year (2000 - τ) by the observed interval survival for τ years since diagnosis for individuals aged α in 2000 (OIS α,τ), and summing over τ. To estimate OIS α,τ, we first calculated the relative cumulative survival (RCS α,τ) for every single age and year of survival for 2000, by multiplying RIS α,τ over the years of survival. In a standard life table format, OIS α,τ is written in the form:

where l x is the number of individuals surviving at exact age x in the life table, h x = ln (l x+1/l x ), α is age and τ is time since diagnosis.

The number of individuals S α,τ who had survived up to 2000 was calculated by multiplying incidence and observed interval survival for the corresponding year of age and survival time:

$$S_{\alpha,\tau} = I_{\alpha-\tau,\,2000-\tau}\; \mathrm{OIS}_{\alpha,\tau} \qquad (5)$$

where I α,t is the incidence at age α in calendar year t. For example, the number of individuals who were 7 years of age (α = 7) in 2000, and who had survived cancer for 4 years (τ = 4) in 2000 was calculated by multiplying the incidence of cancer for the cohort of individuals who were 3 years of age (α - τ = 3) in 1996 ( = 2000 - τ) (year of diagnosis) by the OIS α,τ calculated for a 7 year old person who had survived 4 years since cancer diagnosis.
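The bookkeeping of equation (5) can be sketched with two lookup tables, mirroring the example of 7-year-olds diagnosed 4 years earlier. The dictionary layout, the function name, and the numbers are assumptions for illustration; the observed interval survival values are taken as given inputs rather than derived here.

```python
def survivors_in_2000(incidence, ois, age, max_tau=15):
    """Equation (5): S[age, tau] = I[age - tau, 2000 - tau] * OIS[age, tau].

    incidence -- {(age_at_diagnosis, calendar_year): new cases}
    ois       -- {(age_in_2000, tau): observed interval survival}
    Returns {tau: survivors} for every tau with data in both tables.
    """
    survivors = {}
    for tau in range(1, max_tau + 1):
        cohort = (age - tau, 2000 - tau)
        if cohort in incidence and (age, tau) in ois:
            survivors[tau] = incidence[cohort] * ois[(age, tau)]
    return survivors


# Hypothetical numbers: 120 cases diagnosed at age 3 in 1996, OIS of 0.55.
incidence = {(3, 1996): 120.0}
ois = {(7, 4): 0.55}
print(survivors_in_2000(incidence, ois, age=7))  # about 66 survivors at tau = 4
```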

The probability of dying in 2000 due to cancer hazard, for each single age and year of survival, was calculated as follows:

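$$P_{\alpha,\tau} = \left[1 - \exp\!\left(-\left(-\ln \mathrm{RIS}_{\alpha,\tau} + h_{\alpha}\right)\right)\right] \frac{-\ln \mathrm{RIS}_{\alpha,\tau}}{-\ln \mathrm{RIS}_{\alpha,\tau} + h_{\alpha}} \qquad (6)$$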
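Equation (6) has the shape of a competing-risks calculation: the probability of dying from any cause during the year, multiplied by the share of the total hazard that is due to cancer. The sketch below assumes that h α enters as a non-negative background hazard; the function name and the numerical inputs are hypothetical.

```python
import math


def prob_cancer_death(ris, h_background):
    """Equation (6): probability of dying from cancer within the year.

    ris          -- relative interval survival for the age and survival time
    h_background -- background (non-cancer) hazard; assumed non-negative here
    """
    cancer_hazard = -math.log(ris)
    total_hazard = cancer_hazard + h_background
    if total_hazard == 0.0:
        return 0.0  # no hazard at all, so no deaths
    p_any_death = 1.0 - math.exp(-total_hazard)
    return p_any_death * cancer_hazard / total_hazard


# Hypothetical inputs.
print(prob_cancer_death(0.90, 0.0))   # without competing mortality: 1 - RIS
print(prob_cancer_death(0.90, 0.05))  # slightly lower once other causes compete
```

Setting the background hazard to zero recovers 1 - RIS, which is a convenient sanity check on the formula.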

For each site, the number of deaths in 2000 among those individuals aged α years who had survived cancer for τ years was estimated by multiplying the number of survivors S α,τ by the relevant probability of dying in 2000 due to cancer hazard P α,τ. The total number of cancer deaths for the same site in 2000 at age α years is then estimated by summing over all survival times τ:

$$\sum_{\tau} S_{\alpha,\tau}\, P_{\alpha,\tau}$$

Based on these region-specific adjusted incidence estimates and survival levels, cancer deaths were calculated by equations (3)-(6) for each region by age group and sex to estimate the distribution, but not the magnitude, of cancer by site, sex, and age group.
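Because only the distribution of modelled deaths is used, the final step amounts to summing survivors times death probabilities for each site, normalising across sites, and applying the shares to the regional cancer mortality envelope. The sketch below uses hypothetical function names, site labels, and numbers.

```python
def deaths_at_age(survivors, p_death):
    """Sum S[tau] * P[tau] over all survival times tau for one site and age."""
    return sum(survivors[tau] * p_death[tau] for tau in survivors)


def site_distribution(deaths_by_site):
    """Convert modelled deaths by site into shares that sum to 1."""
    total = sum(deaths_by_site.values())
    return {site: deaths / total for site, deaths in deaths_by_site.items()}


def apply_envelope(distribution, envelope_deaths):
    """Split the envelope (total cancer deaths for an age-sex group) by site."""
    return {site: share * envelope_deaths for site, share in distribution.items()}


# Hypothetical modelled deaths for one age-sex group in one region.
modelled = {
    "stomach": deaths_at_age({1: 40.0, 2: 20.0}, {1: 0.5, 2: 0.5}),
    "liver": deaths_at_age({1: 60.0}, {1: 0.8}),
    "lung": deaths_at_age({1: 25.0}, {1: 0.8}),
}
shares = site_distribution(modelled)
print(apply_envelope(shares, envelope_deaths=1000.0))
```

The modelled magnitudes cancel out in the normalisation, so errors that scale all sites equally do not affect the final site-specific estimates.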

Validation of the model

We tested the validity and performance of the proposed survival model in three different ways. Firstly, we compared our estimated RIS α,t,τ for τ = 1 to 10 years for individuals diagnosed with cancer in 1986 with the SEER RIS α,t,τ for τ = 1 to 10 years for the same cohort of individuals. Secondly, we compared the model estimates of cancer mortality distribution with the observed distributions in the regions with good vital records (AmrB, EurA, EurB, EurC and WprA sub regions). The AmrA sub region was excluded from the validation since it includes the United States. Finally, we compared the cancer death distribution of our model with the Globocan 2000 estimates for the regions where no vital records are available (AfrD, AfrE, SearB and SearD sub regions) to assess whether our model estimates are comparable to the estimates extrapolated from the actual observed data from the registries. In all cases, non-parametric tests for trends and Pearson's correlation were performed to examine whether the model estimates and observed data are consistent with each other. All statistical analysis was performed with STATA 7.0 (STATA corporation, College Station, TX).
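The correlation check between modelled and observed site distributions can be illustrated with a plain implementation of Pearson's coefficient. The study's analysis was run in STATA; the Python function and the site shares below are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical shares of cancer deaths by site (model vs vital registration).
model_shares = [0.05, 0.12, 0.20, 0.08, 0.30, 0.25]
observed_shares = [0.06, 0.10, 0.22, 0.07, 0.28, 0.27]
print(round(pearson_r(model_shares, observed_shares), 3))
```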

Results

Parameter estimates

Relative interval survival (RIS) was directly obtained from the SEER database for every age group, sex, and cancer site. The probability of death in the first year (1 - RIS 1), which is most crucial for the survival of most cancer patients, is shown in Table 2. The probability of death varied significantly, from less than 5% for melanoma and cancers of the breast, uterus, and prostate to over 80% for pancreas and liver cancers.

Table 2 Estimated relative probability of death after 1 year (1 - RIS 1)

Relative interval survival (RIS) was further indexed to the three parameters in the multiplicative cancer survival model by dividing each of the time-specific probabilities of death (1 - RIS) by the probability of death at the 1-year interval (1 - RIS 1). Tables 3, 4 and 5 present the indexed estimates of the three parameters by site for every 5-year age group from 0 up to 85+ years of age, for calendar years from 1981 to 1995, and for time since cancer diagnosis from 1- up to 15-year survival, respectively. While there is considerable variation in the cohort parameters, which reflect the prognosis among patients since the time of diagnosis, both age and period parameters are generally consistent across different types of cancer.
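The indexing step can be sketched as a simple ratio of probabilities of death. The function name and the RIS values below are hypothetical, chosen only to show how a set of time-since-diagnosis factors would be derived.

```python
def index_to_first_year(ris_by_level, ris1):
    """Divide each probability of death (1 - RIS) by the first-year value (1 - RIS_1)."""
    baseline = 1.0 - ris1
    return {level: (1.0 - ris) / baseline for level, ris in ris_by_level.items()}


# Hypothetical all-ages RIS by years since diagnosis (tau).
ris_by_tau = {1: 0.80, 2: 0.90, 5: 0.96, 10: 0.99}
y_tau = index_to_first_year(ris_by_tau, ris1=0.80)
for tau, factor in y_tau.items():
    print(tau, round(factor, 3))
```

By construction the factor for the first year equals 1, and factors fall below 1 as the yearly risk of death declines with time since diagnosis.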

Table 3 Estimated age parameters (A α) by cancer site, age and sex
Table 4 Estimated time since diagnosis (cohort) parameters (Y τ) by cancer site, year from diagnosis and sex
Table 5 Estimated period parameters (T t ) for 1981–1995 by cancer site, calendar year, and sex

Based on the fitted data for each site and sex, and the estimated GDP per capita in international dollars for each region in 2000, T r factors were estimated for each site and sex for each GBD 2000 region. The results are presented in Table 6. An example is shown for breast cancer: knowing that GDP per capita in AfrD was $1,158 in 2000, this corresponded to an indexed calendar year-specific T t = 2.748. This was then the value used in the age-period-cohort survival model for breast cancer in the AfrD sub region. A similar process was applied to the other regions and for other cancer sites.
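To show how a regional period factor enters the model, the short calculation below applies the quoted breast cancer value for AfrD (2.748) to a first-year probability of death. The baseline of 0.03 is a placeholder rather than the Table 2 value, and the age and time-since-diagnosis factors are taken as 1 (all ages, first year).

```python
# Placeholder baseline: first-year probability of death (1 - RIS_1) in the SEER data.
# The real figure is in Table 2; 0.03 is assumed here purely for illustration.
baseline_p_death = 0.03

# Regional period factor for breast cancer in AfrD, as quoted in the text.
t_r_afrd = 2.748

regional_p_death = baseline_p_death * t_r_afrd
regional_ris1 = 1.0 - regional_p_death
print(f"{regional_p_death:.5f} {regional_ris1:.5f}")  # prints 0.08244 0.91756
```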

Table 6 Estimated regional period parameters (T r ) for 2000 by cancer site and sex

The period parameters (T r) for all the available survival data and fitted values from regression analysis were plotted against GDP per capita (international dollars) for each site and sex, as shown in Figures 1 to 6. The largest variation in survival was observed among cancers of breast, melanoma of the skin, and hematological malignancies such as lymphoma and leukemia. For the cancers of cervix and colon and rectum, both early detection and availability of treatment determine the survival, and the variation among regions was moderate. On the other hand, cancers with very poor prognosis such as liver, lung, and pancreas cancers showed very small variations across the regions regardless of the levels of national income.

Figure 1 Observed and fitted period factor by region (T r) versus GDP per capita (1)
Figure 2 Observed and fitted period factor by region (T r) versus GDP per capita (2)
Figure 3 Observed and fitted period factor by region (T r) versus GDP per capita (3)
Figure 4 Observed and fitted period factor by region (T r) versus GDP per capita (4)
Figure 5 Observed and fitted period factor by region (T r) versus GDP per capita (5)
Figure 6 Observed and fitted period factor by region (T r) versus GDP per capita (6)

Model performance and validation

In order to check the performance of the model, we graphically compared our estimated RIS α,t,τ for τ = 1 to 10 years for individuals diagnosed with cancer in 1986 with the SEER RIS α,t,τ for τ = 1 to 10 years for the same cohort of individuals. We show the results obtained for males and females 55–59 years old, and for every cancer site, in Figures 7, 8 and 9. From these figures, we can observe that the model predicts the relative interval survivals very well. For cancer sites with a greater number of cases, such as colon, lung, breast, corpus uteri, and prostate cancer, the model fits very well. For those with smaller numbers, such as cancers of liver and pancreas, the estimated RIS smooths the curves for the observed RIS, also showing a very good fit.

Figure 7 Comparison between predicted and observed relative interval survival for 55–59 year olds for 15 cancer sites, by sex, 1986 (1)
Figure 8 Comparison between predicted and observed relative interval survival for 55–59 year olds for 15 cancer sites, by sex, 1986 (2)
Figure 9 Comparison between predicted and observed relative interval survival for 55–59 year olds for 15 cancer sites, by sex, 1986 (3)

We also tested the validity of our model when applied to other populations. We chose the age groups from 45 to 79, in which the cancer mortality rate is relatively stable and the probability of miscoding of cause of death is small. Figures 10 and 11 show the comparison between model estimates and vital registration data for six sub regions in age group 65–69 (AmrB, EurA, EurB1, EurB2, EurC, and WprA sub regions). The estimated coefficients and p-values for the test of Pearson's correlation using all data points in age groups from 45 to 79 are also presented. When compared with the site-specific mortality distribution of the Globocan 2000 based on regional cancer registries for AFRO and SEARO regions, the model estimates also yielded a consistent mortality distribution pattern (Figure 12). In all cases, the correlation coefficients were in the range of 0.91 to 0.98, suggesting that model estimates for these regions are quite consistent with the observed cancer mortality distribution.

Figure 10 Mortality distribution by site: comparison between model estimates and vital registration data in three sub regions (AmrB, EurA, and EurB1). Cancer site: 1 = mouth and pharynx, 2 = oesophagus, 3 = stomach, 4 = colon and rectum, 5 = liver, 6 = pancreas, 7 = trachea, bronchus and lung, 8 = melanoma of the skin, 9 = breast, 10 = cervix uteri, 11 = corpus uteri, 12 = ovary, 13 = prostate, 14 = bladder, 15 = lymphomas and multiple myeloma, 16 = leukaemia. r = Pearson's correlation coefficient when analysed with all data in age groups 45–79.

Figure 11 Mortality distribution by site: comparison between model estimates and vital registration data in three sub regions (EurB2, EurC, and WprA). Cancer site: 1 = mouth and pharynx, 2 = oesophagus, 3 = stomach, 4 = colon and rectum, 5 = liver, 6 = pancreas, 7 = trachea, bronchus and lung, 8 = melanoma of the skin, 9 = breast, 10 = cervix uteri, 11 = corpus uteri, 12 = ovary, 13 = prostate, 14 = bladder, 15 = lymphomas and multiple myeloma, 16 = leukaemia. r = Pearson's correlation coefficient when analysed with all data in age groups 45–79.

Figure 12 Mortality distribution by site: comparison between model estimates and previous estimates based on cancer registration data in AFRO and SEARO regions. Cancer site: 1 = mouth and pharynx, 2 = oesophagus, 3 = stomach, 4 = colon and rectum, 5 = liver, 6 = pancreas, 7 = trachea, bronchus and lung, 8 = melanoma of the skin, 9 = breast, 10 = cervix uteri, 11 = corpus uteri, 12 = ovary, 13 = prostate, 14 = bladder, 15 = lymphomas and multiple myeloma, 16 = leukaemia. r = Pearson's correlation coefficient when analysed with all data in age groups 45–79.

Probability of 5-year survival and mean duration by site

The proposed model also yields RIS for various years and the mean duration of cancer by site, both of which are important inputs for the future estimation of cancer morbidity burden in terms of years lived with disability (YLDs). As an illustration of the further use of our model, Figures 13 and 14 show, respectively, the conventional 5-year survival and average duration for female cancer patients aged 45–54 in four different sub regions (AfrE, AmrA, SearD and WprB3). Depending on the site, the chance of 5-year survival and the average duration varied considerably across the regions, which is consistent with the estimated survival pattern above.
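A conventional 5-year relative survival follows from the interval survivals by multiplying them over the first five years since diagnosis, in the same way that relative cumulative survival is built up in the Methods. The function name and the RIS values below are hypothetical.

```python
def cumulative_relative_survival(ris_by_tau, years):
    """Multiply relative interval survival over intervals 1..years."""
    rcs = 1.0
    for tau in range(1, years + 1):
        rcs *= ris_by_tau[tau]
    return rcs


# Hypothetical interval survivals for one site, age group, and region.
ris_by_tau = {1: 0.80, 2: 0.90, 3: 0.95, 4: 0.97, 5: 0.98}
print(round(cumulative_relative_survival(ris_by_tau, 5), 4))
```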

Figure 13 Five-year survival rate by site in four sub regions, females, age 45–54
Figure 14 Average duration of patients who die from cancer in four sub regions, females, age 45–54

Discussion

As a part of the Global Burden of Disease 2000 (GBD 2000) study, we have developed a multiplicative model of relative interval survival for cancer by site based on the best available evidence from published population-based survival data from both developed and developing countries. Because of the sparseness of survival data for the developing regions of the world, we decided to use all the available data, including the most valuable cancer registry data compiled by the International Agency for Research on Cancer (IARC), national cancer registries, and lengthy time series data from the United States, to establish trends in survival with gross domestic product (GDP) per capita and then to use latest estimates of GDP per capita for developing regions, in order to estimate survival by site.

This approach takes into account, through increases in average GDP per capita for regions, the likely improvements in survival over the periods since those for which developing country survival data are available. Since our survival model adjusted for age and differences in competing mortality in each population by employing relative interval survival, the remaining variations in survival are likely to be due to differences in diagnosis and availability of appropriate treatment options. For instance, large variation in survival was observed among cancers of bladder, breast, melanoma of the skin, and hematological malignancies such as lymphoma and leukemia, for which effective therapy is established in developed regions. For the cancers of cervix and colon and rectum, both early detection and availability of treatment determine the survival, and the variation among regions was moderate. On the other hand, cancers with very poor prognosis showed very small variations across the regions. The survival pattern across the regions is consistent with previous analysis based on the IARC cancer registry data [35].

The proposed model takes into account time in its three dimensions: age, calendar year (period) and time since cancer diagnosis (cohort). Owing to the availability of data, the model outcome was compared to the data reported by the US vital statistics and other regions of the world. This has given us the opportunity to evaluate our model and the data available.

However, perhaps the main advantage of this approach to estimating regional survival distributions by cancer site for developing regions is that the model correctly estimates survival, smooths it, and ensures that regional survival estimates are consistent with trends in survival across all regions, including where the numbers for some cancer sites are small, fluctuate widely, and are inconsistent with other regions. For example, as can be seen in Figure 1a, cancer registry survival estimates for some sites in some developing countries are better than recent experience in the United States, or significantly below the trend line with GDP per capita, suggesting that survival may have been overestimated due to small numbers or incomplete case follow-up. In these cases, the survival model provides survival estimates more consistent with the complete body of evidence. The second advantage of the proposed approach is that the model is flexible enough to yield survival estimates for various ages, years and periods, as well as the mean duration of cancer by site. In addition to mortality and incidence estimates [36], such information is required to estimate the cancer burden in terms of disability-adjusted life years (DALYs) in future analyses for the GBD 2000 [1].

The main limitations in applying this model were the relative lack of region-specific survival data and the very few, and probably not always representative, regional cancer incidence data for some developing regions. We assumed that cancer incidence reported by a few countries of one region or subregion would represent the incidence of the whole area, which may not always be the case.

It has been suggested that model-based estimates of cancer mortality in the previous GBD 1990 study did not reflect the actual profile of cancer recorded at the regional registries, in particular the site-specific cancer mortality distribution [3, 5, 37]. Although population-based estimates from cancer registry data should be incorporated, they may not be representative of the whole countries they are meant to represent. Such estimates are sometimes restricted to certain geographic areas and also depend on the extent of the health care and surveillance systems. Furthermore, several developing regions of the world were not included in these estimates, and the need to produce model-based estimates would persist.

In contrast to the previous GBD 1990 model, the present survival model, developed specifically for the GBD 2000, incorporated all available survival information obtained from registries and corrected for possible bias. The model was used to estimate the distribution of deaths by site, not the actual magnitude of cancer mortality, in regions where little or no data on detailed cause of death are available. In fact, the model estimates were quite comparable to the mortality distribution estimated from vital registration records. Furthermore, for the regions where vital records are not available, our model was consistent with the Globocan 2000 estimates based on the regional registries [6].
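The distinction between estimating the distribution of deaths and estimating their magnitude can be made concrete with a small sketch. It assumes, purely for illustration, that deaths at each site are proportional to incidence multiplied by one minus long-term relative survival; this crude weight is a stand-in for the age-period-cohort survival model, and every name and number below is hypothetical. The total number of cancer deaths is taken as given from the all-cause mortality envelope and is only split across sites.

```python
def death_distribution(incidence, survival):
    """Share of cancer deaths by site, using incidence * (1 - survival) as a crude weight."""
    weights = {site: incidence[site] * (1.0 - survival[site]) for site in incidence}
    total = sum(weights.values())
    return {site: w / total for site, w in weights.items()}


def allocate_envelope(envelope_deaths, shares):
    """Split an externally fixed all-cancer death total across sites."""
    return {site: envelope_deaths * share for site, share in shares.items()}


# Hypothetical inputs for one region, sex and age group.
incidence = {"lung": 400, "breast": 500, "liver": 100}    # new cases per year
survival = {"lung": 0.10, "breast": 0.70, "liver": 0.05}  # long-term relative survival
envelope = 1_000  # all-cancer deaths fixed by the mortality envelope

shares = death_distribution(incidence, survival)
deaths = allocate_envelope(envelope, shares)
print({site: round(d) for site, d in deaths.items()})
```

Whatever the weights, the site-specific deaths always sum to the envelope total, so errors in incidence or survival affect only the relative shares and not the overall level of cancer mortality.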

Conclusions

The survival model presented offers a new approach to the calculation of the number and distribution of deaths for areas where mortality data are either scarce or unavailable. It can also be applied in areas with good quality data, but where there are small numbers of some site-specific cancers. The model is flexible enough to estimate some of the parameters required to estimate the cancer burden. In our future work, we will attempt to collect further information on survival and incidence from more individual countries in order to improve our estimates, with more precise inputs for the model.

Authors' Contributions

CDM participated in the design of the study, analysed the data, and drafted the initial manuscript. KS participated in the design of the study, performed statistical analyses, and drafted the revised manuscript. CBP implemented the initial version of the survival model, carried out the data compilation, and drafted the initial manuscript. ADL participated in the design of the study and in the mortality analysis. CJLM conceived of the study and participated in its design and coordination. All authors read and approved the final manuscript.

References

  1. Murray CJL, Lopez AD, Mathers CD, et al: The Global Burden of Disease 2000 project: aims, methods and data sources. Geneva, World Health Organization. 2001

  2. Pisani P, Parkin DM, Bray F, et al: Estimates of the worldwide mortality from 25 cancers in 1990. Int J Cancer. 1999, 83 (1): 18-29. 10.1002/(SICI)1097-0215(19990924)83:1<18::AID-IJC5>3.3.CO;2-D.

  3. Murray CJL, Lopez AD: Mortality by cause for eight regions of the world: global burden of disease study. Lancet. 1997, 349: 1269-1276. 10.1016/S0140-6736(96)07493-4.

  4. Parkin DM: The global burden of cancer. Semin Cancer Biol. 1998, 8: 219-235. 10.1006/scbi.1998.0080.

  5. Parkin DM, Pisani P, Ferlay J: Estimates of the worldwide incidence of 25 major cancers in 1990. Int J Cancer. 1999, 80 (6): 827-841. 10.1002/(SICI)1097-0215(19990315)80:6<827::AID-IJC6>3.0.CO;2-P.

  6. Ferlay J, Bray F, Pisani P, et al: Globocan 2000: Cancer Incidence, Mortality and Prevalence Worldwide, Version 1.0. IARC Cancer Base No. 5. Lyon, IARC Press. 2001

  7. Parkin DM, Whelan SL, Ferlay J, et al: Cancer incidence in five continents. IARC Scientific Publications No. 143. Lyon, International Agency for Research on Cancer. 1997

  8. Verdecchia A, Capocaccia R, Egidi V, et al: A method for the estimation of chronic disease morbidity and trends from mortality data. Stat Med. 1989, 8: 201-216.

  9. Lundberg O: Methods of estimating morbidity and prevalence of disablement by use of mortality statistics. Acta Psychiatrica Scandinavica. 1973, 49: 324-331.

  10. Wingo PA, Landis S, Parker S, et al: Using cancer registry and vital statistics data to estimate the number of new cancer cases and deaths in the United States for the upcoming year. Journal of Registry Management. 1998, 25: 43-51.

  11. Ries LAG, Eisner MP, Kosary CL, et al: SEER Cancer Statistics Review, 1973–1999. Bethesda, MD, National Cancer Institute. 2002, [http://seer.cancer.gov/csr/1973_1999/]

  12. Sitas F, Bezwoda WR, Levin V, et al: Association between human immunodeficiency virus type 1 infection and cancer in the black population of Johannesburg and Soweto, South Africa. Br J Cancer. 1997, 75: 1704-1707.

  13. Mueller N: Overview of the epidemiology of malignancy in immune deficiency. J Acquir Immune Defic Syndr. 1999, 21 (Suppl 1): S5-10.

  14. Smith C, Lilly S, Mann K, et al: AIDS-related malignancies. Ann Med. 1998, 30: 323-344.

  15. Serraino D: The spectrum of AIDS-associated cancers in Africa. AIDS. 1999, 13: 2589-2590. 10.1097/00002030-199912240-00013.

  16. Sitas F, Pacella-Norman R, Carrara H, et al: The spectrum of HIV-1 related cancers in South Africa. Int J Cancer. 2000, 88: 489-492. 10.1002/1097-0215(20001101)88:3<489::AID-IJC25>3.0.CO;2-Q.

  17. Parkin DM, Garcia-Giannoli H, Raphael M, et al: Non-Hodgkin lymphoma in Uganda: a case-control study. AIDS. 2000, 14: 2929-2936. 10.1097/00002030-200012220-00015.

  18. Newton R, Grulich A, Sindikubwabo B, et al: Cancer and HIV infection in Rwanda. Lancet. 1995, 345: 1378-1379. 10.1016/S0140-6736(95)92583-X.

  19. Sitas F, Madhoo J, Wessie J: Cancer in South Africa, 1993–1995. Johannesburg, National Cancer Registry of South Africa, South African Institute for Medical Research. 1998

  20. Chokunonga E, Levy LM, Bassett MT, et al: Cancer incidence in the African population of Harare, Zimbabwe: second results from the cancer registry 1993–1995. Int J Cancer. 2000, 85: 54-59. 10.1002/(SICI)1097-0215(20000101)85:1<54::AID-IJC10>3.3.CO;2-4.

  21. Wabinga HR, Parkin DM, Wabwire-Mangen F, et al: Cancer in Kampala, Uganda, in 1989–91: changes in incidence in the era of AIDS. Int J Cancer. 1993, 54: 26-36.

  22. Newton R, Ngilimana PJ, Grulich A, et al: Cancer in Rwanda. Int J Cancer. 1996, 66: 75-81. 10.1002/(SICI)1097-0215(19960328)66:1<75::AID-IJC14>3.3.CO;2-4.

  23. Bah E, Hall AJ, Inskip HM: The first 2 years of the Gambian National Cancer Registry. Br J Cancer. 1990, 62: 647-650.

  24. Australian Institute of Health and Welfare (AIHW): Cancer survival in Australia, 2001. Part 1: National summary statistics. Canberra, Australian Institute of Health and Welfare. 2001

  25. Martin AA, Galan YH, Rodriguez AJ, et al: The Cuban National Cancer Registry: 1986–1990. Eur J Epidemiol. 1998, 14: 287-297. 10.1023/A:1007463826932.

  26. Brooks SE, Hanchard B, Wolff C, et al: Age-specific incidence of cancer in Kingston and St. Andrew, Jamaica, 1988–1992. West Indian Med J. 1995, 44: 102-105.

  27. Adib SM, Mufarrij AA, Shamseddine AI, et al: Cancer in Lebanon: an epidemiological review of the American University of Beirut Medical Center Tumor Registry (1983–1994). Ann Epidemiol. 1998, 8: 46-51. 10.1016/S1047-2797(97)00109-9.

  28. United Nations: World Population Prospects. The 1998 revision Volume III: Analytical Report. New York, United Nations. 2000

  29. Surveillance, Epidemiology, and End Results (SEER): Program Public-Use Data (1973–1999). National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch, released April based on the November. 2002

  30. Eisenberg H, Sullivan PD, Connelly RR: Cancer in Connecticut. Survival experience. Hartford, Connecticut State Department of Health. 1968

  31. Sankaranarayanan R, Black RJ, Parkin DM: Cancer survival in developing countries. IARC Scientific Publications No. Lyon, International Agency for Research on Cancer. 1998

  32. Berrino F, Capocaccia R, Estève J, et al: Survival of cancer patients in Europe: the EUROCARE-2 study. IARC Scientific Publications No. 151. Lyon, International Agency for Research on Cancer. 1999

  33. Osaka Prefectural Department of Public Health and Welfare, Osaka Medical Association, Osaka Medical Center for Cancer and Cardiovascular Diseases: Annual Report of Osaka Cancer Registry No. 64-Cancer Incidence and Medical Care in Osaka in 1998 and the Survival in 1994. Osaka, Japan, Osaka Prefectural Department of Public Health and Welfare. 2001

  34. Greene WH: Econometric analysis. 3rd edition. New York, Prentice Hall. 1997

  35. Sankaranarayanan R, Swaminathan R, Black RJ: Global variations in cancer survival. Cancer. 1996, 78: 2461-2464. 10.1002/(SICI)1097-0142(19961215)78:12<2461::AID-CNCR2>3.0.CO;2-N.

  36. Shibuya K, Mathers CD, Boschi-Pinto C, et al: Global and regional estimates of cancer mortality and incidence by site: II. Results for the Global Burden of Disease 2000. BMC Cancer. 2002, 2: 37. 10.1186/1471-2407-2-37.

  37. Gupta P, Sankaranarayanan R, Ferlay J: Cancer death in India: is the model-based approach valid? Bull World Health Organ. 1994, 72: 943-944.

Acknowledgements

Many people are contributing to the analysis of cancer incidence and mortality for the GBD 2000 both inside and outside WHO. We wish to particularly acknowledge the contributions of staff within the Global Program on Evidence for Health Policy who have contributed to the estimation of total cancer deaths for the year 2000: Majid Ezzati, Brodie Ferguson, Mie Inoue, Rafael Lozano, Doris Ma Fat, and Lana Tomaskovic. We would also like to thank Hideaki Tsukuma and Akira Oshima of the Osaka Prefectural Department of Public Health and Welfare for their valuable comments. We thank staff of the International Agency for Research on Cancer (IARC) for provision of data, advice on survival analyses carried out by IARC and methods used to estimate cancer incidence and mortality for Globocan 2000, particularly Max Parkin, Jacques Ferlay, Paola Pisani and Fred Bray.

Author information

Corresponding author

Correspondence to Colin D Mathers.

Additional information

Competing Interests

None declared.

About this article

Cite this article

Mathers, C.D., Shibuya, K., Boschi-Pinto, C. et al. Global and regional estimates of cancer mortality and incidence by site: I. Application of regional cancer survival model to estimate cancer mortality distribution by site. BMC Cancer 2, 36 (2002). https://doi.org/10.1186/1471-2407-2-36

  • DOI: https://doi.org/10.1186/1471-2407-2-36

Keywords