Life tables for global surveillance of cancer survival (the CONCORD programme): data sources and methods

Background: We set out to estimate net survival trends for 10 common cancers in 279 cancer registry populations in 67 countries around the world, as part of the CONCORD-2 study. Net survival can be interpreted as the proportion of cancer patients who survive up to a given time, after eliminating the impact of mortality from other causes (background mortality). Background mortality varies widely between populations and over time. It was therefore necessary to construct robust life tables that accurately reflected the background mortality in each of the registry populations.
Methods: Life tables of all-cause mortality rates by single year of age and sex were constructed by calendar year for each population and, when possible, by racial or ethnic sub-groups. We used three different approaches, based on the type of mortality data available from each registry. With death and population counts, we adopted a flexible multivariable modelling approach. With unsmoothed mortality rates, we used the Ewbank relational method. Where no data were available from the registry or a national statistical office, we used the abridged UN Population Division life tables and interpolated these using the Elandt-Johnson method. We also investigated the impact of using state- and race-specific life tables versus national race-specific life tables on estimates of net survival from four adult cancers in the United States (US).
Results: We constructed 6,514 life tables covering 327 populations. Wide variations in life expectancy at birth and mortality by age were observed, even within countries. During 1995–99, life expectancy was lowest in Nigeria and highest in Japan, ranging from 47 to 84 years among females and 46 to 78 years among males. During 2005–09, life expectancy was lowest in Lesotho and again highest in Japan, ranging from 45 to 86 years among females and 45 to 80 years among males. For the US, estimates of net survival differed by up to 4% if background mortality was fully controlled with state- and race-specific life tables, rather than with national race-specific life tables.
Conclusions: Background mortality varies worldwide. This emphasises the importance of using population-specific life tables for geographic and international comparisons of net survival.
Electronic supplementary material: The online version of this article (doi:10.1186/s12885-017-3117-8) contains supplementary material, which is available to authorized users.


Background
The CONCORD-2 study was designed to establish long-term surveillance of cancer survival worldwide, by central analysis of population-based cancer registry data. Net survival from 10 common malignancies was estimated from individual patient data submitted by 279 cancer registries in 67 countries [1].
Net survival of a cohort of cancer patients is estimated as the probability of survival derived solely from the cancer-specific hazard of death. It can be interpreted as the proportion of cancer patients who survive up to a given time after diagnosis (e.g. 5 years), after eliminating the impact of other causes of death (background mortality). This is done by separating the excess hazard of death due to cancer from the background mortality. Background mortality often differs widely between populations, and can even differ substantially within registry populations, for instance by race [2], ethnic group [3] or socio-economic status [4].
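The separation of hazards described above is commonly written as an additive model. The notation below is a standard textbook formulation and is an assumption on our part; the symbols are not taken from the original text.

```latex
\lambda_O(t \mid a, z) = \lambda_E(t) + \lambda_P(a + t \mid z),
\qquad
S_N(t) = \exp\!\left( - \int_0^t \lambda_E(u)\, du \right)
```

Here $\lambda_O$ is the overall hazard of death at time $t$ since diagnosis for a patient diagnosed at age $a$ with demographic characteristics $z$ (for example sex, calendar year, region and race), $\lambda_P$ is the background hazard read from the life table, $\lambda_E$ is the excess hazard due to cancer, and $S_N(t)$ is net survival.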
Information on background mortality in a given population is obtained from life tables, which are tables of age- and sex-specific death rates or probabilities in a given population at a given point in time. Net survival will be more accurate if the estimates of background mortality are as close as possible to each particular individual's "real" expected mortality from all causes. Previous international studies of cancer survival [5], including the first CONCORD study [2,6], have recommended that life tables specific to the area in which cancer patients live should be used, rather than national life tables, which may not account for sub-national differences in mortality. Ideally, these life tables should be by single calendar year, single year of age, sex, and race (or ethnicity) or deprivation when the relevant data are available. Such life tables are, however, not generally available: national statistical offices often only produce life tables for the whole country or major geographic regions.
In this article, we outline the methods used to construct life tables for the CONCORD-2 study, which is the largest comparison of worldwide trends in population-based cancer survival to date. We document the wide variations in life expectancy and age-specific mortality between and within the populations covered by the 279 participating cancer registries. We examine trends in life expectancy in regions within continents, and even within countries. We also investigate the importance of using regional vs. national life tables in the estimation of net survival, by comparing estimates for four adult cancers (breast, colon, lung, prostate) in 44 US registries, using either a US national, race-specific life table [7] or the race- and state-specific life tables that were constructed for the CONCORD-2 study.

Methods
All 279 cancer registries participating in the CONCORD-2 study were invited to contribute data for patients diagnosed during all or part of the calendar period 1995-2009, with follow-up to 31 December 2009, or a later year. To enable estimation of net survival for these patients, registries were asked to provide data on background mortality for each calendar year for which they submitted cancer data, from the first year of incidence to the last year of follow-up. They were offered the option of supplying their own life tables or providing death and population counts from which we could construct the life tables required.
Some registries also supplied life table data for racial or ethnic sub-populations within their territory: in all, we received data for 327 populations. The Israel National Cancer Registry and all 44 participating United States (US) cancer registries submitted death and population data from which to construct life tables by ethnicity (Israel, national-level) or race (US, state-level). The New Zealand Cancer Registry and the Penang Cancer Registry (Malaysia) provided mortality rates by ethnicity at a national level. Both the Polish National Cancer Registry and the Austrian Cancer Registry submitted mortality rates for the sub-regions covered by their registries (voivodeships for Poland, Bundesländer for Austria). Neither registry submitted data by ethnicity.
We classified the data we received into four categories on the basis of their structure and quality: i) death and population counts by single year of age; ii) death and population counts by age group (typically five years); iii) mortality rates by single year of age; and iv) mortality rates by age group. A fifth category included registries from which life table data were unavailable or deemed unreliable. The methods used to construct life tables were different for each of the five categories (Table 1).
Some registries did not provide life tables (or the corresponding death and population counts) for each calendar year covered by their cancer data. We constructed life tables for any intervening years by linear interpolation of the age-specific death rates. If the calendar span of life tables was shorter than the calendar span of the cancer incidence and follow-up data, life tables for the earliest or latest available year were used for the missing years, i.e. without extrapolation, so that we would have estimates of background mortality for every year included in the cancer data.
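The rule for missing calendar years can be sketched as follows. This is a minimal illustration, not the study's own code: the function name `interpolate_rates` and the example rates are hypothetical. Years between two available life tables are filled by linear interpolation of each age-specific rate, and years outside the available span reuse the earliest or latest table unchanged.

```python
def interpolate_rates(known, years):
    """Return age-specific death rates for every requested calendar year.

    known: dict mapping calendar year -> list of death rates, one per age
    years: iterable of calendar years required by the cancer data
    """
    anchor_years = sorted(known)
    first, last = anchor_years[0], anchor_years[-1]
    filled = {}
    for year in years:
        if year <= first:
            # Before the earliest table: reuse it unchanged (no extrapolation).
            filled[year] = list(known[first])
        elif year >= last:
            # After the latest table: reuse it unchanged (no extrapolation).
            filled[year] = list(known[last])
        else:
            # Nearest available years on either side of the requested year.
            lo = max(y for y in anchor_years if y <= year)
            hi = min(y for y in anchor_years if y >= year)
            if lo == hi:
                filled[year] = list(known[lo])
            else:
                w = (year - lo) / (hi - lo)
                filled[year] = [(1 - w) * a + w * b
                                for a, b in zip(known[lo], known[hi])]
    return filled


# Hypothetical rates for three ages in two calendar years.
known = {2000: [0.010, 0.020, 0.040], 2004: [0.006, 0.016, 0.032]}
tables = interpolate_rates(known, range(1998, 2007))
```

In this example, the tables for 1998 and 1999 are copies of the 2000 table, the tables for 2005 and 2006 are copies of the 2004 table, and 2001 to 2003 lie on the straight line between the two.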

Life tables from death and population counts (categories i and ii)
In all, 172 (62%) of the participating registries provided data on the numbers of deaths and the population size (death and population counts) by age and sex (Table 1). A flexible multivariable model (flexible Poisson model) [8] was used to derive the required age- and sex-specific mortality rates. This method was chosen because it was recently recommended for the estimation of smoothed age-specific mortality rates for small populations [8]. This approach also allowed for the modelling of mortality rates by race or ethnicity, where the data were available.
The death counts were modelled separately for each sex and calendar year, within the generalised linear model framework, using a Poisson error and log link. Person-years at risk were used as the offset:
$$\log\left(E[d_x]\right) = \beta_0 + f(x) + \log\left(\mathrm{pyrs}_x\right)$$
where $x$ denotes age in years, $d_x$ denotes the age-specific death count, $\beta_0$ denotes the coefficient at baseline (i.e. the log of the mortality rate at the reference age), $f(x)$ denotes a restricted cubic spline function on age, and $\mathrm{pyrs}_x$ denotes the age-specific person-years at risk. The model was implemented using the Stata command mvrs (multivariable regression splines) [9] in Stata 13. Splines are made up of piecewise polynomial functions joined at locations called knots. The process we used to select the knot locations is summarised in Additional file 1 and described in detail elsewhere [8]. We used the flexible Poisson model with a continuous interaction between race/ethnicity and age to construct race/ethnicity-specific life tables for the Israel National Cancer Registry (ethnicity) and the 44 US states (race). Further details are provided in Additional file 2.
We used three calendar years of death and population counts around a central year, so that the resulting life tables would not be as susceptible to year-on-year fluctuations.
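The structure of a Poisson model with a log link and a person-years offset can be sketched in a few lines. This is a deliberately simplified illustration and not the mvrs implementation: the restricted cubic spline f(x) is replaced by a single linear term in age (an assumption made to keep the example short), the function name `fit_poisson_loglinear` is hypothetical, and the input counts are noise-free expected values generated from known coefficients.

```python
import math


def fit_poisson_loglinear(ages, deaths, pyrs, ref_age, max_iter=50, tol=1e-10):
    """Fit log(E[d_x]) = b0 + b1 * (x - ref_age) + log(pyrs_x) by Newton-Raphson.

    A linear age term stands in for the spline f(x) (simplifying assumption).
    """
    z = [x - ref_age for x in ages]
    # Start from the crude death rate and no age effect.
    b0 = math.log(sum(deaths) / sum(pyrs))
    b1 = 0.0
    for _ in range(max_iter):
        mu = [p * math.exp(b0 + b1 * zi) for p, zi in zip(pyrs, z)]
        # Score vector of the Poisson log-likelihood.
        g0 = sum(d - m for d, m in zip(deaths, mu))
        g1 = sum(zi * (d - m) for zi, d, m in zip(z, deaths, mu))
        # 2 x 2 information matrix.
        h00 = sum(mu)
        h01 = sum(zi * m for zi, m in zip(z, mu))
        h11 = sum(zi * zi * m for zi, m in zip(z, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information matrix times score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1


# Hypothetical population: 10,000 person-years at each age from 40 to 89.
ages = list(range(40, 90))
pyrs = [10000.0] * len(ages)
true_b0, true_b1 = math.log(0.01), 0.09
deaths = [p * math.exp(true_b0 + true_b1 * (x - 65)) for x, p in zip(ages, pyrs)]

b0, b1 = fit_poisson_loglinear(ages, deaths, pyrs, ref_age=65)
smoothed_rates = [math.exp(b0 + b1 * (x - 65)) for x in ages]
```

Because the offset enters with a fixed coefficient of 1, exp(b0) is the fitted mortality rate at the reference age, which matches the interpretation of the baseline coefficient given above.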

Life tables from mortality rates (categories iii and iv)
We obtained age-specific mortality rates from 83 (30%) of the participating registries (Table 1). Of these, 73 (88%) provided mortality rates by single year of age (complete life tables) and 10 (12%) provided rates by five-year age group (abridged life tables). Of those registries that submitted complete life tables, 56 (77%) provided smoothed versions for each calendar year submitted (where the raw, age-specific mortality rates had been modelled up to age 99 years to remove any random fluctuations by age) and 17 (23%) did not.
Where the mortality rates we received had not been smoothed, we used the Ewbank relational method [10] to derive a smoothed mortality profile for the given population. The Ewbank method is an extension of the Brass relational method [11]. The Brass method involves plotting the linear relationship between the logits of two survivorship functions, one from a standard life table and the other from observed data. Plotting this linear relationship provides information on two parameters, one for the level of mortality in the model (α) and another for the slope of the observed survivorship curve relative to the standard curve, i.e. the relation between young and old age mortality in the observed data relative to the standard (β). These two parameters are then used to determine the shape of a smoothed survivorship function for the observed data. The Ewbank method includes two additional parameters: one for childhood mortality (κ) and another for mortality at older ages (λ). The parameter for childhood mortality applies before the median age at death in the population. The parameter for mortality at older ages applies after the median age at death.
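The two-parameter Brass step can be illustrated with a short sketch. It covers only the Brass relation (level and slope), not the additional Ewbank parameters for childhood and older-age mortality. The function names and the standard survivorship values are hypothetical, and ordinary least squares is assumed as the fitting method.

```python
import math


def brass_logit(l):
    """Brass logit of a survivorship proportion l, with 0 < l < 1."""
    return 0.5 * math.log((1.0 - l) / l)


def fit_brass(l_obs, l_std):
    """Least-squares fit of observed logits on standard logits.

    Returns (alpha, beta) such that Y_obs(x) is approximately
    alpha + beta * Y_std(x).
    """
    y = [brass_logit(v) for v in l_obs]
    x = [brass_logit(v) for v in l_std]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta


def smoothed_survivorship(alpha, beta, l_std):
    """Invert the logit: l(x) = 1 / (1 + exp(2 * (alpha + beta * Y_std(x))))."""
    return [1.0 / (1.0 + math.exp(2.0 * (alpha + beta * brass_logit(v))))
            for v in l_std]


# Hypothetical standard survivorship at six ages, and "observed" values
# generated from it with known parameters so the fit can be checked.
l_std = [0.97, 0.95, 0.90, 0.75, 0.45, 0.10]
l_obs = smoothed_survivorship(-0.2, 1.1, l_std)
alpha, beta = fit_brass(l_obs, l_std)
l_smooth = smoothed_survivorship(alpha, beta, l_std)
```

With real data the observed logits scatter around the fitted line, and `l_smooth` is the smoothed survivorship curve implied by the fitted level and slope.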
If mortality rates were available by single year of age up to 99 years, we used all four parameters (level of mortality, relation between young and old age mortality, childhood mortality, older-age mortality). In many populations, the median age at death was close to 80 years of age, or higher. For abridged life tables, in which the highest age group is typically for ages 85 years and above, this meant that data to estimate values for the older-age mortality parameter (which only applies after the median age at death) were often available for only one or two age groups. This has previously been found to cause instability in the estimated older-age mortality parameter, leading to unreliable estimates of older-age mortality [12]. For abridged mortality rates, we therefore used only three parameters and constrained the parameter for older-age mortality to be a factor of the parameter for the level of mortality [10].

Registries for which no reliable data were available (category v)
We were unable to obtain reliable life table data for some registries, either from the registry itself or from a national statistical office. For these registries, we used the abridged life tables published by the UN Population Division (UNPD) for three calendar periods [13]. We centred these on years 1997, 2002 and 2007 and smoothed the abridged values using the Elandt-Johnson method [14]. The Elandt-Johnson method has been recommended for deriving single-year-of-age life tables from abridged ones [15]. As above, we produced life tables for individual calendar years by age-specific linear interpolation between the life tables for each of the three calendar periods. For one of these registries, Gibraltar, no life table data were available from the UNPD [13], WHO [16], Global Burden of Disease Study [17] or the Human Mortality Database [18], so we used the life table we constructed for England.

Evaluation and comparison of derived life tables
Life expectancy at birth is a summary measure of age-specific mortality. We calculated life expectancy at birth, the infant mortality rate (probability of dying between birth and exact age 1), childhood mortality rate (probability of dying between birth and exact age 5), and the probabilities of dying between exact ages 15 and 60, 60 and 85, and 85 and 99 years from each of the derived life tables. Life expectancies at birth and the probabilities of death were summarised in a standardised report for each cancer registry (see example in Additional file 3). The reports included plots of the smoothed mortality curves on both logarithmic and arithmetic scales.
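These summary measures follow directly from a single-year life table, as the sketch below shows. It rests on assumptions that are ours rather than the study's: a constant hazard within each year of age (so that q_x = 1 - exp(-m_x)), trapezoidal person-years, and life expectancy truncated at the last tabulated age. The function names and the example rates are hypothetical.

```python
import math


def survivorship(rates):
    """Return l_x for exact ages 0..len(rates), starting from l_0 = 1.

    Assumes a constant hazard within each year of age (our assumption).
    """
    lx = [1.0]
    for m in rates:
        lx.append(lx[-1] * math.exp(-m))
    return lx


def prob_dying_between(lx, a, b):
    """Probability of dying between exact ages a and b, given survival to a."""
    return 1.0 - lx[b] / lx[a]


def life_expectancy_at_birth(lx):
    """Trapezoidal person-years lived up to the last tabulated age.

    Years lived beyond the last tabulated age are ignored (simplification).
    """
    return sum((lx[i] + lx[i + 1]) / 2.0 for i in range(len(lx) - 1))


# Hypothetical death rates for ages 0 to 99: raised infant mortality,
# then an exponential increase with age.
rates = [0.005] + [0.0002 * math.exp(0.085 * x) for x in range(1, 100)]
lx = survivorship(rates)

e0 = life_expectancy_at_birth(lx)
infant = prob_dying_between(lx, 0, 1)
child = prob_dying_between(lx, 0, 5)
adult_15_60 = prob_dying_between(lx, 15, 60)
adult_60_85 = prob_dying_between(lx, 60, 85)
adult_85_99 = prob_dying_between(lx, 85, 99)
```

Each probability is conditional on being alive at the start of the age interval, which is why the three adult intervals can be compared across populations independently of mortality at younger ages.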
Performance of the flexible Poisson model was also evaluated from plots of the deviance residuals at each age. Deviance residuals are a measure of how closely the modelled values fit the observed data. The residuals should be approximately normally distributed, with a constant range, if the model fits the data well [19]. We deemed the model to be performing well if the standardised deviance residuals were in the range −2 to +2.
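The residual check can be sketched as follows. The code computes raw Poisson deviance residuals; the standardisation mentioned above (which additionally involves the leverage of each observation) is omitted, and that omission is our simplification. The function name and the observed and fitted counts are hypothetical.

```python
import math


def deviance_residual(d, mu):
    """Raw Poisson deviance residual for observed count d and fitted count mu > 0."""
    # By convention d * log(d / mu) is taken as 0 when d = 0.
    term = d * math.log(d / mu) if d > 0 else 0.0
    dev = 2.0 * (term - (d - mu))
    sign = 1.0 if d >= mu else -1.0
    # max() guards against tiny negative values caused by rounding.
    return sign * math.sqrt(max(dev, 0.0))


# Hypothetical observed and fitted death counts at four ages.
observed = [12, 30, 55, 0]
fitted = [10.0, 33.0, 55.0, 1.5]

residuals = [deviance_residual(d, m) for d, m in zip(observed, fitted)]
within_range = all(-2.0 <= r <= 2.0 for r in residuals)
```

A residual is positive where the model under-predicts deaths and negative where it over-predicts, so a run of same-signed residuals across adjacent ages would suggest that the fitted curve is too stiff in that age range.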

Results
In total, 6,514 life tables were constructed as part of the CONCORD-2 study: of these, 6,392 life tables were constructed for 223 (80%) registries with the flexible Poisson model, the Ewbank method or the Elandt-Johnson method. A further 35 registries (12.5%) provided smoothed life tables that did not cover all calendar years; for these registries, we constructed 122 life tables by linear interpolation. We received smoothed, complete, life tables for all calendar years from 21 registries (7.5%). No modifications were required for these life tables.
The type of data received varied by continent.
These variations and trends in life expectancy at birth summarise the underlying patterns and trends in age-specific mortality, which also varied very widely (Table 3; Figs. 3, 4 and 5; Additional file 4). Worldwide, the greatest range in the probability of death among adults was seen in the age range 60 to 85 years, both during 1995-1999 (37.9% to 93.7% among females; 55.9% to 94.3% among males) and during 2005-2009 (31.5% to 93.5% among females; 51.4% to 93.9% among males).
Where we obtained background mortality data by race or ethnic group, the majority group (whites in the United States, Jews in Israel, Non-Maoris in New Zealand) tended to have higher life expectancy at birth than the other subgroup(s). Malaysia (Penang Cancer Registry) was the exception, where life expectancy among the Chinese (23% of the population) was higher than among the majority Malay (50%) population [20] (Fig. 6; Additional file 4). Correspondingly, there were clear disparities in age-specific mortality, but the rates tended to converge among the elderly. For some states in the US, the population of blacks is so small that death counts were not available for several age groups. We were therefore unable to construct robust life tables for blacks in Hawaii, New Hampshire, Montana, Idaho or Wyoming, even with the flexible Poisson model. For Utah and Alaska, the black life tables were also based on small counts, but data were available for enough age groups for us to construct life tables for use in survival analyses.

Impact of using state- and race-specific life tables
We compared five-year net survival estimates for the 44 participating US registries for patients diagnosed during 2005-2009, obtained using state- and race-specific life tables that we had constructed using the flexible Poisson model, with the corresponding survival estimates derived with the national, race-specific life tables obtained from the National Center for Health Statistics (NCHS) [21]. For this comparison, we chose four cancers with very different prognoses: breast and prostate (high), colon (medium) and lung (low).
Absolute differences between the two sets of survival estimates were greatest for states where life expectancy at birth differed most from the national average, and for cancers with a better prognosis. They were smallest for states where life expectancy at birth differed least from the national average, and for cancers with a poor prognosis, where the majority of deaths were excess deaths. Differences were largest for men with prostate cancer, and for women with breast cancer, and smallest for lung cancer in both sexes (Table 4). The greatest difference was 3.6% for prostate cancer in Mississippi.

Discussion
In order to establish worldwide surveillance of population-based cancer survival trends in the CONCORD-2 study [1], we needed to obtain or construct life tables of background mortality by age, sex and calendar year that were as specific as possible for each registry population or sub-population.
This was particularly important in light of the tremendous intra-continental and even sub-national variations in background mortality. The UN Population Division, the World Health Organisation and the Global Burden of Disease study regularly produce life tables for countries worldwide [13,16,17,22], but they are for countries, rather than sub-regions or ethnic/racial groups, and they may not accurately reflect the background mortality in the specific population(s) covered by a cancer registry. We were obliged to use several methods to construct the life tables, because of the different types of data available from the registries (complete or abridged death and population counts; complete or abridged mortality rates; no reliable data). These methods involved different assumptions about the shape of age-mortality patterns and the rate of increase of mortality at older ages. The different assumptions made in the construction of the life tables may have had an impact on the subsequent estimates of net survival, and this warrants further investigation.
Fig. 4 Probability of dying between ages 60 and 85: range, by continent, calendar period and sex. The numbers in brackets beside each calendar period denote the number of registries contributing life table data for that calendar period. Each dot on the graph represents a registry population or sub-population.
We recommend using the multivariable flexible Poisson model to construct life tables for future international comparisons of population-based cancer survival. We found that this method performed well, even for small populations. It does not rely on an external standard population or a pre-defined set of coefficients, and therefore does not make strong assumptions about the age-pattern of mortality. It was also recently found to perform better than the Elandt-Johnson method and a flexible relational method (based on the Ewbank approach) for small populations [8].
Life expectancy at birth varied by more than 30 years among the 327 populations examined in the 279 registries. In Canada alone, during 2005-2009, life expectancy differed by 10 years between residents of Nunavut (females 73.4 years; males 68.3 years) and British Columbia (females 83.4 years; males 78.9 years). These differences are probably explained by the very different demographic profiles of these two jurisdictions: aboriginal people made up 86% of the population of Nunavut in 2011 [23].
In most populations, life expectancy increased during 1995-2009, but in Lesotho and South Africa it fell by as much as 6 years, most probably because of the HIV/AIDS epidemic emerging in those countries during the 1990s [24,25].
We constructed ethnic- or race-specific life tables for Israel, Malaysia (Penang Cancer Registry), New Zealand and the US. These life tables showed marked differences in background mortality between the ethnic and racial sub-populations in each country. In 5 of the 44 participating US states (Hawaii, New Hampshire, Montana, Idaho, Wyoming), it was not possible to construct sufficiently robust life tables for blacks. However, we were able to use race- or ethnic-specific life tables to estimate net survival wherever such life tables could be constructed.
Examination of the impact of using race-specific life tables for each US state on estimates of net survival showed that age-standardised estimates differed by up to 3.6%, when compared with estimates obtained with the national race-specific life tables that have been used in the past. The differences were more marked for cancers with better prognosis. This is in line with previous findings [2,26]. The largest difference observed was in the estimate of age-standardised five-year net survival for prostate cancer in Mississippi, which was 3.6% higher when derived with state- and race-specific life tables than when using national life tables. The explanation is that background mortality among adults in Mississippi is considerably higher than the US national average, for both blacks and whites. National life tables therefore under-estimate background mortality in Mississippi, leading us to over-estimate excess mortality and subsequently under-estimate net survival. Of note, we did not investigate age-, sex- or race-specific differences in net survival estimated with the alternative life tables. Differences in net survival from those obtained with national life tables will be larger in some of the groups defined by age and race and, for other cancers, sex. This is a further reason for using the most specific life tables that can be obtained.
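The direction of this effect can be seen in a small worked example. It uses the classical relative survival ratio (observed survival divided by expected survival) as a simple stand-in for net survival, which is not necessarily the estimator used in the study, and all three survival values are hypothetical.

```python
def relative_survival(observed, expected):
    """Ratio of observed all-cause survival to expected (background) survival."""
    return observed / expected


# Hypothetical five-year survival values for one group of patients.
observed_5yr = 0.80        # observed all-cause survival of the patients
expected_national = 0.85   # expected survival from a national life table
expected_state = 0.82      # expected survival from a state-specific life table
                           # in a state with higher background mortality

rs_national = relative_survival(observed_5yr, expected_national)
rs_state = relative_survival(observed_5yr, expected_state)
difference = rs_state - rs_national
```

Because expected survival is lower under the state-specific table, the same observed survival yields a higher ratio. The gap widens as observed survival rises, which is consistent with the larger differences reported for cancers with a better prognosis.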
Stroup et al. [26] recently examined the differences between estimates of relative survival by age, sex and race for 17 SEER registries, obtained either with state- and race-specific life tables or with national, race-specific life tables. The differences were greatest for patients aged 85 years or over, and differed in both direction and magnitude by race and sex. They deemed the NCHS state- and race-specific life tables unreliable above age 85 and recommended against using them to estimate relative survival for patients aged 85 and older. We have compared the probabilities of dying between ages 85 and 99 years for each state, race and sex derived from the CONCORD-2 life tables with those derived from the national, race-specific life tables available from NCHS. Estimates of the probability of dying between ages 85 and 99 were higher for black males and females, and to a lesser extent for white males, when derived from the CONCORD-2 state- and race-specific life tables than when derived with the national life tables. For white females, the estimates derived from the CONCORD-2 life tables were fairly evenly distributed around the corresponding national estimates.
We opted to use life tables that were specific to each registry or sub-population, wherever possible, in order to reflect as closely as possible the background mortality that an individual cancer patient would be expected to experience. This is critical when estimating net survival. Accurate life tables allow us to estimate the net survival of cancer patients within each population, rather than an approximation obtained with a less specific life table.