Accuracy of cause of death data routinely recorded in a population-based cancer registry: impact on cause-specific survival and validation using the Geneva cancer registry

Background Information on the underlying cause of death of cancer patients is of interest because it can be used to estimate net survival. The population-based Geneva Cancer Registry is unique because registrars are able to review the official cause of death. This study aims to describe the difference between the official and revised cause-of-death variables and the impact on cancer survival estimates. Methods The recording process for each cause-of-death variable is summarised. We describe the differences between the two cause-of-death variables for the 5,065 deceased patients out of the 10,534 women diagnosed with breast cancer between 1970 and 2009. The Kappa statistic and logistic regression are applied to evaluate the degree of concordance. The impact of discordance on cause-specific survival is examined using the Kaplan-Meier method. Results The overall agreement between the two variables was high. However, several subgroups presented a lower concordance, suggesting differences in calendar time and less attention given to older patients and more advanced diseases. Similarly, the impact of discordance on cause-specific survival was small overall but larger for several subgroups. Conclusion Estimation of cancer-specific survival could therefore be prone to bias when using the official cause of death. Breast cancer is not the most lethal cancer, and our results certainly cannot be generalised to more lethal tumours.


Background
Population-based cancer survival is widely used to evaluate the impact of health care systems in disease management. Net survival is the survival that would be observed if the only possible cause of death were the cancer of interest [1]. Net survival is especially relevant when the cohort of interest becomes older, since the risk of dying from causes other than cancer increases. Net survival is also very useful when comparing subgroups whose mortality due to other causes could differ and therefore lead to biased estimation of the survival contrast.
Two main data designs can be distinguished, the cause-specific and the relative survival designs, according to the availability of information on cause of death. Such information is rarely available in routine, population-based data, and net survival is then commonly estimated within the relative survival framework. However, when information about the underlying cause of death is available, net survival can be estimated using the cause-specific approach, in which only deaths from the cause of interest are considered as 'failures', while deaths from other causes are censored. High-quality information on the cause of death is required for each individual patient. This information is commonly available only in clinical trials or hospital series, but the cause-specific approach is sometimes used on population-based data from cancer registries, where the underlying cause of death is derived from death certificates. The underlying cause of death is the "disease or injury which initiated the train of morbid events leading directly to death" or the "circumstances of the accident or violence which produced the fatal injury". It is codified in the International Classification of Diseases (ICD), which was designed to classify causes of death for statistical tabulation and research. Despite these international rules (developed over 100 years), comparability and accuracy issues still arise. Different medical terminologies, inaccurate completion of the death certificates, and misinterpretation or misapplication of the coding rules for selection of the underlying cause of death can cause comparability problems between different geographical areas and/or different periods of time. The validity and accuracy of the reported underlying cause of death may also be compromised if the clinician's certification does not accurately reflect the clinical history of events leading to death.
Percy et al. were the first to report that misclassification of the underlying cause of death could bias mortality trends and therefore the estimation of cancer-specific survival [2]. Many other studies have since highlighted the inaccuracy of the cause of death information obtained from death certificates [3][4][5][6][7][8][9]. Some studies have shown that the proportion of misclassification can be very high [4,10]. However, one study has suggested that the proportion of misclassification can be lower for screened patients dying from breast cancer [11].
The validity of disease-specific survival is based on the assumption that the underlying cause of death is accurately determined. The Geneva Cancer Registry, which collects all the death certificates of routinely recorded deaths in the Geneva canton (Switzerland), also reviews the cause of death of each registered cancer patient using all the available clinical information relating to the patient's disease and treatment. This leads to a unique situation in which a second, validated variable defining the cause of death is generated. This second variable is considered to be a more reliable record of the patient's cause of death and so is expected to give rise to more accurate estimates of cause-specific survival.
The purposes of this study are (a) to describe the process of recording the cause of death in the Geneva Cancer Registry, (b) to investigate how accurate the routinely recorded cause of death is compared to the validated cause of death derived from clerical review and (c) to examine whether the process of validation leads to differences in the estimates of cause-specific survival.

Data
The data used in this study were obtained from the Geneva Cancer Registry. All women diagnosed with breast cancer between 1970 and 2009 and resident in Geneva were included in the study.
The Geneva Cancer Registry collects information on incident cancer cases from various sources, including hospitals, laboratories and private clinics, all of which are requested to report new cancer cases. Trained registrars systematically extract information from the medical records and conduct further investigations when key data are missing. The variables of interest for this study were cause of death as specified on the death certificate, revised cause of death, age at diagnosis, age at death, year of diagnosis, year of death, social class, stage of the tumour, treatment, sector of care and place of death. The Geneva Cancer Registry has general registry approval by the Swiss Federal Commission of Experts for professional secrecy in medical research (Commission d'experts pour le secret professionnel en matière de recherche médicale). This approval permits cancer data collection and its use for research purposes.

Coding of cause of death
The Geneva Cancer Registry is notified of all deaths occurring in the Geneva canton through three different processes.
First, when a patient dies in the canton of Geneva, a death certificate must be completed by the clinician certifying the death, who reports the primary, secondary and concomitant causes of death. The Geneva Cancer Registry receives photocopies of all these death certificates through the Geneva Health Administration and links them to the incidence database. The causes of death reported on the death certificates represent the original causes of death.
Meanwhile, once a year, the Federal Office of Statistics (Office fédéral de la statistique, OFS), a national publicly funded organisation that collects death certificates and maintains a mortality database for the whole of Switzerland, provides the Geneva Cancer Registry with a mortality database for the Geneva canton. This is also linked to the incidence database to complete and/or validate the process described above. This leads to the definition of the official cause of death as the underlying cause of death derived from death certificates.
Finally, the Geneva Cancer Registry is provided on an annual basis with information on the vital status of the Canton population by the Cantonal Office of the Population (Office Cantonal de la Population, OCP). OCP is a regional administration that monitors births, deaths, migration, residency and civil partnerships. Only information about the vital status of a patient (deceased or not), or information on whether a person has migrated from Geneva is provided to the Registry. Information on the cause of death is not available within this database.
After all the records are merged, the Cancer Registry registrars go back to the patient's charts and review the cause of death according to all the documents available. These include death certificates, autopsy reports, letters written at death by general practitioners and all the patient's medical notes. This process yields the second cause-of-death variable, the revised cause of death.
Sometimes, the Geneva Cancer Registry is able to obtain information about the occurrence of a death and its cause through the health system (essentially the public health system) before information arrives from death certificates, OFS or OCP. This is particularly so for the public sector, where information about the patient's follow-up is easier to obtain than in the private sector, with which communication relies mainly on postal correspondence and on the willingness of practitioners.
Some patients leave the canton of Geneva after their diagnosis with cancer, but return and die in Geneva. These individuals are recorded as dead in the OFS database. However, since no additional information on their disease was collected in the Geneva area, they are considered lost to follow-up at the point of their departure by the Geneva Cancer Registry.

Statistical methods
We first examined the agreement between the official underlying cause of death and the reviewed underlying cause of death. We then evaluated the impact of such disagreement on the cause-specific survival estimates.
We used the Kappa statistic to compare concordance between the two cause-of-death variables for all patients who had died (N = 5,065). The Kappa statistic corrects for agreement expected by chance alone. Its value is 0 when agreement is no better than chance and 1 when agreement is perfect; negative values, down to -1, indicate agreement worse than chance. We stratified the analysis according to age at diagnosis, age at death, period of diagnosis, period of death, social class, stage, treatment received, sector of care and place of death. Age at diagnosis and age at death were coded into 5 categories (0-49, 50-59, 60-69, 70-79 and 80 and over), whilst four periods were used for the temporal analysis of diagnosis and death (1970-79, 1980-89, 1990-99, 2000-09). Social class was based on the patient's last job or, if missing, on the patient's partner's job. It was divided into four categories (high, medium, low and unknown) [12]. Stage followed the TNM classification [13] with 5 subgroups (stage I, stage II, stage III, stage IV, unknown). We distinguished 5 categories for the treatment each patient received: surgery only, surgery plus adjuvant therapy, hormonal treatment, others (including a mix of different palliative therapies), and no treatment. Only treatments received during the first six months after diagnosis are recorded by the registry, according to the IARC rules [14]. Sector of care was defined as private or public sector. We also defined 5 categories of place of death: public hospital, retirement home, private hospital, patient's home and unknown.
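As an illustration of the chance correction described above, the following minimal Python sketch computes Cohen's kappa for a 2x2 table cross-classifying the official and revised cause of death (breast cancer versus other). The function name `cohen_kappa` and the counts in the example are hypothetical and chosen for easy arithmetic; they are not the registry's data, and the sketch assumes the table is not degenerate (expected agreement below 1).

```python
def cohen_kappa(both_cancer, official_only, revised_only, both_other):
    """Cohen's kappa for a 2x2 agreement table of two cause-of-death codings.

    both_cancer   -- breast cancer death under both codings
    official_only -- breast cancer on the death certificate, other cause after review
    revised_only  -- other cause on the death certificate, breast cancer after review
    both_other    -- other cause under both codings
    """
    n = both_cancer + official_only + revised_only + both_other
    # Observed agreement: proportion of patients on the diagonal.
    observed = (both_cancer + both_other) / n
    # Marginal proportion coded as breast cancer death by each variable.
    p_official = (both_cancer + official_only) / n
    p_revised = (both_cancer + revised_only) / n
    # Agreement expected by chance if the two codings were independent.
    expected = p_official * p_revised + (1 - p_official) * (1 - p_revised)
    return (observed - expected) / (1 - expected)


# Hypothetical counts, for illustration only.
kappa = cohen_kappa(both_cancer=40, official_only=10, revised_only=5, both_other=45)
print(round(kappa, 2))
```

With these illustrative counts the observed agreement is 0.85 and the chance-expected agreement is 0.50, so raw agreement overstates concordance relative to kappa.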
We used variance-weighted least-squares regression to evaluate trends in the Kappa values for sub-groups [15]. We used logistic regression to evaluate the odds of disagreement between the official and revised cause of death, associated with each of the factors listed above.
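A trend test of this kind can be sketched as follows. This is one common formulation of variance-weighted least squares, in which each subgroup estimate is weighted by the inverse of its variance and the standard errors are treated as known; it is an assumption that this matches the exact procedure of reference [15]. The function name `vwls_trend` and the input values are hypothetical.

```python
import math


def vwls_trend(x, estimates, std_errors):
    """Variance-weighted least-squares slope of subgroup estimates on x.

    Returns (slope, standard error of the slope, two-sided p-value from a z-test).
    """
    weights = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    w_sum = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, estimates)) / w_sum
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, estimates))
    slope = sxy / sxx
    slope_se = math.sqrt(1.0 / sxx)  # valid when the variances are taken as known
    z = slope / slope_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return slope, slope_se, p_value


# Hypothetical kappa values and standard errors for five ordered age groups.
slope, slope_se, p_value = vwls_trend(
    x=[0, 1, 2, 3, 4],
    estimates=[0.87, 0.85, 0.82, 0.79, 0.74],
    std_errors=[0.03, 0.03, 0.02, 0.02, 0.03],
)
print(slope, slope_se, p_value)
```

A negative slope with a small p-value would correspond to concordance decreasing across the ordered categories.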
We also examined the concordance between the official and the revised cause of death as a function of time since diagnosis: patients who died within five years after diagnosis, patients who died after 5 years but before 10 years of follow-up and patients who died after 10 but before 15 years of follow-up. Because of small numbers, patients dying more than 15 years after their diagnosis were not considered.
To estimate the impact of discordance upon cause-specific survival, we derived Kaplan-Meier cause-specific survival curves for the whole cohort (N = 10,534) using both the official and the revised cause of death. In cause-specific survival analyses, patients are classified as presenting the event if they are recorded as dying from their cancer, while those who die from other causes are censored at the date of their death. We performed subgroup survival analysis by age group, period of diagnosis, stage of the disease and treatment.
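The censoring rule described above can be illustrated with a minimal pure-Python Kaplan-Meier sketch. The helper names (`kaplan_meier`, `cause_specific_events`), the cause labels and the five-patient cohort are all hypothetical; the example only shows how the same follow-up data yield different cause-specific curves when one death is recoded from breast cancer to another cause.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- True if the patient died of the cancer of interest at that time,
              False if censored (alive, lost to follow-up, or died of another cause)
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def cause_specific_events(died, causes, cause_of_interest="breast"):
    """Event indicator: death from the cause of interest; everything else is censored."""
    return [bool(d and c == cause_of_interest) for d, c in zip(died, causes)]


# Hypothetical cohort: follow-up in years, vital status, and two cause codings.
times = [2, 4, 6, 8, 10]
died = [True, True, True, True, False]
official = ["breast", "breast", "heart", "breast", None]
revised = ["breast", "heart", "heart", "breast", None]

curve_official = kaplan_meier(times, cause_specific_events(died, official))
curve_revised = kaplan_meier(times, cause_specific_events(died, revised))
print(curve_official)
print(curve_revised)
```

In this toy cohort, recoding the death at 4 years to heart disease turns that event into a censoring, so the curve based on the revised cause of death lies above the curve based on the official cause.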

Results
The cohort consisted of 10,534 women (mean age 61.5 years) diagnosed between 1970 and 2009. Nearly half belonged to the middle social class groups (Table 1). About three quarters of the women were diagnosed at an early stage of disease (stage I and II). Almost 90% underwent surgery, associated with adjuvant treatments such as radiotherapy (63%), hormones (44%) and chemotherapy (33%; data not shown).
Among the 5,065 women who had died, the official and the revised underlying cause of death were identical for 4,620 patients (91%) (Table 2). A total of 254 cases (5%) were recorded as dying of breast cancer according to their death certificate but as dying from other causes in the revised data. Among these women, the cause of death was mostly recoded to heart diseases (48%) and other malignant tumours (20%). Conversely, 191 cases (3.8%) were recorded as dying from other causes according to their death certificate but as dying from breast cancer in the revised data. Among these women, the main causes of death reported on their original death certificates were other malignant tumours (40%) or an imprecise code (19%) (Table 2). The overall value of the Kappa statistic was 0.82 (p-value < 0.001).
Unadjusted concordance varied greatly between subgroups (Table 3). Concordance decreased significantly with increasing age, from 0.87 for ages 0-49 to 0.74 for ages 80+ (p-value for trend test = 0.008). Similar age-related trends, though not significant, were found among the three subpopulations defined by time since diagnosis. These age-related patterns were much less marked for age at death. Concordance was comparable in all four periods of diagnosis, although it tended to be lower in the earlier periods. Concordance was greater for early stage of disease (stage I and II) than for advanced stage (III and IV), from 0.84 for stage I to 0.63 for stage IV (p-value for trend < 0.001). However, the concordance between the two underlying causes of death for women with missing stage (about 14%) tended to be higher than that for stage IV (and stage III). If these records corresponded to advanced disease, as is often the case, this stage-related pattern could be greatly attenuated. This pattern was more marked for patients who died within the first five years after diagnosis. A clear pattern was found according to the type of treatment, with higher concordance for complete treatment with curative intent (0.83), intermediate concordance for palliative treatment (0.73) and lower concordance for non-treated patients (0.63). This pattern was mostly found among patients who died within five years of diagnosis. We found no association between social class and concordance, but a higher concordance for patients who were monitored (0.86) or who died (0.85) in the private sector than for those in the public sector (0.80 and 0.76, respectively). Unadjusted odds ratios of disagreement between the official and the revised underlying causes of death are presented in Table 4 for the overall cohort and for the three subcohorts defined by length of follow-up. [...] for all patients) as well as those who died in a public hospital.
We did not observe differences by social class. We were unable to perform a logistic regression for the subcohort defined by a follow-up time between 10 and 15 years because of the small number of observations (<10) for several variables. Figure 1 presents the breast cancer-specific survival curves up to 20 years since diagnosis using the two different cause-of-death variables, for all breast cancer patients regardless of their final vital status. The survival curves matched almost perfectly, with a difference in 20-year survival lower than 1%. The estimated proportion of patients surviving after twenty years of follow-up was 60.51%, 95% CI [59.11; 61.89] when using the official cause of death and 61.26%, 95% CI [59.85; 62.64] when using the revised cause of death.
We compared cause-specific survival curves estimated with the revised and official underlying cause of death for selected subgroups (Figure 2). We estimated and presented results only if at least 10 women remained in the exposed group and/or the difference between the two curves was larger than 1%. Among patients aged 70-79 the survival at 20

Discussion and conclusion
Survival statistics derived from routinely collected population-based cancer registry data are key means of reporting progress against cancer. In the Geneva Cancer Registry, in addition to the official underlying cause of death derived from the death certificate, registrars use all the available information in order to establish, where relevant, a revised underlying cause of death which allows evaluation of the accuracy of death certification. This study describes both processes of recording the cause of death and shows their impact upon estimated survival rates from breast cancer.
The overall concordance between the official and the revised underlying cause of death was high. Differences were present for only 8.8% of the deceased patients, representing 4.2% of the entire cohort. This is consistent with the study conducted by Goldoni et al. [11] in 2009, who reported 4.3% misclassification among their cohort. The official underlying cause of death was revised to breast cancer in 191 women (3.8% of those who had died) by the cancer registry registrars; the underlying cause of death of these women had mainly been coded to other tumours. This could be explained by the presence of metastases, which may have misled the certifying doctor about the location of the primary cancer and which lead to differences in cause-specific survival estimation among metastatic patients (Figure 2).
On the other hand, most of the 254 women (5.0% of the patients who had died) who were coded as breast cancer deaths on the death certificates and considered as deaths from other causes by the registry were attributed to heart disease. Most of these women were elderly patients diagnosed during 1970-89. At that time the guidance for death certification among cancer registries was not to emphasise the cancer as a cause of death [16]. This might explain a tendency to recode the cause of death from cancer to heart disease among the elderly.
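The two discordance proportions quoted above follow from the counts reported in the Results (254 deaths recoded away from breast cancer, 191 recoded to breast cancer, 5,065 deaths and 10,534 women in the cohort). The arithmetic is written out here only as a check:

```latex
\[
\frac{254 + 191}{5\,065} = \frac{445}{5\,065} \approx 8.8\%,
\qquad
\frac{445}{10\,534} \approx 4.2\%.
\]
```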
Our results based on Kappa statistic and on logistic regression showed that disagreement was greater among elderly women, patients with advanced disease and patients receiving palliative treatment. This suggests that less attention is given by doctors certifying death to the underlying cause of death for patients who are more likely to die. Concordance is also lower within the first five years after diagnosis, suggesting that more accurate information is available to the registrars assessing the true underlying cause of death during a shorter period of follow-up.
We also observed increasing concordance in successive calendar periods of death. Since this variable closely represents the year in which the review took place, several explanations may apply. First, the Geneva Cancer Registry may have less information in more recent times. This seems unlikely since more linkages have been set up over time with the health system in the canton, allowing a greater exchange of data. More likely, the accuracy of death certificates has improved over time which has led to more confidence in the official coding supplied on death certificates.
It is legitimate to ask why the reliability of the cause of death reported on the death certificates may be questioned at all. It can be argued that the general practitioner responsible for the patient is the person most likely to be aware of the underlying cause of death, insofar as they are aware of all the clinical information and also often know the patients personally. However, this advantage is not always capitalised on. Physicians are more likely to misclassify the cause of death than a trained registrar [4,10][17][18][19]. The general practitioner is not always concerned about the epidemiological information they are providing, and may not be aware of the WHO international rules for coding the cause of death. Moreover, the general practitioner often receives the results of the autopsy after the death certificate has been issued and therefore does not take the report into account when certifying the death. The registrars of the Geneva Cancer Registry, on the other hand, are able to access all pathological and histological information and/or the clinical information for most cases and review the cause of death only in the light of the autopsy reports. In addition, the registrars are more experienced with epidemiological data and its coding. James [20] showed that coding the cause of death using death certificates only, in isolation from all other available information, led to biased interpretations of the cause of death. Our study tends to legitimise the process of verification that is performed in the Geneva Cancer Registry and suggests that the resulting estimation of survival is more accurate.
Figure 2 Up-to-20-year cancer-specific survival using (1) the cause of death based on death certificate only and (2) the cause of death reviewed by registrars, and the absolute difference between curve (1) and curve (2): female breast cancer patients diagnosed between 1970 and 2009. Selected results by co-variables. [Figure not reproduced; recoverable panel title: patients with stage IV; horizontal axis: years.]
Both methods aim to assign the cause of death as accurately as possible, and neither can be considered the gold standard. We nevertheless consider the revised cause of death to be more accurate, insofar as additional information is available to experienced registrars but not necessarily to the practitioner completing the death certificate.
Overall, the revised underlying cause of death did not have a major impact on the cause-specific survival up to 20 years. However, important differences appeared in several subgroups suggesting that using the official underlying cause of death could lead to biased estimation of cause-specific survival in some populations.
The main limitation of this study relates to the proportion of women who have died for whom information other than the death certificate was available. The more information available, the more likely it is that we will be able to find discordance. High concordance reflects either a lack of additional information available to correct the official cause of death, or that the death certificates define the cause of death fairly well. However, among our cohort of 5,065 deceased patients, a high percentage was monitored and/or died in the public sector of care, where information about cause of death is more readily available. We therefore assume that information enabling review of the underlying cause of death was available for the great majority of women who had died and that the overall high concordance between official and revised underlying cause of death is real.
Moreover, the number of deaths in the cohort influences the discordance. The more deaths there are, the more likely it is that differences between the two cause-of-death variables will be found, which lowers the concordance. This is borne out in our study by the higher discordance among the elderly. Breast cancer is not the most lethal cancer, and our results certainly cannot be generalised to other tumour sites.
The Geneva Cancer Registry data represent a unique opportunity to review the accuracy of the cause of death recorded on a death certificate by comparing it to all the available information in the health system. We observed that the overall concordance with the cause of death found on the death certificates is fairly high. More particularly, the impact on estimates of cause-specific survival is very small overall, although analyses in subgroups show larger differences, suggesting that misclassification of the underlying cause of death could lead to biased estimation of differences or trends in cause-specific survival.