Completeness and selection bias of a Belgian multidisciplinary, registration-based study on the EFFectiveness and quality of Endometrial Cancer Treatment (EFFECT)

Background With the aim of obtaining more uniformity and quality in the treatment of corpus uteri cancer in Belgium, the EFFECT project has prospectively collected detailed information on the real-world clinical care offered to 4063 Belgian women with primary corpus uteri cancer. However, as data were collected on a voluntary basis, they may be incomplete and biased. Therefore, this study aimed to assess the completeness and potential selection bias of the EFFECT database. Methods Five databases were coupled, the patient-level ones deterministically by use of the patient's national social security number. Participation bias was assessed by identifying characteristics, if any, associated with hospital participation in EFFECT. Registration bias was assessed by identifying patient, tumor and treatment characteristics, if any, associated with patient registration by participating hospitals. Uni- and multivariable logistic regression were applied. Results EFFECT covers 56% of all Belgian women diagnosed with primary corpus uteri cancer between 2012 and 2016. These women were registered by 54% of hospitals, which submitted a median of 86% of their patients. Participation of hospitals was found to be biased: low-volume and Walloon-region centers were less likely to participate. Registration of patients by participating hospitals was also found to be biased: patients with a less favorable risk profile or with missing data for several clinical-pathological risk factors, as well as patients who did not undergo curative surgery or were not discussed at a multidisciplinary tumor board, were less likely to be registered. Conclusions Due to its voluntary nature, the EFFECT database suffers from selection bias, both in terms of the hospitals choosing to participate and the patients being included by participating institutions. This study therefore highlights the importance of assessing the selection bias that may be present in any study that voluntarily collects clinical data not otherwise routinely collected.
Nevertheless, the EFFECT database covers detailed information on the real-world clinical care offered to 56% of all Belgian women diagnosed with corpus uteri cancer between 2012 and 2016, and may therefore act as a powerful tool for measuring and improving the quality of corpus uteri cancer care in Belgium. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-022-09671-5.


Background
Cancer of the uterine corpus is a common disease worldwide, particularly in high- and middle-income countries, where the highest incidence rates are seen [1]. In Belgium, with 1352 new cases in 2019, it is the most common cancer of the female genital tract and the fifth most frequent cancer in women overall [2]. Furthermore, with 382 related deaths in 2018, it is the seventh most common cause of cancer-related mortality among Belgian women [3]. This burden is projected to increase further for women over the age of 70 years [4].
In recent years, the management of corpus uteri cancer has changed and improved substantially. However, several aspects of its treatment remain highly controversial [5][6][7][8], such as the role of lymphadenectomy in staging and treatment [9,10]. As a result, wide variations in clinical practice are observed between hospitals in Belgium, and many patients receive suboptimal care that is not in accordance with guidelines [11]. In our opinion, this constitutes one of the major concerns for women diagnosed with corpus uteri cancer in Belgium.
The EFFECT (EFFectiveness and quality of Endometrial Cancer Treatment) project was launched with the objective of obtaining more uniformity and quality in the treatment of corpus uteri cancer in Belgium [12]. Quality of care (from diagnosis to follow-up) will be measured by means of quality indicators [13] and improved by means of feedback and benchmarking provided to the hospitals involved [14]. For this purpose, EFFECT has prospectively collected detailed information on the real-world clinical care offered to 4063 Belgian women diagnosed with primary corpus uteri cancer between 2012 and 2016. This information was collected on a voluntary basis via an online registration module of the Belgian Cancer Registry. The major advantage of this approach is that it enables a highly detailed and meaningful assessment of the clinical care offered by hospitals and healthcare teams [15]. Its major disadvantage, however, is that, owing to its voluntary nature, the data may be incomplete and suffer from selection bias, as was demonstrated in a highly similar quality of care initiative performed in Belgium for rectal cancer (i.e., PROject on CAncer of the REctum; PROCARE) [16,17]. Such selection bias is not necessarily problematic, as long as it is identified, characterized, and taken into account in the analysis and interpretation of the data. Therefore, the present study aimed to assess and characterize the completeness and potential selection bias of the EFFECT database.

Data sources
Four databases were deterministically coupled by use of the patient's national social security number as unique identifier: the Belgian Cancer Registry (BCR) database, the database of the InterMutualistic Agency (IMA), the Crossroads Bank for Social Security (CBSS), and the EFFECT database. Because cancer registration is compulsory in Belgium, the BCR is a national population-based registry that covers basic information (regarding both the patient and the tumor) on at least 98% of all incident cancer diagnoses in Belgium [18,19]. Consequently, the BCR serves as the gold standard for cancer registration in Belgium. The IMA is a national registry covering information on the (cancer-related) diagnostic and therapeutic procedures, as well as pharmaceuticals, reimbursed to the patient by the Belgian compulsory health insurance. The CBSS covers, amongst other things, data on the vital status of the patient. Finally, a fifth database, provided by the public health authorities, covered the characteristics of all Belgian hospitals that were recognized as a general acute hospital on December 31st, 2016 [20].
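Deterministic coupling on a unique identifier amounts to an exact-key join. The sketch below is a minimal illustration under stated assumptions, not the registry's actual pipeline: each source is modeled as a Python dictionary keyed by a hypothetical patient identifier, the BCR is treated as the base population, and the other patient-level sources are left-joined onto it so that absence from a source stays visible (for instance, no EFFECT record means the patient was not registered). The helper name `link_on_id` and the record fields are illustrative only.

```python
from typing import Any, Optional

Record = dict[str, Any]


def link_on_id(
    base: dict[str, Record], **others: dict[str, Record]
) -> dict[str, dict[str, Optional[Record]]]:
    """Left-join every source in `others` onto `base` by exact identifier match.

    Hypothetical helper: the identifier is assumed to be identical across
    sources, so no probabilistic (fuzzy) matching is needed.
    """
    linked: dict[str, dict[str, Optional[Record]]] = {}
    for patient_id, base_record in base.items():
        row: dict[str, Optional[Record]] = {"bcr": base_record}
        for source_name, source in others.items():
            # None signals that the patient is absent from this source.
            row[source_name] = source.get(patient_id)
        linked[patient_id] = row
    return linked


# Toy example with made-up identifiers and fields.
bcr = {"p1": {"icd_o": "C54"}, "p2": {"icd_o": "C55"}}
ima = {"p1": {"procedures": ["hysterectomy"]}}
effect = {"p2": {"registered_by": "H7"}}

linked = link_on_id(bcr, ima=ima, effect=effect)
print(linked["p1"]["effect"] is None)  # True: p1 has no EFFECT record
print(linked["p2"]["ima"] is None)     # True: p2 has no IMA data
```

Keeping the BCR as the left-hand side mirrors its role as the population-based reference: every BCR case survives the join, and missing IMA or EFFECT records can then be counted or excluded explicitly.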

Study population
The following patients were retrieved from the BCR database: all 7239 Belgian women diagnosed between 2012 and 2016 with a primary corpus uteri cancer (C54-C55; International Classification of Diseases for Oncology, third edition) eligible for EFFECT. See the online manual for the in- and exclusion criteria of the EFFECT project [21]. Patients for whom IMA data were not available (n = 152) were excluded. Patients for whom IMA data were less reliable were also excluded: cases with a synchronous malignancy (n = 471) or an uncertain incidence date (n = 8). A synchronous malignancy was defined as a second primary cancer diagnosed between 3 months before and 12 months after the corpus uteri cancer incidence date, regardless of topography and morphology, except non-melanoma skin cancer. Finally, patients for whom the center of main treatment could not be identified were also excluded (n = 9) (see below). This yielded a final cohort of 6599 patients.
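The exclusion steps and the synchronous-malignancy window can be checked with a few lines of Python. This is a sketch under assumptions: the text does not specify how month offsets are handled at day level, so the hypothetical helper `add_months` clamps the day to the length of the target month, and the window is treated as inclusive at both ends. The dates used to exercise `is_synchronous` are made up; the counts are those reported above.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months.

    Assumption: the day is clamped to the target month's length
    (e.g. March 31 minus one month gives the last day of February).
    """
    index = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_synchronous(index_date: date, other_date: date, other_is_nmsc: bool) -> bool:
    """Second primary diagnosed from 3 months before to 12 months after the index date.

    Non-melanoma skin cancer (NMSC) never counts, as stated in the text.
    """
    if other_is_nmsc:
        return False
    return add_months(index_date, -3) <= other_date <= add_months(index_date, 12)


# Cohort flow as reported in the text.
retrieved = 7239
exclusions = {
    "no IMA data": 152,
    "synchronous malignancy": 471,
    "uncertain incidence date": 8,
    "center of main treatment not identified": 9,
}
final_cohort = retrieved - sum(exclusions.values())
print(final_cohort)  # 6599
```

The subtraction confirms that the four exclusion groups (640 patients in total) account exactly for the difference between the 7239 retrieved cases and the final cohort of 6599.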

Hospital allocation and hospital volume
IMA data allowed us to identify the hospital(s) where the patient was treated. Patients were allocated to one specific hospital, defined as the center of main treatment, by use of the following algorithm. First, if all care was performed in one single hospital, this center was considered the center of main treatment. Second, if care was performed in more than one hospital, the following priority rules were applied for defining the center of main treatment: center of (a) curative surgery, (b) chemotherapy, (c) radiation therapy, (d) hormone therapy, (e) multidisciplinary tumor board (MDT), (f) diagnostic biopsy, and (g) diagnostic imaging. If no treatment centers were known, the patient was assigned to the hospital that registered the patient to the BCR. The center of main treatment could not be identified for nine patients.
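The allocation rules above translate naturally into a priority lookup. The following is a minimal sketch, not the authors' implementation: the input structure (a mapping from care type to the set of hospitals where that care was delivered) and the function name are hypothetical, and the text does not say what happens when a single care type was delivered in several hospitals, so the sketch assumes that such a tie simply passes control to the next rule.

```python
from typing import Optional

# Priority rules (a) to (g) from the text, highest priority first.
PRIORITY = (
    "curative_surgery",
    "chemotherapy",
    "radiation_therapy",
    "hormone_therapy",
    "mdt",
    "diagnostic_biopsy",
    "diagnostic_imaging",
)


def center_of_main_treatment(
    care: dict[str, set[str]], bcr_hospital: Optional[str] = None
) -> Optional[str]:
    """Return the center of main treatment, or None if it cannot be identified.

    `care` maps a type of care to the set of hospitals where it was delivered
    (hypothetical structure derived from reimbursement data).
    """
    hospitals = set().union(*care.values())
    if not hospitals:
        # No treatment centers known: fall back to the hospital that
        # registered the patient to the BCR (which may itself be unknown).
        return bcr_hospital
    if len(hospitals) == 1:
        # All care was performed in one single hospital.
        return next(iter(hospitals))
    for care_type in PRIORITY:
        centers = care.get(care_type, set())
        if len(centers) == 1:
            return next(iter(centers))
        # Assumption: if this care type is absent or was delivered in
        # several hospitals, move on to the next priority rule.
    return None  # center of main treatment could not be identified


print(center_of_main_treatment({"curative_surgery": {"H1"}, "chemotherapy": {"H2"}}))  # H1
```

Ordering the rules as a tuple keeps the priority explicit and makes it easy to test alternative tie-breaking choices against the nine unidentifiable cases.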
A hospital's volume was then defined as the number of patients who underwent their main treatment in that specific hospital over the period 2012-2016. Based on the average annual volume, hospitals were categorized as low-, medium-, or high-volume using the following cut-off values: < 10, 10-19, and ≥ 20 patients treated on average per year, respectively. These cut-off values are arbitrary; they were based on expert opinion as well as on the need for a balanced distribution of centers and patients over the volume categories.
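The volume categories can be derived directly from the patient-to-hospital allocation. In this sketch the hospital identifiers and patient counts are made up, and, because an average over five years need not be an integer, the "10-19" band is assumed to mean an average of at least 10 but below 20 patients per year.

```python
from collections import Counter

YEARS = 5  # study period 2012-2016


def volume_category(patients_in_period: int, years: int = YEARS) -> str:
    """Classify a hospital by its average annual number of main-treatment patients."""
    average_per_year = patients_in_period / years
    if average_per_year < 10:
        return "low"
    if average_per_year < 20:  # assumption: "10-19" read as [10, 20)
        return "medium"
    return "high"


# Hypothetical allocations: one center-of-main-treatment entry per patient.
allocations = ["H1"] * 49 + ["H2"] * 50 + ["H3"] * 120
volumes = Counter(allocations)
categories = {hospital: volume_category(n) for hospital, n in volumes.items()}
print(categories)  # {'H1': 'low', 'H2': 'medium', 'H3': 'high'}
```

With five years of data, 49 patients (9.8 per year) falls just below the low/medium boundary, while 50 patients (10.0 per year) falls just above it.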

Patient subgroups
The study population was categorized into four patient subgroups: (a) patients registered for EFFECT (Registered EFFECT-Patients, REP); (b) patients not registered for EFFECT who underwent their main treatment during a participating center's active registration period, and therefore should have been registered (Non-Registered EFFECT-Patients, Non-REP); (c) patients not registered for EFFECT who underwent their main treatment outside of a participating center's active registration period, and therefore could not have been registered (Non-EFFECT-A); and (d) patients not registered for EFFECT who underwent their main treatment in a non-participating center, and therefore could not have been registered (Non-EFFECT-B). A participating center's active registration period was determined by chronologically ranking all its registered cases by incidence date, and was defined as the interval from the first to the last incidence date. See Fig. 1 for more detailed information.
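The subgroup definitions can be expressed as a short classification routine. This is a sketch under assumptions not stated in the text: a center is treated as participating if it has at least one registered case, registered cases are attributed to their center of main treatment, and both ends of the active registration period are inclusive. The function names and example data are hypothetical.

```python
from datetime import date


def active_periods(registered: list[tuple[str, date]]) -> dict[str, tuple[date, date]]:
    """First and last incidence date of the registered cases of each center."""
    periods: dict[str, tuple[date, date]] = {}
    for hospital, incidence in registered:
        first, last = periods.get(hospital, (incidence, incidence))
        periods[hospital] = (min(first, incidence), max(last, incidence))
    return periods


def subgroup(is_registered: bool, main_center: str, incidence: date,
             periods: dict[str, tuple[date, date]]) -> str:
    """Assign one of the four patient subgroups described in the text."""
    if is_registered:
        return "REP"
    period = periods.get(main_center)
    if period is None:
        # Assumption: no registered case at all means non-participating center.
        return "Non-EFFECT-B"
    first, last = period
    if first <= incidence <= last:
        return "Non-REP"  # should have been registered
    return "Non-EFFECT-A"  # outside the center's active registration period


# Made-up registered cases: (center of main treatment, incidence date).
periods = active_periods([
    ("H1", date(2013, 1, 10)),
    ("H1", date(2015, 6, 1)),
    ("H2", date(2014, 2, 2)),
])
print(subgroup(False, "H1", date(2014, 1, 1), periods))  # Non-REP
```

Only the REP and Non-REP groups produced by this classification enter the registration-bias analysis described in the next section.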

Participation and registration bias
Hospital participation bias was assessed and characterized by identifying characteristics associated with the (non-)participation of hospitals in EFFECT. Patient registration bias was assessed and characterized by identifying patient, tumor and treatment characteristics associated with the (non-)registration of patients by participating hospitals. For the latter, only REP and Non-REP patients were taken into account.

Statistical analyses
Summary statistics are expressed as medians and (interquartile) ranges for continuous data, and as frequencies and percentages for categorical data. Uni- and multivariable logistic regression were applied to assess characteristics associated with hospital participation and patient registration in EFFECT. Characteristics to include in the multivariable model were selected based on clinical relevance and on the results of the univariable analysis (p < 0.10 was considered of interest). Goodness-of-fit was assessed by the Hosmer-Lemeshow goodness-of-fit test, the chi-squared test of the Pearson and deviance residuals, and visual inspection of model residuals. For assessing registration bias, clustering of patients within hospitals (intra-cluster correlation) was taken into account by adding the 'center of main treatment' as a random-effect term to the final model. All statistical tests were two-sided, and p-values below 0.05 were considered statistically significant. Statistical analyses were performed using SAS 9.4 (SAS Institute, Cary, NC, USA).
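The Hosmer-Lemeshow test mentioned above sorts observations by predicted probability, splits them into g groups (commonly ten), compares observed with expected numbers of events in each group, and refers the resulting statistic to a chi-squared distribution with g - 2 degrees of freedom. The pure-Python sketch below computes only the statistic and its degrees of freedom. The equal-size grouping after sorting is a simplification and is not guaranteed to reproduce the partitioning used by SAS (for example, the LACKFIT option of PROC LOGISTIC); the function name is hypothetical.

```python
def hosmer_lemeshow(y: list[int], p: list[float], groups: int = 10) -> tuple[float, int]:
    """Hosmer-Lemeshow statistic and degrees of freedom for a binary model.

    y: observed outcomes (0/1); p: predicted probabilities from the fitted model.
    Simplified grouping: equal-size groups after sorting by predicted probability.
    """
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    # Sort by predicted probability only; ties keep their input order.
    pairs = sorted(zip(p, y), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    used_groups = 0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        used_groups += 1
        n_g = len(chunk)
        observed = sum(outcome for _, outcome in chunk)
        expected = sum(prob for prob, _ in chunk)
        mean_p = expected / n_g
        variance = n_g * mean_p * (1 - mean_p)
        if variance > 0:
            # (O - E)^2 / (n * pbar * (1 - pbar)) per group
            statistic += (observed - expected) ** 2 / variance
    return statistic, used_groups - 2


stat, df = hosmer_lemeshow([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9], groups=2)
print(stat, df)
```

A large statistic relative to the chi-squared distribution with `df` degrees of freedom signals poor calibration; the p-value itself would come from a statistics package.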

Descriptives
During the 2012-2016 period, 101 Belgian hospitals were involved in the treatment of corpus uteri cancer, of which 49.5% (n = 50) were low-volume, 30.7% (n = 31) medium-volume, and 19.8% (n = 20) high-volume (Table 1). These hospitals took care of 22.7% (n = 1496), 31.9% (n = 2106), and 45.4% (n = 2997) of cases, respectively (Table 2). Considering the entire study population (n = 6599), 60.6% (n = 3998) of cases were diagnosed between the ages of 60 and 79 years. When known, 79.4% (n = 4948) of cases were diagnosed with early-stage disease (stage 0-II). Endometrial carcinomas and uterine sarcomas accounted for 95.5% (n = 6302) and 4.5% (n = 297) of cases, respectively. Of the carcinomas, 64.6% (n = 4073) were of the type I subclass and 28.8% (n = 1812) of the type II subclass. Curative surgery was the primary treatment for 86.9% (n = 5732) of patients, with total hysterectomy (TH) being performed most frequently (n = 3084; 53.8%). Finally, 92.7% (n = 6119) of patients were discussed in at least one MDT meeting (Table 2).
Fig. 1 The objective is to identify those patients who were registered for EFFECT by the participating centers, and those who were not but should have been. First, based on EFFECT data, the study population was categorized into patients registered and not registered for EFFECT. Next, within the group of non-registered cases, a further distinction was made based on whether main treatment was performed in an EFFECT-participating center and, if so, whether the patient's incidence date falls inside that hospital's active registration period. This way, four patient subgroups were defined (REP, Non-REP, Non-EFFECT-A, and Non-EFFECT-B), with a participating center's active registration period defined as the interval from the first to the last incidence date of its registered cases.
Table 1 Center characteristics. Distribution of volume, region, university status and ownership status for (a) all centers eligible for EFFECT participation (n = 101), (b) the participating centers (n = 55), and (c) the non-participating centers (n = 46). For each subgroup of hospitals with a certain characteristic, the participation rate was calculated as the percentage of centers that participated in EFFECT out of all centers in that subgroup.
Table 2 notes: a REP = Registered EFFECT-Patients, Non-REP = Non-Registered EFFECT-Patients, Non-EFFECT-A = non-registered patients treated outside of the active registration period of a participating center, Non-EFFECT-B = non-registered patients treated in a non-participating center. b World Health Organization (WHO) performance status score, expressing the patient's general health condition at diagnosis, ranging from 0 (asymptomatic) to 4 (completely disabled/bedbound) [22]. c Index quantifying the prevalence of three major chronic comorbid conditions (i.e., diabetes mellitus, chronic cardiovascular disease, and chronic respiratory disease), ranging from 0 (no comorbidity present) to 3 (all three comorbidities present) [23].

Hospital participation
Of the 101 hospitals treating corpus uteri cancer in the period 2012-2016, 55 (54.5%) participated in EFFECT. Low-volume centers and centers from the Walloon region were significantly less likely to participate, and are therefore underrepresented in EFFECT. In the multivariable model, volume and region were identified as the main independent explanatory factors for the (non-)participation of hospitals in EFFECT (Tables 1 and 3).

Patient registration
Of the 7239 corpus uteri cancer cases retrieved from the BCR database, 4063 (56.1%) were registered in the EFFECT database. Patient registration rates varied widely between participating centers, which registered a median of 85.7% of the cases treated during their active registration period (interquartile range = 80.4%-94.4%, range = 41.2%-100.0%) (Fig. 2). Patients aged 80 years and older, with a WHO (World Health Organization) score of ≥ 2 or missing, with a multiple tumor, with stage IV disease or missing stage, and those diagnosed with a uterine sarcoma or other carcinoma (i.e., one that could not be classified as either type I or type II) were all significantly less likely to be registered for EFFECT by the participating centers. The same held for patients who did not undergo curative surgery as primary treatment, who were not discussed at an MDT meeting, and who died within the first 30 days after surgery. In the multivariable model, WHO score, combined stage, type of primary treatment, and discussion at an MDT meeting were identified as the main independent explanatory factors for the (non-)registration of patients by the participating centers (Table 4; Supplementary table 1). Significant differences were also found between patients from participating centers (REP + Non-REP + Non-EFFECT-A) and those from non-participating centers (Non-EFFECT-B). The latter were older (odds ratio (OR) ≥80 years = 1.16, 95% confidence interval (CI) = 1.02-1.32), less frequently underwent treatment (OR treatment = 0.80, 95% CI = 0.64-1.00), and were less often discussed at an MDT meeting (OR MDT = 0.52, 95% CI = 0.43-0.63). The surgery rate did not differ, but patients from non-participating centers more frequently underwent total radical hysterectomy (TRH) (OR TRH = 1.32, 95% CI = 1.17-1.49) and adjuvant treatment (OR adj. treat. = 1.20, 95% CI = 1.06-1.35).
Finally, they also more frequently had missing data for WHO score (OR missing = 2.20, 95% CI = 1.90-2.55), combined stage (OR missing = 1.54, 95% CI = 1.24-1.92), and differentiation grade (OR missing = 1.92, 95% CI = 1.57-2.35) (ORs and CIs were calculated based on the data presented in Table 2).
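The text states that these ORs and CIs were calculated from the counts in Table 2 but does not name the method. A common choice, assumed here, is the crude odds ratio of a 2x2 table with a Wald interval on the log scale: OR = ad/bc and SE(ln OR) = sqrt(1/a + 1/b + 1/c + 1/d). The function name and the counts in the example are hypothetical; the actual Table 2 counts are not reproduced.

```python
import math
from statistics import NormalDist


def odds_ratio_wald(a: int, b: int, c: int, d: int,
                    level: float = 0.95) -> tuple[float, float, float]:
    """Crude odds ratio with a Wald confidence interval from a 2x2 table.

    Group 1 (e.g. Non-EFFECT-B): a = with the characteristic, b = without.
    Group 2 (e.g. participating centers): c = with the characteristic, d = without.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts, not taken from Table 2.
print(odds_ratio_wald(20, 80, 10, 90))
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR itself, which is why reported intervals such as 1.02-1.32 around 1.16 are not centered on the point estimate.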

Discussion
Because of its voluntary nature, the EFFECT database was found to be incomplete and somewhat biased, both in terms of the hospitals choosing to participate and the patients being registered by participating centers. More precisely, low-volume and Walloon-region centers were less likely to participate in EFFECT. Furthermore, participating hospitals were less likely to include patients with a less favorable risk profile, with missing data for several clinical-pathological risk factors, who did not undergo curative surgery, or who were not discussed at a multidisciplinary tumor board. Finally, clinical practice patterns were found to differ between participating and non-participating institutions. The observed participation bias could potentially be explained by two mechanisms. First, despite our efforts to inform all hospitals about EFFECT, low-volume and Walloon-region centers might have been informed to a lesser extent. Second, low-volume centers in particular might not have had the resources necessary to participate (e.g., time, funding, personnel and technical support). Furthermore, the observed registration bias could potentially be explained by three mechanisms.
First, in at least some of the participating institutions, EFFECT registration might have been performed by the healthcare team itself, which might have preferred to include mainly patients whom they treated curatively. Second, as many aspects of the patient's treatment scheme were already known at the time of first registration, this information might have biased the decision whether to include the patient. For instance, when standard of care was offered but refused, one could have decided not to include the patient. Third, EFFECT registration might have been more time-consuming and labor-intensive for certain cases. At this point, these mechanisms are merely theoretical and require further investigation.
Table 3 Center characteristics associated with hospital participation in EFFECT. Estimated odds ratios (ORs) for participation of hospitals in EFFECT. ORs are expressed together with their corresponding 95% Wald confidence interval (CI) and p-value. P-value (specific) expresses the statistical significance of the specific comparison with the reference group (ref), whereas p-value (overall) expresses the statistical significance of the overall association of the characteristic under investigation with the outcome of interest (i.e., hospital participation status: participating or non-participating).
PROCARE is a quality of care initiative that was performed in Belgium in the context of rectal cancer and that also relied on hospitals to voluntarily register healthcare data [16]. A study by Jegou et al. found the PROCARE database to be incomplete and biased in a way highly similar to EFFECT. More precisely, they also found that low-volume, Walloon-region and non-university centers were less likely to participate. Furthermore, participating centers were less likely to include patients with a less favorable risk profile and patients who did not undergo surgical resection. Overall, the PROCARE database was found to cover 37% of all Belgian rectal cancer patients. These were registered by 72% of the centers involved, which included 56% of their cases [17]. A similar underreporting of hospitals and cases has also been described by other clinical audit programs relying on voluntary participation [24][25][26][27].
In line with the previously described facilitators of and barriers to clinical audit [28,29], two survey-based studies by Cornish et al. and Voeten et al. recently found that hospitals and healthcare providers generally consider clinical audit programs to be a powerful and relevant tool for improving clinical practice and patient outcomes. However, lack of resources (e.g., technical support, time, personnel and funding) was found to be one of the major reasons for non-participation [30,31]. Our results reflect these findings: most hospitals and healthcare teams had a positive attitude towards EFFECT, but many, particularly low-volume centers, might not have had the resources necessary to participate.
Conflicting results have been reported by studies comparing the performance of hospitals and healthcare providers that do participate voluntarily in clinical audit with the performance of those that do not [24,26,32,33]. Similarly, although differences were found in the clinical practice of centers participating and not participating in EFFECT, whether this reflects real differences in quality of care warrants further investigation.
Altogether, for the purpose of measuring and improving the quality of cancer care, these findings highlight the feasibility of voluntarily collecting detailed information on the real-world clinical care offered to the patient, from diagnosis to follow-up. Compared to the use of routinely available administrative data, the major advantage of this approach is that it enables a more detailed and meaningful assessment of clinical practice [15]. Nevertheless, in contrast to administrative databases, which are highly complete and free of selection bias, the major disadvantage of this approach is that such clinical databases are at risk of being incomplete and biased, both in terms of the hospitals choosing to participate and the patients being registered by the participating institutions. As a result, the hospitals that would arguably benefit most from quality improvement (i.e., low-volume hospitals) tend not to participate [34][35][36][37]. Furthermore, the bias that tends to be present in the registration of patients may substantially complicate the assessment of clinical practice in participating hospitals. Consequently, to enable meaningful interpretation and feedback, this bias should always be characterized and taken into account. Moreover, because clinical audit programs can only promote quality improvement on the national level if they cover all hospitals and patients involved, measures should be taken to prevent such selection bias as much as possible.
Based on the aforementioned mechanisms that could be driving the observed selection bias, we present several methods that could reduce the risk of bias in the registration of data and thereby further enhance the potential of clinical audit programs to promote quality improvement. First, we suggest making participation in clinical audit less resource-intensive, so that centers and healthcare providers with fewer resources are also able to participate. This could potentially be done by making the data extraction and registration process more automated, or by giving technical and/or financial support to participating institutions [28][29][30][31].
Second, we suggest ensuring that all centers and healthcare teams involved are sufficiently informed about the project. This could possibly be achieved by presenting the rationale and importance of the project in person, preferably by a colleague renowned in the field [14]. Third, we suggest that patient registration be performed by someone independent of the healthcare team, preferably a data manager specifically trained in cancer registration. Fourth, we suggest that patients be registered at the time of diagnosis, not when many aspects of the treatment scheme are already known. Finally, we suggest rewarding institutions and healthcare teams for their active participation in clinical audit, on the condition that their participation is of sufficient quality (i.e., a high enough proportion of patients is registered without selection bias). This could potentially be achieved through some form of accreditation. However, these suggestions are merely theoretical and require further investigation.
Table 4 Estimated odds ratios (ORs) for being registered for EFFECT (REP) when having undergone main treatment during the active registration period of a participating center (REP + Non-REP). ORs are expressed together with their corresponding 95% Wald confidence interval (CI) and p-value. P-value (specific) expresses the statistical significance of the specific comparison with the reference group (ref), whereas p-value (overall) expresses the statistical significance of the overall association of the characteristic under investigation with the outcome of interest (i.e., patient registration status: REP or Non-REP). a World Health Organization (WHO) performance status score, expressing the patient's general health condition at diagnosis, ranging from 0 (asymptomatic) to 4 (completely disabled/bedbound) [22]. b Index quantifying the prevalence of three major chronic comorbid conditions (i.e., diabetes mellitus, chronic cardiovascular disease, and chronic respiratory disease), ranging from 0 (no comorbidity present) to 3 (all three comorbidities present) [23]. c Whether another primary cancer was present in the 5-year period prior to diagnosis, regardless of topography and morphology, except non-melanoma skin cancer.
The work presented has some limitations, mainly associated with the databases used. First, although the BCR database has excellent coverage of nearly all incident cancer cases in Belgium, its data on WHO score, combined stage and differentiation grade were missing for a substantial number of patients. Second, although IMA data were pivotal for this study, they had some major limitations: (a) miscoding or misuse of nomenclature might have occurred; (b) nomenclature was often vague and unspecific, which made detailed analyses and interpretation of the data difficult; and (c) the number of patients who underwent a certain medical procedure may have been under- or overestimated, because nomenclature cannot always be unambiguously linked to one specific indication, or because not all procedures are reimbursed (e.g., when performed in the context of a clinical trial). Different measures were taken to tackle these limitations. For example, cases with missing data were included in the analyses as a separate category within the respective variable, and patients with less reliable IMA data were excluded.
At the same time, these national population-based databases are the major strength of our study: being highly complete and covering all corpus uteri cancer cases and institutions involved, they allowed us to accurately assess the completeness and potential selection bias of the EFFECT database.
Future studies should focus on unraveling the underlying mechanisms that are driving the selection bias observed in clinical audit programs, as well as on effective ways to counteract these mechanisms. This knowledge could then be applied to further enhance the potential of clinical audit programs to promote quality improvement in healthcare on the national level.

Conclusion
For the purpose of measuring and improving the quality of cancer care, the present study highlights the feasibility of voluntarily collecting detailed information on the real-world clinical care offered to cancer patients, from diagnosis to follow-up. Compared to the use of routinely available administrative data, the major advantage of this approach is that it enables a more detailed and meaningful assessment of clinical practice. However, in contrast to administrative databases, which are highly complete and free of selection bias, the major disadvantage of this approach is that such clinical databases are at risk of being incomplete and of suffering from selection bias, both in terms of the hospitals choosing to participate and the patients being registered by participating institutions. This bias should therefore always be assessed, characterized, and taken into account in the analysis and interpretation of the data. Furthermore, to truly promote quality improvement on the national level, measures should be taken to prevent such bias as much as possible. To conclude, despite the observed selection bias, the EFFECT database covers detailed information on the real-world clinical care offered to 56% of all Belgian women diagnosed with corpus uteri cancer between 2012 and 2016. The database may therefore act as a unique and powerful tool for measuring and improving the quality of corpus uteri cancer treatment in Belgium.