Are socio-economic inequalities in breast cancer survival explained by peri-diagnostic factors?

Background Patients living in more deprived localities have lower cancer survival in England, but the role of individual health status at diagnosis and the utilisation of primary health care in explaining these differentials has not been widely considered. We set out to evaluate whether pre-existing individual health status at diagnosis and primary care consultation history (peri-diagnostic factors) could explain socio-economic differentials in survival amongst women diagnosed with breast cancer. Methods We conducted a retrospective cohort study of women aged 15–99 years diagnosed in England using linked routine data. Ecologically-derived measures of income deprivation were combined with individually-linked data from the English National Cancer Registry, Clinical Practice Research Datalink (CPRD) and Hospital Episodes Statistics (HES) databases. Smoking status, alcohol consumption, BMI, comorbidity, and consultation histories were derived for all patients. Time to breast surgery was derived for women diagnosed after 2005. We estimated net survival and modelled the excess hazard ratio of breast cancer death using flexible parametric models. We accounted for missing data using multiple imputation. Results Net survival was lower amongst more deprived women, with a single unit increase in deprivation quintile associated with a 4.4% (95% CI 1.4–8.8) increase in excess mortality. Peri-diagnostic co-variables varied by deprivation but did not explain the differentials in multivariable analyses. Conclusions These data show that socio-economic inequalities in survival cannot be explained by consultation history or by pre-existing individual health status, as measured in primary care. Differentials in the effectiveness of treatment, beyond those measuring the inclusion of breast surgery and the timing of surgery, should be considered as part of the wider effort to reduce inequalities in premature mortality.


Background
Patients living in more deprived localities have lower cancer survival in England [1][2][3][4]. The avoidable mortality associated with these socio-economic differences is considerable [5]. There are three potential routes by which these inequalities might arise [6,7]: tumour factors (more aggressive disease, more advanced disease arising from differential ease of access and availability of appointments and/or screening), patient factors (differential pre-existing comorbidities, health or nutritional status, leading to less effective treatment or under-treatment), and health system factors (differential referral patterns from primary care, or differential treatment within secondary care).
To date, work on the relative contribution of these mechanisms to the persistence of socio-economic differences in England has focussed on a variety of factors. These include patterns of survival by screening status [8][9][10], analyses of routine data from secondary care [11][12][13][14][15][16] and the equalisation of treatment [17][18][19]. Factors measured in primary health care, such as the presence of other diseases, obesity, smoking history and alcohol consumption, as well as the total number of consultations attended by the patient, may also be associated with these inequalities. However, their role in explaining survival differentials has not been considered outside our own analysis of screening-eligible women diagnosed with breast cancer [20,21].
In this study, we specifically consider the relative impact of a) pre-existing individual health status (comorbidity and detrimental health behaviours) together with b) primary care consultation history upon socioeconomic patterns in breast cancer survival, using linked routine cancer registration and primary care data. These factors represent potentially modifiable factors which could help to reduce inequalities and avoidable mortality for women with breast cancer as well as for patients diagnosed with other socio-economically patterned diseases.

Data sources
The English National Cancer Registry (CR) was individually linked to the Clinical Practice Research Datalink 'GOLD' (CPRD), which contains data contributed by practices using Vision® software [22], and to the Hospital Episodes Statistics (HES) database. The CR-CPRD linkage took place on two occasions: in 2010 for diagnoses 1988-2004 and in 2016 for diagnoses 2005-2010. HES data were available for the later period only.

Deprivation
We used ecologically-derived measures of income deprivation for each woman: quintiles of the 1991 census-based Carstairs index [23] for women diagnosed 1988-1995, and the English Indices of Multiple Deprivation (IMD) income domain from 1998 onwards [24]. Although each of these scores uses slightly different underlying variables, they both aim to quantify relative deprivation by computing a score from the socioeconomic characteristics of very small areas using the census or routinely collected administrative data (Carstairs: car ownership, overcrowding, social class and unemployment; IMD: receipt of various means-tested benefits). The areas used for each score are those defined at the UK's decennial census (EDs in 1991, c.500 persons; LSOAs in 2001 and 2011, c.1500 persons but designed to be as socially homogenous as possible) and are the smallest administrative geography available at any given time point. Deprivation categories were derived from the score temporally closest to each woman's date of diagnosis on the basis of her residential address.

Co-variables
We used information from the cancer registry to derive each woman's date and age at diagnosis, tumour characteristics and date of death (if applicable). We derived stage of disease at diagnosis using all relevant available clinical information [25]. Each woman's individual smoking status (non- or ex-smoker, current smoker), alcohol consumption status (non-, ex-, current drinker) and body mass index [26] were extracted from CPRD records as previously described [20]. The Charlson comorbidity score [27] was derived from data recorded in the 18-month period between 2 years and 6 months before diagnosis [28], using information from both CPRD and HES data for patients diagnosed after 2005. The total number of consultations, the numbers of "breast-related" and "not breast-related" consultations, and the number of referrals for breast cancer were derived for the 18-month period immediately prior to diagnosis. Breast-related symptoms included any mention of separate breast symptoms, within the same consultation or reported at different times, including breast lump, breast pain, skin changes, and discharging, bleeding or inverted nipples. We adopted the conservative approach of considering only consultations with a doctor (GP), excluding CPRD records relating to nurse or other practitioner appointments, as well as all administrative events such as telephone calls, letters, or the issuing of repeat prescriptions. This avoided potentially recording one symptom more than once, or inflating a woman's total number of consultations by the inclusion of non-clinical events. Time in days from the last breast-related consultation to diagnosis (as an indication of time elapsed from referral to diagnosis) was calculated for all patients, and time from diagnosis to first major breast cancer surgery (within 18 months, defined using OPCS-4 codes, the classification used by clinical coders within the National Health Service) was calculated for women diagnosed after 2005.
A specific category for missing data was available for stage, and we similarly coded women as 'missing' if no information on smoking, alcohol and BMI could be obtained. It was not possible to distinguish between 'none observed' and 'missing information' for pre-existing comorbidities, symptoms, referrals or surgery. For these variables, 'none recorded' was assumed to equate to the non-observation of the relevant factor in primary and secondary care. Multinomial regression (categorical variables) and nonparametric tests for trend (continuous variables) [29] were used to assess the differences between deprivation categories.
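The look-back windows described above can be illustrated with a short sketch. It is not the authors' extraction code: the record layout (dictionaries with `date`, `staff` and `breast_related` keys), the function names, and the choice of an inclusive start and exclusive end for each window are all assumptions made for illustration.

```python
import calendar
from datetime import date


def months_before(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`, clamping the day to month end."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def comorbidity_window(diagnosis: date) -> tuple[date, date]:
    """18-month window running from 24 months to 6 months before diagnosis."""
    return months_before(diagnosis, 24), months_before(diagnosis, 6)


def consultation_window(diagnosis: date) -> tuple[date, date]:
    """18-month window immediately before diagnosis."""
    return months_before(diagnosis, 18), diagnosis


def count_gp_consultations(events: list[dict], diagnosis: date) -> dict:
    """Count GP-only consultations in the pre-diagnosis window.

    Each event is a hypothetical record with keys 'date', 'staff' and
    'breast_related'; nurse and administrative events are ignored.
    """
    start, end = consultation_window(diagnosis)
    in_window = [
        e for e in events
        if e["staff"] == "GP" and start <= e["date"] < end  # assumed boundaries
    ]
    breast = sum(1 for e in in_window if e["breast_related"])
    return {
        "total": len(in_window),
        "breast_related": breast,
        "not_breast_related": len(in_window) - breast,
    }
```

Restricting the count to events flagged as GP consultations mirrors the conservative approach in the text, so that a single symptom recorded at a nurse appointment or in an administrative entry does not inflate a woman's total.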

Net survival estimation
Net survival is the survival probability the patients would experience if their only possible cause of death were breast cancer. It is independent from other causes of death (expected mortality, which varies in particular by age and deprivation) and reflects the prognosis of the disease. We estimated net survival by each co-variable using the non-parametric Pohar-Perme estimator [30,31] implemented in stns [32]: software available for Stata 16 [33]. This is the most widely used, unbiased estimator of net survival. Controlling for expected mortality (or its counterpart, expected survival) required the use of information from deprivation-specific life tables for the general population of England [34]. Survival estimates were derived for all co-variables for the data as a whole as well as by time period (1988-1998, 1999-2004 and 2005-2010).
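For readers less familiar with the excess hazard framework, the definition above can be written compactly. The notation is ours rather than the source's: the overall hazard of death is assumed to decompose additively into an expected (population) hazard, taken from the life tables, and an excess hazard attributable to the cancer, and net survival is the survival function implied by the excess hazard alone.

```latex
\lambda_O(t) = \lambda_P(t) + \lambda_E(t),
\qquad
S_N(t) = \exp\!\left( -\int_0^t \lambda_E(u)\,\mathrm{d}u \right)
```

Here $\lambda_O$ is the observed all-cause hazard in the cohort, $\lambda_P$ the expected hazard for women of the same age, calendar year and deprivation category in the general population, and $\lambda_E$ the excess hazard.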

Multivariable excess hazard modelling
We fitted flexible parametric excess hazard regression models using stcrs [35] in order to estimate the excess hazard ratio of death (i.e. death related to breast cancer) within the first 5 years following diagnosis. This approach models the excess hazard on the log-hazard scale, reducing computational intensity, and also allows the estimation of both time-dependent and non-linear covariable effects. We examined the mechanism giving rise to missing values for the four variables within the dataset with incomplete data (stage, smoking, alcohol consumption and BMI) using logistic regression. In order to account for the impact of these missing data in the analysis, we implemented a five-fold multiple imputation, which was enough to obtain stable estimates and variance. Imputation models were fitted separately for each deprivation quintile to enable interactions to be considered, and included all variables of interest. Missing values for BMI were derived from a linear regression model, stage from an ordered logistic model, and smoking and alcohol from multinomial regression. Estimates were recombined using Rubin's rules [36]. Initial excess hazard models included, a priori, age, year of diagnosis, deprivation and stage of disease at diagnosis. We tested for non-linearity of each of these variables using restricted cubic splines with 3 degrees of freedom (2 internal knots) for age and year, and the ordered categorical form of the variable for deprivation, using the Stata sub-command mi test [36] (p-value < 0.05). Peri-diagnostic variables which were observed to have a significant association with both deprivation and net survival in the univariable analyses were included in turn, first those relating to individual health status, then those relating to individual consultation history in primary care. Models were first fitted to all disease stages, then only to TNM stage I or II. Models were derived by follow-up time in order to assess time-variance.
Finally, we repeated all analyses restricting the cohort to diagnoses 2005-2010. We used precisely the same strategy, but included in the model the number of days from diagnosis to major breast surgery and the Charlson comorbidity score derived from both CPRD and HES.
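The recombination of estimates across imputed data sets by Rubin's rules can be sketched as follows. This is a generic illustration, not the Stata implementation used in the study: the function name and the input values are hypothetical, and the interval uses a normal approximation rather than the t-based reference distribution that Rubin's rules formally prescribe.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Pool one parameter across m imputations.

    estimates: point estimates (e.g. log excess hazard ratios), one per imputation
    variances: squared standard errors of those estimates
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical log excess hazard ratios from five imputed data sets.
log_ehr = [0.040, 0.050, 0.030, 0.045, 0.035]
se_squared = [0.0004] * 5

pooled, total_var = pool_rubin(log_ehr, se_squared)
se = math.sqrt(total_var)
# Back-transform to the hazard ratio scale (normal approximation, an assumption).
ehr = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
```

Pooling is done on the log scale because the sampling distribution of a log hazard ratio is closer to normal than that of the ratio itself; the result is exponentiated only at the end.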

Cohort & data linkage
Out of the 733,809 persons aged 16-99 years in England recorded in the National Cancer Registry as having been diagnosed with invasive breast cancer between 1 January 1988 and 31 December 2010, we analysed 21,802 women for whom follow-up was complete up to 31 December 2014 (Fig. 1).

Descriptive analyses
A third of the women died on or before the end of follow-up (Table 1). Women living in deprived areas were on average 2 years older at diagnosis and less likely to be diagnosed in the screening age range 50-69 (p-value < 0.001). They were less frequently diagnosed with localised (Stage I) disease (3.3% difference, 95% CI 1.4-5.2) and much more likely to die during the study period (11.1% difference, 95% CI 8.8-13.5). They were also more likely to be recorded as current or ex-smokers (13.9% difference, 95% CI 11.6-16.3), non- or ex-drinkers (15.7% difference, 95% CI 13.6-17.8), and to have a recorded BMI above 24 (11.6% difference, 95% CI 9.3-14.1). There was a very strong linear association between pre-existing co-morbidities and deprivation, with 82.2% of women living in the least deprived areas having no pre-existing condition compared to 70.7% of women living in the most deprived areas (difference 11.4%, 95% CI 9.6-13.5). Women in deprived areas had a higher mean number of consultations overall (9.6 vs 8.5, p-value < 0.001), but a slightly lower number of breast-related consultations compared with women living in more affluent areas (0.4 fewer, 95% CI 0.1-0.8).
Women living in the most deprived areas reported a similar number of breast symptoms to the GP prior to diagnosis as women living in the most affluent areas (53.9% vs 53.1% reporting at least 1). However, women living in middle-to-deprived areas (quintiles 3 and 4) reported fewer (p-value < 0.01). The average time from symptom report to diagnosis was longest amongst women in the most affluent two quintiles (32.7 days) but not notably shorter in any other group (30.7-32.4 days). These overall patterns were similar in the data set restricted to diagnoses after 1 January 2005 (data not shown).
Using information from the HES database to calculate the Charlson co-morbidity score for women diagnosed after 2005 made only a modest difference: 76.4% of the cohort were identified as having no comorbidities without HES data, compared with 71.4% with HES data (Table 1). The distribution of co-morbidities overall was similar, with 17.5% having one significant co-morbidity. A similarly strong association with deprivation was also evident (p-values both < 0.001). Major breast surgery was identified in 71.6% of the women in the cohort. More deprived women tended to have surgery slightly sooner overall (2.5 days earlier, 95% CI 0.4-4.7), and were more likely to have surgery at the time of or before diagnosis (11.3% in the most deprived vs 6.7% in the least, difference 4.5, 95% CI 2.6-6.4).

Univariable survival analyses
Five-year net survival increased from 71.4% (95% CI 69.8-73.0) to 76.6% (95% CI 75.9-77.4) over the study period. Women living in more deprived localities had lower survival: the difference in survival between the least and most deprived (the survival 'gap') was 9% at 5 years after diagnosis and 14% at 10 years after diagnosis for women diagnosed during the period 2005-2010 (the post-screening era, Fig. 2a). Older women and those diagnosed at later stages displayed substantially poorer outcomes (Fig. 2b, c). Smoking status was not associated with net survival (Fig. 2e) and was thus not included in the multivariable modelling. Current drinkers had better survival than non- or ex-drinkers, whereas those with greater numbers of comorbidities had increasingly worse survival (Fig. 2f, d). Underweight and obese women diagnosed up to 2004 had poorer outcomes compared to those who were normal or overweight, but in the period 2005-2010 only underweight women experienced lower net survival (Fig. 3). Those with either no consultations, or more than 11 for any reason in the 18 months prior to diagnosis, had worse outcomes than those who had between 1 and 10 visits to the GP, as did those who had fewer than two breast-related consultations, those whose time from last symptom report to diagnosis was shorter, and those who received a single or no referral (Fig. 4a-e). Among women diagnosed after 2005, survival was similar irrespective of time from diagnosis to breast surgery, except amongst women whose time to surgery was greater than 2 months or whose surgical status was missing, amongst whom survival was dramatically worse (Fig. 4f).

Multivariable excess hazard modelling
After accounting for age, year and stage at diagnosis, a single unit increase in deprivation quintile was associated with a significant 4.4% (95% CI 1.4-8.8) increase in excess mortality due to breast cancer across all periods of follow-up time (Fig. 5a) in the imputed data. Amongst women diagnosed with stage I or II disease, the differential was greater (7.6%, 95% CI 0.9-14.6) but of borderline significance. These hazard ratios equate to a 17.5% (or 30.3% for stages I & II) mortality differential between the most affluent and most deprived groups. A similarly consistent linear association was observed amongst women diagnosed 2005-2010 (Fig. 5b) and for all the different age groups (data not shown). The inclusion of co-variables relating to individual health status, primary and secondary care had almost no impact on the magnitude of the differential amongst those diagnosed 1988-2010 and minimal impact for those diagnosed 2005-2010. Significant variables in the multivariable models were restricted to alcohol intake, comorbidity and the number of breast consultations. The number of breast symptoms reported was significant for all women across the study period but not for those diagnosed with early stage disease nor those diagnosed after 2004. Time to breast surgery (available only for women diagnosed after 2004) significantly improved the fit but did not alter the magnitude of the association.
Women diagnosed with early stage disease between 2005 and 2010 who were living in areas categorised as quintile 2 had lower excess mortality than women living in quintiles 1, 3, 4 or 5 (Fig. 5c). As above, alcohol consumption and comorbidity improved the fit of these non-linear stage-adjusted models, but the number of consultations did not. Time to breast surgery improved the model fit but reduced the magnitude of the associations slightly.
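The step from a per-quintile estimate to a differential between the most affluent and most deprived groups can be illustrated numerically. The source does not state how the scaling was done, so the sketch below is only an assumption-laden comparison: it takes the four steps between quintile 1 and quintile 5 and contrasts simple additive scaling with compounding of a constant hazard ratio. The function names are ours, and the rounded per-quintile figures from the text are used as inputs.

```python
def scale_additive(pct_per_unit: float, steps: int) -> float:
    """Multiply the per-unit percentage increase by the number of steps."""
    return pct_per_unit * steps


def scale_multiplicative(pct_per_unit: float, steps: int) -> float:
    """Compound a constant per-unit hazard ratio over the number of steps."""
    ratio = 1 + pct_per_unit / 100
    return (ratio ** steps - 1) * 100


STEPS = 4  # quintile 1 to quintile 5

for label, pct in [("all stages", 4.4), ("stage I or II", 7.6)]:
    additive = scale_additive(pct, STEPS)
    compounded = scale_multiplicative(pct, STEPS)
    print(f"{label}: additive {additive:.1f}%, compounded {compounded:.1f}%")
```

With rounded inputs, additive scaling gives values close to the reported 17.5% and 30.3%, while compounding gives somewhat larger figures (roughly 19% and 34%); which convention applies depends on how the linear deprivation term was parameterised in the fitted model.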

Summary
We have shown that individual health status at diagnosis and primary care consultation history vary by deprivation status but do not explain socio-economic differences in breast cancer survival in this cohort as far as can be established from these data. A persistent and consistent increase in deprivation-specific cancer mortality was observed. Although the association did not reach significance for women diagnosed most recently, its magnitude was almost identical to that for the period as a whole. The accuracy and completeness of some fields utilised in this study could be improved. Nevertheless these data support the null hypothesis that socio-economic differentials in breast

Strengths and limitations
We used a unique, national, population-based, individually-linked database. This included three separate measures of individual health status, a single measure of pre-existing comorbidities, and the pre-diagnostic consultation rate both overall and for breast complaints specifically. We used the most up-to-date survival analysis methodology [32] combined with deprivation-specific estimates of background mortality, and have simultaneously examined the impact of multiple peri-diagnostic factors upon the excess hazard due to the disease [35].
We defined a woman's deprivation category based upon the characteristics of her local area. Consequently, we have demonstrated the influence of ecologically-measured deprivation, rather than of individual circumstances. Although LSOAs are designed to be as socially homogenous as possible, it is probable that more deprived individuals are distributed across the different quintiles of ecological deprivation. Since personal socio-economic data are not available in either the CPRD or cancer registration databases, evaluating the direct impact of individual deprivation is not possible in these data. The differentials we identify are thus likely to reflect the impact of both environmental (contextual) and individual deprivation. The extent to which each is independently influential remains to be demonstrated.
Our database included a substantial proportion of missing data, most importantly for stage, alcohol consumption, and BMI. We accounted for these by multiple imputation methods. Although we examined the likely mechanisms giving rise to missing data, some residual bias may still be present. For comorbidity, consultations, referrals and symptoms there were no missing data simply because it was not possible to distinguish between, for example, a patient with no pre-existing comorbidities and one with unrecorded comorbidities. Further, our measures of BMI, smoking and alcohol only capture a part of the differences in underlying health status, nutrition and physical activity. Residual confounding is thus likely to be present. Our analysis of number of symptoms, referrals, comorbidities, and time to major breast surgery assumed that 'none recorded' equated to 'none observed'. This is a limitation, as some of these groupings are likely, in fact, to represent persons who did report symptoms, were referred or received surgery but for whom this information is missing.
In particular, it has been noted that affluent women are more likely to undergo surgery in the private sector, which is undetectable in the HES database [11]. We did not have very detailed information on surgery (mastectomy, breast conserving therapy) or other types of treatment received (radiotherapy, chemotherapy, hormonal treatments), which may potentially explain some of the differences observed and could be included in further analyses for periods in which these fields are more complete. For example, the effectiveness of the surgery (experience of surgeon, hospital, and neo-adjuvant therapies given) may vary with deprivation. Finally, we were unable to define women by ethnicity in these analyses. Black women are known to have lower breast cancer survival than White or South Asian women [9], in part due to more aggressive tumours. We were unable to account for this, but it is unlikely to substantially bias our results since Black women are a very small proportion (< 3%) of the overall population [37].

Comparison with existing literature
These data are consistent with those we previously reported which showed that neither individual health status nor primary care consultation patterns explain much of survival inequalities amongst women diagnosed in the screening age range [20], as well as a notable 'J' shaped relationship between deprivation and survival for women with early stage disease [38]. The data we present here on stage I & II disease are also consistent with our demonstration that socio-economic differentials in net survival are present amongst women whose tumour was screen-detected [10].
More deprived women in our study were no less likely to consult their GP; in fact, they consulted slightly more and reported a similar number of symptoms. This may seem counter-intuitive given their more advanced disease at diagnosis and lower survival. However, it is consistent with other data from the UK [39,40], Denmark [41] and Australia [42], as well as with an ecological study of healthcare trusts in England which showed symptom awareness for breast cancer was similar across the socio-economic spectrum, although help-seeking behaviours were slightly lower in more deprived areas [43]. Breast cancer is characterised by especially short pre-diagnosis presentation intervals [44], which may suggest that the lack of association observed here between peri-diagnostic factors and survival is unique to this malignancy. However, peri-diagnostic consultation rates have also been shown to be similar amongst colon cancer patients presenting as emergencies compared to non-emergencies [45]. Since emergency presentation is much more frequent amongst more deprived patients [46], this lends weight to the interpretation that the lower cancer survival experienced by more deprived cancer patients in general is not primarily related to differential use of or access to primary care.

Implications for future research
This study has shown that the underlying reasons for socio-economic differentials in cancer survival are elusive but are not likely to fall exclusively in the peri-diagnostic period. It is known that more deprived women are disproportionately diagnosed with the most aggressive, triple negative tumours, which may partially explain these observations [47,48]. The fact that a greater number of deprived women had major surgery at the time of or prior to diagnosis may further suggest that they are more frequently diagnosed via the emergency route or opportunistically, but this is known to be rare for breast cancer. Beyond this, timing of surgery did not strongly influence survival except where it was > 2 months or missing, and was not strongly socio-economically patterned (Table 1). However, it remains the case that variations in treatment effectiveness, beyond the inclusion of major breast surgery and the timing of surgery [21], may have a significant role in determining differentials in outcomes. Future investigations might examine differences in the types of hospital patients travel to [49], differential experience and resources available in different centres [50], as well as the types of treatment and follow-up patients are offered, or opt to receive [51], and the timing of each of these events.

Conclusions
We have demonstrated that socio-economic inequalities in survival in these data cannot be explained by consultation history or by pre-existing individual health status, as measured in primary care. The absolute impact of the differentials demonstrated here is relatively small for women with breast cancer since the excess mortality rate itself is now, mercifully, fairly low. However, it is probable that these patterns are suggestive of a tendency towards differential treatment effectiveness which has wide ranging implications for cancers or other diseases with socio-economically patterned outcomes where treatment effectiveness is likely to be similarly differentiated. Since reducing inequalities in premature mortality is a major focus of current health policy in England [52], effort should be made to develop a better understanding of the causes and perpetuation of socio-economic health differentials in secondary as well as primary care.

Availability of data and materials
The data used in this article are not publicly available but can be accessed via CPRD: www.cprd.com

Declarations
Ethics approval and consent to participate
These data were released under national statutory approvals from The Confidentiality Advisory Group (CAG): PIAG 1-05(c)2007, PIAG 3-06(f) 2008 and national ethical approvals from the Research Ethics Committee (REC): 13-LO-0610, 08-H1102-46. Data cannot be shared publicly because we do not own these data and are not permitted to share them in the original form. Data are available from the Clinical Practice Research Datalink (contact via https://www.cprd.com/) for researchers who meet the criteria for access to confidential data. There is a standard process for accessing this data where researchers need to get a scientific approval of their protocol by an independent scientific advisory committee (ISAC) of CPRD, sign a license agreement for data use and pay fees for the data. The authors did not receive any special privileges and applied for data access via the same route. The read code lists underlying the results presented in the study are available from the LSHTM Data Compass and are freely available for download from https://datacompass.lshtm.ac.uk/.

Consent for publication
Not applicable.