The European Prospective Investigation into Cancer and Nutrition (EPIC) is a multi-center prospective cohort study designed to investigate the relationships between diet, nutrition and metabolic factors and cancer. It consists of approximately 360,000 women and 150,000 men, mostly aged 25–70 years [16, 17]. All participants were enrolled between 1992 and 2000 and came from 23 regional and national research centers located in 10 western European countries: Denmark, France, Italy, Germany, Greece, Norway, Spain, Sweden, The Netherlands and the United Kingdom. Extensive details about the standardized procedures for recruitment, measuring baseline anthropometry, questionnaires on current habitual diet, reproductive and menstrual history, exogenous hormone use [OC and hormone replacement therapy (HRT) use], medical history, lifetime smoking and alcohol consumption history, occupation, level of education and physical activity, and biological sample collection at study centers are given elsewhere [16, 17]. All subjects gave written informed consent. The Internal Review Boards of the International Agency for Research on Cancer and the local ethics committees in participating countries approved the analyses based on EPIC participants.
Of the approximately 360,000 female participants in EPIC, women were excluded a priori if they had a history of cancer prior to recruitment or were missing a diagnosis or censoring date, leaving 345,153 participants. At the time of this analysis, three EPIC study centers (Granada, Murcia and Malmo) did not provide any information on breast tumor hormone receptor status and were therefore excluded (n = 26,091). Women were further excluded if they were missing questionnaire data (n = 526) or were missing data on age at menarche, age at menopause, age at first full-term birth, ever use of OCs, number of full-term births, age at last full-term childbirth and duration of breastfeeding (n = 7,439). This left a total of 311,097 women, with 9,456 first primary invasive breast cancer cases, from 10 countries for the present analysis.
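The exclusion flow above can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts stated in the text; the function and variable names are illustrative and not part of the original analysis.

```python
# Counts are taken from the text; all names are illustrative assumptions.
AFTER_A_PRIORI = 345_153  # women remaining after the a priori exclusions

EXCLUSIONS = [
    ("centers without hormone receptor data", 26_091),
    ("missing questionnaire data", 526),
    ("missing reproductive variables", 7_439),
]


def apply_exclusions(start, steps):
    """Return (reason, remaining) after each successive exclusion step."""
    remaining = start
    trail = []
    for reason, n_excluded in steps:
        remaining -= n_excluded
        trail.append((reason, remaining))
    return trail


if __name__ == "__main__":
    for reason, remaining in apply_exclusions(AFTER_A_PRIORI, EXCLUSIONS):
        print(f"after excluding {reason}: {remaining:,}")
```

The final count produced by this sketch matches the 311,097 women reported for the present analysis.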
Questionnaire data and classification of reproductive variables
The details of standardized procedures for collecting baseline information on the age at first and last menstruation, parity, breastfeeding, exogenous hormone use, and hysterectomy from the general lifestyle questionnaire have been previously reported [17, 18]. Briefly, in Greece, Italy, the Netherlands, Sweden and the United Kingdom, age at menarche was asked in years. In the other countries, age at menarche was asked in defined categories (≤8, 9, 10,…, 18, 19 or >19 years). The number of full-term pregnancies (defined as the sum of all live and stillborn children born) and the number of spontaneous or induced abortions were also collected at baseline, together with the ages at the first three and last deliveries and the ages at first and last induced or spontaneous abortions and stillbirths. Except for Norway and the Swedish center Umeå, where information about multiple pregnancies was available, the number of pregnancies is overestimated because multiple pregnancies were counted as separate pregnancies. The time between menarche and first full-term birth was estimated among women who had menarche between the ages of 8 and 20 years (time between menarche and first full-term birth = age at first full-term birth – age at menarche).
Women were considered postmenopausal at recruitment if they had had no menstrual cycles in the last 12 months, were older than 55 years (if the menstrual cycle history was missing), or had had a bilateral oophorectomy. Women who were aged 46–55 years and had incomplete or missing questionnaire data on menstrual history were classified as having peri- or unknown menopausal status. Women were deemed premenopausal if they reported regular menstrual cycles in the last 12 months or if they were younger than 46 years of age (if the menstrual cycle history was missing).
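The classification rules above can be expressed as a small decision function. This is a sketch only: the argument names and the three-valued coding of menstrual history ("none", "regular", or None for missing or incomplete data) are assumptions made for illustration, as is the choice to let a bilateral oophorectomy take precedence over reported cycles. Any history that is neither "none" nor "regular" is treated here as incomplete.

```python
def menopausal_status(age_years, cycle_history, bilateral_oophorectomy=False):
    """Classify menopausal status at recruitment.

    cycle_history is an assumed coding of the questionnaire answer:
    "none" (no cycles in the last 12 months), "regular" (regular cycles
    in the last 12 months), or None (missing or incomplete).
    """
    # Surgical menopause or no cycles in the last 12 months
    if bilateral_oophorectomy or cycle_history == "none":
        return "postmenopausal"
    if cycle_history == "regular":
        return "premenopausal"
    # Menstrual history missing or incomplete: fall back on age
    if age_years > 55:
        return "postmenopausal"
    if age_years < 46:
        return "premenopausal"
    return "peri/unknown"


if __name__ == "__main__":
    for age, history in [(60, None), (50, None), (40, None), (50, "regular")]:
        print(age, history, menopausal_status(age, history))
```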
The details of standardized procedures for measuring height and weight at EPIC study centers have also been previously reported. In most countries, height, weight and waist and hip circumferences were measured to the nearest centimeter and kilogram, in light clothing, according to standardized protocols. In Norway, Umeå and a large proportion of participants from France, height and weight were measured and self-reported by the cohort participants themselves, following detailed instructions [17, 19]. For subjects who had neither self-reported nor measured weight or height data, the center-, age- and gender-specific average weight and height values were imputed for anthropometry variables used for adjustment purposes only. A sensitivity analysis that restricted the adjustment variables to those without imputation showed similar results to those presented (data not shown).
Prospective ascertainment of breast cancer cases and the coding of receptor status
In all countries except France, Germany and Greece, incident breast cancer cases were identified using record linkage with cancer and pathology registries. In France, Germany and Greece, cancer occurrence was prospectively ascertained through linkage with health insurance records and regular direct contact with participants and their next of kin, and all reported breast cancer cases were then systematically verified against clinical and pathological records. Mortality data were coded according to the 10th Revision of the International Statistical Classification of Diseases, Injuries, and Causes of Death (ICD-10), and cancer incidence data were coded according to the second revision of the International Classification of Diseases for Oncology (ICD-O-2). Invasive (primary, malignant) breast cancer cases were classified under ICD-O-2 topography code C50. Breast tumor receptor status was standardized across EPIC centers using the following criteria for positive expression: ≥10% cells stained, any ‘plus-system’ description, ≥20 fmol/mg, an Allred score of ≥3, an IRS ≥2, or an H-score ≥10.
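The positivity criteria listed above amount to a threshold lookup per measurement scale. The sketch below illustrates this; the scale keys, the function names and the handling of missing values are assumptions for illustration and do not describe how the EPIC centers actually stored their receptor data.

```python
# Thresholds for positive expression, as listed in the text.
# The dictionary keys are illustrative names, not EPIC variable names.
POSITIVE_THRESHOLDS = {
    "percent_stained": 10,  # >=10% cells stained
    "fmol_per_mg": 20,      # >=20 fmol/mg
    "allred": 3,            # Allred score >=3
    "irs": 2,               # IRS >=2
    "h_score": 10,          # H-score >=10
}


def receptor_positive(scale, value):
    """Return True/False for positive expression, or None if value is missing."""
    if value is None:
        return None
    if scale == "plus_system":
        # Any 'plus-system' description (e.g. "+", "++") counts as positive
        return "+" in str(value)
    if scale not in POSITIVE_THRESHOLDS:
        raise ValueError(f"unknown measurement scale: {scale}")
    return value >= POSITIVE_THRESHOLDS[scale]


def joint_subtype(er_positive, pr_positive):
    """Combine ER and PR calls into a label such as 'ER+PR-', or None if unknown."""
    if er_positive is None or pr_positive is None:
        return None
    return f"ER{'+' if er_positive else '-'}PR{'+' if pr_positive else '-'}"
```

For example, under these assumptions a tumor with 15% ER-stained cells and a PR Allred score of 2 would be labeled ER+PR-.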
Vital status was collected from regional or national mortality registries. The last updates of endpoint data for cancer incidence and vital status were between 2005 and 2010, depending on the center. Women were considered at risk from the time of recruitment until breast cancer diagnosis or censoring (age at death, loss to follow-up, end of follow-up, or diagnosis of another cancer). A total of 7,095 breast cancer cases had information on ER status (5,723 ER-positive, 1,372 ER-negative), of which 5,843 had further information on PR status (3,567 ER+PR+, 1,078 ER+PR-, 200 ER-PR+, 998 ER-PR-).
Associations between reproductive factors and the risk of breast cancer subtypes were evaluated using Cox proportional hazards models to estimate hazard ratios (HRs) and 95% confidence intervals (CIs). Breast cancer subtypes were defined as jointly classified ER+PR+ or ER-PR- breast tumors. Results for ER-positive versus ER-negative tumors (ignoring PR status) and for PR-positive versus PR-negative tumors (ignoring ER status) were generally similar to those for the jointly defined ERPR breast cancer subtypes and are included in Additional file 1: Table S1. Results for breast tumors with discordant ER and PR status and with unknown ER and/or PR status are reported in Additional file 2: Table S2. All analyses were stratified by age at recruitment in one-year categories and by study center, to prevent violations of the proportional hazards assumption. Trend tests across exposure categories were performed by entering the categorical variables into the models as ordered, quantitative variables.
Age at menarche was categorized as ≤12, 13–14 and ≥15 years and time between menarche and first full-term childbirth as <10 and ≥10 years. Parity-related variables were divided into the following categories: ever vs. never, number of full-term pregnancies (1, 2, ≥3), age at first full-term childbirth as ≤19, 20–24, 25–29, 30–34 and ≥35 years, age at last full-term childbirth since recruitment as ≤24, 25–29, 30–34 and ≥35 years, and time since last childbirth as ≤20 and >20 years. Breastfeeding was categorized as ever versus never, and as ≤1, 2–3, 4–6, 7–12, 13–17 and ≥18 months for total cumulative duration of breastfeeding. Dichotomized categories of ever vs. never having had a spontaneous or induced abortion, ever vs. never OC use, and current versus not current OC use (at baseline) were also analyzed. The duration of OC use was categorized into ≤1, 2–4, 5–9 and ≥10 years. Age at menopause was divided into the categories ≤48, 49–50, 51–54 and ≥55 years.
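Several of the cut-points above can be applied with a single binning helper. The sketch below covers three of the variables and assumes that values are recorded in whole years; the variable keys, the ASCII category labels and the use of Python's bisect module are illustrative choices, not the original SAS coding.

```python
from bisect import bisect_left

# Each entry: (inclusive upper bounds of all but the last category, labels).
# Cut-points follow the text; keys and labels are illustrative.
CATEGORIES = {
    "age_at_menarche": ([12, 14], ["<=12", "13-14", ">=15"]),
    "oc_duration_years": ([1, 4, 9], ["<=1", "2-4", "5-9", ">=10"]),
    "age_at_menopause": ([48, 50, 54], ["<=48", "49-50", "51-54", ">=55"]),
}


def categorize(variable, value):
    """Return the category label for a value given in whole years."""
    upper_bounds, labels = CATEGORIES[variable]
    # bisect_left counts the bounds strictly below value, so a value equal
    # to an upper bound stays in the lower category.
    return labels[bisect_left(upper_bounds, value)]


if __name__ == "__main__":
    print(categorize("age_at_menarche", 13))
    print(categorize("oc_duration_years", 10))
    print(categorize("age_at_menopause", 50))
```

Non-integer values would be assigned to the next category up from the bound they exceed (for example, 12.5 years at menarche to 13-14), which is an assumption of this sketch rather than a rule stated in the text.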
A basic model stratified by age and center was assessed, together with a multivariable model further adjusted for body mass index (BMI, kg/m2, as a continuous variable), height (as a continuous variable), menopausal status at enrolment (premenopausal, peri-/unknown menopausal, postmenopausal [natural and surgical menopause]), HRT use (premenopausal, ever use, never use and missing in postmenopausal women only), smoking status (current, former, never, missing), baseline alcohol consumption (non-consumers, 0.1–6 g/day, 6–12 g/day, 12–24 g/day, 24–60 g/day and greater than 60 g/day, missing), physical activity (Cambridge Index: active, moderately active, moderately inactive and inactive, missing) and education level (none, primary school, technical/professional school, secondary school, longer education including university degree, missing). Missing values (generally <2%) were accounted for by creating an extra category in each covariate.
To avoid collinearity when studying the joint effect of the number of full-term pregnancies, age at first and last full-term childbirth and time since last childbirth, we used the approach described by Heuch et al. In analyses including age at last full-term childbirth and time since last childbirth in an age-adjusted model, the general age effect was represented by the age effect among nulliparous women. We assigned constant values for age at full-term childbirth and time since last full-term childbirth (corresponding to the reference categories) to nulliparous women, and indicator variables were introduced in the model to ensure that the risk estimates reflected effects in parous women only.
Differences in the risk estimates of a given factor across breast cancer subtypes were analyzed using the data augmentation method described by Lunn and McNeil, with a likelihood ratio test comparing the models with and without interaction terms between the exposure of interest and breast cancer subtype. Women were considered at risk of a given breast cancer subtype until they were diagnosed with a different, competing breast cancer subtype or were diagnosed with breast cancer for which receptor status information was missing. These women were censored at the time of occurrence of the competing breast cancer subtype. To assess whether the associations between reproductive factors and breast cancer subtypes differed after menopause, left and/or right censoring was used to count person-years within the defined age periods (<50 and ≥50 years). As no differences were observed between the risk estimates for reproductive factors and breast cancer subtype across the age bands, we report results for all women combined. A sensitivity analysis restricted to cases with any indication of ER and PR expression versus a complete absence of ER and PR expression (0% cells stained, a "-" description, i.e. a negative/minus symbol, 0 fmol/mg, an Allred score of 0, an IRS = 0, or an H-score = 0) was also performed. Heterogeneity in the risk associations between subgroups defined by age at diagnosis (<50 vs. ≥50 years), OC use, center, median BMI, age at first pregnancy and ever having breastfed was also examined using Cochran's Q statistic. An analysis of postmenopausal HRT use in the EPIC cohort has been reported previously; therefore, HRT use as a predictor of breast cancer risk by hormone receptor status was not included in this analysis. All statistical analyses were performed using the SAS software package, version 9.2 (SAS Institute, Cary, NC).
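The data layout behind the augmentation method can be sketched as follows. Each woman contributes one row per subtype; the event indicator is 1 only in the row matching the subtype she was diagnosed with, so a diagnosis of a competing subtype (or of a tumor with missing receptor status) appears as censoring in the other rows. This sketch only builds the augmented data set and does not fit the Cox model. The field names, the two-subtype restriction and the toy records are assumptions for illustration, not the original SAS implementation.

```python
SUBTYPES = ("ER+PR+", "ER-PR-")


def augment(records, subtypes=SUBTYPES):
    """Duplicate each record once per subtype (Lunn and McNeil style layout)."""
    rows = []
    for rec in records:
        for k, subtype in enumerate(subtypes):
            rows.append({
                "id": rec["id"],
                "time": rec["time"],          # follow-up time to diagnosis or censoring
                "exposure": rec["exposure"],
                "subtype": subtype,
                "subtype_indicator": k,       # 0 for the first subtype, 1 for the second
                # Event only in the row matching the diagnosed subtype;
                # every other row is treated as censored at the same time.
                "event": int(rec["diagnosis"] == subtype),
                # Interaction term used to test heterogeneity across subtypes
                "exposure_x_subtype": rec["exposure"] * k,
            })
    return rows


if __name__ == "__main__":
    # Hypothetical toy records, not EPIC data
    toy = [
        {"id": 1, "time": 5.0, "exposure": 1, "diagnosis": "ER+PR+"},
        {"id": 2, "time": 8.0, "exposure": 0, "diagnosis": None},
        {"id": 3, "time": 3.0, "exposure": 1, "diagnosis": "ER-PR-"},
    ]
    for row in augment(toy):
        print(row)
```

A model fitted to such a data set with and without the interaction column would then be compared by a likelihood ratio test, as described above.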