AHRR methylation in heavy smokers: associations with smoking, lung cancer risk, and lung cancer mortality

Background: A low level of methylation at cg05575921 in the aryl-hydrocarbon receptor repressor (AHRR) gene is robustly associated with smoking, and some studies have observed associations between cg05575921 methylation and increased lung cancer risk and mortality. To examine prospectively whether decreased methylation at cg05575921 may identify high-risk subpopulations for lung cancer screening among heavy smokers, and whether it is associated with mortality in cases, we evaluated associations between cg05575921 methylation and lung cancer risk and mortality, by histotype, in heavy smokers.
Methods: The β-Carotene and Retinol Efficacy Trial (CARET) included enrollees ages 45–69 with ≥ 20 pack-year smoking histories and/or occupational asbestos exposure. A subset of CARET participants had cg05575921 methylation available from HumanMethylationEPIC assays of blood collected on average 4.3 years prior to lung cancer diagnosis in cases. Cg05575921 methylation β-values were modeled continuously (per 10% methylation decrease) and as quintiles, where quintile 1 (Q1, referent) represents high methylation and Q5 low methylation. We used conditional logistic regression models to examine lung cancer risk overall and by histotype in a nested case-control study including 316 lung cancer cases (diagnosed through 2005) and 316 lung cancer-free controls matched on age (±5 years), sex, race/ethnicity, enrollment year, current/former smoking, asbestos exposure, and follow-up time. Mortality analyses included 372 lung cancer cases diagnosed between 1985 and 2013 with available methylation data. We used Cox proportional hazards models to examine mortality overall and by histotype.
Results: Decreased cg05575921 methylation was strongly associated with smoking, even in our population of heavy smokers. We did not observe associations between decreased pre-diagnosis cg05575921 methylation and increased lung cancer risk, overall or by histotype. We observed linear increasing trends for lung cancer-specific mortality across decreasing cg05575921 methylation quintiles for adenocarcinoma and small cell carcinoma (P-trends = 0.01 and 0.04, respectively).
Conclusions: In our study of heavy smokers, decreased cg05575921 methylation was strongly associated with smoking but not with increased lung cancer risk. The observed association between cg05575921 methylation and increased mortality in adenocarcinoma and small cell histotypes requires further examination. Our results do not support using decreased cg05575921 methylation as a biomarker for lung cancer screening risk stratification.


Keywords: Lung cancer, Epidemiology, Biomarkers/serum biomarkers, Methylation, AHRR, CARET, Mortality

Background
Exposure to cigarette smoke is associated with altered DNA methylation at thousands of individual cytosine-guanine dinucleotide (CpG) sites across the genome in both blood and lung tissue, based on results from at least 73 epigenome-wide association studies (EWAS) [1]. The most consistent association for any CpG with smoking is decreased methylation at cg05575921 in the aryl hydrocarbon receptor repressor gene (AHRR), which has been associated with cigarette smoking in whole blood samples in at least 30 EWAS [1]. The cg05575921 locus typically shows the largest absolute difference in methylation by cigarette smoking relative to other individual CpGs [2–11]. Longitudinal studies have shown that decreased methylation of cg05575921 persists in former smokers compared to never smokers, and that methylation gradually increases with time since cessation [5,11,12].
Cg05575921 is located in an AHRR gene enhancer, and decreased methylation in this region results in increased AHRR gene expression in both blood [13,14] and lung tissue [15–17]. Greater AHRR expression inhibits the aryl-hydrocarbon receptor, which, among other functions, regulates toxicity of polycyclic aromatic hydrocarbons (PAHs) [18]. Since cigarette smoke contains PAHs, it has been hypothesized that decreased AHRR methylation induced by cigarette smoking may be a mediator in lung cancer development [19]. Several epidemiologic studies support this hypothesis and report that a low level of cg05575921 methylation is associated with increased lung cancer risk [4,9,19–22]. However, these reports all include light and never smokers. While decreased cg05575921 methylation has been reported to be associated with all-cause mortality [9,12], the relationship between pre-diagnosis cg05575921 methylation and mortality in lung cancer cases is less clear. One case-cohort study reported increased lung cancer-specific mortality [23], but results were not presented by histotype, which could limit the examination of associations among tumor subgroups with known differences in treatment response and mortality. To our knowledge, no studies to date have examined associations between pre-diagnosis cg05575921 methylation and mortality, all-cause or lung cancer-specific, among lung cancer cases.
Since a low level of cg05575921 methylation is highly correlated with increased smoking exposure, and has been reported to be associated with lung cancer risk, it is an appealing marker to examine for risk stratification for lung cancer screening. Since 2014, the United States Preventive Services Task Force (USPSTF) has recommended annual lung cancer screening for individuals aged 55-80 years who have at least 30 pack-year smoking histories and are current or former smokers who quit within the past 15 years [24]. An updated 2020 draft USPSTF recommendation statement broadens screening eligibility to include those aged 50-80 with 20 or more pack-year smoking histories, still among current or former smokers who quit within the past 15 years [25]. In order for a biomarker to improve lung cancer screening risk stratification by minimizing false-positive screens, it must be associated with lung cancer risk among individuals who are eligible for screening. We sought to disentangle the relationships between cg05575921 methylation, lung cancer risk, and lung cancer mortality in a nested case-control study of heavy smokers generally representative of a lung cancer screening-eligible population.
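The two sets of screening criteria described above can be written as a simple eligibility rule. The following Python sketch is illustrative only: the `Smoker` record and the function names are hypothetical, and treating "quit within the past 15 years" as 15 years or fewer is an assumption about how the boundary is handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Smoker:
    """Hypothetical participant record used only for this illustration."""
    age: float
    pack_years: float
    current_smoker: bool
    years_since_quit: Optional[float] = None  # None for current smokers


def eligible(p: Smoker, min_age: float, max_age: float,
             min_pack_years: float, max_quit_years: float = 15.0) -> bool:
    """Return True if the participant meets age, pack-year, and quit-time criteria."""
    if not (min_age <= p.age <= max_age):
        return False
    if p.pack_years < min_pack_years:
        return False
    if p.current_smoker:
        return True
    # Assumption: a former smoker who quit exactly 15 years ago is still eligible.
    return p.years_since_quit is not None and p.years_since_quit <= max_quit_years


def eligible_2014(p: Smoker) -> bool:
    """2014 USPSTF criteria as stated in the text: ages 55-80, >= 30 pack years."""
    return eligible(p, 55, 80, 30)


def eligible_2020_draft(p: Smoker) -> bool:
    """2020 draft USPSTF criteria as stated in the text: ages 50-80, >= 20 pack years."""
    return eligible(p, 50, 80, 20)
```

For example, a 52-year-old current smoker with 25 pack years would be eligible under the 2020 draft criteria but not under the 2014 criteria.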

Methods
Our study includes a subset of participants from the multicenter β-Carotene and Retinol Efficacy Trial (CARET) [26]. CARET was a randomized, double-blinded, placebo-controlled trial designed to assess the safety and efficacy of daily β-carotene and retinyl palmitate supplementation in heavy smokers at high risk of developing lung cancer [26–28]. From 1985 to 1994, CARET enrolled 14,254 men and women ages 50-69 years who were current or former smokers (quit ≤ 6 years prior to enrollment) with ≥ 20 pack-year cigarette smoking histories. Men with occupational asbestos exposure ages 45-69 years who were current or former smokers (quit ≤ 15 years prior to enrollment) were also enrolled (n = 4060). Smoking status, smoking history, and other risk factors were collected via annual questionnaires. Whole blood samples were collected at visits between 1994 and 1997. The intervention was stopped in 1996 due to higher lung cancer incidence and overall mortality rates in the intervention versus placebo arm.
Within our larger matched case-control study designed to examine genetic factors and lung cancer risk described in [29], we generated whole-genome DNA methylation data for 350 lung cancer cases identified during active follow-up between 1985 and 2005, and one matched control per case. The case-control pairs were matched on enrollment characteristics including age (±4 years) and smoking status, as well as sex, race/ethnicity, enrollment year (±2 years), and history of occupational asbestos exposure. Controls were cancer-free at least as long as their corresponding case through 2005.
DNA was extracted from whole blood using QIAGEN QIAamp DNA Blood Midi Kits (n = 348 cases, n = 347 controls) and 5PRIME ArchivePure DNA Purification Kits (n = 2 cases, n = 3 controls). DNA methylation was assayed in a single batch using the Illumina HumanMethylationEPIC BeadArray at the University of Southern California Epigenomics Core Facility following standardized protocols from Illumina, Inc. We performed data quality control, preprocessing, and Noob + β-mixture quantile normalization using the minfi and wateRmelon Bioconductor packages [30,31], described in detail previously [32]. Analytical β-values, representing percent methylation, were obtained for the cg05575921 locus.
Since blood was collected at post-enrollment study visits, and DNA methylation is influenced by age and smoking status, we re-matched among the 350 case-control pairs using age (±5 years) and smoking status (current or former) at blood draw, rather than at enrollment, as well as sex, race/ethnicity, enrollment year (±2 years), asbestos exposure, and duration of follow-up. A total of 322 case-control pairs could be re-matched, but three pairs missing data on body mass index (BMI) were removed, resulting in 319 pairs in our previous study [32]. For the present analysis, we retained the three pairs missing BMI but removed six pairs that we found to be mismatched. Analyses examining cg05575921 methylation and risk of lung cancer therefore include 316 matched case-control pairs, with blood collected on average 4.3 years prior to diagnosis for the cases. Mortality analyses were performed for all 350 lung cancer cases diagnosed through 2005, plus 22 controls who developed lung cancer during passive follow-up from 2005 to 2013; blood was collected on average 4.9 years prior to diagnosis for this larger case group.

Statistical analysis
We categorized cg05575921 percent methylation into quintiles, with quintile 1 (Q1, referent) containing the top 20% of percent methylation values (i.e., hypermethylation), and Q5 containing the lowest 20% of percent methylation values (i.e., hypomethylation). Cut points for cg05575921 quintile methylation for the lung cancer risk analyses are based on the distribution of cg05575921 methylation in the controls. We used ordinal linear regression to assess linear trends of association between cg05575921 methylation quintiles and continuous participant characteristics including age, BMI, cigarettes per day in current smokers, pack years smoked, and years since cessation in former smokers. We assessed linear trends in proportions of strata for discrete participant characteristics, including race, sex, smoking status, and occupational asbestos exposure, as well as stage and histotype (adenocarcinoma, squamous cell carcinoma, or small cell carcinoma) across cg05575921 methylation quintiles using Cochran-Armitage Trend tests, or Fisher's Exact tests for variables with at least 50% of cells containing expected counts of less than five per cell.
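The quintile coding described above, in which cut points come from the control distribution and Q1 holds the highest methylation values, can be illustrated with a short Python sketch. This is not the SAS code used in the study: the function names are hypothetical, and the percentile interpolation rule and the assignment of values that fall exactly on a cut point to the higher-methylation quintile are assumptions.

```python
import numpy as np


def control_cut_points(control_betas):
    """Return the 20th, 40th, 60th, and 80th percentiles of the control values."""
    values = np.asarray(control_betas, dtype=float)
    return np.percentile(values, [20, 40, 60, 80])


def assign_quintile(betas, cut_points):
    """Map methylation values to quintiles, with Q1 = highest and Q5 = lowest.

    np.searchsorted counts how many cut points lie at or below each value
    (0 to 4), so subtracting that count from 5 reverses the ordering and
    gives the referent label Q1 to the most highly methylated group.
    """
    values = np.asarray(betas, dtype=float)
    n_cuts_at_or_below = np.searchsorted(cut_points, values, side="right")
    return 5 - n_cuts_at_or_below


# Illustrative control values only (not study data): 0.01, 0.02, ..., 1.00
controls = np.arange(1, 101) / 100.0
cuts = control_cut_points(controls)
quintiles = assign_quintile([0.95, 0.50, 0.05], cuts)
```

Applying the control-based cut points to cases, as in the risk analyses, means that the case quintiles need not contain equal numbers of participants.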
We evaluated associations between continuous decreasing cg05575921 methylation and lung cancer risk using multivariable-adjusted logistic regression models conditioned on matching factors. In addition to a priori selected adjustment for continuous age at blood draw (to reduce residual confounding by age) and methylation-derived estimated blood cell type proportions [33,34], adjustment variables were assessed for inclusion based on biologic plausibility and/or if their addition to age- and estimated cell type-adjusted conditional logistic regression models for all lung cancer cases resulted in a ≥ 10% change in the estimated odds ratio for either quintile or continuous 10% decreased cg05575921 methylation. Final risk models were adjusted for age at blood draw, estimated blood cell proportions, and cigarettes per day at blood draw. We performed the same analysis restricted to the 242 matched pairs where both the case and control would have been eligible for lung cancer screening based on age (55-80 years) and smoking (≥ 30 pack years; current or quit < 15 years) per the 2014 USPSTF recommendation statement.
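The ≥ 10% change-in-estimate rule used for covariate selection can be expressed as a small helper. The sketch below is a hypothetical illustration: the function names are invented, the example odds ratios are not study results, and computing the change on the ratio scale relative to the model without the candidate covariate is an assumption, since the exact formula is not stated here.

```python
def percent_change(estimate_with_covariate: float, reference_estimate: float) -> float:
    """Absolute percent change in a ratio estimate relative to the reference model."""
    return abs(estimate_with_covariate - reference_estimate) / reference_estimate * 100.0


def keep_covariate(reference_estimate: float, estimate_with_covariate: float,
                   threshold_pct: float = 10.0) -> bool:
    """Return True if adding the covariate shifts the estimate by at least the threshold."""
    return percent_change(estimate_with_covariate, reference_estimate) >= threshold_pct


# Hypothetical odds ratios: 1.50 without the candidate covariate, 1.30 with it.
change = percent_change(1.30, 1.50)      # about 13.3%
retain = keep_covariate(1.50, 1.30)      # the covariate would be retained
```

The same rule applies to hazard ratios in the mortality models, with the a priori variable-adjusted Cox model serving as the reference.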
For mortality analyses, quintile cg05575921 percent methylation cut points were based on the distribution including all 372 lung cancer cases. We evaluated associations between decreasing pre-diagnosis cg05575921 methylation and lung cancer-specific and all-cause mortality using multivariable-adjusted Cox proportional hazards models with follow-up defined as time between lung cancer diagnosis and death or December 31, 2013, whichever occurred first. We included a strata variable for early, late, or unknown stage to allow for differing baseline hazards since stage at diagnosis is strongly associated with mortality [35]. Continuous age, sex, methylation-derived estimated blood cell type proportions [33,34], and time between blood draw and diagnosis were a priori selected for adjustment, and additional variables were included based on biologic plausibility and/or if their addition to a priori variable-adjusted Cox proportional hazards models for all lung cancer cases resulted in a ≥ 10% change in the estimated hazard ratio (all-cause or lung cancer-specific) for either quintile or continuous 10% decreased cg05575921 methylation. Final mortality models were adjusted for age at blood draw, sex, estimated blood cell proportions, time between blood draw and diagnosis, smoking status, and years since smoking cessation at blood draw.
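The follow-up definition above (time from lung cancer diagnosis to death or December 31, 2013, whichever occurred first) can be sketched as follows. The function name is hypothetical, and the sketch codes an all-cause event indicator; for lung cancer-specific mortality, deaths from other causes would presumably be censored at the date of death, which is an assumption not spelled out in the text.

```python
from datetime import date
from typing import Optional, Tuple

END_OF_FOLLOW_UP = date(2013, 12, 31)


def follow_up(diagnosis: date, death: Optional[date],
              end: date = END_OF_FOLLOW_UP) -> Tuple[int, bool]:
    """Return (days of follow-up, died) for one lung cancer case.

    A death on or before the administrative end date counts as an event;
    otherwise the case is censored at the end date.
    """
    if death is not None and death <= end:
        return (death - diagnosis).days, True
    return (end - diagnosis).days, False


# Hypothetical dates for illustration only.
died_case = follow_up(date(2010, 1, 1), date(2011, 1, 1))
censored_case = follow_up(date(2013, 12, 1), None)
```

The resulting time and event pairs are the inputs to a Cox proportional hazards model stratified by stage.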
We performed a sensitivity analysis excluding the three pairs where either the case or control had DNA extracted by the 5PRIME method. We also examined the possibility of interaction by sex in the mortality models, overall and by histotype, to ensure sound adjustment for sex as a confounder and not an effect modifier in our models. All analyses were performed in SAS 9.4 (Cary, NC). Statistical tests were two-sided and statistical significance testing was performed at a nominal level of P < 0.05.

Results
We observed highly statistically significant linear trends of increasing proportions of current smokers across decreasing cg05575921 methylation quintiles in both lung cancer cases and controls (P_case = 2 × 10⁻²², P_control = 4 × 10⁻²⁵; Table 1). Striking differences in the proportions of current smokers were observed in quintile five (Q5) compared to Q1 in both cases (90% vs 24%) and controls (89% vs 22%). Similar trends were observed across increasing quintiles with greater total years smoked (P_case = 0.03, P_control = 1 × 10⁻⁸), fewer years since cessation in former smokers (P_case = 0.002, P_control = 0.001), and more cigarettes smoked per day in current smokers (P_case = 8 × 10⁻⁵, P_control = 0.04). We observed linear associations with increasing quintiles for increasing pack years (only statistically significant among controls: P_control = 0.004; P_case = 0.15), decreasing BMI (P_case = 0.004, P_control = 0.002), and age at blood draw (only statistically significant among cases: P_case = 7 × 10⁻⁵; P_control = 0.07). We observed decreasing proportions of individuals with asbestos exposure across increasing quintiles (P_case = 0.05; P_control = 0.003). We observed similar linear trends across decreasing cg05575921 methylation quintiles in the full 372 cases examined in the mortality analyses (Additional file 1: Table S1).
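Trends in proportions across quintiles, such as the proportion of current smokers, were tested with Cochran-Armitage trend tests as described in the Statistical analysis section. The following Python sketch shows one common form of that test using equally spaced quintile scores and a normal approximation. It is illustrative only: the function name is hypothetical, the counts are invented rather than taken from Table 1, and the variance uses the N (not N − 1) convention, which may differ slightly from the SAS output.

```python
import math


def cochran_armitage(events, totals, scores=None):
    """Two-sided Cochran-Armitage test for a linear trend in proportions.

    events[i] is the number with the characteristic in group i and
    totals[i] is the group size. Returns (z statistic, p-value).
    """
    k = len(totals)
    if scores is None:
        scores = list(range(1, k + 1))  # equally spaced scores Q1..Qk
    n_total = sum(totals)
    p_bar = sum(events) / n_total       # pooled proportion under the null
    # Score-weighted sum of observed minus expected counts
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, events, totals))
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n_total)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value


# Hypothetical counts of current smokers in Q1..Q5 (63 participants per quintile)
z_trend, p_trend = cochran_armitage([15, 25, 35, 45, 57], [63] * 5)
```

A positive z statistic here indicates that the proportion rises from Q1 to Q5, that is, with decreasing methylation.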
Although strong and highly statistically significant associations were observed between decreased cg05575921 methylation and aspects of smoking exposure (Table 1; Additional file 1: Tables S1-S2), there were no clear associations between decreased cg05575921 methylation and lung cancer risk overall or by histotype in the 316 matched case-control pairs after controlling for age, estimated cell type, and cigarettes per day at blood draw (Table 2). Neither odds ratios nor linear trends reached statistical significance. While there was a non-statistically significant greater than two-fold increased risk of adenocarcinoma in Q2 and Q5 compared to Q1, there was no linear association (P = 0.50). All odds ratios for squamous cell carcinoma were below one, but they were statistically imprecise. Similar patterns were observed in the 242 case-control pairs where both members of the case-control pair would have been eligible for lung cancer screening per the 2014 USPSTF recommendations, with the exception of small cell histotype in which a borderline linear association emerged (P-trend = 0.05; Table 3). The screening-eligible small cell histotype quintile estimates became unstable due to small counts, but in the continuous model each 10% decrease in cg05575921 methylation was associated with a reduced small cell lung cancer risk (Odds Ratio (OR) = 0.51, 95% CI: 0.28-0.93). We did not observe interactions by sex.
In mortality analyses, decreasing cg05575921 methylation was borderline-statistically significantly associated with increased lung cancer-specific and all-cause mortality for all histotypes combined (P-trends = 0.05 and 0.06, respectively; Table 4). These associations were driven by the associations in adenocarcinoma and small cell histotypes; no association was observed for squamous cell carcinoma. Among adenocarcinoma cases, we observed linear associations between decreasing cg05575921 methylation quintiles and increased lung cancer-specific mortality (P = 0.01; Q5 vs Q1 HR = 2.32, 95% CI: 1.12-4.82) and all-cause mortality (P = 0.01; Q5 vs Q1 HR = 2.37, 95% CI: 1.20-4.71). Each continuous 10% decrease in cg05575921 methylation was associated with a 21% greater risk of death in adenocarcinoma cases (lung cancer-specific 95% CI: 1.03-1.43; all-cause 95% CI: 1.03-1.41). Among small cell cases, we observed a linear association between decreasing cg05575921 methylation quintiles and increased lung cancer-specific mortality (P = 0.04; Q5 vs Q1 HR = 3.68, 95% CI: 1.32-10.25), and although the all-cause mortality quintile results were generally similar, the linear trend was not statistically significant (P = 0.09). We did not observe evidence for statistical interaction by sex in any of our mortality models.
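The width of the intervals above reflects the precision of the underlying log hazard ratios. As a rough illustration, the Python sketch below back-calculates the implied point estimate and the standard error of the log hazard ratio from a reported 95% CI. It assumes a Wald-type interval that is symmetric on the log scale, which is typical for Cox model output but is not stated explicitly here; the function name is hypothetical.

```python
import math


def log_scale_summary(lower: float, upper: float, z: float = 1.959964):
    """Return (implied point estimate, SE of the log estimate) from a Wald CI.

    Under log-scale symmetry, the point estimate is the geometric mean of
    the bounds and the CI spans 2 * z standard errors on the log scale.
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    point = math.exp((log_lower + log_upper) / 2.0)
    se_log = (log_upper - log_lower) / (2.0 * z)
    return point, se_log


# Adenocarcinoma, lung cancer-specific mortality, Q5 vs Q1 (95% CI: 1.12-4.82)
point, se_log = log_scale_summary(1.12, 4.82)
```

Under this assumption the implied point estimate agrees with the reported HR of 2.32 to within rounding, and the same check holds for the small cell interval (1.32-10.25; reported HR = 3.68).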
Results from risk and mortality models that excluded individuals with 5PRIME-extracted DNA were similar to the corresponding main results (Additional file 1: Tables S3-S5).

Discussion
To our knowledge, our study is the first to examine associations between pre-diagnosis AHRR cg05575921 methylation and lung cancer risk and mortality by histotype among smokers at high risk of lung cancer. We observed that cg05575921 methylation differed dramatically by smoking exposure even among this population of heavy smokers, with mean pack years of 59.3 in cases and 54.2 in controls. Though strong and highly statistically significant associations were observed for lower cg05575921 methylation and greater smoking exposure in our study and in others [2–11], we did not observe that lower cg05575921 methylation was associated with increased lung cancer risk overall or by histotype. However, we observed that among lung cancer cases, decreased pre-diagnosis cg05575921 methylation was associated with increased mortality for adenocarcinoma and small cell, but not squamous cell, lung cancer.
In prior epidemiologic publications, low levels of cg05575921 methylation have been associated with increased risks of lung cancer [4,9,19–22]. These reports include never and light smokers, and results have not been presented by histotype. In the population-based study by Bojesen et al. of approximately 23% never smokers and current/former smokers with mean smoking histories of fewer than 40 pack years, an over fourfold increased risk of lung cancer for individuals in the lowest versus highest methylation quintiles (95% CI: 2.31-10.30) was observed after adjusting for smoking status, cigarettes per day, and pack years [9]. In four publications reporting on combinations of study populations from up to five nested case-control studies, with each individual nested case-control study comprising 63 to 367 pairs, statistically significant 40-60% increased risks of lung cancer per standard deviation decrease in cg05575921 methylation were reported [4,19,21,22].
These results maintained statistical significance after adjustment for smoking for all but one study, which reported a statistically significant 63% increased risk that was attenuated and no longer statistically significant after controlling for smoking features (e.g., smoking status, pack years, comprehensive smoking index) [22]. In that study, cases had a mean of 20 pack years while controls averaged nine [22]. Our models of lung cancer risk in heavy smokers per standard deviation decrease in cg05575921 methylation were similar to the continuous 10% decrease model results shown in Table 2, with an OR = 0.91 (95% CI: 0.71-1.16) for the 316 case-control pairs after controlling for matching factors, age, estimated cell type, and cigarettes per day at blood draw. In a study that performed a supplementary analysis restricting to the 2014 USPSTF screening-eligible smokers, a non-statistically significant 1.2-fold increased risk of lung cancer per standard deviation decrease in cg05575921 methylation was observed after adjustment for age, sex, pack years, and time since quitting [20]. Again, there were large differences in smoking exposure by case-control status, with mean pack years of 34 for cases and 13 for controls [20].
These results are in contrast to our results per standard deviation decrease in cg05575921 methylation, which were similar to the continuous 10% decrease model results shown in Table 3, with OR = 0.85 (95% CI: 0.65-1.13) in the 242 pairs eligible for screening under the 2014 USPSTF recommendation after controlling for matching factors, age, estimated cell type, and cigarettes per day at blood draw. An update to the 2014 USPSTF screening guidelines is in process, with the 2020 draft USPSTF recommendation statement broadening eligibility by age (50-80 years) and smoking history (at least a 20 pack-year smoking history) [25]. Based on the 2020 draft USPSTF recommendation, 93% of the case-control pairs in our study would have been eligible for screening, and thus, our findings reflect the expected associations among that group.
Table notes (lung cancer risk models). Abbreviations: CI, confidence interval; NSCLC, non-small cell lung cancer; NSCLC, NOS, non-small cell lung cancer, not otherwise specified; OR, odds ratio. a: Logistic regression model results, conditioned on matching factors (age at blood draw ±5 years, smoking status, sex, race, asbestos, enrollment year ±2 years, and time at risk) and adjusted for age at blood draw, estimated cell type, and cigarettes per day at blood draw. b: "All lung cancer cases" includes adenocarcinoma, squamous cell, and small cell, as well as 10 cases for whom histotype was NSCLC, NOS; other NSCLC; unknown or missing.
Consistent with our observation that decreased pre-diagnosis cg05575921 methylation was associated with increased mortality in heavy smoker lung cancer cases, a case-cohort study with 60 fatal lung cancer cases in a subcohort of 1565 participants observed a multivariable-adjusted 1.56-fold increased hazard of lung cancer-specific death per 5% lower pre-diagnosis cg05575921 methylation (95% CI: 1.30-1.87) [23]. Histotype-specific results were not presented.
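The hazard ratio from the case-cohort study is reported per 5% lower methylation, whereas our continuous estimates are per 10% decrease. Placing the two on a common scale requires rescaling. The Python sketch below shows the conversion under the assumption that the log hazard is linear in methylation, so that a ratio per a units becomes ratio^(b/a) per b units. The function name is hypothetical, and the rescaled values are illustrative arithmetic, not results reported by either study.

```python
def rescale_ratio(ratio: float, from_units: float, to_units: float) -> float:
    """Rescale a hazard or odds ratio from one exposure increment to another.

    Assumes the log ratio is linear in the exposure, so the log ratio
    scales in proportion to the size of the increment.
    """
    return ratio ** (to_units / from_units)


# 1.56 per 5% lower methylation [23], expressed per 10% lower methylation
per_10_from_cohort = rescale_ratio(1.56, 5, 10)   # about 2.43
# 1.21 per 10% decrease (adenocarcinoma cases), expressed per 5% decrease
per_5_from_caret = rescale_ratio(1.21, 10, 5)     # about 1.10
```

Even on a common scale, the estimates are not directly comparable, since the study populations, outcome definitions, and adjustment sets differ.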
Decreased blood cg05575921 methylation is time- and dose-dependent on exposure to cigarette smoking, with cg05575921 methylation gradually increasing after a smoker quits smoking [11,19,36]. Two studies of former smokers have reported that cg05575921 methylation levels increase to never-smoker levels on average 10-22 years after cessation [19,36], while two other studies report that decreased cg05575921 methylation persists 30-35 years post-cessation [11,37]. Differences in length and condition of blood storage [38,39], DNA extraction method [38,40], and methylation quantification method [15,41] may contribute to differences in cg05575921 methylation distributions across studies. Fortunately, such between-study differences do not tend to affect differential methylation detection across individuals on a per-study basis [15,38–40]. This is supported by consistent replication of strong associations between low cg05575921 methylation and smoking features across studies [2–11], regardless of storage or processing.
A major strength of our study is that the population was at high risk of lung cancer due to high levels of cigarette smoke exposure. CARET selection was based on pack years smoked and time since cessation, and cases and controls were matched on current versus former smoking status at blood draw. While matching on smoking status may have ultimately limited our ability to see differences in risk and mortality with a marker that is so strongly related to smoking, our goal was to evaluate whether this marker provided information for lung cancer risk stratification above and beyond the effect of smoking.

Conclusions
Although cg05575921 is a robust marker of cigarette smoking exposure, our results suggest that low levels of cg05575921 methylation are not associated with an increased risk of lung cancer in heavy smokers, and thus do not support using this marker for risk stratification for lung cancer screening among high-risk individuals. Additional research is needed to determine whether decreased pre-diagnosis cg05575921 methylation is associated with mortality above and beyond smoking exposure, and thus may be useful for clinical decision making for lung adenocarcinoma and/or small cell lung carcinoma.
Additional file 1: Table S1. Characteristics of lung cancer cases (n = 372) by their quintiles of cg05575921 percent methylation. Table S2. Linear regression results for quintile cg05575921 hypomethylation and smoking features. Table S3. Lung cancer risk by cg05575921 percent methylation for all lung cancer cases and by histotype, excluding n = 3 case/control pairs where one had 5PRIME DNA extraction. Table S4.
Table notes (mortality models). Abbreviations: CI, confidence interval; HR, hazard ratio; NSCLC, non-small cell lung cancer; NSCLC, NOS, non-small cell lung cancer, not otherwise specified. a: Cox proportional hazards model results adjusted for age at blood draw, sex, years between blood draw and lung cancer diagnosis, and years since quit smoking at blood draw. All models include early, late, or unknown stage as a strata variable. b: "All lung cancer cases" includes adenocarcinoma, squamous cell carcinoma, and small cell cases as well as not otherwise specified non-small cell lung cancer (NSCLC, NOS; n = 16) and unknown/no pathology (n = 12).