Evaluation of gene-environment interactions for colorectal cancer susceptibility loci using case-only and case-control designs

Background: Genome-wide association studies (GWAS) have identified more than 40 colorectal cancer susceptibility loci, but these explain only a small fraction of heritability. To account for missing heritability, we investigated gene-environment interactions (G × Es) between GWAS-identified single-nucleotide polymorphisms (SNPs) and established risk or protective factors for colorectal cancer using both case-only and case-control study designs.

Methods: Data on 703 colorectal cancer cases and 1406 healthy controls from the National Cancer Center in Korea were used. We tested interactions between 31 GWAS-identified SNPs and 13 established risk or protective factors for colorectal cancer (family history, body mass index, history of colorectal polyps, inflammatory bowel disease, and diabetes mellitus, alcohol drinking, smoking, regular exercise, regular aspirin use, postmenopausal hormone replacement therapy, red meat and processed meat intake, and dairy consumption). Logistic regression models were used to assess G × Es for colorectal cancer risk.

Results: The SNP rs4444235 at 14q22.2 interacted with regular exercise in colorectal cancer (p_case-only = 2.4 × 10⁻³, p_case-control = 1.5 × 10⁻³). The risk allele (C) of rs4444235 increased the risk of colorectal cancer in regularly exercising individuals (OR = 1.47, 95% CI = 1.02–2.10) but decreased the risk in non-exercising individuals (OR = 0.76, 95% CI = 0.62–0.94). Furthermore, the G × E between the SNP rs2423279 at 20p12.3 and regular aspirin use was statistically significant (p_case-only = 7.7 × 10⁻³, p_case-control = 1.6 × 10⁻³). The additive effect of the risk allele (T) of rs2423279 on colorectal cancer risk was increased among regular aspirin users (OR = 4.62, 95% CI = 1.97–10.80).

Conclusion: Our results suggest that SNP rs4444235 at 14q22.2 and SNP rs2423279 at 20p12.3 may interact with regular exercise and aspirin use, respectively, in colorectal carcinogenesis.


Background
The genetic heritability for colorectal cancer was approximately 35% (95% confidence interval (CI) = 10-48%) in a twin study [1]. Furthermore, common single-nucleotide polymorphisms (SNPs) were expected to explain at least 7.4% of the heritability [2]. Although genome-wide association studies (GWASs) have identified more than 40 genetic susceptibility regions related to colorectal cancer risk at the nominal genome-wide significance threshold (p-value = 5 × 10⁻⁸) [3], the common SNPs discovered by previous GWAS accounted for only 0.65% of the heritability of colorectal cancer, leaving substantial missing heritability [2]. Accordingly, gene-environment interactions (G × Es) have been suggested to contribute to the missing heritability [4]. Furthermore, because GWAS-identified SNPs may be located in non-coding regions or in unknown genes owing to the non-hypothesis-driven approach of GWAS, elucidating G × Es may allow a better understanding of the biological mechanisms of the genetic variations [5].
To investigate the potential contribution of G × Es to colorectal cancer, several studies have evaluated G × Es for colorectal cancer susceptibility loci identified by previous GWAS [6][7][8][9][10][11][12][13] and at a genome-wide level [14]. Most studies have adopted a case-control design to study G × Es [6][7][8][9][10][11][12], which has the advantage of being relatively robust and maintaining the desired type I error rate [15]. Few studies on G × Es for colorectal cancer have used a case-only design [13,14]. The case-only design is usually regarded only as an alternative to the case-control design, because it can yield false positives when its assumption of independence between genetic and environmental factors is not verified. Nevertheless, it allows more efficient estimation and more powerful tests of G × Es than the case-control design [16].
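The trade-off described above can be made explicit with a short derivation. This is a sketch in our own notation, not taken from the cited references: it assumes a binary genotype G, a binary exposure E, disease status D, and a logistic model with coefficients β (all symbols are illustrative assumptions).

```latex
% Assumed disease model (binary G and E, notation introduced for illustration):
\[
\operatorname{logit} P(D=1 \mid G, E) = \beta_0 + \beta_G G + \beta_E E + \beta_{GE}\, G E
\]
% By Bayes' rule, P(G=g,E=e | D=1) / P(G=g,E=e | D=0) is proportional to
% the disease odds exp(\beta_0 + \beta_G g + \beta_E e + \beta_{GE} g e), so
\[
\frac{\mathrm{OR}_{GE \mid \text{cases}}}{\mathrm{OR}_{GE \mid \text{controls}}} = e^{\beta_{GE}}
\quad\Longrightarrow\quad
\mathrm{OR}_{GE \mid \text{cases}} = e^{\beta_{GE}} \times \mathrm{OR}_{GE \mid \text{controls}}
\]
% where OR_{GE | cases} is the G-E odds ratio among cases (the case-only estimate)
% and OR_{GE | controls} is the G-E odds ratio among controls.
```

If G and E are independent among controls, the second factor equals 1 and the case-only odds ratio estimates the multiplicative interaction directly. If they are not, the case-only estimate is biased by exactly that factor, which is why a control-only check of independence is informative.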
For robust and powerful detection, we used both case-only and case-control approaches to investigate G × Es for colorectal cancer. We focused on 31 SNPs in colorectal cancer susceptibility loci identified by previous GWAS and 13 established environmental risk or protective factors.

Study population
The study population was recruited from the National Cancer Center (NCC) in Korea as previously reported [11][12][13] and presented in Additional file 1: Figure S1. In brief, among 1427 incident colorectal cancer cases who were diagnosed and underwent surgery between 2010 and 2013, 1070 cases agreed to participate in the study. We excluded patients who did not complete questionnaires or whose blood samples were insufficient for genotyping. Thus, a total of 703 colorectal cancer patients were included in the analyses. The 14201 healthy controls were recruited among people who underwent a health screening examination, which was a benefit program of the National Health Insurance, between 2007 and 2014. Among 9037 people who consented to participate in the study and completed the questionnaire, a total of 1406 healthy controls were included in the analyses by 1:2 frequency matching on 5-year age group and sex. All study participants provided written informed consent, and the study was approved by the institutional review board (IRB) of the NCC (IRB No. NCCNCS-10-350 and NCC 2015-0202).

Data collection
In this study, the established colorectal cancer risk or protective factors were defined as factors or interventions with adequate evidence of increased or decreased risk of colorectal cancer based on the latest Physician Data Query (PDQ®) cancer information summaries on colorectal cancer prevention of the National Cancer Institute (NCI), updated as of March 1, 2018 [17], and the Colorectal Cancer Facts & Figures 2017-2019 of the American Cancer Society (ACS®) [18]. Accordingly, the environmental factors considered in this analysis included family history of colorectal cancer, body mass index (BMI), history of colorectal polyps, history of inflammatory bowel disease (IBD), history of diabetes mellitus (DM), alcohol drinking, smoking, regular exercise, regular aspirin use, postmenopausal hormone replacement therapy (HRT), red meat intake, processed meat intake, and dairy consumption. Milk consumption was excluded from this analysis due to substantial overlap with dairy product consumption.
The data for the selected environmental factors were collected using structured questionnaires composed of two parts: one concerned demographic and epidemiological factors, described in detail elsewhere [11], and the other was a semiquantitative food frequency questionnaire (SQFFQ) [19]. Face-to-face interviews were conducted for colorectal cancer patients by trained interviewers using the written questionnaires. Controls completed the questionnaires themselves, and their responses were validated by telephone interviews. The questionnaires were developed based on the Korea National Health and Nutrition Examination Survey, for which internal quality assurance and external quality control programs were managed by the Korea Centers for Disease Control and Prevention [20].

SNP selection and genotyping
We included 31 SNPs previously identified to be associated with colorectal cancer risk with nominal genome-wide statistical significance (p-value < 5 × 10⁻⁸), as described previously in detail [11][12][13]. DNA extraction and genotyping were performed on BioRobot M48 automatic extraction equipment with the MagAttract DNA Blood M48 Kit (Qiagen, Hilden, Germany) and an Agenabio MassArray iPLEX® gold assay (Agena Bioscience, Inc., San Diego, CA, US). Briefly, the genotype data for any SNP were excluded according to the quality control procedures for the following reasons: genotyping failure, monomorphism or minor allele frequency (MAF) < 0.01, or p-value for deviation from Hardy-Weinberg equilibrium (HWE) < 0.01 in controls.
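The MAF and HWE filters above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names are hypothetical, and the 1-df Pearson chi-square test is an assumption, since the text does not state which HWE test was applied. Only the two thresholds (MAF < 0.01, HWE p-value < 0.01) come from the text.

```python
import math

def allele_stats(n_aa, n_ab, n_bb):
    """Return (frequency of allele B, minor allele frequency) from genotype counts."""
    n = n_aa + n_ab + n_bb
    freq_b = (2 * n_bb + n_ab) / (2 * n)
    return freq_b, min(freq_b, 1 - freq_b)

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium.

    Assumes a polymorphic SNP, so that all expected counts are positive.
    """
    n = n_aa + n_ab + n_bb
    q, _ = allele_stats(n_aa, n_ab, n_bb)
    p = 1 - q
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def passes_qc(n_aa, n_ab, n_bb, maf_min=0.01, hwe_p_min=0.01):
    """Apply the MAF and HWE filters to genotype counts observed in controls."""
    _, maf = allele_stats(n_aa, n_ab, n_bb)
    if maf < maf_min:  # also rejects monomorphic SNPs (MAF = 0)
        return False
    _, p_value = hwe_chi_square(n_aa, n_ab, n_bb)
    return p_value >= hwe_p_min
```

For example, `passes_qc(810, 180, 10)` keeps a SNP whose control genotype counts match HWE proportions, whereas a SNP with no heterozygotes such as `passes_qc(200, 0, 200)` is rejected.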

Statistical analysis
To test the difference in the distribution of environmental factors between colorectal cancer cases and controls, a chi-square test for categorical variables and a t-test for continuous variables were conducted. The environmental factors were dichotomized, and the category known to be the lower-risk group for colorectal cancer was used as the reference. For genetic factors, individual SNP alleles were designated as risk or effect alleles based on the literature review. Based on each SNP genotype coded as 0, 1, or 2 copies of the risk or effect allele, we calculated the MAF, the risk or effect allele frequency (RAF), and the p-value for deviation from HWE in controls. To assess the effects of the environmental factors and of the genetic factors (assuming a log-additive model) on colorectal cancer risk, a logistic regression model was used.
To detect the statistical significance of G × Es, we employed both case-only and case-control designs. In the case-only analysis, each SNP genotype was treated as an independent variable, and each dichotomized environmental factor was treated as the dependent variable in a logistic regression model. Under the same setting, a control-only analysis was also conducted to test the assumption of independence between genetic and environmental factors. In the case-control logistic analysis of colorectal cancer risk, the independent variables included not only the SNP genotype and the binary status of the environmental factor but also the G × E term for those genetic and environmental factors. To be eligible for further analysis, a G × E had to have a nominal p-value < 0.05 in both the case-only and case-control analyses, and at least one of the two p-values had to be < 1.61 × 10⁻³ (Bonferroni-corrected threshold; 0.05/31) to account for multiple testing. To evaluate the genetic effects on colorectal cancer risk that were modified by environmental factors, association tests for the selected SNPs with statistically significant p-values for G × E were conducted, stratified by the status of the corresponding environmental factor. For those SNPs, we estimated effects for each genotype as well as effects assuming log-additive, dominant, and recessive models.
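The case-only test and the eligibility rule described above can be sketched in Python as follows. This is an illustrative simplification, not the SAS analysis used in the study: the model is unadjusted (no covariates), it is fitted with a hand-written Newton-Raphson routine, and all function names are hypothetical. Only the 0.05 nominal level and the 0.05/31 Bonferroni threshold come from the text.

```python
import math

def fit_logistic(x, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (b0, b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p             # score for the intercept
            g1 += (yi - p) * xi      # score for the slope
            h00 += w                 # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error from the inverse information of the last iteration
    # (evaluated just before the final, negligible update).
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1

def wald_p(beta, se):
    """Two-sided Wald p-value for a coefficient."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))

def case_only_interaction(genotypes, exposures):
    """Among cases only: regress exposure (0/1) on risk-allele count (0/1/2).

    Returns (interaction odds ratio per allele, two-sided p-value).
    """
    _, b1, se = fit_logistic(genotypes, exposures)
    return math.exp(b1), wald_p(b1, se)

def eligible(p_case_only, p_case_control, n_snps=31, alpha=0.05):
    """Both p-values nominally significant and at least one below alpha / n_snps."""
    bonferroni = alpha / n_snps
    return (p_case_only < alpha and p_case_control < alpha
            and min(p_case_only, p_case_control) < bonferroni)
```

With the p-values reported in the abstract, `eligible(2.4e-3, 1.5e-3)` and `eligible(7.7e-3, 1.6e-3)` both return `True`, because in each pair the case-control p-value lies below 0.05/31 ≈ 1.61 × 10⁻³. The same `fit_logistic` routine applied to controls instead of cases gives the control-only independence check.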
The logistic models that only included genetic variables were unadjusted. If models included environmental variables, all analyses were adjusted for age and sex. Potential confounders were chosen based on an association test between environmental factors and colorectal cancer risk. To prevent a problem of multicollinearity among the potential confounders, if a statistically significant correlation was observed between any two paired variables, the variable making a smaller contribution to colorectal cancer risk was dropped. Accordingly, analyses were adjusted for age, sex, family history of colorectal cancer, history of DM, regular exercise, and dairy consumption. Moreover, the dietary factor values were adjusted for total energy intake using the residual method as described elsewhere [21]. All associations and statistical significance were estimated by odds ratio (OR), 95% CI, and two-sided p-value using SAS 9.4 software (SAS Institute, Inc., Cary, NC, US).

Results
Table 1 shows the characteristics of the study population and their associations with colorectal cancer risk. Our study population consisted of 703 colorectal cancer cases and 1406 healthy controls. Given that cases and controls were frequency-matched by age and sex, they had a similar mean age (56.4 years in cases and 56.0 years in controls) and the same distribution of sex (31.7% women and 68.3% men). We observed statistically significant differences in family history of colorectal cancer, BMI, history of DM, regular exercise, regular aspirin use, postmenopausal HRT use, red and processed meat intake, and dairy product consumption (P < 0.05).
After adjustment for covariates, we observed a statistically significant association for an increased risk of colorectal cancer with family history of colorectal cancer (OR = 2.27, 95% CI = 1.56-3.32, P < 0.01), history of DM (OR = 2.27, 95% CI = 1.56-3.32, P < 0.01), nonregular exercise (OR = 2.97, 95% CI = 2.43-3.62, P < 0.01), nonregular aspirin use (OR = 3.26, 95% CI = 1.97-5.41, P < 0.01), and dairy consumption less than 400 g/day (OR = 2.23, 95% CI = 1.53-3.25, P < 0.01). Contrary to previous studies, we observed a statistically significant association for a decreased risk of colorectal cancer with red meat intake equal to or greater than 100 g/day (OR = 0.66, 95% CI = 0.47-0.92, P = 0.02).
Table 2 shows the associations between susceptibility SNPs and colorectal cancer risk in previously published GWAS and the current study. Among 31 previously reported SNPs, 13 SNPs (rs647161 at 5q31.1, rs6983267 at 8q24.21, rs7014346 at 8q24.21, rs10505477 at 8q24.21, rs10795668 at 10p14, rs704017 at 10q22.3, rs11196172 at 10q25.2, rs174537 at 11q12.2, rs174550 at 11q12.2, rs1535 at 11q12.2, rs4779584 at 15q13.3, rs10411210 at 19q13.11, and rs2423279 at 20p12.3) showed statistical evidence of association with colorectal cancer risk in the same direction as previous results, with nominal p-values ranging from 0.05 to 2.0 × 10⁻⁴. The remaining 10 SNPs (rs3802842 at 11q23.1, rs10849432 at 12p13.31, rs10774214 at 12p13.32, rs7136702 at 12q13.13,
The G × Es between 31 SNPs and 13 environmental factors were tested using both case-only (Additional file 1: Table S1) and case-control designs (Additional file 1: Table S2). A total of 7 out of 8 G × Es showing nominal significance (p-value < 0.05) in both case-only and case-control analyses satisfied the assumption of independence between genetic and environmental factors; the exception was the G × E between rs1957636 and smoking status (Additional file 1: Table S3).
Table 3 summarizes those 7 G × Es between rs10849432 and BMI, rs11196172 and history of colorectal polyps, rs10795668 and regular exercise, rs4444235 and regular exercise, rs2241714 and regular aspirin use, rs2423279 and regular aspirin use, and rs1957636 and dairy consumption in colorectal cancer by study design. Notably, 2 G × Es, between rs4444235 and regular exercise (case-only: P interaction = 2.4 × 10⁻³, case-control: P interaction = 1.5 × 10⁻³) and between rs2423279 and regular aspirin use (case-only: P interaction = 7.7 × 10⁻³, case-control: P interaction = 1.6 × 10⁻³), remained significant in at least one of the case-only and case-control analyses after Bonferroni correction for multiple testing.
Table 4 shows the association of 2 SNPs, rs4444235 and rs2423279, with colorectal cancer risk stratified by regular exercise and regular aspirin use. Although the SNP rs4444235 was not significantly associated with colorectal cancer risk (

Discussion
We evaluated G × Es on colorectal cancer risk for 31 susceptibility SNPs identified through GWAS with 13 established environmental risk or protective factors using both case-only and case-control study designs. Our analysis showed evidence of G × Es between the SNP rs4444235 at 14q22.2 and regular exercise and between the SNP rs2423279 at 20p12.3 and regular aspirin use after accounting for multiple testing. Furthermore, we observed that the associations of rs4444235 and rs2423279 with colorectal cancer risk were modified by regular exercise and regular aspirin use, respectively.
Among previous G × E studies for colorectal cancer susceptibility loci identified by GWAS, the G × E between rs4444235 and regular exercise for colorectal cancer risk has not been investigated [6][7][8][9][10][11][12][13]. The G × E between rs2423279 and regular aspirin use was tested by Kantor et al., but the interaction was not statistically significant [7]. Furthermore, previously reported G × Es identified by GWAS for colorectal carcinogenesis were not replicated. This may be due to ethnic differences or limited power to detect interactions. We previously reported on G × Es involving GWAS-identified colorectal cancer susceptibility loci with age at cancer onset [13], smoking [11], and alcohol consumption [12] using a conventional method of detecting interactions. In the current analysis, we combined the case-only, case-control, and control-only study designs, which may make the results more powerful and less biased.
Colorectal cancer susceptibility associated with the SNP rs4444235 was first reported by a meta-analysis of two GWAS of individuals of European descent [22]. The association between rs4444235 and colorectal cancer risk was also detected among Caucasian and east Asian patients by a meta-analysis [23]. Although several Asian studies [24][25][26] as well as the current study did not show a statistically significant association between rs4444235 and colorectal cancer risk, perhaps due to small sample sizes, the direction of the association was consistent, suggesting a potentially higher risk associated with the C allele. rs4444235 is located at chromosome 14q22.2, close to the gene encoding bone morphogenetic protein 4 (BMP4). Although the risk variant is non-coding, the C allele of rs4444235 showed significantly increased allele-specific expression of the BMP4 gene in a colorectal cancer cell line [27]. BMP4 is involved in the transforming growth factor beta (TGFβ) superfamily signaling pathway, which contributes to colorectal tumorigenesis [28]. Colorectal tumorigenesis may be inhibited by the favorable effects of regular exercise, which stimulates intestinal peristalsis and maintains the general metabolic milieu [29].
The association between the SNP rs2423279 and colorectal cancer risk was identified by GWAS in east Asians and replicated in east Asian and European-ancestry populations as well [30]. This study also observed that rs2423279 with the C allele was associated with an increased risk of colorectal cancer in the same direction. rs2423279 is located at chromosome 20p12.3, close to HAO1, which encodes hydroxy acid oxidase 1, and PLCB1, which encodes phospholipase C beta 1. For the HAO1 and PLCB1 genes, the mechanisms of colorectal carcinogenesis and of interaction with aspirin are unknown. However, because aspirin can be used as a ligand and/or as a transport and absorption facilitator of diverse agents, including RNAi or polynucleotides targeting inhibition of HAO1 gene expression [31,32], there remains a possibility of indirect G × Es between HAO1 and aspirin use.
A major strength of this study is that we found novel G × Es for colorectal cancer susceptibility loci between SNP rs4444235 and regular exercise and between SNP rs2423279 and regular aspirin use after accounting for multiple testing. However, the calculated power was 32.2% for the G × E between rs4444235 and regular exercise and 53.2% for the G × E between rs2423279 and regular aspirin use, considering the case-only analysis with 703 cases. Although we did not have enough statistical power to detect weak G × Es due to insufficient sample size, both case-only and case-control analyses were performed to overcome sample size limitations, derive additional power, and ensure general validity. Through the additional control-only analysis, the assumption of independence of genetic and environmental factors was tested in the underlying population. In addition, because the case-only design estimates interactions only on the multiplicative scale, which does not by itself imply that G × Es biologically cause colorectal cancer, the case-control design was used to validate the biological hypotheses.
One limitation is that we did not include all colorectal cancer susceptibility loci identified by previous GWAS in the analyses. However, our genetic factors included a relatively up-to-date and large number of colorectal cancer susceptibility SNPs compared with previous G × E studies for colorectal cancer. Environmental factors in the analysis were also selected based on the latest evidence for colorectal cancer risk or protective factors. A second limitation is that the biological basis of G × Es for GWAS-identified SNPs remains unclear, because the functional relationship between those SNPs, which were identified by an agnostic approach, and colorectal cancer risk is not fully understood. Third, the observed G × Es have not been validated in another population. We further conducted case-only and case-control analyses of the G × E between rs4444235 and regular exercise among Whites in UK Biobank, but no statistically significant interactions were observed (Additional file 1: Table S4). Further studies of established risk and protective factors for colorectal cancer in Asian populations and validation studies with sufficient sample size are warranted.

Conclusions
In conclusion, our results suggest that there are possible interactions between the SNP rs4444235 at 14q22.2 and regular exercise and the SNP rs2423279 at 20p12.3 and regular aspirin use in colorectal carcinogenesis.

Additional files
Additional file 1: Figure S1. A flow diagram of the study population. Table S1. Interactions between susceptibility SNPs and environmental factors in colorectal cancer by case-only analysis. Table S2. Interactions between susceptibility SNPs and environmental factors in colorectal cancer by case-control analysis. Table S3. Independence test between selected susceptibility SNPs and environmental factors by control-only analysis. Table S4. Associations between rs4444235 and colorectal cancer risk by regular exercise among Whites in UK Biobank.