Case-control study for colorectal cancer genetic susceptibility in EPICOLON: previously identified variants and mucins

Background: Colorectal cancer (CRC) is the second leading cause of cancer death in developed countries. Familial aggregation in CRC is also important outside syndromic forms and, in this case, a polygenic model with several common low-penetrance alleles contributing to CRC genetic predisposition could be hypothesized. Mucins and GALNTs (N-acetylgalactosaminyltransferases) are interesting candidates for CRC genetic susceptibility and have not been previously evaluated. We present results for ten genetic variants linked to CRC risk in previous studies (previously identified category) and 18 selected variants from the mucin gene family in a case-control association study from the Spanish EPICOLON consortium.

Methods: CRC cases and matched controls were from EPICOLON, a prospective, multicenter, nationwide Spanish initiative comprising two independent stages. Stage 1 corresponded to 515 CRC cases and 515 controls, whereas stage 2 consisted of 901 CRC cases and 909 controls. In addition, an independent cohort of 549 CRC cases and 599 controls outside EPICOLON was available for additional replication. Genotyping was performed for ten previously identified SNPs in ADH1C, APC, CCDN1, IL6, IL8, IRS1, MTHFR, PPARG, VDR and ARL11, and 18 selected variants in the mucin gene family.

Results: None of the 28 SNPs analyzed in our study was found to be associated with CRC risk. Although four SNPs were significant with a P-value < 0.05 in EPICOLON stage 1 [rs698 in ADH1C (OR = 1.63, 95% CI = 1.06-2.50, P-value = 0.02, recessive), rs1800795 in IL6 (OR = 1.62, 95% CI = 1.10-2.37, P-value = 0.01, recessive), rs3803185 in ARL11 (OR = 1.58, 95% CI = 1.17-2.15, P-value = 0.007, codominant), and rs2102302 in GALNTL2 (OR = 1.20, 95% CI = 1.00-1.44, P-value = 0.04, log-additive 0, 1, 2 alleles)], only rs3803185 achieved statistical significance in EPICOLON stage 2 (OR = 1.34, 95% CI = 1.06-1.69, P-value = 0.01, recessive). In the joint analysis for both stages, results were only significant for rs3803185 (OR = 1.12, 95% CI = 1.00-1.25, P-value = 0.04, log-additive 0, 1, 2 alleles) and borderline significant for rs698 and rs2102302. The rs3803185 variant was not significantly associated with CRC risk in an external cohort (MCC-Spain), but it still showed some borderline significance in the pooled analysis of both cohorts (OR = 1.08, 95% CI = 0.98-1.18, P-value = 0.09, log-additive 0, 1, 2 alleles).

Conclusions: ARL11, ADH1C, GALNTL2 and IL6 genetic variants may have an effect on CRC risk. Further validation and meta-analyses should be undertaken in larger CRC studies.


Background
Colorectal cancer (CRC) is the second leading cause of cancer death in developed countries [1]. Familial adenomatous polyposis and Lynch syndrome are the most frequent hereditary CRC syndromes, with a more aggressive presentation, earlier onset and strong familial aggregation. However, they account for only a minority of the total CRC burden (~5%). Most genetic components involved in these less frequent hereditary forms were successfully identified in the past two decades, and they correspond to rare, highly penetrant alleles that predispose to CRC. Genetic association analyses have been the strategy used to identify predisposing CRC alleles in the last decade, first by studying a small number of such variants or single nucleotide polymorphisms (SNPs) using candidate-gene approaches [2], and more recently with an unbiased strategy, genome-wide association studies (GWAS) [3,4].
Within the EPICOLON consortium [5], several candidate-gene genetic association efforts have been pursued since 2005 with the aim of identifying genetic susceptibility variants for CRC. To this end, SNPs/genes were selected from the previously identified category (variants linked to CRC risk in previous studies), from human syntenic CRC susceptibility regions identified in mouse, from the CRC carcinogenesis-related pathways Wnt and BMP, from regions 9q22 and 3q22 with positive linkage in CRC families, and from the mucin gene family [6-8]. SNPs in genes selected in the previously identified category (ADH1C, APC, CCDN1, IL6, IL8, IRS1, MTHFR, PPARG, VDR and ARL11) have been analyzed in previous independent genetic association studies and they are a priori attractive candidates for genetic susceptibility to CRC [6,9-24]. The expression of most genes in this category is altered in CRC and they appear to be involved in important processes for CRC risk such as hereditary CRC (APC), alcohol metabolism (ADH1C), inflammation (IL6, IL8), cell cycle regulation (CCDN1), energy balance (IRS1, PPARG), methylation (MTHFR), vitamin D (VDR), and the RAS superfamily (ARL11). On the other hand, mucins are protein constituents of the mucous barrier that protects human epithelia from adverse conditions, and they are highly glycosylated by GALNT proteins. Mucins and GALNTs, members of the mucin gene family, can be found deregulated in CRC and other neoplasms and, although also interesting candidates, they have not been previously evaluated for CRC genetic susceptibility [25].
Here, we report results from a case-control association study for CRC risk in the EPICOLON cohort for previously identified SNPs in ADH1C, APC, CCDN1, IL6, IL8, IRS1, MTHFR, PPARG, VDR and ARL11, and selected variants within the mucin gene family.

Study populations
Study subjects were mainly from EPICOLON, a prospective, multicenter, nationwide Spanish initiative (http://www.aegastro.es/aeg/ctl_servlet?_f=16&grupo=4) [5] comprising two stages (2000-2001 and 2006-2008), in which CRC cases and matched healthy controls were collected. DNA samples were extracted as previously described [6,7]. EPICOLON stage 1 corresponded to 515 CRC cases and 515 controls, whereas EPICOLON stage 2 consisted of 901 CRC cases and 909 controls. Additionally, an independent cohort of 549 CRC cases and 599 controls (MCC-Spain; http://www.creal.cat) was available for further replication of putative positive hits in EPICOLON. Cases and controls were matched for sex and age (± 5 years) and controls were negative for personal and family cancer history. All samples were obtained with informed consent reviewed by the ethical board of the corresponding hospital.

Gene and SNP selection
Selected SNPs were included in the previously identified category if they corresponded to genetic variants linked to CRC risk by previously published independent studies, or in the mucin gene family if located in genes encoding mucins and GALNT proteins. Mucins are protein constituents of the mucous barrier highly glycosylated by GALNT proteins. One relevant previously identified SNP was studied in each of the following genes: ADH1C, APC, CCDN1, IL6, IL8, IRS1, MTHFR, PPARG, VDR and ARL11. For the mucin gene family, 1-2 SNPs were selected from each of the following genes: MUC7, MUC12, MUC13, MUC15, MUC16, MUC17, MUC19, MUC21, GALNT1, GALNTL2, GALNT10, GALNT14, GALNT4 and OVGP1. GALNT12 was intentionally left out of the mucin gene selection: this gene is located in the 9q22 region and was studied in our candidate-gene approach for regions with positive linkage in CRC families (Abulí et al., manuscript in preparation). SNP selection in the mucin gene family was performed using Pupasuite, a web tool for the selection of genetic variants with potential phenotypic effect (pupasuite.bioinfo.cipf.es) [26]. SNPs were always prioritized if they were coding, evolutionarily conserved in mouse and had a minor allele frequency (MAF) above 5%. Other selected SNPs with a putative regulatory effect were in promoter, intronic or 3'-UTR regions. A complete list of the SNPs and genes analyzed in the present study is given in Table 1.
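The prioritization rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the Pupasuite procedure itself: the record fields, the SNP and gene identifiers, the tie-break by higher MAF and the decision not to apply the MAF cut-off to regulatory SNPs are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

MAF_THRESHOLD = 0.05  # "MAF above 5%" from the text
REGULATORY_REGIONS = {"promoter", "intronic", "3'-UTR"}


@dataclass(frozen=True)
class CandidateSnp:
    snp_id: str               # hypothetical identifier
    gene: str                 # hypothetical gene label
    region: str               # e.g. "coding", "promoter", "intronic", "3'-UTR"
    conserved_in_mouse: bool
    maf: float                # minor allele frequency


def priority(snp):
    """Return 0 for top-priority SNPs, 1 for putative regulatory SNPs, None otherwise."""
    if snp.region == "coding" and snp.conserved_in_mouse and snp.maf > MAF_THRESHOLD:
        return 0
    if snp.region in REGULATORY_REGIONS:
        return 1
    return None


def select_snps(candidates, per_gene=2):
    """Keep at most `per_gene` SNPs per gene, best priority first.

    Ties within a priority class are broken by higher MAF (an assumption).
    """
    by_gene = {}
    for snp in candidates:
        rank = priority(snp)
        if rank is not None:
            by_gene.setdefault(snp.gene, []).append((rank, -snp.maf, snp.snp_id))
    return {
        gene: [snp_id for _, _, snp_id in sorted(ranked)[:per_gene]]
        for gene, ranked in by_gene.items()
    }


# Made-up candidates, for illustration only.
candidates = [
    CandidateSnp("snp_a", "GENE_A", "coding", True, 0.20),
    CandidateSnp("snp_b", "GENE_A", "intronic", False, 0.30),
    CandidateSnp("snp_c", "GENE_A", "coding", True, 0.02),
    CandidateSnp("snp_d", "GENE_A", "promoter", True, 0.10),
    CandidateSnp("snp_e", "GENE_B", "intergenic", False, 0.40),
]
selected = select_snps(candidates)
```

With these made-up records, `snp_c` is dropped for its low MAF and `snp_e` because it lies outside the listed regions, so only GENE_A contributes SNPs.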
High-throughput genotyping in the EPICOLON cohorts was performed according to the manufacturers' instructions with the TaqMan allelic discrimination and SNPlex™ systems (Applied Biosystems, Foster City, USA), and with a single-base primer extension chemistry matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) genotyping platform (Sequenom Inc., San Diego, USA). Genotyping in the MCC-Spain cohort was performed using the VeraCode technology (Illumina, San Diego, USA). Genotyping was performed at the Santiago de Compostela and Barcelona nodes of the Spanish National Genotyping Centre (http://www.cegen.org), and at the Genome Analysis Platform of the CIC-BioGUNE (http://www.cicbiogune.es).

Statistical methods
As quality control, SNPs were required to have a genotyping success rate above 90%. Allelic frequency description and the Hardy-Weinberg equilibrium test were performed using SNPator, a web-based tool offered by the Bioinformatics division of the Spanish National Genotyping Centre (bioinformatica.cegen.upf.es) [27]. All SNPs analyzed had a genotyping success rate > 90%. The genotype frequencies of all variants in the control population fitted Hardy-Weinberg equilibrium (P > 0.01), supporting the absence of genotyping artifacts. There was no sign of underlying population stratification in EPICOLON, as tested by an independent study [7]. Genotype analysis was carried out using the SNPassoc R library [28]. Intergroup comparisons of genotype frequency differences were performed by regression analysis for codominant, dominant, recessive and log-additive models of inheritance. We estimated crude odds ratios (ORs) and their 95% confidence intervals (95% CIs). As expected, results did not change after sex and age adjustment. The best genetic or inheritance model was selected using the Akaike information criterion. To address the issue of multiple testing, we used Bonferroni correction (significance threshold P = 0.0125 for four SNPs). Study power was estimated with CaTS software [29].
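Two of the steps above, the Hardy-Weinberg check in controls and the crude OR with its 95% CI, can be illustrated with a small Python sketch. It is a stand-in under stated assumptions, not the SNPator or SNPassoc implementation: the function names and genotype counts are hypothetical, the Hardy-Weinberg test is a plain 1-df chi-square test, and only the dominant and recessive models are covered, because they reduce to a 2x2 table for which the crude OR and Wald interval match an unadjusted logistic regression with one binary predictor. The codominant and log-additive models and the Akaike-based model choice would need a regression fit and are left out.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test of Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def collapse(counts, model):
    """Collapse (AA, AB, BB) counts into (unexposed, exposed); B is the risk allele."""
    n_aa, n_ab, n_bb = counts
    if model == "dominant":   # AB + BB versus AA
        return n_aa, n_ab + n_bb
    if model == "recessive":  # BB versus AA + AB
        return n_aa + n_ab, n_bb
    raise ValueError("only 'dominant' and 'recessive' are handled in this sketch")


def crude_odds_ratio(cases, controls, model):
    """Crude OR with a Wald 95% CI from the collapsed 2x2 table."""
    case_unexp, case_exp = collapse(cases, model)
    ctrl_unexp, ctrl_exp = collapse(controls, model)
    odds_ratio = (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)
    se = math.sqrt(1 / case_exp + 1 / case_unexp + 1 / ctrl_exp + 1 / ctrl_unexp)
    lower = math.exp(math.log(odds_ratio) - Z_95 * se)
    upper = math.exp(math.log(odds_ratio) + Z_95 * se)
    return odds_ratio, lower, upper


# Made-up genotype counts (AA, AB, BB), for illustration only.
controls = (250, 500, 250)
cases = (200, 500, 300)

hwe_p = hwe_chi2_pvalue(*controls)
or_rec, lo_rec, hi_rec = crude_odds_ratio(cases, controls, "recessive")
or_dom, lo_dom, hi_dom = crude_odds_ratio(cases, controls, "dominant")
```

The sketch does not guard against empty cells or monomorphic SNPs, which would cause a division by zero.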

Results
Twenty-eight SNPs, ten from the previously identified category and 18 from the mucin gene family, were successfully genotyped in EPICOLON stage 1. In the EPICOLON stage 1 cohort, a liberal P-value threshold (P-value < 0.05) was used to avoid false-negative results. We then sought to validate statistically significant stage 1 results in another independent CRC cohort (EPICOLON stage 2); replication in stage 2 was performed only for SNPs that were significant in stage 1. Results for rs698, rs1800795, rs3803185, and rs2102302 in EPICOLON stage 2 are shown in Additional file 2. Only rs3803185 maintained statistical significance in stage 2 (OR = 1.34, 95% CI = 1.06-1.69, P-value = 0.01, recessive model, AA-AG vs GG). In order to improve statistical power, results for EPICOLON stages 1 and 2 were also analyzed jointly for these four SNPs. Results were only significant for rs3803185 (OR = 1.12, 95% CI = 1.00-1.25, P-value = 0.04, log-additive 0, 1, 2 alleles) and borderline significant for rs698 and rs2102302 (Table 2). To investigate rs3803185 further, we additionally genotyped it outside EPICOLON in a cohort from a Spanish multicase-control population study for common neoplasms (MCC-Spain). This variant was not significantly associated with CRC risk in this independent cohort. Pooled analysis for rs3803185 in the EPICOLON and MCC-Spain cohorts still showed borderline significance (OR = 1.08, 95% CI = 0.98-1.18, P-value = 0.09, log-additive 0, 1, 2 alleles). Associations of rs698, rs1800795, rs3803185 and rs2102302 were also evaluated in two cohorts described in a previous GWAS [30], either by checking the original variant or a proxy SNP highly correlated with it (r² > 0.7) (Table 3). Interestingly, rs3803185 again showed significance in one of the GWAS cohorts (P = 0.03). However, the weak associations observed in our study would not have remained significant had Bonferroni correction for multiple testing been applied and, therefore, they should be considered not statistically significant.
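The effect of Bonferroni correction on the reported P-values can be checked with a short Python sketch. The threshold of 0.0125 for four SNPs and the stage 1 and joint-analysis P-values are taken from the text; treating stage 1 as 28 tests (one per SNP, ignoring the several inheritance models that were fitted) is an assumption made here for illustration, as are the function names.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def passes(p_value, alpha, n_tests):
    """True if a P-value stays below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


ALPHA = 0.05

# Threshold quoted in the statistical methods for four SNPs.
four_snp_threshold = bonferroni_threshold(ALPHA, 4)

# Stage 1 P-values as reported for the four nominally significant SNPs.
stage1_p = {"rs698": 0.02, "rs1800795": 0.01, "rs3803185": 0.007, "rs2102302": 0.04}

# Assumption: 28 tests in stage 1, one per genotyped SNP.
stage1_pass = {snp: passes(p, ALPHA, 28) for snp, p in stage1_p.items()}

# Joint analysis of both stages for rs3803185 (P = 0.04), corrected for four SNPs.
joint_pass = passes(0.04, ALPHA, 4)
```

Under these assumptions none of the stage 1 signals, nor the joint-analysis result for rs3803185, stays below its corrected threshold.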

Discussion
Ten previously identified SNPs in ADH1C, APC, CCDN1, IL6, IL8, IRS1, MTHFR, PPARG, VDR and ARL11, and 18 selected variants in the mucin gene family were evaluated with a genetic association strategy. CRC cases and matched controls collected in two independent stages within the EPICOLON consortium were genotyped in order to evaluate the potential association of these variants with CRC risk. Mucins and GALNTs, members of the mucin gene family, have not been previously evaluated for CRC genetic susceptibility.
In our study, four SNPs were significant in EPICOLON stage 1 (rs698 in ADH1C, rs1800795 in IL6, rs3803185 in ARL11, and rs2102302 in GALNTL2), but only rs3803185 achieved statistical significance in EPICOLON stage 2. In the joint analysis for both stages, results were only significant for rs3803185 and borderline significant for rs698 and rs2102302. The rs3803185 variant was not significantly associated with CRC risk in an external cohort (MCC-Spain), but it still showed some borderline significance in the pooled analysis of both cohorts.
ARL11, also known as ARLTS1 (ADP-ribosylation factor-like tumor suppressor gene 1), is a tumor suppressor gene that belongs to the ARF family of the Ras superfamily of small GTPases, which are known to be involved in multiple regulatory pathways altered in human carcinogenesis. It has been suggested that ARL11 SNPs, especially rs3803185, may act as low-penetrance variants in several neoplasms including CRC [6,24,32]. The ADH1C gene encodes the gamma subunit of class I alcohol dehydrogenase, a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol. There is a noticeable association between alcohol consumption and CRC risk [33], and it is plausible that this association could also be modified by germline variants in enzymes that metabolize ethanol. GALNTL2 (UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase-like 2), also known as GALNT15, is ubiquitously expressed in human tissues [34]. This gene has never been investigated in CRC susceptibility and has only been studied as a genetic factor involved in longevity [35]. Chronic inflammation is involved in the etiology of CRC, and the release of large amounts of cytokines and growth factors may influence the carcinogenesis process. The IL6 (interleukin 6) gene, among others, has been analyzed in previous independent genetic association studies and it is an a priori attractive candidate for genetic susceptibility to CRC [15,16].
Finally, as limitations of our study, it should be commented that our cohort sample size may probably be

In summary, none of the 28 SNPs analyzed in our study could be associated with CRC risk. However, variants in ARL11, ADH1C, GALNTL2 and IL6 may have an effect on CRC risk. Mucins and GALNTs, included in the mucin gene family, have been found deregulated in CRC and other neoplasms, and they are interesting candidates for CRC genetic susceptibility [27]. Since they have not been previously evaluated and despite our mostly negative results, we consider that genetic variation in the mucin gene family should be further explored in larger CRC cohorts in order to draw more solid conclusions. Also, additional case-control studies in larger CRC cohorts and meta-analyses could be useful to confirm or refute the role of ARL11, ADH1C, GALNTL2 and IL6 variants in CRC susceptibility.

Conclusions
None of the 28 SNPs analyzed in our study could be associated with CRC risk. However, ARL11, ADH1C, GALNTL2 and IL6 genetic variants may have an effect on CRC risk. Further validation and meta-analyses should be undertaken in larger CRC cohorts.

Additional material
Additional file 1: Results for previously identified and mucin SNPs in EPICOLON stage 1. SNPassoc results for previously identified and mucin SNPs in EPICOLON stage 1. P-values for some SNPs are highlighted in bold if significant (P < 0.05).
Additional file 2: Results for previously identified and mucin SNPs in EPICOLON stage 2. SNPassoc results for previously identified and mucin SNPs in EPICOLON stage 2. P-values for some SNPs are highlighted in bold if significant (P < 0.05).