Screening and association testing of common coding variation in steroid hormone receptor co-activator and co-repressor genes in relation to breast cancer risk: the Multiethnic Cohort

Background Only a limited number of studies have performed comprehensive investigations of coding variation in relation to breast cancer risk. Given the established role of estrogens in breast cancer, we hypothesized that coding variation in steroid receptor coactivator and corepressor genes may alter inter-individual response to estrogen and serve as markers of breast cancer risk. Methods We sequenced the coding exons of 17 genes (EP300, CCND1, NME1, NCOA1, NCOA2, NCOA3, SMARCA4, SMARCA2, CARM1, FOXA1, MPG, NCOR1, NCOR2, CALCOCO1, PRMT1, PPARBP and CREBBP) suggested to influence transcriptional activation by steroid hormone receptors in a multiethnic panel of women with advanced breast cancer (n = 95): African Americans, Latinos, Japanese, Native Hawaiians and European Americans. Association testing of validated coding variants was conducted in a breast cancer case-control study (1,612 invasive cases and 1,961 controls) nested in the Multiethnic Cohort. We used logistic regression to estimate odds ratios for allelic effects in ethnic-pooled analyses as well as in subgroups defined by disease stage and steroid hormone receptor status. We also investigated effect modification by established breast cancer risk factors that are associated with steroid hormone exposure. Results We identified 45 coding variants with frequencies ≥ 1% in any one ethnic group (43 non-synonymous variants). We observed nominally significant positive associations with two coding variants in ethnic-pooled analyses (NCOR2: His52Arg, OR = 1.79; 95% CI, 1.05–3.05; CALCOCO1: Arg12His, OR = 2.29; 95% CI, 1.00–5.26). A small number of variants were associated with risk in disease subgroup analyses and we observed no strong evidence of effect modification by breast cancer risk factors. Based on the large number of statistical tests conducted in this study, the nominally significant associations that we observed may be due to chance, and will need to be confirmed in other studies. 
Conclusion Our findings suggest that common coding variation in these candidate genes does not make a substantial contribution to breast cancer risk in the general population. Cataloging and testing of coding variants in coactivator and corepressor genes should continue and may serve as a valuable resource for investigations of other hormone-related phenotypes, such as inter-individual response to hormonal therapies used for cancer treatment and prevention.


Background
Breast cancer risk is related to lifetime exposure to steroid hormones. [1][2][3] Continuous exposure to endogenous and exogenous estrogens enhances cell proliferation in breast tissue, which is thought to increase the chance that a spontaneous mutation may become fixed and lead to a malignant phenotype. [4,5] Inherited polymorphisms in genes involved in steroid hormone biosynthesis may serve as markers of lifetime exposure to elevated levels of estrogen and breast cancer risk. [4,6,7] While studies have demonstrated genetic control of steroid hormone production, associations between common genetic variation and circulating hormone levels have been modest, and insufficient to alter one's risk of developing breast cancer. [8,9] Cellular response to estrogens is mediated through estrogen receptors (ERα and ERβ), which, upon binding to ligand and DNA hormone response elements, recruit coactivator and corepressor proteins that regulate the expression of steroid hormone target genes. [10,11] More than 200 nuclear receptor coactivators and 40 corepressors have been identified (http://www.nursa.org/). [11] The relative recruitment of coactivators versus corepressors for a given ligand (e.g. estradiol vs. tamoxifen) is tissue specific and may account for agonist vs. antagonist activity of the same ligand in different tissues. [12,13] Polymorphic variants in these mediators of hormonal responsiveness could affect the functional activity of estrogen receptors following stimulation by endogenous or exogenous (i.e. HRT or SERMs) estrogens and lead to differences in steroid hormone sensitivity, which could alter one's risk of developing breast cancer.
In this study, we systematically screened the coding exons of steroid hormone receptor coactivator and corepressor genes in a multiethnic panel of women with breast cancer in an attempt to identify and catalogue potentially functional coding polymorphisms that may serve as genetic markers of breast cancer risk. We targeted 17 genes suggested to influence transcriptional activation by steroid hormone receptors (PGR, ERα, ERβ) through direct binding to these receptors or through interactions with other well characterized co-activator/co-repressor protein com-

Study population
The Multiethnic Cohort Study (MEC) is a population-based prospective cohort study that was initiated between 1993 and 1996 and includes subjects from various ethnic groups: African Americans and Latinos primarily from California (mainly Los Angeles) and Native Hawaiians, Japanese Americans, and European Americans primarily from Hawaii. [14] State driver's license files were the primary sources used to identify study subjects in Hawaii and California. Additionally, in Hawaii, state voter's registration files were used, and, in California, Health Care Financing Administration (HCFA) files were used to identify additional African American men.
All participants (n = 215,251) returned a 26-page self-administered baseline questionnaire that obtained general demographic, medical and risk factor information such as ethnicity, prior medical conditions, family history of various cancers, dietary exposures, smoking, physical activity, body mass index (BMI), and for women, reproductive history and exogenous hormone use. All participants were 45 to 75 years of age at baseline. In the cohort, incident cancer cases are identified annually through cohort linkage to population-based cancer Surveillance, Epidemiology, and End Results (SEER) registries in Hawaii and Los Angeles County as well as to the California State cancer registry. Information on stage of disease and estrogen and progesterone receptor status is also obtained through the SEER registries.

Nested case-control study of breast cancer
Blood sample collection in the MEC began in 1994 and targeted incident breast cancer cases and a random sample of study participants to serve as controls for genetic analyses. In the present study, incident cases are defined as those diagnosed with invasive breast cancer after enrollment through December 31, 2002. Cases were over 45 years of age, and consisted primarily of postmenopausal women. Women with a previous diagnosis of breast cancer identified by SEER or self-report at baseline were excluded. Controls were women without a breast cancer diagnosis through December 31, 2002. The controls were frequency matched to cases on ethnicity and the case's age at diagnosis in 5-year intervals. The nested breast cancer case-control study consists of 1,612 invasive breast cancer cases and 1,961 controls (n, cases/n, controls: African Americans, 345/426; Native Hawaiians, 108/290; Japanese Americans, 425/419; Latinas, 334/386; European Americans, 400/440), and has been utilized previously for numerous candidate gene association studies in the MEC. [15][16][17] This study was approved by the Institutional Review Boards at the University of Southern California and at the University of Hawaii, and informed consent was obtained from all study participants.

Polymorphism discovery
Polymorphism discovery was carried out by sequencing the coding exons and splice-site regions of the 17 candidate genes in a multiethnic panel of 95 women (19 subjects of each ethnic group) with advanced breast cancer (invasive/non-localized cancer with SEER stage ≥ 2) from the MEC. These genes were selected because they are the focus of ongoing structural and functional studies being conducted by the investigators. Advanced cases were targeted for sequencing to increase the likelihood of detecting variants that would be biologically associated with breast cancer. This panel was selected to have ≥ 85% power to detect a potentially functional variant of ~5% frequency (2 of 38 chromosomes) in any one population or an overall frequency of ~1% (2 of 190 chromosomes).
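The power figure quoted above can be checked with a short calculation. The sketch below assumes that "detecting" a variant means observing at least one copy of the allele among the sequenced chromosomes, and that chromosomes are sampled independently (a binomial model); this reading is an assumption on our part, not a statement of the authors' exact calculation. The function name `detection_power` is hypothetical.

```python
from math import comb


def detection_power(allele_freq: float, n_chromosomes: int, min_copies: int = 1) -> float:
    """Binomial probability of observing at least `min_copies` copies of an allele.

    Assumes independent sampling of chromosomes (an illustrative simplification).
    """
    p_too_few = sum(
        comb(n_chromosomes, k)
        * allele_freq**k
        * (1.0 - allele_freq) ** (n_chromosomes - k)
        for k in range(min_copies)
    )
    return 1.0 - p_too_few


# 19 women per ethnic group = 38 chromosomes; 95 women overall = 190 chromosomes.
power_single_group = detection_power(0.05, 38)
power_pooled = detection_power(0.01, 190)
print(f"single group: {power_single_group:.3f}, pooled panel: {power_pooled:.3f}")
```

Under these assumptions both probabilities come out slightly above 0.85, in line with the stated design target. Requiring two or more copies (`min_copies=2`) gives a noticeably lower value, which is one reason singleton observations were followed up with additional sequencing.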
DNA was extracted from buffy coat fractions using the Qiagen QiaAMP Blood Kit (Valencia, CA). All DNA samples were previously whole-genome amplified (WGA) by Molecular Staging Inc. following their standard protocol (New Haven, CT). [18] Non-synonymous variants in the coding region or variants in known splice-site regions that were observed in > 1 individual (a minimum of 2 out of the 190 chromosomes) were targeted for association testing. For variants that were observed in only one individual, we sequenced an additional 88-91 subjects of that specific ethnic population. This extra sequencing was performed to determine whether the variant is extremely rare or may have been introduced during the WGA process or through PCR-based sequencing, with each having error rates of ~10^-6. [19] Of the 68 rare ethnic-specific variants that we observed, 18 were confirmed (i.e. observed in ≥ 1 of the additional 190 chromosomes and ≥ 2 of 228 chromosomes examined in total (~1%)), and were further examined in relation to breast cancer risk.

DNA sequencing
Bi-directional sequencing was performed on the ABI 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA). Gene and exon specific PCR primers were obtained from the NCBI Probe Database (http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&db=probe) when available, and PCR conditions followed the VariantSEQr Resequencing System protocol (Applied Biosystems, Foster City, CA). In the event that PCR primers were not available (13 exons out of 353 targeted exons), primers were designed in-house (at least 50 bases upstream and downstream from the targeted exon). In the instance that an exon exceeded approximately 550 bases or sequencing did not yield analyzable results, internal primers were designed. Sequencing primers were typically universal primers obtained from ABI or internal primers, as mentioned above. Sequencing purification was performed using DyeEx 96 columns (Qiagen, Valencia, CA) following their standard protocol. PCR primers, cycling conditions and details of the sequencing protocol can be provided upon request.
PolyPhred was used for analyzing sequence traces and variation discovery (http://droog.mbt.washington.edu/PolyPhred.html). [20][21][22] For the 17 genes evaluated in this study (Table 1), we successfully sequenced 346 of the 355 coding exons (97.5%; > 72 kilobases (kb)), 327 in both the forward and reverse direction, and 19 in only one direction (a total of 367 amplicons). Each amplicon was sequenced in 94 of the 95 subjects in the multiethnic panel, on average. The Phred quality score was used to assess the quality of each trace from 10 bases 5' through 10 bases 3' of each exon. [23] The average Phred quality score was 46.6 for all exons sequenced; 86% of the amplicons had a quality score ≥ 40 and 97.5% had a quality score ≥ 30.
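To put the quality thresholds above in context, the standard Phred definition relates a quality score Q to the estimated base-calling error probability P by Q = -10 log10(P). The following minimal sketch converts in both directions; the function names are hypothetical and it is only meant to illustrate the scale, not the pipeline used in the study.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Estimated probability that a base call is wrong, given Phred score q."""
    return 10.0 ** (-q / 10.0)


def error_prob_to_phred(p: float) -> float:
    """Phred score corresponding to a base-calling error probability p."""
    return -10.0 * math.log10(p)


# Thresholds and the average score mentioned in the text.
for q in (30.0, 40.0, 46.6):
    print(f"Q{q:g}: error probability ~ {phred_to_error_prob(q):.2e}")
```

On this scale Q30 corresponds to roughly 1 error in 1,000 base calls and Q40 to roughly 1 in 10,000. Note that an average of quality scores is not the same as the score of the average error probability, so the mean of 46.6 should be read as a summary of the scores themselves.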

Genotyping
The genotype of coding variants (43 non-synonymous SNPs, 1 in/del and 1 splice-site variant) in the case-control samples was determined using the allelic discrimination assay. [24] Each assay was validated initially by genotyping the multiethnic sequencing panel (n = 95) and comparing with the sequencing results; the concordance was > 99.6%, on average. Five variants could not be genotyped in the case-control samples because a working assay could not be designed (FOXA1, Gly227Glu; NCOR2, Tyr19Cys, Pro975Ser and Pro2008Ser (rs2230944); SMARCA2, Gly1416Ala (rs3793510)). We could, however, infer genotypes at Pro2008Ser in NCOR2, as this variant was found to be in near-perfect linkage disequilibrium (i.e. r² = 1) with Ala2007Thr (determined based on sequencing of 69-84 individuals from each of the 5 populations). The association results for the splice-site variant in CCND1 (rs603965, Pro241Pro) in the MEC are also part of another study (Knudsen et al. in review, 2008). Primers and probes for all assays can be provided upon request. Quality control replicates (~5%) were included to assess the genotyping reliability and reproducibility for the 40 assays. The average concordance for the duplicates was 99.8%, ranging from 98% to 100% across all assays. All variants were successfully genotyped in > 94% of cases or controls (average call rate = 98.2%). This study has been approved by the Institutional Review Boards at the University of Southern California and the University of Hawaii.

Statistical analyses
Hardy-Weinberg equilibrium was evaluated using an exact test. [25] Unconditional logistic regression was used to assess the association of each variant with breast cancer risk. Allele dosage effects were examined using a log-additive model in both ethnic-specific and pooled analyses, treating the common homozygous (i.e. wild-type) genotype class as the "low risk" group. Logistic regression models were fitted to estimate odds ratios associated with this score variable treated as a linear variable, adjusted for age and race. We also examined effect heterogeneity by breast cancer phenotypes (e.g. stage and estrogen receptor (ER) status). These analyses were performed using the standard case-control approach, limiting the cases to those with a specific phenotype (e.g. ER-positive cases) and all controls, and a case-only analysis to test for differences by disease subgroup. We also examined effect modification by established breast cancer risk factors that are associated with steroid hormone exposure: body mass index (kg/m²) among postmenopausal women (≥ 25 vs. < 25 kg/m²), use of hormone replacement therapy (current vs. past vs. never user of estrogen or estrogen + progestin) and age at menarche (≤ 12 vs. > 12 years). Tests for interactions were performed using the Likelihood Ratio Test (LRT). All statistical analyses were done using SAS version 9.0 (Cary, NC).
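As an illustration of how a per-allele odds ratio from a log-additive model can be read, the sketch below back-calculates the log-odds coefficient, its standard error and a Wald p-value from a reported odds ratio and 95% confidence interval, and shows the genotype-specific odds ratios implied by the model. It assumes the interval is a symmetric Wald interval on the log scale; the helper names are hypothetical, and this is not the SAS procedure used in the study. The inputs are the two pooled estimates reported in the abstract.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float):
    """Recover (beta, se, z, two-sided p) from an OR and its 95% Wald interval."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_975)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return beta, se, z, p_value


def genotype_or(per_allele_or: float, dosage: int) -> float:
    """Odds ratio implied by a log-additive model for 0, 1 or 2 variant alleles."""
    return per_allele_or**dosage


# Pooled estimates reported in the abstract (rounded, so results are approximate).
ncor2 = wald_from_or_ci(1.79, 1.05, 3.05)     # NCOR2 His52Arg
calcoco1 = wald_from_or_ci(2.29, 1.00, 5.26)  # CALCOCO1 Arg12His
print(f"NCOR2 His52Arg: z = {ncor2[2]:.2f}, p = {ncor2[3]:.3f}")
print(f"CALCOCO1 Arg12His: z = {calcoco1[2]:.2f}, p = {calcoco1[3]:.3f}")
print([round(genotype_or(1.79, d), 2) for d in (0, 1, 2)])
```

Because the published estimates are rounded, the recovered p-values are only approximate; they fall near the 0.05 threshold, consistent with the description of these associations as nominally significant.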

Discovery of coding variation
In resequencing of the 17 candidate genes (Table 1), we identified 43 non-synonymous SNPs with frequencies ≥ 1% in any one ethnic group and 19 of these variants were (see Additional file 1, Supplementary table 1). We detected all common (≥ 5%) validated non-synonymous SNPs in these genes reported in dbSNP. We also detected the well known and common splice-site variant in CCND1 (rs603965, Pro241Pro) [26] as well as a novel in-frame deletion (Val1996/1997del) in NCOR1 in African Americans (MAF, 0.03). Six of the candidate genes targeted in this study were quite large and included > 30 coding exons (Table 1), with four of these genes containing > 7.2 kb of coding sequence (EP300, NCOR1, NCOR2 and CREBBP). NCOR2 contained both the largest number of coding exons (n = 47, 7.6 kb) and the largest number of non-synonymous variants (n = 15). In contrast, NCOR1, an equally large gene, was found to harbor no non-synonymous variants (among the 44 of 45 coding exons that were successfully sequenced, 7.3 kb). We also identified three known polyglutamine repeat polymorphisms, one in exon 20 of NCOA3 [27], one in exon 15 of NCOR2 [28], and one in exon 4 of SMARCA2 [29]; associations between these repeat polymorphisms and breast cancer risk will be examined in future studies. All non-synonymous, in-frame in/del and splice-site variants were targeted for association testing (as discussed in the methods) in the breast cancer case-control study in the MEC (1,612 cases and 1,961 controls). A detailed list of the coding variants examined in the breast cancer study, and their frequencies in each racial/ethnic population, is provided (see Additional file 1, Supplementary table 1). Only one variant was found in a single population (NCOA3, Met391Val; African Americans, MAF 1.6%) and all variants were in Hardy-Weinberg Equilibrium among controls (p > 0.05) in at least four of the five ethnic groups.

Allelic associations with breast cancer risk
The median age of the breast cancer cases and controls was 65 and 63 years, respectively. The associations with established breast cancer risk factors for each ethnic group were generally as expected. Briefly, among postmenopausal women, compared to controls, cases were more likely to be heavier and to have used hormone therapy. Cases also reported having an earlier age at menarche and were more likely to report a family history of breast cancer than controls (Additional file 1, Supplementary Allelic associations were also examined by ER status and stage of disease. We observed significant positive associations with single variants in EP300 (Ser507Gly), NCOR2 (Lys980Thr) and CREBBP (Val992Ile) among ER-negative tumors (Table 2), and significant heterogeneity by ER status in case-only analyses (p < 0.01) for Ser507Gly and Val992Ile. There was a statistically significant positive association with CCND1 (Pro241Pro) and an inverse association with EP300 (Ile997Val) for ER-positive cases (Table 2). In addition to the previously noted associations with His52Arg in NCOR2 and Arg12His in CALCOCO1, we observed a significant positive association between Ala407Ser in NCOA2 and advanced disease, and nominally significant heterogeneity (p < 0.05) by disease stage for variants Ala83Thr and Ser448Asn in FOXA1 and variants Ser2311Gly and Ala2496Thr in NCOR2.
We also evaluated allelic effect modification by known breast cancer risk factors that are associated with steroid hormone exposure, with the a priori hypothesis being that conditions related to long-term steroid hormone exposure (i.e. greater postmenopausal weight, early age at menarche and ever use of hormone therapy) may influence the penetrance associated with these candidate coding variants (see Additional file 1, Supplementary tables 4-6). We observed very little evidence to support this hypothesis and only two nominally statistically significant interactions (Ala2007Thr in NCOR2 and age at menarche (LRT, p = 0.04); and Ala83Thr in FOXA1 and BMI (LRT, p = 0.00054)). Of note, we observed a borderline significant positive association with the common Pro241Pro variant in CCND1 among current users of HRT (OR = 1.21; 95% CI, 1.00-1.46); however, no significant interactions were observed by postmenopausal hormone use status (see Additional file 1, Supplementary table 6).

Discussion
Inherited susceptibility to breast cancer is associated with a wide spectrum of allelic variants that convey varying degrees of risk (reviewed in [30]). Rare mutations in the coding sequence of BRCA1 and BRCA2 and a growing number of other genes involved in maintaining genomic stability have been shown to confer high to moderate risks of breast cancer. More recently, genome-wide scans of breast cancer in populations of European ancestry have identified a limited number of common alleles associated with more modest risks of breast cancer (OR ~1.1-1.2 per allele). [31][32][33][34] Genome-wide scanning approaches have been shown to be powerful for discovering common variants that influence complex disease phenotypes; however, these approaches currently fail to comprehensively survey coding variation, particularly uncommon alleles in non-European populations. At present, the only way to fully enumerate and comprehensively assess the coding-variant model for breast cancer susceptibility is by direct resequencing of candidate loci. This candidate gene resequencing strategy has been successful in identifying rare truncating mutations in a number of genes that confer approximately 2-fold risks of breast cancer (e.g. PALB2 and BRIP1). [35,36] This approach has also been successful for identifying rare alleles that act collectively to influence other complex traits and cancer phenotypes, with examples including variants in candidate genes that influence plasma lipid levels [37] and risk of developing colorectal adenomas. [38] In this study, we examined the role of coding variation in breast cancer in multiple populations, focusing on a set of 17 candidate steroid hormone coactivator and corepressor-related genes selected based on their ability to interact with the estrogen receptor transcription complex and potentially modulate response to estrogen in breast tissue.
Sequencing of the coding exons in a multi-ethnic panel of 95 women with advanced breast cancer provided 85% power to identify putative functional coding variants with frequencies as low as 1% in the combined sample. The breast cancer case-control study utilized for association testing was also well-powered to detect nominally significant effects as low as 1.8 for rare alleles (1% MAF, 81% power) and 1.35 for more common alleles (5% MAF, 83% power). This study was also well-powered to detect effects as low as 1.36 for more common alleles (10% MAF, 82% power) after correcting for the number of tests performed (n = 40, α = 0.00125). The multiethnic nature of this study was designed to allow for investigating a wide range of risk alleles; however, we found the vast majority of coding variants to be present in more than one population (39 of 40 variants). In this study, we observed nominally significant associations with only 2 variants and breast cancer risk (His52Arg in NCOR2 and Arg12His in CALCOCO1), which is in line with expectation based on the number of tests that were performed (2 of 40 = 5%). The observation that these variants were also significantly associated with stage and ER status will need to be confirmed in other studies.
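The multiple-testing arithmetic in this paragraph can be reproduced directly. The sketch below computes the Bonferroni-corrected threshold for 40 tests, the number of nominally significant results expected by chance, and the probability of seeing at least two such results if every null hypothesis were true. The last quantity assumes the 40 tests are independent, which is a simplification (variants in the same gene may be correlated); the function names are hypothetical.

```python
from math import comb


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected count of nominally significant tests when all nulls are true."""
    return alpha * n_tests


def prob_at_least(k: int, alpha: float, n_tests: int) -> float:
    """P(at least k nominally significant tests), assuming independent tests."""
    p_fewer = sum(
        comb(n_tests, j) * alpha**j * (1.0 - alpha) ** (n_tests - j)
        for j in range(k)
    )
    return 1.0 - p_fewer


corrected_alpha = bonferroni_alpha(0.05, 40)
expected_hits = expected_false_positives(0.05, 40)
p_two_or_more = prob_at_least(2, 0.05, 40)
print(corrected_alpha, expected_hits, round(p_two_or_more, 2))
```

Under these assumptions, two nominally significant results out of 40 tests is what chance alone would be expected to produce, and observing two or more is more likely than not, which supports the cautious interpretation given in the text.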
Aside from CCND1, coding variation in only a small number of these genes has been investigated in relation to breast cancer risk [39,40]. The Pro241Pro splice-site variant in CCND1 has been examined extensively, with some studies reporting a positive association, [41][42][43] and others finding no significant association [44][45][46]. We observed no significant association with this variant, and much larger collaborative efforts, such as the Breast Cancer Association Consortium [47], will be needed to rule out weak effects (RR < 1.15) for this functional variant. Our data support those of Wirtenberger et al. [40] who reported no association with the Ile997Val or Gln2223Pro variants in EP300. However, we did observe nominally significant associations between Pro241Pro in CCND1 and Ile997Val in EP300 and risk of ER-positive breast cancer. These findings are noteworthy as both of these variants are relatively common in the population and will need to be confirmed in other large studies. An inverse association has also been reported with the Gln586His variant in NCOA3 (p = 0.03) in a European study (775 cases and 1,628 controls), which we were unable to replicate in our much larger study [39]. The glutamine repeat polymorphism in NCOA3 has also been investigated in multiple populations, with the vast majority of studies reporting no significant correlation between repeat genotype and breast cancer incidence. [48][49][50][51][52] Although we conducted a comprehensive assessment of coding variants in these candidate genes, there are a number of limitations to our study that should be considered when interpreting our findings. First, our resequencing strategy most likely missed rare coding variants in these genes, including those which may be population- or subject-specific.
Second, we did not enrich our sequencing panel with cases with a family history of breast cancer who may be more genetically susceptible and for whom a coding-variant genetic model may be more probable. Nor did we enrich our panel with cases with ER-positive tumors which may increase the likelihood for detecting putative functional coding variants in these genes based on their known modes of action. Most of the variants that we identified were rare, so power was limited. Future studies of these loci and other steroid hormone receptor coactivator and corepressor genes will require even larger samples from these defined population subgroups to ensure complete ascertainment of all rare coding alleles, followed by robust association studies with greater power to assess more modest effects.

Conclusion
This study suggests that common coding variation in these 17 candidate steroid hormone receptor coactivator and corepressor genes does not make a substantial contribution to breast cancer risk in the general population. Cataloging and testing of coding variants in coactivator and corepressor genes should continue and may serve as a valuable resource for investigations of other hormone-related phenotypes, such as mammographic density, and of inter-individual response to hormonal therapies used for cancer treatment and prevention.