Nuclear receptor coregulator SNP discovery and impact on breast cancer risk

Background Coregulator proteins are "master regulators", directing transcriptional and posttranscriptional regulation of many target genes, and are critical in many normal physiological processes, but also in hormone-driven diseases, such as breast cancer. Little is known about how genetic changes in these genes impact disease development and progression. Thus, we set out to identify novel single nucleotide polymorphisms (SNPs) within SRC-1 (NCoA1), SRC-3 (NCoA3, AIB1), NCoR (NCoR1), and SMRT (NCoR2), and test the most promising SNPs for associations with breast cancer risk. Methods The identification of novel SNPs was accomplished by sequencing the coding regions of these genes in 96 apparently normal individuals (48 Caucasian Americans, 48 African Americans). To assess their association with breast cancer risk, five SNPs were genotyped in 1218 familial BRCA1/2-mutation negative breast cancer cases and 1509 controls (rs1804645, rs6094752, rs2230782, rs2076546, rs2229840). Results Through our resequencing effort, we identified 74 novel SNPs (30 in NCoR, 32 in SMRT, 10 in SRC-3, and 2 in SRC-1). Of these, 8 were found with minor allele frequency (MAF) >5%, illustrating the large amount of genetic diversity yet to be discovered. The previously shown protective effect of rs2230782 in SRC-3 was strengthened (OR = 0.45 [0.21-0.98], p = 0.04). No significant associations were found with the other SNPs genotyped. Conclusions These data illustrate the importance of coregulators, especially SRC-3, in breast cancer development and suggest that more focused studies, including functional analyses, should be conducted.


Background
Nuclear receptors are critical for proper development and function of many physiological pathways including lipid metabolism, inflammation, and cell growth [1][2][3]. Over the past 25 years, it has become clear that nuclear receptors are also critical for the onset and progression of many diseases, including cancer. In breast cancer, for example, estrogen receptor-α (ERα) is expressed and drives tumor growth in approximately 2/3 of cases. However, only recently has it been appreciated that proper nuclear receptor function is absolutely dependent on the interaction with coregulator proteins [4]. These proteins couple nuclear receptors with RNA polymerase II and chromatin remodeling machinery to either activate (coactivators) or repress (corepressors) nuclear receptor mediated gene transcription. Because a single coregulator or a subset of coregulators can simultaneously regulate multiple cellular processes through multiple nuclear receptors, they have been classified as 'master regulators' [3]. In keeping with this classification, many coregulators have been implicated in numerous human diseases, including breast cancer [5][6][7][8][9][10].
Family history is one of the strongest risk factors for breast cancer, with the risk approximately doubled in first degree relatives of women with breast cancer compared to the general population [11]. Because of this, many attempts to identify genetic risk factors using multiple approaches have been conducted. However, despite the identification of mutations in the major risk factor genes such as BRCA1, BRCA2, PTEN, CHEK2, and ATM, it is estimated that 75% of familial breast cancers have as yet unidentified risk alleles [12]. ERα is expressed and drives a large fraction of breast cancer cases and is therefore an excellent candidate gene for identifying breast cancer risk factors. Recently, a significant association with familial breast cancer risk has been observed for the C allele of ESR1_rs2747648 in an allele dose-dependent manner. This variant is located in a miRNA-binding site in the 3' untranslated region of ESR1 [13]. However, historically very few associations have been found between SNPs in ERα and breast cancer risk. Further, a recent study conducted a comprehensive search of all SNPs in ERα that revealed no major risk associations (n>55,000 breast cancer cases and controls) [14]. This suggests that other players in the ER signaling pathway may be important for breast cancer risk. Because of the critical importance of coregulators for ERα function, we hypothesized that breast cancer risk is influenced by SNPs within the coactivators SRC-1/NCoA1 and SRC-3/NCoA3/AIB1 and the corepressors NCoR and SMRT/NCoR2.
We previously reported two SNPs in SRC-3 (rs2230782 and rs2076546) associated with reduced breast cancer risk in a case-control study of German and Polish high-risk, BRCA1/2 mutation-negative women (cases: 775, controls: 1628) [15]. In a recent study by Haiman et al [16], coregulator sequencing was conducted in 95 women with advanced breast cancer from the Multiethnic Cohort (African Americans, Latinos, Japanese, Native Hawaiians, and European Americans) to identify novel SNPs and determine their contribution to breast cancer risk in the Multiethnic Cohort (cases: 1612, controls: 1961). Two SNPs were significantly associated with breast cancer risk in this study (one in each of SMRT and CALCOCO1). These SNPs, however, are found exclusively or nearly exclusively in African Americans and therefore cannot be feasibly tested in DNA banks derived from European individuals. One SRC-3 SNP previously identified to be protective in our study [15] was genotyped (rs2230782) and found not to be associated with altered breast cancer risk in the Haiman study [16]. The other SNP we reported to be protective (rs2076546) was not genotyped in this study since it focused on non-synonymous SNPs.
Here we report an extension of our previous study that identified two SNPs within SRC-3 associated with reduced breast cancer risk [15]. We followed a similar approach by genotyping candidate SNPs for associations with breast cancer risk in a high-risk, BRCA1/2 mutation-negative case-control study; however, the original study was extended in three ways. First, three additional coregulators were examined. Second, we sequenced 96 apparently normal individuals from two populations (48 Caucasian Americans and 48 African Americans) to discover novel SNPs and to confirm or reveal SNP frequency information in different populations. Third, a larger population was examined, almost doubling the number of cases and significantly improving our statistical power. The association studies allowed us to strengthen the significance of the protective effect previously reported for a SNP in SRC-3 while extending it to a rare two-SNP haplotype that is highly protective for breast cancer risk.

SNP Discovery
Target sequence obtained from NCBI consisting of all exons, 500 bp of proximal promoter, and 25 bp of flanking introns from SRC-1, SRC-3, NCoR, and SMRT was submitted for primer design and Sanger sequencing to Polymorphic DNA Technologies Inc. (Alameda CA). DNA from 96 samples (48 Caucasian American, 48 African American) obtained from the Coriell Institute (Camden, NJ, USA) (sample sets: HD100CAU and HD100AA) was sequenced in both directions and aligned to NCBI reference sequence and previously reported SNPs in dbSNP. These samples had been collected and anonymized by the National Institute of General Medical Sciences. Visual inspection of chromatograms was conducted for heterozygous calls.

Genotyping Cohort
A case-control study was conducted investigating a German familial breast cancer study cohort. Unrelated, German, female BRCA1/2 mutation negative index cases from breast cancer families were used in this study. The samples, all of Caucasian origin, were collected during the years 1997-2005 by six centers of the German Consortium for Hereditary Breast and Ovarian Cancer (GC-HBOC: centers of Heidelberg, Würzburg, Cologne, Kiel, Düsseldorf and Munich, see authors' affiliations). Familial cases were identified based on (A1) families with two or more breast cancer cases including at least two cases with onset below the age of 50 years; (A2) families with at least one male breast cancer case; (B) families with at least one breast cancer and one ovarian cancer case; (C) families with at least two breast cancer cases including one case diagnosed before the age of 50 years; (D) families with at least two breast cancer cases diagnosed after the age of 50 years; (E) single cases of breast cancer with age of diagnosis before 35 years. These selection criteria, which have previously been reported [17], enrich for cases caused by genetic factor(s). The control population included healthy and unrelated female blood donors collected by the Institute of Transfusion Medicine and Immunology (Mannheim), sharing the ethnic background and sex with the breast cancer patients. The age distribution in the controls and cases was similar (controls: mean age 45.6 years, median age 46 years, age range from 18 to 68 years old; cases: mean age 45.1 years, median age 45 years, age range from 19 to 87 years old). According to the German guidelines for blood donation, all blood donors were examined by a standard questionnaire and gave their informed consent. They were randomly selected during the years 2004-2007 for this study and no further inclusion criteria were applied during recruitment. The study was approved by the Ethics Committee of the University of Heidelberg (Heidelberg, Germany).

Genotyping
Genotyping was conducted using TaqMan allelic discrimination assays. Primers and TaqMan MGB probes were purchased from Applied Biosystems (Foster City, CA). Genotyping call rates for all studies were >97%. The SNP assays were validated by re-genotyping 5% of all samples. The concordance rate for all SNPs varied from 99 to 100%.

Statistical Analysis
Hardy-Weinberg equilibrium was tested using the chi-square "goodness-of-fit" test. Crude odds ratios (ORs), 95% confidence intervals (95% CIs) and P values were computed by unconditional logistic regression using a tool offered by the Institute of Human Genetics, Technical University Munich, Germany http://ihg.gsf.de/cgi-bin/hw/hwa1.pl. Power calculations were performed using the power and sample size calculator software PS version 2.1.31 http://www.mc.vanderbilt.edu/prevmed/ps/. With the total sample size, we had 80% power to detect ORs of 0.79/1.26 and 0.57/1.56 for carrier frequencies of 30% and 5%, respectively.
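The two basic calculations named above can be sketched as follows. This is a minimal illustration, not the implementation of the web tool cited in the text: the function names are our own, and any counts passed to them are the reader's own data. It relies on two standard facts: the chi-square survival function with one degree of freedom equals erfc(sqrt(x/2)), and for a single binary predictor the crude OR from unconditional logistic regression coincides with the cross-product ratio of the 2x2 table, with a Wald interval based on the sum of reciprocal cell counts.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg equilibrium.

    Arguments are observed genotype counts; both alleles are assumed
    to be present so that no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p                              # frequency of allele B
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic marker: HWE test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def crude_odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp, z=1.96):
    """Crude OR with a Wald 95% CI from a 2x2 table (all cells must be > 0)."""
    odds_ratio = (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)
    se_log_or = math.sqrt(
        1 / case_exp + 1 / case_unexp + 1 / ctrl_exp + 1 / ctrl_unexp
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

For a recessive comparison such as the one used for rs2230782, "exposed" would be the homozygous minor-allele carriers and "unexposed" all other genotypes.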

Haplotype Analysis
Haplotypes of variants located in the same gene were determined using the PHASE 2 software created by Stephens et al. [18], or SNPHAP 1.3 software created by David Clayton http://www-gene.cimr.cam.ac.uk/clayton/software/snphap.txt. Each individual was assumed to carry the most likely pair of haplotypes and the haplotype distributions were estimated based on the controls.
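To illustrate the general idea of inferring haplotype frequencies from unphased genotypes, the following is a minimal expectation-maximization sketch for two biallelic SNPs. It is not the algorithm of PHASE or SNPHAP and makes simplifying assumptions that are ours: no missing genotypes, random mating, and genotypes coded as the number of minor alleles (0, 1 or 2) at each SNP. The function name and coding scheme are hypothetical.

```python
def em_two_snp_haplotypes(genotypes, n_iter=100):
    """Estimate frequencies of haplotypes 00, 01, 10, 11 by EM.

    genotypes: iterable of (g1, g2) pairs, each g in {0, 1, 2}.
    Returns a list [f00, f01, f10, f11] where the first digit is the
    allele at SNP 1 and the second the allele at SNP 2.
    """
    genotypes = list(genotypes)
    freqs = [0.25, 0.25, 0.25, 0.25]          # uniform starting values
    for _ in range(n_iter):
        counts = [0.0, 0.0, 0.0, 0.0]
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # Only double heterozygotes are phase-ambiguous:
                # either {00, 11} or {01, 10}.
                w_cis = freqs[0] * freqs[3]
                w_trans = freqs[1] * freqs[2]
                total = w_cis + w_trans
                share = w_cis / total if total > 0 else 0.5
                counts[0] += share
                counts[3] += share
                counts[1] += 1.0 - share
                counts[2] += 1.0 - share
            else:
                # Phase is determined; split each genotype into two alleles.
                alleles_1 = (g1 // 2, (g1 + 1) // 2)
                alleles_2 = (g2 // 2, (g2 + 1) // 2)
                for a, b in zip(alleles_1, alleles_2):
                    counts[2 * a + b] += 1.0
        total_chromosomes = sum(counts)
        freqs = [c / total_chromosomes for c in counts]
    return freqs
```

Once frequencies are estimated, the most likely haplotype pair for a double heterozygote is simply the configuration with the larger of the two products computed in the ambiguous branch.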

SNP Discovery
Complete coding regions and 25 bp of the flanking intronic regions of SRC-1, SRC-3, NCoR, and SMRT were fully sequenced in both directions using Sanger sequencing in 96 apparently normal individuals (48 Caucasian American, 48 African American) generating a total of ~5.8 MB of sequence. From this effort we identified 120 SNPs (61 in SMRT, 33 in NCoR, 18 in SRC-3, and 8 in SRC-1). A summary of the results is shown in Table 1. By conducting the sequencing in two populations, we were able to distinguish SNPs unique to a particular population. We identified 66 SNPs unique to African Americans and 23 SNPs unique to Caucasian Americans (see Additional File 1). This distribution is similar to that reported previously in the SNP@Ethnos database for Yoruban and European populations and is hypothesized to arise from bottlenecks in non-African population history [19]. However, most of the unique SNPs found in Caucasians were rare, possibly suggesting that these are recent alterations since only 4 out of the 23 unique SNPs (17%) were found in more than a single individual. On the other hand, 31 out of the 66 unique SNPs (47%) in African Americans were found in more than a single individual. It is important to note that some of the population unique SNPs are rare and since only 48 individuals were sequenced for each population, they could appear as unique SNPs purely by chance.
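The caveat about chance can be made concrete with a short calculation. The sketch below, under the simplifying assumptions of independent chromosomes and a known population allele frequency, gives the probability that an allele is entirely absent from a sample of n diploid individuals, namely (1 - p)^(2n). The function name and the example frequency are illustrative assumptions, not values from this study.

```python
def prob_unobserved(allele_freq, n_individuals):
    """Probability that an allele with population frequency `allele_freq`
    is absent from all 2 * n_individuals sampled chromosomes."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must lie in [0, 1]")
    n_chromosomes = 2 * n_individuals
    return (1.0 - allele_freq) ** n_chromosomes


# Hypothetical example: an allele at 1% frequency in a population is missed
# in a sample of 48 individuals (96 chromosomes) with probability about 0.38.
p_missed = prob_unobserved(0.01, 48)
```

Under these assumptions, a SNP that is truly present at low frequency in both populations can easily be observed in only one of two samples of 48, which is why apparent population specificity of rare SNPs should be interpreted cautiously.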
From our sequencing effort we identified 74 SNPs in these four coregulators not previously represented in dbSNP or reported in the recent study by Haiman et al [16] (Table 1, columns on the right). We will refer to these SNPs as novel SNPs. Surprisingly, 8 of these novel SNPs were found at MAF>5% (7 within SMRT and 1 within SRC-3). Of the 74 novel SNPs, 18 were nonsynonymous, again with SMRT harboring many of the alterations. This illustrates that SMRT is by far the most polymorphic of the 4 coregulators. A recent study suggests that mutation rate, compared to selection pressure, has a larger impact on polymorphism frequency in a region [20]. Further, areas of condensed chromatin have been suggested to have the highest level of background mutation [21]. Together this suggests that SMRT is under less selective pressure than NCoR and/or is in a region of the genome with a higher mutation rate (possibly in an area of condensed chromatin).

Table 1
Gene    Total SNPs                      Novel SNPs   Novel nonsynonymous   Novel MAF>5%
SMRT    61          43      17      16  32           6                     7
NCoR    33          25      10      1   30           9                     0
SRC-3   18          11      8       5   10           3                     1
SRC-1   8           7       1       3   2            0                     0
Total   120         86      36      25  74           18                    8

Genotyping for Association with Breast Cancer Risk
We genotyped a case-control study of female index patients of BRCA1/BRCA2 mutation negative breast cancer families for two SNPs in SRC-3 which were previously shown to have a protective effect for breast cancer [15] (rs2230782 and rs2076546). Additionally, we genotyped other coregulator SNPs that we reasoned might have functional consequences based on the severity of the amino acid change and proximity to functional domains [rs1804645 (SRC-1), rs6094752 (SRC-3), and rs2229840 & rs7978237 (SMRT)] (positions are highlighted in Figure 1). For example, rs1804645 (SRC-1 P1272S) was chosen since it is the only non-synonymous SNP in SRC-1, is located in the second activation domain, and is predicted to be 'probably damaging' by a polymorphism phenotype prediction tool (PolyPhen, http://genetics.bwh.harvard.edu/pph/). Rs6094752 (SRC-3 R218C) was chosen because of the loss of charge and size as a result of the amino acid substitution, and because it is one of the most common non-synonymous SNPs in SRC-3. The SNPs in SMRT, rs2229840 (A1706T) and rs7978237 (G781E), were chosen for genotyping due to high frequency, severity of amino acid change, and location in a functional domain. Several approaches to design TaqMan assays for rs7978237 failed. We were therefore unable to obtain genotyping information for this SNP.
The genotyping results were in Hardy-Weinberg equilibrium in controls for all SNPs investigated (p = 0.309 for rs1804645; p = 0.112 for rs6094752; p = 0.058 for rs2230782; p = 0.067 for rs2076546; p = 0.140 for rs2229840). The three SNPs that we reasoned might have functional consequences and that we were able to genotype, namely SRC-1 P1272S (rs1804645), SRC-3 R218C (rs6094752), and SMRT A1706T (rs2229840), did not significantly associate with breast cancer risk (Table 2). Also, stratification for age (≥50 years and <50 years of age) in order to investigate a possible risk influence in pre- or postmenopausal women revealed no significant associations except for rs6094752, where a significant effect could be detected for heterozygous carriers only (Table 3). However, this is most likely a chance effect due to multiple testing. Stratification by bilateral cases revealed no significant associations (Table 4). We observed a protective effect for homozygous C-allele carriers of SRC-3 Q586H rs2230782 (GG+GC versus CC: OR = 0.45, 95%CI = 0.21-0.98, p = 0.041, Table 2), similar to the findings that have been reported before (GG+GC versus CC: OR = 0.39, 95%CI = 0.14-1.05, p = 0.061) [15]. As our study included a portion of the samples of the previously reported study, it is noteworthy that the results of the current study excluding the previously analyzed samples show the same protective effect and borderline significance (GG+GC versus CC: OR = 0.37, 95%CI = 0.13-1.08, p = 0.059). However, we failed to replicate previous associations between the SRC-3 rs2076546 (T960T) SNP and breast cancer risk. The haplotype analysis of the variants analysed in SRC-3 revealed a protective haplotype including the C-C-G-alleles of R218C, Q586H and T960T, respectively (Table 5).
As the haplotype is very rare, occurring with a frequency of 0.03 in controls, this result has to be verified in further multi-center collaboration studies.
The discordant findings between our studies and the Haiman study [16] with respect to SRC-3 Q586H may be due to the inherent differences in the populations examined. For example, our studies exclusively examined Europeans while the study by Haiman et al. examined a range of ethnic backgrounds. A number of recent studies suggest that a SNP association could be specific to the genetic background of a certain ethnic group [22,23]. It is possible that the Q586H effect is only seen in European populations, and/or that the lower number of unselected European cases within the Haiman study had insufficient power to detect this effect. The selection of high risk BRCA1/BRCA2 mutation negative cases in our study is expected to act as a multiplier to further increase our power to detect associations. Lastly, since only nonsynonymous SNPs were genotyped in the Haiman study, the stronger effect seen in the two-SNP SRC-3 haplotype could not be observed. We did not genotype the two SNPs (SMRT H52R and CALCOCO1 R12H) identified in the Haiman study to be associated with breast cancer risk since they were found either exclusively or predominantly in African Americans (European population MAF: SMRT H52R = 0%, CALCOCO1 R12H = 0.6%). Since our study exclusively contains Europeans, it was unlikely that we would obtain sufficient power to detect an association.

Conclusions
In summary, these results illustrate the dramatic differences in polymorphism frequency that can be seen amongst closely related genes. Further, the fact that so many novel SNPs were identified through our sequencing effort, even common SNPs with MAF>5%, illustrates the huge amount of genetic diversity that has yet to be discovered. Finally, the strengthening of the association between the SRC-3 Q586H SNP and decreased breast cancer risk, and the identification of a rare haplotype within SRC-3 associated with decreased risk, suggest that this information could be used to help identify a subgroup of high-risk women at a more modest risk. However, this remains to be verified prospectively.