Analysis of functional germline variants in APOBEC3 and driver genes and breast cancer risk in a Moroccan study population

Background: Breast cancer (BC) is the most prevalent cancer in women and a major public health problem in Morocco. Several Moroccan studies have investigated this disease, but more are needed, especially at the genetic and molecular levels. We therefore investigated the potential association between BC risk and several functional germline variants in genes commonly mutated in sporadic breast cancer. Methods: In this case-control study, we examined 36 single nucleotide polymorphisms (SNPs) in 13 genes (APOBEC3A, APOBEC3B, ARID1B, ATR, MAP3K1, MLL2, MLL3, NCOR1, RUNX1, SF3B1, SMAD4, TBX3, TTN) that were located in the core promoter, 5'-UTR or 3'-UTR, or were nonsynonymous, to assess their potential association with inherited predisposition to breast cancer. Additionally, we screened for a ~29.5-kb deletion polymorphism between APOBEC3A and APOBEC3B and explored its possible association with BC. A total of 226 Moroccan breast cancer cases and 200 matched healthy controls were included in this study. Results: The analysis showed that 12 SNPs in 8 driver genes, 4 SNPs in APOBEC3B and 1 SNP in APOBEC3A were associated with BC risk and/or clinical outcome at the P ≤ 0.05 level. RUNX1_rs8130963 (odds ratio (OR) = 2.25; 95 % CI 1.42-3.56; P = 0.0005; dominant model), TBX3_rs8853 (OR = 2.04; 95 % CI 1.38-3.01; P = 0.0003; dominant model), TBX3_rs1061651 (OR = 2.14; 95 % CI 1.43-3.18; P = 0.0002; dominant model) and TTN_rs12465459 (OR = 2.02; 95 % CI 1.33-3.07; P = 0.0009; dominant model) were the SNPs most significantly associated with BC risk. Strong associations with clinical outcome were detected for SMAD4_rs3819122 with tumor size (OR = 0.45; 95 % CI 0.25-0.82; P = 0.009) and TTN_rs2244492 with estrogen receptor status (OR = 0.45; 95 % CI 0.25-0.82; P = 0.009). Conclusion: Our results suggest that genetic variants in driver and APOBEC3 genes are associated with BC risk and may have an impact on clinical outcome.
However, the reported association between the deletion polymorphism and BC risk was not confirmed in the Moroccan population. These preliminary findings require replication in larger studies.


Background
Breast cancer (BC) is one of the most frequent malignant diseases and a leading cause of cancer death in women worldwide. In 2012, approximately 522,000 women died of BC and 1.67 million new cases were diagnosed worldwide [1,2].
The vast majority of sporadic and familial breast cancer cases arise from the lifelong accumulation of genetic alterations in breast tissue. Recent genome-wide association studies (GWASs) of common single nucleotide polymorphisms (SNPs) have identified more than 70 genetic susceptibility loci for breast cancer. Partial and full tumor genome sequences have revealed hundreds to thousands of mutations in most cancers [26-32]. Moreover, genome sequencing has revealed that many cancers, including breast cancer, have somatic mutation spectra dominated by C-to-T transitions [27-32]. The International Cancer Genome Consortium (ICGC) was recently launched to identify these somatic mutations and, consequently, the genes required for human cancer development [29,33]. Approximately 10 % of these mutations are driver mutations, which initiate the carcinogenic process [34].
Additionally, recent studies have shown that copy number variations (CNVs), another type of genetic variation, occur frequently in the genome and account for more nucleotide sequence variation than SNPs [35]. CNVs cover roughly 12 % of human genomic DNA, and individual CNVs range from about 1 kb to several megabases in size [36]. In a CNV GWAS, Long et al. [37] recently discovered a common CNV locus for breast cancer in Chinese women: a deletion located between exon 5 of APOBEC3A and exon 8 of APOBEC3B that produces a fusion gene encoding a protein identical to APOBEC3A but carrying the 3'-UTR of APOBEC3B. This deletion has been associated with increased BC risk in both Chinese and Caucasian populations, in which its frequency is around 37 % and 6 %, respectively [37-39]. In addition to decreasing APOBEC3B expression, the deletion may alter APOBEC3A mRNA stability.
Given the potential role of driver and APOBEC3 genes in breast tumorigenesis, germline variants and CNVs in these genes could influence BC risk. We therefore conducted this case-control study in a sample of Moroccan women.

Study population
The present case-control study included 226 cases recruited from the Department of Oncology of the Littoral Clinic of Casablanca during 2013. The control group comprised 200 healthy women with no personal history of cancer, selected from DNA bank volunteers of the Genetics and Molecular Pathology Laboratory. Clinicopathological parameters, including age at diagnosis, menopausal status, histological type, tumor size, Scarff-Bloom-Richardson (SBR) grade, lymph node status and hormone receptor status, were obtained from the patients' medical records. The study protocol was approved by the Ethics Committee for Biomedical Research in Casablanca (CERBC) of the Faculty of Medicine and Pharmacy, and written informed consent was obtained from each subject.

Gene/SNP selection
Regarding driver genes, we focused on genes reported to carry BC driver mutations in at least two of the following studies [32,40-42]. Well-known, intensively studied genes such as BRCA1 and PTEN were excluded from this study. A total of 36 SNPs across 11 driver genes (ARID1B, ATR, MAP3K1, MLL2, MLL3, NCOR1, RUNX1, SF3B1, SMAD4, TBX3, TTN) and 2 genes of the APOBEC3 family (APOBEC3A, APOBEC3B) were selected based on data obtained from the Ensembl Genome Browser (http://www.ensembl.org/index.html) for the CEU population (Utah residents with Northern and Western European ancestry from the CEPH collection). SNPs were selected according to the following criteria: (1) minor allele frequency (MAF) above 10 %; (2) location within the coding region (nonsynonymous SNPs), core promoter regions, or 5'- and 3'-untranslated regions (UTRs); and (3) selection with Haploview on the basis of linkage disequilibrium (LD; r² ≥ 0.80) to minimize the number of SNPs to be genotyped. RegulomeDB (http://www.regulomedb.org/) was used to explore the potential function of the associated SNPs.
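Criterion (3) relies on pairwise LD. The minimal Python sketch below computes r² from haplotype and allele frequencies, r² = D² / (p_A(1 - p_A) p_B(1 - p_B)) with D = p_AB - p_A p_B, and then greedily picks tag SNPs so that every candidate is captured by some tag at r² ≥ 0.80. It assumes haplotype frequencies have already been estimated (for example from phased reference data); the greedy procedure and all function names are hypothetical simplifications, not Haploview's own Tagger implementation.

```python
"""Sketch: pairwise LD (r^2) and greedy tag-SNP selection at r^2 >= 0.80.

Hypothetical helpers for illustration; not Haploview's implementation.
"""


def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Return r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denom


def covers(tag: str, snp: str, r2: dict, threshold: float) -> bool:
    """True if `tag` is `snp` itself or is in LD with it at r^2 >= threshold."""
    return tag == snp or r2.get(frozenset((tag, snp)), 0.0) >= threshold


def greedy_tag_snps(snps: list, r2: dict, threshold: float = 0.80) -> list:
    """Greedily pick tags until every SNP is captured by at least one tag."""
    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs
        # (ties go to the earliest SNP in the input list).
        best = max(snps, key=lambda s: sum(covers(s, t, r2, threshold) for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if covers(best, t, r2, threshold)}
    return tags


if __name__ == "__main__":
    # Toy pairwise r^2 values between three hypothetical SNPs.
    pairwise = {frozenset(("snp1", "snp2")): 0.90, frozenset(("snp2", "snp3")): 0.50}
    print(greedy_tag_snps(["snp1", "snp2", "snp3"], pairwise))  # ['snp1', 'snp3']
    print(ld_r2(p_ab=0.5, p_a=0.5, p_b=0.5))  # 1.0 (complete LD)
```

Here snp2 is dropped because it is already captured by snp1 (r² = 0.90), whereas snp3 (r² = 0.50 with snp2) must be genotyped itself.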

Genotyping
Genomic DNA was extracted from peripheral blood leukocytes using the salting-out procedure [31], dissolved in TE buffer (10 mM Tris-HCl and 0.1 mM EDTA, pH 8.0), and quantified with a NanoVue Plus spectrophotometer.
Genotyping was performed in a 384-well plate format using TaqMan SNP Genotyping Assays from Life Technologies (Darmstadt, Germany) or the KASPar SNP Genotyping System from KBioscience (Hoddesdon, Great Britain). The master mix for the KASPar assay was prepared according to KBioscience's conditions and products, whereas 5× HOT FIREPol Probe qPCR Mix Plus from Solis BioDyne (Tartu, Estonia) was used for the TaqMan assays. Polymerase chain reactions (PCRs) were performed in a final volume of 5 μl per well, and the PCR products were analyzed on a ViiA7 Real-Time PCR System from Applied Biosystems (Weiterstadt, Germany).

Screening for APOBEC3 deletion
Polymerase chain reaction (PCR) was carried out to amplify the APOBEC3 locus in a final volume of 10 μl containing 10× reaction buffer, 50 mM MgCl2, 10 mM dNTPs, 10 μM primers, 5 U Taq DNA polymerase and 10 ng of genomic DNA. Amplification consisted of 40 cycles of denaturation at 95°C for 1 min, annealing at 60°C for 1 min and extension at 72°C for 1 min.

Statistical analysis
Hardy-Weinberg equilibrium (HWE) was tested in both cases and controls by comparing observed and expected genotype frequencies with the χ² test. Odds ratios (ORs) with 95 % confidence intervals (CIs) were calculated by multiple logistic regression (PROC LOGISTIC, SAS version 9.2; SAS Institute, Cary, NC) to assess the strength of the association between genotypes and breast cancer risk. A P value ≤ 0.05 was considered statistically significant.
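As an illustration of these two tests, here is a minimal Python sketch of an HWE χ² test (1 degree of freedom) and a crude dominant-model odds ratio with a Wald 95 % CI. It is not the SAS PROC LOGISTIC analysis used in this study: the function names and genotype counts are hypothetical, and the sketch covers only the unadjusted case, in which a logistic regression with a single binary (carrier vs. non-carrier) predictor yields the same OR and Wald CI as the 2×2 table formulas used here. Covariate-adjusted (multiple) models will give different estimates.

```python
"""Sketch: HWE chi-square test and dominant-model odds ratio with a 95 % CI.

Genotype counts are ordered (major homozygote, heterozygote, minor homozygote).
Hypothetical helpers; unadjusted analysis only.
"""
import math


def hwe_chi2(n_hom_major: int, n_het: int, n_hom_minor: int) -> tuple:
    """Return (chi-square statistic, P value) for HWE with 1 degree of freedom."""
    n = n_hom_major + n_het + n_hom_minor
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_hom_major + n_het) / (2 * n)  # major allele frequency
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("SNP is monomorphic; expected counts would be zero")
    observed = (n_hom_major, n_het, n_hom_minor)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def dominant_or(cases: tuple, controls: tuple, z: float = 1.959964) -> tuple:
    """Odds ratio and Wald 95 % CI: minor-allele carriers vs. major homozygotes."""
    a = cases[1] + cases[2]        # cases carrying >= 1 minor allele
    b = cases[0]                   # cases homozygous for the major allele
    c = controls[1] + controls[2]  # controls carrying >= 1 minor allele
    d = controls[0]                # controls homozygous for the major allele
    if min(a, b, c, d) == 0:
        raise ValueError("empty cell; a continuity correction would be needed")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Hypothetical genotype counts, not data from this study.
    print(hwe_chi2(25, 50, 25))  # (0.0, 1.0)
    odds_ratio, lower, upper = dominant_or(cases=(20, 40, 40), controls=(40, 40, 20))
    print(f"OR = {odds_ratio:.2f} (95 % CI {lower:.2f}-{upper:.2f})")
```

The dominant model pools heterozygotes and minor-allele homozygotes into one "carrier" group, which is why only a single 2×2 table is needed.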

In silico prediction
To investigate how the SNPs might influence gene expression, protein binding sites, chromatin structure, and promoter and enhancer strength, we used HaploReg (http://www.broadinstitute.org/mammals/haploreg/haploreg.php). To identify possible effects on histone modification, we used RegulomeDB (http://regulome.stanford.edu/). These effects were evaluated using data from MCF7 (Michigan Cancer Foundation-7 breast cancer cell line), T-47D (epithelial cell line derived from a mammary ductal carcinoma), HMEC (human mammary epithelial cells) or MCF10A-Er-Src (breast epithelial cell line expressing an estrogen receptor-Src fusion) cells. SIFT and PolyPhen predictions were used to determine the possible effects of amino acid substitutions on protein function and structure (Ensembl release 75, http://www.ensembl.org/index.html). The impact of all significant SNPs located in 3'-UTRs on microRNA binding was predicted using MicroSNiPer (http://epicenter.ie-freiburg.mpg.de/services/microsniper/).

Results
The baseline characteristics of the study population are listed in Table 1. In total, 226 BC cases and 200 controls were successfully genotyped for 36 selected SNPs in 13 candidate genes. Altogether, 12 SNPs in 8 driver genes, 4 SNPs in APOBEC3B and 1 SNP in APOBEC3A were associated with BC risk and/or clinical outcome at the P ≤ 0.05 level (Tables 2 and 3).
The most significant associations with BC risk were observed for RUNX1_rs8130963 (OR = 2.25; 95 % CI 1.42-3.56; P = 0.0005), TBX3_rs8853 (OR = 2.04; 95 % CI 1.38-3.01; P = 0.0003), TBX3_rs1061651 (OR = 2.14; 95 % CI 1.43-3.18; P = 0.0002) and TTN_rs12465459 (OR = 2.02; 95 % CI 1.33-3.07; P = 0.0009), all under the dominant model (Tables 2 and 3). An increased risk was observed for homozygous carriers of the minor allele of rs178831 in NCOR1 (OR 2.22, 95 % CI 1.00-4.95; Table 2), but no association with clinical tumor characteristics was observed. Two of the six genotyped SNPs in TTN were associated with less aggressive tumor features: rs12463674 with low histological grade and rs2244492 with low hormone receptor status (Table 3). Additionally, minor allele carriers of rs6001376 in APOBEC3B and rs832583 in MAP3K1 had an increased risk of BC (OR 2.15, 95 % CI 1.16-4.00 and OR 3.37, 95 % CI 1.20-9.47, respectively; Table 2). Three additional SNPs in APOBEC3B were associated with clinicopathological features, namely large tumor size and hormone receptor status (Table 3). An increased risk was observed for rs12456284 in SMAD4 (OR 2.04, 95 % CI 1.32-3.15); this SNP was also associated with histological grade.

No correlation was observed between the APOBEC3 deletion and the clinicopathological parameters of breast cancer, namely hormone receptor status, tumor size, histological grade, lymph node status and distant metastases (Table 4). In addition, no statistically significant association was observed between the APOBEC3 deletion and breast cancer risk (Table 5).

Discussion
In this population-based case-control study, we investigated for the first time the influence of germline variants and CNVs in potential driver genes and APOBEC3 genes on breast cancer susceptibility in a North African population.
The APOBEC3 gene family, including APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3E, APOBEC3F, APOBEC3G and APOBEC3H, plays pivotal roles in intracellular defense against viral infections [43]. The APOBEC3 genes encode cytosine deaminases that have been implicated in innate immune responses by restricting retroviruses and mobile genetic elements such as retrotransposons and endogenous retroviruses [44]. Furthermore, the APOBEC3 genes may play a role in carcinogenesis by inducing DNA mutations through deoxycytidine (dC) deamination [45]. Moreover, expression of the APOBEC3 genes is regulated by estrogen [46], a hormone that plays a central role in the etiology of breast cancer. Burns et al. recently provided evidence that APOBEC3B is overexpressed in breast tumors and cell lines and that the APOBEC3B mutation signature is more prevalent than expected in breast tumor data from The Cancer Genome Atlas (TCGA) [47]. Interestingly, the APOBEC3B mutation signature was detectable in colorectal and prostate cancers only when whole-genome, but not whole-exome, data were used, suggesting a tissue-specific bias against APOBEC3B-induced mutations in coding regions. Burns et al. and Roberts et al. both concluded that the APOBEC3B mutation signature is specifically enriched in six cancer types: cervical, bladder, lung (adenocarcinoma and squamous cell carcinoma), head and neck, and breast cancers [47,48].
The APOBEC3 deletion spans 29.5 kb between exon 5 of APOBEC3A and exon 8 of APOBEC3B and removes the entire coding region of APOBEC3B. This deletion is associated with decreased APOBEC3B expression in breast cancer cells [46]. Somatic deletion of this 29.5-kb region has also been observed in breast and oral tumor tissue [39,46]. In the present study, we found no significant association between the APOBEC3 deletion polymorphism and breast cancer risk (Table 5). This result agrees with a Japanese case-control study of 50 cases and 50 controls that reported a nonsignificant increase in breast cancer risk associated with homozygous deletion of this region (OR = 3.91, 95 % CI = 0.77 to 19.83) [49]. Nevertheless, other studies support an important role for this CNV in breast cancer and implicate the APOBEC3 deletion as a novel susceptibility factor for breast cancer [37,39].

In addition, our genetic data point to the possible involvement of variants within the studied genes NCOR1, RUNX1, SMAD4, TBX3, TTN, ATR, ARID1B and MAP3K1. The most significant associations with breast cancer risk were found for RUNX1_rs8130963, RUNX1_rs17227210, TBX3_rs8853, TBX3_rs1061651, TBX3_rs2242442, TTN_rs12463674 and ATR_rs2227928. The remaining driver genes did not show an important role in breast cancer risk.
RUNX1 (runt-related transcription factor 1), also known as AML1 (acute myeloid leukemia 1), is a tumor suppressor gene with a length of 1,196,949 bp that was originally identified in acute myeloid leukemia (AML). Several studies have suggested that RUNX1 is highly expressed in breast epithelial cells and frequently mutated in breast cancer [50]. Downregulation of RUNX1 is part of a 17-gene signature that has been proposed to predict breast cancer metastasis [51]. In the present study, 2 of the 3 genotyped SNPs (rs8130963 and rs17227210) were associated with breast cancer risk. rs8130963 shows strong genetic differentiation between European and African populations (Fst = 0.346), which is suggestive of positive selection. Interestingly, rs17227231, which is in LD with rs17227210 (r² = 0.92), could alter the protein binding of GATA3 (GATA binding protein 3) as well as a GATA transcription factor binding site. GATA3 has already been classified as a high-confidence driver gene in breast cancer [52]. In addition, rs17227210 may affect splicing: its C allele does not bind SF2/ASF, a member of the serine/arginine-rich protein family that is involved in alternative mRNA splicing and has been found to be upregulated in diverse tumors [49].
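The Fst value cited for rs8130963 measures how strongly allele frequencies differ between populations. The minimal sketch below assumes the simple heterozygosity-based definition Fst = (H_T - H_S) / H_T for one biallelic SNP in two equally weighted populations. The estimator behind the reported values is not stated in this study, and estimators such as Weir and Cockerham's give somewhat different numbers; the function name and allele frequencies are hypothetical.

```python
"""Sketch: heterozygosity-based Fst for one biallelic SNP in two populations.

Fst = (H_T - H_S) / H_T, where H_S is the mean within-population expected
heterozygosity and H_T is the expected heterozygosity at the pooled allele
frequency. Equal population weights are assumed; this is not necessarily the
estimator used for the values reported in the text.
"""


def fst_two_pops(p1: float, p2: float) -> float:
    """Return Fst for allele frequencies p1 and p2 in two populations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # total expected heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    if h_t == 0:
        raise ValueError("Fst is undefined when the SNP is monomorphic in both populations")
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical allele frequencies, not values from this study.
    print(round(fst_two_pops(0.1, 0.9), 3))  # 0.64 (strong differentiation)
    print(fst_two_pops(0.3, 0.3))            # 0.0 (identical frequencies)
```

A high Fst is consistent with local positive selection but does not prove it, since genetic drift can also produce strong differentiation between populations.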
The T-box transcription factor 3 gene (TBX3; 13,910 bp) is expressed in mammary tissue and plays a context-dependent role in both mammary gland development and mammary tumorigenesis [53]. In addition, TBX3 is overexpressed in a number of breast cancer cell lines [54] and could serve as a biomarker [55]. Our results reveal that one of the genotyped SNPs in TBX3 was associated with both breast cancer risk and clinical outcome. rs8853 appears to affect a binding site for STAT (signal transducer and activator of transcription) transcription factors. TBX3 expression could also be influenced by rs8853 through its effect on miR-3189 binding. However, an association between miR-3189 and breast cancer has not been established. Furthermore, Douglas and Papaioannou observed TBX3 overexpression in estrogen receptor-positive breast cancer cell lines [53], whereas other publications report that TBX3 overexpression results in a pool of estrogen receptor-negative cancer stem-like cells [56].

TTN (titin, also known as connectin) encodes the largest polypeptide in the human genome [57] and has been intensively studied as a component of the muscle contractile machinery [27]. However, TTN is expressed in many cell types and has other functions compatible with a role in oncogenesis [58-60]. The role of TTN as a cancer gene is currently a mathematically based prediction and will require direct biological evaluation. In the present study, 2 of the 6 genotyped SNPs were significantly associated with increased risk and 4 of the 6 with clinical outcome. Moreover, more than 50 % of the statistically significant SNPs were associated with negative estrogen or progesterone receptor status. A possible link between hormones and calcium, which plays a major role in the muscle contractile machinery where titin is located, is the estrogen signaling pathway, of which the calcium signaling pathway is a part. Furthermore, calcium signaling pathways have been linked to breast cancer [61,62].
ATR (ataxia telangiectasia and Rad3-related), an essential regulator of genomic integrity, controls and coordinates DNA replication origin firing, replication fork stability, cell cycle checkpoints and DNA repair [63]. Smith et al. showed that overexpression of ATR phenocopies the effects of isochromosome 3q, i(3q). Genetic alteration of ATR leads to loss of differentiation as well as cell cycle abnormalities [64]. ATR has therefore been studied as a target for cancer therapy [65]. However, inhibitors such as caffeine have proven weak and nonspecific [66]. In the present study, we genotyped and analyzed rs2227928, which is predicted to be tolerated according to Ensembl [67]. rs2227928 may be associated with tumor size > 2 cm and negative estrogen or progesterone receptor status. Its association with breast cancer has frequently been studied in different populations, but no significant differences were found [68,69]. These conflicting results on the relationship between rs2227928 and breast cancer could be related to factors such as sample size and environmental exposures rather than genetic background: all three populations have European ancestry and fall under the phylogenetic definition "Caucasian". Larger samples from the French and Finnish populations might therefore reveal an association between rs2227928 and breast cancer. Several SNPs in LD with rs2227928 (r² between 0.85 and 0.97) are located in PLS1 (plastin 1), which encodes an actin-binding protein found at high levels in the small intestine [70]. However, an association of PLS1 with breast cancer has not been established. Regarding signatures of selection, rs2227928 shows significant differentiation between European and African populations (Fst = 0.076).

This study has some limitations. Because of our small sample size, statistical power to perform interaction analyses between different SNPs and breast cancer risk is limited. In addition, because no data on SNP frequencies were available for any North African population, we used CEU data in our selection process. As also shown by our genotyping, the genetic constitution of the Moroccan population is very similar to that of the CEU reference population, having been influenced by both European and sub-Saharan gene flow. However, we may have missed some SNPs private to North African populations. There may also be rare SNPs with low minor allele frequency, or SNPs with still-unknown regulatory properties, that were not covered by our study.

Conclusion
Our preliminary genetic analysis suggests a potential role of germline variants in driver and APOBEC3 genes in breast cancer susceptibility. These variants may have an impact on clinical outcome and/or BC risk. We also observed strong associations between polymorphisms in the RUNX1, TBX3, TTN and ATR genes and the risk of BC. However, further studies are needed to verify these associations with breast cancer risk and to clarify the influence of these polymorphisms.