Mutation analysis and characterization of ATR sequence variants in breast cancer cases from high-risk French Canadian breast/ovarian cancer families

Background: Ataxia telangiectasia-mutated and Rad3-related (ATR) is a member of the PIK-related family which plays, along with ATM, a central role in cell-cycle regulation. ATR has been shown to phosphorylate several tumor suppressors such as BRCA1, CHEK1 and TP53. ATR therefore appears to be a good candidate breast cancer susceptibility gene, and the current study was designed to screen for ATR germline mutations potentially involved in breast cancer predisposition. Methods: ATR direct sequencing was performed using a fluorescent method, while widely available programs were used for linkage disequilibrium (LD), haplotype analyses and tagging SNP (tSNP) identification. Expression analyses were carried out using real-time PCR. Results: The complete sequence of all exons and flanking intronic sequences was analyzed in DNA samples from 54 individuals affected with breast cancer from non-BRCA1/2 high-risk French Canadian breast/ovarian cancer families. Although no germline mutation was identified in the coding region, we identified 41 sequence variants, including 16 coding variants, 3 of which are not reported in public databases. SNP haplotypes were established and tSNPs were identified in 73 healthy unrelated French Canadians, providing a valuable tool for further association studies of the ATR gene in large cohorts. Our analyses led to the identification of two novel alternative splice transcripts. In contrast to the transcript generated by an alternative splice site in intron 41, the one resulting from a deletion of 121 nucleotides in exon 33 is widely expressed, at significant but relatively low levels, in both normal and tumoral cells, including normal breast and ovarian tissue. Conclusion: Although no deleterious mutations were identified in the ATR gene, the current study provides a haplotype analysis of ATR gene polymorphisms, which allowed the identification of a set of SNPs that could be used as tSNPs for large-scale association studies.
In addition, our study led to the characterization of a novel Δ33 splice form, which could generate a putative truncated protein lacking several functional domains. Additional studies in large cohorts and other populations will be needed to further evaluate whether common and/or rare ATR sequence variants are associated with a modest or intermediate breast cancer risk.

Background
All common cancers show some degree of familial clustering [1]. Familial aggregation, especially in breast cancer [2], results predominantly from inherited susceptibility [3]. Linkage studies in the 1990s led to the discovery of several predisposition genes associated with many rare familial cancer syndromes, thus providing fundamental insights into various pathways of carcinogenesis [4]. Nevertheless, this approach has mainly been limited to genes with relatively rare, highly penetrant alleles, for several reasons, including a lack of power to detect the alleles conferring modest or moderate risks that are believed to be involved in common cancers [1,5-7]. Analyses of the risk attributable to alleles in the known breast cancer susceptibility genes (e.g. BRCA1, BRCA2, TP53, PTEN, ATM) suggest that they are responsible for ~25% of the familial component of breast cancer risk [6,8,9]. The number and properties of the genetic variants that account for the remaining 75% of inherited risk are largely unknown. It has been proposed that a complex polygenic model is the best explanation for this missing genetic risk [10,11], and that perhaps the majority of breast cancers arise in a susceptible minority of women [2,12].
Under the Common Variant/Common Disease (CV/CD) model, disease susceptibility is suggested to result from the joint action of several common variants, with unrelated affected individuals sharing a substantial proportion of disease alleles [13-15]. The alternative is the heterogeneity hypothesis, which maintains that genetic susceptibility to common disease is caused by many different rare genetic variants, each allele producing a relatively large effect [16-19]. If most cancer susceptibility is related to fundamental processes of cellular control, rare alleles might turn out to be the more important component and should be detectable by linkage analysis and/or the candidate gene re-sequencing approach [5,6].
The central role of the BRCA1 and BRCA2 genes in DNA repair, recombination, cell cycle control and transcription [20,21] has led to investigations of several similarly acting genes in breast and/or ovarian cancer predisposition, including ATM (Ataxia telangiectasia-mutated) [22-27], CHEK2 [28,29], TP53 [30], PTEN [31], STK11 [32] and a few other genes involved in DNA repair [33]. Ataxia-telangiectasia-mutated and Rad3-related (ATR) is a member of the phosphatidylinositol kinase (PIK)-related family which plays, along with ATM, a central role in cell-cycle regulation by transducing DNA damage signals to downstream effectors of cell-cycle progression [34]. In response to double-strand breaks, stalled replication forks or DNA adducts, ATR complexed with ATR-interacting protein (ATRIP) is recruited and phosphorylates a number of proteins involved in the DNA damage response, including H2AX, 53BP, TP53, NBS1 and CHEK1 [35-38], thereby activating cell-cycle checkpoints, DNA repair or apoptosis. ATR is also able to bind Rad17 and BRCA1 and to associate with components of the nucleosome remodeling and deacetylating complex [39-41]. Furthermore, ATR has recently been shown to interact with the Fanconi anemia complex [42], which a growing body of evidence links to the two BRCA genes [21; for review see 43]. A recent study has also demonstrated that the Mre11/Rad50/NBS1 (MRN) complex, a central component of the cellular response to ionizing radiation and other causes of double-strand breaks, is required for the ATR-dependent phosphorylation of Smc1 (structural maintenance of chromosomes 1) [44]. ATR knockout studies showed that ATR is essential for somatic cell growth and genomic integrity in the embryo, and that its deletion leads to genomic disruption and early embryonic lethality in mice [45,46].
Moreover, it has been reported that disruption of the ATR gene leads to an increase in the incidence of large benign tumors in heterozygotes, possibly indicating that deficiency in ATR affects the rate of tumor initiation [45].
Based on the major role of ATR in the cellular response to DNA damage and its multiple interactions with proteins such as BRCA1 [40,47], ATR represents an attractive candidate gene that could explain a fraction of the remaining breast cancer susceptibility. The current study was designed to assess the possible involvement of ATR germline mutations in breast cancer susceptibility. For this purpose, the complete sequence of the 47 exons and flanking intronic sequences of the ATR gene was analyzed in DNA samples from individuals affected with breast cancer from unrelated BRCA1- and BRCA2-negative high-risk French Canadian breast/ovarian cancer families.

Ascertainment of families and DNA extraction
The recruitment of high-risk French Canadian breast and/or ovarian cancer families started in 1996 through a research project, which subsequently evolved into a large ongoing interdisciplinary research program designated INHERIT BRCAs. More details regarding ascertainment criteria, experimental and clinical procedures, as well as the INHERIT BRCAs research program, have been described elsewhere [48-52]. A major component was to identify and characterize BRCA1 and BRCA2 mutations in French Canadian high-risk families (CGL cohort) [52].
Subsequently, another component was designed for the "Localization and identification of new breast cancer susceptibility loci/genes". Ethics approval for this latter study was also obtained from the different institutions participating in this research project, and each participant, aware of their inconclusive BRCA1/2 test result, had to sign a specific informed consent for participation in this component. A subset of 54 high-risk French Canadian breast/ovarian cancer families was recruited in the present study according to the following ascertainment criteria: 1) three or more breast cancer cases diagnosed before the age of 65 (48 families); 2) two or more breast cancer cases (<65) if one breast cancer was diagnosed before 45 years (5 families); or 3) a strong family history of breast/ovarian cancer (e.g. daughter-mother-grandmother) (1 family). All participants had to be at least 18 years of age and mentally capable. The diagnoses of breast and/or ovarian cancer were confirmed by obtaining a pathology report, and when two or more subjects were available within a family, the youngest subject was systematically chosen for this study. The mean age at diagnosis of these 54 subjects affected with breast cancer was 45.5 years (range 30-59 years); 46 of them had been diagnosed before 50 years of age and 11 had been affected by more than one breast cancer. The analysis of the breast cancer history revealed that 15 (28%), 18 (33%) and 19 (35%) families included 1-2, 3 or ≥ 4 case(s), respectively, in relatives up to the second degree. When including all breast cancer cases in the family history occurring in relatives up to the third degree of the index case, 10 (18%), 16 (30%) and 28 (52%) families had 1-2, 3 or ≥ 4 case(s), respectively.
The BRCA1/2 status of each participant was previously assessed [52]. Briefly, genomic DNA samples were first tested for a panel of 29 mutations, including 26 truncating mutations and 3 unclassified variants (two missense mutations and one in-frame deletion), observed and/or reported in the French Canadian population [52]. Thereafter, DNA samples of individuals included in this study were sent to Myriad Genetic Laboratories (Salt Lake City, Utah, USA) for full-length BRCA1/2 sequencing (Comprehensive BRACAnalysis® BRCA1 and BRCA2 gene sequence analysis for susceptibility to breast and ovarian cancer), with the exception of 9 subjects for whom DNA samples from another affected individual of the family (n = 7) or from unaffected parents of cancer cases (n = 2) were sent to Myriad, as previously described [49,50,52]. The absence of genomic rearrangements in the BRCA1/2 genes was subsequently investigated by Multiplex Ligation-dependent Probe Amplification (MLPA) for 45 of the 54 subjects and by BRCA1/2 Southern analysis for 32 of the 54 individuals. For seven of the remaining subjects, MLPA was performed on another individual of the family [53], while for two subjects this analysis was not performed.
Genomic DNA from 73 healthy unrelated French Canadian women was obtained from Dr Damian Labuda at the Centre de cancérologie Charles Bruneau, Hôpital Ste-Justine, Montreal, Canada. The individuals who provided these samples were recruited on a non-nominative basis, in the framework of long-term studies aimed at characterizing the genetic variability of human populations, approved by the Institutional Ethics Review Board. DNA from peripheral blood was isolated by conventional methods, either phenol-chloroform extraction or Gentra kits (Minneapolis, MN, USA). The mean age of these individuals was 45.2 years; 2 (2.7%), 26 (35.6%), 23 (31.5%), 17 (23.3%) and 5 (6.8%) of them were 25-29, 30-39, 40-49, 50-59 and 60-69 years of age, respectively.
The validation group comprised 46 BRCA1/2-negative breast cancer proband cases of French origin belonging to multiple-case breast cancer families from the following sources: high-risk breast cancer only and breast/ovarian cancer families referred for genetic testing at the Department of Preventive Medicine at Creighton University School of Medicine, Omaha, NE, and at the cancer genetic counseling unit at Centre Léon Bérard, Lyon, France, and a population-based study including women diagnosed with breast cancer below age 46 years, recruited through the Rhône region cancer registry, France. The cancer status of index cases was confirmed through pathology reports. Cancers reported in relatives were verified through pathology reports, hospital records and death certificates. Index cases have been screened for mutations in BRCA1 and BRCA2 [54,55]. All subjects provided written informed consent for participation in the study. Approval for the study was obtained from the International Agency for Research on Cancer (IARC) ethics committee.
The mean age at diagnosis of these 46 French subjects affected with breast cancer was 39.7 years (range 19-61 years); 44 of them had been diagnosed before 50 years of age and five had been affected by more than one cancer. The analysis of the breast cancer history revealed that 23 (50%), 9 (20%) and 14 (30%) families included 1-2, 3 or ≥ 4 case(s), respectively, in relatives up to the second degree.

PCR amplification, mutation analysis and variant characterization
The intron-exon boundaries of the ATR gene were determined by aligning GenBank mRNA records (NM_001184) with genomic sequence records (NC_000003). ATR spans approximately 130 kb and is composed of 47 exons (3q22-q24: 143650778-143780349). PCR amplicons, using primers designed with the Primer Express 2.0 software (Applied Biosystems, Foster City, CA, USA), covered the entire coding portion of the mRNA and the flanking intronic sequences from genomic DNA. Forty primer pairs were used to amplify fragments ranging in size from 351 bp to 1385 bp, which were sequenced with the sequencing primers listed in the same table [see Additional file 1]. ATR direct sequencing was performed on an ABI3731 automated sequencer using version 3.1 of the BigDye fluorescent method according to the manufacturer's instructions (Applied Biosystems, Foster City, CA, USA). Sequence data were analyzed using the Staden preGap4 and Gap4 programs.
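A basic requirement of this amplicon design is that every exon, together with some flanking intronic sequence, falls entirely inside at least one amplicon. The minimal sketch below checks that requirement for a set of intervals. The 50 bp flank, the function names and all exon/amplicon coordinates are hypothetical placeholders (the real primers are listed in Additional file 1); only the gene span is computed from the coordinates quoted above.

```python
"""Check that PCR amplicons cover every exon plus a flanking intronic margin.

All exon and amplicon coordinates here are hypothetical toy values, not the
real ATR primer positions.
"""


def merge_intervals(intervals):
    """Merge overlapping or adjacent closed intervals [start, end]."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def uncovered_targets(exons, amplicons, flank=50):
    """Return exons whose [start - flank, end + flank] window is not fully covered."""
    covered = merge_intervals(amplicons)
    missing = []
    for start, end in exons:
        lo, hi = start - flank, end + flank
        if not any(c_lo <= lo and hi <= c_hi for c_lo, c_hi in covered):
            missing.append((start, end))
    return missing


# Gene span from the coordinates quoted in the text (1-based, inclusive).
span_bp = 143780349 - 143650778 + 1

# Hypothetical exons and amplicons on a toy coordinate system.
toy_exons = [(1000, 1120), (2500, 2650), (4000, 4090)]
toy_amplicons = [(900, 1300), (1250, 1500), (2400, 2720), (3990, 4200)]

print(span_bp)                                      # 129572, i.e. roughly 130 kb
print(uncovered_targets(toy_exons, toy_amplicons))  # [(4000, 4090)]
```

In the toy data the last amplicon starts only 10 bp before the third exon, so the 50 bp upstream flank is not covered and that exon is reported.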

LD analysis, haplotype estimation and tagging SNP (tSNP) selection
To estimate the pattern of linkage disequilibrium (LD), all 41 SNPs identified in our breast cancer case series were genotyped. The LDA program [56] was used to calculate pairwise LD for each SNP pair, and Lewontin's |D'| was used as the measure of LD between SNPs [57,58].
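Lewontin's |D'| rescales the disequilibrium coefficient D = p_AB - p_A p_B by the largest value D could take given the two allele frequencies. The sketch below is a minimal illustration of that definition from haplotype and allele frequencies; it is not the LDA program's code, and the function name is hypothetical.

```python
def lewontin_d_prime(p_ab, p_a, p_b):
    """Return (D, |D'|) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:  # a monomorphic locus: D' is undefined
        return d, float("nan")
    return d, abs(d / d_max)


# Complete LD: allele B only ever occurs on the A background.
print(lewontin_d_prime(0.3, 0.3, 0.3))
```

Note that |D'| equals 1 whenever one of the four two-locus haplotypes is absent, so with rare SNPs and modest sample sizes it can reach 1 even between distant markers; this is a known property of the measure worth keeping in mind when rare variants are involved.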
Haplotype analysis was performed using PHASE 2.1.1 software [59,60]. PHASE estimates haplotype frequencies with a Bayesian algorithm and then uses a permutation test to determine the significance of differences in inferred haplotypes between cases and controls. All association tests were run under default conditions, with 1000 permutations. Haplotype frequencies were estimated using the SNPs with a minor allele frequency (MAF) ≥ 5% identified in both sample series (cases and controls). Haplotype blocks were identified with the Haploview software [62,63], using genotyping data from control individuals as well as HapMap data from the CEPH cohort [61]. Tagging SNPs (tSNPs) from each LD block were then identified using the same software. Splice site prediction scores were evaluated using SSPNN [64], while protein alignment was performed using ClustalW [65].
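The logic of a case/control permutation test on inferred haplotypes can be sketched as follows: compute a statistic on the real labels, then recompute it many times after shuffling the case/control labels among individuals. PHASE's own test statistic and permutation scheme may differ; the sum of absolute haplotype frequency differences used here is an assumed, illustrative statistic, and the function names are hypothetical.

```python
import random
from collections import Counter


def freq_distance(haps_a, haps_b):
    """Sum of absolute differences between two haplotype frequency distributions."""
    ca, cb = Counter(haps_a), Counter(haps_b)
    na, nb = len(haps_a), len(haps_b)
    return sum(abs(ca[h] / na - cb[h] / nb) for h in set(ca) | set(cb))


def flatten(people):
    """Turn a list of (haplotype, haplotype) pairs into a flat list of haplotypes."""
    return [h for pair in people for h in pair]


def permutation_test(cases, controls, n_perm=1000, seed=1):
    """Permutation p-value for a case/control difference in haplotype frequencies.

    cases, controls: lists of individuals, each a pair of inferred haplotypes.
    Labels are shuffled per individual, so a person's two haplotypes stay together.
    """
    observed = freq_distance(flatten(cases), flatten(controls))
    pooled = list(cases) + list(controls)
    rng = random.Random(seed)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = freq_distance(flatten(pooled[:n_cases]), flatten(pooled[n_cases:]))
        if stat >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)
```

With 1000 permutations, as used in the study, the smallest p-value this add-one estimator can return is 1/1001.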

RNA isolation from cell lines and normal tissue samples
Total RNA was extracted using TRI Reagent® (Molecular Research Center Inc., Cincinnati, OH, USA) according to the manufacturer's instructions, and RNA samples were then processed as previously described [66]. Reverse transcription of 2.5 µg of standardized RNA samples was then performed using 250 ng of random hexamers and 200 U of SuperScript™ II RNase H- Reverse Transcriptase (Invitrogen, Carlsbad, CA, USA) following the supplier's protocol.

∆33 alternative splice transcript
In order to investigate whether the c.5739-4del9+T variant may lead to alternative splice transcript(s), PCR was performed using the forward primer 5'-GCAGATGGAAAATCTACAACATGGA-3' and the reverse primer 5'-TGATTTCCATATTGTAGAGATCTGCCA-3', designed to amplify a specific ATR cDNA fragment spanning nucleotides 5479 to 6348 of the wild-type mRNA, with cDNA samples from immortalized cell lines of two individuals homozygous and one heterozygous for this variant, as well as two wild-type individuals. PCR product lengths were analyzed by migration on a 1.5% agarose gel, and the products were sequenced in both orientations. Amplified ATR cDNA fragments were then subcloned into the pCRII vector (TA Cloning kit, Invitrogen) according to the manufacturer's instructions. After growing colonies and extracting plasmid DNA using the GFX Micro Plasmid Prep Kit (Amersham), each amplicon was sequenced as described above using the forward amplification primer. The only alternative splice transcript observed, designated ∆33, yielded a 749 bp PCR product, whereas the wild-type fragment was 870 bp long.
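The product sizes reported for this assay follow directly from the primer positions, assuming 1-based inclusive cDNA coordinates. The short check below also shows that the 121-nucleotide deletion is not a multiple of 3, so if the ∆33 transcript were translated the reading frame downstream of the deletion would be shifted; this is consistent with, though not by itself proof of, the putative truncated protein mentioned in the Conclusion. The constant names are illustrative.

```python
# Expected RT-PCR product sizes for the exon 33 region (coordinates from the text,
# assumed 1-based and inclusive).
FWD_START, REV_END = 5479, 6348   # cDNA positions spanned by the primer pair
DELETION = 121                    # nucleotides removed from the 3' end of exon 33

wild_type_len = REV_END - FWD_START + 1
delta33_len = wild_type_len - DELETION

# An in-frame deletion must remove a multiple of 3 nucleotides (121 = 3 * 40 + 1).
frameshift = DELETION % 3 != 0

print(wild_type_len, delta33_len, frameshift)  # 870 749 True
```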

Subcloning and standard curves
cDNA samples prepared from RNA extracted from immortalized cell lines were used to amplify by PCR three fragments: one corresponding to the ATR cDNA region spanning nucleotides 5748-5858 (NM_001184), one from the ATR splice transcript spanning nucleotides 5626-5857 and carrying the deletion of the last 121 nucleotides of exon 33, and one spanning nucleotides 450-619 of 18S RNA. The fragments were then subcloned into the pCR®II vector (TA Cloning®, Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. To ensure amplification specificity, the reverse primer for the wild-type ATR fragment was designed on the exon 33-34 junction, while the reverse primer specific to the alternative splice transcript was placed on the new exon 33-34 junction created by the 3' deletion of exon 33. Plasmid constructs were amplified and purified using the Plasmid Maxi Kit (Qiagen, Mississauga, ON, Canada). Specific standard curves were generated by making 2-fold serial dilutions of the plasmid constructs in the appropriate range for each quantitation assay.
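Standard curves built from such dilution series are commonly fitted as Ct versus log10 of the input quantity; the slope then gives the amplification efficiency E = 10^(-1/slope) - 1, and unknown samples are read off the fitted line. The sketch below assumes a plain least-squares line and idealized, hypothetical Ct values that rise by exactly one cycle per 2-fold dilution; the instrument software's own fitting procedure may differ, and the function names are hypothetical.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept.

    Returns (slope, intercept, efficiency), where efficiency = 10 ** (-1 / slope) - 1
    (1.0 means perfect doubling of product every cycle).
    """
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    efficiency = 10 ** (-1 / slope) - 1
    return slope, intercept, efficiency


def quantity_from_ct(ct, slope, intercept):
    """Interpolate an unknown sample's starting quantity from the fitted curve."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 2-fold dilution series of a plasmid standard (arbitrary copy units).
standards = [1e6 / 2 ** i for i in range(8)]
cts = [15.0 + i for i in range(8)]
slope, intercept, eff = fit_standard_curve(standards, cts)
print(round(slope, 3), round(eff, 3))  # -3.322 1.0
```

With ideal doubling, a 2-fold dilution step costs exactly one cycle, which corresponds to the familiar slope of about -3.32 per log10 unit.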

QRT-PCR assays
QRT-PCR assays were performed in triplicate on an ABI 7900 Sequence Detection System (Applied Biosystems) as previously described [66]. For all assays, a reaction mixture was prepared in a final volume of 10 µl with 1X TaqMan Master Mix buffer (Applied Biosystems), which included Taq Gold polymerase, 200 nM of TaqMan® probe, 900 nM of each primer for the ATR wild-type and alternative transcript assays or 50 nM of each primer for the 18S RNA assay, and cDNA reverse-transcribed from total RNA. The amount of cDNA used for quantitation was 15 ng for the wild-type transcript, 150 ng for the splice transcript and 1.5 ng for 18S RNA.
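Because the three assays use different amounts of cDNA (15, 150 and 1.5 ng), quantities read from the standard curves are only directly comparable after accounting for input. The sketch below shows one plausible way to do this, dividing each assay's mean triplicate quantity by its input before normalizing to 18S RNA; the text does not specify this exact procedure, so the normalization scheme, names and triplicate values are assumptions for illustration.

```python
from statistics import mean

# cDNA input per QRT-PCR reaction, as stated in the text (ng).
INPUT_NG = {"wild_type": 15.0, "delta33": 150.0, "rRNA_18S": 1.5}


def per_ng(quantities):
    """Divide each assay's standard-curve quantity by the ng of cDNA loaded."""
    return {assay: q / INPUT_NG[assay] for assay, q in quantities.items()}


def normalized_expression(quantities):
    """Return (wild type / 18S, delta33 / 18S, delta33 / wild type), all per ng of cDNA."""
    q = per_ng(quantities)
    return (
        q["wild_type"] / q["rRNA_18S"],
        q["delta33"] / q["rRNA_18S"],
        q["delta33"] / q["wild_type"],
    )


# Hypothetical triplicate quantities (arbitrary standard-curve units) for one sample.
triplicates = {
    "wild_type": [2950.0, 3000.0, 3050.0],
    "delta33": [2900.0, 3000.0, 3100.0],
    "rRNA_18S": [1.4e6, 1.5e6, 1.6e6],
}
sample = {assay: mean(values) for assay, values in triplicates.items()}
wt_norm, d33_norm, d33_vs_wt = normalized_expression(sample)
print(round(d33_vs_wt, 3))  # 0.1
```

In this toy sample the raw ∆33 and wild-type quantities are equal, yet the ∆33 transcript is ten times less abundant per ng because ten times more cDNA was loaded for its assay.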

ATR mutation analysis and variant characterization
Although no truncating mutation was found in the ATR coding region of our French Canadian breast cancer cases, we identified 41 variants in ATR exonic and flanking intronic sequences (Table 1). These included 16 nucleotide substitutions in the exons, 6 of which resulted in amino acid changes, and 25 variants in the intronic regions, consisting of 23 nucleotide substitutions, one deletion and one insertion-deletion. Of the 41 variants, 21 were novel, while 20 were reported in the Single Nucleotide Polymorphism database (dbSNP). Eight of the identified variants were very common polymorphisms with a MAF of around 40%. Of these, three were intronic variants, one was located in the 3'-UTR and four were coding polymorphisms, only one of which caused an amino acid change. This latter variant, Thr211Met, is located in exon 4 and was previously reported in dbSNP. Twenty-five of the identified variants had a MAF around or below 5% (Table 1). Of these, 14 polymorphisms (SNPs 1, 3, 6, 7, 13, 16, 17, 18, 19, 23, 24, 31, 32 and 33) were found exclusively in the same 3 individuals, and 9 were observed only once. The six coding variants causing an amino acid change were genotyped in an independent European Caucasian validation group consisting of 46 unrelated breast cancer cases from high-risk non-BRCA1/2 families (Table 2). Frequencies were similar in both cohorts and to those reported in dbSNP. Comparison of the most common variants with those described in a recent study of Finnish families [70] shows no notable differences in carrier frequency, with the exception of the c.268C>T variant, which has a carrier frequency of 12.7% in the Finnish families but was not observed in our cases. This may be attributed to population-specific differences.
Genotype frequencies and MAF were determined in cases and controls, both of French Canadian origin, for all coding variants as well as for the intronic variants with a MAF ≥ 5%. As indicated in Table 3, most genotype distributions that deviated significantly from Hardy-Weinberg Equilibrium (HWE) expectations involved rare SNPs (i.e. SNPs 6, 7, 13, 16, 18, 32 and 33) found exclusively in the same 3 cases, who appear to carry a specific allele.
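A deviation from HWE of this kind can be illustrated with a Pearson chi-square goodness-of-fit test with 1 degree of freedom. The text does not state which HWE test was used, so the test type is an assumption here, and the function name is hypothetical. The example genotype counts mirror the pattern described in the Discussion (2 rare-allele homozygotes and 1 heterozygote among 54 cases) but are illustrative, not the study's Table 3 values.

```python
import math


def hwe_chi_square(n_hom_ref, n_het, n_hom_alt):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium at a biallelic SNP.

    Returns (alternate allele frequency, chi-square statistic, p-value).
    """
    n = n_hom_ref + n_het + n_hom_alt
    q = (n_het + 2 * n_hom_alt) / (2 * n)
    p = 1 - q
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(Z**2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return q, chi2, p_value


# Illustrative counts: 51 common homozygotes, 1 heterozygote, 2 rare homozygotes.
q, chi2, p_value = hwe_chi_square(51, 1, 2)
```

Two rare homozygotes with only one heterozygote is far more homozygosity than HWE predicts (about 0.12 rare homozygotes expected), so the test rejects strongly. With expected counts this small the chi-square approximation is unreliable, and an exact HWE test is usually preferred for rare SNPs.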

Conservation of human ATR residues
Among exonic variants resulting in amino acid substitutions, c.6394T>G (Tyr2132Asp) was located in the FAT (FRAP/ATM/TRRAP) domain and c.7274G>A (Arg2425Gln) in the PI3Kc (phosphoinositide 3-kinase related catalytic) domain (Figure 1), while the remaining amino acid substitutions were located outside these domains. Missense substitutions were compared across relevant species in order to obtain a more representative prediction of the importance of specific residues for protein function. The alignment of ATR orthologue sequences illustrated in Table 4 revealed that, with the exception of Thr211 and Val959, which are non-conserved residues (Thr211 being conserved only in Pan troglodytes and Canis familiaris), Val316 and Lys764 are conserved in Pan troglodytes and in more distant species such as Canis familiaris, Mus musculus, Xenopus laevis and Fugu rubripes. Since the Val316 and Lys764 residues are invariant from human to fish, these positions may be under strong functional constraint. Tyr2132 and Arg2425 are only conserved in closely related mammals, namely Pan troglodytes and Canis familiaris, respectively.
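The residue numbers used above follow from HGVS coding-DNA positions: codon = floor((c - 1) / 3) + 1. The minimal sketch below checks that c.6394T>G falls at the first position of codon 2132 and c.7274G>A at the second position of codon 2425, and lists which standard-code codons are compatible with the stated amino acid changes. It is a consistency check only (the actual reference codons must be read from NM_001184), and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_position(c_pos):
    """Map a coding-DNA position (c.N, N >= 1) to (codon number, position within codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def compatible_codons(ref_aa, alt_aa, pos_in_codon, ref_base, alt_base):
    """Reference codons for ref_aa that carry ref_base at pos_in_codon and become alt_aa."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa or codon[pos_in_codon - 1] != ref_base:
            continue
        mutated = codon[: pos_in_codon - 1] + alt_base + codon[pos_in_codon:]
        if CODON_TABLE[mutated] == alt_aa:
            hits.append(codon)
    return sorted(hits)


print(codon_position(6394))                         # (2132, 1)
print(codon_position(7274))                         # (2425, 2)
print(compatible_codons("Y", "D", 1, "T", "G"))     # ['TAC', 'TAT']
print(compatible_codons("R", "Q", 2, "G", "A"))     # ['CGA', 'CGG']
```

T>G at the first position of either tyrosine codon gives aspartate, whereas G>A at the second position of an arginine codon gives glutamine only from CGA or CGG (CGT and CGC would give histidine, AGA and AGG lysine).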

LD analysis
A graphical representation of the pairwise LD between all 41 SNPs identified in cases, as measured by Lewontin's |D'|, is shown in Figure 2. The majority of SNPs are in strong LD with each other. Complete LD was found between the two most distantly separated intragenic SNPs (SNPs 1 and 41, inter-marker distance ~130 kb, |D'| = 1), suggesting that LD at the ATR locus does not decrease significantly with distance. Weaker LD was observed for a few pairs involving SNP2, with |D'| values ranging from 0.113 to 0.441. The only adjacent SNP pairs displaying weak LD were SNP2 with SNP1 and with SNP3 (|D'| = 0.118 and 0.113, respectively), and SNP40 with SNP39 (|D'| = 0.467). SNP40 also displayed a wide range of LD values; interestingly, all of the SNPs involved in these weaker associations with SNP40 show a MAF > 5%.

Haplotype analysis and tSNP identification
To reduce genotyping costs and efforts in future association studies, it is useful to identify tSNPs that represent the majority of observed haplotypes with minimal loss of power to detect a possible association. Thereafter, in order to identify ATR tSNPs useful for well-powered studies using larger sample sets, genotypes of 17 coding and intronic SNPs showing a MAF ≥ 5% in healthy individuals were used for further haplotype analyses (Figure 3, Panel A). Genotyping data from controls only were purposely selected, as they are likely to be more representative of the French Canadian population. When only haplotypes with a frequency ≥ 5% are considered, 85% of all estimated haplotypes are represented by the identified tSNPs.
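The idea of choosing a small SNP subset that still represents the common haplotypes can be illustrated with a greedy heuristic: repeatedly add the SNP that distinguishes the most pairs of common haplotypes until every pair can be told apart. This is an assumed, simplified procedure, not necessarily the one implemented in Haploview; the function name, haplotypes and frequencies are hypothetical.

```python
from itertools import combinations


def greedy_haplotype_tags(haplotypes, min_freq=0.05):
    """Greedily choose SNP indices until all common haplotypes are pairwise distinguishable.

    haplotypes: dict mapping a haplotype string (one allele character per SNP)
    to its estimated frequency. Returns (sorted tag indices, total frequency
    of the haplotypes that the tags are required to distinguish).
    """
    common = [h for h, f in haplotypes.items() if f >= min_freq]
    captured = sum(haplotypes[h] for h in common)
    unresolved = set(combinations(common, 2))
    n_snps = len(common[0]) if common else 0
    tags = []
    while unresolved:
        # Pick the unused SNP separating the most still-confused haplotype pairs.
        # Dict keys are distinct, so some unused SNP always separates a pair.
        best = max(
            (i for i in range(n_snps) if i not in tags),
            key=lambda i: sum(a[i] != b[i] for a, b in unresolved),
        )
        tags.append(best)
        unresolved = {(a, b) for a, b in unresolved if a[best] == b[best]}
    return sorted(tags), captured


# Hypothetical 5-SNP haplotypes and frequencies for one LD block.
toy_block = {"ACGTA": 0.45, "ACGTG": 0.25, "GTGTA": 0.20, "GTCAA": 0.07, "ACCTA": 0.03}
tags, captured = greedy_haplotype_tags(toy_block)
print(tags, round(captured, 2))  # [0, 2, 4] 0.97
```

Here three of five SNPs suffice to distinguish the four common haplotypes, which together account for 97% of the toy haplotype frequency; the 3% haplotype falls below the frequency threshold and is not required to be distinguished.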
The identification of tSNPs was then carried out in two steps: first, haplotype blocks were determined, and then tSNPs were identified within each LD block. Based on the algorithm of Gabriel et al. [71], three LD blocks were identified in the French Canadian sample by the Haploview software (expectation-maximization algorithm) (Figure 3, Panel B). Then, considering haplotypes with a frequency ≥ 5%, 8 tSNPs were identified in the 3 LD blocks, namely SNPs 2, 5, 12 and 26 in block 1, SNPs 30 and 39 in block 2, and SNPs 40 and 41 in block 3. Furthermore, seven tSNPs clustered in two LD blocks were selected in the ATR gene using the HapMap data from the CEPH/CEU cohort. Of note, three tSNPs were selected in both the French Canadian and the CEPH/CEU sample sets.
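Haploview estimates block haplotype frequencies from unphased genotypes by expectation-maximization (EM). To show the idea, the sketch below runs EM for just two biallelic SNPs, where the only phase ambiguity is the double heterozygote. It is a simplified illustration, not Haploview's multi-locus implementation, and the function name and genotype data are hypothetical.

```python
def em_two_snp_haplotypes(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies for two biallelic SNPs from unphased genotypes.

    genotypes: list of (g1, g2), each the count (0, 1 or 2) of allele "1" at a SNP.
    Returns frequencies of haplotypes 00, 01, 10, 11 (first digit = SNP 1).
    """
    freqs = [0.25] * 4
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = [0.0] * 4
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: split a double heterozygote between its two possible phases.
                cis = freqs[0] * freqs[3]      # 00 / 11
                trans = freqs[1] * freqs[2]    # 01 / 10
                total = cis + trans
                w = cis / total if total > 0 else 0.5
                counts[0] += w
                counts[3] += w
                counts[1] += 1 - w
                counts[2] += 1 - w
            else:
                # At most one SNP is heterozygous, so the two haplotypes are determined.
                alleles1 = (0, 0) if g1 == 0 else (1, 1) if g1 == 2 else (0, 1)
                alleles2 = (0, 0) if g2 == 0 else (1, 1) if g2 == 2 else (0, 1)
                for a, b in zip(alleles1, alleles2):
                    counts[2 * a + b] += 1
        # M-step: expected haplotype counts become the new frequencies.
        new = [c / n_chrom for c in counts]
        converged = max(abs(x - y) for x, y in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs


toy = [(0, 0)] * 10 + [(2, 2)] * 10 + [(1, 1)] * 5
print([round(f, 3) for f in em_two_snp_haplotypes(toy)])  # [0.5, 0.0, 0.0, 0.5]
```

Because every unambiguous individual carries only the 00 or 11 haplotype, EM assigns the double heterozygotes to the 00/11 phase as well.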

Splicing consensus sequence analysis
The possible effect on splicing of all coding or intronic variants located in a splice junction was assessed using the SSPNN website. This revealed that the intronic variant c.5739-4del9+T (SNP32), located 4 nucleotides upstream of exon 34, decreased the exon 34 consensus acceptor site score from 0.94 to 0.31 (Figure 4, Panel A). This intron 33 variant was therefore of interest as a possible cause of a new alternative ATR transcript. Acceptor and donor splice site sequences in this region are well conserved among mammalian species, as illustrated in Figure 4. Moreover, the in silico analysis revealed a weak donor splice site score (0.11) for exon 33, while a putative donor site with a higher splicing score (0.63) was predicted within exon 33, potentially generating an alternative exon.
Another rare variant (c.7041+8G>A) showed a significant alteration in the splicing score for a donor site, decreasing the score from 0.51 to 0.24 (Figure 4, Panel B). In silico analysis of the region surrounding the c.7041+8G>A variant revealed that a putative intronic donor site located 441 nucleotides downstream of exon 41 could be used instead of the exon 41 donor site. The wild-type exon 41 donor site found in other species displays a weak splicing score. The putative donor site located in intron 41 has a higher splicing score in human (0.90) and Pan troglodytes (0.90), while no corresponding sequence could be identified in the other species analyzed.

[Figure: Genomic structure of the human ATR gene]

[…] which lacks the C-terminal part of the FAT domain, the entire kinase catalytic domain as well as the FATC domain. Subcloning of PCR products covering this region confirmed the presence of this ∆33 splice form in the immortalized lymphoblastoid cell lines from our breast cancer cases (Figure 6). Analysis through the UCSC genome website revealed two human ESTs corresponding to the ∆33 splice transcript sequence (BG770191 and CD642306), supporting the existence of such a transcript in humans. Interestingly, this alternative transcript uses the putative donor site predicted within exon 33 described above, which is conserved in most of the mammalian species analyzed. In order to confirm that no additional alternative transcripts involving the skipping of exon 34 could be generated by the c.5739-4del9+T variant, we used specific primers located on putative exon 33-35 and ∆33-35 junctions in combination with upstream and downstream primers for PCR amplification. No such PCR product was detected. Moreover, no EST corresponding to this potential transcript was found in the UCSC genome website.

Assessment of the presence of alternative transcripts
No alternative splice transcript was identified in the region comprising the c.7041+8G>A variant using standard procedures such as PCR amplification with external primers located on the exon 38-39, 39-40, 43-44 and 44-45 junctions, followed by subcloning. However, PCR amplification using a specific primer located on a putative exon 41-intron 41 junction, or another primer located 441 nucleotides downstream of exon 41 on the putative intron 41-exon 42 junction identified by in silico analysis as described above, revealed the presence of such an alternative transcript (Figure 5, Panel B). This alternative transcript is, however, not reported in the UCSC database. Since preliminary expression analyses performed in immortalized cell lines of a c.7041+8G>A heterozygous carrier and 9 wild-type individuals revealed expression at the limit of detection (data not shown), no further expression analyses were performed.

Characterization of ∆33 splice mRNA expression
Expression levels of the ∆33 splice form were further characterized in several normal tissues and cancer cell lines. As shown in panel A of Figure 6, relative expression levels of the ∆33 splice form are highest in the breast and ovary, at approximately 18% and 13%, respectively, of the wild-type full-length transcript, while the other tissues examined showed lower and mutually similar levels. In breast cancer cell lines, ∆33 splice form expression ranges from approximately 6% to 11% of total exon 33 expression (wild-type + ∆33 splice form), with no significant variation according to estrogen receptor or differentiation status. ATR wild-type expression levels standardized to 18S RNA levels are also illustrated in panel B of Figure 6; they vary across tissue samples and are slightly higher in cancer cell lines.
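The tissue values above are expressed relative to the wild-type transcript (r = ∆33 / wild type), whereas the cell line values are expressed relative to total exon 33 expression, f = ∆33 / (wild type + ∆33) = r / (1 + r). The conversion below puts both on the same scale; taking the approximate percentages at face value, breast and ovary correspond to about 15.3% and 11.5% of total expression, at or slightly above the upper end of the cell line range. The function names are illustrative.

```python
def fraction_of_total(ratio_to_wild_type):
    """delta33 / (wild type + delta33), given delta33 / wild type."""
    return ratio_to_wild_type / (1 + ratio_to_wild_type)


def ratio_to_wild_type(fraction_of_total_expr):
    """Inverse conversion: delta33 / wild type, given delta33 / (wild type + delta33)."""
    return fraction_of_total_expr / (1 - fraction_of_total_expr)


for tissue, r in [("breast", 0.18), ("ovary", 0.13)]:
    print(tissue, round(fraction_of_total(r), 3))   # breast 0.153, then ovary 0.115
for f in (0.06, 0.11):
    print(f, round(ratio_to_wild_type(f), 3))       # 0.06 0.064, then 0.11 0.124
```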

Is the c.5739-4del9+T variant associated with ∆33 splice transcripts?
The association between the c.5739-4del9+T variant and the expression of the ∆33 splice form was assessed using real-time PCR on RNA samples obtained from lymphoblastoid cell lines of 38 of the screened cases, comprising 35 wild-type individuals, one heterozygote and two homozygotes for the c.5739-4del9+T variant. As illustrated in Figure 7, the ∆33 splice transcript was detected in all individuals, including wild-type individuals, indicating that expression of this splice form is not associated with the presence of the intronic c.5739-4del9+T variant. Moreover, expression levels do not appear to correlate with genotype, as observed in heterozygous and homozygous individuals, and high expression variability is also observed among wild-type individuals, whose mean expression level was 2.733 ± 1.806.
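The text does not say whether the ± value is a standard deviation or a standard error. Assuming it is a standard deviation, the spread among wild-type individuals corresponds to a coefficient of variation of about 66%, and a carrier's expression level can be placed within that distribution as a z-score. The sketch below is illustrative only; the function names are hypothetical, and a z-score is a rough guide here because a distribution this dispersed is unlikely to be normal.

```python
WT_MEAN, WT_SD = 2.733, 1.806  # reported for the 35 wild-type cell lines; assumed mean ± SD


def coefficient_of_variation(mean_value, sd_value):
    """Relative spread: standard deviation divided by the mean."""
    return sd_value / mean_value


def z_score(value, mean_value=WT_MEAN, sd_value=WT_SD):
    """How many wild-type SDs a given expression level lies from the wild-type mean."""
    return (value - mean_value) / sd_value


print(round(coefficient_of_variation(WT_MEAN, WT_SD), 2))  # 0.66
```

For example, a hypothetical carrier level of 4.0 gives z_score(4.0) of about 0.70, well within the wild-type spread.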
Since no evidence suggested that the c.5739-4del9+T variant is associated with the presence or the expression of the ∆33 splice form, several cDNA fragments covering exons 30-38 were amplified using different combinations of primers located on exon-exon junctions (Figure 5), to detect any additional splice form resulting from an effect of this sequence change on mRNA splicing. No additional splice mRNA was observed.

Discussion
Since the residual familial risk of breast cancer not caused by the BRCA1 or BRCA2 genes is thought to be explained by either a polygenic model or heterogeneity among high-risk genes [72,73], we selected individuals affected with breast cancer without mutations in the BRCA1/2 genes from high-risk families (one individual per family), in order to increase the power of the study to find genetic variants involved in breast cancer susceptibility. So far, several genes have been investigated based on their interaction with BRCA1/2 or their involvement in DNA repair mechanisms. Since the BRCA1/2 genes are intimately linked to genomic stability, other genes involved in this pathway are very good candidates for a putative BRCA3 gene, especially ATM and ATR, which play a central role in the maintenance of genome stability. The ATM gene has been suspected to be a breast cancer susceptibility locus because of the occurrence of breast cancer in A-T families, particularly among ATM heterozygotes [74]. ATM mutations have been reported to increase breast cancer susceptibility [9,27,75], while some other sequence variants in this gene do not seem to be linked to breast cancer [24].
Given the similar roles played by ATM and ATR as sensors of DNA damage, ATR may be considered a candidate gene that could explain a fraction of the remaining familial breast cancer risk. The association of ATR germline mutations with breast cancer susceptibility has previously been analyzed in 126 Finnish families [70], and no germline mutation was identified in this founder population. The current study, performed in a French Canadian cohort, another founder population, was designed to assess the possible involvement of deleterious ATR germline mutations in breast cancer predisposition.
No deleterious germline mutation leading to premature termination of the protein was identified in the coding region. However, 41 sequence variants were identified, of which 16 were coding variants and 21 were novel changes. We consider it unlikely that either of the common missense substitutions located in the FAT and kinase domains (c.6394T>G and c.7274G>A) has a significant effect on protein function because (i) their frequencies are similar in cases and controls, especially for c.7274G>A, whose MAF is greater than 20% in controls, and (ii) these residues are not well conserved in other species (Table 4). Moreover, the polymorphisms displaying a significant deviation from HWE form a group of 14 uncommon polymorphisms identified in the same 3 breast cancer cases (2 homozygotes and 1 heterozygote), and therefore most likely constitute a single relatively rare allele. It should be noted that no particular characteristics seem to emerge for the families carrying any of these rare variants, as both the French Canadian and the validation families were recruited on the basis of being high-risk breast cancer families.
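A deviation from HWE such as the one described above can be checked with a one-degree-of-freedom chi-square goodness-of-fit test. The sketch below is illustrative only: the text does not specify which HWE test was used, and with so few rare-allele carriers an exact test is generally preferred over the chi-square approximation. The genotype counts in the example (51 homozygous common, 1 heterozygote, 2 homozygous rare among 54 cases) follow from the three carriers described above; the function names are hypothetical.

```python
import math


def hwe_chi_square(n_common_hom, n_het, n_rare_hom):
    """Chi-square statistic for departure from Hardy-Weinberg equilibrium (1 df)."""
    n = n_common_hom + n_het + n_rare_hom
    p = (2 * n_common_hom + n_het) / (2 * n)  # common-allele frequency
    q = 1.0 - p  # rare-allele frequency
    observed = (n_common_hom, n_het, n_rare_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip empty expected classes (monomorphic locus) to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def chi_square_1df_p_value(statistic):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


stat = hwe_chi_square(51, 1, 2)
print(f"chi-square = {stat:.2f}, p = {chi_square_1df_p_value(stat):.2e}")
```

The excess of rare homozygotes relative to heterozygotes is what drives the statistic here, consistent with the interpretation that these 14 polymorphisms travel together on a single rare allele.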
Comparison of polymorphism frequencies between our cohort and the Finnish cohort [70] is not fully informative, since the latter reports only the number of carriers of each polymorphism, without distinguishing heterozygotes from homozygotes. However, when polymorphism frequencies in our cohort are calculated in the same way, only SNP40 displays a notably lower frequency than in the Finnish cohort. As stated earlier, both studies (Heikkinen et al. and the present study) were designed to identify deleterious ATR germline mutations in breast cancer cases. No such mutation was found in either study; ATR is therefore unlikely to play a major role as a high-penetrance gene in breast cancer predisposition. Even though novel variants have been identified, assessing whether polymorphisms or haplotypes differ between cases and controls would require many more individuals to reach statistical significance for an association with breast cancer susceptibility [76,77]. We therefore sought to identify tSNPs that could be useful for other studies and populations.
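The difference between the two ways of reporting frequencies can be made explicit. The sketch below, with hypothetical function names and illustrative counts, contrasts the carrier frequency (the only measure available for the Finnish cohort) with the allele frequency; under HWE the two are linked by 1 - (1 - q)^2, where q is the variant allele frequency.

```python
def carrier_frequency(n_het, n_hom, n_total):
    """Fraction of individuals carrying at least one copy of the variant allele."""
    return (n_het + n_hom) / n_total


def allele_frequency(n_het, n_hom, n_total):
    """Fraction of chromosomes carrying the variant allele (two per individual)."""
    return (n_het + 2 * n_hom) / (2 * n_total)


def expected_carrier_frequency(q):
    """Carrier frequency expected under HWE for variant allele frequency q."""
    return 1.0 - (1.0 - q) ** 2


# Illustrative counts only: 10 heterozygotes and 2 homozygotes among 54 individuals.
cf = carrier_frequency(10, 2, 54)
af = allele_frequency(10, 2, 54)
print(f"carrier frequency = {cf:.3f}, allele frequency = {af:.3f}")
print(f"HWE-expected carrier frequency = {expected_carrier_frequency(af):.3f}")
```

Converting genotype counts to carrier frequencies discards the heterozygote/homozygote distinction, which is why a comparison on that basis is only partly informative.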
Our pairwise linkage disequilibrium analysis (Figure 2) did not reveal any distinct LD blocks within ATR. This observation is supported by the fact that SNP1 is in perfect LD with most other SNPs, including the most distal, SNP41, and is consistent with the large conserved haplotypes reported in the French Canadian founder population at the BRCA1 locus [49]. However, using the Haploview software, three distinct LD blocks were identified at the ATR locus when only SNPs with a MAF >5% in healthy French Canadian individuals were considered (Figure 3). The breaks in strong LD appear to be located in the region of exon 31 and between exons 43 and 47.
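For readers less familiar with the |D'| statistic shown in Figure 2, the sketch below computes D, |D'| and r² for a pair of biallelic SNPs from two-locus haplotype frequencies. It assumes the haplotype frequencies are already known; in practice they are estimated from unphased genotypes (for example by PHASE, or by the EM algorithm used in Haploview). The function name and example frequencies are hypothetical.

```python
def ld_measures(f_AB, f_Ab, f_aB, f_ab):
    """Return (D, |D'|, r^2) for two biallelic loci from haplotype frequencies."""
    total = f_AB + f_Ab + f_aB + f_ab
    if abs(total - 1.0) > 1e-6:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab  # frequency of allele A at locus 1
    p_B = f_AB + f_aB  # frequency of allele B at locus 2
    d = f_AB - p_A * p_B
    # D is normalized by its largest possible value given the allele frequencies.
    if d > 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Only two haplotypes observed: complete LD, so |D'| and r^2 are 1 (up to rounding).
print(ld_measures(0.7, 0.0, 0.0, 0.3))
# Some recombinant haplotypes present: |D'| and r^2 drop below 1.
print(ld_measures(0.5, 0.1, 0.1, 0.3))
```

Because |D'| reaches 1 whenever one of the four haplotypes is absent, even SNPs with very different allele frequencies can appear in "perfect" LD by this measure, whereas r² is the quantity more commonly used for tagging.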
Using the same algorithm (Haploview) with the SNPs genotyped in the HapMap database that show a MAF higher than 5%, two LD blocks could be identified: the first comprising the SNPs located from intron 1 to exon 43, and the second including all remaining SNPs up to exon 47. However, it should be noted that the majority of the SNPs used to define these haplotype blocks have a MAF higher than 0.4; these are common SNPs found in many different populations, and this set therefore probably excludes SNPs specific to our French Canadian founder population.
Figure 2 Pairwise linkage disequilibrium (LD) measures of |D'| for the 41 SNPs identified in our breast cancer case series. All SNPs are denoted numerically with reference to Table 1.
Figure 3 Panel A: The table denotes the haplotype frequencies estimated in controls using PHASE, based on SNPs with a MAF higher than 5%. Panel B: Haplotype blocks predicted using SNPs identified in the control series with a MAF higher than 5% (17 SNPs). tSNPs identified on a block-by-block basis are denoted with an asterisk (*) above the SNP number. Population haplotype frequencies are displayed to the right of each haplotype combination, while the level of recombination is displayed above the connections between two blocks. Thick connections represent haplotypes with frequencies higher than 10%, while frequencies below 10% are represented by thin lines.

Figure 6 Expression levels of the ATR ∆33 splice form in cell lines and human tissues as measured by quantitative real-time PCR experiments. Panel A: Relative expression levels of the ∆33 splice form were calculated as ∆33 splice form/(∆33 splice form + wild-type allele) in various human tissues and cell lines. Panel B: Standardized expression levels of WT exon 33 were calculated as WT exon 33/(∆33 splice form + wild-type allele) in various human tissues and cell lines.
We were able to show that 8 tSNPs are sufficient to represent the majority of ATR haplotypes in our French Canadian individuals, which should greatly facilitate subsequent studies. This number of tSNPs at the ATR locus is consistent with the numbers previously reported for other gene loci in other populations [78,79], which suggests that these tSNPs will be useful in subsequent analyses. Moreover, of the 72 SNPs genotyped at the ATR locus in the HapMap database (HapMap data rel#20 on NCBI B35 assembly, dbSNP b125), only 40 displayed a MAF >5%. Among them, 7 tSNPs were identified, 3 of which were also identified as tSNPs in our analyses (rs10804682, rs2229032, rs1802904). Of the remaining four HapMap tSNPs (rs11920625, rs9856772, rs6805118 and rs9816736), three were not genotyped in our cohort because they are located in intronic regions (>150-200 bp), and one (rs11920625) was not observed in our individuals.
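The idea behind tSNP selection can be illustrated with a simple greedy pairwise-r² algorithm: repeatedly pick the SNP that captures (r² at or above a threshold) the largest number of not-yet-captured SNPs. This is a hedged sketch of one common approach, not necessarily the procedure used here, where tSNPs were selected block by block from haplotypes (Figure 3). The 0.8 threshold, the function name `greedy_tag_snps` and the toy r² matrix are assumptions for illustration.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedy pairwise tagging on a symmetric r^2 matrix (diagonal = 1).

    Returns the indices of selected tag SNPs; every SNP ends up with
    r^2 >= threshold to at least one tag (a SNP always tags itself).
    """
    n = len(r2)
    untagged = set(range(n))
    tags = []
    while untagged:
        # Choose the SNP capturing the most untagged SNPs; ties go to the lowest index.
        best = max(
            range(n),
            key=lambda i: (sum(1 for j in untagged if r2[i][j] >= threshold), -i),
        )
        tags.append(best)
        untagged -= {j for j in untagged if r2[best][j] >= threshold}
    return tags


# Toy example: SNPs 0-1 and SNPs 2-3 form two strongly correlated pairs.
toy_r2 = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.95],
    [0.1, 0.1, 0.95, 1.0],
]
print(greedy_tag_snps(toy_r2))
```

Lowering the threshold generally yields fewer tags at the cost of weaker coverage of each captured SNP; haplotype-based methods, such as the block-by-block selection used here, can reduce the number of tags further.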
Sequence analysis of the intronic sequences flanking exon 34 revealed a deletion of 9 nucleotides combined with a T insertion (c.5739-4del9+T), located 4 nucleotides upstream of this exon. In silico analysis showed that this change decreased the splicing score of the exon 34 acceptor site from 0.94 to 0.31, suggesting a potential alteration of splicing in this region.
Surprisingly, sequence analysis of this cDNA region in our immortalized cell lines revealed a deletion of the last 121 nucleotides of exon 33, rather than the expected skipping of all or part of exon 34. Because 121 is not a multiple of three, this deletion shifts the open reading frame and results in a putative truncated protein of 1889 amino acids. Although interesting, this deletion is observed at similar levels in all tested individuals and is therefore unlikely to be related to the c.5739-4del9+T polymorphism. This ∆33 splice form may be explained by the weak score of the wild-type exon 33 donor site (0.11) and the presence of an additional donor site within exon 33, with a score of 0.63. No splice form involving the skipping of exon 34 was identified using specific primers located on the putative ∆33-35 or 33-35 exon junctions.
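The reasoning that a 121-nucleotide deletion disrupts the reading frame can be checked directly: any deletion whose length is not a multiple of three shifts all downstream codons, which often brings a premature stop codon into frame. The toy sequence below is hypothetical and only illustrates the mechanism; it is not ATR sequence, and the function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_frameshift(deletion_length):
    """A deletion shifts the reading frame unless its length is a multiple of 3."""
    return deletion_length % 3 != 0


def codons_before_stop(cds):
    """Number of codons read (from the first base) before the first in-frame stop.

    Returns None if no stop codon is found in frame.
    """
    for start in range(0, len(cds) - 2, 3):
        if cds[start:start + 3] in STOP_CODONS:
            return start // 3
    return None


print(is_frameshift(121))  # 121 = 3 * 40 + 1, so the frame shifts by one base

# Hypothetical toy coding sequence: ATG CTA AGC GGC TAA
toy = "ATGCTAAGCGGCTAA"
shifted = toy[:3] + toy[4:]  # delete one base (the C at position 4, 1-based)
print(codons_before_stop(toy), codons_before_stop(shifted))
```

In the toy case, removing a single base brings a TAA stop into frame immediately after the start codon; for ∆33, the new frame likewise ends in a premature stop, giving the putative 1889-amino-acid product.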
Splicing scores of the intronic sequences flanking exon 41 were also analyzed, since c.7041+8G/A could potentially affect splicing in this region. While the exon 41 donor splice site showed a relatively low splicing score in all species, a putative intronic donor site (splicing score of 0.90) located 441 nucleotides downstream of exon 41 became of interest, given its potential effect on splicing in this region (Figure 4). The ins∆Int41 splice form could not be detected using standard procedures. However, it was amplified and subcloned using specific primers located on this putative exon junction, demonstrating its very low mRNA expression (at the limit of detection). Because of this low expression, it was impossible to conclude whether or not the ins∆Int41 splice form is associated with the c.7041+8G/A variant.
Since the ratio of ∆33 splice form to WT form is potentially an important factor for DNA repair and other functions related to genome stability, we performed QRT-PCR with TaqMan probes discriminating between both forms to estimate the relative abundance of WT and ∆33 splice form mRNAs. No correlation was found between the presence of c.5739-4del9+T, in either the heterozygous or homozygous state, and the expression levels of the ∆33 splice form. However, it is notable that significant relative expression of the ∆33 splice mRNA is observed in breast and ovarian tissues, as well as in MCF7 and HaCat (human skin keratinocyte) cells (Figure 6A), especially since expression levels of the WT ATR form in these tissues (Figure 6B) seem relatively similar to those in other tissues and cell lines. The ratio of expression levels between both mRNAs could therefore be of primary importance, given the possible effect of the balance between these transcripts on cell integrity in different human tissues. It should be noted, however, that only one sample per tissue was analyzed, which by no means represents a mean expression level in these tissues or cell lines. While alternative splicing causing skipping of exon 6 within the non-catalytic domain of the ATR mRNA transcript had already been observed [80], O'Driscoll et al. [81] identified in 2003 a founder mutation (2101A→G) in ATR that affects exon 9 splicing in two related Pakistani families affected with Seckel syndrome. This study also showed an impaired response to DNA damage in a cell line from an affected parent who carried the mutation. Further characterization of ATR-Seckel cells showed impaired phosphorylation of ATR-dependent substrates, impaired G2/M checkpoint arrest and supernumerary centrosomes in mitotic cells, demonstrating a role for ATR in the maintenance of centrosome stability [82].
More recently, two other splicing alterations of ATR have been reported in clinical samples with pyothorax-associated lymphoma [83].
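The relative-abundance measure used in Figure 6, ∆33/(∆33 + WT), can be sketched from real-time PCR threshold cycles (Ct). This minimal example assumes that both TaqMan assays amplify with the same efficiency (2.0, that is, perfect doubling per cycle, by default), so that relative quantities can be taken as efficiency^(-Ct); the text does not describe the quantification model, and in practice standard curves or measured efficiencies would be used. Function names and Ct values are hypothetical.

```python
def relative_quantity(ct, efficiency=2.0):
    """Relative template amount implied by a threshold cycle, assuming fixed efficiency."""
    return efficiency ** (-ct)


def delta33_fraction(ct_delta33, ct_wt, efficiency=2.0):
    """Return delta33 / (delta33 + WT), the relative expression plotted in Figure 6A."""
    q_delta33 = relative_quantity(ct_delta33, efficiency)
    q_wt = relative_quantity(ct_wt, efficiency)
    return q_delta33 / (q_delta33 + q_wt)


# Hypothetical Ct values: the splice form crosses threshold 3 cycles after WT.
frac = delta33_fraction(ct_delta33=28.0, ct_wt=25.0)
# The WT proportion (the Figure 6B quantity) is the complement of the delta33 proportion.
print(f"delta33 fraction = {frac:.3f}, WT fraction = {1 - frac:.3f}")
```

Expressing each form as a fraction of the total ATR signal cancels differences in overall ATR expression between samples, which is what allows the ∆33/WT balance to be compared across tissues even with one sample per tissue.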

Conclusion
No deleterious germline mutations potentially involved in breast cancer predisposition were identified in the ATR gene in French Canadian breast cancer cases. However, we have conducted the first detailed haplotype tagging analysis of the ATR gene in control individuals from the French Canadian population. The data presented here identified 8 ATR tSNPs, which should be useful for other large-scale association studies. Moreover, given that different splicing alterations of ATR have been associated with an impaired response to DNA damage, the significant expression of the novel ∆33 splice form observed in breast and ovarian tissues could potentially affect DNA repair mechanisms in these cells, although exhaustive analyses will be required to verify this hypothesis. Further analyses in other populations and larger cohorts will be required to define the possible association of ATR gene polymorphisms with breast cancer susceptibility.

Josée Rhéaume, for genetic counselling and clinical data management at the Cancer Genomics Laboratory. We also thank Geneviève Ouellette for establishment of EBV-transformed B-lymphoblastoid cell lines and RNA and genomic DNA extractions, as well as Anne-Marie Moisan and Lucie Larouche for MLPA analyses. We thank Claudia Moreau at the Centre de Recherche de l'Hôpital Ste-Justine for help with control DNA samples. We would also like to thank Professor Bartha Maria Knoppers and her colleagues from the Centre de recherche en droit public de l'Université de Montréal for their invaluable help with ELSI issues related to our research.