Characterization of the linkage disequilibrium structure and identification of tagging-SNPs in five DNA repair genes



Characterization of the linkage disequilibrium (LD) structure of candidate genes is the basis for an effective association study of complex diseases such as cancer. In this study, we report the LD and haplotype architecture and tagging-single nucleotide polymorphisms (tSNPs) for five DNA repair genes: ATM, MRE11A, XRCC4, NBS1 and RAD50.


The genes ATM, MRE11A, and XRCC4 were characterized using a panel of 94 unrelated female subjects (47 breast cancer cases, 47 controls) obtained from high-risk breast cancer families. A similar LD structure and tSNP analysis was performed for NBS1 and RAD50, using publicly available genotyping data. We studied a total of 61 SNPs at an average marker density of one SNP per 10 kb. Using a matrix decomposition algorithm, based on principal component analysis, we captured >90% of the intragenic variation for each gene.


Our results revealed that three of the five genes did not conform to a haplotype block structure (MRE11A, RAD50 and XRCC4). Instead, the data fit a more flexible LD group paradigm, where SNPs in high LD are not required to be contiguous. Traditional haplotype blocks assume recombination is the only dynamic at work. For ATM, MRE11A and XRCC4 we repeated the analysis in cases and controls separately to determine whether LD structure was consistent across breast cancer cases and controls. No substantial difference in LD structures was found.


This study suggests that appropriate SNP selection for an association study involving candidate genes should allow for both mutation and recombination, which shape the population-level genomic structure. Furthermore, LD structure characterization in either breast cancer cases or controls appears to be sufficient for future cancer studies utilizing these genes.


Candidate gene association studies are a powerful study design for complex diseases such as cancer. Advances in association studies have been furthered by the recent discovery of single nucleotide polymorphisms (SNPs); their vast density throughout the genome, ease of genotyping and moderate cost contribute greatly to their utility. Association testing is efficient when the SNPs being analyzed represent the entire genetic variation of the gene. It has been suggested that nearby SNPs are organized into regions of high linkage disequilibrium (LD) separated by short segments of very low LD[1–6]. In Caucasians, high LD regions may vary in length from a few kb to >300 kb[2, 6, 7]. Regions of high LD contain redundant information and can be reduced to smaller subsets of tagging-SNPs (tSNPs)[8], such that tSNPs identify all common haplotypes within the region of high LD. A number of algorithms have been proposed to define regions of high LD and tSNPs[4, 8–14]. Thus far, no consensus has been reached on which algorithm is best. Several studies have suggested the utility of matrix decomposition algorithms[12, 13, 15–17]. One advantage of these algorithms is that SNPs in high LD are required to be neither contiguous nor mutually exclusive, a flexibility that is necessary for analyzing small genomic regions and rare variants. Further, these methods are stable with regard to marker density, minor allele frequency, analysis window, and possible analysis window length[18].
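As an illustration of what "high LD" means for a pair of SNPs, the standard pairwise measures D, D' and r² can be computed directly from two-SNP haplotype frequencies. The sketch below is not part of the study's analysis; the function name and the example frequencies are hypothetical, and alleles are coded '1' (major) and '2' (minor).

```python
def pairwise_ld(freqs):
    """Return (D, D', r^2) for two biallelic SNPs.

    freqs maps two-SNP haplotypes ('11', '12', '21', '22') to frequencies.
    """
    total = sum(freqs.values())
    f = {h: freqs.get(h, 0.0) / total for h in ("11", "12", "21", "22")}
    p_a = f["21"] + f["22"]  # frequency of allele '2' at the first SNP
    p_b = f["12"] + f["22"]  # frequency of allele '2' at the second SNP
    d = f["22"] - p_a * p_b  # deviation from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies: only three of the four haplotypes are observed.
d, d_prime, r2 = pairwise_ld({"11": 0.6, "12": 0.1, "21": 0.0, "22": 0.3})
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

In this made-up example D' equals 1 because one haplotype is absent, while r² is well below 1, so one SNP would only partly substitute for the other as a tag.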

Growing evidence appears to suggest that tumorigenesis is a multi-step process of genetic alterations that transform a normal human cell into a malignant derivative[19]. The ability of a cell to maintain genomic stability through DNA repair mechanisms is essential to prevent tumor initiation and progression. A number of different types of cancer have been attributed to defective DNA repair including xeroderma pigmentosum[20], hereditary nonpolyposis colorectal cancer[21], and breast cancer due to mutations in BRCA1 and BRCA2 as well as other DNA repair genes (e.g., ATM, TP53 and CHK2)[22]. Many published candidate gene association studies involving DNA repair genes and cancer risk have assessed risk by examining a single SNP per gene or a single locus at a time analysis approach. Unfortunately, the former approach is often inadequate in comprehensively accounting for the genetic variation of a gene, and the latter incurs multiple testing corrections, which usually eliminate all or most of the association evidence found. It has been suggested that use of haplotypes in association studies may have increased power over single-allele studies[8]. Descriptions of haplotype diversity and LD structure as well as identification of potential tSNPs will be key for success in candidate gene association studies.

Here we describe haplotypes, LD structure and potential tSNPs in five DNA repair breast cancer susceptibility genes: ATM, MRE11A, NBS1, RAD50, and XRCC4. We used a matrix decomposition algorithm based on a method of principal components analysis[13]; this method does not require SNPs to be in a contiguous block structure. Characterization of the LD structure and tSNPs is necessary for the design of future effective association studies.



This study is part of a larger study involving 139 high-risk Caucasian breast cancer families, defined as high risk because cancer rates in these families were significantly higher than the general population rate determined using the Utah Population Database (UPDB)[23–25]. All breast cancer cases in the larger cohort met at least one of the following criteria: (1) their family tested negative for a BRCA1 or BRCA2 mutation, (2) the case themselves tested negative for the same BRCA1/2 mutation that was present in their family, or (3) their family had a low probability of carrying a BRCA1/2 mutation based on the number of breast cancer cases present in the family and/or ages at diagnosis of breast cancer within the family. Therefore, all breast cancer cases in the larger study had a low residual probability of their cancer being due to mutations in BRCA1/2. Breast cancer diagnosis information was obtained from medical records for the subject or the Utah Cancer Registry.

For this LD characterization study, we selected a panel of 94 individuals (47 female breast cancer cases and 47 female controls), chosen randomly from separate kindreds to ensure independence. Both cases and controls were chosen so that comparisons of LD structure could be made between the groups. The sample size of 188 chromosomes is larger than generally used for this type of study[26–29], but inadequate for an association analysis. The current study is not a case-control study and associations with disease were not assessed.

Blood samples were collected from all subjects, and all individuals provided signed consent to participate in this study. This study was approved by the University of Utah Institutional Review Board.

Genes and SNP selection

For each gene of interest (i.e., ATM, MRE11A, NBS1, RAD50 and XRCC4), we selected all SNPs available from Applied Biosystems[30], within each gene and the flanking 10 kb on either side, that had been validated to have a minor allele frequency greater than 0.01 in Caucasians. For ATM (on chromosome 11q22-q23), which spans approximately 143 kb and contains 64 exons, 14 SNPs were studied with a SNP resolution of 1 SNP/10,489 bp. For MRE11A (11q21), which spans approximately 76 kb and contains 20 exons, 11 SNPs were studied with a SNP resolution of 1 SNP/8539 bp. For NBS1 (8q21), which contains 16 exons and spans about 51 kb, 5 SNPs were studied with a SNP resolution of 1 SNP/8256 bp. The RAD50 gene (5q31) spans approximately 87 kb and contains 25 exons, and we studied 10 SNPs at a resolution of 1 SNP/10,533 bp. Finally, for XRCC4 (5q13-q14), with 8 exons and approximately 276 kb in length, we studied 21 SNPs at a resolution of 1 SNP/13,198 bp. The vast majority of the SNPs studied were intronic (see Table 1).
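The overall marker density can be checked from the per-gene figures quoted above. The short sketch below simply weights each gene's reported spacing by its SNP count; the variable names are illustrative, and the calculation assumes that the per-gene resolutions are exactly as stated in the text.

```python
# (gene, number of SNPs studied, reported base pairs per SNP), as quoted in the text
genes = [
    ("ATM", 14, 10489),
    ("MRE11A", 11, 8539),
    ("NBS1", 5, 8256),
    ("RAD50", 10, 10533),
    ("XRCC4", 21, 13198),
]

total_snps = sum(n for _, n, _ in genes)
covered_bp = sum(n * bp for _, n, bp in genes)  # SNP-weighted total length
mean_spacing = covered_bp / total_snps          # average bp per SNP over all genes

print(f"{total_snps} SNPs, about 1 SNP per {mean_spacing:,.0f} bp")
```

This gives 61 SNPs and a mean spacing of roughly 10.9 kb, in line with the approximate figure of one SNP per 10 kb.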

Table 1 Characteristics of SNPs analyzed


For ATM and XRCC4, all SNPs that met the above criteria were genotyped on our panel of 94 subjects. For MRE11A, one SNP (rs10831224) repeatedly failed to amplify and was removed from the study.

Genomic DNA was isolated and purified using standard phenol/chloroform DNA extraction. SNP genotyping was performed using the fluorogenic 5' nuclease TaqMan Assay[31] (Applied Biosystems). The TaqMan Assay requires TaqMan PCR Master Mix (Applied Biosystems), which we used according to manufacturer's instructions, yielding a final volume of 5 μl per well. PCR amplification was also performed according to the Applied Biosystems protocol. The 7900HT Sequence Detection System (Applied Biosystems) was used to measure each fluorescent dye-labeled probe specific for each allele studied and results were analyzed with the Sequence Detection Software (Applied Biosystems).

Haplotype structure and tSNP selection

Haplotypes and haplotype frequencies were estimated from unphased genotype data using an expectation-maximization algorithm, SNPHAP[32]. SNPHAP uses a maximum-likelihood program to predict multilocus haplotypes. Haplotypes with a frequency of at least 0.01 were analyzed using a two-step PCA method[13]. This method does not require that groups of SNPs be contiguous along a DNA fragment and also allows SNPs to be present in more than one group. In step I, LD groups are determined. In brief, the PCA method extracts factors (LD groups) to capture ≥ 90% of the genetic diversity. An LD group is defined as those SNPs that load onto the same factor. In step II, tSNPs are selected for each LD group. Each LD group is considered separately and the PCA method again extracts factors; tSNPs are chosen as the SNPs with the highest factor loading. When a number of SNPs load equally well on an LD group, these can all be considered potential tSNPs. Under such circumstances, we selected the single SNP that performed best in the genotyping assay. This was done in order to minimize errors in allele calls.
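The idea behind step I can be sketched in a few lines. This is a simplified illustration and not the published implementation[13]: the haplotypes and frequencies are invented, eigenvectors are obtained by plain power iteration with deflation, no factor rotation is applied, and each SNP is assigned to a single factor, whereas the method described above allows a SNP to belong to more than one group. All function names are hypothetical.

```python
import math


def weighted_correlation(haplotypes, freqs):
    """Frequency-weighted Pearson correlation between SNP columns (alleles coded 1/2)."""
    n = len(haplotypes[0])
    total = sum(freqs)
    w = [f / total for f in freqs]
    mean = [sum(wk * h[j] for wk, h in zip(w, haplotypes)) for j in range(n)]
    cov = [[sum(wk * (h[i] - mean[i]) * (h[j] - mean[j]) for wk, h in zip(w, haplotypes))
            for j in range(n)] for i in range(n)]
    sd = [math.sqrt(cov[j][j]) for j in range(n)]  # assumes every SNP is polymorphic
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


def leading_factors(corr, threshold=0.90, iters=300):
    """Extract factors until their cumulative share of variance reaches threshold."""
    n = len(corr)
    m = [row[:] for row in corr]
    factors, explained = [], 0.0
    while explained < threshold and len(factors) < n:
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-uniform start vector
        for _ in range(iters):
            mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in mv))
            v = [x / norm for x in mv]
        eigval = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        eigval = max(eigval, 0.0)
        # Absolute loadings of each SNP on this factor
        factors.append([abs(x) * math.sqrt(eigval) for x in v])
        explained += eigval / n  # the trace of a correlation matrix equals n
        # Deflate: remove the extracted component before finding the next one
        m = [[m[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return factors, explained


def ld_groups(factors):
    """Assign each SNP to the factor on which it loads most strongly."""
    groups = {k: [] for k in range(len(factors))}
    for snp in range(len(factors[0])):
        best = max(range(len(factors)), key=lambda k: factors[k][snp])
        groups[best].append(snp)
    return groups


# Invented five-SNP haplotypes; SNPs 1, 2 and 4 vary together, as do SNPs 3 and 5.
haplotypes = [(1, 1, 1, 1, 1), (2, 2, 1, 2, 1), (1, 1, 2, 1, 2), (2, 2, 2, 2, 2)]
freqs = [0.42, 0.28, 0.18, 0.12]

corr = weighted_correlation(haplotypes, freqs)
factors, explained = leading_factors(corr)
groups = ld_groups(factors)
print(groups, round(explained, 3))
```

With these toy data two factors are needed to pass the 90% threshold, and the first group (zero-based SNP indices 0, 1 and 3) is interrupted by a SNP from the second group (indices 2 and 4), which is the kind of noncontiguous LD group that a block-based method cannot represent.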

We compared our genotype data for ATM, MRE11A, and XRCC4 with genotyping data for these same genes obtained from Applied Biosystems (ABI)[30] on 45 Caucasians. We found good concordance in allele frequencies between the data sets. Further, we applied the same LD characterization to both data sets and found excellent concordance in the LD groups and potential tSNPs (see Results). We therefore characterized LD groups and tSNPs for NBS1 and RAD50 using the genotyping data available online.

We also examined whether differences existed between LD group structure and tSNP selection when cases and controls were considered separately. This analysis could only be performed for ATM, MRE11A, and XRCC4.


Characteristics of the SNPs studied are listed in Table 1. Minor allele frequencies from our 94 subjects compared well with those listed by Applied Biosystems[30]. Despite the very low minor allele frequencies in some of the SNPs studied, we observed heterozygosity for all SNPs genotyped.

Table 2 lists the haplotypes with a frequency > 0.01 obtained from SNPHAP, and the LD group designation and the tSNPs that were selected using the PCA method, for ATM, MRE11A, and XRCC4. Haplotypes are reported using the standard convention of designating the major allele as '1' and the minor allele as '2', in order to more easily spot occurrences of the minor allele. Please see Table 1 for the corresponding base pair change. For ATM, 7 haplotypes overall were observed and 5 had a frequency > 0.01. Using the PCA method, a single LD group was identified, encompassing the entire gene and accounting for 98.8% of the genetic variance across the gene. From this single LD group, a single tSNP (A13) was selected.
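The '1'/'2' coding convention can be illustrated with a small sketch. The nucleotide haplotypes and counts below are hypothetical, the function name is illustrative, and ties in allele frequency are broken by order of first appearance, which is an arbitrary choice.

```python
from collections import Counter


def recode_haplotypes(hap_counts):
    """Recode nucleotide haplotypes so that '1' is the major and '2' the minor allele."""
    n_snps = len(next(iter(hap_counts)))
    coding = []
    for pos in range(n_snps):
        tally = Counter()
        for hap, count in hap_counts.items():
            tally[hap[pos]] += count
        # most_common() orders alleles from most to least frequent
        ranked = [allele for allele, _ in tally.most_common()]
        coding.append({allele: str(rank + 1) for rank, allele in enumerate(ranked)})
    return {
        "".join(coding[pos][allele] for pos, allele in enumerate(hap)): count
        for hap, count in hap_counts.items()
    }


# Hypothetical three-SNP haplotypes with chromosome counts
recoded = recode_haplotypes({"ACG": 60, "GCG": 25, "GTA": 15})
print(recoded)
```

Here the common haplotype becomes "111", and any '2' in a recoded haplotype marks a minor allele at that position.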

Table 2 Haplotypes with frequency>0.01, LD group characterization and tSNPs selected using Utah genotyping data*

For MRE11A, we observed 9 haplotypes in total and 6 with frequency > 0.01. From the PCA analysis, four LD groups were identified based on these 6 haplotypes with a frequency > 0.01, and accounted for 99.1% of the genetic variance. The LD groups did not conform to haplotype blocks. SNP M4 separated LD group 1 into two parts and M8 separated LD group 2. Each LD group was represented by a single tSNP, such that the tSNP set contained 4 tSNPs (M6, M10, M11, and M14).

For XRCC4, we observed 26 haplotypes overall; 13 of which had a frequency >0.01. From the PCA method, four LD groups were observed which accounted for 97.2% of the variance. Similarly to MRE11A, the LD groups were not contiguous blocks. LD group 1 was divided by X9 and LD group 2 was divided by X15. Each of the LD groups could be represented by a single SNP resulting in the tSNP set (X2, X9, X14, and X21).

Table 3 shows the LD groups and tSNPs for ATM, MRE11A and XRCC4 using our panel of 94 subjects and using the 45 Caucasian subjects from Applied Biosystems[30]. For these three genes, we observed the same number of LD groups containing precisely the same SNPs for both data sets. The difference between the results was in the number of potential tSNPs for each LD group. For the majority of LD groups, the potential tSNPs using Applied Biosystems data were a subset of those from our data. This is perhaps expected, because our sample size was more than double their size and is therefore capable of better resolution.

Table 3 Comparison of LD groups for the Utah breast cancer cases and controls with Applied Biosystems (ABI) data*

Table 4 lists the haplotypes, LD group designation, potential tSNPs, and tSNP selected per group for NBS1 and RAD50 using the Applied Biosystems' data. For NBS1, 6 haplotypes overall were observed and all 6 haplotypes had a frequency > 0.01. Using the PCA method, two LD groups were identified and accounted for 93.8% of the variance. Two tSNPs were sufficient to tag these groups (N1, N2). However, N5 could replace N2 with no reduction in the variance explained. For the RAD50 gene, in order to include two available rare SNPs in the analysis, we lowered the haplotype acceptance threshold to 0.009. We observed a total of 14 haplotypes, 10 with a frequency greater than 0.01. Using the PCA method, we identified three LD groups, which accounted for 91.5% of the variance. Similarly to MRE11A and XRCC4, the LD groups for RAD50 were not contiguous blocks. Three tSNPs were sufficient to tag the groups (R1, R3, and R10), although R5 could replace R1 and R6 could replace R3 with no loss of variance explained.

Table 4 Haplotypes with frequency>0.01, LD group characterization and tSNP selected using data from Applied Biosystems*

For ATM, MRE11A, and XRCC4, we compared haplotypes and LD structure between the breast cancer cases and controls. For ATM and XRCC4, no difference in the LD structure was observed when cases and controls were analyzed separately. For the MRE11A gene, differences in LD structure were noted; however, these were minor and likely attributable to small sample size, since the differences were driven by 3 rare haplotypes (frequency = 0.02).


Identification of the most informative markers to use in a large-scale association analysis for studies of complex disease, such as breast cancer, is critical to the success of the study. The key to this process is to select SNPs that are most informative about the underlying haplotype structure in a population of interest. As haplotype based designs have been suggested as being more powerful than the single-allele approach for association studies[8], a haplotype-based approach should result in more accurate and definitive findings. In this study, we have described haplotypes and characterized the LD structure of the ATM, MRE11A, and XRCC4 genes using a panel of 94 subjects, including breast cancer cases from high-risk breast cancer families as well as controls. Further, we identified tSNPs that can be used in future haplotype-based association studies. A similar analysis was performed for NBS1 and RAD50 using publicly available genotype data. We identified, using Principal Components Analysis[13], a single LD group for ATM, four noncontiguous LD groups for MRE11A, two LD groups for NBS1, three noncontiguous LD groups for RAD50, and four noncontiguous LD groups for XRCC4. In each case, the LD groups captured greater than 90% of the variance of the total SNPs available from Applied Biosystems across the gene. Furthermore for each gene, we present tSNPs that could be selected to represent the gene.

It is of interest that the LD structure for three of these five DNA repair genes did not conform to the haplotype block model; that is, the LD groups did not contain contiguous SNPs. This was true whether the genotyping data came from our own study or from Applied Biosystems. Although we did not directly sequence these genes to identify all possible variants, the discontinuity we observed illustrates that the underlying LD structure cannot conform to contiguous haplotype blocks. A more flexible LD group representation (as supported under principal components analysis) fit the data better and appears to be stable to differences in minor allele frequency. Similar findings of a complex pattern of LD structure were recently reported in a high-resolution study of the ELAC2 gene[15]. Our results suggest that when studying small genomic regions and low frequency variants (<0.2), mutation is an important dynamic in LD structure, and the simple recombination-only model used in classical haplotype block methods does not fit the data well and hence will lead to a poor selection of tSNPs.

Due to the stability of the results for ATM, MRE11A and XRCC4, we pursued two additional DNA repair genes of interest (i.e., NBS1 and RAD50). Applied Biosystems provides freely available genotyping data for four ethnically diverse populations of 45 subjects each. Therefore, even with limited funds, the haplotype structure and selection of tSNPs can be estimated for a study before any genotyping costs are incurred. However, caution must be used if this option is exercised, as one's population must be one of Applied Biosystems' ethnic cohorts (i.e., Caucasian, African American, Chinese, or Japanese), and our experience is that errors occasionally exist in the data.

Of the genes studied here, only ATM has previously been studied in any depth for LD structure. ATM has received so much attention because patients with the recessive disease ataxia-telangiectasia, which is due to a mutation in the ATM gene, have a 100-fold increased risk of cancer[33, 34], and obligate heterozygous carriers of ATM mutations may have an increased risk of cancer, particularly breast cancer[35–39], although this finding is controversial[40, 41]. Extensive LD across the ATM gene has previously been reported[42–44], and sequence analysis reveals that ATM polymorphisms are relatively rare, resulting in low overall sequence diversity[44]. Thus, it follows that only a small number of haplotypes have been found, particularly in Caucasian populations of European descent. Thorstenson et al[44] predicted seven haplotypes in populations throughout the world, only three of which were found in Europeans or the Americans. Bonnen et al[43] identified 22 unique haplotypes, seven of which occurred in Caucasians, and only five of these occurred at a frequency of greater than 5% among Caucasians. We observed five haplotypes for the ATM gene, but only two of these could be considered common haplotypes (>0.01) and together accounted for 96% of all chromosomes. A recently published study using the haplotypes defined by Thorstenson et al[44] and Bonnen et al[43] identified five haplotype tagging-SNPs that were necessary to capture all of these haplotypes with a frequency >1%[45]. In our study, which is limited to Applied Biosystems' validated SNPs, we found that one tSNP was sufficient to represent 98.8% of the total genetic variance for all the SNPs available. The results of our study most likely differed from these other studies because of differences in the minor allele frequency range of the SNPs utilized. Our minor allele frequency for the 14 SNPs studied in the ATM gene varied minimally, from 0.43 to 0.45. Thorstenson et al[44] and Bonnen et al[43] included 2 and 3 SNPs, respectively, that had minor allele frequencies <0.25. Population structure exists in SNP-allele frequencies[43] and, as shown by the results of this study, exclusion of rarer SNPs has an impact on the frequency of haplotypes that are observed.

Comparison of haplotype and LD structure between cases and controls for ATM, MRE11A, and XRCC4 indicated that LD structure for these genes was similar in both groups. Results for ATM and XRCC4 were identical, and only minor differences in LD structure were noted for MRE11A due to three rare haplotypes. A recent study has reported that rare haplotypes may be important for disease susceptibility, and in that study these rare haplotypes had significant effects on the phenotype of interest[46]. Therefore, if rare haplotypes are of interest to an investigator, it may be prudent to characterize LD in both cases and controls and select tSNPs that comprehensively cover the diversity of both groups. However, most studies to date have empirically found that LD structure is similar across phenotype[1, 47]. If major differences in LD structure were to exist, this would have a profound effect on guidelines for tSNP selection and for application of projects such as the HapMap[48, 49].

Some limitations are inherent in this study and must be pointed out. First, we did not sequence our genes of interest, and thus not all of the genetic diversity within these genetic regions may have been captured. Our results must be interpreted in light of this. The gold standard is to identify all variants within a gene and select a subset of tSNPs from this set. It would be interesting to evaluate the robustness of our findings using sequence data. However, the SNPs examined were relatively evenly spaced, on the order of 1 SNP every 10 kb, and our results are important as they illustrate how smaller budget studies can best select tSNPs. Second, our sample size was modest (188 chromosomes), although larger than other previous studies examining LD and tSNPs[26–29]. Finally, haplotype block and haplotype-tagging SNP analyses have been suggested to be reliable only when markers are dense; otherwise, marker sets have considerable loss of information[50]. This result may extend to PCA methods; however, the matrix decomposition algorithm used has been suggested to be stable with regard to varying levels of marker density[18].


In conclusion, we have described haplotypes and linkage disequilibrium structure, and identified tSNPs, from all available Applied Biosystems' validated SNPs in the ATM, MRE11A, NBS1, RAD50, and XRCC4 genes in a Caucasian population. As has been found for other genes, we identified LD structures that did not conform to contiguous haplotype block structures. This illustrates the importance of using flexible methods, such as matrix decomposition, that allow for multiple population dynamics such as recombination, mutation and selection. Although the gold standard for SNP characterization across a candidate gene is sequencing to identify all variants, we describe a low-budget means to characterize the LD structure and select tSNPs using publicly available data. Comprehensive characterization of the LD structure at genes of interest will be essential for future, effective association studies.

Electronic database information

The data from the 94 breast cancer case and control subjects for these tables is publicly available at under Supplemental Materials to Publication. On request from Dr. Nicola Camp a username and password to access the data will be given.


  1. Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES: High-resolution haplotype structure in the human genome. Nat Genet. 2001, 29: 229-232. 10.1038/ng1001-229.

  2. Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggart M, Liu-Cordero SN, Rotimi C, Adeyemo A, Cooper R, Ward R, Lander ES, Daly MJ, Altshuler D: The structure of haplotype blocks in the human genome. Science. 2002, 296: 2225-2229. 10.1126/science.1069424.

  3. Reich DE, Cargill M, Bolk S, Ireland J, Sabeti PC, Richter DJ, Lavery T, Kouyoumjian R, Farhadian SF, Ward R, Lander ES: Linkage disequilibrium in the human genome. Nature. 2001, 411: 199-204. 10.1038/35075590.

  4. Patil N, Berno AJ, Hinds DA, Barrett WA, Doshi JM, Hacker CR, Kautzer CR, Lee DH, Marjoribanks C, McDonough DP, Nguyen BT, Norris MC, Sheehan JB, Shen N, Stern D, Stokowski RP, Thomas DJ, Trulson MO, Vyas KR, Frazer KA, Fodor SP, Cox DR: Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science. 2001, 294: 1719-1723. 10.1126/science.1065573.

  5. Dawson E, Abecasis GR, Bumpstead S, Chen Y, Hunt S, Beare DM, Pabial J, Dibling T, Tinsley E, Kirby S, Carter D, Papaspyridonos M, Livingstone S, Ganske R, Lohmussaar E, Zernant J, Tonisson N, Remm M, Magi R, Puurand T, Vilo J, Kurg A, Rice K, Deloukas P, Mott R, Metspalu A, Bentley DR, Cardon LR, Dunham I: A first-generation linkage disequilibrium map of human chromosome 22. Nature. 2002, 418: 544-548. 10.1038/nature00864.

  6. Phillips MS, Lawrence R, Sachidanandam R, Morris AP, Balding DJ, Donaldson MA, Studebaker JF, Ankener WM, Alfisi SV, Kuo FS, Camisa AL, Pazorov V, Scott KE, Carey BJ, Faith J, Katari G, Bhatti HA, Cyr JM, Derohannessian V, Elosua C, Forman AM, Grecco NM, Hock CR, Kuebler JM, Lathrop JA, Mockler MA, Nachtman EP, Restine SL, Varde SA, Hozza MJ, Gelfand CA, Broxholme J, Abecasis GR, Boyce-Jacino MT, Cardon LR: Chromosome-wide distribution of haplotype blocks and the role of recombination hot spots. Nat Genet. 2003, 33: 382-387. 10.1038/ng1100.

  7. Ke X, Hunt S, Tapper W, Lawrence R, Stavrides G, Ghori J, Whittaker P, Collins A, Morris AP, Bentley D, Cardon LR, Deloukas P: The impact of SNP density on fine-scale patterns of linkage disequilibrium. Hum Mol Genet. 2004, 13: 577-588. 10.1093/hmg/ddh060.

  8. Johnson GC, Esposito L, Barratt BJ, Smith AN, Heward J, Di Genova G, Ueda H, Cordell HJ, Eaves IA, Dudbridge F, Twells RC, Payne F, Hughes W, Nutland S, Stevens H, Carr P, Tuomilehto-Wolf E, Tuomilehto J, Gough SC, Clayton DG, Todd JA: Haplotype tagging for the identification of common disease genes. Nat Genet. 2001, 29: 233-237. 10.1038/ng1001-233.

  9. Zhang K, Deng M, Chen T, Waterman MS, Sun F: A dynamic programming algorithm for haplotype block partitioning. Proc Natl Acad Sci U S A. 2002, 99: 7335-7339. 10.1073/pnas.102186799.

  10. Weale ME, Depondt C, Macdonald SJ, Smith A, Lai PS, Shorvon SD, Wood NW, Goldstein DB: Selection and evaluation of tagging SNPs in the neuronal-sodium-channel gene SCN1A: implications for linkage-disequilibrium gene mapping. Am J Hum Genet. 2003, 73: 551-565. 10.1086/378098.

  11. Ke X, Cardon LR: Efficient selective screening of haplotype tag SNPs. Bioinformatics. 2003, 19: 287-288. 10.1093/bioinformatics/19.2.287.

  12. Meng Z, Zaykin DV, Xu CF, Wagner M, Ehm MG: Selection of genetic markers for association analyses, using linkage disequilibrium and haplotypes. Am J Hum Genet. 2003, 73: 115-130. 10.1086/376561.

  13. Horne BD, Camp NJ: Principal component analysis for selection of optimal SNP-sets that capture intragenic genetic variation. Genet Epidemiol. 2004, 26: 11-21. 10.1002/gepi.10292.

  14. Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, Nickerson DA: Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet. 2004, 74: 106-120. 10.1086/381000.

  15. Camp NJ, Swensen J, Horne BD, Farnham JM, Thomas A, Cannon-Albright LA, Tavtigian SV: Characterization of linkage disequilibrium structure, mutation history, and tagging SNPs, and their use in association analyses: ELAC2 and familial early-onset prostate cancer. Genet Epidemiol. 2004, 28: 232-243. 10.1002/gepi.20054.

  16. Lin Z, Altman RB: Finding haplotype tagging SNPs by use of principal components analysis. Am J Hum Genet. 2004, 75: 850-861. 10.1086/425587.

  17. Nyholt DR: A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. Am J Hum Genet. 2004, 74: 765-769. 10.1086/383251.

  18. Belmont J, Yu F, Hardenbol P, Lu X, Moorhead M, Scott G, Ghose S, Pasternak S, Willis T, Faham M, Leal SM, Taylor J, Morris R, Kaplan N, Gibbs RA: High Density SNP Map Reveals Interrupted and Interlaced Organization of Linkage Disequilibrium Among Markers. Toronto, Ontario, Canada. 2004.

  19. Hanahan D, Weinberg RA: The hallmarks of cancer. Cell. 2000, 100: 57-70. 10.1016/S0092-8674(00)81683-9.

  20. Magnaldo T, Sarasin A: Xeroderma pigmentosum: from symptoms and genetics to gene-based skin therapy. Cells Tissues Organs. 2004, 177: 189-198. 10.1159/000079993.

  21. Chung DC, Rustgi AK: The hereditary nonpolyposis colorectal cancer syndrome: genetics and clinical implications. Ann Intern Med. 2003, 138: 560-570.

  22. Khanna KK, Jackson SP: DNA double-strand breaks: signaling, repair and the cancer connection. Nat Genet. 2001, 27: 247-254. 10.1038/85798.

  23. Skolnick M, Bean L, Dintelman SM, Mineau G: A computerized family history data base system. Sociol Soc Res. 1979, 63: 506-

  24. Skolnick M: The Utah genealogical database: A resource for genetic epidemiology. Banbury Report No 4: Cancer Incidence in Defined Populations. Edited by: Skolnick M. 1980, Cold Spring Harbor, NY, Cold Spring Harbor Laboratory Press

  25. Pedigree and Population Resource: Utah Population Database. []

  26. Park BL, Kim LH, Shin HD, Park YW, Uhm WS, Bae SC: Association analyses of DNA methyltransferase-1 (DNMT1) polymorphisms with systemic lupus erythematosus. J Hum Genet. 2004, 49: 642-646. 10.1007/s10038-004-0192-x.

  27. Fullerton SM, Buchanan AV, Sonpar VA, Taylor SL, Smith JD, Carlson CS, Salomaa V, Stengard JH, Boerwinkle E, Clark AG, Nickerson DA, Weiss KM: The effects of scale: variation in the APOA1/C3/A4/A5 gene cluster. Hum Genet. 2004, 115: 36-56. 10.1007/s00439-004-1106-x.

  28. Setiawan VW, Hankinson SE, Colditz GA, Hunter DJ, De Vivo I: HSD17B1 gene polymorphisms and risk of endometrial and breast cancer. Cancer Epidemiol Biomarkers Prev. 2004, 13: 213-219.

  29. Kosoy R, Yokoi N, Seino S, Concannon P: Polymorphic variation in the CBLB gene in human type 1 diabetes. Genes Immun. 2004, 5: 232-235. 10.1038/sj.gene.6364057.

  30. Applied Biosystems. []

  31. Lee LG, Connell CR, Bloch W: Allelic discrimination by nick-translation PCR with fluorogenic probes. Nucleic Acids Res. 1993, 21: 3761-3766.

  32. Clayton DG: SNPHAP. []

  33. Morrell D, Cromartie E, Swift M: Mortality and cancer incidence in 263 patients with ataxia-telangiectasia. J Natl Cancer Inst. 1986, 77: 89-92.

  34. Swift M, Morrell D, Massey RB, Chase CL: Incidence of cancer in 161 families affected by ataxia-telangiectasia. N Engl J Med. 1991, 325: 1831-1836.

  35. Morrell D, Chase CL, Swift M: Cancers in 44 families with ataxia-telangiectasia. Cancer Genet Cytogenet. 1990, 50: 119-123. 10.1016/0165-4608(90)90245-6.

  36. Broeks A, Urbanus JH, Floore AN, Dahler EC, Klijn JG, Rutgers EJ, Devilee P, Russell NS, van Leeuwen FE, van 't Veer LJ: ATM-heterozygous germline mutations contribute to breast cancer-susceptibility. Am J Hum Genet. 2000, 66: 494-500. 10.1086/302746.

  37. Chenevix-Trench G, Spurdle AB, Gatei M, Kelly H, Marsh A, Chen X, Donn K, Cummings M, Nyholt D, Jenkins MA, Scott C, Pupo GM, Dork T, Bendix R, Kirk J, Tucker K, McCredie MR, Hopper JL, Sambrook J, Mann GJ, Khanna KK: Dominant negative ATM mutations in breast cancer families. J Natl Cancer Inst. 2002, 94: 205-215.

  38. Thorstenson YR, Roxas A, Kroiss R, Jenkins MA, Yu KM, Bachrich T, Muhr D, Wayne TL, Chu G, Davis RW, Wagner TM, Oefner PJ: Contributions of ATM mutations to familial breast and ovarian cancer. Cancer Res. 2003, 63: 3325-3333.

  39. Izatt L, Greenman J, Hodgson S, Ellis D, Watts S, Scott G, Jacobs C, Liebmann R, Zvelebil MJ, Mathew C, Solomon E: Identification of germline missense mutations and rare allelic variants in the ATM gene in early-onset breast cancer. Genes Chromosomes Cancer. 1999, 26: 286-294. 10.1002/(SICI)1098-2264(199912)26:4<286::AID-GCC2>3.0.CO;2-X.

  40. FitzGerald MG, Bean JM, Hegde SR, Unsal H, MacDonald DJ, Harkin DP, Finkelstein DM, Isselbacher KJ, Haber DA: Heterozygous ATM mutations do not contribute to early onset of breast cancer. Nat Genet. 1997, 15: 307-310. 10.1038/ng0397-307.

  41. Olsen JH, Hahnemann JM, Borresen-Dale AL, Brondum-Nielsen K, Hammarstrom L, Kleinerman R, Kaariainen H, Lonnqvist T, Sankila R, Seersholm N, Tretli S, Yuen J, Boice JDJ, Tucker M: Cancer in patients with ataxia-telangiectasia and in their relatives in the Nordic countries. J Natl Cancer Inst. 2001, 93: 121-127. 10.1093/jnci/93.2.121.

  42. Li A, Huang Y, Swift M: Neutral sequence variants and haplotypes at the 150 Kb ataxia-telangiectasia locus. Am J Med Genet. 1999, 86: 140-144. 10.1002/(SICI)1096-8628(19990910)86:2<140::AID-AJMG10>3.0.CO;2-X.

  43. Bonnen PE, Story MD, Ashorn CL, Buchholz TA, Weil MM, Nelson DL: Haplotypes at ATM identify coding-sequence variation and indicate a region of extensive linkage disequilibrium. Am J Hum Genet. 2000, 67: 1437-1451. 10.1086/316908.

  44. Thorstenson YR, Shen P, Tusher VG, Wayne TL, Davis RW, Chu G, Oefner PJ: Global analysis of ATM polymorphism reveals significant functional constraint. Am J Hum Genet. 2001, 69: 396-412. 10.1086/321296.

  45. Tamimi RM, Hankinson SE, Spiegelman D, Kraft P, Colditz GA, Hunter DJ: Common ataxia telangiectasia mutated haplotypes and risk of breast cancer: a nested case-control study. Breast Cancer Res. 2004, 6: R416-22. 10.1186/bcr809.

  46. Liu PY, Zhang YY, Lu Y, Long JR, Shen H, Zhao LJ, Xu FH, Xiao P, Xiong DH, Liu YJ, Recker RR, Deng HW: A survey of haplotype variants at several disease candidate genes: the importance of rare variants for complex diseases. J Med Genet. 2005, 42: 221-227. 10.1136/jmg.2004.024752.

  47. Thompson D, Stram D, Goldgar D, Witte JS: Haplotype tagging single nucleotide polymorphisms and association studies. Hum Hered. 2003, 56: 48-55. 10.1159/000073732.

  48. The International HapMap Project. Nature. 2003, 426: 789-796. 10.1038/nature02168.

  49. HapMap.

  50. Iles MM: The effect of SNP marker density on the efficacy of haplotype tagging SNPs--a warning. Ann Hum Genet. 2005, 69: 209-215. 10.1046/j.1529-8817.2004.00141.x.

Kristina Allen-Brady is an NLM fellow, supported by NLM grant T15 LM0724. This research was supported by a dissertation research grant from the Susan G. Komen Breast Cancer Foundation for Kristina Allen-Brady (DISS0201521, to NJC) and by NIH NCI grant CA 098364 (to NJC). We thank Kim Nguyen (Genetic Epidemiology) and Michael Hoffman (Family and Preventive Medicine) for their help in the laboratory. We also thank Helaman Escobar (Director of Sequencing and Genomics) and Michael Klein (Genomics) from the Core Resource Facilities, University of Utah, for use of their equipment and assistance on this project. Data collection for this publication was assisted by the Utah Cancer Registry, supported by National Institutes of Health Contract NO1-PC-35141, Surveillance, Epidemiology and End Results (SEER) Program, with additional support from the Utah Department of Health and the University of Utah. Partial support for all datasets within the Utah Population Database (UPDB) was provided by the University of Utah Huntsman Cancer Institute.

Author information

Corresponding author

Correspondence to Kristina Allen-Brady.

Additional information

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

KAB assisted in the study design, performed the genotyping, and drafted the manuscript. NJC conceived of the study and its design and helped to draft the manuscript. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Allen-Brady, K., Camp, N.J. Characterization of the linkage disequilibrium structure and identification of tagging-SNPs in five DNA repair genes. BMC Cancer 5, 99 (2005).
