A procedure for the detection of linkage with high density SNP arrays in a large pedigree with colorectal cancer

Middeldorp, Anneke; Jagmohan-Changur, Shantie; Helmer, Quinta; van der Klift, Heleen M; Tops, Carli MJ; Vasen, Hans FA; Devilee, Peter; Morreau, Hans; Houwing-Duistermaat, Jeanine J; Wijnen, Juul T; van Wezel, Tom

doi:10.1186/1471-2407-7-6

Research article
Open access
Published: 12 January 2007

A procedure for the detection of linkage with high density SNP arrays in a large pedigree with colorectal cancer

Anneke Middeldorp¹,
Shantie Jagmohan-Changur²,
Quinta Helmer³,
Heleen M van der Klift⁴,
Carli MJ Tops⁴,
Hans FA Vasen⁵,
Peter Devilee²,
Hans Morreau¹,
Jeanine J Houwing-Duistermaat³,
Juul T Wijnen^2,4 &
…
Tom van Wezel¹

BMC Cancer volume 7, Article number: 6 (2007) Cite this article

5636 Accesses
5 Citations
Metrics details

Abstract

Background

The apparent dominant model of colorectal cancer (CRC) inheritance in several large families, without mutations in known CRC susceptibility genes, suggests the presence of so far unidentified genes with strong or moderate effect on the development of CRC. Linkage analysis could lead to identification of susceptibility genes in such families. In comparison to classical linkage analysis with multi-allelic markers, single nucleotide polymorphism (SNP) arrays have increased information content and can be processed with higher throughput. Therefore, SNP arrays can be excellent tools for linkage analysis. However, the vast number of SNPs on the SNP arrays, combined with large informative pedigrees (e.g. >35–40 bits), presents us with a computational complexity that is challenging for existing statistical packages or even exceeds their capacity. We therefore setup a procedure for linkage analysis in large pedigrees and validated the method by genotyping using SNP arrays of a colorectal cancer family with a known MLH1 germ line mutation.

Methods

Quality control of the genotype data was performed in Alohomora, Mega2 and SimWalk2, with removal of uninformative SNPs, Mendelian inconsistencies and Mendelian consistent errors, respectively. Linkage disequilibrium was measured by SNPLINK and Merlin. Parametric linkage analysis using two flanking markers was performed using MENDEL. For multipoint parametric linkage analysis and haplotype analysis, SimWalk2 was used.

Results

On chromosome 3, in the MLH1-region, a LOD score of 1.9 was found by parametric linkage analysis using two flanking markers. On chromosome 11 a small region with LOD 1.1 was also detected. Upon linkage disequilibrium removal, multipoint linkage analysis yielded a LOD score of 2.1 in the MLH1 region, whereas the LOD score dropped to negative values in the region on chromosome 11. Subsequent haplotype analysis in the MLH1 region perfectly matched the mutation status of the family members.

Conclusion

We developed a workflow for linkage analysis in large families using high-density SNP arrays and validated this workflow in a family with colorectal cancer. Linkage disequilibrium has to be removed when using SNP arrays, because it can falsely inflate the LOD score. Haplotype analysis is adequate and can predict the carrier status of the family members.

Peer Review reports

Background

Colorectal cancer (CRC) is the one of the most common malignancies in the Western world. Already in 1913, familial aggregation of CRC was described by Warthin [1] and later Lynch et al. described an additional family with clustering of colorectal and endometrial cancer [2]. Clinical definition of Lynch syndrome, or HNPCC, in 1991 [3, 4] was instrumental for linkage analysis, and ultimately for the identification of the underlying gene defects in HNPCC families. The first HNPCC loci were mapped to chromosomes 2 and 3 using microsatellite markers [5, 6]. This eventually led to the identification of germ line mutations in MSH2 [7] and MLH1 [8], respectively. Later, PMS2 [9], MSH6 [10, 11] and recently MutYH [12] were identified as CRC susceptibility genes. However, the so far identified CRC susceptibility genes can only explain up to 5% of all cases [13], while in ~35% of all colorectal cancer cases familial clustering is seen [14]. Furthermore, it is shown that first degree relatives of patients with colorectal cancer have a relative risk of 2.3 to develop the disease [15]. This indicates that still some genes with strong or moderate effect on CRC development remain to be identified. In order to identify these genes, linkage analysis in families could point to the loci where unknown susceptibility genes may reside. Indeed, different linkage analysis studies revealed potentially interesting regions on chromosomes 3q, 9q, 11q, 14q, 15q and 22q [16–20].

Families with a clustering of colorectal cancer but without germ line mutations in CRC genes have been under surveillance in Leiden since the 1980s. Due to the long period of follow-up, with three to four affected generations, these Dutch HNPCC-like families have become informative for linkage analysis.

Traditionally, linkage analysis is performed with multi-allelic microsatellite markers. Recently, however, the more advanced single nucleotide polymorphism (SNP) arrays were brought into use for linkage analysis. It was shown that the information content of a dense SNP map is significantly and uniformly higher than that of a genome wide microsatellite marker map [21]. Several studies conducting linkage analysis on genotype data from SNP arrays appeared in recent years [22–24]. In these studies non-parametric as well as parametric linkage analysis was performed in sib pairs or in small to moderate size pedigrees. However, to date, no studies have been published on linkage analysis using SNPs in large pedigrees (e.g. >35–40 bits).

Studying large families with thousands of SNPs results in a computational complex analysis that is challenging for existing statistical packages and that may even exceed their capacity. Current linkage analysis programs can handle either large pedigrees or large numbers of markers, depending on the underlying algorithm. In order to perform linkage analysis in large pedigrees using SNP arrays, we explored the possibilities of currently available linkage analysis software. Most currently available programs are based on the Lander-Green or the Elston-Stewart algorithm or both. The computation time of the former algorithm increases exponentially with the number of bits (2n - f, where 'n' is the number of non-founders and 'f' the number of founders) in a pedigree, whereas the latter scales exponentially with the number of markers. To perform multipoint linkage analysis in a large family with SNP arrays in one run would probably take several months computation time, if at all possible.

Several programs are suitable for linkage analysis with bi-allelic markers. Genehunter and Merlin can handle a relative large numbers of markers, however the analysis is restricted to pedigrees of up to ~30-bits [25, 26]. Both programs are based on the Lander-Green Hidden Markov Model algorithm and can perform non-parametric as well as parametric linkage analysis. In Genehunter, the Elston-Stewart algorithm is also implemented, allowing the performance of simultaneous analysis of several markers as well as analysis of pedigrees of moderate size. A third program based on the Lander-Green Hidden Markov Model algorithm is Allegro 2. This program can handle large pedigrees (up to ~40 bits), although the computational costs increase substantially when not all genotype information of the family is available [27, 28]. Allegro calculates parametric LOD scores as well as NPL scores and allele-sharing LOD scores. Another program, SNPLINK [28, 29] can perform automated linkage analysis with LD removal using either Allegro or Merlin. However, for all the above mentioned programs the different branches of large families (i.e. >35–40 bits) need to be analyzed separately. This will lead to substantial loss of information and potential undetected linkage.

MENDEL [30] is a program that is suitable for linkage analysis with SNPs in large pedigrees. It allows adjusting the maximum number of meioses, though the computation time will increase in that case. Both parametric and non-parametric linkage analysis can be performed in MENDEL. The program will either use the Lander-Green or the Elston-Stewart algorithm, depending on whichever is more efficient for the pedigree. SimWalk2 is a program that can perform multipoint parametric linkage analysis, haplotype analysis and a few other analyses in large pedigrees using bi-allelic markers. It uses Markov chain Monte Carlo methods to compute the likelihood [31]. Simwalk2 uses the MENDEL program for computing location scores. With the aim to detect linkage in CRC families exceeding 40 bits we established a procedure using freely available software packages and validated this in a large colorectal cancer family, with a known causal MLH1 germ line mutation on chromosome 3.

Methods

Patients

A large colorectal cancer family (Figure 1) with a recently identified mutation in the MLH1 gene (c.1046dupT, p.Pro350fs) was studied. Nine family members are affected with colorectal cancer. Another two family members are affected with polyps and three cases with skin cancer (non-specified) and one case with endometrium cancer (non-specified) are seen as well. Peripheral blood lymphocytes were collected from the family members. DNA was extracted using standard procedures. A total of thirteen family members were genotyped on Affymetrix GeneChip Human Mapping 10K 2.0 SNP arrays. The arrays were processed according to the instructions of the manufacturer. The mean SNP call rate was 96.3% (89.0%-98.5%).

The study was approved by the Medical Ethical Committee of the LUMC (protocol P01-019).

Workflow

We processed the data according to the following workflow: 1) First, the genotype data were generated by GeneChip DNA Analysis Software (GDAS) from Affymetrix. 2) These genotype data were combined with the pedigree and the marker information in Alohomora. 3) In this program the uninformative SNPs were removed as well. 4) To be able to perform linkage analysis in the desired program, the output files (in Merlin-format) of Alohomora were by Mega2 converted to the proper format. 5) Mega2 also removed the Mendelian inconsistent errors. 6) The files were then ready to perform parametric linkage analysis using 2 flanking markers in MENDEL; affected-only analysis as well as parametric linkage analysis using liability classes was performed. 7) Based on the second analysis, regions of interest were defined that were further tested for Mendelian consistent errors and 8) possible linkage disequilibrium was removed in SNPLINK. 9) Multipoint parametric linkage analysis using the liability classes was then performed in Simwalk2 for the ROIs and 10) finally, the haplotypes were inferred in Simwalk2.

Data formatting and quality control

Genotype data of the individual family members were generated using GeneChip DNA Analysis Software (GDAS) from Affymetrix. In the Alohomora program [32] the pedigree information, allele frequencies and map position of the SNPs were combined with the genotype data generated by GDAS. The uninformative SNPs in this pedigree, that show either only A alleles and No Calls or only B alleles and No Calls, were removed from further analysis by Alohomora. The data files were exported in Merlin format. Subsequently, in Mega2 [33] these Alohomora files were converted into the appropriate format for the programs used for linkage analysis, i.e. either the Mendel 5 format or the SimWalk2 format. Mendelian inconsistent errors were removed from analysis with Mega2 by setting all genotypes of these SNPs to unknown.

Mendelian consistent errors

Mendelian consistent errors were identified by mistyping analysis. Since this analysis is computationally complex and therefore time consuming (2 1/4 hours for 35 SNPs), only the regions of interest were analyzed for Mendelian consistent errors. All chromosomal regions with LOD scores exceeding 1 and lacking negative LOD scores were defined as regions of interest (ROI). SimWalk2 [31] was used to check all ROI for Mendelian consistent errors by performing mistyping analysis. An error model with a uniform error rate for all mistypings was used. The overall rate of mistyping was set at 0.004 [34, 35]. The threshold for the posterior probability of mistyping was set at 0.5 [36].

Linkage disequilibrium estimation

In the ROI the pair-wise correlation coefficient r², as a measure of linkage disequilibrium (LD) between adjacent SNPs, was estimated using SNPLINK [29] and Merlin [26]. Since we are only interested in estimates of r², we split the large family into nuclear families. In addition to the family under study, genotypes from 12 Dutch nuclear families from other studies (unpublished results) were used to calculate LD. The program SNPLINK provides a list of SNPs to be removed. We used as cut off value for LD removal an r² ≥ 0.4. The information content was computed before and after removal of the SNPs using Merlin.

Linkage analysis

To determine the power to detect linkage in the MLH1 family, we performed a simulation study using Simlink [37] under the assumption of a dominant trait with a piecewise linear penetrance. Subsequently, we performed an affected-only linkage analysis and modeled a dominant trait with an allele frequency of 0.001. For parametric linkage analysis, the proper assignment of affected status to family members is crucial since, due to the surveillance of the families, adenomas will be detected and removed before they can develop into a carcinoma. Additionally, the risk of cancer increases with age. And the risk of developing an adenoma is different from the risk of developing a carcinoma. To adjust for these phenomena, we defined 10 liability classes: four classes were defined with different penetrances for colorectal cancer; four classes for polyp carriers and two more liability classes for spouses, that carry a population risk of developing polyps or colorectal cancer and one for the family members of which the disease status is not known. These liability classes are based on the incidences of CRC and adenomas in the members of HNPCC families in the Netherlands, that do not carry the disease causing mutation [38].

In MENDEL [30], an affected-only parametric linkage analysis was performed using two flanking markers (computation time: ~20 sec per chromosome). In this analysis only family members with colorectal cancer were defined as affected and all other persons were set to unknown. Parametric linkage analysis with liability classes was performed thereafter, using two flanking markers (computation time: ~20 sec per chromosome). Cancers other than colorectal cancer were not considered to be part of the syndrome. In the ROIs appearing from this linkage analysis, possible Mendelian consistent errors were removed as well as the possible presence of linkage disequilibrium. Subsequently, multipoint parametric linkage analysis was performed in SimWalk2 [31], using the ten liability classes. In this multipoint analysis no more than 30 SNPs were analyzed, limited by the computational complexity (analysis time: 1 3/4 hours for 30 SNPs).

Haplotype analysis

Haplotype analysis was performed in the ROI, using SimWalk2. All SNPs in the region of interest (~18) were included in this analysis (computation time: 1 1/3 hours for 18 SNPs). The results of the haplotyping were visualized in HaploPainter [39]. The haplotype segregation in the family could then be compared to the segregation of the mutation in MLH1 in this family.

Results and discussion

Linkage analysis using bi-allelic genotype data from SNP arrays and large families is a computational challenge using commonly used, freely available analysis software. For the different steps of the linkage analysis; e.g. data formatting, detection of Mendelian inconsistencies, mistyping analysis, LD removal and single to multipoint linkage analysis, we have chosen the following programs that can handle large pedigrees and many SNPs where required; Alohomora [32], Mega2 [33], MENDEL [30], SNPLINK [29] and SimWalk2 [31].

In advance of the linkage analysis we performed a simulation study to calculate the power using Simlink. The mean LOD score in 1000 simulations in this family was 2.0.

The Alohomora program [32] was used first to combine the genotype data generated with the SNP arrays, and the pedigree and SNP information and secondly, to convert these data into the appropriate format for further analysis. In addition, 1256 of the 10053 SNPs were uninformative and were therefore removed from analysis by Alohomora.

Since errors in genotyping can easily mask linkage, the data were checked for different types of errors. First, we have estimated the genotyping error rate in five duplicate experiments. The mean genotyping error rate between the duplicates was only 0.0051.

Mega2 was then used for several data validation checks, including errors in the pedigree data or Mendelian inconsistent errors. Mega2 was used since it supports 28 different programs, including the programs MENDEL and SimWalk2, which we have used for linkage analysis and haplotype analysis. The genotypes of 18 SNPs (0.21%) were removed from analysis, because of Mendelian inconsistencies. However, with bi-allelic markers not all errors appear as Mendelian inconsistent errors [40]. The data were therefore also checked for Mendelian consistent errors. Because of the computational complexity of these multipoint analyses, this error check was performed only in the regions of interest. The mistyping analysis option in SimWalk2 was used, since this program can handle such a complex analysis in a large pedigree. No Mendelian consistent errors were identified in the ROI.

Affected-only parametric linkage analysis and parametric linkage analysis using liability classes was performed in MENDEL, using two flanking markers. This analysis showed a maximum LOD score of 1.8 in the affected-only analysis and 1.9 using liability classes for a 1.7 Mb region around the MLH1 gene on chromosome 3 (Figure 2). A second region with a LOD 1.1 was found, both in the affected-only analysis and using liability classes, near the centromere on chromosome 11.

Current linkage analysis programs assume LD between markers and a disease locus and importantly, linkage equilibrium between markers. The presence of linkage disequilibrium between two markers can falsely inflate the LOD score and missing genotypes can increase this effect. Therefore, the r² as a measure of LD was computed in Merlin and SNPLINK. Using the threshold r² ≥ 0.4, 5 of the 27 SNPs in the region on chromosome 3 were removed from the analysis. From the region of interest on chromosome 11, 14 of the 30 SNPs with an r² ≥ 0.4 were removed from the analysis. After LD removal, multipoint linkage analysis in the region on chromosome 3 yielded a LOD score of 2.1, whereas on chromosome 11 negative LOD scores were seen by multipoint linkage analysis after LD was removed. This indicates that the strong LD in the region on chromosome 11 was responsible for the peak in the LOD in that region. On both chromosomes, the removal of SNPs with high LD had no significant effect on the information content (not shown).

We inferred the haplotypes of the family members, using SimWalk2 for the linkage region on chromosome 3. All known affected MLH1-mutation carriers share the same haplotype, as well as the affected obligate carriers. Therefore, this haplotype perfectly co-segregates with the clinical phenotype of the family members (Figure 1). Case 23, who had developed polyps at age 60, does not share this haplotype. Subsequent mutation analysis showed that this individual indeed did not carry the disease causing mutation in MLH1. Therefore, this case showed to be a phenocopy. Another family member, case 39, has to date not developed clinical symptoms of HNPCC, although he did inherit the disease causing allele according to the haplotype analysis. Indeed, sequence analysis showed that this person carries the mutation.

Conclusion

In conclusion, we show that we can perform linkage analysis with high-density 10K SNP arrays in large families for which not all members could be genotyped. We developed a workflow with different publicly available software to perform the analyses: removal of Mendelian consistent and Mendelian inconsistent errors, two and multipoint parametric linkage analysis, removal of linkage disequilibrium and haplotype analysis. The procedure was validated in a large CRC family carrying a known germ line mutation in MLH1. Linkage was found with the MLH1 gene and subsequent haplotype analysis corresponds to the mutation status of the family members. This procedure can now be used for linkage analysis of large families with an inherited condition, such as hereditary colorectal cancer.

Abbreviations

CRC:: colorectal cancer
SNP:: single nucleotide polymorphism
LD:: linkage disequilibrium
LOD:: log of odds
HNPCC:: hereditary nonpolyposis colorectal cancer
GDAS:: GeneChip DNA Analysis Software
ROI:: regions of interest

References

Warthin AS: Heredity with reference to carcinoma. Arch Intern Med. 1913, 9: 546-555.
Article Google Scholar
Lynch HT, Shaw MW, Magnuson CW, Larsen AL, Krush AJ: Hereditary factors in cancer. Study of two large midwestern kindreds. Arch Intern Med. 1966, 117: 206-212. 10.1001/archinte.117.2.206.
Article CAS PubMed Google Scholar
Vasen HF, Mecklin JP, Khan PM, Lynch HT: The International Collaborative Group on Hereditary Non-Polyposis Colorectal Cancer (ICG-HNPCC). Dis Colon Rectum. 1991, 34: 424-425. 10.1007/BF02053699.
Article CAS PubMed Google Scholar
Vasen HF, Watson P, Mecklin JP, Lynch HT: New clinical criteria for hereditary nonpolyposis colorectal cancer (HNPCC, Lynch syndrome) proposed by the International Collaborative group on HNPCC. Gastroenterology. 1999, 116: 1453-1456. 10.1016/S0016-5085(99)70510-X.
Article CAS PubMed Google Scholar
Peltomaki P, Aaltonen LA, Sistonen P, Pylkkanen L, Mecklin JP, Jarvinen H, Green JS, Jass JR, Weber JL, Leach FS, .: Genetic mapping of a locus predisposing to human colorectal cancer. Science. 1993, 260: 810-812. 10.1126/science.8484120.
Article CAS PubMed Google Scholar
Lindblom A, Tannergard P, Werelius B, Nordenskjold M: Genetic mapping of a second locus predisposing to hereditary non-polyposis colon cancer. Nat Genet. 1993, 5: 279-282. 10.1038/ng1193-279.
Article CAS PubMed Google Scholar
Fishel R, Lescoe MK, Rao MR, Copeland NG, Jenkins NA, Garber J, Kane M, Kolodner R: The human mutator gene homolog MSH2 and its association with hereditary nonpolyposis colon cancer. Cell. 1993, 75: 1027-1038. 10.1016/0092-8674(93)90546-3.
Article CAS PubMed Google Scholar
Bronner CE, Baker SM, Morrison PT, Warren G, Smith LG, Lescoe MK, Kane M, Earabino C, Lipford J, Lindblom A, .: Mutation in the DNA mismatch repair gene homologue hMLH1 is associated with hereditary non-polyposis colon cancer. Nature. 1994, 368: 258-261. 10.1038/368258a0.
Article CAS PubMed Google Scholar
Nicolaides NC, Papadopoulos N, Liu B, Wei YF, Carter KC, Ruben SM, Rosen CA, Haseltine WA, Fleischmann RD, Fraser CM: Mutations of two PMS homologues in hereditary nonpolyposis colon cancer. Nature. 1994, 371: 75-80. 10.1038/371075a0.
Article CAS PubMed Google Scholar
Akiyama Y, Sato H, Yamada T, Nagasaki H, Tsuchiya A, Abe R, Yuasa Y: Germ-line mutation of the hMSH6/GTBP gene in an atypical hereditary nonpolyposis colorectal cancer kindred. Cancer Res. 1997, 57: 3920-3923.
CAS PubMed Google Scholar
Miyaki M, Konishi M, Tanaka K, Kikuchi-Yanoshita R, Muraoka M, Yasuno M, Igari T, Koike M, Chiba M, Mori T: Germline mutation of MSH6 as the cause of hereditary nonpolyposis colorectal cancer. Nat Genet. 1997, 17: 271-272. 10.1038/ng1197-271.
Article CAS PubMed Google Scholar
Al Tassan N, Chmiel NH, Maynard J, Fleming N, Livingston AL, Williams GT, Hodges AK, Davies DR, David SS, Sampson JR, Cheadle JP: Inherited variants of MYH associated with somatic G:C-->T:A mutations in colorectal tumors. Nat Genet. 2002, 30: 227-232. 10.1038/ng828.
Article CAS PubMed Google Scholar
Kinzler KW, Vogelstein B: Lessons from hereditary colorectal cancer. Cell. 1996, 87: 159-170. 10.1016/S0092-8674(00)81333-1.
Article CAS PubMed Google Scholar
Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, Pukkala E, Skytthe A, Hemminki K: Environmental and heritable factors in the causation of cancer--analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med. 2000, 343: 78-85. 10.1056/NEJM200007133430201.
Article CAS PubMed Google Scholar
Peto J, Houlston RS: Genetics and the common cancers. Eur J Cancer. 2001, 37 Suppl 8: S88-S96. 10.1016/S0959-8049(01)00255-6.
Article CAS PubMed Google Scholar
Djureinovic T, Skoglund J, Vandrovcova J, Zhou X, Kalushkova A, Iselius L, Lindblom A: A genome-wide linkage analysis in Swedish families with hereditary non-FAP/non-HNPCC colorectal cancer. Gut. 2005
Google Scholar
Kemp Z, Carvajal-Carmona L, Spain S, Barclay E, Gorman M, Martin L, Jaeger E, Brooks N, Bishop DT, Thomas H, Tomlinson I, Papaemmanuil E, Webb E, Sellick GS, Wood W, Evans G, Lucassen A, Maher ER, Houlston RS: Evidence for a colorectal cancer susceptibility locus on chromosome 3q21-q24 from a high-density SNP genome-wide linkage scan. Hum Mol Genet. 2006
Google Scholar
Kemp ZE, Carvajal-Carmona LG, Barclay E, Gorman M, Martin L, Wood W, Rowan A, Donohue C, Spain S, Jaeger E, Evans DG, Maher ER, Bishop T, Thomas H, Houlston R, Tomlinson I: Evidence of linkage to chromosome 9q22.33 in colorectal cancer kindreds from the United kingdom. Cancer Res. 2006, 66: 5003-5006. 10.1158/0008-5472.CAN-05-4074.
Article CAS PubMed Google Scholar
Tomlinson I, Rahman N, Frayling I, Mangion J, Barfoot R, Hamoudi R, Seal S, Northover J, Thomas HJ, Neale K, Hodgson S, Talbot I, Houlston R, Stratton MR: Inherited susceptibility to colorectal adenomas and carcinomas: evidence for a new predisposition gene on 15q14-q22. Gastroenterology. 1999, 116: 789-795. 10.1016/S0016-5085(99)70061-2.
Article CAS PubMed Google Scholar
Wiesner GL, Daley D, Lewis S, Ticknor C, Platzer P, Lutterbaugh J, MacMillen M, Baliner B, Willis J, Elston RC, Markowitz SD: A subset of familial colorectal neoplasia kindreds linked to chromosome 9q22.2-31.2. Proc Natl Acad Sci U S A. 2003, 100: 12961-12965. 10.1073/pnas.2132286100.
Article CAS PubMed PubMed Central Google Scholar
John S, Shephard N, Liu G, Zeggini E, Cao M, Chen W, Vasavda N, Mills T, Barton A, Hinks A, Eyre S, Jones KW, Ollier W, Silman A, Gibson N, Worthington J, Kennedy GC: Whole-genome scan, in a complex disease, using 11,245 single-nucleotide polymorphisms: comparison with microsatellites. Am J Hum Genet. 2004, 75: 54-64. 10.1086/422195.
Article CAS PubMed PubMed Central Google Scholar
Arinami T, Ohtsuki T, Ishiguro H, Ujike H, Tanaka Y, Morita Y, Mineta M, Takeichi M, Yamada S, Imamura A, Ohara K, Shibuya H, Ohara K, Suzuki Y, Muratake T, Kaneko N, Someya T, Inada T, Yoshikawa T, Toyota T, Yamada K, Kojima T, Takahashi S, Osamu O, Shinkai T, Nakamura M, Fukuzako H, Hashiguchi T, Niwa SI, Ueno T, Tachikawa H, Hori T, Asada T, Nanko S, Kunugi H, Hashimoto R, Ozaki N, Iwata N, Harano M, Arai H, Ohnuma T, Kusumi I, Koyama T, Yoneda H, Fukumaki Y, Shibata H, Kaneko S, Higuchi H, Yasui-Furukori N, Numachi Y, Itokawa M, Okazaki Y: Genomewide high-density SNP linkage analysis of 236 Japanese families supports the existence of schizophrenia susceptibility Loci on chromosomes 1p, 14q, and 20p. Am J Hum Genet. 2005, 77: 937-944. 10.1086/498122.
Article CAS PubMed PubMed Central Google Scholar
Sellick GS, Coleman RJ, Webb EL, Chow J, Bevan S, Rosbotham JL, Houlston RS: Dominantly inherited cutaneous small-vessel lymphocytic vasculitis maps to chromosome 6q26-q27. Hum Genet. 2005, 118: 82-6. 10.1007/s00439-005-0022-z.
Article PubMed Google Scholar
Sellick GS, Webb EL, Allinson R, Matutes E, Dyer MJ, Jonsson V, Langerak AW, Mauro FR, Fuller S, Wiley J, Lyttelton M, Callea V, Yuille M, Catovsky D, Houlston RS: A high-density SNP genomewide linkage scan for chronic lymphocytic leukemia-susceptibility loci. Am J Hum Genet. 2005, 77: 420-429. 10.1086/444472.
Article CAS PubMed PubMed Central Google Scholar
Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES: Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet. 1996, 58: 1347-1363.
CAS PubMed PubMed Central Google Scholar
Abecasis GR, Cherny SS, Cookson WO, Cardon LR: Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002, 30: 97-101. 10.1038/ng786.
Article CAS PubMed Google Scholar
Gudbjartsson DF, Jonasson K, Frigge ML, Kong A: Allegro, a new computer program for multipoint linkage analysis. Nat Genet. 2000, 25: 12-13. 10.1038/75514.
Article CAS PubMed Google Scholar
Gudbjartsson DF, Thorvaldsson T, Kong A, Gunnarsson G, Ingolfsdottir A: Allegro version 2. Nat Genet. 2005, 37: 1015-1016. 10.1038/ng1005-1015.
Article CAS PubMed Google Scholar
Webb EL, Sellick GS, Houlston RS: SNPLINK: multipoint linkage analysis of densely distributed SNP data incorporating automated linkage disequilibrium removal. Bioinformatics. 2005, 21: 3060-3061. 10.1093/bioinformatics/bti449.
Article CAS PubMed Google Scholar
Lange K, Cantor R, Horvath S, Perola M, Sabatti C, Sinsheimer J, Sobel E: Mendel version 4.0: a complete package for the exact genetic analysis of discrete traits in pedigree and population data sets. Am J Hum Genet. 2001, 69(Supplement): A1886-
Google Scholar
Sobel E, Lange K: Descent graphs in pedigree analysis: applications to haplotyping, location scores, and marker-sharing statistics. Am J Hum Genet. 1996, 58: 1323-1337.
CAS PubMed PubMed Central Google Scholar
Ruschendorf F, Nurnberg P: ALOHOMORA: a tool for linkage analysis using 10K SNP array data. Bioinformatics. 2005, 21: 2123-2125. 10.1093/bioinformatics/bti264.
Article PubMed Google Scholar
Mukhopadhyay N, Almasy L, Schroeder M, Mulvihill WP, Weeks DE: Mega2: data-handling for facilitating genetic linkage and association analyses. Bioinformatics. 2005, 21: 2556-2557. 10.1093/bioinformatics/bti364.
Article CAS PubMed Google Scholar
Kennedy GC, Matsuzaki H, Dong S, Liu WM, Huang J, Liu G, Su X, Cao M, Chen W, Zhang J, Liu W, Yang G, Di X, Ryder T, He Z, Surti U, Phillips MS, Boyce-Jacino MT, Fodor SP, Jones KW: Large-scale genotyping of complex DNA. Nat Biotechnol. 2003, 21: 1233-1237. 10.1038/nbt869.
Article CAS PubMed Google Scholar
Matsuzaki H, Loi H, Dong S, Tsai YY, Fang J, Law J, Di X, Liu WM, Yang G, Liu G, Huang J, Kennedy GC, Ryder TB, Marcus GA, Walsh PS, Shriver MD, Puck JM, Jones KW, Mei R: Parallel genotyping of over 10,000 SNPs using a one-primer assay on a high-density oligonucleotide array. Genome Res. 2004, 14: 414-425. 10.1101/gr.2014904.
Article CAS PubMed PubMed Central Google Scholar
Sobel E, Papp JC, Lange K: Detection and integration of genotyping errors in statistical genetics. Am J Hum Genet. 2002, 70: 496-508. 10.1086/338920.
Article PubMed PubMed Central Google Scholar
Boehnke M, Ploughman LM: SIMLINK: A Program for Estimating the Power of a Proposed Linkage Study by Computer Simulation. 1997, [http://csg.sph.umich.edu/boehnke/simlink.php]
Google Scholar
de Jong AE, Morreau H, Nagengast FM, Mathus-Vliegen EM, Kleibeuker JH, Griffioen G, Cats A, Vasen HF: Prevalence of adenomas among young individuals at average risk for colorectal cancer. Am J Gastroenterol. 2005, 100: 139-143. 10.1111/j.1572-0241.2005.41000.x.
Article PubMed Google Scholar
Thiele H, Nurnberg P: HaploPainter: a tool for drawing pedigrees with complex haplotypes. Bioinformatics. 2005, 21: 1730-1732. 10.1093/bioinformatics/bth488.
Article CAS PubMed Google Scholar
Gordon D, Heath SC, Ott J: True pedigree errors more frequent than apparent errors for single nucleotide polymorphisms. Hum Hered. 1999, 49: 65-70. 10.1159/000022846.
Article CAS PubMed Google Scholar

Pre-publication history

The pre-publication history for this paper can be accessed here:http://www.biomedcentral.com/1471-2407/7/6/prepub

Download references

Acknowledgements

Grant support: Dutch Cancer Society UL 2005–3247, Nuts Ohra T07-092, NWO 912-03-014.

We thank Katja Philippo for her assistance in identifying the mutation status of several family members.

Author information

Authors and Affiliations

Department of Pathology, Leiden University Medical Center, Leiden, The Netherlands
Anneke Middeldorp, Hans Morreau & Tom van Wezel
Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Shantie Jagmohan-Changur, Peter Devilee & Juul T Wijnen
Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands
Quinta Helmer & Jeanine J Houwing-Duistermaat
Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
Heleen M van der Klift, Carli MJ Tops & Juul T Wijnen
The Netherlands Foundation for the Detection of Hereditary Tumours, Leiden, The Netherlands
Hans FA Vasen

Authors

Anneke Middeldorp
View author publications
You can also search for this author in PubMed Google Scholar
Shantie Jagmohan-Changur
View author publications
You can also search for this author in PubMed Google Scholar
Quinta Helmer
View author publications
You can also search for this author in PubMed Google Scholar
Heleen M van der Klift
View author publications
You can also search for this author in PubMed Google Scholar
Carli MJ Tops
View author publications
You can also search for this author in PubMed Google Scholar
Hans FA Vasen
View author publications
You can also search for this author in PubMed Google Scholar
Peter Devilee
View author publications
You can also search for this author in PubMed Google Scholar
Hans Morreau
View author publications
You can also search for this author in PubMed Google Scholar
Jeanine J Houwing-Duistermaat
View author publications
You can also search for this author in PubMed Google Scholar
Juul T Wijnen
View author publications
You can also search for this author in PubMed Google Scholar
Tom van Wezel
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Tom van Wezel.

Additional information

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

AM performed SNP arrays and mutation analysis, statistical analyses and drafted the manuscript. SJC and QH assisted in the statistical analysis, and QH performed the LD analysis. HMVDK performed SNP arrays. CMJT provided DNA samples and mutation status. HFAV was responsible for family recruitment and surveillance. PD participated in study design. JJHD supervised the statistical analysis, JTW, HM and TVW designed and coordinated the study, TVW helped to draft the manuscript. All authors read and approved the final manuscript.

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Middeldorp, A., Jagmohan-Changur, S., Helmer, Q. et al. A procedure for the detection of linkage with high density SNP arrays in a large pedigree with colorectal cancer. BMC Cancer 7, 6 (2007). https://doi.org/10.1186/1471-2407-7-6

Download citation

Received: 07 August 2006
Accepted: 12 January 2007
Published: 12 January 2007
DOI: https://doi.org/10.1186/1471-2407-7-6

A procedure for the detection of linkage with high density SNP arrays in a large pedigree with colorectal cancer