We have developed a method for inferring regions of LOH from cohort heterozygosity values derived from SNP arrays, without the need for patient-matched samples. LOH regions were first inferred in a small cohort of B-CLL patients using the Cohort Heterozygosity Comparison (CHC) method. To assess the efficacy of this method, regions of 'conventional LOH' were mapped in these patients using SNP array data from patient-matched buccal samples, and these regions were further validated by microsatellite marker analysis. Following validation of this novel method of LOH analysis, CHC analysis was used to infer LOH in an extended cohort of B-CLL samples as well as in larger cohorts of FL and DLBCL samples.
The CHC method uses genotype data generated by SNP microarrays to detect regions in which heterozygosity frequency is lower in the case cohort than in the control cohort. Owing to the decreasing heterozygosity of markers on modern SNP array platforms, the CHC method only uses markers predicted to generate at least one heterozygous genotype within each of the case and control cohorts. For each of these SNPs, a delta value is calculated as the difference in heterozygosity frequency between controls and cases. A decrease in heterozygosity frequency in cases, indicated by a positive delta value, is considered suggestive of LOH if it exceeds a set threshold. This threshold (ΔT) is one of the modulatory parameters of the CHC method that can be used to alter the balance between sensitivity and specificity, but it is usually set to maintain sufficient statistical power for the analysis. The second modulatory parameter is the contiguous point threshold (CPT): a region is highlighted as inferred LOH only if the number of contiguous markers exceeding ΔT is equal to or greater than the CPT. The CPT is applied to minimize false positives, and its ability to modulate sensitivity and specificity is shown, respectively, by the decreasing number of inferred LOH regions and the more accurate inference of LOH with increasing CPT (Table 1).
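The scan described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than the published implementation: it assumes per-marker heterozygosity frequencies are already computed for each cohort and supplied in genomic order for a single chromosome, that a marker is retained when cohort size multiplied by population heterozygosity is at least 1 in both cohorts, and that a marker "exceeds" ΔT when its delta is greater than or equal to ΔT. The function names `usable_markers` and `chc_runs` are our own.

```python
def usable_markers(pop_het, n_cases, n_controls):
    """Indices of markers expected to yield at least one heterozygote per cohort.

    Assumption: expected heterozygote count = cohort size x population
    heterozygosity (pop_het[i], a frequency between 0 and 1).
    """
    return [i for i, h in enumerate(pop_het)
            if n_cases * h >= 1 and n_controls * h >= 1]


def chc_runs(case_het, control_het, pop_het, n_cases, n_controls,
             delta_t=0.5, cpt=5):
    """Hypothetical CHC-style scan returning (runs, deltas).

    runs:   list of (first_index, last_index) for runs of contiguous retained
            markers with delta >= delta_t, kept only if the run has >= cpt markers
    deltas: dict mapping retained marker index to control_het - case_het
            (positive means lower heterozygosity in cases)
    """
    keep = usable_markers(pop_het, n_cases, n_controls)
    deltas = {i: control_het[i] - case_het[i] for i in keep}

    runs, current = [], []
    for i in keep:
        if deltas[i] >= delta_t:
            current.append(i)            # extend the current run
        else:
            if len(current) >= cpt:      # close a run that satisfies the CPT
                runs.append((current[0], current[-1]))
            current = []
    if len(current) >= cpt:              # a run that reaches the last marker
        runs.append((current[0], current[-1]))
    return runs, deltas
```

Because runs are built over retained markers only, a filtered low-heterozygosity marker does not break a run of markers above ΔT. Raising `cpt` discards shorter runs, which is the sensitivity/specificity trade-off the CPT is meant to control.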
A caveat of the CHC method when applied to small sample sizes is the requirement for high population heterozygosity frequencies in order to detect LOH. This led to a portion of the regions within the validation cohort mapping adjacent to, rather than encompassing, SNPs showing 'conventional LOH'. The validation cohort in this investigation was modest in size, meaning that the effect size had to be large for regions to be defined as having significantly decreased heterozygosity in the case cohort compared to the control cohort. In the validation cohort of 4 samples, with a CPT of 5 and a delta threshold of 0.5, LOH can only be inferred where 5 consecutive SNPs each show a 50% decrease in heterozygosity between the case and control cohorts. Consequently, in small samples the CHC method can only detect LOH in regions with sufficiently high population heterozygosity. Where SNPs showing 'conventional LOH' had low population heterozygosity, the 'inferred LOH' region frequently mapped directly adjacent to them rather than encompassing them. This is supported by the observation that SNPs showing 'conventional LOH' that mapped adjacent to regions of 'inferred LOH' had significantly lower population heterozygosity than those that mapped inside 'inferred LOH' regions. Furthermore, the distance of these adjacent 'inferred LOH' regions from 'conventional LOH' was determined primarily by the marker resolution within each genomic region. The requirement for high population heterozygosity in order to map SNPs showing 'conventional LOH' within regions of 'inferred LOH' can be alleviated by increasing the cohort sample size, which would increase the power of the investigation and decrease the effect size required to define SNPs as suggestive of LOH.
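The arithmetic behind this caveat can be illustrated with a simple probability model. The sketch below is hypothetical and rests on assumptions that are ours, not the study's: delta is the absolute difference in heterozygosity frequency, control heterozygosity equals the population heterozygosity h (i.e. a large control cohort), cases carrying LOH are always homozygous, the remaining cases are heterozygous independently with probability h, and adjacent markers are independent (linkage disequilibrium is ignored).

```python
from math import comb


def prob_marker_passes(h, n_cases, n_loh, delta_t):
    """Probability that one marker reaches delta >= delta_t (hypothetical model).

    h:       population heterozygosity, assumed equal to control heterozygosity
    n_cases: size of the case cohort
    n_loh:   number of cases with LOH at the marker (always homozygous)
    """
    m = n_cases - n_loh                      # cases without LOH
    total = 0.0
    for x in range(m + 1):                   # x = number of heterozygous cases
        delta = h - x / n_cases
        if delta >= delta_t:
            total += comb(m, x) * h**x * (1 - h)**(m - x)
    return total


def prob_run_passes(hets, n_cases, n_loh, delta_t):
    """Probability that every marker in a run passes, assuming independence."""
    p = 1.0
    for h in hets:
        p *= prob_marker_passes(h, n_cases, n_loh, delta_t)
    return p
```

Under these assumptions `prob_marker_passes(h, 4, 4, 0.5)` is 0 for any h below 0.5, because even LOH in all four cases yields a delta of only h. This is the sense in which low-heterozygosity SNPs cannot be captured in a small cohort; lowering `delta_t`, which a larger cohort permits without sacrificing power, brings such markers back within reach.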
However, even with the small sample size of the validation cohort, the CHC method successfully inferred LOH regions close to SNPs showing 'conventional LOH', including those with low population heterozygosity. The use of the CHC method with small sample sizes is further aided by an enrichment score for each 'inferred LOH' region. This score is the sum of the heterozygosity frequency delta values for the region, and it correlated significantly and negatively with the distance of 'inferred LOH' from 'conventional LOH'.
The only current alternative to the CHC method for high-resolution mapping of LOH from SNP microarray data without patient-matched control samples is a method based on a hidden Markov model (HMM) algorithm [7]. This method infers LOH by identifying long strings of homozygous SNP genotypes. The HMM method therefore differs from the CHC algorithm primarily in that it analyzes each sample linearly, whereas the CHC algorithm interrogates genomic regions cross-sectionally across an entire cohort of samples. Consequently, while the HMM algorithm is prone to false-positive LOH calls arising from consanguinity in individual patients' family trees and the resulting enrichment of homozygosity, the cross-sectional approach of the CHC algorithm is less affected by individual genetic background. We applied the HMM method to each of the four initial validation case samples in this investigation. It yielded 699 candidate regions with an average size of over 95 kb across the four cases. Only 19% of these candidate regions overlapped in two or more patients, resulting in 567 independent genomic regions suggestive of LOH. The HMM method does not provide a cumulative index or probability of LOH across all samples, so differences between the two methods are to be expected. However, compared with the HMM-generated regions, the regions inferred by the CHC method with the recommended CPT were on average over 70% smaller and more frequently mapped within the immediate vicinity of 'conventional LOH'. Although the size of HMM-inferred LOH regions could be decreased by altering the state change parameter, this would further decrease the specificity of the method.
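To make the contrast with the cohort-wide CHC scan concrete, the sketch below decodes a single sample with a generic two-state HMM (normal vs LOH) using the Viterbi algorithm. It is a hypothetical illustration, not the implementation of reference [7]: the emission model (heterozygous with probability equal to population heterozygosity in the normal state, heterozygous only through genotyping error `eps` under LOH) and all parameter values are assumptions. `p_switch` plays a role analogous to the state change parameter mentioned above.

```python
import math


def viterbi_loh(is_het, pop_het, p_switch=1e-3, eps=1e-3):
    """Most likely state path for one sample (0 = normal, 1 = LOH).

    is_het:   per-marker genotype calls in genomic order (True = heterozygous)
    pop_het:  per-marker population heterozygosity, strictly between 0 and 1
    p_switch: probability of changing state between adjacent markers
    eps:      assumed genotyping-error rate producing heterozygous calls in LOH
    """
    def log_emit(state, het, h):
        p_het = h if state == 0 else eps
        return math.log(p_het if het else 1.0 - p_het)

    stay, move = math.log(1.0 - p_switch), math.log(p_switch)
    # Equal prior probability of starting in either state.
    score = [math.log(0.5) + log_emit(s, is_het[0], pop_het[0]) for s in (0, 1)]
    back = []                                  # back[i-1][s]: best state before marker i
    for i in range(1, len(is_het)):
        new_score, ptr = [], []
        for s in (0, 1):
            from_same = score[s] + stay
            from_other = score[1 - s] + move
            if from_same >= from_other:
                best, prev = from_same, s
            else:
                best, prev = from_other, 1 - s
            new_score.append(best + log_emit(s, is_het[i], pop_het[i]))
            ptr.append(prev)
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

Because each sample is decoded on its own, a long homozygous stretch from shared ancestry is called exactly like a somatic deletion; a smaller `p_switch` suppresses short calls, a larger one permits them.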
Overall, the CHC method inferred LOH more specifically and within smaller genomic regions than the HMM method. This suggests that the CHC method is not only more accurate in inferring LOH, but that its greater precision may allow more specific delineation of small regions of LOH. Although combining cases into a cohort confers clear advantages, this approach precludes the inference of LOH regions within individual samples. The CHC method is therefore better suited to generating candidate LOH regions that implicate genes with a broad role in disease pathogenesis than genes whose importance varies from case to case.
To investigate whether automated analysis of larger cohorts of NHL patients could identify candidates of potential importance, the CHC method was used to analyze larger cohorts of B-CLL, FL and DLBCL samples. The top two candidate LOH regions for each disease subtype, ranked by enrichment score, were examined for candidate genes that may possess a tumor-suppressor function in NHL. One of the 'inferred LOH' regions in DLBCL mapped to a genomic region with no closely linked coding loci; however, this region (3q11.2) has been shown to be lost in a range of cancers including acute lymphoblastic leukemia and mantle cell lymphoma [9, 10]. The remaining five candidate regions were closely linked to attractive TSG candidates. These included two DNA repair genes (FANCD2, ERCC8), an inhibitor of RAS-mediated transformation (ANP32C), and a repressor of NFκB activity that was highlighted in both the B-CLL and FL cohorts (ETV3).
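The candidate-selection step can be sketched as a ranking followed by a positional lookup. The dictionary layout, the hypothetical gene names, and the 100 kb window standing in for "closely linked" are all assumptions for illustration; the source does not specify how linkage to coding loci was assessed.

```python
def top_by_subtype(regions_by_subtype, n=2):
    """Keep the n highest-scoring regions for each disease subtype.

    regions_by_subtype: {subtype: [{"chrom", "start", "end", "score"}, ...]},
    where "score" is the region's enrichment score.
    """
    return {subtype: sorted(regs, key=lambda r: r["score"], reverse=True)[:n]
            for subtype, regs in regions_by_subtype.items()}


def genes_near(region, genes, window=100_000):
    """Names of genes overlapping the region extended by `window` bp on each side.

    genes: list of (name, chrom, start, end) tuples. The window size is an
    assumption standing in for "closely linked".
    """
    lo, hi = region["start"] - window, region["end"] + window
    return [name for name, chrom, start, end in genes
            if chrom == region["chrom"] and end >= lo and start <= hi]
```

Ranking by enrichment score keeps the selection automatic, while the explicit `window` argument makes the meaning of "closely linked" easy to vary.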
The association between DNA repair genes and the pathogenesis of NHL is demonstrated by the predisposition to NHL of immunodeficiency patients carrying mutations in genes that mediate DNA damage repair [11], as well as by numerous associations between polymorphisms in DNA repair genes and genetic susceptibility to NHL [12–14]. It is therefore not surprising that DNA repair genes may play a central tumor suppressor role in NHL. The FANCD2 gene was linked with a region of 'inferred LOH' highlighted in the B-CLL patient cohort; it functions by forming complexes with BRCA1 or RAD51 and mediating repair of transcriptionally active genes [15]. This gene has also been shown to be essential for maintaining the G2 cell-cycle checkpoint [16]. The ERCC8 gene, which was linked with a region of 'inferred LOH' in the FL patient cohort, is also involved in the repair of transcriptionally active genes [17]. Further support for this gene as a potential TSG is provided by the increased frequency of cancer in patients with Cockayne syndrome, in which ERCC8 is mutated [18]. The targeting of FANCD2 and ERCC8 by LOH, as inferred by the CHC method, suggests that loss of their function may allow mutations to accumulate within transcriptionally active genes. This may be further compounded by unchecked progression through the G2 checkpoint in the absence of FANCD2-associated cell-cycle arrest or apoptotic induction.
The ANP32C gene was highlighted as a potential TSG by CHC analysis of the DLBCL patient cohort. This gene has previously been described as a tumor suppressor, and elucidation of its function revealed that this role may be due to repression of RAS-mediated tumorigenesis [18]. However, our implication of this gene in the pathogenesis of NHL is a novel finding. The most notable result of the CHC analysis was the identification of ETV3 as a potential TSG in both the B-CLL and FL patient cohorts. This gene is a member of the ETS family of transcription factors and functions in the repression of NFκB-activated transcription [19]. LOH of the ETV3 locus at 1q23.1 has previously been demonstrated in B-CLL [20], and its potential as a TSG is supported by the importance of NFκB signaling in promoting proliferation and inhibiting apoptosis in lymphocytes [21]. The implication of genes with well-defined tumor suppressor roles by cohort heterozygosity comparison of B-CLL, FL and DLBCL provides evidence that this novel method of analysis infers LOH within plausible TSGs. Although the implication of these candidate genes is only a preliminary finding, and validation by microsatellite analysis in a larger cohort of samples would be required to definitively link them with the pathogenesis of NHL, they illustrate the utility of the CHC method for deriving LOH candidates.