Modeling the Aneuploidy Control of Cancer
© Li et al. 2010
Received: 30 August 2009
Accepted: 1 July 2010
Published: 1 July 2010
Aneuploidy has long been recognized to be associated with cancer. A growing body of evidence suggests that tumorigenesis, the formation of new tumors, can be attributed in part to errors occurring at the mitotic checkpoint, a major cell-cycle control mechanism that acts to prevent chromosome missegregation. However, no statistical model has so far been available to quantify the role that aneuploidy plays in determining cancer.
We develop a statistical model for testing the association between aneuploid loci and cancer risk in a genome-wide association study. The model incorporates quantitative genetic principles into a mixture-model framework in which various genetic effects, including additive, dominance, and imprinting effects and their interactions, are estimated by implementing the EM algorithm.
Under the new model, a series of hypothesis tests is formulated to explain the pattern of the genetic control of cancer through aneuploid loci. Simulation studies were performed to investigate the statistical behavior of the model.
The model will provide a tool for estimating the effects of genetic loci on aneuploidy abnormality in genome-wide studies of cancer cells.
In recent years, there has been a wealth of literature on the development of statistical methods for the genetic analysis of complex diseases such as cancer [1, 2]. These methods, mostly founded on rigorous statistical theory, have been instrumental in the analysis and modeling of genetic data, leading to the identification of significant genetic variants involved in pathogenesis [3, 4]. However, many existing statistical methods neglect the biological principles that are continually refined by discoveries made with new genomic technologies. A lack of integration between statistics and biology will significantly limit our ability to detect and characterize the genetic underpinnings of a disease. The motivation of this study is to develop a novel statistical model for detecting the genetic control of cancer through chromosomal loci that predispose to aneuploidy.
(1) Aneuploidy has been confirmed to generate abnormal phenotypes, such as Down syndrome in humans and cancer in animals;
(2) The degree of aneuploidy is correlated with the degree of phenotypic abnormality;
(3) Because aneuploidy imbalances the highly balance-sensitive components of the spindle apparatus, it destabilizes symmetrical chromosome segregation;
(4) Both non-genotoxic and genotoxic carcinogens can cause aneuploidy through physical or chemical interaction with mitotic proteins.
In line with point (2), there is additional evidence that cancer-specific phenotypes result when aneuploidy exceeds a certain threshold [13, 14]. Kops et al. outlined the cytological mechanisms by which defective checkpoint signalling leads to aneuploidy. Normally, chromosome missegregation is prevented at the mitotic checkpoint, which delays cell-cycle progression through mitosis until all chromosomes have made successful spindle-microtubule attachments; a defect in the mitotic checkpoint, however, can generate aneuploidy, facilitate tumorigenesis, and increase resistance to anti-cancer therapies.
The statistical model for detecting cancer genes is based on a random sample of aneuploid cancer patients drawn from a natural population. At an aneuploid locus, polyploidy arises from the duplication of one or both parental chromosomes; thus, the model can be formulated to test for genetic imprinting, that is, differences in allelic effects that depend on parental origin.
If the aneuploidy hypothesis continues to be confirmed, this model will provide a timely tool for quantifying the genetic effects of aneuploid loci on cancer susceptibility by integrating genetic data from the cancer genome project. Also, by comparison with models for detecting somatic mutations, the new model will help to determine the relative importance of the aneuploidy and mutation hypotheses in cancer studies.
Suppose there is a normal human diploid population at Hardy-Weinberg equilibrium (HWE). Some individuals in this population develop cancer because particular regions of their chromosomes are multiplied to form a triploid, a tetraploid, or a polyploid of any higher order. For simplicity, we consider only a triploid. As proven below, the population after chromosomal multiplication deviates from HWE. We assume that a total of n cancer patients are randomly sampled from this population, each of whom is triploid at a particular aneuploid locus. All patients are genotyped with molecular markers at the duplicated chromosomal segments, although the parental origin of the duplication is unknown. A phenotype that defines cancer is measured for all subjects. A model is derived to distinguish between the genetic effects of alleles inherited from the maternal (M) and paternal (P) parents.
Duplicating one chromosome of the diploid configurations AA, A|a, a|A, and aa yields four triploid genotypes:
AAA, including configurations AA|A, duplicated from the left-side parent, and A|AA, duplicated from the right-side parent, of configuration AA;
AAa, including configurations AA|a, duplicated from the left-side parent of configuration A|a, and a|AA, duplicated from the right-side parent of configuration a|A;
Aaa, including configurations aa|A, duplicated from the left-side parent of configuration a|A, and A|aa, duplicated from the right-side parent of configuration A|a;
aaa, including configurations aa|a, duplicated from the left-side parent, and a|aa, duplicated from the right-side parent, of configuration aa.
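The configuration bookkeeping in the list above can be checked mechanically. The sketch below assumes a configuration is written as an ordered pair (left-side allele | right-side allele) and that duplication copies exactly one side; the helper name `triploid_configurations` is hypothetical. It enumerates all duplications and groups the results by unordered triploid genotype.

```python
from collections import defaultdict


def triploid_configurations():
    """Enumerate configurations produced by duplicating one side of each
    ordered diploid configuration (left|right), grouped by genotype.

    Returns a dict mapping an unordered genotype such as 'AAa' to the sorted
    list of its configurations, written as in the text, e.g. 'AA|a'.
    """
    groups = defaultdict(set)
    for left in "Aa":
        for right in "Aa":
            dup_left = f"{left}{left}|{right}"    # left-side chromosome duplicated
            dup_right = f"{left}|{right}{right}"  # right-side chromosome duplicated
            for config in (dup_left, dup_right):
                # 'A' sorts before 'a', so genotypes read AAA, AAa, Aaa, aaa
                genotype = "".join(sorted(config.replace("|", "")))
                groups[genotype].add(config)
    return {g: sorted(c) for g, c in sorted(groups.items())}


for genotype, configs in triploid_configurations().items():
    print(genotype, configs)
```

Each triploid genotype collects exactly two configurations, which is why the heterozygous triploids carry hidden parental-origin information.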
Let p and q (p + q = 1) be the frequencies of alleles A and a in the original population before chromosome duplication. For a natural population at HWE, the genotype frequencies are $p^2$ for AA, $2pq$ for Aa, and $q^2$ for aa.
For an HWE diploid population, chromosome duplication operating on particular loci can violate the equilibrium status of the population.
Thus, unless and , the duplicated population will be at Hardy-Weinberg disequilibrium.
This theorem shows that traditional HWE theory for population genetic studies will not be useful for cancer gene identification. Meanwhile, this theorem provides a foundation for deriving a statistical model to conduct genome-wide association studies of cancer.
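The theorem can be illustrated numerically. The sketch below rests on an assumed duplication model that is not necessarily the one in Table 1: AA always becomes AAA, aa becomes aaa, and a hypothetical fraction h of heterozygotes duplicate the A-bearing chromosome (giving AAa) while 1 - h duplicate the a-bearing one (giving Aaa). It compares the resulting triploid frequencies with triploid HWE expectations at the same allele frequency.

```python
def duplicated_frequencies(p, h):
    """Triploid genotype frequencies (AAA, AAa, Aaa, aaa) after duplication
    in a diploid HWE population, under the hypothetical model described
    above (h = fraction of heterozygotes that duplicate allele A)."""
    q = 1.0 - p
    het = 2.0 * p * q
    return [p * p, het * h, het * (1.0 - h), q * q]


def triploid_hwe(freqs):
    """Triploid HWE frequencies implied by the allele frequency of freqs."""
    p3 = freqs[0] + 2.0 * freqs[1] / 3.0 + freqs[2] / 3.0  # frequency of A
    q3 = 1.0 - p3
    return [p3 ** 3, 3 * p3 ** 2 * q3, 3 * p3 * q3 ** 2, q3 ** 3]


obs = duplicated_frequencies(p=0.6, h=0.5)
expected = triploid_hwe(obs)
print("after duplication:", [round(x, 3) for x in obs])
print("triploid HWE:     ", [round(x, 3) for x in expected])
print("difference:       ", [round(o - e, 3) for o, e in zip(obs, expected)])
```

Under these assumptions the deviation persists for every h whenever 0 < p < 1: equilibrium would require AAA and aaa frequencies of p^2 and q^2 to equal the cubes of the triploid allele frequencies, and the resulting allele frequencies cannot then sum to 1.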
Table 1. The changes of genotypes and genotype frequencies after chromosomal duplication.
Table 2. Genotypic values and proportions of different configurations of a triploid genotype at a duplicated gene.
For each triploid genotype, the relative proportions of two underlying configurations can be different, depending on the rate of the duplication of parent-specific chromosomes. Let u and 1 - u be the proportions of the duplication of allele A derived from the maternal and paternal parents, respectively. Similarly, let v and 1 - v be the proportions of the duplication of allele a derived from the maternal and paternal parents, respectively (Table 2). These proportions can be estimated from genotype data.
which is derived from a multinomial likelihood. The EM algorithm is implemented to estimate the allele frequencies and the HWD coefficients from the triploid genotype observations of the sampled aneuploid population (Table 1). It is described as follows:
for allele a.
where the parameter vector contains all unknown parameters, and the density for configuration j of genotype k (k = 1, ..., 4; j = 1, 2) is that of a normal distribution of the phenotypic trait with mean $\mu_{kj}$ and variance $\sigma^2$.
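The mixture structure can be illustrated with a deliberately reduced case. The sketch below is a hypothetical simplification, not the full model: it handles the phenotypes of a single triploid genotype whose two configurations occur with an unknown proportion w, have free means mu1 and mu2, and share a common variance. It runs the standard EM algorithm for a two-component normal mixture, whose M-step updates are in closed form; the function names are illustrative.

```python
import math
import random


def normal_pdf(y, mu, var):
    """Density of N(mu, var) at y."""
    return math.exp(-((y - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_configurations(y, w=0.5, n_iter=200, tol=1e-8):
    """EM for phenotypes y of ONE genotype made of two configurations.

    Hypothetical simplification: configuration 1 has proportion w and mean
    mu1, configuration 2 has proportion 1 - w and mean mu2, and both share
    a common variance var. Returns (w, mu1, mu2, var).
    """
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    sd = math.sqrt(var)
    mu1, mu2 = mean - sd, mean + sd  # crude starting values
    prev = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each observation is configuration 1
        f1 = [w * normal_pdf(v, mu1, var) for v in y]
        f2 = [(1.0 - w) * normal_pdf(v, mu2, var) for v in y]
        tot = [a + b for a, b in zip(f1, f2)]
        post = [a / t for a, t in zip(f1, tot)]
        loglik = sum(math.log(t) for t in tot)
        # M-step: closed-form updates of proportion, means and common variance
        s1 = sum(post)
        w = s1 / n
        mu1 = sum(r * v for r, v in zip(post, y)) / s1
        mu2 = sum((1.0 - r) * v for r, v in zip(post, y)) / (n - s1)
        var = sum(r * (v - mu1) ** 2 + (1.0 - r) * (v - mu2) ** 2
                  for r, v in zip(post, y)) / n
        if loglik - prev < tol:
            break
        prev = loglik
    return w, mu1, mu2, var


rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) if rng.random() < 0.3 else rng.gauss(3.0, 1.0)
        for _ in range(2000)]
w, mu1, mu2, var = em_two_configurations(data)
print(round(w, 3), round(mu1, 3), round(mu2, 3), round(var, 3))
```

In the full model the proportions would be tied to the duplication rates u and v across genotypes and the means to the genetic effects; this sketch only shows the E-step/M-step pattern.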
under which genotype frequencies can be estimated from the estimated allele frequencies using equation (1). Twice the log-likelihood ratio calculated under the null and alternative hypotheses asymptotically follows a $\chi^2$ distribution with 2 degrees of freedom. It is also of interest to test the two disequilibria separately. Under the null hypothesis $H_0: D_1 = 0$, genotype frequencies are estimated using equation (1) but with the constraint $P_3 = p^3$, in addition to the constraint $P_1 + P_2 + P_3 + P_4 = 1$. Similarly, genotype frequencies are estimated with the constraint $P_4 = q^3$ for testing whether $D_2 = 0$.
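A minimal sketch of the joint 2-degree-of-freedom test, assuming the null model is standard triploid HWE with frequencies p^3, 3p^2q, 3pq^2, and q^3 for AAA, AAa, Aaa, and aaa (this may not match the exact parameterization of equation (1)). Under that null, the maximum likelihood estimate of p is simple allele counting over 3n allele copies, and the alternative is the saturated multinomial with 3 free frequencies, giving 3 - 1 = 2 degrees of freedom.

```python
import math


def triploid_hwe_lrt(counts):
    """Likelihood-ratio test of triploid HWE from genotype counts
    (AAA, AAa, Aaa, aaa). Returns (p_hat, statistic, p_value)."""
    n = sum(counts)
    n1, n2, n3, n4 = counts
    # MLE of the A allele frequency under H0: allele counting over 3n copies
    p = (3 * n1 + 2 * n2 + n3) / (3 * n)
    q = 1.0 - p
    expected = [p ** 3, 3 * p ** 2 * q, 3 * p * q ** 2, q ** 3]
    # 2 * (saturated multinomial log-likelihood - H0 log-likelihood)
    stat = 2.0 * sum(c * math.log(c / (n * e))
                     for c, e in zip(counts, expected) if c > 0)
    # For 2 degrees of freedom, P(chi-square > x) = exp(-x / 2)
    p_value = math.exp(-stat / 2.0)
    return p, stat, p_value


print(triploid_hwe_lrt([216, 432, 288, 64]))   # counts exactly at triploid HWE
print(triploid_hwe_lrt([300, 330, 250, 120]))  # excess of both homozygotes
```

The closed-form survival function exp(-x/2) holds only for 2 degrees of freedom; the separate 1-degree-of-freedom tests of D1 and D2 would need a general chi-square routine.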
Whether the duplicated gene is significantly associated with cancer susceptibility can be tested using the null hypothesis $\mu_{kj} \equiv \mu$ for k = 1, ..., 4; j = 1, 2. The additive effect and the two types of dominance effects can be tested jointly or separately by formulating the relevant null hypotheses based on equations (14), (15), and (16). The imprinting effect and its interactions with the additive and dominance effects can be tested using the null hypotheses $H_0: \lambda = 0$, $H_0: I_{a\lambda} = 0$, $H_0: I_{d\lambda} = 0$, and $H_0: I_{d'\lambda} = 0$, constructed with equations (??), (18), (18), and (19), respectively.
The model can also be used to test the significance of the duplication rate for a parent-specific chromosome by formulating the null hypothesis $H_0: u = 1$ or $H_0: v = 1$. This information helps in understanding the genetic structure and evolutionary process of cancer risk.
Simulation studies were used to investigate the statistical properties of the model in terms of estimation precision, power, and false positive rates. We simulated a cancer population of triploids for a chromosomal segment. The allele frequencies of A and a at the triploid locus are 0.6 and 0.4, respectively. The HWD coefficients at this locus are assumed to be $D_1 = 0.08$ and $D_2 = 0.06$. Assuming duplication rates of 0.3 and 0.4 for the two parental chromosomes, respectively, the distribution of the four triploid genotypes AAA, AAa, Aaa, and aaa can be simulated. The phenotypic values of cancer traits were simulated by summing the additive, dominance, imprinting, and interaction effects, each given a particular value, and a measurement error within each triploid genotype that follows a normal distribution whose variance is scaled to give a heritability of 0.05, 0.10, or 0.20. Sample sizes of 400, 800, and 2,000 were considered.
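The heritability scaling can be made concrete. Assuming heritability is the genetic variance divided by the total phenotypic variance, the error variance that yields heritability H^2 is Var_G(1 - H^2)/H^2. The sketch below uses hypothetical helper names and purely illustrative genotype frequencies and genotypic values (not the settings of the study), and it ignores imprinting, so each genotype has a single genotypic value rather than one per configuration.

```python
import random


def residual_variance(freqs, values, h2):
    """Residual variance that yields heritability h2.

    freqs  : genotype frequencies (summing to 1)
    values : genotypic value of each genotype
    h2     : heritability = var_g / (var_g + var_e)
    """
    mean = sum(f * g for f, g in zip(freqs, values))
    var_g = sum(f * (g - mean) ** 2 for f, g in zip(freqs, values))
    return var_g * (1.0 - h2) / h2


def simulate_phenotypes(n, freqs, values, h2, seed=0):
    """Draw n (genotype index, phenotype) pairs: genotypic value + normal error."""
    rng = random.Random(seed)
    sd_e = residual_variance(freqs, values, h2) ** 0.5
    classes = rng.choices(range(len(freqs)), weights=freqs, k=n)
    return [(k, values[k] + rng.gauss(0.0, sd_e)) for k in classes]


# Hypothetical inputs for AAA, AAa, Aaa, aaa (illustrative only)
freqs = [0.30, 0.33, 0.25, 0.12]
values = [1.0, 0.5, -0.2, -1.0]
for n in (400, 800, 2000):
    for h2 in (0.05, 0.10, 0.20):
        data = simulate_phenotypes(n, freqs, values, h2, seed=1)
        print(n, h2, round(residual_variance(freqs, values, h2), 3), len(data))
```

A heritability of 0.05 leaves an error variance 19 times the genetic variance, which is why the smallest heritability is the hardest setting.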
The estimates of population genetic parameters (p, u, v) and quantitative genetic parameters ($a$, $d$, $d'$, $\lambda$, $I_{a\lambda}$, $I_{d\lambda}$, $I_{d'\lambda}$) from simulated data under different combinations of sample size and heritability.
The power to detect the overall genetic effect and the imprinting effect was investigated. In general, the model has high power to identify aneuploid loci that cause cancer. Adequate power for detecting imprinting effects requires a large sample size and/or a large heritability; a sample size of 400 with a heritability of 0.2 can reach a power of over 0.75 for detecting imprinting effects. We also performed simulation studies to examine the false positive rates for detecting overall genetic effects and imprinting effects at aneuploid loci. In each case, the false positive rates appear to be controlled below 5-10%.
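Power and false positive rates of this kind are Monte Carlo proportions of rejections. As a self-contained illustration of that bookkeeping only, the sketch below estimates the empirical false positive rate and power of a triploid HWE likelihood-ratio test, used here as a stand-in because its 2-degree-of-freedom p-value has a closed form; the tests of genetic and imprinting effects in the model itself rely on the mixture likelihood. Both frequency vectors are hypothetical and share an A allele frequency of 0.6.

```python
import math
import random


def hwe_lrt_pvalue(counts):
    """p-value of the triploid HWE likelihood-ratio test (2 df)."""
    n = sum(counts)
    n1, n2, n3, n4 = counts
    p = (3 * n1 + 2 * n2 + n3) / (3 * n)  # allele-counting MLE under H0
    q = 1.0 - p
    expected = [p ** 3, 3 * p ** 2 * q, 3 * p * q ** 2, q ** 3]
    stat = 2.0 * sum(c * math.log(c / (n * e))
                     for c, e in zip(counts, expected) if c > 0)
    return math.exp(-stat / 2.0)


def rejection_rate(freqs, n, reps=1000, alpha=0.05, seed=0):
    """Fraction of simulated samples of size n whose test rejects at alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        draws = rng.choices(range(4), weights=freqs, k=n)
        counts = [draws.count(k) for k in range(4)]
        if hwe_lrt_pvalue(counts) < alpha:
            rejections += 1
    return rejections / reps


null_freqs = [0.216, 0.432, 0.288, 0.064]  # triploid HWE with p = 0.6
alt_freqs = [0.296, 0.332, 0.248, 0.124]   # hypothetical deviation, same p
print("false positive rate:", rejection_rate(null_freqs, n=400, seed=1))
print("power:", rejection_rate(alt_freqs, n=400, seed=2))
```

Running the same loop over sample sizes and effect settings produces the kind of power and false-positive summaries described above.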
In the roughly 100 years since Theodor Boveri hypothesized that mitotic defects resulting in tetraploidy promote oncogenesis, considerable effort has been devoted to exploring the genetic causes of tumorigenesis. It has been partly established that aneuploidy affects the proliferation and survival of tumors. The recent discovery of components of the mitotic checkpoint, together with the realization that many of the classic tumor suppressors and oncogene products regulate mitotic progression, has renewed interest in the role of aneuploidy in tumorigenesis [10, 15, 16]. With the completion of the Human Genome Project and the HapMap project, there is a pressing need for statistical models that estimate the genetic effects of aneuploid loci on cancer risk.
In this article, we present a statistical strategy for detecting the genetic control of cancer traits by genotyping aneuploid cancer cells. The proposed model has two novel features. First, it is the first to integrate recent discoveries in cancer genetics with statistical principles, pushing the modeling of cancer gene identification to the frontier of cancer biology. The experimental design is founded on biologically relevant hypotheses, from which data can be collected effectively. The closed-form EM updates derived for the various parameters make computation efficient for any data set. Second, the model builds on traditional quantitative genetic theory, allowing the overall genetic control to be partitioned into different components. In particular, we are able to estimate and test the effect of genetic imprinting on cancer risk [18, 19] and thus draw a detailed picture of the genetic control exerted by the different parental chromosomes. The model can also characterize the interactions of additive and dominance effects with imprinting effects, giving better insight into the complexity of the genetic architecture of cancer.
We performed computer simulations to examine the statistical properties of the model. From the simulation results, an appropriate sample size can be determined for a cancer trait with a particular heritability. The analyses of power and false positive rates support the potential usefulness of the model once real data sets become available. We also showed, through a simple mathematical proof, that the Hardy-Weinberg equilibrium of an original population can be disrupted when some chromosomes are duplicated.
The idea of the model can be extended to several more complicated situations. First, the aneuploidy control of cancer may derive from higher-order aneuploids, such as tetraploids. A higher-order polyploid not only contains more allelic combinations but also involves more missing data, owing to the duplication of different chromosomes of unknown parental origin. Modeling the tetraploidy control of cancer will require a more sophisticated algorithm to obtain efficient parameter estimates. Second, different aneuploid loci responsible for cancer traits may be associated in the duplicated population and interact in a coordinated manner. Multi-locus associations and multi-locus epistasis deserve further investigation, because this information can explain the genetic variation of cancer better than single loci can. Third, other factors, such as sex, race, and lifestyle, also contribute to cancer. It is crucial to incorporate these factors and to study the effect of each of them, and their interactions with genes, in tumorigenesis.
We have derived a new statistical model for identifying genetic loci that control quantitative phenotypes of aneuploid cancers through a genome-wide association study. Quantitative genetic principles are integrated into the model, allowing the estimation of different types of genetic effects. The new model can generate a series of hypothesis tests about the genetic mechanisms that control cancer through aneuploid loci. Although the model has so far been explored only from a theoretical perspective, experiments to collect data according to the suggested genetic design could readily be launched. By analyzing such data, the new model should be able to uncover new results, improving our understanding of how aneuploid processes are linked to cancer through genetic mediation.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.