Modeling the Aneuploidy Control of Cancer
 Yao Li^{1},
 Arthur Berg^{2, 3},
 Louie R Wu^{4},
 Zhong Wang^{2, 3},
 Gang Chen^{3} and
 Rongling Wu^{2, 3}
DOI: 10.1186/1471-2407-10-346
© Li et al; licensee BioMed Central Ltd. 2010
Received: 30 August 2009
Accepted: 1 July 2010
Published: 1 July 2010
Abstract
Background
Aneuploidy has long been recognized to be associated with cancer. A growing body of evidence suggests that tumorigenesis, the formation of new tumors, can be attributed to some extent to errors occurring at the mitotic checkpoint, a major cell cycle control mechanism that acts to prevent chromosome missegregation. However, so far no statistical model has been available to quantify the role aneuploidy plays in determining cancer.
Methods
We develop a statistical model for testing the association between aneuploidy loci and cancer risk in a genome-wide association study. The model incorporates quantitative genetic principles into a mixture-model framework in which various genetic effects, including additive, dominant, imprinting, and their interactions, are estimated by implementing the EM algorithm.
Results
Under the new model, a series of hypothesis tests is formulated to explain the pattern of the genetic control of cancer through aneuploid loci. Simulation studies were performed to investigate the statistical behavior of the model.
Conclusions
The model will provide a tool for estimating the effects of genetic loci on aneuploidy abnormality in genome-wide studies of cancer cells.
Background
In recent years, there has been a wealth of literature on the development of statistical methods for genetic analysis of complex diseases, such as cancer [1, 2]. These methods, mostly founded on rigorous statistical theory and models, have been instrumental in the analysis and modeling of genetic data, leading to the identification of significant genetic variants involved in pathogenesis [3, 4]. However, many existing statistical methods neglect biological principles that have been refined by recent discoveries made with new genomic technologies. A lack of integration between statistics and biology significantly limits the detection and characterization of new genetic underpinnings of a disease. The motivation of this study is to develop a novel statistical model for detecting the genetic control of cancer through chromosomal loci predisposing to aneuploidy.
 (1) Aneuploidy is confirmed to generate abnormal phenotypes, such as Down syndrome in humans and cancer in animals;
 (2) The degree of aneuploidy is correlated with phenotype abnormality;
 (3) Since aneuploidy imbalances the highly balance-sensitive components of the spindle apparatus, it destabilizes symmetrical chromosome segregation;
 (4) Both non-genotoxic and genotoxic carcinogens can cause aneuploidy by physical or chemical interaction with mitosis proteins.
Similar to point (2), there is additional evidence that cancer-specific phenotypes result when aneuploidy exceeds a certain threshold [13, 14]. Kops et al. [15] outlined the cytological mechanisms for aneuploid formation from checkpoint signalling. Normally, chromosome missegregation can be prevented at the mitotic checkpoint by delaying cell-cycle progression through mitosis until all chromosomes have successfully made spindle-microtubule attachments, but a defect in the mitotic checkpoint can generate aneuploidy, facilitate tumorigenesis, and cause increased resistance to anticancer therapies [16].
The statistical model developed to detect cancer genes is constructed with a random sample of aneuploid patients with cancer drawn from a natural population. At an aneuploid locus, polyploids occur because of the duplication of one or two parental chromosomes and, thus, the model can be formulated to test the genetic imprinting of alleles due to their different parental origins.
If the aneuploidy hypothesis continues to be confirmed, this model will provide a timely tool to quantify the genetic effects of aneuploidy loci on cancer susceptibility by integrating the genetic data from the cancer genome project. Also, by comparison with the model for detecting somatic mutations, this new model will help to determine the relative importance of the aneuploidy and mutation hypotheses in cancer studies.
Methods
Study Design
Suppose there is a normal human diploid population which is at Hardy-Weinberg equilibrium (HWE). Some individuals in this population develop cancer because particular regions of their chromosomes are multiplied to form a triploid, tetraploid, or a polyploid of any higher order. To keep the description of our model simple, we only consider a triploid. As proven below, the population after chromosomal multiplication will deviate from HWE. We assume that a total of n cancer patients are randomly sampled from this population. Each sampled patient is a triploid at a particular aneuploid locus. We genotype all these patients at duplicated chromosomal segments with molecular markers, although the parental origin of chromosomal duplication is unknown. A phenotype that defines cancer is measured for all subjects. A model will be derived to distinguish between the genetic effects of alleles inherited from the maternal (M) and paternal (P) parents.
Chromosome Duplication
Triploid Model
 (1) AAA, including configurations AAA, duplicated from the left-side parent, and AAA, duplicated from the right-side parent, of configuration AA;
 (2) AAa, including configurations AAa, duplicated from the left-side parent of configuration Aa, and aAA, duplicated from the right-side parent of configuration aA;
 (3) Aaa, including configurations aaA, duplicated from the left-side parent of configuration aA, and Aaa, duplicated from the right-side parent of configuration Aa;
 (4) aaa, including configurations aaa, duplicated from the left-side parent, and aaa, duplicated from the right-side parent, of configuration aa.
Let p and q (p + q = 1) be the allele frequencies of A and a in the original population before chromosome duplication. For a natural population at HWE, genotype frequencies can be expressed as p^{2} for genotype AA, 2pq for genotype Aa, and q^{2} for genotype aa.
Theorem
For an HWE diploid population, chromosome duplication operating on particular loci can violate the equilibrium status of the population.
Thus, unless and , the duplicated population will be at Hardy-Weinberg disequilibrium.
This theorem shows that traditional HWE theory for population genetic studies will not be useful for cancer gene identification. Meanwhile, this theorem provides a foundation for deriving a statistical model to conduct genome-wide association studies of cancer.
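The statement of the theorem can be checked numerically. The following is a minimal sketch, not the authors' derivation. It assumes that a heterozygote Aa duplicates its A-bearing chromosome with probability `w` (a hypothetical parameter introduced here for illustration), and it takes the binomial expansion (p + q)^3 as the equilibrium reference for a triploid locus, which is also an assumption.

```python
def duplicated_frequencies(p, w=0.5):
    """Triploid genotype frequencies after duplicating one chromosome.

    p : frequency of allele A in the diploid population at HWE.
    w : assumed probability that a heterozygote duplicates its A allele
        (hypothetical parameter, not taken from the article).
    """
    q = 1.0 - p
    return {
        "AAA": p * p,                # AA can only become AAA
        "AAa": 2 * p * q * w,        # Aa with the A chromosome duplicated
        "Aaa": 2 * p * q * (1 - w),  # Aa with the a chromosome duplicated
        "aaa": q * q,                # aa can only become aaa
    }


def equilibrium_frequencies(freqs):
    """Binomial (p + q)^3 frequencies at the triploid allele frequency."""
    p3 = freqs["AAA"] + (2 / 3) * freqs["AAa"] + (1 / 3) * freqs["Aaa"]
    q3 = 1.0 - p3
    return {
        "AAA": p3 ** 3,
        "AAa": 3 * p3 ** 2 * q3,
        "Aaa": 3 * p3 * q3 ** 2,
        "aaa": q3 ** 3,
    }


observed = duplicated_frequencies(p=0.6)
expected = equilibrium_frequencies(observed)
deviation = {g: observed[g] - expected[g] for g in observed}
for g in observed:
    print(f"{g}: observed={observed[g]:.3f} expected={expected[g]:.3f}")
```

With p = 0.6 and w = 0.5, both homozygous classes are in excess relative to the binomial reference, which illustrates the kind of disequilibrium the theorem describes.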
Quantitative Genetic Parameters
Table 1. The changes of genotypes and genotype frequencies after chromosomal duplication.

| Nonduplicated genotype | Frequency | Duplication | Duplicated genotype | Frequency | Observation |
|---|---|---|---|---|---|
| AA | p^{2} | ⇒ | AAA | | n_{1} |
| Aa | 2pq | ⇒ | AAa | | n_{2} |
| | | | Aaa | | n_{3} |
| aa | q^{2} | ⇒ | aaa | | n_{4} |
Table 2. Genotypic values and proportions of different configurations of a triploid genotype at a duplicated gene.

| Duplicated Genotype | Configuration | Genetic Value | Duplication Rate |
|---|---|---|---|
| AAA | | | |
| AAa | | | |
| Aaa | | | |
| aaa | | | |
For each triploid genotype, the relative proportions of the two underlying configurations can be different, depending on the rate of the duplication of parent-specific chromosomes. Let u and 1 - u be the proportions of the duplication of allele A derived from the maternal and paternal parents, respectively. Similarly, let v and 1 - v be the proportions of the duplication of allele a derived from the maternal and paternal parents, respectively (Table 2). These proportions can be estimated from genotype data.
Estimation
which is derived from a polynomial likelihood. The EM algorithm is implemented to estimate the allele frequencies (p and q) and HWD coefficients from the triploid genotype observations of the aneuploid population sampled (Table 1). It is described as follows:
for allele a.
Here the unknown parameters are collected into a single vector, and the phenotypic trait of configuration j of triploid genotype k (k = 1, ..., 4; j = 1, 2) is assumed to follow a normal distribution with mean μ_{kj} and variance σ^{2}.
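Because the parental origin of the duplicated chromosome is not observed, the phenotypes within one triploid genotype form a mixture of two normal distributions with a common variance. The sketch below is a generic EM algorithm for such a two-component mixture. It is not the authors' closed-form solution; the function name `em_two_configurations`, the starting values, and the simulated data are assumptions made for illustration.

```python
import math
import random


def normal_pdf(y, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_configurations(y, n_iter=200):
    """EM for one triploid genotype whose two configurations are unobserved.

    Returns (w, mu1, mu2, var): the mixing proportion of configuration 1,
    the two configuration means, and the shared residual variance.
    """
    ys = sorted(y)
    n = len(ys)
    # Crude starting values: lower and upper quartiles as the two means.
    mu1, mu2 = ys[n // 4], ys[(3 * n) // 4]
    mean_all = sum(ys) / n
    var = sum((v - mean_all) ** 2 for v in ys) / n
    w = 0.5
    for _ in range(n_iter):
        # E step: posterior probability that each subject has configuration 1.
        post = []
        for v in y:
            a = w * normal_pdf(v, mu1, var)
            b = (1.0 - w) * normal_pdf(v, mu2, var)
            post.append(a / (a + b))
        # M step: closed-form updates of proportion, means, and variance.
        s1 = sum(post)
        s2 = n - s1
        w = s1 / n
        mu1 = sum(r * v for r, v in zip(post, y)) / s1
        mu2 = sum((1.0 - r) * v for r, v in zip(post, y)) / s2
        var = sum(r * (v - mu1) ** 2 + (1.0 - r) * (v - mu2) ** 2
                  for r, v in zip(post, y)) / n
    return w, mu1, mu2, var


# Illustrative data: 30% of subjects from configuration 1 (mean 0),
# 70% from configuration 2 (mean 3), unit variance.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) if rng.random() < 0.3 else rng.gauss(3.0, 1.0)
        for _ in range(2000)]
w_hat, mu1_hat, mu2_hat, var_hat = em_two_configurations(data)
print(w_hat, mu1_hat, mu2_hat, var_hat)
```

In the article's setting the mixing proportion would play the role of u or v, and the genotype frequencies would be estimated jointly with these quantities; the sketch isolates only the mixture step for a single genotype.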
Hypothesis Tests
under which genotype frequencies can be estimated from the estimated allele frequencies using equation (1). The log-likelihood ratio calculated under the null and alternative hypotheses follows a χ^{2}-distribution with 2 degrees of freedom. It is also of interest to test the two disequilibria separately. Under the null hypothesis H_{0}: D_{1} = 0, genotype frequencies are estimated using equation (1), but with a constraint P_{3} = p^{3}, in addition to the constraint P_{1} + P_{2} + P_{3} + P_{4} = 1. Similarly, genotype frequencies are estimated with a constraint P_{4} = q^{3} for testing whether D_{2} = 0.
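A likelihood ratio test of this kind can be sketched as follows. The example assumes that, under the null hypothesis, the four triploid genotype frequencies follow the binomial expansion (p + q)^3 with p estimated by allele counting; this is an illustrative stand-in for equation (1), which may differ in form. The alternative leaves the four frequencies unconstrained, so the difference in free parameters is 3 - 1 = 2, matching the 2 degrees of freedom stated above. The function name `hwd_likelihood_ratio` is hypothetical.

```python
import math


def hwd_likelihood_ratio(counts):
    """LR test of equilibrium from counts (n1, n2, n3, n4) of AAA, AAa, Aaa, aaa."""
    n = sum(counts)
    n1, n2, n3, n4 = counts
    # Null: allele frequency by allele counting, binomial genotype frequencies.
    p = (3 * n1 + 2 * n2 + n3) / (3 * n)
    q = 1.0 - p
    null = [p ** 3, 3 * p ** 2 * q, 3 * p * q ** 2, q ** 3]
    # Alternative: unconstrained multinomial proportions.
    alt = [c / n for c in counts]

    def loglik(probs):
        # Classes with zero count contribute nothing to the log-likelihood.
        return sum(c * math.log(pr) for c, pr in zip(counts, probs) if c > 0)

    lr = max(2.0 * (loglik(alt) - loglik(null)), 0.0)
    # The chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    p_value = math.exp(-lr / 2.0)
    return lr, p_value


lr_eq, p_eq = hwd_likelihood_ratio((125, 375, 375, 125))    # counts at equilibrium
lr_dis, p_dis = hwd_likelihood_ratio((360, 240, 240, 160))  # excess homozygotes
print(lr_eq, p_eq)
print(lr_dis, p_dis)
```

The separate tests of D_{1} = 0 and D_{2} = 0 would each impose a single constraint and would therefore need a different reference distribution than the joint test shown here.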
Whether the duplicated gene is significantly associated with cancer susceptibility can be tested using the null hypothesis μ_{kj} ≡ μ for k = 1, ..., 4; j = 1, 2. The additive effect and two types of dominance effects can be tested jointly or separately by formulating the relevant null hypotheses based on equations (14), (15), and (16). The imprinting effect and its interactions with additive and dominance effects can be tested by using the null hypotheses H_{0}: λ = 0, H_{0}: I_{aλ} = 0, H_{0}: I_{dλ} = 0, and H_{0}: I_{d'λ} = 0 constructed with equations (??), (18), (18), and (19), respectively.
The model can also be used to test the significance of the duplication rate for a parent-specific chromosome by formulating the null hypothesis H_{0}: u = 1 or H_{0}: v = 1. This information helps to understand the genetic structure and evolutionary process of cancer risk.
Results
Simulation studies were used to investigate the statistical properties of the model in terms of estimation precision, power, and false positive rates. We simulate a cancer population of triploids for a portion of a chromosome. The allele frequencies at a triploid locus are p = 0.6 and q = 0.4. The HWD coefficients at this locus are assumed to be D_{1} = 0.08 and D_{2} = 0.06. By assuming duplication rates of 0.3 and 0.4 for the two parental chromosomes, respectively, the distribution of the four triploid genotypes AAA, AAa, Aaa, and aaa can be simulated. The phenotypic values of cancer traits were simulated by summing the additive, dominance, imprinting, and interaction effects, given particular values, and the errors of measurement within each triploid genotype, which follow a normal distribution with variance scaled by a heritability of 0.05, 0.10, or 0.20. Sample sizes of 400, 800, and 2,000 are considered.
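The way heritability scales the error variance can be illustrated with a short sketch. It uses the standard relation H^2 = V_g / (V_g + V_e), so that V_e = V_g (1 - H^2) / H^2. The class frequencies and genotypic values below are illustrative numbers, not the values used in the article, and the function names are hypothetical.

```python
import random


def residual_variance(genetic_variance, heritability):
    """Error variance implied by H^2 = Vg / (Vg + Ve)."""
    return genetic_variance * (1.0 - heritability) / heritability


def simulate_phenotypes(freqs, values, heritability, n, seed=0):
    """Draw n (class index, phenotype) pairs.

    freqs  : frequencies of the genotype classes (must sum to 1).
    values : genotypic value of each class (illustrative numbers).
    """
    rng = random.Random(seed)
    mean = sum(f * v for f, v in zip(freqs, values))
    vg = sum(f * (v - mean) ** 2 for f, v in zip(freqs, values))
    sd = residual_variance(vg, heritability) ** 0.5
    classes = rng.choices(range(len(freqs)), weights=freqs, k=n)
    # Phenotype = genotypic value of the class + normal measurement error.
    return [(k, values[k] + rng.gauss(0.0, sd)) for k in classes]


freqs = [0.36, 0.24, 0.24, 0.16]   # assumed frequencies of AAA, AAa, Aaa, aaa
values = [0.8, 0.5, -0.3, -0.8]    # assumed genotypic values
sample = simulate_phenotypes(freqs, values, heritability=0.2, n=400)
print(sample[:3])
```

A full replication of the simulation would additionally split each genotype into its two configurations using the duplication rates and build the genotypic values from the additive, dominance, imprinting, and interaction effects.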
Table 3. The estimates of population genetic parameters (p, u, v) and quantitative genetic parameters (a, d, d', λ, I_{aλ}, I_{dλ}, I_{d'λ}) from simulated data with different sample size and heritability combinations.

| Sample Size | H^{2} | p | u | v | a | d | d' | λ | I_{aλ} | I_{dλ} | I_{d'λ} |
|---|---|---|---|---|---|---|---|---|---|---|---|
| True Value | | 0.6 | 0.3 | 0.4 | 0.8 | 0.5 | 0.4 | 0.5 | 0.4 | 0.5 | 0.3 |
| 400 | 0.05 | 0.5973 | 0.2974 | 0.3389 | 0.9378 | 0.7793 | 0.5928 | 0.7974 | 0.6869 | 0.9731 | 0.5860 |
| | | (0.0017) | (0.0149) | (0.0197) | (0.0468) | (0.1618) | (0.1052) | (0.0832) | (0.0590) | (0.2354) | (0.1218) |
| | 0.1 | 0.6014 | 0.2803 | 0.3783 | 0.9252 | 0.5270 | 0.3571 | 0.6732 | 0.5647 | 0.6423 | 0.5066 |
| | | (0.0016) | (0.0183) | (0.0190) | (0.0404) | (0.1548) | (0.0662) | (0.0664) | (0.0447) | (0.1854) | (0.0750) |
| | 0.2 | 0.6016 | 0.2873 | 0.4139 | 0.8333 | 0.5436 | 0.3354 | 0.5116 | 0.4308 | 0.4066 | 0.3433 |
| | | (0.0016) | (0.0161) | (0.0198) | (0.0274) | (0.0798) | (0.0554) | (0.0435) | (0.0294) | (0.1093) | (0.0481) |
| 800 | 0.05 | 0.5980 | 0.2463 | 0.3449 | 1.0740 | 0.6989 | 0.3363 | 0.8930 | 0.7510 | 0.6963 | 0.5266 |
| | | (0.0012) | (0.0154) | (0.0227) | (0.0576) | (0.2247) | (0.1128) | (0.0909) | (0.0626) | (0.2694) | (0.1153) |
| | 0.1 | 0.5993 | 0.2722 | 0.3783 | 0.8945 | 0.6552 | 0.4731 | 0.6004 | 0.5015 | 0.6375 | 0.5637 |
| | | (0.0013) | (0.0156) | (0.0210) | (0.0349) | (0.1199) | (0.0854) | (0.0606) | (0.0399) | (0.1565) | (0.0781) |
| | 0.2 | 0.5988 | 0.2809 | 0.3763 | 0.8756 | 0.4041 | 0.4281 | 0.5925 | 0.4799 | 0.3461 | 0.3254 |
| | | (0.0011) | (0.0166) | (0.0194) | (0.0275) | (0.0781) | (0.0444) | (0.0379) | (0.0253) | (0.1035) | (0.0466) |
| 2000 | 0.05 | 0.6011 | 0.2428 | 0.3656 | 1.0367 | 0.7602 | 0.3605 | 0.7556 | 0.6271 | 0.8159 | 0.4858 |
| | | (0.0007) | (0.0165) | (0.0209) | (0.0622) | (0.2001) | (0.0852) | (0.0995) | (0.0647) | (0.2416) | (0.1019) |
| | 0.1 | 0.6008 | 0.2812 | 0.3805 | 0.8417 | 0.6676 | 0.4135 | 0.4838 | 0.4217 | 0.6970 | 0.5141 |
| | | (0.0007) | (0.0161) | (0.0211) | (0.0320) | (0.1337) | (0.0546) | (0.0537) | (0.0370) | (0.1585) | (0.0622) |
| | 0.2 | 0.6001 | 0.2860 | 0.3771 | 0.8103 | 0.5607 | 0.4490 | 0.4614 | 0.3758 | 0.6301 | 0.3942 |
| | | (0.0008) | (0.0132) | (0.0185) | (0.0153) | (0.0511) | (0.0383) | (0.0297) | (0.0195) | (0.0699) | (0.0331) |
The power to detect the overall genetic effect and imprinting effect was investigated. In general, the model has great power for the identification of aneuploid loci causing cancer. To achieve adequate power for imprinting effect detection, a large sample size and/or large heritability is required. Overall, a sample size of 400 with a heritability of 0.2 can reach power of over 0.75 for the detection of imprinting effects. We also performed simulation studies to examine the false positive rates for detecting overall genetic effects and imprinting effects at aneuploid loci. In each case the false positive rates appear to be controlled below 5-10%.
Discussion
In the 100 years since Theodor Boveri hypothesized that mitotic defects resulting in tetraploidy promote oncogenesis [17], tremendous attention has been given to exploring the genetic cause of tumorigenesis. It has been partly established that aneuploidy has an effect on proliferation and survival of tumors [5]. The recent discovery of components of the mitotic checkpoint, as well as the realization that many of the classic tumour suppressors and oncogene products regulate mitotic progression, has renewed interest in the role of aneuploidy in tumorigenesis [10, 15, 16]. With the completion of the Human Genome Project and the HapMap project, there is a pressing need for the development of statistical models for estimating the genetic effect of aneuploid loci on cancer risk.
In this article, we present a statistical strategy for detecting the genetic control of cancer traits through genotyping aneuploids of cancer cells. The proposed model presents two novelties. First, it has for the first time integrated recent discoveries from cancer genetic studies with statistical principles, bringing the modeling of cancer gene identification to the frontier of cancer biology. The experimental design used is founded on biologically relevant hypotheses from which data can be collected in an effective way. The closed forms derived for the EM algorithm to estimate various parameters provide efficient computation for any data set. Second, the model capitalizes on traditional quantitative genetic theory, allowing the partition of overall genetic control into different components. In particular, we are able to estimate and test the effect of genetic imprinting on cancer risk [18, 19] and, thus, draw a detailed picture of genetic control triggered from different parental chromosomes. The model can also characterize the interactions of additive and dominant effects with imprinting effects, helping to gain better insight into the complexity of the genetic architecture of cancer.
We performed computer simulation to examine the statistical properties of the model. From the simulation results, an appropriate sample size can be determined for a cancer trait with a particular heritability. Analyses of model power and false positive rates support the potential usefulness of the model when practical data sets are available. Through a simple mathematical proof, we found that the Hardy-Weinberg equilibrium of an original population can be violated when some chromosomes are duplicated.
The idea of the model can be extended to several more complicated situations. First, the aneuploidy control of cancer may be derived from high-order aneuploids, such as tetraploids. A high-order polyploid not only contains more allelic combinations, but also involves a larger amount of missing data due to the duplication of different chromosomes with unknown parental origins. To model the tetraploidy control of cancer, a more sophisticated algorithm is required to obtain efficient estimates of parameters. Second, different aneuploid loci responsible for cancer traits may be associated in the duplicated population and interact in a coordinated manner. Modeling of multilocus associations and multilocus epistasis deserves further investigation, because this information can better explain the genetic variation of cancer than single loci. Third, other factors, such as sex, race, and lifestyle, also contribute to cancer. It is crucial to incorporate these factors and study the effects of each of them and their interactions with genes in tumorigenesis.
Conclusion
We have derived a new statistical model for identifying genetic loci that control quantitative phenotypes of aneuploidy-associated cancer through a genome-wide association study. We integrate quantitative genetic principles into the model, allowing the estimation of different types of genetic effects. The new model can generate a series of hypothesis tests about the genetic control mechanisms of cancer through aneuploid loci. Although our model was explored merely from a theoretical perspective, specific experiments could readily be launched to collect data according to the genetic design suggested. By analyzing such data, the new model should be able to uncover unique results, facilitating our understanding of how aneuploid processes are linked with cancer through genetic mediation.
Acknowledgements
We thank Dr. Justo Lorenzo Bermejo, Dr. George Heinze, Dr. Marek Kimmel, and Dr. Elizabeth Petty for their constructive comments, which helped to improve the manuscript. This work is supported by joint grant DMS/NIGMS-0540745 and a Penn State Cancer Institute Seed Grant.
References
 1. Clarke R, Ressom HW, Wang A, Xuan J, Liu MC, Gehan EA, Wang Y: The properties of high-dimensional data spaces: implications for exploring gene and protein expression data. Nature Reviews Cancer. 2008, 8: 184-194. 10.1038/nrc2294.
 2. Stephens M, Balding DJ: Bayesian statistical methods for genetic association studies. Nature Reviews Genetics. 2009, 10: 681-690. 10.1038/nrg2615.
 3. Cookson W, Liang L, Abecasis G, Moffatt M, Lathrop M: Mapping complex disease traits with global gene expression. Nature Reviews Genetics. 2009, 10: 184-194. 10.1038/nrg2537.
 4. Mackay TFC, Stone EA, Ayroles JF: The genetics of quantitative traits: challenges and prospects. Nature Reviews Genetics. 2009, 10: 565-577. 10.1038/nrg2612.
 5. Williams BR, Amon A: Aneuploidy: Cancer's fatal flaw?. Cancer Research. 2009, 69 (13): 5289-5291. 10.1158/0008-5472.CAN-09-0944.
 6. Hede K: Which came first? Studies clarify role of aneuploidy in cancer. Journal of National Cancer Institute. 2005, 97 (2): 87-89. 10.1093/jnci/97.2.87.
 7. Ganem N, Storchova Z, Pellman D: Tetraploidy, aneuploidy and cancer. Current Opinion in Genetics and Development. 2007, 17 (2): 157-162. 10.1016/j.gde.2007.02.011.
 8. Lengauer C, Kinzler KW, Vogelstein B: Genetic instabilities in human cancers. Nature. 1998, 396 (6712): 643-649. 10.1038/25292.
 9. Duesberg P, Rasnick D, Li R, Winters L, Rausch C, Hehlmann R: How aneuploidy may cause cancer and genetic instability. Anticancer Research. 1999, 19 (6A): 4887-4906.
 10. Hanks S, Rahman N: Aneuploidy-cancer predisposition syndromes: A new link between the mitotic spindle checkpoint and cancer. Cell Cycle. 2005, 4 (2): 225-227.
 11. Stock RP, Bialy H: The sigmoidal curve of cancer. Nature Biotechnology. 2003, 21: 13-14. 10.1038/nbt0103-13.
 12. Weaver BA, Cleveland DW: Does aneuploidy cause cancer?. Current Opinion in Cell Biology. 2006, 18 (6): 658-667. 10.1016/j.ceb.2006.10.002.
 13. Duesberg P, Li R, Fabarius A, Hehlmann R: The chromosomal basis of cancer. Cellular Oncology. 2005, 27 (5): 293-318.
 14. Duesberg P: Chromosomal chaos and cancer. Scientific American. 2007, 296 (5): 52-59. 10.1038/scientificamerican0507-52.
 15. Kops GJ, Weaver BA, Cleveland DW: On the road to cancer: aneuploidy and the mitotic checkpoint. Nature Reviews Cancer. 2005, 5 (10): 773-785. 10.1038/nrc1714.
 16. Suijkerbuijk SJ, Kops GJ: Preventing aneuploidy: The contribution of mitotic checkpoint proteins. BBA-Reviews on Cancer. 2008, 1786: 24-31.
 17. Maderspacher F: Theodor Boveri and the natural experiment. Current Biology. 2008, 18 (7): 279-286. 10.1016/j.cub.2008.02.061.
 18. Pulford DJ, Falls JG, Killian JK, Jirtle RL: Polymorphisms, genomic imprinting and cancer susceptibility. Mutation Research. 1999, 436: 59-67. 10.1016/S1383-5742(98)00018-0.
 19. Jirtle RL: Genomic imprinting and cancer. Experimental Cell Research. 1999, 248: 18-24. 10.1006/excr.1999.4453.
Prepublication history
The prepublication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2407/10/346/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.