Genetic variability in MCF-7 sublines: evidence of rapid genomic and RNA expression profile modifications

Background Both phenotypic and cytogenetic variability have been reported for clones of breast carcinoma cell lines but have not been comprehensively studied. Despite this, cell lines such as MCF-7 cells are extensively used as model systems. Methods In this work we documented, using CGH and RNA expression profiles, the genetic variability at the genomic and RNA expression levels of MCF-7 cells of different origins. Eight MCF-7 sublines collected from different sources were studied, as well as 3 subclones isolated from one of the sublines by limiting dilution. Results MCF-7 sublines showed important differences in copy number alteration (CNA) profiles. Overall numbers of events ranged from 28 to 41. The chromosomal regions involved varied greatly from one subline to another. A total of 62 chromosomal regions were affected by either gains or losses in the 11 sublines studied. We performed a phylogenetic analysis of CGH profiles using maximum parsimony in order to reconstruct the putative filiation of the 11 MCF-7 sublines. The phylogenetic tree obtained showed that the MCF-7 clade was characterized by a restricted set of 8 CNAs and that the most divergent subline occupied the position closest to the common ancestor. Expression profiles of 8 MCF-7 sublines were analyzed along with those of 19 unrelated breast cancer cell lines using home-made cDNA arrays comprising 720 genes. Hierarchical clustering analysis of the expression data showed that 7/8 MCF-7 sublines grouped together in a single cluster, while the remaining subline clustered with unrelated breast cancer cell lines. These data thus showed that MCF-7 sublines differed at both the genomic and phenotypic levels. Conclusions The analysis of CGH profiles of the parent subline and its three subclones supported the heteroclonal nature of MCF-7 cells. This strongly suggested that the genetic plasticity of MCF-7 cells was related to their intrinsic capacity to generate clonal heterogeneity.
We propose that MCF-7, and possibly the breast tumor it was derived from, evolved in a node like pattern, rather than according to a linear progression model. Due to their capacity to undergo rapid genetic changes MCF-7 cells could represent an interesting model for genetic evolution of breast tumors.


Background
Primary breast tumors are known for their elevated level of inter-tumor heterogeneity; however, an important body of data has brought evidence of intra-tumoral heterogeneity as well. Such evidence stems from cytogenetic studies which have shown that cytogenetically unrelated clones can be found in breast tumors [1]. These findings have been interpreted either as the result of genetic instability following loss of proper mitotic controls [2], or as the expression of the admixture of multiple genetically unrelated cellular clones [3]. Flow cytometry has been another way to address the question of intratumoral heterogeneity, showing that breast tumors correspond to an intricate admixture of tumor cells with different DNA contents (i.e. different ploidies) [4]. These findings were extended by Bonsing and coworkers [5], who showed that diploid and aneuploid cells, concurrently present in breast tumors, had a number of genetic anomalies in common. In fact, all the allelic imbalances observed in the diploid compartment were found in aneuploid cells. This was a strong indication of a direct filiation between diploid and aneuploid cells in breast tumors. Heterogeneity is thus a major problem in mammary carcinogenesis and has important clinical implications in terms of prognosis and therapy.
MCF-7 cells are the most commonly used model of estrogen receptor-positive breast cancer. This cell line was originally established in 1973 at the Michigan Cancer Foundation from a pleural effusion taken from a woman with metastatic breast cancer [6], and since then MCF-7 cells have been widely distributed in laboratories throughout the world, resulting in the production of different cellular stocks. Quite early in the history of MCF-7 cells, clonal variations were reported in the literature. Most of the reported differences concerned phenotypic traits such as estrogen responsiveness or ability to form tumors in syngeneic mice, but karyotypic differences were observed as well [7][8][9].
MCF-7 cells presented extensive aneuploidy with important variations in chromosome numbers ranging from 60 to 140 according to the variant examined. Other cytogenetic differences concerned the presence or absence of specific marker chromosomes. While loss of marker chromosomes seemed a rare event, occurrence of new aberrations was more common [8]. However, some doubt remained on the true origin of these differences, as some MCF-7 sublines corresponded to other cancer cells of unknown origin [10].
The available data suggested an elevated level of genetic instability in MCF-7 cells. The observed karyotypic differences could reflect changes in selective pressure due to different culture conditions. Alternatively, work by Resnicoff and coworkers [11] showed that, upon fractionation of MCF-7 cells on a Percoll gradient, it was possible to isolate six different subpopulations, one of which bore the capacity to regenerate all other cellular populations. These data suggested that MCF-7 cells contain a fraction of stem cells able to generate clonal variability. This was proposed as an explanation for the heterogeneity of this cell line and as a model for breast tumor heterogeneity.
In a previous work [12] we analyzed two sublines of MCF-7 cells by Comparative Genomic Hybridization (CGH) and found that they showed surprisingly different genomic profiles. Our data were concordant with those reported by Jones and coworkers [13]. We became interested in: (1) documenting the genetic variability, at both genomic and RNA expression levels, that exists among MCF-7 sublines of different origins; (2) retracing their evolutionary history and unraveling their filiation; (3) addressing the cause of this diversity, and whether it reflected their intrinsic capacity to generate clonal heterogeneity or resulted from local changes in culture conditions. The resulting information should help in understanding tumor heterogeneity.
To address these questions we collected 9 cell lines identified as MCF-7 variants. We also established 3 cell clones starting from one of the collected sublines. The different MCF-7 variants were compared at the genetic level using CGH as well as RNA expression profiling. CGH and RNA expression profiles were subjected to phylogenetic analyses to determine the degree of filiation between the different cell lines studied.

DNA and RNA purification
Genomic DNA and total RNA were isolated as previously described [14]. RNA integrity was controlled by denaturing formaldehyde agarose electrophoresis and checked by Northern blot, hybridizing the RNA with an oligonucleotide probe specific to the 28S rRNA.

Genetic analysis of the different sublines
All the cell lines used in this study have been haplotyped with a combination of 9 CA repeat microsatellite markers from the Généthon collection, respectively localized on chromosomes 1, 6 and 17: D1S2615, D1S2811, D1S2624, D6S310, D6S401, D6S460, D17S1855, D17S1865, D17S1604. Primers are described on the Généthon web site http://www.genlink.wustl.edu/genethon_frame/. PCR conditions and size analysis of the products were as described [15].

Comparative Genomic Hybridization
Metaphase preparation, genomic DNA labeling, CGH reaction and image analysis were as described [16].

Phylogenetic analysis on CGH data
The evolutionary history of cell lineages was reconstructed in a cladistic framework. Chromosomal bands were considered as characters, existing under three possible discrete character states: gain, loss and normal (i.e. no mutation). Transformations from one character state to another were equally weighted, and the normal state (i.e. tumor/normal hybridization ratio = 1) was considered ancestral. Phylogenetic trees were reconstructed under the maximum parsimony (MP) criterion, using the following hypotheses [17]: (1) all characters were considered as independent, i.e. events occurring at one band did not affect events occurring at another band; (2) they were unordered: it was possible to change directly from one state (either normal, amplified, or deleted) to a second one, without invoking the third one; (3) they were equally weighted: each change from one state to another had the same probability of occurrence. The MP approach has two main advantages: (1) it considers chromosome bands one by one, and integrates the CGH information available for each of them for all twelve cell lineages simultaneously; (2) it allows the chromosomal events that characterize the different groups of cell lineages evidenced on the most parsimonious trees to be traced a posteriori, and diagnostic events to be identified.
All analyses were conducted with PAUP* [18], version 4 beta 8, with heuristic MP searches based on 1000 random additions of cell lineages, with tree bisection-reconnection (TBR) branch swapping, and accelerated transformation (ACCTRAN) optimization of character states. To trace the character-state changes along the phylogenetic trees, we used the program MacClade [19], version 3.04. In order to evaluate whether CGH data were suitable for reconstructing the phylogeny of the cell lineages, and whether phylogenetic trees adequately represented them, the robustness of the different nodes was estimated independently by two different approaches. Bootstrap [20] was conducted with 1000 replicates of character resampling, and the highest bootstrap percentages (BP) defined the strongest nodes. The Bremer approach [21] measured the number of extra mutation events required to break the corresponding nodes, and the highest Bremer support indices (BSI) defined the most robust nodes.
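To make the parsimony criterion concrete, the following minimal Python sketch scores a fixed tree with Fitch's algorithm for unordered, equally weighted characters, rooting the tree on a normal genome. It is an illustration only and not the PAUP* procedure used in this study: the tree shapes, the lineage names A to D, the five-band character matrix and the function names are all hypothetical, and no tree search, ACCTRAN optimization, bootstrap or Bremer calculation is attempted.

```python
from typing import Dict, List, Set, Tuple, Union

# A tree is either a leaf name or a pair (left subtree, right subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_set(tree: Tree, states: Dict[str, str], changes: List[int]) -> Set[str]:
    """Return the Fitch state set of a subtree for one character.

    changes[0] is incremented each time the two child sets do not intersect,
    i.e. each time one state change must be assumed.
    """
    if isinstance(tree, str):
        return {states[tree]}
    left = fitch_set(tree[0], states, changes)
    right = fitch_set(tree[1], states, changes)
    common = left & right
    if common:
        return common
    changes[0] += 1
    return left | right


def tree_length(tree: Tree, matrix: Dict[str, str], root_state: str = "N") -> int:
    """Minimum number of changes over all characters (bands) on a given tree.

    States: N = normal, G = gain, L = loss. The root is a normal genome, so
    one extra change is counted when the root set excludes the normal state.
    """
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        states = {taxon: row[i] for taxon, row in matrix.items()}
        changes = [0]
        root_set = fitch_set(tree, states, changes)
        if root_state not in root_set:
            changes[0] += 1
        total += changes[0]
    return total


# Hypothetical data: four lineages scored at five bands.
matrix = {"A": "GLNNG", "B": "GLNLG", "C": "GLGLN", "D": "GNGLN"}
tree_1: Tree = ("A", ("B", ("C", "D")))
tree_2: Tree = (("A", "C"), ("B", "D"))

length_1 = tree_length(tree_1, matrix)
length_2 = tree_length(tree_2, matrix)
print(length_1, length_2)  # the shorter tree is preferred under MP
```

Under the maximum parsimony criterion the tree with the smaller length is retained; a heuristic search such as the one run in PAUP* repeats this scoring over many candidate topologies.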

Preparation and hybridization of cDNA arrays
Variations in gene expression levels were analyzed by large-scale measurement with home-made cDNA mini-arrays (7.5 × 9 cm; 720 human genes; 11 genes/cm²) produced in our facility (TAGC, University of Marseille Luminy). The spotted targets were PCR products amplified from control clones and IMAGE cDNA clones (IMAGE consortium, Hinxton, UK). Selected cDNA clones corresponded to identified genes positioned on chromosomes 1q and 17q. Information was gathered and cross-checked from different web-based databases such as genemap http://www.ncbi.nlm.nih.gov/genemap99/, Genecards http://genecards.weizmann.ac.il/, Genelynx http://www.genelynx.org/ or UCSC Genome http://genome.ucsc.edu/. PCR amplification and automatic spotting of PCR products onto the arrays (nylon Hybond-N+ membranes, Amersham Pharmacia Biotech; Little Chalfont, UK) were performed according to Bertucci and colleagues (1999). Each array was hybridized with a 33P-labeled probe synthesized by reverse transcribing 5 µg of total RNA for each sample [22]. Labeling of complex probes, hybridization and washing conditions were as described at http://tagc.univ-mrs.fr/pub/cancer. Arrays were exposed to phosphor-imaging plates and then scanned with a FUJI BAS 5000 beta imager (Raytest, Asnieres, France). Hybridization signals were quantified with the HDG Analyzer software (Genomic Solution, Ann Arbor, MI, USA), by integrating all spot pixel intensities and removing a spot background value determined in the neighboring area.

Clustering analysis of gene expression data
Data display and analysis were performed using Excel software (Microsoft, Redmond, WA, USA). Intensity values were adjusted by a normalization step based on the DNA quantification of each spot and the sum of intensities detected in each experiment. Expression profiles were analyzed by hierarchical clustering using the Cluster program developed by Eisen and colleagues [23] and represented as a cladogram using the TreeView software.

MCF-7-MVLN cells had been stably transfected with an ERE-luciferase construct and selected for gentamycin resistance, while both MCF-7-MVLN-6ms7 and MCF-7-MVLN-6ms8 were produced by long-term exposure of MCF-7-MVLN cells to 200 nM OH-TAM. We also isolated cell clones from MCF-7-R using limiting dilution. Three clones, MCF-7-R-F3, MCF-7-R-D4 and MCF-7-R-G1, were selected for further studies. This allowed us to verify that MCF-7 cells showed intrapopulational heterogeneity.
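The normalization and clustering steps described under "Clustering analysis of gene expression data" can be sketched in a few lines of plain Python. This is a hedged illustration, not the Excel and Cluster workflow itself: the exact normalization formula is not given in the text, so dividing each spot by its spotted DNA amount and then by the experiment total is an assumption, the average linkage shown is the pairwise (UPGMA) form, and the three profiles, their values and all function names are invented for the example.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List


def normalize(intensities: List[float], dna_amounts: List[float]) -> List[float]:
    """Assumed scheme: divide by spotted DNA, then scale the experiment to sum 1."""
    per_dna = [v / d for v, d in zip(intensities, dna_amounts)]
    total = sum(per_dna)
    return [v / total for v in per_dna]


def centered_correlation(x: List[float], y: List[float]) -> float:
    """Pearson correlation (the 'correlation centered' similarity metric)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined for a constant profile


def average_linkage(profiles: Dict[str, List[float]]):
    """Agglomerative clustering with distance = 1 - correlation.

    Returns the dendrogram as nested tuples of cell line names.
    """
    leaf_dist = {
        frozenset((a, b)): 1.0 - centered_correlation(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }
    clusters = [(name, [name]) for name in profiles]  # (subtree, member leaves)
    while len(clusters) > 1:
        best = None
        for (i, (_, mi)), (j, (_, mj)) in combinations(enumerate(clusters), 2):
            # Mean of all leaf-to-leaf distances between the two clusters.
            d = sum(leaf_dist[frozenset((a, b))] for a in mi for b in mj)
            d /= len(mi) * len(mj)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical raw intensities for four spots in three cell lines.
dna = [1.0, 1.0, 1.0, 1.0]
profiles = {
    "line_A": normalize([10, 20, 30, 40], dna),
    "line_B": normalize([12, 22, 29, 41], dna),
    "line_C": normalize([40, 30, 20, 10], dna),
}
dendrogram = average_linkage(profiles)
print(dendrogram)
```

In this toy case the two similar profiles are joined first and the dissimilar one is attached last, which is the behavior expected for a subline that clusters away from its relatives.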

Common genetic origin of MCF-7 variants
Available information on the history of the different sublines was not sufficient to rebuild lineages. It was, therefore, important to ascertain that all the tested sublines had a common genetic origin. To this end, allelotypes at 9 polymorphic microsatellite markers located on 3 chromosomal arms were determined. Eight of the 9 sublines had identical haplotypes, while the doxorubicin-resistant variant presented divergent allelic profiles at all markers analyzed. This variant was therefore excluded from the study (data not shown).
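The identity check described above amounts to comparing each subline's alleles, marker by marker, against a reference subline. A minimal sketch follows; the marker names are taken from the Methods, but the allele sizes, the subline labels and the helper name `divergent_lines` are hypothetical and only serve the illustration.

```python
from typing import Dict, List, Tuple

# Genotype: marker name -> pair of allele sizes in base pairs (sorted).
Genotype = Dict[str, Tuple[int, int]]


def divergent_lines(genotypes: Dict[str, Genotype], reference: str) -> Dict[str, List[str]]:
    """Return, for each line that differs from the reference, the mismatching markers."""
    ref = genotypes[reference]
    result: Dict[str, List[str]] = {}
    for line, alleles in genotypes.items():
        mismatches = [marker for marker in ref if alleles.get(marker) != ref[marker]]
        if mismatches:
            result[line] = mismatches
    return result


# Hypothetical allele sizes for three of the nine markers.
genotypes = {
    "subline_1": {"D1S2615": (180, 184), "D6S310": (210, 210), "D17S1855": (150, 156)},
    "subline_2": {"D1S2615": (180, 184), "D6S310": (210, 210), "D17S1855": (150, 156)},
    "dox_resistant": {"D1S2615": (176, 182), "D6S310": (204, 212), "D17S1855": (148, 152)},
}
divergent = divergent_lines(genotypes, reference="subline_1")
print(divergent)
```

A line that mismatches at every marker, as the doxorubicin-resistant variant did here, is unlikely to share the genetic origin of the reference.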

CGH analysis
Patterns of gains and losses shown by the different MCF-7 variants were highly diverse (Table 1 and Figure 1). The number of events ranged from 28 (MCF-7-ATCC) to 41 (MCF-7-MG) and, on average, losses were more frequent (21) than gains (15). Only 9 events (6 losses, 3 gains) were shared by the 11 cell lines (Figure 1). This small number of common events could in part be attributed to MCF-7-ATCC, which presented the most divergent CGH pattern. Out of the 28 gains or losses this subline displayed, 11 (6 losses, 5 gains) were specific to MCF-7-ATCC cells. It was noticeable that the sizes of regions of losses or gains varied according to the subline. This was particularly striking for losses on 16q or gains at 3q or 5q (Figure 1). Generally, regions of gains tended to be more heterogeneous in size and occurrence than losses.
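Counting shared and subline-specific events, as done above, is a set operation once each CGH profile is written as a set of events. The sketch below assumes that events are compared as exact region labels, which ignores the differences in region size noted in the text; the three profiles and their contents are invented for illustration and do not reproduce Table 1.

```python
from typing import Dict, Set


def shared_events(profiles: Dict[str, Set[str]]) -> Set[str]:
    """Events present in every profile."""
    return set.intersection(*profiles.values())


def specific_events(profiles: Dict[str, Set[str]], line: str) -> Set[str]:
    """Events found in one line and in no other."""
    others = set().union(*(ev for name, ev in profiles.items() if name != line))
    return profiles[line] - others


# Hypothetical profiles; event labels are illustrative only.
cgh_profiles = {
    "subline_1": {"loss 18q12-q23", "gain 17q22-q24", "gain 3q26"},
    "subline_2": {"loss 18q12-q23", "gain 17q22-q24", "loss 6q25-q27"},
    "subline_3": {"loss 18q12-q23", "gain 17q22-q24", "gain 3q26", "gain 15q26"},
}
common = shared_events(cgh_profiles)
only_in_3 = specific_events(cgh_profiles, "subline_3")
print(sorted(common), sorted(only_in_3))
```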

Phylogenetic analysis
The diversity of CGH patterns illustrates the genomic plasticity of MCF-7 cells and their capacity to acquire copy number aberrations. It was thus interesting to verify whether it was possible to reconstruct the phylogeny of the MCF-7 variants studied. This question is directly related to those classically addressed in evolutionary biology, where different species are ordered and hierarchized according to morphological and/or molecular characters. Hence, computational methods developed in systematics represented interesting tools to address the problem. We chose to apply a character-based approach called cladistics under the maximum parsimony (MP) criterion, in which different sublines were considered as taxa and copy number changes at every chromosomal band as characters. We favored maximum parsimony because it allows the different taxa to be ordered and a phylogenetic tree requiring the fewest changes to be constructed. In such a tree each cell line (or biological object) is represented as a leaf, while nodes correspond to a collection of inferred characters encountered in hypothetical ancestors. A supplementary advantage of maximum parsimony is that it allows the identification of diagnostic events characterizing groups of cell lines. Each chromosomal band was considered to exist under three discrete states: normal, loss, gain.

Figure 2
CGH profiles of MCF-7-R cells and its three subclones. Events were ordered from left to right for gains and from right to left for losses. The relative order was (1) MCF-7-R, (2) MCF-7-R-D4, (3) MCF-7-R-G1, (4) MCF-7-R-F3. Circled events were specific to daughter clones. Boxed events correspond to gains or losses found only in the mother line (bold line) or in the mother and one or two subclones (dotted boxes).
In the model we applied here, transformations from one character state to another were equally weighted. Although this model did not perfectly match CGH observations, we adopted it as the most workable approximation. As a matter of fact, cytogenetic bands vary greatly in size and a number of them are below the resolution limit of CGH. However, bands are the only existing subdivision of chromosomal arms, and it was not possible to base our analysis on chromosomal arms as units because they would have provided an insufficient number of characters.
The maximum parsimony analysis was done twice. In the first analysis we included only the 11 MCF-7 sublines and defined a normal genome as the origin or root of our putative tree. In the second we included the doxorubicin-resistant cell line. Given its different genetic origin, it was interesting to check how this cell line was positioned relative to bona fide MCF-7 variants in the phylogenetic tree. Furthermore, the order of MCF-7-R and its subclones MCF-7-R-G1, D4 and F3, as well as that of MCF-7-MVLN and its tamoxifen-resistant offshoots MCF-7-6ms7 and 6ms8, were important indications of the reliability of the phylogenetic reconstruction method used. Figure 3 shows

Diagnostic characters
Diagnostic characters were identified using the analysis in which only certified MCF-7 sublines had been included. Characters were considered as diagnostic for a given clade on the corresponding cladogram when they occurred once and only once during the evolution of the 11 MCF-7 variants. Events (losses or gains) selected as diagnostic characters corresponded to minimal consensus regions. Numbers of characters gradually added up when going down the tree. As shown in Table 2, 8 events (5 losses and 3 gains) were identified as diagnostic characters of the MCF-7 clade, since they were present in all the sublines, including MCF-7-ATCC. The number of diagnostic characters rose to 20 (9 corresponding to MCF-7-ATCC and 11 specific to R and its descendants) when MCF-7-R was taken as a starting point, 22 with MCF-7-MG and 25 with MCF-7-MVLN, which is an endpoint on this tree.

Figure 3
Phylogenetic tree describing the relationships between the MCF-7 sublines. The root was arbitrarily defined as corresponding to a genome devoid of any CNA (normal genome). The doxorubicin resistant line was also included in the analysis. Since it did not belong to the MCF-7 group it qualified as a potential outgroup and was indeed positioned as such by the analysis. This tree is a consensus tree corresponding to the 3 most parsimonious trees identified. It is 711 mutations long. Values represented at the nodes correspond to bootstrap percentages (top) and Bremer support indices (bottom). These values measure the robustness of the nodes.

Figure 4B). Although the relative order found in this analysis is not identical to the one found in the first analysis, results were in accord, confirming the divergence of MCF-7-ATCC cells.

Discussion
It is generally believed that divergence in cancer cell lines is the consequence of differences in culture conditions, which change the selective pressure and, thus, favor the selection of new genomic anomalies. If this situation extends over a large number of cell passages, it will lead to important differences between cellular stocks. The level of divergence can be directly related to that of genetic instability, and breast cancer cell lines seem particularly prone to it. Evidence for this can be found in recent work by Davidson and colleagues [12] and Kytola and colleagues [26], who studied breast cancer cell lines using 24-color karyotyping or SKY. Seven cell lines were studied by both groups and, for 3/7, the reported data presented extensive differences. Interestingly, MCF-7 cells were the most divergent in both studies, adding further evidence to existing data on phenotypic or karyotypic variations in this cell line. MCF-7 cells of different origins are characterized by their variable chromosome numbers, which range from 55 to 90. Noticeably, some subsets present a bimodal distribution with a first peak at 70 chromosomes and a second one at 130 [8], indicating the coexistence of two cellular subpopulations, one of which had undergone endoreduplication.

Table 2: Diagnostic characters identified in the main nodes of the MCF-7 phylogenetic tree. Characters specific to each node (whose occurrence has been associated with the emergence of the corresponding branch) are presented in bold type. Events in italics correspond to characters passed on from ancestors.

MCF-7 clade | MCF-7-R | MCF-7-MG | MCF-7-MVLN
Loss 1p32-p36 | Loss 1p31-p36 | Loss 1p31-p36 | Loss 1p31-p36
- | Gain 1p13 | Gain 1p13 | Gain 1p13
- | Gain 1q31 | Gain 1q31 | Gain 1q31
- | Loss 11q42-q44 | Loss 11q42-q44 | Loss 11q42-q44
Loss 2q36-q37 | Loss 2q36-q37 | Loss 2q36-q37 | Loss 2q36-q37
- | Gain 3q26 | Gain 3q26 | Gain 3q26
- | - | - | Loss 5q33
- | - | Loss 6q25-q27 | Loss 6q25-q27
- | Gain 7q22 | Gain 7q22 | Gain 7q22
Loss 8p11-p23 | Loss 8p11-p23 | Loss 8p11-p23 | Loss 8p11-p23
Gain 8q22-q23 | Gain 8q22-q23 | Gain 8q22-q23 | Gain 8q22-q23
- | Loss 11q23-q25 | Loss 11q23-q25 | Loss 11q23-q25
- | Loss 12p13 | Loss 12p13 | Loss 12p13
- | Gain 12q15-q21 | Gain 12q15-q21 | Gain 12q15-q21
- | Loss 13q31-q34 | Loss 13q31-q34 | Loss 13q31-q34
Gain 14q21-q23 | Gain 14q21-q24 | Gain 14q21-q24 | Gain 14q21-q24
- | - | - | Gain 15q26
- | - | - | Gain 16p11
- | Loss 17p11-p13 | Loss 17p11-p13 | Loss 17p11-p13
Gain 17q22-q24 | Gain 17q22-q24 | Gain 17q22-q24 | Gain 17q22-q24
Loss 18q12-q23 | Loss 18q12-q23 | Loss 18q12-q23 | Loss 18q12-q23
Loss 19p13-q13 | Loss 19p13-q13 | Loss 19p13-q13 | Loss 19p13-q13
- | - | Loss 20p11-p13 | Loss 20p11-p13
- | Gain 20q12-q13 | Gain 20q12-q13 | Gain 20q12-q13
- | Loss 21p13-q22 | Loss 21p13-q22 | Loss 21p13-q22

Figure 4
Clustering analysis was done on raw quantification results, which were subjected only to a scaling step and not to ratio calculation. Parameters used in the analysis were Hierarchically Cluster Axes for Genes and Arrays: cluster, and similarity metric: correlation centered, with average linkage clustering. The dendrogram on top of the diagram represents cell lines ordered according to their degree of similarity. Complete datasets can be found at http://www.montp.inserm.fr/EMI0229/download
Data presented here document that different MCF-7 variants underwent divergence at both the genomic and the RNA expression levels. Furthermore, they indicate that this can occur rapidly according to the MCF-7 variant considered. All the MCF-7 variants studied here showed extensive differences in their CGH profiles. These differences affected the number of regions of either losses or gains, which ranged from 28 in MCF-7-ATCC to 41 in MCF-7-MG, as well as the size of the regions involved. Remarkably, closely related sublines such as MCF-7-R and its 3 daughter clones MCF-7-R-D4, MCF-7-R-F3 and MCF-7-R-G1 presented variations in their CGH profiles as well.
Daughter cells presented aberrations which were absent in the mother subline and, less expectedly, had lost anomalies present in the mother line. Furthermore, sister clones showed different sets of anomalies, indicating that these cells bore the capacity to diverge over a limited number of cell generations, even when kept in identical culture conditions. It is questionable whether this rapid upsurge of anomalies fits a linear progression model, where mutations are supposed to occur sequentially and be retained due to positive selection. We think it more plausible that the differences shown by the 3 subclones are related to the oligoclonal nature of MCF-7-R parent cells. In this view, anomalies found in the subclones preexisted in MCF-7-R cells and were brought to light by cell cloning. In comparison, MCF-7-MVLN and its two tamoxifen-resistant derivatives MCF-7-MVLN-6ms7 and MCF-7-MVLN-6ms8 were less divergent. MCF-7-MVLN corresponds to MCF-7 cells stably transfected with an ERE-luciferase construct that went through a gentamycin selection process. This could have led to the loss of the preexisting genetic heterogeneity. We propose that MCF-7 cells contain an undetermined number of coexisting clones, of which one (or several) possesses stem clone potential and is responsible for the genetic oligoclonality.
We chose the maximum parsimony approach in a cladistic framework because it is a character-based classification method and, as such, was considered to be best adapted to meet our goals [35]. We reconstructed the phylogeny of the MCF-7 clade and, interestingly, MCF-7-ATCC, which was the most divergent MCF-7 subline in our study, was positioned closest to the common ancestor. MCF-7-R came in second, positioned as the ancestor of all other MCF-7 sublines. Out of the total of 62 CNAs present in all the sublines tested, only 8 were selected as diagnostic of the MCF-7 clade. This means that this set of 8 events is shared by all the MCF-7 cells tested here, and the original tumor possibly developed upon them. Thus, according to this phylogenetic tree, MCF-7-ATCC and MCF-7-R, which bear 28 and 34 CNAs respectively, evolved from a common node. The robustness of these results was reinforced by bootstrap and Bremer analyses.
Given the extensive differences observed at the genomic level, we were interested in comparing the different MCF-7 sublines at the transcriptome level. Our RNA expression profiling results confirmed the divergent position of MCF-7-ATCC cells, which clustered at some distance from the other MCF-7 sublines. It thus appears that MCF-7 sublines can show substantial differences at both the genomic and RNA expression levels, and this strongly suggests that the genomic differences could translate into phenotypic differences of possibly equivalent importance. MCF-7 cells are the most commonly used model for hormone-responsive breast cancer, and there is generally little knowledge concerning the variant used. Our data indicate that this may be of some importance, given the level of genetic variability these cells show and the rapidity with which they evolve.

Conclusions
In conclusion, we propose that MCF-7 cells could represent an interesting model for the genetic evolution of a subset of breast tumors. Breast tumors are prone to chromosomal instability and frequently show cytogenetic oligoclonality [1]. While some cancers were shown to fit the linear progression model, in which each step corresponded to the occurrence of an additional anomaly [36], other data brought evidence of more complex molecular evolution schemes [37]. This latter study compared CGH patterns of matched sets of primary breast tumors and asynchronous metastases. A number of metastases fitted the linear progression model, but it was noticeable that some presented very divergent sets of anomalies compared to their matched primary tumor. Only a limited set of aberrations (in some cases none detectable) were shared. The authors proposed the existence of a common early stem clone which diverged, evolved independently and ultimately led to the formation of tumors at different locations. This scheme is very similar to what we observed in MCF-7 cells when the ATCC subline was compared to more distant offshoots of MCF-7-R. This leads us to propose that the capacity to generate clonal heterogeneity could represent an important selective advantage in some cancers and lead to aggressive and metastatic forms of the disease.