Transcription profiles of non-immortalized breast cancer cell lines

Background: Searches for differentially expressed genes in tumours have made extensive use of array technology. Most samples have been obtained from tumour biopsies or from established tumour-derived cell lines. Here we compare cultures of non-immortalized breast cancer cells, normal non-immortalized breast cells and immortalized normal and breast cancer cells to identify which elements of a defined set of well-known cancer-related genes are differentially expressed.
Methods: Cultures of cells from pleural effusions or ascitic fluids from breast cancer patients (MSSMs) were used in addition to commercially-available normal breast epithelial cells (HMECs), established breast cancer cell lines (T-est) and established normal breast cells (N-est). The Atlas Human Cancer 1.2 cDNA expression array was employed. The data obtained were analysed using widely-available statistical and clustering software and further validated through real-time PCR.
Results: According to Significance Analysis of Microarray (SAM) and AtlasImage software, 48 genes differed at least 2-fold in adjusted intensities between HMECs and MSSMs (p < 0.01). Some of these genes have already been directly linked with breast cancer, metastasis and malignant progression, whilst others encode receptors linked to signal transduction pathways or are otherwise related to cell proliferation. Fifty genes showed at least a 2.5-fold difference between MSSMs and T-est cells according to AtlasImage, 2-fold according to SAM. Most of these were classified as genes related to metabolism and cell communication.
Conclusion: The expression profiles of 1176 genes were determined in finite life-span cultures of metastatic breast cancer cells and of normal breast cells. Significant differences were detected between the finite life-span breast cancer cell cultures and the established breast cancer cell lines. These data suggest caution in extrapolating information from established lines for application to clinical cancer research.


Background
The search for differentially expressed genes in tumours has made extensive use of array technology. Most studies have involved tumour biopsy samples or established tumour-derived cell lines [1]. Differentially expressed genes may help to identify tumours with high metastatic potential and pathways that might be therapeutic targets.
As noted by Dairkee et al. [2], one drawback of using established cell lines is that the process of immortalization in vitro can bias the expression profile when compared to native tissues. Another problem is that a number of widely-used cell lines have sub-lines that differ in their karyotypes and RNA expression levels [3][4][5]. Biopsies from tumours, unless they are microdissected, contain heterogeneous cell types. A molecular signature (or expression profile) of the whole tumour can be misleading since it is a composite of all the cells, normal and malignant, present in it. Although tumours may consist of several populations of cells that deviate slightly from the clonal strain of origin, metastases may involve only the subline(s) with specific genetic changes that favour metastatic behaviour. Cultures from freshly-derived ascites or pleural effusions may therefore be relatively homogeneous before they acquire the further mutations that immortalize cell lines with unlimited growth potential.
In this communication we report which genes, within a defined set of well-known cancer-related genes, are differentially expressed in freshly-derived breast cancer cell cultures compared with primary normal, established normal and breast cancer cultures. We compared expression patterns in cultures of finite life-span cells from pleural effusions or ascites of breast cancer patients (Mount Sinai School of Medicine: MSSM) with commercially-available finite life-span normal breast epithelial cells (HMECs). To ensure comparability, cultures were used at similar passage numbers and the karyotypes were analyzed. We also compared the expression profiles of some of the most widely-used established breast cancer cell lines and three putatively normal established breast cell lines.

Cultures of finite life-span cells
a) HMECs: Different batches of normal human mammary epithelial cell cultures were obtained (Biowhittaker Molecular Applications Inc., NJ). HMEC specimens were numbered serially and maintained in MEGM (Clonetics, MD) supplemented with 2.5% FCS (Gibco Invitrogen, CA).
b) MSSMs: Breast cancer cells obtained from ascitic fluids or pleural effusions of patients with breast cancer were seeded and maintained in our laboratory. They were designated MSSM 3 through MSSM 14 and maintained in MEGM supplemented with 2.5% of the corresponding original fluid when available and 2.5% FCS (Gibco Invitrogen, CA). The full characterization of these cultures will be published elsewhere (manuscript in preparation).
HMECs were passaged only twice or thrice in our laboratory. MSSMs were harvested at the 5th or 6th passage after initial plating. All cell cultures were harvested 48-72 h after plating at about 80% confluence. Sources and markers are given in Table 1.

Arrays
The Atlas Human Cancer 1.2 cDNA expression array (Clontech, CA) is a nylon membrane printed with 200-600 bp fragments of 1176 characterized genes involved in cancer, 9 housekeeping genes and 6 negative controls. RNAs were extracted and labelled with the Atlas pure total RNA labelling system and hybridized according to the manufacturer's instructions.
All the cell lines used for the arrays (9 HMECs, 10 MSSMs, 3 N-est and 7 T-est) were probed twice in separate assays. The accuracy of the duplicates was assessed by Pearson's correlation coefficient based on the adjusted intensities of all the genes spotted on the membrane, which ranged from 0.93 to 0.99.
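The duplicate check described above can be illustrated with a minimal Python sketch that computes Pearson's correlation coefficient for two hybridizations of the same sample. The intensity values are hypothetical and the function name `pearson` is ours; the sketch only shows the calculation and is not the routine used by any of the software packages mentioned in this paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Hypothetical adjusted intensities for the same five spots in two assays
replicate_1 = [1200.0, 560.0, 8300.0, 150.0, 2400.0]
replicate_2 = [1100.0, 610.0, 8000.0, 200.0, 2600.0]
r = pearson(replicate_1, replicate_2)
```

In practice the two input vectors would hold the adjusted intensities of all the genes spotted on the membrane, one vector per hybridization.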
Hybridizations with 30 µg of total RNA were performed according to the manufacturer's instructions. The hybridized membranes were exposed to a phosphorimager screen and were read at 100 µm resolution in a Storm Phosphorimaging system (Molecular Dynamics, CA). The scanned images were transformed to TIFF files with a PC bit order and then aligned and analyzed using AtlasImage 2.01 software (Clontech, CA). To average or compare the samples, the adjusted intensity signal was normalized using the global normalization mode featured in the software. We report only (a) those genes with significant (p < 0.01) differential expression of 2-fold or more, and (b) genes that were undefined for all the cell lines belonging to one type of sample, but were detected in other types with a difference at least equivalent to one background (540 units in intensity). (Undefined genes are those for which the intensity was below the signal threshold).
The AtlasImage software compares only two samples at a time. When we used it to determine the differences between cell classes, we first averaged the cell lines in the four classes (HMEC, MSSM, N-est and T-est) and then performed the comparisons as instructed by the manufacturer.
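The class-level comparison can be sketched as follows, assuming the adjusted intensities have already been normalized. The sketch averages each gene within a class and keeps genes whose class means differ at least 2-fold and by at least one background (540 units). The gene names, the intensities and the function `differential_genes` are hypothetical; the significance test (p < 0.01) and the handling of undefined genes are omitted, and AtlasImage's own ratio calculation is not reproduced.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def differential_genes(class_a, class_b, min_ratio=2.0, min_diff=540.0):
    """Return {gene: signed fold} for genes passing both filters.

    class_a and class_b map gene name -> list of adjusted intensities,
    one value per cell line. The fold is positive when the gene is
    higher in class_b and negative when it is lower.
    """
    selected = {}
    for gene in sorted(class_a.keys() & class_b.keys()):
        mean_a, mean_b = mean(class_a[gene]), mean(class_b[gene])
        low, high = min(mean_a, mean_b), max(mean_a, mean_b)
        if low <= 0:
            continue  # ratio undefined; such genes need separate handling
        ratio = high / low
        if ratio >= min_ratio and (high - low) >= min_diff:
            selected[gene] = ratio if mean_b > mean_a else -ratio
    return selected

# Hypothetical intensities for three genes in three cell lines per class
hmec = {"GENE_A": [1000.0, 1200.0, 1100.0],
        "GENE_B": [300.0, 320.0, 310.0],
        "GENE_C": [5000.0, 5200.0, 4800.0]}
mssm = {"GENE_A": [3000.0, 3500.0, 3400.0],
        "GENE_B": [700.0, 720.0, 710.0],
        "GENE_C": [5100.0, 4900.0, 5000.0]}
result = differential_genes(hmec, mssm)
```

In this toy example only GENE_A passes both filters. GENE_B differs more than 2-fold but by less than one background, which is the situation the intensity-difference criterion is meant to exclude.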

Quantitative real-time PCR (Q-PCR)
To validate the results of the cDNA array experiments, some of the genes found to be differentially expressed were further examined by real-time PCR in 10 HMECs, 9 MSSMs, 3 N-est and 8 T-est cell lines. Five µg of total RNA (corresponding to about 100 ng of mRNA) were reverse-transcribed with oligo(dT) (SuperScript II system, Invitrogen, CA) in a 20 µl reaction volume, and after 125-fold dilution, 1.25 µl were used for a 40-cycle PCR on an ABI Prism 7900 thermal cycler. The reaction was carried out in a 384-well plate with a QuantiTect SYBR Green PCR kit (Qiagen Inc, CA) at an annealing temperature of 63°C and detection at 2-5°C below the Tm of the product as determined from its dissociation curve. Product size was confirmed by agarose gel electrophoresis. The efficiency of each pair of primers for amplification was determined and expression of each gene relative to G3PDH was assessed by the program Qgene [10]. Primers were designed using the program PrimerQuest or Primer3, unless otherwise stated. Primer sequences, lengths and Tms of the products are given in Additional file 4. Samples were tested twice in triplicate. Pearson correlation coefficients for the duplicate Q-PCR results ranged between 0.89 and 0.99.
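For orientation, the arithmetic behind these figures can be sketched in Python. The first part derives the amount of input RNA represented in each PCR from the volumes given above. The second part shows a commonly used efficiency-corrected expression of a target gene relative to a reference gene such as G3PDH; it is an assumption that this form approximates the Qgene calculation, which is not reproduced here. The Ct values and the function name `normalized_expression` are hypothetical.

```python
# Input RNA represented in one PCR, from the volumes given in the text
TOTAL_RNA_NG = 5000.0   # 5 µg of total RNA in the reverse transcription
RT_VOLUME_UL = 20.0     # reverse-transcription reaction volume
DILUTION = 125.0        # fold dilution of the cDNA
TEMPLATE_UL = 1.25      # volume of diluted cDNA per PCR
rna_equivalent_ng = TOTAL_RNA_NG / RT_VOLUME_UL / DILUTION * TEMPLATE_UL

def normalized_expression(ct_target, ct_reference,
                          eff_target=2.0, eff_reference=2.0):
    """Efficiency-corrected expression of a target relative to a reference.

    Efficiencies are amplification factors per cycle (2.0 = perfect
    doubling). A lower Ct for the target gives a higher value.
    """
    if eff_target <= 1.0 or eff_reference <= 1.0:
        raise ValueError("amplification factors must be greater than 1")
    return (eff_reference ** ct_reference) / (eff_target ** ct_target)

# Hypothetical Ct values for one gene in two cultures (reference Ct = 18)
expr_culture_1 = normalized_expression(ct_target=24.0, ct_reference=18.0)
expr_culture_2 = normalized_expression(ct_target=22.0, ct_reference=18.0)
fold_change = expr_culture_2 / expr_culture_1
```

With the volumes stated above, each PCR represents 2.5 ng of the original total RNA. Using the measured efficiency of each primer pair instead of the default of 2.0 is what distinguishes this form from the simple 2^(-ΔCt) calculation.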

Overall gene expression and class prediction
All the MSSM cultures displayed similar growth rates and had no or minimal chromosomal changes (data not shown). Table 1 shows the main characteristics of these cells.
Each cell line was probed twice. Averages of the duplicates after normalization of the adjusted intensities (as described in the Atlas manual and briefly in Materials and Methods) were used to obtain the expression values for further analysis. The overall gene expression profile, as determined by Pearson's correlation coefficient, discriminates between HMECs and cells derived from metastasis of breast cancer (MSSMs). As seen in Table 2 (and detailed in Additional file 1) the correlation within cell classes was > 0.90; the correlation between cell classes was < 0.82.
Hybridization signals of 928 genes (78.9%) were represented in at least three cell lines. (One class, N-est, comprised three cell lines; the other three classes, HMEC, MSSM and T-est, comprised more.) We used this subset of 928 genes, plus the 9 control genes, to construct a classification model using two prediction programs: PAM and SVM. Since these algorithms do not work well when the numbers of members differ among classes, we trained the SVM program using comparable numbers of lines from three classes (8 HMECs, 9 MSSMs and 7 T-est) and treated the three N-est cell lines and the remaining HMECs and MSSMs as unknowns to be classified. The model thus generated had a classification accuracy of 100% (24/24) by the leave-one-out cross-validation method. The prediction for the three N-est lines was T-est, suggesting that these cells have an expression profile resembling those of the other established cell lines rather than the cultures of finite life-span cells. The remaining HMEC and MSSM cell cultures were correctly assigned to their respective classes (see Additional file 5).
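The leave-one-out procedure can be illustrated with a minimal Python sketch. For simplicity it uses a plain nearest-centroid classifier as a stand-in; this is neither the SVM nor the PAM implementation used in this study (PAM additionally shrinks the class centroids), and the two-gene profiles are hypothetical. Each sample is removed in turn, the classifier is rebuilt from the remaining samples, and the held-out sample is predicted.

```python
from math import sqrt

def centroid(profiles):
    """Gene-wise mean of a list of expression profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def distance(a, b):
    """Euclidean distance between two profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(training, profile):
    """Return the label of the class centroid nearest to `profile`.

    `training` is a list of (label, profile) pairs.
    """
    by_class = {}
    for label, train_profile in training:
        by_class.setdefault(label, []).append(train_profile)
    centroids = {label: centroid(p) for label, p in by_class.items()}
    return min(centroids, key=lambda label: distance(centroids[label], profile))

def leave_one_out_accuracy(dataset):
    """Fraction of samples predicted correctly when each is held out once."""
    correct = 0
    for i, (label, profile) in enumerate(dataset):
        training = dataset[:i] + dataset[i + 1:]
        if predict(training, profile) == label:
            correct += 1
    return correct / len(dataset)

# Hypothetical two-gene profiles, three cultures per class
dataset = [
    ("HMEC", [1.0, 1.0]), ("HMEC", [1.2, 0.9]), ("HMEC", [0.9, 1.1]),
    ("MSSM", [5.0, 5.0]), ("MSSM", [5.1, 4.8]), ("MSSM", [4.9, 5.2]),
    ("T-est", [1.0, 5.0]), ("T-est", [1.1, 5.2]), ("T-est", [0.9, 4.9]),
]
accuracy = leave_one_out_accuracy(dataset)
unknown_label = predict(dataset, [1.0, 4.8])  # an unclassified culture
```

A culture that was not used for training, such as the N-est lines here, is classified in the same way as `unknown_label`: it is assigned to whichever trained class it most resembles, even if it belongs to none of them.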
The same analysis with PAM, applying a threshold of 3.5, which gives the maximum number of significant genes yielding no misclassification error upon cross-validation, predicted MTSV1-7 and MCF 10F as members of the "T-est" class and MCF 10A as an HMEC-class member with a probability of 1. Again, HMEC 13 and MSSM 14 were properly assigned (Fig 1 and Additional file 2). To elucidate further the classification of MCF10A, which is the most widely-used putatively normal breast cell line in array analysis, we examined the expression values of KT14, KT8/18 and KT19 (Fig. 2A and 2B) and CD104. The MCF10A values were similar to those of the HMECs.

Comparison of non-immortalized normal and cancer breast cells
Of the 1176 genes in the array, 862 (73.3%) gave hybridization signals in at least three of the finite life-span cell lines used. Of these, 123 (14.3%) showed differential expression when HMEC and MSSM cells were analyzed using the AtlasImage software (ratio >2 and difference in adjusted intensity > 540), and 101 were deemed significant genes with SAM (q-value: 0.7298, median #FDR: 0.73718) (Additional file 6).
Of these 123 from AtlasImage and 101 from SAM, 75 genes were differentially expressed with at least a 2-fold change according to both SAM and AtlasImage, a minimum difference in intensity of 540 units and a p-value < 0.01 (Excel). These two programs use different algorithms to calculate the ratio (AtlasImage) and fold (SAM) values, so the outputs are not exactly the same. Fig. 3 shows the relative expression of these 75 genes in all the cell lines analyzed by the arrays using the software Pomelo Tool (FDR and p-values for each gene in Additional file 7).
Some genes showed a broad range of expression values among cell lines belonging to the same group: one or two individual cell lines over-expressed the gene (>5-fold the average of the cell class), while others in the same class gave no signal or were barely above background. Since we were looking for genes that could enable us to differentiate among types of cell lines, and hence be useful as markers for each class, we inspected the values of these genes manually. The aim was to exclude from the 75 genes in Fig 3 any that gave extreme values in 2 or more cell lines within a class; "extreme values" were those that lay in the range of the other class, or biased the average of the class in question to generate the required 2-fold change.
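The inspection was done manually, but the first exclusion criterion can be approximated in code. The Python sketch below is one possible operationalization and rests on an assumption of ours: a cell line is counted as "crossing over" when its value lies on the other class's side of the midpoint between the two class medians. A gene is rejected when 2 or more lines within a class cross over. The second criterion (values that bias the class average) is not implemented, and all intensities and function names are hypothetical.

```python
from statistics import median

def crossover_count(values, own_median, midpoint):
    """Count cell lines lying on the other class's side of the midpoint."""
    if own_median < midpoint:
        return sum(1 for v in values if v > midpoint)
    return sum(1 for v in values if v < midpoint)

def is_candidate_marker(class_a, class_b, max_crossovers=1):
    """True if fewer than 2 lines in each class cross over to the other."""
    median_a, median_b = median(class_a), median(class_b)
    if median_a == median_b:
        return False  # the classes are not separated at all
    midpoint = (median_a + median_b) / 2
    return (crossover_count(class_a, median_a, midpoint) <= max_crossovers
            and crossover_count(class_b, median_b, midpoint) <= max_crossovers)

# Hypothetical adjusted intensities for one gene in five lines per class
high_class = [800.0, 1000.0, 900.0, 850.0, 950.0]
clean = is_candidate_marker([100.0, 120.0, 110.0, 130.0, 105.0], high_class)
one_outlier = is_candidate_marker([100.0, 120.0, 110.0, 130.0, 900.0], high_class)
two_outliers = is_candidate_marker([100.0, 120.0, 110.0, 900.0, 950.0], high_class)
```

Medians are used rather than means so that the outlying lines themselves do not shift the cut-off. In the toy data the gene is kept when no line or a single line crosses over and rejected when two lines do.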
Twenty-seven up-regulated and 21 down-regulated genes remained, in addition to the cytokeratins, which are not included in Tables 3 and 4 (SVG file and p-values of GOMiner in Additional files 3 and 8 respectively).
Figure caption: Relative expression of all the 29 cell lines tested by Atlas arrays.

Comparison of immortal and non-immortal breast cancer cells
Fifty genes showed significant >2-fold (according to SAM software) and >2.5-fold (according to Atlas) differences between the MSSMs and the established breast cancer cell lines. Twenty-five were up-regulated and 25 down-regulated in the established breast cancer cell lines compared to the finite life-span cultures. Most of the differentially expressed genes can be classified under the GO terms "metabolism" and "cell communication". In particular, most of the down-regulated genes seem to be related to remodelling of the extracellular matrix, cell adhesion and receptor-linked signal transduction, while the up-regulated genes are related to general signal transduction pathways and cell proliferation (Tables 5 and 6).

Real-time PCR
To validate the differences seen in the arrays, some genes were tested by real-time PCR. We selected several genes whose expression was significantly increased or decreased, according to SAM and Atlas, for both the MSSM/HMEC and T-est/MSSM comparisons: 14-3-3σ (also called SFN), SPINT2, FES, SPARC, BIGH3 (or TGFBI), TIMP1, TIMP2, MMP11 and DAB2. We also analyzed some genes that were deemed significant for only one of the comparisons: NOTCH1, PLAU, CDA and SERPINB2 (or PAI2), and a few genes that were non-significant but related to some of the aforementioned genes: tPA, PAI1, uPAR, DCK.
Table 5: Genes ordered according to their SAM scores and distribution of the genes according to the GO terms using FatiGo.

Table: Gene expression of T-est cells in relation to MSSM cells. 25 significant up-regulated genes in T-est cells (columns: SAM, Atlas).
It has been reported that correlation between Q-PCR and array data is highly variable [11]. It depends, in part, on the sensitivity of arrays in detecting genes with low expression levels or saturation due to very high ones. The genes we have tested showed correlations that ranged from 0.563 to 0.959. For the comparison between the normal (HMEC) and the tumour (MSSM) finite life-span cultures, the Q-PCR results for the manually curated genes (as explained above) supported the findings of the arrays in 17/17 (100%) of cases. In the T-est/MSSM evaluation, agreement between the two techniques was found in 15/17 (88.2%) of the cases (Fig. 4 and data available upon request).
One of the genes excluded from the HMEC/MSSM comparison was CYR61, which was tested by Q-PCR. Even though the overall up-regulation value for this gene was consistent (SAM: 3.53, Atlas: 3.48, Q-PCR: 3.29), its expression levels crossed over to the values of the other cell class in 4/17 cell lines, thus invalidating CYR61 as a reliable marker by itself.  Table S1.

Discussion
Established breast cancer cell lines have been widely used to study signal transduction pathways, test new pharmaceuticals and determine expression profiles that might predict the metastatic capabilities of primary tumours. In many cases MCF10A has been chosen as the "normal" control, even though this cell line has been reported to possess markers for both myoepithelial and luminal phenotypes [12].
One of the controversies about the use of HMECs as controls is the probable myoepithelial origin of these cells [13] based on expression of KT14 and CD104 (ITGB4) [14,15]. The expression levels of these genes in the MCF10A cells were equivalent to those in the HMECs. Furthermore, the prediction of PAM, based on the 58 genes that discriminate between the different classes of cells used in this study, was that MCF10A was similar to HMECs. Hence, MCF10A cells would have the same limitation as controls as the HMECs. Nevertheless, HMECs have the advantage of not being immortalized, and pooling them combines the genetic backgrounds of a large cohort.
Both classes of non-immortalized cell lines, HMEC and MSSM, are more homogeneous than the established ones (N-est and T-est), as seen in Table 2 and Additional file 1. This is probably due to the cumulative effects of the mutations accrued individually by the established cell lines during successive passages and immortalization.
As seen in Fig 3, the expression of some genes in a particular class is clearly different from the others, e.g. up-regulation of FES, MMP11, DAB2 and down-regulation of SPINT2, SFN, JUP for the MSSM cell lines. Others are more distinctive of a "state", e.g. distribution of cytokeratins and down-regulation of certain integrins (ITGA6/ITGB4, ITGA7) in all the tumour-derived cell lines. Therefore, the expression levels of these genes can be seen as specific attributes of certain classes. Their combined expression defines a profile that can be used to construct a model similar to those built by SVM and PAM for predicting the classification of an unknown cell line accurately, as shown in Fig 1B. We tested only two sub-lines of MCF-7, well-known to be highly variable [3][4][5], and a single sub-line of each of the other established cell lines. We recognize that other sub-lines might be different.
When the MSSMs are compared with the HMECs, several genes display differential expression in a mode consistent with previous publications, where they have been shown to be significant for malignant progression or metastasis: down-regulation of the tumour suppressor and inhibitor of mitotic phase entry 14-3-3σ (SFN) [16,17], the serine protease inhibitor SPINT2 [18] and JUP (γ-catenin) [19]; and up-regulation of FES [20] and SPARC [21]. In addition, there is the "cadherin switching" (CDH3 = P-cadherin to CDH2 = N-cadherin) and its relationship to FGFR1 and MMP9 [22,23].
SPARC expression was detected in 17/17 human breast tumour biopsies and to a lesser extent in some established cell lines [21]. It has also been associated with malignant progression and invasive potential in breast cancer [24,25], and together with MMP11 in colorectal [26] and oesophageal cancer [27]. Its over-expression increases motility and invasion [28] and induces growth inhibition [29] in established breast cancer cell lines. In addition, it has been shown to induce expression of BIGH3 and PAI-1 [30]. MSSM cells showed up-regulation of SPARC, BIGH3, PAI1 and MMP11, while the T-est cells showed down-regulation of SPARC, BIGH3 and PAI1 (MDA-MB231 is an exception for the latter gene) (Fig 4A-B-C).
We also found genes of which the behaviour did not fully agree with previous descriptions, i.e. NOTCH1, CYR61 and DAB2. The many and varied functions of Notch signalling, achieved through activation or down-regulation, have been recently reviewed [31]. In the MSSM samples, NOTCH1 and its ligand JAG2 are down-regulated. In this case, this pathway is more likely to function as a tumour suppressor than an oncoprotein. This conclusion would be less clear if we had only compared established cell lines (mainly MCF10A with both MCF7s) (Fig 4D-E). CYR61 is a pro-angiogenic, secreted protein encoded by a growth factor-inducible immediate-early gene. It is overexpressed in some invasive established breast cancer cell lines and in 30-36% of primary tumours [32,33]. In this study we found only three cell lines with a truly high overexpression (MSSM6, MDA-MB231 and BT20) and five with a moderate over-expression compared to the mean expression in HMECs (between 2 and 4 fold). The same conclusion would have applied if we had considered MCF10A as a control (Fig 4F). DAB2 is considered a tumour suppressor since its expression is down-regulated in ovarian carcinomas and in some established breast cancer cell lines [34], and up-regulated during megakaryocyte differentiation [35]. Its continued expression in tumour cells led to growth inhibition or cell death [36] unless the cells were in contact with some type of basement membrane [37]. MSSM cells showed up-regulation of this gene (Fig 4G), perhaps because they grew as an attached cell line or because of their finite life-span phenotype in vitro.
Epithelial-mesenchymal transition (EMT) is considered a mechanism for carcinoma progression and metastasis, and the expression of vimentin (VIM) is its main marker. This view has now been extended to include whole pathways and a more complex relationship with the microenvironment of the cell [38,39]. Among the other genes regulated during EMT [40], the MSSM cells showed up-regulation of COL6A1, SPARC, CDH2 and DAB2 and down-regulation of JUP and BTG2 (Tables 3 and 4).

Conclusion
Using arrays, we have studied 10 finite lifespan breast cancer cell lines freshly isolated from metastatic pleural or peritoneal fluids, 9 finite lifespan normal breast cell lines, 7 established breast cancer cell lines, and 3 established normal breast cell lines. We tested 1176 genes considered to be related to cancer. Within each cell class there was significant homogeneity of gene expression. Two clusters of genes distinguished the MSSMs from the HMECs. These 48 genes, which were differentially expressed by at least 2 fold, concerned cell-cell interactions and remodelling of the extracellular matrix. Fifty genes that were differentially expressed at least 2 fold between MSSMs and established breast cancer cell lines are generally considered to be involved in cell communication and metabolism. Established breast cancer cell lines have been used to model biochemical and pharmacological responses in human breast cancer; the differences from freshly isolated breast cancer lines imply they are not wholly satisfactory models.