Genomic alterations identified by array comparative genomic hybridization as prognostic markers in tamoxifen-treated estrogen receptor-positive breast cancer

Background A considerable proportion of estrogen receptor (ER)-positive breast cancer recurs despite tamoxifen treatment, which is a serious problem commonly encountered in clinical practice. We tried to find novel prognostic markers in this subtype of breast cancer. Methods We performed array comparative genomic hybridization (CGH) with 1,440 human bacterial artificial chromosome (BAC) clones to assess copy number changes in 28 fresh-frozen ER-positive breast cancer tissues. All of the patients included had received at least 1 year of tamoxifen treatment. Nine patients had distant recurrence within 5 years (Recurrence group) of diagnosis and 19 patients were alive without disease at least 5 years after diagnosis (Non-recurrence group). Results Potential prognostic variables were comparable between the two groups. In an unsupervised clustering analysis, samples from each group were well separated. The most common regions of gain in all samples were 1q32.1, 17q23.3, 8q24.11, 17q12-q21.1, and 8p11.21, and the most common regions of loss were 6q14.1-q16.3, 11q21-q24.3, and 13q13.2-q14.3, as called by CGH-Explorer software. The average frequency of copy number changes was similar between the two groups. The most significant chromosomal alterations found more often in the Recurrence group using two different statistical methods were loss of 11p15.5-p15.4, 1p36.33, 11q13.1, and 11p11.2 (adjusted p values <0.001). In subgroup analysis according to lymph node status, loss of 11p15 and 1p36 were found more often in Recurrence group with borderline significance within the lymph node positive patients (adjusted p = 0.052). Conclusion Our array CGH analysis with BAC clones could detect various genomic alterations in ER-positive breast cancers, and Recurrence group samples showed a significantly different pattern of DNA copy number changes than did Non-recurrence group samples.


Background
The incidence of breast cancer has been rapidly increasing in Korea and it has been the most frequent malignancy in Korean women since 2002 [1]. Breast cancer is a highly heterogeneous disease both histologically and molecularly, and hormone receptor-positive and -negative tumors are quite distinct biologically. Recent gene expression profiling has identified hormone receptors as a fundamental parameter for distinguishing breast cancers, suggesting a molecular difference according to hormone receptor status [2]. More than 50% of breast cancer cases are estrogen-dependent, and treatment with estrogen antagonists that inhibit estrogen receptor (ER) action, particularly tamoxifen, has contributed to a dramatic reduction in breast cancer mortality. However, a substantial proportion of patients expressing ER either fail to respond initially or become progressively resistant to endocrine therapies [3]. Thus, it would be ideal to predict therapeutic efficacy for each patient before treatment is initiated.
Genetic and epigenetic alterations are important steps in the development of malignancies and may contribute to disease progression during treatment. Similarly, genetic alterations may play a role in the development of tamoxifen resistance [4]. Searches for genes exhibiting altered expression in resistant breast cancer cells have been performed using differential display, Serial Analysis of Gene Expression (SAGE), Comparative Genomic Hybridization (CGH), and expression microarray, and several marker genes have been identified using these techniques [5][6][7][8][9]. However, the exact molecular mechanism, other than ER expression, underlying tamoxifen response and resistance is not yet understood.
Array CGH has been used to localize copy number changes associated with human breast and other cancers [10][11][12][13][14]. Similar to chromosomal CGH, array CGH compares the abundance of specific genomic sequences in whole-tumor DNA relative to normal reference genomes. Array CGH can provide higher resolution than conventional CGH with more accurate mapping of regions that contain oncogenes or tumor suppressor genes [15]. With more and more array CGH data emerging, there is a need for efficient algorithms that automatically select regions of gains and losses. Recently, various software items have been released to make this complex analysis possible [16][17][18][19][20][21].
In this study, we used array CGH to assess DNA copy number changes in 28 fresh-frozen ER-positive breast cancer tissue samples. A program for array CGH data analysis, the CGH-Explorer and Analysis of Copy Error (ACE) algorithm by Lingjaerde et al. [20], was used for calling gains and losses. The purpose of this study was to elucidate whether DNA copy number changes in the primary tumor can predict a patient's prognosis and tamoxifen responsiveness in ER-positive breast cancer and to identify the corresponding chromosomal regions and genes.

Patients and tumor specimens
A total of 28 primary invasive breast cancer tissues selected from the frozen tissue archives in the Cancer Research Institute, Seoul National University, were used in this study. All tumors were excised between November 1996 and February 2001 and were histopathologically confirmed as invasive ductal carcinoma. No in situ cancers were included. This study was conducted under the approval of the Institutional Review Board of Seoul National University Hospital. Informed consent was obtained from all participants prior to surgery. All patients received tamoxifen as an adjuvant endocrine and PR (DAKO, Glostrup, Denmark) and immunohistochemical methods have been previously described [22]. A cut-off value of ≥ 10% positively-stained cells per 10 highpower fields was used to determine ER and PR positivity. Tissue samples were frozen in liquid nitrogen within 20 minutes following devascularization in the operating room and stored at -80°C. All tumor specimens analyzed contained more than 50% tumor cells.

Construction of BAC library
The array used in this study consists of 1,440 human Bacterial Artificial Chromosomes (BACs), which were spaced approximately 2.3 Mb on average across the entire genome (MacArray™ Karyo1400 from Macrogen, Inc., Seoul, Korea). BAC clones were selected from the proprietary BAC library of Macrogen, Inc. Briefly, the pECBAC1 vector [23] was digested with HindIII and size-selected HindIII-digested pooled male DNAs were used to generate a BAC library. These vectors were then transformed into and grown in the Escherichia coli DH10B strain.

Construction of BAC-mediated array CGH microarray
Clones were first selected to yield an average genomic coverage of 2-Mb resolution. All clones were two-end sequenced using an ABI PRISM ® 3700 DNA Analyzer (Applied Biosystems, Foster City, CA, USA), and their *Region of the highest frequency of alteration within a given range of gain or loss regions † Percentage of patients who have gain/loss. When an aberration covers more than one BAC clone, and the frequencies are uneven across the BAC clones, the minimum and maximum frequencies are shown.
sequences were blasted and mapped according to their positions described in the University of California, Santa Cruz (UCSC) human genome database http:// www.genome.ucsc.edu. Confirmation of locus specificity of the chosen clones was performed by removing multiple loci-binding clones by individual examination under standard fluorescence in situ hybridization (FISH) as described previously [24]. These clones were prepared by the conventional alkaline lysis method to obtain BAC DNA. The DNA was sonicated to generate ~3-kb fragments before mixing with 50% DMSO spotting buffer. The arrays were manufactured by an OmniGrid arrayer (GeneMachine, San Carlos, CA, USA) using a 24-pin format. Each BAC clone was represented on an array as triplicate spots and each array was pre-scanned using a GenePix 4200A scanner (Axon Instruments Inc., Foster City, CA, USA) for proper spot morphology.

DNA labeling for array CGH
DNA was extracted from each breast cancer tissue sample using the PureGene kit (Gentra Systems Inc., Minneapolis, MN, USA). The labeling and hybridization protocols described by Pinkel et al. [15] were used with some modification to the labeling procedure. Briefly, 21 µl solution containing 500 ng normal DNA (reference DNA) or tumor DNA (test DNA), 20 µl BioPrime ® DNA Labeling System random primers solution (Invitrogen, Carlsbad, CA, USA), and water were combined and incubated for 5 min at 95°C, and subsequently cooled on ice. After the addition of 5 µl 10 × dNTPs labeling mix (1 mM dCTP, 2 mM dATP, 2 mM dGTP, 2 mM dTTP), 3 µl 1 mM Cy-3 or Cy-5 dCTP (GeneChem Inc., Daejeon, Korea), and 40 U BioPrime ® DNA Labeling System Klenow fragment (Invitrogen), the mixture was gently mixed and incubated overnight at 37°C. The addition of 5 µl BioPrime ® DNA Labeling System Stop Buffer (Invitrogen) ended the reaction. After labeling, unincorporated fluorescent nucleotides were removed using QIAquick Spin columns (Qiagen, Germany). In one tube, Cy3-labeled sample and Cy5-labeled reference DNAs were mixed together, and 50 µg human Cot I DNA (Invitrogen), 20 µl 3 M sodium acetate, and 600 µl cold 100% ethanol were added for DNA precipitation.

Array hybridization, imaging, and data preprocessing
The pellet was resuspended in 40 µl hybridization solution containing 50% formamide, 10% dextran sulfate, 2× SSC, 4% SDS, and 200 µg yeast tRNA. The hybridization solution was denatured for 10 min at 72°C and was subsequently incubated for 1 hour at 37°C to allow blocking of repetitive sequences. Hybridization was performed in slide chambers for 48 hours at 37°C. After post-hybridization washes, arrays were rinsed, spin-dried, and scanned into two 16-bit TIFF image files using a GenePix 4200 A two-color fluorescent scanner (Axon Instruments), and individual spots were analyzed with GenePix Pro 3.0 imaging software (Axon Instruments). Clones on the X and Y chromosomes were not used for further analysis because their intensity may distort the entire data set. The log 2 -transformed fluorescent ratios were calculated from Red/green colored bars correspond to the percentage of samples with gain or loss in the given chromosomal region.
Cluster dendrogram of array CGH data from 28 ER (+) breast tumor tissues by unsupervised clustering Figure 1 Cluster dendrogram of array CGH data from 28 ER (+) breast tumor tissues by unsupervised clustering.
The scale on the left vertical bar indicates Manhattan distance. Samples with the prefix R-belong to the Recurrence group and those with the prefix N-belong to the Non-recurrence group. The samples from the nine patients in the Recurrence group were clustered together under the thirdorder dendrogram branch.
background-subtracted median intensity values, and these ratios were used to perform normalization according to intensity normalization methods. We applied LOWESS normalization, a smoothing adjustment that removes intensity-dependent variation in dye bias [25]. Data for each slide was normalized so that the mean in-slide expression value was 0 and the SD was 1. 109 different BAC clones have been excluded from further analysis because values of at least 2 spots of the triplicate have been missing. Only triplicate of 1,331 different BAC clones were used.

Real-time quantitative polymerase chain reaction (PCR)based copy number validation
The DNA copy number of one clone, BAC57_O15 (8q21.13; BAC start 82359567 and end 82463826: NCBI homo sapiens genome Build 35.1) was assayed according to a previously described protocol with normalized normal human pooled genomic DNA (Promega, Madison, WI, USA) as a calibration sample. Real-time PCR reactions were performed on an ABI 7900HT (Applied Biosystems). Amplification mixtures (20 µl) contained template DNA (approximately 2 ng), 1 × TaqMan PCR MasterMix, and    order to decide which region to pursue for further analysis. ACE is one of only two algorithms that incorporate FDR so far. Another advantage of ACE is that it can be used even in situations where normal DNA is unavailable.
The counts of loss or gain vs. no change were summarized by tumor group for each BAC, providing 2 x 2 tables for analysis. Chi-square analysis was applied to these tables to test for a significant difference in the distribution of loss or gain vs. no change between tumor groups (Recurrence and Non-recurrence) for each BAC. We then used the q value from the R package of Bioconductor to adjust for multiple comparisons and assign these resulting q values as adjusted p values. Equivalently, the q value is the minimal FDR at which the gene/clone appears significant.
As an alternative analysis to increase our power in identifying regional changes in copy number between tumor groups (Recurrence and Non-recurrence), we averaged log 2 ratios over a window of three consecutive BACs, shifting along the chromosome one BAC at a time. To find clones with differential aberrations between groups, we used the Significance Analysis of Microarrays (SAM) software [28] for the averaged log 2 ratios. SAM is most effective for small numbers of experiments and is the most popular method employed for microarray analysis. In the microarray context, the expression levels of some genes are highly correlated although an analytical FDR approach assumes that tests are independent. To overcome this drawback, SAM uses permutations to get an estimate for the FDR of the reported differential genes. SAM score (d) is the T-statistic value. "Fold change" in SAM output is the ratio of average expression levels of a given gene under each of two conditions (Recurrence group/Nonrecurrence group in this study). SAM adopts q-value as the lowest FDR at which the gene is called significant. The qvalue measures how significant the gene is: as d > 0 increases, the corresponding q-value decreases.
The follow-up duration was calculated from the date of diagnosis until the date of death or last contact. The distant metastasis-free survival was the time between diagnosis and confirmation of distant recurrence. Survival estimates were computed using the Kaplan-Meier method and the differences between survival times were assessed by means of the log rank test. Multivariate analyses were carried out using Cox's proportional hazards model [29]. Survival analyses were carried out using the SPSS (version 12.0) software package (Chicago, IL, USA).

Results
The distribution of potential prognostic factors was roughly matched between the Recurrence and Non-recurrence groups. Patient age, tumor size, axillary lymph node metastasis, histologic grade, nuclear grade, and expression of PR were not significantly different between the two groups, as shown in Table 1. Correlation of array CGH data (black bar) with real-time PCR data (white bar) Figure 3 Correlation of array CGH data (black bar) with realtime PCR data (white bar). Vertical axis is log 2 ratio of each sample DNA to control. *, samples called as "gain" with CGH-Explorer. The samples called as gain in array CGH study tend to have higher value with real-time PCR than the other samples.
First, we performed complete linkage unsupervised hierarchical clustering of all 28 samples using the 1,331 clones distributed across the 22 autosome pairs. The nine samples from Recurrence group patients tended to be located relatively close together below the third-order branch in the dendrogram (Figure 1). The genomic alteration pattern was expected to differ considerably between the Recurrence and Non-recurrence groups in this analysis.
Using the ACE algorithm in the CGH-Explorer program, copy errors were determined as either copy gain or loss. Out of the 1,331 clones, 1,154 clones showed gains and 1,088 clones showed losses in at least one sample. Only 14 clones showed no gain or loss across all 28 samples. The chromosomal regions with common genomic alterations (≥ 20%) are shown in Table 2. Gains in 1q21.1-q44, 17q23.2-q25.3, 8q11.21-q24.3, 17q12-q22, and 8p12-p11.1, and losses in 6q11.1-q27 and 11q13.5-q24.3 were the most common altered regions among the 28 ER-positive breast cancer samples (Table 2 and Fig. 2A). The majority of these regions have been reported in previous conventional or array CGH breast cancer studies [13,14,30,31].
To identify genomic alterations that are associated with disease recurrence, we first compared the gain and loss frequencies between the Recurrence and Non-recurrence groups. The average frequency of copy number changes was similar between the two groups. Of the Recurrence group samples, an average of 14.7% of the clones had gains and 12.3% had losses, whereas an average of 16.0% and 12.8% of the clones in the Non-recurrence group samples had gains and losses, respectively. The frequency patterns of copy number changes for each group across the entire genome are shown in Fig. 2B and 2C.
Using the chi-square test applied to a 2 × 2 table to compare gain and loss of individual BACs between the Recurrence and Non-recurrence groups, we identified BACs that were significantly different between the two subtypes (adjusted p value <0.05), including gains in 8q21. 13, 19p13.12-p13.11, 19q12- (Table 3 and 4). Using a more stringent threshold of sig-nificance (adjusted p <0.01), loss in 11p15.5-p15.4 was the only difference between the Recurrence and Nonrecurrence groups. Gains in 19p, 19q, 22q, 1p, 9q, and 17q and loss in 4q were found more often in the Nonrecurrence group than in the Recurrence group. To validate our array CGH data, real-time PCR for 8q21.13 was performed, which revealed a correlation between the two methods ( Figure 3).
As an additional method to identify copy regional changes that differed significantly between the two groups, we used SAM analysis applied to the sliding windows of three consecutive BAC clones from each group. The majority of significant changes were toward loss in the Recurrence group or gain in the Non-recurrence group. Copy changes of 125 clones were different with q values in SAM <0.5% between the two groups, including 11p15, 1p36, 7q22, 11q13, 19p13, 19q13, 9q34, 3q21, 10q26, and 11p11 (Table 5). Among these, losses in 11p15.5-p15.4, 1p36.33, 11q13.1, and 11p11.2 were significantly more frequent in the Recurrence group using both types of analysis. These regions that exhibited copy number changes are therefore strong candidates for prognosis indicators in ER-positive breast cancers.
In the Kaplan-Meier analysis of the 28 patients, both loss at 11p15.5 and gain at 8q21.13 were significantly associated with distant metastasis-free survival (p <0.001 and p = 0.006, respectively; Fig 4A, B). 1p36.33 loss was also the significant prognostic factor for distant recurrence (p <0.001, figure not shown). Multivariate analysis using the Cox proportional hazard model with parameters including age, T stage, lymph node status, nuclear grade, and PR revealed that loss of 11p15.5 was the most significant factor among the different variables (hazard ratio 12.3 [95% confidence interval: 2.7-55.4]) ( Table 6).
Next, we did subgroup analysis according to lymph node status. The number of cases in lymph node negative patients was too small for this analysis (n = 2 in Recurrence group and N = 6 in Non-recurrence group). In lymph node positive patients, no chromosomal aberration was significantly different between the two risk groups with an adjusted p value <0.05. Loss of 1p36.33-p35.3 and 11p15.5-p15.4 were found more often in Recurrence group, and loss of 4q13.1-q34.3 and gain of 19q12-q13.2 were found more in Non-recurrence group with borderline significance (p = 0.052) ( Table 7).

Discussion
This study showed that genomic alteration patterns detected by array CGH are different between ER-positive breast cancer patients in the Recurrence group and the Non-recurrence group after surgery and tamoxifen treatment. These differences were first suggested by unsupervised hierarchical clustering and then confirmed by two statistical methods that were designed to identify the most different specific BAC clones and chromosomal regions between the two groups. We found that loss of 11p15 and 1p36 and gain of 8q21 are significantly associated with distant recurrence of the disease within 5 years of diagnosis. These chromosomal aberrations are thus candidate markers for tumor aggressiveness or tamoxifen resistance in ER-positive breast cancers. The independent prognostic significance of two of these chromosomal aberrations was further demonstrated by a striking survival difference in patients who did and did not have the corresponding chromosomal copy number change. The fact that most of the genomic alterations that we identified have been previously reported in other conventional or array CGH studies underscores the confidence of the methodology used here. Clinically, if we can predict the prognosis of ER-positive breast cancer by whole genome DNA copy number analysis, use of new generation aromatase inhibitors or cytotoxic chemotherapy can be actively considered for the patients with a poorer prognosis.
Allelic loss at 11p15.5 is frequently observed in a variety of tumors, including breast cancer [32][33][34][35]. In invasive ductal carcinoma, the frequency of LOH at 11p15.5 is approximately 30-60% [32,33]. Two distinct regions on chromosome 11p15 that are subjected to LOH in breast cancer have been identified and refined by Karnik et al. [34]. Phillips et al. determined the potential effects of chromosome 11 on the tumorigenic and metastatic abilities of the MDA-MB-435 cell line via chromosome transfer, and indicated that chromosome 11 harbors a metastasis-suppressor gene for breast cancer [36]. Until now, any association between chromosome 11p15.5 and clinical parameters such as recurrence or survival has not been well established. Several studies have suggested that LOH in this region increases during breast cancer progression [34,37,38], and one conventional CGH study showed that 11p loss is associated with disease recurrence in lymph node-negative breast cancer [39].
We could not validate 11p15.5 loss with our real-time PCR method that was used to show the amplification of 8q21.13. The development of an efficient methodology that can validate the copy number loss should be given priority in future studies.
Ragnarsson et al. [40] found a high percentage of LOH at 1p36 and significant separation of survival curves between breast cancer patients with and without this alteration. These researchers also showed that LOH at 1p is a better prognostic indicator than any other variable, including lymph node metastasis.
Chromosome 8q gain is known to be associated with poor outcome in patients with clinically localized prostate cancer [41,42] and node-negative breast cancer [30,31]. TPD52 on chromosome 8q21 has been proposed as a potential amplification target gene in this region [43][44][45].
We divided the patients into axillary lymph node negative and positive breast cancers to avoid biases caused by mixing the different risk groups, because the axillary lymph node status has been shown to be the single most impor-  tant prognostic factor for disease-free survival and overall survival in breast cancer patients. Due to the small number of cases in each group, no chromosomal aberration was different in frequency between the two risk groups. 1p36 and 11p15, the most significant chromosomal location in overall analysis, were also found more in Recurrence group than Non-recurrence group with modest significance in lymph node positive patients. More cases are required to confirm this result in the subgroup analysis and to elucidate the usefulness of these potential prognostic factors in lymph node negative patients.
In the recent publication of Arpino et al., ER+/PR-tumors express higher levels of HER-1 and HER-2 and display more aggressive features than ER+/PR+ tumors [46]. In this study, we showed that several chromosomal aberrations, such as 11p15.5 loss have stronger effect on distant recurrence of ER+ patients than PR status. It would be also interesting to stratify the patients according to PR expression status and investigate whether the pattern of DNA copy change is different between ER+/PR-and ER+/PR+ tumors.
A major weakness of our study is the small number of samples used, which was due to the limited availability of suitable fresh-frozen breast cancer tissues. DNA from formalin-fixed, paraffin-embedded tissues that had been archived for many years was tested, but it was so fragmented that it could not be utilized in this array CGH study.
With the increasing needs for efficient software that automatically select regions of gains and losses in array CGH studies, many algorithms have been recently developed [17][18][19]47]. We used a representative software and algorithm, CGH-Explorer and ACE, for thresholding the copy number ratios. It was comprehensive and had userfriendly graphical tools. An analytical tool that can identify the clones between two subject groups has not yet been established for array CGH.

Conclusion
In conclusion, using array CGH analysis with BAC clones, we were able to detect various genomic alterations in ERpositive breast cancers. Patients in the Recurrence group showed a significantly different pattern of chromosomal gain and loss than patients in the Non-recurrence group. Copy number loss of 11p15 and 1p36 and gain of 8q21 are significantly associated with distant recurrence of the disease within 5 years of diagnosis. These regions will be further explored for the identification of metastasis-suppressing or -enhancing genes using a higher resolution method with a larger number of study subjects.