Array comparative genomic hybridization analysis discloses chromosome copy number alterations as indicators of patient outcome in lymph node-negative breast cancer

Background Patients with lymph node metastasis-negative (pN0) invasive breast cancer have favorable outcomes following initial treatment. However, false negatives which occur during routine histologic examination of lymph nodes are reported to underestimate the clinical stage of disease. To identify a high-risk group in pN0 invasive breast cancer, we examined copy number alterations (CNAs) of 800 cancer-related genes. Methods Using array-based comparative genomic hybridization (CGH) in 51 pN0 cases (19 relapsed and 32 non-relapsed cases), the positivities of specific gene CNAs in the relapsed and non-relapsed groups were compared. An unsupervised hierarchical cluster analysis was then performed to identify case groups that were correlated with patient outcomes. Results The cluster analysis identified three distinct clusters of cases: groups 1, 2, and 3. The major component was triple-negative cases (69%, 9 of 13) in group 1, luminal B-like (57%, 13 of 23) and HER2-overexpressing (26%, 6 of 23) subtypes in group 2, and luminal A-like subtype (60%, 9 of 15) in group 3. Among all 51 cases, those in group 1 showed significantly worse overall survival (OS) than group 2 (p = 0.014), and 5q15 loss was correlated with worse OS (p = 0.017). Among 19 relapsed cases, both OS and relapse-free survival (RFS) rates were significantly lower in group 1 than in group 2 (p = 0.0083 and 0.0018, respectively), and 5q15 loss, 12p13.31 gain, and absence of 16p13.3 gain were significantly correlated with worse OS and RFS (p = 0.019 and 0.0027, respectively). Conclusions As the target genes in these loci, NR2F1 (5q15), TNFRSF1A (12p13.31), and ABCA3 (16p13.3) were examined. 5q15 loss, 12p13.31 gain, and absence of 16q13.3 gain were potential indicators of high-risk recurrence and aggressive clinical behavior of pN0 invasive breast cancers.


Background
Breast cancer is one of the commonest malignancies in women, and mainly occurs in middle-aged or older women with an estimated 1.7million cases and 521,900 deaths in 2012 [1]. For the past decades, the treatment of breast cancer has experienced several changes by tissuebased biomarker and comprehensive analysis of gene expression profiles based on cDNA microarrays has revealed distinct intrinsic subtypes of breast cancer [2][3][4]. This intrinsic subtype classification was improved [5] and then modified as "surrogate" immunohistochemical subtypes comprising luminal A-like, luminal B-like (HER2-positive and HER2-negative), HER2-overexpressing, and triplenegative subtypes [6]. This immunohistochemical classification is clinically applied for treatment decisions and prediction of patient prognosis [7][8][9].
Lymph node metastasis-negative (pN0) invasive breast cancer was reported to be associated with good prognosis, with 20-30% 10-year recurrence rate [10][11][12]. The pN0 breast cancers were classified into subclasses of low and high recurrence risk according to immunohistochemical subtype or histopathological parameters [10][11][12]. However, immunohistochemical or histopathological parameters are not quantitative, and low inter-observer reproducibility can be problematic [13,14]. Therefore, the identification of novel quantitative and reproducible prognostic markers of pN0 breast cancers is of major importance.
Breast cancers have been reported to show a number of genomic alterations, including gene copy number and structural alterations [15][16][17][18][19]. Gene copy number alterations (CNAs) and gene expression profiles play important roles in carcinogenetic pathways and can serve as potential biomarkers for prognostication and treatment decisions. It has been utilized extensively for studying the characteristics of breast cancer. Recently, the CNA profiles of ASB13 and SGCZ genes have revealed significant association with survival outcome in young women with breast cancer [20].
In the present study, we examined the CNA profiles of 800 cancer-related genes in 51 pN0 invasive breast cancers using array-based comparative genomic hybridization (aCGH). pN0 invasive breast cancer has been considered to be at low risk of recurrence for more than 5 years after radical surgery. However, it has been reported that the routine examination of regional lymph nodes may be inadequate for the detection of obscure metastases, and that micrometastases were only identified through laborintensive multiple sectioning and additional immunostaining. Therefore, other potential biomarkers are needed in the decision-making process for treatment. Based on this background, we attempted to identify gene CNAs that were associated with relapse of cancer and patient death, and we were able to identify several gene alterations that may be useful as prognostic biomarkers.

Tumor samples
We analyzed genomic DNAs of the primary breast cancer tissues resected from patients diagnosed with pN0 breast cancer at the National Cancer Center Hospital, Tokyo, between 1990 and 1994 for the aCGH analysis. We listed 20 cases that did and another 40 cases that did not show relapse. In each case, a part of tumor tissue was acetone-fixed immediately after resection at 4°C overnight, embedded in paraffin, and stored at room temperature. Acetone fixation was employed to preserve high-quality nucleic acids. From hematoxylin-eosin (HE)-stained sections of the tissue blocks, a sufficient amount of genomic DNA was available from 19 relapsed cases and 32 non-relapsed cases.
The mean patient age was 52 years, ranging from 29 to 78. Histological type of cancers consisted of 47 invasive ductal carcinomas, no special type (IDCs-NST); three invasive lobular carcinomas; and one squamous cell carcinoma. The present study was approved by the internal review board for ethical issues of the National Cancer Center and National Defense Medical College. We utilized a strict set of inclusion criteria for patients in the study as follows. Patients were diagnosed with pN0 breast cancer at the National Cancer Center Hospital. All patients provided written informed consent, in accordance with ethical guidelines at the National Cancer Center Hospital. At the time of enrolment in the study, patients had not received adjuvant chemotherapy and tamoxifen was administered to hormone-receptorpositive cases. Each tissue sample was reviewed by a pathologist to confirm the diagnosis and that the sample met inclusion criteria.
Genomic DNA isolation and labeling DNA was isolated from 10 sheets of 10-μm-thick acetone-fixed paraffin-embedded tissue sections. Based on examination of HE-stained slides, over 60% of constituent cells in the tissue were confirmed to be tumor cells. These sections were cut with needles or laser microdissection (Leica Microsystems, Tokyo, Japan). Total genomic DNA was isolated with a DNA isolation kit (Gentra Puregene Tissue Kit, Qiagen, Hilden, Germany) following the manufacturer's instructions. DNA was quantitated using the Nanodrop spectrophotometer. DNA quality was assessed by evaluating the sample's A260/A280 ratio and its integrity by agarose gel electrophoresis. Reference DNA was derived from a pool of normal female peripheral blood samples. Isolated tumor and reference DNAs were cleaved with DpnII and labeled with Cy3-and Cy5-dCTP (GE Healthcare, Tokyo, Japan), respectively, using the random priming method.

Array-based comparative genomic hybridization (array CGH) using the BAC array
The Bacterial Artificial Chromosome (BAC) array used was constructed in the Fujifilm Advanced Research Laboratories based on the BAC array (MCG Cancer Array-800) that was previously constructed in the Department of Molecular Cytogenetics, Medical Research Institute and School of Biomedical Science, Tokyo Medical and Dental University, Japan [21]. This BAC array, which consists of 800 BACs harboring 800 known cancer-related genes, was intended for use in applying data for cancer-specific CNAs for diagnosis [21]. Hybridizations of the BAC array with tumor or reference genomic DNA was performed as described previously [22]. Hybridized slides were scanned with a GenePix 4000B (Axon Instruments, Union City, CA), and acquired images were analyzed with GenePix Pro 6.0 imaging software (Axon Instruments). Copy number gains and losses were defined as changes in the logarithm to base 2 of the tumor to reference signal intensity ratio (T/R) greater than 0.4 and less than − 0.4, respectively.

Immunohistochemistry and fluorescence in situ hybridization (FISH) test
Immunohistochemistry (IHC) was performed using the En-Vision method with primary antibodies against estrogen receptor (ER; mouse monoclonal clone 1D5, Dako, Glostrup, Denmark), progesterone receptor (PR; mouse monoclonal PgR636, Dako), and HER2 (rabbit polyclonal (HercepTest), Dako). ER and PR were defined as negative when < 1% of tumor cells showed nuclear immunoreaction regardless of the staining intensity [23]. HER2 was defined as negative when the IHC score was 0 or + 1, or when the IHC score was 2+ with negative gene amplification by fluorescence in situ hybridization [24]. According to the immunohistochemical subtypes, luminal A-like subtype was defined as ER-or PR-positive, HER2-negative, and histological grade 1 or 2; luminal B-like subtype was defined as ER-or PR-positive, HER2-negative, and histological grade 3 (luminal B-like, HER2-negative), or ER-or PRpositive and HER2-positive (luminal B-like, HER2-positive); HER2-overexpressing subtype was defined as ER/PR-negative and HER2 positive; and triple-negative was defined as ER/PR negative and HER2-negative [7][8][9].

Hierarchical cluster analysis
An unsupervised hierarchical clustering method was applied to analyze genomic aberration similarities across the 51 primary tumor samples using Cluster 3.0 and TreeView software programs. The clustering algorithm NS not significant, SD standard deviation was set to complete linkage clustering using an uncentered correlation.

Statistical analysis
Differences in frequencies of parameters were calculated using the chi-square test with or without Yates' correction or Fisher's exact test. The Kolmogorov-Smirnov test was applied to the normality of data distribution. Comparison of non-normally distributed data expressed as medians were calculated using the Mann-Whitney U test. Overall survival (OS) curves and relapse-free survival (RFS) curves were drawn with Kaplan-Meier methods, and differences in curves were analyzed using the log-rank test. P < 0.05 for a two sided-test was considered the level of significant difference. Ekuseru-Toukei 2015 (Social Survey Research Information Co., Ltd.) or Exel R statistical software (ystat 2006.xls; Igaku Tosho Shuppan, Tokyo, Japan) was used to analyze the data. Vertical lines indicate frequency of gain or loss. Gene copy-number gains and losses are indicated by red and green, respectively. Green asterisks are the regions that showed frequent gains in over 50% of the cases. Red asterisks are the regions that showed frequent losses in over 50% of these tumors. The chromosome loci that frequently showed gains or losses were common between the relapsed group and the non-relapsed group

Results
Comparison between relapsed and non-relapsed groups The distribution of patient age, clinical stage, histological type, grade, and immunohistochemical subtype did not differ significantly between relapsed and non-relapsed groups (Table 1).
In the array CGH analysis using an MCG cancer array-800, frequent copy number gains above 50% were detected in the loci of 1q22, 1q23.1, 1q42.13, 8q24.3 and 16p13.3, and frequent copy number losses above 50% were detected in the locus of 16q23.1 of the 51 pN0 breast cancers. Loci that showed frequent CNAs did not differ between the relapsed and non-relapsed groups (Fig. 1, Table 2).
The average total number of CNAs in the relapsed group was 129, ranging between 23 and 339, with a standard deviation (SD) of 90. The average total number of CNAs in the non-relapsed group was 114, ranging between 3 and 341, with a SD of 81. These averages were not significantly different between these two groups.

Classification of lymph node-negative primary breast cancers based on unsupervised hierarchical cluster analysis
Unsupervised hierarchical cluster analysis including all 51 tumor samples identified three distinct groups according to the CNA pattern of two clusters of genes in the vertical direction (Fig. 2a). Each cluster consisted of genes at the same loci; genes at the loci on chromosomes 4q, 5q, 6q, 9p, 16q, 18p, and Xp belonged only to the first cluster of genes, and genes at the loci on chromosomes 1q and 16p belonged only to the second cluster of genes. Among these clusters, group 1 had the largest number of CNAs, with an average of 194 (SD 101), ranging from 30 to 341. Group 2 had an intermediate number of CNAs, with an average of 113 (SD 64), ranging from 18 to 249. Group 3 had the smallest number of CNAs, with an average of 65 (SD 40), ranging from 4 to 169 (Fig. 2b). The average number of CNAs in group 1 was higher than that in groups 2 (p = 0.026) and 3 (p = 0.00036).
Cases of histological grade 3 were more frequent in groups 1 and 2, than in group 3 (85 and 15%, respectively), and cases of histological grade 1 or 2 were more frequent in group 3 than in groups 1 and 2 (67 and 33%, respectively; p = 0.013).
Among all 51 pN0 breast cancers, the OS rate of group 1 was lower than that of group 2 (p = 0.014) (Fig. 3a), whereas the RFS rate was not (Fig. 3b). In contrast, among the 19 relapsed patients, both the OS and RFS rates of group 1 were lower than those of group 2 (p = 0.0083 and 0.0018, respectively) ( Fig. 3c and d).

Specific copy number alterations correlated with patient outcomes
Among frequent CNAs occurring in groups 1, 2, and 3 shown in Table 4, loss of 5q15 loci was detected only in group 1 tumors (54%, 7 of 13). Cases with the 5q15 loss showed a significantly lower OS rate (p = 0.017) and a lower RFS rate (p = 0.081) than did those without ( Fig. 4a  and b). Interestingly, among the 19 patients who suffered relapse, 5q15 loss was significantly correlated with lower OS (p = 0.018) and RFS (p = 0.0055) (Fig. 4c and d). The gain of 12p13.31 locus was most commonly detected in group 1 tumors (69%, 9 of 13), but was detected in only 1 case among group 2 tumors (4%), and in none of the group 3 tumors (0%). Of the 51 cases, there was no significant difference in OS or RFS curves between the subgroups with and without 12p13.31 gain (p = 0.21 and p = 0.60, respectively). However, among the 19 patients who suffered relapse, both OS and RFS rates of the cases with 12p13.31 gain were significantly lower than those of the cases without (p = 0.012 and 0.0055, respectively) ( Fig. 5a  and b).
The copy number gain of 16p13.3 locus was common in group 2 (74%, 17 of 23) and group 3 (80%, 12 of 15), but was less common in group 1 tumors (23%, 3 of 13). 16p13.3 gain did not affect OS and RFS in any of the 51 cases (p = 0.38 and p = 0.76, respectively), but among the 19 relapsed cases, both OS and RFS rates were significantly higher with 16p13.3 gain than without (p = 0.019 and 0.0027, respectively) ( Fig. 5c and d).

Discussion
Although numerous studies have examined clinicopathological correlation with CNA status [15][16][17][18][19], limited studies have explored the prognostic implication of CNA status in pN0 breast cancer. For instance, the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) cohort data has demonstrated integrative cluster associations with histopathological subtypes, tumor grades, and lymphocyte distributions [25]. However, as no information is available on well characterized clinical data with long term follow-up in the study, they did not directly explore the prognostic implication of CNA status in pN0 breast cancer. In our study, based on the total number of CNAs and on CNAs of specific chromosome loci using array CGH technology, we attempted to identify high-and lowrisk types of pN0 breast cancer. At first, we identified that there were no differences in the total number of CNAs or of CNAs in the specific loci between the relapsed and nonrelapsed group. Therefore, we performed an unsupervised hierarchical cluster analysis based on the array CGH dataset from all 51 cases examined. Fig. 2 a Unsupervised hierarchical cluster analysis of genome copy number profiles measured for 51 pN0 breast cancers. In the horizontal direction, tumor samples are arranged; non-relapsed cases are indicated in light blue, and relapsed cases in red. In the vertical direction, the genes to which the gene copy number was assigned were examined, arranged, and largely classified into two clusters. Gene copy-number gains and losses are indicated by red and green, respectively. A total of 51 tumor samples were classified into three major clusters: groups 1, 2, and 3. b Significant differences in genome copy number alteration patterns among groups 1, 2, and 3. In group 1, gains in 1q23.   Based on this cluster analysis, these 51 cases were classified into three clusters, groups 1 to 3, which corresponded to immunohisochemical subtypes. Group 1 was mostly composed of cases belonging to the triple-negative subtype, group 2 of luminal B-like and HER2-overexpressing subtypes, and group 3 of luminal-A like cases.

A B
Such strong correspondence between CNA pattern and "intrinsic subtype" classification was shown in previous reports [15,26,27]. Hicks et al. proposed three patterns of genome rearrangements in breast cancer, i.e., sawtooth, firestorm, and simplex. They suggested that cases of the firestorm pattern were frequently Grade 2 or 3 and were accompanied by CNAs of chromosomes 6, 8, 11, 17, and 20, and especially by amplifications of CCND1 in 11q and of ERBB2 (HER2) in 17q [26]. In the present study, group 2 was mostly composed of luminal B-like and HER2overexpressing subtypes, and frequently carried 1q, 8q, 16p, 17q gains, which was similar to the firestorm CNA pattern [27]. Hicks et al. showed that the simplex pattern was frequent in grade 1 tumors and was characterized by 16q deletion and 16p duplication, often coupled with 8p loss and duplications of 1q and 8q [26]. These characteristics of the simplex pattern were very similar with those of the present group 3, characterized by luminal A-like subtype, gains of 1q and 16p, and loss of 16q.
In contrast, Natrajan et al. demonstrated that the sawtooth pattern was characterized by high histological grade and basal-like subtype, and numerous CNAs were commonly detected [26]. These CNAs included 5q loss and gain of 1q21.1, 8q24.21-q24.23, and 12p13.31, which agreed with the results for group 1 tumors.
Andre et al. classified breast cancer cases into three groups, i.e., non-negative matrix factorization (NMF) classes I to III, according to data of CGH and cluster analysis [15]. A total of 65% of NMF class I cases were triple negative, and 6p gain, 5q loss, and 15q loss were reported to be common. NMF class II included most of the HER2-overexpressing subtype, and was characterized by 17q gain corresponding to HER2 amplification. In NMF class III, 73% were ER-positive, 97% were HER2negative, and 1q gain and 16q loss were common.
With regard to patient outcome, the OS of group 1 was significantly worse than that of group 2. Unfortunately, we could not identify any significant clinical/biological association with outcome in the non-relapsed groups, but among the relapsed cases, group 1 patients showed shorter OS and RFS than did group 2. Furthermore, we showed that 5q loss, 12p gain, and absence of 16p13.3 gain were correlated with worse patient outcomes in all cases and/or subsets of recurrent cases in pN0 breast cancer patients.
5q15 loss was commonly detected in group 1, and was correlated with shorter OS in all 51 cases and shorter OS and DFS in the 19 relapsed cases. Curtis et al. classified Table 4 Frequent gains and losses in chromosome loci in groups 1, 2, and 3 identified using hierarchical cluster analysis.
2000 breast cancer samples into 10 integrated clusters (IntClust 10) composed mainly of a basal-like subtype that frequently exhibited 5q loss, 8q gain, 10p gain, and 12p gain [16]. These characteristic CNAs in the IntCrust 10 were partly compatible with those in the present group 1.
Curtis et al. also indicated that 5q harbored genes encoding numerous signaling molecules or transcription factors, cell division genes, and 5q deletions that can modulate the coordinate transcriptional control of genomic and chromosomal instability and cell cycle regulation within cancer cells [16].  One of target genes at the 5q15 locus was nuclear receptor subfamily 2, Group F, Member 1 (NR2F1) [28]. NR2F1 provokes a reduction in chemokine CXCL12 expression and enhancement of CXCR4 expression, and it stimulates breast cancer cell migration [29]. On the other hand, NR2F1 also functions as a dormancy gene, so suppression of this gene results in growth of ERpositive MCF-7 cells in vivo [29,30]. Based on these observations, enhanced migration and release from the dormant state in tumor cells by the loss of NR2F1 may be associated with earlier relapse [28].
The 12p13.31 gain was selectively detected in group 1, whereas 16p13.3 gain was only detected in groups 2 and 3. The 12p13 gain was shown to be common in the CNA in basal-like subtype [27]. In contrast, 16p gain was shown to be frequent in luminal subtype, and it was correlated with a good prognosis [16].
One of the target genes at the 12p13.31 locus is TNFRSF1A. This gene encodes a receptor of tumor necrosis factor-α (TNF-α), and their ligand-receptor interaction is conditional for the presentation of cellular growth, invasion, and metastasis. The development of primary cancers and metastases were inhibited in TNFRSF1A-deficient mice [31]. TNFRSF1A expression is reportedly associated with poor prognosis in diffuse large B-cell lymphoma [32]. In addition, an RNA sequence analysis showed that the breast cancer-associated fusion transcript SCNN1A-TNFRSF1A may play a role in the development of breast cancer [33]. Furthermore, Egusquiaguirre et al. have demonstrated that elevated TNFRSF1A levels may predict a subset of breast tumours that are sensitive to STAT3 transcriptional inhibitors [34].
One of the target genes at the 16p13.3 locus is ABCA3, which encodes an ATP-binding cassette (ABC) transporter or a family of transmembrane proteins that can transport a wide variety of substrates across biological membranes in an energy-dependent manner [35]. Negative ABCA3 cytoplasmic immunoreaction or decreased ABCA3 expression was significantly associated with lymph node involvement and worse clinical outcome [36].
All relapsed cases showing 5q15 loss or 12p13.31 gain experienced relapses within 25 months after tumor resection. Therefore, 12p13.31 gain and 5q15 loss were considered markers of pN0 breast cancers showing highly aggressive clinical behavior, with most recurrences being triple-negative breast cancer (TNBC).
Limitations of this study include its small scale and its retrospective case-control design. The number of genes mounted on the array was 800. Nonetheless, we were able to show specific CNAs that were correlated with worse or better patient prognosis using array CGH technology in invasive pN0 breast cancers. The data acquired were compatible with the results of stricter studies using current single nucleotide polymorphism (SNP) arrays, and are considered applicable for the identification of high-risk pN0 breast cancer, after the results are confirmed for larger scale cohorts.

Conclusions
In conclusion, a CGH array analysis of pN0 invasive breast cancers sorted the cases into three distinct clusters according to an unsupervised hierarchical cluster analysis. We were not able to identify CNAs specific to the non-relapsed group that could be applicable for avoiding unnecessary adjuvant chemotherapy. However, we did identify several specific CNAs as prognostic markers, i.e., 5q15 (NR2F1) loss, 12p13.31 (TNFRSF1A) gain, and absence of 16q13.3 gain, in all pN0 cases and in the high-risk group of pN0 cases that showed relapse. These specific CNAs could be potential candidates for prognostic biomarkers in pN0 invasive breast cancer for monitoring occult metastases which cannot be detected by present histologic examination of the lymph nodes.