Characterization of genetic rearrangements in esophageal squamous carcinoma cell lines by a combination of M-FISH and array-CGH: further confirmation of some split genomic regions in primary tumors

Background Chromosomal and genomic aberrations are common features of human cancers. However, chromosomal numerical and structural aberrations, breakpoints and disrupted genes have yet to be identified in esophageal squamous cell carcinoma (ESCC). Methods Using multiplex-fluorescence in situ hybridization (M-FISH) and oligo array-based comparative genomic hybridization (array-CGH), we identified aberrations and breakpoints in six ESCC cell lines. Furthermore, we detected recurrent breakpoints in primary tumors by dual-color FISH. Results M-FISH and array-CGH results revealed complex numerical and structural aberrations. Frequent gains occurred at 3q26.33-qter, 5p14.1-p11, 7pter-p12.3, 8q24.13-q24.21, 9q31.1-qter, 11p13-p11, 11q11-q13.4, 17q23.3-qter, 18pter-p11, 19 and 20q13.32-qter. Losses were frequent at 18q21.1-qter. Breakpoints that clustered within 1 or 2 Mb were identified, including 9p21.3, 11q13.3-q13.4, 15q25.3 and 3q28. By dual-color FISH, we observed that several recurrent breakpoint regions in cell lines were also present in ESCC tumors. In particular, breakpoints clustered at 11q13.3-q13.4 were identified in 43.3% (58/134) of ESCC tumors. Both 11q13.3-q13.4 splitting and amplification were significantly correlated with lymph node metastasis (LNM) (P = 0.004 and 0.022) and advanced stages (P = 0.004 and 0.039). Multivariate logistic regression analysis revealed that only 11q13.3-q13.4 splitting was an independent predictor for LNM (P = 0.026). Conclusions The combination of M-FISH and array-CGH helps produce more accurate karyotypes. Our data provide significant, detailed information for appropriate uses of these ESCC cell lines for cytogenetic and molecular biological studies. The aberrations and breakpoints detected in both the cell lines and primary tumors will contribute to identifying affected genes involved in the development and progression of ESCC.


Background
Chromosomal and genomic rearrangements are significant features of malignant human tumors. Rearrangements are often associated with structural aberrations, such as translocations, insertions and inversions. They can also result in copy number alterations (CNAs) [1,2]. Characterizing rearrangements and the genes affected by the aberrations and breakpoints might help us to better understand tumor development and progression.
The products and implications of chromosomal rearrangements (e.g., fusion genes, truncated genes, and gene dysregulation by ectopic promoters) have been described in leukemia, lymphoma, sarcomas, and epithelial cancers [3,4]. It was initially difficult to detect chromosomal rearrangements and affected genes in epithelial cancers, mainly because of the technical difficulty of preparing metaphase spreads from primary epithelial tumors and their karyotypic complexity. Only recently have multiple gene rearrangements, and even genomic landscapes reflecting the structural aberrations throughout the genome, been identified in multiple types of epithelial cancer, including prostate cancer [5,6], breast cancer [7,8], lung cancer [9,10], colorectal cancer [11], gastric cancer [12], head and neck cancer [13] and hepatocellular carcinoma [14].
Recently, it has been reported that recurrent rearrangements could affect genes at the boundaries of CNAs [2,15]; thus, recurrent breakpoints might be important for screening and identifying frequent unbalanced rearrangements and the genes involved. Multiplex-fluorescence in situ hybridization (M-FISH) [16] and spectral karyotyping (SKY) [17] were designed to replace traditional G-banding in chromosomal analyses of tumor cells, but the resolution of these techniques is not sufficient to detect small rearrangements. Array-based comparative genomic hybridization (array-CGH) was developed to analyze CNAs, including genomic gains, losses, amplifications and deletions [18,19]. It was recently demonstrated that array-CGH could be used to identify unbalanced breakpoints of rearrangements in many types of cancer cells at a potentially higher resolution [20-24]. Array-CGH has also been used, in combination with cytogenetic information, to determine the breakpoints in reciprocal translocations [25].
Esophageal cancer (EC) is a common malignant epithelial cancer worldwide, causing more than 40,000 deaths each year [26]. The most prevalent type of EC is esophageal squamous cell carcinoma (ESCC), and China is among the highest risk areas [26,27]. Recently, our group reported the karyotypes of the ESCC cell lines KYSE180 [28] and KYSE450 [29] by 12-color M-FISH, as well as the karyotype of KYSE410-4 by 6-color M-FISH [30]. CGH [31-34], SKY and CGH [35], and array-CGH [36-38] experiments have also been performed by other groups on ESCC cell lines and primary tumors. These studies have revealed numerical and structural chromosomal aberrations. However, the genomic rearrangements, breakpoints and genes involved in ESCC remain to be clarified.
Our study intended to identify candidate recurrent breakpoints which might affect genes at or near the boundaries. In this study, we describe CNAs and unbalanced genetic rearrangements in six ESCC cell lines through a combination of M-FISH and 44K array-CGH techniques. We found recurrent breakpoint regions in the cell lines and breakage of several regions present in primary ESCC tumors, which may contribute to disruption of critical genes.

Cell lines and sample collection
ESCC cell lines KYSE30, KYSE150, KYSE180, KYSE450, KYSE510 and YES2 were kindly provided by Yutaka Shimada (Kyoto University, Japan). KYSE150 and KYSE510 were established from female patients, and KYSE30, KYSE180, KYSE450 and YES2 were from male patients. Each cell line was cultured in RPMI-1640 (Invitrogen, USA) supplemented with 10% fetal calf serum (FCS). ESCC tissue samples were procured from the Chinese Academy of Medical Sciences Cancer Hospital. All the samples used in this study were residual specimens collected after diagnostic sampling. No patient received treatment before surgery, and all patients signed separate informed consent forms for the sampling and molecular analyses. This study was approved by the Ethics Committee/IRB of Cancer Institute (Hospital), PUMC/CAMS.

Metaphase chromosomes and interphase cell nuclei preparations
Metaphase chromosomes from ESCC cell lines and normal peripheral blood lymphocytes were harvested after incubation with 0.04 μg/ml Colcemid (Invitrogen) at 37°C for 1-2 hours, followed by treatment with a hypotonic solution (0.075 mol/L KCl) for 30 minutes and three successive changes of the fixative solution (methanol/acetic acid, 3:1). ESCC tissue samples were cut into small pieces in phosphate-buffered saline (PBS), and the interphase nuclei were then prepared following the procedures described above. Metaphase chromosomes and interphase cell nuclei in suspensions were stored at 4°C overnight and then stored at -20°C until use. The nuclear suspensions were dropped onto clean slides and aged at room temperature for 2-3 days prior to the FISH experiments.
Split regions were detected using dual-color break-apart bacterial artificial chromosome (BAC) DNA clone probes, which were labeled with Green-dUTP and Cy3-dUTP by random priming using the BioPrime DNA labeling system (Invitrogen). The slides for M-FISH and dual-color break-apart FISH analyses were pretreated with RNase A (100 mg/ml in 2× saline sodium citrate [SSC]) and pepsin (50 mg/ml in 0.01 mol/L HCl). The slides were subsequently denatured in 70% formamide/2× SSC at 73°C-75°C for 3 minutes, quickly cooled with two rinses of 2× SSC at 4°C, dehydrated in a graded ethanol series (75%, 85% and 100%), and air dried. The labeled probes were precipitated, redissolved in the hybridization solution (50% formamide, 10% dextran sulfate, 1% Tween-20, 2× SSC), denatured at 75°C for 8 minutes, and quick-chilled on ice for 2 minutes. Hybridization was performed in a humid chamber at 37°C for 24-48 hours. Posthybridization washes were performed in 50% formamide/2× SSC for 15 minutes at 43°C, followed by two washes of 3 minutes each in 2× SSC. The slides were dehydrated in 75%, 85% and 100% ethanol, air dried, counterstained with 4',6-diamidino-2-phenylindole (DAPI) (1 mg/ml) and covered with coverslips.
For 12-color FISH analysis [28], the slides were hybridized twice on metaphase spreads as previously described, a procedure termed two-round FISH. After digital fluorescence image acquisition, the coverslips were removed by dipping the slides in 100% ethanol for 30 min. The slides were then washed twice in 100% ethanol for 3 min each, air dried, and denatured again following the procedure described above.
Microscopy and digital image analysis
FISH images were captured using a Zeiss Axio fluorescence microscope equipped with a cooled charge-coupled device (CCD) camera (Princeton Instruments, USA) or a JAI M4 Plus CCD camera (Metasystems International, Germany). All of the fluorescent images were captured with individual single-band-pass filters specific for visualizing DAPI, DEAC, Green, Cy3, Alexa 594 and Cy5 fluorochromes. Pseudo-color images were constructed and analyzed using MetaMorph (Universal Imaging Corporation, USA) or the Metacyte module of the Metafer imaging system (Metasystems International).
Genomic DNA isolation and oligo array-based comparative genomic hybridization (array-CGH)
Genomic DNA from ESCC cell lines was isolated using the DNeasy Blood & Tissue Kit (Qiagen, Germany). Genome-wide copy number studies were then performed using an Agilent 44K oligo array platform (Agilent Technologies, USA), with sex-matched normal human DNA (Promega Corporation, USA) used as the reference. Briefly, 1 μg samples of the tested and reference DNA were digested with AluI and RsaI, and differentially labeled with Cy3-dUTP and Cy5-dUTP, respectively, using the Agilent Genomic DNA Enzymatic Labeling Kit, Part Number 5190-0449 (Agilent Technologies). Microcon YM-30 filters (Millipore) were then used to clean up the labeled probes. Tested and reference DNA probes were combined and hybridized onto the microarrays enclosed in Agilent SureHyb-enabled hybridization chambers for 40 hours. After hybridization, slides were washed sequentially and scanned with an Agilent DNA Microarray Scanner. Annotations for the probes were based on UCSC hg18 (NCBI Build 36). CNAs and breakpoint data were analyzed with Agilent Genomic Workbench Software 5.0, set to use the ADM-2 algorithm, an aberration threshold of 5.0 and an absolute average log2 ratio ≥ 0.5.
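The final filtering step described above can be illustrated with a minimal Python sketch. It does not reproduce the ADM-2 segmentation performed by the Agilent software; it only applies the stated cut-off (absolute average log2 ratio ≥ 0.5) to intervals that are assumed to be already segmented. The `Segment` type, the function names and the example values are hypothetical, and the copy-number conversion assumes a diploid reference and a pure tumor cell population.

```python
from dataclasses import dataclass

# Cut-off on the absolute average log2 ratio, as stated in the text.
LOG2_THRESHOLD = 0.5


@dataclass
class Segment:
    """One already-segmented interval (hypothetical structure)."""
    chrom: str
    start: int        # bp; hg18 coordinates assumed
    end: int          # bp
    mean_log2: float  # average log2(test / reference) over the interval


def classify(segment: Segment, threshold: float = LOG2_THRESHOLD) -> str:
    """Label a segment as 'gain', 'loss' or 'neutral' by its mean log2 ratio."""
    if segment.mean_log2 >= threshold:
        return "gain"
    if segment.mean_log2 <= -threshold:
        return "loss"
    return "neutral"


def implied_copy_number(mean_log2: float, reference_copies: int = 2) -> float:
    """Copy number implied by a log2 ratio, assuming a diploid reference
    and no admixture of normal cells (an idealization)."""
    return reference_copies * 2 ** mean_log2


# Illustrative values only, not data from the study.
example = Segment("11", 67_000_000, 72_000_000, 1.2)
print(classify(example), round(implied_copy_number(example.mean_log2), 2))
```

Under these assumptions, the 0.5 cut-off corresponds to roughly 2.8 copies for a gain and 1.4 copies for a loss relative to a two-copy reference.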

Statistical analysis
Statistical analyses were carried out using the SPSS 17.0 software package. Associations between splitting of breakpoint regions and clinicopathological characteristics were assessed by the χ² test, Fisher's exact test or Kruskal-Wallis test. Logistic regression analysis was performed to determine the independent predictors of lymph node metastasis. P values < 0.05 were considered significant.
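The analyses in this study were run in SPSS. As an illustration of one of the tests named above, the following Python sketch computes a two-sided Fisher's exact test for a 2 × 2 table (for example, splitting present or absent against LNM present or absent) using only the standard library. The function name and the counts in the usage line are hypothetical and are not data from this study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The p-value is the sum of the hypergeometric probabilities of all
    tables with the same margins that are no more likely than the
    observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Illustrative counts only: 8 of 10 in one group versus 1 of 6 in the other.
print(round(fisher_exact_two_sided(8, 2, 1, 5), 4))  # 0.035
```

Fisher's exact test is usually preferred over the χ² test when expected cell counts are small, which is the usual reason both tests appear together in a methods section.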
The detailed CNAs of these cell lines were detected by array-CGH, and the profiles of gains and losses are shown in Figure 2 and Additional file 1: Table S1. Our results were compared with the data available from the Cancer Cell Line Project on the Wellcome Trust Sanger Institute Cosmic website (http://www.sanger.ac.uk/genetics/CGP/cosmic). The copy number data for KYSE150, KYSE450 and KYSE510 on the website were generated using Affymetrix SNP6.0 arrays. The copy number profiles derived from our Agilent 44K platform are very similar to those from the Affymetrix platform. We then compared CNAs among the six cell lines according to the array-CGH data; frequent gains and losses present in at least two cell lines are summarized in Table 1. More gains were found than losses. The results were combined with the data from 17 other ESCC cell lines available on the Cosmic website, including KYSE70, KYSE140, KYSE270, KYSE410, KYSE520, Colo-68N, EC-GI-10, HCE-4, TE-1, TE-5, TE-6, TE-8, TE-9, TE-10, TE-11, TE-12 and TE-15. The gains with high frequencies are shown in Additional file 2: Table S2.

Unbalanced breakpoints
Breakpoints were restricted to the boundaries between two adjacent DNA fragments with significantly different log2 ratio values, reflecting different copy numbers. Using this scheme, 261 candidate unbalanced breakpoints were identified (Additional file 3: Table S3). Among these candidates, 39 occurred in the centromeric regions, and the other 224 were present on chromosome arms. Fifty-seven of the arm breakpoints were localized in the vicinity of fragile sites. Breakpoints on chromosome arms and the copy number status of the regions on both sides of the breakpoints are listed in Additional file 3: Table S3. Cell lines were ranked according to the number of breakpoints, and the top three were KYSE30, KYSE510 and YES2, in that order. This tendency was similar to that in the M-FISH results.
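The breakpoint scheme described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure implemented in the analysis software: segments are assumed to be pre-computed and sorted, and because the text gives no numeric criterion for "significantly different" log2 ratios, the `min_delta` value is a hypothetical placeholder. The second function groups breakpoint positions from different cell lines into clusters spanning at most 2 Mb, in the spirit of the clustering reported in the abstract; its name and parameter are also hypothetical.

```python
def find_breakpoints(segments, min_delta=0.5):
    """Return candidate unbalanced breakpoints between adjacent segments.

    segments: list of (chrom, start, end, mean_log2) tuples, sorted by
    chromosome and start position. A breakpoint is reported as
    (chrom, end_of_left_segment, start_of_right_segment) whenever two
    neighbours on the same chromosome differ by at least min_delta
    (a hypothetical cut-off) in mean log2 ratio.
    """
    breakpoints = []
    for left, right in zip(segments, segments[1:]):
        if left[0] != right[0]:
            continue  # never call a breakpoint across chromosomes
        if abs(left[3] - right[3]) >= min_delta:
            breakpoints.append((left[0], left[2], right[1]))
    return breakpoints


def cluster_positions(positions, max_span=2_000_000):
    """Group positions (bp, one chromosome) into clusters whose total
    span does not exceed max_span."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][0] <= max_span:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


# Illustrative segments only, not data from the study.
demo = [
    ("11", 1, 67_000_000, 0.0),
    ("11", 67_000_001, 72_000_000, 1.5),
    ("11", 72_000_001, 134_000_000, -0.1),
]
print(find_breakpoints(demo))
```

Measuring each cluster from its first position keeps the cluster width bounded, whereas chaining on nearest-neighbour distance could let a cluster grow well beyond the 1-2 Mb range.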

Chromosomal structural aberrations
M-FISH results of four cell lines (KYSE30, KYSE150, KYSE510 and YES2), together with those of the two previously reported cell lines (KYSE180 and KYSE450), showed that a total of 156 derivative chromosomes resulted from translocations, most of which were unbalanced; only 12.8% (20/156) were reciprocal. Approximately 35% of the translocation derivative chromosomes were fused at the centromeric regions. Chromosomes 1, 2, 3, 5, 6, 7, 8, 9, 11, 12, 14, 15 and X were frequently rearranged. Combining M-FISH with array-CGH, we further characterized multiple rearrangements present in these cell lines (Table 3). KYSE30 is the cell line with the most complex rearrangements, and the array-CGH results also indicated that many more breakpoints were present in KYSE30 than in the other cell lines, which is consistent with the M-FISH results.
Genes that might be interrupted by the recurrent breakpoints in each cell line are listed in Table 4. Ten of these common breakpoint regions were localized in the vicinity of fragile sites. Genes with internal breakpoints in these cell lines included CDKN2A, LEPREL1, JAKMIP3, LIMCH1, CSTF3, ABTB2, CDKN2B-AS1, FHIT and ABI3BP. For these genes, one breakpoint could be detected. Small homozygous deletions (HDs) were also observed inside some genes, resulting in two breakpoints, as in the FHIT gene in KYSE450. Other genes flanking or close to the boundaries might also be influenced by the breakpoints.
To determine whether genomic aberrations found in these cell lines are also present in primary tumors, we first tested a small sample of 15 ESCC tumors by dual-color FISH. This analysis revealed splitting of regions 11q13.3-q13.4, 9p21, 15q25.3 and 3q28, which presented the highest frequency of disruption in the cell lines. Splitting of these regions occurred in 5, 1, 2 and 3 of the 15 tumors, respectively. We also examined online data of ESCC cell lines. The results showed that both high-level amplifications and breakages existed at the 67-72 Mb position in 11q13 (Figure 3). Multiple breakpoints are present in most of the cell lines, suggesting that these positions may be highly rearranged. Because 11q13.3-q13.4 had the highest splitting frequency in the initial 15 cases, we expanded the sample pool to further characterize splitting of this region in primary ESCC cases (Figures 3B and 3C). Splitting frequencies of 11q13. 4 Figure 3A and Additional file 4: Figure S1 showed amplification of the region proximal or distal to the breakpoints. Similarly, most of the splitting-positive ESCC tumors examined by FISH presented focal high-level amplification of the region. The majority of breakpoints between NONSC16D6 and Cancer_1D11 were proximal to the amplicon, while most of the breakpoints between Cancer_1D11 and NONSC2E5, as well as those between NONSC2E5 and NONSC15F5, were distal to the amplicon (Figure 3C and Additional file 5: Table S4).

Correlations between split and amplified regions and clinicopathological characteristics
Clinicopathological parameters of each patient were listed in Additional file 6: Table S5, and the relationships between regional splitting events and clinicopathological characteristics were summarized in
The distance between two outermost breakpoints of all the different cell lines. c These genes are located at or close to breakpoints in each cell line. " * ": Obvious breakpoints were detected inside of genes. " } " and " † ": Genes at the left and right side of the breakpoint regions, respectively. Genes that are not labeled are located in the breakpoint regions, but positions of the exact breakpoints are not determined. " # ": Genes with an inside homozygous deletion (HD), and thus might also be disrupted.

Discussion
Genomic numerical and structural alterations are common features of ESCC. Our study characterized CNAs, structural aberrations, and recurrent breakpoints in six ESCC cell lines by a combination of M-FISH and array-CGH analyses, which helps provide accurate karyotypes of these cell lines. We further found a correlation between splitting of the amplified region 11q13.3-q13.4 and lymph node metastasis.
Genomic CNAs may influence gene expression through several mechanisms. A well-known mechanism is that gains or losses result in gene amplifications or deletions and thus upregulate or downregulate protein expression [40]. Different situations may arise for genes at the boundaries of gain or loss regions. CNA boundaries inside genes usually indicate gene breakage. Gene rearrangements may result from such breakages, leading to the formation of an aberrant gene product [41]. If the CNA boundaries occur in non-coding regions flanking genes, expression may be controlled by proximity to regulatory sequences from other genes. Alternatively, a recurrent breakpoint may indicate loss of a tumor suppressor gene distal to the CNA boundary [42]. Small deletions inside genes may result in structurally aberrant proteins, truncated proteins, or even loss-of-function proteins. Small amplifications and deletions inside genes may also indicate gene breakage, and the gene products may also be affected by rearrangements with the partner gene. On the other hand, many recurrent rearrangements occurred at boundaries of the breakpoints, resulting in fusion genes, truncated genes, as well as other structural variants [2]. Therefore, we focused on the breakpoints with CNAs involved in genomic rearrangements and breakpoints mapped to specific sites.
11q13 is an important region that presents various aberrations in many malignancies. Gain of 11q13 has
a The P value of each variable which is not significantly correlated with LNM in univariate analysis is indicated with "-" in the multivariate analysis. Multivariate logistic regression analysis is performed using forward procedures. LNM: lymph node metastasis; RR, relative risk; CI, confidence interval; amp: amplification.
The current array-CGH profiling enabled us to set the boundaries of 11q13 amplicons in ESCC cell lines. We observed that multiple breakpoints in high-level amplification regions involving 11q13.3 were located at the 67-72 Mb position in three of the ESCC cell lines we analyzed and in ten online cell lines, which is similar to the amplification peak in head and neck squamous cell carcinoma (HNSCC) [75]. The formation of several amplicons has been well described by the breakage-fusion-bridge (BFB) cycle model. According to this model, the formation of amplicons is initiated by distal DNA breakages at fragile sites. During DNA replication, a dicentric chromosome with an inverted duplication may result from sister chromatid fusion (SCF). The BFB cycle may continue when another break occurs between the two centromeres. The cycle may then be stabilized by a telomere or by translocation [42,98,100-102]. Albertson suggested that amplicon boundaries might also be set by selection for overexpressed genes in the amplicons, or by selection against expression changes of genes outside of amplified regions induced by CNAs [101,103]. 11q13 harbors three fragile sites, FRA11A, FRA11H and FRA11F [104]. FRA11A is a rare fragile site, while FRA11H and FRA11F are common fragile sites. FRA11A is located between RIN1 (11q13.2) and CCND1 (11q13.3) [98]. FRA11H is positioned at 11q13, but its exact location still needs to be characterized. FRA11F is located between the BAC clones RP11-281H14 and RP11-841F15 in 11q14.2 [42]. Reshmi [98]. They also found the involvement of FRA11H in some oral squamous cell carcinoma (OSCC) cases with amplifications of genes in 11q13 [42,105]. In the present study, distal boundaries of amplicons in the majority of ESCC cell lines and primary tumors with 11q13 amplification were clustered within the 67-72 Mb region of 11q13.3, which may involve FRA11H breakages in these cases. Another breakpoint was observed at the NAALAD2 gene in KYSE510, and it was located within FRA11F.
In addition, breakpoints proximal to the 11q13 amplicons in KYSE180, KYSE510 and five online cell lines were located in the FRA11A region in 11q13.3, while the proximal breakpoints in KYSE30 and other online cell lines were distal to FRA11A or in FRA11H. In the tested ESCC tumors, the majority of breakpoints in 11q13.3(1) were proximal to the amplicons, and most of those in 11q13.3(2) and 11q13.4 were distal to the amplicons. Thus, we speculate that initial distal breakages may primarily occur at FRA11H, and the process may involve FRA11F in some cases. FRA11A or FRA11H may contribute to setting amplicon boundaries by promoting subsequent steps of the BFB cycle. Given the multiple breakpoint boundaries observed in some of the ESCC cell lines and primary tumors with high-level amplification of 11q13, several cycles of random breakage may have occurred.
In the current study, we noticed that copy numbers of the regions from the centromere to the boundaries at the initial breaks were higher than those of the regions distal to the breakpoints in most of the ESCC cell lines with 11q13 amplifications. Gains of proximal regions, losses of distal regions, and intrachromosomal or interchromosomal rearrangements of 11q13 have been found in cell lines or primary tumors of human cancers and demonstrated to be indicators of the BFB cycle [98,101]. At the end of BFB cycles, distal breakpoints of the 11q13.3-q13.4 amplicon may undergo intrachromosomal rearrangement or translocation to other chromosomes, which may affect genes at the distal boundaries by forming intragenic rearrangements or fusions with other genes. Notably, most of the ESCC cell lines with 11q13.3-q13.4 splitting (according to our own and the online array-CGH data) and most of the splitting-positive primary tumors showed high-level amplification of 11q13 proximal or distal to the breakpoints. Moreover, amplicons involving intrachromosomal or interchromosomal rearrangements have also been detected. Thus, recurrent breakage at 11q13.3-q13.4 may reflect the following aspects. On one hand, genes between the two BACs flanking the region may be amplified, with proximal gain and gene overexpression. On the other hand, breakages between the two BACs, and the resulting rearrangements of genes at the amplicon boundaries, may also dysregulate expression of these genes.
The relationship between gain of 11q13 and LNM or prognosis has been analyzed and discussed in several studies. However, contrary opinions still exist. Tada et al. conducted CGH on 36 ESCC specimens and demonstrated that gain of 11q13 did not occur at a significantly different rate between LNM and non-LNM groups [106]. Genes located in 11q13 have also been analyzed. Amplification of CTTN was correlated with LNM, while no significant association was found between CCND1 amplification and LNM [76]; however, CCND1 amplification detected using plasma DNA may be an independent prognostic factor in ESCC [107]. Komatsu et al. found that overexpression of ORAOV1 showed a significant association with LNM and stage. Gain of 11q13.2 was determined to be an independent prognostic factor for predicting poor outcome, and amplification of CPT1A in 11q13.2 was correlated with shorter overall survival in ESCC. Here, we found a correlation between 11q13.3-q13.4 amplification and LNM as well as advanced stages. The relationship between gene status in 11q13 and LNM has also been evaluated in other cancers. Amplification of 11q13 DNA is associated with lymph node involvement in HNSCC [108]. CCND1 amplification and overexpression are significantly associated with LNM and survival in OSCC [109]. Another study confirmed amplifications of 11 genes in 11q13 and found two amplification cores: core 1 (TPCN2 and MYEOV) and core 2 (from CCND1 to CTTN). Amplification of CTTN (core 2) and/or TPCN2/MYEOV (core 1) was further demonstrated to be associated with LNM in OSCC [110]. However, Huang et al. reported that there was no correlation between LNM and amplification or expression of the tested genes in 11q13 in OSCC [111]. Fortin et al. also found that 11q13 amplification did not appear to be a reliable marker for subclinical LNM prediction in oral and oropharyngeal carcinomas [112]. A study by Xia et al. indicated that amplifications of ORAOV1 and CTTN are associated with LNM [113]. In breast cancer, PPFIA1 is coamplified with CCND1, which is significantly associated with high-grade phenotype but not tumor stage or nodal stage [67].