Combined use of expression and CGH arrays pinpoints novel candidate genes in Ewing sarcoma family of tumors

Background Ewing sarcoma family of tumors (ESFT), characterized by t(11;22)(q24;q12), is one of the most common tumors of bone in children and young adults. In addition to EWS/FLI1 gene fusion, copy number changes are known to be significant for the underlying neoplastic development of ESFT and for patient outcome. Our genome-wide high-resolution analysis aspired to pinpoint genomic regions of highest interest and possible target genes in these areas. Methods Array comparative genomic hybridization (CGH) and expression arrays were used to screen for copy number alterations and expression changes in ESFT patient samples. A total of 31 ESFT samples were analyzed by aCGH and in 16 patients DNA and RNA level data, created by expression arrays, was integrated. Time of the follow-up of these patients was 5–192 months. Clinical outcome was statistically evaluated by Kaplan-Meier/Logrank methods and RT-PCR was applied on 42 patient samples to study the gene of the highest interest. Results Copy number changes were detected in 87% of the cases. The most recurrent copy number changes were gains at 1q, 2, 8, and 12, and losses at 9p and 16q. Cumulative event free survival (ESFT) and overall survival (OS) were significantly better (P < 0.05) for primary tumors with three or less copy number changes than for tumors with higher number of copy number aberrations. In three samples copy number imbalances were detected in chromosomes 11 and 22 affecting the FLI1 and EWSR1 loci, suggesting that an unbalanced t(11;22) and subsequent duplication of the derivative chromosome harboring fusion gene is a common event in ESFT. Further, amplifications on chromosomes 20 and 22 seen in one patient sample suggest a novel translocation type between EWSR1 and an unidentified fusion partner at 20q. In total 20 novel ESFT associated putative oncogenes and tumor suppressor genes were found in the integration analysis of array CGH and expression data. Quantitative RT-PCR to study the expression levels of the most interesting gene, HDGF, confirmed that its expression was higher than in control samples. However, no association between HDGF expression and patient survival was observed. Conclusion We conclude that array CGH and integration analysis proved to be effective methods to identify chromosome regions and novel target genes involved in the tumorigenesis of ESFT.


Background
The Ewing sarcoma family of tumors (ESFT) is a group of highly aggressive and often metastatic small round cell tumors characterized by specific t(11;22)(q24;q12) chromosomal rearrangements, which create the EWS/FLI1 gene fusion and thereby a chimeric, oncogenic transcription factor [1]. ESFT is one of the most common bone and soft tissue tumors in children and young adults arising generally during the second decade of life [2,3]. The ESFT tumors are divided into four subtypes according to the histopathological description: classical Ewing sarcoma in bones, extraskeletal Ewing sarcoma, peripheral neuroepithelioma (PNET), and Askin's tumor. Most of these ESFT cases manifest defects in the maintenance of genomic stability with subsequent DNA copy number alterations.
Conventional CGH and array CGH studies have shown that 63-84% of ESFT patient samples have copy number changes [4][5][6][7][8][9]. These copy number alterations play a significant role in the tumorigenesis and malignant progression of solid tumors. The diagnosis and clinical management of patients would substantially benefit from identification of these novel chromosomal targets and molecular markers involved in the tumorigenesis of ESFT, since secondary genetic alterations in ESFT have been shown to correlate with patient's outcome. In addition to overall number of chromosomal imbalances [10,11], gains of 1q, 8 and 12 and losses of 9p21. 3 and 16q have been associated with poor clinical outcome [7,[12][13][14]. Rapid development of microarray technology has led to more sophisticated analyses, which can be utilized to find novel tumor specific genetic alterations. Further, numerous studies have demonstrated that integrating genomic data from different sources, e.g. at RNA and DNA level, can enhance the reliability of genetic analysis in understanding tumor progression. Our aim was to identify common regions of gain and loss and to define the influence of copy number alterations on gene expression to identify chromosomal areas and genes involved in malignant progression of Ewing sarcoma. We used high-resolution array-based CGH to screen simultaneously multiple loci for possible copy number imbalances in ESFT patient samples. This approach enables us to detect both largescale and gene-size copy number alterations down to ~35 kb in size. To investigate the impact of copy number imbalances on the gene expression levels of affected genes, we performed also an expression array analysis to combine RNA and DNA level data and validated the most interesting result by quantitative RT-PCR analysis.

Patient samples and clinical data
Fresh frozen samples (stored at -70°C) were collected from the archives of the Laboratory of Oncologic Research, Istituti Ortopedici Rizzoli (IOR), Bologna. A total of 31 tumor specimens of from ESFT patients treated at IOR between years 1992 and 2005 were available for the aCGH study. In order to study ESFT expression profiles, 42 patient samples were collected for RNA extraction. To validate the ESFT diagnosis, the presence of EWS/ FLI or EWS/ERG translocation was confirmed by RT-PCR for all samples with available RNA. Clinical data for 31 samples (Table 1) used in aCGH and in data integration analysis were collected from the patient records at IOR. All patients were treated within controlled prospective trials [15,16]. The mean age of the patients was 20.7 years, ranging from 5 to 41 years and the male-to-female ratio was 22:9 (2.4). Of the 31 samples used in aCGH analysis, 23 were primary tumors, two recurrencies, and six metastatic tumors. The majority of these patients (22/31) were diagnosed with classical Ewing sarcoma, four with soft tissue Ewing sarcoma, three with Askin's tumor and two with PNET. Seven of the patients with primary tumors had metastases at the time of diagnosis. Sixteen tumors had Type 1 (exon 7 of EWS/exon 6 of FLI1) gene fusion, eight had other types of fusion (Type 2: exon 7 of EWS/exon 5 of FLI1 or Type 3: exon 10 of EWS/exon 6 of FLI1), three samples were negative for the most common fusion genes (EWS-FLI1 and EWS-ERG), and in four cases this information was not available. The sample set was handled in a coded fashion and the collected clinical and quality control data of the samples is publicly available in a microarray database at http://www.cangem.org [17]. This study has been reviewed and approved by the Ethical Review Board of Helsinki University Central Hospital.

Nucleic acid isolation
Genomic DNA from 31 samples was extracted using the standard phenol-chloroform method. Prior to extraction, the proportion of tumor cells was verified to exceed 75% in all samples by using hematoxylin and eosin-stained sections. Tissue necrosis was evaluated on the whole tumor mass. In cases with high percentages of necrosis, nucleic acids were isolated from the tissue samples in which viable cells were still present. Reference DNAs, male and female, were extracted from pooled blood samples (4 individuals) obtained from Blood Service, Red Cross, Finland. RNA from 42 ESFT samples was isolated using a TRIzol extraction kit (Invitrogen Ltd., Paisley, UK) according to the manufacturer's instructions. Both high quality genomic DNA and RNA were available for 16 patients after the nucleic acid extraction instead of 42 patients, due to insufficient amount of starting material. DNA and RNA concentrations were measured using a GeneQuant pro spectrophotometer (Amersham Pharmacia, Cambridge, UK), and RNA quality was assessed using Agilent's 2100 Bioanalyzer (Agilent, Palo Alto, CA).

Array CGH hybridization, microarray image and data analysis
Digestion, labeling, hybridization, and data analysis of genomic DNA was performed according to Agilent's protocol version 2.0 for 44K arrays as described previously [18,19]. In brief, the sample and reference DNAs, 7 μg each, were fragmented and 1.0-1.5 μg of the fragmented DNA was labeled by random priming using a BioPrime array labeling kit (Invitrogen, Carlsbad, CA) with Cy3-dUTP and Cy5-dUTP dyes (Perkin-Elmer, Wellesley, MA). Labeled samples were purified, combined, and hybridized for 48 h at 65°C, 10 rpm to Human Genome CGH 44B oligomicroarray slides (Agilent Technologies Santa Clara, CA) against gender matched reference DNAs. Then the arrays were washed and scanned [18]. The array images were analyzed and data was extracted using Agilent's Feature Extraction (FE) Software version 8.1, providing dye normalization (Linear Lowess) and background substraction. The chromosomal imbalances were identified using Agilent's CGH Analytics software version 3.4. The altered chromosomal regions and breakpoints were detected using ADM-2 (threshold 8.0) with 1.0 Mb window size. Patient survival analysis was then performed by Kaplan-Meier and Logrank (Mantel-Cox) methods considering either event-free or overall survival.

Expression array hybridizations
The ESFT RNA samples, 42 cases and control samples, a CD34+ cell line and a pool of normal muscle tissue samples were hybridized to Affymetrix Human Genome U133 Plus 2.0 oligonucleotide microarrays (Affymetrix, Santa Clara, CA) according to the manufacturer's GeneChip ® One-Cycle Target Labeling-protocol. In brief, 5 μg of total RNA was reverse transcribed to cDNA using One-Cycle cDNA Synthesis Kit (Affymetrix). Biotin-labeling of antisense cRNA was carried out using IVT Labeling Kit (Affymetrix). The labeled and fragmented cRNA (15 μg of each) was hybridized for 16 h at 45°C in a hybridization oven 640 (60 rpm). Washing and staining of the arrays with streptavidin-phycoerythin (SAPE) was completed in a Fluidics Station 450 (Affymetrix). The arrays were then scanned using a confocal laser GeneChip Scanner 3000 and images were analyzed using GeneChip Operating Software (GCOS; Affymetrix, Sacramento, CA). The expression measurements were preprocessed using Robust Multi-array Analysis (RMA) for the whole collection of 44 chips (42 ESFT patients and two hypothetical normal samples). While only 16 of these were used in the integration, running the preprocessing for the whole collection (n = 44) provides more accurate estimates of the true expression levels.

Integration of gene copy number and expression data
In order to compare the measurements obtained on Affymetrix and Agilent platforms, the sequences used in the probes were matched to the NCBI36 human genome build, using the BLAST algorithm to provide a unique location for each Affymetrix probe set using the target sequences provided by Affymetrix and each Agilent probe. Multiple matches were combined to provide a single location covering all matches if the resulting sequence length was below 2,5 Mb. Note that the locations do not necessarily match the reference sequences of the NCBI36 genome, since they correspond to the locations of the probe sequences, not RefSeqs. In the joint analysis, each Affymetrix probe set was paired with the closest Agilent probe, measured as the distance between the mean points of the sequences. The Affymetrix probe sets that had no Agilent probes within 375 kb were ignored. Correlation between expression and gene copy number of different patients was measured separately for each gene (identified based on the Affymetrix probe set). Genes with high positive or negative correlation were chosen for further examination. The goal of this process was to detect genes where a copy number change and a change in expression are observed on the same patients. A similar analysis was conducted by first dividing patients into groups according to their copy number status and then testing whether these groups have a significant difference in their expression levels [20]. Here the correlation approach was chosen instead of the testing approach, because it can take into account also small amplification imbalances not detectable with the method described in Section "Array CGH hybridization, microarray image and data analysis". It also takes into account possible higher copy number changes. As a correlation measure we used Spearman's rank correlation, since the copy number data does not follow normal distribution. We used the algorithm of Best [21] to compute the p-value against the correlation being zero, and corrected for multiple testing by computing false discovery rates using the q-value procedure [22]. The correlation was computed only for genes located on chromosome arms where at least 20% of the full patient collection, including also samples not used in the integration analysis, showed copy number aberration, in order to focus on regions where associations would be likely.

Quantitative RT-PCT analysis by TaqMan Low Density Arrays
Pre-designed TaqMan PCR probe and primer sets for HDGF were used: Assay ID Hs00610314-m1 (Applied Biosystems, Foster City, CA, USA). All PCRs were done by using ABI PRISM 7900 Sequence Detection System (Applied Biosystems) as recommended by the supplier. Thermal cycling conditions were: 50°C for 2 min, 95°C for 10 min, 95°C for 15 sec and 60°C for 1 min. Gene expression values were calculated based on the ΔΔCt method [23], in which RNA from CD34+ cells derived from human bone marrow and pooled muscle normal tissues derived from three patients were the designated cali-brators for the analysis of all other samples. CD34+ positive cells and pooled muscle normal tissues were processed in the same way as tumor samples and used as separate calibrators for the RT-PCR experiments. For evaluating the prognostic value of HDGF, we calculated its median expression value, and patients were stratified as "high-expressors" or "low-expressors" relative to the median value. Patient survival analysis was then performed by Kaplan-Meier and Logrank methods considering either event-free survival or overall survival.

Copy number changes
Results from a high-resolution analysis of copy number aberrations in ESFT (n = 31), using Agilent's 44K oligoarray platform and CGH analytics software are shown in Table 2. In all ESFT patient samples, 0-26 aberrations were detected per sample (mean: 7.2) and 27 of the 31 samples showed (87%) copy number changes. All samples without copy number changes (n = 4) were primary tumors. Metastases (mean: 11.8) showed more copy number changes than local recurrencies (mean: 9.5) and primary tumors (mean: 5.8). The sizes of these aberrations ranged from < 60 kb deletions to gains or losses of whole chromosomes. Among primary tumors, the samples with low copy number changes (≤ 3 copy number aberrations) showed a significantly better prognosis with respect to those with a high number of chromosomal alterations (> 3 copy number aberrations), both in terms of event-free and overall survival ( Figure 1). Indeed, only 3/11 patients (27%) with less than three copy number changes developed metastases within 6 years from diagnosis in contrast with 8/10 (80%) of those with a high number of chromosomal alterations (P = 0,03 Fisher's test), indicating how the number of chromosomal alterations may have a highly prognostic significance despite the low number of patients here considered. Recurrent aberrations were gains of 1q (32%), 2 (29%), 8 (67%), and 12 (29%) and losses at 9p (23%) and 16q (32%) as visualized in Figure 2. The prominent deletion in 9p21.3 harboring CDKN2A tumor suppressor gene and microdeletions of these region have been previously described and discussed in a separate report by Savola et al. [18]. The gain of chromosome 8 was the most prominent copy number change in our sample set (21 of 31 cases). Gain of 8q arm (minimal common overlapping area) was present in all samples with chromosome 8 aberration. The minimal common overlapping area of copy number gain in chromosome 1 was 1q22-qter. In chromosome 12 the smallest common region of gain was 12q13.2-q14.1, which harbors two known oncogenes, ERBB3 and CDK4.

Integration of gene copy number and expression data
Array CGH data and expression data were combined for a total of 16 patient samples ( Table 1). Matching of expression microarray probes to the corresponding copy number microarray probes using a 375 kb genomic window yielded 53,145 probe pairs. 10,115 of those located in chromosomal areas where at least 20% of patients showed a copy number aberration (1q, 2q, 8q, 12, and 16q). Several putative ESFT-related genes were pinpointed, differentially expressed due to copy number alteration in these chromosomal locations of highest interest. These novel putative oncogenes and tumor suppressor genes based on our data analysis include 20 genes (by q-value < 0.20), which previously have not been associated with ESFT (Table 3). For a supplementary table with all integration analysis results see Additional file 1.

Microarray analysis and quantitative RT-PCR on HDGF
Array CGH and expression microarray results on showed clear evidence that patients with HDGF gain had higher HDGF expression ( Figure 4A-C, correlation 0.81) than patients without HDGF gain. However, ESFT patients could not be divided unambiguously into two groups (see Figure 4A and 4B) based on this data. To validate HDGF microarray results, the relative expression levels of HDGF were analysed by TaqMan Low Density arrays in all 42 available ESFT patient samples ( Figure 4F). This analysis confirmed that ESFT patient samples express higher levels of HDGF than normal controls. No statistically significant correlation of HDGF expression with poor clinical outcome could be shown ( Figure 4D and 4E), nor correlation with patient gender, age or location could be shown. Clinical data summary of these 42 ESFT patients included in the analysis can be viewed on Additional file 2.

Discussion
In this study, we have performed a comprehensive genome wide array CGH analysis of 31 EFST patient samples. Our oligoarray CGH results, the recurrent gains of 1q, 2, 8, and 12, and losses at 9p and 16q that were present in more than 20% of the patient samples, are in agreement with previous ESFT studies by G-banding, conventional CGH [4][5][6][7][8] and array CGH [9]. Our array CGH results revealed complex large-scale changes in several samples. Gains of DNA sequences were more prevalent Outcome of patients with low copy number changes (≤ 3 copy number aberrations) and high copy number changes (> 3 copy number aberrations) than losses and most of the gains affected whole chromosomes or chromosome arms. Further, our analysis showed that patients with low copy number changes (≤ 3 copy number aberrations) showed a significantly better prognosis than patients with a high number of chromosomal alterations, both in terms of event-free and overall survival.  [12,13,24]. Our results suggest that duplication of the der(22)t(11;22) is a common event in ESFT. Copy number gain of the fusion gene EWS-FLI1 may further increase the expression of this fusion product and possibly impair the prognosis. Amplification or gain of a chimeric fusion gene is relatively infrequent mechanism in both leukemia and solid tumors. However, rare cases of gain or amplification of the derivative chromosomes or episomes carrying the fusion gene have been reported [25][26][27] and gene dosage effect of the fusion gene can improve the tumor growth resulting in more aggres-Chromosomal locations of copy number changes in ESFT patient sample (n = 31) Figure 2 Chromosomal locations of copy number changes in ESFT patient sample (n = 31). The ideogram shows the summary of gains and losses of DNA sequence copy numbers and their frequencies in ESFT tissue samples (n = 31) analyzed by array CGH. Gains (light green) and amplifications (dark green) are shown on the right of each chromosome and losses (red) on the left (number refer to the percentage per band). Chromosomal ideogram was generated using the PROGENETIX software [46].   Correlation of HDGF copy number and expression by microarray analysis and validation of HDGF expression using RT-PCR partner of EWSR1 is at 20q [28]. However, the specific chromosomal region in 20q remained unknown. Based on our results, the translocation partner of EWSR1 on chromosome 20 might reside in the amplification breakpoint, either at 20q11.23 or 20q13.12-q13.2 (Fig. 2E).
Putative translocation partners of EWSR1 are therefore genes assigned to the breakpoints of these amplifications:RPN2, BLCAP, CDH22, SLC13A3, EYA2, NCOA3, Kua-UEV, and NFATC2. Based on literature, the most interesting candidates are EYA2 (located at 20q13.12), which has been found to function as a transcriptional activator in ovarian cancer cells [29], and NFATC2 (located at 20q13.2), which functions in positive regulation of transcription [30]. Both EYA2 and NFATC2 are oriented on the amplification breakpoints so that they are in the correct direction for transcription after the possible fusion event. In addition to the chromosome 20, genes on region 8p are interesting as putative fusion partners, since many of these genes are involved in carcinomas and sarcomas. Indeed the region of 8p11.21-p21.2 was gained in patent sample D154. However, the possible involvement of 8p11.21-p21.2 as a location of the translocation partner for EWSR1 was ruled out since this region was not amplified like EWSR1 was. We would assume that the fusion partners would be amplified on the same scale, since translocation is likely to take place before the amplification of the fusion gene.
According to our integrated analysis of array CGH and expression data including 16 ESFT patient samples, we selected as one of the most interesting putative target genes within the common 1q22-qter gain gene HDGF, which has been reported as a putative prognostic marker for several tumor types, e.g., gastrointestinal stromal tumors (GIST) [31,32], hepatocellular carcinoma [33], non-small-cell lung carcinoma [34,35] and pancreatic ductal carcinoma [36]. HDGF has been shown to stimulate cell proliferation and growth after nuclear translocation [37,38], which makes it a likely target also in ESFT. Furthermore, our preliminary results from an aCGH analysis of ESFT cell lines showed that HDGF was inside the minimal common overlapping area of 1q21.1-q23.1 (Savola et al, unpublished results). Our RT-PCR analysis confirmed that Ewing's sarcoma cells expressed higher levels of HDGF with respect to putative normal controls (CD34 positive cells and normal muscle tissues). However, when we analyzed HDGF expression level correlation with patient survival, no significant association was seen. So HDGF can play a role in the tumorigenesis and tumor progression of EFST, but it shows no prognostic value. However, due to limitations in numbers of patients (n = 42) included in the HDGF expression study, no definitive conclusions of the outcome evaluation of HDGF expression in ESFT can be drawn.
Other interesting target genes pinpointed by integration analysis in 1q include TMEM63A (1q42.12), C1orf107 (1q32.2), HEATR1 (1q43), all relatively unknown genes in their functions and COG2 (1q42.2), gene involved in various Golgi functions [39]. In chromosome 8 genes WDR67 (8q24.13) and GSDMDC1 (8q24.3) locating nearby each other and in chromosome 12 DDX47 (12p13.1) and CACNA1C (12p13.3) [40,41] are interesting targets for further studies. Also potential oncogenes in ESFT at 12q are WSB2 (12q24.23), which takes part in the intracellular signalling cascades and has shown to be a potential biomarker in colorectal cancer [42], PPHLN1 (12q12), which controls cell cycle regulation by modifying expression of cdc7 involved in progression of DNA replication [43,44] and KRT79 (12q13.13), a member of human type II keratin gene family. Previously loss of 16q has been shown to be a sign of poor prognosis in ESFT [7,13]. Our results suggest that the putative target gene within this chromosomal area is HEATR3 (16q12.1) or ANKRD11 (16q24.3), which has been recently identified to interact with p53 and act as a co-activator in the regulatory feedback loop with p53 [45]. Functional studies to confirm these results are warranted.

Conclusion
This study adds new information regarding gene copy number changes and their relation to expression in ESFT providing valuable data for further analysis. In addition, array CGH showed to be efficient in the detection of a putative novel translocation in one patient sample and provided new information about copy number changes of the EWS/FLI1 fusion gene. Therefore we can conclude that array CGH analysis and integrated DNA microarray analysis of global gene expression patterns and gene copy number imbalances is a powerful method to identify novel molecular targets and chromosomal regions of highest interest in ESFT.