Identification of novel candidate target genes, including EPHB3, MASP1 and SST at 3q26.2–q29 in squamous cell carcinoma of the lung

Background The underlying genetic alterations in squamous cell carcinoma (SCC) and adenocarcinoma (AC) carcinogenesis are largely unknown. Methods High-resolution array CGH was performed to identify differences in the patterns of genomic imbalance between the SCC and AC subtypes of non-small cell lung cancer (NSCLC). Results On a genome-wide profile, SCCs showed a higher frequency of gains than ACs, although the difference was not statistically significant (P = 0.067). More specifically, statistically significant differences between the histologic subtypes were observed for gains at 2q14.2, 3q26.2–q29, 12p13.2–p13.33, and 19p13.3 and losses at 3p26.2–p26.3, 16p13.11, and 17p11.2 in SCC, and for a gain at 7q22.1 and loss at 15q22.2–q25.2 in AC (P < 0.05). The most striking difference between SCC and AC was gain at 3q26.2–q29, occurring in 86% (19/22) of SCCs but in only 21% (3/14) of ACs. Several genes at 3q26.2–q29 previously linked to specific histologies, such as EVI1, MDS1, PIK3CA, and TP73L, were significantly gained in SCC (P < 0.05). In addition, we identified the following possible target genes (> 30% of patients) at 3q26.2–q29: LOC389174 (3q26.2), KCNMB3 (3q26.32), EPHB3 (3q27.1), MASP1 and SST (3q27.3), LPP and FGF12 (3q28), and OPA1, KIAA022, LOC220729, LOC440996, LOC440997, and LOC440998 (3q29), all of which were significantly targeted in SCC (P < 0.05). Among these genes, high-level amplifications were detected for EPHB3 at 3q27.1 and for MASP1 and SST at 3q27.3 (18, 18, and 14%, respectively). Quantitative real-time PCR showed that potential candidate genes detected by array CGH were overexpressed in SCCs. Conclusion Using whole-genome array CGH, we identified significant differences and distinctive chromosomal signatures between the SCC and AC subtypes of NSCLC. 
The newly identified candidate target genes may prove to be attractive molecular markers for classifying NSCLC histologic subtypes and could contribute to the pathogenesis of squamous cell carcinoma of the lung.


Background
Lung cancer is responsible for the highest cancer-related morbidity and mortality worldwide [1]. Non-small cell lung cancer (NSCLC) comprises approximately 80% of all lung cancers, and squamous cell carcinoma (SCC) and adenocarcinoma (AC) are its two most common subtypes [2]. Accumulating evidence suggests that SCC and AC progress through different carcinogenic pathways [2][3][4], but the genetic aberrations underlying these differences, especially the molecular differences between the two subtypes, remain unclear.
The most prevalent known chromosomal changes in NSCLC include gains/amplifications at 3q, 5p, 7p, and 8q, and losses at 3p, 8p, 9p, 13q, and 17p [5][6][7]. Many significant genes that map to these regions have previously been associated with specific histologies [2][3][4][5]. Gains of 3q, 7p, 12p, and 20q, as well as losses of 2q, 3p, 16p, and 17p, are more frequently detected in SCC, whereas gains of 1q and 6p as well as losses of 9q and 10p are more prevalent in AC [7][8][9][10]. One of the most prevalent and significant differences between SCC and AC, gain of chromosome 3q, has been recognized in several molecular cytogenetic studies [3][4][5]. Emerging data suggest that amplified regions of 3q have a profound effect on tumor development and harbor candidate biomarkers of disease progression, response to therapy, and prognosis in SCC [11]. These findings suggest that the two subtypes develop through different pathogenic pathways involving genes in these chromosomal regions, but the genetic aberrations underlying such differences are largely unknown.
Array CGH has been recognized as a successful and valuable tool for evaluating the whole genome as well as for obtaining genetic information at the single-gene level, and it has enabled the classification of different neoplasms based on characteristic genetic patterns [12]. It has been used extensively to study various human solid tumors, including NSCLC [13][14][15]. Although recurrent genetic alterations in NSCLC have been studied extensively, to our knowledge only a few studies have used high-resolution array CGH to examine the molecular differences between the histologic subtypes of NSCLC. Further investigations are therefore needed to gain insight into the clinical significance of recurrent chromosomal alterations that distinguish the two subtypes.
In this study, we therefore performed high-resolution array CGH to compare the patterns of genetic alterations and to identify potential candidate genes that may be associated with phenotypic properties differentiating early-stage SCC from AC.

Tumor Samples and DNA Extraction
Tumors from 22 patients with SCC and 14 patients with AC of the lung who underwent surgery as primary treatment, without previous radiation or chemotherapy, were analyzed. This study was reviewed and approved by the Institutional Review Board of Chungnam National University Hospital. All cases were reviewed by pathologists to verify the original histopathological diagnosis, depth of tumor invasion, tumor differentiation, and lymph node metastasis. Written informed consent was obtained from each patient according to institutional regulations. Demographic and pathological data, including age, gender, and tumor stage, were obtained by review of the medical records. All patients were classified according to the WHO histologic typing of lung carcinomas and the UICC TNM (tumor-node-metastasis) staging system. Some of these samples were previously profiled for copy number variations [15].
Tumor preparations were performed as described previously [7]. DNA was isolated following the manufacturer's instructions (Promega, Madison, WI, USA), with some modifications as described before [15,16]. Commercial genomic DNA was used as the reference (Human Genomic DNA: Female; Promega Corporation, Madison, WI; Cat. No. G1521).

Construction of the BAC-clone array CGH microarray
The MacArray™ Karyo4000 chips (Macrogen, Inc., Seoul) [17][18][19][20] used in this study consist of 4,046 human BAC clones, applied in duplicate, at a resolution of 1 Mb (http://www.macrogen.co.kr). BAC clones were selected from the proprietary BAC library of Macrogen, Inc. All clones were sequenced from both ends using an ABI PRISM 3700 DNA Analyzer (Applied Biosystems, Foster City, CA), and their sequences were aligned with BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and mapped to their positions in the University of California, Santa Cruz (UCSC) human genome database (http://www.genome.ucsc.edu; Build 36, March 2006 (hg18)). The locus specificity of the selected clones was confirmed by standard fluorescence in situ hybridization (FISH), and clones binding to multiple loci were removed [21]. BAC DNA was prepared using the conventional alkaline lysis method. The arrays were manufactured with an OmniGrid arrayer (GeneMachine, San Carlos, CA) in a 24-pin format. Each BAC clone was represented on the array as triplicate spots, and each array was pre-scanned with a GenePix 4200A scanner (Axon Instruments, Foster City, CA) to check spot morphology.

Array CGH experiment
Array CGH was performed as described previously [15]. Briefly, arrays were pre-hybridized with salmon sperm DNA to block repetitive sequences in the BACs. Normal male DNA (reference) and digested tumor DNA (test), 500 ng, were labeled with Cy5-dCTP and Cy3-dCTP, respectively, by random-primed labeling (Array CGH Genomic Labeling System; Invitrogen, CA, USA). The labeled probes were mixed with human Cot-1 DNA and dissolved in hybridization solution. Hybridization was performed in a sealed chamber for 48 h at 37°C. After hybridization, slides were scanned on a GenePix 4200A two-color fluorescence scanner (Axon Instruments, Union City, CA, USA), and spots were quantified with GenePix software (Axon Instruments).

Analyzed BAC clones
After scanning, the fluorescence intensities of the red and green channels were saved as two TIFF image files, and background was subtracted. Log2-transformed fluorescence ratios were calculated from background-subtracted median intensity values and normalized by intensity normalization methods; LOWESS normalization was applied to adjust for differences between the red and green dyes. The red/green ratio of each clone was then calculated and log2-transformed. Spot quality criteria were a foreground-to-background ratio greater than 3.0 and a standard deviation of triplicate spots less than 0.2. Breakpoint detection and status assignment of genomic regions were performed with the GLAD software [22]. A total of 3,776 BAC clones were analyzed after excluding 31 clones with missing values and the 238 clones on the sex chromosomes, which were excluded because female tumor DNA was hybridized with male control DNA to serve as an internal control. A low-level copy number gain was defined as a log2 ratio > 0.25 and a copy number loss as a log2 ratio < -0.25. High-level amplification was defined as a log2 ratio > 0.8 and, conversely, homozygous deletion as a log2 ratio < -0.8 [23][24][25]. This threshold value was defined empirically as three times the standard deviation calculated from 30 normal male-versus-normal female hybridization experiments. Macrogen's MAC Viewer v1.6.6, CGH-Explorer 2.55, and Avadis Prophetic 3.3 were used for graphical illustration and image analysis of the array CGH data.
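The calling rules above can be summarized in a short Python sketch. It assumes each clone already has a single LOWESS-normalized log2(test/reference) value, treats the published cut-offs as strict inequalities, and does not reproduce the GLAD segmentation step; all function names are hypothetical.

```python
from statistics import stdev

# Cut-offs from the text (log2 scale); strict inequalities are an assumption.
GAIN_CUTOFF, LOSS_CUTOFF = 0.25, -0.25
AMP_CUTOFF, HOMDEL_CUTOFF = 0.8, -0.8


def passes_spot_qc(foreground: float, background: float,
                   replicate_log2: list[float]) -> bool:
    """Keep a clone only if foreground/background > 3.0 and the SD of its
    replicate spots is < 0.2 (hypothetical helper; needs >= 2 replicates)."""
    return foreground / background > 3.0 and stdev(replicate_log2) < 0.2


def classify_clone(log2_ratio: float) -> str:
    """Assign a copy-number status to one clone from its normalized log2 ratio."""
    if log2_ratio > AMP_CUTOFF:
        return "amplification"
    if log2_ratio < HOMDEL_CUTOFF:
        return "homozygous_deletion"
    if log2_ratio > GAIN_CUTOFF:
        return "gain"
    if log2_ratio < LOSS_CUTOFF:
        return "loss"
    return "normal"


def empirical_threshold(normal_vs_normal_log2: list[float], k: float = 3.0) -> float:
    """k times the SD of log2 ratios from normal-versus-normal control hybridizations."""
    return k * stdev(normal_vs_normal_log2)


print(classify_clone(0.31))   # gain
print(classify_clone(0.95))   # amplification
print(classify_clone(-0.40))  # loss
```

Checking the amplification and deletion cut-offs before the low-level ones keeps each clone in exactly one category.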

Statistical analysis for array CGH
For group comparisons, differences in log2 ratios and the Fisher exact test were used to determine whether gains or losses of genomic content within particular cytobands were associated with cancer type. The Fisher exact test used two categories, normal and abnormal (gain or loss), under the null hypothesis that the proportion of abnormal calls is the same in both groups. For each BAC, counts of abnormal versus normal calls were summarized by NSCLC subtype (SCC and AC), giving a 2 × 2 table for analysis. The Benjamini-Hochberg false discovery rate (FDR) correction was applied to control false-positive calls arising from the large number of tests. R 2.2.1 with packages from the Bioconductor Project (http://www.bioconductor.org) was used to compute the frequencies of gain and loss and for statistical analysis.
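The per-clone test described above can be sketched in plain Python. This is a standard-library re-implementation for illustration, not the R/Bioconductor code used in the study. The two-sided p-value follows the common convention (also used by R's `fisher.test`) of summing the probabilities of all tables no more likely than the observed one; the helper names and toy calls are hypothetical.

```python
from math import comb


def clone_table(scc_calls: list[str], ac_calls: list[str]) -> tuple[int, int, int, int]:
    """2 x 2 counts (abnormal, normal) for SCC and AC at one BAC clone.

    'Abnormal' is any call other than 'normal' (gain or loss), as in the text.
    """
    a = sum(call != "normal" for call in scc_calls)
    c = sum(call != "normal" for call in ac_calls)
    return a, len(scc_calls) - a, c, len(ac_calls) - c


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x abnormal SCC calls with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum every table no more probable than the observed one
    # (small relative tolerance for floating-point ties).
    p = sum(prob(x) for x in support if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


table = clone_table(["gain", "normal", "loss", "gain"],
                    ["normal", "normal", "gain", "normal"])
print(table)                                     # (3, 1, 1, 3)
print(round(fisher_exact_two_sided(*table), 4))  # 0.4857
```

In practice one raw p-value would be computed per BAC clone and the whole list passed once to `benjamini_hochberg`.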

Quantitative Real-time PCR analysis
Real-time quantitative PCR analysis was performed using the ABI PRISM 7900HT Sequence Detection System and TaqMan Gene Expression Assays according to the manufacturer's instructions (Applied Biosystems, Foster City, CA). In brief, samples (2.5% of the reverse transcription reaction) were amplified using the Universal Master Mix (Applied Biosystems), with an initial activation step of 10 min at 95°C followed by 40 cycles of denaturation at 95°C for 15 s and annealing/extension at 60°C for 1 min. All samples were assayed in triplicate. To detect possible contamination with genomic DNA, non-reverse-transcribed total RNA from all tumors was analyzed in parallel with the cDNAs. Normalized pooled normal human genomic DNA (Promega, Madison, WI, USA) was used as the reference. All data were analyzed using ArrayAssist® (Stratagene, La Jolla, USA) and R (version 2.7.2). The correlation between BAC array and quantitative RT-PCR data was assessed by Pearson correlation analysis, with P < 0.05 considered significant.
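The Pearson analysis pairs, for each gene, the per-sample array CGH value with the corresponding real-time PCR value. A minimal standard-library sketch, assuming the two measurements are supplied as equal-length lists in the same sample order; converting the t statistic to a P value needs a t-distribution CDF (for example `scipy.stats.pearsonr`), which is not reproduced here.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between paired measurements, e.g. per-sample array CGH
    linear ratios and real-time PCR values for one gene.

    Undefined (division by zero) if either series is constant.
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two paired samples of equal length (n >= 3)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """Test statistic for H0: rho = 0, t-distributed with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
```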

Array CGH analysis in SCCs and ACs of the lung
Whole-genome array CGH at 1-Mb resolution was performed to establish distinct differences in chromosomal copy number changes between the SCC and AC histologic subtypes of NSCLC. Clinicopathological data for the 22 SCC and 14 AC patients are summarized in Table 1. All NSCLC patients (100%) had copy number aberrations, and each showed numerous copy number changes. On average, 173 clones were gained (range, 14-579) and 136 clones were lost (range, 5-537) per patient. SCCs showed a higher frequency of gains than ACs (mean number of gained clones per patient, 203 vs. 125, respectively), although the difference was not statistically significant (P = 0.067). To visualize both common and subtype-specific altered chromosomal regions, signal intensity ratios for each spotted BAC clone were calculated and displayed as log2 plots (Figures 1 and 2). Most chromosomes in this profile showed multiple segmental alterations, including single-copy as well as high-level gains and losses.
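The per-patient figures above can be tallied with a sketch like the following, assuming each patient's profile is a list of per-clone status strings ("gain", "loss", and so on) and that amplifications count as gains; the labels and helper names are hypothetical. As a check on the reported numbers, weighting the subtype means by group size, (22 × 203 + 14 × 125)/36 ≈ 172.7, reproduces the overall mean of 173 gained clones per patient.

```python
from statistics import mean

# Assumption: amplifications count as gains, homozygous deletions as losses.
GAINED = {"gain", "amplification"}
LOST = {"loss", "homozygous_deletion"}


def per_patient_counts(patients: dict[str, list[str]], states: set[str]) -> list[int]:
    """Number of clones per patient whose call falls in `states`."""
    return [sum(call in states for call in calls) for calls in patients.values()]


def summarize(patients: dict[str, list[str]], states: set[str]) -> tuple[float, int, int]:
    """Mean, minimum, and maximum of the per-patient clone counts."""
    counts = per_patient_counts(patients, states)
    return mean(counts), min(counts), max(counts)


def pooled_mean(group_means: list[float], group_sizes: list[int]) -> float:
    """Overall per-patient mean recovered from subgroup means weighted by group size."""
    return sum(m * n for m, n in zip(group_means, group_sizes)) / sum(group_sizes)


# Consistency check with the reported figures: 22 SCCs (mean 203) and 14 ACs (mean 125).
print(round(pooled_mean([203, 125], [22, 14]), 1))  # 172.7
```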

Distinct genomic signatures between SCC and AC histologic subtypes
Across the whole genome, we identified specific genomic alterations distinguishing the two subtypes. Gains at 2q, 3q, 12p, and 19p, as well as losses at 3p, 4p, 16p, and 17p, were specific to SCC, whereas gains at 6p and 7q and losses at 4q and 15q were more prevalent in AC. A summary of the genomic gains and losses preferentially found in patients with SCC and AC (> 30% of patients) is given in Table 2. Statistically significant differences between the two subtypes were observed for gains of 2q14.2, 3q26.2-q29, 12p13.2-p13.33, and 19p13.3, as well as losses of 3p26.2-p26.3, 16p13.11, and 17p11, in SCC (P < 0.05), and for a gain of 7q22.1 and loss of 15q22.2-q25.2 in AC (P < 0.05). The statistically significant genomic regions preferentially gained and lost by histologic subtype and the potential target genes are summarized in Table 3 and Figure 3 (see Additional file 1).
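The > 30% criterion used for Table 2 amounts to a per-clone frequency filter applied separately to each subtype. A minimal sketch, assuming calls are stored as per-patient lists of status strings (hypothetical labels) in the same genomic order; grouping adjacent recurrent clones into cytoband-level regions is not shown.

```python
GAINED = {"gain", "amplification"}      # assumption: amplifications count as gains
LOST = {"loss", "homozygous_deletion"}  # assumption: homozygous deletions count as losses


def clone_frequencies(patients: list[list[str]], states: set[str]) -> list[float]:
    """Fraction of patients whose call at each clone falls in `states`.

    `patients` holds one list of per-clone calls per patient, all in the same
    genomic order.
    """
    n_clones = len(patients[0])
    return [sum(calls[i] in states for calls in patients) / len(patients)
            for i in range(n_clones)]


def recurrent_clones(patients: list[list[str]], states: set[str],
                     min_fraction: float = 0.30) -> list[int]:
    """Indices of clones altered in more than `min_fraction` of patients."""
    return [i for i, f in enumerate(clone_frequencies(patients, states))
            if f > min_fraction]


# Toy SCC group (hypothetical calls for three clones in three patients).
scc = [["gain", "normal", "loss"],
       ["gain", "normal", "normal"],
       ["amplification", "loss", "normal"]]
print(recurrent_clones(scc, GAINED))  # [0]
print(recurrent_clones(scc, LOST))    # [1, 2]
```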

Significant copy number differences of target genes at 3q26.2-q29 between SCC and AC
In this array profile, the most striking difference between SCC and AC was gain at 3q26.2-q29: 19 of 22 SCCs (86%) showed gains in at least part of this region, compared with only 3 of 14 ACs (21%). Several putative cancer-related genes in the 3q26.2-q29 region have previously been linked to specific histologic subtypes; specifically, gains of EVI1 and MDS1 (3q26.2), PIK3CA (3q26.32), and TP73L (3q28) were significantly more frequent in SCCs (P < 0.05). Additionally, we identified possible candidate target genes in this region that are not yet known to be involved in the pathogenesis of SCC (> 30% of patients): namely, LOC389174 (3q26.2), KCNMB3 (3q26.32), EPHB3 (3q27.1), MASP1 and SST (3q27.3), LPP and FGF12 (3q28), and OPA1, KIAA022, LOC220729, LOC440996, LOC440997, and LOC440998 (3q29). Among these significantly associated genes at 3q26-q29 in SCC, high-level amplifications were detected for EVI1 and MDS1 at 3q26.2 (23 and 5%, respectively), and for EPHB3, MASP1, and SST at 3q27.1-q27.3 (18, 18, and 9%, respectively). For this analysis, high-level amplification was defined as a log2 signal intensity ratio reaching +0.8. Because gain of EVI1 and MDS1 at 3q26.2 has been described previously in SCC, we examined whether amplification of EVI1 or MDS1 was correlated with amplification of the newly identified genes EPHB3, MASP1, or SST. Interestingly, co-amplification was found for EVI1 and EPHB3 in 18% and for EVI1 and SST in 14% of SCCs. All amplified genes, including the significantly associated targets in SCC at 3q26.2-q29, are summarized in Table 4. Figure 5A shows a weighted frequency (%) diagram of high-level amplifications on chromosome 3, and Figure 5B shows more detailed profiles of the 3q26.2-q29 region with the significantly associated genes in SCC.

Genome-wide frequency plot: the x-axis represents chromosome number (1-22) and the y-axis the frequencies of gains (log2 ratio > 0.25) and losses (log2 ratio < -0.25) for each clone in NSCLC.
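Because the percentages above are fractions of the 22 SCCs, they correspond to small patient counts (for example, 5/22 ≈ 23%, 4/22 ≈ 18%, 3/22 ≈ 14%, 2/22 ≈ 9%). The sketch below shows how amplification and co-amplification rates could be computed from sets of patients carrying a high-level amplification call for each gene. The patient IDs are hypothetical, and defining co-amplification as both genes being called amplified (log2 ratio above 0.8) in the same tumor is an assumption.

```python
def amplification_rate(amplified: dict[str, set[str]], gene: str, n_patients: int) -> float:
    """Fraction of patients in whom `gene` is called amplified."""
    return len(amplified.get(gene, set())) / n_patients


def co_amplification_rate(amplified: dict[str, set[str]], gene_a: str,
                          gene_b: str, n_patients: int) -> float:
    """Fraction of patients in whom both genes are amplified in the same tumor."""
    both = amplified.get(gene_a, set()) & amplified.get(gene_b, set())
    return len(both) / n_patients


# Hypothetical patient IDs for illustration only; not the study's data.
amplified_scc = {
    "EVI1": {"p01", "p02", "p03", "p04", "p05"},
    "EPHB3": {"p01", "p02", "p03", "p04"},
}
print(f"{amplification_rate(amplified_scc, 'EVI1', 22):.0%}")              # 23%
print(f"{co_amplification_rate(amplified_scc, 'EVI1', 'EPHB3', 22):.0%}")  # 18%
```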
The data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus (GEO) and are accessible through GEO Series accession number GSE16597 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16597).

Real-time quantitative PCR analysis
To validate the consequences of gene amplification detected by array CGH, we subsequently performed real-time quantitative PCR analyses of three potential oncogenes (EPHB3, MASP1, and SST) at 3q27.1-q27.3. Primers for the three genes are presented in Table 5.
Although the absolute values for the selected genes differed between the two analyses, significant correlations were observed between the two data sets (P < 0.05). Array CGH values are expressed as linear ratios and real-time PCR values as N-values. Correlation coefficients between gene expression levels in real-time PCR and array CGH analysis for the three genes (EPHB3, MASP1, and SST) were 0.694, 0.723, and 0.752, respectively. Figure 6A compares the mean relative expression levels of EPHB3, MASP1, and SST between array CGH and real-time PCR results in NSCLCs, and three cases with amplification at 3q27.1-q31.3 are shown in Figure 6C.

Figure 2. Signal intensity ratios for each spotted BAC clone of SCCs and ACs displayed as log2 plots. A total of 4,046 BAC clones were ordered (x-axis) according to map position and chromosomal order from 1pter to 22qter.

Discussion
Non-small cell lung cancer is the most frequently occurring type of lung cancer, with SCC and AC being the two main subtypes [12]. Because the most common genetic aberrations in NSCLC had already been identified in our previous study [15], we focused on exploring candidate genes that may be associated with phenotypic properties that differentiate early-stage SCC from AC. A whole-genome strategy allowed us to define candidate regions that may contain specific cancer-related genes involved in early stages of SCC and AC.
Gains of 2q, 3q, 12p, and 19p, as well as losses of 3p, 4p, 16p, and 17p, were found to be specific to SCC, whereas gains of 6p and 7q and losses of 4q and 15q were more prevalent in AC. More specifically, statistically significant differences were observed between the two subtypes for gains at 2q14.2, 3q26.2-q29, 12p13.2-p13.33, and 19p13.3 and losses at 3p26.2-p26.3, 16p13.11, and 17p11 in SCC, and for a gain at 7q22.1 and loss at 15q22.2-q25.2 in AC. Although these findings are similar to those reported in previous studies [26][27][28][29], four chromosomal regions have not previously been described as focal regions in lung cancer: gains at 2q14.2 and 19p13.3 in SCC, and a gain at 7q22.1 and loss at 15q22.2-q25.2 in AC.
Copy number gains at 2q14 were found in 55% of SCCs but in only 7% of ACs (P = 0.046). Although little is known about gains at chromosome 2q14, this region may harbor potential oncogenes involved in the tumorigenesis of SCC. Future studies will be needed to verify the significance of this finding.
A gain at 19p13.3, a region harboring the genes SHC2, C19orf19, MADCAM1, C19orf20, CDC34, GZMM, and BSG, was observed in 41% of SCCs but in none of the ACs (P = 0.008). Emerging data [30] indicate that the C19orf19 gene product is associated with EGFR and is phosphorylated at five tyrosines in response to EGFR activation, and therefore represents a new component of the EGFR signaling network. Overexpression of EGFR in SCC has been well recognized in several molecular cytogenetic studies [8,31,32]. Although no significant association between EGFR and C19orf19 was found in this study (data not shown), further investigations may determine the functional relationship between these two genes and whether these or other genes at 19p13.3 contribute to the genomic differences between SCC and AC.
The present study revealed that gain of 7q22.1 is more specific to lung AC than to SCC (P = 0.023). This region was detected as an 89-kb gene-specific copy number gain centered on TRRAP, ZKSCAN1, ZNF38, and ZNF3. Loukopoulos et al. [33] recently showed frequent amplification of the TRRAP gene in pancreatic adenocarcinoma, and another array CGH study reported high-level amplification at 7q21-q22 in gastric ACs [34]. Taken together, these reports and our own results suggest that this region may be involved in the tumorigenesis of AC.
Losses at 15q22.2-q25.2 were detected in 36% of ACs, whereas only 5% of SCCs showed this loss (P = 0.033). Loss of this region has not been described as a common change in lung AC thus far, but it is commonly found in follicular and clear cell adenocarcinomas. Roque et al. [35] demonstrated that 15q loss was significantly associated with follicular adenocarcinomas, and Okada et al. [36] reported loss of heterozygosity (LOH) at 15q in at least 50% of clear cell ACs, indicating that this candidate region may contain specific cancer-related genes involved in AC. Future work is needed to validate these findings.
In this array survey, the most salient discriminator between SCC and AC was gain at 3q26.2-q29, occurring in 86% of SCCs but in only 21% of ACs. Furthermore, high-level amplifications in this region were more prevalent in SCCs than in ACs (6/22 = 27% vs. 1/14 = 7%, respectively). Our data also showed that 4 of the 6 (67%) high-level amplifications at 3q26.2-q29 were detected in stage I-IIA SCCs without lymph node metastasis. Notably, high-level amplifications at 3q26.2-q29 were more prevalent in stage I-IIA SCCs than in more advanced stages.
Heselmeyer et al. [37,38] reported that gain of chromosome 3q could be found in early dysplastic lesions as well as in invasive cervical cancer, but at a reduced frequency in advanced stages of disease. Furthermore, Yen et al. [39] described that high-level amplifications at 3q25.3-qter in esophageal squamous cell carcinoma were all found in stage IB tumors. Taken together with these reports, our results suggest that amplification of genes located on 3q is likely to occur early in cancer development. One possible explanation is that, in advanced stages, alterations of these genes are no longer necessary to maintain cancer cell survival [40]. Further studies are needed to confirm this hypothesis.

Figure 3. The statistically significant genomic regions preferentially gained and lost between SCC and AC of NSCLC patients. Significant genomic regions are shown on the x-axis, and the percentage of gains (upper panel) and losses (lower panel) in each chromosomal region is shown on the y-axis (red bars, SCC; green bars, AC).

Figure 4. Individual profiles of high copy number changes at 3q. A. High-level amplification at 3q27 in an AC from patient 12. B. High-level amplification at 3q26.2-q28 in an SCC from patient 27. In the intensity ratio profiles, the x-axis represents the map position of the corresponding clone and the y-axis the intensity ratio. A schematic of the cytogenetic bands and map positions is shown below each plot.
Previous analyses of the NSCLC genome with low-resolution chromosomal or BAC array CGH have consistently demonstrated differences between SCC and AC in the telomeric subregion 3q26-qter [40][41][42][43]. Several cancer-related genes located in this region have previously been associated with specific histologies, such as EVI1 (ecotropic viral integration site 1) and MDS1 (myelodysplastic syndrome 1) at 3q26.2, PIK3CA (phosphoinositide-3-kinase, catalytic, alpha polypeptide) at 3q26.32, and TP73L (tumor protein p73-like) at 3q28 [40,41]. In this array survey, we sought to determine whether additional candidate genes at 3q26-qter drive the genomic differences between the SCC and AC subtypes of NSCLC, and we detected the following possible target genes not previously thought to play a pathogenic role in SCC: LOC389174 (3q26.2), KCNMB3 (3q26.32), EPHB3 (3q27.1), MASP1 and SST (3q27.3), LPP and FGF12 (3q28), and OPA1, KIAA022, LOC220729, LOC440996, LOC440997, and LOC440998 (3q29), all of which were significantly targeted in SCC (P < 0.05). These genes have not been described in squamous cell carcinoma of the lung thus far, but are commonly found in other cancers or cancer cell lines [40][41][42][43][44]. Lukashova-v Zangen et al. [43] described that overexpression of EPHB3 in ependymomas with high proliferation indices was associated with a poor outcome, and Kuraya et al. [45] demonstrated high expression of MASP1 in a glioma cell line. More strikingly, among these possible target genes within 3q26-q29, EPHB3 (3q27.1) and MASP1 and SST (3q27.3) showed high-level amplification in more than three SCC patients each, suggesting that these genes may be major potential targets for characterizing NSCLC histologic subtypes. Real-time PCR analysis demonstrated overexpression of the three genes (EPHB3, MASP1, and SST) in SCCs compared with ACs.
These results agreed with the array CGH results.

Conclusion
The high-resolution analysis allowed us to propose novel candidate target genes that may be associated with phenotypic properties that differentiate early-stage SCC from AC. The newly identified candidate genes could be useful biomarkers for the early detection and characterization of NSCLC histologic subtypes, as well as novel targets for therapeutic intervention in early-stage squamous cell carcinoma of the lung.