ETV4 plays a role on the primary events during the adenoma-adenocarcinoma progression in colorectal cancer

Background Colorectal cancer (CRC) is one of the most common cancers worldwide; it is the fourth leading cause of death in the world and the third in Brazil. Mutations in the APC, DCC, KRAS and TP53 genes have been associated with the progression of sporadic CRC, occurring at defined pathological stages of the tumor progression and consequently modulating several genes in the corresponding signaling pathways. Therefore, the identification of gene signatures that occur at each stage during the CRC progression is critical and can present an impact on the diagnosis and prognosis of the patient. In this study, our main goal was to determine these signatures, by evaluating the gene expression of paired colorectal adenoma and adenocarcinoma samples to identify novel genetic markers in association to the adenoma-adenocarcinoma stage transition. Methods Ten paired adenoma and adenocarcinoma colorectal samples were subjected to microarray gene expression analysis. In addition, mutations in APC, KRAS and TP53 genes were investigated by DNA sequencing in paired samples of adenoma, adenocarcinoma, normal tissue, and peripheral blood from ten patients. Results Gene expression analysis revealed a signature of 689 differentially expressed genes (DEG) (fold-change> 2, p< 0.05), between the adenoma and adenocarcinoma paired samples analyzed. Gene pathway analysis using the 689 DEG identified important cancer pathways such as remodeling of the extracellular matrix and epithelial-mesenchymal transition. Among these DEG, the ETV4 stood out as one of the most expressed in the adenocarcinoma samples, further confirmed in the adenocarcinoma set of samples from the TCGA database. Subsequent in vitro siRNA assays against ETV4 resulted in the decrease of cell proliferation, colony formation and cell migration in the HT29 and SW480 colorectal cell lines. DNA sequencing analysis revealed KRAS and TP53 gene pathogenic mutations, exclusively in the adenocarcinomas samples. Conclusion Our study identified a set of genes with high potential to be used as biomarkers in CRC, with a special emphasis on the ETV4 gene, which demonstrated involvement in proliferation and migration. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-021-07857-x.

The frequency of transformation of an advanced adenoma into a carcinoma can range from 2.6 to 5.6% depending on the age of the patients [10]. Although this frequency is apparently low, CRC was responsible for 935.173 deaths worldwide and represented about 1.931.590 new cases in 2020 [1]. Despite the high incidence of CRC, and the extensive molecular profiling of these tumors, there are still no available molecular markers that can predict the progression from adenoma to adenocarcinoma.
The goal of this study was to identify a gene signature able to discriminate between adenoma and adenocarcinoma and potentially identify novel CRC biomarkers. To achieve our goal, we performed gene expression profiling in ten paired colorectal adenomas and adenocarcinomas samples. Among the differentially expressed genes (DEG), the ETV4 gene, a variant transcription factor of the ETS family, showed high upregulated expression in the adenocarcinoma samples as compared to adenoma. The overexpression of this gene was further confirmed in the TCGA databases adenocarcinoma samples. Moreover, we observed that the ETV4 knockdown led to decrease in cell proliferation and migration on CRC cell lines, suggesting its potential role in CRC tumorigenesis. Finally, DNA sequencing analysis of the paired samples revealed mutations in the KRAS and TP53 genes in the adenocarcinomas.

Samples
Paired adenoma, adenocarcinoma and adjacent normal tissue samples from 10 patients were obtained during the surgical procedure at the Clinical Hospital of the Faculty of Medicine, University of São Paulo, Ribeirão Preto (HC-FMRP/USP) under the approval of the HC-FMRP/USP Research Ethics Committee (No 12636/ 2010). Initially, during the colonoscopy exam, adenomas were collected from patients who also had a second lesion with a clinical characteristic of adenocarcinoma and surgical indication. Each adenoma specimen was divided into two parts: one part was sent for histopathological analysis and the other part was frozen immediately after collection in liquid nitrogen. Patients who underwent the surgical procedure had the tumor resected including a safety margin. Fragments of the tumor and from the margin were also removed and divided in half; one-half was placed in formalin for histopathological analysis and the other half was immediately frozen in liquid nitrogen, still in the operating room. Only patients who had histopathological confirmation for the simultaneous presence of adenoma, adenocarcinoma and non-tumor tissue, were selected for molecular analysis. During the collections, all the polyps found were only within adenomas.
Additionally, fifteen adenomas and six adenocarcinoma samples were collected to validate the microarray gene expression profiling by RT-qPCR. The fifteen adenoma samples were isolated from seven males and eight females, with an average age and SD of 66.4±8.683 years (range 48-79). The six adenocarcinoma samples were isolated from five males and one female, with an average age and SD of 74.3±8.500 years (range 62-87). Adenoma samples were collected by colonoscopy and adenocarcinoma samples, by surgery. Using the same methodology used in the matched samples, half of each fragment was sent for histopathological analysis and the other half was frozen in liquid nitrogen for subsequent DNA and RNA isolation.
The inclusion criteria for the selection of samples were based on the clinical and histopathological diagnosis and it included: positive diagnosis for colorectal cancer and presence of adenoma and the agreement to participate in the study, by a signed consent form. Exclusion criteria were: previous treatment (chemo or radiotherapy) and history of Familial Adenomatous Polyposis (FAP), Hereditary Non-Polyposis Syndrome (HNPCC) or inflammatory bowel disease. Clinical data from the patients are shown in Table S1, Additional File 1.

DNA and RNA isolation
Total RNA and DNA were isolated from frozen tissue using TRIzol® Reagent (Invitrogen) according to the manufacturer's instructions (DNAse and RNAse were used, respectively). The concentrations were evaluated in a NanoVue Plus® spectrophotometer (GE Life Sciences). Specifically for the samples used in the microarrays, the RNA was isolated with RNeasy kit (Qiagen) and its quality assessed by the 2100 Bioanalyzer equipment (Agilent Technologies). Only samples that showed a RIN (RNA Integrity Number) greater than or equal to seven, were considered. The remaining 22 samples, used to validate the microarray results by RT-qPCR, had their RNA integrity assessed on 1.5% agarose gel stained with ethidium bromide.
Mutation screening using high resolution melting (HRM) assay Adenoma and adenocarcinoma samples were subjected to mutational screening using the HRM assay for the entire APC and TP53 gene-coding regions. These regions were amplified with specific sense and antisense primers that flanked each intron/exon, as previously described by Miyoshi et al. (1992) [11] and Bastien et al. (2008) [12] respectively. The HRM analysis was performed in the 7500 Fast Real-Time PCR System (Applied Biosystems), using MeltDoctor HRM Master Mix (Applied Biosystems), according to manufacturer's instructions.

DNA sequencing
DNA fragments amplified by PCR-HRM, showing abnormal melting curves by the HRM assay were subjected to direct sequencing in an automatic capillary sequencing system ABI 3500 XL (Applied Biosystems), using BigDye Terminator kit, following the manufacturer's instructions. The same methodology was used to sequence exon 2 of the KRAS gene, using specific sense and antisense primers that flanked each intron/exon, as previously described at Fassina et al. (2010) [13]. The sequencing results were analyzed in Chromas Lite v2.1 [14].
The normal paired tissue of the mutated samples was also sequenced, to investigate whether the mutations were of somatic or germline origin. The sequences obtained were compared to the reference from the Gen-Bank NM_000038.5, GenBank NM_000546.5 and Gen-Bank NM_004985.4, respectively to APC, TP53 and KRAS genes.
Pathogenicity prediction was performed in Sift [15] and Mutation Taster [16] online tools and only mutations that were predicted as damaging in both tools were classified as pathogenic.

Microarray hybridizations
To investigate differential gene expression between colorectal adenoma and adenocarcinoma, the platform Whole Human Genome Microarray Kit 4x44K v2 (G4112F, Agilent Technologies) was employed. Prior to the hybridizations, 200 ng of total RNA from each sample were used for cDNA synthesis. The arrays slides were washed following the manufacturer's guidelines and then scanned using the GenePix 4000B scanner (Axon Instruments) with the GenePix Pro 6.0 software and the hybridization signal intensity of each array was extracted using the Agilent Feature Extraction software 9.5.3.1. (Agilent Technologies).

Microarray data analysis
To evaluate the data quality, we used the array Quality-Metrics R/Bioconductor package [17][18][19]. Normalization was performed by a three-step approach with the R/Bioconductor limma package methods [17,18,20]. Initially it was applied as a cyclic loss method between technical replicates, quantile between samples of adenoma and adenocarcinoma group and quantile between arrays [21]. Then, the detection of differentially expressed genes (DEG) was also performed by the limma package, applying the Benjamini-Hochberg method for p-value correction [20,22]. To evaluate the expression pattern of DEG, Euclidian distance and complete linkage was performed for genes and samples clustering, and then visualized in a heatmap.

Gene expression validation
The reverse transcription reaction was performed using High Capacity cDNA Reverse Transcription Kit (Applied Biosystems) according to the manufacturer's instructions. After synthesis, the cDNA was diluted 1:5 and then used in quantitative PCR.
TaqMan probes (Applied Biosystems and IDT) were used for RT-qPCR validation of the genes previously selected by microarray. To avoid amplification of genomic DNA (gDNA), no-RT negative control and Taqman probes in exon-exon junction were used. The HPRT1 (4326321E, Applied Biosystems) housekeeping gene was chosen as endogenous control. Primers and probes for gene expression were IL-6 (Hs00174131, Applied Biosystems), IL-8 All reactions were performed in an ABI Prism® 7500 Fast Sequence Detection System (Applied Biosystems). The relative expression for each gene was calculated by the 2 -ΔΔCT method [23].

TCGA data analysis
The Hiseq platform gene expression level 3 RNASeqV2 data from normal and tumor samples from Colon Adenocarcinoma (COADnormal -41 and tumor -285) and Rectal Adenocarcinoma (READ -normal -10 and tumor -94) were downloaded from the database The Cancer Genome Atlas (TCGA), on March 22, 2017, through the TCGAbiolinks R/Bioconductor package [24]. Differential expression analysis was performed using EdgeR R/Bioconductor package [25]. We considered DEG absolute values of log2 fold-change> 1 and p-value adjusted by FDR≤0.05.

Gene pathways
Gene pathway analysis was performed using the 689 DEG in the MetaCore from Clarivate Analytics. For the analysis in MetaCore we used the fold change of each gene to obtain the enriched gene pathways.

Culture and siRNA assay
For functional assays, HT29 and SW480 colorectal carcinoma cell lines were used (cell lines kindly provided by Prof. Eloiza Helena Tajara da Silva, from UNESP -University of São Paulo State). The cell lines were cultivated in RPMI (Roswell Park Memorial Institute) 1640 medium (Gibco) supplemented with 10% fetal bovine serum (FBS) and 0.5% penicillin-streptomycin under controlled temperature and humidity conditions (37°C, 5% CO2, 95% humidity). For siRNA inhibition studies, cells were transfected with ETV4 siRNA (siETV4) or negative control (siCTRL) (Sigma-Aldrich, St. Louis, MO) in a final concentration of 30 nM, using Lipofectamine RNAiMAX reagent (Invitrogen, Carlsbad, CA, USA), according to manufacturer's instructions. After 48 h of transfection, cells were collected for functional assays, which were all performed in triplicates.

Cell proliferation assay
Cell proliferation assay was performed using CFSE (5,6carboxyfluorescein diacetate succinimidyl ester) according to the manufacturer's instructions. Cells were transfected (siCTRL and siETV4) and labeled with CFSE simultaneously. A third group of cells with no treatment was used for the cytometer calibration. After 24 h of labeling, cells were evaluated in a FACSCalibur flow cytometer (Becton Dickinson, Franklin Lakes, NJ, USA), and consecutively analyzed every 24 h for a total of 96 h.

Anchorage-dependent colony formation assay
Colorectal cancer cell lines were transfected and cultured in a density of 500 cells/well. After 12 days of culture, the cells were washed with PBS, fixed with 4% paraformaldehyde, and stained with 0.5% crystal violet. The plates were then photographed on ImageQuant LAS 4000 (GE Healthcare) and colonies were counted.

Transwell migration assay
Cell migration assays were performed on 24-well plates with 8 μm transwell inserts (Greiner Bio One). After 48 h of transfection, 1 × 10 5 cells were seeded on top of the insert in 200 μl of serum-free medium. In the bottom of the well, cells were seeded in 600 μl of 10% FBS medium. After 24 h of migration, cells were fixed, stained with 0.5% crystal violet and the non-migrating cells from the top of the insert were cleaned with cotton swabs. The inserts were then photographed on ImageQuant LAS 4000 (GE Healthcare) and cells were manually counted.

Statistical analysis
Statistical analysis was carried out in the GraphPad Prism 6.0 for Windows (GraphPad Software, San Diego, California, USA) [26]. Mann-Whitney's test was applied for comparisons between two-independent groups. Statistical analysis of the proliferation assay was determined by two-way ANOVA followed by Bonferroni multiple comparisons test. P values ≤ 0.05 were considered significant.

DNA sequencing and gene expression analysis of paired samples of colorectal adenoma and adenocarcinoma
DNA sequencing analysis of the ten adenomaadenocarcinoma sample set, revealed seven mutations in the APC gene, six synonymous and one nonsynonymous mutation, all of them germline polymorphisms. Both TP53 and KRAS genes showed three and two somatic damaging mutations (P152L/R273C/R273H and G12A/G2D), in the ten paired samples, respectively. In addition, two polymorphisms were observed in the TP53 gene (Table S2, Additional File 2). Simultaneous mutations co-occurring in the three genes were observed only in three of the ten adenoma/adenocarcinoma samples.
Gene expression analysis from the ten matched colorectal adenoma-adenocarcinoma identified 689 differentially expressed genes (DEG) between these tissue types: 329 genes were upregulated in adenocarcinomas and 360 upregulated in adenomas (Fig. 1a). Unsupervised hierarchical clustering analysis of the 689 DEG was able to delimitate gene clusters specific for each tissue (Fig. 1b).
Although we observed some non-synonymous mutations in the TP53 and KRAS genes, those genes were not differentially expressed in our microarray analyses.
The 689 DEGs were used to evaluate the enrichment of gene pathways potentially related to the adenomaadenocarcinoma transition. Of the ten pathways with the highest number of genes, the pathways of cell adhesion and remodeling, epithelial-mesenchymal transition and the IGF family pathway (related to CRC) stood out.
In order to select genes with a potential role in the adenoma-adenocarcinoma transition process, we applied two filters: first the top 50 most DEG between colorectal adenoma and adenocarcinoma samples (Table 1) were selected. A second filter selected the top 50 genes DEG across all the ten adenocarcinoma-adenoma paired samples (Table 2).
To confirm the gene expression data obtained from microarray analysis, eight genes were selected for validation by RT-qPCR (SIM2, ESM1, SFRP4, IL8, IL6, OSM, ETV4 and RETNLB) on 25 adenomas and 16 adenocarcinomas (the initial 10 adenoma/adenocarcinoma paired samples and additional 15 adenomas and 6 adenocarcinoma samples). The RT-qPCR results were in agreement with the microarray data. In addition, genes presented a very similar expression pattern between the adenoma-adenocarcinoma pairs in both techniques (Fig. 2a-d), reinforcing the robustness of the data.
To further validate our findings in a different cohort using non-neoplastic tissue, we performed an in silico gene expression analysis between normal colorectal tissue and both colon and rectal adenocarcinoma, using data available at TCGA.
Gene expression analyses were performed separately. First, normal colon tissue and colon adenocarcinoma were compared (Fig. 3a). A second analysis compared the normal rectal tissue with the rectal adenocarcinoma (Fig. 3b). In both analyses, ETV4 was overexpressed in adenocarcinoma samples ( Fig.  2a-b). As this gene was also found to be upregulated in our cohort (Table 1), we proceeded with a functional investigation of its role in colorectal tumorigenesis.

ETV4 acts in the proliferation, colony formation and cellular migration
To investigate the possible role of ETV4 in colorectal carcinoma tumorigenesis, HT29 and SW480 colorectal cancer cell lines were used in functional assays. Transient transfection using siRNA oligos efficiently knocked down 80% of the ETV4 gene expression levels in HT29 cell line and 88% in SW480 cell line, as confirmed by RT-qPCR (Supplementary Figure 1, Additional File 3). Cell proliferation rates were analyzed by CFSE labelling and evaluated by flow cytometry every 24 h during 96 h. The results showed a significant reduction of HT29 cell proliferation rates at 48 h and 72 h after transfection with ETV4-siRNA (Fig. 4a). Accordingly, in SW480 cells depleted for ETV4 expression, cell proliferation was significantly reduced after 48 h (Fig. 4b). Downregulation of ETV4 expression in colorectal cell lines did not affect cell apoptosis or viability, (Supplementary Figure 2A Colony formation assays were also performed by counting the number of colonies formed by sparsely cultured cells after 12 days. ETV4 gene knockdown in HT29 and SW480 cells impaired their ability to grow as colonies in 20 and 50%, respectively ( Fig. 5a-b). We also evaluated the effect of ETV4 gene knockdown on the migratory capacity of colorectal tumor cell lines. The migration assay was performed on transwell membranes 24 h after the ETV4 knockdown in the HT29 and SW480 cell lines. As it can be seen in Fig. 5c-d, depletion of ETV4 expression decreased cell migration in both cell lines. Taken together, these results suggest an important role of ETV4 in diverse tumorigenic processes,

Adenoma and adenocarcinoma colorectal gene signatures
In this study, the analysis of gene expression profiles of adenomas and adenocarcinomas by microarrays and signaling pathways analysis revealed many pathways and cellular processes associated with extracellular matrix remodeling, angiogenesis and epithelial-mesenchymal transition, as well as the IGF (Insulin-like growth factor) signaling pathway, which is known to be directly linked to colorectal cancer (Metacore from Clarivate Analytics) (Table S3, Additional File 5). The strategy of comparing adenoma-adenocarcinoma samples from the same patient reduces sample and tumor heterogeneity, increasing the power of the study to generate a potential gene signature for the adenomaadenocarcinoma transition.
Several studies on differentially expressed genes in CRC are found in the literature. However, the different microarray platforms and statistical methods used in these studies hamper the discovery of reliable biomarkers to be used in clinical practice. To overcome this limitation, some research groups [27][28][29][30] applied metaanalysis approaches, comparing different microarray analysis in samples from normal tissue, adenoma and adenocarcinoma or only between normal tissue and adenocarcinoma. Some of the genes described in these meta-analysis studies were also found in our data and are briefly discussed below. FcGBP, was found to be downregulated in our study and in normal-adenomacarcinoma sequence according to Lee and colleagues [31]. FcGBP was also indicated as a prognosis marker in gallbladder cancer [32]. CLCA1 has been described as a marker of the transition from proliferation to differentiation in CRC [33]. CLCA1 decreased expression was also described in serum and CRC tissues, showing an inverse correlation with CRC metastasis and tumor stage [34]. CLCA1 and ADH1C were shown to be downregulated in familial adenomatous polyposis [35]. SLC4A4 associated with proliferation and migration in colon and breast cancer [36]. COL1A1 was overexpressed in tumor tissues from colorectal adenocarcinomas and its silencing significantly inhibited proliferation, migration and invasion, while cell apoptosis was promoted [37]. ZG16 has been associated with stemness and progression in CRC [38] and its expression has been shown to be sequentially reduced from normal tissue to adenoma and to carcinoma [39]. ETV4 and FABP6 were co-expressed in tumor samples and significantly associated with metastasis in CRC [40]. DEFA6 was shown to be associated with overall survival rate and is an independent prognostic marker for CRC [41]. L1TD1 has been described as a good prognosis marker candidate in CRC, but its elevated expression has also been associated with poor prognosis in other cancer types. These distinctive roles are dependent on its interaction partners. Several co-expression partners of L1TD1 already described in CRC have also been observed in our study, such as SPINK4, RETNLB, CLCA1, FcGBP, HEPACAM2, ITLN1 and, DEFA5 [42]. The common genes observed in our study with previous meta-analysis and other studies reinforces the importance of our findings.

ETV4 functional validation
The ETV4 gene (E1AF/PEA3 -ets variant 4) is a transcription factor member of the ETS oncogene family that comprises a conserved amino acid sequence, the ETS domain, the DNA binding site to the ETS oncogenes [43]. Its elevated expression has been described in several types of cancer, such as breast, ovary, prostate, gastric and colorectal [44][45][46][47][48].
Our assays demonstrated that ETV4 silencing in the HT29 and SW480 CRC cell lines reduced proliferation, colony formation and cell migration. Previous    studies have shown the effects of ETV4 silencing in the reduction of cell proliferation, migration and invasion, in both colon and prostate cancer cell lines, but no data on colony formation has been previously associated with ETV4 in CRCs [49,50]. Hence, our results suggest that ETV4 is important for the growth of CRC cells.
The reduction in proliferation demonstrated by Hollenhorst et al (2011) [51] in prostate cancer cell lines after ETV4 inhibition was found in combination with a decreased MYC gene expression, due to the direct regulation of MYC by ETV4. Our findings are in agreement with the reduced proliferation after ETV4 inhibition however, we identified an increase in MYC gene expression in adenocarcinomas as compared to adenomas.
Previous studies have shown that activation of multiple matrix metalloproteinases plays an important role in tumor invasion by degradation of the extracellular matrix in colorectal cancer [52][53][54][55]. MMP1 was identified as a direct or indirect ETV4 target acting on the CRC progression [56]. In a study carried out in nonsmall cell lung cancer [57] the ETV4-MMP1 axis was associated with a poor prognosis. A similar relationship was observed in breast cancer, but for ETV4 and MMP13 [58].
In our samples, only MMP11 (stromelysin-3) was upregulated in adenocarcinoma compared with paired adenoma samples. This metalloproteinase is described in several types of cancer, acting in the proliferation, migration and invasion control [59][60][61]. In CRC, elevated MMP11 expression was associated with poor prognosis and reduced survival in stage II patients [62]. Serum levels of MMP11 were previously shown to be significantly higher in patients with lymph node metastasis and was also identified as an independent prognostic factor for 5-year mortality in CRC [63]. MMP11 upregulation has also been related to lymph node metastasis in non-small cell lung cancer b Volcano plot representing the differentially expressed genes between 10 normal tissue samples and 94 colon adenocarcinomas samples (FDR= 5.544394e-33 and log2 fold change = 5.13). Red dot: ETV4 gene overexpressed in cancer samples Fig. 4 The knockdown of the ETV4 gene led to decrease of cell proliferation in both CRC cell lines: HT29 (a) and SW480 (b). Cells were simultaneously transfected with siRNAs and stained for CFSE, and then plated in 24 well plates. Flow cytometry analysis of CFSE staining were performed every 24 h during 96 h (**p< 0.05; ****p< 0.0001 Two-way ANOVA followed by Bonferroni correction) and colorectal cancer [64,65]. Accordingly, MMP11 was upregulated in all our cases that also had compromised lymph nodes. To our knowledge, there is no information available about the relationship between ETV4 and MMP11, so one might speculate that both genes may be involved in lymph node metastasis in CRC. Indeed, ETV4 is related to the embryonic development of different organs, but it is also closely linked to carcinogenesis, especially in metastasis development [66].
Several studies indicate the activation of MMPs by ETV4 [58,67,68]. However, it is likely that this activation occurs in dependence of expression and/or functional alterations in other genes involved in the MMP/ ETV4 axis.

Conclusions
In summary, this study identified a set of differentially expressed genes in CRC, including FcGBP, CLCA1, ADH1C, COL1A1, ZG16, which could be strong candidates to be used as biomarkers of colorectal adenomaadenocarcinoma progression. Among those genes, ETV4 was further investigated and was shown to act on proliferation and migration of CRC cell lines, indicating that ETV4 could be a robust adenocarcinoma biomarker and a potential target for gene therapy studies in CRC.