Gene promoter and exon DNA methylation changes in colon cancer development – mRNA expression and tumor mutation alterations

Background DNA mutations occur randomly and sporadically in growth-related genes, mostly on cytosines. Demethylation of cytosines may lead to genetic instability through spontaneous deamination. Our aims were whole genome methylation and targeted mutation analysis of colorectal cancer (CRC)-related genes and mRNA expression analysis of TP53 pathway genes. Methods Long interspersed nuclear element-1 (LINE-1) BS-PCR followed by pyrosequencing was performed to estimate global DNA methylation levels along the colorectal normal-adenoma-carcinoma sequence. Methyl capture sequencing was done on 6 normal adjacent (NAT), 15 adenomatous (AD) and 9 CRC tissues. Overall quantitative methylation analysis, selection of top hyper/hypomethylated genes, and methylation analysis of mutation regions and TP53 pathway gene promoters were performed. Mutations of 12 CRC-related genes (APC, BRAF, CTNNB1, EGFR, FBXW7, KRAS, NRAS, MSH6, PIK3CA, SMAD2, SMAD4, TP53) were evaluated. mRNA expression of TP53 pathway genes was also analyzed. Results According to the LINE-1 methylation results, overall hypomethylation was observed along the normal-adenoma-carcinoma sequence. Within the top50 differentially methylated regions (DMRs), in the AD-N comparison the TP73, NGFR and PDGFRA genes were hypermethylated, while the FMN1 and SLC16A7 genes were hypomethylated. In the CRC-N comparison the DKK2, SDC2 and SOX1 genes showed hypermethylation, while the ERBB4, CREB5 and CNTN1 genes were hypomethylated. In certain mutation hot spot regions significant DNA methylation alterations were detected. The TP53 gene body was hypermethylated in adenomas. APC, TP53 and KRAS mutations were found in 30, 15, 21% of adenomas, and in 29, 53, 29% of CRCs, respectively. mRNA expression changes were observed in several TP53 pathway genes showing promoter methylation alterations. Conclusions DNA methylation with consecutive phenotypic effect can be observed in a high number of promoter and gene body regions through CRC development.
Electronic supplementary material The online version of this article (10.1186/s12885-018-4609-x) contains supplementary material, which is available to authorized users.


Background
Colorectal cancer (CRC) is a clinically important malignant disease due to its high incidence and mortality. According to the GLOBOCAN estimates, with 1.4 million new cases and 694,000 deaths annually, CRC is the third most common cancer in the world, after lung and breast cancers [1].
The majority of sporadic CRCs develop according to the normal-adenoma-dysplasia-carcinoma sequence described by Fearon and Vogelstein [2]. The accumulation of genetic and epigenetic alterations in colonic epithelium leads to CRC through early and late precancerous adenoma stages in which promoter DNA methylation changes of certain tumor suppressor genes with consecutive mRNA expression changes are one of the earliest events, often prior to the appearance of mutations in well-known genes such as the adenomatosis polyposis coli gene (APC) [3].
Recently, comprehensive molecular characterization of several human cancers including CRC has been performed and the data integrated into The Cancer Genome Atlas (TCGA) database (https://cancergenome.nih.gov/). Integrative evaluation of genetic, epigenetic and gene expression data of hundreds of CRC and paired normal adjacent tissue (NAT) samples revealed that in addition to the known mutations, epigenetic changes (especially DNA methylation) also play a key role in establishing CRC subtypes with different prognostic and therapeutic phenotypes [4]. The majority (84%) of CRCs were found to be non-hypermutated. Non-hypermutated cancers with distinct colonic or rectal location could be distinguished according to copy-number alteration, DNA methylation or gene expression profiles [4].
BeadChip 27K and 450K arrays and RRBS offer opportunities for analysis of DNA methylation at single nucleotide resolution, mainly within CpG islands; however, the recently developed Epic BeadChip arrays, besides examining CpG island methylation, also allow more extensive study of CpG sites outside of CpG islands. WGBS provides the most comprehensive whole methylome analysis at single nucleotide resolution, but it is not commonly used due to its high cost. MethylCap-seq is an alternative genome-wide methylation analysis technique to identify novel differentially methylated regions (DMRs) [17,18]. It gives extensive information about both promoter and gene body methylation, though at lower resolution [18]. Unlike BeadChip arrays, it is suitable for investigation of mutation hot spot regions within the gene body. It is known that mutations can cause altered DNA methylation and that DNA methylation changes can also lead to the development of mutations [19,20]. The mutation rate is higher at methylated CpG sites than at non-methylated ones [21,22]. The change of 5-methylcytosine to thymine via spontaneous deamination [23,24] (which is less effectively repaired by the DNA repair machinery than the cytosine to uracil deamination reaction [22,23]) can cause the increased mutability of cytosines within CpG sites.
The aim of this study was to analyze genome-wide tissue DNA methylation differences along the colorectal normal-adenoma-carcinoma sequence progression, including gene body methylation changes using MethylCap-seq. The second aim was to search for a potential relation between DNA methylation and mutation alterations for 12 CRC-associated genes. The possible effects of the genetic and epigenetic changes on neoplastic phenotype at transcriptome level were also examined.

Methods
Estimation of global methylation levels using long interspersed nuclear element-1 (LINE-1) bisulfite sequencing
After DNA isolation from 5 colorectal adenoma, 5 CRC and 10 normal (N) colonic biopsy samples, bisulfite conversion of DNA samples was performed using the EZ DNA Methylation-Direct Kit (Zymo Research). For quantification of methylation levels of the LINE-1 retrotransposable element, bisulfite-specific PCR (BS-PCR) was done and 146 bp long LINE-1 PCR products were sequenced on the Pyromark Q24 system (Qiagen) using the Qiagen Q24 CpG LINE-1 Kit (Qiagen) according to the manufacturer's instructions.

MethylCap-seq
Global DNA methylation alterations were determined using MethylCap-seq data of 30 colonic tissue samples (15 AD, 9 CRC, 6 NAT) published previously by our research group [7]. In the previous study [7], only the DNA methylation changes of 160 WNT pathway genes and promoters were evaluated, while in this study whole methylome analysis was performed.
After informed consent of untreated patients, colonic biopsy samples were taken during routine endoscopic intervention. Using parallel formalin-fixed samples from the same site, histological diagnoses were established by experienced pathologists. Tissue samples from untreated CRC patients were also obtained from surgically removed colon or rectal tumors and from NAT that originated from the area farthest available from the tumor. The detailed patient specification has been described earlier [7]. The study was conducted according to the Helsinki declaration and approved by the local ethics committee and government authorities ( Genomic DNA was isolated using the High Pure PCR Template Preparation Kit (Roche Applied Science) according to the manufacturer's instructions [16]. The capture of methylated DNA fragments and next generation sequencing were performed as previously described [7]. Briefly, after fragmentation of 3 μg genomic DNA samples, the DNA fragments with methylated CpGs were selected using the Auto MethylCap kit (Diagenode). Purification of the methylated DNA fraction was carried out on QIAquick PCR purification columns (Qiagen). Library preparation was performed using the TruSeq ChIP Sample Preparation kit (Illumina) and clusters were generated using the TruSeq SR Cluster Kit v3-cBot-HS (Illumina). Next generation sequencing of the methylated DNA fragments was performed on the HiScanSQ instrument using TruSeq SBS v3-HS reagents (Illumina) according to the manufacturer's instructions. Bowtie2 software with default settings was used to map the 100 bp paired and 50 bp unpaired reads to the hg19 human genome reference assembly [25]. The aligned data were processed using the MEDIPS Bioconductor R package [26]. Methylation probabilities (β-values hereafter) were calculated for 100 bp long analysis windows (differentially methylated regions = DMRs), with respect to genome-wide CpG density dependent Poisson distributions.
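The 100 bp analysis windows mentioned above can be illustrated with a minimal Python sketch that assigns mapped read start positions to fixed-size windows and counts them. This is only an illustration of the windowing idea, not the MEDIPS procedure: the CpG density dependent Poisson model that turns counts into β-values is not reproduced here, and the function names and read coordinates are hypothetical.

```python
from collections import Counter

WINDOW_SIZE = 100  # bp, matching the analysis window length stated in the text


def window_index(position, window_size=WINDOW_SIZE):
    """Return the 0-based index of the window containing a 0-based position."""
    return position // window_size


def count_reads_per_window(read_starts, window_size=WINDOW_SIZE):
    """Count reads per (chromosome, window index) from (chrom, start) pairs."""
    counts = Counter()
    for chrom, start in read_starts:
        counts[(chrom, window_index(start, window_size))] += 1
    return counts


# Hypothetical mapped read start positions (0-based), for illustration only.
reads = [
    ("chr17", 7_577_010),
    ("chr17", 7_577_080),
    ("chr17", 7_577_120),
    ("chr5", 112_175_250),
]
counts = count_reads_per_window(reads)
```

In this toy input the first two chr17 reads fall into the same window and the third into the next one; a real pipeline would additionally normalize such counts for local CpG density before deriving methylation probabilities.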
Promoter DNA methylation and mRNA expression analysis of TP53 signaling pathway genes
The list of the TP53 pathway genes (in total 67 gene symbols) was constructed according to the KEGG pathway database. Promoters were defined as described earlier using Encode ChromHMM results [7]. Promoter DNA methylation was determined using methyl capture results of 30 colonic biopsy samples at a resolution of 100 base pair analysis windows, and DMRs were identified between the diagnostic groups. In silico mRNA expression analysis for TP53 signaling pathway genes was performed using microarray data from colonic tissue samples (Affymetrix HGU133Plus2.0; GEO accession numbers: GSE37364 [28], GSE18105 [29], GSE4107 [30], GSE9348 [31], GSE22242 [32], GSE8671 [33]).

Statistical analysis
For MethylCap-seq DNA methylation data analysis, differences between diagnostic groups (9 CRC samples versus 6 NAT samples, 15 AD samples versus 6 NAT samples) were characterized by Δβ-values (the differences of the average β-values of each sample group). The top50 candidate DMRs were selected according to the highest absolute Δβ-values. For estimation of global methylation levels using LINE-1 bisulfite sequencing, average methylation percentages of 3 analyzed CpG sites were calculated. For gene expression logFC calculations, the averages of the sample groups were compared. During statistical evaluation of DNA methylation and gene expression data, Student's t-test and False Discovery Rate (FDR) correction were applied for paired comparisons of diagnostic groups, as the Kolmogorov-Smirnov test indicated a normal distribution and the standard deviations of the data were similar. Variance analysis was performed using the non-parametric Kruskal-Wallis test. A p-value of < 0.05 was considered significant.
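The Δβ calculation and the top50 selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: all β-values and window names are hypothetical, and because the text does not name the FDR procedure, the Benjamini-Hochberg adjustment shown here is an assumption.

```python
from statistics import mean


def delta_beta(case_betas, control_betas):
    """Difference of the group averages of beta-values (case minus control)."""
    return mean(case_betas) - mean(control_betas)


def top_dmrs(delta_by_window, n=50):
    """Return the n windows with the largest absolute delta-beta values."""
    ranked = sorted(delta_by_window.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:n]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical beta-values for one 100 bp window (illustration only).
crc_betas = [0.90, 0.80, 0.85]
nat_betas = [0.10, 0.20]
example_delta = delta_beta(crc_betas, nat_betas)

# Hypothetical delta-beta values for three windows.
example_top = top_dmrs({"w1": 0.7, "w2": -0.8, "w3": 0.1}, n=2)

# Hypothetical raw p-values.
example_fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Ranking by absolute value keeps strongly hypomethylated windows (negative Δβ) alongside strongly hypermethylated ones, which matches the selection criterion stated in the text.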

Results
Global DNA methylation alterations of the colorectal normal-adenoma-carcinoma sequence
Genome-wide decreases in DNA methylation were already observed in samples from the adenoma stage of colorectal carcinogenesis. Based on the LINE-1 bisulfite sequencing results, significant global DNA hypomethylation was detected both in CRC (63 ± 6.7%; p = 0.0302) and adenoma samples (65 ± 3.8%; p = 0.0093) compared to normal tissue (73 ± 1.4%). Variance analysis also revealed a significantly lower DNA methylation level both in CRC and adenoma than in normal samples (Kruskal-Wallis test: p < 0.00104) (Fig. 1a). MethylCap-seq results showed that decreased DNA methylation appeared principally in the 40-60% and 80-100% methylation percentage categories in adenoma and CRC samples compared to NAT controls (Fig. 1b).

Top DMRs in CRC and adenoma samples identified by MethylCap-seq
In CRC samples, known CRC-associated genes including heparan sulfate-glucosamine 3-sulfotransferase 2 (HS3ST2), dickkopf WNT signaling pathway inhibitor 2 (DKK2), tissue factor pathway inhibitor 2 (TFPI2) and syndecan 2 (SDC2) occurred among the top50 significantly hypermethylated 100 base pair regions (p < 0.001), showing elevated promoter DNA methylation levels located within CpG islands. Δβ-values representing methylation differences between CRC and NAT samples ranged from 0.68 to 0.81. More than one third of the top50 hypermethylated DMRs align with weak (9%) or active (9%) promoters according to the Encode ChromHMM data. The majority of the top50 DMRs that were significantly hypomethylated in CRC compared to NAT samples (p < 0.001) could not be assigned to genes or gene promoters and were located in intergenic regions. Similar to the hypermethylated DMRs, large differences were found for hypomethylated DMRs, with Δβ-values between − 0.74 and − 0.65 (Additional file 1: Table S1A, B).
In the AD versus NAT comparison, 94% of the top50 highly methylated DMRs were found in CpG islands including generally known CRC-associated DNA methylation markers like Fli-1 proto-oncogene, ETS transcription factor (FLI1), GATA binding protein 4 (GATA4) and nerve growth factor receptor (NGFR). The top50 significant (p < 0.0001) methylation alterations appeared to be more intensive in adenomas compared to NAT samples (Δβ-values were between 0.86 and 0.79). Considering Encode ChromHMM data, 38% of top50 hypermethylated DMRs were found to be located in promoter regions, and 26% can function as active promoters. Similar to the results in CRC versus NAT comparison, almost all of the top50 DMRs showing significantly decreased DNA methylation in AD could not be annotated (p < 0.0001) with stronger methylation differences than found in CRC versus NAT (Δβ-values between − 0.90 and − 0.74) (Additional file 1: Table S1C, D).

DNA methylation alterations and expression of CRC-associated, frequently mutated genes
The mutation frequencies of a panel consisting of 12 CRC-associated genes in CRC and AD samples were measured in our previous multiplex PCR-based CRC mutation hot-spot sequencing study [27]. DNA methylation alterations were also detected in the mutation hot-spot regions of the 12 analyzed, frequently mutated CRC-associated genes, including TP53, APC, KRAS, BRAF and FBXW7. DNA methylation changes in 100 base pair long analysis windows located in mutation hot-spot regions of TP53, APC, KRAS, BRAF and FBXW7 can be seen in Fig. 2. Evaluation of promoter methylation patterns of the 12 frequently mutated genes revealed several significant alterations, including hypermethylation of the APC promoter in CRC and AD tissue specimens (p < 0.05; Δβ = 0.27-0.39) (Fig. 3a), hypermethylation of the TP53 promoter in AD (p < 0.001; Δβ = 0.40) and hypomethylation of CTNNB1 (p < 0.05; Δβ between − 0.30 and − 0.45) (Fig. 3a) and SMAD2 (p = 0.024; Δβ = − 0.28) in CRC compared to NAT samples. The SMAD4 promoter region was found to be hypomethylated both in AD and CRC biopsy samples (p < 0.05; Δβ between − 0.25 and − 0.32). mRNA expression profiles of the 12 analyzed CRC-associated genes revealed that APC and CTNNB1 could be regulated by DNA methylation during colorectal carcinogenesis, as they showed an inverse relation between promoter DNA methylation and mRNA expression (Fig. 3b).
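The "inverse relation" criterion can be made concrete with a short Python sketch that flags genes whose promoter Δβ and expression logFC have opposite signs. This is an illustrative simplification with hypothetical numbers and names; it ignores significance thresholds and is not the analysis behind Fig. 3.

```python
def is_inverse(delta_beta, log_fc):
    """True when promoter methylation and expression change in opposite directions."""
    return delta_beta * log_fc < 0


def inverse_genes(records):
    """Return gene names from (gene, delta_beta, log_fc) records with an inverse relation."""
    return [gene for gene, d_beta, log_fc in records if is_inverse(d_beta, log_fc)]


# Hypothetical (gene, promoter delta-beta, expression logFC) records, illustration only.
example_records = [
    ("GENE_A", 0.30, -0.8),   # hypermethylated promoter, reduced expression
    ("GENE_B", -0.35, 0.6),   # hypomethylated promoter, increased expression
    ("GENE_C", 0.25, 0.4),    # both increased: no inverse relation
]
example_inverse = inverse_genes(example_records)
```

A product below zero means the two changes point in opposite directions, which is the pattern expected when promoter methylation represses transcription.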

DNA methylation of TP53 signaling pathway gene promoters: relation with gene expression results
The TP53 pathway genes selected according to the KEGG pathway database were represented by 67 gene symbols. Promoters were defined as described earlier using Encode ChromHMM results [7]. In the CRC versus NAT comparison, 26.9% of TP53 pathway genes (18 of 67 genes) showed significant DNA methylation alterations in their promoter regions with at least a 10% methylation difference (p < 0.05, Δβ ≥ 0.1) (Table 1). In CRC samples, hypermethylated DMRs were found in the promoter regions of 11 genes such as caspase 8 (CASP8), cyclin dependent kinase inhibitor 1A and 2A (CDKN1A and CDKN2A), insulin-like growth factor binding protein 3 (IGFBP3), sestrin 2 (SESN2) and tumor protein p73 (TP73), while seven TP53 pathway genes including G2 and S-phase expressed 1 (GTSE1) showed hypomethylation in their promoters. The box plots of the significantly hyper- and hypomethylated DMRs in TP53 pathway gene promoters showing the highest DNA methylation differences between CRC and NAT samples are shown in Fig. 4, and the box plots of all DMRs fulfilling the criteria can be seen in Additional file 2: Figure S1.
By applying the same criteria, significant promoter DNA methylation changes were observed in 37.3% of TP53 pathway genes (25/67) in AD compared to NAT samples (p < 0.05, Δβ ≥ 0.1) (Table 1). Fifteen TP53 pathway genes showed elevated promoter methylation in AD samples including CDKN2A, IGFBP3 and TP73, while hypomethylation was detected in the promoter regions of 10 genes such as GTSE1, damage specific DNA binding protein 2 (DDB2) and cyclin dependent kinase 1 (CDK1).
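The selection criteria and the reported proportions can be checked with a small Python sketch. It assumes that the Δβ ≥ 0.1 threshold applies to the absolute value of Δβ, since hypomethylated promoters are also counted; the function names are hypothetical.

```python
def classify_promoter(p_value, delta_beta, alpha=0.05, min_abs_delta=0.1):
    """Classify a promoter DMR as 'hyper', 'hypo' or 'unchanged'.

    Assumes the delta-beta cutoff is applied to the absolute value.
    """
    if p_value >= alpha or abs(delta_beta) < min_abs_delta:
        return "unchanged"
    return "hyper" if delta_beta > 0 else "hypo"


def percent_altered(n_altered, n_total):
    """Percentage of altered genes, rounded to one decimal place."""
    return round(100 * n_altered / n_total, 1)


# Gene counts reported in the text: 11 hyper + 7 hypo in CRC, 15 hyper + 10 hypo in AD.
crc_percent = percent_altered(11 + 7, 67)
ad_percent = percent_altered(15 + 10, 67)
```

With the counts given in the text, the hyper- and hypomethylated genes add up to 18 and 25 of 67, reproducing the stated 26.9% and 37.3%.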

Discussion
The accumulation of DNA methylation alterations accompanied by genetic changes such as mutations and deletions is known to contribute to the pathogenesis of various cancer types including CRC [3,4]. Comprehensive DNA methylation changes found in precancerous adenoma stages can serve as early detection markers [7,8,11,34]. In this study, global DNA methylation alterations were analyzed along the colorectal normal-adenoma-carcinoma sequence, and top differentially methylated genes/regions were identified using genome-wide MethylCap-seq analysis. The second aim of the study was to determine whether there is a potential correlation between DNA methylation and mutational alterations for 12 CRC-associated genes. Furthermore, the possible effects of the genetic and epigenetic changes on TP53 signaling pathway genes at the transcriptome level were also examined.
Global hypomethylation was detected by LINE-1 bisulfite sequencing in CRC samples compared to normal tissue, in line with previous data [35][36][37]. Although to a lesser extent, global DNA hypomethylation could be detected as early as the AD stage. LINE-1 bisulfite sequencing was used for overall hypomethylation analysis because of its advantage over MethylCap-seq, which predominantly targets genomic regions with a high density of methylated CpGs [14].
In this study, we identified 22 novel AD- and/or CRC-associated hypermethylated DMRs (approximately one fourth of top50 hypermethylated DMRs) which could be assigned to genes with previously undescribed methylation changes in cancers including CRC. These markers are principally involved in transcription regulation (e.g. BHLHE23, CUX2, HLX, MAFB, MKX, NKX1-1, and GSC2), transport processes (e.g. SLC24A2, GLRA3, LRRC38, SNAP91), and intracellular signaling (e.g. RGS20, GNAL, NRG3). Among the hypermethylated transcription factors, the expression of H2.0 like homeobox (HLX) was found to be reduced in moderately differentiated CRCs [38]. Moreover, HLX is also considered a tumor suppressor in hepatocellular carcinoma [39]. The platelet derived growth factor receptor alpha (PDGFRA) was observed to be hypermethylated in AD compared to normal controls in our study. It was found to be overexpressed in CRC, but, in accordance with the promoter hypermethylation detected in our MethylCap-seq study, it was down-regulated in adenomatous polyps [40]. Nevertheless, one fifth of hypermethylated and the majority of hypomethylated DMRs could not be associated with known genes, both in CRC versus NAT and AD versus NAT comparisons. The identified significant top50 methylation changes could be observed in a high proportion (> 80%) of the specimens within a sample group compared to the mutational alterations analyzed in this study. On the basis of the methylation levels of the top50 hypomethylated and hypermethylated markers determined in this study, including the newly identified DMRs, the clear separation of CRC and NAT samples was also apparent for an independent sample set (Additional file 2: Figure S2). Furthermore, a partially overlapping set of samples also showed consistent DNA methylation profiles when analyzed by MethylCap-seq and EpiTect Methyl qPCR methods (Additional file 2: Figure S3).
Farkas et al. evaluated DNA methylation changes of genes frequently mutated in CRC using BeadChip450K technology, including 11 of the 12 genes analyzed in our study [49] and reported hypomethylation in CTNNB1 and SMAD2 promoters in CRC compared to NAT samples. Decreased promoter DNA methylation levels of these genes were also observed in our MethylCap-seq analysis together with methylation alterations of other genes such as SMAD4 and TP53 promoters during colorectal carcinogenesis.
In the current project, DNA methylation alterations were also detected in the mutation hot-spot regions of the 12 analyzed, frequently mutated CRC-associated genes including TP53, APC, KRAS, BRAF, and FBXW7. In accordance with the observation that C-to-T transitions at CpG sites are the most prevalent mutations in the TP53 gene in colon tumors [63], a high mutation rate and methylation changes at mutation hot spot regions of this gene could be detected in our study. DNA methylation can cause mutations in tumor suppressor genes such as TP53, as mutations occur 10-40 times more frequently at methylated cytosines than at unmethylated cytosines [19,20]. The conversion of 5-methylcytosine to thymine via spontaneous deamination [23,24] or by the APOBEC/AID system [64] can lead to a high mutational burden of 5-methylcytosine. 5-methylcytosine can also be involved in increased mutability through other mechanisms. According to a recent report, an elevated C to G transversion rate in cancer genomes can be associated with 5-hydroxymethylcytosines derived from the oxidation of 5-methylcytosine catalyzed by TET proteins [65].
Hypomethylation was also detected in addition to the elevated methylation levels on certain mutation hot-spots. This is only seemingly contradictory to previous data indicating that the mutation rate is higher on methylated CpG sites than on unmethylated ones [21], as relative hypomethylation (from high level to intermediate level) and not the absolute loss of DNA methylation was observed on certain mutation hot-spots in our study. This is consistent with the results of a recent work describing that among the methylated CpG sites, the rate of mutations (or SNP density) was found to be increased on less methylated CpG sites (20-60%) as compared to high-intermediately and highly methylated CpGs (60-80%; > 80%) [21,66]. Cancer-associated overall hypomethylation of the genome, including heterochromatic DNA repeats, retrotransposons, and endogenous retroviral elements, also contributes to genome instability [20].
In our analysis, DMRs could be identified on all chromosomes with the relatively largest number of aligned sequence reads on chromosome 17, similar to the MethylCap-seq study performed by Simmer et al. [14]. Next, DNA methylation alterations of TP53 (encoded on chr 17) signaling pathway genes were also investigated. TP53 pathway deregulation frequently occurs through the mutation or deletion of TP53 itself [67]. Apart from mutations of the TP53 gene, this pathway is rarely hit by any other mutations/polymorphisms [68][69][70]. Other mechanisms, such as epigenetic regulation including DNA methylation changes of TP53 pathway genes, also contribute to attenuating the pathway and participate in cancer development [67], and TP53 itself is also thought to regulate cancer-associated genes showing altered methylation patterns [71]. Accordingly, our MethylCap-seq analysis revealed significant promoter DNA methylation changes in approximately one third of TP53 signaling pathway genes in CRC. Moreover, an even greater proportion of TP53 pathway gene promoters (around 40%) showed altered DNA methylation in AD samples compared to NAT controls. The alterations of the identified TP53 pathway genes with inverse promoter DNA methylation and mRNA expression differences (Table 2) were found to be associated with tumorigenesis in different cancer types including CRC [72][73][74][75][76][77][78][79][80][81][82][83][84][85][86][87][88]. Among these markers, in addition to the down-regulation of the well-known p16 (CDKN2A) [75,76] and p21 (CDKN1A) [74] cyclin dependent kinase inhibitors, BCL2 associated X, apoptosis regulator (BAX) [72], SESN2 [85][86][87], IGFBP3 [84] and cytochrome c, somatic (CYCS) [77] are also thought to exert tumor suppressor functions. Diminished or lost CYCS protein expression in AD and CRC tissue was found to be correlated with apoptosis resistance [77].
The damage specific DNA binding protein DDB2, which was described to suppress tumorigenicity in ovarian cancer [78] and to reduce CRC invasiveness [79], showed promoter hypomethylation and overexpression in AD samples in our study, suggesting its contribution to the inhibition of uncontrolled expansion in the adenoma stage.

Conclusions
Using genome-wide DNA methylation analysis, we identified novel aberrant methylation profiles of genes including HLX, CUX2, MKX, NRG3 and PDGFRA associated with the colorectal adenoma-carcinoma sequence progression. In addition to the genetic changes, DNA methylation alterations were also shown in the mutation hot-spot regions of 12 analyzed, CRC-associated, frequently mutated genes including TP53, APC, KRAS, BRAF, and FBXW7. Global hypomethylation, which might be linked to genetic instability, could be detected as early as the adenoma stage.
Our study also revealed that promoter DNA methylation changes influence the mRNA expression level in the case of a significant part of the TP53 pathway genes. Thus epigenetic alterations can also contribute to a whole pathway-related effect on DNA repair and apoptosis in addition to single gene (e.g. TP53) mutations.
In summary, the methyl-capture sequencing technique yielded reproducible, clinically relevant results on the whole genome level which are related to cancer phenotype development through mRNA expression changes and to the cancer genotype through the link of mutation formation.

Additional files
Additional file 1:
Additional file 2: Figure S1. Box plots of significant DMRs in TP53 pathway gene promoters between CRC and NAT samples. Figure S2. DNA methylation pattern of top50 hypermethylated DMRs on an independent set of samples.

Availability of data and materials
Additional data and materials may be obtained from the corresponding author on reasonable request.