Establishing transcriptional regulation in RSCC and LSCC
Differential analysis identified 2495 DEGs in LSCC relative to left normal colon tissue and 2589 DEGs in RSCC relative to right normal colon tissue (Fig. 1a). Of these, 957 upregulated and 975 downregulated DEGs are commonly regulated in both RSCC and LSCC, while 655 and 561 DEGs are “uniquely” regulated in RSCC and LSCC, respectively (Fig. 1b and c).
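The partition into commonly and uniquely regulated DEGs can be sketched with plain set operations. The following is a minimal illustration and not the pipeline used for this analysis; the function name `partition_degs`, the direction labels and the toy genes GENE_A to GENE_E are hypothetical. Genes that are significant in both tumor types but change in opposite directions are kept in a separate bin.

```python
def partition_degs(rscc, lscc):
    """Split two {gene: 'up' | 'down'} DEG maps into common, opposing and unique sets."""
    shared = set(rscc) & set(lscc)
    common_up = {g for g in shared if rscc[g] == "up" and lscc[g] == "up"}
    common_down = {g for g in shared if rscc[g] == "down" and lscc[g] == "down"}
    return {
        "common_up": common_up,
        "common_down": common_down,
        # significant in both tumor types, but with opposite direction
        "opposing": shared - common_up - common_down,
        "unique_rscc": set(rscc) - shared,
        "unique_lscc": set(lscc) - shared,
    }


# Toy input (hypothetical genes and directions, for illustration only)
rscc_toy = {"GENE_A": "up", "GENE_B": "down", "GENE_C": "up", "GENE_D": "down"}
lscc_toy = {"GENE_A": "up", "GENE_B": "down", "GENE_C": "down", "GENE_E": "up"}
parts = partition_degs(rscc_toy, lscc_toy)

# Arithmetic check on the totals reported in the text
common = 957 + 975                     # commonly up + commonly down
residual_rscc = 2589 - common - 655    # RSCC DEGs that are neither common nor unique
residual_lscc = 2495 - common - 561    # same for LSCC
```

Both residuals evaluate to 2. One possible reading, inferred here from the reported counts rather than stated by the analysis, is that two genes are differentially expressed in both tumor types but in opposite directions.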
A common program of tumorigenesis exists between right and left sided colon tumors
Malignant tumor cells are highly plastic and are characterized by alterations in metabolism, adhesion, proliferation and migration, which require the coordinated activity of several signaling pathways and mechanisms. Our results indicate that the large set of “commonly” regulated DEGs shared by RSCC and LSCC fits broadly into these categories and comprises genes that are frequently dysregulated in colon cancers (Fig. 2a). For instance, dysregulation of WNT/β-catenin pathway genes affecting the proliferative potential of CRC is evident in both tumor types, with the upregulation of several WNT pathway genes including AXIN2, WNT2, WNT3, WNT7B, DKK1/4, NKD1/4, TCF7, MYC and NOTUM. NOTUM, a glypican-dependent WNT inhibitor, serves as a negative feedback regulator of WNT activation [25] and is associated with the progression of CRC [26]. We identified NOTUM to be significantly associated with overall survival (OS) in patients with both RSCC (hazard ratio (HR) 0.44, 95% CI 0.24–0.82, logrank p < 0.01) and LSCC (HR 3.23, 95% CI 1.27–8.2, p < 0.01). The association therefore runs in opposite directions in the two tumor types, indicating that a higher expression favors LSCC while lower expression favors RSCC (Fig. 2b). Other frequently dysregulated genes, including APC and GSK3B, were also commonly dysregulated (albeit below our fold-change threshold).
Cellular metabolism is tightly linked with the growth and proliferation of tumors. Loss of AMP-activated protein kinase (AMPK1) activity can drive reprogramming of cellular metabolism and is at the center of the network regulating cell growth and proliferation (via TP53). AMPK1-driven metabolic dis-homeostasis is evident from the observed dysregulation of melanoma antigens (MAGEA2/A3/A6) [27] in both tumor types. The acidic and hypoxic tumor microenvironment also influences survival and proliferative potential. Carbonic anhydrases (CAs), metalloenzymes that catalyze the reversible hydration of CO2, have been identified as crucial mediators of tumor pH [28]. Notably, several cytosolic CAs (such as CA1/2/4) and water channels (AQP8) are suppressed in both tumor types, suggesting a reduced availability of the universal buffer HCO3−. It has previously been postulated that extracellular CAs such as CA9 (upregulated in both tumor types) act to raise the extracellular pH, favoring tumor cell growth, proliferation and survival [29]. Several other markers, including FDA-approved CRC biomarkers such as CA125 (MUC16) and CEA (carcinoembryonic antigen, CEACAM1/7), are upregulated in both RSCC and LSCC.
Interestingly, several genes whose precise molecular interactions are yet to be completely understood are among the most highly expressed genes in both proximal and distal tumors, including OTOP2 (controlled by wild-type TP53) [30], OTOP3, PYY and PPIAL4. These genes, and many others with limited supporting evidence, might serve as interesting candidates for future research in colon cancers (Supplementary Table S7). Additionally, several genes discussed herein are regulated across all stages, further highlighting their role in the evolution and maintenance of tumors over time in a side-independent manner (Fig. 2c).
Right-sided colon tumors exhibit altered lipid, bile and xenobiotic metabolism
The liver is largely considered the major organ for biotransformation (chemical detoxification and metabolism). However, there is increasing acknowledgement of extra-hepatic biotransformation (especially in the gastrointestinal (GI) tract) and its association with GI carcinogenesis. Several families of enzymes are associated with various stages of the breakdown of carcinogens within the human body, including the cytochrome P450 (CYP), glutathione S-transferase (GSTA1) and UDP-glucuronosyltransferase (UGT) superfamilies [31]. Notably, we identify these gene families to be enriched among gene sets suppressed in RSCC (Fig. 3a). In particular, we identify a suppression of enzymes from the CYP2C and CYP4F families (CYP2C8, CYP2C18 and CYP4F12, Fig. 3b). These results are interesting in light of a recent study [32] that reported the contrasting result of an upregulation of the CYP2C family of enzymes in animal models of CRC. The CYP2C pathway enzymes convert arachidonic acid (AA) into active epoxyeicosatrienoic acids (EETs), while the CYP4F family converts AA to hydroxyl EETs; both classes of compounds have been suggested to promote carcinogenesis in certain contexts. UGT proteins catalyze the glucuronidation reaction, allowing for the utilization and/or detoxification of necessary chemicals [33]. We identify suppression of several UGT1A isoforms in tumor compared to normal tissue, particularly the extra-hepatic isoforms UGT1A10, UGT1A7 and UGT1A8.
Proximal tumors predominantly exhibit dysregulation of the hepatic UGT1A isoforms UGT1A3 and UGT1A9. Several solute carrier transporters, particularly those associated with drug transport (SLC25A42, SLC44A4, SLC46A1) and ascorbic acid transport (SLC23A1/A3), are also suppressed in RSCC. Stage-specific analysis revealed a unique dysregulation of members of the CYP2C and UGT1A families within proximal tumors, specifically at early stages (T1-T2) (Fig. 3c).
Cancer cells preferentially metabolize glucose through aerobic glycolysis rather than mitochondrial oxidative phosphorylation (OXPHOS), a state characterized by increased glycolysis and lactate production. Our results suggest a more pronounced shift in metabolism in proximal tumors than in distal tumors. The selective upregulation of SLC2A1 (GLUT1), a pivotal rate-limiting element in the transport and uptake of glucose, combined with the unique downregulation of several mitochondrial metabolic markers involved in fatty acid degradation and oxidative phosphorylation (including G6PC, FABP1, CPT1A, CPT2, ACAT1, ACAA2, ACOX1, EPHX2 and EHHADH), further supports a more pronounced shift in metabolism away from OXPHOS within primary proximal tumors, consistent with their more aggressive state [34].
HOXB13 and SLC6A4 show opposing regulation trends in RSCC and LSCC
Two genes appear to exhibit opposing regulation trends in RSCC and LSCC: HOXB13 and SLC6A4 (Fig. 4). Nearly 95% of the body’s serotonin (5-hydroxytryptamine; 5-HT), a neurotransmitter, is generated within the intestine by enterochromaffin cells, in a reaction catalyzed by tryptophan hydroxylase (TPH1/2). Global loss-of-function studies for TPH1 have indicated an almost complete loss of intestinal 5-HT synthesis, implying that the observed suppression of TPH1 in both RSCC and LSCC reflects a curbed extracellular production of 5-HT [35]. Similarly, suppression of 5-HT receptors (e.g. HTR3E, HTR4) and of intracellular enzymes required for the breakdown of 5-HT (e.g. MAOA, MAOB) indicates a decreased bioavailability of 5-HT in both LSCC and RSCC. In light of this, it is reasonable to observe a suppression of SLC6A4 (a ligand gated serotonin-selective reuptake transporter (SERT)), which is required for the transport of 5-HT, as is the case in RSCC. Although no evidence exists in the literature for a differential role of SLC6A4 in proximal or distal tumors in humans, we speculate that the observed upregulation of SERT expression within distal tumors (indicative of its increased activity) suggests alternate roles for SLC6A4 and/or alternate mechanisms controlling its expression within LSCC.
On the other hand, HOXB13 is an acknowledged oncogene. Studies focused on tumor location have identified suppression of HOXB13 within distal tumors [36] and upregulation within proximal tumors [37], consistent with our current analysis. Interestingly, however, we also observe an upregulation of PRAC1 and PRAC2 (C17orf93), two genes genomically adjacent to HOXB13, within proximal tumors.
Suppressed immune signaling predominates in left-sided colon tumorigenesis
Chemokines are expressed by various cell types, constitutively or under inflammatory conditions. Remarkably, distal tumors exhibit an enrichment of suppressed chemokine signaling, particularly of markers of B-cells and TFH cells, which are important immune infiltrates in colon cancer (Fig. 5). The role of B-cells and their supporting cell types in immunosurveillance is complex and dichotomous. On one hand, animal model studies suggest that they participate in proliferation and metastasis by promoting chronic inflammation and suppressing antitumor responses [38]; on the other hand, they are reported to promote long-term survival, with increased intratumoral densities of tumor-infiltrating immune cells suppressing tumorigenesis [39]. MS4A1 (CD20, a tumor-infiltrating B-cell marker) and BACH2 (a well-known transcriptional regulator of B and TFH cells), two genes previously implicated in contributing to immune landscape differences between RSCC and LSCC [39], are more predominantly suppressed within distal tumors. Particularly interesting is the predominant downregulation of two chemokine signaling axes within distal tumors (compared to normal), namely the homeostatic chemokines CXCL13/CXCR5 (TFH cell markers) and CCL19/CCL21-CCR7 (migration and activation of immune cell types). Several of these markers, including MS4A1, CXCR5 and CXCL13, are also suppressed across all stages (Stages 1–4) with respect to normal, further emphasizing their role in sustaining tumor behavior within LSCC (Fig. 5b and c).
To better understand the observed suppression of chemokine markers in a larger framework of regulation within distal tumors, we extracted clusters from a distal-specific protein interaction network (see Methods, Fig. 5d). We detected a large cluster of chemokines co-expressed with several G-protein-coupled receptor (GPCR) signaling and cAMP signaling proteins, including GRM8 (a cell surface marker in CRC), GNG2/4/7, EDN2/3 and ADCY2/5/9 (adenylate cyclase). Two downregulated receptors involved in Ca2+ homeostasis, LPAR1 (lysophosphatidic acid receptor) and CASR (Ca2+ sensing receptor), were also detected within this cluster [40, 41]. Taken together, these results lead us to speculate on a nexus between altered Ca2+ signaling mediated by GPCRs, specifically chemokine receptors, and its subsequent impact on the inflammatory signatures within distal tumors (LSCC). Notably, 8/78 genes within this cluster, namely LPAR1, GNG4, GNG7, PMCH, GPR18, EDNRA, GPER1 and EDN3 (hypergeometric p < 0.07), are sufficient to distinguish distal tumors from normal distal colon, as identified by a recursive SVM classifier (see Methods).
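For readers unfamiliar with the hypergeometric test quoted above, the upper-tail probability can be written out directly from binomial coefficients. This is a minimal sketch only: the function name `hypergeom_sf` and the toy numbers are hypothetical, and because the background and selection sizes behind the reported p-value are not restated here, the example does not attempt to reproduce it.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which belong to the category of interest."""
    upper = min(K, n)
    total = comb(N, n)
    # comb() returns 0 for infeasible terms, so no lower bound check is needed
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy numbers (hypothetical): 10 genes in total, 4 in the cluster,
# 3 selected by a classifier, at least 2 of the selected genes in the cluster.
p_toy = hypergeom_sf(k=2, N=10, K=4, n=3)
```

With real data, N would be the background gene set, K the cluster size (78 here), n the number of classifier-selected genes and k the overlap (8 here); which background was used is an assumption the reader would need to take from the Methods.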
RSCC exhibits pronounced post-transcriptional regulation
Small non-translated RNAs called miRNAs, together with several RNA-binding proteins (RBPs), form an important class of molecules involved in post-transcriptional regulation (PTR). We used two levels of -omics data to analyze differences in PTR between LSCC and RSCC.
Side-specific control of tumorigenesis by miRNAs
Side-specific differential analysis of 1046 micro-RNAs (miRs) identified 325 differentially regulated miRs in RSCC and 200 miRs in LSCC, compared to their respective normal tissue (see Methods, Supplementary Table S8). A large majority of the dysregulated miRs (198) change in both RSCC and LSCC. Several of the top commonly upregulated miRs are oncogenic miRs, including miR-135b, miR-577, miR-19a and miR-592, with roles in tumor initiation, proliferation/progression and migration [42, 43]. Likewise, several miRs suppressed within both tumor types, including miR-328 [44] and miR-486 [45], have previously been implicated in the inhibition of tumor progression in CRC.
Increasing evidence, however, suggests malleable roles for miRs, which have multiple targets and amplify their inhibitory or stimulatory effects on gene regulation through positive or negative feedback loops in conjunction with other miRs. We established functionally relevant, side-specific miRNA-mRNA clusters (see Methods) to identify the influence of the differentially regulated miRs on gene expression. Analysis of clusters within RSCC revealed miRs regulating genes in interconnected pathways of cellular metabolism, cell growth and proliferation (Fig. 6a). For instance, the uniquely upregulated miR-23a correlates with several mitochondrial proteins including G6PC and PPARGC1. Several miRs, particularly miR-181d and miR-576, correlate with cell cycle genes including BCL2 and CCND1. BCL2, a major regulator of mitochondrial apoptosis, has been consistently shown to be downregulated in colon (and cancer) [46]. Control of BCL2 expression via miR-24-2 (strongly upregulated in both proximal and distal tumors) has been previously reported in human embryonic kidney and breast cancer cell lines [47]. Interestingly, several uniquely regulated miRs correlate significantly with (hypermethylated) TWIST1, a primal transcription factor uniquely upregulated within proximal tumors [48], whose activation has been implicated in reverting cells to a non-lineage-specific proliferative state.
Only two miRs, however, are uniquely regulated within distal tumors: miR-3607 and miR-29a (Fig. 6b). Interestingly, members of the miR-29 family of oncomiRs (miR-29c and miR-29a) appeared to correlate with ECM and clock genes within distal tumors, including the downregulated PER1 (a negative regulator of circadian rhythm) [49].
Particularly interesting are clusters conserved within both RSCC and LSCC (Fig. 6c and d). For example, miR-22 and miR-34a, two commonly regulated miRs in CRC, appear to cluster together. These miRs are known to impinge on processes of metabolism, angiogenesis, proliferation, migration, invasion, apoptosis and epithelial-to-mesenchymal transition (EMT), a primary transformation for metastatic and invasive tumor cells. miR-34a (a tumor suppressor induced by p53 and involved in EMT in CRC) [50, 51] correlates with several commonly regulated genes involved in signal transduction and EMT via the WNT and AKT signaling pathways, including MAGEA3, GFRA3, EPHA5, ANK3 and TCF7. The uniquely upregulated INHBB, which correlates with miR-34a expression in proximal tumors (Fig. 6c), was also identified to be significantly associated with OS in RSCC (HR 0.34, 95% CI 0.18–0.65, logrank p < 0.001).
Differences in alternative splicing events mediated by RNA-binding proteins in LSCC and RSCC
Alternative splicing (AS) is a PTR mechanism during which mRNA is actively rearranged, accounting for the observed protein repertoire of complex organisms [17]. Utilizing Percent Splice-In (PSI) values from TCGASpliceSeq (see Methods), we identify 115 significant AS (sigAS) events among DEGs in RSCC and 101 sigAS events among DEGs in LSCC. Exon skipping (ES) and the usage of alternate promoters (AP) and alternate terminators (AT) were detected to be the predominant and potent mechanisms of AS contributing to the etiology of colon cancers (Fig. 7a, Supplementary Table S10). Notably, our results indicate that a large proportion of the sigAS events (n = 64) occur in genes commonly dysregulated in both LSCC and RSCC, making alternative splicing a major PTR mechanism within colon cancers. These genes include AXIN2 (ES, exon 7) and MXI1 (AP, exon 3), both associated with the WNT pathway, and others such as IGF2 (AP, exon 5), CXCL12 (AT, exon 5.2), CCL24 (AP, exon 1) and S100A2 (AP, exon 3). SULT1A2 (retained intron (RI), 1.2:1.3) and CALD1 (ES, 8.3:9) exhibit the highest Δmedian PSI values between healthy and tumor tissues in both left and right tumors. SULT1A2 (suppressed ~3 log2 fc in both right and left tumors) is a sulfotransferase liver enzyme involved in the detoxification of a variety of endogenous and xenobiotic compounds [52], while CALD1 is a novel target of TEA domain family member 4 involved in cell proliferation and migration (Fig. 7b and c). Missplicing of both these transcripts has previously been detected as an event correlated with the etiology of disease [53, 54]. Particularly interesting are sigAS events that occur in a side-specific manner within genes uniquely regulated in either RSCC or LSCC; for instance, CYP4F12 (Δmedian PSI = −0.19), UGT1A1 (Δmedian PSI = 0.12) and SRI (Δmedian PSI = −0.64) all exhibit significant AS within right tumors.
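The Δmedian PSI quoted for each event can be illustrated with a short calculation. This is a sketch under stated assumptions: the sign convention (tumor minus normal), the handling of missing PSI values as `None`, the function name `delta_median_psi` and the toy PSI values are all hypothetical and are not taken from the Methods.

```python
from statistics import median


def delta_median_psi(tumor_psi, normal_psi):
    """Median PSI in tumor samples minus median PSI in normal samples.

    PSI values are fractions in [0, 1]; missing values (None) are skipped.
    The tumor-minus-normal sign convention is an assumption of this sketch.
    """
    tumor = [v for v in tumor_psi if v is not None]
    normal = [v for v in normal_psi if v is not None]
    if not tumor or not normal:
        raise ValueError("each group needs at least one non-missing PSI value")
    return median(tumor) - median(normal)


# Toy PSI values for a single splice event (hypothetical)
tumor_toy = [0.20, 0.25, None, 0.30, 0.22]
normal_toy = [0.80, 0.85, 0.90, None, 0.88]
delta = delta_median_psi(tumor_toy, normal_toy)
```

Under this convention, a negative value means the exon or region is included less often in tumors than in normal tissue for that event.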
We identify a total of 76 and 66 RBPs to be differentially regulated in RSCC and LSCC, respectively. A large proportion (47 RBPs) are commonly regulated in both RSCC and LSCC, with several enriched for binding among DEGs (adj p < 0.05, see Methods). These include RBPs previously discussed in the context of CRC, such as MSI2, MEX3A, IGF2BP1/3 and ELAVL4 [55, 56], and of cancers in general, such as RBM47, DKC1, CELF4 and ELAVL3 (Fig. 7e, Supplementary Table S11). Downregulation of RBM47 is involved in increased cell migration and invasion, and is indicated to promote EMT and metastasis within CRCs [57]. Notably, RBM47 is also significantly differentially spliced within both distal and proximal tumors compared to normal tissues (AP, exon 2). However, a significant anti-correlation between its expression and PSI values is observed only within distal tumors (Fig. 7d), implying the possibility of feedback mechanisms controlling RBM47 within distal tumors.
Additionally, we identify significant correlation between the expression of the splicing-associated RBPs CELF4, RBM20, NOVA1 and PPARGC1A and sigAS events in both proximal and distal tumors (see Methods). AFF2, however, is uniquely associated with sigAS events within distal tumors. The resulting correlation network indicated that more than 50% of sigAS events (66/101 in left and 58/115 in right tumors) are correlated with these specific RBPs (adj p < 0.05, Supplementary Figure 3), highlighting a possibly crucial role for them in the observed (many-to-one) regulation of transcripts within colon cancer.
Differences in marker methylation and its association with gene expression, in RSCC and LSCC
The development and progression of colorectal cancer are understood to involve several genetic and epigenetic changes. Changes in DNA methylation are one of the major epigenetic mechanisms controlling CRC [9]. Differential methylation analysis identified a larger proportion of hypermethylated CpG sites (DMPs) in proximal/RSCC samples, while distal/LSCC samples exhibited a larger proportion of hypomethylated sites, each compared to their respective controls (see Methods, Fig. 8a). Interestingly, the genomic distribution indicated that the highest number of hypermethylated DMPs lies within CpG Islands, while hypomethylation occurs in Open Seas (Fig. 8b). Previously published methylation markers including SEPT9, VIM, GATA4, INA, MAL, WNT (WNT2/2B/3/6/5A/7A, APC2) and CNRIP1 are hypermethylated in both RSCC and LSCC [58, 59], further establishing them as side-agnostic methylation markers of CRC.
We were additionally interested in identifying the impact of differential methylation on DEGs and, to this end, extracted significant probe-gene pairs (both anti-correlated and correlated) from both RSCC and LSCC. We find that, within distal tumors, 33% of the downregulated genes are significantly anti-correlated with at least one hypermethylated probe and 27% with hypomethylated probes. On the other hand, we found a higher fraction of genes being controlled by differential methylation in proximal tumors (~40% of the downregulated/hypermethylated genes and ~20% of the upregulated/hypomethylated genes), indicative of a role for increased hypermethylation in suppressing expression within RSCC, consistent with prior research. Interestingly, however, the hypermethylation and expression states of several commonly regulated DEGs (such as OTOP2/3, CA1/2/4 and NOTUM) are more obvious in LSCC than in RSCC (Supplementary Table S9). Notably, we identified a significant enrichment of gene-probe pairs that exhibited positive correlation (overexpression and hypermethylation in tumors). For instance, WNT5A/2/3/7B all exhibit significant correlation between expression and methylated DMPs in LSCC.
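The notion of anti-correlated versus correlated probe-gene pairs can be made concrete with a small sketch. It is an illustration only: it uses a plain Pearson coefficient with a fixed cut-off in place of the significance testing described in the Methods, and the names `pearson_r` and `classify_pair`, the threshold of 0.5 and the toy values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def classify_pair(beta, expr, threshold=0.5):
    """Label a probe-gene pair from methylation beta values and expression.

    The fixed threshold is a stand-in for a proper significance test.
    """
    r = pearson_r(beta, expr)
    if r <= -threshold:
        return "anti-correlated"
    if r >= threshold:
        return "correlated"
    return "uncorrelated"


# Toy data across five samples (hypothetical values)
beta_toy = [0.10, 0.20, 0.60, 0.80, 0.90]      # probe methylation
expr_silenced = [9.0, 8.5, 5.0, 3.2, 2.9]      # expression falls as beta rises
expr_induced = [2.0, 2.5, 5.5, 7.0, 7.4]       # expression rises with beta

label_silenced = classify_pair(beta_toy, expr_silenced)
label_induced = classify_pair(beta_toy, expr_induced)
```

In this toy case the first pair is labeled anti-correlated (the classical hypermethylation and suppression pattern) and the second correlated (the overexpression with hypermethylation pattern noted above).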
Changes in the methylation state of a region can be due to the gain or loss of site-specific transcription factors (TFs) [60]. We employed ELMER to gain insight into the motifs and TFs that may be involved in setting tumor-specific DNA methylation patterns within LSCC and RSCC (see Methods). In both tumor types, we identify the FOSL1 binding motif to be highly ranked for hypomethylated/upregulated loci, indicating a possible gain of FOSL1 (significantly up ~3 fc in both) in a side-independent manner. Likewise, downregulated/hypermethylated loci are enriched for the SP1/2/3 binding motifs. TFs including ISX (suppressed ~2.3 fc in both) contain these binding motifs, suggesting that the observed loss of these site-specific TFs might dictate de novo hypermethylation and suppression of their downstream targets in a side-independent manner within colon cancers.