Molecular pathways undergoing dramatic transcriptomic changes during tumor development in the human colon

Background The malignant transformation of precancerous colorectal lesions involves progressive alterations at both the molecular and morphologic levels, the latter consisting of increases in size and in the degree of cellular atypia. Analyzing preinvasive tumors of different sizes can therefore shed light on the sequence of these alterations. Methods We used a molecular pathway-based approach to analyze transcriptomic profiles of 59 colorectal tumors representing early and late preinvasive stages and the invasive stage of tumorigenesis. Random set analysis was used to identify biological pathways enriched for genes differentially regulated in tumors (compared with 59 samples of normal mucosa). Results Of the 880 canonical pathways we investigated, 112 displayed significant tumor-related upregulation or downregulation at one or more stages of tumorigenesis. This allowed us to distinguish between pathways whose dysregulation is probably necessary throughout tumorigenesis and those whose involvement specifically drives progression from one stage to the next. We were also able to pinpoint specific changes within each gene set that seem to play key roles at each transition. The early preinvasive stage was characterized by cell-cycle checkpoint activation triggered by DNA replication stress and dramatic downregulation of basic transmembrane signaling processes that maintain epithelial/stromal homeostasis in the normal mucosa. In late preinvasive lesions, there was also downregulation of signal transduction pathways (e.g., those mediated by G proteins and nuclear hormone receptors) involved in cell differentiation and upregulation of pathways governing nuclear envelope dynamics and the G2>M transition in the cell cycle. The main features of the invasive stage were activation of the G1>S transition in the cell cycle, upregulated expression of tumor-promoting microenvironmental factors, and profound dysregulation of metabolic pathways (e.g., increased aerobic glycolysis, downregulation of pathways that metabolize drugs and xenobiotics). Conclusions Our analysis revealed specific pathways whose dysregulation might play a role in each transition of the transformation process. This is the first study in which such an approach has been used to gain further insights into colorectal tumorigenesis. Therefore, these data provide a launchpad for further exploration of the molecular characterization of colorectal tumorigenesis using systems biology approaches.


Background
Colon carcinogenesis is a multistep process involving the gradual accumulation of genetic and epigenetic alterations. These changes promote the malignant transformation of precancerous lesions of the colorectal mucosa [1], a process reflected by progressively severe cellular dysplasia and increases in lesion size. At least two-thirds of all colorectal cancers develop from precancerous lesions with adenomatous features [2]. The "serrated" histotype characterized by cells arranged in a saw-toothed pattern [1] is somewhat less common, but in both cases, size is an important indicator of the distance the lesion has travelled on the road toward malignancy. For this reason, post-polypectomy surveillance guidelines vary depending in part on the size of the polyps removed. In fact, individuals with 3 or more adenomas on initial colonoscopy, including 1 or more measuring ≥10 mm, are significantly more likely to present with new lesions at the next colonoscopy [3].
Analysis of precancerous colorectal lesions of different sizes can thus furnish important information on the steps involved in their malignant transformation. During colonoscopy, benign lesions of all sizes are routinely removed to prevent their progression toward cancer, and this provides a valuable source of tissues for molecular studies. Efforts of this type have already identified several genetic and epigenetic changes that seem to occur at the transition from normal mucosa to precancerous lesions. Mutations involving the APC or CTNNB1 gene, for example, are considered early events that fuel epithelial-cell proliferation [4,5]. Gain-of-function mutations in the oncogenes KRAS and BRAF are also frequent findings in early stages of transformation [6]. Additional alterations (genetic and epigenetic) are believed to be necessary for subsequent steps toward invasiveness, such as those identified with recent genome-wide analyses [7,8].
The transcriptomes of colorectal cancers have been intensively investigated with high-throughput, arraybased tools, which furnish quantitative, genome-wide descriptions of the individual gene expression levels associated with different cell phenotypes (e.g., adenoma cells vs. normal epithelial cells) [9][10][11][12]. More recently, other methods of analyzing gene expression data have been developed to gain additional insight into the mechanisms driving the phenotypic differences. One such approach involves the analysis not of single genes but of predefined functional gene sets, that is, groups of genes that are known components of a defined molecular pathway representing a given biological process. The basic aim here is to identify those gene sets (i.e., pathways) that display enrichment for-or overrepresentation of-genes whose expression is substantially altered in the phenotype being investigated.
We have explored several methods for quantitatively analyzing transcriptomic data for pathway enrichment [13][14][15], including gene set enrichment analysis (GSEA) [16], random-set methods (RS) [17], and gene list analysis with prediction accuracy (a method developed by our group) [15]. Although these methods differ substantially from one another, all three are statistically accurate and identify relevant gene sets, and none consistently outperforms the others [14].
Our experience indicates that pathway-based analysis of gene expression data furnishes highly reproducible results that can be useful for dissecting a complex, polygenic disease like colorectal cancer. For instance, we recently used GSEA and RS analysis to identify pathway enrichment in four independent transcriptional data sets representing colorectal cancer and normal mucosa. The results of these analyses displayed substantial overlap: both of the analytical methods used revealed similar dysregulation of 53 pathways in each of the four data sets. These pathways are very likely to play important roles in the pathology of colorectal cancer [13].
In the present study, we used RS analysis to explore a large body of previously collected transcriptomic data on human colorectal tissues, including normal mucosa, preinvasive lesions of various sizes, and colorectal cancers (CRCs). Our aim was to identify biological processes that become dysregulated during the course of colorectal tumorigenesis. Because the preinvasive stages have been far less extensively explored than the cancerous phases of this process, there were no independent sets of transcriptomic data on precancerous lesions that we could use to validate our findings. To overcome this limitation, we used two strategies. First, we re-analyzed all the original data sets with GSEA and compared the results with those obtained with RS. Second, we performed RS analysis of two publicly available sets of data on CRCs and normal colorectal mucosa.

Methods
All data were analyzed in MatLab (MathWorks, Natick, MA) unless otherwise stated.

Data set
The data set analyzed in this study consisted of the transcriptome profiles of a series of 118 human colorectal tissues (details below) analyzed with the GeneChip Human Exon 1.0 ST array (Affymetrix, Santa Clara, CA, USA). Raw microarray data are available in GEO (GSE21962 [18]) and ArrayExpress (E-MTAB-829).
In brief, arrays were analyzed in the Affymetrix Gene-Chip Scanner 3000 7 G. Cell intensities were measured with Affymetrix GeneChip Operating Software, and Affymetrix Expression Console Software was used for quality assessment: probe expression intensity in each tissue sample was subjected to background adjustment and normalization with the Robust Multi-array Analysis algorithm.
The tissues themselves had been prospectively collected during colonoscopy (precancerous lesions) or surgery (cancers). They consisted of 59 tumor specimens, each accompanied by a sample of normal mucosa collected in the same colon segment >2 cm from the lesion. The fragment used for microarray analysis (~20 mg of epithelial tissue) was cut from each specimen immediately after removal, leaving the underlying muscularis mucosae intact, and the remaining tissue was submitted for pathologic analysis. (We used only lesions measuring >1 cm to ensure that our sampling procedure would not interfere with the histologic diagnosis.) All tumors were sporadic lesions with a functional DNA mismatch repair system. As expected, LPLs were more likely to exhibit villous changes (43.5% vs. 36.8% of the SPLs) and highgrade dysplasia (34.8% vs. 10.5% of the SPLs).
For the purposes of the present study, we divided the gene expression data into four subsets representing successive stages of colorectal tumorigenesis: 19 small preinvasive lesions (SPLs) measuring 11-20 mm in diameter, 23 large preinvasive lesions (LPLs) with diameters > 20 mm, and 17 CRCs (Table 1). A fourth set was created with data for all 59 normal mucosal (N) samples. The 20-mm cutoff for SPLs was chosen in part to obtain two similarly sized subgroups (for statistical purposes) and in part because our previous observations [18] suggested such subgroups are likely to present biological differences. All of the preinvasive lesions were adenomas except five, which exhibited serrated histology. These five lesions were included since they did not behave as outliers in Principal Component Analysis (PCA), and their exclusion did not significant alter the data reported in this study.
The study was carried out according to the principles of the Declaration of Helsinki and was approved by the Ethics Committees of the Italian hospitals where the tissues were collected (Istituti Ospitalieri, Cremona, and Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy). Each subject investigated provided written informed consent to collection and analysis of data and publication of the findings.

Gene sets
Our analyses focused on 880 functional gene sets from the CP-C2 collection in the Molecular Signatures Database (MSigDB), version 3.0 [16]. These canonical representations of biological pathways or processes have been compiled by domain experts and curated from several online databases (BioCarta, Gene Arrays, BioScience Corp, KEGG, Reactome, Sigma-Aldrich Pathways, Signal Transduction Knowledge Environment, Signaling Gateway).

Statistical methods
The RS method was used to identify tumor-associated pathway enrichment. In brief, a pathway-level statistic is used to average differential-expression evidence across all genes (e.g., gene-level scores) in a given pathway (gene set C containing n distinct genes). The enrichment of pathway C for differentially expressed genes is then measured by comparing C with other hypothetical gene sets made up of the same number (n) of genes randomly selected from the array. RS analysis can be used with a variety of gene-level scores. In this case, we used the rank of two-sample t-test values of genes in the array [13,14]. The mean and variance of the RS score distribution can be analytically derived, and the induced distribution is approximately Gaussian. This offers an easily computed standardized statistic for measuring pathway enrichment. The RS method has several practical advantages, including high computation efficiency [14], an extremely important feature when large numbers of experiments have to be performed.
For each gene set considered in our analysis, the distribution of the component gene expression levels in the N data subset was independently compared with that of each of the stage-specific tumor subsets (i.e., N vs. SPL, N vs. LPL, and N vs. CRC). In each case, the difference was calculated to quantify tumor-related upregulation or downregulation of the pathway (reflected by positive and negative RS scores, respectively) at that stage of tumorigenesis.
The statistical significance of the RS enrichment score was assessed with non-parametric permutation tests [19]. For this purpose, we computed the nominal pvalue of each score by comparing the actual score with the empirical probability density function under the null hypothesis (no genotype-phenotype association) derived using 1000 permutations of the phenotypic labels (0=N, 1=tumor, i.e., SPL, LPL, or CRC lesions). A p-value cutoff of 0.05 was used to define significant pathway enrichment.
Expression data for genes in the Biocarta cell cycle pathway were also subjected to hierarchical clustering analysis and PCA to confirm the relevance of our results. For the former analysis, a Euclidean distance metric and inner squared distance linkage were used to generate hierarchical trees. We analyzed three multidimensional data sets, each representing normal mucosa and a given stage of tumor, and clustered heat maps were shown. PCA was applied to the entire multidimensional data set representing normal mucosa and tumors of all stages. Each tissue sample was then projected onto the first two principal components to create a 2-dimensional map of the data set.
The validation procedure involved the use of standard GSEA [16], and p-values for the enrichment scores were computed on the basis of 1000 label permutations.

Results and discussion
As shown in Tables 2 and 3, a total of 64 pathways were found to be significantly upregulated (n=23) or downregulated (n=41) in SPLs; 50 were upregulated (n=21) or downregulated (n=29) in LPLs; and 58 were upregulated (n=33) or downregulated (n=25) in the CRCs. The approach we used allows in-depth exploration of each instance of pathway dysregulation to characterize its evolution across the transformation process. Because   this process is progressive, it was not surprising to find significant dysregulation of certain pathways in 2 or even 3 of the tumor stage-specific data sets, but other alterations were more circumscribed (Figure 1). For example, the BIOCARTA CELL CYCLE PATHWAY ( Table 2, row 5) is one of the 23 gene sets that displayed significant upregulation only in the CRCs. This gene set comprises 22 genes (32 RefSeqs) encoding cyclins, cyclindependent kinases (CDK), cyclin-dependent kinase inhibitors (CDKI), and transcription factors, including E2F1, whose activation governs the G1-to-S phase transition of the cell cycle. The tumor suppressor RB1 (retinoblastoma protein) negatively regulates cell cycling by complexing with E2F1, and this effect is reversed by the phosphorylation of RB1 by cyclin D/CDK4, cyclin D/ CDK6, and cyclin E/CDK2, which releases E2F1 from this complex and allows cell cycling to resume. For this reason, specific inhibitors of the cyclin/CDK complexes, such as p15 (CDKN2B), p16 (CDKN2A), p21 (CDKN1A), and p27 (CDKN1B), are also considered tumor suppressors. Dysregulation of this network stemming (for example) from the overexpression of certain cyclins, CDKs, or E2F1 itself, or from the downregulation of certain CDKIs, can lead to uncontrolled cell growth, which favors tumor formation and progression [20][21][22][23][24]. Figure 2 (panels A, B, C) shows heat maps of the expression of the 22 genes included in the Biocarta cell cycle pathway at each stage of tumorigenesis (compared with normal mucosa). Each of the three tumor + N data sets was subjected to hierarchical clustering analysis using the 22 cell cycle-associated genes. As shown in Figure 2A, this analysis identified two clusters within the N vs. SPL data set, which showed no relation to the actual tissue labels (see column labels in Figure 2A). In the N vs. LPL data set ( Figure 2B), the two tissue-type groups were more readily distinguished (only 6 LPL samples were misclassified), and in the N vs. CRC set, the two classes of tissues were separated with only three errors. Collectively, these findings point to progressive dysregulation of the cell cycle pathway, which becomes overt in the invasive stage of tumorigenesis, as highlighted by our RS analysis. Major involvement of this pathway at the CRC stage also emerged when the gene expression profiles were subjected to PCA ( Figure 2D).
As shown in Figures 3A and 3B, certain cell cycle genes were already overexpressed in SPLs and LPLs, including those encoding CCND1, CCND2, and CCNE1, CDKs 2, 4, 6, and 7, and the oncogenes CDC25A and TFDP1. These changes were associated with downregulated transcription of the genes encoding the CDKI p15 (CDKN2B) and p21 (CDKN1A), an expected finding for preinvasive lesions with high proliferation rates. In contrast, CDKI p27 (CDKN1B) expression was upregulated in LPLs, but not CRCs ( Figure 3C), a finding that is consistent with previously reported immunostaining profiles of adenomatous and cancerous colorectal tissues [25]. Interestingly, the tumor suppressor RB1 was also upregulated across all stages of tumorigenesis (Figure 3), whereas, in previous studies, this alteration has been documented only in the malignant phases [26][27][28]. The most convincing explanation proposed for the upregulated expression of RB1 and p27 envisions these factors as possible mediators of a homeostatic mechanism that protects cells from the putatively toxic effects of excessive cyclin, CDK, or E2F1 activity [25,28].
One of the most dramatic changes that characterized the transition to CRC ( Figure 3C) was an increase in the expression of E2F1, the master regulator of the cell cycle pathway. This alteration is well known in colorectal carcinomas [22,29], and it seems to be associated with higher tumor stages and poorer prognoses in these cancers [30] and those of other organs as well [31][32][33]. Two other important cell cycle genes, those encoding the tumor suppressors p16 (CDKN2A) and the RB homolog p107 (RBL1), were also upregulated in CRCs. The expression of p16 can be silenced during tumorigenesis by gene promoter methylation, but this phenomenon is largely confined to colorectal cancers with the hypermethylator phenotype and DNA mismatch repair defects, which account for < 20% of all colorectal cancers [34][35][36]. We have found p16 overexpression iñ 80% of the colorectal cancers we have studied over the years (unpublished data). Like the p27 and RB1 upregulation mentioned above (or that of RBL1, which exerts inhibitory effects on E2F1-mediated trans-activation),   p16 upregulation might represent a negative feedback mechanism aimed at preventing the G1-to-S transition (although E2F1 can readily overcome a p16-mediated G1 block) [37]. It is interesting to note that the trends shown in Figure 3, which are based on our analysis of transcript levels, are-on the whole-consistent with published data on the corresponding gene products. Closer inspection of Tables 2 and 3 shows that the pathways exhibiting tumor-related downregulation were generally larger (in terms of the total number of RefSeqs they contained) than those that were upregulated in tumor tissues (mean numbers of RefSeqs in the gene sets: 69 vs. 27.9, respectively; p-value of one-tailed ttest = 2.4 · 10 -4 ). This finding might be related to the fact that tumor-associated downregulation was often seen in highly conserved pathways that govern normal mucosa homeostasis (e.g., cell differentiation programs). Pathways of this type have been extensively studied since the early days of molecular biology, and a relatively large number of their gene components have been identified. Consequently, the gene sets representing these pathways are likely to be larger than those of more specialized pathways, which have probably been less thoroughly explored. Nonetheless, it is also possible that fundamental pathways and networks are effectively larger as a result of relatively high-level component redundancy, a feature that would increase their robustness and versatility and ensure essential cellular functions in normal tissues under a variety of conditions.
Because the preinvasive stages of colorectal tumorigenesis analyzed in our study have been far less extensively explored than the cancerous phases, there were no independent transcriptomic data sets for precancerous lesions to use to validate our results. To overcome this limitation, we used two different approaches.
First, we re-analyzed our three data sets (N vs. SPL, N vs. LPL, and N vs. CRC) with GSEA [16], in a manner similar to that used in previous studies by our group [13]. Table 4 shows the numbers of pathways displaying significant tumor-associated enrichment in the RS and GSEA analyses. In all cases, a high percentage of the pathways found to be significantly up-or downregulated in tumors (compared with normal mucosa)  . The number of enriched pathways identified by GSEA was always substantially higher than that obtained with RS analysis. This finding reflects the fact that in GSEA the nominal p-value of a pathway enrichment score is computed via an empirical phenotype-based permutation test procedure [16]. RS analysis uses a more stringent selection process in which the actual enrichment score of each pathway is compared with the scores obtained by the permutation of labels-an approach similar to that used in GSEA-and with the scores for sets composed of randomly selected genes [17]. Second, we validated the findings regarding CRCs by performing RS analysis of two publicly available, independent transcriptomic data sets. The first (V-set I) had been generated by Affymetrix HGU133A GeneChip analysis of 47 samples of human colorectal tissues (22 of normal mucosa, 25 CRCs) and is accessible through the ArrayExpress site (E-MTAB-57). The second (V-set II) was obtained with GeneChip Human Exon 1.0 ST array analysis of 20 paired CRC-normal mucosa samples [38]. The results of these validation analyses are shown in Table 5. The vast majority of pathways exhibiting CRCrelated upregulation in the original N vs. CRC data set were also significantly upregulated in V-set I (73%, pvalue = 1.1x10 -16 , Fisher's exact test) and V-set II (82%, p-value = 3.3x10 -16 , Fisher's exact test). Lower but still excellent degrees of overlap were also observed for the pathways found to be downregulated in CRCs compared with normal mucosa. Figure 4 summarizes the most relevant tumor-related pathway dysregulations at different stages of transformation. Due to space constraints, only the upregulated pathways (Table 2) are discussed below; those that were downregulated (Table 3) are considered in detail in Additional file 1.
Our data suggest that the early preinvasive phase of colorectal tumorigenesis is characterized on the whole by upregulated activity of pathways involved in DNA replication and repair (i.e., KEGG BASE EXCISION REPAIR, KEGG HOMOLOGOUS RECOMBIN-ATION, REACTOME ACTIVATION OF THE PRE-REPLICATIVE COMPLEX). These findings are consistent with recent reports [39,40] showing that the progression of early precancerous lesions (in the colon and elsewhere) is curbed by cell cycle checkpoints that are activated by DNA replication "stress." The precise nature of this stress is currently unclear, but it is probably initiated by increased expression of or gain-of-function mutations involving oncogenes (e.g., CCN1, KRAS, or MYC), which are known to be early events in tumorigenesis. Abnormal activation of the prereplicative complex entails upregulation of CDC6 and several minichromosome maintenance genes. (Our data and those described by Freeman et al. [41] might reflect an early step in this type of replicative stress.) This process is associated with stalling and/or collapse of replication forks and double-strand breaks, which slow or arrest the cell cycle to allow the DNA to be repaired (e.g., via homologous recombination). Activation of base excision repair suggests that DNA base oxidation or deamination may also be accelerated in early preinvasive lesions. Paradoxically, each of these repair processes can per se cause genomic instability [40,42]. This would favor the onset and selection of loss-offunction mutations involving tumor suppressor genes, whose protein products drive the cell cycle checkpoints (e.g., TP53, which is often mutated in the later phases of colorectal tumorigenesis [1]), and the result would be unrestrained tumor progression.
In line with the above findings, two other pathways also appeared to be upregulated in our SPLs and LPLs. The BIOCARTA ARF PATHWAY emanates from the tumor suppressor proteins p16INK4a and p14ARF (both encoded by CDKN2A). It is a key sensor of oncogenic stress (e.g., the KRAS-or MYC-associated hyperproliferative signal documented in colorectal adenomas). Activation of the ARF pathway stabilizes TP53, thereby promoting effective checkpoint activity [43]. Both classes of preinvasive lesions also displayed upregulated nucleotide excision repair (KEGG NUCLEOTIDE EXCISION REPAIR), which targets UV-and carcinogen-induced DNA adducts [44]. In conditions of replicative stress, sustained activation of this pathway might be triggered by the complex (but poorly defined) mixture of putative (See figure on previous page.) Figure 2 Hierarchical clustering and PCA of data sets based on cell cycle gene expression. Heat maps in panels A, B, and C show expression levels for the Biocarta cell cycle pathway's 22 gene components (listed on the right) across samples in the 3 tumor-stage-specific data subsets: SPLs, LPLs, and CRCs, respectively (each containing corresponding samples of normal mucosa, N). Actual sample labels are shown at the top of each heat map (0=normal mucosa; 1=tumor); the groups identified by hierarchical clustering analysis are separated by vertical white lines. (Dendrograms are not shown.) (D) Bi-dimensional projection via PCA of all tumors and normal mucosal specimens using expression levels for the 22 cell cycle-related genes. Each dot represents a tissue sample (pink circle: N; yellow star: SPL; green diamond: LPL; blue square: CRC). The first two components, PC1 and PC2, account for 81% of the variance in this set.
carcinogens generated in the colorectum by host and bacterial metabolism.
DNA damage checkpoints and apoptosis appear to be efficient barriers that can restrain tumor growth for up to two decades [45]. Nonetheless, DNA replication stress and repair are naturally associated with increased cell proliferation rates in colorectal tumors. The need for DNA building blocks, before and after these barriers have been disrupted, explains why nucleotide metabolism is increased throughout tumorigenesis, as reflected by the early persistent upregulation we observed in the REACTOME PURINE RIBONUCLEOSIDE MONO-PHOSPHATE BIOSYNTHESIS pathway and also by that of the KEGG PYRIMIDINE METABOLISM pathway. (The significance of the latter upregulation was borderline, so it is not listed in Table 2.) DNA replication is followed by dramatic changes in the nucleus and its membrane during mitosis, so it was not surprising that the RAN/mitotic spindle pathway (BIOCARTA RANMS PATHWAY) was upregulated at all three stages of tumorigenesis. The small nuclear GTPase RAN (ras-related nuclear protein) directs the assembly of the mitotic spindle and later that of the nuclear envelope, whose nuclear pore complexes are necessary to re-establish nucleocytoplasmic transport [46]. Pathways involved in the G2-to-M transition of the cell cycle (e.g., REACTOME CYCLIN A1 ASSOCIATED EVENTS DURING G2 M TRANSITION) were also constantly upregulated during tumorigenesis, as was the REACTOME FORMATION OF TUBULIN FOLDING INTERMEDIATES BY CCT TRIC pathway, which is involved in protein folding mediated by the chaperonin containing the TCP1 complex. This complex plays central roles in the folding and assembly of numerous proteins [47], so the upregulated expression of several genes encoding its subunits could be easily ascribed to increased protein metabolism in tumor cells.
Of the 23 pathways selectively upregulated in CRCs, six pointed to the activation of the G1-to-S phase transition: SA REG CASCADE OF CYCLIN EXPR (Regulatory cascades of cyclin expression), BIOCARTA SKP2E2F PATH-WAY, BIOCARTA CELLCYCLE PATHWAY, BIOCARTA P27 PATHWAY, REACTOME G1 PHASE, and BIO-CARTA RB PATHWAY (see also first section of Results and Discussion). The simultaneous upregulation of these inter-related cell-cycle pathways in advanced colorectal tumors reflects the sustained proliferation that is a fundamental trait of cancer cells [48]. The invasive stages of tumorigenesis are thought to be characterized by mutations involving tumor suppressor genes like TP53 or PTEN, alterations that allow cancer cells to circumvent programs that limit proliferation (e.g., the cell-cycle checkpoints, which operate more efficiently in early-stage tumors, as discussed above). This high-proliferation environment is naturally associated with increased transcription and translation, as documented in our dataset by the upregulation of diverse RNA polymerase II and III functions, amino-acid transport across the plasma membrane, and tRNA aminoacylation ( Table 2).
(See figure on previous page.) Figure 3 Dysregulation of the cell cycle pathway during tumor progression. Expression levels for the 22 Biocarta cell cycle genes in each tumor stage-specific data subset-SPLs (A), LPLs (B), and CRCs (C)-were compared with those in the normal mucosa (N) data set using two-sample t-test. Each graph contains 22 nodes representing the genes in the pathway (white, yellow, and blue rectangles, and yellow ellipses) plus a node for each tumor-stage being analyzed (green rectangles; those outlined in red represent the stage considered in the panel). Yellow and blue rectangles: genes displaying tumor-associated upregulation or downregulation, respectively, in the stage represented in the panel; white rectangles: genes that were also dysregulated in at least one of the other two stages; yellow ellipses: cell-cycle genes that displayed no tumor-related dysregulation at any of the three stages. The connection matrix used for the graph was a sparse square matrix of order 25 where 1 indicates connection between nodes and 0 indicates no connection. Black lines: connection between a gene node and tumor-stage node (i.e., tumor-related up-or downregulation of the gene at that stage).  Over the past 20 years, important roles have emerged for nonepithelial cells in the progression of colorectal adenocarcinomas (and those involving other organs) [48]. Macrophages, for example, seem to play conflicting (but nonetheless crucial) roles in both tumor development and metastasis [49], and this is consistent with the marked upregulation of the BIOCARTA MONOCYTE PATHWAY observed in our CRC dataset. Monocyte differentiation gives rise to tumor-antagonizing and tumor-promoting macrophages. The latter cells promote angiogenesis, enhance tumor cell migration and invasion, and suppress antitumor immunity [49]. CRC-related upregulation of the BIOCARTA SET PATHWAY reflects the importance of another stromal contribution to colorectal carcinogenesis: granzyme release by cytotoxic T lymphocytes. These serine proteases (along with the multiprotein SET complex, whose components are encoded by genes frequently upregulated in our tumors) trigger apoptosis and are therefore regarded as mediators of antitumor immunity [50]. But they can also provoke inflammation and cleave extracellular matrix components [50]. Moreover, the SET protein is believed to act as an oncoprotein (given its apoptosis-inhibiting activity within the SET complex) and as a regulator of chromatin remodeling [51,52]. On the basis of our transcriptomic data alone, it is difficult to discern what type of impact SET pathway activation has on colorectal cancer progression.
Finally, the REACTOME GLYCOLYSIS pathway was found to be upregulated in CRCs. Since its first description in 1924 by Otto Warburg [53], aerobic glycolysis has been considered the preferred pathway for metabolizing glucose in cancer cells (as opposed to the oxidative metabolism used by normal differentiated cells). Our data demonstrate that the switch to aerobic metabolism can be documented with transcriptional analysis of the genes encoding metabolic enzymes. Cancer cells appear to exploit aerobic glycolysis to produce the biomass needed for new cells, despite the pathway's inefficient ATP generation [54]. Cancer cells' need for nutrients to fuel biomass production is also reflected in the activation of other pathways mentioned above, such as those involving glucose and amino-acid transport, regulation of glucokinase, and purine biosynthesis.

Conclusions
Our exhaustive description of the sequence of critical molecular events characterizing the progression of colorectal tumors is based on a statistically robust analysis of transcriptomic data carried out at the level of functional molecular processes rather than individual genes or proteins. This analysis revealed specific pathways whose dysregulation might play a role in each transition of the transformation process. This is the first study in which such an approach has been used to gain further insights into colorectal tumorigenesis. Therefore, our findings provide a foundation for larger projects in which transcriptomic data will be integrated with (epi)genomic, proteomic, and metabolomic data from ongoing and future studies. They should open roads to experimental research aimed at providing more in-depth, systems-level understanding of colorectal tumorigenesis.

Additional file
Additional file 1: Biological pathways found to be downregulated at different stages of colorectal transformation (Table 3 and Figure 4).
(See figure on previous page.) Figure 4 Overview of tumor-related pathway dysregulation at different stages of transformation. Pathways displaying identical configurations of dysregulation (e.g., upregulated in SPLs and LPLs but not CRCs) have been combined into 10 more general biological groups (white boxes). Arrows indicate type (up vs. down) of dysregulation.