Untargeted lipidomic features associated with colorectal cancer in a prospective cohort

Perttula, Kelsi; Schiffman, Courtney; Edmands, William M B; Petrick, Lauren; Grigoryan, Hasmik; Cai, Xiaoming; Gunter, Marc J; Naccarati, Alessio; Polidoro, Silvia; Dudoit, Sandrine; Vineis, Paolo; Rappaport, Stephen M

doi:10.1186/s12885-018-4894-4

Research article
Open access
Published: 19 October 2018

Untargeted lipidomic features associated with colorectal cancer in a prospective cohort

Kelsi Perttula¹,
Courtney Schiffman¹,
William M B Edmands¹,
Lauren Petrick^1,7,
Hasmik Grigoryan¹,
Xiaoming Cai¹,
Marc J Gunter³,
Alessio Naccarati⁴,
Silvia Polidoro⁴,
Sandrine Dudoit^1,2,6,
Paolo Vineis^4,5 &
…
Stephen M Rappaport ORCID: orcid.org/0000-0002-3806-0848¹

BMC Cancer volume 18, Article number: 996 (2018) Cite this article

3306 Accesses
22 Citations
3 Altmetric
Metrics details

Abstract

Background

Epidemiologists are beginning to employ metabolomics and lipidomics with archived blood from incident cases and controls to discover causes of cancer. Although several such studies have focused on colorectal cancer (CRC), they all followed targeted or semi-targeted designs that limited their ability to find discriminating molecules and pathways related to the causes of CRC.

Methods

Using an untargeted design, we measured lipophilic metabolites in prediagnostic serum from 66 CRC patients and 66 matched controls from the European Prospective Investigation into Cancer and Nutrition (Turin, Italy). Samples were analyzed by liquid chromatography-high-resolution mass spectrometry (LC-MS), resulting in 8690 features for statistical analysis.

Results

Rather than the usual multiple-hypothesis-testing approach, we based variable selection on an ensemble of regression methods, which found nine features to be associated with case-control status. We then regressed each selected feature on time-to-diagnosis to determine whether the feature was likely to be either a potentially causal biomarker or a reactive product of disease progression (reverse causality).

Conclusions

Of the nine selected LC-MS features, four appear to be involved in CRC etiology and merit further investigation in prospective studies of CRC. Four other features appear to be related to progression of the disease (reverse causality), and may represent biomarkers of value for early detection of CRC.

Peer Review reports

Background

Colorectal cancer (CRC) accounts for over 25% of all cancer-related deaths with global incidence rates steadily rising [1,2,3]. Since less than 15% of CRC risk has been attributed to heritable genetics [4, 5], non-shared exposures and their contributions to gut inflammation are believed to be important etiologic factors [6]. Increased CRC risks have been associated with cigarette smoking, alcohol use, lack of physical activity, obesity, abnormal glucose metabolism, and consumption of red meat and n-6 polyunsaturated fatty acids (PUFAs) [7,8,9,10]. Conversely, consumption of n-3 PUFAs, fruits, fish, vitamins D and E, and regular use of aspirin appear to reduce CRC risks [7, 11, 12]. There are also persistent suggestions that the interplay between dietary factors - particularly red meat, lipids, and fiber - and the gut microbiota are effect modifiers for CRC [6, 13,14,15,16].

Many of the associations between exposures and CRC have been gleaned from epidemiological studies that employed self-reported dietary and lifestyle factors [8, 9, 16, 17]. Given the inherent limitations of such data for discovering causal exposures, investigators have recently employed metabolomics to compare small-molecule features between CRC cases and controls. This strategy is based on the idea that small molecules in human blood reflect chemical exposures from both internal and external sources, including the diet, microbiota, psychosocial stress, and pollutants [18]. However, since molecules that discriminate cases from controls in cross-sectional studies can reflect both potential causes of CRC and dysregulation of metabolic processes that result from progression of the disease (reverse causality) [5, 19], it is important that biospecimens be collected before diagnosis to gain insights into causes and effects. Indeed, a class of ultra-long-chain fatty acids (ULCFAs) that discriminated for CRC in several cross-sectional studies [20, 21] was essentially ruled out as a causal factor in a prospective cohort [22].

Metabolomic analyses of blood from prospective cohorts have found some associations between CRC incidence and small molecules, as summarized in Table 1, with periods of follow-up ranging from 3.7 to 14.7 years [19, 22,23,24,25,26,27]. Interestingly, all of these nested case-control studies followed targeted or semi-targeted designs where relatively few molecular features were tested between cases and controls. Two of the studies focused on metabolism of dietary choline and found that the mammalian metabolite, betaine, was moderately protective against CRC whereas trimethylamine-N-oxide (TMAO), a metabolite mediated via intestinal microbiota, was associated with increased risk [26, 27]. A genetic link between TMAO and CRC risk has also been reported [28]. Intriguingly, red meat and other phosphatidylcholine-rich foods appear to contribute to dysbiotic microbiota that generate trimethylamine (the precursor of TMAO) [15, 29], whereas fiber-rich foods appear to encourage symbiotic bacteria that are associated with decreased CRC risk [15, 30].

Table 1 Studies that investigated associations of colorectal cancer with small molecules in plasma or serum from prospective cohorts

Full size table

Since untargeted metabolomics via liquid chromatography-mass spectrometry (LC-MS) can detect thousands of small-molecule features, traditional hypothesis-testing approaches that adjust for multiple comparisons by controlling false positive error rates, such as the false discovery rate (FDR) [31], can make it difficult to find features whose levels differ significantly between cases and controls. This may have motivated the semi-targeting strategy of Cross et al. (Table 1) [25], who limited hypothesis tests of the thousands of detected features to only 278 molecules that had been fully annotated. Such a strategy is likely to be biased towards well curated metabolites that participate in recognized human pathways [18], and thus can miss novel exposures of potential importance to initiation of cancer, including those experienced predominately by either cases or controls. Indeed, of the 278 small-molecules tested by Cross et al. [25], only glycochenodeoxycholate (a secondary bile salt) was associated with increased CRC risk in women (but not men) after using the conservative Bonferonni correction of the p-value.

Here, we report results of an untargeted metabolomics analysis of serum from 66 incident CRC cases and matched controls from the European Prospective Investigation of Cancer and Nutrition (EPIC). Given the involvement of lipids in inflammatory processes and CRC [32,33,34], the serum-extraction procedure favored lipophilic molecules. As an alternative to the traditional multiple-hypothesis-testing paradigm for selecting features of potential importance to CRC, we developed a variable-selection strategy that employs an ensemble of diverse prediction methods, including regularized linear regression and regression trees [35,36,37]. Such methods have recently been applied independently for analyzing metabolomic and other -omic data [35, 36, 38]. Our analyses point to a small set of features that were predictive of CRC-case status. However, as with all discovery studies, these potentially important features and the molecules they represent must be further validated with independent data sets.

Methods

Study population

EPIC is a prospective cohort study with approximately 520,000 adult participants from across Europe that were enrolled from 1992 through 2000 [39]. We had received EPIC serum to test the hypothesis that ULCFAs were protective of colorectal cancer [22], and simultaneously performed untargeted metabolomics to discover other potentially causal features. For the current study, data were analyzed from serum collected between 1993 and 1997 from 66 case-control pairs in Turin, Italy. Controls were matched to incident cases by age, year and season of enrollment, and gender. Dietary data were collected with food frequency questionnaires [40, 41]. Summary statistics for these subjects are listed in Table 2, including time to diagnosis (ttd), gender, body mass index (bmi), waist circumference, smoking status, diabetes status, physical activity, and alcohol and meat consumption. These variables were selected based on previous evidence of associations with CRC risk [7, 42, 43]. Across our subjects, the only significant differences between CRC cases and controls were observed for bmi and waist circumference, both of which were higher in cases (Table 2).

Table 2 Descriptive statistics of human subjects matched by age, study enrollment, gender, and selected covariates

Full size table

Chemicals

Isopropanol (LC-MS grade, Fluka), methanol, water and ¹³C- cholic acid (internal standard) were from Sigma-Aldrich (Milwaukee, WI, USA). Acetic acid (LC-MS grade, Optima) and chloroform were from Fisher Scientific (Santa Clara, CA, USA). All chemicals were of analytical grade and were used without purification.

Sample processing

Serum was stored after collection in 0.5-ml aliquots that were placed in cryostraws, sealed, and stored in liquid nitrogen (-196 °C) at the International Agency for Research on Cancer in Lyon, France. Approximately 1 year prior to analysis, cryostraws were transported (with dry ice) to our laboratory in Berkeley, CA (USA), where they were maintained at -80 °C. As previously reported [22], 20 μl of serum was mixed with 100 μl of a solvent mixture (isopropanol/methanol/water = 60:35:5) containing ¹³C-cholic acid as an internal standard (final concentration of 3.0 μg/ml). After mixing samples for 1 minute with a vortex mixer, samples were left at room temperature for 10 min. to precipitate proteins, and were then centrifuged for 10 min at 10,000 g. The supernatant was retained and stored at 4 °C prior to LC-MS. Case-control pairs were analyzed sequentially but in random order. A local quality-control sample, prepared by pooling aliquots from all serum specimens of each batch, was analyzed after every ten samples to monitor system stability and estimate the precision of the analyses.

Mass spectrometry

Analysis was performed with an Agilent LC (1100 series) coupled to an Agilent high resolution MS (Model 6550 QTOF, Santa Clara, CA, USA) as previously reported [22]. Briefly, 10 μl of extracts were slowly loaded on to a Luna C5 column (Phenomenex, Los Angeles, CA) with a 22-min gradient elution of mobile phase A (methanol/0.5% acetic acid = 5:95) and mobile phase B (isopropanol/methanol/0.5% acetic acid = 60:35:5). The electrospray was operated in negative electrospray-ionization (ESI) mode. Tandem MS/MS spectra were obtained on the same platform in data-dependent mode (immediately after data collection) or targeted mode (analysis of the selected features). Full LC-MS acquisition parameters were previously published [22].

Approximately one third of the serum samples had a gelled consistency that resulted from an additive to the cryostraws [22, 44, 45]. Pairs with at least one gelled sample were analyzed in one batch (batch 1, n = 96), and the remaining (non-gelled) pairs were analyzed in a second batch (batch 2, n = 36).

Data processing

Raw data were converted to mzXML format for peak picking using ProteoWizard software (Spielberg Family Center for Applied Proteomics, Los Angeles, CA). Peak detection and retention-time alignment were performed as described previously [22], using the XCMS package within the R statistical programming environment [46,47,48]. The CAMERA package was used to identify isotopes, ESI adducts, and in-source fragments with the custom rule set used from Stanstrup et al. [49, 50]. Annotation of features was conducted using the compMS2Miner package [51], by comparing accurate masses and MS2 fragmentation patterns with the Human Metabolome Database (HMDB) and Metlin [52, 53].

Over 24,300 features were initially detected in negative ESI mode. Features were filtered by removing those with a mean fold-change in abundance less than 1.5 compared to the same peaks in reagent blanks (background noise) and those with coefficients of variation (CV) from QC samples greater than 30% [54, 55]. This resulted in a final dataset of 8690 features for statistical analysis. Feature intensities were (natural) log-transformed and adjusted for batch and gel-status effects using the following linear regression model, previously described in [22]:

$$ \mathit{\log}{Y}_i={\beta}_0+{\beta}_1{X}_{i, gel}+{\beta}_2{X}_{i, batch}+{\epsilon}_i, $$

(1)

where Y_i denotes the intensity of a given feature for the i^th subject and X_{i, gel} and X_{i, batch} are the corresponding categorical covariates for gel-status and batch. After fitting the linear model, normalized (logged) intensities were obtained by subtracting the estimated batch and gel effects from the original (logged) intensities.

Upper-quantile scaling was used to render the distributions of feature abundances more comparable across all subjects [56, 57]. A correlation-network program (Cytoscape, [58]) and an R package clustering algorithm (RAMclust, [59]) were used to identify clustered ions and assist with annotations.

Statistical methods: Variable selection

In order to identify discriminating features between CRC cases and control, we shifted the paradigm from multiple hypothesis testing to variable selection based on a combination of three regression methods. First, we considered the following standard linear regression model for the raw intensity of a given feature Y in the i^th subject:

$$ \mathit{\log}{Y}_i={\beta}_0+{\beta}_1{X}_{i, caco}+{\beta}_2{X}_{i, gel}+{\beta}_3{X}_{i, batch}+{\beta}_4{X}_{i, age}+{\beta}_5{X}_{i, gender}+{\epsilon}_i, $$

(2)

where caco, gel, and gender denote binary variables for case-control status, presence or absence of gelled serum, and the matched variables of gender and age (in years). Features were then ranked based on the nominal unadjusted p-value for the case-control coefficient (β₁). Second, a regularized logistic regression (LASSO: least absolute shrinkage and selection operator) was performed [35, 60] with case-control status as the binary outcome variable regressed on the following covariates: normalized log intensities from Eq. (1) for all 8690 metabolites, age and gender. Although age and gender were in the LASSO model, neither variable was selected by the regularized regression.

In order to stabilize feature selection with LASSO, 500 bootstrap samples were taken for each of a variety of penalty parameters [61]. Features chosen by LASSO in at least 10% of the bootstrap samples, across a wide range of penalty parameters, were retained. A data-driven cutoff of 10% was chosen based on plots of the percentage of time that each metabolite was selected during the bootstrap iterations across various penalty parameters, sorted in decreasing order. There was an obvious gap between metabolites selected more than and less than 10% of the time, which lead to choosing this as a natural cutoff. Third, the random forest algorithm [36, 62] was used to build a predictor of case-control status, using the same covariates as for the LASSO regression. No obvious jump in variable importance could be seen for the sorted random forest variable importance or the sorted linear regression p-values. Therefore, a cutoff of 1% was selected for both of these criteria because this cutoff is relatively stringent, yet still included a reasonable number of variables for consideration. In summary, to select a final set of variables, we included only features that were selected by the bootstrap LASSO and were also among the top 1% of features ranked by linear regression p-values and random forest variable importance.

When a set of features was selected that satisfied all criteria, the extracted ion chromatographs were visually inspected and those with poor peak morphology (ill-defined Gaussian shape) or integration were removed. Then, the three variable selection methods were repeated as needed to arrive at a final set of selected features with good peak morphology and integration.

Initially, only covariates on which the samples were matched (i.e. age and gender) were included in the models used for variable selection. We did not include other dietary or health related covariates because we did not want to obscure possible associations between the metabolites and case-control status. However, we did subsequently test for associations between the nine selected metabolites and the following covariates weight, bmi, smoking status, and consumption of beef, pork, and alcohol. As shown in Additional file 1, only one feature (ID 839) was marginally associated with any of the covariates (i.e. bmi and consumption of beef). Also, when each of these covariates was added to the LASSO model (Eq. (2)) with case-control status as the binary outcome variable none of them was selected by the regularized regression.

Results

Features that discriminate for CRC

After applying the three variable-selection methods described above, two of which prioritize predictive ability (LASSO and random forest), nine features were selected. The volcano plot in Fig. 1 relates case-control fold changes and –log₁₀ p-values (for the model in Eq. 2) for all features and highlights the nine selected metabolites (shown in Table 3). Case-control fold-changes ranged from approximately 0.2 to 3.0 overall and between 0.40 and 1.40 for the nine selected features. Due to the nature of our variable selection method, the p-values of the selected features were not necessarily the smallest, nor were their fold-changes necessarily the largest. Nevertheless, the nine selected metabolites resulted in a 79% correct classification rate when they were used to fit a logistic regression model on the learning set to predict case-control status. Although this correct classification is likely optimistic because the same data were used to perform the variable selection and to build and test the predictor, the selected features are worthy of validation in independent samples of CRC cases and controls from prospective cohorts.

Table 3 Untargeted features selected as predictors of case-control status

Full size table

Potentially causal and reactive biomarkers

The nine selected features were evaluated to determine their associations with time to diagnosis (ttd) as a means of discerning whether they represent potentially causal exposures or reactive effects of disease progression [22]. If the log fold-change for a given feature was constant across the whole range of ttd in a linear model (p-value > 0.05), the feature was classified as potentially causal (C) and if the case-control difference decreased with increasing ttd, the feature was classified as potentially reactive (R). These (C) and (R) classifications are listed in Table 3 for the nine selected features and the plots for the ttd linear models are shown in Fig. 2. (See Additional file 1 for regression coefficients and p-values). This process resulted in four potentially causal features (no apparent effect of ttd for IDs: 5080, 3207, 6054 and 839), four potentially reactive features (case-control differences diminish with ttd for IDs: 235, 4250, 4294 and 14,963), and one feature that could not be classified as either (C) or (R) (case control differences increased with ttd, ID 5749). The four potentially causal metabolites resulted in a 72% correct classification rate to predict case-control status (with the same caveats mentioned above). While the four potentially causal features may be linked to exposures that contribute to CRC, the four reactive features may be useful pre-diagnostic biomarkers.

Discussion

Using untargeted metabolomics in serum samples from 66 pairs of CRC cases and controls from the EPIC cohort, we sought evidence linking lipophilic molecules with the etiology of CRC. The LC-MS data collected from these samples included over 24,000 features. After filtering for noise (mean fold-change above blank samples), reproducibility (CVs), and likely artifacts (CAMERA), 8690 features were available for evaluating potential associations with CRC case status.

It has been standard practice in metabolomics to identify features that discriminate for case-control status using a multiple-testing approach, e.g., based on a cutoff for p-values that have been adjusted to control for a false positive error rate such as the FDR [63]. Since untargeted metabolomics can detect thousands of features, FDR correction is severe [63] and can drastically reduce the number of selected metabolites, thereby resulting in false negatives. Thus, we shifted our paradigm to a variable-selection approach, based on an ensemble of diverse regression methods, in order to uncover a reliable set of features for further investigation. This led to selection of 9 features that discriminated for CRC (Table 3) with a correct classification of 79%. Based on regressions of case/control fold changes on ttd (Fig. 2), four of these features appear to be related to causal factors for CRC, four appear to be related to cancer progression, and one is indeterminate.

As mentioned earlier, the top 1 % of features in terms of smallest p-values were considered for variable selection. However, many of these roughly 90 features were not also high ranking in terms of predictive importance, as assessed by random forest and LASSO. Indeed, inspection of data from the features with very small p-values found some to have small standard errors rather than meaningful case-control differences, while other features with large fold changes had great uncertainties in fold-change estimates. This reinforces the value of using an ensemble of regression methods to evaluate biological variability rather than a single measure such as a small p-value or a large case-control fold change.

Possible annotations

Potential annotations of the nine selected features were based on comparisons of MS2 spectra with human metabolome database (HMDB) entries as summarized in Table 4. Focusing first on the four potentially causal features shown in Table 3, MS2 were only obtained for IDs 3207 and 6054, which were positively correlated (Pearson correlation coefficient of 0.64) and were both present at lower levels in cases than in controls, indicating possibly protective effects. The two MS2 fragment ions detected for ID 3207 could not be identified. However, ID 6054 had fragments characteristic of [C₁₆H₂₉O₂]⁻ and losses of two H₂O molecules, consistent with the loss of two hydroxyl groups and a hexadecenoic acetate fragment. These fragments are suggestive of ceramide lipids [64], a class of molecules that has been implicated in the regulation of cancer cells [65]. Although IDs 3207 and 6054 and the two other potentially causal features (IDs 5080 and 839) could not be fully annotated, the identified characteristics of accurate mass, retention time, and MS2 fragments can be used for validation in future studies.

Table 4 Results of tandem MS/MS analyses of features associated with case-control status

Full size table

Turning now to the likely reactive features, ID 4294 was putatively identified as ULCFA 468, which had been evaluated separately in our targeted study of 8 ULFCAs [22] and had been first reported by Ritchie et al. [24]. This feature had neutral losses of H₂O and CO₂, characteristic of a hydroxylated fatty acid and a likely molecular formula for [M-H]⁻ of C₂₈H₅₂O₅, within 1.56 ppm of the exact mass. Another reactive feature (ID 4250) also displayed these characteristic neutral losses of H₂O and CO₂ and was highly correlated with ID 4294 with a correlation coefficient of 0.85. This suggests that ID 4250 is a previously uncharacterized ULCFA with molecular formula for [M-H]⁻ of C₂₇H₅₀O₅, within − 2.76 ppm. While odd-numbered fatty acids are less common in humans, microbial single carbon metabolism in very long chain fatty acids has been reported [66]. As a class, ULCFAs tend to be present at higher levels among controls compared to paired cases, but this difference diminishes with ttd, suggesting that they result from disease progression [22]. Nonetheless, the fact that these two ULCFAs were selected from approximately 9000 features that survived filtering of the untargeted metabolomics data offers partial validation to our variable-selection strategy. Based on correlation maps (data not shown) both of these features clustered with five other ULCFAs that have been described by Ritchie, et al. [24] (ULCFAs 465, 466, 492, 518, and 538; exact masses within 10 ppm of calculated m/z), and were also analyzed in our targeted study [22]. Another selected feature (ID 235) exhibited similar reactive (R) behavior to the ULCFAs (Fig. 2), and the presence of a neutral loss of CO₂, indicating that ID 235 may be a fatty or bile acid. Deoxycholic acid (3α, 12α-dihydroxy-5β-cholanic acid) [M-H]⁻, chenodeoxycholic acid (3α, 7α-dihydroxy-5β-cholanic acid) [M-H]⁻, and adrenic acid [M + HAc-H]⁻ were eliminated as possible annotations of feature 235 by comparison of retention times between the experimental data and analytical standards. However, these two tested molecules are just two isomers of a large class of bile acids, some of which are positively correlated with CRC [25, 67]; in our study, feature 235 was negatively correlated with CRC.

Limitations

Limitations of this study include the small sample size, which reduced the power to detect differences between case-control pairs, and lack of information regarding aspirin consumption and a family history of CRC, two covariates that have been associated with CRC incidence [1, 68]. Any bias (and potential confounding) introduced by the gelling of some samples from the cryostraws should have been removed by adjustment via Eq. (1). Nonetheless, gelling complicated sample processing and was, therefore, a source of random variation that probably reduced our ability to detect differences between cases and controls. Gelling of EPIC serum resulted from a ‘gel powder’ that had been added to seal one end of the cryostraw (https://patents.google.com/patent/US7056727B2/en). This illustrates how proper storage of biological specimens for decades is challenging because preservation of cells, DNA, RNA, proteins and small molecules must be considered. However, decades later, shortcomings of then-contemporary technology (such as gelling of serum) can be revealed and their reporting can improve the design of future investigations.

Conclusions

In summary, of the nearly 9000 filtered features subjected to statistical analysis, four appear to be potentially causal features that are worthy of following up in an independent set of prospective CRC cases and controls. When these four features alone were used to build a logistic regression predictor of case/control status on the learning set, they resulted in a correct classification rate of 72%. Again, this is likely an optimistic correct classification rate, but given that only four features were used for prediction, it is quite promising. Four other selected features, notably some ULCFAs and related fatty acids, appear to be products of disease progression and, therefore, could be useful diagnostic biomarkers for early detection of CRC. Since ULCFAs had previously been shown to discriminate CRC cases from controls in several cross-sectional investigations, it is reassuring that two putative ULCFAs (IDs 4294 and 4250) were selected as predictive features in this untargeted analysis. While the relatively modest number of samples limited the power to detect other associations, the nine features selected in our study correctly predicted case-control status in 79% of the samples. The stability of these features across three disparate feature-selection methods is promising. Furthermore, based on m/z and annotation information, these nine features appear to be different than those reported in the prospective CRC study by Cross et al. [25], warranting further identification and validation.

Abbreviations

bmi:: Body mass index
CRC:: Colorectal cancer
CV:: Coefficient of variation
EPIC:: European prospective investigation into cancer and nutrition
FDR:: False discovery rate
HMDB:: Human Metabolome Database
LASSO:: Least absolute shrinkage and selection operator
LCMS:: Liquid chromatography mass spectrometry
PUFA:: Poly-unsaturated fatty acid
TMAO:: Trimethylamine-N-oxide
ttd :: Time to (case) diagnosis
ULCFA:: Ultra-long chain fatty acid

References

Arnold M, Sierra MS, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global patterns and trends in colorectal cancer incidence and mortality. Gut. 2016;0:1–9.
Google Scholar
Haggar FA, Boushey RP. Colorectal Cancer Epidemiology : incidence, mortality, survival, and risk factors. Clin Colon Rectal Surg. 2009;22:191–7.
Article PubMed Google Scholar
American Cancer Society. Cancer facts and figures 2015; 2015. p. 1–9.
Google Scholar
Hemminki K, Czene K. Attributable risks of familial Cancer from the family-Cancer database attributable risks of familial Cancer from the family-Cancer Database 1; 2002. p. 1638–44.
Google Scholar
Rappaport SM. Genetic factors are not the major causes of chronic diseases. PLoS One. 2016;11:e0154387.
Article PubMed Google Scholar
Gassler N, Klaus C, Kaemmerer E, Reinartz A. Modifier-concept of colorectal carcinogenesis: Lipidomics as a technical tool in pathway analysis. World J Gastroenterol. 2010;16:1820–7.
Article CAS PubMed Google Scholar
Stone WL, Krishnan K, Campbell SE, Palau VE. The role of antioxidants and pro-oxidants in colon cancer. World J Gastrointest Oncol. 2014;6:55–66.
Article PubMed Google Scholar
Leufkens AM, Van Duijnhoven FJB, Siersema PD, Boshuizen HC, Vrieling A, Agudo A, et al. Cigarette smoking and colorectal cancer risk in the European prospective investigation into Cancer and nutrition study. Clin. Gastroenterol Hepatol. 2011;9:137–44.
Article Google Scholar
Platz EA, Willett WC, Colditz GA, Rimm EB, Spiegelman D, Giovannucci E. Proportion of colon cancer risk that might be preventable in a cohort of middle-aged US men. Cancer Causes Control. 2000;11:579–88.
Article CAS Google Scholar
Oostindjer M, Alexander J, Amdam GV, Andersen G, Bryan NS, Chen D, et al. The role of red and processed meat in colorectal cancer development : a perspective. MESC. 2014;97:583–96 Elsevier BV.
Google Scholar
Rothwell PM, Fowkes FGR, Belch JFF, Ogawa H, Warlow CP, Meade TW. Effect of daily aspirin on long-term risk of death due to cancer: analysis of individual patient data from randomised trials. Lancet. 2010;377:31–41 Elsevier Ltd.
Article Google Scholar
Ma Y, Zhang P, Wang F, Yang J, Liu Z, Qin H. Association between vitamin D and risk of colorectal Cancer : a systematic review of prospective studies. J Clin Oncol. 2011;3775–82.
Article CAS Google Scholar
Terzić J, Grivennikov S, Karin E, Karin M. Inflammation and Colon Cancer. Gastroenterology. 2010;138:2101–14.
Article PubMed Google Scholar
O'Keefe S. Diet, microorganisms and their metabolites, and colon cancer. Nat Rev Gastroenterol Hepatol. 2016;13:691–706.
Article CAS Google Scholar
Vipperla K, O’Keefe S. Diet, microbiota, and dysbiosis: a “recipe” for colorectal cancer. Food Funct. 2016;7:1731–40 Royal Society of Chemistry.
Article CAS Google Scholar
Norat T, Bingham S, Ferrari P, Slimani N, Jenab M, Mazuir M, et al. Meat, fish, and colorectal cancer risk : the European prospective investigation into cancer and nutrition. J Natl Cancer Inst. 2005;97:906–16.
Article PubMed Google Scholar
Chao A, Connell CJ, Mccullough ML, Jacobs EJ, Flanders WD, Rodriguez C, et al. Meat consumption and risk of colorectal Cancer. J Am Med Assoc. 2005:293, 172–82.
Article CAS Google Scholar
Rappaport SM, Barupal DK, Wishart D, Vineis P, Scalbert A. The blood exposome and its role in discovering causes of disease. Environ Health Perspect. 2014;122:769–74.
Article PubMed Google Scholar
van Duijnhoven FJ, Bueno-De-Mesquita HB, Calligaro M, Jenab M, Pischon T, Jansen EH, Frohlich J, Ayyobi A, Overvad K, Toft-Petersen AP, Tjønneland A. Blood lipid and lipoprotein concentrations and colorectal cancer risk in the European Prospective Investigation into Cancer and Nutrition. Gut. 2011;60(8):1094–102.
Ritchie SA, Heath D, Yamazaki Y, Grimmalt B, Kavianpour A, Krenitsky K, et al. Reduction of novel circulating long-chain fatty acids in colorectal cancer patients is independent of tumor burden and correlates with age. BMC Gastroenterol. 2010;10:140 BioMed Central Ltd.
Article CAS PubMed Google Scholar
Ritchie SA, Tonita J, Alvi R, Lehotay D, Elshoni H, Su-Myat, et al. Low-serum GTA-446 anti-inflammatory fatty acid levels as a new risk factor for colon cancer. Int J Cancer. 2013;132:355–62.
Article CAS Google Scholar
Perttula K, Edmands WMB, Grigoryan H, Cai X, Iavarone AT, Gunter MJ, et al. Evaluating ultra-long-chain fatty acids as biomarkers of colorectal Cancer risk. Cancer Epidemiol Biomark Prev. 2016;25:1216–24.
Article CAS Google Scholar
Nishiumi S, Kobayashi T, Ikeda A, Yoshie T, Kibi M, Izumi Y, et al. A novel serum metabolomics-based diagnostic approach for colorectal cancer. PLoS One. 2012;7:1–10.
Article Google Scholar
Ritchie, SA, Ahiahonu PWK, Jayasinghe D, Heath D, Liu J, Lu Y, et al. Reduced levels of hydroxylated, polyunsaturated ultra long-chain fatty acids in the serum of colorectal cancer patients: implications for early screening and detection. BMC Med. 2010;8:13.
Article PubMed Google Scholar
Cross AJ, Moore SC, Boca S, Huang WY, Xiong X, Stolzenberg‐Solomon R, Sinha R, Sampson JN. A prospective study of serum metabolites and colorectal cancer risk. Cancer. 2014;19(120):3049–57.
Article CAS PubMed Google Scholar
Bae S, Ulrich CM, Neuhouser ML, Malysheva O, Bailey LB, Xiao L, et al. Plasma choline metabolites and colorectal cancer risk in the women’s health initiative observational study. Cancer Res. 2014;74:7442–52.
Article CAS PubMed Google Scholar
Nitter M, Norgård B, De VS, Eussen SJPM, Meyer K, Ulvik A, et al. Plasma methionine, choline, betaine, and dimethylglycine in relation to colorectal cancer risk in the European prospective investigation into Cancer and nutrition ( EPIC ). Ann Oncol. 2014;25:1609–15.
Article CAS Google Scholar
Xu R, Wang Q, Li L. A genome-wide systems analysis reveals strong link between colorectal cancer and trimethylamine N-oxide ( TMAO ), a gut microbial metabolite of dietary meat and fat. BMC Genomics. 2015;16:1–9.
Article Google Scholar
Wang Z, Klipfell E, Bennett BJ, Koeth R, Levison BS, Dugar B, et al. Gut flora metabolism of phosphatidylcholine promotes cardiovascular disease. Nature. Nat Publ Group. 2011;472:57–63.
CAS Google Scholar
Zackular JP, Rogers MAM, Ruf MT, Schloss PD. The human gut microbiome as a screening tool for colorectal Cancer. Cancer Prev Res. 2014;7:1112–22.
Article CAS Google Scholar
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc. 1995;57:289–300.
Google Scholar
Santos CR, Schulze A. Lipid metabolism in cancer. FEBS J. 2012;279:2610–23.
Article CAS Google Scholar
Gassler N, Klaus C, Kaemmerer E, Reinartz A, Gassler N, Klaus C, et al. Modifier-concept of colorectal carcinogenesis : Lipidomics as a technical tool in pathway analysis. 2010;16:1820–7.
Lydic TA, Townsend S, Adda CG, Collins C, Mathivanan S, Reid GE. Rapid and comprehensive “shotgun” lipidome profiling of colorectal cancer cell derived exosomes. Methods. 2015;87:83–95 Elsevier Inc.
Article CAS PubMed Google Scholar
Friedman J, Hastie T, Tibshirani RJ. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33:1–22.
Article PubMed Google Scholar
Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002;2:18–22.
Google Scholar
Huang BFF, Boutros PC. The parameter sensitivity of random forests. BMC Bioinformatics. 2016;17:1–13.
Article Google Scholar
Saeys Y, Inza I, Larranaga P. Gene expression a review of feature selection techniques in bioinformatics. Bioinformatics. 2007;23:2507–17.
Article CAS Google Scholar
Riboli E, Hunt KJ, Slimani N, Ferrari P, Norat T, Fahey M, et al. European prospective investigation into Cancer and nutrition (EPIC): study populations and data collection. Public Health Nutr. 2002;5:1113–24.
Article CAS PubMed Google Scholar
Williams MD, Reeves R, Resar LS, Hill HH. Metabolomics of colorectal cancer: past and current analytical platforms. Anal Bioanal Chem. 2013;405:5013–30.
Article CAS Google Scholar
Nolen BM, Brand RE, Prosser D, Velikokhatnaya L, Allen PJ, Zeh HJ, et al. Prediagnostic serum biomarkers as early detection tools for pancreatic cancer in a large prospective cohort study. PLoS One. 2014;9:e94928.
Article PubMed Google Scholar
Moore LL, Bradlee ML, Singer MR, Splansky GL, Proctor MH, Ellison RC, et al. BMI and waist circumference as predictors of lifetime colon cancer risk in Framingham study adults. Int J Obes Relat Metab Disord. 2004;28:559–67.
Article CAS Google Scholar
Aleksandrova K, Boeing H, Jenab M, Bueno-de-Mesquita HB, Jansen E, Van Duijnhoven FJB, et al. Metabolic syndrome and risks of colon and rectal cancer: the european prospective investigation into cancer and nutrition study. Cancer Prev Res. 2011;4:1873–83.
Article CAS Google Scholar
Talwar P. Manual of assisted reproductive technologies and clinical embryology: Jaypee brothers medical Publisher Pvt. Limited; New Delhi.2014.
Saint-Ramon J-G, Beau C, Ehrsam A. Tubes for conservation biological particle; for use as tool in biological sampling: Google Patents; US 2002O188222A1. 2001.
Smith CA, Want EJ, O’Maille G, Abagyan R, Siuzdak G. LC / MS preprocessing and analysis with xcms. Anal Chem. 2006;78:779–87.
Article CAS Google Scholar
Benton HP, Wong DM, Trauger S a, Siuzdak G. XCMS 2 : processing tandem mass spectrometry data for metabolite identification and structural characterization. Anal Chem. 2008;80:6382–9.
Article CAS PubMed Google Scholar
R Development Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2013. URL http://www.R-project.org/
Google Scholar
Kuhl C, Tautenhahn R, Christoph B, Larson TR, Neumann S. CAMERA: an integrated strategy for compound spectra extraction and annotation of LC/MS data sets. Anal Chem. 2011;84:283–9.
Article PubMed Google Scholar
Stanstrup J, Gerlich M, Dragsted LO, Neumann S. Metabolite profiling and beyond: approaches for the rapid processing and annotation of human blood serum mass spectrometry data. Anal Bioanal Chem. 2013;405:5037–48.
Article CAS Google Scholar
Edmands WMB, Petrick LM, Barupal DK, Scalbert A, Wilson M, Wickliffe J, et al. compMS2Miner : an automatable metabolite identification, visualization and data-sharing R package for high-resolution LC-MS datasets. Anal Chem. 2017;89:3919–28.
Article CAS Google Scholar
Wishart DS, Tzur D, Knox C, Eisner R, Guo AC, Young N, Cheng D, Jewell K, Arndt D, Sawhney S, Fung C. HMDB: the human metabolome database. Nucleic Acids Res. 2007;35(suppl_1):D521–6.
Article CAS PubMed Google Scholar
Smith CA, O’Maille G, Want EJ, Qin C, Trauger SA, Brandon TR, et al. METLIN a metabolite mass spectral database. Proc 9Th Int Congr Ther Drug Monit Clin Toxicol. 2005;27:747–51.
Article CAS Google Scholar
Veselkov KA, Vingara LK, Masson P, Robinette SL, Want E, Li JV, et al. Optimized preprocessing of ultra-performance liquid chromatography/mass spectrometry urinary metabolic profiles for improved information recovery. Anal Chem. 2011;83:5864–72.
Article CAS Google Scholar
Dunn WB, Broadhurst D, Begley P, Zelena E, Francis-McIntyre S, Anderson N, et al. Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nat Protoc. 2011;6:1060–83.
Article CAS Google Scholar
Bolstad BM, Irizarry RA. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19:185–93.
Article CAS Google Scholar
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47.
Article PubMed Google Scholar
Shannon P, Markiel A, Ozier O, Baliga N, Wang J, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–504.
Article CAS PubMed Google Scholar
Broeckling CD, Afsar FA, Neumann S, Prenni JE. RAMClust: A Novel Feature Clustering Method Enables Spectral- Matching-Based Annotation for Metabolomics Data. 2014;
Google Scholar
Tibshirani R. Regression Selection and Shrinkage via the Lasso. J R Stat Soc B. 1996;58:267–88.
Google Scholar
Bach FR, Project-team IW. Bolasso : Model Consistent Lasso Estimation through the Bootstrap. 2008;
Book Google Scholar
Breiman L. Random forests. Mach Learn. 2001;45:5–32.
Article Google Scholar
Broadhurst DI, Kell DB. Statistical strategies for avoiding false discoveries in metabolomics and related experiments. Metabolomics. 2006;2:171–96.
Article CAS Google Scholar
Byrdwell WC. Dual parallel liquid chromatography with dual mass spectrometry (LC2/MS2) for a total lipid analysis. Front Biosci. 2008;13:100–20.
Article CAS Google Scholar
Ogretmen B. Sphingolipid metabolism in cancer signalling and therapy. Nat Rev Cancer. 2017;18:33–50.
Article PubMed Google Scholar
Rezanka T, Sigler K. Odd-numbered very-long-chain fatty acids from the microbial, animal and plant kingdoms. Prog Lipid Res. 2009;48:206–38.
Article CAS Google Scholar
Weir TL, Manter DK, Sheflin AM, Barnett BA, Heuberger AL, Ryan EP. Stool microbiome and metabolome differences between colorectal Cancer patients and healthy adults. PLoS One. 2013;8:e70803.
Article CAS PubMed Google Scholar
Agle SC, Philips P, Martin RCG. Environmental exposures, tumor heterogeneity, and colorectal Cancer outcomes. Curr Colorectal Cancer Rep. 2014;10:189–94.
Article Google Scholar

Download references

Acknowledgements

We gratefully acknowledge the assistance of Agilent Technologies (Santa Clara, CA, USA) for the loan of the high-resolution mass spectrometer used in these analyses. We also thank the subject cases and controls for their participation in EPIC.

Funding

Support for this work was provided by grant P42ES04705 from the National Institute for Environmental Health Sciences (National Institutes of Health) (to S.M.R.) and grant agreement 308610-FP7 from the European Commission (Project Exposomics) (to P.V.).

Availability of data and materials

The datasets generated and/or analysed during the current study are not publicly available since the covariates originated from a third party (EPIC, IARC), but are available from the corresponding author on reasonable request and approval of the third party.

Author information

Authors and Affiliations

School of Public Health, University of California, Berkeley, CA, 94720-7356, USA
Kelsi Perttula, Courtney Schiffman, William M B Edmands, Lauren Petrick, Hasmik Grigoryan, Xiaoming Cai, Sandrine Dudoit & Stephen M Rappaport
California Institute for Quantitative Biosciences, University of California, Berkeley, CA, 94720, USA
Sandrine Dudoit
International Agency for Research on Cancer, Lyon, France
Marc J Gunter
Italian Institute for Genomic Medicine (IIGM), Torino, Italy
Alessio Naccarati, Silvia Polidoro & Paolo Vineis
MRC-PHE Centre for Environment and Health, Imperial College, Place London W2 1PG, Norfolk, UK
Paolo Vineis
Department of Statistics, University of California, Berkeley, CA, 94720, USA
Sandrine Dudoit
Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
Lauren Petrick

Authors

Kelsi Perttula
View author publications
You can also search for this author in PubMed Google Scholar
Courtney Schiffman
View author publications
You can also search for this author in PubMed Google Scholar
William M B Edmands
View author publications
You can also search for this author in PubMed Google Scholar
Lauren Petrick
View author publications
You can also search for this author in PubMed Google Scholar
Hasmik Grigoryan
View author publications
You can also search for this author in PubMed Google Scholar
Xiaoming Cai
View author publications
You can also search for this author in PubMed Google Scholar
Marc J Gunter
View author publications
You can also search for this author in PubMed Google Scholar
Alessio Naccarati
View author publications
You can also search for this author in PubMed Google Scholar
Silvia Polidoro
View author publications
You can also search for this author in PubMed Google Scholar
Sandrine Dudoit
View author publications
You can also search for this author in PubMed Google Scholar
Paolo Vineis
View author publications
You can also search for this author in PubMed Google Scholar
Stephen M Rappaport
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

KP, WBE, PV and SR conceived and designed the study. XC, KP, WBE, and HG developed LCMS methodology. KP and WBE performed experiments and acquired the LCMS data. KP, WBE, LP, and XC analyzed LCMS features. CS, KP, SD, and SR performed the normalization and statistical analysis. PV, MG, AN, SP and MG provided and interpreted the metadata from the EPIC study that were required for statistical analyses. KP, CS, and SR wrote the manuscript. All authors contributed to the final manuscript and approved its content.

Corresponding author

Correspondence to Stephen M Rappaport.

Ethics declarations

Ethics approval and consent to participate

The study was approved by the ethics review board of the International Agency for Research on Cancer (IARC). Informed written consent was provided by all participants at the participating centers; for this study, the center in Turin, Italy.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1:

Additional statistical analyses of the nine selected metabolomic features. Table S1. Covariate associations with the nine selected features; Table S2. Time-to-diagnosis model coefficients and p-values for the nine selected features. (PDF 60 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Perttula, K., Schiffman, C., Edmands, W.M.B. et al. Untargeted lipidomic features associated with colorectal cancer in a prospective cohort. BMC Cancer 18, 996 (2018). https://doi.org/10.1186/s12885-018-4894-4

Download citation

Received: 17 November 2017
Accepted: 03 October 2018
Published: 19 October 2018
DOI: https://doi.org/10.1186/s12885-018-4894-4

Untargeted lipidomic features associated with colorectal cancer in a prospective cohort

Abstract

Background

Methods

Results

Conclusions

Background

Methods

Study population

Chemicals

Sample processing

Mass spectrometry

Data processing

Statistical methods: Variable selection

Results

Features that discriminate for CRC

Potentially causal and reactive biomarkers

Discussion

Possible annotations

Limitations

Conclusions

Abbreviations

References

Acknowledgements

Funding

Availability of data and materials

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Publisher’s Note

Additional file

Additional file 1:

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Cancer

Contact us