Computational models applied to metabolomics data hints at the relevance of glutamine metabolism in breast cancer

Background Metabolomics has a great potential in the development of new biomarkers in cancer and it has experiment recent technical advances. Methods In this study, metabolomics and gene expression data from 67 localized (stage I to IIIB) breast cancer tumor samples were analyzed, using (1) probabilistic graphical models to define associations using quantitative data without other a priori information; and (2) Flux Balance Analysis and flux activities to characterize differences in metabolic pathways. Results On the one hand, both analyses highlighted the importance of glutamine in breast cancer. Moreover, cell experiments showed that treating breast cancer cells with drugs targeting glutamine metabolism significantly affects cell viability. On the other hand, these computational methods suggested some hypotheses and have demonstrated their utility in the analysis of metabolomics data and in associating metabolomics with patient’s clinical outcome. Conclusions Computational analyses applied to metabolomics data suggested that glutamine metabolism is a relevant process in breast cancer. Cell experiments confirmed this hypothesis. In addition, these computational analyses allow associating metabolomics data with patient prognosis.


Background
Breast cancer is one of the most common malignancies, with 266,120 estimated new cases and 40,920 estimated deaths in the United States in 2018 [1]. In clinical practice, the expression of hormonal receptors and HER2 allows the classification of this disease into three groups: hormonal receptor-positive (ER+), HER2+ and triple negative (TNBC).
Metabolomics, a technique focused in the holistic study of the metabolites present in a biological system, is considered the most recent -omics. It consists of measuring the entire set of metabolites present in a biological sample [2]. The most common techniques in metabolomics experiments are mass spectrometry-related methods, which are based on the mass/charge relationships of each metabolite or its fragments [3]. Recent advances in this technique allow the measurement of thousands of metabolites from minimal amounts of biological samples [3,4]. Therefore, metabolomics is a promising tool for the development of new biomarkers [5]. Metabolomics has emerged in the field of oncology as a very informative technique for characterizing metabolic profiles associated with oncogenotypes, disease progression, and therapeutic targets [6]. For instance, in early breast cancer, a previous study identified an association between levels of choline, glycine, and lactate and histopathological grades and tumor size [7,8].
Transcriptomics and metabolomics data offer complementary information. We used two different methods to merge metabolomics and gene expression data in breast cancer. In previous studies, we used probabilistic graphical models (PGMs) to study differences between breast tumor subtypes and to characterize muscle-invasive bladder cancer at a functional level using proteomics data [9][10][11]. Flux Balance Analysis (FBA), however, is a method that has been widely used to study biochemical networks [12]. FBA predicts the growth rate or the rate of production of a given metabolite [13], and it has previously been used to characterize breast cancer cell responses against drugs targeting metabolism [14]. In this study, flux activities were proposed as a feasible method to compare flux patterns in metabolic pathways.
Glutamine has a relevant role in tumor metabolism. The entrance of glutamine in the tricarboxylic acid cycle (TCA) generates lactate, a process known as glutaminolysis. The metabolism of glutamine serves to maintain the availability of non-essential aminoacids and to maintain TCA intermediates while NADH is generating [15]. Glutamine is necessary to cellular proliferation and its metabolism is regulated by the levels of MYC oncogene [16,17].
In the present study, metabolomics and gene expression data from 67 fresh tissue samples [18] were analyzed through PGMs and FBA. Our aim was to find associations between metabolomics and gene expression data and the characterization of breast cancer from a metabolomics point of view.

Patients included in the study
Metabolomics and gene expression data from 67 fresh tumor tissue samples originally analyzed by Terunuma et al. were included in this study. Untargeted metabolomic quantification was performed by Metabolon Inc. Samples were prepared using the automated MicroLab STAR system from Hamilton company and massspectrometry experiments were done in a Waters ACQUITY UPLC and a Thermo-Finnigan linear trap quadrupole mass spectrometer and in a Thermo-Finnigan Trace DSQ fast-scanning quadrupole mass spectrometer [18].

Preprocessing of gene expression and metabolomics data
Metabolomics data contains information about 536 metabolites. Log2 was calculated. As quality criteria, data were filtered to include detectable measurements in at least 75% of the samples. Missing values were imputed to a normal distribution using Perseus software [19]. After quality control, 237 metabolites were considered for subsequent analyses.
In terms of gene expression data, the 2000 most variable genes, i.e., those genes with the highest standard deviation, were chosen to build the PGM. This data was from an Affymetrix array and they are available in Gene Expression Omnibus Database under the identifier GSE37751.
Probabilistic graphical models and gene ontology analyses As previously described [10,11,14,20], PGMs compatible with high dimensional data were used, using correlation as associative criteria. PGMs were built using metabolomics, gene expression or flux activity data without any a priori information. The grapHD package [21] and R v3.2.5 [22] were employed to build the PGMs. In this case, PGMs use gene expression or metabolomics quantification without any a priori information, making connections based on correlation as associative method. PGM construction is based on two sequential steps: first, finding the spanning tree with the maximum likelihood and, second, reduce the graph edges based on the reduction of Bayesian Information Criterion (BIC) and the preservation of decomposability [23]. The visualization of the resulting network was done using Cytoscape software [24] (Sup Files 1-3). The resulting networks were divided into branches and ontology analyses were done to assign a majority function/metabolic pathway to each branch, defining in this way different functional nodes in the networks. In the case of genes, gene ontology analyses were performed using the DAVID web tool with "Homo sapiens" as background and GOTERM, KEGG and Biocarta selected as categories [25]. In the case of metabolites, the Integrated Molecular Pathway Level Analysis (IMPaLA) web tool was used to assign a main metabolic pathway to each branch [26].
Once the functional structure was defined, functional node activities were calculated in order to make comparisons between groups, as previously described [10,11,14,20]. Briefly, each functional node activity was calculated as the mean of the expression/quantity of genes/ metabolites of each node that are related to the main node function/metabolic pathway.

Flux balance analysis and flux activities
FBA is a method that allows the estimation of tumor growth rate and the model of the flow of metabolites in a metabolic network. FBA was performed using the library COBRA Toolbox v2.0 [27] available for MATLAB. FBA was calculated using the whole human metabolic reconstruction Recon2 [28]. This metabolic reconstruction includes 2191 genes collected into the Gene Protein Reaction rules (GPRs), 5063 metabolites and 7440 reactions. GPRs represent the relationships between genes and metabolic reactions and they are included into the model as Boolean expressions. GPRs were solved as described in previous studies [11,14], using a modification of the Barker et al. algorithm [29], which were incorporated into the model by a modified E-flux method [14,30]. Briefly, the "OR" operators were solved as the sum and the "AND" operators were solved as the minimum. Then, the GPR data were normalized using the Max-min function and introduced into the model as the reaction bounds. As the objective function, the biomass reaction proposed in the Recon2 was used as representative of tumor growth. This biomass reaction was based on experimental measurements of leukemia cells. The 7440 reactions are grouped into 101 metabolic pathways. The formulated mathematical problem was solved by linear programming. Finally, an estimation of tumor growth rate and a value of flux for each reaction in the model were obtained.
Flux activities were previously proposed as a measurement to compare differences at the metabolic pathway level [14]. Briefly, they were calculated as the sum of the fluxes of the reactions included in each pathway defined in Recon2.

Statistical analyses
The statistical analyses were performed with GraphPad Prism v6. Predictor signatures were built with the BRB Array Tool from Dr. Richard Simon's team [31]. Variables related to overall survival were ranked on the basis of their p-values for the long-rank test, using the overall survival as endpoint. Leave-one-out cross-validation was used to evaluate the predictive accuracy of the profiles. The cutoff point was established a priori and to test the statistical significance, the p-value of the log-rank test statistic for the risk groups was evaluated using 1000 random permutations. Then, the independence between the defined predictors and clinical parameters was checked using Cox regression models. All p-values are two-sided and are considered statistically significant under 0.05.

Cell viability assays
Dose-response curves were designed for AOA and GPNA. As the preparation of GPNA needs an acid medium, HEPES (50 mM) was added to buffer the medium. About 5000 cells were seeded in each well in 96-well plates and after 24 h, drugs were added. After an incubation of 72 h, cell viability was determined using CellTiter 96 AQueous One Solution Cell Proliferation Assay (Promega) kit and absorbance was quantified on a microplate reader (TECAN). As a control untreated cells were used and all the experiments were performed by triplicate.

Patient characteristics
With the aim of study the relationships between metabolomics, gene expression, and FBA results, metabolomics and gene expression data, analyzed by mass-spectrometry and microarrays GeneChip Human Gene 1.0 ST (Affymetrix) respectively and published by Terunuma et al., were analyzed [18].
A total of 67 tumor fresh tissue samples from patients with breast cancer recruited in Baltimore hospital were studied. This patient cohort comprises 33 ER+ and 34 ER-(of which 14 were also TNBC). The median followup was 50 months, and 31 deaths occurred during this time. No significant differences regarding overall survival (OS) were observed between patients with ER+ or ERtumors. Patient characteristics are shown in Table 1.

Analysis of metabolomics data
After Kaplan-Meier analysis, 29 metabolites were found related to OS (p < 0.05) (Sup Table 1).
Metabolomics data without using any a priori information were analyzed through PGM. Metabolomics database, including information about 536 metabolites, was reduced to 237 metabolites due to quality criteria (detected in at least 75% of the samples). A main metabolic pathway was assigned to each functional node of the resulting network using IMPaLA. IMPaLA is a tool that allows ontology analyses based on metabolic pathways and metabolites instead of other platforms such as DA-VID which are based on genes and biological processes. Strikingly, this network had a functional structure, grouping the metabolites into metabolic pathways as it has been previously shown for gene and protein PGMs [10,11,14,20]. This is relevant because this type of networks is built based on the gene expression data or, in this case, the metabolomics data, without any a priori information, meaning that those metabolites that are involved in a concrete metabolic pathway had a correlated quantification profile across the series. Five functional nodes were defined, each with a different overrepresented metabolic pathway (Fig. 2).
The activity of each functional node was calculated as previously described and comparisons between ER+ and ER-were done [10,11,14,20]. Significant differences were found between ER+ and ER-tumors regarding lipid and purine metabolism (p < 0.05) (S1 Fig).
Moreover, the lipid metabolism functional node showed prognostic value in this cohort in a Kaplan-Meier analysis (p-value = 0.008, FDR = 0.04). Then, a predictor signature was built (p = 0.045, HR = 0.48, 50:50%) (Fig. 3). However, a multivariate analysis showed that the predictor do not add additional prognostic information to that provided by clinical features (S2 Table).

Analyses combining gene expression with metabolomics data
On the other hand, a network combining metabolomics and gene expression data was built. Due to the differences between both kinds of data, most of the metabolites were grouped together. However, some metabolites were integrated into gene branches (Fig. 4). This combined network was then functionally characterized based on the majority function of the genes contained in each branch. The resulting network had eleven functional nodes and a twelfth branch that include the majority of the metabolites (Fig. 4).
Once the main functions were assigned, a literature review was performed to study the relationship between metabolites included in the gene functional nodes and the main function of each functional node. We found out that a relationship with functional nodes had been previously described for 4 of 20 metabolites: succinate, cytidine, histamine and 1,2-propanediol. The relationships between metabolites and their node function are shown in Table 2.

Flux balance analysis and flux activities
FBA and flux activities were calculated as previously described [14]. Briefly, gene expression data from 67 tumor samples were used, GPR rules were solved and the normalized values were introduced into the metabolic model using modified E-flux algorithm [11,30]. Finally, FBA was calculated using a biomass reaction representative of tumor growth. No significant differences were found in the tumor growth rate between ER+ and ERtumors (S2 Fig). Flux activities showed significant differences between ER+ and ER-in glycerophospholipid metabolism, phosphatidyl inositol metabolism, urea cycle, propanoate  In addition, the combination of glutamate metabolism (the pathway that includes the glutamine) and alanine and aspartate metabolism flux activities showed prognostic value in this cohort (p-value = 0.024, HR = 0.41, 50:50%) (Fig. 5). A multivariate analysis showed that this flux activity-based predictor provides prognostic information independently from clinical data (S3 Table).

PGM analysis combining flux activities with metabolomics data
Flux activities were calculated for each metabolic pathway defined in the Recon2. Then, using flux activities and metabolomics data, a new PGM was built to study association between both types of data. Interestingly, both types of data appeared mixed in the network; with the peculiarity that flux activities appeared usually at the periphery of the network (Fig. 6).
The resulting network was split into several branches to study the relationship of the metabolites with the flux activities included in each branch (Fig. 6). Coherence between both types of data was shown by the PGM, associating flux activities and metabolites related to these flux activities in the same branch. For instance, branch 1 includes glycolysis flux activity and three metabolites previously related to glycolysis (S4 Table). Regarding vitamins and cofactors, it was not possible make comparisons because the IMPaLA label for this category is "Vitamin and co-factor metabolism" and Recon2 labels differentiate between the various vitamins, labeling them as "Vitamin B6 metabolism", "Vitamin A metabolism", etc.

Cell viability assays using drugs targeting glutamine metabolism
As the computational analyses pointed out the relevance of glutamine and its metabolism in breast cancer (glutamine and glutamate metabolism appeared recurrently in the analyses), cell viability assays employing two drugs targeting glutaminolysis (aminooxyacetic acid [AOA], and L-Glutamic acid γ-(p-nitroanilide) hydrochloride [GPNA]) were performed. Dose-response curves of these two drugs confirm that targeting glutamine metabolism affected cell viability (Fig. 7).
In AOA-treated cell lines, the IC 50 did not show any differential response between breast cancer subtypes. However, in GPNA-treated cells, IC 50 for ER+ cell lines were lower than IC 50 for TNBC cells (Table 3).   [18,32]. Based on this dataset, we applied PGMs for the first time in metabolomics data from tumor samples and also in metabolomics data combined with gene expression data and flux activities, with the aim of confirming known associations and finding new ones.
First, we evaluated whether metabolomics data were related to OS in patients with breast cancer. An OS predictive signature was built that included the expression values of glutamine, deoxycarnitine, butyrylcarnitine, glycerophosphorylcholine and 2-hydroxypalmitate [15]. The first four metabolites have been previously related to survival in breast cancer [33,34]. However, to our knowledge, this is the first report associating 2hydroxypalmitate with cancer survival. Additionally, in the previous study by Terunuma et al., 2hydroxyglutarate was associated with poor prognosis in patients with breast cancer [18]. 2-hydroxyglutarate is a glutamine intermediate in the tricarboxylic acid cycle, involved in the conversion of glutamine into lactate, a process known as glutaminolysis [15]. The negative sign in the predictor of glutamine points a protective effect (the more glutamine the better prognosis). An increased  presence of glutamine could indicate that it has not been introduced into the Krebs cycle and transformed into lactate, a fact associated with a more aggressive phenotype and a worst prognosis.
A metabolite network using metabolomics data was built using PGMs. PGMs are based on expression data, or quantification data in the case of metabolomics, and they do not need any additional information. The output of this analysis is a network that reflects the correlations between genes, proteins or metabolites. On the other hand, IMPaLA assigned a dominant metabolic function to each resulting node. In previous studies, we demonstrated that PGMs are useful for functionally characterizing gene or protein networks [10,11,20]. However, to our knowledge, this is the first time a PGM has been applied to metabolomics data from tumor samples. Just as observed in genes or proteins, metabolites are grouped into metabolic pathways, allowing the characterization of differences in metabolic pathways between ER+ and ER-tumors. For example, both lipid metabolism and purine metabolism node activities were higher in ER-tumors. Although there has not been described a relationship between lipids and breast cancer subtypes, it was described that ER-tumors usually overexpress genes related to lipid metabolism [35]. Moreover, the activity of the lipid metabolism node had prognostic value. No relationship between purine metabolism and breast cancer has previously been defined.
On the other hand, the network combining gene expression data and metabolomics data grouped most of the metabolites into an isolated branch. Yet, some metabolites were included into gene branches. We found that four out of twenty metabolites showed a previously reported relationship with the main function of the gene functional node in which they were included. Succinate and cytidine were located in the immune response functional node. Succinate acts as an inflammation activation signal, inducing IL-1β cytokine production through hypoxia-inducible factor 1 [36]. In addition, succinate increases dendritic cell capability to act as antigen-presenting cells, prompting an adaptive immune response [37]. Regarding cytidine, Wachowska et al. described that 5-aza-2′-deoxycytidine modulates the levels of major histocompatibility complex class I molecules in tumor cells, induces P1A antigen and has immunomodulatory activity when combined with photodynamic therapy [38].
Both histamine and 1,2-propanediol appeared to be related to the angiogenesis functional node. Histamine is known to promote angiogenesis through vascular epithelial growth factor [39]. On the other hand, sulfoquinovosyl acylpropanediol, an 1,2-propanediol derivate, inhibits angiogenesis in murine models with pulmonary carcinoma [40].
The remaining sixteen metabolites require an in-depth study to establish associations with their respective functional nodes. These results support the potential of PGMs as a tool to generate hypotheses without the need of a priori knowledge.
FBA was used to model metabolism using gene expression data. Although FBA-predicted biomass did not show significant differences between ER+ and ER-tumors, differences in flux activities were shown between both subtypes. Some of these activities were also related to prognosis, such as "Glutamine metabolism", which agrees with the results obtained from the metabolomics data, including glutamine in the metabolite signature capable of predicting OS. These results highlighted the  relevance of glutamine metabolism in breast cancer, suggesting the utility of drugs targeting this pathway such as GPNA, which it has already been described as affecting proliferation in lung cancer cells [41]. Strikingly, cell viability experiments using GPNA and AOA, two drugs targeting glutamine metabolism, showed a decreased in cell viability, confirming the relevant role of this process in breast cancer. Despite the highest levels of glutaminerelated enzymes described in TNBC and HER2 tumors comparing with luminal tumors, dose-response curves did not show any differential response between ER+ and TNBC breast cancer cells in the case of AOA [42,43]. In addition, ER+ cells seem to be more sensible to GPNA. On the other hand, AOA has been successfully tested in ER+ and ER-breast cancer xenograft models [44].
With the aim of associating metabolomics and FBA results, flux activities and metabolomics data were combined to form a new network. As opposed to gene and metabolite data, metabolomics data and flux activities combined well in the network. Interestingly, flux activities are dead-end nodes, perhaps due to the fact that they are by definition a final summary of each pathway. IMPaLA assigned a main metabolic pathway to resulting branches; thus, it was possible to know how many metabolites were related to flux activity in each branch. In most cases with available information, there was coherence between metabolites included in the branch and its flux activity. This validates FBA and flux activities, both based on gene expression, as a method of simulating metabolism.
Our study has some limitations. The limited number of samples leads us to consider the results as preliminary, and validation in an independent cohort is needed. Additionally, our results are difficult to place in the current clinical landscape, given that tumors in the original series had not been assessed for HER2 expression. On the other hand, evolving techniques currently allow the detection of more metabolites, which would permit a more thorough analysis.

Conclusions
Metabolomics is postulated as a booming technique for the biomarker search in cancer. Additionally, PGMs reveal their utility in the analysis of metabolomics data from a functional point of view, not only metabolomics data alone, but also in combination with flux or gene expression data. Therefore, PGM is postulated as a method to propose new hypotheses in the metabolomics field. We also found that it is possible to associate metabolomics data with clinical outcomes and to build prognostic signatures based on metabolomics data. Finally, these computational analyses suggested a main role of glutamine metabolism in breast cancer, a fact that was experimentally validated.
Additional file 1: File S1: Cytoscape file of metabolomics network.
Additional file 2: File S2: Cytoscape file of the network combining gene expression and metabolomics data.
Additional file 3: File S3: Cytoscape file of the network combining metabolomics and flux activities.
Additional file 5: Table S2: Multivariate Cox regression model comparing OS predictor based on metabolomics data. T = tumor stage, N = lymph node status, G = tumor grade.
Additional file 6: Table S3: Multivariate Cox regression model comparing OS predictor based on node activity of lipid metabolism. T = tumor stage, N = lymph node status, G = tumor grade. Additional file 7: Table S4: Multivariate Cox regression model comparing OS predictor based on flux activities. T = tumor stage, N = lymph node status, G = tumor grade.
Additional file 8: Table S5: Metabolites associated with flux activity of each network branch.
Additional file 9: Fig S1: Node activities from the metabolic network. Fig S2: Tumor growth rate predicted using FBA for ER+ and ER-tumors. Authors' contributions All the authors have directly participated in the preparation of this manuscript and have approved the final version submitted. JMA, MD-A, HN and PM contributed the probabilistic graphical models. LT-F, AG-P, G-PV, and AZ-M performed the statistical analyses, the graphical model interpretation and the ontology analyses. LT-F, AG-P, JAFV, PZ and EE conceived of the study and participated in its design and interpretation. MD-A and LT-F performed the Flux Balance Analysis. EL-C did the cell viability experiments. LT-F drafted the manuscript. AG-P, JAFV, and EE supported the manuscript drafting. AG-P and JAFV coordinated the study.

Funding
This study was supported by Instituto de Salud Carlos III, Spanish Economy and Competitiveness Ministry, Spain and co-funded by the FEDER program, "Una forma de hacer Europa" (PI15/01310). LT-F is supported by the Spanish Economy and Competitiveness Ministry (DI-15-07614). GP-V and EL-C are supported by Consejería de Educación, Juventud y Deporte of Comunidad de Madrid (IND2017/BMD7783). The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript.
Availability of data and materials Data used in this work is from Terunuma et al.

Consent for publication
Not applicable.