Serum microRNA signatures and metabolomics have high diagnostic value in gastric cancer

Background Many novel diagnostic biomarkers have been developed for gastric cancer (GC) recently. We chose two methods with high diagnostic value, the detection of serum microRNAs and metabolomics based on gas chromatography/mass spectrometry (GC/MS), and aimed to establish appropriate diagnostic models. Methods We reviewed the diagnostic accuracies of all microRNAs identified by previous diagnostic tests. The diagnostic value of appropriate microRNAs and their combinations was then validated. We included 80 patients with GC and 82 healthy controls (HCs) and detected the expression of the microRNAs. GC/MS analysis was conducted, and we used three multivariate statistical analyses to establish diagnostic models. The concentrations of carcinoembryonic antigen (CEA) and carbohydrate antigen 19–9 (CA19–9) were measured for comparison with the novel models. Results Sixty-seven published studies and 70 microRNAs were finally included in the systematic review. MiR-18a, miR-19a, miR-21, miR-92a, miR-199a and miR-421 were chosen for further validation of their diagnostic efficiencies. Five of these microRNAs were expressed at significantly different levels in GC patients. The combination of miR-19a and miR-92a had the highest area under the curve (AUC) at 0.850 with a sensitivity of 91.3% and a specificity of 61.0%. The GC/MS analysis showed excellent diagnostic value, with an AUC reaching 1.0. Conclusion MicroRNAs and GC/MS analysis have good potential as new diagnostic methods in view of their high diagnostic value compared with traditional biomarkers. Electronic supplementary material The online version of this article (10.1186/s12885-018-4343-4) contains supplementary material, which is available to authorized users.

MicroRNAs are small non-protein-coding RNAs that regulate target gene expression by binding to the 3′ untranslated region of the target transcripts [8]. Thousands of microRNAs have been discovered over the past decade, and quite a few have shown potential for the diagnosis of GC. Nevertheless, the diagnostic efficiencies of the reported circulating microRNAs are not consistent among studies. It is thus necessary to summarize the diagnostic value of these microRNAs via a systematic review. We conducted such a review and aimed to overcome the deficiencies of previous systematic reviews and meta-analyses, such as a small number of included articles, a focus on a single microRNA [9], or a lack of information on each individual microRNA [10][11][12][13]. We then chose six microRNAs with high Youden indexes or high area under the curve (AUC) values of the receiver operating characteristic (ROC) curve to validate their diagnostic value and establish a diagnostic panel.
Metabolomics is defined as the quantitative measurement of low-molecular-weight metabolites in an organism at a specified time under specific environmental conditions [14]. GC/MS, one of the metabolomic techniques, gives robust results and is widely used in metabolite identification because of its peak resolution, high sensitivity, and reproducibility [15,16]. Several studies have reported its high diagnostic value for GC, with AUC values usually above 0.90 [17]. Because GC/MS produces high-throughput experimental data, its results are usually processed by multivariate statistical analysis, including principal component analysis (PCA), partial least squares-discriminant analysis (PLS-DA), and orthogonal partial least squares-discriminant analysis (OPLS-DA). We further validated the diagnostic value of metabolomics and compared these three frequently used statistical methods.

Study design
First, we reviewed the diagnostic accuracies of microRNAs reported in previous studies. We searched several relevant databases, including PubMed, Embase, and the Chinese Biomedical Literature Database (CBM), up to July 26, 2017. The search strategy was ("stomach neoplasms" [Mesh] OR "gastric cancer" OR "stomach cancer") AND (miRNA OR microRNA OR miR) AND (blood OR serum OR plasma OR circulating) AND (diagnosis OR diagnostic OR diagnose). No language restrictions were applied in the search. The reference lists of the articles were searched manually for additional publications [18].
Then, we selected the microRNAs with high Youden indexes and high AUC values to establish a diagnostic model according to the results of the systematic review. The serum specimens from 80 patients with GC and 82 healthy controls (HCs) were obtained to detect the microRNA levels using quantitative reverse-transcription polymerase chain reaction (qRT-PCR).
Next, we selected 25 GC patients and 30 HCs completely at random from the cohort mentioned above and used GC/MS to profile their metabolomic signatures.
Finally, the diagnostic value was compared among the new models and the traditional tumor biomarkers, carcinoembryonic antigen (CEA) and carbohydrate antigen 19-9 (CA19-9). An overview of the study design is illustrated in Fig. 1.

Inclusion and exclusion criteria of the literature
Studies were included if they met the following inclusion criteria: (1) studies regarding the diagnostic value of microRNAs in GC; (2) blood specimens; and (3)

Data extraction
Data were extracted independently by two reviewers from all of the included articles: (1) basic characteristics of the studies, including the first author, year of publication, country of publication, ethnicity, sample size, mean or median age, gender, type of specimens (serum or plasma), target microRNAs, and reference control RNA; and (2) diagnostic information of the microRNAs, including the sensitivity, specificity, AUC and expression variation.

Patients and specimens
We included 80 patients with GC and 82 HCs from Zhongshan Hospital, Fudan University, between May 2015 and September 2015. The GC patients were all definitively diagnosed by endoscopic biopsy. Exclusion criteria were a history of other malignant tumors, a surgical operation, radiotherapy or chemotherapy. Healthy individuals were identified by clinical manifestations, histories of diseases and results of blood tests. The samples were centrifuged for 10 min at 820×g and 4°C to remove residual cell debris, and the supernatants were immediately stored at −80°C until further analyses. The serum concentrations of CEA and CA19-9 were measured with an electro-chemiluminescence immunoassay. Approval for the study was given by the Ethics Committee of Zhongshan Hospital of Fudan University, Shanghai. All GC patients and control subjects provided written informed consent before enrollment in this study.

RNA extraction and reverse transcription
Each 200-μl serum sample was spiked with 2 μl of 25 fmol synthetic cel-miR-39 (Tiangen, Beijing, China) as the external reference. Total RNA enriched for small RNAs was then isolated from the serum with the miRcute microRNA Isolation Kit (Tiangen, Beijing, China) according to a modified version of the manufacturer's protocol [19]. To determine the purity and concentration of the extracted RNA, we measured its optical density at 260 and 280 nm with a NanoDrop spectrophotometer (NanoDrop, Wilmington, DE, USA).
The extracted microRNA was polyadenylated with poly(A) polymerase in a 20-μl reaction. Then 6 μl of the poly(A) reaction solution was reverse transcribed to cDNA in another 20-μl reaction with the miRcute microRNA First-strand cDNA Synthesis Kit (Tiangen, Beijing, China) following the manufacturer's instructions. Reverse transcription was run in triplicate.

Quantitative real-time PCR
Amplification was performed using the miRcute microRNA qPCR Detection Kit (Tiangen, Beijing, China) on an ABI PRISM 7500 Sequence Detection System (Applied Biosystems, Foster City, CA, USA). Each qPCR reaction contained diluted cDNA, 2× miRcute microRNA premix (with SYBR and ROX), the manufacturer-provided microRNA-specific forward primer, and a universal reverse primer in a total volume of 20 μl. The qPCR parameters were predenaturation at 94°C for 2 min, followed by 45 cycles of 94°C for 20 s, annealing at 60°C for 34 s, and extension at 72°C for 30 s. A melting curve analysis was performed at the end of the run to confirm the specificity of the target PCR product.
The relative expression of the microRNAs was calculated as $\log_{10}\left(2^{-\Delta CT}\right)$, where $\Delta CT$ is the CT value of the microRNA of interest minus the CT value of cel-miR-39 [19].
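The calculation can be sketched in a few lines of Python. This is only an illustration: the function name and the example CT values are assumptions and are not taken from the study.

```python
import math


def relative_expression(ct_target: float, ct_spike_in: float) -> float:
    """Return log10(2^(-dCT)) for one sample.

    ct_target   -- CT value of the microRNA of interest
    ct_spike_in -- CT value of the cel-miR-39 external reference
    """
    delta_ct = ct_target - ct_spike_in
    return math.log10(2.0 ** (-delta_ct))


# Hypothetical CT values, for illustration only.
later_than_reference = relative_expression(27.0, 25.0)  # target amplifies two cycles later
same_as_reference = relative_expression(25.0, 25.0)     # identical CT values
print(later_than_reference, same_as_reference)
```

On this scale, each one-cycle decrease in ΔCT raises the value by log10(2), about 0.301, so a microRNA that crosses the threshold earlier relative to the spike-in gets a higher relative expression.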

Specimen processing for metabolomics
For the GC/MS analysis, 200 μl of each serum sample was transferred into a glass centrifuge tube. Each sample was spiked with 200 μl of 2-chlorophenylalanine (0.3 g/L) as an internal standard and 600 μl of methanol. The mixture was vortexed for 30 s, incubated for 10 min at −20°C and then centrifuged for 15 min at 12000×g and 4°C. An 800-μl volume of supernatant was transferred into an ampoule bottle and evaporated to dryness under a stream of nitrogen gas at 50°C for around 30 min. Subsequently, 200 μl of a methoxyamine pyridine solution (15 g/L) was added to the ampoule bottle. The mixture was vortexed for 2 min and incubated for 60 min at 37°C. Next, we added 200 μl of bis-(trimethylsilyl)-trifluoroacetamide (BSTFA) plus 1% trimethylchlorosilane (TMCS), and the mixture was vortexed for 2 min and incubated for 30 min at 100°C. The methanol, 2-chlorophenylalanine, methoxyamine and pyridine were bought from Aladdin (Shanghai, China). The BSTFA with 1% TMCS was bought from Sigma-Aldrich (St. Louis, MO, USA). All samples were prepared in duplicate.

GC/MS analysis
The GC/MS analysis was carried out on an Agilent 6980 GC system equipped with a fused-silica capillary column with a 0.25-μm HP-5MS stationary phase (Agilent, Shanghai, China). We used the same operational methods as our previous studies [20].

Statistical analyses
The statistical analyses were conducted with Stata 12.0 (StataCorp LP, College Station, TX, USA), SIMCA-P 13.0 (Umetrics AB, Umea, Vasterbotten, Sweden) and R software 3.3.3 (R Foundation for Statistical Computing, Vienna, Austria). A P value less than 0.05 was considered statistically significant.
Meta-analysis methods for diagnostic tests were used to assess the value of the individual microRNAs for diagnosing GC, using the sensitivity, specificity and AUC of the summary receiver operating characteristic (SROC) curve. Deeks' funnel plot was adopted to evaluate publication bias. A power analysis was used to determine the sample size of the GC cases and controls in the microRNA validation phase. The Wilcoxon-Mann-Whitney test and Student's t-test were used to compare the patients with the HCs with respect to the expression of the microRNAs and the concentrations of CEA and CA19-9. The diagnostic efficiencies of the microRNAs were assessed with the sensitivity, specificity and AUC of the ROC curve. A logistic regression was utilized to build an appropriate diagnostic model.
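To make these measures concrete, the following is a minimal pure-Python sketch of how an AUC and a Youden-index cut-off can be computed from marker values in cases and controls. It is not the Stata or R code used in the study. The function names, the rule that a value at or above the cut-off counts as positive, and the example values are all assumptions made for illustration.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the probability that a random case scores higher than a
    random control, with ties counted as one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def youden_cutoff(case_scores, control_scores):
    """Return (J, cutoff, sensitivity, specificity) for the cut-off that
    maximizes the Youden index J = sensitivity + specificity - 1.

    Assumption: a sample is called positive when score >= cutoff.
    """
    best = None
    for cutoff in sorted(set(case_scores) | set(control_scores)):
        sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
        specificity = sum(s < cutoff for s in control_scores) / len(control_scores)
        j = sensitivity + specificity - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best


# Hypothetical marker values, for illustration only.
cases = [0.9, 0.8, 0.7, 0.6, 0.3]
controls = [0.5, 0.2, 0.1, 0.05]
print(roc_auc(cases, controls))
print(youden_cutoff(cases, controls))
```

For a combination of markers, the same two functions could be applied to the predicted probabilities of a fitted logistic regression instead of to a single marker value.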
The metabolomic data were normalized with the "XCMS" package in R and arranged into a two-dimensional matrix containing the mass-to-charge ratio (MZ), retention time (RT) and peak intensity. SIMCA-P software was used to perform the multivariate data analyses, including PCA, PLS-DA, and OPLS-DA. When more than one component was extracted, a logistic regression was used to find the best diagnostic model among combinations of the components. The metabolites were identified by RT and MZ against the National Institute of Standards and Technology (NIST) mass spectral library [20]. We screened for significantly different metabolites using the variable importance in the projection (VIP) value (> 1) of the OPLS-DA model and the P value (< 0.001) of Student's t-test on the fold change between the patients and the HCs.
Table note: U, upregulated expression in the GC patients versus the control group; D, downregulated expression; N, no significant difference. The data on the sensitivity, specificity and AUC were obtained via the meta-analysis when the number of included articles was more than one. Abbreviations: GC, gastric cancer; AUC, area under the curve; NA, not available.

Study selection and literature characteristics
The initial search returned a total of 478 records, of which 146 were from PubMed, 249 from Embase, and 83 from CBM. We removed 156 duplicates, 249 irrelevant studies and six articles that failed to provide enough diagnostic information. Sixty-seven articles were finally included in this systematic review, with a total of 5261 GC patients and 4386 healthy controls (Additional file 1: Table S1 and Additional file 2: Table S2).

Diagnostic value of microRNAs in the literature
Seventy microRNAs were mentioned in the included articles, of which 39 were each studied in only a single article. We performed meta-analyses to summarize the diagnostic value of the other 31 microRNAs. The details regarding each microRNA are displayed in Table 1.

Publication bias
Publication bias was assessed with a Deeks' funnel plot (Additional file 3: Figure S1), and the P value of Deeks' test was 0.24. Therefore, there was no evidence showing that publication bias existed.

Study population
The clinical and pathological features of the patients and HCs are presented in Table 2. Age differed significantly between the GC patients and the HCs. We therefore performed an analysis of covariance. The results suggested that age was not correlated with the expression of the microRNAs, the scores of the metabolomic components, or the concentrations of CEA and CA19-9.
Table note: TNM, tumor-node-metastasis.

Expression of microRNAs
MiR-18a, miR-19a, miR-21, miR-92a, miR-199a and miR-421 were chosen in view of their high diagnostic efficiencies in previous studies. The results of the qRT-PCR showed that the serum levels of all of these microRNAs except miR-421 were significantly higher in the GC patients than in the HCs (Additional file 4: Table S3 and Fig. 2). No significant difference in miR-421 expression was observed between the patients and the HCs.

Diagnostic models established using microRNAs
We calculated the sensitivity, specificity and AUC value of each microRNA and of their combinations at the optimal cut-off value to find the appropriate diagnostic model (Table 3). The combination of miR-19a and miR-92a had the highest AUC value at 0.850, with a sensitivity of 91.3% and a specificity of 61.0%. The cut-off value of the

Discrepant metabolites and total ion chromatogram
A total of 1118 features were extracted in GC/MS analysis. We found 25 significantly different metabolites (Additional file 5: Table S4). The retention time in the total ion chromatograms was stable with no drift in all of the peaks, which implied that the results were credible.

Diagnostic models established using metabolomics
We extracted eleven principal components in the PCA model, seven of which had eigenvalues greater than 1.0. We calculated the diagnostic efficiencies of models fitted with one to eleven principal components. When more than six principal components were included, the AUC value reached 1.0. Five components were extracted in the PLS-DA model, and the AUC values were all higher than those of the PCA model with the same number of components. Only one component was extracted in the OPLS-DA model, and the AUC value was 1.0. More details of the diagnostic information from the three statistical methods are presented in Table 4 and Fig. 3.

Diagnostic value of traditional tumor biomarkers
The CEA concentration in GC patients was significantly higher than that in HCs (Wilcoxon-Mann-Whitney test, P < 0.001). The median concentrations in the patients and HCs were 2.6 (range, 0.5-302.4) and 1.3 (range, 0.3-4.2) μg/L, respectively. For CEA, the sensitivity was 45.0% and the specificity was 95.1%, with an AUC of 0.763 (95% CI = 0.686-0.839), when the cut-off value was 2.85 μg/L. When the cut-off value was set at 5 μg/L, the traditional upper bound for healthy people, the sensitivity was 22.5% and the specificity was 100%.
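The trade-off between the two cut-off values can be illustrated with a short Python sketch. The concentrations below are hypothetical and are not the study data. The function name and the rule that a value strictly above the cut-off counts as positive are likewise assumptions.

```python
def sensitivity_specificity(case_values, control_values, cutoff):
    """Sensitivity and specificity of a marker at a fixed cut-off.

    Assumption: a sample is called positive when its value is strictly
    greater than the cut-off.
    """
    true_positives = sum(v > cutoff for v in case_values)
    true_negatives = sum(v <= cutoff for v in control_values)
    return (true_positives / len(case_values),
            true_negatives / len(control_values))


# Hypothetical CEA-like concentrations in ug/L, for illustration only.
cases = [0.8, 2.0, 3.1, 4.0, 6.5, 12.0]
controls = [0.5, 1.1, 1.3, 2.0, 2.9, 4.2]

for cutoff in (2.85, 5.0):
    sens, spec = sensitivity_specificity(cases, controls, cutoff)
    print(f"cut-off {cutoff}: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

Raising the cut-off can only keep or lower the sensitivity and keep or raise the specificity, which matches the direction of the change reported for CEA between 2.85 and 5 μg/L.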
The ROC curves of the new models and the traditional tumor biomarkers are displayed in Fig. 4.

Discussion
The development of new technologies has spawned a series of new diagnostic biomarkers. Genomics, microarrays, proteomics, and metabolomics have become general methods for finding novel biomarkers [5]. After reviewing oncogenes (MMP-9, STC1 and S100A6) [21][22][23], DNA methylation markers (APBA2, SPG20 and SOX17) [24][25][26], lncRNAs (UCA1 and LSINCT-5) [27] and combinations of autoantibodies [28,29], we found that their diagnostic efficiencies did not meet expectations. In contrast, combinations of microRNAs and metabolomics have consistently shown satisfactory diagnostic value [11,17].
MicroRNA detection has many advantages. Compared with long non-coding RNAs and mRNAs, microRNAs are stable and easy to amplify; they remain stable at room temperature and even after repeated freeze-thaw cycles [30]. Compared with gastroscopy, microRNA detection is inexpensive and non-invasive, with almost no complications. Detecting six microRNAs in one sample costs approximately 28 dollars in China, which is half the expense of gastroscopy plus biopsy. The advantage of microRNA detection would be larger in developed countries, where endoscopy is expensive. Nevertheless, as nucleic acids, microRNAs cannot be detected directly; they must first be extracted and reverse transcribed. Furthermore, fold changes and cut-off values vary widely among studies because the choice of reference RNA, the dosage of reagents, the qPCR instrument and the operating process are not yet standardized. Standardization of the protocol is necessary to achieve automated detection and clinical application. The expression of serum microRNAs is altered in various malignant tumors [11,31,32,33,34]. MicroRNA diagnostic models may therefore be best suited to determining whether a patient has a malignant tumor, while the tumor site can then be located through typical clinical manifestations, imaging reports and gastroscopy.
A common research routine for diagnostic tests of microRNAs is to screen candidates by microarray in a small sample and then validate the results by qRT-PCR in a larger sample [35]. Other studies validated candidates by qRT-PCR directly after screening microRNA databases. We instead chose microRNAs with high diagnostic value via meta-analyses. Because the meta-analyses pooled more subjects, this selection of microRNAs is more reliable. Three of these microRNAs have the potential to become independent biomarkers (AUC > 0.7). It is somewhat disappointing that combining microRNAs did not increase the AUC value substantially when we tried all possible combinations. The combination of miR-18a, miR-19a, miR-21, miR-92a and miR-199a had an AUC value of 0.867 (Table 3). However, according to the logistic regression, it was not significantly different from the combination of miR-19a and miR-92a.
Similar to previous studies on circulating metabolomics in GC patients, endogenous metabolites, such as amino acids, organic acids, carbohydrates, fatty acids and steroids, were detected with significant differences [36][37][38]. These differences suggest that the metabolism of tumor cells disturbs several metabolic pathways in patients. As an omics technology, metabolomics shows a great advantage in the diagnosis of GC. It is conceivable that hundreds of thousands of low-molecular-weight metabolites change in concentration in patients with malignant tumors. Our preliminary experiments even indicated that different malignant tumors could be distinguished by metabolomics. Besides its high diagnostic value, GC/MS analysis is also affordable, at 72.5 dollars. However, the pretreatment process is not standardized, including the choice of the internal standard and derivatization reagents, the time of each step and the operating order.
When handling high-throughput data, the PCA, PLS-DA and OPLS-DA models remain stable even when the variables are numerous and the observations are sparse. The results of our study suggest that, for the same number of components, the OPLS-DA model has the highest AUC and the PCA model the lowest. This finding can be explained statistically. PLS-DA and OPLS-DA are supervised analysis methods, while PCA is unsupervised. Building on PLS, OPLS further separates the orthogonal variation by an orthogonal signal correction [39,40]. Although the PCA model performed worst among the three multivariate statistical methods, its AUC could be increased by extracting more principal components. We noticed that in previous metabolomics studies only the significantly different metabolites, usually fewer than ten, were fitted into the diagnostic statistical models. We used all 1118 extracted features to construct the models in our study, and an internal validation indicated that the models with all features were more robust than those with a limited set of metabolites [41].
Compared with the new diagnostic models, CEA showed inferior diagnostic efficiency. CEA is actually better suited as a biomarker for predicting recurrence [42]. Interestingly, there was no significant difference between GC patients and HCs for CA19-9, which is more commonly used to diagnose pancreatic cancer and colorectal cancer. A cut-off value established with the Youden index or the Euclidean index of the ROC curve can realize more of a biomarker's potential than one set at the upper bound of 95% of healthy people.
Ethics approval and consent to participate
All samples were coded anonymously in accordance with local ethical guidelines, as stipulated by the Declaration of Helsinki, under a protocol approved by the Ethics Review Committee of Zhongshan Hospital of Fudan University (reference number B2016-113R). Every patient provided written informed consent before enrollment.