
Computational analysis for identification of early diagnostic biomarkers and prognostic biomarkers of liver cancer based on GEO and TCGA databases and studies on pathways and biological functions affecting the survival time of liver cancer

Abstract

Background

Liver cancer is the sixth most commonly diagnosed cancer and the fourth most common cause of cancer death. The purpose of this work is to find new diagnostic biomarkers or prognostic biomarkers and explore the biological functions related to the prognosis of liver cancer.

Methods

The GSE25097 dataset was first obtained and compared with the TCGA LIHC dataset, and the overlapping differentially expressed genes (DEGs) were analyzed. Cytoscape was used to screen Hub Genes among the DEGs. ROC curve analysis of the Hub Genes was used to identify genes that could serve as diagnostic biomarkers. Kaplan-Meier analysis and a Cox proportional hazards model were used to screen for genes associated with prognosis, and Gene Set Enrichment Analysis (GSEA) was then performed on these prognostic genes to explore the mechanisms affecting the survival and prognosis of liver cancer patients.

Results

A total of 790 and 2162 DEGs were obtained from the GSE25097 and TCGA LIHC datasets, respectively, and 102 Common DEGs were identified by overlapping the two DEG sets. Further screening identified 22 Hub Genes among the 102 Common DEGs. ROC and survival analyses of these 22 Hub Genes showed that 16 genes had an AUC > 90%, and that the expression levels of ESR1, SPP1 and FOSB were significantly associated with the survival time of liver cancer patients. GSEA identified three pathways common to ESR1, FOSB and SPP1, seven pathways common to ESR1 and SPP1, and four pathways common to ESR1 and FOSB.

Conclusions

SPP1, AURKA, NUSAP1, TOP2A, UBE2C, AFP, GMNN, PTTG1, RRM2, SPARCL1, CXCL12, FOS, DCN, SOCS3, FOSB and PCK1 can be used as diagnostic biomarkers for liver cancer, among which FOSB and SPP1 can also be used as prognostic biomarkers. Activation of the cell cycle-related pathways, the pancreas beta cells pathway and the estrogen signaling pathway, together with inhibition of the hallmark heme metabolism pathway, the hallmark coagulation pathway and the fat metabolism pathways, may be associated with a better prognosis in liver cancer patients.


Background

According to the Global Cancer Statistics 2018 report, liver cancer was the sixth most commonly diagnosed cancer and the fourth leading cause of cancer death worldwide in 2018 [1]. The highest incidence (and mortality) of liver cancer is in East Asia, which accounts for 35.5% of the global total. The main risk factors for liver cancer are chronic hepatitis B virus (HBV) infection [2,3,4], hepatitis C virus (HCV) infection [5,6,7], aflatoxin-contaminated food [8], heavy alcohol consumption [6, 9, 10], obesity [11], smoking [12] and type 2 diabetes [13, 14]. Statistics show that the relative importance of these risk factors differs among 53 countries and across world regions. In most high-risk areas, such as China and East Africa, chronic HBV infection and aflatoxin exposure are the main determinants of liver cancer, whereas HCV infection is the leading cause in Japan and Egypt [15, 16]. In low-risk areas, rising obesity rates are the main driver of the increase in liver cancer cases.

The internationally recognized TNM staging system divides cancers into stages I, II, III and IV [17]. Previous work has also divided cancer into early, middle and late stages: stage I corresponds to the early stage, stages II and III to the middle stage, and stage IV to the late stage. Most cancers are diagnosed at a late stage, and this is especially true of liver cancer. Modern medical research has shown that the liver has no pain sensation, so even when liver disease has begun, the body cannot recognize it through pain feedback. Consequently, the early clinical manifestations of liver disease are subtle, and most patients with liver cancer are diagnosed at a late stage because symptoms appear late and are not identified in time [18,19,20,21]. Early-stage liver cancer has a favorable cure rate; therefore, if a diagnosis can be made at any stage before stage IV, treatment can be less intensive than it would be at the final stage.

Alpha-fetoprotein (AFP) is currently the only clinically used biomarker for the early diagnosis of liver cancer. AFP was discovered more than 50 years ago and is not a very accurate diagnostic biomarker: 32 to 59% of liver cancer patients have been shown to have normal AFP levels [22]. Finding new diagnostic biomarkers of liver cancer is therefore of great significance for accurate diagnosis. For cancer patients, prognosis and expected survival time are of utmost importance, both for quality of life and for the choice of diagnostic and treatment strategy. Currently, treatment indications for liver cancer depend mainly on tumor size and the number of nodules and much less on the tumor's tendency to invade and spread [23]. Patients with multiple large but non-aggressive nodules may have a better prognosis than patients with a single small but aggressive nodule, which suggests that the current prognostic criteria are not the most accurate. Identifying new genes related to the prognosis of liver cancer would therefore benefit both treatment and patients' quality of life. In this work, data from liver cancer patients in the TCGA and GEO databases were mined to identify diagnostic and prognostic biomarkers of liver cancer. The aim is to improve the accuracy of early diagnosis, enable early detection and treatment, and thus reduce mortality. In addition, a more accurate assessment of each patient's prognosis could help streamline decisions about adjuvant treatment.

Methods

Microarray data

The liver hepatocellular carcinoma (LIHC) dataset was obtained from TCGA (https://portal.gdc.cancer.gov/). It included 50 normal liver tissue samples and 371 liver cancer samples, together with clinical data. A second gene expression profiling dataset, GSE25097, containing 243 normal samples and 268 tumor samples, was downloaded from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/database); it was measured on platform GPL10687 (Rosetta/Merck Human RSTA Affymetrix 1.0 microarray, Custom CDF).

Data processing

The original expression data of the GSE25097 and TCGA LIHC datasets were analyzed separately in R to screen for differentially expressed genes (DEGs), using an adjusted p-value < 0.05 and |logFC| > 2 as the cut-off criteria. The online Draw Venn Diagram tool (http://bioinformatics.psb.ugent.be/webtools/Venn/) was used to calculate the intersection of the two DEG lists; the genes in this intersection were defined as the common differentially expressed genes (Common DEGs).
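The filtering and overlap step can be illustrated with a minimal Python sketch. It is not the authors' R pipeline: it assumes the adjusted p-value is a Benjamini-Hochberg false discovery rate (the adjustment method is not stated in the text), and the functions `bh_adjust` and `select_degs` and the gene values in the usage example are hypothetical, not results from GSE25097 or TCGA.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest raw p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(results, adj_cutoff=0.05, lfc_cutoff=2.0):
    """results maps gene -> (logFC, raw p-value); returns the set of DEGs."""
    genes = list(results)
    adj = bh_adjust([results[g][1] for g in genes])
    return {
        g for g, q in zip(genes, adj)
        if q < adj_cutoff and abs(results[g][0]) > lfc_cutoff
    }


# Hypothetical (logFC, p-value) pairs, for illustration only.
geo = {"SPP1": (3.1, 1e-8), "ESR1": (-2.4, 1e-6), "GAPDH": (0.1, 0.6)}
tcga = {"SPP1": (4.0, 1e-10), "ESR1": (-2.2, 1e-5), "AFP": (5.0, 1e-4)}

# The "Common DEGs" are the intersection of the two DEG sets (the Venn overlap).
common = select_degs(geo) & select_degs(tcga)
print(sorted(common))  # ['ESR1', 'SPP1']
```

Taking a set intersection is exactly what the Venn diagram tool reports as the overlapping region; the per-dataset filtering is applied first and independently.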

Volcano maps and heat maps of DEGs obtained from GEO and TCGA databases

The R packages pheatmap and ggplot2, among others, were used to draw heat maps and volcano maps of the DEGs.
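For readers unfamiliar with volcano maps, the sketch below shows how each gene's coordinates are usually derived: logFC on the x-axis and -log10(adjusted p-value) on the y-axis, with each point labeled by the same cut-offs used for DEG selection. It is plain Python rather than the authors' ggplot2 code; the function `volcano_points` and the three genes are hypothetical.

```python
import math


def volcano_points(results, adj_cutoff=0.05, lfc_cutoff=2.0):
    """results maps gene -> (logFC, adjusted p-value).

    Returns gene -> (x, y, label), where x = logFC and y = -log10(adj. p).
    """
    points = {}
    for gene, (lfc, adj_p) in results.items():
        # Guard against adj_p == 0, which would make -log10 infinite.
        y = -math.log10(max(adj_p, 1e-300))
        if adj_p < adj_cutoff and lfc > lfc_cutoff:
            label = "up"
        elif adj_p < adj_cutoff and lfc < -lfc_cutoff:
            label = "down"
        else:
            label = "not significant"
        points[gene] = (lfc, y, label)
    return points


pts = volcano_points({"A": (3.0, 0.001), "B": (-2.5, 0.01), "C": (1.0, 1e-9)})
for gene, (x, y, label) in sorted(pts.items()):
    print(gene, x, round(y, 2), label)
```

Gene C illustrates why both cut-offs matter: it is highly significant (high on the y-axis) but its fold change is too small to count as a DEG.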

Gene ontology and Reactome pathway analysis

GO analysis of the DEGs was carried out using the R package clusterProfiler, and Reactome pathway enrichment analysis was performed with the package ReactomePA. P < 0.05 was considered statistically significant.
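GO and Reactome over-representation analyses of a gene list are commonly based on a one-sided hypergeometric test, which asks whether a term's genes are over-represented among the DEGs relative to a background universe. The sketch below computes that p-value with the Python standard library; it illustrates the test only and is not a re-implementation of clusterProfiler or ReactomePA. The helper names and gene sets are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N population, K term genes, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def ora_pvalue(degs, term_genes, universe):
    """One-sided over-representation p-value of a term among the DEGs."""
    degs = degs & universe
    term = term_genes & universe
    overlap = len(degs & term)
    return hypergeom_sf(overlap, len(universe), len(term), len(degs))


# Hypothetical universe of 10 genes, a 3-gene term and 4 DEGs.
universe = {f"G{i}" for i in range(10)}
term = {"G0", "G1", "G2"}
degs = {"G0", "G1", "G2", "G9"}
print(ora_pvalue(degs, term, universe))  # all three term genes are DEGs: p = 7/210
```

In practice the raw p-values from many terms are then multiple-testing corrected before applying a significance cut-off.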

Protein-protein interaction network

The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, https://string-db.org/) is an online protein interaction tool that integrates known protein-protein association data to build upstream and downstream relationships between proteins [24]. The Common DEGs were submitted to STRING to build and visualize the protein-protein interaction (PPI) network. The cytoHubba plug-in of Cytoscape (Cytoscape_v3.6.1) was then used to screen hub genes, and the top 22 genes with a connection degree ≥ 5 were selected as Hub Genes.
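Degree, the connectivity measure used for hub selection, is the number of distinct interaction partners a protein has in the PPI network. The Python sketch below computes degree from an edge list and keeps nodes at or above a threshold; it mirrors only the degree criterion, not cytoHubba itself, and the edge list and function names are hypothetical rather than a STRING export.

```python
from collections import defaultdict


def degree_table(edges):
    """edges: iterable of (protein_a, protein_b) pairs, treated as undirected."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)  # sets make duplicate or reversed edges count once
        neighbours[b].add(a)
    return {node: len(nbrs) for node, nbrs in neighbours.items()}


def hub_genes(edges, min_degree=5, top_n=None):
    """Nodes with degree >= min_degree, sorted by degree (ties by name)."""
    degrees = degree_table(edges)
    hubs = [g for g, d in degrees.items() if d >= min_degree]
    hubs.sort(key=lambda g: (-degrees[g], g))
    return hubs if top_n is None else hubs[:top_n]


# Hypothetical network: SPP1 interacts with five partners; P0 and P1 also interact,
# and one SPP1-P1 interaction is listed twice in reversed order.
edges = [("SPP1", f"P{i}") for i in range(5)] + [("P0", "P1"), ("P1", "SPP1")]
print(hub_genes(edges))  # ['SPP1']
```

Counting distinct neighbours rather than raw edge rows matters because interaction exports often list the same pair more than once.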

Drawing the ROC curve of Hub Genes

Using the package pROC, Receiver Operating Characteristic (ROC) curve analysis was performed on 22 hub genes. AUC > 90% was set as the cutoff value to determine the diagnostic significance of Hub Genes.
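The AUC has a convenient interpretation: it is the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample (ties count one half), i.e. the normalized Mann-Whitney U statistic. The Python sketch below computes it directly. pROC can choose the comparison direction automatically (its `direction` argument), which is how down-regulated genes such as FOS or PCK1 can also reach a high AUC; `diagnostic_auc` approximates that by taking the larger of the two directions, an assumption about how the reported AUCs were obtained. All values are hypothetical.

```python
def roc_auc(tumor, normal):
    """P(tumor value > normal value) + 0.5 * P(tie), over all sample pairs."""
    wins = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor) * len(normal))


def diagnostic_auc(tumor, normal):
    """Direction-agnostic AUC, so genes lower in tumors are also scored."""
    auc = roc_auc(tumor, normal)
    return max(auc, 1.0 - auc)


# Hypothetical expression values.
up_gene = roc_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])   # 8.5 / 9
down_gene = diagnostic_auc([1.0, 2.0], [3.0, 4.0])   # 1.0
passes_cutoff = up_gene > 0.90 and down_gene > 0.90
print(round(up_gene, 3), down_gene, passes_cutoff)
```

The pairwise loop is O(n·m), which is negligible for a few hundred samples per group.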

Survival and statistical analysis

For survival analysis, patients were divided into low- and high-expression groups for each gene in R. Hazard ratios (HRs) were estimated with a Cox regression model, and survival curves were plotted from Kaplan-Meier estimates. P < 0.05 was considered statistically significant.
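The Kaplan-Meier estimator multiplies, at each distinct death time t_i, the conditional survival probability 1 - d_i/n_i, where d_i is the number of deaths at t_i and n_i the number of patients still at risk. The Python sketch below implements that product. Because the text does not state how patients were split into low- and high-expression groups, it assumes a median split; that cutoff, the helper names and the data are hypothetical, not the authors' documented choice.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects (deaths and censorings) sharing this time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # censored subjects leave the risk set after t
    return curve


def split_by_median(expression):
    """Assumed grouping rule: above the median -> 'high', otherwise 'low'."""
    cut = median(expression)
    return ["high" if x > cut else "low" for x in expression]


# Hypothetical follow-up (months), death indicators and expression values.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 0, 1, 1, 0, 1, 0]
expr = [9.1, 8.7, 8.9, 7.2, 9.5, 6.8, 7.0, 6.5]
groups = split_by_median(expr)
for g in ("low", "high"):
    idx = [k for k, grp in enumerate(groups) if grp == g]
    print(g, kaplan_meier([times[k] for k in idx], [events[k] for k in idx]))
```

Censored patients contribute to the risk set until their last follow-up but never cause a drop in the curve, which is why censoring must be tracked separately from deaths.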

Hub Gene expression

The R package ggpubr was used to draw boxplots comparing the expression of each Hub Gene between liver cancer tissue and normal liver tissue.

Gene set enrichment analysis

Gene set enrichment analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states [25]. To investigate the roles of ESR1, SPP1 and FOSB in liver cancer, single-gene GSEA was conducted with the R package clusterProfiler. P-value < 0.05 and p.adjust < 0.05 were used as the cut-off criteria.
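At the core of GSEA is the enrichment score (ES): walking down a ranked gene list, a running sum increases at genes in the set (in proportion to their score) and decreases at genes outside it, and the ES is the running sum's maximum deviation from zero. A positive ES means the set is concentrated at the top of the ranking. For single-gene GSEA the ranking is often built from each gene's correlation with, or fold change against, the gene of interest; the text does not say which metric was supplied to clusterProfiler, so the scores below are hypothetical. This Python sketch computes only the ES, not the permutation-based p-values or the normalized ES.

```python
def enrichment_score(gene_scores, gene_set, p=1.0):
    """Weighted Kolmogorov-Smirnov-like running sum (classic GSEA ES).

    gene_scores maps gene -> ranking metric; gene_set is a set of gene names.
    """
    ranked = sorted(gene_scores.items(), key=lambda kv: kv[1], reverse=True)
    hits = [g in gene_set for g, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(abs(s) ** p for (_, s), h in zip(ranked, hits) if h)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking and leave some genes out")
    running = best = 0.0
    for (_, s), h in zip(ranked, hits):
        # Hits step up in proportion to |score|^p; misses step down equally.
        running += abs(s) ** p / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


scores = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0}  # hypothetical ranking metric
print(enrichment_score(scores, {"A", "B"}))  # set at the top: positive ES
print(enrichment_score(scores, {"C", "D"}))  # set at the bottom: negative ES
```

The sign of the ES is what underlies statements that high expression of a gene "activates" (positive ES) or "inhibits" (negative ES) a pathway in single-gene GSEA.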

Results

Identification of DEGs

The GSE25097 dataset was processed in R with an adjusted p-value < 0.05 and |logFC| > 2 as the cut-off criteria, yielding 790 DEGs for further investigation (Fig. 1, Supplement Table 1). The TCGA LIHC dataset was analyzed with R x64 3.6.1 and the package DESeq2, using the same cut-off criteria (adjusted p-value < 0.05 and |logFC| > 2); 2162 genes met these criteria (Fig. 1, Supplement Table 2). To confirm the reliability of the DEGs in liver cancer, the Common DEGs of the two datasets were obtained, comprising 102 genes (Fig. 1, Table 1). Volcano maps (Fig. 2A, Fig. 2C) and heat maps (Fig. 2B, Fig. 2D) were drawn from the DEGs of the GSE25097 and TCGA LIHC datasets, respectively.

Fig. 1

Venn diagram of DEGs of GSE25097 and TCGA LIHC datasets

Table 1 102 Common DEGs in TCGA and GSE25097
Fig. 2

Identification of DEGs in the GSE25097 and TCGA LIHC datasets; adjusted p-value < 0.05 and |logFC| > 2 were used as the cut-off criteria. LIHC: liver hepatocellular carcinoma; TCGA: The Cancer Genome Atlas. A. Volcano map of DEGs from the GSE25097 dataset. B. Heat map of DEGs from the GSE25097 dataset. C. Volcano map of DEGs from the TCGA LIHC dataset. D. Heat map of DEGs from the TCGA LIHC dataset

GO and Reactome pathway analysis of the DEGs

GO analysis and Reactome pathway analysis were used for enrichment analysis of the 102 Common DEGs. GO analysis covered biological process (BP), cellular component (CC) and molecular function (MF) (Fig. 3A). BP analysis showed that liver cancer was associated with changes in hormone metabolism (Cellular hormone metabolic process, Hormone metabolic process) and in cellular responses to copper, cadmium and other inorganic substances and detoxification (Cellular response to cadmium ion, Cellular response to metal ion, Cellular response to inorganic substance, Cellular response to copper ion, Detoxification of copper ion and Detoxification). CC analysis showed changes in the Collagen trimer and Collagen-containing extracellular matrix of liver cancer cells. MF analysis showed altered oxidoreductase activity and binding functions (Glycosaminoglycan binding, Cytokine receptor binding, Iron ion binding, Extracellular matrix binding and Carbohydrate binding) in liver cancer. In summary, changes in collagen were observed at the cellular component level; changes in hormone metabolism, responses to metal ions and detoxification at the biological process level; and changes in molecular binding and oxidoreductase activity at the molecular function level.

Fig. 3

Enrichment analysis of the differentially expressed genes (DEGs). A. GO analysis. B. Reactome analysis

Reactome enrichment analysis (Fig. 3B) showed that liver cancer was associated with changes in biological oxidation and conjugation reactions (Phase II - Conjugation of compounds), in responses to metal ions (Metallothioneins bind metals, Response to metal ions) and in growth hormone receptor signaling.

The results of the two enrichment analyses were consistent: both pointed to changes in hormone metabolism, biological oxidation and cellular responses to metal ions in patients with liver cancer.

PPI network analysis and screening for Hub Genes

The 102 Common DEGs were submitted to STRING to build a PPI network (Fig. 4A), which was then exported to Cytoscape (3.2.1). The cytoHubba plug-in was used to calculate the degree and other network parameters of each node (Supplement Table 3). Genes with a degree ≥ 5 were taken as Hub Genes, giving a total of 22 Hub Genes (Table 2). The interactions among the 22 Hub Genes are shown in Fig. 4B.

Fig. 4

PPI network diagrams drawn with STRING. a. PPI network of the 102 Common DEGs. b. PPI network of the 22 Hub Genes

Table 2 Top 22 Hub Genes with degree ≥ 5

Expression of Hub Genes in patients with liver cancer

The expression of the 22 Hub Genes in liver cancer and normal liver tissues was compared. Thirteen genes were highly expressed in liver cancer tissues (Fig. 5A): SPP1, AURKA, NQO1, NUSAP1, TOP2A, UBE2C, AFP, GMNN, PTTG1, RRM2, UBE2T, GPC3 and SPARCL1. In contrast, nine genes were expressed at low levels in liver cancer tissues (Fig. 5B): ESR1, CXCL12, FOS, DCN, EGR1, SOCS3, CYP1A2, FOSB and PCK1.

Fig. 5

Expression levels of the 22 Hub Genes. A. Genes highly expressed in liver cancer: (a) SPP1, (b) AURKA, (c) NQO1, (d) NUSAP1, (e) TOP2A, (f) UBE2C, (g) AFP, (h) GMNN, (i) PTTG1, (j) RRM2, (k) UBE2T, (l) GPC3, (m) SPARCL1 in normal liver versus liver cancer tissues. B. Genes expressed at low levels in liver cancer: (n) ESR1, (o) CXCL12, (p) FOS, (q) DCN, (r) EGR1, (s) SOCS3, (t) CYP1A2, (u) FOSB, (v) PCK1 in normal liver versus liver cancer tissues

ROC curve analysis of Hub Genes

ROC curve analysis was performed on the 22 Hub Genes using the R package pROC, with AUC > 90% as the cutoff. Sixteen of the 22 Hub Genes had an AUC > 90%: SPP1, AURKA, CXCL12, FOS, NUSAP1, TOP2A, UBE2C, AFP, DCN, GMNN, PTTG1, RRM2, SOCS3, FOSB, PCK1 and SPARCL1 (Fig. 6). The expression levels of these genes distinguish normal tissue from liver cancer tissue with high accuracy, so they are potential tumor biomarkers that could support the accurate diagnosis of liver cancer.

Fig. 6

ROC curves of the Hub Genes. (a) ESR1, (b) SPP1, (c) AURKA, (d) CXCL12, (e) FOS, (f) NQO1, (g) NUSAP1, (h) TOP2A, (i) UBE2C, (j) AFP, (k) DCN, (l) EGR1, (m) GMNN, (n) PTTG1, (o) RRM2, (p) SOCS3, (q) UBE2T, (r) CYP1A2, (s) GPC3, (t) FOSB, (u) PCK1, (v) SPARCL1

The survival curve of Hub Genes

Survival curves were plotted from Kaplan-Meier estimates (Fig. 7), and a Cox regression model was used to calculate the hazard ratio (HR) of each Hub Gene for liver cancer patients. Among the Hub Genes, the expression levels of ESR1, SPP1 and FOSB were significantly associated with the survival time of liver cancer patients (p < 0.05), with HR values of 0.88, 1.1 and 0.88, respectively. Because an HR below 1 indicates a lower hazard of death and an HR above 1 a higher hazard, ESR1 and FOSB are low-risk factors, whereas SPP1 is a high-risk factor.
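In a Cox proportional hazards model the hazard ratio is exp(β) for a one-unit increase in the covariate, so HR < 1 (ESR1, FOSB) means a lower hazard and HR > 1 (SPP1) a higher one. The text does not state whether expression entered the model as a continuous value or as the high/low group, so the Python sketch below accepts either. It fits a single-covariate model by Newton-Raphson on the Breslow partial likelihood and is illustrative only: it omits standard errors and p-values, and the function name and survival data are hypothetical.

```python
import math


def cox_hazard_ratio(times, events, x, max_iter=50, tol=1e-10):
    """exp(beta) from a one-covariate Cox model (Breslow ties, Newton-Raphson).

    times: follow-up times; events: 1 = death, 0 = censored; x: covariate.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for t in event_times:
            risk = [xi for ti, xi in zip(times, x) if ti >= t]  # still at risk at t
            dead = [xi for ti, ei, xi in zip(times, events, x) if ti == t and ei]
            w = [math.exp(beta * xi) for xi in risk]
            s0 = sum(w)
            s1 = sum(wi * xi for wi, xi in zip(w, risk))
            s2 = sum(wi * xi * xi for wi, xi in zip(w, risk))
            d = len(dead)
            score += sum(dead) - d * s1 / s0         # gradient of log partial likelihood
            info += d * (s2 / s0 - (s1 / s0) ** 2)   # negative second derivative
        if info <= 0:
            break  # covariate carries no information (e.g. constant x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return math.exp(beta)


# Hypothetical data: x = 1 for the high-expression group, 0 for the low group.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
group = [1, 0, 1, 1, 0, 0]
print(round(cox_hazard_ratio(times, events, group), 2))  # > 1: higher hazard when x = 1
```

Swapping the group coding (x -> 1 - x) inverts the estimate to 1/HR, which is why the direction of an HR should always be read together with how the covariate was coded.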

Fig. 7

Survival analysis of the 22 Hub Genes: (a) ESR1, (b) SPP1, (c) AURKA, (d) CXCL12, (e) FOS, (f) NQO1, (g) NUSAP1, (h) TOP2A, (i) UBE2C, (j) AFP, (k) DCN, (l) EGR1, (m) GMNN, (n) PTTG1, (o) RRM2, (p) SOCS3, (q) UBE2T, (r) CYP1A2, (s) GPC3, (t) FOSB, (u) PCK1, (v) SPARCL1; p < 0.05 was considered statistically significant

GSEA revealed the biological function that affects the survival time of liver cancer

Single-gene GSEA was used to investigate the biological pathways and functions related to survival time (Fig. 8). Figure 8A shows all pathways related to ESR1, FOSB and SPP1, respectively, and Fig. 8B shows the pathways they share. In Fig. 8B, panels a1, b1 and c1 show the three pathways common to ESR1, FOSB and SPP1; panels a2 and c2 show the seven pathways common to ESR1 and SPP1; and panels a3 and b3 show the four pathways common to ESR1 and FOSB.

Fig. 8

Identification of enriched gene sets by GSEA using a single gene as the phenotype. A. Dot plot. B. Enrichment curves. a1, b1 and c1 are the common pathways enriched for ESR1, FOSB and SPP1, respectively; a2 and c2 are the common pathways enriched for ESR1 and SPP1, respectively; a3 and b3 are the common pathways enriched for ESR1 and FOSB, respectively

The three common pathways enriched for ESR1, FOSB and SPP1 are HALLMARK MYC TARGETS V1, HALLMARK G2M CHECKPOINT and HALLMARK E2F TARGETS (Fig. 8B a1, b1, c1). Figure 8B shows that high expression of ESR1 and FOSB can activate these three pathways, whereas high expression of SPP1 can inhibit them. In liver cancer tissue, however, ESR1 and FOSB are expressed at low levels and SPP1 at high levels (see Fig. 5). Therefore, the changes in ESR1, FOSB and SPP1 expression in liver cancer inhibited all three pathways.

Enrichment analysis of ESR1 and SPP1 yielded seven common pathways: HALLMARK PANCREAS BETA CELLS, HALLMARK ESTROGEN RESPONSE LATE, HALLMARK ADIPOGENESIS, HALLMARK FATTY ACID METABOLISM, HALLMARK BILE ACID METABOLISM, HALLMARK XENOBIOTIC METABOLISM and HALLMARK PEROXISOME. High expression of ESR1 can activate the HALLMARK PANCREAS BETA CELLS and HALLMARK ESTROGEN RESPONSE LATE pathways and inhibit the other five (HALLMARK ADIPOGENESIS, HALLMARK FATTY ACID METABOLISM, HALLMARK BILE ACID METABOLISM, HALLMARK XENOBIOTIC METABOLISM and HALLMARK PEROXISOME), whereas SPP1 shows the opposite pattern (Fig. 8B a2, c2). In liver cancer, ESR1 is expressed at low levels and SPP1 at high levels (see Fig. 5). Therefore, the changes in ESR1 and SPP1 expression in liver cancer activated the HALLMARK ADIPOGENESIS, HALLMARK FATTY ACID METABOLISM, HALLMARK BILE ACID METABOLISM, HALLMARK XENOBIOTIC METABOLISM and HALLMARK PEROXISOME pathways and suppressed the HALLMARK PANCREAS BETA CELLS and HALLMARK ESTROGEN RESPONSE LATE pathways.

The four common pathways enriched for ESR1 and FOSB are HALLMARK MYC TARGETS V2, HALLMARK HEME METABOLISM, HALLMARK COAGULATION and HALLMARK UV RESPONSE DN. High expression of ESR1 and FOSB can activate the HALLMARK MYC TARGETS V2 pathway and inhibit the other three (HALLMARK HEME METABOLISM, HALLMARK COAGULATION and HALLMARK UV RESPONSE DN) (Fig. 8B a3, b3). In liver cancer, however, both ESR1 and FOSB are expressed at low levels (see Fig. 5). Therefore, the changes in ESR1 and FOSB expression in liver cancer inhibited the HALLMARK MYC TARGETS V2 pathway and activated the HALLMARK HEME METABOLISM, HALLMARK COAGULATION and HALLMARK UV RESPONSE DN pathways.

Discussion

Most patients with liver cancer do not seek medical treatment until symptoms appear in the late stage, so early diagnosis is of great significance for treatment. At present, alpha-fetoprotein (AFP) is the diagnostic biomarker used in the clinical diagnosis of liver cancer. AFP was discovered as a diagnostic biomarker of liver cancer more than 50 years ago, and its diagnostic accuracy is limited: 32 to 59% of liver cancer patients have normal AFP levels [22]. It is therefore necessary to find new, more accurate biomarkers for liver cancer diagnosis. The prognosis of cancer patients also matters greatly for their quality of life and treatment, so the search for prognostic biomarkers is equally important. To this end, this work used data mining to find diagnostic and prognostic biomarkers associated with liver cancer.

First, the TCGA liver cancer dataset, which included 50 normal liver tissue samples and 371 liver cancer samples, and the GSE25097 dataset from GEO, which consisted of 243 non-tumor tissue samples and 268 liver cancer samples, were obtained. Differential expression analysis yielded 102 Common DEGs shared by the two datasets. GO and Reactome pathway enrichment analyses of the 102 Common DEGs showed that liver cancer involved changes in collagen at the cellular component level, changes in hormone metabolism and responses to metal ions at the biological process level, and abnormalities in molecular binding and oxidoreductase activity at the molecular function level (Fig. 3).

A PPI network was constructed for the 102 Common DEGs to examine the relationships between genes, and 22 Hub Genes were selected based on their degree (Table 2). The ROC curve reflects the trade-off between sensitivity and specificity and is of great value for the accurate diagnosis of diseases [26]. ROC analysis of the 22 Hub Genes with AUC > 90% as the threshold retained 16 Hub Genes: SPP1, AURKA, CXCL12, FOS, NUSAP1, TOP2A, UBE2C, AFP, DCN, GMNN, PTTG1, RRM2, SOCS3, FOSB, PCK1 and SPARCL1. Because the expression levels of these 16 genes accurately distinguish normal liver tissue from liver cancer, they can be used as diagnostic biomarkers for the early diagnosis of liver cancer, alongside AFP, which is currently used in clinical practice. The effect of the 22 Hub Genes on the survival time of liver cancer patients was also examined and their hazard ratios were calculated. The expression levels of ESR1, SPP1 and FOSB had a significant impact on survival time (p < 0.05), with HR values of 0.88, 1.1 and 0.88, respectively, indicating that ESR1 and FOSB are low-risk genes while SPP1 is a high-risk gene. However, the AUC of ESR1 was only 68.7% (Fig. 6a), so its diagnostic accuracy is low and it is not suitable as a diagnostic biomarker. As a result, only FOSB and SPP1 qualify as both diagnostic and prognostic biomarkers of liver cancer, with FOSB a low-risk gene and SPP1 a high-risk gene. In other words, patients with high FOSB expression have a higher survival rate than those with low expression, whereas patients with high SPP1 expression have a lower survival rate than those with low expression. This conclusion is consistent with previous reports.
Tang C. et al. found that overexpression of FOSB inhibited tumor cell proliferation, colony formation and cell migration [27], while silencing of FOSB promoted tumor cell proliferation, colony formation and cell migration [28]. Li H.'s study also confirmed that the overexpression of FOSB protein can promote the proliferation of cancer cells. These studies support FOSB being a low-risk gene. Similarly, regarding SPP1, Lu C. et al. found that silencing osteopontin (OPN, the protein encoded by SPP1) in liver cancer reduced the number of cell colonies and the proliferation rate, and in vivo experiments showed that tumor volume decreased in tumor-bearing mice [29]. These findings support SPP1 being a high-risk gene.

Finally, single-gene GSEA was performed on the three genes affecting the survival time of liver cancer patients, ESR1, SPP1 and FOSB (Fig. 8), to explore the mechanisms affecting prognosis. Three pathways were closely related to ESR1, FOSB and SPP1 (Fig. 8B a1, b1, c1), seven to ESR1 and SPP1 (Fig. 8B a2, c2), and four to ESR1 and FOSB (Fig. 8B a3, b3).

The three common pathways related to ESR1, FOSB and SPP1 are HALLMARK MYC TARGETS V1, HALLMARK G2/M CHECKPOINT and HALLMARK E2F TARGETS. High expression of ESR1 and FOSB can activate these three pathways, while high expression of SPP1 inhibits them (Fig. 8B a1, b1, c1). Since ESR1 and FOSB are low-risk factors and SPP1 is a high-risk factor (Fig. 7a, b, t), activation of these three pathways appears conducive to longer survival in liver cancer patients. The MYC TARGETS V1 pathway has been described as a new anticancer target [30,31,32] and is closely related to cell proliferation, differentiation and the cell cycle. Likewise, the G2/M CHECKPOINT pathway [33] and the HALLMARK E2F TARGETS pathway are closely related to the cell cycle [34]. In summary, liver cancer patients in whom cell cycle pathways are activated have a better prognosis.

The seven common pathways related to ESR1 and SPP1 are HALLMARK PANCREAS BETA CELLS, HALLMARK ESTROGEN RESPONSE LATE, HALLMARK ADIPOGENESIS, HALLMARK FATTY ACID METABOLISM, HALLMARK BILE ACID METABOLISM, HALLMARK XENOBIOTIC METABOLISM and HALLMARK PEROXISOME. High ESR1 expression can activate the HALLMARK PANCREAS BETA CELLS and HALLMARK ESTROGEN RESPONSE LATE pathways and inhibit the remaining five, whereas SPP1 shows the opposite pattern (Fig. 8B a2, c2). Because ESR1 is a low-risk factor and SPP1 a high-risk factor, liver cancer patients in whom the HALLMARK PANCREAS BETA CELLS and HALLMARK ESTROGEN RESPONSE LATE pathways are activated and the HALLMARK ADIPOGENESIS, HALLMARK FATTY ACID METABOLISM, HALLMARK BILE ACID METABOLISM, HALLMARK XENOBIOTIC METABOLISM and HALLMARK PEROXISOME pathways are inhibited have a better prognosis. Functionally, these seven pathways fall into four groups. 1. Liver cancer patients with an activated HALLMARK PANCREAS BETA CELLS pathway have a better prognosis than those in whom this pathway is inhibited. Suppression of the HALLMARK PANCREAS BETA CELLS pathway and islet cell dysfunction are important causes of type 2 diabetes, which implies that liver cancer patients with type 2 diabetes have a poor prognosis. Patients with type 2 diabetes are also a high-risk population for developing liver cancer, consistent with an epidemiological investigation of liver cancer [17]. 2. Liver cancer patients in whom the HALLMARK ESTROGEN RESPONSE LATE pathway is activated have a better prognosis.
Clinically, palmar erythema and spider nevi appear in some patients with cancer [35] and severe liver dysfunction [36]. These signs result from reduced hepatic metabolism of estrogen, which leads to excess estrogen in the blood [37] and stimulates congestion and dilation of capillaries and small arteries. In other words, palmar erythema and spider nevi are manifestations of inhibition of the estrogen pathway, and liver cancer patients with these signs have a poor prognosis. Similarly, in some male liver cancer patients, impaired estrogen metabolism raises blood estrogen levels and causes breast development (gynecomastia); the prognosis of such patients is poor [38]. 3. Liver cancer patients in whom the fat metabolism-related pathways (HALLMARK ADIPOGENESIS, HALLMARK FATTY ACID METABOLISM, HALLMARK BILE ACID METABOLISM and HALLMARK PEROXISOME) are inhibited have a better prognosis. Epidemiological investigations show that obesity is one of the important risk factors for liver cancer, and among liver cancer patients, those with inhibited fat metabolism-related pathways have a better prognosis. 4. Patients in whom the HALLMARK XENOBIOTIC METABOLISM pathway is inhibited have a better prognosis.

The four common pathways related to ESR1 and FOSB are HALLMARK MYC TARGETS V2, which is activated, and HALLMARK HEME METABOLISM, HALLMARK COAGULATION and HALLMARK UV RESPONSE DN, which are inhibited. Since both ESR1 and FOSB are low-risk factors, patients in whom the HALLMARK MYC TARGETS V2 pathway is activated and the HALLMARK HEME METABOLISM, HALLMARK COAGULATION and HALLMARK UV RESPONSE DN pathways are suppressed have a better prognosis. The HALLMARK MYC TARGETS V2 pathway is closely related to the cell cycle, so liver cancer patients with activated cell cycle pathways have a better prognosis, consistent with the conclusion reached above. The HALLMARK HEME METABOLISM pathway regulates heme metabolism, whose main products are bile pigments, including bilirubin, biliverdin, bilinogens and bilins. Under normal circumstances, bile pigments are mainly excreted in bile. Bilirubin, which is orange-yellow, is the main pigment in bile, and disorders of bilirubin metabolism are closely related to clinical hepatobiliary diseases. If the HALLMARK HEME METABOLISM pathway is activated, large amounts of heme are metabolized into bilirubin, whose plasma concentration becomes excessively high; bilirubin then diffuses into tissues and causes jaundice, which is easily seen in the sclera and skin. According to the data analysis in this work, patients with an inhibited HALLMARK HEME METABOLISM pathway have a good prognosis, whereas those with an activated pathway have a poor prognosis: activation of this pathway leads to jaundice-related symptoms, and liver cancer patients with jaundice have a poor prognosis. Likewise, patients with a suppressed HALLMARK COAGULATION pathway have a good prognosis. The HALLMARK COAGULATION pathway mainly regulates coagulation function.
Abnormal coagulation is a common clinical finding in liver cancer patients and is mainly related to a lack of coagulation factors, thrombocytopenia and increased vascular permeability. The data analysis in this work shows that patients with inhibited blood clotting function have a better prognosis than those in whom this function is activated.

Taken together, the analysis indicates that the prognosis of liver cancer patients is mainly related to the following functions: 1. Regulation of the cell cycle: patients with an activated cell cycle have a good prognosis. 2. Liver cancer patients with an activated HALLMARK PANCREAS BETA CELLS pathway have a good prognosis, while liver cancer patients with type 2 diabetes have a poor prognosis. 3. Patients with an activated hepatocellular estrogen pathway have a good prognosis, whereas those with liver palms (palmar erythema), spider nevi and abnormal breast development have a poor prognosis. 4. Liver cancer patients whose fat metabolism-related pathways are inhibited have a good prognosis. 5. Liver cancer patients whose HALLMARK XENOBIOTIC METABOLISM pathway is inhibited have a good prognosis. 6. Liver cancer patients have a good prognosis if the HALLMARK HEME METABOLISM pathway is inhibited and a poor prognosis if they develop jaundice. 7. Liver cancer patients whose HALLMARK COAGULATION pathway is inhibited have a good prognosis.

Conclusion

Ten genes, SPP1, AURKA, NUSAP1, TOP2A, UBE2C, AFP, GMNN, PTTG1, RRM2 and SPARCL1, were found to be highly expressed in liver cancer, and six genes, CXCL12, FOS, DCN, SOCS3, FOSB and PCK1, showed low expression. These genes can be used as markers for liver cancer diagnosis, and FOSB and SPP1 can also be used as prognostic markers of liver cancer. Activation of the cell cycle-related pathway, the HALLMARK PANCREAS BETA CELLS pathway and the estrogen signaling pathway, together with inhibition of the HALLMARK HEME METABOLISM pathway, the HALLMARK COAGULATION pathway and the fat metabolism pathway, may be associated with a better prognosis in liver cancer patients.
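The diagnostic markers listed above were the Hub Genes whose ROC curves gave an AUC above 90% (see the Abstract). A minimal sketch of the empirical AUC, using the Mann-Whitney pairwise-comparison formulation, is shown below. The function names, the `higher_in_tumour` flag for the six down-regulated genes and the 0.9 cut-off helper are illustrative assumptions, not the authors' pipeline.

```python
from typing import Sequence


def roc_auc(tumour: Sequence[float], normal: Sequence[float],
            higher_in_tumour: bool = True) -> float:
    """Empirical area under the ROC curve (Mann-Whitney formulation).

    Counts, over all tumour/normal pairs, how often the tumour sample is the
    more tumour-like of the two; ties count one half.
    """
    if not tumour or not normal:
        raise ValueError("both groups need at least one sample")
    # For down-regulated genes, lower expression is the tumour-like direction.
    sign = 1.0 if higher_in_tumour else -1.0
    wins = 0.0
    for t in tumour:
        for n in normal:
            diff = sign * (t - n)
            if diff > 0:
                wins += 1.0
            elif diff == 0:
                wins += 0.5
    return wins / (len(tumour) * len(normal))


def is_diagnostic_candidate(auc: float, threshold: float = 0.9) -> bool:
    """Hypothetical screening rule mirroring the AUC > 90% criterion."""
    return auc > threshold


if __name__ == "__main__":
    # Toy expression values for illustration only.
    auc = roc_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
    print(round(auc, 3), is_diagnostic_candidate(auc))
```

The double loop is O(n·m); for large cohorts a rank-based version of the same statistic gives an identical value in O((n + m) log(n + m)).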

Availability of data and materials

The datasets generated and/or analyzed during the current study are available in the GEO database (https://www.ncbi.nlm.nih.gov/geo/database) and the GDC Data Portal (https://portal.gdc.cancer.gov/).

Abbreviations

AFP:

Alpha-fetoprotein

BP:

Biological process

CC:

Cellular component

Common DEGs:

Common differentially expressed genes

DEGs:

Differentially expressed genes

GEO:

Gene Expression Omnibus

GSEA:

Gene set enrichment analysis

HBV:

Hepatitis B virus

HCV:

Hepatitis C virus

HR:

Hazard ratio

MF:

Molecular function

ROC:

Receiver operating characteristic curve

STRING:

Search tool for the retrieval of interacting genes/proteins

TCGA:

The Cancer Genome Atlas

References

  1. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2018;68(6):394–424.

  2. Piñero F, Tanno M, Aballay Soteras G, Tisi Baña M, Dirchwolf M, Fassio E, Ruf A, Mengarelli S, Borzi S, Fernández N, Ridruejo E, Descalzi V, Anders M, Mazzolini G, Reggiardo V, Marciano S, Perazzo F, Spina JC, McCormack L, Maraschio M, Lagues C, Gadano A, Villamil F, Silva M, Cairo F, Ameigeiras B, Argentinean Association for the Study of Liver Diseases (A.A.E.E.H). Argentinian clinical practice guideline for surveillance, diagnosis, staging and treatment of hepatocellular carcinoma. Ann Hepatol 2020;19(5):546–569. https://doi.org/10.1016/j.aohep.2020.06.003.

  3. Feld JJ, Krassenburg LAP. What comes first: treatment of viral hepatitis or liver cancer? Dig Dis Sci 2019;64(4):1041–1049. https://doi.org/10.1007/s10620-019-05518-5.

  4. Ringelhan M, O'Connor T, Protzer U, Heikenwalder M. The direct and indirect roles of HBV in liver cancer: prospective markers for HCC screening and potential therapeutic targets. J Pathol 2015;235(2):355–367. https://doi.org/10.1002/path.4434.

  5. Pham C, Sin MK. Use of electronic health records at Federally Qualified Health Centers: a potent tool to increase viral hepatitis screening and address the climbing incidence of liver cancer. J Cancer Educ 2020. https://doi.org/10.1007/s13187-020-01741-1.

  6. Szabo G, Saha B, Bukong TN. Alcohol and HCV: implications for liver cancer. Adv Exp Med Biol 2015;815:197–216. https://doi.org/10.1007/978-3-319-09614-8_12.

  7. Yao M, Yang JL, Wang L, Yao DF. Carcinoembryonic type specific markers and liver cancer immunotherapy. Zhonghua Gan Zang Bing Za Zhi 2020;28(6):466–470. https://doi.org/10.3760/cma.j.cn501113-20200311-00107.

  8. Henry SH, Bosch FX, Bowers JC. Aflatoxin, hepatitis and worldwide liver cancer risks. Adv Exp Med Biol 2002;504:229–233. https://doi.org/10.1007/978-1-4615-0629-4_24.

  9. Grewal P, Viswanathen VA. Liver cancer and alcohol. Clin Liver Dis 2012;16(4):839–850. https://doi.org/10.1016/j.cld.2012.08.011.

  10. Turati F, Galeone C, Rota M, Pelucchi C, Negri E, Bagnardi V, Corrao G, Boffetta P, La Vecchia C. Alcohol and liver cancer: a systematic review and meta-analysis of prospective studies. Ann Oncol 2014;25(8):1526–1535. https://doi.org/10.1093/annonc/mdu020.

  11. Saitta C, Pollicino T, Raimondo G. Obesity and liver cancer. Ann Hepatol 2019;18(6):810–815. https://doi.org/10.1016/j.aohep.2019.07.004.

  12. Lee YC, Cohet C, Yang YC, Stayner L, Hashibe M, Straif K. Meta-analysis of epidemiologic studies on cigarette smoking and liver cancer. Int J Epidemiol 2009;38(6):1497–1511. https://doi.org/10.1093/ije/dyp280.

  13. Tahmasebi-Birgani M, Ansari H, Carloni V. Defective mitosis-linked DNA damage response and chromosomal instability in liver cancer. Biochim Biophys Acta Rev Cancer 2019;1872(1):60–65. https://doi.org/10.1016/j.bbcan.2019.05.008.

  14. Wang Y, Wang B, Yan S, Shen F, Cao H, Fan J, Zhang R, Gu J. Type 2 diabetes and gender differences in liver cancer by considering different confounding factors: a meta-analysis of cohort studies. Ann Epidemiol 2016;26(11):764–772. https://doi.org/10.1016/j.annepidem.2016.09.006.

  15. Kudo M. Surveillance, diagnosis, treatment, and outcome of liver cancer in Japan. Liver Cancer 2015;4(1):39–50. https://doi.org/10.1159/000367727.

  16. Yamashita T, Kaneko S. Liver cancer. Rinsho Byori 2016;64(7):787–796.

  17. Nozaki Y, Yamamoto M, Ikai I, Yamamoto Y, Ozaki N, Fujii H, et al. Reconsideration of the lymph node metastasis pattern (N factor) from intrahepatic cholangiocarcinoma using the International Union against Cancer TNM staging system for primary liver carcinoma. Cancer 1998;83(9):1923–1929.

  18. Anwanwan D, Singh SK, Singh S, Saikam V, Singh R. Challenges in liver cancer and possible treatment approaches. Biochim Biophys Acta Rev Cancer 2020;1873(1):188314. https://doi.org/10.1016/j.bbcan.2019.188314.

  19. Kaffe E, Magkrioti C, Aidinis V. Deregulated lysophosphatidic acid metabolism and signaling in liver cancer. Cancers (Basel) 2019;11(11):1626. https://doi.org/10.3390/cancers11111626.

  20. Mato JM, Elortza F, Lu SC, Brun V, Paradela A, Corrales FJ. Liver cancer-associated changes to the proteome: what deserves clinical focus? Expert Rev Proteomics 2018;15(9):749–756. https://doi.org/10.1080/14789450.2018.1521277.

  21. Wrighton PJ, Oderberg IM, Goessling W. There is something fishy about liver cancer: zebrafish models of hepatocellular carcinoma. Cell Mol Gastroenterol Hepatol 2019;8(3):347–363. https://doi.org/10.1016/j.jcmgh.2019.05.002.

  22. Charrière B, Maulat C, Suc B, Muscari F. Contribution of alpha-fetoprotein in liver transplantation for hepatocellular carcinoma. World J Hepatol 2016;8(21):881–890. https://doi.org/10.4254/wjh.v8.i21.881.

  23. Lei Q, Chen H, Zheng H, Deng F, Wang F, Li J, et al. Zygomatic bone metastasis from hepatocellular carcinoma and the therapeutic efficacy of apatinib: a case report and literature review. Medicine (Baltimore) 2019;98(18):e14595. https://doi.org/10.1097/MD.0000000000014595.

  24. Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, Santos A, Doncheva NT, Roth A, Bork P, Jensen LJ, von Mering C. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res 2017;45(D1):D362–D368. https://doi.org/10.1093/nar/gkw937.

  25. Subramanian A, Kuehn H, Gould J, Tamayo P, Mesirov JP. GSEA-P: a desktop application for gene set enrichment analysis. Bioinformatics 2007;23(23):3251–3253. https://doi.org/10.1093/bioinformatics/btm369.

  26. Obuchowski NA, Bullen JA. Receiver operating characteristic (ROC) curves: review of methods with applications in diagnostic medicine. Phys Med Biol 2018;63(7):07TR01. https://doi.org/10.1088/1361-6560/aab4b1.

  27. Tang C, Jiang Y, Shao W, Shi W, Gao X, Qin W, Jiang T, Wang F, Feng S. Abnormal expression of FOSB correlates with tumor progression and poor survival in patients with gastric cancer. Int J Oncol 2016;49(4):1489–1496. https://doi.org/10.3892/ijo.2016.3661.

  28. Li H, Li L, Zheng H, Yao X, Zang W. Regulatory effects of ΔFosB on proliferation and apoptosis of MCF-7 breast cancer cells. Tumour Biol 2016;37(5):6053–6063. https://doi.org/10.1007/s13277-015-4356-4.

  29. Lu C, Fang S, Weng Q, Lv X, Meng M, Zhu J, Zheng L, Hu Y, Gao Y, Wu X, Mao J, Tang B, Zhao Z, Huang L, Ji J. Integrated analysis reveals critical glycolytic regulators in hepatocellular carcinoma. Cell Commun Signal 2020;18(1):97. https://doi.org/10.1186/s12964-020-00539-4.

  30. Dang CV. MYC on the path to cancer. Cell 2012;149(1):22–35. https://doi.org/10.1016/j.cell.2012.03.003.

  31. Hsieh AL, Walton ZE, Altman BJ, Stine ZE, Dang CV. MYC and metabolism on the path to cancer. Semin Cell Dev Biol 2015;43:11–21. https://doi.org/10.1016/j.semcdb.2015.08.003.

  32. Stine ZE, Walton ZE, Altman BJ, Hsieh AL, Dang CV. MYC, metabolism, and cancer. Cancer Discov 2015;5(10):1024–1039. https://doi.org/10.1158/2159-8290.CD-15-0507.

  33. Oshi M, Takahashi H. G2M cell cycle pathway score as a prognostic biomarker of metastasis in estrogen receptor (ER)-positive breast cancer. Int J Mol Sci 2020;21(8):2921. https://doi.org/10.3390/ijms21082921.

  34. De Meyer T, Bijsmans IT, Van de Vijver KK, Bekaert S, Oosting J, Van Criekinge W, et al. E2Fs mediate a fundamental cell-cycle deregulation in high-grade serous ovarian carcinomas. J Pathol 2009;217(1):14–20. https://doi.org/10.1002/path.2452.

  35. Maekawa M. Palmar erythema as a sign of cancer. Cleve Clin J Med 2017;84(9):666–667. https://doi.org/10.3949/ccjm.84a.16114.

  36. Li H, Wang R, Méndez-Sánchez N, Peng Y, Guo X, Qi X. Impact of spider nevus and subcutaneous collateral vessel of chest/abdominal wall on outcomes of liver cirrhosis. Arch Med Sci 2019;15(2):434–448. https://doi.org/10.5114/aoms.2018.74788.

  37. Serrao R, Zirwas M, English JC. Palmar erythema. Am J Clin Dermatol 2007;8(6):347–356. https://doi.org/10.2165/00128071-200708060-00004.

  38. Kuper H, Mantzoros C, Lagiou P, Tzonou A, Tamimi R, Mucci L, Benetou V, Spanos E, Stuver SO, Trichopoulos D. Estrogens, testosterone and sex hormone binding globulin in relation to liver cancer in men. Oncology 2001;60(4):355–360. https://doi.org/10.1159/000058532.


Acknowledgements

Not applicable.

Funding

Not applicable.

Author information


Contributions

S. G. and H. T. designed the study. S. G. and J. G. carried out data acquisition and analysis. S. G. and H. T. wrote the manuscript. S. G. and J. G. prepared the figures and tables. M. Y. and G. X. edited the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Shiyong Gao or Huixin Tan.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplement Table 1.

The differentially expressed genes of GSE25097

Additional file 2: Supplement Table 2.

The differentially expressed genes of TCGA LIHC

Additional file 3: Supplement Table 3.

Parameter values of the common differentially expressed genes

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Gao, S., Gang, J., Yu, M. et al. Computational analysis for identification of early diagnostic biomarkers and prognostic biomarkers of liver cancer based on GEO and TCGA databases and studies on pathways and biological functions affecting the survival time of liver cancer. BMC Cancer 21, 791 (2021). https://doi.org/10.1186/s12885-021-08520-1


Keywords

  • Bioinformatics
  • Biomarker
  • Liver cancer
  • Diagnostic biomarker
  • Prognostic biomarker