Identification of a protein signature for predicting overall survival of hepatocellular carcinoma: a study based on data mining
BMC Cancer volume 20, Article number: 720 (2020)
Hepatocellular carcinoma (HCC) is the fifth most common cancer in the world and the second most common cause of cancer-related deaths. Over 500,000 new HCC cases are diagnosed each year. Combining advanced genomic analysis with proteomic characterization not only has great potential for the discovery of useful biomarkers but also drives the development of new diagnostic methods.
This study obtained proteomic data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and validated the findings in The Cancer Proteome Atlas (TCPA) and The Cancer Genome Atlas (TCGA) datasets to identify HCC biomarkers and proteogenomic dysfunction.
The CPTAC database contained data for 159 patients diagnosed with hepatitis B-related HCC, in which 422 differentially expressed proteins were identified (112 upregulated and 310 downregulated). Restricting the analysis to the survival-related proteins shared between the CPTAC and TCPA databases revealed four overlapping survival-related proteins: PCNA, MSH6, CDK1, and ASNS.
This study established a novel protein signature for HCC prognosis prediction using data retrieved from online databases. However, the signatures need to be verified using independent cohorts and functional experiments.
Hepatocellular carcinoma (HCC) is the fifth most common cancer in the world and the second most common cause of cancer-related deaths. Over 500,000 new HCC cases are diagnosed each year. Viral hepatitis and nonalcoholic steatohepatitis are the most common causes of cirrhosis, which underlies approximately 80% of HCC cases. HCC prognosis remains a challenge because of tumor recurrence, and the 5-year overall survival rate is only 34 to 50%. Despite rapid advances in medical technology, there are still no effective treatment strategies for HCC patients. Byeon et al reported that, based on long-term survival data, serum OPN and DKK1 levels in patients with liver cancer can be used as novel biomarkers that predict prognosis. Other serum markers, such as alpha-fetoprotein (AFP) and alkaline phosphatase (ALP or AKP), are also used in clinical practice; however, these markers lack sufficient sensitivity and specificity. Therefore, it is necessary to find effective biomarkers for the diagnosis and treatment of HCC.
Proteomics is a field of research that studies proteins on a large scale. Biomarker analysis uses high-throughput technologies in proteomics and genomics. Mass spectrometry-based targeted proteomics has been used in multi-omics studies, and mass spectrometry-based identification of matching or homologous peptides can further refine gene models. This allows for an in-depth analysis of host-pathogen interactions. Combining advanced genomic analysis with proteomic characterization not only has great potential for the discovery of useful biomarkers but also drives the development of new diagnostic methods and therapies. Proteogenomic studies have enabled the exploration of cancer progression and prognosis; however, the underlying roles and mechanisms remain unclear. Chiou et al used integrated proteomic, genomic, and transcriptomic techniques to obtain protein expression profiles from HCC patients and found that the protein markers S100A9 and granulin were associated with tumorigenesis and cancer metastasis in HCC. Similarly, Chen et al, using a proteomic approach, found that the curcumin/β-cyclodextrin polymer (CUR/CDP) inclusion complex exhibited inhibitory effects on HepG2 cell growth. Over the last few years, integrative tools for executing complete proteogenomic analyses have been developed. In this study, we systematically evaluated a prognostic protein signature for the prediction of overall survival (OS) in HCC patients. The availability of high-throughput expression data has made it possible to use global gene expression information to analyze the genetic and clinical aspects of HCC patients. Therefore, in this study, protein data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) were analyzed and validated in The Cancer Proteome Atlas (TCPA) and The Cancer Genome Atlas (TCGA) datasets to identify HCC biomarkers and proteogenomic dysfunction.
CPTAC is a public repository of well-characterized, mass spectrometry (MS)-based targeted proteomic assays, useful in characterizing the protein inventory in tumors by leveraging the latest advances in mass spectrometry-based discovery proteomics. TCPA is a user-friendly data portal that contains 8167 tumor samples in total, consisting primarily of TCGA tumor tissue samples, and provides a unique opportunity to validate the TCGA data and identify model cell lines for functional investigations. TCGA has generated multi-platform cancer genomic data as well as some proteomic data using the Reverse Phase Protein Array (RPPA) platform, measuring protein levels in tumors for about 150 proteins and 50 phosphoproteins. In this study, proteomics data were downloaded from TCPA (level 4) and combined with clinical data from TCGA, and a comprehensive proteomic analysis was performed using CPTAC.
Establishing the prognostic gene signature
Univariate Cox regression analysis was performed to identify prognostic genes and establish their genetic characteristics. The prognostic gene signature was expressed as: risk score = (Coefficient_mRNA1 × expression of mRNA1) + (Coefficient_mRNA2 × expression of mRNA2) + ⋯ + (Coefficient_mRNAn × expression of mRNAn). Based on the median risk score, the patients were classified into low-risk and high-risk groups.
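The risk score formula and the median split can be illustrated with a minimal Python sketch. This is not the authors' code (the study reports using R and Perl). The coefficient values, expression values, and patient IDs are hypothetical, and assigning patients with a score exactly equal to the median to the low-risk group is an assumption.

```python
from statistics import median

def risk_score(coefficients, expression):
    """Linear risk score: sum of coefficient * expression over the signature."""
    return sum(coefficients[name] * expression[name] for name in coefficients)

def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the cohort median score.

    Assumption: a score equal to the median is labelled 'low'.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"SLC39A1": 0.8, "UBE2C": 0.5}
cohort = {
    "P1": {"SLC39A1": 1.0, "UBE2C": 2.0},
    "P2": {"SLC39A1": 0.2, "UBE2C": 0.4},
    "P3": {"SLC39A1": 1.5, "UBE2C": 1.0},
    "P4": {"SLC39A1": 0.1, "UBE2C": 0.2},
}

scores = {pid: risk_score(coefficients, expr) for pid, expr in cohort.items()}
groups = split_by_median(scores)
print(groups)
```

In practice the coefficients would come from the fitted Cox model rather than being set by hand.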
Building and validating a predictive nomogram
Nomograms are often used to predict cancer prognosis, mainly because they can reduce statistical prediction models to a single numerical estimate of the probability of an event (such as relapse or death) tailored to the profile of an individual patient. A time-dependent receiver operating characteristic (ROC) curve was plotted to assess the prediction accuracy of the prognostic signature in HCC patients. Univariate and multivariate Cox regression analyses were used to analyze the relationship between the genes and clinicopathological parameters.
Statistical analyses were performed using R (version 3.5.3) and R Bioconductor software packages. The Benjamini–Hochberg method was used to convert P values to FDR. Perl was used for data matrix construction and data processing, and a P value less than 0.05 was considered statistically significant. Differentially expressed proteins between HCC and non-cancerous samples in CPTAC were identified using the thresholds |log2FC| > 1 and P < 0.05.
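As a rough illustration of how the area under a ROC curve (AUC) summarizes prediction accuracy, the sketch below computes the AUC as the fraction of (event, non-event) patient pairs in which the patient with the event has the higher risk score. It is a simplification: it treats the outcome as a fixed binary label at a single time point and ignores censoring, whereas the study evaluated ROC curves over time. The scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores and outcomes (1 = died by the chosen time point).
risk = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
died = [1, 1, 1, 0, 0, 0]

auc = roc_auc(risk, died)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.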
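The two statistical steps named above, Benjamini–Hochberg adjustment and the |log2FC| > 1 with P < 0.05 filter, can be sketched in Python as follows. This is an illustrative sketch, not the study's R pipeline. The protein names, fold changes, and P values are hypothetical, and the filter is applied to the raw P value because that is how the threshold is stated in the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical (name, log2 fold change, raw P value) records.
proteins = [
    ("protA", 1.8, 0.01),
    ("protB", -2.1, 0.04),
    ("protC", 0.4, 0.03),
    ("protD", 1.5, 0.20),
]

fdr = benjamini_hochberg([p for _, _, p in proteins])

# Keep proteins passing both thresholds and record the direction of change.
selected = [
    (name, "up" if fc > 0 else "down")
    for name, fc, p in proteins
    if abs(fc) > 1 and p < 0.05
]
print(selected)
print([round(q, 4) for q in fdr])
```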
Establishment of the prognostic gene signatures
Figure 1 presents a flow chart of the study scheme. A total of 159 patients diagnosed with hepatitis B-related HCC (159 tumor tissues and 159 paratumor tissues; Table S1) were included from the CPTAC database, and 422 differentially expressed proteins (112 upregulated and 310 downregulated; Table S2) were identified. To analyze the function of the differentially expressed proteins, gene ontology (GO) enrichment and KEGG pathway analyses were performed. GO analysis revealed that, for biological processes (BP), the differentially expressed proteins were enriched in fatty acid biosynthesis and catabolism. For molecular function (MF), they were mainly enriched in cofactor binding, coenzyme binding, vitamin binding, monooxygenase activity, carboxylic acid binding, iron ion binding, and organic acid binding. For cellular component (CC), they were mainly enriched in the mitochondrial matrix, MCM complex, collagen trimer, peroxisome, microbody, microbody part, peroxisomal part, peroxisomal matrix, and microbody lumen. KEGG pathway analysis revealed that the differentially expressed proteins were mainly enriched in retinol metabolism, chemical carcinogenesis, drug metabolism-cytochrome P450, fatty acid degradation, arginine biosynthesis, the PPAR signaling pathway, and other metabolic pathways (Fig. 2).
Protein-protein interaction (PPI) network construction and module analysis
To further explore the relationships among the differentially expressed proteins at the protein level, a PPI network was constructed based on their interactions. A total of 542 interactions and 236 nodes were screened to establish the PPI network, and the five most highly connected nodes were CDK1, AOX1, CYP2E1, CYP3A4, and TOP2A (Tables S3 and S4).
Survival data were extracted for HCC patients in CPTAC and used to perform univariate Cox regression analysis, which identified 105 survival-related proteins (P < 0.05, Table S5). Univariate and multivariate Cox regression analyses were then performed on the clinical factors and survival-related proteins, and 41 proteins that can act as independent prognostic factors for OS were identified (Tables S6 and S7). ROC curves were used to investigate the use of the protein patterns as early predictors of HCC incidence. Eight proteins (MCM3, MCM7, PCNA, SLC39A1, SMC2, TOP2A, UBE2C, and UHRF1) had an AUC value above 0.7 (Table S8). Table S9 presents detailed information about the relationship between these 8 proteins and clinical factors. The 8 proteins were used to build a prognostic model, and the median risk score was set as the threshold to divide the cohort into high-risk and low-risk groups. The detailed prognostic signature information of the HCC group is shown in Fig. 3.
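Ranking nodes in a PPI network is commonly done by degree, that is, the number of distinct interaction partners of each protein. The sketch below assumes that this is the ranking intended here, which the text does not state explicitly. The edge list is a toy example with placeholder node names, not the 542 interactions of Table S3.

```python
from collections import Counter

def node_degrees(edges):
    """Count distinct partners per node in an undirected interaction list.

    Duplicate edges (in either orientation) and self-loops are ignored.
    """
    unique = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    return degree

# Toy edge list with placeholder names, for illustration only.
edges = [
    ("N1", "N2"),
    ("N1", "N3"),
    ("N1", "N4"),
    ("N2", "N3"),
    ("N5", "N6"),
    ("N2", "N1"),  # duplicate of ("N1", "N2") in reverse order
]

degree = node_degrees(edges)
top_node, top_degree = degree.most_common(1)[0]
print(top_node, top_degree)
```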
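To make the univariate Cox regression step concrete, the sketch below fits a single coefficient by Newton-Raphson on the Cox partial likelihood with the Breslow form for the risk set. It is a teaching sketch under stated assumptions, not the study's R analysis: the follow-up times, event indicators, and expression values are hypothetical, the step-size cap of 1.0 is an arbitrary stabilizing choice, and no standard error or P value is computed.

```python
import math

def cox_univariate(times, events, x, max_iter=50, tol=1e-8):
    """Estimate the Cox coefficient for one covariate.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    x      : covariate value per patient (e.g. protein expression)
    Returns (beta, hazard_ratio).
    """
    beta = 0.0
    for _ in range(max_iter):
        grad = 0.0  # first derivative of the log partial likelihood
        info = 0.0  # negative second derivative (observed information)
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored patients contribute only via risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # patient j is still at risk at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0
            grad += x[i] - mean
            info += s2 / s0 - mean * mean
        if info <= 0.0:
            break  # no variation left in the risk sets; stop updating
        # Cap the Newton step (assumed safeguard against overshooting).
        step = max(-1.0, min(1.0, grad / info))
        beta += step
        if abs(step) < tol:
            break
    return beta, math.exp(beta)

# Hypothetical data: higher expression tends to go with earlier death.
times = [5, 8, 12, 15, 20, 25]
events = [1, 1, 1, 0, 1, 1]
expression = [2.0, 1.5, 1.8, 0.5, 1.0, 0.2]

beta, hazard_ratio = cox_univariate(times, events, expression)
print(beta > 0, hazard_ratio > 1)
```

A positive coefficient (hazard ratio above 1) indicates that higher expression is associated with a higher hazard of death, which is the direction reported for the survival-related proteins in this study.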
Building a predictive nomogram
A nomogram was constructed by combining clinicopathological features with the prognostic model. The LASSO logistic regression algorithm was used to select the most important prediction markers, which contributed greatly to the final prediction model. The model included the following features from CPTAC: gender, age, tumor differentiation, history of liver cirrhosis, number of tumors, tumor size, tumor thrombus, tumor encapsulation, HBcAb, AFP, PTT, TB, ALB, ALT, and GGT (Fig. 4). Combining the prognostic model with clinicopathological data can improve the sensitivity and specificity of 1-, 3-, and 5-year OS prediction.
Proteomics data were downloaded from TCPA-HCC (level 4; 184 samples and 218 proteins) and combined with clinical data from TCGA. Univariate Cox regression analysis identified the survival-related proteins (Table S10). These were intersected with the survival-related proteins from the CPTAC database, and four shared survival-related proteins, PCNA, MSH6, CDK1, and ASNS, were identified. The Human Protein Atlas (HPA) is a website that provides immunohistochemistry-based data on protein distribution and expression in 20 tumor tissues, 47 cell lines, 48 normal human tissues, and 12 blood cells. In this study, immunohistochemistry images were used to directly compare the protein expression of the four genes between normal and HCC tissues, and the results are shown in Fig. 5. PCNA, CDK1, and ASNS proteins were not expressed in normal liver tissues but were expressed at medium to high levels in HCC tissues. In addition, MSH6 showed low expression in normal tissues and high expression in tumor tissues. TIMER is a comprehensive resource for the systematic investigation of immune infiltrates across various malignancy types. Its differential gene expression module was used to explore PCNA, MSH6, CDK1, and ASNS based on thousands of variations in copy number or gene expression in patients with HCC. Consistent with our findings, the four proteins were significantly overexpressed in HCC patients in the TIMER database (Fig. 6). OS analysis demonstrated that patients with high expression of each of the four proteins had a poorer prognosis than those with low expression (P < 0.05) (Fig. 7).
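The intersection step can be pictured as a set operation on the proteins that pass the univariate Cox threshold in each database. In the sketch below, the P values are placeholders and are not taken from Table S5 or Table S10, and the names ending in `_ONLY_1` or starting with `NOT_SIG` are hypothetical entries added to show proteins that drop out.

```python
def survival_related(cox_p_values, alpha=0.05):
    """Return the proteins whose univariate Cox P value is below alpha."""
    return {protein for protein, p in cox_p_values.items() if p < alpha}

# Placeholder P values, NOT the values reported in Tables S5 or S10.
cptac_p = {
    "PCNA": 0.001, "MSH6": 0.010, "CDK1": 0.002, "ASNS": 0.020,
    "CPTAC_ONLY_1": 0.030,  # hypothetical: significant in CPTAC only
    "NOT_SIG_1": 0.400,     # hypothetical: not significant
}
tcpa_p = {
    "PCNA": 0.004, "MSH6": 0.030, "CDK1": 0.008, "ASNS": 0.045,
    "TCPA_ONLY_1": 0.015,   # hypothetical: significant in TCPA only
    "NOT_SIG_2": 0.300,     # hypothetical: not significant
}

# Proteins that are survival-related in both databases.
shared = sorted(survival_related(cptac_p) & survival_related(tcpa_p))
print(shared)
```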
Proteomic analysis of early-stage cancers provides new insights into the changes that occur in the early stages of tumorigenesis and represents a new resource of biomarkers for early-stage disease. The proteome characteristics of tumor cells distinguish them from normal cells and are critical to the study of their growth and survival. Signaling pathway proteins identified through proteomic analysis have become ideal targets for personalized therapeutic intervention in cancer patients. In this study, we identified novel and effective prognostic signatures for patients with HCC. These signatures show great potential for predicting HCC prognosis.
In this study, we performed a comprehensive proteomic analysis using CPTAC and downloaded proteomic data from TCPA (level 4), which were combined with clinical data from TCGA. We first identified 422 differentially expressed proteins and analyzed their functions, and then constructed a PPI network, in which CDK1 was the most connected node. For BP, the proteins were significantly enriched in acid biosynthetic and catabolic processes; for MF, mainly in the binding of biological compounds; and for CC, mainly in organelles and enzymes. The enriched KEGG pathways were retinol metabolism, chemical carcinogenesis, drug metabolism-cytochrome P450, fatty acid degradation, arginine biosynthesis, the PPAR signaling pathway, and other metabolic pathways. A recent study found that simvastatin can inhibit the HIF-1α/PPAR-γ/PKM2 axis, resulting in decreased proliferation and increased apoptosis in HCC cells. Similarly, Wang et al confirmed that the anticancer efficacy of avicularin in HCC was dependent on the regulation of PPAR-γ activities. Therefore, we hypothesize that the differentially expressed proteins identified may play a critical role in chemical carcinogenesis via the PPAR signaling pathway; however, further studies are needed to confirm this hypothesis. The analysis was then restricted to the intersection of survival-related proteins between the CPTAC and TCPA databases, and four survival-related proteins, PCNA, MSH6, CDK1, and ASNS, were identified.
Proliferating cell nuclear antigen (PCNA, also known as ATLD2) is a cofactor of DNA polymerase delta that is ubiquitinated in response to DNA damage. A recent study found that PCNA-knockdown HepG2 cells under hypoxia showed greater induction of the epithelial-mesenchymal transition (EMT) than control cells. PCNA and EMT-related markers were down-regulated following treatment with the Wnt/β-catenin signaling inhibitor XAV939, and the proliferative activity of HCC cells was significantly inhibited. MutS homolog 6 (MSH6) is a member of the DNA mismatch repair MutS family. Togni et al reported nuclear expression of MSH6 in HCC, which excluded a DNA mismatch repair defect, and Ozer et al studied the methylation status of MSH6, which is involved in DNA repair mechanisms. MSH6 is associated with an increased risk of breast cancer and should be considered in individuals with a family history of breast cancer. Another study evaluated metachronous colorectal cancer (CRC) incidence according to MSH6 gene status in Lynch syndrome (LS) patients who underwent a segmental colectomy. However, there is currently no comprehensive study on the role of MSH6 in HCC, and this study may provide important information for consideration in future studies. Cyclin-dependent kinase 1 (CDK1, also known as CDC2; CDC28A; P34CDC2) is a member of the Ser/Thr protein kinase family that is essential for the G1/S and G2/M phase transitions of the eukaryotic cell cycle. Anti-CDK1 treatment can boost sorafenib antitumor responses in HCC patient-derived xenograft (PDX) tumor models. Gao et al demonstrated that karyopherin subunit-α 2 (KPNA2) may promote tumor cell proliferation by increasing the expression of CDK1.
Asparagine synthetase (ASNS, also known as TS11; ASNSD) is involved in the synthesis of asparagine. The expression of ASNS has been reported to be high in HCC tumor tissues and closely correlated with the serum AFP level, tumor size, microscopic vascular invasion, tumor encapsulation, TNM stage, and BCLC stage. Li et al found that ASNS expression was decreased and that it functioned as an independent predictor of OS in HCC patients. The OS analysis in the present study demonstrated that patients with high expression of each of these four proteins had a poorer prognosis than those with low expression.
A total of 41 proteins were identified that can serve as independent prognostic factors for OS. Among them, 8 proteins (MCM3, MCM7, PCNA, SLC39A1, SMC2, TOP2A, UBE2C, and UHRF1) had an AUC value above 0.7. Combining the prognostic model with clinicopathological data can improve the sensitivity and specificity of 1-, 3-, and 5-year OS prediction. The 8 proteins were used as candidates for the prognostic model, and SLC39A1 and UBE2C were finally chosen to build it. Solute carrier family 39 member 1 (SLC39A1, also known as ZIP1, ZIRTL), acts as a molecular zipper to bring homologous chromosomes to close apposition. In prostate cancer, zinc levels have been reported to be decreased and the ZIP1 transporter is lost. Similarly, studies reveal that hZIP1 (SLC39A1) is expressed in the zinc-accumulating human prostate cell lines LNCaP and PC-3. However, the role of SLC39A1 in HCC remains unknown. Ubiquitin-conjugating enzyme E2 C (UBE2C, also known as UBCH10; dJ447F3.2) is an enzyme required for the destruction of mitotic cyclins and for cell cycle progression. Studies have demonstrated that knockdown of UBE2C expression suppresses the proliferation, migration, and invasion of HCC cells in vitro. Moreover, silencing of UBE2C increases the sensitivity of HCC cells to sorafenib. This study has limitations. The results have not been validated in clinical samples, and the relatively small number of patients limits the accuracy of the clinical conclusions.
This study established a novel protein signature for HCC prognosis prediction using data retrieved from online databases. However, the signatures need to be verified using independent cohorts and functional experiments.
Availability of data and materials
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
CPTAC: Clinical Proteomic Tumor Analysis Consortium
TCPA: The Cancer Proteome Atlas
TCGA: The Cancer Genome Atlas
Torre LA, Bray F, Siegel RL, Ferlay J, Lortet-Tieulent J, Jemal A. Global cancer statistics, 2012. CA Cancer J Clin. 2015;65(2):87–108.
Coskun M. Hepatocellular carcinoma in the cirrhotic liver: evaluation using computed tomography and magnetic resonance imaging. Exp Clin Transplant. 2017;15(Suppl 2):36.
Lang H, Sotiropoulos GC, Brokalaki EI, Schmitz KJ, Bertona C, Meyer G. Survival and recurrence rates after resection for hepatocellular carcinoma in noncirrhotic livers. J Am Coll Surg. 2007;205(1):27–36.
Jiao Y, Fu Z, Li Y, Meng L, Liu Y. High EIF2B5 mRNA expression and its prognostic significance in liver cancer: a study based on the TCGA and GEO database. Cancer Manag Res. 2018;10:6003–14.
Byeon H, Lee SD, Hong EK, Lee DE, Kim BH, Seo Y, Joo J, Han SS. Long-term prognostic impact of osteopontin and Dickkopf-related protein 1 in patients with hepatocellular carcinoma after hepatectomy. Pathol Res Pract. 2018;214(6):814–20.
Shen Y, Bu L, Li R. Screening effective differential expression genes for hepatic carcinoma with metastasis in the peripheral blood mononuclear cells by RNA-seq. Oncotarget. 2017;8(17):27976–89.
Menschaert G, Fenyö D. Proteogenomics from a bioinformatics angle: a growing field. Mass Spectrom Rev. 2017;36(5):584–99.
Chiou SH, Lee KT. Proteomic analysis and translational perspective of hepatocellular carcinoma: identification of diagnostic protein biomarkers by an onco-proteogenomics approach. Kaohsiung J Med Sci. 2016;32(11):535–44.
Chen J, Cao X, Qin X. Proteomic analysis of the molecular mechanism of curcumin/β-cyclodextrin polymer inclusion complex inhibiting HepG2 cells growth. J Food Biochem. 2020;44:e13119.
Whiteaker JR, Halusa GN, Hoofnagle AN, Sharma V, MacLean B, Yan P, Wrobel JA, Kennedy J, Mani DR, Zimmerman LJ, Meyer MR, Mesri M, Rodriguez H; Clinical Proteomic Tumor Analysis Consortium (CPTAC), Paulovich AG. CPTAC assay portal: a repository of targeted proteomic assays. Nat Methods. 2014;11(7):703–4.
Li J, Lu Y, Akbani R, Ju Z, Roebuck PL, Liu W, Yang J-Y, Broom BM, Verhaak RGW, Kane DW, et al. TCPA: a resource for cancer functional proteomics data. Nat Methods. 2013;10(11):1046–7.
Akbani R, Ng PK, Werner HM. A pan-cancer proteomic perspective on The Cancer Genome Atlas. Nat Commun. 2014;5:3887.
Iasonos A, Schrag D, Raj GV, Panageas KS. How to build and interpret a nomogram for cancer prognosis. J Clin Oncol. 2008;26(8):1364–70.
Gao Q, Zhu H, Dong L. Integrated Proteogenomic characterization of HBV-related hepatocellular carcinoma. Cell. 2019;179(5):1240.
Asplund A, Edqvist PH, Schwenk JM, Pontén F. Antibodies for profiling the human proteome-the human protein atlas as a resource for cancer research. Proteomics. 2012;12:2067–77.
Jiang Y, Sun A, Zhao Y. Proteomics identifies new therapeutic targets of early-stage hepatocellular carcinoma. Nature. 2019;567(7747):257–61.
Feng J, Dai W, Mao Y. Simvastatin re-sensitizes hepatocellular carcinoma cells to sorafenib by inhibiting HIF-1α/PPAR-γ/PKM2-mediated glycolysis. J Exp Clin Cancer Res. 2020;39(1):24.
Wang Z, Li F, Quan Y. Avicularin ameliorates human hepatocellular carcinoma via the regulation of NF-κB/COX-2/PPAR-γ activities. Mol Med Rep. 2019;19(6):5417–23.
Jo H, Lee J, Jeon J. The critical role of glucose deprivation in epithelial-mesenchymal transition in hepatocellular carcinoma under hypoxia. Sci Rep. 2020;10(1):1538.
Liao S, Chen H, Liu M, Gan L, Li C, Zhang W, Lv L, Mei Z. Aquaporin 9 inhibits growth and metastasis of hepatocellular carcinoma cells via Wnt/β-catenin pathway. Aging (Albany NY). 2020;12(2):1527–44.
Togni R, Bagla N, Muiesan P, Miquel R, O'Grady J, Heaton N, Knisely AS, Portmann B, Quaglia A. Microsatellite instability in hepatocellular carcinoma in non-cirrhotic liver in patients older than 60 years. Hepatol Res. 2009;39(3):266–73.
Ozer O, Bilezikci B, Aktas S, Sahin FI. Methylation profile analysis of DNA repair genes in hepatocellular carcinoma with MS-MLPA. Diagn Mol Pathol. 2013;22(4):222–7.
Roberts ME, Jackson SA, Susswein LR, Zeinomar N, Ma X, Marshall ML, Stettner AR, Milewski B, Xu Z, Solomon BD, et al. MSH6 and PMS2 germ-line pathogenic variants implicated in lynch syndrome are associated with breast cancer. Genet Med. 2018;20(10):1167–74.
Quezada-Diaz FF, Hameed I, von Mueffling A, Salo-Mullen EE, Catalano JD, Smith JJ, Weiser MR, Garcia-Aguilar J, Stadler ZK, Guillem JG. Risk of Metachronous colorectal neoplasm after a segmental colectomy in lynch syndrome patients according to mismatch repair gene status. J Am Coll Surg. 2020;230(4):669–75.
Xing WC, Qi WX, Ho CS. Blocking CDK1/PDK1/β-catenin signaling by CDK1 inhibitor RO3306 increased the efficacy of sorafenib treatment by targeting cancer stem cells in a preclinical model of hepatocellular carcinoma. Theranostics. 2018;8(14):3737–50.
Gao CL, Wang GW, Yang GQ, Yang H, Zhuang L. Karyopherin subunit-α 2 expression accelerates cell cycle progression by upregulating CCNB2 and CDK1 in hepatocellular carcinoma. Oncol Lett. 2018;15(3):2815–20.
Zhang B, Dong LW, Tan YX, Zhang J, Pan YF, Yang C, Li MH, Ding ZW, Liu LJ, Jiang TY, et al. Asparagine synthetase is an independent predictor of surgical survival and a potential therapeutic target in hepatocellular carcinoma. Br J Cancer. 2013;109(1):14–23.
Li W, Dong C. Polymorphism in asparagine synthetase is associated with overall survival of hepatocellular carcinoma patients. BMC Gastroenterol. 2017;17(1):79.
Sym M, Engebrecht JA, Roeder GS. ZIP1 is a synaptonemal complex protein required for meiotic chromosome synapsis. Cell. 1993;72(3):365–78.
Costello LC, Franklin RB, Zou J. Human prostate cancer ZIP1/zinc/citrate genetic/metabolic relationship in the TRAMP prostate cancer animal model. Cancer Biol Ther. 2011;12(12):1078–84.
Franklin RB, Ma J, Zou J, Guan Z, Kukoyi BI, Feng P, Costello LC. Human ZIP1 is a major zinc uptake transporter for the accumulation of zinc in prostate cells. J Inorg Biochem. 2003;96(2–3):435–42.
Xiong Y, Lu J, Fang Q, Lu Y, Xie C, Wu H, Yin Z. UBE2C functions as a potential oncogene by enhancing cell proliferation, migration, invasion, and drug resistance in hepatocellular carcinoma cells. Biosci Rep. 2019;39(4):BSR20182384.
This work is not supported by grants.
Ethics approval and consent to participate
No permissions were required to use the repository data.
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table S1. The detailed clinical information of CPTAC-HCC patients.
Table S2. The 422 differentially expressed proteins identified using the CPTAC database.
Table S3. A total of 542 interactions and 236 nodes screened to establish the PPI network.
Table S4. The top five most contiguous nodes: CDK1, AOX1, CYP2E1, CYP3A4, and TOP2A.
Table S5. Cox regression analysis of the identified 105 survival-related proteins.
Table S6. Univariate Cox regression analysis of survival-related proteins.
Table S7. Multivariate Cox regression analysis of survival-related proteins and 41 proteins identified as independent prognostic factors for OS.
Table S8. ROC curves investigating the use of the protein patterns as early predictors of HCC incidence and the 8 proteins with AUC value above 0.7.
Table S9. The relationship between the 8 proteins and clinical factors.
Table S10. Univariate Cox regression analysis exploring the expression of survival-related proteins in the TCPA database.
About this article
Cite this article
Wu, Z., Yang, D. Identification of a protein signature for predicting overall survival of hepatocellular carcinoma: a study based on data mining. BMC Cancer 20, 720 (2020). https://doi.org/10.1186/s12885-020-07229-x
- Hepatocellular carcinoma