Differentially expressed PRGs in the TCGA cohort and landscape of genetic and expression variation of PRGs in HCC
The flowchart of data analysis was shown in Fig. 1. We compared the expression of 52 PRGs in TCGA data from 50 adjacent tissues and 374 tumor tissues, and finally 42 DEGs were identified. Among these DEGs, 5 genes were down-regulated in tumor group including IL1B, AIM2, IL6, NLRP3, and NLRP6. The other 37 genes were enriched in the tumor group, including BAK1, BAX, CASP3, CASP4, CHMP2A, CHMP2B, CHMP3, CHMP4A, CHMP4B, CHMP4C, CHMP6, CHMP7, CYCS, GSDMD, GSDME, HMGB1, IL1A, IRF2, TP53, TP63, CASP6, CASP8, CASP9, GPX4, GSDMA, GSDMB, GSDMC, NLRP1, NLRP7, NOD1, NOD2, PJVK, PLCG1, PRKACA, PYCARD, SCAF11, and TIRAP (Fig. 2A). We then demonstrated the incidence of CNVs and somatic mutations of 52 PRGs in HCC. As shown in Fig. 2B, genetic mutation was identified in 157 of 364 (43.13%) HCC samples. Missense mutation was the most common variant. In addition, TP53 gene showed the highest mutation frequency, followed by NLRP2 and NLRP3 genes (Fig. 2B). Figure 2C presented the location of CNV alterations of the 52 PRGs on chromosomes. For the CNV alteration frequency, all the 52 PRGs showed prevalent CNV alteration. More than half of the 52 PRGs had copy number amplification, while the CNV deletion frequencies of CASP9, CASP3, HMGB1, ELANE, CASP6, IRF2, GSDMB, GSDMA, GPX4, CASP4, CASP5, CASP1, IL18, TIRAP, CHMP2B, NLRP1, TP53 and CHMP7 were widespread (Fig. 2D).
The PPI network analysis was given to further explore the interactions of these PRGs. As shown in Supplementary Fig. S1A, the minimum interaction score was set at 0.9 with the highest confidence. Seven hub genes were screened including IL1B, NLRP3, PYCARD, CASP8, CASP3, TP53, as well as CHMP2A. The correlation network containing all PRGs was presented in Supplementary Fig. S1B.
Differentially expressed PRGs between HCC tissues and adjacent Normal tissues
We found 42 differential PRGs in normal paracancer samples and tumor samples of HCC by analyzing TCGA database. The Volcano plots of DEGs is shown in Supplementary Fig. S2. To explore the correlation between the expression of the 42 DEGs and HCC clusters, we performed a consensus clustering analysis involving 374 HCC patients in the TCGA cohort. When the clustering variable (k) was set at 2, the intragroup correlations were the highest and the intergroup correlations were the lowest, indicating that the 374 HCC patients could be well divided into two clusters based on the 42 DEGs (Fig. 3A). The gene expression profile and the clinical features were presented in a heatmap, which showed significant differences in the distribution of T stage, stage, grade and gender between the two cluster groups (P < 0.05). In contrast, there was no statistical difference in the distribution of N stage, M stage and age between the two groups (P > 0.05, Fig. 3B). There is a significant difference in the OS time of two clusters (P < 0.001, Fig. 3C).
Development of a prognostic gene model in the TCGA cohort
A total of 374 HCC samples were matched with the corresponding patients with complete survival information. Univariate Cox regression analysis was used to screen the survival-related pyroptosis genes (Fig. 4A). Then, these genes were performed by multivariate Cox regression analysis. In total, 10 genes were identified and used for the subsequent modeling, including BAK1, BAX, CHMP2A, GSDME, IL1A, TP53, TP63, GPX4, PRKACA and SCAF11. The risk score was calculated as follows: risk score = (0.368611051021708 * expression of BAK1) + (0.308688517099686 * expression of BAX) + (− 0.52007432297355 *expression of CHMP2A) + (0.330587747807719 * expression of GSDME) + (− 0.807361948750797 * expression of IL1A) + (− 0.323671479794998 * expression of TP53) + (− 0.604855494168515* expression of TP63) + (0.512445054990862 * expression of GPX4) + (− 0.283264209118667 * expression of PRKACA) + (0.432681763927682 * expression of SCAF11). Based on the median score calculated by the risk score formula, 374 patients were equally divided into low-risk group and high-risk group (Fig. 4B). Patients in the high-risk group showed a higher death rate and a shorter survival time than those in the low-risk group (Fig. 4C). A notable difference in OS time was detected between the low-risk group and high-risk group (P < 0.001, Fig. 4D). Time dependent ROC analysis was applied to evaluate the sensitivity and specificity of the prognostic model, yielding AUC of 0.766 for 1-year, 0.694 for 3-year, and 0.676 for 5-year survival, respectively (Fig. 4E).
External validation of the risk signature
Patients from GSE14520 and ICGC-LIRI-JP cohorts were utilized as the validation set. Kaplan–Meier analysis indicated a significant difference in the survival rate between the low-risk group and high-risk group in the ICGC-LIRI-JP cohort (P = 0.008, Fig. 5A), as well as the GSE14520 cohort (P = 0.027, Fig. 5B). ROC curve analysis of the ICGC cohort showed that our model had good predictive efficacy for 1-year (AUC = 0.614) and 3-year surivival (AUC = 0.683) (Fig. 5C), respectively. The GSE14520 cohort showed that the model had good predictive efficacy for 3-year survival (AUC = 0.571) and 5-year survival (AUC = 0.637) (Fig. 5D).
Independent prognostic value of the risk model
Univariate and multivariable Cox regression analyses were performed to evaluate whether the risk score derived from the gene signature model could serve as an independent prognostic factor. The univariate Cox regression analysis indicated that the risk score was an independent factor for poor survival in the TCGA cohort (HR = 1.601, 95% CI: 1.374–1.864, Fig. 6A). The multivariate analysis also implied that, after adjusting for other confounding factors, the risk score was an independent prognostic factor for patients with HCC in the TCGA cohort (HR = 1.485, 95% CI: 1.261–1.750, Fig. 6B). In addition, a heatmap of clinical features for the TCGA cohort indicated that the T stage and grade were differently distributed between the low-risk group and high-risk group (Fig. 6C).
Establishment and evaluation of a nomogram for predicting patient 1-year, 3-year and 5-year OS
Four prognostic factors were combined to establish a nomogram for predicting 1-year, 3-year and 5-year OS based on the TCGA dataset (Fig. 7A). The calibration curves for predicting 1-year, 3-year and 5-year OS were in good agreement with the observed values (Fig. 7B). The AUC for predicting 1-year, 3-year and 5-year OS was 0.81, 0.80 and 0.76, respectively (Fig. 7C).
Relationship between prognostic signature and immune infiltration
Firstly, we examined component differences of immune cells between high- and low-risk groups, as well as risk score values. Spearman correlation analysis was performed using different algorithms, with a resulting lollipop shape, as displayed in Fig. 8A. The results indicated that most immune cells were positively correlated with the risk score, which was consistent with our GSEA finding that the high-risk group was predominantly enriched in immune-related pathways. The heatmap demonstrated that the infiltration of most immune cells was higher in the high-risk group than in the low-risk group (Fig. 8B). We further elucidated the correlation of PRGs expression with each type of immune cell infiltration. The infiltration of NK cells was positively correlated with the expression of CHMP2A, while the infiltration of macrophages M1 was negatively correlated with the GSDME (Supplementary Fig. S3).
Gene set enrichment analysis (GSEA) and mutation data analysis of PRGs between the high- and low-risk group
GSEA had an advantage in exploring the involved signaling pathways, which revealed that the genes in the high-risk group of TCGA cohorts were significantly enriched in tumor and immune-related pathways such as B cell receptor signaling pathway, T cell receptor signaling pathway, P53 signaling pathway, pathways involved in the pathogenesis of cancer and cell cycle. In contrast, the low-risk group genes were significantly enriched in metabolism-related pathways such as complement and coagulation cascades, drug metabolism cytochrome p450, retinol metabolism, fatty acid metabolism, as well as linoleic acid metabolism (Fig. 9A). Meanwhile, the top 2 driver genes TP53 and CTNNB1 were significantly different between high (Fig. 9B) and low-risk groups (Fig. 9C).
IC50 of chemotherapeutic drugs between the high- and low-risk group
We further examined whether the risk score can predict the sensitivity of patients to chemotherapy, which showed that patients in the high-risk group were more sensitive to Axitinib, Dasatinib, Erlotinib, Lapatinib (p < 0.001) (Fig. 10A). The patients in the low-risk group were more sensitive to Gemcitabine, Nilotinib, Camptothecin, and Tipifarnib (p < 0.001) (Fig. 10B). These results suggested that PRGs were of great significance in targeted drug therapy.
Validated PRGs between HCC tissues and adjacent Normal tissues
To explore the expression of BAK1, BAX, CHMP2A, GSDME, IL1A, TP53, TP63, GPX4, PRKACA and SCAF11 in HCC tissues, we detected PRGs expression in HCC tissues from 30 patients by qRT-PCR assay. The results of qRT-PCR suggested that BAK1, BAX, CHMP2A, GSDME, IL1A, TP53, TP63, GPX4, PRKACA and SCAF11 were highly expressed in HCC tissues (Fig. 11A-J).