Eliminating of batch effect
To remove the batch effect among the three distinct datasets, “limma” and “sav” algorithms (see Section Method) were performed. In total, 16,421 genes were extracted from the three HCC cohorts. Considering the batch effect, HCC samples were gathered in batches according to the top two principal components of the non-normalized mRNA sequencing (Fig. 1A). After removal of the batch effect, the scatter plot based on the PCA of the normalized expression matrix indicated successful removal of the batch effect by cross-platform normalization (Fig. 1B).
Landscape of tumor immune microenvironment
The CIBERSORT algorithm was used to determine the comprehensive TIME context (Table S1). Figure 1C shows the relative abundance of the 22 TIC subpopulations. The correlations between TIME patterns and clinical traits were investigated and are presented in a comprehensive heatmap (Fig. 1D). To further uncover the potential intrinsic interactions of infiltrating immune cells, correlation analysis was conducted to visualize the comprehensive landscape of the TIME (Fig. 1E). Notably, CD8+ T cells were the most negatively correlated with resting CD4+ T memory cells, whereas CD8+ T cells were the most positively correlated with activated CD4+ T memory cells.
Construction of WGCNA co-expression network
A gene matrix within 15,420 genes and relative subpopulations of 22 immune infiltrations were included for further analyses of the WGCNA co-expression network. The first power value [4] when the index of scale-free topologies achieved 0.90 was set as the optimal soft threshold power (β) to construct the scaleless network (Fig. 2A). Similar expressed genes were assigned to the same candidate module using a dynamic tree-cutting algorithm (module size =60), thereby creating a hierarchical 23-module clustering tree (Fig. 2B). The columns represent 22 TIC abundances, whereas the rows represent the 23 modules with different traits (Fig. 2C). Notably, the brown module was significantly and positively correlated with γδ T cell infiltration (cor = 0.33, P = 1e-14). Our primary concern was the infiltration of γδ T cells; therefore, the 1755 genes (Table S2) from the brown module were included in the subsequent analysis.
GO and KEGG functional annotation
To further reveal the potential role of γδ T cell-specific genes in biological processes, GO and KEGG pathway enrichment analyses were performed (Tables S3 and S4, respectively). The GO pathway enrichment analysis indicated that γδ T cell-specific genes were predominantly involved in T cell activation, regulation of lymphocyte activation, and regulation of immune effector processes in biological processes, and concentrated in the secretory granule membrane, external side of the plasma membrane, and endosome membrane in cellular components. These genes are also involved in cytokine receptor binding, carbohydrate binding, and cytokine activity in molecular function (Fig. S1A-C). As for the KEGG terms analysis, the top three enriched pathways were cytokine-cytokine receptor interaction, Epstein-Barr virus infection, and tuberculosis (Fig. S1D).
Construction of prognostic risk signature
In total, 448 γδ T cell-related hub genes (23 downregulated genes and 425 upregulated genes) were identified as DEGs between tumor and normal tissue samples (Fig. 2D). To explore the prognosis prediction capability of these genes, expression data and clinical data were extracted from TCGA-LIHC project data. Using univariate Cox regression, 194 γδ T cell-specific genes were screened for prognostic significance (P < 0.05, Table S6). Next, the LASSO algorithm was performed to identify 11 hub γδ T cell-related genes (Fig. 2E, F). Using a multivariate proportional hazards model, we identified five γδ T cell-specific genes (ATP1B3, PZP, ST6GALNAC4, RFESD, and IFRD2) as the final hub genes, among which PZP was indicated as a favorable prognostic indicator (all hazard ratios [HRs] < 1), whereas the other four genes (ATP1B3, ST6GALNAC4, RFESD, and IFRD2) were considered risk prognostic factors (all HRs > 1, Table S7). Gene expression values based on TCGA database showed that the expression levels of most genes were significantly and abnormally regulated in HCC samples relative to normal tissues (Fig. S2A-E). In addition, survival analysis between the low- and high-gene expression subgroups showed that abnormal mRNA expression of most hub genes resulted in significantly distinct prognostic outcomes (most P < 0.05, Fig. S3A-E).
Finally, these five hub genes were introduced into the prognostic risk model for HCC. The risk score was calculated as follows:
$$Risk\;score\:=\:(0.3115\:\times\;expression\;level\;of\;ATP1B3)\:+\:(0.3777\:\times\:expression\;level\;of\;ST6GALNAC4)\:+\:(0.9482\:\times\:expression\;level\;of\;RFESD)\:+\:(0.3308\:\times\:expression\;level\;of\;IFRD2)\;-\;(0.5348\:\times\:expression\;level\;of\;PZP)$$
Finally, each HCC sample, together with the corresponding risk score, was stratified into low- and high-risk groups according to the median cutoff value.
Validation of prognostic risk signature
Figure 3A shows the distributions of the hub gene expression values in the patient and risk groups. In addition, risk score allocations and dot pot of survival status highlighted that high-risk HCC patients exhibited shorter OS times (Figure 3B, C). Kaplan-Meier survival analysis revealed that high-risk patients showed significantly poorer prognosis relative to low-risk patients (P < 0.001, Fig. 3D). Moreover, the area under the curve (AUC) was 0.747, 0.750, and 0.729 at 1, 2, and 3 years, respectively (Fig. 3E). Next, the univariate Cox analysis determined the HR for the risk score was 1.446 (95% confidence interval [CI]:1.310–1.595; Fig. 3F). Furthermore, multivariate Cox regression analysis (HR = 1.416, 95% CI:1.268–1.580; Fig. 3G) demonstrated that risk score is an independent prognosis prediction factor in HCC.
These results were further validated in the internal testing group to demonstrate the prognostic performance of the constructed model. The distributions of gene expression patterns, survival status/time, and risk scores in the internal testing group and the entire HCC cohort are shown in Fig. 4A-C. Moreover, Kaplan-Meier curves were analyzed, and we found that low-risk HCC patients exhibited notably longer OS times compared with high-risk patients in the internal testing group (Fig. 4D, P =0.013). The areas under the ROC curves were both 0.74 or higher in the testing group (Fig. 4E), suggesting the robustness of this risk model for prognosis prediction in HCC. Likewise, risk score was as an independent prognosis prediction trait in both the univariable and multivariable regression analyses of the testing group (Fig. 4F-G).
The signature was applied to the ICGC-LIRI-JP cohort to validate the external prognosis prediction performance. Fig. S4A-C shows the distributions of the six gene expression patterns, sample survival status, and corresponding risk score in the external validation cohort. Furthermore, survival analysis showed that high-risk HCC patients had a poorer prognosis than low-risk patients (Fig. S4D, p = 0.003). The areas under the ROC values were more than 0.68 at 1, 2, and 3 years in the external testing cohort (Fig. S4E), which was consistent with our previous results in the training group. Taken together, our results confirm the external validity of the prognostic risk signature in distinct populations.
Clinical significance of risk score
Subsequently, the distribution of clinical variables in distinct risk subgroups was investigated (Fig. 5A). Figure 5B-D show the fraction of subgroups for clinicopathological grade, clinical staging, and T status in the low- and high-risk subgroups, respectively. We discovered that with advanced pathological grade (4/6, P < 0.05, Fig. 5E), late tumor staging (2/6, P < 0.05, Fig. 5F), and higher T status (4/6, P < 0.05, Fig. 5G), the risk score significantly increased.
Stratification survival analyses were performed to explore whether the risk score maintained its prognosis prediction capability when HCC patients were partitioned into subgroups with different clinical features. Relative to patients with high-risk HCC, samples from low-risk HCC patients showed higher OS probability in the ≤65- or > 65-year-old age subgroups (Fig. S5A, B). Similarly, the risk score showed great prognostic capability for both female and male patients (Fig. S5C, D), as well as tumor grades 1–2 or 3–4 (Fig. S5E, F), early- and late-stages (Fig. S5G, H), T1–2 or T3–4 categories (Fig. S5I, J), N0 status (Fig. S5K), and M0 category (Fig. S5L). These findings suggest that risk score may be an excellent prognostic predictor, independent of clinical traits.
Correlation of risk with immune infiltration and therapeutic estimation
As the γδ T cell-based risk signature was derived from immune infiltration profiling, the potential contribution of the risk score to immune infiltration was further explored. The results showed that risk score was significantly and negatively associated with infiltration of resting natural killer (NK) cells, whereas it was positively correlated with an abundance of CD8+ T cells, cancer-associated fibroblasts, monocytes, and M1 macrophages (Fig. 6A). The detailed results are presented in Table S8.
Subsequently, the possible roles of risk scores in immunotherapy were investigated. First, a correlation analysis of ICB hub targets (PDCD1, CD274, PDCD1LG2, CTLA-4, HAVCR2, and IDO1) [33,34,35] and risk score was conducted. The risk score was significantly and positively associated with the expression levels of CD274 (r = 0.38; P = 4.9e-14), CTLA-4 (r = 0.22; P = 3.1e-05), HAVCR2 (r = 0.33; P = 5.1e− 11), IDO1 (r = 0.16; P = 0.0028), PDCD1 (r = 0.19; P = 0.00027), and PDCD1LG2 (r = 0.28; P = 4e− 8; Fig. 6B-G), suggesting that risk score may act as a key driving factor in immunotherapeutic prediction of HCC.
Using the “pRRophetic” algorithm, the IC50 values of four chemotherapeutic drugs (cisplatin, gemcitabine, metformin, and nilotinib) were evaluated in HCC patients. Cisplatin and gemcitabine presented a lower IC50 trend in high-risk patients (both p < 0.05; Fig. 6H, I). Conversely, the IC50 of metformin and nilotinib was lower in HCC samples from low-risk patients (both p < 0.05; Fig. 6J, K). These findings demonstrate that patients from different risk subgroups were sensitive to distinct chemotherapeutic drugs.
Enrichment of signaling pathways in low- and high-risk groups
To further reveal the biological roles of distinct risk groups in tumorigenicity and progression, GSVA was performed (Fig. 7A, B). The risk score was positively correlated with heightened activities of the Wnt-β-catenin, tumor necrosis factor (TNF)α/NF-κB, transforming growth factor (TGF)-β, phosphatidylinositol-3-kinase (PI3K)/AKT/mammalian target of rapamycin (Mtor), P53, Notch, interleukin (IL)6/Janus kinase (JAK)/signal transducer and activator of transcription (STAT)3, IL2/STAT5, mitogen activated protein kinases (MAPK), and KRAS signaling pathways. These results suggest that high-risk patients may be more suitable for administration of targeted molecular inhibitors. The risk score was negatively correlated with gene sets enriched in bile and fatty acid metabolism signaling pathways. To further reveal the underlying mechanism in low-risk patients, an investigation of metabonomic data is required in the future.
Association between Risk and TMB.
Accumulating studies have provided strong evidence that TMB may act as a crucial indicator of sensitivity to anti-cancer treatment. Therefore, we investigated the intrinsic interaction of γδ T cell-based risk with TMB and elucidated the hereditary variations between risk subtypes. First, the TMB level of each HCC sample was measured. The TMB value showed a relatively lower trend in low-risk patients than in high-risk patients (p = 0.024, Fig. 8A). Subsequently, the HCC samples were divided into distinct subgroups based on the TMB value set point. Survival analysis further indicated that the TMB value can provide a significant prognostic difference (p < 0.001, Fig. 8B). Correlation analysis indicated that the TMB level was significantly and positively associated with risk score (R = 0.15, p = 0.0047, Fig. 8C). As revealed in the stratified survival curves, there was no interference of risk score with TMB value in prognosis prediction. The risk subgroups showed significant prognostic differences in both the low and high TMB value subtypes (p < 0.001, Fig. 8D). Furthermore, the multivariable regression analysis showed that TMB value was not an independent prognosis prediction factor for HCC (Fig. 8E, F). Our results suggest that risk score has the potential to predict anti-cancer therapy outcomes.
Subsequently, the distribution of somatic variants was investigated in both the high- and low-risk subgroups. The comprehensive context of the gene mutations showed the variation patterns and clinical traits of the top 20 driver genes with the most frequent alterations (Fig. 8G, H). The mutational landscapes showed that TTN experienced the highest somatic mutation rates in the low-risk score subtype, whereas TP53 possessed the highest somatic mutation rates in the high-risk score subgroup (Fig. 9A, D). In the high-risk subgroup, missense mutation was the primary variant classification; most mutations belonged to single nucleotide polymorphisms, and C > T was the most common variation, with the highest number of variations per sample and the median of variation types (Fig. 9B, C). Similar results were observed in the low-risk subgroup (Fig. 9E, F). These results provide novel insights into the intrinsic interactions of somatic variants with γδ T cell patterns in HCC.
Development of novel prognostic nomogram
ROC curves were analyzed, and the AUC values for 1-, 2-, and 3-year OS were 0.715, 0.741, and 0.754, respectively, demonstrating the high predictive accuracy of the constructed risk model (Fig. 10A). To demonstrate that risk score was the best prognostic predictor among various clinical characteristics, age, sex, tumor stage, clinical grade, and TNM category were considered as candidate prognostic factors. These clinicopathological variables were used to implement the AUC analysis for 1-, 2-, and 3-year OS; risk score exhibited the highest AUC value (Fig. 10B-D). Subsequently, a novel nomogram, including risk score and clinical stage, was established for OS probability prediction (Fig. 10E). Age, sex, and tumor grade were excluded from the nomogram because their AUC values were less than 0.6. Calibration curves were plotted, which suggested high prognostic accuracy of the risk-stage nomogram (Fig. 10F-H). In addition, this nomogram was used in the ICGC-LIRI-JP cohort to confirm its external prognostic ability. Figure 11A shows the AUC values for 1-, 2-, and 3-year OS of the risk scores in the external validation cohort. Moreover, risk and stage were good indicators with AUC values > 0.6 (Fig. 11B-D). The survival rates of HCC patients from the ICGC-LIRI-JP cohort were predicted using the constructed nomogram (Fig. 11E) and its prognostic robustness was validated (Fig. 11F). Finally, the ROC curve was analyzed to confirm the prognostic performance of the nomogram relative to other prognostic factors in TCGA-LIHC project (Fig. 11G).
Potential function of RFESD in prognosis prediction, mechanism of immune infiltration, and therapeutic estimation
The potential role of RFESD as a prognostic γδ T cell-related gene has not yet been revealed yet in HCC. Thus, the biological roles of RFESD were further explored in subsequent analyses. The expression values of RFESD were compared between normal and tumor tissue samples based on TCGA dataset. For normal specimens and tumor tissues, RFESD expression levels showed a lower trend in normal compared with tumor tissue samples (Fig. 12A). Using qRT-PCR, the expression levels of RFESD in hepatic cell lines and two distinct hepatic cancer cell lines were investigated. Likewise, hepatic cancer cells exhibited significantly higher RFESD values than normal liver cells (Fig. 12B). To estimate the prognosis prediction performance of RFESD, survival analysis was performed to compare HCC patients with low and high RFESD expression. These results showed that a higher expression value of RFESD was significantly associated with a shorter OS time (P = 0.00093, Fig. 12C).
To reveal the biological roles of RFESD in immune infiltration, the correlation between RFESD expression and the abundance of infiltrating immune cells was investigated using the TIMER database. Arm-level gain was the main mutation type in CD4+ T cells, macrophages, and neutrophils (Fig. 12D). In addition, the expression level of RFESD was significantly correlated with neutrophil infiltration (r = 0.134; P = 1.26e− 02; Fig. 12E).
Next, the correlation of RFESD with key ICB genes adjusted by tumor purity was analyzed to investigate the potential functions of RFESD in immunotherapeutic prediction. These results showed that RFESD had a significant positive association with two of the four key ICB genes, including CD274 (r = 0.136; P = 1.16e− 02) and PDCD1LG2 (r = 0.117; P = 2.97e− 02; Fig. 12F-I), indicating that RFESD is an indispensable regulator in HCC immunotherapy.
Downregulation of RFESD suppressed HCC cell proliferation
To further reveal the biological roles of RFESD, RFESD siRNA was used to silence RFESD protein expression. The transfection effect on HCC cells was first estimated by qRT-PCR, and we observed that the relative mRNA expression value of RFESD was significantly lower after transfection with siRNA (Fig. 13A, P = 0.023). To further reveal the possible role of RFESD in cell proliferation, CCK-8 assays were performed to evaluate the effect of RFESD silencing on HCC cell growth. After RFESD knockdown, MHCC-97H cell proliferation was significantly decreased relative to that in control cells (Fig. 13B, P < 0.05). The EdU assay results also suggested that RFESD silencing significantly repressed HCC cell proliferation (Fig. 13C). These findings highlight that RFESD knockdown inhibits the proliferative abilities of HCC cells.