Comprehensive Analysis of Immune Prognostic ‐ Related Genes in the Tumor Microenvironment of Hepatocellular Carcinoma

Background: The percentage of death resulted from hepatocellular carcinoma (HCC) remains high worldwide, despite surgical and chemotherapy treatment. Immunotherapy offers great promise in the treatment of a rapidly expanding spectrum of HCC. Therefore, further exploration of the immune-related signatures in the tumor microenvironment, which plays a vital role in tumor initiation and progression for immunotherapy is currently needed. Methods: In this present research, 866 immune-related difference expression genes (DEGs) were identied by integrating the DEGs between TCGA HCC and normal tissue and the immune genes from databases (Innate DB; Imm Port), and 144 candidate prognostic genes were dened through weighted gene co-expression network analysis (WGCNA). Results: Seven prognostic immune-related DEGs were determined with LASSO Cox PH model, which was followed by constructing the ImmuneRiskScore model based on the prognostic immune-related DEGs. The prognostic index of the ImmuneRiskScore was validated then in the dependent dataset. Patients were divided into high- and low-risk groups according to ImmuneRiskScore. The difference in ImmuneRiskScore and inltration of immune cells between groups was detected and the correlation analysis for immunotherapy biomarkers was further explored. Conclusion:The ImmuneRiskScore of HCC could a prognostic signature reect characteristics within Furthermore, it also may provide novel immunotherapy predictive biomarker for HCC patients in the near future.


Introduction
Hepatocellular carcinoma (HCC) is one of the most common malignancies [1,2]. With 5-year survival being 18%, HCC is the second most lethal tumor after pancreatic cancer [3] and the fourth leading cause of cancer-related mortality worldwide [4,5]. Hepatocellular carcinoma is the major cancer type for primary liver cancers and its increase in deaths is a growing concern. However, general therapies such as radiotherapy and chemotherapy do not prolong overall survival (OS) signi cantly in HCC [6].
Thus, new strategies are required. Immunotherapy with checkpoint inhibitors is also emerging as an important treatment option. Immune checkpoint inhibitors (ICIs) is revolutionizing the clinical treatment landscape of multiple tumors, most notably advanced melanoma [7][8][9][10][11][12][13], non-small-cell lung cancer [14,15] and renal cell carcinoma [16,17]. Since HCC seems to be submissive to the programmed cell death protein 1 (PD1) pathway blockade [18], ICIs with approval likely for additional indications for HCC in the near future [6,19,20]. Though the remarkable progress that has been achieved, there is only a limited number of patients could bene t from ICIs [21]. Therefore, there is an urgent need for new, immune-based biomarkers distinguishing HCC patients more likely to have a better prognosis and further bene t from immunotherapy.
As an important element of the HCC tumor microenvironment, immune cells show clinicopathological signi cance in predicting prognosis and therapeutic e cacy [22][23][24]. It is an active eld to investigate the characteristics of the tumor microenvironment functionally impact effect on immunotherapy.
In this study, we make full use of TCGA data and a priori immune-related genes to construct a prognostic immune risk score by WGCNA and Lasso cox model. We also analyzed the correlation for ImmuneRiskScore and different immune cells to elucidate mechanisms possible for the formation of the microenvironment. Last, we explored the correlation with other immune biomarkers and the potential to identify patients eligible for immunotherapy to improve the therapy effects. The owchart as shown in Fig. 1.

Data download and procession
We downloaded the RNA sequencing expression pro le(count and RPKM format) and clinical data of TCGA-HCC from the UCSC Xena data portal(https://xena.ucsc.edu/), which contains 50 normal samples and 374 tumor ones. The immune-related genes set contains 1052 immune genes downloaded from InnateDB (https://www.innatedb.ca/) (Tables S1) and 1811 immune-related genes downloaded from ImmPort [25](https://www.immport.org) (Tables S2). The expression pro ling datasets GSE14520 as well as the clinical data were obtained by GEOquery [26] package of Bioconductor in R-3.5.2. GSE14520 contains 162 tumor samples respectively after removing the normal samples. Microarray probe ID mapped to gene symbols based on the GPL3921 platform (Affymetrix HT Human Genome U133A Array) and incorporated in the dataset matrix for each dataset. Eventually, the average of multiple probes computed that correspond to a single gene for each dataset individually employing in-house R scripts. The tumor mutation burden data was download from PanCancerAtla (https://www.cell.com/consortium/pancanceratlas).

Difference Expression Analysis
We identi ed differential expression genes (DEGs) based on limma [27] packages in R between Normal (n = 50) and cancer(n = 367) patients based on the raw counts of HCC gene expression data from TCGA.
Empirical Bayes method was used in selecting signi cant DEGs based on "limma" package which was used in the standard comparison mode, and the threshold was set at p < 0.05 and |log2 fold change| >= 1.5.

Gene Ontology (GO) and pathway enrichment analysis of DEGs
As an ontology-based R package, clusterPro ler not only can automate the process of biological-term classi cation and the enrichment analysis of gene clusters, but also provides a visualization module for analysis results display [28]. In this present research, the clusterPro ler package was used in identifying and visualizing the GO terms (biological process, cellular component, and molecular function included) and KEGG pathways enriched by DEGs. We set P-value < 0.01 as the cut-off criterion, the signi cant adjustment method was BH and the cut-off criterion of q-value was also set 0.01.

WGCNA
We used fpkm format of TCGA datasets to construct WGCNA analysis [29]. First, Basic data preprocessing for handling missing data and removing outliers; Second, Choosing the soft-thresholding power by the pickSoftThreshold. It can calculate the scale-free topology t index for several powers and provide the appropriate soft-thresholding power for network construction. Third, constructing a one-step automatic network and detecting modules. Turning adjacency into topological overlap to measure the network connectivity of a gene considered as the sum of its adjacency with all other genes for the generation of network. Hierarchical clustering function was used in classifying genes with a similar expression pro le into modules [30]; Next, Selecting the key modules related to OS and OS time and visualized by Cytoscape [31]. In this present study, modules were chosen and were visualized. The correlation between MEs and clinical traits "survival" and "survival time" was calculated to identify the module related. And then, in the linear regression between gene expression and clinical information, we de ned gene signi cance (GS) as the log10 transformation of the P-value (GS = lgP). In a module, module signi cance (MS) was de ned as the average GS for all the genes. Hub genes were identi ed with high clinical traits signi cance (> 0.1) and high intramodular connectivity (> 0.5) in interesting modules and were selected as the candidates to be further analysed and validated.
Gene set enrichment analysis of hub genes We used gpro ler2 (https://CRAN.R-project.org/package=gpro ler2) to perform over-representation analysis on input HCC hub gene list. Based on high-quality up-to-date data across different evidence types, g:Pro ler can provide a reliable service [32]. It maps these immune genes to known functional information sources and detects statistically signi cantly enriched terms. We included pathways from KEGG(https://www.genome.jp/kegg/), Reactome (https://reactome.org/), and CORUM (http://mips.helmholtz-muenchen.de/corum/). This method was done with the hypergeometric test followed by FDR correction for multiple testing.

Construction and Validation of an Immunoscore Prognostic Model
With the univariable Cox proportional hazards regression model in 'survival' package, we made a calculation of the hazard proportions for DEGs of the HCC cohort. DEGs with signi cance p value < 0.05 were analyzed, and we re-computed these DEGs with survival risk for Cox regression models by glmnet package [32]to select several more important prognostic genes among DEGs. As an R package, glmnet ts a generalized linear model via penalized maximum likelihood. The regularization path is computed for the lasso by setting the regularization parameter lambda as 1. To predict patient survival, a formula for the ImmuneRiskScore model was established as follows: Immune Risk Score = Estimation of Immune Cell Type Fractions and estimate score CIBERSORT [33] can characterize cell composition of complex tissues from their gene expression pro les.
In this study, we used CIBERSORT and LM22 reference gene expression matrix to quantify cell composition in diverse HCC samples. Analysis of normalized gene expression data were carried out by using the CIBERSORT algorithm, running with 1,000 permutations. Calculation of immune and stromal scores were performed with ESTIMATE, which was an algorithm could provid scores for the level of stromal cells present, and the in ltration level of immune cells in tumor tissues [34].

Statistical Analysis
The survival curve for the hub immune genes was created by the Kaplan-Meier method and the statistical signi cance of difference was judged by the Log-rank test. The receiver operating characteristic (ROC) curve was applied to describe the sensitivity and speci city of survival prediction based on the Immune Risk Score, and the pROC package was utilized to quantify the area under the curve (AUC). Nonparametric Mann-Whitney-Wilcoxon Test was used to compare the data from different groups and Pearson's chisquare test was performed in measuring the level of signi cance for association amongst variables. All statistical analyses were conducted using R software. All of the reported P values were two-tailed, and p < 0.05 was considered to be statistically signi cant.

Identi cation of DEGs related to immune
From the TCGA database, we obtained the expression pro les of 417 HCC samples, which contained 367 tumor samples and 50 nontumor ones after data preprocessing. Totally 7194 genes were identi ed as DEGs with the threshold of P < 0.05 and |log2FC| > 1.5, containing 3657 genes upregulated and 3537 genes downregulated ( Fig. 2A, Tables S3). The samples could be well clustered for normal and tumor groups when the top 200 DEGs were selected for unsupervised hierarchical clustering (Fig. 2B). To obtained the immune-related DEGs for HCC samples, we selected the genes both be immune-related genes and difference expressed for HCC. We total collected 2542 human immune-related genes composed by the 1811 immune-related genes from Immport and 1052 innate immune-related genes from InnatedDB. Finally, there were 866 genes within all the DEGs when intersecting with immune genes (Fig. 1C, Tables S4), and These immune-related DEGs were then chosen for further analysis.
The further GO enrichment analysis of these 866 immune-related DEGs showed that 1008 go terms including 891 biological processes (BPs), 39 molecule functions (MFs), and 78 cellular components (CCs) were signi cantly enriched in (Tables S5). As shown in Fig. 3A, the top 10 go terms were showed by classi cation. Leukocyte migration, positive regulation of cytokine production, cell chemotaxis, leukocyte cell to cell adhesion, and T lymphocyte activation are the top enriched biological process; the enriched cellular components including collagen − containing extracellular matrix, external side of plasma membrane, collagen trimer, secretory granule membrane, extracellular matrix component; the enriched molecule functions contains structural constituent of extracellular matrix, carbohydrate binding, receptor ligand activity, cytokine activity, glycosaminoglycan binding. More details about the top GO enrichment analysis results were showed by overview their genes in Fig. 3B, C, D. These circle plots signi cantly could help us to understand the function of DEGs in these enriched terms. And most of the time, it is more meaningful to further represent genes with various functions. Furthermore, analysis of KEGG pathway enrichment (Tables S6) indicated that signi cantly enriched pathways include interaction between cytokine and cytokine receptor, Viral protein interaction with cytokine and cytokine receptor, Cell adhesion molecules (CAMs), Chemokine signaling pathway, Malaria (Fig. 3E). The top 10 pathways were showed and interact with their assigned genes in Fig. 3F. The KEGG pathways analysis results were different from GO enrichment results, indicating that the immune microenvironment of HCC was sophisticated.

Weighted Co-expression Network Construction And Key Modules Identi cation
The 866 immune-related DEGs and 367 HCC tumor samples were used in constructing gene coexpression network. After checked the missing values, we detected the outlier samples by hierarchical clustering (Figure S1), and the dendrogram shows 5 outlier samples that will be removed from the analysis. In this study, β = 4 (scale-free R2 = 0.85) was selected to be the soft-thresholding parameter to ensure a scale-free network. As shown in Fig. 4A, we performed network topology analysis for thresholding powers from 1 to 20, 4 was the lowest power for the scale-free topology t index on 0.85. We obtained gene clustering tree by using hierarchical clustering of TOM-based dissimilarity and identi ed 6 modules (Fig. 4B, Table 1). To select the clinically signi cant modules, we used WGCNA to calculate the correlations between the external clinical information and gene modules. As shown in Fig. 4C, the Green module was found to be the highest one associated with overall survival, and the green and brown eigengenes are highly related ( Figure S2). We visualized the green module as networks by Cytoscape and selected the top 100 gene pairs by sorting the weight of gene pairs. As shown in Fig. 4D, it indicates that some genes, such as PDLIM7, EHHADH, DMGDH, and CYP8B1 with larger size have higher node degree. Finally, we screened 144 immune-hub genes with high gene signi cance with OS and OS time (> 0.1), and high relationship with interesting green module (> 0.5) (Table S7).
Statistical gene set enrichment analysis about the 144 immune-hub genes was performed to nd overrepresentation of functions from biological pathways like KEGG, Reactome, and complexes in CORUM, etc. This is done with the hypergeometric test followed by correction for multiple testing. The results showed that immune-hub genes were enriched signi cantly in 52 pathways and complexes (p < 0.05) ( Fig. 4E; Tables S8), including KEGG pathways such as TNF signaling pathway, Metabolic pathways, Reactome pathways such as Phenylalanine and tyrosine metabolism, and CORUM complexes such as PLAUR − PLAU complex, IGF2R − PLAUR − PLAU complex, MAK − ACTR − AR complex, IGF2R − PLG − PLAU − PLAUR − LTGFbeta1 complex. These ndings show that these hub genes not only affect the metabolic, apoptosis, and cell survival as well as in ammation and immunity of HCC, but also plays a pivotal role in the protein complexes of immune cells.

Establishment of the lasso cox-based prognostic gene signature
We subsequently revealed that 108 of the 144 immune-hub genes were signi cantly related to OS through univariate Cox regression analysis (results in Tables S9). Then, we performed lasso-penalized Cox analysis to further narrow the hub genes (Fig. 5AB). Seven genes were identi ed and utilized thereafter in constructing an ImmuneRiskScore model to evaluate the prognostic value of CRC patients. The seven genes identi ed and their cox coe cient were showed in Table 2  . The GO enrichment analysis shows that these seven genes were enrichment in several molecular functions such as asparaginase activity, beta-aspartyl-peptidase activity, 6-phosphofructokinase activity, glucose-6phosphatase activity, and sugar-terminal-phosphatase activity. The formula for the ImmuneRiskScore model was described in the part of Materials and Methods.
Next, we divided HCC patients into high-and low-score groups, based on the lasso cox hub genes and ImmuneRiskScore, according to the optimal cut-off got from survminer package. The results indicated that ve genes (PLBD1, ETV4, PFKP, GNAZ, ASRGL1) are risk factors, and that high-score samples had a worse OS than those with low score and the SPP2 and G6PC are protective factors (Fig. 5C). The last gure in Fig. 5C showed the prognostic accuracy of ImmuneRiskScore (95% CI for HR: 0.48 (0.33-0.68), log-rank test p < 0.0001). Additionally, the result of multivariate Cox regression analyses showed that the predictive value of the ImmuneRiskScore for HCC patients is not associated with common clinical variables (Tables S10).

Validation Of The Immuneriskscore In Tcga Crc Cohort
To further investigate the prognostic value of ImmuneRiskScore, we conducted a validation analysis in another GEO cohort (GSE14520). The dataset was categorized into two groups based on ImmuneRiskScore; the results of analysis indicate that there is a signi cant prognostic value between ImmuneRiskScore and OS as well as recurrence, the high-score group had a worse OS rate than the lowscore one, not merely in the TCGA cohort (Fig. 6AB). Figure 6C showed the prognostic accuracy of ImmuneRiskScore in the GEO dataset, which was considered as a continuous variable during investigation. The area under the ROC curves (AUC) of the prognostic model for OS was 0.608 at 1 year, 0.614 at 3 years, 0.620 at 5 years. These results indicated that ImmuneRiskScore is a good model to predict survival.
Stromal and Immune cell in ltration landscapes between high and low ImmuneRiskScore By using ESTIMATE algorithm, we estimated the in ltrating cells and tumor purity of tumor tissue. Stromal score represented the presence of stromal cells in tumor tissue, and immune score indicated the in ltration of immune cells in tumor tissue, and combination of them stood for a measurement of tumor purity (Tables S11). We then distinguished differences in stromal and immune scores between high-and low-risk patients with LUSC. As shown in Fig. 7(A, B), the immune and stromal score in high-level patients were both signi cantly (Wilcox p < 0.05) higher than low-level HCC patients indicate the overall presence of immune and stromal levels are associated with Immune Risk Score.
We further estimated the ratio of 22 immune cell categories in HCC patients using the CIBERSORT method, the results are shown in Tables S12. The proportion distribution of immune cells is heterogeneity not only between the high and low-levels samples of ImmuneRiskScore but among the HCC samples

Correlations between the ImmuneRiskScore and immune biomarkers
The discovery of broad immune biomarkers of the TME could effectively predict clinical bene t to ICIs.
We next introduced a few important immune biomarkers including PDL1, PD1, PDL2, CTLA4, CYT, and IFN gamma. Among these biomarkers, the immune checkpoint genes including PDL1, PD1, PDL2, CTLA4 are worth attentional for Immune checkpoint inhibitors which are revolutionizing the clinical treatment landscape. Cytolytic activity (CYT) focuses on cytotoxic T cells (CTL) and natural killer cells (NK) for their powerful ability to lyse tumor cells [35]. CYT is measured based on the geometric mean of expression of granzyme A (GZMA) and perforin (PRF1). Interferon-γ(IFN-gamma) signature is a key cytokine to activate the PD-1 signaling axis by directly upregulating the ligands PD-L1 and PD-L2 mainly in tumor cell, which was produced by T cells activated, NK and NKT cells [36,37]. Tumor mutational burden (TMB) is de ned as the number of nonsynonymous mutations detected in each megabase sequenced. TMB is proved to be associated with improved responses to checkpoint blockade, in some tumors such as melanoma [38] and non-small-cell lung cancer [39,40]. An experimentally determined pan-broblast TGF-β response signature (Pan-F-TBRS) showed elevated mean expression in non-responders and decreased overall survival particularly in patients with mUC [41]. transforming growth factor β (TGF-β) signalling in broblasts is documented as a pleiotropic cytokine having relationship with poor prognosis in multiple tumor categories [42,43], and it is thought to be vital in advanced cancers in promotion of immunosuppression, angiogenesis, metastasis, tumor cell epithelial to mesenchymal transition (EMT), broblast activation and desmoplasia [44][45][46].
We further explored the relationship between ImmuneRiskScore and these immune biomarkers (Tables   S13, Tables S14) by  All of the signi cant p value for correlation was smaller than 0.001, suggesting that ImmuneRiskScore might be a potential biomarker for immunotherapy especially for Immune checkpoint inhibitors. In addition, there is no correlation between ImmuneRiskScore and TMB, indicating TMB is an independent factor mediating the TME in these two groups.

Discussion
Increasing evidence indicates that immune-related biomarker is associated with prognostic for various cancer types [47][48][49]. Moreover, prognostic biomarkers effectively guiding cancer therapy especially immunotherapy are still in need. Therefore, we constructed ImmuneRiskScore that contributes to overall survival by investigating the tumor microenvironment of HCC. Based on the gene expression data of HCC from TCGA, we identi ed a prognostic signature of seven immune hub genes (SPP2, G6PC, PLBD1, ETV4, PFKP, GNAZ, ASRGL1) through combining WGCNA, univariate Cox regression analysis, and LASSO PH model. Then, we constructed a seven-gene risk scoring system and classi ed HCC patients into two risk groups with signi cantly different survival rates. Successfully validation of the prognostic performance of the risk scoring model was conducted in an independent set from GEO. It shows that the seven-genes ImmuneRiskScore are promising prognostic biomarkers for HCC and may function importantly in the TME of HCC.
Generally, the ImmuneRiskScore is mainly the result of the presence of TME which including the complex interaction between cancer cells, stromal cells, and immune cells. Under this condition, we aimed to further explore the relationship between them. We found that the relative or absolute in ltration levels of B cells memory, Plasma cells, T cells CD4 memory activated, T cells regulatory Tregs, NK cells resting, Macrophages M0, Dendritic cells resting, Mast cells resting, Neutrophils are signi cantly correlated with ImmuneRiskScore, indicated that these cells are likely to have prognostic values. Such as in ltration of macrophages in solid tumors is associated with poor prognosis and correlates with chemotherapy resistance in most cancers [50]. It is noteworthy that these cell types didn't conclude most of T cell compartment, which the clinical response strategies have largely focused on. In fact, other immune cells may also contribute to anti-tumour immunity [51][52][53]. Especially memory B cells also has potential role in the response to ICB treatment [54].
With an aim to effectively predict clinical bene t to checkpoint inhibitor strategies, we performed a wider exploration of active innate and adaptive immune responses within the tumor microenvironment by gene expression pro ling. The predictive value of our immune relation score is envisaged for the positive correlation with PDL1, PD-1, PD-L2, CTLA-4, CYT, IFN gamma and Pan-F-TBRS. These positive related biomarkers are involved in the proin ammatory cytokines related in ammation microenvironment of tumor [55,56] and the TGF-β signal pathway related excluded microenvironment of tumor [41].
Corresponds to the previous in ltration results, the in ammation of TME can be measured by the cellular content of the tumor, for example, in ltrating T and B cells. In amed tumors also contain a broad proin ammatory cytokines pro le and a type I interferon signature indicating activation of innate immune response. While TGF-β can drive an excluded phenotype in TME as its impact on stromal cells, and prevent T cell penetration into the centre of the tumor, [41]. These results indicate that anti-tumor immunological effect is a bidirectional and dynamic system in the tumor microenvironment (TME). This biomarker analysis should help us to unravel the complexities of the interaction and molecular mechanisms between cancer and the host immune system.
In general, we gained a comprehensive insight into the TME of HCC and created a prognostic immunerelated score that might become potential prognostic and predictive biomarkers.     Comparison of the statistical differences between the two groups were performed through Wilcox rank-sum test. Each boxplot is labeled with asterisks for p values. (*p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001) Availability of data and materials The datasets generated and analyzed during the current study are available in the TCGA(https://xena.ucsc.edu/) and GEO repository with the dataset number GSE14520 , (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE14520).
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.   and KEGG (B). The top 10 terms by adjusted p-values were selected to show. The x-axis of the plot represents the z-score. The -log (adjusted p-value) is displayed on the y-axis which corresponding to the signi cance of the term. The area of the plotted circles is proportional to the number of genes assigned to the term. Each circle is colored according to its category and labeled with the ID. The yellow lines represent a threshold for the displayed labels (adjusted p-value<0.05).   Comparison of the statistical differences between the two groups were performed through Wilcox rank-sum test. Each boxplot is labeled with asterisks for p values . (*p<0.05, **p<0.01, ***p<0.001, ****p<0.0001) Figure 8 Correlation scatterplots between ImmuneRiskScore and immune check block (ICB) associated biomarkers, combined with density plot of expression distribution. The ICB biomarkers including PDL1, PD1, PDL2, CTLA4, CYT, IFN gamma, Pan_F_TBRS and TMB.