Novel immune-related genes in the tumor microenvironment with prognostic value in breast cancer

Background Breast cancer is one of the most frequently diagnosed cancers among women worldwide. Alterations in the tumor microenvironment (TME) have been increasingly recognized as key in the development and progression of breast cancer in recent years. To deeply comprehend the gene expression profiling of the TME and identify immunological targets, as well as determine the relationship between gene expression and different prognoses is highly critical. Methods The stromal/immune scores of breast cancer patients from The Cancer Genome Atlas (TCGA) were employed to comprehensively evaluate the TME. Then, TME characteristics were assessed, overlapping genes of the top 3 Gene Ontology (GO) terms and upregulated differentially expressed genes (DEGs) were analyzed. Finally, through combined analyses of overall survival, time-dependent receiver operating characteristic (ROC), and protein-protein interaction (PPI) network, novel immune related genes with good prognosis were screened and validated in both TCGA and GEO database. Results Although the TME did not correlate with the stages of breast cancer, it was closely associated with the subtypes of breast cancer and gene mutations (CDH1, TP53 and PTEN), and had immunological characteristics. Based on GO functional enrichment analysis, the upregulated genes from the high vs low immune score groups were mainly involved in T cell activation, the external side of the plasma membrane, and receptor ligand activity. The top GO terms of the upregulated DEGs from the high vs low immune score groups exhibited better prognosis in breast cancer; 15 of them were related to good prognosis in breast cancer, especially CD226 and KLRC4-KLRK1. Conclusions High CD226 and KLRC4-KLRK1 expression levels were identified and validated to correlate with better overall survival in specific stages or subtypes of breast cancer. CD226, KLRC4-KLRK1 and other new targets seem to be promising avenues for promoting antitumor targeted immunotherapy in breast cancer. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-021-07837-1.

(NK) cells, T lymphocytes, and B lymphocytes), extracellular matrix (ECM), signaling molecules and cytokines [5,6]. Stromal cells and immune cells are two major types of nontumor components in the TME, so the stromal or immune components are proposed to be valuable for diagnostic and prognostic assessments of the TME. It was demonstrated that epigenetic alterations generating aberrant gene expression in the cells of the TME are predictive of clinical outcomes [7]. In addition, there is increasing interest in the breast cancer microenvironment as a prognostic factor as well as a potential therapeutic target; thus, a wider assessment of immune responses within the TME by gene expression pro ling might effectively predict clinical outcomes [8]. A deeper comprehension of the genetic pro le of the breast cancer TME is urgently needed.
In this study, comprehensive bioinformatics analyses were performed to better understand the immunerelated genetic pro le and determine the relationship between gene expression in the microenvironment of breast cancer and prognosis. Our study will provide applicable information on novel immunological targets from the TME, complement other treatment options for breast cancer and improve the overall survival (OS) of patients.

Methods
Data sources and preprocessing RNA sequencing (RNA-seq) data in fragments per kilobase of transcript per million mapped reads (FPKM), single-nucleotide polymorphism data and clinical follow-up information such as age, molecular subtype, outcome and survival time of breast cancer were downloaded from The Cancer Genome Atlas (TCGA) database. A total of 1097 cases were obtained from the TCGA dataset. Then, the RNA-seq data were converted into transcripts per kilobase million (TPM) expression pro les. The immune and stromal scores of the samples were calculated by the Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE) algorithm via the ESTIMATE R software package. Next, two series of gene expression pro les from the Gene Expression Omnibus (GEO) database (GSE20685 and GSE42568) with clinical data from 431 breast cancer patients were used as the validation datasets. The GEO data were downloaded and normalized by the LIMMA R software package.
Comprehensive assessment of the TME characteristics in breast cancer Based on the mean stromal or immune scores as the cutoff value, the patients were divided into two groups: the high stromal/immune score group and the low stromal/immune score group. Then, we analyzed the relationships between different stages/subtypes of breast cancer and stromal/immune scores. Kaplan-Meier (K-M) analysis was performed to determine the differences in OS between the highand low-score groups of the four subtypes of breast cancer. In addition, we detected the correlations between stromal/immune scores and conventional gene mutations.

Identi cation of differentially expressed genes (DEGs)
Following the high and low score grouping, we screened for DEGs to obtain those with a false discovery rate (FDR) <0.05 and a log2 fold change (FC)>1. Heatmaps and cluster maps of the DEGs were generated using the Pheatmap package, and volcano plots were designed by the ggplot2 package in R software.
Then, the upregulated genes with high vs low scores were further analyzed with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. GO categories including biological process (BP), molecular function (MF), or cellular component (CC) were analyzed using the EnrichGO function in the clusterPro ler R package and KEGG analysis was performed by the EnrichKEGG function in the clusterPro ler R package, with the parameters pvalue-Cutoff=0.05 and qvalue-Cutoff=0.05. A Venn diagram was used to screen the overlapping genes between the upregulated DEGs from high vs low scores and the top 3 GO terms by the online tool VENNY 2.1 (https://bioinfogp.cnb.csic.es/tools/venny/index.html).

Construction of the protein-protein interaction (PPI) network
The PPI network was established in the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database and reanalyzed by Cytoscape software. The medium con dence was set as 0.4. Only individual networks with 10 or more nodes were included for further analysis. The connectivity degree of each node of the network was calculated. The logFC value of the gene expression was used to re ect the color of each node, and the size of the node represented the number of proteins interacting with the designated protein. Molecular COmplex Detection (MCODE) was then used to locate the central gene cluster.

Screening immune-related genes with prognostic value
To identify genes with prognostic value in the TME, we performed univariate survival analysis and used K-M analysis to analyze the relationship between the expression of these genes and the OS of breast cancer patients. Then, a time-dependent receiver operating characteristic (ROC) curve was generated to assess the predictive accuracy

Statistical analysis
All data in the present study were performed in R software (version 3.6.1; https://www.r-project.org/). The stromal or immune scores were calculated by ESTIMATE package. Survival curve was constructed by K-M analysis and compared by log-rank test, which was done to explore the survival differences in subtypes or stages of breast cancer and to identify the prognosis of selected genes using the GraphPad Prism 6.0 software (GraphPad Software Inc, La Jolla, CA, USA). Heatmaps, volcano plots, GO enrichment, KEGG pathway and Time-dependent ROC curve were plotted by R software. Time-dependent ROC curve as done with the TimeROC package. The con dence index of AUC was calculated by combined with Survival and TimeROC package. All statistical tests were two-sided and P-value<0.05 was considered statistically signi cant.

Results
Immune scores and stromal scores are closely related to breast cancer subtypes but not associated with breast cancer stage In our study, the stromal scores ranged from -2164.14 to 2050.55, and the immune scores were distributed between -1724.88 and 3459. 35. First, we assessed the relationships between stromal scores or immune scores and different stages of breast cancer. However, neither stromal scores nor immune scores showed signi cant differences in different stages of breast cancer ( Figure 1A, 1B). Then, we explored whether stromal scores or immune scores were correlated with breast cancer subtypes. The average stromal scores of the basal-like subtype were the lowest of all four subtypes ( Figure 1C); however, the average immune scores of the basal-like subtype ranked highest ( Figure 1D). In addition, the immune scores of the HER2-enriched subtype were higher than those of the luminal B subtype ( Figure  1D). Finally, we divided the scores into high and low groups and evaluated the survival probabilities of distinct subtypes in both groups of stromal scores and immune scores. Unfortunately, the high-and lowscore groups did not show any survival differences in the four breast cancer subtypes ( Figure S1).
Breast cancer gene mutations were correlated with stromal scores or immune scores Some gene mutations are implicated in cancer susceptibility. Thus, we downloaded single-nucleotide polymorphism data on several conventional mutant genes in the clinic, such as BRCA1, BRCA2, CHEK2, PALB2, BRIP1, TP53, PTEN, STK11, CDH1, ATM, BARD1, MLH1, MRE11A, MSH2, MSH6, NBN, MUTYH, PMS1, PMS2, RAD50, and RAD51C. Then, the patients were divided into mutant and wild-type groups, and the distributions of stromal scores and immune scores were plotted based on the status of the mutant gene in breast cancer. When compared with the wild-type group, either the BRCA1 or BRCA2 mutant group exhibited no signi cantly difference in the stromal scores and immune scores (Figure 2A, 2B, 2F, 2G). However, CDH1 mutant cases had higher stromal scores and immune scores ( Figure 2C, 2H). Although stromal scores of the PTEN/TP53 mutant and wild-type groups were not statistically signi cant ( Figure 2D, 2E), PTEN/TP53 mutant cases exhibited higher immune scores than wild-type cases ( Figure   2I, 2J). Other mutant genes, such as CHEK2, PALB2, BRIP1, STK11, ATM, BARD1, MLH1, MRE11A, MSH2, MSH6, NBN, MUTYH, PMS1, PMS2, RAD50, and RAD51C, had no signi cant differences in stromal scores or immune scores between the mutant and wild-type groups (data not shown). Thus, CDH1, PTEN and TP53 mutations were closely associated with stromal/immune scores in breast cancer.

Differential gene expression pro les with stromal scores and immune scores in breast cancer
To determine the correlation between comprehensive gene expression pro les and stromal/immune scores, we analyzed the 1097 cases acquired from the TCGA database. The heatmap shows the top 100 distinct gene expression pro les of cases belonging to the high vs. low score groups ( Figure 3A, 3B).
Based on the stromal score grouping, 2832 genes were upregulated and 253 genes were downregulated in the high vs. low stromal score groups. Similarly, 1930 genes were upregulated and 491 genes were downregulated in the high vs. low immune score groups (p<0.05, log2 FC>1). Figure 3C and 3D show the volcano plot of differentially expressed genes (DEGs) in high vs. low stromal/immune score groups.
Next, in order to comprehend the function of DEGs, we selected the upregulated DEGs for GO enrichment and KEGG pathway analyses. The top GO terms identi ed for the upregulated DEGs from the high vs. low stromal score groups were extracellular structure organization, collagen-containing extracellular matrix, and receptor ligand activity ( Figure 4A). In addition, the top GO terms of the upregulated DEGs from the high vs. low immune score groups were T cell activation, external side of the plasma membrane, and receptor ligand activity ( Figure 4B). Moreover, the top KEGG pathways of the upregulated DEGs from the high vs. low stromal score and high vs. low immune score groups were neuroactive ligand-receptor interaction and cytokine-cytokine receptor interaction, respectively ( Figure 4C, 4D).
For deeply exploring the gene pro les of the top GO terms, we integrated the top 1 term of the BP, MF, or CC categories as gene datasets. Then, we screened the overlapping genes between upregulated DEGs and the gene datasets (Table S1). Venn diagrams showed that 225 genes overlapped between the upregulated genes from the high vs. low stromal score groups and the gene datasets (for convenience of description, we named these genes overlapping 1 genes), and 169 genes overlapped between the upregulated genes from the high vs. low immune score groups and the gene datasets (for convenience of description, we named these genes overlapping 2 genes) ( Figure 4E, 4F).
The screened immune-related DEGs are associated with good prognosis in breast cancer We performed a univariate survival analysis to reveal the relationship between the selected overlapping genes and prognosis in breast cancer patients from TCGA. Although the overlapping 1 genes did not show statistical signi cance in the OS analysis, the overlapping 2 genes exhibited highly signi cant differences in the prognosis of breast cancer, so we chose the overlapping 2 genes for further analysis. Using p value <0.05, 54 genes related to good prognosis were included. Then, a time-dependent ROC curve (area under the curve (AUC) >0.6) was employed to assess the prediction accuracy, and 31 genes with prognostic value in breast cancer remained. To better understand the interplay among the identi ed 31 genes, we performed PPI analysis, and Figure 5A shows the overall PPI network, which included 29 nodes and 110 edges ( Figure 5A). Then, we selected the extremely critical module containing 15 nodes for further analysis ( Figure 5B).   Table 2. The accuracies in predicting the 1-year, 3-year and 5-year OS of patients with CD226 (AUC 0.73, 0.62, and 0.57, respectively), KLRD1 (AUC 0.73, 0.6, and 0.54, respectively), and KLRC4-KLRK1 (AUC 0.72, 0.6, and 0.56, respectively) were higher than those with the other selected genes, especially for the 1-year prediction accuracy ( Figure 7 and Table 2). Therefore, we identi ed CD226, KLRD1, and KLRC4-KLRK1 for further functional evaluation.
High expression of CD226 and KLRC4-KLRK1 results in a better prognosis in stage II, stage III or luminal B breast cancer To further explore the identi ed CD226, KLRD1, and KLRC4-KLRK1 genes with prognostic value, we analyzed the expression of these genes and their association with OS in different breast cancer stages or subtypes. Although the high expression of KLRD1 was not related to a better prognosis in any stage or subtype variation (data not shown), the expression of CD226 and KLRC4-KLRK1 displayed a strong correlation with OS for stages or subtypes. High CD226 expression was related to better prognosis in stage II and stage III breast cancer, as well as in luminal B breast cancer ( Figure 8A-8C). Similarly, despite not being signi cant for stage II (Figure 8D), high KLRC4-KLRK1 expression revealed preferable survival probabilities in stage III breast cancer and in the luminal B subtype ( Figure 8E, 8F). In other stages or subtypes of breast cancer, CD226 and KLRC4-KLRK1 expression was not signi cantly related to OS ( Figure S2).
Validating that the prognostic genes identi ed from the TCGA database are equally signi cant in the GEO database To validate whether the prognostic genes identi ed from the TCGA database are equally signi cant in the GEO database, two gene pro les from GEO (GSE20685 and GSE42568) were used as validation datasets. The gene expression data of 327 cases from GSE20685 and 104 cases from GSE42568 with clinical information were downloaded and evaluated for OS. Data from GSE42568 demonstrated that CD226 was related to good prognosis in breast cancer patients, while KLRC4-KLRK1 did not show any correlation with breast cancer OS ( Figure 9A, 9B). However, the data from GSE20685 showed the reversed results; instead of CD226, KLRC4-KLRK1 was related to good prognosis in breast cancer ( Figure 9C, 9D). Next, we employed the data from GSE20685 to better con rm the relationships between gene expression and OS in different cancer stages ( Figure 9E-9J), and found CD226 was rmly associated with good prognosis in stage II breast cancer patients. Thus, our validation datasets partly proved that high CD226 and KLRC4-KLRK1 expression levels would predict satis ed overall survival of breast cancer, especially in speci c stages.

Discussion
The TME is crucial in tumor initiation, progression and drug resistance in breast cancer, and the in ltration of nontumor cells in the TME was calculated by the ESTIMATE algorithm. The ESTIMATE algorithm developed based on gene expression data is valid and effective in various cancers, such as prostate cancer, breast cancer, and colon cancer [9,10]. By utilizing the breast cancer cohorts in the TCGA database and ESTIMATE algorithm-derived scores, we rst analyzed the relationship between stromal/immune scores and different stages or subtypes of breast cancer and found that stromal/immune scores were highly correlated with subtypes of breast cancer, especially in the basal-like subtype, and the luminal B and HER2-enriched subtypes also showed relevance with immune scores. This nding demonstrated that TME variation correlated with subtypes of breast cancer and possessed immunological characteristics, which is consistent with the ndings of a previous study [11]. However, the high immune-related scores did not predict a better prognosis in breast cancer subtypes. This is because prognosis is more in uenced by various factors, such as age, race, never being pregnant or having a rst child after age 30, type of surgical procedure, initial tumor size, clinical lymph mode status, and neoadjuvant chemotherapy [12,13]. In addition, when considering the tumor stroma, cancer-associated broblasts (CAFs) are the major constituent of tumor stroma, take part in induction of tumor progression in breast cancer. Mounting evidence indicates that CAFs are heterogeneous, like tumor cells, among different subtypes or in the same subtypes of breast cancer which patients have different length of survival [14,15]. Currently, the potential role of CAFs as a predictive biomarker of tumor prognosis is still debated.
Second, we analyzed the relationship between the TME and gene mutations. Although our data demonstrated that CDH1, TP53 and PTEN mutations were closely associated with the TME, BRCA1 and BRCA2 mutations did not show signi cant differences in TME variation. BRCA1 and BRCA2 mutant genes predispose individuals to an elevated risk of breast cancer, and those with a family history of cancer are recommended to undergo gene detection based on the National Comprehensive Cancer Network (NCCN) guidelines [16]. Thus, BRCA1 and BRCA2 mutations are not highly frequent in sporadic cases, and the design of upcoming trials could stratify patients by BRCA status to avoid potential bias. CDH1, TP53 and PTEN are tumor suppressor genes, and their alteration presents poor survival and worse prognosis in breast cancer [17][18][19]. Our study ndings coincide with those of a previous exploration that after the exclusion of BRCA1 or BRCA2, the TME correlates with TP53, CDH1, and PTEN mutations, and their mutations revealed a high or moderately increased risk of breast cancer [20].
Third, when further exploring the DEGs in the TME, many previous studies have ignored genes with low expression but exhibit high signi cance in antitumor activity and are related to better prognosis in patients. Thus, to eliminate the factor of tumor cells downregulating the expression of antitumor genes during tumor progression, we focused on the genes of the top GO terms of the upregulated DEGs based on functional enrichment.
Fourth, through step-by-step screening via OS analysis, time-dependent ROC analysis, and PPI network analysis, we revealed that the top GO term genes of the upregulated DEGs from the high vs low immune score groups exhibited better prognosis in breast cancer, which could be explained by the fact that the immune system plays an important role in cancer development and therefore potentially offers novel targeted therapies in antitumor treatment [21]. Ultimately, we found 2 important TME genes with good prognosis (CD226 and KLRC4-KLRK1).
CD226, also known as DNAM-1, is an activating receptor expressed on various immune cells, such as CD4+ and CD8+ T lymphocytes, regulatory T cells (Tregs), monocytes, macrophages, and NK cells [22,23]. CD226 serves as a costimulator, enhances T cell and NK cell activation [22], and exhibits signi cance in innate/adaptive immune regulatory networks. When combined with its ligand CD115 or CD112 upregulated in tumor cells [24], CD226 facilitates the cytotoxicity of NK cells [25]. In addition, in Tregmediated tumor immune escape, Tregs express relatively high levels of TIGIT and low levels of CD226 compared with effector T cells (Teffs), resulting in a high ratio of TIGIT/CD226 expression and accelerating tumor development. In contrast, augmenting CD226 expression and reversing the TIGIT/CD226 ratio would predict a good clinical outcome [26]. However, reduction in CD226 expression decreases the immune regulatory capacity, and CD226-de cient CTL or NK cells exhibit markedly less cytotoxic activity against DNAM-1 ligand-expressing tumors [27]. Accumulating evidence has shown that CD226 plays a pivotal role in tumor recognition and cancer immune surveillance [28], even promoting antitumor immune responses mediated by NK and T cells.
Killer cell lectin-like receptor subfamily C4 -killer cell lectin-like receptor subfamily K1 (KLRC4-KLRK1) belongs to killer cell lectin-like receptor family [29] and represents naturally occurring read-through transcription between neighboring KLRC4 and KLRK1. KLRC4 lacks a signi cant portion of the KLRC4 coding sequence but encodes the KLRK1 protein. Once tumorigenesis occurs, the amount of KLRK1 ligand increases immediately. KLRK1 (or NKG2D) is also an activating receptor expressed by NK cells and T cell subsets. In addition, KLRK1 could augment the cytotoxicity of NK cells/T cells or synergize with immune checkpoint inhibitors to eliminate tumor cells [30]. However, tumor cells evoke a range of mechanisms to evade KLRK1 surveillance system detection and impair the clinical bene ts of immunotherapy in various cancers [31], [32]. The downregulation of KLRK1 hampered NK cell cytotoxicity [33]. Conversely, blocking the shedding of ligands by tumors or the release of KLRK1 ligandbearing exosomes might restore the expression of KLRK1 receptors on NK cells and T cells and improve their activity [34].
It was reported that the KLRK1 axis is becoming an emerging target in cancer immunotherapy [35,36], and the overexpression of CD226 or KLRK1 on NK cells resulted in e cient anti-sarcoma activity [37].
These studies were in accordance with our exploration to provide a molecular basis for the development of CD226-and KLRC4-KLRK1-targeted antitumor immune therapeutics. When considering that the results from the TCGA database were veri ed by the GEO database in part and that the samples from GEO were much fewer than those from TCGA database, further exploration should include more breast cancer patients.
Tumorigenesis is initiated by 3 steps: cancer cell elimination by various immune cells, such as NK cells and CD8+ T cells, then immune pressure leading to the selection of tumor cell variants and nally, immune escape by inhibiting effector cells or inducing tolerogenic cells [38]. Escaping antitumor immunity is a hallmark for the progression of breast cancer. In the TME, tumor cells interact with various types of immune cells by activating immune checkpoint pathways [39,40]. Immune checkpoints (CTLA-4, PD-1/PDL1, LAG3, TIM3 and TIGIT) are orchestrated by a series of costimulatory and inhibitory signaling molecules and then modulate effector T lymphocyte (Teff) activity. Recent advancements in antibodies against immune checkpoints have highlighted the bene ts of immune checkpoint inhibitors in both animal studies and clinical trials [41].
In this study, we also detected immune checkpoint genes, such as CTLA-4, PD-1, LAG3 and TIM3, and found that their high expression was not associated with good prognosis in breast cancer (data not shown). This nding is probably related to the patients' status of checkpoint gene expression and immune state of the TME [8]. To achieve unbiased results, the identi cation of immune checkpoint gene expression status is critical. The investigation into the relationship between immune checkpoint gene expression and the TME immune state (such as tumor-in ltrating lymphocytes and the T cell receptor repertoire) would provide key insights into checkpoint blockade therapy.
At present, there are multiple antitumor approaches: inverting tumor immunosuppression (for example, employing immune checkpoint inhibitors and the direct induction of the Teff immune response), using immune-based therapies targeting speci c immune cell types (for example, improving cytotoxic e cacy and immune surveillance through NK cells or Teffs), reducing the number of immunosuppressive myeloid cells, inhibiting Tregs, and altering the function of myeloid cells [7].
Despite combining multidisciplinary treatment strategies, breast cancer patients still have a comparably high mortality rate. An improved understanding of the immune-related genetic pro le of the TME in breast cancer and the identi cation of new immunological targets are critical for improving clinical outcomes. CD226, KLRC4-KLRK1 and subsequent new targets seem to be promising avenues for promoting antitumor targeted therapy in breast cancer.

Conclusions
The exploration of the TME and immunological treatment in breast cancer have become increasingly important in recent years. In our study, although the TME did not correlate with the stages of breast cancer, we veri ed that it was highly associated with the subtypes of breast cancer and gene mutations (CDH1, TP53 and PTEN) and possessed immunological characteristics. The combined analysis of OS, time-dependent ROC, and the PPI network revealed that the genes of the top 3 GO terms of the upregulated DEGs from the high vs. low immune score groups were associated with better prognosis in breast cancer, and 15 of them were related to good prognosis in breast cancer, especially CD226 and KLRC4-KLRK1. High CD226 and KLRC4-KLRK1 expression levels were identi ed, and their correlation with better OS in speci c stages or subtypes of breast cancer was validated.

Abbreviations
DEGs: differentially expressed genes; ESTIMATE: Estimation of stromal and immune cells in malignant tumors using expression data; GO: Gene Ontology; TCGA: The Cancer Genome Atlas; TPM: transcripts per kilobase million; KEGG: Kyoto Encyclopedia of Genes and Genomes.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and material
All data generated or analysed during this study are included in this published article and its supplementary information les.

Competing interests
The authors declare that there are no nancial or commercial con icts of interest.

Funding
No funding was obtained for this study.
Authors' contributions WT: Study design, data analysis, and manuscript writing; ML: Data collection; LW: Data collection; YG, CW, SZ, CL: Study design; NL: Data analysis. All authors have read and approved the nal version of the manuscript.