
Advanced machine learning unveils CD8 + T cell genetic markers enhancing prognosis and immunotherapy efficacy in breast cancer

Abstract

Background

Breast cancer (BC) is the most common cancer in women and poses a significant health burden, especially in China. Despite advances in diagnosis and treatment, patient variability and limited early detection contribute to poor outcomes. This study examines the role of CD8 + T cells in the tumor microenvironment to identify new biomarkers that improve prognosis and guide treatment strategies.

Methods

CD8 + T-cell marker genes were identified using single-cell RNA sequencing (scRNA-seq), and a CD8 + T cell-related gene prognostic signature (CTRGPS) was developed using 10 machine-learning algorithms. The model was validated across seven independent public datasets from the GEO database. Clinical features and previously published signatures were also analyzed for comparison. The clinical applications of CTRGPS in biological function, immune microenvironment, and drug selection were explored, and the role of hub genes in BC progression was further investigated.

Results

We identified 71 CD8 + T cell-related genes and developed the CTRGPS, which demonstrated significant prognostic value, with higher risk scores linked to poorer overall survival (OS). The model’s accuracy and robustness were confirmed through Kaplan-Meier and ROC curve analyses across multiple datasets. CTRGPS outperformed existing prognostic signatures and served as an independent prognostic factor. The role of the hub gene TTK in promoting malignant proliferation and migration of BC cells was validated.

Conclusion

The CTRGPS enhances early diagnosis and treatment precision in BC, improving clinical outcomes. TTK, a key gene in the signature, shows promise as a therapeutic target, supporting the CTRGPS’s potential clinical utility.


Introduction

BC is the most prevalent cancer among women, representing a significant contributor to their disease burden globally [1]. In Asia, China reports the highest incidence of BC, making it the leading cause of malignancy-related morbidity among females [2]. Despite substantial advancements in medical technology, including enhanced diagnostic techniques and treatment options, the prognosis and quality of life for BC patients remain challenged by the disease’s heterogeneity and the absence of effective early diagnostic markers [3]. Therefore, the identification of new biomarkers is essential to predict prognosis accurately and devise effective treatment strategies, ultimately improving patient survival and quality of life. Recent research highlights the critical role of the tumor immune microenvironment (TME) and immune infiltration in the development and progression of BC [4, 5]. The TME comprises a complex network of innate and adaptive immune cells that interact dynamically, influencing tumor behavior and response to immunotherapy [6]. Among these, CD8 + T cells are pivotal in mediating anti-tumor immunity [7]. Activated CD8 + T cells release cytolytic molecules, such as perforin and granzyme B, which directly induce tumor cell lysis [8]. Additionally, they secrete cytokines such as interferon-gamma (IFN-γ), which not only activate other immune cells, including macrophages and natural killer cells, but also enhance overall anti-tumor responses [9]. CD8 + T cells interact with regulatory T cells, macrophages, and other immune cells to maintain immune homeostasis within the TME, bolstering their anti-tumor efficacy and the effectiveness of immunotherapy [10, 11].

The infiltration abundance of CD8 + T cells has been recognized as a major immune signature closely linked to clinical outcome and to the response to immunotherapy. For instance, the deletion of SNX9 has been shown to alleviate CD8 + T-cell exhaustion, leading to more effective cellular cancer immunotherapy [12]. Similarly, constructing gene markers associated with CD8 + T cells has demonstrated potential in predicting prognosis and the efficacy of immunotherapeutic approaches in various cancers, including bladder cancer [13]. Consequently, exploring the CD8 + T-cell-associated gene network is crucial for rapid diagnosis and precise treatment of BC patients.

In our study, we introduced a consensus machine learning-derived CD8 + T cell-related gene prognostic signature (CTRGPS) to enhance early diagnosis and precision treatment for BC patients. Using single-cell RNA sequencing, we identified CD8 + T-cell marker genes and further recognized 71 CD8 + T cell-related genes (CTRGs) through consensus clustering and weighted gene co-expression network analysis (WGCNA). Subsequently, we developed and validated a prognostic signature using ten machine-learning algorithms. Our research utilized seven independent public datasets, including one single-cell RNA sequencing (scRNA-seq) dataset and six microarray datasets, sourced from the GEO database. We employed scRNA-seq data to identify CD8 + T-cell marker genes and combined GSE7390 and GSE42568 datasets as the main cohort for constructing our signature and subsequent analysis. Additional datasets were used to validate our signature. The scRNA-seq dataset, containing 26 primary tumors from three major BC subtypes, was processed to ensure high-quality cells for analysis. Using the “Seurat” R package, we normalized the data, identified highly variable genes, and performed principal component analysis (PCA) to determine the optimal number of PCs and resolution. Clusters were identified, and cell types were annotated using classical cell surface markers and differentially expressed genes (DEGs). The CD8 + T-cell marker genes were identified through re-clustering of T cells. For microarray data, we processed the raw data to convert gene probes to human gene symbols, normalized the genes, and removed batch effects. Consensus clustering was performed to partition the tumor samples into groups, and the optimal number of clusters was determined using the cumulative distribution function (CDF) curve.

In summary, our study identifies and validates CD8 + T-cell marker genes and constructs a robust prognostic signature, contributing to the precision treatment of BC patients and improving their clinical prognosis. This approach underscores the importance of integrating advanced computational techniques with immunological insights to address the complexities of BC management.

Materials and methods

Public data collection

We collected a total of seven independent public datasets from the GEO database (https://www.ncbi.nlm.nih.gov/), comprising one single-cell RNA sequencing (scRNA-seq) dataset and six microarray datasets. Among these, the scRNA-seq dataset GSE176078 was used to identify CD8 + T-cell marker genes. Datasets GSE7390 and GSE42568 were combined as the Main-Cohort for constructing our signature and for subsequent analyses, while datasets GSE1456, GSE16446, GSE20685, and GSE86166 were used for signature validation. Clinical information for all datasets was collected from the easyGEO database (https://tau.cmmt.ubc.ca/eVITTA/easyGEO/). The mRNA expression levels of breast cancer cell lines were obtained from the Cancer Cell Line Encyclopedia (CCLE) database (https://depmap.org/portal/).

Processing of scRNA-seq data

The GSE176078 dataset comprises 26 primary breast tumors representing the ER+ (n = 11), HER2+ (n = 5), and TNBC (n = 10) subtypes. Six ER + samples with poor data quality were excluded, leaving 20 samples for analysis. Quality control retained high-quality cells that met the following criteria: genes expressed > 300 and in > 10 cells, 200–6000 expressed genes per cell, mitochondrial genes < 20%, ribosomal genes > 3%, and hemoglobin genes < 0.1%. After filtering, 76,292 cells remained. The scRNA-seq data were processed using the “Seurat” package: the data were normalized and the top 2000 variable genes were identified for PCA. The optimal number of PCs and the clustering resolution were determined using the “ElbowPlot” function and the “clustree” package. Clusters were identified using the “FindNeighbors” and “FindClusters” functions and visualized with UMAP. Differentially expressed genes (DEGs) were identified using “FindAllMarkers” (logFC threshold = 0.25, min.pct = 0.25). Individual cells were annotated using cell markers from CellMarker2.0 [14] together with the DEGs. The T cell expression matrix was then re-clustered (resolution = 1.2), and CD8 + T-cell marker genes were identified using “FindAllMarkers” (logFC threshold = 0.1, min.pct = 0.25). Cytotoxic effector gene signature scores were calculated using the “AUCell” package [15].
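The per-cell part of these quality-control thresholds can be illustrated with a minimal Python sketch. This is not the authors' Seurat code: the `CellQC` record and its field names are hypothetical, and treating the 200–6000 gene range as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Hypothetical per-cell QC summary (field names are illustrative)."""
    n_genes: int      # number of genes detected in the cell
    pct_mito: float   # percent of counts from mitochondrial genes
    pct_ribo: float   # percent of counts from ribosomal genes
    pct_hb: float     # percent of counts from hemoglobin genes


def passes_qc(cell: CellQC) -> bool:
    """Apply the per-cell thresholds quoted in the text.

    Assumption: the 200-6000 gene range is inclusive at both ends.
    """
    return (
        200 <= cell.n_genes <= 6000
        and cell.pct_mito < 20.0
        and cell.pct_ribo > 3.0
        and cell.pct_hb < 0.1
    )


def filter_cells(cells: dict[str, CellQC]) -> list[str]:
    """Return the barcodes of cells that pass every threshold."""
    return [barcode for barcode, qc in cells.items() if passes_qc(qc)]
```

For example, a cell with 1500 detected genes, 5% mitochondrial, 10% ribosomal, and 0% hemoglobin counts is kept, whereas the same cell with 25% mitochondrial counts is dropped.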

Processing of microarray data

Raw microarray data from the GEO database were processed by converting gene probes to human gene symbols and removing probes that matched multiple genes. For genes matched by multiple probes, the average value was used. Normalization followed a previous study [16]. Batch effects were removed using the “removeBatchEffect” function of the “limma” R package. The GSE7390 and GSE42568 datasets were merged after batch effect removal to construct the Main-Cohort for subsequent analysis.
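The probe-to-gene step can be sketched in a few lines of Python. This is an illustration of the rule described above (drop probes that map to several genes, average probes that map to the same gene), not the authors' pipeline; the function name and input layout are assumptions.

```python
from collections import defaultdict
from statistics import fmean


def collapse_probes(
    probe_to_genes: dict[str, list[str]],
    probe_values: dict[str, list[float]],
) -> dict[str, list[float]]:
    """Collapse probe-level expression to gene-level expression.

    probe_to_genes: annotation, probe ID -> gene symbols it matches.
    probe_values:   probe ID -> expression values, one per sample.
    """
    per_gene: dict[str, list[list[float]]] = defaultdict(list)
    for probe, genes in probe_to_genes.items():
        # Skip unannotated probes, probes matching several genes,
        # and probes without measured values.
        if len(genes) != 1 or probe not in probe_values:
            continue
        per_gene[genes[0]].append(probe_values[probe])

    # Average sample by sample across all probes of the same gene.
    return {
        gene: [fmean(column) for column in zip(*rows)]
        for gene, rows in per_gene.items()
    }
```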

Consensus clustering analysis

The “ConsensusClusterPlus” R package was used to perform consensus clustering on the tumor samples. The K-means clustering algorithm with a Euclidean distance metric was run with 80% sample resampling for 1000 repetitions, evaluating up to six groups (max K = 6). The optimal number of clusters (K = 2) was determined using the cumulative distribution function (CDF) curve. PCA was used to visually assess the clustering patterns, and the “survminer” R package was used to draw Kaplan-Meier curves comparing overall survival (OS).
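The quantity at the heart of consensus clustering is the consensus index of a sample pair: among the resampling runs in which both samples were drawn, the fraction in which they landed in the same cluster. The Python sketch below computes it from already-clustered runs. It is a simplified illustration, not the ConsensusClusterPlus implementation, and the input layout is an assumption.

```python
from itertools import combinations


def consensus_matrix(
    samples: list[str],
    runs: list[dict[str, int]],
) -> dict[tuple[str, str], float]:
    """Pairwise consensus index from repeated subsampled clusterings.

    runs: one dict per resampling run, mapping each sample that was
          drawn in that run to its cluster label.
    Pairs never drawn together are left out of the result.
    """
    pairs = list(combinations(samples, 2))
    together = dict.fromkeys(pairs, 0)   # runs where the pair co-clustered
    sampled = dict.fromkeys(pairs, 0)    # runs where both were drawn

    for run in runs:
        for a, b in pairs:
            if a in run and b in run:
                sampled[(a, b)] += 1
                if run[a] == run[b]:
                    together[(a, b)] += 1

    return {p: together[p] / sampled[p] for p in pairs if sampled[p] > 0}
```

A clean partition gives consensus values close to 0 or 1, which is what the CDF curve summarizes when candidate values of K are compared.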

Weighted correlation network analysis (WGCNA)

WGCNA was performed as described previously [17], using a soft threshold of β = 6 to meet the scale-free network criterion. The weighted adjacency matrix was converted to a topological overlap matrix (TOM), from which the corresponding dissimilarity (1-TOM) values were derived. Modules were identified by dynamic tree cutting, and the module with the highest correlation coefficient with the CD8 + T cell infiltration clusters was selected. Genes with high gene significance (GS) and module membership (MM) were designated CD8 + T cell-related genes.
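The two matrix transformations named here can be written out explicitly. The sketch below assumes an unsigned network (adjacency = |correlation|^β) and the standard topological overlap formula; whether the authors used an unsigned or signed network is not stated, so that choice is an assumption, and this pure-Python version is only meant for small toy matrices.

```python
def adjacency(cor: list[list[float]], beta: float = 6) -> list[list[float]]:
    """Soft-thresholded adjacency a_ij = |cor_ij| ** beta.

    Assumption: unsigned network. The diagonal is set to 0 so that it
    does not contribute to connectivity in the TOM calculation.
    """
    n = len(cor)
    return [
        [0.0 if i == j else abs(cor[i][j]) ** beta for j in range(n)]
        for i in range(n)
    ]


def tom_dissimilarity(adj: list[list[float]]) -> list[list[float]]:
    """Return the 1 - TOM dissimilarity matrix.

    TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij),
    where k_i is the connectivity (row sum) of gene i.
    """
    n = len(adj)
    k = [sum(row) for row in adj]
    diss = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(
                adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j)
            )
            tom = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            diss[i][j] = 1.0 - tom
    return diss
```

Raising correlations to the sixth power suppresses weak links: a correlation of 0.5 becomes an adjacency of about 0.016, while 0.9 stays at about 0.53.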

Machine learning-based signature construction and validation

CTRGs with potential prognostic value were screened in the Main-Cohort through univariate Cox regression analysis. A total of 101 combinations of 10 machine learning algorithms were then integrated to develop a prognostic signature. The models were tested on four other datasets, and the model with the highest average C-index was considered optimal [18, 19].
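Because model selection rests on the C-index, it helps to see what that statistic measures. The following Python sketch computes Harrell's concordance index for right-censored data in its simplest form: a pair is comparable when the patient with the shorter follow-up had an event, and it is concordant when that patient also has the higher risk score. Pairs with tied times are ignored here, which is a simplification relative to full implementations.

```python
def concordance_index(
    times: list[float],
    events: list[bool],
    risk_scores: list[float],
) -> float:
    """Harrell's C-index for right-censored survival data.

    times:       follow-up time per patient.
    events:      True if the event (death) was observed, False if censored.
    risk_scores: model output; higher means higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.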

Prognostic value of CTRGPS and potential clinical application

We selected the optimal model and classified breast cancer patients into high- and low-CTRGPS groups based on the method described previously [20, 21, 22], utilizing the median risk score of the respective cohorts. Kaplan-Meier and ROC curves were used to evaluate the prognostic effectiveness and accuracy of CTRGPS. To comprehensively compare its performance, we retrieved 52 prognostic signatures related to breast cancer, calculating the score of each sample based on published coefficients.
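The scoring and stratification logic can be sketched as follows, assuming the usual linear form of a Cox-type signature (risk score = sum of coefficient × expression). The coefficients in the usage note are invented for illustration and are not the fitted CTRGPS values; assigning scores equal to the median to the low group is also an assumption.

```python
from statistics import median


def risk_score(
    expression: dict[str, float],
    coefficients: dict[str, float],
) -> float:
    """Linear risk score: sum over signature genes of coef * expression."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores: dict[str, float]) -> dict[str, str]:
    """Label each sample 'high' or 'low' relative to the cohort median.

    Assumption: scores exactly equal to the median go to the low group.
    """
    cutoff = median(scores.values())
    return {
        sample: "high" if score > cutoff else "low"
        for sample, score in scores.items()
    }
```

With hypothetical coefficients `{"TTK": 0.5, "VAV3": -0.25}` and expression `{"TTK": 2.0, "VAV3": 1.0}`, the score is 0.75. Because the cutoff is computed within each cohort, the same score can fall into different groups in different datasets.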

Additionally, we conducted univariate and multivariate Cox regression analyses to determine whether CTRGPS can serve as an independent prognostic factor. To enhance its clinical application, we integrated clinicopathological characteristics with CTRGPS to build a nomogram model. Calibration curves were drawn to assess accuracy, and decision curve analysis (DCA) was employed to calculate clinical benefits for patients.
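Decision curve analysis plots net benefit against the threshold probability at which a clinician would act. A minimal Python sketch of the standard net-benefit formula for a binary outcome is given below. It ignores censoring, so it is a simplification of the survival setting used in the study, where event rates at a fixed time point are typically estimated from survival curves.

```python
def net_benefit(
    predicted_risk: list[float],
    outcomes: list[bool],
    threshold: float,
) -> float:
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold pt.

    Patients with predicted risk >= threshold are treated as positive.
    Assumption: binary outcomes without censoring.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    pairs = list(zip(predicted_risk, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y)
    false_pos = sum(1 for p, y in pairs if p >= threshold and not y)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)
```

Evaluating this function over a grid of thresholds for each candidate predictor yields the curves that a DCA figure compares.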

Functional and pathway enrichment analysis

Biological functions and pathway processes related to CTRGPS were explored through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses of genes differentially expressed (adjusted P value < 0.05, log2 fold change > 1) between the high- and low-CTRGPS groups. Additionally, hallmark gene sets from the Molecular Signature Database (MSigDB) were downloaded for GSVA enrichment analysis on all genes in high- and low-CTRGPS samples. The R package “limma” was used to calculate the differences in hallmark gene sets between the two groups.

Assessment of the immune microenvironment and drug sensitivity

To evaluate the role of CTRGPS in the tumor immune microenvironment, we utilized various immune infiltration algorithms, including the Estimation of Proportion of Immune and Cancer cells (EPIC) [23], Microenvironment Cell Populations-counter (MCPcounter) [24], QUANTISEQ [25], single-sample Gene Set Enrichment Analysis (ssGSEA) [26], Xcell [27], and Tumor Immune Estimation Resource (TIMER) [28]. Additionally, we applied the Estimation of STromal and Immune cells in MAlignant Tumours using Expression data (ESTIMATE) [29] to calculate immunological scores and tumor purity. Subsequently, we collected 55 immune modulator molecules [30], including genes involved in antigen presentation, cell adhesion, co-inhibitors, co-stimulators, ligands, and receptors, and compared their expression between the high- and low-CTRGPS groups. Moreover, we utilized the Tracking Tumor Immunophenotype (TIP) website (http://biocc.hrbmu.edu.cn/TIP/) to analyze differences in the cancer immunity cycle between the two CTRGPS groups.

Finally, our study obtained drug response data through the Genomics of Drug Sensitivity in Cancer (GDSC) database to explore the relationship between drug sensitivity and high- and low-CTRGPS. The R package “oncoPredict” [31] was used for analysis, and drug sensitivities were compared between different groups using the Wilcoxon rank sum test.

Cell culture and transfection

The human breast cancer cell line MDA-MB-231 was obtained from Procell (Procell Life Science & Technology Co., Ltd). MDA-MB-231 cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM, Procell, China) supplemented with 10% fetal bovine serum (FBS, Procell, China) and 1% penicillin and streptomycin, and maintained in a humidified environment at 37 °C with 5% CO2.

Small interfering RNAs (siRNAs) targeting the TTK coding sequence (Supplementary Table 1) were obtained from GenePharma (China) according to a previous study [32]. The siRNAs were transfected into cells (70% confluence) using jetPRIME transfection reagent (Polyplus, France) in strict accordance with the manufacturer’s instructions. After incubation for 48 h, the cells were collected and used to detect protein levels.

Western blot analysis

The western blotting procedures followed a previously described method [33]. Initially, cells were collected and lysed with RIPA buffer, followed by centrifugation to collect the supernatant. Proteins were separated by 8% SDS-PAGE and transferred to a PVDF membrane. After blocking with 5% nonfat goat milk powder for 2 h, the membranes were incubated overnight at 4 °C with primary antibodies. Subsequently, they were incubated with HRP-coupled secondary antibodies for 2 h at room temperature. Protein bands were visualized using a chemiluminescence instrument. Primary antibodies used were anti-TTK (10381-1-AP, Rabbit, 1:1000, Proteintech, China) and anti-β-actin (66009-1-Ig, Mouse, 1:5000, Proteintech, China), while HRP-conjugated secondary antibodies were HRP goat anti-mouse IgG (DY60203, Mouse, 1:5000, Diyibio, China) and HRP goat anti-rabbit IgG (DY60202, Rabbit, 1:5000, Diyibio, China).

Cell counting kit‑8 (CCK‑8) assay

The CCK-8 assay (Seven, China) was carried out following the established protocol described previously [33]. Briefly, transfected MDA-MB-231 cells were seeded in 96-well plates (2 × 10³ cells per well). After incubation at 37 °C with 5% CO2, the absorbance at 450 nm was measured using a microplate reader (Bio-Rad, USA).

Scratch wound healing assay

Transfected MDA-MB-231 cells were seeded into 6-well plates and incubated at 37 °C with 5% CO2 until they reached 100% confluence. Subsequently, the cell monolayer was scratched with a 200 µl pipette tip. After removal of cell debris, the cells were cultured further in complete medium containing 2% FBS (Procell, China). Images were taken at 0 h and 48 h after scratching. Cell migration was quantified as the wound healing percentage: (initial wound area - final wound area)/initial wound area × 100%.
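As a worked example of the wound healing percentage, the short Python function below applies the formula to two measured areas. The function name and the area units are illustrative; any consistent unit (for example pixels from image analysis) works because the ratio is dimensionless.

```python
def wound_healing_percent(initial_area: float, final_area: float) -> float:
    """Wound closure as a percentage of the initial wound area.

    initial_area: wound area at 0 h.
    final_area:   wound area at the end point (48 h in this protocol).
    """
    if initial_area <= 0:
        raise ValueError("initial wound area must be positive")
    return (initial_area - final_area) / initial_area * 100.0
```

A wound that shrinks from 1000 to 250 area units has healed by 75%.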

Statistical analysis

The statistical analysis was conducted using R (version 4.3.1) and GraphPad Prism (version 8.0.2). All experiments were performed with at least three biological replicates, and data are presented as mean ± SD (standard deviation). Pearson correlation analysis was used to assess correlations between continuous variables, while the Mantel test was used to calculate correlations between matrices. Categorical variables were compared using the chi-square test, and continuous variables were compared using either the Wilcoxon rank sum test or the t-test. P values less than 0.05 were considered statistically significant (*P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001), while “ns” indicates non-significance.

Results

Identification of CD8 + T-cell marker genes by scRNA‑seq

Following quality control on the GSE176078 dataset, 76,292 cells were retained for analysis (Fig. S1 A, B, see Materials and Methods). PCA dimensionality reduction based on 2000 hypervariable genes in 20 BC samples determined the optimal PC number using heatmap and elbow plot analysis (Fig. S1 C-E, Fig. S2 A, B). A total of 29 cell clusters were identified (Fig. S2 C-E), delineating ten major cell types based on classical surface markers (Fig. 1A), including T cells (CD3E), fibroblast (DCN), epithelial cells (EPCAM), pericyte (ACTA2), endothelial cells (PECAM1), myeloid cells (CD68), B cells (MS4A1), plasma cells (JCHAIN), basal cells (KRT14), and natural killer cells (GNLY) (Fig. 1B). Heatmap visualization depicted the expression of top 5 marker genes for each cell type (Fig. 1C), while the histogram illustrated the proportions of these cell types across samples (Fig. 1D). T cell subpopulations were re-clustered and classified into CD4 + and CD8 + T cells based on classical surface markers (Fig. S2F, Fig. 1E, F). Subsequent calculation of cytotoxicity scores revealed a strong correlation with CD8 + T cells (Fig. 1G), leading to the identification of 272 marker genes as CTRGs in the CD8 + T cell cluster for further investigation (Supplementary Table 2).

Fig. 1

Identification of CD8 + T cell marker genes through single-cell RNA sequencing analysis. (A) UMAP plot displaying the distribution of 10 distinct cell types identified in the dataset. (B) UMAP plots illustrating the expression patterns of marker genes in different cell clusters. (C) Heatmap depicting the expression levels of the top 5 marker genes specific to each indicated cell type. (D) Relative proportion of each cell type across 20 samples. (E) UMAP plot highlighting two major T cell subsets, CD4 + and CD8 + T cells. (F) UMAP plots illustrating the expression profiles of marker genes in T cell subsets. (G) Cytotoxicity score of the cytotoxic effector gene signature across different cell types

Development and validation of CD8 + T cell infiltration consensus clusters

Two hundred CTRGs were used to perform a consensus cluster analysis on the BC patients in the Main-Cohort after excluding 72 CTRGs unique to the scRNA-seq dataset (Fig. S3A). All BC patients were finally divided into 2 clusters based on the optimal cutoff (k = 2) (Fig. 2A-C and Fig. S3B). Importantly, survival curve analysis showed that the BC patients in cluster 2 had a shorter OS than those in cluster 1 (p < 0.0001) (Fig. 2D). Meanwhile, higher CD8 + T cell infiltration was observed in cluster 2 than in cluster 1 (Fig. 2E, F). Therefore, we defined cluster 1 as “low CD8 + T cell infiltration” tumors and cluster 2 as “high CD8 + T cell infiltration” tumors. The accuracy and robustness of these results were further validated by 5 immune infiltration algorithms, including EPIC, quanTIseq, ssGSEA, TIMER and xCell (Fig. 2E, F).

Fig. 2

Identification of CD8 + T cell-related genes (CTRGs) via two algorithms. (A) Consensus clustering matrix of 200 candidate CD8 + T cell marker genes (CTCMGs) divided into two clusters (C1 = 183, C2 = 119). (B) Cumulative distribution function (CDF) curves of the consensus matrix for each k value, indicated by different colors. (C) Principal component analysis (PCA) map illustrating the distribution between the two clusters. (D) Kaplan-Meier curves showing overall survival (OS) of the two clusters in breast cancer patients. (E) Infiltrating abundance of CD8 + T cell subsets in both clusters assessed by five algorithms. (F) Distribution of CD8 + T cell subsets infiltration between the two clusters. (G) Network topology analysis with different soft-threshold powers. The left panel displays the effect of soft-threshold power on the scale-free topology fit index, while the right panel shows the effect on average connectivity. (H) Clustering dendrograms of genes, with dissimilarity based on topological overlap, along with assigned module colors. (I) Correlation analysis between clinical characteristics and module characteristic genes. (J) Correlation between module membership (MM, X-axis) and gene significance (GS, Y-axis) in the cluster. Points within the red rectangle were identified as CTRGs with both high GS and high MM

Identification of CTRGs

Utilizing the Weighted Gene Co-Expression Network Analysis (WGCNA) algorithm, we transformed the Pearson’s correlation matrix of genes into a weighted adjacency matrix with power β = 6, based on a scale-free topology fit of R² = 0.91 (Fig. 2G). Subsequently, we applied the dynamic tree cutting algorithm to the TOM-based dissimilarity measure to cluster 13,496 genes, resulting in the identification of 14 modules, each marked with a distinct color (Fig. 2H).

Subsequently, we delved into the analysis of the correlation between clinical features, CD8 + T cell infiltration, and each distinct module (Fig. 2I). Ultimately, the turquoise module, which displayed the highest correlation coefficient (cor = -0.72, p = 2e-48) with the consensus cluster, was selected for further investigation (Fig. 2J). Furthermore, the correlation coefficient between gene significance (GS) and module membership (MM) in the turquoise module reached 0.84. Based on these findings, 71 genes were identified as hub CTRGs with GS > 0.5 and MM > 0.6 (Supplementary Table 3).
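The hub-gene filter amounts to a two-threshold selection. The Python sketch below applies the cutoffs quoted above to a table of per-gene GS and MM values. Taking absolute values is an assumption made here because the module correlation with the consensus cluster is negative; the gene names and values in the usage note are invented for illustration.

```python
def select_hub_genes(
    gene_stats: dict[str, tuple[float, float]],
    gs_cutoff: float = 0.5,
    mm_cutoff: float = 0.6,
) -> list[str]:
    """Return genes whose |GS| and |MM| both exceed the cutoffs.

    gene_stats: gene symbol -> (gene significance, module membership).
    Assumption: thresholds are applied to absolute values.
    """
    return sorted(
        gene
        for gene, (gs, mm) in gene_stats.items()
        if abs(gs) > gs_cutoff and abs(mm) > mm_cutoff
    )
```

For instance, with `{"geneA": (0.7, 0.8), "geneB": (0.4, 0.9), "geneC": (-0.6, 0.65)}` the function returns `["geneA", "geneC"]`, since geneB fails the GS cutoff.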

Construction of a prognostic signature by integrating machine learning

Initially, we conducted univariate Cox regression analysis to identify prognostic CTRGs associated with overall survival (OS), resulting in the identification of 36 prognostic CTRGs (Supplementary Table 4) from a pool of 71 candidate genes. Subsequently, the expression profiles of these 36 CTRGs were integrated with machine learning algorithms to develop a consensus machine learning-derived CD8 + T cell-related gene prognostic signature (CTRGPS).

In the Main-Cohort, we employed 10 different machine learning algorithms to fit 101 prediction models. The prognostic performance of each model was evaluated using the concordance index (C-index). Additionally, we calculated the C-index of each model in four validation cohorts to assess the robustness of the model across different datasets (Fig. 3A, Fig. S3C). Ultimately, the combination of CoxBoost and Elastic Net (Enet) with alpha = 0.2 algorithms exhibited the highest average C-index (0.669), leading to its selection as the best-performing model. Subsequently, we identified six hub genes from the selected model and utilized them to construct the prognostic signature. Notably, breast cancer (BC) patients with high expression levels of four signature genes (CIRBP, MOAP1, PTP4A2, and VAV3) demonstrated an extended survival period, whereas those with increased expression of two signature genes (DSC2 and TTK) had a shortened survival period in the Main-Cohort (Fig. S3D-E).
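The selection rule, picking the algorithm combination with the highest mean C-index across cohorts, can be sketched as below. The model names and C-index values in the usage note are placeholders rather than the study's results, and which cohorts enter the average is left to the caller.

```python
from statistics import fmean


def best_model(
    cindex_by_model: dict[str, dict[str, float]],
) -> tuple[str, float]:
    """Return the model with the highest mean C-index and that mean.

    cindex_by_model: model name -> {cohort name -> C-index}.
    """
    means = {
        model: fmean(per_cohort.values())
        for model, per_cohort in cindex_by_model.items()
    }
    winner = max(means, key=means.get)
    return winner, means[winner]
```

Given hypothetical inputs such as `{"modelA": {"c1": 0.70, "c2": 0.60}, "modelB": {"c1": 0.66, "c2": 0.68}}`, the function selects `modelB` (mean 0.67 versus 0.65), showing that a model with the best result in a single cohort is not necessarily the one with the best average.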

Fig. 3

Development and validation of CTRGPS utilizing multiple machine learning algorithms. (A) A total of 101 prediction models were constructed through a 10-fold cross-validation framework, and the concordance index (c-index) of each model was calculated for all validation datasets. The results of the top 12 machine learning algorithm combinations are presented. (B) Kaplan-Meier curves illustrating overall survival (OS) in the high- and low-CTRGPS groups. (C) Receiver operating characteristic (ROC) curves depicting the 2-, 3-, and 5-year OS in the high- and low-CTRGPS groups

To assess the prognostic performance of CTRGPS, we calculated the risk score for each sample in both the internal and external cohorts. Based on the median risk score, all samples were stratified into high- and low-CTRGPS groups. Kaplan-Meier analysis revealed that the overall survival of the low-CTRGPS group was significantly better than that of the high-CTRGPS group in the Main-Cohort (P < 0.001). Similar results were observed in multiple independent cohorts, including GSE1456, GSE16446, GSE20685, GSE86166, and Meta-Cohort (Fig. 3B). Moreover, the area under the receiver operating characteristic (ROC) curve (AUC) values for 2-, 3-, and 5-year OS further demonstrated the accurate and robust performance of CTRGPS across different datasets (Fig. 3C). These findings underscore the predictive utility of CTRGPS in determining the prognosis of BC patients in diverse cohorts.

Evaluation of the independent prognostic predictor and clinical value of CTRGPS

We conducted a comparative analysis of risk scores across various stratification characteristics to investigate the prognostic significance of CTRGPS in relation to clinicopathologic features. Notably, breast cancer (BC) patients with high histological grade (G3) and tumor size (> 2 cm) exhibited elevated risk scores (Fig. 4A) and demonstrated poorer overall survival (OS) outcomes (Fig. 4B). These findings underscored the potential of a high CTRGPS score as a prognostic factor for BC patients. Furthermore, our analysis confirmed that CTRGPS could independently predict OS for BC patients (Fig. 4C, D).

Fig. 4

Evaluation of the clinical independence and application value of CTRGPS in the main cohort. (A) Comparison of risk scores between different clusters stratified by clinicopathological characteristics using violin plots. (B) Kaplan-Meier curves illustrating overall survival (OS) in clinicopathological characteristic stratification. (C) Univariate Cox regression analysis of the correlation between CTRGPS and OS. (D) Multivariate Cox regression analysis of the correlation between CTRGPS and OS. (E) Construction of a nomogram to predict 2-, 3-, and 5-year OS. (F) Calibration curves of the nomogram for predicting overall survival at 2-year, 3-year, and 5-year in the Main-Cohort. (G) Decision curve analysis (DCA) demonstrating the benefit of the nomogram in clinical practice for BC patients

To illustrate the clinical value of CTRGPS, we constructed a nomogram model for prognostic prediction in BC patients by integrating clinical characteristics (Fig. 4E). Calibration curves demonstrated the model’s accuracy in predicting 2-, 3-, and 5-year mortality in BC patients (Fig. 4F). Additionally, the decision curve analysis (DCA) curve revealed a significant net benefit across a broad range of risks, with CTRGPS showing the highest net benefit compared to other independent factors (Fig. 4G). Overall, our study underscores the excellent performance of CTRGPS in predicting the prognosis of BC patients.

The evaluation of CTRGPS performance in predicting OS as an independent prognostic predictor

To comprehensively assess the performance of CTRGPS relative to other signatures, we compiled signatures published over the past decade, resulting in a total of 52 signatures (Supplementary Table 5). These signatures correspond to diverse biological features, including glycolysis, metabolism, epigenetics, inflammatory factors, ferroptosis, apoptosis, aging, immune response, immune infiltration, and others.

Remarkably, CTRGPS exhibited superior performance in terms of the C-index across multiple cohorts, including the Main-Cohort, GSE1456, GSE16446, GSE20685, GSE86166, and Meta-Cohort (Fig. 5A-F). These results underscore the accuracy and robustness of CTRGPS in predicting overall survival as an independent prognostic predictor.

Fig. 5

Comparison between CTRGPS and 52 previously published signatures in the Main-Cohort (A), GSE1456 (B), GSE16446 (C), GSE20685 (D), GSE86166 (E) and Meta-Cohort (F)

Potential biological mechanisms of the CTRGPS groups

To elucidate the underlying biological processes associated with the CTRGPS groups, we conducted comprehensive enrichment analyses. Gene ontology (GO) analysis results revealed distinct enrichments across different categories (Fig. 6A). Regarding biological processes (BP), genes related to CTRGPS were primarily enriched in chromosome segregation. In terms of cellular components (CC), genes linked to CTRGPS showed enrichment in the collagen-containing extracellular matrix. The molecular functions (MF) associated with CTRGPS were predominantly involved in G protein-coupled receptor binding, organic acid binding, and chemokine receptor binding. Pathway enrichment analysis highlighted the enrichment of CTRGPS-related genes in 20 pathways, with notable emphasis on cytokine-cytokine receptor interaction and the cell cycle (Fig. 6B). Additionally, we employed gene set variation analysis (GSVA) to uncover functional disparities between the high- and low-CTRGPS groups. Our findings revealed that the high CTRGPS group exhibited activation in pathways related to the G2M checkpoint, E2F targets, MTORC1 signaling, MYC targets V2, mitotic spindle, unfolded protein response, MYC targets V1, glycolysis, among others. Conversely, the low CTRGPS group showed activation in pathways such as the estrogen response early, estrogen response late, heme metabolism, bile acid metabolism, peroxisome, fatty acid metabolism, myogenesis, and others (Fig. 6C, D).

Fig. 6

Functional enrichment and annotation analysis of CTRGPS groups. (A) GO enrichment analyses of CTRGPS groups. (B) KEGG enrichment analyses of CTRGPS groups. (C-D) GSVA analysis of all genes in the high- and low-risk groups to obtain enriched pathways

Characterization of the tumor immune microenvironment between low- and high-CTRGPS group

We conducted a comprehensive assessment to delineate the role of CTRGPS in the immune microenvironment of BC using various immune infiltration algorithms. Our findings demonstrated that the high-CTRGPS group exhibited elevated immune infiltration abundance across numerous immune cell types, including T cells, NK cells, B cells, plasma cells, myeloid-derived suppressor cells (MDSCs), dendritic cells, and others (Fig. 7A). Importantly, the CTRGPS score showed associations with a wide array of immune cells, such as Th2 cells, activated CD4 T cells, MDSCs, monocyte lineage, macrophage M1, and more (Fig. 7B). Furthermore, we examined the expression levels of immune modulators, including antigen presentation, cell adhesion molecules, co-inhibitors, co-stimulators, ligands, and receptors, revealing higher expression levels in the high-CTRGPS group (Fig. 7C). Notably, a positive correlation was observed between CTRGPS scores and classical immune checkpoint molecules, such as LAG-3, CTLA-4, and IDO-1 (Fig. 7D). Additionally, our study demonstrated that the high-CTRGPS group exhibited activation in the initial six steps of the cancer immunity cycle, encompassing antigen release, cancer antigen presentation, priming and activation, recruitment of tumor immune infiltrating cells, immune cell infiltration, and cancer cell recognition by T cells (Fig. 7E). For nearly all immune cell subpopulations, the high-CTRGPS group displayed significantly higher ssGSEA enrichment scores (Fig. 7F). Intriguingly, elevated scores of all immune-related functions were identified in the high-CTRGPS group compared to the low-CTRGPS group, consistent with our previous observations (Fig. 7G). Mantel test analysis further underscored the significant association of CTRGPS with various immune cell subpopulations and functions (Fig. 7H).

Fig. 7

Immune-related characteristics of the CTRGPS. (A) Estimation of immune infiltrating cells using multiple algorithms between high- and low-CTRGPS. (B) Relationship between CTRGPS and immune cell infiltrations. (C) Estimation of immune modulator molecules between high- and low-CTRGPS. (D) Relationship between CTRGPS and immune modulator molecules. (E) Box plot showing the dissimilarities in the cancer immunity cycle between high- and low-CTRGPS. (F) Box plot comparing scores for 16 immune cell types between high- and low-CTRGPS. (G) Box plot comparing scores for 13 immune-related functions between high- and low-CTRGPS. (H) Butterfly diagram illustrating the correlation between immune cell infiltration, immune-related functions, and CTRGPS. Violin plots comparing the ESTIMATE score (I), stromal score (J), immune score (K), and tumor purity (L) between high- and low-CTRGPS

Finally, ESTIMATE analysis revealed that the high-CTRGPS group exhibited a lower stromal score and tumor purity but higher immune score and ESTIMATE score compared to the low-CTRGPS group (Fig. 7I-L). These findings collectively underscore the pivotal role of CTRGPS in shaping the tumor immune microenvironment in BC.
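The inverse relationship between ESTIMATE score and tumor purity follows from how the ESTIMATE method [29] defines them: the ESTIMATE score is the sum of the stromal and immune scores, and purity is a decreasing cosine function of that sum. The sketch below uses the constants from the ESTIMATE publication, which were fitted on Affymetrix expression data; the function names and sample values are hypothetical.

```python
import math

def estimate_score(stromal_score, immune_score):
    """ESTIMATE score as the sum of stromal and immune scores."""
    return stromal_score + immune_score

def tumor_purity(est_score):
    """Purity estimate from the ESTIMATE score (constants from ref. 29)."""
    return math.cos(0.6049872018 + 0.0001467884 * est_score)

# Hypothetical toy samples (assumption): (stromal score, immune score)
low_risk_sample = (-300.0, 400.0)
high_risk_sample = (-500.0, 1500.0)

purity_low = tumor_purity(estimate_score(*low_risk_sample))
purity_high = tumor_purity(estimate_score(*high_risk_sample))
print(round(purity_low, 3), round(purity_high, 3))
```

In this toy case the second sample has a lower stromal score but a much higher immune score, so its ESTIMATE score is higher and its estimated purity is lower, the same pattern reported for the high-CTRGPS group.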

Drug sensitivity analysis of CTRGPS groups

To explore the implications of CTRGPS for precision therapy and personalized drug selection, we estimated the sensitivities of 16 common chemotherapeutic agents from the GDSC database. Drug sensitivity differed significantly between the low- and high-CTRGPS groups, with the high-CTRGPS group showing lower sensitivity to all 16 drugs (Fig. 8A). CTRGPS was significantly associated with 15 of these agents, the exception being epirubicin, and the associations were notable for drugs such as cisplatin, cyclophosphamide, paclitaxel, and teniposide (Fig. 8B). These findings suggest that CTRGPS could help guide drug selection for BC patients. Moreover, in the univariate analysis (Fig. 8C), TTK had the highest hazard ratio among the six genes, marking it as the gene most strongly associated with poor prognosis in BC patients. We therefore divided BC patients into high- and low-TTK groups using median TTK expression as the cutoff and compared their sensitivity to chemotherapy and targeted drugs based on half-maximal inhibitory concentration (IC50) values.

Fig. 8

Drug sensitivity analysis of CTRGPS group. (A) Box plot comparing the sensitivity (IC50) of 16 common chemotherapy drugs in clinical practice between high- and low-CTRGPS. (B) Butterfly diagram illustrating the correlation between drugs and CTRGPS. (C) Univariate analysis of six genes. (D) Box plot comparing the sensitivity (IC50) of 16 common chemotherapy drugs in clinical practice between low- and high-TTK groups

Sensitivity differed significantly for 14 drugs, with higher TTK mRNA levels associated with lower drug sensitivity (Fig. 8D). This suggests that TTK is a potential therapeutic target for impeding BC progression and highlights its relevance to personalized treatment strategies.
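The median-split comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: samples exactly at the median are assigned to the low group, the groups are compared with a Mann-Whitney U statistic (the test used in the study is not named in this section), and all sample identifiers and values are hypothetical. A higher IC50 means lower drug sensitivity.

```python
from statistics import median

def split_by_median(expression):
    """Label each sample 'high' or 'low' using median expression as the cutoff."""
    cutoff = median(expression.values())
    return {s: ("high" if v > cutoff else "low") for s, v in expression.items()}

def mann_whitney_u(a, b):
    """U statistic for group a: count of (a, b) pairs with a > b, ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical toy data (assumption): TTK expression and predicted IC50 per sample
ttk = {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0, "s5": 5.0, "s6": 6.0}
ic50 = {"s1": 2.1, "s2": 2.4, "s3": 3.0, "s4": 3.2, "s5": 3.9, "s6": 2.8}

groups = split_by_median(ttk)
high_ic50 = [ic50[s] for s, g in groups.items() if g == "high"]
low_ic50 = [ic50[s] for s, g in groups.items() if g == "low"]

u = mann_whitney_u(high_ic50, low_ic50)
print(u, "of", len(high_ic50) * len(low_ic50), "pairs favour higher IC50 in the high group")
```

A U value near the total number of pairs indicates that IC50 values tend to be higher in the high-TTK group; in practice a p-value would be obtained from a statistics library rather than from the raw count.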

TTK promotes malignant proliferation of BC cells

We began investigating the role of TTK in BC progression by analyzing its mRNA levels across multiple datasets. TTK mRNA levels were elevated in tumor tissues compared with normal tissues, with particularly pronounced upregulation in triple-negative breast cancer (TNBC) (Fig. 9A). We then used the CCLE dataset to obtain mRNA expression data for BC cell lines; TTK was highly expressed in most BC cell lines, especially TNBC cells (Fig. 9B). Immunohistochemistry (IHC) results from the HPA database provided further support, showing a significant upregulation of TTK protein levels in breast tumor tissue (Fig. 9C). To examine the effect of TTK on BC cell proliferation and migration, we transfected two siRNAs (siTTK-1 and siTTK-2) into MDA-MB-231 cells, which notably reduced TTK protein levels (Fig. 9D). siTTK-1 was then chosen for further experiments, including cell viability and scratch wound healing assays. TTK knockdown significantly decreased cell viability (Fig. 9E) and impaired the migration of MDA-MB-231 cells (Fig. 9F, G). These findings indicate that TTK promotes the malignant proliferation and migration of BC cells and suggest its potential as a therapeutic target for inhibiting BC progression.

Fig. 9

Role of TTK in promoting malignant proliferation of breast cancer cells. (A) Expression levels of TTK in normal tissue and various breast cancer subtypes in the TCGA, METABRIC, and GSE162228 datasets. (B) Distribution of mRNA expression across different breast cancer cell lines. (C) Distribution of TTK protein expression in normal breast and tumor tissues from the HPA database. (D) Efficiency of TTK knockdown in MDA-MB-231 cells. (E-G) Results of the CCK-8 cell viability assay and the wound-healing assay, respectively. Statistical significance is denoted as *P < 0.05, **P < 0.01, and ***P < 0.005. NC, negative control

Discussion

CD8 + T cells play a pivotal role in the immune response against breast cancer (BC), orchestrating cytotoxic effects on tumor cells and modulating immune function [34, 35]. The advent of immunotherapy, particularly immune checkpoint inhibitors and CAR-T cell therapies, has revolutionized BC treatment, offering patients novel therapeutic avenues and renewed hope [36, 37]. However, challenges persist in accurately predicting prognosis and optimizing treatment timing, highlighting the critical need for novel biomarkers such as CD8 + T-cell-related gene signatures [38].

Precision medicine in breast cancer prognosis faces challenges due to the inadequacy of conventional indicators like tumor grading and size, which may not accurately predict prognosis or guide optimal treatment timing [39]. Existing prognostic models often suffer from subjective algorithm selection and lack of validation across diverse datasets, leading to suboptimal performance or overfitting [40]. To overcome these limitations, we employed ten commonly used machine learning algorithms to construct 101 models, aiming to enhance prognostic accuracy while minimizing overfitting. Our findings highlight the superiority of the combined CoxBoost and Enet (alpha = 0.2) model in breast cancer prognosis, offering enhanced translational capabilities by simplifying the model and reducing variable dimensionality. Similarly, CTRGPS emerges as an independent risk factor surpassing traditional clinical indicators such as grading and tumor size. Furthermore, comparison with 52 previously published signatures underscores the superior performance of CTRGPS, indicating its potential for clinical applications.
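Comparing many candidate models across cohorts requires a single discrimination metric. The sketch below assumes Harrell's concordance index (C-index) as that metric, which is common in frameworks of this kind but is not stated in this section; the function name and all survival data are hypothetical toy values. A model would then be ranked by its average C-index over the validation cohorts.

```python
from statistics import mean

def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs where the higher-risk sample fails first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when sample i has an observed event before time j
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy cohort (assumption): follow-up times and event flags (1 = death)
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]

c_good = concordance_index(times, events, [2.0, 1.5, 1.0, 0.3])
c_mixed = concordance_index(times, events, [2.0, 0.2, 1.0, 0.3])

# Averaging over cohorts (here, two scorings of one toy cohort) gives one summary value
average_c = mean([c_good, c_mixed])
print(c_good, c_mixed, average_c)
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so averaging over independent cohorts rewards models that generalize rather than models that fit a single training set well.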

Enrichment analysis of CTRGPS-related genes revealed enrichment in various cellular processes, cancer-related signaling pathways, and biological systems. Particularly noteworthy were the significant correlations between the high CTRGPS group and biological processes associated with cell cycle dysregulation and tumor malignant proliferation. These findings provide insights into the underlying mechanisms contributing to the poor prognosis observed in patients with high CTRGPS scores.

A growing body of evidence underscores the pivotal role of the tumor immune microenvironment in breast cancer (BC) prognosis and response to immunotherapy [41, 42]. Interestingly, our study revealed that BC patients in the high CTRGPS group exhibited abundant immune cell infiltration, including T cells, Th cells, MDSCs, B cells, DCs, macrophage M1, and Treg cells, which are known to modulate anti-tumor or pro-tumor immune responses in the context of immunotherapy [43,44,45,46]. However, despite this immune cell infiltration, patients in the high CTRGPS group experienced a worse prognosis, contrary to conventional understanding. This paradoxical outcome may be attributed to immune exhaustion or the production of immunosuppressive factors. Prolonged and sustained activation of highly infiltrated immune cells in the tumor microenvironment may lead to gradual loss of immune cell function, impairing their ability to recognize and eliminate tumor cells, ultimately diminishing treatment efficacy and prognosis [46]. Additionally, tumor cells and their surrounding immune cells produce inhibitory molecules or cytokines, such as IL-10 and TGF-β released by tumor-associated macrophages [47], which further suppress immune cell function and contribute to poorer prognosis in patients with higher immune cell infiltration. Meanwhile, hub genes in CTRGPS also have potential roles in immune cells surrounding tumors. Studies have shown that TTK plays an important role in the reprogramming of the immune microenvironment of TNBC. Inhibiting TTK can effectively induce the STING signaling pathway and promote anti-tumor immunity through the infiltration and activation of CD8 + T cells [48]. In other cancers, inhibiting TTK can also promote immunotherapy response by activating the tumor STING signaling pathway [49]. In addition, DSC2 is closely associated with immune cell infiltration in a variety of cancers, including osteosarcoma and bladder cancer [50, 51]. 
DSC2 significantly affects the tumor immune microenvironment through oxidative stress, and its expression level is positively correlated with the infiltration of activated CD4 memory T cells, mast cells, and neutrophils in cutaneous melanoma [52]. In glioblastoma, reduction of PTP4A2 was found to significantly inhibit tumor growth and induce the tumor microenvironment (TME) to shift to an immunosuppressive state [53]. Meanwhile, the expression of VAV3 in NK cells and its unique function in triggering NK cytotoxicity suggest that it may play an important role in the tumor immune microenvironment [54]. Additionally, CIRBP activates T cells through a TLR4-dependent mechanism, which points to a novel way in which CIRBP participates in gynecological tumor progression through regulation of T cells [55]. Finally, this study also identified MOAP1 as a new CD8 + T cell-related prognostic gene. However, its specific function and mechanism in the immune microenvironment of BC require further research.

Moreover, the positive correlation between CTRGPS and classical immune checkpoint molecules, including IDO-1, CTLA-4, CD40, CD86, LAG-3, and IL2RA, suggests that BC patients with high CTRGPS scores may derive greater benefit from immunotherapy. Furthermore, our study evaluated the drug sensitivity of CTRGPS using the GDSC database, revealing significant correlations with all 15 clinically common drugs. This finding suggests potential for combination medication in BC patients and underscores the utility of CTRGPS as a reference for early identification of drug-sensitive BC patients. This research contributes to the development of biomarker-guided precision treatment plans, enabling the selection of optimal drugs and treatment intensity for individual BC patients. Additionally, our study delved into the potential molecular functions of TTK, a critical component of the spindle assembly checkpoint involved in mitosis, in the malignant progression of BC. These findings support TTK as a promising therapeutic target for BC, warranting further investigation into its molecular mechanisms in future studies.

As in a previous study [20], we first identified CD8 + T cell marker genes using single-cell sequencing datasets. CTRGPS was then developed from combinations of 10 machine learning algorithms [21, 56]. The distinction is that this study addressed overfitting to improve the robustness and generalization ability of the signature. In addition to conventional enrichment and immune microenvironment analyses [21, 57], we used the drug sensitivity associated with CTRGPS as a further functional validation to evaluate its potential to inform the treatment of BC patients. As in most similar studies [20, 56], we explored the function of a hub gene of the prognostic signature through molecular experiments. Unlike those studies, however, we first identified the gene most relevant to prognosis, TTK, by univariate analysis before performing the experiments. In summary, compared with other studies, we used more rigorous approaches to construct the prognostic signature and select hub genes, which preserved the overall logic of the study and the feasibility of the molecular experiments. Nevertheless, this study has limitations. First, the included cohorts differed in sequencing timing, size, and platform, although we applied correction procedures such as batch effect removal and normalization to mitigate these differences. Second, further investigation is needed to elucidate the molecular mechanisms by which CTRGPS genes affect tumor progression. Lastly, all included datasets came from single-center retrospective studies, so CTRGPS should be validated in prospective multicenter cohorts.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

  1. Zheng X, Ma H, Wang J, Huang M, Fu D, Qin L, et al. Energy metabolism pathways in breast cancer progression: the reprogramming, crosstalk, and potential therapeutic targets. Transl Oncol. 2022;26:101534.

  2. Yin Q, Ma H, Bamunuarachchi G, Zheng X, Ma Y. Long non-coding RNAs, cell cycle, and human breast Cancer. Hum Gene Ther. 2023;34(11–12):481–94.

  3. Harbeck N, Gnant M. Breast cancer. Lancet. 2017;389(10074):1134–50.

  4. Araujo AM, Abaurrea A, Azcoaga P, Lopez-Velazco JI, Manzano S, Rodriguez J et al. Stromal oncostatin M cytokine promotes breast cancer progression by reprogramming the tumor microenvironment. J Clin Invest. 2022;132(7).

  5. Nalio Ramos R, Missolo-Koussou Y, Gerber-Ferder Y, Bromley CP, Bugatti M, Nunez NG, et al. Tissue-resident FOLR2(+) macrophages associate with CD8(+) T cell infiltration in human breast cancer. Cell. 2022;185(7):1189–207. e25.

  6. Gajewski TF, Schreiber H, Fu YX. Innate and adaptive immune cells in the tumor microenvironment. Nat Immunol. 2013;14(10):1014–22.

  7. Virassamy B, Caramia F, Savas P, Sant S, Wang J, Christo SN, et al. Intratumoral CD8(+) T cells with a tissue-resident memory phenotype mediate local immunity and immune checkpoint responses in breast cancer. Cancer Cell. 2023;41(3):585–601. e8.

  8. Leclerc M, Voilin E, Gros G, Corgnac S, de Montpreville V, Validire P, et al. Regulation of antitumour CD8 T-cell immunity and checkpoint blockade immunotherapy by Neuropilin-1. Nat Commun. 2019;10(1):3345.

  9. Uhl LFK, Cai H, Oram SL, Mahale JN, MacLean AJ, Mazet JM, et al. Interferon-gamma couples CD8(+) T cell avidity and differentiation during infection. Nat Commun. 2023;14(1):6727.

  10. Zhang N, Bevan MJ. CD8(+) T cells: foot soldiers of the immune system. Immunity. 2011;35(2):161–8.

  11. Mami-Chouaib F, Blanc C, Corgnac S, Hans S, Malenica I, Granier C, et al. Resident memory T cells, critical components in tumor immunology. J Immunother Cancer. 2018;6(1):87.

  12. Trefny MP, Kirchhammer N, Auf der Maur P, Natoli M, Schmid D, Germann M, et al. Deletion of SNX9 alleviates CD8 T cell exhaustion for effective cellular cancer immunotherapy. Nat Commun. 2023;14(1):86.

  13. Lin F, Ke ZB, Xue YT, Chen JY, Cai H, Lin YZ, et al. A novel CD8(+) T cell-related gene signature for predicting the prognosis and immunotherapy efficacy in bladder cancer. Inflamm Res. 2023;72(8):1665–87.

  14. Hu C, Li T, Xu Y, Zhang X, Li F, Bai J, et al. CellMarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scRNA-seq data. Nucleic Acids Res. 2023;51(D1):D870–6.

  15. Aibar S, Gonzalez-Blas CB, Moerman T, Huynh-Thu VA, Imrichova H, Hulselmans G, et al. SCENIC: single-cell regulatory network inference and clustering. Nat Methods. 2017;14(11):1083–6.

  16. Gautier L, Cope L, Bolstad BM, Irizarry RA. Affy–analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004;20(3):307–15.

  17. Zheng X, Ma H, Dong Y, Fang M, Wang J, Xiong X, et al. Immune-related biomarkers predict the prognosis and immune response of breast cancer based on bioinformatic analysis and machine learning. Funct Integr Genomics. 2023;23(3):201.

  18. Liu Z, Liu L, Weng S, Guo C, Dang Q, Xu H, et al. Machine learning-based integration develops an immune-derived lncRNA signature for improving outcomes in colorectal cancer. Nat Commun. 2022;13(1):816.

  19. Liu Z, Guo C, Dang Q, Wang L, Liu L, Weng S, et al. Integrative analysis from multi-center studies identities a consensus machine learning-derived lncRNA signature for stage II/III colorectal cancer. EBioMedicine. 2022;75:103750.

  20. Zhu W, Zeng H, Huang J, Wu J, Wang Y, Wang Z, et al. Integrated machine learning identifies epithelial cell marker genes for improving outcomes and immunotherapy in prostate cancer. J Transl Med. 2023;21(1):782.

  21. Qin H, Abulaiti A, Maimaiti A, Abulaiti Z, Fan G, Aili Y, et al. Integrated machine learning survival framework develops a prognostic model based on inter-crosstalk definition of mitochondrial function and cell death patterns in a large multicenter cohort for lower-grade glioma. J Transl Med. 2023;21(1):588.

  22. Chu G, Ji X, Wang Y, Niu H. Integrated multiomics analysis and machine learning refine molecular subtypes and prognosis for muscle-invasive urothelial cancer. Mol Ther Nucleic Acids. 2023;33:110–26.

  23. Racle J, Gfeller D. EPIC: a Tool to Estimate the proportions of different cell types from bulk gene expression data. Methods Mol Biol. 2020;2120:233–48.

  24. Becht E, Giraldo NA, Lacroix L, Buttard B, Elarouci N, Petitprez F, et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 2016;17(1):218.

  25. Plattner C, Finotello F, Rieder D. Deconvoluting tumor-infiltrating immune cells from RNA-seq data using quanTIseq. Methods Enzymol. 2020;636:261–85.

  26. Barbie DA, Tamayo P, Boehm JS, Kim SY, Moody SE, Dunn IF, et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature. 2009;462(7269):108–12.

  27. Aran D, Hu Z, Butte AJ. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 2017;18(1):220.

  28. Li T, Fan J, Wang B, Traugh N, Chen Q, Liu JS, et al. TIMER: a web server for Comprehensive Analysis of Tumor-infiltrating Immune cells. Cancer Res. 2017;77(21):e108–10.

  29. Yoshihara K, Shahmoradgoli M, Martinez E, Vegesna R, Kim H, Torres-Garcia W, et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun. 2013;4:2612.

  30. Thorsson V, Gibbs DL, Brown SD, Wolf D, Bortone DS, Ou Yang TH, et al. The Immune Landscape of Cancer. Immunity. 2018;48(4):812–30. e14.

  31. Maeser D, Gruener RF, Huang RS. oncoPredict: an R package for predicting in vivo or cancer patient drug response and biomarkers from cell line screening data. Brief Bioinform. 2021;22(6).

  32. Fang WK, Liao LD, Li LY, Xie YM, Xu XE, Zhao WJ, et al. Down-regulated desmocollin-2 promotes cell aggressiveness through redistributing adherens junctions and activating beta-catenin signalling in oesophageal squamous cell carcinoma. J Pathol. 2013;231(2):257–70.

  33. Yin Q, Ma H, Dong Y, Zhang S, Wang J, Liang J, et al. The integration of multidisciplinary approaches revealed PTGES3 as a novel drug target for breast cancer treatment. J Transl Med. 2024;22(1):84.

  34. St Paul M, Ohashi PS. The roles of CD8(+) T cell subsets in Antitumor Immunity. Trends Cell Biol. 2020;30(9):695–704.

  35. Reina-Campos M, Scharping NE, Goldrath AW. CD8(+) T cell metabolism in infection and cancer. Nat Rev Immunol. 2021;21(11):718–38.

  36. Gaynor N, Crown J, Collins DM. Immune checkpoint inhibitors: key trials and an emerging role in breast cancer. Semin Cancer Biol. 2022;79:44–57.

  37. Xu N, Palmer DC, Robeson AC, Shou P, Bommiasamy H, Laurie SJ et al. STING agonist promotes CAR T cell trafficking and persistence in breast cancer. J Exp Med. 2021;218(2).

  38. Ye F, Dewanjee S, Li Y, Jha NK, Chen ZS, Kumar A, et al. Advancements in clinical aspects of targeted therapy and immunotherapy in breast cancer. Mol Cancer. 2023;22(1):105.

  39. Park YH, Lee SJ, Cho EY, Choi Y, Lee JE, Nam SJ, et al. Clinical relevance of TNM staging system according to breast cancer subtypes. Ann Oncol. 2011;22(7):1554–60.

  40. Zhang N, Zhang H, Wu W, Zhou R, Li S, Wang Z, et al. Machine learning-based identification of tumor-infiltrating immune cell-associated lncRNAs for improving outcomes and immunotherapy responses in patients with low-grade glioma. Theranostics. 2022;12(13):5931–48.

  41. Fu T, Dai LJ, Wu SY, Xiao Y, Ma D, Jiang YZ, et al. Spatial architecture of the immune microenvironment orchestrates tumor immunity and therapeutic response. J Hematol Oncol. 2021;14(1):98.

  42. Park J, Hsueh PC, Li Z, Ho PC. Microenvironment-driven metabolic adaptations guiding CD8(+) T cell anti-tumor immunity. Immunity. 2023;56(1):32–42.

  43. Borst J, Ahrends T, Babala N, Melief CJM, Kastenmuller W. CD4(+) T cell help in cancer immunology and immunotherapy. Nat Rev Immunol. 2018;18(10):635–47.

  44. Shimasaki N, Jain A, Campana D. NK cells for cancer immunotherapy. Nat Rev Drug Discov. 2020;19(3):200–18.

  45. Hollern DP, Xu N, Thennavan A, Glodowski C, Garcia-Recio S, Mott KR, et al. B cells and T Follicular Helper Cells Mediate Response to checkpoint inhibitors in high mutation Burden mouse models of breast Cancer. Cell. 2019;179(5):1191–206. e21.

  46. Wu Y, Yi M, Niu M, Mei Q, Wu K. Myeloid-derived suppressor cells: an emerging target for anticancer immunotherapy. Mol Cancer. 2022;21(1):184.

  47. Mirlekar B. Tumor promoting roles of IL-10, TGF-beta, IL-4, and IL-35: its implications in cancer immunotherapy. SAGE Open Med. 2022;10:20503121211069012.

  48. Hu X, Li G, Li S, Wang Q, Wang Y, Zhang P, et al. TTK inhibition activates STING signal and promotes anti-PD1 immunotherapy in breast cancer. Biochem Biophys Res Commun. 2024;694:149388.

  49. Bharti V, Kumar A, Wang Y, Roychowdhury N, de Lima Bellan D, Kassaye BB et al. TTK inhibitor OSU13 promotes immunotherapy responses by activating tumor STING. JCI Insight. 2024;9(15).

  50. Zeng J, Sun Y, Man Y, Tang H, Xie L, He M. Validation the role of desmocollin-2 in osteosarcoma based on single cell and bulk RNA seq and experimental analyses. J Cancer. 2023;14(14):2619–32.

  51. Fu Y, Sun S, Bi J, Kong C, Yin L. Construction and analysis of a ceRNA network and patterns of immune infiltration in bladder cancer. Transl Androl Urol. 2021;10(5):1939–55.

  52. Rong D, Su Y, Jia D, Zeng Z, Yang Y, Wei D, et al. Experimentally validated oxidative stress -associated prognostic signatures describe the immune landscape and predict the drug response and prognosis of SKCM. Front Immunol. 2024;15:1387316.

  53. Chouleur T, Emanuelli A, Souleyreau W, Derieppe MA, Leboucq T, Hardy S, et al. PTP4A2 promotes Glioblastoma Progression and Macrophage polarization under Microenvironmental pressure. Cancer Res Commun. 2024;4(7):1702–14.

  54. Cella M, Fujikawa K, Tassi I, Kim S, Latinis K, Nishi S, et al. Differential requirements for vav proteins in DAP10- and ITAM-mediated NK cell cytotoxicity. J Exp Med. 2004;200(6):817–23.

  55. Shang C, Li Y, Wu Z, Han Q, Zhu Y, He T, et al. The prognostic value of DNA Methylation, post-translational modifications and correlated with Immune infiltrates in gynecologic cancers. Pharmgenomics Pers Med. 2021;14:39–53.

  56. Zhang Y, Wang Y, Chen J, Xia Y, Huang Y. A programmed cell death-related model based on machine learning for predicting prognosis and immunotherapy responses in patients with lung adenocarcinoma. Front Immunol. 2023;14:1183230.

  57. Chen H, Yang W, Ji Z. Machine learning-based identification of tumor-infiltrating immune cell-associated model with appealing implications in improving prognosis and immunotherapy response in bladder cancer patients. Front Immunol. 2023;14:1171420.

Acknowledgements

Not applicable.

Funding

This work was supported by the National Natural Science Foundation of China (82302966), the Key Projects of Medical Science and Technology of Henan Province (SBGJ202102199), the Joint Fund of Henan Science and Technology Research (232103810048), the Key Project of Medical and Health Development Project (2302016 A), Fundamental Research Funds for the Henan University of Science and Technology (QNY) and Henan Science and Technology Research Plan and A-type Doctoral Talent Project of the Henan University of Science and Technology (XWZ).

Author information

Contributions

HM and XZ: Writing—original draft, Writing—review and editing, Bioinformatic analysis. HM and JZ: Bioinformatic analysis, Molecular experiments. SZ, ST and ZQ: Writing—review and editing, statistical analysis. YC: Revising—Bioinformatic analysis. LS and LZ: Writing—review and editing. XX: Ethical approval. XZ and QY: Visualization, Supervision, Writing—original draft, Writing—review and editing. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Xuewei Zheng or Qinan Yin.

Ethics declarations

Ethics approval and consent to participate

The study protocol was performed in accordance with the Declaration of Helsinki and approved by the Medical Research Ethics Committee of the First Affiliated Hospital of Nanchang University [(2023) CDYFYYLK (04–036)]. No patient was involved in this study.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Ma, H., Shi, L., Zheng, J. et al. Advanced machine learning unveils CD8 + T cell genetic markers enhancing prognosis and immunotherapy efficacy in breast cancer. BMC Cancer 24, 1222 (2024). https://doi.org/10.1186/s12885-024-12952-w

  • DOI: https://doi.org/10.1186/s12885-024-12952-w

Keywords