Skip to main content

Prediction of prognostic signatures in triple-negative breast cancer based on the differential expression analysis via NanoString nCounter immune panel

Abstract

Background

Triple-Negative Breast Cancer (TNBC) is an aggressive and complex subtype of breast cancer. The current biomarkers used in the context of breast cancer treatment are highly dependent on the targeting of oestrogen receptor, progesterone receptor, or HER2, resulting in treatment failure and disease recurrence and creating clinical challenges. Thus, there is still a crucial need for the improvement of TNBC treatment; the discovery of effective biomarkers that can be easily translated to the clinics is essential.

Methods

We report an approach for the discovery of biomarkers that can predict tumour relapse and pathologic complete response (pCR) in TNBC on the basis of mRNA expression quantified using the NanoString nCounter Immunology Panel. To overcome the limited sample size, prediction models based on random Forest were constructed using the differentially expressed genes (DEGs) as selected features. We also evaluated the differences between pre- and post-treatment groups aiming for the combinatorial assessment of pCR and relapse using additive models in edgeR.

Results

We identify nine and 13 DEGs strongly associated with pCR and relapse, respectively, from 579 immune genes in a small number of samples (n = 55) using edgeR. An additive model for the comparison of pre- and post-treatment groups via the adjustment of the independent subject in the relapse group revealed associations for 41 genes. Comprehensive analysis indicated that our prediction models outperformed those constructed using features extracted from the existing feature selection model Elastic Net in terms of accuracy. The prediction models were assessed using a randomization test to validate the robustness (empirical P for the model of pCR = 0.015 and empirical P for the model of relapse = 0.018). Furthermore, three DEGs (FCER1A, EDNRB, and TGFBI) in the model of relapse showed prognostic significance for predicting the survival of patients with cancer through Cox proportional hazards regression model-based survival analysis.

Conclusion

Gene expression quantified via the NanoString nCounter Immunology Panel can be seamlessly analysed using edgeR, even considering small sample sizes. Our approach provides a scalable framework that can easily be applied for the discovery of biomarkers based on the NanoString nCounter Immunology Panel.

Data availability

The source code will be available from github at https://github.com/sungheep/nanostring.

Peer Review reports

Background

The subtypes of breast cancer have distinct pathological features and clinical implications and primarily include hormone receptor-positive breast cancer, HER2-positive breast cancer, and triple-negative breast cancer (TNBC). Of note, breast cancer classification depends on protein or gene expression profiling; importantly, it provides helpful information for prognosis establishment and adoption of treatment strategies. Breast cancer therapy involves drugs that target oestrogen, progesterone, and HER2 receptors expressed on hormone receptor-positive and HER2-positive breast cancer cells, respectively [1]. However, TNBC does not respond to these therapies, including tamoxifen or trastuzumab, as no specific receptors are expressed in TNBC. TNBC is characterized by its invasiveness, widespread metastasis, and high post-treatment relapse rates, although many studies have attempted to predict the aetiology, response to treatment, and prognosis of TNBC [2]. In this study, we aimed to identify prognostic biomarkers for TNBC to facilitate improvements in the current treatment approaches.

The NanoString nCounter Analysis System is composed of a prep station and a digital analyser and is used to quantify gene expression levels and chromosome variations. This system identifies target genes using specific 100-mer probes and simultaneously analyses 800 genes. A key advantage of the NanoString technology with respect to next-generation sequencing (NGS) is the absence of an amplification step and the ability to directly quantify target molecules, thus preventing artificial amplification bias.

Since 2014, numerous studies have attempted to analyse TNBC using a NanoString nCounter Immunology Panel [3,4,5,6]. Most expression analyses studies using NanoString nCounter data [7] focused on statistical analyses and clustering analyses with gene heat maps similar to those used in microarray data analysis. Such statistical analyses included the Mann-Whitney U test [5, 6, 8, 9], the t-test, or the analysis of variance [3, 6], as well as the Fisher’s exact test [5, 9], Spearman’s correlation [5, 10], and negative binomial distribution [8]. Using the NanoString nCounter Immunology Panel, gene expression can be quantified as the counts measured in a manner similar to that used to quantify expression on the NGS platform; the statistical methods generally used are also used in microarray data analysis.

We developed an approach for biomarker discovery which predicts relapse and pathological complete responses (pCR) after neoadjuvant chemotherapy in TNBC as per learning prediction models using random Forest with features selected via the analysing of differential gene expression using edgeR. The data was obtained using the NanoString nCounter Immunology Panel. This study takes advantage of the prognostic model for predicting tumour relapse and pCR with a small sample size via the application of edgeR to assess differential gene expression, which is suitable for nCounter Immunology Panel analysis in feature selection.

Methods

Paraffin-embedded tissue biopsy samples from 55 TNBC patients treated with anthracycline and taxane-based neoadjuvant chemotherapy (or surgery) from 2010 to 2012 at the Asan Medical Centre. The study was approved by the Institutional Review Board of the Asan Medical Center (IRB No. 2013–0866). We used whole sections of biopsy tissues with usually 4–5 cores. Importantly, when we cut the FFPE blocks, we used different blades for each sample after cleaning the microtome with 70% ethanol to prevent cross-contamination. So, our experiment is free from cross-contamination. The samples were processed using the GX Human Immunology V2kit (NanoString Technologies, Seattle, WA, USA) for NanoString nCounter Gene expression analysis of a total of 579 immunology-related human genes [5]. Among the 55 cases, a NanoString nCounter assay was performed using specimens from 14 patients, including 6 cases of pCR, after treatment. The clinical data of the patients (Additional file 5: Table S1), including the survival time, survival parameters, and chemotherapeutic responsiveness (pCR, residual cancer burden, Miller Payne grade), were collected. Thereafter, a two-row count matrix was constructed; one row showed the 55 samples and the other showed the 14 samples collected after treatment.

We have subjected the same samples from 56 patients in our data set to anti-CD3, CD8, and -CD20 immunohistochemistry (IHC) staining. Correlations between IHC staining intensity and gene expression levels were then assessed using the NanoString nCounter platform and are summarized (Additional file 5: Table S2). Spearman correlation ranged from 0.623–0.761. We analysed the spearman correlation between the expression of five genes (CD3D, CD3E, CD8A, CD8B, and CD20) quantified using the NanoString nCounter platform and the immunohistochemistry staining results for CD3+, CD8+, and CD20+ cells. Of note, in the CD3+, CD8+, and CD20+ cells, the intensity of immunohistochemistry stainings was highly correlated with the expression of T and B cell markers. With respect to this observation, gene expression quantified via the NanoString nCounter platform overall highly correlated with IHC stainings.

Feature selection

Our cohort contained a small number of samples (n = 55). Feature selection methods including the Elastic Net (EN) may not guarantee high accuracy in the context of small data sets. The NanoString nCounter platform allows a count format, unlike microarray analysis, for quantifying gene expression on an NGS platform without amplification. We used edgeR, which identifies differentially expressed genes (DEGs) under different conditions, using a negative binomial statistical model fitted to these observed counts [11]. EdgeR can identify DEGs in a relatively small number of samples (n ≥ 2).

DEGs identified by edgeR were considered as features to develop prediction models and for survival analysis (Fig. 1). Functional annotations were performed using DAVID to evaluate the significant signature genes identified via the edgeR analysis.

Fig. 1
figure1

A Work flow diagram

Feature selection using edgeR

We converted raw count data to log counts per million, normalized the counts using the trimmed mean of M-values in edgeR, and estimated the dispersions. DEGs were analysed to select features useful in prediction models of pCR and relapse. After setting the design matrix (e.g., pCR = 1 vs non-pCR = 0, conditions for the pCR model, and relapse = 1 and non-relapse = 0 for the relapse model), significant DEGs were identified at a false discovery rate (FDR) < 0.05.

To identify DEGs using edgeR for comparisons pre- and post-treatment (PRE and POST, respectively), we first divided samples from biopsies performed prior to neoadjuvant chemotherapy (or operation specimens) and those collected after neoadjuvant chemotherapy. Fourteen paired samples (pre- and post-treatment) were stratified based on relapse and pCR (Additional file 1: Figure S1). We conducted three DEG tests to evaluate combinations of disease status (relapse and pCR) and treatments (e.g., PRE and POST). The three combinations were POST and pCR, POST and relapse, and PRE/POST and relapse. For a DEG test, an additive model was designed and applied to detect genes differentially expressed in samples obtained at POST and PRE, adjusting for the baseline difference between patients. In the combination POST and relapse, two samples, NS010 in the non-relapse group and NS032 in the relapse group, were omitted because of heterogeneity in their multidimensional scaling plot (Additional file 1: Figure S2). After the design matrix was constructed, normalization was performed and dispersions were estimated. Significant DEGs were identified at an FDR < 0.05.

Feature selection using lasso and elastic net

We compared the performance of our approach to that of EN to determine whether DEG selection using edgeR in our method was more appropriate for the selection of features for prediction models. We used “glmnet” from the R package to implement EN.

Prediction model

Data pre-processing

Global normalization was performed by dividing the raw count data into sequencing depths, and normalization was carried out considering relatively different gene lengths. We subsequently stratified the dataset into a ratio of 7: 3 for the training and test sets, with pCR and relapse groups as class labels. Datasets for each pCR and relapse set are shown in Table 1.

Table 1 Dataset

Prediction using the random Forest models

Considering the DEGs in pCR and relapse sets, we trained random Forest models to predict pCR and relapse. One limitation of this classification is that greater sample imbalance leads to more biased results in larger sample groups [12]. To compensate for this problem, we performed oversampling using SMOTE. After oversampling, the small group was matched at a ratio of 1:1 to the large group, and the random Forest model was trained. We used sklearn’s GridSearchCV in the hyperparameter optimization process to optimise model performance. Five hyperparameters (n_estimator, max_feature, max_depth, min_samples_split, and min_samples_leaf) were used among several hyperparameters to maximize model performance, and a 5-fold cross-validation was performed to avoid overfitting caused by the small number of samples.

Evaluation

To evaluate the performance of the predictive models, we used AUC metrics determined via receiver operating characteristic analysis. In addition, the positive prediction value (PPV: TP/TP + FP) was used, which is widely employed to assess the performance of a diagnostic test.

Randomization test

Hypothesis tests were conducted to determine whether the accuracy of the pCR and relapse prediction models resulted from chance events. Empirical P values were calculated as per 1000 permutations using the same number of features and same hyperparameter optimization used in our prediction models and rerunning a prediction model within each permuted dataset. The AUC and PPV of a prediction model for a permuted dataset were assessed with respect to their accuracy.

Survival analysis using cox proportional hazard model

We used the Cox proportional hazard model (Cox proportional hazards regression model), which is a survival analysis model for multiple variables, using all genes in the prediction models [13]. For prognostic prediction, significant genes (P < 0.05) were identified as per survival analysis of the Cox proportional hazards regression models.

Functional enrichment analysis

Functional analysis was performed for DEGs extracted via edgeR using DAVID (Database for Annotation, Visualisation and Integrated Discovery, http://david.abcc.ncifcrf.gov).

Protein interactions using STRING

To analyse the pathological significance of breast cancers of feature-derived DEGs, network analysis was performed with STRING (biological database and visualization for network analysis); https://string-db.org/cgi/input.pl?sessionId=Sdm7S6Tqzlf4&input_page_show_search=on

Results

The overall workflow of our study is shown in Fig. 1. The analysis of differentially expressed genes (DEGs) using edgeR for feature selection, the construction of prediction models, and the survival analysis are depicted. Functional analysis was performed using DAVID for signature genes, while the biological interpretation was carried out through a literature survey.

Feature selection

Differential gene expression analysis was performed using edgeR for pCR and relapse conditions. Nine and 13 DEGs associated with pCR and relapse, respectively, were identified (Table 2). We also analysed the differences between pre- and post-treatment paired samples upon combinatorial assessment of pCR and relapse. PRE refers to biopsies performed prior to neoadjuvant chemotherapy (n = 55), whereas POST indicates operations performed after neoadjuvant chemotherapy (n = 14, including 6 pCR cases in Additional file 1: Figure S1). We evaluated multiple experimental factors. Three tests were performed to evaluate the combinations of disease status (relapse and pCR) and treatments (e.g., PRE or POST); i.e., comparison of a POST and pCR group with a POST and relapse group, comparison of a POST and relapse group with a POST and non-relapse group, and comparison of POST and PRE relapse groups, which revealed significant DEGs. One (e.g., KLRG2) and three (HLA-DQA1, HLA-DQB1, and CEACAM6) genes were significantly associated with the combination of POST treatment and pCR and relapse groups, respectively (Table 3). An additive model for the comparison of PRE to POSTgroups by adjusting the independent subject in the relapse group revealed associations for 41 genes (Additional file 1: Table S1). KLRG2 and CEACAM6 were included as DEGs in the PRE and POST comparison in the relapse group. Differences in these DEGs between PRE and POST groups were much greater than those between the relapse and non-relapse groups (Additional file 2: Figure S1).

Table 2 DEGs in pCR and relapse groups
Table 3 DEGs in combinations of pCR and relapse with POST treatment

Prediction of pCR and relapse

The performance of our predictive models for pCR and relapse, based on the random Forest method, showed an area under the curve (AUC) of 0.84 and a positive predictive value (PPV) of 0.7 for the pCR predictive model (Table 4). For the relapse predictive model, the AUC was 0.88 and PPV was 0.69 (Table 4). We compared the performance of these models to those of the classic feature selection method, Elastic Net (EN) (Fig. 2 and Table 4). The predictive models based on EN for pCR showed an AUC and PPV of 0.64 and 0, respectively, whereas our pCR analysis predicted these values to be 0.84 and 0.7, respectively. The PPV of our pCR model was not comparable to that of EN; the performance was low for our immune panel data and did not control the false-positive rate. The relapse model based on EN predicted an AUC and PPV of 0.68 and 0.23, respectively. Comprehensive analysis indicated that our model outperformed the EN with respect to the type I error rate (Table 4). Prediction models involving combinatorial assessment of PRE and POST groups for pCR and relapse models (Table 3) were not constructed, as the sample size of POST is limited.

Table 4 Performance comparison with our model and Elastic Net model
Fig. 2
figure2

Comparison of our prediction models with EN-based prediction models. a AUCs between ours and the EN-based prediction model of pCR. Blue represents our RF model of pCR constructed on the features selected using edgeR, which is more robust than that the model represented in green, based on features selected using EN. b AUCs between ours and EN-based prediction model of relapse. Blue represents our RF model of relapse constructed on the features selected using edgeR

We performed a randomization analysis to investigate whether the significant DEGs in the prediction models were identified by chance. For pCR analysis based on random feature selection, 15 cases met our cut-off for accuracy (empirical P = 0.015) to reject the null hypothesis. The models were superior to our prediction model of pCR with respect to the AUC (> 0.84) and PPV (> 0.7) (Fig. 3). For relapse, 18 cases (P = 0.018) displayed a superior AUC (> 0.88) and PPV (> 0.69) than that of our prediction model of relapse (Fig. 4).

Fig. 3
figure3

A randomization test for pCR. Plots present the distributions of the AUC and PPV values for all 1000 permutations. Histograms represent (a) AUCs and (b) PPV of permutations. Red represents AUC or PPV of permutations below (Table 4) whilst green represents permutations superior to our thresholds

Fig. 4
figure4

A randomization test for relapse. Plots present the distribution of the AUC and PPV values for all 1000 permutations. Histograms represent (a) AUC and (b) PPV of permutations. Red represents AUC or PPV below (Table 4) whilst green represents permutations superior to our thresholds

Survival analysis

We used the Cox proportional hazards regression model to evaluate if the nine and 13 genes in the prediction models of pCR and relapse affected patient survival to determine their value as prognostic markers. In the pCR survival plot (Fig. 5a), a significant difference was observed between pCR and non-pCR patients during the first 4 years. In contrast to pCR survival analysis, recurrence-free survival was observed among patients showing expression changes in 13 DEGs, with very poor survival during the first 4 years (Fig. 5b). TNBC relapse was predicted to be 80% in earlier years, decreasing to less than 10% at 4 years, indicating that the DEGs are associated with a high risk of TNBC relapse during the first 4 years. The prognostic impact of these DEGs was constant after exceeding this point.

Fig. 5
figure5

Survival analysis using the Cox proportional hazards regression model. a Stands for the cox model built on pCR genes. Red indicates pCR and blue non-pCR. b Stands for the cox model built on relapse genes. Red represents relapse and blue non-relapse

Two significant genes (FCER1A and CD1A) in the Cox proportional hazard regression model (Table 5) were downregulated in the pCR group (Fig. 6). The violin plots for the remaining genes in the pCR model are shown in Additional file 3. In the relapse group, three genes (i.e. FCER1A, EDNRB, and TGFBI) were significantly up-regulated (P < 0.05; Table 6 and Fig. 7). The violin plots for the remaining 10 genes in the relapse model are shown in Additional file 4.

Table 5 Significant genes in overall survival analysis for the pCR model
Fig. 6
figure6

Violin plots for significant genes in the Cox model of pCR. a and b represent CD1A and FCER1A, respectively. CD1A, and FCER1A have high expression in non-pCR

Table 6 Significant genes in disease free survival analysis for the relapse model
Fig. 7
figure7

Violin plots for significant genes in the Cox model of relapse. Violin plots for three significant genes in the Cox proportional hazards regression model. a Corresponds to EDNRB with high expression at relapse, b corresponds to FCER1A with high expression at relapse, and c corresponds to TGFBI with high expression at relapse

Interpretation of functional annotations and network analysis

Significant DEGs were evaluated via functional enrichment analysis using the DAVID functional annotation tool. The results of functional annotation are summarised in Table 7 (e.g., nine genes for pCR and 13 genes for relapse) and showed that DEGs for pCR and relapse had similar functions. Both were enriched in cytokine-cytokine interactions. As indicated by KEGG pathway annotation, cytokine-cytokine receptor interactions play an important role in cell proliferation and differentiation, survival, and pathogen resistance; the release of cytokines in response to infection, inflammation, and immunity can inhibit tumorigenesis and cancer progression [9].

Table 7 Functional enrichment analysis for pCR and RELAPSE models

Network analysis was performed using the STRING database. Gene interactions were determined using the STRING database through experiments, text mining, and gene fusions. Among the DEGs in the pCR group, CCL5, CCL20, CD1A, and IL2RA interact with CCR5 and CCR6, alias CMKBR6 (Fig. 8). Gene-gene interactions were detected in the relapse group (Fig. 9); CCL5, GZMB, IL2RA, SELE, and CCL8 interacted with CCR5 in both the pCR and relapse groups and were associated with tumour progression [14], and metastasis [10]. In patients with breast cancer, CCR5 and its ligand CCL5 were found to be upregulated among DEGs in the pCR group [10]. Furthermore, CCR5 is a novel therapeutic target for metastatic cancer, and recent clinical trials have targeted this gene in breast and colon cancer.

Fig. 8
figure8

An interaction network for genes in the pCR model. The interaction network is derived from STRING for the nine significant genes in the pCR model. FCER1A did not form any interaction as per the STRING Database. Red represents interactions derived from gene fusion, purple from experiments, and yellow from text mining

Fig. 9
figure9

An interaction network for genes in the relapse model. Of 13 genes in the relapse model, eight genes formed interactions. Five genes on the top left corner did not form any interaction as per the STRING database. The interaction network is derived from STRING for the nine significant genes. Red represents interactions derived from gene fusion, purple from experiments, and yellow from text mining

CCR6 interacted with CCL20 in the pCR group and is a receptor of CCL20; overexpression of CCL20 augments mitogen-activated protein kinase and protein kinase C signalling, resulting in tumour progression [15,16,17]. Significant enrichment in molecular functions determined by Gene Ontology analysis of genes and their interaction partners in the STRING database are summarised in Table 8.

Table 8 Molecular function enrichment of genes in interaction networks

Discussion

In this study, we developed a novel biomarker discovery approach in the context of pCR and relapse in TNBC. We used edgeR for feature selection from the NanoString nCounter Immunology Panel and constructed prediction models for pCR and relapse for TNBC based on selected features using the random Forest method. Moreover, we verified the gene signatures of pCR and relapse prediction models for TNBC treatment through a literature survey.

Two significant genes (CD1a and FCER1A) related with the survival outcome in the pCR prediction model have been reported as conventional dendritic cell markers and are highly expressed in innate antigen-presenting cells infiltrating breast cancer tissues [18], which is consistent with our findings (Fig. 6). We found that low expression of conventional dendritic cell markers (CD1a and FCER1A) was associated with pCR, potentially affecting the overall patient survival in TNBC; however, a previous study [19] reported no significant association between the levels of CD1a + tumour-infiltrating dendritic cells and pCR in either the primary tumours or axillary lymph node metastasis. There has been no previous study evaluating the effects of CD1a + in dendritic cells in the context of breast cancer survival (in patients receiving neoadjuvant chemotherapy).

Of the three significant genes (FCER1A, EDNRB, and TGFBI) in the relapse prediction model, EDNRB is located on chromosome 13 and encodes a G protein-coupled receptor. EDNRB downregulation can prevent TNBC progression and may be a biomarker candidate for TNBC treatment efficacy prediction [20]. It has been reported that TGFBI is associated with both breast cancer inhibition [21] and progression [22]. A recent study revealed that TGFBI was upregulated in treatment-resistant TNBC MDA-MB-231/IR cells and is involved in drug and radiation resistance [23]. No previous study has reported that TGFBI is associated with recurrence-free survival of patients with TNBC. In total, three genes were considered as targeted diagnostic and treatment biomarkers for TNBC. Nevertheless, additional experimental validation of these genes as new drug targets is required.

The cost of NGS-based approaches for immune profiling remains high, despite the high demand for this method. This may result in the analysis of limited sample sizes. Most machine learning methods for feature selection are suitable for large datasets or show poor performance for small sample sizes.

Conclusions

We described a practical approach for analysing transcriptome data generated using the NanoString nCounter Immunology platform with a matrix similar to that of an NGS platform, which currently involves statistical methods based on microarray analysis. This study provides a framework for transcriptome analysis in NGS, which can be applied to data obtained using the NanoString nCounter Immunology Panel.

Availability of data and materials

The dataset supporting the conclusions of this article is available from the Gene Expression Omnibus - accession ID GSE: GSE143222.

References

  1. 1.

    Fragomeni SM, Sciallis A, Jeruss JS. Molecular Subtypes and Local-Regional Control of Breast Cancer; 2018.

    Google Scholar 

  2. 2.

    Fleisher B, Clarke C, Ait-Oudhia S. Current advances in biomarkers for targeted therapy in triple-negative breast cancer. Breast Cancer: Targets and Therapy. 2016.

    Google Scholar 

  3. 3.

    Li X, Warren S, Pelekanou V, Wali V, Cesano A, Liu M, et al. Immune profiling of pre- and post-treatment breast cancer tissues from the SWOG S0800 neoadjuvant trial. J Immunother Cancer. 2019;7(1):88.

    PubMed  PubMed Central  Google Scholar 

  4. 4.

    Liu MC, Pitcher BN, Mardis ER, Davies SR, Friedman PN, Snider JE, et al. PAM50 gene signatures and breast cancer prognosis with adjuvant anthracycline-and taxane-based chemotherapy: Correlative analysis of C9741 (alliance). Npj Breast Cancer. 2016;2:15023.

    CAS  PubMed  PubMed Central  Google Scholar 

  5. 5.

    Lee HJ, Lee JJ, Song IH, Park IA, Kang J, Yu JH, et al. Prognostic and predictive value of NanoString-based immune-related gene signatures in a neoadjuvant setting of triple-negative breast cancer: relationship to tumor-infiltrating lymphocytes. Breast Cancer Res Treat. 2015;151(3):619–27.

    CAS  Google Scholar 

  6. 6.

    Foulds GA, Vadakekolathu J, Abdel-Fatah TMA, Nagarajan D, Reeder S, Johnson C, et al. Immune-phenotyping and transcriptomic profiling of peripheral blood mononuclear cells from patients with breast cancer: identification of a 3 gene signature which predicts relapse of triple negative breast cancer. Front Immunol. 2018;9:2028.

    PubMed  PubMed Central  Google Scholar 

  7. 7.

    Prat A, Lluch A, Albanell J, Barry WT, Fan C, Chacón JI, et al. Predicting response and survival in chemotherapy-treated triple-negative breast cancer. Br J Cancer. 2014;111(8):1532–41.

    CAS  PubMed  PubMed Central  Google Scholar 

  8. 8.

    Waggott D, Chu K, Yin S, Wouters BG, Liu FF, Boutros PC. NanoStringNorm: An extensible R package for the pre-processing of nanostring mRNA and miRNA data. Bioinformatics. 2012;28(11):1546–8.

    CAS  PubMed  PubMed Central  Google Scholar 

  9. 9.

    Lee M, Rhee I. Cytokine signaling in tumor progression. Immune Network; 2017.

    Google Scholar 

  10. 10.

    Singh SK, Mishra MK, Eltoum IEA, Bae S, Lillard JW, Singh R. CCR5/CCL5 axis interaction promotes migratory and invasiveness of pancreatic cancer cells. Sci Rep. 2018;8:1–12.

    Google Scholar 

  11. 11.

    Robinson MD, McCarthy DJ, Smyth GK. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2009;26(1):139–40.

    PubMed  PubMed Central  Google Scholar 

  12. 12.

    Khalilia M, Chakraborty S, Popescu M. Predicting disease risks from highly imbalanced data using random forest. BMC Med Inform Decis Mak. 2011;11:51.

    PubMed  PubMed Central  Google Scholar 

  13. 13.

    Bradburn MJ, Clark TG, Love SB, Altman DG. Survival Analysis Part II: Multivariate data analysis- An introduction to concepts and methods. Br J Cancer. 2003;89(3):431–6.

    CAS  PubMed  PubMed Central  Google Scholar 

  14. 14.

    Norton KA, Wallace T, Pandey NB, Popel AS. An agent-based model of triple-negative breast cancer: the interplay between chemokine receptor CCR5 expression, cancer stem cells, and hypoxia. BMC Syst Biol. 2017;11(1):68.

    PubMed  PubMed Central  Google Scholar 

  15. 15.

    Ranasinghe R, Eri R. Modulation of the CCR6-CCL20 Axis: A Potential Therapeutic Target in Inflammation and Cancer. Medicina (Kaunas, Lithuania). 2018;54(5):88.

    Google Scholar 

  16. 16.

    Boyle ST, Faulkner JW, McColl SR, Kochetkova M. The chemokine receptor CCR6 facilitates the onset of mammary neoplasia in the MMTV-PyMT mouse model via recruitment of tumor-promoting macrophages. Mol Cancer. 2015;14:115.

    PubMed  PubMed Central  Google Scholar 

  17. 17.

    Chen W, Qin Y, Wang D, Zhou L, Liu Y, Chen S, et al. CCL20 triggered by chemotherapy hinders the therapeutic efficacy of breast cancer. PLoS Biol. 2018;16(7):e2005869.

    PubMed  PubMed Central  Google Scholar 

  18. 18.

    Adjustment of dendritic cells to the breast-cancer microenvironment is subset specific | Nature Immunology. https://www.nature.com/articles/s41590-018-0145-8. Accessed 8 Jan 2020.

  19. 19.

    The Differential Contribution of the Innate Immune System to a Good Pathological Response in the Breast and Axillary Lymph Nodes Induced by Neoadju... - PubMed - NCBI. https://www.ncbi.nlm.nih.gov/pubmed/28913366. Accessed 8 Jan 2020.

  20. 20.

    Li Y, Liao Q, Li K, Zhong D, Weng X, Mi M. Knockdown of endothelin A receptor expression inhibits osteosarcoma pulmonary metastasis in an orthotopic xenograft mouse model. Mol Med Rep. 2012;5(6):1391–5.

    CAS  PubMed  Google Scholar 

  21. 21.

    The role of TGFBI in mesothelioma and breast cancer: association with tumor suppression. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3480943/. Accessed 8 Jan 2020.

  22. 22.

    TGFBI Expression Reduces In Vitro and In Vivo Metastatic Potential of Lung and Breast Tumor Cells. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3167173/. Accessed 8 Jan 2020.

  23. 23.

    Baicalein Suppresses Stem Cell-Like Characteristics in Radio- and Chemoresistant MDA-MB-231 Human Breast Cancer Cells through Up-Regulation of IFIT2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6471144/. Accessed 8 Jan 2020.

Download references

Acknowledgements

Not applicable.

Funding

The National Research Foundation of Korea supported this work, with funding from the Ministry of Education, Science, and Technology (NRF-2016R1A6A3A11934851 and NRF-2019R1I1A1A01058752). This study was also supported by a grant (2018–785) from the Asan Institute for Life Sciences, Asan Medical Center, Seoul, Korea. The funding body had no role in the design of the study, the collection, analysis, and interpretation of data, and in the writing of the manuscript.

Author information

Affiliations

Authors

Contributions

GBL carried out the analysis of differential expression and the implementation of prediction models, and wrote the manuscript. YAK, JHS, HJH, and GG designed and performed the experiments. SHP designed the whole procedure of data analysis, implement the initial code for prediction models and survival analysis and wrote the manuscript. GBL, YAK, JHS, HJL, GG, and SHP finalised the manuscript. All authors read and approved the final version of the manuscript.

Corresponding authors

Correspondence to Gyungyub Gong or Sung Hee Park.

Ethics declarations

Ethics approval and consent to participate

The study was approved by the Institutional Review Board of Asan Medical Center (IRB No. 2013–0866). Written informed consent was obtained from all eligible participants.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

PRE_POST analysis. Figure S1. Data distribution in PRE and POST groups. Figure S2. A MDS plot for POST relapse and POST non-relapse groups. Table S1. DEGs for POST vs PRE in the relapse group.

Additional file 2.

41 DEGs for comparison of POST to PRE in relapse. Figure S1. Violin plots for 41 DEGs for comparison of POST versus PRE in relapse.

Additional file 3.

Violin plots for nine DEGs in the pCR model. Figure S1. Violin plots of nine pCR DEGs. IL2RA, CCL5, SELE, CCL20, CD7, genes have high expression in pCR. FCER1A, CD1A, HAMP, C4A.B genes have high expression in non-pCR.

Additional file 4.

Violin pots for 13 DEGs in the relapse model. Figure S1. Violin plots of 13 relapse DEGs. CCL5, CCL7, TNFSF13B, CSF2RB, CLEC4E, CCL8, SELE, IL17B, IL2RA, and GZMB genes have high expression in non-RELAPSE. FCER1A, EDNRB, TGFBI genes have high expression in relapse.

Additional file 5.

Clinical data for all patients. Table S1: Clinical data for all patients. Table S2. Spearman correlation between immunohistochemistry staining and gene expression quantified using the NanoString nCounter platform.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Lim, G.B., Kim, YA., Seo, JH. et al. Prediction of prognostic signatures in triple-negative breast cancer based on the differential expression analysis via NanoString nCounter immune panel. BMC Cancer 20, 1052 (2020). https://doi.org/10.1186/s12885-020-07399-8

Download citation

Keywords

  • Triple negative breast cancer
  • Prognostic marker
  • NanoString nCounter immune panel
  • Differential expression
  • Pathological complete response
  • Relapse
  • Prediction model
  • Random Forest