
Unraveling the causal links and novel molecular classification of Crohn’s disease in breast cancer: a two-sample Mendelian randomization and transcriptome analysis with prognostic modeling

Abstract

Background

Crohn’s disease (CD), a prominent form of chronic gastrointestinal inflammation, and breast cancer (BC) appear unrelated, yet they share a key characteristic: both involve chronic inflammation and immune responses. This convergence has increasingly attracted the attention of investigators, but the relationship between the two diseases remains controversial.

Methods

We used two-sample Mendelian randomization (MR) and transcriptomics to explore the relationship between CD and BC. MR was used to assess the causal effect of CD on different BC subtypes and the reverse causal effect of BC on CD. We identified CD-related differentially expressed genes, evaluated their prognostic impact on BC, and developed a new molecular classification of BC based on these key genes.

Results

MR revealed a causal link between CD and increased BC risk, particularly for estrogen receptor-positive (ER+) BC, but not for ER-negative (ER-) BC. BC showed no causal effect on CD. Transcriptomic analysis identified CD-related genes, including B4GALNT2 and FGF19, that were associated with BC prognosis. A nomogram based on these genes predicted overall survival in BC with good accuracy (C-index 0.78). Based on these genes, we proposed a new molecular classification of BC.

Conclusions

CD is a risk factor for ER+ BC but not for ER- BC. BC does not causally affect CD. Our prognostic model and new BC molecular classifications offer insights for personalized treatment strategies.


Background

Breast cancer (BC) and Crohn’s disease (CD) are distinct medical conditions: BC is a prevalent malignancy among women, and CD is a chronic inflammatory disease of the gastrointestinal tract [1, 2]. There is currently no consensus on the relationship between them, and this study aims to explore whether an underlying association exists. CD mainly affects the gastrointestinal tract, causing inflammation, ulceration, and stenosis; its exact cause remains uncertain, although genetics, environmental factors, and changes in the gut microbiota are believed to play roles [3]. BC, in contrast, is a widespread malignant tumor that originates in breast tissue and can also occur in men, albeit at a lower incidence [4]. BC is currently the most commonly diagnosed cancer worldwide, posing a significant burden to healthcare systems and human health alike [5].

Although CD and BC arise in distinct physiological systems, there is growing interest in a potential connection between them, because both share a fundamental feature: chronic inflammation and an associated immune response. Chronic inflammation has been identified as a potential driver of tumorigenesis in a variety of contexts [6, 7]. Some studies have reported a higher incidence of BC among first-degree relatives of individuals with CD [8, 9], which also prompted us to investigate potential interactions between CD and BC. However, studies on the effect of CD on BC are scarce and have yielded mixed conclusions [10,11,12], and, to our knowledge, no study has examined the effects of CD on BC subtypes, the underlying mechanisms, or the immunological consequences. Investigating the relationship between CD and BC may reveal shared molecular pathways and risk factors that influence the development of both diseases. Such findings could open avenues for novel strategies in risk stratification, early detection, and personalized therapeutic interventions.

Our research focuses on these interactions, exploring shared mechanisms and clinical implications, because understanding the connection between CD and BC is important for better medical care. To minimize interference from other factors, we used a two-sample MR approach to analyze the link between CD and the risk of BC, including its subtypes. Two-sample MR uses genetic variants as instrumental variables (IVs) to infer causal relationships, which reduces the confounding that limits conventional observational studies. The core principle is that a genetic variant that is associated with the exposure, is independent of confounders, and affects the outcome only through the exposure can serve as an IV to estimate the causal effect of the exposure on the outcome [13]. We also conducted transcriptomic, molecular subtyping, and immune correlation analyses to deepen our understanding of the association. Through this comprehensive exploration, we aim to provide insights for future medical practice and expand knowledge of CD and BC.
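The IV principle above can be illustrated with the simplest MR estimator for a single SNP, the Wald ratio: the SNP–outcome association divided by the SNP–exposure association. The Python sketch below is illustrative only; the function name and the summary statistics are hypothetical, not values from this study, and the standard error is the common first-order approximation that ignores uncertainty in the SNP–exposure estimate.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Wald ratio causal estimate for one SNP and its first-order standard error.

    beta_exposure: SNP effect on the exposure (e.g. log-odds of CD per allele)
    beta_outcome:  SNP effect on the outcome (e.g. log-odds of BC per allele)
    se_outcome:    standard error of beta_outcome
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


# Hypothetical summary statistics for one SNP (not taken from this study).
est, se = wald_ratio(beta_exposure=0.20, beta_outcome=0.01, se_outcome=0.004)
odds_ratio = math.exp(est)  # OR for BC per unit increase in log-odds of CD
print(f"log-OR = {est:.3f}, SE = {se:.3f}, OR = {odds_ratio:.3f}")
```

Multi-SNP methods such as IVW combine many of these per-SNP ratios, weighting each by its precision.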

Methods

The study design is illustrated in Fig. 1. All MR and transcriptome data used in this study are publicly accessible online.

Fig. 1

Flowchart of the study process

Mendelian randomization analysis

We used genetic markers from the GWAS repository (https://gwas.mrcieu.ac.uk) as IVs and applied a two-sample MR design, supplemented with bidirectional MR for robustness. To better understand the links between CD and BC subtypes, we separately assessed associations with ER+ and ER- breast cancers. Further details on these datasets are provided in Table S1.

To ensure analytical rigor and mitigate confounding, SNPs were selected as IVs according to the following criteria: (1) genome-wide significant association with the exposure (P < 5 × 10⁻⁸); (2) independence among SNPs, ensured by linkage disequilibrium clumping (r² < 0.001 within a 10,000 kb window) to minimize linkage effects; (3) adequate instrument strength, with F-statistics above 20; and (4) no association with potential confounding traits, assessed with the PhenoScanner tool (http://www.phenoscanner.medschl.cam.ac.uk/), which we used to verify that each selected SNP primarily influences the exposure, in this case Crohn’s disease, and not other traits that could confound the results. Analyses were performed with the ‘TwoSampleMR’ R package, using the inverse-variance weighted (IVW) method as the primary approach [14].
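A minimal Python sketch of the instrument-strength filter in criterion (3) and of the primary IVW estimate is shown below. It is a simplified illustration, not the R package's implementation: the per-SNP F-statistic is approximated as (beta/SE)², the IVW estimate is the first-order fixed-effect form, and the three SNPs and all helper names are hypothetical.

```python
import math


def f_statistic(beta_exposure, se_exposure):
    """Approximate per-SNP instrument strength: F ~ (beta / se)^2."""
    return (beta_exposure / se_exposure) ** 2


def ivw_fixed(beta_exposure, beta_outcome, se_outcome):
    """First-order fixed-effect inverse-variance weighted (IVW) estimate.

    Equivalent to a weighted regression of beta_outcome on beta_exposure
    through the origin, with weights 1 / se_outcome**2.
    """
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_exposure, se_outcome)]
    numerator = sum(bx * by / sy ** 2
                    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome))
    estimate = numerator / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return estimate, se


# Hypothetical summary statistics for three SNPs (not the study's instruments).
bx = [0.20, 0.15, 0.30]     # SNP -> exposure (CD) effects
sx = [0.02, 0.04, 0.03]     # their standard errors
by = [0.010, 0.006, 0.012]  # SNP -> outcome (BC) effects
sy = [0.004, 0.004, 0.005]  # their standard errors

# Criterion (3): keep only SNPs with F > 20; the second SNP (F ~ 14) is dropped.
strong = [i for i in range(len(bx)) if f_statistic(bx[i], sx[i]) > 20]
beta_ivw, se_ivw = ivw_fixed([bx[i] for i in strong],
                             [by[i] for i in strong],
                             [sy[i] for i in strong])
print(f"IVW log-OR = {beta_ivw:.4f} (SE {se_ivw:.4f}), OR = {math.exp(beta_ivw):.4f}")
```

Filtering before estimation mirrors the order of the criteria above; with real data, clumping and PhenoScanner screening would also be applied before the IVW step.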

Transcriptomic data analysis

Data Acquisition and differentially expressed genes (DEGs) analysis

Differential expression analysis of the CD expression profile dataset GSE69762 (N = 30), obtained from the GEO database (https://www.ncbi.nlm.nih.gov/), was performed with the ‘limma’ R package [15]. DEGs were identified with a significance threshold of P < 0.05 and a minimum fold change of 1.2. For the BC dataset from The Cancer Genome Atlas (TCGA) (https://www.cancer.gov/) (N = 1218), we performed Principal Component Analysis (PCA) after z-score normalization and removed outlier samples, ensuring that subsequent analyses were performed on a high-quality, homogeneous dataset. BC DEGs were then identified with limma, using a significance threshold of P < 0.05 and a minimum fold change of 1.5. Finally, we assessed the overlap between the DEGs identified in CD and BC.
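The fold-change filters above correspond to |log2 fold change| cutoffs of log2(1.2) ≈ 0.263 for CD and log2(1.5) ≈ 0.585 for BC. A minimal Python sketch of the filtering and of the CD–BC overlap, assuming the limma results have already been exported as (log2 fold change, P value) pairs per gene; the gene names and values are hypothetical, and the overlap here ignores the direction of change, which the text does not specify.

```python
import math


def select_degs(results, p_cutoff=0.05, min_fold_change=1.2):
    """Return (up, down) gene sets from limma-style results.

    results: dict mapping gene -> (log2_fold_change, p_value)
    A fold change of 1.2 corresponds to |log2FC| >= log2(1.2) ~ 0.263.
    """
    lfc_cutoff = math.log2(min_fold_change)
    up = {g for g, (lfc, p) in results.items() if p < p_cutoff and lfc >= lfc_cutoff}
    down = {g for g, (lfc, p) in results.items() if p < p_cutoff and lfc <= -lfc_cutoff}
    return up, down


# Hypothetical results (log2FC, P) for CD versus controls and BC versus normal tissue.
cd = {"GENE_A": (0.40, 0.01), "GENE_B": (-0.30, 0.02),
      "GENE_C": (0.10, 0.001), "GENE_D": (1.00, 0.20)}
bc = {"GENE_A": (0.70, 1e-5), "GENE_B": (-0.50, 0.03),
      "GENE_C": (0.80, 1e-4), "GENE_E": (-2.00, 1e-8)}

cd_up, cd_down = select_degs(cd, min_fold_change=1.2)
bc_up, bc_down = select_degs(bc, min_fold_change=1.5)
shared = (cd_up | cd_down) & (bc_up | bc_down)  # DEGs common to both diseases
print(sorted(shared))
```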

Weighted Gene Co-expression Network Analysis (WGCNA) of differential genes

Using WGCNA, we analyzed the 557 DEGs shared by CD and BC. For analytical robustness, the median absolute deviation (MAD) of each gene was computed first. Outlier genes and samples were then identified and excluded with the ‘goodSamplesGenes’ function of the WGCNA R package [16].
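The text does not state how the MAD values were used downstream; a common practice, assumed here, is to rank genes by MAD and keep the most variable ones. A minimal Python sketch using the same scaling constant (1.4826) as R's `mad()`, with hypothetical helper names and expression values:

```python
import statistics


def mad(values, constant=1.4826):
    """Scaled median absolute deviation (same default constant as R's mad())."""
    med = statistics.median(values)
    return constant * statistics.median([abs(v - med) for v in values])


def top_genes_by_mad(expr, n_keep):
    """Return the n_keep genes with the largest MAD, most variable first.

    expr: dict mapping gene -> list of expression values across samples.
    Genes with zero MAD (constant expression) are dropped, since they carry
    no co-expression information.
    """
    variable = [g for g in expr if mad(expr[g]) > 0]
    return sorted(variable, key=lambda g: mad(expr[g]), reverse=True)[:n_keep]


# Hypothetical expression matrix: 3 genes x 5 samples.
expr = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [2, 2, 2, 2, 2],
    "GENE_C": [0, 5, 10, 15, 20],
}
top = top_genes_by_mad(expr, n_keep=2)
print(top)
```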

Enrichment analysis

Enrichment analysis was performed on 549 DEGs, after omitting the 8 genes that could not be assigned to any co-expression module. Gene Ontology (GO) annotations and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotations were used for this analysis.
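The enrichment test is not named in the text. GO and KEGG over-representation analysis is commonly based on a one-sided hypergeometric test (as in tools such as clusterProfiler), sketched below in Python with a hypothetical function name and toy counts. In practice, the resulting P values are then adjusted for multiple testing, for example with the Benjamini–Hochberg procedure.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation P value, P(X >= k), X ~ Hypergeometric(N, K, n).

    N: genes in the background universe
    K: background genes annotated to the term (e.g. one KEGG pathway)
    n: genes in the query list (e.g. the shared DEGs)
    k: query genes annotated to the term
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy example: 10 background genes, 4 in the pathway, 3 query genes, 2 of them hits.
p = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)  # (6*6 + 4*1) / 120 = 1/3
print(p)
```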

Regression analysis and prognostic prediction model construction

From the TCGA BC dataset, we selected a cohort of 1034 patients, excluding those with incomplete clinical information or a follow-up time of zero. We used the ‘glmnet’ R package to perform Least Absolute Shrinkage and Selection Operator regression combined with the Cox proportional-hazards model (Lasso-Cox) [17]. The penalty parameter lambda was chosen by 10-fold cross-validation: a range of lambda values was tested, and the value that minimized the cross-validation error was selected. This balances model complexity against predictive accuracy and thereby improves the generalizability of the model to new data. Based on the expression data and regression coefficients of the five prognostically relevant genes, we constructed the following risk score: RiskScore = 0.0135 × B4GALNT2 + 0.0184 × C10orf10 − 0.0266 × FGF19 + 0.0354 × HLF + 0.0748 × SAP30. The optimal cutoff value for the RiskScore was determined with the ‘maxstat’ R package [18], and the ‘survival’ package was used to compare overall survival (OS) between groups [19]. We then performed an integrated analysis combining several clinical indicators, including survival time, survival status, RiskScore, age, T stage, N stage, M stage, and other relevant data, using the ‘rms’ R package [20]. To evaluate the predictive accuracy of the model, we performed Receiver Operating Characteristic (ROC) curve analysis with the ‘pROC’ package [21]. GSE20685 (N = 327), obtained from the GEO database (https://www.ncbi.nlm.nih.gov/), was used to validate the accuracy of the predictive model.
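Applying the five-gene model to a new patient amounts to a weighted sum followed by a comparison with the cutoff. The Python sketch below uses the coefficients from the RiskScore formula above and the optimal cutoff of 1.0173 reported in the Results. The patient's expression values are hypothetical, the expression scale (normalization) the model expects is not stated in the text, and assigning a patient exactly at the cutoff to the low-risk group is an assumption.

```python
# Coefficients of the five-gene Lasso-Cox model as reported in the text.
COEFFICIENTS = {
    "B4GALNT2": 0.0135,
    "C10orf10": 0.0184,
    "FGF19": -0.0266,  # negative coefficient: protective factor
    "HLF": 0.0354,
    "SAP30": 0.0748,
}
RISK_CUTOFF = 1.0173  # maxstat-derived cutoff reported in the Results


def risk_score(expression):
    """Linear predictor: sum of coefficient x expression over the five genes."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def risk_group(expression, cutoff=RISK_CUTOFF):
    """'high' when the score exceeds the cutoff, otherwise 'low' (tie rule assumed)."""
    return "high" if risk_score(expression) > cutoff else "low"


# Hypothetical expression values for one patient (scale assumed, see above).
patient = {"B4GALNT2": 2.0, "C10orf10": 10.0, "FGF19": 1.0, "HLF": 5.0, "SAP30": 8.0}
score = risk_score(patient)  # 0.027 + 0.184 - 0.0266 + 0.177 + 0.5984 = 0.9598
group = risk_group(patient)  # 'low', since 0.9598 <= 1.0173
print(f"RiskScore = {score:.4f} -> {group}-risk group")
```

Because SAP30 has the largest coefficient, moderate changes in its expression shift patients across the cutoff most readily; with SAP30 at 10.0 instead of 8.0, the same patient would score 1.1094 and fall into the high-risk group.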

Novel molecular classification of Crohn’s disease in breast cancer

The key genes for molecular classification were determined by Lasso-Cox analysis. Consensus clustering was performed on the TCGA-BRCA dataset with the R package ‘ConsensusClusterPlus’ [22], using the k-means algorithm with Euclidean distance to determine the optimal number of clusters (k). We then examined the expression of the key genes across the resulting subgroups. Gene Set Variation Analysis (GSVA) was used to compare gene-set variation between groups with the R package ‘GSVA’ [23], and single-sample Gene Set Enrichment Analysis (ssGSEA) was performed between subgroups with the R package ‘GSEABase’ [24]. CIBERSORT was employed to analyze immune cell infiltration. The Wilcoxon test was used for comparisons between two groups, and the Kruskal-Wallis test for comparisons among more than two groups.
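The idea behind consensus clustering is to re-cluster random subsamples many times and record how often each pair of samples ends up in the same cluster. The dependency-free Python sketch below illustrates this with k-means and Euclidean distance. It is not the ‘ConsensusClusterPlus’ implementation: the function names, the 80% subsampling fraction, the 100 repetitions, and the toy one-feature data are illustrative assumptions rather than the study's settings.

```python
import random


def kmeans(points, k, rng, n_iter=50):
    """Lloyd's k-means with squared Euclidean distance; returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k, n_reps=100, p_item=0.8, seed=1):
    """M[i][j] = (# runs where i and j share a cluster) / (# runs where both were sampled)."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_reps):
        idx = rng.sample(range(n), int(p_item * n))  # subsample the patients
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
            for i in range(n)]


# Toy data: two well-separated groups of "patients" described by one feature.
points = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
M = consensus_matrix(points, k=2)
for row in M:
    print(" ".join(f"{v:.2f}" for v in row))
```

Values near 1 or 0 indicate stable co-clustering; for a candidate k, a consensus matrix dominated by such values corresponds to the clean diagonal blocks seen in a consensus heatmap.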

Results

Results of studies related to mendelian randomization

Details of the SNPs included for each exposure and outcome are given in Table 1. The IVW analysis indicated a causal relationship between CD and BC, with CD identified as a risk factor for BC (OR = 1.0431, 95% CI: 1.0149–1.0721, P = 0.0025). Tests for horizontal pleiotropy and heterogeneity were not statistically significant (P > 0.05), supporting the reliability and stability of the results. SNP forest plots and leave-one-out analysis further confirmed that CD is a risk factor for BC and that the causal estimate is not driven by a single SNP (Fig. 2A-B). All MR methods gave estimates in the same, positive direction (Fig. 2C), and the funnel plot was symmetric (Fig. 2D). In contrast, the reverse analysis found no causal effect of BC on CD (P > 0.05) (Figure S1A-B).
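Because the MR results are reported as ORs with 95% CIs, the standard error of the log-OR and the two-sided P value can be recovered under the usual normal approximation, which offers a quick consistency check on the reported numbers. The Python sketch below uses only the standard library and a hypothetical helper name. Applied to the two IVW estimates reported in this section (CD on BC, and CD on ER+ BC), the back-calculated P values should come out close to the reported 0.0025 and 0.0258, with small differences expected from rounding of the published ORs and CIs.

```python
import math
from statistics import NormalDist


def se_and_p_from_ci(odds_ratio, ci_low, ci_high, level=0.95):
    """Back-calculate the log-OR standard error and two-sided P value from a Wald CI.

    Assumes the CI was computed as exp(log_or +/- z * se) under a normal approximation.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, p


# Reported IVW estimates: CD -> BC and CD -> ER+ BC.
se_bc, p_bc = se_and_p_from_ci(1.0431, 1.0149, 1.0721)
se_erpos, p_erpos = se_and_p_from_ci(1.0213, 1.0026, 1.0404)
print(f"CD -> BC:  SE(log-OR) = {se_bc:.4f}, P = {p_bc:.4f}")
print(f"CD -> ER+: SE(log-OR) = {se_erpos:.4f}, P = {p_erpos:.4f}")
```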

Table 1 Causal effects of CD on BC and its subtypes
Fig. 2

The causal effect of Crohn’s disease on the risk of breast cancer and of ER+ breast cancer. (A, E) Forest plots of the causal effect estimated from each SNP. The red points show the combined estimates using all SNPs; horizontal lines represent 95% confidence intervals. (B, F) Leave-one-out analysis evaluating the influence of each individual SNP on the causal estimate. The red point denotes the inverse-variance weighted estimate using all SNPs. (C, G) Scatter plots showing the estimated effects from all MR methods. (D, H) Symmetrical funnel plots indicating the absence of significant horizontal pleiotropy. Vertical lines represent the estimates using all SNPs

In the subtype analyses, CD showed a causal relationship with ER+ BC (OR = 1.0213, 95% CI: 1.0026–1.0404, P = 0.0258). Because the heterogeneity test was significant (P < 0.05), a random-effects IVW model was used. Importantly, there was no evidence of horizontal pleiotropy (P > 0.05). SNP forest plots and leave-one-out analysis provided additional evidence that CD is a risk factor for ER+ BC and that the causal estimate is not driven by a single SNP (Fig. 2E-F). Scatter plots confirmed that all methods gave estimates in the same direction (Fig. 2G), and the funnel plot was symmetric (Fig. 2H). However, no causal relationship between CD and ER- BC was found (P > 0.05) (Figure S1C-D).
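The switch to a random-effects model can be made concrete with Cochran's Q, the usual heterogeneity statistic for IVW MR (the text does not name the test, so this is an assumption). Under a multiplicative random-effects model, one common choice, the fixed-effect standard error is inflated by the square root of Q/(L − 1) when Q exceeds its degrees of freedom, where L is the number of SNPs; the point estimate is unchanged. In the hypothetical Python sketch below, Q would be compared with a chi-square distribution on L − 1 degrees of freedom (for example via `scipy.stats.chi2.sf`) to obtain the heterogeneity P value.

```python
import math


def ivw_with_heterogeneity(bx, by, sy):
    """Fixed-effect IVW estimate, Cochran's Q, and a multiplicative random-effects SE.

    bx, by: per-SNP effects on exposure and outcome; sy: standard errors of by.
    Ratio estimates by/bx are combined with first-order weights bx^2 / sy^2.
    """
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [x ** 2 / s ** 2 for x, s in zip(bx, sy)]
    w_sum = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / w_sum
    se_fixed = math.sqrt(1.0 / w_sum)
    q = sum(w * (r - beta) ** 2 for w, r in zip(weights, ratios))
    df = len(bx) - 1
    # Inflate the SE only when there is more dispersion than expected by chance.
    se_random = se_fixed * math.sqrt(max(1.0, q / df))
    return beta, se_fixed, q, se_random


# Hypothetical SNPs whose ratio estimates (0.01, 0.05, 0.09) disagree strongly.
beta, se_fixed, q, se_random = ivw_with_heterogeneity(
    bx=[0.2, 0.2, 0.2], by=[0.002, 0.010, 0.018], sy=[0.002, 0.002, 0.002])
print(f"beta = {beta:.3f}, Q = {q:.1f}, SE fixed = {se_fixed:.5f}, SE random = {se_random:.5f}")
```

Here Q is 32 on 2 degrees of freedom, so the random-effects standard error is four times the fixed-effect one, which widens the confidence interval without moving the estimate.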

Transcriptome analysis results

Differential expression analysis of the GSE69762 CD dataset with limma showed significant differences between the CD and control groups, with 880 upregulated and 842 downregulated genes (Fig. 3A-B). PCA was performed on the BC dataset (Figure S1E). After removing outlier samples, we identified 3,134 significantly upregulated and 4,496 downregulated genes in BC relative to normal samples (Fig. 3C-D). In total, 557 DEGs were shared between CD and BC (Fig. 3E).

Fig. 3

Transcriptomic analysis of DEGs in BC and CD patients. (A) The volcano plot presents the expression pattern of CD DEGs in the GSE69762 Dataset. Red: up-regulation; Green: down-regulation. (B) The Heatmap presents the expression pattern of CD DEGs in the GSE69762 Dataset. (C) The volcano plot presents the expression pattern of BC DEGs in the TCGA-BRCA Database. Red: up-regulation; Green: down-regulation. (D) The Heatmap presents the expression pattern of BC DEGs in the TCGA-BRCA Database. (E) Venn diagram illustrating the common and unique DEGs between CD and BC

We conducted WGCNA on the 557 shared DEGs. First, based on scale independence and mean connectivity, the optimal soft-thresholding power was determined to be 4 (Fig. 4A-B). This power improved the quality of sample clustering (Figure S1F) and gene clustering (Fig. 4C). WGCNA identified six co-expression modules (Fig. 4D). The ‘grey’ module comprised 8 genes that could not be assigned to any other module, whereas the ‘turquoise’ and ‘brown’ modules together contained 274 DEGs highly correlated with BC (Fig. 4E). This correlation was further confirmed by scatterplots of Gene Significance (GS) against Module Membership (MM) for these genes (Fig. 4F-G). After omitting the 8 grey-module genes, we performed GO and KEGG enrichment analyses on the remaining 549 DEGs to characterize their biological roles and pathways. The top ten GO categories showed strong enrichment in fundamental biological processes, notably immune regulation and cell signaling (Fig. 4H, S1G-H). These genes were also associated with cellular components such as the cytoplasm and cell membrane, and with molecular functions including receptor binding, growth factor activity, and signal transduction. KEGG analysis further showed that these genes are mainly involved in several tumor immunity-related pathways (Fig. 4I), notably the TNF signaling pathway and the PD-L1 expression and PD-1 checkpoint pathway. Given the central roles of these pathways in immune responses, cellular signaling, and inflammation, these genes may play important roles in tumor immunity.
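Choosing the soft-thresholding power can be sketched as follows: absolute correlations are raised to the power β, each gene's connectivity is summed, and the fit of the connectivity distribution to a power law is measured as a signed R² on log–log scales. WGCNA typically selects the lowest power at which this index is high while mean connectivity remains reasonable; the study reports β = 4. The Python sketch below is a simplified stand-in for WGCNA's own fit index, using equal-width bins, hypothetical function names, and a toy connectivity vector.

```python
import math


def connectivity(corr, beta):
    """Unsigned soft-threshold network: k_i = sum over j != i of |cor_ij| ** beta."""
    n = len(corr)
    return [sum(abs(corr[i][j]) ** beta for j in range(n) if j != i) for i in range(n)]


def scale_free_fit(k, n_breaks=10):
    """Signed R^2 of log10 p(k) on log10 k over equal-width connectivity bins.

    Values near 1 mean the connectivity distribution is close to a power law
    with negative slope, i.e. an approximately scale-free network.
    """
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_breaks or 1.0
    bins = {}
    for value in k:
        bins.setdefault(min(int((value - lo) / width), n_breaks - 1), []).append(value)
    if len(bins) < 3:
        raise ValueError("need at least three occupied bins")
    xs = [math.log10(sum(v) / len(v)) for v in bins.values()]  # mean k per bin
    ys = [math.log10(len(v) / len(k)) for v in bins.values()]  # fraction of genes per bin
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # R^2 of the log-log regression, signed positive when the slope is negative.
    return -math.copysign(sxy ** 2 / (sxx * syy), sxy)


# Toy check: connectivities following p(k) proportional to 1/k give a fit of 1.
k = [1.0] * 8 + [2.0] * 4 + [4.0] * 2
fit = scale_free_fit(k, n_breaks=3)
print(f"signed R^2 = {fit:.3f}")
```

Raising β shrinks weak correlations faster than strong ones, so higher powers push the network toward scale-free topology at the cost of lower mean connectivity, which is the trade-off shown in the two soft-threshold panels.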

Fig. 4

WGCNA of the 557 DEGs shared by CD and BC. (A) Analysis of the scale-free fit index for different soft-thresholding powers; the appropriate soft-thresholding power was 4. (B) Analysis of mean connectivity across soft-thresholding powers, supporting a power of 4. (C) Clustering dendrogram of the 557 genes based on dissimilarity; genes are hierarchically divided into six modules, shown in different colors. (D) Clustering of the six modules based on the similarity of their module eigengenes. The “distance” here refers to the Topological Overlap Measure (TOM) distance, a measure of network interconnectedness between genes; a lower TOM distance indicates greater similarity between gene expression profiles. (E) Heatmap of correlations between the six modules and breast tumor phenotypic characteristics. (F-G) Gene Significance and Module Membership plots for the brown and turquoise modules. (H) GO biological process enrichment analysis of DEGs, excluding grey module genes. (I) KEGG enrichment analysis of DEGs, excluding grey module genes

Nomogram model construction

With lambda set to 0.0243, we combined the regression coefficients with the gene expression profiles to derive a five-gene RiskScore (Fig. 5A-B). In the survival analysis, the optimal threshold of 1.0173 divided patients into high-risk and low-risk groups, which differed significantly in prognosis (P < 0.05) (Fig. 5C). FGF19 was a protective factor, whereas B4GALNT2, C10orf10, HLF, and SAP30 were risk factors (Fig. 5D). To further evaluate the prognostic value of the RiskScore in BC patients, we performed Cox regression analysis including clinical factors such as age and TNM stage. Both age and RiskScore were significantly associated with the prognosis of BC patients. The effects of the different T, N, and M subgroups on prognosis are also shown (Fig. 5E), illustrating the distinct roles of the staging categories in BC prognosis. Using the Cox model, we constructed a nomogram to assess the prognostic significance of these features in the cohort of 1034 samples. The overall C-index of the model was 0.78 (95% CI: 0.74–0.82, P < 0.001) (Fig. 5F), and the calibration curves showed good agreement (Fig. 5G). The optimal cutoff value for the nomogram score was 0.3282; based on this value, patients were stratified into high-risk (H) and low-risk (L) groups, which differed significantly in OS (HR = 4.52, 95% CI: 3.20–6.38, P < 0.001) (Fig. 5H). ROC curve analysis was used to assess the accuracy of the model (Fig. 5I).
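The C-index of 0.78 measures how often, among comparable patient pairs, the patient with the higher predicted risk experiences the event first. Below is a simplified Python sketch of Harrell's C for right-censored data, with a hypothetical function name and toy survival times and scores. It ignores pairs with tied survival times, whereas packages such as ‘rms’ and ‘survival’ apply their own tie-handling rules, so their values can differ slightly.

```python
def harrell_c_index(times, events, scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an event before time j.
    It is concordant when subject i also has the higher risk score; tied
    scores count as half-concordant. Pairs with tied times are skipped.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data: follow-up time, event indicator (1 = death), and nomogram risk score.
c_index = harrell_c_index(times=[2, 4, 6, 8], events=[1, 1, 0, 1],
                          scores=[0.9, 0.7, 0.8, 0.1])
print(f"C-index = {c_index:.2f}")  # 4 of 5 comparable pairs are concordant
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of survival times.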

Fig. 5

Selection of prognostic genes and construction of the risk score. (A-B) LASSO regression was used to avoid overfitting and identify key prognostic genes. (C) Kaplan-Meier survival analysis of the high- (blue) and low- (red) risk score groups. (D) Distribution of risk scores, scatter plot of survival status, and heatmap of gene expression in the prognostic model. (E) Multivariate Cox regression analysis indicating that the risk score signature was an independent risk factor. (F) Nomogram for predicting overall survival in patients with BC. (G) Calibration curves comparing predicted and actual survival probabilities at 1, 3, and 5 years, indicating the predictive power of the nomogram for patients with breast cancer. (H) Kaplan-Meier survival analysis of the nomogram prognostic model. Blue, high-risk group; red, low-risk group. (I) Time-dependent ROC curves showing nomogram accuracy: 1-year AUC 0.86, 3-year AUC 0.79, 5-year AUC 0.79. (J) Kaplan-Meier curve of overall survival in the GSE20685 cohort, validating the predictive power of the nomogram. (K-L) ROC curves for 3-year and 5-year survival in the GSE20685 validation set: 3-year AUC 0.72, 5-year AUC 0.65

To validate the prognostic model, we used the GSE20685 dataset as an external validation cohort. Using the optimal cutoff of 0.3282 determined in the training dataset, we classified 327 breast cancer patients into high-risk (H) and low-risk (L) groups. Survival differed significantly between the two groups (P < 0.001) (Fig. 5J). ROC curve analysis of the validation cohort gave AUCs of 0.72 for 3-year and 0.65 for 5-year survival (Fig. 5K-L), supporting the model’s ability to distinguish high-risk from low-risk patients.

Novel molecular classification of Crohn’s disease in breast cancer

To establish a new molecular classification of BC based on the expression of the prognostic genes, we performed unsupervised class discovery on 1034 BC patients from the TCGA database. The optimal number of clusters was determined from the change in the area under the cumulative distribution function (CDF) curve and from the consensus heatmaps (Figures S1I-J), and was identified as three (k = 3) (Fig. 6A). The heatmap shows strong consensus within these groups, as indicated by the blue blocks along the diagonal. Accordingly, all 1034 patients were assigned to three subgroups: CD-BC1 (367 cases, 35.5% of the total), CD-BC2 (239 cases, 23.1%), and CD-BC3 (428 cases, 41.4%). The expression of B4GALNT2, C10orf10, FGF19, HLF, and SAP30 differed substantially among the three clusters (Fig. 6B). B4GALNT2 was markedly upregulated in the CD-BC2 group, suggesting that it could serve as a marker for distinguishing CD-BC2 from the other subgroups. HLF and C10orf10 were markedly downregulated in CD-BC1 and upregulated in CD-BC3, whereas SAP30 showed the opposite pattern. Immune infiltration analysis also revealed marked differences in tumor-infiltrating immune cells among the clusters (Fig. 6C), with significant variation in the infiltration levels of activated B cells, activated CD4+ T cells, myeloid-derived suppressor cells (MDSCs), and macrophages. Finally, ssGSEA identified significant differences in signaling pathway enrichment among the three molecular subgroups (Fig. 6D-F); the color gradient from blue (low enrichment) to red (high enrichment) indicates enrichment intensity.
Compared with the other two groups, the CD-BC2 group showed distinct upregulation of the ‘Basal Cell Carcinoma’ and ‘Hedgehog Signaling Pathway’ pathways. This increased pathway activity may help explain the pathogenesis and biological behavior of BC in this subgroup. Together, these findings suggest that the new molecular classification captures differences in functional enrichment and immune characteristics that may be relevant to the treatment and management of BC patients.
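The Methods specify the Kruskal-Wallis test for comparisons among more than two groups, as with the immune-cell scores of CD-BC1, CD-BC2, and CD-BC3. With three groups the H statistic is compared with a chi-square distribution on 2 degrees of freedom, whose survival function is exactly exp(−H/2), so a dependency-free Python sketch of the large-sample test is possible. The function names and the infiltration scores in the usage line are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_3groups(g1, g2, g3):
    """Tie-corrected Kruskal-Wallis H and its chi-square (2 df) P value for three groups."""
    groups = [g1, g2, g3]
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_term += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)  # tie correction
    return h, math.exp(-h / 2)  # chi-square survival function with 2 df


# Hypothetical immune-cell scores for CD-BC1, CD-BC2 and CD-BC3.
h, p = kruskal_wallis_3groups([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"H = {h:.2f}, P = {p:.4f}")
```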

Fig. 6

The unsupervised consensus clustering algorithm was used to investigate the molecular subtypes of CD-BC patients. (A) The consensus clustering matrix for k = 3, determined as the optimal number of clusters, is displayed. (B) The differential expression analysis of prognosis-related genes of different molecular subtypes of BC. (C) The immune infiltration analysis of prognosis-related genes of different molecular subtypes of BC. (D-F) The ssGSEA signal pathway enrichment analysis of prognosis-related genes of different molecular subtypes of BC

Discussion

Inflammation and immunity have long been central topics in cancer research. CD is an autoimmune disorder characterized by an exaggerated immune response against the normal intestinal flora, causing damage to the intestinal mucosa, thickening of the intestinal wall, and the formation of ulcers and fistulas [1]. BC is a serious and heterogeneous disease characterized by excessive proliferation of breast epithelial cells, leading to high morbidity [25]. Comprehensive investigations into the pathogenesis of CD and BC, particularly their potential interconnection, are still needed. To address potential confounding and reverse causality, we conducted an MR analysis to test for a causal link between CD and BC. To investigate the underlying mechanisms and impacts, we developed and validated a genetic prognostic model through transcriptome analysis. We also introduced a new molecular classification of BC and examined how its subgroups differ in the tumor immune microenvironment.

Our results support a causal relationship between CD and the development of BC, identifying CD as a risk factor for BC. In further analyses of BC subtypes, we unexpectedly found that the causal effect of CD was significant for ER+ BC but not for ER- BC, suggesting that CD has a greater impact on the risk of ER+ BC. However, a previous meta-analysis of cohort studies failed to confirm a significant association between CD and increased BC risk [11], and another study also failed to confirm a causal relationship [12]. In contrast, a more recent third study found that CD is a risk factor for BC [10]. This finding is consistent with the overall trend in our study, and the discrepancies among studies may arise from differences in parameter settings, such as the kb > 5000 setting used in the third study. Moreover, studies have shown that patients with CD and their relatives have a high prevalence of BC, which tends to be more advanced at detection [8, 9]. BC patients with CD have also been shown to have later-stage disease and worse prognosis than BC patients without CD [26]. This suggests that patients with CD, especially premenopausal women, and their first-degree relatives may need active monitoring of gastrointestinal and breast health, as well as active treatment of CD.

It is important to emphasize that causal inference is a complex task that requires integrating several factors and in-depth research at multiple levels, including genetic, molecular, and clinical investigations [13]. Although our study has uncovered a potential causal relationship between CD and BC, further experimental and clinical investigations are needed to validate this finding and elucidate the underlying biological mechanisms. In this context, we analyzed transcriptomic gene correlations between CD and BC, using WGCNA together with GO and KEGG annotation to examine the 557 DEGs shared by CD and BC. Our results show a strong correlation between the 274 genes in the ‘turquoise’ and ‘brown’ modules and BC incidence, and some of these genes were associated with patients’ OS. LASSO regression identified the prognostically relevant hub genes B4GALNT2, C10orf10, FGF19, HLF, and SAP30. Based on their coefficients and expression data, we developed a RiskScore and a nomogram to predict patient prognosis. Notably, FGF19 is a protective prognostic factor, showing a downward trend as the risk score increases, whereas B4GALNT2, C10orf10, HLF, and SAP30 are prognostic risk factors.

B4GALNT2 is involved in glycoprotein synthesis and is important in the SID blood group system and in melanoma prognosis [27, 28]. Hsp1-Hsp2Cas9-Y has been used to knock out B4GALNT2 in porcine fetal fibroblasts (PFF) [29]; whether such approaches could benefit the prognosis of CD-related BC remains to be determined. C10orf10 plays a role in cell cycle regulation and DNA repair [30]. Its lower expression correlates with worse BC outcomes [31], and its higher expression is linked to glioma proliferation [32]. These findings suggest that C10orf10 is relevant not only to gliomas but also to the prognosis of CD-related BC. FGF19 is a member of the fibroblast growth factor family, which is vital for cellular growth and tissue repair [33]. Increased FGF19 levels have been found in patients with nasopharyngeal carcinoma (NPC), suggesting a role in disease progression [34], and FGF19 is also associated with gallbladder cancer [35]. This contrasts with the protective role of FGF19 observed in this study, so its role in BC warrants further validation. HLF is a transcription factor, and it has been reported to regulate apoptosis and autophagy [36]. SAP30 proteins typically regulate gene expression: they interact with Sin3A to form the Sin3 complex, which affects gene silencing and expression [37]. In our study, SAP30 was significantly differentially expressed across the novel BC molecular subgroups and was associated with BC prognosis. Although the exact mechanism of SAP30 is not fully understood, other studies have found it to be involved in regulating gene expression and chromatin structure. SAP30 has also been associated with poor prognosis in BC, consistent with the findings of this study [37]. Our results indicate that the RiskScore derived from CD-associated DEGs can effectively stratify BC patients into high-risk (H) and low-risk (L) groups.
Kaplan-Meier survival curves revealed a significant difference in OS between these two groups, suggesting that the RiskScore is a useful predictor of clinical outcome. Overall, our findings indicate the potential utility of this molecular risk assessment in optimizing care and prognosis for patients with both CD and BC. Nonetheless, further studies and clinical validation are needed to establish the clinical applicability of these findings.

This study established a novel molecular classification of 1034 BC patients based on CD-associated prognostic genes. The patients were categorized into three subgroups, CD-BC1, CD-BC2, and CD-BC3, which differed significantly in tumor immune infiltration and signaling pathway enrichment. These differences provide a plausible explanation for the differing prognostic outcomes of the three subgroups. As noted above, the immune microenvironment and inflammatory responses closely link CD and BC, two seemingly unrelated diseases. It has been shown that the systemic inflammatory response in patients with ulcerative colitis (UC) leads to down-regulation of breast cancer resistance protein (BCRP/ABCG2), which in turn induces breast carcinogenesis [38, 39]. Because CD and UC both belong to the group of inflammatory bowel diseases [40], this may also provide theoretical support for the mechanism suggested in this study, although it needs to be verified by basic research. Consistently, GO and KEGG enrichment analyses of the shared DEGs in this study identified numerous immune- and inflammation-related pathways.

Conclusions

In our study, we found that CD is a significant risk factor for BC development, particularly for estrogen receptor-positive (ER+) BC, whereas the association between CD and estrogen receptor-negative (ER-) BC was not significant. Reverse MR analysis showed that BC does not causally affect CD. Regarding prognosis, we identified five CD-related prognostic genes: B4GALNT2, C10orf10, FGF19, HLF, and SAP30. We developed a prognostic model, presented as a nomogram, with strong predictive value, offering valuable insights into patient prognosis and individualized treatment. Furthermore, based on these genes, we classified BC patients into three new groups that differ in pathway enrichment and show significant variations in immune profiles. In conclusion, our study sheds light on the complex relationship between CD and BC, with specific relevance to certain BC subtypes. These findings could help guide future disease prevention, treatment, and personalized medicine efforts.

Data availability

No datasets were generated or analysed during the current study.

Abbreviations

CD: Crohn’s Disease
BC: Breast Cancer
MR: Mendelian Randomization
GWAS: Genome-Wide Association Studies
IVW: Inverse-Variance Weighted
ER+: Estrogen Receptor-Positive
ER-: Estrogen Receptor-Negative
IVs: Instrumental Variables
SNPs: Single Nucleotide Polymorphisms
TCGA: The Cancer Genome Atlas
BRCA: Breast Cancer
LASSO: Least Absolute Shrinkage and Selection Operator
DEGs: Differentially Expressed Genes
OS: Overall Survival
CDF: Cumulative Distribution Function
ssGSEA: Single-sample Gene Set Enrichment Analysis
GSVA: Gene Set Variation Analysis
UC: Ulcerative Colitis
TOM: Topological Overlap Measure

Significance levels: **** (P < 0.0001), *** (P < 0.001), ** (P < 0.01), * (P < 0.05)

References

  1. Roda G, Chien Ng S, Kotze PG, et al. Crohn’s disease. Nat Rev Dis Primers. 2020;6(1):22.

  2. Sung H, Ferlay J, Siegel RL, et al. Global Cancer statistics 2020: GLOBOCAN estimates of incidence and Mortality Worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.

  3. Le Berre C, Ananthakrishnan AN, Danese S, Singh S, Peyrin-Biroulet L. Ulcerative Colitis and Crohn’s Disease have similar burden and goals for treatment. Clin Gastroenterol Hepatol. 2020;18(1):14–23.

  4. Donegan WL. Cancer of the breast in men. CA Cancer J Clin. 1991;41(6):339–54.

  5. Wilkinson L, Gathani T. Understanding breast cancer as a global health concern. Br J Radiol. 2022;95(1130):20211033.

  6. Burocziova M, Grusanovic S, Vanickova K, Kosanovic S, Alberich-Jorda M. Chronic inflammation promotes Cancer Progression as a second hit. Exp Hematol. 2023.

  7. Kim ES, Kim SY, Moon A. C-Reactive protein signaling pathways in Tumor Progression. Biomol Ther (Seoul). 2023;31(5):473–83.

  8. Pellino G, Sciaudone G, Patturelli M, et al. Relatives of Crohn’s disease patients and breast cancer: an overlooked condition. Int J Surg. 2014;12(Suppl 1):S156–158.

  9. Riegler G, Caserta L, Castiglione F, et al. Increased risk of breast cancer in first-degree relatives of Crohn’s disease patients. An IG-IBD study. Dig Liver Dis. 2006;38(1):18–23.

  10. Gao H, Zheng S, Yuan X, Xie J, Xu L. Causal association between inflammatory bowel disease and 32 site-specific extracolonic cancers: a mendelian randomization study. BMC Med. 2023;21(1):389.

  11. Gong C, Xu R, Zou P, Zhang Y, Wang X. Inflammatory bowel disease and risk of breast cancer: a meta-analysis of cohort studies. Eur J Cancer Prev. 2022;31(1):54–63.

  12. Lu Y, Ma L. Investigation of the causal relationship between breast cancer and autoimmune diseases: a bidirectional mendelian randomization study. Med (Baltim). 2023;102(34):e34612.

  13. Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol. 2013;37(7):658–65.

  14. Hemani G, Tilling K, Davey Smith G. Orienting the causal relationship between imprecisely measured traits using GWAS summary data. PLoS Genet. 2017;13(11):e1007081.

  15. Ritchie ME, Phipson B, Wu D, et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.

  16. Tang J, Kong D, Cui Q, et al. Prognostic genes of breast Cancer identified by Gene Co-expression Network Analysis. Front Oncol. 2018;8:374.

  17. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized Linear models via Coordinate Descent. J Stat Softw. 2010;33(1):1–22.

  18. Buergy D, Würschmidt F, Gkika E, et al. Stereotactic body radiotherapy of adrenal metastases-A dose-finding study. Int J Cancer. 2022;151(3):412–21.

  19. Xu Q, Chen S, Hu Y, Huang W. Landscape of Immune Microenvironment under Immune Cell infiltration pattern in breast Cancer. Front Immunol. 2021;12:711433.

  20. Liu TT, Li R, Huo C, et al. Identification of CDK2-Related Immune Forecast Model and ceRNA in Lung Adenocarcinoma, a Pan-cancer Analysis. Front Cell Dev Biol. 2021;9:682002.

  21. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S + to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77.

  22. Hu X, Ni S, Zhao K, Qian J, Duan Y. Bioinformatics-Led Discovery of Osteoarthritis biomarkers and inflammatory infiltrates. Front Immunol. 2022;13:871008.

  23. Zhao P, Zhen H, Zhao H, Huang Y, Cao B. Identification of hub genes and potential molecular mechanisms related to radiotherapy sensitivity in rectal cancer based on multiple datasets. J Transl Med. 2023;21(1):176.

  24. Wang L, Wang D, Yang L, et al. Cuproptosis related genes associated with Jab1 shapes tumor microenvironment and pharmacological profile in nasopharyngeal carcinoma. Front Immunol. 2022;13:989286.

  25. Bravi F, Decarli A, Russo AG. Risk factors for breast cancer in a cohort of mammographic screening program: a nested case-control study within the FRiCaM study. Cancer Med. 2018;7(5):2145–52.

  26. Søgaard KK, Cronin-Fenton DP, Pedersen L, Sørensen HT, Lash TL. Survival in Danish patients with breast cancer and inflammatory bowel disease: a nationwide cohort study. Inflamm Bowel Dis. 2008;14(4):519–25.

  27. Stenfelt L, Nilsson J, Hellberg Å, et al. Glycoproteomic and phenotypic elucidation of B4GALNT2 expression variants in the SID Histo-Blood Group System. Int J Mol Sci. 2022;23(7).

  28. Ke G, Cheng N, Sun H, Meng X, Xu L. Explore the impact of hypoxia-related genes (HRGs) in cutaneous melanoma. BMC Med Genomics. 2023;16(1):160.

  29. Yamada M, Watanabe Y, Gootenberg JS, et al. Crystal structure of the minimal Cas9 from Campylobacter jejuni reveals the Molecular Diversity in the CRISPR-Cas9 systems. Mol Cell. 2017;65(6):1109–e11211103.

  30. Salcher S, Hermann M, Kiechl-Kohlendorfer U, Ausserlechner MJ, Obexer P. C10ORF10/DEPP-mediated ROS accumulation is a critical modulator of FOXO3-induced autophagy. Mol Cancer. 2017;16(1):95.

  31. Deng J, Dong Y, Li C, et al. Decreased expression of C10orf10 and its prognostic significance in human breast cancer. PLoS ONE. 2014;9(6):e99730.

  32. Chen Y, Tang M, Li H, Huang J. Effects of C10orf10 on growth and prognosis of glioma under hypoxia. Zhong Nan Da Xue Xue Bao Yi Xue Ban. 2023;48(4):499–507.

  33. Shi L, Zhao T, Huang L, et al. Engineered FGF19(∆KLB) protects against intrahepatic cholestatic liver injury in ANIT-induced and Mdr2-/- mice model. BMC Biotechnol. 2023;23(1):43.

  34. Shi S, Zhang Q, Zhang K, et al. FGF19 promotes nasopharyngeal carcinoma progression by inducing angiogenesis via inhibiting TRIM21-mediated ANXA2 ubiquitination. Cell Oncol (Dordr). 2023.

  35. Chen T, Liu H, Liu Z, et al. FGF19 and FGFR4 promotes the progression of gallbladder carcinoma in an autocrine pathway dependent on GPBAR1-cAMP-EGR1 axis. Oncogene. 2021;40(30):4941–53.

  36. Xue P, Liu Y, Wang H, Huang J, Luo M. miRNA-103-3p-Hlf regulates apoptosis and autophagy by targeting hepatic leukaemia factor in heart failure. ESC Heart Fail. 2023;10(5):3038–45.

  37. Bao L, Kumar A, Zhu M, et al. SAP30 promotes breast tumor progression by bridging the transcriptional corepressor SIN3 complex and MLL1. J Clin Invest. 2023;133(17).

  38. Gutmann H, Hruz P, Zimmermann C, et al. Breast cancer resistance protein and P-glycoprotein expression in patients with newly diagnosed and therapy-refractory ulcerative colitis compared with healthy controls. Digestion. 2008;78(2–3):154–62.

  39. Englund G, Jacobson A, Rorsman F, Artursson P, Kindmark A, Rönnblom A. Efflux transporters in ulcerative colitis: decreased expression of BCRP (ABCG2) and pgp (ABCB1). Inflamm Bowel Dis. 2007;13(3):291–7.

  40. Kaplan GG, Windsor JW. The four epidemiological stages in the global evolution of inflammatory bowel disease. Nat Rev Gastroenterol Hepatol. 2021;18(1):56–66.

Acknowledgements

We gratefully acknowledge Figdraw, which was used to create the flowchart in this article.

Funding

This work was supported by the High-level Talent Introduction Project of Fujian Cancer Hospital (Grant/Award Number: F2328R-GC301-01) and the High-level Talent Training Program of Fujian Cancer Hospital (Grant/Award Number: 2024YNG03).

Author information

Contributions

YX and YYS contributed equally to this work. SCG and YX conceived the study. YYS and YX had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. YX and YXQ performed data analyses. YYS and HXW wrote the first draft of this manuscript. JZR, WQ and SCG critically revised the manuscript. All authors approved the final version of the manuscript.

Corresponding author

Correspondence to Chuangui Song.

Ethics declarations

Ethics approval and consent to participate

Because this study used data from publicly available datasets, neither ethics approval nor informed consent was required.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Yu, X., Yu, Y., Huang, X. et al. Unraveling the causal links and novel molecular classification of Crohn’s disease in breast Cancer: a two-sample mendelian randomization and transcriptome analysis with prognostic modeling. BMC Cancer 24, 1134 (2024). https://doi.org/10.1186/s12885-024-12838-x

  • DOI: https://doi.org/10.1186/s12885-024-12838-x

Keywords