Unraveling the causal links and novel molecular classification of Crohn’s disease in breast Cancer: a two-sample mendelian randomization and transcriptome analysis with prognostic modeling
BMC Cancer volume 24, Article number: 1134 (2024)
Abstract
Background
Crohn’s disease (CD), a prominent form of chronic gastrointestinal inflammation, and breast cancer (BC) appear to be unrelated conditions, yet both involve chronic inflammation and immune responses. This shared feature has attracted growing attention from investigators, but the relationship between the two diseases remains controversial.
Methods
We used two-sample Mendelian Randomization (MR) and transcriptomics to explore the relationship between CD and BC. MR assessed causality of CD on different BC subtypes and reverse causality of BC on CD. We identified CD-related differentially expressed genes and their prognostic impact on BC, and developed a new molecular BC classification based on these key genes.
Results
MR revealed a causal link between CD and increased BC risk, especially in estrogen receptor-positive (ER+) patients, but not in ER-negative (ER-) cases. BC showed no causal effect on CD. Transcriptomics pinpointed genes like B4GALNT2 and FGF19 that affected BC prognosis in CD patients. A nomogram based on these genes predicted BC outcomes with high accuracy. Using these genes, a new molecular classification of BC patients was proposed.
Conclusions
CD is a risk factor for ER+ BC but not for ER- BC. BC does not causally affect CD. Our prognostic model and new BC molecular classifications offer insights for personalized treatment strategies.
Background
Breast cancer (BC) and Crohn’s disease (CD) are distinct medical conditions: BC is a prevalent malignancy among women, and CD is a chronic inflammatory disease of the gastrointestinal tract [1, 2]. There is currently no consensus on the relationship between them, and this study aims to explore any underlying association. CD mainly affects the gastrointestinal tract, leading to inflammation, ulceration, and stenosis. Its exact cause remains uncertain, although genetics, environmental factors, and changes in gut microbiota are believed to play roles [3]. BC is a malignant tumor that originates in breast tissue and can also occur in men, although at a lower incidence [4]. BC is currently the most commonly diagnosed cancer in the world and poses a significant burden on healthcare systems and human health alike [5].
Although CD and BC arise in distinct physiological systems, interest in a potential connection between them is growing because both involve chronic inflammation and an associated immune response. Chronic inflammation has been identified as a potential catalyst for tumorigenesis in a variety of contexts [6, 7]. Some studies have reported that first-degree relatives of individuals with CD exhibit a higher incidence of BC [8, 9], which prompted us to investigate the potential interactions between CD and BC. However, studies on the effect of CD on BC are scarce and have yielded mixed conclusions [10,11,12]. Moreover, to our knowledge, no studies have examined the effect of CD on BC subtypes or the underlying mechanisms and immunologic effects. Investigating the relationship between CD and BC is expected to reveal common molecular pathways and risk factors that influence the development of both diseases. Such findings could open avenues for novel strategies in risk stratification, early detection, and personalized therapeutic interventions.
Our research focuses on these interactions, exploring shared mechanisms and clinical implications. We believe that understanding their connection is vital for better medical care. To minimize interference from other factors, we used a two-sample MR approach to analyze the link between CD and BC risk, including its different forms. Two-Sample MR uses genetic variants as instrumental variables (IVs) to infer causal relationships, effectively controlling for confounding factors in observational studies. The core principle is that if a genetic variant is associated with an outcome solely through its effect on an exposure, then this genetic variant can serve as an instrumental variable to estimate the causal effect of the exposure on the outcome [13]. We also conducted transcriptomic, molecular typing, and immunocorrelation analyses to deepen our understanding of their association. Through this comprehensive exploration, we aim to provide insights beneficial for future medical practices and expand knowledge on CD and BC.
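The instrumental-variable principle described above can be illustrated for a single genetic variant with the Wald ratio, a standard single-instrument MR estimator. The following is a minimal sketch, not the authors' pipeline: the function name and the numeric inputs are hypothetical, and the standard error uses a first-order approximation that ignores uncertainty in the SNP-exposure effect.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant causal estimate: SNP-outcome effect divided by SNP-exposure effect."""
    if beta_exposure == 0:
        raise ValueError("The variant must be associated with the exposure.")
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats the SNP-exposure effect as known without error.
    se = se_outcome / abs(beta_exposure)
    return estimate, se


# Hypothetical summary statistics, for illustration only.
est, se = wald_ratio(beta_exposure=0.20, beta_outcome=0.01, se_outcome=0.004)
odds_ratio = math.exp(est)  # log-odds estimate converted to an odds ratio
```

With several independent variants, per-variant ratios of this kind are typically combined by a weighted average, which is what the IVW method does.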
Methods
The research design is shown schematically in Fig. 1. The MR and transcriptome data sources are publicly accessible online.
Mendelian randomization analysis
We used genetic variants drawn from the GWAS repository (https://gwas.mrcieu.ac.uk) as IVs. We applied a two-sample MR technique, supplemented with bidirectional MR for robustness. To better understand the links between CD and BC subtypes, we separately assessed associations with ER+ and ER- breast cancers. Further details on these groupings are given in Table S1.
To ensure analytical rigor and mitigate confounding, single nucleotide polymorphisms (SNPs) were selected as IVs according to the following criteria: (1) independence among SNPs; (2) specific thresholds of P < 5e-08, r2 < 0.001, and a clumping distance exceeding 10,000 kb, to minimize linkage effects; (3) demonstrable instrument strength, as evidenced by F-statistics above 20; (4) screening with the PhenoScanner tool (http://www.phenoscanner.medschl.cam.ac.uk/) to verify that each selected SNP primarily influences the exposure, in this case Crohn’s disease, and not other traits that could confound the results. Analyses were executed using the ‘TwoSampleMR’ R package, with the inverse-variance weighted (IVW) method as the primary approach [14].
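The P-value and F-statistic filters and the IVW estimate can be sketched as follows. This is an illustrative Python version under stated assumptions, not the TwoSampleMR implementation: the per-SNP F-statistic is approximated as (beta/SE)^2, LD clumping and the PhenoScanner check are omitted because they need external reference data, and all SNP identifiers and values are hypothetical.

```python
import math


def f_statistic(beta_exposure, se_exposure):
    """Common per-SNP approximation of instrument strength: (beta / SE)^2."""
    return (beta_exposure / se_exposure) ** 2


def select_instruments(snps, p_threshold=5e-8, min_f=20.0):
    """Keep SNPs that pass the genome-wide P threshold and the F-statistic floor."""
    return [
        s for s in snps
        if s["p_exposure"] < p_threshold
        and f_statistic(s["beta_exposure"], s["se_exposure"]) > min_f
    ]


def ivw_fixed(snps):
    """Fixed-effect IVW: weighted regression through the origin of outcome on exposure effects."""
    num = sum(s["beta_exposure"] * s["beta_outcome"] / s["se_outcome"] ** 2 for s in snps)
    den = sum(s["beta_exposure"] ** 2 / s["se_outcome"] ** 2 for s in snps)
    return num / den, math.sqrt(1.0 / den)


# Hypothetical summary statistics, for illustration only.
snps = [
    {"rsid": "rs_a", "beta_exposure": 0.10, "se_exposure": 0.01, "p_exposure": 1e-20,
     "beta_outcome": 0.005, "se_outcome": 0.002},
    {"rsid": "rs_b", "beta_exposure": 0.20, "se_exposure": 0.02, "p_exposure": 1e-20,
     "beta_outcome": 0.010, "se_outcome": 0.002},
    {"rsid": "rs_c", "beta_exposure": 0.05, "se_exposure": 0.02, "p_exposure": 0.01,
     "beta_outcome": 0.020, "se_outcome": 0.002},  # weak instrument, filtered out
]
instruments = select_instruments(snps)
ivw_estimate, ivw_se = ivw_fixed(instruments)
```

In this toy input, the third SNP fails both filters, so the IVW estimate is computed from the two remaining instruments.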
Transcriptomic data analysis
Data Acquisition and differentially expressed genes (DEGs) analysis
Differential expression analysis of the CD expression profile dataset GSE69762 (N = 30), obtained from the GEO database (https://www.ncbi.nlm.nih.gov/), was performed with the ‘limma’ R package [15]. DEGs were identified with a significance threshold of P < 0.05 and a minimum fold change of 1.2. For The Cancer Genome Atlas (TCGA) (https://www.cancer.gov/) BC dataset (N = 1218), we conducted Principal Component Analysis (PCA) after z-score normalization and removed outlier samples, so that subsequent analyses were performed on a homogeneous, high-quality dataset. We then applied limma with a significance threshold of P < 0.05 and a minimum fold change of 1.5 to identify DEGs in BC. Finally, we assessed the overlap between the DEGs identified in CD and BC.
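The thresholding and overlap steps can be sketched as below. This is a hedged illustration rather than the limma workflow itself: it assumes each gene already has a log2 fold change and a P value, that the fold-change cutoff applies symmetrically to up- and downregulation, and it uses hypothetical gene names and values.

```python
import math


def select_degs(results, p_cutoff, min_fold_change):
    """Split genes into up- and downregulated sets.

    results maps gene -> (log2 fold change, P value).
    """
    threshold = math.log2(min_fold_change)
    up = {g for g, (lfc, p) in results.items() if p < p_cutoff and lfc >= threshold}
    down = {g for g, (lfc, p) in results.items() if p < p_cutoff and lfc <= -threshold}
    return up, down


# Hypothetical per-gene statistics, for illustration only.
cd_results = {"GENE_A": (0.5, 0.01), "GENE_B": (-0.4, 0.03),
              "GENE_C": (0.1, 0.001), "GENE_D": (1.0, 0.20)}
bc_results = {"GENE_A": (0.9, 0.001), "GENE_B": (-0.3, 0.01),
              "GENE_C": (2.0, 0.001)}

cd_up, cd_down = select_degs(cd_results, p_cutoff=0.05, min_fold_change=1.2)
bc_up, bc_down = select_degs(bc_results, p_cutoff=0.05, min_fold_change=1.5)

# Genes differentially expressed in both diseases, regardless of direction.
shared_degs = (cd_up | cd_down) & (bc_up | bc_down)
```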
Weighted Gene Co-expression Network Analysis (WGCNA) of differential genes
Using WGCNA, we analyzed the 557 DEGs shared between CD and BC. For analytical robustness, the Median Absolute Deviation (MAD) of every gene was computed first. Using the ‘goodSamplesGenes’ function of the WGCNA R package, we identified and excluded outlier genes and samples [16].
Enrichment analysis
Enrichment analysis was carried out on the 549 DEGs that remained after 8 genes were omitted due to ineffective clustering. For this analysis, we used Gene Ontology (GO) annotations and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotations.
Regression analysis and prognostic prediction model construction
From the TCGA BC dataset, we selected a cohort of 1034 patients after excluding those with incomplete clinical information or a follow-up time of zero. We used the ‘glmnet’ R package to fit a Cox proportional-hazards regression with the Least Absolute Shrinkage and Selection Operator penalty (Lasso-Cox) [17], with 10-fold cross-validation for model selection. We tested a range of lambda values and selected the one that minimized the cross-validation error, which balances model complexity against predictive accuracy and improves the generalizability of the model to new data. Based on the expression data and coefficient (coef) values of the 5 prognostically relevant genes, we constructed RiskScore = 0.0135*B4GALNT2 + 0.0184*C10orf10 - 0.0266*FGF19 + 0.0354*HLF + 0.0748*SAP30. The optimal cutoff value for the RiskScore was determined using the ‘maxstat’ R package [18]. We then used the ‘survival’ package to examine differences in overall survival (OS) [19]. We built a combined model from several clinical indicators, including survival duration, survival outcome, RiskScore, age, T stage, N stage, M stage, and other pertinent data, using the ‘rms’ R package [20]. To evaluate model accuracy, we performed Receiver Operating Characteristic (ROC) curve analysis using the ‘pROC’ package [21]. GSE20685 (N = 327), obtained from the GEO database (https://www.ncbi.nlm.nih.gov/), was used to validate the accuracy of the predictive model.
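The RiskScore is a linear combination of the five gene expression values weighted by their Lasso-Cox coefficients. A minimal sketch of that calculation is given below; the coefficients are those reported in the formula, while the function name and the example expression values are hypothetical, and the expression scale (for example, log-transformed or raw) is an assumption because it is not stated here.

```python
# Coefficients as reported in the RiskScore formula.
RISK_COEFFICIENTS = {
    "B4GALNT2": 0.0135,
    "C10orf10": 0.0184,
    "FGF19": -0.0266,  # negative coefficient: higher expression lowers the score
    "HLF": 0.0354,
    "SAP30": 0.0748,
}


def risk_score(expression):
    """Weighted sum of the five gene expression values for one patient."""
    missing = set(RISK_COEFFICIENTS) - set(expression)
    if missing:
        raise KeyError(f"Missing expression values for: {sorted(missing)}")
    return sum(coef * expression[gene] for gene, coef in RISK_COEFFICIENTS.items())


# Hypothetical expression profile, for illustration only.
example_patient = {"B4GALNT2": 10.0, "C10orf10": 10.0, "FGF19": 10.0,
                   "HLF": 10.0, "SAP30": 10.0}
example_score = risk_score(example_patient)
```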
Novel molecular classification of Crohn’s disease in breast cancer
The key genes for molecular classification were determined through Lasso-Cox analysis. The R package ‘ConsensusClusterPlus’ was used to perform consensus clustering on the TCGA-BRCA dataset [22]. The optimal k was determined using the k-means algorithm with Euclidean distances. We then examined the associations between subgroups and gene expression. Gene Set Variation Analysis (GSVA) was used to compare the variation in gene sets between groups, based on the R package ‘GSVA’ [23]. Additionally, we performed single-sample Gene Set Enrichment Analysis (ssGSEA) between subgroups using the R package ‘GSEABase’ [24]. CIBERSORT was employed to analyze immune cell infiltration. We used the Wilcoxon test for comparisons between two groups and the Kruskal-Wallis test for comparisons of more than two groups.
Results
Mendelian randomization results
Details of the SNPs included in each exposure and outcome group can be found in Table 1. The IVW analysis indicated a causal relationship between CD and BC, with CD identified as a risk factor for BC (OR = 1.0431, 95% CI: 1.0149–1.0721, P = 0.0025). Tests for genetic pleiotropy and heterogeneity were not statistically significant (P > 0.05), supporting the reliability and stability of our results. We also created SNP forest plots and conducted leave-one-out validation. These results confirm that CD is a risk factor for BC and that the causal relationship is not driven by a single SNP (Fig. 2A-B). All MR methods yielded estimates in the same positive direction (Fig. 2C), and the funnel plot showed good bilateral symmetry (Fig. 2D). In contrast, the reverse analysis of BC on CD did not establish a causal relationship (P > 0.05) (Figure S1A-B).
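As a consistency check, the reported odds ratio and confidence interval can be converted back to a log-odds estimate, a standard error, and a two-sided P value. The sketch below assumes a normal approximation and a symmetric 95% interval on the log scale; the function name is hypothetical.

```python
import math


def z_and_p_from_or(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover beta, SE, z and a two-sided normal P value from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    # Width of the CI on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return beta, se, z, p


# Values reported for the overall CD -> BC analysis.
beta, se, z, p = z_and_p_from_or(1.0431, 1.0149, 1.0721)
```

For these inputs the recovered P value is close to the reported P = 0.0025, which suggests the OR, interval, and P value are mutually consistent.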
To further investigate the impact of CD on BC subtypes, we analyzed ER+ and ER- BC separately. CD showed a causal relationship with ER+ BC (OR = 1.0213, 95% CI: 1.0026–1.0404, P = 0.0258). Because heterogeneity was detected (P < 0.05), a random-effects IVW analysis was used. Importantly, no evidence of genetic pleiotropy was found (P > 0.05). SNP forest plots and leave-one-out validation provided additional evidence that CD is a risk factor for ER+ BC and that the causal relationship is not driven by a single SNP (Fig. 2E-F). The scatter plot confirmed that all methods gave estimates in the same direction (Fig. 2G), and the funnel plot again showed good bilateral symmetry (Fig. 2H). However, no causal relationship between CD and ER- BC was established (P > 0.05) (Figure S1C-D).
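When heterogeneity is present, one common formulation is the multiplicative random-effects IVW model, which keeps the fixed-effect point estimate and inflates its standard error using Cochran's Q. The sketch below illustrates that formulation under stated assumptions; it is not necessarily the exact variant the authors ran, and the input values are hypothetical.

```python
import math


def ivw_random_effects(beta_exp, beta_out, se_out):
    """Multiplicative random-effects IVW from per-SNP summary statistics.

    Returns (estimate, random-effects SE, Cochran's Q).
    """
    if len(beta_exp) < 2:
        raise ValueError("At least two instruments are needed to assess heterogeneity.")
    weights = [bx ** 2 / s ** 2 for bx, s in zip(beta_exp, se_out)]
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    se_fixed = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations of per-SNP ratios from the pooled estimate.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    # Inflate the SE only when Q exceeds its degrees of freedom (overdispersion).
    inflation = math.sqrt(max(1.0, q / df))
    return estimate, se_fixed * inflation, q


# Hypothetical summary statistics with visible heterogeneity, for illustration only.
re_estimate, re_se, cochran_q = ivw_random_effects(
    beta_exp=[0.1, 0.1, 0.1],
    beta_out=[0.002, 0.005, 0.008],
    se_out=[0.001, 0.001, 0.001],
)
```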
Transcriptome analysis results
Differential expression analysis of the GSE69762 CD dataset using limma showed significant differential expression in the CD group compared with the control group, with 880 upregulated and 842 downregulated genes (Fig. 3A-B). We performed PCA on the BC dataset (Figure S1E). After removing outlier samples, we observed 3,134 significantly upregulated and 4,496 downregulated genes in BC relative to normal samples (Fig. 3C-D). Notably, 557 differentially expressed genes overlapped between CD and BC (Fig. 3E).
We conducted WGCNA on the 557 shared differentially expressed genes. First, we determined the optimal soft threshold to be 4 through analysis of scale independence and average connectivity (Fig. 4A-B). This soft threshold improved the quality of sample clustering (Figure S1F) and gene clustering (Fig. 4C). WGCNA revealed a total of 6 co-expression modules (Fig. 4D). The ‘grey’ module comprised 8 genes that could not be assigned to any other module. The ‘turquoise’ and ‘brown’ modules encompassed 274 differentially expressed genes highly correlated with BC (Fig. 4E). The correlation was further confirmed by scatterplots of Gene Significance (GS) against Module Membership (MM) for these genes (Fig. 4F-G). After omitting the 8 genes in the ‘grey’ module, we performed GO and KEGG enrichment analyses to gain insight into the biological roles and pathways of the remaining 549 differentially expressed genes. The top ten GO categories indicate strong enrichment in fundamental biological processes, notably immune regulation and cell signaling (Fig. 4H, S1G-H). These genes are also prominently associated with cellular components such as the cytoplasm and cell membrane, and with molecular functions including receptor binding, growth factor activation, and signal transduction. The KEGG analysis further indicates major involvement of these genes in several tumor immunity-related pathways (Fig. 4I), notably the TNF signaling pathway and the pathways associated with PD-L1 expression and the PD-1 checkpoint. The central role of these pathways in immune reactions, cellular signaling, and inflammatory responses underscores the importance of these genes in tumor immunity.
Nomogram model construction
By setting the lambda parameter to 0.0243 and combining the regression coefficients (coef values) with gene expression profiles, we derived a ‘RiskScore’ based on 5 genes (Fig. 5A-B). In the survival analysis, we set the optimal threshold at 1.0173 to categorize patients into high-risk and low-risk cohorts, which showed a pronounced difference in prognosis (P < 0.05) (Fig. 5C). FGF19 exhibited a protective influence, whereas B4GALNT2, C10orf10, HLF, and SAP30 emerged as risk factors (Fig. 5D). To examine the potential of the RiskScore for predicting prognosis in BC patients, we additionally considered clinical factors such as age and TNM staging in a Cox regression analysis. Age and RiskScore were both significantly associated with the prognosis of BC patients. We also illustrate the impact of the different T, N, and M subgroups on prognosis (Fig. 5E), which helps clarify the distinct roles of the staging categories in the prognosis of BC patients. Using the Cox method, we constructed a nomogram to assess the prognostic significance of these features in the cohort of 1034 samples. The overall C-index of the model was 0.78 (95% CI: 0.74–0.82, P < 0.001) (Fig. 5F). The calibration curve demonstrated good performance (Fig. 5G). The optimal cutoff value for the predictive model score was determined to be 0.3282. Based on this value, patients were stratified into high-risk (H) and low-risk (L) groups, and OS differed significantly between the two groups (HR = 4.52, 95% CI: 3.20–6.38, P < 0.001) (Fig. 5H). ROC curve analysis was used to verify the accuracy of the model (Fig. 5I).
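The dichotomization of patients by the RiskScore threshold can be sketched as follows. The cutoff is the value reported above; the handling of scores exactly equal to the cutoff is an assumption, and the patient identifiers and scores are hypothetical.

```python
from collections import Counter

RISK_SCORE_CUTOFF = 1.0173  # threshold reported for the RiskScore


def risk_group(score, cutoff=RISK_SCORE_CUTOFF):
    """Assign a patient to the high- or low-risk cohort."""
    # Assumption: only scores strictly above the cutoff count as high-risk.
    return "high" if score > cutoff else "low"


# Hypothetical RiskScore values, for illustration only.
scores = {"patient_1": 0.85, "patient_2": 1.20, "patient_3": 1.0173, "patient_4": 1.50}
groups = {patient: risk_group(score) for patient, score in scores.items()}
group_sizes = Counter(groups.values())
```

The resulting group labels are what a survival comparison, such as a Kaplan-Meier analysis with a log-rank test, would then be run on.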
To validate our prognostic prediction model, we used the GSE20685 dataset as an external validation cohort. The optimal cutoff value for the predictive model score, determined from the training dataset, was 0.3282. Using this cutoff, we classified 327 breast cancer patients into high-risk (H) and low-risk (L) groups. A significant difference in survival was observed between the two groups (P < 0.001) (Fig. 5J). To further verify the accuracy of the survival predictions in the validation cohort, we performed ROC curve analysis (Fig. 5K-L), demonstrating the model’s robustness and reliability in distinguishing between high-risk and low-risk patients.
Novel molecular classification of Crohn’s disease in breast cancer
In our endeavor to uncover a new molecular categorization of BC informed by prognostic gene expression, we undertook unsupervised class discovery for 1034 BC patients from the TCGA database. We ascertained the ideal cluster count through alterations in the cumulative distribution function (CDF) curve area and through consensus heatmaps (Figures S1I-J), and it was identified as 3 clusters (k = 3) (Fig. 6A). The heatmap reveals strong consensus within these groups, as indicated by the blue blocks along the diagonal. Subsequently, we categorized all 1034 patients into three subgroups: CD-BC1 (367 cases, 35.5% of the total), CD-BC2 (239 cases, 23.1% of the total), and CD-BC3 (428 cases, 41.4% of the total). We observed substantial differences in gene expression within the new BC clustering modules, in the genes B4GALNT2, C10orf10, FGF19, HLF, and SAP30 (Fig. 6B). In the analyzed data, the gene B4GALNT2 stood out for its significant high expression in the CD-BC2 group. This marked upregulation suggests the potential of B4GALNT2 as a marker gene for diagnostic subgroups, especially for distinguishing the CD-BC2 group from others. Concurrently, both HLF and C10orf10 exhibited marked down-regulation in the CD-BC1 subgroup, while showing notable up-regulation in the CD-BC3 cohort. The trend for SAP30 was observed to be the inverse. Furthermore, immunoinfiltration analysis indicated marked differences in tumor immune infiltrating cells between different modules (Fig. 6C). The infiltration level of several immune cells, like activated B cells, activated CD4+ T cells, MDSC (Myeloid-Derived Suppressor Cells), and macrophages, displays significant variations among the clusters. Lastly, through ssGSEA, we identified significant differences in the enrichment of signaling pathways among these three novel molecular subgroups (Fig. 6D-F). The color gradients, from blue (low enrichment) to red (high enrichment), illustrate the pathway enrichment intensities. 
Compared to the other two groups, the CD-BC2 group distinctly exhibited upregulation of the ‘Basal Cell Carcinoma’ and ‘Hedgehog Signaling Pathway’ pathways. This marked increase in pathway activity may offer insights into the pathogenic mechanism and biological behavior of breast cancer in this group. These findings underscore the clinical significance of our new molecular classification in terms of functional enrichment and immune characteristics, offering valuable insights for the treatment and management of BC patients.
Discussion
Inflammation and immune-related aspects have long been pivotal in cancer research. CD is an autoimmune disorder characterized by an exaggerated immune response against the normal intestinal flora, causing damage to the intestinal mucosal tissues, intestinal wall thinning, and the formation of ulcers and fistulas [1]. BC is a serious and heterogeneous disease characterized by the excessive proliferation of breast epithelial cells, leading to high morbidity [25]. Despite their significance, there remains a need for comprehensive investigations into the pathogenesis of both CD and BC, particularly in relation to their potential interconnection. To address potential confounding and reverse causality, we conducted an MR analysis to explore the existence of a causal link between CD and BC. For a deeper investigation into the underlying mechanisms and impacts, we developed and validated a genetic prognostic model through gene transcriptome analysis. Furthermore, we introduced an innovative molecular classification specific to BC. We also examined its influence on the tumor immune microenvironment in BC.
The results support a causal relationship between CD and the development of BC, identifying CD as a risk factor for BC. We further examined the effect of CD on BC subtypes and found that the causal relationship held for ER+ BC but not for ER- BC, which suggests that CD has a greater impact on the risk of ER+ BC. A previous meta-analysis of cohort studies, however, did not confirm a significant correlation between CD and an increased risk of BC [11], and another published study also failed to confirm a causal relationship [12]. In contrast, a more recent study confirmed that CD is a risk factor for the development of BC [10]. This finding aligns with the overall trend observed in our study; discrepancies may arise from differences in parameter settings, such as the kb > 5000 setting used by the authors of that study. Moreover, studies have shown that patients with CD and their relatives have a high prevalence of BC, which tends to be more advanced when detected [8, 9]. BC patients with CD also tend to present at a later stage and have a worse prognosis than BC patients without CD [26]. This suggests that patients with CD, especially premenopausal women, and their first-degree family members may need active monitoring of gastrointestinal and breast health, as well as active treatment of CD.
It is important to emphasize that causal inference is a complex task that requires the integration of several factors and in-depth research at multiple levels, including genetic, molecular, and clinical investigations [13]. While our study has uncovered a potential causal relationship between CD and BC, further experimental and clinical investigations are needed to validate this finding and to explore the underlying biological mechanisms. In this context, we analyzed transcriptome-level gene correlations between CD and BC. We used a comprehensive approach, including WGCNA and GO and KEGG annotation, to examine the 557 DEGs common to CD and BC. Our results indicate a high correlation between BC incidence and 274 genes within the ‘turquoise’ and ‘brown’ modules. Furthermore, some genes within these modules correlate with patients’ OS. LASSO regression analysis was instrumental in identifying prognostically relevant hub genes, namely B4GALNT2, C10orf10, FGF19, HLF, and SAP30. Based on the coef values and gene expression data, we devised a RiskScore and a nomogram to predict patient prognosis. Notably, FGF19 is a protective factor for prognosis, showing a downregulation trend as the risk score increases, while B4GALNT2, C10orf10, HLF, and SAP30 are prognostic risk factors.
B4GALNT2 is crucial for glycoprotein synthesis and is significant in the SID blood group system and in melanoma prognosis [27, 28]. Research has examined the use of Hsp1-Hsp2Cas9-Y to knock out B4GALNT2 in porcine fetal fibroblasts (PFF) [29], which could potentially benefit CD-related BC prognosis. C10orf10 plays a role in cell cycle regulation and DNA repair [30]. Lower expression levels correlate with worse BC outcomes [31], and higher levels are linked to glioma proliferation [32]. These findings suggest that C10orf10 is relevant not only to gliomas but also to the prognosis of CD-related BC. FGF19 is part of the fibroblast growth factor family, which is vital for cellular growth and tissue repair [33]. Increased FGF19 levels have been found in nasopharyngeal carcinoma (NPC) patients, suggesting a role in disease progression [34], and FGF19 is also associated with gallbladder cancer [35]. This differs from its role as a protective prognostic factor in this study, so further validation of the role of FGF19 in BC is warranted. HLF is a transcription factor, a type of protein that plays a crucial role in regulating gene expression, and some findings indicate that HLF regulates apoptosis and autophagy [36]. SAP30 proteins typically regulate gene expression: they interact with Sin3A proteins to form the Sin3 complex, which affects gene silencing and expression [37]. Our study found that SAP30 was significantly differentially expressed across the novel BC molecular classification and was associated with BC prognosis. Although the exact mechanism of SAP30 is not fully understood, it has been found to be involved in the regulation of gene expression and chromatin structure in other studies. Studies have demonstrated that SAP30 is associated with poor BC prognosis, consistent with the findings in this paper [37]. This study indicates that the RiskScore derived from DEGs associated with CD can effectively stratify BC patients into high-risk (H) and low-risk (L) groups.
Kaplan-Meier survival curves reveal a significant difference in OS between these two groups. Thus, the RiskScore serves as a robust predictor of clinical endpoints. Overall, our findings indicate the potential utility of this molecular risk assessment in optimizing patient care and prognosis for those with both CD and BC. Nonetheless, further studies and clinical validation are necessary to establish the clinical applicability of these findings.
This study conducted a novel molecular classification of 1034 BC patients based on prognostic genes associated with CD. The patients were categorized into three subgroups, namely CD-BC1, CD-BC2, and CD-BC3. Significant differences in tumor immune infiltration and enrichment of signaling pathways were observed among these subgroups. These findings provide a plausible explanation for the differing prognostic outcomes in these three subgroups. As discussed above, the immune microenvironment and inflammatory response closely link CD and BC, two seemingly unrelated diseases. It has been shown that the systemic inflammatory response in patients with ulcerative colitis (UC) leads to the down-regulation of BC resistance protein (BCRP/ABCG2), which in turn induces breast carcinogenesis [38, 39]. CD and UC both belong to the group of inflammatory bowel diseases [40]. This may therefore provide theoretical support for the mechanism proposed in this study, although it needs to be verified by relevant basic research. The GO and KEGG enrichment analyses of the differential genes in this study likewise identified numerous immune- and inflammation-related pathways.
Conclusions
In our study, we established that CD is a significant risk factor for BC development, especially in estrogen receptor-positive (ER+) BC cases. However, the association between CD and BC was not significant in estrogen receptor-negative (ER-) BC cases. Our reverse analysis indicated that BC does not causally affect CD. Regarding prognosis, we identified five genes: B4GALNT2, C10orf10, FGF19, HLF, and SAP30. We developed a prognostic model presented as a nomogram, with strong predictive value, offering valuable insights into patient prognosis and individualized treatment. Furthermore, based on these genes, we classified BC patients into three new groups characterized by differences in pathway enrichment and significant variations in immune profiles. To conclude, our study sheds light on the complex relationship between CD and BC, with specific relevance to certain BC subtypes. These findings have the potential to guide future disease prevention, treatment, and personalized medicine efforts.
Data availability
No datasets were generated or analysed during the current study.
Abbreviations
- CD: Crohn’s Disease
- BC: Breast Cancer
- MR: Mendelian Randomization
- GWAS: Genome Wide Association Studies
- IVW: Inverse-variance Weighted
- ER+: Estrogen Receptor-Positive
- ER-: Estrogen Receptor-Negative
- IVs: Instrumental Variables
- SNPs: Single Nucleotide Polymorphisms
- TCGA: The Cancer Genome Atlas
- BRCA: Breast Cancer
- LASSO: Least Absolute Shrinkage and Selection Operator
- DEGs: Differentially Expressed Genes
- OS: Overall Survival
- CDF: Cumulative Distribution Function
- ssGSEA: Single-sample Gene Set Enrichment Analysis
- GSVA: Gene Set Variation Analysis
- UC: Ulcerative Colitis
- TOM: Topological Overlap Measure
- Significance levels: **** (P < 0.0001), *** (P < 0.001), ** (P < 0.01), * (P < 0.05)
References
Roda G, Chien Ng S, Kotze PG, et al. Crohn’s disease. Nat Rev Dis Primers. 2020;6(1):22.
Sung H, Ferlay J, Siegel RL, et al. Global Cancer statistics 2020: GLOBOCAN estimates of incidence and Mortality Worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.
Le Berre C, Ananthakrishnan AN, Danese S, Singh S, Peyrin-Biroulet L. Ulcerative Colitis and Crohn’s Disease have similar burden and goals for treatment. Clin Gastroenterol Hepatol. 2020;18(1):14–23.
Donegan WL. Cancer of the breast in men. CA Cancer J Clin. 1991;41(6):339–54.
Wilkinson L, Gathani T. Understanding breast cancer as a global health concern. Br J Radiol. 2022;95(1130):20211033.
Burocziova M, Grusanovic S, Vanickova K, Kosanovic S, Alberich-Jorda M. Chronic inflammation promotes Cancer Progression as a second hit. Exp Hematol 2023.
Kim ES, Kim SY, Moon A. C-Reactive protein signaling pathways in Tumor Progression. Biomol Ther (Seoul). 2023;31(5):473–83.
Pellino G, Sciaudone G, Patturelli M, et al. Relatives of Crohn’s disease patients and breast cancer: an overlooked condition. Int J Surg. 2014;12(Suppl 1):S156–158.
Riegler G, Caserta L, Castiglione F, et al. Increased risk of breast cancer in first-degree relatives of Crohn’s disease patients. An IG-IBD study. Dig Liver Dis. 2006;38(1):18–23.
Gao H, Zheng S, Yuan X, Xie J, Xu L. Causal association between inflammatory bowel disease and 32 site-specific extracolonic cancers: a mendelian randomization study. BMC Med. 2023;21(1):389.
Gong C, Xu R, Zou P, Zhang Y, Wang X. Inflammatory bowel disease and risk of breast cancer: a meta-analysis of cohort studies. Eur J Cancer Prev. 2022;31(1):54–63.
Lu Y, Ma L. Investigation of the causal relationship between breast cancer and autoimmune diseases: a bidirectional mendelian randomization study. Med (Baltim). 2023;102(34):e34612.
Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol. 2013;37(7):658–65.
Hemani G, Tilling K, Davey Smith G. Orienting the causal relationship between imprecisely measured traits using GWAS summary data. PLoS Genet. 2017;13(11):e1007081.
Ritchie ME, Phipson B, Wu D, et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.
Tang J, Kong D, Cui Q, et al. Prognostic genes of breast Cancer identified by Gene Co-expression Network Analysis. Front Oncol. 2018;8:374.
Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized Linear models via Coordinate Descent. J Stat Softw. 2010;33(1):1–22.
Buergy D, Würschmidt F, Gkika E, et al. Stereotactic body radiotherapy of adrenal metastases-A dose-finding study. Int J Cancer. 2022;151(3):412–21.
Xu Q, Chen S, Hu Y, Huang W. Landscape of Immune Microenvironment under Immune Cell infiltration pattern in breast Cancer. Front Immunol. 2021;12:711433.
Liu TT, Li R, Huo C, et al. Identification of CDK2-Related Immune Forecast Model and ceRNA in Lung Adenocarcinoma, a Pan-cancer Analysis. Front Cell Dev Biol. 2021;9:682002.
Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77.
Hu X, Ni S, Zhao K, Qian J, Duan Y. Bioinformatics-Led Discovery of Osteoarthritis biomarkers and inflammatory infiltrates. Front Immunol. 2022;13:871008.
Zhao P, Zhen H, Zhao H, Huang Y, Cao B. Identification of hub genes and potential molecular mechanisms related to radiotherapy sensitivity in rectal cancer based on multiple datasets. J Transl Med. 2023;21(1):176.
Wang L, Wang D, Yang L, et al. Cuproptosis related genes associated with Jab1 shapes tumor microenvironment and pharmacological profile in nasopharyngeal carcinoma. Front Immunol. 2022;13:989286.
Bravi F, Decarli A, Russo AG. Risk factors for breast cancer in a cohort of mammographic screening program: a nested case-control study within the FRiCaM study. Cancer Med. 2018;7(5):2145–52.
Søgaard KK, Cronin-Fenton DP, Pedersen L, Sørensen HT, Lash TL. Survival in Danish patients with breast cancer and inflammatory bowel disease: a nationwide cohort study. Inflamm Bowel Dis. 2008;14(4):519–25.
Stenfelt L, Nilsson J, Hellberg Å, et al. Glycoproteomic and phenotypic elucidation of B4GALNT2 expression variants in the SID Histo-Blood Group System. Int J Mol Sci. 2022;23(7).
Ke G, Cheng N, Sun H, Meng X, Xu L. Explore the impact of hypoxia-related genes (HRGs) in cutaneous melanoma. BMC Med Genomics. 2023;16(1):160.
Yamada M, Watanabe Y, Gootenberg JS, et al. Crystal structure of the minimal Cas9 from Campylobacter jejuni reveals the Molecular Diversity in the CRISPR-Cas9 systems. Mol Cell. 2017;65(6):1109–1121.e3.
Salcher S, Hermann M, Kiechl-Kohlendorfer U, Ausserlechner MJ, Obexer P. C10ORF10/DEPP-mediated ROS accumulation is a critical modulator of FOXO3-induced autophagy. Mol Cancer. 2017;16(1):95.
Deng J, Dong Y, Li C, et al. Decreased expression of C10orf10 and its prognostic significance in human breast cancer. PLoS ONE. 2014;9(6):e99730.
Chen Y, Tang M, Li H, Huang J. Effects of C10orf10 on growth and prognosis of glioma under hypoxia. Zhong Nan Da Xue Xue Bao Yi Xue Ban. 2023;48(4):499–507.
Shi L, Zhao T, Huang L, et al. Engineered FGF19(∆KLB) protects against intrahepatic cholestatic liver injury in ANIT-induced and Mdr2-/- mice model. BMC Biotechnol. 2023;23(1):43.
Shi S, Zhang Q, Zhang K, et al. FGF19 promotes nasopharyngeal carcinoma progression by inducing angiogenesis via inhibiting TRIM21-mediated ANXA2 ubiquitination. Cell Oncol (Dordr). 2023.
Chen T, Liu H, Liu Z, et al. FGF19 and FGFR4 promotes the progression of gallbladder carcinoma in an autocrine pathway dependent on GPBAR1-cAMP-EGR1 axis. Oncogene. 2021;40(30):4941–53.
Xue P, Liu Y, Wang H, Huang J, Luo M. miRNA-103-3p-Hlf regulates apoptosis and autophagy by targeting hepatic leukaemia factor in heart failure. ESC Heart Fail. 2023;10(5):3038–45.
Bao L, Kumar A, Zhu M, et al. SAP30 promotes breast tumor progression by bridging the transcriptional corepressor SIN3 complex and MLL1. J Clin Invest. 2023;133(17).
Gutmann H, Hruz P, Zimmermann C, et al. Breast cancer resistance protein and P-glycoprotein expression in patients with newly diagnosed and therapy-refractory ulcerative colitis compared with healthy controls. Digestion. 2008;78(2–3):154–62.
Englund G, Jacobson A, Rorsman F, Artursson P, Kindmark A, Rönnblom A. Efflux transporters in ulcerative colitis: decreased expression of BCRP (ABCG2) and Pgp (ABCB1). Inflamm Bowel Dis. 2007;13(3):291–7.
Kaplan GG, Windsor JW. The four epidemiological stages in the global evolution of inflammatory bowel disease. Nat Rev Gastroenterol Hepatol. 2021;18(1):56–66.
Acknowledgements
We gratefully acknowledge Figdraw, which was used to create the flowchart in this article.
Funding
This work was supported by the High-level Talent Introduction Project of Fujian Cancer Hospital (Grant/Award Number: F2328R-GC301-01) and the High-level Talent Training Program of Fujian Cancer Hospital (Grant/Award Number: 2024YNG03).
Author information
Contributions
YX and YYS contributed equally to this work. SCG and YX conceived the study. YYS and YX had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. YX and YXQ performed data analyses. YYS and HXW wrote the first draft of this manuscript. JZR, WQ and SCG critically revised the manuscript. All authors approved the final version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
Because this study used only publicly available datasets, neither ethics approval nor informed consent was required.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yu, X., Yu, Y., Huang, X. et al. Unraveling the causal links and novel molecular classification of Crohn’s disease in breast Cancer: a two-sample mendelian randomization and transcriptome analysis with prognostic modeling. BMC Cancer 24, 1134 (2024). https://doi.org/10.1186/s12885-024-12838-x
DOI: https://doi.org/10.1186/s12885-024-12838-x