Using machine learning to identify gene interaction networks associated with breast cancer

Liu, Liyuan; Zhai, Wenli; Wang, Fei; Yu, Lixiang; Zhou, Fei; Xiang, Yujuan; Huang, Shuya; Zheng, Chao; Yuan, Zhongshang; He, Yong; Yu, Zhigang; Ji, Jiadong

doi:10.1186/s12885-022-10170-w

Research
Open access
Published: 17 October 2022

BMC Cancer volume 22, Article number: 1070 (2022)

Abstract

Background

Breast cancer (BC) is one of the most prevalent cancers worldwide but its etiology remains unclear. Obesity is recognized as a risk factor for BC, and many obesity-related genes may be involved in its occurrence and development. Research assessing the complex genetic mechanisms of BC should not only consider the effect of a single gene on the disease, but also focus on the interaction between genes. This study sought to construct a gene interaction network to identify potential pathogenic BC genes.

Methods

The study included 953 BC patients and 963 control individuals. Chi-square analysis was used to assess the correlation between demographic characteristics and BC. The joint density-based non-parametric differential interaction network analysis and classification (JDINAC) method was used to build a BC gene interaction network from single nucleotide polymorphisms (SNPs). The odds ratio (OR) and 95% confidence interval (95% CI) of hub gene SNPs were evaluated using a logistic regression model. To assess reliability, the hub genes were tested for differential expression with the edgeR program using BC RNA-seq data from The Cancer Genome Atlas (TCGA), and identical edges were verified by logistic regression using UK Biobank datasets. GO and KEGG enrichment analyses were used to explore the biological functions of the interacting genes.

Results

Body mass index (BMI) and menopause are important risk factors for BC. After adjusting for potential confounding factors, the BC gene interaction network was identified using JDINAC. LEP, LEPR, XRCC6, and RETN were identified as hub genes and both hub genes and edges were verified. LEPR genetic polymorphisms (rs1137101 and rs4655555) were also significantly associated with BC. Enrichment analysis showed that the identified genes were mainly involved in energy regulation and fat-related signaling pathways.

Conclusion

We explored the interaction network of genes derived from SNP data in BC progression. Gene interaction networks provide new insight into the underlying mechanisms of BC.

Background

The International Agency for Research on Cancer (IARC) of the World Health Organization (WHO) reported that the most prominent change in global cancer data in 2020 was a rapid increase in breast cancer (BC) incidence. BC has replaced lung cancer as the most common cancer worldwide [1]. The mortality rate of female BC is particularly high in transitional versus developed countries [2]. Obesity is a recognized risk factor for many cancers [3, 4]. In obese women, several mechanisms are associated with the development of cancer: higher estrogen levels resulting from aromatization in adipose tissue; increased production of inflammatory cytokines such as tumor necrosis factor α, interleukin-6, and prostaglandin E2; insulin resistance and overactivation of insulin-like growth factor signaling; adipokine production; and oxidative stress [5]. Structural variants of genes associated with BC and obesity, including LEP, LEPR, PON1, FTO, and MC4R, are associated with a higher or lower risk of BC [5].

Genome-wide association studies (GWAS) have linked many single nucleotide polymorphisms (SNPs) with BC occurrence [6,7,8,9]. Our previous studies proposed potential relationships between sequence variations of individual genes and BC. In a study of 11 SNPs of PTPN1, rs3787345, rs718050, rs3215684, and rs718049 were associated with reduced BC risk [10]. Several studies have identified the genomic region of PTPN1 as a quantitative trait locus (QTL) in obesity and diabetes mellitus [11,12,13]. XRCC5 and XRCC6 SNP genotyping revealed that XRCC5 rs16855458 was associated with BC and XRCC6 rs2267437 was associated with ER-/PR- BC risk, with possible interactions with environmental factors [14]. However, current research has largely focused on the impact of a single SNP on disease, and potential SNP-SNP interactions remain less well studied. Most diseases, including cancers, follow a polygenic model, indicating that they may involve multiple genes or SNPs [9], yet little is known about how these genes or SNPs interact. Understanding these interactions will help to characterize the biological mechanisms underlying BC risk.

Differential network analysis provides information about how genes interact. Recent studies suggest that cancer occurrence and development are caused not only by gene mutations but also by abnormal gene regulation [15]. Thus, it is important to assess the impact of both single genes and gene–gene interactions on cancer onset and progression. Network analysis can effectively capture gene–gene interactions, and genetic data can be used to establish gene regulation networks that characterize the biological mechanisms of disease [16]. A recent study analyzed genetic and clinical data from gastric cancer patients using weighted gene co-expression network analysis (WGCNA) to explore new prognostic markers and therapeutic targets [17]. Jubair et al. proposed a network-based method that integrates a protein–protein interaction network with gene expression data to identify biomarkers for different BC subtypes and predict patients' survivability [18]. Another study constructed multi-omics markers associated with BC using high-dimensional embedding and a residual neural network [19]. To date, network analysis has relied on DNA methylation and RNA-seq data [17,18,19,20]. Meanwhile, combinations of functionally related SNPs may affect genes in a synergistic manner, thereby increasing BC risk [21, 22]. Network analysis using SNP data can therefore provide insights into the mechanisms of disease.

In this study, the joint density-based non-parametric differential interaction network analysis and classification (JDINAC) method [23] was used to identify the differential gene interaction network between the BC and healthy control groups. Unlike previous studies, the gene interaction network was built from SNP data, providing new insight into potential pathogenic BC genes.

Methods

Participants

The study population has been described previously [10]. In brief, a hospital-based case–control design was used, including patients diagnosed with BC by pathology between April 2012 and April 2013 at the Second Hospital of Shandong University and 21 collaborating hospitals. Non-BC patients were selected as controls using 1:1 matching on age group (±3 years), hospital, and treatment time period (within 2 months). The subjects were 25 to 70 years of age. Patients with clinical or pathological diagnoses of recurrence or metastasis, or with other malignant tumors, were excluded. The selection of cases and controls was carried out in strict accordance with project research design standards.

Data collection

The data used for this study were obtained from the dataset of a key clinical discipline project of hospitals administered by the Ministry of Health of the People's Republic of China [24]. Data were collected through a face-to-face interview and through clinical breast and imaging examinations. The interview included questions relating to demographics, physiology, reproductive factors, chronic disease, and family history. Height, weight, and hip and waist circumference were also measured, and body mass index (BMI) and the waist-hip ratio (WHR) were calculated. Clinical examination results were also collected, including visual examination, palpation, and related diagnostic tests such as breast ultrasound, mammography, and blood testing. Blood samples were collected using an EDTA vacuum collector.

RNA-seq expression and clinical data from BC patients, including 112 tumor tissue samples and matched normal tissue samples, were downloaded from The Cancer Genome Atlas (TCGA; https://cancergenome.nih.gov/). SNP data from 4,030 and 3,494 women with and without BC, respectively, were screened using UK Biobank BC data [25]. These data were used as validation datasets.

Genotyping and laboratory methods

The blood samples, consisting of fasting venous whole blood, were collected into EDTA anticoagulant tubes. These were placed fully upside-down in a 4 °C refrigerator and, after sedimentation, placed vertically in a −80 °C refrigerator. DNA was extracted using the Wizard Genomic DNA Purification Kit (A1120, Promega) and genotyped using the Sequenom MassARRAY SNP system (CapitalBio Technology, Beijing, China).

Statistical analysis

Differential network analysis using JDINAC method

A Chi-square test was used to analyze differences in demographic and BC-related factors between the case and control groups. BMI in cases and controls is reported as mean ± standard deviation. First, the 101 SNPs were matched to their respective genes, and for each sample the mean SNP value of each gene was calculated as that gene's score. The differential gene interaction network was then obtained using the JDINAC method. The odds ratio (OR) and 95% confidence interval (95% CI) were also estimated for hub gene polymorphisms in the differential gene interaction network. Significance was defined as a p-value < 0.05. All data were statistically analyzed using R x64 4.1.0.
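The gene-level averaging step can be illustrated with a minimal sketch. It assumes that each SNP is coded as an allele count (0, 1 or 2); the paper does not state the coding, and the genotype values, the sample identifiers and the helper name `gene_scores` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def gene_scores(genotypes, snp_to_gene):
    """Average the SNP codes that map to each gene, separately for every sample.

    genotypes:   dict sample_id -> dict snp_id -> allele count (0, 1, 2) or None if missing
    snp_to_gene: dict snp_id -> gene symbol
    """
    scores = {}
    for sample, calls in genotypes.items():
        per_gene = defaultdict(list)
        for snp, value in calls.items():
            gene = snp_to_gene.get(snp)
            # Skip SNPs without a gene annotation and missing genotype calls.
            if gene is not None and value is not None:
                per_gene[gene].append(value)
        scores[sample] = {gene: mean(values) for gene, values in per_gene.items()}
    return scores


# Hypothetical genotypes for two samples (assumed 0/1/2 coding).
snp_to_gene = {"rs1137101": "LEPR", "rs4655555": "LEPR", "rs2267437": "XRCC6"}
genotypes = {
    "s1": {"rs1137101": 2, "rs4655555": 1, "rs2267437": 0},
    "s2": {"rs1137101": 0, "rs4655555": None, "rs2267437": 1},
}
scores = gene_scores(genotypes, snp_to_gene)
```

In this sketch a missing genotype is simply left out of the average; a real analysis would need an explicit missing-data policy.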

The JDINAC method assumes that the network-level difference between BC patients and healthy controls results from the collective effect of differential pairwise gene–gene interactions, which are characterized by the conditional joint density of two genes [23]. Formally, let Y_l (l = 1,2,…,n) be the binary response of the lth subject, with Y_l = 1 if the subject has BC and Y_l = 0 otherwise. Pr is the probability that a subject has BC, i.e., Pr = P(Y_l = 1), and S_i is the risk score of the ith gene. The JDINAC model, based on logistic regression, is then:

$$\text{logit}(\text{Pr})=\alpha_{0}+\sum_{t=1}^{T}\alpha_{t}Z_{t}+\sum_{i=1}^{p}\sum_{j>i}^{p}\beta_{ij}\ln\frac{f_{ij}^{1}\left(S_{i},S_{j}\right)}{f_{ij}^{0}\left(S_{i},S_{j}\right)},\quad \text{s.t.}\ \sum_{i=1}^{p}\sum_{j>i}^{p}\left|\beta_{ij}\right|\le c,\ c>0,$$

(1)

Here Z_t (t = 1,…,T) denotes covariates such as BMI and age, and p is the number of genes. $f_{ij}^{k}$ $(k=0,1)$ denotes the conditional joint density of S_i and S_j in group k, i.e.,

$$\left(\left({S}_{i},{S}_{j}\right)\left|Y=1\right.\right)\sim {f}_{ij}^{1}$$

(2)

and

$$\left(\left({S}_{i},{S}_{j}\right)\left|Y=0\right.\right)\sim {f}_{ij}^{0}$$

(3)

which represents the strength of the interaction between S_i and S_j in group k [23]. A nonzero β_ij indicates that the dependency between S_i and S_j differs between the two groups.
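The log density ratio that enters the model as a feature for each gene pair can be sketched as follows. This is only an illustration under stated assumptions: a Gaussian product kernel with a fixed, hand-picked bandwidth stands in for the non-parametric density estimator, the small constant `eps` is added purely for numerical stability, and the toy score pairs are hypothetical.

```python
import math


def kde2d(points, bandwidth):
    """Return a function that evaluates a 2-D Gaussian kernel density estimate."""
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            sq_dist = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return norm * total

    return density


def log_density_ratio(cases, controls, bandwidth=0.5, eps=1e-12):
    """Build the feature ln(f1(s_i, s_j) / f0(s_i, s_j)) for one gene pair."""
    f1 = kde2d(cases, bandwidth)      # density estimated from the BC group (Y = 1)
    f0 = kde2d(controls, bandwidth)   # density estimated from the control group (Y = 0)

    def feature(x, y):
        return math.log((f1(x, y) + eps) / (f0(x, y) + eps))

    return feature


# Hypothetical (S_i, S_j) score pairs for one gene pair.
cases = [(1.5, 1.5), (1.4, 1.6), (1.6, 1.3)]
controls = [(0.5, 0.5), (0.4, 0.6), (0.6, 0.3)]
feature = log_density_ratio(cases, controls)
```

A positive feature value means the pair of scores is more typical of the BC group, and a negative value means it is more typical of the control group. To avoid overfitting, the densities should be estimated on a different part of the data from the one used to fit the regression.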

JDINAC adopts a multiple random-split algorithm to improve the accuracy and robustness of the results. A Lasso penalty was added to the logistic regression to estimate the coefficients β_ij, and cross-validation was used to determine the best penalty parameter. The importance score for each pair $(S_i,S_j)$ was obtained by the following formula:

$${\omega }_{ij}=\sum_{t=1}^{T}I\left({\widehat{\beta }}_{ij,t}\ne 0\right), i,j=1,\dots ,p, j>i$$

(4)

where $\omega_{ij}$ is the importance score, $I\left(\cdot\right)$ is the indicator function, and ${\widehat\beta}_{ij,t}\left(t=1,\dots,T\right)$ is the tth estimate of the coefficient $\beta_{ij}$, obtained from the tth random split (here T denotes the number of random splits, not the number of covariates in Eq. 1). The importance scores represent the differential dependency weight of each pair $\left(S_i,S_j\right)$ between the two groups [23]. The differential network was inferred by connecting pairs with high importance scores through their shared genes.
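The importance score in Eq. (4) is a count of how often a pair receives a nonzero coefficient across the random splits. A minimal sketch, assuming the per-split Lasso estimates are already available as dictionaries; the coefficient values and the function name `importance_scores` are hypothetical:

```python
from collections import Counter


def importance_scores(coef_per_split, tol=0.0):
    """Count, for each gene pair, the number of splits with a nonzero coefficient.

    coef_per_split: list of dicts mapping (gene_i, gene_j) -> estimated beta in that split
    tol:            absolute values at or below tol are treated as zero
    """
    omega = Counter()
    for coefs in coef_per_split:
        for pair, beta in coefs.items():
            if abs(beta) > tol:
                # Sort the pair so that (a, b) and (b, a) are counted as the same edge.
                omega[tuple(sorted(pair))] += 1
    return omega


# Hypothetical estimates from three random splits.
splits = [
    {("LEP", "LEPR"): 0.8, ("LEPR", "XRCC6"): 0.0, ("LEP", "RETN"): -0.3},
    {("LEP", "LEPR"): 0.5, ("LEPR", "XRCC6"): 0.2, ("LEP", "RETN"): 0.0},
    {("LEPR", "LEP"): 0.6, ("LEPR", "XRCC6"): 0.0, ("LEP", "RETN"): 0.0},
]
omega = importance_scores(splits)
ranked = omega.most_common()  # pairs ordered from highest to lowest importance score
```

Pairs whose score exceeds a chosen cutoff would then be kept as edges of the differential network; the cutoff itself is an analysis choice that this sketch does not fix.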

Differential expression analysis and enrichment analysis

The edgeR package [26] was utilized to identify differentially expressed genes in TCGA breast cancer data to test the reliability of the JDINAC results. Multiplicity correction was performed by applying the Benjamini–Hochberg method on the p-values.
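For reference, the Benjamini–Hochberg adjustment multiplies each p-value by the number of tests divided by its rank and then enforces monotonicity from the largest p-value downwards. The sketch below is a plain re-implementation for illustration, not the code used by edgeR, and the input p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
```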

To explore the biological functions of the identified interacting genes, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed using the R package "clusterProfiler" [27]. Only terms with a multiple-testing-adjusted p-value < 0.05 were considered significant.
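Over-representation analysis of this kind typically rests on a hypergeometric test: given a background of N genes, of which K are annotated to a term, it asks how likely it is to see at least k annotated genes in a list of n. The sketch below shows that calculation only; the counts are toy numbers, and the function is not part of clusterProfiler.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    k: genes in the list that are annotated to the term
    n: size of the gene list
    K: genes in the background that are annotated to the term
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / total


# Toy numbers: 3 of 5 listed genes hit a term that covers 6 of 30 background genes.
p_term = enrichment_pvalue(k=3, n=5, K=6, N=30)
```

The resulting p-values, one per term, would then be adjusted for multiple testing before the 0.05 threshold is applied.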

Results

Participant demographic and lifestyle characteristics

There were 1,916 subjects in the study, including 953 and 963 in the BC and control groups, respectively. There were significant differences in BMI and menopausal status between the two groups (p-value < 0.05) (Table 1). Women with BC had a higher BMI than that of healthy women (24.36 ± 3.46 vs. 24.01 ± 3.11, respectively), indicating that obesity may be a risk factor for BC.

Table 1 Clinical characteristics of the study population

Differential network of gene interaction

Twenty genes that might be related to the pathogenesis of BC and 101 SNPs associated with these genes were selected. The differential gene interaction network was estimated under four scenarios: no adjustment for covariates, adjustment for BMI, adjustment for menopausal status (Fig. 1), and simultaneous adjustment for BMI and menopausal status (see Additional file 1). The number of edges selected under the four scenarios was 18, 14, 19 and 16, respectively. The orange nodes in the figure represent the central genes, defined as genes with at least four adjacent genes in the network. All scenarios had three genes in common: LEP, LEPR, and XRCC6. Gene pairs were ranked by the importance scores derived from JDINAC, and the top ten pairs in the network with no covariate adjustment are summarized in Table 2. Among them, six pairs had evidence of interaction in the STRING database [28]. Additional data are shown in Additional files 2, 3, 4 and 5.
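The hub criterion used here (at least four adjacent genes) is a degree threshold on the undirected network. A small sketch with a made-up edge list, which is not the network in Fig. 1:

```python
from collections import defaultdict


def hub_genes(edges, min_neighbors=4):
    """Return the genes with at least min_neighbors distinct neighbours."""
    neighbors = defaultdict(set)
    for a, b in edges:
        # Edges are undirected, so record the adjacency in both directions.
        neighbors[a].add(b)
        neighbors[b].add(a)
    return sorted(g for g, nb in neighbors.items() if len(nb) >= min_neighbors)


# Hypothetical edges chosen only to illustrate the threshold.
edges = [
    ("LEPR", "LEP"), ("LEPR", "XRCC6"), ("LEPR", "RETN"),
    ("LEPR", "FTO"), ("LEP", "RETN"), ("XRCC6", "PTPN1"),
]
hubs = hub_genes(edges)
```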

Table 2 Top 10 gene interaction pairs identified by JDINAC with no covariate

Association between polymorphisms and BC risk

Next, the association between SNPs in the hub genes of the differential networks and BC risk was assessed (Table 3). Most SNPs were not significantly associated with BC. Two LEPR SNPs, rs1137101 (OR = 0.728, p-value = 0.002) and rs4655555 (OR = 0.825, p-value = 0.015), were significantly associated with BC risk, while the LEP, XRCC6, and RETN polymorphisms were not. The functional consequences of the SNPs are also shown in Table 3. Rs4655555 is an intron variant. Rs1137101 is a missense variant and coding sequence variant reported as benign [29].
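To make the OR and its confidence interval concrete, the sketch below computes an unadjusted odds ratio with a Wald 95% CI from a 2×2 table. This is a simplification: the estimates in Table 3 come from logistic regression adjusted for BMI and menopause status, and the counts used here are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: cases carrying the variant      b: cases not carrying it
    c: controls carrying the variant   d: controls not carrying it
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical carrier counts, not taken from the study data.
or_hat, ci_low, ci_high = odds_ratio_ci(a=30, b=70, c=45, d=55)
```

An interval that lies entirely below 1, as in this toy table, corresponds to a variant associated with lower odds of disease.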

Table 3 The association of SNPs in hub genes with breast cancer (BC) adjusted for BMI and menopause status

Identification of the interaction network

RNA-seq expression and clinical data from BC patients were obtained from TCGA to analyze and verify the identified hub genes. The validation dataset included 112 subjects for whom both tumor and matched normal samples were available. All genes available in the TCGA dataset were analyzed to detect differences between tumor and normal samples, and the 10 genes common to the networks in Fig. 1 were extracted from the results. LEP, LEPR, and XRCC6 expression differed significantly between the two groups (Table 4). RETN was not differentially expressed in the TCGA data.

Table 4 The validation results of the 10 identical genes in Fig. 1 using TCGA data

Genetic data from 4,030 BC cases and 3,494 controls in the UK Biobank were used to verify the eight edges shared by the three networks in Fig. 1 using logistic regression. The data were randomly divided into two parts, the kernel density functions of the BC and control groups were estimated, and logistic regression was used to obtain the p-value of each of the eight edges (Table 5). The results showed that the first four edges were significant (p-value < 0.05). The genes connected by these four edges were the identified hub genes, indicating that the interactions between hub genes in this network are more significant than those among other genes.

Table 5 The validation results of the 8 identical edges in Fig. 1 using UK Biobank data

Enrichment analysis

GO analysis showed that the biological processes of the identified genes were mainly related to glucose homeostasis and carbohydrate homeostasis (Fig. 2). KEGG pathway analysis showed that these genes were mainly enriched in the adenosine-monophosphate-activated protein kinase (AMPK) signaling pathway, adipocytokine signaling, and non-alcoholic fatty liver disease (Fig. 2).

Discussion

This study sought to identify potential pathogenic genes associated with BC by constructing a BC gene interaction network. It extended prior work [14] by assessing not only the effect of single genes on BC but also the gene interaction network, providing new insight into how genetic factors affect complex human diseases. The results suggest that BMI and menopausal status may be risk factors for BC. The gene interaction network obtained using the JDINAC method showed that the interactions of LEPR, LEP, XRCC6, and RETN differ significantly between BC patients and healthy women and are associated with higher BC risk. However, analysis of hub gene polymorphisms indicated that only LEPR rs1137101 and rs4655555 were significantly associated with BC. Independent datasets and bioinformatics tools were used to verify the hub genes and edges, increasing the reliability of the results. The expression of LEPR, LEP, and XRCC6 was significantly associated with BC in the TCGA dataset, and UK Biobank SNP data supported their interactions in BC.

GO enrichment analysis showed that the interacting genes were closely related to cell energy and cell metabolism, including glucose homeostasis, carbohydrate homeostasis, muscle cell proliferation, and regulation of small molecules. The KEGG results were consistent with the GO results. Studies have shown that AMPK is the main cellular energy sensor [30]. Reduced AMPK activity is associated with altered cellular metabolic processes that drive BC tumor growth and progression. AMPK is activated in response to adenosine triphosphate (ATP) depletion, glucose starvation, and metabolic stress [31]. Obesity-related factors modulate metabolic pathways in BC, providing a molecular link between obesity and BC.

Many studies have shown that LEP and LEPR play an important role in obesity. LEP is a hormone secreted by adipose tissue, which regulates eating and energy consumption through the hypothalamic region of the brain [32]. Circulating leptin binds to LEPR, activating Janus kinase 2 (JAK2), phosphorylating three tyrosine residues in LEPR, and inducing phosphorylation of STAT transcription factors, STAT5 and STAT3, which are involved in the development of BC [32]. Leptin may stimulate the expression of estrogen by increasing aromatase expression, which is also involved in BC development [33]. The LEPR rs1137101 polymorphism results from a nonconservative A to G substitution at codon 223, reducing leptin binding and impairing signaling [34]. While the effect of LEPR rs4655555 on the development of BC has not yet been reported, one study has shown that rs4655555 is significantly correlated with plasma soluble leptin receptor levels and may inform diabetes prognosis [35]. The findings from the current study further support the evidence that LEP and LEPR play an important role in BC pathogenesis.

The impact of RETN on BC has been reported previously. RETN is highly expressed in BC tissues and may serve as a biomarker for disease stage and the degree of inflammation [36, 37]. Low-grade systemic inflammation is one of the characteristics of obesity [38], and RETN is shown to exert pro-inflammatory properties by upregulating pro-inflammatory cytokines [39] through the NFκB signaling pathway [40] that lead to inflammation and tumorigenesis. Several studies have also linked XRCC6 with an increased risk of BC [14, 41, 42]. Interaction between XRCC6 genetic polymorphisms and reproductive risk factors is thought by some researchers to contribute to estrogen exposure, which results in double-strand breaks on BRCA1 and BRCA2 DNA and induces BC [41]. XRCC6 is also involved in the production of proinflammatory cytokines induced by lipopolysaccharide (LPS) in human macrophages and monocytes. Proinflammatory cytokine production is, in turn, associated with obesity and BC [42].

Recent studies have used gene expression data to explore the pathogenesis of BC [18] and other diseases [17, 20]. However, no genetic interaction network has previously been constructed from SNP data to identify potential pathogenic BC genes. Single genetic variants often explain only a small fraction of phenotypic variation, which is known as the problem of missing heritability [43]. Gene–gene interactions have been proposed as a potential source of this problem [44]. The current study built gene interaction networks based on SNP data to explain the etiology of complex human traits. While high-throughput SNP genotyping methods have been developed, the computational and statistical challenges of simultaneously analyzing large SNP datasets remain [9]. The method used here offers one approach to handling SNP data. In addition, because BC incidence is affected by demographic factors [45, 46], the gene network was constructed with adjustment for confounding factors such as BMI and menopause, making the results more reliable. This study does have some limitations, however. Only interactions between pairs of genes were assessed, whereas the relationships between genes in BC may be more complicated. Future studies should assess more complex interactions associated with this disease.

Conclusions

Potential pathogenic BC genes were investigated by constructing a gene interaction network. LEP, LEPR, XRCC6, and RETN had significant interactions in BC, and LEPR polymorphisms may also be associated with BC development. Gene network analysis can provide more detailed information about the pathogenesis of complex diseases.

Availability of data and materials

The datasets analyzed during the current study are not publicly available due to privacy but are available from the corresponding author on reasonable request.

Abbreviations

BC: Breast cancer
LEP: Leptin
LEPR: Leptin receptor
XRCC6: X-ray repair cross complementing 6
RETN: Resistin
JDINAC: Joint density-based non-parametric differential interaction network analysis and classification
OR: Odds ratio
CI: Confidence interval
SNP: Single nucleotide polymorphism
TCGA: The Cancer Genome Atlas
BMI: Body mass index
IARC: International Agency for Research on Cancer
GWAS: Genome-wide association study
WGCNA: Weighted gene co-expression network analysis
WHR: Waist-hip ratio
JAK2: Janus kinase 2
LPS: Lipopolysaccharide

References

1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.
2. Burden G, Fitzmaurice C, Akinyemiju T, Al Lami F, Alam T, Alizadeh-Navaei R, et al. Global, regional, and national cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life-years for 29 cancer groups, 1990 to 2016: a systematic analysis for the global burden of disease study. JAMA Oncol. 2018;4(11):1553–68.
3. Keum N, Greenwood DC, Lee DH, Kim R, Aune D, Ju W, et al. Adult weight gain and adiposity-related cancers: a dose-response meta-analysis of prospective observational studies. J Natl Cancer Inst. 2015;107(2):djv088.
4. Yoon YS, Kwon AR, Lee YK, Oh SW. Circulating adipokines and risk of obesity related cancers: A systematic review and meta-analysis. Obes Res Clin Pract. 2019;13(4):329–39.
5. Simone V, D’avenia M, Argentiero A, Felici C, Rizzo FM, De Pergola G, et al. Obesity and breast cancer: molecular interconnections and potential clinical applications. Oncologist. 2016;21(4):404–17.
6. Kaklamani V, Yi N, Sadim M, Siziopikou K, Zhang K, Xu Y, et al. The role of the fat mass and obesity associated gene (FTO) in breast cancer risk. BMC Med Genet. 2011;12(1):1–10.
7. Gallicchio L, McSorley MA, Newschaffer CJ, Huang HY, Thuita LW, Hoffman SC, et al. Body mass, polymorphisms in obesity-related genes, and the risk of developing breast cancer among women with benign breast disease. Cancer Detect Prev. 2007;31(2):95–101.
8. Sayad S, Dastgheib SA, Farbod M, Asadian F, Karimi-Zarchi M, Salari S, et al. Association of PON1, LEP and LEPR Polymorphisms with Susceptibility to Breast Cancer: A Meta-Analysis. Asian Pac J Cancer Prev: APJCP. 2021;22(8):2323.
9. Chuang LY, Chang HW, Lin MC, Yang CH. Chaotic particle swarm optimization for detecting SNP–SNP interactions for CXCL12-related genes in breast cancer prevention. Eur J Cancer Prev. 2012;21(4):336–42.
10. Huang S, Liu L, Xiang Y, Wang F, Yu L, Zhou F, et al. Association of PTPN1 polymorphisms with breast cancer risk: A case-control study in Chinese females. J Cell Biochem. 2019;120(7):12039–50.
11. Ghosh S, Watanabe RM, Hauser ER, Valle T, Magnuson VL, Erdos MR, et al. Type 2 diabetes: evidence for linkage on chromosome 20 in 716 Finnish affected sib pairs. Proc Natl Acad Sci. 1999;96(5):2198–203.
12. Lee JH, Reed DR, Li WD, Xu W, Joo EJ, Kilker RL, et al. Genome scan for human obesity and linkage to markers in 20q13. Am J Hum Genet. 1999;64(1):196–209.
13. Soro A, Pajukanta P, Lilja HE, Ylitalo K, Hiekkalinna T, Perola M, et al. Genome scans provide evidence for low-HDL-C loci on chromosomes 8q23, 16q24.1–24.2, and 20q13.11 in Finnish families. Am J Hum Genet. 2002;70(5):1333–40.
14. Yu LX, Liu LY, Xiang YJ, Wang F, Zhou F, Huang SY, et al. XRCC5/6 polymorphisms and their interactions with smoking, alcohol consumption, and sleep satisfaction in breast cancer risk: A Chinese multi-center study. Cancer Med. 2021;10(8):2752–62.
15. Schadt EE. Molecular networks as sensors and drivers of common human diseases. Nature. 2009;461(7261):218–23.
16. Gong BS, Zhang QP, Zhang GM, Zhang SJ, Zhang W, Lv HC, et al. Single-nucleotide polymorphism-gene intermixed networking reveals co-linkers connected to multiple gene expression phenotypes. In: BMC proceedings. BioMed Central. 2007;1(1):1–7.
17. Chen J, Wang X, Hu B, He Y, Qian X, Wang W. Candidate genes in gastric cancer identified by constructing a weighted gene co-expression network. PeerJ. 2018;6:e4692.
18. Jubair S, Alkhateeb A, Tabl AA, Rueda L, Ngom A. A novel approach to identify subtype-specific network biomarkers of breast cancer survivability. Network Model Anal Health Inform Bioinform. 2020;9(1):1–12.
19. Zhou L, Rueda M, Alkhateeb A. Classification of breast cancer Nottingham prognostic index using high-dimensional embedding and residual neural network. Cancers. 2022;14(4):934.
20. Chen H, He Y, Ji J, Shi Y. A machine learning method for identifying critical interactions between gene pairs in Alzheimer’s disease prediction. Front Neurol. 2019;10:1162.
21. Onay VÜ, Briollais L, Knight JA, Shi E, Wang Y, Wells S, et al. SNP-SNP interactions in breast cancer susceptibility. BMC Cancer. 2006;6(1):1–16.
22. Sapkota Y, Mackey JR, Lai R, Franco-Villalobos C, Lupichuk S, Robson PJ, et al. Assessing SNP-SNP interactions among DNA repair, modification and metabolism related pathway genes in breast cancer susceptibility. PLoS ONE. 2013;8(6):e64896.
23. Ji J, He D, Feng Y, He Y, Xue F, Xie L. JDINAC: joint density-based non-parametric differential interaction network analysis and classification using high-dimensional sparse omics data. Bioinformatics. 2017;33(19):3080–7.
24. Liu LY, Wang F, Cui SD, Tian FG, Fan ZM, Geng CZ, et al. A case-control study on risk factors of breast cancer in Han Chinese women. Oncotarget. 2017;8(57):97217.
25. Ahmed M, Mulugeta A, Lee SH, Mäkinen VP, Boyle T, Hyppönen E. Adiposity and cancer: a Mendelian randomization analysis in the UK biobank. Int J Obes. 2021;45(12):2657–65.
26. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40.
27. Yu G, Wang LG, Han Y, He QY. clusterProfiler: an R package for comparing biological themes among gene clusters. Omics. 2012;16(5):284–7.
28. von Mering C, Huynen M, Jaeggi D, Schmidt S, Bork P, Snel B. STRING: a database of predicted functional associations between proteins. Nucleic Acids Res. 2003;31(1):258–61.
29. Considine RV, Caro JF, Considine EL, Williams CJ, Hyde TM. Identification of Incidental Sequence Polymorphisms and Absence of the db/db Mouse and fa/fa Rat Mutations. Diabetes. 1996;45(7):992–4.
30. López M. Hypothalamic AMPK and energy balance. Eur J Clin Invest. 2018;48(9):e12996.
31. Ponnusamy L, Natarajan SR, Thangaraj K, Manoharan R. Therapeutic aspects of AMPK in breast cancer: Progress, challenges, and future directions. Biochimica et Biophysica Acta (BBA)-Reviews on Cancer. 2020;1874(1):188379.
32. Bains V, Kaur H, Badaruddoza B. Association analysis of polymorphisms in LEP (rs7799039 and rs2167270) and LEPR (rs1137101) gene towards the development of type 2 diabetes in North Indian Punjabi population. Gene. 2020;754:144846.
33. Hosney M, Sabet S, El-Shinawi M, Gaafar KM, Mohamed MM. Leptin is overexpressed in the tumor microenvironment of obese patients with estrogen receptor positive breast cancer. Exp Ther Med. 2017;13(5):2235–46.
34. Illangasekera Y, Kumarasiri P, Fernando D, Dalton C. Association of the leptin receptor Q223R (rs1137101) polymorphism with obesity measures in Sri Lankans. BMC Res Notes. 2020;13(1):1–4.
35. Sun Q, Cornelis MC, Kraft P, Qi L, van Dam RM, Girman CJ, et al. Genome-wide association study identifies polymorphisms in LEPR as determinants of plasma soluble leptin receptor levels. Hum Mol Genet. 2010;19(9):1846–55.
36. Lee YC, Chen YJ, Wu CC, Lo S, Hou MF, Yuan SSF. Resistin expression in breast cancer tissue as a marker of prognosis and hormone therapy stratification. Gynecol Oncol. 2012;125(3):742–50.
37. Dalamaga M, Sotiropoulos G, Karmaniolas K, Pelekanos N, Papadavid E, Lekka A. Serum resistin: a biomarker of breast cancer in postmenopausal women? Association with clinicopathological characteristics, tumor markers, inflammatory and metabolic parameters. Clin Biochem. 2013;46(7–8):584–90.
38. Fantuzzi G. Adipose tissue, adipokines, and inflammation. J Allergy Clin Immunol. 2005;115(5):911–9.
39. Bokarewa M, Nagaev I, Dahlberg L, Smith U, Tarkowski A. Resistin, an Adipokine with Potent Proinflammatory Properties. J Immunol. 2005;174(9):5789.
40. Filková M, Haluzík M, Gay S, Šenolt L. The role of resistin as a regulator of inflammation: Implications for various human pathologies. Clin Immunol. 2009;133(2):157–70.
41. Fu YP, Yu JC, Cheng TC, Lou MA, Hsu GC, Wu CY, et al. Breast cancer risk associated with genotypic polymorphism of the nonhomologous end-joining genes: a multigenic study on cancer susceptibility. Cancer Res. 2003;63(10):2440–6.
42. Sun H, Li Q, Yin G, Ding X, Xie J. Ku70 and Ku80 participate in LPS-induced pro-inflammatory cytokines production in human macrophages and monocytes. Aging (Albany NY). 2020;12(20):20432.
43. Maher B. Personal genomes: The case of the missing heritability. Nature. 2008;456(7218):18–21.
44. Yang S, Liu Y, Jiang N, Chen J, Leach L, Luo Z, et al. Genome-wide eQTLs and heritability for gene expression traits in unrelated individuals. BMC Genomics. 2014;15(1):1–12.
45. Suzuki Y, Tsunoda H, Kimura T, Yamauchi H. BMI change and abdominal circumference are risk factors for breast cancer, even in Asian women. Breast Cancer Res Treat. 2017;166(3):919–25.
46. Li T, Tang L, Gandomkar Z, Heard R, Mello-Thoms C, Shao Z, et al. Mammographic density and other risk factors for breast cancer among women in China. Breast J. 2018;24(3):426–8.

Acknowledgements

Not applicable.

Funding

This work was funded by the General Program of the China Postdoctoral Science Foundation (2021M691911), the General Program of the Natural Science Foundation of Shandong Province (ZR2021MH243), the National Natural Science Foundation of China (81903410), the National Statistical Scientific Research Project (2022LY031), and the Young Scholars Program of Shandong University.

Author information

Liyuan Liu and Wenli Zhai are co-first authors.

Authors and Affiliations

Department of Breast Surgery, The Second Hospital, Cheeloo College of Medicine, Shandong University, 250033, Jinan, China
Liyuan Liu, Fei Wang, Lixiang Yu, Fei Zhou, Yujuan Xiang, Shuya Huang, Chao Zheng & Zhigang Yu
School of Mathematics, Shandong University, Jinan, 250100, China
Liyuan Liu
Institute for Financial Studies, Shandong University, Jinan, 250100, China
Wenli Zhai, Yong He & Jiadong Ji
Institute of Translational Medicine of Breast Disease Prevention and Treatment, Shandong University, Jinan, 250100, China
Fei Wang, Lixiang Yu, Fei Zhou, Yujuan Xiang, Shuya Huang, Chao Zheng & Zhigang Yu
Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, China
Zhongshang Yuan

Contributions

Conceptualization, J.J. and Z.G.Y.; Writing—original draft, W.Z. and L.L.; Writing—review & editing, J.J., Z.S.Y., Z.G.Y. and Y.H.; Formal analysis, W.Z. and J.J.; Resources, L.L.; Data curation, L.L., F.W., L.Y., F.Z., Y.X., S.H. and C.Z. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Zhigang Yu or Jiadong Ji.

Ethics declarations

Ethics approval and consent to participate

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of the Second Hospital of Shandong University (No. 2010004, KYLL-2021(KJ)P-0136). Informed consent was obtained from all subjects involved in the study.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Figure S1. The differential interaction network inferred by JDINAC after adjusting for BMI and menopause status.

Additional file 2: Table S1. Top 10 gene interaction pairs identified by JDINAC after adjusting for BMI.

Additional file 3: Table S2. Top 10 gene interaction pairs identified by JDINAC after adjusting for menopausal status.

Additional file 4: Table S3. Top 10 gene interaction pairs identified by JDINAC after adjusting for BMI and menopause status.

Additional file 5: Table S4. The association of IFI30 polymorphisms with BC adjusted for BMI and menopause status.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Liu, L., Zhai, W., Wang, F. et al. Using machine learning to identify gene interaction networks associated with breast cancer. BMC Cancer 22, 1070 (2022). https://doi.org/10.1186/s12885-022-10170-w

Received: 16 April 2022
Accepted: 10 October 2022
Published: 17 October 2022
DOI: https://doi.org/10.1186/s12885-022-10170-w
