Evaluation of the role of KPNA2 mutations in breast cancer prognosis using bioinformatics datasets

Breast cancer, comprising of several sub-phenotypes, is a leading cause of female cancer-related mortality in the UK and accounts for 15% of all cancer cases. Chemoresistant sub phenotypes of breast cancer remain a particular challenge. However, the rapidly-growing availability of clinical datasets, presents the scope to underpin a data-driven precision medicine-based approach exploring new targets for diagnostic and therapeutic interventions. We report the application of a bioinformatics-based approach probing the expression and prognostic role of Karyopherin-2 alpha (KPNA2) in breast cancer prognosis. Aberrant KPNA2 overexpression is directly correlated with aggressive tumour phenotypes and poor patient survival outcomes. We examined the existing clinical data available on a range of commonly occurring mutations of KPNA2 and their correlation with patient survival. Our analysis of clinical gene expression datasets show that KPNA2 is frequently amplified in breast cancer, with differences in expression levels observed as a function of patient age and clinicopathologic parameters. We also found that aberrant KPNA2 overexpression is directly correlated with poor patient prognosis, warranting further investigation of KPNA2 as an actionable target for patient stratification or the design of novel chemotherapy agents. In the era of big data, the wealth of datasets available in the public domain can be used to underpin proof of concept studies evaluating the biomolecular pathways implicated in chemotherapy resistance in breast cancer. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-022-09969-4.


Introduction
Breast cancer is the most commonly-diagnosed, and leading cause of cancer-related mortality worldwide among women with an estimate of 2.3 million new cases in 2020 [1,2]. Breast cancer represents a heterogeneous group of diseases classified across several sub-phenotypes according to their anatomical location and gene expression profile.
Despite significant advancements in developing new treatments for breast cancer, the incidence of breast cancer in women continues to rise proportionally with age, posing a significant global public health challenge [3]. Current standard of care in breast cancer treatment involves surgery, radiotherapy, endocrine-based therapies, chemotherapies or biologicals, or a combination of these therapeutic interventions. From a diagnostic perspective, mammography remains one of the main approaches for detecting breast cancer. However, patients are often diagnosed during later stages of breast cancer with the potential to adversely impact patient clinical prognosis and outcomes. Therefore, Open Access *Correspondence: nicholas.rattray@strath.ac.uk; zahra.rattray@strath.ac.uk the recent years have seen a significant growth in novel surrogate biomarker research for diagnostic, prognostic and therapeutic interventions. Current routine stratification for breast cancer treatment is based on the hormonal status (oestrogen, progesterone and human epidermal growth receptor-2) or more recently, genetic biomolecular signatures classifying breast cancers according to intrinsic subtypes (e.g. basal, and luminal A and B) [4].
Karyopherin alpha 2 (KPNA2), a member of the Karyopherin family and an adaptor protein, is a component of the nuclear import pathway machinery involved in the nucleocytoplasmic transport of molecules involved in cell division, transcription, and DNA repair. Aberrant amplification of KPNA2 expression in cancer has been implicated in the pathogenic mislocalization of substrate proteins, resulting in tumorigenesis and conferring an aggressive sub-phenotype [5]. KPNA2 over-expression has been correlated with poor patient outcomes in a number of malignancies including glioblastoma [6], colon [7], hepatocellular carcinoma [8], ovarian [9] and breast [10][11][12] cancers. In breast cancer, KPNA2 expression is correlated with a lower abundance of DNA repair proteins including CHK1, UBC9, PIAS1, BRCA1, RAD51 and γH2AX in cell nuclei [12]. Moreover, the incidence of KPNA2 overexpression has correlated with oestrogen receptor-negative (ER-) status [12,13] and rapidly proliferating subtypes, specifically basal-like tumours [14].
With increasing reports of KPNA2 involvement in several cancer types [6,7,9,15] and significant advancements in precision medicine technologies, coupled to extensive biobanking and electronic curation of patient metadata, the scope exists to interrogate the correlation between KPNA2 expression, breast cancer phenotype and patient prognosis.
Dysregulation of mRNA expression levels of KPNA2 in human breast cancer and its association with breast cancer prognosis has not been further investigated. A cohort by AlShareeda et al. correlated tumours overexpressing KPNA2 with poor patient prognosis and a larger tumour size [12]. Other recent studies have evaluated significant KPNA2 expression in breast cancer compared to normal samples [16][17][18]. In this study, we extensively investigate the effect of KPNA2 in breast cancer patients on specific prognosis outcomes using a range of bioinformatics tools. We analyzed the mRNA expression patterns and mutations of KPNA2 in patients with breast cancer from the vast number of gene expression data available within the public domain, to identify expression patterns and the potential prognostic value of KPNA2 in human breast cancer.

Data retrieval
cBioPortal (https:// www. cbiop ortal. org/) is an open access resource for cancer genomics that was originally developed by Memorial Sloan Kettering Cancer Center [19]. In this study cBioPortal was used to query the incidence and types of KPNA2 mutations occurring in breast cancer as a function of tumour clinicopathologic parameters.
COSMIC (Catalogue of Somatic Mutations in Cancer (www. sanger. ac. uk) is a tool for studying the influence of somatic mutations in all cancers and assessing druggability of targets incorporation with chEMBL, which is maintained by the European Molecular Biology Laboratory. Using this resource, we identified over 500 KPNA2related mutations, specifying the amino acid point mutation position and mutation type, and their classification as missense or insertion.
Oncomine (https:// www. oncom ine. org/ resou rce/ login. html) Analysis of KPNA2 mRNA expression patterns was conducted using the following parameter selections: Gene-KPNA2, differential analysis-cancer vs. normal analysis, cancer type-breast cancer; and data type-mRNA. A two-fold change, a P-value corresponding to 1E-4 and a top 10% gene rank were selected as thresholds for this analysis. The same parameters were applied to the analysis of gene co-expression analyses. All statistical analyses and parameters were directly exported from Oncomine.
PrognoScan (http:// dna00. bio. kyute ch. ac. jp/ Progn oScan/ index. html) [20] is a resource for performing meta-analysis of the prognostic role of mutations occurring in cancer through incorporating gene expression studies from multiple sources such as the Gene Expression Omnibus (GEO-www. ncbi. nlm. nih. gov/ gds) and reports from individual labs [21]. PrognoScan combines expression data with clinical outcomes, which enables the evaluation of potential biomarkers and their role in cancer prognosis. In this study PrognoScan was used to assess the correlation between KPNA2 mRNA expression levels and patient prognostic endpoints for breast cancer. Output generated and exported from PrognoScan include P-values (Cox), hazard ratios and confidence intervals across breast cancer datasets available. Data available for the 201088_at KPNA2 reporter was selected for the generation of Forest plots.
Kaplan-Meier Plotter (KMplot) (http:// kmplot. com/ analy sis/ index. php?P= servi ce) [22] uses gene expression data from GEO datasets and through integration will clinical data, generates Kaplan-Meier plots across multiple prognostic outcomes. Using this tool it is possible to restrict the selection to patients with specific breast cancer sub-phenotypes, enabling the selection of inclusion and exclusion criteria. For the purposes of this study, the prognostic value of KPNA2 was studied across all breast cancer types, and as a function of each intrinsic molecular subtype (St. Gallen definitions were used) [23]. For all survival analyses, the auto select best cut-off was used to display the P-value (log-rank) and false-discovery rate (FDR) for each plot and the probe ID (201088_at) of KPNA2 reporter was selected for all searches.
Breast Cancer Gene-Expression Miner v4.6 [24] (http:// bcgen ex. ico. unica ncer. fr/ BC-GEM/ GEM-Accue il. php? js=1) is a breast cancer statistical mining tool providing information on gene expression and prognostic implications of gene expression profiles in breast cancer. Moreover, the correlation between multiple genes, and their association with breast cancer can be elucidated using this tool [25]. Briefly, KPNA2 expression patterns in all breast cancers were examined (RNA-seq, all platforms) and endpoint events (overall survival, disease-free survival) classified according to sub-phenotypes. Gene ontology and exhaustive gene correlations were also studied across all breast cancer groups as a function of intrinsic molecular subtype and hormone receptor expression profile.

Statistical analysis
Comparisons of KPNA2 mRNA expression levels performed between breast cancer and healthy breast tissue (fold-change) was performed in Oncomine using a t-test. For comparisons between breast cancer patient subsets in Geneminer, a Welch test was used to compare differences in KPNA2 mRNA expression. To analyze the prognostic value of KPNA2 using Kaplan-Meier plot (KMPlot), P-values from log-rank analysis were used to compare prognostic endpoints between patient cohorts using in-built algorithms on the webpage. Prognostic data obtained from PrognoScan was selected according to the calculated Cox P-values and corresponding Hazard ratios (95% confidence interval) for various endpoints (overall survival, disease-free survival, disease-free metastatic survival, and relapse-free survival) that were subsequently plotted and visualized with a Forest plot. Unless otherwise stated, a P < 0.05 was deemed as statistically significant for all comparisons.

KPNA2 mutations in breast cancer
Genetic alterations impacting KPNA2 in breast cancer were analyzed using cBioPortal and COSMIC databases. Querying a combined total of 4,065 samples across five studies in cBioportal, the frequency of KPNA2 gene alterations differed across each study queried ( Table 1). The percentage of samples with somatic mutations in KPNA2 were 0.2% of the KPNA2-related duplicate mutations, corresponding to 8 missense substitutions and one in-frame deletion in patients with multiple samples (see supplementary information). Amplification of KPNA2 expression was the most frequently observed alteration across all studies examined.
As a validation step, patterns of KPNA2 expression were studied across 39,619 cancer samples in COSMIC. These analyses revealed that 341 out of 2,612 breast cancer samples contained seven KPNA2 amino acid changes characterized as missense mutations. Six of the seven mutations identified in COSMIC were identical to those found in cBioPortal. In the case of COSMIC, no deletion mutations were found in the KPNA2 sequence, but an additional missense mutation (A364V) was present (Table 2). Thus, cBioPortal is a useful tool for evaluating expression patterns in breast cancer and verifying KPNA2-associated mutations.
After confirming identified mutations within BioMuta (NIH) (https:// hive. bioch emist ry. gwu. edu/ biomu ta/ prote inview/ P52292), results between all databases were unified using an international protein nomenclature based on the Human Genome Variation Society (HGVS). For descriptions of the sequence variants, see Table 3. Missense, substitution or deletion protein mutations are formatted as described by HGVS to better communicate our results for future clinical findings [32]. Next, we assessed the correlation between the amplification of KPNA2 mRNA expression levels in breast cancer tumours compared to matched healthy breast tissue using Oncomine. Findings from these comparisons across breast cancer intrinsic molecular subtypes and corresponding fold-changes are presented in Table 4.
Our analysis of fold-change data show that within breast cancer datasets available on Oncomine, KPNA2 frequently was ranked in the top 7% of genes altered in breast cancer with significant fold-changes observed across all studies relative to adjacent breast cancer tissue. Across all breast cancer subtypes examined, at   least a positive two-fold increase (with a corresponding P-value < 0.05) was observed in KPNA2 mRNA expression levels between healthy and breast cancer tissue, indicating KPNA2 overexpression across various breast cancer types. Next, we performed a search of the patterns of KPNA2 mRNA expression in breast cancer using Oncomine, cBi-oPortal and Geneminer toolsets. Analysis of the datasets available on these resources indicated differential KPNA2 expression levels as a function of clinicopathological parameters (Fig. 1).
Analysis of KPNA2 expression level patterns across multiple toolsets shows a varied KPNA2 expression and mutational profile as a function of clinicopathological parameters. The incidence of KPNA2 genetic alterations occurred more frequently in patients with positive ER status (Fig. 1D), whereas higher KPNA2 mRNA levels appeared in patients with negative hormone receptor status (Fig. 1 A and B). Relative to normal breast-like tissue, mRNA expression levels of KPNA2 are significantly elevated across all molecular subtypes. Across Geneminer and Oncomine databases, KPNA2 amplification occurred most frequently in patients aged < 40 years in comparison to postmenopausal patients (see supplementary information, Welch's P < 0.0001, GeneMiner). We also compared KPNA2 expression profiles across different breast cancer subtypes that included carcinoma, invasive ductal carcinoma and adenocarcinoma. KPNA2 amplification occurred in patients with invasive ductal carcinoma and was more frequently observed in patients with oestrogen-receptor negative breast cancer. Pairwise comparisons of the relative KPNA2 mRNA expression levels were performed in Geneminer according to tumour intrinsic molecular subtype. Corresponding readout indicates differential KPNA2 expression patterns across the sub-phenotypic classifications, with normal breast-like tumours consistently exhibiting (statistically significant, P < 0.0001) lower KPNA2 expression levels in comparison to other molecular sub-phenotypes.

Aberrant KPNA2 expression is associated with poor breast cancer prognosis
The prognostic value of KPNA2 in breast cancer was examined using PrognoScan and KMPlot. In PrognoScan, 25 Gene Expression Omnibus (GEO) datasets were located in total, which were divided across five categories of 10 distant metastasis-free survival (DMFS), 2 Disease-free survival (DFS), 2 Disease-specific survival (DSS), 8 Relapse-free survival (RFS), and 3 overall survival (OS). Data presented in the Forest plot consistently demonstrate a negative correlation between KPNA2 overexpression and patient survival (Fig. 2).
The number of breast cancer dataset entries extracted from PrognoScan across all KPNA2 reporters were 56 studies in total. These were further categorized into one of five categories including relapse-free survival (RFS-18), disease-free survival (DFS-5), disease-specific survival (DSS-6), overall survival (OS-8), and distant metastasis-free survival (DMFS-19). The forest plot (Fig. 2) demonstrates a direct correlation between amplification of KPNA2 expression and a poor prognosis across all endpoints.

Co-expression patterns of KPNA2 mRNA in breast cancer
To identify the pathways impacted by aberrant KPNA2 activity, we examined the correlation in gene expression patterns between KPNA2 and other genes using Oncomine. The top positive and negatively correlated genes with KPNA2 are shown in Fig. 4. The Richardson Breast 2 study was selected to study gene co-expression patterns (P-value: 0.001, Fold change:2, Gene rank: 10%), with 186 located genes upregulated genes in ductal breast carcinoma.
As shown in Fig. 4, genes most frequently coexpressed with KPNA2 in ductal breast carcinoma were found to be least expressed in healthy breast tissue.
Taken together, our findings from analyses of KPNA2 expression levels, mutational signature, impact on prognostic endpoints and co-expression patterns evidence that KPNA2 is implicated in cancer progression and prognosis.

Discussion
In the present study we examined the expression patterns of KPNA2 and its prognostic significance in breast cancer as a function of clinicopathologic parameters using online bioinformatics databases. To-date, datasets from the genomic and transcriptomic-based analyses of breast cancer tumour biopsies and their corresponding metadata have been curated and deposited across multiple databases for public access as a precision medicine tool [24,34]. To our knowledge, a comprehensive analysis of clinical datasets interrogating the frequency and patterns of KPNA2 gene alterations as a function of tumour clinicopathologic parameters has not previously been attempted.
The dysregulation and aberrant function of Karyopherin activity has previously been correlated with tumour aggressiveness and poor patient prognosis across multiple cancer types. KPNA2, a member of the karyopherin family, is involved in the nucleocytoplasmic transport of a range of key cellular factors including DNA repair, transcription, and cell division factors [17]. Previous work has shown a direct correlation between KPNA2 overexpression and poor patient prognosis across a range of cancer types, including glioblastoma, colorectal and ovarian cancer [6,7,9]. Despite the involvement of the Karyopherin family in breast cancer prognosis and tumorigenesis, the distinct role of KPNA2 in breast cancer outcomes and its expression patterns within breast tumour subtypes requires further investigation.
We used datasets available from online resources to analyse the frequency of genetic alterations occurring in KPNA2 mRNA expression levels across breast cancer intrinsic molecular subtypes (Geneminer, cBioPortal, COSMIC and Oncomine), examined patterns of KPNA2 co-expression with other genes (Geneminer and Oncomine) and evaluated the prognostic implications of KPNA2 mRNA overexpression in patients with breast cancer (Prognoscan and Kaplan-Meier Plotter).
Our analysis of patterns of KPNA2 mutations in cBio-Portal and COSMIC revealed that N375S is also present in the MET gene, occurs across a range of cancer types and is detected in 9% of advanced breast tumours. MET mutations indicate a tyrosine kinase mutation previously shown to be oncogenic and dysregulated in earlystage lung cancers [35]. R366H mutations are common in colon cancer and involves a defective phosphorylation pathway of Long interspersed nuclear elements (LINE-1), activating inflammatory immune responses that drive tumour development [36]. Our searches of the Geneminer and cBioPortal repositories (Fig. 1) consistently show that the most frequently-occurring KPNA2 genetic alteration in breast cancer tumours is overexpression. Furthermore, our results demonstrate that patients with hormone receptor-negative (ER/PR) status are most likely to exhibit higher KPNA2 mRNA expression levels, in comparison to patients with hormone receptor-positive breast cancers (P < 0.0001, Fig. 1). These data were further confirmed with the inverse correlation between KPNA2, and oestrogen and progesterone receptor mRNA levels (Geneminer, supplementary information). The incidence of KPNA2 amplification was also found to be higher in younger patients with breast cancer (supplemental information), suggesting its role in breast cancer progression in this age group. Furthermore, KPNA2 mRNA expression levels were found to be significantly amplified in patients with invasive ductal carcinoma (Fig. 1B).
Our search of the Oncomine database showed that at the transcriptional level relative to matched healthy breast tissue, the expression of KPNA2 was significantly upregulated in invasive lobular breast carcinoma, ductal breast carcinoma in situ, and invasive breast carcinoma. In all searches performed, KPNA2 was ranked in the top 7% of genes dysregulated in cancer across breast cancer subtypes located.
Functional assessment of KPNA2 co-expression showed that KPNA2 mRNA overexpression is directly correlated with an enrichment in genes regulating the cell cycle. SCL-interrupting locus protein (STIL), previously identified in prostate cancer [37], is a G2 phase gene involved in cell growth and development. This oncogene also activates the cell cycle-dependent protein kinase 1 (CDK1) pathway. CDK1, also co-expressed with KPNA2, promotes G2/M cell cycle transition and has previously been reported in hepatocellular carcinomas [8]. Moreover, KPNA2 overexpression in ovarian cancer was recently linked to KIF4F signalling upregulation accelerating tumour progression [38,39].
ZW10 interacting kinetochore protein (ZWINT) and Epithelial cell transforming 2 (ECT), both mitotic checkpoint proteins, have been shown to contribute to poor prognosis across multiple cancer types including glioblastoma [40]. Though previous reports show an association between ZWINT overexpression and triple-negative breast cancers, the functional role of ZWINT and ECT in breast cancer remains largely unexplored [41]. The ECT gene has been implicated in the protein assembly in cell division [42], and its dysregulation in breast cancer remains poorly understood. Another gene directly co-expressed with KPNA2 is the Cell division cycle 20 (CDC20), a late mitosis checkpoint mediator that predominantly occurs in hormone positive (ER +) breast tumours (58% (N = 870), METABRIC study) [43]. Aberrant CDC20 overexpression has previously been implicated in pan-cancer disease progression and poor patient prognosis.
Our evaluation of the prognostic role of KPNA2, showed that across multiple prognostic endpoints (OS, RFS, DMFS and PPS) from PrognoScan and KMPlot (Fig. 3), KPNA2 overexpression was associated with poor survival outcomes. Our findings are in agreement with a previous report indicating that KPNA2 overexpression can serve as a prognostic marker across multiple cancer types and is associated with malignant transformation and poor patient survival [14,44,45].
To-date a limited number of reports have studied the functional role of KPNA2 in patient response to standard of care treatments and breast cancer outcomes. Our investigation primarily focused on using existing databases to inform the future rationale for exploring the biomolecular and phenotypic role of KPNA2 in breast cancer. Our integrated analyses of existing datasets indicate that KPNA2 can serve as a prognostic biomarker in breast cancer, warranting further investigation of its biomolecular role in tumour aggressiveness. We identified the functional associations and prognostic significance of KPNA2 in breast cancer, which warrants its further investigation as a promising prognostic biomarker or druggable target.

Conclusion
During the COVID-19 pandemic and with limitations in laboratory access clinical datasets freely available on databases have provided a tool for data mining and scoping new projects. Open access databases provide a useful toolbox for investigation the correlations between biomolecular drivers of cancer and prognostic outcomes. Here, we used outputs from such databases to explore the rationale for targeting KPNA2 as a novel druggable target. Our analyses of existing clinical datasets for expression and survival outcomes show that KPNA2 over-expression contributes to poor patient survival outcomes, further necessitating its investigation in future studies to consider its clinical utility for triple negative breast cancer subtypes.

Supplementary Information
The online version contains supplementary material available at https:// doi. org/ 10. 1186/ s12885-022-09969-4.   Fig S 4. Heatmap of KPNA2 mRNA co-expression with estrogen receptor (ESR1) and progesterone receptors (PGR) across all breast cancers, obtained from Geneminer. Fig S 5. Differences between KPNA2-correlated genes in healthy and a variety of breast cancer tissues. The colour shade indicates the level of confidence that two proteins are functionally associated, given the overall expression data. Co-expression scores are based on mRNA expression levels and protein co-regulation. The black colour indicates the absence of data. Heatmaps of protein-protein correlation matrices in healthy tissue (A), breast cancer tissue (B), ER+ (C), ER-(D), PR+ (E), and PR-breast cancer tissues. Table  S 3. Categorizing multiple KPNA2 co-expressed genes according to their biological role and cancer implications obtained from Kyoto Encyclopedia of Genes and Genomes (KEGG) https:// doi. org/ 10. 1093/ nar/ 28.1. 27. In association with UniProt and the Human Proteome Atlas, this resource provides a link of gene functions to published literature. Gene subcellular location obtained from EMBL-EBI GO Annotations (QuickGO (ebi.ac.uk)).