A 4-gene signature predicts prognosis of uterine serous carcinoma

Background Uterine serous carcinoma (USC) is an aggressive type of endometrial cancer that accounts for up to 40% of endometrial cancer deaths, creating an urgent need for prognostic biomarkers. Methods USC RNA-Seq data and the corresponding patients’ clinical records were obtained from The Cancer Genome Atlas and Genotype-Tissue Expression datasets. Univariate Cox, LASSO, and multivariate Cox regression analyses were conducted to construct a prognostic signature. Univariable and multivariable Cox regression analyses and ROC curves were used to evaluate prediction efficiency in both the training and testing sets. Results We uncovered 1385 genes dysregulated in 110 cases of USC tissue relative to 113 cases of normal uterine tissue. Functional enrichment analysis of these genes revealed the involvement of various cancer-related pathways in USC. A novel 4-gene signature (KRT23, CXCL1, SOX9 and ABCA10) of USC prognosis was derived by serial regression analyses. Overall survival (OS) and recurrence-free survival (RFS) were significantly lower in the high-risk group than in the low-risk group in both the training and testing sets. The area under the ROC curve of the 4-gene signature was higher than that of the clinicopathological features in predicting OS and RFS. The 4-gene signature was found to be an independent prognostic indicator in USC and was a superior predictor of OS in early-stage USC. Conclusions Our findings highlight the potential of the 4-gene signature as a guide for personalized USC treatment. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-021-07834-4.


Background
Endometrial cancer is the second most common gynecologic malignancy worldwide [1]. In China, it also ranks as the second most common cancer of the female genital tract [2]. Uterine serous carcinoma (USC), also called uterine serous papillary carcinoma, was first described by Hendrickson in 1982 [3]. It is a type of endometrial cancer whose clinicopathological and molecular features differ from those of endometrioid carcinoma (EEC). Unlike EEC, USC tends to develop in elderly women with low body weight and arises in a background of atrophic endometrium [4]. Microscopically, USC typically forms complex papillary structures with almost always high-grade polymorphic nuclei, in contrast to the glandular/cribriform pattern with mildly to moderately atypical nuclei seen in EEC [5,6]. Of USC tumors, 80-90% harbor TP53 mutations while retaining wild-type PTEN but losing ER/PR expression [5][6][7][8]. USC accounts for almost 10% of endometrial cancers but is disproportionately responsible for poor outcomes, contributing up to 40% of endometrial cancer deaths [4]. The estimated 5-year disease-specific survival for USC is 18-27%, compared with 80-90% for EEC. Compared stage for stage, USC has poorer 5-year disease-specific survival than grade 3 EEC in both early (stage I/II, 74% vs. 85%, p < 0.0001) and late stage (stage III/IV, 33% vs. 54%, p < 0.0001) [9]. USC is characteristically aggressive, readily invading the lymph-vascular space and undergoing abdominal dissemination in the early stages, even in the absence of myometrial invasion [10][11][12][13][14][15]. A high proportion of USC cases present with extrauterine symptoms and an adnexal, peritoneal or upper abdominal mass at diagnosis [16][17][18]. Therefore, the clinicopathological parameters that can predict the prognosis of EEC, such as tumor size, myometrial invasion, lymph-vascular space invasion and lymph node metastasis, are not reliable indicators of USC prognosis [4,19,20].
To the best of our knowledge, a robust system for predicting USC outcomes and recurrence is currently unavailable.
Advancements in molecular biological techniques and RNA-sequencing technology have made it easier to identify genes that are associated with cancer initiation and progression [21]. Single-gene and multi-gene signatures that predict cancer outcomes better than conventional clinicopathological features have been developed [22][23][24][25][26][27][28][29]. While similar signatures have been developed for EEC [30][31][32], to the best of our knowledge, few are available for USC.
Here, we carried out a genome-wide search for dysregulated genes in datasets from TCGA (The Cancer Genome Atlas) and GTEx (Genotype-Tissue Expression) and uncovered a 4-gene prognostic signature for USC. As an independent indicator of USC prognosis, this signature performs better than conventional prognostic factors.

Processing of TCGA-USC, GTEx datasets
A level 3 USC RNA-Seq dataset (HTSeq FPKM values), along with the associated clinical information, was downloaded from the TCGA database. Normal uterus GTEx data were downloaded from the UCSC Xena project (http://xena.ucsc.edu/) in October 2019. In the TCGA dataset, cases with a follow-up time or overall survival (OS) of less than 30 days were excluded from the study.

Identification of dysregulated genes and functional enrichment analysis
The R/Bioconductor package limma was used to identify genes that are dysregulated in USC tissues relative to normal uterine tissue by applying thresholds of |log2 FC| > 2 and FDR < 0.01. GO (Gene Ontology) term analysis and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analysis were conducted using the clusterProfiler package in R. A P value < 0.05 was considered indicative of significantly enriched functional annotations.
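The thresholding step described above can be illustrated with a minimal Python sketch. It is not the authors' limma code: it assumes that a per-gene log2 fold change and FDR have already been computed elsewhere, and the function name `select_dysregulated`, the gene names and the values are all hypothetical.

```python
from typing import Dict, List, Tuple


def select_dysregulated(
    stats: Dict[str, Tuple[float, float]],
    lfc_cutoff: float = 2.0,
    fdr_cutoff: float = 0.01,
) -> List[str]:
    """Return genes with |log2 FC| > lfc_cutoff and FDR < fdr_cutoff."""
    return sorted(
        gene
        for gene, (log2_fc, fdr) in stats.items()
        if abs(log2_fc) > lfc_cutoff and fdr < fdr_cutoff
    )


# Hypothetical per-gene statistics: gene -> (log2 fold change, FDR).
example_stats = {
    "GENE_A": (3.1, 0.0004),   # up-regulated, passes both cutoffs
    "GENE_B": (-2.6, 0.0020),  # down-regulated, passes both cutoffs
    "GENE_C": (1.4, 0.0001),   # fold change too small
    "GENE_D": (2.8, 0.0300),   # FDR too high
}

selected = select_dysregulated(example_stats)
print(selected)  # ['GENE_A', 'GENE_B']
```

Taking the absolute value of the fold change keeps both up-regulated and down-regulated genes, which matches the use of |log2 FC| in the text.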

Construction and evaluation of the prognostic model
Half of the USC cases were randomly assigned to the training set. Cases with complete records on clinicopathological features, including OS, age, invasion, node, and stage, were assigned to the testing set. In the training set, dysregulated genes with prognostic potential were identified by univariable Cox regression analysis using the survival package in R. A P-value < 0.05 was considered significant. To identify the most important prognostic genes, least absolute shrinkage and selection operator (LASSO) regression was performed in R using the glmnet package. The prognostic signature for predicting OS was developed through multivariable Cox regression analysis using the R survival package.
The prognostic signature was used to calculate each patient's risk score. The cases were then divided into high-risk and low-risk groups based on the median score. Kaplan-Meier survival analysis was done using the R survival package to plot the survival curves for the 2 risk groups. Receiver operating characteristic (ROC) curve analysis was done using the R survivalROC package to test the 4-gene signature's accuracy in predicting OS for the high-risk and low-risk USC cases. To validate the effectiveness of the signature, the OS risk score for each patient in the testing set was calculated using the signature, followed by Kaplan-Meier curve analysis and ROC estimation as was done in the training set. To evaluate the superiority of the 4-gene signature as a prognostic indicator, ROC curve analysis was also done on other clinicopathological features, including age at diagnosis, myometrium invasion, node metastasis and stage. The process outlined above was used to test the signature's effectiveness at predicting recurrence-free survival (RFS).
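The risk-score calculation and median split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R code: the gene names, coefficients and expression values are hypothetical placeholders, and assigning scores exactly equal to the median to the low-risk group is an assumption, since the text does not say how ties are handled.

```python
from statistics import median
from typing import Dict


def risk_score(expression: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Linear risk score: sum over genes of coefficient x expression level."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores: Dict[str, float]) -> Dict[str, str]:
    """Label each case 'high' if its score exceeds the median, else 'low'."""
    cutoff = median(scores.values())
    return {case: ("high" if s > cutoff else "low") for case, s in scores.items()}


# Hypothetical Cox coefficients (NOT the values fitted in this study).
coefs = {"GENE_A": 0.5, "GENE_B": 0.25, "GENE_C": -0.4}

# Hypothetical expression levels for four cases.
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.0},
    "P2": {"GENE_A": 1.0, "GENE_B": 0.0, "GENE_C": 5.0},
    "P3": {"GENE_A": 6.0, "GENE_B": 2.0, "GENE_C": 0.0},
    "P4": {"GENE_A": 0.0, "GENE_B": 2.0, "GENE_C": 2.0},
}

scores = {case: risk_score(expr, coefs) for case, expr in patients.items()}
groups = split_by_median(scores)
print(groups)  # {'P1': 'high', 'P2': 'low', 'P3': 'high', 'P4': 'low'}
```

A gene with a positive coefficient raises the score as its expression rises, while a gene with a negative coefficient lowers it, which is why the sign of each coefficient determines whether a gene is treated as high-risk or low-risk.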

TCGA-USC patient characteristics
A dataset of 110 USC samples and 35 adjacent normal uterus tissue samples was downloaded from TCGA. The training and testing sets consisted of data from 56 and 74 USC cases, respectively. The clinicopathological features of the 2 groups and the whole dataset did not differ significantly (P-value > 0.05). These features are summarized in Table 1.

Identification of dysregulated genes in USC and functional enrichment analysis
To ensure that our analysis compared equivalent numbers of USC and non-USC cases, we downloaded a dataset of normal uterus tissue samples from GTEx (n = 78), which, along with the 35 in the TCGA dataset, brought the total number of normal uterine cases to 113. Using the limma package in R with cutoff thresholds of |log2 FC| > 2 and FDR < 0.01, 1385 genes were identified as being dysregulated in USC tissue vs the normal controls (Fig. 1a). Functional enrichment analysis revealed that the dysregulated genes are significantly associated with 717 GO terms and 21 KEGG pathways. The most significantly enriched GO terms were extracellular matrix, mitosis, and cell adhesion, processes that might promote cancer progression (Fig. 1b). The most significantly enriched pathways are involved in cell adhesion, cell cycle, PI3K-Akt signaling, microRNAs in cancer, transcriptional misregulation, and melanoma and bladder cancer (Fig. 1c).

Prognostic signature construction and evaluation in the training set
To identify dysregulated genes that may be associated with OS, we performed univariable Cox regression analysis and uncovered 29 genes that significantly correlated with OS (Table S1). To narrow these down to the most important prognostic genes, we used LASSO regression analysis, which identified 5 dysregulated genes as potential critical indicators of USC survival (Fig. 2a). Next, multivariable Cox regression analysis narrowed these down to a signature of 4 genes, KRT23, CXCL1, SOX9 and ABCA10 (Fig. 2b), that effectively predicts OS (Table 2). Among these, KRT23, CXCL1, and SOX9 exhibited positive regression coefficients, indicating a high risk of mortality, while ABCA10 showed a negative regression coefficient, implying a low mortality risk. Next, we constructed the following risk prediction formula based on the 4 prognostic genes and used it to calculate each patient's risk score in the training set: risk score = (0.5424 × expression level of KRT23) + (0.2398 × … Cases were then divided into high-risk and low-risk groups according to whether their risk score was higher or lower than the median risk score (Fig. 2c). The relationship between risk scores and survival time is shown in Fig. 2d. Visualization of the expression of the 4 genes in a heatmap revealed that the expression level of the 3 high-risk genes increased with rising risk scores, while the low-risk gene showed the opposite correlation (Fig. 2e).
Kaplan-Meier analysis revealed that patients in the high-risk group experienced worse outcomes than those in the low-risk group (P-value = 0.003317, Fig. 2f). In AUC (area under the ROC curve) analysis, the 4-gene prognostic signature scored 0.855, higher than standard clinicopathological parameters (0.213, 0.796, 0.728 and 0.564 for age, myometrium invasion, node metastasis and stage, respectively; Fig. 2g), indicating superior performance over conventional prognostic factors.
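For readers unfamiliar with the survival curves referred to above, the Kaplan-Meier estimate for one risk group can be sketched in a few lines of Python. This is a generic textbook implementation, not the R survival package used in the study; the function name `kaplan_meier` and the follow-up data are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each observed event time.

    events[i] is 1 if the event (e.g. death) was observed at times[i]
    and 0 if the case was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all cases that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored cases leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical follow-up times (months) and event indicators for one group.
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Computing one such curve per risk group and comparing them is what the high-risk versus low-risk plots summarize; the test used to obtain the reported P-values is not reproduced here.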

Validation of the 4-gene signature in the testing set
To assess the robustness of the 4-gene prognostic signature, risk scores for the 74 USC cases in the testing set were calculated and ranked as described for the training set (Fig. 3a). The relationship between risk scores and survival is shown in Fig. 3b. This analysis revealed that the expression of the 3 high-risk genes increased with rising risk scores, while the low-risk gene exhibited the opposite effect (Fig. 3c). Kaplan-Meier curves indicated that the high-risk group experienced worse outcomes than the low-risk group (P-value = 0.0004387, Fig. 3d). AUC analysis gave a score of 0.811 for the 4-gene signature, which was higher than that of conventional prognostic indicators (0.430, 0.752, 0.808 and 0.688 for age, myometrium invasion, node metastasis and stage, respectively; Fig. 3e), consistent with observations made in the training set.
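The AUC values compared above can be understood through a simplified sketch. The study used time-dependent ROC analysis (the R survivalROC package), which accounts for censoring; the Python code below instead computes a plain binary AUC, treating the outcome as a fixed 0/1 label, so it illustrates only the underlying idea. The function name `auc` and the data are hypothetical.

```python
from typing import List


def auc(scores: List[float], labels: List[int]) -> float:
    """Binary AUC: probability that a randomly chosen positive case has a
    higher score than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores; label 1 = event observed, 0 = no event.
example_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
print(round(auc(example_scores, example_labels), 3))  # 0.889
```

An AUC of 0.5 corresponds to a predictor that ranks cases no better than chance, and 1.0 to one that ranks every event case above every non-event case.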

Independent prognostic value of the 4-gene signature
To evaluate the prognostic value of the 4-gene signature independently of conventional prognostic indicators, we performed univariate and multivariate Cox regression analysis on testing set cases with complete clinical features. This analysis revealed that our prognostic signature and tumor stage are both independent predictors of OS (Table 3). Next, we tested whether the 4-gene signature could predict OS at different disease stages. To this end, we stratified the cases by stage into early (stage I + II) and late stage (stage III + IV). Patients in the high-risk group in both early and late stage exhibited lower OS than those in the low-risk group (P value = 0.003306 and P value = 0.02755, respectively, Fig. 3f-g). These results indicate that the 4-gene signature performs better in early-stage disease, highlighting its potential clinical application.

Evaluation of the 4-gene signature in predicting RFS
To evaluate whether the 4-gene signature could predict recurrence-free survival (RFS) in USC, TCGA-USC cases with RFS data were analyzed. Cases with RFS of < 30 days were excluded, leaving 95 cases for further analysis. Each patient's risk score was calculated and ranked as described for the training set (Fig. 4a). The risk scores and times to recurrence are shown in Fig. 4b. This analysis revealed that expression of the 4 genes increased with rising risk scores (Fig. 4c). Kaplan-Meier analysis revealed that the high-risk group had a higher recurrence rate than the low-risk group (P value = 0.01198, Fig. 4d). The AUC analysis of the prognostic signature revealed a score of 0.737 for RFS prediction, which was higher than the scores of conventional indicators (0.151, 0.595, 0.551 and 0.632 for age, myometrium invasion, node metastasis and stage, respectively, Fig. 4e). Univariate and multivariate Cox regression analysis revealed the prognostic signature and stage as independent prognostic factors for RFS, consistent with the OS analysis (Table 4). Analysis of the effectiveness of the 4-gene signature in predicting RFS at different disease stages revealed that patients in the low-risk and high-risk groups had significantly different RFS in late stage (P value = 0.003489, Fig. 4f). However, there was no difference between the two risk groups in early stage (Fig. S1).

Discussion
Here, we analyzed USC datasets from TCGA and GTEx and uncovered 1385 genes that are dysregulated in USC tissues relative to normal endometrial tissue. KEGG pathway analysis revealed that these genes mainly belong to cancer-associated pathways, including melanoma and bladder cancer, as well as pathways associated with cell adhesion, cell cycle, PI3K-Akt signaling, cancer-linked microRNAs and transcriptional misregulation. Disruption of cell adhesion may explain why USC tends to disseminate early, spreading to the fallopian tubes or invading the lymph-vascular space. The tumor suppressor TP53 is the most frequently mutated gene in USC [33]. USC's high proliferative rate and dysregulated cell cycle control may contribute to the high relapse and mortality rates in endometrial cancers. The PI3K/AKT/mTOR signaling pathway is the most frequently dysregulated pathway in EEC [33]. In USC, PIK3CA mutation occurs in about 30% of cases [11,33,34], which is consistent with the involvement of the PI3K-Akt signaling pathway seen in our analysis. Inhibition of PI3K/AKT/mTOR signaling strongly suppresses EEC progression [35][36][37], and clinical trials targeting PI3K/AKT/mTOR signaling in solid tumors have shown promise [38]. However, the benefit of this approach in endometrial cancers is controversial due to the complexity of pharmacological action and toxicity [39]. Further studies are needed to better target PI3K/AKT/mTOR signaling in endometrial cancer. The TCGA database, which offers a collection of complete transcriptomic data and associated clinical information, is publicly available for data mining [40]. To identify important dysregulated genes associated with USC outcomes, we used LASSO and Cox regression analysis. LASSO is widely applied in modeling high-dimensional data; it reduces the risk of overfitting and improves prediction accuracy [41]. Our analysis generated a 4-gene signature for predicting USC OS by calculating each patient's risk score.
We find that patients with high scores exhibit poor outcomes relative to those with low scores, an observation that was validated in both the training and testing sets. ROC curve analysis revealed this signature's superiority over conventional prognostic parameters (age, myometrium invasion, node metastasis, and stage) in the training and testing sets. Our data show that both the 4-gene signature and disease stage are independent prognostic indicators of OS. For most malignant solid tumors, patients with late-stage disease have an unfavorable prognosis. However, for USC, early-stage disease does not necessarily correlate with good prognosis due to the tumor's propensity for shedding, spreading and invading the lymph-vascular space even when the lesion is confined to the endometrium or polyps. Management of patients with early-stage USC is controversial [4,42,43]. Our signature identified high-risk patients in the early-stage USC group who had much poorer OS than low-risk patients in the same group. Our data show that this signature performed better in the early-stage group than in the late-stage group, highlighting its potential value in guiding the management of early-stage USC.
The average recurrence rate for stage IA USC after chemotherapy, radiotherapy or surgery is 8.7, 25 and 12.4%, respectively. For stage IB/IC, the corresponding recurrence rates are 10.8, 36.6 and 37.3%, respectively [11]. Our 4-gene signature predicts a higher recurrence risk in the high-risk group than in the low-risk group. Consistent with the OS analysis, ROC curve analysis showed that this 4-gene signature was more effective than conventional indicators at predicting RFS. Both the signature and disease stage were independent prognostic factors for RFS. Our data show that the 4-gene signature is effective at RFS prediction in late-stage disease but showed no difference between high-risk and low-risk groups in early stage. This may be due to the small number of recurrent cases (8 out of 45 cases) in early stage in the TCGA cohort.
The 4 genes in the signature have been associated with various cancers. KRT23 has been implicated as an oncogene in liver cancer [44] and colorectal cancer [45]. CXCL1 is overexpressed in EEC tissue relative to normal endometrium and promotes tumorigenesis by promoting neutrophil chemotaxis [46]. Snail induces ovarian epithelial-mesenchymal transition via CXCL1 and CXCL2, representing an immunological therapeutic target [47]. SOX9 overexpression in uterine epithelium may induce endometrial hyperplastic lesions [48], promoting endometrial cancer cell proliferation [49]. ABCA10 has been proposed as a prognostic marker in ovarian carcinoma [50]. Germline single nucleotide polymorphisms in ABCA10 may affect follicular lymphoma overall survival [51]. So far, none of the 4 genes has been associated with USC, though CXCL1 and SOX9 are associated with EEC progression.

Conclusion
Here, we performed an analysis of USC genome-wide expression profiles in TCGA and GTEx datasets. We have identified genes that are dysregulated in USC and explored their molecular functions and pathways. More importantly, we have developed and validated a 4-gene signature that robustly predicts USC OS and RFS. This signature is an independent prognostic indicator that is superior to conventional indicators of USC prognosis, especially when predicting OS in early-stage USC. Our findings highlight the potential of this signature as a guide for personalized USC treatment. However, more independent cohorts are needed to validate the signature and to elucidate the molecular mechanisms of these predictive genes in USC.