Fig. 1 | BMC Cancer

Fig. 1

From: Gene-associated methylation status of ST14 as a predictor of survival and hormone receptor positivity in breast Cancer

Characterization of ST14 methylation and establishment of a classifier for gene-associated methylation (GAM) status. a Kaplan-Meier curves for overall survival in the TCGA BRCA cohort, with patients divided into two groups based on the median ST14 expression level (left panel, log-rank p = 0.382) or the median methylation level (right panel, log-rank p = 0.042). The median split for gene expression (median = 11.5) was based on the RNAseq data, defined as log2(RPKM + 1). Average β-values across the 40 CpG probes were calculated, and their median value across all samples was 0.6605. RNAseq, RNA sequencing; RPKM, reads per kilobase per million mapped reads. b Difference in ST14 gene expression levels between normal tissue and primary tumors in the TCGA BRCA cohort (left panel, Wilcoxon’s p < 0.001). The difference in ST14 expression levels between the high and low methylation status groups (defined by the median methylation value) is shown in the right panel (Wilcoxon’s p < 0.001). Error bars show the standard deviation. c Methylation profile (34 CpG probes) of ST14 for TCGA tumor (blue), TCGA normal tissue (orange), and GSE75067 (red). The corresponding cgi probes and features are shown in the lower and right panels. Red arrows indicate regions with large differences (average β-values > 0.125) between TCGA and GSE75067 tumors; black arrows indicate regions with large differences between TCGA tumor and normal samples. d Pearson’s correlation coefficients between GAM and genes involved in matriptase-associated or epithelial-mesenchymal transition (EMT)-associated pathways in the TCGA cohort. The highest positive correlation was noted for XBP1. e Unsupervised hierarchical clustering of matriptase-associated genes (left panel) and EMT-associated genes (right panel) in the TCGA cohort, annotated with GAM status. The gene expression level is based on the log2(RPKM + 1) transformation of RNAseq data, with the color bar shown in the upper-right corner; high GAM status is shown in cyan and low GAM status in pink.
f Identification of the optimal lambda value for the least absolute shrinkage and selection operator (LASSO). The left panel depicts the shrinkage of coefficients and the right panel shows the binomial deviance during shrinkage. The optimal lambda was 0.003970773. g Assessment of classifier accuracy (left panel). LR, logistic regression; KNN, K-nearest neighbor; SVC1, support vector classifier 1 (using a linear kernel); SVC2, support vector classifier 2 (using a radial basis function kernel); GNB, Gaussian naive Bayes; DT, decision tree; RF, random forest. The highest accuracy was obtained with LR (accuracy = 91.31%). Receiver operating characteristic curve for LR, with the area under the curve and 95% confidence interval shown in the lower part. h Normalized gene expression levels of ST14 in the high and low GAM status groups for GSE5364 (upper panel, p < 0.001) and GSE22820 (lower panel, p < 0.001). The GAM status was predicted using the classifier constructed by LR.
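The median split described in panels a and b can be illustrated with a minimal sketch. It assumes that each sample's methylation level is the mean β-value across its probes and that samples strictly above the cohort median are labeled "high"; the legend does not state how ties at the median are handled, so that rule is an assumption. The sample names, β-values, and function names below are hypothetical and are not taken from the TCGA data.

```python
import math
from statistics import mean, median


def log2_rpkm(rpkm):
    """Transform an RPKM value as log2(RPKM + 1), as in the legend."""
    return math.log2(rpkm + 1)


def median_split(values):
    """Split samples into 'high' and 'low' groups at the median.

    Assumption: values strictly greater than the median are 'high';
    the legend does not specify tie handling.
    """
    cutoff = median(values.values())
    groups = {
        sample: ("high" if value > cutoff else "low")
        for sample, value in values.items()
    }
    return cutoff, groups


# Hypothetical per-probe beta-values for four samples (three probes each).
probe_betas = {
    "sample_A": [0.82, 0.75, 0.70],
    "sample_B": [0.40, 0.55, 0.52],
    "sample_C": [0.66, 0.61, 0.68],
    "sample_D": [0.30, 0.35, 0.28],
}

# Average beta-value across probes for each sample.
avg_beta = {sample: mean(betas) for sample, betas in probe_betas.items()}

# Median of the per-sample averages defines the high/low methylation groups.
cutoff, methylation_group = median_split(avg_beta)
```

The same `median_split` helper could be applied to log2(RPKM + 1) expression values to form the expression groups used for the left panel of a.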
