Development of an oncogenic dedifferentiation SOX signature with prognostic significance in hepatocellular carcinoma

Background Gradual loss of terminal differentiation markers and gain of stem cell-like properties is a major hall mark of cancer malignant progression. The stem cell pluripotent transcriptional factor SOX family play critical roles in governing tumor plasticity and lineage specification. This study aims to establish a novel SOX signature to monitor the extent of tumor dedifferentiation and predict prognostic significance in hepatocellular carcinoma (HCC). Methods The RNA-seq data from The Cancer Genome Atlas (TCGA) LIHC project were chronologically divided into the training (n = 188) and testing cohort (n = 189). LIRI-JP project from International Cancer Genome Consortium (ICGC) data portal was used as an independent validation cohort (n = 232). Kaplan-Meier and multivariable Cox analyses were used to examine the clinical significance and prognostic value of the signature genes. Results The SOX gene family members were found to be aberrantly expressed in clinical HCC patients. A five-gene SOX signature with prognostic value was established in the training cohort. The SOX signature genes were found to be closely associated with tumor grade and tumor stage. Liver cancer dedifferentiation markers (AFP, CD133, EPCAM, and KRT19) were found to be progressively increased while hepatocyte terminal differentiation markers (ALB, G6PC, CYP3A4, and HNF4A) were progressively decreased from HCC patients with low SOX signature scores to patients with high SOX signature scores. Kaplan-Meier survival analysis further indicated that the newly established SOX signature could robustly predict patient overall survival in both training, testing, and independent validation cohort. Conclusions An oncogenic dedifferentiation SOX signature presents a great potential in predicting prognostic significance in HCC, and might provide novel biomarkers for precision oncology further in the clinic. Electronic supplementary material The online version of this article (10.1186/s12885-019-6041-2) contains supplementary material, which is available to authorized users.


Background
Liver cancer ranks the fifth most prevalent cancers in the world and the second leading cause of cancer death. Lack of suitable biomarkers for early detection and limited treatment strategies are the major causes of high mortality [1]. Although it's still under debate whether cancer originates from embryonic stem cells or undergoes dedifferentiation from terminally differentiated cells, the critical roles of developmental signaling pathways in cancer initiation and malignant progression have been widely accepted [2,3]. Increasing evidences suggested that critical molecules which regulate embryonic stem cell pluripotency and differentiation are usually activated in the tumor tissue [4][5][6]. Aberrant activation of those developmental networks can also induce retro-differentiation or trans-differentiation between different cellular lineages including liver progenitors, hepatocytes, and cholangiocytes, which constitute the cellular heterogeneity of liver cancer [7][8][9]. Monitoring the extent of tumor dedifferentiation and patient prognosis might help define different subgroups of patients for precision treatment. However, effective biomarkers are still lacking for clinical use.
The Sox (Sry-related high-mobility groupbox) family of transcription factors have been well appreciated in multiple aspects of development including sex determination, embryogenesis, organogenesis, neurogenesis, skeletogenesis and hematopoiesis [10,11]. SOX proteins are functionally divided into 9 subgroups termed A to H according to the degree of similarity of their HMG-box amino acids and flanking regions: Subgroup A (SRY), Subgroup B1 (SOX1, SOX2 and SOX3), Subgroup B2 (SOX14 and SOX21), Subgroup C (SOX4, SOX11 and SOX12), Subgroup D (SOX5, SOX6 and SOX13), Subgroup E (SOX8, SOX9 and SOX10), Subgroup F (SOX7, SOX17 and SOX18), Subgroup G (SOX15) and Subgroup H (SOX30) [12][13][14]. Beyond the functions of well-established regulators of development, growing evidences have linked SOX families with human diseases, particularly in tumors. SOX family members were shown to mastermind the tumor initiating potential of cancer cells in driving cancer pluripotent stem cells establishment, stem cell maintenance, and lineage fate determinant in various types of cancers [15][16][17][18][19][20]. In the present study, we established a novel oncogenic dedifferentiation SOX signature to effectively monitor the extent of tumor dedifferentiation and predict patient prognosis in HCC. Further incorporation of the gene signature into clinical RNA-seq profiling might help identify groups of highrisk patients for precision medicine.

Clinical cohort and RNA-seq data sets
We obtained RNA-seq mRNA expression data and clinical pathological data of liver cancer from the LIHC project of TCGA (https://tcgadata.nci.nih.gov/tcga/). The data was downloaded using the University of California Santa Cruz cancer genomics data portal UCSC Xena (https://xena.ucsc.edu/). The LIHC project contains 50 normal liver tissue samples and 377 primary liver cancer tissue samples. Samples from TCGA data set were divided chronologically into training (TCGA-LIHC Cohort I, n = 188) and testing cohorts (TCGA-LIHC Cohort II, n = 189), and we did not find any bias in TCGA test and validation set in case bias analysis. A total of 232 samples with RNA-Seq mRNA expression data and clinical pathological data were obtained from the ICGC portal (https://dcc.icgc.org/projects/LIRI-JP) as an independent validation cohort. These samples belong to a Japanese population primarily infected with HBV/HCV [21]. We used the normalized read count values given in the gene expression file. Detailed clinical background information of the patients could be found in Additional file 1: Table S1. Studies using human tissues were reviewed and approved by the Committees for Ethical Review of Research involving Human Subjects of Guangzhou Medical University. The studies were conducted in accordance with International Ethical Guidelines for Biomedical Research Involving Human Subjects (CIOMS). All patients gave written informed consent for the use of their clinical specimens for medical research.
Statistical analysis and signature score generation The differential expression profiles between tumor tissues and the normal liver tissues were generated based on the normalized expression value of RNA-seq data. Independent student's t test was used to compare the mean expression level of two different groups. One-way ANOVA test was used to compare means between 3 and more subgroups. The test was performed in Graph-Pad Prism 5 (La Jolla, CA, USA). Kaplan-Meier survival curves of the two risk groups were plotted and the logrank P value of the survival difference calculated between them. The association of SOX signature subgroups with clinical features was examined by Pearson's χ 2 test. Univariate and multivariable Cox proportional hazards regression was used to assess association with overall survival using SPSS v19 (IBM, Inc., Chicago, IL, USA). P value less than 0.05 was considered statistically significant. The oncogenic dedifferentiation SOX signature was generated by taking into account the expression of individual sox family genes and their clinical association with patient overall survival time. A SOX signature score was calculated according to the expression of each signature gene. HCC patient with overexpression (defined as the normalized expression value above median in the tumor tissues) of each sox signature gene will be given "1" score. The sum of the 5 SOX signature genes (SOX3, SOX4, SOX11, SOX12, SOX14) forms the final SOX signature score. Patients with SOX signature score value greater than 2 was defined as "High SOX signature group", and with score value less than and including 2 was defined as "Low SOX signature group". The cBio Cancer Genomics Portal was used to establish a network connection of SOX signature targets and other closely associated genes [22,23]. Gene ontology analysis and signaling pathway analysis was performed using DAVID Bioinformatics Resources [24,25].

RNA extraction and quantitative real-time PCR
Total RNA was extracted using TRIZOL Reagent (Life technologies, Carlsbad, CA), and reverse transcription was performed using an Advantage RT-for-PCR Kit (Clontech Laboratories, Mountain View, CA) according the manufacturer's instructions. For qPCR analysis, aliquots of double-stranded cDNA were amplified using a SYBR Green PCR Kit (Life technologies, Carlsbad, CA) and an ABI PRISM 7900 Sequence Detector. Sequences of primers used in this study were listed in Additional file 2: Table S2. For cell lines, the relative gene expression is given as 2 −ΔCT (ΔCT = CT (gene) -CT (18S)) and normalized to the relative expression that was detected in the corresponding control cells. For clinical samples, we calculated the relative expressions of target genes in clinical HCCs and their matched nontumor specimens by the formula 2 −ΔCT (ΔCT = CT (target genes) -CT (18S)) and normalized to the average relative expression in all of the nontumor tissues, which was defined as 1.0.

Immunohistochemical staining (IHC)
Paraffin-embedded tissue sections were deparaffinized and rehydrated. Slides were immersed in 10 mM citrate buffer and boiled for 15 min in microwave oven and then incubated with primary antibody at 4°C overnight in a moist chamber and then sequentially incubated with biotinylated general secondary antibody for 1 h at room temperature, streptavidin-peroxidase conjugate for 15 min at room temperature. Finally, the 3, 5-diaminobenzidine (DAB) Substrate Kit (Dako, Carpinteria, CA) was used for color development followed by Mayer's hematoxylin counterstaining.

Compiling a biology-based prognostic dedifferentiation SOX gene signature in HCC
Considering the important roles of the SOX gene family in regulating stem cell pluripotency, tumor cell plasticity and differentiation, we tried to establish a SOX gene signature to monitor tumor differentiation and stratify patient overall survival in HCC. To comprehensively analyze the expression profile and prognostic significance of SOX family members in HCC, The Cancer Genome Atlas (TCGA) hepatocellular carcinoma cohort was divided chronologically into a training cohort (TCGA-LIHC Cohort I, n = 188) and a validation cohort (TCGA-LIHC Cohort II, n = 189). The mRNA expression data and clinical data were downloaded using the UCSC XENA portal. The demographics of these cohorts were well balanced, and the clinical pathological information was shown in Additional file 1: Table S1. The relative expression of all 19 SOX family members excluding SRY, which was absently expressed in both liver and HCC tissues, was compared in the 188 HCC cases from TCGA-LIHC Cohort I and 50 normal liver tissues from TCGA-LIHC project. Most of the SOX family members were found to be aberrantly expressed in HCC. SOX2, SOX3, SOX4, SOX11, SOX12, SOX13, SOX14, SOX18, and SOX21 were found to be significantly up-regulated in HCC. SOX5, SOX6, SOX7, and SOX10 were found to be significantly down-regulated in HCC (Table 1). Kaplan-Meier survival analysis showed that SOX3, SOX4, SOX11, SOX12, SOX14, and SOX17 were significantly associated with patient overall survival (Table 1). Taken together, SOX3, SOX4, SOX11, SOX12, and SOX14 were aberrantly expressed in HCC with prognostic significance, and were selected as SOX signature genes for further validation (Fig. 1a). The significant upregulation of the SOX signature genes were further confirmed by qPCR in 21 paired HCC clinical samples (Additional file 3: Figure S1). Overexpression of the representative SOX signature gene SOX11 was also found in paired HCC tissues by IHC staining (Additional file 4: Figure S2).

The SOX signature represents an oncogenic dedifferentiation phenotype
In clinical pathology, tumor grade represents the extent of how tumor tissues resemble their normal counterparts. High grade tumors usually show oncogenic dedifferentiation phenotypes. The expression of SOX signature genes was examined in subgroups of patients with different tumor grade. A progressive increase of SOX signature genes could be found from low grade HCC patients to high grade HCC patients (Fig. 1b). In addition, the expression of SOX signature genes also progressively increases from early stage HCC patients to late stage HCC patients (Fig. 1c). Poorly differentiated tumors usually indicate the activation of cancer stem cells or progenitor cells. This process is accompanied with increase of stem cell markers, and decrease of terminal differentiation markers. We further established a score system to quantitatively define the SOX signature in HCC patients. Patient with overexpression (defined as the normalized expression value above median level in the tumor tissues) of each sox signature gene will be given "1" score, and the sum of the 5 SOX signature genes forms the final SOX signature score. We examined the liver cancer stem cell or progenitor markers (AFP, CD133, EPCAM, and KRT19), and hepatocyte terminal differentiation markers (ALB, G6PC, CYP3A4, and HNF4A) in subgroup of patients with different SOX signature scores. A significant positive correlation of liver cancer stem cell or progenitor markers, and a significant negative correlation of hepatocyte terminal differentiation markers with SOX signature scores could be found in the HCC patients ( Fig. 2a and b). These findings indicated that the SOX signature represents an oncogenic dedifferentiation phenotype, and is activated in high grade and late stage tumors.

Prediction of the SOX signature-regulated transcriptional network
Considering the SOX family members are transcriptional factors that regulate gene expression, the binding motifs and downstream targets of SOX signature genes were predicted using a systems genetics approach [26]. The common downstream targets of the five SOX signature genes were plotted using the online Venn diagram tool (http://bioinformatics.psb.ugent.be/webtools/Venn/). A total of 245 genes were found to be commonly regulated by the SOX signature (Fig. 3a, Additional file 5: Table  S3). High-frequency binding motifs of each SOX signature genes were also predicted (Fig. 3b). The downstream targets of SOX signature genes formed a comprehensive network, which closely associated with critical transcriptional regulators of embryonic development including TP53, ZEB1, SMARCA2, and JARID2 (Fig. 3c). Gene ontology analysis also revealed the signaling pathways significantly associated with SOX signature target genes (Fig. 3d).

The association of SOX signature with clinical pathological features in HCC
To investigate the clinical significance of SOX signature, the patients were further classified into two subgroups. The "High sox signature group" was defined with a sox signature score greater than 2, and the "Low SOX signature group" was defined with a sox signature score less than and including 2. The association of the SOX signature with clinical pathological features were examined by Pearson's χ 2 test in the TCGA-LIHC Cohort I ( Table 2). The five-gene SOX signature was further tested in two independent clinical cohorts for validation using the same risk score threshold chosen in the TCGA-LIHC cohort I. The association of the SOX signature with clinical pathological features were also examined by Pearson's χ 2 test in the TCGA-LIHC Cohort II and the LIRI-JP Cohort ( Table 2).

The relation between the SOX signature and the prognosis of HCC patients
Kaplan-Meier survival analysis showed that the "High SOX signature group" had significantly worse overall survival than the "Low sox signature group" in the TCGA-LIHC Cohort I (HR = 4.045, 95% CI = 2.174-7.525, P = 0.000). The progressive decrease in mean survival time could also be found when the curves were plotted according to different sox signature scores (Fig. 4a). The SOX signature significantly stratified the TCGA-LIHC cohort II for overall survival (HR = 1.618, 95% CI = 1.023-2.560, P = 0.040) (Fig. 4b, Table 3). In a second independent LIRI-JP Cohort, again using the same risk score in the TCGA-LIHC cohort I, the SOX signature was also able to significantly stratified patients for overall survival (HR = 2.012, 95% CI = 1.031-3.926, P = 0.041) (Fig. 4c). In addition, Cox proportional hazards regression analysis further indicated the SOX  The normalized expressions of SOX signature genes were compared between HCC patient subgroups with different tumor grade. c The normalized expressions of SOX signature genes were compared between HCC patient subgroups with different tumor stage. Independent student's t test, *, P < 0.05, **, P < 0.01, ***, P < 0.001, ****, P < 0.0001, ns, not significant. The figures were generated using GraphPad Prism 5 Fig. 2 The SOX signature represents an oncogenic dedifferentiation phenotype. a The normalized expressions of liver cancer dedifferentiation markers and liver progenitor cell markers in HCC patients with different SOX signature score. b The normalized expressions of hepatocyte terminal differentiation markers in HCC patients with different SOX signature score. One-way ANOVA test. P value less than 0.05 was considered statistically significant. The figures were generated using GraphPad Prism 5 signature as a promising predictor of patient overall survival both in the univariate overall survival analysis (Table 3). These results suggested that our newly established oncogenic dedifferentiation SOX signature could robustly predict HCC patient's overall survival in multiple clinical cohorts.

Discussion
Clinical observation of poorly differentiated tumors preserving lineage characteristics of their developmental precursor cells, indicated the strong link between tumor aggressiveness and embryonic developmental [27,28]. Hepatocellular carcinoma (HCC) is one of the most common cancers in the world, with very poor prognosis and limited treatment methods [29]. Like many other tumors, HCC also gains embryonic-like properties, such as elevated expression of alpha-fetoprotein (AFP), which should only appear in fetal liver development. A subtype of HCC, which was usually characterized by molecular markers of bipotential hepatic progenitor cells such as CD133, EPCAM, and CK19, is predicted to have an extremely poor prognosis. [28] The critical transcriptional factors and their regulated signaling pathways governing lineage specification in development are reactivated in cancer cells and substantially contribute to malignant phenotypes such as tumor growth, metastasis, and resistance to chemotherapeutic drugs [30,31]. Further targeting the oncogenic driving events according to tumor dedifferentiation status might provide novel therapeutic strategy for cancer treatment [32,33]. However, biomarkers which effectively reflect the extent of HCC tumor dedifferentiation and predict patient's outcome are still lacking currently.
In the present study, we developed a novel oncogenic dedifferentiation SOX signature and a score system to monitor the extent of tumor dedifferentiation in HCC. Taking into account the expression of individual SOX family genes and their clinical association with patient overall survival time, five SOX family members were selected as SOX signature genes. A progressive increase of liver cancer dedifferentiation markers was found from HCC patients with low SOX signature scores to patients with high SOX signature scores. Conversely, hepatocyte terminal differentiation markers were found to be  Fig. 4 The prognostic significance of SOX signature genes in multiple HCC clinical cohorts. a The patients in the training set (TCGA-LIHC Cohort I, n = 188) were divided into "High sox group" and "Low sox group" according to the SOX signature score. Kaplan-Meier survival curves of the two risk groups were plotted and the log-rank P value of the survival difference calculated between them (Upper panel). Kaplan-Meier survival curves of HCC patients from subgroups with different SOX signature score (Lower panel). b Similar analysis was down in the testing set (TCGA-LIHC Cohort II, n = 189). c and validated in an independent validation set (LIRI-JP Cohort, n = 232). P value less than 0.05 was considered statistically significant. The figures were generated using SPSS v19  with our previous experimental findings that the dedifferentiated tumor cells with stem cell-like properties are usually more aggressive, easy to metastasis, and resistant to chemotherapeutic drugs [34][35][36]. Previous molecular sub-classifications of liver cancer mainly focused on the genomic mutational landscapes and molecular signaling alterations of the tumors [37]. Recent data from genomic profiling enabled the proposals of different molecular clusters of HCCs according to their proliferation index, cellular origins and immune responses [38][39][40][41]. Interestingly, all the newly established classification models mentioned the evidence of a stem cell or progenitor celllike properties of poor prognostic liver tumors. However, no previous reports mentioned the molecular biomarkers in defining the differentiation status and predict prognostic significance of those embryonic-related tumors. To date, several liver cancer stem cell markers such as CD133, EPCAM, CD44, KRT19 et al. have been identified and well characterized. However, due to the multiple hierarchy of stem cell progeny and the heterogeneity of the tumor, it's difficult to define a tumor dedifferentiation state using a single cell surface marker. Considering the tumor dedifferentiation process is driven by transcriptional reprograming, we for the first time tried to define tumor differentiation status using a combination of pluripotent transcriptional factors instead of cell surface markers. Instead of stem cell or progenitor biomarkers, sox family are transcriptional factors that regulated a broad range of gene expression and critical cell fate determinants. The SOX family transcriptional factors are critical in embryonic stem cell pluripotency and tumor lineage plasticity [42,43]. Liver cancer stem cell or progenitor biomarkers are usually also expressed on normal stem cells or regenerating hepatocytes, and their expression in the tumors are not necessarily up-regulated in the tumor tissues. This makes it difficult to quantify and discriminate cancer stem cells in evaluating patient prognosis. However, sox family genes are mostly expressed in embryonic stem cells and aberrant expression of SOX family members was also frequently found in HCC patients. Thus, using a combination of SOX family transcriptional factors might comprehensively represent the differentiation status of HCC patients and classify patients for precision oncology further in the clinic. and collection, analysis, and interpretation of data and in writing the manuscript.

Availability of data and materials
The RNA-seq mRNA expression data and clinical pathological data of liver cancer from the LIHC project of TCGA was downloaded from the website: https://tcgadata.nci.nih.gov/tcga/. The data was downloaded using the University of California Santa Cruz cancer genomics data portal UCSC Xena (https://xena.ucsc.edu/). A total of 232 samples with RNA-Seq mRNA expression data and clinical pathological data from the ICGC portal was downloaded from the website: https://dcc.icgc.org/projects/LIRI-JP.
Ethics approval and consent to participate Studies using human tissues were reviewed and approved by the Committees for Ethical Review of Research involving Human Subjects (CERRHS) of Guangzhou Medical University. The studies were conducted in accordance with International Ethical Guidelines for Biomedical Research Involving Human Subjects (CIOMS). All patients gave written informed consent for the use of their clinical specimens for medical research.

Consent for publication
Not applicable.