In this study, we used a TSPG simulation to identify tumor to normal gene expression state transitions and determine which genes exhibit aberrant expression in thyroid tumors relative to the normal thyroid gland. In essence, this is a precision medicine approach where we place an individual’s tumor (n = 1) into the context of other tumor samples. We previously used this approach to identify gene shifts in a single patient with Type II papillary renal cell carcinoma, but in that study we did not consider tumor stage [13]. Here, we examined 60 individuals consisting of 10 each of six different sample types, including stage 1–4 THCA tumors from TCGA and normal thyroid tissue from TCGA or GTEx. The results from the 10 samples in each class were pooled together to represent their sample type and used to investigate whether there were consistent genetic signatures for each stage of THCA. These randomly selected samples included tumors of multiple subtypes of THCA across all stages (Table S3). Future research might validate the results of this study using a limited set of samples of only one subtype of thyroid cancer to assess potential bias in pooling samples with varying primary diagnoses.
The t-SNE plot in Fig. 1A shows that the THCA tumor samples of all stages typically segregate from the normal thyroid samples. There are a few stage 1, stage 2, and stage 3 tumors that appear to cluster with the normal tissue, meaning those tumor samples have similar gene expression profiles to normal thyroid samples. These may represent tumors with very few or very small expression changes or could potentially be a sign of error in labeling or contamination with normal tissue during the RNA sequencing process. There is also evidence of two separate clusters of tumor samples containing approximately the same proportion of each stage. Further examination is needed to determine the significance and cause. Some possible causes for this discrepancy may be gender, age, and race of the individuals from whom the samples were obtained as well as the cancer subtype, or metastasis status of the samples.
The representative heatmap provided in Fig. 1B demonstrates the importance of the perturbations discussed in this study. The expression vector of the tumor sample is distinct from that of the average expression in normal thyroid tissue. However, the tumor with perturbations applied shows expression levels that are very similar to the average levels from normal thyroid samples. The t-SNE plot also supports the success of TSPG because all the samples that were perturbed with a target class of GTEx normal thyroid tissue cluster with the normal thyroid samples, indicating similar gene expression profiles (Fig. 1A).
Across all sample types, there was a greater average number of unique significantly up-regulated genes than unique significantly down-regulated genes in the original sample compared to expression in GTEx normal thyroid tissue (Fig. 2). Despite this difference, the proportion of shared perturbed genes to total unique perturbed genes between two samples of the same type was similar in the tumor-downregulated and tumor-upregulated directions across tumor samples of all stages (Fig. 4).
We expected many of the genes related to normal thyroid function would be consistently down-regulated in all tumor samples. However, this was not supported by the functional enrichment because three of the tumor classes had no significantly enriched terms for their tumor-downregulated gene sets (Table S6). Additionally, the stage 4 tumor-downregulated genes were mostly enriched for terms related to translation, none of which were unique to this gene set because they were also significant for the down-regulated genes in both normal sample types.
There were some genes that were consistently up-regulated or down-regulated in tumor samples of all or most stages, which suggests they play a role in tumorigenesis and may maintain a consistent abnormal expression level even as the tumor progresses (Table 3). Additionally, there were some unique genes found that were significantly up- or down-regulated in all tumor samples of only one stage. These are good candidate genes for future studies to investigate thyroid cancer progression to later stages.
Some genes were found to be consistently up- or down-regulated in certain sample types (Table 3). SLC6A15 was down-regulated in all individuals with stage 1, 2, or 3 THCA. Previous studies suggest this gene acts as a tumor suppressor [25]. PLA2G12B was up-regulated in all tumor samples. Previous studies have observed relationships between other genes in the phospholipases A2 (PLA2) superfamily with various cancers, like over-expression of PLA2G5 correlated with poor prognosis in patients with glioma tumors and differential expression of some PLA2 genes in normal colons and colon adenocarcinomas [26, 27]. Proteins belonging to the PLA2 superfamily are involved in metabolism, which means they may be an important indicator of normal thyroid function [28]. Therefore, disruption of PLA2 protein expression levels, as seen in all THCA tumor samples in this study, may be a potential signal of abnormal thyroid function and thyroid cancer development. REN was up-regulated in all later stage (stages 3 and 4) tumor samples. This gene codes for the enzyme renin, which is a component of the renin-angiotensin system [29]. Previous research indicates that the renin-angiotensin system has a role in multiple cancer types, likely due to its regulation of angiogenesis [30, 31]. Angiogenesis is an important process in cancer because increased blood flow to the tumor allows it to grow larger [32]. Since tumor size is part of the staging system, with larger tumors classified as later stages, it is possible that the up-regulation of REN, which induces angiogenesis, is a factor in the progression of cancer to a later stage.
There were no significantly perturbed genes that were found in all 10 GTEx normal-to-GTEx normal control samples tested (Table 3). This suggests that the perturbations applied to the GTEx normal thyroid samples, to make them appear as the target of GTEx normal thyroid tissue, were unique to individuals and represent the expected natural variations in expression among healthy individuals [33]. Whereas there were common positive and negative perturbations among the tumor samples classified as the same stage, which indicates common changes in expression that lead to the cancer phenotype. However, it should be noted that the normal thyroid samples from TCGA also had a relatively large number of common significantly up-regulated genes in all 10 samples. It is not clear whether this is a result of batch error which differentiates the TCGA and GTEx datasets, which was corrected using the methods described by Wang et al. [18]. One future experiment that may reveal more about this occurrence is to classify all normal thyroid tissue from either TCGA or GTEx together as “normal” to see if using the average expression of all normal samples as the target would eliminate the apparent difference between samples from TCGA and GTEx.
Table 2 shows that the significant perturbations applied to the normal thyroid samples, from either GTEx or TCGA, tend to be smaller than those applied to THCA tumor samples of any stage. This is a result of our method for determining the significantly perturbed genes for each sample which included those that were greater than 2 standard deviations away from the mean perturbation value within that sample. Since any of the genes in the normal samples required only small perturbations to match the average normal expression levels, Fig. 2 shows there were similar numbers of genes that were considered significantly perturbed in tumor and normal samples. There do not appear to be consistent differences in the number or value of perturbations between stages of THCA.
In this study, genes with a perturbation value of two standard deviations greater or less than the mean of all perturbations for a particular sample were deemed significantly perturbed. We were most interested in these genes that required the largest changes in expression to return to normal values because they would represent the genes that are most strongly up-regulated or down-regulated in tumors so likely have a role in tumorigenesis. However, past research has suggested that small changes in gene dosage can have a role in cancer development [34]. Future research utilizing TSPG to understand an individual’s cancer progression could consider genes with a perturbation value exceeding a threshold based on the gene’s average expression level in normal samples in order to include more subtle changes to gene expression.
Functional enrichment results for the significant tumor-downregulated and tumor-upregulated genes in each sample type revealed some similarities among the gene sets. All sample types showed enrichment for olfactory receptor activity and related terms in the tumor-upregulated gene set (Table S5). This result for the tumors aligns with previous research finding connections between abundance or stimulation of olfactory receptors and cancer [35, 36]. However, the enrichment of these functions in both normal-to-normal perturbed gene sets means these findings should be considered cautiously. We would not expect the genes perturbed in normal samples with a normal target to be enriched for relevant functions because those perturbations are expected to be random. The similar functional enrichment results for tumor-upregulated genes in all classes may indicate bias in the genes perturbed by TSPG or could potentially suggest the presence of unrecognized THCA tumor contaminating the TCGA or GTEx normal thyroid tissue. The lack of functional enrichment for down-regulated genes in tumors of stages 1, 2, and 3 was also surprising (Table S6). This is because we expected genes related to normal thyroid function to be down-regulated consistently among THCA tumor samples.
This research expands on the use of TSPG to determine how gene expression in individual tumor samples differs from that of the corresponding normal tissue. We analyzed 10 samples from each sample type perturbed toward a target of normal thyroid tissue (GTEx) to identify consistent changes in expression in different stages of THCA. This method could generate gene sets that could be used to classify tissue samples based on clinical attributes such as pathologic stage. Future experiments could explore the classification accuracy of these significantly perturbed gene sets using an existing public repository [37]. Another area to develop our understanding of differences between stages of cancer would be to look at differential expression of the identified candidate genes in samples from each stage. We also propose the use of these methods in other cancer types to determine if the differences between stages are unique to each cancer or if there are similarities.