A weighted gene co-expression network
We identified a total of 26 modules in the GSE45216–98774 dataset and highlighted these separately with different colors. We calculated correlations between modules and clinical characteristics, as shown in Fig. 2a. According to the correlation between MEs and characteristics, module 5 was the most relevant for cSCC. This module contained 1742 genes (Supplementary Table 1). Module 23 was the most relevant for AK and included 31 genes (Supplementary Table 2). Module 9 was the most relevant for normal samples and included 352 genes (Supplementary Table 3).
We identified a total of 12 modules in the GSE108008 dataset and highlighted them separately with different colors. We calculated the correlations between modules and clinical characteristics, as shown in Fig. 2b. Module 5 was the most relevant for cSCC and had 249 genes (Supplementary Table 4). Module 10 was the most correlated module for AK and contained 324 genes (Supplementary Table 5). Module 7 was the most correlated module for normal samples and included 525 genes (Supplementary Table 6). Next, these modules would be further analyzed.
Gene ontology and pathway enrichment analyses
GO and KEGG pathway enrichment analyses were performed for genes in the module 5 (relevant for cSCC) of the GSE45216–98774 dataset. The most overrepresented GO terms are listed in Fig. 3a and Supplementary Table 7. They were associated with glutathione, collagen, extracellular matrix and cytokine, among others. According to the KEGG database, the genes in module 5 were mainly enriched in the TNF signaling pathway, glutathione metabolism, cytokine-cytokine receptor interaction, hepatocellular carcinoma, colorectal cancer and focal adhesion, among others (Fig. 3b and Supplementary Table 8). GO and KEGG pathway enrichment analyses were performed for genes in module 23 (relevant for AK). No overrepresented GO term with an adjusted q < 0.05 was found. According to the KEGG database, genes in the module 23 were mainly enriched in one pathway of mineral absorption (Supplementary Table 9). GO and KEGG pathway enrichment analyses were also performed for genes in module 9 (relevant for normal samples). Overrepresented GO terms were associated with scavenger receptor activity, among others (Supplementary Table 10). According to the KEGG database, genes in module 9 were mainly enriched in Wnt signaling pathway and complement and coagulation cascades (Supplementary Table 11).
GO and KEGG pathway enrichment analyses were performed for genes in module 5 (relevant for cSCC) of the GSE108008 dataset. Figure 3c and Supplementary Table 12 list the top overrepresented GO terms. Notably, they were associated with collagen and extracellular matrix, among others. According to the KEGG database, genes in module 5 were mainly enriched in one pathway of focal adhesion (Supplementary Table 13). We performed GO and KEGG pathway enrichment analyses for genes in module 10 (related to AK). The most overrepresented GO terms were associated with key enzymes in controlling the synthesis of fatty acid and triglycerides, among others (Fig. 3d and Supplementary Table 14). According to the KEGG database, genes in the module 10 were mainly enriched for fatty acid metabolism, steroid biosynthesis and terpenoid backbone biosynthesis, among others (Fig. 3e and Supplementary Table 15). We also performed GO and KEGG pathway enrichment analyses for genes in module 7 (related to normal samples). The most overrepresented GO terms were associated with sulfur compound binding, Wnt-protein binding, extracellular matrix structural constituent and fibronectin binding, among others (Supplementary Table 16). According to the KEGG database, genes in module 7 were mainly enriched for Wnt signaling pathway (Supplementary Table 17).
According to the results of KEGG pathway enrichment for cSCC, focal adhesion was enriched in both datasets. Focal adhesions are composed of adhesins outside the cell membrane, integrins on the cell membrane and cytoskeletal proteins in the cell. The functions of focal adhesion are mechanical structure function and signal transmission function. Abnormalities in this pathway may be one of the causes of cSCC pathogenesis. Wnt signaling pathway for normal tissue was enriched in both datasets. Wnt signaling is a key pathway in controlling skin development and homeostasis, which indicates the importance of this pathway for the maintenance of normal skin function. As for AK, the two datasets were not enriched in common pathways associated with AK.
Detection of DEGs
By comparing the cSCC and AK samples in the GSE45216–98774 dataset, we identified a total of 1432 DEGs (Fig. 4a and Supplementary Table 18), including 722 down-regulated genes and 710 up-regulated genes. By comparing AK and normal samples, we found 696 DEGs (Fig. 4b and Supplementary Table 19). Among them, 377 genes were down-regulated and 319 genes were up-regulated. We identified 599 relevant DEGs present in all the comparisons, indicating that they may play continuous roles in AK and cSCC development (Fig. 4c).
By comparing cSCC and AK samples in the GSE108008 dataset, we identified a total of 183 DEGs (Fig. 4d and Supplementary Table 20), including 65 down-regulated genes and 118 up-regulated genes. By comparing AK and normal samples, we found 52 DEGs (Fig. 4e and Supplementary Table 21). Of these, 39 were down-regulated and 13 were up-regulated. We identified 46 relevant DEGs present in all the comparisons, indicating that they may play continuous roles in AK and cSCC development (Fig. 4f). Among them, 18 genes were present in both the GSE45216–98774 and GSE108008 datasets, including ACER1, ACSBG1, ACSL1, APOD, CST6, CYB5A, DGAT2, ELOVL4, FAXDC2, IL24, MET, MMP1, MMP10, MMP12, PLEK2, PSAPL1, PTHLH and STC1. Thus, these genes have a relatively high priority for future research.
Hub gene identification and validation
In the GSE45216–98774 dataset, 418 DEGs (from the cSCC/AK samples comparison) were also present in module 5 (relevant for cSCC). Ten DEGs (from the AK/normal samples comparison) were also present in module 23 (relevant for AK). One gene from module 23 and 51 genes from module 5 fulfilled the screening criteria of hub genes in the co-expression network.
In the GSE108008 dataset, 21 DEGs (from the cSCC/AK samples comparison) were also present in module 5 (relevant for cSCC). Thirty-two DEGs (from the AK/normal samples comparison) were also present in module 10 (relevant for AK). Sixteen genes from module 5 and 30 genes from module 10 fulfilled the criteria for hub genes in the co-expression network.
We considered the hub genes present in both the GSE45216–98774 and GSE108008 datasets as validated hub genes. Finally, there were seven hub genes for cSCC, including GATM, ARHGEF26, PTHLH, MMP1, POU2F3, MMP10 and GATA3. We visualized the expression of these seven hub genes between cSCC and AK samples by plotting heatmaps (Fig. 5a and b). However, no hub gene was validated for AK.
Hub gene expression in pan-cancer
To understand the expression of hub genes in other cancers, we investigated the expression levels of 7 hub genes in primary patient tumors of 21 cancer types that have at least 3 paired adjacent normal samples (Fig. 6 and Supplementary Table 22). All hub genes except POU2F3 showed significant differential expression in different cancer types. However, the direction of the altered expression varies for each gene and for each cancer type. Although PTHLH, MMP1 and MMP10 were mainly up-regulated in the 21 cancer types, the rest of the members GATM and ARHGEF26, were primarily down-regulated with a few exceptions. GATA3 was both up-regulated and down-regulated in different cancer types.
Association of hub gene with patient overall survival in pan-cancer
We further tested whether the expression of hub genes could also predict patient survival outcomes in other cancers. For the survival analysis, all 33 cancer types were tested with the Kaplan–Meier method. The results showed that each of the hub genes was significantly associated with the survival of several cancer types (Supplementary Table 23); however, the direction of the association varied depending on the member queried and the cancer type tested. More specifically, increased expression of MMP1, MMP10 and PTHLH was mainly associated with increased survival disadvantage. MMP1 predicted poor prognosis of patients with ACC, CESC, KICH, KIRP, LIHC, LUAD, MESO, PAAD, SARC and UVM. MMP10 predicted poor prognosis for patients with LGG, LIHC, MESO and SARC. PTHLH predicted poor prognosis for patients with KIRP, LGG, LIHC, MESO, PAAD, THCA and UVM. In contrast, increased expression of GATM, ARHGEF26 and POU2F3 was primarily associated with survival advantage and predicted better survival. GATM predicted good prognosis for patients with ACC, KIRC, SARC and UCEC. ARHGEF26 predicted good prognosis for patients with KIRC, OV and PAAD. POU2F3 predicted good prognosis for patients with STAD and THCA. The rest of GATA3 were associated with either survival advantage or disadvantage depending on the cancer types. In more detail, increased expression of GATA3 predicted poor prognosis for patients with ACC, GBM, KIRP, LGG and UVM, but predicted survival advantage for patients with BLCA, SKCM and THYM. It is worth noting that only GATA3 was significantly associated with the survival of SKCM, another skin cancer.