Patient characteristics and data collection
Data from 208 patients who underwent craniotomy and were diagnosed with primary DHGG at Beijing Tiantan Hospital, China, were retrospectively reviewed. Patients who previously underwent surgery for other intracranial diseases or had any other concomitant malignant diseases were excluded from the study. All patients had received the standard treatment, including maximal safe resection and adjuvant radiotherapy and/or chemotherapy, depending on their tumor grade. Demographic data (age and gender) and the main complaints from patients were retrieved from an electronic medical record system. Tumor samples collected during craniotomy were flash frozen in liquid nitrogen and stored at -80 °C. Hematoxylin and eosin staining was performed, and tumor tissues containing > 80% tumor cells were selected for further analysis. Molecular pathological characteristics, including IDH1/2 mutation, chromosome 1p19q co-deletion, and MGMT promoter methylation status were assessed through pyrosequencing or fluorescence, as previously described [12, 13]. The pathological diagnosis of each patient was re-evaluated by integrating histopathology and molecular biomarkers according to the 2016 WHO classification of tumors of the central nervous system[14]. Transcriptomic data of samples collected during surgery were generated on the Agilent Whole Human Genome Array platform, as previously described [15]. RNA expression data were converted into fragments per kilobase million matrix and used in subsequent analysis.
Preoperative GRE was defined as at least one unprovoked epileptic seizure prior to surgery based on the patients' main complaint and history of illness. All patients received prophylactic AEDs after surgery, including phenobarbital injections administered within 3 d and valproate or levetiracetam tablets administered for at least 3 months. Seizure outcome was followed up via telephone interviews conducted 12 months after surgery, and postoperative seizure occurrence was defined as at least one unprovoked epileptic seizure occurring between the day of discharge and the day of follow-up. Patients with overall survival time of less than 12 months were excluded from the study. All data analyzed here, except those related to seizures, have been uploaded to the CGGA portal (http://www.cgga.org.cn/) for open access [16].
Identification of differentially expressed genes and enrichment analysis
Patients were randomly assigned to either a training or validation cohort in a 4:1 ratio. The prediction model was established using the training cohort and validated using the validation cohort. The R package “Limma” was used to screen for differentially expressed genes (DEGs) between patients with or without preoperative GRE under the criteria of adjusted p-value < 0.05 (Benjamini and Hochberg Method) and |log2(fold change)|≥ 0.8. Gene Ontology (GO) enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis[17,18,19], and Gene Set Enrichment Analysis (GSEA) were conducted using the R package “clusterProfiler.” Adjusted p < 0.05 was used as the cut-off threshold. Enrichment analyses used the Molecular Signatures Database (MSigDB) as a reference.
Assessment of tumor-infiltrating immune cells
Immune cell infiltration in tumor samples was analyzed using the “CIBERSORT” algorithm (https://cibersort.stanford.edu/), which was used to evaluate the cell composition of tumor samples based on gene expression profiles. The proportions of 22 types of tumor-infiltrating immune cells were evaluated according to the LM22 signature genes file, with 1000 permutations. Tumor infiltration levels in patients with and without preoperative GRE were compared.
Identification of a predictive gene-signature of postoperative seizure occurrence
The DEGs were used to construct a gene-signature for postoperative seizure occurrence using logistic regression, regularized with the least absolute shrinkage and selection operator (LASSO) algorithm. The analysis was conducted using the R package “glmnet.” The risk score for each individual was calculated using the formula:
Risk score = \(\sum_{{\varvec{i}}=1}^{{\varvec{n}}}{{\varvec{\beta}}}_{{\varvec{i}}}{{\varvec{x}}}_{{\varvec{i}}}\)(1)
Where, x represents the expression level of genes selected to construct the gene-signature, and β represents the corresponding regression coefficients.
The performance of the risk score was evaluated using receiver operating characteristic (ROC) analysis and the area under the curve (AUC), implemented using the R package “pROC.” Patients in the training and validation cohorts were then divided into high-risk and low-risk groups based on the median risk score value.
Weighted correlation network analysis
Weighted correlation network analysis (WGCNA), implemented using the R package “WGCNA,” was conducted to identify key modules and genes with the highest correlation with preoperative GRE history, postoperative seizure occurrence, and the obtained gene-signature. Age, WHO grade, IDH1/2 mutation status, chromosome 1p/19q codeletion status, preoperative GRE history, postoperative seizure occurrence, and risk scores were included as variables in the analysis. Risk scores and patient age (with a cut-off of 45 years) were included as dichotomous categorical variables. The top 5000 genes (based on variance) were screened, and those with β = 5 (R2 ≥ 0.9) were selected and used to construct an unsigned scale-free co-expression gene network. The smallest module contained at least 30 genes, and analogous modules were fused at a height cut-off of 0.25. Genes in the obtained key modules were extracted for further analysis.
Identification of hub genes
WE generated a new set of DEGs based on risk scores (RDEGs) by identifying genes that were differentially expressed between high-risk and low-risk groups and selecting genes with potential effect on both the occurrence and outcome of GRE. Genes overlapping upregulated or downregulated RDEGs and genes extracted from WGCNA were selected. Protein–protein interaction (PPI) analysis was performed using the R package “STRINGdb” to explore the interactions among these genes. Genes showing high numbers of PPI interactions (those interacting with more than five other nodes) were identified as hub genes. The expression levels of hub genes in the validation cohort were also evaluated.
Construction of a prediction model for postoperative seizure occurrence
A multivariate logistic regression analysis of clinical and gene-signature data was conducted to identify the independent risk factors associated with postoperative seizure occurrence. Statistically significant factors (p < 0.05) were included in the final prediction model, and a corresponding predictive nomogram for poor seizure control was constructed using the R package “rms.” Finally, ROC analysis and AUCs were used to evaluate the accuracy of the final prediction model using the training and validation cohorts.
Statistical analysis
Statistical analysis was performed using R software version 4.0.5 (R Foundation for Statistical Computing, Vienna, Austria, www.r-project.org). Figures were generated using R packages, including “ggolot,” “pheatmap,” and “ggraph.” Chi-square and Fisher’s exact tests were used to evaluate the statistical significance of categorical variables. Patients with missing data were excluded from multivariate analyses. Statistical significance was set at p < 0.05, unless otherwise stated.