A multi-step approach was used to identify and analyze the autophagy-dependent ferroptosis-related key gene in LUAD. The transcriptome and clinical information were downloaded from The Cancer Genome Atlas (TCGA) project and Gene Expression Omnibus (GEO) data. Autophagy-dependent ferroptosis-related genes were screened by the published articles. Differentially expressed genes (DEGs) related to autophagy-dependent ferroptosis were identified. Univariate and multivariate Cox analyses were applied to screen out the independent prognosis genes related to overall survival (OS). The key gene was identified by the intersection of the DEGs and the prognostic genes. The LUAD patients were classified into the high-risk and low-risk groups based on the key gene expression level. Kaplan-Meier (K-M) analysis and receiver operating characteristic (ROC) curve were conducted to analyze the survival prognosis of patients in TCGA and GEO cohorts. Chemotherapy sensitivity was predicted between different risk groups. Gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) were conducted to investigate the potential bio-function of the key gene. ImmuneScore was calculated using the Tumor Immune Estimation Resource (TIMER) algorithm, and the TMB was counted as the total number of mutations per megabyte of tumor tissue.
LUAD patients dataset processing
All the RNA-Seq data were normalized as fragments per kilobase of transcript per million mapped reads. mRNAs ensemble gene identities were derived from the HUGO Gene Nomenclature Committee (HGNC) database. The corresponding clinical information includes age, gender, tumor grade, lymph node metastasis, AJCC TNM stages, and survival outcomes. Patients with insufficient clinical data were excluded. OS was estimated as the primary endpoint.
Construction and validation of an autophagy-dependent ferroptosis-related gene signature
Autophagy-dependent ferroptosis-related genes were retrieved from the literature published before January 2021. After combining the related mRNA expression and the clinical data, the gene expression files were obtained. The DEGs between LUAD and normal lung tissues were identified with a false discovery rate (FDR) < 0.05 in the TCGA cohort. Univariate and multivariate Cox analyses of OS were performed to screen the genes with prognostic values in TCGA-LUAD cohort. The key gene was identified by the intersection of the DEGs and the prognostic genes in the TCGA cohort. The cut-off score was defined as the median expression level of the key gene in the LUAD cohort. Patients were stratified into high-risk and low-risk groups based on the cut-off score. To choose appropriate matching cohorts to perform survival analysis before selecting prognostic-related genes, we performed propensity score matching to reduce the selection bias between the high- and low-risk groups. Propensity scores were estimated using age, gender (male versus female), TNM stage (I, II, III, IV), Tumor stage (T1, T2, T3, T4), Lymph Node stage (N0, N1, N2, N3) and Metastasis stage (M0, M1) in TCGA-LUAD cohort. In the same way, propensity scores were estimated using age, gender (male versus female), TNM stage (I, II, III, IV), and TP53 (Wild versus Mutant) in the GEO cohort. The prognostic value and the clinical correlation of the key gene were both validated between the high- and low-risk groups in TCGA and GEO cohort (GSE116959). The time-dependent ROC curve analyses were conducted to evaluate the predictive power of the key gene. The mRNA expression level of the key gene in various types of cancers was identified in the Oncomine database . The mRNA and protein expression of the key gene in LUAD were determined using the Gene Expression Profiling Interactive Analysis (GEPIA) and The Human Protein Atlas (HPA) database [17, 18]. To verify the correlation between ferroptosis and LUAD outcome, we also analyze the survival value of GPX4 in the LUAD cohort, which is the master regulator of ferroptosis .
Chemotherapeutic response prediction
We analyzed the commonly used chemotherapy drugs, including pemetrexed, cisplatin, gemcitabine, paclitaxel, vinorelbine, docetaxel, doxorubicin, etoposide, erlotinib, and gefitinib . The chemotherapeutic response prediction was made based on the TCGA-LUAD cohort using the “pRRophetic” R package . The half maximal inhibitory concentration (IC50) of patients in different risk groups were compared.
Functional enrichment analysis
The biological functions and pathways of the key gene were elucidated through the DEGs between the high-risk and low-risk groups. GO enrichment and KEGG pathway analyses  were then assessed in DAVID database. The correlation analysis of the key gene with tumor proliferation and cell cycle markers was conducted in GEPIA database .
Correlation between the key gene and tumor immune cell infiltration
The enrichment levels of immune cells were quantified by the Tumor Purity, Estimate Score, Immune Score, and Stromal Score in each sample. The tumor immune cell infiltration was calculated by Single Sample Gene Set Enrichment Analysis (ssGSEA). Then we analyzed the correlation between the key gene expression and the abundance of infiltrating immune cells (B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils, and dendritic cells) via The Tumor IMmune Estimation Resource (TIMER) database .
Analyses of somatic mutations and TMB estimation
The somatic mutation profiles of LUAD patients were downloaded from TCGA database. The mutation frequency with the number of variants/the length of exons (38 million) were calculated for each sample. The OncoPlot of the top 10 mutated genes was plotted. The detailed mutational information, including the variant classification, the number of variant type, and the single-nucleotide variant (SNV) class, were displayed. Then we assessed the correlation between the key gene expression and the TMB levels.
Quantitative real-time PCR and immunohistochemistry
The mRNA and protein expression of the key gene in LUAD were determined using quantitative real-time PCR (qRT-PCR) and immunohistochemistry. The clinical tissue samples of LUAD were obtained from patients who received surgery in Thoracic Oncology Department of Sun Yat-sen University Cancer Center, which was approved by the Institutional Review Committee of Sun Yat-sen University Cancer Center. The detailed procedure was performed according to strict adherence to the manufacturers’ instructions.
For qRT-PCR, 24 paired LUAD and normal tissues were resected and stored in RNAlater immediately. Total RNA was extracted using TRIzol reagent. cDNA was synthesized from total RNA using cDNA reverse transcription kit (Thermo Fisher Scientific). qRT-PCR was performed using the SYBR Green PCR kit (Thermo Fisher Scientific). The housekeeping gene GAPDH was used as an endogenous control. Primer information: FANCD2: 5′- AAAACGGGAGAGAGTCAGAATCA-3′ (forward) and 5′- ACGCTCACAAGACAAAAGGCA-3′ (reverse); GAPDH: 5′- GGAGCGAGATCCCTCCAAAAT-3′ (forward) and 5′- GGCTGTTGTCATACTTCTCATGG-3′ (reverse). The cycle threshold (Ct) of each gene in samples was recorded. Relative quantification was calculated as 2-ΔCt (ΔCt values = target gene mean Ct value - control gene mean Ct value).
Twenty pairs of LUAD immunohistochemistry samples were fixed using 10% formalin and embedded in paraffin. Immunohistochemistry was carried out using the processed 5 μm continuous sections. Samples were dewaxed with decreasing concentrations of 100, 95, 75, and 50% ethanol and washed in deionized water. The sections were heated in a microwave with TE buffer pH 9.0 to retrieve antigens. Endogenous peroxidase was inhibited by incubation in goat serum. Then they were incubated in rabbit anti-FANCD2 (Proteintech, 204006–1-AP, 1:200) overnight at 4 °C. Next, the sections were incubated with horseradish peroxidase-coupled goat anti-rabbit secondary antibody and stained using DAB Detection Kit (Polymer). The following process is cell nucleus staining, dehydration, xylene infusion, and mounting . The immunohistochemical scores were scored by two independent pathologists. The intensity of FANCD2 expression was scored as zero, negative; one point, weak staining; two points, mild staining; three points, strong staining. The positive stained area percentage (PSAP) of FANCD2 expression was scored as 1, 0–25%; 2, 25–50%; 3, 50–75% and 4, 75–100%. FANCD2 IHC score = Intensity score × PSAP score.
Statistical analysis and R software packages
Significance analysis of microarrays was used to screen the differentially expressed genes between the LUAD and normal lung tissues. Univariate Cox proportional hazards model was used to analyze the association between gene expression level and prognosis. Kaplan-Meier method and Log-rank test were used to evaluate the difference between survival curves. The continuous and categorical variables between the two risk subtypes were compared using the two-sided Wilcoxon rank-sum test and chi-square test, respectively. Benjamini-Hochberg method was used to adjust for multiple hypothesis testing. All P values were 2-sided, and P < 0.05 was considered statistically significant.
Statistical analyses and result visualization were performed using R software v3.6.3, v4.0.5 and v4.1.2 (“pheatmap v4.0.5”, “limma v3.6.3 ”, “survival v3.6.3”, “survminer v3.6.3”, “ggpubr v3.6.3”, “survivalROC v3.6.3”, “car v3.6.3”, “ggridges v3.6.3”, “genefilter v4.1.2”, “ggpubr v3.6.3”, “pRRophetic v4.1.2”, “ggplot2 v3.6.3”, “colorspace v4.0.5”, “stringi v4.0.5”, “clusterProfiler v4.1.2 ”, “enrichplot v4.1.2” and “maftools v4.1.2” R package).