The Dual Effect of CD28 on The Prognosis of Lung Cancer Based On Nomograms: A Comprehensive Analysis of The Tumor Immune Microenvironment

Lung cancer has ranked first in incidence among malignancies in China in recent years, and tumor immune microenvironment (TIME)-related molecules may serve as biomarkers for the prognosis of lung cancer. Nomograms are widely used tools for the evaluation of prognosis in malignancies. We performed this study to construct nomograms based on TIME for predicting the prognosis of lung cancer. Univariate and multivariate analyses were performed to estimate prognosis. TIME-related variables and basic clinical characteristics were included in the nomograms. Discrimination and calibration were used for the internal validation of the nomograms. Patients in our center and in the TCGA database were involved in the construction of the nomograms.


Introduction
In China, lung cancer has had the highest incidence of all malignancies over the past decade, with a 5-year overall survival (5-year OS) of 19.8%, which is lower than the global average [1]. Lung cancer is currently divided into two pathological types, small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC), and NSCLC accounts for almost 85% of lung cancer patients [2]. Moreover, lung cancer is increasingly diagnosed in young patients (aged less than 45 years) [3]. Recent studies have provided evidence that lung cancer, especially lung adenocarcinoma (LUAD), in young patients demonstrates unique biological and genomic characteristics [4,5], which makes it necessary to establish a novel model for evaluating the prognosis of young LUAD patients.
Advances in established and novel therapeutics such as surgery, chemotherapy, radiotherapy, targeted therapy and immunotherapy benefit patients with lung cancer. However, not all patients respond to these therapeutics, and some develop resistance and die as a result of cancer invasion and metastasis. The tumor microenvironment (TME) has been suggested to be an important regulator of the biological behaviors of malignancies, especially lung cancer [6]. With the development of cancer immunotherapy, tumor immune microenvironment (TIME)-related molecules may serve as new biomarkers for the prognosis of lung cancer. TIME can be divided into four types according to the expression status of programmed death-ligand 1 (PD-L1) and tumor-infiltrating lymphocytes (TILs): Type I (PD-L1-, TILs-), Type II (PD-L1+, TILs+), Type III (PD-L1+, TILs-) and Type IV (PD-L1-, TILs+). The type of TIME is an important biomarker for the prediction of immunotherapy sensitivity [7]. However, to our knowledge, there is no prognostic model based on TIME, and the role of TIME in young lung adenocarcinoma patients remains unclear.
Nomograms are widely used tools for the clinical evaluation of prognosis in malignancies such as gastric cancer [8], breast cancer [9] and lung cancer [10]. In previous studies on nomograms, researchers mostly focused on the basic characteristics and clinical information of patients but paid less attention to pathological changes, especially the TIME status. To our knowledge, there is no nomogram based on TIME for the prognostic prediction of lung cancer. Given that TIME plays an important role in the biological behaviors and prognosis of cancer, we performed this study to analyze the relationship between TIME-related molecules and the prognosis of lung cancer and established novel nomograms based on TIME to predict the disease-free survival (DFS) and OS of LUAD and lung cancer patients.
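The four-way TIME classification described above can be written as a small lookup. The following is a minimal Python sketch; the function name and the boolean inputs are illustrative assumptions, and how PD-L1 or TIL positivity is decided (for example, by a cutoff on staining intensity) is left outside the sketch.

```python
def time_type(pd_l1_positive: bool, tils_positive: bool) -> str:
    """Return the TIME type (I-IV) for a PD-L1 / TIL status pair.

    Type I:   PD-L1-, TILs-
    Type II:  PD-L1+, TILs+
    Type III: PD-L1+, TILs-
    Type IV:  PD-L1-, TILs+
    """
    if pd_l1_positive and tils_positive:
        return "II"
    if pd_l1_positive:
        return "III"
    if tils_positive:
        return "IV"
    return "I"
```

For example, a PD-L1-positive tumor with positive TILs maps to Type II, the type described in this paper as sensitive to immune checkpoint inhibitors.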

Study patients
Seventy-one young patients (age < 45 years) with resectable LUAD treated in our center from March 2013 to June 2016 were identified and included in this study. Hematoxylin and eosin (H&E) staining was used for the diagnosis of LUAD. The patient characteristics are shown in Table 1. In addition, 472 LUAD patients and 809 lung cancer patients derived from The Cancer Genome Atlas (TCGA) database were included in this study. This study was approved by the Ethics Committee of the Affiliated Hospital of Qingdao University, and the investigations were carried out following the rules of the Declaration of Helsinki.
Written informed consent was obtained from all patients included in the study, and all experiments were carried out in accordance with the guidelines of the National Health and Family Planning Commission of the PRC. Basic information was collected from all patients in our center, including age, sex, post-surgery treatment methods, smoking history and family history. The amplification refractory mutation system (ARMS) method based on a ten-gene panel examination was performed to detect the status of epidermal growth factor receptor (EGFR), anaplastic lymphoma kinase (ALK) and other basic genomic alterations related to lung cancer. Regular follow-up was conducted after the surgery of each patient, and recurrence was defined as lung cancer occurring at any site in the patient. All tumors were staged according to the 2019 American Joint Committee on Cancer (AJCC) TNM staging system for lung cancer. Disease progression was diagnosed by two professional physicians in clinical medical oncology.

Immunohistochemistry analyses
Five-µm-thick sections were cut from paraffin-embedded tissues for immunohistochemistry (IHC) analyses. Antigen retrieval was performed by boiling the slides in 10 mM citrate buffer, pH 6.0, for 10 min followed by cooling at room temperature for 20 min. Each section was incubated with primary antibodies overnight at 4 °C. The investigators evaluated the IHC staining based on the mean optical density (MOD), as previously described [11,12]. All sections were scanned by NanoZoomer slide scanners (NanoZoomer-XR C12000, Hamamatsu) and viewed with NDP.view software (NDP.view2 U12388-01, Hamamatsu). Five views (200×) of each section were randomly collected by an experienced pathologist using NDP.view. The section MOD was calculated as the average MOD of the five views using Image Pro Plus 6.0 (Media Cybernetics, Inc.). The ROC curve of each variable was calculated by SPSS 23.0 (SPSS, Inc.), and Youden indexes were used to determine the best cutoffs. CD3+ and/or CD8+ sections were divided into two pathological types [13], the center of tumor (CT) and invasive margins (IMs), by a pathologist. As we used CD3 to represent the total lymphocytes and CD8 to represent cytotoxic T lymphocytes (CTLs), the statuses of PD-L1 and CD3 were used for the classification of TIME types.
Fragments per kilobase of transcript per million fragments mapped (FPKM) was used for the calculation of ribonucleic acid (RNA) expression derived from the TCGA database, and the other statistical methods were the same as above. The cutoffs of all variables determined by Youden indexes are shown in Table S1.
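The Youden-index cutoff selection mentioned above can be illustrated in a few lines. The sketch below is not the SPSS procedure used in this study; it is a minimal Python version that assumes higher values indicate the event of interest, and the function name and the example values are hypothetical.

```python
def youden_cutoff(values, events):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    A sample is called positive when value >= cutoff. `events` holds 1 for
    the outcome of interest (e.g. progression) and 0 otherwise.
    Assumption: larger values point towards the event.
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    best_cutoff, best_j = None, -1.0
    # Every observed value is tried as a candidate cutoff.
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, events) if e == 1 and v >= cutoff)
        tn = sum(1 for v, e in zip(values, events) if e == 0 and v < cutoff)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # on ties, the lowest cutoff is kept
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical MOD values and progression labels, for illustration only.
mod_values = [0.10, 0.12, 0.15, 0.20, 0.22, 0.30]
progressed = [0, 0, 0, 1, 1, 1]
cutoff, j = youden_cutoff(mod_values, progressed)
```

With a cutoff in hand, each marker can be dichotomised into high and low expression before the survival analyses; a marker for which lower values indicate the event would need the comparison reversed.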

Statistical analysis
Univariate analysis was performed to select candidate variables and to estimate the relationship between TIME and survival. Then, multivariate analysis using the Cox proportional hazards model with a threshold of P < 0.2 was used to select variables for our prognostic model [8] to reduce the influence of sample size. Principal component analysis (PCA) was used to detect the most valuable variable in our model and to investigate the relationships between that variable and other molecules. All of the processes above were conducted with SPSS 23.0 (SPSS, Inc.).
The "rms" package of R software version 3.1.2 (The R Foundation for Statistical Computing, Vienna, Austria) was used for the construction of the nomograms. The variables selected by Cox regression were involved in the construction process. The nomogram for DFS was based on the data from our center, while the nomograms for OS were based on the TCGA database. Discrimination and calibration were used for the internal validation of the nomograms. Harrell's C-indexes ranging from 0.5 (no discrimination) to 1 (perfect discrimination) were used for the verification of discrimination [14]. Visual calibration plots were used for the verification of calibration [10]. Bootstrap analyses with 1000 resamples were used for these analyses.
Harrell's C-index and the area under the ROC curve (AUC) were used to compare the discrimination ability between nomograms and single variables for DFS and OS [8]. All figures in our study were produced by R software version 3.1.2 (The R Foundation for Statistical Computing, Vienna, Austria), SPSS 23.0 (SPSS, Inc.) and GraphPad Prism 8.0 software. P values were two-tailed for all tests. P < 0.05 was used to define statistical significance except for the selection of nomogram variables.
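To make the discrimination measure concrete, the following minimal Python sketch computes Harrell's C-index for right-censored data. It is an illustration under simplifying assumptions (pairs with tied survival times are skipped, and no bootstrap correction is applied), not the "rms" implementation used in this study; the function and argument names are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       follow-up time per subject
    events:      1 if the event was observed, 0 if censored
    risk_scores: higher score = higher predicted risk (shorter survival)

    A pair is comparable when the subject with the shorter time had an
    observed event. It is concordant when that subject also has the higher
    risk score; ties in risk score count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Subject i must fail strictly earlier than subject j is observed.
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A model whose risk scores order every comparable pair correctly returns 1.0, while a model assigning the same score to every subject returns 0.5, which matches the interpretation of the range given above.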

Patient characteristics and survival analyses
Seventy-one young LUAD patients were selected for inclusion in our study. The basic characteristics of the patients are summarized in Table 1. The mean age was 40.4 years, ranging from 27 to 45 years. The cohort of young LUAD patients from our center differed clearly from the worldwide lung cancer population. In our cohort, 73.24% of young patients harbored EGFR or ALK mutations. The three most frequent genomic alterations were EGFR exon 19 deletions (31%), the EGFR exon 21 L858R point mutation (23%) and echinoderm microtubule-associated protein-like 4 (EML4)-ALK fusion (14%), which indicates that young patients may be more likely to benefit from EGFR-tyrosine kinase inhibitors (EGFR-TKIs) or ALK-TKIs. The basic information of the genomic alterations is shown in Figure S1E. In addition, 472 LUAD patients and 809 lung cancer patients from the TCGA database were also included in our study.
As shown in Fig. 1, the IHC sections of all variables were collected for MOD calculation. The positive and negative staining comparisons of PD-L1 and CD28 are shown in Fig. 1A, 1B, 1D and 1E. CD3 was used to represent total lymphocytes, and the positive and negative staining IHC images are shown in Fig. 1G and Fig. 1H, respectively. In our cohort, there was no significant difference in DFS between the CT and IM groups. The IHC sections of PD1 and CD8 are shown in Figure S1A-D. In this study, we compared the MOD between the progression group (disease recurring before the end of follow-up) and the stable group and found a significant difference between them. Figure 1C demonstrates that a higher mean MOD of PD-L1 was detected in the progression group than in the stable group (P = 0.0032), which is similar to the results for CD28 (P < 0.0001), as shown in Fig. 1F. This finding revealed that PD-L1 and CD28 may be associated with rapid progression in young LUAD patients. However, the MOD of CD3 showed a different distribution, and a higher mean MOD was detected in the stable group (P < 0.0001), as shown in Fig. 1I, which indicated that TILs may be a biomarker of long DFS in young LUAD patients. The mean MODs of PD1 and CD8 were calculated, and no significant differences were found between the two groups.
Survival analyses based on five variables related to TIME, including CD28, PD-L1, PD1, CD3 and CD8, were performed in this study. Thereafter, PD-L1 + CD28 (MOD_PD-L1+CD28) and the CD3/CD8 ratio (MOD_CD3/CD8) were also included in the survival analyses. The results revealed that TIME-related molecules are closely associated with the prognosis of LUAD and lung cancer, as shown in Figure S2, Fig. 3A-D and Fig. 4A-D, especially in young LUAD patients, as shown in Fig. 2A to Fig. 2F. Patients with a higher expression of CD28 had a shorter DFS (young LUAD: P < 0.0001; LUAD: P = 0.0011; lung cancer: P = 0.0001) but a longer OS (LUAD: P = 0.016; lung cancer: P = 0.0282). Higher expression of PD-L1 was associated with a worse prognosis for lung cancer patients in both DFS (young LUAD: P = 0.0382; LUAD: P = 0.0002; lung cancer: P = 0.0001) and OS (lung cancer: P = 0.0032). Moreover, we found that PD-L1 + CD28 may be a better biomarker for DFS in young LUAD patients: a higher MOD_PD-L1+CD28 was associated with a shorter DFS (P = 0.0004), and according to the AUC of these three variables, the combination was more persuasive than either single variable in predicting DFS. TIL-related variables such as CD3 and CD8 were also found to be associated with DFS among young LUAD patients. CD3, which represents the total lymphocytes, was a convincing variable for predicting a long DFS in young LUAD patients (P < 0.0001) but a short DFS in LUAD patients (P = 0.0003) and lung cancer patients (P < 0.0001). This indicated that young LUAD patients showed a different response to TILs, which resulted in a different outcome. A higher CD3/CD8 ratio was associated with a longer DFS (P = 0.0005) in young LUAD patients, which suggests that a higher CTL proportion in total lymphocytes may be related to a worse prognosis in young LUAD patients.
Although PD1 and CD3 showed no statistical significance in survival analysis for LUAD and lung cancer patients, we found that these two variables were associated with a good long-term prognosis. Higher expression of the two variables was associated with a longer OS in LUAD and lung cancer patients. The ROC curves of these variables are shown in Figure S1.

Nomograms based on TIME
Eight variables were entered into the Cox regression, and five variables (P < 0.2), namely TNM stage, EGFR or ALK mutation status, CD28, PD-L1 and CD3, were selected for the construction of the DFS nomogram for young LUAD patients, as shown in Fig. 2G. Here, we constructed a novel type of nomogram in which the distribution of the basic characteristics of the patients involved is shown by the yellow peaks labeled "density". We provide an example here to explain our DFS nomogram. One of our patients, represented by red dots in the DFS nomogram, was used for the verification of the prognostic model. After the quantification of the basic characteristics, including stage, mutation status and the expression of TIME-related variables, we obtained a total score of 359 on the score axis, and the corresponding probability of a DFS shorter than 41 months was 50.9%. The C-index of the DFS nomogram was 0.913, as shown in Table 2, which revealed that our model for DFS has satisfactory and robust discrimination ability compared to any single variable involved.
The OS nomograms for LUAD and lung cancer patients are shown in Fig. 3E and Fig. 4E, respectively. The probabilities of OS less than 1 year, 3 years and 5 years were all evaluated by the OS nomograms. The C-indexes of the OS nomograms were 0.678 and 0.625, as shown in Table 2. Every single variable involved in these nomograms provides a score, and the total score can easily be calculated to estimate the probability of OS less than 1 year, 3 years and 5 years.

Validation of nomogram performance
Harrell's C-indexes revealed that the nomograms for young LUAD DFS (C-index = 0.913), LUAD OS (C-index = 0.678) and lung cancer OS (C-index = 0.625) showed satisfactory discrimination. Calibration of the nomograms was evaluated by calibration plots, as shown in Fig. 2H, Fig. 3F-H and Fig. 4F-H. The calibration plots showed that the probabilities predicted by our prognostic models agreed with the actual probabilities within acceptable limits (the dashed lines in the calibration plots correspond to a 10% margin of error), except for the 5-year OS in the LUAD OS nomogram (Fig. 3H). Overall, these results indicate that our prognostic models based on the nomograms showed satisfactory and robust discrimination and calibration for DFS in young LUAD patients and for OS in LUAD and lung cancer patients.

The dual effect of CD28 on lung cancer prognosis
Although PD-L1 plays an important role in tumor immune suppression, we conducted PCA and detected an important role of CD28 in TIME. Figure S3A-D describes the correlation between CD28 and other TIME-related variables derived from GEPIA based on the TCGA database [15]. Linear correlations were confirmed between CD28 and other variables, including CD3 (R = 0.66), CD8 (R = 0.43), PD-L1 (R = 0.22) and PD1 (R = 0.42). Low CD28 expression is associated with a higher proportion of type II TIME, which has been shown to be sensitive to immune checkpoint inhibitor (ICI) treatment. The TIME type distribution based on CD28 expression is shown in Figure S3E. Based on the TCGA database, we found a group of genes positively correlated with CD28, and the heatmap of these genes is shown in Figure S3F. Then, enrichment analysis was performed with "METASCAPE" [16] to detect the possible cell signaling pathways related to these genes, as shown in Figure S3G. CD28 is closely associated with immune-related cell signaling pathways in LUAD, especially the T cell activation pathway. According to the TCGA database, the expression level of CD28 is higher in primary tumor tissue than in normal control tissue, and a higher CD28 expression level is associated with an earlier TNM stage and less nodal metastasis, as shown in Figure S3H-J. The dual role of CD28 in the prognosis of lung cancer is shown in Table 3.
Table 3. The dual role of CD28 in the prognosis of lung cancer.

Discussion
Recent studies have suggested that TIME plays an important role in influencing the invasiveness and metastasis of lung cancer [17] and is related to tumor heterogeneity [18]. Multiple variables related to TIME, such as PD-L1 [19,20] and PD1 [21], have been used to predict the prognosis of lung cancer patients, but the relationship between CD28 and the prognosis of lung cancer patients remains unknown. Given that young lung cancer patients represent a special population among lung cancer patients, the predictive ability of TIME-related variables in young lung cancer patients needs further study. In this study, we analyzed the relationship between TIME-related variables and the prognosis of lung cancer, especially in young LUAD patients, and constructed nomograms for predicting DFS and OS based on TIME-related variables.
In young patients, TIME-related variables showed satisfactory predictive ability for prognosis. CD28, PD-L1 and CD3 were all convincing biomarkers related to disease progression with acceptable efficiency. The combination of PD-L1 and CD28 (MOD_PD-L1+CD28) showed better efficiency in predicting disease progression than any single variable. A recent study pointed out that the CD28 costimulatory pathway is required for CTL proliferation after PD-1/PD-L1 pathway blockade by ICIs [22], which indicated that after immune suppression induced by the PD1/PD-L1 pathway, the activation of the CD28 costimulatory pathway may rescue the suppression status. Therefore, CD28 may act as a biomarker for predicting the prognosis of ICI treatment. In our work, patients with a higher expression level of CD28 or PD-L1 + CD28 had a worse prognosis for DFS, suggesting that a high baseline expression of CD28 in these patients may exhaust the ability to reverse the tumor immune suppression status, resulting in a poor DFS. Given that the expression of CD28 is markedly higher in primary tumor tissue than in normal control tissue, T lymphocytes may be dysmature in tumor tissue [23], which results in an immune-suppressed microenvironment that may endow cells with oncogenic functions. Therefore, the high expression of CD28 may suppress the adaptive immune response to cancerous cells and act as a promoter in the early stage of lung cancer, which is related to the rapid progression of the disease. However, a high CD28 expression level was related to a long OS, which suggested a dual effect of CD28 on the prognosis of lung cancer. As the cancer develops, CD28 is lost in metastatic tissue, as shown in our results. Low expression of CD28 is associated with more metastatic lymph nodes and results in advanced disease.
From our perspective, CD28 may promote the development of cancer in the early stage, endowing cells with oncogenic functions, and is thus related to a worse DFS, but the metastasis suppressor function of CD28 may rescue OS. Moreover, patients with high baseline CD28 expression were less likely to have the ICI-sensitive type of TIME. In conclusion, CD28 can be a novel biomarker for predicting the sensitivity to immunotherapy and the prognosis of lung cancer patients, especially young LUAD patients. The dual effect of CD28 on lung cancer prognosis deserves attention in the pathological evaluation of newly diagnosed lung cancer patients.
Nomograms are a simple but effective method for predicting prognosis in medical oncology [24]. DFS and OS are major variables for the prognostic evaluation of lung cancer patients [25-27]. Our prognostic models based on nomograms for DFS and OS provide an easy way to estimate the prognosis of patients. Patients may obtain not only probabilities of disease progression and 1-year, 3-year or 5-year survival but also a precise and individualized follow-up regimen according to the models. The clinical significance of the models based on these nomograms remains to be verified in larger samples.
Admittedly, our study had some limitations. First, given that young LUAD patients represent only a small proportion of all lung cancer patients, the sample size of our study was not large enough. However, we followed the approach of a previously published study [8] and raised the P value threshold to 0.2 when selecting variables from the Cox regression models to reduce the influence of sample size on the construction of the models. In addition, several patients harboring EGFR mutations underwent EGFR-TKI treatment after surgery, but we did not include this treatment in our model. The influence of post-surgery EGFR-TKI treatment on DFS remains to be further studied.

Conclusion
In conclusion, TIME-related molecules, including CD28, PD-L1, CD3 and CD8, are closely associated with the prognosis of lung cancer patients, especially young LUAD patients. CD28 may be a novel biomarker for not only the prognosis of lung cancer but also the sensitivity to immunotherapy, and the dual effect of CD28 on lung cancer prognosis should be considered. The basic characteristics and TIME-related variables were involved in the construction of the nomograms to evaluate the prognosis of lung cancer patients in a simple but effective way. Here, we provide prognostic models based on nomograms for doctors to establish more individualized follow-up regimens for lung cancer patients. Admittedly, our models remain to be further verified, and more studies are needed for further research on TIME and lung cancer.