Predicting response to immunotherapy plus chemotherapy in patients with esophageal squamous cell carcinoma using non-invasive Radiomic biomarkers

Objectives To develop and validate a radiomics model for evaluating treatment response to immune-checkpoint inhibitor plus chemotherapy (ICI + CT) in patients with advanced esophageal squamous cell carcinoma (ESCC). Methods A total of 64 patients with advance ESCC receiving first-line ICI + CT at two centers between January 2019 and June 2020 were enrolled in this study. Both 2D ROIs and 3D ROIs were segmented. ComBat correction was applied to minimize the potential bias on the results due to different scan protocols. A total of 788 features were extracted and radiomics models were built on corrected/uncorrected 2D and 3D features by using 5-fold cross-validation. The performance of the radiomics models was assessed by its discrimination, calibration and clinical usefulness with independent validation. Results Five features and support vector machine algorithm were selected to build the 2D uncorrected, 2D corrected, 3D uncorrected and 3D corrected radiomics models. The 2D radiomics models significantly outperformed the 3D radiomics models in both primary and validation cohorts. When ComBat correction was used, the performance of 2D models was better (p = 0.0059) in the training cohort, and significantly better (p < 0.0001) in the validation cohort. The 2D corrected radiomics model yielded the optimal performance and was used to build the nomogram. The calibration curve of the radiomics model demonstrated good agreement between prediction and observation and the decision curve analysis confirmed the clinical utility. Conclusions The easy-to-use 2D corrected radiomics model could facilitate noninvasive preselection of ESCC patients who would benefit from ICI + CT. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-021-08899-x.


Background
Esophageal cancer (EC) is the seventh most common cancer cause of death in male population worldwide [1]. China accounts for more than half of the world's new cases and EC-related deaths, with more than 90% diagnosed EC being esophageal squamous cell carcinoma (ESCC) [2]. Surgery, chemotherapy and radiotherapy are the cornerstone treatments of EC [3,4]. However, outcomes are still poor with a 5-year survival rate of 10-15% [5]. The emerging targeted drugs used to treat EC are only targeting HER2 or vascular endothelial growth factor [6][7][8], and the therapeutic effect of improved traditional treatments with added targeted drugs is still unsatisfactory with a 5-year survival rate of 30-40% for ESCC [9]. Therefore, there is a high clinical need for novel and more effective treatment options for EC patients.
In recent years, the study of KEYNOTE-028 and KEYNOTE-180 first confirmed the efficacy and safety of pablizumab in the treatment of advanced EC [10,11]. Whereafter, in a larger sample size, KEYNOTE-181 established the position of pablizumab in the treatment of advanced EC [12]. At present, a number of studies have been performed to explore the efficacy and safety of immunotherapy combined with chemotherapy as first-line and post-line treatment of advanced ESCC [13][14][15][16]. The comprehensive positive score (CPS), tumor proportion score (TPS) are immunohistochemical markers for evaluating the expression of programmed death receptor ligand 1 (PD-L1) in tumors. However, the precision of these biomarkers was unsatisfying. Therefore, more reliable biomarkers for predicting the efficacy of immunotherapy for EC is in urgent need.
With the rapid development of artificial intelligence (AI) in the field of medical imaging, radiographic characteristics of tumors referred to as 'radiomics' have shown success in immunotherapeutic response prediction for different tumor types [17][18][19]. To the best of our knowledge, there is no evidence yet in EC. In this study, we aimed to evaluate the potential predictive value of CT-derived radiomics in advanced ESCC patients receiving immune-checkpoint inhibitor plus chemotherapy (ICI + CT).

Study design
A total of 64 patients with advance inoperable ESCC receiving 200 mg every 3 weeks of Sintilimab plus Docetaxel (60 mg/m 2 ) and Carboplatin (AUC = 5) at two centers between January 2019 and June 2020 were included in this study approved by the two institutional review boards. All patients in two centers had undergone MDT study before starting treatment. Informed consent was waived. In this study patients were confirmed by biopsy and immunohistochemistry of the original tumor tissue. All the enrolled patients were first-visit and prior to treatment. Patients who had never received cancer related treatments including radiotherapy, chemotherapy, comprehensive treatment and surgery and those who lacked CT imaging data and necessary clinical information before the initial treatment (immunotherapy plus chemotherapy) were excluded from this study. Exclusion criteria also included patients with non-squamous cell carcinoma including adenocarcinoma and signet ring cell carcinoma and those who discontinued treatment due to adverse events. Flow chart of patient enrollment is shown in Fig. 1. For patients' clinical characteristics, information of age, gender, Body Mass Index (BMI), clinical TNM stage, hemoglobin, blood albumin, leucocyte, C-reactive protein and underlying diseases was acquired from electronic medical record system. BMI was calculated based on height and weight. Clinical TNM stage was confirmed by pre-treatment gastroscopy, CT examination, etc.

Response kinetics and scan protocol
Contrast enhanced computed tomography (CE-CT) scans were acquired before (baseline) and around six weeks (two cycles) after start of treatment (follow-up). Treatment response was evaluated by assessing the relative change in diameter between baseline and follow-up, using RECIST 1.1 criteria [20]. Patients were divided into responders [complete/partial response disease] and non-responders [stable and progressive disease] according to RECIST. For progressive disease, pseudoprogression was confirmed by follow-up observation.
All preoperative enhanced CT images were obtained with multidetector CT scanners during inspiration. Detailed information of the CT scanners including manufacturer, country of origin, tube voltage, slice thickness and spacing was shown in Supplementary Table 1. Iopromide (300 mg I/m1, Schering Pharmaceutical Ltd) was used as the contrast agent for enhanced scanning protocol, and 80-100 ml was injected at 3-4 ml/s flow rate.

Lesion segmentation and Radiomics features extraction
All enhanced CT images were manually segmented with an open-source software ITK-SNAP (http://www. itksnap.org/pmwiki/pmwiki.php) for feature extraction. 2D ROI was selected as the slice with maximum axial diameter of the tumor, and 3D ROI was segmented slice by slice on the whole volume of the lesions.
To correct variability from spatial information in three axes (x, y, z) and different CT protocols, all enrolled CT images were resampled to a same isotropic voxel spacing. Considering the distribution of our data, we resampled the 2D ROIs to 1 × 1 mm 2 , and the 3D ROIs to 1 × 1 × 1 mm 3 to balance between the loss of in-plane information and the interpolation of out-of-plane information. Afterwards, the CT radiomics features, from 2D and 3D ROIs respectively, were extracted with an opensource python platform Pyradiomics (version 2.1.2, https://pyradiomics.readthedocs.io/en/latest/#). Features used in this study included 14 shape-based features, 18 first order statistics features and 68 texture features containing the gray-level co-occurrence matrix (GLCM, 22 features), gray level run length matrix (GLRLM, 16 features), gray level size zone matrix (GLSZM, 16 features) and gray level dependence matrix (GLDM, 14 features). Besides the original images, eight filters were also generated for feature extraction, including wavelet transform filter. All the categories of features other than shape originated from the original and filtered images were calculated. Therefore, in this study, a total of (18 + 68 + 14) + (18 + 68) *8 = 788 features were statistically analyzed.

Feature selection
Feature selection was performed separately for each group of features. Three steps were applied to reduce dimensionality: (1) features with variance larger than 0.8 were included for further analysis; (2) univariate feature selection was done by ANOVA (continuous variable) or chi-square test (discrete variable) to explore the associations between features and treatment response. The features with p value>0.05 would be excluded from further analysis; (3) the most significant features were selected by the least absolute shrinkage and selection operator (LASSO) method. Since the total patient number was limited, the nonzero feature coefficients ranking the first five were selected for each group to avoid overfitting.

Prediction models and workflow
After feature selection, traditional machine learning algorithms, including support vector machine (SVM), k nearest neighbors, random forest, decision tree (DT), logistic regression (LR), were applied to build prediction radiomics models for each feature group. The performance of the models was compared by using 5-fold crossvalidation in the validation cohort, with the best model being selected. All the patients were randomly split into 80% for training and the remaining 20% for validation, with 100 iterations. All feature selection and radiomics algorithm selection were based on the data in the training dataset to ensure independence from validation dataset.
Radiomic nomogram was built based on the multivariable logistic analysis of the selected radiomics features in the training group. Calibration curves accompanied by the Hosmer-Lemeshow test were plotted to evaluate the effectiveness of the radiomics nomogram. Decision curve analysis was conducted to determine the clinical usefulness of the radiomics nomogram by quantifying the net benefits at different threshold probabilities in the validation dataset. Flow chart of radiomics nomogram building was illustrated in Fig. 2. Fig. 2 Flowchart of feature selection and radiomics nomogram building. (A) Lesion segmentation and 2D ROI and 3D ROI segmentation; (B) A total of 788 selected features for 2D and 3D ROI respectively; (C) ComBat correction was applied to minimize the potential bias on the results due to different scan protocols of the 5 different CT scanners; (D) Dimension reduction for features selection; (E) Select the optimal algorithm for radiomics model building. The best one was selected by using 5-fold cross-validation in the validation cohort. All the patients were randomly split into 80% for training and the remaining 20% for validation, with 100 iterations; (F) Radiomic nomogram was built on the optimal algorithm. Calibration curves and decision curves were used to evaluate the effectiveness of the radiomics nomogram

Statistical analysis
Statistical analyses were performed by using SPSS 22.0 (IBM, USA). Variables were described as frequency (n%). The chi-square test was used to compare patients' basic information between groups (responders versus nonresponders) and P < 0.05 was considered statistically significant. All machine learning analyses were performed by using the Python package scikit-learn (0.19.0), and statistical plots were generated by R software (3.6.1, http://www.R-project.org). Area under the Receiver-Operating Characteristic Curves (AUCs) were calculated to evaluate the performance of the algorithms for each model, and the Youden Index was used to generate the optimal threshold to convert probabilities into binarized labels. Statistical metrics, including accuracy, sensitivity, specificity, NPV (Negative Predictive Value), PPV (Positive Predictive Value) and AUC were also calculated to evaluate the performance of the ultimate selected algorithm in the training cohort and the validation cohort for the different radiomics models. Wilcoxon rank test with Bonferroni correction was applied for multiple comparisons, and p < 0.0125 was considered statistically significant.

Basic Clinicopathological characteristics
A total of 64 patients were included in our study, including 32 (50%) responders and 32 (50%) non-responders. Patients' clinicopathological characteristics were given in Table 1. No significant difference was observed in underlying diseases between non-responders and responders both in the training and validation cohorts, with P value>0.05 respectively.

Features and optimal Radiomics algorithm selection
For the four different radiomics models including 2D uncorrected, 2D corrected, 3D uncorrected and 3D corrected models, feature selections were performed  Table 2. Algorithms of SVM, KN, RF, and LR were applied to build radiomics models for 2D and 3D ROIs by using selected features from the training cohort, and their performances were compared. The results showed that relatively higher AUC (0.804, 95% CI: 0.800-0.822) could be obtained by using SVM algorithm for the training dataset (Supplementary Table  2). Finally, SVM with the best performance was selected for further evaluation of the performance of radiomics models.

Radiomics models performance based on SVM algorithm
To evaluate the performance of our models in classifying patients according to their treatment response, we used the SVM algorithm. Good performance of the four different radiomics models using SVM algorithm was observed for the probability of responders ( The performance of the four different radiomics models was compared by AUCs as shown in Fig.3. The 2D models outperformed the 3D models (2D uncorrected vs. 3D uncorrected, p < 0.0001; 2D corrected vs. 3D corrected, p < 0.0001) in the training cohort, which was confirmed in the validation cohort (2D uncorrected vs. 3D uncorrected, p < 0.0001; 2D corrected vs. 3D corrected, p < 0.0001). When the ComBat correction was used, the performance of 2D models was better (p = 0.0059) in the training cohort, and significantly better (p < 0.0001) in the validation cohort. There was no improvement for 3D models when integrated with the ComBat correction (training cohort, p = 0.17; validation cohort, p = 0.018).

Development, performance and validation of individualized Radiomics nomogram
Quantitative nomograms for predicting the probability of responders were constructed separately for the four groups of features, of which the one based on the 2D corrected model is shown in Fig.4 (A). The calibration curves of the four different radiomics models [ Fig. 4 (B) & (C)] estimating the probability of responders demonstrated good agreement between prediction and observation in the training cohort and validation cohort. For the 2D corrected radiomics model, the Hosmer-Lemeshow test yielded a nonsignificant P value of 0.160 in the training cohort and 0.478 in the validation cohort, suggesting the perfect match between the actual (Y-axis) and nomogram-predicted (X-axis) responders. The 2D corrected model also achieved good discrimination performance with AUC of 0.843 (95% CI, 0.736-0.950) within the training cohort and 0.914 (95% CI, 0.775-1.000) in the validation cohort (Table 4). For

Clinical use
The decision curve was used to compare the benefit of the four different radiomics nomogram, treat-all and treat-none scheme, as shown in Fig. 4 (D). The results showed relatively good performance for the models in terms of clinical application and indicated that all the models added more benefit than either the treat-all or

Discussion
This study aimed to evaluate the prediction efficacy of pre-therapeutic CT imaging based radiomics models in treatment response of patients with advanced ESCC receiving anti-PD-1 antibodies plus chemotherapy. In the new era of artificial intelligence (AI), radiographic characteristics automatically calculated by computer is more objective and makes more accurate quantitative analysis possible [21][22][23][24]. Our study is the first attempt to predict treatment efficacy of ICI + CT in advanced ESCC prior to treatment using CT radiomics model. The quantitative approach has the potential to identify the responders before treatment. Due to the long-time debate on whether to use oneslice 2D annotation or whole-volume 3D annotation especially for advanced cancer [25][26][27], in our study, the comparison between 2D and 3D radiomic features was Decision curves showed relatively good performance for the models in terms of clinical application and indicated that all the models added more benefit than either the treat-all or treat-none scheme within the threshold between 30 and 60%. Moreover, the 2D corrected model achieved the highest benefit if the threshold probability of a patient was between 50 and 70% also performed. We found that 2D radiomic features significantly outperformed 3D features, which was similar to the reported [26][27]. In a multicenter study of advanced gastric cancer [27], the performances of 2D and 3D CT radiomic features were compared in discriminating lymph node metastasis, lymphovascular invasion as well as pT stages' classification. They found that 2D model outperformed 3D model with higher AUCs regarding the above three tasks despite different resampling spacings. Similar findings were also reported by another study [26] in which the prognostic prediction performances were compared between 2D and 3D CT radiomics features in patients with non-small cell lung cancer (NSCLC). They found that 2D Cox model had a higher C-index compared with 3D Cox model. The results of our study showed that 2D models performed significantly better than 3D ones, which might be attributed to more noise of 3D ROIs originated from multi-slice manual annotations and inconsistent resolutions of the transverse plane and z-plane [26][27]. Therefore, in this scenario, 2D models are recommended in ESCC radiomics researches for the better performance and time-saving annotations.
The ComBat function compensation method is a datadriven method correcting for differences in features caused by the various imaging protocols [28][29]. Com-Bat correction was applied in our study to control potential bias on the results caused by different CT scanning schemes such as tube voltage, reconstruction kernel, slice thickness, and in-plane resolution. This method showed efficiency in 2D models by standardizing the CT images obtained from different CT scanners, and achieved the highest AUCs in both training and validation cohort. In addition, higher net benefits could be obtained with ComBat correction in decision curve analysis, thus patients could benefit from treatment optimization and avoid unnecessary risks.
In this study, the proposed radiomics model provided potential clinical utility from the following perspectives. For patients with advanced unresectable esophageal cancer, the established radiomics model could screen out the potential responders to ICI + CT prior to treatment which would improve effeacy. On the other hand, due to the high cost of immunotherapy, preselecting the potential responders prior to treatment could reduce the economic burden to patients and maximize their benefits, which was particularly important in developing countries. In addition, as a non-invasive biomarker, CT imaging could overcome the problem of tumor heterogeneity. Some other indicators, such as PD-L1 expression, obtained by fine needle aspiration biopsy could not represent its real status in the whole tumor tissue, so the detection results might be biased due to tumor heterogeneity. Finally, in our study, we recommended 2D radiomics features because one-slice 2D annotation was a much more timesaving data processing with significantly higher prediction efficacy than that of whole-volume 3D annotation.
This study has several limitations. First, our findings deserve further extra external validation with larger sample size and inclusion of other medical centers. A large-scale study enrolling more patients is deserved and may definitely help validate and improve its applicability as an effective prediction tool for assisting treatment decision making. Second, due to the limited patient number of other histologic types of EC in our center, adenocarcinoma and signet ring cell carcinoma, therefore, were not included in the present study. This limits the application of the built model to some extent. Third, due to the limited spatial resolution of CT, there may be bias in the determination of the boundary between the lesion and the normal esophageal tissue when conducting ROIs segmentation.

Conclusions
In conclusion, the proposed CT-based radiomics model performs well and thereby is expected to serve as an alternative tool to select the potential best responders to ICI + CT prior to treatment for patients with ESCC, thus can assist treatment decision making in the clinical setting.