  • Research article
  • Open access
  • Published:

Machine learning-based Radiomics analysis for differentiation degree and lymphatic node metastasis of extrahepatic cholangiocarcinoma



Abstract

Background

Radiomics may provide more objective and accurate predictions for extrahepatic cholangiocarcinoma (ECC). In this study, we developed radiomics models based on magnetic resonance imaging (MRI) and machine learning to preoperatively predict the differentiation degree (DD) and lymph node metastasis (LNM) of ECC.


Methods

A group of 100 patients diagnosed with ECC was included. The ECC status of all patients was confirmed by pathology. A total of 1200 radiomics features were extracted from axial T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI), and apparent diffusion coefficient (ADC) images. A systematic framework combining five feature selection methods with ten machine learning classification algorithms (classifiers) was developed and investigated. The predictive capabilities for DD and LNM were evaluated in terms of the area under the precision-recall curve (AUPRC), the area under the receiver operating characteristic (ROC) curve (AUC), negative predictive value (NPV), accuracy (ACC), sensitivity, and specificity. The prediction performance of the models was compared statistically using the DeLong test.


Results

For DD prediction, the combination of the feature selection method joint mutual information (JMI) and the Bagging Classifier achieved the best performance (AUPRC = 0.65, AUC = 0.90 (95% CI 0.75–1.00), ACC = 0.85 (95% CI 0.69–1.00), sensitivity = 0.75 (95% CI 0.30–0.95), and specificity = 0.88 (95% CI 0.64–0.97)), and the radiomics signature was composed of 5 selected features. For LNM prediction, the combination of the feature selection method minimum redundancy maximum relevance and the classifier eXtreme Gradient Boosting achieved the best performance (AUPRC = 0.95, AUC = 0.98 (95% CI 0.94–1.00), ACC = 0.90 (95% CI 0.77–1.00), sensitivity = 0.75 (95% CI 0.30–0.95), and specificity = 0.94 (95% CI 0.72–0.99)), and the radiomics signature was composed of 30 selected features. However, in the DeLong test, these two models were not significantly different from the other models with high AUC values, although they were significantly different from most of the remaining models.


Conclusions

MRI radiomics analysis based on machine learning demonstrated good predictive accuracy for DD and LNM of ECC. These findings shed new light on the noninvasive diagnosis of ECC.



Background

Cholangiocarcinoma (CCA), categorized into intrahepatic (ICC) and extrahepatic (ECC) forms, is a malignant neoplasm arising from the biliary epithelium and represents an estimated 3% of all gastrointestinal malignancies [1]. ECC, which originates from the bile ducts outside the liver parenchyma, accounts for approximately 80% of all CCA and is divided into two types: perihilar and distal cholangiocarcinoma [1, 2]. The incidence and mortality rates of ECC have reportedly been increasing gradually worldwide over recent decades, although this trend is not as well established as it is for ICC [3]. The prognosis of both ECC and ICC remains poor. Complete surgical resection is the only effective way to cure ECC, but it is appropriate only for patients with well-localized lesions [4]. The cure rate remains low for patients in advanced stages. Even after complete resection of the tumor, most patients may have a poor outcome (e.g., local recurrence, distant metastasis, or death), which is associated with the differentiation degree (DD) and lymph node metastasis (LNM) [5, 6]. Therefore, it is crucial to evaluate ECC accurately, especially the DD and LNM of the tumor, in order to select optimal treatment strategies and determine prognosis.

Several imaging techniques can be used to diagnose ECC, including ultrasonography (US) [7], computerized tomography (CT) [8], positron emission computerized tomography (PET-CT) [9], magnetic resonance imaging (MRI) [10, 11], and cholangioscopy [12]. At present, MRI is the imaging modality of choice for the diagnosis and staging of CCA, because its high soft-tissue contrast helps to detect and identify infiltrating tumors [10]. MRI has been reported to display clearly the morphology of bile duct strictures, such as irregularity and wall thickness [11], and to help assess regional lymph nodes, distant metastases, and survival outcomes of CCA [13]. It can therefore provide an accurate preoperative evaluation of CCA, resulting in better treatment selection and improved prognosis [13]. However, conventional techniques still have disadvantages. They rely to a large extent on the radiologist's subjective visual evaluation and are not quantitative methods for predicting DD and LNM of ECC [14]. The images also contain a large number of digital features related to the cells, physiology, and genetic variation of patients that cannot be recognized by the human eye and are therefore hard to analyze visually [14].

Radiomics, a recently introduced methodology, refers to quantitative, algorithm-based extraction of features from conventional imaging data, including CT, MRI, and PET images [15,16,17]. In contrast to the subjective visual interpretation of radiologists, this technique quantifies the heterogeneity of lesions objectively. Previous studies have suggested that radiomics-based approaches improve the preoperative prediction of LNM in lung adenocarcinoma, gastric cancer, colorectal cancer, bladder cancer, and breast cancer [18,19,20,21,22]. In addition, radiomics has been applied to predict the histopathological grades of tumors such as soft tissue sarcomas and gliomas [23, 24].

Recently, radiomics has also been used to diagnose and evaluate CCA. Some protein expressions of CCA and early recurrence of ICC could be predicted with radiomics methods [25,26,27]. To our knowledge, radiomics models have been used to predict LNM of biliary tract cancers or ICC and to determine its prognostic value [28,29,30], whereas radiomics studies addressing the DD and LNM of ECC are scarce. Therefore, the aim of this study was to develop machine learning-based radiomics models to predict DD and LNM of ECC.


Methods

The flowchart of this study is illustrated in Fig. 1. We first retrospectively collected patient records and MRI scans. Texture features of the tumors were then extracted from the MRI images. Machine learning-based radiomics models were developed in two steps, feature selection and classification, followed by a final evaluation. The predictions of DD and LNM were conducted separately; that is, the two steps were repeated independently for DD and for LNM.

Fig. 1 Radiomics development flowchart of this study

Patient characteristics

Ethical approval for this retrospective study was obtained from the Ethical Committee of the Affiliated Hospital of Southwest Medical University (KY2019063). The procedures of this study strictly followed the standard rules and regulations of the hospital. Patient informed consent was waived, and all patient identification information was removed. The inclusion criteria were (1) an MRI examination no more than two weeks before surgical resection, and (2) surgical excision with pathological examination. The exclusion criteria were (1) lesions that could not be identified, (2) incomplete MRI sequences, and (3) lesions too small to be identified on the MRI images. Initially, we collected 144 patients based on clinical data. Of these, 39 patients were excluded because of obscure MRI images or incomplete sequences, and five patients were excluded because their lesions were inconspicuous and could not be identified. As a result, 100 ECC patients diagnosed and treated in the hospital between January 2011 and December 2018 were included in our study. Because of the low incidence of ECC, the sample size of the present study was limited; more samples could be included in future studies.

Clinical data (e.g., gender, age, primary tumor site, and lesion size) and the baseline MRI appearance were obtained from medical records. ECC status (pathological grade and lymph node status) was confirmed by pathology reports and reviewed by an expert abdominal pathologist with ten years of experience.

MRI acquisition

All patients underwent preoperative MRI scans on a 3.0 T MRI scanner (Achieva 3.0 T, Philips, Amsterdam, Netherlands) with a 16-channel abdominal coil. The scanning range extended from the top of the diaphragm to the lower edge of the liver. For each patient, the following MRI sequences were obtained: an axial T1 high-resolution isotropic volume excitation (THRIVE) sequence (T1-weighted imaging, T1WI) (TR = 3.1 ms, TE = 1.44 ms, flip angle = 10°, matrix = 244 × 186, field of view (FOV) = 280 mm × 305 mm, number of excitations (NEX) = 1, section thickness = 3 mm, gap = −1.5 mm); an axial fat-suppressed turbo spin echo (TSE) T2-weighted imaging (T2WI) sequence (TR = 1610 ms, TE = 70 ms, flip angle = 90°, matrix = 176 × 201, FOV = 280 mm × 305 mm, NEX = 2, section thickness = 7 mm, gap = 1 mm); an axial diffusion-weighted imaging (DWI) sequence (TR = 934 ms, TE = 52 ms, flip angle = 90°, matrix = 100 × 124, FOV = 280 mm × 305 mm, NEX = 4, section thickness = 7 mm, gap = 1 mm, b values = 0, 600, and 800 s/mm²); a coronal TSE T2WI sequence; an axial dual-echo T1WI breath-hold gradient-echo sequence for the acquisition of in-phase and out-of-phase images; MR cholangiopancreatography (MRCP); and T1-weighted dynamic contrast-enhanced MRI. ADC images were obtained by reconstructing the DWI images on the post-processing workstation (Philips Extended MR Workspace). Finally, the T1WI, T2WI, DWI (b = 800 s/mm²), and ADC images were used in this study.

Radiomics features extraction

All MRI images in this study were acquired on the same scanner with the same scanning parameters. Therefore, no data preprocessing was performed. All images were examined by an expert radiologist with eight years of experience in abdominal radiological diagnosis. Using MaZda software (version 4.6; see footnote 1), the region of interest (ROI) of the lesion was delineated manually on the maximum section of the tumor, avoiding adjacent vessels and the bile duct, as shown in Fig. 2.

Fig. 2 ROI placed on the maximum section of the tumor, avoiding adjacent vessels and bile duct, on ADC (a), DWI (b), T1WI (c), and T2WI (d) images

Features were extracted from four types of MRI images: T1WI, T2WI, DWI (b = 800 s/mm²), and ADC. Six common feature groups (histogram, absolute gradient, gray-level co-occurrence matrix, run-length matrix, autoregressive model, and wavelet transform) were extracted using MaZda. Each image type yielded 300 features, giving a total of 1200 features (4 × 300) per patient.
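As an illustration of how the per-patient feature matrix described above could be assembled, the following minimal sketch concatenates four per-sequence tables of 300 features each. The random values, the helper `placeholder_export`, and the column naming scheme are assumptions made for the example; they do not reproduce the MaZda export format or the study data.

```python
import numpy as np
import pandas as pd

SEQUENCES = ["T1WI", "T2WI", "DWI", "ADC"]  # the four image types named in the text
N_PATIENTS = 100
N_FEATURES_PER_SEQUENCE = 300

rng = np.random.default_rng(0)


def placeholder_export(sequence: str) -> pd.DataFrame:
    """Hypothetical stand-in for one per-sequence export (patients x 300 features)."""
    values = rng.normal(size=(N_PATIENTS, N_FEATURES_PER_SEQUENCE))
    columns = [f"{sequence}_f{i:03d}" for i in range(N_FEATURES_PER_SEQUENCE)]
    return pd.DataFrame(values, columns=columns)


# Concatenate the four per-sequence tables column-wise: one row per patient.
feature_matrix = pd.concat([placeholder_export(s) for s in SEQUENCES], axis=1)
print(feature_matrix.shape)  # (100, 1200)
```

Prefixing each column with its sequence name keeps the origin of every feature traceable after selection, which is how a signature can later be described as, for example, "two ADC features and one DWI feature".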

Feature selection

First, we applied feature selection methods to reduce the feature dimension before classification. Five feature selection methods, namely joint mutual information (JMI) [31], minimum redundancy maximum relevance (MRMR) [32], select K best using analysis of variance (SKB), select percentile (SP) [33], and Wilcoxon (WLCX) [34], were applied to the 1200 features to select the most significant features for DD and LNM, respectively. These filter-based methods have been applied frequently in previous studies [35]. The features were ranked by joint mutual information (JMI) [35,36,37], redundancy and relevance (MRMR) [37,38,39], ANOVA F-value (SKB, SP) [37, 38], and p-value (WLCX) [35, 37, 39], respectively. For each feature selection method, different numbers of top-ranked features (n = 5, 10, 15, ..., 100) were selected for further classification. In other words, each method generated 20 groups of selected features, with group sizes ranging from n = 5 to n = 100 in increments of five. This approach allowed a sufficiently broad search for significant features. As a result, we obtained 100 groups (20 × 5) of selected features (20 for each of the five feature selection methods) for DD and for LNM. These groups were later used for independent classifications with different machine learning classifiers.
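The group construction described above can be sketched as follows for three of the five filters (SKB, SP, and WLCX), using Scikit-learn and SciPy. This is a minimal sketch on synthetic data, not the authors' code: the placeholder labels and features are assumptions, JMI and MRMR are omitted because Scikit-learn does not provide them, and expressing the SP cut-off as `100 * k / n_features` is one possible way to align SP with the group sizes.

```python
import numpy as np
from scipy.stats import ranksums
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif

rng = np.random.default_rng(0)
n_features = 1200
y = np.array([0] * 51 + [1] * 29)            # placeholder binary labels (80 training cases)
X = rng.normal(size=(y.size, n_features))    # placeholder feature matrix
X[y == 1, :10] += 1.0                        # make the first 10 features informative

feature_counts = list(range(5, 101, 5))      # n = 5, 10, ..., 100 -> 20 group sizes

# WLCX ranking: one rank-sum test per feature, smallest p-value first.
p_values = np.array(
    [ranksums(X[y == 0, j], X[y == 1, j]).pvalue for j in range(n_features)]
)
wlcx_order = np.argsort(p_values)

groups = {}
for k in feature_counts:
    # SKB: keep the k features with the largest ANOVA F-values.
    groups[("SKB", k)] = SelectKBest(f_classif, k=k).fit(X, y).get_support(indices=True)
    # SP: same score, but the cut-off is expressed as a percentage of all features.
    sp = SelectPercentile(f_classif, percentile=100.0 * k / n_features).fit(X, y)
    groups[("SP", k)] = sp.get_support(indices=True)
    # WLCX: top-k features of the precomputed p-value ranking.
    groups[("WLCX", k)] = np.sort(wlcx_order[:k])

print(len(groups))  # 3 methods x 20 sizes = 60 groups
```

With all five methods the same loop would yield the 100 groups reported in the text. Each entry holds column indices, so a group can be applied to both the training and the test matrix.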

Classification prediction

After feature selection, we applied machine learning classifiers to the selected features. The predictions of DD (high vs. medium-low) and LNM (positive vs. negative) were two separate binary classification tasks and were conducted independently. For each of the 100 groups of selected features obtained by the five feature selection methods, ten machine learning classifiers (Table 1) were applied to evaluate the classification performance of the different combinations of feature selection methods and classifiers. Thus, we systematically evaluated 1000 cases (5 × 20 × 10). To evaluate the classifiers, metrics including the area under the receiver operating characteristic (ROC) curve (AUC) [40], accuracy (ACC), sensitivity, and specificity were calculated on the test set for DD and LNM, respectively. Using AUC as the major metric, we organized the results into 20 groups according to the number of selected features (n = 5, 10, 15, ..., 100). The results of each group were illustrated as a heatmap, from which we identified and reported the highest AUC value and the corresponding heatmap. It is worth noting that AUC values achieved by multiple methods should be compared statistically using the DeLong test [41]. If a model has a higher AUC value and the DeLong test against the other models is significant, we can claim that this model is optimal and significantly different from the other models. If a model has a higher AUC value but the DeLong test is not significant, we should avoid stating that it is significantly superior to the other models. Therefore, when reporting and comparing the performance of models, the DeLong test results should be reported regardless of whether they are significant [42,43,44,45,46,47,48,49].

In line with previous radiomics studies that compared the performance of multiple models [43, 44, 46,47,48], we conducted the DeLong test [41] to evaluate the statistical differences between models. Specifically, pairwise DeLong tests were performed between models for the classification of DD and LNM, respectively. For any two given models, a significant DeLong test result (p-value < 0.05) indicates that their AUC values are significantly different. All methods used for feature selection and classification were implemented in Python (version 3.6.3) using the publicly available Pandas (version 0.24.2), NumPy (version 1.15.1), SciPy (version 1.0.0), and Scikit-learn (version 0.19.1) libraries. The DeLong test was implemented in Python according to the algorithm of the original paper [41]. The source code developed in this study is provided for interested researchers at GitHub (
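The evaluation grid described in this section can be sketched with Scikit-learn as below. This is a minimal sketch under stated assumptions: the data are synthetic, only one feature selection method (SKB) and four classifiers are shown instead of five and ten, and the dictionary keys are illustrative names rather than the abbreviations of Table 1.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)


def make_data(n_cases, n_features=200):
    """Synthetic stand-in for a radiomics table: 5 informative features, rest noise."""
    y = np.tile([0, 1], n_cases // 2)
    X = rng.normal(size=(n_cases, n_features))
    X[:, :5] += 1.5 * y[:, None]
    return X, y


X_train, y_train = make_data(80)   # training set size used in the study
X_test, y_test = make_data(20)     # test set size used in the study

classifiers = {
    "Bagging": lambda: BaggingClassifier(random_state=0),
    "DecisionTree": lambda: DecisionTreeClassifier(random_state=0),
    "GaussianNB": lambda: GaussianNB(),
    "RandomForest": lambda: RandomForestClassifier(n_estimators=50, random_state=0),
}

auc_table = {}
for k in range(5, 101, 5):                      # 20 feature-group sizes
    selector = SelectKBest(f_classif, k=k).fit(X_train, y_train)
    Xtr, Xte = selector.transform(X_train), selector.transform(X_test)
    for name, factory in classifiers.items():
        model = factory().fit(Xtr, y_train)
        scores = model.predict_proba(Xte)[:, 1]  # probability of the positive class
        auc_table[(k, name)] = roc_auc_score(y_test, scores)

best_k, best_name = max(auc_table, key=auc_table.get)
print(len(auc_table), best_k, best_name)
```

Fitting the selector on the training set only and then transforming the test set keeps the test cases out of the selection step. Grouping `auc_table` by `k` gives one classifier-by-method grid per group size, which is the shape of the heatmaps described in the text.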

Table 1 Classification machine learning algorithms

Statistical analysis

Age and lesion size were expressed as mean ± standard deviation (SD) when the data were normally distributed, or as median otherwise. These variables were compared using independent t-tests or Wilcoxon rank-sum tests, as appropriate. Gender was compared using the chi-squared test. The above statistical analyses were conducted using SPSS 25. A two-sided p-value < 0.05 was considered significant. Classification performance was assessed using the ROC curve and AUC. Models were compared with the DeLong test [41], and the difference between two models was considered statistically significant at p-value < 0.05.
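For readers unfamiliar with the DeLong test, the following is a minimal sketch of the standard procedure for comparing two correlated AUCs computed on the same cases. It is not the authors' implementation. The function names, the toy scores, and the handling of the zero-variance case are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm


def _placements(scores, labels):
    """Return per-positive (V10) and per-negative (V01) placement values."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    # psi[i, j] = 1 if pos_i > neg_j, 0.5 if tied, 0 otherwise
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)


def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test; returns (auc_a, auc_b, p_value)."""
    labels = np.asarray(labels)
    v10_a, v01_a = _placements(np.asarray(scores_a, dtype=float), labels)
    v10_b, v01_b = _placements(np.asarray(scores_b, dtype=float), labels)
    auc_a, auc_b = v10_a.mean(), v10_b.mean()
    s10 = np.cov(np.vstack([v10_a, v10_b]))   # 2 x 2 covariance over positives
    s01 = np.cov(np.vstack([v01_a, v01_b]))   # 2 x 2 covariance over negatives
    cov = s10 / v10_a.size + s01 / v01_a.size
    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    if var_diff <= 1e-12:
        # Degenerate case (convention chosen for this sketch, not part of the test).
        return auc_a, auc_b, 1.0 if np.isclose(auc_a, auc_b) else 0.0
    z = (auc_a - auc_b) / np.sqrt(var_diff)
    return auc_a, auc_b, 2.0 * norm.sf(abs(z))


# Toy example: two hypothetical models scored on the same eight cases.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
model_a = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
model_b = [0.5, 0.2, 0.6, 0.3, 0.4, 0.7, 0.1, 0.9]
auc_a, auc_b, p_value = delong_test(labels, model_a, model_b)
print(auc_a, auc_b, p_value)
```

Because both models are scored on the same cases, the covariance term `cov[0, 1]` accounts for the correlation between the two AUC estimates. An unpaired comparison of the two AUC values would ignore it.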



Results

Table 2 summarizes the patient characteristics of this study (n = 100). There were 54 males (54%) and 46 females (46%), with an age range of 28–83 years and a median age of 59.5 years. All tumors were confirmed to be adenocarcinomas and were divided into high (n = 36), medium (n = 46), and low (n = 18) differentiation groups based on the World Health Organization classification of digestive system tumors (4th edition). Because the low differentiation group was too small, patients were classified into high (n = 36) and medium-low (n = 64) differentiation groups in our study. LNM was positive in 27 patients (27%) and negative in 73 patients (73%). We further randomly divided the patients into a training set (n = 80, 80%) and a test set (n = 20, 20%), ensuring that no data from a given individual appeared in both sets, in order to avoid bias.
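A patient-level 80/20 split of this kind can be sketched with Scikit-learn as follows. The patient identifiers and the `random_state` are assumptions made for the example, and the resulting split is not the one used in the study.

```python
import numpy as np
from sklearn.model_selection import train_test_split

patient_ids = np.arange(100)                 # one entry per patient (hypothetical IDs)
dd_labels = np.array([0] * 36 + [1] * 64)    # 36 high, 64 medium-low, as reported

# Split on patient IDs so that each patient lands in exactly one set.
train_ids, test_ids, y_train, y_test = train_test_split(
    patient_ids, dd_labels, test_size=0.2, random_state=0, shuffle=True
)

print(len(train_ids), len(test_ids))  # 80 20
```

Splitting on patient identifiers rather than on individual images is what guarantees that no patient contributes data to both sets.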

Table 2 Patient characteristics

Feature selection and Radiomics signature construction

DD prediction

Using the 1200 extracted radiomics features, we applied each of the five feature selection methods to obtain 20 groups of selected features with different numbers of features, namely n = 5, 10, ..., 100. We then applied the ten classifiers to each of the selected feature groups to conduct the classifications for DD prediction. Therefore, for each group size, 50 combinations of the five feature selection methods and ten classifiers were systematically investigated, giving 20 × 50 = 1000 combinations in total. The performance metrics were computed on the independent test set (n = 20).

Among all combinations, the feature selection method JMI with the classifier BAGC achieved the highest performance, with AUPRC = 0.65, AUC = 0.90 (95% CI 0.75–1.00), ACC = 0.85 (95% CI 0.69–1.00), sensitivity = 0.75 (95% CI 0.30–0.95), and specificity = 0.88 (95% CI 0.64–0.97). As listed in Table 3, this signature consisted of five selected features: two ADC features, one DWI feature, one T1WI feature, and one T2WI feature. For this group, the heatmap of AUC values is plotted in Fig. 3a and the ROC curve in Fig. 3b. In the DeLong test, the combination of JMI and BAGC was significantly different from most of the remaining models. However, no statistically significant difference was observed relative to other combinations with high AUC values. For example, the combination with the highest AUC (JMI and BAGC, AUC = 0.90) was not significantly different from the combination with the second-highest AUC (JMI and XGBC, AUC = 0.89) (p-value = 0.9004, DeLong test).

Table 3 The selected features with the best performance for DD

Fig. 3 DD prediction AUC heatmap and ROC curve. (a) AUC heatmap for combinations of feature selection methods and classifiers; (b) ROC curve for the best-performing combination of feature selection method JMI and classifier BAGC (feature number n = 5, AUC = 0.90)

LNM prediction

Similarly, LNM classification was conducted using combinations of the five feature selection methods and ten classifiers and evaluated on the independent test set (n = 20). As listed in Table 4, the group with 30 selected features (seven ADC features, seven DWI features, eight T1WI features, and eight T2WI features) achieved the highest performance with the feature selection method MRMR and the classifier XGBC: AUPRC = 0.95, AUC = 0.98 (95% CI 0.94–1.00), ACC = 0.90 (95% CI 0.77–1.00), sensitivity = 0.75 (95% CI 0.30–0.95), and specificity = 0.94 (95% CI 0.72–0.99). For this LNM prediction, the heatmap of AUC values is plotted in Fig. 4a and the ROC curve in Fig. 4b. As for DD, the DeLong test showed that the combination of MRMR and XGBC was significantly different from most of the remaining models. However, no statistically significant difference was observed relative to other combinations with high AUC values. For example, the combination with the highest AUC (MRMR and XGBC, AUC = 0.98) was not significantly different from the combination with the second-highest AUC (MRMR and ADAC, AUC = 0.97) (p-value = 0.4795, DeLong test).
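The width of the reported intervals reflects the small test set. As a worked example, the sketch below reproduces the LNM intervals under two assumptions that the text does not state: that the test set contained 4 LNM-positive and 16 LNM-negative cases (3/4 and 15/16 correctly classified, hence 18/20 overall), and that sensitivity and specificity use a Wilson score interval while accuracy uses a normal-approximation (Wald) interval. These choices were inferred because they match the reported numbers; the authors' actual counts and method may differ.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def wald_interval(successes, n, z=1.96):
    """Normal-approximation interval, clipped to [0, 1]."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


# Assumed counts (not given in the text): 3/4 positives and 15/16 negatives correct.
sens_ci = wilson_interval(3, 4)     # reported: 0.75 (95% CI 0.30-0.95)
spec_ci = wilson_interval(15, 16)   # reported: 0.94 (95% CI 0.72-0.99)
acc_ci = wald_interval(18, 20)      # reported: 0.90 (95% CI 0.77-1.00)

for name, ci in [("sensitivity", sens_ci), ("specificity", spec_ci), ("accuracy", acc_ci)]:
    print(f"{name}: {ci[0]:.2f}-{ci[1]:.2f}")
```

With only four positive cases, a single additional misclassification would move the sensitivity estimate from 0.75 to 0.50, which is why the interval spans most of the unit range.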

Table 4 The selected features with the best performance for LNM

Fig. 4 LNM prediction AUC heatmap and ROC curve. (a) AUC heatmap for combinations of feature selection methods and classifiers; (b) ROC curve for the best-performing combination of feature selection method MRMR and classifier XGBC (feature number n = 30, AUC = 0.98)


Discussion

ECC is a malignant tumor with an extremely unfavorable prognosis, despite its low incidence. It is important to evaluate ECC comprehensively, especially its DD and LNM, to guide clinicians and predict the prognosis of the tumor.

In this study, we found that radiomics models incorporating ADC, DWI, T1WI, and T2WI features achieved high diagnostic performance in discriminating between the high and medium-low DD groups of ECC (AUC = 0.90 (95% CI 0.75–1.00)) and in predicting LNM of ECC (AUC = 0.98 (95% CI 0.94–1.00)), suggesting that the clinical use of radiomics is promising for the preoperative evaluation of ECC. Specifically, combinations of five feature selection methods and ten classification algorithms were applied to build radiomics signatures for DD and LNM of ECC. The combination of the feature selection method JMI and the classifier BAGC achieved the best predictive performance for DD, with ACC = 0.85 (95% CI 0.69–1.00) and AUC = 0.90 (95% CI 0.75–1.00), based on five selected features. For LNM, the combination of the feature selection method MRMR and the classifier XGBC achieved the highest performance, with ACC = 0.90 (95% CI 0.77–1.00) and AUC = 0.98 (95% CI 0.94–1.00), based on 30 selected features. These results indicate that radiomics analysis was able to predict the DD and LNM of ECC accurately in this cohort. The predictions also have implications for guiding clinicians in selecting the most appropriate treatment strategy and, it is hoped, improving the prognosis of patients with ECC.

Recently, many studies have indicated that contrast-enhanced CT, PET-CT, and MRI play an important role in detecting LNM of CCA [50,51,52]. PET-CT and MRI have also been proposed for predicting the DD of CCA [53, 54]. However, discriminating malignant from benign nodes and distinguishing various DD on cross-sectional imaging by visual interpretation remains challenging. Conventional imaging modalities based on morphologic criteria or metabolic activity still have limitations and cannot fully meet clinical requirements. In contrast, radiomics, which draws on quantitative information extracted from images rather than on what is visible to the naked eye, allows mineable high-dimensional data to be used in clinical decision support [55, 56]. The main contribution of this study is the development of machine learning-based radiomics models to predict LNM and DD of ECC from MRI data. For predicting LNM, the radiomics signature derived from ADC, DWI, T1WI, and T2WI images achieved an AUC of 0.98, higher than that reported for conventional imaging such as PET-CT [9]. In addition, the literature on identifying the DD of ECC with radiomics is currently lacking. We therefore developed a radiomics model to predict the DD of the tumor, which achieved an AUC of 0.90.

Radiomics methods have recently been used to evaluate CCA [26,27,28,29,30]. Liang et al. constructed a novel nomogram for the preoperative prediction of early recurrence (ER) in ICC and found that the radiomics signature and clinical stage could be used to predict ER of ICC after partial hepatectomy [26]. In another study, a combined model based on clinicoradiologic-pathological and radiomics features was developed to predict ER of ICC, with an AUC, sensitivity, and specificity of 0.949, 0.875, and 0.774, respectively [27]. In two previous studies, radiomics models built on arterial phase CT scans (highest AUC of 0.89) and portal venous phase CT scans (highest AUC of 0.81) were used to evaluate LNM and clinical outcome of biliary tract cancer; these AUC values were lower than ours [28, 30]. Xu et al. used an MRI-based radiomics approach with a support vector machine for preoperative evaluation of lymph node status in ICC, with AUCs of 0.788 and 0.787 in the training and validation cohorts, respectively, which were also lower than ours [29]. More importantly, almost no studies have evaluated both DD and LNM of ECC using machine learning-based radiomics. In addition, as mentioned above, ECC and ICC differ in many respects, e.g., origin, growth pattern, morbidity, imaging characteristics, and prognosis. Therefore, we built separate radiomics models based on MRI of ECC to predict DD and LNM of the tumor. This work contributes to the literature on MRI radiomics analysis of DD and LNM of ECC by developing a machine learning framework that combines feature selection and classification.

Radiomics analysis relies on the quantity and quality of data. For machine learning-based methods in particular, the data size is an important determinant of the final predictive performance. The more data used to train and test the algorithms, the more confidence we can have in the resulting models and their performance. Compared with other radiomics analyses, the sample size of our study is limited. This difficulty is not uncommon in radiomics studies, and small datasets can still demonstrate the potential of radiomics approaches [57,58,59,60]. Our study provides initial evidence that radiomics analysis has the potential to predict DD and LNM of ECC with promising performance. The proposed methods could help guide treatment strategies and improve the quality of life of patients with ECC. More generally, radiomics analysis based on big data could help in rare diseases. Therefore, standardized data collection terminological systems and approaches, such as ENT COBRA ONTOLOGY [61] and SKIN-COBRA [62], should be followed to ensure high data quality and high performance of machine learning algorithms.

It is also worth mentioning that, because we systematically investigated 1000 combinations of feature selection methods and machine learning classifiers in this study, the DeLong test [41] should be applied to compare pairs of models. Two given models can be considered significantly different only when the p-value of the DeLong test is < 0.05. We therefore conducted the DeLong test for all pairs, although the number of possible pairs was much larger than in most previous studies, in which only a few models were considered [42,43,44,45,46,47,48,49]. As in those studies in which the models with the best AUC values were not significantly different from other models in the DeLong test [43,44,45, 49], the reported model combinations (DD: JMI and BAGC; LNM: MRMR and XGBC) can still be chosen and applied to achieve satisfactory performance. However, they should not be treated as the only best models, since they were not significantly different from other combinations, especially those with similar AUC values.

Our study has several limitations. First, since machine learning-based approaches are data-driven, the development and performance of the algorithms rely on the quality and quantity of the datasets. However, because ECC is rare, the number of ECC cases used here was relatively small. Another limitation of the present dataset was the imbalance between positive and negative samples due to the low incidence of ECC. Therefore, the initial results obtained in the present study should be treated with caution. In the future, a larger patient population will be collected to further improve and validate the proposed machine learning-based radiomics models for DD and LNM prediction in ECC. A larger and more balanced dataset would not only allow additional validation but also lead to more confident performance estimates and more reliable predictive models. Second, our study was retrospective and from a single institution. Prospective multi-center studies with considerably larger datasets are needed to further develop our radiomics prediction models and to validate their effectiveness and generalizability. We therefore suggest multi-center clinical collaboration in the future to assemble a larger ECC dataset for further validation. Third, the number of features was larger than the number of cases in the present study. Although the feature selection step performed before classification partially alleviated this difficulty, larger datasets are still needed for reliable validation, to confirm the reproducibility of the present results, and to improve the performance of the proposed machine learning framework. Finally, our texture extraction was based on two-dimensional analysis rather than three-dimensional delineation, which may have led to a loss of texture information in the tumor. Therefore, a three-dimensional analysis of ECC could be carried out in future work.


Conclusions

In conclusion, our MRI radiomics models based on optimal combinations of feature selection methods and machine learning classifiers demonstrate potential for predicting DD and LNM in ECC. Although the dataset used in this study is limited, future investigations with larger datasets could further evaluate the proposed framework and improve its performance. This machine learning-based radiomics analysis provides a potential noninvasive method to evaluate ECC, which could help clinicians select the optimal treatment strategy for the individual patient and assess the survival prognosis of patients with ECC.

Availability of data and materials

The data used in this study is available from the corresponding author on reasonable request. The code used in this study is available at GitHub (


  1. available at



Abbreviations

ECC: Extrahepatic cholangiocarcinoma
MRI: Magnetic resonance imaging
DD: Differentiation degree
LNM: Lymph node metastasis
T1WI: T1-weighted imaging
T2WI: T2-weighted imaging
DWI: Diffusion-weighted imaging
ADC: Apparent diffusion coefficient
AUPRC: Area under precision recall curve
ROC: Receiver operating characteristic
AUC: Area under the ROC curve
JMI: Joint mutual information
ICC: Intrahepatic cholangiocarcinoma
CT: Computerized tomography
PET-CT: Positron emission computerized tomography
THRIVE: T1 high-resolution isotropic volume excitation
FOV: Field of view
TR: Time of repetition
TE: Time of echo
NEX: Number of excitations
MRCP: MR cholangiopancreatography
ROI: Regions of interest
MRMR: Minimum redundancy maximum relevance
SKB: Select K best using analysis of variance
SP: Select percentile
p-value: Probability value
ADAC: Ada boosting classifier
BAGC: Bagging classifier
Bernoulli naïve Bayesian
Decision tree classifier
Gaussian naïve Bayesian classifier
K nearest neighborhood classifier
Random forest classifier
Stochastic gradient descent classifier
Support vector machine classifier
XGBC: eXtreme gradient boosting classifier
CI: Confidence interval


  1. Rizvi S, Gores GJ. Pathogenesis, diagnosis, and management of cholangiocarcinoma. Gastroenterology. 2013;145(6):1215–29

    Article  CAS  PubMed  Google Scholar 

  2. DeOliveira ML, Cunningham SC, Cameron JL, Kamangar F, Winter JM, Lillemoe KD, et al. Cholangiocarcinoma: thirty-one-year experience with 564 patients at a single institution. Ann Surg. 2007;245(5):755.–62.

    Article  PubMed  PubMed Central  Google Scholar 

  3. Saha SK, Zhu AX, Fuchs CS, Brooks GA. Forty-year trends in cholangiocarcinoma incidence in the US: intrahepatic disease on the rise. Oncologist. 2016;21(5):594.–9.

    Article  PubMed  PubMed Central  Google Scholar 

  4. Buettner S, van Vugt JL, IJzermans JN, Koerkamp BG. Intrahepatic cholangiocarcinoma: current perspectives. OncoTargets. 2017;10:1131.–42.

    Article  CAS  Google Scholar 

  5. Murakami Y, Uemura K, Sudo T, Hayashidani Y, Hashimoto Y, Nakamura H, et al. Adjuvant gemcitabine plus S-1 chemotherapy improves survival after aggressive surgical resection for advanced biliary carcinoma. Ann Surg. 2009;250(6):950–6

    Article  PubMed  Google Scholar 

  6. Mao Z-y, Guo X-c, Su D, Wang L-j, Zhang T-t, Bai L. Prognostic factors of cholangiocarcinoma after surgical resection: a retrospective study of 293 patients. Medical science monitor: international medical journal of experimental clinical research. 2015;21:2375

    Article  Google Scholar 

  7. Sotoudehmanesh R, Nejati N, Farsinejad M, Kolahdoozan S. Efficacy of endoscopic ultrasonography in evaluation of undetermined etiology of common bile duct dilatation on abdominal ultrasonography. Middle East journal of digestive diseases. 2016;8(4):267.–72.

    Article  PubMed  PubMed Central  Google Scholar 

  8. Asayama Y, Nishie A, Ishigami K, Ushijima Y, Takayama Y, Okamoto D, et al. Prognostic significance of contrast-enhanced CT attenuation value in extrahepatic cholangiocarcinoma. Eur Radiol. 2017;27(6):2563–9

    Article  PubMed  Google Scholar 

  9. Ma KW, Cheung TT, She WH, Chok KSH, Chan ACY, Dai WC, et al. Diagnostic and prognostic role of 18-FDG PET/CT in the management of resectable biliary tract cancer. World J Surg. 2018;42(3):823–34

    Article  PubMed  Google Scholar 

  10. Jhaveri KS, Hosseini-Nik H. MRI of cholangiocarcinoma. J Magn Reson Imaging. 2015;42(5):1165–79

    Article  PubMed  Google Scholar 

  11. Suthar M, Purohit S, Bhargav V, Goyal P. Role of MRCP in differentiation of benign and malignant causes of biliary obstruction. JCDR. 2015;9(11):TC08

    PubMed  PubMed Central  Google Scholar 

  12. Voigtländer T, Lankisch TO. Endoscopic diagnosis of cholangiocarcinoma: from endoscopic retrograde cholangiography to bile proteomics. Best Pract Res Clin Gastroenterol. 2015;29(2):267–75

    Article  PubMed  Google Scholar 

  13. Li S, Qian H, Peng Y, Jia H, Lin G. Differentiating peripheral cholangiocarcinoma in stages T1N0M0 and T2N0M0 from hepatic hypovascular nodules using dynamic contrast-enhanced MRI. Sci Rep. 2017;7(1):1–7

    Google Scholar 

  14. Jiang F, Jiang Y, Zhi H, Dong Y, Li H, Ma S, et al. Artificial intelligence in healthcare: past, present and future. Stroke vascular neurology. 2017;2(4)

  15. Choi JY. Radiomics and deep learning in clinical imaging: what should we do? Nucl Med Mol Imaging. 2018;52(2):89–90

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Peeken JC, Nüsslin F, Combs SE. Radio-oncomics. Strahlenther Onkol. 2017;193(10):767–79

    Article  PubMed  Google Scholar 

  17. Vial A, Stirling D, Field M, Ros M, Ritz C, Carolan M, et al. The role of deep learning and radiomic feature extraction in cancer-specific predictive modelling: a review. Transl Cancer Res. 2018;7(3):803–16

    Article  Google Scholar 

  18. Wang Y, Liu W, Yu Y, Liu J-j, Xue H-d, Qi Y-f, et al. CT radiomics nomogram for the preoperative prediction of lymph node metastasis in gastric cancer. Eur Radiol. 2020;30(2):976–86

    Article  PubMed  Google Scholar 

  19. Y-q H, Liang C-h, He L, Tian J, Liang C-s, Chen X, et al. Development and validation of a radiomics nomogram for preoperative prediction of lymph node metastasis in colorectal cancer. J Clin Oncol. 2016;34(18):2157–64

    Article  Google Scholar 

  20. Yang J, Wang T, Yang L, Wang Y, Li H, Zhou X, et al. Preoperative prediction of axillary lymph node metastasis in breast cancer using mammography-based radiomics method. Sci Rep. 2019;9(1):1–11

    Google Scholar 

  21. Wu S, Zheng J, Li Y, Yu H, Shi S, Xie W, et al. A radiomics nomogram for the preoperative prediction of lymph node metastasis in bladder cancer. Clin Cancer Res. 2017;23(22):6904–11

    Article  CAS  PubMed  Google Scholar 

  22. Gu Y, She Y, Xie D, Dai C, Ren Y, Fan Z, et al. A texture analysis-based prediction model for lymph node metastasis in stage IA lung adenocarcinoma. Ann Thorac Surg. 2018;106(1):214–20

    Article  PubMed  Google Scholar 

  23. Ditmer A, Zhang B, Shujaat T, Pavlina A, Luibrand N, Gaskill-Shipley M, et al. Diagnostic accuracy of MRI texture analysis for grading gliomas. J Neurooncol. 2018;140(3):583–9

    Article  PubMed  Google Scholar 

  24. Wang H, Chen H, Duan S, Hao D, Liu J. Radiomics and machine learning with multiparametric preoperative MRI may accurately predict the histopathological grades of soft tissue sarcomas. J Magn Reson Imaging. 2020;51(3):791–7

    Article  PubMed  Google Scholar 

  25. Sadot E, Simpson AL, Do RK, Gonen M, Shia J, Allen PJ, et al. Cholangiocarcinoma: correlation between molecular profiling and imaging phenotypes. PLoS One. 2015;10(7):e0132953

    Article  PubMed  PubMed Central  Google Scholar 

  26. Liang W, Xu L, Yang P, Zhang L, Wan D, Huang Q, et al. Novel nomogram for preoperative prediction of early recurrence in intrahepatic cholangiocarcinoma. Front Oncol. 2018;8:360

    Article  PubMed  PubMed Central  Google Scholar 

  27. Zhao L, Ma X, Liang M, Li D, Ma P, Wang S, et al. Prediction for early recurrence of intrahepatic mass-forming cholangiocarcinoma: quantitative magnetic resonance imaging combined with prognostic immunohistochemical markers. Cancer Imaging. 2019;19(1):1–10

    Article  Google Scholar 

  28. Ji G-W, Zhang Y-D, Zhang H, Zhu F-P, Wang K, Xia Y-X, et al. Biliary tract cancer at CT: a radiomics-based model to predict lymph node metastasis and survival outcomes. Radiology. 2019;290(1):90–8

    Article  PubMed  Google Scholar 

  29. Xu L, Yang P, Liang W, Liu W, Wang W, Luo C, et al. A radiomics approach based on support vector machine using MR images for preoperative lymph node status evaluation in intrahepatic cholangiocarcinoma. Theranostics. 2019;9(18):5374.–85.

    Article  PubMed  PubMed Central  Google Scholar 

  30. Ji G-W, Zhu F-P, Zhang Y-D, Liu X-S, Wu F-Y, Wang K, et al. A radiomics approach to predict lymph node metastasis and clinical outcome of intrahepatic cholangiocarcinoma. Eur Radiol. 2019;29(7):3725–35

    Article  PubMed  Google Scholar 

  31. Bennasar M, Hicks Y, Setchi R. Feature selection using joint mutual information maximisation. Expert Syst Appl. 2015;42(22):8520–32

    Article  Google Scholar 

  32. Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on pattern analysis machine intelligence. 2005;27(8):1226–38

    Article  PubMed  Google Scholar 

  33. St L, Wold S. Analysis of variance (ANOVA). Chemom Intel Lab Syst. 1989;6(4):259–72

    Article  Google Scholar 

  34. Wilcoxon F, Katti S, Wilcox RA. Critical values and probability levels for the Wilcoxon rank sum test and the Wilcoxon signed rank test. American Cyanamid Company Pearl River, NY. 1963:171–6.

  35. Parmar C, Grossmann P, Bussink J, Lambin P, Aerts HJ. Machine learning methods for quantitative radiomic biomarkers. Sci Rep. 2015;5(1):1–11

    Article  Google Scholar 

  36. Sun W, Jiang M, Dang J, Chang P, Yin F-F. Effect of machine learning methods on predicting NSCLC overall survival time based on Radiomics analysis. Radiat Oncol. 2018;13(1):1–8

    Article  Google Scholar 

  37. Zhang Y, Zhang B, Liang F, Liang S, Zhang Y, Yan P, et al. Radiomics features on non-contrast-enhanced CT scan can precisely classify AVM-related hematomas from other spontaneous intraparenchymal hematoma types. Eur Radiol. 2019;29(4):2157–65

    Article  PubMed  Google Scholar 

  38. Cheng Z, Zhang J, He N, Li Y, Wen Y, Xu H, et al. Radiomic features of the nigrosome-1 region of the substantia nigra: using quantitative susceptibility mapping to assist the diagnosis of idiopathic Parkinson's disease. Front Aging Neurosci. 2019;11:167

    Article  PubMed  PubMed Central  Google Scholar 

  39. Chen C-H, Chang C-K, Tu C-Y, Liao W-C, Wu B-R, Chou K-T, et al. Radiomic features analysis in computed tomography images of lung nodule classification. PLoS One. 2018;13(2):e0192002

    Article  PubMed  PubMed Central  Google Scholar 

  40. Huang J, Ling CXJITok. Engineering D: Using AUC and accuracy in evaluating learning algorithms 2005;17(3):299–310.

  41. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–45

    Article  CAS  PubMed  Google Scholar 

  42. Chen SC-C, Lo C-M, Wang S-H, Su EC-Y. RNA editing-based classification of diffuse gliomas: predicting isocitrate dehydrogenase mutation and chromosome 1p/19q codeletion. BMC bioinformatics. 2019;20(19):1–11

    Google Scholar 

  43. Du D, Feng H, Lv W, Ashrafinia S, Yuan Q, Wang Q, et al. Machine learning methods for optimal radiomics-based differentiation between recurrence and inflammation: application to nasopharyngeal carcinoma post-therapy PET/CT images. Mol Imaging. 2020;22(3):730–8

    Article  CAS  Google Scholar 

  44. Gu Q, Feng Z, Liang Q, Li M, Deng J, Ma M, et al. Machine learning-based radiomics strategy for prediction of cell proliferation in non-small cell lung cancer. Eur J Radiol. 2019;118:32–7

    Article  PubMed  Google Scholar 

  45. Hu J, Zhao Y, Li M, Liu J, Wang F, Weng Q, et al. Machine learning-based radiomics analysis in predicting the meningioma grade using multiparametric MRI. Eur J Radiol. 2020;131:109251

    Article  PubMed  Google Scholar 

  46. Liang M, Cai Z, Zhang H, Huang C, Meng Y, Zhao L, et al. Machine learning-based analysis of rectal Cancer MRI Radiomics for prediction of Metachronous liver metastasis. Acad Radiol. 2019;26(11):1495–504

    Article  PubMed  Google Scholar 

  47. Ning Z, Luo J, Xiao Q, Cai L, Chen Y, Yu X, et al. Multi-modal magnetic resonance imaging-based grading analysis for gliomas by integrating radiomics and deep features. Ann Transl Med. 2021;9(4):298–8

  48. Park YW, Choi YS, Kim SE, Choi D, Han K, Kim H, et al. Radiomics features of hippocampal regions in magnetic resonance imaging can differentiate medial temporal lobe epilepsy patients from healthy controls. Sci Rep. 2020;10(1):1–8

    CAS  Google Scholar 

  49. Zhang R, Zhu L, Cai Z, Jiang W, Li J, Yang C, et al. Potential feature exploration and model development based on 18F-FDG PET/CT images for differentiating benign and malignant lung lesions. Eur J Radiol. 2019;121:108735

    Article  PubMed  Google Scholar 

  50. Wang Z, Lan X, Xiao Y, Liu J, Ji J. Correlation between TNM staging of primary cholangiocarcinoma and the maximum standard uptake value of (18) F-2-deoxy-D-glucose positron emission tomography with computerized tomography. Zhonghua Yi Xue Za Zhi. 2017;97(39):3104–7

    CAS  PubMed  Google Scholar 

  51. Ciresa M, De Gaetano AM, Pompili M, Saviano A, Infante A, Montagna M, et al. Enhancement patterns of intrahepatic mass-forming cholangiocarcinoma at multiphasic computed tomography and magnetic resonance imaging and correlation with clinicopathologic features. Eur Rev Med Pharmacol Sci. 2015;19(15):2786–97.

    CAS  PubMed  Google Scholar 

  52. Holzapfel K, Gaa J, Schubert EC, Eiber M, Kleeff J, Rummeny EJ, et al. Value of diffusion-weighted MR imaging in the diagnosis of lymph node metastases in patients with cholangiocarcinoma. Abdominal radiology. 2016;41(10):1937–41

    Article  PubMed  Google Scholar 

  53. Cui X-Y, Chen H-W, Cai S, Fang XM Bao J, Tang Q-F, Wu L-Y, Diffusion-weighted MR imaging for detection of extrahepatic cholangiocarcinoma. Eur J Radiol 2012;81(11):2961–2965.

  54. Jiang L, Tan H, Panje CM, Yu H, Xiu Y, Shi H. Role of 18F-FDG PET/CT imaging in intrahepatic cholangiocarcinoma. Clin Nucl Med. 2016;41(1):1–7

    Article  PubMed  Google Scholar 

  55. Limkin E, Sun R, Dercle L, Zacharaki E, Robert C, Reuzé S, et al. Promises and challenges for the implementation of computational medical imaging (radiomics) in oncology. Ann Oncol. 2017;28(6):1191–206

    Article  CAS  PubMed  Google Scholar 

  56. Lambin P, Leijenaar RT, Deist TM, Peerlings J, De Jong EE, Van Timmeren J, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol. 2017;14(12):749–62

    Article  PubMed  Google Scholar 

  57. Ning P, Gao F, Hai J, Wu M, Chen J, Zhu S, et al. Application of CT radiomics in prediction of early recurrence in hepatocellular carcinoma. Abdominal Radiology. 2020;45(1):64–72

    Article  PubMed  Google Scholar 

  58. Taghavi M, Trebeschi S, Simões R, Meek DB, Beckers RC, Lambregts DM, et al. Machine learning-based analysis of CT radiomics model for prediction of colorectal metachronous liver metastases. Abdominal Radiology. 2021;46(1):249–56

    Article  PubMed  Google Scholar 

  59. Wang X-H, Long L-H, Cui Y, Jia AY, Zhu X-G, Wang H-Z, et al. Mri-based radiomics model for preoperative prediction of 5-year survival in patients with hepatocellular carcinoma. Br J Cancer. 2020;122(7):978–85

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  60. Zhao Y, Wu J, Zhang Q, Hua Z, Qi W, Wang N, et al. Radiomics analysis based on multiparametric MRI for predicting early recurrence in hepatocellular carcinoma after partial hepatectomy. J Magn Reson Imaging. 2021;53(4):1066–79

    Article  PubMed  Google Scholar 

  61. Tagliaferri L, Budrukkar A, Lenkowicz J, Cambeiro M, Bussu F, Guinot JL, et al. ENT COBRA ONTOLOGY: the covariates classification system proposed by the Head & Neck and skin GEC-ESTRO working group for interdisciplinary standardized data collection in head and neck patient cohorts treated with interventional radiotherapy (brachytherapy). Journal of contemporary brachytherapy. 2018;10(3):260.–6.

    Article  PubMed  PubMed Central  Google Scholar 

  62. Lancellotta V, Guinot JL, Fionda B, Rembielak A, Di Stefani A, Gentileschi S, et al. SKIN-COBRA (consortium for brachytherapy data analysis) ontology: the first step towards interdisciplinary standardized data collection for personalized oncology in skin cancer. Journal of contemporary brachytherapy. 2020;12(2):105.–10.

    Article  PubMed  PubMed Central  Google Scholar 

Download references


The authors acknowledge Dr. Tiejun Zhou (Department of Pathology) and Dr. Lulu Xu (Department of Radiology) of The Affiliated Hospital of Southwest Medical University for their assistance and contributions to this research.


This study was supported by the Innovation Method Program of the Ministry of Science and Technology of the People’s Republic of China (M112017IM010700), the Technology Strategic Cooperation Project Between Luzhou Municipal People’s Government and Southwest Medical University (2019LZXNYDZ04), the Health Committee of Sichuan Province (19PJ151), and the Project of Southwest Medical University (2020ZRQNA041). The funding bodies played no role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript.

Author information

Authors and Affiliations



All authors made significant contributions to this study. YT, JS, and LPF contributed to the conception of the study. YT, CMY, JS, and SS contributed significantly to the analysis and manuscript preparation. YT, CMY, WJW, and LPF performed the data analyses and wrote the manuscript. LPF, SS, and JS supported the analysis through constructive discussions. All authors have read and approved the manuscript.

Authors’ information

YT, School of Computer Science and Engineering, University of Electronic Science and Technology of China, No. 4, Section 2, North Jianshe Road, Chengdu, 610054, Sichuan, China.

CMY, JS, Department of Radiology, The Affiliated Hospital of Southwest Medical University, and Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, 646000, Sichuan, China.

SS, Department of Hepatobiliary Surgery, The Affiliated Hospital of Southwest Medical University, 25 Taiping Street, Luzhou, 646000, Sichuan, China.

WJW, School of Information and Software Engineering, University of Electronic Science and Technology of China, No. 4, Section 2, North Jianshe Road, Chengdu, 610054, Sichuan, China.

LPF, Department of Ultrasound, The Affiliated Hospital of Southwest Medical University, 25 Taiping Street, Luzhou, 646000, Sichuan, China.

Corresponding authors

Correspondence to Li Ping Fan or Jian Shu.

Ethics declarations

Ethics approval and consent to participate

Ethical approval for this study was obtained from the Ethical Committee of the Affiliated Hospital of Southwest Medical University (KY2019063). The Ethical Committee waived the requirement for informed consent owing to the retrospective nature of the study. Administrative permissions were required to access the raw data from the patients’ records.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests to report.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Tang, Y., Yang, C.M., Su, S. et al. Machine learning-based Radiomics analysis for differentiation degree and lymphatic node metastasis of extrahepatic cholangiocarcinoma. BMC Cancer 21, 1268 (2021).
