Development and validation of a gradient boosting machine to predict prognosis after liver resection for intrahepatic cholangiocarcinoma

Ji, Gu-Wei; Jiao, Chen-Yu; Xu, Zheng-Gang; Li, Xiang-Cheng; Wang, Ke; Wang, Xue-Hao

doi:10.1186/s12885-022-09352-3

Research
Open access
Published: 11 March 2022

Development and validation of a gradient boosting machine to predict prognosis after liver resection for intrahepatic cholangiocarcinoma

Gu-Wei Ji^1,2,3^na1,
Chen-Yu Jiao^1,2,3^na1,
Zheng-Gang Xu^1,2,3^na1,
Xiang-Cheng Li^1,2,3,
Ke Wang^1,2,3 &
…
Xue-Hao Wang^1,2,3

BMC Cancer volume 22, Article number: 258 (2022) Cite this article

2703 Accesses
24 Citations
2 Altmetric
Metrics details

Abstract

Background

Accurate prognosis assessment is essential for surgically resected intrahepatic cholangiocarcinoma (ICC) while published prognostic tools are limited by modest performance. We therefore aimed to establish a novel model to predict survival in resected ICC based on readily-available clinical parameters using machine learning technique.

Methods

A gradient boosting machine (GBM) was trained and validated to predict the likelihood of cancer-specific survival (CSS) on data from a Chinese hospital-based database using nested cross-validation, and then tested on the Surveillance, Epidemiology, and End Results (SEER) database. The performance of GBM model was compared with that of proposed prognostic score and staging system.

Results

A total of 1050 ICC patients (401 from China and 649 from SEER) treated with resection were included. Seven covariates were identified and entered into the GBM model: age, tumor size, tumor number, vascular invasion, number of regional lymph node metastasis, histological grade, and type of surgery. The GBM model predicted CSS with C-Statistics ≥ 0.72 and outperformed proposed prognostic score or system across study cohorts, even in sub-cohort with missing data. Calibration plots of predicted probabilities against observed survival rates indicated excellent concordance. Decision curve analysis demonstrated that the model had high clinical utility. The GBM model was able to stratify 5-year CSS ranging from over 54% in low-risk subset to 0% in high-risk subset.

Conclusions

We trained and validated a GBM model that allows a more accurate estimation of patient survival after resection compared with other prognostic indices. Such a model is readily integrated into a decision-support electronic health record system, and may improve therapeutic strategies for patients with resected ICC.

Peer Review reports

Background

Intrahepatic cholangiocarcinoma (ICC) ranks as the second most common primary liver cancer after hepatocellular carcinoma. The increasing incidence and accompanying rising mortality rates of ICC over the past few decades worldwide have become a significant healthcare problem [1]. Although surgery offers the best chance of a potential cure for patients with localized and resectable ICC, the prognosis following resection remains discouraging, with 5-year survival of 25–35%, and mortality largely attributes to tumor recurrence, with 50–70% of patients experiencing tumor recurrence [2,3,4]. Thus, accurate prognosis assessment is essential to help direct appropriate individualized treatment for surgically resected ICC and thereafter optimize outcomes.

The American Joint Committee on Cancer (AJCC) staging manual represents the most widely used system for surgically managed patients with ICC. Although constantly refined, the AJCC staging system exhibits modest prognostic accuracy for resected cases and the prognosis of patients with the same stage varies [2, 5]. By using data from institutional series, multiple prognostic nomograms have been established to predict survival after resection for ICC [2, 6]. Recently, Raoof et al. [7] developed a prognostic score for ICC based on the independent association of multifocality, extrahepatic extension, grade, nodal status, and age (MEGNA) with survival using cases derived from a population-based database. All these published models were developed on factors known after surgery because several determinants, such as tumor grade and nodal status, can be ascertained only in the postoperative context. However, all these models are outmoded and rigid tools by nature because all variables were examined by Cox proportional hazard regression and assigned fixed weights, and missing data are not allowed. Hence, new methods to improve survival estimation and goal-concordant cancer care are warranted.

Today, machine learning (ML) algorithms enable computers to learn from large-scale, heterogeneous health-care data without predefined rules. ML models have offered considerable advantages over traditional statistical models for many tasks, such as diagnosis and classification, risk stratification, and survival prediction [8]. Unfortunately, many popular ML algorithms are essentially black boxes that limit the physician’s trust in their results. Gradient boosting machine (GBM) is currently considered as the state-of-the-art algorithm for prediction with tabular data and has been consistently utilized as the top performer of modelling competitions in a variety of clinical scenarios [9,10,11]. GBM algorithm can be disassembled into simple decision-tree-base-learners, which provide model-centric explanations, and handle missing values with the gradient-boosting predictor. To date, there has been no effort to use GBM to take full advantage of readily-available clinical information to help physicians predict survival of patients with resected ICC. Accordingly, we assembled a large-scale international cohort of ICC patients to design and evaluate a GBM model for prognosis prediction. We hypothesized that this model would outperform routinely used or previously established prognostic indices in ICC.

Methods

Patient population and study design

Adult patients (age ≥ 20 years) with histology-confirmed ICC who underwent liver resection were retrospectively identified from two sources: (1) consecutive patients treated between 2009 and 2019 at the First Affiliated Hospital of Nanjing Medical University (FAHNJMU) (Nanjing, China); (2) patients (histology codes 8140 and 8160 for adenocarcinoma and cholangiocarcinoma in combination with site code C22.1 for intrahepatic bile duct, according to International Classification of Diseases for Oncology, 3rd Edition) [12] between 2004 and 2015 in the Surveillance, Epidemiology, and End Results (SEER) database. The exclusion criteria were: (1) loss to follow-up or a survival of < 1 month; (2) missing information on the type of resection; (3) another malignant primary tumor prior to ICC diagnosis; (4) cause of death unknown; (5) exact tumor size unknown; (6) incomplete information on tumor extension or metastasis for 8th AJCC staging; (7) distant metastatic disease.

The GBM model was trained and validated on data from FAHNJMU using nested cross-validation, and then tested on the SEER database (Fig. 1A). Because the model was developed on the dataset of Asian patients, use of the geographically distinct population from SEER should provide an appropriate assessment for its generalization ability. This study followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis guideline [13]. This study was approved by the ethics committee of FAHNJMU (Nanjing, China) and the requirement of informed patient consent was waived.

Data collection and outcome

The pertinent demographic and clinicopathological data were abstracted based on a standardized template. Data collection included the following characteristics of interest: age, gender, tumor size, tumor number, vascular invasion, regional lymph node metastasis (LNM), number of regional LNM, histological grade, visceral peritoneum invasion, adjacent organ invasion, liver fibrosis score, and type of surgery. The above-mentioned covariates are readily retrieved from electronic medical records and routine clinical practice. Patients in the FAHNJMU database were monitored after surgery with laboratory and imaging studies, including liver function, serum tumor markers, ultrasonography, dynamic computed tomography or magnetic resonance imaging, every 3 months during the first 2 years and every 6 months thereafter; the follow-up was terminated on August 20, 2020. Survival data for the SEER database were estimated using statistics from the US Census Bureau [14]. The primary outcome of this study was cancer-specific survival (CSS), defined as the duration from the date of surgery to the date of death from ICC. All deaths from any other cause were counted as non-cancer-specific and censored at the date of the last follow-up.

Model training, validating and testing

A GBM model that aggregated multiple predictors was trained to predict the likelihood of survival with decision-tree-base-learners using the “gbm” R package. Each base learner may consist of different predictors; predictors with higher importance are utilized in more decision trees as well as earlier in the boosting algorithm. Hyperparameters were tuned with a grid search approach in a 3 × fivefold nested, cross-validated, manner (3 outer iterations and 5 inner iterations) on the training/validation cohort using the “mlr” R package. Nested cross-validation was applied because it more accurately estimates the independent validation error of the given algorithm on unseen datasets by averaging its performance metrics across folds [15]. Study pipeline is schematically depicted in Fig. 1B. The GBM model was then tested on the patients of the test cohort to determine whether it remains accurate when new data are fed into it. We also compared the performance of GBM model to that of AJCC staging system and previously published MEGNA model.

Statistical analysis

All statistical analyses were performed using R software version 3.4.4 (www.r-project.org). Categorical variables were presented as number (percentage) and compared using χ2 test. Continuous variables were reported as median (interquartile range) and compared using Mann–Whitney U test or Kruskal–Wallis rank test, as appropriate. Survival probabilities and 95% confidence intervals (CI) were estimated using the Kaplan–Meier method and compared by the log-rank test. Model performance was measured by Harrell’s C-statistic and 95% CIs were calculated by bootstrapping. Model calibration was performed by plotting the predicted probabilities versus the observed outcomes. Clinical utility was determined by decision curve analysis that quantifies the net benefit associated with the adoption of the model [16]. By using X-tile software [17], the optimal cut-points of GBM predictions were determined to stratify patients at low, intermediate, or high risk for cancer-specific death. A two-sided P < 0.05 was considered statistically significant.

Results

Patient data

A total of 1050 patients (401 from the FAHNJMU database and 649 from the SEER database; 559 men [53.2%] and 491 women [46.8%]; median [interquartile range] age, 62.0 [53.0–69.0] years) who met the study criteria formed the original dataset. During a median follow-up of 36.2 months (range, 1.0–165.0 months), 591 cancer-specific deaths (56.3%) occurred; the 2-and 5-year CSS rates were 63.1% and 35.6%, respectively. Comparisons of training/validation (n = 401) and test (n = 649) cohorts are shown in Table 1.

Table 1 Comparison of demographic and clinicopathological characteristics between the training/ validation and test cohorts

Full size table

GBM prognostic model

Based on the training/validation cohort, we explored 12 potential model covariates using GBM algorithm and nested cross-validation. We utilized 2000 decision trees sequentially, with at least 5 observations in each terminal node; the decision tree depth was optimized at 2, corresponding to 2-way interactions, and the shrinkage parameter was optimized at 0.01. Covariates with a relative influence greater than 6 (age, tumor size, tumor number, vascular invasion, number of regional LNM, histological grade, and type of surgery) were integrated into the GBM model developed to predict CSS (Fig. 2A-B). The most important feature in the GBM model was tumor size, followed by patient age and number of regional LNM. No difference was observed with regard to GBM prediction scores between training/validation and test cohorts (P = 0.499) (Fig. S1).

Model performance

For predicting post-resection survival specific for ICC, the GBM model had a C-statistic of 0.751 (95% CI 0.717–0.784) in the training/validation cohort, significantly better than that achieved using 8th edition AJCC criteria as well as MEGNA prognostic score (P < 0.001) (Table 2). The internal validation group was the nested cross-validation of the GBM model of the training cohort with approximately 134 patients in each outer loop iteration; GBM model yielded a median C-statistic of 0.756 (range 0.707–0.796) for the composite outcome and outperformed AJCC system (median C-statistic 0.679, range 0.648–0.693, P < 0.05) as well as MEGNA score (median C-statistic 0.660, range 0.656–0.710, P < 0.05) (Fig. 2C). In the test cohort, the GBM model also offered improved prognostic discrimination (C-statistic, 0.723; 95% CI 0.697–0.749) compared with the AJCC staging system and MEGNA prognostic score (P < 0.001) (Table 2). The superior performance of GBM model was further confirmed in sub-cohorts stratified by covariate integrity (complete/missing information) (Table S1). Calibration curves for probability of 2-and 5-year CSS showed excellent agreement between model prediction and actual observation in both the training/validation and test cohorts (Fig. 3A-B). Decision curve analysis demonstrated that GBM model provided larger net benefits to decide which ICC patients to refer to specialized oncological care compared with "treat all" or "treat none" strategy (Fig. 3C-D). We deployed an app (https://machinelearningmodel.shinyapps.io/ICC_App/) that allows real-time survival estimates using the prediction score (Fig. 2D).

Table 2 Performance of proposed and existing prognostic tools for ICC

Full size table

Risk stratification

With X-tile software identifying optimal cut-off values for prediction scores (-3.65 and -2.45) (Fig. S2), patients were categorized into three groups with a highly different probability of post-resection survival in the training/validation cohort: low risk (194 [48.4%]; 5-year CSS, 58.1%), intermediate risk (165 [41.1%]; 5-year CSS, 10.3%), and high risk (42 [10.5%]; 5-year CSS, not applicable) (P < 0.001). The three prognostic strata by using the GBM model were confirmed in the test cohort: low risk (345 [53.1%]; 5-year CSS, 54.1%), intermediate risk (251 [38.7%]; 5-year CSS, 18.5%), and high risk (53 [8.2%]; 5-year CSS, 0.0%) (P < 0.001) (Fig. 4A-B; Table 3). Patient characteristics stratified by the GBM model are shown in Table S2. Remarkable differences were observed among three risk groups in all listed characteristics except for patient gender. We also noted that patients were split into distinct prognostic groups across the AJCC stages using the proposed GBM model (P < 0.001) (Fig. 4C-E).

Table 3 Cancer-specific survival according to risk stratification

Full size table

Discussion

Accurate prediction of survival in ICC is important for decision making and counseling of patients. By harvesting data from over 1000 patients with surgically managed ICC, we trained, validated and tested a novel gradient-boosting ML model that utilized readily available clinical data and provided accurate prognosis prediction (C-statistic ≥ 0.72). The GBM model outperformed both the AJCC staging system as well as the previously published MEGNA score. Importantly, this GBM model increased the number of low-risk/early-stage patients who could be identified by approximately 1.4-fold as compared to the widely adopted AJCC system.

Genomic biomarkers may provide prognostic information; however, their applicability is limited in routine clinical care [18]. Notably, a simple system that utilizes readily available clinical data and provides accurate prognosis estimates remains the preferred reference for personalized management in clinical oncology. Clinicians already use simple models to discuss, for example, the benefit of adjuvant therapy with patients [19]. Prior efforts to develop parsimonious models to predict the prognosis for patients with ICC have mostly been reliant on Cox regression modeling strategies [2, 6, 7]. The Cox model, also known as the proportional hazards model, assumes that the interactions between covariates are homogeneous and different covariates multiplicatively contribute to the hazard function but complex relationships exist between factors related to ICC prognosis [20, 21]. Moreover, Cox regression analysis must be performed in cases with complete information and improper management of data, such as excluding cases with missing data, introduces substantial bias, as noted across various cancer types [22, 23]. In that setting, ML techniques have a significant role to play.

Recent recommendations have emphasized the explainability along with the robustness to incomplete data as the priority in ML research [24, 25]. Decision tree-based algorithms represent a large family of ML techniques. Current machine-based classification and regression trees (CART) have been applied to define prognostic groups for patients with resected ICC because of their simplicity and intuitive interpretation [20, 21]. Nevertheless, such trees suffer from intrinsic limitations in predictive performance. Gradient boosting of regression trees enables highly competitive, robust, interpretable procedures to relax the assumption of proportional hazards and allow for complicated relationships between covariates that improve the predictive accuracy [26]. GBM model can be disassembled into massive decision-tree-base-learners (CART models) so that it is possible to decipher the intrinsical structure of our proposed model and understand how the machine makes predictions. Moreover, GBM algorithm has a built-in functionality to handle missing values that permits utilizing data from, and assigning classification to, all observations in the cohort without the need of imputation for missing data [9]. This considerably broadens the datasets available and the scope for building prognostic models. Another limit to ML techniques is overfitting (low bias and high variance), defined as a superior performance in the training/validation cohort but inferior performance in an independent test dataset [27]. To avoid this issue, a nested cross-validation approach was applied for hyperparameter tuning in this study because it prevents information leaking between cases used for model training and validation [15]. Comparable performance in the training/validation cohort, the test cohort as well as sub-cohorts stratified by covariate integrity further confirmed good reproducibility and reliability of our GBM model.

Although ML algorithms may improve prediction performance in the prognostic setting, it is important to demonstrate that improved accuracy can translate to better clinician and patient decision-making. We therefore provided an app (https://machinelearningmodel.shinyapps.io/ICC_App/) that allows for a GBM prediction input and an immediate feedback of survival probabilities at individualized time scale. Also, the GBM model was able to identify three risk strata for cancer-specific death (ie, low-, intermediate-, and high-risk groups). Nearly half of ICC patients who suffered from extremely dismal prognosis following resection were identified by using the GBM model. Therefore, adjuvant treatments, such as capecitabine-based chemotherapy or immune-directed therapy, is desirable for intermediate-to-high risk patients. In turn, over half of patients with surgically resected ICC were categorized as low risk with satisfactory long-term survival and thus may receive no adjuvant therapy. On the other hand, the GBM model highlights that the number of regional LNM holds more prognostic information compared with the involvement of regional LNM, which is consistent with previous publication [28].

Several limitations warrant attention when interpreting the results of this model. First, our model was developed, validated and tested using retrospective data; a prospective validation study should be conducted to confirm our results prior to its routine use in clinical practice. Nonetheless, the top-ranked features in the proposed model, such as tumor size, number of regional LNM, vascular invasion and tumor number, are all well-established prognostic factors, lending validity to our GBM model [1]. Second, patient and tumor characteristics included in this study were limited because some potential prognostic factors, such as carcinoembryonic antigen, carbohydrate antigen 19–9, surgical margin status and treatment of recurrent disease, were not available in SEER database. However, the seven covariates integrated into our model are readily accessible from health-care data, indicating its simplicity and feasibility; the proposed GBM model is still able to provide accurate prediction and risk stratification even without additional prognostic information. Finally, our GBM model promises to identify ICC patients at high risk for cancer-specific death after resection but does not provide individualized solution for how to manage these patients clinically to ultimately improve prognosis.

Conclusions

In conclusion, we developed and validated an interpretable ML model using readily available clinical data to predict the prognosis for patients with resected ICC. Our GBM model provides more-accurate determination of survival probabilities compared with previously proposed MEGNA score and widely adopted AJCC staging system. Such an easy-to-use tool may lead to better personalized treatments for patients with resected ICC in future clinical practice.

Availability of data and materials

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Abbreviations

ICC:: Intrahepatic cholangiocarcinoma
AJCC:: American Joint Committee on Cancer
MEGNA:: Multifocality, extrahepatic extension, grade, nodal status, and age
ML:: Machine learning
GBM:: Gradient boosting machine
FAHNJMU:: First Affiliated Hospital of Nanjing Medical University
SEER:: Surveillance, Epidemiology, and End Results
LNM:: lymph node metastasis
CSS:: Cancer-specific survival
CART:: Classification and regression trees

References

Banales JM, Marin JJG, Lamarca A, et al. Cholangiocarcinoma 2020: the next horizon in mechanisms and management. Nat Rev Gastroenterol Hepatol. 2020;17(9):557–88.
Article Google Scholar
Wang Y, Li J, Xia Y, et al. Prognostic nomogram for intrahepatic cholangiocarcinoma after partial hepatectomy. J Clin Oncol. 2013;31(9):1188–95.
Article Google Scholar
Spolverato G, Kim Y, Ejaz A, et al. Conditional Probability of Long-term Survival After Liver Resection for Intrahepatic Cholangiocarcinoma: A Multi-institutional Analysis of 535 Patients. JAMA Surg. 2015;150(6):538–45.
Article Google Scholar
Tsilimigras DI, Sahara K, Wu L, et al. Very Early Recurrence After Liver Resection for Intrahepatic Cholangiocarcinoma: Considering Alternative Treatment Approaches. JAMA Surg. 2020;155(9):823–31.
Article Google Scholar
Büttner S, Galjart B, Beumer BR, et al. Quality and performance of validated prognostic models for survival after resection of intrahepatic cholangiocarcinoma: a systematic review and meta-analysis. HPB (Oxford). 2021;23(1):25–36.
Article Google Scholar
Hyder O, Marques H, Pulitano C, et al. A nomogram to predict long-term survival after resection for intrahepatic cholangiocarcinoma: an Eastern and Western experience. JAMA Surg. 2014;149(5):432–8.
Article Google Scholar
Raoof M, Dumitra S, Ituarte PHG, et al. Development and Validation of a Prognostic Score for Intrahepatic Cholangiocarcinoma. JAMA Surg. 2017;152(5).
Article Google Scholar
Ngiam KY, Khor IW. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. 2019;20(5):e262–73.
Article Google Scholar
Eaton JE, Vesterhus M, McCauley BM, et al. Primary Sclerosing Cholangitis Risk Estimate Tool (PREsTo) Predicts Outcomes of the Disease: A Derivation and Validation Study Using Machine Learning. Hepatology. 2020;71(1):214–24.
Article CAS Google Scholar
Bibault JE, Chang DT, Xing L. Development and validation of a model to predict survival in colorectal cancer using a gradient-boosted machine. Gut. 2021;70(5):884–9.
Article CAS Google Scholar
Shung DL, Au B, Taylor RA, et al. Validation of a Machine Learning Model That Outperforms Clinical Risk Scoring Systems for Upper Gastrointestinal Bleeding. Gastroenterology. 2020;158(1):160–7.
Article Google Scholar
Fritz, AG. International Classification of Diseases for Oncology: ICD-O. 3. Geneva, Switzerland: World Health Organization; 2000.
Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015; 350: g7594.
Doll KM, Rademaker A, Sosa JA. Practical Guide to Surgical Data Sets: Surveillance, Epidemiology, and End Results (SEER) Database. JAMA Surg. 2018;153(6):588–9.
Article Google Scholar
Maros ME, Capper D, Jones DTW, et al. Machine learning workflows to estimate class probabilities for precision cancer diagnostics on DNA methylation microarray data. Nat Protoc. 2020;15(2):479–512.
Article CAS Google Scholar
Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006;26(6):565–74.
Article Google Scholar
Camp RL, Dolled-Filhart M, Rimm DL. X-tile: a new bio-informatics tool for biomarker assessment and outcome-based cut-point optimization. Clin Cancer Res. 2004;10(21):7252–9.
Article CAS Google Scholar
Nault JC, Villanueva A. Biomarkers for Hepatobiliary Cancers. Hepatology. 2021;73(Suppl 1):115–27.
Article Google Scholar
Wang SJ, Lemieux A, Kalpathy-Cramer J, et al. Nomogram for predicting the benefit of adjuvant chemoradiotherapy for resected gallbladder cancer. J Clin Oncol. 2011;29(35):4627–32.
Article Google Scholar
Bagante F, Spolverato G, Merath K, et al. Intrahepatic cholangiocarcinoma tumor burden: A classification and regression tree model to define prognostic groups after resection. Surgery. 2019;166(6):983–90.
Article Google Scholar
Tsilimigras DI, Mehta R, Moris D, et al. A Machine-Based Approach to Preoperatively Identify Patients with the Most and Least Benefit Associated with Resection for Intrahepatic Cholangiocarcinoma: An International Multi-institutional Analysis of 1146 Patients. Ann Surg Oncol. 2020;27(4):1110–9.
Article Google Scholar
Sterne JA, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338: b2393.
Jeong CW, Washington SL 3rd, Herlemann A, Gomez SL, Carroll PR, Cooperberg MR. The New Surveillance, Epidemiology, and End Results Prostate with Watchful Waiting Database: Opportunities and Limitations. Eur Urol. 2020;78(3):335–44.
Article Google Scholar
Towards trustable machine learning. Nat Biomed Eng. 2018;2(10):709–10.
Article Google Scholar
Watson DS, Krutzinna J, Bruce IN, et al. Clinical applications of machine learning algorithms: beyond the black box. BMJ. 2019;364: l886.
Luna JM, Gennatas ED, Ungar LH, et al. Building more accurate decision trees with the additive tree. Proc Natl Acad Sci U S A. 2019;116(40):19887–93.
Article CAS Google Scholar
Mummadi SR, Al-Zubaidi A, Hahn PY. Overfitting and Use of Mismatched Cohorts in Deep Learning Models: Preventable Design Limitations. Am J Respir Crit Care Med. 2018;198(4):544–5.
Article Google Scholar
Zhang XF, Xue F, Dong DH, et al. Number and Station of Lymph Node Metastasis After Curative-intent Resection of Intrahepatic Cholangiocarcinoma Impact Prognosis. Ann Surg. 2020. https://doi.org/10.1097/SLA.0000000000003788.
Article PubMed Google Scholar

Download references

Acknowledgements

Not applicable.

Funding

This study was supported by Key Program of the National Natural Science Foundation of China (31930020), National Natural Science Foundation of China (82102150) and Natural Science Foundation of Jiangsu Province (BK20210968).

Author information

Gu-Wei Ji, Chen-Yu Jiao and Zheng-Gang Xu contributed equally to this work.

Authors and Affiliations

Hepatobiliary Center, The First Affiliated Hospital of Nanjing Medical University, Nanjing, People’s Republic of China
Gu-Wei Ji, Chen-Yu Jiao, Zheng-Gang Xu, Xiang-Cheng Li, Ke Wang & Xue-Hao Wang
Key Laboratory of Liver Transplantation, Chinese Academy of Medical Sciences, Nanjing, People’s Republic of China
Gu-Wei Ji, Chen-Yu Jiao, Zheng-Gang Xu, Xiang-Cheng Li, Ke Wang & Xue-Hao Wang
NHC Key Laboratory of Living Donor Liver Transplantation, Nanjing Medical University), Nanjing, People’s Republic of China
Gu-Wei Ji, Chen-Yu Jiao, Zheng-Gang Xu, Xiang-Cheng Li, Ke Wang & Xue-Hao Wang

Authors

Gu-Wei Ji
View author publications
You can also search for this author in PubMed Google Scholar
Chen-Yu Jiao
View author publications
You can also search for this author in PubMed Google Scholar
Zheng-Gang Xu
View author publications
You can also search for this author in PubMed Google Scholar
Xiang-Cheng Li
View author publications
You can also search for this author in PubMed Google Scholar
Ke Wang
View author publications
You can also search for this author in PubMed Google Scholar
Xue-Hao Wang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Gu-wei Ji contributed to drafting of this manuscript. Chen-Yu Jiao and Zheng-Gang Xu were responsible for analysis and interpretation of data. Ke Wang, Xiang-Cheng Li and Xue-Hao Wang contributed to study concept and design, critical revision for the draft manuscript. All authors approved the final version of this manuscript.

Corresponding authors

Correspondence to Ke Wang or Xue-Hao Wang.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the ethics committee of First Affiliated Hospital of Nanjing Medical University (Nanjing, China). Written informed consent was waived by the ethics committee of First Affiliated Hospital of Nanjing Medical University (Nanjing, China) because retrospective anonymous data were analyzed. This study was conducted in accordance with the Declaration of Helsinki.

Consent for publication

Not applicable.

Competing interests

The authors have no conflicts of interest to declare.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Fig. S1. Scatter plot of gradient boosting machine-based prediction scores in the training/validation and test cohort. Scores are reported as median (interquartile range). Fig. S2. X-tile analysis to determine the optimal cut-points for GBM-based prediction scores. The optimal cut-points highlighted by black circle (A) are detailed in histogram of the training/validation cohort (B) with corresponding Kaplan-Meier curves (C). GBM gradient boosting machine. Table S1. Comparison of proposed and existing prognostic tools for ICC in sub-cohort with or without missing covariates. Table S2. Comparison of demographic and clinicopathological characteristics among different risk groups

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Ji, GW., Jiao, CY., Xu, ZG. et al. Development and validation of a gradient boosting machine to predict prognosis after liver resection for intrahepatic cholangiocarcinoma. BMC Cancer 22, 258 (2022). https://doi.org/10.1186/s12885-022-09352-3

Download citation

Received: 13 July 2021
Accepted: 01 March 2022
Published: 11 March 2022
DOI: https://doi.org/10.1186/s12885-022-09352-3

Development and validation of a gradient boosting machine to predict prognosis after liver resection for intrahepatic cholangiocarcinoma

Abstract

Background

Methods

Results

Conclusions

Background

Methods

Patient population and study design

Data collection and outcome

Model training, validating and testing

Statistical analysis

Results

Patient data

GBM prognostic model

Model performance

Risk stratification

Discussion

Conclusions

Availability of data and materials

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s Note

Supplementary Information

Additional file 1:

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Cancer

Contact us