Nomogram for prediction of the International Study Group of Liver Surgery (ISGLS) grade B/C posthepatectomy liver failure in HBV-related hepatocellular carcinoma patients: an external validation and prospective application study

Background: To develop a nomogram for predicting the International Study Group of Liver Surgery (ISGLS) grade B/C posthepatectomy liver failure (PHLF) in hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC) patients. Methods: Patients initially treated with hepatectomy were included. Univariate regression analysis and a random forest algorithm were applied to extract the core indicators and reduce redundancy bias. The nomogram was then constructed using multivariate logistic regression and validated in internal and external cohorts and in a prospective clinical application. Results: There were 900, 300 and 387 participants in the training, internal validation and external validation cohorts, with grade B/C PHLF incidences of 13.5, 11.0 and 20.2%, respectively. The nomogram was generated by integrating preoperative total bilirubin, platelet count, prealbumin, aspartate aminotransferase, prothrombin time and standard future liver remnant volume, and achieved good predictive performance in the training (AUC = 0.868, 95%CI = 0.836–0.900), internal validation (AUC = 0.868, 95%CI = 0.811–0.926) and external validation cohorts (AUC = 0.820, 95%CI = 0.756–0.861), with well-fitted calibration curves. Negative predictive values were significantly higher than positive predictive values in the training cohort (97.6% vs. 33.0%), internal validation cohort (97.4% vs. 25.9%) and external validation cohort (94.3% vs. 41.1%). Patients with a nomogram score < 169 or ≥ 169 were considered to have a low or high risk of grade B/C PHLF, respectively. Prospective application of the nomogram accurately predicted grade B/C PHLF in clinical practice. Conclusions: The nomogram performs well in predicting ISGLS grade B/C PHLF in HBV-related HCC patients and in determining appropriate candidates for hepatectomy.


Background
Hepatocellular carcinoma (HCC) is the sixth most common malignancy and the fourth leading cause of cancer-related death worldwide [1]. Hepatectomy is the most effective treatment for early-stage HCC patients [2] and for selected intermediate-stage and advanced-stage HCC patients with resectable tumors and moderate liver function [3]. Advances in surgical techniques and management have greatly improved safety and postoperative outcomes over the past few decades [4]; however, International Study Group of Liver Surgery (ISGLS) grade B/C posthepatectomy liver failure (PHLF) remains a serious complication and a predominant cause of postoperative mortality [5,6].
The incidence of PHLF reported in the literature ranges widely, from 1.2 to 32%, owing to diverse etiological and pathogenic liver characteristics and surgical procedures [6,7]. Independent risk factors for PHLF can be grouped into three categories [5,8]: 1) patient-related factors, including age, sex, and comorbidities such as malnutrition, diabetes mellitus, and cardiopulmonary, renal or cerebral dysfunction; 2) liver disease-related factors, including hepatitis B/C, steatosis, cholangitis, alcoholic liver disease and cirrhosis; 3) surgery-related factors, including future liver remnant volume (FLRV), excessive intraoperative blood loss, prolonged operation time, and ischemia-reperfusion injury resulting from Pringle's maneuver. In particular, chronic hepatitis B, a major cause of decompensated liver cirrhosis and dysfunction, is highly prevalent and associated with 70-90% of HCC cases in the Asia-Pacific region [9].
Accurate prediction of PHLF is of primary concern in determining the feasibility of hepatectomy for HCC [5,8]. Child-Pugh grade [10], model for end-stage liver disease (MELD) [11], albumin-bilirubin (ALBI) [12,13], platelet-albumin-bilirubin (PALBI) and aspartate aminotransferase to platelet ratio index (APRI) [14] are conventional scores commonly used for evaluating PHLF; nevertheless, their predictive performance remains controversial because of inherent limitations. Child-Pugh grade is the most widely used tool for evaluating compensated liver function and has been incorporated into surgical treatment algorithms [11]. However, subjective and unquantifiable variables complicate Child-Pugh grading: a serum bilirubin level of 55 μmol/L has the same influence on Child-Pugh grade as 550 μmol/L because of arbitrary thresholds for continuous variables; there is no clear guideline for distinguishing mild from moderate ascites, and the influence of diuretic therapy on grading ascites remains unclear; and sedative therapy frequently confounds the assessment of encephalopathy [15]. MELD was developed to evaluate acute liver failure mortality risk and to rank candidates for transplantation [16,17], and a MELD score > 8 on postoperative day 5 also indicates an increased risk of PHLF morbidity and mortality [5]. However, MELD performs poorly in predicting PHLF preoperatively [5,6,11]. ALBI statistically eliminates subjective variables, and its assessment of liver function and overall survival compared favorably with Child-Pugh grade in four geographically and etiologically distinct HCC patient groups [18]. A preoperative ALBI score predicted PHLF more accurately than Child-Pugh grade, MELD and indocyanine green retention at 15 min (ICG-15) [12,13].
However, among patients with hyperbilirubinemia who are classified as ALBI grade 3, those with obstructive jaundice may have better liver function and prognosis than those whose jaundice is caused by decompensated liver dysfunction, which can significantly mislead ALBI grading [19,20]. Platelet (PLT) count, a surrogate marker of portal hypertension, was added to ALBI to develop PALBI, which predicted survival in HCC patients across treatment modalities, including hepatectomy, better than ALBI and MELD. Nonetheless, further research is necessary, as few studies have evaluated the use of PALBI for predicting PHLF. APRI is noninvasive and reliable for evaluating liver fibrosis and cirrhosis [21,22], and HCC patients with a high preoperative APRI score have a high risk of PHLF [23]. However, APRI includes only two quantitative variables and has no ceiling effect. In general, these conventional scores were primarily designed for assessing liver function or for other purposes rather than for predicting PHLF. Moreover, when used to predict PHLF, none of these scores comprehensively considers patient-related, liver-related and surgery-related risk factors.
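As a rough illustration of the limitations discussed above, the sketch below implements three of the conventional scores from their commonly published definitions. These formulas and cutoffs are not taken from this study; they are assumptions based on the general literature (Child-Pugh bilirubin cutoffs of 34 and 51 μmol/L, ALBI coefficients of 0.66 and 0.0852, ALBI grade cutoffs of -2.60 and -1.39). All function names and patient values are hypothetical.

```python
import math


def child_pugh_bilirubin_points(bilirubin_umol_l: float) -> int:
    """Child-Pugh points for total bilirubin (assumed cutoffs: 34 and 51 umol/L)."""
    if bilirubin_umol_l < 34:
        return 1
    if bilirubin_umol_l <= 51:
        return 2
    return 3  # every value above 51 umol/L scores the same


def albi_score(bilirubin_umol_l: float, albumin_g_l: float) -> float:
    """ALBI score from bilirubin (umol/L) and albumin (g/L), assumed coefficients."""
    return 0.66 * math.log10(bilirubin_umol_l) - 0.0852 * albumin_g_l


def albi_grade(score: float) -> int:
    """ALBI grade 1-3 using the assumed cutoffs -2.60 and -1.39."""
    if score <= -2.60:
        return 1
    if score <= -1.39:
        return 2
    return 3


def apri(ast_u_l: float, ast_upper_limit_u_l: float, platelets_1e9_l: float) -> float:
    """APRI = 100 * (AST / upper limit of normal) / platelet count (10^9/L)."""
    return 100.0 * (ast_u_l / ast_upper_limit_u_l) / platelets_1e9_l


if __name__ == "__main__":
    # The thresholding problem: 55 and 550 umol/L receive identical points.
    print(child_pugh_bilirubin_points(55), child_pugh_bilirubin_points(550))
    # Hypothetical patient: bilirubin 10 umol/L, albumin 45 g/L.
    print(albi_grade(albi_score(10, 45)))
    # APRI has no upper bound: it keeps growing as platelets fall.
    print(apri(80, 40, 100), apri(80, 40, 50))
```

The first printed pair makes the Child-Pugh thresholding issue concrete, and the last line shows why APRI has no ceiling effect: halving the platelet count doubles the index.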
As an evidence-based model, the nomogram has been proposed as an alternative tool for individualized estimation of treatment risk in clinical application [24,25]. This study aimed to establish a nomogram to predict the risk of grade B/C PHLF in HBV-HCC patients.

Patient population
This study was conducted retrospectively in HBV-related HCC patients who were initially treated with hepatectomy. The training and internal validation cohorts consisted of patients treated at the Guangxi Medical University Cancer Hospital (GXMUCH) between October 11th, 2013 and December 21st, 2017. The external validation cohort consisted of patients treated at the Eastern Hepatobiliary Surgery Hospital (EHBH) between September 14th, 2009 and January 22nd, 2018. In addition, patients scheduled to receive hepatectomy as the initial treatment were prospectively recruited from GXMUCH between December 22nd, 2017 and June 21st, 2018 to evaluate the nomogram in clinical application. This study was approved by the Institutional Ethics Committees of the two hospitals.

Preoperative examination and surgical procedure
Preoperative general characteristics, laboratory biochemistry data (including liver and renal function tests, hepatitis immunology and serum α-fetoprotein level), radiological data (including abdominal contrast-enhanced CT or MRI scan, and chest radiograph) and surgical data (including operation time, intraoperative blood loss, intraoperative transfusion and standard future liver remnant volume, sFLR) were routinely collected. sFLR was determined by three-dimensional reconstruction using DEMedical software (version 3.1, DE Sci&Tech Co., Ltd., Shenzhen, China). The details of the calculation of future liver remnant have been described in a previous study [27]. Surgical procedures have been described in a previous report [3]. Pringle's maneuver was applied to occlude hepatic blood inflow. Liver parenchymal transection was carried out with electrosurgical instruments or the clamp-crushing method. Histopathological examination of all surgical specimens was routinely conducted by three pathologists. The main outcomes were PHLF morbidity and mortality.

Study design and statistical analyses
The flow chart of the study design is shown in Fig. 1. A stratified random grouping method was used to assign patients to the training cohort and the internal validation cohort at a ratio of 3:1. In the training cohort, univariate logistic analysis was used to identify risk indicators of grade B/C PHLF, and correlation analysis was performed to eliminate data redundancy and excessive false positives. When correlation analysis indicated that indicators identified by univariate analysis were not independent of one another, they were classified by clinical significance into seven groups (liver synthesis ability, metabolism ability, HBV activity status, liver inflammation, compensated cirrhotic liver function, coagulation function, and surgery-related factors). According to the random forest algorithm, the indicators with the highest weight (at least > 20) in each category were extracted and incorporated into the subsequent multivariate logistic regression model. A nomogram was formulated using the rms package in R version 3.3.2. The predictive performance of the nomogram was measured using the receiver operating characteristic (ROC) curve and compared with conventional scores. Calibration plots were used to evaluate the goodness of fit of the nomogram. The Youden index of the ROC curve from the training cohort was calculated to set the diagnostic threshold. Diagnostic errors were displayed with a corrected distribution curve, in which the area under the curve represents the misdiagnosis threshold; correspondingly, the diagnostic confidence interval is expressed on the abscissa as the range of total points bounded by the 95% misdiagnosis threshold. To evaluate the efficacy of the nomogram in prospective clinical application, the total points were calculated for each patient, and statistical indicators including precision, recall, accuracy and F1 score were calculated to evaluate diagnostic ability.
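The threshold selection step can be sketched as follows. This is a minimal illustration of the Youden index (J = sensitivity + specificity - 1), not the R code used in the study; the function name, the total points and the outcome labels are all hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    A case is called positive when its score is >= cutoff.
    labels: 1 = grade B/C PHLF occurred, 0 = it did not.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:  # keep the first cutoff reaching the maximum
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


if __name__ == "__main__":
    # Hypothetical nomogram total points and outcomes for ten patients.
    points = [120, 140, 150, 160, 165, 170, 180, 190, 200, 210]
    outcome = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
    print(youden_cutoff(points, outcome))
```

Scanning every observed score as a candidate cutoff is quadratic in the number of patients, which is acceptable for a cohort of a few hundred; a sorted single-pass version would be preferable for larger data.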
Data analysis was performed using SPSS (Version 23.0, IBM, New York, USA) and R software (Version 3.2.2, Institute for Statistics and Mathematics, Vienna, Austria). Normally distributed continuous data are expressed as mean (s.d.) and were compared using an unpaired, two-tailed t-test. Values with a non-normal distribution are expressed as median (interquartile range) and were compared using the Mann-Whitney U test. Categorical data are shown as frequency and proportion and were compared using the χ² test.

Clinicopathologic characteristics
During the study period, 1200 HBV-related HCC patients from GXMUCH who met the inclusion criteria were included and randomly assigned to a training cohort (n = 900) and an internal validation cohort (n = 300) at a ratio of 3:1. In addition, 387 HBV-related HCC patients from EHBH who met the inclusion criteria were included in an external validation cohort (Fig. 1). Baseline clinicopathologic characteristics are listed in Table 1.
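A stratified 3:1 assignment of this kind can be sketched as below. The text does not state which variable was used for stratification, so stratifying on the outcome label is an assumption here, as are the function name, the seed and the number of events in the toy data.

```python
import random


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split indices into training/validation sets, preserving class proportions.

    Each class is shuffled separately and divided at train_fraction,
    so both sets keep (approximately) the same event rate.
    """
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(len(idx) * train_fraction)
        train_idx.extend(idx[:n_train])
        valid_idx.extend(idx[n_train:])
    return sorted(train_idx), sorted(valid_idx)


if __name__ == "__main__":
    # Hypothetical cohort of 1200 patients with 160 events (not the study's data).
    labels = [1] * 160 + [0] * 1040
    train, valid = stratified_split(labels)
    print(len(train), len(valid))
```

With 1200 patients and a training fraction of 0.75, the sketch reproduces the 900/300 cohort sizes reported above while keeping the event rate equal in both sets.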
The incidences of grade B/C PHLF were 13.5, 11.0 and 20.2% in the training, internal validation and external validation cohorts, respectively (Table 1). The postoperative mortality rate among all participants at GXMUCH was 1.75% (21 patients): 13 patients died of grade C PHLF followed by multiple organ failure, and 8 patients died of sepsis or severe pneumonia. The postoperative mortality rate among participants from EHBH was 0%. Among participants in the prospective clinical application of the nomogram, grade B PHLF occurred in 14 patients (11.7%) and grade C PHLF occurred in 4 patients (3.3%).

Independent risk indicator of ISGLS grade B/C PHLF
In the training cohort, risk indicators of grade B/C PHLF were identified by univariate logistic analysis (Fig. 2a), and data redundancy and excessive false positives were eliminated by correlation analysis (Fig. 2b). According to the random forest algorithm, serum HBV-DNA load, with a weight of 2.19, was excluded (Fig. 2c). Multivariate analysis with stepwise removal of variables was then performed, with results reported as odds ratios with 95%CI: total bilirubin (T-Bil), platelet (PLT) count, prealbumin (PA), aspartate aminotransferase (AST), prothrombin time (PT) and sFLR were independent risk indicators of grade B/C PHLF (Table 2). A second correlation test then revealed no significant correlation among these six independent indicators, which could therefore be incorporated into the nomogram (Fig. 2d).

Development and validation of a grade B/C PHLF-predicting nomogram
Based on the multivariate logistic analysis (Table 2), a nomogram integrating T-Bil, PLT, PA, AST, PT and sFLR was developed (Fig. 3a, Table 3). Calibration curves showed good agreement between the predicted and observed probability of grade B/C PHLF in the training (R² = 0.992), internal validation (R² = 0.977) and external validation cohorts (R² = 0.946) (Fig. 3c-e). Further, objective evaluation of the diagnostic confidence interval revealed that the total points of diagnostic errors were concentrated around 175 (95%CI 158-220) in the training cohort, around 170 (95%CI 155-210) in the internal validation cohort and around 176 (95%CI 144-240) in the external validation cohort (Fig. 4a-c). The widths of the confidence intervals were very similar among the three cohorts, and the positions of the concentrated total points were close to the best cutoff value of 169. Total points below this range are considered to predict a low risk of grade B/C PHLF, and total points above this range a high risk. However, when total points fall within this range, the prediction should be interpreted with caution.
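The decision rule described above can be sketched as a small function. It assumes the cutoff of 169 and the training-cohort error range of 158-220 reported in this study; the function name and the returned structure are hypothetical, and the sketch does not compute the total points themselves, which requires the nomogram in Fig. 3a.

```python
CUTOFF = 169                 # best cutoff from the training-cohort ROC curve
ERROR_RANGE = (158, 220)     # 95%CI of diagnostic errors, training cohort


def interpret_total_points(total_points, cutoff=CUTOFF, error_range=ERROR_RANGE):
    """Return (risk, needs_review) for a nomogram total score.

    risk: "high" when total_points >= cutoff, otherwise "low".
    needs_review: True when the score lies inside the range in which
    misjudged cases were concentrated, so the prediction should be
    confirmed by other evaluations.
    """
    risk = "high" if total_points >= cutoff else "low"
    lower, upper = error_range
    needs_review = lower <= total_points <= upper
    return risk, needs_review


if __name__ == "__main__":
    for score in (120, 160, 175, 230):
        print(score, interpret_total_points(score))
```

Returning the review flag separately from the risk label keeps the binary cutoff intact while making the uncertain zone explicit to the clinician.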

Comparison of predictive accuracy for grade B/C PHLF among the nomogram and conventional scores
Compared with conventional scores, the nomogram had greater discriminatory performance for predicting grade B/C PHLF than Child-Pugh grade, MELD, ALBI, PALBI and APRI in the training, internal validation and external validation cohorts (Fig. 4d-f, Table 4 and Supplemental Table 1), and this performance was not significantly influenced by inherent heterogeneity among the cohorts.

Prospective clinical application of the nomogram to predict grade B/C PHLF
To further evaluate its ability to predict grade B/C PHLF in clinical application, the nomogram was applied to predict whether grade B/C PHLF would occur in 120 individual HBV-related HCC patients scheduled to receive hepatectomy at GXMUCH. As a result, we accurately predicted that 85 patients would not have grade B/C PHLF and that 16 patients would; the remaining 19 cases were misjudged, with total points within 165-197 (Supplemental Table 1). All misjudged cases were re-evaluated, and their total points (range 165-197) were fully contained within 158-220. This empirical evaluation of the confidence interval helps to improve the practicability of the nomogram and supports the soundness of the study design. In addition, the predictive performance of the nomogram for judging non-occurrence of grade B/C PHLF was good in clinical practice, with a precision of 0.977, a recall of 0.833, an accuracy of 0.947 and an F1 score of 0.899.
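For readers less familiar with these indicators, the sketch below gives their standard definitions. Because the reported metrics concern the judgment of non-occurrence, non-occurrence is treated as the positive class. The confusion-matrix counts in the example are hypothetical and are not the study's data; only the precision and recall passed to `f1_score` are the values reported above.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1_score(precision, recall),
    }


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(tp=8, fp=2, tn=85, fn=5))
    # F1 recomputed from the precision and recall reported in the text.
    print(round(f1_score(0.977, 0.833), 3))
```

Recomputing F1 from the reported precision (0.977) and recall (0.833) gives 0.899, in agreement with the value stated in the text.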

Discussion
In this study, after eliminating data redundancy and excessive false positives (Fig. 2), a nomogram was developed by integrating five essential preoperative serum laboratory indicators (T-Bil, PLT, PA, AST, PT) and sFLR, drawn from different categories of clinical significance, which together reflect compensated liver function and the percentage of remnant liver after surgery (Table 2). The resulting graphical, easy-to-use tool was applied for individualized prediction of grade B/C PHLF in HBV-HCC patients (Fig. 3a). This nomogram displayed good accuracy in predicting grade B/C PHLF (Fig. 3b) and good agreement between predicted probability and actual observation in the training, internal validation and external validation cohorts (Fig. 3c-e). Besides, this nomogram had greater predictive performance than conventional scores (Fig. 4a-c). Evaluation of the ability of the nomogram and identification of confidence intervals were also conducted. Results from the feedback error diagnosis distribution curves revealed that the total points of erroneous predictions were concentrated around 175 (95%CI 158-220), 170 (95%CI 155-210) and 176 (95%CI 144-240) in the training, internal validation and external validation cohorts, respectively, all very close to the best cutoff value of 169 (Fig. 4a-c). This gives doctors more flexibility when using the diagnostic results. Total points below this range are considered to predict a low risk of grade B/C PHLF, and total points above this range a high risk. When total points fall within this range, the prediction should be interpreted with caution and further confirmed through other evaluations such as ICG-15 retention or computer-assisted volumetry of the residual liver. In this study, clinical use of the nomogram was evaluated in 120 HBV-related HCC patients scheduled to receive hepatectomy.
We found that preoperative application of this nomogram had good predictive performance: it accurately judged that grade B/C PHLF would not occur in 85 patients and would occur in 16 patients, while 19 cases were misjudged, with total points falling within 165-197, all inside the 95%CI of diagnostic errors between 158 and 220 (Supplemental Table 1). The scoring range of the misjudged cases was consistent with the objective evaluation, which supports the soundness of this study and guides further flexible application of this nomogram. In addition, this nomogram performed well in judging non-occurrence of grade B/C PHLF in clinical practice, which was consistent with the analysis of negative predictive values. A growing body of research has confirmed that major hepatectomy and insufficient sFLR are associated with a high risk of PHLF [27,29]. Consistent with these previous findings, sFLR was of great importance for predicting grade B/C PHLF and was adopted as an important predictive indicator in our model [30]. Recently, two radiomics-based nomograms based on portal-phase computed tomography or ultrasound were established to predict PHLF [31,32]; however, they did not consider the influence of sFLR on PHLF or integrate this indicator. Moreover, considering that mild to moderate liver dysfunction after liver surgery is very common, these nomograms remain to be further validated for predicting ISGLS grade B/C PHLF, particularly in HBV-related HCC patients.
This study has some limitations. First, all of the study participants had HBV-related disease, whereas in most Western countries and Japan the majority of HCC cases are related to alcoholic liver disease or HCV. Therefore, further validation in populations with other etiologies is required. Second, the reliability of the nomogram remains to be confirmed by prospective, multicenter validation studies with larger numbers of participants. Moreover, advanced imaging scans and ICG-15 retention for estimating PHLF might be taken into further consideration to improve the diagnostic value.

Conclusions
By integrating five essential preoperative serum laboratory indicators (T-Bil, PLT, PA, AST, PT) and sFLR, which carry different clinical significance and together reflect compensated liver function and the percentage of remnant liver after hepatectomy, a novel nomogram was generated for individualized prediction of ISGLS grade B/C PHLF in HBV-HCC patients. The results of internal and external validation demonstrated that this nomogram had good predictive performance. Prospective clinical application of this nomogram yielded accurate judgment of non-occurrence of ISGLS grade B/C PHLF. It potentially provides an alternative tool for determining which HBV-HCC patients, having a low risk of ISGLS grade B/C PHLF, are appropriate candidates for hepatectomy.
Additional file 1. Supplement Table 1. Correlation analysis of 120 individual HBV-HCC patients' grade B/C PHLF risk prediction data by the nomogram in prospective clinical application.