Risk factors and socio-economic burden in pancreatic ductal adenocarcinoma operation: a machine learning based analysis

Background Surgical resection is the main curative treatment for pancreatic ductal adenocarcinoma (PDAC). However, the operation is complex and carries a high peri-operative risk, so patients are more likely to be admitted to the intensive care unit (ICU). A risk model that predicts ICU admission could therefore help prevent post-operative deterioration and potentially reduce the socio-economic burden. Methods We retrospectively collected 120 clinical features from 1242 PDAC patients, including demographic data, pre-operative and intra-operative blood tests, in-hospital duration, and ICU status. Machine learning pipelines, including Support Vector Machine (SVM), Logistic Regression, and Lasso Regression, were compared to choose an optimal model for predicting ICU admission. Ordinary least-squares regression (OLS) and Lasso Regression were used in the correlation analysis of post-operative bleeding, total in-hospital duration, and discharge costs. Results The SVM model outperformed the other two models, with an AU-ROC of 0.80. Age, duration of operation, monocyte count, and intra-operative partial arterial pressure of oxygen (PaO2) were risk factors for ICU admission. Protective factors included RBC count, analgesic pump dexmedetomidine (DEX), and intra-operative maintenance of DEX. Basophil percentage, duration of the operation, and total infusion volume were risk variables for length of stay in the ICU. Bilirubin, CA125, and pre-operative albumin were associated with post-operative bleeding volume. Operation duration was the most important factor for discharge costs, while pre-operative lymphocyte percentage and absolute count were associated with lower costs. Conclusions Several new indicators, such as DEX, monocyte count, basophil percentage, and intra-operative PaO2, showed good predictive value for the probability of ICU admission and the duration of ICU stay.
This work provides an essential reference for risk assessment in advance of PDAC surgery. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-020-07626-2.


Background
The current 5-year survival rate of PDAC is only 8% [1], the lowest among all common cancers. The incidence of pancreatic cancer in men is increasing year by year [1].
Surgical resection is the main curative treatment. Adding chemotherapy as adjuvant therapy can improve survival (the five-year survival rate is close to 30%) and reduce peri-operative mortality (about 3%) [2]. However, the risks of surgery should not be underestimated: the complication rate is around 50% [3]. The main intra-operative risk is bleeding (5.9%), and the main post-operative complication is pancreatic leakage (13%). These complications may be life-threatening and increase the patient's risk of ICU admission [4,5]. The high cost of pancreatic cancer surgery directly increases the burden on the patient and family. Patients transferred to the ICU also require special monitoring and intensive care. In addition, the number of complications greatly increases in-hospital cost and length of hospitalization [6]. A study of emergency department evaluation of suspected ICU patients found that a prolonged stay in the emergency department increases the risk of death [7].
Identifying risk factors to predict high-risk groups and adjusting surgical procedures accordingly has important economic benefits. Known risk factors include unmodifiable ones such as age (> 55), gender (male), and blood type (non-O type), and modifiable ones such as smoking (> 35 cigarettes/day, > 40 years) and obesity (BMI > 30) [8,9]. Accurate peri-operative risk prediction can help prevent clinical deterioration, reduce the incidence of adverse events, and limit unplanned ICU readmissions, mortality, and a potentially large socio-economic burden. Some patterns in risk prediction have already been reported. For example, patients undergoing high-risk surgery have a lower gastric mucosal pHi before surgery, and intra-operative tachycardia raises peri-operative risk, increasing the likelihood of transfer to the ICU [10,11]. Additional risk factors are still needed to assess the risk of post-operative ICU transfer and the incidence of complications, and thereby to optimize surgical decision-making.
Machine learning (ML) methods have attracted considerable research attention with the development of data storage techniques. ML is an interdisciplinary field that uses computers to simulate human learning and, in particular, studies how algorithms improve their performance through experience. ML can improve accuracy by exploiting the complex interactions between potential risk factors, and it can improve medicine by making better use of "big data" [12]. Studies have shown that machine learning performs significantly better than standard clinical reference tools for real-time prediction of complications in intensive care and for sepsis prediction [13,14]. ML can be applied to clinical data sets to develop robust risk models and redefine patient classes [15].
However, there are few reports, from a real-world and artificial intelligence (AI) perspective, on patients' physiological status before PDAC surgery or on whether factors such as pre-operative and intra-operative status or anesthesia intervention affect post-operative outcomes. Therefore, we collected 120 clinical features from 1242 PDAC patients, including demographic data, pre-operative and intra-operative blood tests, in-hospital duration, and ICU status. After data pre-processing, 39 filtered variables were used for model construction. ML pipelines, including Support Vector Machine (SVM), Logistic Regression, and Lasso Regression, were compared to choose an optimal model for predicting ICU admission. Ordinary least-squares regression (OLS) and Lasso Regression were used in the correlation analysis of post-operative bleeding, total in-hospital duration, and discharge costs. Establishing a peri-operative risk prediction model may help prevent clinical deterioration and reduce the potential socio-economic burden before surgery takes place.

Research participants
We retrospectively selected 1242 PDAC patients from existing databases. All participants signed an informed consent form (except for those who had died). The detailed data collection procedure was approved by the Ethics Committee of Renji Hospital Affiliated to Shanghai Jiaotong University School of Medicine.
All participants underwent a physical examination and routine blood examination; blood gas analysis and medication records during the operation were collected together with clinical and demographic characteristics. Inclusion criteria: patients with pancreatic tumors confirmed by pre-operative imaging, with surgical indications, who underwent radical pancreatectomy. Exclusion criteria: patients who did not undergo pancreatic surgery, or who underwent palliative surgery or pancreatic puncture, and those confirmed after the operation to have non-pancreatic primary tumors (lower bile duct tumors, ampullary tumors, pancreatic metastases, etc.). Criteria for ICU admission partly followed the ICU admission, discharge, and triage guidelines by Joseph L Nates et al. [16]: peri-operative patients with acute respiratory insufficiency, circulatory instability, severe cardiopulmonary comorbidities, major bleeding, and patients needing life-sustaining interventions.

Data pre-processing and feature selection
Our structured database initially contains 120 clinical variables. First, features with more than 40% missing values were excluded. Then, missing values of categorical variables were filled with the mode, and missing values of continuous variables were imputed with a random forest [17]. To reduce the influence of differences in feature ranges on model construction, the continuous data were standardized using the mean and SD. Categorical data were further transformed into binary dummy variables. Finally, 39 variables were retained to build the predictive model for post-operative admission to ICU.
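The pre-processing steps described above can be sketched roughly as follows. This is a minimal illustration on made-up data, not the authors' pipeline: the column names and the `preprocess` helper are hypothetical, and median imputation is substituted for the random-forest imputation used in the study purely to keep the example short.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, categorical: list[str], max_missing: float = 0.40) -> pd.DataFrame:
    """Drop sparse columns, impute, standardize, and dummy-encode (illustrative only)."""
    # 1. Exclude features whose missing fraction exceeds the threshold (40% in the text).
    keep = df.columns[df.isna().mean() <= max_missing]
    df = df[keep].copy()

    cat_cols = [c for c in categorical if c in df.columns]
    num_cols = [c for c in df.columns if c not in cat_cols]

    # 2. Categorical variables: fill missing values with the mode.
    for c in cat_cols:
        df[c] = df[c].fillna(df[c].mode().iloc[0])

    # 3. Continuous variables: the study uses random-forest imputation;
    #    the median is an assumption made here as a simple stand-in.
    for c in num_cols:
        df[c] = df[c].fillna(df[c].median())

    # 4. Standardize continuous variables to zero mean and unit SD.
    df[num_cols] = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std(ddof=0)

    # 5. Expand categorical variables into binary dummy variables.
    return pd.get_dummies(df, columns=cat_cols, drop_first=True, dtype=int)


# Hypothetical toy records; none of these values come from the study.
raw = pd.DataFrame({
    "age": [61.0, 70.0, np.nan, 55.0, 66.0],
    "op_hours": [4.0, 6.5, 5.0, np.nan, 7.0],
    "sex": ["F", "M", "M", None, "M"],
    "sparse_lab": [np.nan, np.nan, np.nan, 1.2, np.nan],  # 80% missing, so it is dropped
})
clean = preprocess(raw, categorical=["sex"])
print(clean.columns.tolist())
```

Standardizing after imputation keeps the imputed values on the same scale as the observed ones; in a real analysis the scaling parameters would be estimated on the training split only.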
The purpose of feature selection is to determine the best subset of features for predicting each outcome variable. We used lasso (L1) regularization to construct the feature subsets.
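To illustrate how lasso regularization yields a feature subset, the sketch below fits a lasso by coordinate descent on synthetic data and keeps the features with non-zero weights. Everything here is an assumption for illustration: the data are random, the penalty `alpha=0.2` is arbitrary, and the squared-error objective may differ from the exact formulation used in the study.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X: np.ndarray, y: np.ndarray, alpha: float, n_iter: int = 200) -> np.ndarray:
    """Minimize (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that excludes feature j's current contribution.
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w


# Synthetic data: only features 0 and 3 influence the outcome.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.standard_normal(200)
y = y - y.mean()

w = lasso_cd(X, y, alpha=0.2)
selected = np.flatnonzero(w)  # indices of features the lasso retained
print(selected, np.round(w, 2))
```

The L1 penalty sets the weights of uninformative features to exactly zero, which is why the non-zero weights can be read directly as the selected subset.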

Model development
We developed three machine learning models: the linear Lasso model [18], Logistic Regression [19], and kernel-based SVM [20]. Each model was trained on the training set using 10-fold cross-validation, and grid search was used to tune the parameters of each algorithm. To quantify discrimination, each model was then evaluated on a held-out test set. For the categorical dependent variable, the evaluation metrics were AU-ROC, sensitivity (recall), specificity, accuracy, log-loss, and precision; for the continuous dependent variables, prediction error plots were used. The factor weights of the linear model were taken as the importance of each factor and, in addition to the performance comparison, we ranked the effect sizes of the factors contributing to the models. Figure 1 shows the study flowchart.
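The training procedure described above (80/20 split, 10-fold cross-validation, grid search) might look like the following scikit-learn sketch. The data are synthetic, with only the sample and feature counts borrowed from the text, and the parameter grid and RBF kernel are assumptions rather than the settings used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 660 samples x 39 features (sizes from the text, values random).
X, y = make_classification(n_samples=660, n_features=39, n_informative=8, random_state=0)

# Hold out 20% of the patients for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}  # hypothetical grid

search = GridSearchCV(
    pipe,
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

# Discrimination on the held-out set, using the refitted best model.
test_auc = roc_auc_score(y_test, search.decision_function(X_test))
print(search.best_params_, round(test_auc, 3))
```

Swapping `SVC` for another estimator with its own grid gives the same comparison protocol for the Lasso and Logistic Regression models.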

Patients and variables
Our development cohort included a total of 1242 PDAC patients, 655 (52.74%) of whom were admitted to the ICU, with a mean ICU time of 16.84 h. Through data pre-processing, 660 patients with 120 complete clinical variables were used as predictive variables (Table 1). ICU admission was considered as an outcome variable to build the predictive model for post-operative admission to ICU. We also built predictive models for ICU hours, bleeding volume, in-hospital duration, and discharge costs.
Fig. 1 Study flowchart. 1242 patients were recruited in the current study. Through data pre-processing, 660 patients with 120 complete clinical variables were used as predictive variables. The data were pre-processed and randomly divided into a training set (80%) and a validation set (20%). In the training set, k-fold cross-validation (k = 10) is used, and various parameter combinations are exhausted by grid search

Validation of training set for post-operative evaluation of ICU
The average ROC and PR curves of the three models for predicting ICU admission, i.e., SVM, Lasso, and LR, are shown in Fig. 2a and 2b. All models have AUC values above 0.75, and SVM has the highest (0.80). We use the AP value as the criterion for the PR curve; the APs of the SVM and Lasso models are both above 0.80. The confusion matrix (rounded) was also calculated for these models (Table 2). SVM produced the fewest FN (4), and LR produced the fewest FP (18). Table 3 shows the AUC, sensitivity, specificity, accuracy, log-loss, FP rate, precision, AP, and F1 of each model. Performance differed between the models, but all performed well, with an accuracy of at least 0.75. SVM obtained the highest AUC (0.80), with an accuracy of 0.81. Lasso had an AUC of 0.77 and an accuracy of 0.77. LR obtained the lowest AUC (0.76), with an accuracy of 0.75. SVM also had the best sensitivity, which makes it suitable for predicting post-operative ICU admission in patients with PDAC. The sensitivity of all three models exceeded 0.80, and the specificity exceeded 0.60. SVM performed best in FP rate and precision.
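For reference, the count-based metrics reported above follow directly from the four cells of a confusion matrix, as in the sketch below. The counts in the example are hypothetical and are not taken from Table 2; AUC, AP, and log-loss are omitted because they require predicted scores rather than counts.

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Derive count-based metrics from the four cells of a binary confusion matrix."""
    sensitivity = tp / (tp + fn)   # recall: admitted patients correctly flagged
    specificity = tn / (tn + fp)   # non-admitted patients correctly cleared
    precision = tp / (tp + fp)     # flagged patients who were truly admitted
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fp_rate": fp / (fp + tn),  # equals 1 - specificity
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts for illustration only.
m = classification_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because a missed ICU admission (FN) is usually costlier than a false alarm (FP), sensitivity is a natural metric to prioritize when comparing the models.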
Feature importance was calculated by the sum of the decrease in error when split by a variable, reflecting each variable's contribution to ICU admissions. The important features of the predictive model for post-operative admission to ICU, expressed as effect sizes, are shown in Fig. 2c. Features such as age, duration of operation, monocyte count, intra-operative PaO2, and pre-operative mean hemoglobin concentration are risk factors. Protective factors include RBC count, analgesic pump DEX, intra-operative DEX, crystal weight, and pre-operative blood gas Cl (Fig. 2c and Supplementary Figure 1).

Predictive model for post-operative evaluation of ICU hours and intra-operative bleeding volume
The basophil percentage was the most important risk variable for ICU hours, followed by the duration of the operation and total infusion volume. Protective factors included analgesic pump DEX, HCO3, and lymphocyte percentage. Higher values of pre-operative direct bilirubin, CA125, and actual remaining base were associated with greater intra-operative bleeding volume, whereas pre-operative total bilirubin, female sex, and pre-operative albumin were associated with lower bleeding volume (Fig. 3).

Predictive model for evaluation of in-hospital duration and discharge costs
The risk factors for in-hospital duration were age, pre-operation urine output, and operation duration. The protective factors were pre-operative lymphocyte absolute value, pre-operative mean platelet volume, and SBE. The operation duration was the most important risk variable for discharge costs, followed by peri-operative urine output, age, and total infusion volume. The protective factors for discharge costs include lymphocyte percentage, pre-operative lymphocyte absolute value, and midazolam (Fig. 4).
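One way to read risk and protective factors off a linear model for a continuous outcome is to fit it on standardized predictors and inspect the sign and magnitude of each coefficient. The sketch below does this with ordinary least squares on synthetic data; the variable names, effect sizes, and outcome are invented for illustration and are not results from the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
names = ["operation_duration", "lymphocyte_pct", "unrelated"]  # hypothetical predictors

# Standard-normal predictors; the synthetic "cost" rises with the first
# predictor and falls with the second, by construction.
X = rng.standard_normal((n, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
coef = dict(zip(names, beta[1:]))

# Positive coefficients act as risk factors, negative ones as protective factors;
# the absolute value gives the ranking by effect size.
ranked = sorted(coef, key=lambda k: abs(coef[k]), reverse=True)
for k in ranked:
    role = "risk" if coef[k] > 0 else "protective"
    print(f"{k}: {coef[k]:+.2f} ({role})")
```

Coefficients are only comparable across predictors when the predictors share a scale, which is one reason to standardize continuous variables before fitting.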

Discussion
In our study, we developed and compared machine learning models to predict ICU admission, bleeding volume, in-hospital duration, and discharge costs, using data from 1242 patients who underwent PDAC surgery and 120 recorded pre-operative, intra-operative, and post-operative variables. Logistic regression is a generalized linear model that maps a linear combination of predictors to a probability through the sigmoid function, which makes it well suited to classification problems. Yihe Wu et al. used logistic regression to analyze risk factors significantly associated with postoperative pulmonary complications (PPCs) in patients undergoing minimally invasive lobectomy [21]. The results showed that both restrictive and liberal intraoperative fluid administration were related to adverse effects on postoperative outcomes. Lasso is a linear regression method with L1 regularization, which drives some of the learned feature weights to exactly zero and thereby achieves sparsity and feature selection. Tadahiro Goto et al. applied regularization methods such as lasso and ridge to predict the disposition of asthma and COPD exacerbations in the ED, in order to avoid overfitting when machine learning models capture complex relationships [22]. Training an SVM amounts to solving a convex quadratic programming problem. Abeg Kumar Jaiswal et al. used an SVM classification algorithm for automatic EEG seizure detection, obtaining good prediction results and showing the potential of SVM in other prediction fields [23]. In our ICU admission prediction test, the SVM model outperformed the other models, with an AU-ROC of 0.80. Feras Hawari et al.'s study, which identified smoking status and having received chemotherapy as potentially relevant, showed that factors such as age, duration of operation, and various complications could determine admission to an ICU [28]. This result is consistent with our study. In addition, we found that other factors, such as PaO2 level, monocyte count, and DEX, affect ICU admission.
The PaO2 of the radial artery was monitored routinely in our study. Because patients breathe pure oxygen (inspired oxygen fraction of 100%) during the operation, normal PaO2/FiO2 values are usually above 300 mmHg, according to ALI and ARDS diagnosis guidelines [29]. However, we found that intra-operative PaO2 was a risk factor for post-operative ICU admission. Direct and indirect adverse effects of oxygen in the perioperative period have been reported [30]. Studies have shown a higher death rate in the high-flow oxygen group than in the titrated oxygen group among COPD patients [31]. Similarly, it has been reported that breathing 100% oxygen for 3 to 6 h after 15 min of cerebral ischemia significantly increased 14-day mortality, to three-fold that of the air group [32].
Meanwhile, giving high-flow oxygen to post-operative patients, especially high-risk patients, does not prevent reintubation or post-extubation respiratory failure [33]. Studies also indicate that hyperoxia can cause acute lung injury and impair lung function [34,35], and it has been considered an independent factor associated with in-hospital mortality [36]. Hyperoxia-induced injury mainly involves oxidative stress, which may activate necroptosis. Hyperoxia can disturb the balance between pro-inflammatory and anti-inflammatory factors in the lungs in various ways, leading to the release of inflammatory factors and lung damage; it can affect the activation of NF-κB in macrophages and aggravate the initiation of the inflammatory response; in addition, it can increase the production of ROS, which induces lung damage. Combined with the duration of operation, it is easy to understand why a higher PaO2 increases the risk of ICU admission. Our results suggest that PaO2 during PDAC surgery should be kept at a relatively low level to reduce the incidence of ICU admission. These findings suggest that a high flow of oxygen or a high PaO2 during surgery is not beneficial to patients but instead impairs lung function, especially in patients with pre-operative pulmonary insufficiency such as COPD, increasing the likelihood of failed post-operative extubation and the risk of ICU admission. A well-designed clinical trial is therefore needed to validate this hypothesis.
Monocyte count is a risk factor in predicting ICU admission. Previous studies on the relationship between monocytes and PDAC suggest that a high monocyte count in pancreatic cancer patients usually indicates shorter survival and poor prognosis [37][38][39], and it can serve as an independent factor for predicting survival after resection of pancreatic cancer [37].
In particular, monocytes appear to play an important role in determining patient outcomes following surgery [40]. Although the prognosis and survival of PDAC patients were not addressed in this part of our study, monocytes play an important role in tumor proliferation and metastasis and are also associated with tumor-induced systemic inflammatory responses. We hypothesize that an elevated monocyte count indicates that a systemic inflammatory response is already under way. After major surgical stimulation, exacerbation of this inflammatory response may damage vital organs such as the heart, lungs, and brain. However, a characterization of the early post-operative immune response in ICU patients, with a causal link to later post-operative infections, is lacking in the current literature.
Dexmedetomidine is a highly selective α2-adrenergic receptor agonist that provides sedative and analgesic effects without respiratory depression. Studies have found that DEX can inhibit the inflammatory response. In human studies, DEX reduced the release of the serum inflammatory markers CRP, TNF-α, IL-6, and IL-1β, indicating a strong anti-inflammatory effect [41]. Further studies confirm that dexmedetomidine may provide lung protection through various pathways, such as attenuating pulmonary ischemia-reperfusion injury through the PI3K/Akt/HIF-1α signaling pathway [42], protecting lung tissue by modulating immune responses [43], and protecting against hyperoxia-induced lung injury by attenuating ROS [44]. In clinical practice, DEX is widely used for sedation in the ICU. DEX can provide safe and effective sedation, facilitate extubation, and reduce delirium, atrial fibrillation, and renal and myocardial injury [45]. DEX has also been reported as necessary among preadmission interventions to prevent post-operative complications in older cardiac surgery patients [46]. In our study, analgesic pump DEX and intra-operative DEX were protective factors for ICU admission, probably because of an anti-inflammatory effect that counteracts the high monocyte count and a protective effect against injury induced by high PaO2.
ICU length of stay (LOS) is a frequent measure of ICU resource use and performance. Predictions of ICU LOS are routinely used for resource allocation. However, the accuracy of ICU LOS predictions made by clinicians has been poorly evaluated. Studies have reported that variables such as post-operative monitoring, systolic arterial pressure, creatinine level, invasive mechanical ventilation, and active infection are associated with ICU LOS [47][48][49][50]. In a study of ICU length of stay, Huang performed two types of analyses: a single-factor correlation analysis found that many changes in the blood cell indexes of hospitalized patients were related to in-hospital mortality, including the ratio of monocytes to basophils; in the multifactorial regression, basophils, leukocytes, and MCHC were independent factors associated with in-hospital mortality [51]. In our machine learning analysis, only basophil percentage emerged as a potential risk variable for post-operative ICU LOS; it has been reported as an independent factor associated with in-hospital mortality.
In addition, we analyzed potential factors for intra-operative bleeding volume and in-hospital duration. We found that in-hospital duration was related to pre-operative urine volume and lymphocyte absolute value. Bilirubin, CA125, and pre-operative albumin were associated with bleeding volume, which has rarely been reported in previous studies. The location of the pancreatic tumor often determines its clinical presentation, such as elevated direct bilirubin. In our study, elevated direct bilirubin was associated with increased intra-operative bleeding, most likely because tumor compression makes surgery more difficult. However, more research is needed to explain the increase in surgical bleeding associated with CA125 and the finding that pre-operative total bilirubin was associated with reduced surgical bleeding.
In our study, age, operative time, PaO2, and monocyte count were found to be risk factors for ICU admission in the model predicting ICU admission; in the model predicting ICU stay, basophil percentage, duration of operation, and total infusion volume were found to be risk factors for increased ICU stay; in the model predicting ICU stay, age, operative urine volume, and direct pre-operative bilirubin were found to be risk factors for increased ICU stay; in the model predicting discharge costs, we found operative time, urine volume, age, and total infusion volume to be major risk factors.
Among these predictive models and factors, apart from age as a recognized risk factor, the other factors and models are still potentially linked. For example, "direct bilirubin before surgery" reflects the degree of obstruction of the biliary tract system by the pancreatic tumor, which directly affects the difficulty of surgery and prolongs the operating time, increasing the probability of ICU admission, the length of ICU stay, and medical expenses. A prolonged operation also increases intra-operative urine volume and total infusion volume, which is reflected in the other predictive models. In addition, a prolonged operation lengthens the exposure to high PaO2, damaging vital organs and increasing the risk of ICU admission.
There are no standard criteria for ICU admission across regions and hospitals. In our study, the criteria were based on the clinical experience of the hospital, in addition to the judgment of the surgeon and anesthesiologist. The high-risk factors identified in our predictive models, in addition to alerting and assisting surgeons and anesthesiologists in clinical decisions, can also provide data support for future ICU admission criteria for pancreatic cancer patients. The ultimate goal is to provide effective advice and standards for ICU admission and to rationalize the allocation of medical resources through continued expansion of data volume and enrichment of clinical disease types, which was also the purpose of Nates' study, published in 2016 in the journal Critical Care Medicine [16].
For patients with PDAC, survival, morbidity, and sequelae are significant and necessary outcome indicators. Since many patients in our database were operated on from 2018 to 2019, the best observation period for long-term results (such as 3-year survival) has not yet been reached, so these outcomes were not analyzed in this study. Patient outcomes are also of great interest to us, and we will analyze them in a follow-up study.

Conclusions
In conclusion, we developed a machine learning model to predict ICU admission in this study. The model has value for reducing patients' financial burden and provides new clinical insights for improving peri-operative management of PDAC patients.