Advanced gastrointestinal stromal tumor: reliable classification of imatinib plasma trough concentration via machine learning

Aim Patients with advanced gastrointestinal stromal tumors (GISTs) whose imatinib plasma trough concentration (IM Cmin) is below 1100 ng/mL may show a reduced drug response rate, which has led to the suggestion that IM Cmin be monitored. The objective of this research was therefore to create a customized IM Cmin classification model for patients with advanced GISTs from China.
Methods Initial data and laboratory indicators from patients with advanced GISTs were gathered and divided into a training set, validation set, and test set in a 6:2:2 ratio. Key variables associated with IM Cmin were identified using least absolute shrinkage and selection operator (LASSO) regression and forward stepwise binary logistic regression, and were then used to construct the classification model. Within the training and validation sets, nine machine learning (ML) classification models were constructed via the resampling method and compared using the Brier score, the area under the receiver-operating characteristic curve (AUROC), the decision curve, and the area under the precision-recall curve (AUPR) to determine the most suitable model for this dataset. Two methods of internal validation were used to assess the classification performance of the most suitable model: tenfold cross-validation and random split-sample validation (test set), with the test set AUROC used as the performance measure.
Results Six key variables (gender, daily IM dose, metastatic site, red blood cell count, platelet count, and percentage of neutrophils) were ultimately selected to construct the classification model. In the validation set, the Extreme Gradient Boosting (XGBoost) model had the largest AUROC, the lowest Brier score, the largest area under the decision curve, and the largest AUPR. It also performed well in the test set on internal validation (AUROC = 0.725).
Conclusion For patients with advanced GISTs who receive IM, initial data and laboratory indicators could be used to accurately estimate whether the IM Cmin is below 1100 ng/mL. The XGBoost model may help clinicians guide the administration of IM.


Introduction
Gastrointestinal stromal tumors (GISTs) are the most common mesenchymal tumors of the digestive tract [1]. Gain-of-function mutations in the tyrosine-protein kinase growth factor receptor proto-oncogene (KIT) and the platelet-derived growth factor receptor alpha gene (PDGFRA) lead to increased tyrosine kinase activity, which is considered a key factor in the pathogenesis of GIST [1][2][3]. Imatinib (IM), a tyrosine kinase inhibitor (TKI), blocks KIT receptor activity, thereby inhibiting proliferation and promoting apoptosis of GIST cells [4][5][6], and has become the conventional first-line therapy for patients with advanced GISTs [4]. The IM plasma trough concentration (Cmin) is therefore closely linked to the effectiveness of treatment [7].
According to a prior study by Demetri et al. [8], patients with advanced GISTs whose IM Cmin was below 1100 ng/mL showed a shorter time to progression. Marked inter-individual variability in IM pharmacokinetics has also been observed [9][10][11], leading to the suggestion that IM Cmin be monitored [12]. However, because of limited healthcare resources, some hospitals lack a therapeutic drug monitoring (TDM) platform, which makes sampling and testing for IM TDM costly in both time and money. There is thus a need for concentration classification tools that are more convenient than TDM, such as rapidly developing machine learning (ML) methods [13,14], which can give clinicians a reference for clinical decisions and reduce the cost in time and money for patients.
ML plays an important role in data analysis: it supports data-driven estimation from multiple variables and can capture non-linear relations between variables to construct a model with high classification performance [15,16]. This study therefore aimed to streamline IM Cmin monitoring using an ML model based on patients' initial data (demographic, treatment, and clinical information) and laboratory indicators.

Patients and data
Demographic information of patients with advanced GISTs who were followed up at the First Affiliated Hospital of Chongqing Medical University (CMU) between January 2000 and August 2023 was gathered retrospectively. IM Cmin data, treatment information, clinical information, and laboratory indicators were collected from the same patients from April 2017 to August 2023. Our team generally recommends that patients with advanced GISTs attend the GIST specialist clinic for follow-up approximately every 3 or 6 months for an abdominal ultrasound or CT examination, so that the tumor can be assessed and IM Cmin monitored at the same visit. Notably, blood samples for routine blood, liver, and kidney function examinations were collected on the same day as the venous blood samples used to determine IM Cmin. The inclusion criteria were as follows: (1) verification of GIST through biopsy or postoperative pathology, (2) age over 18 years, (3) good medication adherence with IM, (4) less than 8% missing data, and (5) IM treatment for ≥ 1 month. The exclusion criteria were as follows: (1) patients with GISTs who had undergone complete tumor resection and had no recurrence of the tumor at the end of follow-up, (2) history or existence of other malignancies, (3) patients with missing IM Cmin data, and (4) patients lost to follow-up. The case screening flowchart and the schematic representation of the study design are displayed in Fig. 1.
We established a database called "Weinichangzai", which included the initial data for each patient. All patients' initial data were obtained through the GIST specialist outpatient clinic, telephone calls, WeChat, and other communication tools. Initial data included demographic information (age at diagnosis, age at blood sampling, gender, and residence); treatment information (surgical procedure (1, gastrectomy; 2, non-gastric operation) [17,18] and daily IM dose); and clinical information (expression of DOG-1/CD117/CD34, metastatic site (1, liver; 2, non-liver) [17,18], and primary tumor site).

Determination of IM Cmin
All patients with advanced GISTs were advised to take IM at lunchtime, and a 3 mL venous blood sample was collected in a heparinized vial on the following day, before the next IM dose (generally 24 ± 3 h after the previous dose). Blood samples were centrifuged at room temperature for 5 min at 3000 × g. A protocol for determining IM Cmin was established according to the methods described by Tan et al. [19] and Roth et al. [20]. Plasma samples were pretreated by protein precipitation. We added 100 μL of acetonitrile and 50 μL of perchloric acid (50%) successively to plasma (0.5 mL), vortexed the solution for 20 s, and centrifuged it at 10,800 rpm for 15 min. We added 50 μL of neutralizing solution (1.4 g potassium carbonate and 0.65 g potassium chloride dissolved in 5 mL of purified water) to 400 μL of the supernatant, and the mixture was vortexed thoroughly and then kept at 4 °C for 30 min. Thirty microliters of the supernatant was injected into a high-performance liquid chromatography system. The lower limit of quantification was set at 50 ng/mL.

Method of feature selection
We collected 26 candidate variables from demographic information, treatment information, clinical information, and laboratory indicators. To obtain the best predictive performance, variable selection was performed on the 26 candidates using LASSO (least absolute shrinkage and selection operator) regression with tenfold cross-validation, which shrinks the variable coefficients to prevent overfitting and to address severe collinearity [21,22]. LASSO regression analyses were performed using "Extreme Smart Analysis" (www.xsmartanalysis.com). To further control the influence of confounding factors, the variables selected by LASSO regression were analyzed by forward stepwise binary logistic regression (LR) to obtain the key variables. LR analyses were performed using SPSS version 27.0 (IBM Corp, Armonk, NY, USA).
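The shrinkage behaviour of LASSO described above can be illustrated with a minimal sketch. This is not the "Extreme Smart Analysis" implementation, which is not described in the text; it is a plain cyclic coordinate descent for a least-squares LASSO, and the function names, the toy data, and the penalty values are assumptions made for illustration only.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n) * sum((y - X b)^2) + lam * sum(|b|) by cyclic coordinate descent.

    X is a list of rows with centred features and y a list of centred targets
    (hypothetical inputs; no intercept is fitted in this sketch).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # sum of squares of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            if z == 0.0:
                continue  # a constant (all-zero) feature carries no information
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta


# Toy data: the first feature has a strong effect (2.0), the second a weak one (0.1).
X_toy = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y_toy = [2.1, 1.9, -1.9, -2.1]
beta_penalised = lasso_coordinate_descent(X_toy, y_toy, lam=0.5)
beta_unpenalised = lasso_coordinate_descent(X_toy, y_toy, lam=0.0)
```

With the penalty switched on, the weak coefficient is set exactly to zero while the strong one is only shrunk, which is the mechanism by which LASSO reduces a list of candidate variables. In practice the penalty would be chosen by cross-validation, as in the study.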

Selection method of ML algorithm
In this study, we randomly divided the dataset into three sets: the training set (60%) and validation set (20%) for ML model development, and the test set (20%) for performance evaluation. The success of the randomization was checked by comparing baseline characteristics between the groups. Nine types of ML algorithms were used to construct the classification models: Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Random Forest (RF), Gaussian Naive Bayes (GNB), Complement Naive Bayes (CNB), Multilayer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbour (KNN), and Adaptive Boosting (AdaBoost). All analyses were performed using "Extreme Smart Analysis", which can also select the best-performing hyper-parameters using the grid-search method.
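A random 6:2:2 split of this kind can be sketched as follows. The text does not state whether the split was made per measurement or per patient, so the sketch below simply shuffles record indices; the function name, the seed, and the use of 890 records (the number of IM Cmin measurements reported in the results) are assumptions for illustration.

```python
import random


def split_indices(n_samples, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle record indices and cut them into training, validation, and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = round(n_samples * fractions[0])
    n_val = round(n_samples * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(890)
```

Because one patient can contribute several measurements, a split by patient identifier rather than by record would avoid the same patient appearing in more than one set; whether that was done here is not stated.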
Within the training and validation sets, the nine ML classification models were constructed via the resampling method and compared using the Brier score, the area under the receiver-operating characteristic curve (AUROC), the decision curve, and the area under the precision-recall curve (AUPR), all of which are important indicators for evaluating classification models, to determine the most suitable model for this dataset. Two methods of internal validation were used to assess the classification performance of the most suitable model: tenfold cross-validation and random split-sample validation (test set). To explain the model predictions, we used the Shapley Additive Explanations (SHAP) module of "Extreme Smart Analysis" to calculate the Shapley values for the test set. SHAP values are based on Shapley values from cooperative game theory and are used to explain the output of a machine-learning model [23].
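Two of the comparison metrics named above have short definitions that can be written out directly. The sketch below computes the Brier score as the mean squared difference between predicted probability and observed label, and the AUROC as the proportion of positive-negative pairs that the model ranks correctly (ties counted as half). The function names and the four-record example are hypothetical and are not taken from the study data.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_score):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = Cmin below the cut-off) and predicted probabilities.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]
example_brier = brier_score(labels, probs)
example_auroc = auroc(labels, probs)
```

A lower Brier score indicates better calibration, and a higher AUROC indicates better discrimination, which is how the two metrics are read in the model comparison.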

Statistical analysis
Continuous variables with a non-normal distribution are described using the median and interquartile range (IQR), and categorical variables are presented as frequencies (percentages). The Mann-Whitney U-test was used to assess differences in non-normally distributed continuous variables between the training, validation, and test sets. Categorical variables were compared between the training, validation, and test sets using the Pearson chi-square test or Fisher's exact test. Statistical significance was set at p < 0.05, and all p values were two-tailed. All analyses were performed using SPSS version 27.0 (IBM Corp, Armonk, NY, USA).

Baseline characteristics
In total, 212 patients with advanced GISTs met the inclusion and exclusion criteria, and 890 IM Cmin measurements were collected from them. Missing data were imputed using the RF algorithm [24]. The mean value of IM Cmin, the label variable, was 1469.59 ng/mL, with a standard deviation (SD) of 755.71 ng/mL. In this dataset, 31.24% of IM Cmin values were < 1100 ng/mL. More than half of the records were from males (59.10%). The mean age at diagnosis was 56 years, and 18.54% of this dataset underwent gastrectomy. The daily IM dose was 400 mg/day in 76.29% of this dataset. The comparison of baseline characteristics between the test set (20%) and the training and validation sets (80%) is shown in Table 1; there were no statistically significant differences in the variables between the two groups (p > 0.05). The comparison of baseline characteristics between "IM Cmin ≤ 1100 ng/mL" and "IM Cmin > 1100 ng/mL" is shown in Table 2. Significant differences were observed between the groups in age at diagnosis, age at blood sampling, gender, daily IM dose, metastatic site, NEU%, RBC, HB, LYM%, ALT, TBIL, IBIL, GGT, Cr, BUN, and eGFR (p < 0.05).

Key variables
In the training and validation sets, the 26 candidates underwent LASSO regression analysis with tenfold cross-validation (Fig. 2A and B). The optimal parameter λ, which gave the smallest mean square error (λ = 0.018), reduced the 26 candidates to 9 feature variables: daily IM dose, metastatic site, gender, PLT, NEU%, RBC, HB, LYM%, and age at diagnosis. To address potential confounding factors, binary LR with the forward stepwise method was used to analyze these 9 feature variables. Finally, only daily IM dose, metastatic site, gender, PLT, NEU%, and RBC were determined to be key variables (p < 0.05), as shown in Table 3.

The best model building
Following the identification of these six key variables, XGBoost, LightGBM, RF, GNB, CNB, MLP, SVM, KNN, and AdaBoost were trained using the resampling method with 10 resamples. As shown in Fig. 3A and B, RF and KNN (ranked according to AUROC) had the best performance in the training set, but XGBoost had the largest AUROC and the smallest SD in the validation set, indicating that this model was the most stable. When the Brier scores of the nine ML models were compared, that of XGBoost was the lowest, indicating that its prediction calibration was the best (Brier score = 0.193, Fig. 3C). The XGBoost model also had the largest area under the decision curve, indicating better clinical utility than the other models (Fig. 3D). The PR curve is sensitive to data imbalance and changes dramatically as the ratio of positive to negative samples changes [25]. The larger the AUPR, the higher the average precision of the model. Although RF and KNN had the largest AUPR in the training set (Fig. 3E), the XGBoost model had the largest area under the PR curve in the validation set (AUPR = 0.842) (Fig. 3F). Based on these results, the XGBoost model may be the optimal choice for this dataset, rather than the RF and KNN models, which may overfit the data.
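The decision curve used in this comparison plots net benefit against the threshold probability. As a minimal sketch, assuming the standard definition of net benefit (true positives per record minus false positives per record weighted by the odds of the threshold), the quantity can be computed as below. The function names and the four-record example are hypothetical and do not reproduce Fig. 3D.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every record whose predicted probability >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: act on every record regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical labels (1 = Cmin below the cut-off) and predicted probabilities.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]
nb_model = net_benefit(labels, probs, threshold=0.3)
nb_all = net_benefit_treat_all(labels, threshold=0.3)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all reference and zero (the treat-none strategy); evaluating the function over a grid of thresholds gives the decision curve.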

The best model evaluation
The XGBoost algorithm was applied to the dataset with tenfold cross-validation. The average AUROC was 0.881 in the training set (0.873-0.890; Fig. 4A) and 0.699 in the validation set (0.614-0.782; Fig. 4B), and the AUROC in the test set was 0.725 (Fig. 4C). If the validation set's AUROC is lower than the test set's, the model fitting can be considered successful, indicating that the model generalizes well [26]. As shown in Fig. 4D, the learning curve revealed that the training and validation sets were well fitted and stable [26][27][28]. These results suggest that the XGBoost algorithm can be used for classification modeling on this dataset.
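The index bookkeeping behind tenfold cross-validation can be sketched as follows: the records are shuffled once, divided into ten folds, and each fold is held out in turn while the remaining nine are used for fitting. The function name, the seed, and the record count in the example are assumptions; the platform's own fold assignment is not described in the text.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, held_out) index lists, holding each fold out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one record.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, held_out


# Example with a hypothetical 25 records.
splits = list(k_fold_indices(25, k=10))
```

The average of the per-fold AUROC values, together with their spread, corresponds to the kind of summary reported for the training and validation sets above.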
SHAP analysis was applied to the entire test set to explain visually the impact of the six key variables on the XGBoost model. In the SHAP summary plot of the XGBoost model, the color represents the value of the variable (red for higher values, blue for lower values); positive SHAP values increase the predicted class probability, while negative SHAP values reduce it (Fig. 5A). The bar chart shows the average absolute impact of each variable on the model output, ranked in descending order of importance (Fig. 5B).
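The game-theoretic idea behind SHAP can be shown with a small exact computation. The sketch below enumerates every coalition of features for a toy prediction function, replacing "absent" features with a baseline value; this is a simplification and is not the tree-based algorithm used for XGBoost models in practice. The function names, the toy model, and the input values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline input."""
    n = len(x)

    def value(present):
        # Features outside the coalition are set to their baseline value.
        mixed = [x[i] if i in present else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_model(v):
    """Hypothetical model: one additive term plus one interaction term."""
    return 2.0 * v[0] + v[1] * v[2]


phi = shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

The contributions sum to the difference between the prediction for the record and the prediction for the baseline, and the interaction term is shared equally between the two features involved. Exact enumeration grows exponentially with the number of features, which is why approximate or model-specific algorithms are used for larger models.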

Discussion
Demetri et al. [8] reported in 2009 that patients with advanced GIST and IM Cmin < 1100 ng/mL had a shorter progression-free time. Although the optimal cut-off value for IM Cmin remains controversial, "1100 ng/mL" has become a common reference value for monitoring IM Cmin in outpatients in clinical practice. In this study, we therefore used 1100 ng/mL as the cut-off value and converted IM Cmin, a continuous variable, into a binary variable. We compared nine common ML algorithms and selected the optimal ML model using AUROC, decision curve analysis (DCA), the Brier score, and AUPR. The XGBoost model was selected as the best model, was internally validated, and showed good classification performance.
The relationships between the label variable (IM Cmin) and the feature variables were assessed using LASSO regression and LR. Six key variables (daily IM dose, metastatic site, gender, PLT, NEU%, and RBC) were screened out; these are easy to obtain and were the key variables used to construct the XGBoost model in this study. Interestingly, except for PLT, these key variables also reached statistical significance in Table 2. Some studies have reported that IM is mainly metabolized in the liver [29,30]. Before our data analysis, we therefore expected laboratory indicators related to liver function to be key features in constructing the classification model. To our surprise, however, the features that were finally selected did not include any laboratory indicators related to liver function. A possible reason is that our outcome variable is binary, whereas the outcome variable in the previous studies was continuous. This difference may have led to the exclusion of liver function indicators in the final selection of variables. IM Cmin was shown to be higher in females than in males in several studies, and researchers attributed the difference to differences in body weight or medication adherence between genders [31,32]. The existence of liver metastases may alter and increase exposure to IM, which may result in a higher IM Cmin [33]. A previous study (Eechoute, 2012) found that IM clearance was expected to decrease by 3.8% for every 100 cm³ increase in liver metastatic volume [29]. Previous studies have reported the relationship between daily IM dose and IM Cmin [17,18,34], and TDM for IM provides a reference for adjusting the IM dosage, which adds to the utility of TDM in the management of patients with GISTs [12]. It is worth noting that no foreign studies have previously reported an effect of RBC on imatinib clearance, but a recent domestic study confirmed that RBC had a significant effect on the clearance of IM [35], which may be due to ethnic differences between domestic and foreign study populations. Thrombocytopenia and neutropenia are common side effects of IM-targeted therapy [36], which may be why PLT and NEU% are key variables in constructing the model.
IM is an anti-cancer drug administered primarily to outpatients, and blood samples are not always available at the end of the administration interval. IM Cmin is the most widely used pharmacokinetic proxy for predicting clinical outcomes [7,37], and Cmin is naturally the focus of TDM [38]. TDM for IM may reassure patients and physicians that exposure to the drug is adequate and may improve long-term adherence to this chronic treatment, which makes it a promising approach for fine-tuning the IM dosage for better tolerability and optimal clinical outcomes in patients with GISTs [7,37]. It is widely known that a high IM Cmin increases the risk of adverse effects and toxicity, which can reduce medication adherence and quality of life. It is therefore crucial for patients with GISTs to undergo TDM of IM frequently [34]. However, most hospitals are unable to monitor IM Cmin because they do not have the necessary equipment, which makes an IM Cmin classification model valuable for clinical application.
Precision therapy is a primary application of ML, offering patients customized medical services including individualized dosage modification, plasma concentration prediction, and prediction of adverse drug reactions [13,39,40]. In clinical practice, 1100 ng/mL is often used as the reference value, combined with the patient's drug tolerance and changes in tumor lesions on CT, to evaluate drug efficacy and adjust the drug dosage [41]. For example, patients whose IM Cmin is predicted by the XGBoost model to be less than 1100 ng/mL, and who show tumor progression defined by imaging and/or symptoms, could be encouraged to increase their doses appropriately. By the same token, patients whose IM Cmin is predicted by the XGBoost model to be greater than 1100 ng/mL, and who experience serious adverse drug reactions, could be encouraged to reduce their doses appropriately. These two examples show that using machine-learning methods to estimate blood drug concentrations could help hospitals without a TDM platform reduce their healthcare burden. For hospitals with a TDM platform, the ML model is better suited to streamlining IM Cmin monitoring than to completely replacing TDM.
A model developed by Gotta in 2012 showed that the Bayesian MAP-ρ method, which considered the correlation between pharmacokinetic parameters, could predict IM C min with an unbiased accuracy of ± 30.7% [42].
The differences between this study and the above study lie mainly in the study population, study design, and study variables. First, IM Cmin measured in the adjuvant setting was excluded. Second, the classification model includes six feature variables that are easily accessible during usual treatment, which should make the model easier to generalize and apply. Finally, to our knowledge, this is the first study to develop and internally validate a classification model for IM Cmin with high predictive performance, which, combined with the study by Demetri et al. [8], may aid prognostic prediction in patients with advanced GISTs. In the future, we therefore plan to build an easy-to-use web application based on the presented XGBoost classification model, which could then be used as a real-time clinical decision support tool through self-learning and optimization and aid personalized IM dose adjustment.
Although the new model has good predictive performance, this study has some considerable limitations. First, the limited number of samples available may reduce the performance of the XGBoost model. Second, as a retrospective, single-center study with an extended duration, it faces all the constraints typical of retrospective studies. For instance, the lack of pharmacokinetic parameters and body surface area data, incomplete laboratory indicators, and fluctuations in blood collection time points may all affect IM Cmin. In this study, classification of IM Cmin is therefore the next best thing to prediction of the specific, continuous value. For this reason, our current model is more of a reference than a complete replacement for TDM. Third, while our classification model has been internally validated, additional prospective validation should be performed in future studies, or a wholly external dataset should be employed for external validation, to improve the generalization ability of this model. Finally, several reports suggest effects of polymorphisms on exposure and of drug-drug interactions via CYP3A [30, 43-45], so changes in Cmin estimation could be suspected, but those indicators were not included in this research. In future work, we will try to address these deficiencies and establish a new model whose outcome variable is a continuous value, to help hospitals without a TDM platform reduce their healthcare burden, or even to replace TDM.

Conclusion
We developed and validated ML models for the individualized classification of IM Cmin in patients with advanced GISTs from China, using readily available baseline information and laboratory indicators. The XGBoost model showed good classification performance and may have good clinical application value.

Fig. 1 The case screening flowchart and the schematic representation of the study design. A Data were gathered from the "Weinichangzai" database of the First Affiliated Hospital of CMU, and the variables included demographic information, treatment information, clinical information, and laboratory indicators. In total, 26 feature variables were collected, and 6 key variables related to IM Cmin were screened using LASSO regression and binary logistic regression. The 6 key variables were then used to establish a classification model. B The flowchart of the study design. Abbreviations: GIST, gastrointestinal stromal tumor; CMU, Chongqing Medical University; IM, imatinib; Cmin, plasma trough concentration; LASSO, least absolute shrinkage and selection operator.

Fig. 2 The LASSO regression process for screening variables. A Tenfold cross-validation, with vertical lines drawn at the selected values of λ. B The coefficient profiles of the 26 feature variables plotted against the log (λ) sequence in the LASSO model. Vertical dotted lines are placed at the minimal mean square error (λ = 0.018) and at one standard error from the minimum (λ = 0.045). Abbreviations: LASSO, least absolute shrinkage and selection operator

Fig. 4 The performance of the XGBoost model was evaluated by tenfold cross-validation in the training set and internal validation in the test set.A The mean AUROC for the XGBoost model in the training set (AUROC = 0.881).B The mean AUROC for the XGBoost model in the validation set (AUROC = 0.699).C The AUROC for the XGBoost model in the test set (AUROC = 0.725).D In the learning curve, the red dashed line represents the training set and the blue dashed line represents the validation set.Abbreviations: XGBoost, Extreme Gradient Boosting; AUROC, area under the receiver-operating characteristic curve; ROC, receiver operating characteristic; AUC, area under curve

Fig. 5 SHAP summary graph of the XGBoost model. A This diagram shows, as dots, the estimated impact of each feature on the output of the XGBoost model. Each dot represents an individual patient from the dataset. The colors represent the feature value: red represents higher values of a specific feature, and blue represents lower values. B Average absolute impact of variables on the output value of the XGBoost model (ranked in descending order of feature importance). Abbreviations: XGBoost, Extreme Gradient Boosting; SHAP, Shapley Additive Explanations; RBC, red blood cell count; NEU%, percentage of neutrophils; PLT, platelet count

Table 1
Baseline characteristics of 890 IM Cmin data from 212 patients with advanced GISTs
Abbreviations: GIST Gastrointestinal stromal tumor, IM Imatinib, WBC White blood cell, PLT Platelet count, NEU% Percentage of neutrophils, RBC Red blood cell count, HB Hemoglobin, LYM% Percentage of lymphocytes, ALT Alanine aminotransferase, AST Aspartate aminotransferase, TBIL Total bilirubin, DBIL Direct bilirubin, IBIL Indirect bilirubin, GGT Gamma-glutamyl transpeptidase, AKP Alkaline phosphatase, Cr Creatinine, BUN Urea nitrogen, eGFR Estimated glomerular filtration rate, DOG-1 Gastrointestinal stromal tumor protein 1, CD117 Cluster of differentiation 117, CD34 Cell differentiation factor 34
a Mann-Whitney U-test; b Pearson chi-square test. All P values were calculated as two-tailed. "Age at diagnosis" refers to the age at diagnosis of primary localized GIST; "Age at blood sampling" refers to the age at which patients with primary localized GIST took IM. Percentages may not add up to exactly 100% as a result of rounding

Table 2
Comparison of baseline characteristics between "IM Cmin ≤ 1100 ng/mL" and "IM Cmin > 1100 ng/mL"
Abbreviations: GIST Gastrointestinal stromal tumor, IM Imatinib, WBC White blood cell, PLT Platelet count, NEU% Percentage of neutrophils, RBC Red blood cell count, HB Hemoglobin, LYM% Percentage of lymphocytes, ALT Alanine aminotransferase, AST Aspartate aminotransferase, TBIL Total bilirubin, DBIL Direct bilirubin, IBIL Indirect bilirubin, GGT Gamma-glutamyl transpeptidase, AKP Alkaline phosphatase, Cr Creatinine, BUN Urea nitrogen, eGFR Estimated glomerular filtration rate, DOG-1 Gastrointestinal stromal tumor protein 1, CD117 Cluster of differentiation 117, CD34 Cell differentiation factor 34
a Mann-Whitney U-test; b Pearson chi-square test. All P values were calculated as two-tailed. "Age at diagnosis" refers to the age at diagnosis of primary localized GIST; "Age at blood sampling" refers to the age at which patients with primary localized GIST took IM. Percentages may not add up to exactly 100% as a result of rounding

Table 3
Forward stepwise binary logistic regression analysis
Through forward stepwise binary logistic regression analysis, variables such as HB, LYM%, and age at diagnosis were excluded
Abbreviations: NEU% Percentage of neutrophils, HB Hemoglobin, RBC Red blood cell count, LYM% Percentage of lymphocytes, PLT Platelet count, R Regression coefficient, SE Standard error, OR Odds ratio