- Research
- Open access
Artificial intelligence reveals the predictions of hematological indexes in children with acute leukemia
BMC Cancer volume 24, Article number: 993 (2024)
Abstract
Childhood leukemia is a prevalent form of pediatric cancer, with acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) being the primary manifestations. Timely treatment has significantly enhanced survival rates for children with acute leukemia. This study aimed to develop an early and comprehensive predictor for hematologic malignancies in children by analyzing nutritional biomarkers, key leukemia indicators, and granulocytes in their blood. Using a machine learning algorithm and ten indices, the blood samples of 826 children with ALL and 255 children with AML were compared to a control group of 200 healthy children. The study revealed notable differences, including a higher proportion of boys than girls among leukemia patients and significant variations in most biochemical indicators between leukemia patients and healthy children. A random forest model achieved an area under the curve (AUC) of 0.950 for predicting ALL and an AUC of 0.909 for predicting AML. This research introduces an efficient diagnostic tool for early screening of childhood blood cancers and underscores the potential of artificial intelligence in modern healthcare.
Introduction
Leukemia represents a significant oncological challenge affecting individuals of all ages worldwide, with a marked impact on children [1]. This group of malignancies, originating from haematopoietic stem cells, disrupts normal blood cell production, compromising the immune system, and increasing susceptibility to infections [2, 3]. The aetiology of leukemia is complex, involving genetic and environmental factors, yet it remains poorly understood in many cases [4]. It is classified into various types, including acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML), which are particularly prevalent among children, whereas other forms are more common in adults [5, 6].
Early diagnosis of leukemia is crucial for improving prognosis and survival rates. Because each available diagnostic method has its own strengths and limitations, early diagnosis requires comprehensive testing that combines Morphology, Immunology, Cytogenetics, and Molecular biology (MICM) to accurately identify the disease [7,8,9]. Morphological examination of blood and bone marrow samples under a microscope is fundamental but depends on the pathologist’s skill and may miss minimal residual disease (MRD) and early-stage leukemia [10]. Flow cytometry-based immunophenotyping identifies specific markers on leukemia cells, aiding in subtype classification, but it is time-consuming and requires sophisticated equipment [11, 12]. Cytogenetic and molecular diagnostics detect genetic abnormalities and mutations but are expensive and complex, limiting their routine use [13]. Biochemical markers such as lactate dehydrogenase (LDH), C-reactive protein (CRP), platelets (PLT), and hemoglobin (HB) can indicate leukemia when abnormal, but they lack specificity and can result in false positives [14, 15]. Overall, the MICM diagnostic approach is complex, time-consuming, and expensive; it requires specialized personnel, depends heavily on sample quality, and has limited sensitivity for early disease detection. We therefore set out to develop a comprehensive predictive index for leukemia in young children by integrating multiple blood indices with machine learning techniques, with the aim of improving early detection accuracy and enabling earlier treatment.
Recent efforts in leukemia research have focused on early detection and prediction, employing machine learning algorithms to analyse patient blood samples for early-stage indicators of the disease [16,17,18,19]. The search for novel biomarkers and the refinement of predictive models for leukemia diagnosis underscores the importance of this research. With the continuous advancement of treatment modalities, including chemotherapy, radiation therapy, and targeted therapy, the potential to improve patient outcomes is significant [20, 21]. However, the challenge remains in identifying reliable biomarkers for early detection, which is crucial for the timely initiation of treatment and ultimately, improving survival rates among children diagnosed with leukemia.
This study used the data of tests performed on leukemia patients with major subtypes (ALL, AML) in young children and healthy controls at the First Affiliated Hospital of Guangzhou Medical University between 2013 and 2023, in order to create a predictive index for leukemia. The parameters used for this purpose were albumin, total protein, total cholesterol, hemoglobin, platelets, white blood cells, neutrophil count, lymphocyte count, eosinophil count, and basophil count (see Fig. 1). Treatment for leukemia has primarily included chemotherapy, radiotherapy, and targeted therapy, and the effectiveness of treatment may depend on the subtype of leukemia, age, and the patient’s overall health. Therefore, identifying novel biomarkers for the rapid and accurate screening of leukemia will be essential to accelerate the pace of treatment development and bring the greatest benefit to patients, which is an imminent challenge.
Materials and methods
Participants enrollment
Our study analyzed data from 1281 children (<10 years old) in three cohorts: 826 cases of acute lymphoblastic leukemia (ALL), 255 cases of acute myeloid leukemia (AML), and 200 healthy controls. The 1081 leukemia cases were diagnosed and treated at the First Affiliated Hospital of Guangzhou Medical University, China, between July 2013 and July 2023. Healthy control subjects were randomly selected from children undergoing routine health checks at the same hospital, matched in age and sex distribution with the patient groups. All available patients diagnosed with ALL and AML during this period were included, based on comprehensive clinical and laboratory data availability. All patients were diagnosed using the multi-parameter MICM approach (Morphology, Immunology, Cytogenetics, and Molecular biology). At our center, flow cytometry was primarily employed to analyze the surface markers on cells collected from blood or bone marrow samples to confirm the diagnosis. Specifically, cytoplasmic CD3 expression or myeloperoxidase (MPO) positivity was assessed to aid in the differentiation of leukemia subtypes. Additionally, PCR and sequencing techniques were used to detect mutations in genes such as FLT3, NPM1, and CEBPA for AML, as well as genetic alterations in genes like TEL-AML1 and MLL for ALL, to double-check the diagnoses against internationally recognized diagnostic standards. This comprehensive diagnostic approach ensured accurate classification and informed treatment strategies for patients. Clinical laboratory test results, including parameters such as albumin, total protein, total cholesterol, hemoglobin, platelets, white blood cells, neutrophil count, lymphocyte count, eosinophil count, and basophil count, were extracted from the hospital information system (HIS) and processed into Excel spreadsheets for further analysis using advanced machine learning techniques.
This study was approved by the Scientific Research Project Reviews Ethics Committee of the First Affiliated Hospital of Guangzhou Medical University (Document 2021 No.K25). This study provides a comprehensive overview of the clinical indicators of ALL and AML in children, essential for advancing our understanding of the disease and its treatment.
Key indicators screening
The selection of biomarkers (ALB, TP, LDH, HB, PLT, WBC, BASO, EO, LYMPH, and NEUT) in the leukemia prediction model was based on their clinical relevance and support from existing literature. Previous studies have demonstrated a strong correlation between the nutritional biomarkers of the human body and metabolism, and many have used ALB to evaluate tumour pathogenesis and clinical treatment [22, 23]. Albumin (ALB) is a protein made by the liver and serves as a marker of nutritional status and liver function. Low levels of albumin can indicate poor nutritional status or liver dysfunction, which are common in leukemia patients [24]. Total protein (TP) measures the total amount of albumin and globulin in the blood, providing insight into the overall nutritional and health status of the patient. Altered levels of serum proteins have been observed in leukemia patients and are indicative of disease progression [25]. Lactate dehydrogenase (LDH) is an enzyme found in almost all body tissues, and elevated levels are often seen in patients with hematologic malignancies due to cell turnover and tissue breakdown. High LDH levels are commonly associated with increased tumor burden and poorer prognosis in leukemia [26, 27]. McQuilten et al. [28] identified hemoglobin (HB) as an important, validated indicator for the diagnosis of leukemia. Platelets (PLT) are cell fragments involved in blood clotting, and thrombocytopenia (low platelet count) often indicates the severity of bone marrow involvement in leukemia [29]. White blood cells (WBC) are part of the immune system and fight infection, with abnormal counts being critical for the diagnosis and monitoring of leukemia [30]. Lymphocytes (LYMPH), basophils (BASO) and eosinophils (EO) are WBCs involved in the immune response, with abnormal lymphocyte counts commonly associated with leukemia [31, 32]. Neutrophils (NEUT) are the most abundant type of WBC and are crucial for fighting infections, with neutropenia indicating bone marrow suppression in leukemia [33, 34].
Statistical analyses and visualisation
Several methods were used in data processing, including t-distributed stochastic neighbor embedding (t-SNE) and receiver operating characteristic (ROC) analysis. The limma package (version 3.52; https://bioconductor.org/packages/limma), developed by Ritchie et al. [35], was used to perform differential analyses between biochemical metrics. The preliminary classification of the data was visualized with the “tsne” package (version 0.1-3.1; https://cran.r-project.org/web/packages/tsne) in RStudio. ROC curves were used to evaluate and compare the effectiveness of the diagnostic models and to judge whether they are of practical value [36]. To visualize the ROC curve and AUC, we used the “pROC” package (version 1.18.0; https://cran.r-project.org/web/packages/pROC/). Data were expressed as mean ± SD for continuous variables and as numbers (percentages) for categorical variables, and were visualized and analyzed in RStudio and Python. All data collection and statistical analysis were performed using R version 4.2.1 and Python 3.7. The basic bar and box plots were drawn with the “matplotlib” Python package (version 3.5; https://matplotlib.org/). The subplots were visualised using the “seaborn” Python package (version 0.11.2; https://seaborn.pydata.org/). Volcano plots were visualised using the ggplot2 package (version 3.3.6; https://cran.r-project.org/web/packages/ggplot2) in R, and a heat scatter was created using the ggplot2 and LSD packages (version 2.45; https://cran.r-project.org/web/packages/LSD). To compare indices between male and female participants across the cohorts, we used the “beanplot” package (version 1.3.1; https://cran.r-project.org/web/packages/beanplot/). Lastly, colour and typography were optimised in Adobe Illustrator (https://www.adobe.com).
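As an illustration of the embedding step, the following is a minimal Python sketch of a two-dimensional t-SNE projection of ten standardised indices. It is not the authors' R workflow: the data are synthetic placeholders, and the perplexity, initialisation, and random seed are assumptions chosen only to make the example reproducible.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Three synthetic groups of 60 samples x 10 indices (placeholders, NOT study data).
groups = [rng.normal(loc=shift, scale=1.0, size=(60, 10)) for shift in (0.0, 2.0, 4.0)]
X = np.vstack(groups)
cohort = np.repeat(["ALL", "AML", "control"], 60)

# Put all indices on a comparable scale before computing pairwise distances.
X_scaled = StandardScaler().fit_transform(X)

# Project to 2-D; perplexity=30 is an assumed value and must be below the sample count.
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(X_scaled)

# Report the centre of each cohort in the embedded space.
for name in ("ALL", "AML", "control"):
    centre = embedding[cohort == name].mean(axis=0)
    print(name, centre.round(2))
```

The two columns of `embedding` can be passed to any scatter-plot routine and coloured by `cohort` to obtain a figure of the kind described for Fig. 2C.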
Random forest models
In this study, the 10 blood indices were normalised and the data were divided into a training set (70%) and a validation set (30%) using random sampling. A random forest (RF) model was developed in Python (version 3.7; http://www.Python.org) with the sklearn library (version 1.1.2; https://scikit-learn.org/stable/). The GridSearchCV module was used to optimise the parameters of the RF model; the selected configuration used approximately 100 trees, 10 random variables per tree, and a maximum tree depth of 50. Cross-validation was used during parameter tuning to maximize the model’s stability and utility while avoiding over-fitting. The best model was then chosen based on the accuracy of its predictions on the held-out (30%) set. Finally, the recognition performance was evaluated using ROC curves and the corresponding AUC values.
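The workflow described above can be sketched roughly as follows. This is a hedged illustration, not the authors' code: the data are synthetic, the search grid is hypothetical, and reading "10 random variables" as `max_features=10` is an assumption. The sketch also fits the scaler on the training split only, which is a design choice made here to avoid leaking validation statistics into training.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the 10 blood indices (NOT the study data).
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_redundant=2, random_state=0)

# 70% training / 30% validation by random sampling, stratified by label.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Normalise with statistics learned from the training split only.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)

# Hypothetical grid around the values reported in the text.
param_grid = {
    "n_estimators": [50, 100],
    "max_features": ["sqrt", 10],   # assumption: "10 random variables"
    "max_depth": [10, 50],
}

# 5-fold cross-validated grid search, scored by AUC.
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      scoring="roc_auc", cv=5)
search.fit(X_train_s, y_train)

# Evaluate the refitted best model on the held-out 30%.
val_prob = search.predict_proba(X_val_s)[:, 1]
val_auc = roc_auc_score(y_val, val_prob)
print(search.best_params_, round(val_auc, 3))
```

Selecting by cross-validated AUC and reporting a single held-out score keeps tuning and evaluation separate; other scoring choices, such as accuracy, can be swapped in through the `scoring` argument.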
Results
Demographic characteristics of all patients
A total of 1281 participants were included in this study: 826 with acute lymphoblastic leukemia (ALL), 255 with acute myeloid leukemia (AML), and 200 healthy controls. All participants were aged <10 years and were randomly selected for comparison. The proportion of boys was significantly higher in the patient groups (62.5% in ALL and 60.8% in AML) than in the healthy controls (51.5%).
The gender subgroup distribution of all participants is visualised in Fig. 2A. The healthy control group showed no significant gender disparity, with a balanced distribution between boys and girls. In contrast, ALL and AML patients showed a notable gender disparity, with a significantly higher proportion of boys than girls. This trend is consistent across both leukemia subtypes, suggesting a potential gender-related predisposition to these forms of leukemia. Figure 2B provides the age distribution of all participants. The data highlight that among ALL patients, a larger proportion are under the age of 3 compared to older age groups. This suggests that ALL is diagnosed more frequently in younger children within this cohort. In contrast, the age distribution for AML patients does not show such a pronounced concentration in this younger age range, indicating a more uniform distribution between the different age groups studied. The healthy control group displays a uniform distribution across age ranges, ensuring a broad and representative sample of the general paediatric population.
The predictive biomarkers for the three cohorts (ALL, AML, and healthy controls) were further visualised using a tSNE plot (Fig. 2C), which shows the clustering of predictive indices for ALL and AML versus healthy controls. Different colored dots represent the three cohorts: ALL (blue), AML (orange), and healthy controls (green). The dotted curves outline the areas of concentration for each cohort. The distinct separation between these clusters indicates significant differences in the biomarker profiles of healthy children compared to those with ALL and AML, demonstrating the efficacy of these indices in differentiating between these groups. The volcano plot of the difference between ALL and AML versus healthy controls is illustrated in Fig. 2D. Points further to the right indicate indices that are significantly higher in leukemia patients than in healthy controls, and points to the left indicate lower values. The vertical dotted lines mark the fold change threshold and the horizontal line represents the significance threshold. This graph highlights specific indices that are markedly different in leukemia patients, underscoring their potential as biomarkers for detection and diagnosis. In this study, we compared several machine learning models, including Random Forest (RF), Naive Bayes (NB), K-Nearest Neighbours (KNN), and Support Vector Machine (SVM). As shown in Fig. 2E, the RF model outperformed the others across all evaluation metrics, achieving an AUC of 0.838, significantly higher than NB’s 0.774, KNN’s 0.712, and SVM’s 0.659. Additionally, the RF model demonstrated a better balance, with a sensitivity of 0.895 and a specificity of 0.704, making it a more robust choice for the classification task. Based on these results, we selected the RF model for further analysis and application.
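A comparison of this kind can be sketched as below. This is only an illustration under stated assumptions: the data are synthetic, every hyperparameter is a library default or an arbitrary choice rather than the authors' setting, and the KNN and SVM models are wrapped in a scaling pipeline because both are distance-based.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary problem with 10 features (NOT the study data).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=1)

# Candidate models; settings are illustrative assumptions.
models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=1),
    "NB": GaussianNB(),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=1)),
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    prob = model.predict_proba(X_te)[:, 1]          # score for the positive class
    tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
    results[name] = {
        "auc": roc_auc_score(y_te, prob),
        "sensitivity": tp / (tp + fn),               # true positive rate
        "specificity": tn / (tn + fp),               # true negative rate
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Evaluating every model on the same held-out split keeps the AUC, sensitivity, and specificity values directly comparable across classifiers.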
Difference significance analysis of indices among ALL, AML, and healthy controls
To further investigate this correlation, in this study, we analysed 10 potential indices in detail and plotted the scatter density map of TP against ALB (Fig. 3A). The data points were separated by contour lines of different colours, allowing us to visually assess the density distribution between the scattered points. The visual distribution plots revealed that healthy controls had a scattered distribution, with ALB concentrated around 45 mg/mL and TP around 70 mg/mL, while ALL patients were centralised around 40 mg/mL and 60 mg/mL for ALB and TP, respectively. In contrast, the distribution of ALB and TP in patients with AML was relatively dispersed and the contents were rather low. In general, there were significant differences between each cohort.
To visualise gender differences, we integrated the 10 potential indices of all participants in this study. Figure 3B shows the distribution and statistical significance of these 10 indices between genders among the three cohorts. We found that ALL and AML patients differed very significantly from the healthy population. Additionally, in ALL patients, there was a significant difference in WBC between genders, while in AML patients, ALB, TP, and PLT showed significant differences between genders. The 10 indices were also drawn as box plots (Fig. 4). The results highlighted significant differences between patients and healthy controls, as well as statistically significant differences in TP and LDH between ALL and AML. However, HB, ALB, and WBC showed only minor differences in this comparison, and BASO, EO, LYMPH and NEUT were not significantly different between patients with ALL and AML.
Detailed correlational analysis in key indices in three cohorts
Many studies have demonstrated links between ALL, AML, and human metabolism [37]. In our correlation analysis, we used paired analysis to compare the distribution of the 10 indices in each cohort. Figure 5 reveals several notable correlations in the blood parameters of ALL (red), AML (green), and healthy individuals (blue). In AML patients, a strong positive correlation (0.341***) between HB and PLT indicates that higher HB levels are associated with increased platelet counts, whereas the healthy population exhibits a significant negative correlation (-0.402***), suggesting that as hemoglobin levels decrease, platelet counts tend to increase. Additionally, there is a striking positive correlation (0.527***) between EO and LYMPH. These specific values highlight significant hematological alterations in leukemia patients compared to healthy individuals. For most indices, the range of values in the healthy controls was covered by that of the ALL and AML patients. The lower left part of the figure represented the dispersed distribution of the indices, the upper right part showed their concentrated distribution, and the diagonal illustrated the density distribution of each index. We observed that the ALB, TP, HB, and PLT of healthy controls were mostly distributed above those of ALL and AML patients. In contrast, the WBC, BASO and NEUT distributions of ALL and AML patients covered the healthy distribution. Notably, ALB and TP demonstrated the best correlation among the 10 indices and presented a clearly linear relationship.
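The per-cohort coefficients quoted above (for example 0.341***) can be reproduced in form with a short sketch like the one below. It assumes Pearson correlation and the common star convention (* p < 0.05, ** p < 0.01, *** p < 0.001), neither of which is stated explicitly in the text; the two cohorts are synthetic and only mimic one positive and one negative HB/PLT relationship.

```python
import numpy as np
from scipy.stats import pearsonr

def stars(p):
    """Map a p-value to significance stars (assumed convention)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

rng = np.random.default_rng(0)
n = 200

# Synthetic cohort A: platelet count rises with hemoglobin (placeholder values).
hb_a = rng.normal(90, 15, n)
plt_a = 50 + 0.8 * hb_a + rng.normal(0, 30, n)

# Synthetic cohort B: platelet count falls as hemoglobin rises (placeholder values).
hb_b = rng.normal(125, 10, n)
plt_b = 500 - 1.5 * hb_b + rng.normal(0, 30, n)

cohorts = {"cohort_A": (hb_a, plt_a), "cohort_B": (hb_b, plt_b)}

summary = {}
for name, (hb, plt_count) in cohorts.items():
    r, p = pearsonr(hb, plt_count)           # correlation coefficient and p-value
    summary[name] = (r, p, f"{r:.3f}{stars(p)}")
    print(name, summary[name][2])
```

Computing the coefficient separately within each cohort, rather than on the pooled data, is what allows opposite signs to appear for the same pair of indices.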
In addition, combining EO with other indices allowed healthy controls to be separated mainly from ALL and AML patients. Similarly, correlation analysis among all indices revealed that the distribution of TP and ALB, as well as of PLT, WBC, BASO, EO and LYMPH, showed a similar trend. Specifically, the former presented an elliptic triangle, while the latter took the shape of a fan-shaped inverted triangle.
Relevant exploration of ALL and AML in RF model
After RF model prediction, for healthy controls and ALL, we imported patients with AML into the classifier for comparative analysis, where the ROC curve and the AUC of the biomarkers focused above were calculated. Therefore, as shown in Fig. 6A and B, we performed an exploration of the ROC single index prediction model for the 10 key indices to observe the efficacy of this model and further study the validity of the indices in patients with ALL and AML. In the ROC analysis of 10 markers selected in ALL patients, ALB surprisingly had a poor predictive effect. The AUCs of the other 9 indices in ALL were greater than 0.6, while HB and EO showed a better predictive effect of more than 0.8. The AUC of PLT and EO, which may produce a stronger predictive effect, was less than 0.7 in the study of patients with AML, although the AUC of all markers was greater than 0.6. In ALL, the sensitivity of the curve is mostly greater than 1, and the sensitivity of leukocytes and granulocytes is above 0.9, but the specificity of the prediction of the model is relatively poor. In general, the results indicate that EO has a superior predictive effect in patients with ALL and AML. The AUC of this model was confirmed to be 0.950 and 0.909 in patients with ALL and AML, respectively, when combined with the 10 key indices emphasized in this research (Fig. 7).
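For a single index, the AUC equals the probability that a randomly chosen patient has a higher value than a randomly chosen control, with ties counted as one half. The sketch below computes that quantity directly in plain Python. It is an illustration rather than the pROC-based analysis used in the study, and the hemoglobin values are invented for the example.

```python
def auc_single_index(values, labels):
    """AUC of one index: P(patient > control) + 0.5 * P(tie).

    labels: 1 for patient, 0 for control.
    """
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def direction_free_auc(values, labels):
    """Report the AUC regardless of whether patients run high or low."""
    auc = auc_single_index(values, labels)
    return max(auc, 1.0 - auc)

# Hypothetical hemoglobin values (g/L); patients tend to be LOWER than controls.
hb = [70, 85, 90, 110, 120, 125, 130, 95]
is_patient = [1, 1, 1, 1, 0, 0, 0, 0]

raw = auc_single_index(hb, is_patient)
print(raw, direction_free_auc(hb, is_patient))
```

A raw value below 0.5 simply means the index is lower in patients, so the direction-free form is the one comparable with per-index AUCs reported as greater than 0.6 or 0.8.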
The ALL prediction model achieved an AUC of 0.950 (Fig. 7), with a sensitivity of 0.965 and a specificity of 0.815. The AML prediction model performed slightly worse overall, with an AUC of 0.909, although its specificity (0.884) was higher than that of the ALL model.
Discussion
In our study, we conducted a detailed statistical analysis of clinical laboratory test data from 1081 leukemia patients, 826 with ALL and 255 with AML, collected over a 10-year period. Data were used to identify changes in biomarkers and applied to the Random Forest (RF) model to develop a diagnostic and prognostic prediction model for ALL and AML. This approach demonstrated high accuracy in prediction. In recent years, childhood malignancies, including acute leukemias, have emerged as a significant source of morbidity and mortality among children under the age of 10 [38]. To effectively reduce the prevalence of this widespread disease, early detection and prediction of childhood leukemia are of the utmost importance. The international incidence of leukemia is 7.6 per 100,000 people for men and 5.4 per 100,000 people for women [39]. Our study reflects these ratios and highlights the distribution of indices for healthy individuals and leukemia patients. The volcano plot revealed that the leukemia patients’ indices are primarily in the region of up and down regulation, with a significant fold change in some indices, demonstrating the distinction between leukemia patients and healthy individuals.
Nutrition is an important factor in the prognosis of cancer patients. Serum albumin (ALB) is a good indicator of nutritional health, and Katsuya et al. [40] used it as a predictor in leukemia to investigate its prognostic effect. Healthy children in China have serum albumin levels in the range of 40-55 mg/mL and total serum protein levels in the range of 65-84 mg/mL [41], and total protein <60 g/L or albumin <25 g/L is referred to as hypoproteinemia [42]. Our study found that hypoproteinemia was more common in leukemia patients than in healthy children, and that the TP and ALB levels of children with either type of leukemia were lower than those of healthy children. As shown in Fig. 3, most leukemia patients have low levels of albumin, consistent with a report by Wang et al. [43] demonstrating that leukemias are associated with lower albumin. This was also confirmed by Sadek et al. [44], who found that patients with acute myeloid leukemia (AML) had significantly lower levels of albumin than healthy individuals. Patients with low albumin levels were also more likely to experience treatment-related toxicities, such as infections, than patients with normal levels [45, 46].
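The hypoproteinemia criterion quoted above (total protein <60 g/L or albumin <25 g/L) can be expressed as a simple rule. The sketch below is illustrative only: the function name and the sample values are hypothetical, and it relies on 1 mg/mL being numerically equal to 1 g/L.

```python
def is_hypoproteinemia(tp_g_per_l, alb_g_per_l):
    """Return True when TP < 60 g/L or ALB < 25 g/L (thresholds quoted in the text)."""
    return tp_g_per_l < 60.0 or alb_g_per_l < 25.0

# Hypothetical (TP, ALB) measurements in g/L; not study data.
samples = {
    "child_1": (70.0, 45.0),   # both within the quoted healthy ranges
    "child_2": (58.0, 40.0),   # low total protein
    "child_3": (62.0, 24.0),   # low albumin
}

flags = {name: is_hypoproteinemia(tp, alb) for name, (tp, alb) in samples.items()}
print(flags)
```

Because the criterion is a disjunction, a child is flagged when either measurement falls below its threshold, and values exactly at 60 g/L or 25 g/L are not flagged.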
In this study, the random forest model demonstrated excellent performance, with an AUC of 0.950 for the ALL prediction model and an AUC of 0.909 for the AML prediction model. These results surpass those of previous studies. For instance, Mishra et al. [16] utilized a Gray Level Co-occurrence Matrix (GLCM) method to analyze cell morphology, achieving an AUC of 0.87. Similarly, Das et al. [19] employed support vector machines (SVM) to improve ALL detection and compared the effectiveness of different machine learning classifiers. Researchers have also analyzed patient BMI [47], gut microbiota [17], and daily diet [48] to screen for risk indices. Additionally, metabolic imbalance biomarkers in the exploratory stage, such as changes in glucose [14], might be important diagnostic tools for detecting leukemias, particularly in differentiating between different types of leukemias. Furthermore, key markers of leukemia, including Hemoglobin (HB), White Blood Cell Count (WBC), and Platelet Count (PLT), were identified [49, 50]. The results showed significant differences between healthy and leukemic children, as well as considerable differences between the subtypes ALL and AML (Fig. 4). These three indicators were particularly focused on in some cases. Bain [51] stated that hemoglobin levels could be an important diagnostic tool in detecting leukemia, with low levels indicating the presence of the disease and anemia often being one of the first symptoms. Regarding the white blood cell count, Kiem Hao et al. [52] noted that it could be more than three times higher in leukemia patients than in healthy individuals. This increase can lead to a decrease in the number of red blood cells and platelets, resulting in anemia and a greater risk of bleeding and infections. Platelet count is another crucial parameter in the diagnosis of leukemia. Surprisingly, a decrease in platelet count, combined with an increase in white blood cell count, can strongly indicate leukemia [53]. 
In some cases, the platelet count may be the only abnormal finding leading to a diagnosis of leukemia [54].
The analysis of Fig. 5 highlights several important correlations among blood parameters in ALL, AML, and healthy individuals. In AML patients, there is a strong positive correlation (0.341***) between hemoglobin (HB) and platelet count (PLT), indicating that higher hemoglobin levels are associated with increased platelet counts. Conversely, in healthy individuals, a significant negative correlation (-0.402***) suggests that as hemoglobin levels decrease, platelet counts tend to increase. This contrast underscores the distinct hematological profiles between leukemia patients and healthy individuals. Additionally, a notable positive correlation (0.527***) between eosinophils (EO) and lymphocytes (LYMPH) in leukemia patients suggests a linked alteration in the immune response. The distribution patterns further reveal that healthy controls generally have higher levels of albumin (ALB), total protein (TP), hemoglobin (HB), and platelet count (PLT) compared to ALL and AML patients. In contrast, white blood cells (WBC), basophils (BASO), and neutrophils (NEUT) levels in ALL and AML patients cover the distribution range seen in healthy individuals, indicating elevated levels in leukemia patients. These findings emphasize the significant hematological changes in leukemia patients compared to healthy individuals and underscore the importance of integrating multiple hematological indices for improved diagnostic accuracy. Understanding these correlations provides valuable insights into the disease’s pathophysiology and aids in developing targeted diagnostic and therapeutic strategies.
The results of the study could have important implications for the early detection and treatment of these types of leukemia, especially in children, in whom these malignancies are a significant source of morbidity and mortality. In clinical practice, our model could be used for screening and diagnostic testing for ALL and AML, which could lead to earlier detection and more effective treatment (Fig. 7). The model could also be used to monitor the progression of the disease and evaluate the effectiveness of treatment. However, further validation of the model in a clinical setting is necessary before it can be widely adopted. Future research can build upon these findings by further refining the predictive model and incorporating other factors that can affect the prognosis of patients with leukemia, such as environmental and lifestyle factors, to increase the precision of the model. Additionally, larger and more diverse patient populations can be studied to validate the findings of the current study and to make the model more generalizable.
Conclusion
In this study, we aimed to investigate the differences in common blood indices between healthy children and those with acute leukemia, as well as to analyze the relationship between these indices and the gender of children. To achieve this goal, we used a 10-key index RF model for the prediction of leukemia subtypes. The results indicated that the proposed method achieved impressive predictive performance, with an AUC of 0.950 for ALL and 0.909 for AML. These findings suggest that the proposed model is a promising auxiliary tool for the detection of blood cancer, although it still needs to be validated prospectively.
Availability of data and materials
The original coding contributions presented in this study are available at DOI:10.17632/fz7d6x4bwx.1; further inquiries can be directed to the corresponding author.
References
Zapata-Tarrés M, Balandrán JC, Rivera-Luna R, Pelayo R. Childhood acute leukemias in developing nations: successes and challenges. Curr Oncol Rep. 2021;23:1–9.
Hu Y, Zhang X, Zhang A, Hou Y, Liu Y, Li Q, et al. Global burden and attributable risk factors of acute lymphoblastic leukemia in 204 countries and territories in 1990–2019: Estimation based on Global Burden of Disease Study 2019. Hematol Oncol. 2022;40(1):93–105.
Radivoyevitch T, Sachs R, Gale R, Molenaar R, Brenner D, Hill B, et al. Defining AML and MDS second cancer risk dynamics after diagnoses of first cancers treated or not with radiation. Leukemia. 2016;30(2):285–94.
Newell LF, Cook RJ. Advances in acute myeloid leukemia. BMJ. 2021;375:n2026.
Fitzmaurice C, Allen C, Barber RM, Barregard L, Bhutta ZA, Brenner H, et al. Global, regional, and national cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life-years for 32 cancer groups, 1990 to 2015: a systematic analysis for the global burden of disease study. JAMA Oncol. 2017;3(4):524–48.
Ou Z, Yu D, Liang Y, He W, Li Y, Zhang M, et al. Analysis of the Global Burden of Disease study highlights the trends in death and disability-adjusted life years of leukemia from 1990 to 2017. Cancer Commun. 2020;40(11):598–610.
Bain BJ. Leukaemia diagnosis. John Wiley & Sons; 2017.
Pourrajab F, Zare-Khormizi MR, Hashemi AS, Hekmatimoghaddam S. Genetic characterization and risk stratification of acute myeloid leukemia. Cancer Manag Res. 2020;12:2231–53.
Zhang WT, Zhang GX, Gao SS. The potential diagnostic accuracy of circulating microRNAs for leukemia: a meta-analysis. Technol Cancer Res Treat. 2021;20:15330338211011958.
Vardiman JW, Thiele J, Arber DA, Brunning RD, Borowitz MJ, Porwit A, et al. The 2008 revision of the World Health Organization (WHO) classification of myeloid neoplasms and acute leukemia: rationale and important changes. Blood J Am Soc Hematol. 2009;114(5):937–51.
Ng VL. Flow Cytometry in Hematopathology: A Visual Approach to Data Analysis and Interpretation. LWW; 2003.
Hui HY, Clarke KM, Fuller KA, Stanley J, Chuah HH, Ng TF, Cheah C, McQuillan A, Erber WN. “Immuno‐flowFISH” for the Assessment of Cytogenetic Abnormalities in Chronic Lymphocytic Leukemia. Cytometry Part A. 2019;95(5):521–33.
Bullinger L, Döhner K, Döhner H. Genomics of acute myeloid leukemia diagnosis and pathways. J Clin Oncol. 2017;35(9):934–46.
Bai Y, Zhang H, Sun X, Sun C, Ren L. Biomarker identification and pathway analysis by serum metabolomics of childhood acute lymphoblastic leukemia. Clin Chim Acta. 2014;436:207–16.
Dirim AB, Tiryaki TO, Altin S, Besisik SK, Hindilerden IY, Nalcaci M. Baseline inflammation indexes and neutrophil-to-LDH ratio for prediction of the first mobilization failure without plerixafor-based regimens in multiple myeloma and lymphoma patients: a single-center retrospective study. J Clin Apher. 2023;38(6):711–20.
Mishra S, Majhi B, Sa PK, Sharma L. Gray level co-occurrence matrix and random forest based acute lymphoblastic leukemia detection. Biomed Signal Process Control. 2017;33:272–80.
Liu X, Zou Y, Ruan M, Chang L, Chen X, Wang S, et al. Pediatric acute lymphoblastic leukemia patients exhibit distinctive alterations in the gut microbiota. Front Cell Infect Microbiol. 2020;10:558799.
Agarwal N, Agrawal P. Early Stage Detection of Leukemia Using Artificial Intelligence. Mach Learn Healthc Appl. 2021:215–24.
Das PK, Pradhan A, Meher S. Detection of acute lymphoblastic leukemia using machine learning techniques. In: Machine learning, deep learning and computational intelligence for wireless communication. Springer; 2021. pp. 425–437.
Wargo JA, Reuben A, Cooper ZA, Oh KS, Sullivan RJ. Immune effects of chemotherapy, radiation, and targeted therapy and opportunities for combination with immunotherapy. In: Seminars in oncology, vol. 42. Elsevier; 2015. pp. 601–616.
Chandra RA, Keane FK, Voncken FE, Thomas CR. Contemporary radiotherapy: present and future. Lancet. 2021;398(10295):171–84.
Maltoni M, Amadori D. Prognosis in advanced cancer. Hematol Oncol Clin. 2002;16(3):715–29.
Li H, Cheng ZJ, Liang Z, Liu M, Liu L, Song Z, et al. Novel nutritional indicator as predictors among subtypes of lung cancer in diagnosis. Front Nutr. 2023;10:1042047.
Doucette K, Percival ME, Williams L, Kandahari A, Taylor A, Wang S, et al. Hypoalbuminemia as a prognostic biomarker for higher mortality and treatment complications in acute myeloid leukemia. Hematol Oncol. 2021;39(5):697–706.
Walter RB, Ofran Y, Wierzbowska A, Ravandi F, Hourigan CS, Ngai LL, et al. Measurable residual disease as a biomarker in acute myeloid leukemia: theoretical and practical considerations. Leukemia. 2021;35(6):1529–38.
Xiao Z, Gong R, Chen X, Xiao D, Luo S, Ji Y. Association between serum lactate dehydrogenase and 60-day mortality in Chinese Hakka patients with acute myeloid leukemia: A cohort study. J Clin Lab Anal. 2021;35(12):e24049.
Geva M, Pryce A, Shouval R, Fein JA, Danylesko I, Shem-Tov N, et al. High lactate dehydrogenase at time of admission for allogeneic hematopoietic transplantation associates to poor survival in acute myeloid leukemia and non-Hodgkin lymphoma. Bone Marrow Transplant. 2021;56(11):2690–6.
McQuilten ZK, Busija L, Seymour JF, Stanworth S, Wood EM, Kenealy M, et al. Hemoglobin is a key determinant of quality of life before and during azacitidine-based therapy for myelodysplasia and low blast count acute myeloid leukemia. Leuk Lymphoma. 2022;63(3):676–83.
Zhang L, Liu J, Qin X, Liu W. Platelet-acute leukemia interactions. Clin Chim Acta. 2022;536:29–38.
Jiwani N, Gupta K, Pau G, Alibakhshikenari M. Pattern recognition of acute lymphoblastic Leukemia (ALL) using computational deep learning. IEEE Access. 2023;11:29541–53.
Kelemen K, Saft L, Craig FE, Orazi A, Nakashima M, Wertheim GB, et al. Eosinophilia/hypereosinophilia in the setting of reactive and idiopathic causes, well-defined myeloid or lymphoid leukemias, or germline disorders: report of the 2019 Society for Hematopathology/European Association for Haematopathology workshop. Am J Clin Pathol. 2021;155(2):179–210.
Bain BJ, Leach M. Leukaemia diagnosis. John Wiley & Sons; 2024.
Bain BJ. Blood Cells: A Practical Guide. 5th ed. Wiley-Blackwell; 2015.
Haider RZ, Khan NA, Urrechaga E, Shamsi TS. Mature and Immature/Activated Cells Fractionation: Time for a Paradigm Shift in Differential Leucocyte Count Reporting? Diagnostics. 2021;11(6):922.
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.
Kannan R, Vasanthi V. Machine learning algorithms with ROC curve for predicting and diagnosing the heart disease. In: Soft computing and medical bioinformatics. Springer; 2019. pp. 63–72.
Mengucci C. A take on complexity: bio-molecules and human metabolism interaction modelling for health and nutrition with machine learning. 2022.
Seth R, Singh A. Leukemias in children. Indian J Pediatr. 2015;82(9):817–24.
Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.
Katsuya H, Yamanaka T, Ishitsuka K, Utsunomiya A, Sasaki H, Hanada S, et al. Prognostic index for acute- and lymphoma-type adult T-cell leukemia/lymphoma. J Clin Oncol. 2012;30(14):1635–40.
Ni X, Song W, Peng X, Shen Y, Peng Y, Li Q, et al. Pediatric reference intervals in China (PRINCE): design and rationale for a large, multicenter collaborative cross-sectional study. Sci Bull. 2018;63(24):1626–34.
Hayes GM, Mathews K, Floras A, Dewey C. Refractometric total plasma protein measurement as a cage-side indicator of hypoalbuminemia and hypoproteinemia in hospitalized dogs. J Vet Emerg Crit Care. 2011;21(4):356–62.
Wang XS, Giralt SA, Mendoza TR, Engstrom MC, Johnson BA, Peterson N, et al. Clinical factors associated with cancer-related fatigue in patients being treated for leukemia and non-Hodgkin’s lymphoma. J Clin Oncol. 2002;20(5):1319–28.
Sadek NA, Abd-eltawab SM, Assem NM, Hamdy HA, EL-sayed FM, Ahmad MAR, et al. Prognostic value of absolute lymphocyte count, lymphocyte percentage, serum albumin, aberrant expression of CD7, CD19 and the tumor suppressors (PTEN and p53) in patients with acute myeloid leukemia. Asian Pac J Cancer Biol. 2020;5(4):131–40.
Li H, Wang F, Huang W. A novel, simple, and low-cost approach for machine learning screening of kidney cancer: an eight-indicator blood test panel with predictive value for early diagnosis. Curr Oncol. 2022;29(12):9135–49.
Li H, Cheng ZJ, Fu X, Liu M, Liu P, Cao W, et al. Decoding acute myocarditis in patients with COVID-19: early detection through machine learning and hematological indices. iScience. 2024;27(2).
Barr RD, Gomez-Almaguer D, Jaime-Perez JC, Ruiz-Argüelles GJ. Importance of nutrition in the treatment of leukemia in children and adolescents. Arch Med Res. 2016;47(8):585–92.
Fuemmeler BF, Pendzich MK, Clark K, Lovelady C, Rosoff P, Blatt J, et al. Diet, physical activity, and body composition changes during the first year of treatment for childhood acute leukemia and lymphoma. J Pediatr Hematol Oncol. 2013;35(6):437.
Xiong H, Zhang HT, Xiao HW, Huang CL, Huang MZ. Serum metabolomics coupling with clinical laboratory indicators reveal taxonomic features of leukemia. Front Pharmacol. 2022;13:794042.
Li Y, Wang S, Xiao H, Lu F, Zhang B, Zhou T. Evaluation and validation of the prognostic value of platelet indices in patients with leukemia. Clin Exp Med. 2023;23(6):1835–44.
Bain BJ. Diagnosis from the blood smear. N Engl J Med. 2005;353(5):498–507.
Kiem Hao T, Nhu Hiep P, Kim Hoa NT, Van Ha C. Causes of death in childhood acute lymphoblastic leukemia at Hue Central Hospital for 10 years (2008–2018). Global Pediatr Health. 2020;7:2333794X20901930.
Shen ZX, Shi ZZ, Fang J, Gu BW, Li JM, Zhu YM, et al. All-trans retinoic acid/As2O3 combination yields a high quality remission and survival in newly diagnosed acute promyelocytic leukemia. Proc Natl Acad Sci. 2004;101(15):5328–35.
Cabral DA, Tucker LB. Malignancies in children who initially present with rheumatic complaints. J Pediatr. 1999;134(1):53–7.
Acknowledgements
We thank the First Affiliated Hospital of Guangzhou Medical University for providing patient data, the Sun Yat-sen University Cancer Center for supplying the necessary instrumentation, and the MRC Biostatistics Unit of the University of Cambridge for guidance on statistical analysis and improvement of the predictive model. We also acknowledge the Guangdong Youth Excellent Scientific Research Talent International Training Plan for Doctoral Project for providing research funding.
Funding
This study was supported by the National Natural Science Foundation of China (No. 82302607) and the Zhong Nanshan Medical Foundation of Guangdong Province (ZNSXS-2021005).
Author information
Contributions
Zhangkai J. Cheng, Haiyang Li, and Mingtao Liu led this research, analyzed the data, and wrote the manuscript. Zhangkai J. Cheng prepared the figures and revised the manuscript. Haiyang Li provided guidance on research methods and reviewed the underlying theory. Xing Fu and Zhiman Liang organized the data and visualized part of it. Li Liu collected and screened the participant and healthy control data. Hui Gan and Baoqing Sun supervised the work, validated the manuscript, and secured funding. All authors contributed to the manuscript and approved its submission.
Ethics declarations
Ethics approval and consent to participate
This study received approval from the Scientific Research Project Reviews Ethics Committee of the First Affiliated Hospital of Guangzhou Medical University (Document 2021 No.K25). Informed consent was obtained from all participants; for participants under 10 years of age, consent was obtained from their parents or legal guardians.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Cheng, Z.J., Li, H., Liu, M. et al. Artificial intelligence reveals the predictions of hematological indexes in children with acute leukemia. BMC Cancer 24, 993 (2024). https://doi.org/10.1186/s12885-024-12646-3
DOI: https://doi.org/10.1186/s12885-024-12646-3