Characteristics of the study population
The study population included 357 patients presenting with non-clonal cytopenias (controls) and 168 patients with MDS. Univariate analysis of baseline characteristics revealed eight parameters associated with MDS diagnosis (Supplementary Table 1). MDS patients were slightly older than non-clonal cytopenic patients (78 [71–84] versus 71 years [62–80], p < 10⁻⁴) (Supplementary Table 1) and were more frequently male. The frequency of anemia was similar in both groups (93.4% versus 93.8%), despite slightly lower hemoglobin levels in MDS patients (9.6 [8.4–11.0] versus 10.5 g/dL [9.5–11.5], p = 0.0007) and a higher mean corpuscular volume (MCV) (94 [87–102] versus 87 fL [80–93], p < 10⁻⁴). Neutropenia and thrombocytopenia were more frequent in MDS patients than in patients with non-clonal cytopenia (45.8% versus 10.1% and 66.7% versus 18.5%, respectively), which was reflected by a lower ANC (1.9 [1.0–2.8] versus 5.5 [3.2–8.8], p < 10⁻⁴) and platelet count (113 [54–177] versus 438 [185–526], p < 10⁻⁴) (Supplementary Table 1 and Supplementary Fig. 1). As expected, mean platelet volume (MPV) was also significantly increased in MDS patients (11.4 [10.6–12.5] versus 9.8 [9.3–10.4], p < 10⁻⁴). Because of analytical interference related to the presence of macroplatelets, MPV was available in only 75% of MDS patients compared with 99% of non-clonal cytopenias, which is why we focused on the evaluation of IPF.
Performance of the MDS-CBC score and morphological parameters
Using the previously published threshold of 0.2, the MDS-CBC score was abnormal in 18.7% of non-clonal cytopenias and 94% of MDS patients (Supplementary Table 1 and Supplementary Fig. 1), confirming the high performance of this score even in such a selected cohort (suspicion of MDS). Performance was broadly similar across MDS subtypes, with slightly lower diagnostic power in MDS with multilineage dysplasia (MDS-MLD). We then evaluated the diagnostic performance of morphological parameters in the different MDS subtypes and according to the presence or absence of cytopenia. Ne-WX differed slightly across MDS subtypes (p = 0.005) but did not differ between high-grade and low-grade MDS (Fig. 1A). Interestingly, Ne-WX was consistently increased in MDS patients compared with non-clonal cytopenias, whether they were neutropenic or not (Fig. 1B). In contrast, MCV did not differ between MDS subtypes and was increased only in MDS patients with anemia (Fig. 1C and D). As mentioned earlier, because MPV was not available for all patients, we focused on IPF as a surrogate marker of the presence of dysplastic platelets. IPF did not differ significantly between MDS subtypes but, unlike MCV, was consistently increased in MDS patients whether they had thrombocytopenia or not (Fig. 1E and F).
Machine learning identification of the most contributive parameters for MDS diagnosis
Using Breiman’s random forests (RF) classification, we identified Ne-WX and IPF as the two most discriminatory predictors for MDS diagnosis (Fig. 2A), explaining 37% and 33% of diagnoses, respectively. By comparison, ANC and MCV contributed to only 18% and 6% of diagnoses, respectively. We then used the CARET (Classification And REgression Training) package to evaluate the effect of model tuning parameters on performance and to choose the “optimal” model across these parameters. A bootstrapping approach was used, splitting the cohort into five sub-cohorts, to cross-validate these models. We started with the three parameters of the MDS-CBC score and added the other parameters one by one to the RF models, in decreasing order of variable importance (VIMP). As expected, the RF using three parameters (Ne-WX, ANC and MCV) performed similarly to the MDS-CBC score (Fig. 2B, RF three parameters), with a sensitivity (Se) of 95% (94% for the MDS-CBC score) and a specificity (Sp) of 80% (81% for the MDS-CBC score). Adding IPF to the model increased Sp from 80% to 87% while maintaining a high Se of 94% (Fig. 2B, RF four parameters). Adding the platelet count to this model slightly increased Se to 96%, with Sp unchanged at 87% (Fig. 2B, RF five parameters). Unexpectedly, adding the hemoglobin level decreased performance, with Sp falling to 84% (Fig. 2B, RF six parameters).
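The five-way split used for cross-validation can be illustrated with a minimal sketch. It is written in Python for illustration only, whereas the analysis itself relied on the CARET package. Stratifying the folds by diagnosis, the fixed seed and the function name `stratified_folds` are assumptions of this sketch; only the group sizes (357 controls, 168 MDS patients) come from the text.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Partition sample indices into k folds with similar class balance."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Cohort composition from the text: 357 controls (0) and 168 MDS patients (1).
labels = [0] * 357 + [1] * 168
folds = stratified_folds(labels)

for i, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [j for j in range(len(labels)) if j not in held_out]
    n_mds = sum(labels[j] for j in test_idx)
    print(f"fold {i}: train={len(train_idx)}, test={len(test_idx)}, MDS in test={n_mds}")
```

Each fold serves once as the held-out set while the model is fitted on the other four, so every patient contributes to exactly one performance estimate.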
From artificial intelligence to routine practice: a two-step approach
We then used the CARET package to design CART (Classification And Regression Trees). To obtain “simplified” trees that could easily be implemented in the laboratory middleware, we introduced only the MDS-CBC score and IPF into the model. Two CART with similar performance were proposed, one with three levels of decision (data not shown) and one with a two-step algorithm (Fig. 3A). Strikingly, the MDS-CBC score threshold proposed by both algorithms for classification was 0.23, very close to the threshold of 0.2 published by Boutault et al. In the two-step algorithm, if the MDS-CBC score was less than or equal to 0.23, no additional testing or slide review was requested (311 patients, including thirteen MDS patients). When the MDS-CBC score was greater than 0.23, the machine-learning model proposed an IPF threshold of 3% to stratify patients (Fig. 3A). Sixty-four patients had an IPF below 3%, including only thirteen MDS patients, whereas 142 of the 150 patients with an IPF of 3% or higher were MDS patients (Fig. 3A). The Se of this model was 84.5% (95%CI: 78.3–89.2) and its Sp 97.8% (95%CI: 95.6–98.9), whereas the positive predictive value (PPV) and negative predictive value (NPV) were 94.7% (95%CI: 89.8–97.3) and 93.1% (95%CI: 90–95.2), respectively. Among the thirteen MDS patients with a score ≤ 0.23, ten had other criteria for slide review (analyzer flag, ANC < 1.5 × 10⁹/L, Hb < 80 g/L or platelets < 100 × 10⁹/L), as did eight of the thirteen MDS patients with a score > 0.23 and IPF < 3%. At the end of this diagnostic work-up, 349 non-MDS patients (97.8%) and 160 MDS patients (95.2%) were correctly classified (Fig. 3B). Since machine-learning models do not integrate medico-economic considerations, we wondered whether we could determine a score threshold beyond which IPF would provide no significant benefit, thus saving PLT-F reagent. A frequency histogram of diagnoses (Fig. 3C) showed that, among patients with an MDS-CBC score of 0.6 or higher (22% of the cohort), most (91%) were MDS patients. Considering that the sensitivity of the MDS-CBC score was excellent above the 0.6 threshold, we therefore proposed an alternative two-step algorithm, illustrated in Fig. 3D. If the MDS-CBC score was below 0.23, there was no indication for slide review for MDS suspicion. A score between 0.23 and 0.6 triggered PLT-F on the analyzer for measurement of IPF: if IPF was below 3%, there was no indication for slide review, whereas an IPF of 3% or higher required a slide review for suspicion of MDS. As previously mentioned, MDS-CBC scores of 0.6 or higher were highly predictive of MDS and required a slide review. This new algorithm, the “extended MDS-CBC (e-MDS-CBC) score”, had high sensitivity (88.7%, 95%CI: 83–92.6) and specificity (95.8%, 95%CI: 93–97.4), with a PPV of 90.8% (95%CI: 85.5–94.4) and an NPV of 94.7% (95%CI: 91.9–96.6), at a reasonable cost. Six MDS patients had a score between 0.23 and 0.6 and an IPF below 3%; none had a flag on the analyzer, but three had other slide review criteria. At the end of this diagnostic work-up, 342 non-MDS patients (95.8%) and 162 MDS patients (96.4%) were correctly classified (Fig. 3E).
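The two decision rules and the performance figures above can be sketched as follows. This is a Python illustration, not the middleware implementation. The function names, the `measure_ipf` callback and the handling of a score of exactly 0.23 are assumptions of the sketch. The text does not state how the confidence intervals were computed, so the Wilson score interval is also an assumption. The thresholds and patient counts are taken from the text.

```python
from math import sqrt

SCORE_LOW = 0.23   # CART-derived MDS-CBC threshold reported in the text
SCORE_HIGH = 0.6   # upper threshold of the extended algorithm
IPF_CUTOFF = 3.0   # IPF threshold, in percent


def two_step(score, ipf):
    """First algorithm: slide review only if score > 0.23 and IPF >= 3%."""
    return score > SCORE_LOW and ipf >= IPF_CUTOFF


def extended(score, measure_ipf):
    """Extended algorithm. measure_ipf is a hypothetical callback that runs
    the IPF measurement and returns IPF in percent; it is called only for
    scores in the intermediate zone, which is where the reagent is saved."""
    if score >= SCORE_HIGH:
        return True
    if score <= SCORE_LOW:  # assumption: 0.23 itself counts as low
        return False
    return measure_ipf() >= IPF_CUTOFF


def wilson(k, n, z=1.959964):
    """Wilson score interval for the proportion k/n (assumed CI method)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and predictive values from a 2x2 table."""
    return {
        "Se": tp / (tp + fn),
        "Sp": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


# First two-step algorithm, counts from the text: 142 of 150 flagged
# patients had MDS; 13 + 13 MDS patients were not flagged; 357 controls.
first = metrics(tp=142, fp=8, fn=26, tn=349)
print({name: round(100 * value, 1) for name, value in first.items()})
print([round(100 * bound, 1) for bound in wilson(142, 168)])
```

The point estimates computed from these counts agree with the reported Se, Sp, PPV and NPV of the first algorithm. Passing the IPF measurement as a callback makes explicit that, in the extended rule, it is requested only for scores between the two thresholds.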