In this study, we successfully validated two Bayesian models previously trained to estimate the likelihood of survival at two time points that are useful for orthopaedic surgical decision making. Importantly, despite differing patient populations and varying amounts of missing data, the BETS-3 and BETS-12 models accurately classified post-operative survival at clinically useful 3- and 12-month time points.
The models performed well, despite significant differences between patients in the training and validation sets (Tables 1 and 2). Scandinavian patients in the validation set were slightly older (median age, 67.0 years [total range 23.0-96.0; interquartile range, 58.0, 76.0]) than those in the training set (median age, 62.7 years [total range 20.0-92.0; interquartile range, 54.4, 72.2]) (P = 0.0002). They were also nearly twice as likely to be treated for a completed pathologic fracture, as opposed to undergoing prophylactic surgery for an impending pathologic fracture (P < 0.0001). This may explain the significantly lower proportion of these patients surviving longer than 12 months (P = 0.002). Nevertheless, there were significantly higher proportions of Scandinavian patients in the more favorable diagnosis group (Group 3; P = 0.001) and in the more favorable ECOG performance status categories (ECOG score 0, 1, and 2; P < 0.001). There was no significant difference in 3-month survival between patients in the validation set and those in the training set (P = 0.78). The distributions of visceral, lymph node, and skeletal metastases also differed between the two patient populations, but this may be largely due to the proportion of missing data in the validation set.
The performance of these models is clinically important because the inaccuracies generated by the models are not of equal significance. For example, BETS-3 was designed to identify patients who are likely to live at least 3 months and who would therefore derive some benefit from surgery. If survival is overestimated by BETS-3, and the patient does not live at least 3 months, then the surgery may have been unnecessary. Our data show that 15.3% of records were misclassified by BETS-3 with survival overestimated, which translates to 125 potentially unnecessary surgeries performed at the end of life (Figure 3). Of these, 38 (30.4%) survived less than 1 month, 44 (35.2%) survived between 1 and 2 months, and 43 (34.4%) survived between 2 and 3 months. Of course, these data do not distinguish patients who died of perioperative complications that might have been independent of the progression of disease, and in whom surgery was still the best option. Thus, surgery was still appropriate for many of the patients for whom survival was overestimated, and 15.3% represents the maximum proportion of patients who may otherwise have been spared surgery in the care of their terminal illness.
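The breakdown of the 125 overestimated cases can be checked with simple arithmetic. The following is a minimal sketch that recomputes each interval's share from the counts reported above; the variable names are illustrative and are not part of the BETS software.

```python
# Counts reported in the text: BETS-3 overestimates, grouped by actual survival.
overestimated_by_interval = {
    "<1 month": 38,
    "1-2 months": 44,
    "2-3 months": 43,
}

# The three intervals should account for every overestimated record.
total = sum(overestimated_by_interval.values())

# Share of the overestimated cases in each interval, as a percentage
# rounded to one decimal place.
shares = {
    interval: round(100 * count / total, 1)
    for interval, count in overestimated_by_interval.items()
}

for interval, pct in shares.items():
    count = overestimated_by_interval[interval]
    print(f"{interval}: {count} of {total} ({pct}%)")
```

The three shares sum to 100%, which is a quick consistency check on any such breakdown.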
In contrast, the BETS-12 model was designed to identify patients who are expected to live 12 months or longer. This was done in an effort to help support decisions regarding the type of procedure required, as well as the durability of the implant. For example, a surgeon’s decision to perform a less invasive procedure using a less durable implant such as an intramedullary nail is supported by a low likelihood of survival at 12 months generated by the BETS-12 model. If survival is underestimated, and actual patient survival exceeds 12 months, then the chosen construct may not have sufficient durability to outlast the patient. Our results suggest that survival may be underestimated by the BETS-12 model in 7.6% of records, which are misclassified in this fashion. Clinically, this represents a maximum of 62 cases at risk for implant failure that may ultimately need revision surgery (Figure 4). However, the median survival for this group of misclassified patients was 18 months [total range 12.0-73.0; interquartile range 13.8, 25.3], with 17 patients surviving longer than 24 months and only 5 surviving longer than 36 months. As such, relatively few of the patients in whom the BETS-12 model underestimated survival may actually have required revision surgery for implant failure.
Clinicians have long been interested in estimating and modeling survival in patients with metastatic cancer. For example, Bauer and Wedin evaluated survival after orthopaedic stabilization in 241 patients with skeletal metastases. They found that 7 variables were independently associated with survival. Negatively associated prognostic variables included pathologic fracture, visceral or brain metastases, and a diagnosis of lung cancer, whereas positively associated variables included solitary skeletal metastases and diagnoses of lymphoma, myeloma, breast, or kidney carcinoma. Later, after retrospectively analyzing the records of 460 similar patients, the same group identified hemoglobin concentration as another negative prognosticator and discriminator of short-term survival. Their work demonstrated that it was possible to make generalized estimations of survival based on disease-related and laboratory parameters; however, an accurate, individualized estimation of survival in this patient population was not possible using this method.
In an attempt to generate a prognostic tool useful for surgical decision-making, Tokuhashi et al. developed a scoring system by which survival could be categorized into one of three groups: <6 months, >6 months, or >1 year. Focusing on only patients with symptomatic spine metastases, the authors collected a series of variables including, for the first time, Karnofsky performance status. Other variables included were the number of extra- and intraspinal bone metastases, the number and type (resectable/nonresectable) of organ metastases, the primary oncologic diagnosis, and the degree of neurologic impairment. The group later applied their scoring system to 246 patients and found that survival greater or less than 6 months could be reliably estimated using this method. Independent validation produced similar results; however, this scoring system applies only to patients with symptomatic spine metastases.
Recognizing the value of a prognostic model that could be applied to all patients with skeletal metastases, Nathan et al. evaluated 191 patients undergoing orthopaedic stabilization for both spine and extremity lesions. In addition to demographics, disease-specific information, and performance status, Nathan et al. also included a series of laboratory parameters as candidate variables. A regression-derived nomogram was developed using eight independent predictors of survival. This nomogram performed well in a small test set, but, to our knowledge, no external validation has been attempted.
We chose to use a Bayesian classifier for a variety of reasons. First, we assumed that there are, in the setting of patients with skeletal metastases, verifiable relationships between various prognostic features. The Bayesian method not only generates a joint distribution function describing the probabilistic relationships between features, but it also displays it graphically in an intuitive, transparent manner. This allows the clinician to better understand the hierarchy, and relative importance, of each feature (Figures 1 and 2) within each model. Second, Bayesian networks can account effectively for uncertainty within the data, and can thus be used in the setting of incomplete or missing input data. This is a significant advantage over the traditional nomogram, when one considers that three of the first- and second-degree associates of survival (the surgeon’s estimate of survival, the absolute lymphocyte count, and the presence of lymph node metastases) were largely missing from the validation set. Third, and more importantly, the Bayesian method mimics human reasoning by updating beliefs in response to new evidence. Thus, Bayesian models can be “improved” from time to time as new evidence becomes available, be it emerging patterns of disease or more effective treatment modalities. We acknowledge, however, that additional, prospective data collection is required to fulfill this goal, and we are committed to this ongoing investigation.
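The two properties described above, tolerance of missing inputs and belief updating as evidence arrives, can be illustrated with a toy example. The sketch below is not the BETS-3 or BETS-12 model: it assumes the simplest possible network (each feature depends only on the outcome), uses two made-up binary features, and every probability in it is hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional

# HYPOTHETICAL prior probability of survival; not taken from the BETS models.
PRIOR_SURVIVE = 0.5

# HYPOTHETICAL conditional probabilities P(feature value | outcome)
# for two illustrative binary features.
LIKELIHOODS = {
    "completed_fracture": {
        True: {"survive": 0.4, "die": 0.7},
        False: {"survive": 0.6, "die": 0.3},
    },
    "visceral_metastases": {
        True: {"survive": 0.3, "die": 0.6},
        False: {"survive": 0.7, "die": 0.4},
    },
}


def posterior_survival(evidence: Dict[str, Optional[bool]]) -> float:
    """Return P(survive | observed evidence) for the toy network.

    A feature whose value is None is treated as missing. In this simple
    structure, summing over an unobserved feature's values contributes a
    factor of 1, so missing features are skipped.
    """
    p_survive = PRIOR_SURVIVE
    p_die = 1.0 - PRIOR_SURVIVE
    for feature, value in evidence.items():
        if value is None:
            continue  # missing input: marginalized out
        p_survive *= LIKELIHOODS[feature][value]["survive"]
        p_die *= LIKELIHOODS[feature][value]["die"]
    # Normalize so the two outcome probabilities sum to 1.
    return p_survive / (p_survive + p_die)


# Belief is updated as each piece of evidence becomes available.
no_evidence = posterior_survival({})
one_feature = posterior_survival(
    {"completed_fracture": True, "visceral_metastases": None}
)
both_features = posterior_survival(
    {"completed_fracture": True, "visceral_metastases": True}
)
print(round(no_evidence, 3), round(one_feature, 3), round(both_features, 3))
```

With no evidence the function returns the prior; each observed feature then shifts the estimate, and a missing feature leaves it unchanged rather than preventing a prediction. A learned network with dependencies between features, as in Figures 1 and 2, would require a general inference routine rather than this shortcut.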
The BETS models discussed in this paper are clinical decision support models; their output is designed to support (not replace) good clinical judgment. The goals of surgery in patients with skeletal metastases are to relieve pain and to restore function for the maximum amount of time. Because surgery intended to relieve pain or stabilize pathologic fractures is often indicated in patients despite a very short life expectancy, a low probability of survival generated by the BETS-3 model should not be used to deny patients a palliative intervention. Conversely, if a less invasive, less durable intervention is planned, low probabilities of survival generated by the BETS-3 and BETS-12 models would support this decision.
This study has several limitations. First, the BETS models were developed and validated using only patients who underwent orthopaedic surgery for their skeletal metastases. Thus, they are not applicable to all patients with metastatic disease or those in whom skeletal metastases were treated nonoperatively. Second, the Scandinavian patient population used for validation was well characterized and relatively homogeneous, but the generalizability of these models depends on their performance in a variety of patient populations with differing institutional biases and treatment philosophies. Third, we believe that there is always room for model improvement, particularly when longer survival estimates are needed. Additionally, the current models are relatively optimistic, and additional covariates should be sought to help identify which patients may die earlier than expected as well as to better identify patients at risk for perioperative death. A prospective trial is currently under way to evaluate new prognostic features that may help estimate the likelihood of individual patient survival at these and other time points. Finally, the acceptance of clinical decision-support tools such as these depends not only on validation in additional populations, but also on how end users judge their availability and ease of use. It is difficult, if not impossible, to represent this classifier on paper so that other researchers may use it. To address this problem, we developed an “app” that will make this tool widely available for such a purpose.