Postoperative nomogram to predict the probability of metastasis in Enneking stage IIB extremity osteosarcoma

Background: Metastasis is the most crucial prognostic factor in osteosarcoma. The goal of this study was to develop a new nomogram to predict the probability of metastasis in Enneking stage IIB extremity osteosarcoma after neoadjuvant chemotherapy and limb salvage surgery. Methods: We examined the medical records of 91 patients who had undergone surgery between March 1994 and March 2007. A nomogram was developed using multivariate logistic regression. The nomogram was validated internally by bootstrapping (200 repetitions) and externally in an independent validation set (n = 34). A Youden-derived cutoff value was assigned to the nomogram to predict dichotomous outcomes for metastasis. Results: The nomogram was built from four predictors (tumor site, serum alkaline phosphatase, intracapsular extension, and Huvos grade), with an additional clause that the cutoff value should be added to the total points in cases of incomplete surgical resection. The P value of the Hosmer-Lemeshow goodness-of-fit test for this model was 0.649. Area under the receiver operating characteristic curve values of 0.83 (95% confidence interval [CI], 0.75 to 0.92) in the training set and 0.80 (95% CI, 0.63 to 0.96) in the validation set were obtained. The accuracy of dichotomous outcomes was 79.1% (95% CI, 0.69 to 0.86) and 82.4% (95% CI, 0.63 to 0.92) in the training and validation sets, respectively. Conclusions: We have developed a new high-performance nomogram to predict the probability of metastasis in Enneking stage IIB extremity osteosarcoma after limb salvage surgery.


Background
Although osteosarcoma is a rare disease, it is the most common primary malignant bone tumor. Prior to 1970, the oncologic outcomes of osteosarcoma were extremely poor, with only a 10-20% overall survival rate despite aggressive surgery. The overall survival rates of osteosarcoma have dramatically increased to approximately 65-75% with the establishment of multidisciplinary treatments [1]. The Enneking staging system and the American Joint Committee on Cancer (AJCC) staging system are used to classify osteosarcoma according to prognosis, primarily based on histologic grade and metastasis at diagnosis [2,3]. In addition to the factors used for clinical staging, many other clinical factors have been reported to be prognostic factors for osteosarcoma, such as age [4], tumor location [5][6][7], serum markers such as alkaline phosphatase (ALP) [8] and lactate dehydrogenase (LDH) [9], pathologic fracture [10], histologic type [11], and histologic response to neoadjuvant chemotherapy [12]. Molecular markers of prognosis in osteosarcoma have also been reported, including ezrin, chemokine receptor 4, and P-glycoprotein [13]. Because no single factor can accurately predict prognosis, statistical prediction models that integrate the cumulative effects of individual prognostic factors are required for more precise prognosis predictions. Nomograms have been proposed as a new alternative to traditional staging systems for predicting prognosis in a variety of cancers [14]. A few nomograms have been reported for soft tissue sarcoma [15,16] and osteosarcoma [17].
Although the multidisciplinary approach has dramatically improved survival in osteosarcoma, the presence of metastasis makes this a challenging disease to cure, as survival rates of osteosarcoma with metastasis are approximately 20% [18]. On the other hand, osteosarcoma without metastasis can be cured, and most osteosarcoma patients without metastasis live a long and healthy life. Therefore, the accurate prediction of an individual patient's probability of metastasis is important. The purpose of this study was to develop a new nomogram to predict the probability of metastasis in Enneking stage IIB extremity osteosarcomas, which account for the majority of osteosarcoma cases.

Patients
We searched and retrospectively reviewed the medical records of Enneking stage IIB extremity osteosarcoma patients who had undergone surgery between March 1994 and March 2007 (cohort 1) at Severance Hospital (Seoul, Korea). This study was conducted under a protocol approved by the Severance Hospital Institutional Review Board. We restricted the inclusion criteria for the training set to patients who had undergone standard therapy (neoadjuvant chemotherapy, definitive surgery, and adjuvant chemotherapy) and limb salvage surgery performed by the same surgeon. Of the 140 patients identified, 108 patients were enrolled in the study. Of the 108 patients, 91 and 17 patients were included in the training and validation sets, respectively, according to the inclusion criteria. An additional 17 patients who had undergone surgery between April 2007 and July 2011 (cohort 2) at Severance Hospital were included in the validation set (Figure 1). The clinical characteristics of the training and validation sets are listed in Table 1. The overall 5-year survival rate of the training set was 70.3%. The proportions of patients with metastasis in the training and validation sets were 37.4% and 50%, respectively. Because the follow-up period of cohort 2 (with the longest follow-up period of 7 years) was much shorter than that of cohort 1 (with the longest follow-up period of 19 years), fewer patients with 5-year continuously disease-free (CDF) status after definitive surgery and 5-year no evidence of disease (NED) status after last metastasectomy were enrolled in cohort 2 than in cohort 1, which led to a considerable difference in the proportions of patients with metastasis.
No patients received radiation therapy at the primary tumor site. Only seven patients in the training set received palliative radiation therapy to metastatic lesions. All the patients received neoadjuvant chemotherapy. Sixty-five patients were treated with a doublet of intra-arterial cisplatin (DDP) and doxorubicin (ADR), and fifty patients were treated with a triplet of intra-arterial DDP, ADR, and ifosfamide (Ifos). Ten patients were treated with other regimens: five patients with ADR and intravenous DDP; four patients with ADR, intravenous DDP, and methotrexate (MTX); and one patient with VP-16, Ifos, and MTX. Huvos grade, disease-free survival, and overall survival were not significantly different between doublet and triplet regimens in our cohorts [19].

Developing the nomogram
We identified candidate predictors of metastasis using the χ² test and performed multivariate analysis of a variety of suggested candidates (Table 2). Among these candidates, we chose the parameters for a nomogram that were statistically significant and developed a weighted nomogram. The association between these parameters and metastasis was evaluated using multivariate logistic regression analysis. A nomogram was developed on the basis of the multivariate logistic regression model using tumor site, ALP at diagnosis, intracapsular extension, and Huvos grade. The goodness-of-fit of the nomogram was calculated using the Hosmer-Lemeshow test.
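As a rough illustration of how a logistic regression model with binary predictors can be turned into nomogram points, the sketch below scales each coefficient so that the strongest predictor is worth 100 points, which is a common nomogram convention. The coefficient and intercept values are hypothetical placeholders chosen only for illustration; they are not the values fitted in this study, and the function names are likewise assumptions.

```python
import math

# Hypothetical log-odds coefficients for the "poor" level of each predictor.
# Illustrative values only, NOT the coefficients fitted in the study.
COEFS = {
    "tumor_site": 1.2,
    "alp": 0.9,
    "intracapsular_extension": 1.5,
    "huvos_grade": 1.1,
}
INTERCEPT = -2.0  # hypothetical intercept


def points_table(coefs, max_points=100.0):
    """Scale coefficients so the largest one is worth max_points."""
    largest = max(abs(b) for b in coefs.values())
    return {name: max_points * b / largest for name, b in coefs.items()}


def total_points(poor_flags, coefs=COEFS):
    """Sum the points of every predictor flagged as poor prognosis."""
    table = points_table(coefs)
    return sum(table[name] for name, poor in poor_flags.items() if poor)


def predicted_probability(poor_flags, coefs=COEFS, intercept=INTERCEPT):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum of coefficients)))."""
    logit = intercept + sum(coefs[n] for n, poor in poor_flags.items() if poor)
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    patient = {"tumor_site": True, "alp": False,
               "intracapsular_extension": True, "huvos_grade": False}
    print(total_points(patient), predicted_probability(patient))
```

Because points are a linear rescaling of the coefficients, the total points and the predicted probability rank patients identically; the points axis is simply easier to read at the bedside.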

Definitions of the parameters for each predictor in the nomogram
The parameters of all predictors were divided into two prognosis groups, good or poor.

Tumor site
Tumors located along the distal femur, proximal tibia, and proximal humerus were regarded as the good prognosis group and those at other locations were regarded as the poor prognosis group. In addition, tumors along the distal femur, proximal tibia, and proximal humerus whose longitudinal size exceeded the isthmus of the affected bone (more than half the entire length of the affected bone) were categorized in the poor prognosis group.

Intracapsular extension
Intracapsular extension was regarded as the poor prognosis group. Intracapsular extension of the tumor was defined not only as direct penetration of the articular cartilage but also as the involvement of intracapsular and extrasynovial structures. Diagnosis of intracapsular extension by MRI, whether positive or negative, was confirmed by gross pathology.

Serum ALP levels at diagnosis
Normal level of alkaline phosphatase (ALP) was regarded as the good prognosis group. The serum ALP levels were measured in international units (IU), and the activity of ALP was estimated by the p-nitrophenyl phosphate method. ALP ranges of 60.0-300.0 IU/L for patients ≤14 years and 38.0-115.5 IU/L for patients > 15 years were considered normal.
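The ALP grouping above can be written as a small helper. This is a minimal sketch under two stated assumptions: the text does not say which range applies to patients between 14 and 15 years, so the sketch applies the adult range to anyone older than 14; and it treats any value outside the normal range (including values below it) as the poor prognosis group. The function names are hypothetical.

```python
def alp_is_normal(alp_iu_per_l, age_years):
    """Return True if serum ALP falls inside the age-specific normal range."""
    if age_years <= 14:
        low, high = 60.0, 300.0   # range stated for patients <=14 years
    else:
        # Assumption: adult range used for everyone older than 14 years.
        low, high = 38.0, 115.5
    return low <= alp_iu_per_l <= high


def alp_prognosis_group(alp_iu_per_l, age_years):
    """Map ALP at diagnosis to the nomogram's good/poor prognosis group."""
    # Assumption: any value outside the normal range counts as "poor".
    return "good" if alp_is_normal(alp_iu_per_l, age_years) else "poor"


if __name__ == "__main__":
    print(alp_prognosis_group(250.0, 10))  # inside the pediatric range
    print(alp_prognosis_group(250.0, 20))  # above the adult range
```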

Response to neoadjuvant chemotherapy
Responses to neoadjuvant chemotherapy were graded on the basis of the amount of tumor necrosis in the resected specimen. More than 90% tumor necrosis was regarded as a good response; a cut-off of 90% tumor necrosis is usually used to distinguish good and poor responders. Good response was categorized in the good prognosis group.

Surgical resection
Surgical resection was assessed by the pathologic resection margin rather than the surgical margin. A tumor-free margin (R0) was defined as complete surgical resection, while microscopically positive (R1) and macroscopically positive (R2) margins were defined as incomplete surgical resection. Complete surgical resection was regarded as the good prognosis group.

Statistical analysis
The performance of our nomogram was evaluated internally and externally for discrimination and calibration. Discrimination was evaluated by the area under receiver operating characteristic curve (AUC) for both the training set (N = 91) and the external validation set (N = 34). A 95% confidence interval (CI) was calculated for each AUC. Calibration plots were obtained from bootstrapping (200 repetitions) of the training and validation sets.
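To make the discrimination measure concrete, the sketch below computes the AUC as the probability that a randomly chosen patient with metastasis receives a higher score than a randomly chosen patient without, and attaches a percentile bootstrap interval with 200 resamples. The percentile interval is one common choice and is an assumption here: the text does not state how its confidence intervals were computed, and the bootstrap in the study was used for internal validation and calibration. The toy data and function names are illustrative only.

```python
import random


def auc(scores, labels):
    """AUC via pairwise comparison; ties between classes count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single class
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[round((alpha / 2) * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Toy predicted probabilities and outcomes, not study data.
    toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
    toy_labels = [0, 0, 1, 1, 0, 1]
    print(auc(toy_scores, toy_labels))
    print(bootstrap_auc_ci(toy_scores, toy_labels))
```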
To improve the clinical practicality of the nomogram, we assigned a cutoff value, derived from the Youden index, to the nomogram to allow for the prediction of dichotomous outcomes for metastasis. Nomogram performance in predicting dichotomous outcomes was also evaluated in the training and validation sets by two-way contingency table analysis. A 95% CI was calculated for each indicator.
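The Youden index is J = sensitivity + specificity − 1, and the Youden-derived cutoff is the threshold that maximizes J. The sketch below searches every observed score as a candidate threshold; the toy point totals and the convention that a score at or above the cutoff is predicted positive are assumptions for illustration.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A case is predicted positive when its score is >= cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # keep the lowest cutoff among ties
            best_cut, best_j = cut, j
    return best_cut, best_j


if __name__ == "__main__":
    # Toy nomogram point totals and outcomes, not study data.
    print(youden_cutoff([20, 50, 130, 160], [0, 0, 1, 1]))
```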
All statistical analyses were performed using SPSS (version 20.0, SPSS, Inc., Chicago, IL, USA), SAS (version 9.2, SAS Institute Inc., Cary, NC, USA), and R (version 2.9.1, The R Foundation for Statistical Computing, Vienna, Austria). All P values were two-tailed, and a P value < 0.05 was considered significant.

Nomogram development and validation
Six factors (tumor site, ALP level at diagnosis, intracapsular extension, Huvos grade, histologic type, and surgical resection) were identified as prognostic factors for metastasis (Table 2). The odds ratios for metastasis were calculated for these factors and are shown in Table 3. The odds ratio of surgical resection could not be computed because all the cases with incomplete surgical resection developed metastasis. Huvos grade and histologic type were strongly correlated and confounded the multivariate analysis. Therefore, surgical resection and histologic type were excluded from the prediction model. On the basis of multivariate logistic regression analysis, we built a nomogram using tumor site, ALP level at diagnosis, intracapsular extension, and Huvos grade as the predictors (Figure 2A). The P value of the Hosmer-Lemeshow test for the prediction model was 0.65, which indicated a good statistical fit of the model. AUC values of 0.83 (95% CI, 0.75 to 0.92) and 0.80 (95% CI, 0.63 to 0.96) were obtained in the training and validation sets, respectively (Figure 2B and C). The calibration plots for the training and validation sets are shown in Figure 2D and E, respectively. The bootstrap-corrected AUC was 0.81. There was no significant difference among the three AUC values, which suggested that the discrimination of the nomogram could be reproducible in other populations. The calibration plots showed that the nomogram-predicted probabilities were slightly lower than the actual probabilities.

Cutoff value for dichotomous outcomes
Nomograms show the probability of metastasis as a percentage; however, dichotomous outcomes for metastasis are likely to be a more user-friendly option in practice. Therefore, we assigned a Youden-derived cutoff value to the nomogram. The cutoff value was a total of 123 points, which was equal to a predicted probability of 0.36. The combined score of the two poor prognosis parameters with the lowest scores was more than the cutoff value. Therefore, the dichotomous decision for metastasis is positive whenever any two of the four predictors are classified in the poor prognosis group.
The relative risk comparisons for the predictors showed that surgical resection was a very strong prognostic factor (Table 3). However, surgical resection had to be excluded from the nomogram for statistical reasons because all six cases with an incomplete surgical margin showed metastasis. The odds of metastasis are calculated as the probability of metastasis/(1 − the probability of metastasis). For these cases, the probability of metastasis would be 100% and the denominator would be zero, so the odds, and hence the odds ratio, could not be calculated. To overcome this problem, we imposed an additional clause on the nomogram that the cutoff value should be added to the total points in cases of incomplete surgical resection. Consequently, all the cases with an incomplete resection margin were always metastasis positive in the dichotomous outcomes.
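The dichotomous rule described above can be sketched as a small function. It encodes two statements from the text: any two poor-group predictors exceed the cutoff, and incomplete resection always yields a positive decision. It also assumes that a single poor-group predictor stays below the 123-point cutoff, which the text implies but does not state explicitly. The predictor keys and function name are hypothetical.

```python
PREDICTORS = ("tumor_site", "alp", "intracapsular_extension", "huvos_grade")


def predict_metastasis(poor_flags, incomplete_resection=False):
    """Dichotomous decision: True means metastasis is predicted.

    poor_flags maps each predictor name to True when the patient falls in
    the poor prognosis group for that predictor.
    """
    if incomplete_resection:
        # The cutoff value is added to the total, so the result is positive.
        return True
    n_poor = sum(1 for name in PREDICTORS if poor_flags.get(name, False))
    # Assumption: one poor predictor alone does not reach the cutoff.
    return n_poor >= 2


if __name__ == "__main__":
    print(predict_metastasis({"alp": True, "huvos_grade": True}))
    print(predict_metastasis({"alp": True}))
    print(predict_metastasis({}, incomplete_resection=True))
```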
The performance of the nomogram in predicting dichotomous outcomes for metastasis was validated by two-way contingency table analysis (Table 4). The accuracy of the nomogram in predicting dichotomous outcomes for metastasis was 79.1% (95% CI, 0.69 to 0.86) in the training set and 82.4% (95% CI, 0.63 to 0.92) in the validation set. Although the nomogram-predicted probabilities were lower than the actual probabilities, dichotomous outcomes showed only a few false negatives in both sets and high negative predictive values in the training set (88.0%; 95% CI, 0.79 to 0.95) and validation set (77.8%; 95% CI, 0.60 to 0.87), which implies that the cutoff value remained effective even when the probabilities were underestimated. These results suggested that the performance of dichotomous outcomes could be generalizable to other populations. The introduction of a cutoff value to the nomogram was advantageous on three counts: it increased clinical convenience and practicality, allowed the integration of surgical resection into the nomogram, and compensated for the underestimation of actual probabilities.
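For readers who want to see how accuracy and negative predictive value follow from a two-way contingency table, the sketch below computes them together with a Wilson score interval. The cell counts are hypothetical: they were chosen only to be arithmetically consistent with a 91-patient set, 79.1% accuracy, and 88.0% negative predictive value, and are not taken from Table 4. The Wilson interval is likewise an assumed method, since the text does not state how its intervals were derived.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def summarize(tp, fp, fn, tn):
    """Standard indicators from a 2x2 table of predicted vs actual outcome."""
    n = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts, NOT the cells of Table 4.
    tp, fp, fn, tn = 28, 13, 6, 44
    print(summarize(tp, fp, fn, tn))
    print(wilson_ci(tp + tn, tp + fp + fn + tn))  # interval for accuracy
```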

Discussion
To construct a nomogram with better performance, it is advantageous to use a large training set and many prognostic factors with strong correlations to an event. On the other hand, including too many predictors relative to the size of the training set, or using overly complicated parameters for the predictors, is likely to result in an overfitted prediction model. Osteosarcoma is a rare disease and only a few well-validated prognostic factors for metastasis have been identified, which makes a prediction model prone to overfitting. To overcome this and increase the statistical simplicity of the nomogram, we limited the number of predictors used to build the nomogram according to the guidelines of Harrell [14]. In addition, we divided the parameters of all predictors into only two prognosis groups, good or poor. Whether the performance of the nomogram is reproducible in other populations is more important than overfitting. We validated the reproducibility of our nomogram in an external validation set, which was heterogeneous to the training set with respect to surgeon factor and surgery type (limb salvage or amputation). The validation results suggested that our nomogram could be generalizable to other patient populations, including populations with amputation rather than limb salvage surgery.
It has been a general consensus that the prognosis of osteosarcoma with axial and proximal locations is poorer than that of osteosarcoma with distal locations [5,12]. However, the prognosis of osteosarcoma with proximal humeral locations is controversial [6,7]. Because the results of our study were similar to those reported by Meyers et al., osteosarcomas with a proximal humeral location were classed in the good prognosis group in our nomogram.
Although the effective cutoff range is still uncertain, tumor size has been reported as a definitive prognostic factor in osteosarcoma [20,21]. Although the cutoff of 8 cm in maximal tumor diameter was not a prognostic factor for metastasis in our study, we integrated tumor size into our nomogram for clinical considerations. We integrated the effect of large tumor size into tumor site by defining large tumors exceeding the isthmus of the affected bone (more than half of the entire length of the affected bone) as the poor prognosis group, as one would expect such a large tumor to show a poor prognosis. As a result, very large tumors were classified in the poor prognosis group regardless of their primary location.
Tumor invasion of the joints with direct penetration through the articular cartilage is expected to be rare in osteosarcoma because articular cartilage acts as a strong barrier to tumor invasion. However, it has been reported that intracapsular and extrasynovial involvements are common in osteosarcoma [22,23]. Tumors can extend under the joint capsule and make contact with the peripheral margin of the articular cartilage. In the case of knee joints, tumors can also extend through or around the osseotendinous junction of the cruciate ligaments. We defined intracapsular extension of the tumor as extension into the intracapsular and extrasynovial structures as well as penetration through the articular cartilage by tumors. The use of MRI to identify intracapsular extension is limited because its high sensitivity makes it difficult to distinguish peritumorous inflammatory changes and edema from the tumor itself, which results in false positives [24]. To overcome this, we confirmed intracapsular extension by both MRI and gross pathology.
Complete surgical resection of the tumor has also been regarded as a definitive prognostic factor of osteosarcoma. However, it may be questionable to assign a cutoff value for incomplete surgical resection because the strength of the association between incomplete surgical resection and metastasis has not been proven quantitatively. Inadequate surgical margin (marginal and intralesional margin) had a relative risk of approximately 1.4 for event-free survival or metastasis when compared to adequate surgical margin (radical and wide margin) [25,26]. On the basis of these data, the importance of incomplete surgical resection is likely to be highly underestimated if it is not taken into consideration that residual tumor is not retained in all marginal margins. In fact, osteosarcoma with incomplete surgical resection that left macroscopic residual tumor showed a 5-year survival rate of only 15% and a relative risk for overall survival of 3.60 in the multivariate analysis when compared to complete surgical resection, which was higher than the relative risk of metastasis at presentation [12]. We obtained similar results in our study, although all the incomplete surgical resection cases in our study were microscopically margin positive.
As survival rates of osteosarcoma increase, the prognoses of individual patients become of greater interest. The AJCC and Enneking staging systems have been used to classify prognostic groups after initial assessments. However, high-grade osteosarcoma shows such a heterogeneous clinical course during treatment that the prognoses of individual osteosarcomas may vary widely, even if their initial stages, according to the AJCC classification or Enneking system, are the same. Therefore, a nomogram may be useful in the management of osteosarcoma to realize personalized prognoses. Survival rates of osteosarcoma with metastasis are approximately 20%, and early detection and aggressive metastasectomy should be considered to increase survival rates of patients with metastasis [18]. Accordingly, distinguishing patients at high risk for metastasis according to the nomogram and swift management of metastatic lesions may contribute to improvement in survival rates for osteosarcoma.
Our nomogram had several limitations. First, our training set was relatively small and consisted predominantly of Asian patients. In addition, our validation set was quite small and showed a higher proportion of patients with metastasis than that of natural populations, as a considerable number of patients with CDF and NED status at less than 5 years were excluded from cohort 2 due to a short follow-up period. The generalizability of our nomogram should be validated in larger populations with a natural proportion of patients with metastasis. Second, our nomogram underestimated actual probabilities presented as percentages. To avoid inaccurate predictions, dichotomous outcomes should be considered because they were less affected by underestimation. Third, the predictors used to construct our nomogram were confined to clinical factors and could not include molecular markers. Fourth, our nomogram cannot predict the time when metastasis occurs because it was based on logistic regression and not Cox regression. A positive dichotomous decision for metastasis without any indication of time of occurrence may be unnerving to patients and doctors.

Conclusions
We have developed a new postoperative nomogram with high performance and generalizability to predict the probability of metastasis in Enneking stage IIB extremity osteosarcoma. Development of this nomogram will contribute greatly to individualized risk assessments for metastasis in osteosarcoma.