Non-linear transformations of age at diagnosis, tumor size, and number of positive lymph nodes in prediction of clinical outcome in breast cancer
BMC Cancer volume 18, Article number: 1226 (2018)
Prognostic factors in breast cancer are often measured on a continuous scale, but categorized for clinical decision-making. The primary aim of this study was to evaluate if accounting for continuous non-linear effects of the three factors age at diagnosis, tumor size, and number of positive lymph nodes improves prognostication. These factors will most likely be included in the management of breast cancer patients also in the future, after an expected implementation of gene expression profiling for adjuvant treatment decision-making.
A total of 4477 and 1132 women with primary breast cancer constituted the derivation and validation sets, respectively. Potential non-linear effects of the three factors on the log hazard of distant recurrence were evaluated during 10 years of follow-up. Cox models of successively increasing complexity were used: dichotomized predictors, predictors categorized into three or four groups, and predictors transformed using fractional polynomials (FPs) or restricted cubic splines (RCS). Predictive performance was evaluated by Harrell’s C-index.
Using FP-transformations, non-linear effects were detected for tumor size and number of positive lymph nodes in univariable analyses. For age, non-linear transformations did, however, not improve the model fit significantly compared to the linear identity transformation. As expected, the C-index increased with increasing model complexity for multivariable models including the three factors. By allowing more than one cut-point per factor, the C-index increased from 0.628 to 0.674. The additional gain, as measured by the C-index, when using FP- or RCS-transformations was modest (0.695 and 0.696, respectively). The corresponding C-indices for these four models in the validation set, based on the same transformations and parameter estimates from the derivation set, were 0.675, 0.700, 0.706, and 0.701.
Categorization of each factor into three to four groups was found to improve prognostication compared to dichotomization. The additional gain by allowing continuous non-linear effects modeled by FPs or RCS was modest. However, the continuous nature of these transformations has the advantage of making it possible to form risk groups of any size.
Prognostic and treatment predictive factors in breast cancer (e.g. number of positive lymph nodes, age at diagnosis, tumor size, estrogen receptor (ER) and progesterone receptor (PgR), histological grade, and human epidermal growth factor receptor type 2 (HER2)) can predict clinical outcome and hence facilitate treatment choice [1, 2]. These factors can either be used individually or combined in indices such as the Nottingham Prognostic Index, Adjuvant! Online, CancerMath.net (http://cancer.lifemath.net), or the St Gallen subtypes. Prognostic factors are often continuous or measured on an integer-valued scale, but categorized for clinical decision-making. This application of prognostic factors in breast cancer has a long history dating back to the invention of the TNM classification system. Categorization of prognostic factors is intuitively appealing, since the clinically relevant question is often to select between a limited number of treatment modalities, but categorization of individual factors is not necessary for construction of useful prediction models. On the contrary, numerous authors have discussed its negative consequences [7,8,9]. Categorization will in general lead to loss of information and hence lower power to detect true associations with prognosis and/or treatment response. Using dichotomized factors in prognostic models corresponds to assuming threshold effects, and such effects are often biologically implausible. The use of multiple cut-points per factor, e.g. T0 to T3 for tumor size in the TNM system, is a step in the right direction, but how should cut-points be chosen for new prognostic factors? Optimal cut-offs, maximizing the prognostic value of a new factor in a specific dataset, will in general lead to biased effect estimates, even though methods have been designed to deal with this problem.
To avoid bias, pre-defined percentile-based cut-offs can be used, but different percentiles might be prognostically useful for different factors.
In survival analysis, the most commonly used model for analysis of multiple prognostic markers is the Cox proportional hazards regression model. In its simplest form, this model assumes constant, i.e. time independent, linear covariate effects on the log hazard scale or equivalently multiplicative effects on the hazard scale (proportional hazards). The log hazard is hence assumed to increase or decrease with the same constant additive factor for each step on the scale of the covariate, e.g. for each year of age at diagnosis of breast cancer. One way of relaxing this strong and often biologically unrealistic assumption of linear covariate effects is to use fractional polynomial (FP) transformations [11,12,13,14]. Transformations of this kind are useful when one wishes to preserve the continuous nature of the covariates in a regression model, but suspects that some of the effects may be non-linear. By taking non-linearity into account, more prognostic information will be extracted, which might have important clinical implications. A limited number of studies have addressed this question. Sauerbrei et al. have evaluated the use of FP transformations in Cox modelling of recurrence-free survival in a lymph node-positive breast cancer data set from the German Breast Cancer Study Group. They conclude that analysis using FP transformations can extract important prognostic information which the traditional approaches may miss. More recently, Ejlertsen and co-workers have used FP transformations of age, tumor size, number of positive lymph nodes, and percentage of ER-positive nuclei, when developing a model for prediction of excess mortality after adjuvant endocrine therapy. Compared to models with categorized predictors, models with FP transformations could better identify patients without excess mortality compared to the general population.
Another frequently used option is to model potential non-linear covariate effects on outcome using restricted cubic splines (RCS) [17, 18].
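The linearity assumption discussed above can be written out explicitly. The following is a sketch for a single covariate x, using generic notation (h_0 for the baseline hazard, f for an unspecified smooth function) that is introduced here for illustration and is not taken from the paper.

```latex
% Cox model with one covariate x and an unspecified baseline hazard h_0(t)
h(t \mid x) = h_0(t)\,\exp(\beta x)

% Linearity on the log hazard scale: every one-unit step in x adds the same \beta
\log h(t \mid x + 1) - \log h(t \mid x) = \beta

% Relaxing linearity: replace \beta x by a smooth function f(x),
% for example an FP or an RCS in x
h(t \mid x) = h_0(t)\,\exp\{f(x)\}
```

A dichotomized predictor corresponds to the special case where f(x) is a step function with a single jump at the cut-off.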
The primary aim of this study was proof of principle, i.e. to evaluate if accounting for non-linear effects of the three factors age at diagnosis, tumor size, and number of positive lymph nodes improves prognostication; factors which will be utilized also in the future after an expected implementation of gene expression profiling in clinical routine. Our hypothesis was that by keeping the predictors continuous as long as possible during the modeling process, prognostication would be improved.
Materials and methods
A total of 4568 women with primary breast cancer were included, originating from four multicenter randomized controlled trials in stage II breast cancer (Patient materials I–IV; see below) and two prospectively followed cohorts (Patient materials V–VI; see below); more information is given in Additional file 1: Table S1. Patients were excluded due to missing information on follow-up, number of positive lymph nodes, and/or tumor size (Fig. 1), leaving 4477 patients in the present paper. The endpoint was defined as distant recurrence-free interval (D-RFi) according to the DATECAN initiative (Definition for the Assessment of Time-to-event Endpoints in CANcer trials) and the follow-up was restricted to ten years after diagnosis.
Patient material I
Patients were enrolled between 1978 and 1983 in a randomized controlled trial from the South Swedish Breast Cancer Group. The purpose of the trial was to evaluate the effect of chemotherapy (cyclophosphamide) and radiotherapy alone and in combination, in women with breast cancer treated with modified radical mastectomy and axillary clearance. The original trial included 387 patients.
Patient material II
Patients were enrolled between 1978 and 1985 in a clinical study in the South Swedish Health Care Region, where postmenopausal patients were randomized to evaluate the effect of endocrine therapy (tamoxifen (TAM), one year) and radiotherapy alone and in combination. The original trial included 668 patients [21, 22].
Patient material III
Premenopausal patients were enrolled, between 1986 and 1991, in a randomized controlled trial with the aim to compare the effect of two years of TAM treatment versus no adjuvant systemic treatment (only eight patients received chemotherapy). The original trial included 564 patients enrolled in the South and South-East Swedish Health Care Regions.
Patient material IV
Postmenopausal patients were enrolled, between 1983 and 1991, in a randomized controlled trial launched by the Swedish Breast Cancer group of two versus five years of adjuvant TAM. The original trial included 1107 patients from the South Swedish Health Care Region. The inclusion continued after the original paper was published, hence the greater number of 1553 patients included in the present paper.
Patient material V
The original study enrolled a consecutive series of 841 patients with primary breast cancer referred to Odense University Hospital, Denmark. Patients were enrolled between 1980 and 1990. The purpose was to investigate the prognostic value of estimating angiogenesis by Chalkley counting using a large population-based confirmatory study design.
Patient material VI
The original prospective observational study included 555 patients diagnosed with primary breast cancer from three hospitals in the South Swedish Health Care Region between 1999 and 2003. The purpose was to study the prognostic value of the presence of cytokeratin positive cells in bone marrow aspirates from the sternum.
The above patient materials (I–VI) constitute our derivation set, which has a median follow-up for D-RFi of 9.1 years.
Patient material VII
Between 1983 and 1999, a consecutive series of patients diagnosed with primary breast cancer in the Kalmar County, Sweden, were enrolled. The median follow-up time for distant recurrence-free survivors was 8.4 years.
The Kaplan-Meier method was used to estimate the primary endpoint, D-RFi, and the Cox proportional hazards model, stratified for patient material, was used for estimation of hazard ratios (HR) for groups formed by applying well-accepted pre-defined as well as percentile-based cut-offs. The relative effects of the factors in the Cox regression model are assumed to be constant, i.e. independent of time, an assumption which must be tested. Proportional hazards assumptions were checked graphically and by Schoenfeld’s test. To avoid problems with non-proportional hazards, especially for tumor size, the follow-up was restricted to the first ten years after diagnosis. This restriction has the additional advantage that the median follow-up in the included patient materials will be about the same. The statistical analysis software Stata 15.0, 2017 (StataCorp, College Station, TX, USA) was used for statistical calculations. Whenever applicable, the REMARK recommendations for reporting of tumor marker studies were followed [28, 29].
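As an illustration of the estimator and of the ten-year restriction described above, the following is a minimal sketch in plain Python. It is not the Stata code used in the study; the function names `restrict_follow_up` and `kaplan_meier` and the coding of events (1 = distant recurrence, 0 = censored) are assumptions made for this example.

```python
def restrict_follow_up(times, events, horizon=10.0):
    """Censor all follow-up at `horizon` years (administrative censoring)."""
    t_out = [min(t, horizon) for t in times]
    # An event observed after the horizon no longer counts as an event.
    e_out = [e if t <= horizon else 0 for t, e in zip(times, events)]
    return t_out, e_out


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times:  follow-up times
    events: 1 if the event was observed, 0 if the observation was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Collect everyone whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_removed
    return curve
```

The product-limit form makes it clear why censored patients still contribute: they remain in the risk set until their follow-up ends.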
Continuously varying non-linear effects on the log hazard were modeled using FP transformations [12, 13] in Cox regression models. To avoid over-fitting, a function selection procedure based on a closed test procedure was used which adds flexibility, i.e. extra polynomial terms, only if the model fit improves significantly at the chosen overall significance level after adjustment for multiple testing. The multivariable FP procedure (MFP; default settings in Stata), which is an extension of the function selection procedure based on FPs, was used to derive an FP-based prognostic index based on the three originally continuous or integer valued predictors. For simplicity, we will henceforth use the term continuous for both types of scales: the truly continuous used for age and tumor size and the integer valued used for number of positive nodes. In this paper significance testing was applied only during model selection within the FP procedure and then the alpha level was set at the default value 0.05.
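To make the FP transformations concrete, the sketch below enumerates the candidate terms that a function selection procedure chooses among. It assumes the conventional power set from the FP literature (−2, −1, −0.5, 0, 0.5, 1, 2, 3, with 0 denoting the natural logarithm); the helper names are hypothetical, and the closed test procedure itself, which requires fitting Cox models, is not reproduced. FP terms require positive values, so a variable containing zeros, such as the number of positive nodes, would need a shift before transformation; the shift applied in the study is not stated in this text.

```python
import math

# Conventional FP power set; the power 0 denotes the natural logarithm.
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x, p):
    """Return x**p, with p == 0 interpreted as ln(x). Requires x > 0."""
    if x <= 0:
        raise ValueError("FP transformations require positive x")
    return math.log(x) if p == 0 else x ** p


def fp2_terms(x, p1, p2):
    """Return the two basis terms of a second-degree FP.

    For a repeated power (p1 == p2) the second term is x**p * ln(x).
    """
    first = fp_term(x, p1)
    second = first * math.log(x) if p1 == p2 else fp_term(x, p2)
    return first, second


def candidate_models():
    """Enumerate the FP1 and FP2 power combinations considered for one covariate."""
    fp1 = [(p,) for p in FP_POWERS]
    fp2 = [(p, q) for i, p in enumerate(FP_POWERS) for q in FP_POWERS[i:]]
    return fp1, fp2
```

For example, `fp_term(size_mm, 0.5)` gives a square root transformation of tumor size, and `fp_term(x, 1)` is the linear identity transformation.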
Separate Cox models were fitted for each of the three covariates in the derivation set. For each model, the MFP procedure was used to automatically select the best fitting transformation(s). Thereafter, predicted relative hazards were calculated for each factor and patient. These predictions were plotted versus each factor to graphically describe the functional form of the relationships. The following reference values were chosen for calculation of relative hazards: age 35 years, tumor size 20 mm, and 0 positive lymph nodes. 95% confidence intervals (CI) for the relative hazards were calculated using bootstrap resampling. Briefly, the model selection procedure and the corresponding calculation of relative hazards were replicated for 1000 bootstrap samples and the lower and upper limits were chosen as the 2.5th and 97.5th percentiles, respectively, for each covariate and observed value. The distribution of each factor is also shown in these graphs, both as dots along a line and as a kernel density estimate. The default options in Stata were used for kernel density estimation.
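The percentile bootstrap described above can be sketched generically as follows. The callable `estimate` is a placeholder: in the study it would correspond to rerunning the full model selection on the resampled patients and returning the relative hazard at one covariate value, which is not implemented here. The linear-interpolation percentile rule and the fixed seed are assumptions for this example.

```python
import random


def percentile(values, q):
    """Percentile (q in [0, 100]) of a non-empty list, by linear interpolation."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def bootstrap_percentile_ci(rows, estimate, n_boot=1000, seed=0):
    """Return the (2.5th, 97.5th) percentiles of `estimate` over bootstrap samples.

    rows:     one record per patient
    estimate: hypothetical callable mapping a list of records to a number
    """
    rng = random.Random(seed)
    n = len(rows)
    replicates = []
    for _ in range(n_boot):
        # Resample patients with replacement, keeping the original sample size.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimate(sample))
    return percentile(replicates, 2.5), percentile(replicates, 97.5)
```

Repeating the selection step inside every replicate, as the text describes, lets the interval reflect uncertainty about the chosen functional form and not only about the coefficients.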
Non-linear covariate effects were also modeled using RCS. In brief, for each covariate, k so-called knots were chosen, which uniquely define k-1 polynomial transformations of the covariate. The definition of these transformations guarantees that any linear combination of the k-1 spline variables will be linear before the first knot, a piecewise cubic polynomial between adjacent knots, and linear again after the last knot. To avoid over-fitting, we decided to use five knots located at the 5th, 27.5th, 50th, 72.5th and 95th percentiles as recommended by Harrell. This definition was found to work for age and tumor size. For number of positive lymph nodes, a variable with almost 40% zeros, we chose to place the five knots at 1, 2, 3, 4 and 10 positive nodes.
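The k-1 spline variables can be computed with the truncated power formulation commonly given for restricted cubic splines. The sketch below is an illustration and not the Stata routine used in the study; the function name `rcs_basis` is hypothetical, and dividing by the squared knot range is a common normalisation assumed here, which rescales the coefficients without changing the fitted curve.

```python
# Knot percentiles for five knots, as quoted in the text.
FIVE_KNOT_PERCENTILES = (5, 27.5, 50, 72.5, 95)


def rcs_basis(x, knots):
    """Restricted cubic spline basis for a single value x.

    Returns k-1 values for k knots: x itself plus k-2 non-linear terms.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("at least three knots are needed")

    def cube_plus(u):
        # Truncated cubic: u**3 for u > 0, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    last_gap = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # assumed normalisation
    basis = [float(x)]
    for j in range(k - 2):
        term = (
            cube_plus(x - t[j])
            - cube_plus(x - t[-2]) * (t[-1] - t[j]) / last_gap
            + cube_plus(x - t[-1]) * (t[-2] - t[j]) / last_gap
        )
        basis.append(term / scale)
    return basis
```

With the knots 1, 2, 3, 4 and 10 used for the number of positive nodes, `rcs_basis(0, [1, 2, 3, 4, 10])` has all non-linear terms equal to zero, and the terms are linear in x beyond 10 nodes, which is the property that limits the influence of extreme values.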
The statistical models developed in the derivation set were tested in the validation set. Briefly, the transformations of the covariates, and the corresponding weights from estimation in the derivation set, were applied to calculate the value of a prognostic index for each patient in the validation set. Patients were then divided into risk groups based on this index to assess the discrimination in the validation set. A proper validation should assess both discrimination and calibration, but calibration could not be reliably assessed in the present study due to differences in the distribution of prognostic factors, treatments, calendar periods and length of follow-up.
Different measures of predictive performance and model fit for the Cox proportional hazards model have been suggested in the literature. In the present paper, we have chosen Harrell’s Concordance index (C), which is a generalization of the area under the receiver operating characteristic curve (AUC) for survival data. It is defined as the fraction of all evaluable pairs of patients for which the patient with the best observed survival also has the lowest predicted hazard. Hence, the C-index will be 0.500 for a useless model and 1.000 for a model with perfect discrimination.
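The definition above translates directly into a pairwise count. The following is a minimal sketch, quadratic in the number of patients and meant for illustration only. Two conventions in it are assumptions: pairs with identical follow-up times are skipped, and pairs with identical predictions count as half concordant.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index.

    times:  follow-up times
    events: 1 if the event was observed, 0 if censored
    risk:   predicted risk score (higher means worse predicted prognosis)
    """
    concordant = 0.0
    evaluable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the shorter time in a pair must end with an event
        for j in range(n):
            if times[i] < times[j]:
                evaluable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if evaluable == 0:
        raise ValueError("no evaluable pairs")
    return concordant / evaluable
```

A pair is evaluable only when the ordering of the two survival times is known, which is why a patient censored early cannot serve as the shorter-lived member of a pair.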
Patient and tumor characteristics
During 10 years of follow-up, distant recurrences were recorded for 1315 of the 4477 patients (29%) in the derivation set. Median age was 60 years (range: 25–93), median tumor size 22 mm (range: 1–120 mm), and 40% were lymph node-negative. Endocrine treatment alone was given to 59%, chemotherapy alone to 10%, and chemo-endocrine therapy to 2% of the patients. Only five patients were given anti-HER2 treatment. Clinical and histopathological characteristics for the patients in the derivation set and in the separate patient materials are shown in Table 1 and Additional file 1: Table S1, respectively.
Age at diagnosis, when categorized into two groups (< 35 vs. ≥35 years), was associated to D-RFi, HR = 1.49 (95% CI: 1.04–2.13, C-index: 0.502). The corresponding HR for tumor size (> 20 vs. ≤20 mm) was 1.65 (95% CI: 1.47–1.86, C-index: 0.557), and for positive vs. negative lymph nodes 2.40 (95% CI: 2.11–2.73, C-index: 0.587). The respective Kaplan-Meier estimates of D-RFi are shown in Fig. 2 a-c.
Categorized predictors in three or four groups
Age was categorized in three groups (> 50, 35–50, and < 35 years at diagnosis), tumor size also in three groups (≤20 mm (T1), 21–50 mm (T2), and > 50 mm (T3)), and number of positive lymph nodes in four groups (no positive nodes (N0), 1–3 positive (N1–3), 4–9 positive (N4–9), and ≥ 10 positive lymph nodes (N ≥ 10)). The addition of an extra cut-off at 50 years and 50 mm for age and tumor size, respectively, led to only minor increases in C-indices, from 0.502 to 0.507 for age and from 0.557 to 0.561 for tumor size. For number of positive lymph nodes, adding two additional cut-offs at 4 and 10 positive nodes led to a more pronounced increase, from 0.587 to 0.652. The associations between these categorized variables and D-RFi are further illustrated in Fig. 2 d, e and f and in Additional file 2: Table S2.
Initial FP-modeling revealed a non-monotonic relationship between relative hazard of distant recurrences and tumor size. However, a sensitivity analysis showed this unexpected pattern to be caused by a small fraction (12/4477; 0.3%) of very small tumors (≤2 mm) with more distant recurrences than expected. After reviewing the original pathology reports, the registered tumor size for four of these patients was found to be wrong and was therefore corrected to sizes ranging from 20 to 25 mm. All results presented in this paper correspond to the corrected database.
In the final FP-analyses, non-linear effects were detected for tumor size and number of positive lymph nodes, but not for age, see Fig. 3a-c. The C-index for age, for which the linear identity transformation was chosen by the MFP procedure, was 0.513. A square root transformation provided best fit for tumor size, C-index 0.594, whereas a linear combination of two polynomial terms provided the best fit, according to the MFP procedure, for number of positive lymph nodes leading to a C-index of 0.665. A sensitivity analysis excluding the patient with the highest number of positive lymph nodes (N = 47), did not lead to a final model with fewer degrees of freedom.
Similar results were found with restricted cubic splines (Fig. 3d-f), except that the sensitivity to outliers was better handled by the restriction to linearity outside the most extreme knots. The corresponding C-indices were 0.516, 0.594, and 0.665 for age, tumor size and number of positive lymph nodes, respectively.
Model based on dichotomized predictors
The C-index for this model was 0.628, which should be compared to the C-indices for the corresponding univariable models, range 0.502–0.587. The estimated adjusted effects for the three factors were HR = 1.54 for age at diagnosis (< 35 vs. ≥35 years), 95% CI: 1.07–2.21, HR = 1.85 for tumor size (> 20 vs. ≤20 mm), 95% CI: 1.65–2.08, and HR = 2.60 for positive vs. negative lymph nodes, 95% CI: 2.28–2.96.
Model based on categorized predictors in three or four groups
Cox regression with age at diagnosis and tumor size in three categories, and number of positive lymph nodes in four categories further improved risk stratification, with a corresponding C-index of 0.674. For comparison with models based on FPs and RCS, see below. The predicted relative hazards, with 31 distinct values corresponding to the 31 actually observed of the 36 possible combinations (3 × 3 × 4) of the three categorized factors, were divided into four groups aiming at the 16th, 50th, and 84th percentile, following the recommendation by Royston and Altman. The closest possible fit to this recommendation for the present dataset resulted in the 15th, 44th, and 84th percentiles with 10-year D-RFi (95% CI): 88% (86–91), 75% (72–77), 65% (62–67), and 36% (33–40), respectively.
Models based on continuous predictors
Multivariable FP (MFP) and RCS transformations improved the C-index further, to 0.695 and 0.696, respectively. Second degree FPs, i.e. linear combinations of two polynomial transformations, were chosen by the MFP procedure for both tumor size and number of positive lymph nodes. However, a sensitivity analysis excluding the patient with the highest number of positive lymph nodes, (N = 47), revealed that the second polynomial term for number of positive nodes in the multivariable model was driven by this outlier. A single polynomial term, a log-transformation, for this variable would have been sufficient if this patient had been excluded. The C-index for this less complex MFP-model was 0.692.
Furthermore, the predictions from these two models, MFP and RCS, were divided into the four percentile-based subgroups mentioned above. For MFP, the 10-year D-RFi estimates (95% CI) were 90% (87–92), 76% (74–78), 62% (59–65), and 36% (32–40), respectively, and for RCS: 89% (86–91), 76% (74–79), 62% (59–65), and 35% (32–39), respectively.
During 10 years of follow-up, distant recurrences were recorded for 289 of the 1132 patients in the validation set (Material VII). Median age was 64 years (range: 28–99), median tumor size 20 mm (range: 1–160 mm), and 58% were lymph node-negative. Endocrine treatment only was given to the same fraction of patients as in the derivation set, 59%, whereas chemotherapy only was less frequently administered, just 5%. Chemo-endocrine treatment was given to 3% of the patients (Table 1).
Four multivariable prediction models fitted in the derivation set were evaluated in the validation set (N = 1132); the models with dichotomized and categorized predictors (≥2 cut-offs), the MFP model and the RCS model. For each model, the same predictor transformations were applied in the validation set as in the derivation set and the weights estimated in the derivation set, i.e. the estimated log relative hazards, were used to calculate the values for the prognostic indices (PIs) for the patients in the validation set. These PIs rank the patients in the validation set from lowest to highest risk based on external data. Hence, unbiased C-indices could be calculated. The validation C-index for the model with dichotomized predictors was 0.675. As expected, the discrimination in the validation set turned out to be better for the model with categorized predictors, C = 0.700, but the extra gain by allowing FP- or RCS-transformations was almost negligible, C-indices 0.705 and 0.701, respectively.
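For the simplest of the four models, the calculation of a prognostic index can be sketched directly from the adjusted hazard ratios reported for the dichotomized multivariable model (1.54, 1.85 and 2.60). The sketch assumes that the logarithms of these rounded values stand in for the estimated coefficients, which are not reported to full precision; the function and key names are hypothetical.

```python
import math

# Log hazard ratios of the dichotomized multivariable model (rounded values
# as reported in the text); these act as the weights of the index.
LOG_HR = {
    "age_below_35": math.log(1.54),
    "size_above_20mm": math.log(1.85),
    "node_positive": math.log(2.60),
}


def dichotomize(age_years, size_mm, positive_nodes):
    """Apply the same cut-offs as in the derivation set."""
    return {
        "age_below_35": 1 if age_years < 35 else 0,
        "size_above_20mm": 1 if size_mm > 20 else 0,
        "node_positive": 1 if positive_nodes > 0 else 0,
    }


def prognostic_index(age_years, size_mm, positive_nodes):
    """Weighted sum of the transformed predictors (the linear predictor)."""
    indicators = dichotomize(age_years, size_mm, positive_nodes)
    return sum(LOG_HR[name] * value for name, value in indicators.items())
```

Because neither the cut-offs nor the weights are re-estimated, a new patient's index depends only on the derivation set, which is what makes the resulting C-index an external assessment. Exponentiating the index gives the hazard relative to a patient in all three reference categories.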
For all the four models, the distribution of the PI was found to be shifted to the left, i.e. towards lower risk, in the validation set compared to the derivation set. This is in agreement with the patient characteristics presented in Table 1. Most notably, the fraction of lymph node-negative patients is higher in the validation set (58% vs. 40%). As an example, histograms of the PI-distributions in the derivation and validation datasets for the MFP-model are shown in Fig. 4.
The prognostic discrimination in the derivation and the validation set, respectively, of the models based on FP- and RCS-transformed predictors was further analyzed by calculation of HRs for the four risk groups; G1 (reference), G2, G3, and G4, defined by cut-offs at the 16th, 50th, and 84th percentiles of the PIs in the derivation set, see Table 2. The corresponding risk groups, formed by applying the actual values of the PIs in the derivation set as cut-offs for the PIs calculated in the validation set, led to risk groups of other relative sizes in the validation set. Instead of 16/34/34/16 for G1/G2/G3/G4, the percentages in the four risk groups turned out to be 35/27/26/11 and 34/29/25/12 for MFP and RCS, respectively, again reflecting a shift towards lower risk in the validation set. The HRs in the column ‘Validation’ of Table 2 reflect the discrimination of the prognostic models in an independent patient material. Briefly, the results from the FP- and RCS-modelling were comparable and, as expected, the relative effect estimates were smaller, i.e. closer to 1.00, when the models fitted in the derivation set were applied to the validation set. The corresponding Kaplan-Meier estimates for the MFP-based model, see Fig. 5, show that the discrimination between G1 and G2 in the validation set is poor. However, G3 and G4 are well separated in the validation set and these risk groups are also well separated from G1 and G2. Note also that the calibration of the high-risk group is good, reflected by the almost completely overlapping survival estimates for the two data sets.
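The way fixed cut-offs produce groups of different relative sizes in a shifted population can be sketched as follows. The helper names are hypothetical, the percentile rule uses linear interpolation, and values exactly equal to a cut-off are placed in the lower group; both choices are assumptions for this example.

```python
import bisect


def percentile(values, q):
    """Percentile (q in [0, 100]) of a non-empty list, by linear interpolation."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def derive_cutoffs(pi_derivation, percentiles=(16, 50, 84)):
    """Cut-off values of the prognostic index, estimated in the derivation set."""
    return [percentile(pi_derivation, q) for q in percentiles]


def assign_group(pi_value, cutoffs):
    """Return the risk group number: 1 (lowest risk) to len(cutoffs) + 1."""
    return bisect.bisect_left(cutoffs, pi_value) + 1


def group_shares(pi_values, cutoffs):
    """Fraction of patients falling in each risk group."""
    counts = [0] * (len(cutoffs) + 1)
    for value in pi_values:
        counts[assign_group(value, cutoffs) - 1] += 1
    return [c / len(pi_values) for c in counts]
```

Applying `group_shares` to validation-set indices with cut-offs from `derive_cutoffs` on the derivation set reproduces the logic behind the unequal group sizes: if the validation indices are shifted towards lower risk, the share in G1 grows and the share in G4 shrinks.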
Using a large cohort comprising 5609 patients with D-RFi as primary endpoint, we detected non-linear relationships to the relative hazard of distant recurrences for tumor size and number of positive lymph nodes, but not for age at diagnosis. These findings were, however, found to be of minor importance for prognostication of 10-year D-RFi in the multivariable modeling with FP transformations, since, in contrast to what we expected, only a modest increase in C-index was obtained for the model based on continuous variables compared to the model with categorized predictors. In the derivation set, a model with age and tumor size in three categories and number of positive lymph nodes in four categories was considerably better than the corresponding model applying dichotomized variables (C-index: 0.674 vs. 0.628). These findings support the way tumor size and number of positive lymph nodes are used in clinical decision-making today. The putative non-linear effects of these variables seem to be sufficiently captured by increasing the number of cut-offs from one to two or three. The drawback is information loss and that categorization might lead to tied predictions for large groups of patients, preventing the creation of risk groups of any desired size. Similar results were obtained in the validation set. Furthermore, the HRs comparing the prognosis in the four groups, based on the 16th, 50th, and 84th percentile of the prognostic index derived from the final MFP model, were similar in the derivation and validation sets. The relative effect estimates were smaller when the models fitted in the derivation set were applied to the validation set. This could be explained by over-fitting to the derivation set.
In contrast to previous studies [33, 34], we found no effect of age on D-RFi. This could be explained by the fact that 33% (23/69) of the patients below the age of 35 were treated with adjuvant chemotherapy compared to only 12% (511/4406) of the patients above 35 years. The importance of chemotherapy for the association between age and prognosis has been demonstrated by others [33, 34]. Another possible explanation for the absence of an age trend in the present study is that the fraction of patients below 35 years is lower in this study than previously reported for population-based breast cancer series, reducing the power to detect a trend.
In contrast to our results, Ejlertsen and co-workers have shown that FP transformation outperformed the predictions based on categorized variables. This may be explained by the fact that they also included the percentage of ER-positive nuclei in their algorithm and furthermore used a population-based and more homogeneous derivation cohort of 6529 postmenopausal high-risk patients, all receiving five years of adjuvant endocrine therapy. Also, the study by Sauerbrei and colleagues concluded that FP extracted more prognostic information in a study only including patients with lymph node-positive breast cancer (N = 686).
We have used both MFP- and RCS-transformations to model potentially non-linear relationships to prognosis for the factors age at diagnosis, tumor size and number of positive lymph nodes. The results, as measured by the C-indices and the functional form of the relationships, were strikingly similar. These transformation methods have advantages and disadvantages, as discussed by Royston and Sauerbrei. FPs are more sensitive to outliers, but this can be handled for example by restricting the degrees of freedom for each factor. A single patient with 47 positive lymph nodes, which was the most extreme value observed in the derivation set, altered the shape of the estimated relationship. A sensitivity analysis revealed that the final prognostic model suggested by the MFP procedure had fewer degrees of freedom when this patient was excluded. RCS, on the other hand, can lead to over-fitting, especially if many knots are used. The integrated automatic selection of variables and functional forms of these, implemented in the MFP procedure, gives some protection against over-fit, but to avoid capturing too many of the nuances in the data set used for estimation, incorporation of prior knowledge should also be considered during the statistical modeling. An alternative modeling strategy is artificial neural networks (ANN), which was recently applied to a dataset that largely overlaps with the derivation set in the present paper. The performance of ANN and Cox models was almost identical.
In an initial FP-modeling step, we revealed a non-monotonic relationship between the relative hazard of distant recurrences and tumor size. This was caused by incorrect values of tumor size for four patients in the very small subset of patients with tumors less than or equal to 2 mm. This finding highlights the importance of the quality of the data. We have not been able to perform a complete examination of all figures in the database, but the study of Rydén et al. demonstrates good agreement for parts of the patient material included in the present study.
One limitation of the present study is that only the three factors age at diagnosis, tumor size, and number of positive lymph nodes are included. A clinically applicable model should include all prognostic factors in use, i.e. according to current guidelines also ER, PgR, HER2, and histological grade [2, 37]. Unfortunately, we did not have complete and standardized information for these additional factors. However, in an expected future situation, when different gene profiles have replaced single biomarker analyses, age, tumor size and number of positive lymph nodes will most likely still be included in clinical routine management of breast cancer patients, and therefore the results obtained with these three factors in the present work should retain their value. Another limitation is that the derivation dataset is not population based, but rather consists of patients included in randomized controlled trials and well-defined cohorts from different geographical areas and time periods. A strength, on the other hand, is that the models fitted in the derivation set were successfully validated in an independent dataset, even though the validation set had a higher proportion of N0 compared to the derivation set. This suggests that the results are generalizable and robust. The discrimination was found to be better for high-risk patients than for patients whose prognostic factors indicated lower risk. Differences between the derivation set and the validation set in this study can explain the sub-optimal performance of the prediction models, but perfectly matching datasets are hard to find and it is desirable that the performance of prognostic models is good also in datasets with slightly different characteristics. In future studies aiming at clinically useful models, both discrimination and calibration should be thoroughly assessed in external datasets.
In conclusion, categorization of age at diagnosis, tumor size, and number of positive lymph nodes into three to four groups was found to improve prognostication compared to dichotomization. The additional gain by allowing continuous non-linear effects modeled by FPs or RCS was modest, a finding in line with the famous statistician John Tukey’s advice of parsimony.
Abbreviations
- ANN: Artificial neural network
- AUC: Area under the receiver operating characteristic curve
- DRFI: Distant recurrence-free interval
- HER2: Human epidermal growth factor receptor 2
- MFP: Multivariable fractional polynomial
- N0: No positive lymph nodes
- N1–3: 1–3 positive lymph nodes
- N4–9: 4–9 positive lymph nodes
- N ≥ 10: ≥ 10 positive lymph nodes
- RCS: Restricted cubic spline
- T1: Tumor size ≤ 20 mm
- T2: Tumor size 21–50 mm
- T3: Tumor size > 50 mm
- TNM: Tumor node metastasis
Goldhirsch A, Winer EP, Coates AS, Gelber RD, Piccart-Gebhart M, Thurlimann B, Senn HJ, Panel M. Personalizing the treatment of women with early breast cancer: highlights of the St Gallen international expert consensus on the primary therapy of early breast Cancer 2013. Ann Oncol. 2013;24(9):2206–23.
Harris LN, Ismaila N, McShane LM, Andre F, Collyar DE, Gonzalez-Angulo AM, Hammond EH, Kuderer NM, Liu MC, Mennel RG, et al. Use of biomarkers to guide decisions on adjuvant systemic therapy for women with early-stage invasive breast Cancer: American Society of Clinical Oncology clinical practice guideline. J Clin Oncol. 2016;34(10):1134–50.
Haybittle JL, Blamey RW, Elston CW, Johnson J, Doyle PJ, Campbell FC, Nicholson RI, Griffiths K. A prognostic index in primary breast cancer. Br J Cancer. 1982;45(3):361–6.
Ravdin PM, Siminoff LA, Davis GJ, Mercer MB, Hewlett J, Gerson N, Parker HL. Computer program to assist in making decisions about adjuvant therapy for women with early breast cancer. J Clin Oncol. 2001;19(4):980–91.
Curigliano G, Burstein HJ, Winer EP, Gnant M, Dubsky P, Loibl S, Colleoni M, Regan MM, Piccart-Gebhart M, Senn HJ, et al. De-escalating and escalating treatments for early-stage breast cancer: the St. Gallen international expert consensus conference on the primary therapy of early breast Cancer 2017. Ann Oncol. 2017;28(8):1700–12.
Altman DG, Royston P. The cost of dichotomising continuous variables. BMJ. 2006;332(7549):1080.
MacCallum RC, Zhang S, Preacher KJ, Rucker DD. On the practice of dichotomization of quantitative variables. Psychol Methods. 2002;7(1):19–40.
Irwin JR, McClelland GH. Negative consequences of dichotomizing continuous predictor variables. J Mark Res. 2003;40:366–71.
Altman DG, Lausen B, Sauerbrei W, Schumacher M. Dangers of using “optimal” cutpoints in the evaluation of prognostic factors. J Natl Cancer Inst. 1994;86(11):829–35.
Camp RL, Dolled-Filhart M, Rimm DL. X-tile: a new bio-informatics tool for biomarker assessment and outcome-based cut-point optimization. Clin Cancer Res. 2004;10(21):7252–9.
Sauerbrei W, Royston P. Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. J R Stat Soc. 1999;162:71–94.
Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med. 2006;25(1):127–41.
Royston P, Sauerbrei W. Multivariable model-building: a pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables: John Wiley & Sons, Ltd; 2008.
Royston P. Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling. Appl Stat. 1994;43:429–67.
Sauerbrei W, Royston P, Bojar H, Schmoor C, Schumacher M. Modelling the effects of standard prognostic factors in node-positive breast cancer. German breast Cancer study group (GBSG). Br J Cancer. 1999;79(11–12):1752–60.
Ejlertsen B, Jensen MB, Mouridsen HT, Danish Breast Cancer Cooperative Group. Excess mortality in postmenopausal high-risk women who only receive adjuvant endocrine therapy for estrogen receptor positive breast cancer. Acta Oncol. 2014;53(2):174–85.
Harrell FE Jr. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis: Springer; 2001.
Steyerberg EW. Clinical prediction models: a practical approach to development, validation, and updating: Springer; 2008.
Gourgou-Bourgade S, Cameron D, Poortmans P, Asselain B, Azria D, Cardoso F, A'Hern R, Bliss J, Bogaerts J, Bonnefoi H, et al. Guidelines for time-to-event end point definitions in breast cancer trials: results of the DATECAN initiative (definition for the assessment of time-to-event endpoints in CANcer trials). Ann Oncol. 2015;26(12):2505–6.
Killander F, Anderson H, Ryden S, Moller T, Hafstrom LO, Malmstrom P. Efficient reduction of loco-regional recurrences but no effect on mortality twenty years after postmastectomy radiation in premenopausal women with stage II breast cancer - a randomized trial from the South Sweden breast Cancer group. Breast. 2009;18(5):309–15.
Ryden S, Ferno M, Borg A, Hafstrom L, Moller T, Norgren A. Prognostic significance of estrogen and progesterone receptors in stage II breast cancer. J Surg Oncol. 1988;37(4):221–6.
Killander F, Anderson H, Ryden S, Moller T, Aspegren K, Ceberg J, Danewid C, Malmstrom P. Radiotherapy and tamoxifen after mastectomy in postmenopausal women -- 20 year follow-up of the South Sweden breast Cancer group randomised trial SSBCG II:I. Eur J Cancer. 2007;43(14):2100–8.
Ryden L, Jonsson PE, Chebil G, Dufmats M, Ferno M, Jirstrom K, Kallstrom AC, Landberg G, Stal O, Thorstenson S, et al. Two years of adjuvant tamoxifen in premenopausal patients with breast cancer: a randomised, controlled trial with long-term follow-up. Eur J Cancer. 2005;41(2):256–64.
Swedish Breast Cancer Cooperative Group. Randomized trial of two versus five years of adjuvant tamoxifen for postmenopausal early stage breast cancer. J Natl Cancer Inst. 1996;88(21):1543–9.
Hansen S, Grabau DA, Sorensen FB, Bak M, Vach W, Rose C. The prognostic value of angiogenesis by Chalkley counting in a confirmatory study design on 836 breast cancer patients. Clin Cancer Res. 2000;6(1):139–46.
Falck AK, Bendahl PO, Chebil G, Olsson H, Ferno M, Ryden L. Biomarker expression and St Gallen molecular subtype classification in primary tumours, synchronous lymph node metastases and asynchronous relapses in primary breast cancer patients with 10 years’ follow-up. Breast Cancer Res Treat. 2013;140(1):93–104.
Schoenfeld DA. Sample-size formula for the proportional-hazards regression model. Biometrics. 1983;39(2):499–503.
Altman DG, McShane LM, Sauerbrei W, Taube SE. Reporting recommendations for tumor marker prognostic studies (REMARK): explanation and elaboration. BMC Med. 2012;10:51.
McShane LM, Altman DG, Sauerbrei W, Taube SE, Gion M, Clark GM. REporting recommendations for tumour MARKer prognostic studies (REMARK). Eur J Cancer. 2005;41(12):1690–6.
Royston P, Altman DG. External validation of a cox prognostic model: principles and methods. BMC Med Res Methodol. 2013;13:33.
Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy and measuring and reducing errors. Stat Med. 1996;15(4):361–87.
Uno H, Cai T, Pencina MJ, D'Agostino RB, Wei LJ. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med. 2011;30(10):1105–17.
Fredholm H, Eaker S, Frisell J, Holmberg L, Fredriksson I, Lindman H. Breast cancer in young women: poor survival despite intensive treatment. PLoS One. 2009;4(11):e7695.
Kroman N, Jensen MB, Wohlfahrt J, Mouridsen HT, Andersen PK, Melbye M. Factors influencing the effect of age on prognosis in breast cancer: population based study. BMJ. 2000;320(7233):474–8.
Kalderstam J, Eden P, Bendahl PO, Strand C, Ferno M, Ohlsson M. Training artificial neural networks directly on the concordance index for censored data using genetic algorithms. Artif Intell Med. 2013;58(2):125–32.
Ryden S, Moller T, Hafstrom L, Ranstam J, Westrup C, Wiklander O. Adjuvant therapy of breast cancer: compliance and data validity in a multicenter trial. Control Clin Trials. 1986;7(4):290–305.
Coates AS, Winer EP, Goldhirsch A, Gelber RD, Gnant M, Piccart-Gebhart M, Thurlimann B, Senn HJ, Panel M. Tailoring therapies--improving the management of early breast cancer: St Gallen international expert consensus on the primary therapy of early breast Cancer 2015. Ann Oncol. 2015;26(8):1533–46.
Tukey JW. The collected works of John W Tukey, vol. 5. p. 1965–85.
Acknowledgements
We are indebted to the participating departments of the South Sweden Breast Cancer Group and the South East Sweden Breast Cancer Group for providing samples and clinical follow-up.
Funding
The study was supported by funds from the Swedish Cancer Society, the Swedish Research Council, the Gunnar Nilsson Cancer Foundation, the Swedish Breast Cancer Association, the Swedish Cancer and Allergy Foundation, the Mrs. Berta Kamprad Foundation, the Anna and Edwin Bergers Foundation, Skåne County Council’s Research and Development Foundation, and Governmental Funding of Clinical Research within the National Health Service. The funding bodies influenced neither the design of the study, the collection, analysis, or interpretation of data, nor the writing of the manuscript.
Availability of data and materials
All the data are available without restriction. Researchers can obtain data by contacting the corresponding author.
Ethics approval and consent to participate
The study was approved by the Regional Ethical Review Board at Lund University, Lund, Sweden (LU 240–01), and carried out in accordance with the code of ethics of the World Medical Association.
Since the study handles archival paraffin material, often several decades old, it was not possible to obtain informed consent from all patients. However, all data were analyzed and presented anonymously. In addition, a note was published in the local paper, informing all previous breast cancer patients about the possibility to contact the research group if they did not want their tumor tissue to be used in scientific studies. This procedure was accepted by the Regional Ethical Review Board.
Consent for publication
See the Ethics approval and consent to participate section above.
Competing interests
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.