Modeling the effect of age in T1-2 breast cancer using the SEER database

Background: Modeling the relationship between age and mortality in breast cancer patients may have important prognostic and therapeutic implications.
Methods: Data from 9 registries of the Surveillance, Epidemiology, and End Results Program (SEER) of the United States were used. This study used proportional hazards models to analyze mortality in women with T1-2 breast cancers. The residuals of the model were used to examine the effect of age on mortality. This procedure was applied to node-negative (N0) and node-positive (N+) patients. All-cause mortality and breast cancer-specific mortality were evaluated.
Results: The relationship between age and mortality is biphasic. For both N0 and N+ patients in the T1-2 group, the analysis suggested two age components. One component is linear and corresponds to a natural increase in mortality with each year of age. The other component is quasi-quadratic and centered around age 50. This component contributes to an increased risk of mortality as age increases beyond 50. It suggests a hormonally related process: the farther from menopause in either direction, the more prognosis is adversely influenced by the quasi-quadratic component. There is a complex relationship between hormone receptor status and other prognostic factors, such as age.
Conclusion: The present analysis confirms the findings of many epidemiological studies and clinical trials that the relationship between age and mortality is biphasic. Compared with older patients, young women experience an abnormally high risk of death. Among elderly patients, the risk of death from breast cancer does not decrease with increasing age. These facts are important when discussing adjuvant treatment options with breast cancer patients.


Background
In many clinical situations, age is an important determinant of treatment decisions in breast cancer. For example, after mastectomy, patients with T2 tumors and one to three positive nodes are at high risk of isolated locoregional recurrence. Some authors have advocated routine postmastectomy radiotherapy for patients who have T2 tumors and are younger than 45 years [1]. In another study of close margins at mastectomy, patients aged 50 or younger with clinical T1-2 tumors, 0-3 positive nodes, and close (5 mm or less) or positive margins were at high risk (28% at 8 years) of chest wall recurrence regardless of adjuvant systemic therapy; such patients should therefore be considered for postmastectomy radiation [2]. Women younger than 45 should be regarded as high-risk patients on the basis of age alone and should be given adjuvant cytotoxic treatment [3]. The latter study showed a non-linear relationship between age and the relative risk of dying.
At the other end of the age spectrum, some authors have considered breast cancers in elderly patients to behave less aggressively than those in younger patients [4,5]. Other authors have argued that breast cancer does not become more indolent as age increases [6].
The relationship between age and prognosis in breast cancer remains controversial, and a detailed analysis could provide more insight into it. In the present study, we used proportional hazards models to model the survival of patients with T1-2, node-negative (N0) and node-positive (N+) breast cancer. The outcomes considered were all-cause mortality and breast cancer-specific mortality. The primary aim of the study is to describe how age relates to the risk of death. The secondary objective is to find a simple algebraic representation of this relationship.

Methods
The Surveillance, Epidemiology, and End Results Program (SEER) of the United States collected data on cancer incidence and related matters from 11 population-based registries [7]. The data used in this study were extracted from 9 registries: San Francisco-Oakland, Connecticut, Metropolitan Detroit, Hawaii, Iowa, New Mexico, Seattle (Puget Sound), Utah, and Metropolitan Atlanta. Selected patients were women with no previous history of cancer who presented with non-inflammatory invasive breast carcinoma, diagnosed and histologically confirmed as pT1-2 pM0 between 1988 and 1997, and who underwent curative surgery and axillary lymph node dissection. In 1987, the American Joint Committee on Cancer (AJCC) staging system defined pT1 tumors as 2 cm or less in greatest dimension and pT2 tumors as more than 2 cm but not more than 5 cm in greatest dimension; these definitions did not change until 1997. Some records were rejected because of concerns about data quality: non-hospital-based records, uncertain sequence of treatment, unknown month of diagnosis, and unknown race. Records with missing histological grade or receptor status were not excluded. Examination of statistical outliers led to the exclusion of one case with 75 involved nodes. The events studied were death from all causes and death from breast cancer. The follow-up cutoff date provided by the database was December 31, 1999.
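The selection rules above can be expressed as a simple record filter. The sketch below is illustrative only: the `Record` fields are hypothetical names chosen for readability (they are not SEER variable names), and only the main inclusion and data-quality rules are encoded. The single outlier with 75 involved nodes was excluded after inspection rather than by a rule, so it is not encoded. The N0/N+ split used later for the training and validation sets is included as the hypothetical helper `node_group`.

```python
from dataclasses import dataclass, replace
from typing import Optional


def pt_category(size_cm: float) -> Optional[str]:
    """Pathological T category by greatest tumor dimension (AJCC 1987 size definitions)."""
    if size_cm <= 0:
        return None          # size missing or invalid
    if size_cm <= 2.0:
        return "pT1"         # 2 cm or less
    if size_cm <= 5.0:
        return "pT2"         # more than 2 cm but not more than 5 cm
    return None              # larger tumors are outside the pT1-2 study population


@dataclass
class Record:
    # Hypothetical field names for illustration; NOT actual SEER variable names.
    female: bool
    prior_cancer: bool
    inflammatory: bool
    metastatic: bool
    tumor_size_cm: float
    year_dx: int
    month_dx_known: bool
    race_known: bool
    hospital_based: bool
    treatment_sequence_known: bool
    curative_surgery: bool
    axillary_dissection: bool
    positive_nodes: int


def is_eligible(r: Record) -> bool:
    """Main inclusion and data-quality rules; missing grade or receptor status is NOT an exclusion."""
    included = (
        r.female
        and not r.prior_cancer
        and not r.inflammatory
        and not r.metastatic
        and pt_category(r.tumor_size_cm) is not None
        and 1988 <= r.year_dx <= 1997
        and r.curative_surgery
        and r.axillary_dissection
    )
    data_quality_ok = (
        r.hospital_based
        and r.treatment_sequence_known
        and r.month_dx_known
        and r.race_known
    )
    return bool(included and data_quality_ok)


def node_group(r: Record) -> str:
    """N0 records form the training set; N+ records form the validation set."""
    return "N0" if r.positive_nodes == 0 else "N+"


rec = Record(True, False, False, False, 1.8, 1992, True, True, True, True, True, True, 0)
print(pt_category(rec.tumor_size_cm), is_eligible(rec), node_group(rec))  # pT1 True N0
```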
To verify the linearity of the continuous covariates, we used martingale residuals (the differences between the observed and expected numbers of events). The martingale residuals were plotted against the quantitative covariates of interest and examined with a non-parametric smoother fitted to the scatter plots. The smoothing used a Poisson regression implementation of the generalized additive model (GAM) [8]. The GAM procedure provided two outputs: the non-parametric smoothed curves approximating the residuals, and a significance test of the non-linearity of the curves. For the covariates that departed significantly from linearity, an iterative search was performed to identify parametric families of functions that approximated the curves. The search ended when the function [a] had a simple parametric expression, [b] satisfied the GAM linearity test when introduced as a transform in the Cox model, and [c] did not worsen the model fit as assessed by the sum of squares of the "deviance residuals" [8]. If the transforms were valid, the smoothed residual plots should appear linear and the non-linearity tests should be non-significant. Finally, scaled Schoenfeld residuals were used to verify that the relative hazards were constant over time [9]. The hypothesis underlying this dual modeling approach was that, if the algebraic functions are valid, using them as plug-in transforms should appropriately linearize the functional forms of the covariates of interest. Further details of the implementation of these procedures have been described elsewhere [10][11][12].
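The martingale and deviance residuals described above can be computed from a fitted linear predictor. The sketch below assumes the Cox coefficients have already been estimated elsewhere (it takes the linear predictor x'beta as input) and uses the Breslow estimator of the baseline cumulative hazard. It replaces the GAM smoother with a simple Gaussian kernel smoother; that substitution is for illustration only and provides no non-linearity test. All function names are hypothetical.

```python
import math


def breslow_increments(times, events, risk):
    """Breslow baseline-hazard jumps: d_j / (sum of risk scores still at risk at t_j)."""
    inc = {}
    for tj in sorted({t for t, d in zip(times, events) if d}):
        d_j = sum(1 for t, d in zip(times, events) if d and t == tj)
        at_risk = sum(r for t, r in zip(times, risk) if t >= tj)
        inc[tj] = d_j / at_risk
    return inc


def martingale_residuals(times, events, lin_pred):
    """M_i = delta_i - exp(lin_pred_i) * H0(t_i): observed minus expected number of events."""
    risk = [math.exp(lp) for lp in lin_pred]
    inc = breslow_increments(times, events, risk)
    residuals = []
    for t, d, r in zip(times, events, risk):
        h0 = sum(v for tj, v in inc.items() if tj <= t)  # cumulative baseline hazard at t
        residuals.append(d - r * h0)
    return residuals


def deviance_residuals(mart, events):
    """sign(M) * sqrt(-2 * (M + delta * log(delta - M))): a more symmetric version of M."""
    out = []
    for m, d in zip(mart, events):
        inner = m + (math.log(1.0 - m) if d else 0.0)
        out.append(math.copysign(math.sqrt(max(-2.0 * inner, 0.0)), m))
    return out


def kernel_smooth(x, y, grid, bandwidth):
    """Gaussian-kernel (Nadaraya-Watson) smoother: a simple stand-in, NOT the GAM used in the paper."""
    fitted = []
    for g in grid:
        w = [math.exp(-0.5 * ((xi - g) / bandwidth) ** 2) for xi in x]
        fitted.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    return fitted
```

Smoothing the martingale residuals against age with `kernel_smooth` gives a curve whose shape can be inspected for curvature, and criterion [c] corresponds to the sum of the squared values returned by `deviance_residuals`.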
The analysis was first applied to node-negative cases ("training set") to find a simple expression of the functional form relating age to mortality. The functional form obtained from node-negative cases was then applied to node-positive cases ("validation set"). In addition to this validation with the transformation obtained for node-negative patients, a further iterative search was performed to improve the fit for node-positive patients.
The analysis was also applied to a European dataset, the German Breast Cancer Study Group (GBSG-2) dataset, in which the outcome studied was disease-free survival [14].

Results
There are 83,804 T1-2 cases (58,139 N0 and 25,665 N+; mean: 4 nodes involved, range: 1-48) available for analysis from the SEER database. Table 1 shows the characteristics of the patients. This table has been presented elsewhere [13]. Except for 28 additional cases (because of updated registration), there are no noticeable differences in the distribution of the characteristics. Table 2 shows the results of proportional hazards models in the N0 and N+ groups, without transforms for the covariates. The shape of the smoothed curve for age suggests the use of a quadratic function. A fractional polynomial analogous to that of Sauerbrei and Royston [14], but with different exponents, combining a linear term (age) and a quasi-quadratic term |age-50|^1.5, i.e., age + |age-50|^1.5, provides a good fit and passes the test of linearity (chi-square = 6.530, P = 0.089) in N0 patients (Table 3).
The age transform derived from node-negative cases does not fully linearize the effect of age in N+ patients (Table 3). Replacing the exponent 1.5 with 1.8 gave a better linearization in N+ patients, though without improving the global model fit (Table 3).
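The transform described above is straightforward to compute. In the sketch below, `power` is 1.5 for the N0 transform and 1.8 for the alternative tried in N+ patients; the function names are hypothetical. It returns the unweighted sum age + |age-50|^power exactly as written in the text; whether the fitted model used this sum as a single covariate or entered the two terms with separate coefficients is not specified here, so treat that as an open assumption.

```python
def quasi_quadratic(age: float, center: float = 50.0, power: float = 1.5) -> float:
    """|age - center| ** power: zero at the center, rising symmetrically on both sides."""
    return abs(age - center) ** power


def age_transform(age: float, power: float = 1.5) -> float:
    """age + |age - 50| ** power (power = 1.5 for N0; 1.8 was the alternative tried for N+)."""
    return age + quasi_quadratic(age, power=power)
```

Because 30 and 70 are both 20 years from 50, `age_transform(70) - age_transform(30)` equals 40: the two ages differ only through the linear term.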
The proportional hazards check for age shows a deviation from the assumption of a constant relative hazard over time (Table 3). The rho values are positive for all-cause mortality, i.e., the effect of age on the risk of death increases with longer follow-up. They are negative for breast cancer-specific mortality, i.e., the effect of age on the risk of breast cancer death decreases with longer follow-up.
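The rho values come from correlating scaled Schoenfeld residuals with follow-up time [9]. The sketch below handles a single covariate, assuming a coefficient `beta` estimated elsewhere and Breslow handling of ties. For one covariate, scaling (beta plus d times Var(beta) times the raw residual) is a positive affine map, so the correlation is the same for raw and scaled residuals. The sketch correlates with raw event time; software such as R's `cox.zph` uses a Kaplan-Meier transform of time by default, so values will differ. Function names are hypothetical.

```python
import math


def schoenfeld_residuals(times, events, x, beta):
    """For one covariate: x_i minus the risk-weighted mean of x over the risk set at each event."""
    out_t, out_r = [], []
    for ti, di, xi in zip(times, events, x):
        if not di:
            continue  # censored subjects contribute no Schoenfeld residual
        w = [(math.exp(beta * xj), xj) for tj, xj in zip(times, x) if tj >= ti]
        xbar = sum(wj * xj for wj, xj in w) / sum(wj for wj, _ in w)
        out_t.append(ti)
        out_r.append(xi - xbar)
    return out_t, out_r


def pearson(a, b):
    """Pearson correlation (assumes both inputs vary)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def ph_rho(times, events, x, beta):
    """Correlation between Schoenfeld residuals and event time (raw time scale)."""
    t, r = schoenfeld_residuals(times, events, x, beta)
    return pearson(t, r)
```

A positive rho means the residuals drift upward over follow-up, i.e., the covariate's log hazard ratio tends to grow with time; a negative rho means it shrinks.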
The age transforms suggest two components in the effect of age. One component is linear (linear in the log hazard ratio, i.e., exponential in the hazard ratio) and corresponds to a natural increase in mortality with each year of age. The other component is quasi-quadratic and centered around age 50. It contributes to an increased risk of mortality as age increases beyond 50. It suggests a hormonally related process: not premenopausal versus postmenopausal, but perimenopausal versus non-perimenopausal (premenopausal plus postmenopausal). The farther the age at diagnosis is from the age at menopause, the more prognosis is influenced by the quasi-quadratic component.
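To make "linear in the log hazard ratio, exponential in the hazard ratio" concrete, the sketch below computes the relative hazard between two ages when the two components enter a Cox model with their own coefficients. The coefficients `b_lin` and `b_qq` are hypothetical placeholders, not the estimates reported in Table 3, and giving each component a separate coefficient is an assumption about the parameterization.

```python
import math


def log_relative_hazard(age, ref_age, b_lin, b_qq, power=1.5, center=50.0):
    """Log hazard ratio of `age` vs `ref_age` under b_lin*age + b_qq*|age - center|**power.

    b_lin and b_qq are illustrative placeholders, not fitted values.
    """
    def f(a):
        return b_lin * a + b_qq * abs(a - center) ** power
    return f(age) - f(ref_age)


def relative_hazard(age, ref_age, b_lin, b_qq, power=1.5, center=50.0):
    """Hazard ratio: the exponential of the log hazard ratio."""
    return math.exp(log_relative_hazard(age, ref_age, b_lin, b_qq, power, center))
```

With `b_qq = 0`, each additional year multiplies the hazard by `exp(b_lin)`, so a 10-year difference multiplies it by `exp(10 * b_lin)`. With `b_lin = 0`, ages 40 and 60 have the same hazard relative to age 50, reflecting the symmetry of the quasi-quadratic term around 50.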
The results reveal a complex functional form for the effect of age on mortality. The curves highlight an apparent biological anomaly: younger patients experience a relative risk of death from all causes similar to that of much older patients. Figures 1 and 2 show that a 30-year-old patient has almost the same risk of death as a 60-year-old patient.
The marked increase in mortality risk at older ages is attributable to the increased risk of death from causes other than breast cancer (co-morbidity). It should be noted that breast cancer does not become less virulent in older patients. An increase in the risk of death from breast cancer associated with older age was observed both in N0 and in N+ patients (Figures 3 and 4).
Figure 1. Mortality from all causes as a function of age for N0 patients. Dotted lines: two standard errors.
Figure 2. Mortality from all causes as a function of age for N+ patients. Dotted lines: two standard errors.
Figure 3. Breast cancer mortality as a function of age for N0 patients. Dotted lines: two standard errors.
Figure 4. Breast cancer mortality as a function of age for N+ patients. Dotted lines: two standard errors.
The German Breast Cancer Study Group (GBSG-2) dataset [14] is a separate database of 686 patients. Applying the GAM procedure to the GBSG-2 data showed that the effect of age was significantly non-linear (chi-square = 31.744, 3 degrees of freedom, P < 0.000001). The age transforms improved the linearity of the age variable and also improved the fit of the proportional hazards model (Table 4).
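The non-linearity tests are reported as chi-square statistics. For integer degrees of freedom, the upper-tail probability has a closed form built from `math.erfc` and `math.exp`, which is enough to check the reported P values. The sketch below is a stdlib-only helper (`chi2_sf` is a hypothetical name, not a library function). For 31.744 on 3 degrees of freedom it gives a value below 0.000001, as reported; assuming 3 degrees of freedom also for the SEER test in N0 patients (the degrees of freedom are not stated there), 6.530 gives about 0.088, close to the reported P = 0.089.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i < df/2} h^i / i!
        return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h^(i - 1/2) / Gamma(i + 1/2)
    tail = math.erfc(math.sqrt(h))
    extra = sum(h ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return tail + math.exp(-h) * extra


print(f"{chi2_sf(31.744, 3):.2e}")  # GBSG-2 non-linearity test for age
print(f"{chi2_sf(6.530, 3):.3f}")   # SEER N0 linearity test after transform (3 df assumed)
```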

Discussion
In studies addressing the effect of age on breast cancer, several authors have reported a biphasic relationship between age and mortality [15][16][17][18][19]. The present large study agrees with these reports. As with any model, its validity and utility may be questioned. Data from the GBSG-2 study were used to check the model. The GBSG-2 study differs from the present SEER study in several respects. This German study was a prospective controlled clinical trial of adjuvant treatment in node-positive breast cancer patients. Inclusion of patients was not restricted by tumor size. Histopathological classification and grading were performed centrally by one reference pathologist. The GBSG-2 data have been extensively investigated for the effect of age on the prognosis of breast cancer [20]. The GBSG-2 data thus indicate whether our results can be extrapolated to a different population. They are also complementary, since SEER has no data on recurrence and therefore provides no information on disease-free survival.
Applying different methods to estimate the effect of age on event-free survival in breast cancer (linear, categorization based on cutpoints, classification and regression trees, quadratic, fractional polynomials, cubic splines), Hollaender found that all methods showed a decrease in risk with increasing age up to 45-50 years [20]. A slight increase in risk was observed for older patients in the GBSG-2 data. Taking into account the wide confidence intervals for ages above 80 years, our Figure 4 for node-positive breast cancer-specific mortality agrees well with the node-positive GBSG-2 event-free survival results.
Regarding the proportional hazards assumption, Hollaender reported a small correlation, rho = 0.147, when assuming a linear risk function [20]. For the GBSG-2 data, we obtained rho = 0.131 (Table 4). The small difference is attributable to the different covariates included in our proportional hazards model (additional file 2 "outputgbsg2.doc"). For the SEER data, the rho values are smaller (Table 3).
Our results are also in keeping with a closely related investigation of SEER data in which a group of 4,616 patients aged 35 years or younger was compared with a group of 20,319 patients aged 50-55 years [21]. The authors observed that younger breast cancer patients had poorer survival, explained in part by presentation with later-stage disease and more aggressive tumors in terms of grade and receptor status; however, the known factors could not account for the remaining difference in survival. In contrast, Rapiti et al. recently argued that age is not an independent prognostic factor once tumor characteristics and treatment are taken into account [22]. However, this latter study included only 82 patients aged 35 years or younger.
To try to understand the biphasic mortality pattern, we examined the hormone receptor status and treatments of the patients. Age 50 corresponds approximately to the age at menopause. A large proportion of younger women had estrogen receptor (ER)-negative tumors (Figure 5). The proportion of ER-negative patients decreases with increasing age, without any inflection. In contrast, the proportion of progesterone receptor (PR)-negative patients increases around age 50 and then slowly decreases again. Hormone receptor status is incompletely reported in SEER (about 33-35% missing data).
Data on systemic treatment were not available from the SEER database, but the types of surgery and radiotherapy were recorded. Mastectomy was performed less frequently in younger patients and markedly more frequently in older patients. Postoperative radiotherapy was given less frequently at both ends of the age spectrum: somewhat less often in young patients and considerably less often in elderly patients (Figure 6). Researchers have reported under-treatment of elderly patients, which may partly account for their poor prognosis [23][24][25]. Whether hormonal status, type of treatment, or other factors explain the biphasic mortality pattern requires further research.
There are several limitations in the present analysis. The data are retrospective. Several statistically significant interactions, of various orders, were not incorporated into the models. Receiving systemic treatment is a particularly important prognostic factor in younger patients [3], but data on systemic treatment were not available for analysis.
Despite these limitations, and regardless of the modeling approach, our major finding is that the relationship between age and mortality is biphasic. Such a finding has been described by many other authors [16,17,20,26]. It is important to keep this biphasic relationship in mind when analyzing the effect of age in patients with breast cancer; otherwise, there is a substantial risk of misinterpreting results when age is inappropriately categorized [26] or inappropriately modeled. (Table 2, in which age enters without a transform, would erroneously suggest almost no effect of age on mortality.) Taking into account the full shape of the relationship between age and breast cancer-specific mortality, we conclude that: 1) young women experience a much higher risk of death than do older patients; 2) among elderly patients, the risk of death from breast cancer does not decrease with increasing age. Clinicians should keep these two facts in mind when discussing adjuvant treatment with breast cancer patients.
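Why a purely linear age term can hide a biphasic effect can be shown with a toy calculation. The sketch below uses synthetic ages from 30 to 70 and an outcome equal to the quasi-quadratic term |age-50|^1.5 alone; it fits an ordinary least-squares line rather than a Cox model, so it is an analogy for the Table 2 situation, not a reanalysis of the SEER data. Because the synthetic effect is symmetric around 50, the fitted slope is essentially zero even though the outcome varies strongly with age.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


ages = list(range(30, 71))                       # synthetic ages, symmetric around 50
biphasic = [abs(a - 50) ** 1.5 for a in ages]    # pure quasi-quadratic (U-shaped) effect
slope = ols_slope(ages, biphasic)                # essentially zero despite a strong age effect
```

Adding a linear component, for example 0.02 * age, moves the fitted slope to 0.02 but still conveys nothing about the curvature.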

Conclusion
The present analysis confirms that the relationship between age and mortality is biphasic. It is important that clinical research takes this relationship into account.