Biological tumor markers are used as measures of clinical efficacy when evaluating novel neo-adjuvant therapies. Ki67 is a biological tumor marker used to follow changes in tumor proliferation between pre- and post-therapy samples, typically core biopsies and surgical samples [10, 26, 27]. These sample types, however, differ in both acquisition and post-acquisition handling, and they differ in composition, which may affect direct comparison. To the best of our knowledge, these potential differences have not previously been fully addressed [18, 19, 29, 30]. In this study, we observe a significant average difference in proliferation between paired core biopsies and surgical samples from patients in an untreated setting, with the core biopsies demonstrating a higher proliferation index than the surgical samples. The pattern is, however, inconsistent between individuals, with proliferation differences in either direction, as demonstrated in Figures 2 and 3. This variability needs to be addressed when interpreting proliferation differences in future clinical studies. In many cases, proliferation values decreased as the number of evaluated tumor cells increased. This dilution effect, which we believe affects core biopsies and surgical samples unequally, could also play a role in the systematic difference observed. Finally, the lack of consensus concerning Ki67 assessment may raise problems in the comparison of neo-adjuvant studies. We propose a theoretical model for Ki67 assessment which may reduce systematic differences and improve comparison of future neo-adjuvant studies.
The decision to include fifty tumors in this study was not based on a statistical rationale or on the availability of tumor tissue; rather, fifty was regarded as a reasonable number of samples for detailed Ki67 assessments in this preliminary study.
Initial analysis using the t-test showed a significant difference between average proliferation values of core biopsies versus surgical samples for the first 200 cancer cells counted, but not for the entire 1000 cells. However, when a single sample pair with an extreme difference in proliferation was excluded, significant differences were observed for both 200 and 1000 cancer cells. This profound effect of a single sample pair suggests that the distributional assumptions of the t-test are not met. Hence, a non-parametric analysis, using the Wilcoxon matched-pairs signed-ranks test, was carried out for the entire series of 50 sample pairs. This analysis revealed a significant difference in Ki67 expression between core biopsies and surgical samples, but only for the first 200 tumor cells counted. The Ki67 fractions in core biopsies and surgical samples were also compared on a multiplicative scale, i.e., as ratios. This scale should theoretically be a better choice than the linear scale if the variation in Ki67 increases with increasing average value, but the drawback is that ratios for low Ki67 values are unreliable. The geometric mean of the ratios (surgical sample/core biopsy) was 0.81. Thus, Ki67 in the surgical specimen was, on average, 19% lower relative to the core biopsy (p = 0.063).
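The comparison on a multiplicative scale can be illustrated with a minimal sketch. The paired values below are hypothetical and are not data from this study; the function name `mean_ratio` is likewise an illustrative assumption. The sketch only shows how a geometric mean of per-pair ratios translates into an average relative change.

```python
from statistics import geometric_mean

# Hypothetical paired Ki67 fractions (percent positive cells).
# These numbers are invented for illustration and are NOT study data.
core_biopsy = [30.0, 12.0, 45.0, 20.0, 8.0]
surgical = [24.0, 12.0, 30.0, 22.0, 6.0]


def mean_ratio(surgical_values, core_values):
    """Geometric mean of the per-pair ratios surgical / core biopsy.

    Pairs with very low core-biopsy values give unstable ratios, and a
    core-biopsy value of zero cannot be used at all.
    """
    ratios = [s / c for s, c in zip(surgical_values, core_values)]
    return geometric_mean(ratios)


ratio = mean_ratio(surgical, core_biopsy)
# A ratio below 1 means lower Ki67 in the surgical sample on average.
relative_change = (ratio - 1.0) * 100.0
print(f"geometric mean ratio = {ratio:.2f}, relative change = {relative_change:.0f}%")
```

With a geometric mean ratio of 0.81, the same arithmetic gives the 19% average relative decrease reported above.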
The observed differences in Ki67 expression between the two types of samples lead to discordances after dichotomization at 20%, and, for 200 cells, positivity in the core biopsy with negativity in the surgical sample was significantly more common than the opposite pattern. Random differences between the paired samples were expected due to the heterogeneity of the samples, but the significant systematic difference for the first 200 cells was unexpected. A closer examination of the statistical results was undertaken to identify probable causes.
For clinical use, a dichotomized biomarker value is often preferred as a decision-making tool in choice of therapy. A dichotomized proliferation variable was thus created and tested statistically, with results summarized in Table 2. At first glance, evaluating 1000 cancer cells results in fewer sample pairs with opposing classifications while eliminating the systematic difference seen with 200 cells. However, some cumulative Ki67 curves (Figure 5) demonstrate a slight but visible inverse relationship between the number of tumor cells evaluated and proliferation results, a dilution effect. When a sample's proliferation value is near the cut-off value and cancer cells outside of the initial hotspot are included in the assessment, researchers run the risk of diluting the measured proliferation percentage, so that a sample that would be categorized as highly proliferative is falsely classified as low proliferating. To verify this putative dilution effect, proliferation values based on the initial 200 cells were compared with the proliferation values of the latter 800 cells. Values for the first 200 cells were significantly higher in both core biopsies and surgical samples, despite the evaluations coming from the same samples. Following this substantiation of the dilution effect, we postulate sample composition and acquisition as primary factors, which are discussed below.
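The dilution effect near the cut-off can be made concrete with a small sketch. The cell counts below are hypothetical (a 200-cell hotspot at 25% positivity followed by 800 cells at 15%) and the helper names are illustrative assumptions; only the 20% cut-off and the 200/800 split are taken from the text.

```python
def ki67_fraction(cells):
    """Percentage of Ki67-positive cells; `cells` holds 1 (positive) or 0."""
    return 100.0 * sum(cells) / len(cells)


def dilution_summary(cells, hotspot_count=200):
    """Ki67 percentage for the first cells counted, the remainder, and all cells."""
    first = cells[:hotspot_count]
    rest = cells[hotspot_count:]
    return {
        "first": ki67_fraction(first),
        "rest": ki67_fraction(rest),
        "all": ki67_fraction(cells),
    }


# Hypothetical sample: 50 of the first 200 cells are positive (hotspot),
# then 120 of the next 800 cells are positive (surrounding tumor).
cells = ([1] * 50 + [0] * 150) + ([1] * 120 + [0] * 680)

CUTOFF = 20.0  # dichotomization cut-off in percent, as used in the text
summary = dilution_summary(cells)
for label in ("first", "all"):
    status = "high" if summary[label] >= CUTOFF else "low"
    print(f"{label}: {summary[label]:.1f}% -> {status}")
```

In this invented example the first 200 cells place the sample above the cut-off, while the full 1000 cells place it below, which is the misclassification risk described above.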
Ki67 evaluation focused on hotspots. In nearly all samples, the initial area of increased proliferation, or hotspot, was exhausted before 1000 cells were evaluated, leading to areas of lower proliferation being included in the final Ki67 result. Core biopsies generally contain fewer cancer cells than surgical samples. Studies suggest core biopsies are often acquired from near the center of a tumor, although which area of the tumor the needle targets is difficult to determine and might be regarded as random [31, 32]. Hotspots, however, are often noted to occur near the periphery of a tumor. Therefore, core biopsies could be expected to have lower proliferation values; however, the biopsy needle must pass through the tumor periphery in order to reach the tumor center and thus may pass through a hotspot. Assuming a core biopsy includes a hotspot and at the same time contains fewer cells than a surgical sample, the hotspot in the core biopsy would be less affected by dilution than a surgical sample containing not only entire hotspots but also large areas of low proliferation. Further explanations for the observed systematic difference relate to both acquisition and post-acquisition handling of tissue samples. Acquisition of core biopsies is a relatively quick process with little time for ischemic damage to affect the sample. Surgical samples are, however, routinely exposed to varying periods of ischemia during tumor removal. This hypoxic damage could result in apoptosis of cancer cells in the surgical sample and lead to lower proliferation values compared to core biopsies lacking significant hypoxic damage. Post-acquisition handling of tissue samples also varies between core biopsies and surgical samples. Core biopsies are immediately fixed in formalin, while surgical samples are often stored on ice for varying lengths of time before formalin fixation begins. Cold ischemic damage could lead to further apoptosis.
The nearly instantaneous acquisition and fixation of the core biopsies allow not only minimal time for apoptosis but also little opportunity for degradation of the Ki67 nuclear protein, whereas the combined ischemic times during and after surgery give ample opportunity for protein degradation to occur. Further studies are required to elucidate the extent to which these factors influence the observed difference between core biopsies and surgical samples.
The dilution effect, regardless of cause, might be important to note in clinical practice if samples are dichotomized, as only proliferation values near a chosen cut-off would be affected by the dilution effect. In a research context, however, where continuous values are gathered and analyzed as such, the dilution effect could be relevant for the majority of samples.
A lack of consensus concerning an appropriate cut-off value for Ki67 exists within the breast cancer research community [33–36] and might raise problems in the comparison of neo-adjuvant studies using change in proliferation as an endpoint. The secondary aim of the present study was to introduce a theoretical model for Ki67 assessment which may also minimize the difference in proliferation observed here between core biopsies and surgical samples. The initial idea of a simple adjustment factor was discarded due to large ranges and intra-patient proliferation differences in both directions (Figures 2 and 3). Instead, we focused on the development of a theoretical model that both optimizes the number of cancer cells evaluated for Ki67 and possibly standardizes the counting practice.
Currently, a predetermined number of cancer cells is evaluated without regard to sample heterogeneity and without general agreement as to an optimal number. It is generally assumed that evaluating more cells yields more reliable results, as attested by narrower confidence intervals (CIs). The underlying assumption when constructing a CI for the probability of Ki67-positivity, however, is that the counted cells constitute a random sample of cells from a homogeneous distribution, an assumption which is certainly not true for small hotspots. We observed a dilution effect that, despite narrower CIs, provides less accurate Ki67 estimates for samples with small hotspots (Figures 4, 5 and 6). Optimally, cancer cells from a single hotspot are counted until the null hypothesis of a proliferation rate equal to the cut-off can be rejected at a pre-defined significance level. Hotspots, however, vary considerably in size and composition from sample to sample. A dual problem emerges: accommodating individual sample heterogeneity while optimizing counting methods.
In summary, we propose the following counting model to be tested in future neo-adjuvant studies: Evaluate 100 cells for Ki67 proliferation. If the proportion is far enough from the cut-off value, then no further cells need to be counted. If the cut-off cannot be excluded, an additional ten cells are evaluated and the corresponding proportion is compared to the limits in Figure 7. The evaluation continues in ten-cell increments until either the cut-off is rejected or a maximum of 400 cells is reached. In the latter case, the sample is designated unclassifiable.
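The counting model can be sketched in code as follows. The decision limits of Figure 7 are not reproduced here, so this sketch substitutes an assumption: a two-sided exact binomial test against a 20% cut-off at a 5% significance level, applied at each look without any adjustment for repeated testing. The function names, the default `alpha`, and the choice of test are illustrative and should not be read as the limits actually used in Figure 7.

```python
from math import comb


def binom_two_sided_p(k, n, p0):
    """Two-sided exact binomial p-value (twice the smaller tail, capped at 1)."""
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    lower = sum(pmf[: k + 1])  # P(X <= k)
    upper = sum(pmf[k:])       # P(X >= k)
    return min(1.0, 2.0 * min(lower, upper))


def classify_sequential(cells, cutoff=0.20, alpha=0.05,
                        start=100, step=10, max_cells=400):
    """Classify a sample from `cells` (1 = Ki67-positive, 0 = negative),
    given in the order counted. Returns (label, number of cells evaluated).

    Assumption: the stopping limits are replaced by a per-look exact
    binomial test, which is not corrected for repeated looks.
    """
    evaluated = 0
    for n in range(start, max_cells + 1, step):
        if n > len(cells):
            break  # fewer cells available than the next look requires
        evaluated = n
        k = sum(cells[:n])
        if binom_two_sided_p(k, n, cutoff) < alpha:
            return ("high" if k / n > cutoff else "low"), n
    return "unclassifiable", evaluated


# Hypothetical sample sitting exactly on the cut-off: one positive cell in five.
borderline = [1, 0, 0, 0, 0] * 80
print(classify_sequential(borderline))
```

Because repeated looks inflate the chance of a false rejection, a practical implementation would need limits that account for the sequential design; the unadjusted test here serves only to show the control flow of the procedure.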