Model refinement on test data constitutes a dangerous case of overfitting

Olivier Gevaert, University of Leuven, Belgium
5 August 2009

Although the reviewers already pointed out that the model refinement in this article is a dangerous case of overfitting, the authors still overemphasize the claim that the refinement improves their model's prediction of progression-free survival (PFS).

In this article, the authors report that a 180-gene signature, developed on lung data with a different EGFR inhibitor, is predictive of therapy response in metastatic colorectal cancer independent of KRAS mutation status. This is an important finding, since it shows that the original signature is independent of disease and anti-EGFR monoclonal antibodies.

Next, the researchers attempted to reduce the number of genes in the original signature to a number manageable with methods other than microarrays, such as qRT-PCR. This is obviously an important step, because a smaller gene set will facilitate clinical application of the signature. However, the authors should have stopped there and not emphasized that the reduction improves predictive accuracy. The improvement is trivial, since information from the test set was used to reduce the 180-gene signature to a set of 26 genes. The results presented in Figure 3 and Table 3 therefore correspond to training set performance. The true performance of the 26-gene signature is not known at this moment and can only be verified on another independent test set.

Competing interests

I have no financial or non-financial competing interests in this subject.
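The optimistic bias described in this comment can be illustrated with a small simulation. The sketch below is not the authors' method. It assumes pure-noise expression data, a binary outcome in place of PFS, a simple mean-difference rule for picking genes, and a hypothetical sample size of 80. Only the gene counts (180 reduced to 26) come from the article, and all function names are illustrative.

```python
import random


def make_noise_data(n_samples, n_genes, rng):
    """Return pure-noise expression values and balanced random binary outcomes."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    rng.shuffle(y)
    return X, y


def select_signature(X, y, n_keep):
    """Keep the n_keep genes with the largest absolute mean difference between
    the two outcome groups, together with the sign of that difference."""
    diffs = []
    for g in range(len(X[0])):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        diffs.append((sum(pos) / len(pos) - sum(neg) / len(neg), g))
    diffs.sort(key=lambda d: abs(d[0]), reverse=True)
    return [(g, 1.0 if diff > 0 else -1.0) for diff, g in diffs[:n_keep]]


def accuracy(X, y, signature):
    """Fraction of samples whose signed signature score predicts the outcome."""
    correct = 0
    for row, label in zip(X, y):
        score = sum(sign * row[g] for g, sign in signature)
        correct += int((1 if score > 0 else 0) == label)
    return correct / len(y)


def simulate(n_repeats=20, n_samples=80, n_genes=180, n_keep=26, seed=0):
    """Average accuracy on the data reused for gene selection versus on
    independent data, over several repetitions (sample size is an assumption)."""
    rng = random.Random(seed)
    reused, independent = [], []
    for _ in range(n_repeats):
        X_test, y_test = make_noise_data(n_samples, n_genes, rng)
        X_new, y_new = make_noise_data(n_samples, n_genes, rng)
        # Gene reduction uses the outcomes of the "test" set, as criticized above.
        signature = select_signature(X_test, y_test, n_keep)
        reused.append(accuracy(X_test, y_test, signature))
        independent.append(accuracy(X_new, y_new, signature))
    return sum(reused) / n_repeats, sum(independent) / n_repeats


mean_reused, mean_independent = simulate()
print(f"accuracy on the data used for gene selection: {mean_reused:.2f}")
print(f"accuracy on independent data: {mean_independent:.2f}")
```

Because the simulated data contain no signal at all, any accuracy above chance on the reused set arises from the selection step alone, while accuracy on the independent set should stay close to 0.5.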