Performance of the metagene-based classifier when applied to independent data. A) Genes with the highest AUC values in the SanchezC data are first tested by LOOCV in the SanchezC dataset (black circles) and then carried over to establish a classifier in the Stransky (green) and Blaveri datasets, again by LOOCV. Gene signatures of 5 to 500 genes are used. Balanced accuracies are plotted; the dashed line at 0.5 indicates the balanced accuracy expected by chance. B) and C) The procedure is repeated with Blaveri and Stransky as training data, respectively. D) Changes in balanced accuracy from one gene signature size to the next larger size, plotted for all balanced accuracies obtained in the validation datasets in A) to C).
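
For readers unfamiliar with the metric, the following minimal sketch illustrates why 0.5 is the chance level for balanced accuracy in a two-class problem. It is not the code used for the figure: the function name `balanced_accuracy`, the 0/1 label encoding, and the toy labels are assumptions made only for illustration. Balanced accuracy is taken here as the mean of sensitivity and specificity.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity and specificity for a two-class problem.

    Hypothetical helper for illustration; labels are assumed to be 0/1.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of positives recovered
    specificity = tn / (tn + fp)  # fraction of negatives recovered
    return (sensitivity + specificity) / 2


# Toy, made-up labels: six samples of class 1, two of class 0.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]

# A classifier that ignores the data and always predicts the majority class.
always_majority = [1] * len(y_true)

plain_accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
chance_level = balanced_accuracy(y_true, always_majority)
```

In this toy case the plain accuracy is 0.75 because the classes are unequal in size, whereas the balanced accuracy is 0.5, which is why the dashed line is a meaningful baseline regardless of class proportions in each dataset.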