Expression profiling to predict the clinical behaviour of ovarian cancer fails independent evaluation
BMC Cancer volume 8, Article number: 18 (2008)
In a previously published pilot study we explored the performance of microarrays in predicting clinical behaviour of ovarian tumours. For this purpose we performed microarray analysis on 20 patients and estimated that we could predict advanced stage disease with 100% accuracy and the response to platin-based chemotherapy with 76.92% accuracy using leave-one-out cross validation techniques in combination with Least Squares Support Vector Machines (LS-SVMs).
In the current study we evaluate whether tumour characteristics in an independent set of 49 patients can be predicted using the pilot data set with principal component analysis or LS-SVMs.
The results of the principal component analysis suggest that the gene expression data from stage I, platin-sensitive advanced stage and platin-resistant advanced stage tumours in the independent data set did not correspond to their respective classes in the pilot study. Additionally, LS-SVM models built using the data from the pilot study, although they misclassified only one of four stage I tumours and correctly classified all 45 advanced stage tumours, were not able to predict resistance to platin-based chemotherapy. Furthermore, models based on the pilot data and on previously published gene sets related to ovarian cancer outcomes did not perform significantly better than our models.
We discuss possible reasons for failure of the model for predicting response to platin-based chemotherapy and conclude that existing results based on gene expression patterns of ovarian tumours need to be thoroughly scrutinized before these results can be accepted to reflect the true performance of microarray technology.
Ovarian cancer ranks fifth among causes of cancer mortality in women. Unfortunately, clinical or pathologic variables that can reliably predict recurrence in FIGO (Fédération Internationale de Gynécologie et d'Obstétrique) stage I patients or resistance to platin-based chemotherapy in advanced stage disease (FIGO stage III or IV) are not available. Gene expression analysis might predict prognosis more accurately, since microarrays can capture tumour properties that might not be reflected in the commonly used clinical or histopathological variables at diagnosis.
Previously, we performed a pilot study consisting of microarray analysis on three groups of patients: seven stage I without recurrence, seven platin-sensitive advanced stage and six platin-resistant advanced stage ovarian tumours. We investigated whether gene expression analysis can be used to distinguish between stage I and advanced stage ovarian tumours, and between platin-sensitive and platin-resistant ovarian tumours. The results showed that a considerable number of genes were differentially expressed between the different tumour classes. This was confirmed by principal component analysis (PCA), where the distinction between the three tumour classes was visualised. A least squares support vector machine (LS-SVM) analysis showed that the estimated classification performance was 100% for the distinction between stage I and advanced stage disease, and 76.92% for the distinction between platin-sensitive and platin-resistant disease when using a leave-one-out approach. These results indicated that gene expression analysis could be appropriate to predict prognosis of ovarian tumours. However, since leave-one-out cross validation can overestimate the performance of a model, an independent evaluation is needed to obtain an unbiased estimate of the generalization capacity.
In the current study, we describe results of an independent evaluation of models for predicting disease stage and response to platin-based chemotherapy built on the data of the pilot. Our goal was to evaluate whether an independent study could confirm the applicability of microarrays for the clinical management of ovarian cancer. This independent evaluation was carried out on a set of 49 new tumour samples which were subjected to the same experimental protocol. This data set was used as a test set to estimate the performance when predicting the difference between stage I and advanced stage disease, and between platin-sensitive and platin-resistant disease using models trained on the pilot data set. After presenting the results, we discuss the generalization performance on this independent data set and compare with models based on previously published gene sets.
Tissue collection and analysis were approved by the local ethical committee. After obtaining informed consent, tumour biopsies were sampled and immediately frozen in liquid nitrogen during primary surgery and were taken from three groups of patients: 4 from patients with stage I disease, 30 from patients with platin-sensitive advanced stage disease and 15 from patients with platin-resistant advanced stage disease. In this study, as in the pilot study, we will refer to these three groups as I, As and Ar respectively. The patient and tumour characteristics are shown in Table 1.
Microarray procedures were similar to those of our pilot study. Briefly, each tumour in the independent data set was hybridized twice (dye-swap) against the same common reference pool from the pilot study on an array containing 21,372 probes enriched for genes related to ovarian cancer. From each patient, mRNA was amplified and labelled with Cy3 and Cy5, according to Puskas and collaborators. All protocols can be downloaded from ArrayExpress. Microarray data and information recommended by the MIAME (Minimum Information About a Microarray Experiment) guidelines can be found on the ArrayExpress website (accession number E-MEXP-995 for the independent data set and E-MEXP-979 for the pilot data).
Microarray data analysis
The gene expression data were analysed using MATLAB 7 (R2006b). Pre-processing was done as in our pilot study. Briefly, each microarray in the independent data set was analysed separately in the following order: the intensities were background-corrected, log-transformed and finally normalised using the intensity-dependent Lowess fit procedure. The mean of the replicate, normalised log ratios was used as a measure of expression. After pre-processing, the data were analysed first with PCA and then with LS-SVMs. PCA was used for visualisation of the data while LS-SVMs were used for building classification models. A p-value was considered statistically significant if smaller than 0.05. All statistical tests were two-sided unless mentioned otherwise. Exact binomial confidence intervals were calculated using SAS 9.1.3 statistical software.
The procedure followed during the PCA can be found in Figure 1. This figure schematically shows the different steps involving the pilot and the independent data set. First, the genes were ranked according to their differential expression between the three classes in the pilot data (Kruskal-Wallis test) and the top 3000 genes were selected. Then PCA was performed on the reduced pilot data set and the three largest principal components were selected (i.e., the directions associated with the largest eigenvalues). Finally, the expression values of these 3000 genes were extracted from the independent data set, and this reduced independent data set was projected into the space defined by the three largest principal components of the pilot data.
Next, we used the pilot data set to build an LS-SVM to predict disease stage and an LS-SVM to predict the response to platin-based chemotherapy (MATLAB scripts were downloaded from LS-SVMlab version 1.5 [7, 8]). In the pilot study, an RBF kernel did not improve results; therefore, a linear kernel was used in all subsequent analyses. Figure 2 shows the different steps in this analysis, which are the same for both two-class classification problems. First, the genes were ranked according to their differential expression between the two classes using only the pilot study data (Wilcoxon rank sum test) and the top 3000 genes in this ranking were selected. Next, the corresponding gene expression values were selected in the independent data set. Subsequently, an LS-SVM with linear kernel was trained using the reduced pilot data and applied to predict the class of the samples in the independent data set. This results in an estimate of the generalization performance of a model built only on the pilot study data for both classification problems.
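The PCA projection steps described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the MATLAB code used in the study: the function names are hypothetical, the Kruskal-Wallis statistic is computed without a tie correction, and centring the independent data with the pilot mean is an assumption that the text does not specify.

```python
import numpy as np

def kruskal_h(values, labels):
    """Kruskal-Wallis H statistic for one gene (assumption: no tie correction)."""
    ranks = np.argsort(np.argsort(values)) + 1.0  # ranks 1..n, ties broken arbitrarily
    n = len(values)
    total = 0.0
    for c in np.unique(labels):
        r = ranks[labels == c]
        total += r.sum() ** 2 / len(r)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

def project_onto_pilot_pcs(pilot, pilot_labels, independent, n_genes=3000, n_pcs=3):
    """Rank genes on the pilot data only, fit PCA on the pilot data only,
    then project both data sets (samples x genes) onto the pilot components."""
    pilot_labels = np.asarray(pilot_labels)
    h = np.array([kruskal_h(pilot[:, g], pilot_labels) for g in range(pilot.shape[1])])
    top = np.argsort(h)[::-1][:n_genes]           # genes with the largest H first
    mean = pilot[:, top].mean(axis=0)             # centring uses pilot data only (assumption)
    centred = pilot[:, top] - mean
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    pcs = vt[:n_pcs].T                            # genes x n_pcs, largest variance first
    return centred @ pcs, (independent[:, top] - mean) @ pcs, top
```

The point of the design is that nothing from the independent data set influences the gene ranking or the component directions; the independent samples are only projected.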
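As an illustration of the classification step, a linear-kernel LS-SVM classifier can be trained by solving a single linear system (the standard LS-SVM formulation). The sketch below is written in Python with NumPy and is not the LS-SVMlab code used in the study; the function names and the default regularisation constant `gamma=1.0` are assumptions, and the gene selection by Wilcoxon rank sum test is taken to have happened beforehand.

```python
import numpy as np

def lssvm_train(X, y, gamma=1.0):
    """Train a linear-kernel LS-SVM classifier.
    X: samples x genes, y: labels in {-1, +1}, gamma: regularisation (assumed value)."""
    n = len(y)
    K = X @ X.T                                   # linear kernel matrix
    omega = np.outer(y, y) * K
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                        # bias b, support values alpha

def lssvm_predict(X_train, y_train, b, alpha, X_new):
    """Predict class labels (+1 or -1) for new samples."""
    return np.sign(X_new @ X_train.T @ (alpha * y_train) + b)
```

In the setting of the text, `X` and `y` would come from the reduced pilot data and `X_new` from the independent data set restricted to the same genes, so that the independent samples play no role in training.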
Comparison with other profiles
To assess the performance of models based on our data we compared them with the performance of models based on published gene sets that predict a broad range of outcomes in ovarian cancer. It is difficult to directly apply the published models to our data since several different microarray platforms (e.g. one-channel Affymetrix microarrays (U95Av2, HuGeneFL, U133A) or two-channel custom cDNA arrays) have been used to derive these gene sets. Therefore we adopted the strategy visualized in Figure 3. First, the gene set was extracted from the literature and, if not already done, the genes were translated to HUGO (Human Genome Organization) gene symbols. Then, we extracted, in both the pilot and independent data set, the genes corresponding to the HUGO gene set from the literature. Subsequently, our model building strategy proceeded as previously described (see Figure 3). We used gene sets related to the response to platin-based chemotherapy [9–11], gene sets related to survival in epithelial ovarian cancer (EOC) or in advanced stage serous EOC [13, 14], gene sets discriminating between the major histological types (serous, mucinous, clear cell and endometrioid) [15, 16], gene sets distinguishing between normal ovarian tissue and disease [17, 18], gene sets discriminating between low malignant potential or borderline disease and invasive disease, gene sets differentiating between ovarian cancer tissue and metastatic tissue and a gene set predicting the presence of disease at second look surgery. These gene sets were constructed based on Affymetrix microarrays (HuGeneFL, U95 set, U95Av2, U133A), different cDNA microarrays or HPLC (High Performance Liquid Chromatography) followed by ESI-TOF (Electrospray Ionization Time of Flight) mass spectrometry.
In this study we describe the results of the evaluation of models developed based on the data from our previously published pilot study using PCA or LS-SVMs on independently gathered microarray data. Note that all stage I patients in the pilot study had ovarian tumours without recurrence, while in the current study population the four patients with stage I disease consist of 3 stage I tumours with recurrence and 1 stage I tumour without recurrence. Figure 4 shows the results of the PCA. This figure visualises the projection of the patients from the independent data set belonging to the stage I, platin-sensitive and platin-resistant group onto the three principal component directions calculated based on the pilot study data. For all three groups, the data are scattered around the origin, which indicates that the principal components computed based on the pilot data were not able to reproduce the three classes in the independent data set. Additionally, we did not observe a clear distinction between the stage I patients with and without recurrence (see Figure 4, top panel).
Secondly, we used LS-SVMs to assess whether a supervised classification model can discriminate between stage I and advanced stage disease, and between platin-sensitive and platin-resistant disease. This resulted in a classification accuracy of 97.96% (CI 19%–99%) for the distinction between stage I and advanced stage disease, which corresponds to one stage I tumour out of four that was classified as an advanced stage tumour. Next, a classification accuracy of 51.11% was obtained for the distinction between platin-sensitive and platin-resistant disease. This corresponds to five platin-resistant and eighteen platin-sensitive tumours that were misclassified, corresponding to a sensitivity of 67% (CI 38%–88%) and specificity of 40% (CI 23%–59%) when considering a platin-resistant patient as a positive.
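The exact binomial (Clopper-Pearson) confidence intervals for the sensitivity and specificity reported above can be checked with a short standard-library Python sketch. The study computed these intervals in SAS; the function names here are hypothetical, and the counts (10 of 15 platin-resistant and 12 of 30 platin-sensitive tumours correctly classified) are derived from the misclassification counts given in the text.

```python
from math import comb

def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k successes out of n."""
    def solve(f, target):
        # f is increasing in p on [0, 1]; bisect until f(p) is close to target
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    lower = 0.0 if k == 0 else solve(lambda p: binom_tail_ge(k, n, p), alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_tail_ge(k + 1, n, p), 1 - alpha / 2)
    return lower, upper

sens_ci = clopper_pearson(10, 15)  # sensitivity: 10 of 15 platin-resistant tumours
spec_ci = clopper_pearson(12, 30)  # specificity: 12 of 30 platin-sensitive tumours
```

With only 15 and 30 samples per class the intervals are wide, which is why the point estimates alone say little about the model.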
Table 2 shows the accuracy on the independent data set for predicting stage and platin sensitivity of the models based on the pilot data and previously published gene sets. Most gene sets are able to predict ovarian cancer stage reliably (accuracies ranging from 87.8% to 97.96%). Five profiles were less successful: Lancaster disease vs. normal (79.6%), Roberts platin sensitivity vs. platin resistance (75.5%) and both Lancaster ovarian cancer tissue vs. metastatic tissue models (71.4% and 57.14%). When focusing on the prediction of platin sensitivity, 5 of the published gene sets predicted the majority class on the independent data set, resulting in 66.7% (30/45) classification accuracy. However, such a classifier has very little practical use since it predicts the same class for all independent data set samples. Finally, the Lancaster metastasis model consisting of 25 genes performed best with an accuracy of 60%, corresponding to a sensitivity of 86% and specificity of 47% when considering a platin-resistant patient as a positive (P-value 0.12, one-sided binomial test).
Recently, several studies have investigated the use of microarrays to predict several clinically relevant outcomes of ovarian cancer [9, 10, 12, 13, 15, 21]. However, the identified gene sets or developed models in these studies have not been properly evaluated on independently gathered data. Microarray technology is notorious for its low signal-to-noise ratio, suffering from many potential experimental sources of error (e.g. dye effect, print-tip effect, array effect) on top of the biological variation inherent to the samples. Moreover, due to the huge number of genes (e.g. ~25,000) compared to the low number of samples (~50), overfitting is a real danger. This occurs when models fit the training data too well and are not capable of predicting new samples. Overfitting can only be detected when using proper cross-validation techniques or independent test set analysis. Only a truly independent test set (one not used for determining pre-processing parameters, selection of differentially expressed genes, model building or model selection) can be used to estimate the true performance of models. For example, we noticed a case of inappropriate use of a test set where this data set was used to select the best model [10, 22]. This implies that the model will perform well on this particular test set but, due to the high-dimensional nature of microarray data, this performance might be impossible to reproduce on truly independent data. Moreover, a recent review of published microarray studies that focus on cancer-related outcomes showed that the most common flaw in classification studies is a biased estimation of the accuracy (present in 12 of 28 studies published in 2004). This illustrates that inappropriate evaluation of classifiers based on microarray data is a common problem when building models to predict cancer outcomes.
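A one-sided exact binomial test of this kind can be sketched with the Python standard library. The null hypothesis is not stated in the text; the sketch assumes it is chance-level accuracy (p0 = 0.5), and the count of 27 correct predictions out of 45 is inferred from the reported 60% accuracy. The function name is hypothetical.

```python
from math import comb

def binom_p_greater(k, n, p0=0.5):
    """One-sided exact binomial p-value: P(X >= k) under Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

n_samples = 45                      # platin-sensitive plus platin-resistant tumours
n_correct = 27                      # 60% of 45 (inferred from the reported accuracy)
p_value = binom_p_greater(n_correct, n_samples)  # assumed null: accuracy = 0.5

majority_accuracy = 30 / 45         # always predicting "platin-sensitive"
```

Under these assumptions the p-value comes out close to the reported 0.12. The majority-class baseline of 30/45 is worth keeping in view as well: a 60% accuracy is below what a trivial classifier achieves on this data set.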
Although more data should be gathered on stage I patients, the results presented in this paper indicate that predicting the response to platin-based chemotherapy is not straightforward and more subtle than predicting advanced stage disease. Furthermore, since most published studies lack a proper independent evaluation, their results should be interpreted cautiously. We advocated the use of microarrays based on the results from our pilot study, but warned against overestimating the generalization performance, as these results were based on a cross validation technique instead of an independent data set. Additionally, since the pilot study performance for predicting the response to platin-based chemotherapy was not statistically significant, we searched for confirmation on an independent test set. Therefore, we carried out a new study to estimate the performance of models based on independently gathered microarray data in an unbiased way. The present results, both the PCA and the performance of the LS-SVM models, show that the independent evaluation is disappointing. Only the LS-SVM stage model performed well and was able to distinguish early stage and advanced stage disease on the independent data set. The PCA, however, demonstrated that, for the three classes, the independent data did not cluster with their corresponding class in the pilot study. Additionally, the LS-SVM platin model was not able to perform better than a random predictor. Therefore, we argue that a gene expression study should be validated on independently gathered data before the results can be considered for clinical use. Independently gathered data can be influenced by subtle changes in sample preparation, sample analysis and sample hybridizations, which can deteriorate model performance. Even the techniques used by the same lab might undergo subtle changes over time, causing a drop in model performance when the model is applied to new patient samples. It is unclear whether published models are robust against these influences.
Additionally, ovarian cancer represents an immense variation in histological structure and biological behaviour, which complicates microarray-based modelling. A large number of samples is required to correctly represent the complete microscopic spectrum. It is not unlikely that an independent data set contains a different mix of tumour samples with slightly different histological characteristics compared to the pilot study, complicating independent evaluation. Moreover, the quality of the samples has a major effect on the ability to detect true differential expression and on subsequent model building. However, in most cases, including ours, only a limited number of samples with sufficient follow-up is available, which limits our ability to obtain a similar distribution of histopathology in the pilot and independent data set, and also forces us to use archival samples instead of new ones.
The comparison of the LS-SVM stage and LS-SVM platin model with published gene sets confirmed that predicting disease stage is easier than predicting response to platin-based chemotherapy. For predicting disease stage, many previously developed gene sets are able to distinguish both classes, indicating that many genes change when a tumour progresses from early to advanced stage disease. Predicting the response to platin-based chemotherapy is more challenging. None of the previously developed gene set models related to the response to platin-based chemotherapy are able to predict this outcome significantly better than chance. This indicates that these gene sets do not generalize to our independently gathered data set. Only the 27-gene model by Lancaster and colleagues, which distinguishes between primary ovarian cancer and metastatic tissue, is able to predict the response to platin-based chemotherapy to some degree. This gene set contains 12 genes which have previously been shown to be involved in oncogenesis and 10 genes which have been implicated in the p53 pathways. The performance of this gene set on our independent data set provides some evidence that genes distinguishing between primary and metastatic tissue also play a role in resistance to therapy.
Our results show that an independent evaluation of models based on gene expression data is necessary to validate models before considering subsequent steps to make microarray analysis clinically available. Previously published studies should be critically reviewed, in light of the current results, to assess if the reported model performance is not overestimated by inappropriate use of a test set and, if this is not the case, to consider if an independent study would confirm the reported model performance. Finally, prospective validation in multi-centre trials is necessary before microarray technology can move to clinical practice.
Jemal A, Siegel R, Ward E, Murray T, Xu J, Thun MJ: Cancer statistics, 2007. CA Cancer J Clin. 2007, 57: 43-66.
De Smet F, Pochet NL, Engelen K, Van Gorp T, Van Hummelen P, Marchal K, Amant F, Timmerman D, De Moor BL, Vergote IB: Predicting the clinical behavior of ovarian cancer from gene expression profiles. Int J Gynecol Cancer. 2006, 16 Suppl 1: 147-151. 10.1111/j.1525-1438.2006.00321.x.
Markman M, Rothman R, Hakes T, Reichman B, Hoskins W, Rubin S, Jones W, Almadrones L, Lewis JL: Second-line platinum therapy in patients with ovarian cancer previously treated with cisplatin. J Clin Oncol. 1991, 9: 389-393.
Puskas LG, Zvara A, Hackler L, Van Hummelen P: RNA amplification results in reproducible microarray data with slight ratio bias. Biotechniques. 2002, 32: 1330-4, 1336, 1338, 1340.
Parkinson H, Sarkans U, Shojatalab M, Abeygunawardena N, Contrino S, Coulson R, Farne A, Lara GG, Holloway E, Kapushesky M, Lilja P, Mukherjee G, Oezcimen A, Rayner T, Rocca-Serra P, Sharma A, Sansone S, Brazma A: ArrayExpress--a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. 2005, 33: D553-D555. 10.1093/nar/gki056.
ArrayExpress. 2008, [http://www.ebi.ac.uk/arrayexpress]
LS-SVMlab. 2008, [http://www.esat.kuleuven.be/sista/lssvmlab/]
Pochet N, De Smet F, Suykens J, De Moor B: Systematic benchmarking of microarray data classification: assessing the role of non-linearity and dimensionality reduction. Bioinformatics. 2004, 20: 3185-3195. 10.1093/bioinformatics/bth383.
Helleman J, Jansen MP, Span PN, van Staveren IL, Massuger LF, Meijer-van Gelder ME, Sweep FC, Ewing PC, van der Burg ME, Stoter G, Nooter K, Berns EM: Molecular profiling of platinum resistant ovarian cancer. Int J Cancer. 2006, 118: 1963-1971. 10.1002/ijc.21599.
Hartmann LC, Lu KH, Linette GP, Cliby WA, Kalli KR, Gershenson D, Bast RC, Stec J, Iartchouk N, Smith DI, Ross JS, Hoersch S, Shridhar V, Lillie J, Kaufmann SH, Clark EA, Damokosh AI: Gene expression profiles predict early relapse in ovarian cancer after platinum-paclitaxel chemotherapy. Clin Cancer Res. 2005, 11: 2149-2155. 10.1158/1078-0432.CCR-04-1673.
Roberts D, Schick J, Conway S, Biade S, Laub PB, Stevenson JP, Hamilton TC, O'Dwyer PJ, Johnson SW: Identification of genes associated with platinum drug sensitivity and resistance in human ovarian cancer cells. Br J Cancer. 2005, 92: 1149-1158. 10.1038/sj.bjc.6602447.
Spentzos D, Levine DA, Ramoni MF, Joseph M, Gu X, Boyd J, Libermann TA, Cannistra SA: Gene expression signature with independent prognostic significance in epithelial ovarian cancer. J Clin Oncol. 2004, 22: 4700-4710. 10.1200/JCO.2004.04.070.
Berchuck A, Iversen ES, Lancaster JM, Pittman J, Luo J, Lee P, Murphy S, Dressman HK, Febbo PG, West M, Nevins JR, Marks JR: Patterns of gene expression that characterize long-term survival in advanced stage serous ovarian cancers. Clin Cancer Res. 2005, 11: 3686-3696. 10.1158/1078-0432.CCR-04-2398.
Lancaster JM, Dressman HK, Whitaker RS, Havrilesky L, Gray J, Marks JR, Nevins JR, Berchuck A: Gene expression patterns that characterize advanced stage serous ovarian cancers. J Soc Gynecol Investig. 2004, 11: 51-59. 10.1016/j.jsgi.2003.07.004.
Schwartz DR, Kardia SL, Shedden KA, Kuick R, Michailidis G, Taylor JM, Misek DE, Wu R, Zhai Y, Darrah DM, Reed H, Ellenson LH, Giordano TJ, Fearon ER, Hanash SM, Cho KR: Gene expression in ovarian cancer reflects both morphology and biological behavior, distinguishing clear cell from other poor-prognosis ovarian carcinomas. Cancer Res. 2002, 62: 4722-4729.
Zhu Y, Wu R, Sangha N, Yoo C, Cho KR, Shedden KA, Katabuchi H, Lubman DM: Classifications of ovarian cancer tissues by proteomic patterns. Proteomics. 2006, 6: 5846-5856. 10.1002/pmic.200600165.
Lu KH, Patterson AP, Wang L, Marquez RT, Atkinson EN, Baggerly KA, Ramoth LR, Rosen DG, Liu J, Hellstrom I, Smith D, Hartmann L, Fishman D, Berchuck A, Schmandt R, Whitaker R, Gershenson DM, Mills GB, Bast RC: Selection of potential markers for epithelial ovarian cancer with gene expression arrays and recursive descent partition analysis. Clin Cancer Res. 2004, 10: 3291-3300. 10.1158/1078-0432.CCR-03-0409.
Hibbs K, Skubitz KM, Pambuccian SE, Casey RC, Burleson KM, Oegema TR, Thiele JJ, Grindle SM, Bliss RL, Skubitz AP: Differential gene expression in ovarian carcinoma: identification of potential biomarkers. Am J Pathol. 2004, 165: 397-414.
Ouellet V, Provencher DM, Maugard CM, Le Page C, Ren F, Lussier C, Novak J, Ge B, Hudson TJ, Tonin PN, Mes-Masson AM: Discrimination between serous low malignant potential and invasive epithelial ovarian tumors using molecular profiling. Oncogene. 2005, 24: 4672-4687. 10.1038/sj.onc.1208214.
Lancaster JM, Dressman HK, Clarke JP, Sayer RA, Martino MA, Cragun JM, Henriott AH, Gray J, Sutphen R, Elahi A, Whitaker RS, West M, Marks JR, Nevins JR, Berchuck A: Identification of genes associated with ovarian cancer metastasis using microarray expression analysis. Int J Gynecol Cancer. 2006, 16: 1733-1745. 10.1111/j.1525-1438.2006.00660.x.
Spentzos D, Levine DA, Kolia S, Otu H, Boyd J, Libermann TA, Cannistra SA: Unique gene expression profile based on pathologic response in epithelial ovarian cancer. J Clin Oncol. 2005, 23: 7911-7918. 10.1200/JCO.2005.02.9363.
De Smet F, Pochet NL, De Moor BL, Van Gorp T, Timmerman D, Vergote IB, Hartmann LC, Damokosh AI, Hoersch S: Independent test set performance in the prediction of early relapse in ovarian cancer with gene expression profiles. Clin Cancer Res. 2005, 11: 7958-7959. 10.1158/1078-0432.CCR-05-1216.
Dupuy A, Simon RM: Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J Natl Cancer Inst. 2007, 99: 147-157. 10.1093/jnci/djk018.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2407/8/18/prepub
OG is supported by the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen). NP is a Henri Benedictus Fellow of the King Baudouin Foundation and the Belgian American Educational Foundation (B.A.E.F.). KE is supported by CoE EF/05/007 SymBioSys. BDM is supported by the Research Council KUL: GOA AMBioRICS, CoE EF/05/007 SymBioSys, IDO (Genetic networks), Flemish Government: FWO: PhD/postdoc grants, projects G.0407.02 (support vector machines), G.0413.03 (inference in bioi), G.0388.03 (microarrays for clinical use), G.0499.04 (Statistics), G.0302.07 (SVM/Kernel); Belgian Federal Science Policy Office: IUAP P5/22 ('Dynamical Systems and Control: Computation, Identification and Modelling', 2002–2006); FP6-NoE Biopattern; FP6-IP e-Tumours, FP6-MC-EST Bioptrain.
The author(s) declare that they have no competing interests.
FDS, OG, NP, FA, BDM, DT and IV conceived the study and provided clinical and mathematical background. TVG looked up patient records in the database and tissue samples in the tumour bank, performed sample annotation and gathered follow-up of patients. KE performed pre-processing of the data sets. OG, FDS and NP performed the PCA and LS-SVM analysis, and the comparison with published gene set analysis. All authors contributed to the manuscript and approved it.
Cite this article
Gevaert, O., De Smet, F., Van Gorp, T. et al. Expression profiling to predict the clinical behaviour of ovarian cancer fails independent evaluation. BMC Cancer 8, 18 (2008). https://doi.org/10.1186/1471-2407-8-18