Does risk for ovarian malignancy algorithm excel human epididymis protein 4 and CA125 in predicting epithelial ovarian cancer: A meta-analysis

Background Risk for Ovarian Malignancy Algorithm (ROMA) and human epididymis protein 4 (HE4) appear to be promising predictors for epithelial ovarian cancer (EOC); however, conflicting results exist in comparisons of the diagnostic performance of ROMA, HE4 and CA125. Methods Remote databases (MEDLINE/PUBMED, EMBASE, Web of Science, Google Scholar, the Cochrane Library and ClinicalTrials.gov) and the bibliographies of full texts were searched for relevant abstracts. All included studies were closely assessed with QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies-2). The EOC predictive value of ROMA was systematically evaluated, and the predictive performances of ROMA, HE4 and CA125 were compared within the same population. Sensitivity, specificity, DOR (diagnostic odds ratio), LR± (positive and negative likelihood ratios) and AUC (area under the receiver operating characteristic curve) were summarized with a bivariate model. Subgroup analysis and sensitivity analysis were used to explore heterogeneity. Results Data on 7792 tests were retrieved from 11 studies. The overall estimates of ROMA for EOC prediction were: sensitivity (0.89, 95% CI 0.84-0.93), specificity (0.83, 95% CI 0.77-0.88), and AUC (0.93, 95% CI 0.90-0.95). Comparison of EOC predictive value between HE4 and CA125 found: specificity, HE4 (0.93, 95% CI 0.87-0.96) > CA125 (0.84, 95% CI 0.76-0.90); AUC, CA125 (0.88, 95% CI 0.85-0.91) > HE4 (0.82, 95% CI 0.78-0.85). Comparison of ovarian cancer (OC) predictive value between HE4 and CA125 found: AUC, CA125 (0.89, 95% CI 0.85-0.91) > HE4 (0.79, 95% CI 0.76-0.83). Comparison among the three tests for EOC prediction found: sensitivity, ROMA (0.86, 95% CI 0.81-0.91) > HE4 (0.80, 95% CI 0.73-0.85); specificity, HE4 (0.94, 95% CI 0.90-0.96) > ROMA (0.84, 95% CI 0.79-0.88) > CA125 (0.78, 95% CI 0.73-0.83). Conclusions ROMA is helpful for distinguishing epithelial ovarian cancer from benign pelvic mass. HE4 is not better than CA125 for either EOC or OC prediction. ROMA is a promising predictor of epithelial ovarian cancer that may replace CA125, but its use requires further exploration.


Background
Ovarian cancer is the leading cause of death from gynecologic cancers in the United States and the fifth leading cause of cancer death in women (Link 1). Non-specific clinical manifestations are the main obstacle to the early diagnosis of ovarian cancer [1]. Cancer antigen 125 (CA125) was the only FDA-approved biomarker for ovarian cancer before 2008. CA125 is indicated for use as an aid in the detection of residual ovarian carcinoma in patients who have undergone first-line therapy and would be considered for diagnostic second-look procedures. Although the CA125 serum level is elevated in 80% of patients with advanced-stage epithelial ovarian cancer (EOC) [2], it is increased in only 50% of patients with stage I EOC [3]. In addition, CA125 serum levels are elevated in various benign gynecological diseases (including endometriosis) [4] and in non-gynecologic malignancies [5]. Therefore, considerable efforts are underway to identify new serum biomarkers that, alone or in combination with CA125, improve EOC detection [6,7].
With high-throughput technologies, a large number of new biomarkers have been discovered [8-10]. Human epididymis protein 4 (HE4) is among the most promising [11]. High levels of HE4 are found in the serum of patients with EOC, especially in serous and endometrioid cancers [12]. Unlike CA125, HE4 is not overexpressed in endometriosis and other benign gynecological diseases [11]. In 2008, HE4 became the first EOC biomarker after CA125 to be approved by the U.S. Food and Drug Administration (FDA), as an aid in monitoring recurrence or progressive disease in patients with epithelial ovarian cancer. However, reports conflict on the relative sensitivity of HE4 and CA125 [5,13-16].
Moore and colleagues [17] developed a multianalyte assay named the Risk of Ovarian Malignancy Algorithm (ROMA™), which combines the results of HE4 EIA (enzyme immunoassay), ARCHITECT CA 125 II™ and menopausal status into a numerical score to predict malignancy when an ovarian mass is found clinically. Although ROMA™ received clearance from the U.S. FDA in September 2011, the diagnostic accuracy of ROMA compared with CA125 and HE4 alone is still controversial [13,16-18]. Here we try to clarify the conflicting results on the diagnostic accuracy of ROMA and on the performance comparison among ROMA, HE4 and CA125.

Data sources and search strategy
We followed the Meta-analysis Of Observational Studies in Epidemiology (MOOSE) guidelines [19] and the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (Link 2). MEDLINE (through the PubMed interface), EMBASE, Web of Science, Google Scholar, the Cochrane Library and ClinicalTrials.gov were searched (search ended on 22 December 2011). Reference lists of the identified articles were searched manually. Publication language was not restricted. Search terms were based on standardized National Library of Medicine MeSH terms and free text. The search strategies for all databases were based on that for PubMed (Additional file 1: Table S1).
Two authors (RXT and WPL) independently screened the search results based on titles and abstracts. The full texts of the selected articles were reviewed independently by another two authors (KC and LLY) to determine inclusion. Disagreements were resolved by referring to a third author (MC).

Inclusion criteria
Studies that investigated both serum HE4 and CA125 as diagnostic tests or calculated the ROMA algorithm were included if (1) they were cross-sectional studies; (2) the tests were performed in the same population presenting with a pelvic mass; (3) all serum specimens were collected preoperatively; (4) histological diagnostic information was available for all subjects; and (5) sufficient data were available to reconstruct the fourfold table.
Studies that recruited participants without a pelvic mass, contained obviously erroneous data, or included healthy persons in the ROC curve analysis were excluded. Case-control studies were also excluded, because such studies tend to overestimate or underestimate the diagnostic performance of a test [20].

Data extraction
The data extracted from each study included: author; year; country; design; recruitment; age; menopausal status; test methods (e.g. chemiluminescence immunoassay); number of patients; sensitivity; specificity; and cut-off value. Fourfold tables were reconstructed. Two reviewers (FKL and RXT) independently extracted the data for each study and referred to a third opinion (MC) when disagreements arose. Important data not provided in the original studies were requested from the authors by email.
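As an illustration of how a fourfold table can be rebuilt from reported values, the following minimal sketch assumes that a study reports the number of diseased and non-diseased patients together with sensitivity and specificity. The function names and the example figures are hypothetical and are not taken from any included study; rounding to whole patients is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class FourfoldTable:
    tp: int  # diseased, index test positive
    fn: int  # diseased, index test negative
    fp: int  # non-diseased, index test positive
    tn: int  # non-diseased, index test negative


def reconstruct_table(n_diseased, n_non_diseased, sensitivity, specificity):
    """Rebuild the 2x2 counts from reported group sizes and accuracy values."""
    tp = round(n_diseased * sensitivity)
    tn = round(n_non_diseased * specificity)
    return FourfoldTable(tp=tp, fn=n_diseased - tp, fp=n_non_diseased - tn, tn=tn)


def accuracy_measures(t):
    """Sensitivity, specificity, LR+, LR- and DOR from a fourfold table.

    Tables with a zero cell are not handled here; they would need a
    continuity correction before LR and DOR can be computed.
    """
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
        "dor": (t.tp * t.tn) / (t.fp * t.fn),
    }


# Hypothetical study: 100 EOC cases, 200 benign masses, Se 0.90, Sp 0.80.
table = reconstruct_table(100, 200, 0.90, 0.80)
measures = accuracy_measures(table)
```

In this hypothetical example the DOR equals LR+ divided by LR-, which is why only the four cell counts are needed to derive all of the summary measures used in the pooling.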

Index tests and reference standard
The Risk of Ovarian Malignancy Algorithm (ROMA™) is a qualitative serum test that combines the results of HE4 EIA (enzyme immunometric assay), ARCHITECT CA 125 II™ and menopausal status into a numerical score. Index tests for HE4 and CA125 in

Methodological quality assessment
The methodological quality of each study was evaluated with the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) [22] quality items. Because overall scores are not helpful for interpreting study quality [23], they were not used in the QUADAS-2 evaluation. Doubts were resolved by discussion. QUADAS-2 covers blinding between the index tests and the reference test, but not blinding between index tests. We therefore added one item addressing the validity of this comparative question to the Risk of Bias part of Domain 2 (Index Test) in QUADAS-2 [22]: "Were the results of index tests interpreted without knowledge of each other?" The answers to this question (Yes, No or Unclear) were used to help assess the risk of bias of the included studies. As noted in the Concerns Regarding Applicability part of Domain 2 (Index Test) in QUADAS-2 [22], variations in test technology, execution or interpretation might affect estimates of the diagnostic accuracy of a test. If index test methods varied from those specified in the review question, concerns about applicability might exist.
Index tests for HE4 and CA125 in this meta-analysis were specified as EIAs and chemiluminescence immunoassays, respectively. For HE4, chemiluminescence immunoassays are more sensitive than the specified EIAs, so bias might be introduced into the pooling of studies. Similarly, for CA125, EIA and RIA (radioimmunoassay) are less sensitive and less stable than chemiluminescence immunoassays, so studies using either EIA or RIA were considered High Concern Regarding Applicability. The ROMA test used the CA125 and HE4 results from the same study, so ROMA was considered High Concern Regarding Applicability when either the HE4 or the CA125 test was evaluated as High Concern Regarding Applicability.

Data analysis plan
The statistical analysis comprised the following steps: (1) qualitatively describing the findings; (2) searching for heterogeneity and threshold effects; (3) identifying the sources of heterogeneity by subgroup analysis; and (4) choosing an appropriate model and pooling estimates statistically. Univariate [24] and bivariate [25] models were the two choices for diagnostic meta-analysis. When a positive correlation exists between the true positive rate (TPR) and the false positive rate (FPR), the bivariate model is more appropriate [26].
Heterogeneity among studies was shown with forest plots and explored with I² estimates [27]. The main advantage of I² is that it does not inherently depend on the number of studies included in the meta-analysis. I² estimates below 25% were regarded as low heterogeneity, between 25% and 50% as moderate heterogeneity, and 50% or higher as high heterogeneity. If heterogeneity was low, a univariate meta-analysis model was used (Meta-DiSc software version 1.4 [28]). If heterogeneity was moderate to high, the Spearman correlation coefficient between Logit(TPR) and Logit(FPR) was examined (Meta-DiSc software version 1.4). A positive coefficient denoted the presence of a threshold effect, in which case a bivariate model and an HSROC (hierarchical summary receiver operating characteristic) curve were estimated and plotted; if the coefficient was negative, summary estimates were pooled without HSROC [24,29]; and if it was zero, summary estimates were pooled in the same way as for low heterogeneity. Influence analysis re-estimated the meta-analysis by omitting each study in turn (STATA version 10.0) to confirm the stability of our analysis model. Publication bias was investigated with Deeks' funnel plot and asymmetry test [30]. Subgroups were analyzed hierarchically by menopausal status, FIGO stage and concern about index test methods. In some studies, patients with low malignant potential (LMP) tumors or borderline (BL) tumors were classified into the EOC group; these studies were analyzed separately as the subgroup EOC (LMP/BL). Subgroups with fewer than four studies were analyzed with the univariate model, because the bivariate model requires at least 4 studies [26]. Summary estimates and 95% confidence intervals (CIs) for sensitivity, specificity, DOR, LR± and AUC were calculated (STATA version 10.0 [31,32]). HSROC plots were shown when appropriate.
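The model-selection rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: the analysis itself was run in Meta-DiSc and STATA, I² is computed here from Cochran's Q with the standard definition I² = max(0, (Q - df)/Q) × 100%, and all function names and returned labels are hypothetical.

```python
import math


def i_squared(q, df):
    """Standard I^2 (in %) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def _ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_logit(tpr, fpr):
    """Spearman correlation between logit(TPR) and logit(FPR).

    Assumes all rates lie strictly between 0 and 1 and are not all tied.
    """
    x = _ranks([math.log(p / (1 - p)) for p in tpr])
    y = _ranks([math.log(p / (1 - p)) for p in fpr])
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def choose_model(i2_percent, rho, n_studies):
    """Apply the decision rules described in the text."""
    if n_studies < 4 or i2_percent < 25:
        return "univariate"
    if rho > 0:
        return "bivariate with HSROC"  # threshold effect present
    if rho < 0:
        return "pooled without HSROC"
    return "univariate"  # rho == 0: treated like low heterogeneity


# Hypothetical per-study rates, for illustration only.
rho = spearman_logit([0.80, 0.85, 0.90, 0.95], [0.10, 0.15, 0.20, 0.30])
model = choose_model(i_squared(20.0, 10), rho, n_studies=11)
```

With these made-up inputs, I² is 50% and the correlation is positive, so the sketch selects the bivariate model with HSROC, matching the rule for moderate-to-high heterogeneity with a threshold effect.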
Comparisons between the estimates of different tests were performed with the z-test.
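The exact form of the z-test is not specified in the text. One common approximation, shown below as a hedged sketch, recovers each standard error from the reported 95% CI and treats the two estimates as independent and normally distributed on the raw scale. Both assumptions are ours: they may not match the method actually used, and independence is questionable when the tests are measured in the same patients. The function names are hypothetical; the example inputs are the pooled AUCs of CA125 and HE4 for EOC prediction reported in the abstract.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z_crit)


def z_test(est1, ci1, est2, ci2):
    """Two-sided z-test for the difference between two independent estimates."""
    se1 = se_from_ci(*ci1)
    se2 = se_from_ci(*ci2)
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return z, p


# AUC for EOC prediction: CA125 0.88 (0.85-0.91) vs HE4 0.82 (0.78-0.85).
z, p = z_test(0.88, (0.85, 0.91), 0.82, (0.78, 0.85))
```

Under these assumptions the difference in AUC is statistically significant at the 5% level, which is consistent with the direction of the comparison reported in the abstract.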

Search results
Of the 267 references identified from 6 databases, 11 articles [13-18,33-37] met the inclusion criteria and were included in the meta-analysis (Figure 1). Characteristics of the included studies were summarized ( [16,33-36] with 883 patients compared the performance of HE4 and CA125 for OC prediction. Four studies [13,15,18,36] with 715 patients compared the performance of HE4 and CA125 for EOC prediction, and 3 studies [15,18,36] (482 patients) compared the performance of ROMA, HE4 and CA125 for EOC prediction. In all studies, the spectrum of patients was considered representative. All enrolled participants presented with a pelvic mass of suspected ovarian origin, had not received any previous treatment and were scheduled for surgical intervention. The prevalence of proven ovarian cancer across all studies ranged from 7.86% to 63.1% (overall prevalence was 18.5% for EOC). The study of Holcomb and colleagues [14] had the lowest prevalence (7.86%) because it investigated only premenopausal women. respectively. CMIA, CLEIA and ECLIA are chemiluminescence immunoassays, which are more sensitive than EIA or RIA. According to the methodological quality assessment (the fourth part of the Methods section), HE4 tests with CMIA and CA125 tests with EIA or RIA were regarded as High Concern Regarding Applicability. The ROMA tests were considered High Concern Regarding Applicability when either the HE4 or the CA125 test was evaluated as High Concern Regarding Applicability (Figure 2).
The appearance of the Deeks' funnel plot for ROMA on EOC detection was symmetrical (Additional file 2: Figure S1), and the funnel plot asymmetry test showed little sign of publication bias (regression coefficients was   Table S2).
The included studies also investigated the diagnostic value of HE4 and CA125 in early-stage EOC, as well as in distinguishing EOC from benign pelvic mass in premenopausal and postmenopausal women. Because each of these settings contained fewer than 3 studies, we did not pool them as subgroups but summarized their sensitivity and specificity with forest plots (Additional file 4: Figure S2).

Performance comparison among ROMA, HE4 and CA125 for EOC prediction
Three studies evaluated the performance of HE4, CA125 and ROMA for EOC detection (Figure 12). All three groups (EOC-ROMA, EOC-HE4 and EOC-CA125) were pooled with the univariate model (Figure 13 & Table 5).

Summary of main results
Our results showed that, first, ROMA could help distinguish EOC from benign pelvic mass with high diagnostic accuracy (AUC: 0.93). ROMA has higher sensitivity for predicting advanced-stage EOC than early-stage EOC, and in postmenopausal women than in premenopausal women. Second, although HE4 has higher specificity than CA125 for EOC monitoring, CA125 has better diagnostic accuracy (higher AUC) than HE4 for EOC or OC prediction. This is based on the results of 4 studies that compared HE4 and CA125 within the same population. Third, based on the comparison of HE4, CA125 and ROMA in the same population, the overall performance (AUC) of the three tests for EOC prediction is similar. ROMA is less specific but more sensitive than HE4, while both ROMA and HE4 are more specific than CA125 for EOC monitoring.
Figure 11 Hierarchical summary receiver operating characteristic (HSROC) curves and results of bivariate analysis for HE4 and CA125 to predict OC. Results of bivariate analysis: estimates of each study (the squares), the summary point (solid circle), 95% confidence region (the ellipse) and HSROC (solid line) for HE4 (black) and CA125 (red) are shown. Each study in the meta-analysis is represented by a square, whose size indicates the size of the study.
All included studies were closely scrutinized with the QUADAS-2 tool and showed high quality across the items. Heterogeneity is common in diagnostic meta-analysis [38] and mainly results from characteristics of the study population, variations in study design, different statistical methods and different covariates [39].
Within-study quality was a major concern in this meta-analysis. High levels of heterogeneity in both sensitivity and specificity were found for the ROMA test. The existence of a threshold effect might partially explain the heterogeneity. Subgroup analysis (EOC-methods High concern and EOC-methods Low concern) found that the EOC-methods High concern group had higher specificity than both the EOC-methods Low concern group and the EOC group.
In the current paper, only three studies evaluated the diagnostic value of ROMA in early-stage EOC. Early-stage ovarian cancer usually presents with non-specific clinical manifestations, and surgical FIGO staging often results in a low prevalence of early-stage EOC. Future clinical investigations should therefore be prospective studies that recruit enough patients with early-stage EOC.
We analyzed the predictive value of ROMA for patients with EOC, EOC (LMP/BL) and ovarian cancer. No differences were found among the EOC, EOC (LMP/BL) and OC groups in any summary estimate, except AUC between the EOC and OC groups. Although EOC accounts for 90% of ovarian cancer, we do not think ROMA can be extended to predict ovarian cancer in general, because both HE4 and CA125 are biomarkers of epithelial ovarian cancer [2,11].
Cut-off values varied for HE4 (70-150 pM) and ROMA (preM: 7.4-13.1%; postM: 10.9-27.7%), but were consistent for CA125 (35 U/mL) across studies. Among the included studies, only one [15] used specific cut-off values for premenopausal (70 pM) and postmenopausal women (140 pM). Studies have found that HE4 levels in healthy subjects are associated with age [40,41], so it would be essential to define a specific normal range and cut-off value for premenopausal and postmenopausal women separately. For the other two predictors, ROMA and CA125, each center would likewise need to define its own normal ranges and cut-off values.

Strengths and weaknesses
Besides a comprehensive search strategy, strict inclusion criteria and a sound analysis protocol, a strength of this paper is that only studies investigating both tests (HE4 and CA125) or all three tests (HE4, CA125 and ROMA) in the same population were included in the test comparisons. This ensures that the comparison takes place between studies with the same or a similar population background, and thus reduces the heterogeneity between studies [42].
The main limitations are: (1) We were unable to obtain unpublished papers. (2) The number of studies might be small. We believe that the reliability of the meta-analysis depends mainly on the quality of the studies included. (3) The diagnostic value of ROMA, HE4 and CA125 in early-stage EOC has not been convincingly analyzed.