Development, validation and effectiveness of diagnostic prediction tools for colorectal cancer in primary care: a systematic review
BMC Cancer volume 20, Article number: 1084 (2020)
Tools based on diagnostic prediction models are available to help general practitioners (GPs) diagnose colorectal cancer. It is unclear how well they perform and whether they lead to increased or quicker diagnoses and ultimately impact on patient quality of life and/or survival. The aim of this systematic review is to evaluate the development, validation, effectiveness and cost-effectiveness of cancer diagnostic tools for colorectal cancer in primary care.
Electronic databases including Medline and Web of Science were searched in May 2017 (updated October 2019). Two reviewers independently screened titles, abstracts and full-texts. Studies were included if they reported the development, validation or accuracy of a prediction model, or assessed the effectiveness or cost-effectiveness of diagnostic tools based on prediction models to aid GP decision-making for symptomatic patients presenting with features potentially indicative of colorectal cancer. Data extraction and risk of bias were completed by one reviewer and checked by a second. A narrative synthesis was conducted.
Eleven thousand one hundred thirteen records were screened and 23 studies met the inclusion criteria. Twenty studies reported on the development, validation and/or accuracy of 13 prediction models: eight for colorectal cancer, five for cancer areas/types that include colorectal cancer. The Qcancer models were generally the best performing.
Three impact studies met the inclusion criteria. Two (an RCT and a pre-post study) assessed tools based on the RAT prediction model. The third study looked at the impact of GP practices having access to RAT or Qcancer.
Although the pre-post study reported a positive impact of the tools on outcomes, the results of the RCT and cross-sectional survey found no evidence that use of, or access to, the tools was associated with better outcomes. No study evaluated cost effectiveness.
Many prediction models have been developed but none have been fully validated. Evidence demonstrating improved patient outcome of introducing the tools is the main deficiency and is essential given the imperfect classification achieved by all tools. This need is emphasised by the equivocal results of the small number of impact studies done so far.
Colorectal cancer is the third most frequent cancer and the second leading cause of cancer-related death in the world . In 2014–2016 there were 42,042 new cases of colorectal cancer in the UK, with 57% of people with colorectal cancer surviving for 10 years or more .
Research suggests that cancer prognosis can be improved by reducing the time to diagnosis, as earlier diagnosis is associated with earlier stage at diagnosis, and earlier treatment is associated with improved survival. Reducing time to diagnosis also has the potential to reduce presentation via emergency admissions, and prevent the poorer survival associated with that route of diagnosis. A national cancer screening programme exists in the National Health Service (NHS) for colorectal cancer, and the National Awareness and Early Diagnosis Initiative (NAEDI), which aims to increase public awareness of the signs and symptoms of cancer, is intended to improve early diagnosis. However, many individuals are diagnosed via primary care, so efforts here could improve cancer survival.
Cancer diagnosis in primary care is not straightforward. Symptoms of cancer are commonly seen but mostly have non-cancer origins . Of those individuals referred from primary care via the two-week wait (2WW) referrals for suspected colorectal cancer in areas of England, approximately 5–8% were ultimately diagnosed with cancer [9, 10]. The type and presence of symptoms can vary greatly  and it is not surprising that patients can have multiple general practitioner (GP) consultations before being referred, especially for those cancers that have less well-known signs and symptoms . Thus, tools to help improve cancer diagnosis in primary care have great potential to impact on diagnoses and subsequent treatment options, leading to better outcomes for patients.
Diagnostic prediction models combine multiple predictors, such as symptoms and patient characteristics, to obtain the risk of the presence or absence of a disease within an individual patient [13, 14]. These prediction models can then be used to develop diagnostic tools (such as a website risk calculator, or mouse mat containing estimates of risk depending on features) to assist doctors in estimating probabilities and potentially influence their decision making . To evaluate diagnostic prediction models, there are three important stages, or types of studies: prediction model development, prediction model validation, and assessment of the impact of prediction models in practice (generally implemented as diagnostic tools). The first two are often conducted as part of the same study, and are generally evaluated using a single cohort design. These types of studies are commonly found in the diagnostic prediction literature, with some studies also reporting results of an external validation . To assess the impact of the prediction model (the third stage), comparative studies are required to evaluate the ability of the tool to guide patient management. However, very few diagnostic prediction models that are developed go on to be evaluated for their clinical impact  or cost-effectiveness.
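To illustrate how such a model turns a set of predictors into a risk estimate, the following is a minimal sketch that assumes a logistic regression form. This is a common choice for diagnostic prediction models, but it is not necessarily the form used by any model covered in this review. The predictor names, the intercept and the coefficients are hypothetical and are not taken from any published model.

```python
import math

# Hypothetical intercept and coefficients (log odds ratios).
# These values are illustrative only and do not come from any published model.
INTERCEPT = -6.0
COEFFICIENTS = {
    "rectal_bleeding": 1.5,
    "weight_loss": 0.9,
    "age_over_60": 1.1,
}


def predicted_risk(features):
    """Return the predicted probability of disease for one patient.

    `features` maps a predictor name to 1 (present) or 0 (absent);
    predictors that are missing from the mapping are treated as absent.
    """
    linear_predictor = INTERCEPT + sum(
        COEFFICIENTS[name] * features.get(name, 0) for name in COEFFICIENTS
    )
    # The logistic function maps the linear predictor to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# A patient with none of the features, and one with all three.
no_features = predicted_risk({})
all_features = predicted_risk(
    {"rectal_bleeding": 1, "weight_loss": 1, "age_over_60": 1}
)
```

A diagnostic tool such as a risk calculator would present the resulting probability to the GP, typically alongside a threshold above which referral or investigation is suggested.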
Tools currently available to GPs in the UK to help cancer diagnosis, beyond the National Institute for Health and Care Excellence (NICE) guidelines for suspected cancer referral , are based on diagnostic prediction models, and are integrated into GP software systems.
The Risk Assessment Tool (RAT) developed by Hamilton and colleagues which provides estimates of cancer risk for 17 cancers based on symptoms alone is integrated into Vision (INPS), and
The Qcancer tool, which estimates the risk of 11 cancers based on symptoms and patient characteristics, and overall cancer risk in males and females, is integrated into EMIS Web.
There is recent evidence that these tools are being used in primary care; however, it is unclear whether these tools impact on GP decision-making, and ultimately on patient outcomes.
Systematic reviews have looked at the use of prediction models for colorectal cancer in primary and secondary care. However, more research on colorectal cancer in the primary care setting has since been published, so we sought to systematically review this evidence. The aim of our review was to identify reports on the development, validation or accuracy of prediction models, as well as evidence evaluating the impact (i.e. effectiveness or cost-effectiveness) of symptom-based diagnostic tools that could be used to inform colorectal cancer diagnosis decision-making in primary care.
This systematic review was conducted as part of a wider programme considering risk assessment tools for any cancer site . Protocols relevant to the systematic review described here were registered on PROSPERO (CRD42017068373, CRD42017068375).
The systematic review was conducted in accordance with good practice guidelines  and is reported here in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines .
Bibliographic searches of relevant databases (Medline, Medline in Process, Embase, Cochrane, Web of Science), were conducted in May 2017 and updated in October 2019.
The search strategies were developed by an information specialist (SR) and comprised terms for cancer, terms for primary care, terms for decision support tools and terms for diagnosis (see supplementary Table S1). No date, language or other limits were used. Search filters for clinical prediction models were investigated but none were thought to be fully tested or reliable. A balance was sought between sensitivity of search results and volume of papers to screen. As the search strategies were originally developed to identify reports related to prediction models for any cancer site, no cancer site specific terms were used. Instead, we retrospectively excluded non-colorectal cancer studies from the current systematic review.
The search results were exported to Endnote X7 (Thomson Reuters, NY, USA) and de-duplicated using automatic and manual checking.
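As a rough sketch of what automatic de-duplication of search results involves, the example below keeps one record per normalised title and publication year. It is an illustration only and is not the matching procedure used by Endnote; the record fields and the example titles are hypothetical.

```python
import re


def normalise_title(title):
    """Lower-case a title and collapse punctuation and whitespace,
    so that trivial formatting variants of the same title match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each (normalised title, year) key."""
    seen = set()
    unique = []
    for record in records:
        key = (normalise_title(record["title"]), record["year"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records exported from two databases.
records = [
    {"title": "Diagnostic tools for colorectal cancer.", "year": 2017, "source": "Medline"},
    {"title": "Diagnostic Tools for Colorectal Cancer", "year": 2017, "source": "Embase"},
    {"title": "Diagnostic tools for lung cancer", "year": 2017, "source": "Medline"},
]
unique_records = deduplicate(records)
```

Exact-key matching of this kind misses records whose titles differ by more than formatting, which is one reason a manual check is still needed after the automatic pass.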
Additional searches were conducted using Scopus (Elsevier) on the references, as well as any citations of the items included after full-text screening, in order to identify additional relevant studies. Searches were also conducted for identified named tools (QCancer, RAT, CAPER, Bristol-Birmingham equation) in order to ensure search results were sufficiently comprehensive.
Inclusion and exclusion criteria
Diagnostic prediction models are defined as multivariate statistical models that predict the probability or risk that a patient currently has cancer based on a combination of known features of that patient, such as symptoms, signs, test results and patient characteristics . Symptoms could be self-reported by the patient, or prompted by physician’s questioning. Signs and test results are identified within primary care via routine testing (such as full blood count, urine dipstick testing, clinical signs), as are patient characteristics (socio-demographic variables, personal and family history). Studies that simply looked at ‘red-flag symptoms’ or symptom lists and (weighted) scores that did not provide a numerical risk of current cancer were excluded. Models developed with secondary care data (i.e. referred patients) were only included if an attempt was made to validate the models with primary care data.
Inclusion and exclusion criteria are presented in Table 1.
Selection of studies
Titles and abstracts were screened for relevance independently (by BG and RL), and any disagreements were resolved by consensus. Pilot screening was undertaken for the first 100 hits to ensure both reviewers were interpreting the inclusion and exclusion criteria in the same way. Articles retained were obtained in full and further screened independently by the two reviewers. For any disagreements that were not resolved, a third reviewer (CH) made the final decision.
The development and validation aspects of particular prediction models were often reported in multiple studies (e.g. the development and internal validation of the Qcancer prediction model was presented in one paper by Hippisley-Cox and colleagues, 2012, and the external validation in a separate paper by Collins and colleagues, 2012). All studies related to the same specific prediction model were collated regardless of whether they refer to the development, validation and/or impact of that tool.
To extract relevant data from each included study, standardised data extraction forms were used that evolved following piloting and discussion among reviewers. One reviewer (BG) extracted the data, which was checked by a second reviewer (RL). The following data were extracted from all study types: included cancer type(s), study design, country, sample size, participant recruitment (with inclusion and exclusion criteria) and participant characteristics. For studies reporting on the development and/or validation of prediction models an adaptation of the CHARMS checklist (CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies)  was used to extract additional relevant data, including data source, number of participants with specific cancer, features of the model (what symptoms, test results, patient demographics etc. are included), how features are defined and measured, definition of primary and secondary outcomes, how and when outcomes are assessed, main results (including model performance, validation and estimates of risk), features included in final model. For studies reporting the impact of tools based on prediction models additional items extracted included characteristics of the tool (including whether based on symptoms alone or other features in addition to symptoms), definition of outcomes, main results including confidence intervals, and subgroup analyses, where available.
Risk of bias assessment
Risk of bias of studies reporting the development and/or validation of prediction models was assessed with the PROBAST  (Prediction model Risk of Bias ASsessment Tool) checklist. The derived checklist assesses the risk of bias and applicability of prediction-modelling studies on 5 domains: participant selection, predictors, outcome, sample size and missing data, analysis.
For studies reporting on the impact of tools based on decision models a risk-of-bias form based on the Cochrane EPOC (Effective Practice and Organisation of Care) group recommendations  was used. All risk of bias assessments were conducted by one reviewer (BG) and checked by a second reviewer (RL).
Owing to the heterogeneity between included studies a narrative review of the studies was conducted.
Search phrases were finalised and searches were run in May 2017. A total of 9352 records were obtained through database searching. Additional reference and citation searches on tool names resulted in another 4171 records. After de-duplication, 9780 records were obtained. The database searches were updated in October 2019, and resulted in 2254 additional new records (after de-duplication). After screening the title and abstracts of these records independently by two reviewers, 260 records were retained for full text screening.
We identified two systematic reviews. Scanning their reference list led to the inclusion of two additional studies not found in the database search. One systematic review also included validation of models . In the end, 23 records were identified that were relevant for colorectal cancer (Fig. 1).
Discussions with collaborators led to the identification of relevant grey literature, but no such studies were deemed eligible for inclusion.
Elias and colleagues (2017)  aimed to identify and validate published diagnostic models to safely reduce unnecessary endoscopy referrals for colorectal cancer. A systematic review of the literature was undertaken up until 2015 and identified models were validated using a cross-sectional Dutch dataset referred to as CEDAR (n = 810). The definition of model used by Elias and colleagues is very broad and includes guidelines and weighted scores. Therefore, although Elias and colleagues identified 18 models, only four are relevant to our review: Fijten and colleagues (1995)  and Marshall and colleagues (2011)  which were identified from our searches, while Muris and colleagues (1995)  and Nørrelund and colleagues (1996)  are new inclusions. Due to the fact that Elias and colleagues attempted to validate the models they found, their validation of these four models is included in the results below.
Of the 20 included model development or validation studies, 17 report on the development of models (with some also reporting on validation), and four report only model validation.
The included studies (excluding the validation by Elias and colleagues ) reported on 13 different prediction models. Eight models are specifically for colorectal cancer: the Bristol-Birmingham equation (Marshall ), a Dutch model (Fijten ), a machine learning algorithm (Kop ), a Danish model (Nørrelund ), Qcancer (Hippisley-Cox ), RAT 2005 (Hamilton ), RAT 2009 (Hamilton ) and RAT 2017 (Stapley ). One model relates to metastatic cancer (RAT, Hamilton ), and the remaining four models cover multiple cancer sites which include colorectal cancer: Qcancer for males (Hippisley-Cox ), Qcancer for females (Hippisley-Cox ), a model for abdominal complaints (Muris ), and a model for abdominal cancers (Holtedahl 2018 ). Elias  and Collins  reported on the validation of one or more of the above models.
Table 2 provides a brief description of the models, their stages of development, the cancer sites covered (colorectal cancer-specific or other) and study designs.
The risk prediction models referred to as RATs [33, 35, 36, 43, 44] were designed to be used with patients presenting to primary care with “low-risk-but-not-no-risk symptoms” . Early versions of RATs were developed using case–control data from Devon, UK as part of the CAPER studies . Later models were derived using UK-wide primary care data – the Clinical Practice Research Datalink (formerly General Practice Research Database) [35, 44, 46,47,48,49,50,51], and The Health Improvement Network (THIN) database [43, 52]. In addition to the models identified in this systematic review as relevant to colorectal cancer, RATs exist for the following cancer sites: lung, ovarian, kidney, bladder, pancreas, breast, uterine, brain, prostate, Hodgkin lymphoma, non-Hodgkin lymphoma and multiple myeloma. The RATs are available as prints on common office objects (e.g. mousepads) and are integrated into general practitioner software in the form of the electronic Cancer Decision Support (eCDS). Regardless of the format, they provide risk estimates for patients with single symptoms of possible cancer, pairs of symptoms and repeat attendances with the same symptoms. Elias used a Dutch dataset to externally validate the 2005 colorectal version of RATs . No other RAT was externally validated.
The QCancer series of models can be used both in symptomatic (diagnostic models) and asymptomatic (prognostic models) patients . QCancer was developed in the QRESEARCH database, a large database comprising over 12 million anonymised health records from 602 general practices throughout the United Kingdom using the EMIS (Egton Medical Information Systems) computer system. Initially, several models were developed for each cancer type in symptomatic populations, in addition to colorectal: lung, renal, gastro-oesophageal, pancreatic and ovarian cancer. An updated approach incorporates multiple risk factors and symptoms into one model for males and one model for females to predict cancer risk. Most of these models have been externally validated in UK-wide populations (e.g. THIN database ). QCancer is available as an online calculator (www.qcancer.org), which provides estimates of absolute risk of any cancer with a breakdown of type of cancer based on both risk factors such as age, gender and family history, which increase the likelihood of cancer, and risk markers such as haemoptysis or features, usually symptoms (e.g. weight loss), suggesting that cancer is already present.
Marshall and colleagues (2011) used data from the THIN dataset (> 40,000 participants) to construct a model for colorectal cancer, known as the Bristol-Birmingham equation. The model was validated by Marshall et al. using the UK CAPER dataset and was also validated by Elias et al. (26) in a Dutch population. Data from 290 patients presenting to GPs in the Netherlands with rectal bleeding (from 1988 to 1990) were used by Fijten and colleagues (1995) to develop a prediction model for colorectal cancer (Netherlands model). The Netherlands model was validated by Hodder and colleagues (2005) using secondary care data from the UK, and by Elias and colleagues (2017) using a Dutch dataset. Kop and colleagues (2015) [32, 41, 42] used a machine learning algorithm to develop a prediction model for colorectal cancer using electronic records of almost 220,000 patients from two GP practices in the Netherlands. We found no external validation of this model. A Danish colorectal model has also been developed for use in primary care; this was externally validated by Elias and colleagues (2017) using a Dutch dataset.
Holtedahl and colleagues (2018) detail the development of a prediction model for abdominal cancers. These are defined as all cancers of the digestive organs, female genital organs and urinary organs (including testis). Data on 61,802 patients, recorded during GP consultations over a 10-day period in Norway, Denmark, Sweden, Scotland, Belgium, and the Netherlands, were used to develop the model. No validation of the model was identified.
The models are in various stages of development. A total of 5 models (or versions of models) have only assessed apparent performance [35, 36, 39, 41, 43], two models have been internally validated (Qcancer for males and Qcancer for females), one model was updated as a result of using a different data source . One of the four Qcancer versions , one RAT version  and four of the other prediction models [28,29,30,31] have been externally validated, the highest level of evidence identified in this systematic review. Apart from the two Qcancer versions, which were externally validated by Collins and Altman, all other external validations were conducted by Elias et al. . This was a systematic review which used a cross-sectional Dutch dataset referred to as CEDAR (n = 810) to validate the models they identified.
All of the models were developed in primary care settings in Europe. Only five models were not derived from UK-only data: those of Fijten and colleagues (1995), Kop and colleagues (2015) and Muris and colleagues (1995) were developed in the Netherlands, that of Nørrelund and colleagues (1996) was developed in Denmark, and that of Holtedahl and colleagues used data from Norway, Denmark, Sweden, Scotland, Belgium and the Netherlands. Of the models that have been externally validated, most were validated in the country in which they were developed, with three exceptions: the validation of the Netherlands colorectal cancer model in a UK population, the validation of the Danish colorectal cancer model in a Dutch population and the validation of the colorectal version of RATs (UK) in a Dutch population.
The assessment of risk of bias is summarised in Table 3, and given in more detail in supplementary Table S3. Note that for the RATs and Qcancer models, only one entry each is shown as all versions of the RAT or Qcancer model scored the same for each aspect of the risk of bias tool used. Qcancer development and validation studies were judged to be of low risk of bias. For the RAT development studies, there is uncertainty as to the risk of bias for how predictors and sample size and participants were dealt with, and a high risk of bias concerning the analysis. For the development of the other models, risk of bias was variable across all domains, although most models have a low risk of bias with respect to how outcomes are dealt with.
The external validation of the colorectal cancer RAT, and of many of the other models by Elias was judged to be of uncertain risk of bias for how sample size and patient flow was dealt with, and how analyses were conducted.
Overall, apart from the Qcancer studies, the risk of bias of the development and validation studies is mixed and/or uncertain.
Performance of the models
As with many systematic reviews of prediction models, we found a mix of outcomes reported on the different models. The most widely reported outcome was the area under the curve (AUC). AUC estimates were calculated from external datasets for seven of the 13 models (Table 4). As some authors reported AUCs based on the model derivation dataset, in Table 4 we distinguish whether the reported AUC is estimated using the derivation dataset or an external dataset. Note that for the remaining six models, which include three of the RATs, we could find no external validation of any kind.
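For readers less familiar with the AUC, it can be read as the probability that a randomly chosen patient with cancer is assigned a higher predicted risk than a randomly chosen patient without cancer. The sketch below computes it directly from that definition. The predicted risks are made-up values for illustration and are not taken from any included study.

```python
def auc(case_scores, non_case_scores):
    """Probability that a random case scores higher than a random non-case.

    Every case is compared with every non-case; ties count as half a win.
    """
    wins = 0.0
    for case in case_scores:
        for non_case in non_case_scores:
            if case > non_case:
                wins += 1.0
            elif case == non_case:
                wins += 0.5
    return wins / (len(case_scores) * len(non_case_scores))


# Hypothetical predicted risks for three cases and four non-cases.
case_scores = [0.30, 0.12, 0.08]
non_case_scores = [0.10, 0.05, 0.02, 0.01]
example_auc = auc(case_scores, non_case_scores)  # 11 of 12 pairs are ordered correctly
```

An AUC of 0.5 corresponds to a model that ranks patients no better than chance, and 1.0 to perfect discrimination. An AUC estimated on the derivation dataset tends to be optimistic, which is why the distinction between derivation and external datasets matters.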
The Qcancer models are associated with the highest estimated AUC value from external validation: 0.92 (0.91, 0.92) and 0.91 (0.90, 0.92) for the male and female versions of the colorectal Qcancer model.
The Bristol-Birmingham equation was also associated with a high AUC value for external validity, but only in one of two studies. The two AUCs from external validation of the Bristol-Birmingham equation differ, with the AUC estimate from the UK CAPER dataset being much higher (0.92 (0.91, 0.94)) than that from the external validation using the Dutch CEDAR dataset (0.84 (0.77, 0.90)) or the derivation dataset (0.83 (0.82, 0.84)) [27, 29].
The Netherlands model for colorectal cancer was associated with the highest AUC score for internal validation (0.97), but this was not replicated when the model was used in a different population. The AUC value was much lower in both external validation studies, using either secondary care data from the UK (0.78 (0.74, 0.81)) or Dutch dataset (0.72 (0.62, 0.81)).
The remaining models are estimated to have mean AUCs between 0.6 and 0.8, with the Danish model for colorectal cancer and the Muris abdominal complaints model being the two lowest performing models. The only RAT for which an AUC is reported is for the 2005 version of the colorectal model from Elias , and is much lower than those from the Qcancer models, 0.81 (0.75, 0.88).
Estimates of NPV, PPV, sensitivity and specificity are available from the external validations by Elias of the Bristol-Birmingham equation , the models by Fijten , Nørrelund , and Muris  and the 2005 colorectal RAT  . Collins and Altman  also report these estimates for validation of the colorectal Qcancer model (see supplementary Table S2). The (male and female) colorectal Qcancer models are the only models to have estimates of sensitivity > 0.9 and specificity > 0.7. The 2005 colorectal RAT has a reported sensitivity of 0.95 and specificity of 0.45. The other four models (Bristol-Birmingham, Fijten, Nørrelund and Muris) all have high sensitivity (> 0.95), but very low specificity: 0.06 for Nørrelund to 0.36 for the Bristol-Birmingham equation. Marshall , Holtedahl , Hamilton , Hamilton  and Stapley  also report likelihood ratios (LRs), see Supplementary Table S2c. Marshall  report a LR of 14.7 for the Bristol-Birmingham equation, while the other 4 studies only report LRs for individual symptoms included in the model. These range from < 2 for some symptoms in the model reported in Hamilton  to > 30 for rectal bleeding in the model reported by Stapley .
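The relationship between these measures can be sketched as follows, using the sensitivity (0.95) and specificity (0.45) reported above for the 2005 colorectal RAT. The prevalence of 5% is an assumption chosen for illustration only; predictive values depend strongly on prevalence, so the PPV and NPV computed here are not estimates from any included study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for a test at a given threshold."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a population with the given disease prevalence."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    true_negative = specificity * (1.0 - prevalence)
    false_negative = (1.0 - sensitivity) * prevalence
    ppv = true_positive / (true_positive + false_positive)
    npv = true_negative / (true_negative + false_negative)
    return ppv, npv


# Sensitivity and specificity as reported for the 2005 colorectal RAT.
lr_positive, lr_negative = likelihood_ratios(0.95, 0.45)

# Assumed prevalence of 5% (hypothetical, for illustration only).
ppv, npv = predictive_values(0.95, 0.45, prevalence=0.05)
```

With these inputs the positive likelihood ratio is about 1.7, which shows why a model with high sensitivity but low specificity changes the probability of cancer only modestly when it flags a patient.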
Three studies were identified that attempted to evaluate the impact of tools based on diagnostic prediction models used in practice: a cross-sectional survey , a pre-post study  and a randomised controlled trial . The RCT and pre-post studies evaluated the use of a combination of tools which included RATs for colorectal cancer. The cross-sectional survey by Price  evaluated the impact of GP practice access to RAT and/or Qcancer, see Table 5.
Price and colleagues  compared UK practice-level 2WW referral rates between GP practices that reported access to RAT and/or Qcancer, with practices that reported no access to these two tools. The tools included Qcancer and RAT for any cancer, and the analyses were not restricted to colorectal cancer.
Hamilton and colleagues (2013)  investigated the number of times two RATs  – one for lung and one for colorectal cancer – were used, together with the number of subsequent referrals and investigations, before and 6 months after the introduction of the tools in general practice in the UK.
Emery and colleagues (2017)  evaluated the impact of two complex interventions in rural Australia – a GP intervention and a cancer awareness campaign – in a 2 × 2 design trial, compared to control groups. The GP intervention consisted of an “education resource card” that included RATs for colorectal, lung and prostate cancer, together with summaries of relevant guidelines for colorectal, lung and prostate cancer, with the addition of guidelines for breast cancer and training on the use of these resources. The RATs were based on diagnostic prediction models developed using a patient cohort from the UK . Emery and colleagues (2017)  used the total diagnostic interval (TDI), i.e. the time from first symptom to cancer diagnosis, as an outcome measure.
The RCT by Emery was found to be at low risk of bias (see Table 6). Given the observational nature of the studies by Hamilton and Price , there are a number of concerns regarding their risk of bias.
Emery and colleagues  did not find significant differences in the median or log-transformed (ln) mean time to diagnosis at either intervention level (community intervention vs control, GP intervention vs control) or when analysed by factorial design, tumour group or sub-intervals of the TDI.
Hamilton and colleagues (2013)  reported on changes in investigations carried out and rapid referrals before and after the introduction of the tools. They found a 26% increase in referrals for colorectal cancer and a 15% increase in GP requests for colonoscopies after introduction of the tools. However, only absolute numbers are reported, without data on total numbers of patients and GP visits, or the appropriateness of the referral.
Price and colleagues  did not find any differences in mean 2WW referral rates between practices reporting access to cancer decision-making tools and those who did not: mean difference in referral rate of 3.1 per 100,000 population (95% CI of − 5.5, 11.7). As the study considered RATs and Qcancer for any suspected cancer and 2WW referral rates for any cancer, the specific impact of colorectal cancer-relevant RATs or Qcancer tools on referrals for colorectal cancer cannot be evaluated.
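To make the interpretation of this interval concrete, the sketch below recovers an approximate standard error and two-sided p-value from the reported mean difference and 95% confidence interval. It assumes the interval is symmetric and based on a normal approximation, which may not match the method used in the original analysis, so the derived values are indicative only.

```python
import math


def summarise_interval(estimate, lower, upper, z_crit=1.96):
    """Recover the standard error, z statistic and two-sided p-value from a 95% CI.

    Assumes a symmetric interval built from a normal approximation.
    """
    standard_error = (upper - lower) / (2.0 * z_crit)
    z = estimate / standard_error
    # Two-sided p-value under a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    includes_zero = lower <= 0.0 <= upper
    return standard_error, z, p_value, includes_zero


# Mean difference in 2WW referral rate per 100,000 population, as reported above.
standard_error, z, p_value, includes_zero = summarise_interval(3.1, -5.5, 11.7)
```

Under these assumptions the standard error is roughly 4.4 per 100,000 and the interval includes zero, which is consistent with the reported absence of a difference between the two groups of practices.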
Study results are summarised in Table 7.
This review summarised existing evidence on development, validation, accuracy and impact of prediction models developed to help diagnosis of colorectal cancer in primary care. A large number of prediction models were identified consisting of one-off models and models from the RAT and Qcancer series. Validation and impact assessment of these models in appropriate settings is currently limited, and we found no economic evaluations of any tools.
Currently, most research on developing symptom-based colorectal cancer risk prediction models is concentrated in Europe and, in particular, the UK. Qcancer and RAT are the dominant prediction models, and highlight important knowledge gaps: the Qcancer models are developed on higher quality data (cohort data) than the RATs, and have been externally validated, but lack specific impact assessment. In contrast, the RAT models have more evidence of impact in practice, but were developed from case-control studies and have limited external validation. Ideally, this is an area for further development of the RATs, and the other models that had not been externally validated. This lack of evaluation seems consistent with prediction models in other disease areas .
Other systematic reviews have looked at feature-based cancer diagnostic tools in primary care. Williams and colleagues (2016)  conducted a systematic review of studies that described, validated or assessed the impact of colorectal cancer diagnostic tools. They identified reports on the development and/or validation of 15 models: nine relevant to primary care and six for secondary care. They also identified one study looking at referral patterns (for colorectal cancer RAT ). However, they did not identify any studies that tested whether patients who were diagnosed with the aid of the tool fared better than those who were diagnosed without it. In a similar review, looking at risk prediction models for screening, Usher-Smith and colleagues (2015)  concluded that, even though some of the colorectal cancer prediction models had potential for clinical application, there remains considerable uncertainty about their clinical utility. Similarly, Schmidt-Hansen and colleagues (2017)  conducted a review of lung cancer tools and found limited evidence to support the recommendation of any of the identified risk prediction tools, due to lack of external validation or cost impact assessment.
Our systematic review identified two impact studies, published after the review by Williams et al. , both of which indicating little evidence of an impact from using these tools in primary care. However, it is still difficult to conclude whether these tools have any impact on patient outcomes. For instance, concerns on the quality of the studies makes it unclear whether the lack of effect was due to poor implementation of the tools in practice, insufficient uptake by the GPs or limited marginal contribution of the tools in assessing the risk of cancer. The best quality study (Emery and colleagues 2017 ) failed to show a significant effect; however, the composite intervention used, combining older versions of several instruments (developed on populations from a different country), could have limited the effectiveness of the diagnostic tools. Thus, there is still a need for good quality studies to examine the impact of using prediction model based tools to help colorectal cancer diagnosis in primary care.
Only prediction models were included in our systematic review. Other aids, such as algorithms or guidelines, may be useful but were excluded from this review. In contrast, the systematic review by Elias et al. had much broader inclusion criteria for “model”. That review found a previous version of the NICE guidelines to be the best performing (when validated against the CEDAR dataset). Importantly, that review did not include any of the Qcancer models, which are associated with AUCs greater than those reported for the NICE guidelines.
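As an aside on how the AUC comparison above can be read: the AUC is the probability that a randomly chosen patient with cancer receives a higher predicted risk than a randomly chosen patient without cancer. The following is a minimal sketch of that calculation. The function name `auc` and all risk values are hypothetical illustrations and are not taken from Qcancer, the NICE guidelines or any included study.

```python
def auc(case_scores, control_scores):
    """Pairwise AUC: share of (case, control) pairs in which the case
    has the higher predicted risk; tied pairs count as half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks, invented purely for illustration.
cases = [0.30, 0.12, 0.08, 0.05]            # patients later diagnosed with cancer
controls = [0.10, 0.04, 0.03, 0.02, 0.01]   # patients without cancer

print(auc(cases, controls))  # 18 of 20 pairs are ranked correctly -> 0.9
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect discrimination; the measure says nothing about calibration or about the effect of a tool on patient outcomes.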
The systematic review followed a pre-specified protocol, and the team conducting the review are independent and experienced in systematic review methodology.
Our findings are limited by the quality of the studies included in the systematic review. In particular, the limitations of the impact studies included lack of randomisation, lack of patient-related outcomes and use of tools on populations they were not developed for (e.g. use of a UK-developed tool on an Australian population). The outcome measures used by some of the impact studies also make interpretation difficult: a reported increase in referral rate is hard to interpret without a reasonable assessment of the appropriateness of the referrals or of the subsequent impact on cancer versus non-cancer diagnoses.
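The difficulty with referral rate as an outcome can be illustrated with simple arithmetic. The sketch below assumes two hypothetical periods in the same practice; the function name `referral_summary` and every count are invented for illustration and do not come from any included study.

```python
def referral_summary(patients_seen, referred, cancers_among_referred):
    """Return (referral rate, proportion of referrals diagnosed with cancer)."""
    referral_rate = referred / patients_seen
    cancer_yield = cancers_among_referred / referred
    return referral_rate, cancer_yield


# Hypothetical counts before and after a tool is introduced.
before = referral_summary(patients_seen=1000, referred=50, cancers_among_referred=5)
after = referral_summary(patients_seen=1000, referred=80, cancers_among_referred=5)

for label, (rate, cancer_yield) in (("before", before), ("after", after)):
    print(f"{label}: referral rate {rate:.1%}, cancer yield {cancer_yield:.1%}")
```

In this invented scenario the referral rate rises from 5.0% to 8.0%, yet the same five cancers are found and the proportion of referrals that result in a cancer diagnosis falls from 10.0% to about 6.3%. A higher referral rate alone therefore does not show that more cancers were detected.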
Current evaluations provide limited evidence of the impact on patient outcomes of using feature-based cancer diagnostic tools in primary care. The lack of robust effectiveness data is also likely to be a major limiting factor in assessing their cost-effectiveness. More research is needed to externally validate prediction models that could be used as tools, and to assess the impact of using these tools in clinical practice. However, the choice of study design and outcomes for future evaluations of the impact of tools may not be straightforward. Practical reasons may point to the need for a cluster and pragmatic trial design. Arguably, when average times to diagnosis are compared, patients not prioritised for quick referral are less at risk of being missed. The debate, however, is ongoing on the most appropriate outcomes for evaluating interventions to improve cancer diagnosis and referral.
Availability of data and materials
The data that support the findings of this study are available within the article or its supplementary materials.
Abbreviations
AUC: Area under the curve
CAPER: Cancer Prediction in Exeter
CEDAR: Cost-Effectiveness of a Decision rule for Abdominal complaints in Primary care
CHARMS: CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies
eCDS: Electronic Cancer Decision Support
EMIS: Egton Medical Information Systems
EPOC: Effective Practice and Organisation of Care
NICE: National Institute for Health and Care Excellence
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RAT: Risk Assessment Tool(s)
RCT: Randomised controlled trial
TDI: Total diagnostic interval
THIN: The Health Improvement Network
Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68:394–424.
Ades AE, Biswas M, Welton NJ, Hamilton W. Symptom lead time distribution in lung cancer: natural history and prospects for early diagnosis. Int J Epidemiol. 2014;43(6):1865–73.
Cole SR, Tucker GR, Osborne JM, Byrne SE, Bampton PA, Fraser RJ, Young GP. Shift to earlier stage at diagnosis as a consequence of the National Bowel Cancer Screening Program. Med J Aust. 2013;198(6):327–30.
Richards MA, Westcombe AM, Love SB, Littlejohns P, Ramirez AJ. Influence of delay on survival in patients with breast cancer: a systematic review. Lancet. 1999;353(9159):1119–26.
Elliss-Brookes L, McPhail S, Ives A, Greenslade M, Shelton J, Hiom S, Richards M. Routes to diagnosis for cancer–determining the patient journey using multiple routine data sets. Br J Cancer. 2012;107(8):1220–6.
Richards M. The national awareness and early diagnosis initiative in England: assembling the evidence. Br J Cancer. 2009;101(Suppl 2):S1.
National Institute for Health and Care Excellence. Suspected cancer: recognition and referral. In: NICE guidelines NG12; 2015.
Jones CP, Fallaize RC, Longman RJ. Updated ‘two-week wait’ referral guidelines for suspected colorectal cancer have increased referral volumes without improving cancer detection rates. Br J Med Pract. 2019;12(2):a012.
Vulliamy P, McCluney S, Raouf S, Banerjee S. Trends in urgent referrals for suspected colorectal cancer: an increase in quantity, but not in quality. Ann R Coll Surg Engl. 2016;98(8):564–7.
Hiom S. Diagnosing cancer earlier: reviewing the evidence for improving cancer survival. Br J Cancer. 2015;112:S1–5.
Lyratzopoulos G, Neal RD, Barbiere JM, Rubin GP, Abel GA. Variation in number of general practitioner consultations before hospital referral for cancer: findings from the 2010 National Cancer Patient Experience Survey in England. Lancet Oncol. 2012;13(4):353–65.
Hendriksen JM, Geersing G-J, Moons KG, de Groot JA. Diagnostic and prognostic prediction models. J Thromb Haemost. 2013;11:129–41.
Moons KG, Altman DG, Reitsma JB, Ioannidis JP, Macaskill P, Steyerberg EW, Vickers AJ, Ransohoff DF, Collins GS. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162(1):W1–W73.
Steyerberg EW, Moons KG, van der Windt DA, Hayden JA, Perel P, Schroter S, Riley RD, Hemingway H, Altman DG, PROGRESS Group. Prognosis research strategy (PROGRESS) 3: prognostic model research. PLoS Med. 2013;10(2):e1001381.
Price S, Spencer A, Medina-Lara A, Hamilton W. Availability and use of cancer decision-support tools: a cross-sectional survey of UK primary care. Br J Gen Pract. 2019;69(684):e437–43.
Williams TGS, Cubiella J, Griffin SJ, Walter FM, Usher-Smith JA. Risk prediction models for colorectal cancer in people with symptoms: A systematic review. BMC Gastroenterol. 2016;16(1).
Medina-Lara A, Grigore B, Lewis R, Peters J, Price S, Landa P, Robinson S, Neal R, Hamilton W, Spencer A. Understanding the effectiveness, cost-effectiveness and current use of cancer diagnostic tools to aid decision-making in primary care. In: Health Technology Assessment: National Institute for Health Research; 2020. https://www.journalslibrary.nihr.ac.uk/programmes/hta/161204/#/.
Centre for Reviews and Dissemination. Systematic reviews: CRD's guidance for undertaking reviews in health care. York: Centre for Reviews and Dissemination; 2009.
Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7):e1000097.
Bouwmeester W, Zuithoff NPA, Mallett S, Geerlings MI, Vergouwe Y, Steyerberg EW, Altman DG, Moons KGM. Reporting and methods in clinical prediction research: a systematic review. PLoS Med. 2012;9(5):e1001221.
Hippisley-Cox J, Coupland C. Identifying patients with suspected colorectal cancer in primary care: derivation and validation of an algorithm. Br J Gen Pract. 2012;62(594):e29–37.
Collins GS, Altman DG. Identifying patients with undetected colorectal cancer: an independent validation of QCancer (colorectal). Br J Cancer. 2012;107(2):260–5.
Moons KGM, de Groot JAH, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, Reitsma JB, Collins GS. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11(10):e1001744.
Wolff R, Whiting P, Mallett S. PROBAST: a risk of bias tool for prediction modelling studies. In: Cochrane Colloquium Vienna; 2015.
Higgins JPT, Altman DG, Sterne JAC. Chapter 8: Assessing risk of bias in included studies. In: Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011]. The Cochrane Collaboration; 2011.
Elias SG, Kok L, Witteman BJ, Goedhard JG, Romberg-Camps MJ, Muris JW, de Wit NJ, Moons KG. Published diagnostic models safely excluded colorectal cancer in an independent primary care validation study. J Clin Epidemiol. 2017;82:149–57 e148.
Fijten GH, Starmans R, Muris JW, Schouten HJ, Blijham GH, Knottnerus JA. Predictive value of signs and symptoms for colorectal cancer in patients with rectal bleeding in general practice. Fam Pract. 1995;12(3):279–86.
Marshall T, Lancashire R, Sharp D, Peters TJ, Cheng KK, Hamilton W. The diagnostic performance of scoring systems to identify symptomatic colorectal cancer compared to current referral guidance. Gut. 2011;60(9):1242–8.
Muris JW, Starmans R, Fijten GH, Crebolder HF, Schouten HJ, Knottnerus JA. Non-acute abdominal complaints in general practice: diagnostic value of signs and symptoms. Br J Gen Pract. 1995;45(395):313–6.
Nørrelund N, Nørrelund H. Colorectal cancer and polyps in patients aged 40 years and over who consult a GP with rectal bleeding. Fam Pract. 1996;13(2):160–5.
Kop R, Hoogendoorn M, Teije AT, Büchner FL, Slottje P, Moons LMG, Numans ME. Predictive modeling of colorectal cancer using a dedicated pre-processing pipeline on routine electronic medical records. Comput Biol Med. 2016;76:30–8.
Hamilton W, Round A, Sharp D, Peters TJ. Clinical features of colorectal cancer before diagnosis: a population-based case-control study. Br J Cancer. 2005;93(4):399–405.
Hamilton W. The CAPER studies: five case-control studies aimed at identifying and quantifying the risk of cancer in symptomatic primary care patients. Br J Cancer. 2009;101:S80–6.
Stapley SA, Rubin GP, Alsina D, Shephard EA, Rutter MD, Hamilton WT. Clinical features of bowel disease in patients aged <50 years in primary care: a large case-control study. Br J Gen Pract. 2017;67(658):e336–44.
Hamilton W, Barrett J, Stapley S, Sharp D, Rose P. Clinical features of metastatic cancer in primary care: a case-control study using medical records. Br J Gen Pract. 2015;65(637):e516–22.
Hippisley-Cox J, Coupland C. Symptoms and risk factors to identify men with suspected cancer in primary care: derivation and validation of an algorithm. Br J Gen Pract. 2013;63(606):e1–e10.
Hippisley-Cox J, Coupland C. Symptoms and risk factors to identify women with suspected cancer in primary care: derivation and validation of an algorithm. Br J Gen Pract. 2013;63(606):e11–21.
Holtedahl K, Hjertholm P, Borgquist L, Donker GA, Buntinx F, Weller D, Braaten T, Månsson J, Strandberg EL, Campbell C. Abdominal symptoms and cancer in the abdomen: prospective cohort study in European primary care. Br J Gen Pract. 2018;68(670):e301–10.
Hodder RJ, Ballal M, Selvachandran S, Cade D. Pitfalls in the construction of cancer guidelines demonstrated by the analyses of colorectal referrals. Ann R Coll Surg Engl. 2005;87(6):419–26.
Kop R, Hoogendoorn M, Moons LMG, Numans ME, ten Teije A. On the advantage of using dedicated data mining techniques to predict colorectal cancer. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9105; 2015. p. 133–42.
Hoogendoorn M, Szolovits P, Moons LMG, Numans ME. Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer. Artif Intell Med. 2015;69:53–61.
Hamilton W, Lancashire R, Sharp D, Peters TJ, Cheng K, Marshall T. The risk of colorectal cancer with symptoms at different ages and between the sexes: a case-control study. BMC Med. 2009;7:17.
Stapley S, Peters TJ, Neal RD, Rose PW, Walter FM, Hamilton W. The risk of oesophago-gastric cancer in symptomatic patients in primary care: a large case-control study using electronic records. Br J Cancer. 2013;108(1):25–31.
Hamilton W. Cancer diagnosis in primary care. Br J Gen Pract. 2010;60(571):121–8.
Shephard E, Neal R, Rose P, Walter F, Hamilton WT. Clinical features of kidney cancer in primary care: a case-control study using primary care records. Br J Gen Pract. 2013;63(609):e250–5.
Shephard EA, Hamilton W, Neal RD, Rose PW, Walter FM. Symptoms of adult chronic and acute leukaemia before diagnosis: large primary care case-control studies using electronic records. Br J Gen Pract. 2016;66(644):e182–8.
Shephard EA, Neal RD, Rose P, Walter FM, Litt EJ, Hamilton WT. Quantifying the risk of multiple myeloma from symptoms reported in primary care patients: a large case-control study using electronic records. Br J Gen Pract. 2015;65(631):e106–13.
Shephard EA, Stapley S, Neal RD, Rose P, Walter FM, Hamilton WT. Clinical features of bladder cancer in primary care. Br J Gen Pract. 2012;62(602):e598–604.
Stapley S, Peters TJ, Neal RD, Rose PW, Walter FM, Hamilton W. The risk of pancreatic cancer in symptomatic patients in primary care: a large case-control study using electronic records. Br J Cancer. 2012;106(12):1940–4.
Walker S, Hyde C, Hamilton W. Risk of uterine cancer in symptomatic women in primary care: case-control study using electronic records. Br J Gen Pract. 2013;63(614):e643–8.
Hamilton W, Lancashire R, Sharp D, Peters TJ, Cheng KK, Marshall T. The importance of anaemia in diagnosing colorectal cancer: a case-control study using electronic primary care records. Br J Cancer. 2008;98(2):323–7.
Usher-Smith J, Emery J, Hamilton W, Griffin SJ, Walter FM. Risk prediction tools for cancer in primary care. Br J Cancer. 2015;113:1645.
Lewis JD, Schinnar R, Bilker WB, Wang X, Strom BL. Validation studies of the health improvement network (THIN) database for pharmacoepidemiology research. Pharmacoepidemiol Drug Saf. 2007;16(4):393–401.
Hamilton W, Green T, Martins T, Elliott K, Rubin G, Macleod U. Evaluation of risk assessment tools for suspected cancer in general practice: a cohort study. Br J Gen Pract. 2013;63(606):e30–6.
Emery JD, Gray V, Walter FM, Cheetham S, Croager EJ, Slevin T, Saunders C, Threlfall T, Auret K, Nowak AK, et al. The Improving Rural Cancer Outcomes Trial: a cluster-randomised controlled trial of a complex intervention to reduce time to diagnosis in rural cancer patients in Western Australia. Br J Cancer. 2017;117(10):1459–69.
van Giessen A, Peters J, Wilcher B, Hyde C, Moons C, de Wit A, Koffijberg E. Systematic review of health economic impact evaluations of risk prediction models: stop developing, start evaluating. Value Health. 2017;20(4):718–26.
Schmidt-Hansen M, Berendse S, Hamilton W, Baldwin DR. Lung cancer in symptomatic patients presenting in primary care: a systematic review of risk prediction tools. Br J Gen Pract. 2017;67(659):e396–404.
The authors would like to thank Ms. Jenny Lowe for her support and Dr. Chris Cooper for his contribution.
PROSPERO: “Assessing the impact of diagnostic prediction tools for cancer in primary care: a systematic review” is registered as PROSPERO CRD42017068373, and “Prediction models for aiding cancer diagnosis in primary care: a systematic review” is registered as PROSPERO CRD42017068375.
This report was commissioned by the NIHR HTA Programme as project number 16/12/04.
Ethics approval and consent to participate
Consent for publication
All authors have no conflicts of interest to disclose.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
MEDLINE literature search strategy. Table S2a. Development and validation studies - Characteristics (1). Table S2b. Development and validation studies - Characteristics (2). Table S2c. Development and validation studies - Model development and performance. Table S2d. Development and validation studies – Results. Table S3a. Development and validation studies - Risk of bias assessment, Questions 1 to 3. Table S3b. Development and validation studies - Risk of bias assessment, Questions 4 to 5. Table S4a. Impact Studies – Characteristics. Table S4b. Impact studies - Study Design. Table S4c. Impact studies – Results. Table S5. Impact studies - Critical Appraisal.
Cite this article
Grigore, B., Lewis, R., Peters, J. et al. Development, validation and effectiveness of diagnostic prediction tools for colorectal cancer in primary care: a systematic review. BMC Cancer 20, 1084 (2020). https://doi.org/10.1186/s12885-020-07572-z
- Primary care
- Diagnostic prediction models