Approaches for classifying the indications for colonoscopy using detailed clinical data
© Fassil et al.; licensee BioMed Central Ltd. 2014
Received: 15 October 2013
Accepted: 11 February 2014
Published: 15 February 2014
Accurate indication classification is critical for obtaining unbiased estimates of colonoscopy effectiveness and quality improvement efforts, but there is a dearth of published systematic classification approaches. The objective of this study was to evaluate the effects of data-source and adjudication on indication classification and on estimates of the effectiveness of screening colonoscopy on late-stage colorectal cancer diagnosis risk.
This was an observational study in members of four U.S. health plans. Eligible persons (n = 1039) were aged 55–85 years and had been enrolled for 5 years or longer in their health plans during 2006–2008. Patients were selected based on late-stage colorectal cancer diagnosis in a case–control design; each case patient was matched to 1–2 controls by study site, age, sex, and health plan enrollment duration. Reasons for colonoscopies received in the 10-year period before the reference date were collected from three medical records sources (progress notes; referral notes; procedure reports) and categorized using an algorithm, with committee adjudication of some tests. We evaluated indication classification concordance before and after adjudication and used logistic regressions with the Wald Chi-square test to compare estimates of the effects of screening colonoscopy on late-stage colorectal cancer diagnosis risk for each of our data sources to the adjudicated indication.
Classification agreement between each data-source and adjudication was 78.8-94.0% (weighted kappa = 0.53-0.72); the highest agreement (weighted kappa = 0.86-0.88) was when information from all data sources was considered together. The choice of data-source influenced the association between screening colonoscopy and late-stage colorectal cancer diagnosis; estimates based on progress notes were closest to those based on the adjudicated indication (% difference in regression coefficients = 2.4%, p-value = 0.98), as compared to estimates from only referral notes (% difference in coefficients = 34.9%, p-value = 0.12) or procedure reports (% difference in coefficients = 27.4%, p-value = 0.23).
There was no single gold-standard source of information in medical records. The estimates of colonoscopy effectiveness from progress notes alone were the closest to estimates using adjudicated indications. Thus, the details in the medical records are necessary for accurate indication classification.
There is a critical need for valid comparative effectiveness studies of cancer screening tests, but this is often hampered by uncertainties about the exact reason for testing. This is particularly important for observational studies that seek to determine the effectiveness of colorectal cancer (CRC) screening. There are multiple testing options available for CRC, [1, 2] which differ in the strength of the evidence supporting their use, [3–11] and in their benefits, harms, costs, and complexity [3, 12].
In the United States, colonoscopy is the most commonly used CRC screening test, but it is also used in the diagnosis and surveillance of colorectal neoplasia. Thus, the accurate determination and classification of the reasons for testing are crucial to the validity of observational studies of colonoscopy’s effectiveness and for guiding quality improvement efforts. Further, the documented test indication, such as a prior diagnosis of adenoma or family history of CRC, guides clinicians in making follow-up recommendations [1, 16, 17]. However, there is currently a paucity of published studies on the process of using clinical data to assign indication.
The true indication for colonoscopy is the clinical rationale for the referral for testing, but this is difficult to measure from medical records or administrative data because the reasons for testing are not consistently documented. Assigning an indication may also be difficult due to the multiplicity of reasons often recorded for a particular test or when common gastrointestinal symptoms, which have a low predictive value for CRC diagnosis, [19–21] are recorded at the time a colonoscopy is recommended or performed. Therefore, colonoscopy indication derived from clinical or administrative data may be misclassified, leading to biased results in observational studies of screening colonoscopy effectiveness.
This study describes an algorithm and an adjudication approach for classifying colonoscopy indications using clinical data. We also determined the extent to which estimates of colonoscopy effectiveness based on pre-adjudication indication classification differed from an adjudicated reference standard by estimating the effect of screening colonoscopy on the risk of diagnosis with incident late-stage CRC.
The currently used approaches and published algorithms for assigning indication have not been validated against a standardized classification approach. Previous studies on classifying colonoscopy indication have simply been based on diagnosis and procedure codes in administrative or claims data that indicate the presence or absence of gastrointestinal-related procedures, signs, symptoms or conditions [22–25]. These algorithms can produce different classification results, depending on the codes used or the length of time prior to the test that was evaluated for ascertaining the presence or absence of gastrointestinal conditions. This can lead to unexpected results when evaluating the effectiveness of colonoscopy in observational data, [15, 26] underscoring the need for a standardized approach for indication classification.
The data were obtained from a case–control study of the comparative effectiveness of CRC screening tests. Study patients were 55–85 years old between January 1, 2006 and December 31, 2008 and had been enrolled for ≥5 years in one of the following managed care plans: Group Health Cooperative, Washington State; Kaiser Permanente Hawaii; Kaiser Permanente Northwest; and Reliant Medical Group/Fallon Community Health Plan, Massachusetts. These health plans have used electronic medical records systems since at least 2005 and have electronic healthcare utilization data dating back to 1995 or earlier. This study was approved by the Institutional Review Boards at the University of Pennsylvania, the University of Massachusetts Medical School (UMMS), Group Health Research Institute (GHRI), and through ceded human subjects oversight authority from Reliant Medical Group to UMMS, and from Kaiser Permanente Hawaii and Kaiser Permanente Northwest to GHRI.
The outcome of the study was a diagnosis of incident late-stage CRC, defined as American Joint Committee on Cancer (Sixth Edition) stage IIB or higher based on tumor registry data [4, 27]. Each patient with late-stage CRC (n = 498) was matched on the diagnosis (reference) date to 1–2 CRC-free controls (n = 541) by study site, birth year, sex, and health plan enrollment duration, as described elsewhere. Data on the matching variables, socioeconomic factors, and patients’ clinical history were collected from electronic databases, tumor registry, and census data. Information on family history of CRC was obtained from electronic or paper medical records.
The primary interest in this report was the concordance of indication across multiple data sources for colonoscopies received during the 10-year period before the reference date (observation period), which was determined from data collected from each patient’s medical records (see Additional file 1: Appendix A). Trained abstractors, one each at three study sites and two at one site, performed the medical record audits. Audits were standardized through training and retraining and through the use of a common, structured electronic data collection instrument that was developed in Microsoft Access. The data collection tool was pre-populated with patient demographics, health care utilization history and the dates of CRC tests that were extracted from electronic databases using, in part, codes from the International Classification of Diseases, 9th Edition, Clinical Modification, Current Procedural Terminology and Healthcare Common Procedure Coding System. For each test found in the medical records, the auditors collected up to three documented reasons, separately, from each of three data sources (progress notes, referral note, and procedure report) according to 28 pre-coded categories (see Additional file 1: Appendix B). Auditors also collected reason-related information in free-text format. We defined the progress notes as all parts of the medical records other than the referral note and procedure-related documentation.
Similar data were collected on sigmoidoscopy, double contrast barium enema (BE), and CT colonography (CTC), which aided in indication classifications. Detailed data on fecal occult blood test (FOBT) restricted to the 5-year period before the reference date were also collected, including whether a test was positive or negative and the type of diagnostic test received following positive results. Auditors coded FOBT reasons as screening, diagnostic, surveillance, other, or unknown.
A colonoscopy was classified as surveillance if performed for follow-up of previously detected polyps; ‘definite’ diagnostic if used to work-up a positive FOBT, a mass or other abnormal finding; ‘probable’ diagnostic if the medical records noted clinical conditions that were deemed to represent a high pretest probability for CRC, such as rectal bleeding; ‘possible’ diagnostic if the only documented reasons were non-specific medical conditions such as diarrhea or abdominal pain; or ‘probable’ screening if both non-specific symptoms and screening were recorded. The indication was considered ‘high-risk’ screening if the test was performed for screening and the patient had a first-degree relative diagnosed with CRC before age 50, two or more second-degree relatives diagnosed at any age, or other familial syndromes. The indication was considered ‘definite’ average-risk screening if screening was recorded and none of the CRC conditions or risk factors noted above were recorded. The indication was considered unknown if the reason was not specifically documented.
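The categories above can be expressed as a small rule cascade. The following is a minimal sketch only: the reason codes (such as `polyp_follow_up` or `positive_fobt`), the grouping of constipation with the non-specific conditions, and the order in which the rules are checked are all illustrative assumptions, since the study used 28 pre-coded categories and its exact precedence is defined in its own algorithm figure.

```python
from enum import Enum


class Indication(Enum):
    SURVEILLANCE = "surveillance"
    DEFINITE_DIAGNOSTIC = "definite diagnostic"
    PROBABLE_DIAGNOSTIC = "probable diagnostic"
    POSSIBLE_DIAGNOSTIC = "possible diagnostic"
    PROBABLE_SCREENING = "probable screening"
    HIGH_RISK_SCREENING = "high-risk screening"
    DEFINITE_SCREENING = "definite average-risk screening"
    UNKNOWN = "unknown"


# Hypothetical reason codes, for illustration only.
ABNORMAL_FINDINGS = {"positive_fobt", "mass", "abnormal_finding"}
HIGH_PRETEST_CONDITIONS = {"rectal_bleeding"}
NONSPECIFIC_CONDITIONS = {"diarrhea", "abdominal_pain", "constipation"}


def classify(reasons, high_risk_family_history=False):
    """Assign one indication to a test from its recorded reason codes.

    The order of the checks is an assumption, not the study's documented order.
    """
    reasons = set(reasons)
    if "polyp_follow_up" in reasons:
        return Indication.SURVEILLANCE
    if reasons & ABNORMAL_FINDINGS:
        return Indication.DEFINITE_DIAGNOSTIC
    if reasons & HIGH_PRETEST_CONDITIONS:
        return Indication.PROBABLE_DIAGNOSTIC
    screening = "screening" in reasons
    nonspecific = bool(reasons & NONSPECIFIC_CONDITIONS)
    if screening and nonspecific:
        return Indication.PROBABLE_SCREENING
    if nonspecific:
        return Indication.POSSIBLE_DIAGNOSTIC
    if screening and high_risk_family_history:
        return Indication.HIGH_RISK_SCREENING
    if screening:
        return Indication.DEFINITE_SCREENING
    return Indication.UNKNOWN  # no reason specifically documented
```

For example, `classify({"screening", "diarrhea"})` yields the ‘probable’ screening category, whereas `classify({"diarrhea"})` yields ‘possible’ diagnostic.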
The algorithm assigned each test a single indication irrespective of the number of reasons (or missing data) recorded by chart auditors (see Figure 2). We therefore identified tests that could have been misclassified in order to review all available indication-related data. This review was conducted in two steps. The first step determined whether or not a particular test required a formal review by an adjudication panel of experts. Tests were selected for the first-tier review if more than one indication could be assigned, or indication was unknown in all data sources (Figure 2). For instance, a test was selected for review if the referral note recorded both constipation and average-risk screening or the indication differed (including unknown) across data sources (e.g., classified as ‘probable’ diagnostic based on referral note but ‘probable’ screening from progress notes). Because non-coded information was not included in the algorithm, we also reviewed all tests that had data in relevant free-text variables.
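The selection criteria for the first-tier review can be sketched as a single predicate. This is an illustrative sketch, assuming each test carries one algorithm-derived label per data source (with the string "unknown" for missing indications); the function and parameter names are hypothetical.

```python
def needs_first_tier_review(per_source, multiple_possible=False, has_free_text=False):
    """Return True if a test should go to first-tier review.

    per_source: dict mapping a data source name (e.g. "progress", "referral",
        "procedure") to the indication label derived from that source.
    multiple_possible: True if more than one indication could be assigned
        within a source (e.g. constipation plus average-risk screening).
    has_free_text: True if any relevant free-text variable holds data.
    """
    labels = set(per_source.values())
    all_unknown = labels <= {"unknown"}  # also True when no source has data
    discordant = len(labels) > 1         # labels differ, including vs. unknown
    return multiple_possible or all_unknown or discordant or has_free_text
```

A test labelled ‘probable’ diagnostic from the referral note but ‘probable’ screening from the progress notes would be flagged, as would a test whose indication is unknown in every source.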
Three investigators and a research assistant (KA and see acknowledgement) performed the first-tier reviews of indication data (in pairs). At this review, tests that had additional pertinent indication-related information in free-text data or had substantive discordance across data sources were submitted for adjudication. Discordance due to classification as ‘definite’ diagnostic versus ‘probable’ diagnostic was considered non-substantive. We required consensus by both reviewers for a test to bypass adjudication. All tests classified as ‘high-risk’ screening were adjudicated to evaluate the details of the CRC risk. Once a test was selected for the first-tier review or adjudication, all the CRC tests of the particular patient (except FOBTs) were evaluated at the first-tier review, and/or adjudication, as appropriate. Of the 647 colonoscopies observed in the sample, 454 underwent the first-tier review of which 304 were reviewed by the adjudication panel (Figure 2).
We formed a 5-member panel of experts comprising epidemiologists, internists and gastroenterologists (DAC, VPDR and see acknowledgement), and a non-voting chair (CAD) to evaluate indication for the selected tests. The goal of adjudication was to classify each test according to the predetermined categories in Figure 1, after careful review of all available data. The adjudication committee reviewed tests blinded to the case–control status; study site; test type and exact dates; and, in the case of patients with multiple tests, whether a particular test was the trigger for adjudication. However, they were given the sequence and results of FOBTs and the sequence and type of health care visits.
In assigning indication, the committee considered clinical conditions that were documented as reasons for CRC testing, in part, by grouping them as strong versus non-specific based on the pretest probability of CRC associated with each condition (Additional file 1: Appendix C) [29, 30]. Because gastrointestinal conditions are highly prevalent but are individually not highly predictive for CRC diagnosis [19, 20, 31], the grouping of clinical conditions was largely based on panel consensus. Disagreements among committee members on indication assignment were resolved using a majority rule. However, tests classified by different committee members as both screening and diagnostic were discussed until a consensus was reached.
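The committee's decision rule can be sketched as follows. This is a hedged illustration: the label strings are hypothetical, and treating a vote without a strict majority as needing discussion is an assumption, because the text does not say how ties were handled.

```python
from collections import Counter

SCREENING = {"definite screening", "probable screening", "high-risk screening"}
DIAGNOSTIC = {"definite diagnostic", "probable diagnostic", "possible diagnostic"}


def adjudicate(votes):
    """Return the panel's indication, or None if discussion is required.

    votes: list of indication labels, one per voting committee member.
    """
    if not votes:
        return None
    cast = set(votes)
    # A split between screening and diagnostic is never settled by counting
    # votes; the panel discusses the test until it reaches consensus.
    if cast & SCREENING and cast & DIAGNOSTIC:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Assumption: without a strict majority the test also goes to discussion.
    return label if count > len(votes) / 2 else None
```

With five voters, three votes for surveillance and two for ‘probable’ diagnostic resolve to surveillance, whereas four votes for screening and one for any diagnostic category return `None` and go to discussion.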
Patients with multiple colonoscopies (n = 88) during the observation period were assigned a single patient-level indication in a temporally hierarchical manner by considering both the indication and the sequence of colonoscopies in relation to the reference date. We selected the ‘definite’ screening test with a test date that was farthest from the reference date; if none, then we used the earliest ‘probable’ screening colonoscopy; and if none, then ‘possible’ diagnostic, ‘probable’ diagnostic and finally ‘definite’ diagnostic colonoscopy, in that order. The indication was classified as surveillance if the first colonoscopy was for surveillance and there was no subsequent screening test.
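A minimal sketch of this temporal hierarchy is shown below. It assumes each test is a `(test_date, indication)` pair with hypothetical label strings. Choosing the earliest test within each diagnostic category, and falling back to the earliest test when no rule applies, are assumptions not stated in the text.

```python
from datetime import date

SCREENING_ORDER = ("definite screening", "probable screening")
DIAGNOSTIC_ORDER = ("possible diagnostic", "probable diagnostic", "definite diagnostic")


def select_patient_level_test(tests):
    """Pick the single test that defines a patient's indication.

    tests: list of (test_date, indication) tuples within the observation period.
    Returns the chosen (test_date, indication) tuple, or None if there are no tests.
    """
    ordered = sorted(tests, key=lambda t: t[0])  # earliest = farthest from reference date

    def earliest(label):
        return next((t for t in ordered if t[1] == label), None)

    for label in SCREENING_ORDER:
        hit = earliest(label)
        if hit is not None:
            return hit
    # No screening test at all: surveillance applies if it came first.
    if ordered and ordered[0][1] == "surveillance":
        return ordered[0]
    for label in DIAGNOSTIC_ORDER:
        hit = earliest(label)
        if hit is not None:
            return hit
    return ordered[0] if ordered else None
```

For a patient with a ‘definite’ diagnostic colonoscopy in 2003 and ‘probable’ screening colonoscopies in 2000 and 2005, the sketch selects the 2000 screening test.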
For this report, we categorized the indication as routine screening (‘probable’ or ‘definite’ average-risk screening), ‘high-risk’ screening, surveillance, ‘possible’ diagnostic, diagnostic (‘definite’ or ‘probable’ diagnostic), or unknown. Analyses were performed on both test-level (each colonoscopy, n = 647) and patient-level (n = 524) classifications. Pair-wise analyses compared the proportion classified in each of the six indication categories among data sources and with adjudication.
We calculated the percent concordance with adjudicated indication, for each data source individually and for all sources combined, in both test- and patient-level analyses. In these analyses, we considered all indication categories at the same time using a categorical variable, and combined routine and ‘high-risk’ screening into a single ‘screening’ category for ease of interpretation.
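Percent concordance is simply the share of tests (or patients) for which a source's label matches the adjudicated label after collapsing the two screening categories. The sketch below illustrates the calculation; the label strings and function names are hypothetical.

```python
def collapse(label):
    """Merge routine and 'high-risk' screening into one 'screening' category."""
    return "screening" if label in {"routine screening", "high-risk screening"} else label


def percent_concordance(source_labels, adjudicated_labels):
    """Percentage of paired labels that agree after collapsing screening."""
    if len(source_labels) != len(adjudicated_labels) or not source_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum(
        collapse(s) == collapse(a)
        for s, a in zip(source_labels, adjudicated_labels)
    )
    return 100.0 * agree / len(source_labels)
```

As a toy example, if three of four tests carry the same collapsed label in a source and after adjudication, the concordance is 75%.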
We also computed the kappa (κ) coefficient of agreement using quadratic weights that treated the distinction between screening and diagnostic as the most important. The kappa statistic was interpreted according to Byrt’s recommendation (≤0.00 = no agreement; 0.01-0.20 = poor; 0.21-0.40 = slight; 0.41-0.60 = fair; 0.61-0.80 = good; 0.81-0.92 = very good; and >0.92 = excellent agreement). Kappa accounts for the probability of chance by considering both the observed and expected agreements. Thus, it can be spuriously low when expected agreement is high, as could occur in the case of indication classification due to high correlation among data sources. Therefore, we based our interpretation primarily on unweighted percentage concordance.
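For readers unfamiliar with the statistic, a quadratic-weighted kappa can be computed as κ_w = 1 − Σ w_ij·O_ij / Σ w_ij·E_ij, where O and E are the observed and chance-expected proportions and w_ij = (i − j)² / (k − 1)² for k ordered categories. The sketch below is a generic implementation and not the study's Stata code. The category ordering passed in is an assumption: the text says only that the weights emphasized the screening versus diagnostic distinction, which suggests placing those categories at opposite ends.

```python
def quadratic_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with quadratic disagreement weights.

    categories: list of labels in the assumed order; labels farther apart
        in this list are penalized more heavily when they disagree.
    """
    k = len(categories)
    n = len(rater_a)
    if k < 2 or n == 0 or n != len(rater_b):
        raise ValueError("need at least two categories and equal-length, non-empty ratings")
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]

    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - weighted_observed / weighted_expected
```

Perfect agreement gives κ_w = 1, and agreement no better than chance gives κ_w = 0.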
Next, we evaluated whether differences in the data sources and classification approach for indication influenced estimates of the association between exposure to routine screening colonoscopy and diagnosis with late-stage CRC. In secondary analyses, we used the expanded screening definition that included ‘high risk’ screening. Analyses were performed with conditional logistic regression models, adjusting for census block-group poverty levels (in quartiles), number of preventive health care visits, family history of CRC, modified Charlson comorbidity index at baseline, and receipt of other screening tests. We then computed the percentage difference in beta coefficients between the algorithm-derived screening indications and the adjudicated standard, and used two-sided Wald χ² P-values to evaluate the statistical significance of the differences. In our regression analyses, we accounted for the period of preclinical late-stage CRC by excluding tests performed within one month of the reference date, as described in a previous report. The analyses were performed using Stata version 12.1 (StataCorp, College Station, TX, USA).
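The percentage difference in beta coefficients can be illustrated from two odds ratios, since each beta is the natural logarithm of the corresponding odds ratio. The sketch below assumes the difference is expressed relative to the adjudicated coefficient and reported as an absolute value; the text does not give the exact formula, and the odds ratios in the usage note are made-up numbers rather than study results.

```python
import math


def pct_difference_in_beta(or_source, or_adjudicated):
    """Absolute % difference between log odds ratios, relative to adjudication."""
    beta_source = math.log(or_source)
    beta_adjudicated = math.log(or_adjudicated)
    if beta_adjudicated == 0.0:
        raise ValueError("undefined when the adjudicated odds ratio equals 1")
    return 100.0 * abs(beta_source - beta_adjudicated) / abs(beta_adjudicated)
```

With hypothetical odds ratios of 0.25 from one source and 0.50 from adjudication, the log odds ratios differ by a factor of two, so the function returns 100%.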
Table: Demographic and clinical characteristics of cases and controls, SEARCH Study 2006–2008, n = 1,039
Row variables: sample, n = 1,039; poverty levels, quartiles*; length of enrollment with health plan before reference date, yr; number of preventive outpatient health care visits within 5 years of reference date; family history of colorectal cancer (CRC)†; Charlson comorbidity index at baseline‡; had a healthcare visit during the 2-year period at baseline‡; had ≥2 colonoscopies.
The algorithm-based indications of the colonoscopies reviewed by the committee were: screening = 21, ‘high-risk’ = 21, surveillance = 80, ‘possible’ diagnostic = 8, diagnostic = 170, and unknown = 4 (Additional file 1: Appendix D). After the review, 16 (76.2%) indications previously classified as screening remained unchanged, but the remaining five were reclassified as ‘possible’ diagnostic (n = 2), diagnostic (n = 2) and surveillance (n = 1). Nineteen of the 21 ‘high-risk’ tests (90.5%), six of the 170 diagnostic (3.5%), one of the eight ‘possible’ diagnostic (12.5%) and two of the 80 surveillance tests (2.5%) were reclassified as screening. The majority of diagnostic tests (n = 155, 91.2%) remained unchanged; five were reclassified as ‘possible’ diagnostic, three as surveillance, and one as ‘high-risk’ screening. Only one of the four ‘unknowns’ remained unchanged, with one each of the remaining three reclassified as surveillance, ‘possible’ diagnostic and diagnostic.
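The counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported above and confirms that the category totals sum to the 304 adjudicated colonoscopies and that the stated percentages follow from the counts.

```python
# Algorithm-based indications of the tests reviewed by the committee.
before = {
    "screening": 21,
    "high-risk": 21,
    "surveillance": 80,
    "possible diagnostic": 8,
    "diagnostic": 170,
    "unknown": 4,
}
total_reviewed = sum(before.values())

# Tests reclassified as screening after adjudication, by original category.
to_screening = {
    "high-risk": 19,
    "diagnostic": 6,
    "possible diagnostic": 1,
    "surveillance": 2,
}
pct_to_screening = {
    category: round(100 * moved / before[category], 1)
    for category, moved in to_screening.items()
}

# Fate of the 170 tests originally classified as diagnostic:
# unchanged, screening, 'possible' diagnostic, surveillance, 'high-risk'.
diagnostic_fates = [155, 6, 5, 3, 1]

print(total_reviewed)
print(pct_to_screening)
print(sum(diagnostic_fates))
```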
In the patient-level analyses (n = 524), there was fair-to-good agreement in exposure classification among the three sources (76.9% to 82.3%, κ = 0.56-0.65) (Figure 4). Compared to adjudication, there was fair-to-good agreement with each of the data sources (progress note 80.2%, κ = 0.58; referral 84.0%, κ = 0.66; procedure report 88.0%, κ = 0.71); the highest level of agreement was with all sources combined (93.9%, κ = 0.88).
Table: Association between screening colonoscopy and risk of incident late-stage CRC according to data source, SEARCH Study 2006–2008, n = 1,039
Columns: data source according to screening definition used; odds ratio and 95% CI; % difference in beta coefficients; P-value of difference*.
Row groups: screening defined as ‘probable’ or ‘definite’ (including all sources combined); same definition as above plus ‘high-risk’ screening exposures (including all sources combined).
This study compared the information from different clinical data sources for colonoscopy indication classification and found generally good agreement among the progress notes, referral note, and procedure report. However, there were differences between sources in the classification of tests as screening and the extent of missing information. After adjudication, most patients classified as ‘high-risk’ were determined to be average-risk screening. Indication classification without expert review resulted in a 2.4-34.9% deviation from the adjudicated standard in the estimated effects of screening colonoscopy. We found that, although the direction of the association between screening colonoscopy and late-stage CRC diagnosis risk was not changed by the indication data source, analyses with information from the progress notes alone or in combination with referral and procedure reports produced results that were closest to those from the indication derived through adjudication.
The literature provides no consistent method for determining CRC test indication and no previously published studies have described the use of adjudication in systematically assigning indication. Most reports using medical records derive indication from the procedure report alone and in some cases the source of the indication information in the medical records was not clearly described [18, 33–36]. Our findings suggest that approaches using only the procedure report or referral notes may be subject to a greater degree of misclassification, possibly because the indication documented may be influenced by examination findings or the need to obtain third-party payer approval for the referral.
Our study has several important implications. First, compared with adjudication, all of the sources of information demonstrated some misclassification, particularly for ‘high-risk’ indications. Second, the procedure report had the fewest missing indications, but produced effect sizes that differed slightly more from the adjudicated results than the progress notes. Third, the progress notes data produced estimates of screening that were consistently closest to those from adjudication, suggesting that the details from progress notes are important for accurate indication classification. Thus, our study suggests that review of data in the progress notes in medical records, including detailed information on clinical conditions documented around the time of the test, is required to produce valid results in observational studies of CRC screening effectiveness. Finally, if resources are limited, adjudication of indication may focus on ‘high-risk’ and ‘unknown’ test indications. If adjudication is not performed, then, given the relative rarity of ‘high-risk’ indications, including them as screening is preferable to excluding them in analyses of effects on average-risk persons.
This study has some limitations. Because the original study was for average-risk persons, some high-risk patients were excluded at the time of patient selection. Therefore, tests for high-risk indications may be underrepresented in this analysis. Abstractors were not blinded to the source of information in the medical records, possibly contributing to the high correlation of indication across data sources. Also, not all tests were adjudicated, and reviewers did not have access to all the medical record data, including detailed information on the duration and severity of clinical conditions that were recorded as reasons for testing. Further, the distribution of colonoscopy indications, and thus the usefulness and necessary extent of adjudication, may vary across settings, depending on population demographics and reimbursement policies. Future larger studies in non-managed care settings and in different settings or populations are needed to establish the benefits of obtaining data from multiple sources and conducting adjudication for indication classification. Additional studies are also needed to evaluate the impact of indication misclassification on estimates of the effectiveness of colonoscopy for reducing risk of CRC death. Further, the approaches described in this paper can be applied to evaluate the degree to which indication misclassification biases results of colonoscopy effectiveness in studies based on administrative data.
Careful classification of indication is important in observational research on the comparative effectiveness of CRC screening tests and in the quality improvement of CRC testing. In our study, we found no single gold-standard source of information in the medical records for indication classification that agreed consistently with expert adjudication, and the data sources were complementary in achieving better indication classification. Adjudication changed the classification of some indications and the data-source differences we observed resulted in some deviations in the odds ratios for the association between screening colonoscopy and late-stage CRC risk. The deviations from the adjudicated standard for this association were smaller with progress notes information than with other sources alone. Therefore, careful standardized reviews of information in the progress notes, referral notes and procedure report are necessary for accurate classification of colonoscopy indication.
This study was performed as part of a multicenter cancer screening comparative effectiveness research project, SEARCH (Screening Effectiveness and Research in Community-based Healthcare), which was supported by Grant Number UC2CA148576 from the National Institutes of Health (NIH)/National Cancer Institute (NCI) to Drs. Buist and Doubeni. The study was also supported by Grant Number U01CA151736 from the National Institutes of Health (NIH)/National Cancer Institute (NCI) to Dr. Doubeni. Dr. Doubeni’s time was also supported by the following grants from the NIH/NCI: K01CA127118 and K01CA127118-S1. The contents of this report are solely the responsibility of the authors and do not necessarily represent the official views of the NIH/NCI. Data collection on cancer incidence for this study was supported in part by data infrastructure developed by the HMO Cancer Research Network at participating sites. Group Health Research Institute’s Cancer Surveillance System is funded in part by Contract # N01-CN-67009 and N01-PC-35142 from the Surveillance, Epidemiology and End Results Program of NCI with additional support from the State of Washington. We are grateful to Robert H. Fletcher, M.D., M.Sc.; Noel S. Weiss, M.D., Dr.P.H; and Theodore R. Levin, M.D. who served on the indication adjudication committee; to Drs. Robert Greenlee and Rosalie Torres Stone and Mr. Shawn J. Gagne for reviewing the data prior to adjudication; and to study coordinators and medical records auditors; and to Dr. Sayantani Ghosh, MBBS for help with manuscript preparation.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.