Many quality measurements, but few quality measures assessing the quality of breast cancer care in women: A systematic review

Background: Breast cancer in women is increasingly frequent, and care is complex, onerous and expensive, all of which lend urgency to improvements in care. Quality measurement is essential to monitor effectiveness and to guide improvements in healthcare.
Methods: Ten databases, including Medline, were searched electronically to identify measures assessing the quality of breast cancer care in women (diagnosis, treatment, followup, documentation of care). Eligible studies measured adherence to standards of breast cancer care in women diagnosed with, or in treatment for, any histological type of adenocarcinoma of the breast. Reference lists of studies, review articles, web sites, and files of experts were searched manually. Evidence appraisal entailed dual independent assessments of data (e.g., indicators used in quality measurement). The extent of each quality indicator's scientific validation as a measure was assessed. The American Society of Clinical Oncology (ASCO) was asked to contribute quality measures under development.
Results: Sixty relevant reports described 58 studies with 143 indicators assessing adherence to quality breast cancer care. A paucity of validated indicators (n = 12), most of which assessed quality of life, only permitted a qualitative data synthesis. Most quality indicators evaluated processes of care.
Conclusion: While some studies revealed patterns of under-use of care, all adherence data require confirmation using validated quality measures. ASCO's current development of a set of quality measures relating to breast cancer care may hold the key to conducting definitive studies.


Background
Cancer is the second most common cause of death (after cardiovascular disease) in North Americans, and breast cancer is the most commonly diagnosed cancer in women [1]. It was estimated in 2004 that one in seven American women would develop breast cancer in her lifetime [1], up from one in eight estimated in 2003 [2].
During the 1990s, American women's 5-year breast cancer survival rates improved on average 2.3% per year, with the largest improvements for younger women. However, survival rates are generally lower for African American women, with 30% excess deaths compared with white women estimated in the year 2000 [3]. Thus, while some aspects of breast cancer care (e.g., earlier detection) have contributed to improved survival, increasing rates of disease and disparities in outcomes point to outstanding issues to be addressed.
The enormous toll on women, families and society makes it urgent that breast cancer care be as effective, safe, accessible and equitable as possible. The foundation of this investigation must be sound research to refine what represents "quality care" (e.g., timely access to efficacious and safe treatments). Only by measuring and monitoring adherence to recommended care can meaningful trends and gaps in the delivery, receipt and outcomes of care be identified and put in context, at all levels, from individual centers to the national and global levels [4].
Health care quality measurement is an emerging field, developing alongside the establishment of goals for health care delivery and utilization. Ideally, stakeholders within the health care system will assess internal quality improvement and accountability, and oversee external health care quality, by appropriate measurement of the rates of adherence to recommended care. This would guide policy, the provision of care, and future research directions.
The quality of health care is "the degree to which healthcare services ... increase the likelihood of desired health outcomes and are consistent with current professional knowledge" [5]. Despite more than a trillion dollars spent annually on health care in general in the USA, however, it is suggested that the care received by Americans falls well short of ideal [6]. On average, almost half of those in need do not receive recommended care [7,8].
Health care quality measurement may address a question such as: How many women in a given clinical situation (e.g., diagnosis, treatment history) receive a standard of care (e.g., radiation following surgery) within a specific time frame? Similar questions, yet ones which might yield different results, could be: How many health care practitioners offer or deliver a particular standard of care to women in a specific clinical situation? Patient refusal of care may account for discrepancies in rates identified by these questions.
The assessment of the delivery or receipt of quality health care may seem deceptively straightforward, with large quantities of data available in health care records or cancer registries, for example. These data sources permit the measurement of rates of adherence to recommended health care processes (e.g., a competent and timely action by the health care practitioner), structures (e.g., the availability of diagnostic imaging equipment), or outcomes (e.g., event-free survival; quality of life).
It is not sufficient simply to compile information from health records, health care providers or patients related to a definition of quality care (e.g., if diagnosis X, then deliver care Y within Z weeks). Scientific validation is needed to ensure that data specifically and repeatedly reflect details defining the care in question; that measurements accurately reflect patterns of practice. Indeed, without ensuring scientific soundness, a definition given to individuals extracting data from medical records, or used to solicit information from other data sources (e.g., patients), may complicate or even prevent the identification of what was intended. Unless health care indicators survive the rigors of a scientific process and are found to have sound psychometric properties, they cannot formally be considered quality "measures" per se. In the absence of validated measures, observations may be misleading.
What, then, are the requirements for the development of a sound quality measure?
The definition of quality care should be evidence-based [7], possibly with a subsequent expert consensus process, and with details that are precisely expressed (e.g., in a clinical practice guideline). For example, quality care for women with early stage breast cancer entails the receipt of radiotherapy following breast-conserving surgery. Evidence from randomized controlled trials (RCTs) has shown that this less invasive, less disfiguring strategy brings survival outcomes identical to those following mastectomy. This definition of quality care is considered an indicator of quality care, or quality indicator (e.g., if early stage breast cancer in women, then radiotherapy following breast-conserving surgery within a specific timeframe). Establishing the rate of adherence to this quality (care) indicator according to a specific data source amounts to quality measurement.
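The radiotherapy example above can be sketched in code. The following is a minimal illustration, not a validated measure: the `Case` record, its field names, and the 6-week default window are assumptions introduced here for illustration only (the indicator itself specifies just "a specific timeframe").

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Case:
    """Hypothetical record abstracted from one data source (e.g., a chart)."""
    early_stage: bool                    # denominator condition: early stage breast cancer
    bcs_date: Optional[date]             # date of breast-conserving surgery, if any
    radiotherapy_start: Optional[date]   # date radiotherapy began, if any


def is_eligible(case: Case) -> bool:
    # "If early stage breast cancer in women" treated with breast-conserving surgery.
    return case.early_stage and case.bcs_date is not None


def is_adherent(case: Case, window_weeks: int) -> bool:
    # "Then radiotherapy following breast-conserving surgery within a specific timeframe."
    if case.radiotherapy_start is None:
        return False
    delay = case.radiotherapy_start - case.bcs_date
    return timedelta(0) <= delay <= timedelta(weeks=window_weeks)


def adherence_rate(cases: List[Case], window_weeks: int = 6) -> Optional[float]:
    """Share of eligible cases that received the indicated care in time.

    The 6-week default is an illustrative assumption, not a recommended standard.
    Returns None when no case is eligible, since the rate is then undefined.
    """
    eligible = [c for c in cases if is_eligible(c)]
    if not eligible:
        return None
    return sum(is_adherent(c, window_weeks) for c in eligible) / len(eligible)
```

Separating eligibility (the denominator) from adherence (the numerator) mirrors the "if ..., then ..." form of the indicator. Note that the sketch records only whether care was received; it does not capture reasons for non-receipt such as patient refusal.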
The definition of a quality indicator must be specific, complete, and clearly worded regarding, for instance, the target population (e.g., women with specific diagnoses) and the characteristics of the care (e.g., the order, type and timing of care). It must be verified that different users share the same meaning and therefore make the same observations when, on different occasions, they consult various data sources (e.g., clinic or hospital records) to gather data. This verifies an indicator's reliability as a quality measure.
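One common way to quantify whether two users "make the same observations" is a chance-corrected agreement statistic. The sketch below uses Cohen's kappa, which is an assumption made here for illustration; the text does not prescribe a particular statistic. The function name and the label values are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who each labeled the same cases.

    Each sequence holds one label per case (e.g., "adherent" / "not adherent"),
    in the same case order for both raters.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must label the same, non-empty set of cases.")
    n = len(rater_a)

    # Observed agreement: proportion of cases given identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected chance agreement, from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return math.nan
    return (observed - expected) / (1.0 - expected)
```

For example, two abstractors reviewing the same ten charts would each supply a list of ten labels. A value near 1 suggests the indicator's wording is being read the same way by both; a value near 0 suggests agreement no better than chance.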
Additional scientific validation is necessary to increase the confidence that the measured rate of adherence reflects the actual delivery/receipt of particular care (e.g., "percentage of women receiving radiotherapy after breast-conserving surgery"). Along with reliability, sound validity indicates and ensures that observations unambiguously reflect what was intended to be identified. For example, only data pertaining to the details circumscribed by the quality indicator should be sought and collected (e.g., the request to identify clinical outcome data should not result in extraction of data for surrogate measures).

Project Scope
Our rationale in conducting this systematic review was to identify extant quality measures, which could be employed by stakeholders (e.g., service providers) to assure or improve the quality of breast cancer care in women. While it is our view that validated quality measures are required to appropriately ascertain the quality of breast cancer care in women, both formally developed quality measures, as well as quality indicators having received little or no scientific development, were eligible for inclusion in our review. Evaluating the soundness of the empirical evidence supporting recommended standards of care would have added value to this project, but practical constraints made this impossible.

Methods
A seven-member Technical Expert Panel (TEP) provided advisory support, including refining the questions, highlighting key variables requiring consideration in the evidence synthesis and supporting refinement of the scope of the project. Detailed methods information, including the search strategy and data assessment/abstraction forms, is available elsewhere [9].

Study Identification
Various electronic bibliographic databases (Medline, Cancerlit, Healthstar, Premedline, Embase, CINAHL, Cochrane Database of Systematic Reviews, Database of Abstracts of Reviews of Effectiveness, Cochrane Central Register of Controlled Trials, and Health and Psychosocial Instruments (HAPI)) were searched for reports published from 1992 to 2003 relevant to breast cancer diagnosis and treatment, and quality measures. Another search to retrieve systematic reviews of breast cancer treatment or diagnosis was executed in Medline and Cancerlit, with retrieval limited to material published after 1993. Additional published or unpublished literature was sought through manual searches of reference lists of included studies and key review articles, and from the files of content experts. Web sites were searched, including AHRQ's National Quality Measures Clearinghouse. The American Society of Clinical Oncology (ASCO) was at that time developing care quality measures, but wished to complete its work before disseminating it. After removing duplicate citations via Reference Manager™ (Thomson ResearchSoft, Carlsbad, CA), bibliographic records were identified and posted to a secure internet-based software system for review.
Following calibration exercises, bibliographic records (level 1), and then retrieved articles (level 2), were screened for relevance, with two reviewers per stage. A final screening (level 3) excluded reports describing clinical practice guidelines, systematic reviews, and commentaries/editorials that had initially passed into data abstraction before the project scope was narrowed to exclude examination of the strength of the empirical evidence supporting any given recommended breast cancer care. Disagreements were resolved by consensus and, if necessary, third party intervention. The reason for each excluded study's ineligibility was recorded using a modified QUOROM format [10].

Inclusion/Exclusion Criteria
The population of interest was female adults, diagnosed with or in treatment for breast cancer, including all histological types of adenocarcinoma, both in situ and invasive. Quality measurement efforts had to have focused on at least one data source (e.g., medical records; cancer registries; patient or provider questionnaires), entailed any sampling strategy (e.g., convenience sample over a period of time in a health care setting; hospital medical records; general population sample from a given region) and could index any domain (e.g., structure; process).
Searches were restricted to post-1992 because, in the opinion of the funders, quality measurement efforts concerning breast cancer care began to receive serious attention in the ten years prior to the initiation of this project.
Quality indicators could be derived from any source (e.g., clinical practice guideline) and have been subjected to any degree of scientific development, but reference had to have been made to the empirical evidence supporting each indicator. A standard of care (e.g., a recommendation in a guideline) serving as the basis for quality measurement had to have been established prior to the quality measurement effort, so that it would have been available at the time to guide the care subsequently assessed using the quality indicator. Given the unique issues related to breast cancer, measures of quality of life (QOL) and patient satisfaction had to have been developed or adapted for use with breast cancer patients. Inflammatory breast cancer, Paget's disease, phyllodes tumors, and benign breast conditions were excluded. A separate initiative is addressing breast cancer screening and prevention.

Data Abstraction
Following a calibration exercise involving two studies, three reviewers independently abstracted the contents of each included study using an electronic data abstraction form. Abstracted data were then verified by a second reviewer. Data included: report characteristics (e.g., publication status); study characteristics (e.g., data sources); population characteristics (e.g., case characteristics [size of tumor; level of lymph node involvement; presence/absence of metastasis]); characteristics of the quality indicators used in quality assessment (e.g., data concerning reliability, validity, and study-obtained links to outcomes; whether data extractors were trained and extractions were independently verified) [10]; and adherence data (e.g., overall adherence rate; variations in rates based on review-relevant stratifications such as age; possible reasons for failure to receive care, including patient refusal).
After a calibration exercise involving two included studies, each quality indicator was assessed independently by two reviewers to determine the extent of its scientific development as a quality measure. Levels of development were: Level I, quality indicator was developed prior to its implementation in the present study, according to scientific principles (e.g., assessment of scientific soundness, feasibility and ease of use, reliability, internal validity, sensitivity, and pilot testing with appropriate rigor and relevant data sources); Level II, quality indicator was being actively developed as part of the present quality measurement study; Level III, quality indicator was not currently under development, but existing psychometric data were reported; or Level IV, quality indicator was not currently under development and no psychometric data were reported. Levels I-III could be further subdivided according to the soundness of the reported psychometric properties.
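The four-level scheme can be read as a simple decision rule. The sketch below is one possible reading, with hypothetical parameter names; the order in which the conditions are checked is an assumption, and the further subdivision of Levels I-III by psychometric soundness is not modeled.

```python
def development_level(
    developed_before_study: bool,
    under_development_in_study: bool,
    psychometric_data_reported: bool,
) -> str:
    """Assign a level of scientific development (I-IV) to a quality indicator.

    Assumption: conditions are tested in order I, II, III, so an indicator
    meeting more than one description receives the first matching level.
    """
    if developed_before_study:
        return "I"    # developed before the study, according to scientific principles
    if under_development_in_study:
        return "II"   # actively developed as part of the present study
    if psychometric_data_reported:
        return "III"  # not under development, but psychometric data reported
    return "IV"       # not under development, no psychometric data reported
```

In the review itself, two reviewers assigned levels independently; a function like this only makes the categories explicit and does not replace that judgment.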

Data Synthesis
Data from relevant studies were synthesized qualitatively, including: diagnosis; treatment (including supportive care); followup care; and the reporting/documentation of care.
Variables to be taken into consideration included the study population (e.g., age, race/ethnicity, socioeconomic status), data sources (e.g., cancer registries), sampling techniques (e.g., convenience sample, random general population sample), and the purpose of the indicators/measures (e.g., internal quality improvement). Other parameters of interest included measurements of outcomes linked to the quality measurements, and psychometric properties of the identified quality measures (e.g., sensitivity and specificity for diagnostic tests).
Quantitative syntheses of adherence data were not possible, given the paucity of data from validated measures.

Results
Lists of included and excluded studies (with reasons for exclusion), evidence and summary tables and a comprehensive report are available electronically elsewhere [9]. Results of record retrieval and screening are summarized in Figure 1. From 3,848 unique records identified at the outset, 60 reports, describing 58 studies, met eligibility criteria, and 143 quality indicators were identified (Table 1).
Many different populations were investigated, typically retrospectively, using various reference standards (e.g., clinical practice guidelines) and data sources (e.g., medical records). Younger women, and those with early stage breast cancer, were more likely to have been studied. Most standards reflected processes of care, focusing most often on whether or not women with breast cancer received indicated care (e.g., percentage of women treated with breast-conserving surgery who begin radiation therapy within 6 weeks of completing either of the following: the last surgical procedure on the breast (including reconstructive surgery that occurs within 6 weeks of primary resection) or chemotherapy, if patient receives adjuvant chemotherapy, unless wound complications prevent the initiation of treatment; percentage of women having first localization biopsy operation to correctly identify impalpable lesions). There were few investigations of the quality with which this care was delivered. The quality indicators were employed to serve internal quality improvement or external quality oversight.
Database choices reflected study rationale. Small, local databases were used for internal quality improvement, while large databases were used to assess and compare adherence to care across various, larger jurisdictions. The single study linking a quality measurement to outcome noted that reporting the number of affected lymph nodes was linked to both overall and disease-free survival [11]. The only scientifically validated quality measures that were identified assessed QOL (n = 11) and patient satisfaction (n = 1) [12][13][14][15][16][17][18][19][20][21][22]. Of the 12 validated quality measures, 11 were used with reference to treatment and one with regard to diagnosis. None pertained to followup or the documentation of care. Two QOL scales had been specifically validated for use with breast cancer populations. The Functional Assessment of Cancer Therapy Scale (FACT-B, version 3) evaluated the QOL associated with a diagnosis of breast cancer [16]. The European Organization of Research and Treatment of Cancer (EORTC) QLQ-BR23 scale [20] was employed to evaluate the impact of treatment. Other validated instruments included: the Patient Satisfaction Questionnaire [20], Short Form-36 [12,14,16,18,22], EORTC-C30 [14,15], Medical Outcomes Scale [16,17], Spitzer Quality of Life Index [21], Uniscale [21], Ferrans Quality of Life scale [20], Psychosocial Adjustment to Illness Scale [20], Guttman Health Status Questionnaire [16], and the Linear Analogue Self-Assessment Scale [15].

Figure 1. Modified QUOROM Flow Chart
Overall, where gaps in care appeared to exist, they were generally marked by patterns of under-use rather than lower quality of delivered care. Reports of disparities in breast cancer care amongst groups at risk of being disadvantaged (by age, race, socio-economic status, health insurance) are summarized in Table 2. This includes reports from a wide range of population mixes and sizes, in differing settings, and employing varying standards of optimal care. For example, definitions of "younger" ranged from <40 years to <70 years. Most of the quality indicators were defined in terms of whether or not the indicated care had been received, rather than the quality of the care. No group was advantaged regarding QOL, and the satisfaction study indicated no advantage related to age [17]. Satisfaction was higher among white women and those with government insurance.
Table footnote: n = number of different quality indicators of this type; *extent of scientific development of quality indicator: Level Ia = pre-study data indicating consistently sound psychometric properties; Iac = pre- and on-study data indicating consistently sound psychometric properties; IV = no pre- or on-study psychometric data.
Twenty-six quality indicators were identified regarding events surrounding diagnosis, with most not fitting into the project's predefined categories. These measures reflected recommendations that women be seen by specific types of health care professional, for specific reasons, and within certain time frames. The greatest number of studies evaluating a given quality indicator focused on a recommendation pertaining to the use of preoperative diagnosis by fine-needle aspiration cytology, needle biopsy or biopsy (n = 4). Most quality indicators referred to the delivery or receipt of indicated diagnostic care (75%: 18/24). Only five addressed the quality with which specific diagnostic care was delivered. One study observed sound on-study reliability data for an instrument previously validated as a QOL measure [19]. Quality measurements were not found relating to sentinel node biopsy, chest X-ray, bone scan, CT scan, MRI, blood tests, tumor marker status, or genetic testing.
Many more quality indicators were employed to assess treatment (n = 67). The most frequently assessed treatments were adjuvant systemic therapy (n = 25) and radiation therapy (n = 16). The greatest number of studies employing a given treatment-related quality indicator evaluated the appropriate use of breast-conserving surgery (n = 18), and the appropriate use of radiotherapy following breast-conserving surgery (n = 19). Most of the quality indicators referred to the delivery or receipt of indicated treatment (70%: 47/67). Nine quality indicators assessed the quality with which specific treatment care was delivered. Quality measurements were not found relating to reconstructive surgery or neoadjuvant systemic therapy, nor to late-stage treatment and palliative care.
Followup care was the focus of five quality indicators, none of which were validated. Specific types of followup care were not predefined.
Of 45 quality indicators relating to reporting/documentation, pathology reporting was the most frequently assessed (n = 42). Reporting the assessment of microscopic margins, and reporting histological type (microscopic), were each evaluated in five studies. Neither surgical nor radiotherapy reporting was the focus of quality measurement.
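As a quick consistency check on the counts reported above (a worked tally using only figures stated in this section, not an additional analysis), the per-category indicator counts sum to the overall total of 143, and the reported treatment proportion follows from its counts:

```latex
\begin{align*}
\underbrace{26}_{\text{diagnosis}}
+ \underbrace{67}_{\text{treatment}}
+ \underbrace{5}_{\text{followup}}
+ \underbrace{45}_{\text{reporting/documentation}}
&= 143, \\
\frac{47}{67} &\approx 0.70 .
\end{align*}
```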

Discussion
The measurement of the quality of breast cancer care is in its infancy, despite the fact that breast cancer in women is one of the most-studied areas of healthcare [23].
The clearest observations from this systematic review are that most efforts to measure adherence to quality breast cancer care have centered on whether or not appropriate care was delivered or received (rather than on the quality of this care), focused on treatment, and failed to employ quality indicators formally developed as quality measures. As well, the quality indicators identified did not cover many of the predefined types of diagnostic or treatment care of interest to the funders.
Nearly all quality measurements entailed quality indicators for which no reference was made, or data reported, indicating that they had been developed scientifically as quality measures. Only QOL and satisfaction with care indicators had been validated. Thus, while many measurements were identified, very few were conducted with validated quality measures. In the absence of sufficient data yielded by the application of validated quality measures, the decision was made to forgo meta-analysis. For the same reasons, adherence data need to be interpreted with caution. Potential gaps in care compiled in Table 2 unfortunately do not contribute substantially to understanding, for example, the divergence of outcomes for American women of different ethnic origins.
Malin et al. reviewed breast cancer care literature post-1985 [24], and although the present project had a later commencement date, the same quality indicators of breast cancer care were identified.
McGlynn et al.'s efforts to establish clinically relevant, valid quality indicators for breast cancer care [8] via a review of the evidence and a peer consensus process, and their findings of under-use, must be considered preliminary. Their study was based on a small number (n = 192) of eligible breast cancer cases; the evidence supporting some (especially treatment) standards was observational by nature, or based on expert opinion; and the quality indicators had not been pilot-tested as measures. Furthermore, patient preference could have been considered [25]. If patient refusal of treatment had been uniformly taken into account, observations of "gaps" may have been different. Under-use of optimal treatment strategies by certain patient groups may arise for a multitude of cultural reasons, and only if measures are sensitive to diverse issues will health care for all citizens be improved. These are some of the issues to be uncovered as part of the formal development of measures.
The most important weakness of the present review is that, due to practical constraints, the "strength" of the clinical evidence base (i.e., the consistency of the results of high quality, appropriately-designed, and adequately powered primary studies indicating significant links between care and improved outcomes) supporting the definition of each quality indicator (i.e., standard of care) could not be examined. A second limitation is likely that the "level of scientific development" scheme designed especially for this study was itself employed without the benefit of a validation process. Nevertheless, most reports did not describe any validation of their quality indicators, so this limitation did not ultimately affect the results of the review.
Considerable work remains, to define and to measure adherence to standards of breast cancer care. While empirical evidence will likely continue to be collected and synthesized in the pursuit of defining quality breast cancer care, the translation of quality indicators (even with strong support from evidence and clinical consensus) into quality measures with an equally strong psychometric foundation is likely the most pressing need for this field of inquiry to progress. However, before researchers rush headlong into efforts to generate quality measures in the scientific manner described above, it may be wise to appraise the soundness of quality measures under development by ASCO.
ASCO has been developing a set of quality measures relating to stages I-III breast cancer [26]. Its goal is to produce a robust set of largely evidence-based indicators, pilot-tested using multiple data sources (e.g., patient survey, ACOS's National Cancer Database) and published with a detailed profile of their reliability (e.g., inter-rater, inter-database), feasibility, and validity. It is hoped that these will be the validated measures required to push forward the field of quality measurement with respect to breast cancer care. It remains to be seen whether or not these quality measures will cover aspects of care (e.g., quality of delivery of care, structural factors) and components of care (e.g., reconstructive surgery, neoadjuvant systemic therapy, sentinel node biopsy, chest X-ray, bone scan, CT scan, MRI, blood tests, tumor marker status, genetic testing, followup, and treatment of recurrent disease and palliative care) identified by the present review as being largely absent from the literature.
Future research efforts to measure adherence to quality breast cancer care could be conducted prospectively, if health care practices and systems were modified to accommodate the required data collection. Virtually all of the efforts to date have involved retrospective data capture. While this strategy reduces the waiting time for collection of especially long-term (e.g., 5 year survival) outcomes, retrospective data collection also makes it difficult to ensure that some of the key factors potentially influencing adherence-to-care data can be observed (e.g., reasons for patient refusal of care).

Conclusion
A clear, comprehensive understanding of the quality of breast cancer care received by the average citizen is necessary before quality of healthcare may be seriously addressed on a national level [27]. Reliable, validated quality measures with which to confidently identify possible gaps in breast cancer care, and to afford accountability, improvement, and research [28], are the first step to resolving this issue. Some promise is attached to ASCO's ongoing development of breast cancer quality measures, although it will be some time before the results are known. It may be best to proceed with caution before allowing even minor decisions to be guided by any of the adherence data reviewed in this report.

Financial competing interest
Authors of this manuscript (and corresponding review) have not received any reimbursements, fees, funding or salary from organizations that may in any way gain or lose financially from the publication of this manuscript in the five years prior to the start of the corresponding review.
Authors do not hold any stocks or shares in an organization that may in any way gain or lose financially from the publication of this manuscript.
Authors do not hold, and are not currently applying for, any patents relating to the content of the manuscript, nor have they received reimbursements, fees, funding, or salary from an organization that holds or has applied for patents relating to the content of the manuscript.

Non-financial competing interests
Authors have no non-financial interests (political, personal, religious, ideological, academic, intellectual, commercial or any other) to declare in relation to this manuscript.