Health Care Analytics Group, United BioSource Corporation, Bethesda, Maryland
Correspondence and requests for reprints should be addressed to Nancy Kline Leidy, Ph.D., The MEDTAP Institute at UBC, 7101 Wisconsin Avenue, Bethesda, MD 20814. E-mail: nancy.leidy{at}unitedbiosource.com
ABSTRACT
This is an interesting time in the evolution of clinical research, as the convergence of scientific, social, and technological advances influences the development of new measurement strategies and creates new challenges and opportunities in the evaluation of treatment outcomes. The purpose of this paper is to provide a brief overview of the forces of change influencing clinical research in general and measurement, specifically, discuss current challenges in the evaluation of treatment outcomes in chronic obstructive pulmonary disease and propose several areas for further research. Five key challenges are discussed: accuracy in the selection and measurement of endpoints; appropriate timing and recall; measurement efficiencies using aggregation techniques or item response theory; interpretation, including the debate concerning the minimal important difference; and the need for "real world" studies with real-world measures to understand treatment effectiveness. Each of these areas offers interesting challenges and opportunities for further development and research.
Key Words: chronic obstructive pulmonary disease; issues; outcomes; strategies; study design
Measurement is the standardized process of assigning numbers to an attribute for purposes of communication. In clinical research, measurement refers to the combination of tools and procedures used to quantify health and disease and test the effects of medical treatment. Ultimately, the goal of measurement in health care research is to understand the burden of illness to the individual and to society, and to communicate the need, the benefits, and the risks of treatment to potential users. Accurately measuring treatment effects in a clinical trial requires an understanding of the treatment's mechanism of action; its intended and potentially unintended effects; knowledge of the target population and clinical setting; expertise in the related measurement environment; and careful attention to detail in study design, implementation, and statistical analysis. Sound measurement is an essential element of sound science. In the presence of what appears to be rigorous trial design and robust analysis, the "signal" of the treatment effect can only be detected if the measurement procedures are rigorous; that is, the evaluative tools are valid and precise, the procedures for administering the tools are carefully implemented, the time points are correctly specified, and the rules for aggregating data are appropriately applied.
Like all scientific endeavors, the development, refinement, and creation of new measurement approaches is not performed in isolation. Advancement in the science of health outcome measurement occurs concurrently with, and draws from, advances in biomedical, technological, clinical, and social sciences and changes in the political, economic, and social climate.
The purpose of this article is to describe the forces of change influencing the evolution of health care and the science of health outcomes measurement, and examine how these changes are influencing the measurement of treatment effects in clinical trials with specific reference to tests of therapies for chronic obstructive pulmonary disease (COPD). Five evolving concepts of measurement are discussed: accuracy, timing and recall, efficiency, interpretation, and "real world" measurement.
FORCES OF CHANGE AND THE HEALTH CARE ENVIRONMENT
Scientific discovery, the availability and volume of medical information, informed consumers, societal demand for access to new treatments, concerns about safety, rising health care costs, and advances in information technology are converging to change the health care environment and create new challenges and opportunities in the evaluation of treatment outcomes. The discovery of new drugs, the evolution of new or improved delivery systems, and the refinement and development of medical devices are increasing the number and variety of therapeutic options to minimize, reverse or eliminate underlying disease, improve health status, and/or maintain or enhance quality of life.
The growth of scientific information on the Internet together with the increase in access to information through personal computers and various forms of wireless technology is increasing the volume and the timeliness of information available to scientists, clinicians, patients, and caregivers around the world. The volume of scientific information is increasing daily. It has been estimated that over 2 million biomedical articles are published annually in over 21,000 journals. Over 13 million articles currently appear in the PubMed database, dating back to 1966, with 571,000 added in 2004 alone. Between 1,500 and 3,500 citations are added to this database each working day (1). In addition to abstracts, the articles themselves are often available online, providing depth, as well as breadth, of information to consumers of all levels. Access to such large volumes of information by consumers with varied backgrounds and levels of expertise creates both confusion and opportunity, and a milieu for growth and discovery.
Fueled by access to information, advertising across all media forms, and vocal consumer advocacy groups, patients are requesting, and in some cases demanding, a greater role in health care decision making. The use of patient-reported outcomes (PROs) in clinical trials is an important response to this demand, providing important information on the effect of treatment from the patient's perspective and contributing to more informed decision making at all levels.
New treatment options are not necessarily introduced without risk. Greater awareness and concerns about drug safety are also creating a demand for public access to information, including full reports of clinical trial results, to inform decision making. Efforts by the National Institutes of Health (NIH) in the United States to post results of publicly funded research on the Internet (2) and the Pharmaceutical Research and Manufacturers of America (PhRMA) program to post clinical study results (3) are designed to enhance transparency and inform the public of study results that may not reach publication. The need for information goes beyond traditional phase III clinical trial data, however. Questions about drug safety postlaunch, when new products are used across a wide range of patients with various comorbidities and levels of compliance, require large-scale, real-world data tracking. New technologies, such as interactive voice response systems (IVRS), Web-based data capture, personal digital assistant (PDA) technology, and cell phones with text messaging capability, together with wide-ranging comfort with these technologies across generational and sociodemographic groups, will make naturalistic tracking of the risks and benefits of new therapies in large patient populations a reality.
Scientific, informational, social, economic, and technological changes are leading to new approaches and opportunities for improving our understanding of COPD and its effect on patients and caregivers, and are creating an environment ripe for discovering new treatment options for this serious chronic disease. These advances are particularly timely, in light of COPD's rising burden of disease (4) and increasing prevalence as a cause of death worldwide (5).
NEW CHALLENGES AND OPPORTUNITIES IN OUTCOMES MEASUREMENT
Although there are clearly a number of interesting challenges and opportunities in the science of outcomes measurement, five are particularly noteworthy: accuracy, timing and recall, efficiency, interpretation, and relevance (i.e., "real world" data).
Accuracy
Measuring the right endpoint with the right instrument is critical to understanding treatment effects. The analytic decision processes involved in trial design generally, and in outcome measurement specifically, are becoming increasingly complex, as new treatments and delivery systems are developed and the demand for understanding their effects from multiple perspectives increases.
Matching the underlying mechanism of action, concept of interest, study endpoint, and the evaluation instrument is an essential step in study design. Although this may sound simple and intuitive, it requires careful thought as to what, specifically, links the mechanism to the concept and outcome and how this specific outcome should be measured. In clinical trials of COPD, the treatment may be targeted at improving or reversing the underlying physiology, such as airflow obstruction, hypersecretion, or muscle wasting; altering patient perception and report of disease-related experience, including symptoms, functional status, and health-related quality of life (HRQL); improving or maintaining aspects of physical function, such as exercise tolerance or muscle strength; reducing or eliminating disease-related events, such as exacerbations; or changing the progression of disease over time. Processes of care, such as patient satisfaction or preference for treatment events, may also be of interest. Table 1 offers a framework for classifying outcomes in COPD and guiding the selection of the appropriate trial endpoints.
"Improving functional status" is another frequently used statement of intent that is too imprecise to be useful (8). Evaluating outcomes in terms of functional capacity (e.g., treadmill, cycle ergometer, 6-min walk test [6MWT]) or day-to-day performance (subjective measurement of activities of daily living, using instruments such as the Duke Activity Status Index [9], the Functional Performance Inventory [10], or the Pulmonary Functional Status and Dyspnea Questionnaire [11]) more clearly defines the concept of interest and guides the selection of the appropriate measure.
FEV1 has been the mainstay of outcome evaluation in COPD trials of bronchodilator therapies, logically and appropriately linking the mechanism of drug action to the appropriate outcome. Limiting an evaluation of bronchodilator effects to FEV1, however, ignores the clinical context of therapy, that is, endpoints such as dyspnea, fatigue, activities of daily living, and work productivity that are meaningful to the patients themselves and which often influence quality of life and adherence to therapy. In fact, FEV1 is likely to be insensitive or irrelevant in trials evaluating new bronchodilator treatments designed for convenience or to improve patient satisfaction and compliance. As new therapies are discovered that treat the disease at the pulmonary or systemic level where bronchodilation is not the mechanism or the intent of treatment, FEV1 is not an appropriate primary endpoint and a new outcome paradigm is necessary.
As the concept, endpoint, and measure become clear, the right person to provide data should be carefully considered. Clinicians are appropriate for evaluating directly observed or witnessed clinical phenomena. Clinical experience does not necessarily translate into rating expertise in an empiric setting, however. Clinician rating is subject to significant interrater variability (i.e., measurement error) (12). This variability will lead to greater difficulty detecting treatment effects and the need for larger sample sizes to reach statistical significance. Variability is particularly problematic in multicenter national and international trials involving large numbers of clinicians with varied levels of education, clinical research experience, and expertise on the rating instrument. Formally training clinicians on the instrument to target levels of interrater reliability (i.e., 0.70 to 0.80 or above) can reduce this form of measurement error and increase the likelihood of detecting a signal/treatment effect. Widespread access to the Internet and DVD technology is leading to cost-effective approaches for reaching and training clinical investigators and enhancing the precision and quality of endpoints, particularly in large and long-term studies characterized by geographic and clinician variability and/or high levels of coordinator turnover.
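The reliability target mentioned above can be checked during rater training with a short calculation. The text does not name a statistic, so the sketch below assumes Cohen's kappa, one common chance-corrected agreement measure for two raters scoring the same patients on a categorical scale. The function name, the ratings, and the 0.70 cut-off check are illustrative only.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of patients on whom the two raters agree exactly.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical severity ratings of the same eight patients.
trainee = ["mild", "mild", "moderate", "severe",
           "moderate", "mild", "severe", "moderate"]
expert = ["mild", "moderate", "moderate", "severe",
          "moderate", "mild", "severe", "severe"]

kappa = cohens_kappa(trainee, expert)
status = "meets target" if kappa >= 0.70 else "needs further training"
print(f"kappa = {kappa:.2f} ({status})")
```

For continuous ratings or more than two raters, an intraclass correlation coefficient would be the more usual choice; the logic of comparing the estimate against a prespecified training target is the same.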
A clinician's global evaluation of treatment efficacy based on his or her clinical background and experience in patient care may be useful as a secondary endpoint, offering a quantitative estimate of clinical intuition that can be used for exploratory analyses to further understand outcomes of treatment. It is now generally accepted that a single-item clinical global impression approach is not efficient, accurate, or appropriate for use as a primary outcome in clinical trials.
Data on the effect of treatment on patient experiences or perceptions should come directly from the patients themselves (13, 14). Accurately quantifying subjective patient experiences, including symptom severity, frequency and impact; HRQL; perception of day-to-day functional performance; and satisfaction with treatment can be accomplished only through direct patient response. Filtering the data through clinicians or caregivers reduces the precision of this outcome and raises questions related to the validity of the data and the observed treatment effect.
Timing and Recall
Precise measurement requires the right endpoint, the right person and the right timing. To capture an effect, including onset, duration, and patterns of change, the outcome must be measured as closely as possible to the time of the expected effect. Evaluating immediate effects of bronchodilator therapy, for example, is well known and practiced in the form of pre- and postinhalation spirometric measurement. Advances in information technology are enabling scientists to capture timely data on other dimensions of disease and outcomes of treatment. Electronic diaries and IVRS, for example, can now be used to quantify day-to-day variability in symptoms and activity in COPD and changes surrounding an acute exacerbation. Ecologic momentary assessment involves an electronic datalog that randomly prompts patients to respond to one or more questions about their current status, such as activity, symptoms, and mood, as they go about their normal daily activity (15). This measurement approach can provide insight into variability and patterns of symptoms and activity in COPD. Activity monitors, such as the actigraph, objectively measure activity levels continuously over time, offering new insight into patterns of activity and symptomatology simultaneously, including circadian rhythmicity and day-to-day variability (16, 17). These measurement techniques can increase the precision with which these concepts are measured, but more importantly have the potential to uncover patterns of disease expression and rhythmic effects of treatment.
It is important to note that not all concepts or measures require immediate "real time" data assessment or short-term recall. Patient perceptions of their overall physical or psychologic health status, their impressions of the processes of care, such as satisfaction or preference, and reports of infrequent events, such as hospitalizations or exacerbations, can be reliably and validly measured using longer recall periods (18, 19). Technologies such as Web-based data capture can be used to track disease-related experiences and events on a weekly or monthly basis over long periods of time, to better understand the effect of treatment on symptoms, activity, and disease progression over months and even years.
Efficiency
Measurement efficiency is a method's ability to be simultaneously comprehensive, precise, and parsimonious. Item response theory (IRT) and aggregate endpoints each address this challenge. Although IRT is not new, electronic data-capture tools are moving this technique from theory and basic measurement research to the more applied clinical trial setting. IRT involves the use of mathematical models to describe the relationship, in probabilistic terms, between a person's response to a question and his or her level on the construct being measured by that question (20). The relationship is usually nonlinear and the probability of a given response usually increases as the level (on that construct) increases. Briefly, IRT is applied to situations in which a patient's level on a given attribute can be determined by how a patient responds to a set of questions that represent degrees of "difficulty" on the underlying construct of interest. Patients who indicate that they can run 10 km or climb two flights of stairs, for example, can clearly walk one city block on the level. Programming instrument administration to enable patients to skip items that fall "below" a given level is both valid and highly efficient.
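The probabilistic relationship and the skip logic described above can be sketched in a few lines. The example assumes a two-parameter logistic model, which is only one of several IRT models, and the item difficulties are invented for illustration rather than taken from any calibrated instrument.

```python
import math


def prob_endorse(theta, difficulty, discrimination=1.0):
    """Two-parameter logistic item response function (an assumed model).

    theta is the patient's level on the construct; the returned value is
    the probability of endorsing an item of the given difficulty.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Hypothetical physical-function items, ordered by assumed difficulty.
ITEMS = [
    ("walk one city block on the level", -2.0),
    ("climb two flights of stairs", 0.5),
    ("run 10 km", 3.0),
]


def items_to_administer(hardest_endorsed_difficulty):
    """Skip items at or below the hardest item the patient already endorsed."""
    return [name for name, b in ITEMS if b > hardest_endorsed_difficulty]


# The curve is nonlinear and rises with the patient's level on the construct.
for theta in (-3.0, 0.0, 3.0):
    p = prob_endorse(theta, difficulty=0.5)
    print(f"theta={theta:+.1f}  P(climb two flights)={p:.2f}")

# A patient who can climb two flights of stairs is not asked about one block.
print(items_to_administer(0.5))
```

In practice the item parameters would be estimated from response data in the target population, and an adaptive administration would choose the next item from the running estimate of theta rather than from a fixed ordering.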
Statistical and communication efficiencies can also be achieved by using conceptual frameworks and empiric methodologies that combine multiple items into dimensional scores and/or a single total score. Health status measures, such as the generic Medical Outcomes Study Short-Form 36 (SF-36) (21, 22), and the pulmonary-specific St. George's Respiratory Questionnaire (SGRQ) (23, 24) and Chronic Respiratory Questionnaire (CRQ) (25) provide summary scores that reflect the positive and negative aspects of disease and treatment from the patient's perspective. The SF-36, SGRQ, and CRQ have withstood the test of time and the large amounts of published data provide important context for interpreting clinical study results.
Newer measurement approaches may show promise as they are used and tested in multiple COPD populations and treatment settings. The BODE index, for example, combines clinical indicators of body mass index (B), airflow obstruction (O), dyspnea (D), and exercise capacity (E) into a single, 10-point scale of physical health state where higher scores are associated with a higher risk of death (26).
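To make the idea of an aggregate endpoint concrete, the sketch below combines the four BODE components into a single 0 to 10 score. The cut points are not given in this article; they are stated here as assumptions based on the commonly cited scoring of the index (26) and should be verified against the original publication before any real use. The function name and argument names are illustrative.

```python
def bode_index(bmi, fev1_pct_predicted, mmrc_dyspnea, walk_6min_m):
    """Aggregate four clinical indicators into one 0-10 score (higher = worse).

    Cut points are assumptions to be checked against reference 26.
    """
    if mmrc_dyspnea not in (0, 1, 2, 3, 4):
        raise ValueError("MMRC dyspnea grade must be an integer from 0 to 4")

    # B: body mass index contributes 0 or 1 point.
    b = 0 if bmi > 21 else 1

    # O: airflow obstruction, FEV1 as percent of predicted, 0 to 3 points.
    if fev1_pct_predicted >= 65:
        o = 0
    elif fev1_pct_predicted >= 50:
        o = 1
    elif fev1_pct_predicted >= 36:
        o = 2
    else:
        o = 3

    # D: dyspnea grade; grades 0 and 1 both score 0, grade 4 scores 3.
    d = max(0, mmrc_dyspnea - 1)

    # E: exercise capacity, 6-min walk distance in meters, 0 to 3 points.
    if walk_6min_m >= 350:
        e = 0
    elif walk_6min_m >= 250:
        e = 1
    elif walk_6min_m >= 150:
        e = 2
    else:
        e = 3

    return b + o + d + e


# Hypothetical patient: BMI 22, FEV1 55% predicted, MMRC grade 2, 300 m walk.
print(bode_index(bmi=22, fev1_pct_predicted=55, mmrc_dyspnea=2, walk_6min_m=300))
```

The design trade-off is the one noted in the text: a single score is easy to communicate and analyze, but it hides which component drove a change, so component values are usually reported alongside the total.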
Interpretation
The health care consumer's need for information and the enhanced sensitivity of the medical environment to these concerns have led to the extensive use of HRQL and other PRO measures to understand the effect of treatment from the patient's perspective. Many of these measures are new or unfamiliar to decision makers at the regulatory, payer, provider, and patient level. In addition, the scaling metrics for these measures are variable. The SGRQ and CRQ, for example, are widely used pulmonary-specific health status measures designed to capture the effect of disease and treatment on patients' symptomatic experience and well-being. Although the two instruments assess related concepts, their structures are quite different. The SGRQ quantifies symptoms, activities, and impact of COPD, in addition to overall health status, represented by the total score. The CRQ assesses dyspnea, fatigue, emotional function, and mastery, with a total score also representing overall health status. Response options for the SGRQ are either dichotomous (yes/no) or Likert-type (1 to 4 and 1 to 5), whereas all of the items on the CRQ involve a 7-point Likert-type scale (1 to 7). Subscale and total scores on the SGRQ have a range from 0 to 100, with lower scores indicating better health status, whereas scores on the CRQ range from 1 to 7, with higher scores indicating better health status. Each of these approaches is methodologically sound, and the difference in scaling methods does not create problems in descriptive, relational studies, where the metric of interest is a standardized correlation or regression coefficient (0 to 1) representing the strength of relationship between two variables.
Unlike familiar clinical variables with standard metrics and relatively long empiric histories, such as pulmonary function parameters or exercise endurance time, the variations in scaling across PROs can create confusion among new users or clinicians attempting to interpret clinical trial reports. How does one develop the clinical or empiric "intuition" required to interpret study results when the measurement scales are variable? How "large" is a "large" effect on a 0 to 100 scale or 1 to 7 scale?
Questions of interpretation have led to the exploration of statistical and clinically based techniques for interpreting group scores. Norman and colleagues (27) suggested a "remarkable universality" of one-half standard deviation (1/2 SD) among statistical estimates of clinical significance for measures of HRQL. Sloan and coworkers (28) supported this position, and a recent triangulation approach for determining a minimal important difference in an oncology-specific HRQL measure was consistent with this proposition (29). The extent to which this general rule applies to other PROs is unknown, however. Leidy and Wyrwich (30), for example, found that 1/2 SD overestimated the minimal important difference (MID) in a dyspnea measure when contrasted with clinically based indicators of the MID. The extent to which statistically based estimates such as 1/2 SD can be linked to clinical data, where "clinically meaningful" is ultimately defined, requires careful consideration and input from clinical specialists.
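The one-half standard deviation rule is simple to compute, which is part of its appeal. The sketch below shows the calculation on invented baseline scores, together with a standardized change score that puts results from a 0 to 100 scale and a 1 to 7 scale on a common, unit-free footing. The function names and data are hypothetical, and, as the text notes, a distribution-based threshold of this kind still has to be checked against clinically based anchors.

```python
from statistics import mean, stdev


def half_sd_threshold(baseline_scores):
    """Distribution-based estimate: one-half of the baseline standard deviation."""
    return 0.5 * stdev(baseline_scores)


def standardized_change(baseline, followup):
    """Mean change divided by the baseline SD (unit-free effect size)."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(baseline)


# Hypothetical total scores on a 0-100 scale where lower is better.
baseline_0_100 = [40, 50, 60, 45, 55]
followup_0_100 = [36, 46, 56, 41, 51]

# Hypothetical scores on a 1-7 scale where higher is better.
baseline_1_7 = [3.0, 4.0, 5.0, 3.5, 4.5]
followup_1_7 = [3.4, 4.4, 5.4, 3.9, 4.9]

print(f"1/2 SD on the 0-100 scale: {half_sd_threshold(baseline_0_100):.2f} points")
print(f"1/2 SD on the 1-7 scale:   {half_sd_threshold(baseline_1_7):.2f} points")

# The sign differs only because the two scales run in opposite directions.
print(f"standardized change, 0-100 scale: "
      f"{standardized_change(baseline_0_100, followup_0_100):+.2f}")
print(f"standardized change, 1-7 scale:   "
      f"{standardized_change(baseline_1_7, followup_1_7):+.2f}")
```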
Questions concerning the interpretation of PROs have led to debate concerning the appropriateness of designating a fixed value as the "minimal clinically important difference" (MCID) or a "minimally important difference" (MID) for a given outcome and to important discussions regarding the empiric basis (or, in some cases, lack of empiric data) for more "traditional" outcome measures (31). What constitutes a "clinically meaningful" improvement in FEV1 (32), the 6-min or shuttle walk tests (33) or exercise tolerance tests (34, 35)? Do improvements in these physiologic parameters lead to improvements in outcomes perceptible and meaningful to the patients? Does the effect warrant the cost, from either a fiscal or risk/side-effect perspective? These questions require further research.
Relevance
A fifth challenge in outcomes research is the disconnect between relatively short-duration, randomized, placebo-controlled efficacy trials required for regulatory review and market access and the real-world information needs of clinicians and payers. Efficacy trials provide important information about the relatively short-term efficacy and safety of a new product under carefully controlled and closely observed circumstances in a screened and selected patient population. By their very nature, it is difficult, if not impossible, to extrapolate these data to the effectiveness of the product in the broader patient population and under normal use. This information can only be determined through naturalistic effectiveness studies conducted after the product is introduced to the market. Effectiveness studies and safety registries offer opportunities to evaluate the benefits of treatment in the real world, helping decision makers understand the benefits as well as risks of new treatments at the population and the individual patient level. Effectiveness studies and safety/benefit evaluations require real-world instruments that are relevant, easy to complete, and sensitive to real-world treatment effects.
SPECIFIC AREAS FOR FURTHER RESEARCH
As scientific, social, and technological advances converge, new opportunities for research emerge, leading to improved understanding of disease and treatment. In addition to the opportunities outlined above, there are immediate measurement needs that must be addressed to move the science of COPD and the care of patients with COPD to the next level.
Improving the precision of symptom measurement is an important priority. Understanding patient perception of dyspnea severity, the day-to-day and within-day variability of this symptom, the relationship between dyspnea and daily activity, and the rhythmic patterns of dyspnea, activity, fatigue, and sleep will help further the foundational understanding of the disease and disease experience and improve the evaluation of treatment outcomes. Devices such as electronic diaries and actigraphs together with the increasing comfort level of the older adult with these types of devices will help address these unmet informational needs.
A second measurement area in need of further work in COPD is the evaluation of functional capacity in a clinical setting. Exercise tolerance testing using treadmills or cycle ergometers is essential for evaluating the effects of treatment on very specific tolerance endpoints, such as maximum exercise capacity or constant work rate duration. This approach is not only expensive, limiting a given study to clinical sites where these tests can be performed, but is also not necessarily predictive of daily activity levels, often the ultimate intent of treatment. The 6MWT would be a useful method for evaluating the effect of treatment on patients' activity tolerance, as either a primary or secondary endpoint. This test was introduced a number of years ago (36), is more closely tied to daily activity tolerance than cycle or treadmill tests, and has proposed MID values to assist in interpretation (33, 37). Unfortunately, this test often shows wide variability across patients and across sites, which can discourage its use in clinical trials due to the large sample sizes required to overcome this limitation. It is possible, however, and even likely, that the use of a standardized 6MWT implementation protocol (38), with careful training across clinical sites to ensure standardized application of the protocol, would reduce this variability and improve the utility of this clinical outcome tool. Wider use of the 6MWT in clinical trials will improve our understanding of the effect of various treatments on daily activity tolerance and may even help patients realize higher levels of activity at home, as they come to appreciate their capacity for activity through a walking exercise in a protected clinic environment.
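The link between variability and sample size can be illustrated with the standard normal-approximation formula for comparing two group means, n per group = 2 (z_(1-alpha/2) + z_(power))^2 (SD / difference)^2. This is a textbook approximation rather than a method described in this article, and the standard deviations and the target difference below are hypothetical values chosen only to show the direction and size of the effect.

```python
import math
from statistics import NormalDist


def n_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sided, two-sample comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / difference) ** 2)


# Hypothetical: detect a 50 m between-group difference in walk distance.
target_difference_m = 50
for sd_m in (100, 80, 60):
    n = n_per_group(sd_m, target_difference_m)
    print(f"SD = {sd_m} m -> about {n} patients per group")
```

Because the required sample size scales with the square of the standard deviation, even a modest reduction in between-site and between-patient variability through protocol standardization can make a noticeable difference to trial size.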
Exacerbations are a critical component of the COPD experience. For patients, these episodes can be very disconcerting. For those concerned about the economic costs of this disease, exacerbation ranks first as a driver of health care costs associated with this disease. Despite its importance, the definition and empiric measurement of exacerbation have been elusive. What, for empiric purposes, constitutes an exacerbation? Two or more days of increased symptoms accompanied by reduced activity level? How are severity and duration defined? Stabilized symptoms and activity level at preexacerbation levels for 2 d or more? This important clinical phenomenon needs to be standardized to enhance its utility as a trial outcome and permit meta-analysis across trials to inform decision making. Technical advances, including PDAs and IVRS, permit the evaluation of daily symptoms and activity level.
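One way to see why standardization matters is to write a candidate definition down as an explicit rule applied to daily diary data. The sketch below is purely hypothetical: the symptom score, the baseline, the size of the increase (`delta`), and the 2-day onset and recovery windows are all assumptions made for illustration, not a proposed or validated definition.

```python
def find_exacerbations(daily_scores, baseline, delta=2, min_days=2):
    """Return (first_day, last_day) index pairs for candidate episodes.

    A day is "elevated" when its score exceeds baseline + delta. An episode
    runs from its first to its last elevated day and is closed once
    `min_days` consecutive non-elevated days have passed. Episodes spanning
    fewer than `min_days` days are discarded; a shorter dip below the
    threshold does not end an episode.
    """
    threshold = baseline + delta
    episodes = []
    start = None       # first elevated day of the open episode
    last_high = None   # most recent elevated day of the open episode
    for day, score in enumerate(daily_scores):
        if score > threshold:
            if start is None:
                start = day
            last_high = day
        elif start is not None and day - last_high >= min_days:
            # Recovery window reached: keep the episode if it is long enough.
            if last_high - start + 1 >= min_days:
                episodes.append((start, last_high))
            start = last_high = None
    # Diary ended while an episode was still open.
    if start is not None and last_high - start + 1 >= min_days:
        episodes.append((start, last_high))
    return episodes


# Hypothetical daily symptom scores from an electronic diary (baseline = 3).
diary = [3, 3, 6, 7, 6, 3, 6, 3, 3, 3, 7, 3, 3]
print(find_exacerbations(diary, baseline=3))
```

Changing `delta` or `min_days` changes how many episodes are counted from the same diary, which is exactly the problem for cross-trial comparison and meta-analysis raised in the text.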
Another interesting challenge concerns the predictive validity of COPD outcomes. Can we operationalize the clinical context of treatment effects observed in short-term clinical trials by understanding the empiric relationship between this effect and longer term outcomes? This is analogous to the relationship between lipid levels and subsequent cardiovascular events, enabling decision makers, including clinicians and payers, to extrapolate from the efficacy trials to the broader use of the product and its preventive health effects over time. Can we predict changes in illness trajectory based on health status, for example? Results reported by Domingo-Salvany and colleagues (39) and Olga and colleagues (40) suggest this is possible with the SGRQ. The BODE index appears to have predictive value as well (26). Both are relatively easy to include in clinical trials and would assist with the extrapolation of findings beyond the typical 3- to 6-mo study period.
For many years, FEV1 served as a reliable and useful indicator of disease severity and has been one of the most frequently used measures for evaluating treatment outcomes in COPD. As our attempt to understand the disease shifts toward a more systemic, multicomponent approach, the need for a new, multidimensional outcome paradigm becomes evident, beginning with a standardized set of outcomes and outcome measures. A multidimensional outcomes set for use in clinical studies of COPD would not only facilitate meta-analysis but could lead to data warehousing for secondary analyses and hypothesis generation, ongoing measurement validation, and the development of IRT methods to simplify outcome measurement further. Candidates for standardized outcomes include the following: health status, clinical exercise capacity (e.g., the 6MWT), daily function (a measure of subjective perception and objective activity levels), symptoms (dyspnea, fatigue), and exacerbations. This would, of course, require consensus on the outcomes and measures, clearly stated and accessible standards for applying the measures, and the consistent application of the measurement standards during study design, implementation, analyses, and interpretation.
CONCLUSIONS
The convergence of scientific, social, and technological advances offers new opportunities to understand COPD and its effect on patients and caregivers and motivates the development and testing of new therapies for this serious disease. Each of the five areas discussed here offers interesting opportunities for further development and research.
FOOTNOTES
Conflict of Interest Statement: N.K.L. made a presentation at the symposium on "Linking Outcomes and Pathobiology in COPD" organized by AstraZeneca in 2005, from which this article was adapted. Her employer, United BioSource Corporation (formerly The MEDTAP Institute at UBC), received $2,500 as an honorarium for Dr. Leidy's participation. She is employed by the United BioSource Corporation, which provides consulting and other services, including some of the types of services mentioned in this article, to pharmaceutical, device, government and nongovernment organizations. In this salaried position, she works with a variety of companies and organizations. She receives no payment or honoraria personally from these organizations for services rendered.
(Received in original form December 16, 2005; accepted in final form January 9, 2006)