The Proceedings of the American Thoracic Society 4:334-337 (2007)
© 2007 The American Thoracic Society
doi: 10.1513/pats.200611-176HT
Power Considerations for Studies of Lung Function in Cystic Fibrosis
Mary Corey1
1 Child Health Evaluative Sciences, The Hospital for Sick Children, and Public Health Sciences, Faculty of Medicine, University of Toronto, Toronto, Canada
Correspondence and requests for reprints should be addressed to Mary Corey, Ph.D., Child Health Evaluative Sciences, The Hospital for Sick Children, 555 University Avenue, Toronto, ON M5G 1X8, Canada. E-mail: mary.corey{at}sickkids.ca
ABSTRACT
Observational studies and clinical trials in cystic fibrosis (CF) have largely been concerned with improving long-term survival. Lung function tests, in particular FEV1, have proven to be reliable and objective measures for monitoring the course of CF lung disease. Over several decades, the variability and average rates of FEV1 decline have been remarkably stable. In the past decade, specific treatments and management of CF have resulted in a more gradual rate of decline, so that large numbers of patients are needed to demonstrate a significant subgroup or treatment difference. New measures are needed that detect changes before lung function decline, and that reflect more subtle changes over time. As new measurement tools are developed, FEV1 provides a model to show how age, sex, duration, and frequency of measurement are related to variability, sample size, and power in cross-sectional or longitudinal studies. Chest radiographs are a standard tool for clinical assessment of an individual patient. However, their use in clinical trials has been limited by the lack of an objective way of measuring the elements that characterize the disease process. The CT scan offers more specific measurements relating directly to the process of lung disease in CF. Computerized algorithms can provide objective scores, but it will be an ongoing challenge to confirm the validity of candidate measures and their relationship to CF lung disease.
Key Words: cystic fibrosis; lung function; sample size
Over the past 30 years, treatment and clinical research in cystic fibrosis (CF) have been aimed at maximizing growth, optimizing lung health, and improving survival. Figure 1 shows how CF survival has improved in sequential birth cohorts from the early 1970s to the turn of the century (1). The historically poorer survival for females is apparent in the earlier cohorts, but not in patients with CF born since the mid 1980s. In the latest cohort, born 1998–2002, there is virtually no mortality up to age 5 years. If mortality rates up to age 20 years are no worse than those seen in the cohorts from 1983 on, greater than 90% survival to age 20 years would be predicted. Whether differences in male and female survival will appear later is impossible to tell. Although survival age is the ultimate measure of CF severity, the improving rates mean that mortality is no longer a useful endpoint for evaluating the progress of patients with CF or the effectiveness of interventions.

Figure 1. Survival curves of successive 5-year birth cohorts in female (top panel) and male (bottom panel) patients with cystic fibrosis (CF) in the Canadian Patient Data Registry.
Pulmonary function tests of lung volume and airflow are regularly measured on all patients over 5 or 6 years of age to monitor the status and progression of lung disease. FEV1 and its rate of decline have proven to be strong predictors of survival time (2, 3). However, Figure 2 shows the great variability that is seen in FEV1 at all ages. Patients with one or more cystic fibrosis transmembrane conductance regulator (CFTR) mutations consistent with the pancreatic sufficient phenotype have, on average, statistically significantly better values of FEV1. But the variability is still large, and pancreatic status is not predictive for individual patients. Many of the summary scores for CT scans show a similar range of variability. Is it technical variance? Is it patient variation? Is it day-to-day fluctuation of lung disease? To what extent do environmental factors and modifier genes influence the value for an individual at a single time point? The strong correlation of longitudinal measures of FEV1 with age at death has established FEV1 as the prime surrogate for lung disease severity in CF, but there is growing evidence that the critical events in lung pathology occur at a very early age, and, therefore, standard lung function testing is not optimal for evaluating novel therapies aimed at the earliest stages of CF lung disease.

Figure 2. Cross-sectional values of FEV1% predicted plotted against age for Toronto patients with CF, aged 5–30 years in 2005, by pancreatic status: pancreatic insufficiency (open circles and solid regression line) and pancreatic sufficiency (closed circles and dashed regression line).
In a study of lung function in successive 5-year birth cohorts from 1960 to 1974 (4), the average values of FEV1 at age 5 years and its slope to age 14 years were remarkably stable (Table 1). However, in an updated study (5) looking at the next three 5-year birth cohorts, two interesting trends are observed. When first tested at age 5 years, children in the later cohorts had lower average FEV1, but their average rate of decline was much slower. Mortality selection was far higher in the early cohorts, so that the most severely affected children were excluded because they died before the age of lung function testing. In the later cohorts, more and more children survived to school age, including those who were severely affected and whose lung health was already compromised. However, the average rate of decline to age 14 years has become steadily better in each 5-year cohort. This general reduced rate of decline has been reported in another study of young adults (6). So, FEV1 is becoming less and less helpful in these early years when there is already evidence of lung disease and the rate of decline is so gradual that to demonstrate any improvement would take huge numbers over long periods of time.
TABLE 1. CHANGING PATTERNS OF LUNG FUNCTION DECLINE (FEV1% PREDICTED) FROM 5 TO 14 YEARS OF AGE IN CHILDREN WITH CYSTIC FIBROSIS IN CONSECUTIVE BIRTH COHORTS IN TORONTO
Clearly, new measures of early lung damage are needed. But examining FEV1 as an outcome is helpful in deliberating the relative merits of new measures based on more sensitive tests. The first issue that must be addressed is that of standardization. Technical standards for lung function testing have been established over many decades (7, 8). The value of FEV1 is known to depend on body size, age, and gender. To evaluate changes due to disease, equations have been developed to predict the value of FEV1 in the absence of disease. But careful attention must be paid to the appropriateness of reference equations for the population, age group, and intended analysis (9, 10). Subgroup variability is an extremely important issue, because the variance of a particular measure may be reduced within subsets of the population. Studies that properly select, adjust, or allow for subgroups will have more power to detect differences. Finally, there is the issue of what is an important clinical difference in the defined standardized measure in the defined population subgroup.
As a simple example, Table 2 shows the mean and SD of FEV1% predicted measurements for patients with CF by sex and age group. Data were summarized from the Toronto CF Database, and include one value per year for each patient seen in the 10-year period, 1995–2004. There is a decrement in mean values over age, which appears to level off at the older ages. The SDs are fairly stable, but, in older males, the SD is increased. Also note that the relative number of females is reduced with age, so that, in the oldest group, there are twice as many males as females, reflecting the greater female mortality in older cohorts. The greater variation in males is likely due to survivor selection; that is, many older males with low FEV1 values have survived whereas their female counterparts have died. Such subgroup differences must be considered in planning studies and interpreting results. Table 3 shows how rather small differences in SD affect the sample size needed to detect a difference (d) between two groups, whether they be prognostic subgroups in an observational study or treatment groups in a clinical trial. From any biostatistics text, the sample size formula is:
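$$n = \frac{2\sigma^2 \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{d^2},$$

where $n$ is the sample size in each group, $\alpha$ is the two-sided significance level, $1 - \beta$ is the power, $\sigma$ is the SD, and $z_p$ is the $p$th quantile of the standard normal distribution.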
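As a rough illustration of how the SD and the target difference drive the required numbers, the calculation can be sketched in Python. The sketch assumes the usual normal-approximation formula for comparing two means, n = 2σ²(z₁₋α/₂ + z₁₋β)²/d² per group. The function name and the SD and d values used below are illustrative assumptions, not entries taken from Table 3.

```python
import math
from statistics import NormalDist


def sample_size_per_group(sd, d, alpha=0.05, power=0.80):
    """Approximate n per group to detect a mean difference d (two-sided test).

    Assumes the normal-approximation formula
    n = 2 * sd^2 * (z_{1-alpha/2} + z_{power})^2 / d^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    n = 2 * sd ** 2 * (z_alpha + z_beta) ** 2 / d ** 2
    return math.ceil(n)  # round up to a whole subject


# Hypothetical SDs (in FEV1% predicted) and a hypothetical difference of 10
for sd in (20, 25):
    print(f"SD = {sd}: n per group = {sample_size_per_group(sd, d=10)}")
```

Because n scales with the square of SD/d, a modest increase in SD, or a modest reduction in the difference considered important, inflates the required sample size quickly.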
TABLE 2. MEAN VALUES FOR FEV1% PREDICTED IN TORONTO PATIENTS WITH CYSTIC FIBROSIS TESTED FROM 1995 TO 2004 USING THE FIRST MEASUREMENT FOR EACH PATIENT AFTER EACH BIRTHDAY
TABLE 3. SAMPLE SIZE REQUIRED IN EACH OF TWO GROUPS TO TEST A DIFFERENCE IN MEAN FEV1% PREDICTED FOR SPECIFIC VALUES OF THE STANDARD DEVIATION AND THE DIFFERENCE CONSIDERED IMPORTANT TO DETECT (ASSUMING A TWO-SIDED SIGNIFICANCE LEVEL OF α = 0.05 AND POWER [1 – β] = 0.8).
Although, intuitively, multiple measurements on individuals should help to reduce variability and increase the power to detect differences over time and between groups, the complexity of calculations is increased because it is necessary to incorporate measurement variation both between and within individual subjects. It is often of interest to compare the mean rate of change, or slope, between two groups. The sample variance of the slope, which is given by

$$\sigma_b^2 = \sigma_\beta^2 + \frac{\sigma_e^2}{\sum_{i=1}^{m} (x_i - \bar{x})^2},$$

is a combination of three components: the true variation between individual slopes ($\sigma_\beta^2$), the within-subject variance around individual slopes ($\sigma_e^2$), and the sum of the squared deviations of measurement times $x$ (11). For example, $x$ = 0, 6, 12, 18, and 24 months, and $\bar{x}$ = 12, in a 2-year study with measurements at 6-month intervals. Increasing either the frequency of measurements or the duration of the study will reduce the variance and the required sample size.
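The effect of the measurement schedule on the slope variance can be sketched in Python. The sketch assumes the slope variance has the form (between-subject slope variance) + (within-subject variance) / Σ(x − x̄)², with time measured in years. The function names and the two variance components are hypothetical values chosen only to show the pattern; they are not estimates from the Toronto data.

```python
def schedule(duration_years, interval_months):
    """Measurement times in years, from baseline to the end of the study."""
    n_intervals = duration_years * 12 // interval_months
    return [i * interval_months / 12 for i in range(n_intervals + 1)]


def sum_sq_dev(times):
    """Sum of squared deviations of the measurement times about their mean."""
    mean_t = sum(times) / len(times)
    return sum((t - mean_t) ** 2 for t in times)


def slope_variance(between_var, within_var, times):
    """Assumed form: between-subject variance + within-subject variance / SS(x)."""
    return between_var + within_var / sum_sq_dev(times)


# Hypothetical variance components, for illustration only
BETWEEN_VAR = 10.0   # variance of true slopes between subjects
WITHIN_VAR = 50.0    # residual variance around each subject's line

for years, months in [(2, 6), (2, 3), (3, 6)]:
    times = schedule(years, months)
    print(years, "years, every", months, "months:",
          "SS(x) =", sum_sq_dev(times),
          "slope variance =", round(slope_variance(BETWEEN_VAR, WITHIN_VAR, times), 2))
```

Under these assumptions, doubling the measurement frequency in a 2-year study raises Σ(x − x̄)² from 2.5 to 3.75, whereas adding a third year at the original frequency raises it to 7.0, so the within-subject term shrinks more with longer follow-up. The between-subject term is unaffected by either change.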
Table 4 shows how frequency of measurement and study duration affect the sample size needed to detect a difference in the linear slope of FEV1% predicted. The estimates of the sample variance in slope for different scenarios were calculated from a sample of 60 patients with CF between 8 and 18 years of age in the Toronto clinic who had measurements taken every 3 months over a period of 3 years. The variance estimate for each scenario was computed using only the relevant measures on the same patients. The estimate for a 2-year study with visits every 6 months (variance = 36.5) was similar to that in two previous Toronto studies (12, 13), in which subjects were measured every 6 months over a 2-year study period (variance = 38.4). Missing values in a study design will usually increase variance, and may also create serious bias if the missing measurements are not random, but related to the outcome of interest. From Table 4, it is clear that increasing the duration of a study is more powerful than increasing the frequency of measurements, and that very large numbers of patients will be needed to detect differences in study populations where the background rate of decline is near 1% predicted per year.
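To give a sense of scale, the slope variance quoted above can be plugged into a two-group sample size calculation. This is a minimal sketch that assumes the normal-approximation formula n = 2·Var(slope)·(z₁₋α/₂ + z₁₋β)²/d² per group, and assumes the variance of 36.5 is expressed in (% predicted per year)². The function name and the slope differences are hypothetical, and the results are not claimed to reproduce Table 4.

```python
import math
from statistics import NormalDist


def n_per_group_for_slope(slope_var, slope_diff, alpha=0.05, power=0.80):
    """Approximate n per group to detect a difference in mean slope.

    Assumes n = 2 * slope_var * (z_{1-alpha/2} + z_{power})^2 / slope_diff^2.
    """
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * slope_var * z_sum ** 2 / slope_diff ** 2)


SLOPE_VAR = 36.5  # from the text: 2-year study, visits every 6 months

# Hypothetical slope differences, in % predicted per year
for diff in (2.0, 1.0, 0.5):
    print(f"difference = {diff}: n per group = {n_per_group_for_slope(SLOPE_VAR, diff)}")
```

Halving the detectable slope difference roughly quadruples the required sample size, which is why a background decline near 1% predicted per year leaves so little room for a detectable improvement.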
TABLE 4. EFFECT OF TRIAL DURATION AND FREQUENCY OF MEASUREMENTS ON SAMPLE SIZE REQUIREMENTS FOR TESTING DIFFERENCE OF SLOPE IN LONGITUDINAL MEASURES OF FEV1% PREDICTED WITH n SUBJECTS AND m TIME POINTS PER SUBJECT
This report has focused on FEV1 because this lung function measure has been shown to best reflect morbidity and mortality from CF lung disease over the full course of the disease (2–4). If the focus is on young patients with mild disease, then the midexpiration forced expiratory flow (FEF25–75) is thought to be a more sensitive measure, reflecting early abnormalities in peripheral airways (14). However, the variability in FEF25–75 is also known to be much higher than for FEV1, as shown in Table 5, where the SDs are roughly 1.5 times those in Table 2 for the same patients. If there is a reasonable hypothesis in a clinical trial that a larger effect will be seen in the peripheral airways than in the larger airways, then it may well be that FEF25–75 would be a more efficient primary outcome, because the greater variability would be offset by a larger effect size. Most studies do, indeed, include multiple measures of lung function, and the similarity or difference in the results of different measures can help to interpret the mechanism of action in the lungs.
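The trade-off between greater variability and a larger effect can be made concrete with a short Python sketch. It assumes that the required sample size is proportional to (SD / difference)², as in the standard two-group formula. The function name is hypothetical, and the effect-size ratios are illustrative rather than observed values.

```python
def relative_sample_size(sd_ratio, effect_ratio):
    """Sample size for the alternative outcome relative to the reference outcome.

    sd_ratio:     SD of the alternative outcome / SD of the reference outcome
    effect_ratio: expected effect on the alternative / expected effect on the reference
    Assumes n is proportional to (SD / effect)^2.
    """
    return (sd_ratio / effect_ratio) ** 2


SD_RATIO = 1.5  # FEF25-75 SDs are roughly 1.5 times the FEV1 SDs (from the text)

# Hypothetical ratios of the FEF25-75 effect to the FEV1 effect
for effect_ratio in (1.0, 1.5, 2.0):
    print(effect_ratio, relative_sample_size(SD_RATIO, effect_ratio))
```

Under this assumption, an outcome with 1.5 times the SD needs 2.25 times the sample size for the same effect, breaks even when the effect is 1.5 times larger, and becomes more efficient only when the effect exceeds that.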
TABLE 5. MEAN VALUES FOR % PREDICTED FEF25–75 IN YOUNG PATIENTS WITH CYSTIC FIBROSIS TESTED FROM 1995 TO 2004 USING THE FIRST MEASUREMENT FOR EACH PATIENT AFTER EACH BIRTHDAY
Chest radiographs have always been used to augment clinical judgment in the assessment of individual patients. Several scoring systems have been devised (15), but all require subjective interpretation by a trained observer. Although all have been shown to be reasonably correlated with lung function tests, it is lamentable that automated quantitative scores have not been devised to measure specific aspects, such as air trapping, which may reflect the earliest abnormalities in young CF lungs. Combining all the detectable abnormalities in a chest radiograph is one example of a composite score that may be suitable for tracking overall changes in an individual. However, clues to the underlying pathology or hypotheses about potential treatment effects are more likely to come from precise measures of specific parameters.
In summary, the power to detect changes or treatment differences in outcome measures depends on many factors. First and foremost, the outcome measure must reflect an important aspect of the disease. FEV1 has been an excellent surrogate for long-term survival. Now, measures are needed to detect and monitor the events that initiate and establish lung disease in CF. One must find ways of minimizing or explaining technical and subject variability in every new measure. If composite scores are developed, they must be objective and, ideally, automated, and must be demonstrated to reflect the underlying biological process. To assess change over time, reproducibility and stability over time must be verified. Linearity and continuity must be tested. For example, the true trend may be curved (i.e., nonlinear), or there might be a threshold effect. Timing of measurements and duration of follow-up must be optimized for risk reduction as well as effect detection. There is virtually no risk attached to spirometry, except for cough and fatigue in the most severely affected patients. Imaging studies entail risks that may not be fully appreciated and may be cumulative. If a new metric is to become the standard for detecting early signs of CF lung disease and for evaluating novel treatments, then it must be safe and feasible for serial testing in the very young.
FOOTNOTES
Supported by the Canadian Cystic Fibrosis Foundation; Network of Centers of Excellence, Mathematics of Information and Complex Systems (MITACS); Genome Canada.
Conflict of Interest Statement: M.C. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript.
(Received in original form November 29, 2006; accepted in final form April 6, 2007)
REFERENCES
1. Canadian Cystic Fibrosis Foundation. Report of the Canadian Patient Data Registry 2002, Toronto, Ontario.
2. Kerem E, Reisman J, Corey M, Canny GJ, Levison H. Predicting mortality in patients with cystic fibrosis. N Engl J Med 1992;326:1187–1191.
3. Schluchter MD, Konstan MW, Davis PB. Jointly modelling the relationship between survival and pulmonary function in cystic fibrosis patients. Stat Med 2002;21:1271–1287.
4. Corey M, Edwards L, Levison H, Knowles M. Longitudinal analysis of pulmonary function in patients with cystic fibrosis. J Pediatr 1997;131:809–814.
5. Xu W, Subbarao P, Corey M. Changing patterns of lung function decline in children with cystic fibrosis. J Cyst Fibros 2004;3:S116.
6. Que C, Cullinan P, Geddes D. Improving rate of decline of FEV1 in young adults with cystic fibrosis. Thorax 2006;61:155–157.
7. American Thoracic Society. Standardization of spirometry: 1994 update. Am J Respir Crit Care Med 1995;152:1107–1136.
8. Laszlo G. Standardisation of lung function testing: helpful guidance from the ATS/ERS Task Force. Thorax 2006;61:744–746.
9. Rosenfeld M, Pepe MS, Emerson J, Longton G, FitzSimmons S. Effect of different reference equations on the analysis of pulmonary function data in cystic fibrosis. Pediatr Pulmonol 2001;31:227–237.
10. Subbarao P, Lebecque P, Corey M, Coates AL. Comparison of spirometric reference values. Pediatr Pulmonol 2004;37:515–522.
11. Schlesselman JJ. Planning a longitudinal study: II. Frequency of measurement and study duration. J Chronic Dis 1973;26:561–570.
12. Nolan G, McIvor P, Levison H, Fleming PC, Corey M, Gold R. Antibiotic prophylaxis in cystic fibrosis: inhaled cephaloridine as an adjunct to oral cloxacillin. J Pediatr 1982;101:626–630.
13. Reisman JJ, Rivington-Law B, Corey M, Marcotte J, Wannamaker E, Harcourt D, Levison H. The role of conventional physiotherapy in cystic fibrosis: a three year study. J Pediatr 1988;113:632–636.
14. Quan JM, Tiddens HA, Sy JP, McKenzie SG, Montgomery MD, Robinson PJ, Wohl ME, Konstan MW; Pulmozyme Early Intervention Trial Study Group. A two-year randomized, placebo controlled trial of dornase alfa in young cystic fibrosis patients with mild lung function abnormalities. J Pediatr 2001;139:813–820.
15. Terheggen-Lagro S, Truijens N, van Poppel N, Gulmans V, van der Laag J, van der Ent C. Correlation of six different cystic fibrosis chest radiograph scoring systems with clinical parameters. Pediatr Pulmonol 2003;35:441–445.