1 Bioinformatics Program, Boston University, Boston, Massachusetts; 2 Department of Pathology and Laboratory Medicine, and 3 Pulmonary Center, Boston University School of Medicine, Boston, Massachusetts
Correspondence and requests for reprints should be addressed to Avrum Spira, M.D., M.Sc., Boston University School of Medicine, Pulmonary Center, 715 Albany Street, R-304, Boston, MA 02118. E-mail: aspira{at}bu.edu
ABSTRACT
While the role cigarette smoke plays in chronic obstructive pulmonary disease (COPD) is undisputed, the molecular mechanisms by which inhaled smoke contributes to disease pathogenesis remain unclear. One of the major barriers to effective approaches to diagnose and manage COPD is the remarkable heterogeneity displayed by patients with the disease. Whole-genome gene-expression studies of airway and lung tissue from patients with COPD provide an opportunity to gain insights into disease pathogenesis, allowing for both a molecular understanding of the pathogenic processes that contribute to this heterogeneity, and the ability to target therapies to these processes. This review focuses on synthesizing and integrating the limited number of high-throughput gene expression studies that have been conducted on lung tissue and airway samples from smokers with COPD. Comparing several lung tissue studies using computational approaches, we find that the results suggest fundamental similarities and identify common biological processes underlying COPD, despite each study having identified largely nonoverlapping lists of differentially expressed genes. Given these similarities, we argue that additional lung tissue and airway gene-expression studies are warranted, and present a roadmap for how such studies could lead to clinically relevant tools that would impact COPD management.
Key Words: gene expression; microarray analysis; biomarkers; emphysema
Chronic obstructive pulmonary disease (COPD) is the fourth most common cause of death in the United States (1), and the mortality rate due to COPD continues to increase as heart disease and stroke mortality decline (2). Airflow limitation in COPD is usually progressive and is associated with an abnormal inflammatory response to cigarette smoking (3). COPD is diagnosed on the basis of fixed airflow obstruction that is not due to bronchiectasis, cystic fibrosis, or previous tuberculosis (3). This operational definition of COPD encompasses a broad spectrum of patients among whom there is substantial heterogeneity in terms of respiratory and systemic symptoms, pulmonary physiology, radiographic findings, histopathology, and, undoubtedly, underlying molecular disease processes.
Environmental exposures, most notably tobacco smoking (but also exposure to other particulate and gaseous pollutants), are associated with longitudinal decline of pulmonary function and are important causes of COPD. Several lines of evidence indicate that genetic factors also contribute to COPD. First, family studies have revealed increased risk of lung function impairment in smoking first-degree relatives of individuals with COPD (4, 5), and population-based studies have shown substantial heritability of spirometry measures (6, 7). Second, severe α1-antitrypsin deficiency due to homozygous mutations in the SERPINA1 (AAT) gene is a documented cause of COPD (8). However, as the frequency of these deleterious variants in AAT is low, they explain only a small proportion of COPD prevalence.
The histopathology of COPD includes abnormalities of the central airways (subepithelial infiltration with macrophages and CD8+ T lymphocytes, mucous gland hypertrophy, and inflammation), peripheral airways (goblet cell hyperplasia, subepithelial infiltration with CD8+ T lymphocytes, smooth muscle hypertrophy, increased airway wall thickness, and loss of alveolar attachments), and lung parenchyma (centriacinar emphysema, panacinar emphysema, and infiltration of CD8+ T lymphocytes) (9). Among patients with COPD, there is great heterogeneity in the severity, relative predominance, and regional distribution of these pathologies. In some patients, parenchymal emphysema predominates, while others have a relative predominance of airway wall thickening. Although some data have suggested that the degree of parenchymal emphysema is the most important pathologic correlate of lung function impairment in patients with severe disease (10), the relative contributions of emphysema and airways disease are likely to differ among patients (11).
While the contribution of cigarette smoke to the pathogenesis of COPD is widely appreciated, the mechanism by which inhaled tobacco smoke contributes to disease pathogenesis remains unclear. Furthermore, we have very little insight as to why 10 to 20% of smokers develop COPD while the remainder does not. Whole-genome gene expression studies of diseased lung and airway tissue provide an opportunity to gain a detailed and unbiased portrait of the molecular consequences of COPD with the potential to gain insights into the processes that contribute to disease pathogenesis, and the relationship between the various pathologies that give rise to the clinical manifestations of COPD. The hope is that this detailed molecular understanding of COPD will both allow for patients to be categorized according to the specific disease processes responsible for their disease, and enable therapies to be targeted to these pathogenic processes. Realizing the promise of these studies has the potential to revolutionize the diagnosis and treatment of COPD.
We remain at the beginning of this path, as there are presently very few whole-genome expression studies of COPD in humans (12–16). Progress has been slow, and some of the initial enthusiasm for using genome-wide gene expression in the setting of COPD has been tempered by the apparent lack of overlap between these studies as to which genes are differentially expressed in response to COPD (15). In this review, we focus on synthesizing and integrating these studies. Using several computational approaches, we find that there is a high degree of overlap in the biological themes represented by the genes identified in these studies, and that while the lists of differentially expressed genes presented in each of the studies are largely nonoverlapping, there is considerable evidence that the gene-expression results of each paper share fundamental similarities. As a result, we believe that larger, better-powered studies are warranted. Future directions for such studies will be outlined to provide a roadmap for translating studies of COPD-related gene expression into clinically relevant tools that can impact disease management.
SUMMARY OF LUNG TISSUE GENE EXPRESSION STUDIES
As of June 2008, there have been four studies published on gene expression of lung tissue in COPD (12–15). Each study examined cohorts that were ascertained using slightly different inclusion and exclusion criteria, and identified genes that are differentially expressed as a function of different clinical variables. These differences between studies complicate comparisons of their findings. The cohorts, clinical samples, variables examined, and the gene expression technologies employed are summarized in Table 1.
Although this study employed a moderate sample size, it used leave-one-out cross-validation and chose only genes that were consensus picks among several algorithms as those that are differentially expressed in COPD. The decision to combine patients with no COPD and those with mild COPD as the reference group in this analysis may conceivably have limited the power of the study if individuals without emphysema have a different pattern of gene expression than individuals with mild early-stage disease. A perhaps more serious concern about this study is that the analysis may have been confounded by cancer-specific alterations in lung gene expression, as 64% of the individuals in the severe emphysema group had lung cancer compared with 0% of the control group. The relevance of this concern is highlighted by subsequent work from our group demonstrating that alterations in gene expression occur throughout the respiratory tract in patients with lung cancer (27).
The study by Golpon and coworkers (13) took peripheral lung tissue samples from patients undergoing lung volume reduction surgery (LVRS) or transplantation for COPD. This included six smokers with emphysema associated with α1-antitrypsin deficiency and five smokers with emphysema of otherwise unknown etiology. Gene expression profiles in these diseased tissue samples were compared with those of lung tissue from nonsmoker organ donors. They compared gene expression in all emphysema samples versus the normal samples, α1-antitrypsin–deficient (AAT) emphysema versus regular emphysema, regular emphysema versus normal samples, and AAT emphysema versus normal samples. They found gene expression differences between AAT emphysema and non-AAT emphysema, with genes involved in protein biosynthesis, energy pathways, and electron transport being expressed at higher levels in AAT emphysematous lung tissue, and genes involved in the cellular defense response and small GTPase-mediated signal transduction being expressed at higher levels in non-AAT emphysematous lung tissue. They also found differences between the emphysematous samples and the normal tissue.
The study of AAT emphysema versus normal emphysema is interesting and important, as AAT emphysema is often used as a COPD model. However, the ability to make firm conclusions about these differences is constrained by the small size of the cohort. Perhaps a more serious limitation is that the nonemphysematous tissue was from healthy nonsmoking lung donors, while 91% of the patients with emphysema were smokers. Thus, it is not possible to determine the relative contribution of smoking and emphysema to the gene expression differences between the two arms of the cohort.
Ning and colleagues (14) obtained surgical lung tissue specimens from smokers with no (n = 12) or moderate (n = 14) emphysema as staged using the Global Initiative for Chronic Obstructive Lung Disease criteria (GOLD-0 and GOLD-2, respectively). They did not specify why the patients were having surgery. It is therefore unclear if there are non-COPD pathologies that confound ascertainment to the various arms of the study. They pooled six GOLD-2 samples and five GOLD-0 samples and ran both SAGE and microarray analysis. They ran six each of the remaining GOLD-2 and GOLD-0 samples individually on microarrays. They also obtained fibroblasts from excised lungs from patients with COPD undergoing lung transplantation (n = 3) and from healthy donor lungs (n = 3). They performed additional immunofluorescence and quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) studies on these fibroblasts to confirm the COPD-related differential expression of EGR-1, which they had identified in their microarray studies.
The use of SAGE is unique among the studies. SAGE relies on sequencing the 3' end of cDNA generated from each of the samples (for review see Reference 18). Because differential expression is detected as differences in the number of times a particular sequence is detected in the collection of sequences from the healthy versus diseased pool, it can detect both expression and differential expression of relatively low abundance genes. Such genes are not usually measurable using microarrays due to the signal-to-noise characteristics of microarray technology. The results from the SAGE experiments and the microarray experiments in this study support each other, and the authors successfully validated a subset of their results in fibroblasts from additional patients. The sample size in this study was again quite small. However, the genes found to be differentially expressed between GOLD-2 and GOLD-0 patients have functions in processes that are thought to be involved in COPD, which suggests that their results are in concordance with the biology of COPD.
Wang and coworkers (15) took lung-tissue samples from patients undergoing lung-nodule resection for suspected lung cancer; samples came from nonsmokers and smokers with COPD severity ranging from GOLD-0 to GOLD-4, for a total of 48 samples. They also examined the histopathology of each sample to determine percent parenchyma and other tissue types. They found 203 genes whose expression was associated with forced expiratory flow between 25 and 75 percent of forced vital capacity (FEF25–75% predicted), a measure of small airways function, while controlling for pack-years. They found that further controlling for the fraction of parenchyma in each tissue sample did not affect the ability to detect genes whose expression is associated with COPD. Using immunohistochemical methods, they additionally found no correlation between FEF25–75% predicted and the number or percentage of alveolar macrophages positive for PLAU or PLAUR, suggesting that the gene expression differences for these genes are unlikely to be a result of inflammation or tissue remodeling.
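To make concrete what "controlling for pack-years" can mean, the following is a minimal sketch that estimates the association between one gene's expression and a lung function measure after adjusting for a single covariate. It assumes ordinary linear regression, which the text above does not confirm was the model used by Wang and coworkers; the function names and all numeric values are hypothetical and serve only as illustration.

```python
def _residuals(y, x):
    """Residuals of a simple linear regression of y on x (with intercept)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return [(yi - mean_y) - slope * (xi - mean_x) for xi, yi in zip(x, y)]


def adjusted_slope(expression, lung_function, covariate):
    """Slope of expression on lung function, adjusted for one covariate.

    Both variables are first regressed on the covariate; the slope between
    the two sets of residuals equals the lung-function coefficient of the
    multiple regression expression ~ lung_function + covariate.
    """
    res_expr = _residuals(expression, covariate)
    res_lung = _residuals(lung_function, covariate)
    return (sum(a * b for a, b in zip(res_lung, res_expr))
            / sum(a * a for a in res_lung))


if __name__ == "__main__":
    # Hypothetical values for six subjects (not taken from any study).
    fef_percent_predicted = [40.0, 55.0, 60.0, 75.0, 90.0, 100.0]
    pack_years = [50.0, 30.0, 45.0, 20.0, 10.0, 25.0]
    expression = [7.1, 7.9, 7.6, 8.8, 9.4, 9.3]
    print(adjusted_slope(expression, fef_percent_predicted, pack_years))
```

In a genome-wide setting the same adjustment would be repeated for every gene, followed by a correction for multiple testing, which this sketch omits.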
The strengths of this study include its large sample size and the use of immunohistochemistry to explore the cell types that are responsible for COPD-related gene expression changes, as well as the contribution of inflammation and tissue remodeling to the lung function–dependent changes in gene expression. It is interesting that the authors associated gene expression with FEF25–75% predicted, while FEV1% predicted is a more standard measure of lung function in the context of COPD. While all samples from both smokers and nonsmokers are from patients with lung cancer, it is important to consider the possibility that the gene expression consequences of lung cancer may vary between sporadic and tobacco smoke–induced cancer.
INTEGRATION OF LUNG TISSUE GENE EXPRESSION STUDIES
A table published by Wang and colleagues (15) and reproduced in Table 2 shows that there is very little overlap between the genes that have been identified as being differentially expressed as a function of COPD-related markers in each study. This is likely due to the small sample size of each study and possibly to differences in the clinical markers of disease used for gene expression association. Differences in inclusion criteria, high-throughput gene expression platform, and analytical approaches may also contribute to this lack of overlap. One of the challenges of the meta-analyses that have been performed on these datasets is that each study relies on a P value threshold to determine whether a gene is significantly differentially expressed. However, due to the relatively small sample size in each study, the statistical power of each study to detect the complete set of genes that are affected by COPD is quite low. This complicates the interpretation of a gene not being detected as differentially expressed in any study. In the meta-analysis that we will describe below (Figure 1), we have employed a pathway-based analysis using Gene Ontology classifiers (19) to identify common biological themes in the genes identified as being differentially expressed in each study. In addition, we have used a method called Gene Set Enrichment Analysis (GSEA) (20, 21) to further explore the "P value threshold" problem. This approach ranks all genes according to their differential expression as a function of COPD (using a signal-to-noise metric) and then determines whether the genes identified as being differentially expressed in a second study (the "Gene Set") are significantly skewed toward the top or bottom of that ranked list.
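The ranking-and-skew idea behind GSEA can be sketched in a few lines. The code below is a simplified illustration rather than the GSEA software itself: it computes the signal-to-noise metric as the difference in group means divided by the sum of group standard deviations, and an unweighted running-sum enrichment score. The published method also offers a weighted score and assesses significance by permutation, both of which are omitted here. Function names and gene labels are hypothetical.

```python
import math


def signal_to_noise(group_a, group_b):
    """(mean_a - mean_b) / (sd_a + sd_b) for one gene.

    Assumes at least two samples per group and non-zero spread;
    no minimum-variance adjustment is applied in this sketch.
    """
    def mean(values):
        return sum(values) / len(values)

    def sd(values):
        m = mean(values)
        return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

    return (mean(group_a) - mean(group_b)) / (sd(group_a) + sd(group_b))


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    Walk down the ranked list, stepping up at each gene in the set and
    down at each gene outside it; return the largest deviation from zero.
    A positive score means the set is skewed toward the top of the list,
    a negative score toward the bottom.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    running = 0.0
    extreme = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


if __name__ == "__main__":
    # Hypothetical ranked list (most up-regulated in disease first).
    ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
    print(enrichment_score(ranked, {"g1", "g2"}))
    print(enrichment_score(ranked, {"g5", "g6"}))
```

Because the score depends on the whole ranked list rather than on a significance cutoff, a gene set can register as enriched even when few of its members would pass a P value threshold individually.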
"Leading Edge" genes in GSEA are the genes from each gene set that are most differentially expressed in the test set (20)—in this case, the data from Spira and coworkers. Using the GSEA software, we identified leading edge genes for the Ning and Golpon sets (containing genes each group identified to be expressed at a higher level in patients with emphysema). Biological categorization of these genes was performed with DAVID and its associated program EASE (22, 23) as described earlier. Biological categories that are significantly enriched in the leading edge genes from the analysis of the Ning gene list included GO biological processes involved in response to external stimulus and response to wounding. The expression of these genes in the Spira dataset was explored via hierarchical clustering (Figure 2). The leading edge of the Golpon genes fell into the GO biological process categories related to cell adhesion.
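Testing whether a Gene Ontology category is enriched in a gene list is commonly framed as an over-representation test. The sketch below uses the plain hypergeometric upper tail (equivalent to a one-sided Fisher exact test). It is not the EASE score reported by DAVID, which is a more conservative modification of Fisher's exact test; the function name and the counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_category, n_list, n_overlap):
    """Probability of at least n_overlap category genes in a random list.

    n_background: genes measured on the platform
    n_category:   background genes annotated to the category
    n_list:       genes in the list of interest (e.g., leading edge genes)
    n_overlap:    list genes annotated to the category
    """
    upper = min(n_category, n_list)
    total = comb(n_background, n_list)
    tail = sum(
        comb(n_category, k) * comb(n_background - n_category, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 12 of 60 list genes fall in a category that
    # contains 400 of 12,000 background genes.
    print(enrichment_p_value(12000, 400, 60, 12))
```

When many categories are tested, the resulting P values would need to be adjusted for multiple comparisons before any category is called significant.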
AIRWAY GENE EXPRESSION STUDIES
The studies that have been described up until now have focused on determining the gene expression consequences of COPD in lung tissue: a tissue in which disease-related processes can be expected to occur. In this section, we discuss the potential to study COPD-related processes by examining gene expression in more readily accessible proximal airway tissue. This work is motivated by the concept that smoking creates a "field of molecular injury" in the epithelial cells that line the entire respiratory tract. In support of this, many groups (including our own) have studied gene expression in bronchial airway epithelial cells from smokers with and without smoking-related lung disease using high-throughput gene expression platforms. Initial studies showed that smoking affects airway epithelial cell gene expression (24, 25), and that while some of these gene expression changes revert toward baseline after smoking cessation, some genes remain irreversibly altered (26, 27). The effect of cigarette smoke on airway gene expression is heterogeneous between individuals, and we have found that some of this heterogeneity in large airway epithelial gene expression reflects the presence or absence of lung cancer (28). This cancer-specific gene expression heterogeneity can be used as a biomarker to distinguish smokers with and without lung cancer (28, 29).
The above studies support the hypothesis that airway gene expression reflects the host response to cigarette smoke and that heterogeneity in this response may associate with the presence of (or perhaps susceptibility to) tobacco-associated lung disease. Given that large airway epithelial cells can be collected in a less invasive fashion than lung tissue, profiling airway epithelium to gain insights into COPD-related disease processes would be extremely attractive if, as is the case with lung cancer, some of the heterogeneity in airway gene expression reflects differences associated with COPD-related processes. Using this site to study aspects of COPD is attractive, as it would make it possible to study large numbers of patients to gain insights into heterogeneity in the processes that contribute to disease pathogenesis. In addition, the ability to identify COPD-related processes in airway gene expression would raise the possibility of developing airway biomarkers that could be used clinically to identify smokers at higher risk for developing disease. Pierrou and colleagues have recently published a study comparing bronchial airway gene expression profiles in 38 smokers with COPD, 18 healthy smokers, and 14 healthy nonsmokers, in which they identified 200 oxidative stress–related genes that were differentially expressed between smokers with and without COPD (16). One of the major strengths of this study was the careful characterization of subjects using high-resolution computed tomography (HRCT) of the chest to select healthy smokers without emphysema. However, given the limited numbers of smokers with moderate to severe COPD, smokers with GOLD stage 2–4 were combined for the subgroup analysis within subjects with COPD. Interestingly, nonlinear gene expression patterns were identified for some of these antioxidant genes across the spectrum of COPD stages/severity.
GSEA analysis revealed that these pathways are also among those most affected by smoking, with nonsmokers, healthy smokers, and smokers with COPD forming a continuum. Supporting the idea that COPD-related gene expression changes in the airway may be relevant to pathogenic processes occurring in the lung, the authors found that approximately 23% of the genes with COPD-specific patterns of differential expression in the airway were also found to have COPD-specific patterns of differential expression in the lung by Spira and coworkers (12).
FUTURE DIRECTIONS
The diverse clinical, radiographic, and pathologic findings in COPD represent a major challenge to molecular studies aimed at unraveling underlying molecular mechanisms. There is substantial heterogeneity among patients with COPD with regard to potentially confounding factors (e.g., medication use, co-morbidities such as lung cancer, and degree of tobacco smoke exposure). There is also substantial heterogeneity with regard to the severity, relative predominance, and regional distribution of pathologic findings. As a result, future studies that combine detailed clinical and pathological phenotyping of subjects (e.g., HRCT of the chest, pulmonary function, chart review, and detailed questionnaires) with gene expression profiling will likely be required to dissect the various components of COPD. Furthermore, it will likely be necessary to study the different tissues within the respiratory tract that are differentially affected by COPD-related pathologies. For example, separately profiling the small airways and parenchyma in tissue sections within subjects with COPD may lead to a better understanding of the potentially distinct disease processes that underlie these major pathologic components of this disease. Gosselink and colleagues at the University of British Columbia have taken a first step in this direction by looking at expression of candidate genes via qRT-PCR in both the small airway and adjacent lung parenchyma from smokers with COPD (30).
Related to the above concern about the tissue specificity of different disease processes, the mixture of cell types in a given lung tissue sample represents a significant challenge to interpreting the biological significance of observed gene expression differences. In particular, there are significant concerns that COPD-associated gene expression changes may represent COPD-associated changes in the relative proportions of various cell types (e.g., COPD-specific neutrophil infiltration) as opposed to cell-intrinsic changes in gene expression. Localizing the cell type responsible for gene expression differences observed in whole lung tissue using in situ hybridization or immunohistochemistry, as done in the study by Wang and colleagues (15), is needed if gene expression studies of whole lung tissue are to lead to a better understanding of disease pathogenesis. While technically difficult, the use of laser capture microdissection to isolate various cell types from lung tissue before gene expression is another approach to address this issue.
Finally, one of the ultimate challenges in deriving mechanistic insights from a collection of genes that are differentially expressed as a function of any disease-related process is the difficulty in determining which genes are proximal to the disease-promoting process and which are secondary consequences of the activity of this process. One avenue that may identify at least a subset of the direct disease-promoting differences in gene expression involves integrating gene expression studies with large-scale genotyping studies of COPD: germ-line genetic variants that are associated with altered COPD risk and are located within genes that are differentially expressed as a function of COPD are more likely to indicate gene-expression differences that are causally involved in the pathogenesis of COPD. This type of approach was used by Spira and coworkers to find genes differentially expressed in COPD that map to chromosomal regions linked to various phenotypes of early onset COPD (12). Computational approaches to identify gene regulatory networks are also likely to provide insights into the primary causes and secondary consequences of the complex biology that underlies COPD.
The study by Pierrou and colleagues (16) suggests the exciting possibility that gene expression changes in the proximal airway can be a useful surrogate for COPD-specific changes occurring in the lung. One hope is that such studies may ultimately yield biomarkers from this readily accessible tissue (which can be collected via bronchoscopy) that could be used clinically to provide information about which patients are most likely to develop COPD, as well as information about prognosis and disease subtype. Another exciting possibility is that treatment-induced changes in biomarker expression would serve as an intermediate endpoint to gauge both the efficacy of candidate COPD therapeutics and the clinical response of individuals to established COPD therapies. Such intermediate measures of disease progression and remission are needed for assessing the efficacy of candidate COPD therapeutics given the slow progression of COPD clinical symptoms.
SUMMARY
While only a limited number of high-throughput gene expression studies have been conducted on COPD lung tissue, they have begun to provide insights into disease pathogenesis and targets for therapy. Enthusiasm for these studies has been tempered somewhat by the limited overlap between the genes identified in each study as being differentially expressed as a function of COPD, but a more in-depth analysis reveals significant overlap between the studies and suggests that common biological pathways are altered by COPD across datasets. Gene expression profiling of proximal airway epithelium in the context of COPD offers the promise of using this more readily accessible tissue both for basic studies of COPD and for clinical applications. There is hope that gene expression in the proximal airway will ultimately impact the diagnostic and prognostic evaluation of patients with COPD, and that it could be used to tailor anti-COPD therapy to the patient. There is also hope that proximal airway gene expression will be a useful tool for evaluating the response to novel and existing therapies. If gene expression studies of COPD are to ultimately yield clinical benefit, large well-designed microarray studies will need to be conducted to dissect and understand the various pathologic components of the disease.
ACKNOWLEDGMENTS
The authors thank Adam Gower for his assistance in preparing Figure 2.
FOOTNOTES
This work was supported by NIH/NHLBI 1R01HL095388 (AS, MEL), NIH/NIEHS U01ES016035 (AS, MEL), and NSF IGERT grant DGE-0654108 awarded to the Boston University Bioinformatics program (JEZ).
Conflict of Interest Statement: J.E.Z. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript. M.E.L. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript. A.S. received an honorarium from AstraZeneca for a talk at the Lund Symposium in 2008.
(Received in original form July 30, 2008; accepted in final form August 28, 2008)