Departments of Medicine, Biological Chemistry, and Biomedical Engineering, The Johns Hopkins University, Baltimore, Maryland
Correspondence and requests for reprints should be addressed to Yurong Guo, Ph.D., Johns Hopkins University Bayview Campus, 5200 Eastern Avenue, Mason F. Lord Building, Center tower, Room 609, Baltimore, MD 21224. E-mail: yguo7{at}jhmi.edu
ABSTRACT
Proteomics is a rapidly developing field that opens new horizons in many research areas of the life sciences. In the field of medicine, proteomics promises to accelerate the discovery of new drug targets and protein disease markers useful for in vitro diagnosis. In this article, we review the current proteomics technologies for biomarker discovery and validation, which include two-dimensional gel electrophoresis, one- and two-dimensional liquid chromatography, and proteomic microarrays. We also review proteomic strategies for protein-protein interactions and identification of post-translational modifications. Selection of the most effective technology or combination of technologies is required to maximize the interpretation and utility of the data.
Key Words: 2D electrophoresis; multiple dimensional chromatography; post-translational modification; proteomic microarrays; proteomics
The proteome is the temporal cell or tissue-specific protein complement of the genome, encompassing all proteins expressed at any given time, including various protein isoforms and their co- and post-translationally modified forms (1–4). Proteomics is an emerging scientific field that involves the identification, characterization, and quantification of proteins in whole cells, tissues, or body fluids (1). Protein characterization, in an ideal situation, includes amino acid sequence analysis, identification of post-translational modifications (PTMs) and splice variants, and the identification of the protein's binding partners and cellular localization (1).
The underlying reason for proteomic investigations is that proteins are often expressed in quantities and physical forms that cannot be predicted from DNA and mRNA analysis. Thus expression analysis directly at the protein level is necessary to unravel the critical proteome changes that occur as part of disease pathogenesis. The proteome ultimately dictates the function of the cell, and therefore dictates phenotype. The proteome undergoes dynamic changes as it continuously responds to autocrine, paracrine, and endocrine factors, bloodborne mediators, temperature, drug treatment, and developing disease over time. This complex interplay results in a highly variable proteome, dynamically reflecting protein production, co- and post-translational modification, degradation, and secretion. Proteomic strategies are attracting increasing interest for identification of tissue markers and for providing data for analysis. In addition, proteomics offers a unique opportunity to develop blood-based biomarkers to be used for diagnosis, prognosis, and therapy modulation (5).
Proteomics is driven by state-of-the-art analytical and biochemical technologies, and it opens new horizons in many research areas of the life sciences, particularly in the field of medicine. Clinical proteomics has been defined as the application of proteomics specifically to the field of medicine, which promises to accelerate the discovery of new drug targets and protein disease markers useful for in vitro diagnosis (6). Useful biomarkers predict the extent and duration of organ damage, anticipate clinical outcomes, and evaluate the usefulness of therapeutic strategies. In theory, any protein change or protein modification that is tightly associated with a disease state has the potential to be a biomarker. The minimum criteria used to determine whether a particular protein is a potential biomarker (7–9) are listed in Table 1. Sample sources for biomarkers could be tissues, body fluids, or cells.
This review will discuss current proteomics technologies that are used for biomarker discovery and validation, and will also cover protein-protein interactions and identification of PTMs, which are closely associated with biomarker and drug discovery.
PROTEOMICS PROFILING AND DISCOVERY OF BIOMARKERS
The first approach for proteomics-based biomarkers is global proteomics profiling. Owing to its diversity and complexity, the proteome cannot be resolved completely by a single technology. Proteomic studies have demonstrated that the most effective proteomic analysis of even a simple biological system requires combinations of protein separation and identification techniques. Listed in Table 2 are the major proteomics tools for quantitative analysis, and listed in Table 3 are proteomics tools for characterization (or nonquantitative) analysis. The use of combinations of complementary technologies allows us to analyze a large spectrum of the proteome. However, the choice of which technologies to use must be driven by underlying clinical or biological questions. For an example of such a strategy see Figure 1.
Protein Separation Methods
There are two approaches for proteome analysis: intact protein separation and peptide separation. Protein separation methods include one- and two-dimensional gel electrophoresis (1-DE and 2-DE), one- and two-dimensional liquid chromatography (1D LC and 2D LC), and affinity chromatography for selective isolation of a target protein or protein complex. Peptide separation is more limited and includes multidimensional liquid chromatography and selective enrichment of a subset of peptides which is highly representative of the parent proteins.
Two-dimensional gel electrophoresis.
A cornerstone of proteomic analysis and protein separation remains 2-DE. Two-dimensional gel electrophoresis separates intact proteins in the first dimension based on intrinsic pI (isoelectric focusing [IEF]), and in the second dimension by molecular weight (MW or mass). Two-dimensional gel electrophoresis is one of only a few methods that are able to routinely detect PTMs of proteins even in complex mixtures. However, 2-DE is limited by the solubility and mass of the proteins. Therefore, considerable effort is needed to optimize technical issues involved in maximizing protein solubilization at each stage of 2-DE (14, 15). This includes the optimization of sample solubilization, loading and running conditions for IEF which require testing various combinations of detergents, optimizing the ramping and duration of IEF (14, 15), and specialized methods to improve incorporation of high and low mass proteins (16) and basic proteins (17).
Gel image analysis.
Advanced gel image analysis is necessary to compare multiple gels for accurate protein quantification. This quantification can be accomplished using sophisticated comparison algorithms in software programs. Unfortunately, due to variation between gels, no two gel images are directly superimposable, and warping is required to overlay and compare. This limitation makes image comparison complex, particularly between samples with markedly different spot patterns or when subtle protein changes are under investigation. Currently two options are available to investigators: (1) software programs can be purchased and used in-house for gel image analysis with extensive manual editing, and (2) companies will provide image analysis on a contract basis.
Differential in-gel electrophoresis (2D-DIGE) is a fairly recent improvement of the 2-DE technology. Before gel electrophoresis, the proteins from different disease states or experimental treatments are separately labeled with two different fluorescent dyes, and an internal pooled standard is labeled with a third dye. These three dyes are matched for mass and charge, and each has a different emission wavelength. The labeled samples are then combined and subjected to 2-DE. The gel is scanned at the different emission wavelengths, and multiple images corresponding to a set of samples are generated and overlaid. Figure 2 shows an example of a pair of samples analyzed on two different gels stained with silver and also analyzed using 2D-DIGE. DIGE allows the differentially regulated proteins to be viewed as changes in color. Although 2D-DIGE significantly improves gel reproducibility and minimizes alignment issues (at least within the paired samples) (18, 19), some issues remain regarding quantification and the translation of gel maps to allow protein spot excision for downstream mass spectrometry (MS) identification.
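The role of the pooled internal standard can be illustrated with a short sketch. It assumes that spot volumes have already been extracted by image analysis software; the function names, the dictionary layout, and the choice of a log2 ratio are illustrative assumptions and are not taken from any particular DIGE package.

```python
import math
from statistics import mean


def standardized_log2(sample_volume, standard_volume):
    """log2 of a sample spot volume relative to the pooled standard on the same gel."""
    if sample_volume <= 0 or standard_volume <= 0:
        raise ValueError("spot volumes must be positive")
    return math.log2(sample_volume / standard_volume)


def log2_fold_change(spots, group_a, group_b):
    """Mean standardized log2 ratio of group_b minus that of group_a.

    `spots` is a hypothetical list of dicts, one per gel and sample, each with
    keys 'group', 'sample' (spot volume) and 'standard' (spot volume of the
    pooled internal standard for the same spot on the same gel).
    """
    def group_mean(group):
        values = [
            standardized_log2(s["sample"], s["standard"])
            for s in spots
            if s["group"] == group
        ]
        if not values:
            raise ValueError(f"no spots for group {group!r}")
        return mean(values)

    return group_mean(group_b) - group_mean(group_a)
```

Because every sample volume is divided by the standard from its own gel, a gel that is uniformly brighter or dimmer contributes the same standardized ratio, which is the sense in which the internal standard reduces gel-to-gel variation.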
One- and two-dimensional liquid chromatography approaches to protein separation.
One particularly powerful methodology that has not been extensively used yet in proteomics is protein separation by one- and two-dimensional liquid chromatography (1D LC and 2D LC). One-dimensional liquid chromatography can be used to separate proteins based on their size (mass), pI (charge), or hydrophobicity, the three chemical characteristics that define any given protein. The most commonly used 1D LC is reversed phase chromatography, which separates proteins based on hydrophobicity. In proteomic studies, 1D LC has been used primarily for peptide separation before MS analysis, but it can be used for protein separation before protein enzymatic digestion and MS (21, 22). In 2D LC, proteins are separated in the first dimension by isoelectric chromatography (pI) (or strong cation exchange chromatography) and in the second dimension by hydrophobicity (23, 24), thereby increasing the extent of protein fractionation compared with 1D LC. As with 1D LC, this method has been used primarily in proteomics for peptide separation; however, it is increasingly being applied to separation of complex intact protein mixtures (22, 24–26). A combination of proteome separation by 2-DE, 1D LC, and 2D LC is synergistic and expands the observable proteome while allowing detection of protein isoforms and PTMs. In our laboratory, we compared 2-DE and 2D LC by creating an extensive database for serum (24) and isolated inner mitochondrial subproteome (22), and the findings were that only about 12% of the proteins observed were common between the two platforms.
When comparing the elution profiles of multiple samples obtained from different experimental conditions, it is essential to be able to both quantify (or obtain molar ratios) and identify the proteins present in the various fractions generated from the 2D LC separation. The reversed phase elution profile is monitored at 214 nm, the absorbance for peptide bonds. Hence, the elution peak volume is directly proportional to the number of peptide bonds present and thus reflects both the mass of the protein and its concentration. However, because in general multiple proteins are present in most of the 2D LC fractions, differential labeling is necessary at or after the time of tryptic digestion for the purpose of quantitation (see below). The standard strategy is to normalize and overlay elution profiles using sophisticated software and then quantify only the fractions that vary between samples. There remains a need for more sophisticated quantitative and matching/alignment programs similar to those used in 2-DE.
Isotopic labeling and mass spectrometry analysis.
For protein quantitation, several isotopic labeling techniques have been developed (27), and they can be divided into two main categories: globally adding labels in vitro before or after protein digestion, and metabolic labeling of proteins in vivo.
For the first category, two labeling techniques, isobaric tagging for relative and absolute quantitation (iTRAQ) (28) and 16O/18O labeling (29), are particularly useful. Briefly, iTRAQ reagents label free amines (N-termini of peptides and lysines). Up to four different samples with equivalent amounts are digested with trypsin separately, and each digest is then labeled with an iTRAQ reagent 114, 115, 116, or 117. Strong cation exchange is carried out to clear the peptide mixture of the reaction reagents, or to fractionate a complex peptide mixture before LC-MS analysis. The intensity of the reporter ions in the tandem mass spectra (114.1, 115.1, 116.1, and 117.1) is used to quantify the peptides, and the average ratios of different peptides from the same proteins are used to quantify the protein amount. Second, the incorporation of the heavy or light oxygen isotopes can be accomplished during proteolysis, often referred to as 16O/18O labeling, which allows quantitative comparison between two samples. During proteolytic cleavage of proteins, two oxygen atoms from the solvent are incorporated universally into the carboxyl termini of all tryptic peptides. Thus, by incubating the peptides in either heavy or light oxygen water after the enzymatic digestion, selective incorporation of 18O or 16O will occur. The ratio of abundance between two samples can be obtained from the ratio of the two peaks separated by 4/z (z is the charge state of the peptide). This method may require reverse labeling to confirm the results, and it is restricted to two samples. However, the large cost difference between the two methods makes it reasonable to use the much more expensive iTRAQ method only when it is necessary to compare more than two samples. Figures 4 and 5 show the schematic diagrams of iTRAQ and 16O/18O labeling methodologies and example mass spectra. 16O/18O labeling has been applied for comparative proteomics analysis in adenovirus (29) and breast cancer (30, 31). iTRAQ has been successfully used to quantitate proteins from yeast whole cell lysates (28), murine cell cultures (32), and Escherichia coli (33).
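The arithmetic behind both labeling schemes is simple enough to sketch. The example below assumes that reporter ion intensities have already been extracted per peptide; the function names, the use of channel 114 as the reference, and the unweighted mean across peptides are illustrative assumptions, and the 4 Da shift is the nominal value used in the text rather than an exact isotopic mass difference.

```python
from statistics import mean

ITRAQ_CHANNELS = (114, 115, 116, 117)


def peptide_ratios(reporter_intensities, reference=114):
    """Ratio of each reporter ion intensity to the reference channel for one peptide."""
    ref = reporter_intensities[reference]
    if ref <= 0:
        raise ValueError("reference channel intensity must be positive")
    return {ch: reporter_intensities[ch] / ref for ch in ITRAQ_CHANNELS}


def protein_ratios(peptides, reference=114):
    """Average the per-peptide ratios over all peptides assigned to one protein."""
    per_peptide = [peptide_ratios(p, reference) for p in peptides]
    return {ch: mean([r[ch] for r in per_peptide]) for ch in ITRAQ_CHANNELS}


def o18_pair_spacing(charge, nominal_shift=4.0):
    """m/z spacing between the 16O and 18O forms of a peptide (two oxygens exchanged)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return nominal_shift / charge
```

For a doubly charged peptide, for example, the 16O and 18O peaks would be expected about 2 m/z units apart, and the ratio of their intensities gives the relative abundance between the two samples.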
Two-Dimensional Liquid Chromatography Approach to Peptide Separation
Analysis of complex peptide mixtures of digested proteomes is termed shotgun proteomics or multidimensional protein identification technology (MudPIT) (41). This method involves tryptic digestion of a protein mixture followed by multidimensional liquid chromatography (typically strong cation exchange followed by reverse phase chromatography) and MS analysis. Using MudPIT in combination with 1-DE, Guo and associates have identified 297 unique proteins from the mouse bronchoalveolar lavage fluid (BALF) proteome, expanding the BALF proteome about threefold regardless of species (42).
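The tryptic digestion step can be mimicked in silico, which is how search software generates the candidate peptides it matches against spectra. The sketch below applies the commonly used simplified rule that trypsin cleaves C-terminal to lysine (K) or arginine (R) unless the next residue is proline (P); the function name and the `missed_cleavages` parameter are illustrative assumptions, and real digests are less clean than this rule suggests.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from a simplified in silico tryptic digest.

    Cleaves after K or R except when followed by P, then optionally joins
    neighbouring fragments to model up to `missed_cleavages` missed sites.
    """
    sequence = sequence.upper()
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # C-terminal fragment that does not end in K or R
        fragments.append(sequence[start:])

    peptides = []
    for n in range(missed_cleavages + 1):
        for j in range(len(fragments) - n):
            peptides.append("".join(fragments[j:j + n + 1]))
    return peptides
```

Even a single protein yields several peptides, and allowing missed cleavages increases that number further, which is why a digested proteome needs two chromatographic dimensions before MS analysis.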
Another peptide-based protein identification technique, different from MudPIT, is called combined fractional diagonal chromatography (COFRADIC). In COFRADIC, the proteins are first digested to peptides. A subset of peptides, which is highly representative of the parent proteins originally present in the lysate, is then selected. COFRADIC thereby reduces the complexity of the peptide mixture. In principle, any peptide carrying a group that can be specifically and quantitatively modified can be selected. The method is sensitive and is characterized by broad protein coverage, including abundant and rare, large and small, acidic and basic, and hydrophobic proteins. This concept has been applied to select methionine-containing peptides (43) and N-terminal peptides (44).
Whole Mass Monitoring of Intact Proteins
Mass analysis of proteolytic peptides is a much more popular method of protein characterization, as cheaper instrument designs can be used for characterization. In addition, sample preparation is easier once whole proteins have been digested into smaller peptide fragments. Intact proteins, however, can also be ionized using electrospray ionization (ESI) or MALDI and introduced into a mass analyzer, primarily a TOF MS or FT-ICR MS. These two types of instrument are preferable here because of their wide mass range and, in the case of FT-ICR, its high mass accuracy. MALDI produces predominantly singly charged molecular ions (although larger proteins can produce multiply charged ions), making the analysis of mixtures much more straightforward. In ESI, multiply charged molecular species are formed from analytes that contain more than one possible site of proton attachment. Proteins usually exhibit a characteristic series of multiply charged ions. The molecular mass of the protein can be easily calculated using the observed masses of any two adjacent ions in the series. Using whole mass monitoring, we have identified 24 proteins from the serum albuminome (a subset of proteins associated with albumin in serum) (45).
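The calculation from two adjacent ions follows from writing the observed m/z of a protein of mass M carrying n protons as M/n plus the proton mass. Two neighbouring peaks (charges n and n + 1) then give two equations in the two unknowns M and n. The sketch below assumes positive-ion mode with protons as the only charge carriers; the function name is illustrative.

```python
PROTON_MASS = 1.00728  # Da, mass of the charge-carrying proton


def mass_from_adjacent_ions(mz_a, mz_b):
    """Estimate neutral mass and charge from two adjacent ESI charge-state peaks.

    With m/z_high = M/n + p and m/z_low = M/(n + 1) + p, it follows that
    n = (m/z_low - p) / (m/z_high - m/z_low) and M = n * (m/z_high - p).
    Returns (mass, n), where n is the charge of the higher-m/z ion.
    """
    mz_high, mz_low = max(mz_a, mz_b), min(mz_a, mz_b)
    if mz_high == mz_low:
        raise ValueError("the two peaks must have different m/z values")
    charge = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    if charge < 1:
        raise ValueError("peaks are not consistent with adjacent charge states")
    mass = charge * (mz_high - PROTON_MASS)
    return mass, charge
```

In practice the estimate is usually repeated for every adjacent pair in the series and the resulting masses are averaged, which reduces the effect of measurement error in any single peak.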
Proteomics Profiling Using Surface-enhanced Laser Desorption/Ionization TOF MS
Surface-enhanced laser desorption/ionization (SELDI)-TOF has led to a vast increase in the number of publications about new serum biomarkers since its introduction (46), and it has excellent potential for protein profiling. Technically, ProteinChip array-based SELDI-TOF MS is a variant of MALDI-TOF, but the on-chip purification is a great advantage. Twelve eight-spot chips are assembled in 96-well "bioprocessors," which improves the expression analyses of many samples simultaneously. The subfraction of the proteome bound to the chips can be analyzed with MS on the same chip, resulting in a "pattern" of proteins characterized by mass-to-charge ratio (m/z). Furthermore, the technique is especially suitable for analyzing the low-molecular-weight proteome. The advantage of SELDI-TOF MS is that it does not rely on evidence of a gold-standard biomarker, but rather on combinations of peptide signals. However, substantial caveats remain, including limited biomarker discovery in complex biological samples, PTM analysis of proteins altered with diseases, identification of proteins/peptides, and precise protein/peptide quantitation (47). Platforms for direct on-chip sequencing of detected biomarkers by Q-TOF MS have become available, thus increasing the feasibility of the identification of discriminative proteins (48). SELDI-TOF MS has been applied to cancer tissue (49), plasma (50), serum (51), and nipple aspirate fluid (52). de Torre and coworkers have successfully used SELDI-TOF and 2-DE to assess markers of lung inflammation (53). They studied the changes in bronchoalveolar lavage (BAL) protein in 33 subjects challenged with local bronchial lung endotoxin and saline and in 11 patients with acute respiratory distress syndrome (ARDS). The temporal changes in acute inflammatory BAL (6, 24, and 48 h after endotoxin challenge) on hydrophobic binding chip surfaces revealed the differential presence of four proteins (all p < 0.001) in the inflammatory BAL. 
The differential pattern was also found in the ARDS BAL. They also analyzed the hydrophobic fraction of the inflammatory BAL using 2-DE and identified increased levels of apolipoprotein A1, and S100 calcium-binding proteins A8 and A9 in the inflammatory BAL. This pattern was also found in ARDS BAL after immunoblot analysis. These approaches will be useful to improve current methods of monitoring lung inflammation and to identify new therapeutic targets.
PROTEIN-PROTEIN INTERACTIONS
The stimulation of cells from outside triggers cascades of signal transduction that result in cellular responses such as growth, differentiation, and movement. These signals are transduced by networks of interacting proteins (54). One method to probe protein-protein interactions is to use a bait protein labeled with an affinity tag, expressed in cell culture, and then isolated from lysed cells along with its associated partners by affinity chromatography. The resulting proteins either are separated on a 1D gel, or are digested directly with trypsin and analyzed by LC/MS. For tissues, antibodies are in general used to pull down the bait protein and its binding partners. Protein-protein microarrays are the high-throughput platforms for the study of such interactions. Proteomics, through protein-protein association studies, will eventually provide a detailed map of all protein interactions in healthy and diseased cells and thus facilitate development of drugs that selectively target disease-associated pathways while minimizing unwanted side effects. Understanding how proteins function by interacting with relevant cellular partners will also make it possible to evaluate the consequences of gene mutations on the operation of the cell. This, in turn, should accelerate the advance of gene therapy and individualized medicine in general (1).
IDENTIFICATION OF PTMS
Protein structure and activity are often regulated by the enzymatic attachment and removal of covalent modifications. PTMs may occur at different stages of disease development, providing clues indicative of early or late events of transformation. Common PTMs include phosphorylation, acetylation, glycosylation, ubiquitination, farnesylation, methylation, and sialylation (55). There are more than 200 post-translational modifications known to affect a variety of protein functions such as protein-protein interaction and nucleic acid-protein interaction, stability, localization, and half-life (56). PTM-based biomarkers are proteins or peptides modified on a specific amino acid residue or residues and implicated in a specific pathway or biological network leading to initiation or promotion of a disease. For example, cardiac troponin I (cTnI) is specifically and selectively degraded at the C-terminus in the heart with myocardial ischemia. During acute myocardial infarction, the intact molecule and any degradation products are released from the heart cell into the blood, where they can be detected (57). Although the detection of cTnI is now the gold standard for the diagnosis of myocardial infarction, increased quantities of degradation products have been proposed to indicate increased risk and poor outcome (57).
Protein phosphorylation is one of the most important PTMs, and it is involved in most cellular signal pathways including cell cycling (58), signal transduction (59), DNA repair (60), and carcinogenesis (61). Nevertheless, despite phosphorylation being acknowledged as a crucial modification involved in many cellular events, determining the sites of phosphorylation on proteins is not a routine task. Recently, MS-based methods have emerged as powerful and preferred tools for the analysis of PTMs, including phosphorylation, because of their higher sensitivity, selectivity, and speed compared with most biochemical techniques (62–64). MS approaches for the analysis of phosphorylation sites have mostly relied on using an instrument capable of tandem MS experiments to determine the sequence and exact sites of phosphorylation on peptides. Because of the low stoichiometry of phosphopeptides in a biological sample, they are in general enriched using affinity chromatography before MS analysis (65).
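To give a sense of what MS software looks for, the sketch below lists the residues in a peptide that could carry a phosphate group and the m/z shift a phosphopeptide would show relative to its unmodified form. It assumes phosphorylation on serine, threonine, or tyrosine only and uses the monoisotopic mass of the added HPO3 group; the function names are illustrative, and actual site assignment relies on fragment ions in the tandem mass spectrum rather than on this enumeration.

```python
PHOSPHO_SHIFT = 79.96633  # Da, monoisotopic mass added by one phosphate group (HPO3)


def candidate_sites(peptide):
    """Return (position, residue) pairs for S, T and Y, using 1-based positions."""
    return [
        (i + 1, residue)
        for i, residue in enumerate(peptide.upper())
        if residue in "STY"
    ]


def mz_shift(n_phospho, charge):
    """m/z difference between a phosphopeptide and its unmodified form."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return n_phospho * PHOSPHO_SHIFT / charge
```

A peptide with several candidate residues but a single observed mass shift is ambiguous at the precursor level, which is why tandem MS is needed to pinpoint the modified residue.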
PROTEOMIC MICROARRAYS: VALIDATION AND DISCOVERY
Proteomics profiling is useful for biomarker discovery; however, it has inherent limitations. First, it is limited by sample complexity and has great difficulty in finding low-abundance biomarkers among the hundreds to thousands of other proteins in the biological sample. Second, it does not provide (or provides only limited) information on specific proteins, such as the biological role of the proteins, except in the context of a particular subproteome. Third, proteomics is, in general, expensive and time-consuming, and not compatible with processing large numbers of clinical samples. Therefore, simultaneous efforts are being made to identify proteins and develop antibody assays for the candidate proteins identified from global proteomics profiling or other useful proteins (e.g., cytokines).
There has been fascinating growth in the field of large-scale and high-throughput biology, resulting in a new era of technology development and the collection and analysis of information. Elucidating the function of every encoded gene and protein, and understanding the cellular events mediating complex disease processes, are the challenges ahead. Miniaturized and parallel assay systems, especially microarray-based analysis, are crucial to high-throughput biological analysis. In a microarray format, capture molecules are immobilized in a very small area, and probed for various biochemical activities. There are two general types of protein microarrays: analytical microarrays and functional protein microarrays (66).
Analytical Microarrays: Quantitative Assessment of Potential Biomarkers and Useful Proteins
In antibody microarrays, antibodies, antibody mimics, or other proteins are arrayed and used to measure the presence and concentrations of proteins in complex mixtures. They are essentially the same as enzyme-linked immunosorbent assay (ELISA) except that multiple analytes are quantified simultaneously. They are desirable platforms for high-throughput and quantitative assessment of potential biomarkers resulting from proteomics profiling.
The most common form of analytical array is the antibody/antibody mimic array, in which antibodies (or similar reagents) that bind specific antigens are arrayed on a glass slide at high density. A lysate is passed over the array and the bound antigen is detected after washing. Detection is usually carried out by using labeled lysates or using a second antibody that recognizes the antigen of interest (multiplex sandwich immunoassay). Assay performance is evaluated based on (1) ability to measure analytes across a broad dynamic range at sufficiently low coefficients of variation (CVs), (2) detection of proteins at levels requisite to capture biologically relevant expression differences, and (3) generation of standard curves to calculate analyte concentrations based on detected intensity data. Antibody microarrays require calibration standard curves to be included in the experiment so that the concentration of analytes in the samples can be calculated according to the assay signal. It is critical to obtain recombinant protein standards that mimic the native proteins in the particular biofluids to be analyzed. The stability and solubility of the recombinant proteins should also be evaluated, since insolubility of the recombinant protein can have a major impact on assay precision. Several factors have limited the availability of high-quality multiplex sandwich immunoassays to date (67). Matched antibody pairs are often unavailable for novel proteins, and the cross-reactivity of the detection antibodies limits the degree of multiplexing. Also, with each additional analyte added, it can be time consuming and costly to select capture and detection antibodies with high sensitivity and low cross-reactivity for quantitative multiplexing analysis.
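Two of the performance criteria above, the CV of replicate measurements and the conversion of signal to concentration through a standard curve, can be sketched in a few lines. The example assumes a straight-line relationship between log signal and log concentration within the working range, which is a simplification; immunoassay curves are often fitted with sigmoidal models instead. All function names are illustrative.

```python
import math
from statistics import mean, stdev


def percent_cv(replicates):
    """Coefficient of variation of replicate signals, as a percentage."""
    return 100.0 * stdev(replicates) / mean(replicates)


def fit_loglog_curve(concentrations, signals):
    """Least-squares line through (log10 concentration, log10 signal) standards."""
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(s) for s in signals]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def concentration_from_signal(signal, slope, intercept):
    """Invert the fitted line to estimate analyte concentration from a signal."""
    return 10 ** ((math.log10(signal) - intercept) / slope)
```

Signals that fall outside the range covered by the standards should not be converted this way, since the fitted line says nothing about the assay's behavior beyond the calibrated range.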
In addition to antibody microarrays, other analytical microarrays have been developed, including microarrays for profiling antibodies in the patient's serum and plasma, essentially the reciprocal of that described above. Joos and colleagues used 18 diagnostic markers for autoimmune diseases to form an autoantigen microarray and screened for antigen-antibody interactions (68).
Functional Protein Microarrays
Functional protein arrays are composed of sets of proteins or even an entire proteome arranged in an orderly fashion on a small surface. Unlike the antibody microarrays (mainly developed for diagnostics and profiling of expression), protein microarrays have enormous potential in assaying for a wide range of biochemical activities, as well as drug and drug target identification. Protein microarrays can be used to study protein-protein, protein-ligand, protein-DNA, protein-RNA, protein-drug, and enzyme-substrate interactions. For example, Mrksich has demonstrated the potential of using protein microarrays to conduct enzymatic assays to identify downstream targets of kinases (69). Protein microarrays will be the approach of choice to close the information gap between genomics and proteomics in the development of new markers for the early detection, diagnosis, and classification of diseases, as well as drug development and drug target identification.
The first great obstacle to overcome is the purification of a large number of recombinant proteins in a high-throughput manner. Many research groups and companies have contributed tremendous effort to developing high-throughput protein purification methods, and recombinant proteins have been purified from E. coli, yeast, insects, and humans (70–74). Lueking and coworkers cloned cDNAs from human fetal brain tissues as C-terminal Hisx6-tagged fusions (75). In a later report, Braun and colleagues created a system (FLEXP) that performs every step from cDNA cloning to protein production from E. coli in a fully automated fashion (73). However, because eukaryotic proteins expressed in prokaryotic systems are not post-translationally modified, Zhu and coworkers developed a high-throughput protein purification method from the budding yeast (71). For the same reason, Albala and associates chose 72 unique human cDNA clones to create an array of recombinant baculoviruses, from which 42% of the clones produced soluble fusion proteins in a 96-well format (74). Functional protein chips, like traditional assays performed in microtiter plates (76), are suitable for a wide variety of biochemical analyses. Unlike microtiter plates, however, they are much more amenable to high-throughput studies and use small amounts of reagents. Studies analyzing large sets of proteins have recently been performed. Using a nanowell chip mounted on glass slides, Zhu and colleagues analyzed the activity of 119 yeast kinases for 17 different substrates (77).
FUTURE PERSPECTIVES AND CHALLENGES
Proteomics complements genomics-based approaches, providing additional information but presenting different technical challenges. Because no protein equivalent of PCR exists for amplification of low-abundance proteins, detection methods with a large dynamic range are needed. The folded structures are important for the protein properties, but generic methods are difficult to design and apply. The analysis of PTMs provides another challenge. Certain technological processes, particularly protein separation and analysis, are difficult to automate.
ACKNOWLEDGMENTS
The authors acknowledge Shijun Sheng for providing courtesy figures. They also thank Lesley A. Kane for her critical reading of the manuscript.
FOOTNOTES
Funding sources: NHLBI Proteomics Initiative (contract N01-HV-28180) (J.E.V.E., Y.G.).
Conflict of Interest Statement: None of the authors has a financial relationship with a commercial entity that has an interest in the subject of this manuscript.
(Received in original form August 31, 2006; accepted in final form September 26, 2006)
REFERENCES