Identification and characterization of subfamily-specific signatures in a large protein superfamily by a hidden Markov model approach.
BACKGROUND: Most profile and motif databases strive to classify protein sequences into a broad spectrum of protein families. The next step of such database studies should include the development of classification systems capable of distinguishing between subfamilies within a structurally and functionally diverse superfamily. This would be helpful in elucidating sequence-structure-function relationships of proteins. RESULTS: Here, we present a method to diagnose sequences into subfamilies by employing hidden Markov models (HMMs) to find windows of residues that are distinct among subfamilies (called signatures). The method starts with a multiple sequence alignment (MSA) of the subfamily. Then, we build an HMM database representing all sliding windows of the MSA of a fixed size. Finally, we construct an HMM histogram of the matches of each sliding window in the entire superfamily. To illustrate the efficacy of the method, we have applied the analysis to find subfamily signatures in two well-studied superfamilies: the cadherin and the EF-hand protein superfamilies. As a corollary, the HMM histograms of the analyzed subfamilies revealed information about their Ca2+ binding sites and loops. CONCLUSIONS: The method is used to create HMM databases to diagnose subfamilies of protein superfamilies that complement broad profile and motif databases such as BLOCKS, PROSITE, Pfam, SMART, PRINTS and InterPro.
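The sliding-window step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: instead of building an HMM per window, it uses the column-wise consensus of each window and a simple identity threshold as a stand-in for HMM scoring. The function names, the `min_identity` parameter, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def sliding_windows(msa, size):
    """Yield (start, rows) for every fixed-size column window of an alignment."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    for start in range(length - size + 1):
        yield start, [row[start:start + size] for row in msa]


def consensus(window_rows):
    """Most common residue in each column (a crude stand-in for a window HMM)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*window_rows))


def matches(seq, motif, min_identity):
    """True if any substring of seq agrees with motif at >= min_identity of positions."""
    k = len(motif)
    for i in range(len(seq) - k + 1):
        same = sum(a == b for a, b in zip(seq[i:i + k], motif))
        if same / k >= min_identity:
            return True
    return False


def window_histogram(msa, superfamily, size, min_identity=0.8):
    """Count, for each window start, how many superfamily sequences match it."""
    return {
        start: sum(matches(seq, consensus(rows), min_identity) for seq in superfamily)
        for start, rows in sliding_windows(msa, size)
    }


# Toy data, invented for this example.
subfamily_msa = ["ACDEF", "ACDEF", "ACDQF"]
superfamily = ["XXACDXX", "CDEFGH", "ZZZZ", "ACDEF"]
histogram = window_histogram(subfamily_msa, superfamily, size=3, min_identity=1.0)
```

In this simplified picture, a window that matches the subfamily's own members but few other sequences in the superfamily would be a candidate signature.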
Protein interactions: two methods for assessment of the reliability of high throughput observations.
High throughput methods for detecting protein interactions require assessment of their accuracy. We present two forms of computational assessment. The first method is the expression profile reliability (EPR) index. The EPR index estimates the biologically relevant fraction of protein interactions detected in a high throughput screen. It does so by comparing the RNA expression profiles for the proteins whose interactions are found in the screen with expression profiles for known interacting and non-interacting pairs of proteins. The second form of assessment is the paralogous verification method (PVM). This method judges an interaction likely if the putatively interacting pair has paralogs that also interact. In contrast to the EPR index, which evaluates datasets of interactions, PVM scores individual interactions. On a test set, PVM correctly identifies 40% of true interactions with a false positive rate of approximately 1%. EPR and PVM were applied to the Database of Interacting Proteins (DIP), a large and diverse collection of protein-protein interactions that contains over 8000 Saccharomyces cerevisiae pairwise protein interactions. Using these two methods, we estimate that approximately 50% of them are reliable, and with the aid of PVM we confidently identify 3003 of them. Web servers for both the PVM and EPR methods are available on the DIP website (dip.doe-mbi.ucla.edu/Services.cgi).
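The idea behind PVM, as stated in this abstract, is that an interaction gains support when paralogs of the two partners also interact. The Python sketch below counts such supporting pairs. It is a simplified illustration only: the data structures, the function name, and the protein labels are hypothetical, and the published method's scoring and thresholds are not reproduced here.

```python
def paralogous_support(pair, interactions, paralogs):
    """Count interacting pairs formed by a paralog of each partner in `pair`.

    pair         -- tuple (a, b) of protein identifiers
    interactions -- iterable of (x, y) tuples of observed interactions
    paralogs     -- dict mapping a protein to an iterable of its paralogs
    """
    a, b = pair
    # Store interactions as unordered pairs so (x, y) and (y, x) are equivalent.
    known = {frozenset(p) for p in interactions}
    support = 0
    for pa in paralogs.get(a, ()):
        for pb in paralogs.get(b, ()):
            if frozenset((pa, pb)) in known:
                support += 1
    return support


# Hypothetical toy data.
interactions = [("A1", "B1"), ("A2", "B2"), ("A3", "C1")]
paralogs = {"A1": ["A2", "A3"], "B1": ["B2"]}

supported = paralogous_support(("A1", "B1"), interactions, paralogs)
unsupported = paralogous_support(("A3", "C1"), interactions, paralogs)
```

Here the pair ("A1", "B1") is supported because the paralogs "A2" and "B2" are also observed to interact, whereas ("A3", "C1") has no paralog information and receives no support.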
Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics.
Quantitative proteomics has traditionally been performed by two-dimensional gel electrophoresis, but recently, mass spectrometric methods based on stable isotope quantitation have shown great promise for the simultaneous and automated identification and quantitation of complex protein mixtures. Here we describe a method, termed SILAC, for stable isotope labeling by amino acids in cell culture, for the in vivo incorporation of specific amino acids into all mammalian proteins. Mammalian cell lines are grown in media lacking a standard essential amino acid but supplemented with a non-radioactive, isotopically labeled form of that amino acid, in this case deuterated leucine (Leu-d3). We find that growth of cells maintained in these media is no different from growth in normal media as evidenced by cell morphology, doubling time, and ability to differentiate. Complete incorporation of Leu-d3 occurred after five doublings in the cell lines and proteins studied. Protein populations from experimental and control samples are mixed directly after harvesting, and mass spectrometric identification is straightforward as every leucine-containing peptide incorporates either all normal leucine or all Leu-d3. We have applied this technique to the relative quantitation of changes in protein expression during the process of muscle cell differentiation. Proteins that were found to be up-regulated during this process include glyceraldehyde-3-phosphate dehydrogenase, fibronectin, and pyruvate kinase M2. SILAC is a simple, inexpensive, and accurate procedure that can be used as a quantitative proteomic approach in any cell culture system.
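Because every leucine in a labeled peptide carries three deuterium atoms in place of three hydrogens, the heavy and light forms of a peptide are separated by a predictable mass difference. The Python sketch below computes that spacing and a simple heavy-to-light intensity ratio. The function names and the example peptide are illustrative assumptions, and which sample carries the heavy label depends on the experimental design.

```python
# Monoisotopic masses in daltons.
MASS_H = 1.007825
MASS_D = 2.014102
LEU_D3_SHIFT = 3 * (MASS_D - MASS_H)  # about 3.0188 Da per labeled leucine


def leu_d3_mz_shift(peptide, charge):
    """Expected m/z spacing between light and Leu-d3 forms of a peptide."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return peptide.count("L") * LEU_D3_SHIFT / charge


def heavy_to_light_ratio(heavy_intensity, light_intensity):
    """Relative abundance estimated from paired peak intensities."""
    if light_intensity <= 0:
        raise ValueError("light intensity must be positive")
    return heavy_intensity / light_intensity


# Hypothetical peptide with two leucines, observed as a doubly charged ion.
shift = leu_d3_mz_shift("LLK", charge=2)
ratio = heavy_to_light_ratio(300.0, 100.0)
```

A peptide without leucine shows no shift, so it cannot be quantified with this particular label.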
Biophysical characterization of proteins in the post-genomic era of proteomics.
Proteomics focuses on the high throughput study of the expression, structure, interactions, and, to some extent, function of large numbers of proteins. A true understanding of the functioning of a living cell also requires a quantitative description of the stoichiometry, kinetics, and energetics of each protein complex in a cellular pathway. Classical molecular biophysical studies contribute to understanding of these detailed properties of proteins on a smaller scale than does proteomics in that individual proteins are usually studied. This perspective article deals with the role of biophysical methods in the study of proteins in the proteomic era. Several important physical biochemical methods are discussed briefly and critiqued from the standpoint of information content and data acquisition. The focus is on conformational changes and macromolecular assembly, the utility of dynamic and static structural data, and the necessity to combine experimental approaches to obtain a full functional description. The conclusions are that biophysical information on proteins is a useful adjunct to "standard" proteomic methods, that data can be obtained by high throughput technology in some instances, but that hypothesis-driven experimentation may frequently be required.
A proteomic analysis of human cilia: identification of novel components.
Cilia play an essential role in protecting the respiratory tract by providing the force necessary for mucociliary clearance. Although the major structural components of human cilia have been described, a complete understanding of cilia function and regulation will require identification and characterization of all ciliary components. Estimates from studies of Chlamydomonas flagella predict that an axoneme contains ≥250 proteins. To identify all the components of human cilia, we have begun a comprehensive proteomic analysis of isolated ciliary axonemes. Analysis by two-dimensional (2-D) PAGE resulted in a highly reproducible 2-D map consisting of over 240 well resolved components. Individual protein spots were digested with trypsin and sequenced using liquid chromatography/tandem mass spectrometry (LC/MS/MS). Peptide matches were obtained to 38 potential ciliary proteins by this approach. To identify ciliary components not resolved by 2-D PAGE, axonemal proteins were separated on a one-dimensional gel. The gel lane was divided into 45 individual slices, each of which was analyzed by LC/MS/MS. This experiment resulted in peptide matches to an additional 110 proteins. In a third approach, preparations of isolated axonemes were digested with Lys-C, and the resulting peptides were analyzed directly by LC/MS/MS or by multidimensional LC/MS/MS, leading to the identification of a further 66 proteins. Each of the four approaches resulted in the identification of a subset of the proteins present. In total, sequence data were obtained on over 1400 peptides, and over 200 potential axonemal proteins were identified. Peptide matches were also obtained to over 200 human expressed sequence tags. As an approach to validate the mass spectrometry results, additional studies examined the expression of several identified proteins (annexin I, sperm protein Sp17, retinitis pigmentosa protein RP1) in cilia or ciliated cells. These studies represent the first proteomic analysis of the human ciliary axoneme and have identified many potentially novel components of this complex organelle.
A proteomics approach for the identification of DNA binding activities observed in the electrophoretic mobility shift assay.
Transcription factors lie at the center of gene regulation, and their identification is crucial to the understanding of transcription and gene expression. Traditionally, the isolation and identification of transcription factors has been a long and laborious task. We present here a novel method for the identification of DNA-binding proteins seen in electrophoretic mobility shift assay (EMSA) using the power of two-dimensional electrophoresis coupled with mass spectrometry. By coupling SDS-PAGE and isoelectric focusing to EMSA, the molecular mass and pI of a protein complex seen in EMSA were estimated. Candidate proteins were then located on a two-dimensional array at the predetermined pI and molecular mass coordinates and identified by mass spectrometry. We show here the successful isolation of a functionally relevant transcription factor and validate the identity through EMSA supershift analysis.
A proteomics approach to identify proliferating cell nuclear antigen (PCNA)-binding proteins in human cell lysates. Identification of the human CHL12/RFCs2-5 complex as a novel PCNA-binding protein.
Proliferating cell nuclear antigen (PCNA), a eukaryotic DNA replication factor, functions not only as a processivity factor for DNA polymerase delta but also as a binding partner for multiple other factors. To understand its broad significance, we have carried out systematic studies of PCNA-binding proteins by a combination of affinity chromatography and mass spectrometric analyses. We detected more than 20 specific protein bands of various intensities in fractions bound to PCNA-fixed resin from human cell lysates and determined their peptide sequences by liquid chromatography and tandem mass spectrometry. A search with human protein databases identified 12 reported PCNA-binding proteins from both cytoplasmic (S100 lysate) and nuclear extracts with substantial significance and four more solely from the nuclear preparation. CHL12, a factor involved in checkpoint response and chromosome cohesion, was a novel example found in both lysates. Further studies with recombinant proteins demonstrated that CHL12 and small subunits of replication factor C form a complex that interacts with PCNA.
Peptidomics of the larval Drosophila melanogaster central nervous system.
Neuropeptides regulate most, if not all, biological processes in the animal kingdom, but only seven have been isolated and sequenced from Drosophila melanogaster. In analogy with the proteomics technology, where all proteins expressed in a cell or tissue are analyzed, the peptidomics approach aims at the simultaneous identification of the whole peptidome of a cell or tissue, i.e. all expressed peptides with their posttranslational modifications. Using nanoscale liquid chromatography combined with tandem mass spectrometry and database mining, we analyzed the peptidome of the larval Drosophila central nervous system at the amino acid sequence level. We were able to provide biochemical evidence for the presence of 28 neuropeptides using an extract of only 50 larval Drosophila central nervous systems. Eighteen of these peptides are encoded in previously cloned or annotated precursor genes, although not all of them were predicted correctly. Eleven of these peptides had never been purified before. Eight other peptides are entirely novel and are encoded in five different, not yet annotated genes. This neuropeptide expression profiling study also opens perspectives for other eukaryotic model systems, for which genome projects are completed or in progress.