Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins. (9/6804)

We analyzed 10 genome expression data sets by large-scale cross-referencing against broad structural and functional categories. The data sets, generated by different techniques (e.g. SAGE and gene chips), provide various representations of the yeast transcriptome (the set of all yeast genes, weighted by transcript abundance). Our analysis enabled us to determine features more prevalent in the transcriptome than the genome: i.e. those that are common to highly expressed proteins. Starting with the simplest categories, we find that, relative to the genome, the transcriptome is enriched in Ala and Gly and depleted in Asn and very long proteins. We find, furthermore, that protein length and maximum expression level have a roughly inverse relationship. To relate expression level and protein structure, we assigned transmembrane helices and known folds (using PSI-blast) to each protein in the genome; this allowed us to determine that the transcriptome is enriched in mixed alpha-beta structures and depleted in membrane proteins relative to the genome. In particular, some enzymatic folds, such as the TIM barrel and the G3P dehydrogenase fold, are much more prevalent in the transcriptome than the genome, whereas others, such as the protein-kinase and leucine-zipper folds, are depleted. The TIM barrel, in fact, is overwhelmingly the 'top fold' in the transcriptome, while it only ranks fifth in the genome. The most highly enriched functional categories in the transcriptome (based on the MIPS system) are energy production and protein synthesis, while categories such as transcription, transport and signaling are depleted. Furthermore, for a given functional category, transcriptome enrichment varies quite substantially between the different expression data sets, with a variation an order of magnitude larger than for the other categories cross-referenced (e.g. amino acids). One can readily see how the enrichment and depletion of the various functional categories relate directly to those of particular folds.
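
The comparison described above treats the genome as a list in which every gene counts once, and the transcriptome as the same list weighted by transcript abundance. The following is a minimal sketch of that idea, not the authors' code: the function names, the relative-difference definition of enrichment, and the toy gene table are all assumptions made for illustration.

```python
from collections import defaultdict


def category_shares(genes, weights=None):
    """Return the fraction of total weight that falls in each category.

    genes   : dict mapping gene id -> category label (e.g. a fold name)
    weights : dict mapping gene id -> transcript abundance; if None, every
              gene counts once, which corresponds to the genome view
    """
    totals = defaultdict(float)
    for gene, category in genes.items():
        w = 1.0 if weights is None else weights.get(gene, 0.0)
        totals[category] += w
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("total weight is zero")
    return {c: t / grand_total for c, t in totals.items()}


def enrichment(genes, abundance):
    """Relative difference between transcriptome and genome shares.

    Positive values mean the category is enriched in the transcriptome,
    negative values mean it is depleted. This definition is an assumption
    of the sketch, not a formula taken from the abstract.
    """
    genome = category_shares(genes)
    transcriptome = category_shares(genes, abundance)
    return {c: (transcriptome[c] - genome[c]) / genome[c] for c in genome}


# Hypothetical toy data: four genes, two fold categories.
toy_genes = {
    "g1": "TIM barrel",
    "g2": "protein kinase",
    "g3": "protein kinase",
    "g4": "TIM barrel",
}
toy_abundance = {"g1": 60.0, "g2": 5.0, "g3": 5.0, "g4": 30.0}

toy_result = enrichment(toy_genes, toy_abundance)
```

In the toy data both folds make up half of the genome, but the TIM barrel genes carry 90% of the transcripts, so that fold comes out enriched and the kinase fold depleted by the same relative amount.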

Completing the E. coli proteome: a database of gene products characterised since the completion of the genome sequence. (10/6804)

A database collating research on E. coli genes whose products have been characterised subsequent to in silico predictions from the completed genome sequence.

Single allele knock-out of Candida albicans CGT1 leads to unexpected resistance to hygromycin B and elevated temperature. (11/6804)

Almost all eukaryotic mRNAs are capped at their 5'-terminus. Capping is crucial for stability, processing, nuclear export and efficient translation of mRNA. We studied the phenotypic effects elicited by depleting a Candida albicans strain of mRNA 5'-guanylyltransferase (mRNA capping enzyme; CGT1). Construction of a Cgt1-deficient mutant was achieved by URA-blaster-mediated genetic disruption of one allele of the CGT1 gene, which was localized on chromosome III. The resulting heterozygous mutant exhibited an aberrant colony morphology resembling the 'irregular wrinkle' phenotype typically obtained from a normal C. albicans strain upon mild UV treatment. Its level of CGT1 mRNA was reduced two- to fivefold compared to the parental strain. Proteome analysis revealed a large number of differentially expressed proteins, confirming the expected pleiotropic effect of CGT1 disruption. The disrupted strain was significantly more resistant to hygromycin B, an antibiotic which decreases translational fidelity, and showed increased resistance to heat stress. Proteome analysis revealed a 50-fold overexpression of Ef-1alphap and a more than sevenfold overexpression of the cell-wall heat-shock protein Ssa2p. Compared to a reference strain, the cgt1/CGT1 heterozygote was equally virulent for mice and guinea pigs when tested in an intravenous infection model of disseminated candidiasis.

The proteome of Mycoplasma genitalium. Chaps-soluble component. (12/6804)

Mycoplasma genitalium is the smallest member of the class Mollicutes, with a genome size of 580 kb. It has the potential to express 480 gene products, and is therefore considered to be an excellent model to assess: (a) the minimum metabolism required by a free-living cell; and (b) proteomic technologies and the information obtained by proteome analysis. Here, we report on the most complete proteome observed at 73% (expected proteome), and analysed at 33% (reported proteome). The use of four overlapping pH windows in conjunction with SDS/PAGE has allowed 427 distinct proteins to be resolved in association with the exponential growth of M. genitalium. Proof of expression for 201 proteins of sufficient abundance on silver-stained two-dimensional gels was obtained using peptide mass fingerprinting (PMF), of which 158 were identified. The potential for gene product modification in even the simplest known self-replicating organism was quantified at a ratio of 1.22 : 1, more proteins than genes. A reduction in protein expression of 42% was observed for post-exponentially-grown cells. DnaK, GroEL, DNA gyrase, and a cytadherence accessory protein were significantly elevated, while some ribosomal proteins were reduced in relative abundance. The strengths and weaknesses of techniques employed were assessed with respect to the observed and predicted proteome derived from DNA sequence information. Proteomics was shown to provide a perspective into the biochemical and metabolic activities of this organism, beyond that obtainable by sequencing alone.

Comparative genomics of the eukaryotes. (13/6804)

A comparative analysis of the genomes of Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae (and the proteins they are predicted to encode) was undertaken in the context of cellular, developmental, and evolutionary processes. The nonredundant protein sets of flies and worms are similar in size and are only twice that of yeast, but different gene families are expanded in each genome, and the multidomain proteins and signaling pathways of the fly and worm are far more complex than those of yeast. The fly has orthologs to 177 of the 289 human disease genes examined and provides the foundation for rapid analysis of some of the basic processes involved in human disease.

The effect of nucleotide bias upon the composition and prediction of transmembrane helices. (14/6804)

Transmembrane helices are the most readily predictable secondary structure components of proteins. They can be predicted to a high degree of accuracy in a variety of ways. Many of these methods compare new sequence data with the sequence characteristics of known transmembrane domains. However, the known transmembrane sequences are not necessarily representative of a particular organism. We attempt to demonstrate that parameters optimized for the known transmembrane domains are far from optimal when predicting transmembrane regions in a given genome. In particular, we have tested the effect of nucleotide bias upon the composition and hence the prediction characteristics of transmembrane helices. Our analysis shows that nucleotide bias of a genome has a strong and predictable influence upon the occurrences of several of the most important hydrophobic amino acids found within transmembrane helices. Thus, we show that nucleotide bias should be taken into account when determining putative transmembrane domains from sequence data.
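
To make concrete what a composition-based transmembrane predictor with fixed parameters looks like, here is a minimal sketch of a sliding-window hydropathy scan using the Kyte-Doolittle scale. This is a classic textbook approach chosen for illustration, not the method evaluated in the abstract. The window length of 19 and the cutoff of 1.6 are the conventional values for that scale, and the function names are hypothetical. Fixed parameters of this kind are what the abstract argues may need adjusting for genomes with strong nucleotide bias.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq, window=19):
    """Mean hydropathy of every full window along the sequence.

    Raises KeyError if seq contains a letter outside the 20 standard
    amino acids. Returns an empty list if seq is shorter than the window.
    """
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Return merged (start, end) spans whose windows reach the threshold.

    Indices are 0-based and end is exclusive. Overlapping or adjacent
    windows above the threshold are merged into a single span.
    """
    segments = []
    for i, avg in enumerate(hydropathy_windows(seq, window)):
        if avg >= threshold:
            if segments and i <= segments[-1][1]:
                # Extend the previous span instead of starting a new one.
                segments[-1] = (segments[-1][0], i + window)
            else:
                segments.append((i, i + window))
    return segments


# Hypothetical test sequence: a 19-residue leucine stretch flanked by
# charged aspartate residues.
toy_seq = "D" * 20 + "L" * 19 + "D" * 20
toy_segments = candidate_tm_segments(toy_seq)
```

Because the cutoff is applied to raw composition, a genome whose nucleotide bias shifts the usage of hydrophobic residues would shift these window averages as well, which is the kind of effect the abstract describes.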

Proteomic analysis of the Escherichia coli outer membrane. (15/6804)

Outer membrane proteins (OMPs) of Gram-negative bacteria are key molecules that interface the cell with the environment. Traditional biochemical and genetic approaches have yielded a wealth of knowledge relating to the function of OMPs. Nonetheless, with the completion of the Escherichia coli genome sequencing project there is the opportunity to further expand our understanding of the organization, expression and function of the OMPs in this Gram-negative bacterium. In this report we describe a proteomic approach which provides a platform for parallel analysis of OMPs. We propose a rapid method for isolation of bacterial OMPs using carbonate incubation, purification and protein array by two-dimensional electrophoresis, followed by protein identification using mass spectrometry. Applying this method to examine E. coli K-12 cells grown in minimal media we identified 21 out of 26 (80%) of the predicted integral OMPs that are annotated in SWISS-PROT release 37 and predicted to separate within the range of pH 4-7 and molecular mass 10-80 kDa. Five outer membrane lipoproteins were also identified and only minor contamination by nonmembrane proteins was observed. Importantly, this research readily demonstrates that integral OMPs, commonly missing from 2D gel maps, are amenable to separation by two-dimensional electrophoresis. Two of the identified OMPs (YbiL, YeaF) were previously known only from their ORFs, and their identification confirms the cognate genes are transcribed and translated. Furthermore, we show that like the E. coli iron receptors FhuE and FhuA, the expression of YbiL is markedly increased by iron limitation, suggesting a putative role for this protein in iron transport. In an additional demonstration we show the value of parallel protein analysis to document changes in E. coli OMP expression as influenced by culture temperature.

Identification of novel human genes evolutionarily conserved in Caenorhabditis elegans by comparative proteomics. (16/6804)

Modern biomedical research greatly benefits from large-scale genome-sequencing projects ranging from studies of viruses, bacteria, and yeast to multicellular organisms, like Caenorhabditis elegans. Comparative genomic studies offer a vast array of prospects for identification and functional annotation of human ortholog genes. We present a novel comparative proteomic approach for assembling human gene contigs and assisting gene discovery. The C. elegans proteome was used as an alignment template to assist in novel human gene identification from human EST nucleotide databases. Among the available 18,452 C. elegans protein sequences, our results indicate that at least 83% (15,344 sequences) of the C. elegans proteome has human homologous genes, with 7,954 records of C. elegans proteins matching known human gene transcripts. Only 11% or less of the C. elegans proteome contains nematode-specific genes. We found that the remaining 7,390 sequences might lead to discoveries of novel human genes, and over 150 putative full-length human gene transcripts were assembled upon further database analyses. [The sequence data described in this paper have been submitted to the ...]