An effective approach for analyzing "prefinished" genomic sequence data.
Ongoing efforts to sequence the human genome are already generating large amounts of data, with substantial increases anticipated over the next few years. In most cases, a shotgun sequencing strategy is being used, which rapidly yields most of the primary sequence in incompletely assembled sequence contigs ("prefinished" sequence) and more slowly produces the final, completely assembled sequence ("finished" sequence). Thus, in general, prefinished sequence is produced in excess of finished sequence, and this trend is certain to continue and even accelerate over the next few years. Even at a prefinished stage, genomic sequence represents a rich source of important biological information that is of great interest to many investigators. However, analyzing such data is a challenging and daunting task, both because of its sheer volume and because it can change on a day-by-day basis. To facilitate the discovery and characterization of genes and other important elements within prefinished sequence, we have developed an analytical strategy and system that uses readily available software tools in new combinations. Implementation of this strategy for the analysis of prefinished sequence data from human chromosome 7 has demonstrated that this is a convenient, inexpensive, and extensible solution to the problem of analyzing the large amounts of preliminary data being produced by large-scale sequencing efforts. Our approach is accessible to any investigator who wishes to assimilate additional information about particular sequence data en route to developing richer annotations of a finished sequence.
A computational screen for methylation guide snoRNAs in yeast.
Small nucleolar RNAs (snoRNAs) are required for ribose 2'-O-methylation of eukaryotic ribosomal RNA. Many of the genes for this snoRNA family have remained unidentified in Saccharomyces cerevisiae, despite the availability of a complete genome sequence. Probabilistic modeling methods akin to those used in speech recognition and computational linguistics were used to computationally screen the yeast genome and identify 22 methylation guide snoRNAs, snR50 to snR71. Gene disruptions and other experimental characterization confirmed their methylation guide function. In total, 51 of the 55 ribose methylated sites in yeast ribosomal RNA were assigned to 41 different guide snoRNAs.
Randomly amplified polymorphic DNA analysis of clinical and environmental isolates of Vibrio vulnificus and other vibrio species.
Vibrio vulnificus is an estuarine bacterium that is capable of causing a rapidly fatal infection in humans. A randomly amplified polymorphic DNA (RAPD) PCR protocol was developed for use in detecting V. vulnificus, as well as other members of the genus Vibrio. The resulting RAPD profiles were analyzed by using RFLPScan software. This RAPD method clearly differentiated between members of the genus Vibrio and between isolates of V. vulnificus. Each V. vulnificus strain produced a unique band pattern, indicating that the members of this species are genetically quite heterogeneous. All of the vibrios were found to have amplification products whose sizes were within four common molecular weight ranges, while the V. vulnificus strains had an additional two molecular weight range bands in common. All of the V. vulnificus strains isolated from clinical specimens produced an additional band that was only occasionally found in environmental strains; this suggests that, as is the case with the Kanagawa hemolysin of Vibrio parahaemolyticus, the presence of this band may be correlated with the ability of a strain to produce an infection in humans. In addition, band pattern differences were observed between encapsulated and nonencapsulated isogenic morphotypes of the same strain of V. vulnificus.
Melanoma cells present a MAGE-3 epitope to CD4(+) cytotoxic T cells in association with histocompatibility leukocyte antigen DR11.
In this study we used TEPITOPE, a new epitope prediction software, to identify sequence segments on the MAGE-3 protein with promiscuous binding to histocompatibility leukocyte antigen (HLA)-DR molecules. Peptides corresponding to the identified sequences were synthesized and used to propagate CD4(+) T cells from the blood of a healthy donor. CD4(+) T cells strongly recognized MAGE-3(281-295) and, to a lesser extent, MAGE-3(141-155) and MAGE-3(146-160). Moreover, CD4(+) T cells proliferated in the presence of recombinant MAGE-3 after processing and presentation by autologous antigen presenting cells, demonstrating that the MAGE-3 epitopes recognized are naturally processed. CD4(+) T cells, mostly of the T helper 1 type, showed specific lytic activity against HLA-DR11/MAGE-3-positive melanoma cells. Cold target inhibition experiments demonstrated indeed that the CD4(+) T cells recognized MAGE-3(281-295) in association with HLA-DR11 on melanoma cells. This is the first evidence that a tumor-specific shared antigen forms CD4(+) T cell epitopes. Furthermore, we validated the use of algorithms for the prediction of promiscuous CD4(+) T cell epitopes, thus opening the possibility of wide application to other tumor-associated antigens. These results have direct implications for cancer immunotherapy in the design of peptide-based vaccines with tumor-specific CD4(+) T cell epitopes.
Imagene: an integrated computer environment for sequence annotation and analysis.
MOTIVATION: To be fully and efficiently exploited, data coming from sequencing projects together with specific sequence analysis tools need to be integrated within reliable data management systems. Systems designed to manage genome data and analysis tend to give a greater importance either to the data storage or to the methodological aspect, but lack a complete integration of both components. RESULTS: This paper presents a co-operative computer environment (called Imagene(TM)) dedicated to genomic sequence analysis and annotation. Imagene has been developed by using an object-based model. Thanks to this representation, the user can directly manipulate familiar data objects through icons or lists. Imagene also incorporates a solving engine in order to manage analysis tasks. A global task is solved by successive divisions into smaller sub-tasks. During program execution, these sub-tasks are graphically displayed to the user and may be further re-started at any point after task completion. In this sense, Imagene is more transparent to the user than a traditional menu-driven package. Imagene also provides a user interface to display, on the same screen, the results produced by several tasks, together with the capability to annotate these results easily. In its current form, Imagene has been designed particularly for use in microbial sequencing projects. AVAILABILITY: Imagene runs best on SGI (Irix 6.3 or higher) workstations. It is distributed free of charge on a CD-ROM, but requires some Ilog licensed software to run. Some modules also require separate license agreements. Please contact the authors for specific academic conditions and other Unix platforms. CONTACT: Imagene home page: http://wwwabi.snv.jussieu.fr/imagene
Stem Trace: an interactive visual tool for comparative RNA structure analysis.
MOTIVATION: Stem Trace is one of the latest tools available in STRUCTURELAB, an RNA structure analysis computer workbench. The paradigm used in STRUCTURELAB views RNA structure determination as a problem of dealing with a database of a large number of computationally generated structures. Stem Trace provides the capability to analyze this data set in a novel, visually driven, interactive and exploratory way. In addition to providing graphs at a high level of abstraction, it is also connected with complementary visualization tools which provide orthogonal views of the same data, as well as drawing of structures represented by a stem trace. Thus, on top of being an analysis tool, Stem Trace is a graphical user interface to an RNA structural information database. RESULTS: We illustrate Stem Trace's capabilities with several examples of the analysis of RNA folding data performed on 24 strains of HIV-1, HIV-2 and SIV sequences around the HIV dimerization region. This dimer linkage site has been found to play a role in encapsidation, reverse transcription, recombination, and inhibition of translation. Our examples show how Stem Trace elucidates preservation of structures in this region across the various strains of HIV. AVAILABILITY: The program can be made available upon request. It runs on SUN, SGI and DEC (Compaq) Unix workstations.
Bayesian inference on biopolymer models.
MOTIVATION: Most existing bioinformatics methods are limited to making point estimates of one variable, e.g. the optimal alignment, with fixed input values for all other variables, e.g. gap penalties and scoring matrices. While the requirement to specify parameters remains one of the more vexing issues in bioinformatics, it is a reflection of a larger issue: the need to broaden the view on statistical inference in bioinformatics. RESULTS: The assignment of probabilities for all possible values of all unknown variables in a problem in the form of a posterior distribution is the goal of Bayesian inference. Here we show how this goal can be achieved for most bioinformatics methods that use dynamic programming. Specifically, a tutorial style description of a Bayesian inference procedure for segmentation of a sequence based on the heterogeneity in its composition is given. In addition, full Bayesian inference algorithms for sequence alignment are described. AVAILABILITY: Software and a set of transparencies for a tutorial describing these ideas are available at http://www.wadsworth.org/res&res/bioinfo/
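ILLUSTRATION (not from the paper): The segmentation idea in the abstract above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' algorithm: each segment is scored with a symmetric Dirichlet-multinomial marginal likelihood, the prior is uniform over the number of segments and over boundary placements, and a forward dynamic-programming recursion sums over all segmentations to give a posterior over the segment count. The names `log_marginal`, `segment_count_posterior`, `alpha`, and `max_segments` are hypothetical.

```python
import math

ALPHABET = "ACGT"  # assumption: DNA sequences over these four letters


def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of one segment's letter counts under a
    symmetric Dirichlet(alpha) prior on its composition (an assumed model)."""
    total = sum(counts)
    a_sum = alpha * len(counts)
    out = math.lgamma(a_sum) - math.lgamma(a_sum + total)
    for c in counts:
        out += math.lgamma(alpha + c) - math.lgamma(alpha)
    return out


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def segment_count_posterior(seq, max_segments=3):
    """Posterior probability of 1..max_segments segments, summing over
    every placement of the boundaries by dynamic programming."""
    n = len(seq)
    max_segments = min(max_segments, n)
    # prefix[j][x] = number of occurrences of letter x in seq[:j]
    prefix = [[0] * len(ALPHABET)]
    for ch in seq:
        row = prefix[-1][:]
        row[ALPHABET.index(ch)] += 1
        prefix.append(row)

    def seg(i, j):
        counts = [prefix[j][x] - prefix[i][x] for x in range(len(ALPHABET))]
        return log_marginal(counts)

    # forward[k][j] = log of the summed likelihood over all ways to cut
    # seq[:j] into exactly k non-empty segments
    forward = [[-math.inf] * (n + 1) for _ in range(max_segments + 1)]
    forward[0][0] = 0.0
    for k in range(1, max_segments + 1):
        for j in range(k, n + 1):
            forward[k][j] = logsumexp(
                [forward[k - 1][i] + seg(i, j) for i in range(k - 1, j)]
            )

    # Uniform prior over k, and over the C(n-1, k-1) boundary placements given k.
    log_post = [
        forward[k][n] - math.log(math.comb(n - 1, k - 1))
        for k in range(1, max_segments + 1)
    ]
    norm = logsumexp(log_post)
    return [math.exp(v - norm) for v in log_post]
```

Calling `segment_count_posterior("A" * 20 + "C" * 20)` returns a list of three probabilities, one per candidate segment count. The point of the sketch is that the result is a full posterior distribution rather than a single best segmentation, which is the contrast the abstract draws with point estimates.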
E-CELL: software environment for whole-cell simulation.
MOTIVATION: Genome sequencing projects and further systematic functional analyses of complete gene sets are producing an unprecedented mass of molecular information for a wide range of model organisms. This provides us with a detailed account of the cell with which we may begin to build models for simulating intracellular molecular processes to predict the dynamic behavior of living cells. Previous work in biochemical and genetic simulation has isolated well-characterized pathways for detailed analysis, but methods for building integrative models of the cell that incorporate gene regulation, metabolism and signaling have not been established. We, therefore, were motivated to develop a software environment for building such integrative models based on gene sets, and running simulations to conduct experiments in silico. RESULTS: E-CELL, a modeling and simulation environment for biochemical and genetic processes, has been developed. The E-CELL system allows a user to define functions of proteins, protein-protein interactions, protein-DNA interactions, regulation of gene expression and other features of cellular metabolism, as a set of reaction rules. E-CELL simulates cell behavior by numerically integrating the differential equations described implicitly in these reaction rules. The user can observe, through a computer display, dynamic changes in concentrations of proteins, protein complexes and other chemical compounds in the cell. Using this software, we constructed a model of a hypothetical cell with only 127 genes sufficient for transcription, translation, energy production and phospholipid synthesis. Most of the genes are taken from Mycoplasma genitalium, the organism having the smallest known chromosome, whose complete 580 kb genome sequence was determined at TIGR in 1995. We discuss future applications of the E-CELL system with special respect to genome engineering. AVAILABILITY: The E-CELL software is available upon request. 
SUPPLEMENTARY INFORMATION: The complete list of rules of the developed cell model with kinetic parameters can be obtained via our web site at: http://e-cell.org/.
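ILLUSTRATION (not from the paper): The E-CELL abstract above describes simulating a cell by numerically integrating the differential equations implied by a set of reaction rules. The sketch below shows that general idea only. It assumes mass-action kinetics and a fixed-step forward Euler integrator, neither of which is stated in the abstract, and the names `Rule`, `derivatives`, `simulate`, and `EXAMPLE_RULES`, along with all rate constants, are hypothetical rather than part of the E-CELL software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One reaction rule: reactants -> products at a given rate constant."""
    reactants: tuple
    products: tuple
    rate_constant: float


def derivatives(conc, rules):
    """Time derivatives of every concentration implied by the rules,
    assuming mass-action kinetics (an assumption of this sketch)."""
    rates = {species: 0.0 for species in conc}
    for rule in rules:
        flux = rule.rate_constant
        for species in rule.reactants:
            flux *= conc[species]
        for species in rule.reactants:
            rates[species] -= flux
        for species in rule.products:
            rates[species] += flux
    return rates


def simulate(conc, rules, dt, steps):
    """Integrate the rule-defined equations with forward Euler steps."""
    conc = dict(conc)  # do not modify the caller's dictionary
    for _ in range(steps):
        rates = derivatives(conc, rules)
        for species in conc:
            conc[species] += dt * rates[species]
    return conc


# Hypothetical toy system: an enzyme E converting substrate S to product P.
EXAMPLE_RULES = [
    Rule(("E", "S"), ("ES",), 1.0),   # binding
    Rule(("ES",), ("E", "S"), 0.5),   # unbinding
    Rule(("ES",), ("E", "P"), 0.3),   # catalysis
]
```

A call such as `simulate({"E": 1.0, "S": 10.0, "ES": 0.0, "P": 0.0}, EXAMPLE_RULES, dt=0.001, steps=5000)` returns the concentrations after five time units. Forward Euler is used only for brevity; a production simulator would more likely use an adaptive or higher-order integrator.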