Assembly of fingerprint contigs: parallelized FPC.

SUMMARY: One of the more common uses of the program FingerPrint Contigs (FPC) is to assemble random restriction digest 'fingerprints' of overlapping genomic clones into contigs. To speed up contig assembly from large fingerprint databases, we have adapted FPC so that it can run in parallel on multiple processors and servers. The current version of 'parallelized FPC' has been used in our laboratory to assemble mammalian BAC fingerprint databases, each containing more than 300,000 BAC fingerprints. AVAILABILITY: This parallelized version of FPC is available under the GNU GPL licence and can be downloaded from ftp://ftp.bcgsc.bc.ca/pub/fpcd.
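The all-against-all clone comparison is what makes large fingerprint databases slow, and it is also what splits naturally across processors. The sketch below is a minimal illustration under stated assumptions, not FPC's code: FPC scores clone overlaps with a probability-based measure (the Sulston score), whereas this sketch only counts shared bands within a size tolerance. The function names, the `tol` value and the round-robin chunking are hypothetical choices made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def shared_bands(a, b, tol):
    """Count bands of `a` that pair one-to-one with a band of `b` whose
    size differs by at most `tol` (greedy pass over sorted band lists)."""
    a, b = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


def compare_chunk(fingerprints, pairs, tol):
    """Score one chunk of clone pairs; each worker receives one chunk."""
    return [(x, y, shared_bands(fingerprints[x], fingerprints[y], tol))
            for x, y in pairs]


def all_vs_all(fingerprints, tol, workers=4):
    """Hypothetical driver: split all clone pairs round-robin into
    `workers` chunks and score the chunks concurrently."""
    pairs = list(combinations(sorted(fingerprints), 2))
    chunks = [pairs[k::workers] for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(compare_chunk, [fingerprints] * workers,
                           chunks, [tol] * workers)
        return sorted(r for chunk in results for r in chunk)
```

For example, `all_vs_all({"bac1": [100, 250, 400], "bac2": [101, 252, 900], "bac3": [500, 600]}, tol=2)` reports two shared bands for `bac1`/`bac2` and none for the other pairs. Threads keep the sketch simple. For CPU-bound pure-Python scoring, swap in `ProcessPoolExecutor` or separate servers, which matches the multi-processor, multi-server setup described above.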

CASA: a server for the critical assessment of protein sequence alignment accuracy.

SUMMARY: A public server for evaluating the accuracy of protein sequence alignment methods is presented. CASA is an implementation of the alignment accuracy benchmark presented by Sauder et al. (Proteins, 40, 6-22, 2000). The benchmark currently contains 39,321 pairwise protein structure alignments produced with the CE program from SCOP domain definitions. The server produces graphical and tabular comparisons of the accuracy of a user's input sequence alignments with that of other commonly used programs, such as BLAST, PSI-BLAST, Clustal W and SAM-T99. AVAILABILITY: The server is located at http://capb.dbi.udel.edu/casa.
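One common way to score a sequence alignment against a structure-based reference is to count the fraction of residue pairs in the reference that the test alignment reproduces. The sketch below assumes that definition and gapped pairwise alignments written with `-`. It may differ in detail from the exact measures CASA reports, and the function names are hypothetical.

```python
def aligned_pairs(row1, row2):
    """Return the set of (i, j) residue-index pairs aligned in a gapped
    pairwise alignment; '-' marks a gap."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # residue indices in the two ungapped sequences
    for c1, c2 in zip(row1, row2):
        if c1 != "-" and c2 != "-":
            pairs.add((i, j))
        if c1 != "-":
            i += 1
        if c2 != "-":
            j += 1
    return pairs


def alignment_accuracy(test, reference):
    """Fraction of reference (structural) residue pairs that the test
    alignment also aligns; both arguments are (row1, row2) tuples."""
    ref = aligned_pairs(*reference)
    if not ref:
        return 0.0
    return len(aligned_pairs(*test) & ref) / len(ref)
```

With reference `("ACDEF", "ACD-F")` and test `("ACDEF", "AC-DF")`, three of the four reference pairs are reproduced, giving an accuracy of 0.75.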

BetaTPred: prediction of beta-turns in a protein using statistical algorithms.

MOTIVATION: Beta-turns play an important role from both a structural and a functional point of view. They are the most common type of non-repetitive structure in proteins and comprise, on average, 25% of residues. Numerous methods have been developed in the past to predict beta-turns in a protein, most of them based on statistical approaches. To exploit the full potential of these methods, a web server is needed. RESULTS: This paper describes a web server, BetaTPred, developed for predicting beta-turns in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows users to predict different types of beta-turns, e.g. types I, I', II, II', VI, VIII and non-specific. The server helps users predict consensus beta-turns in a protein. AVAILABILITY: The server is accessible from http://imtech.res.in/raghava/betatpred/
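A consensus over several statistical predictors can be as simple as a per-residue vote. The sketch below assumes each method's output is a string with `t` for turn and `n` for non-turn, and it uses a strict-majority rule by default. Both the encoding and the voting rule are illustrative assumptions, not BetaTPred's documented procedure, and `consensus_turns` is a hypothetical name.

```python
def consensus_turns(predictions, min_votes=None):
    """Combine per-residue turn predictions from several methods.

    predictions: equal-length strings, 't' = turn, 'n' = non-turn.
    min_votes:   votes needed to call a turn (default: strict majority).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("predictions must cover the same residues")
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    result = []
    for column in zip(*predictions):  # one column per residue
        votes = sum(1 for call in column if call == "t")
        result.append("t" if votes >= min_votes else "n")
    return "".join(result)
```

For three methods predicting `"ttnnn"`, `"tnnnt"` and `"ttnnt"`, the majority consensus is `"ttnnt"`. Raising `min_votes` trades sensitivity for fewer false turn calls.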

TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing.

SUMMARY: TREE-PUZZLE is a program package for quartet-based maximum-likelihood phylogenetic analysis (formerly PUZZLE, Strimmer and von Haeseler, Mol. Biol. Evol., 13, 964-969, 1996) that provides methods for the reconstruction, comparison and testing of trees and models on DNA as well as protein sequences. To reduce waiting times for larger datasets, the tree reconstruction part of the software has been parallelized using message passing, so that it runs on clusters of workstations as well as on parallel computers. AVAILABILITY: http://www.tree-puzzle.de. The program is written in ANSI C. TREE-PUZZLE can be run on UNIX, Windows and Mac systems, including Mac OS X. To run the parallel version of PUZZLE, a Message Passing Interface (MPI) library must be installed on the system. Free MPI implementations are available on the Web (cf. http://www.lam-mpi.org/mpi/implementations/).
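Quartet-based methods evaluate every 4-taxon subset, and each subset has exactly three possible unrooted topologies. With n taxa there are C(n, 4) quartets, so the work grows quickly with dataset size. That growth is what motivates distributing it. The sketch below only enumerates the quartets and their topologies and deals them round-robin to hypothetical workers (ranks). The maximum-likelihood evaluation of each topology is omitted, and this is not TREE-PUZZLE's actual MPI scheduling.

```python
from itertools import combinations


def quartet_topologies(quartet):
    """The three unrooted topologies of a quartet, written as splits."""
    a, b, c, d = quartet
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def assign_quartets(taxa, n_workers):
    """Deal all C(n, 4) quartets round-robin to n_workers buckets,
    e.g. one bucket per MPI rank (hypothetical scheme)."""
    buckets = [[] for _ in range(n_workers)]
    for k, quartet in enumerate(combinations(taxa, 4)):
        buckets[k % n_workers].append(quartet)
    return buckets
```

Six taxa give 15 quartets, which four workers receive as buckets of 4, 4, 4 and 3. Each worker would then score the three topologies of every quartet in its bucket.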

Sequence complexity profiles of prokaryotic genomic sequences: a fast algorithm for calculating linguistic complexity.

MOTIVATION: One of the major features of genomic DNA sequences, distinguishing them from texts in most spoken or artificial languages, is their high repetitiveness. Variation in the repetitiveness of genomic texts reflects the presence and density of different biologically important messages. Thus, a deviation from the expected number of repeats in either direction indicates the possible presence of a biological signal. Linguistic complexity reflects the repetitiveness of a genomic text, and potential regulatory sites may be discovered by constructing typical patterns of complexity distribution. RESULTS: We developed software for fast calculation of the linguistic complexity of DNA sequences. Our program uses suffix trees to count the distinct subwords present in genomic sequences, which allows linguistic complexity to be calculated in time linear in genome size. The measure of linguistic complexity was applied to the complete genome of Haemophilus influenzae. Complexity maps along the entire genome were obtained using sliding windows of 40, 100 and 2000 nucleotides. This approach provided an efficient way to detect simple sequence repeats in this genome. In addition, local profiles of complexity distribution around translation start sites were constructed for 21 complete prokaryotic genomes. We hypothesize that complexity profiles correspond to evolutionary relationships between organisms. We found principal differences between the profiles of GC-rich and other (non-GC-rich) genomes. We also found characteristic differences in the profiles of AT-rich genomes, which probably reflect species-specific variation in translational regulation. AVAILABILITY: The program is available upon request from Alexander Bolshoy or at http://csweb.haifa.ac.il/library/#complex.
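A common formulation of linguistic complexity divides the number of distinct substrings (of every length k) in a sequence by the maximum number possible. For each k that maximum is min(4^k, N − k + 1) for DNA of length N. The sketch below assumes that formulation. Unlike the suffix-tree program described above, it counts substrings with Python sets, so it is only practical for short windows such as 40 or 100 nt, not for linear-time whole-genome runs. The function names are hypothetical.

```python
def linguistic_complexity(seq, alphabet_size=4):
    """Distinct substrings observed / maximum possible, over all lengths k.

    Naive set-based counting (roughly cubic in len(seq)); a suffix tree
    makes the same count linear in sequence length.
    """
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    observed = possible = 0
    for k in range(1, n + 1):
        observed += len({seq[i:i + k] for i in range(n - k + 1)})
        possible += min(alphabet_size ** k, n - k + 1)
    return observed / possible


def complexity_profile(seq, window, step=1):
    """Linguistic complexity of each sliding window along `seq`."""
    return [linguistic_complexity(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]
```

A window with no repeated substrings, such as `"ACGT"`, scores 1.0. The homopolymer `"AAAA"` scores 0.4 (4 distinct substrings out of 10 possible). This dip in complexity is how simple sequence repeats stand out in a sliding-window map.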

BeoBLAST: distributed BLAST and PSI-BLAST on a Beowulf cluster.

BeoBLAST is an integrated software package that handles user requests and distributes BLAST and PSI-BLAST searches to the nodes of a Beowulf cluster, providing a simple way to build a scalable BLAST system on relatively inexpensive computer clusters. Additionally, BeoBLAST offers a number of novel search features through its web interface, including the ability to search multiple databases with multiple queries simultaneously, and the ability to start a search using the PSSM generated by a previous PSI-BLAST search on a different database. The underlying system can also handle automated querying for high-throughput work. AVAILABILITY: Source code is available under the GNU General Public License at http://bioinformatics.fccc.edu/
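Searching multiple databases with multiple queries amounts to one independent job per (query, database) pair, which is what makes the workload easy to spread over cluster nodes. The sketch below builds those jobs and assigns them to nodes round-robin. The node and database names are hypothetical, and BeoBLAST's real dispatcher may balance load differently (for example, by sending jobs to idle nodes).

```python
from itertools import product


def plan_jobs(queries, databases, nodes):
    """Create one job per (query, database) pair and assign jobs to
    cluster nodes round-robin (illustrative scheduling only)."""
    if not nodes:
        raise ValueError("need at least one node")
    schedule = {node: [] for node in nodes}
    for k, job in enumerate(product(queries, databases)):
        schedule[nodes[k % len(nodes)]].append(job)
    return schedule
```

Two queries against three databases yield six jobs. On a two-node cluster, each node receives three of them.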

Support vector regression applied to the determination of the developmental age of a Drosophila embryo from its segmentation gene expression patterns.

MOTIVATION: In this paper we address the problem of determining the developmental age of a Drosophila embryo from its segmentation gene expression patterns. RESULTS: By applying support vector regression, we have developed a fast method for automated staging of an embryo on the basis of its gene expression pattern. Support vector regression is a statistical method for constructing regression functions of arbitrary type from a set of training data. The training set is composed of embryos whose precise developmental age was determined by measuring the degree of membrane invagination. Testing the regression on the training set showed good prediction accuracy. The optimal regression function was then used to predict the gene-expression-based age of embryos whose precise age had not been measured from membrane morphology. Moreover, we show that the same prediction accuracy can be achieved when the dimensionality of the feature vector is reduced by factor analysis. This data reduction allowed us to avoid over-fitting and to increase the efficiency of the algorithm.
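What distinguishes support vector regression is its ε-insensitive loss. Errors smaller than ε cost nothing, and larger errors cost linearly. For a linear model f(x) = w·x + b, the standard primal objective is ½‖w‖² + C Σ max(0, |yᵢ − f(xᵢ)| − ε). The sketch below only evaluates that objective and does not train a model. The paper's kernel, C, ε and feature set are not given here, so the linear form and all parameter values are assumptions for illustration.

```python
def eps_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def linear_svr_objective(w, b, X, y, C, epsilon):
    """Primal linear-SVR objective: 0.5*||w||^2 + C * sum of
    epsilon-insensitive losses over the training set."""
    regulariser = 0.5 * sum(wi * wi for wi in w)
    loss = sum(
        eps_insensitive_loss(yi, sum(wi * xi for wi, xi in zip(w, x)) + b,
                             epsilon)
        for x, yi in zip(X, y))
    return regulariser + C * loss
```

With w = [1.0], b = 0, inputs [[1.0], [2.0]], targets [1.0, 2.5], C = 1 and ε = 0.1, the first point lies inside the tube and the second costs 0.4, so the objective is 0.5 + 0.4 = 0.9. A solver (for example, a kernel SVR library) would search for the w and b that minimise this quantity.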

Improving gene recognition accuracy by combining predictions from two gene-finding programs.

MOTIVATION: Despite constant improvements in prediction accuracy, gene-finding programs are still unable to provide automatic gene discovery with the desired correctness. Current programs can correctly identify up to 75% of exons, and fewer than 50% of predicted gene structures correspond to actual genes. New approaches to computational gene finding are clearly needed. RESULTS: In this paper we have explored the benefits of combining predictions from existing gene prediction programs. We have introduced three novel methods for combining predictions from the programs Genscan and HMMgene. The methods primarily aim to improve exon-level accuracy by identifying more probable exon boundaries and by eliminating false-positive exon predictions. This approach improves accuracy at both the nucleotide and the exon level, especially the latter, where the average improvement on the newly assembled dataset is 7.9% compared with the best result obtained by either Genscan or HMMgene. When tested on a long multi-gene genomic sequence, our method that maintains reading-frame consistency improved nucleotide-level specificity by 21.0% and exon-level specificity by 32.5% compared with the best result obtained by either of the two programs individually. AVAILABILITY: The scripts implementing our methods are available from http://www.cs.ubc.ca/labs/beta/genefinding/
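Exon-level accuracy is conventionally measured by exact exon matches. Sensitivity is the fraction of real exons predicted with both boundaries correct, and specificity is the fraction of predicted exons that are real. The sketch below uses that convention with exons represented as hypothetical `(start, end, strand)` tuples. It shows the simplest possible combination rule: keep only exons that both programs predict identically. This rule illustrates how agreement removes false positives. It is not one of the paper's three methods, which also weigh boundary probabilities and reading-frame consistency.

```python
def combine_by_agreement(pred_a, pred_b):
    """Keep only exons predicted identically by both programs."""
    return pred_a & pred_b


def exon_accuracy(predicted, actual):
    """Exon-level (sensitivity, specificity) using exact boundary matches."""
    correct = len(predicted & actual)
    sensitivity = correct / len(actual) if actual else 0.0
    specificity = correct / len(predicted) if predicted else 0.0
    return sensitivity, specificity
```

In a toy case, suppose one program predicts all three real exons plus one false exon (specificity 0.75). The other program shifts one boundary and adds a false exon. Their agreement keeps two exons, both real: specificity rises to 1.0, but sensitivity falls to 2/3. Keeping a higher specificity without losing sensitivity is what the more refined combination methods aim for.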