Prediction of Saccharomyces cerevisiae protein functional class from functional domain composition. (65/179)

MOTIVATION: A key goal of genomics is to assign function to genes, especially for orphan sequences. RESULTS: We compared each protein sequence with the clustered functional domains in the SBASE database using BLASTP. The resulting representation of a protein is a vector in which each non-zero entry indicates a significant match between the sequence of interest and an SBASE domain. Two machine learning methods, the nearest neighbour algorithm (NNA) and support vector machines, are used to predict protein functional classes from this information. The best results are obtained using the SBASE-A database and the NNA, namely 72% accuracy for 79% coverage. We also tested assigning function by searching for InterPro sequence motifs and by taking the most significant BLAST match within the dataset. We applied the functional domain composition method to predict the functional class of 2018 currently unclassified yeast open reading frames. AVAILABILITY: A program implementing the prediction method with the NNA, called Functional Class Prediction based on Functional Domains (FCPFD), is available by contacting Y.D.Cai at [email protected]
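The domain-composition idea can be illustrated with a minimal sketch. It assumes a binary vector (1 for a significant domain match) and cosine similarity as the nearest-neighbour measure; the abstract specifies neither, and the function names and class labels below are hypothetical, not part of FCPFD.

```python
from math import sqrt

def domain_vector(hit_indices, n_domains):
    """Binary vector: 1 where the protein has a significant match to a domain."""
    vec = [0] * n_domains
    for idx in hit_indices:
        vec[idx] = 1
    return vec

def cosine(u, v):
    """Cosine similarity (an assumed measure; the abstract does not name one)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def predict_class(query, training):
    """Return the label of the most similar training vector.

    Returns None when the query shares no domain with any training protein,
    i.e. the protein is left unclassified.
    """
    best_label, best_sim = None, 0.0
    for vec, label in training:
        sim = cosine(query, vec)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

# Hypothetical toy data: four domains, two functional classes.
training = [
    (domain_vector([0, 1], 4), "metabolism"),
    (domain_vector([2, 3], 4), "transport"),
]
print(predict_class(domain_vector([0], 4), training))
```

Returning `None` for proteins without any domain hit is one way to model the trade-off between accuracy and coverage: such proteins are simply not predicted.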

Protein homology detection using string alignment kernels. (66/179)

MOTIVATION: Remote homology detection between protein sequences is a central problem in computational biology. Discriminative methods involving support vector machines (SVMs) are currently the most effective methods for the problem of superfamily recognition in the Structural Classification Of Proteins (SCOP) database. The performance of SVMs depends critically on the kernel function used to quantify the similarity between sequences. RESULTS: We propose new kernels for strings adapted to biological sequences, which we call local alignment kernels. These kernels measure the similarity between two sequences by summing up scores obtained from local alignments with gaps of the sequences. When tested in combination with SVM on their ability to recognize SCOP superfamilies on a benchmark dataset, the new kernels outperform state-of-the-art methods for remote homology detection. AVAILABILITY: Software and data available upon request.
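Summing over all gapped local alignments can be done with a dynamic program, sketched below. This is a simplified illustration and not necessarily the authors' exact formulation: it uses a match/mismatch score instead of a substitution matrix, and the values of `beta`, `gap_open` and `gap_extend` are arbitrary assumptions.

```python
import math

def local_alignment_kernel(x, y, beta=0.5, match=2.0, mismatch=-1.0,
                           gap_open=-3.0, gap_extend=-1.0):
    """Sum exp(beta * score) over all gapped local alignments of x and y.

    The empty alignment contributes 1. M ends in an aligned pair; X and Y end
    in a gap that consumes a letter of x or y; X2 and Y2 accumulate alignments
    that ended before position (i, j).
    """
    n, m = len(x), len(y)
    eo = math.exp(beta * gap_open)
    ee = math.exp(beta * gap_extend)

    def table():
        return [[0.0] * (m + 1) for _ in range(n + 1)]

    M, X, Y, X2, Y2 = table(), table(), table(), table(), table()
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = math.exp(beta * s) * (
                1.0 + X[i - 1][j - 1] + Y[i - 1][j - 1] + M[i - 1][j - 1])
            X[i][j] = eo * M[i - 1][j] + ee * X[i - 1][j]
            Y[i][j] = eo * (M[i][j - 1] + X[i][j - 1]) + ee * Y[i][j - 1]
            X2[i][j] = M[i - 1][j] + X2[i - 1][j]
            Y2[i][j] = M[i][j - 1] + X2[i][j - 1] + Y2[i][j - 1]
    return 1.0 + X2[n][m] + Y2[n][m] + M[n][m]

print(local_alignment_kernel("HEAGAWGHEE", "PAWHEAE"))
```

Unlike a Smith-Waterman score, which keeps only the best alignment, this quantity adds up the contributions of every alignment, so many weak but consistent similarities can still register.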

Support vector machine classification on the web. (67/179)

The support vector machine (SVM) learning algorithm has been widely applied in bioinformatics. We have developed a simple web interface to our implementation of the SVM algorithm, called Gist. This interface allows novice or occasional users to apply a sophisticated machine learning algorithm easily to their data. More advanced users can download the software and source code for local installation. The availability of these tools will permit more widespread application of this powerful learning algorithm in bioinformatics.

The DISOPRED server for the prediction of protein disorder. (68/179)

Dynamically disordered regions appear to be relatively abundant in eukaryotic proteomes. The DISOPRED server allows users to submit a protein sequence, and returns a probability estimate of each residue in the sequence being disordered. The results are sent in both plain text and graphical formats, and the server can also supply predictions of secondary structure to provide further structural information. AVAILABILITY: The server can be accessed by non-commercial users at http://bioinf.cs.ucl.ac.uk/disopred/

A case study of high-throughput biological data processing on parallel platforms. (69/179)

MOTIVATION: Analysis of large biological data sets using a variety of parallel processor computer architectures is a common task in bioinformatics. The efficiency of the analysis can be significantly improved by properly handling redundancy present in these data combined with taking advantage of the unique features of these compute architectures. RESULTS: We describe a generalized approach to this analysis, but present specific results using the program CEPAR, an efficient implementation of the Combinatorial Extension algorithm in a massively parallel (PAR) mode for finding pairwise protein structure similarities and aligning protein structures from the Protein Data Bank. CEPAR design and implementation are described and results provided for the efficiency of the algorithm when run on a large number of processors. AVAILABILITY: Source code is available by contacting one of the authors.
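The general pattern of removing redundancy before distributing pairwise comparisons might look like the sketch below. It is not CEPAR: redundancy is assumed to mean identical sequences, a thread pool stands in for a massively parallel machine, and `shared_positions` is a hypothetical placeholder for a real structure comparison.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def representatives(chains):
    """Map each distinct sequence to the first chain ID that carries it."""
    reps = {}
    for chain_id, seq in chains.items():
        reps.setdefault(seq, chain_id)
    return reps

def compare_all(chains, compare, workers=4):
    """Compare every pair of non-redundant chains, spreading pairs over workers."""
    reps = representatives(chains)
    pairs = list(combinations(sorted(reps), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda pair: compare(*pair), pairs))
    return {(reps[a], reps[b]): score for (a, b), score in zip(pairs, scores)}

def shared_positions(a, b):
    """Placeholder comparison: number of identical aligned positions."""
    return sum(1 for p, q in zip(a, b) if p == q)

# Hypothetical chain IDs; two of the three chains are identical.
chains = {"1abc_A": "MKVL", "1abc_B": "MKVL", "2xyz_A": "MKTL"}
result = compare_all(chains, shared_positions)
print(result)
```

With three chains there would be three pairs, but collapsing the duplicate leaves a single comparison; the saving grows quadratically with the amount of redundancy.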

WebSIDD: server for predicting stress-induced duplex destabilized (SIDD) sites in superhelical DNA. (70/179)

SUMMARY: WebSIDD is a Web-based service designed to predict locations and extents of stress-induced duplex destabilization (SIDD) that occur in a double-stranded DNA molecule of specified base sequence, on which a specified level of superhelical stress is imposed. The algorithm calculates the approximate equilibrium statistical mechanical distribution of a population of identical molecules among its accessible states. The user inputs the DNA sequence, and the program outputs the calculated transition probability and destabilization energy of each base pair in the sequence. As options, the user can specify the temperature and the level of superhelicity. The values of all structural and energy parameters used in the calculation have been experimentally measured. WebSIDD should prove useful for finding SIDD-susceptible sites in genomic sequences, and correlating their occurrence with locations involved in regulatory and pathological processes. This strategy already has illuminated the roles of SIDD in diverse biological regulatory processes, including transcriptional initiation and termination, and the eukaryotic nuclear scaffold attachments that partition chromosomes into domains. AVAILABILITY: http://orange.genomecenter.ucdavis.edu/benham/sidd/index.html
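At equilibrium, a population distributes over its accessible states according to the Boltzmann distribution, which the following sketch computes for a list of state free energies. The energies and the default temperature are illustrative assumptions; the abstract does not give the energy model WebSIDD uses to enumerate and score states.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def boltzmann_probabilities(energies, temperature=310.0):
    """Equilibrium occupancy of each state given its free energy in kcal/mol."""
    beta = 1.0 / (R_KCAL * temperature)
    g_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-beta * (g - g_min)) for g in energies]
    partition = sum(weights)
    return [w / partition for w in weights]

# Hypothetical free energies of three states of one molecule.
probs = boltzmann_probabilities([0.0, 1.0, 2.5])
print(probs)
```

A per-base-pair transition probability would then follow by summing the probabilities of all states in which that base pair is open.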

Stochastic computing with biomolecular automata. (71/179)

Stochastic computing has a broad range of applications, yet electronic computers realize its basic step, stochastic choice between alternative computation paths, in a cumbersome way. Biomolecular computers use a different computational paradigm and hence afford novel designs. We constructed a stochastic molecular automaton in which stochastic choice is realized by means of competition between alternative biochemical pathways, and choice probabilities are programmed by the relative molar concentrations of the software molecules coding for the alternatives. Programmable and autonomous stochastic molecular automata have been shown to perform direct analysis of disease-related molecular indicators in vitro and may have the potential to provide in situ medical diagnosis and cure.
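A software analogue of this design is a finite automaton whose competing transitions are chosen with probabilities set by relative concentrations. The sketch below assumes, as an idealization, that the probability of a transition equals its share of the total concentration; the state names and concentration values are hypothetical.

```python
import random

def choice_probabilities(concentrations):
    """Convert the concentrations of competing molecules into probabilities."""
    total = sum(concentrations.values())
    return {name: c / total for name, c in concentrations.items()}

def run_automaton(symbols, transitions, start, rng):
    """Process the input, picking each next state stochastically.

    transitions maps (state, symbol) to {next_state: concentration}.
    """
    state = start
    for symbol in symbols:
        probs = choice_probabilities(transitions[(state, symbol)])
        state = rng.choices(list(probs), weights=list(probs.values()))[0]
    return state

# Two competing transitions from S0 on symbol "a", mixed at a 3:1 ratio.
transitions = {
    ("S0", "a"): {"S0": 3.0, "S1": 1.0},
    ("S1", "a"): {"S1": 1.0},
}
print(choice_probabilities(transitions[("S0", "a")]))
print(run_automaton("aaa", transitions, "S0", random.Random(0)))
```

Changing the program's behaviour then amounts to changing the mixing ratio, without touching the transition rules themselves.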

Exploring protein fold space by secondary structure prediction using data distribution method on Grid platform. (72/179)

MOTIVATION: Because the newly developed Grid platform is considered a powerful tool for sharing resources over the Internet, it is of interest to demonstrate an efficient, low-cost methodology for processing massive biological data in Grid environments. This paper presents an efficient and economical Grid-based method to predict the secondary structures of all proteins in a given organism by processing a large amount of protein sequence data simultaneously, a task that normally requires a long computation time under sequential execution. From the prediction results, a genome-scale protein fold space can be pursued. RESULTS: Using the improved Grid platform, secondary structure prediction on a genomic scale and protein topology derived from the new scoring scheme are presented for four different model proteomes. This protein fold space was compared with structures from the Protein Data Bank database and showed a similarly aligned distribution. Therefore, the fold space approach based on this new scoring scheme could serve as a guideline for predicting a folding family in a given organism.