In a computed protein multiple sequence alignment, the coreness of a column is the fraction of its substitutions that are in so-called core columns of the gold-standard reference alignment of its proteins. In benchmark suites of protein reference alignments, the core columns of the reference alignment are those that can be confidently labeled as correct, usually due to all residues in the column being sufficiently close in the spatial superposition of the known three-dimensional structures of the proteins. Typically the accuracy of a protein multiple sequence alignment that has been computed for a benchmark is only measured with respect to the core columns of the reference alignment. When computing an alignment in practice, however, a reference alignment is not known, so the coreness of its columns can only be predicted. We develop for the first time a predictor of column coreness for protein multiple sequence alignments. This allows us to predict which columns of a computed alignment are core, and
There is an increasing demand to assemble and align large-scale biological sequence data sets. The commonly used multiple sequence alignment programs are still limited in their ability to handle very large numbers of sequences because they lack a scalable high-performance computing (HPC) environment with a greatly extended data storage capacity. We designed ClustalXeed, a software system for multiple sequence alignment with incremental improvements over previous versions of the ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align a large family of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pair-align computation. The computational efficiency of the disk-storage system was markedly improved by implementing an efficient load-balancing
Multiple sequence alignment (MSA) is an extremely useful tool for molecular and evolutionary biology and there are several programs and algorithms available for this purpose. Although previous studies have compared the alignment accuracy of different MSA programs, their computational time and memory usage have not been systematically evaluated. Given the unprecedented amount of data produced by next-generation deep sequencing platforms, and increasing demand for large-scale data analysis, it is imperative to optimize the application of software. Therefore, a balance between alignment accuracy and computational cost has become a critical indicator of the most suitable MSA program. We compared both accuracy and cost of nine popular MSA programs, namely CLUSTALW, CLUSTAL OMEGA, DIALIGN-TX, MAFFT, MUSCLE, POA, Probalign, Probcons and T-Coffee, against the benchmark alignment dataset BAliBASE and discuss the relevance of some implementations embedded in each program's algorithm. Accuracy of alignment was
Large nucleotide sequence datasets are becoming increasingly common objects of comparison. Complete bacterial genomes are reported almost every day. This creates challenges for developing new multiple sequence alignment methods. Conventional multiple alignment methods are based on pairwise alignment and/or progressive alignment techniques. These approaches have performance problems when the number of sequences is large and when dealing with genome-scale sequences. We present a new method of multiple sequence alignment, called MISHIMA (Method for Inferring Sequence History In terms of Multiple Alignment), that does not depend on pairwise sequence comparison. A new algorithm is used to quickly find rare oligonucleotide sequences shared by all sequences. A divide-and-conquer approach is then applied to break the sequences into fragments that can be aligned independently by an external alignment program. These partial alignments are assembled together to form a complete alignment of the original sequences.
CLUSTAL-W is currently one of the most popular automated multiple sequence alignment tools. CLUSTAL-W calculates a distance matrix for the sequences that are to be aligned. The distance matrix is then used to generate a phylogenetic tree that is used to guide the series of global alignments needed to create the multiple alignment. This is referred to as progressive alignment. Multiple sequence alignments may also be created by hand and involve gapped or ungapped sequences. Typically, gapped alignments are used for full protein sequences, whereas ungapped alignments may be used to identify protein domains or motifs (see the BLOCKS database). Other multiple sequence alignment methods include DIALIGN, T-Coffee, and POA (Lassman and Sonnhammer, 2002). ...
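The first two steps described above (distance matrix, then guide tree) can be sketched as follows. This is a toy illustration and not CLUSTAL-W's actual procedure: it assumes equal-length sequences, uses a simple mismatch fraction as the distance, and swaps in average-linkage (UPGMA-style) clustering for the tree-building step. The names `p_distance` and `guide_tree` are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatched positions (toy distance; assumes equal lengths)."""
    if len(a) != len(b):
        raise ValueError("toy distance assumes equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def guide_tree(seqs):
    """Build a nested-tuple guide tree by average-linkage clustering.

    `seqs` maps sequence names to sequences. The returned tree gives the
    order in which a progressive aligner would merge sequences and profiles.
    """
    names = list(seqs)
    # Pairwise distance matrix, keyed by unordered name pairs.
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(names, 2)
    }
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in names]

    def average_distance(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1[1] for b in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        # Merge the two closest clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


if __name__ == "__main__":
    toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TTGTACCA", "D": "TTGTACCT"}
    print(guide_tree(toy))
```

In a real progressive aligner, each internal node of this tree would correspond to one pairwise alignment of sequences or profiles, performed from the leaves upward.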
The Jalview hands-on training course is for anyone who works with sequence data and multiple sequence alignments from proteins, RNA and DNA. Register via the University of Cambridge website. Jalview is free software for protein and nucleic acid sequence alignment generation, visualisation and analysis. It includes sophisticated editing options and provides a range of analysis tools to investigate the structure and function of macromolecules through a multiple window interface. For example, Jalview supports 8 popular methods for multiple sequence alignment, prediction of protein secondary structure by JPred and disorder prediction by four methods. Jalview also has options to generate phylogenetic trees, and assess consensus and conservation across sequence families. Sequences, alignments and additional annotation can be accessed directly from public databases and journal-quality figures generated for publication. The course consists of a mixture of talks and hands-on exercises. Day 1 is an ...
This article introduces a new interface for T-Coffee, a consistency-based multiple sequence alignment program. This interface provides an easy and intuitive access to the most popular functionality of the package. These include the default T-Coffee mode for protein and nucleic acid sequences, the M-Coffee mode that allows combining the output of any other aligners, and template-based modes of T-Coffee that deliver high accuracy alignments while using structural or homology derived templates. These three available template modes are Expresso for the alignment of protein with a known 3D-Structure, R-Coffee to align RNA sequences with conserved secondary structures and PSI-Coffee to accurately align distantly related sequences using homology extension. The new server benefits from recent improvements of the T-Coffee algorithm and can align up to 150 sequences as long as 10,000 residues and is available from both http://www.tcoffee.org and its main mirror http://tcoffee.crg.cat.
We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were im …
TY - JOUR. T1 - Grouping of amino acid types and extraction of amino acid properties from multiple sequence alignments using variance maximization. AU - Wrabl, James O.. AU - Grishin, Nick V.. PY - 2005/11/15. Y1 - 2005/11/15. N2 - Understanding of amino acid type co-occurrence in trusted multiple sequence alignments is a prerequisite for improved sequence alignment and remote homology detection algorithms. Two objective approaches were used to investigate co-occurrence, both based on variance maximization of the weighted residue frequencies in columns taken from a large alignment database. The first approach discretely grouped amino acid types, and the second approach extracted orthogonal properties of amino acids using principal components analysis. The grouping results corresponded to amino acid physical properties such as side chain hydrophobicity, size, or backbone flexibility, and an optimal arrangement of approximately eight groups was observed. However, interpretation of the orthogonal ...
CombAlign is a new Python code that generates a gapped, multiple structure-based sequence alignment (MSSA) given a set of pairwise structure-based sequence alignments. CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related structures. The method for combining multiple pairwise alignments is straightforward, involving the recording of pre-computed residue-residue correspondences between positions on the reference protein and each compared structure, and insertion of non-redundant gaps, as needed, to reflect amino-acid deletions or structural divergence in the reference relative to one or more compared structures. CombAlign is not intended for use in applications for which greater benefit would be provided using a multiple structure alignment as generated by the vast majority of open-source programs [20], nor does it propose to address matters of protein evolution or function ...
ALL is a high speed, large data set sequence alignment tool for Pairwise sequence alignment and Multiple Sequence Alignment (MSA). This tool processes both Protein and Nucleotide local sequence alignments. The type of sequence is automatically recognized. Any printable character set can be used except reserved characters.
DNA sequence alignment is a critical step in identifying homology between organisms. The most widely used alignment program, ClustalW, is known to suffer from the local minima problem, where suboptimal guide trees produce incorrect gap insertions. The optimization alignment approach has been shown to be effective in combining alignment and phylogenetic search in order to avoid the problems associated with poor guide trees. The optimization alignment algorithm operates at a small grain size, aligning each tree found and thereby wasting time producing multiple sequence alignments for suboptimal trees. This research develops and analyzes a large grain size algorithm for optimization alignment that iterates through steps of alignment and phylogeny search, thus improving the quality of guide trees used for computation of multiple sequence alignments and eliminating computation of multiple sequence alignments for suboptimal guide trees. Local minima are avoided by the use of stochastic search methods. Large Grain Size
Multiple sequence alignments (MSAs) are essential in most bioinformatics analyses that involve comparing homologous sequences. The exact way of computing an optimal alignment between N sequences has a computational complexity of O(L^N) for N sequences of length L, making it prohibitive for even small numbers of sequences. Most automatic methods are based on the progressive alignment heuristic (Hogeweg and Hesper, 1984), which aligns sequences in larger and larger subalignments, following the branching order in a guide tree. With a complexity of roughly O(N^2), this approach can routinely make alignments of a few thousand sequences of moderate length, but it is tough to make alignments much bigger than this. The progressive approach is a greedy algorithm where mistakes made at the initial alignment stages cannot be corrected later. To counteract this effect, the consistency principle was developed (Notredame et al, 2000). This has allowed the production of a new generation of more accurate ...
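To make the gap between the two growth rates concrete, the short sketch below counts the cells of a full N-dimensional dynamic-programming lattice and the number of pairwise comparisons behind a guide tree. It is only back-of-the-envelope arithmetic: the function names are hypothetical, constant factors are ignored, and the lattice is taken to have (L + 1) points per dimension.

```python
def exact_lattice_cells(length, n_seqs):
    """Cells in the full dynamic-programming lattice for n_seqs sequences
    of the given length: (length + 1) ** n_seqs, which grows as O(L^N)."""
    return (length + 1) ** n_seqs


def progressive_pairwise_distances(n_seqs):
    """Number of sequence pairs compared to build a distance matrix,
    N * (N - 1) / 2, which grows as O(N^2)."""
    return n_seqs * (n_seqs - 1) // 2


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(n, exact_lattice_cells(100, n), progressive_pairwise_distances(n))
```

Under these assumptions, even ten sequences of length 100 would need a lattice with 101^10 cells, whereas the pairwise stage of a progressive method needs only 45 comparisons.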
Hi. I've been trying to download a multiple sequence alignment from Clustal Omega as a Clustal format file, but whenever I click on the download option, it just opens a new page with only the alignments displayed. I tried downloading the page as a .pdf file and converting it into RTF, but that destroys the formatting. Same thing with simply copy/pasting into a text file. I need a Clustal-formatted file for use with PriFi (for designing primers from a multiple sequence alignment). Is there any workaround for this? Or is there something else I can use that does the MSA and the primer design from a multiple-sequence FASTA file? (I'm using Mac OS X Mavericks.) ...
This list of structural comparison and alignment software is a compilation of software tools and web portals used in pairwise or multiple structural comparison and structural alignment.
Key map:
Class:
Cα -- Backbone Atom (Cα) Alignment
AllA -- All Atoms Alignment
SSE -- Secondary Structure Elements Alignment
Seq -- Sequence-based alignment
Pair -- Pairwise Alignment (2 structures *only*)
Multi -- Multiple Structure Alignment (MStA)
C-Map -- Contact Map
Surf -- Connolly Molecular Surface Alignment
SASA -- Solvent Accessible Surface Area
Dihed -- Dihedral Backbone Angles
PB -- Protein Blocks
Flexible:
No -- Only rigid-body transformations are considered between the structures being compared.
Yes -- The method allows for some flexibility within the structures being compared, such as movements around hinge regions.
Aung, Zeyar; Kian-Lee Tan (Dec 2006). MatAlign: Precise protein structure comparison by matrix alignment. Journal of Bioinformatics and Computational Biology. 4 (6): 1197-216. ...
FSA is a probabilistic multiple sequence alignment algorithm which uses a distance-based approach to aligning homologous protein, RNA or DNA sequences.
Document titled "Predicting the accuracy of multiple sequence alignment algorithms by using computational intelligent techniques" is about AI and Robotics.
Accurate sequence alignments of distantly related proteins are crucial for a better understanding of proteins at their family/superfamily level. However, such alignments of distantly related proteins are often hard to obtain by automatic multiple sequence alignment programs. Hence, we suggest a protocol that permits the reliable sequence alignment of distantly related proteins whose structural information is available. This protocol employs two stages of structure-based sequence alignments in order to obtain reliable alignments. The method proposed is clearly suited to work for protein structural members with distant relationships. We further propose a novel assessment of the derived alignments using the measurements of the structural variations and the percentage secondary structural equivalences. This structure-based sequence alignment protocol can be employed for a single superfamily or for a large number of structural domain superfamilies in a near-automatic and rapid manner. Development ...
TY - JOUR. T1 - High performance biological pairwise sequence alignment. T2 - FPGA versus GPU versus cell BE versus GPP. AU - Benkrid, Khaled. AU - Akoglu, Ali. AU - Ling, Cheng. AU - Song, Yang. AU - Liu, Ying. AU - Tian, Xiang. PY - 2012. Y1 - 2012. N2 - This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high performance efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM's Cell Broadband Engine (Cell BE), in the design and implementation of the widely-used Smith-Waterman pairwise sequence alignment algorithm, with general purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on the performance-per-watt criterion ...
Download MSAProbs: Multiple Sequence Alignment for free. One of the most accurate multiple protein sequence aligners. MSAProbs is an open-source protein multiple sequence alignment algorithm, achieving the statistically highest alignment accuracy on popular benchmarks (BALIBASE, PREFAB, SABMARK, OXBENCH) compared to ClustalW, MAFFT, MUSCLE, ProbCons and Probalign.
Currently contains parsers and datatypes for: clustalw2, clustalo, mlocarna, cmalign. Clustal tools are multiple sequence alignment tools for biological sequences like DNA, RNA and protein. For more information on Clustal tools, refer to http://www.clustal.org/. Mlocarna is a multiple sequence alignment tool for RNA sequences with secondary structure output. For more information on mlocarna, refer to http://www.bioinf.uni-freiburg.de/Software/LocARNA/. cmalign is a multiple sequence alignment program based on RNA family models and produces, among others, Clustal output. It is part of Infernal (http://infernal.janelia.org/). Four types of output are parsed. ...
Multiple sequence alignment remains a crucial method for understanding the function of groups of related nucleic acid and protein sequences. However, it is known that automatic multiple sequence alignments can often be improved by manual editing. Therefore, tools are needed to view and edit multiple sequence alignments.
Genetic sequence alignment is the basis of many evolutionary and comparative studies, and errors in alignments lead to errors in the interpretation of evolutionary information in genomes. Traditional multiple sequence alignment methods disregard the phylogenetic implications of gap patterns that they create and infer systematically biased alignments with excess deletions and substitutions, too few insertions, and implausible insertion-deletion-event histories. We present a method that prevents these systematic errors by recognizing insertions and deletions as distinct evolutionary events. We show theoretically and practically that this improves the quality of sequence alignments and downstream analyses over a wide range of realistic alignment problems. These results suggest that insertions and sequence turnover are more common than is currently thought and challenge the conventional picture of sequence evolution and mechanisms of functional and structural changes. ...
Evaluation Measures of Multiple Sequence Alignments - Multiple sequence alignments (MSAs) are frequently used in the study of families of protein sequences or DNA/RNA sequences. They are a fundamental tool for the understanding of the structure, functionality and, ultimately, the evolution of proteins. A new algorithm, the Circular Sum (CS) method, is presented for formally evaluating the quality of an MSA. It is based on the use of a solution to the Traveling Salesman Problem, which identifies a circular tour through an evolutionary tree connecting the sequences in a protein family. With this approach, the calculation of an evolutionary tree and the errors that it would introduce can be avoided altogether. The algorithm gives an upper bound, the best score that can possibly be achieved by any MSA for a given set of protein sequences. Alternatively, if presented with a specific MSA, the algorithm provides a formal score for the MSA, which serves as an absolute measure of the quality of the MSA. The CS
The necessary use of heuristics for multiple alignment means that for an arbitrary set of proteins, there is always a good chance that an alignment will contain errors. For example, an evaluation of several leading alignment programs using the BAliBase benchmark found that at least 24% of all pairs of aligned amino acids were incorrectly aligned.[38] These errors can arise because of unique insertions into one or more regions of sequences, or through some more complex evolutionary process leading to proteins that do not align easily by sequence alone. As the number of sequences and their divergence increase, many more errors will be made simply because of the heuristic nature of MSA algorithms. Multiple sequence alignment viewers enable alignments to be visually reviewed, often by inspecting the quality of alignment for annotated functional sites on two or more sequences. Many also enable the alignment to be edited to correct these (usually minor) errors, in order to obtain an optimal curated ...
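A pair-based error rate of the kind quoted above can be computed roughly as follows. This is a minimal sketch under stated assumptions: both alignments are lists of gapped strings holding the same sequences in the same row order, "-" is the gap character, and the error rate is taken to be the fraction of reference residue pairs that the computed alignment fails to reproduce. The exact measure used in the cited evaluation may differ, and the function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    A residue is identified as (row, index in the ungapped sequence), so the
    result does not depend on where gap-only padding sits.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        present = []
        for row, ch in enumerate(column):
            if ch != "-":
                present.append((row, counters[row]))
                counters[row] += 1
        pairs.update(combinations(present, 2))
    return pairs


def pair_error_rate(test, reference):
    """Fraction of reference residue pairs missing from the test alignment."""
    ref = aligned_pairs(reference)
    if not ref:
        raise ValueError("reference alignment contains no aligned pairs")
    got = aligned_pairs(test)
    return 1 - len(ref & got) / len(ref)


if __name__ == "__main__":
    reference = ["ACGT", "ACGT"]
    computed = ["AC-GT", "ACG-T"]
    print(pair_error_rate(computed, reference))
```

Benchmarks often restrict the reference pairs to trusted (core) columns before applying a measure like this; the sketch uses all columns for simplicity.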
Multiple sequence alignment for short sequences. Kristóf Takács. Multiple sequence alignment (MSA) has been one of the most important problems in bioinformatics for decades, and it is still heavily studied by many mathematicians and biologists. However, mostly because of the practical motivation of this problem, the research on this topic is focused on aligning…
Identification of regions in multiple sequence alignments thermodynamically suitable for targeting by consensus oligonucleotides: application to HIV genome - Background: Computer programs for the generation of multiple sequence alignments such as Clustal W allow detection of regions that are most conserved among many sequence variants. However, even for regions that are equally conserved, their potential utility as hybridization targets varies. Mismatches in sequence variants are more disruptive in some duplexes than in others. Additionally, the propensity for self-interactions amongst oligonucleotides targeting conserved regions differs and the structure of target regions themselves can also influence hybridization efficiency. There is a need to develop software that will employ thermodynamic selection criteria for finding optimal hybridization targets in related sequences. Results: A new scheme and new software for optimal detection of oligonucleotide hybridization targets common to families of
This page offers the web documents that are referred to in Chapter 6. In Chapter 3 we discussed pairwise alignment, and then in Chapters 4 and 5 we described how a protein or DNA query can be compared to a database. This chapter covers a series of approaches to multiple sequence alignment, including the popular method of progressive alignment and new methods such as consistency-based and structure-based alignment. We also discuss ways to multiply align long segments of genomic DNA. ...
Object for the calculation of a multiple sequence alignment from a set of unaligned sequences or alignments using the TCoffee program
Automatic extraction of reliable regions from multiple sequence alignments: High quality multiple alignments are crucial in the transfer of annotation from one genome to another. Multiple alignment methods strive to achieve ever increasing levels of average accuracy on benchmark sets while the accuracy of individual alignments is often overlooked. Results: We have previously developed a method to automatically assess the accuracy and overall difficulty of multiple
This paper presents 3DCoffee@igs, a web-based tool dedicated to the computation of high-quality multiple sequence alignments (MSAs). 3D-Coffee makes it possible to mix protein sequences and structures in order to increase the accuracy of the alignments. Structures can be either provided as PDB identifiers or directly uploaded into the server. Given a set of sequences and structures, pairs of structures are aligned with SAP while sequence-structure pairs are aligned with Fugue. The resulting collection of pairwise alignments is then combined into an MSA with the T-Coffee algorithm. The server and its documentation are available from http://igs-server.cnrs-mrs.fr/Tcoffee/. ...
Announcement: This hands-on computer workshop is designed for people having previous experience with macromolecular visualization in any of the many software packages available. It will focus on the capabilities of Protein Explorer and Chemscape Chime, targeting interests expressed by the participants. Topics may include how to use an automated interface for detailed exploration of noncovalent bonds (the Noncovalent Bond Finder); finding energetically significant cation-pi interactions; generating overviews of noncovalent interactions using contact surface displays; how to animate functional conformational changes or movements, such as the binding of calcium to an EF-hand; searching for proteins with similar structures (regardless of sequence) and viewing the resulting structure alignments. We may also create multiple protein sequence alignments and color 3D proteins by conservation and mutation frequency. (If you already have some multiple protein sequence alignments, bring them in FASTA/PIR ...
PROBCONS is an efficient protein multiple sequence alignment program, which has demonstrated a statistically significant improvement in accuracy compared to several leading alignment tools ...
TY - JOUR. T1 - SinicView. T2 - A visualization environment for comparisons of multiple nucleotide sequence alignment tools. AU - Shih, Arthur Chun Chieh. AU - Lee, D. T.. AU - Lin, Laurent. AU - Peng, Chin Lin. AU - Chen, Shiang Heng. AU - Wu, Yu Wei. AU - Wong, Chun Yi. AU - Chou, Meng Yuan. AU - Shiao, Tze Chang. AU - Hsieh, Mu Fen. PY - 2006/3/2. Y1 - 2006/3/2. N2 - Background: Deluged by the rate and complexity of completed genomic sequences, the need to align longer sequences becomes more urgent, and many more tools have thus been developed. In the initial stage of genomic sequence analysis, a biologist is usually faced with the questions of how to choose the best tool to align sequences of interest and how to analyze and visualize the alignment results, and then with the question of whether poorly aligned regions produced by the tool are indeed not homologous or are just results due to inappropriate alignment tools or scoring systems used. Although several systematic evaluations of ...
Sequence similarity with experimentally characterized gene products, as determined by alignments, either pairwise or multiple (tools such as BLAST, ClustalW, MUSCLE). An entry in the with field is mandatory. The ISA code is a sub-category of the ISS code. It should be used whenever a sequence alignment is the basis for making an annotation, but only when a curator has manually reviewed the alignment and choice of GO term or if the information is in a published paper, the authors have manually reviewed the evidence. Such alignments may be pairwise alignments (the alignment of two sequences to one another) or multiple alignments (the alignment of 3 or more sequences to one another). BLAST produces pairwise alignments and any annotations based solely on the evaluation of BLAST results should use this code. GO policy states that in order to assert that a query protein has the same function as a match protein, the match protein MUST be experimentally characterized. This prevents transitive annotation ...
Description: An X-drop within an alignment, where X > 0, is a region of consecutive columns scoring less than -X. Alignments containing no such X-drop are called X-alignments. X-alignments therefore avoid the first problem, that local alignments can contain internal segments scoring less than -X. A normal alignment is an alignment where each prefix or suffix has a non-negative score. Such an alignment is called maximal if it is not contained in any longer normal alignment. Maximal normal alignments avoid the second problem, that an entire alignment can score less than one of its prefixes or suffixes. The algorithm proposed by Zhang et al. constructs a tree that allows an alignment to be decomposed into all X-full subalignments, where X-full refers to subalignments that are both maximal normal alignments and X-alignments. The tree encodes all X-full alignments for all X greater than or equal to 0. Hence, the decomposition corresponding to any particular value of X can be readily extracted from the tree. The goal of this ...
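The two definitions above can be checked directly on a list of per-column scores. The sketch below is not the tree-based algorithm of Zhang et al.; it only tests the definitions, assuming the alignment is given as a sequence of numeric column scores. The function names are hypothetical.

```python
from itertools import accumulate


def min_segment_score(scores):
    """Lowest total score over any run of consecutive columns
    (a minimum-sum variant of Kadane's algorithm)."""
    best = current = scores[0]
    for s in scores[1:]:
        current = min(s, current + s)
        best = min(best, current)
    return best


def has_x_drop(scores, x):
    """True if some region of consecutive columns scores less than -x."""
    return bool(scores) and min_segment_score(scores) < -x


def is_normal(scores):
    """True if every prefix and every suffix has a non-negative score."""
    prefixes_ok = all(p >= 0 for p in accumulate(scores))
    suffixes_ok = all(s >= 0 for s in accumulate(reversed(scores)))
    return prefixes_ok and suffixes_ok


if __name__ == "__main__":
    column_scores = [2, 1, -3, -2, 4, 1]
    print(has_x_drop(column_scores, 4), is_normal(column_scores))
```

An alignment for which `has_x_drop(scores, x)` is false is an X-alignment in the sense defined above; checking maximality would additionally require the scores of the flanking columns.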
I have a set of 520 influenza sequences for which I have already done multiple sequence alignment, and computed the pairwise identity matrix. If I'd like to add in another sequence, I have to re-align everything, and recompute the entire PWI matrix. Is there any program I can use to append this other sequence to the alignment, and only compute the PWI w.r.t. every other sequence? A simple example would be as follows. I have a 2x2 alignment, with the following scores. ...
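One way to handle the second half of this question, once the new sequence has been placed into the existing alignment by some tool, is to compute only the new row and column of the matrix. The sketch below is an assumption-laden illustration, not a specific program: identity is defined here as matches divided by the number of columns where both sequences have a residue, "-" is the gap character, and the names `identity` and `extend_identity_matrix` are hypothetical.

```python
def identity(a, b):
    """Pairwise identity over columns where neither sequence has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def extend_identity_matrix(matrix, aligned, new_row):
    """Append one row and one column for `new_row` without touching
    the existing entries.

    `matrix` is the current square identity matrix (list of lists),
    `aligned` the existing aligned sequences in matrix order, and
    `new_row` the new sequence already aligned to the same columns.
    """
    new_values = [identity(new_row, seq) for seq in aligned]
    extended = [row + [value] for row, value in zip(matrix, new_values)]
    extended.append(new_values + [1.0])
    return extended


if __name__ == "__main__":
    aligned = ["ACGT", "ACGA"]
    matrix = [[1.0, 0.75], [0.75, 1.0]]
    for row in extend_identity_matrix(matrix, aligned, "AC-T"):
        print(row)
```

With this definition, extra gap-only columns introduced into the existing rows when the new sequence is added do not change the old entries, because columns with a gap in either sequence are skipped. Other identity definitions (for example, dividing by alignment length) would not have that property.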
The feasibility of predicting the global fold of small proteins by incorporating predicted secondary and tertiary restraints into ab initio folding simulations has been demonstrated on a test set comprised of 20 non-homologous proteins, of which one was a blind prediction of target 42 in the recent CASP2 contest. These proteins contain from 37 to 100 residues and represent all secondary structural classes and a representative variety of global topologies. Secondary structure restraints are provided by the PHD secondary structure prediction algorithm that incorporates multiple sequence information. Predicted tertiary restraints are derived from multiple sequence alignments via a two-step process. First, seed side-chain contacts are identified from correlated mutation analysis, and then a threading-based algorithm is used to expand the number of these seed contacts. A lattice-based reduced protein model and a folding algorithm designed to incorporate these predicted restraints is described. ...
See also Wikiomics:Bioinfo_tutorial#Protein_Alignment. Multiple sequence alignment is widely used in sequence analysis. It is more reliable and carries more information than the multiple pairwise alignments derived from BLAST. The MSA allows for identification of common regions between proteins (including motifs), finding conserved residues and analysis of evolutionary relationships between sequences. ...
High-throughput sequencing has laid the foundation for fast and cost-effective development of phylogenetic markers. Here we present the program discomark, which streamlines the development of nuclear DNA (nDNA) markers from whole-genome (or whole-transcriptome) sequencing data, combining local alignment, alignment trimming, reference mapping and primer design based on multiple sequence alignments to design primer pairs from input orthologous sequences. To demonstrate the suitability of discomark, we designed markers for two groups of species, one consisting of closely related species and one group of distantly related species. For the closely related members of the species complex of Cloeon dipterum s.l. (Insecta, Ephemeroptera), the program discovered a total of 78 markers. Among these, we selected eight markers for amplification and Sanger sequencing. The exon sequence alignments (2526 base pairs) were used to reconstruct a well-supported phylogeny and to infer clearly structured haplotype ...
In a previous paper, we introduced MUSCLE, a new program for creating multiple alignments of protein sequences, giving a brief summary of the algorithm and showing MUSCLE to achieve the highest scores reported to date on four alignment accuracy benchmarks. Here we present a more complete discussion of the algorithm, describing several previously unpublished techniques that improve biological accuracy and / or computational complexity. We introduce a new option, MUSCLE-fast, designed for high-throughput applications. We also describe a new protocol for evaluating objective functions that align two profiles. We compare the speed and accuracy of MUSCLE with CLUSTALW, Progressive POA and the MAFFT script FFTNS1, the fastest previously published program known to the author. Accuracy is measured using four benchmarks: BAliBASE, PREFAB, SABmark and SMART. We test three variants that offer highest accuracy (MUSCLE with default settings), highest speed (MUSCLE-fast), and a carefully chosen compromise between the
Multiple sequence alignment plays an important role in molecular sequence analysis. An alignment is the arrangement of two (pairwise alignment) or more (multiple alignment) sequences of residues (nucleotides or amino acids) that maximizes the similarities between them. Algorithmically, the problem consists of opening and extending gaps in the sequences to maximize an objective function (measurement of similarity). A simple genetic algorithm was developed and implemented in the software MSA-GA. Genetic algorithms, a class of evolutionary algorithms, are well suited for problems of this nature since residues and gaps are discrete units. An evolutionary algorithm cannot compete in terms of speed with progressive alignment methods, but it has the advantage of being able to correct for initially misaligned sequences, which is not possible with the progressive method. This was shown using the BaliBase benchmark, where Clustal-W alignments were used to seed the initial population in MSA-GA, improving the outcome.
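An objective function of the kind a genetic algorithm could maximize can be sketched as a sum-of-pairs score minus affine gap penalties. This is an illustrative assumption, not the scoring scheme of MSA-GA: the match, mismatch, gap-open and gap-extend values are arbitrary, and a real implementation would typically use a substitution matrix.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1):
    """Score every residue pair in every column; pairs with a gap are skipped."""
    score = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" or y == "-":
                continue
            score += match if x == y else mismatch
    return score


def affine_gap_penalty(alignment, gap_open=2, gap_extend=1):
    """Charge gap_open for the first gap character of a run in a row
    and gap_extend for each further character of that run."""
    penalty = 0
    for row in alignment:
        in_gap = False
        for ch in row:
            if ch == "-":
                penalty += gap_extend if in_gap else gap_open
                in_gap = True
            else:
                in_gap = False
    return penalty


def objective(alignment):
    """Fitness of a candidate alignment: similarity minus gap cost."""
    return sum_of_pairs(alignment) - affine_gap_penalty(alignment)


if __name__ == "__main__":
    candidate = ["ACGT", "A--T", "ACCT"]
    print(objective(candidate))
```

In a genetic algorithm, each individual in the population would be a candidate alignment, mutation and crossover would move or insert gaps, and a function like `objective` would rank the candidates.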
Problem statement: The parallelization of multiple progressive alignment algorithms is a difficult task. All known methods have strong bottlenecks resulting from synchronization delays. This is even more constraining in distributed memory systems, where message passing also delays the interprocess communication. Despite these drawbacks, parallel computing is becoming increasingly necessary to perform multiple sequence alignment. Approach: This study introduces a solution for parallelizing multiple progressive alignments in distributed memory systems that overcomes such delays. Results: The proposed approach uses threads to separate actual alignment from synchronization and communication. It also uses a different approach to schedule independent tasks. Conclusion/Recommendations: The approach was intensively tested, producing a performance remarkably better than that of a widely used algorithm. It is suggested that it can be applied to improve the performance of some multiple alignment tools, ...
The MSAViewer is a quick and easy visualization and analysis JavaScript component for Multiple Sequence Alignment data of any size. Core features include interactive navigation through the alignment, application of popular color schemes, sorting, selecting and filtering. The MSAViewer is web ready: written entirely in JavaScript, compatible with modern web browsers and does not require any specialized software. The MSAViewer is part of the BioJS collection of components. AVAILABILITY AND IMPLEMENTATION: The MSAViewer is released as open source software under the Boost Software License 1.0. Documentation, source code and the viewer are available at http://msa.biojs.net/. Supplementary information: Supplementary data are available at Bioinformatics online. CONTACT: [email protected] ...
Copyright 2009 by Cymon J. Cox. All rights reserved. This code is part of the Biopython distribution and governed by its license. Please see the LICENSE file that should have been included as part of this package. Command line wrapper for the multiple alignment programme MAFFT. http://align.bmr.kyushu-u.ac.jp/mafft/software/ Citations: Katoh, Toh (BMC Bioinformatics 9:212, 2008): Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework (describes RNA structural alignment methods); Katoh, Toh (Briefings in Bioinformatics 9:286-298, 2008): Recent developments in the MAFFT multiple sequence alignment program (outlines version 6); Katoh, Toh (Bioinformatics 23:372-374, 2007): Errata. PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences (describes the PartTree algorithm); Katoh, Kuma, Toh, Miyata (Nucleic Acids Res. 33:511-518, 2005): MAFFT version 5: improvement in accuracy of multiple sequence ...
We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit ...
Multiple sequence alignments (MSAs) are used for structural [1,2] and evolutionary predictions [1,2], but the complexity of aligning large datasets requires the use of approximate solutions [3], including the progressive algorithm [4]. Progressive MSA methods start by aligning the most similar sequences and subsequently incorporate the remaining sequences, from leaf to root, based on a guide tree. Their accuracy declines substantially as the number of sequences is scaled up [5]. We introduce a regressive algorithm that enables MSA of up to 1.4 million sequences on a standard workstation and substantially improves accuracy on datasets larger than 10,000 sequences. Our regressive algorithm works the other way around from the progressive algorithm and begins by aligning the most dissimilar sequences. It uses an efficient divide-and-conquer strategy to run third-party alignment methods in linear time, regardless of their original complexity. Our approach will enable analyses of extremely large genomic datasets such as the
(1) Multiple Sequence Alignment and Analysis with Jalview on Thursday 23rd November 2017. Day 1 workshop employs talks and hands-on exercises to help students learn to use Jalview, a versatile protein and nucleic acid sequence alignment and analysis tool developed within the School of Life Sciences. We will cover launching Jalview, accessing sequence, alignment and 3D structure databases, creating, editing and analysing alignments, phylogenetic trees, analysing alignments with 3D structures, and preparation of figures for presentation and publication. Workshop trainers: Dr Jim Procter and Dr Suzanne Duce. (2) Protein Sequence Analysis on Thursday 30th November 2017. Day 2 workshop aims to give an understanding of how best to use computational methods to make sense of the structure and function of your favourite protein. The workshop will introduce the principles of sequence analysis and its relationship to protein structure and function. It will highlight common methods and tools for protein ...
Biological network alignment aims to find regions of topological and functional (dis)similarities between molecular networks of different species. Then, network alignment can guide the transfer of biological knowledge from well-studied model species to less well-studied species between conserved (aligned) network regions, thus complementing valuable insights that have already been provided by genomic sequence alignment. Here, we review computational challenges behind the network alignment problem, existing approaches for solving the problem, ways of evaluating their alignment quality, and the approaches' biomedical applications. We discuss recent innovative efforts of improving the existing view of network alignment. We conclude with open research questions in comparative biological network research that could further our understanding of principles of life, evolution, disease, and therapeutics.
Hi, I'm new to programming so forgive me if I say something obviously stupid. I'm interested in writing a program to do some primer-design tasks, among other things. The first thing I want the program to do, however, is a multiple sequence alignment. I realise this is like reinventing the wheel, which I'd rather not do. Are there a few standard algorithms out there for this task? What about other standard molecular biology algorithms? Also, maybe someone could suggest a few good beginning references for this sort of programming. Thanks! -- Susan http://www4.ncsu.edu/unity/users/s/sjhogart/public/home.html Check this! http://homepage.cistron.nl/~peterh/gsresources/ ...
Tools for Bioinformatics: DNA Sequence Analysis - Features of DNA sequence analysis, Approaches to EST analysis; Pairwise alignment techniques: Comparing two sequences, PAM and BLOSUM, Global alignment (The Needleman and Wunsch algorithm), Local Alignment (The Smith-Waterman algorithm), Dynamic programming, Pairwise database searching: Sequence analysis- BLAST and other related tools, Different methods of Multiple sequence alignment, Searching databases with multiple alignments; Alignment Scores, Design and Analysis of microarray experiments. ...
From evolutionary inference and function annotation to structural prediction, protein sequence comparison has provided crucial biological insights. While many sequence alignment algorithms have been developed, existing approaches are often incapable of detecting hidden structural relationships in the twilight zone of low sequence identity. To address this critical problem, we introduce a computational algorithm that performs protein Sequence Alignments from deep-Learning of Structural Alignments (SAdLSA, silent d). The key idea is to implicitly learn the protein folding code from many thousands of structural alignments. We demonstrate that our models trained on α-helical domains can be successfully transferred to recognize sequences encoding β-sheet domains. Training and benchmarking on larger, highly challenging data sets shows significant improvement over established approaches. Notice: This server is freely available to all academic and non-commercial users ...
Regina Barzilay of MIT and Lillian Lee of Cornell University have developed a computer program that can automatically paraphrase English sentences: The program culls text from online news services on particular topics, determines distinguishing sentence patterns in these clusters, and employs these patterns to generate new sentences that convey the same message with different wording. Potential applications for such a tool include report summarization, document checking for repetition or plagiarism, and a way for authors to automatically rewrite their prose to readers of different backgrounds, which Lee describes as a style dial. Kevin Knight of the University of Southern California remarks that the program may even be able to help facilitate machine translation. Barzilay and Lee tested the program by having a computer categorize Agence France-Presse and Reuters articles according to subject, and then look for sentence clusters possessing similar words and phrases; the researchers used a ...
TY - JOUR. T1 - Effects of using coding potential, sequence conservation and mRNA structure conservation for predicting pyrrolysine containing genes. AU - Have, Christian Theil. AU - Zambach, Sine. AU - Christiansen, Henning. PY - 2013. Y1 - 2013. N2 - Background: Pyrrolysine (the 22nd amino acid) is in certain organisms and under certain circumstances encoded by the amber stop codon, UAG. The circumstances driving pyrrolysine translation are not well understood. The involvement of a predicted mRNA structure in the region downstream of UAG has been suggested, but the structure does not seem to be present in all pyrrolysine incorporating genes. Results: We propose a strategy to predict pyrrolysine encoding genes in genomes of archaea and bacteria. We cluster open reading frames interrupted by the amber codon based on sequence similarity. We rank these clusters according to several features that may influence pyrrolysine translation. The ranking effects of different features are assessed and we propose a ...
BACKGROUND: The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. RESULTS: Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using ...
DIALIGN is a software program for multiple sequence alignment developed by Burkhard Morgenstern et al. While standard alignment methods rely on comparing single residues and imposing gap penalties, DIALIGN constructs pairwise and multiple alignments by comparing entire segments of the sequences. No gap penalty is used. This approach can be used for both global and local alignment, but it is particularly successful in situations where sequences share only local homologies. The latest version of the program, DIALIGN-TX, is described in Subramanian et al. (2008), Algorithms Mol. Biol. 3:6. A web server for this program is available at Goettingen Bioinformatics Compute Server (GOBICS). A web server for multiple alignment with user-defined constraints (anchor points) as described by Morgenstern et al. (2006), Algorithms Mol. Biol. 1:6 is also available through GOBICS. During the last few years, DIALIGN has been successfully used by many researchers to align genomic sequences; some break-through ...
Jalview http://www.jalview.org/ https://tess.elixir-europe.org/content_providers/jalview Jalview (www.jalview.org) is free-to-use sequence alignment and analysis visualisation software that links genomic variants, protein alignments and 3D structure. Protein, RNA and DNA data can be directly accessed from public databases (e.g. Pfam, Rfam, PDB, UniProt and ENA). Jalview has editing and annotation functionality within a fully integrated, multiple window interface. The sequence alignment programs Clustal Omega, Muscle, MAFFT, ProbCons, T-COFFEE, ClustalW, MSA Prob and GLProb can be run directly from within Jalview. Jalview integrates protein secondary structure prediction (JPred), generates trees, and assesses consensus and conservation across sequence families. Journal quality figures can be generated from the results. The Jalview Desktop will run on Mac, MS Windows, Linux and any other platform that supports Java. It has been developed in Geoff Barton's group (www.compbio.dundee.ac.uk) in the ...
Various methods have been developed to explore inter-genomic relationships among plant species. Here, we present a sequence similarity analysis based upon comparison of transcript-assembly and methylation-filtered databases from five plant species and physically anchored rice coding sequences. A comparison of the frequency of sequence alignments, determined by MegaBLAST, between rice coding sequences in TIGR pseudomolecules and annotations vs 4.0 and comprehensive transcript-assembly and methylation-filtered databases from Lolium perenne (ryegrass), Zea mays (maize), Hordeum vulgare (barley), Glycine max (soybean) and Arabidopsis thaliana (thale cress) was undertaken. Each rice pseudomolecule was divided into 10 segments, each containing 10% of the functionally annotated, expressed genes. This indicated a correlation between relative segment position in the rice genome and numbers of alignments with all the queried monocot and dicot plant databases. Colour-coded moving windows of 100 functionally
We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent …
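As a rough illustration of distance estimation by k-mer counting, the Python sketch below compares two unaligned sequences by the fraction of k-mers they share. This is a simplified definition written for this note, not MUSCLE's actual formula; the function names `kmer_counts` and `kmer_distance` and the default `k=3` are assumptions.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """Return 1 minus the fraction of shared k-mers (illustrative only).

    Shared k-mers are counted with multiplicity and normalised by the
    number of k-mers in the shorter sequence.
    """
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must be at least k residues long")
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / denom


if __name__ == "__main__":
    print(kmer_distance("ACDEFG", "ACDEFH"))
```

The appeal of this kind of estimate is that it needs no alignment at all, so a distance matrix for building a guide tree can be filled in quickly before any progressive alignment step.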
Another possibility is to use BioEdit, an alignment sequence editor. It makes it easy to align the selected sequences with Clustal, and it can also perform BLAST searches directly from the main window, retrieve sequences (with all the GenBank information) directly from NCBI and align again... If it is set up properly, it is also possible to use the complete PHYLIP package to make trees ...
Fig. 1. Phylogram demonstrating amino acid sequence identity among Cry and Cyt proteins. This phylogenetic tree is modified from a TREEVIEW visualization of NEIGHBOR treatment of a CLUSTAL W multiple alignment and distance matrix of the full-length toxin sequences, as described in the text. The gray vertical bars demarcate the four levels of nomenclature ranks. Based on the low percentage of identical residues and the absence of any conserved sequence blocks in multiple-sequence alignments, the lower four lineages are not treated as part of the main toxin family, and their nodes have been replaced with dashed horizontal lines in this figure. ...
A key element in evaluating the quality of a pairwise sequence alignment is the substitution matrix, which assigns a score for aligning any possible pair of residues. The theory of amino acid substitution matrices is described in [1], and applied to DNA sequence comparison in [2]. In general, different substitution matrices are tailored to detecting similarities among sequences that are diverged by differing degrees [1-3]. A single matrix may nevertheless be reasonably efficient over a relatively broad range of evolutionary change [1-3]. Experimentation has shown that the BLOSUM-62 matrix [4] is among the best for detecting most weak protein similarities. For particularly long and weak alignments, the BLOSUM-45 matrix may prove superior. A detailed statistical theory for gapped alignments has not been developed, and the best gap costs to use with a given substitution matrix are determined empirically. Short alignments need to be relatively strong (i.e. have a higher percentage of matching ...
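The role of the substitution matrix and the gap costs can be sketched in a few lines of Python. The matrix below is a toy table with made-up values for three residues, not BLOSUM-62 or any published matrix, and the gap costs and the name `score_alignment` are likewise assumptions for illustration.

```python
# Toy substitution scores for three residues (hypothetical values).
TOY_MATRIX = {
    ("A", "A"): 4, ("G", "G"): 6, ("W", "W"): 11,
    ("A", "G"): 0, ("A", "W"): -3, ("G", "W"): -2,
}


def score_alignment(row_a, row_b, subst, gap_open=-5, gap_extend=-1):
    """Score a pairwise alignment whose rows use '-' for gaps.

    The first position of a gap costs gap_open and each further
    position of the same gap costs gap_extend.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    open_side = None  # which row currently has a gap open, if any
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            score += gap_extend if open_side == side else gap_open
            open_side = side
        else:
            # The table is symmetric, so try both key orders.
            score += subst[(x, y)] if (x, y) in subst else subst[(y, x)]
            open_side = None
    return score


if __name__ == "__main__":
    print(score_alignment("AG-W", "AGGW", TOY_MATRIX))
```

Swapping in a different matrix or different gap costs changes which of two candidate alignments scores higher, which is why the text notes that gap costs are chosen empirically for a given matrix.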
TY - JOUR. T1 - Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion. AU - Thomsen, Martin Christen Frølund. AU - Nielsen, Morten. PY - 2012. Y1 - 2012. N2 - Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2logo aims at resolving ...
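For readers unfamiliar with how a logo column gets its height, the following Python sketch computes the classic Shannon information content of one alignment column. It deliberately leaves out the sequence weighting, pseudo counts and depletion display that the abstract attributes to Seq2Logo, so it should be read as the plain textbook quantity rather than that tool's method; the function name `column_information` is an assumption.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content in bits of one alignment column.

    Gaps ('-') are ignored. No small-sample correction, weighting or
    pseudo counts are applied in this simplified version.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    # Shannon entropy of the observed residue frequencies.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


if __name__ == "__main__":
    print(column_information("AAAA"))
    print(column_information("ACGT", alphabet_size=4))
```

A fully conserved protein column reaches the maximum of log2(20) bits, while a column with uniformly distributed residues carries none. With few or redundant sequences the observed frequencies become unreliable, which is the problem the weighting and pseudo counts are meant to address.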
This track shows multiple alignments of 100 vertebrate species and measurements of evolutionary conservation using two methods (phastCons and phyloP) from the PHAST package, for all species. The multiple alignments were generated using multiz and other tools in the UCSC/Penn State Bioinformatics comparative genomics alignment pipeline. Conserved elements identified by phastCons are also displayed in this track. PHAST/Multiz are built from chains (alignable) and nets (syntenic), see the documentation of the Chain/Net tracks for a description of the complete alignment process. PhastCons is a hidden Markov model-based method that estimates the probability that each nucleotide belongs to a conserved element, based on the multiple alignment. It considers not just each individual alignment column, but also its flanking columns. By contrast, phyloP separately measures conservation at individual columns, ignoring the effects of their neighbors. As a consequence, the phyloP plots have a less smooth ...
Myoskeletal Alignment Techniques is a term first coined by Dalton in the early 1980s. However, Dalton never stops developing the MAT system. Over the years, the work of Phillip Greenman, Serge Gracovetsky and many other visionaries in kinesiology and human performance has been integrated into his training programs. By teaching how to identify and correct dysfunctional, neurologically-driven strain patterns before they become pain patterns, he has created one of the most integrative and complete perspectives on pain management. MAT practitioners learn how to take clients through a series of sessions in deep tissue therapy that calms hyper-excited nerve receptors. When the pain-generating stimulus is effectively interrupted, new memories can be programmed into muscle cells by inhibiting the chemical activation of pain, which allows the brain to downgrade its signals for chronic protective spasms. Of course, effective bodywork depends on much more than intellectual knowledge. Dalton's program ...
Link to Pubmed [PMID] - 17359063. Phys. Rev. Lett. 2007 Feb;98(7):078101. Alignment algorithms usually rely on simplified models of gaps for computational efficiency. Based on correspondences between alignments and structural models for nucleic acids, and using methods from statistical mechanics, we show that alignments with realistic laws for gaps can be computed with fast algorithms. Improved performances of probabilistic alignments with realistic models of gaps are illustrated. By contrast with optimization-based alignments, such improvements with realistic laws are not observed. General perspectives for biological and physical modeling are mentioned. https://www.ncbi.nlm.nih.gov/pubmed/17359063 ...
TY - JOUR. T1 - ArchAlign. T2 - Coordinate-free chromatin alignment reveals novel architectures. AU - Lai, William K.M.. AU - Buck, Michael J.. PY - 2010/12/23. Y1 - 2010/12/23. N2 - To facilitate identification and characterization of genomic functional elements, we have developed a chromatin architecture alignment algorithm (ArchAlign). ArchAlign identifies shared chromatin structural patterns from high-resolution chromatin structural datasets derived from next-generation sequencing or tiled microarray approaches for user defined regions of interest. We validated ArchAlign using well characterized functional elements, and used it to explore the chromatin structural architecture at CTCF binding sites in the human genome. ArchAlign is freely available at http://www.acsu.buffalo.edu/~mjbuck/ArchAlign.html.. AB - To facilitate identification and characterization of genomic functional elements, we have developed a chromatin architecture alignment algorithm (ArchAlign). ArchAlign identifies shared ...
Next Generation Sequencing (NGS) technology generates tens of millions of short reads for each DNA/RNA sample. A key step in NGS data analysis is the short read alignment of the generated sequences to a reference genome. Although storing alignment information in the Sequence Alignment/M...
We present a method for prediction of functional sites in a set of aligned protein sequences. The method selects sites which are both well conserved and clustered together in space, as inferred from the 3D structures of proteins included in the alignment. We tested the method using 86 alignments from the NCBI CDD database, where the sites of experimentally determined ligand and/or macromolecular interactions are annotated. In agreement with earlier investigations, we found that functional site predictions are most successful when overall background sequence conservation is low, such that sites under evolutionary constraint become apparent. In addition, we found that averaging of conservation values across spatially clustered sites improves predictions under certain conditions: that is, when overall conservation is relatively high and when the site in question involves a large macromolecular binding interface. Under these conditions it is better to look for clusters of conserved sites than to ...
Generalized Algebraic Dynamic Programming. A selection of (sequence) alignment algorithms. The terminal and syntactic variables, as well as the index type, are not fixed here. This makes it possible to select the correct structure of the grammar here, but bind the required data type for alignment in user code. That being said, these algorithms are mostly aimed towards sequence alignment problems. List of grammars for sequences: ...
Alignment of short DNA sequences. The package provides a reimplementation of the Nearest Alignment Space Termination tool in Python. It was prepared for next generation sequencers. Given a set of sequences and a template alignment, PyNAST will align the input sequences against the template alignment, and return a multiple sequence alignment which contains the same number of positions (or columns) as the template alignment. This facilitates the analysis of new sequences in the context of existing alignments, and additional data derived from existing alignments such as phylogenetic trees. Because any protein or nucleic acid sequences and template alignments can be provided, PyNAST is not limited to the analysis of 16s rDNA sequences. ...
Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs performs searches very rapidly, it has the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill ...
2009/8/4 Ryan Golhar <golharam at umdnj.edu>:
>>> I'm trying to perform a large amount of sequence alignments of long DNA
>>> sequences, some up to 163,000+ bp in length. I was trying to use the
>>> standard Needleman-Wunsch algorithm, but the matrix used requires a
>>> large amount of memory...about 100 GB of memory. This obviously won't
>>> work.
>>
>> How many were you trying to align? You mean 163kb or 163Mb?
>> I was looking for test or comparisons for some alignment code I had which
>> indexed the target sequences, don't recall the suggestions
>> for that discussion but I was able to do simple genomes reasonably well (
>> I think I used 2 strains of e coli or something about 5 megs long)
>> on a desktop. If you can find responses to my request from a few years ago
>> that may ( or may not ) help. I'd offer my code, and indeed I think
>> I have it on a website, but I stopped development and not sure
>> it is nearly useful as-is unless you just want coarse alignment on
>> two similar ...
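The memory figure in this thread is easy to check: a full Needleman-Wunsch matrix for two sequences of 163,000 bp has 163,000 x 163,000, or about 2.66 x 10^10, cells. Assuming 4 bytes per cell (an assumption, since the post does not give the cell size), that is roughly 1.06 x 10^11 bytes, or about 106 GB, in line with the reported 100 GB. If only the optimal score is needed, the matrix can be filled one row at a time, as in the following Python sketch. The scoring values and the name `nw_score` are illustrative assumptions, and a linear gap cost is used for simplicity.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score keeping only two rows of the DP matrix.

    Memory grows with len(b) instead of len(a) * len(b). The scoring
    values are hypothetical defaults, not taken from the thread.
    """
    # Row 0: aligning the empty prefix of a with each prefix of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: substitute, gap in b, gap in a.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(nw_score("GATTACA", "GATTACA"))
```

This version returns the score only. Recovering the alignment itself in linear space takes a divide-and-conquer scheme such as Hirschberg's algorithm, and the running time stays quadratic either way, so for sequences of this length a heuristic or banded aligner may still be the more practical choice.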
Traditionally, multiple sequence alignment algorithms use computationally complex heuristics to align the sequences. Unfortunately, the use of heuristics does not guarantee global optimization, as it would be prohibitively computationally expensive to achieve an optimal alignment. This is due in part to the sheer size of the genome, which consists of roughly three billion base pairs, and the increasing computational complexity resulting from each additional sequence in an alignment. ...
Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing biological sequences, such as the amino-acid sequences of different proteins or the DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if human beings carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on similarity of sequence. BLAST is one of the most widely used bioinformatics programs, probably because it addresses a fundamental problem and the algorithm emphasizes speed over sensitivity. This emphasis on speed is vital to making the algorithm practical on the huge genome databases currently available, although subsequent algorithms can be even faster. ...