Sequence Alignment: The arrangement of two or more amino acid or base sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms.

Sequence Analysis, Protein: A process that includes the determination of AMINO ACID SEQUENCE of a protein (or peptide, oligopeptide or peptide fragment) and the information analysis of the sequence.

Software: Sequential operating programs and data which instruct the functioning of a digital computer.

Algorithms: A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task.

Molecular Sequence Data: Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories.

Amino Acid Sequence: The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining PROTEIN CONFORMATION.

Computational Biology: A field of biology concerned with the development of techniques for the collection and manipulation of biological data, and the use of such data to make biological discoveries or predictions. This field encompasses all computational methods and theories for solving biological problems including manipulation of models and datasets.

Phylogeny: The relationships of groups of organisms as reflected by their genetic makeup.

Sequence Homology, Amino Acid: The degree of similarity between sequences of amino acids.
This information is useful for analyzing the genetic relatedness of proteins and species.

Proteins: Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein.

Databases, Protein: Databases containing information about PROTEINS such as AMINO ACID SEQUENCE; PROTEIN CONFORMATION; and other properties.

Sequence Analysis, DNA: A multistage process that includes cloning, physical mapping, subcloning, determination of the DNA SEQUENCE, and information analysis.

Models, Molecular: Models used experimentally or theoretically to study molecular shape, electronic properties, or interactions; includes analogous molecules, computer-generated graphics, and mechanical structures.

Conserved Sequence: A sequence of amino acids in a polypeptide or of nucleotides in DNA or RNA that is similar across multiple species. A known set of conserved sequences is represented by a CONSENSUS SEQUENCE. AMINO ACID MOTIFS are often composed of conserved sequences.

Internet: A loose confederation of computer communication networks around the world. The networks that make up the Internet are connected through several backbone networks. The Internet grew out of the US Government ARPAnet project and was designed to facilitate information exchange.

Base Sequence: The sequence of PURINES and PYRIMIDINES in nucleic acids and polynucleotides.
It is also called nucleotide sequence.

Evolution, Molecular: The process of cumulative change at the level of DNA; RNA; and PROTEINS, over successive generations.

User-Computer Interface: The portion of an interactive computer program that issues messages to and receives commands from a user.

Software Validation: The act of testing the software for compliance with a standard.

Structural Homology, Protein: The degree of 3-dimensional shape similarity between proteins. It can be an indication of distant AMINO ACID SEQUENCE HOMOLOGY and used for rational DRUG DESIGN.

Sequence Analysis: A multistage process that includes the determination of a sequence (protein, carbohydrate, etc.), its fragmentation and analysis, and the interpretation of the resulting sequence information.

Sequence Analysis, RNA: A multistage process that includes cloning, physical mapping, subcloning, sequencing, and information analysis of an RNA SEQUENCE.

Protein Structure, Tertiary: The level of protein structure in which combinations of secondary protein structures (alpha helices, beta sheets, loop regions, and motifs) pack together to form folded shapes called domains. Disulfide bridges between cysteines in two different parts of the polypeptide chain along with other interactions between the chains play a role in the formation and stabilization of tertiary structure. Small proteins usually consist of only one domain but larger proteins may contain a number of domains connected by segments of polypeptide chain which lack regular secondary structure.

Computing Methodologies: Computer-assisted analysis and processing of problems in a particular area.

Sequence Homology: The degree of similarity between sequences.
Studies of AMINO ACID SEQUENCE HOMOLOGY and NUCLEIC ACID SEQUENCE HOMOLOGY provide useful information about the genetic relatedness of genes, gene products, and species.

Computer Graphics: The process of pictorial communication, between humans and computers, in which the computer input and output have the form of charts, drawings, or other appropriate pictorial representation.

Protein Structure, Secondary: The level of protein structure in which regular hydrogen-bond interactions within contiguous stretches of polypeptide chain give rise to alpha helices, beta strands (which align to form beta sheets) or other types of coils. This is the first folding level of protein conformation.

Markov Chains: A stochastic process such that the conditional probability distribution for a state at any future instant, given the present state, is unaffected by any additional knowledge of the past history of the system.

Protein Conformation: The characteristic 3-dimensional shape of a protein, including the secondary, supersecondary (motifs), tertiary (domains) and quaternary structure of the peptide chain. PROTEIN STRUCTURE, QUATERNARY describes the conformation assumed by multimeric proteins (aggregates of more than one polypeptide chain).

Genomics: The systematic study of the complete DNA sequences (GENOME) of organisms.

Computer Simulation: Computer-based representation of physical systems and phenomena such as chemical processes.

Databases, Genetic: Databases devoted to knowledge about specific genes and gene products.

Sequence Homology, Nucleic Acid: The sequential correspondence of nucleotides in one nucleic acid molecule with those of another nucleic acid molecule.
Sequence homology is an indication of the genetic relatedness of different organisms and gene function.

Genome: The genetic complement of an organism, including all of its GENES, as represented in its DNA, or in some cases, its RNA.

Binding Sites: The parts of a macromolecule that directly participate in its specific combination with another molecule.

Databases, Factual: Extensive collections, reputedly complete, of facts and data garnered from material of a specialized subject area and made available for analysis and application. The collection can be automated by various contemporary methods for retrieval. The concept should be differentiated from DATABASES, BIBLIOGRAPHIC which is restricted to collections of bibliographic references.

Word Processing: Text editing and storage functions using computer software.

Databases, Nucleic Acid: Databases containing information about NUCLEIC ACIDS such as BASE SEQUENCE; SNPS; NUCLEIC ACID CONFORMATION; and other properties. Information about the DNA fragments kept in a GENE LIBRARY or GENOMIC LIBRARY is often maintained in DNA databases.

Models, Genetic: Theoretical representations that simulate the behavior or activity of genetic processes or phenomena. They include the use of mathematical equations, computers, and other electronic equipment.

Models, Statistical: Statistical formulations or analyses which, when applied to data and found to fit the data, are then used to verify the assumptions and parameters used in the analysis. Examples of statistical models are the linear model, binomial model, polynomial model, two-parameter model, etc.

Information Storage and Retrieval: Organized activities related to the storage, location, search, and retrieval of information.

Pattern Recognition, Automated: In INFORMATION RETRIEVAL, machine-sensing or identification of visible patterns (shapes, forms, and configurations).
(Harrod's Librarians' Glossary, 7th ed)

Cloning, Molecular: The insertion of recombinant DNA molecules from prokaryotic and/or eukaryotic sources into a replicating vehicle, such as a plasmid or virus vector, and the introduction of the resultant hybrid molecules into recipient cells without altering the viability of those cells.

Amino Acid Motifs: Commonly observed structural components of proteins formed by simple combinations of adjacent secondary structures. A commonly observed structure may be composed of a CONSERVED SEQUENCE which can be represented by a CONSENSUS SEQUENCE.

Consensus Sequence: A theoretical representative nucleotide or amino acid sequence in which each nucleotide or amino acid is the one which occurs most frequently at that site in the different sequences which occur in nature. The phrase also refers to an actual sequence which approximates the theoretical consensus. A known CONSERVED SEQUENCE set is represented by a consensus sequence. Commonly observed supersecondary protein structures (AMINO ACID MOTIFS) are often formed by conserved sequences.

Likelihood Functions: Functions constructed from a statistical model and a set of observed data which give the probability of that data for various values of the unknown model parameters. Those parameter values that maximize the probability are the maximum likelihood estimates of the parameters.

Reproducibility of Results: The statistical reproducibility of measurements (often in a clinical context), including the testing of instrumentation or techniques to obtain reproducible results.
The concept includes reproducibility of physiological measurements, which may be used to develop rules to assess probability or prognosis, or response to a stimulus; reproducibility of occurrence of a condition; and reproducibility of experimental results.

Software Design: Specifications and instructions applied to the software.

Mutagenesis, Site-Directed: Genetically engineered MUTAGENESIS at a specific site in the DNA molecule that introduces a base substitution, or an insertion or deletion.

INDEL Mutation: A mutation named with the blend of insertion and deletion. It refers to a length difference between two ALLELES where it is unknowable if the difference was originally caused by a SEQUENCE INSERTION or by a SEQUENCE DELETION. If the number of nucleotides in the insertion/deletion is not divisible by three, and it occurs in a protein coding region, it is also a FRAMESHIFT MUTATION.

Programming Languages: Specific languages used to prepare computer programs.

Cluster Analysis: A set of statistical methods used to group variables or observations into strongly inter-related subgroups. In epidemiology, it may be used to analyze a closely grouped series of events or cases of disease or other health-related phenomenon with well-defined distribution patterns in relation to time or place or both.

RNA: A polynucleotide consisting essentially of chains with a repeating backbone of phosphate and ribose units to which nitrogenous bases are attached. RNA is unique among biological macromolecules in that it can encode genetic information, serve as an abundant structural component of cells, and also possesses catalytic activity. (Rieger et al., Glossary of Genetics: Classical and Molecular, 5th ed)

Escherichia coli: A species of gram-negative, facultatively anaerobic, rod-shaped bacteria (GRAM-NEGATIVE FACULTATIVELY ANAEROBIC RODS) commonly found in the lower part of the intestine of warm-blooded animals.
It is usually nonpathogenic, but some strains are known to produce DIARRHEA and pyogenic infections. Pathogenic strains (virotypes) are classified by their specific pathogenic mechanisms such as toxins (ENTEROTOXIGENIC ESCHERICHIA COLI), etc.

Protein Folding: Processes involved in the formation of TERTIARY PROTEIN STRUCTURE.

Database Management Systems: Software designed to store, manipulate, manage, and control data for specific uses.

Bacterial Proteins: Proteins found in any species of bacterium.

Nucleic Acid Conformation: The spatial arrangement of the atoms of a nucleic acid or polynucleotide that results in its characteristic 3-dimensional shape.

Bone Malalignment: Displacement of bones out of line in relation to joints. It may be congenital or traumatic in origin.

Multigene Family: A set of genes descended by duplication and variation from some ancestral gene. Such genes may be clustered together on the same chromosome or dispersed on different chromosomes. Examples of multigene families include those that encode the hemoglobins, immunoglobulins, histocompatibility antigens, actins, tubulins, keratins, collagens, heat shock proteins, salivary glue proteins, chorion proteins, cuticle proteins, yolk proteins, and phaseolins, as well as histones, ribosomal RNA, and transfer RNA genes. The latter three are examples of reiterated genes, where hundreds of identical genes are present in a tandem array. (King & Stansfield, A Dictionary of Genetics, 4th ed)

Models, Chemical: Theoretical representations that simulate the behavior or activity of chemical processes or phenomena; includes the use of mathematical equations, computers, and other electronic equipment.

Mutation: Any detectable and heritable change in the genetic material that causes a change in the GENOTYPE and which is transmitted to daughter cells and to succeeding generations.

Artificial Intelligence: Theory and development of COMPUTER SYSTEMS which perform tasks that normally require human intelligence.
Such tasks may include speech recognition, LEARNING; VISUAL PERCEPTION; MATHEMATICAL COMPUTING; reasoning, PROBLEM SOLVING, DECISION-MAKING, and translation of language.

Amino Acid Substitution: The naturally occurring or experimentally induced replacement of one or more AMINO ACIDS in a protein with another. If a functionally equivalent amino acid is substituted, the protein may retain wild-type activity. Substitution may also diminish, enhance, or eliminate protein function. Experimentally induced substitution is often used to study enzyme activities and binding site properties.

Species Specificity: The restriction of a characteristic behavior, anatomical structure or physical system, such as immune response; metabolic response, or gene or gene variant to the members of one species. It refers to that property which differentiates one species from another but it is also used for phylogenetic levels higher or lower than the species.

Catalytic Domain: The region of an enzyme that interacts with its substrate to cause the enzymatic reaction.

DNA: A deoxyribonucleotide polymer that is the primary genetic material of all cells. Eukaryotic and prokaryotic organisms normally contain DNA in a double-stranded state, yet several important biological processes transiently involve single-stranded regions. DNA, which consists of a polysugar-phosphate backbone possessing projections of purines (adenine and guanine) and pyrimidines (thymine and cytosine), forms a double helix that is held together by hydrogen bonds between these purines and pyrimidines (adenine to thymine and guanine to cytosine).

Protein Binding: The process in which substances, either endogenous or exogenous, bind to proteins, peptides, enzymes, protein precursors, or allied compounds. Specific protein-binding measures are often used as assays in diagnostic assessments.

Structure-Activity Relationship: The relationship between the chemical structure of a compound and its biological or pharmacological activity.
Compounds are often classed together because they have structural characteristics in common including shape, size, stereochemical arrangement, and distribution of functional groups.

Substrate Specificity: A characteristic feature of enzyme activity in relation to the kind of substrate on which the enzyme or catalytic molecule reacts.

Crystallography, X-Ray: The study of crystal structure using X-RAY DIFFRACTION techniques. (McGraw-Hill Dictionary of Scientific and Technical Terms, 4th ed)

Sensitivity and Specificity: Binary classification measures to assess test results. Sensitivity or recall rate is the proportion of true positives. Specificity is the probability of correctly determining the absence of a condition. (From Last, Dictionary of Epidemiology, 2d ed)

Genetic Variation: Genotypic differences observed among individuals in a population.

Expressed Sequence Tags: Partial cDNA (DNA, COMPLEMENTARY) sequences that are unique to the cDNAs from which they were derived.

Work Simplification: The construction or arrangement of a task so that it may be done with the greatest possible efficiency.

Bayes Theorem: A theorem in probability theory named for Thomas Bayes (1702-1761). In epidemiology, it is used to obtain the probability of disease in a group of people with some characteristic on the basis of the overall rate of that disease and of the likelihood of that characteristic in healthy and diseased individuals.
The most familiar application is in clinical decision analysis where it is used for estimating the probability of a particular diagnosis given the appearance of some symptoms or test result.

Catalysis: The facilitation of a chemical reaction by material (catalyst) that is not consumed by the reaction.

Recombinant Proteins: Proteins prepared by recombinant DNA technology.

Probability: The study of chance processes or the relative frequency characterizing a chance process.

DNA, Complementary: Single-stranded complementary DNA synthesized from an RNA template by the action of RNA-dependent DNA polymerase. cDNA (i.e., complementary DNA, not circular DNA, not C-DNA) is used in a variety of molecular cloning experiments as well as serving as a specific hybridization probe.

RNA, Untranslated: RNA which does not code for protein but has some enzymatic, structural or regulatory function. Although ribosomal RNA (RNA, RIBOSOMAL) and transfer RNA (RNA, TRANSFER) are also untranslated RNAs, they are not included in this scope.

Open Reading Frames: A sequence of successive nucleotide triplets that are read as CODONS specifying AMINO ACIDS and begin with an INITIATOR CODON and end with a stop codon (CODON, TERMINATOR).

Entropy: The measure of that part of the heat or energy of a system which is not available to perform work. Entropy increases in all natural (spontaneous and irreversible) processes. (From Dorland, 28th ed)

Archaea: One of the three domains of life (the others being BACTERIA and Eukarya), formerly called Archaebacteria under the taxon Bacteria, but now considered separate and distinct. They are characterized by: (1) the presence of characteristic tRNAs and ribosomal RNAs; (2) the absence of peptidoglycan cell walls; (3) the presence of ether-linked lipids built from branched-chain subunits; and (4) their occurrence in unusual habitats. While archaea resemble bacteria in morphology and genomic organization, they resemble eukarya in their method of genomic replication.
The domain contains at least four kingdoms: CRENARCHAEOTA; EURYARCHAEOTA; NANOARCHAEOTA; and KORARCHAEOTA.

Sarcocystidae: A family of parasitic organisms in the order EIMERIIDAE. They form tissue-cysts in their intermediate hosts, ultimately leading to pathogenesis in the final hosts that includes various mammals (including humans) and birds. The most important genera include NEOSPORA; SARCOCYSTIS; and TOXOPLASMA.

DNA Primers: Short sequences (generally about 10 base pairs) of DNA that are complementary to sequences of messenger RNA and allow reverse transcriptases to start copying the adjacent sequences of mRNA. Primers are used extensively in genetic and molecular biology techniques.

Base Pairing: Pairing of purine and pyrimidine bases by HYDROGEN BONDING in double-stranded DNA or RNA.

Computers

Genome, Human: The complete genetic complement contained in the DNA of a set of CHROMOSOMES in a HUMAN. The length of the human genome is about 3 billion base pairs.

Benchmarking: Method of measuring performance against established standards of best practice.

Data Compression: Information application based on a variety of coding methods to minimize the amount of data to be stored, retrieved, or transmitted. Data compression can be applied to various forms of data, such as images and signals. It is used to reduce costs and increase efficiency in the maintenance of large volumes of data.

Quality Control: A system for verifying and maintaining a desired level of quality in a product or process by careful planning, use of proper equipment, continued inspection, and corrective action as required. (Random House Unabridged Dictionary, 2d ed)

Chromosome Mapping: Any method used for determining the location of and relative distances between genes on a chromosome.

Monte Carlo Method: In statistics, a technique for numerically approximating the solution of a mathematical problem by studying the distribution of some random variable, often generated by a computer.
The name alludes to the randomness characteristic of the games of chance played at the gambling casinos in Monte Carlo. (From Random House Unabridged Dictionary, 2d ed, 1993)

Genome, Bacterial: The genetic complement of a BACTERIA as represented in its DNA.

Genome, Viral: The complete genetic complement contained in a DNA or RNA molecule in a virus.

Data Interpretation, Statistical: Application of statistical procedures to analyze specific observed or assumed facts from a particular study.

Computer Communication Networks: A system containing any combination of computers, computer terminals, printers, audio or visual display devices, or telephones interconnected by telecommunications equipment or cables: used to transmit or receive information. (Random House Unabridged Dictionary, 2d ed)

Pan troglodytes: The common chimpanzee, a species of the genus Pan, family HOMINIDAE. It lives in Africa, primarily in the tropical rainforests. There are a number of recognized subspecies.

Tetraodontiformes: A small order of primarily marine fish containing 340 species. Most have a rotund or box-like shape. TETRODOTOXIN is found in their liver and ovaries.

Kinetics: The rate dynamics in chemical or physical systems.

Polymerase Chain Reaction: In vitro method for producing large amounts of specific DNA or RNA fragments of defined length and sequence from small amounts of short oligonucleotide flanking sequences (primers). The essential steps include thermal denaturation of the double-stranded target molecules, annealing of the primers to their complementary sequences, and extension of the annealed primers by enzymatic synthesis with DNA polymerase. The reaction is efficient, specific, and extremely sensitive. Uses for the reaction include disease diagnosis, detection of difficult-to-isolate pathogens, mutation analysis, genetic testing, DNA sequencing, and analyzing evolutionary relationships.

RNA, Ribosomal: The most abundant form of RNA.
Together with proteins, it forms the ribosomes, playing a structural role and also a role in ribosomal binding of mRNA and tRNAs. Individual chains are conventionally designated by their sedimentation coefficients. In eukaryotes, four large chains exist, synthesized in the nucleolus and constituting about 50% of the ribosome. (Dorland, 28th ed)

Genes, Overlapping: Genes whose nucleotide sequences overlap to some degree. The overlapped sequences may involve structural or regulatory genes of eukaryotic or prokaryotic cells.

Enzyme Stability: The extent to which an enzyme retains its structural conformation or its activity when subjected to storage, isolation, and purification or various other physical or chemical manipulations, including proteolytic enzymes and heat.

Nucleotide Motifs: Commonly observed BASE SEQUENCE or nucleotide structural components which can be represented by a CONSENSUS SEQUENCE or a SEQUENCE LOGO.

Amino Acids: Organic compounds that generally contain an amino (-NH2) and a carboxyl (-COOH) group. Twenty alpha-amino acids are the subunits which are polymerized to form proteins.

Plant Proteins: Proteins found in plants (flowers, herbs, shrubs, trees, etc.). The concept does not include proteins found in vegetables for which VEGETABLE PROTEINS is available.

Exons: The parts of a transcript of a split GENE remaining after the INTRONS are removed.
They are spliced together to become a MESSENGER RNA or other functional RNA.

Biological Evolution: The process of cumulative change over successive generations through which organisms acquire their distinguishing morphological and physiological characteristics.

Gene Expression Profiling: The determination of the pattern of genes expressed at the level of GENETIC TRANSCRIPTION, under specific circumstances or in a specific cell.

Protein Engineering: Procedures by which protein structure and function are changed or created in vitro by altering existing or synthesizing new structural genes that direct the synthesis of proteins with sought-after properties. Such procedures may include the design of MOLECULAR MODELS of proteins using COMPUTER GRAPHICS or other molecular modeling techniques; site-specific mutagenesis (MUTAGENESIS, SITE-SPECIFIC) of existing genes; and DIRECTED MOLECULAR EVOLUTION techniques to create new genes.

Hypermedia: Computerized compilations of information units (text, sound, graphics, and/or video) interconnected by logical nonlinear linkages that enable users to follow optimal paths through the material and also the systems used to create and display this information. (From Thesaurus of ERIC Descriptors, 1994)

Molecular Sequence Annotation: The addition of descriptive information about the function or structure of a molecular sequence to its MOLECULAR SEQUENCE DATA record.

Codon: A set of three nucleotides in a protein coding sequence that specifies individual amino acids or a termination signal (CODON, TERMINATOR). Most codons are universal, but some organisms do not produce the transfer RNAs (RNA, TRANSFER) complementary to all codons.
These codons are referred to as unassigned codons (CODONS, NONSENSE).

Protein Structure, Quaternary: The characteristic 3-dimensional shape and arrangement of multimeric proteins (aggregates of more than one polypeptide chain).

DNA, Intergenic: Any of the DNA in between gene-coding DNA, including untranslated regions, 5' and 3' flanking regions, INTRONS, non-functional pseudogenes, and non-functional repetitive sequences. This DNA may or may not encode regulatory functions.

High-Throughput Nucleotide Sequencing: Techniques of nucleotide sequence analysis that increase the range, complexity, sensitivity, and accuracy of results by greatly increasing the scale of operations and thus the number of nucleotides, and the number of copies of each nucleotide sequenced. The sequencing may be done by analysis of the synthesis or ligation products, hybridization to preexisting sequences, etc.

DNA, Bacterial: Deoxyribonucleic acid that makes up the genetic material of bacteria.

Genes, Bacterial: The functional hereditary units of BACTERIA.

Bacteria: One of the three domains of life (the others being Eukarya and ARCHAEA), also called Eubacteria. They are unicellular prokaryotic microorganisms which generally possess rigid cell walls, multiply by cell division, and exhibit three principal forms: round or coccal, rodlike or bacillary, and spiral or spirochetal. Bacteria can be classified by their response to OXYGEN: aerobic, anaerobic, or facultatively anaerobic; by the mode by which they obtain their energy: chemotrophy (via chemical reaction) or PHOTOTROPHY (via light reaction); for chemotrophs by their source of chemical energy: CHEMOLITHOTROPHY (from inorganic compounds) or chemoorganotrophy (from organic compounds); and by their source for CARBON; NITROGEN; etc.; HETEROTROPHY (from organic sources) or AUTOTROPHY (from CARBON DIOXIDE).
They can also be classified by whether or not they stain (based on the structure of their CELL WALLS) with CRYSTAL VIOLET dye: gram-negative or gram-positive.

Gene Library: A large collection of DNA fragments cloned (CLONING, MOLECULAR) from a given organism, tissue, organ, or cell type. It may contain complete genomic sequences (GENOMIC LIBRARY) or complementary DNA sequences, the latter being formed from messenger RNA and lacking intron sequences.

Knee Joint: A synovial hinge connection formed between the bones of the FEMUR; TIBIA; and PATELLA.

Mathematical Computing: Computer-assisted interpretation and analysis of various mathematical functions related to a particular problem.

Automation: Controlled operation of an apparatus, process, or system by mechanical or electronic devices that take the place of human organs of observation, effort, and decision. (From Webster's Collegiate Dictionary, 1993)

Genome, Plant: The genetic complement of a plant (PLANTS) as represented in its DNA.

Systems Integration: The procedures involved in combining separately developed modules, components, or subsystems so that they work together as a complete system. (From McGraw-Hill Dictionary of Scientific and Technical Terms, 4th ed)

Circular Dichroism: A change from planar to elliptic polarization when an initially plane-polarized light wave traverses an optically active medium. (McGraw-Hill Dictionary of Scientific and Technical Terms, 4th ed)

Neural Networks (Computer): A computer architecture, implementable in either hardware or software, modeled after biological neural networks. Like the biological system in which the processing capability is a result of the interconnection strengths between arrays of nonlinear processing nodes, computerized neural networks, often called perceptrons or multilayer connectionist models, consist of neuron-like units. A homogeneous group of units makes up a layer. These networks are good at pattern recognition.
They are adaptive, performing tasks by example, and thus are better for decision-making than are linear learning machines or cluster analysis. They do not require explicit programming.

Membrane Proteins: Proteins which are found in membranes including cellular and intracellular membranes. They consist of two types, peripheral and integral proteins. They include most membrane-associated enzymes, antigenic proteins, transport proteins, and drug, hormone, and lectin receptors.

Hydrogen Bonding: A low-energy attractive force between hydrogen and another element. It plays a major role in determining the properties of water, proteins, and other compounds.

Introns: Sequences of DNA in the genes that are located between the EXONS. They are transcribed along with the exons but are removed from the primary gene transcript by RNA SPLICING to leave mature RNA. Some introns code for separate genes.

Evaluation Studies as Topic: Studies determining the effectiveness or value of processes, personnel, and equipment, or the material on conducting such studies. For drugs and devices, CLINICAL TRIALS AS TOPIC; DRUG EVALUATION; and DRUG EVALUATION, PRECLINICAL are available.

Escherichia coli Proteins: Proteins obtained from ESCHERICHIA COLI.

HMG-Box Domains: DNA-binding domains present in proteins of the HMG-box superfamily including the archetypal HMGB PROTEINS, a number of sequence specific TRANSCRIPTION FACTORS, and other DNA-BINDING PROTEINS. The domains consist of 70-80 amino acids that form an L-shaped fold from three alpha-helical segments. The domain has the capacity to recognize and/or induce specific DNA structures and affect the accessibility of the DNA to other proteins involved in transcription, recombination, or DNA repair.
(Note that not all HIGH MOBILITY GROUP PROTEINS contain this domain.)Cattle: Domesticated bovine animals of the genus Bos, usually kept on a farm or ranch and used for the production of meat or dairy products or for heavy labor.Eukaryotic Cells: Cells of the higher organisms, containing a true nucleus bounded by a nuclear membrane.Recombination, Genetic: Production of new arrangements of DNA by various mechanisms such as assortment and segregation, CROSSING OVER; GENE CONVERSION; GENETIC TRANSFORMATION; GENETIC CONJUGATION; GENETIC TRANSDUCTION; or mixed infection of viruses.DNA Mutational Analysis: Biochemical identification of mutational changes in a nucleotide sequence.Oryza sativa: Annual cereal grass of the family POACEAE and its edible starchy grain, rice, which is the staple food of roughly one-half of the world's population.RNA, Bacterial: Ribonucleic acid in bacteria having regulatory and catalytic roles as well as involvement in protein synthesis.Mammals: Warm-blooded vertebrate animals belonging to the class Mammalia, including all that possess hair and suckle their young.Nucleotides: The monomeric units from which DNA or RNA polymers are constructed. They consist of a purine or pyrimidine base, a pentose sugar, and a phosphate group. (From King & Stansfield, A Dictionary of Genetics, 4th ed)DNA Barcoding, Taxonomic: Techniques for standardizing and expediting taxonomic identification or classification of organisms that are based on deciphering the sequence of one or a few regions of DNA known as the "DNA barcode".Saccharomyces cerevisiae: A species of the genus SACCHAROMYCES, family Saccharomycetaceae, order Saccharomycetales, known as "baker's" or "brewer's" yeast. The dried form is used as a dietary supplement.Viral Proteins: Proteins found in any species of virus.Solanaceae: A plant family of the order Solanales, subclass Asteridae. 
Among the most important are POTATOES; TOMATOES; CAPSICUM (green and red peppers); TOBACCO; and BELLADONNA.Peptides: Members of the class of compounds composed of AMINO ACIDS joined together by peptide bonds between adjacent amino acids into linear, branched or cyclical structures. OLIGOPEPTIDES are composed of approximately 2-12 amino acids. Polypeptides are composed of approximately 13 or more amino acids. PROTEINS are linear polypeptides that are normally synthesized on RIBOSOMES.Synteny: The presence of two or more genetic loci on the same chromosome. Extensions of this original definition refer to the similarity in content and organization between chromosomes, of different species for example.Sequence Deletion: Deletion of sequences of nucleic acids from the genetic material of an individual.Models, Theoretical: Theoretical representations that simulate the behavior or activity of systems, processes, or phenomena. They include the use of mathematical equations, computers, and other electronic equipment.Documentation: Systematic organization, storage, retrieval, and dissemination of specialized information, especially of a scientific or technical nature (From ALA Glossary of Library and Information Science, 1983). It often involves authenticating or validating information.Mutagenesis: Process of generating a genetic MUTATION. It may occur spontaneously or be induced by MUTAGENS.National Library of Medicine (U.S.): An agency of the NATIONAL INSTITUTES OF HEALTH concerned with overall planning, promoting, and administering programs pertaining to advancement of medical and related sciences. Major activities of this institute include the collection, dissemination, and exchange of information important to the progress of medicine and health, research in medical informatics and support for medical library development.Mutagenesis, Insertional: Mutagenesis where the mutation is caused by the introduction of foreign DNA sequences into a gene or extragenic sequence. 
This may occur spontaneously in vivo or be experimentally induced in vivo or in vitro. Proviral DNA insertions into or adjacent to a cellular proto-oncogene can interrupt GENETIC TRANSLATION of the coding sequences or interfere with recognition of regulatory elements and cause unregulated expression of the proto-oncogene resulting in tumor formation.Histidine: An essential amino acid that is required for the production of HISTAMINE.DNA, Ribosomal: DNA sequences encoding RIBOSOMAL RNA and the segments of DNA separating the individual ribosomal RNA genes, referred to as RIBOSOMAL SPACER DNA.Ligands: A molecule that binds to another molecule, used especially to refer to a small molecule that binds specifically to a larger molecule, e.g., an antigen binding to an antibody, a hormone or neurotransmitter binding to a receptor, or a substrate or allosteric effector binding to an enzyme. Ligands are also molecules that donate or accept a pair of electrons to form a coordinate covalent bond with the central metal atom of a coordination complex. (From Dorland, 27th ed)Archaeal Proteins: Proteins found in any species of archaeon.Carrier Proteins: Transport proteins that carry specific substances in the blood or across cell membranes.Imaging, Three-Dimensional: The process of generating three-dimensional images by electronic, photographic, or other methods. 
For example, three-dimensional images can be generated by assembling multiple tomographic images with the aid of a computer, while photographic 3-D images (HOLOGRAPHY) can be made by exposing film to the interference pattern created when two laser light sources shine on an object.Molecular Conformation: The characteristic three-dimensional shape of a molecule.Prokaryotic Cells: Cells lacking a nuclear membrane so that the nuclear material is either scattered in the cytoplasm or collected in a nucleoid region.Pseudogenes: Genes bearing close resemblance to known genes at different loci, but rendered non-functional by additions or deletions in structure that prevent normal transcription or translation. When lacking introns and containing a poly-A segment near the downstream end (as a result of reverse copying from processed nuclear RNA into double-stranded DNA), they are called processed genes.DNA, Chloroplast: Deoxyribonucleic acid that makes up the genetic material of CHLOROPLASTS.Dimerization: The process by which two molecules of the same chemical composition form a condensation product or polymer.Surgery, Computer-Assisted: Surgical procedures conducted with the aid of computers. This is most frequently used in orthopedic and laparoscopic surgery for implant placement and instrument guidance. Image-guided surgery interactively combines prior CT scans or MRI images with real-time video.Enzymes: Biological molecules that possess catalytic activity. They may occur naturally or be synthetically created. Enzymes are usually proteins, however CATALYTIC RNA and CATALYTIC DNA molecules have also been identified.Models, Biological: Theoretical representations that simulate the behavior or activity of biological processes or diseases. For disease models in living animals, DISEASE MODELS, ANIMAL is available. 
Biological models include the use of mathematical equations, computers, and other electronic equipment.Classification: The systematic arrangement of entities in any field into categories or classes based on common characteristics such as properties, morphology, subject matter, etc.Transcription Factors: Endogenous substances, usually proteins, which are effective in the initiation, stimulation, or termination of the genetic transcription process.RNA, Ribosomal, 16S: Constituent of 30S subunit prokaryotic ribosomes containing 1600 nucleotides and 21 proteins. 16S rRNA is involved in initiation of polypeptide synthesis.Repetitive Sequences, Nucleic Acid: Sequences of DNA or RNA that occur in multiple copies. There are several types: INTERSPERSED REPETITIVE SEQUENCES are copies of transposable elements (DNA TRANSPOSABLE ELEMENTS or RETROELEMENTS) dispersed throughout the genome. TERMINAL REPEAT SEQUENCES flank both ends of another sequence, for example, the long terminal repeats (LTRs) on RETROVIRUSES. Variations may be direct repeats, those occurring in the same direction, or inverted repeats, those opposite to each other in direction. TANDEM REPEAT SEQUENCES are copies which lie adjacent to each other, direct or inverted (INVERTED REPEAT SEQUENCES).Static Electricity: The accumulation of an electric charge on an object.Models, Structural: A representation, generally small in scale, to show the structure, construction, or appearance of something. (From Random House Unabridged Dictionary, 2d ed)Protein Interaction Mapping: Methods for determining interaction between PROTEINS.Thermodynamics: A rigorously mathematical analysis of energy relationships (heat, work, temperature, and equilibrium). It describes systems whose states are determined by thermal parameters, such as temperature, in addition to mechanical and electromagnetic parameters. 
(From Hawley's Condensed Chemical Dictionary, 12th ed)Peptide Fragments: Partial proteins formed by partial hydrolysis of complete proteins or generated through PROTEIN ENGINEERING techniques.Plants: Multicellular, eukaryotic life forms of kingdom Plantae (sensu lato), comprising the VIRIDIPLANTAE; RHODOPHYTA; and GLAUCOPHYTA; all of which acquired chloroplasts by direct endosymbiosis of CYANOBACTERIA. They are characterized by a mainly photosynthetic mode of nutrition; essentially unlimited growth at localized regions of cell divisions (MERISTEMS); cellulose within cells providing rigidity; the absence of organs of locomotion; absence of nervous and sensory systems; and an alternation of haploid and diploid generations.Tibia: The second longest bone of the skeleton. It is located on the medial side of the lower leg, articulating with the FIBULA laterally, the TALUS distally, and the FEMUR proximally.Cysteine: A thiol-containing non-essential amino acid that is oxidized to form CYSTINE.Methanococcus: A genus of anaerobic coccoid METHANOCOCCACEAE whose organisms are motile by means of polar tufts of flagella. These methanogens are found in salt marshes, marine and estuarine sediments, and the intestinal tract of animals.Base Composition: The relative amounts of the PURINES and PYRIMIDINES in a nucleic acid.Molecular Structure: The location of the atoms, groups or ions relative to one another in a molecule, as well as the number, type and location of covalent bonds.RNA, Messenger: RNA sequences that serve as templates for protein synthesis. Bacterial mRNAs are generally primary transcripts in that they do not require post-transcriptional processing. Eukaryotic mRNA is synthesized in the nucleus and must be exported to the cytoplasm for translation. Most eukaryotic mRNAs have a sequence of polyadenylic acid at the 3' end, referred to as the poly(A) tail. 
The function of this tail is not known for certain, but it may play a role in the export of mature mRNA from the nucleus as well as in helping stabilize some mRNA molecules by retarding their degradation in the cytoplasm.Time Factors: Elements of limited time intervals, contributing to particular results or situations.

Intracellular signalling: PDK1--a kinase at the hub of things. (1/38700)

Phosphoinositide-dependent kinase 1 (PDK1) is at the hub of many signalling pathways, activating PKB and PKC isoenzymes, as well as p70 S6 kinase and perhaps PKA. PDK1 action is determined by colocalization with substrate and by target site availability, features that may enable it to operate in both resting and stimulated cells.

Molecular phylogeny of the ETS gene family. (2/38700)

We have constructed a molecular phylogeny of the ETS gene family. By distance and parsimony analysis of the ETS conserved domains we show that the family containing so far 29 different genes in vertebrates can be divided into 13 groups of genes namely ETS, ER71, GABP, PEA3, ERG, ERF, ELK, DETS4, ELF, ESE, TEL, YAN, SPI. Since the three dimensional structure of the ETS domain has revealed a similarity with the winged-helix-turn-helix proteins, we used two of them (CAP and HSF) to root the tree. This allowed us to show that the family can be divided into five subfamilies: ETS, DETS4, ELF, TEL and SPI. The ETS subfamily comprises the ETS, ER71, GABP, PEA3, ERG, ERF and the ELK groups which appear more related to each other than to any other ETS family members. The fact that some members of these subfamilies were identified in early metazoans such as diploblasts and sponges suggests that the diversification of ETS family genes predates the diversification of metazoans. By the combined analysis of both the ETS and the PNT domains, which are conserved in some members of the family, we showed that the GABP group, and not the ERG group, is the one most closely related to the ETS group. We also observed that the speed of accumulation of mutations in the various genes of the family is highly variable. Noticeably, paralogous members of the ELK group exhibit strikingly different evolutionary speed suggesting that the evolutionary pressure they support is very different.

Crystal structure of MHC class II-associated p41 Ii fragment bound to cathepsin L reveals the structural basis for differentiation between cathepsins L and S. (3/38700)

The lysosomal cysteine proteases cathepsins S and L play crucial roles in the degradation of the invariant chain during maturation of MHC class II molecules and antigen processing. The p41 form of the invariant chain includes a fragment which specifically inhibits cathepsin L but not S. The crystal structure of the p41 fragment, a homologue of the thyroglobulin type-1 domains, has been determined at 2.0 A resolution in complex with cathepsin L. The structure of the p41 fragment demonstrates a novel fold, consisting of two subdomains, each stabilized by disulfide bridges. The first subdomain is an alpha-helix-beta-strand arrangement, whereas the second subdomain has a predominantly beta-strand arrangement. The wedge shape and three-loop arrangement of the p41 fragment bound to the active site cleft of cathepsin L are reminiscent of the inhibitory edge of cystatins, thus demonstrating the first example of convergent evolution observed in cysteine protease inhibitors. However, the different fold of the p41 fragment results in additional contacts with the top of the R-domain of the enzymes, which defines the specificity-determining S2 and S1' substrate-binding sites. This enables inhibitors based on the thyroglobulin type-1 domain fold, in contrast to the rather non-selective cystatins, to exhibit specificity for their target enzymes.

A single membrane-embedded negative charge is critical for recognizing positively charged drugs by the Escherichia coli multidrug resistance protein MdfA. (4/38700)

The nature of the broad substrate specificity phenomenon, as manifested by multidrug resistance proteins, is not yet understood. In the Escherichia coli multidrug transporter, MdfA, the hydrophobicity profile and PhoA fusion analysis have so far identified only one membrane-embedded charged amino acid residue (E26). In order to determine whether this negatively charged residue may play a role in multidrug recognition, we evaluated the expression and function of MdfA constructs mutated at this position. Replacing E26 with the positively charged residue lysine abolished the multidrug resistance activity against positively charged drugs, but retained chloramphenicol efflux and resistance. In contrast, when the negative charge was preserved in a mutant with aspartate instead of E26, chloramphenicol recognition and transport were drastically inhibited; however, the mutant exhibited almost wild-type multidrug resistance activity against lipophilic cations. These results suggest that although the negative charge at position 26 is not essential for active transport, it dictates the multidrug resistance character of MdfA. We show that such a negative charge is also found in other drug resistance transporters, and its possible significance regarding multidrug resistance is discussed.

Anopheles gambiae Ag-STAT, a new insect member of the STAT family, is activated in response to bacterial infection. (5/38700)

A new insect member of the STAT family of transcription factors (Ag-STAT) has been cloned from the human malaria vector Anopheles gambiae. The domain involved in DNA interaction and the SH2 domain are well conserved. Ag-STAT is most similar to Drosophila D-STAT and to vertebrate STATs 5 and 6, constituting a proposed ancient class A of the STAT family. The mRNA is expressed at all developmental stages, and the protein is present in hemocytes, pericardial cells, midgut, skeletal muscle and fat body cells. There is no evidence of transcriptional activation following bacterial challenge. However, bacterial challenge results in nuclear translocation of Ag-STAT protein in fat body cells and induction of DNA-binding activity that recognizes a STAT target site. In vitro treatment with pervanadate (vanadate and H2O2) translocates Ag-STAT to the nucleus in midgut epithelial cells. This is the first evidence of direct participation of the STAT pathway in immune responses in insects.

Assembly requirements of PU.1-Pip (IRF-4) activator complexes: inhibiting function in vivo using fused dimers. (6/38700)

Gene expression in higher eukaryotes appears to be regulated by specific combinations of transcription factors binding to regulatory sequences. The Ets factor PU.1 and the IRF protein Pip (IRF-4) represent a pair of interacting transcription factors implicated in regulating B cell-specific gene expression. Pip is recruited to its binding site on DNA by phosphorylated PU.1. PU.1-Pip interaction is shown to be template directed and involves two distinct protein-protein interaction surfaces: (i) the ets and IRF DNA-binding domains; and (ii) the phosphorylated PEST region of PU.1 and a lysine-requiring putative alpha-helix in Pip. Thus, a coordinated set of protein-protein and protein-DNA contacts are essential for PU.1-Pip ternary complex assembly. To analyze the function of these factors in vivo, we engineered chimeric repressors containing the ets and IRF DNA-binding domains connected by a flexible POU domain linker. When stably expressed, the wild-type fused dimer strongly repressed the expression of a rearranged immunoglobulin lambda gene, thereby establishing the functional importance of PU.1-Pip complexes in B cell gene expression. Comparative analysis of the wild-type dimer with a series of mutant dimers distinguished a gene regulated by PU.1 and Pip from one regulated by PU.1 alone. This strategy should prove generally useful in analyzing the function of interacting transcription factors in vivo, and for identifying novel genes regulated by such complexes.

Analysis of two cosmid clones from chromosome 4 of Drosophila melanogaster reveals two new genes amid an unusual arrangement of repeated sequences. (7/38700)

Chromosome 4 from Drosophila melanogaster has several unusual features that distinguish it from the other chromosomes. These include a diffuse appearance in salivary gland polytene chromosomes, an absence of recombination, and the variegated expression of P-element transgenes. As part of a larger project to understand these properties, we are assembling a physical map of this chromosome. Here we report the sequence of two cosmids representing approximately 5% of the polytenized region. Both cosmid clones contain numerous repeated DNA sequences, as identified by cross hybridization with labeled genomic DNA, BLAST searches, and dot matrix analysis, which are positioned between and within the transcribed sequences. The repetitive sequences include three copies of the mobile element Hoppel, one copy of the mobile element HB, and 18 DINE repeats. DINE is a novel, short repeated sequence dispersed throughout both cosmid sequences. One cosmid includes the previously described cubitus interruptus (ci) gene and two new genes: a gene with a predicted amino acid sequence similar to ribosomal protein S3a, which is consistent with the Minute(4)101 locus thought to be in the region, and a novel member of the protein family that includes plexin and met-hepatocyte growth factor receptor. The other cosmid contains only the two short 5'-most exons from the zinc-finger-homolog-2 (zfh-2) gene. This is the first extensive sequence analysis of noncoding DNA from chromosome 4. The distribution of the various repeats suggests its organization is similar to the beta-heterochromatic regions near the base of the major chromosome arms. Such a pattern may account for the diffuse banding of the polytene chromosome 4 and the variegation of many P-element transgenes on the chromosome.

The mouse Aire gene: comparative genomic sequencing, gene organization, and expression. (8/38700)

Mutations in the human AIRE gene (hAIRE) result in the development of an autoimmune disease named APECED (autoimmune polyendocrinopathy candidiasis ectodermal dystrophy; OMIM 240300). Previously, we have cloned hAIRE and shown that it codes for a putative transcription-associated factor. Here we report the cloning and characterization of Aire, the murine ortholog of hAIRE. Comparative genomic sequencing revealed that the structure of the AIRE gene is highly conserved between human and mouse. The conceptual proteins share 73% homology and feature the same typical functional domains in both species. RT-PCR analysis detected three splice variant isoforms in various mouse tissues, and interestingly one isoform was conserved in human, suggesting potential biological relevance of this product. In situ hybridization on mouse and human histological sections showed that AIRE expression pattern was mainly restricted to a few cells in the thymus, calling for a tissue-specific function of the gene product.

*Ancestral reconstruction

For example, PAML is a collection of programs for the phylogenetic analysis of DNA and protein sequence alignments by maximum ... These states include the genetic sequence (ancestral sequence reconstruction), the amino acid sequence of a protein, the ... Since modern genetic sequences are essentially a variation of ancient ones, access to ancient sequences may identify other ... This allowed them to hypothesize a phylogeny for the sequences, and to infer that the standard sequence was probably also the ...

*Molecular phylogenetics

Bast, F. (2013). "Sequence Similarity Search, Multiple Sequence Alignment, Model Selection, Distance Matrix and Phylogeny ... The most common approach is the comparison of homologous sequences for genes using sequence alignment techniques to identify ... Modern sequence comparison techniques overcome this objection by the use of multiple sequences. Once the divergences between ... At any location within such a sequence, the bases found in a given position may vary between organisms. The particular sequence ...

*Structural alignment software

All Atoms Alignment; SSE -- Secondary Structure Elements Alignment; Seq -- Sequence-based alignment Pair -- Pairwise Alignment ... Inverse alignments, C α only models, Alternative alignments, and Non-sequential alignments". BMC Bioinformatics. 14 (24). doi: ... Brown, P.; Pullan W.; Yang Y.; Zhou Y. (Oct 2015). "Fast and accurate non-sequential protein structure alignment using a new ... This list of structural comparison and alignment software is a compilation of software tools and web portals used in pairwise ...

*Sequence alignment

Multiple sequence alignment is an extension of pairwise alignment to incorporate more than two sequences at a time. Multiple ... Sequence homology Sequence mining BLAST String searching algorithm Alignment-free sequence analysis UGENE Needleman-Wunsch ... alignments of two query sequences. Pairwise alignments can only be used between two sequences at a time, but they are efficient ... the short sequence should be globally aligned but only a local alignment is desired for the long sequence. Pairwise sequence ...
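The snippet above names global pairwise alignment and the Needleman-Wunsch algorithm without showing either. As a minimal sketch, the dynamic-programming recurrence behind a global alignment score could look like the following. The function name and the match, mismatch and gap values are assumptions chosen for illustration, not parameters taken from any tool listed here, and a linear gap penalty is assumed.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score for a global pairwise alignment.

    Illustrative scoring values (assumed): +1 match, -1 mismatch,
    -2 per gap position (linear gap penalty).
    """
    # Row 0: aligning the empty prefix of `a` with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with the empty prefix of `b` costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap in b.
            up = prev[j] + gap
            # Align b[j-1] with a gap in a.
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    # Bottom-right cell: best score over the full length of both sequences.
    return prev[-1]
```

Only two rows of the table are kept, so the sketch returns the score but not the alignment itself; recovering the aligned strings would need the full table and a traceback. A local (Smith-Waterman style) variant would additionally clamp each cell at zero and take the maximum over all cells.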

*Multiple sequence alignment

... tree alignment Phylogenetics Sequence alignment software Multiple sequence alignment viewers Structural alignment Alignment- ... A multiple sequence alignment (MSA) is a sequence alignment of three or more biological sequences, generally protein, DNA, or ... Multiple sequence alignment also refers to the process of aligning such a sequence set. Because three or more sequences of ... Multiple sequence alignment viewers enable alignments to be visually reviewed, often by inspecting the quality of alignment for ...

*List of sequence alignment software

This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment ... Sequence type: protein or nucleotide *Sequence type: protein or nucleotide **Alignment type: local or global *Sequence type: ... Alignment type: local or global *Sequence type: protein or nucleotide *Sequence type: protein or nucleotide Please see List of ... and multiple sequence alignment. See structural alignment software for structural alignment of proteins. * ...

*Alignment-free sequence analysis

... but when the sequences are divergent, a reliable alignment cannot be obtained and hence the applications of sequence alignment ... alignment-free sequence analysis approaches to molecular sequence and structure data provide alternatives over alignment-based ... pairwise or multiple sequence alignment. Alignment-based approaches generally give excellent results when the sequences under ... Sequence analysis Multiple sequence alignment Phylogenomics Bioinformatics Metagenomics Next-generation sequencing Population ...
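One common family of alignment-free approaches compares sequences by their k-mer (word) composition instead of aligning them. The sketch below is one possible illustration under that assumption: the function names, the default k = 3 and the choice of Euclidean distance are hypothetical and are not drawn from the snippet above.

```python
from collections import Counter
import math


def kmer_counts(seq, k=3):
    """Count every overlapping k-mer (word of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """Euclidean distance between the k-mer count profiles of a and b.

    No alignment is computed; a Counter returns 0 for words that are
    absent from one of the two sequences.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return math.sqrt(sum((ca[w] - cb[w]) ** 2 for w in set(ca) | set(cb)))
```

Identical sequences get a distance of 0.0, and sequences with no shared words get the largest distances. Raw counts favor sequences of similar length, so a practical measure would probably normalize the counts to frequencies first.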

*SH3D21

"Sequence Alignment". ALIGN. Retrieved 8 May 2013. ... Sequence identity was calculated using available sequence data ... In humans, these SH3 domains have a common amino acid sequence Asp-Glu-Leu. This sequence motif is also conserved in other ...

*Point accepted mutation

Point mutation Sequence alignment Margaret Dayhoff Molecular clock BLOSUM BLAST Campbell NA, Reece JB, Meyers N; Urry LA; Cain ... In bioinformatics, PAM matrices are regularly used as substitution matrices to score sequence alignments for proteins. Each ... Pevsner J (2009). "Pairwise Sequence Alignment". Bioinformatics and Functional Genomics (2nd ed.). Wiley-Blackwell. pp. 58-68. ... are also used as a scoring matrix when comparing DNA sequences or protein sequences to judge the quality of the alignment. This ...
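To make the idea of scoring an alignment with a substitution matrix concrete, here is a small sketch. The scores in `TOY_MATRIX` are invented for illustration and are not real PAM values; the function name and the gap penalty of -4 are likewise assumptions.

```python
# Toy substitution scores for three residues. These numbers are made up
# for illustration and are NOT taken from any published PAM matrix.
TOY_MATRIX = {
    ("A", "A"): 2, ("G", "G"): 5, ("W", "W"): 17,
    ("A", "G"): 1, ("A", "W"): -6, ("G", "W"): -7,
}


def alignment_score(row_a, row_b, matrix=TOY_MATRIX, gap=-4):
    """Sum substitution scores over the columns of a gapped pairwise alignment.

    row_a and row_b are equal-length strings in which '-' marks a gap.
    The matrix is treated as symmetric, so (x, y) and (y, x) score the same.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap  # assumed flat penalty per gapped column
        elif (x, y) in matrix:
            total += matrix[(x, y)]
        else:
            total += matrix[(y, x)]
    return total
```

With these toy values, `alignment_score("A-W", "AGW")` adds 2 for the A/A column, -4 for the gap and 17 for the W/W column. Real analyses would load a published matrix rather than hand-written scores.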

*Inferring horizontal gene transfer

An Appraisal of Benchmarks for Multiple Sequence Alignment". Multiple Sequence Alignment Methods. Methods in molecular biology ... These tests assess the likelihood of the gene sequence alignment when the reference topology is given as the null hypothesis. ... Given simulated sequences which have HGT, analysis of those sequences using the methods of interest and comparison of their ... The donor sequences are inserted into the host unchanged or can be further evolved by simulation, e.g., using the tools ...

*C10orf76

"Sib Dotlet Sequence Alignment". Retrieved 13 May 2013. Pandey NB, Marzluff WF (Dec 1987). "The stem-loop structure at the 3' ... The following table illustrates the sequence similarity between human c10orf76 protein and various orthologs. Similar sequences ... "PredictProtein - Sequence Analysis, Structure and Function Prediction". Retrieved 18 April 2013. Lupas A, Van Dyke M, Stock J ( ... There are ten conserved potential phosphorylation sites within the protein sequence. Also, there are nine residues that are ...

*Clustal

Sequence alignment software DNASTAR Sequence mining T-Coffee Align-m DIALIGN-T DIALIGN-TX JAligner MAFFT MAVID MUSCLE ProbCons ... Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap ... All variants of Clustal align sequences by three main steps: Do a pairwise alignment Create a guide tree (or use a user-defined ... Multiple Sequence Alignment Methods. Methods in Molecular Biology. Humana Press. pp. 105-116. doi:10.1007/978-1-62703-646-7_6. ...

*BAli-Phy

Output alignments include homology information for sequences at internal nodes of the tree. Sequence alignment software ... BAli-Phy is a free software program for simultaneously estimating a multiple sequence alignment and its phylogenetic tree. BAli ... BAli-Phy takes alignment uncertainty into account while estimating the phylogeny by averaging over possible alignments. Unlike ... Alignment uncertainty stems from two main sources: near-optimal alignments and evolutionary parameter uncertainty. Evolutionary ...

*PANDIT (database)

Phylogeny Sequence alignment Whelan, Simon; de Bakker Paul I W; Quevillon Emmanuel; Rodriguez Nicolas; Goldman Nick (Jan 2006 ... PANDIT is a database of multiple sequence alignments and phylogenetic trees covering many common protein domains. ...

*TMEM229B

See multiple sequence alignment below. Annotated diagram of the TMEM229b gene (with its 3 exons), mature mRNA and protein ... Expressed Sequence Tag mapping of TMEM229B gene expression indicates that it is ubiquitously expressed throughout the body. ... "NCBI Nucleotide BLAST". Basic Local Alignment Search. "EST profile: TMEM229B". UniGene. National Library of Medicine. ...

*MUSCLE (alignment software)

Sequence alignment software DNASTAR Clustal ProbCons AMAP T-COFFEE MAFFT Edgar RC (2004). "MUSCLE: multiple sequence alignment ... MUltiple Sequence Comparison by Log-Expectation (MUSCLE) is computer software for multiple sequence alignment of protein and ... "MUSCLE < Multiple Sequence Alignment < EMBL-EBI". Retrieved 1 September 2014. "Robert C. Edgar - Google Scholar Citations". ... The first paper, published in Nucleic Acids Research, introduced the sequence alignment algorithm. The second paper, published ...

*DECIPHER (software)

Sequence databases: import, maintain, view, and export sequences. Multiple sequence alignment: align sequences of DNA, RNA, or ... Sequence alignment software Wright ES (2015). "DECIPHER: harnessing local sequence context to improve protein multiple sequence ... Genome alignment: find and align the syntenic regions of multiple genomes. Oligonucleotide design: primer design for polymerase ... Manipulate sequences: trim low quality regions, correct frameshifts, reorient nucleotides, determine consensus, or digest with ...

*Ewan Birney

... publications indexed by Google Scholar Birney, Ewan (2000). Sequence alignment in bioinformatics (PhD thesis). ... Feb 2001). "Initial sequencing and analysis of the human genome". Nature. 409 (6822): 860-921. Bibcode:2001Natur.409..860L. doi ... He has played a role in annotating the genome sequences of the human, mouse, chicken and several other organisms. His research ... He wrote the first error tolerant, splice aware protein alignment program, used in the human and subsequent genome analysis; he ...

*MAFFT

In bioinformatics, MAFFT is a multiple sequence alignment program for amino acid or nucleotide sequences. The software is named ... Sequence alignment software Clustal Katoh, Kazutaka; Misawa, Kazuharu; Kuma, Kei-ichi; Miyata, Takashi (2002). "MAFFT: a novel ... improvement in accuracy of multiple sequence alignment". Nucleic Acids Research. 33 (2): 511-8. doi:10.1093/nar/gki198. PMC ... method for rapid multiple sequence alignment based on fast Fourier transform". Nucleic Acids Research. 30 (14): 3059-66. doi: ...

*Align-m

... multiple sequence alignment, include extra information to guide the sequence alignment, multiple structural alignment, homology ... combining many alignments into one consensus sequence, multiple genome alignment (can cope with rearrangements). Sequence ... combining sequence and structure alignment data, 'filtering' of BLAST or other pairwise alignments, ... Align-m is a multiple sequence alignment program written by Ivo Van Walle. Align-m has the ability to accomplish the following ...

*Richard M. Durbin

Birney, Ewan (2000). Sequence alignment in bioinformatics (PhD thesis). University of Cambridge. Holmes, Ian (1999). Studies in ... C. elegans Sequencing Consortium (1998). "Genome sequence of the nematode C. elegans: a platform for investigating biology". Science. ... More recently Durbin has returned to sequencing and has developed low coverage approaches to population genome sequencing, ... February 2001). "Initial sequencing and analysis of the human genome". Nature. 409 (6822): 860-921. doi:10.1038/35057062. ISSN ...

*2D gel analysis software

Sequence alignment software Biological data visualization. ...

*Lorraine Lisiecki

... for probabilistic sequence alignment of stratigraphic records. Paleoceanography software designed to find the optimal alignment ... "Probabilistic sequence alignment of stratigraphic records". Paleoceanography. 29. doi:10.1002/2014pa002713. "Match Software". ... In the LR04, there are higher resolution records, an improved alignment technique, and a higher percentage of records from the ... Lisiecki's Ph.D. thesis was titled "Paleoclimate time series: New alignment and compositing techniques, a 5.3-Myr benthic δ18O ...

*Circular permutation in proteins

Sequence-based algorithms require only the sequence of two proteins in order to create an alignment. Sequence methods are ... Traditional algorithms for sequence alignment and structure alignment are not able to detect circular permutations between ... Many sequence alignment and protein structure alignment algorithms have been developed assuming linear data representations and ... Zuker, M. (1991). "Suboptimal sequence alignment in molecular biology. Alignment with error analysis". Journal of Molecular ...

*Bacterial phylodynamics

Multiple sequence alignment algorithms can leave a large number of indels in the sequence alignment when the indels do not ... Multiple sequence alignment algorithms (e.g., MUSCLE, MAFFT, and CLUSTAL W) will align the data set with all selected sequences ... After running a multiple sequence alignment algorithm, manually editing the alignment is highly recommended. ... If the phylogenetic signal of an alignment is too low, then a longer alignment or an alignment of another gene in the organism may ...

*Baum-Welch algorithm

Bishop, Martin J.; Thompson, Elizabeth A. (20 July 1986). "Maximum likelihood alignment of DNA sequences". Journal of Molecular ... This is equivalent to the number of times state i is observed in the sequence from t = 1 to t = T − 1. b i ∗ ( v k ) = ∑ t = 1 ... The GENSCAN webserver is a gene locator capable of analyzing eukaryotic sequences up to one million base-pairs (1 Mbp) long. ... The feature is then compared to all sequences of the speech recognition units. These units could be phonemes, syllables, or ...

*LRRN3

A multiple sequence alignment has shown this very high conservation of the LRRN3 gene among many different species. All 12 of ... Multiple Sequence Alignment". "UniGene: Leucine-Rich Repeat Neuronal 3". "Allen Brain Atlas: Leucine-Rich ... The Ig and FN3 domains also show high conservation in all of the orthologous sequences for mammals and birds, but are not as ... The LINGO1 ectodomain also has a very long stretch of leucine-rich repeats which is the region that has the best alignment with ...
To address the need for an objective evaluation framework, we introduce a statistical score that assesses the quality of a given multiple sequence alignment. The quality assessment is based on counting the number of significantly conserved positions in the alignment using an importance sampling method in conjunction with a statistical profile analysis framework. We first evaluate a novel objective function used in the alignment quality score for measuring the positional conservation. The results for the Src homology 2 (SH2) domain, Ras-like proteins, peptidase M13, subtilase and β-lactamase families demonstrate that the score can distinguish sequence patterns with different degrees of conservation. Secondly, we evaluate the quality of the alignments produced by several widely used multiple sequence alignment programs using the novel alignment quality score and the commonly used sum-of-pairs method. According to these results, the MAFFT strategy L-INS-i outperforms the other methods, although the ...
CLUSTAL-W is currently one of the most popular automated multiple sequence alignment tools. CLUSTAL-W calculates a distance matrix for the sequences that are to be aligned. The distance matrix is then used to generate a phylogenetic tree that guides the series of global alignments needed to create the multiple alignment. This is referred to as progressive alignment. Multiple sequence alignments may also be created by hand and involve gapped or ungapped sequences. Typically, gapped alignments are used for full protein sequences, whereas ungapped alignments may be used to identify protein domains or motifs (see the BLOCKS database). Other multiple sequence alignment methods include DIALIGN, T-Coffee, and POA (Lassman and Sonnhammer, 2002). ...
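The distance-matrix and guide-tree steps described above can be illustrated with a minimal Python sketch. It is not CLUSTAL-W's actual procedure: the k-mer distance, the average-linkage clustering, and the function names `kmer_distance` and `guide_tree` are all simplifying assumptions chosen to keep the example short. The nested tuple it returns only shows the order in which a progressive aligner would merge sequences.

```python
from itertools import combinations


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """Fraction of distinct k-mers NOT shared by a and b (alignment-free stand-in)."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = ka | kb
    if not union:  # both sequences shorter than k
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def guide_tree(seqs: dict):
    """Average-linkage clustering; returns a nested tuple giving the merge order."""
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in seqs]
    dist = {frozenset(pair): kmer_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}

    def average(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the two closest clusters and merge them.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


example = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "TTTTGGGG"}
print(guide_tree(example))
```

In this toy input the two similar sequences are joined first, and the divergent one is added last, which is the order a progressive method would align them in.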
The Jalview hands-on training course is for anyone who works with sequence data and multiple sequence alignments from proteins, RNA and DNA. Register via the University of Cambridge website. Jalview is free software for protein and nucleic acid sequence alignment generation, visualisation and analysis. It includes sophisticated editing options and provides a range of analysis tools to investigate the structure and function of macromolecules through a multiple window interface. For example, Jalview supports 8 popular methods for multiple sequence alignment, prediction of protein secondary structure by JPred and disorder prediction by four methods. Jalview also has options to generate phylogenetic trees, and assess consensus and conservation across sequence families. Sequences, alignments and additional annotation can be accessed directly from public databases and journal-quality figures generated for publication. The course consists of a mixture of talks and hands-on exercises. Day 1 is an ...
This article introduces a new interface for T-Coffee, a consistency-based multiple sequence alignment program. This interface provides an easy and intuitive access to the most popular functionality of the package. These include the default T-Coffee mode for protein and nucleic acid sequences, the M-Coffee mode that allows combining the output of any other aligners, and template-based modes of T-Coffee that deliver high accuracy alignments while using structural or homology derived templates. These three available template modes are Expresso for the alignment of protein with a known 3D-Structure, R-Coffee to align RNA sequences with conserved secondary structures and PSI-Coffee to accurately align distantly related sequences using homology extension. The new server benefits from recent improvements of the T-Coffee algorithm and can align up to 150 sequences as long as 10,000 residues and is available from both http://www.tcoffee.org and its main mirror http://tcoffee.crg.cat.
article{abfc0cb4-89dc-418e-ba07-8622f92c12c9, abstract = {Multiple sequence alignment is the foundation of many important applications in bioinformatics that aim at detecting functionally important regions, predicting protein structures, building phylogenetic trees etc. Although the automatic construction of a multiple sequence alignment for a set of remotely related sequences is a very challenging and error-prone task, many downstream analyses still rely heavily on the accuracy of the alignments.}, articleno = {484}, author = {Ahola, Virpi and Aittokallio, Tero and Vihinen, Mauno and Uusipaikka, Esa}, issn = {1471-2105}, keyword = {Computational Biology: methods}, language = {eng}, publisher = {BioMed Central}, series = {BMC Bioinformatics}, title = {A statistical score for assessing the quality of multiple sequence alignments.}, url = {http://dx.doi.org/10.1186/1471-2105-7-484}, volume = {7}, year = {2006 ...
CombAlign is a new Python code that generates a gapped, multiple structure-based sequence alignment (MSSA) given a set of pairwise structure-based sequence alignments. CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related structures. The method for combining multiple pairwise alignments is straightforward, involving the recording of pre-computed residue-residue correspondences between positions on the reference protein and each compared structure, and insertion of non-redundant gaps, as needed, to reflect amino-acid deletions or structural divergence in the reference relative to one or more compared structures.. CombAlign is not intended for use in applications for which greater benefit would be provided using a multiple structure alignment as generated by the vast majority of open-source programs [20], nor does it propose to address matters of protein evolution or function ...
Scoring matrix for amino acid alignment. The BLOSUM62 matrix is adopted as a default scoring matrix, because this showed slightly higher accuracy values than the BLOSUM80, 45, JTT200PAM, 100PAM and Gonnet matrices in SABmark tests. Scoring matrix for nucleotide alignment. The default scoring matrix is derived from Kimura's two-parameter model. The ratio of transitions to transversions is set at 2 by default. Other parameters can be used, but have not yet been tested. Gap penalties for proteins. The default gap penalties for amino acid alignments have been changed in v.4.0. Note that the current version of MAFFT returns an entirely different alignment from v.<4.0. In v.4.0, two major gap penalties (--op [gap open penalty] and --ep [offset value, which functions like a gap extension penalty, see the mafft3 paper for definition]) were tuned by applying the FFT-NS-2 option to a part of the SABmark benchmark. We adopted the parameter set (--op 1.53 --ep 0.123) optimized for SABmark, because this ...
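As a small illustration of how the two gap parameters quoted above are passed on the command line, the following Python sketch builds the argument list for a MAFFT run. Only the `--op` and `--ep` flags and their values come from the text; the helper name `mafft_command` and the file name `input.fasta` are hypothetical, and actually running the command assumes a `mafft` executable is installed and on the PATH.

```python
import shlex


def mafft_command(fasta_path, op=1.53, ep=0.123):
    """Build the argument list for a MAFFT run with explicit gap parameters.

    --op is the gap open penalty and --ep the offset value described above.
    """
    return ["mafft", "--op", str(op), "--ep", str(ep), fasta_path]


# Show the equivalent shell command; MAFFT writes the alignment to stdout,
# so a real run would capture or redirect that output.
print(shlex.join(mafft_command("input.fasta")))
```

The list form can be handed to `subprocess.run(..., capture_output=True, text=True)` when MAFFT is available, which avoids shell quoting problems with unusual file names.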
DNA sequence alignment is a critical step in identifying homology between organisms. The most widely used alignment program, ClustalW, is known to suffer from the local minima problem, where suboptimal guide trees produce incorrect gap insertions. The optimization alignment approach has been shown to be effective in combining alignment and phylogenetic search in order to avoid the problems associated with poor guide trees. The optimization alignment algorithm operates at a small grain size, aligning each tree found, wasting time producing multiple sequence alignments for suboptimal trees. This research develops and analyzes a large grain size algorithm for optimization alignment that iterates through steps of alignment and phylogeny search, thus improving the quality of guide trees used for computation of multiple sequence alignments and eliminating computation of multiple sequence alignments for sub-optimal guide trees. Local minima are avoided by the use of stochastic search methods. Large Grain Size
Multiple sequence alignments (MSAs) are essential in most bioinformatics analyses that involve comparing homologous sequences. The exact way of computing an optimal alignment between N sequences has a computational complexity of O(L^N) for N sequences of length L, making it prohibitive for even small numbers of sequences. Most automatic methods are based on the progressive alignment heuristic (Hogeweg and Hesper, 1984), which aligns sequences in larger and larger subalignments, following the branching order in a guide tree. With a complexity of roughly O(N^2), this approach can routinely make alignments of a few thousand sequences of moderate length, but it is tough to make alignments much bigger than this. The progressive approach is a greedy algorithm where mistakes made at the initial alignment stages cannot be corrected later. To counteract this effect, the consistency principle was developed (Notredame et al., 2000). This has allowed the production of a new generation of more accurate ...
This list of structural comparison and alignment software is a compilation of software tools and web portals used in pairwise or multiple structural comparison and structural alignment. Key map: Class: Cα -- Backbone Atom (Cα) Alignment; AllA -- All Atoms Alignment; SSE -- Secondary Structure Elements Alignment; Seq -- Sequence-based alignment Pair -- Pairwise Alignment (2 structures *only*); Multi -- Multiple Structure Alignment (MStA); C-Map -- Contact Map Surf -- Connolly Molecular Surface Alignment SASA -- Solvent Accessible Surface Area Dihed -- Dihedral Backbone Angles PB -- Protein Blocks Flexible: No -- Only rigid-body transformations are considered between the structures being compared. Yes -- The method allows for some flexibility within the structures being compared, such as movements around hinge regions. Aung, Zeyar; Kian-Lee Tan (Dec 2006). "MatAlign: Precise protein structure comparison by matrix alignment". Journal of Bioinformatics and Computational Biology. 4 (6): 1197-216. ...
FSA is a probabilistic multiple sequence alignment algorithm which uses a distance-based approach to aligning homologous protein, RNA or DNA sequences.
TY - JOUR. T1 - High performance biological pairwise sequence alignment. T2 - FPGA versus GPU versus cell BE versus GPP. AU - Benkrid, Khaled. AU - Akoglu, Ali. AU - Ling, Cheng. AU - Song, Yang. AU - Liu, Ying. AU - Tian, Xiang. PY - 2012. Y1 - 2012. N2 - This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high performance efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM's Cell Broadband Engine (Cell BE), in the design and implementation of the widely-used Smith-Waterman pairwise sequence alignment algorithm, with general purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on the performance-per-watt criterion ...
Download MSAProbs: Multiple Sequence Alignment for free. One of the most accurate multiple protein sequence aligners. MSAProbs is an open-source protein multiple sequence alignment algorithm, achieving the statistically highest alignment accuracy on popular benchmarks: BALIBASE, PREFAB, SABMARK, OXBENCH, compared to ClustalW, MAFFT, MUSCLE, ProbCons and Probalign.
Currently contains parsers and datatypes for: clustalw2, clustalo, mlocarna, cmalign. Clustal tools are multiple sequence alignment tools for biological sequences like DNA, RNA and protein. For more information on Clustal tools refer to http://www.clustal.org/. Mlocarna is a multiple sequence alignment tool for RNA sequences with secondary structure output. For more information on mlocarna refer to http://www.bioinf.uni-freiburg.de/Software/LocARNA/. cmalign is a multiple sequence alignment program based on RNA family models and produces, among others, Clustal output. It is part of Infernal http://infernal.janelia.org/. 4 types of output are parsed. ...
Tburglin wrote: "Does anybody know of a good multiple sequence alignment software for protein and DNA that works either under Unix or on the Macintosh?" Have a look at Dialign. You can find information about it at http://www.gsf.de/biodv/dialign.html Comments are always welcome! Bye, Korbinian -- Korbinian Grote | GSF Forschungszentrum fuer Umwelt & Gesundheit | Ingolstaedter Landstr. 1 Email: grote at gsf.de | D 85758 Neuherberg / Muenchen (Germany) info: http://www.gsf.de/BIODV ...
Correlated mutation analyses (CMA) on multiple sequence alignments are widely used for the prediction of the function of amino acids. The accuracy of CMA-based predictions is mainly determined by the number of sequences, by their evolutionary distances, and by the quality of the alignments. These criteria are best met in structure-based sequence alignments of large super-families. So far, CMA techniques have mainly been employed to study receptor interactions. The present work shows how a novel CMA tool, called Comulator, can be used to determine networks of functionally related residues in enzymes. These analyses provide leads for protein engineering studies that are directed towards modification of enzyme specificity or activity. As proof of concept, Comulator has been applied to four enzyme super-families: the isocitrate lyase/phosphoenolpyruvate mutase super-family, the hexokinase super-family, the RmlC-like cupin super-family, and the FAD-linked oxidases super-family. In each of those ...
Multiple sequence alignment remains a crucial method for understanding the function of groups of related nucleic acid and protein sequences. However, it is known that automatic multiple sequence alignments can often be improved by manual editing. Therefore, tools are needed to view and edit multiple sequence alignments.
Multiple sequence alignment for short sequences. Kristóf Takács. Multiple sequence alignment (MSA) has been one of the most important problems in bioinformatics for several decades and it is still heavily studied by many mathematicians and biologists. However, mostly because of the practical motivation of this problem, the research on this topic is focused on aligning…
Identification of regions in multiple sequence alignments thermodynamically suitable for targeting by consensus oligonucleotides: application to HIV genome - Background: Computer programs for the generation of multiple sequence alignments such as Clustal W allow detection of regions that are most conserved among many sequence variants. However, even for regions that are equally conserved, their potential utility as hybridization targets varies. Mismatches in sequence variants are more disruptive in some duplexes than in others. Additionally, the propensity for self-interactions amongst oligonucleotides targeting conserved regions differs and the structure of target regions themselves can also influence hybridization efficiency. There is a need to develop software that will employ thermodynamic selection criteria for finding optimal hybridization targets in related sequences. Results: A new scheme and new software for optimal detection of oligonucleotide hybridization targets common to families of
This page offers the web documents that are referred to in Chapter 6. In Chapter 3 we discussed pairwise alignment, and then in Chapters 4 and 5 we described how a protein or DNA query can be compared to a database. This chapter covers a series of approaches to multiple sequence alignment, including the popular method of progressive alignment and new methods such as consistency-based and structure-based alignment. We also discuss ways to multiply align long segments of genomic DNA. ...
Structural alignment of RNAs is becoming important, since the discovery of functional non-coding RNAs (ncRNAs). Recent studies, mainly based on various approximations of the Sankoff algorithm, have resulted in considerable improvement in the accuracy of pairwise structural alignment. In contrast, for the cases with more than two sequences, the practical merit of structural alignment remains unclear as compared to traditional sequence-based methods, although the importance of multiple structural alignment is widely recognized. We took a different approach from a straightforward extension of the Sankoff algorithm to the multiple alignments from the viewpoints of accuracy and time complexity. As a new option of the MAFFT alignment program, we developed a multiple RNA alignment framework, X-INS-i, which builds a multiple alignment with an iterative method incorporating structural information through two components: (1) pairwise structural alignments by an external pairwise alignment method such as SCARNA or
Pairwise Sequence Alignment SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What
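The local alignment item in the outline above (Smith-Waterman) can be sketched in a few lines of Python. This is a minimal illustration that returns only the best local score, with a linear gap penalty rather than the affine (Gotoh) scheme; the scoring values and the function name `smith_waterman_score` are assumptions made for the example, not taken from the slides.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty).

    Keeps only two rows of the dynamic-programming table, so memory
    is O(len(b)); no traceback is performed.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                # start a new local alignment here
                prev[j - 1] + s,  # align a[i-1] with b[j-1]
                prev[j] + gap,    # gap in b
                curr[j - 1] + gap # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Clamping each cell at zero is what makes the alignment local: a poorly scoring prefix is simply dropped instead of being carried forward, which is the difference from the global Needleman-Wunsch recurrence.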
T-Coffee is a multiple sequence alignment server. It can align Protein, DNA and RNA sequences. You can use T-Coffee to align sequences or to combine the output of your favorite alignment methods into one unique alignment. It is also able to combine sequence information with protein structural information, profile information or RNA secondary structures.
This paper presents 3DCoffee@igs, a web-based tool dedicated to the computation of high-quality multiple sequence alignments (MSAs). 3D-Coffee makes it possible to mix protein sequences and structures in order to increase the accuracy of the alignments. Structures can be either provided as PDB identifiers or directly uploaded into the server. Given a set of sequences and structures, pairs of structures are aligned with SAP while sequence-structure pairs are aligned with Fugue. The resulting collection of pairwise alignments is then combined into an MSA with the T-Coffee algorithm. The server and its documentation are available from http://igs-server.cnrs-mrs.fr/Tcoffee/. ...
One of the most successful algorithms for computing alignments between sequences is MUMmer [4-6]. The first stage of MUMmer is performed by a component called mummer, which computes exact alignments between the pair of sequences. These alignments can be used directly to infer large-scale sequence structure, or they can be used to seed extensions to longer inexact alignments using the post-processing tools bundled with MUMmer. Unlike other popular sequence alignment programs such as BLAST [7], FASTA [8], and LAGAN [9], which use fixed length seeds for constructing their alignments, mummer alignments are variable-length maximal exact matches, where maximal means that they cannot be extended on either end without introducing a mismatch. First, mummer pre-processes the reference sequence to create a data structure, called a suffix tree. This data structure allows mummer to then compute all maximal exact substring alignments of a query sequence in time proportional to the length of the query. The ...
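To make the notion of a maximal exact match concrete, here is a minimal Python sketch. It deliberately swaps the suffix tree for a simple quadratic dynamic program over common-suffix lengths, so it illustrates the definition rather than mummer's implementation or its running time; the function name `maximal_exact_matches` and the `min_len` parameter are assumptions made for this example.

```python
def maximal_exact_matches(ref, query, min_len=3):
    """Return (ref_start, query_start, length) for every maximal exact match.

    curr[j] holds the length of the longest common suffix of ref[:i] and
    query[:j], so each run is already extended as far left as possible.
    A run is reported only when it cannot be extended to the right.
    Runs in O(len(ref) * len(query)) time.
    """
    matches = []
    prev = [0] * (len(query) + 1)
    for i in range(1, len(ref) + 1):
        curr = [0] * (len(query) + 1)
        for j in range(1, len(query) + 1):
            if ref[i - 1] == query[j - 1]:
                curr[j] = prev[j - 1] + 1
                at_right_end = (i == len(ref) or j == len(query)
                                or ref[i] != query[j])
                if at_right_end and curr[j] >= min_len:
                    n = curr[j]
                    matches.append((i - n, j - n, n))
        prev = curr
    return matches


print(maximal_exact_matches("ACGTTGCA", "TACGTAGCA"))
```

Unlike fixed-length seeds, the reported matches have whatever length the data support, which is the property the paragraph above contrasts with BLAST-style seeding.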
Announcement: This hands-on computer workshop is designed for people having previous experience with macromolecular visualization in any of the many software packages available. It will focus on the capabilities of Protein Explorer and Chemscape Chime, targeting interests expressed by the participants. Topics may include how to use an automated interface for detailed exploration of noncovalent bonds (the Noncovalent Bond Finder); finding energetically significant cation-pi interactions; generating overviews of noncovalent interactions using "contact surface" displays; how to animate functional conformational changes or movements, such as the binding of calcium to an EF-hand; searching for proteins with similar structures (regardless of sequence) and viewing the resulting structure alignments. We may also create multiple protein sequence alignments and color 3D proteins by conservation and mutation frequency. (If you already have some multiple protein sequence alignments, bring them in FASTA/PIR ...
PROBCONS is an efficient protein multiple sequence alignment program, which has demonstrated a statistically significant improvement in accuracy compared to several leading alignment tools ...
TY - JOUR. T1 - SinicView. T2 - A visualization environment for comparisons of multiple nucleotide sequence alignment tools. AU - Shih, Arthur Chun Chieh. AU - Lee, D. T.. AU - Lin, Laurent. AU - Peng, Chin Lin. AU - Chen, Shiang Heng. AU - Wu, Yu Wei. AU - Wong, Chun Yi. AU - Chou, Meng Yuan. AU - Shiao, Tze Chang. AU - Hsieh, Mu Fen. PY - 2006/3/2. Y1 - 2006/3/2. N2 - Background: Deluged by the rate and complexity of completed genomic sequences, the need to align longer sequences becomes more urgent, and many more tools have thus been developed. In the initial stage of genomic sequence analysis, a biologist is usually faced with the questions of how to choose the best tool to align sequences of interest and how to analyze and visualize the alignment results, and then with the question of whether poorly aligned regions produced by the tool are indeed not homologous or are just results due to inappropriate alignment tools or scoring systems used. Although several systematic evaluations of ...
Janine Graves wrote: "I'm a mathematician just starting to look at multiple sequence alignment of DNA. I'm not very familiar with the field and seek references on algorithms or survey articles on this subject." A couple of resources, including a tutorial I wrote for an online course, are listed at http://www.techfak.uni-bielefeld.de/bcd/Curric/MulAli/welcome.html I'd like to take the chance to append the list (below), and ask everyone to please MAIL ME ADDITIONS, CORRECTIONS, etc. I'll update the list, and the newest version will be kept available at the URL above, and its mirrors at http://www.biotech.ist.unige.it/bcd/Curric/MulAli/welcome.html http://merlin.mbcr.bcm.tmc.edu:8001/bcd/MulAli/Curric/welcome.html Also, if anyone is interested in testing beta Perl code for handling multiple alignments, please drop me a note! Multiple Alignment Internet Resource Summary ============================================ Thanks to Christian Frosch for his help in maintaining this list! Analysis of ...
Sequence similarity with experimentally characterized gene products, as determined by alignments, either pairwise or multiple (tools such as BLAST, ClustalW, MUSCLE). An entry in the with field is mandatory. The ISA code is a sub-category of the ISS code. It should be used whenever a sequence alignment is the basis for making an annotation, but only when a curator has manually reviewed the alignment and choice of GO term or if the information is in a published paper, the authors have manually reviewed the evidence. Such alignments may be pairwise alignments (the alignment of two sequences to one another) or multiple alignments (the alignment of 3 or more sequences to one another). BLAST produces pairwise alignments and any annotations based solely on the evaluation of BLAST results should use this code. GO policy states that in order to assert that a query protein has the same function as a match protein, the match protein MUST be experimentally characterized. This prevents transitive annotation ...
Description: An X-drop within an alignment, where X>0, is a region of consecutive columns scoring less than -X. Alignments containing no such X-drop are called X-alignments. Obviously, X-alignments avoid the first problem that local alignments contain internal segments scoring less than -X. A normal alignment is an alignment where each prefix or suffix has a non-negative score. Such an alignment is called maximal if it is not contained in any longer normal alignment. Maximal normal alignments clearly avoid the second problem that an entire alignment scores less than a prefix or suffix. The algorithm proposed by Zhang et al. constructs a tree that allows one to decompose an alignment into all X-full subalignments, where X-full refers to subalignments that are maximal normal alignments and X-alignments. The tree encodes all X-full alignments for all X greater than or equal to 0. Hence, the decomposition corresponding to any particular value of X can be readily extracted from the tree. The goal of this ...
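The two definitions above translate directly into checks on a list of per-column scores. The Python sketch below is only an illustration of those definitions, not the tree-based decomposition of Zhang et al.; the function names `has_x_drop` and `is_normal` and the example scores are assumptions made for this example.

```python
def has_x_drop(column_scores, x):
    """True if some run of consecutive columns sums to less than -x.

    Uses a minimum-subarray scan (Kadane's algorithm with min instead of max).
    """
    lowest = running = 0
    for s in column_scores:
        running = min(s, running + s)  # best (lowest) run ending at this column
        lowest = min(lowest, running)
    return lowest < -x


def is_normal(column_scores):
    """True if every prefix and every suffix has a non-negative score."""
    for scores in (column_scores, list(reversed(column_scores))):
        total = 0
        for s in scores:
            total += s
            if total < 0:
                return False
    return True


scores = [2, -1, -3, 2, 2]
print(has_x_drop(scores, 3), is_normal(scores))
```

An alignment for which `has_x_drop` is false at a given X is an X-alignment in the sense used above; combining both checks on a candidate subalignment tests the two conditions that X-full subalignments must satisfy, apart from maximality.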
I have a set of 520 influenza sequences for which I have already done multiple sequence alignment, and computed the pairwise identity matrix. If I'd like to add in another sequence, I have to re-align everything, and recompute the entire PWI matrix. Is there any program I can use to "append" this other sequence to the alignment, and only compute the PWI w.r.t. every other sequence? A simple example would be as follows. I have a 2x2 alignment, with the following scores. ...
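The matrix-update half of this question can be sketched in Python under one strong assumption: the new sequence has already been placed into the existing alignment's columns (some aligners can add a sequence to an existing alignment, for example MAFFT's `--add` option). The identity convention used here, which skips any column where either sequence has a gap, is only one of several in use, and the function names are hypothetical.

```python
def pairwise_identity(a, b, gap="-"):
    """Fraction of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = same = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        same += x == y
    return same / compared if compared else 0.0


def append_to_matrix(matrix, aligned, new_seq):
    """Return an (n+1) x (n+1) matrix, computing only the new row and column."""
    new_row = [pairwise_identity(new_seq, s) for s in aligned]
    grown = [row + [value] for row, value in zip(matrix, new_row)]
    grown.append(new_row + [1.0])
    return grown


aligned = ["ACGT", "AC-T"]
matrix = [[1.0, 1.0], [1.0, 1.0]]
print(append_to_matrix(matrix, aligned, "ACGA"))
```

Only n new comparisons are made instead of recomputing all pairs. This stays valid as long as adding the sequence does not change how the existing rows are aligned to one another.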
See also Wikiomics:Bioinfo_tutorial#Protein_Alignment. Multiple sequence alignment (MSA) is widely used in sequence analysis. It is more reliable and carries more information than the multiple pairwise alignments derived from BLAST. An MSA allows identification of common regions between proteins (including motifs), detection of conserved residues, and analysis of evolutionary relationships between sequences. ...
In a previous paper, we introduced MUSCLE, a new program for creating multiple alignments of protein sequences, giving a brief summary of the algorithm and showing MUSCLE to achieve the highest scores reported to date on four alignment accuracy benchmarks. Here we present a more complete discussion of the algorithm, describing several previously unpublished techniques that improve biological accuracy and / or computational complexity. We introduce a new option, MUSCLE-fast, designed for high-throughput applications. We also describe a new protocol for evaluating objective functions that align two profiles. We compare the speed and accuracy of MUSCLE with CLUSTALW, Progressive POA and the MAFFT script FFTNS1, the fastest previously published program known to the author. Accuracy is measured using four benchmarks: BAliBASE, PREFAB, SABmark and SMART. We test three variants that offer highest accuracy (MUSCLE with default settings), highest speed (MUSCLE-fast), and a carefully chosen compromise between the
The performance of our method in the pairwise alignment of human and mouse seems satisfactory but the benefits of structure modelling should be more significant in multiple alignments. First, the alignments of closely related, more similar sequences should provide information on the spatial variation of evolutionary processes and help the more difficult alignment of distantly related sequences. Second, multiple sequences provide more information on the sequence structure than two sequences only, and multiple closely related sequences can provide information on features that do not exist in a more distantly related sequence. As the method is progressive, information is generated for each internal node and can be used to study e.g. lineage-specific differences. As expected, the alignment of very close sequences, such as human and chimpanzee, does not provide information on the sequence structure and, with the exception of long gaps, the posterior probabilities of different structure classes ...
Multiple Sequence Alignment. Definition. Given N sequences x_1, x_2, …, x_N: insert gaps (-) in each sequence x_i such that all sequences have the same length L and the score of the global map is maximum. Applications. Scoring Function: Sum Of Pairs. Definition: Induced pairwise alignment
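The sum-of-pairs scoring function named above can be sketched as follows: every pair of rows induces a pairwise alignment, and the score of the multiple alignment is the sum of the scores of all induced pairs. The match, mismatch and gap values below, and the choice to score gap-gap columns as zero, are illustrative assumptions rather than values from the slides.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length gapped strings."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue           # gap-gap pairs contribute nothing
            if x == "-" or y == "-":
                total += gap       # residue aligned with a gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))
```

Because the score is computed column by column over all pairs, the work grows with the square of the number of sequences times the alignment length.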
MSA2SNP is a tool for mining SNP sites in multiple sequence alignments (MSA). This tool inherits the easy-to-use interface from MEGA4 Explorer, with advanced data presentation. MSA2SNP lets you visualize alignments and import from the CLUSTAL program.
Aligns query sequences against those present in a selected target database. BLAST is a suite of programs, provided by NCBI, which can be used to quickly search a sequence database for matches to a query sequence. The software provides an access point for these tools to perform sequence alignment on the web. The set of BLAST command-line applications is organized in a way that groups together similar types of searches in one application.
CLUSTAL W Thompson JD, Higgins DG, and Gibson TJ. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22(22):4673-80. (*)Conreal Berezikov, E., V. Guryev, R.H. Plasterk, and E. Cuppen. 2004. CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting. Genome Res 14: 170-178. DIALIGN 2 B. Morgenstern, K. Frech, A. Dress, and T. Werner. 1998. DIALIGN: Finding local similarities by multiple sequence alignment. Bioinformatics 14, 1998, 290-294. Lagan and MultiLagan Brudno, M., C.B. Do, G.M. Cooper, M.F. Kim, E. Davydov, E.D. Green, A. Sidow, and S. Batzoglou. 2003. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res 13: 721-731. Mauve Darling, A., Mau, B., Blattner, F.R., and N. Perna. Mauve. Code available ...
Structure-based Multiple Sequence Alignment of Wild-type apo-CobY with the Five Most Structurally Similar Proteins.The alignment, which was carried out using th
Multiple sequence alignment obtained with the ClustalW program of the extracellular lipase LIP2 from Y. lipolytica strains CLIB122 (GenBank Accession No. XP50
Multiple sequence alignment and analysis with Jalview: Hands-on Training Course. This course is for anyone who works with biological sequences as part of their work or study. The morning session covers introductory material suitable for anyone not familiar with working with sequences and sequence alignments, or who has never edited or published alignments with Jalview. It also provides an introduction to tree based alignment analysis, which is one of the fundamental ways in which biological function and structural information can be extracted from sequence alignments. The afternoon session provides an opportunity to explore Jalview's web based functions, including protein secondary structure and disordered region prediction. The final session will focus on exploring 2D and 3D molecular structure information in the context of multiple sequence alignments. The Jalview training course is a hands-on tutorial consisting of a ...
Hi all.. I have used muscle (3.8) to perform a multiple sequence alignment on 635 tumor suppressor gene sequences and edited (via perl) the output file so it conforms with FASTA. I would like to generate a phylogenetic tree from the msa file. I am enrolled in an introductory level bioinformatics / scientific computing course at a local community college and this would directly relate to my semester project requirement.. Thanks for the help.. Caitlin. ...
Multiple sequence alignment (MSA) is essential as an initial step in studying molecular phylogeny as well as during the identification of genomic rearrangements. Recent advances in sequencing techniques have led to a tremendous increase in the number of sequences to be analyzed. As a result, a greater demand is being placed on visualization techniques, as they have the potential to reveal the underlying information in large-scale MSAs. In this work, we present a novel visualization technique for conveying the patterns in large-scale MSAs. By applying gradient vector flow analysis to the MSA data, we can extract and visually emphasize conservations and other patterns that are relevant during the MSA exploration process. In contrast to the traditional visual representation of MSAs, which exploits color-coded tables, the proposed visual metaphor allows us to provide an overview of large MSAs as well as to highlight global patterns, outliers, and data distributions. We will motivate and describe the
DbClustal takes the results from a protein BLAST search that you provide and creates a multiple sequence alignment using ClustalW2. Both the BLAST tool output and your original query sequence are needed as inputs.
Sequence Alignment Shareware and Freeware Programs - Sequence Alignment (seqalign.sourceforge.net), ClustalX (Plate-Forme de Bio-Informatique), CodonCode Aligner (codoncode.com) ...
PubMed comprises more than 30 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites.
In this episode you'll learn the basics of working with FASTA sequence containers and making multiple sequence alignments ...
Processes the multiple sequence alignment files; the results can be used by the other functions. Note that the reference sequence should be included as the first sequence.
Multiple Sequence Alignments Figure 1 is a comparison of Arp2 sequences across several different species. Figure 2 is a comparison of Arp3 sequences, similarly across several different species. Figure 3 compares multiple isoforms of the same subunits, i.e. this is a comparison of sequences within the human species. The first sequence is that of the…
This paper describes a novel approach to deal with multiple sequence alignment (MSA). MSA is an essential task in bioinformatics which is at the
This course is an 8 hours primer on sequence alignments. Its goal is to present an overview of the basic concepts of sequence alignments and some of their applications. The first two hours will be dedicated to molecular evolution. We will focus on the implications of molecular evolution on sequence variation. We will use these concepts to define homology. We will then see how specific mathematical models (the substitution matrices) have been derived in order to quantify the evolutionary relationship between sequences. The next two hours will be used to introduce the Needleman and Wunsch algorithm (Dynamic programming), a very basic algorithm that makes it possible to derive pairwise alignments from the sequences while using the substitution matrices. Over the following 2 hours, we will see how these pairwise alignment methods can be applied to database searches and we will develop the main concepts behind the BLAST algorithm. I will finally introduce the notion of multiple sequence alignment and ...
Perrey, S. W., Stoye, J., Moulton, V., and Dress, A. (1997). On Simultaneous versus Iterative Multiple Sequence Alignment. Materialien/Preprints, Forschungsschwerpunkt Mathematisierung, Universität Bielefeld ...
Use VectorBuilder's free sequence alignment tool to identify regions of similarity between any two DNA or protein sequences of interest.
Transmembrane proteins (TMPs) constitute about 20-30% of all protein coding genes. The relative lack of experimental structures has so far made it hard to develop specific alignment methods, and the current state of the art (PRALINE™) only manages to recapitulate 50% of the positions in the reference alignments available from the BAliBASE2-ref7. ...
PROJECT DESCRIPTION: This class (Biomedical Informatics Methods) project involved developing an application that aligned two DNA or protein sequences using dynamic programming. The main reason behind attempting to arrange two sequences is to identify regions of similarity. Such regions could indicate functional, structural or evolutionary associations between the sequences. Dynamic programming is often used to align sequences. It operates on the assumption that a problem can be broken down into smaller sub-problems that, when solved, will provide the global optimal solution. The application for the class project was developed using Visual C++. Paul Reiners has provided an excellent tutorial on sequence alignment. Sequence alignment is a fun project to flex your programming muscles on a real-world problem. If you need to verify your results with an unweighted dynamic programming method, here is a link to the program I developed for my class. ROLE: Application Developer. STATUS: Completed ...
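To make the sub-problem idea concrete, here is a minimal Python sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch recurrence). It is not the Visual C++ application described above; the function name and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_s, aligned_t) for a global alignment."""
    n, m = len(s), len(t)
    # score[i][j] = best score for aligning the prefixes s[:i] and t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align s[i-1] with t[j-1]
                              score[i - 1][j] + gap,      # gap in t
                              score[i][j - 1] + gap)      # gap in s
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return score[n][m], "".join(reversed(a)), "".join(reversed(b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Each cell depends only on its three neighbours, which is why filling the table row by row yields the global optimum in O(nm) time.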
This document describes the WWW BLAST interface. BLAST (Basic Local Alignment Search Tool) is the heuristic search algorithm employed by the programs blastp, blastn, blastx, tblastn, and tblastx; these programs ascribe significance to their findings using the statistical methods of Karlin and Altschul (1990, 1993) with a few enhancements. The BLAST programs were tailored for sequence similarity searching -- for example to identify homologs to a query sequence. The programs are not generally useful for motif-style searching. For a discussion of basic issues in similarity searching of sequence databases, see Altschul et al. (1994). The five BLAST programs described here perform the following tasks: blastp compares an amino acid query sequence against a protein sequence database; blastn compares a nucleotide query sequence against a nucleotide sequence database; blastx compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein ...
Video created by Peking University (北京大学) for the course Bioinformatics: Introduction and Methods (生物信息学: 导论与方法). Upon completion of this module, you will be able to: describe dynamic programming based sequence alignment algorithms; differentiate between the Needleman-Wunsch algorithm for global alignment ...
Sentence alignment of sequences in two languages is a solved problem (:TODO: some refs). Aligning more than two languages can be performed by merging the results of pairwise alignments. However, this is not trivial since the pairwise alignments often don't agree ...
Reorganization of multiple sequence alignment tasks - Tasks for displaying and working with multiple sequence alignments have been moved from bldna and blprotein into two new BioLegato interfaces: blnalign for aligned DNA or RNA sequences, and blpalign for aligned protein sequences. For example, if you run TCOFFEE from blprotein to produce a multiple alignment, the output will be sent to blpalign. blpalign can be used to run tasks that only make sense with aligned sequences, such as phylogenetic analysis or alignment display. By making BioLegato adhere more strictly to object-oriented organization, it becomes more difficult to accidentally run tasks such as phylogeny using unaligned sequences as input ...
TY - JOUR. T1 - DDSGA. T2 - A data-driven semi-global alignment approach for detecting masquerade attacks. AU - Kholidy, Hisham A.. AU - Baiardi, Fabrizio. AU - Hariri, Salim A. PY - 2015/3/1. Y1 - 2015/3/1. N2 - A masquerade attacker impersonates a legal user to utilize the user services and privileges. The semi-global alignment algorithm (SGA) is one of the most effective and efficient techniques to detect these attacks but it has not reached yet the accuracy and performance required by large scale, multiuser systems. To improve both the effectiveness and the performances of this algorithm, we propose the Data-Driven Semi-Global Alignment, DDSGA approach. From the security effectiveness view point, DDSGA improves the scoring systems by adopting distinct alignment parameters for each user. Furthermore, it tolerates small mutations in user command sequences by allowing small changes in the low-level representation of the commands functionality. It also adapts to changes in the user behaviour by ...
Belvu is used in the manual curation of high-quality "seed" alignments for the Pfam database [11]. Annotators might start with an alignment from MUSCLE or MAFFT, for example, and use Belvu to trim the ends of the alignment to the best conservation, and remove gappy and partial sequences. They use Belvu to analyse conservation patterns, sorting alphabetically to see readily repeated domains on a sequence, or sorting by tree order to see simple evolutionary relationships. They can also sort by similarity to a specific sequence, which is useful when trying to spot false positives. Redundant sequences are removed in order to see the variation across the whole. Once of a high enough quality, the seed alignment is then used to automatically generate a "full" alignment, which contains all detectable protein sequences belonging to the family. There are many MSA viewers, editors and phylogenetic tools available, offering a wide variety of features. To name but a few: Jalview2, ClustalX, UGENE, AliView, ...
Phase 1 begins with unaligned sequences and selects a subset (called the "backbone dataset") of the sequences; the remaining sequences are the "query sequences". Phase 2 uses PASTA [16, 17] to compute a MSA and ML tree (which is unrooted) on the backbone sequences; these are called the "backbone alignment" and "backbone tree", respectively. As PASTA is a global alignment method and is not designed for the alignment of fragmentary sequences, UPP preferentially selects the backbone sequences from those that are considered to be full length. To determine which sequences are "full length", UPP only includes backbone sequences within 25 % of the length of the typical sequence for the given locus. If the typical length of the locus is not known, we use the median length of the input sequences as an estimate of the average length for the locus. This part of UPP's algorithmic design is similar to alignment methods that are based on seed alignments (e.g., the technique used in Infernal [18]), but there ...
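The "full length" filter described above can be sketched in Python as follows. This is only an illustration of the 25 % rule with the median fallback, not UPP's actual code; the function name, the dictionary input format and the `tolerance` parameter are hypothetical.

```python
from statistics import median


def split_by_full_length(seqs, typical_length=None, tolerance=0.25):
    """Split sequences into full-length backbone candidates and the rest.

    seqs maps a sequence name to its unaligned sequence string.
    If typical_length is not given, the median input length is used.
    """
    lengths = {name: len(seq) for name, seq in seqs.items()}
    if typical_length is None:
        typical_length = median(lengths.values())
    candidates, others = [], []
    for name, length in lengths.items():
        # Keep sequences whose length is within 25 % of the typical length.
        if abs(length - typical_length) <= tolerance * typical_length:
            candidates.append(name)
        else:
            others.append(name)
    return candidates, others


if __name__ == "__main__":
    demo = {"a": "A" * 100, "b": "A" * 110, "c": "A" * 90,
            "d": "A" * 40, "e": "A" * 130}
    print(split_by_full_length(demo))
```

The backbone would then be drawn from the candidate list, and every sequence not chosen for the backbone becomes a query sequence.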
Biopython - Sequence Alignments - Sequence alignment is the process of arranging two or more sequences (of DNA, RNA or protein sequences) in a specific order to identify the region of similarity
The data-sets are up to date with PDB Nov. 2008, SCOP 1.73 and Sisyphus 1.3. We introduced an xml-based file format to specify the reference alignments. Since SCOP and Sisyphus may refer to older PDB entries we mapped the chain ids to PDB Nov. 2008. Additionally we provide PDB style files which are referenced in the xml-files. If you use the data-set you should use PDB files provided here. For details specific for a certain set please refer to the set specific pages. The xml format is used for pairwise and multiple alignments. Each alignment in turn may contain alternative solutions. A certain alternative alignment is written in a row format. Below we show an excerpt of a case from the RIPC set: ...
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894. A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable ...
1. Hendrickson, D.G., Hogan, D.J., Herschlag, D., Ferrell, J.E. and Brown, P.O. (2008) Systematic identification of mRNAs recruited to argonaute 2 by specific microRNAs and corresponding changes in transcript abundance. PLoS ONE, 3, e2126. PubMed. 2. Elemento, O., Slonim, N. and Tavazoie, S. (2007) A universal framework for regulatory element discovery across all genomes and data types. Mol Cell, 28, 337-350. PubMed. 3. Gardner, P.P., Wilm, A. and Washietl, S. (2005) A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res, 33, 2433-2439. PubMed. 4. Bartel, D.P. (2009) MicroRNAs: target recognition and regulatory functions. Cell, 136, 215-233. PubMed. 5. Gruber, A.R., Bernhart, S.H., Hofacker, I.L. and Washietl, S. (2008) Strategies for measuring evolutionary conservation of RNA secondary structures. BMC Bioinformatics, 9, 122. PubMed. 6. Mignone, F., Grillo, G., Licciulli, F., Iacono, M., Liuni, S., Kersey, P.J., Duarte, J., Saccone, C. and Pesole, G. (2005) ...
Many strategies have been developed to predict the function of amino acids and the effects of mutations. A multiple sequence alignment for a protein superfamily can be a powerful tool to transfer such information, but it also contains other relevant information about sequence variation and correlated mutations, for example. 3DM is a molecular-class-specific information system that creates an accurate structure-based multiple sequence alignment. Many derived data, such as correlated mutations, sequence variation, homology models, automatic mutation analyses, etc. are included. All of the information is stored in a relational database that revolves around a comprehensive 3D numbering scheme that encompasses all structurally equivalent positions, which allows the linking of all available data and the transfer of information between all sequences and structures. When building the 3DM for VHHs it was decided to not include the CDRs because their alignment is not reliably possible in an ...
Background Comparative genomics, or the study of the relationships of genome structure and function across different species, offers a powerful tool for studying evolution, annotating genomes, and understanding the causes of various genetic disorders. However, aligning multiple sequences of DNA, an essential intermediate step for most types of analyses, is a difficult computational task. In parallel, citizen science, an approach that takes advantage of the fact that the human brain is exquisitely tuned to solving specific types of problems, is becoming increasingly popular. There, instances of hard computational problems are dispatched to a crowd of non-expert human game players and solutions are sent back to a central server. Methodology/Principal Findings We introduce Phylo, a human-based computing framework applying
Multiple Sequence Alignment with Jalview and Protein Structure and Function Modelling http://www.jalview.org/training/training-courses/Multiple-Sequence-Alignment-with-Jalview-and-Protein-Structure-and https://tess.elixir-europe.org/events/multiple-sequence-alignment-with-jalview-and-protein-structure-and-function Date: Monday 14th to Tuesday 15th May 2018 Time: 9.00 to 17.00 Location: MSTC, Sherrington Building, University of Liverpool, L69 3BX Overview This two day hands-on training course is aimed at students and researchers who want to gain practical understanding of the tools and approaches for protein sequence, structure and function prediction and analysis. In day 1, participants will be introduced to Jalview - a free desktop application for the visualisation and comparative analysis of protein, DNA and RNA sequences. Jalview can integrate data from Ensembl, Uniprot, PDBe, Rfam and Pfam, and can access a range of tools for multiple sequence alignment, conservation analysis and secondary ...
Summary: infernal builds consensus RNA secondary structure profiles called covariance models (CMs), and uses them to search nucleic acid sequence databases for homologous RNAs, or to create new sequence- and structure-based multiple sequence alignments. Availability: Source code, documentation and benchmark downloadable from http://infernal.janelia.org. infernal is freely licensed under the GNU GPLv3 and should be portable to any POSIX-compliant operating system, including Linux and Mac OS X. Contact: nawrockie,kolbed,[email protected] ...
Our final step is to align all of these structures using Chimera's Sequence/Structure tools [6]. Some important notes about this procedure. First, this is a pairwise alignment method, so we're going to align each structure to 1EZ2. As a result, care must be taken when interpreting the results, just as when viewing the results from pairwise blast values. Second, the alignment reports three values: RMSD, Aligned Pairs, and Score. The alignment score provides a rough indication of similarity in sequence and secondary structure. Unfortunately, there is currently no agreed-upon metric for structural alignment beyond RMSD, which might be misleading when differing numbers of residues are used for the alignment. To align the structures, we will use the Chimera menu of the Molecular Structure Navigator: Chimera→Align structures→by model. This will bring up the Cytoscape/Chimera Structure Alignment Dialog. Because we are doing pairwise alignments, we need to select a reference structure, then select all ...
In the protein sequence space, natural proteins form clusters of families which are characterized by their unique native folds whereas the great majority of random polypeptides are neither clustered nor foldable to unique structures. Since a given polypeptide can be either foldable or unfoldable, a kind of "folding transition" is expected at the boundary of a protein family in the sequence space. By Monte Carlo simulations of a statistical mechanical model of protein sequence alignment that coherently incorporates both short-range and long-range interactions as well as variable-length insertions to reproduce the statistics of the multiple sequence alignment of a given protein family, we demonstrate the existence of such transition between natural-like sequences and random sequences in the sequence subspaces for 15 domain families of various folds ...
The Ensembl project is pleased to announce release 56 of Ensembl (http://e56.ensembl.org/). Highlights of this release are: Reintroduction of our multi-species views. Alignments (image), formerly alignsliceview, shows pairwise or multiple alignments from the Ensembl Compara database, highlighting any gaps in the alignment. Multi-species view, formerly known as multicontigview, displays pairwise alignments without gaps; multiple pairwise alignments can be configured to create a multiple alignment display. As well as genes, other types of features such as regulatory features can be displayed in this view, making this a very useful display for comparative genomic analysis. A new tab has been added in release 56 based on a Regulatory Feature object. This will enable better display of some of the data underlying the Ensembl regulatory build. The new pages are accessed from the gene displays by clicking on the Regulation link in the left-hand menu and then clicking on a regulatory stable ID in ...
In this case the sequences are stored in alignment_ds. The chosen Gotoh algorithm uses affine gap costs and it is configured with a scoring scheme, where matches are scored +1, mismatches -1, a gap-opening -2, and a gap-extension -1. The globalAlignment call returns the score of the alignment and stores the actual alignment in alignment_ds, which could be an Alignment Graph or an Align data structure. If it is an alignment graph, its textual representation in PipMaker format is illustrated in the following figure ...
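As an illustration of affine gap costs, the following Python sketch computes the global alignment score with Gotoh's three-matrix recurrence, using the scoring scheme quoted above as defaults (match +1, mismatch -1, gap opening -2, gap extension -1). It is not the library's globalAlignment call. It assumes that a gap of length k costs the opening score plus (k - 1) times the extension score; the function and matrix names are hypothetical.

```python
NEG_INF = float("-inf")


def gotoh_score(s, t, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Global alignment score with affine gap costs (score only, no traceback)."""
    n, m = len(s), len(t)
    # M: alignment ends with s[i-1] aligned to t[j-1]
    # X: alignment ends with s[i-1] aligned to a gap
    # Y: alignment ends with t[j-1] aligned to a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            # Opening a new gap pays gap_open; continuing one pays gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          Y[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          X[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(gotoh_score("AAATTCCC", "AAACCC"))
```

In the demo call, one gap of length two (cost -3) beats two separate gaps (cost -4), which is the behaviour affine costs are meant to encourage.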
Annotation of each assembled transcriptome was done with the Trinotate annotation suite (http://trinityrnaseq.sourceforge.net/annotation/Trinotate.html, last accessed April 13, 2014). In brief, TransDecoder (Haas et al. 2013) was first used to predict open reading frames (ORFs) of at least 300 bp. If multiple, overlapping ORFs were present in the same contig, only the longest ORF was retained. In contrast, if multiple but nonoverlapping 300 bp ORFs were identified, all were retained. Thus, two or more ORFs could originate from the same transcript (i.e., ORFs on both forward and reverse strands and/or multiple ORFs on the same strand for long contigs). Untranslated transcripts and translated ORFs were then queried against the Swiss-Prot database (UniProt Consortium 2014) using Basic Local Alignment Search Tool x (BLASTx) and BLASTp, respectively (Altschul et al. 1997), with annotation coming from the best BLAST hit and associated Gene Ontology (GO) terms (Ashburner et al. 2000). Trinotate then ...
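One way to read the ORF retention rule above is sketched below in Python. This is an interpretation, not TransDecoder's implementation: it assumes ORFs on one contig are given as half-open (start, end) coordinates, judges overlap by coordinates only, and resolves overlaps greedily from longest to shortest. The function name and input format are hypothetical.

```python
def filter_orfs(orfs, min_length=300):
    """Keep ORFs of at least min_length bp; among overlapping ORFs keep the longest.

    orfs is a list of (start, end) half-open intervals on a single contig.
    """
    long_enough = [(s, e) for s, e in orfs if e - s >= min_length]
    kept = []
    # Visit the longest ORFs first so that they win against overlapping shorter ones.
    for start, end in sorted(long_enough, key=lambda o: o[1] - o[0], reverse=True):
        if all(end <= k_start or start >= k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)


if __name__ == "__main__":
    print(filter_orfs([(0, 400), (100, 450), (500, 900), (950, 1100)]))
```

In the demo, the 150 bp ORF is dropped for length, the 350 bp ORF is dropped because it overlaps a longer one, and the two nonoverlapping ORFs are both retained.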
MUMmerGPU is a high-throughput parallel pairwise local sequence alignment program. It uses the GPU to simultaneously align multiple query sequences against a single reference sequence stored as a suffix tree. Michael Schatz and Cole Trapnell from the Center for Bioinformatics and Computational Biology, University of Maryland College Park, contributed the MUMmerGPU implementation to Rodinia. Link: http://mummergpu.sourceforge.net ...
We include here a sequence alignment that contains the sequences of the Envelopes used in the reference panel, aligned with the standard subtype reference sequences from the Los Alamos database. The sequence names include, separated by periods: ...
The discovery of dilute liquid crystalline media to align biological macromolecules has opened many new possibilities to study protein and nucleic acid structures by NMR spectroscopy. We inspect the basic alignment phenomenon for an ensemble of protein conformations to deduce relative contributions of each member to the residual dipolar coupling signals. We find that molecular fluctuations can affect the alignment and discover a resulting emphasis of certain conformations. However, the internal fluctuations are largely uncorrelated with those of the alignment, implying that proteins have liquidlike molecular surfaces. Furthermore, we consider the implications of a dynamic bias to structure determination using data from the weak alignment method ...
We hope it is now evident that there are many use cases for sequence alignment in humanities applications, and we now turn our attention to a more extensive discussion of one particular use case: our ongoing work to identify the sources of Diderot and d'Alembert's Encyclopédie. The Encyclopédie ou Dictionnaire raisonné des sciences, des arts et des métiers, edited by Denis Diderot and Jean le Rond d'Alembert, was one of the crowning achievements of the French Enlightenment. Published in Paris between 1751 and 1772 in 17 volumes of text and 11 volumes of plates, this monumental work contains some 77,000 articles written by more than 130 contributors. As with all reference works, the authors and editors of the Encyclopédie made extensive use of a vast array of contemporary reference works and scholarship to complete their massive compendium of enlightened knowledge. Identification of the sources used by the philosophes is a massive undertaking in itself, as the authors rarely acknowledged the ...
Blast for Audio Sequences Alignment: a Fast Scalable Cover Identification.
The successes of RepProfile, both in simulation and validation, show that short reads can predict RNA editing even when standard alignment techniques cannot produce confident alignments. Even if repeats are locally identical, they are likely to form different RNA secondary structures in the context of different transcripts, leading to unique editing patterns. Additionally there may be cell-specific factors that further differentiate hyper-editing patterns. Thus, when endogenous dsRNAs are "marked" by ADAR modification with a unique editing pattern, RepProfile can distinguish between identical repeats. As far as the authors know, RepProfile is the only tool capable of using RNAseq data to accurately find RNA hyper-editing (or position variation in general) within sequences that form long, perfect dsRNA. RepProfile reveals RNA duplexes with hundreds of edited positions, where other methods, reliant on unambiguous alignment to a single reference genome, find few or no sites. Because almost all ...
On Combining Sequence Alignment and Feature-Quantization for Sub-Image Searching: 10.4018/jmdem.2012070102: The availability of various photo archives and photo sharing systems made similarity searching much more important because the photos are not usually
Title : WEIGHT Args : sim or sim_,matrix_name or matrix_file, or integer value Default : sim Description : Weight defines the way alignments are weighted when turned into a library. sim indicates that the weight equals the average identity within the match residues. sim_matrix_name indicates the average identity with two residues regarded as identical when their substitution value is positive. The valid matrices names are in matrices.h (pam250mt) . Matrices not found in this header are considered to be filenames. See the format section for matrices. For instance, -weight=sim_pam250mt indicates that the grouping used for similarity will be the set of classes with positive substitutions. Other groups include sim_clustalw_col ( categories of clustalw marked with :) sim_clustalw_dot ( categories of clustalw marked with .) Value indicates that all the pairs found in the alignments must be given the same weight equal to value. This is useful when the alignment one wishes to turn into a library must be ...
BLAST stands for Basic Local Alignment Search Tool. The emphasis of this tool is to find regions of sequence similarity, which will yield functional and evolutionary clues about the structure and function of your sequence.
Abstract: The primary structure of a ribonucleic acid (RNA) molecule can be represented as a sequence of nucleotides (bases) over the alphabet {A, C, G, U}. The secondary or tertiary structure of an RNA is a set of base pairs which form bonds between A-U and G-C. For secondary structures, these bonds have been traditionally assumed to be one-to-one and non-crossing. This paper considers pattern matching as well as local alignment between two RNA structures. For pattern matching, we present two algorithms, one for obtaining an exact match, the other for approximate match. We then present an algorithm for RNA local structural alignment ...
The beginning of the ORF1 was identified in all sequence alignments except for two of the four L2 subgroup 8 sequences, which are also lacking the 5 untranslated region (UTR). Three main domains were identified, a gag-like CCHC domain, an RRM motif and a PHD (Table 1). A sequence logo of the PHD and CCHC domains for all sequences in which they were found is shown in Additional file 2. Alignments for two examples of RRM domains (CR1 subgroup 3 and L2 subgroup 6) are shown in Additional file 3. The number of sequences in each subgroup from the three lineages, that is, CR1, L2 and Jockey, and the domains identified, are summarized in Table 1. Pairwise identity at the amino acid level for the ORF1 domain sequence alignments range from 21.7 to 85.9% and probabilities from 37.1 to 100% (Table 1). Only four domains have probabilities less than 85%, the RRM domain in the L2 subgroup 2, the zinc finger in the CR1 subgroup 7, the RRM domain in CR1 subgroup 4 and the RRM + CTD domain in CR1 subgroup ...
Title: Genetic Algorithms with Permutation Coding for Multiple Sequence Alignment. VOLUME: 7 ISSUE: 2. Author(s): Mohamed Tahar Ben Othman and Gamil Abdel-Azim. Affiliation: Qassim University, College of Computer, Saudi Arabia. Keywords: Genetic algorithms, Combinatorial optimization, Sequence alignment, DNA, Computational molecular biology, Permutation coding. Abstract: Multiple sequence alignment (MSA) is one of the most intensively researched topics in bioinformatics. It is known to be an NP-complete problem and is considered one of the most important and daunting tasks in computational biology. For this reason, a large number of heuristic algorithms have been proposed to search for an optimal alignment. Among these heuristic algorithms are genetic algorithms (GA). The GA has two major weaknesses: it is time consuming and can become trapped in local minima. One of the significant aspects of the GA process in MSA is to maximize the similarities between sequences by adding and shuffling the gaps of ...
In 2000, a fast implementation of the Smith-Waterman algorithm using the SIMD technology available in Intel Pentium MMX processors and similar technology was described in a publication by Rognes and Seeberg.[22] In contrast to the Wozniak (1997) approach, the new implementation was based on vectors parallel with the query sequence, not diagonal vectors. The company Sencel Bioinformatics has applied for a patent covering this approach. Sencel is developing the software further and provides executables for academic use free of charge. An SSE2 vectorization of the algorithm (Farrar, 2007) is now available, providing an 8-16-fold speedup on Intel/AMD processors with SSE2 extensions.[13] When running on Intel processors using the Core microarchitecture, the SSE2 implementation achieves a 20-fold increase. Farrar's SSE2 implementation is available as the SSEARCH program in the FASTA sequence comparison package. SSEARCH is included in the European Bioinformatics Institute's suite of similarity ...
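For orientation, the recurrence that these SIMD implementations accelerate can be written as a plain scalar loop. The following is a minimal sketch only: it uses a linear gap penalty and arbitrary scoring values chosen for illustration, and it is not the Rognes and Seeberg or Farrar code, which vectorize the inner loop and typically use affine gap penalties.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two strings (scalar, linear gaps).

    The scoring parameters are illustrative assumptions, not values taken
    from any of the implementations mentioned above.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            # A cell never drops below zero: a local alignment may start anywhere.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Each cell depends on its left, upper and upper-left neighbours, which is why vectorized versions must choose how to lay out cells in SIMD registers (along anti-diagonals, or parallel to the query as described above).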
In chromatography-based metabonomic research, retention time (RT) alignment of chromatographic peaks poses a challenge for the accurate profiling of biomarkers. Although a number of RT alignment software packages have been reported, their performance has not been comprehensively evaluated. This study aimed to evaluate the RT alignment accuracy of publicly available and commercial RT alignment software. Two gas chromatography/mass spectrometry (GC/MS) datasets, acquired from a mixture of standard metabolites and from human bladder cancer urine samples, were used to assess three publicly available software packages, MetAlign, MZmine and TagFinder, and two commercial applications comprising the Calibration feature and Statistical Compare of ChromaTOF software. The overall RT alignment accuracies in aligning the standard compounds mixture were 93, 92, 74, 73 and 42% for Calibration feature, MZmine, MetAlign, Statistical Compare and TagFinder, respectively. Additionally, unique trends were ...
Protein-binding site prediction lays a foundation for functional annotation of proteins and structure-based drug design. As the number of available protein structures increases, structural-alignment-based algorithms are becoming the dominant approach for protein-binding site prediction. However, present algorithms underutilize the ever-increasing number of three-dimensional protein-ligand complex structures (bound proteins), and they could be improved in the alignment process, the selection of templates and the clustering of templates. Herein, we built the largest database so far of bound templates with stringent quality control. On this basis, bSiteFinder was developed as a protein-binding site prediction server. By introducing Homology Indexing, Chain Length Indexing, Stability of Complex and Optimized Multiple-Templates Clustering into our algorithm, the efficiency of our server has been significantly improved. Further, the accuracy was approximately 2-10% higher than that of other algorithms for the
Paulownia fortunei is an ecologically and economically important tree species that is widely used as timber and chemical pulp. Its autotetraploid, which carries a number of valuable traits, was successfully induced with colchicine. To identify differences in gene expression between P. fortunei and its synthesized autotetraploid, we performed transcriptome sequencing using an Illumina Genome Analyzer IIx (GAIIx). About 94.8 million reads were generated and assembled into 383,056 transcripts, including 18,984 transcripts with a complete open reading frame. A Basic Local Alignment Search Tool (BLAST) search indicated that 16,004 complete transcripts had significant hits in the National Center for Biotechnology Information (NCBI) non-redundant database. The complete transcripts were given functional assignments using three public protein databases. One thousand one hundred fifty-eight differentially expressed complete transcripts were screened through a digital abundance analysis, including
CiteSeerX - Scientific documents that cite the following paper: 119931, A decision graph explanation of protein secondary structure prediction
This paper describes a Bayesian learning based approach to protein secondary structure prediction. Four secondary structure types are considered, including

Multiple sequence alignment - Wikipedia

A multiple sequence alignment (MSA) is a sequence alignment of three or more biological sequences, generally protein, DNA, or ... Multiple sequence alignment also refers to the process of aligning such a sequence set. Because three or more sequences of ... Multiple sequence alignment viewers enable alignments to be visually reviewed, often by inspecting the quality of alignment for ... Grasso C, Lee C (2004). "Combining partial order alignment and progressive multiple sequence alignment increases alignment ...
more info: https://en.wikipedia.org/wiki/Multiple_Sequence_Alignment

Incremental Multiple Sequence Alignment | SpringerLink

This work proposes a new approach to the alignment of multiple sequences. We take advantage of some results on Grammatical ... improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap ... Notredame, C.: Recent progresses in multiple sequence alignment: a survey. Pharmacogenomics 3(1), 1-14 (2002) ... Grammatical inference; processing of biosequences; multiple alignment of sequences. This work is partially supported by the ...
more info: https://link.springer.com/chapter/10.1007/978-3-540-76725-1_63

multiple sequence alignment

I asked for help finding a survey of multiple sequence alignment software. Many people responded by e-mail. Many others asked ... multiple sequence alignment. Lloyd Allison lloyd at cs.monash.edu.au Tue Nov 14 01:25:41 EST 1995 ... me ... lots of references in <URL:http://www.cs.monash.edu.au/~lloyd/tildeBIB/index.html> under keywords like multiple alignment ...
more info: http://bio.net/bionet/mm/comp-bio/1995-November/000809.html

sequence alignment algorithms

... Susan Jane Hogarth sjhogart at unity.ncsu.edu Mon Jan 6 13:20:56 EST 1997 *Previous message: ... The first thing I want the program to do, however, is a multiple sequence alignment. I realise this is like reinventing the ...
more info: http://www.bio.net/bionet/mm/bio-soft/1997-January/015969.html

MSAProbs: Multiple Sequence Alignment download | SourceForge.net

MSAProbs is an open-source protein multiple sequence alignment algorithm, achieving the statistically highest alignment ... One of the most accurate multiple protein sequence aligners. ... MSAProbs: Multiple Sequence Alignment Web Site Categories. Algorithms, Bio-Informatics. License. Apache Software License, GNU ...
more info: https://sourceforge.net/projects/msaprobs/

DbClustal | Multiple Sequence Alignment | EMBL-EBI

Both the BLAST tool output and your original query sequence are needed as inputs. ... DbClustal takes the results from a protein BLAST search that you provide and creates a multiple sequence alignment using ... Service Retirement. Wise2DBA and Promoterwise are scheduled for retirement on ... To access similar services, please visit the Multiple Sequence Alignment tools page. If you have any questions/concerns please ...
more info: http://www.ebi.ac.uk/Tools/services/web/toolform.ebi?tool=dbclustal

Sequence alignment - Wikipedia

Multiple sequence alignment is an extension of pairwise alignment to incorporate more than two sequences at a time. Multiple ... alignments of two query sequences. Pairwise alignments can only be used between two sequences at a time, but they are efficient ... Computational approaches to sequence alignment generally fall into two categories: global alignments and local alignments. ... constructs global multiple sequence alignments that attempt to align short conserved sequence motifs among the sequences in the ...
more info: https://en.m.wikipedia.org/wiki/Sequence_alignment

Burrows-Wheeler DNA Sequence Alignment-Removing Load Imbalance

... language speed Burrows-Wheeler aln program performance in DNA sequence alignment optimizations. ... Sequencing costs have decreased dramatically over the last years, and with the new generation of machines the mythical $1000 ... As this will have an immediate effect on the sample sizes used in sequencing studies, it is crucial to improve the efficiency ... This article focuses on recent advances of the ExaScience Life Lab in optimizing the alignment phase of whole-genome processing ...
more info: https://www.intel.com/content/www/us/en/healthcare-it/solutions/documents/removing-load-imbalance-burrows-wheeler-sequence-alignment-paper.html

Hardware Accelerated Sequence Alignment with Traceback

... Scott Lloyd and Quinn O. Snell ... Scott Lloyd and Quinn O. Snell, "Hardware Accelerated Sequence Alignment with Traceback," International Journal of ...
more info: https://www.hindawi.com/journals/ijrc/2009/762362/cta/

[Bioclusters] Parallel Sequence Alignment tool

Does anyone have recommendations for a parallel sequence alignment tool ... User ... [Bioclusters] Parallel Sequence Alignment tool. jgans jgans at lanl.gov Tue Aug 25 11:04:35 EDT 2009 ... I only modified the first stage pairwise alignment portion of the code). Regards, Jason Gans Bioscience Division, B-7 Los ...
more info: http://bioinformatics.org/pipermail/bioclusters/2009-August/003449.html

Sequence alignment package for LaTeX?

... Francois Jeanmougin pingouin at crystal.u-strasbg.fr Wed Aug 12 06:42:06 EST 1998 ... Does anyone happen to know of a package for displaying sequence alignments in LaTeX? I use alscript and then import ...
more info: http://www.bio.net/bionet/mm/bio-soft/1998-August/019270.html

Sequence alignment (howto) - Bioinformatics.Org Wiki

Tips for alignment. * Use an appropriately divergent matrix. * Reduce your gap penalty relative to that you used for your ... Use the MaxSegs/Waterman-Eggert version of the dynamic programming algorithm to provide the best local alignment and also to ...
more info: http://www.bioinformatics.org/wiki/Sequence_alignment_

Non-approximability of Weighted Multiple Sequence Alignment | SpringerLink

Multiple sequence alignment without weights is known to be NP-complete and can be approximated within a... ... We consider a weighted generalization of multiple sequence alignment with sum-of-pair score. Multiple sequence alignment ... Weighted multiple sequence alignment can be approximated within a factor of O(log2 n) where n is the number of sequences. ...
more info: https://link.springer.com/chapter/10.1007%2F3-540-44679-6_9
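The sum-of-pair score mentioned in this excerpt can be sketched as follows. The scoring values, the treatment of gap-gap columns as zero, and the `weights` dictionary keyed by sequence-index pairs are all assumptions for illustration; the paper's exact weighting scheme is not given in the excerpt.

```python
from itertools import combinations


def sum_of_pairs(alignment, weights=None, match=1, mismatch=-1, gap=-2):
    """Weighted sum-of-pairs score of a multiple alignment (list of equal-length rows).

    `weights` maps a pair of row indices (i, j) with i < j to a multiplier;
    a missing `weights` argument means every pair has weight 1 (unweighted case).
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for i, j in combinations(range(len(column)), 2):
            x, y = column[i], column[j]
            if x == "-" and y == "-":
                score = 0          # assumption: two gaps neither gain nor cost
            elif x == "-" or y == "-":
                score = gap
            elif x == y:
                score = match
            else:
                score = mismatch
            w = 1 if weights is None else weights[(i, j)]
            total += w * score
    return total


if __name__ == "__main__":
    rows = ["AC-", "ACG", "A-G"]
    print(sum_of_pairs(rows))
    print(sum_of_pairs(rows, weights={(0, 1): 1, (0, 2): 2, (1, 2): 1}))
```

Doubling the weight of the pair formed by the first and third rows changes the total, which is the sense in which the weighted problem generalizes the unweighted one.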

DNA Sequence Alignments

This page presents several data matrices and DNA sequence alignments published or presented by staff and collaborators of the ...
more info: https://sciweb.nybg.org/Science2/cullb/dna.html

Multiple sequence alignment with Clustal X. - PubMed - NCBI

Multiple sequence alignment with Clustal X. Jeanmougin F, Thompson JD, Gouy M, Higgins DG, Gibson TJ. ...
more info: https://www.ncbi.nlm.nih.gov/pubmed/9810230?dopt=Abstract

Sequence Alignment & Analysis: New in Mathematica 7

Mathematica 7 adds sequence analysis tools that operate on both strings and general lists, and are fully integrated into the ... Mathematica 7 adds industrial-strength state-of-the-art sequence analysis tools. Suitable for bioinformatics, text analysis and ... other applications, the sequence analysis tools operate on both strings and general lists, and are fully integrated into the ... Rapidly Visualize Large-Scale Sequence Similarity. Solve Classic Sequence Similarity Problems. Generate Sequence Alignments in ...
more info: https://www.wolfram.com/mathematica/newin7/content/SequenceAlignmentAndAnalysis/

The Sequence Alignment/Map format and SAMtools. - PubMed - NCBI

The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, ... The Sequence Alignment/Map format and SAMtools. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G ... Padding operations can be absent when an aligner does not support multiple sequence alignment. The last six bases of read r003 ... The CIGAR string for this alignment contains a P (padding) operation which correctly aligns the inserted sequences. ...
more info: https://www.ncbi.nlm.nih.gov/pubmed/19505943?dopt=Abstract
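To make the role of the P (padding) operation concrete, the sketch below parses a CIGAR string and counts how many reference and query bases it covers. It is a minimal illustration and not part of SAMtools; the helper names are hypothetical, and the example string is an assumed CIGAR of the kind described in the excerpt.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) tuples."""
    ops = [(int(n), op) for n, op in CIGAR_ELEMENT.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def reference_span(cigar):
    """Number of reference bases covered (M, D, N, = and X consume the reference)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


def query_length(cigar):
    """Number of read bases described (M, I, S, = and X consume the read)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


if __name__ == "__main__":
    example = "3S6M1P1I4M"  # assumed example containing a padding operation
    print(parse_cigar(example))
    print(reference_span(example), query_length(example))
```

Padding consumes neither reference nor read bases, so dropping the `1P` element would leave both counts unchanged; it only matters when several reads with insertions are laid out against each other.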

Monte Carlo Sequence Alignment

Churchill, Gary A.; Cornell University. Biometrics Unit.; Cornell University. Dept. of Biometrics.; Cornell University. Dept. of Biological Statistics and Computational Biology ...
more info: https://ecommons.cornell.edu/handle/1813/31985

Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment

Our approach applies multiple-sequence alignment to sentences gathered from unannotated comparable corpora: it learns a set of ... Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment. Regina Barzilay and Lillian Lee. ... An Unsupervised Approach Using Multiple-Sequence Alignment}, year = {2003}, pages = {16--23}, booktitle = {Proceedings of HLT- ...
more info: http://www.cs.cornell.edu/home/llee/papers/statpar.home.html

Multiple Sequence Alignment Using a Genetic Algorithm and GLOCSA

... Edgar D. Arenas-Díaz,1 Helga Ochoterena,2 and Katya Rodríguez ... Edgar D. Arenas-Díaz, Helga Ochoterena, and Katya Rodríguez-Vázquez, "Multiple Sequence Alignment Using a Genetic Algorithm and ...
more info: https://www.hindawi.com/journals/jaea/2009/963150/cta/

ISA: Inferred from Sequence Alignment | Gene Ontology Consortium

ISA: Inferred from Sequence Alignment. ISA: Inferred from Sequence Alignment. *Sequence similarity with experimentally ... Such alignments may be pairwise alignments (the alignment of two sequences to one another) or multiple alignments (the ... A curator performs sequence similarity analysis on a group of genes, (e.g. sequence similarity alignments of the human NDUFS8 ... If the process used by the curator for evaluation of the sequence alignments is not in a published paper they should refer to a ...
more info: http://geneontology.org/page/isa-inferred-sequence-alignment

[BiO BB] Sequence alignment to whole genome sequence

... Martin Gollery marty.gollery at gmail.com Fri Sep 16 18:37:38 EDT 2005 ... hundreds of 700 bp sequences to a whole genome sequence (around 8Mb) of a closely related species/strain? TIA for your ...
more info: http://www.bioinformatics.org/pipermail/bbb/2005-September/002704.html

Global Alignment by Dynamic Programming - Sequence Alignment | Coursera

... describe dynamic programming based sequence alignment algorithms; differentiate ... ... Sequence Alignment. Upon completion of this module, you will be able to: describe dynamic programming based sequence alignment ... that the best alignment up to a certain position is the best alignment for all previous residues plus the best alignment for ... The input data for pairwise sequence alignment are two sequences S1 and S2. ...
more info: https://www.coursera.org/lecture/bioinformatics-pku/global-alignment-by-dynamic-programming-9d6N5
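The principle quoted in this excerpt (the best alignment up to a position builds on the best alignment of the preceding residues) is the basis of Needleman-Wunsch style global alignment. The following is a minimal sketch assuming a linear gap penalty and illustrative scores; it is not taken from the course material, and the tie-breaking order in the traceback (diagonal, then up, then left) is an arbitrary choice.

```python
def needleman_wunsch(s1, s2, match=1, mismatch=-1, gap=-2):
    """Global alignment of s1 and s2; returns (score, aligned_s1, aligned_s2)."""
    n, m = len(s1), len(s2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # s1 prefix aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # s2 prefix aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two residues
                              score[i - 1][j] + gap,       # gap in s2
                              score[i][j - 1] + gap)       # gap in s1
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s1[i - 1] == s2[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a1.append(s1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1]); j -= 1
    return score[n][m], "".join(reversed(a1)), "".join(reversed(a2))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Unlike a local alignment, every residue of both inputs appears in the output, so the first row and column are initialized with accumulated gap penalties rather than zeros.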

lopez-et-al 2010 | Phylogenetic Tree | Sequence Alignment

Clustal-W - improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific ... T-coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205-217. Nye, T.M., Lio, P., ... In order to be sure of the quality of the alignments and the reading frame, we aligned nucleotide sequences using the ... The reading frame of each nucleotide sequence was determined using the emboss wise2 software and the guided alignment was done ...
more info: https://www.scribd.com/document/149164216/lopez-et-al-2010

'sequence alignment' Protocols and Video...

... sequence alignment include Semi-automated Biopanning of Bacterial Display Libraries for Peptide Affinity Reagent Discovery ... Targeted RNA Sequencing Assay to Characterize Gene Expression and Genomic Alterations, Targeted Next-Generation Sequencing ... End Sequencing Library Preparation with A-seq2, A Rapid and Facile Pipeline for Generating Genomic Point Mutants in C. ... Investigating Protein Sequence-structure-dynamics Relationships with Bio3D-web, Protein WISDOM: A Workbench for In silico De ...
more info: https://www.jove.com/keyword/sequence+alignment
  • Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. (sanbi.ac.za)
  • RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. (sanbi.ac.za)
  • We show how the use of Intel tools such as Pintools, VTune™ tools, and the Intel® Cilk™ language allow analyzing and optimizing the performance of the widely-used BWA aln program for alignment. (intel.com)
  • Using this paper as a reference, it was straight forward to add the required OpenMP code to the most recent version of Clustal (I only modified the first stage pairwise alignment portion of the code). (bioinformatics.org)
  • Mathematica 7 adds industrial-strength state-of-the-art sequence analysis tools. (wolfram.com)
  • SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. (nih.gov)
  • (a) Alignments of one pair of reads and three single-end reads. (nih.gov)
  • This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. (sanbi.ac.za)
  • However, most interesting problems require the alignment of lengthy, highly variable or extremely numerous sequences that cannot be aligned solely by human effort. (wikipedia.org)
  • The process of evaluating a sequence alignment involves checking that the length of the matching region and the percent identity with the matching sequence are sufficient to infer shared function. (geneontology.org)
  • The obtained machine compiles the common features of the sequences, and can be used to align these sequences. (springer.com)
  • when we will emphasize their differences and use '比对过程' (the alignment process) and '比对结果' (the alignment result) to denote 'align' and 'alignment', respectively. (coursera.org)
  • The experiments carried out compare the performance of our method and previous alignment methods. (springer.com)
  • This evaluation may be carried out by the curator, when sequence analysis is performed by the curators, or by authors of a published paper, when the curator is making annotations based on literature. (geneontology.org)
  • Sequence alignments are also used for non-biological sequences, such as calculating the edit distance cost between strings in a natural language or in financial data. (wikipedia.org)
  • This page presents several data matrices and DNA sequence alignments published or presented by staff and collaborators of the Lewis B. and Dorothy Cullman Program for Molecular Systematics Studies at The New York Botanical Garden. (nybg.org)
  • The input data for pairwise sequence alignment are two sequences S1 and S2. (coursera.org)
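
One of the quotes above mentions using alignment to calculate the edit distance between strings. As a minimal sketch, assuming unit costs for insertion, deletion and substitution (the Levenshtein variant, which the quoted source does not specify), the computation looks like this; `edit_distance` is a hypothetical helper name.

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string `a` into string `b` (unit costs)."""
    prev = list(range(len(b) + 1))      # distance from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]                      # deleting the first i characters of a
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or keep
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

The table filled here has the same shape as a global alignment matrix, with costs minimized instead of scores maximized.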