Sequence Alignment: The arrangement of two or more amino acid or base sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms.
Genome, Bacterial: The genetic complement of a bacterium (BACTERIA) as represented in its DNA.
Genome: The genetic complement of an organism, including all of its GENES, as represented in its DNA, or in some cases, its RNA.
Molecular Sequence Data: Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories.
Sequence Analysis, DNA: A multistage process that includes cloning, physical mapping, subcloning, determination of the DNA SEQUENCE, and information analysis.
Sequence Analysis, Protein: A process that includes the determination of the AMINO ACID SEQUENCE of a protein (or peptide, oligopeptide or peptide fragment) and the information analysis of the sequence.
Software: Sequential operating programs and data which instruct the functioning of a digital computer.
Phylogeny: The relationships of groups of organisms as reflected by their genetic makeup.
Genome, Viral: The complete genetic complement contained in a DNA or RNA molecule in a virus.
Algorithms: A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task.
Amino Acid Sequence: The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins.
It is of fundamental importance in determining PROTEIN CONFORMATION.
Computational Biology: A field of biology concerned with the development of techniques for the collection and manipulation of biological data, and the use of such data to make biological discoveries or predictions. This field encompasses all computational methods and theories for solving biological problems including manipulation of models and datasets.
Evolution, Molecular: The process of cumulative change at the level of DNA; RNA; and PROTEINS, over successive generations.
Sequence Homology, Amino Acid: The degree of similarity between sequences of amino acids. This information is useful for analyzing the genetic relatedness of proteins and species.
Genomics: The systematic study of the complete DNA sequences (GENOME) of organisms.
Genome, Plant: The genetic complement of a plant (PLANTS) as represented in its DNA.
Genome, Human: The complete genetic complement contained in the DNA of a set of CHROMOSOMES in a HUMAN. The length of the human genome is about 3 billion base pairs.
Conserved Sequence: A sequence of amino acids in a polypeptide or of nucleotides in DNA or RNA that is similar across multiple species. A known set of conserved sequences is represented by a CONSENSUS SEQUENCE. AMINO ACID MOTIFS are often composed of conserved sequences.
Proteins: Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein.
Databases, Protein: Databases containing information about PROTEINS such as AMINO ACID SEQUENCE; PROTEIN CONFORMATION; and other properties.
Internet: A loose confederation of computer communication networks around the world. The networks that make up the Internet are connected through several backbone networks.
The Internet grew out of the US Government ARPAnet project and was designed to facilitate information exchange.
Genome, Mitochondrial: The genetic complement of MITOCHONDRIA as represented in their DNA.
Sequence Homology: The degree of similarity between sequences. Studies of AMINO ACID SEQUENCE HOMOLOGY and NUCLEIC ACID SEQUENCE HOMOLOGY provide useful information about the genetic relatedness of genes, gene products, and species.
Models, Molecular: Models used experimentally or theoretically to study molecular shape, electronic properties, or interactions; includes analogous molecules, computer-generated graphics, and mechanical structures.
User-Computer Interface: The portion of an interactive computer program that issues messages to and receives commands from a user.
Databases, Genetic: Databases devoted to knowledge about specific genes and gene products.
Genome, Archaeal: The genetic complement of an archaeal organism (ARCHAEA) as represented in its DNA.
Sequence Analysis, RNA: A multistage process that includes cloning, physical mapping, subcloning, sequencing, and information analysis of an RNA SEQUENCE.
Genome, Fungal: The complete gene complement contained in a set of chromosomes in a fungus.
Sequence Analysis: A multistage process that includes the determination of a sequence (protein, carbohydrate, etc.), its fragmentation and analysis, and the interpretation of the resulting sequence information.
Open Reading Frames: A sequence of successive nucleotide triplets that are read as CODONS specifying AMINO ACIDS and begin with an INITIATOR CODON and end with a stop codon (CODON, TERMINATOR).
Software Validation: The act of testing the software for compliance with a standard.
Sequence Homology, Nucleic Acid: The sequential correspondence of nucleotides in one nucleic acid molecule with those of another nucleic acid molecule.
Sequence homology is an indication of the genetic relatedness of different organisms and gene function.
Structural Homology, Protein: The degree of 3-dimensional shape similarity between proteins. It can be an indication of distant AMINO ACID SEQUENCE HOMOLOGY and used for rational DRUG DESIGN.
Models, Genetic: Theoretical representations that simulate the behavior or activity of genetic processes or phenomena. They include the use of mathematical equations, computers, and other electronic equipment.
Protein Structure, Tertiary: The level of protein structure in which combinations of secondary protein structures (alpha helices, beta sheets, loop regions, and motifs) pack together to form folded shapes called domains. Disulfide bridges between cysteines in two different parts of the polypeptide chain along with other interactions between the chains play a role in the formation and stabilization of tertiary structure. Small proteins usually consist of only one domain but larger proteins may contain a number of domains connected by segments of polypeptide chain which lack regular secondary structure.
Databases, Nucleic Acid: Databases containing information about NUCLEIC ACIDS such as BASE SEQUENCE; SNPS; NUCLEIC ACID CONFORMATION; and other properties. Information about the DNA fragments kept in a GENE LIBRARY or GENOMIC LIBRARY is often maintained in DNA databases.
Computer Graphics: The process of pictorial communication, between human and computers, in which the computer input and output have the form of charts, drawings, or other appropriate pictorial representation.
DNA, Bacterial: Deoxyribonucleic acid that makes up the genetic material of bacteria.
Chromosome Mapping: Any method used for determining the location of and relative distances between genes on a chromosome.
Genome Size: The amount of DNA (or RNA) in one copy of a genome.
Synteny: The presence of two or more genetic loci on the same chromosome.
Extensions of this original definition refer to the similarity in content and organization between chromosomes of different species, for example.
Multigene Family: A set of genes descended by duplication and variation from some ancestral gene. Such genes may be clustered together on the same chromosome or dispersed on different chromosomes. Examples of multigene families include those that encode the hemoglobins, immunoglobulins, histocompatibility antigens, actins, tubulins, keratins, collagens, heat shock proteins, salivary glue proteins, chorion proteins, cuticle proteins, yolk proteins, and phaseolins, as well as histones, ribosomal RNA, and transfer RNA genes. The latter three are examples of reiterated genes, where hundreds of identical genes are present in a tandem array. (King & Stansfield, A Dictionary of Genetics, 4th ed)
Protein Structure, Secondary: The level of protein structure in which regular hydrogen-bond interactions within contiguous stretches of polypeptide chain give rise to alpha helices, beta strands (which align to form beta sheets) or other types of coils. This is the first folding level of protein conformation.
Computing Methodologies: Computer-assisted analysis and processing of problems in a particular area.
Genetic Variation: Genotypic differences observed among individuals in a population.
Species Specificity: The restriction of a characteristic behavior, anatomical structure or physical system, such as immune response; metabolic response, or gene or gene variant to the members of one species.
It refers to that property which differentiates one species from another but it is also used for phylogenetic levels higher or lower than the species.
Markov Chains: A stochastic process such that the conditional probability distribution for a state at any future instant, given the present state, is unaffected by any additional knowledge of the past history of the system.
Bacterial Proteins: Proteins found in any species of bacterium.
Cloning, Molecular: The insertion of recombinant DNA molecules from prokaryotic and/or eukaryotic sources into a replicating vehicle, such as a plasmid or virus vector, and the introduction of the resultant hybrid molecules into recipient cells without altering the viability of those cells.
Computer Simulation: Computer-based representation of physical systems and phenomena such as chemical processes.
Protein Conformation: The characteristic 3-dimensional shape of a protein, including the secondary, supersecondary (motifs), tertiary (domains) and quaternary structure of the peptide chain. PROTEIN STRUCTURE, QUATERNARY describes the conformation assumed by multimeric proteins (aggregates of more than one polypeptide chain).
Databases, Factual: Extensive collections, reputedly complete, of facts and data garnered from material of a specialized subject area and made available for analysis and application. The collection can be automated by various contemporary methods for retrieval. The concept should be differentiated from DATABASES, BIBLIOGRAPHIC which is restricted to collections of bibliographic references.
Binding Sites: The parts of a macromolecule that directly participate in its specific combination with another molecule.
Genome, Protozoan: The complete genetic complement contained in a set of CHROMOSOMES in a protozoan.
Genome, Chloroplast: The genetic complement of CHLOROPLASTS as represented in their DNA.
Cluster Analysis: A set of statistical methods used to group variables or observations into strongly inter-related subgroups.
In epidemiology, it may be used to analyze a closely grouped series of events or cases of disease or other health-related phenomenon with well-defined distribution patterns in relation to time or place or both.
Gene Order: The sequential location of genes on a chromosome.
Expressed Sequence Tags: Partial cDNA (DNA, COMPLEMENTARY) sequences that are unique to the cDNAs from which they were derived.
Genome, Insect: The genetic complement of an insect (INSECTS) as represented in its DNA.
INDEL Mutation: A mutation named with the blend of insertion and deletion. It refers to a length difference between two ALLELES where it is unknowable if the difference was originally caused by a SEQUENCE INSERTION or by a SEQUENCE DELETION. If the number of nucleotides in the insertion/deletion is not divisible by three, and it occurs in a protein coding region, it is also a FRAMESHIFT MUTATION.
Molecular Sequence Annotation: The addition of descriptive information about the function or structure of a molecular sequence to its MOLECULAR SEQUENCE DATA record.
Mutation: Any detectable and heritable change in the genetic material that causes a change in the GENOTYPE and which is transmitted to daughter cells and to succeeding generations.
Models, Statistical: Statistical formulations or analyses which, when applied to data and found to fit the data, are then used to verify the assumptions and parameters used in the analysis. Examples of statistical models are the linear model, binomial model, polynomial model, two-parameter model, etc.
Likelihood Functions: Functions constructed from a statistical model and a set of observed data which give the probability of that data for various values of the unknown model parameters.
Those parameter values that maximize the probability are the maximum likelihood estimates of the parameters.
Information Storage and Retrieval: Organized activities related to the storage, location, search, and retrieval of information.
Base Composition: The relative amounts of the PURINES and PYRIMIDINES in a nucleic acid.
Word Processing: Text editing and storage functions using computer software.
Escherichia coli: A species of gram-negative, facultatively anaerobic, rod-shaped bacteria (GRAM-NEGATIVE FACULTATIVELY ANAEROBIC RODS) commonly found in the lower part of the intestine of warm-blooded animals. It is usually nonpathogenic, but some strains are known to produce DIARRHEA and pyogenic infections. Pathogenic strains (virotypes) are classified by their specific pathogenic mechanisms such as toxins (ENTEROTOXIGENIC ESCHERICHIA COLI), etc.
Consensus Sequence: A theoretical representative nucleotide or amino acid sequence in which each nucleotide or amino acid is the one which occurs most frequently at that site in the different sequences which occur in nature. The phrase also refers to an actual sequence which approximates the theoretical consensus. A known CONSERVED SEQUENCE set is represented by a consensus sequence. Commonly observed supersecondary protein structures (AMINO ACID MOTIFS) are often formed by conserved sequences.
Reproducibility of Results: The statistical reproducibility of measurements (often in a clinical context), including the testing of instrumentation or techniques to obtain reproducible results. The concept includes reproducibility of physiological measurements, which may be used to develop rules to assess probability or prognosis, or response to a stimulus; reproducibility of occurrence of a condition; and reproducibility of experimental results.
Amino Acid Motifs: Commonly observed structural components of proteins formed by simple combinations of adjacent secondary structures.
A commonly observed structure may be composed of a CONSERVED SEQUENCE which can be represented by a CONSENSUS SEQUENCE.
Nucleic Acid Conformation: The spatial arrangement of the atoms of a nucleic acid or polynucleotide that results in its characteristic 3-dimensional shape.
Database Management Systems: Software designed to store, manipulate, manage, and control data for specific uses.
Genes, Bacterial: The functional hereditary units of BACTERIA.
Recombination, Genetic: Production of new arrangements of DNA by various mechanisms such as assortment and segregation, CROSSING OVER; GENE CONVERSION; GENETIC TRANSFORMATION; GENETIC CONJUGATION; GENETIC TRANSDUCTION; or mixed infection of viruses.
Pattern Recognition, Automated: In INFORMATION RETRIEVAL, machine-sensing or identification of visible patterns (shapes, forms, and configurations). (Harrod's Librarians' Glossary, 7th ed)
DNA: A deoxyribonucleotide polymer that is the primary genetic material of all cells. Eukaryotic and prokaryotic organisms normally contain DNA in a double-stranded state, yet several important biological processes transiently involve single-stranded regions. DNA, which consists of a polysugar-phosphate backbone possessing projections of purines (adenine and guanine) and pyrimidines (thymine and cytosine), forms a double helix that is held together by hydrogen bonds between these purines and pyrimidines (adenine to thymine and guanine to cytosine).
Software Design: Specifications and instructions applied to the software.
RNA: A polynucleotide consisting essentially of chains with a repeating backbone of phosphate and ribose units to which nitrogenous bases are attached. RNA is unique among biological macromolecules in that it can encode genetic information, serve as an abundant structural component of cells, and possess catalytic activity.
(Rieger et al., Glossary of Genetics: Classical and Molecular, 5th ed)
High-Throughput Nucleotide Sequencing: Techniques of nucleotide sequence analysis that increase the range, complexity, sensitivity, and accuracy of results by greatly increasing the scale of operations and thus the number of nucleotides, and the number of copies of each nucleotide sequenced. The sequencing may be done by analysis of the synthesis or ligation products, hybridization to preexisting sequences, etc.
Mutagenesis, Site-Directed: Genetically engineered MUTAGENESIS at a specific site in the DNA molecule that introduces a base substitution, or an insertion or deletion.
Gene Duplication: Processes occurring in various organisms by which new genes are copied. Gene duplication may result in a MULTIGENE FAMILY; supergenes or PSEUDOGENES.
Gene Transfer, Horizontal: The naturally occurring transmission of genetic information between organisms, related or unrelated, circumventing parent-to-offspring transmission. Horizontal gene transfer may occur via a variety of naturally occurring processes such as GENETIC CONJUGATION; GENETIC TRANSDUCTION; and TRANSFECTION. It may result in a change of the recipient organism's genetic composition (TRANSFORMATION, GENETIC).
Programming Languages: Specific languages used to prepare computer programs.
Biological Evolution: The process of cumulative change over successive generations through which organisms acquire their distinguishing morphological and physiological characteristics.
Repetitive Sequences, Nucleic Acid: Sequences of DNA or RNA that occur in multiple copies. There are several types: INTERSPERSED REPETITIVE SEQUENCES are copies of transposable elements (DNA TRANSPOSABLE ELEMENTS or RETROELEMENTS) dispersed throughout the genome. TERMINAL REPEAT SEQUENCES flank both ends of another sequence, for example, the long terminal repeats (LTRs) on RETROVIRUSES.
Variations may be direct repeats, those occurring in the same direction, or inverted repeats, those opposite to each other in direction. TANDEM REPEAT SEQUENCES are copies which lie adjacent to each other, direct or inverted (INVERTED REPEAT SEQUENCES).
Gene Expression Profiling: The determination of the pattern of genes expressed at the level of GENETIC TRANSCRIPTION, under specific circumstances or in a specific cell.
Archaea: One of the three domains of life (the others being BACTERIA and Eukarya), formerly called Archaebacteria under the taxon Bacteria, but now considered separate and distinct. They are characterized by: (1) the presence of characteristic tRNAs and ribosomal RNAs; (2) the absence of peptidoglycan cell walls; (3) the presence of ether-linked lipids built from branched-chain subunits; and (4) their occurrence in unusual habitats. While archaea resemble bacteria in morphology and genomic organization, they resemble eukarya in their method of genomic replication. The domain contains at least four kingdoms: CRENARCHAEOTA; EURYARCHAEOTA; NANOARCHAEOTA; and KORARCHAEOTA.
Pseudogenes: Genes bearing close resemblance to known genes at different loci, but rendered non-functional by additions or deletions in structure that prevent normal transcription or translation. When lacking introns and containing a poly-A segment near the downstream end (as a result of reverse copying from processed nuclear RNA into double-stranded DNA), they are called processed genes.
Polymerase Chain Reaction: In vitro method for producing large amounts of specific DNA or RNA fragments of defined length and sequence from small amounts of short oligonucleotide flanking sequences (primers). The essential steps include thermal denaturation of the double-stranded target molecules, annealing of the primers to their complementary sequences, and extension of the annealed primers by enzymatic synthesis with DNA polymerase. The reaction is efficient, specific, and extremely sensitive.
Uses for the reaction include disease diagnosis, detection of difficult-to-isolate pathogens, mutation analysis, genetic testing, DNA sequencing, and analyzing evolutionary relationships.
Genome, Helminth: The genetic complement of a helminth (HELMINTHS) as represented in its DNA.
DNA, Intergenic: Any of the DNA in between gene-coding DNA, including untranslated regions, 5' and 3' flanking regions, INTRONS, non-functional pseudogenes, and non-functional repetitive sequences. This DNA may or may not encode regulatory functions.
DNA Primers: Short sequences (generally about 10 base pairs) of DNA that are complementary to sequences of messenger RNA and allow reverse transcriptases to start copying the adjacent sequences of mRNA. Primers are used extensively in genetic and molecular biology techniques.
DNA, Viral: Deoxyribonucleic acid that makes up the genetic material of viruses.
Contig Mapping: Overlapping of cloned or sequenced DNA to construct a continuous region of a gene, chromosome or genome.
DNA Transposable Elements: Discrete segments of DNA which can excise and reintegrate to another site in the genome. Most are inactive, i.e., have not been found to exist outside the integrated state. DNA transposable elements include bacterial IS (insertion sequence) elements, Tn elements, the maize controlling elements Ac and Ds, Drosophila P, gypsy, and pogo elements, the human Tigger elements and the Tc and mariner elements which are found throughout the animal kingdom.
Chromosomes, Artificial, Bacterial: DNA constructs that are composed of, at least, a REPLICATION ORIGIN, for successful replication, propagation to and maintenance as an extra chromosome in bacteria.
In addition, they can carry large amounts (about 200 kilobases) of other sequence for a variety of bioengineering purposes.
Oryza sativa: Annual cereal grass of the family POACEAE and its edible starchy grain, rice, which is the staple food of roughly one-half of the world's population.
Pteridium: A plant genus of the family DENNSTAEDTIACEAE. Members contain ptaquiloside, braxin A1, and braxin B. The name is similar to brake fern (PTERIS).
Human Genome Project: A coordinated effort of researchers to map (CHROMOSOME MAPPING) and sequence (SEQUENCE ANALYSIS, DNA) the human GENOME.
DNA, Plant: Deoxyribonucleic acid that makes up the genetic material of plants.
DNA, Complementary: Single-stranded complementary DNA synthesized from an RNA template by the action of RNA-dependent DNA polymerase. cDNA (i.e., complementary DNA, not circular DNA, not C-DNA) is used in a variety of molecular cloning experiments as well as serving as a specific hybridization probe.
Viral Proteins: Proteins found in any species of virus.
RNA, Viral: Ribonucleic acid that makes up the genetic material of viruses.
Protein Folding: Processes involved in the formation of TERTIARY PROTEIN STRUCTURE.
Bayes Theorem: A theorem in probability theory named for Thomas Bayes (1702-1761). In epidemiology, it is used to obtain the probability of disease in a group of people with some characteristic on the basis of the overall rate of that disease and of the likelihood of that characteristic in healthy and diseased individuals. The most familiar application is in clinical decision analysis where it is used for estimating the probability of a particular diagnosis given the appearance of some symptoms or test result.
Protein Binding: The process in which substances, either endogenous or exogenous, bind to proteins, peptides, enzymes, protein precursors, or allied compounds.
Specific protein-binding measures are often used as assays in diagnostic assessments.
RNA, Untranslated: RNA which does not code for protein but has some enzymatic, structural or regulatory function. Although ribosomal RNA (RNA, RIBOSOMAL) and transfer RNA (RNA, TRANSFER) are also untranslated RNAs they are not included in this scope.
Amino Acid Substitution: The naturally occurring or experimentally induced replacement of one or more AMINO ACIDS in a protein with another. If a functionally equivalent amino acid is substituted, the protein may retain wild-type activity. Substitution may also diminish, enhance, or eliminate protein function. Experimentally induced substitution is often used to study enzyme activities and binding site properties.
Gene Library: A large collection of DNA fragments cloned (CLONING, MOLECULAR) from a given organism, tissue, organ, or cell type. It may contain complete genomic sequences (GENOMIC LIBRARY) or complementary DNA sequences, the latter being formed from messenger RNA and lacking intron sequences.
Codon: A set of three nucleotides in a protein coding sequence that specifies individual amino acids or a termination signal (CODON, TERMINATOR). Most codons are universal, but some organisms do not produce the transfer RNAs (RNA, TRANSFER) complementary to all codons. These codons are referred to as unassigned codons (CODONS, NONSENSE).
Chromosomes, Bacterial: Structures within the nucleoid of bacterial cells consisting of or containing DNA, which carry genetic information essential to the cell.
Pan troglodytes: The common chimpanzee, a species of the genus Pan, family HOMINIDAE. It lives in Africa, primarily in the tropical rainforests. There are a number of recognized subspecies.
Tetraodontiformes: A small order of primarily marine fish containing 340 species. Most have a rotund or box-like shape.
TETRODOTOXIN is found in their liver and ovaries.
Gene Expression Regulation, Bacterial: Any of the processes by which cytoplasmic or intercellular factors influence the differential control of gene action in bacteria.
Plasmids: Extrachromosomal, usually CIRCULAR DNA molecules that are self-replicating and transferable from one organism to another. They are found in a variety of bacterial, archaeal, fungal, algal, and plant species. They are used in GENETIC ENGINEERING as CLONING VECTORS.
Substrate Specificity: A characteristic feature of enzyme activity in relation to the kind of substrate on which the enzyme or catalytic molecule reacts.
DNA, Mitochondrial: Double-stranded DNA of MITOCHONDRIA. In eukaryotes, the mitochondrial GENOME is circular and codes for ribosomal RNAs, transfer RNAs, and about 10 proteins.
DNA, Chloroplast: Deoxyribonucleic acid that makes up the genetic material of CHLOROPLASTS.
Bone Malalignment: Displacement of bones out of line in relation to joints. It may be congenital or traumatic in origin.
Genes, Plant: The functional hereditary units of PLANTS.
Artificial Intelligence: Theory and development of COMPUTER SYSTEMS which perform tasks that normally require human intelligence. Such tasks may include speech recognition, LEARNING; VISUAL PERCEPTION; MATHEMATICAL COMPUTING; reasoning, PROBLEM SOLVING, DECISION-MAKING, and translation of language.
Sensitivity and Specificity: Binary classification measures to assess test results. Sensitivity, or recall rate, is the proportion of actual positives that are correctly identified (true positives). Specificity is the probability of correctly determining the absence of a condition. (From Last, Dictionary of Epidemiology, 2d ed)
Genes, Viral: The functional hereditary units of VIRUSES.
Introns: Sequences of DNA in the genes that are located between the EXONS. They are transcribed along with the exons but are removed from the primary gene transcript by RNA SPLICING to leave mature RNA.
Some introns code for separate genes.
Catalytic Domain: The region of an enzyme that interacts with its substrate to cause the enzymatic reaction.
Recombinant Proteins: Proteins prepared by recombinant DNA technology.
Base Pairing: Pairing of purine and pyrimidine bases by HYDROGEN BONDING in double-stranded DNA or RNA.
Mutagenesis, Insertional: Mutagenesis where the mutation is caused by the introduction of foreign DNA sequences into a gene or extragenic sequence. This may occur spontaneously in vivo or be experimentally induced in vivo or in vitro. Proviral DNA insertions into or adjacent to a cellular proto-oncogene can interrupt GENETIC TRANSLATION of the coding sequences or interfere with recognition of regulatory elements and cause unregulated expression of the proto-oncogene resulting in tumor formation.
Models, Chemical: Theoretical representations that simulate the behavior or activity of chemical processes or phenomena; includes the use of mathematical equations, computers, and other electronic equipment.
Physical Chromosome Mapping: Mapping of the linear order of genes on a chromosome with units indicating their distances by using methods other than genetic recombination. These methods include nucleotide sequencing, overlapping deletions in polytene chromosomes, and electron micrography of heteroduplex DNA. (From King & Stansfield, A Dictionary of Genetics, 5th ed)
Polymorphism, Single Nucleotide: A single nucleotide variation in a genetic sequence that occurs at appreciable frequency in the population.
Transcription, Genetic: The biosynthesis of RNA carried out on a template of DNA. The biosynthesis of DNA from an RNA template is called REVERSE TRANSCRIPTION.
Bacteriophages: Viruses whose hosts are bacterial cells.
Structure-Activity Relationship: The relationship between the chemical structure of a compound and its biological or pharmacological activity.
Compounds are often classed together because they have structural characteristics in common including shape, size, stereochemical arrangement, and distribution of functional groups.
Plant Proteins: Proteins found in plants (flowers, herbs, shrubs, trees, etc.). The concept does not include proteins found in vegetables for which VEGETABLE PROTEINS is available.
Exons: The parts of a transcript of a split GENE remaining after the INTRONS are removed. They are spliced together to become a MESSENGER RNA or other functional RNA.
Crystallography, X-Ray: The study of crystal structure using X-RAY DIFFRACTION techniques. (McGraw-Hill Dictionary of Scientific and Technical Terms, 4th ed)
Prophages: Genomes of temperate BACTERIOPHAGES integrated into the DNA of their bacterial host cell. The prophages can be duplicated for many cell generations until some stimulus induces their activation and virulence.
RNA, Ribosomal: The most abundant form of RNA. Together with proteins, it forms the ribosomes, playing a structural role and also a role in ribosomal binding of mRNA and tRNAs. Individual chains are conventionally designated by their sedimentation coefficients. In eukaryotes, four large chains exist, synthesized in the nucleolus and constituting about 50% of the ribosome.
(Dorland, 28th ed)
Metabolic Networks and Pathways: Complex sets of enzymatic reactions connected to each other via their product and substrate metabolites.
Prokaryotic Cells: Cells lacking a nuclear membrane so that the nuclear material is either scattered in the cytoplasm or collected in a nucleoid region.
Genes, Archaeal: The functional genetic units of ARCHAEA.
Chromosomes, Plant: Complex nucleoprotein structures which contain the genomic DNA and are part of the CELL NUCLEUS of PLANTS.
Genetic Markers: A phenotypically recognizable genetic trait which can be used to identify a genetic locus, a linkage group, or a recombination event.
Gene Deletion: A genetic rearrangement through loss of segments of DNA or RNA, bringing sequences which are normally separated into close proximity. This deletion may be detected using cytogenetic techniques and can also be inferred from the phenotype, indicating a deletion at one specific locus.
Virulence: The degree of pathogenicity within a group or species of microorganisms or viruses as indicated by case fatality rates and/or the ability of the organism to invade the tissues of the host. The pathogenic capacity of an organism is determined by its VIRULENCE FACTORS.
Genomic Library: A form of GENE LIBRARY containing the complete DNA sequences present in the genome of a given organism.
It contrasts with a cDNA library which contains only sequences utilized in protein coding (lacking introns).
Genotype: The genetic constitution of the individual, comprising the ALLELES present at each GENETIC LOCUS.
Work Simplification: The construction or arrangement of a task so that it may be done with the greatest possible efficiency.
Plant Diseases: Diseases of plants.
Catalysis: The facilitation of a chemical reaction by material (catalyst) that is not consumed by the reaction.
Eukaryotic Cells: Cells of the higher organisms, containing a true nucleus bounded by a nuclear membrane.
Arabidopsis: A plant genus of the family BRASSICACEAE that contains ARABIDOPSIS PROTEINS and MADS DOMAIN PROTEINS. The species A. thaliana is used for experiments in classical plant genetics as well as molecular genetic studies in plant physiology, biochemistry, and development.
Plants: Multicellular, eukaryotic life forms of kingdom Plantae (sensu lato), comprising the VIRIDIPLANTAE; RHODOPHYTA; and GLAUCOPHYTA; all of which acquired chloroplasts by direct endosymbiosis of CYANOBACTERIA. They are characterized by a mainly photosynthetic mode of nutrition; essentially unlimited growth at localized regions of cell divisions (MERISTEMS); cellulose within cells providing rigidity; the absence of organs of locomotion; absence of nervous and sensory systems; and an alternation of haploid and diploid generations.
Selection, Genetic: Differential and non-random reproduction of different genotypes, operating to alter the gene frequencies within a population.
RNA, Transfer: The small RNA molecules, 73-80 nucleotides long, that function during translation (TRANSLATION, GENETIC) to align AMINO ACIDS at the RIBOSOMES in a sequence determined by the mRNA (RNA, MESSENGER). There are about 30 different transfer RNAs.
Each recognizes a specific CODON set on the mRNA through its own ANTICODON and as aminoacyl tRNAs (RNA, TRANSFER, AMINO ACYL), each carries a specific amino acid to the ribosome to add to the elongating peptide chains.
Probability: The study of chance processes or the relative frequency characterizing a chance process.
Genes, Overlapping: Genes whose nucleotide sequences overlap to some degree. The overlapped sequences may involve structural or regulatory genes of eukaryotic or prokaryotic cells.
Amino Acids: Organic compounds that generally contain an amino (-NH2) and a carboxyl (-COOH) group. Twenty alpha-amino acids are the subunits which are polymerized to form proteins.
RNA, Bacterial: Ribonucleic acid in bacteria having regulatory and catalytic roles as well as involvement in protein synthesis.
Mammals: Warm-blooded vertebrate animals belonging to the class Mammalia, including all that possess hair and suckle their young.
Nucleotide Motifs: Commonly observed BASE SEQUENCE or nucleotide structural components which can be represented by a CONSENSUS SEQUENCE or a SEQUENCE LOGO.
Quality Control: A system for verifying and maintaining a desired level of quality in a product or process by careful planning, use of proper equipment, continued inspection, and corrective action as required. (Random House Unabridged Dictionary, 2d ed)
Archaeal Proteins: Proteins found in any species of archaeon.
Cattle: Domesticated bovine animals of the genus Bos, usually kept on a farm or ranch and used for the production of meat or dairy products or for heavy labor.
Saccharomyces cerevisiae: A species of the genus SACCHAROMYCES, family Saccharomycetaceae, order Saccharomycetales, known as "baker's" or "brewer's" yeast.
The dried form is used as a dietary supplement.
Oligonucleotide Array Sequence Analysis: Hybridization of a nucleic acid sample to a very large set of OLIGONUCLEOTIDE PROBES, which have been attached individually in columns and rows to a solid support, to determine a BASE SEQUENCE, or to detect variations in a gene sequence, GENE EXPRESSION, or for GENE MAPPING.
Entropy: The measure of that part of the heat or energy of a system which is not available to perform work. Entropy increases in all natural (spontaneous and irreversible) processes. (From Dorland, 28th ed)
Chromosomes: In a prokaryotic cell or in the nucleus of a eukaryotic cell, a structure consisting of or containing DNA which carries the genetic information essential to the cell. (From Singleton & Sainsbury, Dictionary of Microbiology and Molecular Biology, 2d ed)
Data Compression: Information application based on a variety of coding methods to minimize the amount of data to be stored, retrieved, or transmitted. Data compression can be applied to various forms of data, such as images and signals. It is used to reduce costs and increase efficiency in the maintenance of large volumes of data.
Kinetics: The rate dynamics in chemical or physical systems.
Monte Carlo Method: In statistics, a technique for numerically approximating the solution of a mathematical problem by studying the distribution of some random variable, often generated by a computer. The name alludes to the randomness characteristic of the games of chance played at the gambling casinos in Monte Carlo.
(From Random House Unabridged Dictionary, 2d ed, 1993)
Siphoviridae: A family of BACTERIOPHAGES and ARCHAEAL VIRUSES which are characterized by long, non-contractile tails.
Symbiosis: The relationship between two different species of organisms that are interdependent; each gains benefits from the other or a relationship between different species where both of the organisms in question benefit from the presence of the other.
Proteome: The protein complement of an organism coded for by its genome.
Computers
Phenotype: The outward appearance of the individual. It is the product of interactions between genes, and between the GENOTYPE and the environment.
Sarcocystidae: A family of parasitic organisms in the order EIMERIIDAE. They form tissue-cysts in their intermediate hosts, ultimately leading to pathogenesis in the final hosts that includes various mammals (including humans) and birds. The most important genera include NEOSPORA; SARCOCYSTIS; and TOXOPLASMA.
Nucleotides: The monomeric units from which DNA or RNA polymers are constructed. They consist of a purine or pyrimidine base, a pentose sugar, and a phosphate group. (From King & Stansfield, A Dictionary of Genetics, 4th ed)
Data Interpretation, Statistical: Application of statistical procedures to analyze specific observed or assumed facts from a particular study.
Retroelements: Elements that are transcribed into RNA, reverse-transcribed into DNA and then inserted into a new site in the genome. Long terminal repeats (LTRs) similar to those from retroviruses are contained in retrotransposons and retrovirus-like elements.
Retroposons, such as LONG INTERSPERSED NUCLEOTIDE ELEMENTS and SHORT INTERSPERSED NUCLEOTIDE ELEMENTS do not contain LTRs.
Benchmarking: Method of measuring performance against established standards of best practice.
Sequence Deletion: Deletion of sequences of nucleic acids from the genetic material of an individual.
Restriction Mapping: Use of restriction endonucleases to analyze and generate a physical map of genomes, genes, or other segments of DNA.
RNA, Messenger: RNA sequences that serve as templates for protein synthesis. Bacterial mRNAs are generally primary transcripts in that they do not require post-transcriptional processing. Eukaryotic mRNA is synthesized in the nucleus and must be exported to the cytoplasm for translation. Most eukaryotic mRNAs have a sequence of polyadenylic acid at the 3' end, referred to as the poly(A) tail. The function of this tail is not known for certain, but it may play a role in the export of mature mRNA from the nucleus as well as in helping stabilize some mRNA molecules by retarding their degradation in the cytoplasm.
Cell Line: Established cell cultures that have the potential to propagate indefinitely.
Chickens: Common name for the species Gallus gallus, the domestic fowl, in the family Phasianidae, order GALLIFORMES. It is descended from the red jungle fowl of SOUTHEAST ASIA.
Interspersed Repetitive Sequences: Copies of transposable elements interspersed throughout the genome, some of which are still active and often referred to as "jumping genes". There are two classes of interspersed repetitive elements. Class I elements (or RETROELEMENTS - such as retrotransposons, retroviruses, LONG INTERSPERSED NUCLEOTIDE ELEMENTS and SHORT INTERSPERSED NUCLEOTIDE ELEMENTS) transpose via reverse transcription of an RNA intermediate.
Class II elements (or DNA TRANSPOSABLE ELEMENTS - such as transposons, Tn elements, insertion sequence elements and mobile gene cassettes of bacterial integrons) transpose directly from one site in the DNA to another.
Models, Biological: Theoretical representations that simulate the behavior or activity of biological processes or diseases. For disease models in living animals, DISEASE MODELS, ANIMAL is available. Biological models include the use of mathematical equations, computers, and other electronic equipment.
Genome Components: The parts of a GENOME sequence that are involved with the different functions or properties of genomes as a whole as opposed to those of individual GENES.
Transcription Factors: Endogenous substances, usually proteins, which are effective in the initiation, stimulation, or termination of the genetic transcription process.
Computer Communication Networks: A system containing any combination of computers, computer terminals, printers, audio or visual display devices, or telephones interconnected by telecommunications equipment or cables: used to transmit or receive information. (Random House Unabridged Dictionary, 2d ed)
Gene Rearrangement: The ordered rearrangement of gene regions by DNA recombination such as that which occurs normally during development.
Enzyme Stability: The extent to which an enzyme retains its structural conformation or its activity when subjected to storage, isolation, and purification or various other physical or chemical manipulations, including proteolytic enzymes and heat.
Radiation Hybrid Mapping: A method for ordering genetic loci along CHROMOSOMES. The method involves fusing irradiated donor cells with host cells from another species. Following cell fusion, fragments of DNA from the irradiated cells become integrated into the chromosomes of the host cells.
Molecular probing of DNA obtained from the fused cells is used to determine if two or more genetic loci are located within the same fragment of donor cell DNA.
Mutagenesis: Process of generating a genetic MUTATION. It may occur spontaneously or be induced by MUTAGENS.
Swine: Any of various animals that constitute the family Suidae and comprise stout-bodied, short-legged omnivorous mammals with thick skin, usually covered with coarse bristles, a rather long mobile snout, and small tail. Included are the genera Babyrousa, Phacochoerus (wart hogs), and Sus, the latter containing the domestic pig (see SUS SCROFA).
DNA Mutational Analysis: Biochemical identification of mutational changes in a nucleotide sequence.
Myoviridae: A family of BACTERIOPHAGES and ARCHAEAL VIRUSES which are characterized by complex contractile tails.
  • One such field is anthropological genetics, where the majority of studies have been locus-specific, e.g., [2, 3], rather than genome-wide. (biomedcentral.com)
  • The resulting alignments from our pipeline can also be integrated into other applications that require alignments, such as calculation of population genetics summary statistics or genome-wide applications, such as phylogenetic analyses of windows across the entire chromosomes. (biomedcentral.com)
  • Thus, in order to understand this aspect of the genetics of ASD and other human diseases, we must understand the mutational processes that give rise to human genetic diversity and the intrinsic and extrinsic forces that shape patterns of variation in the genome. (pubmedcentralcanada.ca)
  • The sequence of YJM789 contains clues to pathogenicity and spurs the development of more powerful approaches to dissecting the genetic basis of complex hereditary traits. (pnas.org)
  • Globally high divergence at the sequence level has been inferred from genetic crosses (19), from sequencing portions of its genome (1, 25), and from hybridization to oligonucleotide arrays that could detect the presence of SNPs and insertions/deletions (indels) but not their sequence identity (1, 25-28). (pnas.org)
  • The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. (biomedcentral.com)
  • Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. (biomedcentral.com)
  • The completion of its genome sequence (C. elegans Sequencing Consortium 1998) provides a complete description of the genetic information, but decoding the program embedded in the sequence remains a challenge. (plos.org)
  • In order to facilitate ongoing Chlamydomonas research and explain the phenotypic variation, we mapped the genetic diversity within these strains using whole-genome resequencing. (plantcell.org)
  • Four BAC libraries constructed at Texas A&M University and at the Children's Hospital of Oakland Research Institute providing altogether ~28X coverage of the chicken genome have been screened with most of the suitable genes and markers that are found on the chicken genetic linkage map (Schmid et al. (kent.ac.uk)
  • These studies indicate that regional mutation rates are influenced by various properties of the genome and that no single factor can explain the observed patterns of genetic diversity and divergence in humans. (pubmedcentralcanada.ca)
  • The genome sequence will provide information complementary to the experimental data from our genetic study of this strain. (asm.org)
  • Many of the putative proteins show sequence relatedness to proteins from a great variety of other phages, supporting the hypothesis that this phage has evolved through the recombinational exchange of genetic information with other viruses. (asm.org)
  • The Axiom Genome-Wide BOS 1 Array Plate features the highest genetic coverage of 10 commercially important cattle breeds of any microarray on the market. (affymetrix.com)
  • The Axiom Genome-Wide BOS 1 Array Plate is designed to maximize genetic and physical SNP coverage and offers up to 35 percent more genetic coverage of commercially important cattle breeds, including Bos taurus, Bos indicus, as well as dairy and beef breeds. (affymetrix.com)
  • Developed in collaboration with key scientists in the bovine genotyping industry, the Axiom Genome-Wide BOS 1 Array Plate enables accurate genetic merit evaluations, genome-wide association studies to identify variations associated with disease, drug response, and other economically important traits, as well as biodiversity research and linkage disequilibrium (LD) studies. (affymetrix.com)
  • 96 Vibrio cholerae O1 isolates from five regions were characterized, and their genetic relatedness assessed using multi-locus variable-number tandem-repeat analysis (MLVA) and whole genome sequencing (WGS). (springer.com)
  • Here, we use two established methods, multi-locus variable-number tandem-repeat analysis (MLVA) and whole genome sequencing (WGS), to determine the genetic relatedness and establish transmission patterns of outbreak strains, which can lead to conclusions on the source(s) of these cholera outbreaks. (springer.com)
  • The International Genome Sample Resource (IGSR) has been established at EMBL-EBI to continue supporting data generated by the 1000 Genomes Project, supplemented with new data and new analysis. (internationalgenome.org)
  • Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map (SAM) format and SAMtools. (stackexchange.com)
  • And I should say those include HapMap and the 1000 Genomes Project. (coursera.org)
  • These platforms offer much reduced costs and an increased speed of data acquisition, but the length of the sequences acquired is much reduced, from 500-1000 base pairs, to as little as 25 base pairs per read. (utoronto.ca)
  • Recently, whole-genome sequencing (WGS) combined with core-genome MLST (cgMLST), whole-genome MLST, or single nucleotide polymorphism (SNP) calling has been introduced, further facilitating strain discrimination as well as data comparability between laboratories (13-17). (asm.org)
  • One of the cgMLST schemes developed uses a core genome set of 1,701 loci present in the majority of L. monocytogenes isolates (14). (asm.org)
  • Here we demonstrate that the alignment-free analysis method feature frequency profiling (FFP) can be used to rapidly construct phylogenetic trees of draft bacterial genome sequences on a standard desktop computer and that coupling with in silico genotyping methods gives useful information for comparative and clinical genomic and molecular epidemiology applications. (surrey.ac.uk)
  • Pattern Recognition in Sequences, Pattern Recognition in Secondary Structures, Pattern Recognition in Tertiary Structures, Pattern Recognition in Quaternary Structures, Pattern Recognition in Microarrays, Pattern Recognition in Phylogenetic Trees, and Pattern Recognition in Biological Networks. (wiley.com)
  • The length of 512 Mb represents 90.1-96.1% of the estimated haploid genome size of rose. (nature.com)
  • A gymnosperm megagametophyte (MGP) is maternally derived tissue found within each seed containing the same haploid genome that is contributed to the diploid zygote (embryo). (g3journal.org)
  • Reduced representation bisulfite sequencing (RRBS): Genome-wide sequencing technique that specifically enriches genomic regions with a high density of potential methylation sites and enables investigation of DNA methylation at single-nucleotide resolution. (otago.ac.nz)
  • I performed the first genome-wide study to provide single-nucleotide resolution DNA methylation profiles in human neutrophils and showed the existence of widespread inter-individual variation in epigenetic marks in a normal population. (otago.ac.nz)
  • For a multiple alignment consisting of 8 sequences with low similarity, the accuracy was improved (2-10 percentage points) when the sequences were aligned together with dozens of their close homologues (E-value, 10^-5 to 10^-20) collected from a database. (psu.edu)
  • We also found significant sequence similarity of INS1378 to a pShuttle-SN vector that was in use in the 1980s in China to create a more immunogenic coronavirus (IPAK finding, details below, Option 4). (sott.net)
  • IPAK researchers found a sequence similarity between a pShuttle-SN recombination vector sequence and INS1378. (sott.net)
  • A tool that finds regions of similarity between biological sequences. (nih.gov)
  • We generated a high-quality genome sequence of Gossypioides kirkii (n = 12) using PacBio, Bionano, and Hi-C technologies, and compared this assembly to genome sequences of Kokia (n = 12) and Gossypium diploids (n = 13). (frontiersin.org)
  • In the second half of the course, we will 'zoom out' to compare entire genomes, where we see large scale mutations called genome rearrangements, seismic events that have heaved around large blocks of DNA over millions of years of evolution. (coursera.org)
  • Visual depictions of the alignment as in the image at right illustrate mutation events such as point mutations (single amino acid or nucleotide changes) that appear as differing characters in a single alignment column, and insertion or deletion mutations (indels or gaps) that appear as hyphens in one or more of the sequences in the alignment. (wikipedia.org)
  • The structure of the 2019-nCoV virus genome provides a very strong clue on the likely origin of the virus. (sott.net)
  • Several baculovirus homologs were detected in the Hz-1 virus genome. (asm.org)
  • A recombined virus that naturally picked up a SARS-like spike protein in its N-terminus (3′ end) of the viral genome. (sott.net)
  • Other cellular homologs were also detected dispersed in the viral genome. (asm.org)
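
One of the snippets above describes how an alignment is read column by column: point mutations show up as differing characters in a column, and insertions or deletions show up as hyphens. As a minimal sketch of that reading, assuming a pairwise alignment given as two equal-length strings with `-` as the gap character, the Python below labels each column. The function names and the example sequences are hypothetical and chosen only for illustration; they do not come from any of the quoted sources.

```python
def classify_columns(seq_a, seq_b, gap="-"):
    """Label each column of a pairwise alignment as match, mismatch, indel, or gap-only."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    labels = []
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            labels.append("gap-only")   # both rows gapped (only arises in slices of multiple alignments)
        elif a == gap or b == gap:
            labels.append("indel")      # a hyphen in exactly one row
        elif a == b:
            labels.append("match")      # identical characters
        else:
            labels.append("mismatch")   # differing characters: a candidate point mutation
    return labels


def summarize(seq_a, seq_b):
    """Count how many columns fall into each category."""
    labels = classify_columns(seq_a, seq_b)
    kinds = ("match", "mismatch", "indel", "gap-only")
    return {kind: labels.count(kind) for kind in kinds}


if __name__ == "__main__":
    # Hypothetical aligned sequences, invented for this example.
    aligned_a = "ACGT-ACGTA"
    aligned_b = "ACCTTAC-TA"
    print(summarize(aligned_a, aligned_b))
```

The sketch only interprets an alignment that already exists; producing the alignment itself is the job of tools such as the similarity-search and multiple-alignment programs mentioned in the snippets.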